diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000000000000000000000000000000000000..35ed10589b0d052fd7e35d657d8c9ff0eddf5025 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,17 @@ +# Public docs for TensorFlow Models + +This directory contains the top-level public documentation for +[TensorFlow Models](https://github.com/tensorflow/models). + +This directory is mirrored to https://tensorflow.org/tfmodels, and is mainly +concerned with documenting the tools provided in the `tensorflow_models` pip +package (including `orbit`). + +API reference pages are +[available on the site](https://www.tensorflow.org/api_docs/more). + +The +[Official Models](https://github.com/tensorflow/models/blob/master/official/projects) +and [Research Models](https://github.com/tensorflow/models/blob/master/research) +directories are not described in detail here; refer to the individual project +directories for more information. diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000000000000000000000000000000000000..2b1535a71fe395924a01b62b2668c6d8e94a89b7 --- /dev/null +++ b/docs/index.md @@ -0,0 +1,140 @@ +# Model Garden overview + +The TensorFlow Model Garden provides implementations of many state-of-the-art +machine learning (ML) models for vision and natural language processing (NLP), +as well as workflow tools to let you quickly configure and run those models on +standard datasets. Whether you are looking to benchmark performance for a +well-known model, verify the results of recently released research, or extend +existing models, the Model Garden can help you drive your ML research and +applications forward.
+ +The Model Garden includes the following resources for machine learning +developers: + +- [**Official models**](#official) for vision and NLP, maintained by Google + engineers +- [**Research models**](#research) published as part of ML research papers +- [**Training experiment framework**](#training_framework) for fast, + declarative training configuration of official models +- [**Specialized ML operations**](#ops) for vision and natural language + processing (NLP) +- [**Model training loop**](#orbit) management with Orbit + +These resources are built to be used with the TensorFlow Core framework and +integrate with your existing TensorFlow development projects. Model +Garden resources are also provided under an [open +source](https://github.com/tensorflow/models/blob/master/LICENSE) license, so +you can freely extend and distribute the models and tools. + +Practical ML models are computationally intensive to train and run, and may +require accelerators such as Graphics Processing Units (GPUs) and Tensor +Processing Units (TPUs). Most of the models in the Model Garden were trained on +large datasets using TPUs. However, you can also train and run these models on +GPUs and CPUs. + +## Model Garden models + +The machine learning models in the Model Garden include full code so you can +test, train, or re-train them for research and experimentation. The Model Garden +includes two primary categories of models: *official models* and *research +models*. + +### Official models {:#official} + +The [Official Models](https://github.com/tensorflow/models/tree/master/official) +repository is a collection of state-of-the-art models, with a focus on +vision and natural language processing (NLP). +These models are implemented using current TensorFlow 2.x high-level +APIs. Model libraries in this repository are optimized for fast performance and +actively maintained by Google engineers.
The official models include additional +metadata you can use to quickly configure experiments using the Model Garden +[training experiment framework](#training_framework). + +### Research models {:#research} + +The [Research Models](https://github.com/tensorflow/models/tree/master/research) +repository is a collection of models published as code resources for research +papers. These models are implemented using both TensorFlow 1.x and 2.x. Model +libraries in the research folder are supported by the code owners and the +research community. + +## Training experiment framework {:#training_framework} + +The Model Garden training experiment framework lets you quickly assemble and run +training experiments using its official models and standard datasets. The +training framework uses additional metadata included with the Model Garden's +official models to allow you to configure models quickly using a declarative +programming model. You can define a training experiment in Python with the +[TensorFlow Models library](https://www.tensorflow.org/api_docs/python/tfm/core) +or configure training using a YAML configuration file, like this +[example](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml).
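Conceptually, an experiment config is a nested, declarative object. Its top-level sections (`runtime`, `task`, and `trainer` in the Model Garden) each hold their own fields, and a YAML file or a few lines of Python override only the values they name. The following is a minimal sketch of that override pattern using plain Python dataclasses. It does not import `tensorflow_models`. The classes, field names, default values, and the `apply_overrides` helper are all hypothetical stand-ins, not the `tfm` API. In the library itself, you would start from a registered experiment config (for example, one returned by `tfm.core.exp_factory.get_exp_config`) and edit its fields.

```python
import dataclasses
from typing import Any, Dict, Optional


# Hypothetical stand-ins for the three top-level sections of an experiment
# config. Field names and defaults are illustrative only, not the tfm classes.
@dataclasses.dataclass
class RuntimeConfig:
  distribution_strategy: str = "mirrored"
  mixed_precision_dtype: Optional[str] = None


@dataclasses.dataclass
class TaskConfig:
  num_classes: int = 1000
  global_batch_size: int = 256


@dataclasses.dataclass
class TrainerConfig:
  train_steps: int = 10000
  checkpoint_interval: int = 1000


@dataclasses.dataclass
class ExperimentConfig:
  runtime: RuntimeConfig = dataclasses.field(default_factory=RuntimeConfig)
  task: TaskConfig = dataclasses.field(default_factory=TaskConfig)
  trainer: TrainerConfig = dataclasses.field(default_factory=TrainerConfig)


def apply_overrides(config: Any, overrides: Dict[str, Any]) -> Any:
  """Recursively copies values from a nested dict (e.g. parsed YAML) into config.

  Unknown keys raise KeyError, so a typo in an override fails loudly instead of
  being silently ignored.
  """
  known_fields = {f.name for f in dataclasses.fields(config)}
  for key, value in overrides.items():
    if key not in known_fields:
      raise KeyError(f"Unknown config field: {key!r}")
    current = getattr(config, key)
    if isinstance(value, dict) and dataclasses.is_dataclass(current):
      apply_overrides(current, value)  # Descend into a nested section.
    else:
      setattr(config, key, value)  # Replace a leaf value.
  return config


# Overrides shaped like the top-level sections of a YAML experiment file.
config = apply_overrides(ExperimentConfig(), {
    "runtime": {"distribution_strategy": "tpu",
                "mixed_precision_dtype": "bfloat16"},
    "task": {"global_batch_size": 4096},
})
print(config.runtime.distribution_strategy, config.task.global_batch_size)
# prints: tpu 4096
```

Rejecting unknown keys is a design choice of this sketch. With declarative configs, a misspelled field name should fail loudly rather than be silently ignored.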
+ +The training framework uses +[`tfm.core.base_trainer.ExperimentConfig`](https://www.tensorflow.org/api_docs/python/tfm/core/base_trainer/ExperimentConfig) +as the configuration object, which contains the following top-level +configuration objects: + +- [`runtime`](https://www.tensorflow.org/api_docs/python/tfm/core/base_task/RuntimeConfig): + Defines the processing hardware, distribution strategy, and other + performance optimizations +- [`task`](https://www.tensorflow.org/api_docs/python/tfm/core/config_definitions/TaskConfig): + Defines the model, training data, losses, and initialization +- [`trainer`](https://www.tensorflow.org/api_docs/python/tfm/core/base_trainer/TrainerConfig): + Defines the optimizer, training loops, evaluation loops, summaries, and + checkpoints + +For a complete example using the Model Garden training experiment framework, see +the [Image classification with Model Garden](vision/image_classification.ipynb) +tutorial. For information on the training experiment framework, check out the +[TensorFlow Models API documentation](https://tensorflow.org/api_docs/python/tfm/core). +If you are looking for a solution to manage training loops for your model +training experiments, check out [Orbit](#orbit). + +## Specialized ML operations {:#ops} + +The Model Garden contains many vision and NLP operations specifically designed +to execute state-of-the-art models that run efficiently on GPUs and TPUs. Review +the TensorFlow Models Vision library API docs for a list of specialized +[vision operations](https://www.tensorflow.org/api_docs/python/tfm/vision). +Review the TensorFlow Models NLP Library API docs for a list of +[NLP operations](https://www.tensorflow.org/api_docs/python/tfm/nlp). These +libraries also include additional utility functions used for vision and NLP data +processing, training, and model execution. 
+ +## Training loops with Orbit {:#orbit} + +There are two default options for training TensorFlow models: + +* Use the high-level Keras +[Model.fit](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) +method. If your model and training procedure fit the assumptions of Keras' +`Model.fit` method (incremental gradient descent on batches of data), this can +be very convenient. +* Write a custom training loop +[with Keras](https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch), +or [without](https://www.tensorflow.org/guide/core/logistic_regression_core). +You can write a custom training loop with low-level TensorFlow APIs such as +`tf.GradientTape` and `tf.function`. However, this approach requires a lot of +boilerplate code, and doesn't do anything to simplify distributed training. + +Orbit provides a third option between these two extremes. + +Orbit is a flexible, lightweight library designed to make it easier to +write custom training loops in TensorFlow 2.x, and works well with the Model +Garden [training experiment framework](#training_framework). Orbit handles +common model training tasks such as saving checkpoints, running model +evaluations, and setting up summary writing. It seamlessly integrates with +`tf.distribute` and supports running on different device types, including CPU, +GPU, and TPU hardware. The Orbit tool is also [open +source](https://github.com/tensorflow/models/blob/master/orbit/LICENSE), so you +can extend and adapt it to your model training needs. + +For more information, see the [Orbit guide](orbit/index.ipynb). + +Note: You can also customize how Keras executes training in `Model.fit`, mainly +by overriding the `Model.train_step` method or by using `keras.callbacks` such as +`callbacks.ModelCheckpoint` or `callbacks.TensorBoard`.
For more information +about modifying the behavior of `train_step`, check out the +[Customize what happens in Model.fit](https://www.tensorflow.org/guide/keras/customizing_what_happens_in_fit) +page. diff --git a/docs/nlp/_guide_toc.yaml b/docs/nlp/_guide_toc.yaml new file mode 100644 index 0000000000000000000000000000000000000000..5d90c4232a46b6afcd0236a0bdc324258591729e --- /dev/null +++ b/docs/nlp/_guide_toc.yaml @@ -0,0 +1,7 @@ +toc: +- heading: TensorFlow Models - NLP + style: divider +- title: "Overview" + path: /tfmodels/nlp +- title: "Customize a transformer encoder" + path: /tfmodels/nlp/customize_encoder diff --git a/docs/nlp/customize_encoder.ipynb b/docs/nlp/customize_encoder.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..3d81f084d6c8d824188ee5acdf1719113a62fa46 --- /dev/null +++ b/docs/nlp/customize_encoder.ipynb @@ -0,0 +1,596 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "Bp8t2AI8i7uP" + }, + "source": [ + "##### Copyright 2022 The TensorFlow Authors." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "rxPj2Lsni9O4" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6xS-9i5DrRvO" + }, + "source": [ + "# Customizing a Transformer Encoder" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iLrcV4IyrcGX" + }, + "source": [ + "## Learning objectives\n", + "\n", + "The [TensorFlow Models NLP library](https://github.com/tensorflow/models/tree/master/official/nlp/modeling) is a collection of tools for building and training modern, high-performance natural language models.\n", + "\n", + "`tfm.nlp.networks.EncoderScaffold` is at the core of this library, and many new network architectures have been proposed to improve the encoder. In this Colab notebook, we will learn how to customize the encoder to use new network architectures." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YYxdyoWgsl8t" + }, + "source": [ + "## Install and import" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fEJSFutUsn_h" + }, + "source": [ + "### Install the TensorFlow Model Garden pip package\n", + "\n", + "* `tf-models-official` is the stable Model Garden package. Note that it may not include the latest changes in the `tensorflow_models` GitHub repo. To include the latest changes, you can install `tf-models-nightly`,\n", + "which is the nightly Model Garden package created automatically every day.\n", + "* `pip` will install all models and dependencies automatically."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mfHI5JyuJ1y9" + }, + "outputs": [], + "source": [ + "!pip install -q opencv-python" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "thsKZDjhswhR" + }, + "outputs": [], + "source": [ + "!pip install -q tf-models-official" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hpf7JPCVsqtv" + }, + "source": [ + "### Import TensorFlow and other libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "my4dp-RMssQe" + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "\n", + "import tensorflow_models as tfm\n", + "nlp = tfm.nlp" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vjDmVsFfs85n" + }, + "source": [ + "## Canonical BERT encoder\n", + "\n", + "Before learning how to customize the encoder, let's first create a canonical BERT encoder and use it to instantiate a `nlp.models.BertClassifier` for a classification task."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Oav8sbgstWc-" + }, + "outputs": [], + "source": [ + "cfg = {\n", + " \"vocab_size\": 100,\n", + " \"hidden_size\": 32,\n", + " \"num_layers\": 3,\n", + " \"num_attention_heads\": 4,\n", + " \"intermediate_size\": 64,\n", + " \"activation\": tfm.utils.activations.gelu,\n", + " \"dropout_rate\": 0.1,\n", + " \"attention_dropout_rate\": 0.1,\n", + " \"max_sequence_length\": 16,\n", + " \"type_vocab_size\": 2,\n", + " \"initializer\": tf.keras.initializers.TruncatedNormal(stddev=0.02),\n", + "}\n", + "bert_encoder = nlp.networks.BertEncoder(**cfg)\n", + "\n", + "def build_classifier(bert_encoder):\n", + " return nlp.models.BertClassifier(bert_encoder, num_classes=2)\n", + "\n", + "canonical_classifier_model = build_classifier(bert_encoder)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Qe2UWI6_tsHo" + }, + "source": [ + "`canonical_classifier_model` can be trained using the training data. For details about how to train the model, please see the [Fine-tuning BERT](https://www.tensorflow.org/text/tutorials/fine_tune_bert) notebook.
We skip the code that trains the model here.\n", + "\n", + "After training, we can apply the model to make predictions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "csED2d-Yt5h6" + }, + "outputs": [], + "source": [ + "def predict(model):\n", + " batch_size = 3\n", + " np.random.seed(0)\n", + " word_ids = np.random.randint(\n", + " cfg[\"vocab_size\"], size=(batch_size, cfg[\"max_sequence_length\"]))\n", + " mask = np.random.randint(2, size=(batch_size, cfg[\"max_sequence_length\"]))\n", + " type_ids = np.random.randint(\n", + " cfg[\"type_vocab_size\"], size=(batch_size, cfg[\"max_sequence_length\"]))\n", + " print(model([word_ids, mask, type_ids], training=False))\n", + "\n", + "predict(canonical_classifier_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PzKStEK9t_Pb" + }, + "source": [ + "## Customize BERT encoder\n", + "\n", + "A BERT encoder consists of an embedding network and multiple transformer blocks, and each transformer block contains an attention layer and a feedforward layer." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rmwQfhj6fmKz" + }, + "source": [ + "We provide easy ways to customize each of those components via (1)\n", + "[EncoderScaffold](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/encoder_scaffold.py) and (2) [TransformerScaffold](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/transformer_scaffold.py)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xsMgEVHAui11" + }, + "source": [ + "### Use EncoderScaffold\n", + "\n", + "`networks.EncoderScaffold` allows users to provide a custom embedding subnetwork\n", + " (which will replace the standard embedding logic) and/or a custom hidden layer class (which will replace the `Transformer` instantiation in the encoder)."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-JBabpa2AOz8" + }, + "source": [ + "#### Without Customization\n", + "\n", + "Without any customization, `networks.EncoderScaffold` behaves the same as the canonical `networks.BertEncoder`.\n", + "\n", + "As shown in the following example, `networks.EncoderScaffold` can load `networks.BertEncoder`'s weights and output the same values:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ktNzKuVByZQf" + }, + "outputs": [], + "source": [ + "default_hidden_cfg = dict(\n", + " num_attention_heads=cfg[\"num_attention_heads\"],\n", + " intermediate_size=cfg[\"intermediate_size\"],\n", + " intermediate_activation=cfg[\"activation\"],\n", + " dropout_rate=cfg[\"dropout_rate\"],\n", + " attention_dropout_rate=cfg[\"attention_dropout_rate\"],\n", + " kernel_initializer=cfg[\"initializer\"],\n", + ")\n", + "default_embedding_cfg = dict(\n", + " vocab_size=cfg[\"vocab_size\"],\n", + " type_vocab_size=cfg[\"type_vocab_size\"],\n", + " hidden_size=cfg[\"hidden_size\"],\n", + " initializer=cfg[\"initializer\"],\n", + " dropout_rate=cfg[\"dropout_rate\"],\n", + " max_seq_length=cfg[\"max_sequence_length\"]\n", + ")\n", + "default_kwargs = dict(\n", + " hidden_cfg=default_hidden_cfg,\n", + " embedding_cfg=default_embedding_cfg,\n", + " num_hidden_instances=cfg[\"num_layers\"],\n", + " pooled_output_dim=cfg[\"hidden_size\"],\n", + " return_all_layer_outputs=True,\n", + " pooler_layer_initializer=cfg[\"initializer\"],\n", + ")\n", + "\n", + "encoder_scaffold = nlp.networks.EncoderScaffold(**default_kwargs)\n", + "classifier_model_from_encoder_scaffold = build_classifier(encoder_scaffold)\n", + "classifier_model_from_encoder_scaffold.set_weights(\n", + " canonical_classifier_model.get_weights())\n", + "predict(classifier_model_from_encoder_scaffold)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sMaUmLyIuwcs" + }, + "source": [ + "#### Customize Embedding\n", + "\n", +
"Next, we show how to use a customized embedding network.\n", + "\n", + "First, we build an embedding network to replace the default one. It has 2 inputs (`mask` and `word_ids`) instead of 3, and doesn't use positional embeddings." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "LTinnaG6vcsw" + }, + "outputs": [], + "source": [ + "word_ids = tf.keras.layers.Input(\n", + " shape=(cfg['max_sequence_length'],), dtype=tf.int32, name=\"input_word_ids\")\n", + "mask = tf.keras.layers.Input(\n", + " shape=(cfg['max_sequence_length'],), dtype=tf.int32, name=\"input_mask\")\n", + "embedding_layer = nlp.layers.OnDeviceEmbedding(\n", + " vocab_size=cfg['vocab_size'],\n", + " embedding_width=cfg['hidden_size'],\n", + " initializer=cfg[\"initializer\"],\n", + " name=\"word_embeddings\")\n", + "word_embeddings = embedding_layer(word_ids)\n", + "attention_mask = nlp.layers.SelfAttentionMask()([word_embeddings, mask])\n", + "new_embedding_network = tf.keras.Model([word_ids, mask],\n", + " [word_embeddings, attention_mask])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HN7_yu-6O3qI" + }, + "source": [ + "Inspecting `new_embedding_network`, we can see it takes two inputs:\n", + "`input_word_ids` and `input_mask`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fO9zKFE4OpHp" + }, + "outputs": [], + "source": [ + "tf.keras.utils.plot_model(new_embedding_network, show_shapes=True, dpi=48)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9cOaGQHLv12W" + }, + "source": [ + "We can then build a new encoder using the `new_embedding_network` defined above."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mtFDMNf2vIl9" + }, + "outputs": [], + "source": [ + "kwargs = dict(default_kwargs)\n", + "\n", + "# Use the new embedding network.\n", + "kwargs['embedding_cls'] = new_embedding_network\n", + "kwargs['embedding_data'] = embedding_layer.embeddings\n", + "\n", + "encoder_with_customized_embedding = nlp.networks.EncoderScaffold(**kwargs)\n", + "classifier_model = build_classifier(encoder_with_customized_embedding)\n", + "# ... Train the model ...\n", + "print(classifier_model.inputs)\n", + "\n", + "# Assert that there are only two inputs.\n", + "assert len(classifier_model.inputs) == 2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Z73ZQDtmwg9K" + }, + "source": [ + "#### Customize Transformer\n", + "\n", + "You can also override the `hidden_cls` argument in `networks.EncoderScaffold`'s constructor to use a customized Transformer layer.\n", + "\n", + "See [the source of `nlp.layers.ReZeroTransformer`](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/rezero_transformer.py) for how to implement a customized Transformer layer.\n", + "\n", + "The following is an example of using `nlp.layers.ReZeroTransformer`:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "uAIarLZgw6pA" + }, + "outputs": [], + "source": [ + "kwargs = dict(default_kwargs)\n", + "\n", + "# Use ReZeroTransformer.\n", + "kwargs['hidden_cls'] = nlp.layers.ReZeroTransformer\n", + "\n", + "encoder_with_rezero_transformer = nlp.networks.EncoderScaffold(**kwargs)\n", + "classifier_model = build_classifier(encoder_with_rezero_transformer)\n", + "# ...
Train the model ...\n", + "predict(classifier_model)\n", + "\n", + "# Assert that the variable `rezero_alpha` from ReZeroTransformer exists.\n", + "assert 'rezero_alpha' in ''.join([x.name for x in classifier_model.trainable_weights])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6PMHFdvnxvR0" + }, + "source": [ + "### Use `nlp.layers.TransformerScaffold`\n", + "\n", + "The method above requires rewriting the whole `nlp.layers.Transformer` layer, but sometimes you only want to customize the attention layer or the feedforward block. In that case, you can use `nlp.layers.TransformerScaffold`.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "D6FejlgwyAy_" + }, + "source": [ + "#### Customize Attention Layer\n", + "\n", + "You can override the `attention_cls` argument in `nlp.layers.TransformerScaffold`'s constructor to use a customized attention layer.\n", + "\n", + "See [the source of `nlp.layers.TalkingHeadsAttention`](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/talking_heads_attention.py) for how to implement a customized `Attention` layer.\n", + "\n", + "The following is an example of using `nlp.layers.TalkingHeadsAttention`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "nFrSMrZuyNeQ" + }, + "outputs": [], + "source": [ + "# Use TalkingHeadsAttention\n", + "hidden_cfg = dict(default_hidden_cfg)\n", + "hidden_cfg['attention_cls'] = nlp.layers.TalkingHeadsAttention\n", + "\n", + "kwargs = dict(default_kwargs)\n", + "kwargs['hidden_cls'] = nlp.layers.TransformerScaffold\n", + "kwargs['hidden_cfg'] = hidden_cfg\n", + "\n", + "encoder = nlp.networks.EncoderScaffold(**kwargs)\n", + "classifier_model = build_classifier(encoder)\n", + "# ...
Train the model ...\n", + "predict(classifier_model)\n", + "\n", + "# Assert that the variable `pre_softmax_weight` from TalkingHeadsAttention exists.\n", + "assert 'pre_softmax_weight' in ''.join([x.name for x in classifier_model.trainable_weights])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tKkZ8spzYmpc" + }, + "outputs": [], + "source": [ + "tf.keras.utils.plot_model(encoder_with_rezero_transformer, show_shapes=True, dpi=48)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kuEJcTyByVvI" + }, + "source": [ + "#### Customize Feedforward Layer\n", + "\n", + "Similarly, you can customize the feedforward layer.\n", + "\n", + "See [the source of `nlp.layers.GatedFeedforward`](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/gated_feedforward.py) for how to implement a customized feedforward layer.\n", + "\n", + "The following is an example of using `nlp.layers.GatedFeedforward`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "XAbKy_l4y_-i" + }, + "outputs": [], + "source": [ + "# Use GatedFeedforward\n", + "hidden_cfg = dict(default_hidden_cfg)\n", + "hidden_cfg['feedforward_cls'] = nlp.layers.GatedFeedforward\n", + "\n", + "kwargs = dict(default_kwargs)\n", + "kwargs['hidden_cls'] = nlp.layers.TransformerScaffold\n", + "kwargs['hidden_cfg'] = hidden_cfg\n", + "\n", + "encoder_with_gated_feedforward = nlp.networks.EncoderScaffold(**kwargs)\n", + "classifier_model = build_classifier(encoder_with_gated_feedforward)\n", + "# ...
Train the model ...\n", + "predict(classifier_model)\n", + "\n", + "# Assert that the variable `gate` from GatedFeedforward exists.\n", + "assert 'gate' in ''.join([x.name for x in classifier_model.trainable_weights])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a_8NWUhkzeAq" + }, + "source": [ + "### Build a new Encoder\n", + "\n", + "Finally, you can also build a new encoder from the building blocks in the modeling library.\n", + "\n", + "See [the source for `nlp.networks.AlbertEncoder`](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/albert_encoder.py) as an example of how to do this.\n", + "\n", + "Here is an example using `nlp.networks.AlbertEncoder`:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xsiA3RzUzmUM" + }, + "outputs": [], + "source": [ + "albert_encoder = nlp.networks.AlbertEncoder(**cfg)\n", + "classifier_model = build_classifier(albert_encoder)\n", + "# ... Train the model ...\n", + "predict(classifier_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MeidDfhlHKSO" + }, + "source": [ + "Inspecting `albert_encoder`, we see that it stacks the same `Transformer` layer multiple times (note the loop-back on the \"Transformer\" block below)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Uv_juT22HERW" + }, + "outputs": [], + "source": [ + "tf.keras.utils.plot_model(albert_encoder, show_shapes=True, dpi=48)" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "customize_encoder.ipynb", + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/docs/nlp/decoding_api.ipynb b/docs/nlp/decoding_api.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..b89079ee2223020757028ebb71a1dd1dda91e364 --- /dev/null +++ b/docs/nlp/decoding_api.ipynb @@ -0,0 +1,482 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "vXLA5InzXydn" + }, + "source": [ + "##### Copyright 2021 The TensorFlow Authors." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "RuRlpLL-X0R_" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2X-XaMSVcLua" + }, + "source": [ + "# Decoding API" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hYEwGTeCXnnX" + }, + "source": [ + "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/tfmodels/nlp/decoding_api\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/docs/nlp/decoding_api.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/docs/nlp/decoding_api.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/models/docs/nlp/decoding_api.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n", + " \u003c/td\u003e\n", + "\u003c/table\u003e" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fsACVQpVSifi" + }, + "source": [ + "### Install the TensorFlow Model Garden pip package\n", + "\n", + "* `tf-models-official` is the stable Model Garden package. Note that it may not include the latest changes in the `tensorflow_models` GitHub repo. To include the latest changes, you can install `tf-models-nightly`,\n", + "which is the nightly Model Garden package created automatically every day.\n", + "* `pip` will install all models and dependencies automatically."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "G4BhAu01HZcM" + }, + "outputs": [], + "source": [ + "!pip uninstall -y opencv-python" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2j-xhrsVQOQT" + }, + "outputs": [], + "source": [ + "!pip install tf-models-official" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BjP7zwxmskpY" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "import tensorflow as tf\n", + "\n", + "from tensorflow_models import nlp" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "T92ccAzlnGqh" + }, + "outputs": [], + "source": [ + "def length_norm(length, dtype):\n", + " \"\"\"Return length normalization factor.\"\"\"\n", + " return tf.pow(((5. + tf.cast(length, dtype)) / 6.), 0.0)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0AWgyo-IQ5sP" + }, + "source": [ + "## Overview\n", + "\n", + "This API provides an interface to experiment with different decoding strategies used for auto-regressive models.\n", + "\n", + "1. 
 The following sampling strategies are provided in `sampling_module.py`, whose sampling class inherits from the base decoding class:\n", + " * [top_p](https://arxiv.org/abs/1904.09751) : [github](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/ops/sampling_module.py#L65)\n", + "\n", + " At each timestep, this implementation keeps the most probable tokens whose cumulative probability is up to `top_p` and samples from them.\n", + "\n", + " * [top_k](https://arxiv.org/pdf/1805.04833.pdf) : [github](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/ops/sampling_module.py#L48)\n", + "\n", + " At each timestep, this implementation samples from the top-k logits according to their probability distribution.\n", + "\n", + " * Greedy : [github](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/ops/sampling_module.py#L26)\n", + "\n", + " At each timestep, this implementation selects the token with the highest probability.\n", + "\n", + "2. Beam search is provided in `beam_search.py`. [github](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/ops/beam_search.py)\n", + "\n", + " This implementation reduces the risk of missing high-probability sequences that begin with a lower-probability token: it keeps the `num_beams` most likely partial sequences at each time step and finally chooses the sequence with the highest overall probability." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MfOj7oaBRQnS" + }, + "source": [ + "## Initialize Sampling Module in TF-NLP\n", + "\n", + "\n", + "\u003e **symbols_to_logits_fn** : This is a closure implemented by the users of the API. Its arguments and return values are:\n", + "```\n", + "Args:\n", + " 1] ids [batch_size, ..
(index + 1 or 1 if padded_decode is True)],\n", + " 2] index [scalar] : current decoded step,\n", + " 3] cache [nested dictionary of tensors].\n", + "Returns:\n", + " 1] tensor for next-step logits [batch_size, vocab]\n", + " 2] the updated_cache [nested dictionary of tensors].\n", + "```\n", + "This closure calls the model to predict the logits for the 'index+1' step. The cache is used for faster decoding.\n", + "Here is a [reference](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/ops/beam_search_test.py#L88) implementation for the above closure.\n", + "\n", + "\n", + "\u003e **length_normalization_fn** : Closure that returns the length normalization factor.\n", + "```\n", + "Args: \n", + " 1] length : scalar for decoded step index.\n", + " 2] dtype : data-type of output tensor.\n", + "Returns:\n", + " 1] value of length normalization factor.\n", + "Example :\n", + " def _length_norm(length, dtype):\n", + " return tf.pow(((5. + tf.cast(length, dtype)) / 6.), 0.0)\n", + "```\n", + "\n", + "\u003e **vocab_size** : Output vocabulary size.\n", + "\n", + "\u003e **max_decode_length** : Scalar for total number of decoding steps.\n", + "\n", + "\u003e **eos_id** : Decoding will stop if all output decoded ids in the batch have this ID.\n", + "\n", + "\u003e **padded_decode** : Set this to True if running on TPU. Tensors are padded to max_decode_length if this is True.\n", + "\n", + "\u003e **top_k** : top_k is enabled if this value is \u003e 1.\n", + "\n", + "\u003e **top_p** : top_p is enabled if this value is \u003e 0 and \u003c 1.0.\n", + "\n", + "\u003e **sample_temperature** : This is used to rescale the logits before the softmax. Lower temperatures skew the distribution towards high-probability tokens and reduce the mass in the tail of the distribution. Value has to be positive. 
A very low temperature approaches greedy decoding and makes the distribution sharper, while a high temperature makes it flatter.\n", + "\n", + "\u003e **enable_greedy** : By default, this is true and greedy decoding is enabled.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lV1RRp6ihnGX" + }, + "source": [ + "## Initialize the Model Hyper-parameters" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "eTsGp2gaKLdE" + }, + "outputs": [], + "source": [ + "params = {\n", + " 'num_heads': 2,\n", + " 'num_layers': 2,\n", + " 'batch_size': 2,\n", + " 'n_dims': 256,\n", + " 'max_decode_length': 4}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CYXkoplAij01" + }, + "source": [ + "## Initialize the cache" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UGvmd0_dRFYI" + }, + "source": [ + "In auto-regressive architectures like Transformer-based [Encoder-Decoder](https://arxiv.org/abs/1706.03762) models, \n", + "the cache is used for fast sequential decoding.\n", + "It is a nested dictionary storing pre-computed hidden states (keys and values in the self-attention blocks and in the cross-attention blocks) for every layer."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "D6kfZOOKgkm1" + }, + "outputs": [], + "source": [ + "cache = {\n", + " 'layer_%d' % layer: {\n", + " 'k': tf.zeros(\n", + " shape=[params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims'] // params['num_heads']],\n", + " dtype=tf.float32),\n", + " 'v': tf.zeros(\n", + " shape=[params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims'] // params['num_heads']],\n", + " dtype=tf.float32)\n", + " } for layer in range(params['num_layers'])\n", + " }\n", + "print(\"cache key shape for layer 1 :\", cache['layer_1']['k'].shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "syl7I5nURPgW" + }, + "source": [ + "### Create model_fn\n", + " In practice, this is replaced by an actual model implementation, such as the one [here](https://github.com/tensorflow/models/blob/master/official/nlp/transformer/transformer.py#L236).\n", + "```\n", + "Args:\n", + "i : Step that is being decoded.\n", + "Returns:\n", + " next-token probabilities of shape [batch_size, vocab_size]\n", + "```\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "AhzSkRisRdB6" + }, + "outputs": [], + "source": [ + "# Toy next-token probabilities of shape [batch_size=2, max_decode_length=4, vocab_size=3].\n", + "probabilities = tf.constant([[[0.3, 0.4, 0.3], [0.3, 0.3, 0.4],\n", + " [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]],\n", + " [[0.2, 0.5, 0.3], [0.2, 0.7, 0.1],\n", + " [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]]])\n", + "def model_fn(i):\n", + " return probabilities[:, i, :]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "FAJ4CpbfVdjr" + }, + "outputs": [], + "source": [ + "def _symbols_to_logits_fn():\n", + " \"\"\"Calculates logits of the next tokens.\"\"\"\n", + " def symbols_to_logits_fn(ids, i, temp_cache):\n", + " del ids # The toy model ignores the 
previously decoded ids.\n", + " logits = tf.cast(tf.math.log(model_fn(i)), tf.float32)\n", + " return logits, temp_cache\n", + " return symbols_to_logits_fn" + ] + }, + { + 
"cell_type": "markdown", + "metadata": { + "id": "R_tV3jyWVL47" + }, + "source": [ + "## Greedy \n", + "Greedy decoding selects the token id with the highest probability as its next id: $id_t = \\arg\\max_{id} P(id \\mid id_{1:t-1})$ at each timestep $t$. The following code runs greedy decoding with `sampling_module.SamplingModule`. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "aGt9idSkVQEJ" + }, + "outputs": [], + "source": [ + "greedy_obj = sampling_module.SamplingModule(\n", + " length_normalization_fn=None,\n", + " dtype=tf.float32,\n", + " symbols_to_logits_fn=_symbols_to_logits_fn(),\n", + " vocab_size=3,\n", + " max_decode_length=params['max_decode_length'],\n", + " eos_id=10,\n", + " padded_decode=False)\n", + "ids, _ = greedy_obj.generate(\n", + " initial_ids=tf.constant([9, 1]), initial_cache=cache)\n", + "print(\"Greedy Decoded Ids:\", ids)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "s4pTTsQXVz5O" + }, + "source": [ + "## top_k sampling\n", + "In *Top-K* sampling, only the *K* most likely next token ids are kept, and the probability mass is redistributed among those *K* ids. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pCLWIn6GV5_G" + }, + "outputs": [], + "source": [ + "top_k_obj = sampling_module.SamplingModule(\n", + " length_normalization_fn=length_norm,\n", + " dtype=tf.float32,\n", + " symbols_to_logits_fn=_symbols_to_logits_fn(),\n", + " vocab_size=3,\n", + " max_decode_length=params['max_decode_length'],\n", + " eos_id=10,\n", + " sample_temperature=tf.constant(1.0),\n", + " top_k=tf.constant(3),\n", + " padded_decode=False,\n", + " enable_greedy=False)\n", + "ids, _ = top_k_obj.generate(\n", + " initial_ids=tf.constant([9, 1]), initial_cache=cache)\n", + "print(\"top-k sampled Ids:\", ids)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Jp3G-eE_WI4Y" + }, + "source": [ + "## top_p sampling\n", + "Instead of sampling only from the most likely *K* token ids, *Top-p* sampling chooses from the smallest possible set of ids whose cumulative probability exceeds the probability *p*." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "rEGdIWcuWILO" + }, + "outputs": [], + "source": [ + "top_p_obj = sampling_module.SamplingModule(\n", + " length_normalization_fn=length_norm,\n", + " dtype=tf.float32,\n", + " symbols_to_logits_fn=_symbols_to_logits_fn(),\n", + " vocab_size=3,\n", + " max_decode_length=params['max_decode_length'],\n", + " eos_id=10,\n", + " sample_temperature=tf.constant(1.0),\n", + " top_p=tf.constant(0.9),\n", + " padded_decode=False,\n", + " enable_greedy=False)\n", + "ids, _ = top_p_obj.generate(\n", + " initial_ids=tf.constant([9, 1]), initial_cache=cache)\n", + "print(\"top-p sampled Ids:\", ids)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2hcuyJ2VWjDz" + }, + "source": [ + "## Beam search decoding\n", + "Beam search reduces the risk of missing hidden high-probability token ids by keeping the num_beams most likely hypotheses at each time step and eventually choosing the hypothesis that has 
the overall highest probability. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cJ3WzvSrWmSA" + }, + "outputs": [], + "source": [ + "beam_size = 2\n", + "params['batch_size'] = 1\n", + "beam_cache = {\n", + " 'layer_%d' % layer: {\n", + " 'k': tf.zeros([params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims']], dtype=tf.float32),\n", + " 'v': tf.zeros([params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims']], dtype=tf.float32)\n", + " } for layer in range(params['num_layers'])\n", + " }\n", + "print(\"cache key shape for layer 1 :\", beam_cache['layer_1']['k'].shape)\n", + "ids, _ = beam_search.sequence_beam_search(\n", + " symbols_to_logits_fn=_symbols_to_logits_fn(),\n", + " initial_ids=tf.constant([9], tf.int32),\n", + " initial_cache=beam_cache,\n", + " vocab_size=3,\n", + " beam_size=beam_size,\n", + " alpha=0.6,\n", + " max_decode_length=params['max_decode_length'],\n", + " eos_id=10,\n", + " padded_decode=False,\n", + " dtype=tf.float32)\n", + "print(\"Beam search ids:\", ids)" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "collapsed_sections": [], + "name": "decoding_api_in_tf_nlp.ipynb", + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/docs/nlp/fine_tune_bert.ipynb b/docs/nlp/fine_tune_bert.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..4fd65478b65a7511a1eb2fd08591fa20fa2a89c4 --- /dev/null +++ b/docs/nlp/fine_tune_bert.ipynb @@ -0,0 +1,1582 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "vXLA5InzXydn" + }, + "source": [ + "##### Copyright 2019 The TensorFlow Authors." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "RuRlpLL-X0R_" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1mLJmVotXs64" + }, + "source": [ + "# Fine-tuning a BERT model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hYEwGTeCXnnX" + }, + "source": [ + "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/tfmodels/nlp/fine_tune_bert\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/docs/nlp/fine_tune_bert.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/docs/nlp/fine_tune_bert.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca 
href=\"https://storage.googleapis.com/tensorflow_docs/models/docs/nlp/fine_tune_bert.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca href=\"https://tfhub.dev/google/collections/bert\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/hub_logo_32px.png\" /\u003eSee TF Hub model\u003c/a\u003e\n", + " \u003c/td\u003e\n", + "\u003c/table\u003e" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YN2ACivEPxgD" + }, + "source": [ + "This tutorial demonstrates how to fine-tune a [Bidirectional Encoder Representations from Transformers (BERT)](https://arxiv.org/abs/1810.04805) (Devlin et al., 2018) model using [TensorFlow Model Garden](https://github.com/tensorflow/models).\n", + "\n", + "You can also find the pre-trained BERT model used in this tutorial on [TensorFlow Hub (TF Hub)](https://tensorflow.org/hub). For concrete examples of how to use the models from TF Hub, refer to the [Solve GLUE tasks using BERT](https://www.tensorflow.org/text/tutorials/bert_glue) tutorial. If you're just trying to fine-tune a model, the TF Hub tutorial is a good starting point.\n", + "\n", + "On the other hand, if you're interested in deeper customization, follow this tutorial. It shows how to do a lot of things manually, so you can learn how to customize the workflow from data preprocessing to training, exporting, and saving the model." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "s2d9S2CSSO1z" + }, + "source": [ + "## Setup" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "69de3375e32a" + }, + "source": [ + "### Install pip packages" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fsACVQpVSifi" + }, + "source": [ + "Start by installing the TensorFlow Text and Model Garden pip packages.\n", + "\n", + "* `tf-models-official` is the TensorFlow Model Garden package. 
Note that it may not include the latest changes in the `tensorflow_models` GitHub repo. To include the latest changes, you may install `tf-models-nightly`, which is the nightly Model Garden package created daily automatically.\n", + "* pip will install all models and dependencies automatically." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sE6XUxLOf1s-" + }, + "outputs": [], + "source": [ + "!pip install -q opencv-python" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yic2y7_o-BCC" + }, + "outputs": [], + "source": [ + "!pip install -q -U \"tensorflow-text==2.9.*\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NvNr2svBM-p3" + }, + "outputs": [], + "source": [ + "!pip install -q tf-models-official" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U-7qPCjWUAyy" + }, + "source": [ + "### Import libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lXsXev5MNr20" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "import tensorflow as tf\n", + "import tensorflow_models as tfm\n", + "import tensorflow_hub as hub\n", + "import tensorflow_datasets as tfds\n", + "tfds.disable_progress_bar()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mbanlzTvJBsz" + }, + "source": [ + "### Resources" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PpW0x8TpR8DT" + }, + "source": [ + "The following directory contains the BERT model's configuration, vocabulary, and a pre-trained checkpoint used in this tutorial:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "vzRHOLciR8eq" + }, + "outputs": [], + "source": [ + "gs_folder_bert = \"gs://cloud-tpu-checkpoints/bert/v3/uncased_L-12_H-768_A-12\"\n", + "tf.io.gfile.listdir(gs_folder_bert)" + ] + }, + { + 
"cell_type": "markdown", + "metadata": { + "id": "Qv6abtRvH4xO" + }, + "source": [ + "## Load and preprocess the dataset\n", + "\n", + "This example uses the GLUE (General Language Understanding Evaluation) MRPC (Microsoft Research Paraphrase Corpus) [dataset from TensorFlow Datasets (TFDS)](https://www.tensorflow.org/datasets/catalog/glue#gluemrpc).\n", + "\n", + "This dataset is not set up such that it can be directly fed into the BERT model. The following section handles the necessary preprocessing." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "28DvUhC1YUiB" + }, + "source": [ + "### Get the dataset from TensorFlow Datasets\n", + "\n", + "The GLUE MRPC (Dolan and Brockett, 2005) dataset is a corpus of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent. It has the following attributes:\n", + "\n", + "* Number of labels: 2\n", + "* Size of training dataset: 3668\n", + "* Size of evaluation dataset: 408\n", + "* Maximum sequence length of training and evaluation dataset: 128\n", + "\n", + "Begin by loading the MRPC dataset from TFDS:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Ijikx5OsH9AT" + }, + "outputs": [], + "source": [ + "batch_size=32\n", + "glue, info = tfds.load('glue/mrpc',\n", + " with_info=True,\n", + " batch_size=32)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "QcMTJU4N7VX-" + }, + "outputs": [], + "source": [ + "glue" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZgBg2r2nYT-K" + }, + "source": [ + "The `info` object describes the dataset and its features:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IQrHxv7W7jH5" + }, + "outputs": [], + "source": [ + "info.features" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vhsVWYNxazz5" + }, + "source": [ + "The two classes 
are:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "n0gfc_VTayfQ" + }, + "outputs": [], + "source": [ + "info.features['label'].names" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "38zJcap6xkbC" + }, + "source": [ + "Here is one example from the training set:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xON_i6SkwApW" + }, + "outputs": [], + "source": [ + "example_batch = next(iter(glue['train']))\n", + "\n", + "for key, value in example_batch.items():\n", + " print(f\"{key:9s}: {value[0].numpy()}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "R9vEWgKA4SxV" + }, + "source": [ + "### Preprocess the data\n", + "\n", + "The keys `\"sentence1\"` and `\"sentence2\"` in the GLUE MRPC dataset contain two input sentences for each example.\n", + "\n", + "Because the BERT model from the Model Garden doesn't take raw text as input, two things need to happen first:\n", + "\n", + "1. The text needs to be _tokenized_ (split into word pieces) and converted to _indices_.\n", + "2. Then, the _indices_ need to be packed into the format that the model expects." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9fbTyfJpNr7x" + }, + "source": [ + "#### The BERT tokenizer" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wqeN54S61ZKQ" + }, + "source": [ + "To fine-tune a pre-trained language model from the Model Garden, such as BERT, you need to make sure that you're using exactly the same tokenization, vocabulary, and index mapping as were used during pre-training.\n", + "\n", + "The following code rebuilds the tokenizer that was used by the base model using the Model Garden's `tfm.nlp.layers.FastWordpieceBertTokenizer` layer:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-DK4q5wEBmlB" + }, + "outputs": [], + "source": [ + "tokenizer = tfm.nlp.layers.FastWordpieceBertTokenizer(\n", + " vocab_file=os.path.join(gs_folder_bert, \"vocab.txt\"),\n", + " lower_case=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zYHDSquU2lDU" + }, + "source": [ + "Let's tokenize a test sentence:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "L_OfOYPg853R" + }, + "outputs": [], + "source": [ + "tokens = tokenizer(tf.constant([\"Hello TensorFlow!\"]))\n", + "tokens" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MfjaaMYy5Gt8" + }, + "source": [ + "Learn more about the tokenization process in the [Subword tokenization](https://www.tensorflow.org/text/guide/subwords_tokenizer) and [Tokenizing with TensorFlow Text](https://www.tensorflow.org/text/guide/tokenizers) guides." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wd1b09OO5GJl" + }, + "source": [ + "#### Pack the inputs" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "62UTWLQd9-LB" + }, + "source": [ + "TensorFlow Model Garden's BERT model doesn't just take the tokenized strings as input. It also expects these to be packed into a particular format. 
The `tfm.nlp.layers.BertPackInputs` layer handles the conversion from _a list of tokenized sentences_ to the input format expected by the Model Garden's BERT model.\n", + "\n", + "`tfm.nlp.layers.BertPackInputs` concatenates the two input sentences of each example in the MRPC dataset into a single sequence. This input is expected to start with a `[CLS]` \"This is a classification problem\" token, and each sentence should end with a `[SEP]` \"Separator\" token.\n", + "\n", + "Therefore, the `tfm.nlp.layers.BertPackInputs` layer's constructor needs to know the indices of the `tokenizer`'s special tokens, which `tokenizer.get_special_tokens_dict()` returns:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5iroDlrFDRcF" + }, + "outputs": [], + "source": [ + "special = tokenizer.get_special_tokens_dict()\n", + "special" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b71HarkuG92H" + }, + "outputs": [], + "source": [ + "max_seq_length = 128\n", + "\n", + "packer = tfm.nlp.layers.BertPackInputs(\n", + " seq_length=max_seq_length,\n", + " special_tokens_dict=special)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CZlSZbYd6liN" + }, + "source": [ + "The `packer` takes a list of tokenized sentences as input. 
For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "27dU_VkJHc9S" + }, + "outputs": [], + "source": [ + "sentences1 = [\"hello tensorflow\"]\n", + "tok1 = tokenizer(sentences1)\n", + "tok1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "LURHmNOSHnWN" + }, + "outputs": [], + "source": [ + "sentences2 = [\"goodbye tensorflow\"]\n", + "tok2 = tokenizer(sentences2)\n", + "tok2" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "r8bvB8gI8BqP" + }, + "source": [ + "Then, it returns a dictionary containing three outputs:\n", + "\n", + "- `input_word_ids`: The tokenized sentences packed together.\n", + "- `input_mask`: The mask indicating which locations are valid in the other outputs.\n", + "- `input_type_ids`: Indicating which sentence each token belongs to." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YsIDTOMJHrUQ" + }, + "outputs": [], + "source": [ + "packed = packer([tok1, tok2])\n", + "\n", + "for key, tensor in packed.items():\n", + " print(f\"{key:15s}: {tensor[:, :12]}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "red4tRcq74Qc" + }, + "source": [ + "#### Put it all together\n", + "\n", + "Combine these two parts into a `keras.layers.Layer` that can be attached to your model:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9Qtz-tv-6nz6" + }, + "outputs": [], + "source": [ + "class BertInputProcessor(tf.keras.layers.Layer):\n", + " def __init__(self, tokenizer, packer):\n", + " super().__init__()\n", + " self.tokenizer = tokenizer\n", + " self.packer = packer\n", + "\n", + " def call(self, inputs):\n", + " tok1 = self.tokenizer(inputs['sentence1'])\n", + " tok2 = self.tokenizer(inputs['sentence2'])\n", + "\n", + " packed = self.packer([tok1, tok2])\n", + "\n", + " if 'label' in inputs:\n", + " return packed, inputs['label']\n", + " else:\n", + " return 
packed" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rdy9wp499btU" + }, + "source": [ + "But for now just apply it to the dataset using `Dataset.map`, since the dataset you loaded from TFDS is a `tf.data.Dataset` object:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qmyh76AL7VAs" + }, + "outputs": [], + "source": [ + "bert_inputs_processor = BertInputProcessor(tokenizer, packer)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "B8SSCtDe9MCk" + }, + "outputs": [], + "source": [ + "glue_train = glue['train'].map(bert_inputs_processor).prefetch(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KXpiDosO9rkY" + }, + "source": [ + "Here is an example batch from the processed dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ffNvDE6t9rP-" + }, + "outputs": [], + "source": [ + "example_inputs, example_labels = next(iter(glue_train))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5sxtTuUi-bXt" + }, + "outputs": [], + "source": [ + "example_inputs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "wP4z_-9a-dFk" + }, + "outputs": [], + "source": [ + "example_labels" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jyjTdGpFhO_1" + }, + "outputs": [], + "source": [ + "for key, value in example_inputs.items():\n", + " print(f'{key:15s} shape: {value.shape}')\n", + "\n", + "print(f'{\"labels\":15s} shape: {example_labels.shape}')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mkGHN_FK-50U" + }, + "source": [ + "The `input_word_ids` contain the token IDs:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "eGL1_ktWLcgF" + }, + "outputs": [], + "source": [ + "plt.pcolormesh(example_inputs['input_word_ids'])" + ] + }, + { + "cell_type": 
"markdown", + "metadata": { + "id": "ulNZ4U96-8JZ" + }, + "source": [ + "The mask allows the model to cleanly differentiate between the content and the padding. The mask has the same shape as the `input_word_ids`, and contains a `1` anywhere the `input_word_ids` is not padding." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "zB7mW7DGK3rW" + }, + "outputs": [], + "source": [ + "plt.pcolormesh(example_inputs['input_mask'])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rxLenwAvCkBf" + }, + "source": [ + "The \"input type\" also has the same shape, but inside the non-padded region, contains a `0` or a `1` indicating which sentence the token is a part of." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2CetH_5C9P2m" + }, + "outputs": [], + "source": [ + "plt.pcolormesh(example_inputs['input_type_ids'])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pxHHeyei_sb9" + }, + "source": [ + "Apply the same preprocessing to the validation and test subsets of the GLUE MRPC dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yuLKxf6zHxw-" + }, + "outputs": [], + "source": [ + "glue_validation = glue['validation'].map(bert_inputs_processor).prefetch(1)\n", + "glue_test = glue['test'].map(bert_inputs_processor).prefetch(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FSwymsbkbLDA" + }, + "source": [ + "## Build, train and export the model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bxxO3pJCEM9p" + }, + "source": [ + "Now that you have formatted the data as expected, you can start working on building and training the model." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Efrj3Cn1kLAp" + }, + "source": [ + "### Build the model\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xxpOY5r2Ayq6" + }, + "source": [ + "The first step is to read the configuration file for the pre-trained BERT model into `config_dict`:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "v7ap0BONSJuz" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "bert_config_file = os.path.join(gs_folder_bert, \"bert_config.json\")\n", + "config_dict = json.loads(tf.io.gfile.GFile(bert_config_file).read())\n", + "config_dict" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pKaEaKJSX85J" + }, + "outputs": [], + "source": [ + "encoder_config = tfm.nlp.encoders.EncoderConfig({\n", + " 'type':'bert',\n", + " 'bert': config_dict\n", + "})" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "LbgzWukNSqOS" + }, + "outputs": [], + "source": [ + "bert_encoder = tfm.nlp.encoders.build_encoder(encoder_config)\n", + "bert_encoder" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "96ldxDSwkVkj" + }, + "source": [ + "The configuration file defines the core BERT encoder from the Model Garden. Wrapping the encoder in `tfm.nlp.models.BertClassifier` gives a Keras model that predicts logits for `num_classes` classes from the packed inputs (sequences of up to `max_seq_length` tokens)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cH682__U0FBv" + }, + "outputs": [], + "source": [ + "bert_classifier = tfm.nlp.models.BertClassifier(network=bert_encoder, num_classes=2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XqKp3-5GIZlw" + }, + "source": [ + "The classifier has three inputs and one output:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bAQblMIjwkvx" + }, + "outputs": [], + "source": [ + "tf.keras.utils.plot_model(bert_classifier, show_shapes=True, dpi=48)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sFmVG4SKZAw8" + }, + "source": [ + "Run it on a test batch of data from the training set (only the first 10 examples are shown). The output is the logits for the two classes:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "VTjgPbp4ZDKo" + }, + "outputs": [], + "source": [ + "bert_classifier(\n", + " example_inputs, training=True).numpy()[:10]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Q0NTdwZsQK8n" + }, + "source": [ + "The `TransformerEncoder` in the center of the classifier above **is** the `bert_encoder`.\n", + "\n", + "If you inspect the encoder, notice the stack of `Transformer` layers connected to those same three inputs:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8L__-erBwLIQ" + }, + "outputs": [], + "source": [ + "tf.keras.utils.plot_model(bert_encoder, show_shapes=True, dpi=48)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mKAvkQc3heSy" + }, + "source": [ + "### Restore the encoder weights\n", + "\n", + "When built, the encoder is randomly initialized. 
Restore the encoder's weights from the checkpoint:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "97Ll2Gichd_Y" + }, + "outputs": [], + "source": [ + "checkpoint = tf.train.Checkpoint(encoder=bert_encoder)\n", + "checkpoint.read(\n", + " os.path.join(gs_folder_bert, 'bert_model.ckpt')).assert_consumed()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2oHOql35k3Dd" + }, + "source": [ + "Note: The pretrained `TransformerEncoder` is also available on [TensorFlow Hub](https://tensorflow.org/hub). Go to the [TF Hub appendix](#hub_bert) for details." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "115caFLMk-_l" + }, + "source": [ + "### Set up the optimizer\n", + "\n", + "BERT typically uses the Adam optimizer with weight decay—[AdamW](https://arxiv.org/abs/1711.05101) (`tf.keras.optimizers.experimental.AdamW`).\n", + "It also employs a learning rate schedule that first warms up from 0 and then decays to 0:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c0jBycPDtkxR" + }, + "outputs": [], + "source": [ + "# Set up epochs and steps\n", + "epochs = 5\n", + "batch_size = 32\n", + "eval_batch_size = 32\n", + "\n", + "train_data_size = info.splits['train'].num_examples\n", + "steps_per_epoch = int(train_data_size / batch_size)\n", + "num_train_steps = steps_per_epoch * epochs\n", + "warmup_steps = int(0.1 * num_train_steps)\n", + "initial_learning_rate=2e-5" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GFankgHK0Rvh" + }, + "source": [ + "Linear decay from `initial_learning_rate` to zero over `num_train_steps`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qWSyT8P2j4mV" + }, + "outputs": [], + "source": [ + "linear_decay = tf.keras.optimizers.schedules.PolynomialDecay(\n", + " initial_learning_rate=initial_learning_rate,\n", + " end_learning_rate=0,\n", + " decay_steps=num_train_steps)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "anZPZPAP0Y3n" + }, + "source": [ + "Warm up to that value over `warmup_steps`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "z_AsVCiRkoN1" + }, + "outputs": [], + "source": [ + "warmup_schedule = tfm.optimization.lr_schedule.LinearWarmup(\n", + " warmup_learning_rate = 0,\n", + " after_warmup_lr_sched = linear_decay,\n", + " warmup_steps = warmup_steps\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "arfbaK6t0kH_" + }, + "source": [ + "The overall schedule looks like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "rYZGunhqbGUZ" + }, + "outputs": [], + "source": [ + "x = tf.linspace(0, num_train_steps, 1001)\n", + "y = [warmup_schedule(xi) for xi in x]\n", + "plt.plot(x,y)\n", + "plt.xlabel('Train step')\n", + "plt.ylabel('Learning rate')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bjsmG_fm0opn" + }, + "source": [ + "Use `tf.keras.optimizers.experimental.AdamW` to instantiate the optimizer with that schedule:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "R8pTNuKIw1dA" + }, + "outputs": [], + "source": [ + "optimizer = tf.keras.optimizers.experimental.AdamW(\n", + " learning_rate = warmup_schedule)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "78FEUOOEkoP0" + }, + "source": [ + "### Train the model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OTNcA0O0nSq9" + }, + "source": [ + "Set the metric as accuracy and the loss as sparse categorical cross-entropy.
Then, compile and train the BERT classifier:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "d5FeL0b6j7ky" + }, + "outputs": [], + "source": [ + "metrics = [tf.keras.metrics.SparseCategoricalAccuracy('accuracy', dtype=tf.float32)]\n", + "loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n", + "\n", + "bert_classifier.compile(\n", + " optimizer=optimizer,\n", + " loss=loss,\n", + " metrics=metrics)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "CsrylctIj_Xy" + }, + "outputs": [], + "source": [ + "bert_classifier.evaluate(glue_validation)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hgPPc2oNmcVZ" + }, + "outputs": [], + "source": [ + "bert_classifier.fit(\n", + " glue_train,\n", + " validation_data=(glue_validation),\n", + " batch_size=32,\n", + " epochs=epochs)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IFtKFWbNKb0u" + }, + "source": [ + "Now run the fine-tuned model on some custom examples to see that it works.\n", + "\n", + "Start by defining some sentence pairs:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "S1sdW6lLWaEi" + }, + "outputs": [], + "source": [ + "my_examples = {\n", + " 'sentence1':[\n", + " 'The rain in Spain falls mainly on the plain.',\n", + " 'Look I fine tuned BERT.'],\n", + " 'sentence2':[\n", + " 'It mostly rains on the flat lands of Spain.',\n", + " 'Is it working?
This does not match.']\n", + " }" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7ynJibkBRTJF" + }, + "source": [ + "The model should report class `1` \"match\" for the first example and class `0` \"no-match\" for the second:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "umo0ttrgRYIM" + }, + "outputs": [], + "source": [ + "ex_packed = bert_inputs_processor(my_examples)\n", + "my_logits = bert_classifier(ex_packed, training=False)\n", + "\n", + "result_cls_ids = tf.argmax(my_logits, axis=-1)\n", + "result_cls_ids" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "HNdmOEHKT7e8" + }, + "outputs": [], + "source": [ + "tf.gather(tf.constant(info.features['label'].names), result_cls_ids)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fVo_AnT0l26j" + }, + "source": [ + "### Export the model\n", + "\n", + "Often the goal of training a model is to _use_ it for something outside of the Python process that created it. You can do this by exporting the model using `tf.saved_model`. (Learn more in the [Using the SavedModel format](https://www.tensorflow.org/guide/saved_model) guide and the [Save and load a model using a distribution strategy](https://www.tensorflow.org/tutorials/distribute/save_and_load) tutorial.)\n", + "\n", + "First, build a wrapper class to export the model. This wrapper does two things:\n", + "\n", + "- It packages `bert_inputs_processor` and `bert_classifier` together into a single `tf.Module`, so you can export all of their functionality.\n", + "- It defines a `tf.function` that implements the end-to-end execution of the model.\n", + "\n", + "Setting the `input_signature` argument of `tf.function` lets you define a fixed signature for the `tf.function`. This can be less surprising than the default automatic retracing behavior."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "78h83mlt9wpY" + }, + "outputs": [], + "source": [ + "class ExportModel(tf.Module):\n", + " def __init__(self, input_processor, classifier):\n", + " self.input_processor = input_processor\n", + " self.classifier = classifier\n", + "\n", + " @tf.function(input_signature=[{\n", + " 'sentence1': tf.TensorSpec(shape=[None], dtype=tf.string),\n", + " 'sentence2': tf.TensorSpec(shape=[None], dtype=tf.string)}])\n", + " def __call__(self, inputs):\n", + " packed = self.input_processor(inputs)\n", + " logits = self.classifier(packed, training=False)\n", + " result_cls_ids = tf.argmax(logits, axis=-1)\n", + " return {\n", + " 'logits': logits,\n", + " 'class_id': result_cls_ids,\n", + " 'class': tf.gather(\n", + " tf.constant(info.features['label'].names),\n", + " result_cls_ids)\n", + " }" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qnxysGUfIgFQ" + }, + "source": [ + "Create an instance of this export model and save it:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "TmHW9DEFUZ0X" + }, + "outputs": [], + "source": [ + "export_model = ExportModel(bert_inputs_processor, bert_classifier)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Nl5x6nElZqkP" + }, + "outputs": [], + "source": [ + "import tempfile\n", + "export_dir=tempfile.mkdtemp(suffix='_saved_model')\n", + "tf.saved_model.save(export_model, export_dir=export_dir,\n", + " signatures={'serving_default': export_model.__call__})" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Pd8B5dy-ImDJ" + }, + "source": [ + "Reload the model and compare the results to the original:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9cAhHySVXHD5" + }, + "outputs": [], + "source": [ + "original_logits = export_model(my_examples)['logits']" + ] + }, + { + "cell_type": "code", + "execution_count": null, +
"metadata": { + "id": "H9cAcYwfW2fy" + }, + "outputs": [], + "source": [ + "reloaded = tf.saved_model.load(export_dir)\n", + "reloaded_logits = reloaded(my_examples)['logits']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "y_ACvKPsVUXC" + }, + "outputs": [], + "source": [ + "# The results are identical:\n", + "print(original_logits.numpy())\n", + "print()\n", + "print(reloaded_logits.numpy())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lBlPP20dXPFR" + }, + "outputs": [], + "source": [ + "print(np.mean(abs(original_logits - reloaded_logits)))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CPsg7dZwfBM2" + }, + "source": [ + "Congratulations! You've used `tensorflow_models` to build a BERT classifier, train it, and export it for later use." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eQceYqRFT_Eg" + }, + "source": [ + "## Optional: BERT on TF Hub" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QbklKt-w_CiI" + }, + "source": [ + "\u003ca id=\"hub_bert\"\u003e\u003c/a\u003e\n", + "\n", + "\n", + "You can get the BERT model off the shelf from [TF Hub](https://tfhub.dev/).
There are [many versions available along with their input preprocessors](https://tfhub.dev/google/collections/bert/1).\n", + "\n", + "This example uses [a small version of BERT from TF Hub](https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/2) that was pre-trained using the English Wikipedia and BooksCorpus datasets, similar to the [original implementation](https://arxiv.org/abs/1908.08962) (Turc et al., 2019).\n", + "\n", + "Start by importing TF Hub:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "GDWrHm0BGpbX" + }, + "outputs": [], + "source": [ + "import tensorflow_hub as hub" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f02f38f83ac4" + }, + "source": [ + "Select the input preprocessor and the model from TF Hub and wrap them as `hub.KerasLayer` layers:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lo6479At4sP1" + }, + "outputs": [], + "source": [ + "# Always make sure you use the right preprocessor.\n", + "hub_preprocessor = hub.KerasLayer(\n", + " \"https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3\")\n", + "\n", + "# This is a really small BERT.\n", + "hub_encoder = hub.KerasLayer(f\"https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-2_H-128_A-2/2\",\n", + " trainable=True)\n", + "\n", + "print(f\"The Hub encoder has {len(hub_encoder.trainable_variables)} trainable variables\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iTzF574wivQv" + }, + "source": [ + "Test run the preprocessor on a batch of data:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "GOASSKR5R3-N" + }, + "outputs": [], + "source": [ + "hub_inputs = hub_preprocessor(['Hello TensorFlow!'])\n", + "{key: value[0, :10].numpy() for key, value in hub_inputs.items()} " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "XEcYrCR45Uwo" + }, + "outputs": [], + "source": [ 
+ "result = hub_encoder(\n", + " inputs=hub_inputs,\n", + " training=False,\n", + ")\n", + "\n", + "print(\"Pooled output shape:\", result['pooled_output'].shape)\n", + "print(\"Sequence output shape:\", result['sequence_output'].shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cjojn8SmLSRI" + }, + "source": [ + "At this point, it would be simple to add a classification head yourself.\n", + "\n", + "The Model Garden `tfm.nlp.models.BertClassifier` class can also build a classifier onto the TF Hub encoder:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9nTDaApyLR70" + }, + "outputs": [], + "source": [ + "hub_classifier = tfm.nlp.models.BertClassifier(\n", + " bert_encoder,\n", + " num_classes=2,\n", + " dropout_rate=0.1,\n", + " initializer=tf.keras.initializers.TruncatedNormal(\n", + " stddev=0.02))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xMJX3wV0_v7I" + }, + "source": [ + "The one downside to loading this model from TF Hub is that the structure of internal Keras layers is not restored. This makes it more difficult to inspect or modify the model.\n", + "\n", + "The BERT encoder inside `hub_classifier` is now a single layer:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pD71dnvhM2QS" + }, + "outputs": [], + "source": [ + "tf.keras.utils.plot_model(hub_classifier, show_shapes=True, dpi=64)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "u_IqwXjRV1vd" + }, + "source": [ + "For concrete examples of this approach, refer to [Solve GLUE tasks using BERT](https://www.tensorflow.org/text/tutorials/bert_glue)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ji3tdLz101km" + }, + "source": [ + "## Optional: Optimizer `config`s\n", + "\n", + "The `tensorflow_models` package defines serializable `config` classes that describe how to build the live objects.
Earlier in this tutorial, you built the optimizer manually.\n", + "\n", + "The configuration below describes an (almost) identical optimizer built by the `optimizer_factory.OptimizerFactory`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Fdb9C1ontnH_" + }, + "outputs": [], + "source": [ + "optimization_config = tfm.optimization.OptimizationConfig(\n", + " optimizer=tfm.optimization.OptimizerConfig(\n", + " type = \"adam\"),\n", + " learning_rate = tfm.optimization.LrConfig(\n", + " type='polynomial',\n", + " polynomial=tfm.optimization.PolynomialLrConfig(\n", + " initial_learning_rate=2e-5,\n", + " end_learning_rate=0.0,\n", + " decay_steps=num_train_steps)),\n", + " warmup = tfm.optimization.WarmupConfig(\n", + " type='linear',\n", + " linear=tfm.optimization.LinearWarmupConfig(warmup_steps=warmup_steps)\n", + " ))\n", + "\n", + "\n", + "fac = tfm.optimization.optimizer_factory.OptimizerFactory(optimization_config)\n", + "lr = fac.build_learning_rate()\n", + "optimizer = fac.build_optimizer(lr=lr)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Rp7R1hBfv5HG" + }, + "outputs": [], + "source": [ + "x = tf.linspace(0, num_train_steps, 1001).numpy()\n", + "y = [lr(xi) for xi in x]\n", + "plt.plot(x,y)\n", + "plt.xlabel('Train step')\n", + "plt.ylabel('Learning rate')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ywn5miD_dnuh" + }, + "source": [ + "The advantage of using `config` objects is that they contain no complicated TensorFlow objects, so they can be easily serialized to JSON and rebuilt.
Here's the JSON for the above `tfm.optimization.OptimizationConfig`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "zo5RV5lud81Y" + }, + "outputs": [], + "source": [ + "optimization_config = optimization_config.as_dict()\n", + "optimization_config" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Z6qPXPEhekkd" + }, + "source": [ + "The `tfm.optimization.optimizer_factory.OptimizerFactory` can just as easily build the optimizer from the JSON dictionary:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "p-bYrvfMYsxp" + }, + "outputs": [], + "source": [ + "fac = tfm.optimization.optimizer_factory.OptimizerFactory(\n", + " tfm.optimization.OptimizationConfig(optimization_config))\n", + "lr = fac.build_learning_rate()\n", + "optimizer = fac.build_optimizer(lr=lr)" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "collapsed_sections": [], + "name": "fine_tune_bert.ipynb", + "private_outputs": true, + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/docs/nlp/index.ipynb b/docs/nlp/index.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..0d4709d7c4f000ae31382662766ade08ef156407 --- /dev/null +++ b/docs/nlp/index.ipynb @@ -0,0 +1,557 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "80xnUmoI7fBX" + }, + "source": [ + "##### Copyright 2020 The TensorFlow Authors." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "8nvTnfs6Q692" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WmfcMK5P5C1G" + }, + "source": [ + "# Introduction to the TensorFlow Models NLP library" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cH-oJ8R6AHMK" + }, + "source": [ + "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/tfmodels/nlp\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/docs/nlp/index.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/docs/nlp/index.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca 
href=\"https://storage.googleapis.com/tensorflow_docs/models/docs/nlp/index.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n", + " \u003c/td\u003e\n", + "\u003c/table\u003e" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0H_EFIhq4-MJ" + }, + "source": [ + "## Learning objectives\n", + "\n", + "In this Colab notebook, you will learn how to build transformer-based models for common NLP tasks, including pretraining, span labeling, and classification, using the building blocks from the [NLP modeling library](https://github.com/tensorflow/models/tree/master/official/nlp/modeling)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2N97-dps_nUk" + }, + "source": [ + "## Install and import" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "459ygAVl_rg0" + }, + "source": [ + "### Install the TensorFlow Model Garden pip package\n", + "\n", + "* `tf-models-official` is the stable Model Garden package. Note that it may not include the latest changes in the `tensorflow_models` GitHub repo. To include the latest changes, you may install `tf-models-nightly`,\n", + "which is the nightly Model Garden package created daily automatically.\n", + "* `pip` will install all models and dependencies automatically."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IAOmYthAzI7J" + }, + "outputs": [], + "source": [ + "!pip install -q opencv-python" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Y-qGkdh6_sZc" + }, + "outputs": [], + "source": [ + "!pip install tf-models-official" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e4huSSwyAG_5" + }, + "source": [ + "### Import TensorFlow and other libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jqYXqtjBAJd9" + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import tensorflow as tf\n", + "\n", + "from tensorflow_models import nlp" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "djBQWjvy-60Y" + }, + "source": [ + "## BERT pretraining model\n", + "\n", + "BERT ([Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)) introduced the method of pre-training language representations on a large text corpus and then using that model for downstream NLP tasks.\n", + "\n", + "In this section, we will learn how to build a model to pretrain BERT on the masked language modeling and next sentence prediction tasks. For simplicity, we show only a minimal example and use dummy data." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MKuHVlsCHmiq" + }, + "source": [ + "### Build a `BertPretrainer` model wrapping `BertEncoder`\n", + "\n", + "The `nlp.networks.BertEncoder` class implements the Transformer-based encoder as described in the [BERT paper](https://arxiv.org/abs/1810.04805).
It includes the embedding lookups and transformer layers (`nlp.layers.TransformerEncoderBlock`), but not the masked language model or classification task networks.\n", + "\n", + "The `nlp.models.BertPretrainer` class lets you pass in a transformer stack and instantiates the masked language model and classification networks that are used to create the training objectives." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EXkcXz-9BwB3" + }, + "outputs": [], + "source": [ + "# Build a small transformer network.\n", + "vocab_size = 100\n", + "network = nlp.networks.BertEncoder(\n", + " vocab_size=vocab_size, \n", + " # The number of TransformerEncoderBlock layers\n", + " num_layers=3)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0NH5irV5KTMS" + }, + "source": [ + "Inspecting the encoder, we see it contains a few embedding layers and stacked `nlp.layers.TransformerEncoderBlock` layers, all connected to three input layers:\n", + "\n", + "`input_word_ids`, `input_type_ids` and `input_mask`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lZNoZkBrIoff" + }, + "outputs": [], + "source": [ + "tf.keras.utils.plot_model(network, show_shapes=True, expand_nested=True, dpi=48)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "o7eFOZXiIl-b" + }, + "outputs": [], + "source": [ + "# Create a BERT pretrainer with the created network.\n", + "num_token_predictions = 8\n", + "bert_pretrainer = nlp.models.BertPretrainer(\n", + " network, num_classes=2, num_token_predictions=num_token_predictions, output='predictions')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d5h5HT7gNHx_" + }, + "source": [ + "Inspecting the `bert_pretrainer`, we see it wraps the `encoder` with additional `MaskedLM` and `nlp.layers.ClassificationHead` heads."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2tcNfm03IBF7" + }, + "outputs": [], + "source": [ + "tf.keras.utils.plot_model(bert_pretrainer, show_shapes=True, expand_nested=True, dpi=48)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "F2oHrXGUIS0M" + }, + "outputs": [], + "source": [ + "# We can feed some dummy data to get masked language model and sentence output.\n", + "sequence_length = 16\n", + "batch_size = 2\n", + "\n", + "word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))\n", + "mask_data = np.random.randint(2, size=(batch_size, sequence_length))\n", + "type_id_data = np.random.randint(2, size=(batch_size, sequence_length))\n", + "masked_lm_positions_data = np.random.randint(2, size=(batch_size, num_token_predictions))\n", + "\n", + "outputs = bert_pretrainer(\n", + " [word_id_data, mask_data, type_id_data, masked_lm_positions_data])\n", + "lm_output = outputs[\"masked_lm\"]\n", + "sentence_output = outputs[\"classification\"]\n", + "print(f'lm_output: shape={lm_output.shape}, dtype={lm_output.dtype!r}')\n", + "print(f'sentence_output: shape={sentence_output.shape}, dtype={sentence_output.dtype!r}')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bnx3UCHniCS5" + }, + "source": [ + "### Compute loss\n", + "Next, we can use `lm_output` and `sentence_output` to compute `loss`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "k30H4Q86f52x" + }, + "outputs": [], + "source": [ + "masked_lm_ids_data = np.random.randint(vocab_size, size=(batch_size, num_token_predictions))\n", + "masked_lm_weights_data = np.random.randint(2, size=(batch_size, num_token_predictions))\n", + "next_sentence_labels_data = np.random.randint(2, size=(batch_size))\n", + "\n", + "mlm_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(\n", + " labels=masked_lm_ids_data,\n", + " predictions=lm_output,\n", + " weights=masked_lm_weights_data)\n", + "sentence_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(\n", + " labels=next_sentence_labels_data,\n", + " predictions=sentence_output)\n", + "loss = mlm_loss + sentence_loss\n", + "\n", + "print(loss)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wrmSs8GjHxVw" + }, + "source": [ + "With the loss, you can optimize the model.\n", + "After training, we can save the weights of the encoder for downstream fine-tuning tasks. Please see [run_pretraining.py](https://github.com/tensorflow/models/blob/master/official/legacy/bert/run_pretraining.py) for the full example.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "k8cQVFvBCV4s" + }, + "source": [ + "## Span labeling model\n", + "\n", + "Span labeling is the task of assigning labels to a span of text, for example, labeling a span of text as the answer to a given question.\n", + "\n", + "In this section, we will learn how to build a span labeling model. Again, we use dummy data for simplicity."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xrLLEWpfknUW" + }, + "source": [ + "### Build a BertSpanLabeler wrapping BertEncoder\n", + "\n", + "The `nlp.models.BertSpanLabeler` class implements a simple single-span start-end predictor (that is, a model that predicts two values: a start token index and an end token index), suitable for SQuAD-style tasks.\n", + "\n", + "Note that `nlp.models.BertSpanLabeler` wraps an `nlp.networks.BertEncoder`, the weights of which can be restored from the above pretraining model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "B941M4iUCejO" + }, + "outputs": [], + "source": [ + "network = nlp.networks.BertEncoder(\n", + " vocab_size=vocab_size, num_layers=2)\n", + "\n", + "# Create a BERT span labeler with the created network.\n", + "bert_span_labeler = nlp.models.BertSpanLabeler(network)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QpB9pgj4PpMg" + }, + "source": [ + "Inspecting the `bert_span_labeler`, we see it wraps the encoder with an additional `SpanLabeling` head that outputs `start_position` and `end_position`."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "RbqRNJCLJu4H" + }, + "outputs": [], + "source": [ + "tf.keras.utils.plot_model(bert_span_labeler, show_shapes=True, expand_nested=True, dpi=48)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fUf1vRxZJwio" + }, + "outputs": [], + "source": [ + "# Create a set of 2-dimensional data tensors to feed into the model.\n", + "word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))\n", + "mask_data = np.random.randint(2, size=(batch_size, sequence_length))\n", + "type_id_data = np.random.randint(2, size=(batch_size, sequence_length))\n", + "\n", + "# Feed the data to the model.\n", + "start_logits, end_logits = bert_span_labeler([word_id_data, mask_data, type_id_data])\n", + "\n", + "print(f'start_logits: shape={start_logits.shape}, dtype={start_logits.dtype!r}')\n", + "print(f'end_logits: shape={end_logits.shape}, dtype={end_logits.dtype!r}')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WqhgQaN1lt-G" + }, + "source": [ + "### Compute loss\n", + "With `start_logits` and `end_logits`, we can compute loss:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "waqs6azNl3Nn" + }, + "outputs": [], + "source": [ + "start_positions = np.random.randint(sequence_length, size=(batch_size))\n", + "end_positions = np.random.randint(sequence_length, size=(batch_size))\n", + "\n", + "start_loss = tf.keras.losses.sparse_categorical_crossentropy(\n", + " start_positions, start_logits, from_logits=True)\n", + "end_loss = tf.keras.losses.sparse_categorical_crossentropy(\n", + " end_positions, end_logits, from_logits=True)\n", + "\n", + "total_loss = (tf.reduce_mean(start_loss) + tf.reduce_mean(end_loss)) / 2\n", + "print(total_loss)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Zdf03YtZmd_d" + }, + "source": [ + "With the `loss`, you can optimize the model. 
Please see [run_squad.py](https://github.com/tensorflow/models/blob/master/official/legacy/bert/run_squad.py) for the full example." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0A1XnGSTChg9" + }, + "source": [ + "## Classification model\n", + "\n", + "In this last section, we show how to build a text classification model.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MSK8OpZgnQa9" + }, + "source": [ + "### Build a BertClassifier model wrapping BertEncoder\n", + "\n", + "`nlp.models.BertClassifier` implements a [CLS] token classification model containing a single classification head." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cXXCsffkCphk" + }, + "outputs": [], + "source": [ + "network = nlp.networks.BertEncoder(\n", + " vocab_size=vocab_size, num_layers=2)\n", + "\n", + "# Create a BERT classifier with the created network.\n", + "num_classes = 2\n", + "bert_classifier = nlp.models.BertClassifier(\n", + " network, num_classes=num_classes)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8tZKueKYP4bB" + }, + "source": [ + "Inspecting the `bert_classifier`, we see it wraps the `encoder` with an additional `Classification` head."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "snlutm9ZJgEZ" + }, + "outputs": [], + "source": [ + "tf.keras.utils.plot_model(bert_classifier, show_shapes=True, expand_nested=True, dpi=48)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yyHPHsqBJkCz" + }, + "outputs": [], + "source": [ + "# Create a set of 2-dimensional data tensors to feed into the model.\n", + "word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))\n", + "mask_data = np.random.randint(2, size=(batch_size, sequence_length))\n", + "type_id_data = np.random.randint(2, size=(batch_size, sequence_length))\n", + "\n", + "# Feed the data to the model.\n", + "logits = bert_classifier([word_id_data, mask_data, type_id_data])\n", + "print(f'logits: shape={logits.shape}, dtype={logits.dtype!r}')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "w--a2mg4nzKm" + }, + "source": [ + "### Compute loss\n", + "\n", + "With `logits`, we can compute `loss`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9X0S1DoFn_5Q" + }, + "outputs": [], + "source": [ + "labels = np.random.randint(num_classes, size=(batch_size))\n", + "\n", + "loss = tf.keras.losses.sparse_categorical_crossentropy(\n", + " labels, logits, from_logits=True)\n", + "print(loss)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mzBqOylZo3og" + }, + "source": [ + "With the `loss`, you can optimize the model. Please see [run_classifier.py](https://github.com/tensorflow/models/blob/master/official/legacy/bert/run_classifier.py) or the [Fine-tune BERT](https://www.tensorflow.org/text/tutorials/fine_tune_bert) notebook for the full example."
+ ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "nlp_modeling_library_intro.ipynb", + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/docs/orbit/index.ipynb b/docs/orbit/index.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..c7e7765b8b0955835e1736e173b60bb90aecf848 --- /dev/null +++ b/docs/orbit/index.ipynb @@ -0,0 +1,898 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "Tce3stUlHN0L" + }, + "source": [ + "##### Copyright 2020 The TensorFlow Authors." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "tuOe1ymfHZPu" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qFdPvlXBOdUN" + }, + "source": [ + "# Training with Orbit" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MfBg1C5NB3X0" + }, + "source": [ + "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/tfmodels/orbit\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/docs/orbit/index.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/docs/orbit/index.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView on GitHub\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/models/docs/orbit/index.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n", + " \u003c/td\u003e\n", + "\n", + "\u003c/table\u003e" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "456h0idS2Xcq" + }, + "source": [ + "This example will work through fine-tuning a BERT model using the [Orbit](https://www.tensorflow.org/api_docs/python/orbit) training library.\n", + "\n", + "Orbit is a flexible, lightweight library designed to make it easy to write [custom training loops](https://www.tensorflow.org/tutorials/distribute/custom_training) in TensorFlow. 
Orbit handles common model training tasks such as saving checkpoints, running model evaluations, and setting up summary writing, while giving users full control over implementing the inner training loop. It integrates with `tf.distribute` and supports running on different device types (CPU, GPU, and TPU).\n", + "\n", + "Most examples on [tensorflow.org](https://www.tensorflow.org/) use custom training loops or [model.fit()](https://www.tensorflow.org/api_docs/python/tf/keras/Model) from Keras. Orbit is a good alternative to `model.fit` if your model is complex and your training loop requires more flexibility, control, or customization. Orbit can also simplify the code when there are many different model architectures that all use the same custom training loop.\n", + "\n", + "This tutorial focuses on setting up and using Orbit, rather than on the details of BERT, model construction, and data processing. For more in-depth coverage of these topics, refer to the following tutorials:\n", + "\n", + "* [Fine-tune BERT](https://www.tensorflow.org/text/tutorials/fine_tune_bert) - which goes into detail on these sub-topics.\n", + "* [Fine-tune BERT for GLUE on TPU](https://www.tensorflow.org/text/tutorials/bert_glue) - which generalizes the code to run any BERT configuration on any [GLUE](https://www.tensorflow.org/datasets/catalog/glue) sub-task, and runs on TPU."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TJ4m3khW3p_W" + }, + "source": [ + "## Install the TensorFlow Models package\n", + "\n", + "Install and import the necessary packages, then configure all the objects needed to train a model.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "FZlj0U8Aq9Gt" + }, + "outputs": [], + "source": [ + "!pip install -q opencv-python\n", + "!pip install \"tensorflow>=2.9.0\" tf-models-official" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MEJkRrmapr16" + }, + "source": [ + "The `tf-models-official` package contains both the `orbit` and `tensorflow_models` modules." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "dUVPW84Zucuq" + }, + "outputs": [], + "source": [ + "import tensorflow_models as tfm\n", + "import orbit" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "18Icocf3lwYD" + }, + "source": [ + "## Setup for training\n", + "\n", + "This tutorial does not focus on configuring the environment, building the model and optimizer, and loading data. All of these topics are covered in more detail in the [Fine-tune BERT](https://www.tensorflow.org/text/tutorials/fine_tune_bert) and [Fine-tune BERT with GLUE](https://www.tensorflow.org/text/tutorials/bert_glue) tutorials.\n", + "\n", + "To view how the training is set up for this tutorial, expand the rest of this section.\n", + "\n", + " \u003c!-- \u003cdiv class=\"tfo-display-only-on-site\"\u003e\u003cdevsite-expandable\u003e\n", + " \u003cbutton type=\"button\" class=\"button-red button expand-control\"\u003eExpand Section\u003c/button\u003e --\u003e" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ljy0z-i3okCS" + }, + "source": [ + "### Import the necessary packages\n", + "\n", + "Import the dataset-building and optimization libraries from [TensorFlow Model Garden](https://github.com/tensorflow/models)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gCBo6wxA2b5n" + }, + "outputs": [], + "source": [ + "import glob\n", + "import os\n", + "import pathlib\n", + "import tempfile\n", + "import time\n", + "\n", + "import numpy as np\n", + "\n", + "import tensorflow as tf" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PG1kwhnvq3VC" + }, + "outputs": [], + "source": [ + "from official.nlp.data import sentence_prediction_dataloader\n", + "from official.nlp import optimization" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PsbhUV_p3wxN" + }, + "source": [ + "### Configure the distribution strategy\n", + "\n", + "While `tf.distribute` won't help the model's runtime if you're running on a single machine or GPU, it's necessary for TPUs. Setting up a distribution strategy allows you to use the same code regardless of the configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PG702dqstXIk" + }, + "outputs": [], + "source": [ + "logical_device_names = [logical_device.name for logical_device in tf.config.list_logical_devices()]\n", + "\n", + "if 'GPU' in ''.join(logical_device_names):\n", + " strategy = tf.distribute.MirroredStrategy()\n", + "elif 'TPU' in ''.join(logical_device_names):\n", + " resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')\n", + " tf.config.experimental_connect_to_cluster(resolver)\n", + " tf.tpu.experimental.initialize_tpu_system(resolver)\n", + " strategy = tf.distribute.TPUStrategy(resolver)\n", + "else:\n", + " strategy = tf.distribute.OneDeviceStrategy(logical_device_names[0])\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eaQgM98deAMu" + }, + "source": [ + "For more information about the TPU setup, refer to the [TPU guide](https://www.tensorflow.org/guide/tpu)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7aOxMLLV32Zm" + }, + "source": [ + "### Create a model and an optimizer" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YRdWzOfK3_56" + }, + "outputs": [], + "source": [ + "max_seq_length = 128\n", + "learning_rate = 3e-5\n", + "num_train_epochs = 3\n", + "train_batch_size = 32\n", + "eval_batch_size = 64\n", + "\n", + "train_data_size = 3668\n", + "steps_per_epoch = int(train_data_size / train_batch_size)\n", + "\n", + "train_steps = steps_per_epoch * num_train_epochs\n", + "warmup_steps = int(train_steps * 0.1)\n", + "\n", + "print(\"train batch size: \", train_batch_size)\n", + "print(\"train epochs: \", num_train_epochs)\n", + "print(\"steps_per_epoch: \", steps_per_epoch)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BVw3886Ysse6" + }, + "outputs": [], + "source": [ + "model_dir = pathlib.Path(tempfile.mkdtemp())\n", + "print(model_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mu9cV7ew-cVe" + }, + "source": [ + "\n", + "Create a BERT classifier model and an optimizer. They must be created inside `strategy.scope` so that the variables can be distributed." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gmwtX0cp-mj5" + }, + "outputs": [], + "source": [ + "with strategy.scope():\n", + "  encoder_network = tfm.nlp.encoders.build_encoder(\n", + "      tfm.nlp.encoders.EncoderConfig(type=\"bert\"))\n", + "  classifier_model = tfm.nlp.models.BertClassifier(\n", + "      network=encoder_network, num_classes=2)\n", + "\n", + "  optimizer = optimization.create_optimizer(\n", + "      init_lr=learning_rate,\n", + "      num_train_steps=train_steps,\n", + "      num_warmup_steps=warmup_steps,\n", + "      end_lr=0.0,\n", + "      optimizer_type='adamw')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "jwJSfewG5jVV" + }, + "outputs": [], + "source": [ + "tf.keras.utils.plot_model(classifier_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IQy5pYgAf8Ft" + }, + "source": [ + "### Initialize from a checkpoint" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6CE14GEybgRR" + }, + "outputs": [], + "source": [ + "bert_dir = 'gs://cloud-tpu-checkpoints/bert/v3/uncased_L-12_H-768_A-12/'\n", + "tf.io.gfile.listdir(bert_dir)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "x7fwxz9xidKt" + }, + "outputs": [], + "source": [ + "bert_checkpoint = bert_dir + 'bert_model.ckpt'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "q7EfwVCRe7N_" + }, + "outputs": [], + "source": [ + "def init_from_ckpt_fn():\n", + "  init_checkpoint = tf.train.Checkpoint(**classifier_model.checkpoint_items)\n", + "  with strategy.scope():\n", + "    (init_checkpoint\n", + "     .read(bert_checkpoint)\n", + "     .expect_partial()\n", + "     .assert_existing_objects_matched())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "M0LUMlsde-2f" + }, + "outputs": [], + "source": [ + "with strategy.scope():\n", + "  init_from_ckpt_fn()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gAuns4vN_IYV" + }, + "source": [ + "\n", + "To use Orbit, create a `tf.train.CheckpointManager` object. `orbit.Controller` uses it to save checkpoints. If `model_dir` does not contain a checkpoint yet, the manager calls `init_fn` to initialize the model from the pretrained BERT checkpoint instead." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "i7NwM1Jq_MX7" + }, + "outputs": [], + "source": [ + "checkpoint = tf.train.Checkpoint(model=classifier_model, optimizer=optimizer)\n", + "checkpoint_manager = tf.train.CheckpointManager(\n", + "    checkpoint,\n", + "    directory=model_dir,\n", + "    max_to_keep=5,\n", + "    step_counter=optimizer.iterations,\n", + "    checkpoint_interval=steps_per_epoch,\n", + "    init_fn=init_from_ckpt_fn)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nzeiAFhcCOAo" + }, + "source": [ + "### Create distributed datasets\n", + "\n", + "As a shortcut for this tutorial, the [GLUE/MRPC dataset](https://www.tensorflow.org/datasets/catalog/glue#gluemrpc) has been converted to a pair of [TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord) files containing serialized `tf.train.Example` protos.\n", + "\n", + "The data was converted using [this script](https://github.com/tensorflow/models/blob/r2.9.0/official/nlp/data/create_finetuning_data.py).\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZVfbiT1dCnDk" + }, + "outputs": [], + "source": [ + "train_data_path = \"gs://download.tensorflow.org/data/model_garden_colab/mrpc_train.tf_record\"\n", + "eval_data_path = \"gs://download.tensorflow.org/data/model_garden_colab/mrpc_eval.tf_record\"\n", + "\n", + "def _dataset_fn(input_file_pattern, \n", + " global_batch_size, \n", + " is_training, \n", + " input_context=None):\n", + " data_config = sentence_prediction_dataloader.SentencePredictionDataConfig(\n", + " input_path=input_file_pattern,\n", + " seq_length=max_seq_length,\n", + " global_batch_size=global_batch_size,\n", + " is_training=is_training)\n", + " return sentence_prediction_dataloader.SentencePredictionDataLoader(\n", + "
data_config).load(input_context=input_context)\n", + "\n", + "train_dataset = orbit.utils.make_distributed_dataset(\n", + " strategy, _dataset_fn, input_file_pattern=train_data_path,\n", + " global_batch_size=train_batch_size, is_training=True)\n", + "eval_dataset = orbit.utils.make_distributed_dataset(\n", + " strategy, _dataset_fn, input_file_pattern=eval_data_path,\n", + " global_batch_size=eval_batch_size, is_training=False)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dPgiDBQCjsXW" + }, + "source": [ + "### Create a loss function\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7MCUmmo2jvXl" + }, + "outputs": [], + "source": [ + "def loss_fn(labels, logits):\n", + " \"\"\"Classification loss.\"\"\"\n", + " labels = tf.squeeze(labels)\n", + " log_probs = tf.nn.log_softmax(logits, axis=-1)\n", + " one_hot_labels = tf.one_hot(\n", + " tf.cast(labels, dtype=tf.int32), depth=2, dtype=tf.float32)\n", + " per_example_loss = -tf.reduce_sum(\n", + " tf.cast(one_hot_labels, dtype=tf.float32) * log_probs, axis=-1)\n", + " return tf.reduce_mean(per_example_loss)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ohlO-8FQkwsr" + }, + "source": [ + " \u003c/devsite-expandable\u003e\u003c/div\u003e" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ymhbvPaEJ96T" + }, + "source": [ + "## Controllers, Trainers and Evaluators\n", + "\n", + "When using Orbit, the `orbit.Controller` class drives the training. The Controller handles the details of distribution strategies, step counting, TensorBoard summaries, and checkpointing.\n", + "\n", + "To implement the training and evaluation, pass a `trainer` and `evaluator`, which are subclass instances of `orbit.AbstractTrainer` and `orbit.AbstractEvaluator`. 
In keeping with Orbit's lightweight design, these two classes have a minimal interface.\n", + "\n", + "The Controller drives training and evaluation by calling `trainer.train(num_steps)` and `evaluator.evaluate(num_steps)`. These `train` and `evaluate` methods return a dictionary of results for logging.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a6sU2vBeyXtu" + }, + "source": [ + "Training is broken into chunks of length `num_steps`. This length is set by the Controller's [`steps_per_loop`](https://tensorflow.org/api_docs/python/orbit/Controller#args) argument. With the trainer and evaluator abstract base classes, the meaning of `num_steps` is entirely determined by the implementer.\n", + "\n", + "Some common examples include:\n", + "\n", + "* Having the chunks represent dataset-epoch boundaries, like the default Keras setup.\n", + "* Using the chunks to dispatch a number of training steps to an accelerator more efficiently with a single `tf.function` call (like the `steps_per_execution` argument to `Model.compile`).\n", + "* Subdividing into smaller chunks as needed.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "p4mXGIRJsf1j" + }, + "source": [ + "### StandardTrainer and StandardEvaluator\n", + "\n", + "Orbit provides two additional classes, `orbit.StandardTrainer` and `orbit.StandardEvaluator`, to give more structure around the training and evaluation loops.\n", + "\n", + "With `StandardTrainer`, you only need to implement `train_loop_begin`, `train_step`, and `train_loop_end`. The base class handles the loops, dataset logic, and `tf.function` (according to the options set in `orbit.StandardTrainerOptions`). This is simpler than `orbit.AbstractTrainer`, which requires you to handle the entire loop. `StandardEvaluator` provides the same structure and simplification for evaluation.\n", + "\n", + "This is effectively an implementation of the `steps_per_execution` approach used by Keras."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-hvZ8PvohmR5" + }, + "source": [ + "Contrast this with Keras, where training is divided both into epochs (a single pass over the dataset) and `steps_per_execution` (set within [`Model.compile`](https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile)). In Keras, metric averages are typically accumulated over an epoch, then reported and reset between epochs. `steps_per_execution` is purely an efficiency setting: it only controls the number of training steps run per call.\n", + "\n", + "In this simple case, `steps_per_loop` (within `StandardTrainer`) handles both the metric resets and the number of steps per call.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NoDFN1L-1jIu" + }, + "source": [ + "The minimal setup when using these base classes is to implement the methods as follows:\n", + "\n", + "1. `StandardTrainer.train_loop_begin` - Reset your training metrics.\n", + "2. `StandardTrainer.train_step` - Apply a single gradient update.\n", + "3. `StandardTrainer.train_loop_end` - Report your training metrics.\n", + "\n", + "and\n", + "\n", + "4. `StandardEvaluator.eval_begin` - Reset your evaluation metrics.\n", + "5. `StandardEvaluator.eval_step` - Run a single evaluation step.\n", + "6. `StandardEvaluator.eval_reduce` - This is not necessary in this simple setup.\n", + "7. `StandardEvaluator.eval_end` - Report your evaluation metrics.\n", + "\n", + "Depending on the settings, the base class may wrap the `train_step` and `eval_step` code in `tf.function` or `tf.while_loop`, which have some limitations compared to standard Python." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3KPA0NDZt2JD" + }, + "source": [ + "### Define the trainer class" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6LDPsvJwfuPR" + }, + "source": [ + "In this section, you'll create a subclass of `orbit.StandardTrainer` for this task.
\n", + "\n", + "Note: To better explain the `BertClassifierTrainer` class, this section defines each method as a stand-alone function and assembles them into a class at the end.\n", + "\n", + "The trainer needs access to the training data, model, optimizer, and distribution strategy. Pass these as arguments to the initializer.\n", + "\n", + "Define a single training metric, `training_loss`, using `tf.keras.metrics.Mean`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6DQYZN5ax-MG" + }, + "outputs": [], + "source": [ + "def trainer_init(self,\n", + "                 train_dataset,\n", + "                 model,\n", + "                 optimizer,\n", + "                 strategy):\n", + "  self.strategy = strategy\n", + "  with self.strategy.scope():\n", + "    self.model = model\n", + "    self.optimizer = optimizer\n", + "    self.global_step = self.optimizer.iterations\n", + "\n", + "    self.train_loss = tf.keras.metrics.Mean(\n", + "        'training_loss', dtype=tf.float32)\n", + "    orbit.StandardTrainer.__init__(self, train_dataset)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QOwHD7U5hVue" + }, + "source": [ + "Before starting a run of the training loop, the `train_loop_begin` method resets the `train_loss` metric." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "AkpcHqXShWL0" + }, + "outputs": [], + "source": [ + "def train_loop_begin(self):\n", + "  self.train_loss.reset_states()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UjtFOFyxn2BB" + }, + "source": [ + "The `train_step` method is a straightforward loss calculation and gradient update that is run by the distribution strategy. This is accomplished by defining the gradient step as a nested function (`step_fn`).\n", + "\n", + "The method receives a `tf.distribute.DistributedIterator` to handle the [distributed input](https://www.tensorflow.org/tutorials/distribute/input).
The method uses `Strategy.run` to execute `step_fn`, feeding it from the distributed iterator.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "QuPwNnT5I-GP" + }, + "outputs": [], + "source": [ + "def train_step(self, iterator):\n", + "\n", + "  def step_fn(inputs):\n", + "    labels = inputs.pop(\"label_ids\")\n", + "    with tf.GradientTape() as tape:\n", + "      model_outputs = self.model(inputs, training=True)\n", + "      # Raw loss is used for reporting in metrics/logs.\n", + "      raw_loss = loss_fn(labels, model_outputs)\n", + "      # Scale down the loss so that the gradients summed across replicas\n", + "      # are invariant to the number of replicas.\n", + "      loss = raw_loss / self.strategy.num_replicas_in_sync\n", + "\n", + "    grads = tape.gradient(loss, self.model.trainable_variables)\n", + "    self.optimizer.apply_gradients(zip(grads, self.model.trainable_variables))\n", + "    # For reporting, the metric takes the mean of losses.\n", + "    self.train_loss.update_state(raw_loss)\n", + "\n", + "  self.strategy.run(step_fn, args=(next(iterator),))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VmQNwx5QpyDt" + }, + "source": [ + "`orbit.StandardTrainer` handles the `tf.function` wrapping and the loops.\n", + "\n", + "After running through `num_steps` of training, `StandardTrainer` calls `train_loop_end`. This method returns the metric results:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "GqCyVk1zzGod" + }, + "outputs": [], + "source": [ + "def train_loop_end(self):\n", + "  return {\n", + "      self.train_loss.name: self.train_loss.result(),\n", + "  }" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xvmLONl80KUv" + }, + "source": [ + "Build a subclass of `orbit.StandardTrainer` with those methods."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oRoL7VE6xt1G" + }, + "outputs": [], + "source": [ + "class BertClassifierTrainer(orbit.StandardTrainer):\n", + "  __init__ = trainer_init\n", + "  train_loop_begin = train_loop_begin\n", + "  train_step = train_step\n", + "  train_loop_end = train_loop_end" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yjG4QAWj1B00" + }, + "source": [ + "### Define the evaluator class\n", + "\n", + "Note: As in the previous section, this section defines each method as a stand-alone function and assembles them into a `BertClassifierEvaluator` class at the end.\n", + "\n", + "The evaluator is even simpler for this task. It needs access to the evaluation dataset, the model, and the strategy. After saving references to those objects, the constructor just needs to create the metrics." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cvX7seCY1CWj" + }, + "outputs": [], + "source": [ + "def evaluator_init(self,\n", + "                   eval_dataset,\n", + "                   model,\n", + "                   strategy):\n", + "  self.strategy = strategy\n", + "  with self.strategy.scope():\n", + "    self.model = model\n", + "\n", + "    self.eval_loss = tf.keras.metrics.Mean(\n", + "        'evaluation_loss', dtype=tf.float32)\n", + "    self.eval_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(\n", + "        name='accuracy', dtype=tf.float32)\n", + "    orbit.StandardEvaluator.__init__(self, eval_dataset)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0r-z-XK7ybyX" + }, + "source": [ + "Similar to the trainer, the `eval_begin` and `eval_end` methods just need to reset the metrics before the loop and then report the results after the loop."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7VVb0Tg6yZjI" + }, + "outputs": [], + "source": [ + "def eval_begin(self):\n", + "  self.eval_accuracy.reset_states()\n", + "  self.eval_loss.reset_states()\n", + "\n", + "def eval_end(self):\n", + "  return {\n", + "      self.eval_accuracy.name: self.eval_accuracy.result(),\n", + "      self.eval_loss.name: self.eval_loss.result(),\n", + "  }" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iDOZcQvttdmZ" + }, + "source": [ + "The `eval_step` method works like `train_step`. The inner `step_fn` does the actual work of calculating the loss and accuracy and updating the metrics. The outer `eval_step` receives a `tf.distribute.DistributedIterator` as input and uses `Strategy.run` to execute `step_fn` on each replica, feeding it from the distributed iterator." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JLJnYuuGJjvd" + }, + "outputs": [], + "source": [ + "def eval_step(self, iterator):\n", + "\n", + "  def step_fn(inputs):\n", + "    labels = inputs.pop(\"label_ids\")\n", + "    # Run the model in inference mode so that dropout is disabled.\n", + "    model_outputs = self.model(inputs, training=False)\n", + "    loss = loss_fn(labels, model_outputs)\n", + "    self.eval_loss.update_state(loss)\n", + "    self.eval_accuracy.update_state(labels, model_outputs)\n", + "\n", + "  self.strategy.run(step_fn, args=(next(iterator),))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Gt3hh0V30QcP" + }, + "source": [ + "Build a subclass of `orbit.StandardEvaluator` with those methods."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3zqyLxfNyCgA" + }, + "outputs": [], + "source": [ + "class BertClassifierEvaluator(orbit.StandardEvaluator):\n", + " __init__ = evaluator_init\n", + " eval_begin = eval_begin\n", + " eval_end = eval_end\n", + " eval_step = eval_step" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aK9gEja9qPOc" + }, + "source": [ + "### End-to-end training and evaluation\n", + "\n", + "To run the training and evaluation, simply create the trainer, evaluator, and `orbit.Controller` instances. Then call the `Controller.train_and_evaluate` method." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PqQetxyXqRA9" + }, + "outputs": [], + "source": [ + "trainer = BertClassifierTrainer(\n", + " train_dataset, classifier_model, optimizer, strategy)\n", + "\n", + "evaluator = BertClassifierEvaluator(\n", + " eval_dataset, classifier_model, strategy)\n", + "\n", + "controller = orbit.Controller(\n", + " trainer=trainer,\n", + " evaluator=evaluator,\n", + " global_step=trainer.global_step,\n", + " steps_per_loop=20,\n", + " checkpoint_manager=checkpoint_manager)\n", + "\n", + "result = controller.train_and_evaluate(\n", + " train_steps=steps_per_epoch * num_train_epochs,\n", + " eval_steps=-1,\n", + " eval_interval=steps_per_epoch)" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [ + "Tce3stUlHN0L" + ], + "name": "Orbit Tutorial.ipynb", + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/docs/vision/image_classification.ipynb b/docs/vision/image_classification.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..bd2e36670211317a0e7e39b073e42957b9b5fc1f --- /dev/null +++ b/docs/vision/image_classification.ipynb @@ -0,0 +1,691 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { 
+ "id": "Tce3stUlHN0L" + }, + "source": [ + "##### Copyright 2020 The TensorFlow Authors." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "tuOe1ymfHZPu" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qFdPvlXBOdUN" + }, + "source": [ + "# Image classification with Model Garden" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MfBg1C5NB3X0" + }, + "source": [ + "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/tfmodels/vision/image_classification\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/docs/vision/image_classification.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/docs/vision/image_classification.ipynb\"\u003e\u003cimg 
src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView on GitHub\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/models/docs/vision/image_classification.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n", + " \u003c/td\u003e\n", + "\u003c/table\u003e" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ta_nFXaVAqLD" + }, + "source": [ + "This tutorial fine-tunes a Residual Network (ResNet) from the TensorFlow [Model Garden](https://github.com/tensorflow/models) package (`tensorflow-models`) to classify images in the [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset.\n", + "\n", + "Model Garden contains a collection of state-of-the-art vision models, implemented with TensorFlow's high-level APIs. The implementations demonstrate best practices for modeling, letting users take full advantage of TensorFlow for their research and product development.\n", + "\n", + "This tutorial uses ResNet-18, an 18-layer convolutional neural network from the [ResNet](https://arxiv.org/pdf/1512.03385.pdf) family of state-of-the-art image classifiers.\n", + "\n", + "This tutorial demonstrates how to:\n", + "1. Use models from the TensorFlow Models package.\n", + "2. Fine-tune a pre-built ResNet for image classification.\n", + "3. Export the tuned ResNet model." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G2FlaQcEPOER" + }, + "source": [ + "## Setup\n", + "\n", + "Install and import the necessary modules. This tutorial uses the `tf-models-official` version of Model Garden.\n", + "\n", + "Note: Upgrading TensorFlow to 2.9 in Colab breaks GPU support, so this Colab is set to run on CPU until the Colab runtimes are updated."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "XvWfdCrvrV5W" + }, + "outputs": [], + "source": [ + "!pip uninstall -y opencv-python\n", + "!pip install -U -q \"tensorflow\u003e=2.9.0\" \"tf-models-official\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CKYMTPjOE400" + }, + "source": [ + "Import TensorFlow, TensorFlow Datasets, and a few helper libraries." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Wlon1uoIowmZ" + }, + "outputs": [], + "source": [ + "import pprint\n", + "import tempfile\n", + "\n", + "from IPython import display\n", + "import matplotlib.pyplot as plt\n", + "\n", + "import tensorflow as tf\n", + "import tensorflow_datasets as tfds" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AVTs0jDd1b24" + }, + "source": [ + "The `tensorflow_models` package contains the ResNet vision model, and the `official.vision.serving` package contains the functions to save and export the tuned model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NHT1iiIiBzlC" + }, + "outputs": [], + "source": [ + "import tensorflow_models as tfm\n", + "\n", + "# These are not in the tfm public API for v2.9. They will be available in v2.10.\n", + "from official.vision.serving import export_saved_model_lib\n", + "import official.core.train_lib" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aKv3wdqkQ8FU" + }, + "source": [ + "## Configure the ResNet-18 model for the CIFAR-10 dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5iN8mHEJjKYE" + }, + "source": [ + "The CIFAR-10 dataset contains 60,000 color images in 10 mutually exclusive classes, with 6,000 images in each class.\n", + "\n", + "In Model Garden, the collections of parameters that define a model are called *configs*.
Model Garden can create a config based on a known set of parameters via a [factory](https://en.wikipedia.org/wiki/Factory_method_pattern).\n", + "\n", + "Use the `resnet_imagenet` factory configuration, as defined by `tfm.vision.configs.image_classification.image_classification_imagenet`. The configuration is set up to train ResNet to converge on [ImageNet](https://www.image-net.org/)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1M77f88Dj2Td" + }, + "outputs": [], + "source": [ + "exp_config = tfm.core.exp_factory.get_exp_config('resnet_imagenet')\n", + "tfds_name = 'cifar10'\n", + "ds_info = tfds.builder(tfds_name).info\n", + "ds_info" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U6PVwXA-j3E7" + }, + "source": [ + "Adjust the model and dataset configurations so that they work with CIFAR-10 (`cifar10`)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YWI7faVStQaV" + }, + "outputs": [], + "source": [ + "# Configure model\n", + "exp_config.task.model.num_classes = 10\n", + "exp_config.task.model.input_size = list(ds_info.features[\"image\"].shape)\n", + "exp_config.task.model.backbone.resnet.model_id = 18\n", + "\n", + "# Configure training and testing data\n", + "batch_size = 128\n", + "\n", + "exp_config.task.train_data.input_path = ''\n", + "exp_config.task.train_data.tfds_name = tfds_name\n", + "exp_config.task.train_data.tfds_split = 'train'\n", + "exp_config.task.train_data.global_batch_size = batch_size\n", + "\n", + "exp_config.task.validation_data.input_path = ''\n", + "exp_config.task.validation_data.tfds_name = tfds_name\n", + "exp_config.task.validation_data.tfds_split = 'test'\n", + "exp_config.task.validation_data.global_batch_size = batch_size\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DE3ggKzzTD56" + }, + "source": [ + "Adjust the trainer configuration."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "inE_-4UGkLud" + }, + "outputs": [], + "source": [ + "logical_device_names = [logical_device.name for logical_device in tf.config.list_logical_devices()]\n", + "\n", + "if 'GPU' in ''.join(logical_device_names):\n", + " print('This may be broken in Colab.')\n", + " device = 'GPU'\n", + "elif 'TPU' in ''.join(logical_device_names):\n", + " print('This may be broken in Colab.')\n", + " device = 'TPU'\n", + "else:\n", + " print('Running on CPU is slow, so only train for a few steps.')\n", + " device = 'CPU'\n", + "\n", + "if device=='CPU':\n", + " train_steps = 20\n", + " exp_config.trainer.steps_per_loop = 5\n", + "else:\n", + " train_steps=5000\n", + " exp_config.trainer.steps_per_loop = 100\n", + "\n", + "exp_config.trainer.summary_interval = 100\n", + "exp_config.trainer.checkpoint_interval = train_steps\n", + "exp_config.trainer.validation_interval = 1000\n", + "exp_config.trainer.validation_steps = ds_info.splits['test'].num_examples // batch_size\n", + "exp_config.trainer.train_steps = train_steps\n", + "exp_config.trainer.optimizer_config.learning_rate.type = 'cosine'\n", + "exp_config.trainer.optimizer_config.learning_rate.cosine.decay_steps = train_steps\n", + "exp_config.trainer.optimizer_config.learning_rate.cosine.initial_learning_rate = 0.1\n", + "exp_config.trainer.optimizer_config.warmup.linear.warmup_steps = 100" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5mTcDnBiTOYD" + }, + "source": [ + "Print the modified configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tuVfxSBCTK-y" + }, + "outputs": [], + "source": [ + "pprint.pprint(exp_config.as_dict())\n", + "\n", + "display.Javascript(\"google.colab.output.setIframeHeight('300px');\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "w7_X0UHaRF2m" + }, + "source": [ + "Set up the distribution strategy." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ykL14FIbTaSt" + }, + "outputs": [], + "source": [ + "logical_device_names = [logical_device.name for logical_device in tf.config.list_logical_devices()]\n", + "\n", + "if exp_config.runtime.mixed_precision_dtype == tf.float16:\n", + " tf.keras.mixed_precision.set_global_policy('mixed_float16')\n", + "\n", + "if 'GPU' in ''.join(logical_device_names):\n", + " distribution_strategy = tf.distribute.MirroredStrategy()\n", + "elif 'TPU' in ''.join(logical_device_names):\n", + " tf.tpu.experimental.initialize_tpu_system()\n", + " tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='/device:TPU_SYSTEM:0')\n", + " distribution_strategy = tf.distribute.experimental.TPUStrategy(tpu)\n", + "else:\n", + " print('Warning: this will be really slow.')\n", + " distribution_strategy = tf.distribute.OneDeviceStrategy(logical_device_names[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "W4k5YH5pTjaK" + }, + "source": [ + "Create the `Task` object (`tfm.core.base_task.Task`) from the `config_definitions.TaskConfig`.\n", + "\n", + "The `Task` object has all the methods necessary for building the dataset, building the model, and running training \u0026 evaluation. These methods are driven by `tfm.core.train_lib.run_experiment`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "6MgYSH0PtUaW" + }, + "outputs": [], + "source": [ + "with distribution_strategy.scope():\n", + " model_dir = tempfile.mkdtemp()\n", + " task = tfm.core.task_factory.get_task(exp_config.task, logging_dir=model_dir)\n", + "\n", + "tf.keras.utils.plot_model(task.build_model(), show_shapes=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IFXEZYdzBKoX" + }, + "outputs": [], + "source": [ + "for images, labels in task.build_inputs(exp_config.task.train_data).take(1):\n", + " print()\n", + " print(f'images.shape: {str(images.shape):16} images.dtype: {images.dtype!r}')\n", + " print(f'labels.shape: {str(labels.shape):16} labels.dtype: {labels.dtype!r}')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yrwxnGDaRU0U" + }, + "source": [ + "## Visualize the training data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "683c255c6c52" + }, + "source": [ + "The dataloader applies a z-score normalization using \n", + "`preprocess_ops.normalize_image(image, offset=MEAN_RGB, scale=STDDEV_RGB)`, so the images returned by the dataset can't be directly displayed by standard tools. The visualization code needs to rescale the data into the [0,1] range." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PdmOz2EC0Nx2" + }, + "outputs": [], + "source": [ + "plt.hist(images.numpy().flatten());" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7a8582ebde7b" + }, + "source": [ + "Use `ds_info` (which is an instance of `tfds.core.DatasetInfo`) to look up the text descriptions of each class ID."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Wq4Wq_CuDG3Q" + }, + "outputs": [], + "source": [ + "label_info = ds_info.features['label']\n", + "label_info.int2str(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8c652a6fdbcf" + }, + "source": [ + "Visualize a batch of the data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZKfTxytf1l0d" + }, + "outputs": [], + "source": [ + "def show_batch(images, labels, predictions=None):\n", + " plt.figure(figsize=(10, 10))\n", + " min = images.numpy().min()\n", + " max = images.numpy().max()\n", + " delta = max - min\n", + "\n", + " for i in range(12):\n", + " plt.subplot(6, 6, i + 1)\n", + " plt.imshow((images[i]-min) / delta)\n", + " if predictions is None:\n", + " plt.title(label_info.int2str(labels[i]))\n", + " else:\n", + " if labels[i] == predictions[i]:\n", + " color = 'g'\n", + " else:\n", + " color = 'r'\n", + " plt.title(label_info.int2str(predictions[i]), color=color)\n", + " plt.axis(\"off\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xkA5h_RBtYYU" + }, + "outputs": [], + "source": [ + "plt.figure(figsize=(10, 10))\n", + "for images, labels in task.build_inputs(exp_config.task.train_data).take(1):\n", + " show_batch(images, labels)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v_A9VnL2RbXP" + }, + "source": [ + "## Visualize the testing data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AXovuumW_I2z" + }, + "source": [ + "Visualize a batch of images from the validation dataset." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Ma-_Eb-nte9A" + }, + "outputs": [], + "source": [ + "plt.figure(figsize=(10, 10));\n", + "for images, labels in task.build_inputs(exp_config.task.validation_data).take(1):\n", + " show_batch(images, labels)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ihKJt2FHRi2N" + }, + "source": [ + "## Train and evaluate" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0AFMNvYxtjXx" + }, + "outputs": [], + "source": [ + "model, eval_logs = tfm.core.train_lib.run_experiment(\n", + " distribution_strategy=distribution_strategy,\n", + " task=task,\n", + " mode='train_and_eval',\n", + " params=exp_config,\n", + " model_dir=model_dir,\n", + " run_post_eval=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gCcHMQYhozmA" + }, + "outputs": [], + "source": [ + "tf.keras.utils.plot_model(model, show_shapes=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "L7nVfxlBA8Gb" + }, + "source": [ + "Print the `accuracy`, `top_5_accuracy`, and `validation_loss` evaluation metrics." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0124f938a1b9" + }, + "outputs": [], + "source": [ + "for key, value in eval_logs.items():\n", + " print(f'{key:20}: {value.numpy():.3f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TDys5bZ1zsml" + }, + "source": [ + "Run a batch of the processed training data through the model, and view the results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "GhI7zR-Uz1JT" + }, + "outputs": [], + "source": [ + "for images, labels in task.build_inputs(exp_config.task.train_data).take(1):\n", + " predictions = model.predict(images)\n", + " predictions = tf.argmax(predictions, axis=-1)\n", + "\n", + "show_batch(images, labels, tf.cast(predictions, tf.int32))\n", + "\n", + "if device=='CPU':\n", + " plt.suptitle('The model was only trained for a few steps, so it is not expected to do well.')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fkE9locGTBgt" + }, + "source": [ + "## Export a SavedModel" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9669d08c91af" + }, + "source": [ + "The `keras.Model` object returned by `train_lib.run_experiment` expects the data to be normalized by the dataset loader using the same mean and standard deviation statistics as in `preprocess_ops.normalize_image(image, offset=MEAN_RGB, scale=STDDEV_RGB)`.
This export function handles those details, so you can pass `tf.uint8` images and get the correct results.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "AQCFa7BvtmDg" + }, + "outputs": [], + "source": [ + "# Saving and exporting the trained model\n", + "export_saved_model_lib.export_inference_graph(\n", + " input_type='image_tensor',\n", + " batch_size=1,\n", + " input_image_size=[32, 32],\n", + " params=exp_config,\n", + " checkpoint_path=tf.train.latest_checkpoint(model_dir),\n", + " export_dir='./export/')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vVr6DxNqTyLZ" + }, + "source": [ + "Test the exported model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gP7nOvrftsB0" + }, + "outputs": [], + "source": [ + "# Importing SavedModel\n", + "imported = tf.saved_model.load('./export/')\n", + "model_fn = imported.signatures['serving_default']" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GiOp2WVIUNUZ" + }, + "source": [ + "Visualize the predictions." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BTRMrZQAN4mk" + }, + "outputs": [], + "source": [ + "plt.figure(figsize=(10, 10))\n", + "for data in tfds.load('cifar10', split='test').batch(12).take(1):\n", + " predictions = []\n", + " for image in data['image']:\n", + " index = tf.argmax(model_fn(image[tf.newaxis, ...])['logits'], axis=1)[0]\n", + " predictions.append(index)\n", + " show_batch(data['image'], data['label'], predictions)\n", + "\n", + " if device=='CPU':\n", + " plt.suptitle('The model was only trained for a few steps, it is not expected to do better than random.')" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "classification_with_model_garden.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/official/LICENSE b/official/LICENSE deleted file mode 100644 index d3da228420e973edaf4123d5eeb42210f4450b0c..0000000000000000000000000000000000000000 --- a/official/LICENSE +++ /dev/null @@ -1,203 +0,0 @@ -Copyright 2015 The TensorFlow Authors. All rights reserved. - - Apache License - Version 2.0, January 2004 - http://www.apache.org/licenses/ - - TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION - - 1. Definitions. - - "License" shall mean the terms and conditions for use, reproduction, - and distribution as defined by Sections 1 through 9 of this document. - - "Licensor" shall mean the copyright owner or entity authorized by - the copyright owner that is granting the License. - - "Legal Entity" shall mean the union of the acting entity and all - other entities that control, are controlled by, or are under common - control with that entity. 
For the purposes of this definition, - "control" means (i) the power, direct or indirect, to cause the - direction or management of such entity, whether by contract or - otherwise, or (ii) ownership of fifty percent (50%) or more of the - outstanding shares, or (iii) beneficial ownership of such entity. - - "You" (or "Your") shall mean an individual or Legal Entity - exercising permissions granted by this License. - - "Source" form shall mean the preferred form for making modifications, - including but not limited to software source code, documentation - source, and configuration files. - - "Object" form shall mean any form resulting from mechanical - transformation or translation of a Source form, including but - not limited to compiled object code, generated documentation, - and conversions to other media types. - - "Work" shall mean the work of authorship, whether in Source or - Object form, made available under the License, as indicated by a - copyright notice that is included in or attached to the work - (an example is provided in the Appendix below). - - "Derivative Works" shall mean any work, whether in Source or Object - form, that is based on (or derived from) the Work and for which the - editorial revisions, annotations, elaborations, or other modifications - represent, as a whole, an original work of authorship. For the purposes - of this License, Derivative Works shall not include works that remain - separable from, or merely link (or bind by name) to the interfaces of, - the Work and Derivative Works thereof. - - "Contribution" shall mean any work of authorship, including - the original version of the Work and any modifications or additions - to that Work or Derivative Works thereof, that is intentionally - submitted to Licensor for inclusion in the Work by the copyright owner - or by an individual or Legal Entity authorized to submit on behalf of - the copyright owner. 
For the purposes of this definition, "submitted" - means any form of electronic, verbal, or written communication sent - to the Licensor or its representatives, including but not limited to - communication on electronic mailing lists, source code control systems, - and issue tracking systems that are managed by, or on behalf of, the - Licensor for the purpose of discussing and improving the Work, but - excluding communication that is conspicuously marked or otherwise - designated in writing by the copyright owner as "Not a Contribution." - - "Contributor" shall mean Licensor and any individual or Legal Entity - on behalf of whom a Contribution has been received by Licensor and - subsequently incorporated within the Work. - - 2. Grant of Copyright License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - copyright license to reproduce, prepare Derivative Works of, - publicly display, publicly perform, sublicense, and distribute the - Work and such Derivative Works in Source or Object form. - - 3. Grant of Patent License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - (except as stated in this section) patent license to make, have made, - use, offer to sell, sell, import, and otherwise transfer the Work, - where such license applies only to those patent claims licensable - by such Contributor that are necessarily infringed by their - Contribution(s) alone or by combination of their Contribution(s) - with the Work to which such Contribution(s) was submitted. 
If You - institute patent litigation against any entity (including a - cross-claim or counterclaim in a lawsuit) alleging that the Work - or a Contribution incorporated within the Work constitutes direct - or contributory patent infringement, then any patent licenses - granted to You under this License for that Work shall terminate - as of the date such litigation is filed. - - 4. Redistribution. You may reproduce and distribute copies of the - Work or Derivative Works thereof in any medium, with or without - modifications, and in Source or Object form, provided that You - meet the following conditions: - - (a) You must give any other recipients of the Work or - Derivative Works a copy of this License; and - - (b) You must cause any modified files to carry prominent notices - stating that You changed the files; and - - (c) You must retain, in the Source form of any Derivative Works - that You distribute, all copyright, patent, trademark, and - attribution notices from the Source form of the Work, - excluding those notices that do not pertain to any part of - the Derivative Works; and - - (d) If the Work includes a "NOTICE" text file as part of its - distribution, then any Derivative Works that You distribute must - include a readable copy of the attribution notices contained - within such NOTICE file, excluding those notices that do not - pertain to any part of the Derivative Works, in at least one - of the following places: within a NOTICE text file distributed - as part of the Derivative Works; within the Source form or - documentation, if provided along with the Derivative Works; or, - within a display generated by the Derivative Works, if and - wherever such third-party notices normally appear. The contents - of the NOTICE file are for informational purposes only and - do not modify the License. 
You may add Your own attribution - notices within Derivative Works that You distribute, alongside - or as an addendum to the NOTICE text from the Work, provided - that such additional attribution notices cannot be construed - as modifying the License. - - You may add Your own copyright statement to Your modifications and - may provide additional or different license terms and conditions - for use, reproduction, or distribution of Your modifications, or - for any such Derivative Works as a whole, provided Your use, - reproduction, and distribution of the Work otherwise complies with - the conditions stated in this License. - - 5. Submission of Contributions. Unless You explicitly state otherwise, - any Contribution intentionally submitted for inclusion in the Work - by You to the Licensor shall be under the terms and conditions of - this License, without any additional terms or conditions. - Notwithstanding the above, nothing herein shall supersede or modify - the terms of any separate license agreement you may have executed - with Licensor regarding such Contributions. - - 6. Trademarks. This License does not grant permission to use the trade - names, trademarks, service marks, or product names of the Licensor, - except as required for reasonable and customary use in describing the - origin of the Work and reproducing the content of the NOTICE file. - - 7. Disclaimer of Warranty. Unless required by applicable law or - agreed to in writing, Licensor provides the Work (and each - Contributor provides its Contributions) on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied, including, without limitation, any warranties or conditions - of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A - PARTICULAR PURPOSE. You are solely responsible for determining the - appropriateness of using or redistributing the Work and assume any - risks associated with Your exercise of permissions under this License. - - 8. 
Limitation of Liability. In no event and under no legal theory, - whether in tort (including negligence), contract, or otherwise, - unless required by applicable law (such as deliberate and grossly - negligent acts) or agreed to in writing, shall any Contributor be - liable to You for damages, including any direct, indirect, special, - incidental, or consequential damages of any character arising as a - result of this License or out of the use or inability to use the - Work (including but not limited to damages for loss of goodwill, - work stoppage, computer failure or malfunction, or any and all - other commercial damages or losses), even if such Contributor - has been advised of the possibility of such damages. - - 9. Accepting Warranty or Additional Liability. While redistributing - the Work or Derivative Works thereof, You may choose to offer, - and charge a fee for, acceptance of support, warranty, indemnity, - or other liability obligations and/or rights consistent with this - License. However, in accepting such obligations, You may act only - on Your own behalf and on Your sole responsibility, not on behalf - of any other Contributor, and only if You agree to indemnify, - defend, and hold each Contributor harmless for any liability - incurred by, or claims asserted against, such Contributor by reason - of your accepting any such warranty or additional liability. - - END OF TERMS AND CONDITIONS - - APPENDIX: How to apply the Apache License to your work. - - To apply the Apache License to your work, attach the following - boilerplate notice, with the fields enclosed by brackets "[]" - replaced with your own identifying information. (Don't include - the brackets!) The text should be enclosed in the appropriate - comment syntax for the file format. We also recommend that a - file or class name and description of purpose be included on the - same "printed page" as the copyright notice for easier - identification within third-party archives. 
- - Copyright 2015, The TensorFlow Authors. - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. diff --git a/official/README.md b/official/README.md index 83ae2ee69bdb58b697c488a48c73e4c5af0fa2d5..4d1ea9cea072955fb06eaf4c960cee225e416d01 100644 --- a/official/README.md +++ b/official/README.md @@ -1,4 +1,6 @@ -![Logo](https://storage.googleapis.com/model_garden_artifacts/TF_Model_Garden.png) +
+ +
# TensorFlow Official Models @@ -12,6 +14,9 @@ being easy to read. These models are used as end-to-end tests, ensuring that the models run with the same or improved speed and performance with each new TensorFlow build. +The API documentation of the latest stable release is published to +[tensorflow.org](https://www.tensorflow.org/api_docs/python/tfm). + ## More models to come! The team is actively developing new models. @@ -20,6 +25,7 @@ In the near future, we will add: * State-of-the-art language understanding models. * State-of-the-art image classification models. * State-of-the-art object detection and instance segmentation models. +* State-of-the-art video classification models. ## Table of Contents @@ -27,144 +33,124 @@ In the near future, we will add: * [Computer Vision](#computer-vision) + [Image Classification](#image-classification) + [Object Detection and Segmentation](#object-detection-and-segmentation) + + [Video Classification](#video-classification) * [Natural Language Processing](#natural-language-processing) * [Recommendation](#recommendation) - [How to get started with the official models](#how-to-get-started-with-the-official-models) +- [Contributions](#contributions) ## Models and Implementations -### Computer Vision +### [Computer Vision](vision/README.md) #### Image Classification | Model | Reference (Paper) | |-------|-------------------| -| [MNIST](vision/image_classification) | A basic model to classify digits from the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) | -| [ResNet](vision/beta/MODEL_GARDEN.md) | [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) | -| [ResNet-RS](vision/beta/MODEL_GARDEN.md) | [Revisiting ResNets: Improved Training and Scaling Strategies](https://arxiv.org/abs/2103.07579) | -| [EfficientNet](vision/image_classification) | [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) | +| [ResNet](vision/MODEL_GARDEN.md) | [Deep 
Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) | +| [ResNet-RS](vision/MODEL_GARDEN.md) | [Revisiting ResNets: Improved Training and Scaling Strategies](https://arxiv.org/abs/2103.07579) | +| [EfficientNet](vision/MODEL_GARDEN.md) | [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) | +| [Vision Transformer](vision/MODEL_GARDEN.md) | [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) | #### Object Detection and Segmentation | Model | Reference (Paper) | |-------|-------------------| -| [RetinaNet](vision/beta/MODEL_GARDEN.md) | [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002) | -| [Mask R-CNN](vision/beta/MODEL_GARDEN.md) | [Mask R-CNN](https://arxiv.org/abs/1703.06870) | -| [ShapeMask](vision/detection) | [ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors](https://arxiv.org/abs/1904.03239) | -| [SpineNet](vision/beta/MODEL_GARDEN.md) | [SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization](https://arxiv.org/abs/1912.05027) | -| [Cascade RCNN-RS and RetinaNet-RS](vision/beta/MODEL_GARDEN.md) | [Simple Training Strategies and Model Scaling for Object Detection](https://arxiv.org/abs/2107.00057)| +| [RetinaNet](vision/MODEL_GARDEN.md) | [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002) | +| [Mask R-CNN](vision/MODEL_GARDEN.md) | [Mask R-CNN](https://arxiv.org/abs/1703.06870) | +| [SpineNet](vision/MODEL_GARDEN.md) | [SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization](https://arxiv.org/abs/1912.05027) | +| [Cascade RCNN-RS and RetinaNet-RS](vision/MODEL_GARDEN.md) | [Simple Training Strategies and Model Scaling for Object Detection](https://arxiv.org/abs/2107.00057)| -### Natural Language Processing +#### Video Classification | Model | Reference (Paper) | |-------|-------------------| -| [ALBERT (A Lite 
BERT)](nlp/MODEL_GARDEN.md#available-model-configs) | [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942) | -| [BERT (Bidirectional Encoder Representations from Transformers)](nlp/MODEL_GARDEN.md#available-model-configs) | [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) | -| [NHNet (News Headline generation model)](projects/nhnet) | [Generating Representative Headlines for News Stories](https://arxiv.org/abs/2001.09386) | -| [Transformer](nlp/transformer) | [Attention Is All You Need](https://arxiv.org/abs/1706.03762) | -| [XLNet](nlp/xlnet) | [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) | -| [MobileBERT](projects/mobilebert) | [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) | - -### Recommendation - -Model | Reference (Paper) --------------------------------- | ----------------- -[DLRM](recommendation/ranking) | [Deep Learning Recommendation Model for Personalization and Recommendation Systems](https://arxiv.org/abs/1906.00091) -[DCN v2](recommendation/ranking) | [Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems](https://arxiv.org/abs/2008.13535) -[NCF](recommendation) | [Neural Collaborative Filtering](https://arxiv.org/abs/1708.05031) - -## How to get started with the official models +| [Mobile Video Networks (MoViNets)](projects/movinet) | [MoViNets: Mobile Video Networks for Efficient Video Recognition](https://arxiv.org/abs/2103.11511) | -* The models in the master branch are developed using TensorFlow 2, -and they target the TensorFlow [nightly binaries](https://github.com/tensorflow/tensorflow#installation) -built from the -[master branch of TensorFlow](https://github.com/tensorflow/tensorflow/tree/master). 
-* The stable versions targeting releases of TensorFlow are available -as tagged branches or [downloadable releases](https://github.com/tensorflow/models/releases). -* Model repository version numbers match the target TensorFlow release, -such that -[release v2.5.0](https://github.com/tensorflow/models/releases/tag/v2.5.0) -are compatible with -[TensorFlow v2.5.0](https://github.com/tensorflow/tensorflow/releases/tag/v2.5.0). +### [Natural Language Processing](nlp/README.md) -Please follow the below steps before running models in this repository. +#### Pre-trained Language Model -### Requirements +| Model | Reference (Paper) | +|-------|-------------------| +| [ALBERT](nlp/MODEL_GARDEN.md#available-model-configs) | [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942) | +| [BERT](nlp/MODEL_GARDEN.md#available-model-configs) | [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) | +| [ELECTRA](nlp/tasks/electra_task.py) | [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://arxiv.org/abs/2003.10555) | -* The latest TensorFlow Model Garden release and TensorFlow 2 - * If you are on a version of TensorFlow earlier than 2.2, please -upgrade your TensorFlow to [the latest TensorFlow 2](https://www.tensorflow.org/install/). -```shell -pip3 install tf-nightly -``` +#### Neural Machine Translation -* Python 3.7+ +| Model | Reference (Paper) | +|-------|-------------------| +| [Transformer](nlp/MODEL_GARDEN.md#available-model-configs) | [Attention Is All You Need](https://arxiv.org/abs/1706.03762) | -Our integration tests run with Python 3.7. Although Python 3.6 should work, we -don't recommend earlier versions. 
+#### Natural Language Generation -### Installation +| Model | Reference (Paper) | +|-------|-------------------| +| [NHNet (News Headline generation model)](projects/nhnet) | [Generating Representative Headlines for News Stories](https://arxiv.org/abs/2001.09386) | -#### Method 1: Install the TensorFlow Model Garden pip package -**tf-models-official** is the stable Model Garden package. -pip will install all models and dependencies automatically. +#### Knowledge Distillation -```shell -pip install tf-models-official -``` +| Model | Reference (Paper) | +|-------|-------------------| +| [MobileBERT](projects/mobilebert) | [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) | -If you are using nlp packages, please also install **tensorflow-text**: +### Recommendation -```shell -pip install tensorflow-text -``` +Model | Reference (Paper) +-------------------------------- | ----------------- +[DLRM](recommendation/ranking) | [Deep Learning Recommendation Model for Personalization and Recommendation Systems](https://arxiv.org/abs/1906.00091) +[DCN v2](recommendation/ranking) | [Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems](https://arxiv.org/abs/2008.13535) +[NCF](recommendation) | [Neural Collaborative Filtering](https://arxiv.org/abs/1708.05031) -Please check out our [example](colab/fine_tuning_bert.ipynb) -to learn how to use a PIP package. +## How to get started with the official models -Note that **tf-models-official** may not include the latest changes in this -github repo. To include latest changes, you may install **tf-models-nightly**, -which is the nightly Model Garden package created daily automatically. +* The official models in the master branch are developed using the +[master branch of TensorFlow 2](https://github.com/tensorflow/tensorflow/tree/master).
+When you clone (the repository) or download (`pip` binary) the master branch of +the official models, the master branch of TensorFlow is downloaded as a +dependency. This is equivalent to the following: ```shell -pip install tf-models-nightly +pip3 install tf-models-nightly +pip3 install tensorflow-text-nightly # when models use `nlp` packages ``` -#### Method 2: Clone the source - -1. Clone the GitHub repository: +* For stable versions targeting a specific release, TensorFlow-models +repository version numbers match the target TensorFlow release. For +example, [TensorFlow-models v2.8.x](https://github.com/tensorflow/models/releases/tag/v2.8.0) +is compatible with [TensorFlow v2.8.x](https://github.com/tensorflow/tensorflow/releases/tag/v2.8.0). +This is equivalent to the following: ```shell -git clone https://github.com/tensorflow/models.git +pip3 install tf-models-official==2.8.0 +pip3 install tensorflow-text==2.8.0 # when models use `nlp` packages ``` -2. Add the top-level ***/models*** folder to the Python path. - -```shell -export PYTHONPATH=$PYTHONPATH:/path/to/models -``` +Starting from the 2.9.x release, we release the modeling library as the +`tensorflow_models` package, and users can `import tensorflow_models` directly to +access the exported symbols. If you are +using the latest nightly version or GitHub code directly, please follow the +docstrings in the GitHub repository. -If you are using a Colab notebook, please set the Python path with os.environ. +Please follow the steps below before running models in this repository. -```python -import os -os.environ['PYTHONPATH'] += ":/path/to/models" -``` +### Requirements -3. Install other dependencies +* The latest TensorFlow Model Garden release and the latest TensorFlow 2 + * If you are on a version of TensorFlow earlier than 2.2, please +upgrade your TensorFlow to [the latest TensorFlow 2](https://www.tensorflow.org/install/).
+* Python 3.7+ -```shell -pip3 install --user -r official/requirements.txt -``` +Our integration tests run with Python 3.7. Although Python 3.6 should work, we +don't recommend earlier versions. -Finally, if you are using nlp packages, please also install -**tensorflow-text-nightly**: +### Installation -```shell -pip3 install tensorflow-text-nightly -``` +See the [installation instructions](https://github.com/tensorflow/models#Installation) +in the main repository README. ## Contributions diff --git a/official/__init__.py b/official/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/__init__.py +++ b/official/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/benchmark/base_benchmark.py b/official/benchmark/base_benchmark.py index 3c2c76fa496ddc46b7632cfd012cb7998d9153d5..ad39737b67366647656a1fdaa0bea64ae49adf96 100644 --- a/official/benchmark/base_benchmark.py +++ b/official/benchmark/base_benchmark.py @@ -1,4 +1,3 @@ -# Lint as: python3 # Copyright 2020 The TensorFlow Authors. All Rights Reserved.
# # Licensed under the Apache License, Version 2.0 (the "License"); @@ -17,7 +16,7 @@ import os import pprint - +from typing import Optional # Import libraries from absl import logging @@ -74,16 +73,21 @@ class BaseBenchmark( # pylint: disable=undefined-variable _benchmark_parameters = _get_benchmark_params( benchmark_definitions.VISION_BENCHMARKS) + _get_benchmark_params( benchmark_definitions.NLP_BENCHMARKS) + _get_benchmark_params( - benchmark_definitions.QAT_BENCHMARKS, True) + benchmark_definitions.QAT_BENCHMARKS, + True) + _get_benchmark_params( + benchmark_definitions.TENSOR_TRACER_BENCHMARKS) def __init__(self, output_dir=None, - tpu=None): + tpu=None, + tensorflow_models_path: Optional[str] = None): """Initialize class. Args: output_dir: Base directory to store all output for the test. tpu: (optional) TPU name to use in a TPU benchmark. + tensorflow_models_path: Full path to tensorflow models directory. Needed + to locate config files. """ if os.getenv('BENCHMARK_OUTPUT_DIR'): @@ -100,6 +104,13 @@ class BaseBenchmark( # pylint: disable=undefined-variable else: self._resolved_tpu = None + if os.getenv('TENSORFLOW_MODELS_PATH'): + self._tensorflow_models_path = os.getenv('TENSORFLOW_MODELS_PATH') + elif tensorflow_models_path: + self._tensorflow_models_path = tensorflow_models_path + else: + self._tensorflow_models_path = '' + def _get_model_dir(self, folder_name): """Returns directory to store info, e.g. 
saved model and event log.""" return os.path.join(self.output_dir, folder_name) @@ -117,16 +128,18 @@ class BaseBenchmark( # pylint: disable=undefined-variable gin_file): with gin.unlock_config(): - gin.parse_config_files_and_bindings( - [config_utils.get_config_path(g) for g in gin_file], None) + gin.parse_config_files_and_bindings([ + config_utils.get_config_path( + g, base_dir=self._tensorflow_models_path) for g in gin_file + ], None) params = exp_factory.get_exp_config(experiment_type) for config_file in config_files: - file_path = config_utils.get_config_path(config_file) + file_path = config_utils.get_config_path( + config_file, base_dir=self._tensorflow_models_path) params = hyperparams.override_params_dict( params, file_path, is_strict=True) - if params_override: params = hyperparams.override_params_dict( params, params_override, is_strict=True) diff --git a/official/benchmark/benchmark_definitions.py b/official/benchmark/benchmark_definitions.py index 4308867059e49777b3577e691cb593222d96a841..5fe5eba807de0bf7c66d3098764e1111bb2e79b0 100644 --- a/official/benchmark/benchmark_definitions.py +++ b/official/benchmark/benchmark_definitions.py @@ -1,4 +1,3 @@ -# Lint as: python3 # Copyright 2020 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); @@ -28,8 +27,8 @@ IMAGE_CLASSIFICATION_BENCHMARKS = { 'min_value': 0.76, 'max_value': 0.77 }], - config_files=['official/vision/beta/configs/experiments/' - 'image_classification/imagenet_resnet50_tpu.yaml']), + config_files=[('official/vision/configs/experiments/' + 'image_classification/imagenet_resnet50_tpu.yaml')]), 'image_classification.resnet50.gpu.8.fp16': dict( experiment_type='resnet_imagenet', @@ -40,8 +39,8 @@ IMAGE_CLASSIFICATION_BENCHMARKS = { 'min_value': 0.76, 'max_value': 0.77 }], - config_files=['official/vision/beta/configs/experiments/' - 'image_classification/imagenet_resnet50_gpu.yaml']) + config_files=[('official/vision/configs/experiments/' + 'image_classification/imagenet_resnet50_gpu.yaml')]) } @@ -54,3 +53,6 @@ NLP_BENCHMARKS = { QAT_BENCHMARKS = { } + +TENSOR_TRACER_BENCHMARKS = { +} diff --git a/official/benchmark/benchmark_lib.py b/official/benchmark/benchmark_lib.py index de796025e1520072d81ac4bcb209494c7fda0a7a..b0aed0888a1dfb1ecd44c04ee2e81e0256ae8eae 100644 --- a/official/benchmark/benchmark_lib.py +++ b/official/benchmark/benchmark_lib.py @@ -1,4 +1,3 @@ -# Lint as: python3 # Copyright 2020 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); @@ -16,7 +15,7 @@ """TFM common benchmark training driver.""" import os import time -from typing import Any, Mapping +from typing import Any, Mapping, Optional from absl import logging import orbit @@ -27,7 +26,20 @@ from official.core import config_definitions from official.core import task_factory from official.core import train_utils from official.modeling import performance -from official.modeling.fast_training import stage_lib +from official.projects.token_dropping import experiment_configs # pylint: disable=unused-import + + +class _OutputRecorderAction: + """Simple `Action` that saves the outputs passed to `__call__`.""" + + def __init__(self): + self.train_output = {} + + def __call__( + self, + output: Optional[Mapping[str, tf.Tensor]] = None) -> Mapping[str, Any]: + self.train_output = {k: v.numpy() for k, v in output.items() + } if output else {} def run_benchmark( @@ -83,10 +95,13 @@ def run_benchmark( steps_per_loop = params.trainer.steps_per_loop if ( execution_mode in ['accuracy', 'tflite_accuracy']) else 100 + + train_output_recorder = _OutputRecorderAction() controller = orbit.Controller( strategy=strategy, trainer=trainer, evaluator=trainer if (execution_mode == 'accuracy') else None, + train_actions=[train_output_recorder], global_step=trainer.global_step, steps_per_loop=steps_per_loop) @@ -109,7 +124,10 @@ def run_benchmark( tf.convert_to_tensor(params.trainer.validation_steps)) benchmark_data = {'metrics': eval_logs} elif execution_mode == 'performance': - benchmark_data = {} + if train_output_recorder.train_output: + benchmark_data = {'metrics': train_output_recorder.train_output} + else: + benchmark_data = {} elif execution_mode == 'tflite_accuracy': eval_logs = tflite_utils.train_and_evaluate( params, task, trainer, controller) @@ -133,54 +151,3 @@ def run_benchmark( startup_time=startup_time)) return benchmark_data - - -def run_fast_training_benchmark( - execution_mode: 
str, - params: config_definitions.ExperimentConfig, - model_dir: str, - distribution_strategy: tf.distribute.Strategy = None -) -> Mapping[str, Any]: - """Runs benchmark for a fast training experiment. - - This benchmark tests and only tests the binary - tensorflow_models/official/modeling/fast_training/train.py - - Args: - execution_mode: A 'str', specifying the mode. Can be 'accuracy', - 'performance', or 'tflite_accuracy'. - params: ExperimentConfig instance. - model_dir: A 'str', a path to store model checkpoints and summaries. - distribution_strategy: A tf.distribute.Strategy to use. If specified, - it will be used instead of inferring the strategy from params. - - Returns: - benchmark_data: returns benchmark data in dict format. - - Raises: - NotImplementedError: If try to use unsupported setup. - """ - if execution_mode == 'performance': - logging.warn('Fast training benchmark does not support execution_mode == ' - 'performance. This benchmark run will be skipped..') - return dict(examples_per_second=0.0, - wall_time=0.0, - startup_time=0.0) - - strategy = distribution_strategy or distribute_utils.get_distribution_strategy( - distribution_strategy=params.runtime.distribution_strategy, - all_reduce_alg=params.runtime.all_reduce_alg, - num_gpus=params.runtime.num_gpus, - tpu_address=params.runtime.tpu) - - first_loop_start_time = time.time() - _, eval_logs = stage_lib.run_progressive_experiment( - distribution_strategy=strategy, - mode='train', - params=params, - model_dir=model_dir, - run_post_eval=True) - wall_time = time.time() - first_loop_start_time - - return dict(metrics=eval_logs, wall_time=wall_time, - startup_time=0.0, examples_per_second=0.0) diff --git a/official/benchmark/benchmark_lib_test.py b/official/benchmark/benchmark_lib_test.py index 89ceb350e63c798fae92b8449655c4c705a1cfa6..0b1808dfbd37f46aa671abdd8c274a1e8929e8e3 100644 --- a/official/benchmark/benchmark_lib_test.py +++ b/official/benchmark/benchmark_lib_test.py @@ -1,4 +1,3 @@ -# Lint 
as: python3 # Copyright 2020 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); @@ -17,7 +16,6 @@ # pylint: disable=g-direct-tensorflow-import from absl.testing import parameterized -import gin import tensorflow as tf from tensorflow.python.distribute import combinations @@ -82,47 +80,8 @@ class BenchmarkLibTest(tf.test.TestCase, parameterized.TestCase): self.assertIn('examples_per_second', benchmark_data) self.assertIn('wall_time', benchmark_data) self.assertIn('startup_time', benchmark_data) + self.assertIn('metrics', benchmark_data) - if execution_mode == 'accuracy': - self.assertIn('metrics', benchmark_data) - - @combinations.generate( - combinations.combine( - distribution=[ - strategy_combinations.default_strategy, - strategy_combinations.cloud_tpu_strategy, - strategy_combinations.one_device_strategy_gpu, - ], - execution_mode=['performance', 'accuracy'], - )) - def test_fast_training_benchmark(self, distribution, execution_mode): - - model_dir = self.get_temp_dir() - with gin.unlock_config(): - gin.parse_config_files_and_bindings( - None, - "get_initialize_fn.stacking_pattern = 'dense_{:layer_id}/'\n" - "StageParamProgressor.stage_overrides = (" - " {'trainer': {'train_steps': 1}}," - " {'trainer': {'train_steps': 2}}," - ")") - params = exp_factory.get_exp_config('mock') - params = hyperparams.override_params_dict( - params, self._test_config, is_strict=True) - - benchmark_data = benchmark_lib.run_fast_training_benchmark(execution_mode, - params, - model_dir, - distribution) - - if execution_mode == 'performance': - self.assertEqual(dict(examples_per_second=0.0, - wall_time=0.0, - startup_time=0.0), - benchmark_data) - else: - self.assertIn('wall_time', benchmark_data) - self.assertIn('metrics', benchmark_data) if __name__ == '__main__': tf.test.main() diff --git a/official/benchmark/benchmark_wrappers.py b/official/benchmark/benchmark_wrappers.py index 
3d38b690c7865e0ab560e59422a2454e44be052d..62d82a58f3b9a99007af6e1e1d374ce777a885cc 100644 --- a/official/benchmark/benchmark_wrappers.py +++ b/official/benchmark/benchmark_wrappers.py @@ -1,4 +1,3 @@ -# Lint as: python3 # Copyright 2019 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); diff --git a/official/benchmark/bert_benchmark.py b/official/benchmark/bert_benchmark.py index 6a38382f5c244a28497af1029aada01c89c7cfee..46cb0f816ca01086071546a832913f4748ba98e0 100644 --- a/official/benchmark/bert_benchmark.py +++ b/official/benchmark/bert_benchmark.py @@ -14,7 +14,6 @@ # ============================================================================== """Executes BERT benchmarks and accuracy tests.""" -import functools import json import math import os @@ -28,8 +27,8 @@ from official.benchmark import benchmark_wrappers from official.benchmark import bert_benchmark_utils as benchmark_utils from official.benchmark import owner_utils from official.common import distribute_utils -from official.nlp.bert import configs -from official.nlp.bert import run_classifier +from official.legacy.bert import configs +from official.legacy.bert import run_classifier # pylint: disable=line-too-long PRETRAINED_CHECKPOINT_PATH = 'gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16/bert_model.ckpt' diff --git a/official/benchmark/bert_pretrain_benchmark.py b/official/benchmark/bert_pretrain_benchmark.py index f3ad87ffd4dca50d7c0ace4b85161ae1ace04e48..7118857ebe7b45468e0e0833de5a7c5cb65a984a 100644 --- a/official/benchmark/bert_pretrain_benchmark.py +++ b/official/benchmark/bert_pretrain_benchmark.py @@ -1,4 +1,3 @@ -# Lint as: python3 # Copyright 2020 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); @@ -28,7 +27,7 @@ from official.benchmark import benchmark_wrappers from official.benchmark import bert_benchmark_utils from official.benchmark import owner_utils from official.common import distribute_utils -from official.nlp.bert import run_pretraining +from official.legacy.bert import run_pretraining from official.utils.flags import core as flags_core # Pretrain masked lanauge modeling accuracy range: diff --git a/official/benchmark/bert_squad_benchmark.py b/official/benchmark/bert_squad_benchmark.py index fbb1554d2e2c59fc6f1b30ccb55893aab23a4e99..4a1ff5e16c62e54b8d364d74c6c584cf674e38a2 100644 --- a/official/benchmark/bert_squad_benchmark.py +++ b/official/benchmark/bert_squad_benchmark.py @@ -26,7 +26,7 @@ from official.benchmark import benchmark_wrappers from official.benchmark import bert_benchmark_utils as benchmark_utils from official.benchmark import owner_utils from official.common import distribute_utils -from official.nlp.bert import run_squad +from official.legacy.bert import run_squad from official.utils.misc import keras_utils diff --git a/official/benchmark/keras_imagenet_benchmark.py b/official/benchmark/keras_imagenet_benchmark.py index 90bf0250cc08ab4d584b246901bc19783ed2ddd6..6c00fef2121a9964acf0ef8f0dbafaa8e6447a83 100644 --- a/official/benchmark/keras_imagenet_benchmark.py +++ b/official/benchmark/keras_imagenet_benchmark.py @@ -1,4 +1,3 @@ -# Lint as: python3 # Copyright 2018 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); diff --git a/official/benchmark/models/resnet_imagenet_main.py b/official/benchmark/models/resnet_imagenet_main.py index 2b3088dc5c0814f75419e9905b37d815b8309d3d..f8954d9b4d36f5046d4075113e25036e89b73e0c 100644 --- a/official/benchmark/models/resnet_imagenet_main.py +++ b/official/benchmark/models/resnet_imagenet_main.py @@ -74,8 +74,6 @@ def run(flags_obj): Returns: Dictionary of training and eval stats. """ - keras_utils.set_session_config( - enable_xla=flags_obj.enable_xla) # Execute flag override logic for better model performance if flags_obj.tf_gpu_thread_mode: keras_utils.set_gpu_thread_mode_and_count( @@ -251,7 +249,8 @@ def run(flags_obj): optimizer=optimizer, metrics=(['sparse_categorical_accuracy'] if flags_obj.report_accuracy_metrics else None), - run_eagerly=flags_obj.run_eagerly) + run_eagerly=flags_obj.run_eagerly, + jit_compile=flags_obj.enable_xla) train_epochs = flags_obj.train_epochs diff --git a/official/benchmark/nhnet_benchmark.py b/official/benchmark/nhnet_benchmark.py index 385a6bcd5818f002115be7310b8dccba0152ce4f..a3eb1a170bb3fe96ddd530dfbdee7cfb934ad570 100644 --- a/official/benchmark/nhnet_benchmark.py +++ b/official/benchmark/nhnet_benchmark.py @@ -1,4 +1,3 @@ -# Lint as: python3 # Copyright 2020 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); diff --git a/official/benchmark/resnet50_keras_core.py b/official/benchmark/resnet50_keras_core.py index c0c051eeb58cb84bcd569fb1ef25550212c5b254..8d881199aaf75078dfc3f8eac4472d7426a80e0c 100644 --- a/official/benchmark/resnet50_keras_core.py +++ b/official/benchmark/resnet50_keras_core.py @@ -1,4 +1,3 @@ -# Lint as: python3 # Copyright 2020 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); @@ -15,6 +14,7 @@ # ============================================================================== """Resnet50 Keras core benchmark.""" +import statistics import tempfile import time @@ -100,7 +100,7 @@ class Resnet50KerasCoreBenchmark(perfzero_benchmark.PerfZeroBenchmark): wall_times = [] for _ in range(num_trials): wall_times.append(_run_benchmark()) - avg_wall_time = sum(wall_times) / float(len(wall_times)) + avg_wall_time = statistics.mean(wall_times) self.report_benchmark(iters=-1, wall_time=avg_wall_time) def benchmark_1_gpu_max_3(self): @@ -111,5 +111,21 @@ class Resnet50KerasCoreBenchmark(perfzero_benchmark.PerfZeroBenchmark): max_wall_time = max(wall_times) self.report_benchmark(iters=-1, wall_time=max_wall_time) + def benchmark_1_gpu_min_3(self): + num_trials = 3 + wall_times = [] + for _ in range(num_trials): + wall_times.append(_run_benchmark()) + min_wall_time = min(wall_times) + self.report_benchmark(iters=-1, wall_time=min_wall_time) + + def benchmark_1_gpu_med_3(self): + num_trials = 3 + wall_times = [] + for _ in range(num_trials): + wall_times.append(_run_benchmark()) + med_wall_time = statistics.median(wall_times) + self.report_benchmark(iters=-1, wall_time=med_wall_time) + if __name__ == "__main__": tf.test.main() diff --git a/official/benchmark/shakespeare_benchmark.py b/official/benchmark/shakespeare_benchmark.py index cddb4d5a9f2aba24963a80f78d222366b1c99e40..ea98ebe7dbe0ac1a673554bf090aed51cc0de9fe 100644 --- a/official/benchmark/shakespeare_benchmark.py +++ b/official/benchmark/shakespeare_benchmark.py @@ -331,7 +331,7 @@ class ShakespeareKerasBenchmarkReal(ShakespeareBenchmarkBase): def benchmark_xla_8_gpu(self): """Benchmark 8 gpu w/xla.""" self._setup() - FLAGS.num_gpus = 1 + FLAGS.num_gpus = 8 FLAGS.batch_size = 64 * 8 FLAGS.log_steps = 10 FLAGS.enable_xla = True diff --git a/official/benchmark/xlnet_benchmark.py b/official/benchmark/xlnet_benchmark.py deleted file mode 
100644 index 3fbc8180f68c16a1a610bb0e25df38d4e49da682..0000000000000000000000000000000000000000 --- a/official/benchmark/xlnet_benchmark.py +++ /dev/null @@ -1,247 +0,0 @@ -# Copyright 2019 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================== -"""Executes XLNet benchmarks and accuracy tests.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import json -import os -import time - -# pylint: disable=g-bad-import-order - -from absl import flags -from absl.testing import flagsaver -import tensorflow as tf -# pylint: enable=g-bad-import-order - -from official.benchmark import bert_benchmark_utils as benchmark_utils -from official.benchmark import owner_utils -from official.nlp.xlnet import run_classifier -from official.nlp.xlnet import run_squad -from official.benchmark import benchmark_wrappers - - -# pylint: disable=line-too-long -PRETRAINED_CHECKPOINT_PATH = 'gs://cloud-tpu-checkpoints/xlnet/large/xlnet_model-1' -CLASSIFIER_TRAIN_DATA_PATH = 'gs://tf-perfzero-data/xlnet/imdb/spiece.model.len-512.train.tf_record' -CLASSIFIER_EVAL_DATA_PATH = 'gs://tf-perfzero-data/xlnet/imdb/spiece.model.len-512.dev.eval.tf_record' -SQUAD_DATA_PATH = 'gs://tf-perfzero-data/xlnet/squadv2_cased/' -# pylint: enable=line-too-long - -FLAGS = flags.FLAGS - - -class 
XLNetBenchmarkBase(benchmark_utils.BertBenchmarkBase): - """Base class to hold methods common to test classes in the module.""" - - def __init__(self, output_dir=None, tpu=None): - super(XLNetBenchmarkBase, self).__init__(output_dir=output_dir, tpu=tpu) - self.num_epochs = None - self.num_steps_per_epoch = None - - @flagsaver.flagsaver - def _run_xlnet_classifier(self): - """Starts XLNet classification task.""" - run_classifier.main(unused_argv=None) - - @flagsaver.flagsaver - def _run_xlnet_squad(self): - """Starts XLNet classification task.""" - run_squad.main(unused_argv=None) - - -class XLNetClassifyAccuracy(XLNetBenchmarkBase): - """Short accuracy test for XLNet classifier model. - - Tests XLNet classification task model accuracy. The naming - convention of below test cases follow - `benchmark_(number of gpus)_gpu_(dataset type)` format. - """ - - def __init__(self, output_dir=None, tpu=None, **kwargs): - self.train_data_path = CLASSIFIER_TRAIN_DATA_PATH - self.eval_data_path = CLASSIFIER_EVAL_DATA_PATH - self.pretrained_checkpoint_path = PRETRAINED_CHECKPOINT_PATH - - super(XLNetClassifyAccuracy, self).__init__(output_dir=output_dir, tpu=tpu) - - @benchmark_wrappers.enable_runtime_flags - def _run_and_report_benchmark(self, - training_summary_path, - min_accuracy=0.95, - max_accuracy=0.97): - """Starts XLNet accuracy benchmark test.""" - - start_time_sec = time.time() - self._run_xlnet_classifier() - wall_time_sec = time.time() - start_time_sec - - with tf.io.gfile.GFile(training_summary_path, 'rb') as reader: - summary = json.loads(reader.read().decode('utf-8')) - - super(XLNetClassifyAccuracy, self)._report_benchmark( - stats=summary, - wall_time_sec=wall_time_sec, - min_accuracy=min_accuracy, - max_accuracy=max_accuracy) - - def _setup(self): - super(XLNetClassifyAccuracy, self)._setup() - FLAGS.test_data_size = 25024 - FLAGS.train_batch_size = 16 - FLAGS.seq_len = 512 - FLAGS.mem_len = 0 - FLAGS.n_layer = 24 - FLAGS.d_model = 1024 - FLAGS.d_embed = 1024 - 
FLAGS.n_head = 16 - FLAGS.d_head = 64 - FLAGS.d_inner = 4096 - FLAGS.untie_r = True - FLAGS.n_class = 2 - FLAGS.ff_activation = 'gelu' - FLAGS.strategy_type = 'mirror' - FLAGS.learning_rate = 2e-5 - FLAGS.train_steps = 4000 - FLAGS.warmup_steps = 500 - FLAGS.iterations = 200 - FLAGS.bi_data = False - FLAGS.init_checkpoint = self.pretrained_checkpoint_path - FLAGS.train_tfrecord_path = self.train_data_path - FLAGS.test_tfrecord_path = self.eval_data_path - - @owner_utils.Owner('tf-model-garden') - def benchmark_8_gpu_imdb(self): - """Run XLNet model accuracy test with 8 GPUs.""" - self._setup() - FLAGS.model_dir = self._get_model_dir('benchmark_8_gpu_imdb') - # Sets timer_callback to None as we do not use it now. - self.timer_callback = None - - summary_path = os.path.join(FLAGS.model_dir, - 'summaries/training_summary.txt') - self._run_and_report_benchmark(summary_path) - - @owner_utils.Owner('tf-model-garden') - def benchmark_2x2_tpu_imdb(self): - """Run XLNet model accuracy test on 2x2 tpu.""" - self._setup() - FLAGS.strategy_type = 'tpu' - FLAGS.model_dir = self._get_model_dir('benchmark_2x2_tpu_imdb') - # Sets timer_callback to None as we do not use it now. - self.timer_callback = None - - summary_path = os.path.join(FLAGS.model_dir, - 'summaries/training_summary.txt') - self._run_and_report_benchmark(summary_path) - - -class XLNetSquadAccuracy(XLNetBenchmarkBase): - """Short accuracy test for XLNet squad model. - - Tests XLNet squad task model accuracy. The naming - convention of below test cases follow - `benchmark_(number of gpus)_gpu_(dataset type)` format. 
- """ - - def __init__(self, output_dir=None, tpu=None, **kwargs): - self.train_data_path = SQUAD_DATA_PATH - self.predict_file = os.path.join(SQUAD_DATA_PATH, 'dev-v2.0.json') - self.test_data_path = os.path.join(SQUAD_DATA_PATH, '12048.eval.tf_record') - self.spiece_model_file = os.path.join(SQUAD_DATA_PATH, 'spiece.cased.model') - self.pretrained_checkpoint_path = PRETRAINED_CHECKPOINT_PATH - - super(XLNetSquadAccuracy, self).__init__(output_dir=output_dir, tpu=tpu) - - @benchmark_wrappers.enable_runtime_flags - def _run_and_report_benchmark(self, - training_summary_path, - min_accuracy=87.0, - max_accuracy=89.0): - """Starts XLNet accuracy benchmark test.""" - - start_time_sec = time.time() - self._run_xlnet_squad() - wall_time_sec = time.time() - start_time_sec - - with tf.io.gfile.GFile(training_summary_path, 'rb') as reader: - summary = json.loads(reader.read().decode('utf-8')) - - super(XLNetSquadAccuracy, self)._report_benchmark( - stats=summary, - wall_time_sec=wall_time_sec, - min_accuracy=min_accuracy, - max_accuracy=max_accuracy) - - def _setup(self): - super(XLNetSquadAccuracy, self)._setup() - FLAGS.train_batch_size = 16 - FLAGS.seq_len = 512 - FLAGS.mem_len = 0 - FLAGS.n_layer = 24 - FLAGS.d_model = 1024 - FLAGS.d_embed = 1024 - FLAGS.n_head = 16 - FLAGS.d_head = 64 - FLAGS.d_inner = 4096 - FLAGS.untie_r = True - FLAGS.ff_activation = 'gelu' - FLAGS.strategy_type = 'mirror' - FLAGS.learning_rate = 3e-5 - FLAGS.train_steps = 8000 - FLAGS.warmup_steps = 1000 - FLAGS.iterations = 1000 - FLAGS.bi_data = False - FLAGS.init_checkpoint = self.pretrained_checkpoint_path - FLAGS.train_tfrecord_path = self.train_data_path - FLAGS.test_tfrecord_path = self.test_data_path - FLAGS.spiece_model_file = self.spiece_model_file - FLAGS.predict_file = self.predict_file - FLAGS.adam_epsilon = 1e-6 - FLAGS.lr_layer_decay_rate = 0.75 - - @owner_utils.Owner('tf-model-garden') - def benchmark_8_gpu_squadv2(self): - """Run XLNet model squad v2 accuracy test with 8 GPUs.""" 
- self._setup() - FLAGS.model_dir = self._get_model_dir('benchmark_8_gpu_squadv2') - FLAGS.predict_dir = FLAGS.model_dir - # Sets timer_callback to None as we do not use it now. - self.timer_callback = None - - summary_path = os.path.join(FLAGS.model_dir, - 'summaries/training_summary.txt') - self._run_and_report_benchmark(summary_path) - - @owner_utils.Owner('tf-model-garden') - def benchmark_2x2_tpu_squadv2(self): - """Run XLNet model squad v2 accuracy test on 2x2 tpu.""" - self._setup() - FLAGS.strategy_type = 'tpu' - FLAGS.model_dir = self._get_model_dir('benchmark_2x2_tpu_squadv2') - FLAGS.predict_dir = FLAGS.model_dir - # Sets timer_callback to None as we do not use it now. - self.timer_callback = None - - summary_path = os.path.join(FLAGS.model_dir, - 'summaries/training_summary.txt') - self._run_and_report_benchmark(summary_path) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/colab/README.md b/official/colab/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0bd524708fa5b3ebde5db3c0dbcedb7f59be42d5 --- /dev/null +++ b/official/colab/README.md @@ -0,0 +1,4 @@ +# Moved + +These files have moved to: +https://github.com/tensorflow/models/blob/master/docs \ No newline at end of file diff --git a/official/colab/decoding_api_in_tf_nlp.ipynb b/official/colab/decoding_api_in_tf_nlp.ipynb deleted file mode 100644 index 726b382e228265fa1e19c2af3150e7cc32a0ec56..0000000000000000000000000000000000000000 --- a/official/colab/decoding_api_in_tf_nlp.ipynb +++ /dev/null @@ -1,492 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "vXLA5InzXydn" - }, - "source": [ - "##### Copyright 2021 The TensorFlow Authors." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "id": "RuRlpLL-X0R_" - }, - "outputs": [], - "source": [ - "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "fsACVQpVSifi" - }, - "source": [ - "### Install the TensorFlow Model Garden pip package\n", - "\n", - "* `tf-models-official` is the stable Model Garden package. Note that it may not include the latest changes in the `tensorflow_models` github repo. To include latest changes, you may install `tf-models-nightly`,\n", - "which is the nightly Model Garden package created daily automatically.\n", - "* pip will install all models and dependencies automatically." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hYEwGTeCXnnX" - }, - "source": [ - "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n", - " \u003ctd\u003e\n", - " \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/official_models/tutorials/decoding_api_in_tf_nlp.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n", - " \u003c/td\u003e\n", - " \u003ctd\u003e\n", - " \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/official/colab/decoding_api_in_tf_nlp.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n", - " \u003c/td\u003e\n", - " \u003ctd\u003e\n", - " \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/official/colab/decoding_api_in_tf_nlp.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n", - " \u003c/td\u003e\n", - " \u003ctd\u003e\n", - " \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/models/official/colab/decoding_api_in_tf_nlp.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n", - " \u003c/td\u003e\n", - "\u003c/table\u003e" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2j-xhrsVQOQT" - }, - "outputs": [], - "source": [ - "pip install tf-models-nightly" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "BjP7zwxmskpY" - }, - "outputs": [], - "source": [ - "import os\n", - "\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "\n", - "import tensorflow as tf\n", - "\n", - "from official import nlp\n", - "from official.nlp.modeling.ops import sampling_module\n", - "from official.nlp.modeling.ops import beam_search" - ] - 
}, - { - "cell_type": "markdown", - "metadata": { - "id": "0AWgyo-IQ5sP" - }, - "source": [ - "# Decoding API\n", - "This API provides an interface to experiment with different decoding strategies used for auto-regressive models.\n", - "\n", - "1. The following sampling strategies are provided in sampling_module.py, whose SamplingModule class inherits from the base Decoding class:\n", - " * [top_p](https://arxiv.org/abs/1904.09751) : [github](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/ops/sampling_module.py#L65) \n", - "\n", - " This implementation samples from the smallest set of most probable logits whose cumulative probability exceeds top_p.\n", - "\n", - " * [top_k](https://arxiv.org/pdf/1805.04833.pdf) : [github](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/ops/sampling_module.py#L48)\n", - "\n", - " At each timestep, this implementation samples from the top-k logits according to their probability distribution.\n", - "\n", - " * Greedy : [github](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/ops/sampling_module.py#L26)\n", - "\n", - " At each timestep, this implementation picks the logit with the highest probability.\n", - "\n", - "2. Beam search is provided in beam_search.py. [github](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/ops/beam_search.py)\n", - "\n", - " This implementation reduces the risk of missing hidden high-probability sequences by keeping the num_beams most likely hypotheses at each time step and eventually choosing the hypothesis that has the overall highest probability." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "MfOj7oaBRQnS" - }, - "source": [ - "## Initialize Sampling Module in TF-NLP.\n", - "\n", - "\n", - "\u003e **symbols_to_logits_fn** : This is a closure implemented by the users of the API. Its inputs and outputs are: \n", - "```\n", - "Args:\n", - " 1] ids [batch_size, .. 
(index + 1 or 1 if padded_decode is True)],\n", - " 2] index [scalar] : current decoded step,\n", - " 3] cache [nested dictionary of tensors].\n", - "Returns:\n", - " 1] tensor for next-step logits [batch_size, vocab]\n", - " 2] the updated_cache [nested dictionary of tensors].\n", - "```\n", - "This closure calls the model to predict the logits for the 'index+1' step. The cache is used for faster decoding.\n", - "Here is a [reference](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/ops/beam_search_test.py#L88) implementation for the above closure.\n", - "\n", - "\n", - "\u003e **length_normalization_fn** : Closure that returns the length normalization factor.\n", - "```\n", - "Args: \n", - " 1] length : scalar for decoded step index.\n", - " 2] dtype : data-type of output tensor.\n", - "Returns:\n", - " 1] value of length normalization factor.\n", - "Example :\n", - " def _length_norm(length, dtype):\n", - " return tf.pow(((5. + tf.cast(length, dtype)) / 6.), 0.0)\n", - "```\n", - "\n", - "\u003e **vocab_size** : Output vocabulary size.\n", - "\n", - "\u003e **max_decode_length** : Scalar for total number of decoding steps.\n", - "\n", - "\u003e **eos_id** : Decoding will stop if all output decoded ids in the batch have this ID.\n", - "\n", - "\u003e **padded_decode** : Set this to True if running on TPU. Tensors are padded to max_decode_length if this is True.\n", - "\n", - "\u003e **top_k** : top_k is enabled if this value is \u003e 1.\n", - "\n", - "\u003e **top_p** : top_p is enabled if this value is \u003e 0 and \u003c 1.0.\n", - "\n", - "\u003e **sample_temperature** : This is used to rescale the logits before the softmax. A temperature below 1 skews the distribution towards high-probability tokens and lowers the mass in the tail of the distribution. Value has to be positive. 
Low temperature approaches greedy decoding and makes the distribution sharper, while high temperature makes it flatter.\n", - "\n", - "\u003e **enable_greedy** : By default, this is true and greedy decoding is enabled.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "lV1RRp6ihnGX" - }, - "source": [ - "# Initialize the Model Hyper-parameters" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "eTsGp2gaKLdE" - }, - "outputs": [], - "source": [ - "params = {}\n", - "params['num_heads'] = 2\n", - "params['num_layers'] = 2\n", - "params['batch_size'] = 2\n", - "params['n_dims'] = 256\n", - "params['max_decode_length'] = 4" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "UGvmd0_dRFYI" - }, - "source": [ - "## What is a Cache?\n", - "In auto-regressive architectures like Transformer-based [Encoder-Decoder](https://arxiv.org/abs/1706.03762) models, \n", - "the cache is used for fast sequential decoding.\n", - "It is a nested dictionary storing pre-computed hidden states (keys and values in the self-attention blocks and in the cross-attention blocks) for every layer.\n", - "\n", - "```\n", - "{\n", - " 'layer_%d' % layer: {\n", - " 'k': tf.zeros([params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims']/params['num_heads']], dtype=tf.float32),\n", - " 'v': tf.zeros([params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims']/params['num_heads']], dtype=tf.float32)\n", - " } for layer in range(params['num_layers']),\n", - " 'model_specific_item' : Model specific tensor shape,\n", - "}\n", - "\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "CYXkoplAij01" - }, - "source": [ - "# Initialize cache. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "D6kfZOOKgkm1" - }, - "outputs": [], - "source": [ - "cache = {\n", - " 'layer_%d' % layer: {\n", - " 'k': tf.zeros([params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims']/params['num_heads']], dtype=tf.float32),\n", - " 'v': tf.zeros([params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims']/params['num_heads']], dtype=tf.float32)\n", - " } for layer in range(params['num_layers'])\n", - " }\n", - "print(\"cache key shape for layer 1 :\", cache['layer_1']['k'].shape)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "nNY3Xn8SiblP" - }, - "source": [ - "# Define closure for length normalization. **optional.**\n", - "\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "T92ccAzlnGqh" - }, - "outputs": [], - "source": [ - "def length_norm(length, dtype):\n", - " \"\"\"Return length normalization factor.\"\"\"\n", - " return tf.pow(((5. 
+ tf.cast(length, dtype)) / 6.), 0.0)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "syl7I5nURPgW" - }, - "source": [ - "# Create model_fn\n", - " In practice, this will be replaced by an actual model implementation, such as the one [here](https://github.com/tensorflow/models/blob/master/official/nlp/transformer/transformer.py#L236).\n", - "```\n", - "Args:\n", - "i : Step that is being decoded.\n", - "Returns:\n", - " probabilities of shape [batch_size, vocab_size]\n", - "```\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "AhzSkRisRdB6" - }, - "outputs": [], - "source": [ - "probabilities = tf.constant([[[0.3, 0.4, 0.3], [0.3, 0.3, 0.4],\n", - " [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]],\n", - " [[0.2, 0.5, 0.3], [0.2, 0.7, 0.1],\n", - " [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]]])\n", - "def model_fn(i):\n", - " return probabilities[:, i, :]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "DBMUkaVmVZBg" - }, - "source": [ - "# Initialize symbols_to_logits_fn\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "FAJ4CpbfVdjr" - }, - "outputs": [], - "source": [ - "def _symbols_to_logits_fn():\n", - " \"\"\"Calculates logits of the next tokens.\"\"\"\n", - " def symbols_to_logits_fn(ids, i, temp_cache):\n", - " del ids\n", - " logits = tf.cast(tf.math.log(model_fn(i)), tf.float32)\n", - " return logits, temp_cache\n", - " return symbols_to_logits_fn" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "R_tV3jyWVL47" - }, - "source": [ - "# Greedy \n", - "Greedy decoding selects the token id with the highest probability as its next id: $id_t = \\arg\\max_{id} P(id | id_{1:t-1})$ at each timestep $t$. The following code demonstrates greedy decoding. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "aGt9idSkVQEJ" - }, - "outputs": [], - "source": [ - "greedy_obj = sampling_module.SamplingModule(\n", - " length_normalization_fn=None,\n", - " dtype=tf.float32,\n", - " symbols_to_logits_fn=_symbols_to_logits_fn(),\n", - " vocab_size=3,\n", - " max_decode_length=params['max_decode_length'],\n", - " eos_id=10,\n", - " padded_decode=False)\n", - "ids, _ = greedy_obj.generate(\n", - " initial_ids=tf.constant([9, 1]), initial_cache=cache)\n", - "print(\"Greedy Decoded Ids:\", ids)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "s4pTTsQXVz5O" - }, - "source": [ - "# top_k sampling\n", - "In *Top-K* sampling, the *K* most likely next token ids are kept and the probability mass is redistributed among only those *K* ids. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pCLWIn6GV5_G" - }, - "outputs": [], - "source": [ - "top_k_obj = sampling_module.SamplingModule(\n", - " length_normalization_fn=length_norm,\n", - " dtype=tf.float32,\n", - " symbols_to_logits_fn=_symbols_to_logits_fn(),\n", - " vocab_size=3,\n", - " max_decode_length=params['max_decode_length'],\n", - " eos_id=10,\n", - " sample_temperature=tf.constant(1.0),\n", - " top_k=tf.constant(3),\n", - " padded_decode=False,\n", - " enable_greedy=False)\n", - "ids, _ = top_k_obj.generate(\n", - " initial_ids=tf.constant([9, 1]), initial_cache=cache)\n", - "print(\"top-k sampled Ids:\", ids)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Jp3G-eE_WI4Y" - }, - "source": [ - "# top_p sampling\n", - "Instead of sampling only from the most likely *K* token ids, *Top-p* sampling chooses from the smallest possible set of ids whose cumulative probability exceeds the probability *p*."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "rEGdIWcuWILO" - }, - "outputs": [], - "source": [ - "top_p_obj = sampling_module.SamplingModule(\n", - " length_normalization_fn=length_norm,\n", - " dtype=tf.float32,\n", - " symbols_to_logits_fn=_symbols_to_logits_fn(),\n", - " vocab_size=3,\n", - " max_decode_length=params['max_decode_length'],\n", - " eos_id=10,\n", - " sample_temperature=tf.constant(1.0),\n", - " top_p=tf.constant(0.9),\n", - " padded_decode=False,\n", - " enable_greedy=False)\n", - "ids, _ = top_p_obj.generate(\n", - " initial_ids=tf.constant([9, 1]), initial_cache=cache)\n", - "print(\"top-p sampled Ids:\", ids)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "2hcuyJ2VWjDz" - }, - "source": [ - "# Beam search decoding\n", - "Beam search reduces the risk of missing hidden high-probability token sequences by keeping the num_beams most likely hypotheses at each time step and eventually choosing the hypothesis that has the overall highest probability. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cJ3WzvSrWmSA" - }, - "outputs": [], - "source": [ - "beam_size = 2\n", - "params['batch_size'] = 1\n", - "beam_cache = {\n", - " 'layer_%d' % layer: {\n", - " 'k': tf.zeros([params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims']], dtype=tf.float32),\n", - " 'v': tf.zeros([params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims']], dtype=tf.float32)\n", - " } for layer in range(params['num_layers'])\n", - " }\n", - "print(\"cache key shape for layer 1 :\", beam_cache['layer_1']['k'].shape)\n", - "ids, _ = beam_search.sequence_beam_search(\n", - " symbols_to_logits_fn=_symbols_to_logits_fn(),\n", - " initial_ids=tf.constant([9], tf.int32),\n", - " initial_cache=beam_cache,\n", - " vocab_size=3,\n", - " beam_size=beam_size,\n", - " alpha=0.6,\n", - " max_decode_length=params['max_decode_length'],\n", - " eos_id=10,\n", - " padded_decode=False,\n", - " dtype=tf.float32)\n", - "print(\"Beam search ids:\", ids)" - ] - } - ], - "metadata": { - "accelerator": "GPU", - "colab": { - "collapsed_sections": [], - "name": "decoding_api_in_tf_nlp.ipynb", - "provenance": [], - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} diff --git a/official/colab/nlp/customize_encoder.ipynb b/official/colab/nlp/customize_encoder.ipynb deleted file mode 100644 index aeddb29f96352fbd4c8df3540e6bd4b8fe70bb8b..0000000000000000000000000000000000000000 --- a/official/colab/nlp/customize_encoder.ipynb +++ /dev/null @@ -1,575 +0,0 @@ -{ - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "name": "Customizing a Transformer Encoder", - "private_outputs": true, - "provenance": [], - "collapsed_sections": [], - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "cells": [ - { - 
"cell_type": "markdown", - "metadata": { - "id": "Bp8t2AI8i7uP" - }, - "source": [ - "##### Copyright 2020 The TensorFlow Authors." - ] - }, - { - "cell_type": "code", - "metadata": { - "cellView": "form", - "id": "rxPj2Lsni9O4" - }, - "source": [ - "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6xS-9i5DrRvO" - }, - "source": [ - "# Customizing a Transformer Encoder" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Mwb9uw1cDXsa" - }, - "source": [ - "\n", - " \n", - " \n", - " \n", - " \n", - "
\n", - " View on TensorFlow.org\n", - " \n", - " Run in Google Colab\n", - " \n", - " View source on GitHub\n", - " \n", - " Download notebook\n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "iLrcV4IyrcGX" - }, - "source": [ - "## Learning objectives\n", - "\n", - "The [TensorFlow Models NLP library](https://github.com/tensorflow/models/tree/master/official/nlp/modeling) is a collection of tools for building and training modern high-performance natural language models.\n", - "\n", - "The [Transformer encoder](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/encoder_scaffold.py) is the core of this library, and many new network architectures have been proposed to improve it. In this Colab notebook, we will learn how to customize the encoder to employ new network architectures." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "YYxdyoWgsl8t" - }, - "source": [ - "## Install and import" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "fEJSFutUsn_h" - }, - "source": [ - "### Install the TensorFlow Model Garden pip package\n", - "\n", - "* `tf-models-official` is the stable Model Garden package. Note that it may not include the latest changes in the `tensorflow_models` github repo. To include latest changes, you may install `tf-models-nightly`,\n", - "which is the nightly Model Garden package created daily automatically.\n", - "* `pip` will install all models and dependencies automatically."
- ] - }, - { - "cell_type": "code", - "metadata": { - "id": "thsKZDjhswhR" - }, - "source": [ - "!pip install -q tf-models-official==2.4.0" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "hpf7JPCVsqtv" - }, - "source": [ - "### Import TensorFlow and other libraries" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "my4dp-RMssQe" - }, - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "\n", - "from official.modeling import activations\n", - "from official.nlp import modeling\n", - "from official.nlp.modeling import layers, losses, models, networks" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "vjDmVsFfs85n" - }, - "source": [ - "## Canonical BERT encoder\n", - "\n", - "Before learning how to customize the encoder, let's first create a canonical BERT encoder and use it to instantiate a `BertClassifier` for a classification task." 
- ] - }, - { - "cell_type": "code", - "metadata": { - "id": "Oav8sbgstWc-" - }, - "source": [ - "cfg = {\n", - " \"vocab_size\": 100,\n", - " \"hidden_size\": 32,\n", - " \"num_layers\": 3,\n", - " \"num_attention_heads\": 4,\n", - " \"intermediate_size\": 64,\n", - " \"activation\": activations.gelu,\n", - " \"dropout_rate\": 0.1,\n", - " \"attention_dropout_rate\": 0.1,\n", - " \"max_sequence_length\": 16,\n", - " \"type_vocab_size\": 2,\n", - " \"initializer\": tf.keras.initializers.TruncatedNormal(stddev=0.02),\n", - "}\n", - "bert_encoder = modeling.networks.BertEncoder(**cfg)\n", - "\n", - "def build_classifier(bert_encoder):\n", - " return modeling.models.BertClassifier(bert_encoder, num_classes=2)\n", - "\n", - "canonical_classifier_model = build_classifier(bert_encoder)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Qe2UWI6_tsHo" - }, - "source": [ - "`canonical_classifier_model` can be trained using the training 
data. For details about how to train the model, please see the colab [fine_tuning_bert.ipynb](https://github.com/tensorflow/models/blob/master/official/colab/fine_tuning_bert.ipynb). We skip the code that trains the model here.\n", - "\n", - "After training, we can apply the model to make predictions.\n" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "csED2d-Yt5h6" - }, - "source": [ - "def predict(model):\n", - " batch_size = 3\n", - " np.random.seed(0)\n", - " word_ids = np.random.randint(\n", - " cfg[\"vocab_size\"], size=(batch_size, cfg[\"max_sequence_length\"]))\n", - " mask = np.random.randint(2, size=(batch_size, cfg[\"max_sequence_length\"]))\n", - " type_ids = np.random.randint(\n", - " cfg[\"type_vocab_size\"], size=(batch_size, cfg[\"max_sequence_length\"]))\n", - " print(model([word_ids, mask, type_ids], training=False))\n", - "\n", - "predict(canonical_classifier_model)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "PzKStEK9t_Pb" - }, - "source": [ - "## Customize BERT encoder\n", - "\n", - "A BERT encoder consists of an embedding network and multiple transformer blocks, and each transformer block contains an attention layer and a feedforward layer." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rmwQfhj6fmKz" - }, - "source": [ - "We provide easy ways to customize each of those components via (1)\n", - "[EncoderScaffold](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/encoder_scaffold.py) and (2) [TransformerScaffold](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/transformer_scaffold.py)."
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "xsMgEVHAui11" - }, - "source": [ - "### Use EncoderScaffold\n", - "\n", - "`EncoderScaffold` allows users to provide a custom embedding subnetwork\n", - " (which will replace the standard embedding logic) and/or a custom hidden layer class (which will replace the `Transformer` instantiation in the encoder)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "-JBabpa2AOz8" - }, - "source": [ - "#### Without Customization\n", - "\n", - "Without any customization, `EncoderScaffold` behaves the same as the canonical `BertEncoder`.\n", - "\n", - "As shown in the following example, `EncoderScaffold` can load `BertEncoder`'s weights and output the same values:" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "ktNzKuVByZQf" - }, - "source": [ - "default_hidden_cfg = dict(\n", - " num_attention_heads=cfg[\"num_attention_heads\"],\n", - " intermediate_size=cfg[\"intermediate_size\"],\n", - " intermediate_activation=activations.gelu,\n", - " dropout_rate=cfg[\"dropout_rate\"],\n", - " attention_dropout_rate=cfg[\"attention_dropout_rate\"],\n", - " kernel_initializer=tf.keras.initializers.TruncatedNormal(0.02),\n", - ")\n", - "default_embedding_cfg = dict(\n", - " vocab_size=cfg[\"vocab_size\"],\n", - " type_vocab_size=cfg[\"type_vocab_size\"],\n", - " hidden_size=cfg[\"hidden_size\"],\n", - " initializer=tf.keras.initializers.TruncatedNormal(0.02),\n", - " dropout_rate=cfg[\"dropout_rate\"],\n", - " max_seq_length=cfg[\"max_sequence_length\"]\n", - ")\n", - "default_kwargs = dict(\n", - " hidden_cfg=default_hidden_cfg,\n", - " embedding_cfg=default_embedding_cfg,\n", - " num_hidden_instances=cfg[\"num_layers\"],\n", - " pooled_output_dim=cfg[\"hidden_size\"],\n", - " return_all_layer_outputs=True,\n", - " pooler_layer_initializer=tf.keras.initializers.TruncatedNormal(0.02),\n", - ")\n", - "\n", - "encoder_scaffold = modeling.networks.EncoderScaffold(**default_kwargs)\n", - 
"classifier_model_from_encoder_scaffold = build_classifier(encoder_scaffold)\n", - "classifier_model_from_encoder_scaffold.set_weights(\n", - " canonical_classifier_model.get_weights())\n", - "predict(classifier_model_from_encoder_scaffold)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "sMaUmLyIuwcs" - }, - "source": [ - "#### Customize Embedding\n", - "\n", - "Next, we show how to use a customized embedding network.\n", - "\n", - "We first build an embedding network that will replace the default network. This one will have 2 inputs (`mask` and `word_ids`) instead of 3, and won't use positional embeddings." - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "LTinnaG6vcsw" - }, - "source": [ - "word_ids = tf.keras.layers.Input(\n", - " shape=(cfg['max_sequence_length'],), dtype=tf.int32, name=\"input_word_ids\")\n", - "mask = tf.keras.layers.Input(\n", - " shape=(cfg['max_sequence_length'],), dtype=tf.int32, name=\"input_mask\")\n", - "embedding_layer = modeling.layers.OnDeviceEmbedding(\n", - " vocab_size=cfg['vocab_size'],\n", - " embedding_width=cfg['hidden_size'],\n", - " initializer=tf.keras.initializers.TruncatedNormal(stddev=0.02),\n", - " name=\"word_embeddings\")\n", - "word_embeddings = embedding_layer(word_ids)\n", - "attention_mask = layers.SelfAttentionMask()([word_embeddings, mask])\n", - "new_embedding_network = tf.keras.Model([word_ids, mask],\n", - " [word_embeddings, attention_mask])" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "HN7_yu-6O3qI" - }, - "source": [ - "Inspecting `new_embedding_network`, we can see it takes two inputs:\n", - "`input_word_ids` and `input_mask`."
- ] - }, - { - "cell_type": "code", - "metadata": { - "id": "fO9zKFE4OpHp" - }, - "source": [ - "tf.keras.utils.plot_model(new_embedding_network, show_shapes=True, dpi=48)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "9cOaGQHLv12W" - }, - "source": [ - "We can then build a new encoder using the above `new_embedding_network`." - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "mtFDMNf2vIl9" - }, - "source": [ - "kwargs = dict(default_kwargs)\n", - "\n", - "# Use new embedding network.\n", - "kwargs['embedding_cls'] = new_embedding_network\n", - "kwargs['embedding_data'] = embedding_layer.embeddings\n", - "\n", - "encoder_with_customized_embedding = modeling.networks.EncoderScaffold(**kwargs)\n", - "classifier_model = build_classifier(encoder_with_customized_embedding)\n", - "# ... Train the model ...\n", - "print(classifier_model.inputs)\n", - "\n", - "# Assert that there are only two inputs.\n", - "assert len(classifier_model.inputs) == 2" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Z73ZQDtmwg9K" - }, - "source": [ - "#### Customize Transformer\n", - "\n", - "Users can also override the [hidden_cls](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/encoder_scaffold.py#L103) argument in `EncoderScaffold`'s constructor to employ a customized Transformer layer.\n", - "\n", - "See [ReZeroTransformer](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/rezero_transformer.py) for how to implement a customized Transformer layer.\n", - "\n", - "The following is an example of using `ReZeroTransformer`:\n" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "uAIarLZgw6pA" - }, - "source": [ - "kwargs = dict(default_kwargs)\n", - "\n", - "# Use ReZeroTransformer.\n", - "kwargs['hidden_cls'] = modeling.layers.ReZeroTransformer\n", - "\n", - "encoder_with_rezero_transformer = 
modeling.networks.EncoderScaffold(**kwargs)\n", - "classifier_model = build_classifier(encoder_with_rezero_transformer)\n", - "# ... Train the model ...\n", - "predict(classifier_model)\n", - "\n", - "# Assert that the variable `rezero_alpha` from ReZeroTransformer exists.\n", - "assert 'rezero_alpha' in ''.join([x.name for x in classifier_model.trainable_weights])" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6PMHFdvnxvR0" - }, - "source": [ - "### Use [TransformerScaffold](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/transformer_scaffold.py)\n", - "\n", - "The above method of customizing `Transformer` requires rewriting the whole `Transformer` layer, while sometimes you may only want to customize the attention layer or the feedforward block. In this case, [TransformerScaffold](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/transformer_scaffold.py) can be used.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "D6FejlgwyAy_" - }, - "source": [ - "#### Customize Attention Layer\n", - "\n", - "Users can also override the [attention_cls](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/transformer_scaffold.py#L45) argument in `TransformerScaffold`'s constructor to employ a customized Attention layer.\n", - "\n", - "See [TalkingHeadsAttention](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/talking_heads_attention.py) for how to implement a customized `Attention` layer.\n", - "\n", - "The following is an example of using [TalkingHeadsAttention](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/talking_heads_attention.py):" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "nFrSMrZuyNeQ" - }, - "source": [ - "# Use TalkingHeadsAttention\n", - "hidden_cfg = dict(default_hidden_cfg)\n", - "hidden_cfg['attention_cls'] = 
modeling.layers.TalkingHeadsAttention\n", - "\n", - "kwargs = dict(default_kwargs)\n", - "kwargs['hidden_cls'] = modeling.layers.TransformerScaffold\n", - "kwargs['hidden_cfg'] = hidden_cfg\n", - "\n", - "encoder = modeling.networks.EncoderScaffold(**kwargs)\n", - "classifier_model = build_classifier(encoder)\n", - "# ... Train the model ...\n", - "predict(classifier_model)\n", - "\n", - "# Assert that the variable `pre_softmax_weight` from TalkingHeadsAttention exists.\n", - "assert 'pre_softmax_weight' in ''.join([x.name for x in classifier_model.trainable_weights])" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "kuEJcTyByVvI" - }, - "source": [ - "#### Customize Feedforward Layer\n", - "\n", - "Similarly, one could also customize the feedforward layer.\n", - "\n", - "See [GatedFeedforward](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/gated_feedforward.py) for how to implement a customized feedforward layer.\n", - "\n", - "The following is an example of using [GatedFeedforward](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/gated_feedforward.py)." - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "XAbKy_l4y_-i" - }, - "source": [ - "# Use GatedFeedforward\n", - "hidden_cfg = dict(default_hidden_cfg)\n", - "hidden_cfg['feedforward_cls'] = modeling.layers.GatedFeedforward\n", - "\n", - "kwargs = dict(default_kwargs)\n", - "kwargs['hidden_cls'] = modeling.layers.TransformerScaffold\n", - "kwargs['hidden_cfg'] = hidden_cfg\n", - "\n", - "encoder_with_gated_feedforward = modeling.networks.EncoderScaffold(**kwargs)\n", - "classifier_model = build_classifier(encoder_with_gated_feedforward)\n", - "# ... 
Train the model ...\n", - "predict(classifier_model)\n", - "\n", - "# Assert that the variable `gate` from GatedFeedforward exists.\n", - "assert 'gate' in ''.join([x.name for x in classifier_model.trainable_weights])" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "a_8NWUhkzeAq" - }, - "source": [ - "### Build a new Encoder using building blocks from KerasBERT.\n", - "\n", - "Finally, you could also build a new encoder using building blocks in the modeling library.\n", - "\n", - "See [AlbertEncoder](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/albert_encoder.py) as an example:\n" - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "xsiA3RzUzmUM" - }, - "source": [ - "albert_encoder = modeling.networks.AlbertEncoder(**cfg)\n", - "classifier_model = build_classifier(albert_encoder)\n", - "# ... Train the model ...\n", - "predict(classifier_model)" - ], - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "MeidDfhlHKSO" - }, - "source": [ - "Inspecting the `albert_encoder`, we see it stacks the same `Transformer` layer multiple times." - ] - }, - { - "cell_type": "code", - "metadata": { - "id": "Uv_juT22HERW" - }, - "source": [ - "tf.keras.utils.plot_model(albert_encoder, show_shapes=True, dpi=48)" - ], - "execution_count": null, - "outputs": [] - } - ] -} \ No newline at end of file diff --git a/official/colab/nlp/nlp_modeling_library_intro.ipynb b/official/colab/nlp/nlp_modeling_library_intro.ipynb deleted file mode 100644 index e4ce780c96bfbf679c91891f38b08ac3b0bb983e..0000000000000000000000000000000000000000 --- a/official/colab/nlp/nlp_modeling_library_intro.ipynb +++ /dev/null @@ -1,544 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "80xnUmoI7fBX" - }, - "source": [ - "##### Copyright 2020 The TensorFlow Authors." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "cellView": "form", - "id": "8nvTnfs6Q692" - }, - "outputs": [], - "source": [ - "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", - "# you may not use this file except in compliance with the License.\n", - "# You may obtain a copy of the License at\n", - "#\n", - "# https://www.apache.org/licenses/LICENSE-2.0\n", - "#\n", - "# Unless required by applicable law or agreed to in writing, software\n", - "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", - "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", - "# See the License for the specific language governing permissions and\n", - "# limitations under the License." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WmfcMK5P5C1G" - }, - "source": [ - "# Introduction to the TensorFlow Models NLP library" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "cH-oJ8R6AHMK" - }, - "source": [ - "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n", - " \u003ctd\u003e\n", - " \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/official_models/nlp/nlp_modeling_library_intro\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n", - " \u003c/td\u003e\n", - " \u003ctd\u003e\n", - " \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/official/colab/nlp/nlp_modeling_library_intro.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n", - " \u003c/td\u003e\n", - " \u003ctd\u003e\n", - " \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/official/colab/nlp/nlp_modeling_library_intro.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on 
GitHub\u003c/a\u003e\n", - " \u003c/td\u003e\n", - " \u003ctd\u003e\n", - " \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/models/official/colab/nlp/nlp_modeling_library_intro.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n", - " \u003c/td\u003e\n", - "\u003c/table\u003e" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0H_EFIhq4-MJ" - }, - "source": [ - "## Learning objectives\n", - "\n", - "In this Colab notebook, you will learn how to build transformer-based models for common NLP tasks including pretraining, span labelling and classification using the building blocks from [NLP modeling library](https://github.com/tensorflow/models/tree/master/official/nlp/modeling)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "2N97-dps_nUk" - }, - "source": [ - "## Install and import" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "459ygAVl_rg0" - }, - "source": [ - "### Install the TensorFlow Model Garden pip package\n", - "\n", - "* `tf-models-official` is the stable Model Garden package. Note that it may not include the latest changes in the `tensorflow_models` github repo. To include latest changes, you may install `tf-models-nightly`,\n", - "which is the nightly Model Garden package created daily automatically.\n", - "* `pip` will install all models and dependencies automatically." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Y-qGkdh6_sZc" - }, - "outputs": [], - "source": [ - "!pip install -q tf-models-official==2.4.0" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "e4huSSwyAG_5" - }, - "source": [ - "### Import Tensorflow and other libraries" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "jqYXqtjBAJd9" - }, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "\n", - "from official.nlp import modeling\n", - "from official.nlp.modeling import layers, losses, models, networks" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "djBQWjvy-60Y" - }, - "source": [ - "## BERT pretraining model\n", - "\n", - "BERT ([Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)) introduced the method of pre-training language representations on a large text corpus and then using that model for downstream NLP tasks.\n", - "\n", - "In this section, we will learn how to build a model to pretrain BERT on the masked language modeling task and next sentence prediction task. For simplicity, we only show the minimum example and use dummy data." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "MKuHVlsCHmiq" - }, - "source": [ - "### Build a `BertPretrainer` model wrapping `BertEncoder`\n", - "\n", - "The [BertEncoder](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/bert_encoder.py) implements the Transformer-based encoder as described in [BERT paper](https://arxiv.org/abs/1810.04805). 
It includes the embedding lookups and transformer layers, but not the masked language model or classification task networks.\n", - "\n", - "The [BertPretrainer](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/models/bert_pretrainer.py) allows a user to pass in a transformer stack, and instantiates the masked language model and classification networks that are used to create the training objectives." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EXkcXz-9BwB3" - }, - "outputs": [], - "source": [ - "# Build a small transformer network.\n", - "vocab_size = 100\n", - "sequence_length = 16\n", - "network = modeling.networks.BertEncoder(\n", - " vocab_size=vocab_size, num_layers=2, sequence_length=16)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0NH5irV5KTMS" - }, - "source": [ - "Inspecting the encoder, we see it contains few embedding layers, stacked `Transformer` layers and are connected to three input layers:\n", - "\n", - "`input_word_ids`, `input_type_ids` and `input_mask`.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "lZNoZkBrIoff" - }, - "outputs": [], - "source": [ - "tf.keras.utils.plot_model(network, show_shapes=True, dpi=48)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "o7eFOZXiIl-b" - }, - "outputs": [], - "source": [ - "# Create a BERT pretrainer with the created network.\n", - "num_token_predictions = 8\n", - "bert_pretrainer = modeling.models.BertPretrainer(\n", - " network, num_classes=2, num_token_predictions=num_token_predictions, output='predictions')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d5h5HT7gNHx_" - }, - "source": [ - "Inspecting the `bert_pretrainer`, we see it wraps the `encoder` with additional `MaskedLM` and `Classification` heads." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2tcNfm03IBF7" - }, - "outputs": [], - "source": [ - "tf.keras.utils.plot_model(bert_pretrainer, show_shapes=True, dpi=48)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "F2oHrXGUIS0M" - }, - "outputs": [], - "source": [ - "# We can feed some dummy data to get masked language model and sentence output.\n", - "batch_size = 2\n", - "word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))\n", - "mask_data = np.random.randint(2, size=(batch_size, sequence_length))\n", - "type_id_data = np.random.randint(2, size=(batch_size, sequence_length))\n", - "masked_lm_positions_data = np.random.randint(2, size=(batch_size, num_token_predictions))\n", - "\n", - "outputs = bert_pretrainer(\n", - " [word_id_data, mask_data, type_id_data, masked_lm_positions_data])\n", - "lm_output = outputs[\"masked_lm\"]\n", - "sentence_output = outputs[\"classification\"]\n", - "print(lm_output)\n", - "print(sentence_output)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "bnx3UCHniCS5" - }, - "source": [ - "### Compute loss\n", - "Next, we can use `lm_output` and `sentence_output` to compute `loss`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "k30H4Q86f52x" - }, - "outputs": [], - "source": [ - "masked_lm_ids_data = np.random.randint(vocab_size, size=(batch_size, num_token_predictions))\n", - "masked_lm_weights_data = np.random.randint(2, size=(batch_size, num_token_predictions))\n", - "next_sentence_labels_data = np.random.randint(2, size=(batch_size))\n", - "\n", - "mlm_loss = modeling.losses.weighted_sparse_categorical_crossentropy_loss(\n", - " labels=masked_lm_ids_data,\n", - " predictions=lm_output,\n", - " weights=masked_lm_weights_data)\n", - "sentence_loss = modeling.losses.weighted_sparse_categorical_crossentropy_loss(\n", - " labels=next_sentence_labels_data,\n", - " predictions=sentence_output)\n", - "loss = mlm_loss + sentence_loss\n", - "print(loss)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "wrmSs8GjHxVw" - }, - "source": [ - "With the loss, you can optimize the model.\n", - "After training, we can save the weights of TransformerEncoder for the downstream fine-tuning tasks. Please see [run_pretraining.py](https://github.com/tensorflow/models/blob/master/official/nlp/bert/run_pretraining.py) for the full example.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "k8cQVFvBCV4s" - }, - "source": [ - "## Span labeling model\n", - "\n", - "Span labeling is the task to assign labels to a span of the text, for example, label a span of text as the answer of a given question.\n", - "\n", - "In this section, we will learn how to build a span labeling model. Again, we use dummy data for simplicity." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "xrLLEWpfknUW" - }, - "source": [ - "### Build a BertSpanLabeler wrapping BertEncoder\n", - "\n", - "[BertSpanLabeler](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/models/bert_span_labeler.py) implements a simple single-span start-end predictor (that is, a model that predicts two values: a start token index and an end token index), suitable for SQuAD-style tasks.\n", - "\n", - "Note that `BertSpanLabeler` wraps a `BertEncoder`, the weights of which can be restored from the above pretraining model.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "B941M4iUCejO" - }, - "outputs": [], - "source": [ - "network = modeling.networks.BertEncoder(\n", - " vocab_size=vocab_size, num_layers=2, sequence_length=sequence_length)\n", - "\n", - "# Create a BERT trainer with the created network.\n", - "bert_span_labeler = modeling.models.BertSpanLabeler(network)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "QpB9pgj4PpMg" - }, - "source": [ - "Inspecting the `bert_span_labeler`, we see it wraps the encoder with additional `SpanLabeling` that outputs `start_position` and `end_postion`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "RbqRNJCLJu4H" - }, - "outputs": [], - "source": [ - "tf.keras.utils.plot_model(bert_span_labeler, show_shapes=True, dpi=48)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "fUf1vRxZJwio" - }, - "outputs": [], - "source": [ - "# Create a set of 2-dimensional data tensors to feed into the model.\n", - "word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))\n", - "mask_data = np.random.randint(2, size=(batch_size, sequence_length))\n", - "type_id_data = np.random.randint(2, size=(batch_size, sequence_length))\n", - "\n", - "# Feed the data to the model.\n", - "start_logits, end_logits = bert_span_labeler([word_id_data, mask_data, type_id_data])\n", - "print(start_logits)\n", - "print(end_logits)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "WqhgQaN1lt-G" - }, - "source": [ - "### Compute loss\n", - "With `start_logits` and `end_logits`, we can compute loss:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "waqs6azNl3Nn" - }, - "outputs": [], - "source": [ - "start_positions = np.random.randint(sequence_length, size=(batch_size))\n", - "end_positions = np.random.randint(sequence_length, size=(batch_size))\n", - "\n", - "start_loss = tf.keras.losses.sparse_categorical_crossentropy(\n", - " start_positions, start_logits, from_logits=True)\n", - "end_loss = tf.keras.losses.sparse_categorical_crossentropy(\n", - " end_positions, end_logits, from_logits=True)\n", - "\n", - "total_loss = (tf.reduce_mean(start_loss) + tf.reduce_mean(end_loss)) / 2\n", - "print(total_loss)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Zdf03YtZmd_d" - }, - "source": [ - "With the `loss`, you can optimize the model. Please see [run_squad.py](https://github.com/tensorflow/models/blob/master/official/nlp/bert/run_squad.py) for the full example." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0A1XnGSTChg9" - }, - "source": [ - "## Classification model\n", - "\n", - "In the last section, we show how to build a text classification model.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "MSK8OpZgnQa9" - }, - "source": [ - "### Build a BertClassifier model wrapping BertEncoder\n", - "\n", - "[BertClassifier](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/models/bert_classifier.py) implements a [CLS] token classification model containing a single classification head." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cXXCsffkCphk" - }, - "outputs": [], - "source": [ - "network = modeling.networks.BertEncoder(\n", - " vocab_size=vocab_size, num_layers=2, sequence_length=sequence_length)\n", - "\n", - "# Create a BERT trainer with the created network.\n", - "num_classes = 2\n", - "bert_classifier = modeling.models.BertClassifier(\n", - " network, num_classes=num_classes)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8tZKueKYP4bB" - }, - "source": [ - "Inspecting the `bert_classifier`, we see it wraps the `encoder` with additional `Classification` head." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "snlutm9ZJgEZ" - }, - "outputs": [], - "source": [ - "tf.keras.utils.plot_model(bert_classifier, show_shapes=True, dpi=48)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "yyHPHsqBJkCz" - }, - "outputs": [], - "source": [ - "# Create a set of 2-dimensional data tensors to feed into the model.\n", - "word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))\n", - "mask_data = np.random.randint(2, size=(batch_size, sequence_length))\n", - "type_id_data = np.random.randint(2, size=(batch_size, sequence_length))\n", - "\n", - "# Feed the data to the model.\n", - "logits = bert_classifier([word_id_data, mask_data, type_id_data])\n", - "print(logits)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "w--a2mg4nzKm" - }, - "source": [ - "### Compute loss\n", - "\n", - "With `logits`, we can compute `loss`:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "9X0S1DoFn_5Q" - }, - "outputs": [], - "source": [ - "labels = np.random.randint(num_classes, size=(batch_size))\n", - "\n", - "loss = tf.keras.losses.sparse_categorical_crossentropy(\n", - " labels, logits, from_logits=True)\n", - "print(loss)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "mzBqOylZo3og" - }, - "source": [ - "With the `loss`, you can optimize the model. Please see [run_classifier.py](https://github.com/tensorflow/models/blob/master/official/nlp/bert/run_classifier.py) or the colab [fine_tuning_bert.ipynb](https://github.com/tensorflow/models/blob/master/official/colab/fine_tuning_bert.ipynb) for the full example." 
- ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "Introduction to the TensorFlow Models NLP library", - "private_outputs": true, - "provenance": [], - "toc_visible": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} diff --git a/official/common/__init__.py b/official/common/__init__.py index a25710c222e3327cb20e000db5df5c5651c4a2cc..ba97902e7ec1e12871c0fad301b9ce48c92cf1d1 100644 --- a/official/common/__init__.py +++ b/official/common/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/common/dataset_fn.py b/official/common/dataset_fn.py index 4ac16a31b555588368a6c0aba73adbe62a95c2eb..52138d717a0e9a7bdb2ad1c0006966916ecf9910 100644 --- a/official/common/dataset_fn.py +++ b/official/common/dataset_fn.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -28,7 +28,8 @@ # ============================================================================== """Utility library for picking an appropriate dataset function.""" -from typing import Any, Callable, Union, Type +import functools +from typing import Any, Callable, Type, Union import tensorflow as tf @@ -38,5 +39,6 @@ PossibleDatasetType = Union[Type[tf.data.Dataset], Callable[[tf.Tensor], Any]] def pick_dataset_fn(file_type: str) -> PossibleDatasetType: if file_type == 'tfrecord': return tf.data.TFRecordDataset - + if file_type == 'tfrecord_compressed': + return functools.partial(tf.data.TFRecordDataset, compression_type='GZIP') raise ValueError('Unrecognized file_type: {}'.format(file_type)) diff --git a/official/common/distribute_utils.py b/official/common/distribute_utils.py index c48d68d6d93111e0959c9bbfde1e767fc673a979..bce9bf25ad5f4ea5dec88b61735af69a88d6133f 100644 --- a/official/common/distribute_utils.py +++ b/official/common/distribute_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -96,7 +96,7 @@ def get_distribution_strategy(distribution_strategy="mirrored", num_packs=1, tpu_address=None, **kwargs): - """Return a DistributionStrategy for running the model. + """Return a Strategy for running the model. Args: distribution_strategy: a string specifying which distribution strategy to @@ -119,7 +119,7 @@ def get_distribution_strategy(distribution_strategy="mirrored", **kwargs: Additional kwargs for internal usages. Returns: - tf.distribute.DistibutionStrategy object. + tf.distribute.Strategy object. 
Raises: ValueError: if `distribution_strategy` is "off" or "one_device" and `num_gpus` is larger than 1; or `num_gpus` is negative or if diff --git a/official/common/distribute_utils_test.py b/official/common/distribute_utils_test.py index 8e49d366651de0d891c72b4494d743d052e4f749..f06ee3ba628b7b4fe385a73311fbf453f5e3c0e6 100644 --- a/official/common/distribute_utils_test.py +++ b/official/common/distribute_utils_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/common/flags.py b/official/common/flags.py index 01ddf57af3872b0ad6f425602b5029ece9def707..5e15856416945708bda5fbab5110bb83f838c667 100644 --- a/official/common/flags.py +++ b/official/common/flags.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -45,7 +45,8 @@ def define_flags(): default=None, enum_values=[ 'train', 'eval', 'train_and_eval', 'continuous_eval', - 'continuous_train_and_eval', 'train_and_validate' + 'continuous_train_and_eval', 'train_and_validate', + 'train_and_post_eval' ], help='Mode to run: `train`, `eval`, `train_and_eval`, ' '`continuous_eval`, `continuous_train_and_eval` and ' diff --git a/official/common/registry_imports.py b/official/common/registry_imports.py index 06f3384db6283cbef08070f3678d0afe36e50c08..eb9af692a4ad144260807623b4b0e6b8ebac11e4 100644 --- a/official/common/registry_imports.py +++ b/official/common/registry_imports.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -14,7 +14,7 @@ """All necessary imports for registration.""" # pylint: disable=unused-import +from official import vision from official.nlp import tasks from official.nlp.configs import experiment_configs from official.utils.testing import mock_task -from official.vision import beta diff --git a/official/common/streamz_counters.py b/official/common/streamz_counters.py index ab3df36ce6077d2dafd25eb199fc0370852795e5..5def620ec752922e68e97811451b9aa47c21e01a 100644 --- a/official/common/streamz_counters.py +++ b/official/common/streamz_counters.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/core/__init__.py b/official/core/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..48624e238c195b7b65a075113815bd8b6b682be3 100644 --- a/official/core/__init__.py +++ b/official/core/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,3 +12,20 @@ # See the License for the specific language governing permissions and # limitations under the License. 
+"""Core is shared by both `nlp` and `vision`.""" + +from official.core import actions +from official.core import base_task +from official.core import base_trainer +from official.core import config_definitions +from official.core import exp_factory +from official.core import export_base +from official.core import file_writers +from official.core import input_reader +from official.core import registry +from official.core import savedmodel_checkpoint_manager +from official.core import task_factory +from official.core import tf_example_builder +from official.core import tf_example_feature_key +from official.core import train_lib +from official.core import train_utils diff --git a/official/core/actions.py b/official/core/actions.py index 20453a829689b0d6cbd5735df936afea2bca6c12..4d51d30943674685d24912d6d64f7170b88be6c3 100644 --- a/official/core/actions.py +++ b/official/core/actions.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -21,7 +21,6 @@ from absl import logging import gin import orbit import tensorflow as tf -import tensorflow_model_optimization as tfmot from official.core import base_trainer from official.core import config_definitions @@ -52,6 +51,8 @@ class PruningAction: optimizer: `tf.keras.optimizers.Optimizer` optimizer instance used for training. This will be used to find the current training steps. """ + # TODO(b/221490190): Avoid local import when the bug is fixed. + import tensorflow_model_optimization as tfmot # pylint: disable=g-import-not-at-top self._optimizer = optimizer self.update_pruning_step = tfmot.sparsity.keras.UpdatePruningStep() self.update_pruning_step.set_model(model) @@ -201,7 +202,7 @@ def get_train_actions( """Gets train actions for TFM trainer.""" train_actions = [] # Adds pruning callback actions. 
- if hasattr(params.task, 'pruning'): + if hasattr(params.task, 'pruning') and params.task.pruning: train_actions.append( PruningAction( export_dir=model_dir, diff --git a/official/core/actions_test.py b/official/core/actions_test.py index 017fa606d2fd57e797d320d8acc6fd9c5062f512..a42360b66ad8878f18449045ecf768dcd685af96 100644 --- a/official/core/actions_test.py +++ b/official/core/actions_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/core/base_task.py b/official/core/base_task.py index db29395d66eb24f0ea0465838ea5947c2545fb6b..56b9bc4392effcaa9a9acd4ee1c7c7ac80604e50 100644 --- a/official/core/base_task.py +++ b/official/core/base_task.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -14,6 +14,7 @@ """Defines the base task abstraction.""" import abc +import functools from typing import Optional from absl import logging @@ -22,9 +23,12 @@ import tensorflow as tf from official.core import config_definitions from official.modeling import optimization from official.modeling import performance +from official.modeling.privacy import configs +from official.modeling.privacy import ops OptimizationConfig = optimization.OptimizationConfig RuntimeConfig = config_definitions.RuntimeConfig +DifferentialPrivacyConfig = configs.DifferentialPrivacyConfig class Task(tf.Module, metaclass=abc.ABCMeta): @@ -65,18 +69,35 @@ class Task(tf.Module, metaclass=abc.ABCMeta): @classmethod def create_optimizer(cls, optimizer_config: OptimizationConfig, - runtime_config: Optional[RuntimeConfig] = None): + runtime_config: Optional[RuntimeConfig] = None, + dp_config: Optional[DifferentialPrivacyConfig] = None): """Creates an TF optimizer from configurations. Args: optimizer_config: the parameters of the Optimization settings. runtime_config: the parameters of the runtime. + dp_config: the parameters of differential privacy. Returns: A tf.optimizers.Optimizer object. """ + gradient_transformers = None + if dp_config is not None: + logging.info("Adding differential privacy transform with config %s.", + dp_config.as_dict()) + noise_stddev = dp_config.clipping_norm * dp_config.noise_multiplier + gradient_transformers = [ + functools.partial( + ops.clip_l2_norm, l2_norm_clip=dp_config.clipping_norm), + functools.partial( + ops.add_noise, noise_stddev=noise_stddev) + ] + opt_factory = optimization.OptimizerFactory(optimizer_config) - optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) + optimizer = opt_factory.build_optimizer( + opt_factory.build_learning_rate(), + gradient_transformers=gradient_transformers + ) # Configuring optimizer when loss_scale is set in runtime config. This helps # avoiding overflow/underflow for float16 computations.
if runtime_config: @@ -101,9 +122,11 @@ class Task(tf.Module, metaclass=abc.ABCMeta): ckpt_dir_or_file = self.task_config.init_checkpoint logging.info("Trying to load pretrained checkpoint from %s", ckpt_dir_or_file) - if tf.io.gfile.isdir(ckpt_dir_or_file): + if ckpt_dir_or_file and tf.io.gfile.isdir(ckpt_dir_or_file): ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) if not ckpt_dir_or_file: + logging.info("No checkpoint file found from %s. Will not load.", + ckpt_dir_or_file) return if hasattr(model, "checkpoint_items"): diff --git a/official/core/base_trainer.py b/official/core/base_trainer.py index a45ea9b9988e34e96e5267266ef863dcb8b48342..4ac35b47da2dd70dc5b7fdc584519f68899db66d 100644 --- a/official/core/base_trainer.py +++ b/official/core/base_trainer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -33,57 +33,6 @@ ExperimentConfig = config_definitions.ExperimentConfig TrainerConfig = config_definitions.TrainerConfig -class Recovery: - """Built-in model blowup recovery module. - - Checks the loss value by the given threshold. If applicable, recover the - model by reading the checkpoint on disk. 
- """ - - def __init__(self, - loss_upper_bound: float, - checkpoint_manager: tf.train.CheckpointManager, - recovery_begin_steps: int = 0, - recovery_max_trials: int = 3): - self.recover_counter = 0 - self.recovery_begin_steps = recovery_begin_steps - self.recovery_max_trials = recovery_max_trials - self.loss_upper_bound = loss_upper_bound - self.checkpoint_manager = checkpoint_manager - - def should_recover(self, loss_value, global_step): - if tf.math.is_nan(loss_value): - return True - if (global_step >= self.recovery_begin_steps and - loss_value > self.loss_upper_bound): - return True - return False - - def maybe_recover(self, loss_value, global_step): - """Conditionally recovers the training by triggering checkpoint restoration. - - Args: - loss_value: the loss value as a float. - global_step: the number of global training steps. - - Raises: - RuntimeError: when recovery happens more than the max number of trials, - the job should crash. - """ - if not self.should_recover(loss_value, global_step): - return - self.recover_counter += 1 - if self.recover_counter > self.recovery_max_trials: - raise RuntimeError( - "The loss value is NaN or out of range after training loop and " - f"this happens {self.recover_counter} times.") - # Loads the previous good checkpoint. - checkpoint_path = self.checkpoint_manager.restore_or_initialize() - logging.warning( - "Recovering the model from checkpoint: %s. 
The loss value becomes " - "%f at step %d.", checkpoint_path, loss_value, global_step) - - class _AsyncTrainer(orbit.StandardTrainer, orbit.StandardEvaluator): """Trainer class for both sync and async Strategy.""" @@ -370,6 +319,11 @@ class Trainer(_AsyncTrainer): """Accesses the training checkpoint.""" return self._checkpoint + @property + def checkpoint_exporter(self): + """Accesses the checkpoint exporter.""" + return self._checkpoint_exporter + def train_loop_end(self): """See base class.""" self.join() diff --git a/official/core/base_trainer_test.py b/official/core/base_trainer_test.py index e50a5bcb7c2889e2aceda3b82c8916b65718eb05..10cf98e32bae7d2c88b0c7794fa6f3a76626fb92 100644 --- a/official/core/base_trainer_test.py +++ b/official/core/base_trainer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -150,30 +150,6 @@ class MockAsyncTrainer(trainer_lib._AsyncTrainer): return self.eval_global_step.numpy() -class RecoveryTest(tf.test.TestCase): - - def test_recovery_module(self): - ckpt = tf.train.Checkpoint(v=tf.Variable(1, dtype=tf.int32)) - model_dir = self.get_temp_dir() - manager = tf.train.CheckpointManager(ckpt, model_dir, max_to_keep=1) - recovery_module = trainer_lib.Recovery( - loss_upper_bound=1.0, - checkpoint_manager=manager, - recovery_begin_steps=1, - recovery_max_trials=1) - self.assertFalse(recovery_module.should_recover(1.1, 0)) - self.assertFalse(recovery_module.should_recover(0.1, 1)) - self.assertTrue(recovery_module.should_recover(1.1, 2)) - - # First triggers the recovery once. - recovery_module.maybe_recover(1.1, 10) - - # Second time, it raises. 
- with self.assertRaisesRegex( - RuntimeError, 'The loss value is NaN .*'): - recovery_module.maybe_recover(1.1, 10) - - class TrainerTest(tf.test.TestCase, parameterized.TestCase): def setUp(self): @@ -343,7 +319,9 @@ class TrainerTest(tf.test.TestCase, parameterized.TestCase): self.assertFalse(trainer.optimizer.dynamic) self.assertEqual(trainer.optimizer.initial_scale, loss_scale) else: - self.assertIsInstance(trainer.optimizer, tf.keras.optimizers.SGD) + self.assertIsInstance( + trainer.optimizer, + (tf.keras.optimizers.SGD, tf.keras.optimizers.legacy.SGD)) metrics = trainer.train(tf.convert_to_tensor(5, dtype=tf.int32)) self.assertIn('training_loss', metrics) diff --git a/official/core/config_definitions.py b/official/core/config_definitions.py index 3bca789b5221d7e51ed112cfa753613febae11c7..abc09953c36c6952c669bd101d4b966cc7fb3155 100644 --- a/official/core/config_definitions.py +++ b/official/core/config_definitions.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,6 +19,7 @@ from typing import Optional, Sequence, Union from official.modeling.hyperparams import base_config from official.modeling.optimization.configs import optimization_config +from official.modeling.privacy import configs as dp_configs OptimizationConfig = optimization_config.OptimizationConfig @@ -61,7 +62,7 @@ class DataConfig(base_config.Config): tf_data_service_address: The URI of a tf.data service to offload preprocessing onto during training. The URI should be in the format "protocol://address", e.g. "grpc://tf-data-service:5050". It can be - overridden by `FLAGS.tf_data_service` flag in the binary. + overridden by `FLAGS.tf_data_service` flag in the binary. tf_data_service_job_name: The name of the tf.data service job. 
This argument makes it possible for multiple datasets to share the same job. The default behavior is that the dataset creates anonymous, exclusively owned jobs. @@ -74,7 +75,35 @@ class DataConfig(base_config.Config): decoding when loading dataset from TFDS. Use comma to separate multiple features. The main use case is to skip the image/video decoding for better performance. + enable_shared_tf_data_service_between_parallel_trainers: A bool. When set to + true, only a single tf.data service will be started, and it will be shared + between all the trainers running simultaneously, e.g. when using vizier to + tune hyperparameters. This saves CPU and RAM resources compared to running + a separate tf.data service for each trainer. Notice that if the batch size + differs between trainers, the field + apply_tf_data_service_before_batching also needs to be true so that only a + single tf.data service instance will be created. In this case, tf.data + service will be applied before the batching operation, so make sure not to + apply any processing steps after batching (e.g. in postprocess_fn), since + they wouldn't be parallelized by tf.data service and may slow down your + tf.data pipeline. When using shared tf.data service, the tf.data dataset + must be infinite, and slow trainers may skip certain training examples. + More details about shared tf.data service can be found at: + https://www.tensorflow.org/api_docs/python/tf/data/experimental/service#sharing_tfdata_service_with_concurrent_trainers. + apply_tf_data_service_before_batching: A bool. If set to True, tf.data + service will be applied before the batching operation. This is useful to + make sure only a single tf.data service instance is created when + enable_shared_tf_data_service_between_parallel_trainers is true and the + batch size changes between parallel trainers. + trainer_id: A string. The id of the trainer if there are multiple parallel + trainers running at the same time, e.g. in the vizier tuning case.
It will be + automatically set if this field is needed. Users do not need to set it + when creating experiment configs. seed: An optional seed to use for deterministic shuffling/preprocessing. + prefetch_buffer_size: An int specifying the buffer size of prefetch + datasets. If None, the buffer size is autotuned. Specifying this is useful + in case autotuning uses up too much memory by making the buffer size too + high. """ input_path: Union[Sequence[str], str, base_config.Config] = "" tfds_name: str = "" @@ -94,7 +123,11 @@ class DataConfig(base_config.Config): tfds_data_dir: str = "" tfds_as_supervised: bool = False tfds_skip_decoding_feature: str = "" + enable_shared_tf_data_service_between_parallel_trainers: bool = False + apply_tf_data_service_before_batching: bool = False + trainer_id: Optional[str] = None seed: Optional[int] = None + prefetch_buffer_size: Optional[int] = None @dataclasses.dataclass @@ -189,8 +222,8 @@ class TrainerConfig(base_config.Config): is only used continuous_train_and_eval and continuous_eval modes. Default value is 1 hrs. train_steps: number of train steps. - validation_steps: number of eval steps. If `None`, the entire eval dataset - is used. + validation_steps: number of eval steps. If -1, the entire eval dataset is + used. validation_interval: number of training steps to run between evaluations.
best_checkpoint_export_subdir: if set, the trainer will keep track of the best evaluation metric, and export the corresponding best checkpoint under @@ -240,11 +273,17 @@ class TrainerConfig(base_config.Config): @dataclasses.dataclass class TaskConfig(base_config.Config): + """Config passed to task.""" init_checkpoint: str = "" model: Optional[base_config.Config] = None train_data: DataConfig = DataConfig() validation_data: DataConfig = DataConfig() name: Optional[str] = None + # Configs for differential privacy + # These configs are only effective if you use create_optimizer in + # tensorflow_models/official/core/base_task.py + differential_privacy_config: Optional[ + dp_configs.DifferentialPrivacyConfig] = None @dataclasses.dataclass diff --git a/official/core/exp_factory.py b/official/core/exp_factory.py index b10d49acbdaeee211194ed0c018cb91493d74d21..fef7444987715944026926875fe0edeec67704ab 100644 --- a/official/core/exp_factory.py +++ b/official/core/exp_factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/core/export_base.py b/official/core/export_base.py index a300a120d7ce42346c59cf796c07c689f32447c8..0ee9163d725b0a7f0139803cc2296d21663bb86b 100644 --- a/official/core/export_base.py +++ b/official/core/export_base.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -68,8 +68,17 @@ class ExportModule(tf.Module, metaclass=abc.ABCMeta): if inference_step is not None: self.inference_step = functools.partial(inference_step, model=self.model) else: - self.inference_step = functools.partial( - self.model.__call__, training=False) + if issubclass(type(model), tf.keras.Model): + # Default to self.model.call instead of self.model.__call__ to avoid + # Keras tracing logic designed for training. + # Since the call methods of most Model Garden models do not have a + # training kwarg, or default it to False, we don't pass anything here. + # Please pass a custom inference step if your model defaults to + # training=True. + self.inference_step = self.model.call + else: + self.inference_step = functools.partial( + self.model.__call__, training=False) self.preprocessor = preprocessor self.postprocessor = postprocessor diff --git a/official/core/export_base_test.py b/official/core/export_base_test.py index c76dfa326dafbec1b13dda9854ed8944149fa589..e08a4a420f98cdd5711029fb0ab04829a2ba7625 100644 --- a/official/core/export_base_test.py +++ b/official/core/export_base_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/core/file_writers.py b/official/core/file_writers.py new file mode 100644 index 0000000000000000000000000000000000000000..dd8446bbe8b01286bc244f1f5e2e0e0d23daeb58 --- /dev/null +++ b/official/core/file_writers.py @@ -0,0 +1,80 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""File writer functions for dataset preparation, infra validation, and unit tests.""" + +import io +from typing import Optional, Sequence, Union + +import tensorflow as tf + + +def write_small_dataset(examples: Sequence[Union[tf.train.Example, + tf.train.SequenceExample]], + output_path: str, + file_type: str = 'tfrecord') -> None: + """Writes `examples` to a file at `output_path` with type `file_type`. + + CAVEAT: This function is not recommended for writing large datasets, since it + will loop through `examples` and perform write operations sequentially. + + Args: + examples: List of tf.train.Example or tf.train.SequenceExample. + output_path: Output path for the dataset. + file_type: A string indicating the file format, could be: 'tfrecord', + 'tfrecords', 'tfrecord_compressed', 'tfrecords_gzip', 'riegeli'. The + string is case insensitive. + """ + file_type = file_type.lower() + + if file_type == 'tfrecord' or file_type == 'tfrecords': + _write_tfrecord(examples, output_path) + elif file_type == 'tfrecord_compressed' or file_type == 'tfrecords_gzip': + _write_tfrecord(examples, output_path, + tf.io.TFRecordOptions(compression_type='GZIP')) + elif file_type == 'riegeli': + _write_riegeli(examples, output_path) + else: + raise ValueError(f'Unknown file_type: {file_type}') + + +def _write_tfrecord(examples: Sequence[Union[tf.train.Example, + tf.train.SequenceExample]], + output_path: str, + options: Optional[tf.io.TFRecordOptions] = None) -> None: + """Writes `examples` to a TFRecord file at `output_path`.
+ + Args: + examples: A list of tf.train.Example. + output_path: Output path for the dataset. + options: Options used for manipulating TFRecord files. + """ + with tf.io.TFRecordWriter(output_path, options) as writer: + for example in examples: + writer.write(example.SerializeToString()) + + +def _write_riegeli(examples: Sequence[Union[tf.train.Example, + tf.train.SequenceExample]], + output_path: str) -> None: + """Writes `examples` to a Riegeli file at `output_path`. + + Args: + examples: A list of tf.train.Example. + output_path: Output path for the dataset. + """ + with io.FileIO(output_path, 'wb') as fileio: + import riegeli # pylint: disable=g-import-not-at-top + with riegeli.RecordWriter(fileio) as writer: + writer.write_messages(examples) diff --git a/official/core/file_writers_test.py b/official/core/file_writers_test.py new file mode 100644 index 0000000000000000000000000000000000000000..a281964f1ee547ed0601f9bc3360a697e4486d90 --- /dev/null +++ b/official/core/file_writers_test.py @@ -0,0 +1,53 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for file_writers.""" + +import os +from absl.testing import parameterized +import tensorflow as tf + +from official.core import file_writers +from official.core import tf_example_builder + + +class FileWritersTest(tf.test.TestCase, parameterized.TestCase): + + def setUp(self): + super().setUp() + example_builder = tf_example_builder.TfExampleBuilder() + example_builder.add_bytes_feature('foo', 'Hello World!') + self._example = example_builder.example + + @parameterized.parameters('tfrecord', 'TFRecord', 'tfrecords', + 'tfrecord_compressed', 'TFRecord_Compressed', + 'tfrecords_gzip') + def test_write_small_dataset_success(self, file_type): + temp_dir = self.create_tempdir() + temp_dataset_file = os.path.join(temp_dir.full_path, 'train') + file_writers.write_small_dataset([self._example], temp_dataset_file, + file_type) + self.assertTrue(os.path.exists(temp_dataset_file)) + + def test_write_small_dataset_unrecognized_format(self): + file_type = 'bar' + temp_dir = self.create_tempdir() + temp_dataset_file = os.path.join(temp_dir.full_path, 'train') + with self.assertRaises(ValueError): + file_writers.write_small_dataset([self._example], temp_dataset_file, + file_type) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/core/input_reader.py b/official/core/input_reader.py index 736172b6a25723d6419cbfc267f567b242483a1e..76933fbbb3fac4e80f521b98c39dd4586aa10061 100644 --- a/official/core/input_reader.py +++ b/official/core/input_reader.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -160,22 +160,44 @@ def _read_tfds(tfds_builder: tfds.core.DatasetBuilder, """Reads a dataset from tfds.""" # No op if exist. 
tfds_builder.download_and_prepare() - - read_config = tfds.ReadConfig( - interleave_cycle_length=cycle_length, - interleave_block_length=block_length, - input_context=input_context, - shuffle_seed=seed) decoders = {} if tfds_skip_decoding_feature: for skip_feature in tfds_skip_decoding_feature.split(','): decoders[skip_feature.strip()] = tfds.decode.SkipDecoding() - dataset = tfds_builder.as_dataset( - split=tfds_split, - shuffle_files=is_training, - as_supervised=tfds_as_supervised, - decoders=decoders, - read_config=read_config) + if tfds_builder.info.splits: + num_shards = len(tfds_builder.info.splits[tfds_split].file_instructions) + else: + # The tfds mock path often does not provide splits. + num_shards = 1 + if input_context and num_shards < input_context.num_input_pipelines: + # The number of files in the dataset split is smaller than the number of + # input pipelines. We read the entire dataset first and then shard in the + # host memory. + read_config = tfds.ReadConfig( + interleave_cycle_length=cycle_length, + interleave_block_length=block_length, + input_context=None, + shuffle_seed=seed) + dataset = tfds_builder.as_dataset( + split=tfds_split, + shuffle_files=is_training, + as_supervised=tfds_as_supervised, + decoders=decoders, + read_config=read_config) + dataset = dataset.shard(input_context.num_input_pipelines, + input_context.input_pipeline_id) + else: + read_config = tfds.ReadConfig( + interleave_cycle_length=cycle_length, + interleave_block_length=block_length, + input_context=input_context, + shuffle_seed=seed) + dataset = tfds_builder.as_dataset( + split=tfds_split, + shuffle_files=is_training, + as_supervised=tfds_as_supervised, + decoders=decoders, + read_config=read_config) if is_training and not cache: dataset = dataset.repeat() @@ -270,6 +292,8 @@ class InputReader: self._transform_and_batch_fn = transform_and_batch_fn self._postprocess_fn = postprocess_fn self._seed = params.seed + self._prefetch_buffer_size = ( + 
params.prefetch_buffer_size or tf.data.experimental.AUTOTUNE) # When tf.data service is enabled, each data service worker should get # different random seeds. Thus, we set `seed` to None. @@ -282,13 +306,36 @@ class InputReader: self._enable_tf_data_service = ( params.enable_tf_data_service and params.tf_data_service_address) self._tf_data_service_address = params.tf_data_service_address + self._enable_shared_tf_data_service_between_parallel_trainers = ( + params.enable_shared_tf_data_service_between_parallel_trainers) + self._apply_tf_data_service_before_batching = ( + params.apply_tf_data_service_before_batching) + self._trainer_id = params.trainer_id if self._enable_tf_data_service: # Add a random seed as the tf.data service job name suffix, so tf.data # service doesn't reuse the previous state if TPU worker gets preempted. + # It's necessary to add the global batch size to the tf data service job + # name because, when tuning the batch size with vizier while tf data service + # is also enabled, the tf data service job name should be different for + # different vizier trials: once the batch size changes, from the + # tf.data perspective, the dataset is a different instance, and a + # different job name should be used for tf data service. Otherwise, the + # model would read tensors from the incorrect tf data service job, which + # would cause a dimension mismatch on the batch size dimension. self._tf_data_service_job_name = ( - params.tf_data_service_job_name + str(self.static_randnum)) + f'{params.tf_data_service_job_name}_bs{params.global_batch_size}_' + f'{self.static_randnum}') self._enable_round_robin_tf_data_service = params.get( 'enable_round_robin_tf_data_service', False) + if self._enable_shared_tf_data_service_between_parallel_trainers: + # When shared tf.data service is enabled, only a single tf.data service + # instance should be created and shared between parallel trainers.
If + # the global batch size is different across trainers, + # params.apply_tf_data_service_before_batching should be set to true + # because tf.data service with different batch sizes will be considered + # separate tf.data service instances. + self._tf_data_service_job_name = ( + f'{params.tf_data_service_job_name}_{self.static_randnum}') @property def tfds_info(self) -> tfds.core.DatasetInfo: @@ -411,6 +458,19 @@ class InputReader: dataset = dataset.repeat() dataset = dataset.shuffle(self._shuffle_buffer_size, seed=self._seed) + # Applies tf.data service before batching operations. This is useful when + # tf.data service is shared between parallel trainers, and batch size is + # changing between parallel trainers. When the batch size is changing, tf.data + # services will be considered different instances if applied after batching + # operations, which makes it difficult to share between parallel trainers. + # However, if there are additional expensive operations in + # self._transform_and_batch_fn and self._postprocess_fn, the entire tf.data + # pipeline could be slowed down. In this case, try to move these dataset + # operations into earlier stages if possible.
+ if (self._enable_shared_tf_data_service_between_parallel_trainers and + self._apply_tf_data_service_before_batching): + dataset = self._maybe_apply_data_service(dataset, input_context) + if self._transform_and_batch_fn is not None: dataset = self._transform_and_batch_fn(dataset, input_context) else: @@ -436,13 +496,18 @@ class InputReader: num_consumers = input_context.num_input_pipelines * ( replicas_per_input_pipeline) range_dataset = tf.data.Dataset.range(replicas_per_input_pipeline) + tfds_kwargs = { + 'processing_mode': 'parallel_epochs', + 'service': self._tf_data_service_address, + 'job_name': self._tf_data_service_job_name, + 'num_consumers': num_consumers + } + if self._enable_shared_tf_data_service_between_parallel_trainers: + raise ValueError('Shared tf.data service does not support round-robin' + ' tf.data service.') dataset = range_dataset.map(lambda i: dataset.apply( # pylint: disable=g-long-lambda tf.data.experimental.service.distribute( - processing_mode='parallel_epochs', - service=self._tf_data_service_address, - job_name=self._tf_data_service_job_name, - consumer_index=base_consumer_index + i, - num_consumers=num_consumers))) + consumer_index=base_consumer_index + i, **tfds_kwargs))) # Use parallel interleave to read multiple batches from a tf.data # service worker in parallel. 
dataset = dataset.interleave( @@ -451,11 +516,21 @@ class InputReader: num_parallel_calls=replicas_per_input_pipeline, deterministic=True) else: + tfds_kwargs = { + 'processing_mode': 'parallel_epochs', + 'service': self._tf_data_service_address, + 'job_name': self._tf_data_service_job_name, + } + if self._enable_shared_tf_data_service_between_parallel_trainers: + tfds_kwargs.update({ + 'processing_mode': + tf.data.experimental.service.ShardingPolicy.OFF, + 'cross_trainer_cache': + tf.data.experimental.service.CrossTrainerCache( + trainer_id=self._trainer_id) + }) dataset = dataset.apply( - tf.data.experimental.service.distribute( - processing_mode='parallel_epochs', - service=self._tf_data_service_address, - job_name=self._tf_data_service_job_name)) + tf.data.experimental.service.distribute(**tfds_kwargs)) return dataset def read(self, @@ -463,16 +538,17 @@ class InputReader: dataset: Optional[tf.data.Dataset] = None) -> tf.data.Dataset: """Generates a tf.data.Dataset object.""" if dataset is None: - dataset = self._read_data_source( - self._matched_files, self._dataset_fn, input_context, - self._tfds_builder) + dataset = self._read_data_source(self._matched_files, self._dataset_fn, + input_context, self._tfds_builder) dataset = self._decode_and_parse_dataset(dataset, self._global_batch_size, input_context) dataset = _maybe_map_fn(dataset, self._postprocess_fn) - dataset = self._maybe_apply_data_service(dataset, input_context) + if not (self._enable_shared_tf_data_service_between_parallel_trainers and + self._apply_tf_data_service_before_batching): + dataset = self._maybe_apply_data_service(dataset, input_context) if self._deterministic is not None: options = tf.data.Options() - options.experimental_deterministic = self._deterministic + options.deterministic = self._deterministic dataset = dataset.with_options(options) - return dataset.prefetch(tf.data.experimental.AUTOTUNE) + return dataset.prefetch(self._prefetch_buffer_size) diff --git 
a/official/core/registry.py b/official/core/registry.py index f349710b54f082c8e5d2843b23210c16c8a59023..5fdaf48ad8753d0454e930b77ddccfb6bc26c156 100644 --- a/official/core/registry.py +++ b/official/core/registry.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -13,6 +13,7 @@ # limitations under the License. """Registry utility.""" +from absl import logging def register(registered_collection, reg_key): @@ -54,8 +55,16 @@ def register(registered_collection, reg_key): leaf_reg_key = reg_key if leaf_reg_key in collection: - raise KeyError("Function or class {} registered multiple times.".format( - leaf_reg_key)) + if "beta" in fn_or_cls.__module__: + # TODO(yeqing): Clean this temporary branch for beta. + logging.warning( + "Duplicate registration of beta module " + "name %r old %r new %r", reg_key, collection[leaf_reg_key], + fn_or_cls.__module__) + return fn_or_cls + else: + raise KeyError("Function or class {} registered multiple times.".format( + leaf_reg_key)) collection[leaf_reg_key] = fn_or_cls return fn_or_cls diff --git a/official/core/registry_test.py b/official/core/registry_test.py index 0d0639c6b10d5f9d587593d52dd6f2458c83bcd5..559b918e1e2b7d511c3c5076da5fe2e099938b1b 100644 --- a/official/core/registry_test.py +++ b/official/core/registry_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
diff --git a/official/core/savedmodel_checkpoint_manager.py b/official/core/savedmodel_checkpoint_manager.py new file mode 100644 index 0000000000000000000000000000000000000000..6b6df5fe32ebb94f463bcb9285ca7acd2c4fc516 --- /dev/null +++ b/official/core/savedmodel_checkpoint_manager.py @@ -0,0 +1,244 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Custom checkpoint manager that also exports saved models.""" + +import os +import re +import time +from typing import Callable, List, Mapping, Optional, Union + +from absl import logging +import tensorflow as tf + +SAVED_MODULES_PATH_SUFFIX = 'saved_modules' + + +def make_saved_modules_directory_name(checkpoint_name: str) -> str: + return f'{checkpoint_name}_{SAVED_MODULES_PATH_SUFFIX}' + + +class SavedModelCheckpointManager(tf.train.CheckpointManager): + """A CheckpointManager that also exports `SavedModel`s.""" + + def __init__(self, + checkpoint: tf.train.Checkpoint, + directory: str, + max_to_keep: int, + modules_to_export: Optional[Mapping[str, tf.Module]] = None, + keep_checkpoint_every_n_hours: Optional[int] = None, + checkpoint_name: str = 'ckpt', + step_counter: Optional[tf.Variable] = None, + checkpoint_interval: Optional[int] = None, + init_fn: Optional[Callable[[], None]] = None): + """See base class.""" + super().__init__( + checkpoint=checkpoint, + directory=directory, + max_to_keep=max_to_keep, + 
keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours, + checkpoint_name=checkpoint_name, + step_counter=step_counter, + checkpoint_interval=checkpoint_interval, + init_fn=init_fn) + self._modules_to_export = modules_to_export + self._savedmodels = self.get_existing_savedmodels() + + def save(self, + checkpoint_number: Optional[int] = None, + check_interval: bool = True, + options: Optional[tf.train.CheckpointOptions] = None): + """See base class.""" + checkpoint_path = super().save( + checkpoint_number=checkpoint_number, + check_interval=check_interval, + options=options) + if not checkpoint_path: # Nothing got written. + return + if not self._modules_to_export: # No modules to export. + logging.info('Skip saving SavedModel due to empty modules_to_export.') + return checkpoint_path + + # Save the models for the checkpoint that just got written. + saved_modules_directory = make_saved_modules_directory_name(checkpoint_path) + for model_name, model in self._modules_to_export.items(): + signatures = getattr(model, 'saved_model_signatures', None) + tf.saved_model.save( + obj=model, + export_dir=os.path.join(saved_modules_directory, model_name), + signatures=signatures) + + saved_modules_directories_to_keep = [ + make_saved_modules_directory_name(ckpt) for ckpt in self.checkpoints + ] + existing_saved_modules_dirs = self.get_existing_savedmodels() + + self._savedmodels = [] + # Keep savedmodels in the same order as checkpoints (from oldest to newest). + for saved_modules_dir_to_keep in saved_modules_directories_to_keep: + if saved_modules_dir_to_keep in existing_saved_modules_dirs: + self._savedmodels.append(saved_modules_dir_to_keep) + + for existing_saved_modules_dir in existing_saved_modules_dirs: + if existing_saved_modules_dir not in self._savedmodels: + tf.io.gfile.rmtree(existing_saved_modules_dir) + + return checkpoint_path + + def get_existing_savedmodels(self) -> List[str]: + """Gets a list of all existing SavedModel paths in `directory`. 
+ + Returns: + A list of all existing SavedModel paths. + """ + saved_modules_glob = make_saved_modules_directory_name( + self._checkpoint_prefix + '-*') + return tf.io.gfile.glob(saved_modules_glob) + + @property + def latest_savedmodel(self) -> Union[str, None]: + """The path of the most recent SavedModel in `directory`. + + Returns: + The latest SavedModel path. If there are no SavedModels, returns `None`. + """ + if self._savedmodels: + return self._savedmodels[-1] + return None + + @property + def savedmodels(self) -> List[str]: + """A list of managed SavedModels. + + Returns: + A list of SavedModel paths, sorted from oldest to newest. + """ + return self._savedmodels + + @property + def modules_to_export(self) -> Union[Mapping[str, tf.Module], None]: + return self._modules_to_export + + def get_savedmodel_number_from_path(self, + savedmodel_path: str) -> Union[int, None]: + """Gets the savedmodel_number/checkpoint_number from savedmodel filepath. + + The savedmodel_number is the global step when used with the Orbit controller. + + Args: + savedmodel_path: savedmodel directory path. + + Returns: + Savedmodel number, or None if the savedmodel path does not match the pattern. + """ + pattern = rf'\d+_{SAVED_MODULES_PATH_SUFFIX}$' + savedmodel_number = re.search(pattern, savedmodel_path) + if savedmodel_number: + savedmodel_number = savedmodel_number.group() + return int(savedmodel_number[:-len(SAVED_MODULES_PATH_SUFFIX) - 1]) + return None + + def savedmodels_iterator(self, + min_interval_secs: float = 0, + timeout: Optional[float] = None, + timeout_fn: Optional[Callable[[], bool]] = None): + """Continuously yield new SavedModel files as they appear. + + The iterator only checks for new savedmodels when control flow has been + returned to it. The logic is the same as `tf.train.checkpoints_iterator`. + + Args: + min_interval_secs: The minimum number of seconds between yielding + savedmodels. 
+ timeout: The maximum number of seconds to wait between savedmodels.
If + left as `None`, then the process will wait indefinitely. + timeout_fn: Optional function to call after a timeout. If the function + returns True, then it means that no new savedmodels will be generated + and the iterator will exit. The function is called with no arguments. + + Yields: + String paths to latest SavedModel files as they arrive. + """ + savedmodel_path = None + while True: + new_savedmodel_path = self.wait_for_new_savedmodel( + savedmodel_path, timeout=timeout) + if new_savedmodel_path is None: + if not timeout_fn: + # timed out + logging.info('Timed-out waiting for a savedmodel.') + return + if timeout_fn(): + # The timeout_fn indicated that we are truly done. + return + else: + # The timeout_fn indicated that more savedmodels may come. + continue + start = time.time() + savedmodel_path = new_savedmodel_path + yield savedmodel_path + time_to_next_eval = start + min_interval_secs - time.time() + if time_to_next_eval > 0: + time.sleep(time_to_next_eval) + + def wait_for_new_savedmodel( + self, + last_savedmodel: Optional[str] = None, + seconds_to_sleep: float = 1.0, + timeout: Optional[float] = None) -> Union[str, None]: + """Waits until a new savedmodel file is found. + + Args: + last_savedmodel: The last savedmodel path used or `None` if we're + expecting a savedmodel for the first time. + seconds_to_sleep: The number of seconds to sleep for before looking for a + new savedmodel. + timeout: The maximum number of seconds to wait. If left as `None`, then + the process will wait indefinitely. + + Returns: + A new savedmodel path, or None if the timeout was reached. 
+ """ + logging.info('Waiting for new savedmodel at %s', self._directory) + stop_time = time.time() + timeout if timeout is not None else None + + last_savedmodel_number = 0 + if last_savedmodel: + last_savedmodel_number = self.get_savedmodel_number_from_path( + last_savedmodel) + + while True: + if stop_time is not None and time.time() + seconds_to_sleep > stop_time: + return None + + existing_savedmodels = {} + for savedmodel_path in self.get_existing_savedmodels(): + savedmodel_number = self.get_savedmodel_number_from_path( + savedmodel_path) + if savedmodel_number is not None: + existing_savedmodels[savedmodel_number] = savedmodel_path + + # Find the first savedmodel with larger step number as next savedmodel. + savedmodel_path = None + existing_savedmodels = dict(sorted(existing_savedmodels.items())) + for savedmodel_number in existing_savedmodels: + if savedmodel_number > last_savedmodel_number: + savedmodel_path = existing_savedmodels[savedmodel_number] + break + + if savedmodel_path: + logging.info('Found new savedmodel at %s', savedmodel_path) + return savedmodel_path + else: + time.sleep(seconds_to_sleep) diff --git a/official/core/savedmodel_checkpoint_manager_test.py b/official/core/savedmodel_checkpoint_manager_test.py new file mode 100644 index 0000000000000000000000000000000000000000..8fcb51f3b59f0ec91cfc82ad377b75307ce75806 --- /dev/null +++ b/official/core/savedmodel_checkpoint_manager_test.py @@ -0,0 +1,114 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import time +from typing import Iterable + +import tensorflow as tf + +from official.core import savedmodel_checkpoint_manager + + +def _models_exist(checkpoint_path: str, models: Iterable[str]) -> bool: + for model_name in models: + if not tf.io.gfile.isdir( + os.path.join( + savedmodel_checkpoint_manager.make_saved_modules_directory_name( + checkpoint_path), model_name)): + return False + return True + + +class CheckpointManagerTest(tf.test.TestCase): + + def _create_manager(self, max_to_keep: int = 1) -> tf.train.CheckpointManager: + """Sets up SavedModelCheckpointManager object. + + Args: + max_to_keep: max number of savedmodels to keep. + + Returns: + created savedmodel manager. + """ + models = { + 'model_1': + tf.keras.Sequential( + layers=[tf.keras.layers.Dense(8, input_shape=(16,))]), + 'model_2': + tf.keras.Sequential( + layers=[tf.keras.layers.Dense(16, input_shape=(32,))]), + } + checkpoint = tf.train.Checkpoint() + manager = savedmodel_checkpoint_manager.SavedModelCheckpointManager( + checkpoint=checkpoint, + directory=self.get_temp_dir(), + max_to_keep=max_to_keep, + modules_to_export=models) + return manager + + def test_max_to_keep(self): + manager = self._create_manager() + models = manager.modules_to_export + first_path = manager.save() + second_path = manager.save() + + savedmodel = savedmodel_checkpoint_manager.make_saved_modules_directory_name( + manager.latest_checkpoint) + self.assertEqual(savedmodel, manager.latest_savedmodel) + self.assertTrue(_models_exist(second_path, models.keys())) + self.assertFalse(_models_exist(first_path, models.keys())) + + def test_returns_none_after_timeout(self): + manager = self._create_manager() + start = time.time() + ret = manager.wait_for_new_savedmodel( + None, timeout=1.0, seconds_to_sleep=0.5) + end = time.time() + self.assertIsNone(ret) + # We've waited 0.5 second. 
+ self.assertGreater(end, start + 0.5) + # The timeout kicked in. + self.assertLess(end, start + 0.6) + + def test_saved_model_iterator(self): + manager = self._create_manager(max_to_keep=2) + self.assertIsNotNone(manager.save(checkpoint_number=1)) + self.assertIsNotNone(manager.save(checkpoint_number=2)) + self.assertIsNotNone(manager.save(checkpoint_number=3)) + + # Savedmodels are in time order. + expected_savedmodels = manager.savedmodels + # Order not guaranteed. + existing_savedmodels = manager.get_existing_savedmodels() + savedmodels = list(manager.savedmodels_iterator(timeout=3.0)) + self.assertEqual(savedmodels, expected_savedmodels) + self.assertEqual(set(savedmodels), set(existing_savedmodels)) + + def test_saved_model_iterator_timeout_fn(self): + manager = self._create_manager() + timeout_fn_calls = [0] + + def timeout_fn(): + timeout_fn_calls[0] += 1 + return timeout_fn_calls[0] > 3 + + results = list( + manager.savedmodels_iterator(timeout=0.1, timeout_fn=timeout_fn)) + self.assertEqual([], results) + self.assertEqual(4, timeout_fn_calls[0]) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/core/task_factory.py b/official/core/task_factory.py index f5862462e0da94ad183e8bb7a5d60a7cad6e1b79..4dee1fe2e21fa5a9a0503392c57121e8f160796e 100644 --- a/official/core/task_factory.py +++ b/official/core/task_factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/core/test_utils.py b/official/core/test_utils.py index 015373699c5c5917e9f866686d5817a791155d01..7edeff7c632102ed4d3480b58ac4d9f2ba5b5f88 100644 --- a/official/core/test_utils.py +++ b/official/core/test_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/core/tf_example_builder.py b/official/core/tf_example_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..862926fff1c2dd6e772a345d1a5c5fd79c146870 --- /dev/null +++ b/official/core/tf_example_builder.py @@ -0,0 +1,144 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Builder class for preparing tf.train.Example.""" + +# https://www.python.org/dev/peps/pep-0563/#enabling-the-future-behavior-in-python-3-7 +from __future__ import annotations + +from typing import Mapping, Sequence, Union + +import numpy as np +import tensorflow as tf + +BytesValueType = Union[bytes, Sequence[bytes], str, Sequence[str]] + +_to_array = lambda v: [v] if not isinstance(v, (list, np.ndarray)) else v +_to_bytes = lambda v: v.encode() if isinstance(v, str) else v +_to_bytes_array = lambda v: list(map(_to_bytes, _to_array(v))) + + +class TfExampleBuilder(object): + """Builder class for preparing tf.train.Example. + + Read API doc at https://www.tensorflow.org/api_docs/python/tf/train/Example. 
+
+ Example usage:
+ >>> example_builder = TfExampleBuilder()
+ >>> example = (
+ ... example_builder.add_bytes_feature('feature_a', 'foobarbaz')
+ ... .add_ints_feature('feature_b', [1, 2, 3])
+ ... .example)
+ """
+
+ def __init__(self) -> None:
+ self._example = tf.train.Example()
+
+ @property
+ def example(self) -> tf.train.Example:
+ """Returns the generated tf.train.Example proto (not a copy)."""
+ return self._example
+
+ @property
+ def serialized_example(self) -> bytes:
+ """Returns the generated tf.train.Example proto serialized to bytes."""
+ return self._example.SerializeToString()
+
+ def set(self, example: tf.train.Example) -> TfExampleBuilder:
+ """Sets the example."""
+ self._example = example
+ return self
+
+ def reset(self) -> TfExampleBuilder:
+ """Resets the example to an empty proto."""
+ self._example = tf.train.Example()
+ return self
+
+ ###### Basic APIs for primitive data types ######
+ def add_feature_dict(
+ self, feature_dict: Mapping[str, tf.train.Feature]) -> TfExampleBuilder:
+ """Adds the predefined `feature_dict` to the example.
+
+ Note: Prefer the feature-type-specific methods where possible.
+
+ Args:
+ feature_dict: A dictionary from tf.Example feature key to
+ tf.train.Feature.
+
+ Returns:
+ The builder object for subsequent method calls.
+ """
+ for k, v in feature_dict.items():
+ self._example.features.feature[k].CopyFrom(v)
+ return self
+
+ def add_feature(self, key: str,
+ feature: tf.train.Feature) -> TfExampleBuilder:
+ """Adds the predefined `feature` under `key` to the example.
+
+ Args:
+ key: String key of the feature.
+ feature: The feature to be added to the example.
+
+ Returns:
+ The builder object for subsequent method calls.
+ """
+ self._example.features.feature[key].CopyFrom(feature)
+ return self
+
+ def add_bytes_feature(self, key: str,
+ value: BytesValueType) -> TfExampleBuilder:
+ """Adds byte(s) or string(s) with `key` to the example.
+
+ Args:
+ key: String key of the feature.
+ value: The byte(s) or string(s) to be added to the example. + + Returns: + The builder object for subsequent method calls. + """ + return self.add_feature( + key, + tf.train.Feature( + bytes_list=tf.train.BytesList(value=_to_bytes_array(value)))) + + def add_ints_feature(self, key: str, + value: Union[int, Sequence[int]]) -> TfExampleBuilder: + """Adds integer(s) with `key` to the example. + + Args: + key: String key of the feature. + value: The integer(s) to be added to the example. + + Returns: + The builder object for subsequent method calls. + """ + return self.add_feature( + key, + tf.train.Feature(int64_list=tf.train.Int64List(value=_to_array(value)))) + + def add_floats_feature( + self, key: str, value: Union[float, Sequence[float]]) -> TfExampleBuilder: + """Adds float(s) with `key` to the example. + + Args: + key: String key of the feature. + value: The float(s) to be added to the example. + + Returns: + The builder object for subsequent method calls. + """ + return self.add_feature( + key, + tf.train.Feature(float_list=tf.train.FloatList(value=_to_array(value)))) diff --git a/official/core/tf_example_builder_test.py b/official/core/tf_example_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..880b965b300d71cf23528f173c8ad90efbad1396 --- /dev/null +++ b/official/core/tf_example_builder_test.py @@ -0,0 +1,165 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+"""Tests for tf_example_builder.
+
+See `test_add_image_matrix_feature_with_fake_image` for the typical structure of
+a unit test.
+"""
+
+from absl.testing import parameterized
+import tensorflow as tf
+from official.core import tf_example_builder
+
+
+class TfExampleBuilderTest(tf.test.TestCase, parameterized.TestCase):
+
+ def test_init_an_empty_example(self):
+ example_builder = tf_example_builder.TfExampleBuilder()
+ example = example_builder.example
+ self.assertProtoEquals('', example)
+
+ def test_init_an_empty_serialized_example(self):
+ example_builder = tf_example_builder.TfExampleBuilder()
+ example = example_builder.serialized_example
+ self.assertProtoEquals('', example)
+
+ def test_add_feature(self):
+ example_builder = tf_example_builder.TfExampleBuilder()
+ example_builder.add_feature(
+ 'foo',
+ tf.train.Feature(
+ bytes_list=tf.train.BytesList(value=[b'Hello World!'])))
+ example = example_builder.example
+ # Use proto text to show what the entire proto looks like.
+ self.assertProtoEquals(
+ """
+ features: {
+ feature: {
+ key: "foo"
+ value: {
+ bytes_list: {
+ value: "Hello World!"
+ }
+ }
+ }
+ }""", example)
+
+ def test_add_feature_dict(self):
+ example_builder = tf_example_builder.TfExampleBuilder()
+ example_builder.add_feature_dict({
+ 'foo':
+ tf.train.Feature(
+ bytes_list=tf.train.BytesList(value=[b'Hello World!'])),
+ 'bar':
+ tf.train.Feature(
+ int64_list=tf.train.Int64List(value=[299, 792, 458]))
+ })
+ example = example_builder.example
+ # Use proto text to show what the entire proto looks like.
+ self.assertProtoEquals(
+ """
+ features: {
+ feature: {
+ key: "foo"
+ value: {
+ bytes_list: {
+ value: "Hello World!"
+ }
+ }
+ }
+ feature: {
+ key: "bar"
+ value: {
+ int64_list: {
+ value: 299
+ value: 792
+ value: 458
+ }
+ }
+ }
+ }""", example)
+
+ @parameterized.named_parameters(
+ ('single_bytes', b'Hello World!', b'Hello World!'),
+ ('single_string', 'Hello World!', b'Hello World!'))
+ def test_add_single_byte_feature(self, value, expected_value):
+ example_builder = tf_example_builder.TfExampleBuilder()
+ example_builder.add_bytes_feature('foo', value)
+ example = example_builder.example
+ # Use constructor to easily work with test parameters.
+ self.assertProtoEquals(
+ tf.train.Example(
+ features=tf.train.Features(
+ feature={
+ 'foo':
+ tf.train.Feature(
+ bytes_list=tf.train.BytesList(
+ value=[expected_value]))
+ })), example)
+
+ @parameterized.named_parameters(
+ ('multiple_bytes', [b'Hello World!', b'Good Morning!'
+ ], [b'Hello World!', b'Good Morning!']),
+ ('multiple_string', ['Hello World!', 'Good Morning!'
+ ], [b'Hello World!', b'Good Morning!']))
+ def test_add_multiple_bytes_feature(self, values, expected_values):
+ example_builder = tf_example_builder.TfExampleBuilder()
+ example_builder.add_bytes_feature('foo', values)
+ example = example_builder.example
+ self.assertProtoEquals(
+ tf.train.Example(
+ features=tf.train.Features(
+ feature={
+ 'foo':
+ tf.train.Feature(
+ bytes_list=tf.train.BytesList(
+ value=expected_values))
+ })), example)
+
+ @parameterized.named_parameters(
+ ('single_integer', 123, [123]),
+ ('multiple_integers', [123, 456, 789], [123, 456, 789]))
+ def test_add_ints_feature(self, value, expected_value):
+ example_builder = tf_example_builder.TfExampleBuilder()
+ example_builder.add_ints_feature('bar', value)
+ example = example_builder.example
+ self.assertProtoEquals(
+ tf.train.Example(
+ features=tf.train.Features(
+ feature={
+ 'bar':
+ tf.train.Feature(
+ int64_list=tf.train.Int64List(value=expected_value))
+ })), example)
+
+ @parameterized.named_parameters(
+ ('single_float', 3.14, [3.14]),
+ ('multiple_floats', [3.14,
1.57, 6.28], [3.14, 1.57, 6.28]))
+ def test_add_floats_feature(self, value, expected_value):
+ example_builder = tf_example_builder.TfExampleBuilder()
+ example_builder.add_floats_feature('baz', value)
+ example = example_builder.example
+ self.assertProtoEquals(
+ tf.train.Example(
+ features=tf.train.Features(
+ feature={
+ 'baz':
+ tf.train.Feature(
+ float_list=tf.train.FloatList(value=expected_value))
+ })), example)
+
+
+if __name__ == '__main__':
+ tf.test.main()
diff --git a/official/core/tf_example_feature_key.py b/official/core/tf_example_feature_key.py
new file mode 100644
index 0000000000000000000000000000000000000000..e9d3a1d76d26ca1d8097f3dca714bc1602342369
--- /dev/null
+++ b/official/core/tf_example_feature_key.py
@@ -0,0 +1,62 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Data classes for tf.Example proto feature keys.
+
+Feature keys are grouped by feature types. Key names follow conventions in
+go/tf-example.
+"""
+import dataclasses
+import functools
+from typing import Optional
+
+# Disable the generated __init__ so subclasses use the base-class one.
+dataclass = functools.partial(dataclasses.dataclass, init=False)
+
+
+@dataclass
+class TfExampleFeatureKeyBase:
+ """Base dataclass for defining tf.Example proto feature keys.
+
+ This class defines the logic of adding a prefix to feature keys. Subclasses
+ define the feature keys for a specific feature type as data fields.
+
+ NOTE: Please follow subclass examples in this module to define feature keys
+ for a new feature type.
+ """
+
+ def __init__(self, prefix: Optional[str] = None):
+ """Instantiates the feature key class.
+
+ Adds a string prefix to all fields of a feature key instance if `prefix` is
+ neither None nor empty.
+
+ Example usage:
+
+ >>> test_key = EncodedImageFeatureKey()
+ >>> test_key.encoded
+ 'image/encoded'
+ >>> test_key = EncodedImageFeatureKey('prefix')
+ >>> test_key.encoded
+ 'prefix/image/encoded'
+
+ Args:
+ prefix: A prefix string to prepend to each feature key string, followed
+ by a slash '/'.
+ """
+ if prefix:
+ for field in dataclasses.fields(self):
+ key_name = field.name
+ key_value = getattr(self, key_name)
+ setattr(self, key_name, f'{prefix}/{key_value}')
diff --git a/official/core/tf_example_feature_key_test.py b/official/core/tf_example_feature_key_test.py
new file mode 100644
index 0000000000000000000000000000000000000000..295369468707d548cf40556bfe8e422c430ef04e
--- /dev/null
+++ b/official/core/tf_example_feature_key_test.py
@@ -0,0 +1,49 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ +"""Tests for tf_example_feature_key.""" +import dataclasses +import inspect +from absl.testing import absltest +from absl.testing import parameterized + +from official.core import tf_example_feature_key + + +@tf_example_feature_key.dataclass +class TestFeatureKey(tf_example_feature_key.TfExampleFeatureKeyBase): + test: str = 'foo/bar' + + +class TfExampleFeatureKeyTest(parameterized.TestCase): + + def test_add_prefix_success(self): + test_key = TestFeatureKey('prefix') + self.assertEqual(test_key.test, 'prefix/foo/bar') + + @parameterized.parameters(None, '') + def test_add_prefix_skip_success(self, prefix): + test_key = TestFeatureKey(prefix) + self.assertEqual(test_key.test, 'foo/bar') + + def test_all_feature_key_classes_are_valid(self): + for _, obj in inspect.getmembers(tf_example_feature_key): + if inspect.isclass(obj): + self.assertTrue(dataclasses.is_dataclass(obj)) + self.assertTrue( + issubclass(obj, tf_example_feature_key.TfExampleFeatureKeyBase)) + + +if __name__ == '__main__': + absltest.main() diff --git a/official/core/train_lib.py b/official/core/train_lib.py index 5f548ea722591bb878e468647b449426105ec74b..93afe0d7c55538aebfa3ac75f50eb7d3bf4ca555 100644 --- a/official/core/train_lib.py +++ b/official/core/train_lib.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -15,7 +15,7 @@ """TFM common training driver library.""" # pytype: disable=attribute-error import os -from typing import Any, Mapping, Optional, Tuple +from typing import Any, Mapping, Optional, Tuple, List # Import libraries @@ -32,6 +32,226 @@ from official.core import train_utils maybe_create_best_ckpt_exporter = train_utils.maybe_create_best_ckpt_exporter +class OrbitExperimentRunner: + """Runs experiment with Orbit training loop. 
+
+ The default experiment runner for Model Garden experiments. Users can
+ customize the experiment pipeline by subclassing this class and replacing
+ components or functions.
+
+ For example, an experiment runner with a customized checkpoint manager:
+
+ ```python
+ class MyExpRunnerWithCheckpointManager(OrbitExperimentRunner):
+ def _maybe_build_checkpoint_manager(self):
+ return MyCheckpointManager(*args)
+
+ # In user code
+ MyExpRunnerWithCheckpointManager(**needed_kwargs).run()
+ ```
+
+ Similar overrides can be applied to other components.
+ """
+
+ def __init__(
+ self,
+ distribution_strategy: tf.distribute.Strategy,
+ task: base_task.Task,
+ mode: str,
+ params: config_definitions.ExperimentConfig,
+ model_dir: str,
+ run_post_eval: bool = False,
+ save_summary: bool = True,
+ train_actions: Optional[List[orbit.Action]] = None,
+ eval_actions: Optional[List[orbit.Action]] = None,
+ trainer: Optional[base_trainer.Trainer] = None,
+ controller_cls=orbit.Controller
+ ):
+ """Constructor.
+
+ Args:
+ distribution_strategy: A distribution strategy.
+ task: A Task instance.
+ mode: A 'str', specifying the mode. Can be 'train', 'eval',
+ 'train_and_eval', 'train_and_post_eval' or 'continuous_eval'.
+ params: ExperimentConfig instance.
+ model_dir: A 'str', a path to store model checkpoints and summaries.
+ run_post_eval: Whether to run post eval once after training; if so, its
+ metrics logs are returned.
+ save_summary: Whether to save train and validation summary.
+ train_actions: Optional list of Orbit train actions.
+ eval_actions: Optional list of Orbit eval actions.
+ trainer: the base_trainer.Trainer instance. It should be created within
+ the strategy.scope().
+ controller_cls: The controller class to manage the train and eval process.
+ Must be an orbit.Controller subclass.
+ """ + self.strategy = distribution_strategy or tf.distribute.get_strategy() + self._params = params + self._model_dir = model_dir + self._mode = mode + self._run_post_eval = run_post_eval + + self._trainer = trainer or self._build_trainer( + task, + train='train' in mode, + evaluate=('eval' in mode) or run_post_eval) + assert self.trainer is not None + self._checkpoint_manager = self._maybe_build_checkpoint_manager() + self._controller = self._build_controller( + trainer=self.trainer if 'train' in mode else None, + evaluator=self.trainer, + save_summary=save_summary, + train_actions=train_actions, + eval_actions=eval_actions, + controller_cls=controller_cls) + + @property + def params(self) -> config_definitions.ExperimentConfig: + return self._params + + @property + def model_dir(self) -> str: + return self._model_dir + + @property + def trainer(self) -> base_trainer.Trainer: + return self._trainer + + @property + def checkpoint_manager(self) -> tf.train.CheckpointManager: + return self._checkpoint_manager + + @property + def controller(self) -> orbit.Controller: + return self._controller + + def _build_trainer(self, task: base_task.Task, train: bool, + evaluate: bool) -> base_trainer.Trainer: + """Create trainer.""" + with self.strategy.scope(): + trainer = train_utils.create_trainer( + self.params, + task, + train=train, + evaluate=evaluate, + checkpoint_exporter=self._build_best_checkpoint_exporter()) + return trainer + + def _build_best_checkpoint_exporter(self): + return maybe_create_best_ckpt_exporter(self.params, self.model_dir) + + def _maybe_build_checkpoint_manager( + self) -> Optional[tf.train.CheckpointManager]: + """Maybe create a CheckpointManager.""" + assert self.trainer is not None + if self.trainer.checkpoint: + if self.model_dir is None: + raise ValueError('model_dir must be specified, but got None') + checkpoint_manager = tf.train.CheckpointManager( + self.trainer.checkpoint, + directory=self.model_dir, + 
+ max_to_keep=self.params.trainer.max_to_keep,
+ step_counter=self.trainer.global_step,
+ checkpoint_interval=self.params.trainer.checkpoint_interval,
+ init_fn=self.trainer.initialize)
+ else:
+ checkpoint_manager = None
+ return checkpoint_manager
+
+ def _build_controller(self,
+ trainer,
+ evaluator,
+ save_summary: bool = True,
+ train_actions: Optional[List[orbit.Action]] = None,
+ eval_actions: Optional[List[orbit.Action]] = None,
+ controller_cls=orbit.Controller) -> orbit.Controller:
+ """Builds an Orbit controller."""
+ train_actions = [] if not train_actions else list(train_actions)
+ if trainer:
+ train_actions += actions.get_train_actions(
+ self.params,
+ trainer,
+ self.model_dir,
+ checkpoint_manager=self.checkpoint_manager)
+
+ eval_actions = [] if not eval_actions else list(eval_actions)
+ if evaluator:
+ eval_actions += actions.get_eval_actions(self.params, evaluator,
+ self.model_dir)
+
+ controller = controller_cls(
+ strategy=self.strategy,
+ trainer=trainer,
+ evaluator=evaluator,
+ global_step=self.trainer.global_step,
+ steps_per_loop=self.params.trainer.steps_per_loop,
+ checkpoint_manager=self.checkpoint_manager,
+ summary_dir=os.path.join(self.model_dir, 'train') if
+ (save_summary) else None,
+ eval_summary_dir=os.path.join(
+ self.model_dir, self.params.trainer.validation_summary_subdir) if
+ (save_summary) else None,
+ summary_interval=self.params.trainer.summary_interval if
+ (save_summary) else None,
+ train_actions=train_actions,
+ eval_actions=eval_actions)
+ return controller
+
+ def run(self) -> Tuple[tf.keras.Model, Mapping[str, Any]]:
+ """Runs the experiment in the configured mode.
+
+ Returns:
+ A 2-tuple of (model, eval_logs).
+ model: `tf.keras.Model` instance.
+ eval_logs: eval metrics logs when run_post_eval is True or the mode is
+ 'train_and_post_eval'; otherwise, {}.
+ """ + mode = self._mode + params = self.params + logging.info('Starts to execute mode: %s', mode) + with self.strategy.scope(): + if mode == 'train' or mode == 'train_and_post_eval': + self.controller.train(steps=params.trainer.train_steps) + elif mode == 'train_and_eval': + self.controller.train_and_evaluate( + train_steps=params.trainer.train_steps, + eval_steps=params.trainer.validation_steps, + eval_interval=params.trainer.validation_interval) + elif mode == 'eval': + self.controller.evaluate(steps=params.trainer.validation_steps) + elif mode == 'continuous_eval': + + def timeout_fn(): + if self.trainer.global_step.numpy() >= params.trainer.train_steps: + return True + return False + + self.controller.evaluate_continuously( + steps=params.trainer.validation_steps, + timeout=params.trainer.continuous_eval_timeout, + timeout_fn=timeout_fn) + else: + raise NotImplementedError('The mode is not implemented: %s' % mode) + + num_params = train_utils.try_count_params(self.trainer.model) + if num_params is not None: + logging.info('Number of trainable params in model: %f Millions.', + num_params / 10.**6) + + flops = train_utils.try_count_flops(self.trainer.model) + if flops is not None: + logging.info('FLOPs (multi-adds) in model: %f Billions.', + flops / 10.**9 / 2) + + if self._run_post_eval or mode == 'train_and_post_eval': + with self.strategy.scope(): + return self.trainer.model, self.controller.evaluate( + steps=params.trainer.validation_steps) + else: + return self.trainer.model, {} + + def run_experiment( distribution_strategy: tf.distribute.Strategy, task: base_task.Task, @@ -40,6 +260,8 @@ def run_experiment( model_dir: str, run_post_eval: bool = False, save_summary: bool = True, + train_actions: Optional[List[orbit.Action]] = None, + eval_actions: Optional[List[orbit.Action]] = None, trainer: Optional[base_trainer.Trainer] = None, controller_cls=orbit.Controller ) -> Tuple[tf.keras.Model, Mapping[str, Any]]: @@ -55,6 +277,8 @@ def run_experiment( 
run_post_eval: Whether to run post eval once after training, metrics logs are returned. save_summary: Whether to save train and validation summary. + train_actions: Optional list of Orbit train actions. + eval_actions: Optional list of Orbit eval actions. trainer: the base_trainer.Trainer instance. It should be created within the strategy.scope(). controller_cls: The controller class to manage the train and eval process. @@ -66,85 +290,17 @@ def run_experiment( eval_logs: returns eval metrics logs when run_post_eval is set to True, otherwise, returns {}. """ - - with distribution_strategy.scope(): - if not trainer: - trainer = train_utils.create_trainer( - params, - task, - train='train' in mode, - evaluate=('eval' in mode) or run_post_eval, - checkpoint_exporter=maybe_create_best_ckpt_exporter( - params, model_dir)) - - if trainer.checkpoint: - if model_dir is None: - raise ValueError('model_dir must be specified, but got None') - checkpoint_manager = tf.train.CheckpointManager( - trainer.checkpoint, - directory=model_dir, - max_to_keep=params.trainer.max_to_keep, - step_counter=trainer.global_step, - checkpoint_interval=params.trainer.checkpoint_interval, - init_fn=trainer.initialize) - else: - checkpoint_manager = None - - controller = controller_cls( - strategy=distribution_strategy, - trainer=trainer if 'train' in mode else None, - evaluator=trainer, - global_step=trainer.global_step, - steps_per_loop=params.trainer.steps_per_loop, - checkpoint_manager=checkpoint_manager, - summary_dir=os.path.join(model_dir, 'train') if (save_summary) else None, - eval_summary_dir=os.path.join(model_dir, - params.trainer.validation_summary_subdir) if - (save_summary) else None, - summary_interval=params.trainer.summary_interval if - (save_summary) else None, - train_actions=actions.get_train_actions( - params, trainer, model_dir, checkpoint_manager=checkpoint_manager), - eval_actions=actions.get_eval_actions(params, trainer, model_dir)) - - logging.info('Starts to execute 
mode: %s', mode) - with distribution_strategy.scope(): - if mode == 'train': - controller.train(steps=params.trainer.train_steps) - elif mode == 'train_and_eval': - controller.train_and_evaluate( - train_steps=params.trainer.train_steps, - eval_steps=params.trainer.validation_steps, - eval_interval=params.trainer.validation_interval) - elif mode == 'eval': - controller.evaluate(steps=params.trainer.validation_steps) - elif mode == 'continuous_eval': - - def timeout_fn(): - if trainer.global_step.numpy() >= params.trainer.train_steps: - return True - return False - - controller.evaluate_continuously( - steps=params.trainer.validation_steps, - timeout=params.trainer.continuous_eval_timeout, - timeout_fn=timeout_fn) - else: - raise NotImplementedError('The mode is not implemented: %s' % mode) - - num_params = train_utils.try_count_params(trainer.model) - if num_params is not None: - logging.info('Number of trainable params in model: %f Millions.', - num_params / 10.**6) - - flops = train_utils.try_count_flops(trainer.model) - if flops is not None: - logging.info('FLOPs (multi-adds) in model: %f Billions.', - flops / 10.**9 / 2) - - if run_post_eval: - with distribution_strategy.scope(): - return trainer.model, trainer.evaluate( - tf.convert_to_tensor(params.trainer.validation_steps)) - else: - return trainer.model, {} + runner = OrbitExperimentRunner( + distribution_strategy=distribution_strategy, + task=task, + mode=mode, + params=params, + model_dir=model_dir, + run_post_eval=run_post_eval, + save_summary=save_summary, + train_actions=train_actions, + eval_actions=eval_actions, + trainer=trainer, + controller_cls=controller_cls, + ) + return runner.run() diff --git a/official/core/train_lib_test.py b/official/core/train_lib_test.py index 9c27054d539b009e43ba6f2e9b846d49e02357a1..dd87f5fee4ec4f74f391b63eaaf0fd2ea0aaac1b 100644 --- a/official/core/train_lib_test.py +++ b/official/core/train_lib_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. 
All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -117,6 +117,61 @@ class TrainTest(tf.test.TestCase, parameterized.TestCase): model_dir=model_dir, run_post_eval=run_post_eval) + @combinations.generate( + combinations.combine( + distribution_strategy=[ + strategy_combinations.default_strategy, + strategy_combinations.cloud_tpu_strategy, + strategy_combinations.one_device_strategy_gpu, + ], + flag_mode=['train', 'eval', 'train_and_eval'], + run_post_eval=[True, False])) + def test_end_to_end_class(self, distribution_strategy, flag_mode, + run_post_eval): + model_dir = self.get_temp_dir() + flags_dict = dict( + experiment='mock', + mode=flag_mode, + model_dir=model_dir, + params_override=json.dumps(self._test_config)) + with flagsaver.flagsaver(**flags_dict): + params = train_utils.parse_configuration(flags.FLAGS) + train_utils.serialize_config(params, model_dir) + with distribution_strategy.scope(): + task = task_factory.get_task(params.task, logging_dir=model_dir) + + _, logs = train_lib.OrbitExperimentRunner( + distribution_strategy=distribution_strategy, + task=task, + mode=flag_mode, + params=params, + model_dir=model_dir, + run_post_eval=run_post_eval).run() + + if 'eval' in flag_mode: + self.assertTrue( + tf.io.gfile.exists( + os.path.join(model_dir, + params.trainer.validation_summary_subdir))) + if run_post_eval: + self.assertNotEmpty(logs) + else: + self.assertEmpty(logs) + self.assertNotEmpty( + tf.io.gfile.glob(os.path.join(model_dir, 'params.yaml'))) + if flag_mode == 'eval': + return + self.assertNotEmpty( + tf.io.gfile.glob(os.path.join(model_dir, 'checkpoint'))) + # Tests continuous evaluation. 
+ _, logs = train_lib.OrbitExperimentRunner( + distribution_strategy=distribution_strategy, + task=task, + mode='continuous_eval', + params=params, + model_dir=model_dir, + run_post_eval=run_post_eval).run() + @combinations.generate( combinations.combine( distribution_strategy=[ @@ -148,12 +203,12 @@ class TrainTest(tf.test.TestCase, parameterized.TestCase): task.build_losses = build_losses with self.assertRaises(RuntimeError): - train_lib.run_experiment( + train_lib.OrbitExperimentRunner( distribution_strategy=distribution_strategy, task=task, mode=flag_mode, params=params, - model_dir=model_dir) + model_dir=model_dir).run() @combinations.generate( combinations.combine( @@ -194,12 +249,12 @@ class TrainTest(tf.test.TestCase, parameterized.TestCase): task.build_losses = build_losses - model, _ = train_lib.run_experiment( + model, _ = train_lib.OrbitExperimentRunner( distribution_strategy=distribution_strategy, task=task, mode=flag_mode, params=params, - model_dir=model_dir) + model_dir=model_dir).run() after_weights = model.get_weights() for left, right in zip(before_weights, after_weights): self.assertAllEqual(left, right) diff --git a/official/core/train_utils.py b/official/core/train_utils.py index 7672661b569d4ce72758f396118f2e3ed6632c3c..94d7bd70d32bbe3061bd6b0e47fa8165e8815c15 100644 --- a/official/core/train_utils.py +++ b/official/core/train_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -15,6 +15,7 @@ """Training utils.""" import copy import dataclasses +import inspect import json import os import pprint @@ -35,6 +36,9 @@ from official.core import exp_factory from official.modeling import hyperparams +BEST_CHECKPOINT_NAME = 'best_ckpt' + + def get_leaf_nested_dict(d: Dict[str, Any], keys: List[str]) -> Dict[str, Any]: """Get leaf from a dictionary with arbitrary depth with a list of keys. @@ -101,7 +105,6 @@ def maybe_create_best_ckpt_exporter(params: config_definitions.ExperimentConfig, return best_ckpt_exporter -# TODO(b/180147589): Add tests for this module. class BestCheckpointExporter: """Keeps track of the best result, and saves its checkpoint. @@ -138,7 +141,7 @@ class BestCheckpointExporter: checkpoint, directory=self._export_dir, max_to_keep=1, - checkpoint_name='best_ckpt') + checkpoint_name=BEST_CHECKPOINT_NAME) return self._checkpoint_manager @@ -209,6 +212,28 @@ class BestCheckpointExporter: return tf.train.latest_checkpoint(self._export_dir) +def create_optimizer(task: base_task.Task, + params: config_definitions.ExperimentConfig + ) -> tf.keras.optimizers.Optimizer: + """A create_optimizer util that stays backward compatible with new args.""" + if 'dp_config' in inspect.signature(task.create_optimizer).parameters: + dp_config = None + if hasattr(params.task, 'differential_privacy_config'): + dp_config = params.task.differential_privacy_config + optimizer = task.create_optimizer( + params.trainer.optimizer_config, params.runtime, + dp_config=dp_config) + else: + if hasattr(params.task, 'differential_privacy_config' + ) and params.task.differential_privacy_config is not None: + raise ValueError('Differential privacy config is specified but ' + 'task.create_optimizer API does not accept it.') + optimizer = task.create_optimizer( + params.trainer.optimizer_config, + params.runtime) + return optimizer + + @gin.configurable def create_trainer(params: config_definitions.ExperimentConfig, task: base_task.Task, @@ -219,8 +244,7 @@ def
create_trainer(params: config_definitions.ExperimentConfig, """Create trainer.""" logging.info('Running default trainer.') model = task.build_model() - optimizer = task.create_optimizer(params.trainer.optimizer_config, - params.runtime) + optimizer = create_optimizer(task, params) return trainer_cls( params, task, diff --git a/official/core/train_utils_test.py b/official/core/train_utils_test.py index 2010736aa2a9285bc55e4cd5194db297431f1385..dbc49d2b7d504967da04c1134a5abd91fb44494b 100644 --- a/official/core/train_utils_test.py +++ b/official/core/train_utils_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -13,6 +13,7 @@ # limitations under the License. """Tests for official.core.train_utils.""" +import json import os import pprint @@ -138,5 +139,60 @@ class TrainUtilsTest(tf.test.TestCase): self.assertEqual(params_from_obj.trainer.validation_steps, 11) +class BestCheckpointExporterTest(tf.test.TestCase): + + def test_maybe_export(self): + model_dir = self.create_tempdir().full_path + best_ckpt_path = os.path.join(model_dir, 'best_ckpt-1') + metric_name = 'test_metric|metric_1' + exporter = train_utils.BestCheckpointExporter( + model_dir, metric_name, 'higher') + v = tf.Variable(1.0) + checkpoint = tf.train.Checkpoint(v=v) + ret = exporter.maybe_export_checkpoint( + checkpoint, {'test_metric': {'metric_1': 5.0}}, 100) + with self.subTest(name='Successful first save.'): + self.assertEqual(ret, True) + v_2 = tf.Variable(2.0) + checkpoint_2 = tf.train.Checkpoint(v=v_2) + checkpoint_2.restore(best_ckpt_path) + self.assertEqual(v_2.numpy(), 1.0) + + v = tf.Variable(3.0) + checkpoint = tf.train.Checkpoint(v=v) + ret = exporter.maybe_export_checkpoint( + checkpoint, {'test_metric': {'metric_1': 6.0}}, 200) + with 
self.subTest(name='Successful better metric save.'): + self.assertEqual(ret, True) + v_2 = tf.Variable(2.0) + checkpoint_2 = tf.train.Checkpoint(v=v_2) + checkpoint_2.restore(best_ckpt_path) + self.assertEqual(v_2.numpy(), 3.0) + + v = tf.Variable(5.0) + checkpoint = tf.train.Checkpoint(v=v) + ret = exporter.maybe_export_checkpoint( + checkpoint, {'test_metric': {'metric_1': 1.0}}, 300) + with self.subTest(name='Worse metric no save.'): + self.assertEqual(ret, False) + v_2 = tf.Variable(2.0) + checkpoint_2 = tf.train.Checkpoint(v=v_2) + checkpoint_2.restore(best_ckpt_path) + self.assertEqual(v_2.numpy(), 3.0) + + def test_export_best_eval_metric(self): + model_dir = self.create_tempdir().full_path + metric_name = 'test_metric|metric_1' + exporter = train_utils.BestCheckpointExporter(model_dir, metric_name, + 'higher') + exporter.export_best_eval_metric({'test_metric': {'metric_1': 5.0}}, 100) + with tf.io.gfile.GFile(os.path.join(model_dir, 'info.json'), + 'rb') as reader: + metric = json.loads(reader.read()) + self.assertAllEqual( + metric, + {'test_metric': {'metric_1': 5.0}, 'best_ckpt_global_step': 100.0}) + + if __name__ == '__main__': tf.test.main() diff --git a/official/legacy/README.md b/official/legacy/README.md new file mode 100644 index 0000000000000000000000000000000000000000..ced1fce05dfcd8308d7bec8b01186a8804bc074f --- /dev/null +++ b/official/legacy/README.md @@ -0,0 +1,5 @@ +Models in this `legacy` directory are mainly used for benchmarking the +models. + +Please note that the models in this `legacy` directory are not supported to the same degree as +the models in official/nlp and official/vision. diff --git a/official/legacy/__init__.py b/official/legacy/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/__init__.py +++ b/official/legacy/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors.
All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/nlp/albert/README.md b/official/legacy/albert/README.md similarity index 100% rename from official/legacy/nlp/albert/README.md rename to official/legacy/albert/README.md diff --git a/official/legacy/albert/__init__.py b/official/legacy/albert/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/legacy/albert/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/legacy/albert/configs.py b/official/legacy/albert/configs.py new file mode 100644 index 0000000000000000000000000000000000000000..7baf693aee884d71021e55f64b6477bdd397ed68 --- /dev/null +++ b/official/legacy/albert/configs.py @@ -0,0 +1,50 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""The ALBERT configurations.""" + +import six + +from official.legacy.bert import configs + + +class AlbertConfig(configs.BertConfig): + """Configuration for `ALBERT`.""" + + def __init__(self, num_hidden_groups=1, inner_group_num=1, **kwargs): + """Constructs AlbertConfig. + + Args: + num_hidden_groups: Number of groups for the hidden layers; parameters in + the same group are shared. Note that this value and also the following + 'inner_group_num' have to be 1 for now, because all released ALBERT + models set them to 1. We may support arbitrary valid values in the future. + inner_group_num: Number of inner repetitions of attention and ffn. + **kwargs: The remaining arguments are the same as for 'BertConfig' above. + """ + super(AlbertConfig, self).__init__(**kwargs) + + # TODO(chendouble): 'inner_group_num' and 'num_hidden_groups' are always 1 + # in the released ALBERT. Support other values in AlbertEncoder if needed.
+ if inner_group_num != 1 or num_hidden_groups != 1: + raise ValueError("We only support 'inner_group_num' and " + "'num_hidden_groups' as 1.") + + @classmethod + def from_dict(cls, json_object): + """Constructs an `AlbertConfig` from a Python dictionary of parameters.""" + config = AlbertConfig(vocab_size=None) + for (key, value) in six.iteritems(json_object): + config.__dict__[key] = value + return config diff --git a/official/legacy/bert/README.md b/official/legacy/bert/README.md new file mode 100644 index 0000000000000000000000000000000000000000..cf4062a6dc39567e048232f84218ccd71e19b6fc --- /dev/null +++ b/official/legacy/bert/README.md @@ -0,0 +1,395 @@ +# BERT (Bidirectional Encoder Representations from Transformers) + +**WARNING**: We are in the process of deprecating most of the code in this directory. +Please see +[this link](../g3doc/tutorials/bert_new.md) +for the new tutorial and use the new code in `nlp/modeling`. This README is +still correct for this legacy implementation. + +The academic paper which describes BERT in detail and provides full results on a +number of tasks can be found here: https://arxiv.org/abs/1810.04805. + +This repository contains a TensorFlow 2.x implementation of BERT. + +## Contents + * [Contents](#contents) + * [Pre-trained Models](#pre-trained-models) + * [Restoring from Checkpoints](#restoring-from-checkpoints) + * [Set Up](#set-up) + * [Process Datasets](#process-datasets) + * [Fine-tuning with BERT](#fine-tuning-with-bert) + * [Cloud GPUs and TPUs](#cloud-gpus-and-tpus) + * [Sentence and Sentence-pair Classification Tasks](#sentence-and-sentence-pair-classification-tasks) + * [SQuAD 1.1](#squad-1.1) + + +## Pre-trained Models + +We released both checkpoints and tf.hub modules as the pretrained models for +fine-tuning.
They are TF 2.x compatible and are converted from the checkpoints +released in the official TF 1.x BERT repository +[google-research/bert](https://github.com/google-research/bert) +in order to stay consistent with the BERT paper. + + +### Access to Pretrained Checkpoints + +Pretrained checkpoints can be found at the following links: + +**Note: We have switched the BERT implementation +to use Keras functional-style networks in [nlp/modeling](../modeling). +The new checkpoints are:** + +* **[`BERT-Large, Uncased (Whole Word Masking)`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/wwm_uncased_L-24_H-1024_A-16.tar.gz)**: + 24-layer, 1024-hidden, 16-heads, 340M parameters +* **[`BERT-Large, Cased (Whole Word Masking)`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/wwm_cased_L-24_H-1024_A-16.tar.gz)**: + 24-layer, 1024-hidden, 16-heads, 340M parameters +* **[`BERT-Base, Uncased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12.tar.gz)**: + 12-layer, 768-hidden, 12-heads, 110M parameters +* **[`BERT-Large, Uncased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16.tar.gz)**: + 24-layer, 1024-hidden, 16-heads, 340M parameters +* **[`BERT-Base, Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/cased_L-12_H-768_A-12.tar.gz)**: + 12-layer, 768-hidden, 12-heads, 110M parameters +* **[`BERT-Large, Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/cased_L-24_H-1024_A-16.tar.gz)**: + 24-layer, 1024-hidden, 16-heads, 340M parameters +* **[`BERT-Base, Multilingual Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/multi_cased_L-12_H-768_A-12.tar.gz)**: + 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters + +We recommend hosting checkpoints in Google Cloud Storage buckets when you use +Cloud GPU/TPU.
+ +### Restoring from Checkpoints + +`tf.train.Checkpoint` is used to manage model checkpoints in TF 2. To restore +weights from the provided pre-trained checkpoints, you can use the following code: + +```python +init_checkpoint = 'the pretrained model checkpoint path.' +model = tf.keras.Model() # Bert pre-trained model as feature extractor. +checkpoint = tf.train.Checkpoint(model=model) +checkpoint.restore(init_checkpoint) +``` + +Checkpoints featuring native serialized Keras models +(i.e. `tf.keras.models.load_model()`/`load_weights()`) will be available soon. + +### Access to Pretrained hub modules + +Pretrained tf.hub modules in TF 2.x SavedModel format can be found at the +following links: + +* **[`BERT-Large, Uncased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_uncased_L-24_H-1024_A-16/)**: + 24-layer, 1024-hidden, 16-heads, 340M parameters +* **[`BERT-Large, Cased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/)**: + 24-layer, 1024-hidden, 16-heads, 340M parameters +* **[`BERT-Base, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/)**: + 12-layer, 768-hidden, 12-heads, 110M parameters +* **[`BERT-Large, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/)**: + 24-layer, 1024-hidden, 16-heads, 340M parameters +* **[`BERT-Base, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/)**: + 12-layer, 768-hidden, 12-heads, 110M parameters +* **[`BERT-Large, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-24_H-1024_A-16/)**: + 24-layer, 1024-hidden, 16-heads, 340M parameters +* **[`BERT-Base, Multilingual Cased`](https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/)**: + 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters +* **[`BERT-Base, Chinese`](https://tfhub.dev/tensorflow/bert_zh_L-12_H-768_A-12/)**: + Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, + 110M parameters + +## Set Up + +```shell +export
PYTHONPATH="$PYTHONPATH:/path/to/models" +``` + +Install `tf-nightly` to get the latest updates: + +```shell +pip install tf-nightly-gpu +``` + +With TPU, GPU support is not necessary. First, you need to create a `tf-nightly` +TPU with the [ctpu tool](https://github.com/tensorflow/tpu/tree/master/tools/ctpu): + +```shell +ctpu up -name --tf-version="nightly" +``` + +Second, you need to install TF 2 `tf-nightly` on your VM: + +```shell +pip install tf-nightly +``` + +## Process Datasets + +### Pre-training + +The pre-training data generation process is unchanged. Please use the script +[`../data/create_pretraining_data.py`](../data/create_pretraining_data.py) +which is essentially branched from the [BERT research repo](https://github.com/google-research/bert) +to get processed pre-training data; it has been adapted for TF2 symbols and Python 3 +compatibility. + +Running the pre-training script requires an input and output directory, as well as a vocab file. Note that max_seq_length will need to match the sequence length parameter you specify when you run pre-training. + +Example shell script to call create_pretraining_data.py: +``` +export WORKING_DIR='local disk or cloud location' +export BERT_DIR='local disk or cloud location' +python models/official/nlp/data/create_pretraining_data.py \ + --input_file=$WORKING_DIR/input/input.txt \ + --output_file=$WORKING_DIR/output/tf_examples.tfrecord \ + --vocab_file=$BERT_DIR/wwm_uncased_L-24_H-1024_A-16/vocab.txt \ + --do_lower_case=True \ + --max_seq_length=512 \ + --max_predictions_per_seq=76 \ + --masked_lm_prob=0.15 \ + --random_seed=12345 \ + --dupe_factor=5 +``` + +### Fine-tuning + +To prepare the fine-tuning data for final model training, use the +[`../data/create_finetuning_data.py`](../data/create_finetuning_data.py) script. +The resulting datasets in `tf_record` format and training meta data should later be +passed to training or evaluation scripts.
The task-specific arguments are +described in the following sections: + +* GLUE + +Users can download the +[GLUE data](https://gluebenchmark.com/tasks) by running +[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e) +and unpack it into a directory `$GLUE_DIR`. +Alternatively, users can download a [pretrained checkpoint](#access-to-pretrained-checkpoints) and place it in a local directory `$BERT_DIR` instead of using checkpoints on Google Cloud Storage. + +```shell +export GLUE_DIR=~/glue +export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16 + +export TASK_NAME=MNLI +export OUTPUT_DIR=gs://some_bucket/datasets +python ../data/create_finetuning_data.py \ + --input_data_dir=${GLUE_DIR}/${TASK_NAME}/ \ + --vocab_file=${BERT_DIR}/vocab.txt \ + --train_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \ + --eval_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \ + --meta_data_file_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \ + --fine_tuning_task_type=classification --max_seq_length=128 \ + --classification_task_name=${TASK_NAME} +``` + +* SQuAD + +The [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/) contains +detailed information about the SQuAD datasets and evaluation.
+ +The necessary files can be found here: + +* [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json) +* [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json) +* [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py) +* [train-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json) +* [dev-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json) +* [evaluate-v2.0.py](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/) + +```shell +export SQUAD_DIR=~/squad +export SQUAD_VERSION=v1.1 +export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16 +export OUTPUT_DIR=gs://some_bucket/datasets + +python ../data/create_finetuning_data.py \ + --squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \ + --vocab_file=${BERT_DIR}/vocab.txt \ + --train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \ + --meta_data_file_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \ + --fine_tuning_task_type=squad --max_seq_length=384 +``` + +Note: To create fine-tuning data with SQuAD 2.0, you need to add the flag `--version_2_with_negative=True`. + +## Fine-tuning with BERT + +### Cloud GPUs and TPUs + +* Cloud Storage + +The unzipped pre-trained model files can also be found in the Google Cloud +Storage folder `gs://cloud-tpu-checkpoints/bert/keras_bert`. For example: + +```shell +export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16 +export MODEL_DIR=gs://some_bucket/my_output_dir +``` + +Currently, users can access `tf-nightly` TPUs, and the following TPU +script should be run with `tf-nightly`.
+ +* GPU -> TPU + +Just add the following flags to `run_classifier.py` or `run_squad.py`: + +```shell + --distribution_strategy=tpu + --tpu=grpc://${TPU_IP_ADDRESS}:8470 +``` + +### Sentence and Sentence-pair Classification Tasks + +This example code fine-tunes `BERT-Large` on the Microsoft Research Paraphrase +Corpus (MRPC), which only contains 3,600 examples and can fine-tune in a +few minutes on most GPUs. + +We use `BERT-Large` (uncased_L-24_H-1024_A-16) as an example throughout the +workflow. +With 16GB of GPU memory or less, you may try to use `BERT-Base` +(uncased_L-12_H-768_A-12). + +```shell +export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16 +export MODEL_DIR=gs://some_bucket/my_output_dir +export GLUE_DIR=gs://some_bucket/datasets +export TASK=MRPC + +python run_classifier.py \ + --mode='train_and_eval' \ + --input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \ + --train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \ + --eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \ + --bert_config_file=${BERT_DIR}/bert_config.json \ + --init_checkpoint=${BERT_DIR}/bert_model.ckpt \ + --train_batch_size=4 \ + --eval_batch_size=4 \ + --steps_per_loop=1 \ + --learning_rate=2e-5 \ + --num_train_epochs=3 \ + --model_dir=${MODEL_DIR} \ + --distribution_strategy=mirrored +``` + +Alternatively, instead of specifying `init_checkpoint`, you can specify +`hub_module_url` to employ a pretrained BERT hub module, e.g., +`--hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/1`. + +After training a model, to get predictions from the classifier, you can set +`--mode=predict` and pass the test set TFRecords to `--eval_data_path`. +Output will be written to a file called `test_results.tsv` in the output folder. +Each line contains the output for one sample; the columns are the class +probabilities.
+ +```shell +python run_classifier.py \ + --mode='predict' \ + --input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \ + --eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \ + --bert_config_file=${BERT_DIR}/bert_config.json \ + --eval_batch_size=4 \ + --model_dir=${MODEL_DIR} \ + --distribution_strategy=mirrored +``` + +To use TPU, you only need to switch the distribution strategy type to `tpu`, provide TPU +information, and use remote storage for model checkpoints. + +```shell +export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16 +export TPU_IP_ADDRESS='???' +export MODEL_DIR=gs://some_bucket/my_output_dir +export GLUE_DIR=gs://some_bucket/datasets +export TASK=MRPC + +python run_classifier.py \ + --mode='train_and_eval' \ + --input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \ + --train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \ + --eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \ + --bert_config_file=${BERT_DIR}/bert_config.json \ + --init_checkpoint=${BERT_DIR}/bert_model.ckpt \ + --train_batch_size=32 \ + --eval_batch_size=32 \ + --steps_per_loop=1000 \ + --learning_rate=2e-5 \ + --num_train_epochs=3 \ + --model_dir=${MODEL_DIR} \ + --distribution_strategy=tpu \ + --tpu=grpc://${TPU_IP_ADDRESS}:8470 +``` + +Note that we specify `steps_per_loop=1000` for TPU because running a loop of +training steps inside a `tf.function` can significantly increase TPU utilization. +Callbacks are not called inside the loop. + +### SQuAD 1.1 + +The Stanford Question Answering Dataset (SQuAD) is a popular question answering +benchmark dataset. See the [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/) for more details. + +We use `BERT-Large` (uncased_L-24_H-1024_A-16) as an example throughout the +workflow. +With 16GB of GPU memory or less, you may try to use `BERT-Base` +(uncased_L-12_H-768_A-12).
+ +```shell +export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16 +export SQUAD_DIR=gs://some_bucket/datasets +export MODEL_DIR=gs://some_bucket/my_output_dir +export SQUAD_VERSION=v1.1 + +python run_squad.py \ + --input_meta_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_meta_data \ + --train_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_train.tf_record \ + --predict_file=${SQUAD_DIR}/dev-v1.1.json \ + --vocab_file=${BERT_DIR}/vocab.txt \ + --bert_config_file=${BERT_DIR}/bert_config.json \ + --init_checkpoint=${BERT_DIR}/bert_model.ckpt \ + --train_batch_size=4 \ + --predict_batch_size=4 \ + --learning_rate=8e-5 \ + --num_train_epochs=2 \ + --model_dir=${MODEL_DIR} \ + --distribution_strategy=mirrored +``` + +Similarly, you can replace the `init_checkpoint` flag with `hub_module_url` to +specify a hub module path. + +`run_squad.py` writes predictions for `--predict_file` by default. If you set +`--mode=predict` and provide the SQuAD test data, the script will generate +the prediction JSON file. + +To use TPU, you need to switch the distribution strategy type to `tpu` and provide TPU +information. + +```shell +export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16 +export TPU_IP_ADDRESS='???'
+export MODEL_DIR=gs://some_bucket/my_output_dir +export SQUAD_DIR=gs://some_bucket/datasets +export SQUAD_VERSION=v1.1 + +python run_squad.py \ + --input_meta_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_meta_data \ + --train_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_train.tf_record \ + --predict_file=${SQUAD_DIR}/dev-v1.1.json \ + --vocab_file=${BERT_DIR}/vocab.txt \ + --bert_config_file=${BERT_DIR}/bert_config.json \ + --init_checkpoint=${BERT_DIR}/bert_model.ckpt \ + --train_batch_size=32 \ + --learning_rate=8e-5 \ + --num_train_epochs=2 \ + --model_dir=${MODEL_DIR} \ + --distribution_strategy=tpu \ + --tpu=grpc://${TPU_IP_ADDRESS}:8470 +``` + +The dev set predictions will be saved into a file called predictions.json in the +model_dir: + +```shell +python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ./squad/predictions.json +``` + + diff --git a/official/legacy/bert/__init__.py b/official/legacy/bert/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ba97902e7ec1e12871c0fad301b9ce48c92cf1d1 --- /dev/null +++ b/official/legacy/bert/__init__.py @@ -0,0 +1,15 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ + diff --git a/official/nlp/bert/bert_cloud_tpu.md b/official/legacy/bert/bert_cloud_tpu.md similarity index 100% rename from official/nlp/bert/bert_cloud_tpu.md rename to official/legacy/bert/bert_cloud_tpu.md diff --git a/official/nlp/bert/bert_models.py b/official/legacy/bert/bert_models.py similarity index 98% rename from official/nlp/bert/bert_models.py rename to official/legacy/bert/bert_models.py index a1061e6c893a64183ae1a83d8fcd6cd4fb1e3ec8..21d095174cc3c8af00ae2b2a0a601c1dae8d3f6b 100644 --- a/official/nlp/bert/bert_models.py +++ b/official/legacy/bert/bert_models.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,9 +17,9 @@ import gin import tensorflow as tf import tensorflow_hub as hub -from official.legacy.nlp.albert import configs as albert_configs +from official.legacy.albert import configs as albert_configs +from official.legacy.bert import configs from official.modeling import tf_utils -from official.nlp.bert import configs from official.nlp.modeling import models from official.nlp.modeling import networks diff --git a/official/nlp/bert/bert_models_test.py b/official/legacy/bert/bert_models_test.py similarity index 95% rename from official/nlp/bert/bert_models_test.py rename to official/legacy/bert/bert_models_test.py index 8c4a52a20d343e3d7cc5f0ccac250d5f4f036667..e64c013c40d2724e02ffbf1ab75b2269928fb000 100644 --- a/official/nlp/bert/bert_models_test.py +++ b/official/legacy/bert/bert_models_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -14,8 +14,8 @@ import tensorflow as tf -from official.nlp.bert import bert_models -from official.nlp.bert import configs as bert_configs +from official.legacy.bert import bert_models +from official.legacy.bert import configs as bert_configs from official.nlp.modeling import networks diff --git a/official/legacy/bert/common_flags.py b/official/legacy/bert/common_flags.py new file mode 100644 index 0000000000000000000000000000000000000000..32ad7059f04e7b17a894de8305df353e3304440e --- /dev/null +++ b/official/legacy/bert/common_flags.py @@ -0,0 +1,125 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Defining common flags used across all BERT models/applications.""" + +from absl import flags +import tensorflow as tf + +from official.utils import hyperparams_flags +from official.utils.flags import core as flags_core + + +def define_common_bert_flags(): + """Define common flags for BERT tasks.""" + flags_core.define_base( + data_dir=False, + model_dir=True, + clean=False, + train_epochs=False, + epochs_between_evals=False, + stop_threshold=False, + batch_size=False, + num_gpu=True, + export_dir=False, + distribution_strategy=True, + run_eagerly=True) + flags_core.define_distribution() + flags.DEFINE_string('bert_config_file', None, + 'Bert configuration file to define core bert layers.') + flags.DEFINE_string( + 'model_export_path', None, + 'Path to the directory where the trained model will be ' + 'exported.') + flags.DEFINE_string('tpu', '', 'TPU address to connect to.') + flags.DEFINE_string( + 'init_checkpoint', None, + 'Initial checkpoint (usually from a pre-trained BERT model).') + flags.DEFINE_integer('num_train_epochs', 3, + 'Total number of training epochs to perform.') + flags.DEFINE_integer( + 'steps_per_loop', None, + 'Number of steps per graph-mode loop. Only the training step ' + 'happens inside the loop. Callbacks will not be called ' + 'inside. If not set, the value will be configured depending on the ' + 'devices available.') + flags.DEFINE_float('learning_rate', 5e-5, + 'The initial learning rate for Adam.') + flags.DEFINE_float('end_lr', 0.0, + 'The end learning rate for learning rate decay.') + flags.DEFINE_string('optimizer_type', 'adamw', + 'The type of optimizer to use for training (adamw|lamb)') + flags.DEFINE_boolean( + 'scale_loss', False, + 'Whether to divide the loss by the number of replicas inside the ' + 'per-replica loss function.') + flags.DEFINE_boolean( + 'use_keras_compile_fit', False, + 'If True, uses Keras compile/fit() API for training logic.
Otherwise ' + 'use custom training loop.') + flags.DEFINE_string( + 'hub_module_url', None, 'TF-Hub path/url to Bert module. ' + 'If specified, init_checkpoint flag should not be used.') + flags.DEFINE_bool('hub_module_trainable', True, + 'True to make keras layers in the hub module trainable.') + flags.DEFINE_string( + 'sub_model_export_name', None, + 'If set, `sub_model` checkpoints are exported into ' + 'FLAGS.model_dir/FLAGS.sub_model_export_name.') + flags.DEFINE_bool('explicit_allreduce', False, + 'True to use explicit allreduce instead of the implicit ' + 'allreduce in optimizer.apply_gradients(). If fp16 mixed ' + 'precision training is used, this also enables allreduce ' + 'gradients in fp16.') + flags.DEFINE_integer('allreduce_bytes_per_pack', 0, + 'Number of bytes of a gradient pack for allreduce. ' + 'Should be positive integer, if set to 0, all ' + 'gradients are in one pack. Breaking gradient into ' + 'packs could enable overlap between allreduce and ' + 'backprop computation. This flag only takes effect ' + 'when explicit_allreduce is set to True.') + + flags_core.define_log_steps() + + # Adds flags for mixed precision and multi-worker training. + flags_core.define_performance( + num_parallel_calls=False, + inter_op=False, + intra_op=False, + synthetic_data=False, + max_train_steps=False, + dtype=True, + loss_scale=True, + all_reduce_alg=True, + num_packs=False, + tf_gpu_thread_mode=True, + datasets_num_private_threads=True, + enable_xla=True, + fp16_implementation=True, + ) + + # Adds gin configuration flags. 
+ hyperparams_flags.define_gin_flags() + + +def dtype(): + return flags_core.get_tf_dtype(flags.FLAGS) + + +def use_float16(): + return flags_core.get_tf_dtype(flags.FLAGS) == tf.float16 + + +def get_loss_scale(): + return flags_core.get_loss_scale(flags.FLAGS, default_for_fp16='dynamic') diff --git a/official/legacy/bert/configs.py b/official/legacy/bert/configs.py new file mode 100644 index 0000000000000000000000000000000000000000..bbded1932654a9884e1404ab9543b7955d494aef --- /dev/null +++ b/official/legacy/bert/configs.py @@ -0,0 +1,104 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""The main BERT model and related functions.""" + +import copy +import json + +import six +import tensorflow as tf + + +class BertConfig(object): + """Configuration for `BertModel`.""" + + def __init__(self, + vocab_size, + hidden_size=768, + num_hidden_layers=12, + num_attention_heads=12, + intermediate_size=3072, + hidden_act="gelu", + hidden_dropout_prob=0.1, + attention_probs_dropout_prob=0.1, + max_position_embeddings=512, + type_vocab_size=16, + initializer_range=0.02, + embedding_size=None, + backward_compatible=True): + """Constructs BertConfig. + + Args: + vocab_size: Vocabulary size of `inputs_ids` in `BertModel`. + hidden_size: Size of the encoder layers and the pooler layer. + num_hidden_layers: Number of hidden layers in the Transformer encoder. 
+ num_attention_heads: Number of attention heads for each attention layer in + the Transformer encoder. + intermediate_size: The size of the "intermediate" (i.e., feed-forward) + layer in the Transformer encoder. + hidden_act: The non-linear activation function (function or string) in the + encoder and pooler. + hidden_dropout_prob: The dropout probability for all fully connected + layers in the embeddings, encoder, and pooler. + attention_probs_dropout_prob: The dropout ratio for the attention + probabilities. + max_position_embeddings: The maximum sequence length that this model might + ever be used with. Typically set this to something large just in case + (e.g., 512 or 1024 or 2048). + type_vocab_size: The vocabulary size of the `token_type_ids` passed into + `BertModel`. + initializer_range: The stdev of the truncated_normal_initializer for + initializing all weight matrices. + embedding_size: (Optional) width of the factorized word embeddings. + backward_compatible: Boolean, whether the variable shapes are compatible + with checkpoints converted from TF 1.x BERT.
+ """ + self.vocab_size = vocab_size + self.hidden_size = hidden_size + self.num_hidden_layers = num_hidden_layers + self.num_attention_heads = num_attention_heads + self.hidden_act = hidden_act + self.intermediate_size = intermediate_size + self.hidden_dropout_prob = hidden_dropout_prob + self.attention_probs_dropout_prob = attention_probs_dropout_prob + self.max_position_embeddings = max_position_embeddings + self.type_vocab_size = type_vocab_size + self.initializer_range = initializer_range + self.embedding_size = embedding_size + self.backward_compatible = backward_compatible + + @classmethod + def from_dict(cls, json_object): + """Constructs a `BertConfig` from a Python dictionary of parameters.""" + config = BertConfig(vocab_size=None) + for (key, value) in six.iteritems(json_object): + config.__dict__[key] = value + return config + + @classmethod + def from_json_file(cls, json_file): + """Constructs a `BertConfig` from a json file of parameters.""" + with tf.io.gfile.GFile(json_file, "r") as reader: + text = reader.read() + return cls.from_dict(json.loads(text)) + + def to_dict(self): + """Serializes this instance to a Python dictionary.""" + output = copy.deepcopy(self.__dict__) + return output + + def to_json_string(self): + """Serializes this instance to a JSON string.""" + return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n" diff --git a/official/legacy/bert/export_tfhub.py b/official/legacy/bert/export_tfhub.py new file mode 100644 index 0000000000000000000000000000000000000000..69dd49865e2490a77624374e48976b48bfabeae5 --- /dev/null +++ b/official/legacy/bert/export_tfhub.py @@ -0,0 +1,139 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""A script to export BERT as a TF-Hub SavedModel. + +This script is **DEPRECATED** for exporting BERT encoder models; +see the error message in main() for details. +""" + +from typing import Text + +# Import libraries +from absl import app +from absl import flags +from absl import logging +import tensorflow as tf +from official.legacy.bert import bert_models +from official.legacy.bert import configs + +FLAGS = flags.FLAGS + +flags.DEFINE_string("bert_config_file", None, + "Bert configuration file to define core bert layers.") +flags.DEFINE_string("model_checkpoint_path", None, + "File path to TF model checkpoint.") +flags.DEFINE_string("export_path", None, "TF-Hub SavedModel destination path.") +flags.DEFINE_string("vocab_file", None, + "The vocabulary file that the BERT model was trained on.") +flags.DEFINE_bool( + "do_lower_case", None, "Whether to lowercase. If None, " + "do_lower_case will be enabled if 'uncased' appears in the " + "name of --vocab_file") +flags.DEFINE_enum("model_type", "encoder", ["encoder", "squad"], + "What kind of BERT model to export.") + + +def create_bert_model(bert_config: configs.BertConfig) -> tf.keras.Model: + """Creates a BERT keras core model from BERT configuration. + + Args: + bert_config: A `BertConfig` to create the core model. + + Returns: + A tuple of the Keras core model and the transformer encoder. + """ + # Adds input layers just as placeholders.
+ input_word_ids = tf.keras.layers.Input( + shape=(None,), dtype=tf.int32, name="input_word_ids") + input_mask = tf.keras.layers.Input( + shape=(None,), dtype=tf.int32, name="input_mask") + input_type_ids = tf.keras.layers.Input( + shape=(None,), dtype=tf.int32, name="input_type_ids") + transformer_encoder = bert_models.get_transformer_encoder( + bert_config, sequence_length=None) + sequence_output, pooled_output = transformer_encoder( + [input_word_ids, input_mask, input_type_ids]) + # To keep consistent with legacy hub modules, the outputs are + # "pooled_output" and "sequence_output". + return tf.keras.Model( + inputs=[input_word_ids, input_mask, input_type_ids], + outputs=[pooled_output, sequence_output]), transformer_encoder + + +def export_bert_tfhub(bert_config: configs.BertConfig, + model_checkpoint_path: Text, + hub_destination: Text, + vocab_file: Text, + do_lower_case: bool = None): + """Restores a tf.keras.Model and saves for TF-Hub.""" + # If do_lower_case is not explicit, default to checking whether "uncased" is + # in the vocab file name + if do_lower_case is None: + do_lower_case = "uncased" in vocab_file + logging.info("Using do_lower_case=%s based on name of vocab_file=%s", + do_lower_case, vocab_file) + core_model, encoder = create_bert_model(bert_config) + checkpoint = tf.train.Checkpoint( + model=encoder, # Legacy checkpoints. 
+ encoder=encoder) + checkpoint.restore(model_checkpoint_path).assert_existing_objects_matched() + core_model.vocab_file = tf.saved_model.Asset(vocab_file) + core_model.do_lower_case = tf.Variable(do_lower_case, trainable=False) + core_model.save(hub_destination, include_optimizer=False, save_format="tf") + + +def export_bert_squad_tfhub(bert_config: configs.BertConfig, + model_checkpoint_path: Text, + hub_destination: Text, + vocab_file: Text, + do_lower_case: bool = None): + """Restores a tf.keras.Model for BERT with SQuAD and saves for TF-Hub.""" + # If do_lower_case is not explicit, default to checking whether "uncased" is + # in the vocab file name + if do_lower_case is None: + do_lower_case = "uncased" in vocab_file + logging.info("Using do_lower_case=%s based on name of vocab_file=%s", + do_lower_case, vocab_file) + span_labeling, _ = bert_models.squad_model(bert_config, max_seq_length=None) + checkpoint = tf.train.Checkpoint(model=span_labeling) + checkpoint.restore(model_checkpoint_path).assert_existing_objects_matched() + span_labeling.vocab_file = tf.saved_model.Asset(vocab_file) + span_labeling.do_lower_case = tf.Variable(do_lower_case, trainable=False) + span_labeling.save(hub_destination, include_optimizer=False, save_format="tf") + + +def main(_): + bert_config = configs.BertConfig.from_json_file(FLAGS.bert_config_file) + if FLAGS.model_type == "encoder": + deprecation_note = ( + "nlp/bert/export_tfhub is **DEPRECATED** for exporting BERT encoder " + "models. 
Please switch to nlp/tools/export_tfhub for exporting BERT " + "(and other) encoders with dict inputs/outputs conforming to " + "https://www.tensorflow.org/hub/common_saved_model_apis/text#transformer-encoders" + ) + logging.error(deprecation_note) + print("\n\nNOTICE:", deprecation_note, "\n") + export_bert_tfhub(bert_config, FLAGS.model_checkpoint_path, + FLAGS.export_path, FLAGS.vocab_file, FLAGS.do_lower_case) + elif FLAGS.model_type == "squad": + export_bert_squad_tfhub(bert_config, FLAGS.model_checkpoint_path, + FLAGS.export_path, FLAGS.vocab_file, + FLAGS.do_lower_case) + else: + raise ValueError("Unsupported model_type %s." % FLAGS.model_type) + + +if __name__ == "__main__": + app.run(main) diff --git a/official/legacy/bert/export_tfhub_test.py b/official/legacy/bert/export_tfhub_test.py new file mode 100644 index 0000000000000000000000000000000000000000..68146fb58146c489abcabc62b6c15514c1cb28d7 --- /dev/null +++ b/official/legacy/bert/export_tfhub_test.py @@ -0,0 +1,108 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests official.legacy.bert.export_tfhub.""" + +import os + +from absl.testing import parameterized +import numpy as np +import tensorflow as tf +import tensorflow_hub as hub + +from official.legacy.bert import configs +from official.legacy.bert import export_tfhub + + +class ExportTfhubTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.parameters("model", "encoder") + def test_export_tfhub(self, ckpt_key_name): + # Exports a SavedModel for TF-Hub. + hidden_size = 16 + bert_config = configs.BertConfig( + vocab_size=100, + hidden_size=hidden_size, + intermediate_size=32, + max_position_embeddings=128, + num_attention_heads=2, + num_hidden_layers=1) + bert_model, encoder = export_tfhub.create_bert_model(bert_config) + model_checkpoint_dir = os.path.join(self.get_temp_dir(), "checkpoint") + checkpoint = tf.train.Checkpoint(**{ckpt_key_name: encoder}) + checkpoint.save(os.path.join(model_checkpoint_dir, "test")) + model_checkpoint_path = tf.train.latest_checkpoint(model_checkpoint_dir) + + vocab_file = os.path.join(self.get_temp_dir(), "uncased_vocab.txt") + with tf.io.gfile.GFile(vocab_file, "w") as f: + f.write("dummy content") + + hub_destination = os.path.join(self.get_temp_dir(), "hub") + export_tfhub.export_bert_tfhub(bert_config, model_checkpoint_path, + hub_destination, vocab_file) + + # Restores a hub KerasLayer. + hub_layer = hub.KerasLayer(hub_destination, trainable=True) + + if hasattr(hub_layer, "resolved_object"): + # Checks meta attributes. + self.assertTrue(hub_layer.resolved_object.do_lower_case.numpy()) + with tf.io.gfile.GFile( + hub_layer.resolved_object.vocab_file.asset_path.numpy()) as f: + self.assertEqual("dummy content", f.read()) + # Checks the hub KerasLayer.
+ for source_weight, hub_weight in zip(bert_model.trainable_weights, + hub_layer.trainable_weights): + self.assertAllClose(source_weight.numpy(), hub_weight.numpy()) + + seq_length = 10 + dummy_ids = np.zeros((2, seq_length), dtype=np.int32) + hub_outputs = hub_layer([dummy_ids, dummy_ids, dummy_ids]) + source_outputs = bert_model([dummy_ids, dummy_ids, dummy_ids]) + + # The outputs of the hub module are "pooled_output" and "sequence_output", + # while the outputs of the encoder are in reversed order, i.e., + # "sequence_output" and "pooled_output". + encoder_outputs = reversed(encoder([dummy_ids, dummy_ids, dummy_ids])) + self.assertEqual(hub_outputs[0].shape, (2, hidden_size)) + self.assertEqual(hub_outputs[1].shape, (2, seq_length, hidden_size)) + for source_output, hub_output, encoder_output in zip( + source_outputs, hub_outputs, encoder_outputs): + self.assertAllClose(source_output.numpy(), hub_output.numpy()) + self.assertAllClose(source_output.numpy(), encoder_output.numpy()) + + # Test that training=True makes a difference (activates dropout). + def _dropout_mean_stddev(training, num_runs=20): + input_ids = np.array([[14, 12, 42, 95, 99]], np.int32) + inputs = [input_ids, np.ones_like(input_ids), np.zeros_like(input_ids)] + outputs = np.concatenate( + [hub_layer(inputs, training=training)[0] for _ in range(num_runs)]) + return np.mean(np.std(outputs, axis=0)) + + self.assertLess(_dropout_mean_stddev(training=False), 1e-6) + self.assertGreater(_dropout_mean_stddev(training=True), 1e-3) + + # Test propagation of seq_length in shape inference.
+ input_word_ids = tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32) + input_mask = tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32) + input_type_ids = tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32) + pooled_output, sequence_output = hub_layer( + [input_word_ids, input_mask, input_type_ids]) + self.assertEqual(pooled_output.shape.as_list(), [None, hidden_size]) + self.assertEqual(sequence_output.shape.as_list(), + [None, seq_length, hidden_size]) + + +if __name__ == "__main__": + tf.test.main() diff --git a/official/nlp/bert/input_pipeline.py b/official/legacy/bert/input_pipeline.py similarity index 99% rename from official/nlp/bert/input_pipeline.py rename to official/legacy/bert/input_pipeline.py index 0c0f7615c37142ca039ad9fc68d98776a6b6b7b8..045f16ce76b165cbc329b8a2459b5f575a52eaae 100644 --- a/official/nlp/bert/input_pipeline.py +++ b/official/legacy/bert/input_pipeline.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/bert/model_saving_utils.py b/official/legacy/bert/model_saving_utils.py similarity index 97% rename from official/nlp/bert/model_saving_utils.py rename to official/legacy/bert/model_saving_utils.py index 1d69750878bd8a89482958874b5f059193f6d7f5..6a0d7074972ac8f5c78c1bec135b5bcd2586e317 100644 --- a/official/nlp/bert/model_saving_utils.py +++ b/official/legacy/bert/model_saving_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -15,10 +15,9 @@ """Utilities to save models.""" import os - +import typing from absl import logging import tensorflow as tf -import typing def export_bert_model(model_export_path: typing.Text, diff --git a/official/nlp/bert/model_training_utils.py b/official/legacy/bert/model_training_utils.py similarity index 99% rename from official/nlp/bert/model_training_utils.py rename to official/legacy/bert/model_training_utils.py index 8cc11993bab397442c927ddcd399c2c620093205..f7c8e443be3798dd2d9670d11d0c789a91459768 100644 --- a/official/nlp/bert/model_training_utils.py +++ b/official/legacy/bert/model_training_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/bert/model_training_utils_test.py b/official/legacy/bert/model_training_utils_test.py similarity index 98% rename from official/nlp/bert/model_training_utils_test.py rename to official/legacy/bert/model_training_utils_test.py index 544b66834002d09dfabd90169e6f53fa9f2bbaf3..298c9282c859ee288540b497ba63838ca6cf6242 100644 --- a/official/nlp/bert/model_training_utils_test.py +++ b/official/legacy/bert/model_training_utils_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -25,8 +25,8 @@ import tensorflow as tf from tensorflow.python.distribute import combinations from tensorflow.python.distribute import strategy_combinations -from official.nlp.bert import common_flags -from official.nlp.bert import model_training_utils +from official.legacy.bert import common_flags +from official.legacy.bert import model_training_utils common_flags.define_common_bert_flags() diff --git a/official/legacy/bert/run_classifier.py b/official/legacy/bert/run_classifier.py new file mode 100644 index 0000000000000000000000000000000000000000..6e9ea466bcee0ec228bf9be5b43f5393307e30c8 --- /dev/null +++ b/official/legacy/bert/run_classifier.py @@ -0,0 +1,515 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""BERT classification or regression finetuning runner in TF 2.x.""" + +import functools +import json +import math +import os + +# Import libraries +from absl import app +from absl import flags +from absl import logging +import gin +import tensorflow as tf +from official.common import distribute_utils +from official.legacy.bert import bert_models +from official.legacy.bert import common_flags +from official.legacy.bert import configs as bert_configs +from official.legacy.bert import input_pipeline +from official.legacy.bert import model_saving_utils +from official.modeling import performance +from official.nlp import optimization +from official.utils.misc import keras_utils + +flags.DEFINE_enum( + 'mode', 'train_and_eval', ['train_and_eval', 'export_only', 'predict'], + 'One of {"train_and_eval", "export_only", "predict"}. `train_and_eval`: ' + 'trains the model and evaluates in the meantime. ' + '`export_only`: will take the latest checkpoint inside ' + 'model_dir and export a `SavedModel`. `predict`: takes a checkpoint and ' + 'restores the model to output predictions on the test set.') +flags.DEFINE_string('train_data_path', None, + 'Path to training data for BERT classifier.') +flags.DEFINE_string('eval_data_path', None, + 'Path to evaluation data for BERT classifier.') +flags.DEFINE_string( + 'input_meta_data_path', None, + 'Path to file that contains meta data about input ' + 'to be used for training and evaluation.') +flags.DEFINE_integer('train_data_size', None, 'Number of training samples ' + 'to use. If None, uses the full train data. ' + '(default: None).') +flags.DEFINE_string('predict_checkpoint_path', None, + 'Path to the checkpoint for predictions.') +flags.DEFINE_integer( + 'num_eval_per_epoch', 1, + 'Number of evaluations per epoch. The purpose of this flag is to provide ' + 'more granular evaluation scores and checkpoints. 
For example, if original ' + 'data has N samples and num_eval_per_epoch is n, then each epoch will be ' + 'evaluated every N/n samples.') +flags.DEFINE_integer('train_batch_size', 32, 'Batch size for training.') +flags.DEFINE_integer('eval_batch_size', 32, 'Batch size for evaluation.') + +common_flags.define_common_bert_flags() + +FLAGS = flags.FLAGS + +LABEL_TYPES_MAP = {'int': tf.int64, 'float': tf.float32} + + +def get_loss_fn(num_classes): + """Gets the classification loss function.""" + + def classification_loss_fn(labels, logits): + """Classification loss.""" + labels = tf.reshape(labels, [-1]) + log_probs = tf.nn.log_softmax(logits, axis=-1) + one_hot_labels = tf.one_hot( + tf.cast(labels, dtype=tf.int32), depth=num_classes, dtype=tf.float32) + per_example_loss = -tf.reduce_sum( + tf.cast(one_hot_labels, dtype=tf.float32) * log_probs, axis=-1) + return tf.reduce_mean(per_example_loss) + + return classification_loss_fn + + +def get_dataset_fn(input_file_pattern, + max_seq_length, + global_batch_size, + is_training, + label_type=tf.int64, + include_sample_weights=False, + num_samples=None): + """Gets a closure to create a dataset.""" + + def _dataset_fn(ctx=None): + """Returns tf.data.Dataset for distributed BERT pretraining.""" + batch_size = ctx.get_per_replica_batch_size( + global_batch_size) if ctx else global_batch_size + dataset = input_pipeline.create_classifier_dataset( + tf.io.gfile.glob(input_file_pattern), + max_seq_length, + batch_size, + is_training=is_training, + input_pipeline_context=ctx, + label_type=label_type, + include_sample_weights=include_sample_weights, + num_samples=num_samples) + return dataset + + return _dataset_fn + + +def run_bert_classifier(strategy, + bert_config, + input_meta_data, + model_dir, + epochs, + steps_per_epoch, + steps_per_loop, + eval_steps, + warmup_steps, + initial_lr, + init_checkpoint, + train_input_fn, + eval_input_fn, + training_callbacks=True, + custom_callbacks=None, + custom_metrics=None): + """Run BERT 
classifier training using low-level API.""" + max_seq_length = input_meta_data['max_seq_length'] + num_classes = input_meta_data.get('num_labels', 1) + is_regression = num_classes == 1 + + def _get_classifier_model(): + """Gets a classifier model.""" + classifier_model, core_model = ( + bert_models.classifier_model( + bert_config, + num_classes, + max_seq_length, + hub_module_url=FLAGS.hub_module_url, + hub_module_trainable=FLAGS.hub_module_trainable)) + optimizer = optimization.create_optimizer(initial_lr, + steps_per_epoch * epochs, + warmup_steps, FLAGS.end_lr, + FLAGS.optimizer_type) + classifier_model.optimizer = performance.configure_optimizer( + optimizer, + use_float16=common_flags.use_float16()) + return classifier_model, core_model + + # tf.keras.losses objects accept optional sample_weight arguments (e.g., + # coming from the dataset) to compute weighted loss, as used for the + # regression tasks. The classification tasks, which use the custom + # get_loss_fn, don't accept sample weights though. + loss_fn = (tf.keras.losses.MeanSquaredError() if is_regression + else get_loss_fn(num_classes)) + + # Defines evaluation metrics function, which will create metrics in the + # correct device and strategy scope. + if custom_metrics: + metric_fn = custom_metrics + elif is_regression: + metric_fn = functools.partial( + tf.keras.metrics.MeanSquaredError, + 'mean_squared_error', + dtype=tf.float32) + else: + metric_fn = functools.partial( + tf.keras.metrics.SparseCategoricalAccuracy, + 'accuracy', + dtype=tf.float32) + + # Start training using Keras compile/fit API.
+ logging.info('Training using TF 2.x Keras compile/fit API with ' + 'distribution strategy.') + return run_keras_compile_fit( + model_dir, + strategy, + _get_classifier_model, + train_input_fn, + eval_input_fn, + loss_fn, + metric_fn, + init_checkpoint, + epochs, + steps_per_epoch, + steps_per_loop, + eval_steps, + training_callbacks=training_callbacks, + custom_callbacks=custom_callbacks) + + +def run_keras_compile_fit(model_dir, + strategy, + model_fn, + train_input_fn, + eval_input_fn, + loss_fn, + metric_fn, + init_checkpoint, + epochs, + steps_per_epoch, + steps_per_loop, + eval_steps, + training_callbacks=True, + custom_callbacks=None): + """Runs BERT classifier model using Keras compile/fit API.""" + + with strategy.scope(): + training_dataset = train_input_fn() + evaluation_dataset = eval_input_fn() if eval_input_fn else None + bert_model, sub_model = model_fn() + optimizer = bert_model.optimizer + + if init_checkpoint: + checkpoint = tf.train.Checkpoint(model=sub_model, encoder=sub_model) + checkpoint.read(init_checkpoint).assert_existing_objects_matched() + + if not isinstance(metric_fn, (list, tuple)): + metric_fn = [metric_fn] + bert_model.compile( + optimizer=optimizer, + loss=loss_fn, + metrics=[fn() for fn in metric_fn], + steps_per_execution=steps_per_loop) + + summary_dir = os.path.join(model_dir, 'summaries') + summary_callback = tf.keras.callbacks.TensorBoard(summary_dir) + checkpoint = tf.train.Checkpoint(model=bert_model, optimizer=optimizer) + checkpoint_manager = tf.train.CheckpointManager( + checkpoint, + directory=model_dir, + max_to_keep=None, + step_counter=optimizer.iterations, + checkpoint_interval=0) + checkpoint_callback = keras_utils.SimpleCheckpoint(checkpoint_manager) + + if training_callbacks: + if custom_callbacks is not None: + custom_callbacks += [summary_callback, checkpoint_callback] + else: + custom_callbacks = [summary_callback, checkpoint_callback] + + history = bert_model.fit( + x=training_dataset, + 
validation_data=evaluation_dataset, + steps_per_epoch=steps_per_epoch, + epochs=epochs, + validation_steps=eval_steps, + callbacks=custom_callbacks) + stats = {'total_training_steps': steps_per_epoch * epochs} + if 'loss' in history.history: + stats['train_loss'] = history.history['loss'][-1] + if 'val_accuracy' in history.history: + stats['eval_metrics'] = history.history['val_accuracy'][-1] + return bert_model, stats + + +def get_predictions_and_labels(strategy, + trained_model, + eval_input_fn, + is_regression=False, + return_probs=False): + """Obtains predictions of trained model on evaluation data. + + Note that list of labels is returned along with the predictions because the + order changes on distributing dataset over TPU pods. + + Args: + strategy: Distribution strategy. + trained_model: Trained model with preloaded weights. + eval_input_fn: Input function for evaluation data. + is_regression: Whether it is a regression task. + return_probs: Whether to return probabilities of classes. + + Returns: + predictions: List of predictions. + labels: List of gold labels corresponding to predictions. 
+ """ + + @tf.function + def test_step(iterator): + """Computes predictions on distributed devices.""" + + def _test_step_fn(inputs): + """Replicated predictions.""" + inputs, labels = inputs + logits = trained_model(inputs, training=False) + if not is_regression: + probabilities = tf.nn.softmax(logits) + return probabilities, labels + else: + return logits, labels + + outputs, labels = strategy.run(_test_step_fn, args=(next(iterator),)) + # outputs: current batch logits as a tuple of shard logits + outputs = tf.nest.map_structure(strategy.experimental_local_results, + outputs) + labels = tf.nest.map_structure(strategy.experimental_local_results, labels) + return outputs, labels + + def _run_evaluation(test_iterator): + """Runs evaluation steps.""" + preds, golds = list(), list() + try: + with tf.experimental.async_scope(): + while True: + probabilities, labels = test_step(test_iterator) + for cur_probs, cur_labels in zip(probabilities, labels): + if return_probs: + preds.extend(cur_probs.numpy().tolist()) + else: + preds.extend(tf.math.argmax(cur_probs, axis=1).numpy()) + golds.extend(cur_labels.numpy().tolist()) + except (StopIteration, tf.errors.OutOfRangeError): + tf.experimental.async_clear_error() + return preds, golds + + test_iter = iter(strategy.distribute_datasets_from_function(eval_input_fn)) + predictions, labels = _run_evaluation(test_iter) + + return predictions, labels + + +def export_classifier(model_export_path, input_meta_data, bert_config, + model_dir): + """Exports a trained model as a `SavedModel` for inference. + + Args: + model_export_path: a string specifying the path to the SavedModel directory. + input_meta_data: dictionary containing meta data about input and model. + bert_config: Bert configuration file to define core bert layers. + model_dir: The directory where the model weights and training/evaluation + summaries are stored. + + Raises: + Export path is not specified, got an empty string or None. 
+ """ + if not model_export_path: + raise ValueError('Export path is not specified: %s' % model_export_path) + if not model_dir: + raise ValueError('Model directory is not specified: %s' % model_dir) + + # Export uses float32 for now, even if training uses mixed precision. + tf.keras.mixed_precision.set_global_policy('float32') + classifier_model = bert_models.classifier_model( + bert_config, + input_meta_data.get('num_labels', 1), + hub_module_url=FLAGS.hub_module_url, + hub_module_trainable=False)[0] + + model_saving_utils.export_bert_model( + model_export_path, model=classifier_model, checkpoint_dir=model_dir) + + +def run_bert(strategy, + input_meta_data, + model_config, + train_input_fn=None, + eval_input_fn=None, + init_checkpoint=None, + custom_callbacks=None, + custom_metrics=None): + """Run BERT training.""" + # Enables XLA in Session Config. Should not be set for TPU. + keras_utils.set_session_config(FLAGS.enable_xla) + performance.set_mixed_precision_policy(common_flags.dtype()) + + epochs = FLAGS.num_train_epochs * FLAGS.num_eval_per_epoch + train_data_size = ( + input_meta_data['train_data_size'] // FLAGS.num_eval_per_epoch) + if FLAGS.train_data_size: + train_data_size = min(train_data_size, FLAGS.train_data_size) + logging.info('Updated train_data_size: %s', train_data_size) + steps_per_epoch = int(train_data_size / FLAGS.train_batch_size) + warmup_steps = int(epochs * train_data_size * 0.1 / FLAGS.train_batch_size) + eval_steps = int( + math.ceil(input_meta_data['eval_data_size'] / FLAGS.eval_batch_size)) + + if not strategy: + raise ValueError('Distribution strategy has not been specified.') + + if not custom_callbacks: + custom_callbacks = [] + + if FLAGS.log_steps: + custom_callbacks.append( + keras_utils.TimeHistory( + batch_size=FLAGS.train_batch_size, + log_steps=FLAGS.log_steps, + logdir=FLAGS.model_dir)) + + trained_model, _ = run_bert_classifier( + strategy, + model_config, + input_meta_data, + FLAGS.model_dir, + epochs, + steps_per_epoch, +
FLAGS.steps_per_loop, + eval_steps, + warmup_steps, + FLAGS.learning_rate, + init_checkpoint or FLAGS.init_checkpoint, + train_input_fn, + eval_input_fn, + custom_callbacks=custom_callbacks, + custom_metrics=custom_metrics) + + if FLAGS.model_export_path: + model_saving_utils.export_bert_model( + FLAGS.model_export_path, model=trained_model) + return trained_model + + +def custom_main(custom_callbacks=None, custom_metrics=None): + """Run classification or regression. + + Args: + custom_callbacks: list of tf.keras.Callbacks passed to training loop. + custom_metrics: list of metrics passed to the training loop. + """ + gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_param) + + with tf.io.gfile.GFile(FLAGS.input_meta_data_path, 'rb') as reader: + input_meta_data = json.loads(reader.read().decode('utf-8')) + label_type = LABEL_TYPES_MAP[input_meta_data.get('label_type', 'int')] + include_sample_weights = input_meta_data.get('has_sample_weights', False) + + if not FLAGS.model_dir: + FLAGS.model_dir = '/tmp/bert20/' + + bert_config = bert_configs.BertConfig.from_json_file(FLAGS.bert_config_file) + + if FLAGS.mode == 'export_only': + export_classifier(FLAGS.model_export_path, input_meta_data, bert_config, + FLAGS.model_dir) + return + + strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=FLAGS.distribution_strategy, + num_gpus=FLAGS.num_gpus, + tpu_address=FLAGS.tpu) + eval_input_fn = get_dataset_fn( + FLAGS.eval_data_path, + input_meta_data['max_seq_length'], + FLAGS.eval_batch_size, + is_training=False, + label_type=label_type, + include_sample_weights=include_sample_weights) + + if FLAGS.mode == 'predict': + num_labels = input_meta_data.get('num_labels', 1) + with strategy.scope(): + classifier_model = bert_models.classifier_model( + bert_config, num_labels)[0] + checkpoint = tf.train.Checkpoint(model=classifier_model) + latest_checkpoint_file = ( + FLAGS.predict_checkpoint_path or + tf.train.latest_checkpoint(FLAGS.model_dir)) 
+ assert latest_checkpoint_file + logging.info('Checkpoint file %s found and restoring from ' + 'checkpoint', latest_checkpoint_file) + checkpoint.restore( + latest_checkpoint_file).assert_existing_objects_matched() + preds, _ = get_predictions_and_labels( + strategy, + classifier_model, + eval_input_fn, + is_regression=(num_labels == 1), + return_probs=True) + output_predict_file = os.path.join(FLAGS.model_dir, 'test_results.tsv') + with tf.io.gfile.GFile(output_predict_file, 'w') as writer: + logging.info('***** Predict results *****') + for probabilities in preds: + output_line = '\t'.join( + str(class_probability) + for class_probability in probabilities) + '\n' + writer.write(output_line) + return + + if FLAGS.mode != 'train_and_eval': + raise ValueError('Unsupported mode is specified: %s' % FLAGS.mode) + train_input_fn = get_dataset_fn( + FLAGS.train_data_path, + input_meta_data['max_seq_length'], + FLAGS.train_batch_size, + is_training=True, + label_type=label_type, + include_sample_weights=include_sample_weights, + num_samples=FLAGS.train_data_size) + run_bert( + strategy, + input_meta_data, + bert_config, + train_input_fn, + eval_input_fn, + custom_callbacks=custom_callbacks, + custom_metrics=custom_metrics) + + +def main(_): + custom_main(custom_callbacks=None, custom_metrics=None) + + +if __name__ == '__main__': + flags.mark_flag_as_required('bert_config_file') + flags.mark_flag_as_required('input_meta_data_path') + flags.mark_flag_as_required('model_dir') + app.run(main) diff --git a/official/nlp/bert/run_pretraining.py b/official/legacy/bert/run_pretraining.py similarity index 96% rename from official/nlp/bert/run_pretraining.py rename to official/legacy/bert/run_pretraining.py index 3390d335d2ccedfec7f19fe5da4a79bad95c52d3..6a1b1d7a59b7d51231d129244ee5809a93a5a05b 100644 --- a/official/nlp/bert/run_pretraining.py +++ b/official/legacy/bert/run_pretraining.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -21,13 +21,13 @@ from absl import logging import gin import tensorflow as tf from official.common import distribute_utils +from official.legacy.bert import bert_models +from official.legacy.bert import common_flags +from official.legacy.bert import configs +from official.legacy.bert import input_pipeline +from official.legacy.bert import model_training_utils from official.modeling import performance from official.nlp import optimization -from official.nlp.bert import bert_models -from official.nlp.bert import common_flags -from official.nlp.bert import configs -from official.nlp.bert import input_pipeline -from official.nlp.bert import model_training_utils flags.DEFINE_string('input_files', None, diff --git a/official/legacy/bert/run_squad.py b/official/legacy/bert/run_squad.py new file mode 100644 index 0000000000000000000000000000000000000000..ee63bc96f7318941c3e4638fdf0fe076edf90f7d --- /dev/null +++ b/official/legacy/bert/run_squad.py @@ -0,0 +1,148 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Run BERT on SQuAD 1.1 and SQuAD 2.0 in TF 2.x.""" + +import json +import os +import time + +# Import libraries +from absl import app +from absl import flags +from absl import logging +import gin +import tensorflow as tf +from official.common import distribute_utils +from official.legacy.bert import configs as bert_configs +from official.legacy.bert import run_squad_helper +from official.nlp.data import squad_lib as squad_lib_wp +from official.nlp.tools import tokenization +from official.utils.misc import keras_utils + + +flags.DEFINE_string('vocab_file', None, + 'The vocabulary file that the BERT model was trained on.') + +# More flags can be found in run_squad_helper. +run_squad_helper.define_common_squad_flags() + +FLAGS = flags.FLAGS + + +def train_squad(strategy, + input_meta_data, + custom_callbacks=None, + run_eagerly=False, + init_checkpoint=None, + sub_model_export_name=None): + """Run bert squad training.""" + bert_config = bert_configs.BertConfig.from_json_file(FLAGS.bert_config_file) + init_checkpoint = init_checkpoint or FLAGS.init_checkpoint + run_squad_helper.train_squad(strategy, input_meta_data, bert_config, + custom_callbacks, run_eagerly, init_checkpoint, + sub_model_export_name=sub_model_export_name) + + +def predict_squad(strategy, input_meta_data): + """Makes predictions for the squad dataset.""" + bert_config = bert_configs.BertConfig.from_json_file(FLAGS.bert_config_file) + tokenizer = tokenization.FullTokenizer( + vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case) + run_squad_helper.predict_squad( + strategy, input_meta_data, tokenizer, bert_config, squad_lib_wp) + + +def eval_squad(strategy, input_meta_data): + """Evaluate on the squad dataset.""" + bert_config = bert_configs.BertConfig.from_json_file(FLAGS.bert_config_file) + tokenizer = tokenization.FullTokenizer( + vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case) + eval_metrics = run_squad_helper.eval_squad( + strategy, input_meta_data, tokenizer, 
bert_config, squad_lib_wp) + return eval_metrics + + +def export_squad(model_export_path, input_meta_data): + """Exports a trained model as a `SavedModel` for inference. + + Args: + model_export_path: a string specifying the path to the SavedModel directory. + input_meta_data: dictionary containing metadata about the input and model. + + Raises: + ValueError: If the export path is an empty string or None. + """ + bert_config = bert_configs.BertConfig.from_json_file(FLAGS.bert_config_file) + run_squad_helper.export_squad(model_export_path, input_meta_data, bert_config) + + +def main(_): + gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_param) + + with tf.io.gfile.GFile(FLAGS.input_meta_data_path, 'rb') as reader: + input_meta_data = json.loads(reader.read().decode('utf-8')) + + if FLAGS.mode == 'export_only': + export_squad(FLAGS.model_export_path, input_meta_data) + return + + # Configures cluster spec for multi-worker distribution strategy. + if FLAGS.num_gpus > 0: + _ = distribute_utils.configure_cluster(FLAGS.worker_hosts, FLAGS.task_index) + strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=FLAGS.distribution_strategy, + num_gpus=FLAGS.num_gpus, + all_reduce_alg=FLAGS.all_reduce_alg, + tpu_address=FLAGS.tpu) + + if 'train' in FLAGS.mode: + if FLAGS.log_steps: + custom_callbacks = [keras_utils.TimeHistory( + batch_size=FLAGS.train_batch_size, + log_steps=FLAGS.log_steps, + logdir=FLAGS.model_dir, + )] + else: + custom_callbacks = None + + train_squad( + strategy, + input_meta_data, + custom_callbacks=custom_callbacks, + run_eagerly=FLAGS.run_eagerly, + sub_model_export_name=FLAGS.sub_model_export_name, + ) + if 'predict' in FLAGS.mode: + predict_squad(strategy, input_meta_data) + if 'eval' in FLAGS.mode: + eval_metrics = eval_squad(strategy, input_meta_data) + f1_score = eval_metrics['final_f1'] + logging.info('SQuAD eval F1-score: %f', f1_score) + summary_dir = os.path.join(FLAGS.model_dir, 'summaries', 'eval') +
summary_writer = tf.summary.create_file_writer(summary_dir) + with summary_writer.as_default(): + # TODO(lehou): write to the correct step number. + tf.summary.scalar('F1-score', f1_score, step=0) + summary_writer.flush() + # Also write eval_metrics to json file. + squad_lib_wp.write_to_json_files( + eval_metrics, os.path.join(summary_dir, 'eval_metrics.json')) + time.sleep(60) + + +if __name__ == '__main__': + flags.mark_flag_as_required('bert_config_file') + flags.mark_flag_as_required('model_dir') + app.run(main) diff --git a/official/nlp/bert/run_squad_helper.py b/official/legacy/bert/run_squad_helper.py similarity index 97% rename from official/nlp/bert/run_squad_helper.py rename to official/legacy/bert/run_squad_helper.py index d4cee884a0d90d2b8cb312494a95e7d3d6b2d08b..be2e97dac5f1c3968fd0a720e7526c66b1f458f7 100644 --- a/official/nlp/bert/run_squad_helper.py +++ b/official/legacy/bert/run_squad_helper.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -21,16 +21,16 @@ import os from absl import flags from absl import logging import tensorflow as tf +from official.legacy.bert import bert_models +from official.legacy.bert import common_flags +from official.legacy.bert import input_pipeline +from official.legacy.bert import model_saving_utils +from official.legacy.bert import model_training_utils from official.modeling import performance from official.nlp import optimization -from official.nlp.bert import bert_models -from official.nlp.bert import common_flags -from official.nlp.bert import input_pipeline -from official.nlp.bert import model_saving_utils -from official.nlp.bert import model_training_utils -from official.nlp.bert import squad_evaluate_v1_1 -from official.nlp.bert import squad_evaluate_v2_0 from official.nlp.data import squad_lib_sp +from official.nlp.tools import squad_evaluate_v1_1 +from official.nlp.tools import squad_evaluate_v2_0 from official.utils.misc import keras_utils diff --git a/official/nlp/bert/serving.py b/official/legacy/bert/serving.py similarity index 97% rename from official/nlp/bert/serving.py rename to official/legacy/bert/serving.py index 7e27869c74b30ae5ce1a8a9b75760d0d8013640a..1666435aa8f2d6bb575814bf2ba612a4ed371880 100644 --- a/official/nlp/bert/serving.py +++ b/official/legacy/bert/serving.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,8 +18,8 @@ from absl import app from absl import flags import tensorflow as tf -from official.nlp.bert import bert_models -from official.nlp.bert import configs +from official.legacy.bert import bert_models +from official.legacy.bert import configs flags.DEFINE_integer( "sequence_length", None, "Sequence length to parse the tf.Example. 
If " diff --git a/official/legacy/detection/__init__.py b/official/legacy/detection/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/detection/__init__.py +++ b/official/legacy/detection/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/configs/__init__.py b/official/legacy/detection/configs/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/detection/configs/__init__.py +++ b/official/legacy/detection/configs/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/configs/base_config.py b/official/legacy/detection/configs/base_config.py index 32b8bcc1be551c249cafeab6706ae3bc58cc2d08..e274d91adc01d0f96200bdbd3fe1b1853adb525a 100644 --- a/official/legacy/detection/configs/base_config.py +++ b/official/legacy/detection/configs/base_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
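The step arithmetic in the `run_bert` function of the BERT classifier script above can be checked in isolation. The following is a minimal pure-Python sketch, not part of the Model Garden: `compute_schedule` and its parameter names are hypothetical stand-ins for the `FLAGS` and `input_meta_data` values that `run_bert` reads, and the optional `max_train_data_size` cap mirrors the `FLAGS.train_data_size` check.

```python
import math


def compute_schedule(train_data_size, eval_data_size, num_train_epochs,
                     num_eval_per_epoch, train_batch_size, eval_batch_size,
                     max_train_data_size=None):
  """Mirrors the step arithmetic in `run_bert` (hypothetical helper)."""
  # Evaluating several times per real epoch is done by running more, shorter
  # Keras epochs: the epoch count is multiplied, the per-epoch data divided.
  epochs = num_train_epochs * num_eval_per_epoch
  per_epoch_size = train_data_size // num_eval_per_epoch
  if max_train_data_size:
    # Same role as the optional FLAGS.train_data_size cap.
    per_epoch_size = min(per_epoch_size, max_train_data_size)
  # Truncates toward zero, exactly like int(...) in run_bert.
  steps_per_epoch = int(per_epoch_size / train_batch_size)
  # Warmup covers 10% of the total number of training steps.
  warmup_steps = int(epochs * per_epoch_size * 0.1 / train_batch_size)
  # Rounds up so the final partial evaluation batch is counted.
  eval_steps = int(math.ceil(eval_data_size / eval_batch_size))
  return {
      'epochs': epochs,
      'steps_per_epoch': steps_per_epoch,
      'warmup_steps': warmup_steps,
      'eval_steps': eval_steps,
  }
```

For instance, `compute_schedule(1000, 100, 2, 2, 32, 32)` gives 4 Keras epochs of 15 steps each, 6 warmup steps, and 4 evaluation steps. Note the asymmetry: `steps_per_epoch` truncates while `eval_steps` rounds up.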
diff --git a/official/legacy/detection/configs/factory.py b/official/legacy/detection/configs/factory.py index 3de8fcd2b0df72b2a15d80f8e3166784de26f855..d14f4b4e766a033da6c79fd5c83c9b375e58f03e 100644 --- a/official/legacy/detection/configs/factory.py +++ b/official/legacy/detection/configs/factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/configs/maskrcnn_config.py b/official/legacy/detection/configs/maskrcnn_config.py index 71af35021258b1a89b9937305e6a98c7f6d017dc..275cbf5e608434dc014f7ee30d589079568d0259 100644 --- a/official/legacy/detection/configs/maskrcnn_config.py +++ b/official/legacy/detection/configs/maskrcnn_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/configs/olnmask_config.py b/official/legacy/detection/configs/olnmask_config.py index a12ce5a7f5aa4cdd488c5a70dac8bde5fb314d3f..74e786c1fef4a56639819c89f4282cc3044ee643 100644 --- a/official/legacy/detection/configs/olnmask_config.py +++ b/official/legacy/detection/configs/olnmask_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
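In `predict` mode, the classifier's `custom_main` above calls `get_predictions_and_labels(..., return_probs=True)` and writes one tab-separated line of class probabilities per example to `test_results.tsv`; with `return_probs=False`, `_run_evaluation` keeps only the argmax class index instead. Both post-processing steps can be sketched without TensorFlow, assuming the predictions are already plain Python lists (as produced by `.numpy().tolist()`). The helper names are hypothetical.

```python
def format_predictions(preds):
  """Formats per-example class probabilities as TSV text (hypothetical).

  One line per example, probabilities joined by tabs, matching the loop in
  the `predict` branch that writes test_results.tsv.
  """
  return ''.join(
      '\t'.join(str(class_probability) for class_probability in probabilities)
      + '\n'
      for probabilities in preds)


def argmax_labels(preds):
  """Returns the index of the largest probability for each example.

  Plays the role of `tf.math.argmax(cur_probs, axis=1)` when
  `return_probs=False`.
  """
  return [max(range(len(row)), key=row.__getitem__) for row in preds]
```

On ties, this sketch returns the first maximal index; `tf.math.argmax` documents no such guarantee, so results may differ in that case.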
diff --git a/official/legacy/detection/configs/retinanet_config.py b/official/legacy/detection/configs/retinanet_config.py index 73c288a6460b62f9eacd79d617fdda46ab754b7e..d3bd1ef19eb017be70c370bbc9fe1b2bee3ca122 100644 --- a/official/legacy/detection/configs/retinanet_config.py +++ b/official/legacy/detection/configs/retinanet_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/configs/shapemask_config.py b/official/legacy/detection/configs/shapemask_config.py index 30bc9ae92c4bf07906af1e1c5ad9006bd5dc921c..321a364f624e5dc98a257b494f18ec7dd33dbd39 100644 --- a/official/legacy/detection/configs/shapemask_config.py +++ b/official/legacy/detection/configs/shapemask_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/dataloader/__init__.py b/official/legacy/detection/dataloader/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/detection/dataloader/__init__.py +++ b/official/legacy/detection/dataloader/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/detection/dataloader/anchor.py b/official/legacy/detection/dataloader/anchor.py index 4853cb1b7a0e19741f5d00904bc075184f048fc1..a5d90ed6c1d3d5d7910f1ac8ba8690f3113890e3 100644 --- a/official/legacy/detection/dataloader/anchor.py +++ b/official/legacy/detection/dataloader/anchor.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -22,7 +22,7 @@ import collections import tensorflow as tf from official.legacy.detection.utils import box_utils -from official.vision.beta.ops import iou_similarity +from official.vision.ops import iou_similarity from official.vision.utils.object_detection import argmax_matcher from official.vision.utils.object_detection import balanced_positive_negative_sampler from official.vision.utils.object_detection import box_list diff --git a/official/legacy/detection/dataloader/factory.py b/official/legacy/detection/dataloader/factory.py index 4623fd1ed401291929382c8a370599ac3477c667..3bc8985eb432971020891426c11795a309269770 100644 --- a/official/legacy/detection/dataloader/factory.py +++ b/official/legacy/detection/dataloader/factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
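This change relocates several modules: `official.nlp.bert.*` moves to `official.legacy.bert.*`, the SQuAD evaluation scripts move to `official.nlp.tools`, and `official.vision.beta.ops` becomes `official.vision.ops` (see the `anchor.py` hunk). Downstream code that imports the old paths needs the same rewrite. The sketch below is hypothetical: `_MOVES`, `migrate_module`, and `migrate_from_import` are not part of the Model Garden, the table covers only the moves visible in this diff, and it handles single-name `from PKG import NAME` lines only.

```python
import re

# Old dotted prefix -> new dotted prefix, limited to moves shown in this diff.
_MOVES = {
    'official.nlp.bert.squad_evaluate_v1_1':
        'official.nlp.tools.squad_evaluate_v1_1',
    'official.nlp.bert.squad_evaluate_v2_0':
        'official.nlp.tools.squad_evaluate_v2_0',
    'official.nlp.bert': 'official.legacy.bert',
    'official.vision.beta.ops': 'official.vision.ops',
}


def migrate_module(name):
  """Maps a dotted module path to its new location (longest prefix wins)."""
  for old in sorted(_MOVES, key=len, reverse=True):
    # Match whole dotted components only, so `official.nlp.bertx` is kept.
    if name == old or name.startswith(old + '.'):
      return _MOVES[old] + name[len(old):]
  return name


def migrate_from_import(line):
  """Rewrites one `from PKG import NAME [as ALIAS]` line; others pass through."""
  match = re.fullmatch(r'(\s*)from (\S+) import (\w+)(.*)', line)
  if not match:
    return line
  indent, pkg, name, rest = match.groups()
  # Resolve the fully qualified module, then split it back into PKG and NAME.
  new_pkg, _, new_name = migrate_module(pkg + '.' + name).rpartition('.')
  return f'{indent}from {new_pkg} import {new_name}{rest}'
```

Resolving the full `PKG.NAME` path before splitting is what lets `from official.nlp.bert import squad_evaluate_v1_1` land in `official.nlp.tools` rather than `official.legacy.bert`.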
diff --git a/official/legacy/detection/dataloader/input_reader.py b/official/legacy/detection/dataloader/input_reader.py index 601db93d84e9166d6c87cd42553ce76242af1b9f..4ffa729eda15c9a4ff983b04d4b4b157907c824c 100644 --- a/official/legacy/detection/dataloader/input_reader.py +++ b/official/legacy/detection/dataloader/input_reader.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/dataloader/maskrcnn_parser.py b/official/legacy/detection/dataloader/maskrcnn_parser.py index c7c156d43e36d170f9c16221a441ea15d0f5ed45..f69fa3260f05bba8853f9a3bca731c0edfd9f4d2 100644 --- a/official/legacy/detection/dataloader/maskrcnn_parser.py +++ b/official/legacy/detection/dataloader/maskrcnn_parser.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/dataloader/mode_keys.py b/official/legacy/detection/dataloader/mode_keys.py index d6fdd9008bd4491ebec171d25c14d517ca3647c6..93eb7d3ad9e106d7f90a735a939d7626ebf594eb 100644 --- a/official/legacy/detection/dataloader/mode_keys.py +++ b/official/legacy/detection/dataloader/mode_keys.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/detection/dataloader/olnmask_parser.py b/official/legacy/detection/dataloader/olnmask_parser.py index 6749095319d6a0fcdedcfd39912f069c32b9b25a..b569d66be72d6004bdfef43a92c7b98bc3fe6027 100644 --- a/official/legacy/detection/dataloader/olnmask_parser.py +++ b/official/legacy/detection/dataloader/olnmask_parser.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/dataloader/retinanet_parser.py b/official/legacy/detection/dataloader/retinanet_parser.py index 5de59ca2c1891509c977ec7dbfdefa9d853ab3d1..55058af79ddbdd7aa5d771212a53ce802e46aeeb 100644 --- a/official/legacy/detection/dataloader/retinanet_parser.py +++ b/official/legacy/detection/dataloader/retinanet_parser.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/dataloader/shapemask_parser.py b/official/legacy/detection/dataloader/shapemask_parser.py index f8a99d018e6f551b2ad482f5454a1ac0c0233c5c..5feeb21d430bd40f20b64872baef124e7ed2ecbd 100644 --- a/official/legacy/detection/dataloader/shapemask_parser.py +++ b/official/legacy/detection/dataloader/shapemask_parser.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/detection/dataloader/tf_example_decoder.py b/official/legacy/detection/dataloader/tf_example_decoder.py index e6472a36b9a31a8e8a98cecf10a6abf8ccb03985..9e65509ce156b23f28b7fdfb0fdf1b49993137ce 100644 --- a/official/legacy/detection/dataloader/tf_example_decoder.py +++ b/official/legacy/detection/dataloader/tf_example_decoder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/evaluation/__init__.py b/official/legacy/detection/evaluation/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/detection/evaluation/__init__.py +++ b/official/legacy/detection/evaluation/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/evaluation/coco_evaluator.py b/official/legacy/detection/evaluation/coco_evaluator.py index 4469af50cb943b4a8640f1d3ba2a1753e8c565d0..222763b5e4090e66dce19bfaed735a92a3566d24 100644 --- a/official/legacy/detection/evaluation/coco_evaluator.py +++ b/official/legacy/detection/evaluation/coco_evaluator.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/detection/evaluation/coco_utils.py b/official/legacy/detection/evaluation/coco_utils.py index 03e90c05582b4f4dfda362d14cd1cfc23626d23f..6c3692d011a83c16fb060f3a712f270a9c95f011 100644 --- a/official/legacy/detection/evaluation/coco_utils.py +++ b/official/legacy/detection/evaluation/coco_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/evaluation/factory.py b/official/legacy/detection/evaluation/factory.py index 93f18f1e42511cd02963ea27b9f59aa03f026316..b47de01f9e155b995c53c2b63ecb5fef4d01949b 100644 --- a/official/legacy/detection/evaluation/factory.py +++ b/official/legacy/detection/evaluation/factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/executor/__init__.py b/official/legacy/detection/executor/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/detection/executor/__init__.py +++ b/official/legacy/detection/executor/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/detection/executor/detection_executor.py b/official/legacy/detection/executor/detection_executor.py index 19dae201fad5d7c4b73ba518959aec765cd3d913..396de52cd6246adea9280b1b6728ec4c5eba5b4f 100644 --- a/official/legacy/detection/executor/detection_executor.py +++ b/official/legacy/detection/executor/detection_executor.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/executor/distributed_executor.py b/official/legacy/detection/executor/distributed_executor.py index 4079488107fdf85f441be2458e56b2d140a0d388..529e8d813bce0fbf482b719b95be4f064965df27 100644 --- a/official/legacy/detection/executor/distributed_executor.py +++ b/official/legacy/detection/executor/distributed_executor.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -683,8 +683,14 @@ class DistributedExecutor(object): if not checkpoint_path: raise ValueError('checkpoint path is empty') reader = tf.compat.v1.train.NewCheckpointReader(checkpoint_path) - current_step = reader.get_tensor( - 'optimizer/iter/.ATTRIBUTES/VARIABLE_VALUE') + if reader.has_tensor('optimizer/iter/.ATTRIBUTES/VARIABLE_VALUE'): + # Legacy keras optimizer iteration. + current_step = reader.get_tensor( + 'optimizer/iter/.ATTRIBUTES/VARIABLE_VALUE') + else: + # New keras optimizer iteration. 
+ current_step = reader.get_tensor( + 'optimizer/_iterations/.ATTRIBUTES/VARIABLE_VALUE') logging.info('Checkpoint file %s found and restoring from ' 'checkpoint', checkpoint_path) status = checkpoint.restore(checkpoint_path) diff --git a/official/legacy/detection/main.py b/official/legacy/detection/main.py index 224f5440a65f89b05f30d359d6eb610bca4adde0..9071e7c990cbb15c89d12ef84109fb6cfd1694a9 100644 --- a/official/legacy/detection/main.py +++ b/official/legacy/detection/main.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/modeling/__init__.py b/official/legacy/detection/modeling/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/detection/modeling/__init__.py +++ b/official/legacy/detection/modeling/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/modeling/architecture/__init__.py b/official/legacy/detection/modeling/architecture/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/detection/modeling/architecture/__init__.py +++ b/official/legacy/detection/modeling/architecture/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
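The `distributed_executor.py` hunk above reads the restored step from whichever optimizer-iteration variable the checkpoint contains: `optimizer/iter` for the legacy Keras optimizer, otherwise `optimizer/_iterations` for the new one. The same fallback can be factored into a helper and exercised without TensorFlow by substituting an in-memory reader. In this sketch, `read_global_step` and `FakeReader` are hypothetical names; only the `has_tensor`/`get_tensor` methods are taken from the checkpoint reader used in the diff.

```python
LEGACY_ITER_KEY = 'optimizer/iter/.ATTRIBUTES/VARIABLE_VALUE'
NEW_ITER_KEY = 'optimizer/_iterations/.ATTRIBUTES/VARIABLE_VALUE'


def read_global_step(reader):
  """Returns the optimizer step stored in a checkpoint (hypothetical helper).

  `reader` is any object with `has_tensor(name)` and `get_tensor(name)`,
  such as the result of `tf.compat.v1.train.NewCheckpointReader`.
  """
  if reader.has_tensor(LEGACY_ITER_KEY):
    # Checkpoint written with the legacy Keras optimizer.
    return reader.get_tensor(LEGACY_ITER_KEY)
  # Checkpoint written with the new Keras optimizer; if this key is also
  # missing, the reader's get_tensor raises.
  return reader.get_tensor(NEW_ITER_KEY)


class FakeReader:
  """In-memory stand-in for a checkpoint reader, for illustration only."""

  def __init__(self, tensors):
    self._tensors = dict(tensors)

  def has_tensor(self, name):
    return name in self._tensors

  def get_tensor(self, name):
    return self._tensors[name]
```

Checking the legacy key first matches the order in the diff, so a checkpoint that somehow holds both keys reports the legacy value.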
diff --git a/official/legacy/detection/modeling/architecture/factory.py b/official/legacy/detection/modeling/architecture/factory.py index 94d48c694e4bd14892af0d7a288f22ae849f225a..4b755200e1109f9efbfa9e0d03a9d69f156300ff 100644 --- a/official/legacy/detection/modeling/architecture/factory.py +++ b/official/legacy/detection/modeling/architecture/factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/modeling/architecture/fpn.py b/official/legacy/detection/modeling/architecture/fpn.py index 725e78ea7006da81f3b1b70070ce90c2249fbfb1..6b9edf6dfe3a81eee493f67bd84ec849d3782de2 100644 --- a/official/legacy/detection/modeling/architecture/fpn.py +++ b/official/legacy/detection/modeling/architecture/fpn.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/modeling/architecture/heads.py b/official/legacy/detection/modeling/architecture/heads.py index d30c7ea8cbefac9720c9be5e2d83ef59bf4aaf98..430cb01d79df90d0e9b2d8844d6fa33a1d8bdfcd 100644 --- a/official/legacy/detection/modeling/architecture/heads.py +++ b/official/legacy/detection/modeling/architecture/heads.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/detection/modeling/architecture/identity.py b/official/legacy/detection/modeling/architecture/identity.py index 778297f8919f8a90875c69ce1f11ef5dfd9fc95f..7d3280dbd5e4b01b01bd27fca3cf72cbe6521053 100644 --- a/official/legacy/detection/modeling/architecture/identity.py +++ b/official/legacy/detection/modeling/architecture/identity.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/modeling/architecture/nn_blocks.py b/official/legacy/detection/modeling/architecture/nn_blocks.py index 69a0d28261997eddbd9826d7681edbe95940e9c9..ab61d3239a95cc37dd953edc0b97014539dfc975 100644 --- a/official/legacy/detection/modeling/architecture/nn_blocks.py +++ b/official/legacy/detection/modeling/architecture/nn_blocks.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/modeling/architecture/nn_ops.py b/official/legacy/detection/modeling/architecture/nn_ops.py index e4c389c671b5c23e48ee8061b83f63c31a6f643e..70f47c9af0bf2e9b9939a12f2c6bcd474bd945ff 100644 --- a/official/legacy/detection/modeling/architecture/nn_ops.py +++ b/official/legacy/detection/modeling/architecture/nn_ops.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/detection/modeling/architecture/resnet.py b/official/legacy/detection/modeling/architecture/resnet.py index 370e86b50e3b84f57a84f6de44cba89a41357d6a..0a8182bfe4a62e182526fbbd4d3b778b4e29478a 100644 --- a/official/legacy/detection/modeling/architecture/resnet.py +++ b/official/legacy/detection/modeling/architecture/resnet.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/modeling/architecture/spinenet.py b/official/legacy/detection/modeling/architecture/spinenet.py index 7975a0aeb36a96e2c2081104292dbd3036cffd2e..ea86a70f28dc33f3c714636b8889e16cb3528ce7 100644 --- a/official/legacy/detection/modeling/architecture/spinenet.py +++ b/official/legacy/detection/modeling/architecture/spinenet.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 # ============================================================================== """Implementation of SpineNet model. diff --git a/official/legacy/detection/modeling/base_model.py b/official/legacy/detection/modeling/base_model.py index e7f0c54853a2b9ba3294f56abcdd4be811d32d6d..aa84f4682634f7bf637116a15224293c029fec60 100644 --- a/official/legacy/detection/modeling/base_model.py +++ b/official/legacy/detection/modeling/base_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/modeling/checkpoint_utils.py b/official/legacy/detection/modeling/checkpoint_utils.py index 237cdf8f2dab8fa57c4b80dca6d04f46dbeef051..1765a059c30d3a2095b0f1f2809372e8ed0153bb 100644 --- a/official/legacy/detection/modeling/checkpoint_utils.py +++ b/official/legacy/detection/modeling/checkpoint_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/modeling/factory.py b/official/legacy/detection/modeling/factory.py index 028bdde4b0685457333af3e10c210dd8e1c6008f..3d852b8d040d9694f2a47e436deb53c288622de9 100644 --- a/official/legacy/detection/modeling/factory.py +++ b/official/legacy/detection/modeling/factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/modeling/learning_rates.py b/official/legacy/detection/modeling/learning_rates.py index 85a06f5a02b8897112b9954c314ec9929b422fda..bbd34873981ea9f3a5981cbe8a6a7285ca561bab 100644 --- a/official/legacy/detection/modeling/learning_rates.py +++ b/official/legacy/detection/modeling/learning_rates.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -61,7 +61,7 @@ class CosineLearningRateWithLinearWarmup( """Class to generate learning rate tensor.""" def __init__(self, total_steps, params): - """Creates the consine learning rate tensor with linear warmup.""" + """Creates the cosine learning rate tensor with linear warmup.""" super(CosineLearningRateWithLinearWarmup, self).__init__() self._total_steps = total_steps assert isinstance(params, (dict, params_dict.ParamsDict)) diff --git a/official/legacy/detection/modeling/losses.py b/official/legacy/detection/modeling/losses.py index 02e2632ae60c9da49f58c1239964d2f1104b52f8..f3423993390ea7cc3173b8d2f71ff1f9588556c5 100644 --- a/official/legacy/detection/modeling/losses.py +++ b/official/legacy/detection/modeling/losses.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/modeling/maskrcnn_model.py b/official/legacy/detection/modeling/maskrcnn_model.py index a381bd0ce2ac44381f47b466015f5d891c9077b0..576457b612228683ecbdcdbabbb6a131a77432be 100644 --- a/official/legacy/detection/modeling/maskrcnn_model.py +++ b/official/legacy/detection/modeling/maskrcnn_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
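The `learning_rates.py` change above only fixes a docstring typo in `CosineLearningRateWithLinearWarmup`. For readers unfamiliar with this kind of schedule, the following is a minimal pure-Python sketch of one common formulation: the rate ramps linearly from a warmup rate to the initial rate, then decays to zero along a half cosine. The function name, its parameter names, and the exact formula are assumptions. The class's own implementation and its `params` keys are not shown in this diff.

```python
import math


def cosine_lr_with_linear_warmup(step: int,
                                 total_steps: int,
                                 init_learning_rate: float,
                                 warmup_learning_rate: float,
                                 warmup_steps: int) -> float:
  """Learning rate at `step` under an assumed warmup-then-cosine schedule."""
  if not 0 <= warmup_steps < total_steps:
    raise ValueError('Need 0 <= warmup_steps < total_steps.')
  if step < warmup_steps:
    # Linear ramp from warmup_learning_rate up to init_learning_rate.
    fraction = step / warmup_steps
    return warmup_learning_rate + fraction * (
        init_learning_rate - warmup_learning_rate)
  # Cosine decay from init_learning_rate to 0 over the remaining steps;
  # steps past total_steps are clamped so the rate stays at 0.
  decay_steps = total_steps - warmup_steps
  progress = min(step - warmup_steps, decay_steps) / decay_steps
  return 0.5 * init_learning_rate * (1.0 + math.cos(math.pi * progress))


if __name__ == '__main__':
  for s in (0, 5, 10, 60, 110):
    print(s, cosine_lr_with_linear_warmup(
        s, total_steps=110, init_learning_rate=0.1,
        warmup_learning_rate=0.0, warmup_steps=10))
```

The rate reaches its peak exactly when warmup ends and is half the peak at the midpoint of the decay phase. When `warmup_steps` is 0, the warmup branch is never taken, so the division in that branch cannot hit zero.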
diff --git a/official/legacy/detection/modeling/olnmask_model.py b/official/legacy/detection/modeling/olnmask_model.py index 8e8b080da2752624d60b13fca41846d1b843870f..255ff86e4f6c8921bcf6513d6000a3823e39a36b 100644 --- a/official/legacy/detection/modeling/olnmask_model.py +++ b/official/legacy/detection/modeling/olnmask_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/modeling/optimizers.py b/official/legacy/detection/modeling/optimizers.py index ce434495571a219deea79296fecd5cf1a60c8a93..d8ff456aa7767d3bc9bf64ff89c82739ceb0d5fc 100644 --- a/official/legacy/detection/modeling/optimizers.py +++ b/official/legacy/detection/modeling/optimizers.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/modeling/retinanet_model.py b/official/legacy/detection/modeling/retinanet_model.py index 7433179f7303239c51fd9a43715437682d630603..7e87717cc9e934a0413301939a5cf92e932bb9e7 100644 --- a/official/legacy/detection/modeling/retinanet_model.py +++ b/official/legacy/detection/modeling/retinanet_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/detection/modeling/shapemask_model.py b/official/legacy/detection/modeling/shapemask_model.py index b8b7f37422cb7618cc199ec1e90ecde82f9a9724..6d01e122b0b1f0d9251c040cd8ca0c505681b838 100644 --- a/official/legacy/detection/modeling/shapemask_model.py +++ b/official/legacy/detection/modeling/shapemask_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/ops/__init__.py b/official/legacy/detection/ops/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/detection/ops/__init__.py +++ b/official/legacy/detection/ops/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/ops/nms.py b/official/legacy/detection/ops/nms.py index 0beb7e3850612e261e41a0b2634224b3aab93e88..24fdcef87bcc03d0e24950b56422c70f27bffd1b 100644 --- a/official/legacy/detection/ops/nms.py +++ b/official/legacy/detection/ops/nms.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/detection/ops/postprocess_ops.py b/official/legacy/detection/ops/postprocess_ops.py index bd11fe964f30242952d5a8cd2536cff721732e3c..8b4a8b6d9f05298937eec6c6a8e9c6493b7c908a 100644 --- a/official/legacy/detection/ops/postprocess_ops.py +++ b/official/legacy/detection/ops/postprocess_ops.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/ops/roi_ops.py b/official/legacy/detection/ops/roi_ops.py index 6abdeadc6b2135efecc8b0fa5f54a182e0a93485..7aeb1a91b1f51a15bd88ca0c153f748edcdf41de 100644 --- a/official/legacy/detection/ops/roi_ops.py +++ b/official/legacy/detection/ops/roi_ops.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/ops/spatial_transform_ops.py b/official/legacy/detection/ops/spatial_transform_ops.py index 4b7d7ecde48ca8dd1eeb4f7356a1642583b1754d..db9cf98fb80f711aee86a74c548f49a985b5de3e 100644 --- a/official/legacy/detection/ops/spatial_transform_ops.py +++ b/official/legacy/detection/ops/spatial_transform_ops.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/detection/ops/target_ops.py b/official/legacy/detection/ops/target_ops.py index db1ea313a9e981ecd0f709b2272eff520255bf3b..7b8e208b99b245a5389c2a197b1e2b6be780925e 100644 --- a/official/legacy/detection/ops/target_ops.py +++ b/official/legacy/detection/ops/target_ops.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/utils/__init__.py b/official/legacy/detection/utils/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/detection/utils/__init__.py +++ b/official/legacy/detection/utils/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/utils/box_utils.py b/official/legacy/detection/utils/box_utils.py index bc95fa8e3602d49f922fb135531e95078942b7c1..f52b4d52c1280e2d1a028cf29552225189e0fb63 100644 --- a/official/legacy/detection/utils/box_utils.py +++ b/official/legacy/detection/utils/box_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/detection/utils/class_utils.py b/official/legacy/detection/utils/class_utils.py index cbf806f11070736c17de79dd63240e9a626808d9..fe5525c692657270aacc70d4ec27ba262b95102c 100644 --- a/official/legacy/detection/utils/class_utils.py +++ b/official/legacy/detection/utils/class_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/utils/dataloader_utils.py b/official/legacy/detection/utils/dataloader_utils.py index 8cdbc54a05c061cbe1cf719594007875deac64a8..a3a34eb658b242fb2c7d81c1d93aa92ccb19d454 100644 --- a/official/legacy/detection/utils/dataloader_utils.py +++ b/official/legacy/detection/utils/dataloader_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/detection/utils/input_utils.py b/official/legacy/detection/utils/input_utils.py index e194d3ca728f418e181f45a5add6fd8b8db21967..12b7c0be168baa047d2084386aaaeab9556eba17 100644 --- a/official/legacy/detection/utils/input_utils.py +++ b/official/legacy/detection/utils/input_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/detection/utils/mask_utils.py b/official/legacy/detection/utils/mask_utils.py index 926c829b81b35b11ca53a5a3d351d0ebca36205e..deb86a51605f73af4ea9b71d0bd1c3a4d7095f87 100644 --- a/official/legacy/detection/utils/mask_utils.py +++ b/official/legacy/detection/utils/mask_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/image_classification/README.md b/official/legacy/image_classification/README.md index bc64b791db828a3ce4ccb2539f993e397641717f..6d9231b4848f55a69395d047f3ec33674ac59599 100644 --- a/official/legacy/image_classification/README.md +++ b/official/legacy/image_classification/README.md @@ -1,9 +1,9 @@ # Image Classification -**Warning:** the features in the `image_classification/` folder have been fully -integrated into vision/beta. Please use the [new code base](../../vision/beta/README.md). +**Warning:** the features in the `image_classification/` directory have been +fully integrated into the [new code base](https://github.com/tensorflow/models/tree/benchmark/official/vision/modeling/backbones). -This folder contains TF 2.0 model examples for image classification: +This folder contains TF 2 model examples for image classification: * [MNIST](#mnist) * [Classifier Trainer](#classifier-trainer), a framework that uses the Keras @@ -17,8 +17,7 @@ For more information about other types of models, please refer to this ## Before you begin Please make sure that you have the latest version of TensorFlow -installed and -[add the models folder to your Python path](/official/#running-the-models). +installed and add the models folder to your Python path. ### ImageNet preparation @@ -70,6 +69,7 @@ available GPUs at each host. 
To download the data and run the MNIST sample model locally for the first time, run one of the following command: +
```bash python3 mnist_main.py \ --model_dir=$MODEL_DIR \ @@ -79,9 +79,11 @@ python3 mnist_main.py \ --num_gpus=$NUM_GPUS \ --download ``` +
To train the model on a Cloud TPU, run the following command: +
```bash python3 mnist_main.py \ --tpu=$TPU_NAME \ @@ -91,10 +93,10 @@ python3 mnist_main.py \ --distribution_strategy=tpu \ --download ``` +
Note: the `--download` flag is only required the first time you run the model. - ## Classifier Trainer The classifier trainer is a unified framework for running image classification models using Keras's compile/fit methods. Experiments should be provided in the @@ -111,6 +113,8 @@ be 64 * 8 = 512, and for a v3-32, the global batch size is 64 * 32 = 2048. ### ResNet50 #### On GPU: + +
```bash python3 classifier_trainer.py \ --mode=train_and_eval \ @@ -121,12 +125,15 @@ python3 classifier_trainer.py \ --config_file=configs/examples/resnet/imagenet/gpu.yaml \ --params_override='runtime.num_gpus=$NUM_GPUS' ``` +
To train on multiple hosts, each with GPUs attached using [MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy) please update `runtime` section in gpu.yaml (or override using `--params_override`) with: +
+ ```YAML # gpu.yaml runtime: @@ -135,12 +142,16 @@ runtime: num_gpus: $NUM_GPUS task_index: 0 ``` +
+ By having `task_index: 0` on the first host and `task_index: 1` on the second and so on. `$HOST1` and `$HOST2` are the IP addresses of the hosts, and `port` can be chosen any free port on the hosts. Only the first host will write TensorBoard Summaries and save checkpoints. #### On TPU: + +
```bash python3 classifier_trainer.py \ --mode=train_and_eval \ @@ -152,9 +163,31 @@ python3 classifier_trainer.py \ --config_file=configs/examples/resnet/imagenet/tpu.yaml ``` +
+ +### VGG-16 + +#### On GPU: + +
+```bash +python3 classifier_trainer.py \ + --mode=train_and_eval \ + --model_type=vgg \ + --dataset=imagenet \ + --model_dir=$MODEL_DIR \ + --data_dir=$DATA_DIR \ + --config_file=configs/examples/vgg16/imagenet/gpu.yaml \ + --params_override='runtime.num_gpus=$NUM_GPUS' +``` + 
+ ### EfficientNet **Note: EfficientNet development is a work in progress.** #### On GPU: + +
```bash python3 classifier_trainer.py \ --mode=train_and_eval \ @@ -166,8 +199,11 @@ python3 classifier_trainer.py \ --params_override='runtime.num_gpus=$NUM_GPUS' ``` +
#### On TPU: + +
```bash python3 classifier_trainer.py \ --mode=train_and_eval \ @@ -178,6 +214,7 @@ python3 classifier_trainer.py \ --data_dir=$DATA_DIR \ --config_file=configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml ``` +
Note that the number of GPU devices can be overridden in the command line using `--params_overrides`. The TPU does not need this override as the device is fixed diff --git a/official/legacy/image_classification/__init__.py b/official/legacy/image_classification/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/image_classification/__init__.py +++ b/official/legacy/image_classification/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/image_classification/augment.py b/official/legacy/image_classification/augment.py index f322d31dac6ecc1e282566134720d42261a9b7fc..add7ed631ca2b6e856e726c4e2254826362769b1 100644 --- a/official/legacy/image_classification/augment.py +++ b/official/legacy/image_classification/augment.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/image_classification/augment_test.py b/official/legacy/image_classification/augment_test.py index e5498a9c4778173a62bc9596187ff4623ed03753..139e10195b497d5123d9a492cb15ad4c5c98af03 100644 --- a/official/legacy/image_classification/augment_test.py +++ b/official/legacy/image_classification/augment_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
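The Classifier Trainer section of the README derives the global batch size by multiplying the per-replica batch size by the number of replicas ("64 * 8 = 512, and for a v3-32, the global batch size is 64 * 32 = 2048"). The following is a minimal sketch of that arithmetic. The function name `global_batch_size` is hypothetical and is not part of the trainer's API.

```python
def global_batch_size(per_replica_batch_size: int, num_replicas: int) -> int:
  """Total examples consumed per training step across all replicas."""
  if per_replica_batch_size <= 0 or num_replicas <= 0:
    raise ValueError('Batch size and replica count must be positive.')
  return per_replica_batch_size * num_replicas


# Figures quoted in the README: 64 examples per replica on 8 and 32 replicas.
print(global_batch_size(64, 8))   # 512
print(global_batch_size(64, 32))  # 2048
```

The same multiplication applies to the new VGG-16 GPU config, which sets `batch_size: 128` with `use_per_replica_batch_size: true`. For example, overriding `runtime.num_gpus=8` would give a global batch of 128 * 8 = 1024 examples per step.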
diff --git a/official/legacy/image_classification/callbacks.py b/official/legacy/image_classification/callbacks.py index a4934ed88f7db280d1ffd9ad57346f68a5395d5e..061826dbd05bb3d08ae00b6d257bc96f6060badc 100644 --- a/official/legacy/image_classification/callbacks.py +++ b/official/legacy/image_classification/callbacks.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Common modules for callbacks.""" from __future__ import absolute_import from __future__ import division diff --git a/official/legacy/image_classification/classifier_trainer.py b/official/legacy/image_classification/classifier_trainer.py index 5dc1b78e3acdc7d9ae10441411d21018a178cdad..66577f6079e00b8dfee76d686ad96a5283096f9c 100644 --- a/official/legacy/image_classification/classifier_trainer.py +++ b/official/legacy/image_classification/classifier_trainer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Runs an Image Classification model.""" import os @@ -32,6 +31,7 @@ from official.legacy.image_classification.configs import configs from official.legacy.image_classification.efficientnet import efficientnet_model from official.legacy.image_classification.resnet import common from official.legacy.image_classification.resnet import resnet_model +from official.legacy.image_classification.vgg import vgg_model from official.modeling import hyperparams from official.modeling import performance from official.utils import hyperparams_flags @@ -43,6 +43,7 @@ def get_models() -> Mapping[str, tf.keras.Model]: return { 'efficientnet': efficientnet_model.EfficientNet.from_name, 'resnet': resnet_model.resnet50, + 'vgg': vgg_model.vgg16, } diff --git a/official/legacy/image_classification/classifier_trainer_test.py b/official/legacy/image_classification/classifier_trainer_test.py index fd304cdbae84db73d177729fbbd6338d9ecf4baf..2be5d85727f0847a69b3e43477be5dd2bd42cd29 100644 --- a/official/legacy/image_classification/classifier_trainer_test.py +++ b/official/legacy/image_classification/classifier_trainer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,13 +12,8 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Unit tests for the classifier trainer models.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - import functools import json @@ -53,6 +48,7 @@ def distribution_strategy_combinations() -> Iterable[Tuple[Any, ...]]: model=[ 'efficientnet', 'resnet', + 'vgg', ], dataset=[ 'imagenet', @@ -149,6 +145,7 @@ class ClassifierTest(tf.test.TestCase, parameterized.TestCase): model=[ 'efficientnet', 'resnet', + 'vgg', ], dataset='imagenet', dtype='float16', @@ -193,6 +190,7 @@ class ClassifierTest(tf.test.TestCase, parameterized.TestCase): model=[ 'efficientnet', 'resnet', + 'vgg', ], dataset='imagenet', dtype='bfloat16', diff --git a/official/legacy/image_classification/classifier_trainer_util_test.py b/official/legacy/image_classification/classifier_trainer_util_test.py index 634548159aaa569850ba2eb2b2a2e234e5ec0125..19a05fa678bc47ac838e08b05042c1ce41af526f 100644 --- a/official/legacy/image_classification/classifier_trainer_util_test.py +++ b/official/legacy/image_classification/classifier_trainer_util_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Unit tests for the classifier trainer models.""" from __future__ import absolute_import diff --git a/official/legacy/image_classification/configs/__init__.py b/official/legacy/image_classification/configs/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/image_classification/configs/__init__.py +++ b/official/legacy/image_classification/configs/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/image_classification/configs/base_configs.py b/official/legacy/image_classification/configs/base_configs.py index 22c9e0b3f181d3efb4ced2b76ad35ed453533ef2..7fd230b418efb4cde7551a3bfa48dc4e6c5e241e 100644 --- a/official/legacy/image_classification/configs/base_configs.py +++ b/official/legacy/image_classification/configs/base_configs.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Definitions for high level configuration groups..""" import dataclasses diff --git a/official/legacy/image_classification/configs/configs.py b/official/legacy/image_classification/configs/configs.py index a11f7f23f799d5051309756455e6a8f0da6826eb..87fb5df5b6f7780ba31890c513a1ea92e05de03c 100644 --- a/official/legacy/image_classification/configs/configs.py +++ b/official/legacy/image_classification/configs/configs.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,11 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Configuration utils for image classification experiments.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function import dataclasses @@ -24,6 +20,7 @@ from official.legacy.image_classification import dataset_factory from official.legacy.image_classification.configs import base_configs from official.legacy.image_classification.efficientnet import efficientnet_config from official.legacy.image_classification.resnet import resnet_config +from official.legacy.image_classification.vgg import vgg_config @dataclasses.dataclass @@ -92,12 +89,38 @@ class ResNetImagenetConfig(base_configs.ExperimentConfig): model: base_configs.ModelConfig = resnet_config.ResNetModelConfig() +@dataclasses.dataclass +class VGGImagenetConfig(base_configs.ExperimentConfig): + """Base configuration to train vgg-16 on ImageNet.""" + export: base_configs.ExportConfig = base_configs.ExportConfig() + runtime: base_configs.RuntimeConfig = base_configs.RuntimeConfig() + train_dataset: dataset_factory.DatasetConfig = dataset_factory.ImageNetConfig( + split='train', one_hot=False, 
mean_subtract=True, standardize=True) + validation_dataset: dataset_factory.DatasetConfig = dataset_factory.ImageNetConfig( + split='validation', one_hot=False, mean_subtract=True, standardize=True) + train: base_configs.TrainConfig = base_configs.TrainConfig( + resume_checkpoint=True, + epochs=90, + steps=None, + callbacks=base_configs.CallbacksConfig( + enable_checkpoint_and_export=True, enable_tensorboard=True), + metrics=['accuracy', 'top_5'], + time_history=base_configs.TimeHistoryConfig(log_steps=100), + tensorboard=base_configs.TensorBoardConfig( + track_lr=True, write_model_weights=False), + set_epoch_loop=False) + evaluation: base_configs.EvalConfig = base_configs.EvalConfig( + epochs_between_evals=1, steps=None) + model: base_configs.ModelConfig = vgg_config.VGGModelConfig() + + def get_config(model: str, dataset: str) -> base_configs.ExperimentConfig: """Given model and dataset names, return the ExperimentConfig.""" dataset_model_config_map = { 'imagenet': { 'efficientnet': EfficientNetImageNetConfig(), 'resnet': ResNetImagenetConfig(), + 'vgg': VGGImagenetConfig(), } } try: diff --git a/official/legacy/image_classification/configs/examples/vgg16/imagenet/gpu.yaml b/official/legacy/image_classification/configs/examples/vgg16/imagenet/gpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..33c5a4e36a71fd975ddb57e298fa87d65c66555a --- /dev/null +++ b/official/legacy/image_classification/configs/examples/vgg16/imagenet/gpu.yaml @@ -0,0 +1,46 @@ +# Training configuration for VGG-16 trained on ImageNet on GPUs. +# Reaches > 72.8% within 90 epochs. +# Note: This configuration uses a scaled per-replica batch size based on the number of devices. 
+runtime: + distribution_strategy: 'mirrored' + num_gpus: 1 + batchnorm_spatial_persistent: true +train_dataset: + name: 'imagenet2012' + data_dir: null + builder: 'records' + split: 'train' + image_size: 224 + num_classes: 1000 + num_examples: 1281167 + batch_size: 128 + use_per_replica_batch_size: true + dtype: 'float32' + mean_subtract: true + standardize: true +validation_dataset: + name: 'imagenet2012' + data_dir: null + builder: 'records' + split: 'validation' + image_size: 224 + num_classes: 1000 + num_examples: 50000 + batch_size: 128 + use_per_replica_batch_size: true + dtype: 'float32' + mean_subtract: true + standardize: true +model: + name: 'vgg' + optimizer: + name: 'momentum' + momentum: 0.9 + epsilon: 0.001 + loss: + label_smoothing: 0.0 +train: + resume_checkpoint: true + epochs: 90 +evaluation: + epochs_between_evals: 1 diff --git a/official/legacy/image_classification/dataset_factory.py b/official/legacy/image_classification/dataset_factory.py index 28012996c5d2f4dd6883798a372bc275b678bc0d..19a757046b36e92dcd5d16461a4e5e8da312e1b6 100644 --- a/official/legacy/image_classification/dataset_factory.py +++ b/official/legacy/image_classification/dataset_factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Dataset utilities for vision tasks using TFDS and tf.data.Dataset.""" from __future__ import absolute_import from __future__ import division diff --git a/official/legacy/image_classification/efficientnet/__init__.py b/official/legacy/image_classification/efficientnet/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/image_classification/efficientnet/__init__.py +++ b/official/legacy/image_classification/efficientnet/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/image_classification/efficientnet/common_modules.py b/official/legacy/image_classification/efficientnet/common_modules.py index 0a61aa9fbf1ad53e0621e30f7444cd52692b8bdc..28be696204787c53838def5dc3474556a96161e9 100644 --- a/official/legacy/image_classification/efficientnet/common_modules.py +++ b/official/legacy/image_classification/efficientnet/common_modules.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/image_classification/efficientnet/efficientnet_config.py b/official/legacy/image_classification/efficientnet/efficientnet_config.py index b031e2aa24d3b2d7207ad56ee83834c4be0cf1ca..148851cf687e722a76d6a848a7e0e33017a44ff6 100644 --- a/official/legacy/image_classification/efficientnet/efficientnet_config.py +++ b/official/legacy/image_classification/efficientnet/efficientnet_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Configuration definitions for EfficientNet losses, learning rates, and optimizers.""" from __future__ import absolute_import from __future__ import division diff --git a/official/legacy/image_classification/efficientnet/efficientnet_model.py b/official/legacy/image_classification/efficientnet/efficientnet_model.py index aa8948207c0be59e0c493dcfc239fed132b3f2a5..a9aa243b0377d2517a16af05c7db7177de42bc41 100644 --- a/official/legacy/image_classification/efficientnet/efficientnet_model.py +++ b/official/legacy/image_classification/efficientnet/efficientnet_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Contains definitions for EfficientNet model. [1] Mingxing Tan, Quoc V. Le diff --git a/official/legacy/image_classification/efficientnet/tfhub_export.py b/official/legacy/image_classification/efficientnet/tfhub_export.py index 6afd6daf72a7732184be0546c8bc22164ce2b222..67971f81ff51181eb4749d488233bc5bbabde53e 100644 --- a/official/legacy/image_classification/efficientnet/tfhub_export.py +++ b/official/legacy/image_classification/efficientnet/tfhub_export.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -43,8 +43,8 @@ def export_tfhub(model_path, hub_destination, model_name): image_input = tf.keras.layers.Input( shape=(None, None, 3), name="image_input", dtype=tf.float32) x = image_input * 255.0 - ouputs = efficientnet_model.efficientnet(x, config) - hub_model = tf.keras.Model(image_input, ouputs) + outputs = efficientnet_model.efficientnet(x, config) + hub_model = tf.keras.Model(image_input, outputs) ckpt = tf.train.Checkpoint(model=hub_model) ckpt.restore(model_path).assert_existing_objects_matched() hub_model.save( diff --git a/official/legacy/image_classification/learning_rate.py b/official/legacy/image_classification/learning_rate.py index 72f7e95187521eeebefa1e698ca5382f10642e88..248cc8472e15e7c5695a3efb962d11cfc650ccdf 100644 --- a/official/legacy/image_classification/learning_rate.py +++ b/official/legacy/image_classification/learning_rate.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Learning rate utilities for vision tasks.""" from __future__ import absolute_import from __future__ import division @@ -78,7 +77,7 @@ class CosineDecayWithWarmup(tf.keras.optimizers.schedules.LearningRateSchedule): """Class to generate learning rate tensor.""" def __init__(self, batch_size: int, total_steps: int, warmup_steps: int): - """Creates the consine learning rate tensor with linear warmup. + """Creates the cosine learning rate tensor with linear warmup. Args: batch_size: The training batch size used in the experiment. 
diff --git a/official/legacy/image_classification/learning_rate_test.py b/official/legacy/image_classification/learning_rate_test.py index c3d757081ef7e3078a82a910242a6277e1b9372f..77dc65c571f451c9dce2c28ab64695ab6f63fc86 100644 --- a/official/legacy/image_classification/learning_rate_test.py +++ b/official/legacy/image_classification/learning_rate_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/image_classification/mnist_main.py b/official/legacy/image_classification/mnist_main.py index 9462c6ae1a9c9e15ecd352da10500a7bc1e3a8fb..cf60631444ee7a693457fb52cdd57b2b33a5ca47 100644 --- a/official/legacy/image_classification/mnist_main.py +++ b/official/legacy/image_classification/mnist_main.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/image_classification/mnist_test.py b/official/legacy/image_classification/mnist_test.py index f79773a4ce02a5eb8eb455155f64f35f7d85a661..384a6a9abb3f4b751752a626ebb3bfe7ad8de3b3 100644 --- a/official/legacy/image_classification/mnist_test.py +++ b/official/legacy/image_classification/mnist_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/image_classification/optimizer_factory.py b/official/legacy/image_classification/optimizer_factory.py index dfddb79524582c2a2b11d649c21b097aa221ef5e..ad6ad30d26403e7d523742c1fad5c063638e2ae3 100644 --- a/official/legacy/image_classification/optimizer_factory.py +++ b/official/legacy/image_classification/optimizer_factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -48,7 +48,7 @@ def build_optimizer( `ExponentialMovingAverage`. Returns: - A tf.keras.Optimizer. + A tf.keras.optimizers.legacy.Optimizer. Raises: ValueError if the provided optimizer_name is not supported. @@ -60,12 +60,12 @@ def build_optimizer( if optimizer_name == 'sgd': logging.info('Using SGD optimizer') nesterov = params.get('nesterov', False) - optimizer = tf.keras.optimizers.SGD( + optimizer = tf.keras.optimizers.legacy.SGD( learning_rate=base_learning_rate, nesterov=nesterov) elif optimizer_name == 'momentum': logging.info('Using momentum optimizer') nesterov = params.get('nesterov', False) - optimizer = tf.keras.optimizers.SGD( + optimizer = tf.keras.optimizers.legacy.SGD( learning_rate=base_learning_rate, momentum=params['momentum'], nesterov=nesterov) @@ -74,7 +74,7 @@ def build_optimizer( rho = params.get('decay', None) or params.get('rho', 0.9) momentum = params.get('momentum', 0.9) epsilon = params.get('epsilon', 1e-07) - optimizer = tf.keras.optimizers.RMSprop( + optimizer = tf.keras.optimizers.legacy.RMSprop( learning_rate=base_learning_rate, rho=rho, momentum=momentum, @@ -84,7 +84,7 @@ def build_optimizer( beta_1 = params.get('beta_1', 0.9) beta_2 = params.get('beta_2', 0.999) epsilon = params.get('epsilon', 1e-07) - optimizer = tf.keras.optimizers.Adam( + optimizer = tf.keras.optimizers.legacy.Adam( 
learning_rate=base_learning_rate, beta_1=beta_1, beta_2=beta_2, diff --git a/official/legacy/image_classification/optimizer_factory_test.py b/official/legacy/image_classification/optimizer_factory_test.py index 059d1a267a6b160a2fc6e0d7fa42ae706b0c1e42..e0974505790221768988cf52207cfb0f79f871ed 100644 --- a/official/legacy/image_classification/optimizer_factory_test.py +++ b/official/legacy/image_classification/optimizer_factory_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/image_classification/preprocessing.py b/official/legacy/image_classification/preprocessing.py index 346c8fc8b5b3469ad0cf596f006b7a7517b469c5..78b58243afb8466423eb9e67da0e59c024bb4c0e 100644 --- a/official/legacy/image_classification/preprocessing.py +++ b/official/legacy/image_classification/preprocessing.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/image_classification/resnet/__init__.py b/official/legacy/image_classification/resnet/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/image_classification/resnet/__init__.py +++ b/official/legacy/image_classification/resnet/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/image_classification/resnet/common.py b/official/legacy/image_classification/resnet/common.py index 4d57fe8cac460ab12a1822837032267a95001204..a6581d2fb831abe6776391f911aa87a18fb3dd36 100644 --- a/official/legacy/image_classification/resnet/common.py +++ b/official/legacy/image_classification/resnet/common.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -81,17 +81,18 @@ class PiecewiseConstantDecayWithWarmup( def _get_learning_rate(self, step): """Compute learning rate at given step.""" + step = tf.cast(step, dtype=tf.float32) + warmup_steps = tf.cast(self.warmup_steps, dtype=tf.float32) with tf.name_scope('PiecewiseConstantDecayWithWarmup'): def warmup_lr(step): - return self.rescaled_lr * ( - tf.cast(step, tf.float32) / tf.cast(self.warmup_steps, tf.float32)) + return self.rescaled_lr * (step / warmup_steps) def piecewise_lr(step): return tf.compat.v1.train.piecewise_constant(step, self.step_boundaries, self.lr_values) - return tf.cond(step < self.warmup_steps, lambda: warmup_lr(step), + return tf.cond(step < warmup_steps, lambda: warmup_lr(step), lambda: piecewise_lr(step)) def get_config(self): @@ -105,10 +106,14 @@ class PiecewiseConstantDecayWithWarmup( } -def get_optimizer(learning_rate=0.1): +def get_optimizer(learning_rate=0.1, use_legacy_optimizer=True): """Returns optimizer to use.""" # The learning_rate is overwritten at the beginning of each step by callback. 
- return tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9) + if use_legacy_optimizer: + return tf.keras.optimizers.legacy.SGD( + learning_rate=learning_rate, momentum=0.9) + else: + return tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9) def get_callbacks(pruning_method=None, diff --git a/official/legacy/image_classification/resnet/imagenet_preprocessing.py b/official/legacy/image_classification/resnet/imagenet_preprocessing.py index 86ba3ed98084987ea5d63edf8fd5f515d58fba93..d60107035da30806411d8905614ba3bfda49d9e2 100644 --- a/official/legacy/image_classification/resnet/imagenet_preprocessing.py +++ b/official/legacy/image_classification/resnet/imagenet_preprocessing.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/image_classification/resnet/resnet_config.py b/official/legacy/image_classification/resnet/resnet_config.py index f06cfed82b17619c05738ecc2a0fc47fdd0c36a2..9c40628216278b6df465fa5c5c494962bd8306c9 100644 --- a/official/legacy/image_classification/resnet/resnet_config.py +++ b/official/legacy/image_classification/resnet/resnet_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Configuration definitions for ResNet losses, learning rates, and optimizers.""" from __future__ import absolute_import from __future__ import division diff --git a/official/legacy/image_classification/resnet/resnet_ctl_imagenet_main.py b/official/legacy/image_classification/resnet/resnet_ctl_imagenet_main.py index 910879b446252461e5df09562009079611c86a68..963f5b1522f3d93c764fe33cd6ebbf11e9a44170 100644 --- a/official/legacy/image_classification/resnet/resnet_ctl_imagenet_main.py +++ b/official/legacy/image_classification/resnet/resnet_ctl_imagenet_main.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -22,6 +22,7 @@ from absl import app from absl import flags from absl import logging import orbit +import json import tensorflow as tf from official.common import distribute_utils from official.legacy.image_classification.resnet import common diff --git a/official/legacy/image_classification/resnet/resnet_model.py b/official/legacy/image_classification/resnet/resnet_model.py index bd5ec8eb74850ed9aad8a9a3537a1d5e9283b4fd..545d06ecc9a49d815497dfebdccd7b9df59cb305 100644 --- a/official/legacy/image_classification/resnet/resnet_model.py +++ b/official/legacy/image_classification/resnet/resnet_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/image_classification/resnet/resnet_runnable.py b/official/legacy/image_classification/resnet/resnet_runnable.py index 209117a1ab232fc3c0a1d568eaae56025ead867e..c8f9ade935f9711abbb7f229a025bb5cd79421b6 100644 --- a/official/legacy/image_classification/resnet/resnet_runnable.py +++ b/official/legacy/image_classification/resnet/resnet_runnable.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/image_classification/resnet/tfhub_export.py b/official/legacy/image_classification/resnet/tfhub_export.py index a18360c9e8e2e6ab8455d8fa0e6b23d899ddd5ef..1d7d743ddeb11992cbc2f7c3ce0bda312e6a3108 100644 --- a/official/legacy/image_classification/resnet/tfhub_export.py +++ b/official/legacy/image_classification/resnet/tfhub_export.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/image_classification/test_utils.py b/official/legacy/image_classification/test_utils.py index 8d7180c9d4e10c3241c4d6dd31d2cd013439df7a..871ac7e30f07c772134f54587cb657099361065b 100644 --- a/official/legacy/image_classification/test_utils.py +++ b/official/legacy/image_classification/test_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/image_classification/vgg/__init__.py b/official/legacy/image_classification/vgg/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ba97902e7ec1e12871c0fad301b9ce48c92cf1d1 --- /dev/null +++ b/official/legacy/image_classification/vgg/__init__.py @@ -0,0 +1,15 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + diff --git a/official/legacy/image_classification/vgg/vgg_config.py b/official/legacy/image_classification/vgg/vgg_config.py new file mode 100644 index 0000000000000000000000000000000000000000..0bf936744fae71a63aa4f80553e5025d30cb68ae --- /dev/null +++ b/official/legacy/image_classification/vgg/vgg_config.py @@ -0,0 +1,44 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Configuration definitions for VGG losses, learning rates, and optimizers.""" + +import dataclasses +from official.legacy.image_classification.configs import base_configs +from official.modeling.hyperparams import base_config + + +@dataclasses.dataclass +class VGGModelConfig(base_configs.ModelConfig): + """Configuration for the VGG model.""" + name: str = 'VGG' + num_classes: int = 1000 + model_params: base_config.Config = dataclasses.field(default_factory=lambda: { # pylint:disable=g-long-lambda + 'num_classes': 1000, + 'batch_size': None, + 'use_l2_regularizer': True + }) + loss: base_configs.LossConfig = base_configs.LossConfig( + name='sparse_categorical_crossentropy') + optimizer: base_configs.OptimizerConfig = base_configs.OptimizerConfig( + name='momentum', epsilon=0.001, momentum=0.9, moving_average_decay=None) + learning_rate: base_configs.LearningRateConfig = ( + base_configs.LearningRateConfig( + name='stepwise', + initial_lr=0.01, + examples_per_epoch=1281167, + boundaries=[30, 60], + warmup_epochs=0, + scale_by_batch_size=1. / 256., + multipliers=[0.01 / 256, 0.001 / 256, 0.0001 / 256])) diff --git a/official/legacy/image_classification/vgg/vgg_model.py b/official/legacy/image_classification/vgg/vgg_model.py new file mode 100644 index 0000000000000000000000000000000000000000..b93e22555c5d5dddb3c3de1faa6f866680b66b32 --- /dev/null +++ b/official/legacy/image_classification/vgg/vgg_model.py @@ -0,0 +1,269 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""VGG16 model for Keras. + +Adapted from tf.keras.applications.vgg16.VGG16(). + +Related papers/blogs: +- https://arxiv.org/abs/1409.1556 +""" + +import tensorflow as tf + +layers = tf.keras.layers + + +def _gen_l2_regularizer(use_l2_regularizer=True, l2_weight_decay=1e-4): + return tf.keras.regularizers.L2( + l2_weight_decay) if use_l2_regularizer else None + + +def vgg16(num_classes, + batch_size=None, + use_l2_regularizer=True, + batch_norm_decay=0.9, + batch_norm_epsilon=1e-5): + """Instantiates the VGG16 architecture. + + Args: + num_classes: `int` number of classes for image classification. + batch_size: Size of the batches for each step. + use_l2_regularizer: whether to use L2 regularizer on Conv/Dense layers. + batch_norm_decay: Momentum of batch norm layers. + batch_norm_epsilon: Epsilon of batch norm layers. + + Returns: + A Keras model instance. + + """ + input_shape = (224, 224, 3) + img_input = layers.Input(shape=input_shape, batch_size=batch_size) + + x = img_input + + if tf.keras.backend.image_data_format() == 'channels_first': + x = layers.Permute((3, 1, 2))(x) + bn_axis = 1 + else: # channels_last + bn_axis = 3 + # Block 1 + x = layers.Conv2D( + 64, (3, 3), + padding='same', + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='block1_conv1')( + x) + x = layers.BatchNormalization( + axis=bn_axis, + momentum=batch_norm_decay, + epsilon=batch_norm_epsilon, + name='bn_conv1')( + x) + x = layers.Activation('relu')(x) + x = layers.Conv2D( + 64, (3, 3), + padding='same', + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='block1_conv2')( + x) + x = layers.BatchNormalization( + axis=bn_axis, + momentum=batch_norm_decay, + epsilon=batch_norm_epsilon, + name='bn_conv2')( + x) + x = layers.Activation('relu')(x) + x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x) + + # Block 2 + x = layers.Conv2D( +
128, (3, 3), + padding='same', + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='block2_conv1')( + x) + x = layers.BatchNormalization( + axis=bn_axis, + momentum=batch_norm_decay, + epsilon=batch_norm_epsilon, + name='bn_conv3')( + x) + x = layers.Activation('relu')(x) + x = layers.Conv2D( + 128, (3, 3), + padding='same', + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='block2_conv2')( + x) + x = layers.BatchNormalization( + axis=bn_axis, + momentum=batch_norm_decay, + epsilon=batch_norm_epsilon, + name='bn_conv4')( + x) + x = layers.Activation('relu')(x) + x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x) + + # Block 3 + x = layers.Conv2D( + 256, (3, 3), + padding='same', + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='block3_conv1')( + x) + x = layers.BatchNormalization( + axis=bn_axis, + momentum=batch_norm_decay, + epsilon=batch_norm_epsilon, + name='bn_conv5')( + x) + x = layers.Activation('relu')(x) + x = layers.Conv2D( + 256, (3, 3), + padding='same', + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='block3_conv2')( + x) + x = layers.BatchNormalization( + axis=bn_axis, + momentum=batch_norm_decay, + epsilon=batch_norm_epsilon, + name='bn_conv6')( + x) + x = layers.Activation('relu')(x) + x = layers.Conv2D( + 256, (3, 3), + padding='same', + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='block3_conv3')( + x) + x = layers.BatchNormalization( + axis=bn_axis, + momentum=batch_norm_decay, + epsilon=batch_norm_epsilon, + name='bn_conv7')( + x) + x = layers.Activation('relu')(x) + x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x) + + # Block 4 + x = layers.Conv2D( + 512, (3, 3), + padding='same', + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='block4_conv1')( + x) + x = layers.BatchNormalization( + axis=bn_axis, + momentum=batch_norm_decay, + epsilon=batch_norm_epsilon, + name='bn_conv8')( + x) + x = 
layers.Activation('relu')(x) + x = layers.Conv2D( + 512, (3, 3), + padding='same', + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='block4_conv2')( + x) + x = layers.BatchNormalization( + axis=bn_axis, + momentum=batch_norm_decay, + epsilon=batch_norm_epsilon, + name='bn_conv9')( + x) + x = layers.Activation('relu')(x) + x = layers.Conv2D( + 512, (3, 3), + padding='same', + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='block4_conv3')( + x) + x = layers.BatchNormalization( + axis=bn_axis, + momentum=batch_norm_decay, + epsilon=batch_norm_epsilon, + name='bn_conv10')( + x) + x = layers.Activation('relu')(x) + x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x) + + # Block 5 + x = layers.Conv2D( + 512, (3, 3), + padding='same', + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='block5_conv1')( + x) + x = layers.BatchNormalization( + axis=bn_axis, + momentum=batch_norm_decay, + epsilon=batch_norm_epsilon, + name='bn_conv11')( + x) + x = layers.Activation('relu')(x) + x = layers.Conv2D( + 512, (3, 3), + padding='same', + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='block5_conv2')( + x) + x = layers.BatchNormalization( + axis=bn_axis, + momentum=batch_norm_decay, + epsilon=batch_norm_epsilon, + name='bn_conv12')( + x) + x = layers.Activation('relu')(x) + x = layers.Conv2D( + 512, (3, 3), + padding='same', + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='block5_conv3')( + x) + x = layers.BatchNormalization( + axis=bn_axis, + momentum=batch_norm_decay, + epsilon=batch_norm_epsilon, + name='bn_conv13')( + x) + x = layers.Activation('relu')(x) + x = layers.MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x) + + x = layers.Flatten(name='flatten')(x) + x = layers.Dense( + 4096, + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='fc1')( + x) + x = layers.Activation('relu')(x) + x = layers.Dropout(0.5)(x) + x = layers.Dense( + 
4096, + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='fc2')( + x) + x = layers.Activation('relu')(x) + x = layers.Dropout(0.5)(x) + x = layers.Dense( + num_classes, + kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), + name='fc1000')( + x) + + x = layers.Activation('softmax', dtype='float32')(x) + + # Create model. + return tf.keras.Model(img_input, x, name='vgg16') diff --git a/official/legacy/nlp/albert/__init__.py b/official/legacy/nlp/albert/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/legacy/nlp/albert/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/legacy/nlp/albert/configs.py b/official/legacy/nlp/albert/configs.py deleted file mode 100644 index 6fd6fdff7b97e7a0dce385eb4edd22de6d23b6d0..0000000000000000000000000000000000000000 --- a/official/legacy/nlp/albert/configs.py +++ /dev/null @@ -1,50 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""The ALBERT configurations.""" - -import six - -from official.nlp.bert import configs - - -class AlbertConfig(configs.BertConfig): - """Configuration for `ALBERT`.""" - - def __init__(self, num_hidden_groups=1, inner_group_num=1, **kwargs): - """Constructs AlbertConfig. - - Args: - num_hidden_groups: Number of group for the hidden layers, parameters in - the same group are shared. Note that this value and also the following - 'inner_group_num' has to be 1 for now, because all released ALBERT - models set them to 1. We may support arbitary valid values in future. - inner_group_num: Number of inner repetition of attention and ffn. - **kwargs: The remaining arguments are the same as above 'BertConfig'. - """ - super(AlbertConfig, self).__init__(**kwargs) - - # TODO(chendouble): 'inner_group_num' and 'num_hidden_groups' are always 1 - # in the released ALBERT. Support other values in AlbertEncoder if needed. 
- if inner_group_num != 1 or num_hidden_groups != 1: - raise ValueError("We only support 'inner_group_num' and " - "'num_hidden_groups' as 1.") - - @classmethod - def from_dict(cls, json_object): - """Constructs a `AlbertConfig` from a Python dictionary of parameters.""" - config = AlbertConfig(vocab_size=None) - for (key, value) in six.iteritems(json_object): - config.__dict__[key] = value - return config diff --git a/official/legacy/transformer/__init__.py b/official/legacy/transformer/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/transformer/__init__.py +++ b/official/legacy/transformer/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/attention_layer.py b/official/legacy/transformer/attention_layer.py index db6e95b1a293795614f86aa7041ca767b990f099..e966ce143237309b35969c9839cb4cb32908d071 100644 --- a/official/legacy/transformer/attention_layer.py +++ b/official/legacy/transformer/attention_layer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -17,6 +17,8 @@ import math import tensorflow as tf +from official.modeling import tf_utils + class Attention(tf.keras.layers.Layer): """Multi-headed attention layer.""" @@ -50,27 +52,27 @@ class Attention(tf.keras.layers.Layer): attention_initializer = _glorot_initializer(input_shape.as_list()[-1], self.hidden_size) - self.query_dense_layer = tf.keras.layers.experimental.EinsumDense( + self.query_dense_layer = tf.keras.layers.EinsumDense( "BTE,ENH->BTNH", output_shape=(None, self.num_heads, size_per_head), - kernel_initializer=attention_initializer, + kernel_initializer=tf_utils.clone_initializer(attention_initializer), bias_axes=None, name="query") - self.key_dense_layer = tf.keras.layers.experimental.EinsumDense( + self.key_dense_layer = tf.keras.layers.EinsumDense( "BTE,ENH->BTNH", output_shape=(None, self.num_heads, size_per_head), - kernel_initializer=attention_initializer, + kernel_initializer=tf_utils.clone_initializer(attention_initializer), bias_axes=None, name="key") - self.value_dense_layer = tf.keras.layers.experimental.EinsumDense( + self.value_dense_layer = tf.keras.layers.EinsumDense( "BTE,ENH->BTNH", output_shape=(None, self.num_heads, size_per_head), - kernel_initializer=attention_initializer, + kernel_initializer=tf_utils.clone_initializer(attention_initializer), bias_axes=None, name="value") output_initializer = _glorot_initializer(self.hidden_size, self.hidden_size) - self.output_dense_layer = tf.keras.layers.experimental.EinsumDense( + self.output_dense_layer = tf.keras.layers.EinsumDense( "BTNH,NHE->BTE", output_shape=(None, self.hidden_size), kernel_initializer=output_initializer, diff --git a/official/legacy/transformer/beam_search_v1.py b/official/legacy/transformer/beam_search_v1.py index 2c8537e63b20e718b15dfcd042f3263212af8c08..533cc01b211503a00502f71f06fc20cfb9f7b270 100644 --- a/official/legacy/transformer/beam_search_v1.py +++ b/official/legacy/transformer/beam_search_v1.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. 
All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/compute_bleu.py b/official/legacy/transformer/compute_bleu.py index dbad8cbf0859ce2f24cfe792e639b4457b6a9037..c1b01e11a97b445c8da3733539911509307864f7 100644 --- a/official/legacy/transformer/compute_bleu.py +++ b/official/legacy/transformer/compute_bleu.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/compute_bleu_test.py b/official/legacy/transformer/compute_bleu_test.py index aed006e345246927dc72f76b76c8bb78333ae28e..24159248eb2472407dbbde2843bc9d7c7268c06f 100644 --- a/official/legacy/transformer/compute_bleu_test.py +++ b/official/legacy/transformer/compute_bleu_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/data_download.py b/official/legacy/transformer/data_download.py index 1b9b8f784c874cc8c4b0ba82a2ef23ddd2d2fb42..4731e82a22b268b8d9b13bfd85ff369f0207d44d 100644 --- a/official/legacy/transformer/data_download.py +++ b/official/legacy/transformer/data_download.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -188,7 +188,7 @@ def download_and_extract(path, url, input_filename, target_filename): Full paths to extracted input and target files. Raises: - OSError: if the the download/extraction fails. + OSError: if the download/extraction fails. """ # Check if extracted files already exist in path input_file = find_file(path, input_filename) diff --git a/official/legacy/transformer/data_pipeline.py b/official/legacy/transformer/data_pipeline.py index 1d9f242172cadcd38fefbc900658b914483b3b24..484c8e97a59e1c32582daa0982eb1f0dcbb08e2c 100644 --- a/official/legacy/transformer/data_pipeline.py +++ b/official/legacy/transformer/data_pipeline.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/embedding_layer.py b/official/legacy/transformer/embedding_layer.py index 69f3861ce6745bab0f62f29c2213fe53f99183c2..398a950df2b8f35628f5bf6192cab61c50912972 100644 --- a/official/legacy/transformer/embedding_layer.py +++ b/official/legacy/transformer/embedding_layer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/ffn_layer.py b/official/legacy/transformer/ffn_layer.py index 26f0a15f69c50abee6f95dd40928e844ece1c691..8e24a1e8428fb8c2659b5d6cca2ffc7cb32423d9 100644 --- a/official/legacy/transformer/ffn_layer.py +++ b/official/legacy/transformer/ffn_layer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/metrics.py b/official/legacy/transformer/metrics.py index 38330aa471c7f7384a3f42abb7eefc5a62a48d94..b469e6c6f67d70678534baf82521ab54c52a911d 100644 --- a/official/legacy/transformer/metrics.py +++ b/official/legacy/transformer/metrics.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/misc.py b/official/legacy/transformer/misc.py index 255a6b336c4081cffc148e3343b2119e3e959258..ff8930a6601680fb46469fb0e4ff1445e8670003 100644 --- a/official/legacy/transformer/misc.py +++ b/official/legacy/transformer/misc.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/model_params.py b/official/legacy/transformer/model_params.py index 0764d5e9a0d2e97754943cd61574b1c24469a0ae..70e464be20abd4b2ed02201b0397b13e3c87ac42 100644 --- a/official/legacy/transformer/model_params.py +++ b/official/legacy/transformer/model_params.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/transformer/model_utils.py b/official/legacy/transformer/model_utils.py index 6e163b97361cb7f071314909aaa1fc1e52ae6bfd..36095238822a0439f4bd7986ecaf038c76126319 100644 --- a/official/legacy/transformer/model_utils.py +++ b/official/legacy/transformer/model_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/model_utils_test.py b/official/legacy/transformer/model_utils_test.py index e6223c62b87a0055c9e8aa7269756c82fbf734b9..0758caa18707997f2766926dc95846da6ed82ba8 100644 --- a/official/legacy/transformer/model_utils_test.py +++ b/official/legacy/transformer/model_utils_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/optimizer.py b/official/legacy/transformer/optimizer.py index b27a6f07a4b73723be6f28d257bc3abcfbca43de..70e96ab6baddce4f326949891c056de9890e0e45 100644 --- a/official/legacy/transformer/optimizer.py +++ b/official/legacy/transformer/optimizer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/transformer/transformer.py b/official/legacy/transformer/transformer.py index da449a267ef7c3f870d03b96780bfc9cece88352..ed5d874900d863e73281ca6fb449d7acaa3d68cf 100644 --- a/official/legacy/transformer/transformer.py +++ b/official/legacy/transformer/transformer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/transformer_forward_test.py b/official/legacy/transformer/transformer_forward_test.py index b3c2c54c07d1890b30d35126d50e74b47050242f..5efdc4178f4cd87470a8bc5ddf24a930e5e2e443 100644 --- a/official/legacy/transformer/transformer_forward_test.py +++ b/official/legacy/transformer/transformer_forward_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/transformer_layers_test.py b/official/legacy/transformer/transformer_layers_test.py index 16b7482d39ebf6fb745c94eafa414ab0b0b234e4..c20804439654a2f8f4be09eb5b040958f9a3657d 100644 --- a/official/legacy/transformer/transformer_layers_test.py +++ b/official/legacy/transformer/transformer_layers_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/transformer/transformer_main.py b/official/legacy/transformer/transformer_main.py index 38fc3cff2ebc9dc8a732a40ea17d96f34b4be822..ec1e7634045c95c2ab86fd1f34f41429673f0299 100644 --- a/official/legacy/transformer/transformer_main.py +++ b/official/legacy/transformer/transformer_main.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/transformer_main_test.py b/official/legacy/transformer/transformer_main_test.py index ec1c5ac188f5bf85c07f36215c44c76684f062b2..82077858102fe9e344cbe9cfcbd3ce63695d360b 100644 --- a/official/legacy/transformer/transformer_main_test.py +++ b/official/legacy/transformer/transformer_main_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/transformer_test.py b/official/legacy/transformer/transformer_test.py index 7b3ecc5ab008ad7ac07e77f7b3afc80e5a4cd1dd..a6cedb48e1d4b3be0cfcce079733f9b1eab78e0c 100644 --- a/official/legacy/transformer/transformer_test.py +++ b/official/legacy/transformer/transformer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/transformer/translate.py b/official/legacy/transformer/translate.py index 5f88e015ba1ef68044e0a53d69979af951d8bed3..abbf82f5f1a166fe30d2be7521b8e2891fbdb0e8 100644 --- a/official/legacy/transformer/translate.py +++ b/official/legacy/transformer/translate.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/utils/__init__.py b/official/legacy/transformer/utils/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/legacy/transformer/utils/__init__.py +++ b/official/legacy/transformer/utils/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/utils/metrics.py b/official/legacy/transformer/utils/metrics.py index ec1cad0b409cfb69535dce15fab1d531d7811391..23261ac474a8f1d5a924a1e48f380a5483a3f15d 100644 --- a/official/legacy/transformer/utils/metrics.py +++ b/official/legacy/transformer/utils/metrics.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/legacy/transformer/utils/tokenizer.py b/official/legacy/transformer/utils/tokenizer.py index 6a992a324f3b0c651d219f4f2cc081a274d87db4..9533846d2fc4d7d74194806e3a7cbee73a198639 100644 --- a/official/legacy/transformer/utils/tokenizer.py +++ b/official/legacy/transformer/utils/tokenizer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/legacy/transformer/utils/tokenizer_test.py b/official/legacy/transformer/utils/tokenizer_test.py index e75cbd1e6333551d57f4910246e98097bacaf16f..2b582b99c6f7b6e07693c177fbb655231621aa55 100644 --- a/official/legacy/transformer/utils/tokenizer_test.py +++ b/official/legacy/transformer/utils/tokenizer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/xlnet/README.md b/official/legacy/xlnet/README.md similarity index 100% rename from official/nlp/xlnet/README.md rename to official/legacy/xlnet/README.md diff --git a/official/legacy/xlnet/__init__.py b/official/legacy/xlnet/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ba97902e7ec1e12871c0fad301b9ce48c92cf1d1 --- /dev/null +++ b/official/legacy/xlnet/__init__.py @@ -0,0 +1,15 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + diff --git a/official/nlp/xlnet/classifier_utils.py b/official/legacy/xlnet/classifier_utils.py similarity index 98% rename from official/nlp/xlnet/classifier_utils.py rename to official/legacy/xlnet/classifier_utils.py index cb8acee087dc58596159d1b11ddf7c09299038dc..27aaf4ade1840a380ce054e8ab6705afe42b2b08 100644 --- a/official/nlp/xlnet/classifier_utils.py +++ b/official/legacy/xlnet/classifier_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,7 +16,7 @@ from absl import logging -from official.nlp.xlnet import data_utils +from official.legacy.xlnet import data_utils SEG_ID_A = 0 SEG_ID_B = 1 diff --git a/official/legacy/xlnet/common_flags.py b/official/legacy/xlnet/common_flags.py new file mode 100644 index 0000000000000000000000000000000000000000..b1ee5c3e86d8815576d89f26ed459e229adfb9cd --- /dev/null +++ b/official/legacy/xlnet/common_flags.py @@ -0,0 +1,142 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Common flags used in XLNet model.""" + +from absl import flags + +flags.DEFINE_string("master", default=None, help="master") +flags.DEFINE_string( + "tpu", + default=None, + help="The Cloud TPU to use for training. This should be " + "either the name used when creating the Cloud TPU, or a " + "url like grpc://ip.address.of.tpu:8470.") +flags.DEFINE_bool( + "use_tpu", default=True, help="Use TPUs rather than plain CPUs.") +flags.DEFINE_string("tpu_topology", "2x2", help="TPU topology.") +flags.DEFINE_integer( + "num_core_per_host", default=8, help="number of cores per host") + +flags.DEFINE_string("model_dir", default=None, help="Estimator model_dir.") +flags.DEFINE_string( + "init_checkpoint", + default=None, + help="Checkpoint path for initializing the model.") +flags.DEFINE_bool( + "init_from_transformerxl", + default=False, + help="Init from a transformerxl model checkpoint. Otherwise, init from the " + "entire model checkpoint.") + +# Optimization config +flags.DEFINE_float("learning_rate", default=1e-4, help="Maximum learning rate.") +flags.DEFINE_float("clip", default=1.0, help="Gradient clipping value.") +flags.DEFINE_float("weight_decay_rate", default=0.0, help="Weight decay rate.") + +# lr decay +flags.DEFINE_integer( + "warmup_steps", default=0, help="Number of steps for linear lr warmup.") +flags.DEFINE_float("adam_epsilon", default=1e-8, help="Adam epsilon.") +flags.DEFINE_float( + "lr_layer_decay_rate", + default=1.0, + help="Top layer: lr[L] = FLAGS.learning_rate." 
+ " Lower layers: lr[l-1] = lr[l] * lr_layer_decay_rate.") +flags.DEFINE_float( + "min_lr_ratio", default=0.0, help="Minimum learning rate as a ratio of the maximum learning rate.") + +# Training config +flags.DEFINE_integer( + "train_batch_size", + default=16, + help="Size of the train batch across all hosts.") +flags.DEFINE_integer( + "train_steps", default=100000, help="Total number of training steps.") +flags.DEFINE_integer( + "iterations", default=1000, help="Number of iterations per repeat loop.") + +# Data config +flags.DEFINE_integer( + "seq_len", default=0, help="Sequence length for pretraining.") +flags.DEFINE_integer( + "reuse_len", + default=0, + help="How many tokens to be reused in the next batch. " + "Could be half of `seq_len`.") +flags.DEFINE_bool("uncased", False, help="Use uncased inputs or not.") +flags.DEFINE_bool( + "bi_data", + default=False, + help="Use bidirectional data streams, " + "i.e., forward & backward.") +flags.DEFINE_integer("n_token", 32000, help="Vocab size") + +# Model config +flags.DEFINE_integer("mem_len", default=0, help="Number of steps to cache") +flags.DEFINE_bool("same_length", default=False, help="Same length attention") +flags.DEFINE_integer("clamp_len", default=-1, help="Clamp length") + +flags.DEFINE_integer("n_layer", default=6, help="Number of layers.") +flags.DEFINE_integer("d_model", default=32, help="Dimension of the model.") +flags.DEFINE_integer("d_embed", default=32, help="Dimension of the embeddings.") +flags.DEFINE_integer("n_head", default=4, help="Number of attention heads.") +flags.DEFINE_integer( + "d_head", default=8, help="Dimension of each attention head.") +flags.DEFINE_integer( + "d_inner", + default=32, + help="Dimension of inner hidden size in positionwise " + "feed-forward.") +flags.DEFINE_float("dropout", default=0.1, help="Dropout rate.") +flags.DEFINE_float("dropout_att", default=0.1, help="Attention dropout rate.") +flags.DEFINE_bool("untie_r", default=False, help="Untie r_w_bias and r_r_bias") +flags.DEFINE_string( +
"ff_activation", + default="relu", + help="Activation type used in position-wise feed-forward.") +flags.DEFINE_string( + "strategy_type", + default="tpu", + help="Type of distribution strategy to use, e.g. 'tpu' or 'mirrored'.") +flags.DEFINE_bool("use_bfloat16", False, help="Whether to use bfloat16.") + +# Parameter initialization +flags.DEFINE_enum( + "init_method", + default="normal", + enum_values=["normal", "uniform"], + help="Initialization method.") +flags.DEFINE_float( + "init_std", default=0.02, help="Initialization std when init is normal.") +flags.DEFINE_float( + "init_range", default=0.1, help="Initialization range when init is uniform.") + +flags.DEFINE_integer( + "test_data_size", default=12048, help="Number of test data samples.") +flags.DEFINE_string( + "train_tfrecord_path", + default=None, + help="Path to preprocessed training set tfrecord.") +flags.DEFINE_string( + "test_tfrecord_path", + default=None, + help="Path to preprocessed test set tfrecord.") +flags.DEFINE_integer( + "test_batch_size", + default=16, + help="Size of the test batch across all hosts.") +flags.DEFINE_integer( + "save_steps", default=1000, help="Number of steps for saving checkpoint.") +FLAGS = flags.FLAGS diff --git a/official/nlp/xlnet/data_utils.py b/official/legacy/xlnet/data_utils.py similarity index 99% rename from official/nlp/xlnet/data_utils.py rename to official/legacy/xlnet/data_utils.py index 58ffdbffc2c287064b2f98a5e04a70cc8020ff34..0048832d6eae6e1b470fbcfc45a17ec424a4d0fa 100644 --- a/official/nlp/xlnet/data_utils.py +++ b/official/legacy/xlnet/data_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
diff --git a/official/legacy/xlnet/optimization.py b/official/legacy/xlnet/optimization.py new file mode 100644 index 0000000000000000000000000000000000000000..2d394eaefba5be4d86757e798a4466a2f9b99457 --- /dev/null +++ b/official/legacy/xlnet/optimization.py @@ -0,0 +1,98 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Functions and classes related to optimization (weight updates).""" + +from absl import logging +import tensorflow as tf +from official.nlp import optimization + + +class WarmUp(tf.keras.optimizers.schedules.LearningRateSchedule): + """Applies a warmup schedule on a given learning rate decay schedule.""" + + def __init__(self, + initial_learning_rate, + decay_schedule_fn, + warmup_steps, + power=1.0, + name=None): + super(WarmUp, self).__init__() + self.initial_learning_rate = initial_learning_rate + self.warmup_steps = warmup_steps + self.power = power + self.decay_schedule_fn = decay_schedule_fn + self.name = name + + def __call__(self, step): + with tf.name_scope(self.name or "WarmUp") as name: + # Implements polynomial warmup. i.e., if global_step < warmup_steps, the + # learning rate will be `global_step/num_warmup_steps * init_lr`.
+ global_step_float = tf.cast(step, tf.float32) + warmup_steps_float = tf.cast(self.warmup_steps, tf.float32) + warmup_percent_done = global_step_float / warmup_steps_float + warmup_learning_rate = ( + self.initial_learning_rate * + tf.math.pow(warmup_percent_done, self.power)) + return tf.cond( + global_step_float < warmup_steps_float, + lambda: warmup_learning_rate, + lambda: self.decay_schedule_fn(step - self.warmup_steps), + name=name) + + def get_config(self): + return { + "initial_learning_rate": self.initial_learning_rate, + "decay_schedule_fn": self.decay_schedule_fn, + "warmup_steps": self.warmup_steps, + "power": self.power, + "name": self.name + } + + +def create_optimizer(init_lr, + num_train_steps, + num_warmup_steps, + min_lr_ratio=0.0, + adam_epsilon=1e-8, + weight_decay_rate=0.0): + """Creates an optimizer with learning rate schedule.""" + # Implements linear decay of the learning rate. + learning_rate_fn = tf.keras.optimizers.schedules.PolynomialDecay( + initial_learning_rate=init_lr, + decay_steps=num_train_steps - num_warmup_steps, + end_learning_rate=init_lr * min_lr_ratio) + if num_warmup_steps: + learning_rate_fn = WarmUp( + initial_learning_rate=init_lr, + decay_schedule_fn=learning_rate_fn, + warmup_steps=num_warmup_steps) + if weight_decay_rate > 0.0: + logging.info( + "Using AdamWeightDecay with adam_epsilon=%.9f weight_decay_rate=%.3f", + adam_epsilon, weight_decay_rate) + optimizer = optimization.AdamWeightDecay( + learning_rate=learning_rate_fn, + weight_decay_rate=weight_decay_rate, + beta_1=0.9, + beta_2=0.999, + epsilon=adam_epsilon, + exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"], + include_in_weight_decay=["r_s_bias", "r_r_bias", "r_w_bias"]) + else: + logging.info("Using Adam with adam_epsilon=%.9f", (adam_epsilon)) + optimizer = tf.keras.optimizers.legacy.Adam( + learning_rate=learning_rate_fn, epsilon=adam_epsilon) + + return optimizer, learning_rate_fn diff --git 
a/official/nlp/xlnet/preprocess_classification_data.py b/official/legacy/xlnet/preprocess_classification_data.py similarity index 98% rename from official/nlp/xlnet/preprocess_classification_data.py rename to official/legacy/xlnet/preprocess_classification_data.py index e8d42fa4e61541fed4532caffcc012edcc8254bc..d517e486b039baa506acb21a85a0b97e9dd965f2 100644 --- a/official/nlp/xlnet/preprocess_classification_data.py +++ b/official/legacy/xlnet/preprocess_classification_data.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -26,8 +26,8 @@ import numpy as np import tensorflow as tf import sentencepiece as spm -from official.nlp.xlnet import classifier_utils -from official.nlp.xlnet import preprocess_utils +from official.legacy.xlnet import classifier_utils +from official.legacy.xlnet import preprocess_utils flags.DEFINE_bool( diff --git a/official/nlp/xlnet/preprocess_pretrain_data.py b/official/legacy/xlnet/preprocess_pretrain_data.py similarity index 99% rename from official/nlp/xlnet/preprocess_pretrain_data.py rename to official/legacy/xlnet/preprocess_pretrain_data.py index 3facc98f5941320379bd75688deeb626572db52d..aaf60ba5e4a8c1dc2ddcc5865247e8edce418bd0 100644 --- a/official/nlp/xlnet/preprocess_pretrain_data.py +++ b/official/legacy/xlnet/preprocess_pretrain_data.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -28,7 +28,7 @@ import numpy as np import tensorflow.compat.v1 as tf import sentencepiece as spm -from official.nlp.xlnet import preprocess_utils +from official.legacy.xlnet import preprocess_utils FLAGS = flags.FLAGS diff --git a/official/nlp/xlnet/preprocess_squad_data.py b/official/legacy/xlnet/preprocess_squad_data.py similarity index 97% rename from official/nlp/xlnet/preprocess_squad_data.py rename to official/legacy/xlnet/preprocess_squad_data.py index e1d49565067c57611d8613a6d14e5e4bf221b1fc..e99177c838e8dda1a8bf1e2ca639325b67e76afc 100644 --- a/official/nlp/xlnet/preprocess_squad_data.py +++ b/official/legacy/xlnet/preprocess_squad_data.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -25,7 +25,7 @@ from absl import logging import tensorflow as tf import sentencepiece as spm -from official.nlp.xlnet import squad_utils +from official.legacy.xlnet import squad_utils flags.DEFINE_integer( "num_proc", default=1, help="Number of preprocessing processes.") diff --git a/official/nlp/xlnet/preprocess_utils.py b/official/legacy/xlnet/preprocess_utils.py similarity index 98% rename from official/nlp/xlnet/preprocess_utils.py rename to official/legacy/xlnet/preprocess_utils.py index 5c714a0c1fdd3a7cddd9c0a63fc09c80bc08627e..19cae9174c27848d673070d11f3926be8bdf1ded 100644 --- a/official/nlp/xlnet/preprocess_utils.py +++ b/official/legacy/xlnet/preprocess_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
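This note is for readers checking the schedule that `create_optimizer` in `optimization.py` produces. During warmup the rate is `init_lr * (step / num_warmup_steps) ** power`. After warmup, `PolynomialDecay` decays linearly toward `init_lr * min_lr_ratio`, evaluated on `step - num_warmup_steps`. The plain-Python sketch below reproduces that arithmetic without TensorFlow. It makes three assumptions. First, `WarmUp`'s `power` defaults to 1.0, since its constructor is outside this diff. Second, `PolynomialDecay` keeps its defaults `power=1.0` and `cycle=False`. Third, the function name is hypothetical.

```python
def warmup_then_linear_decay(step, init_lr, num_train_steps, num_warmup_steps,
                             min_lr_ratio=0.0, warmup_power=1.0):
  """Hypothetical mirror of WarmUp wrapped around PolynomialDecay(power=1)."""
  end_lr = init_lr * min_lr_ratio
  if num_warmup_steps and step < num_warmup_steps:
    # Warmup phase: init_lr * (step / warmup_steps) ** power.
    return init_lr * (step / num_warmup_steps) ** warmup_power
  # Decay phase: the decay schedule only sees steps taken after warmup, and
  # its horizon is num_train_steps - num_warmup_steps.
  decay_steps = num_train_steps - num_warmup_steps
  decay_step = min(step - num_warmup_steps, decay_steps)  # cycle=False clamps
  return (init_lr - end_lr) * (1.0 - decay_step / decay_steps) + end_lr


for s in (0, 50, 100, 550, 1000):
  print(s, warmup_then_linear_decay(s, 1e-3, 1000, 100))
```

Under these assumptions, the rate peaks at `init_lr` exactly when warmup ends. It reaches the floor at `num_train_steps`, because the warmup length shortens the decay horizon.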
diff --git a/official/legacy/xlnet/run_classifier.py b/official/legacy/xlnet/run_classifier.py new file mode 100644 index 0000000000000000000000000000000000000000..258e6116ab3537384c9785ce31ec2f92478e57e6 --- /dev/null +++ b/official/legacy/xlnet/run_classifier.py @@ -0,0 +1,187 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""XLNet classification finetuning runner in tf2.0.""" + +import functools +# Import libraries +from absl import app +from absl import flags +from absl import logging + +import numpy as np +import tensorflow as tf +# pylint: disable=unused-import +from official.common import distribute_utils +from official.legacy.xlnet import common_flags +from official.legacy.xlnet import data_utils +from official.legacy.xlnet import optimization +from official.legacy.xlnet import training_utils +from official.legacy.xlnet import xlnet_config +from official.legacy.xlnet import xlnet_modeling as modeling + +flags.DEFINE_integer("n_class", default=2, help="Number of classes.") +flags.DEFINE_string( + "summary_type", + default="last", + help="Method used to summarize a sequence into a vector.") + +FLAGS = flags.FLAGS + + +def get_classificationxlnet_model(model_config, + run_config, + n_class, + summary_type="last"): + model = modeling.ClassificationXLNetModel( + model_config, run_config, n_class, summary_type, name="model") + return model + + +def run_evaluation(strategy, + test_input_fn, + 
eval_steps, + model, + step, + eval_summary_writer=None): + """Run evaluation for classification task. + + Args: + strategy: distribution strategy. + test_input_fn: input function for evaluation data. + eval_steps: total number of evaluation steps. + model: keras model object. + step: current train step. + eval_summary_writer: summary writer used to record evaluation metrics. As + there are fake data samples in validation set, we use mask to get rid of + them when calculating the accuracy. For the reason that there will be + dynamic-shape tensor, we first collect logits, labels and masks from TPU + and calculate the accuracy via numpy locally. + + Returns: + A float metric, accuracy. + """ + + def _test_step_fn(inputs): + """Replicated validation step.""" + + inputs["mems"] = None + _, logits = model(inputs, training=False) + return logits, inputs["label_ids"], inputs["is_real_example"] + + @tf.function + def _run_evaluation(test_iterator): + """Runs validation steps.""" + logits, labels, masks = strategy.run( + _test_step_fn, args=(next(test_iterator),)) + return logits, labels, masks + + test_iterator = data_utils.get_input_iterator(test_input_fn, strategy) + correct = 0 + total = 0 + for _ in range(eval_steps): + logits, labels, masks = _run_evaluation(test_iterator) + logits = strategy.experimental_local_results(logits) + labels = strategy.experimental_local_results(labels) + masks = strategy.experimental_local_results(masks) + merged_logits = [] + merged_labels = [] + merged_masks = [] + + for i in range(strategy.num_replicas_in_sync): + merged_logits.append(logits[i].numpy()) + merged_labels.append(labels[i].numpy()) + merged_masks.append(masks[i].numpy()) + merged_logits = np.vstack(np.array(merged_logits)) + merged_labels = np.hstack(np.array(merged_labels)) + merged_masks = np.hstack(np.array(merged_masks)) + real_index = np.where(np.equal(merged_masks, 1)) + correct += np.sum( + np.equal( + np.argmax(merged_logits[real_index], axis=-1), + 
merged_labels[real_index])) + total += np.shape(real_index)[-1] + accuracy = float(correct) / float(total) + logging.info("Train step: %d / acc = %d/%d = %f", step, correct, total, + accuracy) + if eval_summary_writer: + with eval_summary_writer.as_default(): + tf.summary.scalar("eval_acc", float(correct) / float(total), step=step) + eval_summary_writer.flush() + return accuracy + + +def get_metric_fn(): + train_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy( + "acc", dtype=tf.float32) + return train_acc_metric + + +def main(unused_argv): + del unused_argv + strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=FLAGS.strategy_type, + tpu_address=FLAGS.tpu) + if strategy: + logging.info("***** Number of cores used : %d", + strategy.num_replicas_in_sync) + train_input_fn = functools.partial(data_utils.get_classification_input_data, + FLAGS.train_batch_size, FLAGS.seq_len, + strategy, True, FLAGS.train_tfrecord_path) + test_input_fn = functools.partial(data_utils.get_classification_input_data, + FLAGS.test_batch_size, FLAGS.seq_len, + strategy, False, FLAGS.test_tfrecord_path) + + total_training_steps = FLAGS.train_steps + steps_per_loop = FLAGS.iterations + eval_steps = int(FLAGS.test_data_size / FLAGS.test_batch_size) + eval_fn = functools.partial(run_evaluation, strategy, test_input_fn, + eval_steps) + optimizer, learning_rate_fn = optimization.create_optimizer( + FLAGS.learning_rate, + total_training_steps, + FLAGS.warmup_steps, + adam_epsilon=FLAGS.adam_epsilon) + model_config = xlnet_config.XLNetConfig(FLAGS) + run_config = xlnet_config.create_run_config(True, False, FLAGS) + model_fn = functools.partial(get_classificationxlnet_model, model_config, + run_config, FLAGS.n_class, FLAGS.summary_type) + input_meta_data = {} + input_meta_data["d_model"] = FLAGS.d_model + input_meta_data["mem_len"] = FLAGS.mem_len + input_meta_data["batch_size_per_core"] = int(FLAGS.train_batch_size / + strategy.num_replicas_in_sync) + 
input_meta_data["n_layer"] = FLAGS.n_layer + input_meta_data["lr_layer_decay_rate"] = FLAGS.lr_layer_decay_rate + input_meta_data["n_class"] = FLAGS.n_class + + training_utils.train( + strategy=strategy, + model_fn=model_fn, + input_meta_data=input_meta_data, + eval_fn=eval_fn, + metric_fn=get_metric_fn, + train_input_fn=train_input_fn, + init_checkpoint=FLAGS.init_checkpoint, + init_from_transformerxl=FLAGS.init_from_transformerxl, + total_training_steps=total_training_steps, + steps_per_loop=steps_per_loop, + optimizer=optimizer, + learning_rate_fn=learning_rate_fn, + model_dir=FLAGS.model_dir, + save_steps=FLAGS.save_steps) + + +if __name__ == "__main__": + app.run(main) diff --git a/official/nlp/xlnet/run_pretrain.py b/official/legacy/xlnet/run_pretrain.py similarity index 93% rename from official/nlp/xlnet/run_pretrain.py rename to official/legacy/xlnet/run_pretrain.py index 80ab0bd4d1c500c92e2d97106fb3e3eab0d0b33e..311f283a9cb46f72eb9adc7e3b509e1a482348be 100644 --- a/official/nlp/xlnet/run_pretrain.py +++ b/official/legacy/xlnet/run_pretrain.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
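`run_evaluation` in `run_classifier.py` drops padding examples (those with `is_real_example == 0`) before it counts correct predictions. This is why it gathers logits, labels, and masks to the host instead of using a Keras metric. Below is a minimal NumPy sketch of the merge-and-mask step for one evaluation batch. It assumes each list holds one array per replica, as returned by `strategy.experimental_local_results`. The helper name is hypothetical.

```python
import numpy as np


def masked_accuracy_counts(per_replica_logits, per_replica_labels,
                           per_replica_masks):
  """Counts correct predictions over real (non-padding) examples only.

  Hypothetical helper mirroring the merge logic in run_classifier.py.
  """
  logits = np.vstack(per_replica_logits)  # [num_replicas * batch, n_class]
  labels = np.hstack(per_replica_labels)  # [num_replicas * batch]
  masks = np.hstack(per_replica_masks)    # 1 = real example, 0 = padding
  real = masks == 1
  preds = np.argmax(logits[real], axis=-1)
  correct = int(np.sum(preds == labels[real]))
  total = int(np.sum(real))
  return correct, total


# Two replicas with two examples each; the last example is padding.
logits = [np.array([[2.0, 1.0], [0.0, 3.0]]),
          np.array([[5.0, 0.0], [1.0, 2.0]])]
labels = [np.array([0, 1]), np.array([1, 1])]
masks = [np.array([1, 1]), np.array([1, 0])]
correct, total = masked_accuracy_counts(logits, labels, masks)
print(correct, total)  # 2 3
```

The runner sums `correct` and `total` across all `eval_steps` and divides once. This gives every real example equal weight, even when a batch is mostly padding.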
@@ -24,12 +24,12 @@ from absl import logging import tensorflow as tf # pylint: disable=unused-import from official.common import distribute_utils -from official.nlp.xlnet import common_flags -from official.nlp.xlnet import data_utils -from official.nlp.xlnet import optimization -from official.nlp.xlnet import training_utils -from official.nlp.xlnet import xlnet_config -from official.nlp.xlnet import xlnet_modeling as modeling +from official.legacy.xlnet import common_flags +from official.legacy.xlnet import data_utils +from official.legacy.xlnet import optimization +from official.legacy.xlnet import training_utils +from official.legacy.xlnet import xlnet_config +from official.legacy.xlnet import xlnet_modeling as modeling flags.DEFINE_integer( "num_predict", diff --git a/official/legacy/xlnet/run_squad.py b/official/legacy/xlnet/run_squad.py new file mode 100644 index 0000000000000000000000000000000000000000..29a5c5c451c929b35f68b754771e6a8aab1d3df2 --- /dev/null +++ b/official/legacy/xlnet/run_squad.py @@ -0,0 +1,295 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""XLNet SQUAD finetuning runner in tf2.0.""" + +import functools +import json +import os +import pickle + +# Import libraries +from absl import app +from absl import flags +from absl import logging + +import tensorflow as tf +# pylint: disable=unused-import +import sentencepiece as spm +from official.common import distribute_utils +from official.legacy.xlnet import common_flags +from official.legacy.xlnet import data_utils +from official.legacy.xlnet import optimization +from official.legacy.xlnet import squad_utils +from official.legacy.xlnet import training_utils +from official.legacy.xlnet import xlnet_config +from official.legacy.xlnet import xlnet_modeling as modeling + +flags.DEFINE_string( + "test_feature_path", default=None, help="Path to feature of test set.") +flags.DEFINE_integer("query_len", default=64, help="Max query length.") +flags.DEFINE_integer("start_n_top", default=5, help="Beam size for span start.") +flags.DEFINE_integer("end_n_top", default=5, help="Beam size for span end.") +flags.DEFINE_string( + "predict_dir", default=None, help="Path to write predictions.") +flags.DEFINE_string( + "predict_file", default=None, help="Path to json file of test set.") +flags.DEFINE_integer( + "n_best_size", default=5, help="n best size for predictions.") +flags.DEFINE_integer("max_answer_length", default=64, help="Max answer length.") +# Data preprocessing config +flags.DEFINE_string( + "spiece_model_file", default=None, help="Sentence Piece model path.") +flags.DEFINE_integer("max_seq_length", default=512, help="Max sequence length.") +flags.DEFINE_integer("max_query_length", default=64, help="Max query length.") +flags.DEFINE_integer("doc_stride", default=128, help="Doc stride.") + +FLAGS = flags.FLAGS + + +class InputFeatures(object): + """A single set of features of data.""" + + def __init__(self, + unique_id, + example_index, + doc_span_index, + tok_start_to_orig_index, + tok_end_to_orig_index, + token_is_max_context, + input_ids, + input_mask, + 
p_mask, + segment_ids, + paragraph_len, + cls_index, + start_position=None, + end_position=None, + is_impossible=None): + self.unique_id = unique_id + self.example_index = example_index + self.doc_span_index = doc_span_index + self.tok_start_to_orig_index = tok_start_to_orig_index + self.tok_end_to_orig_index = tok_end_to_orig_index + self.token_is_max_context = token_is_max_context + self.input_ids = input_ids + self.input_mask = input_mask + self.p_mask = p_mask + self.segment_ids = segment_ids + self.paragraph_len = paragraph_len + self.cls_index = cls_index + self.start_position = start_position + self.end_position = end_position + self.is_impossible = is_impossible + + +# pylint: disable=unused-argument +def run_evaluation(strategy, test_input_fn, eval_examples, eval_features, + original_data, eval_steps, input_meta_data, model, + current_step, eval_summary_writer): + """Run evaluation for SQUAD task. + + Args: + strategy: distribution strategy. + test_input_fn: input function for evaluation data. + eval_examples: tf.Examples of the evaluation set. + eval_features: Feature objects of the evaluation set. + original_data: The original json data for the evaluation set. + eval_steps: total number of evaluation steps. + input_meta_data: input meta data. + model: keras model object. + current_step: current training step. + eval_summary_writer: summary writer used to record evaluation metrics. + + Returns: + A float metric, F1 score. 
+ """ + + def _test_step_fn(inputs): + """Replicated validation step.""" + + inputs["mems"] = None + res = model(inputs, training=False) + return res, inputs["unique_ids"] + + @tf.function + def _run_evaluation(test_iterator): + """Runs validation steps.""" + res, unique_ids = strategy.run( + _test_step_fn, args=(next(test_iterator),)) + return res, unique_ids + + test_iterator = data_utils.get_input_iterator(test_input_fn, strategy) + cur_results = [] + for _ in range(eval_steps): + results, unique_ids = _run_evaluation(test_iterator) + unique_ids = strategy.experimental_local_results(unique_ids) + + for result_key in results: + results[result_key] = ( + strategy.experimental_local_results(results[result_key])) + for core_i in range(strategy.num_replicas_in_sync): + bsz = int(input_meta_data["test_batch_size"] / + strategy.num_replicas_in_sync) + for j in range(bsz): + result = {} + for result_key in results: + result[result_key] = results[result_key][core_i].numpy()[j] + result["unique_ids"] = unique_ids[core_i].numpy()[j] + # We appended a fake example into dev set to make data size can be + # divided by test_batch_size. Ignores this fake example during + # evaluation. 
+ if result["unique_ids"] == 1000012047: + continue + unique_id = int(result["unique_ids"]) + + start_top_log_probs = ([ + float(x) for x in result["start_top_log_probs"].flat + ]) + start_top_index = [int(x) for x in result["start_top_index"].flat] + end_top_log_probs = ([ + float(x) for x in result["end_top_log_probs"].flat + ]) + end_top_index = [int(x) for x in result["end_top_index"].flat] + + cls_logits = float(result["cls_logits"].flat[0]) + cur_results.append( + squad_utils.RawResult( + unique_id=unique_id, + start_top_log_probs=start_top_log_probs, + start_top_index=start_top_index, + end_top_log_probs=end_top_log_probs, + end_top_index=end_top_index, + cls_logits=cls_logits)) + if len(cur_results) % 1000 == 0: + logging.info("Processing example: %d", len(cur_results)) + + output_prediction_file = os.path.join(input_meta_data["predict_dir"], + "predictions.json") + output_nbest_file = os.path.join(input_meta_data["predict_dir"], + "nbest_predictions.json") + output_null_log_odds_file = os.path.join(input_meta_data["predict_dir"], + "null_odds.json") + + results = squad_utils.write_predictions( + eval_examples, eval_features, cur_results, input_meta_data["n_best_size"], + input_meta_data["max_answer_length"], output_prediction_file, + output_nbest_file, output_null_log_odds_file, original_data, + input_meta_data["start_n_top"], input_meta_data["end_n_top"]) + + # Log current results. 
+ log_str = "Result | " + for key, val in results.items(): + log_str += "{} {} | ".format(key, val) + logging.info(log_str) + with eval_summary_writer.as_default(): + tf.summary.scalar("best_f1", results["best_f1"], step=current_step) + tf.summary.scalar("best_exact", results["best_exact"], step=current_step) + eval_summary_writer.flush() + return results["best_f1"] + + +def get_qaxlnet_model(model_config, run_config, start_n_top, end_n_top): + model = modeling.QAXLNetModel( + model_config, + run_config, + start_n_top=start_n_top, + end_n_top=end_n_top, + name="model") + return model + + +def main(unused_argv): + del unused_argv + strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=FLAGS.strategy_type, + tpu_address=FLAGS.tpu) + if strategy: + logging.info("***** Number of cores used : %d", + strategy.num_replicas_in_sync) + train_input_fn = functools.partial(data_utils.get_squad_input_data, + FLAGS.train_batch_size, FLAGS.seq_len, + FLAGS.query_len, strategy, True, + FLAGS.train_tfrecord_path) + + test_input_fn = functools.partial(data_utils.get_squad_input_data, + FLAGS.test_batch_size, FLAGS.seq_len, + FLAGS.query_len, strategy, False, + FLAGS.test_tfrecord_path) + + total_training_steps = FLAGS.train_steps + steps_per_loop = FLAGS.iterations + eval_steps = int(FLAGS.test_data_size / FLAGS.test_batch_size) + + optimizer, learning_rate_fn = optimization.create_optimizer( + FLAGS.learning_rate, + total_training_steps, + FLAGS.warmup_steps, + adam_epsilon=FLAGS.adam_epsilon) + model_config = xlnet_config.XLNetConfig(FLAGS) + run_config = xlnet_config.create_run_config(True, False, FLAGS) + input_meta_data = {} + input_meta_data["start_n_top"] = FLAGS.start_n_top + input_meta_data["end_n_top"] = FLAGS.end_n_top + input_meta_data["lr_layer_decay_rate"] = FLAGS.lr_layer_decay_rate + input_meta_data["predict_dir"] = FLAGS.predict_dir + input_meta_data["n_best_size"] = FLAGS.n_best_size + input_meta_data["max_answer_length"] = 
FLAGS.max_answer_length + input_meta_data["test_batch_size"] = FLAGS.test_batch_size + input_meta_data["batch_size_per_core"] = int(FLAGS.train_batch_size / + strategy.num_replicas_in_sync) + input_meta_data["mem_len"] = FLAGS.mem_len + model_fn = functools.partial(get_qaxlnet_model, model_config, run_config, + FLAGS.start_n_top, FLAGS.end_n_top) + eval_examples = squad_utils.read_squad_examples( + FLAGS.predict_file, is_training=False) + if FLAGS.test_feature_path: + logging.info("start reading pickle file...") + with tf.io.gfile.GFile(FLAGS.test_feature_path, "rb") as f: + eval_features = pickle.load(f) + logging.info("finishing reading pickle file...") + else: + sp_model = spm.SentencePieceProcessor() + sp_model.LoadFromSerializedProto( + tf.io.gfile.GFile(FLAGS.spiece_model_file, "rb").read()) + spm_basename = os.path.basename(FLAGS.spiece_model_file) + eval_features = squad_utils.create_eval_data( + spm_basename, sp_model, eval_examples, FLAGS.max_seq_length, + FLAGS.max_query_length, FLAGS.doc_stride, FLAGS.uncased) + + with tf.io.gfile.GFile(FLAGS.predict_file) as f: + original_data = json.load(f)["data"] + eval_fn = functools.partial(run_evaluation, strategy, test_input_fn, + eval_examples, eval_features, original_data, + eval_steps, input_meta_data) + + training_utils.train( + strategy=strategy, + model_fn=model_fn, + input_meta_data=input_meta_data, + eval_fn=eval_fn, + metric_fn=None, + train_input_fn=train_input_fn, + init_checkpoint=FLAGS.init_checkpoint, + init_from_transformerxl=FLAGS.init_from_transformerxl, + total_training_steps=total_training_steps, + steps_per_loop=steps_per_loop, + optimizer=optimizer, + learning_rate_fn=learning_rate_fn, + model_dir=FLAGS.model_dir, + save_steps=FLAGS.save_steps) + + +if __name__ == "__main__": + app.run(main) diff --git a/official/nlp/xlnet/squad_utils.py b/official/legacy/xlnet/squad_utils.py similarity index 99% rename from official/nlp/xlnet/squad_utils.py rename to official/legacy/xlnet/squad_utils.py 
index 44a7b7deed0935ccc7991d8390a7922a48e02206..641e8818f48bcef581bcc4c508c042170b5c8423 100644 --- a/official/nlp/xlnet/squad_utils.py +++ b/official/legacy/xlnet/squad_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -32,8 +32,8 @@ import numpy as np import six import tensorflow as tf -from official.nlp.xlnet import data_utils -from official.nlp.xlnet import preprocess_utils +from official.legacy.xlnet import data_utils +from official.legacy.xlnet import preprocess_utils SPIECE_UNDERLINE = u"▁" diff --git a/official/nlp/xlnet/training_utils.py b/official/legacy/xlnet/training_utils.py similarity index 98% rename from official/nlp/xlnet/training_utils.py rename to official/legacy/xlnet/training_utils.py index 45afaa76d621046d37cb39d5c4acdd509f98c3da..5fd924e8bbf8bba1b3c17f80f9c145673a793bbb 100644 --- a/official/nlp/xlnet/training_utils.py +++ b/official/legacy/xlnet/training_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
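In `run_squad.py`, each replica returns a dict of per-example outputs. The evaluation loop walks the replicas and batch rows in order and skips the single fake example (unique ID `1000012047`). That example exists only to pad the dev set to a multiple of `test_batch_size`. The sketch below shows the unbatching, with NumPy arrays standing in for the per-replica tensors. The function name and input layout are assumptions for illustration.

```python
import numpy as np

# Unique ID of the padding example hard-coded in run_squad.py.
FAKE_UNIQUE_ID = 1000012047


def unbatch_replica_results(results, unique_ids, per_replica_batch):
  """Flattens per-replica outputs into one dict per real example.

  Hypothetical helper. `results` maps an output name to a sequence with one
  array per replica; `unique_ids` is a sequence with one ID array per replica.
  """
  rows = []
  for core_i in range(len(unique_ids)):
    for j in range(per_replica_batch):
      uid = int(unique_ids[core_i][j])
      if uid == FAKE_UNIQUE_ID:
        continue  # Skip the example that only pads the dev set.
      row = {key: values[core_i][j] for key, values in results.items()}
      row["unique_ids"] = uid
      rows.append(row)
  return rows


results = {"cls_logits": (np.array([0.1, 0.2]), np.array([0.3, 0.4]))}
unique_ids = (np.array([1000000000, 1000000001]),
              np.array([1000000002, FAKE_UNIQUE_ID]))
rows = unbatch_replica_results(results, unique_ids, per_replica_batch=2)
print([r["unique_ids"] for r in rows])  # [1000000000, 1000000001, 1000000002]
```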
@@ -21,8 +21,8 @@ from typing import Any, Callable, Dict, Optional, Text from absl import logging import tensorflow as tf -from official.nlp.bert import model_training_utils -from official.nlp.xlnet import data_utils +from official.legacy.bert import model_training_utils +from official.legacy.xlnet import data_utils # pytype: disable=attribute-error # pylint: disable=g-bare-generic,unused-import diff --git a/official/nlp/xlnet/xlnet_config.py b/official/legacy/xlnet/xlnet_config.py similarity index 98% rename from official/nlp/xlnet/xlnet_config.py rename to official/legacy/xlnet/xlnet_config.py index c0f51955b57289884fc522cc02c3d3db6404bf76..d8ee7e6a07fac9000535969e3bb4a0f5493ed31e 100644 --- a/official/nlp/xlnet/xlnet_config.py +++ b/official/legacy/xlnet/xlnet_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/xlnet/xlnet_modeling.py b/official/legacy/xlnet/xlnet_modeling.py similarity index 99% rename from official/nlp/xlnet/xlnet_modeling.py rename to official/legacy/xlnet/xlnet_modeling.py index b48aff4e795444c176cc862dcb98b01e76c39c7d..f03354f62ab4cb266fa8d4fcb93a96b46711579a 100644 --- a/official/nlp/xlnet/xlnet_modeling.py +++ b/official/legacy/xlnet/xlnet_modeling.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -18,9 +18,8 @@ import copy import warnings import tensorflow as tf - +from official.legacy.xlnet import data_utils from official.nlp.modeling import networks -from official.nlp.xlnet import data_utils def gelu(x): diff --git a/official/modeling/__init__.py b/official/modeling/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/modeling/__init__.py +++ b/official/modeling/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/activations/__init__.py b/official/modeling/activations/__init__.py index 086e1fb975f8517dcff3c020f5fd932f6e55edc7..24c0d2606c19d752cde41fc05076920ee88c1b6d 100644 --- a/official/modeling/activations/__init__.py +++ b/official/modeling/activations/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -14,6 +14,7 @@ """Activations package definition.""" from official.modeling.activations.gelu import gelu +from official.modeling.activations.mish import mish from official.modeling.activations.relu import relu6 from official.modeling.activations.sigmoid import hard_sigmoid from official.modeling.activations.swish import hard_swish diff --git a/official/modeling/activations/gelu.py b/official/modeling/activations/gelu.py index a73294aa5493747af66d9bbbc2cc26914600d7cf..1ca79ebb662c3924d82b712de31e92c985334a40 100644 --- a/official/modeling/activations/gelu.py +++ b/official/modeling/activations/gelu.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/activations/gelu_test.py b/official/modeling/activations/gelu_test.py index cfe1950d9f112c3c33421c410ecdd4ceedd6f1d7..727a714e38bbab2e2549a49441f3ce0282eeaf21 100644 --- a/official/modeling/activations/gelu_test.py +++ b/official/modeling/activations/gelu_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/activations/mish.py b/official/modeling/activations/mish.py new file mode 100644 index 0000000000000000000000000000000000000000..063a4a737bf811061b021e845fa10df3b74ccba5 --- /dev/null +++ b/official/modeling/activations/mish.py @@ -0,0 +1,38 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Self Regularized Non-Monotonic Activation Function.""" + +import tensorflow as tf + +from tensorflow_addons.utils import types + + +@tf.keras.utils.register_keras_serializable(package='Text') +def mish(x: types.TensorLike) -> tf.Tensor: + """Mish activation function. 
+ + Mish: A Self Regularized Non-Monotonic Activation Function + https://arxiv.org/pdf/1908.08681.pdf + + Mish(x) = x * tanh(ln(1+e^x)) + + Args: + x: A `Tensor` representing preactivation values. + + Returns: + The activation value. + """ + x = tf.convert_to_tensor(x) + return x * tf.tanh(tf.nn.softplus(x)) diff --git a/official/modeling/activations/mish_test.py b/official/modeling/activations/mish_test.py new file mode 100644 index 0000000000000000000000000000000000000000..15eff91d160cc0ef3677e440ebd005c95043f499 --- /dev/null +++ b/official/modeling/activations/mish_test.py @@ -0,0 +1,32 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for the customized Mish activation.""" + +import tensorflow as tf + +from tensorflow.python.keras import keras_parameterized # pylint: disable=g-direct-tensorflow-import +from official.modeling import activations + + +@keras_parameterized.run_all_keras_modes +class MishTest(keras_parameterized.TestCase): + + def test_mish(self): + x = tf.constant([1.0, 0.0]) + self.assertAllClose([0.86509839, 0.0], activations.mish(x)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/modeling/activations/relu.py b/official/modeling/activations/relu.py index b3941b2f3462fa6a3eea28e023a4450bcc070797..410be29d266d6b5b06f221ebe9588abf70e952a2 100644 --- a/official/modeling/activations/relu.py +++ b/official/modeling/activations/relu.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/activations/relu_test.py b/official/modeling/activations/relu_test.py index 215f189ea9a00ed93bf012d33429fd82b3dc7ca6..45a8339e2a2c8cf1e95608d322eef92a127359d8 100644 --- a/official/modeling/activations/relu_test.py +++ b/official/modeling/activations/relu_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/activations/sigmoid.py b/official/modeling/activations/sigmoid.py index 277463040e784325f2b47a5492c98d6e3283ad08..a3fc77fa5eaad267a9ce70e1fd0e0ddb5d753d4d 100644 --- a/official/modeling/activations/sigmoid.py +++ b/official/modeling/activations/sigmoid.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/activations/sigmoid_test.py b/official/modeling/activations/sigmoid_test.py index 6aad90ef3645b08708dbfde155654070c40d72ce..e5a1a61f97faead4fd53387cab05b3cecd79fe76 100644 --- a/official/modeling/activations/sigmoid_test.py +++ b/official/modeling/activations/sigmoid_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/activations/swish.py b/official/modeling/activations/swish.py index ea79985e3006f1400350601d9b857e947287ace1..3d9372370ceb91c8fb7d762e6fb97834304d8b69 100644 --- a/official/modeling/activations/swish.py +++ b/official/modeling/activations/swish.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/activations/swish_test.py b/official/modeling/activations/swish_test.py index 3cb9495d8d19a3b89e4a9b2db0679090ac1e3e9d..1eb5fa2a94f8466425906aeff600c851e19e7b08 100644 --- a/official/modeling/activations/swish_test.py +++ b/official/modeling/activations/swish_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
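The new `mish` activation computes `x * tanh(softplus(x))`. Below is a scalar sketch using Python's standard library. It reproduces the reference value `0.86509839` for `x = 1.0` used in `mish_test.py`. The stable softplus rewrite, `max(x, 0) + log1p(exp(-|x|))`, is our own choice to avoid overflow for large inputs. The TensorFlow version relies on `tf.nn.softplus` instead.

```python
import math


def softplus(x: float) -> float:
  # Numerically stable ln(1 + e^x): never evaluates exp() of a positive number.
  return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
  # Mish(x) = x * tanh(softplus(x)) = x * tanh(ln(1 + e^x)).
  return x * math.tanh(softplus(x))


print(round(mish(1.0), 8))  # 0.86509839
print(mish(0.0))            # 0.0
```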
diff --git a/official/modeling/fast_training/experimental/tf2_utils_2x_wide.py b/official/modeling/fast_training/experimental/tf2_utils_2x_wide.py index 16940cffa153104ca9839d80bd0021acc8bdf2fe..af0760277ae8ce08bfbee9d253b5beea392a5eb9 100644 --- a/official/modeling/fast_training/experimental/tf2_utils_2x_wide.py +++ b/official/modeling/fast_training/experimental/tf2_utils_2x_wide.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/fast_training/experimental/tf2_utils_2x_wide_test.py b/official/modeling/fast_training/experimental/tf2_utils_2x_wide_test.py index 25d6e7628d16dfcc83cc6608da73d5ef31834751..2b95110b606f685669e775fcc82447ced775a253 100644 --- a/official/modeling/fast_training/experimental/tf2_utils_2x_wide_test.py +++ b/official/modeling/fast_training/experimental/tf2_utils_2x_wide_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/fast_training/progressive/policies.py b/official/modeling/fast_training/progressive/policies.py index b4f7c3f018bbf45896ca3eb5b3a327dcd9b4dfb7..52c3e73b486b20e88c32e96774303aa5f16e7f01 100644 --- a/official/modeling/fast_training/progressive/policies.py +++ b/official/modeling/fast_training/progressive/policies.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/modeling/fast_training/progressive/train.py b/official/modeling/fast_training/progressive/train.py index f547ac9a56b0843abce6f2cdeccd6c2cd9d55217..612a485c6b48fe0bcdbe9bc0fa1fd36f282edd0d 100644 --- a/official/modeling/fast_training/progressive/train.py +++ b/official/modeling/fast_training/progressive/train.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/fast_training/progressive/train_lib.py b/official/modeling/fast_training/progressive/train_lib.py index baa132e197bb621b276c7a6471d07fb402a804c0..1fdb1d1c23c03152741ba8572df2dc40a43cd072 100644 --- a/official/modeling/fast_training/progressive/train_lib.py +++ b/official/modeling/fast_training/progressive/train_lib.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/fast_training/progressive/train_lib_test.py b/official/modeling/fast_training/progressive/train_lib_test.py index f91faf902ebad5a7af92907fd434f585a580bf3c..fdc35b2e823b13642dd36b71def9ec7c9dfbea84 100644 --- a/official/modeling/fast_training/progressive/train_lib_test.py +++ b/official/modeling/fast_training/progressive/train_lib_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/modeling/fast_training/progressive/trainer.py b/official/modeling/fast_training/progressive/trainer.py index 685ec395045469c1120be0d02f6575e1b65fc070..af24af52787a98e34a6a4179a8a7177f62539bd8 100644 --- a/official/modeling/fast_training/progressive/trainer.py +++ b/official/modeling/fast_training/progressive/trainer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/fast_training/progressive/trainer_test.py b/official/modeling/fast_training/progressive/trainer_test.py index a0c5d82a55dc94fc5c6f16dfe94f047b56ebf05f..d38e1757caa6a7400658515c1c99560b38366206 100644 --- a/official/modeling/fast_training/progressive/trainer_test.py +++ b/official/modeling/fast_training/progressive/trainer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -226,9 +226,13 @@ class TrainerWithMaskedLMTaskTest(tf.test.TestCase, parameterized.TestCase): task = TestPolicy(None, config.task) trainer = trainer_lib.ProgressiveTrainer(config, task, self.get_temp_dir()) if mixed_precision_dtype != 'float16': - self.assertIsInstance(trainer.optimizer, tf.keras.optimizers.SGD) + self.assertIsInstance( + trainer.optimizer, + (tf.keras.optimizers.SGD, tf.keras.optimizers.legacy.SGD)) elif mixed_precision_dtype == 'float16' and loss_scale is None: - self.assertIsInstance(trainer.optimizer, tf.keras.optimizers.SGD) + self.assertIsInstance( + trainer.optimizer, + (tf.keras.optimizers.SGD, tf.keras.optimizers.legacy.SGD)) metrics = trainer.train(tf.convert_to_tensor(5, dtype=tf.int32)) self.assertIn('training_loss', metrics) diff --git a/official/modeling/fast_training/progressive/utils.py b/official/modeling/fast_training/progressive/utils.py index 192170cb87825de6972ab4a85a6b556ee40600c4..2bfd1d6be264390f52596b5cbbd82328e89ba964 100644 --- a/official/modeling/fast_training/progressive/utils.py +++ b/official/modeling/fast_training/progressive/utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -18,10 +18,10 @@ from absl import logging import tensorflow as tf # pylint: disable=g-direct-tensorflow-import -from tensorflow.python.training.tracking import tracking +from tensorflow.python.trackable import autotrackable -class VolatileTrackable(tracking.AutoTrackable): +class VolatileTrackable(autotrackable.AutoTrackable): """A util class to keep Trackables that might change instances.""" def __init__(self, **kwargs): diff --git a/official/modeling/grad_utils.py b/official/modeling/grad_utils.py index 1113d39d5e6f19c9c8fba9e8d8b5c3f99e4e6fba..22479e6ff3bd40dd1fb900fda9145318129aaaa2 100644 --- a/official/modeling/grad_utils.py +++ b/official/modeling/grad_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/grad_utils_test.py b/official/modeling/grad_utils_test.py index cc9c1912be268c9952c979564eacce6d0c0ed4a8..ded7794ab58c6f0a3e8f18222a7886ffd23a0e83 100644 --- a/official/modeling/grad_utils_test.py +++ b/official/modeling/grad_utils_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/hyperparams/__init__.py b/official/modeling/hyperparams/__init__.py index bcbc0aedd3d6013c14c641d9e61a0a717f188ec5..5503ad8e478ce624bb94219b4bc58c35387b30a9 100644 --- a/official/modeling/hyperparams/__init__.py +++ b/official/modeling/hyperparams/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/hyperparams/base_config.py b/official/modeling/hyperparams/base_config.py index f0afca0909eb0a63d54ae83b4d5fc44515a30c1e..f68b16b3645bde31152ff19f05cc2955bf374f1f 100644 --- a/official/modeling/hyperparams/base_config.py +++ b/official/modeling/hyperparams/base_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/hyperparams/base_config_test.py b/official/modeling/hyperparams/base_config_test.py index 21d0aaa1c3ee56884505e4ab5f72bc0212ceb74d..b27352af895300cf2aecaabc4a14baa7a2ef4790 100644 --- a/official/modeling/hyperparams/base_config_test.py +++ b/official/modeling/hyperparams/base_config_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/hyperparams/oneof.py b/official/modeling/hyperparams/oneof.py index 61591496eb41de44e6de9eb248c4460498a9a078..298b94fdab532c79037781f19f5a0f579aa05aa7 100644 --- a/official/modeling/hyperparams/oneof.py +++ b/official/modeling/hyperparams/oneof.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/modeling/hyperparams/oneof_test.py b/official/modeling/hyperparams/oneof_test.py index 2cde73c1545dd04894d0353b22e6254922717829..2ac29869a7aab025d75bae915d793eec0a716294 100644 --- a/official/modeling/hyperparams/oneof_test.py +++ b/official/modeling/hyperparams/oneof_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/hyperparams/params_dict.py b/official/modeling/hyperparams/params_dict.py index 76b0446f0ef407488464b1590a1f63765c8bde54..8da29ad7ddecabe6d6dd0f90da0f7fd4283314c9 100644 --- a/official/modeling/hyperparams/params_dict.py +++ b/official/modeling/hyperparams/params_dict.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -41,15 +41,15 @@ _PARAM_RE = re.compile( _CONST_VALUE_RE = re.compile(r'(\d.*|-\d.*|None)') -# Yaml loader with an implicit resolver to parse float decimal and exponential +# Yaml LOADER with an implicit resolver to parse float decimal and exponential # format. The regular experission parse the following cases: # 1- Decimal number with an optional exponential term. # 2- Integer number with an exponential term. # 3- Decimal number with an optional exponential term. # 4- Decimal number. -LOADER = yaml.SafeLoader -LOADER.add_implicit_resolver( +_LOADER = yaml.SafeLoader +_LOADER.add_implicit_resolver( 'tag:yaml.org,2002:float', re.compile(r''' ^(?:[-+]?(?:[0-9][0-9_]*)\\.[0-9_]*(?:[eE][-+]?[0-9]+)? 
@@ -288,42 +288,42 @@ class ParamsDict(object): _, left_v, _, right_v = _get_kvs(tokens, params_dict) if left_v != right_v: raise KeyError( - 'Found inconsistncy between key `{}` and key `{}`.'.format( + 'Found inconsistency between key `{}` and key `{}`.'.format( tokens[0], tokens[1])) elif '!=' in restriction: tokens = restriction.split('!=') _, left_v, _, right_v = _get_kvs(tokens, params_dict) if left_v == right_v: raise KeyError( - 'Found inconsistncy between key `{}` and key `{}`.'.format( + 'Found inconsistency between key `{}` and key `{}`.'.format( tokens[0], tokens[1])) elif '<' in restriction: tokens = restriction.split('<') _, left_v, _, right_v = _get_kvs(tokens, params_dict) if left_v >= right_v: raise KeyError( - 'Found inconsistncy between key `{}` and key `{}`.'.format( + 'Found inconsistency between key `{}` and key `{}`.'.format( tokens[0], tokens[1])) elif '<=' in restriction: tokens = restriction.split('<=') _, left_v, _, right_v = _get_kvs(tokens, params_dict) if left_v > right_v: raise KeyError( - 'Found inconsistncy between key `{}` and key `{}`.'.format( + 'Found inconsistency between key `{}` and key `{}`.'.format( tokens[0], tokens[1])) elif '>' in restriction: tokens = restriction.split('>') _, left_v, _, right_v = _get_kvs(tokens, params_dict) if left_v <= right_v: raise KeyError( - 'Found inconsistncy between key `{}` and key `{}`.'.format( + 'Found inconsistency between key `{}` and key `{}`.'.format( tokens[0], tokens[1])) elif '>=' in restriction: tokens = restriction.split('>=') _, left_v, _, right_v = _get_kvs(tokens, params_dict) if left_v < right_v: raise KeyError( - 'Found inconsistncy between key `{}` and key `{}`.'.format( + 'Found inconsistency between key `{}` and key `{}`.'.format( tokens[0], tokens[1])) else: raise ValueError('Unsupported relation in restriction.') @@ -332,7 +332,7 @@ class ParamsDict(object): def read_yaml_to_params_dict(file_path: str): """Reads a YAML file to a ParamsDict.""" with 
tf.io.gfile.GFile(file_path, 'r') as f: - params_dict = yaml.load(f, Loader=LOADER) + params_dict = yaml.load(f, Loader=_LOADER) return ParamsDict(params_dict) @@ -453,7 +453,7 @@ def override_params_dict(params, dict_or_string_or_yaml_file, is_strict): nested_csv_str_to_json_str(dict_or_string_or_yaml_file)) except ValueError: pass - params_dict = yaml.load(dict_or_string_or_yaml_file, Loader=LOADER) + params_dict = yaml.load(dict_or_string_or_yaml_file, Loader=_LOADER) if isinstance(params_dict, dict): params.override(params_dict, is_strict) else: diff --git a/official/modeling/hyperparams/params_dict_test.py b/official/modeling/hyperparams/params_dict_test.py index 248a81652a496266fb9656d40f77e665e8606f10..145590a4c2ce03b0c578b50c0188d9a7b549ceda 100644 --- a/official/modeling/hyperparams/params_dict_test.py +++ b/official/modeling/hyperparams/params_dict_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/multitask/__init__.py b/official/modeling/multitask/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/modeling/multitask/__init__.py +++ b/official/modeling/multitask/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
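The restriction-checking hunk in `ParamsDict` tests `'<' in restriction` before `'<=' in restriction`, and `'>'` before `'>='`. Because `'<='` contains `'<'`, a restriction such as `'a.x <= b.y'` enters the `'<'` branch and gets split on `'<'`. Judging from this excerpt, the `<=` and `>=` branches can therefore never be reached. The sketch below shows the usual remedy: test two-character operators before their one-character prefixes.

This is a simplified, hypothetical helper (`check_restriction`, `_lookup`) that only compares two dotted keys in a nested dict. It does not reproduce `_get_kvs`, which also handles constant values.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Two-character operators come first so that '<=' is never mistaken
# for '<' (and '>=' for '>').
_OPS: List[Tuple[str, Callable[[Any, Any], bool]]] = [
    ("==", operator.eq),
    ("!=", operator.ne),
    ("<=", operator.le),
    (">=", operator.ge),
    ("<", operator.lt),
    (">", operator.gt),
]


def _lookup(params: Dict[str, Any], dotted_key: str) -> Any:
    # Resolve a dotted key such as 'model.num_layers' in a nested dict.
    value: Any = params
    for part in dotted_key.strip().split("."):
        value = value[part]
    return value


def check_restriction(params: Dict[str, Any], restriction: str) -> None:
    """Raises KeyError if `restriction` (e.g. 'a.x <= b.y') does not hold."""
    for symbol, holds in _OPS:
        if symbol in restriction:
            left_key, right_key = restriction.split(symbol)
            left_v = _lookup(params, left_key)
            right_v = _lookup(params, right_key)
            if not holds(left_v, right_v):
                raise KeyError(
                    "Found inconsistency between key `{}` and key `{}`.".format(
                        left_key.strip(), right_key.strip()))
            return
    raise ValueError("Unsupported relation in restriction.")
```

With this ordering, `check_restriction({"a": {"x": 2}, "b": {"y": 2}}, "a.x <= b.y")` passes. If the `'<'` branch ran first, the split would produce the key `'= b.y'` instead.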
diff --git a/official/modeling/multitask/base_model.py b/official/modeling/multitask/base_model.py index 835d7e3443dd8991c32eb12c570479640b58487a..6db013400f5e93a817a7828c635f33956a660d60 100644 --- a/official/modeling/multitask/base_model.py +++ b/official/modeling/multitask/base_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -43,3 +43,12 @@ class MultiTaskBaseModel(tf.Module): def initialize(self): """Optional function that loads a pre-train checkpoint.""" return + + def build(self): + """Builds the networks for tasks to make sure variables are created.""" + # Try to build all sub tasks. + for task_model in self._sub_tasks.values(): + # Assumes all the tf.Module models are built because we don't have any + # way to check them. + if isinstance(task_model, tf.keras.Model) and not task_model.built: + _ = task_model(task_model.inputs) diff --git a/official/modeling/multitask/base_trainer.py b/official/modeling/multitask/base_trainer.py index 45cdb6cdde32866c31f23804df8b1efac521eee8..e3bf18718ed7adaa476af4f8aea9b30b3b820603 100644 --- a/official/modeling/multitask/base_trainer.py +++ b/official/modeling/multitask/base_trainer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/modeling/multitask/base_trainer_test.py b/official/modeling/multitask/base_trainer_test.py index 2427ff85f2af4c79fb3f7f3cc40c9fc82c0a7e61..2eb5acd252f5054f438aa2def1c6ea16e771c8db 100644 --- a/official/modeling/multitask/base_trainer_test.py +++ b/official/modeling/multitask/base_trainer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/multitask/configs.py b/official/modeling/multitask/configs.py index 453db3475072086606f0a979758524c5f789d454..a77d2c0956030a8899a8474627416e319ddfbd39 100644 --- a/official/modeling/multitask/configs.py +++ b/official/modeling/multitask/configs.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,6 +19,7 @@ import dataclasses from official.core import config_definitions as cfg from official.modeling import hyperparams +from official.modeling.privacy import configs as dp_configs @dataclasses.dataclass @@ -35,6 +36,8 @@ class MultiTaskConfig(hyperparams.Config): init_checkpoint: str = "" model: hyperparams.Config = None task_routines: Tuple[TaskRoutine, ...] = () + differential_privacy_config: Optional[ + dp_configs.DifferentialPrivacyConfig] = None @dataclasses.dataclass diff --git a/official/modeling/multitask/evaluator.py b/official/modeling/multitask/evaluator.py index c896e2c8811c828c2eb0199ae5ade8103ce65184..9433a318afb51b459079fdad9af3edcf1c1a4613 100644 --- a/official/modeling/multitask/evaluator.py +++ b/official/modeling/multitask/evaluator.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. 
All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/multitask/evaluator_test.py b/official/modeling/multitask/evaluator_test.py index 4725e63e5f996fa3432557906dd8548f08e99c53..660adcfc34fe957fdc5531334156e0391cdb4665 100644 --- a/official/modeling/multitask/evaluator_test.py +++ b/official/modeling/multitask/evaluator_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/multitask/interleaving_trainer.py b/official/modeling/multitask/interleaving_trainer.py index 1bc943dfb99696fbdcc3ec3517a9bbf7aea51e34..180e00ceeed14499bd290951908d8e7e8e179bf7 100644 --- a/official/modeling/multitask/interleaving_trainer.py +++ b/official/modeling/multitask/interleaving_trainer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -31,7 +31,9 @@ class MultiTaskInterleavingTrainer(base_trainer.MultiTaskBaseTrainer): multi_task: multitask.MultiTask, multi_task_model: Union[tf.keras.Model, base_model.MultiTaskBaseModel], - optimizer: tf.optimizers.Optimizer, + optimizer: Union[tf.optimizers.Optimizer, + tf.keras.optimizers.experimental.Optimizer, + tf.keras.optimizers.legacy.Optimizer], task_sampler: sampler.TaskSampler, trainer_options=None): super().__init__( @@ -69,6 +71,13 @@ class MultiTaskInterleavingTrainer(base_trainer.MultiTaskBaseTrainer): name: orbit.utils.create_global_step() for name in self.multi_task.tasks } + # If the new Keras optimizer is used, we require all model variables to be + # created before training so that the optimizer can create all slot + # variables at once. + if isinstance(optimizer, tf.keras.optimizers.experimental.Optimizer): + multi_task_model.build() + optimizer.build(multi_task_model.trainable_variables) + def task_step_counter(self, name): return self._task_step_counters[name] diff --git a/official/modeling/multitask/interleaving_trainer_test.py b/official/modeling/multitask/interleaving_trainer_test.py index a2b1da1b60d983817b029128737ce11275dfb549..6f871713ca7074444553e5c751dfc0d11cb35923 100644 --- a/official/modeling/multitask/interleaving_trainer_test.py +++ b/official/modeling/multitask/interleaving_trainer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/multitask/multitask.py b/official/modeling/multitask/multitask.py index 85a345382a33871bd587767166f73faef8454595..4a1b5d07bf6c0d2356b8046f15eac4953e561f64 100644 --- a/official/modeling/multitask/multitask.py +++ b/official/modeling/multitask/multitask.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors.
All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -23,9 +23,11 @@ from official.core import task_factory from official.modeling import optimization from official.modeling.multitask import base_model from official.modeling.multitask import configs +from official.modeling.privacy import configs as dp_configs OptimizationConfig = optimization.OptimizationConfig RuntimeConfig = config_definitions.RuntimeConfig +DifferentialPrivacyConfig = dp_configs.DifferentialPrivacyConfig class MultiTask(tf.Module, metaclass=abc.ABCMeta): @@ -93,9 +95,11 @@ class MultiTask(tf.Module, metaclass=abc.ABCMeta): @classmethod def create_optimizer(cls, optimizer_config: OptimizationConfig, - runtime_config: Optional[RuntimeConfig] = None): + runtime_config: Optional[RuntimeConfig] = None, + dp_config: Optional[DifferentialPrivacyConfig] = None): return base_task.Task.create_optimizer( - optimizer_config=optimizer_config, runtime_config=runtime_config) + optimizer_config=optimizer_config, runtime_config=runtime_config, + dp_config=dp_config) def joint_train_step(self, task_inputs, multi_task_model: base_model.MultiTaskBaseModel, @@ -134,10 +138,10 @@ class MultiTask(tf.Module, metaclass=abc.ABCMeta): self.tasks[name].process_metrics(task_metrics[name], labels, outputs, **kwargs) - # Scales loss as the default gradients allreduce performs sum inside - # the optimizer. - scaled_loss = total_loss / tf.distribute.get_strategy( - ).num_replicas_in_sync + # Scales loss as the default gradients allreduce performs sum inside + # the optimizer. 
+ scaled_loss = total_loss / tf.distribute.get_strategy( + ).num_replicas_in_sync tvars = multi_task_model.trainable_variables grads = tape.gradient(scaled_loss, tvars) optimizer.apply_gradients(list(zip(grads, tvars))) diff --git a/official/modeling/multitask/task_sampler.py b/official/modeling/multitask/task_sampler.py index 1c365a9df09866636f3a6bfa4ef78be8dd8ff624..5e062bd45b5c2ca8f1df515f75981736e17acc83 100644 --- a/official/modeling/multitask/task_sampler.py +++ b/official/modeling/multitask/task_sampler.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/multitask/task_sampler_test.py b/official/modeling/multitask/task_sampler_test.py index 5b4695049952dab250f9fdac3d6bfd134e2c644d..8b3d95ff462ccbea07fd618165a97c2ae52e034d 100644 --- a/official/modeling/multitask/task_sampler_test.py +++ b/official/modeling/multitask/task_sampler_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/multitask/test_utils.py b/official/modeling/multitask/test_utils.py index aa831223817b4968615f5aa87c1e3fbc39021218..166608f43aee8a9fa529bdd05aace2302b55e8e5 100644 --- a/official/modeling/multitask/test_utils.py +++ b/official/modeling/multitask/test_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
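`MultiTask.joint_train_step` divides `total_loss` by `num_replicas_in_sync` because the cross-replica gradient all-reduce inside the optimizer sums the gradients rather than averaging them. The plain-Python sketch below checks this arithmetic on a toy one-weight regression model. It assumes equal-sized per-replica batches and a per-replica mean loss. `allreduce_sum` is a hypothetical stand-in for the sum that `tf.distribute` performs, not a real API.

```python
from typing import List


def grad_mse(w: float, xs: List[float], ys: List[float]) -> float:
    # d/dw of mean((w * x - y) ** 2) over one replica's shard.
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def allreduce_sum(values: List[float]) -> float:
    # Stand-in for the cross-replica SUM performed by the optimizer.
    return sum(values)


w = 0.5
shards_x = [[1.0, 2.0], [3.0, 4.0]]  # one list per replica
shards_y = [[1.0, 3.0], [2.0, 5.0]]
num_replicas = len(shards_x)

# Each replica scales its loss, and therefore its gradient, by 1 / num_replicas.
scaled = [grad_mse(w, xs, ys) / num_replicas
          for xs, ys in zip(shards_x, shards_y)]
summed = allreduce_sum(scaled)

# Without scaling, the summed gradient is num_replicas times too large.
unscaled = allreduce_sum(
    [grad_mse(w, xs, ys) for xs, ys in zip(shards_x, shards_y)])

# Reference: gradient of the mean loss over the whole global batch.
all_x = [x for shard in shards_x for x in shard]
all_y = [y for shard in shards_y for y in shard]
reference = grad_mse(w, all_x, all_y)

print(summed, reference)  # -9.0 -9.0
```

When per-replica batch sizes are unequal, this simple division no longer yields the exact global mean.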
@@ -28,6 +28,8 @@ class MockFooModel(tf.keras.Model): super().__init__(*args, **kwargs) self._share_layer = shared_layer self._foo_specific_layer = tf.keras.layers.Dense(1) + self.inputs = {"foo": tf.keras.Input(shape=(2,), dtype=tf.float32), + "bar": tf.keras.Input(shape=(2,), dtype=tf.float32)} def call(self, inputs): self.add_loss(tf.zeros((1,), dtype=tf.float32)) @@ -39,11 +41,13 @@ class MockFooModel(tf.keras.Model): class MockBarModel(tf.keras.Model): + """A mock model that can only consume 'bar' inputs.""" def __init__(self, shared_layer, *args, **kwargs): super().__init__(*args, **kwargs) self._share_layer = shared_layer self._bar_specific_layer = tf.keras.layers.Dense(1) + self.inputs = {"bar": tf.keras.Input(shape=(2,), dtype=tf.float32)} def call(self, inputs): self.add_loss(tf.zeros((2,), dtype=tf.float32)) diff --git a/official/modeling/multitask/train_lib.py b/official/modeling/multitask/train_lib.py index 62b022030937660720aed2d4417355f86a6fd7c8..920acbfcfa01664c89cdeb2aa66fb93c8ba3ed1b 100644 --- a/official/modeling/multitask/train_lib.py +++ b/official/modeling/multitask/train_lib.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
@@ -15,7 +15,7 @@ """Multitask training driver library.""" # pytype: disable=attribute-error import os -from typing import Any, List, Optional, Tuple +from typing import Any, List, Mapping, Optional, Tuple, Union from absl import logging import orbit import tensorflow as tf @@ -44,8 +44,12 @@ def run_experiment( mode: str, params: configs.MultiTaskExperimentConfig, model_dir: str, - trainer: base_trainer.MultiTaskBaseTrainer = None -) -> base_model.MultiTaskBaseModel: + run_post_eval: bool = False, + trainer: base_trainer.MultiTaskBaseTrainer = None, + best_ckpt_exporter_creator: Optional[Any] = train_utils + .maybe_create_best_ckpt_exporter +) -> Union[base_model.MultiTaskBaseModel, Tuple[base_model.MultiTaskBaseModel, + Mapping[Any, Any]]]: """Runs train/eval configured by the experiment params. Args: @@ -56,8 +60,11 @@ def run_experiment( or 'continuous_eval'. params: ExperimentConfig instance. model_dir: A 'str', a path to store model checkpoints and summaries. + run_post_eval: Whether to run evaluation once after training; if True, + the evaluation metrics are also returned. trainer: (optional) A multi-task trainer to use. If none is provided, a default one will be created based on `params`. + best_ckpt_exporter_creator: A functor for creating the best checkpoint exporter. Returns: model: `base_model.MultiTaskBaseModel` instance.
@@ -66,8 +73,7 @@ def run_experiment( is_training = 'train' in mode is_eval = 'eval' in mode with distribution_strategy.scope(): - optimizer = task.create_optimizer(params.trainer.optimizer_config, - params.runtime) + optimizer = train_utils.create_optimizer(task, params) kwargs = dict(multi_task=task, multi_task_model=model, optimizer=optimizer) if params.trainer.trainer_type == 'interleaving': sampler = task_sampler.get_task_sampler(params.trainer.task_sampler, @@ -83,8 +89,7 @@ def run_experiment( model=model, eval_steps=eval_steps, global_step=trainer.global_step if is_training else None, - checkpoint_exporter=train_utils.maybe_create_best_ckpt_exporter( - params, model_dir)) + checkpoint_exporter=best_ckpt_exporter_creator(params, model_dir)) else: evaluator = None @@ -95,7 +100,6 @@ def run_experiment( checkpoint = evaluator.checkpoint global_step = evaluator.global_step - # TODO(hongkuny,haozhangthu): Revisit initialization method. checkpoint_manager = tf.train.CheckpointManager( checkpoint, directory=model_dir, @@ -140,7 +144,11 @@ def run_experiment( else: raise NotImplementedError('The mode is not implemented: %s' % mode) - return model + if run_post_eval: + return model, evaluator.evaluate( + tf.convert_to_tensor(params.trainer.validation_steps)) # pytype: disable=bad-return-type # typed-keras + else: + return model def run_experiment_with_multitask_eval( @@ -153,7 +161,10 @@ def run_experiment_with_multitask_eval( model_dir: str, run_post_eval: bool = False, save_summary: bool = True, - trainer: Optional[core_lib.Trainer] = None) -> Tuple[Any, Any]: + trainer: Optional[core_lib.Trainer] = None, + best_ckpt_exporter_creator: Optional[Any] = train_utils + .maybe_create_best_ckpt_exporter, +) -> Tuple[Any, Any]: """Runs train/eval configured by the experiment params. Args: @@ -170,6 +181,7 @@ def run_experiment_with_multitask_eval( trainer: the core_lib.Trainer instance. It should be created within the strategy.scope(). 
If not provided, an instance will be created by default if `mode` contains 'train'. + best_ckpt_exporter_creator: A functor for creating best checkpoint exporter. Returns: model: `tf.keras.Model` instance. @@ -183,8 +195,7 @@ def run_experiment_with_multitask_eval( config=params, task=train_task, model=train_task.build_model(), - optimizer=train_task.create_optimizer(params.trainer.optimizer_config, - params.runtime), + optimizer=train_utils.create_optimizer(train_task, params), train=True, evaluate=False) else: @@ -200,8 +211,7 @@ def run_experiment_with_multitask_eval( model=model, global_step=trainer.global_step if is_training else None, eval_steps=eval_steps, - checkpoint_exporter=train_utils.maybe_create_best_ckpt_exporter( - params, model_dir)) + checkpoint_exporter=best_ckpt_exporter_creator(params, model_dir)) else: evaluator = None diff --git a/official/modeling/multitask/train_lib_test.py b/official/modeling/multitask/train_lib_test.py index 6f90a47f3dca42381bf0024fc8c22a835d3dfd52..acdefa584e7ec13765ef88d2068c61ca1ea528f9 100644 --- a/official/modeling/multitask/train_lib_test.py +++ b/official/modeling/multitask/train_lib_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -58,8 +58,9 @@ class TrainLibTest(tf.test.TestCase, parameterized.TestCase): strategy_combinations.one_device_strategy_gpu, ], mode='eager', + optimizer=['sgd_experimental', 'sgd'], flag_mode=['train', 'eval', 'train_and_eval'])) - def test_end_to_end(self, distribution_strategy, flag_mode): + def test_end_to_end(self, distribution_strategy, optimizer, flag_mode): model_dir = self.get_temp_dir() experiment_config = configs.MultiTaskExperimentConfig( task=configs.MultiTaskConfig( @@ -70,6 +71,7 @@ class TrainLibTest(tf.test.TestCase, parameterized.TestCase): task_name='bar', task_config=test_utils.BarConfig())))) experiment_config = params_dict.override_params_dict( experiment_config, self._test_config, is_strict=False) + experiment_config.trainer.optimizer_config.optimizer.type = optimizer with distribution_strategy.scope(): test_multitask = multitask.MultiTask.from_config(experiment_config.task) model = test_utils.MockMultiTaskModel() diff --git a/official/modeling/optimization/__init__.py b/official/modeling/optimization/__init__.py index ee2b99603b0caf5338c0ecd1b78ef0b1577b64c1..c02b2b9a9133c49c0482e52ac51bd3623f5b9117 100644 --- a/official/modeling/optimization/__init__.py +++ b/official/modeling/optimization/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/optimization/adafactor_optimizer.py b/official/modeling/optimization/adafactor_optimizer.py index cea09bda415a7375172d781df3b7f84b3a9da322..b7f1944e61e41f411035a6b61c3d4b6a293a9cbe 100644 --- a/official/modeling/optimization/adafactor_optimizer.py +++ b/official/modeling/optimization/adafactor_optimizer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/optimization/configs/__init__.py b/official/modeling/optimization/configs/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/modeling/optimization/configs/__init__.py +++ b/official/modeling/optimization/configs/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/optimization/configs/learning_rate_config.py b/official/modeling/optimization/configs/learning_rate_config.py index 3904b53dacb83fcb7b85793271f63ee304ad32c0..9af3cb673f8475ef6a77a36d04763e02cb29a9cf 100644 --- a/official/modeling/optimization/configs/learning_rate_config.py +++ b/official/modeling/optimization/configs/learning_rate_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -216,14 +216,14 @@ class StepCosineLrConfig(base_config.Config): """Configuration for stepwise learning rate decay. This class is a container for the piecewise cosine learning rate scheduling - configs. It will configure an instance of StepConsineDecayWithOffset keras + configs. It will configure an instance of StepCosineDecayWithOffset keras learning rate schedule. 
```python boundaries: [100000, 110000] values: [1.0, 0.5] lr_decayed_fn = ( - lr_schedule.StepConsineDecayWithOffset( + lr_schedule.StepCosineDecayWithOffset( boundaries, values)) ``` @@ -243,7 +243,7 @@ class StepCosineLrConfig(base_config.Config): [boundaries[n], end] -> values[n+1] to 0. offset: An int. The offset applied to steps. Defaults to 0. """ - name: str = 'StepConsineDecayWithOffset' + name: str = 'StepCosineDecayWithOffset' boundaries: Optional[List[int]] = None values: Optional[List[float]] = None offset: int = 0 diff --git a/official/modeling/optimization/configs/optimization_config.py b/official/modeling/optimization/configs/optimization_config.py index 1bf87e420fb7e36a45f233520baec398d04a8057..f6caf069b3701c0ffbd8edc946b3db916eb89398 100644 --- a/official/modeling/optimization/configs/optimization_config.py +++ b/official/modeling/optimization/configs/optimization_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -45,8 +45,14 @@ class OptimizerConfig(oneof.OneOfConfig): """ type: Optional[str] = None sgd: opt_cfg.SGDConfig = opt_cfg.SGDConfig() + sgd_experimental: opt_cfg.SGDExperimentalConfig = ( + opt_cfg.SGDExperimentalConfig()) adam: opt_cfg.AdamConfig = opt_cfg.AdamConfig() + adam_experimental: opt_cfg.AdamExperimentalConfig = ( + opt_cfg.AdamExperimentalConfig()) adamw: opt_cfg.AdamWeightDecayConfig = opt_cfg.AdamWeightDecayConfig() + adamw_experimental: opt_cfg.AdamWeightDecayExperimentalConfig = ( + opt_cfg.AdamWeightDecayExperimentalConfig()) lamb: opt_cfg.LAMBConfig = opt_cfg.LAMBConfig() rmsprop: opt_cfg.RMSPropConfig = opt_cfg.RMSPropConfig() lars: opt_cfg.LARSConfig = opt_cfg.LARSConfig() diff --git a/official/modeling/optimization/configs/optimization_config_test.py b/official/modeling/optimization/configs/optimization_config_test.py index 02b99f592e9ba4f66ccd9e906eee5158b2b1b13e..6fc11fea0223cccf5a8920ae0c99d7753761e253 100644 --- a/official/modeling/optimization/configs/optimization_config_test.py +++ b/official/modeling/optimization/configs/optimization_config_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/optimization/configs/optimizer_config.py b/official/modeling/optimization/configs/optimizer_config.py index a4696d26548b491d1131211e418a5f0468411291..300d5c440a15c09fda72e55f0456e4bbb0a22eb8 100644 --- a/official/modeling/optimization/configs/optimizer_config.py +++ b/official/modeling/optimization/configs/optimizer_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -54,6 +54,27 @@ class SGDConfig(BaseOptimizerConfig): momentum: float = 0.0 +# TODO(b/216129465): Merge this config with SGDConfig after the experimental +# optimizer graduates. +@dataclasses.dataclass +class SGDExperimentalConfig(BaseOptimizerConfig): + """Configuration for experimental SGD optimizer. + + The attributes for this class match the arguments of + `tf.keras.optimizers.experimental.SGD`. + + Attributes: + name: name of the optimizer. + nesterov: whether to apply Nesterov momentum. + momentum: momentum for SGD optimizer. + jit_compile: if True, jit compile will be used. + """ + name: str = "SGD" + nesterov: bool = False + momentum: float = 0.0 + jit_compile: bool = False + + @dataclasses.dataclass class RMSPropConfig(BaseOptimizerConfig): """Configuration for RMSProp optimizer. @@ -115,6 +136,30 @@ class AdamConfig(BaseOptimizerConfig): amsgrad: bool = False +@dataclasses.dataclass +class AdamExperimentalConfig(BaseOptimizerConfig): + """Configuration for experimental Adam optimizer. + + The attributes for this class match the arguments of + `tf.keras.optimizers.experimental.Adam`. + + Attributes: + name: name of the optimizer. + beta_1: decay rate for 1st order moments. + beta_2: decay rate for 2nd order moments. + epsilon: epsilon value used for numerical stability in Adam optimizer. + amsgrad: boolean. Whether to apply AMSGrad variant of this algorithm from + the paper "On the Convergence of Adam and Beyond". + jit_compile: if True, jit compile will be used. + """ + name: str = "Adam" + beta_1: float = 0.9 + beta_2: float = 0.999 + epsilon: float = 1e-07 + amsgrad: bool = False + jit_compile: bool = False + + @dataclasses.dataclass class AdamWeightDecayConfig(BaseOptimizerConfig): """Configuration for Adam optimizer with weight decay.
@@ -145,6 +190,32 @@ class AdamWeightDecayConfig(BaseOptimizerConfig): gradient_clip_norm: float = 1.0 +@dataclasses.dataclass +class AdamWeightDecayExperimentalConfig(BaseOptimizerConfig): + """Configuration for experimental Adam optimizer with weight decay. + + Attributes: + name: name of the optimizer. + beta_1: decay rate for 1st order moments. + beta_2: decay rate for 2nd order moments. + epsilon: epsilon value used for numerical stability in the optimizer. + amsgrad: boolean. Whether to apply AMSGrad variant of this algorithm from + the paper "On the Convergence of Adam and Beyond". + weight_decay: float. Weight decay rate. Defaults to 0. + global_clipnorm: A positive float. Clips the gradients to this maximum + L2-norm. Defaults to 1.0. + jit_compile: if True, jit compile will be used. + """ + name: str = "AdamWeightDecayExperimental" + beta_1: float = 0.9 + beta_2: float = 0.999 + epsilon: float = 1e-07 + amsgrad: bool = False + weight_decay: float = 0.0 + global_clipnorm: float = 1.0 + jit_compile: bool = False + + @dataclasses.dataclass class LAMBConfig(BaseOptimizerConfig): """Configuration for LAMB optimizer. @@ -266,3 +337,5 @@ class AdafactorConfig(BaseOptimizerConfig): min_dim_size_to_factor: int = 128 epsilon1: float = 1e-30 epsilon2: float = 1e-3 + weight_decay: Optional[float] = None + include_in_weight_decay: Optional[str] = None diff --git a/official/modeling/optimization/ema_optimizer.py b/official/modeling/optimization/ema_optimizer.py index c4f44d7124d888a7b0403442b1f57385d820e789..95557d5f3776c1ea094b4ee2f944e3b6c562f95d 100644 --- a/official/modeling/optimization/ema_optimizer.py +++ b/official/modeling/optimization/ema_optimizer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
@@ -21,7 +21,7 @@ import tensorflow as tf # pylint: disable=protected-access -class ExponentialMovingAverage(tf.keras.optimizers.Optimizer): +class ExponentialMovingAverage(tf.keras.optimizers.legacy.Optimizer): """Optimizer that computes an exponential moving average of the variables. Empirically it has been found that using the moving average of the trained diff --git a/official/modeling/optimization/lars_optimizer.py b/official/modeling/optimization/lars_optimizer.py index ac15042756c02c3d3e2da22419cac2e04522b57e..ce67a10f966cb1f2d2a385a4b8c728b52464a623 100644 --- a/official/modeling/optimization/lars_optimizer.py +++ b/official/modeling/optimization/lars_optimizer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -22,7 +22,7 @@ import tensorflow as tf # pylint: disable=protected-access -class LARS(tf.keras.optimizers.Optimizer): +class LARS(tf.keras.optimizers.legacy.Optimizer): """Layer-wise Adaptive Rate Scaling for large batch training. Introduced by "Large Batch Training of Convolutional Networks" by Y. You, diff --git a/official/modeling/optimization/legacy_adamw.py b/official/modeling/optimization/legacy_adamw.py new file mode 100644 index 0000000000000000000000000000000000000000..c1c57e280d58ecac3f83c355b1aa4478d670f1b3 --- /dev/null +++ b/official/modeling/optimization/legacy_adamw.py @@ -0,0 +1,139 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Adam optimizer with weight decay that exactly matches the original BERT.""" + +import re + +from absl import logging +import tensorflow as tf + + +class AdamWeightDecay(tf.keras.optimizers.legacy.Adam): + """Adam enables L2 weight decay and clip_by_global_norm on gradients. + + [Warning!]: Keras optimizer supports gradient clipping and has an AdamW + implementation. Please consider evaluating the choice in Keras package. + + Just adding the square of the weights to the loss function is *not* the + correct way of using L2 regularization/weight decay with Adam, since that will + interact with the m and v parameters in strange ways. + + Instead we want to decay the weights in a manner that doesn't interact with + the m/v parameters. This is equivalent to adding the square of the weights to + the loss with plain (non-momentum) SGD. 
+ """ + + def __init__(self, + learning_rate=0.001, + beta_1=0.9, + beta_2=0.999, + epsilon=1e-7, + amsgrad=False, + weight_decay_rate=0.0, + include_in_weight_decay=None, + exclude_from_weight_decay=None, + gradient_clip_norm=1.0, + name='AdamWeightDecay', + **kwargs): + super(AdamWeightDecay, self).__init__(learning_rate, beta_1, beta_2, + epsilon, amsgrad, name, **kwargs) + self.weight_decay_rate = weight_decay_rate + self.gradient_clip_norm = gradient_clip_norm + self._include_in_weight_decay = include_in_weight_decay + self._exclude_from_weight_decay = exclude_from_weight_decay + logging.info('AdamWeightDecay gradient_clip_norm=%f', gradient_clip_norm) + + def _prepare_local(self, var_device, var_dtype, apply_state): + super(AdamWeightDecay, self)._prepare_local(var_device, var_dtype, # pytype: disable=attribute-error # typed-keras + apply_state) + apply_state[(var_device, var_dtype)]['weight_decay_rate'] = tf.constant( + self.weight_decay_rate, name='adam_weight_decay_rate') + + def _decay_weights_op(self, var, learning_rate, apply_state): + do_decay = self._do_use_weight_decay(var.name) + if do_decay: + return var.assign_sub( + learning_rate * var * + apply_state[(var.device, var.dtype.base_dtype)]['weight_decay_rate'], + use_locking=self._use_locking) + return tf.no_op() + + def apply_gradients(self, + grads_and_vars, + name=None, + experimental_aggregate_gradients=True): + grads, tvars = list(zip(*grads_and_vars)) + if experimental_aggregate_gradients and self.gradient_clip_norm > 0.0: + # When experimental_aggregate_gradients is False, apply_gradients() no + # longer implicitly all-reduces gradients; users must all-reduce them + # manually and pass the all-reduced grads_and_vars. For now, the + # clip_by_global_norm will be moved to before the explicit allreduce to + # keep the math the same as the TF 1 and pre-TF 2.2 implementations.
+ (grads, _) = tf.clip_by_global_norm( + grads, clip_norm=self.gradient_clip_norm) + return super(AdamWeightDecay, self).apply_gradients( + zip(grads, tvars), + name=name, + experimental_aggregate_gradients=experimental_aggregate_gradients) + + def _get_lr(self, var_device, var_dtype, apply_state): + """Retrieves the learning rate with the given state.""" + if apply_state is None: + return self._decayed_lr_t[var_dtype], {} + + apply_state = apply_state or {} + coefficients = apply_state.get((var_device, var_dtype)) + if coefficients is None: + coefficients = self._fallback_apply_state(var_device, var_dtype) + apply_state[(var_device, var_dtype)] = coefficients + + return coefficients['lr_t'], dict(apply_state=apply_state) + + def _resource_apply_dense(self, grad, var, apply_state=None): + lr_t, kwargs = self._get_lr(var.device, var.dtype.base_dtype, apply_state) + decay = self._decay_weights_op(var, lr_t, apply_state) + with tf.control_dependencies([decay]): + return super(AdamWeightDecay, + self)._resource_apply_dense(grad, var, **kwargs) # pytype: disable=attribute-error # typed-keras + + def _resource_apply_sparse(self, grad, var, indices, apply_state=None): + lr_t, kwargs = self._get_lr(var.device, var.dtype.base_dtype, apply_state) + decay = self._decay_weights_op(var, lr_t, apply_state) + with tf.control_dependencies([decay]): + return super(AdamWeightDecay, + self)._resource_apply_sparse(grad, var, indices, **kwargs) # pytype: disable=attribute-error # typed-keras + + def get_config(self): + config = super(AdamWeightDecay, self).get_config() + config.update({ + 'weight_decay_rate': self.weight_decay_rate, + }) + return config + + def _do_use_weight_decay(self, param_name): + """Whether to use L2 weight decay for `param_name`.""" + if self.weight_decay_rate == 0: + return False + + if self._include_in_weight_decay: + for r in self._include_in_weight_decay: + if re.search(r, param_name) is not None: + return True + + if self._exclude_from_weight_decay: + for r 
in self._exclude_from_weight_decay: + if re.search(r, param_name) is not None: + return False + return True diff --git a/official/modeling/optimization/lr_schedule.py b/official/modeling/optimization/lr_schedule.py index 5f62f10b11501003c3f066c748811fc19f5882ec..b4846d7aeca69243a763f89f5878c1d7f14005bb 100644 --- a/official/modeling/optimization/lr_schedule.py +++ b/official/modeling/optimization/lr_schedule.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -23,7 +23,7 @@ import tensorflow as tf def _make_offset_wrapper(new_class_name: str, base_lr_class): """Generates a offset wrapper of learning rate schedule. - It will returns a subclass of the the `base_lr_class`, the subclass takes an + It returns a subclass of the `base_lr_class`; the subclass takes an `offset` argument in the constructor. When the new class instance is called, the behavior is: new_class_object(step) = base_lr_class_object(step - offset) @@ -386,11 +386,11 @@ class PowerDecayWithOffset(tf.keras.optimizers.schedules.LearningRateSchedule): } -class StepConsineDecayWithOffset( +class StepCosineDecayWithOffset( tf.keras.optimizers.schedules.LearningRateSchedule): """Stepwise cosine learning rate decay with offset. - Learning rate is equivalent to one or more consine decay(s) starting and + Learning rate is equivalent to one or more cosine decay(s) starting and ending at each interval.
ExampleL @@ -399,7 +399,7 @@ class StepConsineDecayWithOffset( boundaries: [100000, 110000] values: [1.0, 0.5] lr_decayed_fn = ( - lr_schedule.StepConsineDecayWithOffset( + lr_schedule.StepCosineDecayWithOffset( boundaries, values)) ``` @@ -412,7 +412,7 @@ class StepConsineDecayWithOffset( boundaries, values, offset: int = 0, - name: str = "StepConsineDecayWithOffset"): + name: str = "StepCosineDecayWithOffset"): """Initialize configuration of the learning rate schedule. Args: @@ -444,7 +444,7 @@ class StepConsineDecayWithOffset( ] + [0]) def __call__(self, global_step): - with tf.name_scope(self.name or "StepConsineDecayWithOffset"): + with tf.name_scope(self.name or "StepCosineDecayWithOffset"): global_step = tf.cast(global_step - self.offset, tf.float32) lr_levels = self.values lr_steps = self.boundaries diff --git a/official/modeling/optimization/lr_schedule_test.py b/official/modeling/optimization/lr_schedule_test.py index bafd8be1fad277cfd66579e6336e23493337730a..df74db692eb2226fd639cd5cdac18ae3abbe162d 100644 --- a/official/modeling/optimization/lr_schedule_test.py +++ b/official/modeling/optimization/lr_schedule_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/optimization/optimizer_factory.py b/official/modeling/optimization/optimizer_factory.py index 4f5b8929b0d87bcb6a8023e85e49df238cd3228e..8ceb6a33307ab648725e314f0429426102b89929 100644 --- a/official/modeling/optimization/optimizer_factory.py +++ b/official/modeling/optimization/optimizer_factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -23,22 +23,38 @@ from official.modeling.optimization import slide_optimizer from official.modeling.optimization import adafactor_optimizer from official.modeling.optimization import ema_optimizer from official.modeling.optimization import lars_optimizer +from official.modeling.optimization import legacy_adamw from official.modeling.optimization import lr_schedule from official.modeling.optimization.configs import optimization_config as opt_cfg -from official.nlp import optimization as nlp_optimization -OPTIMIZERS_CLS = { - 'sgd': tf.keras.optimizers.SGD, - 'adam': tf.keras.optimizers.Adam, - 'adamw': nlp_optimization.AdamWeightDecay, +# Optimizer CLS to be used in both legacy and new path. +SHARED_OPTIMIZERS = { + 'sgd_experimental': tf.keras.optimizers.experimental.SGD, + 'adam_experimental': tf.keras.optimizers.experimental.Adam, + 'adamw': legacy_adamw.AdamWeightDecay, + 'adamw_experimental': tf.keras.optimizers.experimental.AdamW, 'lamb': tfa_optimizers.LAMB, - 'rmsprop': tf.keras.optimizers.RMSprop, 'lars': lars_optimizer.LARS, - 'adagrad': tf.keras.optimizers.Adagrad, 'slide': slide_optimizer.SLIDE, 'adafactor': adafactor_optimizer.Adafactor, } +LEGACY_OPTIMIZERS_CLS = { + 'sgd': tf.keras.optimizers.legacy.SGD, + 'adam': tf.keras.optimizers.legacy.Adam, + 'rmsprop': tf.keras.optimizers.legacy.RMSprop, + 'adagrad': tf.keras.optimizers.legacy.Adagrad, +} +LEGACY_OPTIMIZERS_CLS.update(SHARED_OPTIMIZERS) + +NEW_OPTIMIZERS_CLS = { + 'sgd': tf.keras.optimizers.experimental.SGD, + 'adam': tf.keras.optimizers.experimental.Adam, + 'rmsprop': tf.keras.optimizers.experimental.RMSprop, + 'adagrad': tf.keras.optimizers.experimental.Adagrad, +} +NEW_OPTIMIZERS_CLS.update(SHARED_OPTIMIZERS) + LR_CLS = { 'stepwise': lr_schedule.PiecewiseConstantDecayWithOffset, 'polynomial': lr_schedule.PolynomialDecayWithOffset, @@ -47,7 +63,7 @@ LR_CLS = 
{ 'power': lr_schedule.DirectPowerDecay, 'power_linear': lr_schedule.PowerAndLinearDecay, 'power_with_offset': lr_schedule.PowerDecayWithOffset, - 'step_cosine_with_offset': lr_schedule.StepConsineDecayWithOffset, + 'step_cosine_with_offset': lr_schedule.StepCosineDecayWithOffset, } WARMUP_CLS = { @@ -56,8 +72,13 @@ WARMUP_CLS = { } -def register_optimizer_cls( - key: str, optimizer_config_cls: tf.keras.optimizers.Optimizer): +def register_optimizer_cls(key: str, + optimizer_config_cls: Union[ + tf.keras.optimizers.Optimizer, + tf.keras.optimizers.legacy.Optimizer, + tf.keras.optimizers.experimental.Optimizer + ], + use_legacy_optimizer: bool = True): """Register customize optimizer cls. The user will still need to subclass data classes in @@ -66,10 +87,16 @@ def register_optimizer_cls( Args: key: A string to that the optimizer_config_cls is registered with. optimizer_config_cls: A class which inherits tf.keras.optimizers.Optimizer. + use_legacy_optimizer: A boolean that indicates if using legacy optimizers. """ - if key in OPTIMIZERS_CLS: - raise ValueError('%s already registered in OPTIMIZER_CLS.' % key) - OPTIMIZERS_CLS[key] = optimizer_config_cls + if use_legacy_optimizer: + if key in LEGACY_OPTIMIZERS_CLS: + raise ValueError('%s already registered in LEGACY_OPTIMIZERS_CLS.' % key) + LEGACY_OPTIMIZERS_CLS[key] = optimizer_config_cls + else: + if key in NEW_OPTIMIZERS_CLS: + raise ValueError('%s already registered in NEW_OPTIMIZERS_CLS.' % key) + NEW_OPTIMIZERS_CLS[key] = optimizer_config_cls class OptimizerFactory: @@ -84,6 +111,8 @@ class OptimizerFactory: (4) Build optimizer. 
This is a typical example for using this class: + + ``` params = { 'optimizer': { 'type': 'sgd', @@ -103,6 +132,7 @@ class OptimizerFactory: opt_factory = OptimizerFactory(opt_config) lr = opt_factory.build_learning_rate() optimizer = opt_factory.build_optimizer(lr) + ``` """ def __init__(self, config: opt_cfg.OptimizationConfig): @@ -155,11 +185,15 @@ class OptimizerFactory: def build_optimizer( self, lr: Union[tf.keras.optimizers.schedules.LearningRateSchedule, float], + gradient_aggregator: Optional[Callable[ + [List[Tuple[tf.Tensor, tf.Tensor]]], List[Tuple[tf.Tensor, + tf.Tensor]]]] = None, gradient_transformers: Optional[List[Callable[ - [List[Tuple[tf.Tensor, tf.Tensor]]], List[Tuple[tf.Tensor, tf.Tensor]] - ]]] = None, + [List[Tuple[tf.Tensor, tf.Tensor]]], List[Tuple[tf.Tensor, + tf.Tensor]]]]] = None, postprocessor: Optional[Callable[[tf.keras.optimizers.Optimizer], - tf.keras.optimizers.Optimizer]] = None): + tf.keras.optimizers.Optimizer]] = None, + use_legacy_optimizer: bool = True): """Build optimizer. Builds optimizer from config. It takes learning rate as input, and builds @@ -169,6 +203,7 @@ class OptimizerFactory: Args: lr: A floating point value, or a tf.keras.optimizers.schedules.LearningRateSchedule instance. + gradient_aggregator: Optional function to overwrite gradient aggregation. gradient_transformers: Optional list of functions to use to transform gradients before applying updates to Variables. The functions are applied after gradient_aggregator. The functions should accept and @@ -176,9 +211,11 @@ class OptimizerFactory: global_clipnorm should not be set when gradient_transformers is passed. postprocessor: An optional function for postprocessing the optimizer. It takes an optimizer and returns an optimizer. + use_legacy_optimizer: A boolean that indicates if using legacy optimizers. Returns: - tf.keras.optimizers.Optimizer instance. + `tf.keras.optimizers.legacy.Optimizer` or + `tf.keras.optimizers.experimental.Optimizer` instance. 
""" optimizer_dict = self._optimizer_config.as_dict() @@ -191,18 +228,39 @@ class OptimizerFactory: del optimizer_dict['global_clipnorm'] optimizer_dict['learning_rate'] = lr + if gradient_aggregator is not None: + optimizer_dict['gradient_aggregator'] = gradient_aggregator if gradient_transformers is not None: optimizer_dict['gradient_transformers'] = gradient_transformers - optimizer = OPTIMIZERS_CLS[self._optimizer_type](**optimizer_dict) + if use_legacy_optimizer: + optimizer = LEGACY_OPTIMIZERS_CLS[self._optimizer_type](**optimizer_dict) + else: + if 'decay' in optimizer_dict: + raise ValueError( + '`decay` is deprecated in new Keras optimizer, please reflect the ' + 'decay logic in `lr` or set `use_legacy_optimizer=True` to use the ' + 'legacy optimizer.') + optimizer = NEW_OPTIMIZERS_CLS[self._optimizer_type](**optimizer_dict) if self._use_ema: + if not use_legacy_optimizer: + raise ValueError( + 'EMA can only work with the legacy optimizer, please set ' + '`use_legacy_optimizer=True`.') optimizer = ema_optimizer.ExponentialMovingAverage( optimizer, **self._ema_config.as_dict()) if postprocessor: optimizer = postprocessor(optimizer) - assert isinstance(optimizer, tf.keras.optimizers.Optimizer), ( - 'OptimizerFactory.build_optimizer returning a non-optimizer object: ' - '{}'.format(optimizer)) - - return optimizer + if isinstance(optimizer, tf.keras.optimizers.Optimizer): + return optimizer + # The following checks keep the function working in older TF + # versions that lack the experimental/legacy packages.
+ if hasattr(tf.keras.optimizers, 'experimental'): + if isinstance(optimizer, tf.keras.optimizers.experimental.Optimizer): + return optimizer + if hasattr(tf.keras.optimizers, 'legacy'): + if isinstance(optimizer, tf.keras.optimizers.legacy.Optimizer): + return optimizer + raise TypeError('OptimizerFactory.build_optimizer returning a ' + 'non-optimizer object: {}'.format(optimizer)) diff --git a/official/modeling/optimization/optimizer_factory_test.py b/official/modeling/optimization/optimizer_factory_test.py index e0cc714483b5a06464436b94b3e2ffe1cf65364a..4d23c2cd4574f9d4a2ecc31b3fb126ebce9c5316 100644 --- a/official/modeling/optimization/optimizer_factory_test.py +++ b/official/modeling/optimization/optimizer_factory_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -37,7 +37,7 @@ class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase): } } } - optimizer_cls = optimizer_factory.OPTIMIZERS_CLS[optimizer_type] + optimizer_cls = optimizer_factory.LEGACY_OPTIMIZERS_CLS[optimizer_type] expected_optimizer_config = optimizer_cls().get_config() expected_optimizer_config['learning_rate'] = 0.1 @@ -49,6 +49,72 @@ class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase): self.assertIsInstance(optimizer, optimizer_cls) self.assertEqual(expected_optimizer_config, optimizer.get_config()) + @parameterized.parameters(('sgd'), ('rmsprop'), ('adam'), ('adamw'), ('lamb'), + ('lars'), ('adagrad')) + def test_new_optimizers(self, optimizer_type): + params = { + 'optimizer': { + 'type': optimizer_type + }, + 'learning_rate': { + 'type': 'constant', + 'constant': { + 'learning_rate': 0.1 + } + } + } + optimizer_cls = optimizer_factory.NEW_OPTIMIZERS_CLS[optimizer_type] + expected_optimizer_config = optimizer_cls().get_config() + expected_optimizer_config['learning_rate'] = 0.1 + + opt_config = optimization_config.OptimizationConfig(params) + if optimizer_type == 'sgd': + # Delete unsupported arg `decay` from SGDConfig. + delattr(opt_config.optimizer.sgd, 'decay') + opt_factory = optimizer_factory.OptimizerFactory(opt_config) + lr = opt_factory.build_learning_rate() + optimizer = opt_factory.build_optimizer( + lr, postprocessor=lambda x: x, use_legacy_optimizer=False) + + self.assertIsInstance(optimizer, optimizer_cls) + self.assertEqual(expected_optimizer_config, optimizer.get_config()) + + def test_gradient_aggregator(self): + params = { + 'optimizer': { + 'type': 'adam', + }, + 'learning_rate': { + 'type': 'constant', + 'constant': { + 'learning_rate': 1.0 + } + } + } + opt_config = optimization_config.OptimizationConfig(params) + opt_factory = optimizer_factory.OptimizerFactory(opt_config) + lr = opt_factory.build_learning_rate() + + # Dummy function to zero out gradients. 
+ zero_grads = lambda gv: [(tf.zeros_like(g), v) for g, v in gv] + + optimizer = opt_factory.build_optimizer(lr, gradient_aggregator=zero_grads) + if isinstance(optimizer, tf.keras.optimizers.experimental.Optimizer): + self.skipTest('New Keras optimizer does not support ' + '`gradient_aggregator` arg.') + + var0 = tf.Variable([1.0, 2.0]) + var1 = tf.Variable([3.0, 4.0]) + + grads0 = tf.constant([1.0, 1.0]) + grads1 = tf.constant([1.0, 1.0]) + + grads_and_vars = list(zip([grads0, grads1], [var0, var1])) + optimizer.apply_gradients(grads_and_vars) + + self.assertAllClose(np.array([1.0, 2.0]), var0.numpy()) + self.assertAllClose(np.array([3.0, 4.0]), var1.numpy()) + @parameterized.parameters((None, None), (1.0, None), (None, 1.0)) def test_gradient_clipping(self, clipnorm, clipvalue): params = { @@ -107,6 +173,25 @@ class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase): optimizer_factory.OptimizerFactory( optimization_config.OptimizationConfig(params)) + def test_wrong_return_type(self): + optimizer_type = 'sgd' + params = { + 'optimizer': { + 'type': optimizer_type + }, + 'learning_rate': { + 'type': 'constant', + 'constant': { + 'learning_rate': 0.1 + } + } + } + + opt_config = optimization_config.OptimizationConfig(params) + opt_factory = optimizer_factory.OptimizerFactory(opt_config) + with self.assertRaises(TypeError): + _ = opt_factory.build_optimizer(0.1, postprocessor=lambda x: None) + # TODO(b/187559334) refactor lr_schedule tests into `lr_schedule_test.py`. 
@@ -418,7 +503,7 @@ class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase): } } } - expected_lr_step_values = [[0, 0.0], [5000, 1e-4/2.0], [10000, 1e-4], + expected_lr_step_values = [[0, 0.0], [5000, 1e-4 / 2.0], [10000, 1e-4], [20000, 9.994863e-05], [499999, 5e-05]] opt_config = optimization_config.OptimizationConfig(params) opt_factory = optimizer_factory.OptimizerFactory(opt_config) @@ -434,10 +519,12 @@ class OptimizerFactoryRegistryTest(tf.test.TestCase): class MyClass(): pass + optimizer_factory.register_optimizer_cls('test', MyClass) - self.assertIn('test', optimizer_factory.OPTIMIZERS_CLS) + self.assertIn('test', optimizer_factory.LEGACY_OPTIMIZERS_CLS) with self.assertRaisesRegex(ValueError, 'test already registered.*'): optimizer_factory.register_optimizer_cls('test', MyClass) + if __name__ == '__main__': tf.test.main() diff --git a/official/modeling/optimization/slide_optimizer.py b/official/modeling/optimization/slide_optimizer.py index c1975a3111e109bbb0e40dfb45cb04bc98246ad2..8bbd468746149ffe20cee9ff6f40de1071630327 100644 --- a/official/modeling/optimization/slide_optimizer.py +++ b/official/modeling/optimization/slide_optimizer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/modeling/performance.py b/official/modeling/performance.py index c1b23714e2b949db97ffc7fe3ba90aa521a36428..3c6f6d15a564494e0307fb9e6af43309fe00c3a1 100644 --- a/official/modeling/performance.py +++ b/official/modeling/performance.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/modeling/privacy/__init__.py b/official/modeling/privacy/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/modeling/privacy/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/modeling/privacy/configs.py b/official/modeling/privacy/configs.py new file mode 100644 index 0000000000000000000000000000000000000000..c8d4692a563c1e458e7cdccd60419491e8cf952c --- /dev/null +++ b/official/modeling/privacy/configs.py @@ -0,0 +1,26 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Configs for differential privacy.""" +import dataclasses + +from official.modeling.hyperparams import base_config + + +@dataclasses.dataclass +class DifferentialPrivacyConfig(base_config.Config): + # Clipping norm applied to the gradients. + # Defaults to a very large value so that nothing is clipped. + clipping_norm: float = 100000000.0 # 10^8 + noise_multiplier: float = 0.0 diff --git a/official/modeling/privacy/configs_test.py b/official/modeling/privacy/configs_test.py new file mode 100644 index 0000000000000000000000000000000000000000..485e4e5c4acb9b1187f4df0fe99ee2d745749f86 --- /dev/null +++ b/official/modeling/privacy/configs_test.py @@ -0,0 +1,41 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+ +"""Tests for configs.""" + +import tensorflow as tf +from official.modeling.privacy import configs + + +class ConfigsTest(tf.test.TestCase): + + def test_clipping_norm_default(self): + clipping_norm = configs.DifferentialPrivacyConfig().clipping_norm + self.assertEqual(100000000.0, clipping_norm) + + def test_noise_multiplier_default(self): + noise_multiplier = configs.DifferentialPrivacyConfig().noise_multiplier + self.assertEqual(0.0, noise_multiplier) + + def test_config(self): + dp_config = configs.DifferentialPrivacyConfig( + clipping_norm=1.0, + noise_multiplier=1.0, + ) + self.assertEqual(1.0, dp_config.clipping_norm) + self.assertEqual(1.0, dp_config.noise_multiplier) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/modeling/privacy/ops.py b/official/modeling/privacy/ops.py new file mode 100644 index 0000000000000000000000000000000000000000..8b0247020855158ba5bf65fc53f75d436106ab64 --- /dev/null +++ b/official/modeling/privacy/ops.py @@ -0,0 +1,42 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Ops for differential privacy (gradient) transforms.""" + +from typing import List, Tuple +import tensorflow as tf + + +def clip_l2_norm(grads_vars: List[Tuple[tf.Tensor, tf.Tensor]], + l2_norm_clip: float) -> List[Tuple[tf.Tensor, tf.Tensor]]: + """Clip gradients by global norm.""" + + gradients = [] + variables = [] + for (g, v) in grads_vars: + gradients.append(g) + variables.append(v) + clipped_gradients = tf.clip_by_global_norm(gradients, l2_norm_clip)[0] + return list(zip(clipped_gradients, variables)) + + +def add_noise(grads_vars: List[Tuple[tf.Tensor, tf.Tensor]], + noise_stddev: float) -> List[Tuple[tf.Tensor, tf.Tensor]]: + """Add noise to gradients.""" + ret = [] + for (g, v) in grads_vars: + noise = tf.random.normal(tf.shape(g), stddev=noise_stddev) + ret.append((g + noise, v)) + return ret + diff --git a/official/modeling/privacy/ops_test.py b/official/modeling/privacy/ops_test.py new file mode 100644 index 0000000000000000000000000000000000000000..4f5d580c75fe0bbcaebda5f0a912625d27792898 --- /dev/null +++ b/official/modeling/privacy/ops_test.py @@ -0,0 +1,52 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
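The two transforms in `ops.py` apply global-norm clipping (through `tf.clip_by_global_norm`) and then additive Gaussian noise (through `tf.random.normal`). The following stdlib-only sketch reproduces that arithmetic on plain lists, using the same numbers as `ops_test.py`. `clip_by_global_norm` rescales every entry by `clip_norm / max(global_norm, clip_norm)`, so gradients whose joint norm is already below the limit pass through unchanged. This is why the default `clipping_norm` of 10^8 in `DifferentialPrivacyConfig` effectively disables clipping. The helper names are hypothetical. The diff does not show how a trainer derives the noise standard deviation from `noise_multiplier`, so the sketch takes `stddev` directly.

```python
import math
import random
from typing import List, Sequence

Grad = List[float]  # one flattened gradient tensor (illustrative)


def global_norm(grads: Sequence[Grad]) -> float:
    """L2 norm of all gradient entries taken together."""
    return math.sqrt(sum(g * g for grad in grads for g in grad))


def clip_by_global_norm(grads: Sequence[Grad], clip_norm: float) -> List[Grad]:
    """Rescales every gradient by clip_norm / max(global_norm, clip_norm)."""
    scale = clip_norm / max(global_norm(grads), clip_norm)
    return [[g * scale for g in grad] for grad in grads]


def add_gaussian_noise(grads: Sequence[Grad], stddev: float,
                       rng: random.Random) -> List[Grad]:
    """Adds independent N(0, stddev^2) noise to each gradient entry."""
    return [[g + rng.gauss(0.0, stddev) for g in grad] for grad in grads]


# Same numbers as the ops_test case: sqrt(4^2 + 3^2 + 12^2) = 13,
# so clipping to 1.0 divides every entry by 13.
clipped = clip_by_global_norm([[4.0, 3.0], [12.0]], clip_norm=1.0)
noisy = add_gaussian_noise(clipped, stddev=0.1, rng=random.Random(0))
print(clipped)
```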
+ +"""Tests for ops.""" + +from unittest import mock + +import tensorflow as tf + +from official.modeling.privacy import ops + + +class OpsTest(tf.test.TestCase): + + def test_clip_l2_norm(self): + x = tf.constant([4.0, 3.0]) + y = tf.constant([[12.0]]) + tensors = [(x, x), (y, y)] + clipped = ops.clip_l2_norm(tensors, 1.0) + for a, b in zip(clipped, tensors): + self.assertAllClose(a[0], b[0] / 13.0) # sqrt(4^2 + 3^2 + 12^2) = 13 + self.assertAllClose(a[1], b[1]) + + @mock.patch.object(tf.random, + 'normal', + autospec=True) + def test_add_noise(self, mock_random): + x = tf.constant([0.0, 0.0]) + y = tf.constant([[0.0]]) + tensors = [(x, x), (y, y)] + mock_random.side_effect = [tf.constant([1.0, 1.0]), tf.constant([[1.0]])] + added = ops.add_noise(tensors, 10.0) + for a, b in zip(added, tensors): + self.assertAllClose(a[0], b[0] + 1.0) + self.assertAllClose(a[1], b[1]) + _, kwargs = mock_random.call_args + self.assertEqual(kwargs['stddev'], 10.0) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/modeling/tf_utils.py b/official/modeling/tf_utils.py index e151b7386ab1c6d62c16aa13394e30cadb7036fa..cdde227cb4b4da5bc9e645d84e5b5770ccabf6f3 100644 --- a/official/modeling/tf_utils.py +++ b/official/modeling/tf_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -14,6 +14,7 @@ """Common TF utilities.""" +import functools import six import tensorflow as tf @@ -82,19 +83,22 @@ def is_special_none_tensor(tensor): return tensor.shape.ndims == 0 and tensor.dtype == tf.int32 -def get_activation(identifier, use_keras_layer=False): - """Maps a identifier to a Python function, e.g., "relu" => `tf.nn.relu`.
+def get_activation(identifier, use_keras_layer=False, **kwargs): + """Maps an identifier to a Python function, e.g., "relu" => `tf.nn.relu`. It checks string first and if it is one of customized activation not in TF, the corresponding activation will be returned. For non-customized activation names and callable identifiers, always fallback to tf.keras.activations.get. Prefers using keras layers when use_keras_layer=True. Now it only supports - 'relu', 'linear', 'identity', 'swish'. + 'relu', 'linear', 'identity', 'swish', 'mish', 'leaky_relu', and 'gelu'. Args: identifier: String name of the activation function or callable. use_keras_layer: If True, use keras layer if identifier is allow-listed. + **kwargs: Keyword arguments to use to instantiate an activation function. + Available only for 'leaky_relu' and 'gelu' when using keras layers. + For example: get_activation('leaky_relu', use_keras_layer=True, alpha=0.1) Returns: A Python function corresponding to the activation function or a keras @@ -110,8 +114,11 @@ def get_activation(identifier, use_keras_layer=False): "swish": "swish", "sigmoid": "sigmoid", "relu6": tf.nn.relu6, + "leaky_relu": functools.partial(tf.nn.leaky_relu, **kwargs), "hard_swish": activations.hard_swish, "hard_sigmoid": activations.hard_sigmoid, + "mish": activations.mish, + "gelu": functools.partial(tf.nn.gelu, **kwargs), } if identifier in keras_layer_allowlist: return tf.keras.layers.Activation(keras_layer_allowlist[identifier]) @@ -122,6 +129,7 @@ def get_activation(identifier, use_keras_layer=False): "relu6": activations.relu6, "hard_sigmoid": activations.hard_sigmoid, "identity": activations.identity, + "mish": activations.mish, } if identifier in name_to_fn: return tf.keras.activations.get(name_to_fn[identifier]) @@ -201,3 +209,85 @@ def safe_mean(losses): total = tf.reduce_sum(losses) num_elements = tf.cast(tf.size(losses), dtype=losses.dtype) return tf.math.divide_no_nan(total, num_elements) + + +def get_replica_id(): + """Gets 
replica id depending on the environment.""" + context = tf.distribute.get_replica_context() + if context is not None: + return context.replica_id_in_sync_group + else: + raise RuntimeError("Unknown replica context. The `get_replica_id` method " + "relies on TF 2.x tf.distribute API.") + + +def cross_replica_concat(value, axis, name="cross_replica_concat"): + """Concatenates the given `value` across (GPU/TPU) cores, along `axis`. + + In general, each core ("replica") will pass a + replica-specific value as `value` (corresponding to some element of a + data-parallel computation taking place across replicas). + + The resulting concatenated `Tensor` will have the same shape as `value` for + all dimensions except `axis`, where it will be larger by a factor of the + number of replicas. It will also have the same `dtype` as `value`. + + The position of a given replica's `value` within the resulting concatenation + is determined by that replica's replica ID. For + example: + + With `value` for replica 0 given as + + 0 0 0 + 0 0 0 + + and `value` for replica 1 given as + + 1 1 1 + 1 1 1 + + the resulting concatenation along axis 0 will be + + 0 0 0 + 0 0 0 + 1 1 1 + 1 1 1 + + and this result will be identical across all replicas. + + Note that this API only works in TF2 with `tf.distribute`. + + Args: + value: The `Tensor` to concatenate across replicas. Each replica will have a + different value for this `Tensor`, and these replica-specific values will + be concatenated. + axis: The axis along which to perform the concatenation as a Python integer + (not a `Tensor`). E.g., `axis=0` to concatenate along the batch dimension. + name: A name for the operation (used to create a name scope). + + Returns: + The result of concatenating `value` along `axis` across replicas. + + Raises: + RuntimeError: when the batch (0-th) dimension is None. 
+ """ + with tf.name_scope(name): + context = tf.distribute.get_replica_context() + # Typically this could be hit only if the tensor is derived from a + # dataset with finite epochs and drop_remainder=False, where the last + # batch could be of a different size and the dim-0 is then of dynamic + # shape. + if value.shape.as_list()[0] is None: + raise RuntimeError(f"{value} has unknown batch.") + return context.all_gather(value, axis=axis) + + +def clone_initializer(initializer): + # Keras initializers are going to be stateless, which means reusing the same + # initializer will produce the same init value when the shapes are the same. + if isinstance(initializer, tf.keras.initializers.Initializer): + return initializer.__class__.from_config(initializer.get_config()) + # When the input is a string/dict or another serialized config, the caller + # will create a new keras Initializer instance from it, so we don't need to + # do anything. + return initializer diff --git a/official/modeling/tf_utils_test.py b/official/modeling/tf_utils_test.py new file mode 100644 index 0000000000000000000000000000000000000000..4013cd938d127766853c553584786aa2b05dcc97 --- /dev/null +++ b/official/modeling/tf_utils_test.py @@ -0,0 +1,107 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
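The `get_activation` change earlier in this diff forwards `**kwargs` into `functools.partial(tf.nn.leaky_relu, **kwargs)` (and likewise for `tf.nn.gelu`), so a call such as `get_activation('leaky_relu', use_keras_layer=True, alpha=0.1)` yields an activation with `alpha` pre-bound. The following is a minimal, TensorFlow-free sketch of that binding pattern on scalars. The `leaky_relu` and `get_activation` functions here are toy stand-ins, not the library's. The default `alpha=0.2` follows the default of `tf.nn.leaky_relu`.

```python
import functools
from typing import Callable


def leaky_relu(x: float, alpha: float = 0.2) -> float:
    """Scalar leaky ReLU: identity for x >= 0, slope `alpha` otherwise."""
    return x if x >= 0 else alpha * x


def get_activation(identifier: str, **kwargs) -> Callable[[float], float]:
    """Toy lookup that forwards **kwargs into the selected activation."""
    name_to_fn = {
        "relu": lambda x: max(x, 0.0),  # ignores kwargs in this toy
        # kwargs are bound once, when the lookup table is built.
        "leaky_relu": functools.partial(leaky_relu, **kwargs),
    }
    return name_to_fn[identifier]


fn = get_activation("leaky_relu", alpha=0.1)
print(fn(-1.0))  # prints -0.1
print(fn(2.0))   # prints 2.0
```

This mirrors `test_get_leaky_relu_layer`, which expects `-0.1` for an input of `-1` with `alpha=0.1`.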
+ +"""Tests for tf_utils.""" +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from tensorflow.python.distribute import combinations +from tensorflow.python.distribute import strategy_combinations +from official.modeling import tf_utils + + +def all_strategy_combinations(): + return combinations.combine( + strategy=[ + strategy_combinations.cloud_tpu_strategy, + strategy_combinations.mirrored_strategy_with_two_gpus, + ], + mode='eager', + ) + + +class TFUtilsTest(tf.test.TestCase, parameterized.TestCase): + + @combinations.generate(all_strategy_combinations()) + def test_cross_replica_concat(self, strategy): + num_cores = strategy.num_replicas_in_sync + + shape = (2, 3, 4) + + def concat(axis): + + @tf.function + def function(): + replica_value = tf.fill(shape, tf_utils.get_replica_id()) + return tf_utils.cross_replica_concat(replica_value, axis=axis) + + return function + + def expected(axis): + values = [np.full(shape, i) for i in range(num_cores)] + return np.concatenate(values, axis=axis) + + per_replica_results = strategy.run(concat(axis=0)) + replica_0_result = per_replica_results.values[0].numpy() + for value in per_replica_results.values[1:]: + self.assertAllClose(value.numpy(), replica_0_result) + self.assertAllClose(replica_0_result, expected(axis=0)) + + replica_0_result = strategy.run(concat(axis=1)).values[0].numpy() + self.assertAllClose(replica_0_result, expected(axis=1)) + + replica_0_result = strategy.run(concat(axis=2)).values[0].numpy() + self.assertAllClose(replica_0_result, expected(axis=2)) + + @combinations.generate(all_strategy_combinations()) + def test_cross_replica_concat_gradient(self, strategy): + num_cores = strategy.num_replicas_in_sync + + shape = (10, 5) + + @tf.function + def function(): + replica_value = tf.random.normal(shape) + with tf.GradientTape() as tape: + tape.watch(replica_value) + concat_value = tf_utils.cross_replica_concat(replica_value, axis=0) + output = 
tf.reduce_sum(concat_value) + return tape.gradient(output, replica_value) + + per_replica_gradients = strategy.run(function) + for gradient in per_replica_gradients.values: + self.assertAllClose(gradient, num_cores * tf.ones(shape)) + + @parameterized.parameters(('relu', True), ('relu', False), + ('leaky_relu', False), ('leaky_relu', True), + ('mish', True), ('mish', False), ('gelu', True)) + def test_get_activations(self, name, use_keras_layer): + fn = tf_utils.get_activation(name, use_keras_layer) + self.assertIsNotNone(fn) + + @combinations.generate(all_strategy_combinations()) + def test_get_leaky_relu_layer(self, strategy): + @tf.function + def forward(x): + fn = tf_utils.get_activation( + 'leaky_relu', use_keras_layer=True, alpha=0.1) + return strategy.run(fn, args=(x,)).values[0] + + got = forward(tf.constant([-1])) + self.assertAllClose(got, tf.constant([-0.1])) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/nightly_requirements.txt b/official/nightly_requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..41f805823b5e6395e7cb061a844f2f4f207cded1 --- /dev/null +++ b/official/nightly_requirements.txt @@ -0,0 +1,29 @@ +six +google-api-python-client>=1.6.7 +kaggle>=1.3.9 +numpy>=1.20 +oauth2client +pandas>=0.22.0 +psutil>=5.4.3 +py-cpuinfo>=3.3.0 +scipy>=0.19.1 +tensorflow-hub>=0.6.0 +tensorflow-model-optimization>=0.4.1 +tensorflow-datasets +tfa-nightly +gin-config +tf_slim>=1.1.0 +Cython +matplotlib +# Loader becomes a required positional argument in 6.0 in yaml.load +pyyaml>=5.1,<6.0 +# CV related dependencies +opencv-python-headless==4.5.2.52 +Pillow +pycocotools +# NLP related dependencies +seqeval +sentencepiece +sacrebleu +# Projects/vit dependencies +immutabledict diff --git a/official/nlp/MODEL_GARDEN.md b/official/nlp/MODEL_GARDEN.md index 5d590a9337cf9cb84294eff5ca1da7a74984e375..09294e942e8d6a2e001f36b0f3650a8496bfe69f 100644 --- a/official/nlp/MODEL_GARDEN.md +++ b/official/nlp/MODEL_GARDEN.md 
@@ -2,53 +2,69 @@ ## Introduction -This TF-NLP library provides a collection of scripts for the training and -evaluation of transformer-based models, on various tasks such as sentence +The TF-NLP library provides a collection of scripts for training and +evaluating transformer-based models on various tasks such as sentence classification, question answering, and translation. Additionally, we provide checkpoints of pretrained models which can be finetuned on downstream tasks. ### How to Train Models -Model Garden can be easily installed using PIP -(`pip install tf-models-nightly`). After installation, check out +Model Garden can be easily installed with +`pip install tf-models-nightly`. After installation, check out [this instruction](https://github.com/tensorflow/models/blob/master/official/nlp/docs/train.md) on how to train models with this codebase. -## Available Tasks -There are two available model configs (we will add more) under -`configs/experiments/`: +By default, the experiment runs on GPUs. To run on TPUs, override +`runtime.distribution_strategy` and set the TPU address. See [RuntimeConfig](https://github.com/tensorflow/models/blob/master/official/core/config_definitions.py) for details. + +In general, an experiment can be run with the following command by setting the +corresponding `${EXPERIMENT}`, `${TASK_CONFIG}`, `${MODEL_CONFIG}`, and `${EXTRA_PARAMS}`. +``` +EXPERIMENT=??? +TASK_CONFIG=??? +MODEL_CONFIG=??? +EXTRA_PARAMS=??? +MODEL_DIR=??? # a-folder-to-hold-checkpoints-and-logs +python3 train.py \ + --experiment=${EXPERIMENT} \ + --mode=train_and_eval \ + --model_dir=${MODEL_DIR} \ + --config_file=${TASK_CONFIG} \ + --config_file=${MODEL_CONFIG} \ + --params_override=${EXTRA_PARAMS} +``` + +* `EXPERIMENT` can be found under `configs/` +* `TASK_CONFIG` can be found under `configs/experiments/` +* `MODEL_CONFIG` can be found under `configs/models/` + +#### Order of params override: +1. `train.py` looks up the registered `ExperimentConfig` with `${EXPERIMENT}` +2.
Overrides params in `TaskConfig` with `${TASK_CONFIG}` +3. Overrides the `model` params in `TaskConfig` with `${MODEL_CONFIG}` +4. Overrides any params in `ExperimentConfig` with `${EXTRA_PARAMS}` + +Note that +1. `${TASK_CONFIG}`, `${MODEL_CONFIG}`, and `${EXTRA_PARAMS}` are optional when the defaults of `${EXPERIMENT}` are sufficient. +2. `${TASK_CONFIG}`, `${MODEL_CONFIG}`, and `${EXTRA_PARAMS}` are only guaranteed to be compatible with the `${EXPERIMENT}` that defines them. + +## Experiments + +| NAME | EXPERIMENT | TASK_CONFIG | MODEL_CONFIG | EXTRA_PARAMS | +| ----------------- | ------------------------ | ------- | -------- | ----------- | +| BERT-base GLUE/MNLI-matched finetune | [bert/sentence_prediction](https://github.com/tensorflow/models/blob/master/official/nlp/configs/finetuning_experiments.py) | [glue_mnli_matched.yaml](https://github.com/tensorflow/models/blob/master/official/nlp/configs/experiments/glue_mnli_matched.yaml) | [bert_en_uncased_base.yaml](https://github.com/tensorflow/models/blob/master/official/nlp/configs/models/bert_en_uncased_base.yaml) |
data and bert-base hub init: `task.train_data.input_path=/path-to-your-training-data,task.validation_data.input_path=/path-to-your-val-data,task.hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4`
| +| BERT-base GLUE/MNLI-matched finetune | [bert/sentence_prediction](https://github.com/tensorflow/models/blob/master/official/nlp/configs/finetuning_experiments.py) | [glue_mnli_matched.yaml](https://github.com/tensorflow/models/blob/master/official/nlp/configs/experiments/glue_mnli_matched.yaml) | [bert_en_uncased_base.yaml](https://github.com/tensorflow/models/blob/master/official/nlp/configs/models/bert_en_uncased_base.yaml) |
data and bert-base ckpt init: `task.train_data.input_path=/path-to-your-training-data,task.validation_data.input_path=/path-to-your-val-data,task.init_checkpoint=gs://tf_model_garden/nlp/bert/uncased_L-12_H-768_A-12/bert_model.ckpt`
| +| BERT-base SQuAD v1.1 finetune | [bert/squad](https://github.com/tensorflow/models/blob/master/official/nlp/configs/finetuning_experiments.py) | [squad_v1.yaml](https://github.com/tensorflow/models/blob/master/official/nlp/configs/experiments/squad_v1.yaml) | [bert_en_uncased_base.yaml](https://github.com/tensorflow/models/blob/master/official/nlp/configs/models/bert_en_uncased_base.yaml) |
data and bert-base hub init: `task.train_data.input_path=/path-to-your-training-data,task.validation_data.input_path=/path-to-your-val-data,task.hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4`
| +| ALBERT-base SQuAD v1.1 finetune | [bert/squad](https://github.com/tensorflow/models/blob/master/official/nlp/configs/finetuning_experiments.py) | [squad_v1.yaml](https://github.com/tensorflow/models/blob/master/official/nlp/configs/experiments/squad_v1.yaml) | [albert_base.yaml](https://github.com/tensorflow/models/blob/master/official/nlp/configs/models/albert_base.yaml) |
data and albert-base hub init: `task.train_data.input_path=/path-to-your-training-data,task.validation_data.input_path=/path-to-your-val-data,task.hub_module_url=https://tfhub.dev/tensorflow/albert_en_base/3`
| +| Transformer-large WMT14/en-de scratch |[wmt_transformer/large](https://github.com/tensorflow/models/blob/master/official/nlp/configs/wmt_transformer_experiments.py)| | |
ende-32k sentencepiece: `task.sentencepiece_model_path='gs://tf_model_garden/nlp/transformer_wmt/ende_bpe_32k.model'`
| -| Dataset | Task | Config | Example command | -| ----------------- | ------------------------ | ------- | ---- | -| GLUE/MNLI-matched | bert/sentence_prediction | [glue_mnli_matched.yaml](https://github.com/tensorflow/models/blob/master/official/nlp/configs/experiments/glue_mnli_matched.yaml) |
finetune BERT-base on this task PARAMS=runtime.distribution_strategy=mirrored
PARAMS=${PARAMS},task.train_data.input_path=/path-to-your-training-data/
PARAMS=${PARAMS},task.hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4

python3 train.py \\
--experiment=bert/sentence_prediction \\
--mode=train \\
--model_dir=/a-folder-to-hold-checkpoints-and-logs/ \\
--config_file=configs/models/bert_en_uncased_base.yaml \\
--config_file=configs/experiments/glue_mnli_matched.yaml \\
--params_override=${PARAMS}
| -| SQuAD v1.1 | bert/squad | [squad_v1.yaml](https://github.com/tensorflow/models/blob/master/official/nlp/configs/experiments/squad_v1.yaml) |
finetune BERT-base on this task PARAMS=runtime.distribution_strategy=mirrored
PARAMS=${PARAMS},task.train_data.input_path=/path-to-your-training-data/
PARAMS=${PARAMS},task.hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4

python3 train.py \\
--experiment=bert/squad \\
--mode=train \\
--model_dir=/a-folder-to-hold-checkpoints-and-logs/ \\
--config_file=configs/models/bert_en_uncased_base.yaml \\
--config_file=configs/experiments/squad_v1.yaml \\
--params_override=${PARAMS}
| - -One example on how to use the config file: if you want to work on the SQuAD -question answering task, set -`--config_file=configs/experiments/squad_v1.yaml` and -`--experiment=bert/squad` -as arguments to `train.py`. - -## Available Model Configs - -There are two available model configs (we will add more) under -`configs/models/`: - -| Model | Config | Pretrained checkpoint & Vocabulary | TF-HUB SavedModel | Example command | -| ------------ | ------- | ---------------------------------- | ----------------- | --------------- | -| BERT-base | [bert_en_uncased_base.yaml](https://github.com/tensorflow/models/blob/master/official/nlp/configs/models/bert_en_uncased_base.yaml) | [uncased_L-12_H-768_A-12](https://storage.googleapis.com/tf_model_garden/nlp/bert/v3/uncased_L-12_H-768_A-12.tar.gz) | [uncased_L-12_H-768_A-12](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/) |
finetune on SQuAD v1.1 PARAMS=runtime.distribution_strategy=mirrored
PARAMS=${PARAMS},task.train_data.input_path=/path-to-your-training-data/
PARAMS=${PARAMS},task.hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4

python3 train.py \\
--experiment=bert/squad \\
--mode=train \\
--model_dir=/a-folder-to-hold-checkpoints-and-logs/ \\
--config_file=configs/models/bert_en_uncased_base.yaml \\
--config_file=configs/experiments/squad_v1.yaml \\
--params_override=${PARAMS}
| -| ALBERT-base | [albert_base.yaml](https://github.com/tensorflow/models/blob/master/official/nlp/configs/models/albert_base.yaml) | [albert_en_base](https://storage.googleapis.com/tf_model_garden/nlp/albert/albert_base.tar.gz) | [albert_en_base](https://tfhub.dev/tensorflow/albert_en_base/3) |
finetune on SQuAD v1.1 PARAMS=runtime.distribution_strategy=mirrored
PARAMS=${PARAMS},task.train_data.input_path=/path-to-your-training-data/
PARAMS=${PARAMS},task.hub_module_url=https://tfhub.dev/tensorflow/albert_en_base/3

python3 train.py \\
--experiment=bert/squad \\
--mode=train \\
--model_dir=/a-folder-to-hold-checkpoints-and-logs/ \\
--config_file=configs/models/albert_base.yaml \\
--config_file=configs/experiments/squad_v1.yaml \\
--params_override=${PARAMS}
| - -One example on how to use the config file: if you want to train an ALBERT-base -model, set `--config_file=configs/models/albert_base.yaml` as an argument to -`train.py`. ## Useful links [How to Train Models](https://github.com/tensorflow/models/blob/master/official/nlp/docs/train.md) -[List of Pretrained Models](https://github.com/tensorflow/models/blob/master/official/nlp/docs/pretrained_models.md) +[List of Pretrained Models for finetuning](https://github.com/tensorflow/models/blob/master/official/nlp/docs/pretrained_models.md) [How to Publish Models](https://github.com/tensorflow/models/blob/master/official/nlp/docs/tfhub.md) diff --git a/official/nlp/README.md b/official/nlp/README.md index 6b095251f65a9cda91aa32fc5582f27e0bda64e4..46ac10f814a1818cdb4d271c8ed14463e44b9e4f 100644 --- a/official/nlp/README.md +++ b/official/nlp/README.md @@ -1,28 +1,33 @@ -# TensorFlow NLP Modelling Toolkit +# TF-NLP Model Garden -This codebase provides a Natrual Language Processing modeling toolkit written in +⚠️ Disclaimer: All datasets hyperlinked from this page are not owned or +distributed by Google. The dataset is made available by third parties. Please +review the terms and conditions made available by the third parties before using +the data. + +This codebase provides a Natural Language Processing modeling toolkit written in [TF2](https://www.tensorflow.org/guide/effective_tf2). It allows researchers and developers to reproduce state-of-the-art model results and train custom models to experiment new research ideas. 
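The "Order of params override" list in `MODEL_GARDEN.md` describes a layered configuration. The registered `ExperimentConfig` supplies the defaults, then `${TASK_CONFIG}`, `${MODEL_CONFIG}` and `${EXTRA_PARAMS}` are applied in turn, and later layers win. The following is a minimal stdlib-only sketch of that precedence, assuming plain nested dicts in place of the real config classes. It also assumes a simplified parser for the comma-separated, dotted `key=value` strings shown in the `EXTRA_PARAMS` column. All keys and values are hypothetical. The real `--params_override` parsing is not reproduced: this toy keeps values as strings and cannot handle commas inside values.

```python
import copy
from typing import Any, Dict


def merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlays `override` onto a copy of `base`."""
    out = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out


def parse_overrides(flat: str) -> Dict[str, Any]:
    """Turns 'a.b=1,a.c=x' into {'a': {'b': '1', 'c': 'x'}} (values stay str)."""
    result: Dict[str, Any] = {}
    for item in filter(None, flat.split(",")):
        dotted, value = item.split("=", 1)
        *parents, leaf = dotted.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


# Hypothetical layers, lowest to highest precedence.
experiment = {"task": {"init_checkpoint": "", "model": {"num_layers": 12}},
              "trainer": {"train_steps": 1000}}
task_yaml = {"task": {"train_data": {"input_path": ""}}}
model_yaml = {"task": {"model": {"num_layers": 24}}}
extra = parse_overrides(
    "task.train_data.input_path=/data/train,trainer.train_steps=5")

config = experiment
for layer in (task_yaml, model_yaml, extra):  # later layers win
    config = merge(config, layer)
print(config)
```

Because `merge` deep-copies its base, the registered defaults (`experiment` here) are left untouched by each override layer.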
## Features -* Reusable and modularized modeling building blocks -* State-of-the-art reproducible -* Easy to customize and extend -* End-to-end training -* Distributed trainable on both GPUs and TPUs +* Reusable and modularized modeling building blocks +* State-of-the-art reproducible +* Easy to customize and extend +* End-to-end training +* Distributed trainable on both GPUs and TPUs ## Major components ### Libraries We provide modeling library to allow users to train custom models for new -research ideas. Detailed intructions can be found in READMEs in each folder. +research ideas. Detailed instructions can be found in READMEs in each folder. * [modeling/](modeling): modeling library that provides building blocks (e.g.,Layers, Networks, and Models) that can be assembled into - transformer-based achitectures . + transformer-based architectures. * [data/](data): binaries and utils for input preprocessing, tokenization, etc. @@ -30,27 +35,29 @@ research ideas. Detailed intructions can be found in READMEs in each folder. We provide SoTA model implementations, pre-trained models, training and evaluation examples, and command lines. Detail instructions can be found in the -READMEs for specific papers. +READMEs for specific papers. Below are some papers implemented in the repository +and more NLP projects can be found in the +[`projects`](https://github.com/tensorflow/models/tree/master/official/projects) +folder: -1. [BERT](MODEL_GARDEN.md#available-model-configs): [BERT: Pre-training of Deep Bidirectional Transformers for - Language Understanding](https://arxiv.org/abs/1810.04805) by Devlin et al., - 2018 +1. [BERT](MODEL_GARDEN.md#available-model-configs): [BERT: Pre-training of Deep + Bidirectional Transformers for Language + Understanding](https://arxiv.org/abs/1810.04805) by Devlin et al., 2018 2. 
[ALBERT](MODEL_GARDEN.md#available-model-configs): [A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942) by Lan et al., 2019 -3. [XLNet](xlnet): +3. [XLNet](MODEL_GARDEN.md): [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Yang et al., 2019 -4. [Transformer for translation](transformer): +4. [Transformer for translation](MODEL_GARDEN.md#available-model-configs): [Attention Is All You Need](https://arxiv.org/abs/1706.03762) by Vaswani et al., 2017 ### Common Training Driver We provide a single common driver [train.py](train.py) to train above SoTA -models on popluar tasks. Please see [docs/train.md](docs/train.md) for -more details. - +models on popular tasks. Please see [docs/train.md](docs/train.md) for more +details. ### Pre-trained models with checkpoints and TF-Hub diff --git a/official/nlp/__init__.py b/official/nlp/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/nlp/__init__.py +++ b/official/nlp/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/bert/README.md b/official/nlp/bert/README.md deleted file mode 100644 index 037ff0b1ff8c6ea22bcf692bb8f786320b7d2d48..0000000000000000000000000000000000000000 --- a/official/nlp/bert/README.md +++ /dev/null @@ -1,395 +0,0 @@ -# BERT (Bidirectional Encoder Representations from Transformers) - -**WARNING**: We are on the way to deprecate most of the code in this directory. -Please see -[this link](https://github.com/tensorflow/models/blob/master/official/nlp/docs/train.md) -for the new tutorial and use the new code in `nlp/modeling`. 
This README is -still correct for this legacy implementation. - -The academic paper which describes BERT in detail and provides full results on a -number of tasks can be found here: https://arxiv.org/abs/1810.04805. - -This repository contains TensorFlow 2.x implementation for BERT. - -## Contents - * [Contents](#contents) - * [Pre-trained Models](#pre-trained-models) - * [Restoring from Checkpoints](#restoring-from-checkpoints) - * [Set Up](#set-up) - * [Process Datasets](#process-datasets) - * [Fine-tuning with BERT](#fine-tuning-with-bert) - * [Cloud GPUs and TPUs](#cloud-gpus-and-tpus) - * [Sentence and Sentence-pair Classification Tasks](#sentence-and-sentence-pair-classification-tasks) - * [SQuAD 1.1](#squad-1.1) - - -## Pre-trained Models - -We released both checkpoints and tf.hub modules as the pretrained models for -fine-tuning. They are TF 2.x compatible and are converted from the checkpoints -released in TF 1.x official BERT repository -[google-research/bert](https://github.com/google-research/bert) -in order to keep consistent with BERT paper. - - -### Access to Pretrained Checkpoints - -Pretrained checkpoints can be found in the following links: - -**Note: We have switched BERT implementation -to use Keras functional-style networks in [nlp/modeling](../modeling). 
-The new checkpoints are:** - -* **[`BERT-Large, Uncased (Whole Word Masking)`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/wwm_uncased_L-24_H-1024_A-16.tar.gz)**: - 24-layer, 1024-hidden, 16-heads, 340M parameters -* **[`BERT-Large, Cased (Whole Word Masking)`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/wwm_cased_L-24_H-1024_A-16.tar.gz)**: - 24-layer, 1024-hidden, 16-heads, 340M parameters -* **[`BERT-Base, Uncased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12.tar.gz)**: - 12-layer, 768-hidden, 12-heads, 110M parameters -* **[`BERT-Large, Uncased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16.tar.gz)**: - 24-layer, 1024-hidden, 16-heads, 340M parameters -* **[`BERT-Base, Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/cased_L-12_H-768_A-12.tar.gz)**: - 12-layer, 768-hidden, 12-heads , 110M parameters -* **[`BERT-Large, Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/cased_L-24_H-1024_A-16.tar.gz)**: - 24-layer, 1024-hidden, 16-heads, 340M parameters -* **[`BERT-Base, Multilingual Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/multi_cased_L-12_H-768_A-12.tar.gz)**: - 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters - -We recommend to host checkpoints on Google Cloud storage buckets when you use -Cloud GPU/TPU. - -### Restoring from Checkpoints - -`tf.train.Checkpoint` is used to manage model checkpoints in TF 2. To restore -weights from provided pre-trained checkpoints, you can use the following code: - -```python -init_checkpoint='the pretrained model checkpoint path.' -model=tf.keras.Model() # Bert pre-trained model as feature extractor. -checkpoint = tf.train.Checkpoint(model=model) -checkpoint.restore(init_checkpoint) -``` - -Checkpoints featuring native serialized Keras models -(i.e. 
model.load()/load_weights()) will be available soon. - -### Access to Pretrained hub modules. - -Pretrained tf.hub modules in TF 2.x SavedModel format can be found in the -following links: - -* **[`BERT-Large, Uncased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_uncased_L-24_H-1024_A-16/)**: - 24-layer, 1024-hidden, 16-heads, 340M parameters -* **[`BERT-Large, Cased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/)**: - 24-layer, 1024-hidden, 16-heads, 340M parameters -* **[`BERT-Base, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/)**: - 12-layer, 768-hidden, 12-heads, 110M parameters -* **[`BERT-Large, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/)**: - 24-layer, 1024-hidden, 16-heads, 340M parameters -* **[`BERT-Base, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/)**: - 12-layer, 768-hidden, 12-heads , 110M parameters -* **[`BERT-Large, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-24_H-1024_A-16/)**: - 24-layer, 1024-hidden, 16-heads, 340M parameters -* **[`BERT-Base, Multilingual Cased`](https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/)**: - 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters -* **[`BERT-Base, Chinese`](https://tfhub.dev/tensorflow/bert_zh_L-12_H-768_A-12/)**: - Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, - 110M parameters - -## Set Up - -```shell -export PYTHONPATH="$PYTHONPATH:/path/to/models" -``` - -Install `tf-nightly` to get latest updates: - -```shell -pip install tf-nightly-gpu -``` - -With TPU, GPU support is not necessary. 
First, you need to create a `tf-nightly` -TPU with [ctpu tool](https://github.com/tensorflow/tpu/tree/master/tools/ctpu): - -```shell -ctpu up -name --tf-version=”nightly” -``` - -Second, you need to install TF 2 `tf-nightly` on your VM: - -```shell -pip install tf-nightly -``` - -## Process Datasets - -### Pre-training - -There is no change to generate pre-training data. Please use the script -[`../data/create_pretraining_data.py`](../data/create_pretraining_data.py) -which is essentially branched from [BERT research repo](https://github.com/google-research/bert) -to get processed pre-training data and it adapts to TF2 symbols and python3 -compatibility. - -Running the pre-training script requires an input and output directory, as well as a vocab file. Note that max_seq_length will need to match the sequence length parameter you specify when you run pre-training. - -Example shell script to call create_pretraining_data.py -``` -export WORKING_DIR='local disk or cloud location' -export BERT_DIR='local disk or cloud location' -python models/official/nlp/data/create_pretraining_data.py \ - --input_file=$WORKING_DIR/input/input.txt \ - --output_file=$WORKING_DIR/output/tf_examples.tfrecord \ - --vocab_file=$BERT_DIR/wwm_uncased_L-24_H-1024_A-16/vocab.txt \ - --do_lower_case=True \ - --max_seq_length=512 \ - --max_predictions_per_seq=76 \ - --masked_lm_prob=0.15 \ - --random_seed=12345 \ - --dupe_factor=5 -``` - -### Fine-tuning - -To prepare the fine-tuning data for final model training, use the -[`../data/create_finetuning_data.py`](../data/create_finetuning_data.py) script. -Resulting datasets in `tf_record` format and training meta data should be later -passed to training or evaluation scripts. 
The task-specific arguments are -described in following sections: - -* GLUE - -Users can download the -[GLUE data](https://gluebenchmark.com/tasks) by running -[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e) -and unpack it to some directory `$GLUE_DIR`. -Also, users can download [Pretrained Checkpoint](#access-to-pretrained-checkpoints) and locate on some directory `$BERT_DIR` instead of using checkpoints on Google Cloud Storage. - -```shell -export GLUE_DIR=~/glue -export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16 - -export TASK_NAME=MNLI -export OUTPUT_DIR=gs://some_bucket/datasets -python ../data/create_finetuning_data.py \ - --input_data_dir=${GLUE_DIR}/${TASK_NAME}/ \ - --vocab_file=${BERT_DIR}/vocab.txt \ - --train_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \ - --eval_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \ - --meta_data_file_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \ - --fine_tuning_task_type=classification --max_seq_length=128 \ - --classification_task_name=${TASK_NAME} -``` - -* SQUAD - -The [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/) contains -detailed information about the SQuAD datasets and evaluation. 
- -The necessary files can be found here: - -* [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json) -* [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json) -* [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py) -* [train-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json) -* [dev-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json) -* [evaluate-v2.0.py](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/) - -```shell -export SQUAD_DIR=~/squad -export SQUAD_VERSION=v1.1 -export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16 -export OUTPUT_DIR=gs://some_bucket/datasets - -python ../data/create_finetuning_data.py \ - --squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \ - --vocab_file=${BERT_DIR}/vocab.txt \ - --train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \ - --meta_data_file_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \ - --fine_tuning_task_type=squad --max_seq_length=384 -``` - -Note: To create fine-tuning data with SQUAD 2.0, you need to add flag `--version_2_with_negative=True`. - -## Fine-tuning with BERT - -### Cloud GPUs and TPUs - -* Cloud Storage - -The unzipped pre-trained model files can also be found in the Google Cloud -Storage folder `gs://cloud-tpu-checkpoints/bert/keras_bert`. For example: - -```shell -export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16 -export MODEL_DIR=gs://some_bucket/my_output_dir -``` - -Currently, users are able to access to `tf-nightly` TPUs and the following TPU -script should run with `tf-nightly`. 
- -* GPU -> TPU - -Just add the following flags to `run_classifier.py` or `run_squad.py`: - -```shell - --distribution_strategy=tpu - --tpu=grpc://${TPU_IP_ADDRESS}:8470 -``` - -### Sentence and Sentence-pair Classification Tasks - -This example code fine-tunes `BERT-Large` on the Microsoft Research Paraphrase -Corpus (MRPC) corpus, which only contains 3,600 examples and can fine-tune in a -few minutes on most GPUs. - -We use the `BERT-Large` (uncased_L-24_H-1024_A-16) as an example throughout the -workflow. -For GPU memory of 16GB or smaller, you may try to use `BERT-Base` -(uncased_L-12_H-768_A-12). - -```shell -export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16 -export MODEL_DIR=gs://some_bucket/my_output_dir -export GLUE_DIR=gs://some_bucket/datasets -export TASK=MRPC - -python run_classifier.py \ - --mode='train_and_eval' \ - --input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \ - --train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \ - --eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \ - --bert_config_file=${BERT_DIR}/bert_config.json \ - --init_checkpoint=${BERT_DIR}/bert_model.ckpt \ - --train_batch_size=4 \ - --eval_batch_size=4 \ - --steps_per_loop=1 \ - --learning_rate=2e-5 \ - --num_train_epochs=3 \ - --model_dir=${MODEL_DIR} \ - --distribution_strategy=mirrored -``` - -Alternatively, instead of specifying `init_checkpoint`, you can specify -`hub_module_url` to employ a pretraind BERT hub module, e.g., -` --hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/1`. - -After training a model, to get predictions from the classifier, you can set the -`--mode=predict` and offer the test set tfrecords to `--eval_data_path`. -Output will be created in file called test_results.tsv in the output folder. -Each line will contain output for each sample, columns are the class -probabilities. 
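`test_results.tsv` holds one row of class probabilities per example, so hard labels come from taking the argmax of each row. The sketch below assumes the file is tab-separated, with no header row and no example-ID column. Check a real output file before relying on it. The function names are hypothetical.

```python
import csv
from typing import Iterable, List, TextIO


def read_class_probabilities(f: TextIO) -> List[List[float]]:
    """Parses one row of tab-separated class probabilities per example."""
    # Assumption: no header row and no ID column; blank lines are skipped.
    return [[float(x) for x in row] for row in csv.reader(f, delimiter="\t") if row]


def argmax_labels(rows: Iterable[List[float]]) -> List[int]:
    """Returns the index of the most probable class for each example (first wins ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


def load_predicted_labels(path: str) -> List[int]:
    """Reads test_results.tsv from a local path and returns predicted class ids."""
    with open(path, newline="") as f:
        return argmax_labels(read_class_probabilities(f))
```

The parsing and argmax steps are kept separate so the probabilities stay available, for example for thresholding. If `MODEL_DIR` is a `gs://` path, `tf.io.gfile.GFile` should be usable in place of the built-in `open`.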
- -```shell -python run_classifier.py \ - --mode='predict' \ - --input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \ - --eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \ - --bert_config_file=${BERT_DIR}/bert_config.json \ - --eval_batch_size=4 \ - --model_dir=${MODEL_DIR} \ - --distribution_strategy=mirrored -``` - -To use TPU, you only need to switch distribution strategy type to `tpu` with TPU -information and use remote storage for model checkpoints. - -```shell -export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16 -export TPU_IP_ADDRESS='???' -export MODEL_DIR=gs://some_bucket/my_output_dir -export GLUE_DIR=gs://some_bucket/datasets -export TASK=MRPC - -python run_classifier.py \ - --mode='train_and_eval' \ - --input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \ - --train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \ - --eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \ - --bert_config_file=${BERT_DIR}/bert_config.json \ - --init_checkpoint=${BERT_DIR}/bert_model.ckpt \ - --train_batch_size=32 \ - --eval_batch_size=32 \ - --steps_per_loop=1000 \ - --learning_rate=2e-5 \ - --num_train_epochs=3 \ - --model_dir=${MODEL_DIR} \ - --distribution_strategy=tpu \ - --tpu=grpc://${TPU_IP_ADDRESS}:8470 -``` - -Note that, we specify `steps_per_loop=1000` for TPU, because running a loop of -training steps inside a `tf.function` can significantly increase TPU utilization -and callbacks will not be called inside the loop. - -### SQuAD 1.1 - -The Stanford Question Answering Dataset (SQuAD) is a popular question answering -benchmark dataset. See more in [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/). - -We use the `BERT-Large` (uncased_L-24_H-1024_A-16) as an example throughout the -workflow. -For GPU memory of 16GB or smaller, you may try to use `BERT-Base` -(uncased_L-12_H-768_A-12). 
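SQuAD v1.1 results are usually reported as exact match (EM) and token-level F1, as computed by the `evaluate-v1.1.py` script linked earlier. The sketch below re-implements those two metrics in plain Python to show how they work. It follows that script's answer normalization: lowercasing, removing punctuation and the articles a/an/the, and collapsing whitespace. Each question is scored against its best-matching gold answer. This is an illustrative approximation, so use the official script for any numbers you report. The function names are hypothetical.

```python
import re
import string
from collections import Counter
from typing import Dict, List

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercases, strips punctuation and articles, and collapses whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match_score(prediction: str, ground_truth: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions: Dict[str, str],
             references: Dict[str, List[str]]) -> Dict[str, float]:
    """Corpus-level EM and F1 in percent; each question takes its best gold answer."""
    em = f1 = 0.0
    for qid, answers in references.items():
        if qid not in predictions:
            continue  # Unanswered questions contribute 0 to both metrics.
        pred = predictions[qid]
        em += max(exact_match_score(pred, a) for a in answers)
        f1 += max(f1_score(pred, a) for a in answers)
    n = len(references)
    return {"exact_match": 100.0 * em / n, "f1": 100.0 * f1 / n}
```

F1 gives partial credit. A prediction such as "Eiffel Tower in Paris" scores 0 EM against "the Eiffel Tower" but 2/3 F1, which is why the two metrics are reported together.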
- -```shell -export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16 -export SQUAD_DIR=gs://some_bucket/datasets -export MODEL_DIR=gs://some_bucket/my_output_dir -export SQUAD_VERSION=v1.1 - -python run_squad.py \ - --input_meta_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_meta_data \ - --train_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_train.tf_record \ - --predict_file=${SQUAD_DIR}/dev-v1.1.json \ - --vocab_file=${BERT_DIR}/vocab.txt \ - --bert_config_file=${BERT_DIR}/bert_config.json \ - --init_checkpoint=${BERT_DIR}/bert_model.ckpt \ - --train_batch_size=4 \ - --predict_batch_size=4 \ - --learning_rate=8e-5 \ - --num_train_epochs=2 \ - --model_dir=${MODEL_DIR} \ - --distribution_strategy=mirrored -``` - -Similarily, you can replace `init_checkpoint` FLAG with `hub_module_url` to -specify a hub module path. - -`run_squad.py` writes the prediction for `--predict_file` by default. If you set -the `--model=predict` and offer the SQuAD test data, the scripts will generate -the prediction json file. - -To use TPU, you need switch distribution strategy type to `tpu` with TPU -information. - -```shell -export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16 -export TPU_IP_ADDRESS='???' 
-export MODEL_DIR=gs://some_bucket/my_output_dir -export SQUAD_DIR=gs://some_bucket/datasets -export SQUAD_VERSION=v1.1 - -python run_squad.py \ - --input_meta_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_meta_data \ - --train_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_train.tf_record \ - --predict_file=${SQUAD_DIR}/dev-v1.1.json \ - --vocab_file=${BERT_DIR}/vocab.txt \ - --bert_config_file=${BERT_DIR}/bert_config.json \ - --init_checkpoint=${BERT_DIR}/bert_model.ckpt \ - --train_batch_size=32 \ - --learning_rate=8e-5 \ - --num_train_epochs=2 \ - --model_dir=${MODEL_DIR} \ - --distribution_strategy=tpu \ - --tpu=grpc://${TPU_IP_ADDRESS}:8470 -``` - -The dev set predictions will be saved into a file called predictions.json in the -model_dir: - -```shell -python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ./squad/predictions.json -``` - - diff --git a/official/nlp/bert/__init__.py b/official/nlp/bert/__init__.py deleted file mode 100644 index a25710c222e3327cb20e000db5df5c5651c4a2cc..0000000000000000000000000000000000000000 --- a/official/nlp/bert/__init__.py +++ /dev/null @@ -1,15 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- - diff --git a/official/nlp/bert/common_flags.py b/official/nlp/bert/common_flags.py deleted file mode 100644 index f622ab1e2f45b4d33af8e13230580cbb08d33820..0000000000000000000000000000000000000000 --- a/official/nlp/bert/common_flags.py +++ /dev/null @@ -1,125 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Defining common flags used across all BERT models/applications.""" - -from absl import flags -import tensorflow as tf - -from official.utils import hyperparams_flags -from official.utils.flags import core as flags_core - - -def define_common_bert_flags(): - """Define common flags for BERT tasks.""" - flags_core.define_base( - data_dir=False, - model_dir=True, - clean=False, - train_epochs=False, - epochs_between_evals=False, - stop_threshold=False, - batch_size=False, - num_gpu=True, - export_dir=False, - distribution_strategy=True, - run_eagerly=True) - flags_core.define_distribution() - flags.DEFINE_string('bert_config_file', None, - 'Bert configuration file to define core bert layers.') - flags.DEFINE_string( - 'model_export_path', None, - 'Path to the directory, where trainined model will be ' - 'exported.') - flags.DEFINE_string('tpu', '', 'TPU address to connect to.') - flags.DEFINE_string( - 'init_checkpoint', None, - 'Initial checkpoint (usually from a pre-trained BERT model).') - flags.DEFINE_integer('num_train_epochs', 3, - 'Total number of training epochs to perform.') - 
flags.DEFINE_integer( - 'steps_per_loop', None, - 'Number of steps per graph-mode loop. Only training step ' - 'happens inside the loop. Callbacks will not be called ' - 'inside. If not set the value will be configured depending on the ' - 'devices available.') - flags.DEFINE_float('learning_rate', 5e-5, - 'The initial learning rate for Adam.') - flags.DEFINE_float('end_lr', 0.0, - 'The end learning rate for learning rate decay.') - flags.DEFINE_string('optimizer_type', 'adamw', - 'The type of optimizer to use for training (adamw|lamb)') - flags.DEFINE_boolean( - 'scale_loss', False, - 'Whether to divide the loss by number of replica inside the per-replica ' - 'loss function.') - flags.DEFINE_boolean( - 'use_keras_compile_fit', False, - 'If True, uses Keras compile/fit() API for training logic. Otherwise ' - 'use custom training loop.') - flags.DEFINE_string( - 'hub_module_url', None, 'TF-Hub path/url to Bert module. ' - 'If specified, init_checkpoint flag should not be used.') - flags.DEFINE_bool('hub_module_trainable', True, - 'True to make keras layers in the hub module trainable.') - flags.DEFINE_string( - 'sub_model_export_name', None, - 'If set, `sub_model` checkpoints are exported into ' - 'FLAGS.model_dir/FLAGS.sub_model_export_name.') - flags.DEFINE_bool('explicit_allreduce', False, - 'True to use explicit allreduce instead of the implicit ' - 'allreduce in optimizer.apply_gradients(). If fp16 mixed ' - 'precision training is used, this also enables allreduce ' - 'gradients in fp16.') - flags.DEFINE_integer('allreduce_bytes_per_pack', 0, - 'Number of bytes of a gradient pack for allreduce. ' - 'Should be positive integer, if set to 0, all ' - 'gradients are in one pack. Breaking gradient into ' - 'packs could enable overlap between allreduce and ' - 'backprop computation. This flag only takes effect ' - 'when explicit_allreduce is set to True.') - - flags_core.define_log_steps() - - # Adds flags for mixed precision and multi-worker training. 
- flags_core.define_performance( - num_parallel_calls=False, - inter_op=False, - intra_op=False, - synthetic_data=False, - max_train_steps=False, - dtype=True, - loss_scale=True, - all_reduce_alg=True, - num_packs=False, - tf_gpu_thread_mode=True, - datasets_num_private_threads=True, - enable_xla=True, - fp16_implementation=True, - ) - - # Adds gin configuration flags. - hyperparams_flags.define_gin_flags() - - -def dtype(): - return flags_core.get_tf_dtype(flags.FLAGS) - - -def use_float16(): - return flags_core.get_tf_dtype(flags.FLAGS) == tf.float16 - - -def get_loss_scale(): - return flags_core.get_loss_scale(flags.FLAGS, default_for_fp16='dynamic') diff --git a/official/nlp/bert/configs.py b/official/nlp/bert/configs.py deleted file mode 100644 index 950c32d0bfad3e06f3d14baf042a916de2eb2828..0000000000000000000000000000000000000000 --- a/official/nlp/bert/configs.py +++ /dev/null @@ -1,104 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""The main BERT model and related functions.""" - -import copy -import json - -import six -import tensorflow as tf - - -class BertConfig(object): - """Configuration for `BertModel`.""" - - def __init__(self, - vocab_size, - hidden_size=768, - num_hidden_layers=12, - num_attention_heads=12, - intermediate_size=3072, - hidden_act="gelu", - hidden_dropout_prob=0.1, - attention_probs_dropout_prob=0.1, - max_position_embeddings=512, - type_vocab_size=16, - initializer_range=0.02, - embedding_size=None, - backward_compatible=True): - """Constructs BertConfig. - - Args: - vocab_size: Vocabulary size of `inputs_ids` in `BertModel`. - hidden_size: Size of the encoder layers and the pooler layer. - num_hidden_layers: Number of hidden layers in the Transformer encoder. - num_attention_heads: Number of attention heads for each attention layer in - the Transformer encoder. - intermediate_size: The size of the "intermediate" (i.e., feed-forward) - layer in the Transformer encoder. - hidden_act: The non-linear activation function (function or string) in the - encoder and pooler. - hidden_dropout_prob: The dropout probability for all fully connected - layers in the embeddings, encoder, and pooler. - attention_probs_dropout_prob: The dropout ratio for the attention - probabilities. - max_position_embeddings: The maximum sequence length that this model might - ever be used with. Typically set this to something large just in case - (e.g., 512 or 1024 or 2048). - type_vocab_size: The vocabulary size of the `token_type_ids` passed into - `BertModel`. - initializer_range: The stdev of the truncated_normal_initializer for - initializing all weight matrices. - embedding_size: (Optional) width of the factorized word embeddings. - backward_compatible: Boolean, whether the variables shape are compatible - with checkpoints converted from TF 1.x BERT. 
- """ - self.vocab_size = vocab_size - self.hidden_size = hidden_size - self.num_hidden_layers = num_hidden_layers - self.num_attention_heads = num_attention_heads - self.hidden_act = hidden_act - self.intermediate_size = intermediate_size - self.hidden_dropout_prob = hidden_dropout_prob - self.attention_probs_dropout_prob = attention_probs_dropout_prob - self.max_position_embeddings = max_position_embeddings - self.type_vocab_size = type_vocab_size - self.initializer_range = initializer_range - self.embedding_size = embedding_size - self.backward_compatible = backward_compatible - - @classmethod - def from_dict(cls, json_object): - """Constructs a `BertConfig` from a Python dictionary of parameters.""" - config = BertConfig(vocab_size=None) - for (key, value) in six.iteritems(json_object): - config.__dict__[key] = value - return config - - @classmethod - def from_json_file(cls, json_file): - """Constructs a `BertConfig` from a json file of parameters.""" - with tf.io.gfile.GFile(json_file, "r") as reader: - text = reader.read() - return cls.from_dict(json.loads(text)) - - def to_dict(self): - """Serializes this instance to a Python dictionary.""" - output = copy.deepcopy(self.__dict__) - return output - - def to_json_string(self): - """Serializes this instance to a JSON string.""" - return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n" diff --git a/official/nlp/bert/export_tfhub.py b/official/nlp/bert/export_tfhub.py deleted file mode 100644 index 833e7c10582f9252f59b3b7584a5bcca0b6f4991..0000000000000000000000000000000000000000 --- a/official/nlp/bert/export_tfhub.py +++ /dev/null @@ -1,139 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""A script to export BERT as a TF-Hub SavedModel. - -This script is **DEPRECATED** for exporting BERT encoder models; -see the error message in by main() for details. -""" - -from typing import Text - -# Import libraries -from absl import app -from absl import flags -from absl import logging -import tensorflow as tf -from official.nlp.bert import bert_models -from official.nlp.bert import configs - -FLAGS = flags.FLAGS - -flags.DEFINE_string("bert_config_file", None, - "Bert configuration file to define core bert layers.") -flags.DEFINE_string("model_checkpoint_path", None, - "File path to TF model checkpoint.") -flags.DEFINE_string("export_path", None, "TF-Hub SavedModel destination path.") -flags.DEFINE_string("vocab_file", None, - "The vocabulary file that the BERT model was trained on.") -flags.DEFINE_bool( - "do_lower_case", None, "Whether to lowercase. If None, " - "do_lower_case will be enabled if 'uncased' appears in the " - "name of --vocab_file") -flags.DEFINE_enum("model_type", "encoder", ["encoder", "squad"], - "What kind of BERT model to export.") - - -def create_bert_model(bert_config: configs.BertConfig) -> tf.keras.Model: - """Creates a BERT keras core model from BERT configuration. - - Args: - bert_config: A `BertConfig` to create the core model. - - Returns: - A keras model. - """ - # Adds input layers just as placeholders. 
- input_word_ids = tf.keras.layers.Input( - shape=(None,), dtype=tf.int32, name="input_word_ids") - input_mask = tf.keras.layers.Input( - shape=(None,), dtype=tf.int32, name="input_mask") - input_type_ids = tf.keras.layers.Input( - shape=(None,), dtype=tf.int32, name="input_type_ids") - transformer_encoder = bert_models.get_transformer_encoder( - bert_config, sequence_length=None) - sequence_output, pooled_output = transformer_encoder( - [input_word_ids, input_mask, input_type_ids]) - # To keep consistent with legacy hub modules, the outputs are - # "pooled_output" and "sequence_output". - return tf.keras.Model( - inputs=[input_word_ids, input_mask, input_type_ids], - outputs=[pooled_output, sequence_output]), transformer_encoder - - -def export_bert_tfhub(bert_config: configs.BertConfig, - model_checkpoint_path: Text, - hub_destination: Text, - vocab_file: Text, - do_lower_case: bool = None): - """Restores a tf.keras.Model and saves for TF-Hub.""" - # If do_lower_case is not explicit, default to checking whether "uncased" is - # in the vocab file name - if do_lower_case is None: - do_lower_case = "uncased" in vocab_file - logging.info("Using do_lower_case=%s based on name of vocab_file=%s", - do_lower_case, vocab_file) - core_model, encoder = create_bert_model(bert_config) - checkpoint = tf.train.Checkpoint( - model=encoder, # Legacy checkpoints. 
- encoder=encoder) - checkpoint.restore(model_checkpoint_path).assert_existing_objects_matched() - core_model.vocab_file = tf.saved_model.Asset(vocab_file) - core_model.do_lower_case = tf.Variable(do_lower_case, trainable=False) - core_model.save(hub_destination, include_optimizer=False, save_format="tf") - - -def export_bert_squad_tfhub(bert_config: configs.BertConfig, - model_checkpoint_path: Text, - hub_destination: Text, - vocab_file: Text, - do_lower_case: bool = None): - """Restores a tf.keras.Model for BERT with SQuAD and saves for TF-Hub.""" - # If do_lower_case is not explicit, default to checking whether "uncased" is - # in the vocab file name - if do_lower_case is None: - do_lower_case = "uncased" in vocab_file - logging.info("Using do_lower_case=%s based on name of vocab_file=%s", - do_lower_case, vocab_file) - span_labeling, _ = bert_models.squad_model(bert_config, max_seq_length=None) - checkpoint = tf.train.Checkpoint(model=span_labeling) - checkpoint.restore(model_checkpoint_path).assert_existing_objects_matched() - span_labeling.vocab_file = tf.saved_model.Asset(vocab_file) - span_labeling.do_lower_case = tf.Variable(do_lower_case, trainable=False) - span_labeling.save(hub_destination, include_optimizer=False, save_format="tf") - - -def main(_): - bert_config = configs.BertConfig.from_json_file(FLAGS.bert_config_file) - if FLAGS.model_type == "encoder": - deprecation_note = ( - "nlp/bert/export_tfhub is **DEPRECATED** for exporting BERT encoder " - "models. 
Please switch to nlp/tools/export_tfhub for exporting BERT " - "(and other) encoders with dict inputs/outputs conforming to " - "https://www.tensorflow.org/hub/common_saved_model_apis/text#transformer-encoders" - ) - logging.error(deprecation_note) - print("\n\nNOTICE:", deprecation_note, "\n") - export_bert_tfhub(bert_config, FLAGS.model_checkpoint_path, - FLAGS.export_path, FLAGS.vocab_file, FLAGS.do_lower_case) - elif FLAGS.model_type == "squad": - export_bert_squad_tfhub(bert_config, FLAGS.model_checkpoint_path, - FLAGS.export_path, FLAGS.vocab_file, - FLAGS.do_lower_case) - else: - raise ValueError("Unsupported model_type %s." % FLAGS.model_type) - - -if __name__ == "__main__": - app.run(main) diff --git a/official/nlp/bert/export_tfhub_test.py b/official/nlp/bert/export_tfhub_test.py deleted file mode 100644 index 77030dd3fde7d4c4d73bea0fdea017848b1e253f..0000000000000000000000000000000000000000 --- a/official/nlp/bert/export_tfhub_test.py +++ /dev/null @@ -1,108 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Tests official.nlp.bert.export_tfhub.""" - -import os - -from absl.testing import parameterized -import numpy as np -import tensorflow as tf -import tensorflow_hub as hub - -from official.nlp.bert import configs -from official.nlp.bert import export_tfhub - - -class ExportTfhubTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.parameters("model", "encoder") - def test_export_tfhub(self, ckpt_key_name): - # Exports a savedmodel for TF-Hub - hidden_size = 16 - bert_config = configs.BertConfig( - vocab_size=100, - hidden_size=hidden_size, - intermediate_size=32, - max_position_embeddings=128, - num_attention_heads=2, - num_hidden_layers=1) - bert_model, encoder = export_tfhub.create_bert_model(bert_config) - model_checkpoint_dir = os.path.join(self.get_temp_dir(), "checkpoint") - checkpoint = tf.train.Checkpoint(**{ckpt_key_name: encoder}) - checkpoint.save(os.path.join(model_checkpoint_dir, "test")) - model_checkpoint_path = tf.train.latest_checkpoint(model_checkpoint_dir) - - vocab_file = os.path.join(self.get_temp_dir(), "uncased_vocab.txt") - with tf.io.gfile.GFile(vocab_file, "w") as f: - f.write("dummy content") - - hub_destination = os.path.join(self.get_temp_dir(), "hub") - export_tfhub.export_bert_tfhub(bert_config, model_checkpoint_path, - hub_destination, vocab_file) - - # Restores a hub KerasLayer. - hub_layer = hub.KerasLayer(hub_destination, trainable=True) - - if hasattr(hub_layer, "resolved_object"): - # Checks meta attributes. - self.assertTrue(hub_layer.resolved_object.do_lower_case.numpy()) - with tf.io.gfile.GFile( - hub_layer.resolved_object.vocab_file.asset_path.numpy()) as f: - self.assertEqual("dummy content", f.read()) - # Checks the hub KerasLayer. 
- for source_weight, hub_weight in zip(bert_model.trainable_weights, - hub_layer.trainable_weights): - self.assertAllClose(source_weight.numpy(), hub_weight.numpy()) - - seq_length = 10 - dummy_ids = np.zeros((2, seq_length), dtype=np.int32) - hub_outputs = hub_layer([dummy_ids, dummy_ids, dummy_ids]) - source_outputs = bert_model([dummy_ids, dummy_ids, dummy_ids]) - - # The outputs of hub module are "pooled_output" and "sequence_output", - # while the outputs of encoder is in reversed order, i.e., - # "sequence_output" and "pooled_output". - encoder_outputs = reversed(encoder([dummy_ids, dummy_ids, dummy_ids])) - self.assertEqual(hub_outputs[0].shape, (2, hidden_size)) - self.assertEqual(hub_outputs[1].shape, (2, seq_length, hidden_size)) - for source_output, hub_output, encoder_output in zip( - source_outputs, hub_outputs, encoder_outputs): - self.assertAllClose(source_output.numpy(), hub_output.numpy()) - self.assertAllClose(source_output.numpy(), encoder_output.numpy()) - - # Test that training=True makes a difference (activates dropout). - def _dropout_mean_stddev(training, num_runs=20): - input_ids = np.array([[14, 12, 42, 95, 99]], np.int32) - inputs = [input_ids, np.ones_like(input_ids), np.zeros_like(input_ids)] - outputs = np.concatenate( - [hub_layer(inputs, training=training)[0] for _ in range(num_runs)]) - return np.mean(np.std(outputs, axis=0)) - - self.assertLess(_dropout_mean_stddev(training=False), 1e-6) - self.assertGreater(_dropout_mean_stddev(training=True), 1e-3) - - # Test propagation of seq_length in shape inference. 
- input_word_ids = tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32) - input_mask = tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32) - input_type_ids = tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32) - pooled_output, sequence_output = hub_layer( - [input_word_ids, input_mask, input_type_ids]) - self.assertEqual(pooled_output.shape.as_list(), [None, hidden_size]) - self.assertEqual(sequence_output.shape.as_list(), - [None, seq_length, hidden_size]) - - -if __name__ == "__main__": - tf.test.main() diff --git a/official/nlp/bert/run_classifier.py b/official/nlp/bert/run_classifier.py deleted file mode 100644 index b7ee5be8afe27549803ba22901d5e4a3cffc8cce..0000000000000000000000000000000000000000 --- a/official/nlp/bert/run_classifier.py +++ /dev/null @@ -1,515 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""BERT classification or regression finetuning runner in TF 2.x.""" - -import functools -import json -import math -import os - -# Import libraries -from absl import app -from absl import flags -from absl import logging -import gin -import tensorflow as tf -from official.common import distribute_utils -from official.modeling import performance -from official.nlp import optimization -from official.nlp.bert import bert_models -from official.nlp.bert import common_flags -from official.nlp.bert import configs as bert_configs -from official.nlp.bert import input_pipeline -from official.nlp.bert import model_saving_utils -from official.utils.misc import keras_utils - -flags.DEFINE_enum( - 'mode', 'train_and_eval', ['train_and_eval', 'export_only', 'predict'], - 'One of {"train_and_eval", "export_only", "predict"}. `train_and_eval`: ' - 'trains the model and evaluates in the meantime. ' - '`export_only`: will take the latest checkpoint inside ' - 'model_dir and export a `SavedModel`. `predict`: takes a checkpoint and ' - 'restores the model to output predictions on the test set.') -flags.DEFINE_string('train_data_path', None, - 'Path to training data for BERT classifier.') -flags.DEFINE_string('eval_data_path', None, - 'Path to evaluation data for BERT classifier.') -flags.DEFINE_string( - 'input_meta_data_path', None, - 'Path to file that contains meta data about input ' - 'to be used for training and evaluation.') -flags.DEFINE_integer('train_data_size', None, 'Number of training samples ' - 'to use. If None, uses the full train data. ' - '(default: None).') -flags.DEFINE_string('predict_checkpoint_path', None, - 'Path to the checkpoint for predictions.') -flags.DEFINE_integer( - 'num_eval_per_epoch', 1, - 'Number of evaluations per epoch. The purpose of this flag is to provide ' - 'more granular evaluation scores and checkpoints. 
For example, if original ' - 'data has N samples and num_eval_per_epoch is n, then each epoch will be ' - 'evaluated every N/n samples.') -flags.DEFINE_integer('train_batch_size', 32, 'Batch size for training.') -flags.DEFINE_integer('eval_batch_size', 32, 'Batch size for evaluation.') - -common_flags.define_common_bert_flags() - -FLAGS = flags.FLAGS - -LABEL_TYPES_MAP = {'int': tf.int64, 'float': tf.float32} - - -def get_loss_fn(num_classes): - """Gets the classification loss function.""" - - def classification_loss_fn(labels, logits): - """Classification loss.""" - labels = tf.reshape(labels, [-1]) - log_probs = tf.nn.log_softmax(logits, axis=-1) - one_hot_labels = tf.one_hot( - tf.cast(labels, dtype=tf.int32), depth=num_classes, dtype=tf.float32) - per_example_loss = -tf.reduce_sum( - tf.cast(one_hot_labels, dtype=tf.float32) * log_probs, axis=-1) - return tf.reduce_mean(per_example_loss) - - return classification_loss_fn - - -def get_dataset_fn(input_file_pattern, - max_seq_length, - global_batch_size, - is_training, - label_type=tf.int64, - include_sample_weights=False, - num_samples=None): - """Gets a closure to create a dataset.""" - - def _dataset_fn(ctx=None): - """Returns tf.data.Dataset for distributed BERT pretraining.""" - batch_size = ctx.get_per_replica_batch_size( - global_batch_size) if ctx else global_batch_size - dataset = input_pipeline.create_classifier_dataset( - tf.io.gfile.glob(input_file_pattern), - max_seq_length, - batch_size, - is_training=is_training, - input_pipeline_context=ctx, - label_type=label_type, - include_sample_weights=include_sample_weights, - num_samples=num_samples) - return dataset - - return _dataset_fn - - -def run_bert_classifier(strategy, - bert_config, - input_meta_data, - model_dir, - epochs, - steps_per_epoch, - steps_per_loop, - eval_steps, - warmup_steps, - initial_lr, - init_checkpoint, - train_input_fn, - eval_input_fn, - training_callbacks=True, - custom_callbacks=None, - custom_metrics=None): - """Run BERT 
classifier training using low-level API.""" - max_seq_length = input_meta_data['max_seq_length'] - num_classes = input_meta_data.get('num_labels', 1) - is_regression = num_classes == 1 - - def _get_classifier_model(): - """Gets a classifier model.""" - classifier_model, core_model = ( - bert_models.classifier_model( - bert_config, - num_classes, - max_seq_length, - hub_module_url=FLAGS.hub_module_url, - hub_module_trainable=FLAGS.hub_module_trainable)) - optimizer = optimization.create_optimizer(initial_lr, - steps_per_epoch * epochs, - warmup_steps, FLAGS.end_lr, - FLAGS.optimizer_type) - classifier_model.optimizer = performance.configure_optimizer( - optimizer, - use_float16=common_flags.use_float16()) - return classifier_model, core_model - - # tf.keras.losses objects accept optional sample_weight arguments (eg. coming - # from the dataset) to compute weighted loss, as used for the regression - # tasks. The classification tasks, using the custom get_loss_fn don't accept - # sample weights though. - loss_fn = (tf.keras.losses.MeanSquaredError() if is_regression - else get_loss_fn(num_classes)) - - # Defines evaluation metrics function, which will create metrics in the - # correct device and strategy scope. - if custom_metrics: - metric_fn = custom_metrics - elif is_regression: - metric_fn = functools.partial( - tf.keras.metrics.MeanSquaredError, - 'mean_squared_error', - dtype=tf.float32) - else: - metric_fn = functools.partial( - tf.keras.metrics.SparseCategoricalAccuracy, - 'accuracy', - dtype=tf.float32) - - # Start training using Keras compile/fit API. 
- logging.info('Training using TF 2.x Keras compile/fit API with ' - 'distribution strategy.') - return run_keras_compile_fit( - model_dir, - strategy, - _get_classifier_model, - train_input_fn, - eval_input_fn, - loss_fn, - metric_fn, - init_checkpoint, - epochs, - steps_per_epoch, - steps_per_loop, - eval_steps, - training_callbacks=training_callbacks, - custom_callbacks=custom_callbacks) - - -def run_keras_compile_fit(model_dir, - strategy, - model_fn, - train_input_fn, - eval_input_fn, - loss_fn, - metric_fn, - init_checkpoint, - epochs, - steps_per_epoch, - steps_per_loop, - eval_steps, - training_callbacks=True, - custom_callbacks=None): - """Runs BERT classifier model using Keras compile/fit API.""" - - with strategy.scope(): - training_dataset = train_input_fn() - evaluation_dataset = eval_input_fn() if eval_input_fn else None - bert_model, sub_model = model_fn() - optimizer = bert_model.optimizer - - if init_checkpoint: - checkpoint = tf.train.Checkpoint(model=sub_model, encoder=sub_model) - checkpoint.read(init_checkpoint).assert_existing_objects_matched() - - if not isinstance(metric_fn, (list, tuple)): - metric_fn = [metric_fn] - bert_model.compile( - optimizer=optimizer, - loss=loss_fn, - metrics=[fn() for fn in metric_fn], - steps_per_execution=steps_per_loop) - - summary_dir = os.path.join(model_dir, 'summaries') - summary_callback = tf.keras.callbacks.TensorBoard(summary_dir) - checkpoint = tf.train.Checkpoint(model=bert_model, optimizer=optimizer) - checkpoint_manager = tf.train.CheckpointManager( - checkpoint, - directory=model_dir, - max_to_keep=None, - step_counter=optimizer.iterations, - checkpoint_interval=0) - checkpoint_callback = keras_utils.SimpleCheckpoint(checkpoint_manager) - - if training_callbacks: - if custom_callbacks is not None: - custom_callbacks += [summary_callback, checkpoint_callback] - else: - custom_callbacks = [summary_callback, checkpoint_callback] - - history = bert_model.fit( - x=training_dataset, - 
validation_data=evaluation_dataset, - steps_per_epoch=steps_per_epoch, - epochs=epochs, - validation_steps=eval_steps, - callbacks=custom_callbacks) - stats = {'total_training_steps': steps_per_epoch * epochs} - if 'loss' in history.history: - stats['train_loss'] = history.history['loss'][-1] - if 'val_accuracy' in history.history: - stats['eval_metrics'] = history.history['val_accuracy'][-1] - return bert_model, stats - - -def get_predictions_and_labels(strategy, - trained_model, - eval_input_fn, - is_regression=False, - return_probs=False): - """Obtains predictions of trained model on evaluation data. - - Note that list of labels is returned along with the predictions because the - order changes on distributing dataset over TPU pods. - - Args: - strategy: Distribution strategy. - trained_model: Trained model with preloaded weights. - eval_input_fn: Input function for evaluation data. - is_regression: Whether it is a regression task. - return_probs: Whether to return probabilities of classes. - - Returns: - predictions: List of predictions. - labels: List of gold labels corresponding to predictions. 
- """ - - @tf.function - def test_step(iterator): - """Computes predictions on distributed devices.""" - - def _test_step_fn(inputs): - """Replicated predictions.""" - inputs, labels = inputs - logits = trained_model(inputs, training=False) - if not is_regression: - probabilities = tf.nn.softmax(logits) - return probabilities, labels - else: - return logits, labels - - outputs, labels = strategy.run(_test_step_fn, args=(next(iterator),)) - # outputs: current batch logits as a tuple of shard logits - outputs = tf.nest.map_structure(strategy.experimental_local_results, - outputs) - labels = tf.nest.map_structure(strategy.experimental_local_results, labels) - return outputs, labels - - def _run_evaluation(test_iterator): - """Runs evaluation steps.""" - preds, golds = list(), list() - try: - with tf.experimental.async_scope(): - while True: - probabilities, labels = test_step(test_iterator) - for cur_probs, cur_labels in zip(probabilities, labels): - if return_probs: - preds.extend(cur_probs.numpy().tolist()) - else: - preds.extend(tf.math.argmax(cur_probs, axis=1).numpy()) - golds.extend(cur_labels.numpy().tolist()) - except (StopIteration, tf.errors.OutOfRangeError): - tf.experimental.async_clear_error() - return preds, golds - - test_iter = iter(strategy.distribute_datasets_from_function(eval_input_fn)) - predictions, labels = _run_evaluation(test_iter) - - return predictions, labels - - -def export_classifier(model_export_path, input_meta_data, bert_config, - model_dir): - """Exports a trained model as a `SavedModel` for inference. - - Args: - model_export_path: a string specifying the path to the SavedModel directory. - input_meta_data: dictionary containing meta data about input and model. - bert_config: Bert configuration file to define core bert layers. - model_dir: The directory where the model weights and training/evaluation - summaries are stored. - - Raises: - Export path is not specified, got an empty string or None. 
- """ - if not model_export_path: - raise ValueError('Export path is not specified: %s' % model_export_path) - if not model_dir: - raise ValueError('Export path is not specified: %s' % model_dir) - - # Export uses float32 for now, even if training uses mixed precision. - tf.keras.mixed_precision.set_global_policy('float32') - classifier_model = bert_models.classifier_model( - bert_config, - input_meta_data.get('num_labels', 1), - hub_module_url=FLAGS.hub_module_url, - hub_module_trainable=False)[0] - - model_saving_utils.export_bert_model( - model_export_path, model=classifier_model, checkpoint_dir=model_dir) - - -def run_bert(strategy, - input_meta_data, - model_config, - train_input_fn=None, - eval_input_fn=None, - init_checkpoint=None, - custom_callbacks=None, - custom_metrics=None): - """Run BERT training.""" - # Enables XLA in Session Config. Should not be set for TPU. - keras_utils.set_session_config(FLAGS.enable_xla) - performance.set_mixed_precision_policy(common_flags.dtype()) - - epochs = FLAGS.num_train_epochs * FLAGS.num_eval_per_epoch - train_data_size = ( - input_meta_data['train_data_size'] // FLAGS.num_eval_per_epoch) - if FLAGS.train_data_size: - train_data_size = min(train_data_size, FLAGS.train_data_size) - logging.info('Updated train_data_size: %s', train_data_size) - steps_per_epoch = int(train_data_size / FLAGS.train_batch_size) - warmup_steps = int(epochs * train_data_size * 0.1 / FLAGS.train_batch_size) - eval_steps = int( - math.ceil(input_meta_data['eval_data_size'] / FLAGS.eval_batch_size)) - - if not strategy: - raise ValueError('Distribution strategy has not been specified.') - - if not custom_callbacks: - custom_callbacks = [] - - if FLAGS.log_steps: - custom_callbacks.append( - keras_utils.TimeHistory( - batch_size=FLAGS.train_batch_size, - log_steps=FLAGS.log_steps, - logdir=FLAGS.model_dir)) - - trained_model, _ = run_bert_classifier( - strategy, - model_config, - input_meta_data, - FLAGS.model_dir, - epochs, - steps_per_epoch, - 
FLAGS.steps_per_loop, - eval_steps, - warmup_steps, - FLAGS.learning_rate, - init_checkpoint or FLAGS.init_checkpoint, - train_input_fn, - eval_input_fn, - custom_callbacks=custom_callbacks, - custom_metrics=custom_metrics) - - if FLAGS.model_export_path: - model_saving_utils.export_bert_model( - FLAGS.model_export_path, model=trained_model) - return trained_model - - -def custom_main(custom_callbacks=None, custom_metrics=None): - """Run classification or regression. - - Args: - custom_callbacks: list of tf.keras.Callbacks passed to training loop. - custom_metrics: list of metrics passed to the training loop. - """ - gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_param) - - with tf.io.gfile.GFile(FLAGS.input_meta_data_path, 'rb') as reader: - input_meta_data = json.loads(reader.read().decode('utf-8')) - label_type = LABEL_TYPES_MAP[input_meta_data.get('label_type', 'int')] - include_sample_weights = input_meta_data.get('has_sample_weights', False) - - if not FLAGS.model_dir: - FLAGS.model_dir = '/tmp/bert20/' - - bert_config = bert_configs.BertConfig.from_json_file(FLAGS.bert_config_file) - - if FLAGS.mode == 'export_only': - export_classifier(FLAGS.model_export_path, input_meta_data, bert_config, - FLAGS.model_dir) - return - - strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=FLAGS.distribution_strategy, - num_gpus=FLAGS.num_gpus, - tpu_address=FLAGS.tpu) - eval_input_fn = get_dataset_fn( - FLAGS.eval_data_path, - input_meta_data['max_seq_length'], - FLAGS.eval_batch_size, - is_training=False, - label_type=label_type, - include_sample_weights=include_sample_weights) - - if FLAGS.mode == 'predict': - num_labels = input_meta_data.get('num_labels', 1) - with strategy.scope(): - classifier_model = bert_models.classifier_model( - bert_config, num_labels)[0] - checkpoint = tf.train.Checkpoint(model=classifier_model) - latest_checkpoint_file = ( - FLAGS.predict_checkpoint_path or - tf.train.latest_checkpoint(FLAGS.model_dir)) 
- assert latest_checkpoint_file - logging.info('Checkpoint file %s found and restoring from ' - 'checkpoint', latest_checkpoint_file) - checkpoint.restore( - latest_checkpoint_file).assert_existing_objects_matched() - preds, _ = get_predictions_and_labels( - strategy, - classifier_model, - eval_input_fn, - is_regression=(num_labels == 1), - return_probs=True) - output_predict_file = os.path.join(FLAGS.model_dir, 'test_results.tsv') - with tf.io.gfile.GFile(output_predict_file, 'w') as writer: - logging.info('***** Predict results *****') - for probabilities in preds: - output_line = '\t'.join( - str(class_probability) - for class_probability in probabilities) + '\n' - writer.write(output_line) - return - - if FLAGS.mode != 'train_and_eval': - raise ValueError('Unsupported mode is specified: %s' % FLAGS.mode) - train_input_fn = get_dataset_fn( - FLAGS.train_data_path, - input_meta_data['max_seq_length'], - FLAGS.train_batch_size, - is_training=True, - label_type=label_type, - include_sample_weights=include_sample_weights, - num_samples=FLAGS.train_data_size) - run_bert( - strategy, - input_meta_data, - bert_config, - train_input_fn, - eval_input_fn, - custom_callbacks=custom_callbacks, - custom_metrics=custom_metrics) - - -def main(_): - custom_main(custom_callbacks=None, custom_metrics=None) - - -if __name__ == '__main__': - flags.mark_flag_as_required('bert_config_file') - flags.mark_flag_as_required('input_meta_data_path') - flags.mark_flag_as_required('model_dir') - app.run(main) diff --git a/official/nlp/bert/run_squad.py b/official/nlp/bert/run_squad.py deleted file mode 100644 index 8cafb917620abe6d969fecb563c3794bc78afc00..0000000000000000000000000000000000000000 --- a/official/nlp/bert/run_squad.py +++ /dev/null @@ -1,148 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Run BERT on SQuAD 1.1 and SQuAD 2.0 in TF 2.x.""" - -import json -import os -import time - -# Import libraries -from absl import app -from absl import flags -from absl import logging -import gin -import tensorflow as tf -from official.common import distribute_utils -from official.nlp.bert import configs as bert_configs -from official.nlp.bert import run_squad_helper -from official.nlp.bert import tokenization -from official.nlp.data import squad_lib as squad_lib_wp -from official.utils.misc import keras_utils - - -flags.DEFINE_string('vocab_file', None, - 'The vocabulary file that the BERT model was trained on.') - -# More flags can be found in run_squad_helper. 
-run_squad_helper.define_common_squad_flags() - -FLAGS = flags.FLAGS - - -def train_squad(strategy, - input_meta_data, - custom_callbacks=None, - run_eagerly=False, - init_checkpoint=None, - sub_model_export_name=None): - """Run bert squad training.""" - bert_config = bert_configs.BertConfig.from_json_file(FLAGS.bert_config_file) - init_checkpoint = init_checkpoint or FLAGS.init_checkpoint - run_squad_helper.train_squad(strategy, input_meta_data, bert_config, - custom_callbacks, run_eagerly, init_checkpoint, - sub_model_export_name=sub_model_export_name) - - -def predict_squad(strategy, input_meta_data): - """Makes predictions for the squad dataset.""" - bert_config = bert_configs.BertConfig.from_json_file(FLAGS.bert_config_file) - tokenizer = tokenization.FullTokenizer( - vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case) - run_squad_helper.predict_squad( - strategy, input_meta_data, tokenizer, bert_config, squad_lib_wp) - - -def eval_squad(strategy, input_meta_data): - """Evaluate on the squad dataset.""" - bert_config = bert_configs.BertConfig.from_json_file(FLAGS.bert_config_file) - tokenizer = tokenization.FullTokenizer( - vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case) - eval_metrics = run_squad_helper.eval_squad( - strategy, input_meta_data, tokenizer, bert_config, squad_lib_wp) - return eval_metrics - - -def export_squad(model_export_path, input_meta_data): - """Exports a trained model as a `SavedModel` for inference. - - Args: - model_export_path: a string specifying the path to the SavedModel directory. - input_meta_data: dictionary containing meta data about input and model. - - Raises: - Export path is not specified, got an empty string or None. 
- """ - bert_config = bert_configs.BertConfig.from_json_file(FLAGS.bert_config_file) - run_squad_helper.export_squad(model_export_path, input_meta_data, bert_config) - - -def main(_): - gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_param) - - with tf.io.gfile.GFile(FLAGS.input_meta_data_path, 'rb') as reader: - input_meta_data = json.loads(reader.read().decode('utf-8')) - - if FLAGS.mode == 'export_only': - export_squad(FLAGS.model_export_path, input_meta_data) - return - - # Configures cluster spec for multi-worker distribution strategy. - if FLAGS.num_gpus > 0: - _ = distribute_utils.configure_cluster(FLAGS.worker_hosts, FLAGS.task_index) - strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=FLAGS.distribution_strategy, - num_gpus=FLAGS.num_gpus, - all_reduce_alg=FLAGS.all_reduce_alg, - tpu_address=FLAGS.tpu) - - if 'train' in FLAGS.mode: - if FLAGS.log_steps: - custom_callbacks = [keras_utils.TimeHistory( - batch_size=FLAGS.train_batch_size, - log_steps=FLAGS.log_steps, - logdir=FLAGS.model_dir, - )] - else: - custom_callbacks = None - - train_squad( - strategy, - input_meta_data, - custom_callbacks=custom_callbacks, - run_eagerly=FLAGS.run_eagerly, - sub_model_export_name=FLAGS.sub_model_export_name, - ) - if 'predict' in FLAGS.mode: - predict_squad(strategy, input_meta_data) - if 'eval' in FLAGS.mode: - eval_metrics = eval_squad(strategy, input_meta_data) - f1_score = eval_metrics['final_f1'] - logging.info('SQuAD eval F1-score: %f', f1_score) - summary_dir = os.path.join(FLAGS.model_dir, 'summaries', 'eval') - summary_writer = tf.summary.create_file_writer(summary_dir) - with summary_writer.as_default(): - # TODO(lehou): write to the correct step number. - tf.summary.scalar('F1-score', f1_score, step=0) - summary_writer.flush() - # Also write eval_metrics to json file. 
- squad_lib_wp.write_to_json_files( - eval_metrics, os.path.join(summary_dir, 'eval_metrics.json')) - time.sleep(60) - - -if __name__ == '__main__': - flags.mark_flag_as_required('bert_config_file') - flags.mark_flag_as_required('model_dir') - app.run(main) diff --git a/official/nlp/bert/tf1_checkpoint_converter_lib.py b/official/nlp/bert/tf1_checkpoint_converter_lib.py deleted file mode 100644 index 035a694385abfede7314188e38ab6801b6fef70a..0000000000000000000000000000000000000000 --- a/official/nlp/bert/tf1_checkpoint_converter_lib.py +++ /dev/null @@ -1,201 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -r"""Convert checkpoints created by Estimator (tf1) to be Keras compatible.""" - -import numpy as np -import tensorflow.compat.v1 as tf # TF 1.x - -# Mapping between old <=> new names. The source pattern in original variable -# name will be replaced by destination pattern. 
-BERT_NAME_REPLACEMENTS = ( - ("bert", "bert_model"), - ("embeddings/word_embeddings", "word_embeddings/embeddings"), - ("embeddings/token_type_embeddings", - "embedding_postprocessor/type_embeddings"), - ("embeddings/position_embeddings", - "embedding_postprocessor/position_embeddings"), - ("embeddings/LayerNorm", "embedding_postprocessor/layer_norm"), - ("attention/self", "self_attention"), - ("attention/output/dense", "self_attention_output"), - ("attention/output/LayerNorm", "self_attention_layer_norm"), - ("intermediate/dense", "intermediate"), - ("output/dense", "output"), - ("output/LayerNorm", "output_layer_norm"), - ("pooler/dense", "pooler_transform"), -) - -BERT_V2_NAME_REPLACEMENTS = ( - ("bert/", ""), - ("encoder", "transformer"), - ("embeddings/word_embeddings", "word_embeddings/embeddings"), - ("embeddings/token_type_embeddings", "type_embeddings/embeddings"), - ("embeddings/position_embeddings", "position_embedding/embeddings"), - ("embeddings/LayerNorm", "embeddings/layer_norm"), - ("attention/self", "self_attention"), - ("attention/output/dense", "self_attention/attention_output"), - ("attention/output/LayerNorm", "self_attention_layer_norm"), - ("intermediate/dense", "intermediate"), - ("output/dense", "output"), - ("output/LayerNorm", "output_layer_norm"), - ("pooler/dense", "pooler_transform"), - ("cls/predictions", "bert/cls/predictions"), - ("cls/predictions/output_bias", "cls/predictions/output_bias/bias"), - ("cls/seq_relationship/output_bias", "predictions/transform/logits/bias"), - ("cls/seq_relationship/output_weights", - "predictions/transform/logits/kernel"), -) - -BERT_PERMUTATIONS = () - -BERT_V2_PERMUTATIONS = (("cls/seq_relationship/output_weights", (1, 0)),) - - -def _bert_name_replacement(var_name, name_replacements): - """Gets the variable name replacement.""" - for src_pattern, tgt_pattern in name_replacements: - if src_pattern in var_name: - old_var_name = var_name - var_name = var_name.replace(src_pattern, tgt_pattern) - 
tf.logging.info("Converted: %s --> %s", old_var_name, var_name) - return var_name - - -def _has_exclude_patterns(name, exclude_patterns): - """Checks if a string contains substrings that match patterns to exclude.""" - for p in exclude_patterns: - if p in name: - return True - return False - - -def _get_permutation(name, permutations): - """Checks whether a variable requires transposition by pattern matching.""" - for src_pattern, permutation in permutations: - if src_pattern in name: - tf.logging.info("Permuted: %s --> %s", name, permutation) - return permutation - - return None - - -def _get_new_shape(name, shape, num_heads): - """Checks whether a variable requires reshape by pattern matching.""" - if "self_attention/attention_output/kernel" in name: - return tuple([num_heads, shape[0] // num_heads, shape[1]]) - if "self_attention/attention_output/bias" in name: - return shape - - patterns = [ - "self_attention/query", "self_attention/value", "self_attention/key" - ] - for pattern in patterns: - if pattern in name: - if "kernel" in name: - return tuple([shape[0], num_heads, shape[1] // num_heads]) - if "bias" in name: - return tuple([num_heads, shape[0] // num_heads]) - return None - - -def create_v2_checkpoint(model, - src_checkpoint, - output_path, - checkpoint_model_name="model"): - """Converts a name-based matched TF V1 checkpoint to TF V2 checkpoint.""" - # Uses streaming-restore in eager model to read V1 name-based checkpoints. - model.load_weights(src_checkpoint).assert_existing_objects_matched() - if hasattr(model, "checkpoint_items"): - checkpoint_items = model.checkpoint_items - else: - checkpoint_items = {} - - checkpoint_items[checkpoint_model_name] = model - checkpoint = tf.train.Checkpoint(**checkpoint_items) - checkpoint.save(output_path) - - -def convert(checkpoint_from_path, - checkpoint_to_path, - num_heads, - name_replacements, - permutations, - exclude_patterns=None): - """Migrates the names of variables within a checkpoint. 
- - Args: - checkpoint_from_path: Path to source checkpoint to be read in. - checkpoint_to_path: Path to checkpoint to be written out. - num_heads: The number of heads of the model. - name_replacements: A list of tuples of the form (match_str, replace_str) - describing variable names to adjust. - permutations: A list of tuples of the form (match_str, permutation) - describing permutations to apply to given variables. Note that match_str - should match the original variable name, not the replaced one. - exclude_patterns: A list of string patterns to exclude variables from - checkpoint conversion. - - Returns: - A dictionary that maps the new variable names to the Variable objects. - A dictionary that maps the old variable names to the new variable names. - """ - with tf.Graph().as_default(): - tf.logging.info("Reading checkpoint_from_path %s", checkpoint_from_path) - reader = tf.train.NewCheckpointReader(checkpoint_from_path) - name_shape_map = reader.get_variable_to_shape_map() - new_variable_map = {} - conversion_map = {} - for var_name in name_shape_map: - if exclude_patterns and _has_exclude_patterns(var_name, exclude_patterns): - continue - # Get the original tensor data. - tensor = reader.get_tensor(var_name) - - # Look up the new variable name, if any. - new_var_name = _bert_name_replacement(var_name, name_replacements) - - # See if we need to reshape the underlying tensor. - new_shape = None - if num_heads > 0: - new_shape = _get_new_shape(new_var_name, tensor.shape, num_heads) - if new_shape: - tf.logging.info("Veriable %s has a shape change from %s to %s", - var_name, tensor.shape, new_shape) - tensor = np.reshape(tensor, new_shape) - - # See if we need to permute the underlying tensor. - permutation = _get_permutation(var_name, permutations) - if permutation: - tensor = np.transpose(tensor, permutation) - - # Create a new variable with the possibly-reshaped or transposed tensor. 
- var = tf.Variable(tensor, name=var_name) - - # Save the variable into the new variable map. - new_variable_map[new_var_name] = var - - # Keep a list of converter variables for sanity checking. - if new_var_name != var_name: - conversion_map[var_name] = new_var_name - - saver = tf.train.Saver(new_variable_map) - - with tf.Session() as sess: - sess.run(tf.global_variables_initializer()) - tf.logging.info("Writing checkpoint_to_path %s", checkpoint_to_path) - saver.save(sess, checkpoint_to_path, write_meta_graph=False) - - tf.logging.info("Summary:") - tf.logging.info(" Converted %d variable name(s).", len(new_variable_map)) - tf.logging.info(" Converted: %s", str(conversion_map)) diff --git a/official/nlp/bert/tf2_encoder_checkpoint_converter.py b/official/nlp/bert/tf2_encoder_checkpoint_converter.py deleted file mode 100644 index 9fced5daee95479c28cffd2b63dffcf6f2d90408..0000000000000000000000000000000000000000 --- a/official/nlp/bert/tf2_encoder_checkpoint_converter.py +++ /dev/null @@ -1,160 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""A converter from a V1 BERT encoder checkpoint to a V2 encoder checkpoint. - -The conversion will yield an object-oriented checkpoint that can be used -to restore a BertEncoder or BertPretrainerV2 object (see the `converted_model` -FLAG below). 
-""" - -import os - -from absl import app -from absl import flags - -import tensorflow as tf -from official.modeling import tf_utils -from official.nlp.bert import configs -from official.nlp.bert import tf1_checkpoint_converter_lib -from official.nlp.modeling import models -from official.nlp.modeling import networks - -FLAGS = flags.FLAGS - -flags.DEFINE_string("bert_config_file", None, - "Bert configuration file to define core bert layers.") -flags.DEFINE_string( - "checkpoint_to_convert", None, - "Initial checkpoint from a pretrained BERT model core (that is, only the " - "BertModel, with no task heads.)") -flags.DEFINE_string("converted_checkpoint_path", None, - "Name for the created object-based V2 checkpoint.") -flags.DEFINE_string("checkpoint_model_name", "encoder", - "The name of the model when saving the checkpoint, i.e., " - "the checkpoint will be saved using: " - "tf.train.Checkpoint(FLAGS.checkpoint_model_name=model).") -flags.DEFINE_enum( - "converted_model", "encoder", ["encoder", "pretrainer"], - "Whether to convert the checkpoint to a `BertEncoder` model or a " - "`BertPretrainerV2` model (with mlm but without classification heads).") - - -def _create_bert_model(cfg): - """Creates a BERT keras core model from BERT configuration. - - Args: - cfg: A `BertConfig` to create the core model. - - Returns: - A BertEncoder network. 
- """ - bert_encoder = networks.BertEncoder( - vocab_size=cfg.vocab_size, - hidden_size=cfg.hidden_size, - num_layers=cfg.num_hidden_layers, - num_attention_heads=cfg.num_attention_heads, - intermediate_size=cfg.intermediate_size, - activation=tf_utils.get_activation(cfg.hidden_act), - dropout_rate=cfg.hidden_dropout_prob, - attention_dropout_rate=cfg.attention_probs_dropout_prob, - max_sequence_length=cfg.max_position_embeddings, - type_vocab_size=cfg.type_vocab_size, - initializer=tf.keras.initializers.TruncatedNormal( - stddev=cfg.initializer_range), - embedding_width=cfg.embedding_size) - - return bert_encoder - - -def _create_bert_pretrainer_model(cfg): - """Creates a BERT keras core model from BERT configuration. - - Args: - cfg: A `BertConfig` to create the core model. - - Returns: - A BertPretrainerV2 model. - """ - bert_encoder = _create_bert_model(cfg) - pretrainer = models.BertPretrainerV2( - encoder_network=bert_encoder, - mlm_activation=tf_utils.get_activation(cfg.hidden_act), - mlm_initializer=tf.keras.initializers.TruncatedNormal( - stddev=cfg.initializer_range)) - # Makes sure the pretrainer variables are created. - _ = pretrainer(pretrainer.inputs) - return pretrainer - - -def convert_checkpoint(bert_config, - output_path, - v1_checkpoint, - checkpoint_model_name="model", - converted_model="encoder"): - """Converts a V1 checkpoint into an OO V2 checkpoint.""" - output_dir, _ = os.path.split(output_path) - tf.io.gfile.makedirs(output_dir) - - # Create a temporary V1 name-converted checkpoint in the output directory. 
- temporary_checkpoint_dir = os.path.join(output_dir, "temp_v1") - temporary_checkpoint = os.path.join(temporary_checkpoint_dir, "ckpt") - - tf1_checkpoint_converter_lib.convert( - checkpoint_from_path=v1_checkpoint, - checkpoint_to_path=temporary_checkpoint, - num_heads=bert_config.num_attention_heads, - name_replacements=tf1_checkpoint_converter_lib.BERT_V2_NAME_REPLACEMENTS, - permutations=tf1_checkpoint_converter_lib.BERT_V2_PERMUTATIONS, - exclude_patterns=["adam", "Adam"]) - - if converted_model == "encoder": - model = _create_bert_model(bert_config) - elif converted_model == "pretrainer": - model = _create_bert_pretrainer_model(bert_config) - else: - raise ValueError("Unsupported converted_model: %s" % converted_model) - - # Create a V2 checkpoint from the temporary checkpoint. - tf1_checkpoint_converter_lib.create_v2_checkpoint(model, temporary_checkpoint, - output_path, - checkpoint_model_name) - - # Clean up the temporary checkpoint, if it exists. - try: - tf.io.gfile.rmtree(temporary_checkpoint_dir) - except tf.errors.OpError: - # If it doesn't exist, we don't need to clean it up; continue. 
- pass - - -def main(argv): - if len(argv) > 1: - raise app.UsageError("Too many command-line arguments.") - - output_path = FLAGS.converted_checkpoint_path - v1_checkpoint = FLAGS.checkpoint_to_convert - checkpoint_model_name = FLAGS.checkpoint_model_name - converted_model = FLAGS.converted_model - bert_config = configs.BertConfig.from_json_file(FLAGS.bert_config_file) - convert_checkpoint( - bert_config=bert_config, - output_path=output_path, - v1_checkpoint=v1_checkpoint, - checkpoint_model_name=checkpoint_model_name, - converted_model=converted_model) - - -if __name__ == "__main__": - app.run(main) diff --git a/official/nlp/configs/__init__.py b/official/nlp/configs/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/nlp/configs/__init__.py +++ b/official/nlp/configs/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/configs/bert.py b/official/nlp/configs/bert.py index cf78de0388bf76b68cd6df8cc656842bbfc90b64..e712aae2cf3afbc78dc8d33e41fa6abbfe3842ce 100644 --- a/official/nlp/configs/bert.py +++ b/official/nlp/configs/bert.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -41,3 +41,5 @@ class PretrainerConfig(base_config.Config): cls_heads: List[ClsHeadConfig] = dataclasses.field(default_factory=list) mlm_activation: str = "gelu" mlm_initializer_range: float = 0.02 + # Currently only used for mobile bert. 
+ mlm_output_weights_use_proj: bool = False diff --git a/official/nlp/configs/electra.py b/official/nlp/configs/electra.py index 5e62297667a470fd192779d8dc7f5c5117836804..0c55e50e5e81b5ef88bc4fa115b4c9ed9dd9409a 100644 --- a/official/nlp/configs/electra.py +++ b/official/nlp/configs/electra.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/configs/encoders.py b/official/nlp/configs/encoders.py index bc44c899b5cf905c9c50b6fe567f23414c2d0d68..90a5d47d11df7f0709ac1aef4c899b60ef820d2c 100644 --- a/official/nlp/configs/encoders.py +++ b/official/nlp/configs/encoders.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,9 +16,9 @@ Includes configurations and factory methods. 
""" -from typing import Optional - import dataclasses +from typing import Optional, Sequence + import gin import tensorflow as tf @@ -26,7 +26,7 @@ from official.modeling import hyperparams from official.modeling import tf_utils from official.nlp.modeling import layers from official.nlp.modeling import networks -from official.nlp.projects.bigbird import encoder as bigbird_encoder +from official.projects.bigbird import encoder as bigbird_encoder @dataclasses.dataclass @@ -221,6 +221,50 @@ class XLNetEncoderConfig(hyperparams.Config): two_stream: bool = False +@dataclasses.dataclass +class QueryBertConfig(hyperparams.Config): + """Query BERT encoder configuration.""" + vocab_size: int = 30522 + hidden_size: int = 768 + num_layers: int = 12 + num_attention_heads: int = 12 + hidden_activation: str = "gelu" + intermediate_size: int = 3072 + dropout_rate: float = 0.1 + attention_dropout_rate: float = 0.1 + max_position_embeddings: int = 512 + type_vocab_size: int = 2 + initializer_range: float = 0.02 + embedding_size: Optional[int] = None + output_range: Optional[int] = None + return_all_encoder_outputs: bool = False + # Pre/Post-LN Transformer + norm_first: bool = False + + +@dataclasses.dataclass +class FNetEncoderConfig(hyperparams.Config): + """FNet encoder configuration.""" + vocab_size: int = 30522 + hidden_size: int = 768 + num_layers: int = 12 + num_attention_heads: int = 12 + inner_activation: str = "gelu" + inner_dim: int = 3072 + output_dropout: float = 0.1 + attention_dropout: float = 0.1 + max_sequence_length: int = 512 + type_vocab_size: int = 2 + initializer_range: float = 0.02 + embedding_width: Optional[int] = None + output_range: Optional[int] = None + return_all_encoder_outputs: bool = False + # Pre/Post-LN Transformer + norm_first: bool = False + use_fft: bool = False + attention_layers: Sequence[int] = () + + @dataclasses.dataclass class EncoderConfig(hyperparams.OneOfConfig): """Encoder configuration.""" @@ -233,6 +277,8 @@ class 
EncoderConfig(hyperparams.OneOfConfig): mobilebert: MobileBertEncoderConfig = MobileBertEncoderConfig() reuse: ReuseEncoderConfig = ReuseEncoderConfig() xlnet: XLNetEncoderConfig = XLNetEncoderConfig() + query_bert: QueryBertConfig = QueryBertConfig() + fnet: FNetEncoderConfig = FNetEncoderConfig() # If `any` is used, the encoder building relies on any.BUILDER. any: hyperparams.Config = hyperparams.Config() @@ -513,6 +559,54 @@ def build_encoder(config: EncoderConfig, recursive=True) return networks.EncoderScaffold(**kwargs) + if encoder_type == "query_bert": + embedding_layer = layers.FactorizedEmbedding( + vocab_size=encoder_cfg.vocab_size, + embedding_width=encoder_cfg.embedding_size, + output_dim=encoder_cfg.hidden_size, + initializer=tf.keras.initializers.TruncatedNormal( + stddev=encoder_cfg.initializer_range), + name="word_embeddings") + return networks.BertEncoderV2( + vocab_size=encoder_cfg.vocab_size, + hidden_size=encoder_cfg.hidden_size, + num_layers=encoder_cfg.num_layers, + num_attention_heads=encoder_cfg.num_attention_heads, + intermediate_size=encoder_cfg.intermediate_size, + activation=tf_utils.get_activation(encoder_cfg.hidden_activation), + dropout_rate=encoder_cfg.dropout_rate, + attention_dropout_rate=encoder_cfg.attention_dropout_rate, + max_sequence_length=encoder_cfg.max_position_embeddings, + type_vocab_size=encoder_cfg.type_vocab_size, + initializer=tf.keras.initializers.TruncatedNormal( + stddev=encoder_cfg.initializer_range), + output_range=encoder_cfg.output_range, + embedding_layer=embedding_layer, + return_all_encoder_outputs=encoder_cfg.return_all_encoder_outputs, + dict_outputs=True, + norm_first=encoder_cfg.norm_first) + + if encoder_type == "fnet": + return networks.FNet( + vocab_size=encoder_cfg.vocab_size, + hidden_size=encoder_cfg.hidden_size, + num_layers=encoder_cfg.num_layers, + num_attention_heads=encoder_cfg.num_attention_heads, + inner_dim=encoder_cfg.inner_dim, + 
inner_activation=tf_utils.get_activation(encoder_cfg.inner_activation), + output_dropout=encoder_cfg.output_dropout, + attention_dropout=encoder_cfg.attention_dropout, + max_sequence_length=encoder_cfg.max_sequence_length, + type_vocab_size=encoder_cfg.type_vocab_size, + initializer=tf.keras.initializers.TruncatedNormal( + stddev=encoder_cfg.initializer_range), + output_range=encoder_cfg.output_range, + embedding_width=encoder_cfg.embedding_width, + embedding_layer=embedding_layer, + norm_first=encoder_cfg.norm_first, + use_fft=encoder_cfg.use_fft, + attention_layers=encoder_cfg.attention_layers) + bert_encoder_cls = networks.BertEncoder if encoder_type == "bert_v2": bert_encoder_cls = networks.BertEncoderV2 diff --git a/official/nlp/configs/encoders_test.py b/official/nlp/configs/encoders_test.py index 3b6bf6198b1757861e56258841b0c58d1951d806..6012c55fe9bab058945b5fa707de0734846cb56e 100644 --- a/official/nlp/configs/encoders_test.py +++ b/official/nlp/configs/encoders_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -20,7 +20,7 @@ import tensorflow as tf from official.modeling import hyperparams from official.nlp.configs import encoders from official.nlp.modeling import networks -from official.nlp.projects.teams import teams +from official.projects.teams import teams class EncodersTest(tf.test.TestCase): diff --git a/official/nlp/configs/experiment_configs.py b/official/nlp/configs/experiment_configs.py index 2b52d5b4b7fda4bfd487310b0bd39a255117968a..006d6d7d5582aef228f499c35d14a3c85fb893e5 100644 --- a/official/nlp/configs/experiment_configs.py +++ b/official/nlp/configs/experiment_configs.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,4 +17,3 @@ from official.nlp.configs import finetuning_experiments from official.nlp.configs import pretraining_experiments from official.nlp.configs import wmt_transformer_experiments -from official.nlp.projects.teams import teams_experiments diff --git a/official/nlp/configs/experiments/wiki_books_pretrain.yaml b/official/nlp/configs/experiments/wiki_books_pretrain.yaml new file mode 100644 index 0000000000000000000000000000000000000000..bff3cbb73a1ec6466ade7f985f7dfaf550722371 --- /dev/null +++ b/official/nlp/configs/experiments/wiki_books_pretrain.yaml @@ -0,0 +1,48 @@ +task: + init_checkpoint: '' + model: + cls_heads: [{activation: tanh, cls_token_idx: 0, dropout_rate: 0.1, inner_dim: 768, name: next_sentence, num_classes: 2}] + train_data: + drop_remainder: true + global_batch_size: 512 + input_path: '[Your processed wiki data path]*,[Your processed books data path]*' + is_training: true + max_predictions_per_seq: 76 + seq_length: 512 + use_next_sentence_label: true + use_position_id: false + use_v2_feature_names: true + validation_data: + drop_remainder: false + global_batch_size: 512 + input_path: '[Your processed wiki data path]-00000-of-00500,[Your processed books data path]-00000-of-00500' + is_training: false + max_predictions_per_seq: 76 + seq_length: 512 + use_next_sentence_label: true + use_position_id: false + use_v2_feature_names: true +trainer: + checkpoint_interval: 20000 + max_to_keep: 5 + optimizer_config: + learning_rate: + polynomial: + cycle: false + decay_steps: 1000000 + end_learning_rate: 0.0 + initial_learning_rate: 0.0001 + power: 1.0 + type: polynomial + optimizer: + type: adamw + warmup: + polynomial: + power: 1 + warmup_steps: 10000 + type: polynomial + steps_per_loop: 1000 + summary_interval: 1000 + train_steps: 1000000 + validation_interval: 1000 + 
validation_steps: 64 diff --git
a/official/nlp/configs/finetuning_experiments.py b/official/nlp/configs/finetuning_experiments.py index d87c9655e5118b6d8322ec2513d554c3eebdbf6b..23833d4cf49a01a315edbc3d153fde7570cba487 100644 --- a/official/nlp/configs/finetuning_experiments.py +++ b/official/nlp/configs/finetuning_experiments.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/configs/pretraining_experiments.py b/official/nlp/configs/pretraining_experiments.py index 024c6fcfb281a467ceaaffa1cdbdf07fdae5a95a..1eedb87828054729eb621b3dc6241e491a14899d 100644 --- a/official/nlp/configs/pretraining_experiments.py +++ b/official/nlp/configs/pretraining_experiments.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/configs/wmt_transformer_experiments.py b/official/nlp/configs/wmt_transformer_experiments.py index eb85b76c5a94505de9c4e7e2e11a563abce5a645..bdef599fa428e3f76e5810bb8fcf7d5a7fa4af92 100644 --- a/official/nlp/configs/wmt_transformer_experiments.py +++ b/official/nlp/configs/wmt_transformer_experiments.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 # pylint: disable=g-doc-return-or-yield,line-too-long """WMT translation configurations.""" diff --git a/official/nlp/continuous_finetune_lib.py b/official/nlp/continuous_finetune_lib.py index 6fe851741c0f631ea18f80b9bf259c551e7f2561..988b62c60328f46cf371a563f37e3ff0867743d2 100644 --- a/official/nlp/continuous_finetune_lib.py +++ b/official/nlp/continuous_finetune_lib.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/continuous_finetune_lib_test.py b/official/nlp/continuous_finetune_lib_test.py index 08ee381dce133d73e18e697b938cab92d04f2ff0..6ed727d73e26521af15fd540b1486bd933dbf0b0 100644 --- a/official/nlp/continuous_finetune_lib_test.py +++ b/official/nlp/continuous_finetune_lib_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/README.md b/official/nlp/data/README.md new file mode 100644 index 0000000000000000000000000000000000000000..2a706d7be7f99974a097ca128e890589c8cdb826 --- /dev/null +++ b/official/nlp/data/README.md @@ -0,0 +1,4 @@ +This directory contains binaries and utils required for input preprocessing, +tokenization, etc., that can be used with the model building blocks available in +the NLP modeling library [nlp/modeling](https://github.com/tensorflow/models/tree/master/official/nlp/modeling) +to train custom models and validate new research ideas.
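The deleted `tf1_checkpoint_converter_lib` migrates V1 BERT variable names by applying every matching `(match_str, replace_str)` pair from `BERT_V2_NAME_REPLACEMENTS` in order, and then `_get_new_shape` splits the attention kernels and biases into per-head dimensions. Below is a minimal pure-Python sketch of those two rules, for illustration only. It assumes a subset of the replacement table (the `cls/...` entries are omitted), drops the TensorFlow logging and tensor I/O, and uses hypothetical helper names (`rename`, `new_shape`) that are not part of the library.

```python
from typing import Optional, Sequence, Tuple

# Subset of BERT_V2_NAME_REPLACEMENTS from the deleted library (cls/* omitted).
BERT_V2_NAME_REPLACEMENTS = (
    ("bert/", ""),
    ("encoder", "transformer"),
    ("embeddings/word_embeddings", "word_embeddings/embeddings"),
    ("embeddings/token_type_embeddings", "type_embeddings/embeddings"),
    ("embeddings/position_embeddings", "position_embedding/embeddings"),
    ("embeddings/LayerNorm", "embeddings/layer_norm"),
    ("attention/self", "self_attention"),
    ("attention/output/dense", "self_attention/attention_output"),
    ("attention/output/LayerNorm", "self_attention_layer_norm"),
    ("intermediate/dense", "intermediate"),
    ("output/dense", "output"),
    ("output/LayerNorm", "output_layer_norm"),
    ("pooler/dense", "pooler_transform"),
)


def rename(var_name: str, replacements: Sequence[Tuple[str, str]]) -> str:
  """Applies every matching substring replacement, in table order."""
  for src_pattern, tgt_pattern in replacements:
    if src_pattern in var_name:
      var_name = var_name.replace(src_pattern, tgt_pattern)
  return var_name


def new_shape(name: str, shape: Sequence[int],
              num_heads: int) -> Optional[Tuple[int, ...]]:
  """Mirrors _get_new_shape: returns a per-head shape, or None if unchanged."""
  if "self_attention/attention_output/kernel" in name:
    # [hidden, hidden] -> [heads, head_size, hidden]
    return (num_heads, shape[0] // num_heads, shape[1])
  if "self_attention/attention_output/bias" in name:
    return tuple(shape)
  for pattern in ("self_attention/query", "self_attention/value",
                  "self_attention/key"):
    if pattern in name:
      if "kernel" in name:
        # [hidden, hidden] -> [hidden, heads, head_size]
        return (shape[0], num_heads, shape[1] // num_heads)
      if "bias" in name:
        # [hidden] -> [heads, head_size]
        return (num_heads, shape[0] // num_heads)
  return None


if __name__ == "__main__":
  examples = [
      ("bert/encoder/layer_0/attention/self/query/kernel", (768, 768)),
      ("bert/encoder/layer_0/attention/output/dense/kernel", (768, 768)),
      ("bert/embeddings/word_embeddings", (30522, 768)),
  ]
  for v1_name, v1_shape in examples:
    v2_name = rename(v1_name, BERT_V2_NAME_REPLACEMENTS)
    print(v1_name, "->", v2_name, new_shape(v2_name, v1_shape, num_heads=12))
```

Because each pair is applied to the already-rewritten name, table order matters: `attention/output/LayerNorm` has to come before the more general `output/LayerNorm`, otherwise attention layer norms would end up as `attention/output_layer_norm`.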
diff --git a/official/nlp/data/__init__.py b/official/nlp/data/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/nlp/data/__init__.py +++ b/official/nlp/data/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/classifier_data_lib.py b/official/nlp/data/classifier_data_lib.py index 0ba9dcf9a055ab9a3e5206f251d45d2ea41a2661..3e95caf719b83ec1c21faa60889385f519af8b72 100644 --- a/official/nlp/data/classifier_data_lib.py +++ b/official/nlp/data/classifier_data_lib.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -24,7 +24,7 @@ from absl import logging import tensorflow as tf import tensorflow_datasets as tfds -from official.nlp.bert import tokenization +from official.nlp.tools import tokenization class InputExample(object): @@ -187,6 +187,8 @@ class AxProcessor(DataProcessor): def _create_examples_tfds(self, dataset, set_type): """Creates examples for the training/dev/test sets.""" + dataset = list(dataset) + dataset.sort(key=lambda x: x["idx"]) examples = [] for i, example in enumerate(dataset): guid = "%s-%s" % (set_type, i) @@ -218,6 +220,8 @@ class ColaProcessor(DefaultGLUEDataProcessor): """Creates examples for the training/dev/test sets.""" dataset = tfds.load( "glue/cola", split=set_type, try_gcs=True).as_numpy_iterator() + dataset = list(dataset) + dataset.sort(key=lambda x: x["idx"]) examples = [] for i, example in enumerate(dataset): guid = "%s-%s" % (set_type, i) @@ -312,6 +316,8 @@ class MnliProcessor(DataProcessor): """Creates examples for the training/dev/test sets.""" dataset = tfds.load( "glue/mnli", split=set_type, try_gcs=True).as_numpy_iterator() + dataset = list(dataset) + dataset.sort(key=lambda x: x["idx"]) examples = [] for i, example in enumerate(dataset): guid = "%s-%s" % (set_type, i) @@ -343,6 +349,8 @@ class MrpcProcessor(DefaultGLUEDataProcessor): """Creates examples for the training/dev/test sets.""" dataset = tfds.load( "glue/mrpc", split=set_type, try_gcs=True).as_numpy_iterator() + dataset = list(dataset) + dataset.sort(key=lambda x: x["idx"]) examples = [] for i, example in enumerate(dataset): guid = "%s-%s" % (set_type, i) @@ -453,6 +461,8 @@ class QnliProcessor(DefaultGLUEDataProcessor): """Creates examples for the training/dev/test sets.""" dataset = tfds.load( "glue/qnli", split=set_type, try_gcs=True).as_numpy_iterator() + dataset = list(dataset) + dataset.sort(key=lambda x: x["idx"]) examples = [] for i, example in enumerate(dataset): guid = "%s-%s" % (set_type, i) @@ -484,6 +494,8 @@ class QqpProcessor(DefaultGLUEDataProcessor): 
"""Creates examples for the training/dev/test sets.""" dataset = tfds.load( "glue/qqp", split=set_type, try_gcs=True).as_numpy_iterator() + dataset = list(dataset) + dataset.sort(key=lambda x: x["idx"]) examples = [] for i, example in enumerate(dataset): guid = "%s-%s" % (set_type, i) @@ -517,6 +529,8 @@ class RteProcessor(DefaultGLUEDataProcessor): """Creates examples for the training/dev/test sets.""" dataset = tfds.load( "glue/rte", split=set_type, try_gcs=True).as_numpy_iterator() + dataset = list(dataset) + dataset.sort(key=lambda x: x["idx"]) examples = [] for i, example in enumerate(dataset): guid = "%s-%s" % (set_type, i) @@ -548,6 +562,8 @@ class SstProcessor(DefaultGLUEDataProcessor): """Creates examples for the training/dev/test sets.""" dataset = tfds.load( "glue/sst2", split=set_type, try_gcs=True).as_numpy_iterator() + dataset = list(dataset) + dataset.sort(key=lambda x: x["idx"]) examples = [] for i, example in enumerate(dataset): guid = "%s-%s" % (set_type, i) @@ -574,6 +590,8 @@ class StsBProcessor(DefaultGLUEDataProcessor): """Creates examples for the training/dev/test sets.""" dataset = tfds.load( "glue/stsb", split=set_type, try_gcs=True).as_numpy_iterator() + dataset = list(dataset) + dataset.sort(key=lambda x: x["idx"]) examples = [] for i, example in enumerate(dataset): guid = "%s-%s" % (set_type, i) @@ -742,6 +760,8 @@ class WnliProcessor(DefaultGLUEDataProcessor): """Creates examples for the training/dev/test sets.""" dataset = tfds.load( "glue/wnli", split=set_type, try_gcs=True).as_numpy_iterator() + dataset = list(dataset) + dataset.sort(key=lambda x: x["idx"]) examples = [] for i, example in enumerate(dataset): guid = "%s-%s" % (set_type, i) diff --git a/official/nlp/data/classifier_data_lib_test.py b/official/nlp/data/classifier_data_lib_test.py index c1db1a3d03f7b6daaa4816decb653814c675f3e2..f7a517da0a2a92442da1dcb16afc9230938b20e6 100644 --- a/official/nlp/data/classifier_data_lib_test.py +++ 
b/official/nlp/data/classifier_data_lib_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -21,8 +21,8 @@ from absl.testing import parameterized import tensorflow as tf import tensorflow_datasets as tfds -from official.nlp.bert import tokenization from official.nlp.data import classifier_data_lib +from official.nlp.tools import tokenization def decode_record(record, name_to_features): diff --git a/official/nlp/data/create_finetuning_data.py b/official/nlp/data/create_finetuning_data.py index 01f2deaecde56e9927eb41452fd896539932d123..be1b6b444a09539a7d5b07aec908f168d03049ac 100644 --- a/official/nlp/data/create_finetuning_data.py +++ b/official/nlp/data/create_finetuning_data.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -22,7 +22,6 @@ import os from absl import app from absl import flags import tensorflow as tf -from official.nlp.bert import tokenization from official.nlp.data import classifier_data_lib from official.nlp.data import sentence_retrieval_lib # word-piece tokenizer based squad_lib @@ -30,10 +29,10 @@ from official.nlp.data import squad_lib as squad_lib_wp # sentence-piece tokenizer based squad_lib from official.nlp.data import squad_lib_sp from official.nlp.data import tagging_data_lib +from official.nlp.tools import tokenization FLAGS = flags.FLAGS -# TODO(chendouble): consider moving each task to its own binary. 
flags.DEFINE_enum( "fine_tuning_task_type", "classification", ["classification", "regression", "squad", "retrieval", "tagging"], diff --git a/official/nlp/data/create_pretraining_data.py b/official/nlp/data/create_pretraining_data.py index 93b7723d125a6e4916a8a595ef4c5a4b470bdcc9..4d5eae4de05ba10f0a5f4294d3f7957279eef0a5 100644 --- a/official/nlp/data/create_pretraining_data.py +++ b/official/nlp/data/create_pretraining_data.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -24,7 +24,7 @@ from absl import flags from absl import logging import tensorflow as tf -from official.nlp.bert import tokenization +from official.nlp.tools import tokenization FLAGS = flags.FLAGS diff --git a/official/nlp/data/create_pretraining_data_test.py b/official/nlp/data/create_pretraining_data_test.py index 79a38ba8506ac428d48188f0eb4fbf2ce26b4422..da50d5479e458035f9e4379bf20ae6d1f5309432 100644 --- a/official/nlp/data/create_pretraining_data_test.py +++ b/official/nlp/data/create_pretraining_data_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/create_xlnet_pretraining_data.py b/official/nlp/data/create_xlnet_pretraining_data.py index 363164fcae001a61da53b0bb6e0afb9f4e92fd42..3657962fd19a90159eb3e0e5b5a4a21694dea628 100644 --- a/official/nlp/data/create_xlnet_pretraining_data.py +++ b/official/nlp/data/create_xlnet_pretraining_data.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -14,6 +14,7 @@ """Create LM TF examples for XLNet.""" +import dataclasses import json import math import os @@ -28,11 +29,10 @@ from absl import app from absl import flags from absl import logging -import dataclasses import numpy as np import tensorflow as tf -from official.nlp.bert import tokenization +from official.nlp.tools import tokenization special_symbols = { "": 0, diff --git a/official/nlp/data/create_xlnet_pretraining_data_test.py b/official/nlp/data/create_xlnet_pretraining_data_test.py index 5630411a7eb0e92b2baf6e203547d1c9063ebd79..6a3b96833edfd23f0740ce8b7498c00df03d970e 100644 --- a/official/nlp/data/create_xlnet_pretraining_data_test.py +++ b/official/nlp/data/create_xlnet_pretraining_data_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/data_loader.py b/official/nlp/data/data_loader.py index 2b181270658f42f0819f01fbed5af7989b1d3e5d..b962d5f97405300f26512305fc4b96e6699bf5a2 100644 --- a/official/nlp/data/data_loader.py +++ b/official/nlp/data/data_loader.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
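Each GLUE processor in `classifier_data_lib` now materializes the TFDS iterator and sorts it by the `idx` feature before enumerating. As a result, the `"%s-%s" % (set_type, i)` guid of the i-th example follows the dataset's own index rather than the order in which the iterator happens to yield examples (presumably so that guids, and therefore test-set predictions, stay stable across runs). A minimal sketch of that pattern in plain Python follows. It assumes records are dicts with `idx` and `sentence` keys, a simplified stand-in for the TFDS GLUE features, and `make_guids` is a hypothetical helper, not part of the library.

```python
from typing import Dict, Iterable, List, Tuple


def make_guids(records: Iterable[Dict], set_type: str) -> List[Tuple[str, str]]:
  """Returns (guid, sentence) pairs with guids assigned in `idx` order."""
  # Same effect as `dataset = list(dataset); dataset.sort(key=lambda x: x["idx"])`.
  ordered = sorted(records, key=lambda x: x["idx"])
  return [("%s-%s" % (set_type, i), r["sentence"]) for i, r in enumerate(ordered)]


if __name__ == "__main__":
  # Simulates an iterator that yields examples out of their original order.
  shuffled = [
      {"idx": 2, "sentence": "c"},
      {"idx": 0, "sentence": "a"},
      {"idx": 1, "sentence": "b"},
  ]
  print(make_guids(shuffled, "test"))
```

Sorting requires holding the whole split in memory, which is acceptable for GLUE-sized splits but is a trade-off against streaming the iterator directly.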
diff --git a/official/nlp/data/data_loader_factory.py b/official/nlp/data/data_loader_factory.py index 9602ea295283e5490d1bcb5cc67df9f99ebdb0ca..f3a2decb8c5ea3871490d0b551151c86d6887fcf 100644 --- a/official/nlp/data/data_loader_factory.py +++ b/official/nlp/data/data_loader_factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/data_loader_factory_test.py b/official/nlp/data/data_loader_factory_test.py index 8aa86757df64a445692ce4bf8ff64e6649b6dfa6..518717a3f37d49b67974e5a66e6f308a1532f971 100644 --- a/official/nlp/data/data_loader_factory_test.py +++ b/official/nlp/data/data_loader_factory_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/dual_encoder_dataloader.py b/official/nlp/data/dual_encoder_dataloader.py index af9f1090fceda600373ef785d91fd215e8621fd8..1818d07f0bccb20b94489a14542d8e1bbab42bb3 100644 --- a/official/nlp/data/dual_encoder_dataloader.py +++ b/official/nlp/data/dual_encoder_dataloader.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
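The `dual_encoder_dataloader.py` hunk only adds a pytype suppression to `_switch_key_prefix`, but its visible lines show the behavior: each key's `old` prefix is swapped for `new`, and a key without the expected prefix raises `ValueError`. Below is a standalone, pure-Python sketch of that behavior. The function bodies are reconstructed from the visible lines rather than copied from the file, and the `'input_'`/`'left_'` prefixes in the usage are illustrative assumptions.

```python
from typing import Any, Dict


def switch_prefix(string: str, old: str, new: str) -> str:
  """Replaces the leading `old` prefix of `string` with `new`."""
  if string.startswith(old):
    return new + string[len(old):]
  # Same error message as the visible line in the hunk.
  raise ValueError('Expected {} to start with {}'.format(string, old))


def switch_key_prefix(d: Dict[str, Any], old: str, new: str) -> Dict[str, Any]:
  """Returns a new dict whose keys have the `old` prefix swapped for `new`."""
  return {switch_prefix(key, old, new): value for key, value in d.items()}


# Hypothetical usage: rename tokenizer outputs for the "left" tower.
renamed = switch_key_prefix({'input_word_ids': 1, 'input_mask': 2},
                            'input_', 'left_')
```

Failing loudly on an unexpected key, rather than passing it through unchanged, keeps a misnamed feature from silently reaching the wrong tower.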
@@ -124,7 +124,7 @@ class DualEncoderDataLoader(data_loader.DataLoader): raise ValueError('Expected {} to start with {}'.format(string, old)) def _switch_key_prefix(d, old, new): - return {_switch_prefix(key, old, new): value for key, value in d.items()} + return {_switch_prefix(key, old, new): value for key, value in d.items()} # pytype: disable=attribute-error # trace-all-classes model_inputs = _switch_key_prefix( self._bert_tokenize(record, self._left_text_fields), diff --git a/official/nlp/data/dual_encoder_dataloader_test.py b/official/nlp/data/dual_encoder_dataloader_test.py index 358b0d9635c747a72e0e8f40951f11eb2c5755a0..bebdc1531ef169a93aaa5a1e6c460330db5b6376 100644 --- a/official/nlp/data/dual_encoder_dataloader_test.py +++ b/official/nlp/data/dual_encoder_dataloader_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/pretrain_dataloader.py b/official/nlp/data/pretrain_dataloader.py index dbb7953c3fd7f1562d7b3ec07c58b09eefef8e25..f2a33cd4277cdfac1a71894beb6369ef22d8500a 100644 --- a/official/nlp/data/pretrain_dataloader.py +++ b/official/nlp/data/pretrain_dataloader.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/nlp/data/pretrain_dataloader_test.py b/official/nlp/data/pretrain_dataloader_test.py index 5f3807c907ad9cbb2007425ac13bba620491dce6..ce7f216f9af78ce5d4cc70095444745a1db98ae9 100644 --- a/official/nlp/data/pretrain_dataloader_test.py +++ b/official/nlp/data/pretrain_dataloader_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/pretrain_dynamic_dataloader.py b/official/nlp/data/pretrain_dynamic_dataloader.py index c1de4ba54b86a3386708e3f56b76e8e3726c397d..ab61445468070a674bf5c66a63f7ee83eb5ed4dc 100644 --- a/official/nlp/data/pretrain_dynamic_dataloader.py +++ b/official/nlp/data/pretrain_dynamic_dataloader.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
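In the `PretrainingDynamicDataLoader._decode` hunk, the input-id and segment feature names are chosen from `use_v2_feature_names`, and trailing padding is then trimmed with `tf.math.cumsum(..., reverse=True)`. The pure-Python sketch below mirrors that logic on plain lists. It assumes padding id 0 and non-negative token ids, and `feature_keys` and `trim_trailing_padding` are hypothetical helper names. The real loader works on TF tensors and also trims `position_ids` when `use_position_id` is set, which the sketch omits.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def feature_keys(use_v2_feature_names: bool) -> Tuple[str, str]:
  """Returns (input_ids_key, segment_key) for the chosen naming scheme."""
  if use_v2_feature_names:
    return 'input_word_ids', 'input_type_ids'
  return 'input_ids', 'segment_ids'


def trim_trailing_padding(example: Dict[str, List[int]],
                          use_v2_feature_names: bool) -> Dict[str, List[int]]:
  """Drops trailing pad positions from the variable-length features."""
  input_ids_key, segment_key = feature_keys(use_v2_feature_names)
  dynamic_keys = [input_ids_key, 'input_mask', segment_key]
  ids = example[input_ids_key]
  # Reverse cumulative sum: entry i equals sum(ids[i:]). With non-negative
  # ids it is > 0 exactly when a non-pad token occurs at or after i, so pads
  # *before* the last real token are kept and only trailing pads are dropped.
  rev_cumsum = list(accumulate(reversed(ids)))[::-1]
  keep = [s > 0 for s in rev_cumsum]
  trimmed = dict(example)  # Features outside dynamic_keys pass through.
  for key in dynamic_keys:
    trimmed[key] = [v for v, k in zip(example[key], keep) if k]
  return trimmed


example = {
    'input_word_ids': [101, 0, 7, 102, 0, 0],
    'input_mask': [1, 1, 1, 1, 0, 0],
    'input_type_ids': [0, 0, 0, 0, 0, 0],
}
trimmed = trim_trailing_padding(example, use_v2_feature_names=True)
```

Both branches of the `if` in the hunk register identical `VarLenFeature` specs. Returning only the key names, as `feature_keys` does, would let the `name_to_features.update(...)` call be written once.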
@@ -79,17 +79,29 @@ class PretrainingDynamicDataLoader(pretrain_dataloader.BertPretrainDataLoader): def _decode(self, record: tf.Tensor): """Decodes a serialized tf.Example.""" name_to_features = { - 'input_ids': tf.io.VarLenFeature(tf.int64), 'input_mask': tf.io.VarLenFeature(tf.int64), - 'segment_ids': tf.io.VarLenFeature(tf.int64), 'masked_lm_positions': tf.io.VarLenFeature(tf.int64), 'masked_lm_ids': tf.io.VarLenFeature(tf.int64), 'masked_lm_weights': tf.io.VarLenFeature(tf.float32), } + if self._params.use_v2_feature_names: + input_ids_key = 'input_word_ids' + segment_key = 'input_type_ids' + name_to_features.update({ + input_ids_key: tf.io.VarLenFeature(tf.int64), + segment_key: tf.io.VarLenFeature(tf.int64), + }) + else: + input_ids_key = 'input_ids' + segment_key = 'segment_ids' + name_to_features.update({ + input_ids_key: tf.io.VarLenFeature(tf.int64), + segment_key: tf.io.VarLenFeature(tf.int64), + }) if self._use_next_sentence_label: name_to_features['next_sentence_labels'] = tf.io.FixedLenFeature([1], tf.int64) - dynamic_keys = ['input_ids', 'input_mask', 'segment_ids'] + dynamic_keys = [input_ids_key, 'input_mask', segment_key] if self._use_position_id: name_to_features['position_ids'] = tf.io.VarLenFeature(tf.int64) dynamic_keys.append('position_ids') @@ -102,7 +114,7 @@ class PretrainingDynamicDataLoader(pretrain_dataloader.BertPretrainDataLoader): # sequence length dimension. # Pad before the first non pad from the back should not be removed. 
mask = tf.math.greater( - tf.math.cumsum(example['input_ids'], reverse=True), 0) + tf.math.cumsum(example[input_ids_key], reverse=True), 0) for key in dynamic_keys: example[key] = tf.boolean_mask(example[key], mask) diff --git a/official/nlp/data/pretrain_dynamic_dataloader_test.py b/official/nlp/data/pretrain_dynamic_dataloader_test.py index 188e6d495b71acc2699b82f71a86acb5efbf99f5..3927b7993559f89dc63133a1f1dd913d55fb8675 100644 --- a/official/nlp/data/pretrain_dynamic_dataloader_test.py +++ b/official/nlp/data/pretrain_dynamic_dataloader_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/question_answering_dataloader.py b/official/nlp/data/question_answering_dataloader.py index 0f721ed773a927e8caa8c3cfbaa5cf2ef6c896e5..171c0d3b228f48a4264a2b1b67704a6b54efe11f 100644 --- a/official/nlp/data/question_answering_dataloader.py +++ b/official/nlp/data/question_answering_dataloader.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/question_answering_dataloader_test.py b/official/nlp/data/question_answering_dataloader_test.py index c853bc080cddf9fc5c26a0f7f21cff19088bad9f..9767ef0a7c1d68fa7d87f5a4f71da20b9f2f092b 100644 --- a/official/nlp/data/question_answering_dataloader_test.py +++ b/official/nlp/data/question_answering_dataloader_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/sentence_prediction_dataloader.py b/official/nlp/data/sentence_prediction_dataloader.py index 3517edfb9757f26869522d5c40b1c01f256de8e0..c601b9d72d5f9aaba658edb68b6c413648780aaa 100644 --- a/official/nlp/data/sentence_prediction_dataloader.py +++ b/official/nlp/data/sentence_prediction_dataloader.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/sentence_prediction_dataloader_test.py b/official/nlp/data/sentence_prediction_dataloader_test.py index 876b9d421d26a06d8442570a65a60e63022c2fd1..d4f0d8559b10c2871fc8daf38af19ad90a8c77a6 100644 --- a/official/nlp/data/sentence_prediction_dataloader_test.py +++ b/official/nlp/data/sentence_prediction_dataloader_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/sentence_retrieval_lib.py b/official/nlp/data/sentence_retrieval_lib.py index 0bfd8e4dec5afba3eb00ff23e3f75a0cc5818958..947dbb77949789f9564e610e7398051dc7b63d70 100644 --- a/official/nlp/data/sentence_retrieval_lib.py +++ b/official/nlp/data/sentence_retrieval_lib.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -17,8 +17,8 @@ import os from absl import logging -from official.nlp.bert import tokenization from official.nlp.data import classifier_data_lib +from official.nlp.tools import tokenization class BuccProcessor(classifier_data_lib.DataProcessor): diff --git a/official/nlp/data/squad_lib.py b/official/nlp/data/squad_lib.py index e96838664c38db4f6cdc2d39f10ad68baeac25e5..2d198e6c1b442f68446d998be4eb8a9c9354a828 100644 --- a/official/nlp/data/squad_lib.py +++ b/official/nlp/data/squad_lib.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -25,7 +25,7 @@ import six from absl import logging import tensorflow as tf -from official.nlp.bert import tokenization +from official.nlp.tools import tokenization class SquadExample(object): diff --git a/official/nlp/data/squad_lib_sp.py b/official/nlp/data/squad_lib_sp.py index 021193d4114004adceb5a0197a46842cd9d4601b..abd4abfbc09ca8f97db977294249cf250655658e 100644 --- a/official/nlp/data/squad_lib_sp.py +++ b/official/nlp/data/squad_lib_sp.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -28,7 +28,7 @@ from absl import logging import numpy as np import tensorflow as tf -from official.nlp.bert import tokenization +from official.nlp.tools import tokenization class SquadExample(object): diff --git a/official/nlp/data/tagging_data_lib.py b/official/nlp/data/tagging_data_lib.py index f6b9c19744be9b6b3730e65dd24c1e19730f8e47..c73d7108a9b2f41a1288295cbe27663a30de0ccc 100644 --- a/official/nlp/data/tagging_data_lib.py +++ b/official/nlp/data/tagging_data_lib.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,8 +19,8 @@ import os from absl import logging import tensorflow as tf -from official.nlp.bert import tokenization from official.nlp.data import classifier_data_lib +from official.nlp.tools import tokenization # A negative label id for the padding label, which will not contribute # to loss/metrics in training. diff --git a/official/nlp/data/tagging_data_lib_test.py b/official/nlp/data/tagging_data_lib_test.py index afbfebdef30586faa1ec0f362ee15d10461df1c3..6a1679f5e28f497a9bccff9bef9e684636a45247 100644 --- a/official/nlp/data/tagging_data_lib_test.py +++ b/official/nlp/data/tagging_data_lib_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -19,8 +19,8 @@ import random from absl.testing import parameterized import tensorflow as tf -from official.nlp.bert import tokenization from official.nlp.data import tagging_data_lib +from official.nlp.tools import tokenization def _create_fake_file(filename, labels, is_test): diff --git a/official/nlp/data/tagging_dataloader.py b/official/nlp/data/tagging_dataloader.py index daecb8e3d8c75e2a6127f9be2892fd504d1a4385..f02d49ab94b59cfda325d41c37eba537cffdb578 100644 --- a/official/nlp/data/tagging_dataloader.py +++ b/official/nlp/data/tagging_dataloader.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/tagging_dataloader_test.py b/official/nlp/data/tagging_dataloader_test.py index 2ff5fc7f2fa9e2715cac68a9648a2dd920405a60..3d2be5e97c33fde79bb6ab9f44d8874d940ca92b 100644 --- a/official/nlp/data/tagging_dataloader_test.py +++ b/official/nlp/data/tagging_dataloader_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/train_sentencepiece.py b/official/nlp/data/train_sentencepiece.py index 4d3b05c46472e55c9b804da2aa45dabfd4867b7f..5b9f944dbeea6452c0c21baedf03b3e24d463f05 100644 --- a/official/nlp/data/train_sentencepiece.py +++ b/official/nlp/data/train_sentencepiece.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -36,7 +36,7 @@ from sentencepiece import SentencePieceTrainer FLAGS = flags.FLAGS flags.DEFINE_string("output_model_path", None, - "Path to save the the sentencepiece model.") + "Path to save the sentencepiece model.") flags.mark_flag_as_required("output_model_path") flags.DEFINE_string("tfds_dir", None, "Directory of the tfds.") diff --git a/official/nlp/data/wmt_dataloader.py b/official/nlp/data/wmt_dataloader.py index e0521ad47805b05b83287248f9db80dd1881e140..e801e9d74334d75cc944add173e55ed2bffc1686 100644 --- a/official/nlp/data/wmt_dataloader.py +++ b/official/nlp/data/wmt_dataloader.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/data/wmt_dataloader_test.py b/official/nlp/data/wmt_dataloader_test.py index a4454d96d889504251d50070863b9447b2263648..82e56f599d2ea6ead153448521cd8f0ed38ee4e8 100644 --- a/official/nlp/data/wmt_dataloader_test.py +++ b/official/nlp/data/wmt_dataloader_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/docs/README.md b/official/nlp/docs/README.md new file mode 100644 index 0000000000000000000000000000000000000000..5e4cd05342e2e78268ac73ab882e9ae58f476a74 --- /dev/null +++ b/official/nlp/docs/README.md @@ -0,0 +1,13 @@ +This directory contains guides to help users train NLP models. + +1. [Training guide](train.md) explains the steps to follow for training NLP +models on GPU and TPU. + +2.
[Pretrained_models guide](pretrained_models.md) explains how to load +pre-trained NLP models (baselines and checkpoints) that can be fine-tuned +further depending on the application. + +3. [TF-Hub guide](tfhub.md) explains how to use TF-NLP's +[export_tfhub](https://github.com/tensorflow/models/blob/master/official/nlp/tools/export_tfhub.py) +tool to export pre-trained Transformer encoders to the SavedModel format, which is +suitable for publication on TF Hub. diff --git a/official/nlp/docs/train.md b/official/nlp/docs/train.md index f2c5245bf6a80af8ff27450494677dbc62dbe925..5a07901cdeb41c54ff279b6c76dea1cec4e8d55d 100644 --- a/official/nlp/docs/train.md +++ b/official/nlp/docs/train.md @@ -1,12 +1,14 @@ # Model Garden NLP Common Training Driver -[train.py](https://github.com/tensorflow/models/blob/master/official/nlp/train.py) is the common training driver that supports multiple +[train.py](https://github.com/tensorflow/models/blob/master/official/nlp/train.py) +is the common training driver that supports multiple NLP tasks (e.g., pre-training, GLUE and SQuAD fine-tuning etc) and multiple models (e.g., BERT, ALBERT, MobileBERT etc). ## Experiment Configuration -[train.py] is driven by configs defined by the [ExperimentConfig](https://github.com/tensorflow/models/blob/master/official/core/config_definitions.py) +[train.py](https://github.com/tensorflow/models/blob/master/official/nlp/train.py) +is driven by configs defined by the [ExperimentConfig](https://github.com/tensorflow/models/blob/master/official/core/config_definitions.py) including configurations for `task`, `trainer` and `runtime`. The pre-defined NLP related [ExperimentConfig](https://github.com/tensorflow/models/blob/master/official/core/config_definitions.py) can be found in [configs/experiment_configs.py](https://github.com/tensorflow/models/blob/master/official/nlp/configs/experiment_configs.py). @@ -78,7 +80,9 @@ setting `task.validation_data.input_path` in `PARAMS`.
## Run on Cloud TPUs -Next, we will describe how to run the [train.py](https://github.com/tensorflow/models/blob/master/official/nlp/train.py) on Cloud TPUs. +Next, we will describe how to run +the [train.py](https://github.com/tensorflow/models/blob/master/official/nlp/train.py) +on Cloud TPUs. ### Setup First, you need to create a `tf-nightly` TPU with @@ -99,7 +103,9 @@ pip3 install --user -r official/requirements.txt ### Fine-tuning Sentence Classification with BERT from TF-Hub -This example fine-tunes BERT-base from TF-Hub on the the Multi-Genre Natural +
+ +This example fine-tunes BERT-base from TF-Hub on the Multi-Genre Natural Language Inference (MultiNLI) corpus using TPUs. Firstly, you can prepare the fine-tuning data using @@ -163,8 +169,12 @@ python3 train.py \ You can monitor the training progress in the console and find the output models in `$OUTPUT_DIR`. +
+ ### Fine-tuning SQuAD with a pre-trained BERT checkpoint +
+ This example fine-tunes a pre-trained BERT checkpoint on the Stanford Question Answering Dataset (SQuAD) using TPUs. The [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/) contains @@ -219,4 +229,73 @@ python3 train.py \ ``` -Note: More examples about pre-training will come soon. +### Pre-train a BERT from scratch + +
+ +This example pre-trains a BERT model on the Wikipedia and Books datasets used by +the original BERT paper. +The [BERT repo](https://github.com/tensorflow/models/blob/master/official/nlp/data/create_pretraining_data.py) +contains detailed information about the Wikipedia dump and +[BookCorpus](https://yknzhu.wixsite.com/mbweb). The pre-training recipe is +generic, so you can apply it to your own corpus as well. + +Use the script +[`create_pretraining_data.py`](https://github.com/tensorflow/models/blob/master/official/nlp/data/create_pretraining_data.py) +to generate the processed pre-training data. It is branched from the +[BERT research repo](https://github.com/google-research/bert) and adapted to +TF2 symbols and Python 3. + +Running this data-creation script requires an input and an output location, as +well as a vocab file. Note that `max_seq_length` must match the sequence +length parameter you specify when you run pre-training. + +```shell +export WORKING_DIR='local disk or cloud location' +export BERT_DIR='local disk or cloud location' +python models/official/nlp/data/create_pretraining_data.py \ + --input_file=$WORKING_DIR/input/input.txt \ + --output_file=$WORKING_DIR/output/tf_examples.tfrecord \ + --vocab_file=$BERT_DIR/wwm_uncased_L-24_H-1024_A-16/vocab.txt \ + --do_lower_case=True \ + --max_seq_length=512 \ + --max_predictions_per_seq=76 \ + --masked_lm_prob=0.15 \ + --random_seed=12345 \ + --dupe_factor=5 +``` + +Then, update the YAML configuration file (e.g., +`configs/experiments/wiki_books_pretrain.yaml`) to specify your data paths and +update the masking-related hyperparameters to match the settings used to +create the pre-training data. When your data has multiple shards, you can +use `*` to include multiple files.
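The masking hyperparameters in the YAML file need to agree with the flags used to create the data. The sketch below shows why `--max_predictions_per_seq=76` pairs with `--max_seq_length=512` and `--masked_lm_prob=0.15`. It assumes the per-sequence masking budget follows the rule in BERT's reference `create_pretraining_data.py`: at least one prediction, at most `max_predictions_per_seq`. The helper name `num_masked_predictions` is hypothetical, so check the script for the exact rule.

```python
def num_masked_predictions(num_tokens: int,
                           masked_lm_prob: float,
                           max_predictions_per_seq: int) -> int:
  """Number of tokens selected for masked-LM prediction in one sequence."""
  # round(num_tokens * masked_lm_prob), clamped to [1, max_predictions_per_seq].
  return min(max_predictions_per_seq,
             max(1, int(round(num_tokens * masked_lm_prob))))


# A full 512-token sequence would ask for round(76.8) = 77 predictions,
# so the cap of 76 is what actually applies.
full_length = num_masked_predictions(512, 0.15, 76)
# Shorter sequences stay under the cap: round(19.2) = 19.
short_length = num_masked_predictions(128, 0.15, 76)
```

Under this rule, the sequence-length and max-predictions values in the pre-training config should match the flag values above.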
+ +To train a different BERT size, you need to adjust: + +``` +model: + cls_heads: [{activation: tanh, cls_token_idx: 0, dropout_rate: 0.1, inner_dim: 768, name: next_sentence, num_classes: 2}] +``` + +so that `inner_dim` matches the model's hidden size. + +Then, you can start the training and evaluation jobs, which run the +[`bert/pretraining`](https://github.com/tensorflow/models/blob/master/official/nlp/configs/pretraining_experiments.py#L51) +experiment: + +```shell +export OUTPUT_DIR=gs://some_bucket/my_output_dir +export PARAMS=$PARAMS,runtime.distribution_strategy=tpu + +python3 train.py \ + --experiment=bert/pretraining \ + --mode=train_and_eval \ + --model_dir=$OUTPUT_DIR \ + --config_file=configs/models/bert_en_uncased_base.yaml \ + --config_file=configs/experiments/wiki_books_pretrain.yaml \ + --tpu=${TPU_NAME} \ + --params_override=$PARAMS +``` + +Note: More examples of pre-training with TFDS datasets will come soon. diff --git a/official/nlp/finetuning/binary_helper.py b/official/nlp/finetuning/binary_helper.py index 7e0ffc610a1f853f567bae18c4c794ad43f86075..10ad91e9377eb042add8724478e72648376577ee 100644 --- a/official/nlp/finetuning/binary_helper.py +++ b/official/nlp/finetuning/binary_helper.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/finetuning/glue/flags.py b/official/nlp/finetuning/glue/flags.py index 0f684fc916fb178cdaa542855ce3cffaa8627a9d..0ad161bc662a2aadcd3cbe0dde008931809613cd 100644 --- a/official/nlp/finetuning/glue/flags.py +++ b/official/nlp/finetuning/glue/flags.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/finetuning/glue/run_glue.py b/official/nlp/finetuning/glue/run_glue.py index aa1b047f3e6413e84e7f5882cbfcef26e3c2cad3..d1e67aa695d883592d5b648ee5fcd9539c429923 100644 --- a/official/nlp/finetuning/glue/run_glue.py +++ b/official/nlp/finetuning/glue/run_glue.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -55,9 +55,9 @@ EVAL_METRIC_MAP = { 'AX': 'matthews_corrcoef', 'COLA': 'matthews_corrcoef', 'MNLI': 'cls_accuracy', - 'MRPC': 'cls_accuracy', + 'MRPC': 'f1', 'QNLI': 'cls_accuracy', - 'QQP': 'cls_accuracy', + 'QQP': 'f1', 'RTE': 'cls_accuracy', 'SST-2': 'cls_accuracy', 'STS-B': 'pearson_spearman_corr', @@ -93,11 +93,16 @@ def _override_exp_config_by_flags(exp_config, input_meta_data): binary_helper.override_sentence_prediction_task_config, num_classes=input_meta_data['num_labels'], metric_type='matthews_corrcoef') - elif FLAGS.task_name in ('MNLI', 'MRPC', 'QNLI', 'QQP', 'RTE', 'SST-2', + elif FLAGS.task_name in ('MNLI', 'QNLI', 'RTE', 'SST-2', 'WNLI'): override_task_cfg_fn = functools.partial( binary_helper.override_sentence_prediction_task_config, num_classes=input_meta_data['num_labels']) + elif FLAGS.task_name in ('QQP', 'MRPC'): + override_task_cfg_fn = functools.partial( + binary_helper.override_sentence_prediction_task_config, + metric_type='f1', + num_classes=input_meta_data['num_labels']) elif FLAGS.task_name in ('STS-B',): override_task_cfg_fn = functools.partial( binary_helper.override_sentence_prediction_task_config, diff --git a/official/nlp/finetuning/superglue/flags.py b/official/nlp/finetuning/superglue/flags.py index 
7c2f0ba72b43c65895b5ba89c5156ef9db3a0714..68457ea379be2c76efc3dc519a379da210ae9687 100644 --- a/official/nlp/finetuning/superglue/flags.py +++ b/official/nlp/finetuning/superglue/flags.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/finetuning/superglue/run_superglue.py b/official/nlp/finetuning/superglue/run_superglue.py index 01025a88f93fabc06f6da0d00f148aaf815c9af2..773abbd0315440f023aa70f31457252c61aaefc1 100644 --- a/official/nlp/finetuning/superglue/run_superglue.py +++ b/official/nlp/finetuning/superglue/run_superglue.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/metrics/__init__.py b/official/nlp/metrics/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/nlp/metrics/__init__.py +++ b/official/nlp/metrics/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/metrics/bleu.py b/official/nlp/metrics/bleu.py index 7a17db6d870f2981116e70b5556bd7a785c84476..01c6ae5faa2fc5eeafab4cd793c1cd3db3def4f4 100644 --- a/official/nlp/metrics/bleu.py +++ b/official/nlp/metrics/bleu.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/metrics/bleu_test.py b/official/nlp/metrics/bleu_test.py index e410ae80598a47ee660a56ae1ba8c73df20389c5..9097ad8fd2ccb5e3cf9299addaf5f6b11aefe776 100644 --- a/official/nlp/metrics/bleu_test.py +++ b/official/nlp/metrics/bleu_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/README.md b/official/nlp/modeling/README.md index 99c7c361f9716b380a9287306558b872238afa7e..05a90248b6dfb176591386e1aac38064c40cbf67 100644 --- a/official/nlp/modeling/README.md +++ b/official/nlp/modeling/README.md @@ -20,8 +20,7 @@ examples. * [`losses`](losses) contains common loss computation used in NLP tasks. Please see the colab -[nlp_modeling_library_intro.ipynb] -(https://colab.sandbox.google.com/github/tensorflow/models/blob/master/official/colab/nlp/nlp_modeling_library_intro.ipynb) +[NLP modeling library intro.ipynb](https://colab.sandbox.google.com/github/tensorflow/models/blob/master/docs/nlp/index.ipynb) for how to build transformer-based NLP models using above primitives. Besides the pre-defined primitives, it also provides scaffold classes to allow @@ -44,8 +43,7 @@ custom hidden layer (which will replace the Transformer instantiation in the encoder). Please see the colab -[customize_encoder.ipynb] -(https://colab.sandbox.google.com/github/tensorflow/models/blob/master/official/colab/nlp/customize_encoder.ipynb) +[customize_encoder.ipynb](https://colab.sandbox.google.com/github/tensorflow/models/blob/master/docs/nlp/customize_encoder.ipynb) for how to use scaffold classes to build noval achitectures. 
BERT and ALBERT models in this repo are implemented using this library. diff --git a/official/nlp/modeling/__init__.py b/official/nlp/modeling/__init__.py index 3beacedc96f26caa1db985256fd460b7ba3e543f..6159d986d397e381878b89ba46810cb1f5726989 100644 --- a/official/nlp/modeling/__init__.py +++ b/official/nlp/modeling/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/layers/README.md b/official/nlp/modeling/layers/README.md index fb4069abd5bad9e8617f3cd74ec51fe9be262c60..54c00661ffd3f49dbd7f7ddfda9fb9bac00770fa 100644 --- a/official/nlp/modeling/layers/README.md +++ b/official/nlp/modeling/layers/README.md @@ -13,7 +13,7 @@ assemble new `tf.keras` layers or models. ["Big Bird: Transformers for Longer Sequences"](https://arxiv.org/abs/2007.14062). * [CachedAttention](attention.py) implements an attention layer with cache - used for auto-agressive decoding. + used for autoregressive decoding. * [KernelAttention](kernel_attention.py) implements a group of attention mechansim that express the self-attention as a linear dot-product of diff --git a/official/nlp/modeling/layers/__init__.py b/official/nlp/modeling/layers/__init__.py index f8f475d40a50d8f05d49e49ee24a6855c1ee13a7..27a161b69cf19ea8ed9a4cbc1452b6fcaaf8bc93 100644 --- a/official/nlp/modeling/layers/__init__.py +++ b/official/nlp/modeling/layers/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -20,7 +20,9 @@ They can be used to assemble new `tf.keras` layers or models.
from official.nlp.modeling.layers.attention import * from official.nlp.modeling.layers.bigbird_attention import BigBirdAttention from official.nlp.modeling.layers.bigbird_attention import BigBirdMasks +from official.nlp.modeling.layers.block_diag_feedforward import BlockDiagFeedforward from official.nlp.modeling.layers.cls_head import * +from official.nlp.modeling.layers.factorized_embedding import FactorizedEmbedding from official.nlp.modeling.layers.gated_feedforward import GatedFeedforward from official.nlp.modeling.layers.gaussian_process import RandomFeatureGaussianProcess from official.nlp.modeling.layers.kernel_attention import KernelAttention @@ -28,11 +30,19 @@ from official.nlp.modeling.layers.kernel_attention import KernelMask from official.nlp.modeling.layers.masked_lm import MaskedLM from official.nlp.modeling.layers.masked_softmax import MaskedSoftmax from official.nlp.modeling.layers.mat_mul_with_margin import MatMulWithMargin +from official.nlp.modeling.layers.mixing import FourierTransformLayer +from official.nlp.modeling.layers.mixing import HartleyTransformLayer +from official.nlp.modeling.layers.mixing import LinearTransformLayer +from official.nlp.modeling.layers.mixing import MixingMechanism from official.nlp.modeling.layers.mobile_bert_layers import MobileBertEmbedding from official.nlp.modeling.layers.mobile_bert_layers import MobileBertMaskedLM from official.nlp.modeling.layers.mobile_bert_layers import MobileBertTransformer from official.nlp.modeling.layers.multi_channel_attention import * from official.nlp.modeling.layers.on_device_embedding import OnDeviceEmbedding +from official.nlp.modeling.layers.pack_optimization import PackBertEmbeddings +from official.nlp.modeling.layers.pack_optimization import StridedTransformerEncoderBlock +from official.nlp.modeling.layers.pack_optimization import StridedTransformerScaffold +from official.nlp.modeling.layers.per_dim_scale_attention import PerDimScaleAttention from 
official.nlp.modeling.layers.position_embedding import PositionEmbedding from official.nlp.modeling.layers.position_embedding import RelativePositionBias from official.nlp.modeling.layers.position_embedding import RelativePositionEmbedding @@ -41,6 +51,7 @@ from official.nlp.modeling.layers.relative_attention import TwoStreamRelativeAtt from official.nlp.modeling.layers.reuse_attention import ReuseMultiHeadAttention from official.nlp.modeling.layers.reuse_transformer import ReuseTransformer from official.nlp.modeling.layers.rezero_transformer import ReZeroTransformer +from official.nlp.modeling.layers.routing import * from official.nlp.modeling.layers.self_attention_mask import SelfAttentionMask from official.nlp.modeling.layers.spectral_normalization import * from official.nlp.modeling.layers.talking_heads_attention import TalkingHeadsAttention @@ -49,7 +60,8 @@ from official.nlp.modeling.layers.text_layers import BertTokenizer from official.nlp.modeling.layers.text_layers import FastWordpieceBertTokenizer from official.nlp.modeling.layers.text_layers import SentencepieceTokenizer from official.nlp.modeling.layers.tn_transformer_expand_condense import TNTransformerExpandCondense -from official.nlp.modeling.layers.transformer import * +from official.nlp.modeling.layers.transformer import Transformer +from official.nlp.modeling.layers.transformer import TransformerDecoderBlock from official.nlp.modeling.layers.transformer_encoder_block import TransformerEncoderBlock from official.nlp.modeling.layers.transformer_scaffold import TransformerScaffold from official.nlp.modeling.layers.transformer_xl import TransformerXL diff --git a/official/nlp/modeling/layers/attention.py b/official/nlp/modeling/layers/attention.py index 9b13b89695d2e869b723270d5ac6ed929a3d1369..9d874d7bff8c9d7bb828b365bfccfdbc1ffc16e9 100644 --- a/official/nlp/modeling/layers/attention.py +++ b/official/nlp/modeling/layers/attention.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. 
All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,13 +18,13 @@ import math import tensorflow as tf -EinsumDense = tf.keras.layers.experimental.EinsumDense +EinsumDense = tf.keras.layers.EinsumDense MultiHeadAttention = tf.keras.layers.MultiHeadAttention @tf.keras.utils.register_keras_serializable(package="Text") class CachedAttention(tf.keras.layers.MultiHeadAttention): - """Attention layer with cache used for auto-agressive decoding. + """Attention layer with cache used for autoregressive decoding. Arguments are the same as `tf.keras.layers.MultiHeadAttention` layer. """ diff --git a/official/nlp/modeling/layers/attention_test.py b/official/nlp/modeling/layers/attention_test.py index e09f88980cc60d35a40b755c45cad1a802dbfadc..1f3d73d164ae5686c0cbbb18fb3e2c1d51c10d56 100644 --- a/official/nlp/modeling/layers/attention_test.py +++ b/official/nlp/modeling/layers/attention_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/layers/bigbird_attention.py b/official/nlp/modeling/layers/bigbird_attention.py index 4d3c662442965bf88018247a90fa37e0f331cfa5..8f6f3d614713ec87db5b1a6557382e4da9776e45 100644 --- a/official/nlp/modeling/layers/bigbird_attention.py +++ b/official/nlp/modeling/layers/bigbird_attention.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
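The central idea of the `BlockDiagFeedforward` layer in `block_diag_feedforward.py` is that its output `EinsumDense` (equation `"abde,deo->abdo"`) applies one small kernel per block. That is equivalent to multiplying the flattened intermediate activations by a block-diagonal matrix, which stores `num_blocks` times fewer weights than a full dense kernel. The NumPy sketch below checks this equivalence. It is an illustration only: the sizes are borrowed from the layer's test (`intermediate_size=16`, `width=32`, `num_blocks=2`), the bias, activation, mixing and dropout are omitted, and the helper names are hypothetical, not part of the library.

```python
import numpy as np


def block_diag_output(x, kernel):
    """Per-block projection, same contraction as "abde,deo->abdo" (no bias).

    x: [batch, seq, num_blocks, in_per_block]
    kernel: [num_blocks, in_per_block, out_per_block]
    """
    return np.einsum("abde,deo->abdo", x, kernel)


def to_block_diagonal(kernel):
    """Hypothetical helper: expands per-block kernels into one dense matrix."""
    num_blocks, d_in, d_out = kernel.shape
    full = np.zeros((num_blocks * d_in, num_blocks * d_out))
    for i in range(num_blocks):
        # Block i only connects input slice i to output slice i.
        full[i * d_in:(i + 1) * d_in, i * d_out:(i + 1) * d_out] = kernel[i]
    return full


rng = np.random.default_rng(0)
batch, seq, num_blocks = 2, 3, 2
intermediate_size, hidden_size = 16, 32
d_in = intermediate_size // num_blocks
d_out = hidden_size // num_blocks

x = rng.normal(size=(batch, seq, num_blocks, d_in))
kernel = rng.normal(size=(num_blocks, d_in, d_out))

# Block-wise result, reshaped like the layer's final Reshape((-1, hidden_size)).
blocked = block_diag_output(x, kernel).reshape(batch, seq, hidden_size)
# Same result from an ordinary matmul with the expanded block-diagonal matrix.
dense = x.reshape(batch, seq, intermediate_size) @ to_block_diagonal(kernel)

print(np.allclose(blocked, dense))  # True
print(kernel.size, intermediate_size * hidden_size)  # 256 512
```

Because each block only sees its own slice of the intermediate activations, the layer's optional `output_mixing` projection (`"abdo,de->abeo"`) exists to recombine information across blocks.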
diff --git a/official/nlp/modeling/layers/bigbird_attention_test.py b/official/nlp/modeling/layers/bigbird_attention_test.py index adafa9316e26610c868bbcd8c5cd735c46d5ad35..3764ce49db67a8e6c7c48b3be76890da4cc3b8ff 100644 --- a/official/nlp/modeling/layers/bigbird_attention_test.py +++ b/official/nlp/modeling/layers/bigbird_attention_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/layers/block_diag_feedforward.py b/official/nlp/modeling/layers/block_diag_feedforward.py new file mode 100644 index 0000000000000000000000000000000000000000..a781d7afa231c34066ff0ab8837aef312e96d1a4 --- /dev/null +++ b/official/nlp/modeling/layers/block_diag_feedforward.py @@ -0,0 +1,172 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Keras-based block diagonal feedforward layer.""" +# pylint: disable=g-classes-have-attributes +from typing import Optional + +import tensorflow as tf + +from official.modeling import tf_utils + + +class BlockDiagFeedforward(tf.keras.layers.Layer): + """Block diagonal feedforward layer. + + This layer replaces the weight matrix of the output_dense layer with a block + diagonal matrix to save layer parameters and FLOPs.
A linear mixing layer can + be added optionally to improve layer expressiveness. + + Args: + intermediate_size: Size of the intermediate layer. + intermediate_activation: Activation for the intermediate layer. + dropout: Dropout probability for the output dropout. + num_blocks: The number of blocks for the block diagonal matrix of the + output_dense layer. + apply_mixing: Apply linear mixing across blocks if True. + kernel_initializer: Initializer for dense layer kernels. + bias_initializer: Initializer for dense layer biases. + kernel_regularizer: Regularizer for dense layer kernels. + bias_regularizer: Regularizer for dense layer biases. + activity_regularizer: Regularizer for dense layer activity. + kernel_constraint: Constraint for dense layer kernels. + bias_constraint: Constraint for dense layer biases. + """ + + def __init__( + self, + intermediate_size: int, + intermediate_activation: str, + dropout: float, + num_blocks: int = 1, + apply_mixing: bool = True, + kernel_initializer: str = "glorot_uniform", + bias_initializer: str = "zeros", + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + activity_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + kernel_constraint: Optional[tf.keras.constraints.Constraint] = None, + bias_constraint: Optional[tf.keras.constraints.Constraint] = None, + **kwargs): # pylint: disable=g-doc-args + super().__init__(**kwargs) + self._intermediate_size = intermediate_size + self._intermediate_activation = intermediate_activation + self._dropout = dropout + self._num_blocks = num_blocks + self._apply_mixing = apply_mixing + + if intermediate_size % num_blocks != 0: + raise ValueError("intermediate_size (%d) must be a multiple of num_blocks " + "(%d)."
% (intermediate_size, num_blocks)) + + self._kernel_initializer = tf.keras.initializers.get(kernel_initializer) + self._bias_initializer = tf.keras.initializers.get(bias_initializer) + self._kernel_regularizer = tf.keras.regularizers.get(kernel_regularizer) + self._bias_regularizer = tf.keras.regularizers.get(bias_regularizer) + self._activity_regularizer = tf.keras.regularizers.get(activity_regularizer) + self._kernel_constraint = tf.keras.constraints.get(kernel_constraint) + self._bias_constraint = tf.keras.constraints.get(bias_constraint) + + def build(self, input_shape): + hidden_size = input_shape.as_list()[-1] + + common_kwargs = dict( + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activity_regularizer=self._activity_regularizer, + kernel_constraint=self._kernel_constraint, + bias_constraint=self._bias_constraint) + + self._intermediate_dense = tf.keras.layers.EinsumDense( + "abc,cde->abde", + output_shape=(None, self._num_blocks, + self._intermediate_size // self._num_blocks), + bias_axes="de", + name="intermediate", + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), + **common_kwargs) + + policy = tf.keras.mixed_precision.global_policy() + if policy.name == "mixed_bfloat16": + # bfloat16 causes BERT with the LAMB optimizer to not converge + # as well, so we use float32. 
+ policy = tf.float32 + self._intermediate_activation_layer = tf.keras.layers.Activation( + self._intermediate_activation, dtype=policy) + + self._output_dense = tf.keras.layers.EinsumDense( + "abde,deo->abdo", + output_shape=(None, self._num_blocks, hidden_size // self._num_blocks), + bias_axes="do", + name="output", + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), + **common_kwargs) + + if self._apply_mixing: + self._output_mixing = tf.keras.layers.EinsumDense( + "abdo,de->abeo", + output_shape=(None, self._num_blocks, + hidden_size // self._num_blocks), + name="output_mixing", + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), + **common_kwargs) + self._output_reshape = tf.keras.layers.Reshape((-1, hidden_size)) + + self._output_dropout = tf.keras.layers.Dropout(rate=self._dropout) + + def get_config(self): + config = { + "intermediate_size": + self._intermediate_size, + "intermediate_activation": + self._intermediate_activation, + "dropout": + self._dropout, + "num_blocks": + self._num_blocks, + "apply_mixing": + self._apply_mixing, + "kernel_initializer": + tf.keras.initializers.serialize(self._kernel_initializer), + "bias_initializer": + tf.keras.initializers.serialize(self._bias_initializer), + "kernel_regularizer": + tf.keras.regularizers.serialize(self._kernel_regularizer), + "bias_regularizer": + tf.keras.regularizers.serialize(self._bias_regularizer), + "activity_regularizer": + tf.keras.regularizers.serialize(self._activity_regularizer), + "kernel_constraint": + tf.keras.constraints.serialize(self._kernel_constraint), + "bias_constraint": + tf.keras.constraints.serialize(self._bias_constraint) + } + base_config = super().get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs): + intermediate_output = 
self._intermediate_dense(inputs) + intermediate_output = self._intermediate_activation_layer( + intermediate_output) + layer_output = self._output_dense(intermediate_output) + if self._apply_mixing: + layer_output = self._output_mixing(layer_output) + layer_output = self._output_reshape(layer_output) + layer_output = self._output_dropout(layer_output) + + return layer_output diff --git a/official/nlp/modeling/layers/block_diag_feedforward_test.py b/official/nlp/modeling/layers/block_diag_feedforward_test.py new file mode 100644 index 0000000000000000000000000000000000000000..e9b5b4e5e48e05f95a69b7b6c19377cf8d176243 --- /dev/null +++ b/official/nlp/modeling/layers/block_diag_feedforward_test.py @@ -0,0 +1,119 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for Keras-based block diagonal feedforward layer.""" + +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from tensorflow.python.keras import keras_parameterized # pylint: disable=g-direct-tensorflow-import +from official.nlp.modeling.layers import block_diag_feedforward + + +# This decorator runs the test in V1, V2-Eager, and V2-Functional mode. It +# guarantees forward compatibility of this code for the V2 switchover.
+@keras_parameterized.run_all_keras_modes +class BlockDiagFeedforwardTest(keras_parameterized.TestCase): + + def tearDown(self): + super(BlockDiagFeedforwardTest, self).tearDown() + tf.keras.mixed_precision.set_global_policy("float32") + + @parameterized.parameters( + (1, True, "float32"), + (1, True, "mixed_float16"), + (1, False, "float32"), + (1, False, "mixed_float16"), + (2, True, "float32"), + (2, True, "mixed_float16"), + (2, False, "float32"), + (2, False, "mixed_float16"), + ) + def test_layer_creation(self, num_blocks, apply_mixing, dtype): + tf.keras.mixed_precision.set_global_policy(dtype) + kwargs = dict( + intermediate_size=128, + intermediate_activation="relu", + dropout=0.1, + num_blocks=num_blocks, + apply_mixing=apply_mixing, + kernel_initializer="glorot_uniform", + bias_initializer="zeros") + test_layer = block_diag_feedforward.BlockDiagFeedforward(**kwargs) + + sequence_length = 64 + width = 128 + # Create a 3-dimensional input (the first dimension is implicit). + data_tensor = tf.keras.Input(shape=(sequence_length, width)) + output_tensor = test_layer(data_tensor) + # The default output of a transformer layer should be the same as the input. 
+ self.assertEqual(data_tensor.shape.as_list(), output_tensor.shape.as_list()) + + @parameterized.parameters( + (1, True, "float32"), + (1, True, "mixed_float16"), + (1, False, "float32"), + (1, False, "mixed_float16"), + (2, True, "float32"), + (2, True, "mixed_float16"), + (2, False, "float32"), + (2, False, "mixed_float16"), + ) + def test_layer_invocation(self, num_blocks, apply_mixing, dtype): + tf.keras.mixed_precision.set_global_policy(dtype) + kwargs = dict( + intermediate_size=16, + intermediate_activation="relu", + dropout=0.1, + num_blocks=num_blocks, + apply_mixing=apply_mixing, + kernel_initializer="glorot_uniform", + bias_initializer="zeros") + test_layer = block_diag_feedforward.BlockDiagFeedforward(**kwargs) + + sequence_length = 16 + width = 32 + # Create a 3-dimensional input (the first dimension is implicit). + data_tensor = tf.keras.Input(shape=(sequence_length, width)) + output_tensor = test_layer(data_tensor) + + # Create a model from the test layer. + model = tf.keras.Model(data_tensor, output_tensor) + + # Invoke the model on test data. 
+ batch_size = 6 + input_data = 10 * np.random.random_sample( + (batch_size, sequence_length, width)) + output_data = model.predict(input_data) + self.assertEqual(output_data.shape, (batch_size, sequence_length, width)) + + def test_get_config(self): + kwargs = dict( + intermediate_size=16, + intermediate_activation="relu", + dropout=0.1, + num_blocks=2, + apply_mixing=True, + kernel_initializer="glorot_uniform", + bias_initializer="zeros") + test_layer = block_diag_feedforward.BlockDiagFeedforward(**kwargs) + new_layer = block_diag_feedforward.BlockDiagFeedforward.from_config( + test_layer.get_config()) + + self.assertAllEqual(test_layer.get_config(), new_layer.get_config()) + + +if __name__ == "__main__": + tf.test.main() diff --git a/official/nlp/modeling/layers/cls_head.py b/official/nlp/modeling/layers/cls_head.py index 85720df56d404674efcb62888ba3c15d1b11a9a2..2ea2a3eab08708f83c1fdbe263175b6902997f74 100644 --- a/official/nlp/modeling/layers/cls_head.py +++ b/official/nlp/modeling/layers/cls_head.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -57,12 +57,14 @@ class ClassificationHead(tf.keras.layers.Layer): self.dense = tf.keras.layers.Dense( units=self.inner_dim, activation=self.activation, - kernel_initializer=self.initializer, + kernel_initializer=tf_utils.clone_initializer(self.initializer), name="pooler_dense") self.dropout = tf.keras.layers.Dropout(rate=self.dropout_rate) self.out_proj = tf.keras.layers.Dense( - units=num_classes, kernel_initializer=self.initializer, name="logits") + units=num_classes, + kernel_initializer=tf_utils.clone_initializer(self.initializer), + name="logits") def call(self, features: tf.Tensor, only_project: bool = False): """Implements call(). 
@@ -146,14 +148,15 @@ class MultiClsHeads(tf.keras.layers.Layer): self.dense = tf.keras.layers.Dense( units=inner_dim, activation=self.activation, - kernel_initializer=self.initializer, + kernel_initializer=tf_utils.clone_initializer(self.initializer), name="pooler_dense") self.dropout = tf.keras.layers.Dropout(rate=self.dropout_rate) self.out_projs = [] for name, num_classes in cls_list: self.out_projs.append( tf.keras.layers.Dense( - units=num_classes, kernel_initializer=self.initializer, + units=num_classes, + kernel_initializer=tf_utils.clone_initializer(self.initializer), name=name)) def call(self, features: tf.Tensor, only_project: bool = False): @@ -277,7 +280,7 @@ class GaussianProcessClassificationHead(ClassificationHead): if use_gp_layer: self.out_proj = gaussian_process.RandomFeatureGaussianProcess( self.num_classes, - kernel_initializer=self.initializer, + kernel_initializer=tf_utils.clone_initializer(self.initializer), name="logits", **self.gp_layer_kwargs) @@ -361,3 +364,97 @@ def extract_spec_norm_kwargs(kwargs): return dict( iteration=kwargs.pop("iteration", 1), norm_multiplier=kwargs.pop("norm_multiplier", .99)) + + +class PerQueryDenseHead(tf.keras.layers.Layer): + """Pooling head used for EncT5 style models. + + This layer applies a separate dense projection to each query. + + For an input of shape [bs, num_queries, hidden_size], it projects each query + to `features` outputs, giving an output of shape [bs, num_queries, features]. + + For example, for classification with a few classes, one may use num_queries + as 1 and features as the number of classes. For multilabel classification, + one may use num_queries as the number of classes and features as 2, so each + query represents a binary classification of one label. + """ + + def __init__(self, + num_queries: int, + features: int, + use_bias: bool = False, + kernel_initializer: str = "glorot_uniform", + **kwargs): + """Initializes the `PerQueryDenseHead`.
+ + Args: + num_queries: number of queries (the learnable embeddings in the input + sequences) from the decoder. + features: Number of output features. Each query will be + projected to this number with its own projection. + use_bias: whether to add a bias to the output. + kernel_initializer: Initializer for dense layer kernels. + **kwargs: Keyword arguments. + """ + super().__init__(**kwargs) + self.num_queries = num_queries + self.features = features + + self.use_bias = use_bias + self.kernel_initializer = tf.keras.initializers.get(kernel_initializer) + + def build(self, input_shape): + input_shape = tf.TensorShape(input_shape) + # Hidden size. + last_dim = tf.compat.dimension_value(input_shape[-1]) + + self.hidden_size = last_dim + self.kernel = self.add_weight( + "kernel", + shape=[self.num_queries, last_dim, self.features], + initializer=self.kernel_initializer, + dtype=self.dtype, + trainable=True) + if self.use_bias: + self.bias = self.add_weight( + "bias", + shape=[ + self.num_queries, + self.features, + ], + dtype=self.dtype, + trainable=True) + else: + self.bias = None + + def call(self, inputs: tf.Tensor) -> tf.Tensor: + """Implements call(). + + Args: + inputs: a rank-3 Tensor of shape= [bs, num_queries, hidden_size]. + + Returns: + A Tensor, shape= [batch size, num_queries, features].
+ """ + + outputs = tf.einsum("bqh,qhf->bqf", inputs, self.kernel) + if self.use_bias: + outputs += self.bias + return outputs + + def get_config(self): + config = { + "num_queries": + self.num_queries, + "features": + self.features, + "kernel_initializer": + tf.keras.initializers.serialize(self.kernel_initializer), + } + config.update(super(PerQueryDenseHead, self).get_config()) + return config + + @classmethod + def from_config(cls, config, custom_objects=None): + return cls(**config) diff --git a/official/nlp/modeling/layers/cls_head_test.py b/official/nlp/modeling/layers/cls_head_test.py index 4c640baf414df279b4e59585fc5fd32e6589cfd0..a0bfe15ae8bd5d45843d554239df05572639298f 100644 --- a/official/nlp/modeling/layers/cls_head_test.py +++ b/official/nlp/modeling/layers/cls_head_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -199,5 +199,29 @@ class GaussianProcessClassificationHead(tf.test.TestCase, self.assertEqual(layer_config["norm_multiplier"], 1.)
self.assertEqual(layer_config["num_inducing"], 512) + +class PerQueryDenseHeadTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.named_parameters(("single_query", 1, 3, False), + ("multi_queries", 10, 2, False), + ("with_bias", 10, 2, True)) + def test_layer_invocation(self, num_queries, features, use_bias): + batch_size = 5 + hidden_size = 10 + layer = cls_head.PerQueryDenseHead( + num_queries=num_queries, features=features, use_bias=use_bias) + inputs = tf.zeros( + shape=(batch_size, num_queries, hidden_size), dtype=tf.float32) + outputs = layer(inputs) + self.assertEqual(outputs.shape, [batch_size, num_queries, features]) + + def test_layer_serialization(self): + layer = cls_head.PerQueryDenseHead( + num_queries=10, features=2, use_bias=True) + new_layer = cls_head.PerQueryDenseHead.from_config(layer.get_config()) + + # If the serialization was successful, the new config should match the old. + self.assertAllEqual(layer.get_config(), new_layer.get_config()) + if __name__ == "__main__": tf.test.main() diff --git a/official/nlp/modeling/layers/factorized_embedding.py b/official/nlp/modeling/layers/factorized_embedding.py new file mode 100644 index 0000000000000000000000000000000000000000..f19a4ce7883857038d2a4fab56c17e5f83bf0205 --- /dev/null +++ b/official/nlp/modeling/layers/factorized_embedding.py @@ -0,0 +1,76 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""A factorized embedding layer.""" +# pylint: disable=g-classes-have-attributes + +import tensorflow as tf + +from official.modeling import tf_utils +from official.nlp.modeling.layers import on_device_embedding + + +@tf.keras.utils.register_keras_serializable(package='Text') +class FactorizedEmbedding(on_device_embedding.OnDeviceEmbedding): + """A factorized embedding layer for supporting larger embeddings. + + Arguments: + vocab_size: Number of elements in the vocabulary. + embedding_width: Width of the word embedding lookup table. + output_dim: Output width after projecting from embedding_width. + initializer: The initializer to use for the embedding weights. Defaults to + "glorot_uniform". + use_one_hot: Whether to use tf.one_hot over tf.gather for the embedding + lookup. Defaults to False (that is, using tf.gather). Setting this option + to True may improve performance, especially on small vocabulary sizes, but + will generally require more memory. + scale_factor: Whether to scale the output embeddings. Defaults to None (that + is, not to scale). Setting this option to a float multiplies the looked-up + embeddings by scale_factor before the projection.
+ """ + + def __init__(self, + vocab_size: int, + embedding_width: int, + output_dim: int, + initializer='glorot_uniform', + use_one_hot=False, + scale_factor=None, + **kwargs): + super().__init__( + vocab_size=vocab_size, + embedding_width=embedding_width, + initializer=initializer, + use_one_hot=use_one_hot, + scale_factor=scale_factor, + **kwargs) + self._output_dim = output_dim + + def get_config(self): + config = {'output_dim': self._output_dim} + base_config = super().get_config() + return dict(list(base_config.items()) + list(config.items())) + + def build(self, input_shape): + self._embedding_projection = tf.keras.layers.EinsumDense( + '...x,xy->...y', + output_shape=self._output_dim, + bias_axes=None, + kernel_initializer=tf_utils.clone_initializer(self._initializer), + name='embedding_projection') + super().build(input_shape) + + def call(self, inputs): + output = super().call(inputs) + return self._embedding_projection(output) diff --git a/official/nlp/modeling/layers/factorized_embedding_test.py b/official/nlp/modeling/layers/factorized_embedding_test.py new file mode 100644 index 0000000000000000000000000000000000000000..686ed7c749512f367887299f389d39476c49dde7 --- /dev/null +++ b/official/nlp/modeling/layers/factorized_embedding_test.py @@ -0,0 +1,70 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for FactorizedEmbedding layer.""" + +import numpy as np +import tensorflow as tf + +from official.nlp.modeling.layers import factorized_embedding + + +class FactorizedEmbeddingTest(tf.test.TestCase): + + def test_layer_creation(self): + vocab_size = 31 + embedding_width = 27 + output_dim = 45 + test_layer = factorized_embedding.FactorizedEmbedding( + vocab_size=vocab_size, + embedding_width=embedding_width, + output_dim=output_dim) + # Create a 2-dimensional input (the first dimension is implicit). + sequence_length = 23 + input_tensor = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + output_tensor = test_layer(input_tensor) + + # The output shape should match the input shape, plus a trailing + # output_dim dimension. + expected_output_shape = [None, sequence_length, output_dim] + self.assertEqual(expected_output_shape, output_tensor.shape.as_list()) + self.assertEqual(output_tensor.dtype, tf.float32) + + def test_layer_invocation(self): + vocab_size = 31 + embedding_width = 27 + output_dim = 45 + test_layer = factorized_embedding.FactorizedEmbedding( + vocab_size=vocab_size, + embedding_width=embedding_width, + output_dim=output_dim) + # Create a 2-dimensional input (the first dimension is implicit). + sequence_length = 23 + input_tensor = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + output_tensor = test_layer(input_tensor) + + # Create a model from the test layer. + model = tf.keras.Model(input_tensor, output_tensor) + + # Invoke the model on test data. We can't validate the output data itself + # (the NN is too complex) but this will rule out structural runtime errors.
+ batch_size = 3 + input_data = np.random.randint( + vocab_size, size=(batch_size, sequence_length)) + output = model.predict(input_data) + self.assertEqual(tf.float32, output.dtype) + + +if __name__ == "__main__": + tf.test.main() diff --git a/official/nlp/modeling/layers/gated_feedforward.py b/official/nlp/modeling/layers/gated_feedforward.py index 2de2940658c68c9cd324df339a8f90a1d0038c12..630ba5e772eda765ea1e9d34aee18dd7dcba7e54 100644 --- a/official/nlp/modeling/layers/gated_feedforward.py +++ b/official/nlp/modeling/layers/gated_feedforward.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,6 +18,9 @@ import gin import tensorflow as tf +from official.modeling import tf_utils +from official.nlp.modeling.layers import util + @tf.keras.utils.register_keras_serializable(package="Text") @gin.configurable @@ -55,9 +58,9 @@ class GatedFeedforward(tf.keras.layers.Layer): """ def __init__(self, - intermediate_size, - intermediate_activation, - dropout, + inner_dim=768, + inner_activation=tf_utils.get_activation("gelu"), + dropout=0.0, use_gate=True, apply_output_layer_norm=True, num_blocks=1, @@ -70,9 +73,12 @@ class GatedFeedforward(tf.keras.layers.Layer): kernel_constraint=None, bias_constraint=None, **kwargs): - super(GatedFeedforward, self).__init__(**kwargs) - self._intermediate_size = intermediate_size - self._intermediate_activation = intermediate_activation + inner_dim = kwargs.pop("intermediate_size", inner_dim) + inner_activation = kwargs.pop("intermediate_activation", inner_activation) + util.filter_kwargs(kwargs) + super().__init__(**kwargs) + self._inner_dim = inner_dim + self._inner_activation = inner_activation self._dropout = dropout self._use_gate = use_gate self._num_blocks = num_blocks @@ -95,15 +101,13 @@ class 
GatedFeedforward(tf.keras.layers.Layer): hidden_size = input_shape.as_list()[-1] common_kwargs = dict( - kernel_initializer=self._kernel_initializer, - bias_initializer=self._bias_initializer, kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer, activity_regularizer=self._activity_regularizer, kernel_constraint=self._kernel_constraint, bias_constraint=self._bias_constraint) self._intermediate_dense = [] - self._intermediate_activation_layers = [] + self._inner_activation_layers = [] self._gate_dense = [] self._output_dense = [] self._output_dropout = [] @@ -116,29 +120,41 @@ class GatedFeedforward(tf.keras.layers.Layer): activation_policy = tf.float32 for i in range(self._num_blocks): self._intermediate_dense.append( - tf.keras.layers.experimental.EinsumDense( + tf.keras.layers.EinsumDense( "abc,cd->abd", - output_shape=(None, self._intermediate_size), + output_shape=(None, self._inner_dim), bias_axes="d", name="intermediate_%d" % i, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer( + self._bias_initializer), **common_kwargs)) - self._intermediate_activation_layers.append( + self._inner_activation_layers.append( tf.keras.layers.Activation( - self._intermediate_activation, dtype=activation_policy)) + self._inner_activation, dtype=activation_policy)) if self._use_gate: self._gate_dense.append( - tf.keras.layers.experimental.EinsumDense( + tf.keras.layers.EinsumDense( "abc,cd->abd", - output_shape=(None, self._intermediate_size), + output_shape=(None, self._inner_dim), bias_axes="d", name="gate_%d" % i, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer( + self._bias_initializer), **common_kwargs)) self._output_dense.append( - tf.keras.layers.experimental.EinsumDense( + tf.keras.layers.EinsumDense( "abc,cd->abd", output_shape=(None, hidden_size), bias_axes="d", name="output_%d" % i, + 
kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer( + self._bias_initializer), **common_kwargs)) self._output_dropout.append(tf.keras.layers.Dropout(rate=self._dropout)) # Use float32 in layernorm for numeric stability. @@ -152,10 +168,10 @@ class GatedFeedforward(tf.keras.layers.Layer): def get_config(self): config = { - "intermediate_size": - self._intermediate_size, - "intermediate_activation": - self._intermediate_activation, + "inner_dim": + self._inner_dim, + "inner_activation": + self._inner_activation, "dropout": self._dropout, "use_gate": @@ -179,7 +195,7 @@ class GatedFeedforward(tf.keras.layers.Layer): "bias_constraint": tf.keras.constraints.serialize(self._bias_constraint) } - base_config = super(GatedFeedforward, self).get_config() + base_config = super().get_config() return dict(list(base_config.items()) + list(config.items())) def call(self, inputs): @@ -187,7 +203,7 @@ class GatedFeedforward(tf.keras.layers.Layer): for i in range(self._num_blocks): layer_input = layer_output intermediate_output = self._intermediate_dense[i](layer_input) - intermediate_output = self._intermediate_activation_layers[i]( + intermediate_output = self._inner_activation_layers[i]( intermediate_output) if self._use_gate: gated_linear = self._gate_dense[i](layer_input) diff --git a/official/nlp/modeling/layers/gated_feedforward_test.py b/official/nlp/modeling/layers/gated_feedforward_test.py index 46d4f4bb258cf6ea6726679c0d730ac37da50461..6ba2c20053dd5a8da8e2166066c9f541b5b5df9c 100644 --- a/official/nlp/modeling/layers/gated_feedforward_test.py +++ b/official/nlp/modeling/layers/gated_feedforward_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -44,8 +44,8 @@ class GatedFeedforwardTest(keras_parameterized.TestCase): def test_layer_creation(self, use_gate, num_blocks, dropout_position, dtype): tf.keras.mixed_precision.set_global_policy(dtype) kwargs = dict( - intermediate_size=128, - intermediate_activation="relu", + inner_dim=128, + inner_activation="relu", dropout=0.1, use_gate=use_gate, num_blocks=num_blocks, @@ -76,8 +76,8 @@ class GatedFeedforwardTest(keras_parameterized.TestCase): dtype): tf.keras.mixed_precision.set_global_policy(dtype) kwargs = dict( - intermediate_size=16, - intermediate_activation="relu", + inner_dim=16, + inner_activation="relu", dropout=0.1, use_gate=use_gate, num_blocks=num_blocks, @@ -104,8 +104,8 @@ class GatedFeedforwardTest(keras_parameterized.TestCase): def test_serialize_deserialize(self): kwargs = dict( - intermediate_size=16, - intermediate_activation="relu", + inner_dim=16, + inner_activation="relu", dropout=0.1, use_gate=False, num_blocks=4, diff --git a/official/nlp/modeling/layers/gaussian_process.py b/official/nlp/modeling/layers/gaussian_process.py index 3729d8ee6cfaebe3b6c0a077cc3ee7706295bb10..618000577f1f74f943bcd0b5774b095e71f0a651 100644 --- a/official/nlp/modeling/layers/gaussian_process.py +++ b/official/nlp/modeling/layers/gaussian_process.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Definitions for random feature Gaussian process layer.""" import math import tensorflow as tf @@ -117,7 +116,7 @@ class RandomFeatureGaussianProcess(tf.keras.layers.Layer): name: (string) Layer name. **gp_output_kwargs: Additional keyword arguments to dense output layer. 
""" - super(RandomFeatureGaussianProcess, self).__init__(name=name, dtype=dtype) + super().__init__(name=name, dtype=dtype) self.units = units self.num_inducing = num_inducing @@ -227,7 +226,7 @@ class RandomFeatureGaussianProcess(tf.keras.layers.Layer): """Resets covariance matrix of the GP layer. This function is useful for reseting the model's covariance matrix at the - begining of a new epoch. + beginning of a new epoch. """ self._gp_cov_layer.reset_precision_matrix() @@ -381,7 +380,7 @@ class LaplaceRandomFeatureCovariance(tf.keras.layers.Layer): """Resets precision matrix to its initial value. This function is useful for reseting the model's covariance matrix at the - begining of a new epoch. + beginning of a new epoch. """ precision_matrix_reset_op = self.precision_matrix.assign( self.initial_precision_matrix) diff --git a/official/nlp/modeling/layers/gaussian_process_test.py b/official/nlp/modeling/layers/gaussian_process_test.py index 37958fa742326dc7cde6e1c4625c2b4ba77d2a2d..7a9a56fe452c1bf924e1bb2a069e801604b06a52 100644 --- a/official/nlp/modeling/layers/gaussian_process_test.py +++ b/official/nlp/modeling/layers/gaussian_process_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Tests for Gaussian process functions.""" import os import shutil diff --git a/official/nlp/modeling/layers/kernel_attention.py b/official/nlp/modeling/layers/kernel_attention.py index 6f8d41ad4b673ed83b3dd9a6ef0afbea3eb1803d..2a175f871b189ba4b1b110ea242affac29985f4e 100644 --- a/official/nlp/modeling/layers/kernel_attention.py +++ b/official/nlp/modeling/layers/kernel_attention.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,6 +18,8 @@ import functools import math import tensorflow as tf +from official.modeling import tf_utils + _NUMERIC_STABLER = 1e-6 @@ -39,6 +41,236 @@ class KernelMask(tf.keras.layers.Layer): return mask +def pad_to_chunk_length(tensor, axis, chunk_length, padding=None): + """Pads a tensor so that shape[axis] is divisible by chunk_length. + + Args: + tensor: Input tensor to pad. + axis: Axis to pad along. + chunk_length: The output tensor will have shape[axis] divisible by + chunk_length. + padding: Pad the input tensor across the axis from either left or right if + padding is set to "left" or "right"; applies no padding if padding is set + to None. In the latter case, the axis dimension of the input tensor must + be divisible by the chunk_length. + + Returns: + Padded tensor with shape[axis] divisible by chunk_length. 
+ """ + if padding is None: + return tensor + shape = tf.shape(tensor) + rank = tf.rank(tensor) + if axis < 0: + axis += rank + axis_length = shape[axis] + pad_length = -axis_length % chunk_length + if padding == "right": + axis_paddings = [[0, pad_length]] + elif padding == "left": + axis_paddings = [[pad_length, 0]] + else: + raise ValueError( + "Illegal padding value; must be one of \"left\", \"right\" or None.") + paddings = tf.concat([ + tf.zeros([axis, 2], dtype=tf.int32), axis_paddings, + tf.zeros([rank - axis - 1, 2], dtype=tf.int32) + ], + axis=0) + return tf.pad(tensor, paddings) + + +def split_tensor_into_chunks(tensor, axis, chunk_length): + """Reshape tensor along given axis using chunk_length. + + Args: + tensor: Input tensor. + axis: Reshape tensor along this axis. + chunk_length: Split the axis into [axis/chunk_length, chunk_length] + + Returns: + Reshaped tensor. + """ + shape = tf.shape(tensor) + num_chunks = shape[axis] // chunk_length + new_shape = tf.concat( + [shape[:axis], [num_chunks, chunk_length], shape[(axis + 1):]], axis=0) + return tf.reshape(tensor, new_shape) + + +def rectangular_window_sum(tensor, window_length): + """Summarizes tensor elements over a sliding rectangular window. + + Sums elements of the input tensor of shape [B, T', C', H, dim] + across a rectangular window sliding along the dimension T'. + + Args: + tensor: Tensor of shape `[B, T', C', H, dim]`. + window_length: The length of the rectangular window. + + Returns: + A tensor of shape [B, T', C', H, dim] containing sums over the + window. + """ + tensor_cumsum = tf.cumsum(tensor, axis=-4) + tensor_winsum = tensor_cumsum - tf.pad( + tensor_cumsum, + [[0, 0], [window_length, 0], [0, 0], [0, 0], [0, 0]])[:, :-window_length] + return tensor_winsum + + +def weighted_window_sum(tensor, window_length, window_weights): + """Summarizes tensor elements over a sliding weighted window. 
+ + Computes a weighted sum of elements of the input tensor of shape [B, + T', C', H, dim] across a window sliding along the dimension T'. + + Args: + tensor: Tensor of shape `[B, T', C', H, dim]`. + window_length: The length of the window. + window_weights: Tensor of shape [window_length] containing window weights. + + Returns: + A tensor of shape [B, T', C', H, dim] containing sums over the + window. + """ + # Flatten the last three dimensions of the [B, T', C', H, dim] shape + # into a single channels dimension. + tensor_shape = tf.shape(tensor) + tensor_2d = tf.reshape(tensor, [tensor_shape[0], tensor_shape[1], 1, -1]) + + # Apply the same weights to all channels. + conv_filter = tf.tile( + tf.reshape(window_weights, [-1, 1, 1, 1]), + multiples=[1, 1, tf.shape(tensor_2d)[-1], 1]) + tensor_winsum_2d = tf.nn.depthwise_conv2d( + tensor_2d, + conv_filter, + strides=[1, 1, 1, 1], + padding=[[0, 0], [window_length - 1, 0], [0, 0], [0, 0]]) + + # Unflatten the channels dimension into the original shape. + tensor_winsum = tf.reshape(tensor_winsum_2d, tensor_shape) + return tensor_winsum + + +def causal_windowed_performer_attention(query_matrix, + key_matrix, + value_matrix, + chunk_length, + window_length, + window_decay=None, + padding=None, + cache=None): + """Applies windowed causal kernel attention with query, key, value tensors. + + We partition the T-length input sequence into N chunks, each of + chunk_length tokens (thus: T = N * chunk_length). Within each chunk, + we apply bidirectional (non-causal) Performers’ implicit attention + and we model relationships between different chunks using + Performers’ causal attention. We consider windowed causal variant of + performer, where the current chunk attends only to the window of + window_length of the most recent chunks. + + Below is an example with T=9, chunk_length=3, window_length=2. 
In + this example, 1 indicates attention is computed between the pair + while 0 indicates attention is not computed between the pair: + + 111000000 + 111000000 + 111000000 + 111111000 + 111111000 + 111111000 + 000111111 + 000111111 + 000111111 + + The user can ensure sequence_length is divisible by chunk_length or use + padding="left"/"right" to pad the sequence length either at the left + or right respectively and make it divisible by chunk_length. + + Args: + query_matrix: Kernel query `Tensor` of shape `[B, T, H, dim]`. + key_matrix: Kernel key `Tensor` of shape `[B, T, H, dim]`. + value_matrix: Value `Tensor` of shape `[B, T, H, out_dim]`. + chunk_length: Length of each chunk in tokens. + window_length: Length of attention window in chunks. + window_decay: Float window decay factor or `None`. If set, exponentially + decay past attention window values by this factor before summation. + padding: Pad the query, value and key input tensors across the axis from + either left or right if padding is set to "left" or "right"; apply no + padding if padding is set to None. In the latter case, the axis dimension + of the query, value and key input tensors must be divisible by the + chunk_length. + cache: Cache to accumulate history in memory. Used at inference time + (streaming, decoding) for causal attention. + + Returns: + Window causal performer attention of shape `[B, T, H, out_dim]`.
+ """ + if cache is None: # Training + old_shape = tf.shape(value_matrix) + + query_matrix = pad_to_chunk_length(query_matrix, -3, chunk_length, padding) + key_matrix = pad_to_chunk_length(key_matrix, -3, chunk_length, padding) + value_matrix = pad_to_chunk_length(value_matrix, -3, chunk_length, padding) + + new_shape = tf.shape(value_matrix) + chunked_query_matrix = split_tensor_into_chunks( + query_matrix, -3, + chunk_length) # [-1, T//chunk_length, chunk_length, N, dim] + chunked_key_matrix = split_tensor_into_chunks( + key_matrix, -3, + chunk_length) # [-1, T//chunk_length, chunk_length, N, dim] + chunked_value_matrix = split_tensor_into_chunks( + value_matrix, -3, + chunk_length) # [-1, T//chunk_length, chunk_length, N, out_dim] + + kp_v = tf.einsum("BTCHD,BTCHO->BTHDO", chunked_key_matrix, + chunked_value_matrix) + + k_sum = tf.math.reduce_sum(chunked_key_matrix, axis=-3, keepdims=True) + + if window_decay is None: + kp_v_winsum = rectangular_window_sum(kp_v, window_length) + k_winsum = rectangular_window_sum(k_sum, window_length) + else: + # Compute exponentially decaying weights. + decaying_weights = tf.math.pow( + tf.convert_to_tensor(window_decay, dtype=value_matrix.dtype), + tf.range(window_length - 1, -1, delta=-1, dtype=value_matrix.dtype)) + kp_v_winsum = weighted_window_sum(kp_v, window_length, decaying_weights) + k_winsum = weighted_window_sum(k_sum, window_length, decaying_weights) + + numerator = tf.einsum( + "BTCHD,BTHDO->BTCHO", chunked_query_matrix, kp_v_winsum) + + k_winsum = tf.squeeze(k_winsum, -3) + denominator = tf.einsum("BTCHD,BTHD->BTCH", chunked_query_matrix, k_winsum) + denominator = tf.expand_dims(denominator, -1) + _NUMERIC_STABLER + attention = numerator / denominator + attention = tf.reshape(attention, new_shape) + + start = tf.zeros([len(old_shape)], dtype=old_shape.dtype) + attention = tf.slice(attention, start, old_shape) + + # Queued window cache (drop instead of decay) not yet supported. 
+ else: # Streaming + + if window_decay is None or window_decay > 1.0 or window_decay < 0.0: + raise ValueError("window_decay should be in (0.0, 1.0) and not None.") + kv = window_decay * cache["kv"] + tf.einsum( + "BTHD,BTHO->BHOD", key_matrix, value_matrix) + cache["kv"] = kv + k_sum = window_decay * cache["k_sum"] + tf.reduce_sum(key_matrix, axis=1) + cache["k_sum"] = k_sum + denominator = tf.einsum("BTHD,BHD->BTH", query_matrix, k_sum) + attention = tf.einsum("BTHD,BHOD,BTH->BTHO", query_matrix, kv, + 1.0 / (denominator + _NUMERIC_STABLER)) + return attention + + def create_projection_matrix(m, d, seed=None): r"""Constructs the matrix of random projections. @@ -56,8 +288,8 @@ def create_projection_matrix(m, d, seed=None): The matrix of random projections of the shape [m, d]. """ nb_full_blocks = math.ceil(m / d) - block_list = tf.TensorArray(tf.float32, - size=tf.cast(nb_full_blocks, dtype=tf.int32)) + block_list = tf.TensorArray( + tf.float32, size=tf.cast(nb_full_blocks, dtype=tf.int32)) stateful = False if seed is None: stateful = True @@ -85,11 +317,13 @@ def create_projection_matrix(m, d, seed=None): return tf.linalg.matmul(tf.linalg.diag(multiplier), final_matrix) -def _generalized_kernel(x, projection_matrix, f, h): +def _generalized_kernel(x, y, is_query, projection_matrix, f, h): """Generalized kernel in RETHINKING ATTENTION WITH PERFORMERS. Args: x: The feature being transformed with shape [B, T, N ,H]. + y: The extra stats-tensor of shape [B, T, N ,H]. + is_query: True if x is a query-tensor. projection_matrix: The matrix with shape [M, H] that we projecct x to, where M is the number of projections. f: A non-linear function applied on x or projected x. @@ -99,7 +333,8 @@ def _generalized_kernel(x, projection_matrix, f, h): Returns: Transformed feature. 
""" - + del y + del is_query if projection_matrix is None: return h(x) * f(x) else: @@ -108,8 +343,124 @@ def _generalized_kernel(x, projection_matrix, f, h): tf.cast(tf.shape(projection_matrix)[0], tf.float32)) +def expplus(data_orig, + other_data, + is_query, + projection_matrix=None, + numerical_stabilizer=0.000001, + normalize_data=True, + numerical_renormalizer=True, + extra_renormalize_exp_fun=False): + """FAVOR++ mechanism from the CRT paper: https://arxiv.org/abs/2205.15317 . + + Args: + data_orig: data tensor of shape [B,T,H,D] for which random features are to + be computed + other_data: additional tensor of the shape [B,F,H,D] used to collect stats + to determine the exact instantiation of the random feature mechanism + is_query: boolean indicating whether tensor is a query tensor + projection_matrix: tensor of the shape [M,D] encoding random projections for + random features (M stands for the number of random features) + numerical_stabilizer: numerical stabilizer for the kernel features + normalize_data: whether to sqrt-d-normalize queries/keys as in the regular + attention + numerical_renormalizer: whether to apply additional renormalization for + numerical stability + extra_renormalize_exp_fun: extra renormalizer for the exponential mapping + applied to construct random features + + Returns: + Random feature map tensor for the unbiased softmax-kernel estimation.
+ """ + + data = data_orig + if projection_matrix is None: + return data_orig + projection_matrix = tf.cast(projection_matrix, data.dtype) + if normalize_data: + data_normalizer = 1.0 / tf.math.sqrt( + (tf.math.sqrt(tf.dtypes.cast(data.shape[-1], data.dtype)))) + else: + data_normalizer = 1.0 + lengths = tf.math.square(data) + lengths = tf.reduce_sum(lengths, axis=tf.keras.backend.ndim(data) - 1) + lengths = tf.expand_dims(lengths, axis=tf.keras.backend.ndim(data) - 1) + lengths = tf.math.sqrt(lengths) + data /= lengths + ratio = 1.0 / tf.math.sqrt( + tf.dtypes.cast(projection_matrix.shape[0], data.dtype)) + data_dash = tf.einsum("blhd,md->blhm", data_normalizer * data, + projection_matrix) + diag_data = tf.math.square(data) + diag_data = tf.math.reduce_sum( + diag_data, axis=tf.keras.backend.ndim(data) - 1) + diag_data = (diag_data / 2.0) * data_normalizer * data_normalizer + diag_data = tf.expand_dims(diag_data, axis=tf.keras.backend.ndim(data) - 1) + + # Calculating coefficients A, B of the FAVOR++ mechanism: + _, l, _, _ = tf_utils.get_shape_list(data_orig) + + l = tf.cast(l, dtype=tf.float32) + first_sum_of_squares = tf.math.square(data) + first_sum_of_squares = tf.math.reduce_sum( + first_sum_of_squares, axis=(1, -1), keepdims=True) + first_sum_of_squares *= (data_normalizer * data_normalizer) + first_sum_of_squares /= l # data.shape[1] + second_sum_of_squares = tf.math.square(other_data) + second_sum_of_squares = tf.math.reduce_sum( + second_sum_of_squares, axis=(1, -1), keepdims=True) + second_sum_of_squares *= (data_normalizer * data_normalizer) + second_sum_of_squares /= l # other_data.shape[1] + data_sum = tf.math.reduce_sum(data, axis=(1,), keepdims=True) + other_data_sum = tf.math.reduce_sum(other_data, axis=(1,), keepdims=True) + d_prod = tf.einsum("blhd,blhd->blh", data_sum, other_data_sum) + d_prod = tf.expand_dims(d_prod, axis=-1) + d_prod *= (data_normalizer * data_normalizer) + d_prod *= (2.0 / (l * l)) + ave = first_sum_of_squares + 
second_sum_of_squares + d_prod + dim = projection_matrix.shape[-1] + a_coeff = (1.0 / (4.0 * ave)) * ( + tf.math.sqrt((2.0 * ave + dim) * + (2.0 * ave + dim) + 8.0 * dim * ave) - 2.0 * ave - dim) + a_coeff = (1.0 - 1.0 / a_coeff) / 8.0 + b_coeff = tf.math.sqrt(1.0 - 4.0 * a_coeff) + d_coeff = tf.math.pow(1.0 - 4.0 * a_coeff, dim / 4.0) + a_coeff = tf.stop_gradient(a_coeff) + b_coeff = tf.stop_gradient(b_coeff) + d_coeff = tf.stop_gradient(d_coeff) + + # Calculating diag_omega for the FAVOR++ mechanism: + diag_omega = tf.math.square(projection_matrix) + diag_omega = tf.math.reduce_sum( + diag_omega, axis=tf.keras.backend.ndim(projection_matrix) - 1) + diag_omega = tf.expand_dims(diag_omega, axis=0) + diag_omega = tf.expand_dims(diag_omega, axis=0) + diag_omega = tf.expand_dims(diag_omega, axis=0) + diag_omega = a_coeff * diag_omega + + if numerical_renormalizer: + if is_query: + last_dims_t = (len(data_dash.shape) - 1,) + stab = b_coeff * tf.math.reduce_max( + data_dash, axis=last_dims_t, keepdims=True) + else: + stab = b_coeff * tf.math.reduce_max(data_dash, keepdims=True) + if extra_renormalize_exp_fun: + extra_stab = tf.reduce_max(diag_data, axis=1, keepdims=True) + stab = tf.math.maximum(stab, extra_stab) + data_dash = ratio * d_coeff * ( + tf.math.exp(b_coeff * data_dash - stab - diag_data + diag_omega) + + numerical_stabilizer) + else: + data_dash = ratio * d_coeff * ( + tf.math.exp(b_coeff * data_dash - diag_data + diag_omega) + + numerical_stabilizer) + + return data_dash + + # pylint: disable=g-long-lambda -_TRANSFORM_MAP = { +_CAUSAL_SUPPORT_TRANSFORM_MAP = { "elu": functools.partial( _generalized_kernel, @@ -117,19 +468,22 @@ _TRANSFORM_MAP = { h=lambda x: 1), "relu": functools.partial( - _generalized_kernel, f=tf.keras.activations.relu, h=lambda x: 1), + _generalized_kernel, + # Improve numerical stability and avoid NaNs in some cases by adding + # a tiny epsilon. 
+ f=lambda x: tf.keras.activations.relu(x) + 1e-3, + h=lambda x: 1), "square": - functools.partial( - _generalized_kernel, f=tf.math.square, h=lambda x: 1), + functools.partial(_generalized_kernel, f=tf.math.square, h=lambda x: 1), "exp": functools.partial( _generalized_kernel, # Avoid exp explosion by shifting. - f=lambda x: tf.math.exp( - x - tf.math.reduce_max(x, axis=[1, 2, 3], keepdims=True)), - h=lambda x: tf.math.exp( - -0.5 * tf.math.reduce_sum( - tf.math.square(x), axis=-1, keepdims=True)),), + f=lambda x: tf.math.exp(x - tf.math.reduce_max( + x, axis=[1, 2, 3], keepdims=True)), + h=lambda x: tf.math.exp(-0.5 * tf.math.reduce_sum( + tf.math.square(x), axis=-1, keepdims=True)), + ), "expmod": functools.partial( _generalized_kernel, @@ -142,6 +496,16 @@ _TRANSFORM_MAP = { "identity": functools.partial(_generalized_kernel, f=lambda x: x, h=lambda x: 1) } + +_NON_CAUSAL_SUPPORT_TRANSFORM_MAP = { + "expplus": expplus, +} + +_TRANSFORM_MAP = { + **_CAUSAL_SUPPORT_TRANSFORM_MAP, + **_NON_CAUSAL_SUPPORT_TRANSFORM_MAP +} + # pylint: enable=g-long-lambda @@ -154,6 +518,9 @@ class KernelAttention(tf.keras.layers.MultiHeadAttention): (https://arxiv.org/abs/2009.14794) - exp (Lemma 1, positive), relu - random/deterministic projection + Chefs' Random Tables: Non-Trigonometric Random Features + (https://arxiv.org/abs/2205.15317) + - expplus (OPRF mechanism) Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (https://arxiv.org/abs/2006.16236) @@ -178,13 +545,19 @@ class KernelAttention(tf.keras.layers.MultiHeadAttention): is_short_seq=False, begin_kernel=0, scale=None, + scale_by_length=False, + use_causal_windowed=False, + causal_chunk_length=1, + causal_window_length=3, + causal_window_decay=None, + causal_padding=None, **kwargs): r"""Constructor of KernelAttention. Args: - feature_transform: A non-linear transform of the keys and quries. - Possible transforms are "elu", "relu", "square", "exp", "expmod", - "identity". 
+ feature_transform: A non-linear transform of the keys and queries. + Possible transforms are "elu", "relu", "square", "exp", "expplus", + "expmod", "identity". num_random_features: Number of random features to be used for projection. if num_random_features <= 0, no production is used before transform. seed: The seed to begin drawing random features. Once the seed is set, the @@ -194,12 +567,28 @@ class KernelAttention(tf.keras.layers.MultiHeadAttention): redraw: Whether to redraw projection every forward pass during training. The argument is only effective when num_random_features > 0. is_short_seq: boolean predicate indicating whether input data consists of - very short sequences or not; in most cases this should be False - (default option). + very short sequences or not; in most cases this should be False (default + option). begin_kernel: Apply kernel_attention after this sequence id and apply softmax attention before this. scale: The value to scale the dot product as described in `Attention Is All You Need`. If None, we use 1/sqrt(dk) as described in the paper. + scale_by_length: boolean predicate indicating whether to additionally scale + the dot product by log_512(n), where n is the key length, to stabilize + attention entropy against length. Refer to + https://kexue.fm/archives/8823 for details. + use_causal_windowed: If True, perform windowed causal attention. See + causal_windowed_performer_attention function docstring for more details. + causal_chunk_length: Length of each chunk in tokens. + causal_window_length: Length of attention window in chunks. + causal_window_decay: Float window decay factor or `None`. If set, + exponentially decay past attention window values by this factor before + summation. + causal_padding: Pad the query, value and key input tensors across the axis + from either left or right if padding is set to "left" or "right"; apply + no padding if padding is set to None.
In the latter case, the axis + dimension of the query, value and key input tensors must be divisible by + the chunk_length. **kwargs: The same arguments `MultiHeadAttention` layer. """ if feature_transform not in _TRANSFORM_MAP: @@ -214,6 +603,7 @@ class KernelAttention(tf.keras.layers.MultiHeadAttention): self._redraw = redraw self._is_short_seq = is_short_seq self._begin_kernel = begin_kernel + self._scale_by_length = scale_by_length # We use the seed for two scenarios: # 1. inference # 2. no redraw @@ -228,6 +618,14 @@ class KernelAttention(tf.keras.layers.MultiHeadAttention): self._projection_matrix = create_projection_matrix( self._num_random_features, self._key_dim, tf.constant([self._seed, self._seed + 1])) + self.use_causal_windowed = use_causal_windowed + self.causal_chunk_length = causal_chunk_length + self.causal_window_length = causal_window_length + self.causal_window_decay = causal_window_decay + self.causal_padding = causal_padding + if self.use_causal_windowed and self._is_short_seq: + raise ValueError( + "use_causal_windowed and short_seq methods are mutually exclusive") def _compute_attention(self, query, @@ -236,6 +634,7 @@ class KernelAttention(tf.keras.layers.MultiHeadAttention): feature_transform, is_short_seq, attention_mask=None, + cache=None, training=False, numeric_stabler=_NUMERIC_STABLER): """Applies kernel attention with query, key, value tensors. @@ -252,9 +651,11 @@ class KernelAttention(tf.keras.layers.MultiHeadAttention): is_short_seq: boolean predicate indicating whether input data consists of short or long sequences; usually short sequence is defined as having length L <= 1024. - attention_mask: a boolean mask of shape `[B, S]`, that prevents - attenting to masked positions. Note that the mask is only appied to - the keys. User may want to mask the output if query contains pads. + attention_mask: a boolean mask of shape `[B, S]`, that prevents attending + to masked positions. Note that the mask is only applied to the keys.
User + may want to mask the output if query contains pads. + cache: Cache to accumulate history in memory. Used at inference time + (streaming, decoding) for causal attention. training: Python boolean indicating whether the layer should behave in training mode (adding dropout) or in inference mode (doing nothing). numeric_stabler: A scalar value added to avoid divide by 0. @@ -263,6 +664,7 @@ class KernelAttention(tf.keras.layers.MultiHeadAttention): attention_output: Multi-headed outputs of attention computation. """ projection_matrix = None + if self._num_random_features > 0: if self._redraw and training: projection_matrix = create_projection_matrix(self._num_random_features, @@ -270,35 +672,53 @@ class KernelAttention(tf.keras.layers.MultiHeadAttention): else: projection_matrix = self._projection_matrix + if self._scale_by_length: + scale = tf.math.log(tf.reduce_sum(attention_mask, + axis=-1)) * self._scale / math.log(512) + scale = tf.reshape(scale, [-1, 1, 1, 1]) + else: + scale = self._scale if is_short_seq: # Note: Applying scalar multiply at the smaller end of einsum improves # XLA performance, but may introduce slight numeric differences in # the Transformer attention head. - query = query * self._scale + query = query * scale else: # Note: we suspect spliting the scale to key, query yields smaller # approximation variance when random projection is used. # For simplicity, we also split when there's no random projection.
- key *= math.sqrt(self._scale) - query *= math.sqrt(self._scale) + key *= tf.math.sqrt(scale) + query *= tf.math.sqrt(scale) - key = _TRANSFORM_MAP[feature_transform](key, projection_matrix) - query = _TRANSFORM_MAP[feature_transform](query, projection_matrix) + key_prime = _TRANSFORM_MAP[feature_transform](key, query, False, + projection_matrix) + query_prime = _TRANSFORM_MAP[feature_transform](query, key, True, + projection_matrix) if attention_mask is not None: - key = tf.einsum("BSNH,BS->BSNH", key, attention_mask) + key_prime = tf.einsum("BSNH,BS->BSNH", key_prime, attention_mask) if is_short_seq: - attention_scores = tf.einsum("BTNH,BSNH->BTSN", query, key) + attention_scores = tf.einsum("BTNH,BSNH->BTSN", query_prime, key_prime) attention_scores = tf.nn.softmax(attention_scores, axis=2) attention_output = tf.einsum("BTSN,BSNH->BTNH", attention_scores, value) + elif self.use_causal_windowed: + attention_output = causal_windowed_performer_attention( + query_prime, + key_prime, + value, + chunk_length=self.causal_chunk_length, + window_length=self.causal_window_length, + window_decay=self.causal_window_decay, + padding=self.causal_padding, + cache=cache) else: - kv = tf.einsum("BSNH,BSND->BNDH", key, value) + kv = tf.einsum("BSNH,BSND->BNDH", key_prime, value) denominator = 1.0 / ( - tf.einsum("BTNH,BNH->BTN", query, tf.reduce_sum(key, axis=1)) + - _NUMERIC_STABLER) - attention_output = tf.einsum( - "BTNH,BNDH,BTN->BTND", query, kv, denominator) + tf.einsum("BTNH,BNH->BTN", query_prime, + tf.reduce_sum(key_prime, axis=1)) + _NUMERIC_STABLER) + attention_output = tf.einsum("BTNH,BNDH,BTN->BTND", query_prime, kv, + denominator) return attention_output def _build_from_signature(self, query, value, key=None): @@ -313,15 +733,12 @@ class KernelAttention(tf.keras.layers.MultiHeadAttention): kernel_constraint=self._kernel_constraint, bias_constraint=self._bias_constraint) self._output_dense_softmax = self._make_output_dense( - self._query_shape.rank - 1, 
common_kwargs, + self._query_shape.rank - 1, + common_kwargs, + name="attention_output_softmax") self._dropout_softmax = tf.keras.layers.Dropout(rate=self._dropout) - def call(self, - query, - value, - key=None, - attention_mask=None, + def call(self, query, value, key=None, attention_mask=None, cache=None, training=False): """Compute attention with kernel mechanism. @@ -330,15 +747,32 @@ class KernelAttention(tf.keras.layers.MultiHeadAttention): value: Value `Tensor` of shape `[B, S, dim]`. key: Optional key `Tensor` of shape `[B, S, dim]`. If not given, will use `value` for both `key` and `value`, which is the most common case. - attention_mask: a boolean mask of shape `[B, S]`, that prevents - attenting to masked positions. Note that the mask is only appied to - the keys. User may want to mask the output if query contains pads. + attention_mask: a boolean mask of shape `[B, S]`, that prevents attending + to masked positions. Note that the mask is only applied to the keys. User + may want to mask the output if query contains pads. + cache: Cache to accumulate history in memory. Used at inference time + (streaming, decoding) for causal attention. training: Python boolean indicating whether the layer should behave in training mode (adding dropout) or in inference mode (doing nothing). Returns: Multi-headed outputs of attention computation.
""" + if cache is not None: + if training: + raise ValueError( + "Cache is not supported when training is True.") + if not self.use_causal_windowed: + raise ValueError( + "Cache is not supported for non use_causal_windowed case.") + if self._begin_kernel: + raise ValueError( + "Cache is not supported when begin_kernel is set since the behavior " + "is too complicated.") + if self._feature_transform in _NON_CAUSAL_SUPPORT_TRANSFORM_MAP: + raise ValueError("Cache is not supported for feature_transform %s" % + (self._feature_transform)) + + if not self._built_from_signature: self._build_from_signature(query=query, value=value, key=key) if key is None: @@ -357,25 +791,26 @@ class KernelAttention(tf.keras.layers.MultiHeadAttention): if self._begin_kernel > 0: attention_output_softmax = self._compute_attention( - query[:, :self._begin_kernel], - key, value, "identity", True, attention_mask, training) + query[:, :self._begin_kernel], key, value, "identity", True, + attention_mask, training=training) attention_output_softmax = self._dropout_softmax(attention_output_softmax) attention_output_softmax = self._output_dense_softmax( attention_output_softmax) attention_output_kernel = self._compute_attention( - query[:, self._begin_kernel:], - key, value, self._feature_transform, self._is_short_seq, - attention_mask, training) + query[:, self._begin_kernel:], key, value, self._feature_transform, + self._is_short_seq, attention_mask, training=training) attention_output_kernel = self._dropout_layer(attention_output_kernel) - attention_output_kernel = self._output_dense( - attention_output_kernel) + attention_output_kernel = self._output_dense(attention_output_kernel) attention_output = tf.concat( [attention_output_softmax, attention_output_kernel], axis=1) else: - attention_output = self._compute_attention( - query, key, value, self._feature_transform, - self._is_short_seq, attention_mask, training) + attention_output = self._compute_attention(query, key, value, + self._feature_transform, +
self._is_short_seq, + attention_mask, + cache, + training) # This is actually dropping out entire tokens to attend to, which might # seem a bit unusual, but is taken from the original Transformer paper. attention_output = self._dropout_layer(attention_output) diff --git a/official/nlp/modeling/layers/kernel_attention_test.py b/official/nlp/modeling/layers/kernel_attention_test.py index 947704fb31dacc76da81af27a6c38328525353e6..fa86b71b96b0c48f93d9ec947dd64e9b04e8f059 100644 --- a/official/nlp/modeling/layers/kernel_attention_test.py +++ b/official/nlp/modeling/layers/kernel_attention_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -21,7 +21,7 @@ import tensorflow as tf from official.nlp.modeling.layers import kernel_attention as attention -_FEATURE_TRANSFORM = ['relu', 'elu', 'exp'] +_FEATURE_TRANSFORM = ["relu", "elu", "exp", "expplus"] _REDRAW = [True, False] _TRAINING = [True, False] _IS_SHORT_SEQ = [True, False] @@ -30,9 +30,67 @@ _BEGIN_KERNEL = [0, 512] class KernelAttentionTest(tf.test.TestCase, parameterized.TestCase): + # expplus is only designed for the bi-directional use case. + # exp can be numerically unstable.
@parameterized.parameters(itertools.product( - _FEATURE_TRANSFORM, [127], _TRAINING, [True, False], - _IS_SHORT_SEQ, _BEGIN_KERNEL)) + ["relu", "elu"], [1, 4], [0.9])) + def test_causal_windowed_attention_projection_streaming( + self, feature_transform, causal_chunk_length, causal_weight_decay): + num_heads = 12 + key_dim = 64 + seq_length = 16 + num_chunks = seq_length // causal_chunk_length + causal_window_length = num_chunks + batch_size = 2 + training = False + num_random_features = 0 + test_layer = attention.KernelAttention( + num_heads=num_heads, + key_dim=key_dim, + feature_transform=feature_transform, + num_random_features=num_random_features, + redraw=False, + is_short_seq=False, + begin_kernel=False, + use_causal_windowed=True, + causal_chunk_length=causal_chunk_length, + causal_window_length=causal_window_length, + causal_window_decay=causal_weight_decay, + causal_padding=None, + ) + query = tf.random.normal( + shape=(batch_size, seq_length, key_dim), seed=2) + value = query + encoder_inputs_mask = tf.ones((batch_size, seq_length), dtype=tf.int32) + masks = tf.cast(encoder_inputs_mask, dtype=tf.float32) + output = test_layer( + query=query, + value=value, + attention_mask=masks, + training=training) + dim = num_random_features if num_random_features > 0 else key_dim + kv_cache = tf.zeros( + (batch_size, num_heads, dim, dim)) + k_sum_cache = tf.zeros((batch_size, num_heads, dim)) + stream_output = [] + cache = {"kv": kv_cache, "k_sum": k_sum_cache} + for i in range(num_chunks): + stream_output.append( + test_layer( + query=query[:, i * causal_chunk_length:(i + 1) * + causal_chunk_length, :], + value=value[:, i * causal_chunk_length:(i + 1) * + causal_chunk_length, :], + attention_mask=masks[:, i * causal_chunk_length:(i + 1) * + causal_chunk_length], + cache=cache, + training=training)) + stream_output = tf.concat(stream_output, axis=1) + self.assertAllClose(output, stream_output) + + @parameterized.parameters( + itertools.product(_FEATURE_TRANSFORM, 
[127], _TRAINING, [True, False], + _IS_SHORT_SEQ, _BEGIN_KERNEL)) def test_attention_projection( self, feature_transform, num_random_features, training, redraw, is_short, begin_kernel): @@ -60,6 +118,41 @@ class KernelAttentionTest(tf.test.TestCase, parameterized.TestCase): training=training) self.assertEqual(output.shape, [batch_size, seq_length, key_dim]) + @parameterized.parameters( + itertools.product(["relu", "exp"], [127], _TRAINING, [True, False], + [0], [None, 0.97], [None, "left", "right"])) + def test_causal_windowed_attention_projection( + self, feature_transform, num_random_features, training, redraw, + begin_kernel, causal_window_decay, causal_padding): + num_heads = 12 + key_dim = 64 + seq_length = 1024 + batch_size = 2 + test_layer = attention.KernelAttention( + num_heads=num_heads, + key_dim=key_dim, + feature_transform=feature_transform, + num_random_features=num_random_features, + redraw=redraw, + is_short_seq=False, + begin_kernel=begin_kernel, + use_causal_windowed=True, + causal_chunk_length=8, + causal_window_length=3, + causal_window_decay=causal_window_decay, + causal_padding=causal_padding) + query = tf.random.normal( + shape=(batch_size, seq_length, key_dim)) + value = query + encoder_inputs_mask = tf.zeros((batch_size, seq_length), dtype=tf.int32) + masks = tf.cast(encoder_inputs_mask, dtype=tf.float32) + output = test_layer( + query=query, + value=value, + attention_mask=masks, + training=training) + self.assertEqual(output.shape, [batch_size, seq_length, key_dim]) + @parameterized.parameters(itertools.product( _FEATURE_TRANSFORM, [0], _TRAINING, [False], _IS_SHORT_SEQ, _BEGIN_KERNEL)) @@ -90,15 +183,41 @@ class KernelAttentionTest(tf.test.TestCase, parameterized.TestCase): training=training) self.assertEqual(output.shape, [batch_size, seq_length, key_dim]) + @parameterized.parameters([128, 512]) + def test_attention_scale_by_length(self, seq_length): + num_heads = 12 + key_dim = 64 + batch_size = 2 + test_layer = 
attention.KernelAttention( + num_heads=num_heads, + key_dim=key_dim, + num_random_features=0, + scale_by_length=True) + query = tf.random.normal( + shape=(batch_size, seq_length, key_dim)) + value = query + encoder_inputs_mask = tf.ones((batch_size, seq_length), dtype=tf.int32) + masks = tf.cast(encoder_inputs_mask, dtype=tf.float32) + output_scale_by_length = test_layer( + query=query, value=value, attention_mask=masks) + + test_layer._scale_by_length = False + output_no_scale_by_length = test_layer( + query=query, value=value, attention_mask=masks) + if seq_length == 512: # Equals because log(seq_length, base=512) = 1.0 + self.assertAllClose(output_scale_by_length, output_no_scale_by_length) + else: + self.assertNotAllClose(output_scale_by_length, output_no_scale_by_length) + def test_unsupported_feature_transform(self): - with self.assertRaisesRegex(ValueError, 'Unsupported feature_transform.*'): - _ = attention.KernelAttention(feature_transform='test') + with self.assertRaisesRegex(ValueError, "Unsupported feature_transform.*"): + _ = attention.KernelAttention(feature_transform="test") def test_redraw_true_no_projection(self): with self.assertRaisesRegex( - ValueError, 'There is nothing to redraw when num_random_features.*'): + ValueError, "There is nothing to redraw when num_random_features.*"): _ = attention.KernelAttention( - num_heads=2, key_dim=64, feature_transform='elu', + num_heads=2, key_dim=64, feature_transform="elu", num_random_features=0, redraw=True) def test_config(self): @@ -107,7 +226,7 @@ class KernelAttentionTest(tf.test.TestCase, parameterized.TestCase): test_layer = attention.KernelAttention( num_heads=num_heads, key_dim=key_dim, - feature_transform='exp', + feature_transform="exp", num_random_features=128, is_short_seq=True) new_layer = attention.KernelAttention.from_config( @@ -115,5 +234,25 @@ class KernelAttentionTest(tf.test.TestCase, parameterized.TestCase): # If the serialization was successful, the new config should match the old. 
self.assertAllEqual(test_layer.get_config(), new_layer.get_config()) -if __name__ == '__main__': + def test_rectangular_window_sum(self): + x = tf.ones([2, 5, 2, 2, 2]) + winsum = attention.rectangular_window_sum(x, 3) + self.assertEqual(winsum.shape, x.shape) + self.assertAllClose( + tf.tile( + tf.reshape([1., 2., 3., 3., 3.], [1, -1, 1, 1, 1]), + [2, 1, 2, 2, 2]), + winsum) + + def test_weighted_window_sum(self): + x = tf.ones([2, 5, 2, 2, 2]) + winsum = attention.weighted_window_sum(x, 3, [0.01, 0.1, 1.]) + self.assertEqual(winsum.shape, x.shape) + self.assertAllClose( + tf.tile( + tf.reshape([1., 1.1, 1.11, 1.11, 1.11], [1, -1, 1, 1, 1]), + [2, 1, 2, 2, 2]), + winsum) + +if __name__ == "__main__": tf.test.main() diff --git a/official/nlp/modeling/layers/masked_lm.py b/official/nlp/modeling/layers/masked_lm.py index 9737b22876f01156d8bb7ab2ca38f49a4aa552ec..2d02f71c77a072b637ba93d30b88c9f1592cb18f 100644 --- a/official/nlp/modeling/layers/masked_lm.py +++ b/official/nlp/modeling/layers/masked_lm.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -47,7 +47,7 @@ class MaskedLM(tf.keras.layers.Layer): output='logits', name=None, **kwargs): - super(MaskedLM, self).__init__(name=name, **kwargs) + super().__init__(name=name, **kwargs) self.embedding_table = embedding_table self.activation = activation self.initializer = tf.keras.initializers.get(initializer) @@ -73,7 +73,7 @@ class MaskedLM(tf.keras.layers.Layer): initializer='zeros', trainable=True) - super(MaskedLM, self).build(input_shape) + super().build(input_shape) def call(self, sequence_data, masked_positions): masked_lm_input = self._gather_indexes(sequence_data, masked_positions) @@ -115,7 +115,8 @@ class MaskedLM(tf.keras.layers.Layer): flat_offsets = tf.reshape( tf.range(0, batch_size, dtype=tf.int32) * seq_length, [-1, 1]) - flat_positions = tf.reshape(positions + flat_offsets, [-1]) + flat_positions = tf.reshape( + positions + tf.cast(flat_offsets, positions.dtype), [-1]) flat_sequence_tensor = tf.reshape(sequence_tensor, [batch_size * seq_length, width]) output_tensor = tf.gather(flat_sequence_tensor, flat_positions) diff --git a/official/nlp/modeling/layers/masked_lm_test.py b/official/nlp/modeling/layers/masked_lm_test.py index 53b3b4a22b2696a4e7e8b2566f0691418b8d8e0f..0cd3ce0721a7568b919aab16f2933cb0a07a85d3 100644 --- a/official/nlp/modeling/layers/masked_lm_test.py +++ b/official/nlp/modeling/layers/masked_lm_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
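The `masked_lm.py` hunk above casts `flat_offsets` to `positions.dtype` before adding it to `positions`. TensorFlow binary ops such as `tf.add` do not implicitly promote mixed integer dtypes, so `int64` positions plus `int32` offsets would fail without the cast. The pure-Python sketch below illustrates the flat-offset gather that `_gather_indexes` relies on; the function name `gather_indexes` is hypothetical, and nested lists stand in for tensors.

```python
def gather_indexes(sequence, positions):
    """Gathers vectors at `positions` from a [batch, seq_len, width] nested list.

    Hypothetical sketch of the flat-offset trick: row b is shifted by
    b * seq_len so a single gather over the flattened table suffices.
    Returns a flat [batch * num_positions, width] list.
    """
    batch_size = len(sequence)
    seq_length = len(sequence[0])
    # Flatten [batch, seq_len, width] -> [batch * seq_len, width].
    flat_sequence = [vec for row in sequence for vec in row]
    # Offset of the first token of each batch row in the flattened table.
    flat_offsets = [b * seq_length for b in range(batch_size)]
    # In TensorFlow, positions and offsets must share a dtype before this add.
    flat_positions = [
        p + flat_offsets[b] for b in range(batch_size) for p in positions[b]
    ]
    return [flat_sequence[i] for i in flat_positions]


# Two sequences of length 3 with 2-dim token vectors.
seq = [
    [[0, 0], [1, 1], [2, 2]],
    [[10, 10], [11, 11], [12, 12]],
]
print(gather_indexes(seq, [[0, 2], [1, 2]]))
# [[0, 0], [2, 2], [11, 11], [12, 12]]
```

Position 1 of the second sequence maps to flat index 1 + 1 * 3 = 4, which is why the offsets must be added elementwise to the per-row positions.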
diff --git a/official/nlp/modeling/layers/masked_softmax.py b/official/nlp/modeling/layers/masked_softmax.py index 06b1994c7b8e5a6a8624130b2a7c6608b2332cf6..51a859027f194d849356ca63144dd643ca0c884f 100644 --- a/official/nlp/modeling/layers/masked_softmax.py +++ b/official/nlp/modeling/layers/masked_softmax.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -53,7 +53,7 @@ class MaskedSoftmax(tf.keras.layers.Layer): self._normalization_axes = (-1,) else: self._normalization_axes = normalization_axes - super(MaskedSoftmax, self).__init__(**kwargs) + super().__init__(**kwargs) def call(self, scores, mask=None): @@ -81,5 +81,5 @@ class MaskedSoftmax(tf.keras.layers.Layer): 'mask_expansion_axes': self._mask_expansion_axes, 'normalization_axes': self._normalization_axes } - base_config = super(MaskedSoftmax, self).get_config() + base_config = super().get_config() return dict(list(base_config.items()) + list(config.items())) diff --git a/official/nlp/modeling/layers/masked_softmax_test.py b/official/nlp/modeling/layers/masked_softmax_test.py index 802b6848211122c29fcbaef4e014f5094dd25939..d6fe410b16421af31ba060dc2344d32abbec7554 100644 --- a/official/nlp/modeling/layers/masked_softmax_test.py +++ b/official/nlp/modeling/layers/masked_softmax_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/nlp/modeling/layers/mat_mul_with_margin.py b/official/nlp/modeling/layers/mat_mul_with_margin.py index 1fe3156caf35e1010f5838173373004add09b819..25f4ed23a1866881d01ec378f9bd63d6a2946643 100644 --- a/official/nlp/modeling/layers/mat_mul_with_margin.py +++ b/official/nlp/modeling/layers/mat_mul_with_margin.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -36,7 +36,7 @@ class MatMulWithMargin(tf.keras.layers.Layer): logit_scale=1.0, logit_margin=0.0, **kwargs): - super(MatMulWithMargin, self).__init__(**kwargs) + super().__init__(**kwargs) self.logit_scale = logit_scale self.logit_margin = logit_margin @@ -61,7 +61,7 @@ class MatMulWithMargin(tf.keras.layers.Layer): config = { 'logit_scale': self.logit_scale, 'logit_margin': self.logit_margin} - config.update(super(MatMulWithMargin, self).get_config()) + config.update(super().get_config()) return config @classmethod diff --git a/official/nlp/modeling/layers/mat_mul_with_margin_test.py b/official/nlp/modeling/layers/mat_mul_with_margin_test.py index 1ceea013caee4d060e245dcba5bec590c57937da..4a02d51362ee48970a7339b38cf62903030f2800 100644 --- a/official/nlp/modeling/layers/mat_mul_with_margin_test.py +++ b/official/nlp/modeling/layers/mat_mul_with_margin_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
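The `mixing.py` file added next computes a 2D Fourier Transform either with `tf.signal.fft2d` or, when `use_fft=False`, with DFT matrices via `einsum("...ij,jk,ni->...nk", x, dft_mat_hidden, dft_mat_seq)`, which is `dft_mat_seq @ x @ dft_mat_hidden`. The Hartley layer then returns `real - imag` of that result. The dependency-free sketch below, with hypothetical helper names and no TensorFlow or SciPy, checks that the matrix form matches the direct double-sum definition of the 2D DFT on a small real input.

```python
import cmath


def dft_matrix(n):
    """Returns the n x n DFT matrix with entries exp(-2*pi*i*j*k/n)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def matmul(a, b):
    """Multiplies two nested-list matrices."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dft2d_matrix(x):
    """2D DFT as F_seq @ x @ F_hidden (same contraction as the einsum)."""
    seq_len, hidden = len(x), len(x[0])
    return matmul(matmul(dft_matrix(seq_len), x), dft_matrix(hidden))


def dft2d_direct(x):
    """2D DFT computed straight from its double-sum definition."""
    m, n = len(x), len(x[0])
    return [[sum(x[a][b] * cmath.exp(-2j * cmath.pi * (a * u / m + b * v / n))
                 for a in range(m) for b in range(n))
             for v in range(n)] for u in range(m)]


def hartley2d(x):
    """Hartley transform as the Hartley layer computes it: Re(F) - Im(F)."""
    return [[z.real - z.imag for z in row] for row in dft2d_matrix(x)]


x = [[1.0, 2.0, 0.5], [3.0, -1.0, 4.0]]  # [seq_len=2, hidden=3]
fm, fd = dft2d_matrix(x), dft2d_direct(x)
print(max(abs(p - q) for rp, rq in zip(fm, fd) for p, q in zip(rp, rq)) < 1e-9)
# True
```

Because each DFT matrix is symmetric, multiplying on the left by the sequence matrix and on the right by the hidden matrix transforms both axes. With the `exp(-i...)` convention, `Re(F) - Im(F)` equals the sum of `x * (cos + sin)`, which is the Hartley kernel.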
diff --git a/official/nlp/modeling/layers/mixing.py b/official/nlp/modeling/layers/mixing.py new file mode 100644 index 0000000000000000000000000000000000000000..71975e9836684583273d9ec069490b9901000f96 --- /dev/null +++ b/official/nlp/modeling/layers/mixing.py @@ -0,0 +1,283 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Keras-based mixing layers. + +Based on the mixing layers used by FNet +(https://aclanthology.org/2022.naacl-main.319/) and Sparse Mixers +(https://arxiv.org/abs/2205.12399). + +Mixing layers can be used as drop-in replacements for self-attention layers. For +interoperability with attention layers, we use the same `query` and `value` call +signature. + +Note: These mixing layers currently only support encoder stacks. Decoder stacks +can be supported in the future by utilizing the `value` inputs. +""" + +import enum +import functools +from typing import Callable, Tuple, Union + +import numpy as np +from scipy import linalg +import tensorflow as tf + +from official.modeling import tf_utils + +_Initializer = Union[str, tf.keras.initializers.Initializer] + +default_kernel_initializer = tf.keras.initializers.TruncatedNormal(stddev=2e-2) + + +class MixingMechanism(enum.Enum): + """Determines the type of mixing layer. + + Possible options: + FOURIER: Fourier Transform mixing. + LINEAR: Mixing using dense matrix multiplications with learnable weights. + HARTLEY: Hartley Transform mixing.
+ """ + FOURIER = "fourier" + HARTLEY = "hartley" + LINEAR = "linear" + + +class MixingLayer(tf.keras.layers.Layer): + """Mixing layer base class. + + This class cannot be used directly. It just specifies the API for mixing + layer subclasses. For interoperability with attention layers, we use the same + `query` and `value` call signature. + + Based on the mixing layers used by FNet + (https://aclanthology.org/2022.naacl-main.319/) and Sparse Mixers + (https://arxiv.org/abs/2205.12399). + """ + + def __init__(self, name: str = "mixing", **kwargs): + """Initializes layer. + + Args: + name: Name for layer. + **kwargs: Keyword arguments. + """ + super().__init__(name=name, **kwargs) + + def call(self, query: tf.Tensor, value: tf.Tensor, **kwargs) -> tf.Tensor: + """Calls the layer. + + Subclasses should return tensors of shape + [batch_size, max_seq_length, hidden_dim]. + + Args: + query: Batch of input embeddings, typically of shape [batch_size, + max_seq_length, hidden_dim]. + value: Unused. Included to match attention layer API. + **kwargs: Optional arguments to catch unused attention keyword arguments. + + Raises: + NotImplementedError: This class should not be called directly. + """ + raise NotImplementedError("Abstract method") + + +class FourierTransformLayer(MixingLayer): + """Fourier Transform layer. + + Applies 2D Fourier Transform over final two dimensions of `query` inputs - + typically the sequence and hidden dimensions. + """ + + def __init__(self, + use_fft: bool = False, + name: str = "fourier_transform", + **kwargs): + """Initializes layer. + + Args: + use_fft: Whether to use Fast Fourier Transform (True) or the Discrete + Fourier Transform (DFT) matrix (False) to compute the Fourier Transform. + See _pick_fourier_transform() for recommendations on when to use FFT or + DFT. + name: Name for layer. + **kwargs: Keyword arguments.
+ """ + super().__init__(name=name, **kwargs) + self.use_fft = use_fft + + def build(self, input_shape: Tuple[int, ...]): + """Picks the Fourier Transform implementation.""" + self.fourier_transform = _pick_fourier_transform( + self.use_fft, + max_seq_length=input_shape[-2], + hidden_dim=input_shape[-1]) + + def call(self, query: tf.Tensor, value: tf.Tensor, **kwargs) -> tf.Tensor: + """Applies layer to `query`. + + Args: + query: Batch of input embeddings, typically of shape [batch_size, + max_seq_length, hidden_dim]. + value: Unused. Included to match attention layer API. + **kwargs: Optional arguments to catch unused attention keyword arguments. + + Returns: + Real part of discrete Fourier Transform of `query` inputs with shape + [batch_size, max_seq_length, hidden_dim]. + """ + del value # Ignored by encoder-only mixing layers + query = tf.cast(query, tf.complex64) + return tf.math.real(self.fourier_transform(query)) + + +class HartleyTransformLayer(MixingLayer): + """Hartley Transform layer. + + Applies 2D Hartley Transform over final two dimensions of `query` inputs - + typically the sequence and hidden dimensions. + """ + + def __init__(self, + use_fft: bool = False, + name: str = "hartley_transform", + **kwargs): + """Initializes layer. + + Args: + use_fft: Whether to use Fast Fourier Transform (True) or the Discrete + Fourier Transform (DFT) matrix (False) to compute the Hartley Transform. + See _pick_fourier_transform() for recommendations on when to use FFT or + DFT. + name: Name for layer. + **kwargs: Keyword arguments. + """ + super().__init__(name=name, **kwargs) + self.use_fft = use_fft + + def build(self, input_shape: Tuple[int, ...]): + """Picks the Fourier Transform implementation.""" + self.fourier_transform = _pick_fourier_transform( + self.use_fft, + max_seq_length=input_shape[-2], + hidden_dim=input_shape[-1]) + + def call(self, query: tf.Tensor, value: tf.Tensor, **kwargs) -> tf.Tensor: + """Applies layer to `query`. 
+ + Args: + query: Batch of input embeddings, typically of shape [batch_size, + max_seq_length, hidden_dim]. + value: Unused. Included to match attention layer API. + **kwargs: Optional arguments to catch unused attention keyword arguments. + + Returns: + Real part of discrete Hartley Transform of `query` inputs with shape + [batch_size, max_seq_length, hidden_dim]. + """ + del value # Ignored by encoder-only mixing layers + query = tf.cast(query, tf.complex64) + frequencies = self.fourier_transform(query) + return tf.math.real(frequencies) - tf.math.imag(frequencies) + + +class LinearTransformLayer(MixingLayer): + """Dense, linear transformation layer. + + Applies matrix multiplications over sequence and hidden dimensions. + """ + + def __init__(self, + kernel_initializer: _Initializer = default_kernel_initializer, + name: str = "linear_transform", + **kwargs): + """Initializes layer. + + Args: + kernel_initializer: Initialization scheme for kernel. + name: Name for layer. + **kwargs: Keyword arguments. + """ + super().__init__(name=name, **kwargs) + self.kernel_initializer = kernel_initializer + + def build(self, input_shape: Tuple[int, ...]): + """Creates the hidden and sequence matrix variables of the layer.""" + self.mat_hidden = self.add_weight( + shape=(input_shape[-1], input_shape[-1]), + initializer=tf_utils.clone_initializer(self.kernel_initializer), + trainable=True, + name="hidden_kernel") + self.mat_seq = self.add_weight( + shape=(input_shape[-2], input_shape[-2]), + initializer=tf_utils.clone_initializer(self.kernel_initializer), + trainable=True, + name="seq_kernel") + + def call(self, query: tf.Tensor, value: tf.Tensor, **kwargs) -> tf.Tensor: + """Applies layer to `query`. + + Args: + query: Batch of input embeddings, typically of shape [batch_size, + max_seq_length, hidden_dim]. + value: Unused. Included to match attention layer API. + **kwargs: Optional arguments to catch unused attention keyword arguments. 
+ + Returns: + Linearly transformed `query` inputs with shape + [batch_size, max_seq_length, hidden_dim]. + """ + del value # Ignored by encoder-only mixing layers + + return tf.einsum("bij,jk,ni->bnk", query, self.mat_hidden, self.mat_seq) + + +def _pick_fourier_transform( + use_fft: bool, max_seq_length: int, + hidden_dim: int) -> Callable[[tf.Tensor], tf.Tensor]: + """Returns FFT or DFT Fourier Transform implementation. + + On TPUs, we recommend using the Discrete Fourier Transform (DFT) matrix + (use_fft=False), except for very long sequence lengths. On GPUs and CPUs, the + Fast Fourier Transform (use_fft=True) is generally optimal for all sequence + lengths. + + Note: When using the FFT it is recommended to use a sequence length that is a + power of 2. + + Args: + use_fft: If True, return FFT. Otherwise, return DFT matrix. + max_seq_length: Maximum sequence length of inputs. Only used if + use_fft=False. + hidden_dim: Size of hidden dimension of inputs. Only used if use_fft=False. + + Returns: + Fourier Transform. + """ + if use_fft: + return tf.signal.fft2d + else: + dft_mat_seq = linalg.dft(max_seq_length).astype(np.complex64) + dft_mat_hidden = linalg.dft(hidden_dim).astype(np.complex64) + + def two_dim_matmul(x: tf.Tensor, matrix_dim_one: tf.Tensor, + matrix_dim_two: tf.Tensor) -> tf.Tensor: + """Applies 2D matrix multiplication to input tensors of rank >= 2.""" + return tf.einsum("...ij,jk,ni->...nk", tf.cast(x, tf.complex64), + matrix_dim_two, matrix_dim_one) + + return functools.partial( + two_dim_matmul, + matrix_dim_one=tf.convert_to_tensor(dft_mat_seq), + matrix_dim_two=tf.convert_to_tensor(dft_mat_hidden)) diff --git a/official/nlp/modeling/layers/mixing_test.py b/official/nlp/modeling/layers/mixing_test.py new file mode 100644 index 0000000000000000000000000000000000000000..811525884a889128f27485dfb5e200dcb5ab8958 --- /dev/null +++ b/official/nlp/modeling/layers/mixing_test.py @@ -0,0 +1,109 @@ +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for mixing.py.""" + +import numpy as np +import tensorflow as tf + +from official.nlp.modeling.layers import mixing + + +class MixingTest(tf.test.TestCase): + + def test_base_mixing_layer(self): + inputs = tf.random.uniform((3, 8, 16), + minval=0, + maxval=10, + dtype=tf.float32) + + with self.assertRaisesRegex(NotImplementedError, "Abstract method"): + _ = mixing.MixingLayer()(query=inputs, value=inputs) + + def test_fourier_layer(self): + batch_size = 4 + max_seq_length = 8 + hidden_dim = 16 + + inputs = tf.random.uniform((batch_size, max_seq_length, hidden_dim), + minval=0, + maxval=10, + dtype=tf.float32) + outputs = mixing.FourierTransformLayer(use_fft=True)( + query=inputs, value=inputs) + self.assertEqual(outputs.shape, (batch_size, max_seq_length, hidden_dim)) + + def test_hartley_layer(self): + batch_size = 3 + max_seq_length = 16 + hidden_dim = 4 + + inputs = tf.random.uniform((batch_size, max_seq_length, hidden_dim), + minval=0, + maxval=12, + dtype=tf.float32) + outputs = mixing.HartleyTransformLayer(use_fft=True)( + query=inputs, value=inputs) + self.assertEqual(outputs.shape, (batch_size, max_seq_length, hidden_dim)) + + def test_linear_mixing_layer(self): + batch_size = 2 + max_seq_length = 4 + hidden_dim = 3 + + inputs = tf.ones((batch_size, max_seq_length, hidden_dim), dtype=tf.float32) + outputs = mixing.LinearTransformLayer( + 
kernel_initializer=tf.keras.initializers.Ones())( + query=inputs, value=inputs) + + # hidden_dim * (max_seq_length * 1) = 12. + expected_outputs = [ + [ + [12., 12., 12.], + [12., 12., 12.], + [12., 12., 12.], + [12., 12., 12.], + ], + [ + [12., 12., 12.], + [12., 12., 12.], + [12., 12., 12.], + [12., 12., 12.], + ], + ] + np.testing.assert_allclose(outputs, expected_outputs, rtol=1e-6, atol=1e-6) + + def test_pick_fourier_transform(self): + # Ensure we don't hit an edge case which exceeds the fixed numerical error. + tf.random.set_seed(1) + np.random.seed(1) + + batch_size = 3 + max_seq_length = 4 + hidden_dim = 8 + + fft = mixing._pick_fourier_transform( + use_fft=True, max_seq_length=max_seq_length, hidden_dim=hidden_dim) + dft_matmul = mixing._pick_fourier_transform( + use_fft=False, max_seq_length=max_seq_length, hidden_dim=hidden_dim) + + inputs = tf.random.uniform([batch_size, max_seq_length, hidden_dim]) + inputs = tf.cast(inputs, tf.complex64) + + np.testing.assert_allclose( + fft(inputs), dft_matmul(inputs), rtol=1e-6, atol=1e-6) + + +if __name__ == "__main__": + tf.test.main() diff --git a/official/nlp/modeling/layers/mobile_bert_layers.py b/official/nlp/modeling/layers/mobile_bert_layers.py index cc1c5c585a349f82da1dd18e7534b669b97250ab..4c5a33a270a0a5cc99b3a3783f885f4a11528846 100644 --- a/official/nlp/modeling/layers/mobile_bert_layers.py +++ b/official/nlp/modeling/layers/mobile_bert_layers.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -15,6 +15,8 @@ """MobileBERT embedding and transformer layers.""" import tensorflow as tf +from official.modeling import tf_utils + from official.nlp.modeling.layers import on_device_embedding from official.nlp.modeling.layers import position_embedding @@ -24,7 +26,7 @@ class NoNorm(tf.keras.layers.Layer): """Apply element-wise linear transformation to the last dimension.""" def __init__(self, name=None): - super(NoNorm, self).__init__(name=name) + super().__init__(name=name) def build(self, shape): kernal_size = shape[-1] @@ -96,7 +98,7 @@ class MobileBertEmbedding(tf.keras.layers.Layer): dropout_rate: Dropout rate. **kwargs: keyword arguments. """ - super(MobileBertEmbedding, self).__init__(**kwargs) + super().__init__(**kwargs) self.word_vocab_size = word_vocab_size self.word_embed_size = word_embed_size self.type_vocab_size = type_vocab_size @@ -109,21 +111,21 @@ class MobileBertEmbedding(tf.keras.layers.Layer): self.word_embedding = on_device_embedding.OnDeviceEmbedding( self.word_vocab_size, self.word_embed_size, - initializer=initializer, + initializer=tf_utils.clone_initializer(self.initializer), name='word_embedding') self.type_embedding = on_device_embedding.OnDeviceEmbedding( self.type_vocab_size, self.output_embed_size, - initializer=initializer, + initializer=tf_utils.clone_initializer(self.initializer), name='type_embedding') self.pos_embedding = position_embedding.PositionEmbedding( max_length=max_sequence_length, - initializer=initializer, + initializer=tf_utils.clone_initializer(self.initializer), name='position_embedding') - self.word_embedding_proj = tf.keras.layers.experimental.EinsumDense( + self.word_embedding_proj = tf.keras.layers.EinsumDense( 'abc,cd->abd', output_shape=[None, self.output_embed_size], - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(self.initializer), bias_axes='d', name='embedding_projection') self.layer_norm = _get_norm_layer(normalization_type, 'embedding_norm') @@ -220,7 +222,7 @@ 
class MobileBertTransformer(tf.keras.layers.Layer): Raises: ValueError: A Tensor shape or parameter is invalid. """ - super(MobileBertTransformer, self).__init__(**kwargs) + super().__init__(**kwargs) self.hidden_size = hidden_size self.num_attention_heads = num_attention_heads self.intermediate_size = intermediate_size @@ -242,11 +244,11 @@ class MobileBertTransformer(tf.keras.layers.Layer): self.block_layers = {} # add input bottleneck - dense_layer_2d = tf.keras.layers.experimental.EinsumDense( + dense_layer_2d = tf.keras.layers.EinsumDense( 'abc,cd->abd', output_shape=[None, self.intra_bottleneck_size], bias_axes='d', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(self.initializer), name='bottleneck_input/dense') layer_norm = _get_norm_layer(self.normalization_type, name='bottleneck_input/norm') @@ -254,11 +256,11 @@ class MobileBertTransformer(tf.keras.layers.Layer): layer_norm] if self.key_query_shared_bottleneck: - dense_layer_2d = tf.keras.layers.experimental.EinsumDense( + dense_layer_2d = tf.keras.layers.EinsumDense( 'abc,cd->abd', output_shape=[None, self.intra_bottleneck_size], bias_axes='d', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(self.initializer), name='kq_shared_bottleneck/dense') layer_norm = _get_norm_layer(self.normalization_type, name='kq_shared_bottleneck/norm') @@ -272,7 +274,7 @@ class MobileBertTransformer(tf.keras.layers.Layer): value_dim=attention_head_size, dropout=self.attention_probs_dropout_prob, output_shape=self.intra_bottleneck_size, - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(self.initializer), name='attention') layer_norm = _get_norm_layer(self.normalization_type, name='attention/norm') @@ -284,19 +286,19 @@ class MobileBertTransformer(tf.keras.layers.Layer): for ffn_layer_idx in range(self.num_feedforward_networks): layer_prefix = f'ffn_layer_{ffn_layer_idx}' layer_name = layer_prefix + '/intermediate_dense' - 
intermediate_layer = tf.keras.layers.experimental.EinsumDense( + intermediate_layer = tf.keras.layers.EinsumDense( 'abc,cd->abd', activation=self.intermediate_act_fn, output_shape=[None, self.intermediate_size], bias_axes='d', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(self.initializer), name=layer_name) layer_name = layer_prefix + '/output_dense' - output_layer = tf.keras.layers.experimental.EinsumDense( + output_layer = tf.keras.layers.EinsumDense( 'abc,cd->abd', output_shape=[None, self.intra_bottleneck_size], bias_axes='d', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(self.initializer), name=layer_name) layer_name = layer_prefix + '/norm' layer_norm = _get_norm_layer(self.normalization_type, @@ -306,12 +308,12 @@ class MobileBertTransformer(tf.keras.layers.Layer): layer_norm]) # add output bottleneck - bottleneck = tf.keras.layers.experimental.EinsumDense( + bottleneck = tf.keras.layers.EinsumDense( 'abc,cd->abd', output_shape=[None, self.hidden_size], activation=None, bias_axes='d', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(self.initializer), name='bottleneck_output/dense') dropout_layer = tf.keras.layers.Dropout( self.hidden_dropout_prob, @@ -445,6 +447,7 @@ class MobileBertMaskedLM(tf.keras.layers.Layer): activation=None, initializer='glorot_uniform', output='logits', + output_weights_use_proj=False, **kwargs): """Class initialization. @@ -455,9 +458,12 @@ class MobileBertMaskedLM(tf.keras.layers.Layer): uniform initializer. output: The output style for this layer. Can be either `logits` or `predictions`. + output_weights_use_proj: Use a projection instead of concatenating extra + output weights; this may reduce MLM task accuracy but also reduces the + number of model parameters. **kwargs: keyword arguments.
""" - super(MobileBertMaskedLM, self).__init__(**kwargs) + super().__init__(**kwargs) self.embedding_table = embedding_table self.activation = activation self.initializer = tf.keras.initializers.get(initializer) @@ -467,6 +473,7 @@ class MobileBertMaskedLM(tf.keras.layers.Layer): ('Unknown `output` value "%s". `output` can be either "logits" or ' '"predictions"') % output) self._output_type = output + self._output_weights_use_proj = output_weights_use_proj def build(self, input_shape): self._vocab_size, embedding_width = self.embedding_table.shape @@ -474,15 +481,22 @@ class MobileBertMaskedLM(tf.keras.layers.Layer): self.dense = tf.keras.layers.Dense( hidden_size, activation=self.activation, - kernel_initializer=self.initializer, + kernel_initializer=tf_utils.clone_initializer(self.initializer), name='transform/dense') if hidden_size > embedding_width: - self.extra_output_weights = self.add_weight( - 'extra_output_weights', - shape=(self._vocab_size, hidden_size - embedding_width), - initializer=self.initializer, - trainable=True) + if self._output_weights_use_proj: + self.extra_output_weights = self.add_weight( + 'output_weights_proj', + shape=(embedding_width, hidden_size), + initializer=tf_utils.clone_initializer(self.initializer), + trainable=True) + else: + self.extra_output_weights = self.add_weight( + 'extra_output_weights', + shape=(self._vocab_size, hidden_size - embedding_width), + initializer=tf_utils.clone_initializer(self.initializer), + trainable=True) elif hidden_size == embedding_width: self.extra_output_weights = None else: @@ -507,10 +521,16 @@ class MobileBertMaskedLM(tf.keras.layers.Layer): if self.extra_output_weights is None: lm_data = tf.matmul(lm_data, self.embedding_table, transpose_b=True) else: - lm_data = tf.matmul( - lm_data, - tf.concat([self.embedding_table, self.extra_output_weights], axis=1), - transpose_b=True) + if self._output_weights_use_proj: + lm_data = tf.matmul( + lm_data, self.extra_output_weights, transpose_b=True) + 
lm_data = tf.matmul(lm_data, self.embedding_table, transpose_b=True) + else: + lm_data = tf.matmul( + lm_data, + tf.concat([self.embedding_table, self.extra_output_weights], + axis=1), + transpose_b=True) logits = tf.nn.bias_add(lm_data, self.bias) masked_positions_length = masked_positions.shape.as_list()[1] or tf.shape( diff --git a/official/nlp/modeling/layers/mobile_bert_layers_test.py b/official/nlp/modeling/layers/mobile_bert_layers_test.py index 3edeec0539a1f8cf74e0063b50246f5fcbc764ae..b5c3c5e3fd3d1a27758bcd8397165ca01256f8df 100644 --- a/official/nlp/modeling/layers/mobile_bert_layers_test.py +++ b/official/nlp/modeling/layers/mobile_bert_layers_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/layers/moe.py b/official/nlp/modeling/layers/moe.py new file mode 100644 index 0000000000000000000000000000000000000000..06dcbbaee1ef34150a29860d667c42494adacd34 --- /dev/null +++ b/official/nlp/modeling/layers/moe.py @@ -0,0 +1,761 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
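The `output_weights_use_proj` option added to `MobileBertMaskedLM` changes only how the output weights are shaped when `hidden_size > embedding_width`. The NumPy sketch below is an illustration, not the library code: the helper names (`mlm_logits_concat`, `mlm_logits_proj`, `extra_params`) and the toy sizes are hypothetical. It mirrors the two branches of `call()`: the default path concatenates a `[vocab_size, hidden_size - embedding_width]` block onto the embedding table, while the projection path first maps the hidden state down to `embedding_width` with an `[embedding_width, hidden_size]` matrix and then reuses the table as-is.

```python
import numpy as np


def mlm_logits_concat(hidden, embedding_table, extra_output_weights):
    """Default path: widen the embedding table with extra output weights.

    hidden: [num_tokens, hidden_size]
    embedding_table: [vocab_size, embedding_width]
    extra_output_weights: [vocab_size, hidden_size - embedding_width]
    """
    full_table = np.concatenate([embedding_table, extra_output_weights], axis=1)
    return hidden @ full_table.T  # [num_tokens, vocab_size]


def mlm_logits_proj(hidden, embedding_table, output_weights_proj):
    """Projection path: project hidden states down, then reuse the table.

    output_weights_proj: [embedding_width, hidden_size]
    """
    projected = hidden @ output_weights_proj.T  # [num_tokens, embedding_width]
    return projected @ embedding_table.T  # [num_tokens, vocab_size]


def extra_params(vocab_size, embedding_width, hidden_size, use_proj):
    """Trainable parameters each option adds on top of the embedding table."""
    if use_proj:
        return embedding_width * hidden_size
    return vocab_size * (hidden_size - embedding_width)


# Toy sizes, for illustration only.
vocab_size, embedding_width, hidden_size, num_tokens = 50, 4, 16, 3
rng = np.random.default_rng(0)
hidden = rng.normal(size=(num_tokens, hidden_size))
table = rng.normal(size=(vocab_size, embedding_width))
concat_logits = mlm_logits_concat(
    hidden, table, rng.normal(size=(vocab_size, hidden_size - embedding_width)))
proj_logits = mlm_logits_proj(
    hidden, table, rng.normal(size=(embedding_width, hidden_size)))
```

Both paths produce `[num_tokens, vocab_size]` logits. With illustrative sizes of a 30,522-token vocabulary, 128-wide embeddings, and a 512-wide hidden state, the concatenation path adds 30,522 × 384 = 11,720,448 parameters, whereas the projection path adds only 128 × 512 = 65,536. This is the parameter saving the docstring trades against possible MLM accuracy loss.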
+ +"""Mixture of Experts layers and their routing mechanisms.""" + +import dataclasses +from typing import Any, Callable, Optional, Tuple + +from absl import logging +import numpy as np +import tensorflow as tf + +from official.modeling import tf_utils + + +_InitializerType = tf.keras.initializers.Initializer + + +_DEFAULT_KERNEL_INITIALIZER = tf.keras.initializers.TruncatedNormal(stddev=2e-2) +_DEFAULT_BIAS_INITIALIZER = tf.keras.initializers.Zeros() + + +################## Routers (gating functions) ################## + + +def _router_z_loss(router_logits: tf.Tensor) -> tf.Tensor: + """Computes router z-loss. + + The router z-loss was introduced in "Designing Effective Sparse Expert Models" + (https://arxiv.org/abs/2202.08906). It encourages router logits to remain + small in an effort to improve stability. + + Args: + router_logits: [num_groups, tokens_per_group, num_experts] router + logits. + + Returns: + Scalar router z-loss. + """ + num_groups, tokens_per_group, _ = router_logits.shape + + log_z = tf.math.reduce_logsumexp(router_logits, axis=-1) + z_loss = log_z**2 + return tf.math.reduce_sum(z_loss) / (num_groups * tokens_per_group) + + +@dataclasses.dataclass +class RouterMask: + """Dispatch and combine arrays for expert routing with masked matmuls. + + Attributes: + dispatch_mask: + [num_groups, tokens_per_group, num_experts, expert_capacity] + dispatch array that is 1 if the token gets routed to the + corresponding expert, and 0 otherwise. + combine_array: + [num_groups, tokens_per_group, num_experts, expert_capacity] + combine array used for combining expert outputs and + scaling with router probability. + """ + dispatch_mask: tf.Tensor + combine_array: tf.Tensor + +RouterOutput = RouterMask + + +class Router(tf.keras.layers.Layer): + """Abstract base router class, defining router API and inner workings. + + Computations are performed in float32 for stability, and returned after + conversion according to the precision policy.
See the discussion of + "selective precision" in https://arxiv.org/abs/2101.03961. + + Uses Keras add_loss() and add_metric() APIs. + + Attributes: + num_experts: Number of experts, used to check consistency with + FeedForwardExperts. + jitter_noise: Amplitude of jitter noise applied to router logits. + router_weights: Dense layer that computes logits for all tokens, which are + then used as expert or token weights. + """ + + def __init__( + self, + num_experts: int, + *, + jitter_noise: float = 0.0, + use_bias: bool = True, + kernel_initializer: _InitializerType = _DEFAULT_KERNEL_INITIALIZER, + bias_initializer: _InitializerType = _DEFAULT_BIAS_INITIALIZER, + name: str = "router", + dtype: Any = tf.float32, + **kwargs): + """Init. + + Args: + num_experts: Number of experts. + jitter_noise: Amplitude of jitter noise applied to router logits. + use_bias: Whether or not to use the bias term in computing the router + weights. + kernel_initializer: Kernel initializer for router weights. + bias_initializer: Bias initializer for router weights. + name: Layer name. + dtype: The dtype of the layer's computations and weights. tf.float32 is + recommended for stability. + **kwargs: Forwarded to super. + """ + super().__init__(name=name, dtype=dtype, **kwargs) + + self.num_experts = num_experts # Used to check consistency with + # FeedForwardExperts. + self.jitter_noise = jitter_noise + + self.router_weights = tf.keras.layers.Dense( + num_experts, + use_bias=use_bias, + kernel_initializer=tf_utils.clone_initializer(kernel_initializer), + bias_initializer=tf_utils.clone_initializer(bias_initializer), + name="router_weights", + dtype=dtype) + + def call(self, + inputs: tf.Tensor, + *, + expert_capacity: int, + training: Optional[bool] = None) -> RouterOutput: + """Computes dispatch and combine arrays for routing to experts. + + Args: + inputs: Inputs to send to experts of shape + [num_groups, tokens_per_group, hidden_dim]. 
+ expert_capacity: Each group will send this many tokens to each expert. + training: If true, apply jitter noise during routing. If not provided + taken from tf.keras.backend. + + Returns: + Router indices or mask arrays (depending on router type). + """ + if training is None: + training = tf.keras.backend.learning_phase() + + # inputs shape [num_groups, tokens_per_group, hidden_dim] + router_probs, router_logits = self._compute_router_probabilities( + inputs, apply_jitter=training) + # router_probs [num_groups, tokens_per_group, num_experts] + # router_logits [num_groups, tokens_per_group, num_experts] + router_z_loss = _router_z_loss(router_logits) + self.add_loss(router_z_loss) + self.add_metric(router_z_loss, name="router_z_loss") + + routing_instructions = self._compute_routing_instructions( + router_probs, expert_capacity) + return routing_instructions + + def _compute_router_probabilities( + self, inputs: tf.Tensor, + apply_jitter: bool) -> Tuple[tf.Tensor, tf.Tensor]: + """Computes router probabilities from input tokens. + + Args: + inputs: Inputs from which router probabilities are computed, shape + [num_groups, tokens_per_group, hidden_dim]. + apply_jitter: If true, apply jitter noise. + + Returns: + - [num_groups, tokens_per_group, num_experts] probabilities for + each token and expert. Used for routing tokens to experts. + - [num_groups, tokens_per_group, num_experts] raw router logits. + Used for computing router z-loss. 
+ """ + if apply_jitter and self.jitter_noise > 0: + inputs *= tf.random.uniform( + inputs.shape, + minval=1.0 - self.jitter_noise, + maxval=1.0 + self.jitter_noise, + dtype=inputs.dtype) + # router_logits shape: [num_groups, tokens_per_group, num_experts] + router_logits = self.router_weights(inputs) + router_probs = tf.keras.activations.softmax(router_logits, axis=-1) + return router_probs, router_logits + + def _compute_routing_instructions(self, router_probs: tf.Tensor, + expert_capacity: int) -> RouterMask: + """Computes instructions for routing inputs to experts.""" + raise NotImplementedError( + "Router is an abstract class that should be subclassed.") + + +class MaskedRouter(Router): + """Abstract base router class for masked matmul dispatch routers. + + MaskedRouter(s) return RouterMask(s) containing a dispatch mask and combine + array for sending and receiving (via masked matmuls) inputs and outputs to and + from experts. + + Routing using masked matmuls is generally faster than scatter-based routing on + TPUs. + + Uses Keras add_loss() and add_metric() APIs. + """ + + def _compute_routing_instructions(self, router_probs: tf.Tensor, + expert_capacity: int) -> RouterMask: + """Computes masks for the top-k experts per token. + + Args: + router_probs: [num_groups, tokens_per_group, num_experts] + probabilities used to determine the routing of tokens to the experts. + expert_capacity: Each group will send this many tokens to each expert. + + Returns: + Router mask arrays. + """ + raise NotImplementedError( + "MaskedRouter is an abstract class that should be subclassed.") + + +class ExpertsChooseMaskedRouter(MaskedRouter): + """Masked matmul router using experts choose tokens assignment. + + This router uses the same mechanism as in Mixture-of-Experts with Expert + Choice (https://arxiv.org/abs/2202.09368): each expert selects its top + expert_capacity tokens. An individual token may be processed by multiple + experts or none at all.
+ + Note: "experts choose routing" should not be used in decoder blocks because it + breaks the autoregressive behavior, leading to a mismatch between training + (teacher forcing) and inference (autoregressive decoding). + + Uses Keras add_loss() and add_metric() APIs. + """ + + def _compute_routing_instructions(self, router_probs: tf.Tensor, + expert_capacity: int) -> RouterMask: + """Computes masks for the highest-probability tokens per expert. + + Args: + router_probs: [num_groups, tokens_per_group, num_experts] + probabilities used to determine the routing of tokens to the experts. + expert_capacity: Each group will send this many tokens to each expert. + + Returns: + Dispatch and combine arrays for routing with masked matmuls. + """ + num_groups, tokens_per_group, _ = router_probs.shape + router_probs_t = tf.transpose(router_probs, perm=[0, 2, 1]) + # router_probs_t: [num_groups, num_experts, tokens_per_group] + + # Top expert_capacity router probability and corresponding token indices for + # each expert. + # Shapes [num_groups, num_experts, expert_capacity] + expert_gate, expert_index = tf.math.top_k( + router_probs_t, k=expert_capacity, sorted=False) + + # Convert token indices to a one-hot mask over tokens for each expert slot. + # Shape: [num_groups, num_experts, expert_capacity, tokens_per_group]. + dispatch_mask = tf.one_hot( + expert_index, tokens_per_group, dtype=router_probs.dtype) + + # Move axes to conform with shape expected by MoeLayer API. + # Shape: [num_groups, tokens_per_group, num_experts, expert_capacity] + dispatch_mask = tf.transpose(dispatch_mask, perm=[0, 3, 1, 2]) + + # The combine array will be used for combining expert outputs, scaled by the + # router probabilities. + # Shape: [num_groups, tokens_per_group, num_experts, expert_capacity] + combine_array = tf.einsum( + "...ec,...tec->...tec", + expert_gate, + dispatch_mask) + + # Add load balancing loss.
+ # Each expert is choosing tokens until it reaches full capacity, so we don't + # need an auxiliary load balancing loss for expert choice routing. + self.add_metric(0.0, name="load_balancing_loss") + + # Gather expert metrics. + num_tokens = num_groups * tokens_per_group + # Number of tokens that were dispatched to at least one expert. + num_tokens_dispatched_somewhere = tf.math.reduce_sum(tf.math.reduce_max( + dispatch_mask, axis=(-1, -2))) + fraction_tokens_left_behind = 1.0 - num_tokens_dispatched_somewhere / float( + num_tokens) + # Total number of tokens that were dispatched (one token could be + # dispatched to multiple experts). + num_tokens_dispatched = tf.math.reduce_sum(dispatch_mask) + # Of the tokens dispatched, how confident was the router in its routing? + router_confidence = tf.math.reduce_sum( + combine_array) / num_tokens_dispatched + + expert_usage = 1.0 # Experts fully utilized when "expert choose tokens" + + self.add_metric(fraction_tokens_left_behind, + name="fraction_tokens_left_behind") + self.add_metric(router_confidence, name="router_confidence") + self.add_metric(expert_usage, name="expert_usage") + + # Return to default dtype now that router computation is complete. + dtype = tf.keras.mixed_precision.global_policy().compute_dtype + dispatch_mask = tf.cast(dispatch_mask, dtype) + combine_array = tf.cast(combine_array, dtype) + output = RouterMask(dispatch_mask, combine_array) + return output + + +################## Model layers ################## + + +class FeedForward(tf.keras.layers.Layer): + """Feed-forward layer - position independent, dense, nonlinear transformation. + + Typically used in an MLP Transformer block.
+ """ + + def __init__( + self, + d_ff: int, + *, + dropout_rate: float = 0.1, + activation: Callable[[tf.Tensor], + tf.Tensor] = tf.keras.activations.gelu, + kernel_initializer: _InitializerType = _DEFAULT_KERNEL_INITIALIZER, + bias_initializer: _InitializerType = _DEFAULT_BIAS_INITIALIZER, + name: str = "feed_forward", + **kwargs): + """Initializes layer. + + Args: + d_ff: Dimension of feed-forward layer. + dropout_rate: The dropout probability. + activation: (Nonlinear) transform applied in layer. + kernel_initializer: Initialization scheme for kernel. + bias_initializer: Initialization scheme for bias. + name: Layer name. + **kwargs: Forwarded to super. + """ + super().__init__(name=name, **kwargs) + self.activation = activation + self.kernel_initializer = kernel_initializer + self.bias_initializer = bias_initializer + + self.intermediate_layer = tf.keras.layers.Dense( + d_ff, + kernel_initializer=tf_utils.clone_initializer(self.kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self.bias_initializer), + name="intermediate") + self.dropout_layer = tf.keras.layers.Dropout(dropout_rate) + + def build(self, input_shape: Tuple[int, int, int]): + """Creates the input shape dependent output weight variables.""" + self.output_layer = tf.keras.layers.Dense( + input_shape[-1], + kernel_initializer=tf_utils.clone_initializer(self.kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self.bias_initializer), + name="output") + + def call(self, + inputs: tf.Tensor, + *, + training: Optional[bool] = None) -> tf.Tensor: + """Applies layer to inputs. + + Args: + inputs: Batch of input embeddings, of shape + [batch_size, seq_len, hidden_dim]. + training: Only apply dropout during training. + + Returns: + Transformed inputs with the same shape as inputs + [batch_size, seq_len, hidden_dim]. 
+ """ + x = self.intermediate_layer(inputs) + x = self.activation(x) + x = self.output_layer(x) + x = self.dropout_layer(x, training=training) + return x + + +class FeedForwardExperts(tf.keras.layers.Layer): + """Feed-forward layer with multiple experts. + + Note that call() takes inputs with shape + [num_groups, num_experts, expert_capacity, hidden_dim] + which is different from the usual [batch_size, seq_len, hidden_dim] used by + the FeedForward layer. + + The experts are independent FeedForward layers of the + same shape, i.e. the kernel doesn't have shape [hidden_dim, out_dim], but + [num_experts, hidden_dim, out_dim]. + """ + + def __init__( + self, + num_experts: int, + d_ff: int, + *, + dropout_rate: float = 0.1, + activation: Callable[[tf.Tensor], + tf.Tensor] = tf.keras.activations.gelu, + kernel_initializer: _InitializerType = _DEFAULT_KERNEL_INITIALIZER, + bias_initializer: _InitializerType = _DEFAULT_BIAS_INITIALIZER, + name: str = "experts", + **kwargs): + """Initializes layer. + + Args: + num_experts: Number of experts (i.e. number of independent feed-forward + blocks). + d_ff: Dimension of feed-forward layer of each expert. + dropout_rate: The dropout probability (expert_dropout_rate). + activation: (Nonlinear) transform applied in layer. + kernel_initializer: Initialization scheme for kernel. + bias_initializer: Initialization scheme for bias. + name: Layer name. + **kwargs: Forwarded to super. 
+ """ + super().__init__(name=name, **kwargs) + self.num_experts = num_experts + self.activation = activation + self.kernel_initializer = kernel_initializer + self.bias_initializer = bias_initializer + + self.intermediate_layer = tf.keras.layers.EinsumDense( + "gech,ehf->gecf", + output_shape=(self.num_experts, None, d_ff), + bias_axes="ef", + kernel_initializer=tf_utils.clone_initializer(self.kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self.bias_initializer), + name="intermediate") + self.dropout_layer = tf.keras.layers.Dropout(dropout_rate) + + def build(self, input_shape: Tuple[int, int, int, int]): + """Creates the input shape dependent output weight variables.""" + if input_shape[1] != self.num_experts: + raise ValueError( + f"Input shape {input_shape} is inconsistent with num_experts " + f"{self.num_experts}.") + + self.output_layer = tf.keras.layers.EinsumDense( + "gecf,efh->gech", + output_shape=(self.num_experts, None, input_shape[-1]), + bias_axes="eh", + kernel_initializer=tf_utils.clone_initializer(self.kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self.bias_initializer), + name="output") + + def call(self, + inputs: tf.Tensor, + *, + training: Optional[bool] = None) -> tf.Tensor: + """Applies layer to inputs. + + Args: + inputs: Inputs of shape + [num_groups, num_experts, expert_capacity, hidden_dim]. + training: Only apply dropout during training. + + Returns: + Transformed inputs with the same shape as inputs + [num_groups, num_experts, expert_capacity, hidden_dim]. + """ + x = self.intermediate_layer(inputs) + x = self.activation(x) + x = self.output_layer(x) + x = self.dropout_layer(x, training=training) + return x + + +class MoeLayer(tf.keras.layers.Layer): + """Sparse MoE layer with per-token routing. + + In this TF implementation, all experts need to fit onto a single device + allowing for batch parallelism only. + + Uses Keras add_loss() and add_metric() APIs. 
+ + Attributes: + num_experts: Number of experts (i.e. number of independent feed-forward + blocks). + """ + + def __init__( + self, + experts: FeedForwardExperts, + router: MaskedRouter, + *, + train_capacity_factor: float = 1.0, + eval_capacity_factor: float = 1.0, + min_expert_capacity: int = 4, + max_group_size: int = 4096, + strict_group_size: bool = False, + name: str = "moe", + **kwargs): + """Init. + + Args: + experts: Instance of FeedForwardExperts. Needs to have the same + num_experts as the router. + router: Instance of MaskedRouter to route the tokens to + the different experts. + train_capacity_factor: Scaling factor to increase the expert token + capacity during training. This factor plays an analogous, but slightly + different, role depending on the routing assignment algorithm: + - For "tokens choose" routing, the capacity factor only affects the + maximum number of tokens that an expert will process. It does not + affect how many experts a given token is routed to; see the + num_selected_experts attributes of "tokens choose" routers. + - For "experts choose" routing, because experts always fill their + buffer, increasing the capacity factor will increase the number of + tokens that an expert will process AND will indirectly increase the + number of experts that a given token is routed to. + eval_capacity_factor: As above, but used during evaluation. + min_expert_capacity: Minimum token processing capacity for each expert. + max_group_size: The total number of tokens on each device is subdivided + into groups of this size. Router computations are then performed on a + per-group basis. A larger group size will result in slower but more + accurate top-k and sorting computations, whereas a smaller group size + will result in faster but more approximate (and potentially less stable) + routing choices. 
Note that actual group size may be smaller than + max_group_size for consistency with the number of experts and tokens; + see also `strict_group_size` attribute. In practice, + we find that imperfect routing choices are tolerable and recommend + choosing a group size on the order of 4096 tokens, although this number + will vary based on model configuration and size. + strict_group_size: If True, fail if unable to set the token group size + equal to max_group_size. If False (default), the actual group size may + be smaller than max_group_size for consistency with the number of + experts and tokens. + name: Layer name. + **kwargs: Forwarded to super. + """ + super().__init__(name=name, **kwargs) + self._experts = experts + self._router = router + + self.num_experts = experts.num_experts + assert experts.num_experts == router.num_experts + + self._train_capacity_factor = train_capacity_factor + self._eval_capacity_factor = eval_capacity_factor + self._max_group_size = max_group_size + self._min_expert_capacity = min_expert_capacity + self._strict_group_size = strict_group_size + + def call(self, + inputs: tf.Tensor, + *, + training: Optional[bool] = None) -> tf.Tensor: + """Applies MoeLayer. + + Args: + inputs: Batch of input embeddings of shape + [batch_size, seq_length, hidden_dim]. + training: Only apply dropout and jitter noise during training. If not + provided taken from tf.keras.backend. + + Returns: + Transformed inputs with same shape as inputs: + [batch_size, seq_length, hidden_dim]. + + Raises: + ValueError if we cannot find a group_size satisfying given requirements. 
+ """ + if training is None: + training = tf.keras.backend.learning_phase() + + # inputs shape [batch_size, seq_length, hidden_dim] + per_device_batch_size, seq_length, hidden_dim = inputs.shape + num_tokens = per_device_batch_size * seq_length + num_groups = self._num_groups(num_tokens, self._max_group_size) + tokens_per_group = num_tokens // num_groups + + if training: + capacity_factor = self._train_capacity_factor + else: + capacity_factor = self._eval_capacity_factor + # Each group will send expert_capacity tokens to each expert. + expert_capacity = int( + round(capacity_factor * tokens_per_group / self.num_experts)) + expert_capacity = max(expert_capacity, self._min_expert_capacity) + logging.info( + "Selected expert_capacity=%d for num_experts=%d and training=%r.", + expert_capacity, self.num_experts, training) + + # Reshape batch and sequence/token dimensions for expert routing. + x = tf.reshape(inputs, (num_groups, tokens_per_group, hidden_dim)) + + x = self._mask_and_dispatch_to_experts(x, expert_capacity, training) + + # Return to original input shape. + x = tf.reshape(x, (per_device_batch_size, seq_length, hidden_dim)) + return x + + def _num_groups(self, num_tokens: int, max_group_size: int) -> int: + """Returns the number of token routing groups. + + Note that the quantities are local to the device. + + We select the smallest num_groups such that: + - num_groups >= num_tokens / max_group_size (ensuring the group size is no + larger than max_group_size), + - num_tokens % num_groups = 0 (ensuring that the group size evenly divides + into the num_tokens), + + Args: + num_tokens: Number of tokens from input batch. + max_group_size: Maximum size of each token routing group. Actual group + size may end up being smaller unless strict_group_size==True. + + Returns: + Number of token routing groups. + + Raises: + ValueError if we cannot find a group_size satisfying the above + requirements. 
+ """ + # Increase the number of groups (and decrease the group size) until we have + # a viable number of groups. + min_num_groups = int(np.ceil(num_tokens / max_group_size)) + num_groups = min_num_groups + while num_groups < num_tokens and num_tokens % num_groups != 0: + num_groups += 1 + + group_size = num_tokens // num_groups + logging.info( + "Selected group_size=%d and num_groups=%d for input num_tokens=%d, " + "max_group_size=%d, num_experts=%d.", + group_size, num_groups, num_tokens, max_group_size, self.num_experts) + + if group_size < self._min_expert_capacity: + raise ValueError( + f"Local (per-device) group_size {group_size} is smaller than " + f"min_expert_capacity {self._min_expert_capacity}, which is probably " + f"not intended. Please increase max_group_size {max_group_size} to" + " seq_length or increase batch_size or decrease min_expert_capacity.") + + if self._strict_group_size and group_size != self._max_group_size: + raise ValueError( + f"Selected group_size={group_size} is less than the " + f"max_group_size={max_group_size}. Exiting because strict mode is " + "active (strict_group_size=True)") + + return num_groups + + def _mask_and_dispatch_to_experts(self, inputs: tf.Tensor, + expert_capacity: int, + training: bool) -> tf.Tensor: + """Wraps expert masked routing and dispatching algorithm. + + This algorithm takes the following steps: + (1) Compute dispatch mask and combine array using self._router. + (2) Dispatch inputs to experts based on dispatch mask. + (3) Recombine individual expert outputs using combine array. + + Args: + inputs: [num_groups, tokens_per_group, hidden_dim] inputs to + send to experts. + expert_capacity: Each group will send this many tokens to each expert. + training: If true, apply jitter noise during routing and dropout + during expert computation. + + Returns: + [num_groups, tokens_per_group, hidden_dim] outputs from + experts.
+ """ + # Shape [num_groups, tokens_per_group, num_experts, expert_capacity] + router_mask = self._router( + inputs, + expert_capacity=expert_capacity, + training=training) + + # Shape [num_groups, num_experts, expert_capacity, hidden_dim] + expert_inputs = tf.einsum( + "gth,gtec->gech", + inputs, + router_mask.dispatch_mask) + + expert_outputs = self._experts(expert_inputs, training=training) + + # Shape [num_groups, tokens_per_group, hidden_dim] + combined_outputs = tf.einsum( + "gech,gtec->gth", + expert_outputs, + router_mask.combine_array) + + return combined_outputs + + +class MoeLayerWithBackbone(tf.keras.layers.Layer): + """Sparse MoE layer plus a FeedForward layer evaluated for all tokens. + + Uses Keras add_loss() and add_metric() APIs. + """ + + def __init__( + self, + moe: MoeLayer, + backbone_d_ff: int, + *, + dropout_rate: float = 0.1, + activation: Callable[[tf.Tensor], + tf.Tensor] = tf.keras.activations.gelu, + kernel_initializer: _InitializerType = _DEFAULT_KERNEL_INITIALIZER, + bias_initializer: _InitializerType = _DEFAULT_BIAS_INITIALIZER, + name: str = "moe_with_backbone", + **kwargs): + """Init. + + Args: + moe: Instance of MoeLayer with experts and router. + backbone_d_ff: Dimension of feed-forward layer of a lightweight backbone, + which is evaluated for all tokens. + dropout_rate: Dropout rate for the backbone. + activation: (Nonlinear) transform applied in the backbone. + kernel_initializer: Initialization scheme for kernels in the backbone. + bias_initializer: Initialization scheme for biases in the backbone. + name: Layer name. + **kwargs: Forwarded to super. 
+ """ + super().__init__(name=name, **kwargs) + self._moe = moe + + self._backbone = FeedForward( + backbone_d_ff, + dropout_rate=dropout_rate, + activation=activation, + kernel_initializer=tf_utils.clone_initializer(kernel_initializer), + bias_initializer=tf_utils.clone_initializer(bias_initializer), + name="backbone") + + def call(self, + inputs: tf.Tensor, + *, + training: Optional[bool] = None) -> tf.Tensor: + """Applies MoeLayerWithBackbone layer. + + Args: + inputs: Batch of input embeddings of shape + [batch_size, seq_length, hidden_dim]. + training: Only apply dropout and jitter noise during training. If not + provided taken from tf.keras.backend. + + Returns: + Transformed inputs with same shape as inputs: + [batch_size, seq_length, hidden_dim]. + """ + return self._backbone( + inputs, training=training) + self._moe( + inputs, training=training) diff --git a/official/nlp/modeling/layers/moe_test.py b/official/nlp/modeling/layers/moe_test.py new file mode 100644 index 0000000000000000000000000000000000000000..47c10175e578c017d72fa91ceef4937b68a691e8 --- /dev/null +++ b/official/nlp/modeling/layers/moe_test.py @@ -0,0 +1,255 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
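The routing arithmetic in `moe.py` can be sketched in NumPy to make the shapes concrete. This is an illustrative re-implementation, not the library code. `num_groups_for`, `expert_capacity_for`, and `experts_choose_masks` are hypothetical helpers that loosely mirror `MoeLayer._num_groups` (without its error checks), the capacity computation in `MoeLayer.call`, and `ExpertsChooseMaskedRouter._compute_routing_instructions`. The example assumes an identity router kernel, as `test_experts_choose_masked_router_dtype_shape` does, so the router probabilities reduce to a softmax of the inputs. The test's all-ones bias shifts every logit by the same constant, which cancels in the softmax.

```python
import math

import numpy as np


def num_groups_for(num_tokens, max_group_size):
    """Smallest num_groups >= ceil(num_tokens / max_group_size) dividing num_tokens."""
    num_groups = math.ceil(num_tokens / max_group_size)
    while num_groups < num_tokens and num_tokens % num_groups != 0:
        num_groups += 1
    return num_groups


def expert_capacity_for(tokens_per_group, num_experts, capacity_factor,
                        min_capacity):
    # Python's round() rounds halves to even, e.g. round(2.5) == 2.
    capacity = int(round(capacity_factor * tokens_per_group / num_experts))
    return max(capacity, min_capacity)


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def experts_choose_masks(router_probs, expert_capacity):
    """router_probs: [g, t, e] -> (dispatch, combine), each of shape [g, t, e, c]."""
    _, tokens_per_group, _ = router_probs.shape
    probs_t = np.transpose(router_probs, (0, 2, 1))  # [g, e, t]
    # Each expert picks its top-`expert_capacity` tokens: indices [g, e, c].
    expert_index = np.argsort(-probs_t, axis=-1, kind="stable")
    expert_index = expert_index[..., :expert_capacity]
    expert_gate = np.take_along_axis(probs_t, expert_index, axis=-1)  # [g, e, c]
    # One-hot over tokens gives [g, e, c, t]; move the token axis to get [g, t, e, c].
    dispatch = np.eye(tokens_per_group)[expert_index].transpose(0, 3, 1, 2)
    # Scale each dispatched slot by the router probability of that token.
    combine = np.einsum("gec,gtec->gtec", expert_gate, dispatch)
    return dispatch, combine


# 2 groups x 3 tokens x 3 experts, built like the inputs in the unit test.
x = np.zeros((2, 3, 3))
x[0, 0, 0] += 1
x[0, :2, :2] += 1
x[1, 1:, 1:] += 1
x[1, -1, -1] += 1
dispatch, combine = experts_choose_masks(softmax(x), expert_capacity=2)
per_token_expert_weight = combine.sum(axis=-1)  # [g, t, e]
```

Summing `combine` over the capacity axis reproduces the `expected_combine_array` values in the test to about two decimals. Using `argsort` rather than an unsorted `top_k` changes only the order of the capacity slots, not these sums. For the small test config (batch 2, sequence length 10, `max_group_size=9`), `num_groups_for(20, 9)` returns 4, which gives groups of 5 tokens and, with 2 experts and a capacity factor of 1.0, an expert capacity of `round(2.5) == 2`.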
+ +"""Tests for moe.py.""" + +import ml_collections +import numpy as np +import tensorflow as tf + +from official.nlp.modeling.layers import moe + + +def small_config() -> ml_collections.ConfigDict: + """Creates a small model config that can be used by all tests.""" + config = ml_collections.ConfigDict() + + config.d_ff = 32 + config.dropout_rate = 0.1 + + config.num_experts = 2 + config.expert_d_ff = 33 + config.expert_dropout_rate = 0.1 + config.jitter_noise = 0.1 + config.train_capacity_factor = 1.0 + config.eval_capacity_factor = 1.0 + config.min_expert_capacity = 1 + config.max_group_size = 9 + + config.backbone_d_ff = 13 + return config + + +def make_input_ones(batch_size: int = 2, + seq_length: int = 10, + hidden_dim: int = 7) -> tf.Tensor: + return tf.ones((batch_size, seq_length, hidden_dim), dtype=tf.float32) + + +def make_experts_input_ones(num_groups: int = 1, + num_experts: int = 2, + expert_capacity: int = 5, + hidden_dim: int = 7) -> tf.Tensor: + return tf.ones((num_groups, num_experts, expert_capacity, hidden_dim), + dtype=tf.float32) + + +class MoeTest(tf.test.TestCase): + + def tearDown(self): + super().tearDown() + tf.keras.mixed_precision.set_global_policy('float32') + + def test_router_z_loss_dtype(self): + x = tf.constant([[[10.0, 5.0]]], dtype=tf.float32) + y = moe._router_z_loss(x) + expected = (5 + np.log(np.exp(5) + 1))**2 + self.assertAllClose(expected, y, atol=1e-7) + + x = tf.constant([[[10.0, 5.0]]], dtype=tf.bfloat16) + y = moe._router_z_loss(x) + expected = 100.0 + self.assertAllClose(expected, y, atol=1e-7) + + def test_router_z_loss_shape(self): + x = make_input_ones(2, 5, 7) + y = moe._router_z_loss(x) + expected = (np.log(7) + 1)**2 + self.assertAllClose(expected, y, atol=1e-7) + + def test_experts_choose_masked_router_dtype_shape(self): + tf.keras.mixed_precision.set_global_policy('mixed_bfloat16') + num_groups = 2 + tokens_per_group = 3 + hidden_dim = tokens_per_group + num_experts = tokens_per_group + expert_capacity = 2 + x = 
np.zeros([num_groups, tokens_per_group, hidden_dim]) + x[0, 0, 0] += 1 + x[0, :2, :2] += 1 + x[1, 1:, 1:] += 1 + x[1, -1, -1] += 1 + + router = moe.ExpertsChooseMaskedRouter( + num_experts=num_experts, + jitter_noise=0.1, + use_bias=True, + kernel_initializer=tf.keras.initializers.get('identity'), + bias_initializer=tf.keras.initializers.get('ones')) + router_mask = router(x, expert_capacity=expert_capacity, training=False) + + self.assertDTypeEqual(router_mask.dispatch_mask, tf.bfloat16) + self.assertDTypeEqual(router_mask.combine_array, tf.bfloat16) + + expect_shape = [num_groups, tokens_per_group, num_experts, expert_capacity] + self.assertEqual(expect_shape, router_mask.dispatch_mask.shape) + self.assertEqual(expect_shape, router_mask.combine_array.shape) + + # top_k call may not be sorted, so can't compare the output directly + # Check that the output contains only 0s and 1s + out_dm = router_mask.dispatch_mask.numpy() + self.assertSetEqual({0, 1}, set(out_dm.flatten().astype(np.int32))) + # Check that the right tokens were selected + out_dm_indices = np.dot( + out_dm.transpose((0, 2, 3, 1)), np.arange(tokens_per_group)) + # Shape [num_groups, num_experts, expert_capacity] + self.assertSetEqual({0, 1}, set(out_dm_indices[0, 0, :].astype(np.int32))) + self.assertSetEqual({1, 2}, set(out_dm_indices[0, 1, :].astype(np.int32))) + self.assertSetEqual({1, 2}, set(out_dm_indices[0, 2, :].astype(np.int32))) + self.assertSetEqual({0, 1}, set(out_dm_indices[1, 0, :].astype(np.int32))) + self.assertSetEqual({0, 1}, set(out_dm_indices[1, 1, :].astype(np.int32))) + self.assertSetEqual({1, 2}, set(out_dm_indices[1, 2, :].astype(np.int32))) + + out_ca = router_mask.combine_array.numpy() + out_ca = np.dot(out_ca, np.ones((expert_capacity,))) + + expected_combine_array = np.array( + [[[0.66, 0.0, 0.0], [0.42, 0.42, 0.16], [0.0, 0.33, 0.33]], + [[0.33, 0.33, 0.0], [0.16, 0.42, 0.42], [0.0, 0.0, 0.66]]]) + self.assertAllClose(expected_combine_array, out_ca, atol=1e-2) + + def 
test_feed_forward_shape_and_vars(self): + config = small_config() + layer = moe.FeedForward(d_ff=config.d_ff, dropout_rate=config.dropout_rate) + inputs = make_input_ones() + outputs = layer(inputs) + self.assertAllEqual(tf.shape(inputs), tf.shape(outputs)) + var_names = sorted([v.name for v in layer.trainable_variables]) + self.assertAllEqual(['feed_forward/intermediate/bias:0', + 'feed_forward/intermediate/kernel:0', + 'feed_forward/output/bias:0', + 'feed_forward/output/kernel:0'], var_names) + + def test_feed_forward_manual(self): + config = small_config() + layer = moe.FeedForward( + d_ff=config.d_ff, + dropout_rate=config.dropout_rate, + activation=tf.keras.activations.relu, + kernel_initializer=tf.keras.initializers.get('ones'), + bias_initializer=tf.keras.initializers.get('ones')) + inputs = make_input_ones(1, 2, 3) + outputs = layer(inputs, training=False) + manual_outputs = tf.constant([[[129.0, 129.0, 129.0], + [129.0, 129.0, 129.0]]]) + self.assertAllClose(manual_outputs, outputs, atol=1e-7) + + def test_feed_forward_experts_shape_and_vars(self): + config = small_config() + layer = moe.FeedForwardExperts( + num_experts=config.num_experts, + d_ff=config.expert_d_ff, + dropout_rate=config.expert_dropout_rate) + inputs = make_experts_input_ones() + outputs = layer(inputs) + self.assertAllEqual(tf.shape(inputs), tf.shape(outputs)) + var_names = sorted([v.name for v in layer.trainable_variables]) + self.assertAllEqual(['experts/intermediate/bias:0', + 'experts/intermediate/kernel:0', + 'experts/output/bias:0', + 'experts/output/kernel:0'], var_names) + + def test_feed_forward_experts_manual(self): + config = small_config() + layer = moe.FeedForwardExperts( + num_experts=1, + d_ff=config.expert_d_ff, + dropout_rate=config.expert_dropout_rate, + activation=tf.keras.activations.relu, + kernel_initializer=tf.keras.initializers.get('ones'), + bias_initializer=tf.keras.initializers.get('ones')) + inputs = make_experts_input_ones(1, 1, 2, 3) + outputs = 
layer(inputs, training=False) + manual_outputs = tf.constant([[[[133.0, 133.0, 133.0], + [133.0, 133.0, 133.0]]]]) + self.assertAllClose(manual_outputs, outputs, atol=1e-7) + + def test_moe_layer(self): + config = small_config() + experts = moe.FeedForwardExperts( + num_experts=config.num_experts, + d_ff=config.expert_d_ff, + dropout_rate=config.expert_dropout_rate) + router = moe.ExpertsChooseMaskedRouter( + config.num_experts, + jitter_noise=config.jitter_noise) + moe_layer = moe.MoeLayer( + experts, + router, + train_capacity_factor=config.train_capacity_factor, + eval_capacity_factor=config.eval_capacity_factor, + max_group_size=config.max_group_size, + min_expert_capacity=config.min_expert_capacity) + + inputs = make_input_ones() + with self.assertLogs('absl', level='INFO') as cm: + outputs = moe_layer(inputs, training=True) + self.assertAllEqual(tf.shape(inputs), tf.shape(outputs)) + + self.assertEqual(cm.output, [ + ('INFO:absl:Selected group_size=5 and num_groups=4 for input ' + 'num_tokens=20, max_group_size=9, num_experts=2.'), + ('INFO:absl:Selected expert_capacity=2 for num_experts=2 and ' + 'training=True.')]) + + var_names = sorted([v.name for v in moe_layer.trainable_variables]) + self.assertAllEqual(['moe/experts/intermediate/bias:0', + 'moe/experts/intermediate/kernel:0', + 'moe/experts/output/bias:0', + 'moe/experts/output/kernel:0', + 'moe/router/router_weights/bias:0', + 'moe/router/router_weights/kernel:0'], var_names) + self.assertLen(moe_layer.losses, 1) + metrics = [metric.name for metric in moe_layer.metrics] + self.assertSetEqual( + { + 'router_z_loss', 'load_balancing_loss', + 'fraction_tokens_left_behind', 'router_confidence', 'expert_usage' + }, set(metrics)) + + def test_moe_layer_with_backbone(self): + config = small_config() + experts = moe.FeedForwardExperts( + num_experts=config.num_experts, + d_ff=config.expert_d_ff, + dropout_rate=config.expert_dropout_rate) + router = moe.ExpertsChooseMaskedRouter( + config.num_experts, + 
jitter_noise=config.jitter_noise) + moe_layer = moe.MoeLayer( + experts, + router, + train_capacity_factor=config.train_capacity_factor, + eval_capacity_factor=config.eval_capacity_factor, + max_group_size=config.max_group_size, + min_expert_capacity=config.min_expert_capacity) + layer = moe.MoeLayerWithBackbone(moe_layer, config.backbone_d_ff) + + inputs = make_input_ones() + outputs = layer(inputs) + self.assertAllEqual(tf.shape(inputs), tf.shape(outputs)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/nlp/modeling/layers/multi_channel_attention.py b/official/nlp/modeling/layers/multi_channel_attention.py index dfdf7274c9b8d30a514b1dfc9b43a9e4533e31d5..94c22aee3330f4eb7f221447547124f969dec3f8 100644 --- a/official/nlp/modeling/layers/multi_channel_attention.py +++ b/official/nlp/modeling/layers/multi_channel_attention.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -18,6 +18,7 @@ import math import tensorflow as tf + from official.modeling import tf_utils from official.nlp.modeling.layers import masked_softmax @@ -48,7 +49,7 @@ class VotingAttention(tf.keras.layers.Layer): kernel_constraint=None, bias_constraint=None, **kwargs): - super(VotingAttention, self).__init__(**kwargs) + super().__init__(**kwargs) self._num_heads = num_heads self._head_size = head_size self._kernel_initializer = tf.keras.initializers.get(kernel_initializer) @@ -60,26 +61,28 @@ class VotingAttention(tf.keras.layers.Layer): def build(self, unused_input_shapes): common_kwargs = dict( - kernel_initializer=self._kernel_initializer, - bias_initializer=self._bias_initializer, kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer, activity_regularizer=self._activity_regularizer, kernel_constraint=self._kernel_constraint, bias_constraint=self._bias_constraint) - self._query_dense = tf.keras.layers.experimental.EinsumDense( + self._query_dense = tf.keras.layers.EinsumDense( "BAE,ENH->BANH", output_shape=(None, self._num_heads, self._head_size), bias_axes="NH", name="query", + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), **common_kwargs) - self._key_dense = tf.keras.layers.experimental.EinsumDense( + self._key_dense = tf.keras.layers.EinsumDense( "BAE,ENH->BANH", output_shape=(None, self._num_heads, self._head_size), bias_axes="NH", name="key", + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), **common_kwargs) - super(VotingAttention, self).build(unused_input_shapes) + super().build(unused_input_shapes) def call(self, encoder_outputs, doc_attention_mask): num_docs = tf_utils.get_shape_list(encoder_outputs, expected_rank=[4])[1] @@ -120,7 +123,7 @@ class MultiChannelAttention(tf.keras.layers.MultiHeadAttention): """ def 
_build_attention(self, rank): - super(MultiChannelAttention, self)._build_attention(rank) # pytype: disable=attribute-error # typed-keras + super()._build_attention(rank) # pytype: disable=attribute-error # typed-keras self._masked_softmax = masked_softmax.MaskedSoftmax(mask_expansion_axes=[2]) def call(self, diff --git a/official/nlp/modeling/layers/multi_channel_attention_test.py b/official/nlp/modeling/layers/multi_channel_attention_test.py index 2831fc29a5c9f32dcdfba189427fb4d7cbd9f31b..8c022046756b0ce6c106ca21664813adfd8ca4c3 100644 --- a/official/nlp/modeling/layers/multi_channel_attention_test.py +++ b/official/nlp/modeling/layers/multi_channel_attention_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/layers/on_device_embedding.py b/official/nlp/modeling/layers/on_device_embedding.py index 3d2faa45fe09c2b5e05a67d346d328d10310e96e..6cc5a05b4fe2fd3ca9e92e1979ba6e6bd1e56bf7 100644 --- a/official/nlp/modeling/layers/on_device_embedding.py +++ b/official/nlp/modeling/layers/on_device_embedding.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -47,7 +47,7 @@ class OnDeviceEmbedding(tf.keras.layers.Layer): scale_factor=None, **kwargs): - super(OnDeviceEmbedding, self).__init__(**kwargs) + super().__init__(**kwargs) self._vocab_size = vocab_size self._embedding_width = embedding_width self._initializer = initializer @@ -62,7 +62,7 @@ class OnDeviceEmbedding(tf.keras.layers.Layer): "use_one_hot": self._use_one_hot, "scale_factor": self._scale_factor, } - base_config = super(OnDeviceEmbedding, self).get_config() + base_config = super().get_config() return dict(list(base_config.items()) + list(config.items())) def build(self, input_shape): @@ -72,7 +72,7 @@ class OnDeviceEmbedding(tf.keras.layers.Layer): initializer=self._initializer, dtype=tf.float32) - super(OnDeviceEmbedding, self).build(input_shape) + super().build(input_shape) def call(self, inputs): flat_inputs = tf.reshape(inputs, [-1]) diff --git a/official/nlp/modeling/layers/on_device_embedding_test.py b/official/nlp/modeling/layers/on_device_embedding_test.py index b724130a181f0666ab6f8f49e27c88f51727f8a5..373cfdb6d3dc8a366939a804c31dea7ecee7734c 100644 --- a/official/nlp/modeling/layers/on_device_embedding_test.py +++ b/official/nlp/modeling/layers/on_device_embedding_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/layers/pack_optimization.py b/official/nlp/modeling/layers/pack_optimization.py new file mode 100644 index 0000000000000000000000000000000000000000..2c35faac0f2a2b6a1fec9d5c2d99c950fca13785 --- /dev/null +++ b/official/nlp/modeling/layers/pack_optimization.py @@ -0,0 +1,250 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Pack sequence optimization on accelerators.""" +from typing import Dict +import tensorflow as tf +from official.modeling import tf_utils +from official.nlp.modeling.layers import rezero_transformer +from official.nlp.modeling.layers import self_attention_mask +from official.nlp.modeling.layers import transformer_encoder_block +from official.nlp.modeling.layers import transformer_scaffold + + +@tf.keras.utils.register_keras_serializable(package='Text') +class PackBertEmbeddings(tf.keras.layers.Layer): + """Performs packing tricks for BERT inputs to improve TPU utilization.""" + + def __init__(self, pack_sequences: int, **kwargs): + super().__init__(**kwargs) + self.pack_sequences = pack_sequences + + def call(self, input_embeddings: tf.Tensor, + input_mask: tf.Tensor) -> Dict[str, tf.Tensor]: + batch_size, seq_len, embedding_dim = tf_utils.get_shape_list( + input_embeddings, expected_rank=3) + reduced_batch_size = batch_size // self.pack_sequences + packed_seq_len = self.pack_sequences * seq_len + packed_embeddings = tf.reshape( + input_embeddings, [reduced_batch_size, packed_seq_len, embedding_dim]) + input_mask = tf.reshape(input_mask, [reduced_batch_size, packed_seq_len]) + example_ids = 1 + tf.range(self.pack_sequences) + # Shape: [reduced_batch_size, pack_sequences, seq_len].
+ example_ids = tf.tile(example_ids[None, :, None], + [reduced_batch_size, 1, seq_len]) + example_ids = tf.reshape(example_ids, [reduced_batch_size, packed_seq_len]) + example_ids = tf.where( + tf.math.equal(input_mask, 0), tf.zeros_like(example_ids), example_ids) + packing_mask = tf.cast( + tf.equal( + tf.expand_dims(example_ids, 2), tf.expand_dims(example_ids, 1)), + dtype=tf.bool) + + attention_mask = self_attention_mask.get_mask( + packed_embeddings, input_mask, dtype=tf.bool) + + combined_attention_mask = tf.cast( + tf.math.logical_and(attention_mask, packing_mask), tf.float32) + + return dict( + packed_embeddings=packed_embeddings, + combined_attention_mask=combined_attention_mask) + + +@tf.keras.utils.register_keras_serializable(package='Text') +class StridedTransformerEncoderBlock( + transformer_encoder_block.TransformerEncoderBlock): + """Transformer layer for packing optimization to stride over inputs.""" + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + if self._output_range is not None: + raise ValueError('StridedTransformerEncoderBlock does not ' + 'support `output_range` argument.') + + def call(self, inputs, stride: tf.Tensor): + if isinstance(inputs, (list, tuple)): + if len(inputs) == 2: + input_tensor, attention_mask = inputs + key_value = None + elif len(inputs) == 3: + input_tensor, key_value, attention_mask = inputs + else: + raise ValueError('Unexpected inputs to %s with length at %d' % + (self.__class__, len(inputs))) + else: + input_tensor, key_value, attention_mask = (inputs, None, None) + + if self._norm_first: + source_tensor = input_tensor[:, ::stride, :] + input_tensor = self._attention_layer_norm(input_tensor) + if key_value is not None: + key_value = self._attention_layer_norm_kv(key_value) + target_tensor = input_tensor[:, ::stride, :] + if attention_mask is not None: + attention_mask = attention_mask[:, ::stride, :] + + if key_value is None: + key_value = input_tensor + attention_output = 
self._attention_layer( + query=target_tensor, value=key_value, attention_mask=attention_mask) + attention_output = self._attention_dropout(attention_output) + + if self._norm_first: + # Important to not combine `self._norm_first` and + # `self._use_query_residual` into one if clause because else is only for + # `_norm_first == False`. + if self._use_query_residual: + attention_output = source_tensor + attention_output + else: + if self._use_query_residual: + attention_output = target_tensor + attention_output + attention_output = self._attention_layer_norm(attention_output) + + if self._norm_first: + source_attention_output = attention_output + attention_output = self._output_layer_norm(attention_output) + inner_output = self._intermediate_dense(attention_output) + inner_output = self._intermediate_activation_layer(inner_output) + inner_output = self._inner_dropout_layer(inner_output) + layer_output = self._output_dense(inner_output) + layer_output = self._output_dropout(layer_output) + + if self._norm_first: + return source_attention_output + layer_output + + layer_output = tf.cast(layer_output, tf.float32) + return self._output_layer_norm(layer_output + attention_output) + + +@tf.keras.utils.register_keras_serializable(package='Text') +class StridedReZeroTransformer(rezero_transformer.ReZeroTransformer): + """ReZeroTransformer for packing optimization to stride over inputs.""" + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + if self._output_range is not None: + raise ValueError(f'{self.__class__} does not ' + 'support `output_range` argument.') + + def call(self, inputs, stride: tf.Tensor): + if isinstance(inputs, (list, tuple)): + if len(inputs) == 2: + input_tensor, attention_mask = inputs + key_value = None + elif len(inputs) == 3: + input_tensor, key_value, attention_mask = inputs + else: + raise ValueError(f'Unexpected inputs to {self.__class__} with ' + f'length at {len(inputs)}.') + else: + input_tensor, key_value, 
attention_mask = (inputs, None, None) + + target_tensor = input_tensor[:, ::stride, :] + if attention_mask is not None: + attention_mask = attention_mask[:, ::stride, :] + + if key_value is None: + key_value = input_tensor + + attention_output = self._attention_layer( + query=target_tensor, value=key_value, attention_mask=attention_mask) + attention_output = self._attention_dropout(attention_output) + attention_output = target_tensor + self._rezero_a * attention_output + if self._use_layer_norm: + attention_output = self._attention_layer_norm(attention_output) + else: + attention_output = tf.cast(attention_output, tf.float32) + + intermediate_output = self._intermediate_dense(attention_output) + intermediate_output = self._inner_activation_layer(intermediate_output) + layer_output = self._output_dense(intermediate_output) + layer_output = self._output_dropout(layer_output) + layer_output = attention_output + tf.cast(self._rezero_a_ffn * layer_output, + tf.float32) + if self._use_layer_norm: + layer_output = self._output_layer_norm(layer_output) + + return layer_output + + +@tf.keras.utils.register_keras_serializable(package='Text') +class StridedTransformerScaffold(transformer_scaffold.TransformerScaffold): + """TransformerScaffold for packing optimization to stride over inputs.""" + + def call(self, inputs, stride: tf.Tensor, training=None): + if isinstance(inputs, (list, tuple)): + if len(inputs) == 2: + input_tensor, attention_mask = inputs + key_value = None + elif len(inputs) == 3: + input_tensor, key_value, attention_mask = inputs + else: + raise ValueError('Unexpected inputs to %s with length at %d' % + (self.__class__, len(inputs))) + else: + input_tensor, key_value, attention_mask = (inputs, None, None) + + if key_value is None: + key_value = input_tensor + + if self._norm_first: + source_tensor = input_tensor[:, ::stride, :] + input_tensor = self._attention_layer_norm(input_tensor, training=training) + if attention_mask is not None: + attention_mask = 
attention_mask[:, ::stride, :] + target_tensor = input_tensor[:, ::stride, :] + + attention_output = self._attention_layer( + query=target_tensor, + value=key_value, + attention_mask=attention_mask, + training=training) + attention_output = self._attention_dropout( + attention_output, training=training) + + if self._norm_first: + attention_output = source_tensor + attention_output + else: + attention_output = self._attention_layer_norm( + target_tensor + attention_output, training=training) + if self._norm_first: + source_attention_output = attention_output + attention_output = self._output_layer_norm( + attention_output, training=training) + + if self._feedforward_block is None: + intermediate_output = self._intermediate_dense(attention_output) + intermediate_output = self._intermediate_activation_layer( + intermediate_output) + layer_output = self._output_dense(intermediate_output, training=training) + layer_output = self._output_dropout(layer_output, training=training) + layer_output = tf.cast(layer_output, tf.float32) + if self._norm_first: + layer_output = source_attention_output + layer_output + else: + layer_output = self._output_layer_norm( + layer_output + attention_output, training=training) + else: + if self._norm_first: + # if norm_first, assume the feedforward block will not apply layer norm + layer_output = self._feedforward_block( + attention_output, training=training) + layer_output += source_attention_output + else: + # if not norm_first, assume that the feedforward block does apply layer norm + layer_output = self._feedforward_block( + attention_output, training=training) + + return layer_output diff --git a/official/nlp/modeling/layers/pack_optimization_test.py b/official/nlp/modeling/layers/pack_optimization_test.py new file mode 100644 index 0000000000000000000000000000000000000000..c475c4c33621d34a784905f3ada023380c9fead1 --- /dev/null +++ b/official/nlp/modeling/layers/pack_optimization_test.py @@ -0,0 
+1,66 @@ +# Copyright 2022 The TensorFlow 
Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for pack_optimization.""" + +import tensorflow as tf +from official.nlp.modeling.layers import pack_optimization + + +class PackOptimizationTest(tf.test.TestCase): + + def test_bert_embedding_packing(self): + batch_size, seq_len, embed_dim = 2, 4, 8 + pack_sequences = 2 + token_and_position_embed = tf.ones((batch_size, seq_len, embed_dim), + dtype=tf.float32) + input_mask = tf.ones((batch_size, seq_len), dtype=tf.int32) + + layer = pack_optimization.PackBertEmbeddings(pack_sequences=pack_sequences) + outputs = layer(token_and_position_embed, input_mask) + self.assertEqual(outputs["packed_embeddings"].shape, (1, 8, embed_dim)) + self.assertEqual(outputs["combined_attention_mask"].shape, (1, 8, 8)) + + def test_strided_transformer_encoder_block(self): + inputs = tf.zeros((2, 4, 8), dtype=tf.float32) + attention_mask = tf.ones((2, 4, 4), dtype=tf.float32) + transformer = pack_optimization.StridedTransformerEncoderBlock( + num_attention_heads=2, inner_dim=4, inner_activation="relu") + outputs = transformer([inputs, attention_mask], + stride=tf.constant(2, dtype=tf.int32)) + self.assertEqual(outputs.shape, (2, 2, 8)) + + def test_strided_rezero_transformer(self): + inputs = tf.zeros((2, 4, 8), dtype=tf.float32) + attention_mask = tf.ones((2, 4, 4), dtype=tf.float32) + transformer = pack_optimization.StridedReZeroTransformer( + num_attention_heads=2, inner_dim=4, 
inner_activation="relu") + outputs = transformer([inputs, attention_mask], + stride=tf.constant(2, dtype=tf.int32)) + self.assertEqual(outputs.shape, (2, 2, 8)) + + def test_strided_scaffold(self): + inputs = tf.zeros((2, 4, 8), dtype=tf.float32) + attention_mask = tf.ones((2, 4, 4), dtype=tf.float32) + test_layer = pack_optimization.StridedTransformerScaffold( + num_attention_heads=2, + inner_dim=128, + inner_activation="relu") + outputs = test_layer([inputs, attention_mask], + stride=tf.constant(2, dtype=tf.int32)) + self.assertEqual(outputs.shape, (2, 2, 8)) + + +if __name__ == "__main__": + tf.test.main() diff --git a/official/nlp/modeling/layers/per_dim_scale_attention.py b/official/nlp/modeling/layers/per_dim_scale_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..0930b8d65a3d8f688777f6b48b9614977e8a5ba3 --- /dev/null +++ b/official/nlp/modeling/layers/per_dim_scale_attention.py @@ -0,0 +1,101 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Keras-based attention layer with learnable per dim scaling.""" +import gin +import numpy as np +import tensorflow as tf + + +@gin.configurable +@tf.keras.utils.register_keras_serializable(package='Text') +class PerDimScaleAttention(tf.keras.layers.MultiHeadAttention): + """Learn scales for individual dims. + + It can improve quality but might hurt training stability. 
+ """ + + def _build_from_signature(self, query, value, key=None): + super()._build_from_signature(query=query, value=value, key=key) # pytype: disable=attribute-error + self._scale_dim = self._key_dim + with tf.init_scope(): + self.per_dim_scale = self.add_weight( + name='per_dim_scale', + shape=(self._scale_dim,), + initializer='zeros', + dtype=self.dtype, + trainable=True) + + def _scale_query(self, query): + # 1.0/tf.nn.softplus(0.0) = 1.442695041. Hard code this number so that we + # can avoid unnecessary XLA op fusion mess on TPU. + r_softplus_0 = 1.442695041 + scale = tf.constant( + r_softplus_0 / np.sqrt(float(self._scale_dim)), dtype=query.dtype) + + scale *= tf.nn.softplus(self.per_dim_scale) + return query * scale + + def _compute_attention(self, + query, + key, + value, + attention_mask=None, + training=None): + query = self._scale_query(query) + + attention_scores = tf.einsum(self._dot_product_equation, key, query) + + attention_scores = self._masked_softmax(attention_scores, attention_mask) + + attention_scores_dropout = self._dropout_layer( + attention_scores, training=training) + + # `context_layer` = [B, T, N, H] + attention_output = tf.einsum(self._combine_equation, + attention_scores_dropout, value) + return attention_output, attention_scores + + def call( + self, + query, + value, + key=None, + attention_mask=None, + return_attention_scores=False, + training=None, + ): + if not self._built_from_signature: + self._build_from_signature(query=query, value=value, key=key) + if key is None: + key = value + + # N = `num_attention_heads` + # H = `size_per_head` + # `query` = [B, T, N, H] + query = self._query_dense(query) + + # `key` = [B, S, N, H] + key = self._key_dense(key) + + # `value` = [B, S, N, H] + value = self._value_dense(value) + + attention_output, attention_scores = self._compute_attention( + query, key, value, attention_mask, training) + attention_output = self._output_dense(attention_output) + + if 
return_attention_scores: + return 
attention_output, attention_scores + return attention_output diff --git a/official/nlp/modeling/layers/per_dim_scale_attention_test.py b/official/nlp/modeling/layers/per_dim_scale_attention_test.py new file mode 100644 index 0000000000000000000000000000000000000000..a5e61ab128fea3b1b0323352c120db32d09b9119 --- /dev/null +++ b/official/nlp/modeling/layers/per_dim_scale_attention_test.py @@ -0,0 +1,52 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for PerDimScaleAttention.""" + +import tensorflow as tf + +from official.nlp.modeling.layers import per_dim_scale_attention as attention + + +class PerDimScaleAttentionTest(tf.test.TestCase): + + def test_attention(self): + num_heads = 12 + key_dim = 64 + seq_length = 1024 + batch_size = 2 + test_layer = attention.PerDimScaleAttention( + num_heads=num_heads, key_dim=key_dim) + query = tf.random.normal( + shape=(batch_size, seq_length, key_dim * num_heads)) + value = query + output = test_layer(query=query, value=value) + self.assertEqual(output.shape, + [batch_size, seq_length, key_dim * num_heads]) + + def test_config(self): + num_heads = 12 + key_dim = 64 + test_layer = attention.PerDimScaleAttention( + num_heads=num_heads, key_dim=key_dim) + print(test_layer.get_config()) + new_layer = attention.PerDimScaleAttention.from_config( + test_layer.get_config()) + + # If the serialization was successful, the new config should match the old. 
+ self.assertAllEqual(test_layer.get_config(), new_layer.get_config()) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/nlp/modeling/layers/position_embedding.py b/official/nlp/modeling/layers/position_embedding.py index 8e2744b791d007bd762ca20c51f1498da6864a1e..8f27460d9e4fbc3e78fbb1d2479b1da1adb25deb 100644 --- a/official/nlp/modeling/layers/position_embedding.py +++ b/official/nlp/modeling/layers/position_embedding.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -53,7 +53,7 @@ class PositionEmbedding(tf.keras.layers.Layer): seq_axis=1, **kwargs): - super(PositionEmbedding, self).__init__(**kwargs) + super().__init__(**kwargs) if max_length is None: raise ValueError( "`max_length` must be an Integer, not `None`." @@ -81,7 +81,7 @@ class PositionEmbedding(tf.keras.layers.Layer): shape=[weight_sequence_length, width], initializer=self._initializer) - super(PositionEmbedding, self).build(input_shape) + super().build(input_shape) def call(self, inputs): input_shape = tf.shape(inputs) diff --git a/official/nlp/modeling/layers/position_embedding_test.py b/official/nlp/modeling/layers/position_embedding_test.py index 6593d428e6ef17c6ac99fb7bca28c2b71a67e6a1..f9f170854212e89b23fa7e3f598ab744b7b36d91 100644 --- a/official/nlp/modeling/layers/position_embedding_test.py +++ b/official/nlp/modeling/layers/position_embedding_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
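The constant `1.442695041` hard-coded in `PerDimScaleAttention._scale_query` above is `1 / softplus(0) = 1 / ln 2`. The minimal NumPy sketch below (an illustration, not the library's code: `softplus` and `scale_query` are hypothetical helpers, NumPy stands in for `tf.nn.softplus`, and the query is assumed to be already projected to shape `[B, T, N, H]`) shows why this choice matters: with the zero-initialized `per_dim_scale` weight, the learned scaling starts out identical to the usual `1/sqrt(H)` factor of scaled dot-product attention.

```python
import numpy as np


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


# 1 / softplus(0) = 1 / ln(2); the layer hard-codes this as 1.442695041.
R_SOFTPLUS_0 = 1.0 / softplus(0.0)


def scale_query(query, per_dim_scale):
    """Hypothetical helper: scales the last axis of `query` ([B, T, N, H]).

    `per_dim_scale` has shape [H], one learnable entry per head dimension.
    """
    head_dim = per_dim_scale.shape[-1]
    base = R_SOFTPLUS_0 / np.sqrt(float(head_dim))
    # The [H]-shaped factor broadcasts over the leading [B, T, N] axes.
    return query * (base * softplus(per_dim_scale))


# With the zero initialization used for `per_dim_scale`, the scaling reduces
# to the standard 1/sqrt(H) factor.
q = np.random.default_rng(0).normal(size=(2, 3, 4, 64))
print(np.allclose(scale_query(q, np.zeros(64)), q / np.sqrt(64.0)))  # True
```

Because softplus is positive and monotonic, training a positive entry of `per_dim_scale` raises that dimension's contribution to the attention logits, a negative entry lowers it, and the scale never changes sign.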
diff --git a/official/nlp/modeling/layers/relative_attention.py b/official/nlp/modeling/layers/relative_attention.py index be18c9d1eb0bdedab8b7bd07964b5aefadcfbe61..ffa1369796aab7cec23c9269f2c0f942ba16849e 100644 --- a/official/nlp/modeling/layers/relative_attention.py +++ b/official/nlp/modeling/layers/relative_attention.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -98,14 +98,14 @@ class MultiHeadRelativeAttention(tf.keras.layers.MultiHeadAttention): `[B, L, dim]`. segment_matrix: Optional `Tensor` representing segmentation IDs used in XLNet of shape `[B, S, S + M]`. - segment_encoding: Optional `Tensor` representing the segmentation - encoding as used in XLNet of shape `[2, num_heads, dim]`. - segment_attention_bias: Optional trainable bias parameter added to the - query had when calculating the segment-based attention score used in - XLNet of shape `[num_heads, dim]`. + segment_encoding: Optional `Tensor` representing the segmentation encoding + as used in XLNet of shape `[2, num_heads, dim]`. + segment_attention_bias: Optional trainable bias parameter added to the query + head when calculating the segment-based attention score used in XLNet of + shape `[num_heads, dim]`. state: Optional `Tensor` of shape `[B, M, E]` where M is the length of the - state or memory. - If passed, this is also attended over as in Transformer XL. + state or memory. If passed, this is also attended over as in Transformer + XL. attention_mask: A boolean mask of shape `[B, T, S]` that prevents attention to certain positions.
""" @@ -144,7 +144,7 @@ class MultiHeadRelativeAttention(tf.keras.layers.MultiHeadAttention): with tf.init_scope(): einsum_equation, _, output_rank = _build_proj_equation( key_shape.rank - 1, bound_dims=1, output_dims=2) - self._encoding_dense = tf.keras.layers.experimental.EinsumDense( + self._encoding_dense = tf.keras.layers.EinsumDense( einsum_equation, output_shape=_get_output_shape(output_rank - 1, [self._num_heads, self._key_dim]), @@ -255,8 +255,8 @@ class MultiHeadRelativeAttention(tf.keras.layers.MultiHeadAttention): Args: query: attention input. value: attention input. - content_attention_bias: A trainable bias parameter added to the query - head when calculating the content-based attention score. + content_attention_bias: A trainable bias parameter added to the query head + when calculating the content-based attention score. positional_attention_bias: A trainable bias parameter added to the query head when calculating the position-based attention score. key: attention input. @@ -264,8 +264,8 @@ class MultiHeadRelativeAttention(tf.keras.layers.MultiHeadAttention): value. segment_matrix: Optional `Tensor` representing segmentation IDs used in XLNet. - segment_encoding: Optional `Tensor` representing the segmentation - encoding as used in XLNet. + segment_encoding: Optional `Tensor` representing the segmentation encoding + as used in XLNet. segment_attention_bias: Optional trainable bias parameter added to the query had when calculating the segment-based attention score used in XLNet. @@ -394,22 +394,22 @@ class TwoStreamRelativeAttention(MultiHeadRelativeAttention): content_stream: The content representation, commonly referred to as h. This serves a similar role to the standard hidden states in Transformer-XL. - content_attention_bias: A trainable bias parameter added to the query - head when calculating the content-based attention score. 
+ content_attention_bias: A trainable bias parameter added to the query head + when calculating the content-based attention score. positional_attention_bias: A trainable bias parameter added to the query head when calculating the position-based attention score. - query_stream: The query representation, commonly referred to as g. - This only has access to contextual information and position, but not - content. If not provided, then this is MultiHeadRelativeAttention with + query_stream: The query representation, commonly referred to as g. This + only has access to contextual information and position, but not content. + If not provided, then this is MultiHeadRelativeAttention with self-attention. relative_position_encoding: relative positional encoding for key and value. - target_mapping: Optional `Tensor` representing the target mapping used - in partial prediction. + target_mapping: Optional `Tensor` representing the target mapping used in + partial prediction. segment_matrix: Optional `Tensor` representing segmentation IDs used in XLNet. - segment_encoding: Optional `Tensor` representing the segmentation - encoding as used in XLNet. + segment_encoding: Optional `Tensor` representing the segmentation encoding + as used in XLNet. segment_attention_bias: Optional trainable bias parameter added to the query head when calculating the segment-based attention score. state: (default None) optional state. If passed, this is also attended @@ -417,8 +417,8 @@ class TwoStreamRelativeAttention(MultiHeadRelativeAttention): content_attention_mask: (default None) Optional mask that is added to content attention logits. If state is not None, the mask source sequence dimension should extend M. - query_attention_mask: (default None) Optional mask that is added to - query attention logits. If state is not None, the mask source sequence + query_attention_mask: (default None) Optional mask that is added to query + attention logits. 
If state is not None, the mask source sequence dimension should extend M. Returns: @@ -496,4 +496,3 @@ class TwoStreamRelativeAttention(MultiHeadRelativeAttention): query_attention_output = self._output_dense(query_attention_output) return content_attention_output, query_attention_output - diff --git a/official/nlp/modeling/layers/relative_attention_test.py b/official/nlp/modeling/layers/relative_attention_test.py index b092bc6740c187e482ae6ebe4917f81ed67c40c3..d07093f72af1f63e680466b67b7119467866de41 100644 --- a/official/nlp/modeling/layers/relative_attention_test.py +++ b/official/nlp/modeling/layers/relative_attention_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/layers/reuse_attention.py b/official/nlp/modeling/layers/reuse_attention.py index 6e36a7154366cb33c3a3ff1251bae2962c75d05e..75778cdc9ea8ed79f87c9cbcdd77ebbb11e02f10 100644 --- a/official/nlp/modeling/layers/reuse_attention.py +++ b/official/nlp/modeling/layers/reuse_attention.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -22,6 +22,8 @@ import string import numpy as np import tensorflow as tf +from official.modeling import tf_utils + _CHR_IDX = string.ascii_lowercase @@ -221,7 +223,7 @@ class ReuseMultiHeadAttention(tf.keras.layers.Layer): kernel_constraint=None, bias_constraint=None, **kwargs): - super(ReuseMultiHeadAttention, self).__init__(**kwargs) + super().__init__(**kwargs) self._num_heads = num_heads self._key_dim = key_dim self._value_dim = value_dim if value_dim else key_dim @@ -299,7 +301,7 @@ class ReuseMultiHeadAttention(tf.keras.layers.Layer): "key_shape": self._key_shape, "value_shape": self._value_shape, } - base_config = super(ReuseMultiHeadAttention, self).get_config() + base_config = super().get_config() return dict(list(base_config.items()) + list(config.items())) @classmethod @@ -347,8 +349,6 @@ class ReuseMultiHeadAttention(tf.keras.layers.Layer): self._key_shape = tf.TensorShape(key) common_kwargs = dict( - kernel_initializer=self._kernel_initializer, - bias_initializer=self._bias_initializer, kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer, activity_regularizer=self._activity_regularizer, @@ -362,42 +362,61 @@ class ReuseMultiHeadAttention(tf.keras.layers.Layer): if self._reuse_heads < self._num_heads: einsum_equation, bias_axes, output_rank = _build_proj_equation( free_dims, bound_dims=1, output_dims=2) - self._query_dense = tf.keras.layers.experimental.EinsumDense( + self._query_dense = tf.keras.layers.EinsumDense( einsum_equation, - output_shape=_get_output_shape(output_rank - 1, [ - self._num_heads - self._reuse_heads, self._key_dim]), + output_shape=_get_output_shape( + output_rank - 1, + [self._num_heads - self._reuse_heads, self._key_dim]), bias_axes=bias_axes if self._use_bias else None, name="query", + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), **common_kwargs) einsum_equation, bias_axes, output_rank = 
_build_proj_equation( self._key_shape.rank - 1, bound_dims=1, output_dims=2) - self._key_dense = tf.keras.layers.experimental.EinsumDense( + self._key_dense = tf.keras.layers.EinsumDense( einsum_equation, - output_shape=_get_output_shape(output_rank - 1, [ - self._num_heads - self._reuse_heads, self._key_dim]), + output_shape=_get_output_shape( + output_rank - 1, + [self._num_heads - self._reuse_heads, self._key_dim]), bias_axes=bias_axes if self._use_bias else None, name="key", + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), **common_kwargs) einsum_equation, bias_axes, output_rank = _build_proj_equation( self._value_shape.rank - 1, bound_dims=1, output_dims=2) self._value_dense = [] if self._reuse_heads > 0: - self._value_dense.append(tf.keras.layers.experimental.EinsumDense( - einsum_equation, - output_shape=_get_output_shape( - output_rank - 1, [self._reuse_heads, self._value_dim]), - bias_axes=bias_axes if self._use_bias else None, - name="value_reuse", - **common_kwargs)) + self._value_dense.append( + tf.keras.layers.EinsumDense( + einsum_equation, + output_shape=_get_output_shape( + output_rank - 1, [self._reuse_heads, self._value_dim]), + bias_axes=bias_axes if self._use_bias else None, + name="value_reuse", + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer( + self._bias_initializer), + **common_kwargs)) if self._reuse_heads < self._num_heads: - self._value_dense.append(tf.keras.layers.experimental.EinsumDense( - einsum_equation, - output_shape=_get_output_shape(output_rank - 1, [ - self._num_heads - self._reuse_heads, self._value_dim]), - bias_axes=bias_axes if self._use_bias else None, - name="value_new", - **common_kwargs)) + self._value_dense.append( + tf.keras.layers.EinsumDense( + einsum_equation, + output_shape=_get_output_shape( + output_rank - 1, + [self._num_heads - 
self._reuse_heads, self._value_dim]), + bias_axes=bias_axes if self._use_bias else None, + name="value_new", + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer( + self._bias_initializer), + **common_kwargs)) # Builds the attention computations for multi-head dot product attention. # These computations could be wrapped into the keras attention layer once @@ -434,18 +453,20 @@ class ReuseMultiHeadAttention(tf.keras.layers.Layer): output_shape = [self._query_shape[-1]] einsum_equation, bias_axes, output_rank = _build_proj_equation( free_dims, bound_dims=2, output_dims=len(output_shape)) - return tf.keras.layers.experimental.EinsumDense( + return tf.keras.layers.EinsumDense( einsum_equation, output_shape=_get_output_shape(output_rank - 1, output_shape), bias_axes=bias_axes if (use_bias and self._use_bias) else None, name=name, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), **common_kwargs) def _build_attention(self, rank): """Builds multi-head dot-product attention computations. This function builds attributes necessary for `_compute_attention` to - costomize attention computation to replace the default dot-product + customize attention computation to replace the default dot-product attention. Args: diff --git a/official/nlp/modeling/layers/reuse_attention_test.py b/official/nlp/modeling/layers/reuse_attention_test.py index 0da8cf5e31742366f50c0e1370d8de1a638aa9cb..fe9e71d2f06c09f3467c14a9a13f7e87f8e8a236 100644 --- a/official/nlp/modeling/layers/reuse_attention_test.py +++ b/official/nlp/modeling/layers/reuse_attention_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
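The `reuse_attention.py` changes above stop passing the same `self._kernel_initializer` and `self._bias_initializer` objects to every `EinsumDense`. Instead, each sublayer now receives its own `tf_utils.clone_initializer(...)` copy. One plausible motivation, stated here as an assumption, is that newer Keras initializers may be stateless. If so, calling one instance several times can return identical values, and sublayers that share an instance would all start from the same weights.

The pure-Python sketch below models that behaviour. `StatelessUniform` and `clone_initializer` are hypothetical stand-ins written for illustration. They are not the Keras or Model Garden API, although rebuilding a copy through `from_config(get_config())` is assumed to mirror what the real helper does.

```python
import random


class StatelessUniform:
    """Hypothetical stand-in for a stateless initializer (not the Keras API).

    The seed is resolved once at construction (drawn at random when None), so
    every call on the same instance returns identical values.
    """

    def __init__(self, minval=-0.05, maxval=0.05, seed=None):
        self.minval = minval
        self.maxval = maxval
        self.seed = seed
        self._resolved_seed = (
            seed if seed is not None else random.randrange(2**31))

    def get_config(self):
        return {"minval": self.minval, "maxval": self.maxval, "seed": self.seed}

    @classmethod
    def from_config(cls, config):
        return cls(**config)

    def __call__(self, size):
        # A fresh RNG seeded from the same value yields the same numbers each call.
        rng = random.Random(self._resolved_seed)
        return [rng.uniform(self.minval, self.maxval) for _ in range(size)]


def clone_initializer(initializer):
    """Hypothetical helper: rebuild a fresh instance from the initializer config."""
    return initializer.__class__.from_config(initializer.get_config())


shared = StatelessUniform()
# Reusing one instance for two sublayers gives them identical kernels.
kernel_a = shared(4)
kernel_b = shared(4)
# Cloning gives each sublayer its own instance and, when unseeded, its own seed.
kernel_c = clone_initializer(shared)(4)
```

With an explicit seed, a clone still reproduces the same values. This behaviour is desirable, because seeded runs stay deterministic.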
diff --git a/official/nlp/modeling/layers/reuse_transformer.py b/official/nlp/modeling/layers/reuse_transformer.py index 38736ea4242148ed08513acdae9c55e358187930..79f304bc8a09c35d989f4be78d73306c350998e1 100644 --- a/official/nlp/modeling/layers/reuse_transformer.py +++ b/official/nlp/modeling/layers/reuse_transformer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -14,6 +14,8 @@ """Keras-based TransformerEncoder block layer.""" import tensorflow as tf + +from official.modeling import tf_utils from official.nlp.modeling.layers import reuse_attention as attention @@ -131,7 +133,8 @@ class ReuseTransformer(tf.keras.layers.Layer): self._attention_initializer = tf.keras.initializers.get( attention_initializer) else: - self._attention_initializer = self._kernel_initializer + self._attention_initializer = tf_utils.clone_initializer( + self._kernel_initializer) self._attention_axes = attention_axes def build(self, input_shape): @@ -156,7 +159,6 @@ class ReuseTransformer(tf.keras.layers.Layer): else: self._attention_head_size = self._head_size common_kwargs = dict( - bias_initializer=self._bias_initializer, kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer, activity_regularizer=self._activity_regularizer, @@ -168,6 +170,7 @@ class ReuseTransformer(tf.keras.layers.Layer): dropout=self._attention_dropout, use_bias=self._use_bias, kernel_initializer=self._attention_initializer, + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), attention_axes=self._attention_axes, reuse_attention=self._reuse_attention, use_relative_pe=self._use_relative_pe, @@ -184,11 +187,12 @@ class ReuseTransformer(tf.keras.layers.Layer): axis=-1, epsilon=self._norm_epsilon, dtype=tf.float32)) - self._intermediate_dense = 
tf.keras.layers.experimental.EinsumDense( + self._intermediate_dense = tf.keras.layers.EinsumDense( einsum_equation, output_shape=(None, self._inner_dim), bias_axes="d", - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), name="intermediate", **common_kwargs) policy = tf.keras.mixed_precision.global_policy() @@ -201,12 +205,13 @@ class ReuseTransformer(tf.keras.layers.Layer): self._inner_activation, dtype=policy) self._inner_dropout_layer = tf.keras.layers.Dropout( rate=self._inner_dropout) - self._output_dense = tf.keras.layers.experimental.EinsumDense( + self._output_dense = tf.keras.layers.EinsumDense( einsum_equation, output_shape=(None, hidden_size), bias_axes="d", name="output", - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), **common_kwargs) self._output_dropout = tf.keras.layers.Dropout(rate=self._output_dropout) # Use float32 in layernorm for numeric stability. diff --git a/official/nlp/modeling/layers/reuse_transformer_test.py b/official/nlp/modeling/layers/reuse_transformer_test.py index 40526ecb3c76ca5abbcc432b94abbc10c2852199..0376906e909e736528938f4ec1e0fc554ac30f2b 100644 --- a/official/nlp/modeling/layers/reuse_transformer_test.py +++ b/official/nlp/modeling/layers/reuse_transformer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -68,7 +68,7 @@ class ReuseTransformerLayerTest(tf.test.TestCase, parameterized.TestCase): # Invoke the model on test data. 
We can't validate the output data itself # (the NN is too complex) but this will rule out structural runtime errors. batch_size = 6 - input_data = 10 * np.random.random_sample( + input_data = np.random.random_sample( (batch_size, sequence_length, width)) _ = model.predict(input_data) @@ -89,7 +89,7 @@ class ReuseTransformerLayerTest(tf.test.TestCase, parameterized.TestCase): # Invoke the model on test data. We can't validate the output data itself # (the NN is too complex) but this will rule out structural runtime errors. batch_size = 6 - input_data = 10 * np.random.random_sample( + input_data = np.random.random_sample( (batch_size, sequence_length, width)) # The attention mask should be of shape (batch, from_seq_len, to_seq_len), # which here is (batch, sequence_length, sequence_length) @@ -104,7 +104,7 @@ class ReuseTransformerLayerTest(tf.test.TestCase, parameterized.TestCase): width = 80 batch_size = 6 - input_data = 10 * np.random.random_sample( + input_data = np.random.random_sample( (batch_size, sequence_length, width)) mask_data = np.random.randint( 2, size=(batch_size, sequence_length, sequence_length)) @@ -121,7 +121,7 @@ class ReuseTransformerLayerTest(tf.test.TestCase, parameterized.TestCase): new_layer.set_weights(test_layer.get_weights()) new_output_tensor, _ = new_layer([input_data, mask_data]) self.assertAllClose( - new_output_tensor, output_tensor[:, 0:1, :], atol=0.002, rtol=0.25) + new_output_tensor, output_tensor[:, 0:1, :], atol=0.002, rtol=0.01) def test_layer_output_range_with_relative_pe(self, transformer_cls): test_layer = transformer_cls( @@ -131,7 +131,7 @@ class ReuseTransformerLayerTest(tf.test.TestCase, parameterized.TestCase): width = 80 batch_size = 6 - input_data = 10 * np.random.random_sample( + input_data = np.random.random_sample( (batch_size, sequence_length, width)) mask_data = np.random.randint( 2, size=(batch_size, sequence_length, sequence_length)) @@ -149,7 +149,7 @@ class ReuseTransformerLayerTest(tf.test.TestCase, 
parameterized.TestCase): new_layer.set_weights(test_layer.get_weights()) new_output_tensor, _ = new_layer([input_data, mask_data]) self.assertAllClose( - new_output_tensor, output_tensor[:, 0:1, :], atol=5e-5, rtol=0.003) + new_output_tensor, output_tensor[:, 0:1, :], atol=0.002, rtol=0.01) def test_layer_output_range_without_mask(self, transformer_cls): test_layer = transformer_cls( @@ -159,7 +159,7 @@ class ReuseTransformerLayerTest(tf.test.TestCase, parameterized.TestCase): width = 80 batch_size = 6 - input_data = 10 * np.random.random_sample( + input_data = np.random.random_sample( (batch_size, sequence_length, width)) output_tensor, _ = test_layer(input_data) @@ -175,7 +175,7 @@ class ReuseTransformerLayerTest(tf.test.TestCase, parameterized.TestCase): new_layer.set_weights(test_layer.get_weights()) new_output_tensor, _ = new_layer(input_data) self.assertAllClose( - new_output_tensor, output_tensor[:, 0:1, :], atol=5e-5, rtol=0.003) + new_output_tensor, output_tensor[:, 0:1, :], atol=0.002, rtol=0.01) def test_layer_output_range_with_pre_norm(self, transformer_cls): test_layer = transformer_cls( @@ -185,7 +185,7 @@ class ReuseTransformerLayerTest(tf.test.TestCase, parameterized.TestCase): width = 80 batch_size = 6 - input_data = 10 * np.random.random_sample( + input_data = np.random.random_sample( (batch_size, sequence_length, width)) mask_data = np.random.randint( 2, size=(batch_size, sequence_length, sequence_length)) @@ -203,7 +203,7 @@ class ReuseTransformerLayerTest(tf.test.TestCase, parameterized.TestCase): new_layer.set_weights(test_layer.get_weights()) new_output_tensor, _ = new_layer([input_data, mask_data]) self.assertAllClose( - new_output_tensor, output_tensor[:, 0:1, :], atol=5e-5, rtol=0.003) + new_output_tensor, output_tensor[:, 0:1, :], atol=0.002, rtol=0.01) def test_layer_invocation_with_float16_dtype(self, transformer_cls): tf.keras.mixed_precision.set_global_policy('mixed_float16') @@ -223,7 +223,7 @@ class 
ReuseTransformerLayerTest(tf.test.TestCase, parameterized.TestCase): # Invoke the model on test data. We can't validate the output data itself # (the NN is too complex) but this will rule out structural runtime errors. batch_size = 6 - input_data = (10 * np.random.random_sample( + input_data = (np.random.random_sample( (batch_size, sequence_length, width))) # The attention mask should be of shape (batch, from_seq_len, to_seq_len), # which here is (batch, sequence_length, sequence_length) @@ -368,7 +368,7 @@ class ReuseTransformerArgumentTest(tf.test.TestCase, parameterized.TestCase): # Invoke the model on test data. We can't validate the output data itself # (the NN is too complex) but this will rule out structural runtime errors. batch_size = 6 - input_data = 10 * np.random.random_sample( + input_data = np.random.random_sample( (batch_size, sequence_length, width)) # The attention mask should be of shape (batch, from_seq_len, to_seq_len), # which here is (batch, sequence_length, sequence_length) @@ -404,7 +404,7 @@ class ReuseTransformerArgumentTest(tf.test.TestCase, parameterized.TestCase): # Invoke the model on test data. We can't validate the output data itself # (the NN is too complex) but this will rule out structural runtime errors. batch_size = 6 - input_data = (10 * np.random.random_sample( + input_data = (np.random.random_sample( (batch_size, sequence_length, width))) # The attention mask should be of shape (batch, from_seq_len, to_seq_len), # which here is (batch, sequence_length, sequence_length) diff --git a/official/nlp/modeling/layers/rezero_transformer.py b/official/nlp/modeling/layers/rezero_transformer.py index 6a9fb1a66459d81a9c5a8c0b8d305c1c917aa84b..6796345c60da38457015bd6454bb7ba4918d390e 100644 --- a/official/nlp/modeling/layers/rezero_transformer.py +++ b/official/nlp/modeling/layers/rezero_transformer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -14,10 +14,13 @@ """Keras-based rezero-transformer block layer (Transformer with ReZero).""" # pylint: disable=g-classes-have-attributes +from typing import Optional +from absl import logging import gin import tensorflow as tf +from official.modeling import tf_utils from official.nlp.modeling.layers import util @@ -33,8 +36,10 @@ class ReZeroTransformer(tf.keras.layers.Layer): Args: num_attention_heads: Number of attention heads. - intermediate_size: Size of the intermediate layer. - intermediate_activation: Activation for the intermediate layer. + inner_dim: The output dimension of the first Dense layer in a two-layer + feedforward network. + inner_activation: The activation for the first Dense layer in a two-layer + feedforward network. dropout_rate: Dropout probability for the post-attention and output dropout. attention_dropout_rate: Dropout probability for within the attention layer. output_range: the sequence output range, [0, output_range) by slicing the @@ -52,8 +57,8 @@ class ReZeroTransformer(tf.keras.layers.Layer): def __init__(self, num_attention_heads, - intermediate_size, - intermediate_activation, + inner_dim=768, + inner_activation=tf_utils.get_activation("gelu"), dropout_rate=0.0, attention_dropout_rate=0.0, output_range=None, @@ -72,12 +77,19 @@ class ReZeroTransformer(tf.keras.layers.Layer): attention_dropout_rate = kwargs.pop("attention_dropout", attention_dropout_rate) dropout_rate = kwargs.pop("output_dropout", dropout_rate) + inner_dim = kwargs.pop("intermediate_size", inner_dim) + inner_activation = kwargs.pop("intermediate_activation", inner_activation) util.filter_kwargs(kwargs) - super(ReZeroTransformer, self).__init__(**kwargs) + super().__init__(**kwargs) + + # Deprecation warning. 
+ if output_range is not None: + logging.warning("`output_range` is available as an argument for `call()`. " + "Passing `output_range` to `__init__` is deprecated.") self._num_heads = num_attention_heads - self._intermediate_size = intermediate_size - self._intermediate_activation = intermediate_activation + self._inner_dim = inner_dim + self._inner_activation = inner_activation self._attention_dropout_rate = attention_dropout_rate self._dropout_rate = dropout_rate self._output_range = output_range @@ -121,8 +133,6 @@ class ReZeroTransformer(tf.keras.layers.Layer): "heads (%d)" % (hidden_size, self._num_heads)) self._attention_head_size = int(hidden_size // self._num_heads) common_kwargs = dict( - kernel_initializer=self._kernel_initializer, - bias_initializer=self._bias_initializer, kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer, activity_regularizer=self._activity_regularizer, @@ -133,6 +143,8 @@ class ReZeroTransformer(tf.keras.layers.Layer): key_dim=self._attention_head_size, dropout=self._attention_dropout_rate, name="self_attention", + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), **common_kwargs) self._attention_dropout = tf.keras.layers.Dropout(rate=self._dropout_rate) if self._use_layer_norm: @@ -144,11 +156,13 @@ class ReZeroTransformer(tf.keras.layers.Layer): axis=-1, epsilon=1e-12, dtype=tf.float32)) - self._intermediate_dense = tf.keras.layers.experimental.EinsumDense( + self._intermediate_dense = tf.keras.layers.EinsumDense( "abc,cd->abd", - output_shape=(None, self._intermediate_size), + output_shape=(None, self._inner_dim), bias_axes="d", name="intermediate", + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), **common_kwargs) policy = tf.keras.mixed_precision.global_policy() if policy.name == "mixed_bfloat16": @@ -156,13
+170,15 @@ class ReZeroTransformer(tf.keras.layers.Layer): # as well, so we use float32. # TODO(b/154538392): Investigate this. policy = tf.float32 - self._intermediate_activation_layer = tf.keras.layers.Activation( - self._intermediate_activation, dtype=policy) - self._output_dense = tf.keras.layers.experimental.EinsumDense( + self._inner_activation_layer = tf.keras.layers.Activation( + self._inner_activation, dtype=policy) + self._output_dense = tf.keras.layers.EinsumDense( "abc,cd->abd", output_shape=(None, hidden_size), bias_axes="d", name="output", + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), **common_kwargs) self._output_dropout = tf.keras.layers.Dropout(rate=self._dropout_rate) if self._use_layer_norm: @@ -185,16 +201,16 @@ class ReZeroTransformer(tf.keras.layers.Layer): trainable=True, dtype=tf.float32) - super(ReZeroTransformer, self).build(input_shape) + super().build(input_shape) def get_config(self): config = { "num_attention_heads": self._num_heads, - "intermediate_size": - self._intermediate_size, - "intermediate_activation": - self._intermediate_activation, + "inner_dim": + self._inner_dim, + "inner_activation": + self._inner_activation, "dropout_rate": self._dropout_rate, "attention_dropout_rate": @@ -220,7 +236,7 @@ class ReZeroTransformer(tf.keras.layers.Layer): "bias_constraint": tf.keras.constraints.serialize(self._bias_constraint), } - base_config = super(ReZeroTransformer, self).get_config() + base_config = super().get_config() return dict(list(base_config.items()) + list(config.items())) def reset_rezero(self): @@ -228,7 +244,7 @@ class ReZeroTransformer(tf.keras.layers.Layer): if not self._share_rezero: self._rezero_a_ffn.assign(0.) 
- def call(self, inputs): + def call(self, inputs, output_range: Optional[tf.Tensor] = None) -> tf.Tensor: if isinstance(inputs, (list, tuple)): if len(inputs) == 2: input_tensor, attention_mask = inputs @@ -241,10 +257,12 @@ class ReZeroTransformer(tf.keras.layers.Layer): else: input_tensor, key_value, attention_mask = (inputs, None, None) - if self._output_range: - target_tensor = input_tensor[:, 0:self._output_range, :] + if output_range is None: + output_range = self._output_range + if output_range: + target_tensor = input_tensor[:, 0:output_range, :] if attention_mask is not None: - attention_mask = attention_mask[:, 0:self._output_range, :] + attention_mask = attention_mask[:, 0:output_range, :] else: target_tensor = input_tensor @@ -261,8 +279,7 @@ class ReZeroTransformer(tf.keras.layers.Layer): attention_output = tf.cast(attention_output, tf.float32) intermediate_output = self._intermediate_dense(attention_output) - intermediate_output = self._intermediate_activation_layer( - intermediate_output) + intermediate_output = self._inner_activation_layer(intermediate_output) layer_output = self._output_dense(intermediate_output) layer_output = self._output_dropout(layer_output) # During mixed precision training, attention_output is from layer norm and diff --git a/official/nlp/modeling/layers/rezero_transformer_test.py b/official/nlp/modeling/layers/rezero_transformer_test.py index 48d680f922002718416948ed7a1b9c14a5f40737..cb949c8cbdd29a557b1c1cd4ab8f3ce6458bc674 100644 --- a/official/nlp/modeling/layers/rezero_transformer_test.py +++ b/official/nlp/modeling/layers/rezero_transformer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
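After the `ReZeroTransformer` change above, `call()` accepts an `output_range` argument. When that argument is `None`, the layer falls back to the deprecated constructor value. It then keeps only the first `output_range` positions of the target tensor and the attention mask.

The sketch below traces that precedence and slicing on nested Python lists. `select_output_range` is a hypothetical helper written for illustration and is not part of the layer's API. As in the diff, a falsy range such as `0` or `None` leaves the input unsliced.

```python
from typing import List, Optional

Matrix = List[List[float]]


def select_output_range(
    input_tensor: List[Matrix],
    attention_mask: Optional[List[Matrix]],
    init_output_range: Optional[int],
    call_output_range: Optional[int] = None,
):
    """Hypothetical helper mirroring the slicing in ReZeroTransformer.call."""
    # A call-time value wins; otherwise fall back to the __init__ argument.
    output_range = call_output_range
    if output_range is None:
        output_range = init_output_range
    if output_range:
        # Equivalent of input_tensor[:, 0:output_range, :].
        target = [seq[0:output_range] for seq in input_tensor]
        if attention_mask is not None:
            # Equivalent of attention_mask[:, 0:output_range, :].
            attention_mask = [m[0:output_range] for m in attention_mask]
    else:
        target = input_tensor
    return target, attention_mask


# Batch of 2 sequences, each with 3 positions of width 2.
x = [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]
mask = [[[1, 1, 1] for _ in range(3)] for _ in range(2)]
target, sliced_mask = select_output_range(x, mask, None, call_output_range=1)
```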
@@ -128,6 +128,9 @@ class TransformerWithReZeroLayerTest(keras_parameterized.TestCase): new_output_tensor = new_layer([input_data, mask_data]) self.assertAllClose(new_output_tensor, output_tensor[:, 0:1, :]) + output_tensor = test_layer([input_data, mask_data], output_range=1) + self.assertAllClose(new_output_tensor, output_tensor, atol=5e-5, rtol=0.003) + def test_separate_qkv(self): test_layer = rezero_transformer.ReZeroTransformer( num_attention_heads=2, diff --git a/official/nlp/modeling/layers/routing.py b/official/nlp/modeling/layers/routing.py new file mode 100644 index 0000000000000000000000000000000000000000..ce0b6875ab2051f64b8ac243654cf269db8410b6 --- /dev/null +++ b/official/nlp/modeling/layers/routing.py @@ -0,0 +1,125 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Layers for Mixture of Experts (MoE) routing. + +For MoE routing, we need to split a set of tokens into subsets. +Later on, different subsets of tokens can be routed to different experts.
+""" + +import tensorflow as tf + + +@tf.keras.utils.register_keras_serializable(package="Text") +class TokenImportanceWithMovingAvg(tf.keras.layers.Layer): + """Routing based on per-token importance value.""" + + def __init__(self, + vocab_size, + init_importance, + moving_average_beta=0.995, + **kwargs): + self._vocab_size = vocab_size + self._init_importance = init_importance + self._moving_average_beta = moving_average_beta + super().__init__(**kwargs) + + def build(self, input_shape): + self._importance_embedding = self.add_weight( + name="importance_embed", + shape=(self._vocab_size), + initializer=tf.keras.initializers.Constant(self._init_importance), + trainable=False) + + def get_config(self): + config = { + "vocab_size": + self._vocab_size, + "init_importance": + self._init_importance, + "moving_average_beta": + self._moving_average_beta, + } + base_config = super().get_config() + return dict(list(base_config.items()) + list(config.items())) + + def update_token_importance(self, token_ids, importance): + token_ids = tf.reshape(token_ids, shape=[-1]) + importance = tf.reshape(importance, shape=[-1]) + + beta = self._moving_average_beta + old_importance = tf.gather(self._importance_embedding, token_ids) + self._importance_embedding.assign(tf.tensor_scatter_nd_update( + self._importance_embedding, + tf.expand_dims(token_ids, axis=1), + old_importance * beta + tf.cast(importance * (1.0 - beta), + dtype=tf.float32))) + + def call(self, inputs): + return tf.gather(self._importance_embedding, inputs) + + +@tf.keras.utils.register_keras_serializable(package="Text") +class SelectTopK(tf.keras.layers.Layer): + """Select top-k + random-k tokens according to importance.""" + + def __init__(self, + top_k=None, + random_k=None, + **kwargs): + self._top_k = top_k + self._random_k = random_k + super().__init__(**kwargs) + + def get_config(self): + config = { + "top_k": + self._top_k, + "random_k": + self._random_k, + } + base_config = super().get_config() + return 
dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs): + if self._random_k is None: + # Pure top-k, no randomness. + pos = tf.argsort(inputs, direction="DESCENDING") + selected = tf.slice(pos, [0, 0], [-1, self._top_k]) + not_selected = tf.slice(pos, [0, self._top_k], [-1, -1]) + elif self._top_k is None: + # Pure randomness, no top-k. + pos = tf.argsort(tf.random.uniform(shape=tf.shape(inputs)), + direction="DESCENDING") + selected = tf.slice(pos, [0, 0], [-1, self._random_k]) + not_selected = tf.slice(pos, [0, self._random_k], [-1, -1]) + else: + # Top-k plus randomness. + pos = tf.argsort(inputs, direction="DESCENDING") + selected_top_k = tf.slice(pos, [0, 0], [-1, self._top_k]) + pos_left = tf.slice(pos, [0, self._top_k], [-1, -1]) + + # Randomly shuffle pos_left + sort_index = tf.argsort( + tf.random.uniform(shape=tf.shape(pos_left)), + direction="DESCENDING") + pos_left = tf.gather(pos_left, sort_index, batch_dims=1, axis=1) + + selected_rand = tf.slice(pos_left, [0, 0], [-1, self._random_k]) + not_selected = tf.slice(pos_left, [0, self._random_k], [-1, -1]) + + selected = tf.concat([selected_top_k, selected_rand], axis=1) + + # Return the indices of selected and not-selected tokens. + return selected, not_selected diff --git a/official/nlp/modeling/layers/routing_test.py b/official/nlp/modeling/layers/routing_test.py new file mode 100644 index 0000000000000000000000000000000000000000..8d124187f3cd780abadd961cc87c737ad9e034b5 --- /dev/null +++ b/official/nlp/modeling/layers/routing_test.py @@ -0,0 +1,59 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for routing.""" + +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from official.nlp.modeling.layers import routing + + +class TokenImportanceTest(tf.test.TestCase, parameterized.TestCase): + + def test_token_importance(self): + token_importance_embed = routing.TokenImportanceWithMovingAvg( + vocab_size=4, + init_importance=10.0, + moving_average_beta=0.995) + importance = token_importance_embed(np.array([[0, 1], [2, 3]])) + self.assertAllClose(importance, np.array([[10.0, 10.0], [10.0, 10.0]])) + token_importance_embed.update_token_importance( + token_ids=np.array([[0, 1]]), + importance=np.array([[0.0, 0.0]])) + importance = token_importance_embed(np.array([[0, 1], [2, 3]])) + self.assertAllClose(importance, np.array([[9.95, 9.95], [10.0, 10.0]])) + + +class TopKSelectionTest(tf.test.TestCase, parameterized.TestCase): + + def test_top_k_selection(self): + token_selection = routing.SelectTopK(top_k=2) + selected, _ = token_selection(np.array([[0, 1, 2, 3], [4, 3, 2, 1]])) + self.assertAllClose(selected, np.array([[3, 2], [0, 1]])) + + def test_random_k_selection(self): + token_selection = routing.SelectTopK(random_k=2) + selected, _ = token_selection(np.array([[0, 1, 2, 3], [4, 3, 2, 1]])) + self.assertAllClose(selected.shape, (2, 2)) + + def test_top_k_random_k(self): + token_selection = routing.SelectTopK(top_k=1, random_k=1) + selected, _ = token_selection(np.array([[0, 1, 2, 3], [4, 3, 2, 1]])) + self.assertAllClose(selected.shape, (2, 2)) + + +if __name__ == "__main__": + tf.test.main() diff --git 
a/official/nlp/modeling/layers/self_attention_mask.py b/official/nlp/modeling/layers/self_attention_mask.py index cd538bed0b8e6b2bc42bc616154e2d35fe7e4db7..e2c99d7a3f7c56547338e7e56d13fe0557f2d4d0 100644 --- a/official/nlp/modeling/layers/self_attention_mask.py +++ b/official/nlp/modeling/layers/self_attention_mask.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -13,10 +13,38 @@ # limitations under the License. """Keras layer that creates a self-attention mask.""" - +from typing import Optional import tensorflow as tf +def get_mask(inputs: tf.Tensor, + to_mask: tf.Tensor, + dtype: Optional[tf.DType] = None) -> tf.Tensor: + """Gets a 3D self-attention mask. + + Args: + inputs: The from_tensor, a 2D or 3D Tensor of shape [batch_size, + from_seq_length, ...]. + to_mask: int32 Tensor of shape [batch_size, to_seq_length]. + dtype: The output Tensor dtype; defaults to `inputs.dtype`. + + Returns: + float Tensor of shape [batch_size, from_seq_length, to_seq_length]. + """ + from_shape = tf.shape(inputs) + batch_size = from_shape[0] + from_seq_length = from_shape[1] + dtype = inputs.dtype if dtype is None else dtype + + to_shape = tf.shape(to_mask) + to_seq_length = to_shape[1] + + to_mask = tf.cast( + tf.reshape(to_mask, [batch_size, 1, to_seq_length]), dtype=dtype) + + return tf.broadcast_to(to_mask, [batch_size, from_seq_length, to_seq_length]) + + @tf.keras.utils.register_keras_serializable(package='Text') class SelfAttentionMask(tf.keras.layers.Layer): """Create 3D attention mask from a 2D tensor mask.
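The new `get_mask` helper replaces the old `broadcast_ones * to_mask` product with a single `tf.broadcast_to`. Both approaches copy each example's `to_mask` row across every from-position. The pure-Python sketch below illustrates that shape contract, using nested lists in place of tensors and a hypothetical `get_mask_sketch` name.

```python
from typing import List, Sequence


def get_mask_sketch(inputs: Sequence[Sequence[object]],
                    to_mask: Sequence[Sequence[int]]) -> List[List[List[float]]]:
  """[batch, from_seq, ...] and [batch, to_seq] -> [batch, from_seq, to_seq].

  As in `get_mask`, `inputs` is used only for its from-sequence length. Every
  query position receives a copy of its row's `to_mask`, so attention is
  blocked only *to* padding tokens, never *from* them.
  """
  from_seq_length = len(inputs[0])
  return [[[float(v) for v in mask_row] for _ in range(from_seq_length)]
          for mask_row in to_mask]


token_ids = [[5, 6, 7], [8, 9, 0]]  # [batch=2, from_seq=3]
padding = [[1, 1, 1], [1, 1, 0]]    # [batch=2, to_seq=3]
mask = get_mask_sketch(token_ids, padding)
print(mask[1])  # [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
```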
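The routing layers added in `routing.py` reduce to simple per-row bookkeeping. The sketch below uses hypothetical helper names and plain Python lists in place of tensors. It mirrors two pieces of behavior: the moving-average importance update of `TokenImportanceWithMovingAvg`, and the selection rule of `SelectTopK`. Its purpose is to make the semantics easy to check against `routing_test.py`; it does not replace the TensorFlow layers.

```python
import random
from typing import List, Optional, Sequence, Tuple


def update_importance(table: List[float], token_ids: Sequence[int],
                      importance: Sequence[float],
                      beta: float = 0.995) -> None:
  """In-place EMA update: table[t] = beta * table[t] + (1 - beta) * imp.

  Assumes `token_ids` has no duplicates. The TF layer gathers every old value
  before a single scatter, so duplicate ids are not modelled here.
  """
  for tok, imp in zip(token_ids, importance):
    table[tok] = table[tok] * beta + imp * (1.0 - beta)


def select_tokens(row: Sequence[float],
                  top_k: Optional[int] = None,
                  random_k: Optional[int] = None,
                  rng: Optional[random.Random] = None
                  ) -> Tuple[List[int], List[int]]:
  """Returns (selected, not_selected) positions for one row of importances."""
  rng = rng or random.Random()
  # Positions sorted by descending importance, like argsort(DESCENDING).
  ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
  if random_k is None:
    # Pure top-k, no randomness.
    return ranked[:top_k], ranked[top_k:]
  if top_k is None:
    # Pure randomness: a random permutation of all positions.
    shuffled = list(range(len(row)))
    rng.shuffle(shuffled)
    return shuffled[:random_k], shuffled[random_k:]
  # Top-k first, then random_k positions drawn from the remainder.
  top, rest = ranked[:top_k], ranked[top_k:]
  rng.shuffle(rest)
  return top + rest[:random_k], rest[random_k:]


table = [10.0] * 4
update_importance(table, token_ids=[0, 1], importance=[0.0, 0.0])
print([round(v, 4) for v in table])  # [9.95, 9.95, 10.0, 10.0]
print(select_tokens([4, 3, 2, 1], top_k=2)[0])  # [0, 1]
```

The printed values match the expectations in `routing_test.py`: `9.95` after one update with zero importance, and `[0, 1]` as the top-2 positions of `[4, 3, 2, 1]`.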
@@ -33,26 +61,4 @@ class SelfAttentionMask(tf.keras.layers.Layer): if isinstance(inputs, list) and to_mask is None: to_mask = inputs[1] inputs = inputs[0] - from_shape = tf.shape(inputs) - batch_size = from_shape[0] - from_seq_length = from_shape[1] - - to_shape = tf.shape(to_mask) - to_seq_length = to_shape[1] - - to_mask = tf.cast( - tf.reshape(to_mask, [batch_size, 1, to_seq_length]), - dtype=inputs.dtype) - - # We don't assume that `from_tensor` is a mask (although it could be). We - # don't actually care if we attend *from* padding tokens (only *to* padding) - # tokens so we create a tensor of all ones. - # - # `broadcast_ones` = [batch_size, from_seq_length, 1] - broadcast_ones = tf.ones( - shape=[batch_size, from_seq_length, 1], dtype=inputs.dtype) - - # Here we broadcast along two dimensions to create the mask. - mask = broadcast_ones * to_mask - - return mask + return get_mask(inputs, to_mask) diff --git a/official/nlp/modeling/layers/spectral_normalization.py b/official/nlp/modeling/layers/spectral_normalization.py index 175150c8ab683238e6cf89d8c87c7cb8b6e23862..aa81dbe1f0946dd55061c8c5fddcbcac29b8fefa 100644 --- a/official/nlp/modeling/layers/spectral_normalization.py +++ b/official/nlp/modeling/layers/spectral_normalization.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -74,21 +74,20 @@ class SpectralNormalization(tf.keras.layers.Wrapper): if not isinstance(layer, tf.keras.layers.Layer): raise ValueError('`layer` must be a `tf.keras.layer.Layer`. 
' 'Observed `{}`'.format(layer)) - super(SpectralNormalization, self).__init__( + super().__init__( layer, name=wrapper_name, **kwargs) def build(self, input_shape): - super(SpectralNormalization, self).build(input_shape) + super().build(input_shape) self.layer.kernel._aggregation = self.aggregation # pylint: disable=protected-access self._dtype = self.layer.kernel.dtype self.w = self.layer.kernel self.w_shape = self.w.shape.as_list() - self.uv_initializer = tf.initializers.random_normal() self.v = self.add_weight( shape=(1, np.prod(self.w_shape[:-1])), - initializer=self.uv_initializer, + initializer=tf.initializers.random_normal(), trainable=False, name='v', dtype=self.dtype, @@ -96,7 +95,7 @@ class SpectralNormalization(tf.keras.layers.Wrapper): self.u = self.add_weight( shape=(1, self.w_shape[-1]), - initializer=self.uv_initializer, + initializer=tf.initializers.random_normal(), trainable=False, name='u', dtype=self.dtype, @@ -194,10 +193,11 @@ class SpectralNormalizationConv2D(tf.keras.layers.Wrapper): raise ValueError( 'layer must be a `tf.keras.layer.Conv2D` instance. 
You passed: {input}' .format(input=layer)) - super(SpectralNormalizationConv2D, self).__init__(layer, **kwargs) + super().__init__(layer, **kwargs) def build(self, input_shape): - self.layer.build(input_shape) + if not self.layer.built: + self.layer.build(input_shape) self.layer.kernel._aggregation = self.aggregation # pylint: disable=protected-access self._dtype = self.layer.kernel.dtype @@ -221,11 +221,10 @@ class SpectralNormalizationConv2D(tf.keras.layers.Wrapper): self.in_shape = (uv_dim, in_height, in_width, in_channel) self.out_shape = (uv_dim, out_height, out_width, out_channel) - self.uv_initializer = tf.initializers.random_normal() self.v = self.add_weight( shape=self.in_shape, - initializer=self.uv_initializer, + initializer=tf.initializers.random_normal(), trainable=False, name='v', dtype=self.dtype, @@ -233,13 +232,13 @@ class SpectralNormalizationConv2D(tf.keras.layers.Wrapper): self.u = self.add_weight( shape=self.out_shape, - initializer=self.uv_initializer, + initializer=tf.initializers.random_normal(), trainable=False, name='u', dtype=self.dtype, aggregation=self.aggregation) - super(SpectralNormalizationConv2D, self).build() + super().build() def call(self, inputs): u_update_op, v_update_op, w_update_op = self.update_weights() diff --git a/official/nlp/modeling/layers/spectral_normalization_test.py b/official/nlp/modeling/layers/spectral_normalization_test.py index e2162ac6c2ab860eeabdba42889ccbd0d9fdb97a..41b1f5c4fe5a5d2c26560cf26151014fa680fe0e 100644 --- a/official/nlp/modeling/layers/spectral_normalization_test.py +++ b/official/nlp/modeling/layers/spectral_normalization_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -66,7 +66,7 @@ class NormalizationTest(tf.test.TestCase, parameterized.TestCase): spectral_norm_computed = _compute_spectral_norm(normalized_kernel) spectral_norm_expected = self.norm_multiplier self.assertAllClose( - spectral_norm_computed, spectral_norm_expected, atol=5e-2) + spectral_norm_computed, spectral_norm_expected, atol=1e-1) # Test that the normalized layer is K-Lipschitz. In particular, if the layer # is a function f, then ||f(x1) - f(x2)||_2 <= K * ||(x1 - x2)||_2, where K diff --git a/official/nlp/modeling/layers/talking_heads_attention.py b/official/nlp/modeling/layers/talking_heads_attention.py index bddfacaa86d1dea6afd7ec67b4608d15cfc36a81..5a939cd0963b29cb2e5222cc8d569906507b7fcb 100644 --- a/official/nlp/modeling/layers/talking_heads_attention.py +++ b/official/nlp/modeling/layers/talking_heads_attention.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -20,6 +20,8 @@ import string import gin import tensorflow as tf +from official.modeling import tf_utils + _CHR_IDX = string.ascii_lowercase @@ -87,7 +89,7 @@ class TalkingHeadsAttention(tf.keras.layers.MultiHeadAttention): self._pre_softmax_weight = self.add_weight( "pre_softmax_weight", shape=(self._num_heads, self._num_heads), - initializer=self._kernel_initializer, + initializer=tf_utils.clone_initializer(self._kernel_initializer), regularizer=self._kernel_regularizer, constraint=self._kernel_constraint, dtype=self.dtype, @@ -95,7 +97,7 @@ class TalkingHeadsAttention(tf.keras.layers.MultiHeadAttention): self._post_softmax_weight = self.add_weight( "post_softmax_weight", shape=(self._num_heads, self._num_heads), - initializer=self._kernel_initializer, + initializer=tf_utils.clone_initializer(self._kernel_initializer), regularizer=self._kernel_regularizer, constraint=self._kernel_constraint, dtype=self.dtype, diff --git a/official/nlp/modeling/layers/talking_heads_attention_test.py b/official/nlp/modeling/layers/talking_heads_attention_test.py index 579384bb754952187682bb8dcfdb74fe9e0b6478..6f14e2023c25e13d883ebd1949864fb328652699 100644 --- a/official/nlp/modeling/layers/talking_heads_attention_test.py +++ b/official/nlp/modeling/layers/talking_heads_attention_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/layers/text_layers.py b/official/nlp/modeling/layers/text_layers.py index 299901d2df7504ef207f4adcf3149c9cac700350..60b2f11a7a6d7db024fe42bb416cccf5bdeaea18 100644 --- a/official/nlp/modeling/layers/text_layers.py +++ b/official/nlp/modeling/layers/text_layers.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -14,14 +14,16 @@ """Keras Layers for BERT-specific preprocessing.""" # pylint: disable=g-import-not-at-top -from typing import Any, Dict, List, Optional, Union +from typing import Any, Dict, List, Mapping, Optional, Text, Union from absl import logging import tensorflow as tf try: + # pytype: disable=import-error import tensorflow_text as text from tensorflow_text.python.ops import bert_tokenizer + # pytype: enable=import-error except ImportError: text = None bert_tokenizer = None @@ -57,7 +59,7 @@ def _truncate_row_lengths(ragged_tensor: tf.RaggedTensor, class BertTokenizer(tf.keras.layers.Layer): - """Wraps BertTokenizer with pre-defined vocab as a Keras Layer. + """Wraps TF.Text's BertTokenizer with pre-defined vocab as a Keras Layer. Attributes: tokenize_with_offsets: If true, calls @@ -71,8 +73,9 @@ class BertTokenizer(tf.keras.layers.Layer): def __init__(self, *, vocab_file: str, - lower_case: bool, + lower_case: Optional[bool] = None, tokenize_with_offsets: bool = False, + tokenizer_kwargs: Optional[Mapping[Text, Any]] = None, **kwargs): """Initialize a `BertTokenizer` layer. @@ -81,15 +84,18 @@ class BertTokenizer(tf.keras.layers.Layer): This is a text file with newline-separated wordpiece tokens. This layer initializes a lookup table from it that gets used with `text.BertTokenizer`. - lower_case: A Python boolean forwarded to `text.BertTokenizer`. + lower_case: Optional boolean forwarded to `text.BertTokenizer`. If true, input text is converted to lower case (where applicable) before tokenization. This must be set to match the way in which - the `vocab_file` was created. + the `vocab_file` was created. If passed, this overrides whatever value + may have been passed in `tokenizer_kwargs`. tokenize_with_offsets: A Python boolean. 
If true, this layer calls `text.BertTokenizer.tokenize_with_offsets()` instead of plain `text.BertTokenizer.tokenize()` and outputs a triple of `(tokens, start_offsets, limit_offsets)` insead of just tokens. + tokenizer_kwargs: Optional mapping with keyword arguments to forward to + `text.BertTokenizer`'s constructor. **kwargs: Standard arguments to `Layer()`. Raises: @@ -111,8 +117,11 @@ class BertTokenizer(tf.keras.layers.Layer): self._special_tokens_dict = self._create_special_tokens_dict( self._vocab_table, vocab_file) super().__init__(**kwargs) - self._bert_tokenizer = text.BertTokenizer( - self._vocab_table, lower_case=lower_case) + tokenizer_kwargs = dict(tokenizer_kwargs or {}) + if lower_case is not None: + tokenizer_kwargs["lower_case"] = lower_case + self._bert_tokenizer = text.BertTokenizer(self._vocab_table, + **tokenizer_kwargs) @property def vocab_size(self): diff --git a/official/nlp/modeling/layers/text_layers_test.py b/official/nlp/modeling/layers/text_layers_test.py index 0608863ca8f8e933005c760ef597e5a8200d666e..d3bc63352c1c9cd519944f8f5561218723cd050b 100644 --- a/official/nlp/modeling/layers/text_layers_test.py +++ b/official/nlp/modeling/layers/text_layers_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,6 +19,7 @@ import tempfile import numpy as np import tensorflow as tf +from tensorflow import estimator as tf_estimator from sentencepiece import SentencePieceTrainer from official.nlp.modeling.layers import text_layers @@ -120,10 +121,10 @@ class BertTokenizerTest(tf.test.TestCase): def model_fn(features, labels, mode): del labels # Unused. 
- return tf.estimator.EstimatorSpec(mode=mode, + return tf_estimator.EstimatorSpec(mode=mode, predictions=features["input_word_ids"]) - estimator = tf.estimator.Estimator(model_fn=model_fn) + estimator = tf_estimator.Estimator(model_fn=model_fn) outputs = list(estimator.predict(input_fn)) self.assertAllEqual(outputs, np.array([[2, 6, 3, 0], [2, 4, 5, 3]])) @@ -231,10 +232,10 @@ class SentencepieceTokenizerTest(tf.test.TestCase): def model_fn(features, labels, mode): del labels # Unused. - return tf.estimator.EstimatorSpec(mode=mode, + return tf_estimator.EstimatorSpec(mode=mode, predictions=features["input_word_ids"]) - estimator = tf.estimator.Estimator(model_fn=model_fn) + estimator = tf_estimator.Estimator(model_fn=model_fn) outputs = list(estimator.predict(input_fn)) self.assertAllEqual(outputs, np.array([[2, 8, 3, 0], [2, 12, 3, 0]])) @@ -537,10 +538,10 @@ class FastWordPieceBertTokenizerTest(tf.test.TestCase): def model_fn(features, labels, mode): del labels # Unused. - return tf.estimator.EstimatorSpec(mode=mode, + return tf_estimator.EstimatorSpec(mode=mode, predictions=features["input_word_ids"]) - estimator = tf.estimator.Estimator(model_fn=model_fn) + estimator = tf_estimator.Estimator(model_fn=model_fn) outputs = list(estimator.predict(input_fn)) self.assertAllEqual(outputs, np.array([[2, 6, 3, 0], [2, 4, 5, 3]])) diff --git a/official/nlp/modeling/layers/tn_expand_condense.py b/official/nlp/modeling/layers/tn_expand_condense.py index c4bd08c5dcadc02defe46e0e2bb23e369ffd389b..406044cda65cb92a12a185141ec2c9bcb576f945 100644 --- a/official/nlp/modeling/layers/tn_expand_condense.py +++ b/official/nlp/modeling/layers/tn_expand_condense.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
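The `BertTokenizer` change makes `lower_case` optional and forwards a new `tokenizer_kwargs` mapping to `text.BertTokenizer`. When `lower_case` is passed explicitly, it takes precedence over any value in that mapping. The following minimal sketch covers only that merge rule and uses a hypothetical helper name.

```python
from typing import Any, Dict, Mapping, Optional


def merge_tokenizer_kwargs(
    lower_case: Optional[bool],
    tokenizer_kwargs: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
  """Builds the kwargs forwarded to the tokenizer constructor (sketch)."""
  # Copy so the caller's mapping is never mutated.
  merged = dict(tokenizer_kwargs or {})
  if lower_case is not None:
    # An explicit `lower_case` wins over tokenizer_kwargs["lower_case"].
    merged["lower_case"] = lower_case
  return merged


print(merge_tokenizer_kwargs(True, {"lower_case": False}))  # {'lower_case': True}
```

With `lower_case=None`, the layer defers entirely to `tokenizer_kwargs`. If that mapping is also empty, `text.BertTokenizer`'s own default applies.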
@@ -17,6 +17,8 @@ from typing import List, Optional, Text, Any, Dict import tensorflow as tf +from official.modeling import tf_utils + Layer = tf.keras.layers.Layer activations = tf.keras.activations initializers = tf.keras.initializers @@ -64,7 +66,7 @@ class TNExpandCondense(Layer): if 'input_shape' not in kwargs and 'input_dim' in kwargs: kwargs['input_shape'] = (kwargs.pop('input_dim'),) - super(TNExpandCondense, self).__init__(**kwargs) + super().__init__(**kwargs) assert proj_multiplier in [ 2, 4, 6, 8, 10, 12 @@ -84,7 +86,7 @@ class TNExpandCondense(Layer): 'The last dimension of the inputs to `TNExpandCondense` ' 'should be defined. Found `None`.') - super(TNExpandCondense, self).build(input_shape) + super().build(input_shape) self.proj_size = self.proj_multiplier * input_shape[-1] @@ -98,24 +100,24 @@ class TNExpandCondense(Layer): name='w1', shape=(input_shape[-1], input_shape[-1]), trainable=True, - initializer=self.kernel_initializer) + initializer=tf_utils.clone_initializer(self.kernel_initializer)) self.w2 = self.add_weight( name='w2', shape=(128, (128 * (self.proj_size // input_shape[-1]))), trainable=True, - initializer=self.kernel_initializer) + initializer=tf_utils.clone_initializer(self.kernel_initializer)) self.w3 = self.add_weight( name='w3', shape=(128 * (self.proj_size // input_shape[-1]), 128), trainable=True, - initializer=self.kernel_initializer) + initializer=tf_utils.clone_initializer(self.kernel_initializer)) self.w4 = self.add_weight( name='w4', shape=(input_shape[-1] // 128, 128, input_shape[-1]), trainable=True, - initializer=self.kernel_initializer) + initializer=tf_utils.clone_initializer(self.kernel_initializer)) if self.use_bias: self.bias = self.add_weight( @@ -176,5 +178,5 @@ class TNExpandCondense(Layer): getattr(self, initializer_arg)) # Get base config - base_config = super(TNExpandCondense, self).get_config() + base_config = super().get_config() return dict(list(base_config.items()) + list(config.items())) diff --git 
a/official/nlp/modeling/layers/tn_expand_condense_test.py b/official/nlp/modeling/layers/tn_expand_condense_test.py index ae39b8550252537fb44e42406f5051641eecd893..09ec2a86a2bd217e9ab1a1e2814053fb90a7629b 100644 --- a/official/nlp/modeling/layers/tn_expand_condense_test.py +++ b/official/nlp/modeling/layers/tn_expand_condense_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,8 +19,6 @@ import os from absl.testing import parameterized import numpy as np import tensorflow as tf -# pylint: disable=g-direct-tensorflow-import -from tensorflow.python.keras.testing_utils import layer_test from official.nlp.modeling.layers.tn_expand_condense import TNExpandCondense @@ -45,13 +43,9 @@ class TNLayerTest(tf.test.TestCase, parameterized.TestCase): @parameterized.parameters((768, 6), (1024, 2)) def test_keras_layer(self, input_dim, proj_multiple): - self.skipTest('Disable the test for now since it imports ' - 'keras.testing_utils, will reenable this test after we ' - 'fix the b/184578869') - # TODO(scottzhu): Reenable after fix b/184578869 data = np.random.normal(size=(100, input_dim)) data = data.astype(np.float32) - layer_test( + tf.keras.__internal__.utils.layer_test( TNExpandCondense, kwargs={ 'proj_multiplier': proj_multiple, @@ -64,9 +58,9 @@ class TNLayerTest(tf.test.TestCase, parameterized.TestCase): @parameterized.parameters((768, 6), (1024, 2)) def test_train(self, input_dim, proj_multiple): + tf.keras.utils.set_random_seed(0) data = np.random.randint(10, size=(100, input_dim)) model = self._build_model(data, proj_multiple) - tf.random.set_seed(0) model.compile( optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) @@ -81,7 +75,7 @@ class TNLayerTest(tf.test.TestCase, parameterized.TestCase): 
@parameterized.parameters((768, 6), (1024, 2)) def test_weights_change(self, input_dim, proj_multiple): - tf.random.set_seed(0) + tf.keras.utils.set_random_seed(0) data = np.random.randint(10, size=(100, input_dim)) model = self._build_model(data, proj_multiple) model.compile( diff --git a/official/nlp/modeling/layers/tn_transformer_expand_condense.py b/official/nlp/modeling/layers/tn_transformer_expand_condense.py index c244fcb1cd051a88eebd363dace39914745c582c..53705a1faa486cfd08c94a4bcde131f97539dabe 100644 --- a/official/nlp/modeling/layers/tn_transformer_expand_condense.py +++ b/official/nlp/modeling/layers/tn_transformer_expand_condense.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,6 +19,7 @@ import gin import tensorflow as tf +from official.modeling import tf_utils from official.nlp.modeling.layers.tn_expand_condense import TNExpandCondense @@ -77,7 +78,7 @@ class TNTransformerExpandCondense(tf.keras.layers.Layer): intermediate_dropout=0.0, attention_initializer=None, **kwargs): - super(TNTransformerExpandCondense, self).__init__(**kwargs) + super().__init__(**kwargs) self._num_heads = num_attention_heads self._intermediate_size = intermediate_size @@ -100,7 +101,8 @@ class TNTransformerExpandCondense(tf.keras.layers.Layer): self._attention_initializer = tf.keras.initializers.get( attention_initializer) else: - self._attention_initializer = self._kernel_initializer + self._attention_initializer = tf_utils.clone_initializer( + self._kernel_initializer) def build(self, input_shape): input_tensor = input_shape[0] if len(input_shape) == 2 else input_shape @@ -128,7 +130,6 @@ class TNTransformerExpandCondense(tf.keras.layers.Layer): "heads (%d)" % (hidden_size, self._num_heads)) self._attention_head_size = int(hidden_size // 
self._num_heads) common_kwargs = dict( - bias_initializer=self._bias_initializer, kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer, activity_regularizer=self._activity_regularizer, @@ -140,6 +141,7 @@ class TNTransformerExpandCondense(tf.keras.layers.Layer): dropout=self._attention_dropout_rate, use_bias=self._use_bias, kernel_initializer=self._attention_initializer, + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), name="self_attention", **common_kwargs) self._attention_dropout = tf.keras.layers.Dropout(rate=self._dropout_rate) @@ -168,7 +170,7 @@ class TNTransformerExpandCondense(tf.keras.layers.Layer): epsilon=self._norm_epsilon, dtype=tf.float32) - super(TNTransformerExpandCondense, self).build(input_shape) + super().build(input_shape) def get_config(self): config = { @@ -209,7 +211,7 @@ class TNTransformerExpandCondense(tf.keras.layers.Layer): "attention_initializer": tf.keras.initializers.serialize(self._attention_initializer) } - base_config = super(TNTransformerExpandCondense, self).get_config() + base_config = super().get_config() return dict(list(base_config.items()) + list(config.items())) def call(self, inputs): diff --git a/official/nlp/modeling/layers/tn_transformer_test.py b/official/nlp/modeling/layers/tn_transformer_test.py index a21193e7c7b10b2aef1ae3b0e68c74e191149e2e..af52661a99b72d2559cd5d4d949e508de07121f8 100644 --- a/official/nlp/modeling/layers/tn_transformer_test.py +++ b/official/nlp/modeling/layers/tn_transformer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/nlp/modeling/layers/transformer.py b/official/nlp/modeling/layers/transformer.py index 8026aaaa328ef8301ef67844f4c8a1d8ab12675b..338edf24724c1d4f33f71bcde903e3a531ad3766 100644 --- a/official/nlp/modeling/layers/transformer.py +++ b/official/nlp/modeling/layers/transformer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -15,9 +15,11 @@ """Keras-based transformer block layer.""" # pylint: disable=g-classes-have-attributes +from absl import logging import gin import tensorflow as tf +from official.modeling import tf_utils from official.nlp.modeling.layers import attention from official.nlp.modeling.layers import multi_channel_attention from official.nlp.modeling.layers import transformer_encoder_block @@ -31,6 +33,9 @@ class Transformer(transformer_encoder_block.TransformerEncoderBlock): This layer implements the Transformer from "Attention Is All You Need". (https://arxiv.org/abs/1706.03762). + **Warning: this layer is deprecated. Please don't use it. Use the + `TransformerEncoderBlock` layer instead.** + Args: num_attention_heads: Number of attention heads. intermediate_size: Size of the intermediate layer. @@ -97,6 +102,8 @@ class Transformer(transformer_encoder_block.TransformerEncoderBlock): inner_dropout=intermediate_dropout, attention_initializer=attention_initializer, **kwargs) + logging.warning("The `Transformer` layer is deprecated. 
Please directly " + "use `TransformerEncoderBlock`.") def get_config(self): return { @@ -226,7 +233,8 @@ class TransformerDecoderBlock(tf.keras.layers.Layer): self._attention_initializer = tf.keras.initializers.get( attention_initializer) else: - self._attention_initializer = self._kernel_initializer + self._attention_initializer = tf_utils.clone_initializer( + self._kernel_initializer) if self.multi_channel_cross_attention: self._cross_attention_cls = multi_channel_attention.MultiChannelAttention else: @@ -244,7 +252,6 @@ class TransformerDecoderBlock(tf.keras.layers.Layer): "heads (%d)" % (hidden_size, self.num_attention_heads)) self.attention_head_size = int(hidden_size) // self.num_attention_heads common_kwargs = dict( - bias_initializer=self._bias_initializer, kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer, activity_regularizer=self._activity_regularizer, @@ -256,14 +263,17 @@ class TransformerDecoderBlock(tf.keras.layers.Layer): key_dim=self.attention_head_size, dropout=self.attention_dropout_rate, use_bias=self._use_bias, - kernel_initializer=self._attention_initializer, + kernel_initializer=tf_utils.clone_initializer( + self._attention_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), name="self_attention", **common_kwargs) - self.self_attention_output_dense = tf.keras.layers.experimental.EinsumDense( + self.self_attention_output_dense = tf.keras.layers.EinsumDense( "abc,cd->abd", output_shape=(None, hidden_size), bias_axes="d", - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), name="output", **common_kwargs) self.self_attention_dropout = tf.keras.layers.Dropout( @@ -281,7 +291,9 @@ class TransformerDecoderBlock(tf.keras.layers.Layer): dropout=self.attention_dropout_rate, output_shape=hidden_size, use_bias=self._use_bias, - 
kernel_initializer=self._attention_initializer, + kernel_initializer=tf_utils.clone_initializer( + self._attention_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), name="attention/encdec", **common_kwargs) @@ -295,22 +307,24 @@ class TransformerDecoderBlock(tf.keras.layers.Layer): dtype="float32")) # Feed-forward projection. - self.intermediate_dense = tf.keras.layers.experimental.EinsumDense( + self.intermediate_dense = tf.keras.layers.EinsumDense( "abc,cd->abd", output_shape=(None, self.intermediate_size), bias_axes="d", - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), name="intermediate", **common_kwargs) self.intermediate_activation_layer = tf.keras.layers.Activation( self.intermediate_activation) self._intermediate_dropout_layer = tf.keras.layers.Dropout( rate=self._intermediate_dropout) - self.output_dense = tf.keras.layers.experimental.EinsumDense( + self.output_dense = tf.keras.layers.EinsumDense( "abc,cd->abd", output_shape=(None, hidden_size), bias_axes="d", - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), name="output", **common_kwargs) self.output_dropout = tf.keras.layers.Dropout(rate=self.dropout_rate) diff --git a/official/nlp/modeling/layers/transformer_encoder_block.py b/official/nlp/modeling/layers/transformer_encoder_block.py index 49e0e0cbee4315da573b1045c9fe38f6436fd6b9..b7634fbd2c77afc05eb7a8db64d58caad1b55b78 100644 --- a/official/nlp/modeling/layers/transformer_encoder_block.py +++ b/official/nlp/modeling/layers/transformer_encoder_block.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -13,9 +13,11 @@ # limitations under the License. """Keras-based TransformerEncoder block layer.""" - +from typing import Any, Optional +from absl import logging import tensorflow as tf +from official.modeling import tf_utils from official.nlp.modeling.layers import util @@ -54,9 +56,32 @@ class TransformerEncoderBlock(tf.keras.layers.Layer): inner_dropout=0.0, attention_initializer=None, attention_axes=None, + use_query_residual=True, + key_dim=None, + value_dim=None, + output_last_dim=None, + diff_q_kv_att_layer_norm=False, + return_attention_scores=False, **kwargs): """Initializes `TransformerEncoderBlock`. + Note: If `output_last_dim` is used and `use_query_residual` is `True`, the + `output_last_dim`'s value must equal the first input's last dimension for + the query residual connection to work. This is because the residual + connection after the multi-head-attention requires their dimensions to + match. If `use_query_residual` is `False`, the `output_last_dim` dictates + the last dimension of the output of this module and the + multi-head-attention. + + E.g. let's say input dims are `[batch_size, seq_dim, input_last_dim]`. + Scenario 1: If `output_last_dim` is not `None`, then the output dims of this + module would be `[batch_size, seq_dim, output_last_dim]`. Note `key_dim` is + overridden by `output_last_dim`. + Scenario 2: If `output_last_dim` is `None` and `key_dim` is not `None`, then + the output dims of this module would be `[batch_size, seq_dim, key_dim]`. + Scenario 3: If the `output_last_dim` and `key_dim` are both `None`, the + output dims would be `[batch_size, seq_dim, input_last_dim]`. + Args: num_attention_heads: Number of attention heads. inner_dim: The output dimension of the first Dense layer in a two-layer @@ -88,17 +113,35 @@ class TransformerEncoderBlock(tf.keras.layers.Layer): kernel.
attention_axes: axes over which the attention is applied. `None` means attention over all axes, but batch, heads, and features. + use_query_residual: Toggle to execute residual connection after attention. + key_dim: `key_dim` for the `tf.keras.layers.MultiHeadAttention`. If + `None`, we use the first `input_shape`'s last dim. + value_dim: `value_dim` for the `tf.keras.layers.MultiHeadAttention`. + output_last_dim: Final dimension of the output of this module. This also + dictates the value for the final dimension of the multi-head-attention. + When it's `None`, we use, in order of decreasing precedence, `key_dim` * + `num_heads` or the first `input_shape`'s last dim as the output's last + dim. + diff_q_kv_att_layer_norm: If `True`, create a separate attention layer + norm layer for query and key-value if `norm_first` is `True`. Invalid to + set to `True` if `norm_first` is `False`. + return_attention_scores: If `True`, the output of this layer will be a + tuple and additionally contain the attention scores in the shape of + `[batch_size, num_attention_heads, seq_dim, seq_dim]`. **kwargs: keyword arguments. """ util.filter_kwargs(kwargs) super().__init__(**kwargs) + # Deprecation warning. + if output_range is not None: + logging.warning("`output_range` is available as an argument for `call()`." 
+ " The `output_range` as __init__ argument is deprecated.") + + self._num_heads = num_attention_heads self._inner_dim = inner_dim self._inner_activation = inner_activation - self._attention_dropout = attention_dropout self._attention_dropout_rate = attention_dropout - self._output_dropout = output_dropout self._output_dropout_rate = output_dropout self._output_range = output_range self._kernel_initializer = tf.keras.initializers.get(kernel_initializer) @@ -112,13 +155,24 @@ class TransformerEncoderBlock(tf.keras.layers.Layer): self._norm_first = norm_first self._norm_epsilon = norm_epsilon self._inner_dropout = inner_dropout + self._use_query_residual = use_query_residual + self._key_dim = key_dim + self._value_dim = value_dim + self._output_last_dim = output_last_dim + self._diff_q_kv_att_layer_norm = diff_q_kv_att_layer_norm + self._return_attention_scores = return_attention_scores if attention_initializer: self._attention_initializer = tf.keras.initializers.get( attention_initializer) else: - self._attention_initializer = self._kernel_initializer + self._attention_initializer = tf_utils.clone_initializer( + self._kernel_initializer) self._attention_axes = attention_axes + if self._diff_q_kv_att_layer_norm and not self._norm_first: + raise ValueError("Setting `diff_q_kv_att_layer_norm` to True " + "when `norm_first` is False is invalid.") + def build(self, input_shape): if isinstance(input_shape, tf.TensorShape): input_tensor_shape = input_shape @@ -133,27 +187,35 @@ class TransformerEncoderBlock(tf.keras.layers.Layer): einsum_equation = "...bc,cd->...bd" hidden_size = input_tensor_shape[-1] if hidden_size % self._num_heads != 0: - raise ValueError( + logging.warning( "The input size (%d) is not a multiple of the number of attention " - "heads (%d)" % (hidden_size, self._num_heads)) - self._attention_head_size = int(hidden_size // self._num_heads) + "heads (%d)", hidden_size, self._num_heads) + if self._key_dim is None: + self._key_dim = int(hidden_size //
self._num_heads) + if self._output_last_dim is None: + last_output_shape = hidden_size + else: + last_output_shape = self._output_last_dim + common_kwargs = dict( - bias_initializer=self._bias_initializer, - kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer, activity_regularizer=self._activity_regularizer, kernel_constraint=self._kernel_constraint, bias_constraint=self._bias_constraint) self._attention_layer = tf.keras.layers.MultiHeadAttention( num_heads=self._num_heads, - key_dim=self._attention_head_size, - dropout=self._attention_dropout, + key_dim=self._key_dim, + value_dim=self._value_dim, + dropout=self._attention_dropout_rate, use_bias=self._use_bias, kernel_initializer=self._attention_initializer, + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), attention_axes=self._attention_axes, + output_shape=self._output_last_dim, name="self_attention", **common_kwargs) - self._attention_dropout = tf.keras.layers.Dropout(rate=self._output_dropout) + self._attention_dropout = tf.keras.layers.Dropout( + rate=self._attention_dropout_rate) # Use float32 in layernorm for numeric stability. # It is probably safe in mixed_float16, but we haven't validated this yet. 
self._attention_layer_norm = ( @@ -162,11 +224,21 @@ class TransformerEncoderBlock(tf.keras.layers.Layer): axis=-1, epsilon=self._norm_epsilon, dtype=tf.float32)) - self._intermediate_dense = tf.keras.layers.experimental.EinsumDense( + self._attention_layer_norm_kv = self._attention_layer_norm + if self._diff_q_kv_att_layer_norm: + self._attention_layer_norm_kv = ( + tf.keras.layers.LayerNormalization( + name="self_attention_layer_norm_kv", + axis=-1, + epsilon=self._norm_epsilon, + dtype=tf.float32)) + + self._intermediate_dense = tf.keras.layers.EinsumDense( einsum_equation, output_shape=(None, self._inner_dim), bias_axes="d", - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), name="intermediate", **common_kwargs) policy = tf.keras.mixed_precision.global_policy() @@ -179,14 +251,16 @@ class TransformerEncoderBlock(tf.keras.layers.Layer): self._inner_activation, dtype=policy) self._inner_dropout_layer = tf.keras.layers.Dropout( rate=self._inner_dropout) - self._output_dense = tf.keras.layers.experimental.EinsumDense( + self._output_dense = tf.keras.layers.EinsumDense( einsum_equation, - output_shape=(None, hidden_size), + output_shape=(None, last_output_shape), bias_axes="d", name="output", - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), **common_kwargs) - self._output_dropout = tf.keras.layers.Dropout(rate=self._output_dropout) + self._output_dropout = tf.keras.layers.Dropout( + rate=self._output_dropout_rate) # Use float32 in layernorm for numeric stability. 
self._output_layer_norm = tf.keras.layers.LayerNormalization( name="output_layer_norm", @@ -194,7 +268,7 @@ class TransformerEncoderBlock(tf.keras.layers.Layer): epsilon=self._norm_epsilon, dtype=tf.float32) - super(TransformerEncoderBlock, self).build(input_shape) + super().build(input_shape) def get_config(self): config = { @@ -234,22 +308,35 @@ class TransformerEncoderBlock(tf.keras.layers.Layer): self._inner_dropout, "attention_initializer": tf.keras.initializers.serialize(self._attention_initializer), - "attention_axes": self._attention_axes, + "attention_axes": + self._attention_axes, + "use_query_residual": + self._use_query_residual, + "key_dim": + self._key_dim, + "value_dim": + self._value_dim, + "output_last_dim": + self._output_last_dim, + "diff_q_kv_att_layer_norm": + self._diff_q_kv_att_layer_norm, } - base_config = super(TransformerEncoderBlock, self).get_config() + base_config = super().get_config() return dict(list(base_config.items()) + list(config.items())) - def call(self, inputs): + def call(self, inputs: Any, output_range: Optional[tf.Tensor] = None) -> Any: """Transformer self-attention encoder block call. Args: - inputs: a single tensor or a list of tensors. - `input tensor` as the single sequence of embeddings. - [`input tensor`, `attention mask`] to have the additional attention - mask. - [`query tensor`, `key value tensor`, `attention mask`] to have separate - input streams for the query, and key/value to the multi-head - attention. + inputs: a single tensor or a list of tensors. `input tensor` as the single + sequence of embeddings. [`input tensor`, `attention mask`] to have the + additional attention mask. [`query tensor`, `key value tensor`, + `attention mask`] to have separate input streams for the query, and + key/value to the multi-head attention. + output_range: the sequence output range, [0, output_range) for slicing the + target sequence. `None` means the target sequence is not sliced. 
If you + would like to have no change to the model training, it is better to only + set the `output_range` for serving. Returns: An output tensor with the same dimensions as input/query tensor. @@ -266,33 +353,50 @@ class TransformerEncoderBlock(tf.keras.layers.Layer): else: input_tensor, key_value, attention_mask = (inputs, None, None) - if self._output_range: + if output_range is None: + output_range = self._output_range + if output_range: if self._norm_first: - source_tensor = input_tensor[:, 0:self._output_range, :] + source_tensor = input_tensor[:, 0:output_range, :] input_tensor = self._attention_layer_norm(input_tensor) if key_value is not None: - key_value = self._attention_layer_norm(key_value) - target_tensor = input_tensor[:, 0:self._output_range, :] + key_value = self._attention_layer_norm_kv(key_value) + target_tensor = input_tensor[:, 0:output_range, :] if attention_mask is not None: - attention_mask = attention_mask[:, 0:self._output_range, :] + attention_mask = attention_mask[:, 0:output_range, :] else: if self._norm_first: source_tensor = input_tensor input_tensor = self._attention_layer_norm(input_tensor) if key_value is not None: - key_value = self._attention_layer_norm(key_value) + key_value = self._attention_layer_norm_kv(key_value) target_tensor = input_tensor if key_value is None: key_value = input_tensor - attention_output = self._attention_layer( - query=target_tensor, value=key_value, attention_mask=attention_mask) + + if self._return_attention_scores: + attention_output, attention_scores = self._attention_layer( + query=target_tensor, + value=key_value, + attention_mask=attention_mask, + return_attention_scores=True) + else: + attention_output = self._attention_layer( + query=target_tensor, value=key_value, attention_mask=attention_mask) attention_output = self._attention_dropout(attention_output) + if self._norm_first: - attention_output = source_tensor + attention_output + # Important to not combine `self._norm_first` and + # 
`self._use_query_residual` into one if clause because else is only for + # `_norm_first == False`. + if self._use_query_residual: + attention_output = source_tensor + attention_output else: - attention_output = self._attention_layer_norm(target_tensor + - attention_output) + if self._use_query_residual: + attention_output = target_tensor + attention_output + attention_output = self._attention_layer_norm(attention_output) + if self._norm_first: source_attention_output = attention_output attention_output = self._output_layer_norm(attention_output) @@ -303,9 +407,14 @@ class TransformerEncoderBlock(tf.keras.layers.Layer): layer_output = self._output_dropout(layer_output) if self._norm_first: - return source_attention_output + layer_output + layer_output = source_attention_output + layer_output + else: + # During mixed precision training, layer norm output is always fp32 for + # now. Casts fp32 for the subsequent add. + layer_output = tf.cast(layer_output, tf.float32) + layer_output = self._output_layer_norm(layer_output + attention_output) - # During mixed precision training, layer norm output is always fp32 for now. - # Casts fp32 for the subsequent add. - layer_output = tf.cast(layer_output, tf.float32) - return self._output_layer_norm(layer_output + attention_output) + if self._return_attention_scores: + return layer_output, attention_scores + else: + return layer_output diff --git a/official/nlp/modeling/layers/transformer_encoder_block_test.py b/official/nlp/modeling/layers/transformer_encoder_block_test.py index bb9c4f1e38b36f988ecd32fc9816f236a6322320..ca929e9741fe01db3fbf9d9e3f29dce19bd96364 100644 --- a/official/nlp/modeling/layers/transformer_encoder_block_test.py +++ b/official/nlp/modeling/layers/transformer_encoder_block_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -23,8 +23,7 @@ from official.nlp.modeling.layers.transformer_encoder_block import TransformerEn @keras_parameterized.run_all_keras_modes -@parameterized.named_parameters( - ('base', TransformerEncoderBlock)) +@parameterized.named_parameters(('base', TransformerEncoderBlock)) class TransformerEncoderBlockLayerTest(keras_parameterized.TestCase): def tearDown(self): @@ -117,18 +116,22 @@ class TransformerEncoderBlockLayerTest(keras_parameterized.TestCase): new_layer = transformer_cls( num_attention_heads=10, inner_dim=2048, - inner_activation='relu', - output_range=1) - _ = new_layer([input_data, mask_data]) + inner_activation='relu') + _ = new_layer([input_data, mask_data], output_range=1) new_layer.set_weights(test_layer.get_weights()) - new_output_tensor = new_layer([input_data, mask_data]) + new_output_tensor = new_layer([input_data, mask_data], output_range=1) self.assertAllClose( new_output_tensor, output_tensor[:, 0:1, :], atol=5e-5, rtol=0.003) + output_tensor = test_layer([input_data, mask_data], output_range=1) + self.assertAllClose(new_output_tensor, output_tensor, atol=5e-5, rtol=0.003) + def test_layer_output_range_without_mask(self, transformer_cls): test_layer = transformer_cls( - num_attention_heads=10, inner_dim=2048, - inner_activation='relu', norm_first=True) + num_attention_heads=10, + inner_dim=2048, + inner_activation='relu', + norm_first=True) sequence_length = 21 width = 80 @@ -143,18 +146,19 @@ class TransformerEncoderBlockLayerTest(keras_parameterized.TestCase): num_attention_heads=10, inner_dim=2048, inner_activation='relu', - output_range=1, norm_first=True) - _ = new_layer(input_data) + _ = new_layer(input_data, output_range=1) new_layer.set_weights(test_layer.get_weights()) - new_output_tensor = new_layer(input_data) + new_output_tensor = new_layer(input_data, output_range=1) self.assertAllClose( 
new_output_tensor, output_tensor[:, 0:1, :], atol=5e-5, rtol=0.003) def test_layer_output_range_with_pre_norm(self, transformer_cls): test_layer = transformer_cls( - num_attention_heads=10, inner_dim=2048, - inner_activation='relu', norm_first=True) + num_attention_heads=10, + inner_dim=2048, + inner_activation='relu', + norm_first=True) sequence_length = 21 width = 80 @@ -171,14 +175,16 @@ class TransformerEncoderBlockLayerTest(keras_parameterized.TestCase): num_attention_heads=10, inner_dim=2048, inner_activation='relu', - output_range=1, norm_first=True) - _ = new_layer([input_data, mask_data]) + _ = new_layer([input_data, mask_data], output_range=1) new_layer.set_weights(test_layer.get_weights()) - new_output_tensor = new_layer([input_data, mask_data]) + new_output_tensor = new_layer([input_data, mask_data], output_range=1) self.assertAllClose( new_output_tensor, output_tensor[:, 0:1, :], atol=5e-5, rtol=0.003) + output_tensor = test_layer([input_data, mask_data], output_range=1) + self.assertAllClose(new_output_tensor, output_tensor, atol=5e-5, rtol=0.003) + def test_layer_invocation_with_float16_dtype(self, transformer_cls): tf.keras.mixed_precision.set_global_policy('mixed_float16') test_layer = transformer_cls( @@ -252,6 +258,155 @@ class TransformerEncoderBlockLayerTest(keras_parameterized.TestCase): self.assertEqual(output.shape, q_tensor.shape) +@keras_parameterized.run_all_keras_modes +class TransformerEncoderBlockLayerTestWithoutParams(keras_parameterized.TestCase + ): + + def tearDown(self): + super(TransformerEncoderBlockLayerTestWithoutParams, self).tearDown() + tf.keras.mixed_precision.set_global_policy('float32') + + def test_raises_invalid_arg_error_when_q_kv_dims_are_different(self): + test_layer = TransformerEncoderBlock( + num_attention_heads=2, + inner_dim=128, + inner_activation='relu', + norm_first=True) + # Forward path. 
+ q_tensor = tf.zeros([2, 4, 16], dtype=tf.float32) + kv_tensor = tf.zeros([2, 8, 32], dtype=tf.float32) + dummy_mask = tf.zeros([2, 4, 8], dtype=tf.float32) + inputs = [q_tensor, kv_tensor, dummy_mask] + with self.assertRaises(tf.errors.InvalidArgumentError): + test_layer(inputs) + + @parameterized.named_parameters(('output_range_not_none', 2), + ('output_range_none', None)) + def test_needs_diff_q_kv_att_layer_norm_to_be_true_for_diff_q_and_kv_dims( + self, output_range): + test_layer = TransformerEncoderBlock( + num_attention_heads=2, + inner_dim=128, + inner_activation='relu', + norm_first=True) + # Forward path. + q_tensor = tf.zeros([2, 4, 16], dtype=tf.float32) + kv_tensor = tf.zeros([2, 8, 32], dtype=tf.float32) + dummy_mask = tf.zeros([2, 4, 8], dtype=tf.float32) + inputs = [q_tensor, kv_tensor, dummy_mask] + with self.assertRaises(tf.errors.InvalidArgumentError): + test_layer(inputs, output_range=output_range) + + test_layer = TransformerEncoderBlock( + num_attention_heads=2, + inner_dim=128, + inner_activation='relu', + diff_q_kv_att_layer_norm=True, + norm_first=True) + # Forward path. 
+ test_layer(inputs) + + @parameterized.named_parameters(('norm_first_is_true', True), + ('norm_first_is_false', False)) + def test_use_query_residual_false_removes_add_op(self, norm_first): + graph_with_res = tf.Graph() + with graph_with_res.as_default(): + layer = TransformerEncoderBlock( + num_attention_heads=2, + inner_dim=128, + inner_activation='relu', + norm_first=norm_first) + inputs = tf.keras.Input(shape=(None, None, 2)) + outputs = layer(inputs) + tf.keras.Model(inputs=inputs, outputs=outputs) + + graph_without_res = tf.Graph() + with graph_without_res.as_default(): + layer = TransformerEncoderBlock( + num_attention_heads=2, + inner_dim=128, + inner_activation='relu', + norm_first=norm_first, + use_query_residual=False) + inputs = tf.keras.Input(shape=(None, None, 2)) + outputs = layer(inputs) + tf.keras.Model(inputs=inputs, outputs=outputs) + graph_with_res_names = {x.name for x in graph_with_res.get_operations()} + graph_without_res_names = { + x.name for x in graph_without_res.get_operations() + } + + self.assertIn('transformer_encoder_block/add', + list(graph_with_res_names - graph_without_res_names)[0]) + self.assertEmpty(graph_without_res_names - graph_with_res_names) + + @parameterized.named_parameters(('key_dim_is_none', None, 128, 2, 128 // 2), + ('key_dim_is_not_none', 30, 128, 2, 30)) + def test_key_dim(self, key_dim, q_tensor_last_dim, some_num_attention_heads, + expected): + some_inner_dim = 32 + some_inner_activation = 'relu' + test_layer = TransformerEncoderBlock( + num_attention_heads=some_num_attention_heads, + inner_dim=some_inner_dim, + inner_activation=some_inner_activation, + key_dim=key_dim) + + q_tensor = tf.zeros([2, 4, q_tensor_last_dim], dtype=tf.float32) + kv_tensor = tf.zeros([2, 8, 32], dtype=tf.float32) + dummy_mask = tf.zeros([2, 4, 8], dtype=tf.float32) + test_layer([q_tensor, kv_tensor, dummy_mask]) + + self.assertEqual(expected, + test_layer._attention_layer.get_config()['key_dim']) + + @parameterized.named_parameters( + 
('output_last_dim_is_none_use_query_residual_false', False, None, 128, + 128), + ('output_last_dim_is_none_use_query_residual_true', True, None, 128, 128), + ('output_last_dim_is_not_none', False, 30, 128, 30)) + def test_output_last_dim(self, use_query_residual, output_last_dim, + q_tensor_last_dim, expected): + some_num_attention_heads = 2 + some_inner_dim = 32 + some_inner_activation = 'relu' + test_layer = TransformerEncoderBlock( + num_attention_heads=some_num_attention_heads, + inner_dim=some_inner_dim, + inner_activation=some_inner_activation, + # Must be false for multi-head output to be different from + # first input's last dim + use_query_residual=use_query_residual, + output_last_dim=output_last_dim) + + q_tensor = tf.zeros([2, 4, q_tensor_last_dim], dtype=tf.float32) + kv_tensor = tf.zeros([2, 8, 32], dtype=tf.float32) + dummy_mask = tf.zeros([2, 4, 8], dtype=tf.float32) + output = test_layer([q_tensor, kv_tensor, dummy_mask]) + + self.assertEqual(output.numpy().shape[-1], expected) + + @parameterized.named_parameters(('value_dim_is_none', None, 128, 2, 128 // 2), + ('value_dim_is_not_none', 30, 128, 2, 30)) + def test_value_dim(self, value_dim, q_tensor_last_dim, + some_num_attention_heads, expected): + some_inner_dim = 32 + some_inner_activation = 'relu' + test_layer = TransformerEncoderBlock( + num_attention_heads=some_num_attention_heads, + inner_dim=some_inner_dim, + inner_activation=some_inner_activation, + value_dim=value_dim) + + q_tensor = tf.zeros([2, 4, q_tensor_last_dim], dtype=tf.float32) + kv_tensor = tf.zeros([2, 8, 32], dtype=tf.float32) + dummy_mask = tf.zeros([2, 4, 8], dtype=tf.float32) + test_layer([q_tensor, kv_tensor, dummy_mask]) + + self.assertEqual(expected, + test_layer._attention_layer.get_config()['value_dim']) + + @keras_parameterized.run_all_keras_modes class TransformerArgumentTest(keras_parameterized.TestCase): @@ -277,6 +432,138 @@ class TransformerArgumentTest(keras_parameterized.TestCase): output = 
encoder_block(inputs) self.assertEqual(output.shape, (2, 4, hidden_size)) + def test_norm_first_false_and_diff_q_kv_att_layer_norm_true_raises(self): + some_num_attention_heads = 2 + some_inner_dim = 32 + some_inner_activation = 'relu' + with self.assertRaises(ValueError): + TransformerEncoderBlock( + num_attention_heads=some_num_attention_heads, + inner_dim=some_inner_dim, + inner_activation=some_inner_activation, + norm_first=False, + diff_q_kv_att_layer_norm=True) + + def test_diff_q_kv_att_layer_norm_is_part_of_config_1(self): + some_num_attention_heads = 2 + some_inner_dim = 32 + some_inner_activation = 'relu' + encoder = TransformerEncoderBlock( + num_attention_heads=some_num_attention_heads, + inner_dim=some_inner_dim, + inner_activation=some_inner_activation, + norm_first=False) + self.assertIn('diff_q_kv_att_layer_norm', encoder.get_config()) + self.assertFalse(encoder.get_config()['diff_q_kv_att_layer_norm']) + + def test_diff_q_kv_att_layer_norm_is_part_of_config_2(self): + some_num_attention_heads = 2 + some_inner_dim = 32 + some_inner_activation = 'relu' + encoder = TransformerEncoderBlock( + num_attention_heads=some_num_attention_heads, + inner_dim=some_inner_dim, + inner_activation=some_inner_activation, + norm_first=True, + diff_q_kv_att_layer_norm=True) + self.assertIn('diff_q_kv_att_layer_norm', encoder.get_config()) + self.assertTrue(encoder.get_config()['diff_q_kv_att_layer_norm']) + + def test_use_query_residual_is_part_of_config_1(self): + some_num_attention_heads = 2 + some_inner_dim = 32 + some_inner_activation = 'relu' + encoder = TransformerEncoderBlock( + num_attention_heads=some_num_attention_heads, + inner_dim=some_inner_dim, + inner_activation=some_inner_activation) + self.assertIn('use_query_residual', encoder.get_config()) + self.assertTrue(encoder.get_config()['use_query_residual']) + + def test_use_query_residual_is_part_of_config_2(self): + some_num_attention_heads = 2 + some_inner_dim = 32 + some_inner_activation = 'relu' + 
encoder = TransformerEncoderBlock( + num_attention_heads=some_num_attention_heads, + inner_dim=some_inner_dim, + inner_activation=some_inner_activation, + use_query_residual=False) + self.assertIn('use_query_residual', encoder.get_config()) + self.assertFalse(encoder.get_config()['use_query_residual']) + + def test_key_dim_is_part_of_config_1(self): + some_num_attention_heads = 2 + some_inner_dim = 32 + some_inner_activation = 'relu' + encoder = TransformerEncoderBlock( + num_attention_heads=some_num_attention_heads, + inner_dim=some_inner_dim, + inner_activation=some_inner_activation) + self.assertIn('key_dim', encoder.get_config()) + self.assertIsNone(encoder.get_config()['key_dim']) + + def test_key_dim_is_part_of_config_2(self): + some_num_attention_heads = 2 + some_inner_dim = 32 + some_inner_activation = 'relu' + key_dim = 10 + encoder = TransformerEncoderBlock( + num_attention_heads=some_num_attention_heads, + inner_dim=some_inner_dim, + inner_activation=some_inner_activation, + key_dim=key_dim) + self.assertIn('key_dim', encoder.get_config()) + self.assertEqual(key_dim, encoder.get_config()['key_dim']) + + def test_value_dim_is_part_of_config_1(self): + some_num_attention_heads = 2 + some_inner_dim = 32 + some_inner_activation = 'relu' + encoder = TransformerEncoderBlock( + num_attention_heads=some_num_attention_heads, + inner_dim=some_inner_dim, + inner_activation=some_inner_activation) + self.assertIn('value_dim', encoder.get_config()) + self.assertIsNone(encoder.get_config()['value_dim']) + + def test_value_dim_is_part_of_config_2(self): + some_num_attention_heads = 2 + some_inner_dim = 32 + some_inner_activation = 'relu' + value_dim = 10 + encoder = TransformerEncoderBlock( + num_attention_heads=some_num_attention_heads, + inner_dim=some_inner_dim, + inner_activation=some_inner_activation, + value_dim=value_dim) + self.assertIn('value_dim', encoder.get_config()) + self.assertEqual(value_dim, encoder.get_config()['value_dim']) + + def 
test_output_last_dim_is_part_of_config_1(self): + some_num_attention_heads = 2 + some_inner_dim = 32 + some_inner_activation = 'relu' + encoder = TransformerEncoderBlock( + num_attention_heads=some_num_attention_heads, + inner_dim=some_inner_dim, + inner_activation=some_inner_activation) + self.assertIn('output_last_dim', encoder.get_config()) + self.assertIsNone(encoder.get_config()['output_last_dim']) + + def test_output_last_dim_is_part_of_config_2(self): + some_num_attention_heads = 2 + some_inner_dim = 32 + some_inner_activation = 'relu' + output_last_dim = 10 + encoder = TransformerEncoderBlock( + num_attention_heads=some_num_attention_heads, + inner_dim=some_inner_dim, + inner_activation=some_inner_activation, + output_last_dim=output_last_dim) + self.assertIn('output_last_dim', encoder.get_config()) + self.assertEqual(output_last_dim, encoder.get_config()['output_last_dim']) + def test_get_config(self): num_attention_heads = 2 encoder_block = TransformerEncoderBlock( @@ -290,7 +577,12 @@ class TransformerArgumentTest(keras_parameterized.TestCase): norm_epsilon=1e-6, inner_dropout=0.1, attention_initializer=tf.keras.initializers.RandomUniform( - minval=0., maxval=1.)) + minval=0., maxval=1.), + use_query_residual=False, + key_dim=20, + value_dim=30, + output_last_dim=40, + diff_q_kv_att_layer_norm=True) encoder_block_config = encoder_block.get_config() new_encoder_block = TransformerEncoderBlock.from_config( encoder_block_config) @@ -319,6 +611,88 @@ class TransformerArgumentTest(keras_parameterized.TestCase): # The default output of a transformer layer should be the same as the input. 
self.assertEqual(data_tensor.shape.as_list(), output_tensor.shape.as_list()) + @parameterized.parameters( + { + 'output_dropout': 0.1, + 'attention_dropout': 0.2, + 'inner_dropout': 0.3 + }, { + 'output_dropout': 0.0, + 'attention_dropout': 0.2, + 'inner_dropout': 0.3 + }, { + 'output_dropout': 0.1, + 'attention_dropout': 0.0, + 'inner_dropout': 0.3 + }, { + 'output_dropout': 0.1, + 'attention_dropout': 0.2, + 'inner_dropout': 0.0 + }) + def test_dropout_config(self, output_dropout, attention_dropout, + inner_dropout): + test_layer = TransformerEncoderBlock( + num_attention_heads=2, + inner_dim=32, + inner_activation='relu', + output_dropout=output_dropout, + attention_dropout=attention_dropout, + inner_dropout=inner_dropout) + seq_len = 21 + hidden_size = 512 + input_tensor = tf.keras.Input(shape=(seq_len, hidden_size)) + _ = test_layer(input_tensor) + + true_output_dropout = test_layer._output_dropout.get_config()['rate'] + true_attention_dropout = test_layer._attention_dropout.get_config()['rate'] + true_inner_dropout = test_layer._inner_dropout_layer.get_config()['rate'] + self.assertEqual(true_output_dropout, output_dropout) + self.assertEqual(true_attention_dropout, attention_dropout) + self.assertEqual(true_inner_dropout, inner_dropout) + + @parameterized.named_parameters( + ( + 'return_attention_scores_is_false', + False, + ), + ( + 'return_attention_scores_is_true', + True, + ), + ) + def test_return_attention_scores(self, return_attention_scores): + num_attention_heads = 7 + sequence_length = 21 + width = 80 + + test_layer = TransformerEncoderBlock( + num_attention_heads=num_attention_heads, + inner_dim=2048, + inner_activation='relu', + return_attention_scores=return_attention_scores) + # Create a 3-dimensional input (the first dimension is implicit). 
+ data_tensor = tf.keras.Input(shape=(sequence_length, width)) + output_tensor = test_layer(data_tensor) + + expected_layer_output_shape = [None, sequence_length, width] + expected_attention_scores_shape = [ + None, num_attention_heads, sequence_length, sequence_length + ] + + if return_attention_scores: + self.assertIsInstance(output_tensor, tuple) + self.assertEqual(len(output_tensor), 2) + # First is the standard output. + self.assertEqual(output_tensor[0].shape.as_list(), + expected_layer_output_shape) + # Second is the attention scores. + self.assertEqual(output_tensor[1].shape.as_list(), + expected_attention_scores_shape) + else: + # Only the standard layer output. + self.assertEqual(output_tensor.shape.as_list(), + expected_layer_output_shape) + if __name__ == '__main__': tf.test.main() diff --git a/official/nlp/modeling/layers/transformer_scaffold.py b/official/nlp/modeling/layers/transformer_scaffold.py index 4f6de71ceafe5b40442ae68c9bffb2e90cfa7c5b..6b46a4b8123c24495b888f0cd3245c50615c4aec 100644 --- a/official/nlp/modeling/layers/transformer_scaffold.py +++ b/official/nlp/modeling/layers/transformer_scaffold.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,7 +19,9 @@ from absl import logging import gin import tensorflow as tf +from official.modeling import tf_utils from official.nlp.modeling.layers import attention +from official.nlp.modeling.layers import util @tf.keras.utils.register_keras_serializable(package="Text") @@ -37,8 +39,10 @@ class TransformerScaffold(tf.keras.layers.Layer): Args: num_attention_heads: Number of attention heads. - intermediate_size: Size of the intermediate layer. - intermediate_activation: Activation for the intermediate layer. 
+ inner_dim: The output dimension of the first Dense layer in a two-layer + feedforward network. + inner_activation: The activation for the first Dense layer in a two-layer + feedforward network. attention_cls: A class to instantiate attention layer, or a layer instance. attention_cfg: The config with which to instantiate `attention_cls`. Ignored if attention_cls is a layer instance or None. If `attention_cls` is a @@ -58,8 +62,8 @@ class TransformerScaffold(tf.keras.layers.Layer): Ignored if feedforward_cls is a layer instance or is None. If `feedforward_cls` is a class, but `feedforward_cfg` is None, following kwargs will be used to instantiate the feedforward instance: { - "intermediate_size": intermediate_size, - "intermediate_activation": intermediate_activation, + "inner_dim": inner_dim, + "inner_activation": inner_activation, "dropout": dropout_rate, "name": "feedforward" }. dropout_rate: Dropout probability for the post-attention and output dropout. @@ -75,8 +79,8 @@ class TransformerScaffold(tf.keras.layers.Layer): def __init__(self, num_attention_heads, - intermediate_size, - intermediate_activation, + inner_dim=768, + inner_activation=tf_utils.get_activation("gelu"), attention_cls=attention.MultiHeadAttention, attention_cfg=None, feedforward_cls=None, @@ -92,7 +96,10 @@ class TransformerScaffold(tf.keras.layers.Layer): kernel_constraint=None, bias_constraint=None, **kwargs): - super(TransformerScaffold, self).__init__(**kwargs) + inner_dim = kwargs.pop("intermediate_size", inner_dim) + inner_activation = kwargs.pop("intermediate_activation", inner_activation) + util.filter_kwargs(kwargs) + super().__init__(**kwargs) self._attention_cfg = attention_cfg self._attention_cls = attention_cls @@ -100,8 +107,8 @@ class TransformerScaffold(tf.keras.layers.Layer): self._feedforward_cfg = feedforward_cfg self._norm_first = norm_first self._num_heads = num_attention_heads - self._intermediate_size = intermediate_size - self._intermediate_activation =
intermediate_activation + self._inner_dim = inner_dim + self._inner_activation = inner_activation self._attention_dropout_rate = attention_dropout_rate self._dropout_rate = dropout_rate self._kernel_initializer = tf.keras.initializers.get(kernel_initializer) @@ -112,9 +119,15 @@ class TransformerScaffold(tf.keras.layers.Layer): self._bias_constraint = tf.keras.constraints.get(bias_constraint) def build(self, input_shape): - input_tensor_shape = input_shape[0] if ( - len(input_shape) == 2) else input_shape - input_tensor_shape = tf.TensorShape(input_tensor_shape) + if isinstance(input_shape, tf.TensorShape): + input_tensor_shape = input_shape + elif isinstance(input_shape, (list, tuple)): + input_tensor_shape = tf.TensorShape(input_shape[0]) + else: + raise ValueError( + "The type of input shape argument is not supported, got: %s" % + type(input_shape)) + if len(input_tensor_shape.as_list()) != 3: raise ValueError( "TransformerScaffold expects a three-dimensional input of " @@ -127,8 +140,6 @@ class TransformerScaffold(tf.keras.layers.Layer): self._attention_head_size = int(hidden_size // self._num_heads) common_kwargs = dict( - kernel_initializer=self._kernel_initializer, - bias_initializer=self._bias_initializer, kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer, activity_regularizer=self._activity_regularizer, @@ -145,6 +156,9 @@ class TransformerScaffold(tf.keras.layers.Layer): return instance_or_cls(**config) default_attention_cfg = { + "kernel_initializer": tf_utils.clone_initializer( + self._kernel_initializer), + "bias_initializer": tf_utils.clone_initializer(self._bias_initializer), "num_heads": self._num_heads, "key_dim": self._attention_head_size, "dropout": self._attention_dropout_rate, @@ -158,8 +172,15 @@ class TransformerScaffold(tf.keras.layers.Layer): if self._feedforward_cls is not None: default_feedforward_cfg = { - "intermediate_size": self._intermediate_size, - "intermediate_activation": 
self._intermediate_activation, + "kernel_initializer": tf_utils.clone_initializer( + self._kernel_initializer), + "bias_initializer": tf_utils.clone_initializer( + self._bias_initializer), + "inner_dim": self._inner_dim, + "inner_activation": self._inner_activation, + # TODO(hongkuny): try to update all ffn block args. + "intermediate_size": self._inner_dim, + "intermediate_activation": self._inner_activation, "dropout": self._dropout_rate, "name": "feedforward", } @@ -184,11 +205,14 @@ class TransformerScaffold(tf.keras.layers.Layer): dtype=tf.float32)) if self._feedforward_block is None: - self._intermediate_dense = tf.keras.layers.experimental.EinsumDense( + self._intermediate_dense = tf.keras.layers.EinsumDense( "abc,cd->abd", - output_shape=(None, self._intermediate_size), + output_shape=(None, self._inner_dim), bias_axes="d", name="intermediate", + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), **common_kwargs) policy = tf.keras.mixed_precision.global_policy() if policy.name == "mixed_bfloat16": @@ -197,12 +221,15 @@ class TransformerScaffold(tf.keras.layers.Layer): # TODO(b/154538392): Investigate this. 
policy = tf.float32 self._intermediate_activation_layer = tf.keras.layers.Activation( - self._intermediate_activation, dtype=policy) - self._output_dense = tf.keras.layers.experimental.EinsumDense( + self._inner_activation, dtype=policy) + self._output_dense = tf.keras.layers.EinsumDense( "abc,cd->abd", output_shape=(None, hidden_size), bias_axes="d", name="output", + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + bias_initializer=tf_utils.clone_initializer(self._bias_initializer), **common_kwargs) self._output_dropout = tf.keras.layers.Dropout(rate=self._dropout_rate) @@ -210,7 +237,7 @@ class TransformerScaffold(tf.keras.layers.Layer): self._output_layer_norm = tf.keras.layers.LayerNormalization( name="output_layer_norm", axis=-1, epsilon=1e-12, dtype=tf.float32) - super(TransformerScaffold, self).build(input_shape) + super().build(input_shape) logging.info("%s configs: %s", self.__class__.__name__, self.get_config()) def get_config(self): @@ -221,10 +248,10 @@ class TransformerScaffold(tf.keras.layers.Layer): self._feedforward_block, "num_attention_heads": self._num_heads, - "intermediate_size": - self._intermediate_size, - "intermediate_activation": - self._intermediate_activation, + "inner_dim": + self._inner_dim, + "inner_activation": + self._inner_activation, "dropout_rate": self._dropout_rate, "attention_dropout_rate": @@ -246,21 +273,31 @@ class TransformerScaffold(tf.keras.layers.Layer): "bias_constraint": tf.keras.constraints.serialize(self._bias_constraint) } - base_config = super(TransformerScaffold, self).get_config() + base_config = super().get_config() return dict(list(base_config.items()) + list(config.items())) def call(self, inputs, training=None): - if isinstance(inputs, (list, tuple)) and len(inputs) == 2: - input_tensor, attention_mask = inputs + if isinstance(inputs, (list, tuple)): + if len(inputs) == 2: + input_tensor, attention_mask = inputs + key_value = None + elif len(inputs) == 3: + input_tensor, 
key_value, attention_mask = inputs + else: + raise ValueError("Unexpected inputs to %s with length %d" % + (self.__class__, len(inputs))) else: - input_tensor, attention_mask = (inputs, None) + input_tensor, key_value, attention_mask = (inputs, None, None) + + if key_value is None: + key_value = input_tensor if self._norm_first: source_tensor = input_tensor input_tensor = self._attention_layer_norm(input_tensor, training=training) attention_output = self._attention_layer( - query=input_tensor, value=input_tensor, attention_mask=attention_mask, + query=input_tensor, value=key_value, attention_mask=attention_mask, training=training) attention_output = self._attention_dropout(attention_output, training=training) @@ -298,7 +335,9 @@ class TransformerScaffold(tf.keras.layers.Layer): training=training) layer_output += source_attention_output else: - # if not norm_first, assume that the feedforwad does apply layer norm + # Attention: if not norm_first, assume that the feedforward block applies + # layer norm. The feedforward block also applies the residual connection. + # See `GatedFeedforward` for a concrete example. layer_output = self._feedforward_block(attention_output, training=training) diff --git a/official/nlp/modeling/layers/transformer_scaffold_test.py b/official/nlp/modeling/layers/transformer_scaffold_test.py index 5267a27efd627e3418ab76526505ec2b4617147d..d72cbf0ff716b1d4f7b38e15d3b28c3f48db9b99 100644 --- a/official/nlp/modeling/layers/transformer_scaffold_test.py +++ b/official/nlp/modeling/layers/transformer_scaffold_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
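The `TransformerScaffold.__init__` hunk above renames `intermediate_size`/`intermediate_activation` to `inner_dim`/`inner_activation` but keeps accepting the old names through `**kwargs`. Below is a minimal, framework-free sketch of that aliasing pattern; `resolve_legacy_kwargs` and `LEGACY_ALIASES` are hypothetical names used only for illustration, not part of the library. One detail worth noting: the lookup must pop the *legacy* key, because a name that is already a named parameter can never end up in `**kwargs`.

```python
from typing import Any, Dict, Tuple

# Hypothetical table mapping legacy keyword names to their new names.
LEGACY_ALIASES = {
    "intermediate_size": "inner_dim",
    "intermediate_activation": "inner_activation",
}


def resolve_legacy_kwargs(
    inner_dim: int = 768,
    inner_activation: Any = "gelu",
    **kwargs: Any,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
  """Returns (resolved new-style args, remaining kwargs).

  If a legacy name is present it overrides the corresponding new-style value,
  mirroring `kwargs.pop("intermediate_size", inner_dim)` in the diff.
  """
  resolved = {"inner_dim": inner_dim, "inner_activation": inner_activation}
  for legacy_name, new_name in LEGACY_ALIASES.items():
    # Pop the *legacy* key: the new name is a named parameter and therefore
    # can never appear in **kwargs.
    resolved[new_name] = kwargs.pop(legacy_name, resolved[new_name])
  # Whatever is left (e.g. `name`) would be forwarded to the base class.
  return resolved, kwargs
```

For example, `resolve_legacy_kwargs(intermediate_size=2048, name="block")` returns `({"inner_dim": 2048, "inner_activation": "gelu"}, {"name": "block"})`.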
@@ -58,7 +58,7 @@ class ValidatedFeedforwardLayer(tf.keras.layers.Layer): def build(self, input_shape): hidden_size = input_shape.as_list()[-1] - self._feedforward_dense = tf.keras.layers.experimental.EinsumDense( + self._feedforward_dense = tf.keras.layers.EinsumDense( '...x,xy->...y', output_shape=hidden_size, bias_axes='y', @@ -99,8 +99,8 @@ class TransformerLayerTest(keras_parameterized.TestCase): attention_cls=ValidatedAttentionLayer, attention_cfg=attention_layer_cfg, num_attention_heads=10, - intermediate_size=2048, - intermediate_activation='relu') + inner_dim=2048, + inner_activation='relu') # Create a 3-dimensional input (the first dimension is implicit). data_tensor = tf.keras.Input(shape=(sequence_length, width)) @@ -134,8 +134,8 @@ class TransformerLayerTest(keras_parameterized.TestCase): feedforward_cls=ValidatedFeedforwardLayer, feedforward_cfg=feedforward_layer_cfg, num_attention_heads=10, - intermediate_size=None, - intermediate_activation=None) + inner_dim=None, + inner_activation=None) # Create a 3-dimensional input (the first dimension is implicit). data_tensor = tf.keras.Input(shape=(sequence_length, width)) @@ -165,8 +165,8 @@ class TransformerLayerTest(keras_parameterized.TestCase): attention_cls=ValidatedAttentionLayer, attention_cfg=attention_layer_cfg, num_attention_heads=10, - intermediate_size=2048, - intermediate_activation='relu') + inner_dim=2048, + inner_activation='relu') # Create a 3-dimensional input (the first dimension is implicit). data_tensor = tf.keras.Input(shape=(sequence_length, width)) @@ -194,8 +194,8 @@ class TransformerLayerTest(keras_parameterized.TestCase): attention_cls=ValidatedAttentionLayer, attention_cfg=attention_layer_cfg, num_attention_heads=10, - intermediate_size=2048, - intermediate_activation='relu') + inner_dim=2048, + inner_activation='relu') # Create a 3-dimensional input (the first dimension is implicit). 
data_tensor = tf.keras.Input(shape=(sequence_length, width)) @@ -236,8 +236,8 @@ class TransformerLayerTest(keras_parameterized.TestCase): attention_cfg=attention_layer_cfg, feedforward_cls=feedforward_layer, num_attention_heads=10, - intermediate_size=None, - intermediate_activation=None) + inner_dim=None, + inner_activation=None) # Create a 3-dimensional input (the first dimension is implicit). data_tensor = tf.keras.Input(shape=(sequence_length, width)) @@ -280,8 +280,8 @@ class TransformerLayerTest(keras_parameterized.TestCase): attention_cls=ValidatedAttentionLayer, attention_cfg=attention_layer_cfg, num_attention_heads=10, - intermediate_size=2048, - intermediate_activation='relu') + inner_dim=2048, + inner_activation='relu') # Create a 3-dimensional input (the first dimension is implicit). data_tensor = tf.keras.Input(shape=(sequence_length, width)) @@ -322,8 +322,8 @@ class TransformerLayerTest(keras_parameterized.TestCase): attention_cls=ValidatedAttentionLayer, attention_cfg=attention_layer_cfg, num_attention_heads=10, - intermediate_size=2048, - intermediate_activation='relu') + inner_dim=2048, + inner_activation='relu') # Create a 3-dimensional input (the first dimension is implicit). data_tensor = tf.keras.Input(shape=(sequence_length, width)) @@ -363,8 +363,8 @@ class TransformerLayerTest(keras_parameterized.TestCase): attention_cls=ValidatedAttentionLayer, attention_cfg=attention_layer_cfg, num_attention_heads=10, - intermediate_size=2048, - intermediate_activation='relu', + inner_dim=2048, + inner_activation='relu', kernel_initializer=tf.keras.initializers.TruncatedNormal(stddev=0.02)) # Create a 3-dimensional input (the first dimension is implicit). 
@@ -392,8 +392,8 @@ class TransformerLayerTest(keras_parameterized.TestCase): attention_cls=ValidatedAttentionLayer, attention_cfg=attention_layer_cfg, num_attention_heads=10, - intermediate_size=2048, - intermediate_activation='relu') + inner_dim=2048, + inner_activation='relu') # Create a 3-dimensional input (the first dimension is implicit). data_tensor = tf.keras.Input(shape=(sequence_length, width)) @@ -458,8 +458,8 @@ class TransformerLayerTest(keras_parameterized.TestCase): feedforward_cls=ValidatedFeedforwardLayer, feedforward_cfg=feedforward_layer_cfg, num_attention_heads=10, - intermediate_size=None, - intermediate_activation=None) + inner_dim=None, + inner_activation=None) # Create a 3-dimensional input (the first dimension is implicit). data_tensor = tf.keras.Input(shape=(sequence_length, width)) diff --git a/official/nlp/modeling/layers/transformer_test.py b/official/nlp/modeling/layers/transformer_test.py index 0c6c472ec4dfc643450b2d584ce3fdb3f34dffa5..8ee11b9196f6be20cadd22b477d7535740b4207f 100644 --- a/official/nlp/modeling/layers/transformer_test.py +++ b/official/nlp/modeling/layers/transformer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/layers/transformer_xl.py b/official/nlp/modeling/layers/transformer_xl.py index 748957398c923bf0069d7ad0f41c486b9c8ac947..462d80c25341f489840b9a6969ba94565a1c32ab 100644 --- a/official/nlp/modeling/layers/transformer_xl.py +++ b/official/nlp/modeling/layers/transformer_xl.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,6 +18,7 @@ from absl import logging import tensorflow as tf +from official.modeling import tf_utils from official.nlp.modeling.layers import relative_attention @@ -102,7 +103,7 @@ class TransformerXLBlock(tf.keras.layers.Layer): **kwargs): """Initializes TransformerXLBlock layer.""" - super(TransformerXLBlock, self).__init__(**kwargs) + super().__init__(**kwargs) self._vocab_size = vocab_size self._num_heads = num_attention_heads self._head_size = head_size @@ -148,7 +149,7 @@ class TransformerXLBlock(tf.keras.layers.Layer): value_dim=self._head_size, dropout=self._attention_dropout_rate, use_bias=False, - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), name="rel_attn") self._attention_dropout = tf.keras.layers.Dropout( rate=self._attention_dropout_rate) @@ -157,30 +158,30 @@ class TransformerXLBlock(tf.keras.layers.Layer): axis=-1, epsilon=self._norm_epsilon, dtype=tf.float32) - self._inner_dense = tf.keras.layers.experimental.EinsumDense( + self._inner_dense = tf.keras.layers.EinsumDense( "abc,cd->abd", output_shape=(None, self._inner_size), bias_axes="d", - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), name="inner") self._inner_activation_layer = tf.keras.layers.Activation( self._inner_activation) self._inner_dropout_layer = tf.keras.layers.Dropout( rate=self._inner_dropout) - self._output_dense = tf.keras.layers.experimental.EinsumDense( + self._output_dense = tf.keras.layers.EinsumDense( "abc,cd->abd", output_shape=(None, hidden_size), bias_axes="d", name="output", - kernel_initializer=self._kernel_initializer) + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer)) self._output_dropout = tf.keras.layers.Dropout(rate=self._dropout_rate) 
self._output_layer_norm = tf.keras.layers.LayerNormalization( name="output_layer_norm", axis=-1, epsilon=self._norm_epsilon) - super(TransformerXLBlock, self).build(input_shape) + super().build(input_shape) def get_config(self): config = { @@ -209,7 +210,7 @@ class TransformerXLBlock(tf.keras.layers.Layer): "inner_dropout": self._inner_dropout, } - base_config = super(TransformerXLBlock, self).get_config() + base_config = super().get_config() return dict(list(base_config.items()) + list(config.items())) def call(self, @@ -370,7 +371,7 @@ class TransformerXL(tf.keras.layers.Layer): inner_activation="relu", **kwargs): """Initializes TransformerXL.""" - super(TransformerXL, self).__init__(**kwargs) + super().__init__(**kwargs) self._vocab_size = vocab_size self._initializer = initializer @@ -398,17 +399,17 @@ class TransformerXL(tf.keras.layers.Layer): "content_attention_bias", shape=attention_bias_shape, dtype=tf.float32, - initializer=self._initializer) + initializer=tf_utils.clone_initializer(self._initializer)) self.positional_attention_bias = self.add_weight( "positional_attention_bias", shape=attention_bias_shape, dtype=tf.float32, - initializer=self._initializer) + initializer=tf_utils.clone_initializer(self._initializer)) self.segment_attention_bias = self.add_weight( "segment_attention_bias", shape=attention_bias_shape, dtype=tf.float32, - initializer=self._initializer) + initializer=tf_utils.clone_initializer(self._initializer)) self.transformer_xl_layers = [] for i in range(self._num_layers): @@ -460,7 +461,7 @@ class TransformerXL(tf.keras.layers.Layer): "inner_activation": self._inner_activation, } - base_config = super(TransformerXL, self).get_config() + base_config = super().get_config() return dict(list(base_config.items()) + list(config.items())) def call(self, diff --git a/official/nlp/modeling/layers/transformer_xl_test.py b/official/nlp/modeling/layers/transformer_xl_test.py index 
94945c962a0a1e897340770005b6c9678e28a050..375d96ec8ff15f8caad4cfdc826978c9fc4f84b8 100644 --- a/official/nlp/modeling/layers/transformer_xl_test.py +++ b/official/nlp/modeling/layers/transformer_xl_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/layers/util.py b/official/nlp/modeling/layers/util.py index d9562e24d47ffe8aada5602115b3cb894fdcabd7..a3a7820712ab1dd1283c2a6219f32191a5ec13c5 100644 --- a/official/nlp/modeling/layers/util.py +++ b/official/nlp/modeling/layers/util.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/losses/__init__.py b/official/nlp/modeling/losses/__init__.py index cdd2c29f1b50d965af0202b86e2b0cf34e679315..2cb70ee5e5a50116dfae4312c06e6c1717cbec23 100644 --- a/official/nlp/modeling/losses/__init__.py +++ b/official/nlp/modeling/losses/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/nlp/modeling/losses/weighted_sparse_categorical_crossentropy.py b/official/nlp/modeling/losses/weighted_sparse_categorical_crossentropy.py index d777800c611cb83ae5a04c2394ce89cecef50e51..81c9b38c544c4787e17e6ef9fcbd79a8cc6665ff 100644 --- a/official/nlp/modeling/losses/weighted_sparse_categorical_crossentropy.py +++ b/official/nlp/modeling/losses/weighted_sparse_categorical_crossentropy.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/losses/weighted_sparse_categorical_crossentropy_test.py b/official/nlp/modeling/losses/weighted_sparse_categorical_crossentropy_test.py index f890d5b7e35c8dd50747554dcf99dd752449c890..3acab53394bd00e7cf1b7d28bbef66eae08ab341 100644 --- a/official/nlp/modeling/losses/weighted_sparse_categorical_crossentropy_test.py +++ b/official/nlp/modeling/losses/weighted_sparse_categorical_crossentropy_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/models/__init__.py b/official/nlp/modeling/models/__init__.py index 456d06629f4a43da39b2bfa1fb5e6707b54b7787..afe28858b0bf739fb6c268cbe0f3ed4827c37458 100644 --- a/official/nlp/modeling/models/__init__.py +++ b/official/nlp/modeling/models/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/nlp/modeling/models/bert_classifier.py b/official/nlp/modeling/models/bert_classifier.py index 72a29b24ad010fc0540f39383cac340f1dfecd9b..105f00497a07fb1f3a7df836f20a40658b8fa4d0 100644 --- a/official/nlp/modeling/models/bert_classifier.py +++ b/official/nlp/modeling/models/bert_classifier.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/models/bert_classifier_test.py b/official/nlp/modeling/models/bert_classifier_test.py index 52d3157827eb9670e6b54a0d6d0c74e88905701b..98c4d8287f2eaca7ab0fa95b701589b5957f6312 100644 --- a/official/nlp/modeling/models/bert_classifier_test.py +++ b/official/nlp/modeling/models/bert_classifier_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/models/bert_pretrainer.py b/official/nlp/modeling/models/bert_pretrainer.py index dc75da76d937858c70d06337cff945455396af6d..f9bdf7ac6a14fc72e00b18cfc203ce7d626bf559 100644 --- a/official/nlp/modeling/models/bert_pretrainer.py +++ b/official/nlp/modeling/models/bert_pretrainer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -22,6 +22,7 @@ from absl import logging import gin import tensorflow as tf +from official.modeling import tf_utils from official.nlp.modeling import layers from official.nlp.modeling import networks @@ -102,7 +103,7 @@ class BertPretrainer(tf.keras.Model): masked_lm = layers.MaskedLM( embedding_table=embedding_table, activation=activation, - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), output=output, name='cls/predictions') lm_outputs = masked_lm( @@ -111,7 +112,7 @@ class BertPretrainer(tf.keras.Model): classification = networks.Classification( input_width=cls_output.shape[-1], num_classes=num_classes, - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), output=output, name='classification') sentence_outputs = classification(cls_output) @@ -199,6 +200,7 @@ class BertPretrainerV2(tf.keras.Model): self._config = { 'encoder_network': encoder_network, 'mlm_initializer': mlm_initializer, + 'mlm_activation': mlm_activation, 'classification_heads': classification_heads, 'name': name, } diff --git a/official/nlp/modeling/models/bert_pretrainer_test.py b/official/nlp/modeling/models/bert_pretrainer_test.py index 152dce89d3efec8a9f949559b64d5f13aafad780..869777372215ed7eac8877d71597e8936cea4672 100644 --- a/official/nlp/modeling/models/bert_pretrainer_test.py +++ b/official/nlp/modeling/models/bert_pretrainer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
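The `BertPretrainerV2` hunk above adds `'mlm_activation'` to `self._config`. Assuming the model is rebuilt from this dict through a `get_config`/`from_config` round trip (a common Keras pattern), any constructor argument left out of the dict silently reverts to its default when the model is reloaded. The sketch below shows this with hypothetical pure-Python classes (`TinyPretrainer`, `ForgetfulPretrainer`); it is an illustration of the pattern, not the library's code.

```python
from typing import Any, Dict, Optional


class TinyPretrainer:
  """Hypothetical model that records its constructor arguments in `_config`."""

  def __init__(self,
               mlm_activation: Optional[str] = None,
               mlm_initializer: str = "glorot_uniform",
               name: str = "pretrainer"):
    self.mlm_activation = mlm_activation
    self.mlm_initializer = mlm_initializer
    self.name = name
    # Every constructor argument must be recorded here; anything missing
    # silently falls back to its default when the object is rebuilt.
    self._config = {
        "mlm_activation": mlm_activation,
        "mlm_initializer": mlm_initializer,
        "name": name,
    }

  def get_config(self) -> Dict[str, Any]:
    return dict(self._config)

  @classmethod
  def from_config(cls, config: Dict[str, Any]) -> "TinyPretrainer":
    return cls(**config)


class ForgetfulPretrainer(TinyPretrainer):
  """Same model, but its config omits `mlm_activation`."""

  def get_config(self) -> Dict[str, Any]:
    config = super().get_config()
    config.pop("mlm_activation")
    return config
```

Rebuilding `TinyPretrainer(mlm_activation="gelu")` from its config keeps `"gelu"`, whereas the same round trip on `ForgetfulPretrainer` yields `mlm_activation=None`.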
diff --git a/official/nlp/modeling/models/bert_span_labeler.py b/official/nlp/modeling/models/bert_span_labeler.py index a444ebbf9cc3839693daa0dc3c8bc097e7397c41..5edc62967b8414a5ae3f8f0b09e647ad2fc85f1a 100644 --- a/official/nlp/modeling/models/bert_span_labeler.py +++ b/official/nlp/modeling/models/bert_span_labeler.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/models/bert_span_labeler_test.py b/official/nlp/modeling/models/bert_span_labeler_test.py index 59d0e256c921d8ece1396a71e7c5d6ca30f8055a..9f9da14c30f7eeaf7bbc90299a905af0d353d698 100644 --- a/official/nlp/modeling/models/bert_span_labeler_test.py +++ b/official/nlp/modeling/models/bert_span_labeler_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/models/bert_token_classifier.py b/official/nlp/modeling/models/bert_token_classifier.py index 340d92fd662103393514415489e22c8d5dac0d76..6375aa4b61c58a836f54d01e7a99a4e2dfe399c9 100644 --- a/official/nlp/modeling/models/bert_token_classifier.py +++ b/official/nlp/modeling/models/bert_token_classifier.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/nlp/modeling/models/bert_token_classifier_test.py b/official/nlp/modeling/models/bert_token_classifier_test.py index 8af0897638d850a25590b475ae7be65365271343..83765f5fed5e76efd3a7d1822e92b069ec800d2a 100644 --- a/official/nlp/modeling/models/bert_token_classifier_test.py +++ b/official/nlp/modeling/models/bert_token_classifier_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/models/dual_encoder.py b/official/nlp/modeling/models/dual_encoder.py index 7fa496e89623c88866b2183b39ba67fb4c4e156b..b5b948c11a1a65d4f57ef4263c43caa64abdac0b 100644 --- a/official/nlp/modeling/models/dual_encoder.py +++ b/official/nlp/modeling/models/dual_encoder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/models/dual_encoder_test.py b/official/nlp/modeling/models/dual_encoder_test.py index 30d3d4793554ac16426de31bd635aefb8c1525fe..699277966d294b059f8d94b7f0a79564a51fb222 100644 --- a/official/nlp/modeling/models/dual_encoder_test.py +++ b/official/nlp/modeling/models/dual_encoder_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/nlp/modeling/models/electra_pretrainer.py b/official/nlp/modeling/models/electra_pretrainer.py index dcbbc552175625455edb0395a686d2a254419ddc..19db3cc04063608729aeb730894df67e4bf05f8e 100644 --- a/official/nlp/modeling/models/electra_pretrainer.py +++ b/official/nlp/modeling/models/electra_pretrainer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -96,21 +96,22 @@ class ElectraPretrainer(tf.keras.Model): self.masked_lm = layers.MaskedLM( embedding_table=generator_network.get_embedding_table(), activation=mlm_activation, - initializer=mlm_initializer, + initializer=tf_utils.clone_initializer(mlm_initializer), output=output_type, name='generator_masked_lm') self.classification = layers.ClassificationHead( inner_dim=generator_network.get_config()['hidden_size'], num_classes=num_classes, - initializer=mlm_initializer, + initializer=tf_utils.clone_initializer(mlm_initializer), name='generator_classification_head') self.discriminator_projection = tf.keras.layers.Dense( units=discriminator_network.get_config()['hidden_size'], activation=mlm_activation, - kernel_initializer=mlm_initializer, + kernel_initializer=tf_utils.clone_initializer(mlm_initializer), name='discriminator_projection_head') self.discriminator_head = tf.keras.layers.Dense( - units=1, kernel_initializer=mlm_initializer) + units=1, + kernel_initializer=tf_utils.clone_initializer(mlm_initializer)) def call(self, inputs): """ELECTRA forward pass. 
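Across these hunks, every initializer that used to be shared between layers is wrapped in `tf_utils.clone_initializer(...)`, so each layer gets its own initializer object. As we understand recent Keras releases, initializers are stateless: calling the same instance twice with the same shape can return identical values, so reusing one instance risks identical weights in different layers. The sketch below models that situation with a hypothetical, deterministic `ToyInitializer` (not a Keras class) and a clone-from-config helper written in the spirit of `clone_initializer`; it is an assumption-laden illustration, not the library's implementation.

```python
import itertools
import random
from typing import Any, Dict, List, Optional


class ToyInitializer:
  """Hypothetical stateless initializer (not a Keras class).

  Calling the same instance twice with the same size returns the same values.
  An unseeded instance picks a fresh seed once, at construction time.
  """

  _seed_counter = itertools.count(1000)  # Deterministic source of fresh seeds.

  def __init__(self, seed: Optional[int] = None):
    self.seed = seed  # User-visible seed; may be None.
    self._effective_seed = (
        seed if seed is not None else next(self._seed_counter))

  def __call__(self, size: int) -> List[float]:
    rng = random.Random(self._effective_seed)
    return [rng.uniform(-0.05, 0.05) for _ in range(size)]

  def get_config(self) -> Dict[str, Any]:
    return {"seed": self.seed}

  @classmethod
  def from_config(cls, config: Dict[str, Any]) -> "ToyInitializer":
    return cls(**config)


def clone_initializer(initializer: Any) -> Any:
  """Returns a fresh initializer rebuilt from `initializer`'s config.

  Other values (e.g. a string name) are returned unchanged, on the assumption
  that the consumer builds a new instance from them anyway.
  """
  if isinstance(initializer, ToyInitializer):
    return initializer.__class__.from_config(initializer.get_config())
  return initializer
```

With a shared unseeded `ToyInitializer`, two "layers" calling it directly receive identical values, while two clones receive different ones. With an explicit seed, a clone reproduces the original's values, so in this model cloning only decorrelates unseeded initializers.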
diff --git a/official/nlp/modeling/models/electra_pretrainer_test.py b/official/nlp/modeling/models/electra_pretrainer_test.py index d5d44fa49d005720a13a6752c6af119e99709d31..23864934993ce269c38cb91dd01e64e2d0eae3b7 100644 --- a/official/nlp/modeling/models/electra_pretrainer_test.py +++ b/official/nlp/modeling/models/electra_pretrainer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/models/seq2seq_transformer.py b/official/nlp/modeling/models/seq2seq_transformer.py index 3ec765f7afce40f90b4ca3d61340afe4d0fd22bd..d33e690250ed88a72432f4dece04364f5fa703e4 100644 --- a/official/nlp/modeling/models/seq2seq_transformer.py +++ b/official/nlp/modeling/models/seq2seq_transformer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/models/seq2seq_transformer_test.py b/official/nlp/modeling/models/seq2seq_transformer_test.py index 85e7672fc556d982fc907f5592ed0a785560fafc..f45f5f3cef669b1200354e8216bbe6c408cbe123 100644 --- a/official/nlp/modeling/models/seq2seq_transformer_test.py +++ b/official/nlp/modeling/models/seq2seq_transformer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/nlp/modeling/models/t5.py b/official/nlp/modeling/models/t5.py index 61f90971044581f4bd03b5293151b515c4dea209..acd7fc648ddfad83f114885484fad6c5f7fc991f 100644 --- a/official/nlp/modeling/models/t5.py +++ b/official/nlp/modeling/models/t5.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -55,6 +55,7 @@ class Module(tf.Module): initializer: Initializer, dtype: tf.DType = tf.float32, **kwargs): + initializer = tf_utils.clone_initializer(initializer) return tf.Variable(initializer(shape, dtype=dtype, **kwargs), name=name) def read_variable(self, @@ -588,7 +589,8 @@ class MultiHeadAttention(Module): init_std_rescaling = tf.math.sqrt(tf.cast(self.d_kv, dtype=self.dtype)) query_w_init = ( lambda *args, **kwargs: ( # pylint: disable=g-long-lambda - weight_initializer(*args, **kwargs) / init_std_rescaling)) + tf_utils.clone_initializer(weight_initializer)( + *args, **kwargs) / init_std_rescaling)) self.q = Linear3D( self.d_model, self.d_kv, @@ -1004,6 +1006,7 @@ class T5TransformerParams: num_heads: int d_ff: int vocab_size: int + target_vocab_size: Optional[int] = None dropout_rate: float = 0.0 layer_norm_epsilon: float = 1e-6 shared_embedding: bool = False @@ -1020,6 +1023,9 @@ class T5TransformerParams: num_decoder_layers: Optional[int] = None one_hot_embedding: bool = True layer_sharing: bool = False + # If true, uses one relative embedding for all encoder layers and one for all + # decoder layers. Otherwise, uses one relative embedding per layer. + use_shared_relative_position_bias: bool = True class Encoder(Module): @@ -1048,17 +1054,34 @@ class Encoder(Module): self.input_embed = shared_embedding # Creates an alias to the input embed for encoder-only models.
self.word_embed = self.input_embed - self.relative_embedding = RelativePositionEmbedding( - num_heads=self.config.num_heads, - relative_attention_num_buckets=self.config - .relative_attention_num_buckets, - relative_attention_max_distance=self.config - .relative_attention_max_distance, - bidirectional=self.config.bidirectional, - embeddings_initializer=self.config.relative_embeddings_initializer, - dtype=self.dtype, - compute_dtype=self.compute_dtype, - name="relative_posemb") + if config.use_shared_relative_position_bias: + self.relative_embedding = RelativePositionEmbedding( + num_heads=self.config.num_heads, + relative_attention_num_buckets=self.config + .relative_attention_num_buckets, + relative_attention_max_distance=self.config + .relative_attention_max_distance, + bidirectional=self.config.bidirectional, + embeddings_initializer=self.config.relative_embeddings_initializer, + dtype=self.dtype, + compute_dtype=self.compute_dtype, + name="relative_posemb") + else: + self.relative_embeddings = [] + for layer_idx in range(self.config.num_layers): + relative_embedding = RelativePositionEmbedding( + num_heads=self.config.num_heads, + relative_attention_num_buckets=self.config + .relative_attention_num_buckets, + relative_attention_max_distance=self.config + .relative_attention_max_distance, + bidirectional=self.config.bidirectional, + embeddings_initializer=self.config + .relative_embeddings_initializer, + dtype=self.dtype, + compute_dtype=self.compute_dtype, + name=f"relative_posemb_{layer_idx}") + self.relative_embeddings.append(relative_embedding) self.input_dropout = Dropout(self.config.dropout_rate,) self.encoder_layers = [] for layer_idx in range(self.config.num_layers): @@ -1086,12 +1109,38 @@ class Encoder(Module): self.output_dropout = Dropout(self.config.dropout_rate,) @tf.Module.with_name_scope - def __call__(self, inputs, encoder_mask=None, training=False): + def get_relpos_bias(self, + input_length: int, + dense_inputs: tf.Tensor, + layer_idx: 
Optional[int] = None) -> tf.Tensor: + if self.config.use_shared_relative_position_bias: + position_bias = self.relative_embedding(input_length, input_length) + else: + position_bias = self.relative_embeddings[layer_idx](input_length, + input_length) + if dense_inputs is not None: + # Here we ignore relative position bias for dense embeddings. + # TODO(yejiayu): If we proceed to video use cases, rework this part. + dense_input_length = tf_utils.get_shape_list(dense_inputs)[1] + # Position bias shape: [batch, 1, len, len] + paddings = tf.constant([[0, 0], [0, 0], [0, dense_input_length], + [0, dense_input_length]]) + position_bias = tf.pad(position_bias, paddings, "CONSTANT") + return position_bias + + @tf.Module.with_name_scope + def __call__(self, + inputs=None, + encoder_mask=None, + dense_inputs=None, + training=False): """Applies Transformer model on the inputs. Args: - inputs: input data + inputs: input word ids. Optional if dense data are provided. encoder_mask: the encoder self-attention mask. + dense_inputs: dense input data. Concat after the embedding if word ids + are provided. training: whether it is training pass, affecting dropouts. 
Returns: @@ -1101,14 +1150,26 @@ class Encoder(Module): if encoder_mask is not None: encoder_mask = tf.cast(encoder_mask, self.compute_dtype) cfg = self.config - x = self.input_embed(inputs, one_hot=cfg.one_hot_embedding) + inputs_array = [] + if inputs is not None: + inputs_array.append( + self.input_embed(inputs, one_hot=cfg.one_hot_embedding)) + if dense_inputs is not None: + inputs_array.append(dense_inputs) + if not inputs_array: + raise ValueError("At least one of inputs and dense_inputs must not be " + "None.") + x = tf.concat(inputs_array, axis=1) tensor_shape = tf_utils.get_shape_list(x) tensor_shape[-2] = 1 x = self.input_dropout(x, noise_shape=tensor_shape, training=training) - input_length = tf_utils.get_shape_list(inputs)[1] - position_bias = self.relative_embedding(input_length, input_length) + if inputs is not None: + input_length = tf_utils.get_shape_list(inputs)[1] + else: + input_length = 0 for i in range(cfg.num_layers): + position_bias = self.get_relpos_bias(input_length, dense_inputs, i) x = self.encoder_layers[i]( x, attention_mask=encoder_mask, @@ -1133,11 +1194,15 @@ class Decoder(Module): self.compute_dtype = compute_dtype if self.config.num_decoder_layers is None: self.config.num_decoder_layers = self.config.num_layers + if not hasattr( + self.config, + "target_vocab_size") or self.config.target_vocab_size is None: + self.config.target_vocab_size = self.config.vocab_size with self.name_scope: # Target Embedding. if shared_embedding is None: self.target_embed = Embed( - vocab_size=self.config.vocab_size, + vocab_size=self.config.target_vocab_size, features=self.config.d_model, embeddings_initializer=self.config.vocab_embeddings_initializer, dtype=self.dtype, @@ -1147,17 +1212,34 @@ class Decoder(Module): self.target_embed = shared_embedding self.target_dropout = Dropout(self.config.dropout_rate,) # Position bias for the target self attention. 
- self.relative_embedding = RelativePositionEmbedding( - num_heads=self.config.num_heads, - relative_attention_num_buckets=self.config - .relative_attention_num_buckets, - relative_attention_max_distance=self.config - .relative_attention_max_distance, - bidirectional=self.config.bidirectional, - embeddings_initializer=self.config.relative_embeddings_initializer, - dtype=self.dtype, - compute_dtype=self.compute_dtype, - name="relative_posemb") + if config.use_shared_relative_position_bias: + self.relative_embedding = RelativePositionEmbedding( + num_heads=self.config.num_heads, + relative_attention_num_buckets=self.config + .relative_attention_num_buckets, + relative_attention_max_distance=self.config + .relative_attention_max_distance, + bidirectional=self.config.bidirectional, + embeddings_initializer=self.config.relative_embeddings_initializer, + dtype=self.dtype, + compute_dtype=self.compute_dtype, + name="relative_posemb") + else: + self.relative_embeddings = [] + for layer_idx in range(self.config.num_decoder_layers): + relative_embedding = RelativePositionEmbedding( + num_heads=self.config.num_heads, + relative_attention_num_buckets=self.config + .relative_attention_num_buckets, + relative_attention_max_distance=self.config + .relative_attention_max_distance, + bidirectional=self.config.bidirectional, + embeddings_initializer=self.config + .relative_embeddings_initializer, + dtype=self.dtype, + compute_dtype=self.compute_dtype, + name=f"relative_posemb_{layer_idx}") + self.relative_embeddings.append(relative_embedding) self.decoder_layers = [] for layer_idx in range(self.config.num_decoder_layers): if self.config.layer_sharing and layer_idx > 0: @@ -1185,11 +1267,18 @@ class Decoder(Module): if not self.config.logits_via_embedding: self.logits_dense = Linear( in_features=self.config.d_model, - out_features=self.config.vocab_size, + out_features=self.config.target_vocab_size, use_bias=False, dtype=self.dtype, name="logits") + @tf.Module.with_name_scope + def 
get_relpos_bias(self, input_length: int, layer_idx: int) -> tf.Tensor: + if self.config.use_shared_relative_position_bias: + return self.relative_embedding(input_length, input_length) + else: + return self.relative_embeddings[layer_idx](input_length, input_length) + @tf.Module.with_name_scope def __call__(self, decoder_input_tokens, @@ -1208,7 +1297,7 @@ class Decoder(Module): encoded: the encoder outputs. decoder_mask: the decoder self-attention mask. encoder_decoder_mask: the cross-attention mask. - decode: Whether to perform autoaggressive decoding. + decode: Whether to perform autoregressive decoding. decode_position: integer, the position to decode. cache: The cache dictionary of key, value tensors. max_decode_len: An optional integer specifying the maximum decoding @@ -1217,7 +1306,10 @@ class Decoder(Module): training: Whether it is training pass, affecting dropouts. Returns: - output of a transformer encoder. + output of a transformer decoder including + 1. logits: Logits for each word in the vocab. + 2. raw_logits: Logits along the model dimension. + 3. cache: Used for decoding in inference mode. """ cfg = self.config # Casts inputs to the dtype.
@@ -1230,12 +1322,14 @@ class Decoder(Module): tensor_shape = tf_utils.get_shape_list(x) tensor_shape[-2] = 1 x = self.target_dropout(x, noise_shape=tensor_shape, training=training) - if cache is not None: - position_bias = self.relative_embedding(max_decode_len, max_decode_len) - else: - input_length = tf_utils.get_shape_list(decoder_input_tokens)[1] - position_bias = self.relative_embedding(input_length, input_length) + for i in range(cfg.num_decoder_layers): + if cache is not None: + position_bias = self.get_relpos_bias(max_decode_len, i) + else: + input_length = tf_utils.get_shape_list(decoder_input_tokens)[1] + position_bias = self.get_relpos_bias(input_length, i) + if cache is None: x, _ = self.decoder_layers[i]( x, @@ -1265,7 +1359,7 @@ class Decoder(Module): logits = logits / math.sqrt(cfg.d_model) else: logits = self.logits_dense(output) - return logits, cache + return dict(logits=logits, cache=cache, raw_logits=output) class T5Transformer(Module): @@ -1306,33 +1400,72 @@ class T5Transformer(Module): compute_dtype=self.compute_dtype) def encode(self, - encoder_input_tokens, + encoder_input_tokens=None, encoder_segment_ids=None, + encoder_dense_inputs=None, + encoder_dense_segment_ids=None, training=False): - eligible_positions = tf.cast( - tf.not_equal(encoder_input_tokens, 0), self.compute_dtype) + eligible_position_array = [] + if encoder_input_tokens is not None: + eligible_position_array.append( + tf.cast(tf.not_equal(encoder_input_tokens, 0), self.compute_dtype)) + if encoder_dense_inputs is not None: + eligible_dense_positions = tf.cast( + tf.reduce_any(tf.not_equal(encoder_dense_inputs, 0), axis=-1), + self.compute_dtype) + eligible_position_array.append(eligible_dense_positions) + if not eligible_position_array: + raise ValueError("At least one of encoder_input_tokens and" + " encoder_dense_inputs must be provided.") + + eligible_positions = tf.concat(eligible_position_array, axis=1) encoder_mask = make_attention_mask( eligible_positions, 
eligible_positions, dtype=tf.bool) + + encoder_segment_id_array = [] if encoder_segment_ids is not None: + encoder_segment_id_array.append(encoder_segment_ids) + if encoder_dense_segment_ids is not None: + encoder_segment_id_array.append(encoder_dense_segment_ids) + if encoder_segment_id_array: + encoder_segment_ids = tf.concat(encoder_segment_id_array, axis=1) segment_mask = make_attention_mask( encoder_segment_ids, encoder_segment_ids, tf.equal, dtype=tf.bool) encoder_mask = tf.math.logical_and(encoder_mask, segment_mask) encoder_mask = (1.0 - tf.cast(encoder_mask, self.compute_dtype)) * -1e9 - return self.encoder(encoder_input_tokens, encoder_mask, training=training) + return self.encoder( + encoder_input_tokens, + encoder_mask, + encoder_dense_inputs, + training=training) def decode( self, encoded, decoder_target_tokens, - encoder_input_tokens, # only used for masks + encoder_input_tokens=None, # only used for masks + encoder_dense_inputs=None, decoder_input_tokens=None, encoder_segment_ids=None, + encoder_dense_segment_ids=None, decoder_segment_ids=None, decode_position=None, cache=None, max_decode_len=None, decode=False, - training=False): + training=False) -> Dict[str, tf.Tensor]: + eligible_inputs_array = [] + if encoder_input_tokens is not None: + eligible_inputs = tf.cast( + tf.not_equal(encoder_input_tokens, 0), self.compute_dtype) + eligible_inputs_array.append(eligible_inputs) + if encoder_dense_inputs is not None: + eligible_dense_inputs = tf.cast( + tf.reduce_any(tf.not_equal(encoder_dense_inputs, 0), axis=-1), + self.compute_dtype) + eligible_inputs_array.append(eligible_dense_inputs) + eligible_inputs = tf.concat(eligible_inputs_array, axis=1) + if decode: # For decoding, the decoder_input_tokens is the decoder_target_tokens. 
decoder_input_tokens = decoder_target_tokens @@ -1342,14 +1475,12 @@ class T5Transformer(Module): tf.cast( tf.not_equal(tf.ones_like(decoder_target_tokens), 0), self.compute_dtype), - tf.cast(tf.not_equal(encoder_input_tokens, 0), self.compute_dtype), + eligible_inputs, dtype=tf.bool) else: # Note that, masks should be created using decoder_target_tokens. eligible_targets = tf.cast( tf.not_equal(decoder_target_tokens, 0), self.compute_dtype) - eligible_inputs = tf.cast( - tf.not_equal(encoder_input_tokens, 0), self.compute_dtype) decoder_mask = tf.math.logical_and( make_attention_mask( eligible_targets, eligible_targets, dtype=tf.bool), @@ -1365,6 +1496,9 @@ class T5Transformer(Module): decoder_segment_ids, tf.equal, dtype=tf.bool)) + if encoder_dense_segment_ids is not None: + encoder_segment_ids = tf.concat( + [encoder_segment_ids, encoder_dense_segment_ids], axis=1) encoder_decoder_mask = tf.math.logical_and( encoder_decoder_mask, make_attention_mask( @@ -1376,7 +1510,7 @@ class T5Transformer(Module): decoder_mask = (1.0 - tf.cast(decoder_mask, self.compute_dtype)) * -1e9 encoder_decoder_mask = ( 1.0 - tf.cast(encoder_decoder_mask, self.compute_dtype)) * -1e9 - logits, cache = self.decoder( + outputs = self.decoder( decoder_input_tokens, encoded, decode_position=decode_position, @@ -1386,12 +1520,15 @@ class T5Transformer(Module): max_decode_len=max_decode_len, decode=decode, training=training) - return dict(logits=logits, encoded=encoded, cache=cache) + outputs["encoded"] = encoded + return outputs @tf.Module.with_name_scope def __call__(self, - encoder_input_tokens, - decoder_target_tokens, + encoder_input_tokens=None, + decoder_target_tokens=None, + encoder_dense_inputs=None, + encoder_dense_segment_ids=None, decoder_input_tokens=None, encoder_segment_ids=None, decoder_segment_ids=None, @@ -1401,9 +1538,12 @@ class T5Transformer(Module): Args: encoder_input_tokens: input tokens to the encoder. decoder_target_tokens: target tokens to the decoder. 
+ encoder_dense_inputs: input dense vectors to the encoder. + encoder_dense_segment_ids: dense input segmentation info for packed + examples. decoder_input_tokens: input tokens to the decoder, only required for training. encoder_segment_ids: input segmentation info for packed examples. decoder_segment_ids: target segmentation info for packed examples. training: whether it is training pass, affecting dropouts. @@ -1411,15 +1551,19 @@ class T5Transformer(Module): a dictionary of logits/cache. """ encoded = self.encode( - encoder_input_tokens, + encoder_input_tokens=encoder_input_tokens, encoder_segment_ids=encoder_segment_ids, + encoder_dense_inputs=encoder_dense_inputs, + encoder_dense_segment_ids=encoder_dense_segment_ids, training=training) outputs = self.decode( encoded=encoded, decoder_target_tokens=decoder_target_tokens, encoder_input_tokens=encoder_input_tokens, # only used for masks. + encoder_dense_inputs=encoder_dense_inputs, # only used for masks. decoder_input_tokens=decoder_input_tokens, encoder_segment_ids=encoder_segment_ids, + encoder_dense_segment_ids=encoder_dense_segment_ids, decoder_segment_ids=decoder_segment_ids, training=training) outputs["encoded"] = encoded diff --git a/official/nlp/modeling/models/t5_test.py b/official/nlp/modeling/models/t5_test.py index 86acae973f7351286702735906a5b1c3b55238e8..72e1c8f3428a2620a7cf00525909e5c9b3a0d755 100644 --- a/official/nlp/modeling/models/t5_test.py +++ b/official/nlp/modeling/models/t5_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
@@ -354,6 +354,40 @@ class T5Test(tf.test.TestCase, parameterized.TestCase): encoded = encoder(tf.zeros((4, 8), dtype=tf.int32)) self.assertEqual(encoded.shape, (4, 8, config.d_model)) + @parameterized.named_parameters(("bfloat16", tf.bfloat16), + ("float32", tf.float32)) + def test_encoder_with_dense(self, dtype): + config = t5.T5TransformerParams( + num_layers=2, + d_model=4, + d_kv=3, + num_heads=4, + d_ff=16, + vocab_size=10, + vocab_embeddings_initializer=tf.keras.initializers.Ones(), + relative_embeddings_initializer=tf.keras.initializers.Ones()) + encoder = t5.Encoder(config, compute_dtype=dtype) + encoded = encoder( + tf.zeros((4, 8), dtype=tf.int32), + dense_inputs=tf.ones((4, 2, 4), dtype=dtype)) + self.assertEqual(encoded.shape, (4, 10, config.d_model)) + + @parameterized.named_parameters(("bfloat16", tf.bfloat16), + ("float32", tf.float32)) + def test_encoder_only_dense(self, dtype): + config = t5.T5TransformerParams( + num_layers=2, + d_model=4, + d_kv=3, + num_heads=4, + d_ff=16, + vocab_size=10, + vocab_embeddings_initializer=tf.keras.initializers.Ones(), + relative_embeddings_initializer=tf.keras.initializers.Ones()) + encoder = t5.Encoder(config, compute_dtype=dtype) + encoded = encoder(dense_inputs=tf.ones((4, 2, 4), dtype=dtype)) + self.assertEqual(encoded.shape, (4, 2, config.d_model)) + def test_decoder(self): max_decode_len = 10 config = t5.T5TransformerParams( @@ -369,7 +403,9 @@ class T5Test(tf.test.TestCase, parameterized.TestCase): batch_size = 4 targets = tf.zeros((4, 8), dtype=tf.int32) encoded = tf.zeros((4, 8, config.d_model), dtype=tf.float32) - logits, cache = decoder(targets, encoded) + outputs = decoder(targets, encoded) + logits = outputs["logits"] + cache = outputs["cache"] self.assertEqual(logits.shape, (4, 8, config.vocab_size)) cache = {} @@ -378,13 +414,15 @@ class T5Test(tf.test.TestCase, parameterized.TestCase): cache[1] = _create_cache(batch_size, max_decode_len, config.num_heads, config.d_kv) targets = tf.zeros((4, 1), 
dtype=tf.int32) - logits, cache = decoder( + outputs = decoder( targets, encoded, decode_position=2, cache=cache, decode=True, max_decode_len=max_decode_len) + logits = outputs["logits"] + cache = outputs["cache"] self.assertEqual(logits.shape, (batch_size, 1, config.vocab_size)) for entry in cache.values(): for tensor in entry.values(): @@ -445,6 +483,180 @@ class T5Test(tf.test.TestCase, parameterized.TestCase): print(v.name, v.shape) self.assertEqual(v.dtype, tf.float32) + @parameterized.named_parameters( + ("t5_10_dense", ("relu",), True, 26, False, tf.float32),) + def test_transformer_with_dense(self, ffn_activations, logits_via_embedding, + expect_num_variables, layer_sharing, dtype): + max_decode_len = 10 + config = t5.T5TransformerParams( + num_layers=1, + d_model=8, + d_kv=4, + num_heads=4, + d_ff=32, + vocab_size=10, + shared_embedding=True, + layer_sharing=layer_sharing, + ffn_activations=ffn_activations, + logits_via_embedding=logits_via_embedding) + transformer = t5.T5Transformer(config, compute_dtype=dtype) + + self.assertLen(transformer.trainable_variables, expect_num_variables) + inputs = tf.convert_to_tensor( + np.array([[2, 2, 1, 3, 1, 0], [3, 3, 1, 2, 2, 1]])) + segments = tf.convert_to_tensor( + np.array([[1, 1, 1, 2, 2, 0], [1, 1, 1, 2, 2, 2]])) + + dense_inputs = tf.convert_to_tensor(np.random.randn(2, 2, 8), dtype=dtype) + dense_segments = tf.convert_to_tensor(np.array([[1, 2], [1, 2]])) + outputs = transformer( + encoder_input_tokens=inputs, + encoder_dense_inputs=dense_inputs, + decoder_input_tokens=inputs, + decoder_target_tokens=inputs, + encoder_segment_ids=segments, + encoder_dense_segment_ids=dense_segments, + decoder_segment_ids=segments) + cache = {} + batch_size = 2 + cache[0] = _create_cache( + batch_size, max_decode_len, config.num_heads, config.d_kv, dtype=dtype) + outputs = transformer.decode( + encoder_input_tokens=inputs, + encoder_dense_inputs=dense_inputs, + encoded=outputs["encoded"], + 
decoder_target_tokens=tf.ones((batch_size, 1), dtype=tf.int32), + decode_position=1, + decode=True, + max_decode_len=max_decode_len, + cache=cache) + self.assertEqual(outputs["logits"].shape, + (batch_size, 1, config.vocab_size)) + for v in transformer.trainable_variables: + print(v.name, v.shape) + self.assertEqual(v.dtype, tf.float32) + + @parameterized.named_parameters( + ("t5_10_dense_layerwise_relpos", + ("relu",), True, 26, False, tf.float32, False, 1), + ("t5_10_dense_shared_relpos_d2", + ("relu",), True, 39, False, tf.float32, True, 2), + ("t5_10_dense_layerwise_relpos_d2", + ("relu",), True, 40, False, tf.float32, False, 2), + ) + def test_transformer_with_lw_relpos(self, ffn_activations, + logits_via_embedding, + expect_num_variables, layer_sharing, + dtype, use_shared_relpos, + num_decoder_layers): + max_decode_len = 10 + config = t5.T5TransformerParams( + num_layers=1, + num_decoder_layers=num_decoder_layers, + d_model=8, + d_kv=4, + num_heads=4, + d_ff=32, + vocab_size=10, + shared_embedding=True, + layer_sharing=layer_sharing, + ffn_activations=ffn_activations, + logits_via_embedding=logits_via_embedding, + use_shared_relative_position_bias=use_shared_relpos) + transformer = t5.T5Transformer(config, compute_dtype=dtype) + + self.assertLen(transformer.trainable_variables, expect_num_variables) + inputs = tf.convert_to_tensor( + np.array([[2, 2, 1, 3, 1, 0], [3, 3, 1, 2, 2, 1]])) + segments = tf.convert_to_tensor( + np.array([[1, 1, 1, 2, 2, 0], [1, 1, 1, 2, 2, 2]])) + + dense_inputs = tf.convert_to_tensor(np.random.randn(2, 2, 8), dtype=dtype) + dense_segments = tf.convert_to_tensor(np.array([[1, 2], [1, 2]])) + outputs = transformer( + encoder_input_tokens=inputs, + encoder_dense_inputs=dense_inputs, + decoder_input_tokens=inputs, + decoder_target_tokens=inputs, + encoder_segment_ids=segments, + encoder_dense_segment_ids=dense_segments, + decoder_segment_ids=segments) + cache = {} + batch_size = 2 + for i in range(num_decoder_layers): + cache[i] = 
_create_cache( + batch_size, + max_decode_len, + config.num_heads, + config.d_kv, + dtype=dtype) + outputs = transformer.decode( + encoder_input_tokens=inputs, + encoder_dense_inputs=dense_inputs, + encoded=outputs["encoded"], + decoder_target_tokens=tf.ones((batch_size, 1), dtype=tf.int32), + decode_position=1, + decode=True, + max_decode_len=max_decode_len, + cache=cache) + self.assertEqual(outputs["logits"].shape, + (batch_size, 1, config.vocab_size)) + for v in transformer.trainable_variables: + print(v.name, v.shape) + self.assertEqual(v.dtype, tf.float32) + + @parameterized.named_parameters( + ("t5_10", ("relu",), True, 26, False, tf.float32),) + def test_transformer_with_dense_only(self, ffn_activations, + logits_via_embedding, + expect_num_variables, layer_sharing, + dtype): + max_decode_len = 10 + config = t5.T5TransformerParams( + num_layers=1, + d_model=8, + d_kv=4, + num_heads=4, + d_ff=32, + vocab_size=10, + shared_embedding=True, + layer_sharing=layer_sharing, + ffn_activations=ffn_activations, + logits_via_embedding=logits_via_embedding) + transformer = t5.T5Transformer(config, compute_dtype=dtype) + self.assertLen(transformer.trainable_variables, expect_num_variables) + + decoder_inputs = tf.convert_to_tensor( + np.array([[2, 2, 1, 3, 1, 0], [3, 3, 1, 2, 2, 1]])) + decoder_segments = tf.convert_to_tensor( + np.array([[1, 1, 1, 2, 2, 0], [1, 1, 1, 2, 2, 2]])) + + dense_inputs = tf.convert_to_tensor(np.random.randn(2, 2, 8), dtype=dtype) + dense_segments = tf.convert_to_tensor(np.array([[1, 2], [1, 2]])) + outputs = transformer( + encoder_dense_inputs=dense_inputs, + encoder_dense_segment_ids=dense_segments, + decoder_input_tokens=decoder_inputs, + decoder_target_tokens=decoder_inputs, + decoder_segment_ids=decoder_segments) + cache = {} + batch_size = 2 + cache[0] = _create_cache( + batch_size, max_decode_len, config.num_heads, config.d_kv, dtype=dtype) + outputs = transformer.decode( + encoder_dense_inputs=dense_inputs, + encoded=outputs["encoded"], 
+ decoder_target_tokens=tf.ones((batch_size, 1), dtype=tf.int32), + decode_position=1, + decode=True, + max_decode_len=max_decode_len, + cache=cache) + self.assertEqual(outputs["logits"].shape, + (batch_size, 1, config.vocab_size)) + for v in transformer.trainable_variables: + print(v.name, v.shape) + self.assertEqual(v.dtype, tf.float32) + @parameterized.named_parameters( ("t5_10", ("relu",), True, 39, tf.float32, 2), ("t5_10_bfloat16", ("relu",), True, 39, tf.bfloat16, 2)) diff --git a/official/nlp/modeling/models/xlnet.py b/official/nlp/modeling/models/xlnet.py index c359c20e949e15f5d228b7714303a99fed794b30..eea637e03163b4afa7bdec6bf4fbd639ae77eac1 100644 --- a/official/nlp/modeling/models/xlnet.py +++ b/official/nlp/modeling/models/xlnet.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/models/xlnet_test.py b/official/nlp/modeling/models/xlnet_test.py index 74480a48d9b029fdd7f27f543d490e1f7854bf7f..e22883508da994f2419411a93583754a7c1780a9 100644 --- a/official/nlp/modeling/models/xlnet_test.py +++ b/official/nlp/modeling/models/xlnet_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
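The `t5.py` changes in this diff let the encoder take `dense_inputs` alongside, or instead of, token ids: dense vectors are concatenated after the token embeddings along the sequence axis, the padding mask is concatenated the same way (a dense position counts as padding only if all of its features are zero), and `get_relpos_bias` zero-pads the last two axes of the relative position bias so the appended dense positions receive no bias. A NumPy sketch of those two shape manipulations follows, assuming `[batch, length]` token ids, `[batch, dense_length, d_model]` dense inputs, and a 4-D bias whose last two axes are (query, key) length. `eligible_positions` and `pad_position_bias` are hypothetical helper names, not functions from `t5.py`.

```python
import numpy as np


def eligible_positions(input_tokens=None, dense_inputs=None):
    """Builds a [batch, length + dense_length] 0/1 mask of non-padding positions.

    Token id 0 is padding; a dense vector is padding only if every feature is 0,
    mirroring the reduce_any check used for encoder_dense_inputs.
    """
    parts = []
    if input_tokens is not None:
        parts.append((input_tokens != 0).astype(np.float32))
    if dense_inputs is not None:
        parts.append(np.any(dense_inputs != 0, axis=-1).astype(np.float32))
    if not parts:
        raise ValueError(
            "At least one of input_tokens and dense_inputs must be provided.")
    # Token positions first, dense positions after, matching the concat order.
    return np.concatenate(parts, axis=1)


def pad_position_bias(position_bias, dense_length):
    """Zero-pads the (query, key) axes so dense positions get no relative bias."""
    paddings = [(0, 0), (0, 0), (0, dense_length), (0, dense_length)]
    return np.pad(position_bias, paddings, mode="constant")


if __name__ == "__main__":
    tokens = np.array([[2, 2, 1, 0], [3, 1, 2, 2]])
    dense = np.zeros((2, 2, 8))
    dense[0, 0] = 1.0  # second dense vector of example 0 stays all-zero (padding)
    dense[1] = 0.5
    print(eligible_positions(tokens, dense))
    print(pad_position_bias(np.ones((1, 4, 4, 4)), dense.shape[1]).shape)
```

Padding with zeros rather than extending the bucketed relative embedding keeps the token-to-token bias unchanged while leaving attention to and from dense positions unbiased, which matches the "ignore relative position bias for dense embeddings" comment in the diff.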
diff --git a/official/nlp/modeling/networks/README.md b/official/nlp/modeling/networks/README.md index b192399a7276ef122725f40d2b0e3d237805e644..b32a30775cd3976635618a1d1d404df50f85743a 100644 --- a/official/nlp/modeling/networks/README.md +++ b/official/nlp/modeling/networks/README.md @@ -37,3 +37,8 @@ Generalized Autoregressive Pretraining for Language Understanding" (https://arxiv.org/abs/1906.08237). It includes embedding lookups, relative position encodings, mask computations, segment matrix computations and Transformer XL layers using one or two stream relative self-attention. + +* [`FNet`](fnet.py) implements the encoder model from ["FNet: Mixing Tokens with +Fourier Transforms"](https://aclanthology.org/2022.naacl-main.319/). FNet has +the same structure as a Transformer encoder, except that all or most of the +self-attention sublayers are replaced with Fourier sublayers. diff --git a/official/nlp/modeling/networks/__init__.py b/official/nlp/modeling/networks/__init__.py index 137bc3ac4f787f32c36e9eed15d026d15ec8199c..0128481d91eb3552ee4ac5970e5a983863e5297c 100644 --- a/official/nlp/modeling/networks/__init__.py +++ b/official/nlp/modeling/networks/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -23,6 +23,7 @@ from official.nlp.modeling.networks.bert_encoder import BertEncoder from official.nlp.modeling.networks.bert_encoder import BertEncoderV2 from official.nlp.modeling.networks.classification import Classification from official.nlp.modeling.networks.encoder_scaffold import EncoderScaffold +from official.nlp.modeling.networks.fnet import FNet from official.nlp.modeling.networks.funnel_transformer import FunnelTransformerEncoder from official.nlp.modeling.networks.mobile_bert_encoder import MobileBERTEncoder from official.nlp.modeling.networks.packed_sequence_embedding import PackedSequenceEmbedding diff --git a/official/nlp/modeling/networks/albert_encoder.py b/official/nlp/modeling/networks/albert_encoder.py index f7453787bef757beb9380276f82f11acc1b562fe..e7095de4e90d914448c73fc88e1774453377ba11 100644 --- a/official/nlp/modeling/networks/albert_encoder.py +++ b/official/nlp/modeling/networks/albert_encoder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,6 +18,7 @@ import collections import tensorflow as tf from official.modeling import activations +from official.modeling import tf_utils from official.nlp.modeling import layers @@ -92,13 +93,13 @@ class AlbertEncoder(tf.keras.Model): embedding_layer = layers.OnDeviceEmbedding( vocab_size=vocab_size, embedding_width=embedding_width, - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), name='word_embeddings') word_embeddings = embedding_layer(word_ids) # Always uses dynamic slicing for simplicity. 
position_embedding_layer = layers.PositionEmbedding( - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), max_length=max_sequence_length, name='position_embedding') position_embeddings = position_embedding_layer(word_embeddings) @@ -107,7 +108,7 @@ class AlbertEncoder(tf.keras.Model): layers.OnDeviceEmbedding( vocab_size=type_vocab_size, embedding_width=embedding_width, - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), use_one_hot=True, name='type_embeddings')(type_ids)) @@ -123,11 +124,11 @@ class AlbertEncoder(tf.keras.Model): # We project the 'embedding' output to 'hidden_size' if it is not already # 'hidden_size'. if embedding_width != hidden_size: - embeddings = tf.keras.layers.experimental.EinsumDense( + embeddings = tf.keras.layers.EinsumDense( '...x,xy->...y', output_shape=hidden_size, bias_axes='y', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='embedding_projection')( embeddings) @@ -139,7 +140,7 @@ class AlbertEncoder(tf.keras.Model): inner_activation=activation, output_dropout=dropout_rate, attention_dropout=attention_dropout_rate, - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='transformer') encoder_outputs = [] for _ in range(num_layers): @@ -153,7 +154,7 @@ class AlbertEncoder(tf.keras.Model): cls_output = tf.keras.layers.Dense( units=hidden_size, activation='tanh', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='pooler_transform')( first_token_tensor) if dict_outputs: @@ -172,7 +173,7 @@ class AlbertEncoder(tf.keras.Model): # created using the Functional API. Once super().__init__ is called, we # can assign attributes to `self` - note that all `self` assignments are # below this line. 
- super(AlbertEncoder, self).__init__( + super().__init__( inputs=[word_ids, mask, type_ids], outputs=outputs, **kwargs) config_dict = { 'vocab_size': vocab_size, diff --git a/official/nlp/modeling/networks/albert_encoder_test.py b/official/nlp/modeling/networks/albert_encoder_test.py index f3cb60c36f9938397a55d17eef00b19cedfdd819..f7116afc9150f85440d20e85f7548abaa8191c95 100644 --- a/official/nlp/modeling/networks/albert_encoder_test.py +++ b/official/nlp/modeling/networks/albert_encoder_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/networks/bert_dense_encoder.py b/official/nlp/modeling/networks/bert_dense_encoder.py deleted file mode 100644 index 344e9e0406b8a541de7d035efeb71a5a4a5af50d..0000000000000000000000000000000000000000 --- a/official/nlp/modeling/networks/bert_dense_encoder.py +++ /dev/null @@ -1,276 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Transformer-based BERT encoder network with dense features as inputs.""" -# pylint: disable=g-classes-have-attributes - -from typing import Any, Callable, Optional, Union -from absl import logging -import tensorflow as tf - -from official.nlp.modeling import layers - - -_Initializer = Union[str, tf.keras.initializers.Initializer] -_approx_gelu = lambda x: tf.keras.activations.gelu(x, approximate=True) - - -class BertDenseEncoder(tf.keras.layers.Layer): - """Bi-directional Transformer-based encoder network with dense features. - - This network is the same as the BertEncoder except it also concats dense - features with the embeddings. - - Args: - vocab_size: The size of the token vocabulary. - hidden_size: The size of the transformer hidden layers. - num_layers: The number of transformer layers. - num_attention_heads: The number of attention heads for each transformer. The - hidden size must be divisible by the number of attention heads. - max_sequence_length: The maximum sequence length that this encoder can - consume. If None, max_sequence_length uses the value from sequence length. - This determines the variable shape for positional embeddings. - type_vocab_size: The number of types that the 'type_ids' input can take. - inner_dim: The output dimension of the first Dense layer in a two-layer - feedforward network for each transformer. - inner_activation: The activation for the first Dense layer in a two-layer - feedforward network for each transformer. - output_dropout: Dropout probability for the post-attention and output - dropout. - attention_dropout: The dropout rate to use for the attention layers within - the transformer layers. - initializer: The initialzer to use for all weights in this encoder. - output_range: The sequence output range, [0, output_range), by slicing the - target sequence of the last transformer layer. `None` means the entire - target sequence will attend to the source sequence, which yields the full - output. 
- embedding_width: The width of the word embeddings. If the embedding width is - not equal to hidden size, embedding parameters will be factorized into two - matrices in the shape of ['vocab_size', 'embedding_width'] and - ['embedding_width', 'hidden_size'] ('embedding_width' is usually much - smaller than 'hidden_size'). - embedding_layer: An optional Layer instance which will be called to generate - embeddings for the input word IDs. - norm_first: Whether to normalize inputs to attention and intermediate dense - layers. If set False, output of attention and intermediate dense layers is - normalized. - """ - - def __init__( - self, - vocab_size: int, - hidden_size: int = 768, - num_layers: int = 12, - num_attention_heads: int = 12, - max_sequence_length: int = 512, - type_vocab_size: int = 16, - inner_dim: int = 3072, - inner_activation: Callable[..., Any] = _approx_gelu, - output_dropout: float = 0.1, - attention_dropout: float = 0.1, - initializer: _Initializer = tf.keras.initializers.TruncatedNormal( - stddev=0.02), - output_range: Optional[int] = None, - embedding_width: Optional[int] = None, - embedding_layer: Optional[tf.keras.layers.Layer] = None, - norm_first: bool = False, - **kwargs): - # Pops kwargs that are used in V1 implementation. 
- if 'dict_outputs' in kwargs: - kwargs.pop('dict_outputs') - if 'return_all_encoder_outputs' in kwargs: - kwargs.pop('return_all_encoder_outputs') - if 'intermediate_size' in kwargs: - inner_dim = kwargs.pop('intermediate_size') - if 'activation' in kwargs: - inner_activation = kwargs.pop('activation') - if 'dropout_rate' in kwargs: - output_dropout = kwargs.pop('dropout_rate') - if 'attention_dropout_rate' in kwargs: - attention_dropout = kwargs.pop('attention_dropout_rate') - super().__init__(**kwargs) - - activation = tf.keras.activations.get(inner_activation) - initializer = tf.keras.initializers.get(initializer) - - if embedding_width is None: - embedding_width = hidden_size - - if embedding_layer is None: - self._embedding_layer = layers.OnDeviceEmbedding( - vocab_size=vocab_size, - embedding_width=embedding_width, - initializer=initializer, - name='word_embeddings') - else: - self._embedding_layer = embedding_layer - - self._position_embedding_layer = layers.PositionEmbedding( - initializer=initializer, - max_length=max_sequence_length, - name='position_embedding') - - self._type_embedding_layer = layers.OnDeviceEmbedding( - vocab_size=type_vocab_size, - embedding_width=embedding_width, - initializer=initializer, - use_one_hot=True, - name='type_embeddings') - - self._embedding_norm_layer = tf.keras.layers.LayerNormalization( - name='embeddings/layer_norm', axis=-1, epsilon=1e-12, dtype=tf.float32) - - self._embedding_dropout = tf.keras.layers.Dropout( - rate=output_dropout, name='embedding_dropout') - - # We project the 'embedding' output to 'hidden_size' if it is not already - # 'hidden_size'. 
- self._embedding_projection = None - if embedding_width != hidden_size: - self._embedding_projection = tf.keras.layers.experimental.EinsumDense( - '...x,xy->...y', - output_shape=hidden_size, - bias_axes='y', - kernel_initializer=initializer, - name='embedding_projection') - - self._transformer_layers = [] - self._attention_mask_layer = layers.SelfAttentionMask( - name='self_attention_mask') - for i in range(num_layers): - layer = layers.TransformerEncoderBlock( - num_attention_heads=num_attention_heads, - inner_dim=inner_dim, - inner_activation=inner_activation, - output_dropout=output_dropout, - attention_dropout=attention_dropout, - norm_first=norm_first, - output_range=output_range if i == num_layers - 1 else None, - kernel_initializer=initializer, - name='transformer/layer_%d' % i) - self._transformer_layers.append(layer) - - self._pooler_layer = tf.keras.layers.Dense( - units=hidden_size, - activation='tanh', - kernel_initializer=initializer, - name='pooler_transform') - - self._config = { - 'vocab_size': vocab_size, - 'hidden_size': hidden_size, - 'num_layers': num_layers, - 'num_attention_heads': num_attention_heads, - 'max_sequence_length': max_sequence_length, - 'type_vocab_size': type_vocab_size, - 'inner_dim': inner_dim, - 'inner_activation': tf.keras.activations.serialize(activation), - 'output_dropout': output_dropout, - 'attention_dropout': attention_dropout, - 'initializer': tf.keras.initializers.serialize(initializer), - 'output_range': output_range, - 'embedding_width': embedding_width, - 'embedding_layer': embedding_layer, - 'norm_first': norm_first, - } - self.inputs = dict( - input_word_ids=tf.keras.Input(shape=(None,), dtype=tf.int32), - input_mask=tf.keras.Input(shape=(None,), dtype=tf.int32), - input_type_ids=tf.keras.Input(shape=(None,), dtype=tf.int32), - dense_inputs=tf.keras.Input( - shape=(None, embedding_width), dtype=tf.float32), - dense_mask=tf.keras.Input(shape=(None,), dtype=tf.int32), - 
dense_type_ids=tf.keras.Input(shape=(None,), dtype=tf.int32), - ) - - def call(self, inputs): - word_embeddings = None - if isinstance(inputs, dict): - word_ids = inputs.get('input_word_ids') - mask = inputs.get('input_mask') - type_ids = inputs.get('input_type_ids') - word_embeddings = inputs.get('input_word_embeddings', None) - dense_inputs = inputs.get('dense_inputs') - dense_mask = inputs.get('dense_mask') - dense_type_ids = inputs.get('dense_type_ids') - else: - raise ValueError('Unexpected inputs type to %s.' % self.__class__) - - if word_embeddings is None: - word_embeddings = self._embedding_layer(word_ids) - - # Concat the dense embeddings at sequence end. - combined_embeddings = tf.concat([word_embeddings, dense_inputs], axis=1) - combined_type_ids = tf.concat([type_ids, dense_type_ids], axis=1) - combined_mask = tf.concat([mask, dense_mask], axis=1) - - # absolute position embeddings. - position_embeddings = self._position_embedding_layer(combined_embeddings) - type_embeddings = self._type_embedding_layer(combined_type_ids) - - embeddings = combined_embeddings + position_embeddings + type_embeddings - embeddings = self._embedding_norm_layer(embeddings) - embeddings = self._embedding_dropout(embeddings) - - if self._embedding_projection is not None: - embeddings = self._embedding_projection(embeddings) - - attention_mask = self._attention_mask_layer(embeddings, combined_mask) - - encoder_outputs = [] - x = embeddings - for layer in self._transformer_layers: - x = layer([x, attention_mask]) - encoder_outputs.append(x) - - last_encoder_output = encoder_outputs[-1] - first_token_tensor = last_encoder_output[:, 0, :] - pooled_output = self._pooler_layer(first_token_tensor) - - return dict( - sequence_output=encoder_outputs[-1], - pooled_output=pooled_output, - encoder_outputs=encoder_outputs) - - def get_embedding_table(self): - return self._embedding_layer.embeddings - - def get_embedding_layer(self): - return self._embedding_layer - - def get_config(self): 
- return dict(self._config) - - @property - def transformer_layers(self): - """List of Transformer layers in the encoder.""" - return self._transformer_layers - - @property - def pooler_layer(self): - """The pooler dense layer after the transformer layers.""" - return self._pooler_layer - - @classmethod - def from_config(cls, config, custom_objects=None): - if 'embedding_layer' in config and config['embedding_layer'] is not None: - warn_string = ( - 'You are reloading a model that was saved with a ' - 'potentially-shared embedding layer object. If you contine to ' - 'train this model, the embedding layer will no longer be shared. ' - 'To work around this, load the model outside of the Keras API.') - print('WARNING: ' + warn_string) - logging.warn(warn_string) - - return cls(**config) diff --git a/official/nlp/modeling/networks/bert_dense_encoder_test.py b/official/nlp/modeling/networks/bert_dense_encoder_test.py index dcc9e3e8af87267d369785b3dc98dd51aa972ada..a2ed8b1b8b68fa3f3e1fbf7108f49f353238139d 100644 --- a/official/nlp/modeling/networks/bert_dense_encoder_test.py +++ b/official/nlp/modeling/networks/bert_dense_encoder_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -20,29 +20,30 @@ import numpy as np import tensorflow as tf from tensorflow.python.keras import keras_parameterized # pylint: disable=g-direct-tensorflow-import -from official.nlp.modeling.networks import bert_dense_encoder +from official.nlp.modeling.networks import bert_encoder # This decorator runs the test in V1, V2-Eager, and V2-Functional mode. It # guarantees forward compatibility of this code for the V2 switchover. 
@keras_parameterized.run_all_keras_modes -class BertDenseEncoderTest(keras_parameterized.TestCase): +class BertEncoderV2Test(keras_parameterized.TestCase): def tearDown(self): - super(BertDenseEncoderTest, self).tearDown() + super(BertEncoderV2Test, self).tearDown() tf.keras.mixed_precision.set_global_policy("float32") def test_dict_outputs_network_creation(self): hidden_size = 32 sequence_length = 21 dense_sequence_length = 20 - # Create a small dense BertDenseEncoder for testing. + # Create a small dense BertEncoderV2 for testing. kwargs = {} - test_network = bert_dense_encoder.BertDenseEncoder( + test_network = bert_encoder.BertEncoderV2( vocab_size=100, hidden_size=hidden_size, num_attention_heads=2, num_layers=3, + with_dense_inputs=True, **kwargs) # Create the inputs (note that the first dimension is implicit). word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) @@ -86,12 +87,13 @@ class BertDenseEncoderTest(keras_parameterized.TestCase): sequence_length = 21 dense_sequence_length = 20 # Create a small BertEncoder for testing. - test_network = bert_dense_encoder.BertDenseEncoder( + test_network = bert_encoder.BertEncoderV2( vocab_size=100, hidden_size=hidden_size, num_attention_heads=2, num_layers=3, - dict_outputs=True) + dict_outputs=True, + with_dense_inputs=True) # Create the inputs (note that the first dimension is implicit). word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) @@ -134,12 +136,13 @@ class BertDenseEncoderTest(keras_parameterized.TestCase): dense_sequence_length = 20 tf.keras.mixed_precision.set_global_policy("mixed_float16") # Create a small BertEncoder for testing. 
- test_network = bert_dense_encoder.BertDenseEncoder( + test_network = bert_encoder.BertEncoderV2( vocab_size=100, hidden_size=hidden_size, num_attention_heads=2, num_layers=3, - dict_outputs=True) + dict_outputs=True, + with_dense_inputs=True) # Create the inputs (note that the first dimension is implicit). word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) @@ -176,9 +179,8 @@ class BertDenseEncoderTest(keras_parameterized.TestCase): self.assertAllEqual(tf.float16, pooled.dtype) @parameterized.named_parameters( - ("all_sequence_encoder_v2", bert_dense_encoder.BertDenseEncoder, None, - 41), - ("output_range_encoder_v2", bert_dense_encoder.BertDenseEncoder, 1, 1), + ("all_sequence_encoder_v2", bert_encoder.BertEncoderV2, None, 41), + ("output_range_encoder_v2", bert_encoder.BertEncoderV2, 1, 1), ) def test_dict_outputs_network_invocation( self, encoder_cls, output_range, out_seq_len): @@ -194,8 +196,9 @@ class BertDenseEncoderTest(keras_parameterized.TestCase): num_attention_heads=2, num_layers=3, type_vocab_size=num_types, - output_range=output_range, - dict_outputs=True) + dict_outputs=True, + with_dense_inputs=True, + output_range=output_range) # Create the inputs (note that the first dimension is implicit). word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) @@ -276,7 +279,7 @@ class BertDenseEncoderTest(keras_parameterized.TestCase): # Creates a BertEncoder with embedding_width != hidden_size embedding_width = 16 - test_network = bert_dense_encoder.BertDenseEncoder( + test_network = bert_encoder.BertEncoderV2( vocab_size=vocab_size, hidden_size=hidden_size, max_sequence_length=max_sequence_length, @@ -316,11 +319,12 @@ class BertDenseEncoderTest(keras_parameterized.TestCase): sequence_length = 21 dense_sequence_length = 20 # Create a small BertEncoder for testing. 
- test_network = bert_dense_encoder.BertDenseEncoder( + test_network = bert_encoder.BertEncoderV2( vocab_size=100, hidden_size=hidden_size, num_attention_heads=2, - num_layers=3) + num_layers=3, + with_dense_inputs=True) # Create the inputs (note that the first dimension is implicit). word_ids = tf.keras.Input(shape=(sequence_length), dtype=tf.int32) mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) diff --git a/official/nlp/modeling/networks/bert_encoder.py b/official/nlp/modeling/networks/bert_encoder.py index 40fbd2da2427907e4f9a0aca3ee1ff1dabf5407e..e9dd91d4bac41931ed50a942a6e22b0e4d8fc2cc 100644 --- a/official/nlp/modeling/networks/bert_encoder.py +++ b/official/nlp/modeling/networks/bert_encoder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,9 +19,9 @@ from typing import Any, Callable, Optional, Union from absl import logging import tensorflow as tf +from official.modeling import tf_utils from official.nlp.modeling import layers - _Initializer = Union[str, tf.keras.initializers.Initializer] _Activation = Union[str, Callable[..., Any]] @@ -48,8 +48,7 @@ class BertEncoderV2(tf.keras.layers.Layer): num_attention_heads: The number of attention heads for each transformer. The hidden size must be divisible by the number of attention heads. max_sequence_length: The maximum sequence length that this encoder can - consume. If None, max_sequence_length uses the value from sequence length. - This determines the variable shape for positional embeddings. + consume. This determines the variable shape for positional embeddings. type_vocab_size: The number of types that the 'type_ids' input can take. inner_dim: The output dimension of the first Dense layer in a two-layer feedforward network for each transformer. 
@@ -74,6 +73,11 @@ class BertEncoderV2(tf.keras.layers.Layer): norm_first: Whether to normalize inputs to attention and intermediate dense layers. If set False, output of attention and intermediate dense layers is normalized. + with_dense_inputs: Whether to accept dense embeddings as the input. + return_attention_scores: Whether to add an additional output containing the + attention scores of all transformer layers. This will be a list of length + `num_layers`, and each element will be in the shape [batch_size, + num_attention_heads, seq_dim, seq_dim]. """ def __init__( @@ -94,6 +98,8 @@ class BertEncoderV2(tf.keras.layers.Layer): embedding_width: Optional[int] = None, embedding_layer: Optional[tf.keras.layers.Layer] = None, norm_first: bool = False, + with_dense_inputs: bool = False, + return_attention_scores: bool = False, **kwargs): # Pops kwargs that are used in V1 implementation. if 'dict_outputs' in kwargs: @@ -110,6 +116,8 @@ class BertEncoderV2(tf.keras.layers.Layer): attention_dropout = kwargs.pop('attention_dropout_rate') super().__init__(**kwargs) + self._output_range = output_range + activation = tf.keras.activations.get(inner_activation) initializer = tf.keras.initializers.get(initializer) @@ -120,20 +128,20 @@ class BertEncoderV2(tf.keras.layers.Layer): self._embedding_layer = layers.OnDeviceEmbedding( vocab_size=vocab_size, embedding_width=embedding_width, - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), name='word_embeddings') else: self._embedding_layer = embedding_layer self._position_embedding_layer = layers.PositionEmbedding( - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), max_length=max_sequence_length, name='position_embedding') self._type_embedding_layer = layers.OnDeviceEmbedding( vocab_size=type_vocab_size, embedding_width=embedding_width, - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), use_one_hot=True, name='type_embeddings') @@ -147,16 
+155,17 @@ class BertEncoderV2(tf.keras.layers.Layer): # 'hidden_size'. self._embedding_projection = None if embedding_width != hidden_size: - self._embedding_projection = tf.keras.layers.experimental.EinsumDense( + self._embedding_projection = tf.keras.layers.EinsumDense( '...x,xy->...y', output_shape=hidden_size, bias_axes='y', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='embedding_projection') self._transformer_layers = [] self._attention_mask_layer = layers.SelfAttentionMask( name='self_attention_mask') + self._num_layers = num_layers for i in range(num_layers): layer = layers.TransformerEncoderBlock( num_attention_heads=num_attention_heads, @@ -165,15 +174,15 @@ class BertEncoderV2(tf.keras.layers.Layer): output_dropout=output_dropout, attention_dropout=attention_dropout, norm_first=norm_first, - output_range=output_range if i == num_layers - 1 else None, - kernel_initializer=initializer, + return_attention_scores=return_attention_scores, + kernel_initializer=tf_utils.clone_initializer(initializer), name='transformer/layer_%d' % i) self._transformer_layers.append(layer) self._pooler_layer = tf.keras.layers.Dense( units=hidden_size, activation='tanh', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='pooler_transform') self._config = { @@ -192,11 +201,24 @@ class BertEncoderV2(tf.keras.layers.Layer): 'embedding_width': embedding_width, 'embedding_layer': embedding_layer, 'norm_first': norm_first, + 'with_dense_inputs': with_dense_inputs, + 'return_attention_scores': return_attention_scores, } - self.inputs = dict( - input_word_ids=tf.keras.Input(shape=(None,), dtype=tf.int32), - input_mask=tf.keras.Input(shape=(None,), dtype=tf.int32), - input_type_ids=tf.keras.Input(shape=(None,), dtype=tf.int32)) + if with_dense_inputs: + self.inputs = dict( + input_word_ids=tf.keras.Input(shape=(None,), dtype=tf.int32), + input_mask=tf.keras.Input(shape=(None,), 
dtype=tf.int32), + input_type_ids=tf.keras.Input(shape=(None,), dtype=tf.int32), + dense_inputs=tf.keras.Input( + shape=(None, embedding_width), dtype=tf.float32), + dense_mask=tf.keras.Input(shape=(None,), dtype=tf.int32), + dense_type_ids=tf.keras.Input(shape=(None,), dtype=tf.int32), + ) + else: + self.inputs = dict( + input_word_ids=tf.keras.Input(shape=(None,), dtype=tf.int32), + input_mask=tf.keras.Input(shape=(None,), dtype=tf.int32), + input_type_ids=tf.keras.Input(shape=(None,), dtype=tf.int32)) def call(self, inputs): word_embeddings = None @@ -205,11 +227,22 @@ class BertEncoderV2(tf.keras.layers.Layer): mask = inputs.get('input_mask') type_ids = inputs.get('input_type_ids') word_embeddings = inputs.get('input_word_embeddings', None) + + dense_inputs = inputs.get('dense_inputs', None) + dense_mask = inputs.get('dense_mask', None) + dense_type_ids = inputs.get('dense_type_ids', None) else: raise ValueError('Unexpected inputs type to %s.' % self.__class__) if word_embeddings is None: word_embeddings = self._embedding_layer(word_ids) + + if dense_inputs is not None: + # Concat the dense embeddings at sequence end. + word_embeddings = tf.concat([word_embeddings, dense_inputs], axis=1) + type_ids = tf.concat([type_ids, dense_type_ids], axis=1) + mask = tf.concat([mask, dense_mask], axis=1) + # absolute position embeddings. 
position_embeddings = self._position_embedding_layer(word_embeddings) type_embeddings = self._type_embedding_layer(type_ids) @@ -224,19 +257,29 @@ class BertEncoderV2(tf.keras.layers.Layer): attention_mask = self._attention_mask_layer(embeddings, mask) encoder_outputs = [] + attention_outputs = [] x = embeddings - for layer in self._transformer_layers: - x = layer([x, attention_mask]) + for i, layer in enumerate(self._transformer_layers): + transformer_output_range = None + if i == self._num_layers - 1: + transformer_output_range = self._output_range + x = layer([x, attention_mask], output_range=transformer_output_range) + if self._config['return_attention_scores']: + x, attention_scores = x + attention_outputs.append(attention_scores) encoder_outputs.append(x) last_encoder_output = encoder_outputs[-1] first_token_tensor = last_encoder_output[:, 0, :] pooled_output = self._pooler_layer(first_token_tensor) - return dict( + output = dict( sequence_output=encoder_outputs[-1], pooled_output=pooled_output, encoder_outputs=encoder_outputs) + if self._config['return_attention_scores']: + output['attention_scores'] = attention_outputs + return output def get_embedding_table(self): return self._embedding_layer.embeddings @@ -299,13 +342,13 @@ class BertEncoder(tf.keras.Model): This determines the variable shape for positional embeddings. type_vocab_size: The number of types that the 'type_ids' input can take. inner_dim: The output dimension of the first Dense layer in a two-layer - feedforward network for each transformer. + feedforward network for each transformer. inner_activation: The activation for the first Dense layer in a two-layer - feedforward network for each transformer. + feedforward network for each transformer. output_dropout: Dropout probability for the post-attention and output - dropout. - attention_dropout: The dropout rate to use for the attention layers - within the transformer layers. + dropout. 
+ attention_dropout: The dropout rate to use for the attention layers within + the transformer layers. initializer: The initialzer to use for all weights in this encoder. output_range: The sequence output range, [0, output_range), by slicing the target sequence of the last transformer layer. `None` means the entire @@ -316,16 +359,20 @@ class BertEncoder(tf.keras.Model): matrices in the shape of ['vocab_size', 'embedding_width'] and ['embedding_width', 'hidden_size'] ('embedding_width' is usually much smaller than 'hidden_size'). - embedding_layer: An optional Layer instance which will be called to - generate embeddings for the input word IDs. - norm_first: Whether to normalize inputs to attention and intermediate - dense layers. If set False, output of attention and intermediate dense - layers is normalized. + embedding_layer: An optional Layer instance which will be called to generate + embeddings for the input word IDs. + norm_first: Whether to normalize inputs to attention and intermediate dense + layers. If set False, output of attention and intermediate dense layers is + normalized. dict_outputs: Whether to use a dictionary as the model outputs. return_all_encoder_outputs: Whether to output sequence embedding outputs of all encoder transformer layers. Note: when the following `dict_outputs` argument is True, all encoder outputs are always returned in the dict, keyed by `encoder_outputs`. + return_attention_scores: Whether to add an additional output containing the + attention scores of all transformer layers. This will be a list of length + `num_layers`, and each element will be in the shape [batch_size, + num_attention_heads, seq_dim, seq_dim]. 
""" def __init__( @@ -347,6 +394,7 @@ class BertEncoder(tf.keras.Model): norm_first=False, dict_outputs=False, return_all_encoder_outputs=False, + return_attention_scores: bool = False, **kwargs): if 'sequence_length' in kwargs: kwargs.pop('sequence_length') @@ -384,7 +432,7 @@ class BertEncoder(tf.keras.Model): embedding_layer_inst = layers.OnDeviceEmbedding( vocab_size=vocab_size, embedding_width=embedding_width, - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), name='word_embeddings') else: embedding_layer_inst = embedding_layer @@ -392,14 +440,14 @@ class BertEncoder(tf.keras.Model): # Always uses dynamic slicing for simplicity. position_embedding_layer = layers.PositionEmbedding( - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), max_length=max_sequence_length, name='position_embedding') position_embeddings = position_embedding_layer(word_embeddings) type_embedding_layer = layers.OnDeviceEmbedding( vocab_size=type_vocab_size, embedding_width=embedding_width, - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), use_one_hot=True, name='type_embeddings') type_embeddings = type_embedding_layer(type_ids) @@ -416,11 +464,11 @@ class BertEncoder(tf.keras.Model): # We project the 'embedding' output to 'hidden_size' if it is not already # 'hidden_size'. 
if embedding_width != hidden_size: - embedding_projection = tf.keras.layers.experimental.EinsumDense( + embedding_projection = tf.keras.layers.EinsumDense( '...x,xy->...y', output_shape=hidden_size, bias_axes='y', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='embedding_projection') embeddings = embedding_projection(embeddings) else: @@ -430,11 +478,11 @@ class BertEncoder(tf.keras.Model): data = embeddings attention_mask = layers.SelfAttentionMask()(data, mask) encoder_outputs = [] + attention_outputs = [] for i in range(num_layers): - if i == num_layers - 1 and output_range is not None: + transformer_output_range = None + if i == num_layers - 1: transformer_output_range = output_range - else: - transformer_output_range = None layer = layers.TransformerEncoderBlock( num_attention_heads=num_attention_heads, inner_dim=inner_dim, @@ -442,11 +490,15 @@ class BertEncoder(tf.keras.Model): output_dropout=output_dropout, attention_dropout=attention_dropout, norm_first=norm_first, - output_range=transformer_output_range, - kernel_initializer=initializer, + return_attention_scores=return_attention_scores, + kernel_initializer=tf_utils.clone_initializer(initializer), name='transformer/layer_%d' % i) transformer_layers.append(layer) - data = layer([data, attention_mask]) + data = layer([data, attention_mask], + output_range=transformer_output_range) + if return_attention_scores: + data, attention_scores = data + attention_outputs.append(attention_scores) encoder_outputs.append(data) last_encoder_output = encoder_outputs[-1] @@ -457,7 +509,7 @@ class BertEncoder(tf.keras.Model): pooler_layer = tf.keras.layers.Dense( units=hidden_size, activation='tanh', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='pooler_transform') cls_output = pooler_layer(first_token_tensor) @@ -466,6 +518,8 @@ class BertEncoder(tf.keras.Model): pooled_output=cls_output, 
encoder_outputs=encoder_outputs, ) + if return_attention_scores: + outputs['attention_scores'] = attention_outputs if dict_outputs: super().__init__( @@ -478,6 +532,8 @@ class BertEncoder(tf.keras.Model): else: sequence_output = outputs['sequence_output'] outputs = [sequence_output, cls_output] + if return_attention_scores: + outputs.append(attention_outputs) super().__init__( # pylint: disable=bad-super-call inputs=[word_ids, mask, type_ids], outputs=outputs, @@ -509,6 +565,7 @@ class BertEncoder(tf.keras.Model): 'embedding_layer': embedding_layer, 'norm_first': norm_first, 'dict_outputs': dict_outputs, + 'return_attention_scores': return_attention_scores, } # pylint: disable=protected-access self._setattr_tracking = False @@ -547,3 +604,4 @@ class BertEncoder(tf.keras.Model): logging.warn(warn_string) return cls(**config) + diff --git a/official/nlp/modeling/networks/bert_encoder_test.py b/official/nlp/modeling/networks/bert_encoder_test.py index 9b3b0826759b4198d282874d4ad57b17422f769c..7bc9b4f27ffeef29a870510dc8d8a01a629d7056 100644 --- a/official/nlp/modeling/networks/bert_encoder_test.py +++ b/official/nlp/modeling/networks/bert_encoder_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -106,6 +106,42 @@ class BertEncoderTest(keras_parameterized.TestCase): self.assertAllEqual(tf.float32, all_encoder_outputs[-1].dtype) self.assertAllEqual(tf.float32, pooled.dtype) + @parameterized.named_parameters( + ("encoder_v2", bert_encoder.BertEncoderV2), + ("encoder_v1", bert_encoder.BertEncoder), + ) + def test_dict_outputs_network_creation_return_attention_scores( + self, encoder_cls): + hidden_size = 32 + sequence_length = 21 + num_attention_heads = 5 + num_layers = 3 + # Create a small BertEncoder for testing. 
+ test_network = encoder_cls( + vocab_size=100, + hidden_size=hidden_size, + num_attention_heads=num_attention_heads, + num_layers=num_layers, + return_attention_scores=True, + dict_outputs=True) + # Create the inputs (note that the first dimension is implicit). + word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + dict_outputs = test_network( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids)) + all_attention_outputs = dict_outputs["attention_scores"] + + expected_data_shape = [ + None, num_attention_heads, sequence_length, sequence_length + ] + self.assertLen(all_attention_outputs, num_layers) + for data in all_attention_outputs: + self.assertAllEqual(expected_data_shape, data.shape.as_list()) + + # The default output dtype is float32. + self.assertAllEqual(tf.float32, all_attention_outputs[-1].dtype) + @parameterized.named_parameters( ("encoder_v2", bert_encoder.BertEncoderV2), ("encoder_v1", bert_encoder.BertEncoder), @@ -369,6 +405,34 @@ class BertEncoderTest(keras_parameterized.TestCase): self.assertAllEqual(tf.float32, all_encoder_outputs[-1].dtype) self.assertAllEqual(tf.float32, pooled.dtype) + def test_attention_scores_output_network_creation(self): + hidden_size = 32 + sequence_length = 21 + num_attention_heads = 5 + num_layers = 3 + # Create a small BertEncoder for testing. + test_network = bert_encoder.BertEncoder( + vocab_size=100, + hidden_size=hidden_size, + num_attention_heads=num_attention_heads, + num_layers=num_layers, + return_attention_scores=True) + # Create the inputs (note that the first dimension is implicit). 
+ word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + _, _, all_attention_outputs = test_network([word_ids, mask, type_ids]) + + expected_data_shape = [ + None, num_attention_heads, sequence_length, sequence_length + ] + self.assertLen(all_attention_outputs, num_layers) + for data in all_attention_outputs: + self.assertAllEqual(expected_data_shape, data.shape.as_list()) + + # The default output dtype is float32. + self.assertAllEqual(tf.float32, all_attention_outputs[-1].dtype) + def test_network_creation_with_float16_dtype(self): hidden_size = 32 sequence_length = 21 @@ -481,8 +545,7 @@ class BertEncoderV2CompatibilityTest(tf.test.TestCase): hidden_size=hidden_size, num_attention_heads=2, num_layers=3, - type_vocab_size=num_types, - output_range=None) + type_vocab_size=num_types) word_id_data = np.random.randint( vocab_size, size=(batch_size, sequence_length)) @@ -541,8 +604,7 @@ class BertEncoderV2CompatibilityTest(tf.test.TestCase): hidden_size=hidden_size, num_attention_heads=2, num_layers=3, - type_vocab_size=num_types, - output_range=None) + type_vocab_size=num_types) word_id_data = np.random.randint( vocab_size, size=(batch_size, sequence_length)) diff --git a/official/nlp/modeling/networks/classification.py b/official/nlp/modeling/networks/classification.py index b91810796e3d9aec66605c6c24a13a46252e6dbf..ce8b1d7048593022e7355de3be2251f531934ebb 100644 --- a/official/nlp/modeling/networks/classification.py +++ b/official/nlp/modeling/networks/classification.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -74,7 +74,7 @@ class Classification(tf.keras.Model): ('Unknown `output` value "%s". `output` can be either "logits" or ' '"predictions"') % output) - super(Classification, self).__init__( + super().__init__( inputs=[cls_output], outputs=output_tensors, **kwargs) # b/164516224 diff --git a/official/nlp/modeling/networks/classification_test.py b/official/nlp/modeling/networks/classification_test.py index ba0360855ec344225398f1e689dfa08106a42656..3f0551813274c4b8eb0549006b5e3a3e9beeb21f 100644 --- a/official/nlp/modeling/networks/classification_test.py +++ b/official/nlp/modeling/networks/classification_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/networks/encoder_scaffold.py b/official/nlp/modeling/networks/encoder_scaffold.py index b71a74b706a93b448b7c2ece8a7bd686cf2d45d8..72130d785a53748bc0f33fc97ce9e3ac2deee3a8 100644 --- a/official/nlp/modeling/networks/encoder_scaffold.py +++ b/official/nlp/modeling/networks/encoder_scaffold.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -21,6 +21,7 @@ from absl import logging import gin import tensorflow as tf +from official.modeling import tf_utils from official.nlp.modeling import layers @@ -153,14 +154,14 @@ class EncoderScaffold(tf.keras.Model): embedding_layer = layers.OnDeviceEmbedding( vocab_size=embedding_cfg['vocab_size'], embedding_width=embedding_cfg['hidden_size'], - initializer=embedding_cfg['initializer'], + initializer=tf_utils.clone_initializer(embedding_cfg['initializer']), name='word_embeddings') word_embeddings = embedding_layer(word_ids) # Always uses dynamic slicing for simplicity. position_embedding_layer = layers.PositionEmbedding( - initializer=embedding_cfg['initializer'], + initializer=tf_utils.clone_initializer(embedding_cfg['initializer']), max_length=embedding_cfg['max_seq_length'], name='position_embedding') position_embeddings = position_embedding_layer(word_embeddings) @@ -168,7 +169,7 @@ class EncoderScaffold(tf.keras.Model): type_embedding_layer = layers.OnDeviceEmbedding( vocab_size=embedding_cfg['type_vocab_size'], embedding_width=embedding_cfg['hidden_size'], - initializer=embedding_cfg['initializer'], + initializer=tf_utils.clone_initializer(embedding_cfg['initializer']), use_one_hot=True, name='type_embeddings') type_embeddings = type_embedding_layer(type_ids) @@ -243,6 +244,8 @@ class EncoderScaffold(tf.keras.Model): # like this will create a SliceOpLambda layer. This is better than a Lambda # layer with Python code, because that is fundamentally less portable. first_token_tensor = last_layer_output[:, 0, :] + pooler_layer_initializer = tf.keras.initializers.get( + pooler_layer_initializer) pooler_layer = tf.keras.layers.Dense( units=pooled_output_dim, activation='tanh', @@ -268,7 +271,7 @@ class EncoderScaffold(tf.keras.Model): # created using the Functional API. Once super().__init__ is called, we # can assign attributes to `self` - note that all `self` assignments are # below this line. 
- super(EncoderScaffold, self).__init__( + super().__init__( inputs=inputs, outputs=outputs, **kwargs) self._hidden_cls = hidden_cls @@ -303,7 +306,8 @@ class EncoderScaffold(tf.keras.Model): config_dict = { 'num_hidden_instances': self._num_hidden_instances, 'pooled_output_dim': self._pooled_output_dim, - 'pooler_layer_initializer': self._pooler_layer_initializer, + 'pooler_layer_initializer': tf.keras.initializers.serialize( + self._pooler_layer_initializer), 'embedding_cls': self._embedding_network, 'embedding_cfg': self._embedding_cfg, 'layer_norm_before_pooling': self._layer_norm_before_pooling, diff --git a/official/nlp/modeling/networks/encoder_scaffold_test.py b/official/nlp/modeling/networks/encoder_scaffold_test.py index 433343ae8fbbe5798b33cff3d3b7a8c544e5e2d7..bc0b02e3cf0f4c69e96a623d6f11ddb58594569a 100644 --- a/official/nlp/modeling/networks/encoder_scaffold_test.py +++ b/official/nlp/modeling/networks/encoder_scaffold_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/networks/fnet.py b/official/nlp/modeling/networks/fnet.py new file mode 100644 index 0000000000000000000000000000000000000000..ac9676699425ec31d51b52fac15d40095a79acb4 --- /dev/null +++ b/official/nlp/modeling/networks/fnet.py @@ -0,0 +1,355 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +#     http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""FNet encoder network. + +Based on ["FNet: Mixing Tokens with Fourier Transforms"] +(https://aclanthology.org/2022.naacl-main.319/). +""" +# pylint: disable=g-classes-have-attributes + +from typing import Any, Callable, Optional, Sequence, Union +from absl import logging +import tensorflow as tf + +from official.modeling import tf_utils +from official.nlp.modeling import layers + +_Activation = Union[str, Callable[..., Any]] +_Initializer = Union[str, tf.keras.initializers.Initializer] + +_approx_gelu = lambda x: tf.keras.activations.gelu(x, approximate=True) + + +class FNet(tf.keras.layers.Layer): +  """FNet encoder network. + +  Based on ["FNet: Mixing Tokens with Fourier Transforms"] +  (https://aclanthology.org/2022.naacl-main.319/). FNet is an efficient +  Transformer-like encoder network that replaces self-attention sublayers with +  Fourier sublayers. + +  This implementation defaults to the canonical FNet Base model, but the network +  also supports more general mixing models (e.g. 'Linear', 'HNet') and hybrid +  models (e.g. 'FNet-Hybrid') that use both mixing and self-attention +  layers. The input length is fixed to 'max_sequence_length'. + +  Args: +    vocab_size: The size of the token vocabulary. +    hidden_size: The size of the transformer hidden layers. +    num_layers: The number of transformer layers. +    mixing_mechanism: Type of mixing mechanism used in place of self-attention +      layers. Defaults to FNet ('Fourier') mixing. +    use_fft: Only used for spectral mixing mechanisms.
Determines whether to use + Fast Fourier Transform (True) or the Discrete Fourier Transform (DFT) + matrix (False; default) to compute the Fourier Transform. See + layers.FourierTransformLayer or layers.HartleyTransformLayer for advice. + attention_layers: Specifies which layers, if any, should be attention layers + in the encoder. The remaining [0, num_layers) setminus attention_layers + will use the specified `mixing_mechanism`. If using attention layers, a + good rule of thumb is to place them in the final few layers. + num_attention_heads: The number of attention heads for each transformer. The + hidden size must be divisible by the number of attention heads. + max_sequence_length: The only sequence length that this encoder can + consume. This determines the variable shape for positional embeddings and + the size of the mixing matrices. + type_vocab_size: The number of types that the 'type_ids' input can take. + inner_dim: The output dimension of the first Dense layer in a two-layer + feedforward network for each transformer. + inner_activation: The activation for the first Dense layer in a two-layer + feedforward network for each transformer. + output_dropout: Dropout probability for the post-attention and output + dropout. + attention_dropout: The dropout rate to use for the attention layers within + the transformer layers. + initializer: The initializer to use for all weights in this encoder. + output_range: The sequence output range, [0, output_range), by slicing the + target sequence of the last transformer layer. `None` means the entire + target sequence will attend to the source sequence, which yields the full + output. + embedding_width: The width of the word embeddings. If the embedding width is + not equal to hidden size, embedding parameters will be factorized into two + matrices in the shape of ['vocab_size', 'embedding_width'] and + ['embedding_width', 'hidden_size'] ('embedding_width' is usually much + smaller than 'hidden_size'). 
+ embedding_layer: An optional Layer instance which will be called to generate + embeddings for the input word IDs. + norm_first: Whether to normalize inputs to attention and intermediate dense + layers. If set False, output of attention and intermediate dense layers is + normalized. + with_dense_inputs: Whether to accept dense embeddings as the input. + """ + + def __init__( + self, + vocab_size: int, + hidden_size: int = 768, + num_layers: int = 12, + mixing_mechanism: layers.MixingMechanism = layers.MixingMechanism.FOURIER, + use_fft: bool = False, + attention_layers: Sequence[int] = (), + num_attention_heads: int = 12, + max_sequence_length: int = 512, + type_vocab_size: int = 16, + inner_dim: int = 3072, + inner_activation: _Activation = _approx_gelu, + output_dropout: float = 0.1, + attention_dropout: float = 0.1, + initializer: _Initializer = tf.keras.initializers.TruncatedNormal( + stddev=0.02), + output_range: Optional[int] = None, + embedding_width: Optional[int] = None, + embedding_layer: Optional[tf.keras.layers.Layer] = None, + norm_first: bool = False, + with_dense_inputs: bool = False, + **kwargs): + super().__init__(**kwargs) + + activation = tf.keras.activations.get(inner_activation) + initializer = tf.keras.initializers.get(initializer) + + if embedding_width is None: + embedding_width = hidden_size + + self._config = { + 'vocab_size': vocab_size, + 'hidden_size': hidden_size, + 'num_layers': num_layers, + 'mixing_mechanism': mixing_mechanism, + 'use_fft': use_fft, + 'attention_layers': attention_layers, + 'num_attention_heads': num_attention_heads, + 'max_sequence_length': max_sequence_length, + 'type_vocab_size': type_vocab_size, + 'inner_dim': inner_dim, + 'inner_activation': tf.keras.activations.serialize(activation), + 'output_dropout': output_dropout, + 'attention_dropout': attention_dropout, + 'initializer': tf.keras.initializers.serialize(initializer), + 'output_range': output_range, + 'embedding_width': embedding_width, + 
'embedding_layer': embedding_layer, + 'norm_first': norm_first, + 'with_dense_inputs': with_dense_inputs, + } + + if embedding_layer is None: + self._embedding_layer = layers.OnDeviceEmbedding( + vocab_size=vocab_size, + embedding_width=embedding_width, + initializer=tf_utils.clone_initializer(initializer), + name='word_embeddings') + else: + self._embedding_layer = embedding_layer + + self._position_embedding_layer = layers.PositionEmbedding( + initializer=tf_utils.clone_initializer(initializer), + max_length=max_sequence_length, + name='position_embedding') + + self._type_embedding_layer = layers.OnDeviceEmbedding( + vocab_size=type_vocab_size, + embedding_width=embedding_width, + initializer=tf_utils.clone_initializer(initializer), + use_one_hot=True, + name='type_embeddings') + + self._embedding_norm_layer = tf.keras.layers.LayerNormalization( + name='embeddings/layer_norm', axis=-1, epsilon=1e-12, dtype=tf.float32) + + self._embedding_dropout = tf.keras.layers.Dropout( + rate=output_dropout, name='embedding_dropout') + + # We project the 'embedding' output to 'hidden_size' if it is not already + # 'hidden_size'. 
+ self._embedding_projection = None + if embedding_width != hidden_size: + self._embedding_projection = tf.keras.layers.EinsumDense( + '...x,xy->...y', + output_shape=hidden_size, + bias_axes='y', + kernel_initializer=tf_utils.clone_initializer(initializer), + name='embedding_projection') + + self._transformer_layers = [] + for layer in range(num_layers): + if layer in attention_layers: + mixing_layer = layers.MultiHeadAttention( + num_heads=num_attention_heads, + key_dim=int(hidden_size // num_attention_heads), + dropout=attention_dropout, + use_bias=True, + kernel_initializer=tf_utils.clone_initializer(initializer), + name='self_attention', + ) + else: + mixing_layer = self._init_mixing_sublayer(layer) + + block = layers.TransformerScaffold( + num_attention_heads=num_attention_heads, + inner_dim=inner_dim, + inner_activation=inner_activation, + attention_cls=mixing_layer, + feedforward_cls=None, # Fallback to default FeedForward class + output_dropout=output_dropout, + attention_dropout=attention_dropout, + norm_first=norm_first, + output_range=output_range if layer == num_layers - 1 else None, + kernel_initializer=tf_utils.clone_initializer(initializer), + name='transformer/layer_%d' % layer) + self._transformer_layers.append(block) + + self._attention_mask_layer = layers.SelfAttentionMask( + name='self_attention_mask') + + self._pooler_layer = tf.keras.layers.Dense( + units=hidden_size, + activation='tanh', + kernel_initializer=tf_utils.clone_initializer(initializer), + name='pooler_transform') + + if with_dense_inputs: + self.inputs = dict( + input_word_ids=tf.keras.Input( + shape=(max_sequence_length,), dtype=tf.int32), + input_mask=tf.keras.Input( + shape=(max_sequence_length,), dtype=tf.int32), + input_type_ids=tf.keras.Input( + shape=(max_sequence_length,), dtype=tf.int32), + dense_inputs=tf.keras.Input( + shape=(max_sequence_length, embedding_width), dtype=tf.float32), + dense_mask=tf.keras.Input( + shape=(max_sequence_length,), dtype=tf.int32), + 
dense_type_ids=tf.keras.Input( + shape=(max_sequence_length,), dtype=tf.int32), + ) + else: + self.inputs = dict( + input_word_ids=tf.keras.Input( + shape=(max_sequence_length,), dtype=tf.int32), + input_mask=tf.keras.Input( + shape=(max_sequence_length,), dtype=tf.int32), + input_type_ids=tf.keras.Input( + shape=(max_sequence_length,), dtype=tf.int32)) + self._max_sequence_length = max_sequence_length + + def call(self, inputs): + word_embeddings = None + if isinstance(inputs, dict): + word_ids = inputs.get('input_word_ids') + mask = inputs.get('input_mask') + type_ids = inputs.get('input_type_ids') + word_embeddings = inputs.get('input_word_embeddings', None) + + dense_inputs = inputs.get('dense_inputs', None) + dense_mask = inputs.get('dense_mask', None) + dense_type_ids = inputs.get('dense_type_ids', None) + else: + raise ValueError('Unexpected inputs type (%s) to %s.' % + (type(inputs), self.__class__)) + + if word_embeddings is None: + word_embeddings = self._embedding_layer(word_ids) + + if dense_inputs is not None: + # Concat the dense embeddings at sequence end. + word_embeddings = tf.concat([word_embeddings, dense_inputs], axis=1) + type_ids = tf.concat([type_ids, dense_type_ids], axis=1) + mask = tf.concat([mask, dense_mask], axis=1) + + seq_length = word_embeddings.shape[1] + if seq_length != self._max_sequence_length: + raise ValueError('FNet: Sequence length must be the same as ' + '`max_sequence_length` ({}), but it is {}.'.format( + self._max_sequence_length, seq_length)) + + # Absolute position embeddings. 
+    position_embeddings = self._position_embedding_layer(word_embeddings) +    type_embeddings = self._type_embedding_layer(type_ids) + +    embeddings = word_embeddings + position_embeddings + type_embeddings +    embeddings = self._embedding_norm_layer(embeddings) +    embeddings = self._embedding_dropout(embeddings) + +    if self._embedding_projection is not None: +      embeddings = self._embedding_projection(embeddings) + +    attention_mask = self._attention_mask_layer(embeddings, mask) + +    encoder_outputs = [] +    x = embeddings +    for layer in self._transformer_layers: +      x = layer([x, attention_mask]) +      encoder_outputs.append(x) + +    last_encoder_output = encoder_outputs[-1] +    first_token_tensor = last_encoder_output[:, 0, :] +    pooled_output = self._pooler_layer(first_token_tensor) + +    output = dict( +        sequence_output=encoder_outputs[-1], +        pooled_output=pooled_output, +        encoder_outputs=encoder_outputs) +    return output + +  def get_embedding_table(self): +    return self._embedding_layer.embeddings + +  def get_embedding_layer(self): +    return self._embedding_layer + +  def get_config(self): +    return dict(self._config) + +  @property +  def transformer_layers(self): +    """List of Transformer layers in the encoder.""" +    return self._transformer_layers + +  @property +  def pooler_layer(self): +    """The pooler dense layer after the transformer layers.""" +    return self._pooler_layer + +  @classmethod +  def from_config(cls, config, custom_objects=None): +    if 'embedding_layer' in config and config['embedding_layer'] is not None: +      warn_string = ( +          'You are reloading a model that was saved with a ' +          'potentially-shared embedding layer object. If you continue to ' +          'train this model, the embedding layer will no longer be shared.
' + 'To work around this, load the model outside of the Keras API.') + print('WARNING: ' + warn_string) + logging.warn(warn_string) + + return cls(**config) + + def _init_mixing_sublayer(self, layer: int): + """Initializes config-dependent mixing sublayer.""" + if self._config['mixing_mechanism'] == layers.MixingMechanism.FOURIER: + mixing_sublayer = layers.FourierTransformLayer( + use_fft=self._config['use_fft'], name='fourier_transform') + elif self._config['mixing_mechanism'] == layers.MixingMechanism.HARTLEY: + mixing_sublayer = layers.HartleyTransformLayer( + use_fft=self._config['use_fft'], name='hartley_transform') + elif self._config['mixing_mechanism'] == layers.MixingMechanism.LINEAR: + mixing_sublayer = layers.LinearTransformLayer( + kernel_initializer=tf_utils.clone_initializer( + self._config['initializer']), + name='linear_transform') + else: + raise ValueError('Unsupported mixing mechanism: %s' % + self._config['mixing_mechanism']) + + return mixing_sublayer diff --git a/official/nlp/modeling/networks/fnet_test.py b/official/nlp/modeling/networks/fnet_test.py new file mode 100644 index 0000000000000000000000000000000000000000..09a32e2d13c211152aeddc07b06eb27e8ec4a6d7 --- /dev/null +++ b/official/nlp/modeling/networks/fnet_test.py @@ -0,0 +1,119 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for FNet encoder network.""" + +from typing import Sequence + +from absl.testing import parameterized +import tensorflow as tf + +from official.nlp.modeling import layers +from official.nlp.modeling.networks import fnet + + +class FNetTest(parameterized.TestCase, tf.test.TestCase): + + def tearDown(self): + super(FNetTest, self).tearDown() + tf.keras.mixed_precision.set_global_policy("float32") + + @parameterized.named_parameters( + ("fnet", layers.MixingMechanism.FOURIER, ()), + ("fnet_hybrid", layers.MixingMechanism.FOURIER, (1, 2)), + ("hnet", layers.MixingMechanism.HARTLEY, ()), + ("hnet_hybrid", layers.MixingMechanism.HARTLEY, (1, 2)), + ("linear", layers.MixingMechanism.LINEAR, ()), + ("linear_hybrid", layers.MixingMechanism.LINEAR, (0,)), + ("bert", layers.MixingMechanism.FOURIER, (0, 1, 2)), + ) + def test_network(self, mixing_mechanism: layers.MixingMechanism, + attention_layers: Sequence[int]): + num_layers = 3 + hidden_size = 32 + sequence_length = 21 + test_network = fnet.FNet( + vocab_size=100, + hidden_size=hidden_size, + num_attention_heads=2, + max_sequence_length=sequence_length, + num_layers=num_layers, + mixing_mechanism=mixing_mechanism, + attention_layers=attention_layers) + + # Create the inputs (note that the first dimension is implicit). 
+ word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + + dict_outputs = test_network( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids)) + data = dict_outputs["sequence_output"] + pooled = dict_outputs["pooled_output"] + + self.assertIsInstance(test_network.transformer_layers, list) + self.assertLen(test_network.transformer_layers, 3) + self.assertIsInstance(test_network.pooler_layer, tf.keras.layers.Dense) + + expected_data_shape = [None, sequence_length, hidden_size] + expected_pooled_shape = [None, hidden_size] + self.assertAllEqual(expected_data_shape, data.shape.as_list()) + self.assertAllEqual(expected_pooled_shape, pooled.shape.as_list()) + + # The default output dtype is float32. + self.assertAllEqual(tf.float32, data.dtype) + self.assertAllEqual(tf.float32, pooled.dtype) + + def test_embeddings_as_inputs(self): + hidden_size = 32 + sequence_length = 21 + test_network = fnet.FNet( + vocab_size=100, + hidden_size=hidden_size, + num_attention_heads=2, + max_sequence_length=sequence_length, + num_layers=3) + + # Create the inputs (note that the first dimension is implicit). + word_ids = tf.keras.Input(shape=(sequence_length), dtype=tf.int32) + mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + + test_network.build( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids)) + embeddings = test_network.get_embedding_layer()(word_ids) + + # Calls with the embeddings. 
+ dict_outputs = test_network( + dict( + input_word_embeddings=embeddings, + input_mask=mask, + input_type_ids=type_ids)) + all_encoder_outputs = dict_outputs["encoder_outputs"] + pooled = dict_outputs["pooled_output"] + + expected_data_shape = [None, sequence_length, hidden_size] + expected_pooled_shape = [None, hidden_size] + self.assertLen(all_encoder_outputs, 3) + for data in all_encoder_outputs: + self.assertAllEqual(expected_data_shape, data.shape.as_list()) + self.assertAllEqual(expected_pooled_shape, pooled.shape.as_list()) + + # The default output dtype is float32. + self.assertAllEqual(tf.float32, all_encoder_outputs[-1].dtype) + self.assertAllEqual(tf.float32, pooled.dtype) + + +if __name__ == "__main__": + tf.test.main() diff --git a/official/nlp/modeling/networks/funnel_transformer.py b/official/nlp/modeling/networks/funnel_transformer.py index fd3d10c114f453d5afab515ccc7d89e9533b7bd7..f1957f870196fc8ddd06f1ef57522a8403a3016b 100644 --- a/official/nlp/modeling/networks/funnel_transformer.py +++ b/official/nlp/modeling/networks/funnel_transformer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -20,6 +20,7 @@ from absl import logging import numpy as np import tensorflow as tf +from official.modeling import tf_utils from official.nlp.modeling import layers _Initializer = Union[str, tf.keras.initializers.Initializer] @@ -226,6 +227,7 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer): funnel encoder relies on. share_rezero: bool. Whether to share ReZero alpha between the attention layer and the ffn layer. This option is specific to ReZero. + with_dense_inputs: Whether to accept dense embeddings as the input. 
""" def __init__( @@ -251,9 +253,14 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer): norm_first: bool = False, transformer_cls: Union[ str, tf.keras.layers.Layer] = layers.TransformerEncoderBlock, - share_rezero: bool = True, + share_rezero: bool = False, **kwargs): super().__init__(**kwargs) + + if output_range is not None: + logging.warning('`output_range` is available as an argument for `call()`.' + 'The `output_range` as __init__ argument is deprecated.') + activation = tf.keras.activations.get(inner_activation) initializer = tf.keras.initializers.get(initializer) @@ -264,20 +271,20 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer): self._embedding_layer = layers.OnDeviceEmbedding( vocab_size=vocab_size, embedding_width=embedding_width, - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), name='word_embeddings') else: self._embedding_layer = embedding_layer self._position_embedding_layer = layers.PositionEmbedding( - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), max_length=max_sequence_length, name='position_embedding') self._type_embedding_layer = layers.OnDeviceEmbedding( vocab_size=type_vocab_size, embedding_width=embedding_width, - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), use_one_hot=True, name='type_embeddings') @@ -291,11 +298,11 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer): # 'hidden_size'. self._embedding_projection = None if embedding_width != hidden_size: - self._embedding_projection = tf.keras.layers.experimental.EinsumDense( + self._embedding_projection = tf.keras.layers.EinsumDense( '...x,xy->...y', output_shape=hidden_size, bias_axes='y', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='embedding_projection') self._transformer_layers = [] @@ -304,6 +311,7 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer): # Will raise an error if the string is not supported. 
if isinstance(transformer_cls, str): transformer_cls = _str2transformer_cls[transformer_cls] + self._num_layers = num_layers for i in range(num_layers): layer = transformer_cls( num_attention_heads=num_attention_heads, @@ -314,8 +322,7 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer): output_dropout=output_dropout, attention_dropout=attention_dropout, norm_first=norm_first, - output_range=output_range if i == num_layers - 1 else None, - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), share_rezero=share_rezero, name='transformer/layer_%d' % i) self._transformer_layers.append(layer) @@ -323,7 +330,7 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer): self._pooler_layer = tf.keras.layers.Dense( units=hidden_size, activation='tanh', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='pooler_transform') if isinstance(pool_stride, int): # TODO(b/197133196): Pooling layer can be shared. @@ -341,9 +348,6 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer): # TODO(b/203665205): unpool_length should be implemented. if unpool_length != 0: raise ValueError('unpool_length is not supported by truncated_avg now.') - # Compute the attention masks and pooling transforms. - self._pooling_transforms = _create_truncated_avg_transforms( - max_sequence_length, pool_strides) else: raise ValueError('pool_type not supported.') @@ -357,6 +361,7 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer): name='att_input_pool_layer') self._att_input_pool_layers.append(att_input_pool_layer) + self._max_sequence_length = max_sequence_length self._pool_strides = pool_strides # This is a list here. 
self._unpool_length = unpool_length self._pool_type = pool_type @@ -402,12 +407,22 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer): _transformer_cls2str.get(transformer_cls, str(transformer_cls)) } - def call(self, inputs): + self.inputs = dict( + input_word_ids=tf.keras.Input(shape=(None,), dtype=tf.int32), + input_mask=tf.keras.Input(shape=(None,), dtype=tf.int32), + input_type_ids=tf.keras.Input(shape=(None,), dtype=tf.int32)) + + def call(self, inputs, output_range: Optional[tf.Tensor] = None): # inputs are [word_ids, mask, type_ids] if isinstance(inputs, (list, tuple)): logging.warning('List inputs to %s are discouraged.', self.__class__) if len(inputs) == 3: word_ids, mask, type_ids = inputs + dense_inputs = None + dense_mask = None + dense_type_ids = None + elif len(inputs) == 6: + word_ids, mask, type_ids, dense_inputs, dense_mask, dense_type_ids = inputs else: raise ValueError('Unexpected inputs to %s with length at %d.' % (self.__class__, len(inputs))) @@ -415,10 +430,21 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer): word_ids = inputs.get('input_word_ids') mask = inputs.get('input_mask') type_ids = inputs.get('input_type_ids') + + dense_inputs = inputs.get('dense_inputs', None) + dense_mask = inputs.get('dense_mask', None) + dense_type_ids = inputs.get('dense_type_ids', None) else: raise ValueError('Unexpected inputs type to %s.' % self.__class__) word_embeddings = self._embedding_layer(word_ids) + + if dense_inputs is not None: + # Concat the dense embeddings at sequence begin so unpool_len can control + # embedding not being pooled. 
+ word_embeddings = tf.concat([dense_inputs, word_embeddings], axis=1) + type_ids = tf.concat([dense_type_ids, type_ids], axis=1) + mask = tf.concat([dense_mask, mask], axis=1) # absolute position embeddings position_embeddings = self._position_embedding_layer(word_embeddings) type_embeddings = self._type_embedding_layer(type_ids) @@ -456,7 +482,9 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer): x[:, :self._unpool_length, :], dtype=pooled_inputs.dtype), pooled_inputs), axis=1) - x = layer([query_inputs, x, attention_mask]) + x = layer([query_inputs, x, attention_mask], + output_range=output_range if i == self._num_layers - + 1 else None) # Pools the corresponding attention_mask. if i < len(self._transformer_layers) - 1: attention_mask = _pool_and_concat( @@ -466,25 +494,35 @@ class FunnelTransformerEncoder(tf.keras.layers.Layer): axes=[1, 2]) encoder_outputs.append(x) elif self._pool_type == _TRUNCATED_AVG: + # Compute the attention masks and pooling transforms. + # Note we do not compute this in __init__ due to inference converter issue + # b/215659399. + pooling_transforms = _create_truncated_avg_transforms( + self._max_sequence_length, self._pool_strides) attention_masks = _create_truncated_avg_masks(mask, self._pool_strides, - self._pooling_transforms) + pooling_transforms) for i, layer in enumerate(self._transformer_layers): attention_mask = attention_masks[i] + transformer_output_range = None + if i == self._num_layers - 1: + transformer_output_range = output_range # Bypass no pooling cases. if self._pool_strides[i] == 1: - x = layer([x, x, attention_mask]) + x = layer([x, x, attention_mask], + output_range=transformer_output_range) else: pooled_inputs = tf.einsum( 'BFD,FT->BTD', tf.cast(x[:, self._unpool_length:, :], _get_policy_dtype() ), # extra casting for faster mixed computation. 
- self._pooling_transforms[i]) + pooling_transforms[i]) query_inputs = tf.concat( values=(tf.cast( x[:, :self._unpool_length, :], dtype=pooled_inputs.dtype), pooled_inputs), axis=1) - x = layer([query_inputs, x, attention_mask]) + x = layer([query_inputs, x, attention_mask], + output_range=transformer_output_range) encoder_outputs.append(x) last_encoder_output = encoder_outputs[-1] diff --git a/official/nlp/modeling/networks/funnel_transformer_test.py b/official/nlp/modeling/networks/funnel_transformer_test.py index 26a519d433c4e64fc96a32c78aa767ec215c0ed9..202b07e7319819debb43e2de2110f9faf94bbe18 100644 --- a/official/nlp/modeling/networks/funnel_transformer_test.py +++ b/official/nlp/modeling/networks/funnel_transformer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -101,6 +101,55 @@ class FunnelTransformerEncoderTest(parameterized.TestCase, tf.test.TestCase): self.assertAllEqual(tf.float32, data.dtype) self.assertAllEqual(pooled_dtype, pooled.dtype) + def test_network_creation_dense(self): + tf.keras.mixed_precision.set_global_policy("mixed_float16") + pool_type = "avg" + + hidden_size = 32 + sequence_length = 21 + dense_sequence_length = 3 + pool_stride = 2 + num_layers = 3 + # Create a small FunnelTransformerEncoder for testing. + test_network = funnel_transformer.FunnelTransformerEncoder( + vocab_size=100, + hidden_size=hidden_size, + num_attention_heads=2, + num_layers=num_layers, + pool_stride=pool_stride, + pool_type=pool_type, + max_sequence_length=sequence_length + dense_sequence_length, + unpool_length=0, + transformer_cls="TransformerEncoderBlock") + # Create the inputs (note that the first dimension is implicit). 
+ word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + + dense_inputs = tf.keras.Input( + shape=(dense_sequence_length, hidden_size), dtype=tf.float32) + dense_mask = tf.keras.Input(shape=(dense_sequence_length,), dtype=tf.int32) + dense_type_ids = tf.keras.Input( + shape=(dense_sequence_length,), dtype=tf.int32) + + dict_outputs = test_network( + [word_ids, mask, type_ids, dense_inputs, dense_mask, dense_type_ids]) + data = dict_outputs["sequence_output"] + pooled = dict_outputs["pooled_output"] + + self.assertIsInstance(test_network.transformer_layers, list) + self.assertLen(test_network.transformer_layers, num_layers) + self.assertIsInstance(test_network.pooler_layer, tf.keras.layers.Dense) + + # Stride=2 compresses sequence length to half the size at each layer. + # For pool_type = max or avg, + # this configuration gives each layer of seq length: 24->12->6->3. + expected_data_shape = [None, 3, hidden_size] + expected_pooled_shape = [None, hidden_size] + + self.assertAllEqual(expected_data_shape, data.shape.as_list()) + self.assertAllEqual(expected_pooled_shape, pooled.shape.as_list()) + def test_invalid_stride_and_num_layers(self): hidden_size = 32 num_layers = 3 @@ -180,14 +229,14 @@ class FunnelTransformerEncoderTest(parameterized.TestCase, tf.test.TestCase): num_attention_heads=2, num_layers=3, type_vocab_size=num_types, - output_range=output_range, pool_stride=pool_stride, unpool_length=unpool_length) # Create the inputs (note that the first dimension is implicit). 
word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) - dict_outputs = test_network([word_ids, mask, type_ids]) + dict_outputs = test_network([word_ids, mask, type_ids], + output_range=output_range) data = dict_outputs["sequence_output"] pooled = dict_outputs["pooled_output"] diff --git a/official/nlp/modeling/networks/mobile_bert_encoder.py b/official/nlp/modeling/networks/mobile_bert_encoder.py index 8f3dcd9f2d22e77a1f6f780eeb560bc006d31ae9..46b2dbb21c00dc8af19f65606f1fd95443c76890 100644 --- a/official/nlp/modeling/networks/mobile_bert_encoder.py +++ b/official/nlp/modeling/networks/mobile_bert_encoder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -146,7 +146,7 @@ class MobileBERTEncoder(tf.keras.Model): first_token = tf.squeeze(prev_output[:, 0:1, :], axis=1) if classifier_activation: - self._pooler_layer = tf.keras.layers.experimental.EinsumDense( + self._pooler_layer = tf.keras.layers.EinsumDense( 'ab,bc->ac', output_shape=hidden_size, activation=tf.tanh, @@ -163,7 +163,7 @@ class MobileBERTEncoder(tf.keras.Model): encoder_outputs=all_layer_outputs, attention_scores=all_attention_scores) - super(MobileBERTEncoder, self).__init__( + super().__init__( inputs=self.inputs, outputs=outputs, **kwargs) def get_embedding_table(self): diff --git a/official/nlp/modeling/networks/mobile_bert_encoder_test.py b/official/nlp/modeling/networks/mobile_bert_encoder_test.py index 2360e7202f87686a83d11bf8d9fd66d1281c1cf1..1b119005b325ab2c3ca46519d5d1bcdda05f971b 100644 --- a/official/nlp/modeling/networks/mobile_bert_encoder_test.py +++ b/official/nlp/modeling/networks/mobile_bert_encoder_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/networks/packed_sequence_embedding.py b/official/nlp/modeling/networks/packed_sequence_embedding.py index 353c5e88e21bd9708d3e9aceac234de71e9788be..6457e736b15c6e4d588fb90cd5e3cd1c39b674bb 100644 --- a/official/nlp/modeling/networks/packed_sequence_embedding.py +++ b/official/nlp/modeling/networks/packed_sequence_embedding.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -97,13 +97,13 @@ class PackedSequenceEmbedding(tf.keras.Model): embedding_layer = layers.OnDeviceEmbedding( vocab_size=vocab_size, embedding_width=embedding_width, - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), name='word_embeddings') word_embeddings = embedding_layer(word_ids) # Always uses dynamic slicing for simplicity. position_embedding_layer = PositionEmbeddingWithSubSeqMask( - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), use_dynamic_slicing=True, max_sequence_length=max_seq_length, name='position_embedding') @@ -114,7 +114,7 @@ class PackedSequenceEmbedding(tf.keras.Model): layers.OnDeviceEmbedding( vocab_size=type_vocab_size, embedding_width=embedding_width, - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), use_one_hot=True, name='type_embeddings')(type_ids)) @@ -128,11 +128,11 @@ class PackedSequenceEmbedding(tf.keras.Model): embeddings) if embedding_width != hidden_size: - embeddings = tf.keras.layers.experimental.EinsumDense( + embeddings = tf.keras.layers.EinsumDense( '...x,xy->...y', output_shape=hidden_size, bias_axes=None, - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='embedding_projection')( embeddings) @@ -143,7 +143,7 @@ class PackedSequenceEmbedding(tf.keras.Model): [attention_mask, sub_seq_mask]) outputs = [embeddings, attention_mask] - super(PackedSequenceEmbedding, self).__init__( + super().__init__( inputs=inputs, outputs=outputs, **kwargs) # TF does not track immutable attrs which do not contain Trackables, # so by creating a config namedtuple instead of a dict we avoid tracking it. 
@@ -221,7 +221,7 @@ class PositionEmbeddingWithSubSeqMask(tf.keras.layers.Layer): if 'dtype' not in kwargs: kwargs['dtype'] = 'float32' - super(PositionEmbeddingWithSubSeqMask, self).__init__(**kwargs) + super().__init__(**kwargs) if use_dynamic_slicing and max_sequence_length is None: raise ValueError( 'If `use_dynamic_slicing` is True, `max_sequence_length` must be set.' @@ -236,7 +236,7 @@ class PositionEmbeddingWithSubSeqMask(tf.keras.layers.Layer): 'initializer': tf.keras.initializers.serialize(self._initializer), 'use_dynamic_slicing': self._use_dynamic_slicing, } - base_config = super(PositionEmbeddingWithSubSeqMask, self).get_config() + base_config = super().get_config() return dict(list(base_config.items()) + list(config.items())) def build(self, input_shape): @@ -273,7 +273,7 @@ class PositionEmbeddingWithSubSeqMask(tf.keras.layers.Layer): shape=[weight_sequence_length, width], initializer=self._initializer) - super(PositionEmbeddingWithSubSeqMask, self).build(input_shape) + super().build(input_shape) def call(self, inputs, position_ids=None, sub_sequence_mask=None): """Implements call() for the layer. diff --git a/official/nlp/modeling/networks/packed_sequence_embedding_test.py b/official/nlp/modeling/networks/packed_sequence_embedding_test.py index bfab20ba33898d66fc6e4e4e8e13b30548ac00bb..64080f3c8f227a4155669e4818686a7e8a977b1c 100644 --- a/official/nlp/modeling/networks/packed_sequence_embedding_test.py +++ b/official/nlp/modeling/networks/packed_sequence_embedding_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/nlp/modeling/networks/span_labeling.py b/official/nlp/modeling/networks/span_labeling.py index efbf69d19216b24b1af492b4eec7a080d8457265..7da8a174e6c5be59b10b996d77b3e5c7e43a4b5a 100644 --- a/official/nlp/modeling/networks/span_labeling.py +++ b/official/nlp/modeling/networks/span_labeling.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,6 +17,8 @@ import collections import tensorflow as tf +from official.modeling import tf_utils + def _apply_paragraph_mask(logits, paragraph_mask): """Applies a position mask to calculated logits.""" @@ -79,7 +81,7 @@ class SpanLabeling(tf.keras.Model): # created using the Functional API. Once super().__init__ is called, we # can assign attributes to `self` - note that all `self` assignments are # below this line. 
- super(SpanLabeling, self).__init__( + super().__init__( inputs=[sequence_data], outputs=output_tensors, **kwargs) config_dict = { 'input_width': input_width, @@ -156,12 +158,12 @@ class XLNetSpanLabeling(tf.keras.layers.Layer): self._end_n_top = end_n_top self.start_logits_dense = tf.keras.layers.Dense( units=1, - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='predictions/transform/start_logits') self.end_logits_inner_dense = tf.keras.layers.Dense( units=input_width, - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), activation=activation, name='predictions/transform/end_logits/inner') self.end_logits_layer_norm = tf.keras.layers.LayerNormalization( @@ -169,18 +171,18 @@ class XLNetSpanLabeling(tf.keras.layers.Layer): name='predictions/transform/end_logits/layernorm') self.end_logits_output_dense = tf.keras.layers.Dense( units=1, - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='predictions/transform/end_logits/output') self.answer_logits_inner = tf.keras.layers.Dense( units=input_width, - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), activation=activation, name='predictions/transform/answer_logits/inner') self.answer_logits_dropout = tf.keras.layers.Dropout(rate=dropout_rate) self.answer_logits_output = tf.keras.layers.Dense( units=1, - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), use_bias=False, name='predictions/transform/answer_logits/output') diff --git a/official/nlp/modeling/networks/span_labeling_test.py b/official/nlp/modeling/networks/span_labeling_test.py index 45084520e0cccdb21d6e1aae146a8cb3e2fe9f99..a51a0a7c6ec8c43f1168abb10b2e400e918178ee 100644 --- a/official/nlp/modeling/networks/span_labeling_test.py +++ b/official/nlp/modeling/networks/span_labeling_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The 
TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/networks/xlnet_base.py b/official/nlp/modeling/networks/xlnet_base.py index ce32d3dfdda85cdeec5ef1cad4bf7cfbb8d43787..337fd8259ff7ec873c42b4d177280fb3d5518468 100644 --- a/official/nlp/modeling/networks/xlnet_base.py +++ b/official/nlp/modeling/networks/xlnet_base.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,6 +18,7 @@ from absl import logging import tensorflow as tf +from official.modeling import tf_utils from official.nlp.modeling import layers from official.nlp.modeling.layers import transformer_xl @@ -383,7 +384,7 @@ class RelativePositionEncoding(tf.keras.layers.Layer): """ def __init__(self, hidden_size, **kwargs): - super(RelativePositionEncoding, self).__init__(**kwargs) + super().__init__(**kwargs) self._hidden_size = hidden_size self._inv_freq = 1.0 / (10000.0**( tf.range(0, self._hidden_size, 2.0) / self._hidden_size)) @@ -475,7 +476,7 @@ class XLNetBase(tf.keras.layers.Layer): use_cls_mask=False, embedding_width=None, **kwargs): - super(XLNetBase, self).__init__(**kwargs) + super().__init__(**kwargs) self._vocab_size = vocab_size self._initializer = initializer @@ -507,7 +508,7 @@ class XLNetBase(tf.keras.layers.Layer): self._embedding_layer = layers.OnDeviceEmbedding( vocab_size=self._vocab_size, embedding_width=embedding_width, - initializer=self._initializer, + initializer=tf_utils.clone_initializer(self._initializer), dtype=tf.float32, name="word_embedding") self._dropout = tf.keras.layers.Dropout(rate=self._dropout_rate) @@ -573,7 +574,7 @@ 
class XLNetBase(tf.keras.layers.Layer): "embedding_width": self._embedding_width, } - base_config = super(XLNetBase, self).get_config() + base_config = super().get_config() return dict(list(base_config.items()) + list(config.items())) def get_embedding_lookup_table(self): @@ -600,7 +601,7 @@ class XLNetBase(tf.keras.layers.Layer): "target_mapping": target_mapping, "masked_tokens": masked_tokens } - return super(XLNetBase, self).__call__(inputs, **kwargs) + return super().__call__(inputs, **kwargs) def call(self, inputs): """Implements call() for the layer.""" @@ -666,7 +667,7 @@ class XLNetBase(tf.keras.layers.Layer): shape=[self._num_layers, 2, self._num_attention_heads, self._head_size], dtype=tf.float32, - initializer=self._initializer) + initializer=tf_utils.clone_initializer(self._initializer)) segment_embedding = self._segment_embedding segment_matrix = _compute_segment_matrix( diff --git a/official/nlp/modeling/networks/xlnet_base_test.py b/official/nlp/modeling/networks/xlnet_base_test.py index 81db32487325b3b61d47afac6217590491067257..c2abda3871189d020b74b77032aa2088da262831 100644 --- a/official/nlp/modeling/networks/xlnet_base_test.py +++ b/official/nlp/modeling/networks/xlnet_base_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/ops/__init__.py b/official/nlp/modeling/ops/__init__.py index e21f33273f3801a34073aceecf301e23808727d3..3fec3645836df885429024c3b06c5a1fbc5669a2 100644 --- a/official/nlp/modeling/ops/__init__.py +++ b/official/nlp/modeling/ops/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -14,5 +14,7 @@ """Ops package definition.""" from official.nlp.modeling.ops.beam_search import sequence_beam_search +from official.nlp.modeling.ops.beam_search import SequenceBeamSearch +from official.nlp.modeling.ops.sampling_module import SamplingModule from official.nlp.modeling.ops.segment_extractor import get_next_sentence_labels from official.nlp.modeling.ops.segment_extractor import get_sentence_order_labels diff --git a/official/nlp/modeling/ops/beam_search.py b/official/nlp/modeling/ops/beam_search.py index eddb31212bcd633cb35b9eeb78b49a796f73b2e7..afac4b81e69711dfd446582be3c0e23850968021 100644 --- a/official/nlp/modeling/ops/beam_search.py +++ b/official/nlp/modeling/ops/beam_search.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -107,18 +107,18 @@ class SequenceBeamSearch(tf.Module): max_decode_length, eos_id, padded_decode, - dtype=tf.float32): + dtype=tf.float32, + decoding_name=None): """Initialize sequence beam search. Args: - symbols_to_logits_fn: A function to provide logits, which is the - interface to the Transformer model. The passed in arguments are: ids -> - A tensor with shape [batch_size * beam_size, index]. index -> A - scalar. cache -> A nested dictionary of tensors [batch_size * - beam_size, ...]. - The function must return a tuple of logits and the updated cache: logits - -> A tensor with shape [batch * beam_size, vocab_size]. updated cache - -> A nested dictionary with the same structure as the input cache. + symbols_to_logits_fn: A function to provide logits, which is the interface + to the Transformer model. 
The passed in arguments are: ids -> A tensor + with shape [batch_size * beam_size, index]. index -> A scalar. cache -> + A nested dictionary of tensors [batch_size * beam_size, ...]. The + function must return a tuple of logits and the updated cache: logits -> + A tensor with shape [batch * beam_size, vocab_size]. updated cache -> A + nested dictionary with the same structure as the input cache. vocab_size: An integer, the size of the vocabulary, used for topk computation. beam_size: An integer, number of beams for beam search. @@ -130,6 +130,7 @@ class SequenceBeamSearch(tf.Module): for beam search. dtype: A tensorflow data type used for score computation. The default is tf.float32. + decoding_name: an optional name for the decoding loop tensors. """ self.symbols_to_logits_fn = symbols_to_logits_fn self.vocab_size = vocab_size @@ -139,6 +140,7 @@ class SequenceBeamSearch(tf.Module): self.eos_id = eos_id self.padded_decode = padded_decode self.dtype = tf.as_dtype(dtype) + self.decoding_name = decoding_name def search(self, initial_ids, initial_cache): """Beam search for sequences with highest scores. @@ -204,7 +206,7 @@ class SequenceBeamSearch(tf.Module): candidate_log_probs = _log_prob_from_logits(logits) # Calculate new log probabilities if each of the alive sequences were - # extended # by the the candidate IDs. + # extended # by the candidate IDs. 
# Shape [batch_size, beam_size, vocab_size] log_probs = candidate_log_probs + tf.expand_dims(alive_log_probs, axis=2) @@ -370,7 +372,8 @@ class SequenceBeamSearch(tf.Module): _search_step, loop_vars=[state], shape_invariants=[state_shapes], - parallel_iterations=1)) + parallel_iterations=1, + name=self.decoding_name)) finished_state = finished_state[0] return self._process_finished_state(finished_state) @@ -587,7 +590,8 @@ def sequence_beam_search(symbols_to_logits_fn, max_decode_length, eos_id, padded_decode=False, - dtype="float32"): + dtype="float32", + decoding_name=None): """Search for sequence of subtoken ids with the largest probability. Args: @@ -612,13 +616,15 @@ def sequence_beam_search(symbols_to_logits_fn, beam search. dtype: A tensorflow data type used for score computation. The default is tf.float32. + decoding_name: an optional name for the decoding loop tensors. Returns: Top decoded sequences [batch_size, beam_size, max_decode_length] sequence scores [batch_size, beam_size] """ sbs = SequenceBeamSearch(symbols_to_logits_fn, vocab_size, beam_size, alpha, - max_decode_length, eos_id, padded_decode, dtype) + max_decode_length, eos_id, padded_decode, dtype, + decoding_name) return sbs.search(initial_ids, initial_cache) diff --git a/official/nlp/modeling/ops/beam_search_test.py b/official/nlp/modeling/ops/beam_search_test.py index 6b46868c3841437107e8858075b36dfed9bbcd64..89daabe137acf1df77d977eca22f058858a3f3ef 100644 --- a/official/nlp/modeling/ops/beam_search_test.py +++ b/official/nlp/modeling/ops/beam_search_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -60,10 +60,12 @@ class BeamSearchTests(tf.test.TestCase, parameterized.TestCase): y) @parameterized.named_parameters([ - ('padded_decode_true', True), - ('padded_decode_false', False), + ('padded_decode_true_with_name', True, 'decoding'), + ('padded_decode_false_with_name', False, 'decoding'), + ('padded_decode_true_without_name', True, None), + ('padded_decode_false_without_name', False, None), ]) - def test_sequence_beam_search(self, padded_decode): + def test_sequence_beam_search(self, padded_decode, name): # batch_size*beam_size, max_decode_length, vocab_size probabilities = tf.constant([[[0.2, 0.7, 0.1], [0.5, 0.3, 0.2], [0.1, 0.8, 0.1]], @@ -91,7 +93,8 @@ class BeamSearchTests(tf.test.TestCase, parameterized.TestCase): max_decode_length=3, eos_id=9, padded_decode=padded_decode, - dtype=tf.float32) + dtype=tf.float32, + decoding_name=name) self.assertAllEqual([[[0, 1, 0, 1], [0, 1, 1, 2]]], predictions) diff --git a/official/nlp/modeling/ops/decoding_module.py b/official/nlp/modeling/ops/decoding_module.py index bfd928f130ed82f839155bdc845a5d7326e1ec2f..e4272936c234703cbf27596894e6f6b2033557c8 100644 --- a/official/nlp/modeling/ops/decoding_module.py +++ b/official/nlp/modeling/ops/decoding_module.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -15,14 +15,14 @@ """Base class for Decoding Strategies (beam_search, top_k, top_p and greedy).""" import abc -from typing import Any, Callable, Dict, Tuple +from typing import Any, Callable, Dict, Optional, Tuple import tensorflow as tf from tensorflow.python.framework import dtypes from official.modeling import tf_utils -Output = Tuple[tf.Tensor, tf.Tensor] +Output = Tuple[tf.Tensor, tf.Tensor, Optional[tf.Tensor]] InternalState = Tuple[tf.Tensor, tf.Tensor, tf.Tensor, Dict] InitialState = Tuple[Dict[str, Any], Dict[str, Any]] @@ -46,6 +46,10 @@ class StateKeys: # the previous iteration. ALIVE_CACHE = "ALIVE_CACHE" + # The initial model state/cache after model processing the initial token. + # The cache will be filled if extra_cache_output is true. + INITIAL_OUTPUT_CACHE = "INITIAL_OUTPUT_CACHE" + # Top finished sequences for each batch item. # Has shape [batch_size, beam_size, CUR_INDEX + 1]. Sequences that are # shorter than CUR_INDEX + 1 are padded with 0s. @@ -108,7 +112,9 @@ class DecodingModule(tf.Module, metaclass=abc.ABCMeta): def __init__(self, length_normalization_fn: Callable[[int, tf.DType], float], - dtype: tf.DType = tf.float32): + dtype: tf.DType = tf.float32, + decoding_name: Optional[str] = None, + extra_cache_output: bool = False): """Initialize the Decoding Module. Args: @@ -116,31 +122,39 @@ class DecodingModule(tf.Module, metaclass=abc.ABCMeta): parameter. Function accepts input as length, dtype and returns float. dtype: A tensorflow data type used for score computation. The default is tf.float32. + decoding_name: an optional name for the decoding loop tensors. + extra_cache_output: If true, the first cache will be in the states. 
""" self.length_normalization_fn = length_normalization_fn self.dtype = tf.as_dtype(dtype) + self.decoding_name = decoding_name def generate(self, initial_ids: tf.Tensor, - initial_cache: Dict[str, tf.Tensor]) -> Output: + initial_cache: Dict[str, tf.Tensor], + initial_log_probs: Optional[tf.Tensor] = None) -> Output: """Implements the decoding strategy (beam_search or sampling). Args: - initial_ids: initial ids to pass into the symbols_to_logits_fn. - int tensor with shape [batch_size, 1] + initial_ids: initial ids to pass into the symbols_to_logits_fn. int tensor + with shape [batch_size, 1] initial_cache: dictionary for caching model outputs from previous step. + initial_log_probs: Optionally initial log probs if there is a prefix + sequence we want to start to decode from. + Returns: Tuple of tensors representing finished_sequence: shape [batch, max_seq_length] finished_scores: [batch] + first_cache: The cache after init token """ batch_size = ( initial_ids.shape.as_list()[0] if self.padded_decode else tf.shape(initial_ids)[0]) - state, state_shapes = self._create_initial_state(initial_ids, - initial_cache, - batch_size) + state, state_shapes = self._create_initial_state(initial_ids, initial_cache, + batch_size, + initial_log_probs) def _generate_step(state): topk_seq, topk_log_probs, topk_ids, new_cache = self._grow_alive_seq( @@ -160,6 +174,17 @@ class DecodingModule(tf.Module, metaclass=abc.ABCMeta): } new_state.update(alive_state) new_state.update(finished_state) + if self.extra_cache_output: + i = state[StateKeys.CUR_INDEX] + old_cache = state[StateKeys.INITIAL_OUTPUT_CACHE] + + def update_with_cache(new_state, cache): + """Updates new_state with cache.""" + new_state.update({StateKeys.INITIAL_OUTPUT_CACHE: cache}) + + tf.cond( + tf.equal(i, 0), lambda: update_with_cache(new_state, new_cache), + lambda: update_with_cache(new_state, old_cache)) return [new_state] finished_state = tf.nest.map_structure( @@ -169,15 +194,18 @@ class DecodingModule(tf.Module, 
metaclass=abc.ABCMeta): _generate_step, loop_vars=[state], shape_invariants=[state_shapes], - parallel_iterations=1)) + parallel_iterations=1, + name=self.decoding_name)) final_state = self._process_finished_state(finished_state[0]) return final_state @abc.abstractmethod - def _create_initial_state(self, - initial_ids: tf.Tensor, - initial_cache: Dict[str, tf.Tensor], - batch_size: int) -> InitialState: + def _create_initial_state( + self, + initial_ids: tf.Tensor, + initial_cache: Dict[str, tf.Tensor], + batch_size: int, + initial_log_probs: Optional[tf.Tensor] = None) -> InitialState: """Return initial state dictionary and its shape invariants.""" pass @@ -277,6 +305,3 @@ class DecodingModule(tf.Module, metaclass=abc.ABCMeta): return dtypes.float16.max else: raise AssertionError("Invalid dtype: %s" % self.dtype) - - - diff --git a/official/nlp/modeling/ops/decoding_module_test.py b/official/nlp/modeling/ops/decoding_module_test.py index da444ed5394a6fd257663b61c9230be715d7846c..cec2902de716c16900806197339e95bb8fd5194b 100644 --- a/official/nlp/modeling/ops/decoding_module_test.py +++ b/official/nlp/modeling/ops/decoding_module_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -29,6 +29,7 @@ class TestSubclass(decoding_module.DecodingModule, metaclass=abc.ABCMeta): def __init__(self, length_normalization_fn=length_normalization, + extra_cache_output=True, dtype=tf.float32): super(TestSubclass, self).__init__( length_normalization_fn=length_normalization, dtype=dtype) diff --git a/official/nlp/modeling/ops/sampling_module.py b/official/nlp/modeling/ops/sampling_module.py index dc396b7b1f8182fb7f8b4a76645e8b60f1fe8a2b..12e882421deb033d3f06f1d5d65ac272157788af 100644 --- a/official/nlp/modeling/ops/sampling_module.py +++ b/official/nlp/modeling/ops/sampling_module.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -55,6 +55,8 @@ def sample_top_k(logits, top_k): Returns: Logits with top_k filtering applied. """ + top_k = tf.clip_by_value( + top_k, clip_value_min=1, clip_value_max=tf.shape(logits)[-1]) top_k_logits = tf.math.top_k(logits, k=top_k) indices_to_remove = logits < tf.expand_dims(top_k_logits[0][..., -1], -1) top_k_logits = set_tensor_by_indices_to_value(logits, indices_to_remove, @@ -160,7 +162,9 @@ class SamplingModule(decoding_module.DecodingModule, metaclass=abc.ABCMeta): top_p=1.0, sample_temperature=0.0, enable_greedy: bool = True, - dtype: tf.DType = tf.float32): + dtype: tf.DType = tf.float32, + decoding_name: Optional[str] = None, + extra_cache_output: bool = False): """Initialize sampling module.""" self.symbols_to_logits_fn = symbols_to_logits_fn self.length_normalization_fn = length_normalization_fn @@ -174,8 +178,13 @@ class SamplingModule(decoding_module.DecodingModule, metaclass=abc.ABCMeta): self.sample_temperature = tf.convert_to_tensor( sample_temperature, dtype=tf.float32) self.enable_greedy = enable_greedy + self.decoding_name = decoding_name + self.extra_cache_output = 
extra_cache_output super(SamplingModule, self).__init__( - length_normalization_fn=length_normalization_fn, dtype=dtype) + length_normalization_fn=length_normalization_fn, + dtype=dtype, + decoding_name=decoding_name, + extra_cache_output=extra_cache_output) def _grow_alive_seq(self, state: Dict[str, Any], @@ -241,10 +250,13 @@ class SamplingModule(decoding_module.DecodingModule, metaclass=abc.ABCMeta): topk_seq = tf.concat([alive_seq, topk_ids], axis=-1) return topk_seq, topk_log_probs, topk_ids, new_cache - def _create_initial_state(self, - initial_ids: tf.Tensor, - initial_cache: Dict[str, tf.Tensor], - batch_size: int) -> decoding_module.InitialState: + def _create_initial_state( + self, + initial_ids: tf.Tensor, + initial_cache: Dict[str, tf.Tensor], + batch_size: int, + initial_log_probs: Optional[tf.Tensor] = None + ) -> decoding_module.InitialState: """Return initial state dictionary and its shape invariants.""" for key, value in initial_cache.items(): for inner_value in tf.nest.flatten(value): @@ -264,8 +276,11 @@ class SamplingModule(decoding_module.DecodingModule, metaclass=abc.ABCMeta): alive_seq = tf.tile(alive_seq, [1, self.max_decode_length + 1]) # Initial log probabilities with shape [batch_size, 1]. 
- initial_log_probs = tf.constant([[0.]], dtype=self.dtype) - alive_log_probs = tf.tile(initial_log_probs, [batch_size, 1]) + if initial_log_probs is None: + initial_log_probs = tf.constant([[0.]], dtype=self.dtype) + alive_log_probs = tf.tile(initial_log_probs, [batch_size, 1]) + else: + alive_log_probs = initial_log_probs alive_cache = initial_cache @@ -294,16 +309,14 @@ class SamplingModule(decoding_module.DecodingModule, metaclass=abc.ABCMeta): decoding_module.StateKeys.CUR_INDEX: tf.TensorShape([]), decoding_module.StateKeys.ALIVE_SEQ: - tf.TensorShape( - [batch_size, self.max_decode_length + 1]), + tf.TensorShape([batch_size, self.max_decode_length + 1]), decoding_module.StateKeys.ALIVE_LOG_PROBS: tf.TensorShape([batch_size, 1]), decoding_module.StateKeys.ALIVE_CACHE: tf.nest.map_structure(lambda state: state.get_shape(), alive_cache), decoding_module.StateKeys.FINISHED_SEQ: - tf.TensorShape( - [batch_size, self.max_decode_length + 1]), + tf.TensorShape([batch_size, self.max_decode_length + 1]), decoding_module.StateKeys.FINISHED_SCORES: tf.TensorShape([batch_size, 1]), decoding_module.StateKeys.FINISHED_FLAGS: @@ -318,9 +331,8 @@ class SamplingModule(decoding_module.DecodingModule, metaclass=abc.ABCMeta): decoding_module.StateKeys.ALIVE_LOG_PROBS: tf.TensorShape([None, 1]), decoding_module.StateKeys.ALIVE_CACHE: - tf.nest.map_structure( - decoding_module.get_shape_keep_last_dim, - alive_cache), + tf.nest.map_structure(decoding_module.get_shape_keep_last_dim, + alive_cache), decoding_module.StateKeys.FINISHED_SEQ: tf.TensorShape([None, None]), decoding_module.StateKeys.FINISHED_SCORES: @@ -329,6 +341,22 @@ class SamplingModule(decoding_module.DecodingModule, metaclass=abc.ABCMeta): tf.TensorShape([None, 1]) } + if self.extra_cache_output: + state.update( + {decoding_module.StateKeys.INITIAL_OUTPUT_CACHE: alive_cache}) + if self.padded_decode: + state_shape_invariants.update({ + decoding_module.StateKeys.INITIAL_OUTPUT_CACHE: + tf.nest.map_structure(lambda 
state: state.get_shape(), + alive_cache) + }) + else: + state_shape_invariants.update({ + decoding_module.StateKeys.INITIAL_OUTPUT_CACHE: + tf.nest.map_structure(decoding_module.get_shape_keep_last_dim, + alive_cache), + }) + return state, state_shape_invariants def _get_new_alive_state(self, new_seq: tf.Tensor, new_log_probs: tf.Tensor, @@ -422,6 +450,9 @@ class SamplingModule(decoding_module.DecodingModule, metaclass=abc.ABCMeta): finished_scores) finished_seq = tf.where(seq_cond, finished_seq, alive_seq) finished_scores = tf.where(score_cond, finished_scores, alive_log_probs) + if self.extra_cache_output: + return finished_seq, finished_scores, finished_state[ + decoding_module.StateKeys.INITIAL_OUTPUT_CACHE] return finished_seq, finished_scores def _continue_search(self, state) -> tf.Tensor: diff --git a/official/nlp/modeling/ops/segment_extractor.py b/official/nlp/modeling/ops/segment_extractor.py index e01649e4a3deffcd6aa9634da639c26b0c7199a0..8016b65cdd4d826829820936906cf7220a34be34 100644 --- a/official/nlp/modeling/ops/segment_extractor.py +++ b/official/nlp/modeling/ops/segment_extractor.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/modeling/ops/segment_extractor_test.py b/official/nlp/modeling/ops/segment_extractor_test.py index 6b4094b87870ab3054f460b150e230f74ab30339..3fb6f5667310370c75e3718d1f39b875422cd07c 100644 --- a/official/nlp/modeling/ops/segment_extractor_test.py +++ b/official/nlp/modeling/ops/segment_extractor_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/optimization.py b/official/nlp/optimization.py index 4040b73e9a704859ae30e27bafe6c4bc468f6cc8..13d21f144c7796efee359cb0c5b1afde2fba605b 100644 --- a/official/nlp/optimization.py +++ b/official/nlp/optimization.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,14 +12,15 @@ # See the License for the specific language governing permissions and # limitations under the License. -"""Functions and classes related to optimization (weight updates).""" - -import re +"""Legacy functions and classes related to optimization.""" from absl import logging import gin import tensorflow as tf import tensorflow_addons.optimizers as tfa_optimizers +from official.modeling.optimization import legacy_adamw + +AdamWeightDecay = legacy_adamw.AdamWeightDecay class WarmUp(tf.keras.optimizers.schedules.LearningRateSchedule): @@ -70,13 +71,15 @@ def create_optimizer(init_lr, num_warmup_steps, end_lr=0.0, optimizer_type='adamw', - beta_1=0.9): + beta_1=0.9, + poly_power=1.0): """Creates an optimizer with learning rate schedule.""" # Implements linear decay of the learning rate. lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay( initial_learning_rate=init_lr, decay_steps=num_train_steps, - end_learning_rate=end_lr) + end_learning_rate=end_lr, + power=poly_power) if num_warmup_steps: lr_schedule = WarmUp( initial_learning_rate=init_lr, @@ -105,126 +108,3 @@ def create_optimizer(init_lr, raise ValueError('Unsupported optimizer type: ', optimizer_type) return optimizer - - -class AdamWeightDecay(tf.keras.optimizers.Adam): - """Adam enables L2 weight decay and clip_by_global_norm on gradients. 
- - Just adding the square of the weights to the loss function is *not* the - correct way of using L2 regularization/weight decay with Adam, since that will - interact with the m and v parameters in strange ways. - - Instead we want to decay the weights in a manner that doesn't interact with - the m/v parameters. This is equivalent to adding the square of the weights to - the loss with plain (non-momentum) SGD. - """ - - def __init__(self, - learning_rate=0.001, - beta_1=0.9, - beta_2=0.999, - epsilon=1e-7, - amsgrad=False, - weight_decay_rate=0.0, - include_in_weight_decay=None, - exclude_from_weight_decay=None, - gradient_clip_norm=1.0, - name='AdamWeightDecay', - **kwargs): - super(AdamWeightDecay, self).__init__(learning_rate, beta_1, beta_2, - epsilon, amsgrad, name, **kwargs) - self.weight_decay_rate = weight_decay_rate - self.gradient_clip_norm = gradient_clip_norm - self._include_in_weight_decay = include_in_weight_decay - self._exclude_from_weight_decay = exclude_from_weight_decay - logging.info('gradient_clip_norm=%f', gradient_clip_norm) - - @classmethod - def from_config(cls, config): - """Creates an optimizer from its config with WarmUp custom object.""" - custom_objects = {'WarmUp': WarmUp} - return super(AdamWeightDecay, cls).from_config( - config, custom_objects=custom_objects) - - def _prepare_local(self, var_device, var_dtype, apply_state): - super(AdamWeightDecay, self)._prepare_local(var_device, var_dtype, # pytype: disable=attribute-error # typed-keras - apply_state) - apply_state[(var_device, var_dtype)]['weight_decay_rate'] = tf.constant( - self.weight_decay_rate, name='adam_weight_decay_rate') - - def _decay_weights_op(self, var, learning_rate, apply_state): - do_decay = self._do_use_weight_decay(var.name) - if do_decay: - return var.assign_sub( - learning_rate * var * - apply_state[(var.device, var.dtype.base_dtype)]['weight_decay_rate'], - use_locking=self._use_locking) - return tf.no_op() - - def apply_gradients(self, - grads_and_vars, - 
name=None, - experimental_aggregate_gradients=True): - grads, tvars = list(zip(*grads_and_vars)) - if experimental_aggregate_gradients and self.gradient_clip_norm > 0.0: - # when experimental_aggregate_gradients = False, apply_gradients() no - # longer implicitly allreduce gradients, users manually allreduce gradient - # and passed the allreduced grads_and_vars. For now, the - # clip_by_global_norm will be moved to before the explicit allreduce to - # keep the math the same as TF 1 and pre TF 2.2 implementation. - (grads, _) = tf.clip_by_global_norm( - grads, clip_norm=self.gradient_clip_norm) - return super(AdamWeightDecay, self).apply_gradients( - zip(grads, tvars), - name=name, - experimental_aggregate_gradients=experimental_aggregate_gradients) - - def _get_lr(self, var_device, var_dtype, apply_state): - """Retrieves the learning rate with the given state.""" - if apply_state is None: - return self._decayed_lr_t[var_dtype], {} - - apply_state = apply_state or {} - coefficients = apply_state.get((var_device, var_dtype)) - if coefficients is None: - coefficients = self._fallback_apply_state(var_device, var_dtype) - apply_state[(var_device, var_dtype)] = coefficients - - return coefficients['lr_t'], dict(apply_state=apply_state) - - def _resource_apply_dense(self, grad, var, apply_state=None): - lr_t, kwargs = self._get_lr(var.device, var.dtype.base_dtype, apply_state) - decay = self._decay_weights_op(var, lr_t, apply_state) - with tf.control_dependencies([decay]): - return super(AdamWeightDecay, - self)._resource_apply_dense(grad, var, **kwargs) # pytype: disable=attribute-error # typed-keras - - def _resource_apply_sparse(self, grad, var, indices, apply_state=None): - lr_t, kwargs = self._get_lr(var.device, var.dtype.base_dtype, apply_state) - decay = self._decay_weights_op(var, lr_t, apply_state) - with tf.control_dependencies([decay]): - return super(AdamWeightDecay, - self)._resource_apply_sparse(grad, var, indices, **kwargs) # pytype: disable=attribute-error 
# typed-keras - - def get_config(self): - config = super(AdamWeightDecay, self).get_config() - config.update({ - 'weight_decay_rate': self.weight_decay_rate, - }) - return config - - def _do_use_weight_decay(self, param_name): - """Whether to use L2 weight decay for `param_name`.""" - if self.weight_decay_rate == 0: - return False - - if self._include_in_weight_decay: - for r in self._include_in_weight_decay: - if re.search(r, param_name) is not None: - return True - - if self._exclude_from_weight_decay: - for r in self._exclude_from_weight_decay: - if re.search(r, param_name) is not None: - return False - return True diff --git a/official/nlp/projects/__init__.py b/official/nlp/projects/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/nlp/projects/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/nlp/projects/bigbird/__init__.py b/official/nlp/projects/bigbird/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/nlp/projects/bigbird/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/nlp/projects/bigbird/encoder.py b/official/nlp/projects/bigbird/encoder.py deleted file mode 100644 index 911d8cca77e95448246ac2ca9d15e3a0fd73d861..0000000000000000000000000000000000000000 --- a/official/nlp/projects/bigbird/encoder.py +++ /dev/null @@ -1,238 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Transformer-based text encoder network.""" -# pylint: disable=g-classes-have-attributes - -import tensorflow as tf - -from official.modeling import activations -from official.nlp import modeling -from official.nlp.modeling import layers -from official.nlp.projects.bigbird import recompute_grad -from official.nlp.projects.bigbird import recomputing_dropout - - -_MAX_SEQ_LEN = 4096 - - -class RecomputeTransformerLayer(layers.TransformerScaffold): - """Transformer layer that recomputes the forward pass during backpropagation.""" - - def call(self, inputs, training=None): - emb, mask = inputs - def f(*args): - # recompute_grad can only handle tensor inputs. so we enumerate the - # nested input [emb, mask] as follows: - # args[0]: emb - # args[1]: mask[0] = band_mask - # args[2]: mask[1] = encoder_from_mask - # args[3]: mask[2] = encoder_to_mask - # args[4]: mask[3] = blocked_encoder_mask - x = super(RecomputeTransformerLayer, - self).call([args[0], [args[1], args[2], args[3], args[4]]], - training=training) - return x - - f = recompute_grad.recompute_grad(f) - - return f(emb, *mask) - - -@tf.keras.utils.register_keras_serializable(package='Text') -class BigBirdEncoder(tf.keras.Model): - """Transformer-based encoder network with BigBird attentions. - - *Note* that the network is constructed by - [Keras Functional API](https://keras.io/guides/functional_api/). - - Args: - vocab_size: The size of the token vocabulary. - hidden_size: The size of the transformer hidden layers. - num_layers: The number of transformer layers. - num_attention_heads: The number of attention heads for each transformer. The - hidden size must be divisible by the number of attention heads. - max_position_embeddings: The maximum length of position embeddings that this - encoder can consume. If None, max_position_embeddings uses the value from - sequence length. This determines the variable shape for positional - embeddings. 
- type_vocab_size: The number of types that the 'type_ids' input can take. - intermediate_size: The intermediate size for the transformer layers. - block_size: int. A BigBird Attention parameter: size of block in from/to - sequences. - num_rand_blocks: int. A BigBird Attention parameter: number of random chunks - per row. - activation: The activation to use for the transformer layers. - dropout_rate: The dropout rate to use for the transformer layers. - attention_dropout_rate: The dropout rate to use for the attention layers - within the transformer layers. - initializer: The initialzer to use for all weights in this encoder. - embedding_width: The width of the word embeddings. If the embedding width is - not equal to hidden size, embedding parameters will be factorized into two - matrices in the shape of ['vocab_size', 'embedding_width'] and - ['embedding_width', 'hidden_size'] ('embedding_width' is usually much - smaller than 'hidden_size'). - use_gradient_checkpointing: Use gradient checkpointing to trade-off compute - for memory. 
- """ - - def __init__(self, - vocab_size, - hidden_size=768, - num_layers=12, - num_attention_heads=12, - max_position_embeddings=_MAX_SEQ_LEN, - type_vocab_size=16, - intermediate_size=3072, - block_size=64, - num_rand_blocks=3, - activation=activations.gelu, - dropout_rate=0.1, - attention_dropout_rate=0.1, - initializer=tf.keras.initializers.TruncatedNormal(stddev=0.02), - embedding_width=None, - use_gradient_checkpointing=False, - **kwargs): - activation = tf.keras.activations.get(activation) - initializer = tf.keras.initializers.get(initializer) - - if use_gradient_checkpointing: - tf.keras.layers.Dropout = recomputing_dropout.RecomputingDropout - layer_cls = RecomputeTransformerLayer - else: - layer_cls = layers.TransformerScaffold - - self._self_setattr_tracking = False - self._config_dict = { - 'vocab_size': vocab_size, - 'hidden_size': hidden_size, - 'num_layers': num_layers, - 'num_attention_heads': num_attention_heads, - 'max_position_embeddings': max_position_embeddings, - 'type_vocab_size': type_vocab_size, - 'intermediate_size': intermediate_size, - 'block_size': block_size, - 'num_rand_blocks': num_rand_blocks, - 'activation': tf.keras.activations.serialize(activation), - 'dropout_rate': dropout_rate, - 'attention_dropout_rate': attention_dropout_rate, - 'initializer': tf.keras.initializers.serialize(initializer), - 'embedding_width': embedding_width, - } - - word_ids = tf.keras.layers.Input( - shape=(None,), dtype=tf.int32, name='input_word_ids') - mask = tf.keras.layers.Input( - shape=(None,), dtype=tf.int32, name='input_mask') - type_ids = tf.keras.layers.Input( - shape=(None,), dtype=tf.int32, name='input_type_ids') - - if embedding_width is None: - embedding_width = hidden_size - self._embedding_layer = modeling.layers.OnDeviceEmbedding( - vocab_size=vocab_size, - embedding_width=embedding_width, - initializer=initializer, - name='word_embeddings') - word_embeddings = self._embedding_layer(word_ids) - - # Always uses dynamic slicing for 
simplicity. - self._position_embedding_layer = modeling.layers.PositionEmbedding( - initializer=initializer, - max_length=max_position_embeddings, - name='position_embedding') - position_embeddings = self._position_embedding_layer(word_embeddings) - self._type_embedding_layer = modeling.layers.OnDeviceEmbedding( - vocab_size=type_vocab_size, - embedding_width=embedding_width, - initializer=initializer, - use_one_hot=True, - name='type_embeddings') - type_embeddings = self._type_embedding_layer(type_ids) - - embeddings = tf.keras.layers.Add()( - [word_embeddings, position_embeddings, type_embeddings]) - - self._embedding_norm_layer = tf.keras.layers.LayerNormalization( - name='embeddings/layer_norm', axis=-1, epsilon=1e-12, dtype=tf.float32) - - embeddings = self._embedding_norm_layer(embeddings) - embeddings = tf.keras.layers.Dropout(rate=dropout_rate)(embeddings) - - # We project the 'embedding' output to 'hidden_size' if it is not already - # 'hidden_size'. - if embedding_width != hidden_size: - self._embedding_projection = tf.keras.layers.experimental.EinsumDense( - '...x,xy->...y', - output_shape=hidden_size, - bias_axes='y', - kernel_initializer=initializer, - name='embedding_projection') - embeddings = self._embedding_projection(embeddings) - - self._transformer_layers = [] - data = embeddings - masks = layers.BigBirdMasks(block_size=block_size)( - data, mask) - encoder_outputs = [] - attn_head_dim = hidden_size // num_attention_heads - for i in range(num_layers): - layer = layer_cls( - num_attention_heads, - intermediate_size, - activation, - attention_cls=layers.BigBirdAttention, - attention_cfg=dict( - num_heads=num_attention_heads, - key_dim=attn_head_dim, - kernel_initializer=initializer, - from_block_size=block_size, - to_block_size=block_size, - num_rand_blocks=num_rand_blocks, - max_rand_mask_length=max_position_embeddings, - seed=i), - dropout_rate=dropout_rate, - attention_dropout_rate=dropout_rate, - kernel_initializer=initializer) - 
self._transformer_layers.append(layer) - data = layer([data, masks]) - encoder_outputs.append(data) - - outputs = dict( - sequence_output=encoder_outputs[-1], encoder_outputs=encoder_outputs) - super().__init__( - inputs=[word_ids, mask, type_ids], outputs=outputs, **kwargs) - - def get_embedding_table(self): - return self._embedding_layer.embeddings - - def get_embedding_layer(self): - return self._embedding_layer - - def get_config(self): - return self._config_dict - - @property - def transformer_layers(self): - """List of Transformer layers in the encoder.""" - return self._transformer_layers - - @property - def pooler_layer(self): - """The pooler dense layer after the transformer layers.""" - return self._pooler_layer - - @classmethod - def from_config(cls, config, custom_objects=None): - return cls(**config) diff --git a/official/nlp/projects/bigbird/encoder_test.py b/official/nlp/projects/bigbird/encoder_test.py deleted file mode 100644 index 5ebab7776b56b1af40539e164fa41c7f016f32e0..0000000000000000000000000000000000000000 --- a/official/nlp/projects/bigbird/encoder_test.py +++ /dev/null @@ -1,63 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Tests for official.nlp.projects.bigbird.encoder.""" - -import numpy as np -import tensorflow as tf - -from official.nlp.projects.bigbird import encoder - - -class BigBirdEncoderTest(tf.test.TestCase): - - def test_encoder(self): - sequence_length = 1024 - batch_size = 2 - vocab_size = 1024 - network = encoder.BigBirdEncoder( - num_layers=1, vocab_size=1024, max_position_embeddings=4096) - word_id_data = np.random.randint( - vocab_size, size=(batch_size, sequence_length)) - mask_data = np.random.randint(2, size=(batch_size, sequence_length)) - type_id_data = np.random.randint(2, size=(batch_size, sequence_length)) - outputs = network([word_id_data, mask_data, type_id_data]) - self.assertEqual(outputs["sequence_output"].shape, - (batch_size, sequence_length, 768)) - - def test_save_restore(self): - sequence_length = 1024 - batch_size = 2 - vocab_size = 1024 - network = encoder.BigBirdEncoder( - num_layers=1, vocab_size=1024, max_position_embeddings=4096) - word_id_data = np.random.randint( - vocab_size, size=(batch_size, sequence_length)) - mask_data = np.random.randint(2, size=(batch_size, sequence_length)) - type_id_data = np.random.randint(2, size=(batch_size, sequence_length)) - inputs = dict( - input_word_ids=word_id_data, - input_mask=mask_data, - input_type_ids=type_id_data) - ref_outputs = network(inputs) - model_path = self.get_temp_dir() + "/model" - network.save(model_path) - loaded = tf.keras.models.load_model(model_path) - outputs = loaded(inputs) - self.assertAllClose(outputs["sequence_output"], - ref_outputs["sequence_output"]) - - -if __name__ == "__main__": - tf.test.main() diff --git a/official/nlp/projects/bigbird/experiment_configs.py b/official/nlp/projects/bigbird/experiment_configs.py deleted file mode 100644 index 35de842102b90429d6755ac589e8bc858e7ae109..0000000000000000000000000000000000000000 --- a/official/nlp/projects/bigbird/experiment_configs.py +++ /dev/null @@ -1,100 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. 
All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Bigbird experiment configurations.""" -# pylint: disable=g-doc-return-or-yield,line-too-long -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.modeling import optimization -from official.nlp.data import question_answering_dataloader -from official.nlp.data import sentence_prediction_dataloader -from official.nlp.tasks import question_answering -from official.nlp.tasks import sentence_prediction - - -@exp_factory.register_config_factory('bigbird/glue') -def bigbird_glue() -> cfg.ExperimentConfig: - r"""BigBird GLUE.""" - config = cfg.ExperimentConfig( - task=sentence_prediction.SentencePredictionConfig( - train_data=sentence_prediction_dataloader - .SentencePredictionDataConfig(), - validation_data=sentence_prediction_dataloader - .SentencePredictionDataConfig( - is_training=False, drop_remainder=False)), - trainer=cfg.TrainerConfig( - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'adamw', - 'adamw': { - 'weight_decay_rate': - 0.01, - 'exclude_from_weight_decay': - ['LayerNorm', 'layer_norm', 'bias'], - } - }, - 'learning_rate': { - 'type': 'polynomial', - 'polynomial': { - 'initial_learning_rate': 3e-5, - 'end_learning_rate': 0.0, - } - }, - 'warmup': { - 'type': 'polynomial' - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - 
config.task.model.encoder.type = 'bigbird' - return config - - -@exp_factory.register_config_factory('bigbird/squad') -def bigbird_squad() -> cfg.ExperimentConfig: - r"""BigBird Squad V1/V2.""" - config = cfg.ExperimentConfig( - task=question_answering.QuestionAnsweringConfig( - train_data=question_answering_dataloader.QADataConfig(), - validation_data=question_answering_dataloader.QADataConfig()), - trainer=cfg.TrainerConfig( - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'adamw', - 'adamw': { - 'weight_decay_rate': - 0.01, - 'exclude_from_weight_decay': - ['LayerNorm', 'layer_norm', 'bias'], - } - }, - 'learning_rate': { - 'type': 'polynomial', - 'polynomial': { - 'initial_learning_rate': 8e-5, - 'end_learning_rate': 0.0, - } - }, - 'warmup': { - 'type': 'polynomial' - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - config.task.model.encoder.type = 'bigbird' - return config diff --git a/official/nlp/projects/teams/__init__.py b/official/nlp/projects/teams/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/nlp/projects/teams/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- diff --git a/official/nlp/projects/teams/teams_experiments_test.py b/official/nlp/projects/teams/teams_experiments_test.py deleted file mode 100644 index b4b4448c46ce83a44fdc18c87890bdf0fa0ffe85..0000000000000000000000000000000000000000 --- a/official/nlp/projects/teams/teams_experiments_test.py +++ /dev/null @@ -1,38 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Tests for teams_experiments.""" - -from absl.testing import parameterized -import tensorflow as tf - -# pylint: disable=unused-import -from official.common import registry_imports -# pylint: enable=unused-import -from official.core import config_definitions as cfg -from official.core import exp_factory - - -class TeamsExperimentsTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.parameters(('teams/pretraining',)) - def test_teams_experiments(self, config_name): - config = exp_factory.get_exp_config(config_name) - self.assertIsInstance(config, cfg.ExperimentConfig) - self.assertIsInstance(config.task.train_data, cfg.DataConfig) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/nlp/projects/triviaqa/__init__.py b/official/nlp/projects/triviaqa/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/nlp/projects/triviaqa/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The 
TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/nlp/projects/triviaqa/train.py b/official/nlp/projects/triviaqa/train.py deleted file mode 100644 index c4e4c101f9f0034600c955fa0fb218a6253299c2..0000000000000000000000000000000000000000 --- a/official/nlp/projects/triviaqa/train.py +++ /dev/null @@ -1,384 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""TriviaQA training script.""" -import collections -import contextlib -import functools -import json -import operator -import os - -from absl import app -from absl import flags -from absl import logging -import gin -import tensorflow as tf -import tensorflow_datasets as tfds - -import sentencepiece as spm -from official.nlp import optimization as nlp_optimization -from official.nlp.configs import encoders -from official.nlp.projects.triviaqa import evaluation -from official.nlp.projects.triviaqa import inputs -from official.nlp.projects.triviaqa import modeling -from official.nlp.projects.triviaqa import prediction - -flags.DEFINE_string('data_dir', None, 'Data directory for TensorFlow Datasets.') - -flags.DEFINE_string( - 'validation_gold_path', None, - 'Path to golden validation. Usually, the wikipedia-dev.json file.') - -flags.DEFINE_string('model_dir', None, - 'Directory for checkpoints and summaries.') - -flags.DEFINE_string('model_config_path', None, - 'JSON file containing model coniguration.') - -flags.DEFINE_string('sentencepiece_model_path', None, - 'Path to sentence piece model.') - -flags.DEFINE_enum('encoder', 'bigbird', - ['bert', 'bigbird', 'albert', 'mobilebert'], - 'Which transformer encoder model to use.') - -flags.DEFINE_integer('bigbird_block_size', 64, - 'Size of blocks for sparse block attention.') - -flags.DEFINE_string('init_checkpoint_path', None, - 'Path from which to initialize weights.') - -flags.DEFINE_integer('train_sequence_length', 4096, - 'Maximum number of tokens for training.') - -flags.DEFINE_integer('train_global_sequence_length', 320, - 'Maximum number of global tokens for training.') - -flags.DEFINE_integer('validation_sequence_length', 4096, - 'Maximum number of tokens for validation.') - -flags.DEFINE_integer('validation_global_sequence_length', 320, - 'Maximum number of global tokens for validation.') - -flags.DEFINE_integer('batch_size', 32, 'Size of batch.') - -flags.DEFINE_string('master', '', 'Address of the TPU 
master.') - -flags.DEFINE_integer('decode_top_k', 8, - 'Maximum number of tokens to consider for begin/end.') - -flags.DEFINE_integer('decode_max_size', 16, - 'Maximum number of sentence pieces in an answer.') - -flags.DEFINE_float('dropout_rate', 0.1, 'Dropout rate for hidden layers.') - -flags.DEFINE_float('attention_dropout_rate', 0.3, - 'Dropout rate for attention layers.') - -flags.DEFINE_float('label_smoothing', 1e-1, 'Degree of label smoothing.') - -flags.DEFINE_multi_string( - 'gin_bindings', [], - 'Gin bindings to override the values set in the config files') - -FLAGS = flags.FLAGS - - -@contextlib.contextmanager -def worker_context(): - if FLAGS.master: - with tf.device('/job:worker') as d: - yield d - else: - yield - - -def read_sentencepiece_model(path): - with tf.io.gfile.GFile(path, 'rb') as file: - processor = spm.SentencePieceProcessor() - processor.LoadFromSerializedProto(file.read()) - return processor - - -# Rename old BERT v1 configuration parameters. -_MODEL_CONFIG_REPLACEMENTS = { - 'num_hidden_layers': 'num_layers', - 'attention_probs_dropout_prob': 'attention_dropout_rate', - 'hidden_dropout_prob': 'dropout_rate', - 'hidden_act': 'hidden_activation', - 'window_size': 'block_size', -} - - -def read_model_config(encoder, - path, - bigbird_block_size=None) -> encoders.EncoderConfig: - """Merges the JSON configuration into the encoder configuration.""" - with tf.io.gfile.GFile(path) as f: - model_config = json.load(f) - for key, value in _MODEL_CONFIG_REPLACEMENTS.items(): - if key in model_config: - model_config[value] = model_config.pop(key) - model_config['attention_dropout_rate'] = FLAGS.attention_dropout_rate - model_config['dropout_rate'] = FLAGS.dropout_rate - model_config['block_size'] = bigbird_block_size - encoder_config = encoders.EncoderConfig(type=encoder) - # Override the default config with those loaded from the JSON file. 
- encoder_config_keys = encoder_config.get().as_dict().keys() - overrides = {} - for key, value in model_config.items(): - if key in encoder_config_keys: - overrides[key] = value - else: - logging.warning('Ignoring config parameter %s=%s', key, value) - encoder_config.get().override(overrides) - return encoder_config - - -@gin.configurable(denylist=[ - 'model', - 'strategy', - 'train_dataset', - 'model_dir', - 'init_checkpoint_path', - 'evaluate_fn', -]) -def fit(model, - strategy, - train_dataset, - model_dir, - init_checkpoint_path=None, - evaluate_fn=None, - learning_rate=1e-5, - learning_rate_polynomial_decay_rate=1., - weight_decay_rate=1e-1, - num_warmup_steps=5000, - num_decay_steps=51000, - num_epochs=6): - """Train and evaluate.""" - hparams = dict( - learning_rate=learning_rate, - num_decay_steps=num_decay_steps, - num_warmup_steps=num_warmup_steps, - num_epochs=num_epochs, - weight_decay_rate=weight_decay_rate, - dropout_rate=FLAGS.dropout_rate, - attention_dropout_rate=FLAGS.attention_dropout_rate, - label_smoothing=FLAGS.label_smoothing) - logging.info(hparams) - learning_rate_schedule = nlp_optimization.WarmUp( - learning_rate, - tf.keras.optimizers.schedules.PolynomialDecay( - learning_rate, - num_decay_steps, - end_learning_rate=0., - power=learning_rate_polynomial_decay_rate), num_warmup_steps) - with strategy.scope(): - optimizer = nlp_optimization.AdamWeightDecay( - learning_rate_schedule, - weight_decay_rate=weight_decay_rate, - epsilon=1e-6, - exclude_from_weight_decay=['LayerNorm', 'layer_norm', 'bias']) - model.compile(optimizer, loss=modeling.SpanOrCrossEntropyLoss()) - - def init_fn(init_checkpoint_path): - ckpt = tf.train.Checkpoint(encoder=model.encoder) - ckpt.restore(init_checkpoint_path).assert_existing_objects_matched() - - with worker_context(): - ckpt_manager = tf.train.CheckpointManager( - tf.train.Checkpoint(model=model, optimizer=optimizer), - model_dir, - max_to_keep=None, - init_fn=(functools.partial(init_fn, 
init_checkpoint_path) - if init_checkpoint_path else None)) - with strategy.scope(): - ckpt_manager.restore_or_initialize() - val_summary_writer = tf.summary.create_file_writer( - os.path.join(model_dir, 'val')) - best_exact_match = 0. - for epoch in range(len(ckpt_manager.checkpoints), num_epochs): - model.fit( - train_dataset, - callbacks=[ - tf.keras.callbacks.TensorBoard(model_dir, write_graph=False), - ]) - ckpt_path = ckpt_manager.save() - if evaluate_fn is None: - continue - metrics = evaluate_fn() - logging.info('Epoch %d: %s', epoch + 1, metrics) - if best_exact_match < metrics['exact_match']: - best_exact_match = metrics['exact_match'] - model.save(os.path.join(model_dir, 'export'), include_optimizer=False) - logging.info('Exporting %s as SavedModel.', ckpt_path) - with val_summary_writer.as_default(): - for name, data in metrics.items(): - tf.summary.scalar(name, data, epoch + 1) - - -def evaluate(sp_processor, features_map_fn, labels_map_fn, logits_fn, - decode_logits_fn, split_and_pad_fn, distribute_strategy, - validation_dataset, ground_truth): - """Run evaluation.""" - loss_metric = tf.keras.metrics.Mean() - - @tf.function - def update_loss(y, logits): - loss_fn = modeling.SpanOrCrossEntropyLoss( - reduction=tf.keras.losses.Reduction.NONE) - return loss_metric(loss_fn(y, logits)) - - predictions = collections.defaultdict(list) - for _, (features, labels) in validation_dataset.enumerate(): - token_ids = features['token_ids'] - y = labels_map_fn(token_ids, labels) - x = split_and_pad_fn(features_map_fn(features)) - logits = tf.concat( - distribute_strategy.experimental_local_results(logits_fn(x)), 0) - logits = logits[:features['token_ids'].shape[0]] - update_loss(y, logits) - end_limit = token_ids.row_lengths() - 1 # inclusive - begin, end, scores = decode_logits_fn(logits, end_limit) - answers = prediction.decode_answer(features['context'], begin, end, - features['token_offsets'], - end_limit).numpy() - for _, (qid, token_id, offset, score, answer) 
in enumerate( - zip(features['qid'].numpy(), - tf.gather(features['token_ids'], begin, batch_dims=1).numpy(), - tf.gather(features['token_offsets'], begin, batch_dims=1).numpy(), - scores, answers)): - if not answer: - continue - if sp_processor.IdToPiece(int(token_id)).startswith('▁') and offset > 0: - answer = answer[1:] - predictions[qid.decode('utf-8')].append((score, answer.decode('utf-8'))) - predictions = { - qid: evaluation.normalize_answer( - sorted(answers, key=operator.itemgetter(0), reverse=True)[0][1]) - for qid, answers in predictions.items() - } - metrics = evaluation.evaluate_triviaqa(ground_truth, predictions, mute=True) - metrics['loss'] = loss_metric.result().numpy() - return metrics - - -def main(argv): - if len(argv) > 1: - raise app.UsageError('Too many command-line arguments.') - gin.parse_config(FLAGS.gin_bindings) - model_config = read_model_config( - FLAGS.encoder, - FLAGS.model_config_path, - bigbird_block_size=FLAGS.bigbird_block_size) - logging.info(model_config.get().as_dict()) - # Configure input processing. - sp_processor = read_sentencepiece_model(FLAGS.sentencepiece_model_path) - features_map_fn = functools.partial( - inputs.features_map_fn, - local_radius=FLAGS.bigbird_block_size, - relative_pos_max_distance=24, - use_hard_g2l_mask=True, - padding_id=sp_processor.PieceToId(''), - eos_id=sp_processor.PieceToId(''), - null_id=sp_processor.PieceToId(''), - cls_id=sp_processor.PieceToId(''), - sep_id=sp_processor.PieceToId('')) - train_features_map_fn = tf.function( - functools.partial( - features_map_fn, - sequence_length=FLAGS.train_sequence_length, - global_sequence_length=FLAGS.train_global_sequence_length), - autograph=False) - train_labels_map_fn = tf.function( - functools.partial( - inputs.labels_map_fn, sequence_length=FLAGS.train_sequence_length)) - # Connect to TPU cluster. 
- if FLAGS.master: - resolver = tf.distribute.cluster_resolver.TPUClusterResolver(FLAGS.master) - tf.config.experimental_connect_to_cluster(resolver) - tf.tpu.experimental.initialize_tpu_system(resolver) - strategy = tf.distribute.TPUStrategy(resolver) - else: - strategy = tf.distribute.MirroredStrategy() - # Initialize datasets. - with worker_context(): - _ = tf.random.get_global_generator() - train_dataset = inputs.read_batches( - FLAGS.data_dir, - tfds.Split.TRAIN, - FLAGS.batch_size, - shuffle=True, - drop_final_batch=True) - validation_dataset = inputs.read_batches(FLAGS.data_dir, - tfds.Split.VALIDATION, - FLAGS.batch_size) - - def train_map_fn(x, y): - features = train_features_map_fn(x) - labels = modeling.smooth_labels(FLAGS.label_smoothing, - train_labels_map_fn(x['token_ids'], y), - features['question_lengths'], - features['token_ids']) - return features, labels - - train_dataset = train_dataset.map(train_map_fn, 16).prefetch(16) - # Initialize model and compile. - with strategy.scope(): - model = modeling.TriviaQaModel(model_config, FLAGS.train_sequence_length) - logits_fn = tf.function( - functools.partial(prediction.distributed_logits_fn, model)) - decode_logits_fn = tf.function( - functools.partial(prediction.decode_logits, FLAGS.decode_top_k, - FLAGS.decode_max_size)) - split_and_pad_fn = tf.function( - functools.partial(prediction.split_and_pad, strategy, FLAGS.batch_size)) - # Evaluation strategy. 
- with tf.io.gfile.GFile(FLAGS.validation_gold_path) as f: - ground_truth = { - datum['QuestionId']: datum['Answer'] for datum in json.load(f)['Data'] - } - validation_features_map_fn = tf.function( - functools.partial( - features_map_fn, - sequence_length=FLAGS.validation_sequence_length, - global_sequence_length=FLAGS.validation_global_sequence_length), - autograph=False) - validation_labels_map_fn = tf.function( - functools.partial( - inputs.labels_map_fn, - sequence_length=FLAGS.validation_sequence_length)) - evaluate_fn = functools.partial( - evaluate, - sp_processor=sp_processor, - features_map_fn=validation_features_map_fn, - labels_map_fn=validation_labels_map_fn, - logits_fn=logits_fn, - decode_logits_fn=decode_logits_fn, - split_and_pad_fn=split_and_pad_fn, - distribute_strategy=strategy, - validation_dataset=validation_dataset, - ground_truth=ground_truth) - logging.info('Model initialized. Beginning training fit loop.') - fit(model, strategy, train_dataset, FLAGS.model_dir, - FLAGS.init_checkpoint_path, evaluate_fn) - - -if __name__ == '__main__': - flags.mark_flags_as_required([ - 'model_config_path', 'model_dir', 'sentencepiece_model_path', - 'validation_gold_path' - ]) - app.run(main) diff --git a/official/nlp/serving/__init__.py b/official/nlp/serving/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ba97902e7ec1e12871c0fad301b9ce48c92cf1d1 --- /dev/null +++ b/official/nlp/serving/__init__.py @@ -0,0 +1,15 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + + diff --git a/official/nlp/serving/export_savedmodel.py b/official/nlp/serving/export_savedmodel.py index d96752ee04f9bcbddc1e6d11f4b731dd0e918b1f..d4da2f5eca596de49a0be663bda9d14197063933 100644 --- a/official/nlp/serving/export_savedmodel.py +++ b/official/nlp/serving/export_savedmodel.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -13,12 +13,14 @@ # limitations under the License. """A binary/library to export TF-NLP serving `SavedModel`.""" +import dataclasses import os from typing import Any, Dict, Text + from absl import app from absl import flags -import dataclasses import yaml + from official.core import base_task from official.core import task_factory from official.modeling import hyperparams @@ -29,6 +31,7 @@ from official.nlp.tasks import masked_lm from official.nlp.tasks import question_answering from official.nlp.tasks import sentence_prediction from official.nlp.tasks import tagging +from official.nlp.tasks import translation FLAGS = flags.FLAGS @@ -40,7 +43,9 @@ SERVING_MODULES = { question_answering.QuestionAnsweringTask: serving_modules.QuestionAnswering, tagging.TaggingTask: - serving_modules.Tagging + serving_modules.Tagging, + translation.TranslationTask: + serving_modules.Translation } @@ -67,6 +72,12 @@ def define_flags(): flags.DEFINE_bool("convert_tpu", False, "") flags.DEFINE_multi_integer("allowed_batch_size", None, "Allowed batch sizes for batching ops.") + flags.DEFINE_integer("num_batch_threads", 4, + "Number of threads to do TPU batching.") + flags.DEFINE_integer("batch_timeout_micros", 100000, + "TPU batch function timeout in microseconds.") + flags.DEFINE_integer("max_enqueued_batches", 1000, + 
"Max number of batches in queue for TPU batching.") def lookup_export_module(task: base_task.Task): @@ -125,21 +136,30 @@ def main(_): if FLAGS.convert_tpu: # pylint: disable=g-import-not-at-top - from cloud_tpu.inference_converter import converter_cli - from cloud_tpu.inference_converter import converter_options_pb2 + from cloud_tpu.inference_converter_v2 import converter_options_v2_pb2 + from cloud_tpu.inference_converter_v2.python import converter + tpu_dir = os.path.join(export_dir, "tpu") - options = converter_options_pb2.ConverterOptions() + batch_options = [] if FLAGS.allowed_batch_size is not None: allowed_batch_sizes = sorted(FLAGS.allowed_batch_size) - options.batch_options.num_batch_threads = 4 - options.batch_options.max_batch_size = allowed_batch_sizes[-1] - options.batch_options.batch_timeout_micros = 100000 - options.batch_options.allowed_batch_sizes[:] = allowed_batch_sizes - options.batch_options.max_enqueued_batches = 1000 - converter_cli.ConvertSavedModel( - export_dir, tpu_dir, function_alias="tpu_candidate", options=options, - graph_rewrite_only=True) - + batch_option = converter_options_v2_pb2.BatchOptionsV2( + num_batch_threads=FLAGS.num_batch_threads, + max_batch_size=allowed_batch_sizes[-1], + batch_timeout_micros=FLAGS.batch_timeout_micros, + allowed_batch_sizes=allowed_batch_sizes, + max_enqueued_batches=FLAGS.max_enqueued_batches + ) + batch_options.append(batch_option) + + converter_options = converter_options_v2_pb2.ConverterOptionsV2( + tpu_functions=[ + converter_options_v2_pb2.TpuFunction(function_alias="tpu_candidate") + ], + batch_options=batch_options, + ) + + converter.ConvertSavedModel(export_dir, tpu_dir, converter_options) if __name__ == "__main__": define_flags() diff --git a/official/nlp/serving/export_savedmodel_test.py b/official/nlp/serving/export_savedmodel_test.py index 2891a9499f65c4e19f3e231b7178c2e98c7c704a..1f1a82a90d219e5c5a53611cffc5904158245bb5 100644 --- a/official/nlp/serving/export_savedmodel_test.py +++ 
b/official/nlp/serving/export_savedmodel_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/serving/export_savedmodel_util.py b/official/nlp/serving/export_savedmodel_util.py index b4363f434b296cdfdadb2392c85b3b423a37a000..8fe163c72e990b68c1c0d76b262f8189167d3045 100644 --- a/official/nlp/serving/export_savedmodel_util.py +++ b/official/nlp/serving/export_savedmodel_util.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/serving/serving_modules.py b/official/nlp/serving/serving_modules.py index 3fadde8180f57c091455b9360ade4232fd52d5c2..1621e0de543ee0d53b0f6a7a20f964c708ed2293 100644 --- a/official/nlp/serving/serving_modules.py +++ b/official/nlp/serving/serving_modules.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -14,10 +14,12 @@ """Serving export modules for TF Model Garden NLP models.""" # pylint:disable=missing-class-docstring +import dataclasses from typing import Dict, List, Optional, Text -import dataclasses import tensorflow as tf +import tensorflow_text as tf_text + from official.core import export_base from official.modeling.hyperparams import base_config from official.nlp.data import sentence_prediction_dataloader @@ -407,3 +409,52 @@ class Tagging(export_base.ExportModule): signatures[signature_key] = self.serve_examples.get_concrete_function( tf.TensorSpec(shape=[None], dtype=tf.string, name="examples")) return signatures + + +class Translation(export_base.ExportModule): + """The export module for the translation task.""" + + @dataclasses.dataclass + class Params(base_config.Config): + sentencepiece_model_path: str = "" + # Needs to be specified if padded_decode is True/on TPUs. + batch_size: Optional[int] = None + + def __init__(self, params, model: tf.keras.Model, inference_step=None): + super().__init__(params, model, inference_step) + self._sp_tokenizer = tf_text.SentencepieceTokenizer( + model=tf.io.gfile.GFile(params.sentencepiece_model_path, "rb").read(), + add_eos=True) + try: + empty_str_tokenized = self._sp_tokenizer.tokenize("").numpy() + except tf.errors.InternalError: + raise ValueError( + "EOS token not in tokenizer vocab." 
+ " Please make sure the tokenizer generates a single token for an " + "empty string.") + self._eos_id = empty_str_tokenized.item() + self._batch_size = params.batch_size + + @tf.function + def serve(self, inputs) -> Dict[str, tf.Tensor]: + return self.inference_step(inputs) + + @tf.function + def serve_text(self, text: tf.Tensor) -> tf.Tensor: + tokenized = self._sp_tokenizer.tokenize(text).to_tensor(0) + return self._sp_tokenizer.detokenize( + self.serve({"inputs": tokenized})["outputs"]) + + def get_inference_signatures(self, function_keys: Dict[Text, Text]): + signatures = {} + valid_keys = ("serve_text",) + for func_key, signature_key in function_keys.items(): + if func_key not in valid_keys: + raise ValueError("Invalid function key for the module: %s with key %s. " + "Valid keys are: %s" % + (self.__class__, func_key, valid_keys)) + if func_key == "serve_text": + signatures[signature_key] = self.serve_text.get_concrete_function( + tf.TensorSpec(shape=[self._batch_size], + dtype=tf.string, name="text")) + return signatures diff --git a/official/nlp/serving/serving_modules_test.py b/official/nlp/serving/serving_modules_test.py index 16c481c98a8b1bb99887ccb51019b6a8e415a8b4..e967c60662930d851860c72266372a2479f1268a 100644 --- a/official/nlp/serving/serving_modules_test.py +++ b/official/nlp/serving/serving_modules_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
@@ -15,8 +15,12 @@ """Tests for nlp.serving.serving_modules.""" import os + from absl.testing import parameterized import tensorflow as tf + +from sentencepiece import SentencePieceTrainer +from official.core import export_base from official.nlp.configs import bert from official.nlp.configs import encoders from official.nlp.serving import serving_modules @@ -24,6 +28,7 @@ from official.nlp.tasks import masked_lm from official.nlp.tasks import question_answering from official.nlp.tasks import sentence_prediction from official.nlp.tasks import tagging +from official.nlp.tasks import translation def _create_fake_serialized_examples(features_dict): @@ -59,6 +64,33 @@ def _create_fake_vocab_file(vocab_file_path): outfile.write("\n".join(tokens)) +def _train_sentencepiece(input_path, vocab_size, model_path, eos_id=1): + argstr = " ".join([ + f"--input={input_path}", f"--vocab_size={vocab_size}", + "--character_coverage=0.995", + f"--model_prefix={model_path}", "--model_type=bpe", + "--bos_id=-1", "--pad_id=0", f"--eos_id={eos_id}", "--unk_id=2" + ]) + SentencePieceTrainer.Train(argstr) + + +def _generate_line_file(filepath, lines): + with tf.io.gfile.GFile(filepath, "w") as f: + for l in lines: + f.write("{}\n".format(l)) + + +def _make_sentencepeice(output_dir): + src_lines = ["abc ede fg", "bbcd ef a g", "de f a a g"] + tgt_lines = ["dd cc a ef g", "bcd ef a g", "gef cd ba"] + sentencepeice_input_path = os.path.join(output_dir, "inputs.txt") + _generate_line_file(sentencepeice_input_path, src_lines + tgt_lines) + sentencepeice_model_prefix = os.path.join(output_dir, "sp") + _train_sentencepiece(sentencepeice_input_path, 11, sentencepeice_model_prefix) + sentencepeice_model_path = "{}.model".format(sentencepeice_model_prefix) + return sentencepeice_model_path + + class ServingModulesTest(tf.test.TestCase, parameterized.TestCase): @parameterized.parameters( @@ -312,6 +344,48 @@ class ServingModulesTest(tf.test.TestCase, parameterized.TestCase): with 
self.assertRaises(ValueError): _ = export_module.get_inference_signatures({"foo": None}) + @parameterized.parameters( + (False, None), + (True, 2)) + def test_translation(self, padded_decode, batch_size): + sp_path = _make_sentencepeice(self.get_temp_dir()) + encdecoder = translation.EncDecoder( + num_attention_heads=4, intermediate_size=256) + config = translation.TranslationConfig( + model=translation.ModelConfig( + encoder=encdecoder, + decoder=encdecoder, + embedding_width=256, + padded_decode=padded_decode, + decode_max_length=100), + sentencepiece_model_path=sp_path, + ) + task = translation.TranslationTask(config) + model = task.build_model() + + params = serving_modules.Translation.Params( + sentencepiece_model_path=sp_path, batch_size=batch_size) + export_module = serving_modules.Translation(params=params, model=model) + functions = export_module.get_inference_signatures({ + "serve_text": "serving_default" + }) + outputs = functions["serving_default"](tf.constant(["abcd", "ef gh"])) + self.assertEqual(outputs.shape, (2,)) + self.assertEqual(outputs.dtype, tf.string) + + tmp_dir = self.get_temp_dir() + tmp_dir = os.path.join(tmp_dir, "padded_decode", str(padded_decode)) + export_base_dir = os.path.join(tmp_dir, "export") + ckpt_dir = os.path.join(tmp_dir, "ckpt") + ckpt_path = tf.train.Checkpoint(model=model).save(ckpt_dir) + export_dir = export_base.export(export_module, + {"serve_text": "serving_default"}, + export_base_dir, ckpt_path) + loaded = tf.saved_model.load(export_dir) + infer = loaded.signatures["serving_default"] + out = infer(text=tf.constant(["abcd", "ef gh"])) + self.assertLen(out["output_0"], 2) + if __name__ == "__main__": tf.test.main() diff --git a/official/nlp/tasks/__init__.py b/official/nlp/tasks/__init__.py index e506913b0c22a84006e647058636fe08a7cb894b..cec41dff173e4bd1e5ff9b865344a557b57055c5 100644 --- a/official/nlp/tasks/__init__.py +++ b/official/nlp/tasks/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. 
All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/tasks/dual_encoder.py b/official/nlp/tasks/dual_encoder.py index 24c750d9ed547c62976976fcde788fa30089f331..116456b590988f72492217795647bcbc3a25cc77 100644 --- a/official/nlp/tasks/dual_encoder.py +++ b/official/nlp/tasks/dual_encoder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -187,9 +187,13 @@ class DualEncoderTask(base_task.Task): def initialize(self, model): """Load a pretrained checkpoint (if exists) and then train from iter 0.""" ckpt_dir_or_file = self.task_config.init_checkpoint - if tf.io.gfile.isdir(ckpt_dir_or_file): + logging.info('Trying to load pretrained checkpoint from %s', + ckpt_dir_or_file) + if ckpt_dir_or_file and tf.io.gfile.isdir(ckpt_dir_or_file): ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) if not ckpt_dir_or_file: + logging.info('No checkpoint file found from %s. Will not load.', + ckpt_dir_or_file) return pretrain2finetune_mapping = { diff --git a/official/nlp/tasks/dual_encoder_test.py b/official/nlp/tasks/dual_encoder_test.py index 96763871f06f5e7d992e79dfe8e0d0f33b6fb020..3e1a72605ae62f7029b8de103a4acd4be1ce0f8d 100644 --- a/official/nlp/tasks/dual_encoder_test.py +++ b/official/nlp/tasks/dual_encoder_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -19,7 +19,7 @@ import os from absl.testing import parameterized import tensorflow as tf -from official.nlp.bert import configs +from official.legacy.bert import configs from official.nlp.configs import bert from official.nlp.configs import encoders from official.nlp.data import dual_encoder_dataloader diff --git a/official/nlp/tasks/electra_task.py b/official/nlp/tasks/electra_task.py index 6853a2cc246acd79f7ee81c7ba0b843ac2c9bfb3..9473c0d4e0626cef58c23574f68559a6077d3f78 100644 --- a/official/nlp/tasks/electra_task.py +++ b/official/nlp/tasks/electra_task.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/tasks/electra_task_test.py b/official/nlp/tasks/electra_task_test.py index 4f775d26906dc93f78b3fdd66f1cbb230c558104..4018c9220acef88272ffda2385dd97ee44b92f48 100644 --- a/official/nlp/tasks/electra_task_test.py +++ b/official/nlp/tasks/electra_task_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/tasks/masked_lm.py b/official/nlp/tasks/masked_lm.py index 8e5802ada291c332ed80d874030b5f36f099f835..f784b141676e0bb55868a4380b1fb2e91064cf99 100644 --- a/official/nlp/tasks/masked_lm.py +++ b/official/nlp/tasks/masked_lm.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/nlp/tasks/masked_lm_determinism_test.py b/official/nlp/tasks/masked_lm_determinism_test.py new file mode 100644 index 0000000000000000000000000000000000000000..d9aa90a041ef04eb0dcf5dc95292a83c15d39e79 --- /dev/null +++ b/official/nlp/tasks/masked_lm_determinism_test.py @@ -0,0 +1,103 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests that masked LM models are deterministic when determinism is enabled.""" + +import tensorflow as tf + +from official.nlp.configs import bert +from official.nlp.configs import encoders +from official.nlp.data import pretrain_dataloader +from official.nlp.tasks import masked_lm + + +class MLMTaskTest(tf.test.TestCase): + + def _build_dataset(self, params, vocab_size): + def dummy_data(_): + dummy_ids = tf.random.uniform((1, params.seq_length), maxval=vocab_size, + dtype=tf.int32) + dummy_mask = tf.ones((1, params.seq_length), dtype=tf.int32) + dummy_type_ids = tf.zeros((1, params.seq_length), dtype=tf.int32) + dummy_lm = tf.zeros((1, params.max_predictions_per_seq), dtype=tf.int32) + return dict( + input_word_ids=dummy_ids, + input_mask=dummy_mask, + input_type_ids=dummy_type_ids, + masked_lm_positions=dummy_lm, + masked_lm_ids=dummy_lm, + masked_lm_weights=tf.cast(dummy_lm, dtype=tf.float32), + next_sentence_labels=tf.zeros((1, 1), dtype=tf.int32)) + + dataset = tf.data.Dataset.range(1) + dataset = dataset.repeat() + dataset = dataset.map( + 
dummy_data, num_parallel_calls=tf.data.experimental.AUTOTUNE) + return dataset + + def _build_and_run_model(self, config, num_steps=5): + task = masked_lm.MaskedLMTask(config) + model = task.build_model() + metrics = task.build_metrics() + dataset = self._build_dataset(config.train_data, + config.model.encoder.get().vocab_size) + + iterator = iter(dataset) + optimizer = tf.keras.optimizers.SGD(lr=0.1) + + # Run training + for _ in range(num_steps): + logs = task.train_step(next(iterator), model, optimizer, metrics=metrics) + for metric in metrics: + logs[metric.name] = metric.result() + + # Run validation + validation_logs = task.validation_step(next(iterator), model, + metrics=metrics) + for metric in metrics: + validation_logs[metric.name] = metric.result() + + return logs, validation_logs, model.weights + + def test_task_determinism(self): + config = masked_lm.MaskedLMConfig( + init_checkpoint=self.get_temp_dir(), + scale_loss=True, + model=bert.PretrainerConfig( + encoder=encoders.EncoderConfig( + bert=encoders.BertEncoderConfig(vocab_size=30522, + num_layers=1)), + cls_heads=[ + bert.ClsHeadConfig( + inner_dim=10, num_classes=2, name="next_sentence") + ]), + train_data=pretrain_dataloader.BertPretrainDataConfig( + max_predictions_per_seq=20, + seq_length=128, + global_batch_size=1)) + + tf.keras.utils.set_random_seed(1) + logs1, validation_logs1, weights1 = self._build_and_run_model(config) + tf.keras.utils.set_random_seed(1) + logs2, validation_logs2, weights2 = self._build_and_run_model(config) + + self.assertEqual(logs1["loss"], logs2["loss"]) + self.assertEqual(validation_logs1["loss"], validation_logs2["loss"]) + for weight1, weight2 in zip(weights1, weights2): + self.assertAllEqual(weight1, weight2) + + +if __name__ == "__main__": + tf.config.experimental.enable_op_determinism() + tf.test.main() diff --git a/official/nlp/tasks/masked_lm_test.py b/official/nlp/tasks/masked_lm_test.py index 
14774e9859f3389ddb6839a2c1eeacbe4077505e..221fa6c0978a2c890b9e3139021f4cdf7d9642b3 100644 --- a/official/nlp/tasks/masked_lm_test.py +++ b/official/nlp/tasks/masked_lm_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/tasks/question_answering.py b/official/nlp/tasks/question_answering.py index aee3fab883434c89114a7633ef5fd934d1122eed..d9c7508fe862e2ffd352a63db22be33002abaa5a 100644 --- a/official/nlp/tasks/question_answering.py +++ b/official/nlp/tasks/question_answering.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -13,13 +13,13 @@ # limitations under the License. 
"""Question answering task.""" +import dataclasses import functools import json import os from typing import List, Optional from absl import logging -import dataclasses import orbit import tensorflow as tf @@ -27,15 +27,15 @@ from official.core import base_task from official.core import config_definitions as cfg from official.core import task_factory from official.modeling.hyperparams import base_config -from official.nlp.bert import squad_evaluate_v1_1 -from official.nlp.bert import squad_evaluate_v2_0 -from official.nlp.bert import tokenization from official.nlp.configs import encoders from official.nlp.data import data_loader_factory from official.nlp.data import squad_lib as squad_lib_wp from official.nlp.data import squad_lib_sp from official.nlp.modeling import models from official.nlp.tasks import utils +from official.nlp.tools import squad_evaluate_v1_1 +from official.nlp.tools import squad_evaluate_v2_0 +from official.nlp.tools import tokenization @dataclasses.dataclass diff --git a/official/nlp/tasks/question_answering_test.py b/official/nlp/tasks/question_answering_test.py index aa79e3ae86eaf54dca5318df6fef8ceec48ba703..cc50592a829c35336677bd95ce947523ac3edb5a 100644 --- a/official/nlp/tasks/question_answering_test.py +++ b/official/nlp/tasks/question_answering_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/tasks/sentence_prediction.py b/official/nlp/tasks/sentence_prediction.py index abc038a000fd6934bddd1a9d96b228ecd7884383..41b39eb6f08e8bba87339219c09aa1e2b1ff97b3 100644 --- a/official/nlp/tasks/sentence_prediction.py +++ b/official/nlp/tasks/sentence_prediction.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -34,7 +34,7 @@ from official.nlp.modeling import models from official.nlp.tasks import utils METRIC_TYPES = frozenset( - ['accuracy', 'matthews_corrcoef', 'pearson_spearman_corr']) + ['accuracy', 'f1', 'matthews_corrcoef', 'pearson_spearman_corr']) @dataclasses.dataclass @@ -165,14 +165,17 @@ class SentencePredictionTask(base_task.Task): compiled_metrics.update_state(labels[self.label_field], model_outputs) def validation_step(self, inputs, model: tf.keras.Model, metrics=None): - if self.metric_type == 'accuracy': - return super(SentencePredictionTask, - self).validation_step(inputs, model, metrics) features, labels = inputs, inputs outputs = self.inference_step(features, model) loss = self.build_losses( labels=labels, model_outputs=outputs, aux_losses=model.losses) logs = {self.loss: loss} + if metrics: + self.process_metrics(metrics, labels, outputs) + if model.compiled_metrics: + self.process_compiled_metrics(model.compiled_metrics, labels, outputs) + logs.update({m.name: m.result() for m in metrics or []}) + logs.update({m.name: m.result() for m in model.metrics}) if self.metric_type == 'matthews_corrcoef': logs.update({ 'sentence_prediction': # Ensure one prediction along batch dimension. 
@@ -180,7 +183,7 @@ class SentencePredictionTask(base_task.Task): 'labels': labels[self.label_field], }) - if self.metric_type == 'pearson_spearman_corr': + else: logs.update({ 'sentence_prediction': outputs, 'labels': labels[self.label_field], @@ -202,18 +205,20 @@ class SentencePredictionTask(base_task.Task): def reduce_aggregated_logs(self, aggregated_logs, global_step=None): if self.metric_type == 'accuracy': return None + + preds = np.concatenate(aggregated_logs['sentence_prediction'], axis=0) + labels = np.concatenate(aggregated_logs['labels'], axis=0) + if self.metric_type == 'f1': + preds = np.argmax(preds, axis=1) + return {self.metric_type: sklearn_metrics.f1_score(labels, preds)} elif self.metric_type == 'matthews_corrcoef': - preds = np.concatenate(aggregated_logs['sentence_prediction'], axis=0) preds = np.reshape(preds, -1) - labels = np.concatenate(aggregated_logs['labels'], axis=0) labels = np.reshape(labels, -1) return { self.metric_type: sklearn_metrics.matthews_corrcoef(preds, labels) } elif self.metric_type == 'pearson_spearman_corr': - preds = np.concatenate(aggregated_logs['sentence_prediction'], axis=0) preds = np.reshape(preds, -1) - labels = np.concatenate(aggregated_logs['labels'], axis=0) labels = np.reshape(labels, -1) pearson_corr = stats.pearsonr(preds, labels)[0] spearman_corr = stats.spearmanr(preds, labels)[0] @@ -223,10 +228,14 @@ class SentencePredictionTask(base_task.Task): def initialize(self, model): """Load a pretrained checkpoint (if exists) and then train from iter 0.""" ckpt_dir_or_file = self.task_config.init_checkpoint + logging.info('Trying to load pretrained checkpoint from %s', + ckpt_dir_or_file) + if ckpt_dir_or_file and tf.io.gfile.isdir(ckpt_dir_or_file): + ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) if not ckpt_dir_or_file: + logging.info('No checkpoint file found from %s. 
Will not load.', + ckpt_dir_or_file) return - if tf.io.gfile.isdir(ckpt_dir_or_file): - ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) pretrain2finetune_mapping = { 'encoder': model.checkpoint_items['encoder'], diff --git a/official/nlp/tasks/sentence_prediction_test.py b/official/nlp/tasks/sentence_prediction_test.py index 94d056fee6b059ac96e0de01780de9499d612934..316ff7dabe169133eeb28bd66d56de952936ff40 100644 --- a/official/nlp/tasks/sentence_prediction_test.py +++ b/official/nlp/tasks/sentence_prediction_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -32,10 +32,12 @@ def _create_fake_dataset(output_path, seq_length, num_classes, num_examples): writer = tf.io.TFRecordWriter(output_path) def create_int_feature(values): - return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values))) + return tf.train.Feature( + int64_list=tf.train.Int64List(value=np.ravel(values))) def create_float_feature(values): - return tf.train.Feature(float_list=tf.train.FloatList(value=list(values))) + return tf.train.Feature( + float_list=tf.train.FloatList(value=np.ravel(values))) for i in range(num_examples): features = {} @@ -81,7 +83,7 @@ class SentencePredictionTaskTest(tf.test.TestCase, parameterized.TestCase): functools.partial(task.build_inputs, config.train_data)) iterator = iter(dataset) - optimizer = tf.keras.optimizers.SGD(lr=0.1) + optimizer = tf.keras.optimizers.SGD(learning_rate=0.1) task.train_step(next(iterator), model, optimizer, metrics=metrics) model.save(os.path.join(self.get_temp_dir(), "saved_model")) return task.validation_step(next(iterator), model, metrics=metrics) @@ -118,7 +120,7 @@ class SentencePredictionTaskTest(tf.test.TestCase, parameterized.TestCase): dataset = 
task.build_inputs(config.train_data) iterator = iter(dataset) - optimizer = tf.keras.optimizers.SGD(lr=0.1) + optimizer = tf.keras.optimizers.SGD(learning_rate=0.1) task.initialize(model) task.train_step(next(iterator), model, optimizer, metrics=metrics) task.validation_step(next(iterator), model, metrics=metrics) @@ -149,7 +151,7 @@ class SentencePredictionTaskTest(tf.test.TestCase, parameterized.TestCase): dataset = task.build_inputs(config.train_data) iterator = iter(dataset) - optimizer = tf.keras.optimizers.SGD(lr=0.1) + optimizer = tf.keras.optimizers.SGD(learning_rate=0.1) task.train_step(next(iterator), model, optimizer, metrics=metrics) logs = task.validation_step(next(iterator), model, metrics=metrics) @@ -160,7 +162,8 @@ class SentencePredictionTaskTest(tf.test.TestCase, parameterized.TestCase): self.assertLess(loss, 1.0) @parameterized.parameters(("matthews_corrcoef", 2), - ("pearson_spearman_corr", 1)) + ("pearson_spearman_corr", 1), + ("f1", 2)) def test_np_metrics(self, metric_type, num_classes): config = sentence_prediction.SentencePredictionConfig( metric_type=metric_type, diff --git a/official/nlp/tasks/tagging.py b/official/nlp/tasks/tagging.py index bf6a3b7b1828fc9ca7e5d6f1d95f0f3d8f8c224a..5f2a3f64fc28b2a461d75507c479b67f2eff942f 100644 --- a/official/nlp/tasks/tagging.py +++ b/official/nlp/tasks/tagging.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/tasks/tagging_test.py b/official/nlp/tasks/tagging_test.py index 98ac97627abdbfb89fe6a55ef87b5e6f89c67b1a..e888abb5614583fd167a5e357325e08862b18f1a 100644 --- a/official/nlp/tasks/tagging_test.py +++ b/official/nlp/tasks/tagging_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/tasks/translation.py b/official/nlp/tasks/translation.py index 736d68e3e8b0ed2f245fcf985bd819cb504973a6..bb9591d461738f3a51bff81c799dd20973463154 100644 --- a/official/nlp/tasks/translation.py +++ b/official/nlp/tasks/translation.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/tasks/translation_test.py b/official/nlp/tasks/translation_test.py index a7f9d1c0902de4aa90cfa95968dfc88fb0a69026..30cd8b7f352a21fd192ac12dfa8dc3b10b8ce4fa 100644 --- a/official/nlp/tasks/translation_test.py +++ b/official/nlp/tasks/translation_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/tasks/utils.py b/official/nlp/tasks/utils.py index 35be4e3d4546dc29f5b43983687d697967eda4f3..44295e6590b5ef61f21952cdb907c82e1ffc6d6e 100644 --- a/official/nlp/tasks/utils.py +++ b/official/nlp/tasks/utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
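For reference, the new `f1` branch of `SentencePredictionTask.reduce_aggregated_logs` does the following:

1. Concatenates the per-batch logits and labels.
2. Takes the argmax over the class axis.
3. Calls `sklearn_metrics.f1_score(labels, preds)`.

With scikit-learn's defaults (`average='binary'`, `pos_label=1`), that is a binary F1 score, so the metric assumes a two-class task.

The block below is a minimal, dependency-free sketch of that reduction, not code from the patch. The function name `reduce_f1` is hypothetical. Its list-of-batches inputs are stand-ins for `aggregated_logs['sentence_prediction']` and `aggregated_logs['labels']`.

```python
from typing import List, Sequence


def reduce_f1(batch_logits: List[List[Sequence[float]]],
              batch_labels: List[Sequence[int]]) -> float:
    """Hypothetical stand-in for the 'f1' branch of reduce_aggregated_logs.

    Assumes binary classification with the positive class at index 1,
    matching sklearn.metrics.f1_score's default average='binary'.
    """
    # Flatten the per-batch lists along the batch axis
    # (the role np.concatenate(..., axis=0) plays in the patch).
    logits = [row for batch in batch_logits for row in batch]
    labels = [label for batch in batch_labels for label in batch]
    # Argmax over the class axis; ties resolve to the first index, like np.argmax.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y != 1)
    fn = sum(1 for p, y in zip(preds, labels) if p != 1 and y == 1)
    # F1 = 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    # Return 0.0 when there are no positive labels and no positive predictions.
    return 2 * tp / denom if denom else 0.0


# Two validation batches of two-class logits: TP=1, FP=1, FN=0.
print(reduce_f1([[[0.1, 0.9], [0.8, 0.2]], [[0.3, 0.7]]], [[1, 0], [0]]))
# prints 0.6666666666666666
```

Because of the binary default, a multi-class sentence-prediction task would need an explicit `average=` argument in the real `f1_score` call.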
diff --git a/official/nlp/tools/__init__.py b/official/nlp/tools/__init__.py index a25710c222e3327cb20e000db5df5c5651c4a2cc..ba97902e7ec1e12871c0fad301b9ce48c92cf1d1 100644 --- a/official/nlp/tools/__init__.py +++ b/official/nlp/tools/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/tools/export_tfhub.py b/official/nlp/tools/export_tfhub.py index 0effd56863bbe1fbe956cb25114c1f7705a181a1..e81dabd32fd36fbeaa1d5d0fd4e0cbc610caee90 100644 --- a/official/nlp/tools/export_tfhub.py +++ b/official/nlp/tools/export_tfhub.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -71,8 +71,8 @@ from absl import app from absl import flags import gin +from official.legacy.bert import configs from official.modeling import hyperparams -from official.nlp.bert import configs from official.nlp.configs import encoders from official.nlp.tools import export_tfhub_lib diff --git a/official/nlp/tools/export_tfhub_lib.py b/official/nlp/tools/export_tfhub_lib.py index 7062e41661e9db9f842bd28368e3ad4147eb6514..ad65fd7643bca24068095a748c8aa906ae0a41fb 100644 --- a/official/nlp/tools/export_tfhub_lib.py +++ b/official/nlp/tools/export_tfhub_lib.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -28,8 +28,8 @@ import tensorflow as tf from tensorflow.core.protobuf import saved_model_pb2 from tensorflow.python.ops import control_flow_ops # pylint: enable=g-direct-tensorflow-import +from official.legacy.bert import configs from official.modeling import tf_utils -from official.nlp.bert import configs from official.nlp.configs import encoders from official.nlp.modeling import layers from official.nlp.modeling import models @@ -84,13 +84,13 @@ def _create_model( """Creates the model to export and the model to restore the checkpoint. Args: - bert_config: A legacy `BertConfig` to create a `BertEncoder` object. - Exactly one of encoder_config and bert_config must be set. + bert_config: A legacy `BertConfig` to create a `BertEncoder` object. Exactly + one of encoder_config and bert_config must be set. encoder_config: An `EncoderConfig` to create an encoder of the configured type (`BertEncoder` or other). - with_mlm: A bool to control the second component of the result. - If True, will create a `BertPretrainerV2` object; otherwise, will - create a `BertEncoder` object. + with_mlm: A bool to control the second component of the result. If True, + will create a `BertPretrainerV2` object; otherwise, will create a + `BertEncoder` object. Returns: A Tuple of (1) a Keras model that will be exported, (2) a `BertPretrainerV2` @@ -110,7 +110,11 @@ def _create_model( # Convert from list of named inputs to dict of inputs keyed by name. # Only the latter accepts a dict of inputs after restoring from SavedModel. - encoder_inputs_dict = {x.name: x for x in encoder.inputs} + if isinstance(encoder.inputs, list) or isinstance(encoder.inputs, tuple): + encoder_inputs_dict = {x.name: x for x in encoder.inputs} + else: + # encoder.inputs by default is dict for BertEncoderV2. 
+ encoder_inputs_dict = encoder.inputs encoder_output_dict = encoder(encoder_inputs_dict) # For interchangeability with other text representations, # add "default" as an alias for BERT's whole-input reptesentations. @@ -129,7 +133,10 @@ def _create_model( encoder_network=encoder, mlm_activation=tf_utils.get_activation(hidden_act)) - pretrainer_inputs_dict = {x.name: x for x in pretrainer.inputs} + if isinstance(pretrainer.inputs, dict): + pretrainer_inputs_dict = pretrainer.inputs + else: + pretrainer_inputs_dict = {x.name: x for x in pretrainer.inputs} pretrainer_output_dict = pretrainer(pretrainer_inputs_dict) mlm_model = tf.keras.Model( inputs=pretrainer_inputs_dict, outputs=pretrainer_output_dict) @@ -206,26 +213,28 @@ def export_model(export_path: Text, encoder_config: An optional `encoders.EncoderConfig` object. model_checkpoint_path: The path to the checkpoint. with_mlm: Whether to export the additional mlm sub-object. - copy_pooler_dense_to_encoder: Whether to copy the pooler's dense layer - used in the next sentence prediction task to the encoder. + copy_pooler_dense_to_encoder: Whether to copy the pooler's dense layer used + in the next sentence prediction task to the encoder. vocab_file: The path to the wordpiece vocab file, or None. - sp_model_file: The path to the sentencepiece model file, or None. - Exactly one of vocab_file and sp_model_file must be set. + sp_model_file: The path to the sentencepiece model file, or None. Exactly + one of vocab_file and sp_model_file must be set. do_lower_case: Whether to lower-case text before tokenization. 
""" if with_mlm: - core_model, pretrainer = _create_model(bert_config=bert_config, - encoder_config=encoder_config, - with_mlm=with_mlm) + core_model, pretrainer = _create_model( + bert_config=bert_config, + encoder_config=encoder_config, + with_mlm=with_mlm) encoder = pretrainer.encoder_network # It supports both the new pretrainer checkpoint produced by TF-NLP and # the checkpoint converted from TF1 (original BERT, SmallBERTs). checkpoint_items = pretrainer.checkpoint_items checkpoint = tf.train.Checkpoint(**checkpoint_items) else: - core_model, encoder = _create_model(bert_config=bert_config, - encoder_config=encoder_config, - with_mlm=with_mlm) + core_model, encoder = _create_model( + bert_config=bert_config, + encoder_config=encoder_config, + with_mlm=with_mlm) checkpoint = tf.train.Checkpoint( model=encoder, # Legacy checkpoints. encoder=encoder) @@ -279,21 +288,26 @@ class BertPackInputsSavedModelWrapper(tf.train.Checkpoint): # overridable. Having this dynamically determined default argument # requires self.__call__ to be defined in this indirect way. 
default_seq_length = bert_pack_inputs.seq_length + @tf.function(autograph=False) def call(inputs, seq_length=default_seq_length): return layers.BertPackInputs.bert_pack_inputs( - inputs, seq_length=seq_length, + inputs, + seq_length=seq_length, start_of_sequence_id=bert_pack_inputs.start_of_sequence_id, end_of_segment_id=bert_pack_inputs.end_of_segment_id, padding_id=bert_pack_inputs.padding_id) + self.__call__ = call for ragged_rank in range(1, 3): for num_segments in range(1, 3): - _ = self.__call__.get_concrete_function( - [tf.RaggedTensorSpec([None] * (ragged_rank + 1), dtype=tf.int32) - for _ in range(num_segments)], - seq_length=tf.TensorSpec([], tf.int32)) + _ = self.__call__.get_concrete_function([ + tf.RaggedTensorSpec([None] * (ragged_rank + 1), dtype=tf.int32) + for _ in range(num_segments) + ], + seq_length=tf.TensorSpec( + [], tf.int32)) def create_preprocessing(*, @@ -311,14 +325,14 @@ def create_preprocessing(*, Args: vocab_file: The path to the wordpiece vocab file, or None. - sp_model_file: The path to the sentencepiece model file, or None. - Exactly one of vocab_file and sp_model_file must be set. - This determines the type of tokenzer that is used. + sp_model_file: The path to the sentencepiece model file, or None. Exactly + one of vocab_file and sp_model_file must be set. This determines the type + of tokenzer that is used. do_lower_case: Whether to do lower case. tokenize_with_offsets: Whether to include the .tokenize_with_offsets subobject. - default_seq_length: The sequence length of preprocessing results from - root callable. This is also the default sequence length for the + default_seq_length: The sequence length of preprocessing results from root + callable. This is also the default sequence length for the bert_pack_inputs subobject. 
Returns: @@ -378,7 +392,8 @@ def create_preprocessing(*, def _move_to_tmpdir(file_path: Optional[Text], tmpdir: Text) -> Optional[Text]: """Returns new path with same basename and hash of original path.""" - if file_path is None: return None + if file_path is None: + return None olddir, filename = os.path.split(file_path) hasher = hashlib.sha1() hasher.update(olddir.encode("utf-8")) @@ -460,12 +475,17 @@ def _check_no_assert(saved_model_path): assert_nodes = [] graph_def = saved_model.meta_graphs[0].graph_def - assert_nodes += ["node '{}' in global graph".format(n.name) - for n in graph_def.node if n.op == "Assert"] + assert_nodes += [ + "node '{}' in global graph".format(n.name) + for n in graph_def.node + if n.op == "Assert" + ] for fdef in graph_def.library.function: assert_nodes += [ "node '{}' in function '{}'".format(n.name, fdef.signature.name) - for n in fdef.node_def if n.op == "Assert"] + for n in fdef.node_def + if n.op == "Assert" + ] if assert_nodes: raise AssertionError( "Internal tool error: " diff --git a/official/nlp/tools/export_tfhub_lib_test.py b/official/nlp/tools/export_tfhub_lib_test.py index d2fade8e9580bb9c7df80de21a523fae38fab0d8..51bb87319d784fd3fbbec5b8e121c23c426f602a 100644 --- a/official/nlp/tools/export_tfhub_lib_test.py +++ b/official/nlp/tools/export_tfhub_lib_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -20,21 +20,39 @@ import tempfile from absl.testing import parameterized import numpy as np import tensorflow as tf +from tensorflow import estimator as tf_estimator import tensorflow_hub as hub import tensorflow_text as text from sentencepiece import SentencePieceTrainer +from official.legacy.bert import configs from official.modeling import tf_utils -from official.nlp.bert import configs from official.nlp.configs import encoders from official.nlp.modeling import layers from official.nlp.modeling import models from official.nlp.tools import export_tfhub_lib -def _get_bert_config_or_encoder_config(use_bert_config, hidden_size, - num_hidden_layers, vocab_size=100): - """Returns config args for export_tfhub_lib._create_model().""" +def _get_bert_config_or_encoder_config(use_bert_config, + hidden_size, + num_hidden_layers, + encoder_type="albert", + vocab_size=100): + """Generates config args for export_tfhub_lib._create_model(). + + Args: + use_bert_config: bool. If True, returns legacy BertConfig. + hidden_size: int. + num_hidden_layers: int. + encoder_type: str. Can be ['albert', 'bert', 'bert_v2']. If use_bert_config + == True, then model_type is not used. + vocab_size: int. + + Returns: + bert_config, encoder_config. Only one is not None. If + `use_bert_config` == True, the first config is valid. Otherwise + `bert_config` == None. 
+ """ if use_bert_config: bert_config = configs.BertConfig( vocab_size=vocab_size, @@ -46,17 +64,31 @@ def _get_bert_config_or_encoder_config(use_bert_config, hidden_size, encoder_config = None else: bert_config = None - encoder_config = encoders.EncoderConfig( - type="albert", - albert=encoders.AlbertEncoderConfig( - vocab_size=vocab_size, - embedding_width=16, - hidden_size=hidden_size, - intermediate_size=32, - max_position_embeddings=128, - num_attention_heads=2, - num_layers=num_hidden_layers, - dropout_rate=0.1)) + if encoder_type == "albert": + encoder_config = encoders.EncoderConfig( + type="albert", + albert=encoders.AlbertEncoderConfig( + vocab_size=vocab_size, + embedding_width=16, + hidden_size=hidden_size, + intermediate_size=32, + max_position_embeddings=128, + num_attention_heads=2, + num_layers=num_hidden_layers, + dropout_rate=0.1)) + else: + # encoder_type can be 'bert' or 'bert_v2'. + model_config = encoders.BertEncoderConfig( + vocab_size=vocab_size, + embedding_size=16, + hidden_size=hidden_size, + intermediate_size=32, + max_position_embeddings=128, + num_attention_heads=2, + num_layers=num_hidden_layers, + dropout_rate=0.1) + kwargs = {"type": encoder_type, encoder_type: model_config} + encoder_config = encoders.EncoderConfig(**kwargs) return bert_config, encoder_config @@ -105,13 +137,18 @@ class ExportModelTest(tf.test.TestCase, parameterized.TestCase): alternative to BertTokenizer). """ - @parameterized.named_parameters(("Bert", True), ("Albert", False)) - def test_export_model(self, use_bert): + @parameterized.named_parameters( + ("Bert_Legacy", True, None), ("Albert", False, "albert"), + ("BertEncoder", False, "bert"), ("BertEncoderV2", False, "bert_v2")) + def test_export_model(self, use_bert, encoder_type): # Create the encoder and export it. 
hidden_size = 16 num_hidden_layers = 1 bert_config, encoder_config = _get_bert_config_or_encoder_config( - use_bert, hidden_size, num_hidden_layers) + use_bert, + hidden_size=hidden_size, + num_hidden_layers=num_hidden_layers, + encoder_type=encoder_type) bert_model, encoder = export_tfhub_lib._create_model( bert_config=bert_config, encoder_config=encoder_config, with_mlm=False) self.assertEmpty( @@ -151,8 +188,8 @@ class ExportModelTest(tf.test.TestCase, parameterized.TestCase): _read_asset(hub_layer.resolved_object.sp_model_file)) # Check restored weights. - self.assertEqual(len(bert_model.trainable_weights), - len(hub_layer.trainable_weights)) + self.assertEqual( + len(bert_model.trainable_weights), len(hub_layer.trainable_weights)) for source_weight, hub_weight in zip(bert_model.trainable_weights, hub_layer.trainable_weights): self.assertAllClose(source_weight.numpy(), hub_weight.numpy()) @@ -334,8 +371,8 @@ class ExportModelWithMLMTest(tf.test.TestCase, parameterized.TestCase): # Note that we set `_auto_track_sub_layers` to False when exporting the # SavedModel, so hub_layer has the same number of weights as bert_model; # otherwise, hub_layer will have extra weights from its `mlm` subobject. - self.assertEqual(len(bert_model.trainable_weights), - len(hub_layer.trainable_weights)) + self.assertEqual( + len(bert_model.trainable_weights), len(hub_layer.trainable_weights)) for source_weight, hub_weight in zip(bert_model.trainable_weights, hub_layer.trainable_weights): self.assertAllClose(source_weight, hub_weight) @@ -473,10 +510,11 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): The absolute filename of the created vocab file. """ full_vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]" - ] + ["[MASK]"]*add_mask_token + vocab + ] + ["[MASK]"] * add_mask_token + vocab path = os.path.join( - tempfile.mkdtemp(dir=self.get_temp_dir(), # New subdir each time. 
- prefix=_STRING_NOT_TO_LEAK), + tempfile.mkdtemp( + dir=self.get_temp_dir(), # New subdir each time. + prefix=_STRING_NOT_TO_LEAK), filename) with tf.io.gfile.GFile(path, "w") as f: f.write("\n".join(full_vocab + [""])) @@ -522,22 +560,30 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): model_prefix=model_prefix, model_type="word", input=input_file, - pad_id=0, unk_id=1, control_symbols=control_symbols, + pad_id=0, + unk_id=1, + control_symbols=control_symbols, vocab_size=full_vocab_size, - bos_id=full_vocab_size-2, eos_id=full_vocab_size-1) - SentencePieceTrainer.Train( - " ".join(["--{}={}".format(k, v) for k, v in flags.items()])) + bos_id=full_vocab_size - 2, + eos_id=full_vocab_size - 1) + SentencePieceTrainer.Train(" ".join( + ["--{}={}".format(k, v) for k, v in flags.items()])) return model_prefix + ".model" - def _do_export(self, vocab, do_lower_case, default_seq_length=128, - tokenize_with_offsets=True, use_sp_model=False, - experimental_disable_assert=False, add_mask_token=False): + def _do_export(self, + vocab, + do_lower_case, + default_seq_length=128, + tokenize_with_offsets=True, + use_sp_model=False, + experimental_disable_assert=False, + add_mask_token=False): """Runs SavedModel export and returns the export_path.""" export_path = tempfile.mkdtemp(dir=self.get_temp_dir()) vocab_file = sp_model_file = None if use_sp_model: - sp_model_file = self._make_sp_model_file(vocab, - add_mask_token=add_mask_token) + sp_model_file = self._make_sp_model_file( + vocab, add_mask_token=add_mask_token) else: vocab_file = self._make_vocab_file(vocab, add_mask_token=add_mask_token) export_tfhub_lib.export_preprocessing( @@ -554,19 +600,24 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): def test_no_leaks(self): """Tests not leaking the path to the original vocab file.""" - path = self._do_export( - ["d", "ef", "abc", "xy"], do_lower_case=True, use_sp_model=False) + path = self._do_export(["d", "ef", "abc", "xy"], 
+ do_lower_case=True, + use_sp_model=False) with tf.io.gfile.GFile(os.path.join(path, "saved_model.pb"), "rb") as f: self.assertFalse( # pylint: disable=g-generic-assert _STRING_NOT_TO_LEAK.encode("ascii") in f.read()) @parameterized.named_parameters(("Bert", False), ("Sentencepiece", True)) def test_exported_callables(self, use_sp_model): - preprocess = tf.saved_model.load(self._do_export( - ["d", "ef", "abc", "xy"], do_lower_case=True, - tokenize_with_offsets=not use_sp_model, # TODO(b/181866850): drop this. - experimental_disable_assert=True, # TODO(b/175369555): drop this. - use_sp_model=use_sp_model)) + preprocess = tf.saved_model.load( + self._do_export( + ["d", "ef", "abc", "xy"], + do_lower_case=True, + # TODO(b/181866850): drop this. + tokenize_with_offsets=not use_sp_model, + # TODO(b/175369555): drop this. + experimental_disable_assert=True, + use_sp_model=use_sp_model)) def fold_dim(rt): """Removes the word/subword distinction of BertTokenizer.""" @@ -575,18 +626,20 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): # .tokenize() inputs = tf.constant(["abc d ef", "ABC D EF d"]) token_ids = preprocess.tokenize(inputs) - self.assertAllEqual(fold_dim(token_ids), - tf.ragged.constant([[6, 4, 5], - [6, 4, 5, 4]])) + self.assertAllEqual( + fold_dim(token_ids), tf.ragged.constant([[6, 4, 5], [6, 4, 5, 4]])) special_tokens_dict = { k: v.numpy().item() # Expecting eager Tensor, converting to Python. 
- for k, v in preprocess.tokenize.get_special_tokens_dict().items()} - self.assertDictEqual(special_tokens_dict, - dict(padding_id=0, - start_of_sequence_id=2, - end_of_segment_id=3, - vocab_size=4+6 if use_sp_model else 4+4)) + for k, v in preprocess.tokenize.get_special_tokens_dict().items() + } + self.assertDictEqual( + special_tokens_dict, + dict( + padding_id=0, + start_of_sequence_id=2, + end_of_segment_id=3, + vocab_size=4 + 6 if use_sp_model else 4 + 4)) # .tokenize_with_offsets() if use_sp_model: @@ -595,92 +648,104 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): else: token_ids, start_offsets, limit_offsets = ( preprocess.tokenize_with_offsets(inputs)) - self.assertAllEqual(fold_dim(token_ids), - tf.ragged.constant([[6, 4, 5], - [6, 4, 5, 4]])) - self.assertAllEqual(fold_dim(start_offsets), - tf.ragged.constant([[0, 4, 6], - [0, 4, 6, 9]])) - self.assertAllEqual(fold_dim(limit_offsets), - tf.ragged.constant([[3, 5, 8], - [3, 5, 8, 10]])) + self.assertAllEqual( + fold_dim(token_ids), tf.ragged.constant([[6, 4, 5], [6, 4, 5, 4]])) + self.assertAllEqual( + fold_dim(start_offsets), tf.ragged.constant([[0, 4, 6], [0, 4, 6, + 9]])) + self.assertAllEqual( + fold_dim(limit_offsets), tf.ragged.constant([[3, 5, 8], [3, 5, 8, + 10]])) self.assertIs(preprocess.tokenize.get_special_tokens_dict, preprocess.tokenize_with_offsets.get_special_tokens_dict) # Root callable. 
bert_inputs = preprocess(inputs) self.assertAllEqual(bert_inputs["input_word_ids"].shape.as_list(), [2, 128]) - self.assertAllEqual(bert_inputs["input_word_ids"][:, :10], - tf.constant([[2, 6, 4, 5, 3, 0, 0, 0, 0, 0], - [2, 6, 4, 5, 4, 3, 0, 0, 0, 0]])) + self.assertAllEqual( + bert_inputs["input_word_ids"][:, :10], + tf.constant([[2, 6, 4, 5, 3, 0, 0, 0, 0, 0], + [2, 6, 4, 5, 4, 3, 0, 0, 0, 0]])) self.assertAllEqual(bert_inputs["input_mask"].shape.as_list(), [2, 128]) - self.assertAllEqual(bert_inputs["input_mask"][:, :10], - tf.constant([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0], - [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]])) + self.assertAllEqual( + bert_inputs["input_mask"][:, :10], + tf.constant([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0], + [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]])) self.assertAllEqual(bert_inputs["input_type_ids"].shape.as_list(), [2, 128]) - self.assertAllEqual(bert_inputs["input_type_ids"][:, :10], - tf.constant([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], - [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])) + self.assertAllEqual( + bert_inputs["input_type_ids"][:, :10], + tf.constant([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])) # .bert_pack_inputs() inputs_2 = tf.constant(["d xy", "xy abc"]) token_ids_2 = preprocess.tokenize(inputs_2) - bert_inputs = preprocess.bert_pack_inputs( - [token_ids, token_ids_2], seq_length=256) + bert_inputs = preprocess.bert_pack_inputs([token_ids, token_ids_2], + seq_length=256) self.assertAllEqual(bert_inputs["input_word_ids"].shape.as_list(), [2, 256]) - self.assertAllEqual(bert_inputs["input_word_ids"][:, :10], - tf.constant([[2, 6, 4, 5, 3, 4, 7, 3, 0, 0], - [2, 6, 4, 5, 4, 3, 7, 6, 3, 0]])) + self.assertAllEqual( + bert_inputs["input_word_ids"][:, :10], + tf.constant([[2, 6, 4, 5, 3, 4, 7, 3, 0, 0], + [2, 6, 4, 5, 4, 3, 7, 6, 3, 0]])) self.assertAllEqual(bert_inputs["input_mask"].shape.as_list(), [2, 256]) - self.assertAllEqual(bert_inputs["input_mask"][:, :10], - tf.constant([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0], - [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]])) + 
self.assertAllEqual( + bert_inputs["input_mask"][:, :10], + tf.constant([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0], + [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]])) self.assertAllEqual(bert_inputs["input_type_ids"].shape.as_list(), [2, 256]) - self.assertAllEqual(bert_inputs["input_type_ids"][:, :10], - tf.constant([[0, 0, 0, 0, 0, 1, 1, 1, 0, 0], - [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]])) + self.assertAllEqual( + bert_inputs["input_type_ids"][:, :10], + tf.constant([[0, 0, 0, 0, 0, 1, 1, 1, 0, 0], + [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]])) # For BertTokenizer only: repeat relevant parts for do_lower_case=False, # default_seq_length=10, experimental_disable_assert=False, # tokenize_with_offsets=False, and without folding the word/subword dimension. def test_cased_length10(self): - preprocess = tf.saved_model.load(self._do_export( - ["d", "##ef", "abc", "ABC"], - do_lower_case=False, default_seq_length=10, - tokenize_with_offsets=False, - use_sp_model=False, - experimental_disable_assert=False)) + preprocess = tf.saved_model.load( + self._do_export(["d", "##ef", "abc", "ABC"], + do_lower_case=False, + default_seq_length=10, + tokenize_with_offsets=False, + use_sp_model=False, + experimental_disable_assert=False)) inputs = tf.constant(["abc def", "ABC DEF"]) token_ids = preprocess.tokenize(inputs) - self.assertAllEqual(token_ids, tf.ragged.constant([[[6], [4, 5]], - [[7], [1]]])) + self.assertAllEqual(token_ids, + tf.ragged.constant([[[6], [4, 5]], [[7], [1]]])) self.assertFalse(hasattr(preprocess, "tokenize_with_offsets")) bert_inputs = preprocess(inputs) - self.assertAllEqual(bert_inputs["input_word_ids"], - tf.constant([[2, 6, 4, 5, 3, 0, 0, 0, 0, 0], - [2, 7, 1, 3, 0, 0, 0, 0, 0, 0]])) - self.assertAllEqual(bert_inputs["input_mask"], - tf.constant([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0], - [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]])) - self.assertAllEqual(bert_inputs["input_type_ids"], - tf.constant([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], - [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])) + self.assertAllEqual( + 
bert_inputs["input_word_ids"], + tf.constant([[2, 6, 4, 5, 3, 0, 0, 0, 0, 0], + [2, 7, 1, 3, 0, 0, 0, 0, 0, 0]])) + self.assertAllEqual( + bert_inputs["input_mask"], + tf.constant([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0], + [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]])) + self.assertAllEqual( + bert_inputs["input_type_ids"], + tf.constant([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])) inputs_2 = tf.constant(["d ABC", "ABC abc"]) token_ids_2 = preprocess.tokenize(inputs_2) bert_inputs = preprocess.bert_pack_inputs([token_ids, token_ids_2]) # Test default seq_length=10. - self.assertAllEqual(bert_inputs["input_word_ids"], - tf.constant([[2, 6, 4, 5, 3, 4, 7, 3, 0, 0], - [2, 7, 1, 3, 7, 6, 3, 0, 0, 0]])) - self.assertAllEqual(bert_inputs["input_mask"], - tf.constant([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0], - [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]])) - self.assertAllEqual(bert_inputs["input_type_ids"], - tf.constant([[0, 0, 0, 0, 0, 1, 1, 1, 0, 0], - [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]])) + self.assertAllEqual( + bert_inputs["input_word_ids"], + tf.constant([[2, 6, 4, 5, 3, 4, 7, 3, 0, 0], + [2, 7, 1, 3, 7, 6, 3, 0, 0, 0]])) + self.assertAllEqual( + bert_inputs["input_mask"], + tf.constant([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0], + [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]])) + self.assertAllEqual( + bert_inputs["input_type_ids"], + tf.constant([[0, 0, 0, 0, 0, 1, 1, 1, 0, 0], + [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]])) # XLA requires fixed shapes for tensors found in graph mode. # Statically known shapes in Python are a particularly firm way to @@ -689,16 +754,21 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): # inference when applied to fully or partially known input shapes. @parameterized.named_parameters(("Bert", False), ("Sentencepiece", True)) def test_shapes(self, use_sp_model): - preprocess = tf.saved_model.load(self._do_export( - ["abc", "def"], do_lower_case=True, - tokenize_with_offsets=not use_sp_model, # TODO(b/181866850): drop this. 
- experimental_disable_assert=True, # TODO(b/175369555): drop this. - use_sp_model=use_sp_model)) + preprocess = tf.saved_model.load( + self._do_export( + ["abc", "def"], + do_lower_case=True, + # TODO(b/181866850): drop this. + tokenize_with_offsets=not use_sp_model, + # TODO(b/175369555): drop this. + experimental_disable_assert=True, + use_sp_model=use_sp_model)) def expected_bert_input_shapes(batch_size, seq_length): - return dict(input_word_ids=[batch_size, seq_length], - input_mask=[batch_size, seq_length], - input_type_ids=[batch_size, seq_length]) + return dict( + input_word_ids=[batch_size, seq_length], + input_mask=[batch_size, seq_length], + input_type_ids=[batch_size, seq_length]) for batch_size in [7, None]: if use_sp_model: @@ -706,11 +776,9 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): else: token_out_shape = [batch_size, None, None] self.assertEqual( - _result_shapes_in_tf_function( - preprocess.tokenize, - tf.TensorSpec([batch_size], tf.string)), - token_out_shape, - "with batch_size=%s" % batch_size) + _result_shapes_in_tf_function(preprocess.tokenize, + tf.TensorSpec([batch_size], tf.string)), + token_out_shape, "with batch_size=%s" % batch_size) # TODO(b/181866850): Enable tokenize_with_offsets when it works and test. 
if use_sp_model: self.assertFalse(hasattr(preprocess, "tokenize_with_offsets")) @@ -718,8 +786,7 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): self.assertEqual( _result_shapes_in_tf_function( preprocess.tokenize_with_offsets, - tf.TensorSpec([batch_size], tf.string)), - [token_out_shape] * 3, + tf.TensorSpec([batch_size], tf.string)), [token_out_shape] * 3, "with batch_size=%s" % batch_size) self.assertEqual( _result_shapes_in_tf_function( @@ -737,7 +804,9 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): def test_reexport(self, use_sp_model): """Test that preprocess keeps working after another save/load cycle.""" path1 = self._do_export( - ["d", "ef", "abc", "xy"], do_lower_case=True, default_seq_length=10, + ["d", "ef", "abc", "xy"], + do_lower_case=True, + default_seq_length=10, tokenize_with_offsets=False, experimental_disable_assert=True, # TODO(b/175369555): drop this. use_sp_model=use_sp_model) @@ -752,35 +821,46 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): inputs = tf.constant(["abc d ef", "ABC D EF d"]) bert_inputs = model2(inputs) - self.assertAllEqual(bert_inputs["input_word_ids"], - tf.constant([[2, 6, 4, 5, 3, 0, 0, 0, 0, 0], - [2, 6, 4, 5, 4, 3, 0, 0, 0, 0]])) - self.assertAllEqual(bert_inputs["input_mask"], - tf.constant([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0], - [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]])) - self.assertAllEqual(bert_inputs["input_type_ids"], - tf.constant([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], - [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])) + self.assertAllEqual( + bert_inputs["input_word_ids"], + tf.constant([[2, 6, 4, 5, 3, 0, 0, 0, 0, 0], + [2, 6, 4, 5, 4, 3, 0, 0, 0, 0]])) + self.assertAllEqual( + bert_inputs["input_mask"], + tf.constant([[1, 1, 1, 1, 1, 0, 0, 0, 0, 0], + [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]])) + self.assertAllEqual( + bert_inputs["input_type_ids"], + tf.constant([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])) 
@parameterized.named_parameters(("Bert", True), ("Albert", False)) def test_preprocessing_for_mlm(self, use_bert): """Combines both SavedModel types and TF.text helpers for MLM.""" # Create the preprocessing SavedModel with a [MASK] token. - non_special_tokens = ["hello", "world", - "nice", "movie", "great", "actors", - "quick", "fox", "lazy", "dog"] - preprocess = tf.saved_model.load(self._do_export( - non_special_tokens, do_lower_case=True, - tokenize_with_offsets=use_bert, # TODO(b/181866850): drop this. - experimental_disable_assert=True, # TODO(b/175369555): drop this. - add_mask_token=True, use_sp_model=not use_bert)) + non_special_tokens = [ + "hello", "world", "nice", "movie", "great", "actors", "quick", "fox", + "lazy", "dog" + ] + + preprocess = tf.saved_model.load( + self._do_export( + non_special_tokens, + do_lower_case=True, + tokenize_with_offsets=use_bert, # TODO(b/181866850): drop this. + experimental_disable_assert=True, # TODO(b/175369555): drop this. + add_mask_token=True, + use_sp_model=not use_bert)) vocab_size = len(non_special_tokens) + (5 if use_bert else 7) # Create the encoder SavedModel with an .mlm subobject. hidden_size = 16 num_hidden_layers = 2 bert_config, encoder_config = _get_bert_config_or_encoder_config( - use_bert, hidden_size, num_hidden_layers, vocab_size) + use_bert_config=use_bert, + hidden_size=hidden_size, + num_hidden_layers=num_hidden_layers, + vocab_size=vocab_size) _, pretrainer = export_tfhub_lib._create_model( bert_config=bert_config, encoder_config=encoder_config, with_mlm=True) model_checkpoint_dir = os.path.join(self.get_temp_dir(), "checkpoint") @@ -814,8 +894,10 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): self.assertEqual(mask_id, 4) # A batch of 3 segment pairs. 
- raw_segments = [tf.constant(["hello", "nice movie", "quick fox"]), - tf.constant(["world", "great actors", "lazy dog"])] + raw_segments = [ + tf.constant(["hello", "nice movie", "quick fox"]), + tf.constant(["world", "great actors", "lazy dog"]) + ] batch_size = 3 # Misc hyperparameters. @@ -842,18 +924,18 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): selection_rate=0.5, # Adjusted for the short test examples. unselectable_ids=[start_of_sequence_id, end_of_segment_id]), mask_values_chooser=text.MaskValuesChooser( - vocab_size=vocab_size, mask_token=mask_id, + vocab_size=vocab_size, + mask_token=mask_id, # Always put [MASK] to have a predictable result. - mask_token_rate=1.0, random_token_rate=0.0)) + mask_token_rate=1.0, + random_token_rate=0.0)) # Pad to fixed-length Transformer encoder inputs. - input_word_ids, _ = text.pad_model_inputs(masked_input_ids, - seq_length, - pad_value=padding_id) - input_type_ids, input_mask = text.pad_model_inputs(segment_ids, seq_length, - pad_value=0) - masked_lm_positions, _ = text.pad_model_inputs(masked_lm_positions, - max_selections_per_seq, - pad_value=0) + input_word_ids, _ = text.pad_model_inputs( + masked_input_ids, seq_length, pad_value=padding_id) + input_type_ids, input_mask = text.pad_model_inputs( + segment_ids, seq_length, pad_value=0) + masked_lm_positions, _ = text.pad_model_inputs( + masked_lm_positions, max_selections_per_seq, pad_value=0) masked_lm_positions = tf.cast(masked_lm_positions, tf.int32) num_predictions = int(tf.shape(masked_lm_positions)[1]) @@ -865,7 +947,8 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): # [CLS] nice movie [SEP] great actors [SEP] [2, 7, 8, 3, 9, 10, 3, 0, 0, 0], # [CLS] brown fox [SEP] lazy dog [SEP] - [2, 11, 12, 3, 13, 14, 3, 0, 0, 0]]) + [2, 11, 12, 3, 13, 14, 3, 0, 0, 0] + ]) for i in range(batch_size): for j in range(num_predictions): k = int(masked_lm_positions[i, j]) @@ -896,15 +979,17 @@ class 
ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): @parameterized.named_parameters(("Bert", False), ("Sentencepiece", True)) def test_special_tokens_in_estimator(self, use_sp_model): """Tests getting special tokens without an Eager init context.""" - preprocess_export_path = self._do_export( - ["d", "ef", "abc", "xy"], do_lower_case=True, - use_sp_model=use_sp_model, tokenize_with_offsets=False) + preprocess_export_path = self._do_export(["d", "ef", "abc", "xy"], + do_lower_case=True, + use_sp_model=use_sp_model, + tokenize_with_offsets=False) def _get_special_tokens_dict(obj): """Returns special tokens of restored tokenizer as Python values.""" if tf.executing_eagerly(): - special_tokens_numpy = {k: v.numpy() - for k, v in obj.get_special_tokens_dict()} + special_tokens_numpy = { + k: v.numpy() for k, v in obj.get_special_tokens_dict() + } else: with tf.Graph().as_default(): # This code expects `get_special_tokens_dict()` to be a tf.function @@ -913,8 +998,10 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): special_tokens_tensors = obj.get_special_tokens_dict() with tf.compat.v1.Session() as sess: special_tokens_numpy = sess.run(special_tokens_tensors) - return {k: v.item() # Numpy to Python. - for k, v in special_tokens_numpy.items()} + return { + k: v.item() # Numpy to Python. + for k, v in special_tokens_numpy.items() + } def input_fn(): self.assertFalse(tf.executing_eagerly()) @@ -927,7 +1014,8 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): self.assertIsInstance(v, int, "Unexpected type for {}".format(k)) tokens = tokenize(sentences) packed_inputs = layers.BertPackInputs( - 4, special_tokens_dict=special_tokens_dict)(tokens) + 4, special_tokens_dict=special_tokens_dict)( + tokens) preprocessing = tf.keras.Model(sentences, packed_inputs) # Map the dataset. 
ds = tf.data.Dataset.from_tensors( @@ -937,22 +1025,22 @@ class ExportPreprocessingTest(tf.test.TestCase, parameterized.TestCase): def model_fn(features, labels, mode): del labels # Unused. - return tf.estimator.EstimatorSpec(mode=mode, - predictions=features["input_word_ids"]) + return tf_estimator.EstimatorSpec( + mode=mode, predictions=features["input_word_ids"]) - estimator = tf.estimator.Estimator(model_fn=model_fn) + estimator = tf_estimator.Estimator(model_fn=model_fn) outputs = list(estimator.predict(input_fn)) - self.assertAllEqual(outputs, np.array([[2, 6, 3, 0], - [2, 4, 5, 3]])) + self.assertAllEqual(outputs, np.array([[2, 6, 3, 0], [2, 4, 5, 3]])) # TODO(b/175369555): Remove that code and its test. @parameterized.named_parameters(("Bert", False), ("Sentencepiece", True)) def test_check_no_assert(self, use_sp_model): """Tests the self-check during export without assertions.""" - preprocess_export_path = self._do_export( - ["d", "ef", "abc", "xy"], do_lower_case=True, - use_sp_model=use_sp_model, tokenize_with_offsets=False, - experimental_disable_assert=False) + preprocess_export_path = self._do_export(["d", "ef", "abc", "xy"], + do_lower_case=True, + use_sp_model=use_sp_model, + tokenize_with_offsets=False, + experimental_disable_assert=False) with self.assertRaisesRegex(AssertionError, r"failed to suppress \d+ Assert ops"): export_tfhub_lib._check_no_assert(preprocess_export_path) @@ -963,8 +1051,8 @@ def _result_shapes_in_tf_function(fn, *args, **kwargs): Args: fn: A callable. - *args: TensorSpecs for Tensor-valued arguments and actual values - for Python-valued arguments to fn. + *args: TensorSpecs for Tensor-valued arguments and actual values for + Python-valued arguments to fn. **kwargs: Same for keyword arguments. 
Returns: diff --git a/official/nlp/bert/squad_evaluate_v1_1.py b/official/nlp/tools/squad_evaluate_v1_1.py similarity index 98% rename from official/nlp/bert/squad_evaluate_v1_1.py rename to official/nlp/tools/squad_evaluate_v1_1.py index a39f571c37b002ab10cfe36a1454827d91512945..795fa471e3dff93f0cc153a2062905bd9ccf52b9 100644 --- a/official/nlp/bert/squad_evaluate_v1_1.py +++ b/official/nlp/tools/squad_evaluate_v1_1.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/bert/squad_evaluate_v2_0.py b/official/nlp/tools/squad_evaluate_v2_0.py similarity index 99% rename from official/nlp/bert/squad_evaluate_v2_0.py rename to official/nlp/tools/squad_evaluate_v2_0.py index 12c5a7e3d6b406e45e4f91580f8b4198733db37c..ac02f72bec56dab8bb0c2d9a9cfae908adb1142f 100644 --- a/official/nlp/bert/squad_evaluate_v2_0.py +++ b/official/nlp/tools/squad_evaluate_v2_0.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/tools/tf1_bert_checkpoint_converter_lib.py b/official/nlp/tools/tf1_bert_checkpoint_converter_lib.py new file mode 100644 index 0000000000000000000000000000000000000000..b34bd00088f2da866093c72619482a01f15677f4 --- /dev/null +++ b/official/nlp/tools/tf1_bert_checkpoint_converter_lib.py @@ -0,0 +1,201 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Convert checkpoints created by Estimator (tf1) to be Keras compatible.""" + +import numpy as np +import tensorflow.compat.v1 as tf # TF 1.x + +# Mapping between old <=> new names. The source pattern in original variable +# name will be replaced by destination pattern. +BERT_NAME_REPLACEMENTS = ( + ("bert", "bert_model"), + ("embeddings/word_embeddings", "word_embeddings/embeddings"), + ("embeddings/token_type_embeddings", + "embedding_postprocessor/type_embeddings"), + ("embeddings/position_embeddings", + "embedding_postprocessor/position_embeddings"), + ("embeddings/LayerNorm", "embedding_postprocessor/layer_norm"), + ("attention/self", "self_attention"), + ("attention/output/dense", "self_attention_output"), + ("attention/output/LayerNorm", "self_attention_layer_norm"), + ("intermediate/dense", "intermediate"), + ("output/dense", "output"), + ("output/LayerNorm", "output_layer_norm"), + ("pooler/dense", "pooler_transform"), +) + +BERT_V2_NAME_REPLACEMENTS = ( + ("bert/", ""), + ("encoder", "transformer"), + ("embeddings/word_embeddings", "word_embeddings/embeddings"), + ("embeddings/token_type_embeddings", "type_embeddings/embeddings"), + ("embeddings/position_embeddings", "position_embedding/embeddings"), + ("embeddings/LayerNorm", "embeddings/layer_norm"), + ("attention/self", "self_attention"), + ("attention/output/dense", "self_attention/attention_output"), + ("attention/output/LayerNorm", "self_attention_layer_norm"), + ("intermediate/dense", "intermediate"), + ("output/dense", "output"), + ("output/LayerNorm", "output_layer_norm"), + 
("pooler/dense", "pooler_transform"), + ("cls/predictions", "bert/cls/predictions"), + ("cls/predictions/output_bias", "cls/predictions/output_bias/bias"), + ("cls/seq_relationship/output_bias", "predictions/transform/logits/bias"), + ("cls/seq_relationship/output_weights", + "predictions/transform/logits/kernel"), +) + +BERT_PERMUTATIONS = () + +BERT_V2_PERMUTATIONS = (("cls/seq_relationship/output_weights", (1, 0)),) + + +def _bert_name_replacement(var_name, name_replacements): + """Gets the variable name replacement.""" + for src_pattern, tgt_pattern in name_replacements: + if src_pattern in var_name: + old_var_name = var_name + var_name = var_name.replace(src_pattern, tgt_pattern) + tf.logging.info("Converted: %s --> %s", old_var_name, var_name) + return var_name + + +def _has_exclude_patterns(name, exclude_patterns): + """Checks if a string contains substrings that match patterns to exclude.""" + for p in exclude_patterns: + if p in name: + return True + return False + + +def _get_permutation(name, permutations): + """Checks whether a variable requires transposition by pattern matching.""" + for src_pattern, permutation in permutations: + if src_pattern in name: + tf.logging.info("Permuted: %s --> %s", name, permutation) + return permutation + + return None + + +def _get_new_shape(name, shape, num_heads): + """Checks whether a variable requires reshape by pattern matching.""" + if "self_attention/attention_output/kernel" in name: + return tuple([num_heads, shape[0] // num_heads, shape[1]]) + if "self_attention/attention_output/bias" in name: + return shape + + patterns = [ + "self_attention/query", "self_attention/value", "self_attention/key" + ] + for pattern in patterns: + if pattern in name: + if "kernel" in name: + return tuple([shape[0], num_heads, shape[1] // num_heads]) + if "bias" in name: + return tuple([num_heads, shape[0] // num_heads]) + return None + + +def create_v2_checkpoint(model, + src_checkpoint, + output_path, + 
+ checkpoint_model_name="model"): + """Converts a name-based matched TF V1 checkpoint to TF V2 checkpoint.""" + # Uses streaming-restore in eager mode to read V1 name-based checkpoints. + model.load_weights(src_checkpoint).assert_existing_objects_matched() + if hasattr(model, "checkpoint_items"): + checkpoint_items = model.checkpoint_items + else: + checkpoint_items = {} + + checkpoint_items[checkpoint_model_name] = model + checkpoint = tf.train.Checkpoint(**checkpoint_items) + checkpoint.save(output_path) + + +def convert(checkpoint_from_path, + checkpoint_to_path, + num_heads, + name_replacements, + permutations, + exclude_patterns=None): + """Migrates the names of variables within a checkpoint. + + Args: + checkpoint_from_path: Path to source checkpoint to be read in. + checkpoint_to_path: Path to checkpoint to be written out. + num_heads: The number of heads of the model. + name_replacements: A list of tuples of the form (match_str, replace_str) + describing variable names to adjust. + permutations: A list of tuples of the form (match_str, permutation) + describing permutations to apply to given variables. Note that match_str + should match the original variable name, not the replaced one. + exclude_patterns: A list of string patterns to exclude variables from + checkpoint conversion. + + Returns: + None. The renamed (and possibly reshaped or transposed) variables are + written to `checkpoint_to_path` as a name-based checkpoint. + """ + with tf.Graph().as_default(): + tf.logging.info("Reading checkpoint_from_path %s", checkpoint_from_path) + reader = tf.train.NewCheckpointReader(checkpoint_from_path) + name_shape_map = reader.get_variable_to_shape_map() + new_variable_map = {} + conversion_map = {} + for var_name in name_shape_map: + if exclude_patterns and _has_exclude_patterns(var_name, exclude_patterns): + continue + # Get the original tensor data. + tensor = reader.get_tensor(var_name) + + # Look up the new variable name, if any.
+ new_var_name = _bert_name_replacement(var_name, name_replacements) + + # See if we need to reshape the underlying tensor. + new_shape = None + if num_heads > 0: + new_shape = _get_new_shape(new_var_name, tensor.shape, num_heads) + if new_shape: + tf.logging.info("Variable %s has a shape change from %s to %s", + var_name, tensor.shape, new_shape) + tensor = np.reshape(tensor, new_shape) + + # See if we need to permute the underlying tensor. + permutation = _get_permutation(var_name, permutations) + if permutation: + tensor = np.transpose(tensor, permutation) + + # Create a new variable with the possibly-reshaped or transposed tensor. + var = tf.Variable(tensor, name=var_name) + + # Save the variable into the new variable map. + new_variable_map[new_var_name] = var + + # Keep a list of converted variables for sanity checking. + if new_var_name != var_name: + conversion_map[var_name] = new_var_name + + saver = tf.train.Saver(new_variable_map) + + with tf.Session() as sess: + sess.run(tf.global_variables_initializer()) + tf.logging.info("Writing checkpoint_to_path %s", checkpoint_to_path) + saver.save(sess, checkpoint_to_path, write_meta_graph=False) + + tf.logging.info("Summary:") + tf.logging.info(" Converted %d variable name(s).", len(new_variable_map)) + tf.logging.info(" Converted: %s", str(conversion_map)) diff --git a/official/nlp/tools/tf2_albert_encoder_checkpoint_converter.py b/official/nlp/tools/tf2_albert_encoder_checkpoint_converter.py index 57b32a02fa4ac9679029e84d820c8eef8d23042c..4583e4c4c6525b9f9e72d2a78548a43eeb7f8b1f 100644 --- a/official/nlp/tools/tf2_albert_encoder_checkpoint_converter.py +++ b/official/nlp/tools/tf2_albert_encoder_checkpoint_converter.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
@@ -23,11 +23,11 @@ from absl import app from absl import flags import tensorflow as tf -from official.legacy.nlp.albert import configs +from official.legacy.albert import configs from official.modeling import tf_utils -from official.nlp.bert import tf1_checkpoint_converter_lib from official.nlp.modeling import models from official.nlp.modeling import networks +from official.nlp.tools import tf1_bert_checkpoint_converter_lib FLAGS = flags.FLAGS @@ -128,12 +128,12 @@ def convert_checkpoint(bert_config, output_path, v1_checkpoint, # Create a temporary V1 name-converted checkpoint in the output directory. temporary_checkpoint_dir = os.path.join(output_dir, "temp_v1") temporary_checkpoint = os.path.join(temporary_checkpoint_dir, "ckpt") - tf1_checkpoint_converter_lib.convert( + tf1_bert_checkpoint_converter_lib.convert( checkpoint_from_path=v1_checkpoint, checkpoint_to_path=temporary_checkpoint, num_heads=bert_config.num_attention_heads, name_replacements=ALBERT_NAME_REPLACEMENTS, - permutations=tf1_checkpoint_converter_lib.BERT_V2_PERMUTATIONS, + permutations=tf1_bert_checkpoint_converter_lib.BERT_V2_PERMUTATIONS, exclude_patterns=["adam", "Adam"]) # Create a V2 checkpoint from the temporary checkpoint. @@ -144,9 +144,8 @@ def convert_checkpoint(bert_config, output_path, v1_checkpoint, else: raise ValueError("Unsupported converted_model: %s" % converted_model) - tf1_checkpoint_converter_lib.create_v2_checkpoint(model, temporary_checkpoint, - output_path, - checkpoint_model_name) + tf1_bert_checkpoint_converter_lib.create_v2_checkpoint( + model, temporary_checkpoint, output_path, checkpoint_model_name) # Clean up the temporary checkpoint, if it exists. 
try: diff --git a/official/nlp/tools/tf2_bert_encoder_checkpoint_converter.py b/official/nlp/tools/tf2_bert_encoder_checkpoint_converter.py new file mode 100644 index 0000000000000000000000000000000000000000..ddbff775faf703eaf6c745cffd0ce9f28142cb20 --- /dev/null +++ b/official/nlp/tools/tf2_bert_encoder_checkpoint_converter.py @@ -0,0 +1,160 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""A converter from a V1 BERT encoder checkpoint to a V2 encoder checkpoint. + +The conversion will yield an object-oriented checkpoint that can be used +to restore a BertEncoder or BertPretrainerV2 object (see the `converted_model` +FLAG below). 
+""" + +import os + +from absl import app +from absl import flags + +import tensorflow as tf +from official.legacy.bert import configs +from official.modeling import tf_utils +from official.nlp.modeling import models +from official.nlp.modeling import networks +from official.nlp.tools import tf1_bert_checkpoint_converter_lib + +FLAGS = flags.FLAGS + +flags.DEFINE_string("bert_config_file", None, + "Bert configuration file to define core bert layers.") +flags.DEFINE_string( + "checkpoint_to_convert", None, + "Initial checkpoint from a pretrained BERT model core (that is, only the " + "BertModel, with no task heads.)") +flags.DEFINE_string("converted_checkpoint_path", None, + "Name for the created object-based V2 checkpoint.") +flags.DEFINE_string("checkpoint_model_name", "encoder", + "The name of the model when saving the checkpoint, i.e., " + "the checkpoint will be saved using: " + "tf.train.Checkpoint(FLAGS.checkpoint_model_name=model).") +flags.DEFINE_enum( + "converted_model", "encoder", ["encoder", "pretrainer"], + "Whether to convert the checkpoint to a `BertEncoder` model or a " + "`BertPretrainerV2` model (with mlm but without classification heads).") + + +def _create_bert_model(cfg): + """Creates a BERT keras core model from BERT configuration. + + Args: + cfg: A `BertConfig` to create the core model. + + Returns: + A BertEncoder network. 
+ """ + bert_encoder = networks.BertEncoder( + vocab_size=cfg.vocab_size, + hidden_size=cfg.hidden_size, + num_layers=cfg.num_hidden_layers, + num_attention_heads=cfg.num_attention_heads, + intermediate_size=cfg.intermediate_size, + activation=tf_utils.get_activation(cfg.hidden_act), + dropout_rate=cfg.hidden_dropout_prob, + attention_dropout_rate=cfg.attention_probs_dropout_prob, + max_sequence_length=cfg.max_position_embeddings, + type_vocab_size=cfg.type_vocab_size, + initializer=tf.keras.initializers.TruncatedNormal( + stddev=cfg.initializer_range), + embedding_width=cfg.embedding_size) + + return bert_encoder + + +def _create_bert_pretrainer_model(cfg): + """Creates a BERT keras core model from BERT configuration. + + Args: + cfg: A `BertConfig` to create the core model. + + Returns: + A BertPretrainerV2 model. + """ + bert_encoder = _create_bert_model(cfg) + pretrainer = models.BertPretrainerV2( + encoder_network=bert_encoder, + mlm_activation=tf_utils.get_activation(cfg.hidden_act), + mlm_initializer=tf.keras.initializers.TruncatedNormal( + stddev=cfg.initializer_range)) + # Makes sure the pretrainer variables are created. + _ = pretrainer(pretrainer.inputs) + return pretrainer + + +def convert_checkpoint(bert_config, + output_path, + v1_checkpoint, + checkpoint_model_name="model", + converted_model="encoder"): + """Converts a V1 checkpoint into an OO V2 checkpoint.""" + output_dir, _ = os.path.split(output_path) + tf.io.gfile.makedirs(output_dir) + + # Create a temporary V1 name-converted checkpoint in the output directory. 
+ temporary_checkpoint_dir = os.path.join(output_dir, "temp_v1") + temporary_checkpoint = os.path.join(temporary_checkpoint_dir, "ckpt") + + tf1_bert_checkpoint_converter_lib.convert( + checkpoint_from_path=v1_checkpoint, + checkpoint_to_path=temporary_checkpoint, + num_heads=bert_config.num_attention_heads, + name_replacements=( + tf1_bert_checkpoint_converter_lib.BERT_V2_NAME_REPLACEMENTS), + permutations=tf1_bert_checkpoint_converter_lib.BERT_V2_PERMUTATIONS, + exclude_patterns=["adam", "Adam"]) + + if converted_model == "encoder": + model = _create_bert_model(bert_config) + elif converted_model == "pretrainer": + model = _create_bert_pretrainer_model(bert_config) + else: + raise ValueError("Unsupported converted_model: %s" % converted_model) + + # Create a V2 checkpoint from the temporary checkpoint. + tf1_bert_checkpoint_converter_lib.create_v2_checkpoint( + model, temporary_checkpoint, output_path, checkpoint_model_name) + + # Clean up the temporary checkpoint, if it exists. + try: + tf.io.gfile.rmtree(temporary_checkpoint_dir) + except tf.errors.OpError: + # If it doesn't exist, we don't need to clean it up; continue. 
+ pass + + +def main(argv): + if len(argv) > 1: + raise app.UsageError("Too many command-line arguments.") + + output_path = FLAGS.converted_checkpoint_path + v1_checkpoint = FLAGS.checkpoint_to_convert + checkpoint_model_name = FLAGS.checkpoint_model_name + converted_model = FLAGS.converted_model + bert_config = configs.BertConfig.from_json_file(FLAGS.bert_config_file) + convert_checkpoint( + bert_config=bert_config, + output_path=output_path, + v1_checkpoint=v1_checkpoint, + checkpoint_model_name=checkpoint_model_name, + converted_model=converted_model) + + +if __name__ == "__main__": + app.run(main) diff --git a/official/nlp/bert/tokenization.py b/official/nlp/tools/tokenization.py similarity index 99% rename from official/nlp/bert/tokenization.py rename to official/nlp/tools/tokenization.py index ea1546e3c29f33c593c64a4341366254da328b86..65d2b7717b1adb79b6fbc0cd3f93517582b39e3e 100644 --- a/official/nlp/bert/tokenization.py +++ b/official/nlp/tools/tokenization.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/bert/tokenization_test.py b/official/nlp/tools/tokenization_test.py similarity index 97% rename from official/nlp/bert/tokenization_test.py rename to official/nlp/tools/tokenization_test.py index 07759de20b7c6eaf1a964c110da645215c10753a..c67a7e53d44890cd0652fd9cb1cad85cc1c4e024 100644 --- a/official/nlp/bert/tokenization_test.py +++ b/official/nlp/tools/tokenization_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -18,7 +18,7 @@ import tempfile import six import tensorflow as tf -from official.nlp.bert import tokenization +from official.nlp.tools import tokenization class TokenizationTest(tf.test.TestCase): diff --git a/official/nlp/train.py b/official/nlp/train.py index 6d022fdb67ce2d5860076d7f107bae82452c8ba1..feef3d54ea51885ab1dd5bb839885f5df2e3c9fd 100644 --- a/official/nlp/train.py +++ b/official/nlp/train.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/transformer/README.md b/official/nlp/transformer/README.md deleted file mode 100644 index a3aec5f9a052fa4e591df7c477011d626e6f257b..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/README.md +++ /dev/null @@ -1,220 +0,0 @@ -# Transformer Translation Model -This is an implementation of the Transformer translation model as described in -the [Attention is All You Need](https://arxiv.org/abs/1706.03762) paper. The -implementation leverages tf.keras and makes sure it is compatible with TF 2.x. - -**Warning: the features in the `transformer/` folder have been fully intergrated -into nlp/modeling. -Due to its dependencies, we will remove this folder after the model -garden 2.5 release. 
The model in `nlp/modeling/models/seq2seq_transformer.py` is -identical to the model in this folder.** - -## Contents - * [Contents](#contents) - * [Walkthrough](#walkthrough) - * [Detailed instructions](#detailed-instructions) - * [Environment preparation](#environment-preparation) - * [Download and preprocess datasets](#download-and-preprocess-datasets) - * [Model training and evaluation](#model-training-and-evaluation) - * [Implementation overview](#implementation-overview) - * [Model Definition](#model-definition) - * [Model Trainer](#model-trainer) - * [Test dataset](#test-dataset) - -## Walkthrough - -Below are the commands for running the Transformer model. See the -[Detailed instructions](#detailed-instructions) for more details on running the -model. - -``` -# Ensure that PYTHONPATH is correctly defined as described in -# https://github.com/tensorflow/models/tree/master/official#requirements -export PYTHONPATH="$PYTHONPATH:/path/to/models" - -cd /path/to/models/official/nlp/transformer - -# Export variables -PARAM_SET=big -DATA_DIR=$HOME/transformer/data -MODEL_DIR=$HOME/transformer/model_$PARAM_SET -VOCAB_FILE=$DATA_DIR/vocab.ende.32768 - -# Download training/evaluation/test datasets -python3 data_download.py --data_dir=$DATA_DIR - -# Train the model for 100000 steps and evaluate every 5000 steps on a single GPU. -# Each train step, takes 4096 tokens as a batch budget with 64 as sequence -# maximal length. -python3 transformer_main.py --data_dir=$DATA_DIR --model_dir=$MODEL_DIR \ - --vocab_file=$VOCAB_FILE --param_set=$PARAM_SET \ - --train_steps=100000 --steps_between_evals=5000 \ - --batch_size=4096 --max_length=64 \ - --bleu_source=$DATA_DIR/newstest2014.en \ - --bleu_ref=$DATA_DIR/newstest2014.de \ - --num_gpus=1 \ - --enable_time_history=false - -# Run during training in a separate process to get continuous updates, -# or after training is complete. -tensorboard --logdir=$MODEL_DIR -``` - -## Detailed instructions - - -0. 
### Environment preparation - - #### Add models repo to PYTHONPATH - Follow the instructions described in the [Requirements](https://github.com/tensorflow/models/tree/master/official#requirements) section to add the models folder to the python path. - - #### Export variables (optional) - - Export the following variables, or modify the values in each of the snippets below: - - ```shell - PARAM_SET=big - DATA_DIR=$HOME/transformer/data - MODEL_DIR=$HOME/transformer/model_$PARAM_SET - VOCAB_FILE=$DATA_DIR/vocab.ende.32768 - ``` - -1. ### Download and preprocess datasets - - [data_download.py](data_download.py) downloads and preprocesses the training and evaluation WMT datasets. After the data is downloaded and extracted, the training data is used to generate a vocabulary of subtokens. The evaluation and training strings are tokenized, and the resulting data is sharded, shuffled, and saved as TFRecords. - - 1.75GB of compressed data will be downloaded. In total, the raw files (compressed, extracted, and combined files) take up 8.4GB of disk space. The resulting TFRecord and vocabulary files are 722MB. The script takes around 40 minutes to run, with the bulk of the time spent downloading and ~15 minutes spent on preprocessing. - - Command to run: - ``` - python3 data_download.py --data_dir=$DATA_DIR - ``` - - Arguments: - * `--data_dir`: Path where the preprocessed TFRecord data, and vocab file will be saved. - * Use the `--help` or `-h` flag to get a full list of possible arguments. - -2. ### Model training and evaluation - - [transformer_main.py](transformer_main.py) creates a Transformer keras model, - and trains it uses keras model.fit(). - - Users need to adjust `batch_size` and `num_gpus` to get good performance - running multiple GPUs. - - **Note that:** - when using multiple GPUs or TPUs, this is the global batch size for all - devices. For example, if the batch size is `4096*4` and there are 4 devices, - each device will take 4096 tokens as a batch budget. 
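The global-batch note above comes down to simple arithmetic. The sketch below is illustrative only. `per_replica_token_budget` and `padded_sequences_per_batch` are hypothetical helpers, not part of `transformer_main.py`. It assumes the token budget is split evenly across replicas and must divide exactly. The second helper also assumes every sequence is padded to `--max_length`, as with static batching, so it only approximates how the real input pipeline fills a batch.

```python
def per_replica_token_budget(global_batch_size: int, num_replicas: int) -> int:
    """Splits a global token budget evenly across replicas (hypothetical helper)."""
    if num_replicas <= 0:
        raise ValueError("num_replicas must be positive")
    if global_batch_size % num_replicas:
        # Assumption: an uneven split is treated as a configuration error.
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible by "
            f"{num_replicas} replicas")
    return global_batch_size // num_replicas


def padded_sequences_per_batch(token_budget: int, max_length: int) -> int:
    """Sequences that fit in a token budget if each is padded to max_length.

    Assumption: static batching, where every sequence occupies max_length tokens.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    return token_budget // max_length


if __name__ == "__main__":
    # README example: a global budget of 4096 * 4 tokens on 4 devices.
    per_device = per_replica_token_budget(4096 * 4, 4)
    print(per_device)                                  # 4096
    print(padded_sequences_per_batch(per_device, 64))  # 64
```

With dynamic batching, shorter sentences leave room for more sequences. Under this sketch's assumptions, the padded count is therefore a lower bound, not an exact figure.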
- - Command to run: - ``` - python3 transformer_main.py --data_dir=$DATA_DIR --model_dir=$MODEL_DIR \ - --vocab_file=$VOCAB_FILE --param_set=$PARAM_SET - ``` - - Arguments: - * `--data_dir`: This should be set to the same directory given to the `data_download`'s `data_dir` argument. - * `--model_dir`: Directory to save Transformer model training checkpoints. - * `--vocab_file`: Path to subtoken vocabulary file. If data_download was used, you may find the file in `data_dir`. - * `--param_set`: Parameter set to use when creating and training the model. Options are `base` and `big` (default). - * `--enable_time_history`: Whether add TimeHistory call. If so, --log_steps must be specified. - * `--batch_size`: The number of tokens to consider in a batch. Combining with - `--max_length`, they decide how many sequences are used per batch. - * Use the `--help` or `-h` flag to get a full list of possible arguments. - - #### Using multiple GPUs - You can train these models on multiple GPUs using `tf.distribute.Strategy` API. - You can read more about them in this - [guide](https://www.tensorflow.org/guide/distribute_strategy). - - In this example, we have made it easier to use is with just a command line flag - `--num_gpus`. By default this flag is 1 if TensorFlow is compiled with CUDA, - and 0 otherwise. - - - --num_gpus=0: Uses tf.distribute.OneDeviceStrategy with CPU as the device. - - --num_gpus=1: Uses tf.distribute.OneDeviceStrategy with GPU as the device. - - --num_gpus=2+: Uses tf.distribute.MirroredStrategy to run synchronous - distributed training across the GPUs. - - #### Using Cloud TPUs - - You can train the Transformer model on Cloud TPUs using - `tf.distribute.TPUStrategy`. If you are not familiar with Cloud TPUs, it is - strongly recommended that you go through the - [quickstart](https://cloud.google.com/tpu/docs/quickstart) to learn how to - create a TPU and GCE VM. 
- - To run the Transformer model on a TPU, you must set - `--distribution_strategy=tpu`, `--tpu=$TPU_NAME`, and `--use_ctl=True` where - `$TPU_NAME` the name of your TPU in the Cloud Console. - - An example command to run Transformer on a v2-8 or v3-8 TPU would be: - - ```bash - python transformer_main.py \ - --tpu=$TPU_NAME \ - --model_dir=$MODEL_DIR \ - --data_dir=$DATA_DIR \ - --vocab_file=$DATA_DIR/vocab.ende.32768 \ - --bleu_source=$DATA_DIR/newstest2014.en \ - --bleu_ref=$DATA_DIR/newstest2014.end \ - --batch_size=6144 \ - --train_steps=2000 \ - --static_batch=true \ - --use_ctl=true \ - --param_set=big \ - --max_length=64 \ - --decode_batch_size=32 \ - --decode_max_length=97 \ - --padded_decode=true \ - --distribution_strategy=tpu - ``` - Note: `$MODEL_DIR` and `$DATA_DIR` must be GCS paths. - - #### Customizing training schedule - - By default, the model will train for 10 epochs, and evaluate after every epoch. The training schedule may be defined through the flags: - - * Training with steps: - * `--train_steps`: sets the total number of training steps to run. - * `--steps_between_evals`: Number of training steps to run between evaluations. - - #### Compute BLEU score during model evaluation - - Use these flags to compute the BLEU when the model evaluates: - - * `--bleu_source`: Path to file containing text to translate. - * `--bleu_ref`: Path to file containing the reference translation. - - When running `transformer_main.py`, use the flags: `--bleu_source=$DATA_DIR/newstest2014.en --bleu_ref=$DATA_DIR/newstest2014.de` - - #### Tensorboard - Training and evaluation metrics (loss, accuracy, approximate BLEU score, etc.) are logged, and can be displayed in the browser using Tensorboard. - ``` - tensorboard --logdir=$MODEL_DIR - ``` - The values are displayed at [localhost:6006](localhost:6006). 
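The multi-GPU and Cloud TPU subsections describe how flags select a distribution strategy. That mapping can be summarized as a small lookup. This is a hypothetical sketch: `describe_strategy` is not a function in this codebase. It only returns the name of the `tf.distribute` strategy that the README says each flag combination selects, and it does not build any TensorFlow objects.

```python
def describe_strategy(num_gpus: int, distribution_strategy: str = "") -> str:
    """Maps README flag values to the tf.distribute strategy they select.

    Hypothetical helper: returns a descriptive name only, following the
    README's list for --num_gpus and its --distribution_strategy=tpu note.
    """
    if distribution_strategy == "tpu":
        # Cloud TPU runs use tf.distribute.TPUStrategy.
        return "TPUStrategy"
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    if num_gpus == 0:
        # --num_gpus=0: OneDeviceStrategy with the CPU as the device.
        return "OneDeviceStrategy (CPU)"
    if num_gpus == 1:
        # --num_gpus=1: OneDeviceStrategy with the GPU as the device.
        return "OneDeviceStrategy (GPU)"
    # --num_gpus=2+: synchronous training across GPUs.
    return "MirroredStrategy"


if __name__ == "__main__":
    for gpus in (0, 1, 4):
        print(gpus, describe_strategy(gpus))
    print("tpu", describe_strategy(0, "tpu"))
```

In real code, the returned names would correspond to constructing one of the following: `tf.distribute.OneDeviceStrategy(device=...)`, `tf.distribute.MirroredStrategy()`, or a `tf.distribute.TPUStrategy` bound to a TPU cluster resolver.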
- -## Implementation overview - -A brief look at each component in the code: - -### Model Definition -* [transformer.py](transformer.py): Defines a tf.keras.Model: `Transformer`. -* [embedding_layer.py](embedding_layer.py): Contains the layer that calculates the embeddings. The embedding weights are also used to calculate the pre-softmax probabilities from the decoder output. -* [attention_layer.py](attention_layer.py): Defines the multi-headed and self attention layers that are used in the encoder/decoder stacks. -* [ffn_layer.py](ffn_layer.py): Defines the feedforward network that is used in the encoder/decoder stacks. The network is composed of 2 fully connected layers. - -Other files: -* [beam_search.py](beam_search.py) contains the beam search implementation, which is used during model inference to find high scoring translations. - -### Model Trainer -[transformer_main.py](transformer_main.py) creates an `TransformerTask` to train and evaluate the model using tf.keras. - -### Test dataset -The [newstest2014 files](https://storage.googleapis.com/tf-perf-public/official_transformer/test_data/newstest2014.tgz) -are extracted from the [NMT Seq2Seq tutorial](https://google.github.io/seq2seq/nmt/#download-data). -The raw text files are converted from the SGM format of the -[WMT 2016](http://www.statmt.org/wmt16/translation-task.html) test sets. The -newstest2014 files are put into the `$DATA_DIR` when executing `data_download.py` diff --git a/official/nlp/transformer/__init__.py b/official/nlp/transformer/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/nlp/transformer/attention_layer.py b/official/nlp/transformer/attention_layer.py deleted file mode 100644 index db6e95b1a293795614f86aa7041ca767b990f099..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/attention_layer.py +++ /dev/null @@ -1,176 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Implementation of multiheaded attention and self-attention layers.""" -import math - -import tensorflow as tf - - -class Attention(tf.keras.layers.Layer): - """Multi-headed attention layer.""" - - def __init__(self, hidden_size, num_heads, attention_dropout): - """Initialize Attention. - - Args: - hidden_size: int, output dim of hidden layer. - num_heads: int, number of heads to repeat the same attention structure. - attention_dropout: float, dropout rate inside attention for training. - """ - if hidden_size % num_heads: - raise ValueError( - "Hidden size ({}) must be divisible by the number of heads ({})." 
- .format(hidden_size, num_heads)) - - super(Attention, self).__init__() - self.hidden_size = hidden_size - self.num_heads = num_heads - self.attention_dropout = attention_dropout - - def build(self, input_shape): - """Builds the layer.""" - # Layers for linearly projecting the queries, keys, and values. - size_per_head = self.hidden_size // self.num_heads - - def _glorot_initializer(fan_in, fan_out): - limit = math.sqrt(6.0 / (fan_in + fan_out)) - return tf.keras.initializers.RandomUniform(minval=-limit, maxval=limit) - - attention_initializer = _glorot_initializer(input_shape.as_list()[-1], - self.hidden_size) - self.query_dense_layer = tf.keras.layers.experimental.EinsumDense( - "BTE,ENH->BTNH", - output_shape=(None, self.num_heads, size_per_head), - kernel_initializer=attention_initializer, - bias_axes=None, - name="query") - self.key_dense_layer = tf.keras.layers.experimental.EinsumDense( - "BTE,ENH->BTNH", - output_shape=(None, self.num_heads, size_per_head), - kernel_initializer=attention_initializer, - bias_axes=None, - name="key") - self.value_dense_layer = tf.keras.layers.experimental.EinsumDense( - "BTE,ENH->BTNH", - output_shape=(None, self.num_heads, size_per_head), - kernel_initializer=attention_initializer, - bias_axes=None, - name="value") - - output_initializer = _glorot_initializer(self.hidden_size, self.hidden_size) - self.output_dense_layer = tf.keras.layers.experimental.EinsumDense( - "BTNH,NHE->BTE", - output_shape=(None, self.hidden_size), - kernel_initializer=output_initializer, - bias_axes=None, - name="output_transform") - super(Attention, self).build(input_shape) - - def get_config(self): - return { - "hidden_size": self.hidden_size, - "num_heads": self.num_heads, - "attention_dropout": self.attention_dropout, - } - - def call(self, - query_input, - source_input, - bias, - training, - cache=None, - decode_loop_step=None): - """Apply attention mechanism to query_input and source_input. 
- - Args: - query_input: A tensor with shape [batch_size, length_query, hidden_size]. - source_input: A tensor with shape [batch_size, length_source, - hidden_size]. - bias: A tensor with shape [batch_size, 1, length_query, length_source], - the attention bias that will be added to the result of the dot product. - training: A bool, whether in training mode or not. - cache: (Used during prediction) A dictionary with tensors containing - results of previous attentions. The dictionary must have the items: - {"k": tensor with shape [batch_size, i, heads, dim_per_head], - "v": tensor with shape [batch_size, i, heads, dim_per_head]} where - i is the current decoded length for non-padded decode, or max - sequence length for padded decode. - decode_loop_step: An integer, step number of the decoding loop. Used only - for autoregressive inference on TPU. - - Returns: - Attention layer output with shape [batch_size, length_query, hidden_size] - """ - # Linearly project the query, key and value using different learned - # projections. Splitting heads is automatically done during the linear - # projections --> [batch_size, length, num_heads, dim_per_head]. - query = self.query_dense_layer(query_input) - key = self.key_dense_layer(source_input) - value = self.value_dense_layer(source_input) - - if cache is not None: - # Combine cached keys and values with new keys and values. 
- if decode_loop_step is not None: - cache_k_shape = cache["k"].shape.as_list() - indices = tf.reshape( - tf.one_hot(decode_loop_step, cache_k_shape[1], dtype=key.dtype), - [1, cache_k_shape[1], 1, 1]) - key = cache["k"] + key * indices - cache_v_shape = cache["v"].shape.as_list() - indices = tf.reshape( - tf.one_hot(decode_loop_step, cache_v_shape[1], dtype=value.dtype), - [1, cache_v_shape[1], 1, 1]) - value = cache["v"] + value * indices - else: - key = tf.concat([tf.cast(cache["k"], key.dtype), key], axis=1) - value = tf.concat([tf.cast(cache["v"], value.dtype), value], axis=1) - - # Update cache - cache["k"] = key - cache["v"] = value - - # Scale query to prevent the dot product between query and key from growing - # too large. - depth = (self.hidden_size // self.num_heads) - query *= depth**-0.5 - - # Calculate dot product attention - logits = tf.einsum("BTNH,BFNH->BNFT", key, query) - logits += bias - # Note that softmax internally performs math operations using float32 - # for numeric stability. When training with float16, we keep the input - # and output in float16 for better performance. - weights = tf.nn.softmax(logits, name="attention_weights") - if training: - weights = tf.nn.dropout(weights, rate=self.attention_dropout) - attention_output = tf.einsum("BNFT,BTNH->BFNH", weights, value) - - # Run the outputs through another linear projection layer. 
Recombining heads - # is automatically done --> [batch_size, length, hidden_size] - attention_output = self.output_dense_layer(attention_output) - return attention_output - - -class SelfAttention(Attention): - """Multiheaded self-attention layer.""" - - def call(self, - query_input, - bias, - training, - cache=None, - decode_loop_step=None): - return super(SelfAttention, self).call(query_input, query_input, bias, - training, cache, decode_loop_step) diff --git a/official/nlp/transformer/beam_search_v1.py b/official/nlp/transformer/beam_search_v1.py deleted file mode 100644 index 2c8537e63b20e718b15dfcd042f3263212af8c08..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/beam_search_v1.py +++ /dev/null @@ -1,82 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Beam search to find the translated sequence with the highest probability.""" - -import tensorflow.compat.v1 as tf -from official.nlp.modeling.ops import beam_search - -_StateKeys = beam_search._StateKeys # pylint: disable=protected-access - - -class SequenceBeamSearch(beam_search.SequenceBeamSearch): - """Implementation of beam search loop.""" - - def _process_finished_state(self, finished_state): - alive_seq = finished_state[_StateKeys.ALIVE_SEQ] - alive_log_probs = finished_state[_StateKeys.ALIVE_LOG_PROBS] - finished_seq = finished_state[_StateKeys.FINISHED_SEQ] - finished_scores = finished_state[_StateKeys.FINISHED_SCORES] - finished_flags = finished_state[_StateKeys.FINISHED_FLAGS] - - # Account for corner case where there are no finished sequences for a - # particular batch item. In that case, return alive sequences for that batch - # item. - finished_seq = tf.where( - tf.reduce_any(finished_flags, 1), finished_seq, alive_seq) - finished_scores = tf.where( - tf.reduce_any(finished_flags, 1), finished_scores, alive_log_probs) - return finished_seq, finished_scores - - -def sequence_beam_search(symbols_to_logits_fn, - initial_ids, - initial_cache, - vocab_size, - beam_size, - alpha, - max_decode_length, - eos_id, - padded_decode=False): - """Search for sequence of subtoken ids with the largest probability. - - Args: - symbols_to_logits_fn: A function that takes in ids, index, and cache as - arguments. The passed in arguments will have shape: ids -> A tensor with - shape [batch_size * beam_size, index]. index -> A scalar. cache -> A - nested dictionary of tensors [batch_size * beam_size, ...]. - The function must return a tuple of logits and new cache: logits -> A - tensor with shape [batch * beam_size, vocab_size]. new cache -> A nested - dictionary with the same shape/structure as the inputted cache. - initial_ids: An int32 tensor with shape [batch_size]. Starting ids for each - batch item. 
- initial_cache: A dictionary, containing starting decoder variables - information. - vocab_size: An integer, the size of the vocabulary, used for topk - computation. - beam_size: An integer, the number of beams. - alpha: A float, defining the strength of length normalization. - max_decode_length: An integer, the maximum length to decoded a sequence. - eos_id: An integer, ID of eos token, used to determine when a sequence has - finished. - padded_decode: A bool, indicating if max_sequence_length padding is used for - beam search. - - Returns: - Top decoded sequences [batch_size, beam_size, max_decode_length] - sequence scores [batch_size, beam_size] - """ - sbs = SequenceBeamSearch(symbols_to_logits_fn, vocab_size, beam_size, alpha, - max_decode_length, eos_id, padded_decode) - return sbs.search(initial_ids, initial_cache) diff --git a/official/nlp/transformer/compute_bleu.py b/official/nlp/transformer/compute_bleu.py deleted file mode 100644 index 38c77261973c024acbcb7047c2c49942f15962e1..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/compute_bleu.py +++ /dev/null @@ -1,148 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Script to compute official BLEU score. 
- -Source: -https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/bleu_hook.py -""" - -import re -import sys -import unicodedata - -from absl import app -from absl import flags -from absl import logging -import six -from six.moves import range -import tensorflow as tf - -from official.nlp.transformer.utils import metrics -from official.nlp.transformer.utils import tokenizer -from official.utils.flags import core as flags_core - - -class UnicodeRegex(object): - """Ad-hoc hack to recognize all punctuation and symbols.""" - - def __init__(self): - punctuation = self.property_chars("P") - self.nondigit_punct_re = re.compile(r"([^\d])([" + punctuation + r"])") - self.punct_nondigit_re = re.compile(r"([" + punctuation + r"])([^\d])") - self.symbol_re = re.compile("([" + self.property_chars("S") + "])") - - def property_chars(self, prefix): - return "".join( - six.unichr(x) - for x in range(sys.maxunicode) - if unicodedata.category(six.unichr(x)).startswith(prefix)) - - -uregex = UnicodeRegex() - - -def bleu_tokenize(string): - r"""Tokenize a string following the official BLEU implementation. - - See https://github.com/moses-smt/mosesdecoder/' - 'blob/master/scripts/generic/mteval-v14.pl#L954-L983 - In our case, the input string is expected to be just one line - and no HTML entities de-escaping is needed. - So we just tokenize on punctuation and symbols, - except when a punctuation is preceded and followed by a digit - (e.g. a comma/dot as a thousand/decimal separator). - - Note that a numer (e.g. a year) followed by a dot at the end of sentence - is NOT tokenized, - i.e. the dot stays with the number because `s/(\p{P})(\P{N})/ $1 $2/g` - does not match this case (unless we add a space after each sentence). - However, this error is already in the original mteval-v14.pl - and we want to be consistent with it. 
- - Args: - string: the input string - - Returns: - a list of tokens - """ - string = uregex.nondigit_punct_re.sub(r"\1 \2 ", string) - string = uregex.punct_nondigit_re.sub(r" \1 \2", string) - string = uregex.symbol_re.sub(r" \1 ", string) - return string.split() - - -def bleu_wrapper(ref_filename, hyp_filename, case_sensitive=False): - """Compute BLEU for two files (reference and hypothesis translation).""" - ref_lines = tokenizer.native_to_unicode( - tf.io.gfile.GFile(ref_filename).read()).strip().splitlines() - hyp_lines = tokenizer.native_to_unicode( - tf.io.gfile.GFile(hyp_filename).read()).strip().splitlines() - return bleu_on_list(ref_lines, hyp_lines, case_sensitive) - - -def bleu_on_list(ref_lines, hyp_lines, case_sensitive=False): - """Compute BLEU for two list of strings (reference and hypothesis).""" - if len(ref_lines) != len(hyp_lines): - raise ValueError( - "Reference and translation files have different number of " - "lines (%d VS %d). If training only a few steps (100-200), the " - "translation may be empty." 
% (len(ref_lines), len(hyp_lines))) - if not case_sensitive: - ref_lines = [x.lower() for x in ref_lines] - hyp_lines = [x.lower() for x in hyp_lines] - ref_tokens = [bleu_tokenize(x) for x in ref_lines] - hyp_tokens = [bleu_tokenize(x) for x in hyp_lines] - return metrics.compute_bleu(ref_tokens, hyp_tokens) * 100 - - -def main(unused_argv): - if FLAGS.bleu_variant in ("both", "uncased"): - score = bleu_wrapper(FLAGS.reference, FLAGS.translation, False) - logging.info("Case-insensitive results: %f", score) - - if FLAGS.bleu_variant in ("both", "cased"): - score = bleu_wrapper(FLAGS.reference, FLAGS.translation, True) - logging.info("Case-sensitive results: %f", score) - - -def define_compute_bleu_flags(): - """Add flags for computing BLEU score.""" - flags.DEFINE_string( - name="translation", - default=None, - help=flags_core.help_wrap("File containing translated text.")) - flags.mark_flag_as_required("translation") - - flags.DEFINE_string( - name="reference", - default=None, - help=flags_core.help_wrap("File containing reference translation.")) - flags.mark_flag_as_required("reference") - - flags.DEFINE_enum( - name="bleu_variant", - short_name="bv", - default="both", - enum_values=["both", "uncased", "cased"], - case_sensitive=False, - help=flags_core.help_wrap( - "Specify one or more BLEU variants to calculate. Variants: \"cased\"" - ", \"uncased\", or \"both\".")) - - -if __name__ == "__main__": - define_compute_bleu_flags() - FLAGS = flags.FLAGS - app.run(main) diff --git a/official/nlp/transformer/compute_bleu_test.py b/official/nlp/transformer/compute_bleu_test.py deleted file mode 100644 index 6160bf66ecfc5f36f18ddf730f96780bda236b50..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/compute_bleu_test.py +++ /dev/null @@ -1,72 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Test functions in compute_blue.py.""" - -import tempfile - -import tensorflow as tf - -from official.nlp.transformer import compute_bleu - - -class ComputeBleuTest(tf.test.TestCase): - - def _create_temp_file(self, text): - temp_file = tempfile.NamedTemporaryFile(delete=False) - with tf.io.gfile.GFile(temp_file.name, "w") as w: - w.write(text) - return temp_file.name - - def test_bleu_same(self): - ref = self._create_temp_file("test 1 two 3\nmore tests!") - hyp = self._create_temp_file("test 1 two 3\nmore tests!") - - uncased_score = compute_bleu.bleu_wrapper(ref, hyp, False) - cased_score = compute_bleu.bleu_wrapper(ref, hyp, True) - self.assertEqual(100, uncased_score) - self.assertEqual(100, cased_score) - - def test_bleu_same_different_case(self): - ref = self._create_temp_file("Test 1 two 3\nmore tests!") - hyp = self._create_temp_file("test 1 two 3\nMore tests!") - uncased_score = compute_bleu.bleu_wrapper(ref, hyp, False) - cased_score = compute_bleu.bleu_wrapper(ref, hyp, True) - self.assertEqual(100, uncased_score) - self.assertLess(cased_score, 100) - - def test_bleu_different(self): - ref = self._create_temp_file("Testing\nmore tests!") - hyp = self._create_temp_file("Dog\nCat") - uncased_score = compute_bleu.bleu_wrapper(ref, hyp, False) - cased_score = compute_bleu.bleu_wrapper(ref, hyp, True) - self.assertLess(uncased_score, 100) - self.assertLess(cased_score, 100) - - def test_bleu_tokenize(self): - s = "Test0, 1 two, 3" - tokenized = compute_bleu.bleu_tokenize(s) - self.assertEqual(["Test0", ",", "1", "two", ",", "3"], tokenized) - - 
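`bleu_tokenize` relies on three `uregex` patterns that are defined outside this excerpt. The pure-Python sketch below assumes those patterns follow the usual mteval-v14 rules: punctuation is split off a word unless it touches a digit, and symbols are always split. Under that assumption it reproduces the behavior checked by `test_bleu_tokenize`. The ASCII character classes only approximate the Unicode `\p{P}` and `\p{S}` categories. `bleu_tokenize_ascii` is a hypothetical name and is not part of the module.

```python
import re

# ASCII stand-ins for the Unicode punctuation (\p{P}) and symbol (\p{S})
# categories; an assumption, since the real patterns cover all of Unicode.
_PUNCT = r"[!\"#%&'()*,\-./:;?@\[\\\]_{}]"
_SYMBOL = r"[$+<=>^`|~]"

# Punctuation preceded by a non-digit, e.g. "two," -> "two , ".
_NONDIGIT_PUNCT_RE = re.compile(r"([^\d])(" + _PUNCT + r")")
# Punctuation followed by a non-digit, e.g. ", 1" -> " ,  1".
_PUNCT_NONDIGIT_RE = re.compile(r"(" + _PUNCT + r")([^\d])")
# Symbols are always separated from their neighbors.
_SYMBOL_RE = re.compile(r"(" + _SYMBOL + r")")


def bleu_tokenize_ascii(text):
  """Splits text into BLEU tokens, keeping numbers such as "3.5" intact."""
  text = _NONDIGIT_PUNCT_RE.sub(r"\1 \2 ", text)
  text = _PUNCT_NONDIGIT_RE.sub(r" \1 \2", text)
  text = _SYMBOL_RE.sub(r" \1 ", text)
  return text.split()


print(bleu_tokenize_ascii("Test0, 1 two, 3"))
# ['Test0', ',', '1', 'two', ',', '3']
```

The comma after `Test0` is still split off. It is preceded by a digit, but the second pattern catches it because it is followed by a space.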
def test_bleu_list(self): - ref = ["test 1 two 3", "more tests!"] - hyp = ["test 1 two 3", "More tests!"] - uncased_score = compute_bleu.bleu_on_list(ref, hyp, False) - cased_score = compute_bleu.bleu_on_list(ref, hyp, True) - self.assertEqual(uncased_score, 100) - self.assertLess(cased_score, 100) - - -if __name__ == "__main__": - tf.test.main() diff --git a/official/nlp/transformer/data_download.py b/official/nlp/transformer/data_download.py deleted file mode 100644 index 5a8b8595fd3031b430ccbc489431d9f7711d982c..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/data_download.py +++ /dev/null @@ -1,443 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Download and preprocess WMT17 ende training and evaluation datasets.""" - -import os -import random -import tarfile - -# pylint: disable=g-bad-import-order - -from absl import app -from absl import flags -from absl import logging -import six -from six.moves import range -from six.moves import urllib -from six.moves import zip -import tensorflow.compat.v1 as tf - -from official.nlp.transformer.utils import tokenizer -from official.utils.flags import core as flags_core -# pylint: enable=g-bad-import-order - -# Data sources for training/evaluating the transformer translation model. 
-# If any of the training sources are changed, then either: -# 1) use the flag `--search` to find the best min count or -# 2) update the _TRAIN_DATA_MIN_COUNT constant. -# min_count is the minimum number of times a token must appear in the data -# before it is added to the vocabulary. "Best min count" refers to the value -# that generates a vocabulary set that is closest in size to _TARGET_VOCAB_SIZE. -_TRAIN_DATA_SOURCES = [ - { - "url": "http://data.statmt.org/wmt17/translation-task/" - "training-parallel-nc-v12.tgz", - "input": "news-commentary-v12.de-en.en", - "target": "news-commentary-v12.de-en.de", - }, - { - "url": "http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz", - "input": "commoncrawl.de-en.en", - "target": "commoncrawl.de-en.de", - }, - { - "url": "http://www.statmt.org/wmt13/training-parallel-europarl-v7.tgz", - "input": "europarl-v7.de-en.en", - "target": "europarl-v7.de-en.de", - }, -] -# Use pre-defined minimum count to generate subtoken vocabulary. -_TRAIN_DATA_MIN_COUNT = 6 - -_EVAL_DATA_SOURCES = [{ - "url": "http://data.statmt.org/wmt17/translation-task/dev.tgz", - "input": "newstest2013.en", - "target": "newstest2013.de", -}] - -_TEST_DATA_SOURCES = [{ - "url": ("https://storage.googleapis.com/cloud-tpu-test-datasets/" - "transformer_data/newstest2014.tgz"), - "input": "newstest2014.en", - "target": "newstest2014.de", -}] - -# Vocabulary constants -_TARGET_VOCAB_SIZE = 32768 # Number of subtokens in the vocabulary list. -_TARGET_THRESHOLD = 327 # Accept vocabulary if size is within this threshold -VOCAB_FILE = "vocab.ende.%d" % _TARGET_VOCAB_SIZE - -# Strings to inclue in the generated files. -_PREFIX = "wmt32k" -_TRAIN_TAG = "train" -_EVAL_TAG = "dev" # Following WMT and Tensor2Tensor conventions, in which the -# evaluation datasets are tagged as "dev" for development. 
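The comment on the data sources describes the `--search` option: choose `min_count` so that the vocabulary size lands closest to `_TARGET_VOCAB_SIZE`, within `_TARGET_THRESHOLD`. The sketch below illustrates that idea under a strong simplification. It models vocabulary size as the number of whole tokens that occur at least `min_count` times, whereas the real `Subtokenizer` builds subword vocabularies. Because this simplified size can only shrink as `min_count` grows, a binary search applies. `vocab_size_for`, `search_min_count`, and the 1 to 1000 search range are illustrative assumptions.

```python
from collections import Counter


def vocab_size_for(counts, min_count):
  """Simplified vocab size: number of tokens seen at least `min_count` times."""
  return sum(1 for c in counts.values() if c >= min_count)


def search_min_count(counts, target_size, threshold, lo=1, hi=1000):
  """Binary-searches min_count and returns (min_count, vocab_size).

  Stops as soon as the size is within `threshold` of `target_size`.
  Otherwise it returns the closest candidate seen.
  """
  best = None
  while lo <= hi:
    mid = (lo + hi) // 2
    size = vocab_size_for(counts, mid)
    if best is None or abs(size - target_size) < abs(best[1] - target_size):
      best = (mid, size)
    if abs(size - target_size) <= threshold:
      return mid, size
    if size > target_size:
      lo = mid + 1  # Vocabulary too large: demand more occurrences.
    else:
      hi = mid - 1  # Vocabulary too small: relax the requirement.
  return best


# Toy corpus: token "t<i>" appears i times, for i = 1..100.
counts = Counter({"t%d" % i: i for i in range(1, 101)})
print(search_min_count(counts, target_size=30, threshold=0))  # (71, 30)
```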
- -# Number of files to split train and evaluation data -_TRAIN_SHARDS = 100 -_EVAL_SHARDS = 1 - - -def find_file(path, filename, max_depth=5): - """Returns full filepath if the file is in path or a subdirectory.""" - for root, dirs, files in os.walk(path): - if filename in files: - return os.path.join(root, filename) - - # Don't search past max_depth - depth = root[len(path) + 1:].count(os.sep) - if depth > max_depth: - del dirs[:] # Clear dirs - return None - - -############################################################################### -# Download and extraction functions -############################################################################### -def get_raw_files(raw_dir, data_source): - """Return raw files from source. - - Downloads/extracts if needed. - - Args: - raw_dir: string directory to store raw files - data_source: dictionary with - {"url": url of compressed dataset containing input and target files - "input": file with data in input language - "target": file with data in target language} - - Returns: - dictionary with - {"inputs": list of files containing data in input language - "targets": list of files containing corresponding data in target language - } - """ - raw_files = { - "inputs": [], - "targets": [], - } # keys - for d in data_source: - input_file, target_file = download_and_extract(raw_dir, d["url"], - d["input"], d["target"]) - raw_files["inputs"].append(input_file) - raw_files["targets"].append(target_file) - return raw_files - - -def download_report_hook(count, block_size, total_size): - """Report hook for download progress. - - Args: - count: current block number - block_size: block size - total_size: total size - """ - percent = int(count * block_size * 100 / total_size) - print(six.ensure_str("\r%d%%" % percent) + " completed", end="\r") - - -def download_from_url(path, url): - """Download content from a url. 
- - Args: - path: string directory where file will be downloaded - url: string url - - Returns: - Full path to downloaded file - """ - filename = six.ensure_str(url).split("/")[-1] - found_file = find_file(path, filename, max_depth=0) - if found_file is None: - filename = os.path.join(path, filename) - logging.info("Downloading from %s to %s.", url, filename) - inprogress_filepath = six.ensure_str(filename) + ".incomplete" - inprogress_filepath, _ = urllib.request.urlretrieve( - url, inprogress_filepath, reporthook=download_report_hook) - # Print newline to clear the carriage return from the download progress. - print() - tf.gfile.Rename(inprogress_filepath, filename) - return filename - else: - logging.info("Already downloaded: %s (at %s).", url, found_file) - return found_file - - -def download_and_extract(path, url, input_filename, target_filename): - """Extract files from downloaded compressed archive file. - - Args: - path: string directory where the files will be downloaded - url: url containing the compressed input and target files - input_filename: name of file containing data in source language - target_filename: name of file containing data in target language - - Returns: - Full paths to extracted input and target files. - - Raises: - OSError: if the the download/extraction fails. - """ - # Check if extracted files already exist in path - input_file = find_file(path, input_filename) - target_file = find_file(path, target_filename) - if input_file and target_file: - logging.info("Already downloaded and extracted %s.", url) - return input_file, target_file - - # Download archive file if it doesn't already exist. - compressed_file = download_from_url(path, url) - - # Extract compressed files - logging.info("Extracting %s.", compressed_file) - with tarfile.open(compressed_file, "r:gz") as corpus_tar: - corpus_tar.extractall(path) - - # Return file paths of the requested files. 
- input_file = find_file(path, input_filename) - target_file = find_file(path, target_filename) - - if input_file and target_file: - return input_file, target_file - - raise OSError("Download/extraction failed for url %s to path %s" % - (url, path)) - - -def txt_line_iterator(path): - """Iterate through lines of file.""" - with tf.io.gfile.GFile(path) as f: - for line in f: - yield line.strip() - - -def compile_files(raw_dir, raw_files, tag): - """Compile raw files into a single file for each language. - - Args: - raw_dir: Directory containing downloaded raw files. - raw_files: Dict containing filenames of input and target data. - {"inputs": list of files containing data in input language - "targets": list of files containing corresponding data in target language - } - tag: String to append to the compiled filename. - - Returns: - Full path of compiled input and target files. - """ - logging.info("Compiling files with tag %s.", tag) - filename = "%s-%s" % (_PREFIX, tag) - input_compiled_file = os.path.join(raw_dir, - six.ensure_str(filename) + ".lang1") - target_compiled_file = os.path.join(raw_dir, - six.ensure_str(filename) + ".lang2") - - with tf.io.gfile.GFile(input_compiled_file, mode="w") as input_writer: - with tf.io.gfile.GFile(target_compiled_file, mode="w") as target_writer: - for i in range(len(raw_files["inputs"])): - input_file = raw_files["inputs"][i] - target_file = raw_files["targets"][i] - - logging.info("Reading files %s and %s.", input_file, target_file) - write_file(input_writer, input_file) - write_file(target_writer, target_file) - return input_compiled_file, target_compiled_file - - -def write_file(writer, filename): - """Write all of lines from file using the writer.""" - for line in txt_line_iterator(filename): - writer.write(line) - writer.write("\n") - - -############################################################################### -# Data preprocessing -############################################################################### 
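`encode_and_save_files` writes examples to `total_shards` TFRecord files in round-robin order and names each shard with `shard_filename`. The dependency-free sketch below models only that bookkeeping. It uses Python lists in place of `TFRecordWriter`s, which is an assumption made so the logic runs without TensorFlow. `round_robin_shards` is a hypothetical helper.

```python
import os

_PREFIX = "wmt32k"  # Same prefix as defined in data_download.py.


def shard_filename(path, tag, shard_num, total_shards):
  """Same naming pattern as data_download.shard_filename (1-based shard ids)."""
  return os.path.join(
      path, "%s-%s-%.5d-of-%.5d" % (_PREFIX, tag, shard_num, total_shards))


def round_robin_shards(records, total_shards):
  """Assigns records to shards in turn: 0, 1, ..., total_shards - 1, 0, ..."""
  shards = [[] for _ in range(total_shards)]
  shard = 0
  for record in records:
    shards[shard].append(record)
    shard = (shard + 1) % total_shards
  return shards


print(shard_filename("/tmp/translate_ende", "train", 1, 100))
# /tmp/translate_ende/wmt32k-train-00001-of-00100  (on POSIX)
print(round_robin_shards(list(range(7)), 3))
# [[0, 3, 6], [1, 4], [2, 5]]
```

With round-robin assignment, shard sizes differ by at most one record. The `%.5d` format zero-pads shard numbers, so the file names sort correctly.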
-def encode_and_save_files(subtokenizer, data_dir, raw_files, tag, total_shards): - """Save data from files as encoded Examples in TFrecord format. - - Args: - subtokenizer: Subtokenizer object that will be used to encode the strings. - data_dir: The directory in which to write the examples - raw_files: A tuple of (input, target) data files. Each line in the input and - the corresponding line in target file will be saved in a tf.Example. - tag: String that will be added onto the file names. - total_shards: Number of files to divide the data into. - - Returns: - List of all files produced. - """ - # Create a file for each shard. - filepaths = [ - shard_filename(data_dir, tag, n + 1, total_shards) - for n in range(total_shards) - ] - - if all_exist(filepaths): - logging.info("Files with tag %s already exist.", tag) - return filepaths - - logging.info("Saving files with tag %s.", tag) - input_file = raw_files[0] - target_file = raw_files[1] - - # Write examples to each shard in round robin order. 
- tmp_filepaths = [six.ensure_str(fname) + ".incomplete" for fname in filepaths] - writers = [tf.python_io.TFRecordWriter(fname) for fname in tmp_filepaths] - counter, shard = 0, 0 - for counter, (input_line, target_line) in enumerate( - zip(txt_line_iterator(input_file), txt_line_iterator(target_file))): - if counter > 0 and counter % 100000 == 0: - logging.info("\tSaving case %d.", counter) - example = dict_to_example({ - "inputs": subtokenizer.encode(input_line, add_eos=True), - "targets": subtokenizer.encode(target_line, add_eos=True) - }) - writers[shard].write(example.SerializeToString()) - shard = (shard + 1) % total_shards - for writer in writers: - writer.close() - - for tmp_name, final_name in zip(tmp_filepaths, filepaths): - tf.gfile.Rename(tmp_name, final_name) - - logging.info("Saved %d Examples", counter + 1) - return filepaths - - -def shard_filename(path, tag, shard_num, total_shards): - """Create filename for data shard.""" - return os.path.join( - path, "%s-%s-%.5d-of-%.5d" % (_PREFIX, tag, shard_num, total_shards)) - - -def shuffle_records(fname): - """Shuffle records in a single file.""" - logging.info("Shuffling records in file %s", fname) - - # Rename file prior to shuffling - tmp_fname = six.ensure_str(fname) + ".unshuffled" - tf.gfile.Rename(fname, tmp_fname) - - reader = tf.io.tf_record_iterator(tmp_fname) - records = [] - for record in reader: - records.append(record) - if len(records) % 100000 == 0: - logging.info("\tRead: %d", len(records)) - - random.shuffle(records) - - # Write shuffled records to original file name - with tf.python_io.TFRecordWriter(fname) as w: - for count, record in enumerate(records): - w.write(record) - if count > 0 and count % 100000 == 0: - logging.info("\tWriting record: %d", count) - - tf.gfile.Remove(tmp_fname) - - -def dict_to_example(dictionary): - """Converts a dictionary of string->int to a tf.Example.""" - features = {} - for k, v in six.iteritems(dictionary): - features[k] = 
tf.train.Feature(int64_list=tf.train.Int64List(value=v)) - return tf.train.Example(features=tf.train.Features(feature=features)) - - -def all_exist(filepaths): - """Returns true if all files in the list exist.""" - for fname in filepaths: - if not tf.gfile.Exists(fname): - return False - return True - - -def make_dir(path): - if not tf.gfile.Exists(path): - logging.info("Creating directory %s", path) - tf.gfile.MakeDirs(path) - - -def main(unused_argv): - """Obtain training and evaluation data for the Transformer model.""" - make_dir(FLAGS.raw_dir) - make_dir(FLAGS.data_dir) - - # Download test_data - logging.info("Step 1/5: Downloading test data") - get_raw_files(FLAGS.data_dir, _TEST_DATA_SOURCES) - - # Get paths of download/extracted training and evaluation files. - logging.info("Step 2/5: Downloading data from source") - train_files = get_raw_files(FLAGS.raw_dir, _TRAIN_DATA_SOURCES) - eval_files = get_raw_files(FLAGS.raw_dir, _EVAL_DATA_SOURCES) - - # Create subtokenizer based on the training files. - logging.info("Step 3/5: Creating subtokenizer and building vocabulary") - train_files_flat = train_files["inputs"] + train_files["targets"] - vocab_file = os.path.join(FLAGS.data_dir, VOCAB_FILE) - subtokenizer = tokenizer.Subtokenizer.init_from_files( - vocab_file, - train_files_flat, - _TARGET_VOCAB_SIZE, - _TARGET_THRESHOLD, - min_count=None if FLAGS.search else _TRAIN_DATA_MIN_COUNT) - - logging.info("Step 4/5: Compiling training and evaluation data") - compiled_train_files = compile_files(FLAGS.raw_dir, train_files, _TRAIN_TAG) - compiled_eval_files = compile_files(FLAGS.raw_dir, eval_files, _EVAL_TAG) - - # Tokenize and save data as Examples in the TFRecord format. 
- logging.info("Step 5/5: Preprocessing and saving data") - train_tfrecord_files = encode_and_save_files(subtokenizer, FLAGS.data_dir, - compiled_train_files, _TRAIN_TAG, - _TRAIN_SHARDS) - encode_and_save_files(subtokenizer, FLAGS.data_dir, compiled_eval_files, - _EVAL_TAG, _EVAL_SHARDS) - - for fname in train_tfrecord_files: - shuffle_records(fname) - - -def define_data_download_flags(): - """Add flags specifying data download arguments.""" - flags.DEFINE_string( - name="data_dir", - short_name="dd", - default="/tmp/translate_ende", - help=flags_core.help_wrap( - "Directory for where the translate_ende_wmt32k dataset is saved.")) - flags.DEFINE_string( - name="raw_dir", - short_name="rd", - default="/tmp/translate_ende_raw", - help=flags_core.help_wrap( - "Path where the raw data will be downloaded and extracted.")) - flags.DEFINE_bool( - name="search", - default=False, - help=flags_core.help_wrap( - "If set, use binary search to find the vocabulary set with size" - "closest to the target size (%d)." % _TARGET_VOCAB_SIZE)) - - -if __name__ == "__main__": - logging.set_verbosity(logging.INFO) - define_data_download_flags() - FLAGS = flags.FLAGS - app.run(main) diff --git a/official/nlp/transformer/data_pipeline.py b/official/nlp/transformer/data_pipeline.py deleted file mode 100644 index 1d9f242172cadcd38fefbc900658b914483b3b24..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/data_pipeline.py +++ /dev/null @@ -1,330 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Input pipeline for the transformer model to read, filter, and batch examples. - -Two things to note in the pipeline: - -1. Batching scheme - - The examples encoded in the TFRecord files contain data in the format: - {"inputs": [variable length array of integers], - "targets": [variable length array of integers]} - Where integers in the arrays refer to tokens in the English and German vocab - file (named `vocab.ende.32768`). - - Prior to batching, elements in the dataset are grouped by length (max between - "inputs" and "targets" length). Each group is then batched such that: - group_batch_size * length <= batch_size. - - Another way to view batch_size is the maximum number of tokens in each batch. - - Once batched, each element in the dataset will have the shape: - {"inputs": [group_batch_size, padded_input_length], - "targets": [group_batch_size, padded_target_length]} - Lengths are padded to the longest "inputs" or "targets" sequence in the batch - (padded_input_length and padded_target_length can be different). - - This batching scheme decreases the fraction of padding tokens per training - batch, thus improving the training speed significantly. - -2. Shuffling - - While training, the dataset is shuffled in two places in the code. The first - is the list of training files. Second, while reading records using - `parallel_interleave`, the `sloppy` argument is used to generate randomness - in the order of the examples. -""" - -import os - -from absl import logging -import tensorflow as tf - -from official.utils.misc import model_helpers - -# Buffer size for reading records from a TFRecord file. Each training file is -# 7.2 MB, so 8 MB allows an entire file to be kept in memory. -_READ_RECORD_BUFFER = 8 * 1000 * 1000 - -# Example grouping constants. Defines length boundaries for each group. -# These values are the defaults used in Tensor2Tensor. 
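The batching scheme described in this module is plain integer arithmetic, so it can be checked without TensorFlow. This covers the boundaries in `_create_min_max_boundaries`, the per-bucket batch sizes in `_batch_examples`, and the static batch size in `_read_and_batch_from_files`. The sketch below re-implements them in pure Python. The function names are hypothetical, and the linear scan in `bucket_id` stands in for the `tf.where`/`tf.reduce_min` lookup. Running the boundary loop with `max_length=24`, `min_boundary=4` and `boundary_scale=2` gives `[0, 4, 8, 16]` and `[4, 8, 16, 25]`. After 16 the next candidate is 32, so 24 never becomes a boundary, unlike in the lists shown in that function's docstring.

```python
def create_min_max_boundaries(max_length, min_boundary=8, boundary_scale=1.1):
  """Pure-Python copy of the boundary loop in _create_min_max_boundaries."""
  bucket_boundaries = []
  x = min_boundary
  while x < max_length:
    bucket_boundaries.append(x)
    x = max(x + 1, int(x * boundary_scale))  # Grow by the scale, at least +1.
  return [0] + bucket_boundaries, bucket_boundaries + [max_length + 1]


def bucket_id(seq_length, buckets_min, buckets_max):
  """Index i with buckets_min[i] <= seq_length < buckets_max[i]."""
  for i, (low, high) in enumerate(zip(buckets_min, buckets_max)):
    if low <= seq_length < high:
      return i
  raise ValueError("length %d is outside all buckets" % seq_length)


def bucket_batch_sizes(batch_size, buckets_max):
  """Examples per batch so that size * bucket max length <= batch_size tokens."""
  return [batch_size // x for x in buckets_max]


def static_global_batch_size(batch_size, max_length, num_replicas):
  """Sentences per global batch when static_batch=True; divisible by replicas."""
  return int(batch_size // num_replicas // max_length * num_replicas)


buckets_min, buckets_max = create_min_max_boundaries(24, 4, 2)
print(buckets_min, buckets_max)                 # [0, 4, 8, 16] [4, 8, 16, 25]
print(bucket_id(10, buckets_min, buckets_max))  # 2
print(bucket_batch_sizes(100, buckets_max))     # [25, 12, 6, 4]
print(static_global_batch_size(4096, 256, 8))   # 16
```

`bucket_batch_sizes` divides by each bucket's exclusive upper bound. Every example in bucket `i` is therefore shorter than `buckets_max[i]`, and the token budget still holds after padding to the longest example in the batch.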
-_MIN_BOUNDARY = 8 -_BOUNDARY_SCALE = 1.1 - - -def _load_records(filename): - """Read file and return a dataset of tf.Examples.""" - return tf.data.TFRecordDataset(filename, buffer_size=_READ_RECORD_BUFFER) - - -def _parse_example(serialized_example): - """Return inputs and targets Tensors from a serialized tf.Example.""" - data_fields = { - "inputs": tf.io.VarLenFeature(tf.int64), - "targets": tf.io.VarLenFeature(tf.int64) - } - parsed = tf.io.parse_single_example(serialized_example, data_fields) - inputs = tf.sparse.to_dense(parsed["inputs"]) - targets = tf.sparse.to_dense(parsed["targets"]) - return inputs, targets - - -def _filter_max_length(example, max_length=256): - """Indicates whether the example's length is lower than the maximum length.""" - return tf.logical_and( - tf.size(example[0]) <= max_length, - tf.size(example[1]) <= max_length) - - -def _get_example_length(example): - """Returns the maximum length between the example inputs and targets.""" - length = tf.maximum(tf.shape(example[0])[0], tf.shape(example[1])[0]) - return length - - -def _create_min_max_boundaries(max_length, - min_boundary=_MIN_BOUNDARY, - boundary_scale=_BOUNDARY_SCALE): - """Create min and max boundary lists up to max_length. - - For example, when max_length=24, min_boundary=4 and boundary_scale=2, the - returned values will be: - buckets_min = [0, 4, 8, 16, 24] - buckets_max = [4, 8, 16, 24, 25] - - Args: - max_length: The maximum length of example in dataset. - min_boundary: Minimum length in boundary. - boundary_scale: Amount to scale consecutive boundaries in the list. - - Returns: - min and max boundary lists - - """ - # Create bucket boundaries list by scaling the previous boundary or adding 1 - # (to ensure increasing boundary sizes). - bucket_boundaries = [] - x = min_boundary - while x < max_length: - bucket_boundaries.append(x) - x = max(x + 1, int(x * boundary_scale)) - - # Create min and max boundary lists from the initial list. 
- buckets_min = [0] + bucket_boundaries - buckets_max = bucket_boundaries + [max_length + 1] - return buckets_min, buckets_max - - -def _batch_examples(dataset, batch_size, max_length): - """Group examples by similar lengths, and return batched dataset. - - Each batch of similar-length examples are padded to the same length, and may - have different number of elements in each batch, such that: - group_batch_size * padded_length <= batch_size. - - This decreases the number of padding tokens per batch, which improves the - training speed. - - Args: - dataset: Dataset of unbatched examples. - batch_size: Max number of tokens per batch of examples. - max_length: Max number of tokens in an example input or target sequence. - - Returns: - Dataset of batched examples with similar lengths. - """ - # Get min and max boundary lists for each example. These are used to calculate - # the `bucket_id`, which is the index at which: - # buckets_min[bucket_id] <= len(example) < buckets_max[bucket_id] - # Note that using both min and max lists improves the performance. - buckets_min, buckets_max = _create_min_max_boundaries(max_length) - - # Create list of batch sizes for each bucket_id, so that - # bucket_batch_size[bucket_id] * buckets_max[bucket_id] <= batch_size - bucket_batch_sizes = [int(batch_size) // x for x in buckets_max] - # bucket_id will be a tensor, so convert this list to a tensor as well. - bucket_batch_sizes = tf.constant(bucket_batch_sizes, dtype=tf.int64) - - def example_to_bucket_id(example_input, example_target): - """Return int64 bucket id for this example, calculated based on length.""" - seq_length = _get_example_length((example_input, example_target)) - - # TODO(xunkai): investigate if removing code branching improves performance. 
- conditions_c = tf.logical_and( - tf.less_equal(buckets_min, seq_length), tf.less(seq_length, - buckets_max)) - bucket_id = tf.reduce_min(tf.where(conditions_c)) - return bucket_id - - def window_size_fn(bucket_id): - """Return number of examples to be grouped when given a bucket id.""" - return bucket_batch_sizes[bucket_id] - - def batching_fn(bucket_id, grouped_dataset): - """Batch and add padding to a dataset of elements with similar lengths.""" - bucket_batch_size = window_size_fn(bucket_id) - - # Batch the dataset and add padding so that all input sequences in the - # examples have the same length, and all target sequences have the same - # lengths as well. Resulting lengths of inputs and targets can differ. - return grouped_dataset.padded_batch(bucket_batch_size, ([None], [None])) - - return dataset.apply( - tf.data.experimental.group_by_window( - key_func=example_to_bucket_id, - reduce_func=batching_fn, - window_size=None, - window_size_func=window_size_fn)) - - -def _read_and_batch_from_files(file_pattern, - batch_size, - max_length, - max_io_parallelism, - shuffle, - repeat, - static_batch=False, - num_replicas=1, - ctx=None): - """Create dataset where each item is a dict of "inputs" and "targets". - - Args: - file_pattern: String used to match the input TFRecord files. - batch_size: Maximum number of tokens per global batch of examples. - max_length: Maximum number of tokens per example - max_io_parallelism: Max number of cpu cores for parallel input processing. - shuffle: If true, randomizes order of elements. - repeat: Number of times to repeat the dataset. If None, the dataset is - repeated forever. - static_batch: Whether the batches in the dataset should have static shapes. - If True, the input is batched so that every batch has the shape - [batch_size // max_length, max_length]. 
If False, the input is grouped by - length, and batched so that batches may have different - shapes [N, M], where: N * M <= batch_size M <= max_length In general, this - setting should be False. Dynamic shapes allow the inputs to be grouped - so that the number of padding tokens is minimized, and helps model - training. In cases where the input shape must be static (e.g. running on - TPU), this setting should be set to True. - num_replicas: Number of GPUs or other workers. We will generate global - batches, and each global batch is equally divisible by number of replicas. - Currently it is only effective when static_batch==True. TODO: make it - effective when static_batch=False. - ctx: Input context. - - Returns: - tf.data.Dataset object containing examples loaded from the files. - """ - dataset = tf.data.Dataset.list_files(file_pattern, shuffle=shuffle) - - if ctx and ctx.num_input_pipelines > 1: - logging.info("Shard %d of the dataset.", ctx.input_pipeline_id) - dataset = dataset.shard(ctx.num_input_pipelines, ctx.input_pipeline_id) - - # Read files and interleave results. When training, the order of the examples - # will be non-deterministic. - options = tf.data.Options() - options.experimental_deterministic = False - dataset = dataset.interleave( - _load_records, - cycle_length=max_io_parallelism, - num_parallel_calls=tf.data.experimental.AUTOTUNE).with_options(options) - - # Parse each tf.Example into a dictionary - # TODO: Look into prefetch_input_elements for performance optimization. # pylint: disable=g-bad-todo - dataset = dataset.map( - _parse_example, num_parallel_calls=tf.data.experimental.AUTOTUNE) - - # Remove examples where the input or target length exceeds the maximum length, - dataset = dataset.filter(lambda x, y: _filter_max_length((x, y), max_length)) - - if static_batch: - dataset = dataset.padded_batch( - # First calculate batch size (token number) per worker, then divide it - # into sentences, and finally expand to a global batch. 
It could prove - # the global batch divisble for distribution strategy. - int(batch_size // num_replicas // max_length * num_replicas), - ([max_length], [max_length]), - drop_remainder=True) - else: - # Group and batch such that each batch has examples of similar length. - # TODO(xunkai): _batch_examples might need to do something special for - # num_replicas. - dataset = _batch_examples(dataset, batch_size, max_length) - - dataset = dataset.repeat(repeat) - - # Prefetch the next element to improve speed of input pipeline. - dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE) - return dataset - - -def _generate_synthetic_data(params): - """Create synthetic data based on the parameter batch size.""" - batch_size = int(params["batch_size"] // params["max_length"]) - length = params["max_length"] - dataset = model_helpers.generate_synthetic_data( - input_shape=tf.TensorShape([length]), - input_value=1, - input_dtype=tf.int64, - label_shape=tf.TensorShape([length]), - label_value=1, - label_dtype=tf.int64, - ) - if params["static_batch"]: - dataset = dataset.batch(batch_size, drop_remainder=True) - else: - dataset = dataset.padded_batch(batch_size, ([None], [None])) - return dataset - - -def train_input_fn(params, ctx=None): - """Load and return dataset of batched examples for use during training.""" - file_pattern = os.path.join(params["data_dir"] or "", "*train*") - if params["use_synthetic_data"]: - return _generate_synthetic_data(params) - return _read_and_batch_from_files( - file_pattern, - params["batch_size"], - params["max_length"], - params["max_io_parallelism"], - shuffle=True, - repeat=params["repeat_dataset"], - static_batch=params["static_batch"], - num_replicas=params["num_gpus"], - ctx=ctx) - - -def eval_input_fn(params, ctx=None): - """Load and return dataset of batched examples for use during evaluation.""" - file_pattern = os.path.join(params["data_dir"] or "", "*dev*") - if params["use_synthetic_data"]: - return 
_generate_synthetic_data(params) - return _read_and_batch_from_files( - file_pattern, - params["batch_size"], - params["max_length"], - params["max_io_parallelism"], - shuffle=False, - repeat=1, - static_batch=params["static_batch"], - num_replicas=params["num_gpus"], - ctx=ctx) - - -def map_data_for_transformer_fn(x, y): - """Maps data for training, and handles weried behaviors for different vers.""" - # Will transform input x and targets y into tuple(x, y) as new model inputs. - # For TF v2, the 2nd parameter is omitted to make Keras training work. - return ((x, y),) diff --git a/official/nlp/transformer/embedding_layer.py b/official/nlp/transformer/embedding_layer.py deleted file mode 100644 index 69f3861ce6745bab0f62f29c2213fe53f99183c2..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/embedding_layer.py +++ /dev/null @@ -1,102 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Implementation of embedding layer with shared weights.""" - -import tensorflow as tf - - -class EmbeddingSharedWeights(tf.keras.layers.Layer): - """Calculates input embeddings and pre-softmax linear with shared weights.""" - - def __init__(self, vocab_size, hidden_size): - """Specify characteristic parameters of embedding layer. - - Args: - vocab_size: Number of tokens in the embedding. (Typically ~32,000) - hidden_size: Dimensionality of the embedding. 
(Typically 512 or 1024) - """ - super(EmbeddingSharedWeights, self).__init__() - self.vocab_size = vocab_size - self.hidden_size = hidden_size - - def build(self, input_shape): - """Build embedding layer.""" - with tf.name_scope("embedding_and_softmax"): - # Create and initialize weights. The random normal initializer was chosen - # arbitrarily, and works well. - self.shared_weights = self.add_weight( - "weights", - shape=[self.vocab_size, self.hidden_size], - dtype=tf.float32, - initializer=tf.random_normal_initializer( - mean=0., stddev=self.hidden_size**-0.5)) - super(EmbeddingSharedWeights, self).build(input_shape) - - def get_config(self): - return { - "vocab_size": self.vocab_size, - "hidden_size": self.hidden_size, - } - - def call(self, inputs, mode="embedding"): - """Get token embeddings of inputs. - - Args: - inputs: An int64 tensor with shape [batch_size, length] - mode: string, a valid value is one of "embedding" and "linear". - - Returns: - outputs: (1) If mode == "embedding", output embedding tensor, float32 with - shape [batch_size, length, embedding_size]; (2) mode == "linear", output - linear tensor, float32 with shape [batch_size, length, vocab_size]. - Raises: - ValueError: if mode is not valid. - """ - if mode == "embedding": - return self._embedding(inputs) - elif mode == "linear": - return self._linear(inputs) - else: - raise ValueError("mode {} is not valid.".format(mode)) - - def _embedding(self, inputs): - """Applies embedding based on inputs tensor.""" - with tf.name_scope("embedding"): - # Create binary mask of size [batch_size, length] - embeddings = tf.gather(self.shared_weights, inputs) - # mask = tf.cast(tf.not_equal(inputs, 0), embeddings.dtype) - # embeddings *= tf.expand_dims(mask, -1) - # Scale embedding by the sqrt of the hidden size - embeddings *= self.hidden_size**0.5 - - return embeddings - - def _linear(self, inputs): - """Computes logits by running inputs through a linear layer. 
- - Args: - inputs: A float32 tensor with shape [batch_size, length, hidden_size] - - Returns: - float32 tensor with shape [batch_size, length, vocab_size]. - """ - with tf.name_scope("presoftmax_linear"): - batch_size = tf.shape(inputs)[0] - length = tf.shape(inputs)[1] - - x = tf.reshape(inputs, [-1, self.hidden_size]) - logits = tf.matmul(x, self.shared_weights, transpose_b=True) - - return tf.reshape(logits, [batch_size, length, self.vocab_size]) diff --git a/official/nlp/transformer/ffn_layer.py b/official/nlp/transformer/ffn_layer.py deleted file mode 100644 index 26f0a15f69c50abee6f95dd40928e844ece1c691..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/ffn_layer.py +++ /dev/null @@ -1,71 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Implementation of fully connected network.""" - -import tensorflow as tf - - -class FeedForwardNetwork(tf.keras.layers.Layer): - """Fully connected feedforward network.""" - - def __init__(self, hidden_size, filter_size, relu_dropout): - """Initialize FeedForwardNetwork. - - Args: - hidden_size: int, output dim of hidden layer. - filter_size: int, filter size for the inner (first) dense layer. - relu_dropout: float, dropout rate for training. 
- """ - super(FeedForwardNetwork, self).__init__() - self.hidden_size = hidden_size - self.filter_size = filter_size - self.relu_dropout = relu_dropout - - def build(self, input_shape): - self.filter_dense_layer = tf.keras.layers.Dense( - self.filter_size, - use_bias=True, - activation=tf.nn.relu, - name="filter_layer") - self.output_dense_layer = tf.keras.layers.Dense( - self.hidden_size, use_bias=True, name="output_layer") - super(FeedForwardNetwork, self).build(input_shape) - - def get_config(self): - return { - "hidden_size": self.hidden_size, - "filter_size": self.filter_size, - "relu_dropout": self.relu_dropout, - } - - def call(self, x, training): - """Return outputs of the feedforward network. - - Args: - x: tensor with shape [batch_size, length, hidden_size] - training: boolean, whether in training mode or not. - - Returns: - Output of the feedforward network. - tensor with shape [batch_size, length, hidden_size] - """ - # Retrieve dynamically known shapes - - output = self.filter_dense_layer(x) - if training: - output = tf.nn.dropout(output, rate=self.relu_dropout) - output = self.output_dense_layer(output) - - return output diff --git a/official/nlp/transformer/metrics.py b/official/nlp/transformer/metrics.py deleted file mode 100644 index 38330aa471c7f7384a3f42abb7eefc5a62a48d94..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/metrics.py +++ /dev/null @@ -1,180 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Functions for calculating loss, accuracy, and other model metrics. - -Metrics: - - Padded loss, accuracy, and negative log perplexity. Source: - https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/metrics.py - - BLEU approximation. Source: - https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/bleu_hook.py - - ROUGE score. Source: - https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/rouge.py -""" - -import functools - -import tensorflow as tf - - -def _pad_tensors_to_same_length(x, y): - """Pad x and y so that the results have the same length (second dimension).""" - with tf.name_scope("pad_to_same_length"): - x_length = tf.shape(x)[1] - y_length = tf.shape(y)[1] - - max_length = tf.maximum(x_length, y_length) - - x = tf.pad(x, [[0, 0], [0, max_length - x_length], [0, 0]]) - y = tf.pad(y, [[0, 0], [0, max_length - y_length]]) - return x, y - - -def padded_cross_entropy_loss(logits, labels, smoothing, vocab_size): - """Calculate cross entropy loss while ignoring padding. 
- - Args: - logits: Tensor of size [batch_size, length_logits, vocab_size] - labels: Tensor of size [batch_size, length_labels] - smoothing: Label smoothing constant, used to determine the on and off values - vocab_size: int size of the vocabulary - - Returns: - Returns the cross entropy loss and weight tensors: float32 tensors with - shape [batch_size, max(length_logits, length_labels)] - """ - with tf.name_scope("loss"): - logits, labels = _pad_tensors_to_same_length(logits, labels) - - # Calculate smoothing cross entropy - with tf.name_scope("smoothing_cross_entropy"): - confidence = 1.0 - smoothing - low_confidence = (1.0 - confidence) / tf.cast(vocab_size - 1, tf.float32) - soft_targets = tf.one_hot( - tf.cast(labels, tf.int32), - depth=vocab_size, - on_value=confidence, - off_value=low_confidence) - xentropy = tf.nn.softmax_cross_entropy_with_logits( - logits=logits, labels=soft_targets) - - # Calculate the best (lowest) possible value of cross entropy, and - # subtract from the cross entropy loss. 
- normalizing_constant = -( - confidence * tf.math.log(confidence) + - tf.cast(vocab_size - 1, tf.float32) * low_confidence * - tf.math.log(low_confidence + 1e-20)) - xentropy -= normalizing_constant - - weights = tf.cast(tf.not_equal(labels, 0), tf.float32) - return xentropy * weights, weights - - -def padded_accuracy(logits, labels): - """Percentage of times that predictions matches labels on non-0s.""" - with tf.name_scope("padded_accuracy"): - logits, labels = _pad_tensors_to_same_length(logits, labels) - weights = tf.cast(tf.not_equal(labels, 0), tf.float32) - outputs = tf.cast(tf.argmax(logits, axis=-1), tf.int32) - padded_labels = tf.cast(labels, tf.int32) - return tf.cast(tf.equal(outputs, padded_labels), tf.float32), weights - - -def padded_accuracy_topk(logits, labels, k): - """Percentage of times that top-k predictions matches labels on non-0s.""" - with tf.name_scope("padded_accuracy_topk"): - logits, labels = _pad_tensors_to_same_length(logits, labels) - weights = tf.cast(tf.not_equal(labels, 0), tf.float32) - effective_k = tf.minimum(k, tf.shape(logits)[-1]) - _, outputs = tf.nn.top_k(logits, k=effective_k) - outputs = tf.cast(outputs, tf.int32) - padded_labels = tf.cast(labels, tf.int32) - padded_labels = tf.expand_dims(padded_labels, axis=-1) - padded_labels += tf.zeros_like(outputs) # Pad to same shape. 
- same = tf.cast(tf.equal(outputs, padded_labels), tf.float32) - same_topk = tf.reduce_sum(same, axis=-1) - return same_topk, weights - - -def padded_accuracy_top5(logits, labels): - return padded_accuracy_topk(logits, labels, 5) - - -def padded_sequence_accuracy(logits, labels): - """Percentage of times that predictions match labels everywhere (non-0).""" - with tf.name_scope("padded_sequence_accuracy"): - logits, labels = _pad_tensors_to_same_length(logits, labels) - weights = tf.cast(tf.not_equal(labels, 0), tf.float32) - outputs = tf.cast(tf.argmax(logits, axis=-1), tf.int32) - padded_labels = tf.cast(labels, tf.int32) - not_correct = tf.cast(tf.not_equal(outputs, padded_labels), - tf.float32) * weights - axis = list(range(1, len(outputs.get_shape()))) - correct_seq = 1.0 - tf.minimum(1.0, tf.reduce_sum(not_correct, axis=axis)) - return correct_seq, tf.constant(1.0) - - -def padded_neg_log_perplexity(logits, labels, vocab_size): - """Average log-perplexity excluding padding 0s. No smoothing.""" - num, den = padded_cross_entropy_loss(logits, labels, 0, vocab_size) - return -num, den - - -class MetricLayer(tf.keras.layers.Layer): - """Custom layer of metrics for the Transformer model.""" - - def __init__(self, vocab_size): - super(MetricLayer, self).__init__() - self.vocab_size = vocab_size - self.metric_mean_fns = [] - - def build(self, input_shape): - """Builds metric layer.""" - neg_log_perplexity = functools.partial( - padded_neg_log_perplexity, vocab_size=self.vocab_size) - self.metric_mean_fns = [ - (tf.keras.metrics.Mean("accuracy"), padded_accuracy), - (tf.keras.metrics.Mean("accuracy_top5"), padded_accuracy_top5), - (tf.keras.metrics.Mean("accuracy_per_sequence"), - padded_sequence_accuracy), - (tf.keras.metrics.Mean("neg_log_perplexity"), neg_log_perplexity), - ] - super(MetricLayer, self).build(input_shape) - - def get_config(self): - return {"vocab_size": self.vocab_size} - - def call(self, inputs): - logits, targets = inputs[0], inputs[1] - for
mean, fn in self.metric_mean_fns: - m = mean(*fn(logits, targets)) - self.add_metric(m) - return logits - - -def transformer_loss(logits, labels, smoothing, vocab_size): - """Calculates total loss containing cross entropy with padding ignored. - - Args: - logits: Tensor of size [batch_size, length_logits, vocab_size] - labels: Tensor of size [batch_size, length_labels] - smoothing: Label smoothing constant, used to determine the on and off values - vocab_size: int size of the vocabulary - - Returns: - A scalar float tensor for loss. - """ - xentropy, weights = padded_cross_entropy_loss(logits, labels, smoothing, - vocab_size) - return tf.reduce_sum(xentropy) / tf.reduce_sum(weights) diff --git a/official/nlp/transformer/misc.py b/official/nlp/transformer/misc.py deleted file mode 100644 index a457e92f754f96547b527bddef016c30efea0cd9..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/misc.py +++ /dev/null @@ -1,288 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
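The label-smoothing arithmetic in `padded_cross_entropy_loss` and `transformer_loss` above can be checked with a small pure-Python sketch. It is not the deleted TensorFlow code: `smoothed_xent` and `masked_mean_loss` are hypothetical helper names, and they work on plain lists for a single sequence rather than batched tensors. The point it illustrates is that subtracting the entropy of the smoothed targets (the `normalizing_constant`) makes a prediction that exactly matches the soft targets score zero, and that padding positions (label id 0) drop out of the average.

```python
import math


def smoothed_xent(logits, label, smoothing, vocab_size):
    """Label-smoothed cross entropy for one token, minus its lowest possible value.

    Hypothetical helper: the soft target puts `confidence` on `label` and spreads
    `smoothing` uniformly over the other vocab_size - 1 tokens.
    """
    confidence = 1.0 - smoothing
    low_confidence = smoothing / (vocab_size - 1)
    # Numerically stable log-softmax: subtract the max logit before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    log_probs = [l - log_z for l in logits]
    soft_targets = [confidence if i == label else low_confidence
                    for i in range(vocab_size)]
    xent = -sum(t * lp for t, lp in zip(soft_targets, log_probs))
    # Entropy of the soft targets, i.e. the best (lowest) achievable cross entropy.
    normalizing_constant = -(
        confidence * math.log(confidence) +
        (vocab_size - 1) * low_confidence * math.log(low_confidence + 1e-20))
    return xent - normalizing_constant


def masked_mean_loss(logits_seq, labels, smoothing, vocab_size):
    """Average loss over non-padding positions (label id 0 is treated as padding).

    Like the original, this divides by the number of real tokens, so a sequence
    made only of padding would divide by zero.
    """
    total, weight_sum = 0.0, 0.0
    for logits, label in zip(logits_seq, labels):
        weight = 0.0 if label == 0 else 1.0
        total += smoothed_xent(logits, label, smoothing, vocab_size) * weight
        weight_sum += weight
    return total / weight_sum
```

With `smoothing=0.1` and a 4-token vocabulary, logits equal to the log of the soft targets give a loss of (numerically) zero, while uniform logits give a positive loss.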
- -"""Misc for Transformer.""" - -# pylint: disable=g-bad-import-order - -from absl import flags -import tensorflow as tf - -from official.nlp.transformer import model_params -from official.utils.flags import core as flags_core -from official.utils.misc import keras_utils - -FLAGS = flags.FLAGS - -PARAMS_MAP = { - 'tiny': model_params.TINY_PARAMS, - 'base': model_params.BASE_PARAMS, - 'big': model_params.BIG_PARAMS, -} - - -def get_model_params(param_set, num_gpus): - """Gets predefined model params.""" - if num_gpus > 1: - if param_set == 'big': - return model_params.BIG_MULTI_GPU_PARAMS.copy() - elif param_set == 'base': - return model_params.BASE_MULTI_GPU_PARAMS.copy() - else: - raise ValueError('Not valid params: param_set={} num_gpus={}'.format( - param_set, num_gpus)) - - return PARAMS_MAP[param_set].copy() - - -def define_transformer_flags(): - """Add flags and flag validators for running transformer_main.""" - # Add common flags (data_dir, model_dir, etc.). - flags_core.define_base(num_gpu=True, distribution_strategy=True) - flags_core.define_performance( - num_parallel_calls=True, - inter_op=False, - intra_op=False, - synthetic_data=True, - max_train_steps=False, - dtype=True, - loss_scale=True, - all_reduce_alg=True, - num_packs=True, - tf_gpu_thread_mode=True, - datasets_num_private_threads=True, - enable_xla=True, - fp16_implementation=True) - - flags_core.define_benchmark() - flags_core.define_device(tpu=True) - - flags.DEFINE_integer( - name='train_steps', - short_name='ts', - default=300000, - help=flags_core.help_wrap('The number of steps used to train.')) - flags.DEFINE_integer( - name='steps_between_evals', - short_name='sbe', - default=5000, - help=flags_core.help_wrap( - 'The Number of training steps to run between evaluations. 
This is ' - 'used if --train_steps is defined.')) - flags.DEFINE_boolean( - name='enable_time_history', - default=True, - help='Whether to enable TimeHistory callback.') - flags.DEFINE_boolean( - name='enable_tensorboard', - default=False, - help='Whether to enable Tensorboard callback.') - flags.DEFINE_boolean( - name='enable_metrics_in_training', - default=False, - help='Whether to enable metrics during training.') - flags.DEFINE_boolean( - name='enable_mlir_bridge', - default=False, - help='Whether to enable the TF to XLA bridge.') - # Set flags from the flags_core module as 'key flags' so they're listed when - # the '-h' flag is used. Without this line, the flags defined above are - # only shown in the full `--helpful` help text. - flags.adopt_module_key_flags(flags_core) - - # Add transformer-specific flags - flags.DEFINE_enum( - name='param_set', - short_name='mp', - default='big', - enum_values=PARAMS_MAP.keys(), - help=flags_core.help_wrap( - 'Parameter set to use when creating and training the model. The ' - 'parameters define the input shape (batch size and max length), ' - 'model configuration (size of embedding, # of hidden layers, etc.), ' - 'and various other settings. The big parameter set increases the ' - 'default batch size, embedding/hidden size, and filter size. For a ' - 'complete list of parameters, please see model/model_params.py.')) - - flags.DEFINE_bool( - name='static_batch', - short_name='sb', - default=False, - help=flags_core.help_wrap( - 'Whether the batches in the dataset should have static shapes. In ' - 'general, this setting should be False. Dynamic shapes allow the ' - 'inputs to be grouped so that the number of padding tokens is ' - 'minimized, and helps model training. In cases where the input shape ' - 'must be static (e.g. 
running on TPU), this setting will be ignored ' - 'and static batching will always be used.')) - flags.DEFINE_integer( - name='max_length', - short_name='ml', - default=256, - help=flags_core.help_wrap( - 'Max sentence length for Transformer. Default is 256. Note: Usually ' - 'it is more effective to use a smaller max length if static_batch is ' - 'enabled, e.g. 64.')) - - # Flags for training with steps (may be used for debugging) - flags.DEFINE_integer( - name='validation_steps', - short_name='vs', - default=64, - help=flags_core.help_wrap('The number of steps used in validation.')) - - # BLEU score computation - flags.DEFINE_string( - name='bleu_source', - short_name='bls', - default=None, - help=flags_core.help_wrap( - 'Path to source file containing text to translate when calculating the ' - 'official BLEU score. Both --bleu_source and --bleu_ref must be set. ' - )) - flags.DEFINE_string( - name='bleu_ref', - short_name='blr', - default=None, - help=flags_core.help_wrap( - 'Path to file containing the reference translation when calculating the ' - 'official BLEU score. Both --bleu_source and --bleu_ref must be set. ' - )) - flags.DEFINE_string( - name='vocab_file', - short_name='vf', - default=None, - help=flags_core.help_wrap( - 'Path to subtoken vocabulary file. If data_download.py was used to ' - 'download and encode the training data, look in the data_dir to find ' - 'the vocab file.')) - flags.DEFINE_string( - name='mode', - default='train', - help=flags_core.help_wrap('mode: train, eval, or predict')) - flags.DEFINE_bool( - name='use_ctl', - default=False, - help=flags_core.help_wrap( - 'Whether the model runs with custom training loop.')) - flags.DEFINE_integer( - name='decode_batch_size', - default=32, - help=flags_core.help_wrap( - 'Global batch size used for Transformer autoregressive decoding on ' - 'TPU.')) - flags.DEFINE_integer( - name='decode_max_length', - default=97, - help=flags_core.help_wrap( - 'Max sequence length of the decode/eval data.
This is used by ' - 'Transformer autoregressive decoding on TPU to have minimum ' - 'paddings.')) - flags.DEFINE_bool( - name='padded_decode', - default=False, - help=flags_core.help_wrap( - 'Whether the autoregressive decoding runs with input data padded to ' - 'the decode_max_length. For TPU/XLA-GPU runs, this flag has to be ' - 'set due to the static shape requirement. Although CPU/GPU could also ' - 'use padded_decode, it has not been tested. In addition, this method ' - 'will introduce unnecessary overheads which grow quadratically with ' - 'the max sequence length.')) - flags.DEFINE_bool( - name='enable_checkpointing', - default=True, - help=flags_core.help_wrap( - 'Whether to do checkpointing during training. When running under ' - 'benchmark harness, we will avoid checkpointing.')) - flags.DEFINE_bool( - name='save_weights_only', - default=True, - help=flags_core.help_wrap( - 'Only used when above `enable_checkpointing` is True. ' - 'If True, then only the model\'s weights will be saved ' - '(`model.save_weights(filepath)`), else the full model is saved ' - '(`model.save(filepath)`)')) - - flags_core.set_defaults( - data_dir='/tmp/translate_ende', - model_dir='/tmp/transformer_model', - batch_size=None) - - # pylint: disable=unused-variable - @flags.multi_flags_validator( - ['bleu_source', 'bleu_ref'], - message='Both or neither --bleu_source and --bleu_ref must be defined.') - def _check_bleu_files(flags_dict): - return (flags_dict['bleu_source'] is None) == ( - flags_dict['bleu_ref'] is None) - - @flags.multi_flags_validator( - ['bleu_source', 'bleu_ref', 'vocab_file'], - message='--vocab_file must be defined if --bleu_source and --bleu_ref ' - 'are defined.') - def _check_bleu_vocab_file(flags_dict): - if flags_dict['bleu_source'] and flags_dict['bleu_ref']: - return flags_dict['vocab_file'] is not None - return True - - # pylint: enable=unused-variable - - -def get_callbacks(): - """Returns common callbacks.""" - callbacks = [] - if
FLAGS.enable_time_history: - time_callback = keras_utils.TimeHistory( - FLAGS.batch_size, - FLAGS.log_steps, - logdir=FLAGS.model_dir if FLAGS.enable_tensorboard else None) - callbacks.append(time_callback) - - if FLAGS.enable_tensorboard: - tensorboard_callback = tf.keras.callbacks.TensorBoard( - log_dir=FLAGS.model_dir) - callbacks.append(tensorboard_callback) - - return callbacks - - -def update_stats(history, stats, callbacks): - """Normalizes and updates dictionary of stats. - - Args: - history: Results of the training step. - stats: Dict with pre-existing training stats. - callbacks: a list of callbacks which might include a time history callback - used during keras.fit. - """ - - if history and history.history: - train_hist = history.history - # Gets final loss from training. - stats['loss'] = float(train_hist['loss'][-1]) - - if not callbacks: - return - - # Look for the time history callback which was used during keras.fit - for callback in callbacks: - if isinstance(callback, keras_utils.TimeHistory): - timestamp_log = callback.timestamp_log - stats['step_timestamp_log'] = timestamp_log - stats['train_finish_time'] = callback.train_finish_time - if len(timestamp_log) > 1: - stats['avg_exp_per_second'] = ( - callback.batch_size * callback.log_steps * - (len(callback.timestamp_log) - 1) / - (timestamp_log[-1].timestamp - timestamp_log[0].timestamp)) diff --git a/official/nlp/transformer/model_params.py b/official/nlp/transformer/model_params.py deleted file mode 100644 index 0764d5e9a0d2e97754943cd61574b1c24469a0ae..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/model_params.py +++ /dev/null @@ -1,96 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Defines Transformer model parameters.""" - -import collections - - -BASE_PARAMS = collections.defaultdict( - lambda: None, # Set default value to None. - - # Input params - default_batch_size=2048, # Maximum number of tokens per batch of examples. - default_batch_size_tpu=32768, - max_length=256, # Maximum number of tokens per example. - - # Model params - initializer_gain=1.0, # Used in trainable variable initialization. - vocab_size=33708, # Number of tokens defined in the vocabulary file. - hidden_size=512, # Model dimension in the hidden layers. - num_hidden_layers=6, # Number of layers in the encoder and decoder stacks. - num_heads=8, # Number of heads to use in multi-headed attention. - filter_size=2048, # Inner layer dimension in the feedforward network. - - # Dropout values (only used when training) - layer_postprocess_dropout=0.1, - attention_dropout=0.1, - relu_dropout=0.1, - - # Training params - label_smoothing=0.1, - learning_rate=2.0, - learning_rate_decay_rate=1.0, - learning_rate_warmup_steps=16000, - - # Optimizer params - optimizer_adam_beta1=0.9, - optimizer_adam_beta2=0.997, - optimizer_adam_epsilon=1e-09, - - # Default prediction params - extra_decode_length=50, - beam_size=4, - alpha=0.6, # used to calculate length normalization in beam search - - # TPU specific parameters - use_tpu=False, - static_batch=False, - allow_ffn_pad=True, -) - -BIG_PARAMS = BASE_PARAMS.copy() -BIG_PARAMS.update( - default_batch_size=4096, - - # default batch size is smaller than for BASE_PARAMS due to memory limits. 
- default_batch_size_tpu=16384, - - hidden_size=1024, - filter_size=4096, - num_heads=16, -) - -# Parameters for running the model in multi gpu. These should not change the -# params that modify the model shape (such as the hidden_size or num_heads). -BASE_MULTI_GPU_PARAMS = BASE_PARAMS.copy() -BASE_MULTI_GPU_PARAMS.update( - learning_rate_warmup_steps=8000 -) - -BIG_MULTI_GPU_PARAMS = BIG_PARAMS.copy() -BIG_MULTI_GPU_PARAMS.update( - layer_postprocess_dropout=0.3, - learning_rate_warmup_steps=8000 -) - -# Parameters for testing the model -TINY_PARAMS = BASE_PARAMS.copy() -TINY_PARAMS.update( - default_batch_size=1024, - default_batch_size_tpu=1024, - hidden_size=32, - num_heads=4, - filter_size=256, -) diff --git a/official/nlp/transformer/model_utils.py b/official/nlp/transformer/model_utils.py deleted file mode 100644 index 6e163b97361cb7f071314909aaa1fc1e52ae6bfd..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/model_utils.py +++ /dev/null @@ -1,121 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Transformer model helper methods.""" - -import math - -import numpy as np -import tensorflow as tf - -# Very low numbers to represent -infinity. We do not actually use -Inf, since we -# want to be able to multiply these values by zero to get zero. 
(-Inf * 0 = NaN) -_NEG_INF_FP32 = -1e9 -_NEG_INF_FP16 = np.finfo(np.float16).min - - -def get_position_encoding(length, - hidden_size, - min_timescale=1.0, - max_timescale=1.0e4): - """Return positional encoding. - - Calculates the position encoding as a mix of sine and cosine functions with - geometrically increasing wavelengths. - Defined and formulized in Attention is All You Need, section 3.5. - - Args: - length: Sequence length. - hidden_size: Size of the - min_timescale: Minimum scale that will be applied at each position - max_timescale: Maximum scale that will be applied at each position - - Returns: - Tensor with shape [length, hidden_size] - """ - # We compute the positional encoding in float32 even if the model uses - # float16, as many of the ops used, like log and exp, are numerically unstable - # in float16. - position = tf.cast(tf.range(length), tf.float32) - num_timescales = hidden_size // 2 - log_timescale_increment = ( - math.log(float(max_timescale) / float(min_timescale)) / - (tf.cast(num_timescales, tf.float32) - 1)) - inv_timescales = min_timescale * tf.exp( - tf.cast(tf.range(num_timescales), tf.float32) * -log_timescale_increment) - scaled_time = tf.expand_dims(position, 1) * tf.expand_dims(inv_timescales, 0) - signal = tf.concat([tf.sin(scaled_time), tf.cos(scaled_time)], axis=1) - return signal - - -def get_decoder_self_attention_bias(length, dtype=tf.float32): - """Calculate bias for decoder that maintains model's autoregressive property. - - Creates a tensor that masks out locations that correspond to illegal - connections, so prediction at position i cannot draw information from future - positions. - - Args: - length: int length of sequences in batch. - dtype: The dtype of the return value. 
- - Returns: - float tensor of shape [1, 1, length, length] - """ - neg_inf = _NEG_INF_FP16 if dtype == tf.float16 else _NEG_INF_FP32 - with tf.name_scope("decoder_self_attention_bias"): - valid_locs = tf.linalg.band_part( - tf.ones([length, length], dtype=dtype), -1, 0) - valid_locs = tf.reshape(valid_locs, [1, 1, length, length]) - decoder_bias = neg_inf * (1.0 - valid_locs) - return decoder_bias - - -def get_padding(x, padding_value=0, dtype=tf.float32): - """Return float tensor representing the padding values in x. - - Args: - x: int tensor with any shape - padding_value: int which represents padded values in input - dtype: The dtype of the return value. - - Returns: - float tensor with same shape as x containing values 0 or 1. - 0 -> non-padding, 1 -> padding - """ - with tf.name_scope("padding"): - return tf.cast(tf.equal(x, padding_value), dtype) - - -def get_padding_bias(x, padding_value=0, dtype=tf.float32): - """Calculate bias tensor from padding values in tensor. - - Bias tensor that is added to the pre-softmax multi-headed attention logits, - which has shape [batch_size, num_heads, length, length]. The tensor is zero at - non-padding locations, and -1e9 (negative infinity) at padding locations. - - Args: - x: int tensor with shape [batch_size, length] - padding_value: int which represents padded values in input - dtype: The dtype of the return value - - Returns: - Attention bias tensor of shape [batch_size, 1, 1, length]. 
- """ - with tf.name_scope("attention_bias"): - padding = get_padding(x, padding_value, dtype) - attention_bias = padding * _NEG_INF_FP32 - attention_bias = tf.expand_dims( - tf.expand_dims(attention_bias, axis=1), axis=1) - return attention_bias diff --git a/official/nlp/transformer/model_utils_test.py b/official/nlp/transformer/model_utils_test.py deleted file mode 100644 index 10ddeed8392a77175b82b69c6e628cc1306c607c..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/model_utils_test.py +++ /dev/null @@ -1,55 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
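To see what the helpers in `model_utils.py` above compute without TensorFlow, here is a minimal pure-Python sketch, assuming a single unbatched sequence. The names `position_encoding`, `decoder_self_attention_bias`, and `padding_bias` are hypothetical stand-ins rather than the module's API. They reproduce the layout of `get_position_encoding` (sines in the first half of each row, cosines in the second, with geometrically spaced timescales) and the 0 / -1e9 masks of `get_decoder_self_attention_bias` and `get_padding_bias`, minus the broadcasting dimensions.

```python
import math

NEG_INF = -1e9  # Large negative stand-in for -inf, like _NEG_INF_FP32.


def position_encoding(length, hidden_size, min_timescale=1.0, max_timescale=1.0e4):
    """Returns a length x hidden_size list of rows; assumes hidden_size >= 4."""
    num_timescales = hidden_size // 2
    log_increment = math.log(max_timescale / min_timescale) / (num_timescales - 1)
    # Inverse timescales shrink geometrically from 1/min_timescale to 1/max_timescale.
    inv_timescales = [min_timescale * math.exp(-i * log_increment)
                      for i in range(num_timescales)]
    rows = []
    for pos in range(length):
        scaled = [pos * inv for inv in inv_timescales]
        rows.append([math.sin(s) for s in scaled] + [math.cos(s) for s in scaled])
    return rows


def decoder_self_attention_bias(length):
    """Row i is 0 where column j <= i and NEG_INF for future positions j > i."""
    return [[0.0 if j <= i else NEG_INF for j in range(length)]
            for i in range(length)]


def padding_bias(token_ids, padding_value=0):
    """0 for real tokens, NEG_INF for padding; the [batch, 1, 1, length] dims are dropped."""
    return [NEG_INF if t == padding_value else 0.0 for t in token_ids]
```

The expected masks line up with the values asserted in `model_utils_test.py`: for example, `padding_bias([1, 0, 0, 0, 2])` gives `[0, NEG_INF, NEG_INF, NEG_INF, 0]`, and the causal mask is zero on and below the diagonal.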
- -"""Test Transformer model helper methods.""" - -import tensorflow as tf - -from official.nlp.transformer import model_utils - -NEG_INF = -1e9 - - -class ModelUtilsTest(tf.test.TestCase): - - def test_get_padding(self): - x = tf.constant([[1, 0, 0, 0, 2], [3, 4, 0, 0, 0], [0, 5, 6, 0, 7]]) - padding = model_utils.get_padding(x, padding_value=0) - - self.assertAllEqual([[0, 1, 1, 1, 0], [0, 0, 1, 1, 1], [1, 0, 0, 1, 0]], - padding) - - def test_get_padding_bias(self): - x = tf.constant([[1, 0, 0, 0, 2], [3, 4, 0, 0, 0], [0, 5, 6, 0, 7]]) - bias = model_utils.get_padding_bias(x) - bias_shape = tf.shape(bias) - flattened_bias = tf.reshape(bias, [3, 5]) - - self.assertAllEqual( - [[0, NEG_INF, NEG_INF, NEG_INF, 0], [0, 0, NEG_INF, NEG_INF, NEG_INF], - [NEG_INF, 0, 0, NEG_INF, 0]], flattened_bias) - self.assertAllEqual([3, 1, 1, 5], bias_shape) - - def test_get_decoder_self_attention_bias(self): - length = 5 - bias = model_utils.get_decoder_self_attention_bias(length) - - self.assertAllEqual( - [[[[0, NEG_INF, NEG_INF, NEG_INF, NEG_INF], - [0, 0, NEG_INF, NEG_INF, NEG_INF], [0, 0, 0, NEG_INF, NEG_INF], - [0, 0, 0, 0, NEG_INF], [0, 0, 0, 0, 0]]]], bias) - - -if __name__ == "__main__": - tf.test.main() diff --git a/official/nlp/transformer/optimizer.py b/official/nlp/transformer/optimizer.py deleted file mode 100644 index b27a6f07a4b73723be6f28d257bc3abcfbca43de..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/optimizer.py +++ /dev/null @@ -1,64 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Optimizer from addons and learning rate scheduler.""" - -import tensorflow as tf - - -class LearningRateSchedule(tf.keras.optimizers.schedules.LearningRateSchedule): - """Learning rate schedule.""" - - def __init__(self, initial_learning_rate, hidden_size, warmup_steps): - """Initialize configuration of the learning rate schedule. - - Args: - initial_learning_rate: A float, the initial learning rate. - hidden_size: An integer, the model dimension in the hidden layers. - warmup_steps: An integer, the number of steps required for linear warmup. - """ - super(LearningRateSchedule, self).__init__() - self.initial_learning_rate = initial_learning_rate - self.hidden_size = hidden_size - self.warmup_steps = warmup_steps - self.warmup_steps_tensor = tf.cast(warmup_steps, tf.float32) - - def __call__(self, global_step): - """Calculate learning rate with linear warmup and rsqrt decay. - - Args: - global_step: An integer, the current global step used for learning rate - calculation. - - Returns: - A float, the learning rate needs to be used for current global step. 
- """ - with tf.name_scope('learning_rate_schedule'): - global_step = tf.cast(global_step, tf.float32) - learning_rate = self.initial_learning_rate - learning_rate *= (self.hidden_size**-0.5) - # Apply linear warmup - learning_rate *= tf.minimum(1.0, global_step / self.warmup_steps_tensor) - # Apply rsqrt decay - learning_rate /= tf.sqrt( - tf.maximum(global_step, self.warmup_steps_tensor)) - return learning_rate - - def get_config(self): - """Get the configuration of the learning rate schedule.""" - return { - 'initial_learning_rate': self.initial_learning_rate, - 'hidden_size': self.hidden_size, - 'warmup_steps': self.warmup_steps, - } diff --git a/official/nlp/transformer/transformer.py b/official/nlp/transformer/transformer.py deleted file mode 100644 index b7ea0fe7f5f9bd6a0c57a6b02642df39e953894a..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/transformer.py +++ /dev/null @@ -1,549 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Defines the Transformer model in TF 2.0. 
- -Model paper: https://arxiv.org/pdf/1706.03762.pdf -Transformer model code source: https://github.com/tensorflow/tensor2tensor -""" - -import tensorflow as tf -from official.nlp.modeling.layers import position_embedding -from official.nlp.modeling.ops import beam_search -from official.nlp.transformer import attention_layer -from official.nlp.transformer import embedding_layer -from official.nlp.transformer import ffn_layer -from official.nlp.transformer import metrics -from official.nlp.transformer import model_utils -from official.nlp.transformer.utils.tokenizer import EOS_ID - -# Disable the not-callable lint error, since it claims many objects are not -# callable when they actually are. -# pylint: disable=not-callable - - -def create_model(params, is_train): - """Creates transformer model.""" - with tf.name_scope("model"): - if is_train: - inputs = tf.keras.layers.Input((None,), dtype="int64", name="inputs") - targets = tf.keras.layers.Input((None,), dtype="int64", name="targets") - internal_model = Transformer(params, name="transformer_v2") - logits = internal_model([inputs, targets], training=is_train) - vocab_size = params["vocab_size"] - label_smoothing = params["label_smoothing"] - if params["enable_metrics_in_training"]: - logits = metrics.MetricLayer(vocab_size)([logits, targets]) - logits = tf.keras.layers.Lambda( - lambda x: x, name="logits", dtype=tf.float32)( - logits) - model = tf.keras.Model([inputs, targets], logits) - loss = metrics.transformer_loss(logits, targets, label_smoothing, - vocab_size) - model.add_loss(loss) - return model - - else: - inputs = tf.keras.layers.Input((None,), dtype="int64", name="inputs") - internal_model = Transformer(params, name="transformer_v2") - ret = internal_model([inputs], training=is_train) - outputs, scores = ret["outputs"], ret["scores"] - return tf.keras.Model(inputs, [outputs, scores]) - - -class Transformer(tf.keras.Model): - """Transformer model with Keras. 
- - Implemented as described in: https://arxiv.org/pdf/1706.03762.pdf - - The Transformer model consists of an encoder and decoder. The input is an int - sequence (or a batch of sequences). The encoder produces a continuous - representation, and the decoder uses the encoder output to generate - probabilities for the output sequence. - """ - - def __init__(self, params, name=None): - """Initialize layers to build Transformer model. - - Args: - params: hyperparameter object defining layer sizes, dropout values, etc. - name: name of the model. - """ - super(Transformer, self).__init__(name=name) - self.params = params - self.embedding_softmax_layer = embedding_layer.EmbeddingSharedWeights( - params["vocab_size"], params["hidden_size"]) - self.encoder_stack = EncoderStack(params) - self.decoder_stack = DecoderStack(params) - self.position_embedding = position_embedding.RelativePositionEmbedding( - hidden_size=self.params["hidden_size"]) - - def get_config(self): - return { - "params": self.params, - } - - def call(self, inputs, training): - """Calculate target logits or inferred target sequences. - - Args: - inputs: input tensor list of size 1 or 2. - First item, inputs: int tensor with shape [batch_size, input_length]. - Second item (optional), targets: None or int tensor with shape - [batch_size, target_length]. - training: boolean, whether in training mode or not. - - Returns: - If targets is defined, then return logits for each word in the target - sequence. float tensor with shape [batch_size, target_length, vocab_size] - If target is none, then generate output sequence one token at a time. - returns a dictionary { - outputs: int tensor with shape [batch_size, decoded_length] - scores: float tensor with shape [batch_size]} - Even when float16 is used, the output tensor(s) are always float32. - - Raises: - NotImplementedError: If try to use padded decode method on CPU/GPUs. 
- """ - inputs = inputs if isinstance(inputs, list) else [inputs] - if len(inputs) == 2: - inputs, targets = inputs[0], inputs[1] - else: - # Decoding path. - inputs, targets = inputs[0], None - if self.params["padded_decode"]: - if not self.params["num_replicas"]: - raise NotImplementedError( - "Padded decoding on CPU/GPUs is not supported.") - decode_batch_size = int(self.params["decode_batch_size"] / - self.params["num_replicas"]) - inputs.set_shape([decode_batch_size, self.params["decode_max_length"]]) - - # Variance scaling is used here because it seems to work in many problems. - # Other reasonable initializers may also work just as well. - with tf.name_scope("Transformer"): - # Calculate attention bias for encoder self-attention and decoder - # multi-headed attention layers. - attention_bias = model_utils.get_padding_bias(inputs) - - # Run the inputs through the encoder layer to map the symbol - # representations to continuous representations. - encoder_outputs = self.encode(inputs, attention_bias, training) - # Generate output sequence if targets is None, or return logits if target - # sequence is known. - if targets is None: - return self.predict(encoder_outputs, attention_bias, training) - else: - logits = self.decode(targets, encoder_outputs, attention_bias, training) - return logits - - def encode(self, inputs, attention_bias, training): - """Generate continuous representation for inputs. - - Args: - inputs: int tensor with shape [batch_size, input_length]. - attention_bias: float tensor with shape [batch_size, 1, 1, input_length]. - training: boolean, whether in training mode or not. - - Returns: - float tensor with shape [batch_size, input_length, hidden_size] - """ - with tf.name_scope("encode"): - # Prepare inputs to the layer stack by adding positional encodings and - # applying dropout. 
- embedded_inputs = self.embedding_softmax_layer(inputs) - embedded_inputs = tf.cast(embedded_inputs, self.params["dtype"]) - inputs_padding = model_utils.get_padding(inputs) - attention_bias = tf.cast(attention_bias, self.params["dtype"]) - - with tf.name_scope("add_pos_encoding"): - pos_encoding = self.position_embedding(inputs=embedded_inputs) - pos_encoding = tf.cast(pos_encoding, self.params["dtype"]) - encoder_inputs = embedded_inputs + pos_encoding - - if training: - encoder_inputs = tf.nn.dropout( - encoder_inputs, rate=self.params["layer_postprocess_dropout"]) - - return self.encoder_stack( - encoder_inputs, attention_bias, inputs_padding, training=training) - - def decode(self, targets, encoder_outputs, attention_bias, training): - """Generate logits for each value in the target sequence. - - Args: - targets: target values for the output sequence. int tensor with shape - [batch_size, target_length] - encoder_outputs: continuous representation of input sequence. float tensor - with shape [batch_size, input_length, hidden_size] - attention_bias: float tensor with shape [batch_size, 1, 1, input_length] - training: boolean, whether in training mode or not. - - Returns: - float32 tensor with shape [batch_size, target_length, vocab_size] - """ - with tf.name_scope("decode"): - # Prepare inputs to decoder layers by shifting targets, adding positional - # encoding and applying dropout. 
- with tf.name_scope("shift_targets"): - # Shift targets to the right, and remove the last element - targets = tf.pad(targets, [[0, 0], [1, 0]])[:, :-1] - decoder_inputs = self.embedding_softmax_layer(targets) - decoder_inputs = tf.cast(decoder_inputs, self.params["dtype"]) - attention_bias = tf.cast(attention_bias, self.params["dtype"]) - with tf.name_scope("add_pos_encoding"): - length = tf.shape(decoder_inputs)[1] - pos_encoding = self.position_embedding(decoder_inputs) - pos_encoding = tf.cast(pos_encoding, self.params["dtype"]) - decoder_inputs += pos_encoding - if training: - decoder_inputs = tf.nn.dropout( - decoder_inputs, rate=self.params["layer_postprocess_dropout"]) - - # Run values - decoder_self_attention_bias = model_utils.get_decoder_self_attention_bias( - length, dtype=self.params["dtype"]) - outputs = self.decoder_stack( - decoder_inputs, - encoder_outputs, - decoder_self_attention_bias, - attention_bias, - training=training) - logits = self.embedding_softmax_layer(outputs, mode="linear") - logits = tf.cast(logits, tf.float32) - return logits - - def _get_symbols_to_logits_fn(self, max_decode_length, training): - """Returns a decoding function that calculates logits of the next tokens.""" - timing_signal = self.position_embedding( - inputs=None, length=max_decode_length + 1) - timing_signal = tf.cast(timing_signal, self.params["dtype"]) - decoder_self_attention_bias = model_utils.get_decoder_self_attention_bias( - max_decode_length, dtype=self.params["dtype"]) - - def symbols_to_logits_fn(ids, i, cache): - """Generate logits for next potential IDs. - - Args: - ids: Current decoded sequences. int tensor with shape [batch_size * - beam_size, i + 1]. - i: Loop index. - cache: dictionary of values storing the encoder output, encoder-decoder - attention bias, and previous decoder attention values. 
- - Returns: - Tuple of - (logits with shape [batch_size * beam_size, vocab_size], - updated cache values) - """ - # Set decoder input to the last generated IDs - decoder_input = ids[:, -1:] - - # Preprocess decoder input by getting embeddings and adding timing signal. - decoder_input = self.embedding_softmax_layer(decoder_input) - decoder_input += timing_signal[i] - if self.params["padded_decode"]: - bias_shape = decoder_self_attention_bias.shape.as_list() - self_attention_bias = tf.slice( - decoder_self_attention_bias, [0, 0, i, 0], - [bias_shape[0], bias_shape[1], 1, bias_shape[3]]) - else: - self_attention_bias = decoder_self_attention_bias[:, :, i:i + 1, :i + 1] - - decoder_outputs = self.decoder_stack( - decoder_input, - cache.get("encoder_outputs"), - self_attention_bias, - cache.get("encoder_decoder_attention_bias"), - training=training, - cache=cache, - decode_loop_step=i if self.params["padded_decode"] else None) - logits = self.embedding_softmax_layer(decoder_outputs, mode="linear") - logits = tf.squeeze(logits, axis=[1]) - return logits, cache - - return symbols_to_logits_fn - - def predict(self, encoder_outputs, encoder_decoder_attention_bias, training): - """Return predicted sequence.""" - encoder_outputs = tf.cast(encoder_outputs, self.params["dtype"]) - if self.params["padded_decode"]: - batch_size = encoder_outputs.shape.as_list()[0] - input_length = encoder_outputs.shape.as_list()[1] - else: - batch_size = tf.shape(encoder_outputs)[0] - input_length = tf.shape(encoder_outputs)[1] - max_decode_length = input_length + self.params["extra_decode_length"] - encoder_decoder_attention_bias = tf.cast(encoder_decoder_attention_bias, - self.params["dtype"]) - - symbols_to_logits_fn = self._get_symbols_to_logits_fn( - max_decode_length, training) - - # Create initial set of IDs that will be passed into symbols_to_logits_fn. - initial_ids = tf.zeros([batch_size], dtype=tf.int32) - - # Create cache storing decoder attention values for each layer. 
- # pylint: disable=g-complex-comprehension - init_decode_length = ( - max_decode_length if self.params["padded_decode"] else 0) - num_heads = self.params["num_heads"] - dim_per_head = self.params["hidden_size"] // num_heads - cache = { - "layer_%d" % layer: { - "k": - tf.zeros( - [batch_size, init_decode_length, num_heads, dim_per_head], - dtype=self.params["dtype"]), - "v": - tf.zeros( - [batch_size, init_decode_length, num_heads, dim_per_head], - dtype=self.params["dtype"]) - } for layer in range(self.params["num_hidden_layers"]) - } - # pylint: enable=g-complex-comprehension - - # Add encoder output and attention bias to the cache. - cache["encoder_outputs"] = encoder_outputs - cache["encoder_decoder_attention_bias"] = encoder_decoder_attention_bias - - # Use beam search to find the top beam_size sequences and scores. - decoded_ids, scores = beam_search.sequence_beam_search( - symbols_to_logits_fn=symbols_to_logits_fn, - initial_ids=initial_ids, - initial_cache=cache, - vocab_size=self.params["vocab_size"], - beam_size=self.params["beam_size"], - alpha=self.params["alpha"], - max_decode_length=max_decode_length, - eos_id=EOS_ID, - padded_decode=self.params["padded_decode"], - dtype=self.params["dtype"]) - - # Get the top sequence for each batch element - top_decoded_ids = decoded_ids[:, 0, 1:] - top_scores = scores[:, 0] - - return {"outputs": top_decoded_ids, "scores": top_scores} - - -class PrePostProcessingWrapper(tf.keras.layers.Layer): - """Wrapper class that applies layer pre-processing and post-processing.""" - - def __init__(self, layer, params): - super(PrePostProcessingWrapper, self).__init__() - self.layer = layer - self.params = params - self.postprocess_dropout = params["layer_postprocess_dropout"] - - def build(self, input_shape): - # Create normalization layer - self.layer_norm = tf.keras.layers.LayerNormalization( - epsilon=1e-6, dtype="float32") - super(PrePostProcessingWrapper, self).build(input_shape) - - def get_config(self): - return { - 
"params": self.params, - } - - def call(self, x, *args, **kwargs): - """Calls wrapped layer with same parameters.""" - # Preprocessing: apply layer normalization - training = kwargs["training"] - - y = self.layer_norm(x) - - # Get layer output - y = self.layer(y, *args, **kwargs) - - # Postprocessing: apply dropout and residual connection - if training: - y = tf.nn.dropout(y, rate=self.postprocess_dropout) - return x + y - - -class EncoderStack(tf.keras.layers.Layer): - """Transformer encoder stack. - - The encoder stack is made up of N identical layers. Each layer is composed - of the sublayers: - 1. Self-attention layer - 2. Feedforward network (which is 2 fully-connected layers) - """ - - def __init__(self, params): - super(EncoderStack, self).__init__() - self.params = params - self.layers = [] - - def build(self, input_shape): - """Builds the encoder stack.""" - params = self.params - for _ in range(params["num_hidden_layers"]): - # Create sublayers for each layer. - self_attention_layer = attention_layer.SelfAttention( - params["hidden_size"], params["num_heads"], - params["attention_dropout"]) - feed_forward_network = ffn_layer.FeedForwardNetwork( - params["hidden_size"], params["filter_size"], params["relu_dropout"]) - - self.layers.append([ - PrePostProcessingWrapper(self_attention_layer, params), - PrePostProcessingWrapper(feed_forward_network, params) - ]) - - # Create final layer normalization layer. - self.output_normalization = tf.keras.layers.LayerNormalization( - epsilon=1e-6, dtype="float32") - super(EncoderStack, self).build(input_shape) - - def get_config(self): - return { - "params": self.params, - } - - def call(self, encoder_inputs, attention_bias, inputs_padding, training): - """Return the output of the encoder layer stacks. - - Args: - encoder_inputs: tensor with shape [batch_size, input_length, hidden_size] - attention_bias: bias for the encoder self-attention layer. 
[batch_size, 1, - 1, input_length] - inputs_padding: tensor with shape [batch_size, input_length], inputs with - zero paddings. - training: boolean, whether in training mode or not. - - Returns: - Output of encoder layer stack. - float32 tensor with shape [batch_size, input_length, hidden_size] - """ - for n, layer in enumerate(self.layers): - # Run inputs through the sublayers. - self_attention_layer = layer[0] - feed_forward_network = layer[1] - - with tf.name_scope("layer_%d" % n): - with tf.name_scope("self_attention"): - encoder_inputs = self_attention_layer( - encoder_inputs, attention_bias, training=training) - with tf.name_scope("ffn"): - encoder_inputs = feed_forward_network( - encoder_inputs, training=training) - - return self.output_normalization(encoder_inputs) - - -class DecoderStack(tf.keras.layers.Layer): - """Transformer decoder stack. - - Like the encoder stack, the decoder stack is made up of N identical layers. - Each layer is composed of the sublayers: - 1. Self-attention layer - 2. Multi-headed attention layer combining encoder outputs with results from - the previous self-attention layer. - 3. 
Feedforward network (2 fully-connected layers) - """ - - def __init__(self, params): - super(DecoderStack, self).__init__() - self.params = params - self.layers = [] - - def build(self, input_shape): - """Builds the decoder stack.""" - params = self.params - for _ in range(params["num_hidden_layers"]): - self_attention_layer = attention_layer.SelfAttention( - params["hidden_size"], params["num_heads"], - params["attention_dropout"]) - enc_dec_attention_layer = attention_layer.Attention( - params["hidden_size"], params["num_heads"], - params["attention_dropout"]) - feed_forward_network = ffn_layer.FeedForwardNetwork( - params["hidden_size"], params["filter_size"], params["relu_dropout"]) - - self.layers.append([ - PrePostProcessingWrapper(self_attention_layer, params), - PrePostProcessingWrapper(enc_dec_attention_layer, params), - PrePostProcessingWrapper(feed_forward_network, params) - ]) - self.output_normalization = tf.keras.layers.LayerNormalization( - epsilon=1e-6, dtype="float32") - super(DecoderStack, self).build(input_shape) - - def get_config(self): - return { - "params": self.params, - } - - def call(self, - decoder_inputs, - encoder_outputs, - decoder_self_attention_bias, - attention_bias, - training, - cache=None, - decode_loop_step=None): - """Return the output of the decoder layer stacks. - - Args: - decoder_inputs: A tensor with shape [batch_size, target_length, - hidden_size]. - encoder_outputs: A tensor with shape [batch_size, input_length, - hidden_size] - decoder_self_attention_bias: A tensor with shape [1, 1, target_len, - target_length], the bias for decoder self-attention layer. - attention_bias: A tensor with shape [batch_size, 1, 1, input_length], the - bias for encoder-decoder attention layer. - training: A bool, whether in training mode or not. - cache: (Used for fast decoding) A nested dictionary storing previous - decoder self-attention values. 
The items are: - {layer_n: {"k": A tensor with shape [batch_size, i, key_channels], - "v": A tensor with shape [batch_size, i, value_channels]}, - ...} - decode_loop_step: An integer, the step number of the decoding loop. Used - only for autoregressive inference on TPU. - - Returns: - Output of decoder layer stack. - float32 tensor with shape [batch_size, target_length, hidden_size] - """ - for n, layer in enumerate(self.layers): - self_attention_layer = layer[0] - enc_dec_attention_layer = layer[1] - feed_forward_network = layer[2] - - # Run inputs through the sublayers. - layer_name = "layer_%d" % n - layer_cache = cache[layer_name] if cache is not None else None - with tf.name_scope(layer_name): - with tf.name_scope("self_attention"): - decoder_inputs = self_attention_layer( - decoder_inputs, - decoder_self_attention_bias, - training=training, - cache=layer_cache, - decode_loop_step=decode_loop_step) - with tf.name_scope("encdec_attention"): - decoder_inputs = enc_dec_attention_layer( - decoder_inputs, - encoder_outputs, - attention_bias, - training=training) - with tf.name_scope("ffn"): - decoder_inputs = feed_forward_network( - decoder_inputs, training=training) - - return self.output_normalization(decoder_inputs) diff --git a/official/nlp/transformer/transformer_forward_test.py b/official/nlp/transformer/transformer_forward_test.py deleted file mode 100644 index 4c8406a32e906bc8683b0a3a744eb5890e665cc9..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/transformer_forward_test.py +++ /dev/null @@ -1,157 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Forward pass test for Transformer model refactoring.""" - -import numpy as np - -import tensorflow as tf - -from official.nlp.modeling import models -from official.nlp.transformer import metrics -from official.nlp.transformer import model_params -from official.nlp.transformer import transformer - - -def _count_params(layer, trainable_only=True): - """Returns the count of all model parameters, or just trainable ones.""" - if not trainable_only: - return layer.count_params() - else: - return int( - np.sum([ - tf.keras.backend.count_params(p) for p in layer.trainable_weights - ])) - - -def _create_model(params, is_train): - """Creates transformer model.""" - - encdec_kwargs = dict( - num_layers=params["num_hidden_layers"], - num_attention_heads=params["num_heads"], - intermediate_size=params["filter_size"], - activation="relu", - dropout_rate=params["relu_dropout"], - attention_dropout_rate=params["attention_dropout"], - use_bias=False, - norm_first=True, - norm_epsilon=1e-6, - intermediate_dropout=params["relu_dropout"]) - encoder_layer = models.TransformerEncoder(**encdec_kwargs) - decoder_layer = models.TransformerDecoder(**encdec_kwargs) - - model_kwargs = dict( - vocab_size=params["vocab_size"], - embedding_width=params["hidden_size"], - dropout_rate=params["layer_postprocess_dropout"], - padded_decode=params["padded_decode"], - decode_max_length=params["decode_max_length"], - dtype=params["dtype"], - extra_decode_length=params["extra_decode_length"], - beam_size=params["beam_size"], - alpha=params["alpha"], - encoder_layer=encoder_layer, - 
decoder_layer=decoder_layer, - name="transformer_v2") - - if is_train: - inputs = tf.keras.layers.Input((None,), dtype="int64", name="inputs") - targets = tf.keras.layers.Input((None,), dtype="int64", name="targets") - internal_model = models.Seq2SeqTransformer(**model_kwargs) - logits = internal_model( - dict(inputs=inputs, targets=targets), training=is_train) - vocab_size = params["vocab_size"] - label_smoothing = params["label_smoothing"] - if params["enable_metrics_in_training"]: - logits = metrics.MetricLayer(vocab_size)([logits, targets]) - logits = tf.keras.layers.Lambda( - lambda x: x, name="logits", dtype=tf.float32)( - logits) - model = tf.keras.Model([inputs, targets], logits) - loss = metrics.transformer_loss(logits, targets, label_smoothing, - vocab_size) - model.add_loss(loss) - return model - - batch_size = params["decode_batch_size"] if params["padded_decode"] else None - inputs = tf.keras.layers.Input((None,), - batch_size=batch_size, - dtype="int64", - name="inputs") - internal_model = models.Seq2SeqTransformer(**model_kwargs) - ret = internal_model(dict(inputs=inputs), training=is_train) - outputs, scores = ret["outputs"], ret["scores"] - return tf.keras.Model(inputs, [outputs, scores]) - - -class TransformerForwardTest(tf.test.TestCase): - - def setUp(self): - super(TransformerForwardTest, self).setUp() - self.params = params = model_params.TINY_PARAMS - params["batch_size"] = params["default_batch_size"] = 16 - params["hidden_size"] = 12 - params["num_hidden_layers"] = 3 - params["filter_size"] = 14 - params["num_heads"] = 2 - params["vocab_size"] = 41 - params["extra_decode_length"] = 0 - params["beam_size"] = 3 - params["dtype"] = tf.float32 - params["layer_postprocess_dropout"] = 0.0 - params["attention_dropout"] = 0.0 - params["relu_dropout"] = 0.0 - - def test_forward_pass_train(self): - # Set input_len different from target_len - inputs = np.asarray([[5, 2, 1], [7, 5, 0], [1, 4, 0], [7, 5, 11]]) - targets = np.asarray([[4, 3, 4, 0], [13, 
19, 17, 8], [20, 14, 1, 2], - [5, 7, 3, 0]]) - - # src_model is the original model before refactored. - src_model = transformer.create_model(self.params, True) - src_num_weights = _count_params(src_model) - src_weights = src_model.get_weights() - src_model_output = src_model([inputs, targets], training=True) - - # dest_model is the refactored model. - dest_model = _create_model(self.params, True) - dest_num_weights = _count_params(dest_model) - self.assertEqual(src_num_weights, dest_num_weights) - dest_model.set_weights(src_weights) - dest_model_output = dest_model([inputs, targets], training=True) - self.assertAllEqual(src_model_output, dest_model_output) - - def test_forward_pass_not_train(self): - inputs = np.asarray([[5, 2, 1], [7, 5, 0], [1, 4, 0], [7, 5, 11]]) - - # src_model is the original model before refactored. - src_model = transformer.create_model(self.params, False) - src_num_weights = _count_params(src_model) - src_weights = src_model.get_weights() - src_model_output = src_model([inputs], training=False) - - # dest_model is the refactored model. - dest_model = _create_model(self.params, False) - dest_num_weights = _count_params(dest_model) - self.assertEqual(src_num_weights, dest_num_weights) - dest_model.set_weights(src_weights) - dest_model_output = dest_model([inputs], training=False) - self.assertAllEqual(src_model_output[0], dest_model_output[0]) - self.assertAllEqual(src_model_output[1], dest_model_output[1]) - - -if __name__ == "__main__": - tf.test.main() diff --git a/official/nlp/transformer/transformer_layers_test.py b/official/nlp/transformer/transformer_layers_test.py deleted file mode 100644 index 83e76890548e2c4d40345e1b802e22a7fd645b2d..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/transformer_layers_test.py +++ /dev/null @@ -1,125 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Tests for layers in Transformer.""" - -import tensorflow as tf - -from official.nlp.transformer import attention_layer -from official.nlp.transformer import embedding_layer -from official.nlp.transformer import ffn_layer -from official.nlp.transformer import metrics - - -class TransformerLayersTest(tf.test.TestCase): - - def test_attention_layer(self): - hidden_size = 64 - num_heads = 4 - dropout = 0.5 - dim_per_head = hidden_size // num_heads - layer = attention_layer.SelfAttention(hidden_size, num_heads, dropout) - self.assertDictEqual( - layer.get_config(), { - "hidden_size": hidden_size, - "num_heads": num_heads, - "attention_dropout": dropout, - }) - length = 2 - x = tf.ones([1, length, hidden_size]) - bias = tf.ones([1]) - cache = { - "k": tf.zeros([1, 0, num_heads, dim_per_head]), - "v": tf.zeros([1, 0, num_heads, dim_per_head]), - } - y = layer(x, bias, training=True, cache=cache) - self.assertEqual(y.shape, ( - 1, - length, - 64, - )) - self.assertEqual(cache["k"].shape, ( - 1, - length, - num_heads, - dim_per_head, - )) - self.assertEqual(cache["v"].shape, ( - 1, - length, - num_heads, - dim_per_head, - )) - - def test_embedding_shared_weights(self): - vocab_size = 50 - hidden_size = 64 - length = 2 - layer = embedding_layer.EmbeddingSharedWeights(vocab_size, hidden_size) - self.assertDictEqual(layer.get_config(), { - "vocab_size": 50, - "hidden_size": 64, - }) - - idx = tf.ones([1, length], dtype="int32") 
- y = layer(idx) - self.assertEqual(y.shape, ( - 1, - length, - hidden_size, - )) - x = tf.ones([1, length, hidden_size]) - output = layer(x, "linear") - self.assertEqual(output.shape, ( - 1, - length, - vocab_size, - )) - - def test_feed_forward_network(self): - hidden_size = 64 - filter_size = 32 - relu_dropout = 0.5 - layer = ffn_layer.FeedForwardNetwork(hidden_size, filter_size, relu_dropout) - self.assertDictEqual( - layer.get_config(), { - "hidden_size": hidden_size, - "filter_size": filter_size, - "relu_dropout": relu_dropout, - }) - length = 2 - x = tf.ones([1, length, hidden_size]) - y = layer(x, training=True) - self.assertEqual(y.shape, ( - 1, - length, - hidden_size, - )) - - def test_metric_layer(self): - vocab_size = 50 - logits = tf.keras.layers.Input((None, vocab_size), - dtype="float32", - name="logits") - targets = tf.keras.layers.Input((None,), dtype="int64", name="targets") - output_logits = metrics.MetricLayer(vocab_size)([logits, targets]) - self.assertEqual(output_logits.shape.as_list(), [ - None, - None, - vocab_size, - ]) - - -if __name__ == "__main__": - tf.test.main() diff --git a/official/nlp/transformer/transformer_main.py b/official/nlp/transformer/transformer_main.py deleted file mode 100644 index 015c4d7dda1a7153af8ac0f14cdf38a984304e9b..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/transformer_main.py +++ /dev/null @@ -1,482 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Train and evaluate the Transformer model. - -See README for description of setting the training schedule and evaluating the -BLEU score. -""" - -import os -import tempfile - -# Import libraries -from absl import app -from absl import flags -from absl import logging -import tensorflow as tf -from official.common import distribute_utils -from official.modeling import performance -from official.nlp.transformer import compute_bleu -from official.nlp.transformer import data_pipeline -from official.nlp.transformer import metrics -from official.nlp.transformer import misc -from official.nlp.transformer import optimizer -from official.nlp.transformer import transformer -from official.nlp.transformer import translate -from official.nlp.transformer.utils import tokenizer -from official.utils.flags import core as flags_core -from official.utils.misc import keras_utils -# pylint:disable=logging-format-interpolation - -INF = int(1e9) -BLEU_DIR = "bleu" -_SINGLE_SAMPLE = 1 - - -def translate_and_compute_bleu(model, - params, - subtokenizer, - bleu_source, - bleu_ref, - distribution_strategy=None): - """Translate file and report the cased and uncased bleu scores. - - Args: - model: A Keras model, used to generate the translations. - params: A dictionary, containing the translation related parameters. - subtokenizer: A subtokenizer object, used for encoding and decoding source - and translated lines. - bleu_source: A file containing source sentences for translation. - bleu_ref: A file containing the reference for the translated sentences. - distribution_strategy: A platform distribution strategy, used for TPU based - translation. - - Returns: - uncased_score: A float, the case insensitive BLEU score. - cased_score: A float, the case sensitive BLEU score. - """ - # Create temporary file to store translation. 
- tmp = tempfile.NamedTemporaryFile(delete=False) - tmp_filename = tmp.name - - translate.translate_file( - model, - params, - subtokenizer, - bleu_source, - output_file=tmp_filename, - print_all_translations=False, - distribution_strategy=distribution_strategy) - - # Compute uncased and cased bleu scores. - uncased_score = compute_bleu.bleu_wrapper(bleu_ref, tmp_filename, False) - cased_score = compute_bleu.bleu_wrapper(bleu_ref, tmp_filename, True) - os.remove(tmp_filename) - return uncased_score, cased_score - - -def evaluate_and_log_bleu(model, - params, - bleu_source, - bleu_ref, - vocab_file, - distribution_strategy=None): - """Calculate and record the BLEU score. - - Args: - model: A Keras model, used to generate the translations. - params: A dictionary, containing the translation related parameters. - bleu_source: A file containing source sentences for translation. - bleu_ref: A file containing the reference for the translated sentences. - vocab_file: A file containing the vocabulary for translation. - distribution_strategy: A platform distribution strategy, used for TPU based - translation. - - Returns: - uncased_score: A float, the case insensitive BLEU score. - cased_score: A float, the case sensitive BLEU score. - """ - subtokenizer = tokenizer.Subtokenizer(vocab_file) - - uncased_score, cased_score = translate_and_compute_bleu( - model, params, subtokenizer, bleu_source, bleu_ref, distribution_strategy) - - logging.info("Bleu score (uncased): %s", uncased_score) - logging.info("Bleu score (cased): %s", cased_score) - return uncased_score, cased_score - - -class TransformerTask(object): - """Main entry of Transformer model.""" - - def __init__(self, flags_obj): - """Init function of TransformerMain. - - Args: - flags_obj: Object containing parsed flag values, i.e., FLAGS. - - Raises: - ValueError: if not using static batch for input data on TPU. 
- """ - self.flags_obj = flags_obj - self.predict_model = None - - # Add flag-defined parameters to params object - num_gpus = flags_core.get_num_gpus(flags_obj) - self.params = params = misc.get_model_params(flags_obj.param_set, num_gpus) - - params["num_gpus"] = num_gpus - params["use_ctl"] = flags_obj.use_ctl - params["data_dir"] = flags_obj.data_dir - params["model_dir"] = flags_obj.model_dir - params["static_batch"] = flags_obj.static_batch - params["max_length"] = flags_obj.max_length - params["decode_batch_size"] = flags_obj.decode_batch_size - params["decode_max_length"] = flags_obj.decode_max_length - params["padded_decode"] = flags_obj.padded_decode - params["max_io_parallelism"] = ( - flags_obj.num_parallel_calls or tf.data.experimental.AUTOTUNE) - - params["use_synthetic_data"] = flags_obj.use_synthetic_data - params["batch_size"] = flags_obj.batch_size or params["default_batch_size"] - params["repeat_dataset"] = None - params["dtype"] = flags_core.get_tf_dtype(flags_obj) - params["enable_tensorboard"] = flags_obj.enable_tensorboard - params["enable_metrics_in_training"] = flags_obj.enable_metrics_in_training - params["steps_between_evals"] = flags_obj.steps_between_evals - params["enable_checkpointing"] = flags_obj.enable_checkpointing - params["save_weights_only"] = flags_obj.save_weights_only - - self.distribution_strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=flags_obj.distribution_strategy, - num_gpus=num_gpus, - all_reduce_alg=flags_obj.all_reduce_alg, - num_packs=flags_obj.num_packs, - tpu_address=flags_obj.tpu or "") - if self.use_tpu: - params["num_replicas"] = self.distribution_strategy.num_replicas_in_sync - else: - logging.info("Running transformer with num_gpus = %d", num_gpus) - - if self.distribution_strategy: - logging.info("For training, using distribution strategy: %s", - self.distribution_strategy) - else: - logging.info("Not using any distribution strategy.") - - 
performance.set_mixed_precision_policy(params["dtype"]) - - @property - def use_tpu(self): - if self.distribution_strategy: - return isinstance(self.distribution_strategy, tf.distribute.TPUStrategy) - return False - - def train(self): - """Trains the model.""" - params = self.params - flags_obj = self.flags_obj - # Sets config options. - keras_utils.set_session_config(enable_xla=flags_obj.enable_xla) - - _ensure_dir(flags_obj.model_dir) - with distribute_utils.get_strategy_scope(self.distribution_strategy): - model = transformer.create_model(params, is_train=True) - opt = self._create_optimizer() - - current_step = 0 - checkpoint = tf.train.Checkpoint(model=model, optimizer=opt) - latest_checkpoint = tf.train.latest_checkpoint(flags_obj.model_dir) - if latest_checkpoint: - checkpoint.restore(latest_checkpoint) - logging.info("Loaded checkpoint %s", latest_checkpoint) - current_step = opt.iterations.numpy() - - if params["use_ctl"]: - train_loss_metric = tf.keras.metrics.Mean( - "training_loss", dtype=tf.float32) - if params["enable_tensorboard"]: - summary_writer = tf.summary.create_file_writer( - os.path.join(flags_obj.model_dir, "summary")) - else: - summary_writer = tf.summary.create_noop_writer() - train_metrics = [train_loss_metric] - if params["enable_metrics_in_training"]: - train_metrics = train_metrics + model.metrics - else: - model.compile(opt) - - model.summary() - - if self.use_tpu: - # Different from experimental_distribute_dataset, - # distribute_datasets_from_function requires - # per-replica/local batch size. 
- params["batch_size"] /= self.distribution_strategy.num_replicas_in_sync - train_ds = ( - self.distribution_strategy.distribute_datasets_from_function( - lambda ctx: data_pipeline.train_input_fn(params, ctx))) - else: - train_ds = data_pipeline.train_input_fn(params) - map_data_fn = data_pipeline.map_data_for_transformer_fn - train_ds = train_ds.map( - map_data_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE) - if params["use_ctl"]: - train_ds_iterator = iter(train_ds) - - callbacks = self._create_callbacks(flags_obj.model_dir, params) - - # Only TimeHistory callback is supported for CTL - if params["use_ctl"]: - callbacks = [cb for cb in callbacks - if isinstance(cb, keras_utils.TimeHistory)] - - @tf.function - def train_steps(iterator, steps): - """Training steps function for TPU runs. - - Args: - iterator: The input iterator of the training dataset. - steps: An integer, the number of training steps. - - Returns: - A float, the loss value. - """ - - def _step_fn(inputs): - """Per-replica step function.""" - inputs, targets = inputs - with tf.GradientTape() as tape: - logits = model([inputs, targets], training=True) - loss = metrics.transformer_loss(logits, targets, - params["label_smoothing"], - params["vocab_size"]) - # Scales the loss, which results in using the average loss across all - # of the replicas for backprop. - scaled_loss = loss / self.distribution_strategy.num_replicas_in_sync - - # De-dupes variables due to keras tracking issues. - tvars = list({id(v): v for v in model.trainable_variables}.values()) - grads = tape.gradient(scaled_loss, tvars) - opt.apply_gradients(zip(grads, tvars)) - # For reporting, the metric takes the mean of losses. 
- train_loss_metric.update_state(loss) - - for _ in tf.range(steps): - train_loss_metric.reset_states() - self.distribution_strategy.run( - _step_fn, args=(next(iterator),)) - - cased_score, uncased_score = None, None - cased_score_history, uncased_score_history = [], [] - while current_step < flags_obj.train_steps: - remaining_steps = flags_obj.train_steps - current_step - train_steps_per_eval = ( - remaining_steps if remaining_steps < flags_obj.steps_between_evals - else flags_obj.steps_between_evals) - current_iteration = current_step // flags_obj.steps_between_evals - - logging.info( - "Start train iteration at global step:{}".format(current_step)) - history = None - if params["use_ctl"]: - if not self.use_tpu: - raise NotImplementedError( - "Custom training loop on GPUs is not implemented.") - - # Runs training steps. - with summary_writer.as_default(): - for cb in callbacks: - cb.on_epoch_begin(current_iteration) - cb.on_batch_begin(0) - - train_steps( - train_ds_iterator, - tf.convert_to_tensor(train_steps_per_eval, dtype=tf.int32)) - current_step += train_steps_per_eval - train_loss = train_loss_metric.result().numpy().astype(float) - logging.info("Train Step: %d/%d / loss = %s", current_step, - flags_obj.train_steps, train_loss) - - for cb in callbacks: - cb.on_batch_end(train_steps_per_eval - 1) - cb.on_epoch_end(current_iteration) - - if params["enable_tensorboard"]: - for metric_obj in train_metrics: - tf.summary.scalar(metric_obj.name, metric_obj.result(), - current_step) - summary_writer.flush() - - for cb in callbacks: - cb.on_train_end() - - if flags_obj.enable_checkpointing: - # avoid check-pointing when running for benchmarking. 
- checkpoint_name = checkpoint.save( - os.path.join(flags_obj.model_dir, - "ctl_step_{}.ckpt".format(current_step))) - logging.info("Saved checkpoint to %s", checkpoint_name) - else: - if self.use_tpu: - raise NotImplementedError( - "Keras model.fit on TPUs is not implemented.") - history = model.fit( - train_ds, - initial_epoch=current_iteration, - epochs=current_iteration + 1, - steps_per_epoch=train_steps_per_eval, - callbacks=callbacks, - # If TimeHistory is enabled, progress bar would be messy. Increase - # the verbose level to get rid of it. - verbose=(2 if flags_obj.enable_time_history else 1)) - current_step += train_steps_per_eval - logging.info("Train history: {}".format(history.history)) - - logging.info("End train iteration at global step:{}".format(current_step)) - - if (flags_obj.bleu_source and flags_obj.bleu_ref): - uncased_score, cased_score = self.eval() - cased_score_history.append([current_iteration + 1, cased_score]) - uncased_score_history.append([current_iteration + 1, uncased_score]) - - stats = ({ - "loss": train_loss - } if history is None else {}) - misc.update_stats(history, stats, callbacks) - if uncased_score and cased_score: - stats["bleu_uncased"] = uncased_score - stats["bleu_cased"] = cased_score - stats["bleu_uncased_history"] = uncased_score_history - stats["bleu_cased_history"] = cased_score_history - return stats - - def eval(self): - """Evaluates the model.""" - distribution_strategy = self.distribution_strategy if self.use_tpu else None - - # We only want to create the model under DS scope for TPU case. - # When 'distribution_strategy' is None, a no-op DummyContextManager will - # be used. 
- with distribute_utils.get_strategy_scope(distribution_strategy): - if not self.predict_model: - self.predict_model = transformer.create_model(self.params, False) - self._load_weights_if_possible( - self.predict_model, - tf.train.latest_checkpoint(self.flags_obj.model_dir)) - self.predict_model.summary() - return evaluate_and_log_bleu( - self.predict_model, self.params, self.flags_obj.bleu_source, - self.flags_obj.bleu_ref, self.flags_obj.vocab_file, - distribution_strategy) - - def predict(self): - """Predicts result from the model.""" - params = self.params - flags_obj = self.flags_obj - - with tf.name_scope("model"): - model = transformer.create_model(params, is_train=False) - self._load_weights_if_possible( - model, tf.train.latest_checkpoint(self.flags_obj.model_dir)) - model.summary() - subtokenizer = tokenizer.Subtokenizer(flags_obj.vocab_file) - - ds = data_pipeline.eval_input_fn(params) - ds = ds.map(lambda x, y: x).take(_SINGLE_SAMPLE) - ret = model.predict(ds) - val_outputs, _ = ret - length = len(val_outputs) - for i in range(length): - translate.translate_from_input(val_outputs[i], subtokenizer) - - def _create_callbacks(self, cur_log_dir, params): - """Creates a list of callbacks.""" - callbacks = misc.get_callbacks() - if params["enable_checkpointing"]: - ckpt_full_path = os.path.join(cur_log_dir, "cp-{epoch:04d}.ckpt") - callbacks.append( - tf.keras.callbacks.ModelCheckpoint( - ckpt_full_path, save_weights_only=params["save_weights_only"])) - return callbacks - - def _load_weights_if_possible(self, model, init_weight_path=None): - """Loads model weights when it is provided.""" - if init_weight_path: - logging.info("Load weights: {}".format(init_weight_path)) - if self.use_tpu: - checkpoint = tf.train.Checkpoint( - model=model, optimizer=self._create_optimizer()) - checkpoint.restore(init_weight_path) - else: - model.load_weights(init_weight_path) - else: - logging.info("Weights not loaded from path:{}".format(init_weight_path)) - - def 
_create_optimizer(self): - """Creates optimizer.""" - params = self.params - lr_schedule = optimizer.LearningRateSchedule( - params["learning_rate"], params["hidden_size"], - params["learning_rate_warmup_steps"]) - opt = tf.keras.optimizers.Adam( - lr_schedule, - params["optimizer_adam_beta1"], - params["optimizer_adam_beta2"], - epsilon=params["optimizer_adam_epsilon"]) - - opt = performance.configure_optimizer( - opt, - use_float16=params["dtype"] == tf.float16, - loss_scale=flags_core.get_loss_scale( - self.flags_obj, default_for_fp16="dynamic")) - - return opt - - -def _ensure_dir(log_dir): - """Makes log dir if not existed.""" - if not tf.io.gfile.exists(log_dir): - tf.io.gfile.makedirs(log_dir) - - -def main(_): - flags_obj = flags.FLAGS - if flags_obj.enable_mlir_bridge: - tf.config.experimental.enable_mlir_bridge() - task = TransformerTask(flags_obj) - - # Execute flag override logic for better model performance - if flags_obj.tf_gpu_thread_mode: - keras_utils.set_gpu_thread_mode_and_count( - per_gpu_thread_count=flags_obj.per_gpu_thread_count, - gpu_thread_mode=flags_obj.tf_gpu_thread_mode, - num_gpus=flags_obj.num_gpus, - datasets_num_private_threads=flags_obj.datasets_num_private_threads) - - if flags_obj.mode == "train": - task.train() - elif flags_obj.mode == "predict": - task.predict() - elif flags_obj.mode == "eval": - task.eval() - else: - raise ValueError("Invalid mode {}".format(flags_obj.mode)) - - -if __name__ == "__main__": - logging.set_verbosity(logging.INFO) - misc.define_transformer_flags() - app.run(main) diff --git a/official/nlp/transformer/transformer_main_test.py b/official/nlp/transformer/transformer_main_test.py deleted file mode 100644 index 79f4e17dc64f5c7d05331116d32c0be4d0f99dc0..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/transformer_main_test.py +++ /dev/null @@ -1,193 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Test Transformer model.""" - -import os -import re -import sys -import unittest - -from absl import flags -from absl.testing import flagsaver -import tensorflow as tf -from tensorflow.python.eager import context # pylint: disable=ungrouped-imports -from official.nlp.transformer import misc -from official.nlp.transformer import transformer_main - -FLAGS = flags.FLAGS -FIXED_TIMESTAMP = 'my_time_stamp' -WEIGHT_PATTERN = re.compile(r'weights-epoch-.+\.hdf5') - - -def _generate_file(filepath, lines): - with open(filepath, 'w') as f: - for l in lines: - f.write('{}\n'.format(l)) - - -class TransformerTaskTest(tf.test.TestCase): - local_flags = None - - def setUp(self): # pylint: disable=g-missing-super-call - temp_dir = self.get_temp_dir() - if TransformerTaskTest.local_flags is None: - misc.define_transformer_flags() - # Loads flags, array cannot be blank. 
- flags.FLAGS(['foo']) - TransformerTaskTest.local_flags = flagsaver.save_flag_values() - else: - flagsaver.restore_flag_values(TransformerTaskTest.local_flags) - FLAGS.model_dir = os.path.join(temp_dir, FIXED_TIMESTAMP) - FLAGS.param_set = 'tiny' - FLAGS.use_synthetic_data = True - FLAGS.steps_between_evals = 1 - FLAGS.train_steps = 1 - FLAGS.validation_steps = 1 - FLAGS.batch_size = 4 - FLAGS.max_length = 1 - FLAGS.num_gpus = 1 - FLAGS.distribution_strategy = 'off' - FLAGS.dtype = 'fp32' - self.model_dir = FLAGS.model_dir - self.temp_dir = temp_dir - self.vocab_file = os.path.join(temp_dir, 'vocab') - self.vocab_size = misc.get_model_params(FLAGS.param_set, 0)['vocab_size'] - self.bleu_source = os.path.join(temp_dir, 'bleu_source') - self.bleu_ref = os.path.join(temp_dir, 'bleu_ref') - self.orig_policy = ( - tf.compat.v2.keras.mixed_precision.global_policy()) - - def tearDown(self): # pylint: disable=g-missing-super-call - tf.compat.v2.keras.mixed_precision.set_global_policy(self.orig_policy) - - def _assert_exists(self, filepath): - self.assertTrue(os.path.exists(filepath)) - - def test_train_no_dist_strat(self): - if context.num_gpus() >= 2: - self.skipTest('No need to test 2+ GPUs without a distribution strategy.') - t = transformer_main.TransformerTask(FLAGS) - t.train() - - def test_train_save_full_model(self): - if context.num_gpus() >= 2: - self.skipTest('No need to test 2+ GPUs without a distribution strategy.') - FLAGS.save_weights_only = False - t = transformer_main.TransformerTask(FLAGS) - t.train() - - def test_train_static_batch(self): - if context.num_gpus() >= 2: - self.skipTest('No need to test 2+ GPUs without a distribution strategy.') - FLAGS.distribution_strategy = 'one_device' - if tf.test.is_built_with_cuda(): - FLAGS.num_gpus = 1 - else: - FLAGS.num_gpus = 0 - FLAGS.static_batch = True - t = transformer_main.TransformerTask(FLAGS) - t.train() - - @unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU') - def 
test_train_1_gpu_with_dist_strat(self): - FLAGS.distribution_strategy = 'one_device' - t = transformer_main.TransformerTask(FLAGS) - t.train() - - @unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU') - def test_train_fp16(self): - FLAGS.distribution_strategy = 'one_device' - FLAGS.dtype = 'fp16' - t = transformer_main.TransformerTask(FLAGS) - t.train() - - @unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU') - def test_train_2_gpu(self): - if context.num_gpus() < 2: - self.skipTest( - '{} GPUs are not available for this test. {} GPUs are available' - .format(2, context.num_gpus())) - FLAGS.distribution_strategy = 'mirrored' - FLAGS.num_gpus = 2 - FLAGS.param_set = 'base' - t = transformer_main.TransformerTask(FLAGS) - t.train() - - @unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU') - def test_train_2_gpu_fp16(self): - if context.num_gpus() < 2: - self.skipTest( - '{} GPUs are not available for this test. {} GPUs are available' - .format(2, context.num_gpus())) - FLAGS.distribution_strategy = 'mirrored' - FLAGS.num_gpus = 2 - FLAGS.param_set = 'base' - FLAGS.dtype = 'fp16' - t = transformer_main.TransformerTask(FLAGS) - t.train() - - def _prepare_files_and_flags(self, *extra_flags): - # Make log dir. - if not os.path.exists(self.temp_dir): - os.makedirs(self.temp_dir) - - # Fake vocab, bleu_source and bleu_ref. - tokens = [ - "''", "''", "'_'", "'a'", "'b'", "'c'", "'d'", "'a_'", "'b_'", - "'c_'", "'d_'" - ] - tokens += ["'{}'".format(i) for i in range(self.vocab_size - len(tokens))] - _generate_file(self.vocab_file, tokens) - _generate_file(self.bleu_source, ['a b', 'c d']) - _generate_file(self.bleu_ref, ['a b', 'd c']) - - # Update flags. 
- update_flags = [ - 'ignored_program_name', - '--vocab_file={}'.format(self.vocab_file), - '--bleu_source={}'.format(self.bleu_source), - '--bleu_ref={}'.format(self.bleu_ref), - ] - if extra_flags: - update_flags.extend(extra_flags) - FLAGS(update_flags) - - def test_predict(self): - if context.num_gpus() >= 2: - self.skipTest('No need to test 2+ GPUs without a distribution strategy.') - self._prepare_files_and_flags() - t = transformer_main.TransformerTask(FLAGS) - t.predict() - - @unittest.skipUnless(tf.test.is_built_with_cuda(), 'requires GPU') - def test_predict_fp16(self): - if context.num_gpus() >= 2: - self.skipTest('No need to test 2+ GPUs without a distribution strategy.') - self._prepare_files_and_flags('--dtype=fp16') - t = transformer_main.TransformerTask(FLAGS) - t.predict() - - def test_eval(self): - if context.num_gpus() >= 2: - self.skipTest('No need to test 2+ GPUs without a distribution strategy.') - if 'test_xla' in sys.argv[0]: - self.skipTest('TODO(xla): Make this test faster under XLA.') - self._prepare_files_and_flags() - t = transformer_main.TransformerTask(FLAGS) - t.eval() - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/nlp/transformer/transformer_test.py b/official/nlp/transformer/transformer_test.py deleted file mode 100644 index c64686dac034c5d0e1d4f29bf4b378f2b64ef130..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/transformer_test.py +++ /dev/null @@ -1,98 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Test Transformer model.""" - -import tensorflow as tf - -from official.nlp.transformer import model_params -from official.nlp.transformer import transformer - - -class TransformerV2Test(tf.test.TestCase): - - def setUp(self): - super().setUp() - self.params = params = model_params.TINY_PARAMS - params["batch_size"] = params["default_batch_size"] = 16 - params["use_synthetic_data"] = True - params["hidden_size"] = 12 - params["num_hidden_layers"] = 2 - params["filter_size"] = 14 - params["num_heads"] = 2 - params["vocab_size"] = 41 - params["extra_decode_length"] = 2 - params["beam_size"] = 3 - params["dtype"] = tf.float32 - - def test_create_model_train(self): - model = transformer.create_model(self.params, True) - inputs, outputs = model.inputs, model.outputs - self.assertEqual(len(inputs), 2) - self.assertEqual(len(outputs), 1) - self.assertEqual(inputs[0].shape.as_list(), [None, None]) - self.assertEqual(inputs[0].dtype, tf.int64) - self.assertEqual(inputs[1].shape.as_list(), [None, None]) - self.assertEqual(inputs[1].dtype, tf.int64) - self.assertEqual(outputs[0].shape.as_list(), [None, None, 41]) - self.assertEqual(outputs[0].dtype, tf.float32) - - def test_create_model_not_train(self): - model = transformer.create_model(self.params, False) - inputs, outputs = model.inputs, model.outputs - self.assertEqual(len(inputs), 1) - self.assertEqual(len(outputs), 2) - self.assertEqual(inputs[0].shape.as_list(), [None, None]) - self.assertEqual(inputs[0].dtype, tf.int64) - self.assertEqual(outputs[0].shape.as_list(), [None, None]) - self.assertEqual(outputs[0].dtype, tf.int32) - self.assertEqual(outputs[1].shape.as_list(), [None]) - self.assertEqual(outputs[1].dtype, tf.float32) - - def test_export(self): - model = transformer.Transformer(self.params, name="transformer_v2") - export_dir = self.get_temp_dir() - batch_size = 5 - max_length = 6 - - class 
SaveModule(tf.Module): - - def __init__(self, model): - super(SaveModule, self).__init__() - self.model = model - - @tf.function - def serve(self, x): - return self.model.call([x], training=False) - - save_module = SaveModule(model) - tensor_shape = (None, None) - sample_input = tf.zeros((batch_size, max_length), dtype=tf.int64) - _ = save_module.serve(sample_input) - signatures = dict( - serving_default=save_module.serve.get_concrete_function( - tf.TensorSpec(shape=tensor_shape, dtype=tf.int64, name="x"))) - tf.saved_model.save(save_module, export_dir, signatures=signatures) - imported = tf.saved_model.load(export_dir) - serving_fn = imported.signatures["serving_default"] - all_outputs = serving_fn(sample_input) - output = all_outputs["outputs"] - output_shapes = output.shape.as_list() - self.assertEqual(output_shapes[0], batch_size) - self.assertEqual(output_shapes[1], - max_length + model.params["extra_decode_length"]) - - -if __name__ == "__main__": - tf.test.main() diff --git a/official/nlp/transformer/translate.py b/official/nlp/transformer/translate.py deleted file mode 100644 index 0c15096aed6b33fea0beb2f0ff76daf1737e09bb..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/translate.py +++ /dev/null @@ -1,190 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Translate text or files using trained transformer model.""" - -# Import libraries -from absl import logging -import numpy as np -import tensorflow as tf - -from official.nlp.transformer.utils import tokenizer - -_EXTRA_DECODE_LENGTH = 100 -_BEAM_SIZE = 4 -_ALPHA = 0.6 - - -def _get_sorted_inputs(filename): - """Read and sort lines from the file sorted by decreasing length. - - Args: - filename: String name of file to read inputs from. - Returns: - Sorted list of inputs, and dictionary mapping original index->sorted index - of each element. - """ - with tf.io.gfile.GFile(filename) as f: - records = f.read().split("\n") - inputs = [record.strip() for record in records] - if not inputs[-1]: - inputs.pop() - - input_lens = [(i, len(line.split())) for i, line in enumerate(inputs)] - sorted_input_lens = sorted(input_lens, key=lambda x: x[1], reverse=True) - - sorted_inputs = [None] * len(sorted_input_lens) - sorted_keys = [0] * len(sorted_input_lens) - for i, (index, _) in enumerate(sorted_input_lens): - sorted_inputs[i] = inputs[index] - sorted_keys[index] = i - return sorted_inputs, sorted_keys - - -def _encode_and_add_eos(line, subtokenizer): - """Encode line with subtokenizer, and add EOS id to the end.""" - return subtokenizer.encode(line) + [tokenizer.EOS_ID] - - -def _trim_and_decode(ids, subtokenizer): - """Trim EOS and PAD tokens from ids, and decode to return a string.""" - try: - index = list(ids).index(tokenizer.EOS_ID) - return subtokenizer.decode(ids[:index]) - except ValueError: # No EOS found in sequence - return subtokenizer.decode(ids) - - -def translate_file(model, - params, - subtokenizer, - input_file, - output_file=None, - print_all_translations=True, - distribution_strategy=None): - """Translate lines in file, and save to output file if specified. - - Args: - model: A Keras model, used to generate the translations. - params: A dictionary, containing the translation related parameters. 
- subtokenizer: A subtokenizer object, used for encoding and decoding source - and translated lines. - input_file: A file containing lines to translate. - output_file: A file that stores the generated translations. - print_all_translations: A bool. If true, all translations are printed to - stdout. - distribution_strategy: A distribution strategy, used to perform inference - directly with tf.function instead of Keras model.predict(). - - Raises: - ValueError: if output file is invalid. - """ - batch_size = params["decode_batch_size"] - - # Read and sort inputs by length. Keep dictionary (original index-->new index - # in sorted list) to write translations in the original order. - sorted_inputs, sorted_keys = _get_sorted_inputs(input_file) - total_samples = len(sorted_inputs) - num_decode_batches = (total_samples - 1) // batch_size + 1 - - def input_generator(): - """Yield encoded strings from sorted_inputs.""" - for i in range(num_decode_batches): - lines = [ - sorted_inputs[j + i * batch_size] - for j in range(batch_size) - if j + i * batch_size < total_samples - ] - lines = [_encode_and_add_eos(l, subtokenizer) for l in lines] - if distribution_strategy: - for j in range(batch_size - len(lines)): - lines.append([tokenizer.EOS_ID]) - batch = tf.keras.preprocessing.sequence.pad_sequences( - lines, - maxlen=params["decode_max_length"], - dtype="int32", - padding="post") - logging.info("Decoding batch %d out of %d.", i, num_decode_batches) - yield batch - - @tf.function - def predict_step(inputs): - """Decoding step function for TPU runs.""" - - def _step_fn(inputs): - """Per replica step function.""" - tag = inputs[0] - val_inputs = inputs[1] - val_outputs, _ = model([val_inputs], training=False) - return tag, val_outputs - - return distribution_strategy.run(_step_fn, args=(inputs,)) - - translations = [] - if distribution_strategy: - num_replicas = distribution_strategy.num_replicas_in_sync - local_batch_size = params["decode_batch_size"] // num_replicas - for i, 
text in enumerate(input_generator()): - if distribution_strategy: - text = np.reshape(text, [num_replicas, local_batch_size, -1]) - # Add tag to the input of each replica with the reordering logic after - # outputs, to ensure the output order matches the input order. - text = tf.constant(text) - - @tf.function - def text_as_per_replica(): - replica_context = tf.distribute.get_replica_context() - replica_id = replica_context.replica_id_in_sync_group - return replica_id, text[replica_id] # pylint: disable=cell-var-from-loop - - text = distribution_strategy.run(text_as_per_replica) - outputs = distribution_strategy.experimental_local_results( - predict_step(text)) - val_outputs = [output for _, output in outputs] - - val_outputs = np.reshape(val_outputs, [params["decode_batch_size"], -1]) - else: - val_outputs, _ = model.predict(text) - - length = len(val_outputs) - for j in range(length): - if j + i * batch_size < total_samples: - translation = _trim_and_decode(val_outputs[j], subtokenizer) - translations.append(translation) - if print_all_translations: - logging.info("Translating:\n\tInput: %s\n\tOutput: %s", - sorted_inputs[j + i * batch_size], translation) - - # Write translations in the order they appeared in the original file. 
- if output_file is not None: - if tf.io.gfile.isdir(output_file): - raise ValueError("File output is a directory, will not save outputs to " - "file.") - logging.info("Writing to file %s", output_file) - with tf.io.gfile.GFile(output_file, "w") as f: - for i in sorted_keys: - f.write("%s\n" % translations[i]) - - -def translate_from_text(model, subtokenizer, txt): - encoded_txt = _encode_and_add_eos(txt, subtokenizer) - result = model.predict(encoded_txt) - outputs = result["outputs"] - logging.info("Original: \"%s\"", txt) - translate_from_input(outputs, subtokenizer) - - -def translate_from_input(outputs, subtokenizer): - translation = _trim_and_decode(outputs, subtokenizer) - logging.info("Translation: \"%s\"", translation) diff --git a/official/nlp/transformer/utils/__init__.py b/official/nlp/transformer/utils/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/utils/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/nlp/transformer/utils/metrics.py b/official/nlp/transformer/utils/metrics.py deleted file mode 100644 index ec1cad0b409cfb69535dce15fab1d531d7811391..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/utils/metrics.py +++ /dev/null @@ -1,491 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. 
All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Functions for calculating loss, accuracy, and other model metrics. - -Metrics: - - Padded loss, accuracy, and negative log perplexity. Source: - https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/metrics.py - - BLEU approximation. Source: - https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/bleu_hook.py - - ROUGE score. Source: - https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/rouge.py -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import collections -import math - -import numpy as np -import six -from six.moves import xrange # pylint: disable=redefined-builtin -import tensorflow.compat.v1 as tf - - -def _pad_tensors_to_same_length(x, y): - """Pad x and y so that the results have the same length (second dimension).""" - with tf.name_scope("pad_to_same_length"): - x_length = tf.shape(x)[1] - y_length = tf.shape(y)[1] - - max_length = tf.maximum(x_length, y_length) - - x = tf.pad(x, [[0, 0], [0, max_length - x_length], [0, 0]]) - y = tf.pad(y, [[0, 0], [0, max_length - y_length]]) - return x, y - - -def padded_cross_entropy_loss(logits, labels, smoothing, vocab_size): - """Calculate cross entropy loss while ignoring padding. 
-
-  Args:
-    logits: Tensor of size [batch_size, length_logits, vocab_size]
-    labels: Tensor of size [batch_size, length_labels]
-    smoothing: Label smoothing constant, used to determine the on and off values
-    vocab_size: int size of the vocabulary
-  Returns:
-    Returns the cross entropy loss and weight tensors: float32 tensors with
-      shape [batch_size, max(length_logits, length_labels)]
-  """
-  with tf.name_scope("loss", values=[logits, labels]):
-    logits, labels = _pad_tensors_to_same_length(logits, labels)
-
-    # Calculate smoothing cross entropy
-    with tf.name_scope("smoothing_cross_entropy", values=[logits, labels]):
-      confidence = 1.0 - smoothing
-      low_confidence = (1.0 - confidence) / tf.cast(vocab_size - 1, tf.float32)
-      soft_targets = tf.one_hot(
-          tf.cast(labels, tf.int32),
-          depth=vocab_size,
-          on_value=confidence,
-          off_value=low_confidence)
-      xentropy = tf.nn.softmax_cross_entropy_with_logits_v2(
-          logits=logits, labels=soft_targets)
-
-      # Calculate the best (lowest) possible value of cross entropy, and
-      # subtract from the cross entropy loss.
-      normalizing_constant = -(
-          confidence * tf.log(confidence) + tf.cast(vocab_size - 1, tf.float32)
-          * low_confidence * tf.log(low_confidence + 1e-20))
-      xentropy -= normalizing_constant
-
-    weights = tf.cast(tf.not_equal(labels, 0), tf.float32)
-    return xentropy * weights, weights
-
-
-def _convert_to_eval_metric(metric_fn):
-  """Wrap a metric fn that returns scores and weights as an eval metric fn.
-
-  The input metric_fn returns values for the current batch. The wrapper
-  aggregates the return values collected over all of the batches evaluated.
-
-  Args:
-    metric_fn: function that returns scores and weights for the current batch's
-      logits and predicted labels.
-
-  Returns:
-    function that aggregates the scores and weights from metric_fn.
-  """
-  def problem_metric_fn(*args):
-    """Returns an aggregation of the metric_fn's returned values."""
-    (scores, weights) = metric_fn(*args)
-
-    # The tf.metrics.mean function assures correct aggregation.
-    return tf.metrics.mean(scores, weights)
-  return problem_metric_fn
-
-
-def get_eval_metrics(logits, labels, params):
-  """Return dictionary of model evaluation metrics."""
-  metrics = {
-      "accuracy": _convert_to_eval_metric(padded_accuracy)(logits, labels),
-      "accuracy_top5": _convert_to_eval_metric(padded_accuracy_top5)(
-          logits, labels),
-      "accuracy_per_sequence": _convert_to_eval_metric(
-          padded_sequence_accuracy)(logits, labels),
-      "neg_log_perplexity": _convert_to_eval_metric(padded_neg_log_perplexity)(
-          logits, labels, params["vocab_size"]),
-  }
-
-  if not params["use_tpu"]:
-    # TPU does not support tf.py_func
-    metrics.update({
-        "approx_bleu_score": _convert_to_eval_metric(
-            bleu_score)(logits, labels),
-        "rouge_2_fscore": _convert_to_eval_metric(
-            rouge_2_fscore)(logits, labels),
-        "rouge_L_fscore": _convert_to_eval_metric(
-            rouge_l_fscore)(logits, labels),
-    })
-
-  # Prefix each of the metric names with "metrics/". This allows the metric
-  # graphs to display under the "metrics" category in TensorBoard.
-  metrics = {"metrics/%s" % k: v for k, v in six.iteritems(metrics)}
-  return metrics
-
-
-def padded_accuracy(logits, labels):
-  """Percentage of times that predictions matches labels on non-0s."""
-  with tf.variable_scope("padded_accuracy", values=[logits, labels]):
-    logits, labels = _pad_tensors_to_same_length(logits, labels)
-    weights = tf.cast(tf.not_equal(labels, 0), tf.float32)
-    outputs = tf.cast(tf.argmax(logits, axis=-1), tf.int32)
-    padded_labels = tf.cast(labels, tf.int32)
-    return tf.cast(tf.equal(outputs, padded_labels), tf.float32), weights
-
-
-def padded_accuracy_topk(logits, labels, k):
-  """Percentage of times that top-k predictions matches labels on non-0s."""
-  with tf.variable_scope("padded_accuracy_topk", values=[logits, labels]):
-    logits, labels = _pad_tensors_to_same_length(logits, labels)
-    weights = tf.cast(tf.not_equal(labels, 0), tf.float32)
-    effective_k = tf.minimum(k, tf.shape(logits)[-1])
-    _, outputs = tf.nn.top_k(logits, k=effective_k)
-    outputs = tf.cast(outputs, tf.int32)
-    padded_labels = tf.cast(labels, tf.int32)
-    padded_labels = tf.expand_dims(padded_labels, axis=-1)
-    padded_labels += tf.zeros_like(outputs)  # Pad to same shape.
-    same = tf.cast(tf.equal(outputs, padded_labels), tf.float32)
-    same_topk = tf.reduce_sum(same, axis=-1)
-    return same_topk, weights
-
-
-def padded_accuracy_top5(logits, labels):
-  return padded_accuracy_topk(logits, labels, 5)
-
-
-def padded_sequence_accuracy(logits, labels):
-  """Percentage of times that predictions matches labels everywhere (non-0)."""
-  with tf.variable_scope("padded_sequence_accuracy", values=[logits, labels]):
-    logits, labels = _pad_tensors_to_same_length(logits, labels)
-    weights = tf.cast(tf.not_equal(labels, 0), tf.float32)
-    outputs = tf.cast(tf.argmax(logits, axis=-1), tf.int32)
-    padded_labels = tf.cast(labels, tf.int32)
-    not_correct = (tf.cast(tf.not_equal(outputs, padded_labels), tf.float32) *
-                   weights)
-    axis = list(range(1, len(outputs.get_shape())))
-    correct_seq = 1.0 - tf.minimum(1.0, tf.reduce_sum(not_correct, axis=axis))
-    return correct_seq, tf.constant(1.0)
-
-
-def padded_neg_log_perplexity(logits, labels, vocab_size):
-  """Average log-perplexity excluding padding 0s. No smoothing."""
-  num, den = padded_cross_entropy_loss(logits, labels, 0, vocab_size)
-  return -num, den
-
-
-def bleu_score(logits, labels):
-  """Approximate BLEU score computation between labels and predictions.
-
-  An approximate BLEU scoring method since we do not glue word pieces or
-  decode the ids and tokenize the output. By default, we use ngram order of 4
-  and use brevity penalty. Also, this does not have beam search.
-
-  Args:
-    logits: Tensor of size [batch_size, length_logits, vocab_size]
-    labels: Tensor of size [batch-size, length_labels]
-
-  Returns:
-    bleu: int, approx bleu score
-  """
-  predictions = tf.cast(tf.argmax(logits, axis=-1), tf.int32)
-  # TODO: Look into removing use of py_func  # pylint: disable=g-bad-todo
-  bleu = tf.py_func(compute_bleu, (labels, predictions), tf.float32)
-  return bleu, tf.constant(1.0)
-
-
-def _get_ngrams_with_counter(segment, max_order):
-  """Extracts all n-grams up to a given maximum order from an input segment.
-
-  Args:
-    segment: text segment from which n-grams will be extracted.
-    max_order: maximum length in tokens of the n-grams returned by this
-        methods.
-
-  Returns:
-    The Counter containing all n-grams upto max_order in segment
-    with a count of how many times each n-gram occurred.
-  """
-  ngram_counts = collections.Counter()
-  for order in xrange(1, max_order + 1):
-    for i in xrange(0, len(segment) - order + 1):
-      ngram = tuple(segment[i:i + order])
-      ngram_counts[ngram] += 1
-  return ngram_counts
-
-
-def compute_bleu(reference_corpus, translation_corpus, max_order=4,
-                 use_bp=True):
-  """Computes BLEU score of translated segments against one or more references.
-
-  Args:
-    reference_corpus: list of references for each translation. Each
-        reference should be tokenized into a list of tokens.
-    translation_corpus: list of translations to score. Each translation
-        should be tokenized into a list of tokens.
-    max_order: Maximum n-gram order to use when computing BLEU score.
-    use_bp: boolean, whether to apply brevity penalty.
-
-  Returns:
-    BLEU score.
-  """
-  reference_length = 0
-  translation_length = 0
-  bp = 1.0
-  geo_mean = 0
-
-  matches_by_order = [0] * max_order
-  possible_matches_by_order = [0] * max_order
-  precisions = []
-
-  for (references, translations) in zip(reference_corpus, translation_corpus):
-    reference_length += len(references)
-    translation_length += len(translations)
-    ref_ngram_counts = _get_ngrams_with_counter(references, max_order)
-    translation_ngram_counts = _get_ngrams_with_counter(translations, max_order)
-
-    overlap = dict((ngram,
-                    min(count, translation_ngram_counts[ngram]))
-                   for ngram, count in ref_ngram_counts.items())
-
-    for ngram in overlap:
-      matches_by_order[len(ngram) - 1] += overlap[ngram]
-    for ngram in translation_ngram_counts:
-      possible_matches_by_order[len(ngram) - 1] += translation_ngram_counts[
-          ngram]
-
-  precisions = [0] * max_order
-  smooth = 1.0
-
-  for i in xrange(0, max_order):
-    if possible_matches_by_order[i] > 0:
-      precisions[i] = float(matches_by_order[i]) / possible_matches_by_order[i]
-      if matches_by_order[i] > 0:
-        precisions[i] = float(matches_by_order[i]) / possible_matches_by_order[
-            i]
-      else:
-        smooth *= 2
-        precisions[i] = 1.0 / (smooth * possible_matches_by_order[i])
-    else:
-      precisions[i] = 0.0
-
-  if max(precisions) > 0:
-    p_log_sum = sum(math.log(p) for p in precisions if p)
-    geo_mean = math.exp(p_log_sum / max_order)
-
-  if use_bp:
-    ratio = translation_length / reference_length
-    bp = math.exp(1 - 1. / ratio) if ratio < 1.0 else 1.0
-  bleu = geo_mean * bp
-  return np.float32(bleu)
-
-
-def rouge_2_fscore(logits, labels):
-  """ROUGE-2 F1 score computation between labels and predictions.
-
-  This is an approximate ROUGE scoring method since we do not glue word pieces
-  or decode the ids and tokenize the output.
-
-  Args:
-    logits: tensor, model predictions
-    labels: tensor, gold output.
-
-  Returns:
-    rouge2_fscore: approx rouge-2 f1 score.
-  """
-  predictions = tf.cast(tf.argmax(logits, axis=-1), tf.int32)
-  # TODO: Look into removing use of py_func  # pylint: disable=g-bad-todo
-  rouge_2_f_score = tf.py_func(rouge_n, (predictions, labels), tf.float32)
-  return rouge_2_f_score, tf.constant(1.0)
-
-
-def _get_ngrams(n, text):
-  """Calculates n-grams.
-
-  Args:
-    n: which n-grams to calculate
-    text: An array of tokens
-
-  Returns:
-    A set of n-grams
-  """
-  ngram_set = set()
-  text_length = len(text)
-  max_index_ngram_start = text_length - n
-  for i in range(max_index_ngram_start + 1):
-    ngram_set.add(tuple(text[i:i + n]))
-  return ngram_set
-
-
-def rouge_n(eval_sentences, ref_sentences, n=2):
-  """Computes ROUGE-N f1 score of two text collections of sentences.
-
-  Source: https://www.microsoft.com/en-us/research/publication/
-  rouge-a-package-for-automatic-evaluation-of-summaries/
-
-  Args:
-    eval_sentences: Predicted sentences.
-    ref_sentences: Sentences from the reference set
-    n: Size of ngram. Defaults to 2.
-
-  Returns:
-    f1 score for ROUGE-N
-  """
-  f1_scores = []
-  for eval_sentence, ref_sentence in zip(eval_sentences, ref_sentences):
-    eval_ngrams = _get_ngrams(n, eval_sentence)
-    ref_ngrams = _get_ngrams(n, ref_sentence)
-    ref_count = len(ref_ngrams)
-    eval_count = len(eval_ngrams)
-
-    # Count the overlapping ngrams between evaluated and reference
-    overlapping_ngrams = eval_ngrams.intersection(ref_ngrams)
-    overlapping_count = len(overlapping_ngrams)
-
-    # Handle edge case. This isn't mathematically correct, but it's good enough
-    if eval_count == 0:
-      precision = 0.0
-    else:
-      precision = float(overlapping_count) / eval_count
-    if ref_count == 0:
-      recall = 0.0
-    else:
-      recall = float(overlapping_count) / ref_count
-    f1_scores.append(2.0 * ((precision * recall) / (precision + recall + 1e-8)))
-
-  # return overlapping_count / reference_count
-  return np.mean(f1_scores, dtype=np.float32)
-
-
-def rouge_l_fscore(predictions, labels):
-  """ROUGE scores computation between labels and predictions.
-
-  This is an approximate ROUGE scoring method since we do not glue word pieces
-  or decode the ids and tokenize the output.
-
-  Args:
-    predictions: tensor, model predictions
-    labels: tensor, gold output.
-
-  Returns:
-    rouge_l_fscore: approx rouge-l f1 score.
-  """
-  outputs = tf.cast(tf.argmax(predictions, axis=-1), tf.int32)
-  rouge_l_f_score = tf.py_func(rouge_l_sentence_level, (outputs, labels),
-                               tf.float32)
-  return rouge_l_f_score, tf.constant(1.0)
-
-
-def rouge_l_sentence_level(eval_sentences, ref_sentences):
-  """Computes ROUGE-L (sentence level) of two collections of sentences.
-
-  Source: https://www.microsoft.com/en-us/research/publication/
-  rouge-a-package-for-automatic-evaluation-of-summaries/
-
-  Calculated according to:
-  R_lcs = LCS(X,Y)/m
-  P_lcs = LCS(X,Y)/n
-  F_lcs = ((1 + beta^2)*R_lcs*P_lcs) / (R_lcs + (beta^2) * P_lcs)
-
-  where:
-  X = reference summary
-  Y = Candidate summary
-  m = length of reference summary
-  n = length of candidate summary
-
-  Args:
-    eval_sentences: The sentences that have been picked by the summarizer
-    ref_sentences: The sentences from the reference set
-
-  Returns:
-    A float: F_lcs
-  """
-
-  f1_scores = []
-  for eval_sentence, ref_sentence in zip(eval_sentences, ref_sentences):
-    m = float(len(ref_sentence))
-    n = float(len(eval_sentence))
-    lcs = _len_lcs(eval_sentence, ref_sentence)
-    f1_scores.append(_f_lcs(lcs, m, n))
-  return np.mean(f1_scores, dtype=np.float32)
-
-
-def _len_lcs(x, y):
-  """Returns the length of the Longest Common Subsequence between two seqs.
-
-  Source: http://www.algorithmist.com/index.php/Longest_Common_Subsequence
-
-  Args:
-    x: sequence of words
-    y: sequence of words
-
-  Returns
-    integer: Length of LCS between x and y
-  """
-  table = _lcs(x, y)
-  n, m = len(x), len(y)
-  return table[n, m]
-
-
-def _lcs(x, y):
-  """Computes the length of the LCS between two seqs.
-
-  The implementation below uses a DP programming algorithm and runs
-  in O(nm) time where n = len(x) and m = len(y).
-  Source: http://www.algorithmist.com/index.php/Longest_Common_Subsequence
-
-  Args:
-    x: collection of words
-    y: collection of words
-
-  Returns:
-    Table of dictionary of coord and len lcs
-  """
-  n, m = len(x), len(y)
-  table = dict()
-  for i in range(n + 1):
-    for j in range(m + 1):
-      if i == 0 or j == 0:
-        table[i, j] = 0
-      elif x[i - 1] == y[j - 1]:
-        table[i, j] = table[i - 1, j - 1] + 1
-      else:
-        table[i, j] = max(table[i - 1, j], table[i, j - 1])
-  return table
-
-
-def _f_lcs(llcs, m, n):
-  """Computes the LCS-based F-measure score.
-
-  Source: http://research.microsoft.com/en-us/um/people/cyl/download/papers/
-  rouge-working-note-v1.3.1.pdf
-
-  Args:
-    llcs: Length of LCS
-    m: number of words in reference summary
-    n: number of words in candidate summary
-
-  Returns:
-    Float. LCS-based F-measure score
-  """
-  r_lcs = llcs / m
-  p_lcs = llcs / n
-  beta = p_lcs / (r_lcs + 1e-12)
-  num = (1 + (beta ** 2)) * r_lcs * p_lcs
-  denom = r_lcs + ((beta ** 2) * p_lcs)
-  f_lcs = num / (denom + 1e-12)
-  return f_lcs
diff --git a/official/nlp/transformer/utils/tokenizer.py b/official/nlp/transformer/utils/tokenizer.py
deleted file mode 100644
index 6a992a324f3b0c651d219f4f2cc081a274d87db4..0000000000000000000000000000000000000000
--- a/official/nlp/transformer/utils/tokenizer.py
+++ /dev/null
@@ -1,660 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Defines Subtokenizer class to encode and decode strings."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import collections
-import re
-import sys
-import unicodedata
-
-from absl import logging
-
-import numpy as np
-import six
-from six.moves import xrange  # pylint: disable=redefined-builtin
-import tensorflow as tf
-
-# pylint: disable=g-complex-comprehension
-PAD = "<pad>"
-PAD_ID = 0
-EOS = "<EOS>"
-EOS_ID = 1
-RESERVED_TOKENS = [PAD, EOS]
-
-# Set of characters that will be used in the function _escape_token() (see func
-# docstring for more details).
-# This set is added to the alphabet list to ensure that all escaped tokens can
-# be encoded.
-_ESCAPE_CHARS = set(u"\\_u;0123456789")
-# Regex for the function _unescape_token(), the inverse of _escape_token().
-# This is used to find "\u", "\\", and "\###;" substrings in the token.
-_UNESCAPE_REGEX = re.compile(r"\\u|\\\\|\\([0-9]+);")
-
-_UNDEFINED_UNICODE = u"\u3013"
-
-
-def alphanumeric_char_set():
-  return set(
-      six.unichr(i)
-      for i in xrange(sys.maxunicode)
-      if (unicodedata.category(six.unichr(i)).startswith("L") or
-          unicodedata.category(six.unichr(i)).startswith("N")))
-
-
-# Set contains all letter and number characters.
-_ALPHANUMERIC_CHAR_SET = alphanumeric_char_set()
-
-# min_count is the minimum number of times a subtoken must appear in the data
-# before before it is added to the vocabulary. The value is found using binary
-# search to obtain the target vocabulary size.
-_MIN_MIN_COUNT = 1  # min value to use when binary searching for min_count
-_MAX_MIN_COUNT = 1000  # max value to use when binary searching for min_count
-
-
-class Subtokenizer(object):
-  """Encodes and decodes strings to/from integer IDs."""
-
-  def __init__(self, vocab_file, reserved_tokens=None, master_char_set=None):
-    """Initializes class, creating a vocab file if data_files is provided."""
-    logging.info("Initializing Subtokenizer from file %s.", vocab_file)
-
-    if master_char_set is None:
-      master_char_set = _ALPHANUMERIC_CHAR_SET
-
-    if reserved_tokens is None:
-      reserved_tokens = RESERVED_TOKENS
-
-    self.subtoken_list = _load_vocab_file(vocab_file, reserved_tokens)
-    self.alphabet = _generate_alphabet_dict(self.subtoken_list)
-    self.subtoken_to_id_dict = _list_to_index_dict(self.subtoken_list)
-
-    self.max_subtoken_length = 0
-    for subtoken in self.subtoken_list:
-      self.max_subtoken_length = max(self.max_subtoken_length, len(subtoken))
-
-    # Create cache to speed up subtokenization
-    self._cache_size = 2**20
-    self._cache = [(None, None)] * self._cache_size
-    self._master_char_set = master_char_set
-
-  @staticmethod
-  def init_from_files(vocab_file,
-                      files,
-                      target_vocab_size,
-                      threshold,
-                      min_count=None,
-                      file_byte_limit=1e6,
-                      reserved_tokens=None,
-                      correct_strip=True,
-                      master_char_set=None):
-    """Create subtoken vocabulary based on files, and save vocab to file.
-
-    Args:
-      vocab_file: String name of vocab file to store subtoken vocabulary.
-      files: List of file paths that will be used to generate vocabulary.
-      target_vocab_size: target vocabulary size to generate.
-      threshold: int threshold of vocabulary size to accept.
-      min_count: int minimum count to use for generating the vocabulary. The min
-        count is the minimum number of times a subtoken should appear in the
-        files before it is added to the vocabulary. If set to none, this value
-        is found using binary search.
-      file_byte_limit: (Default 1e6) Maximum number of bytes of sample text that
-        will be drawn from the files.
-      reserved_tokens: List of string tokens that are guaranteed to be at the
-        beginning of the subtoken vocabulary list.
-      correct_strip: Whether to convert text to unicode before strip.
-      master_char_set: the char set.
-
-    Returns:
-      Subtokenizer object
-    """
-    if master_char_set is None:
-      master_char_set = _ALPHANUMERIC_CHAR_SET
-    if reserved_tokens is None:
-      reserved_tokens = RESERVED_TOKENS
-
-    if tf.io.gfile.exists(vocab_file):
-      logging.info("Vocab file already exists (%s)", vocab_file)
-    else:
-      logging.info("Begin steps to create subtoken vocabulary...")
-      token_counts = _count_tokens(files, file_byte_limit, correct_strip,
-                                   master_char_set)
-      alphabet = _generate_alphabet_dict(token_counts)
-      subtoken_list = _generate_subtokens_with_target_vocab_size(
-          token_counts, alphabet, target_vocab_size, threshold, min_count,
-          reserved_tokens)
-      logging.info("Generated vocabulary with %d subtokens.",
-                   len(subtoken_list))
-      _save_vocab_file(vocab_file, subtoken_list)
-    return Subtokenizer(vocab_file, master_char_set=master_char_set)
-
-  def encode(self, raw_string, add_eos=False):
-    """Encodes a string into a list of int subtoken ids."""
-    ret = []
-    tokens = _split_string_to_tokens(
-        native_to_unicode(raw_string), self._master_char_set)
-    for token in tokens:
-      ret.extend(self._token_to_subtoken_ids(token))
-    if add_eos:
-      assert EOS in self.subtoken_list, \
-          "Can't append 'EOS' because it is not in list of known subtokens."
-      ret.append(EOS_ID)
-    return ret
-
-  def _token_to_subtoken_ids(self, token):
-    """Encode a single token into a list of subtoken ids."""
-    cache_location = hash(token) % self._cache_size
-    cache_key, cache_value = self._cache[cache_location]
-    if cache_key == token:
-      return cache_value
-
-    ret = _split_token_to_subtokens(
-        _escape_token(token, self.alphabet), self.subtoken_to_id_dict,
-        self.max_subtoken_length)
-    ret = [self.subtoken_to_id_dict[subtoken_id] for subtoken_id in ret]
-
-    self._cache[cache_location] = (token, ret)
-    return ret
-
-  def decode(self, subtokens):
-    """Converts list of int subtokens ids into a string."""
-    if isinstance(subtokens, np.ndarray):
-      # Note that list(subtokens) converts subtokens to a python list, but the
-      # items remain as np.int32. This converts both the array and its items.
-      subtokens = subtokens.tolist()
-
-    if not subtokens:
-      return ""
-
-    assert isinstance(subtokens, list) and isinstance(subtokens[0], int), (
-        "Subtokens argument passed into decode() must be a list of integers.")
-
-    return _unicode_to_native(
-        _join_tokens_to_string(
-            self._subtoken_ids_to_tokens(subtokens), self._master_char_set))
-
-  def _subtoken_ids_to_tokens(self, subtokens):
-    """Convert list of int subtoken ids to a list of string tokens."""
-    escaped_tokens = "".join([
-        self.subtoken_list[s] for s in subtokens if s < len(self.subtoken_list)
-    ])
-    escaped_tokens = escaped_tokens.split("_")
-
-    # All tokens in the vocabulary list have been escaped (see _escape_token())
-    # so each token must be unescaped when decoding.
-    ret = []
-    for token in escaped_tokens:
-      if token:
-        ret.append(_unescape_token(token))
-    return ret
-
-
-def _save_vocab_file(vocab_file, subtoken_list):
-  """Save subtokens to file."""
-  with tf.io.gfile.GFile(vocab_file, mode="w") as f:
-    for subtoken in subtoken_list:
-      f.write("'%s'\n" % _unicode_to_native(subtoken))
-
-
-def _load_vocab_file(vocab_file, reserved_tokens=None):
-  """Load vocabulary while ensuring reserved tokens are at the top."""
-  if reserved_tokens is None:
-    reserved_tokens = RESERVED_TOKENS
-
-  subtoken_list = []
-  with tf.io.gfile.GFile(vocab_file, mode="r") as f:
-    for line in f:
-      subtoken = native_to_unicode(line.strip())
-      subtoken = subtoken[1:-1]  # Remove surrounding single-quotes
-      if subtoken in reserved_tokens:
-        continue
-      subtoken_list.append(native_to_unicode(subtoken))
-  return reserved_tokens + subtoken_list
-
-
-def native_to_unicode(s):
-  """Convert string to unicode (required in Python 2)."""
-  try:  # Python 2
-    return s if isinstance(s, unicode) else s.decode("utf-8")
-  except NameError:  # Python 3
-    return s
-
-
-def _unicode_to_native(s):
-  """Convert string from unicode to native format (required in Python 2)."""
-  try:  # Python 2
-    return s.encode("utf-8") if isinstance(s, unicode) else s
-  except NameError:  # Python 3
-    return s
-
-
-def _split_string_to_tokens(text, master_char_set):
-  """Splits text to a list of string tokens."""
-  if not text:
-    return []
-  ret = []
-  token_start = 0
-  # Classify each character in the input string
-  is_master = [c in master_char_set for c in text]
-  for pos in xrange(1, len(text)):
-    if is_master[pos] != is_master[pos - 1]:
-      token = text[token_start:pos]
-      if token != u" " or token_start == 0:
-        ret.append(token)
-      token_start = pos
-  final_token = text[token_start:]
-  ret.append(final_token)
-  return ret
-
-
-def _join_tokens_to_string(tokens, master_char_set):
-  """Join a list of string tokens into a single string."""
-  token_is_master = [t[0] in master_char_set for t in tokens]
-  ret = []
-  for i, token in enumerate(tokens):
-    if i > 0 and token_is_master[i - 1] and token_is_master[i]:
-      ret.append(u" ")
-    ret.append(token)
-  return "".join(ret)
-
-
-def _escape_token(token, alphabet):
-  r"""Replace characters that aren't in the alphabet and append "_" to token.
-
-  Apply three transformations to the token:
-    1. Replace underline character "_" with "\u", and backslash "\" with "\\".
-    2. Replace characters outside of the alphabet with "\###;", where ### is the
-       character's Unicode code point.
-    3. Appends "_" to mark the end of a token.
-
-  Args:
-    token: unicode string to be escaped
-    alphabet: list of all known characters
-
-  Returns:
-    escaped string
-  """
-  token = token.replace(u"\\", u"\\\\").replace(u"_", u"\\u")
-  ret = [c if c in alphabet and c != u"\n" else r"\%d;" % ord(c) for c in token]
-  return u"".join(ret) + "_"
-
-
-def _unescape_token(token):
-  r"""Replaces escaped characters in the token with their unescaped versions.
-
-  Applies inverse transformations as _escape_token():
-    1. Replace "\u" with "_", and "\\" with "\".
-    2. Replace "\###;" with the unicode character the ### refers to.
-
-  Args:
-    token: escaped string
-
-  Returns:
-    unescaped string
-  """
-
-  def match(m):
-    r"""Returns replacement string for matched object.
-
-    Matched objects contain one of the strings that matches the regex pattern:
-      r"\\u|\\\\|\\([0-9]+);"
-    The strings can be '\u', '\\', or '\###;' (### is any digit number).
-
-    m.group(0) refers to the entire matched string ('\u', '\\', or '\###;').
-    m.group(1) refers to the first parenthesized subgroup ('###').
-
-    m.group(0) exists for all match objects, while m.group(1) exists only for
-    the string '\###;'.
-
-    This function looks to see if m.group(1) exists. If it doesn't, then the
-    matched string must be '\u' or '\\' . In this case, the corresponding
-    replacement ('_' and '\') are returned. Note that in python, a single
-    backslash is written as '\\', and double backslash as '\\\\'.
-
-    If m.goup(1) exists, then use the integer in m.group(1) to return a
-    unicode character.
-
-    Args:
-      m: match object
-
-    Returns:
-      String to replace matched object with.
-    """
-    # Check if the matched strings are '\u' or '\\'.
-    if m.group(1) is None:
-      return u"_" if m.group(0) == u"\\u" else u"\\"
-
-    # If m.group(1) exists, try and return unicode character.
-    try:
-      return six.unichr(int(m.group(1)))
-    except (ValueError, OverflowError) as _:
-      return _UNDEFINED_UNICODE
-
-  # Use match function to replace escaped substrings in the token.
-  return _UNESCAPE_REGEX.sub(match, token)
-
-
-def _count_tokens(files,
-                  file_byte_limit=1e6,
-                  correct_strip=True,
-                  master_char_set=None):
-  """Return token counts of words in the files.
-
-  Samples file_byte_limit bytes from each file, and counts the words that appear
-  in the samples. The samples are semi-evenly distributed across the file.
-
-  Args:
-    files: List of filepaths
-    file_byte_limit: Max number of bytes that will be read from each file.
-    correct_strip: Whether to convert text to unicode before strip. This affects
-      vocabulary generation for PY2. Sets correct_strip to False in PY2 to
-      reproduce previous common public result. Sets correct_strip to True will
-      let PY2 and PY3 get a consistent vocabulary.
-    master_char_set: the char set.
-
-  Returns:
-    Dictionary mapping tokens to the number of times they appear in the sampled
-    lines from the files.
-  """
-  if master_char_set is None:
-    master_char_set = _ALPHANUMERIC_CHAR_SET
-
-  token_counts = collections.defaultdict(int)
-
-  for filepath in files:
-    with tf.io.gfile.GFile(filepath, mode="r") as reader:
-      file_byte_budget = file_byte_limit
-      counter = 0
-      lines_to_skip = int(reader.size() / (file_byte_budget * 2))
-      for line in reader:
-        if counter < lines_to_skip:
-          counter += 1
-        else:
-          if file_byte_budget < 0:
-            break
-          if correct_strip:
-            line = native_to_unicode(line)
-          line = line.strip()
-          file_byte_budget -= len(line)
-          counter = 0
-
-          # Add words to token counts
-          for token in _split_string_to_tokens(
-              native_to_unicode(line), master_char_set):
-            token_counts[token] += 1
-  return token_counts
-
-
-def _list_to_index_dict(lst):
-  """Create dictionary mapping list items to their indices in the list."""
-  return {item: n for n, item in enumerate(lst)}
-
-
-def _split_token_to_subtokens(token, subtoken_dict, max_subtoken_length):
-  """Splits a token into subtokens defined in the subtoken dict."""
-  ret = []
-  start = 0
-  token_len = len(token)
-  while start < token_len:
-    # Find the longest subtoken, so iterate backwards.
-    for end in xrange(min(token_len, start + max_subtoken_length), start, -1):
-      subtoken = token[start:end]
-      if subtoken in subtoken_dict:
-        ret.append(subtoken)
-        start = end
-        break
-    else:  # Did not break
-      # If there is no possible encoding of the escaped token then one of the
-      # characters in the token is not in the alphabet. This should be
-      # impossible and would be indicative of a bug.
-      raise ValueError("Was unable to split token \"%s\" into subtokens." %
-                       token)
-  return ret
-
-
-def _generate_subtokens_with_target_vocab_size(token_counts,
-                                               alphabet,
-                                               target_size,
-                                               threshold,
-                                               min_count=None,
-                                               reserved_tokens=None):
-  """Generate subtoken vocabulary close to the target size."""
-  if reserved_tokens is None:
-    reserved_tokens = RESERVED_TOKENS
-
-  if min_count is not None:
-    logging.info("Using min_count=%d to generate vocab with target size %d",
-                 min_count, target_size)
-    return _generate_subtokens(
-        token_counts, alphabet, min_count, reserved_tokens=reserved_tokens)
-
-  def bisect(min_val, max_val):
-    """Recursive function to binary search for subtoken vocabulary."""
-    cur_count = (min_val + max_val) // 2
-    logging.info("Binary search: trying min_count=%d (%d %d)", cur_count,
-                 min_val, max_val)
-    subtoken_list = _generate_subtokens(
-        token_counts, alphabet, cur_count, reserved_tokens=reserved_tokens)
-
-    val = len(subtoken_list)
-    logging.info("Binary search: min_count=%d resulted in %d tokens", cur_count,
-                 val)
-
-    within_threshold = abs(val - target_size) < threshold
-    if within_threshold or min_val >= max_val or cur_count < 2:
-      return subtoken_list
-    if val > target_size:
-      other_subtoken_list = bisect(cur_count + 1, max_val)
-    else:
-      other_subtoken_list = bisect(min_val, cur_count - 1)
-
-    # Return vocabulary dictionary with the closest number of tokens.
-    other_val = len(other_subtoken_list)
-    if abs(other_val - target_size) < abs(val - target_size):
-      return other_subtoken_list
-    return subtoken_list
-
-  logging.info("Finding best min_count to get target size of %d", target_size)
-  return bisect(_MIN_MIN_COUNT, _MAX_MIN_COUNT)
-
-
-def _generate_alphabet_dict(iterable, reserved_tokens=None):
-  """Create set of characters that appear in any element in the iterable."""
-  if reserved_tokens is None:
-    reserved_tokens = RESERVED_TOKENS
-  alphabet = {c for token in iterable for c in token}
-  alphabet |= {c for token in reserved_tokens for c in token}
-  alphabet |= _ESCAPE_CHARS  # Add escape characters to alphabet set.
-  return alphabet
-
-
-def _count_and_gen_subtokens(token_counts, alphabet, subtoken_dict,
-                             max_subtoken_length):
-  """Count number of times subtokens appear, and generate new subtokens.
-
-  Args:
-    token_counts: dict mapping tokens to the number of times they appear in the
-      original files.
-    alphabet: list of allowed characters. Used to escape the tokens, which
-      guarantees that all tokens can be split into subtokens.
-    subtoken_dict: dict mapping subtokens to ids.
-    max_subtoken_length: maximum length of subtoken in subtoken_dict.
-
-  Returns:
-    A defaultdict mapping subtokens to the number of times they appear in the
-    tokens. The dict may contain new subtokens.
-  """
-  subtoken_counts = collections.defaultdict(int)
-  for token, count in six.iteritems(token_counts):
-    token = _escape_token(token, alphabet)
-    subtokens = _split_token_to_subtokens(token, subtoken_dict,
-                                          max_subtoken_length)
-
-    # Generate new subtokens by taking substrings from token.
-    start = 0
-    for subtoken in subtokens:
-      for end in xrange(start + 1, len(token) + 1):
-        new_subtoken = token[start:end]
-        subtoken_counts[new_subtoken] += count
-      start += len(subtoken)
-
-  return subtoken_counts
-
-
-def _filter_and_bucket_subtokens(subtoken_counts, min_count):
-  """Return a bucketed list of subtokens that are filtered by count.
- - Args: - subtoken_counts: defaultdict mapping subtokens to their counts - min_count: int count used to filter subtokens - - Returns: - List of subtoken sets, where subtokens in set i have the same length=i. - """ - # Create list of buckets, where subtokens in bucket i have length i. - subtoken_buckets = [] - for subtoken, count in six.iteritems(subtoken_counts): - if count < min_count: # Filter out subtokens that don't appear enough - continue - while len(subtoken_buckets) <= len(subtoken): - subtoken_buckets.append(set()) - subtoken_buckets[len(subtoken)].add(subtoken) - return subtoken_buckets - - -def _gen_new_subtoken_list(subtoken_counts, - min_count, - alphabet, - reserved_tokens=None): - """Generate candidate subtokens ordered by count, and new max subtoken length. - - Add subtokens to the candiate list in order of length (longest subtokens - first). When a subtoken is added, the counts of each of its prefixes are - decreased. Prefixes that don't appear much outside the subtoken are not added - to the candidate list. - - For example: - subtoken being added to candidate list: 'translate' - subtoken_counts: {'translate':10, 't':40, 'tr':16, 'tra':12, ...} - min_count: 5 - - When 'translate' is added, subtoken_counts is updated to: - {'translate':0, 't':30, 'tr':6, 'tra': 2, ...} - - The subtoken 'tra' will not be added to the candidate list, because it appears - twice (less than min_count) outside of 'translate'. - - Args: - subtoken_counts: defaultdict mapping str subtokens to int counts - min_count: int minumum count requirement for subtokens - alphabet: set of characters. Each character is added to the subtoken list to - guarantee that all tokens can be encoded. - reserved_tokens: list of tokens that will be added to the beginning of the - returned subtoken list. 
- - Returns: - List of candidate subtokens in decreasing count order, and maximum subtoken - length - """ - if reserved_tokens is None: - reserved_tokens = RESERVED_TOKENS - - # Create a list of (count, subtoken) for each candidate subtoken. - subtoken_candidates = [] - - # Use bucketted list to iterate through subtokens in order of length. - # subtoken_buckets[i] = set(subtokens), where each subtoken has length i. - subtoken_buckets = _filter_and_bucket_subtokens(subtoken_counts, min_count) - max_subtoken_length = len(subtoken_buckets) - 1 - - # Go through the list in reverse order to consider longer subtokens first. - for subtoken_len in xrange(max_subtoken_length, 0, -1): - for subtoken in subtoken_buckets[subtoken_len]: - count = subtoken_counts[subtoken] - - # Possible if this subtoken is a prefix of another token. - if count < min_count: - continue - - # Ignore alphabet/reserved tokens, which will be added manually later. - if subtoken not in alphabet and subtoken not in reserved_tokens: - subtoken_candidates.append((count, subtoken)) - - # Decrement count of the subtoken's prefixes (if a longer subtoken is - # added, its prefixes lose priority to be added). - for end in xrange(1, subtoken_len): - subtoken_counts[subtoken[:end]] -= count - - # Add alphabet subtokens (guarantees that all strings are encodable). - subtoken_candidates.extend((subtoken_counts.get(a, 0), a) for a in alphabet) - - # Order subtoken candidates by decreasing count. - subtoken_list = [t for _, t in sorted(subtoken_candidates, reverse=True)] - - # Add reserved tokens to beginning of the list. - subtoken_list = reserved_tokens + subtoken_list - return subtoken_list, max_subtoken_length - - -def _generate_subtokens(token_counts, - alphabet, - min_count, - num_iterations=4, - reserved_tokens=None): - """Create a list of subtokens in decreasing order of frequency. 
- - Args: - token_counts: dict mapping str tokens -> int count - alphabet: set of characters - min_count: int minimum number of times a subtoken must appear before it is - added to the vocabulary. - num_iterations: int number of iterations to generate new tokens. - reserved_tokens: list of tokens that will be added to the beginning to the - returned subtoken list. - - Returns: - Sorted list of subtokens (most frequent first) - """ - if reserved_tokens is None: - reserved_tokens = RESERVED_TOKENS - - # Use alphabet set to create initial list of subtokens - subtoken_list = reserved_tokens + list(alphabet) - max_subtoken_length = 1 - - # On each iteration, segment all words using the subtokens defined in - # subtoken_dict, count how often the resulting subtokens appear, and update - # the dictionary with subtokens w/ high enough counts. - for i in xrange(num_iterations): - logging.info("\tGenerating subtokens: iteration %d", i) - # Generate new subtoken->id dictionary using the new subtoken list. - subtoken_dict = _list_to_index_dict(subtoken_list) - - # Create dict mapping subtoken->count, with additional subtokens created - # from substrings taken from the tokens. - subtoken_counts = _count_and_gen_subtokens(token_counts, alphabet, - subtoken_dict, - max_subtoken_length) - - # Generate new list of subtokens sorted by subtoken count. - subtoken_list, max_subtoken_length = _gen_new_subtoken_list( - subtoken_counts, min_count, alphabet, reserved_tokens) - - logging.info("\tVocab size: %d", len(subtoken_list)) - return subtoken_list diff --git a/official/nlp/transformer/utils/tokenizer_test.py b/official/nlp/transformer/utils/tokenizer_test.py deleted file mode 100644 index f6ef7a08b2490c49410201a5114183f24a87a1e7..0000000000000000000000000000000000000000 --- a/official/nlp/transformer/utils/tokenizer_test.py +++ /dev/null @@ -1,204 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Test Subtokenizer and string helper methods.""" - -import collections -import tempfile - -import tensorflow as tf - -from official.nlp.transformer.utils import tokenizer - - -class SubtokenizerTest(tf.test.TestCase): - - def _init_subtokenizer(self, vocab_list): - temp_file = tempfile.NamedTemporaryFile(delete=False) - with tf.io.gfile.GFile(temp_file.name, "w") as w: - for subtoken in vocab_list: - w.write("'%s'" % subtoken) - w.write("\n") - return tokenizer.Subtokenizer(temp_file.name, reserved_tokens=[]) - - def test_encode(self): - vocab_list = ["123_", "test", "ing_"] - subtokenizer = self._init_subtokenizer(vocab_list) - s = "testing 123" - encoded_list = subtokenizer.encode(s) - self.assertEqual([1, 2, 0], encoded_list) - - def test_decode(self): - vocab_list = ["123_", "test", "ing_"] - subtokenizer = self._init_subtokenizer(vocab_list) - encoded_list = [1, 2, 0] # testing 123 - decoded_str = subtokenizer.decode(encoded_list) - self.assertEqual("testing 123", decoded_str) - - def test_subtoken_ids_to_tokens(self): - vocab_list = ["123_", "test", "ing_"] - subtokenizer = self._init_subtokenizer(vocab_list) - encoded_list = [1, 2, 0] # testing 123 - token_list = subtokenizer._subtoken_ids_to_tokens(encoded_list) - self.assertEqual([u"testing", u"123"], token_list) - - -class StringHelperTest(tf.test.TestCase): - - def test_split_string_to_tokens(self): - text = "test? testing 123." 
- - tokens = tokenizer._split_string_to_tokens(text, - tokenizer._ALPHANUMERIC_CHAR_SET) - self.assertEqual(["test", "? ", "testing", "123", "."], tokens) - - def test_join_tokens_to_string(self): - tokens = ["test", "? ", "testing", "123", "."] - - s = tokenizer._join_tokens_to_string(tokens, - tokenizer._ALPHANUMERIC_CHAR_SET) - self.assertEqual("test? testing 123.", s) - - def test_escape_token(self): - token = u"abc_\\4" - alphabet = set("abc_\\u;") - - escaped_token = tokenizer._escape_token(token, alphabet) - self.assertEqual("abc\\u\\\\\\52;_", escaped_token) - - def test_unescape_token(self): - escaped_token = u"Underline: \\u, Backslash: \\\\, Unicode: \\52;" - - unescaped_token = tokenizer._unescape_token(escaped_token) - self.assertEqual("Underline: _, Backslash: \\, Unicode: 4", unescaped_token) - - def test_list_to_index_dict(self): - lst = ["test", "strings"] - - d = tokenizer._list_to_index_dict(lst) - self.assertDictEqual({"test": 0, "strings": 1}, d) - - def test_split_token_to_subtokens(self): - token = "abc" - subtoken_dict = {"a": 0, "b": 1, "c": 2, "ab": 3} - max_subtoken_length = 2 - - subtokens = tokenizer._split_token_to_subtokens(token, subtoken_dict, - max_subtoken_length) - self.assertEqual(["ab", "c"], subtokens) - - def test_generate_alphabet_dict(self): - s = ["testing", "123"] - reserved_tokens = ["???"] - - alphabet = tokenizer._generate_alphabet_dict(s, reserved_tokens) - self.assertIn("?", alphabet) - self.assertIn("t", alphabet) - self.assertIn("e", alphabet) - self.assertIn("s", alphabet) - self.assertIn("i", alphabet) - self.assertIn("n", alphabet) - self.assertIn("g", alphabet) - self.assertIn("1", alphabet) - self.assertIn("2", alphabet) - self.assertIn("3", alphabet) - - def test_count_and_gen_subtokens(self): - token_counts = {"abc": 5} - alphabet = set("abc_") - subtoken_dict = {"a": 0, "b": 1, "c": 2, "_": 3} - max_subtoken_length = 2 - - subtoken_counts = tokenizer._count_and_gen_subtokens( - token_counts, alphabet, 
subtoken_dict, max_subtoken_length) - - self.assertIsInstance(subtoken_counts, collections.defaultdict) - self.assertDictEqual( - { - "a": 5, - "b": 5, - "c": 5, - "_": 5, - "ab": 5, - "bc": 5, - "c_": 5, - "abc": 5, - "bc_": 5, - "abc_": 5 - }, subtoken_counts) - - def test_filter_and_bucket_subtokens(self): - subtoken_counts = collections.defaultdict(int, { - "a": 2, - "b": 4, - "c": 1, - "ab": 6, - "ac": 3, - "abbc": 5 - }) - min_count = 3 - - subtoken_buckets = tokenizer._filter_and_bucket_subtokens( - subtoken_counts, min_count) - - self.assertEqual(len(subtoken_buckets[0]), 0) - self.assertEqual(set("b"), subtoken_buckets[1]) - self.assertEqual(set(["ab", "ac"]), subtoken_buckets[2]) - self.assertEqual(len(subtoken_buckets[3]), 0) - self.assertEqual(set(["abbc"]), subtoken_buckets[4]) - - def test_gen_new_subtoken_list(self): - subtoken_counts = collections.defaultdict(int, { - "translate": 10, - "t": 40, - "tr": 16, - "tra": 12 - }) - min_count = 5 - alphabet = set("translate") - reserved_tokens = ["reserved", "tokens"] - - subtoken_list, max_token_length = tokenizer._gen_new_subtoken_list( - subtoken_counts, min_count, alphabet, reserved_tokens) - - # Check that "tra" isn"t in the list (its count should be decremented to 2, - # so it should not be added to the canddiate list). 
- self.assertNotIn("tra", subtoken_list) - - self.assertIn("tr", subtoken_list) - self.assertIn("t", subtoken_list) - - self.assertEqual(len("translate"), max_token_length) - - def test_generate_subtokens(self): - token_counts = {"ab": 1, "bc": 3, "abc": 5} - alphabet = set("abc_") - min_count = 100 - num_iterations = 1 - reserved_tokens = ["reserved", "tokens"] - - vocab_list = tokenizer._generate_subtokens(token_counts, alphabet, - min_count, num_iterations, - reserved_tokens) - - # Check that reserved tokens are at the front of the list - self.assertEqual(vocab_list[:2], reserved_tokens) - - # Check that each character in alphabet is in the vocab list - for c in alphabet: - self.assertIn(c, vocab_list) - - -if __name__ == "__main__": - tf.test.main() diff --git a/official/nlp/xlnet/__init__.py b/official/nlp/xlnet/__init__.py deleted file mode 100644 index a25710c222e3327cb20e000db5df5c5651c4a2cc..0000000000000000000000000000000000000000 --- a/official/nlp/xlnet/__init__.py +++ /dev/null @@ -1,15 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - diff --git a/official/nlp/xlnet/common_flags.py b/official/nlp/xlnet/common_flags.py deleted file mode 100644 index 549e7b036e8133c6e6e50deea5099404e9ee1dcf..0000000000000000000000000000000000000000 --- a/official/nlp/xlnet/common_flags.py +++ /dev/null @@ -1,142 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Common flags used in XLNet model.""" - -from absl import flags - -flags.DEFINE_string("master", default=None, help="master") -flags.DEFINE_string( - "tpu", - default=None, - help="The Cloud TPU to use for training. This should be " - "either the name used when creating the Cloud TPU, or a " - "url like grpc://ip.address.of.tpu:8470.") -flags.DEFINE_bool( - "use_tpu", default=True, help="Use TPUs rather than plain CPUs.") -flags.DEFINE_string("tpu_topology", "2x2", help="TPU topology.") -flags.DEFINE_integer( - "num_core_per_host", default=8, help="number of cores per host") - -flags.DEFINE_string("model_dir", default=None, help="Estimator model_dir.") -flags.DEFINE_string( - "init_checkpoint", - default=None, - help="Checkpoint path for initializing the model.") -flags.DEFINE_bool( - "init_from_transformerxl", - default=False, - help="Init from a transformerxl model checkpoint. 
Otherwise, init from the " - "entire model checkpoint.") - -# Optimization config -flags.DEFINE_float("learning_rate", default=1e-4, help="Maximum learning rate.") -flags.DEFINE_float("clip", default=1.0, help="Gradient clipping value.") -flags.DEFINE_float("weight_decay_rate", default=0.0, help="Weight decay rate.") - -# lr decay -flags.DEFINE_integer( - "warmup_steps", default=0, help="Number of steps for linear lr warmup.") -flags.DEFINE_float("adam_epsilon", default=1e-8, help="Adam epsilon.") -flags.DEFINE_float( - "lr_layer_decay_rate", - default=1.0, - help="Top layer: lr[L] = FLAGS.learning_rate." - "Lower layers: lr[l-1] = lr[l] * lr_layer_decay_rate.") -flags.DEFINE_float( - "min_lr_ratio", default=0.0, help="Minimum ratio learning rate.") - -# Training config -flags.DEFINE_integer( - "train_batch_size", - default=16, - help="Size of the train batch across all hosts.") -flags.DEFINE_integer( - "train_steps", default=100000, help="Total number of training steps.") -flags.DEFINE_integer( - "iterations", default=1000, help="Number of iterations per repeat loop.") - -# Data config -flags.DEFINE_integer( - "seq_len", default=0, help="Sequence length for pretraining.") -flags.DEFINE_integer( - "reuse_len", - default=0, - help="How many tokens to be reused in the next batch. 
" - "Could be half of `seq_len`.") -flags.DEFINE_bool("uncased", False, help="Use uncased inputs or not.") -flags.DEFINE_bool( - "bi_data", - default=False, - help="Use bidirectional data streams, " - "i.e., forward & backward.") -flags.DEFINE_integer("n_token", 32000, help="Vocab size") - -# Model config -flags.DEFINE_integer("mem_len", default=0, help="Number of steps to cache") -flags.DEFINE_bool("same_length", default=False, help="Same length attention") -flags.DEFINE_integer("clamp_len", default=-1, help="Clamp length") - -flags.DEFINE_integer("n_layer", default=6, help="Number of layers.") -flags.DEFINE_integer("d_model", default=32, help="Dimension of the model.") -flags.DEFINE_integer("d_embed", default=32, help="Dimension of the embeddings.") -flags.DEFINE_integer("n_head", default=4, help="Number of attention heads.") -flags.DEFINE_integer( - "d_head", default=8, help="Dimension of each attention head.") -flags.DEFINE_integer( - "d_inner", - default=32, - help="Dimension of inner hidden size in positionwise " - "feed-forward.") -flags.DEFINE_float("dropout", default=0.1, help="Dropout rate.") -flags.DEFINE_float("dropout_att", default=0.1, help="Attention dropout rate.") -flags.DEFINE_bool("untie_r", default=False, help="Untie r_w_bias and r_r_bias") -flags.DEFINE_string( - "ff_activation", - default="relu", - help="Activation type used in position-wise feed-forward.") -flags.DEFINE_string( - "strategy_type", - default="tpu", - help="Activation type used in position-wise feed-forward.") -flags.DEFINE_bool("use_bfloat16", False, help="Whether to use bfloat16.") - -# Parameter initialization -flags.DEFINE_enum( - "init_method", - default="normal", - enum_values=["normal", "uniform"], - help="Initialization method.") -flags.DEFINE_float( - "init_std", default=0.02, help="Initialization std when init is normal.") -flags.DEFINE_float( - "init_range", default=0.1, help="Initialization std when init is uniform.") - -flags.DEFINE_integer( - "test_data_size", 
default=12048, help="Number of test data samples.") -flags.DEFINE_string( - "train_tfrecord_path", - default=None, - help="Path to preprocessed training set tfrecord.") -flags.DEFINE_string( - "test_tfrecord_path", - default=None, - help="Path to preprocessed test set tfrecord.") -flags.DEFINE_integer( - "test_batch_size", - default=16, - help="Size of the test batch across all hosts.") -flags.DEFINE_integer( - "save_steps", default=1000, help="Number of steps for saving checkpoint.") -FLAGS = flags.FLAGS diff --git a/official/nlp/xlnet/optimization.py b/official/nlp/xlnet/optimization.py deleted file mode 100644 index d6954ab9fb76b12e37c05b7b8da51505dc72d6cb..0000000000000000000000000000000000000000 --- a/official/nlp/xlnet/optimization.py +++ /dev/null @@ -1,98 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Functions and classes related to optimization (weight updates).""" - -from absl import logging -import tensorflow as tf -from official.nlp import optimization - - -class WarmUp(tf.keras.optimizers.schedules.LearningRateSchedule): - """Applys a warmup schedule on a given learning rate decay schedule.""" - - def __init__(self, - initial_learning_rate, - decay_schedule_fn, - warmup_steps, - power=1.0, - name=None): - super(WarmUp, self).__init__() - self.initial_learning_rate = initial_learning_rate - self.warmup_steps = warmup_steps - self.power = power - self.decay_schedule_fn = decay_schedule_fn - self.name = name - - def __call__(self, step): - with tf.name_scope(self.name or "WarmUp") as name: - # Implements polynomial warmup. i.e., if global_step < warmup_steps, the - # learning rate will be `global_step/num_warmup_steps * init_lr`. - global_step_float = tf.cast(step, tf.float32) - warmup_steps_float = tf.cast(self.warmup_steps, tf.float32) - warmup_percent_done = global_step_float / warmup_steps_float - warmup_learning_rate = ( - self.initial_learning_rate * - tf.math.pow(warmup_percent_done, self.power)) - return tf.cond( - global_step_float < warmup_steps_float, - lambda: warmup_learning_rate, - lambda: self.decay_schedule_fn(step - self.warmup_steps), - name=name) - - def get_config(self): - return { - "initial_learning_rate": self.initial_learning_rate, - "decay_schedule_fn": self.decay_schedule_fn, - "warmup_steps": self.warmup_steps, - "power": self.power, - "name": self.name - } - - -def create_optimizer(init_lr, - num_train_steps, - num_warmup_steps, - min_lr_ratio=0.0, - adam_epsilon=1e-8, - weight_decay_rate=0.0): - """Creates an optimizer with learning rate schedule.""" - # Implements linear decay of the learning rate. 
- learning_rate_fn = tf.keras.optimizers.schedules.PolynomialDecay( - initial_learning_rate=init_lr, - decay_steps=num_train_steps - num_warmup_steps, - end_learning_rate=init_lr * min_lr_ratio) - if num_warmup_steps: - learning_rate_fn = WarmUp( - initial_learning_rate=init_lr, - decay_schedule_fn=learning_rate_fn, - warmup_steps=num_warmup_steps) - if weight_decay_rate > 0.0: - logging.info( - "Using AdamWeightDecay with adam_epsilon=%.9f weight_decay_rate=%.3f", - adam_epsilon, weight_decay_rate) - optimizer = optimization.AdamWeightDecay( - learning_rate=learning_rate_fn, - weight_decay_rate=weight_decay_rate, - beta_1=0.9, - beta_2=0.999, - epsilon=adam_epsilon, - exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"], - include_in_weight_decay=["r_s_bias", "r_r_bias", "r_w_bias"]) - else: - logging.info("Using Adam with adam_epsilon=%.9f", (adam_epsilon)) - optimizer = tf.keras.optimizers.Adam( - learning_rate=learning_rate_fn, epsilon=adam_epsilon) - - return optimizer, learning_rate_fn diff --git a/official/nlp/xlnet/run_classifier.py b/official/nlp/xlnet/run_classifier.py deleted file mode 100644 index f2681e0ce8a714cb4f784430a86f076bdc356676..0000000000000000000000000000000000000000 --- a/official/nlp/xlnet/run_classifier.py +++ /dev/null @@ -1,187 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""XLNet classification finetuning runner in tf2.0.""" - -import functools -# Import libraries -from absl import app -from absl import flags -from absl import logging - -import numpy as np -import tensorflow as tf -# pylint: disable=unused-import -from official.common import distribute_utils -from official.nlp.xlnet import common_flags -from official.nlp.xlnet import data_utils -from official.nlp.xlnet import optimization -from official.nlp.xlnet import training_utils -from official.nlp.xlnet import xlnet_config -from official.nlp.xlnet import xlnet_modeling as modeling - -flags.DEFINE_integer("n_class", default=2, help="Number of classes.") -flags.DEFINE_string( - "summary_type", - default="last", - help="Method used to summarize a sequence into a vector.") - -FLAGS = flags.FLAGS - - -def get_classificationxlnet_model(model_config, - run_config, - n_class, - summary_type="last"): - model = modeling.ClassificationXLNetModel( - model_config, run_config, n_class, summary_type, name="model") - return model - - -def run_evaluation(strategy, - test_input_fn, - eval_steps, - model, - step, - eval_summary_writer=None): - """Run evaluation for classification task. - - Args: - strategy: distribution strategy. - test_input_fn: input function for evaluation data. - eval_steps: total number of evaluation steps. - model: keras model object. - step: current train step. - eval_summary_writer: summary writer used to record evaluation metrics. As - there are fake data samples in validation set, we use mask to get rid of - them when calculating the accuracy. For the reason that there will be - dynamic-shape tensor, we first collect logits, labels and masks from TPU - and calculate the accuracy via numpy locally. - - Returns: - A float metric, accuracy. 
- """ - - def _test_step_fn(inputs): - """Replicated validation step.""" - - inputs["mems"] = None - _, logits = model(inputs, training=False) - return logits, inputs["label_ids"], inputs["is_real_example"] - - @tf.function - def _run_evaluation(test_iterator): - """Runs validation steps.""" - logits, labels, masks = strategy.run( - _test_step_fn, args=(next(test_iterator),)) - return logits, labels, masks - - test_iterator = data_utils.get_input_iterator(test_input_fn, strategy) - correct = 0 - total = 0 - for _ in range(eval_steps): - logits, labels, masks = _run_evaluation(test_iterator) - logits = strategy.experimental_local_results(logits) - labels = strategy.experimental_local_results(labels) - masks = strategy.experimental_local_results(masks) - merged_logits = [] - merged_labels = [] - merged_masks = [] - - for i in range(strategy.num_replicas_in_sync): - merged_logits.append(logits[i].numpy()) - merged_labels.append(labels[i].numpy()) - merged_masks.append(masks[i].numpy()) - merged_logits = np.vstack(np.array(merged_logits)) - merged_labels = np.hstack(np.array(merged_labels)) - merged_masks = np.hstack(np.array(merged_masks)) - real_index = np.where(np.equal(merged_masks, 1)) - correct += np.sum( - np.equal( - np.argmax(merged_logits[real_index], axis=-1), - merged_labels[real_index])) - total += np.shape(real_index)[-1] - accuracy = float(correct) / float(total) - logging.info("Train step: %d / acc = %d/%d = %f", step, correct, total, - accuracy) - if eval_summary_writer: - with eval_summary_writer.as_default(): - tf.summary.scalar("eval_acc", float(correct) / float(total), step=step) - eval_summary_writer.flush() - return accuracy - - -def get_metric_fn(): - train_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy( - "acc", dtype=tf.float32) - return train_acc_metric - - -def main(unused_argv): - del unused_argv - strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=FLAGS.strategy_type, - tpu_address=FLAGS.tpu) - if 
strategy: - logging.info("***** Number of cores used : %d", - strategy.num_replicas_in_sync) - train_input_fn = functools.partial(data_utils.get_classification_input_data, - FLAGS.train_batch_size, FLAGS.seq_len, - strategy, True, FLAGS.train_tfrecord_path) - test_input_fn = functools.partial(data_utils.get_classification_input_data, - FLAGS.test_batch_size, FLAGS.seq_len, - strategy, False, FLAGS.test_tfrecord_path) - - total_training_steps = FLAGS.train_steps - steps_per_loop = FLAGS.iterations - eval_steps = int(FLAGS.test_data_size / FLAGS.test_batch_size) - eval_fn = functools.partial(run_evaluation, strategy, test_input_fn, - eval_steps) - optimizer, learning_rate_fn = optimization.create_optimizer( - FLAGS.learning_rate, - total_training_steps, - FLAGS.warmup_steps, - adam_epsilon=FLAGS.adam_epsilon) - model_config = xlnet_config.XLNetConfig(FLAGS) - run_config = xlnet_config.create_run_config(True, False, FLAGS) - model_fn = functools.partial(get_classificationxlnet_model, model_config, - run_config, FLAGS.n_class, FLAGS.summary_type) - input_meta_data = {} - input_meta_data["d_model"] = FLAGS.d_model - input_meta_data["mem_len"] = FLAGS.mem_len - input_meta_data["batch_size_per_core"] = int(FLAGS.train_batch_size / - strategy.num_replicas_in_sync) - input_meta_data["n_layer"] = FLAGS.n_layer - input_meta_data["lr_layer_decay_rate"] = FLAGS.lr_layer_decay_rate - input_meta_data["n_class"] = FLAGS.n_class - - training_utils.train( - strategy=strategy, - model_fn=model_fn, - input_meta_data=input_meta_data, - eval_fn=eval_fn, - metric_fn=get_metric_fn, - train_input_fn=train_input_fn, - init_checkpoint=FLAGS.init_checkpoint, - init_from_transformerxl=FLAGS.init_from_transformerxl, - total_training_steps=total_training_steps, - steps_per_loop=steps_per_loop, - optimizer=optimizer, - learning_rate_fn=learning_rate_fn, - model_dir=FLAGS.model_dir, - save_steps=FLAGS.save_steps) - - -if __name__ == "__main__": - app.run(main) diff --git 
a/official/nlp/xlnet/run_squad.py b/official/nlp/xlnet/run_squad.py deleted file mode 100644 index a6126295ec1bd571abf04b90e2713eb43d1df002..0000000000000000000000000000000000000000 --- a/official/nlp/xlnet/run_squad.py +++ /dev/null @@ -1,295 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""XLNet SQUAD finetuning runner in tf2.0.""" - -import functools -import json -import os -import pickle - -# Import libraries -from absl import app -from absl import flags -from absl import logging - -import tensorflow as tf -# pylint: disable=unused-import -import sentencepiece as spm -from official.common import distribute_utils -from official.nlp.xlnet import common_flags -from official.nlp.xlnet import data_utils -from official.nlp.xlnet import optimization -from official.nlp.xlnet import squad_utils -from official.nlp.xlnet import training_utils -from official.nlp.xlnet import xlnet_config -from official.nlp.xlnet import xlnet_modeling as modeling - -flags.DEFINE_string( - "test_feature_path", default=None, help="Path to feature of test set.") -flags.DEFINE_integer("query_len", default=64, help="Max query length.") -flags.DEFINE_integer("start_n_top", default=5, help="Beam size for span start.") -flags.DEFINE_integer("end_n_top", default=5, help="Beam size for span end.") -flags.DEFINE_string( - "predict_dir", default=None, help="Path to write predictions.") -flags.DEFINE_string( - "predict_file", 
default=None, help="Path to json file of test set.") -flags.DEFINE_integer( - "n_best_size", default=5, help="n best size for predictions.") -flags.DEFINE_integer("max_answer_length", default=64, help="Max answer length.") -# Data preprocessing config -flags.DEFINE_string( - "spiece_model_file", default=None, help="Sentence Piece model path.") -flags.DEFINE_integer("max_seq_length", default=512, help="Max sequence length.") -flags.DEFINE_integer("max_query_length", default=64, help="Max query length.") -flags.DEFINE_integer("doc_stride", default=128, help="Doc stride.") - -FLAGS = flags.FLAGS - - -class InputFeatures(object): - """A single set of features of data.""" - - def __init__(self, - unique_id, - example_index, - doc_span_index, - tok_start_to_orig_index, - tok_end_to_orig_index, - token_is_max_context, - input_ids, - input_mask, - p_mask, - segment_ids, - paragraph_len, - cls_index, - start_position=None, - end_position=None, - is_impossible=None): - self.unique_id = unique_id - self.example_index = example_index - self.doc_span_index = doc_span_index - self.tok_start_to_orig_index = tok_start_to_orig_index - self.tok_end_to_orig_index = tok_end_to_orig_index - self.token_is_max_context = token_is_max_context - self.input_ids = input_ids - self.input_mask = input_mask - self.p_mask = p_mask - self.segment_ids = segment_ids - self.paragraph_len = paragraph_len - self.cls_index = cls_index - self.start_position = start_position - self.end_position = end_position - self.is_impossible = is_impossible - - -# pylint: disable=unused-argument -def run_evaluation(strategy, test_input_fn, eval_examples, eval_features, - original_data, eval_steps, input_meta_data, model, - current_step, eval_summary_writer): - """Run evaluation for SQUAD task. - - Args: - strategy: distribution strategy. - test_input_fn: input function for evaluation data. - eval_examples: tf.Examples of the evaluation set. - eval_features: Feature objects of the evaluation set. 
- original_data: The original json data for the evaluation set. - eval_steps: total number of evaluation steps. - input_meta_data: input meta data. - model: keras model object. - current_step: current training step. - eval_summary_writer: summary writer used to record evaluation metrics. - - Returns: - A float metric, F1 score. - """ - - def _test_step_fn(inputs): - """Replicated validation step.""" - - inputs["mems"] = None - res = model(inputs, training=False) - return res, inputs["unique_ids"] - - @tf.function - def _run_evaluation(test_iterator): - """Runs validation steps.""" - res, unique_ids = strategy.run( - _test_step_fn, args=(next(test_iterator),)) - return res, unique_ids - - test_iterator = data_utils.get_input_iterator(test_input_fn, strategy) - cur_results = [] - for _ in range(eval_steps): - results, unique_ids = _run_evaluation(test_iterator) - unique_ids = strategy.experimental_local_results(unique_ids) - - for result_key in results: - results[result_key] = ( - strategy.experimental_local_results(results[result_key])) - for core_i in range(strategy.num_replicas_in_sync): - bsz = int(input_meta_data["test_batch_size"] / - strategy.num_replicas_in_sync) - for j in range(bsz): - result = {} - for result_key in results: - result[result_key] = results[result_key][core_i].numpy()[j] - result["unique_ids"] = unique_ids[core_i].numpy()[j] - # We appended a fake example into dev set to make data size can be - # divided by test_batch_size. Ignores this fake example during - # evaluation. 
- if result["unique_ids"] == 1000012047: - continue - unique_id = int(result["unique_ids"]) - - start_top_log_probs = ([ - float(x) for x in result["start_top_log_probs"].flat - ]) - start_top_index = [int(x) for x in result["start_top_index"].flat] - end_top_log_probs = ([ - float(x) for x in result["end_top_log_probs"].flat - ]) - end_top_index = [int(x) for x in result["end_top_index"].flat] - - cls_logits = float(result["cls_logits"].flat[0]) - cur_results.append( - squad_utils.RawResult( - unique_id=unique_id, - start_top_log_probs=start_top_log_probs, - start_top_index=start_top_index, - end_top_log_probs=end_top_log_probs, - end_top_index=end_top_index, - cls_logits=cls_logits)) - if len(cur_results) % 1000 == 0: - logging.info("Processing example: %d", len(cur_results)) - - output_prediction_file = os.path.join(input_meta_data["predict_dir"], - "predictions.json") - output_nbest_file = os.path.join(input_meta_data["predict_dir"], - "nbest_predictions.json") - output_null_log_odds_file = os.path.join(input_meta_data["predict_dir"], - "null_odds.json") - - results = squad_utils.write_predictions( - eval_examples, eval_features, cur_results, input_meta_data["n_best_size"], - input_meta_data["max_answer_length"], output_prediction_file, - output_nbest_file, output_null_log_odds_file, original_data, - input_meta_data["start_n_top"], input_meta_data["end_n_top"]) - - # Log current results. 
- log_str = "Result | " - for key, val in results.items(): - log_str += "{} {} | ".format(key, val) - logging.info(log_str) - with eval_summary_writer.as_default(): - tf.summary.scalar("best_f1", results["best_f1"], step=current_step) - tf.summary.scalar("best_exact", results["best_exact"], step=current_step) - eval_summary_writer.flush() - return results["best_f1"] - - -def get_qaxlnet_model(model_config, run_config, start_n_top, end_n_top): - model = modeling.QAXLNetModel( - model_config, - run_config, - start_n_top=start_n_top, - end_n_top=end_n_top, - name="model") - return model - - -def main(unused_argv): - del unused_argv - strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=FLAGS.strategy_type, - tpu_address=FLAGS.tpu) - if strategy: - logging.info("***** Number of cores used : %d", - strategy.num_replicas_in_sync) - train_input_fn = functools.partial(data_utils.get_squad_input_data, - FLAGS.train_batch_size, FLAGS.seq_len, - FLAGS.query_len, strategy, True, - FLAGS.train_tfrecord_path) - - test_input_fn = functools.partial(data_utils.get_squad_input_data, - FLAGS.test_batch_size, FLAGS.seq_len, - FLAGS.query_len, strategy, False, - FLAGS.test_tfrecord_path) - - total_training_steps = FLAGS.train_steps - steps_per_loop = FLAGS.iterations - eval_steps = int(FLAGS.test_data_size / FLAGS.test_batch_size) - - optimizer, learning_rate_fn = optimization.create_optimizer( - FLAGS.learning_rate, - total_training_steps, - FLAGS.warmup_steps, - adam_epsilon=FLAGS.adam_epsilon) - model_config = xlnet_config.XLNetConfig(FLAGS) - run_config = xlnet_config.create_run_config(True, False, FLAGS) - input_meta_data = {} - input_meta_data["start_n_top"] = FLAGS.start_n_top - input_meta_data["end_n_top"] = FLAGS.end_n_top - input_meta_data["lr_layer_decay_rate"] = FLAGS.lr_layer_decay_rate - input_meta_data["predict_dir"] = FLAGS.predict_dir - input_meta_data["n_best_size"] = FLAGS.n_best_size - input_meta_data["max_answer_length"] = 
FLAGS.max_answer_length - input_meta_data["test_batch_size"] = FLAGS.test_batch_size - input_meta_data["batch_size_per_core"] = int(FLAGS.train_batch_size / - strategy.num_replicas_in_sync) - input_meta_data["mem_len"] = FLAGS.mem_len - model_fn = functools.partial(get_qaxlnet_model, model_config, run_config, - FLAGS.start_n_top, FLAGS.end_n_top) - eval_examples = squad_utils.read_squad_examples( - FLAGS.predict_file, is_training=False) - if FLAGS.test_feature_path: - logging.info("start reading pickle file...") - with tf.io.gfile.GFile(FLAGS.test_feature_path, "rb") as f: - eval_features = pickle.load(f) - logging.info("finishing reading pickle file...") - else: - sp_model = spm.SentencePieceProcessor() - sp_model.LoadFromSerializedProto( - tf.io.gfile.GFile(FLAGS.spiece_model_file, "rb").read()) - spm_basename = os.path.basename(FLAGS.spiece_model_file) - eval_features = squad_utils.create_eval_data( - spm_basename, sp_model, eval_examples, FLAGS.max_seq_length, - FLAGS.max_query_length, FLAGS.doc_stride, FLAGS.uncased) - - with tf.io.gfile.GFile(FLAGS.predict_file) as f: - original_data = json.load(f)["data"] - eval_fn = functools.partial(run_evaluation, strategy, test_input_fn, - eval_examples, eval_features, original_data, - eval_steps, input_meta_data) - - training_utils.train( - strategy=strategy, - model_fn=model_fn, - input_meta_data=input_meta_data, - eval_fn=eval_fn, - metric_fn=None, - train_input_fn=train_input_fn, - init_checkpoint=FLAGS.init_checkpoint, - init_from_transformerxl=FLAGS.init_from_transformerxl, - total_training_steps=total_training_steps, - steps_per_loop=steps_per_loop, - optimizer=optimizer, - learning_rate_fn=learning_rate_fn, - model_dir=FLAGS.model_dir, - save_steps=FLAGS.save_steps) - - -if __name__ == "__main__": - app.run(main) diff --git a/official/pip_package/setup.py b/official/pip_package/setup.py index 56087d7c106a70f8e97a542b569571de91d88984..348c277454f5a691a7df4e0a7aa9bd341c01e86e 100644 --- 
a/official/pip_package/setup.py +++ b/official/pip_package/setup.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -20,8 +20,8 @@ import sys from setuptools import find_packages from setuptools import setup -version = '2.7.0' -tf_version = '2.7.0' # Major version. +version = '2.10.0' +tf_version = '2.10.0' # Major version. project_name = 'tf-models-official' @@ -74,7 +74,7 @@ setup( description='TensorFlow Official Models', long_description=long_description, author='Google Inc.', - author_email='no-reply@google.com', + author_email='packages@tensorflow.org', url='https://github.com/tensorflow/models', license='Apache 2.0', packages=find_packages(exclude=[ diff --git a/official/projects/README.md b/official/projects/README.md index 9c94fdd110642ed56a2610f063f54acba9ae045a..fb2768b558fe1601afd741cf2e4bf36d833d4db7 100644 --- a/official/projects/README.md +++ b/official/projects/README.md @@ -1,11 +1,31 @@ # TensorFlow Model Garden Modeling Projects -This directory contains projects using TensorFlow Model Garden Modeling -libraries. +This directory contains projects built with the TensorFlow Model Garden +modeling libraries. More details about each project can be found in the +individual project folders listed below.
## Projects -* [NHNet](nhnet): - [Generating Representative Headlines for News Stories](https://arxiv.org/abs/2001.09386) - by Gu et al, 2020 - +* [AssembleNet](./assemblenet/README.md) +* [BASNet](./basnet/README.md) +* [BigBird](./bigbird/README.md) +* [DeepMAC Mask-RCNN](./deepmac_maskrcnn/README.md) +* [DETR](./detr/README.md) +* [Edge-TPU for Vision and NLP](./edgetpu/README.md) +* [Language-agnostic BERT Sentence Embedding](./labse/README.md) +* [Long-Document Transformer](./longformer/README.md) +* [MobileBERT](./mobilebert/README.md) +* [MoViNets](./movinet/README.md) +* [News Headline Generation Model: NHNet](./nhnet/README.md) +* [Training with Pruning](./pruning/README.md) +* [QAT for Computer Vision](./qat/vision/README.md) +* [Roformer Project](./roformer/README.md) +* [Training ELECTRA Augmented with Multi-word Selection](./teams/README.md) +* [NLP example project](./text_classification_example/README.md) +* [TensorNetwork BERT](./tn_bert/README.md) +* [Token Dropping for Efficient BERT Pretraining](./token_dropping/README.md) +* [Spatiotemporal Contrastive Video Representation Learning](./video_ssl/README.md) +* [Vision Transformer (ViT)](./vit/README.md) +* [Data-Efficient Image Transformer (DEIT)](./vit/README.md) +* [Volumetric Models](./volumetric_models/README.md) +* [YouTube-8M Tensorflow Starter Code](./yt8m/README.md) diff --git a/official/projects/__init__.py b/official/projects/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/assemblenet/configs/assemblenet.py b/official/projects/assemblenet/configs/assemblenet.py index e021f3a4bca0a3a30020eb8dd352557c4575f26a..5bbe79b58a84dae09b54d6afd3774b69259f91d3 100644 --- a/official/projects/assemblenet/configs/assemblenet.py +++ b/official/projects/assemblenet/configs/assemblenet.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Definitions for AssembleNet/++ structures. 
This structure is a `list` corresponding to a graph representation of the @@ -40,9 +39,9 @@ from typing import List, Optional, Tuple from official.core import config_definitions as cfg from official.core import exp_factory from official.modeling import hyperparams -from official.vision.beta.configs import backbones_3d -from official.vision.beta.configs import common -from official.vision.beta.configs import video_classification +from official.vision.configs import backbones_3d +from official.vision.configs import common +from official.vision.configs import video_classification @dataclasses.dataclass @@ -62,7 +61,7 @@ def flat_lists_to_blocks(model_structures, model_edge_weights): if node[0] < 0: block = BlockSpec(level=node[0], temporal_dilation=node[1]) else: - block = BlockSpec( + block = BlockSpec( # pytype: disable=wrong-arg-types level=node[0], input_blocks=node[1], num_filters=node[2], diff --git a/official/projects/assemblenet/configs/assemblenet_test.py b/official/projects/assemblenet/configs/assemblenet_test.py index 26fc08edc028982c0e0c6dfa1087078f3c7938b2..f11c21135c0dbb38508d4e3613376cbc9a788336 100644 --- a/official/projects/assemblenet/configs/assemblenet_test.py +++ b/official/projects/assemblenet/configs/assemblenet_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,13 +12,12 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 from absl.testing import parameterized import tensorflow as tf from official.core import config_definitions as cfg from official.core import exp_factory from official.projects.assemblenet.configs import assemblenet -from official.vision.beta.configs import video_classification as exp_cfg +from official.vision.configs import video_classification as exp_cfg class AssemblenetTest(tf.test.TestCase, parameterized.TestCase): diff --git a/official/projects/assemblenet/modeling/assemblenet.py b/official/projects/assemblenet/modeling/assemblenet.py index f84f38a854f7f7a9f4fa0e07b978ebe66f012100..3c2417a94e8594b05d72e541006e566eb4a9693d 100644 --- a/official/projects/assemblenet/modeling/assemblenet.py +++ b/official/projects/assemblenet/modeling/assemblenet.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Contains definitions for the AssembleNet [1] models. 
Requires the AssembleNet architecture to be specified in @@ -57,8 +56,8 @@ import tensorflow as tf from official.modeling import hyperparams from official.projects.assemblenet.configs import assemblenet as cfg from official.projects.assemblenet.modeling import rep_flow_2d_layer as rf -from official.vision.beta.modeling import factory_3d as model_factory -from official.vision.beta.modeling.backbones import factory as backbone_factory +from official.vision.modeling import factory_3d as model_factory +from official.vision.modeling.backbones import factory as backbone_factory layers = tf.keras.layers intermediate_channel_size = [64, 128, 256, 512] diff --git a/official/projects/assemblenet/modeling/assemblenet_plus.py b/official/projects/assemblenet/modeling/assemblenet_plus.py index 85d7629012080dc642c7558c416416b863869239..c07657bdf1ff54af8024ea0f7ed1aed4ac739d35 100644 --- a/official/projects/assemblenet/modeling/assemblenet_plus.py +++ b/official/projects/assemblenet/modeling/assemblenet_plus.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -64,8 +64,8 @@ from official.modeling import hyperparams from official.projects.assemblenet.configs import assemblenet as cfg from official.projects.assemblenet.modeling import assemblenet as asn from official.projects.assemblenet.modeling import rep_flow_2d_layer as rf -from official.vision.beta.modeling import factory_3d as model_factory -from official.vision.beta.modeling.backbones import factory as backbone_factory +from official.vision.modeling import factory_3d as model_factory +from official.vision.modeling.backbones import factory as backbone_factory layers = tf.keras.layers diff --git a/official/projects/assemblenet/modeling/assemblenet_plus_test.py b/official/projects/assemblenet/modeling/assemblenet_plus_test.py index 5eb6ae810e559983f8e7576274c34d4702c72e4a..a2799c0b045eb234b4a60c0d09016026032d5004 100644 --- a/official/projects/assemblenet/modeling/assemblenet_plus_test.py +++ b/official/projects/assemblenet/modeling/assemblenet_plus_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/assemblenet/modeling/rep_flow_2d_layer.py b/official/projects/assemblenet/modeling/rep_flow_2d_layer.py index d29968a668d6e6695b4442983239aeacd9b59b61..2b6439342ed06482d756e74570e694a744c040b5 100644 --- a/official/projects/assemblenet/modeling/rep_flow_2d_layer.py +++ b/official/projects/assemblenet/modeling/rep_flow_2d_layer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Contains definitions for 'Representation Flow' layer [1]. Representation flow layer is a generalization of optical flow extraction; the diff --git a/official/projects/assemblenet/train.py b/official/projects/assemblenet/train.py index 3107f807dbaf475aa340e613282d22ab2be0d03c..54b682ef059b1947120062763ce348aab5a2a391 100644 --- a/official/projects/assemblenet/train.py +++ b/official/projects/assemblenet/train.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 r"""Training driver. Commandline: @@ -29,9 +28,6 @@ from absl import flags from absl import logging import gin -# pylint: disable=unused-import -from official.common import registry_imports -# pylint: enable=unused-import from official.common import distribute_utils from official.common import flags as tfm_flags from official.core import task_factory @@ -42,6 +38,7 @@ from official.modeling import performance from official.projects.assemblenet.configs import assemblenet as asn_configs from official.projects.assemblenet.modeling import assemblenet as asn from official.projects.assemblenet.modeling import assemblenet_plus as asnp +from official.vision import registry_imports # pylint: enable=unused-import FLAGS = flags.FLAGS diff --git a/official/projects/assemblenet/train_test.py b/official/projects/assemblenet/train_test.py index c07fa1c63473fac97fcaaab173d1feee18605027..b3fda06790440539fb1767fd6c1dd79e2bb5b9f5 100644 --- a/official/projects/assemblenet/train_test.py +++ 
b/official/projects/assemblenet/train_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 import json import os import random @@ -22,7 +21,7 @@ from absl import logging from absl.testing import flagsaver import tensorflow as tf from official.projects.assemblenet import train as train_lib -from official.vision.beta.dataloaders import tfexample_utils +from official.vision.dataloaders import tfexample_utils FLAGS = flags.FLAGS diff --git a/official/projects/backbone_reuse/README.md b/official/projects/backbone_reuse/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0981c9e2260c5383df0dd7973427f69bc274d78c --- /dev/null +++ b/official/projects/backbone_reuse/README.md @@ -0,0 +1,44 @@ +# Proper Reuse of Image Classification Features Improves Object Detection + +This project brings the backbone freezing training approach into the Mask-RCNN +architecture. Please see the paper for more details +\([arxiv](https://arxiv.org/abs/2204.00484) - selected for oral presentation at +CVPR 2022\). + +### Training Mask-RCNN models with a frozen backbone + +#### Freezing a ResNet-RS-101 checkpoint (ImageNet-pretrained) + +1. Download the ResNet-RS-101 pretrained checkpoint from + [TF-Vision Model Garden](https://github.com/tensorflow/models/tree/master/official/vision#resnet-rs-models-trained-with-various-settings), + \([checkpoint](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i192.tar.gz)\) + +2.
Config files used in our ResNet-101 ablations are included in the + [configs folder](https://github.com/tensorflow/models/tree/master/official/projects/backbone_reuse/configs/experiments/faster_rcnn). + Select one according to the target architecture (FPN, FPN + Cascades, + NASFPN, or NASFPN + Cascades) and training schedule preference (shorter: + 72 epochs; longer: 600 epochs). + +3. Set the config field `init_checkpoint` (commented out in the provided + configs) to point to the downloaded checkpoint. + +You are all set. Follow the standard TFVision Mask-RCNN training pipeline to +complete the training. + +#### How does it work? + +The config files set the task field `freeze_backbone: true`, which prevents the +pretrained backbone weights from being updated during downstream model +training. They also set `init_checkpoint_modules: backbone`, so only the +backbone weights are restored from `init_checkpoint`. + +## Citation + +``` +@inproceedings{vasconcelos2022backbonefreeze, + title = {Proper Reuse of Image Classification Features Improves Object Detection}, + author = {Cristina Vasconcelos and Vighnesh Birodkar and Vincent Dumoulin}, + booktitle={CVPR}, + year={2022}, +} +``` diff --git a/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_fpn_600epochs.yaml b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_fpn_600epochs.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2696f3d0e81b4a3e223ca2d9c78f9847d0e339d5 --- /dev/null +++ b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_fpn_600epochs.yaml @@ -0,0 +1,38 @@ +task: + # init_checkpoint: 'a_pretrained_backbone_checkpoint' + init_checkpoint_modules: backbone + freeze_backbone: true + model: + backbone: + resnet: + model_id: 101 + replace_stem_max_pool: true + resnetd_shortcut: true + scale_stem: true + se_ratio: 0.25 + stem_type: v1 + type: resnet + decoder: + type: fpn + detection_head: + num_fcs: 2 + norm_activation: + activation: swish + train_data: + global_batch_size: 64 + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.1
+trainer: + optimizer_config: + learning_rate: + stepwise: + boundaries: [1062734, 1090458] + name: PiecewiseConstantDecay + offset: 0 + values: [0.16, 0.016, 0.0016] + type: stepwise + steps_per_loop: 1848 + summary_interval: 1848 + train_steps: 1108940 diff --git a/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_fpn_72epochs.yaml b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_fpn_72epochs.yaml new file mode 100644 index 0000000000000000000000000000000000000000..423fd2d59cad108db91b02163fe741e4aafa2894 --- /dev/null +++ b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_fpn_72epochs.yaml @@ -0,0 +1,38 @@ +task: + # init_checkpoint: 'a_pretrained_backbone_checkpoint' + init_checkpoint_modules: backbone + freeze_backbone: true + model: + backbone: + resnet: + model_id: 101 + replace_stem_max_pool: true + resnetd_shortcut: true + scale_stem: true + se_ratio: 0.25 + stem_type: v1 + type: resnet + decoder: + type: fpn + detection_head: + num_fcs: 2 + norm_activation: + activation: swish + train_data: + global_batch_size: 64 + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.1 +trainer: + optimizer_config: + learning_rate: + stepwise: + boundaries: [88704, 125664] + name: PiecewiseConstantDecay + offset: 0 + values: [0.16, 0.016, 0.0016] + type: stepwise + steps_per_loop: 1848 + summary_interval: 1848 + train_steps: 133056 diff --git a/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_fpn_cascade_600epochs.yaml b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_fpn_cascade_600epochs.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c6cecb4f8e9e6c3c1b65932f5dc7a9115768b495 --- /dev/null +++ b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_fpn_cascade_600epochs.yaml @@ -0,0 +1,43 @@ +task: + # init_checkpoint: 
'a_pretrained_backbone_checkpoint' + init_checkpoint_modules: backbone + freeze_backbone: true + model: + backbone: + resnet: + model_id: 101 + replace_stem_max_pool: true + resnetd_shortcut: true + scale_stem: true + se_ratio: 0.25 + stem_type: v1 + type: resnet + decoder: + type: fpn + detection_head: + cascade_class_ensemble: true + class_agnostic_bbox_pred: true + num_fcs: 2 + input_size: [1280, 1280, 3] + norm_activation: + activation: swish + roi_sampler: + cascade_iou_thresholds: [0.7, 0.8] + train_data: + global_batch_size: 64 + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.1 +trainer: + optimizer_config: + learning_rate: + stepwise: + boundaries: [1062734, 1090458] + name: PiecewiseConstantDecay + offset: 0 + values: [0.16, 0.016, 0.0016] + type: stepwise + steps_per_loop: 1848 + summary_interval: 1848 + train_steps: 1108940 diff --git a/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_fpn_cascade_72epochs.yaml b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_fpn_cascade_72epochs.yaml new file mode 100644 index 0000000000000000000000000000000000000000..482c655c83237f3d3474c87f318e3b4f5d6c3f54 --- /dev/null +++ b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_fpn_cascade_72epochs.yaml @@ -0,0 +1,43 @@ +task: + # init_checkpoint: 'a_pretrained_backbone_checkpoint' + init_checkpoint_modules: backbone + freeze_backbone: true + model: + backbone: + resnet: + model_id: 101 + replace_stem_max_pool: true + resnetd_shortcut: true + scale_stem: true + se_ratio: 0.25 + stem_type: v1 + type: resnet + decoder: + type: fpn + detection_head: + cascade_class_ensemble: true + class_agnostic_bbox_pred: true + num_fcs: 2 + input_size: [1280, 1280, 3] + norm_activation: + activation: swish + roi_sampler: + cascade_iou_thresholds: [0.7, 0.8] + train_data: + global_batch_size: 64 + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + 
aug_scale_min: 0.1 +trainer: + optimizer_config: + learning_rate: + stepwise: + boundaries: [88704, 125664] + name: PiecewiseConstantDecay + offset: 0 + values: [0.16, 0.016, 0.0016] + type: stepwise + steps_per_loop: 1848 + summary_interval: 1848 + train_steps: 133056 diff --git a/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_nasfpn_600epochs.yaml b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_nasfpn_600epochs.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1affdaf22e47ab990dee9da9edcdffe7cd078b30 --- /dev/null +++ b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_nasfpn_600epochs.yaml @@ -0,0 +1,41 @@ +task: + # init_checkpoint: 'a_pretrained_backbone_checkpoint' + init_checkpoint_modules: backbone + freeze_backbone: true + model: + backbone: + resnet: + model_id: 101 + replace_stem_max_pool: true + resnetd_shortcut: true + scale_stem: true + se_ratio: 0.25 + stem_type: v1 + type: resnet + decoder: + type: nasfpn + detection_head: + num_fcs: 2 + include_mask: false + max_level: 7 + min_level: 3 + norm_activation: + activation: swish + train_data: + global_batch_size: 64 + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.1 +trainer: + optimizer_config: + learning_rate: + stepwise: + boundaries: [1062734, 1090458] + name: PiecewiseConstantDecay + offset: 0 + values: [0.16, 0.016, 0.0016] + type: stepwise + steps_per_loop: 1848 + summary_interval: 1848 + train_steps: 1108940 diff --git a/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_nasfpn_72epochs.yaml b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_nasfpn_72epochs.yaml new file mode 100644 index 0000000000000000000000000000000000000000..2c239c96f131cfe277d054e0c41207e885ac4898 --- /dev/null +++ 
b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_nasfpn_72epochs.yaml @@ -0,0 +1,41 @@ +task: + # init_checkpoint: 'a_pretrained_backbone_checkpoint' + init_checkpoint_modules: backbone + freeze_backbone: true + model: + backbone: + resnet: + model_id: 101 + replace_stem_max_pool: true + resnetd_shortcut: true + scale_stem: true + se_ratio: 0.25 + stem_type: v1 + type: resnet + decoder: + type: nasfpn + detection_head: + num_fcs: 2 + include_mask: false + max_level: 7 + min_level: 3 + norm_activation: + activation: swish + train_data: + global_batch_size: 64 + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.1 +trainer: + optimizer_config: + learning_rate: + stepwise: + boundaries: [88704, 125664] + name: PiecewiseConstantDecay + offset: 0 + values: [0.16, 0.016, 0.0016] + type: stepwise + steps_per_loop: 1848 + summary_interval: 1848 + train_steps: 133056 diff --git a/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_nasfpn_cascade_600epochs.yaml b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_nasfpn_cascade_600epochs.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b2378c49499343ebdf7ee814c702552db69b4426 --- /dev/null +++ b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_nasfpn_cascade_600epochs.yaml @@ -0,0 +1,45 @@ +task: + # init_checkpoint: 'a_pretrained_backbone_checkpoint' + init_checkpoint_modules: backbone + freeze_backbone: true + model: + backbone: + resnet: + model_id: 101 + replace_stem_max_pool: true + resnetd_shortcut: true + scale_stem: true + se_ratio: 0.25 + stem_type: v1 + type: resnet + decoder: + type: nasfpn + detection_head: + cascade_class_ensemble: true + class_agnostic_bbox_pred: true + num_fcs: 2 + input_size: [1280, 1280, 3] + max_level: 7 + min_level: 3 + norm_activation: + activation: swish + roi_sampler: + cascade_iou_thresholds: [0.7, 0.8] + 
train_data: + global_batch_size: 64 + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.1 +trainer: + optimizer_config: + learning_rate: + stepwise: + boundaries: [1062734, 1090458] + name: PiecewiseConstantDecay + offset: 0 + values: [0.16, 0.016, 0.0016] + type: stepwise + steps_per_loop: 1848 + summary_interval: 1848 + train_steps: 1108940 diff --git a/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_nasfpn_cascade_72epochs.yaml b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_nasfpn_cascade_72epochs.yaml new file mode 100644 index 0000000000000000000000000000000000000000..061aeeb9ac91cb088db38ca3101a02f7b16d49cd --- /dev/null +++ b/official/projects/backbone_reuse/configs/experiments/faster_rcnn/fastrcnn_resnet101_nasfpn_cascade_72epochs.yaml @@ -0,0 +1,45 @@ +task: + # init_checkpoint: 'a_pretrained_backbone_checkpoint' + init_checkpoint_modules: backbone + freeze_backbone: true + model: + backbone: + resnet: + model_id: 101 + replace_stem_max_pool: true + resnetd_shortcut: true + scale_stem: true + se_ratio: 0.25 + stem_type: v1 + type: resnet + decoder: + type: nasfpn + detection_head: + cascade_class_ensemble: true + class_agnostic_bbox_pred: true + num_fcs: 2 + input_size: [1280, 1280, 3] + max_level: 7 + min_level: 3 + norm_activation: + activation: swish + roi_sampler: + cascade_iou_thresholds: [0.7, 0.8] + train_data: + global_batch_size: 64 + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.1 +trainer: + optimizer_config: + learning_rate: + stepwise: + boundaries: [88704, 125664] + name: PiecewiseConstantDecay + offset: 0 + values: [0.16, 0.016, 0.0016] + type: stepwise + steps_per_loop: 1848 + summary_interval: 1848 + train_steps: 133056 diff --git a/official/projects/backbone_reuse/configs/experiments/retinanet/retinanet_resnet101_fpn_600epochs.yaml 
b/official/projects/backbone_reuse/configs/experiments/retinanet/retinanet_resnet101_fpn_600epochs.yaml new file mode 100644 index 0000000000000000000000000000000000000000..7b4f285b890c43010edbd4968ae74d624825b15f --- /dev/null +++ b/official/projects/backbone_reuse/configs/experiments/retinanet/retinanet_resnet101_fpn_600epochs.yaml @@ -0,0 +1,34 @@ +task: + # init_checkpoint: 'a_pretrained_backbone_checkpoint' + init_checkpoint_modules: backbone + freeze_backbone: true + model: + backbone: + resnet: + model_id: 101 + replace_stem_max_pool: true + resnetd_shortcut: true + scale_stem: true + se_ratio: 0.25 + stem_type: v1 + type: resnet + decoder: + type: fpn + train_data: + global_batch_size: 256 + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.1 +trainer: + optimizer_config: + learning_rate: + stepwise: + boundaries: [265684, 272615] + name: PiecewiseConstantDecay + offset: 0 + values: [0.32, 0.032, 0.0032] + type: stepwise + steps_per_loop: 462 + summary_interval: 462 + train_steps: 277235 diff --git a/official/projects/backbone_reuse/configs/experiments/retinanet/retinanet_resnet101_fpn_72epochs.yaml b/official/projects/backbone_reuse/configs/experiments/retinanet/retinanet_resnet101_fpn_72epochs.yaml new file mode 100644 index 0000000000000000000000000000000000000000..436f7a0c7f4d6912b0090aa74bf91bf0ad937072 --- /dev/null +++ b/official/projects/backbone_reuse/configs/experiments/retinanet/retinanet_resnet101_fpn_72epochs.yaml @@ -0,0 +1,34 @@ +task: + # init_checkpoint: 'a_pretrained_backbone_checkpoint' + init_checkpoint_modules: backbone + freeze_backbone: true + model: + backbone: + resnet: + model_id: 101 + replace_stem_max_pool: true + resnetd_shortcut: true + scale_stem: true + se_ratio: 0.25 + stem_type: v1 + type: resnet + decoder: + type: fpn + train_data: + global_batch_size: 256 + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.1 +trainer: + optimizer_config: + learning_rate: + stepwise: + boundaries: 
[22176, 31416] + name: PiecewiseConstantDecay + offset: 0 + values: [0.32, 0.032, 0.0032] + type: stepwise + steps_per_loop: 462 + summary_interval: 462 + train_steps: 33264 diff --git a/official/projects/backbone_reuse/configs/experiments/retinanet/retinanet_resnet101_nasfpn_600epochs.yaml b/official/projects/backbone_reuse/configs/experiments/retinanet/retinanet_resnet101_nasfpn_600epochs.yaml new file mode 100644 index 0000000000000000000000000000000000000000..13db989d4077630769b477e56b0faf1059fccd78 --- /dev/null +++ b/official/projects/backbone_reuse/configs/experiments/retinanet/retinanet_resnet101_nasfpn_600epochs.yaml @@ -0,0 +1,34 @@ +task: + # init_checkpoint: 'a_pretrained_backbone_checkpoint' + init_checkpoint_modules: backbone + freeze_backbone: true + model: + backbone: + resnet: + model_id: 101 + replace_stem_max_pool: true + resnetd_shortcut: true + scale_stem: true + se_ratio: 0.25 + stem_type: v1 + type: resnet + decoder: + type: nasfpn + train_data: + global_batch_size: 256 + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.1 +trainer: + optimizer_config: + learning_rate: + stepwise: + boundaries: [265684, 272615] + name: PiecewiseConstantDecay + offset: 0 + values: [0.32, 0.032, 0.0032] + type: stepwise + steps_per_loop: 462 + summary_interval: 462 + train_steps: 277235 diff --git a/official/projects/backbone_reuse/configs/experiments/retinanet/retinanet_resnet101_nasfpn_72epochs.yaml b/official/projects/backbone_reuse/configs/experiments/retinanet/retinanet_resnet101_nasfpn_72epochs.yaml new file mode 100644 index 0000000000000000000000000000000000000000..74a35ee9a5382e653b048f0bd3c15d9e4faa3294 --- /dev/null +++ b/official/projects/backbone_reuse/configs/experiments/retinanet/retinanet_resnet101_nasfpn_72epochs.yaml @@ -0,0 +1,34 @@ +task: + # init_checkpoint: 'a_pretrained_backbone_checkpoint' + init_checkpoint_modules: backbone + freeze_backbone: true + model: + backbone: + resnet: + model_id: 101 + 
replace_stem_max_pool: true + resnetd_shortcut: true + scale_stem: true + se_ratio: 0.25 + stem_type: v1 + type: resnet + decoder: + type: nasfpn + train_data: + global_batch_size: 256 + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.1 +trainer: + optimizer_config: + learning_rate: + stepwise: + boundaries: [22176, 31416] + name: PiecewiseConstantDecay + offset: 0 + values: [0.32, 0.032, 0.0032] + type: stepwise + steps_per_loop: 462 + summary_interval: 462 + train_steps: 33264 diff --git a/official/projects/basnet/configs/basnet.py b/official/projects/basnet/configs/basnet.py index 6a79e370f65c9c010d01483bdb5ce179a8ce3c18..3c971d3ca6615e4e4f0c4534bd39cbdeb6fb4fe7 100644 --- a/official/projects/basnet/configs/basnet.py +++ b/official/projects/basnet/configs/basnet.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -20,7 +20,7 @@ from official.core import config_definitions as cfg from official.core import exp_factory from official.modeling import hyperparams from official.modeling import optimization -from official.vision.beta.configs import common +from official.vision.configs import common @dataclasses.dataclass diff --git a/official/projects/basnet/configs/basnet_test.py b/official/projects/basnet/configs/basnet_test.py index 2d0c40ef0cd5bc20855a052231cc78b84249f27e..3e474ab098dcf8e9e5d0674712430b6341da29f3 100644 --- a/official/projects/basnet/configs/basnet_test.py +++ b/official/projects/basnet/configs/basnet_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
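The backbone_reuse RetinaNet configs earlier in this diff express their learning-rate schedules in optimizer steps. If `steps_per_loop` (462 at `global_batch_size: 256`) is read as one epoch, which is an inference from the `72epochs`/`600epochs` file names rather than something the configs state, the step values convert back to epochs as sketched below. The sketch also reproduces, in plain Python so it runs without TensorFlow, the lookup that `tf.keras.optimizers.schedules.PiecewiseConstantDecay` documents: `values[0]` up to and including the first boundary, `values[1]` up to and including the second, and so on. It ignores any warmup the base experiment config may add.

```python
from bisect import bisect_left

# Values copied from retinanet_resnet101_fpn_72epochs.yaml in this diff.
STEPS_PER_EPOCH = 462  # assumption: one steps_per_loop == one epoch
BOUNDARIES = [22176, 31416]
VALUES = [0.32, 0.032, 0.0032]
TRAIN_STEPS = 33264


def stepwise_lr(step, boundaries=BOUNDARIES, values=VALUES):
    """Piecewise-constant learning rate (warmup not modeled).

    values[0] applies for step <= boundaries[0], values[1] for
    boundaries[0] < step <= boundaries[1], and values[-1] afterwards.
    """
    return values[bisect_left(boundaries, step)]


def steps_to_epochs(steps, steps_per_epoch=STEPS_PER_EPOCH):
    """Converts an optimizer step count to epochs."""
    return steps / steps_per_epoch


print("decay at epochs:", [steps_to_epochs(b) for b in BOUNDARIES])
print("total epochs:", steps_to_epochs(TRAIN_STEPS))
print("lr at step 30000:", stepwise_lr(30000))
```

Under that reading, the 72-epoch RetinaNet schedule decays at epochs 48 and 68. The Faster R-CNN 72-epoch configs agree: 1848 steps per loop at batch 64 is the same 118,272 images per loop as 462 × 256, and 88704 / 1848 = 48, 125664 / 1848 = 68, 133056 / 1848 = 72.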
diff --git a/official/projects/basnet/evaluation/metrics.py b/official/projects/basnet/evaluation/metrics.py index 0126bbc357e23372f7e6d7df5e70ca7c8a845687..88fb5907222fc6471fdc001f712b140f5294c552 100644 --- a/official/projects/basnet/evaluation/metrics.py +++ b/official/projects/basnet/evaluation/metrics.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/basnet/evaluation/metrics_test.py b/official/projects/basnet/evaluation/metrics_test.py index 26cb7d83551349f376908e2d8a0aec7812362690..e37b0185ff5bbc2d1ef76de8f57e7ca0d37010a4 100644 --- a/official/projects/basnet/evaluation/metrics_test.py +++ b/official/projects/basnet/evaluation/metrics_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/basnet/losses/basnet_losses.py b/official/projects/basnet/losses/basnet_losses.py index ece1f8646f0573e0b164abc57ba5f40775e3353c..023d3c6358f21add9935648e0225362484cadf1e 100644 --- a/official/projects/basnet/losses/basnet_losses.py +++ b/official/projects/basnet/losses/basnet_losses.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/basnet/modeling/basnet_model.py b/official/projects/basnet/modeling/basnet_model.py index cdcc978a8f6c989ce44c4bc8a59f186e9951b92b..cef6d456d6433fef9c2456787cc35578a8de4301 100644 --- a/official/projects/basnet/modeling/basnet_model.py +++ b/official/projects/basnet/modeling/basnet_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -20,7 +20,7 @@ import tensorflow as tf from official.modeling import tf_utils from official.projects.basnet.modeling import nn_blocks -from official.vision.beta.modeling.backbones import factory +from official.vision.modeling.backbones import factory # Specifications for BASNet encoder. # Each element in the block configuration is in the following format: diff --git a/official/projects/basnet/modeling/basnet_model_test.py b/official/projects/basnet/modeling/basnet_model_test.py index 8f919d7fa5026e17d25409a4d4a67254c8a6b964..8f59904e5d1ba2e4163728c43b9967537e53718a 100644 --- a/official/projects/basnet/modeling/basnet_model_test.py +++ b/official/projects/basnet/modeling/basnet_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/basnet/modeling/nn_blocks.py b/official/projects/basnet/modeling/nn_blocks.py index c0815ab7094c3ef71873cfc7a71e36f59da0d520..1254c9c78d8d24a8e52086dc930dd19aaa3669c3 100644 --- a/official/projects/basnet/modeling/nn_blocks.py +++ b/official/projects/basnet/modeling/nn_blocks.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/basnet/modeling/refunet.py b/official/projects/basnet/modeling/refunet.py index 0a730f4c7807a381e661c260b460815ad85c78fd..a052adc9ef4802f273fb3c6e4fffb8eb717fd315 100644 --- a/official/projects/basnet/modeling/refunet.py +++ b/official/projects/basnet/modeling/refunet.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/basnet/serving/basnet.py b/official/projects/basnet/serving/basnet.py index 84c0617c16a40bec3bd2ebef893d16503c796db7..d25f11c18f58fc73de69693bb693a73a52390c4d 100644 --- a/official/projects/basnet/serving/basnet.py +++ b/official/projects/basnet/serving/basnet.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -17,11 +17,7 @@ import tensorflow as tf from official.projects.basnet.tasks import basnet -from official.vision.beta.serving import semantic_segmentation - - -MEAN_RGB = (0.485 * 255, 0.456 * 255, 0.406 * 255) -STDDEV_RGB = (0.229 * 255, 0.224 * 255, 0.225 * 255) +from official.vision.serving import semantic_segmentation class BASNetModule(semantic_segmentation.SegmentationModule): diff --git a/official/projects/basnet/serving/export_saved_model.py b/official/projects/basnet/serving/export_saved_model.py index a08a1bf5a470b32974445d10d87aa819259debd4..417beac57fde4999c2bd8f68fc4eb0d7b2e9e79e 100644 --- a/official/projects/basnet/serving/export_saved_model.py +++ b/official/projects/basnet/serving/export_saved_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -41,7 +41,7 @@ from absl import flags from official.core import exp_factory from official.modeling import hyperparams from official.projects.basnet.serving import basnet -from official.vision.beta.serving import export_saved_model_lib +from official.vision.serving import export_saved_model_lib FLAGS = flags.FLAGS diff --git a/official/projects/basnet/tasks/basnet.py b/official/projects/basnet/tasks/basnet.py index 99e97586a80630710c5280515e41f2ecdd19dc09..07332a552590fbad657d67e9aa573b6f24353f74 100644 --- a/official/projects/basnet/tasks/basnet.py +++ b/official/projects/basnet/tasks/basnet.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -27,13 +27,13 @@ from official.projects.basnet.evaluation import metrics as basnet_metrics from official.projects.basnet.losses import basnet_losses from official.projects.basnet.modeling import basnet_model from official.projects.basnet.modeling import refunet -from official.vision.beta.dataloaders import segmentation_input +from official.vision.dataloaders import segmentation_input def build_basnet_model( input_specs: tf.keras.layers.InputSpec, model_config: exp_cfg.BASNetModel, - l2_regularizer: tf.keras.regularizers.Regularizer = None): + l2_regularizer: Optional[tf.keras.regularizers.Regularizer] = None): """Builds BASNet model.""" norm_activation_config = model_config.norm_activation backbone = basnet_model.BASNetEncoder( @@ -203,8 +203,7 @@ class BASNetTask(base_task.Task): # For mixed_precision policy, when LossScaleOptimizer is used, loss is # scaled for numerical stability. - if isinstance( - optimizer, tf.keras.mixed_precision.experimental.LossScaleOptimizer): + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): scaled_loss = optimizer.get_scaled_loss(scaled_loss) tvars = model.trainable_variables @@ -212,8 +211,7 @@ class BASNetTask(base_task.Task): # Scales back gradient before apply_gradients when LossScaleOptimizer is # used. - if isinstance( - optimizer, tf.keras.mixed_precision.experimental.LossScaleOptimizer): + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): grads = optimizer.get_unscaled_gradients(grads) # Apply gradient clipping. diff --git a/official/projects/basnet/train.py b/official/projects/basnet/train.py index c65604f5d8bf5cc8d44b88fa132ed838ec833d69..d30321ac37337c14c4d0fc099eacd91e7d76b31e 100644 --- a/official/projects/basnet/train.py +++ b/official/projects/basnet/train.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """TensorFlow Model Garden Vision training driver.""" from absl import app @@ -23,7 +22,7 @@ from official.projects.basnet.configs import basnet as basnet_cfg from official.projects.basnet.modeling import basnet_model from official.projects.basnet.modeling import refunet from official.projects.basnet.tasks import basnet as basenet_task -from official.vision.beta import train +from official.vision import train if __name__ == '__main__': diff --git a/official/nlp/projects/bigbird/README.md b/official/projects/bigbird/README.md similarity index 100% rename from official/nlp/projects/bigbird/README.md rename to official/projects/bigbird/README.md diff --git a/official/projects/bigbird/__init__.py b/official/projects/bigbird/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/bigbird/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ diff --git a/official/projects/bigbird/encoder.py b/official/projects/bigbird/encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..f0d0af458ab7df8b57bcb294382d345f75e8c579 --- /dev/null +++ b/official/projects/bigbird/encoder.py @@ -0,0 +1,238 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Transformer-based text encoder network.""" +# pylint: disable=g-classes-have-attributes + +import tensorflow as tf + +from official.modeling import activations +from official.nlp import modeling +from official.nlp.modeling import layers +from official.projects.bigbird import recompute_grad +from official.projects.bigbird import recomputing_dropout + + +_MAX_SEQ_LEN = 4096 + + +class RecomputeTransformerLayer(layers.TransformerScaffold): + """Transformer layer that recomputes the forward pass during backpropagation.""" + + def call(self, inputs, training=None): + emb, mask = inputs + def f(*args): + # recompute_grad can only handle tensor inputs. 
So we enumerate the + # nested input [emb, mask] as follows: + # args[0]: emb + # args[1]: mask[0] = band_mask + # args[2]: mask[1] = encoder_from_mask + # args[3]: mask[2] = encoder_to_mask + # args[4]: mask[3] = blocked_encoder_mask + x = super(RecomputeTransformerLayer, + self).call([args[0], [args[1], args[2], args[3], args[4]]], + training=training) + return x + + f = recompute_grad.recompute_grad(f) + + return f(emb, *mask) + + +@tf.keras.utils.register_keras_serializable(package='Text') +class BigBirdEncoder(tf.keras.Model): + """Transformer-based encoder network with BigBird attentions. + + *Note* that the network is constructed by + [Keras Functional API](https://keras.io/guides/functional_api/). + + Args: + vocab_size: The size of the token vocabulary. + hidden_size: The size of the transformer hidden layers. + num_layers: The number of transformer layers. + num_attention_heads: The number of attention heads for each transformer. The + hidden size must be divisible by the number of attention heads. + max_position_embeddings: The maximum length of position embeddings that this + encoder can consume. If None, max_position_embeddings uses the value from + sequence length. This determines the variable shape for positional + embeddings. + type_vocab_size: The number of types that the 'type_ids' input can take. + intermediate_size: The intermediate size for the transformer layers. + block_size: int. A BigBird Attention parameter: size of block in from/to + sequences. + num_rand_blocks: int. A BigBird Attention parameter: number of random chunks + per row. + activation: The activation to use for the transformer layers. + dropout_rate: The dropout rate to use for the transformer layers. + attention_dropout_rate: The dropout rate to use for the attention layers + within the transformer layers. + initializer: The initializer to use for all weights in this encoder. + embedding_width: The width of the word embeddings.
If the embedding width is + not equal to hidden size, embedding parameters will be factorized into two + matrices in the shape of ['vocab_size', 'embedding_width'] and + ['embedding_width', 'hidden_size'] ('embedding_width' is usually much + smaller than 'hidden_size'). + use_gradient_checkpointing: Use gradient checkpointing to trade-off compute + for memory. + """ + + def __init__(self, + vocab_size, + hidden_size=768, + num_layers=12, + num_attention_heads=12, + max_position_embeddings=_MAX_SEQ_LEN, + type_vocab_size=16, + intermediate_size=3072, + block_size=64, + num_rand_blocks=3, + activation=activations.gelu, + dropout_rate=0.1, + attention_dropout_rate=0.1, + initializer=tf.keras.initializers.TruncatedNormal(stddev=0.02), + embedding_width=None, + use_gradient_checkpointing=False, + **kwargs): + activation = tf.keras.activations.get(activation) + initializer = tf.keras.initializers.get(initializer) + + if use_gradient_checkpointing: + tf.keras.layers.Dropout = recomputing_dropout.RecomputingDropout + layer_cls = RecomputeTransformerLayer + else: + layer_cls = layers.TransformerScaffold + + self._self_setattr_tracking = False + self._config_dict = { + 'vocab_size': vocab_size, + 'hidden_size': hidden_size, + 'num_layers': num_layers, + 'num_attention_heads': num_attention_heads, + 'max_position_embeddings': max_position_embeddings, + 'type_vocab_size': type_vocab_size, + 'intermediate_size': intermediate_size, + 'block_size': block_size, + 'num_rand_blocks': num_rand_blocks, + 'activation': tf.keras.activations.serialize(activation), + 'dropout_rate': dropout_rate, + 'attention_dropout_rate': attention_dropout_rate, + 'initializer': tf.keras.initializers.serialize(initializer), + 'embedding_width': embedding_width, + } + + word_ids = tf.keras.layers.Input( + shape=(None,), dtype=tf.int32, name='input_word_ids') + mask = tf.keras.layers.Input( + shape=(None,), dtype=tf.int32, name='input_mask') + type_ids = tf.keras.layers.Input( + shape=(None,), 
dtype=tf.int32, name='input_type_ids') + + if embedding_width is None: + embedding_width = hidden_size + self._embedding_layer = modeling.layers.OnDeviceEmbedding( + vocab_size=vocab_size, + embedding_width=embedding_width, + initializer=initializer, + name='word_embeddings') + word_embeddings = self._embedding_layer(word_ids) + + # Always uses dynamic slicing for simplicity. + self._position_embedding_layer = modeling.layers.PositionEmbedding( + initializer=initializer, + max_length=max_position_embeddings, + name='position_embedding') + position_embeddings = self._position_embedding_layer(word_embeddings) + self._type_embedding_layer = modeling.layers.OnDeviceEmbedding( + vocab_size=type_vocab_size, + embedding_width=embedding_width, + initializer=initializer, + use_one_hot=True, + name='type_embeddings') + type_embeddings = self._type_embedding_layer(type_ids) + + embeddings = tf.keras.layers.Add()( + [word_embeddings, position_embeddings, type_embeddings]) + + self._embedding_norm_layer = tf.keras.layers.LayerNormalization( + name='embeddings/layer_norm', axis=-1, epsilon=1e-12, dtype=tf.float32) + + embeddings = self._embedding_norm_layer(embeddings) + embeddings = tf.keras.layers.Dropout(rate=dropout_rate)(embeddings) + + # We project the 'embedding' output to 'hidden_size' if it is not already + # 'hidden_size'. 
+ if embedding_width != hidden_size: + self._embedding_projection = tf.keras.layers.EinsumDense( + '...x,xy->...y', + output_shape=hidden_size, + bias_axes='y', + kernel_initializer=initializer, + name='embedding_projection') + embeddings = self._embedding_projection(embeddings) + + self._transformer_layers = [] + data = embeddings + masks = layers.BigBirdMasks(block_size=block_size)( + data, mask) + encoder_outputs = [] + attn_head_dim = hidden_size // num_attention_heads + for i in range(num_layers): + layer = layer_cls( + num_attention_heads, + intermediate_size, + activation, + attention_cls=layers.BigBirdAttention, + attention_cfg=dict( + num_heads=num_attention_heads, + key_dim=attn_head_dim, + kernel_initializer=initializer, + from_block_size=block_size, + to_block_size=block_size, + num_rand_blocks=num_rand_blocks, + max_rand_mask_length=max_position_embeddings, + seed=i), + dropout_rate=dropout_rate, + attention_dropout_rate=dropout_rate, + kernel_initializer=initializer) + self._transformer_layers.append(layer) + data = layer([data, masks]) + encoder_outputs.append(data) + + outputs = dict( + sequence_output=encoder_outputs[-1], encoder_outputs=encoder_outputs) + super().__init__( + inputs=[word_ids, mask, type_ids], outputs=outputs, **kwargs) + + def get_embedding_table(self): + return self._embedding_layer.embeddings + + def get_embedding_layer(self): + return self._embedding_layer + + def get_config(self): + return self._config_dict + + @property + def transformer_layers(self): + """List of Transformer layers in the encoder.""" + return self._transformer_layers + + @property + def pooler_layer(self): + """The pooler dense layer after the transformer layers.""" + return self._pooler_layer + + @classmethod + def from_config(cls, config, custom_objects=None): + return cls(**config) diff --git a/official/projects/bigbird/encoder_test.py b/official/projects/bigbird/encoder_test.py new file mode 100644 index 
0000000000000000000000000000000000000000..9b6833720501dab7d0d3258dee5e0826b8987738 --- /dev/null +++ b/official/projects/bigbird/encoder_test.py @@ -0,0 +1,63 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for official.projects.bigbird.encoder.""" + +import numpy as np +import tensorflow as tf + +from official.projects.bigbird import encoder + + +class BigBirdEncoderTest(tf.test.TestCase): + + def test_encoder(self): + sequence_length = 1024 + batch_size = 2 + vocab_size = 1024 + network = encoder.BigBirdEncoder( + num_layers=1, vocab_size=1024, max_position_embeddings=4096) + word_id_data = np.random.randint( + vocab_size, size=(batch_size, sequence_length)) + mask_data = np.random.randint(2, size=(batch_size, sequence_length)) + type_id_data = np.random.randint(2, size=(batch_size, sequence_length)) + outputs = network([word_id_data, mask_data, type_id_data]) + self.assertEqual(outputs["sequence_output"].shape, + (batch_size, sequence_length, 768)) + + def test_save_restore(self): + sequence_length = 1024 + batch_size = 2 + vocab_size = 1024 + network = encoder.BigBirdEncoder( + num_layers=1, vocab_size=1024, max_position_embeddings=4096) + word_id_data = np.random.randint( + vocab_size, size=(batch_size, sequence_length)) + mask_data = np.random.randint(2, size=(batch_size, sequence_length)) + type_id_data = np.random.randint(2, size=(batch_size, sequence_length)) + inputs = dict( +
input_word_ids=word_id_data, + input_mask=mask_data, + input_type_ids=type_id_data) + ref_outputs = network(inputs) + model_path = self.get_temp_dir() + "/model" + network.save(model_path) + loaded = tf.keras.models.load_model(model_path) + outputs = loaded(inputs) + self.assertAllClose(outputs["sequence_output"], + ref_outputs["sequence_output"]) + + +if __name__ == "__main__": + tf.test.main() diff --git a/official/projects/bigbird/experiment_configs.py b/official/projects/bigbird/experiment_configs.py new file mode 100644 index 0000000000000000000000000000000000000000..0ad3e4e5820ab5627bfc4a477e10f4274fdb3b33 --- /dev/null +++ b/official/projects/bigbird/experiment_configs.py @@ -0,0 +1,100 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Bigbird experiment configurations.""" +# pylint: disable=g-doc-return-or-yield,line-too-long +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import optimization +from official.nlp.data import question_answering_dataloader +from official.nlp.data import sentence_prediction_dataloader +from official.nlp.tasks import question_answering +from official.nlp.tasks import sentence_prediction + + +@exp_factory.register_config_factory('bigbird/glue') +def bigbird_glue() -> cfg.ExperimentConfig: + r"""BigBird GLUE.""" + config = cfg.ExperimentConfig( + task=sentence_prediction.SentencePredictionConfig( + train_data=sentence_prediction_dataloader + .SentencePredictionDataConfig(), + validation_data=sentence_prediction_dataloader + .SentencePredictionDataConfig( + is_training=False, drop_remainder=False)), + trainer=cfg.TrainerConfig( + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'adamw', + 'adamw': { + 'weight_decay_rate': + 0.01, + 'exclude_from_weight_decay': + ['LayerNorm', 'layer_norm', 'bias'], + } + }, + 'learning_rate': { + 'type': 'polynomial', + 'polynomial': { + 'initial_learning_rate': 3e-5, + 'end_learning_rate': 0.0, + } + }, + 'warmup': { + 'type': 'polynomial' + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + config.task.model.encoder.type = 'bigbird' + return config + + +@exp_factory.register_config_factory('bigbird/squad') +def bigbird_squad() -> cfg.ExperimentConfig: + r"""BigBird Squad V1/V2.""" + config = cfg.ExperimentConfig( + task=question_answering.QuestionAnsweringConfig( + train_data=question_answering_dataloader.QADataConfig(), + validation_data=question_answering_dataloader.QADataConfig()), + trainer=cfg.TrainerConfig( + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'adamw', + 'adamw': { + 'weight_decay_rate': + 0.01, + 
'exclude_from_weight_decay': + ['LayerNorm', 'layer_norm', 'bias'], + } + }, + 'learning_rate': { + 'type': 'polynomial', + 'polynomial': { + 'initial_learning_rate': 8e-5, + 'end_learning_rate': 0.0, + } + }, + 'warmup': { + 'type': 'polynomial' + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + config.task.model.encoder.type = 'bigbird' + return config diff --git a/official/nlp/projects/bigbird/experiments/glue_mnli_matched.yaml b/official/projects/bigbird/experiments/glue_mnli_matched.yaml similarity index 100% rename from official/nlp/projects/bigbird/experiments/glue_mnli_matched.yaml rename to official/projects/bigbird/experiments/glue_mnli_matched.yaml diff --git a/official/nlp/projects/bigbird/experiments/squad_v1.yaml b/official/projects/bigbird/experiments/squad_v1.yaml similarity index 100% rename from official/nlp/projects/bigbird/experiments/squad_v1.yaml rename to official/projects/bigbird/experiments/squad_v1.yaml diff --git a/official/nlp/projects/bigbird/recompute_grad.py b/official/projects/bigbird/recompute_grad.py similarity index 99% rename from official/nlp/projects/bigbird/recompute_grad.py rename to official/projects/bigbird/recompute_grad.py index d570ba848be467425f6cb3177fb1b8587a25632d..be9424d60316c19b9ba9ec46c997d07868410a41 100644 --- a/official/nlp/projects/bigbird/recompute_grad.py +++ b/official/projects/bigbird/recompute_grad.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
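The `BigBirdEncoder` docstring in this diff says that when `embedding_width` differs from `hidden_size`, the word embeddings are factorized into a `['vocab_size', 'embedding_width']` table plus an `['embedding_width', 'hidden_size']` projection (the `embedding_projection` EinsumDense layer, which also carries a bias over `hidden_size`). The plain-Python count below illustrates why that saves parameters. The vocabulary size and factorized width are illustrative assumptions, not values taken from any config in the diff.

```python
def word_embedding_params(vocab_size, hidden_size, embedding_width=None):
    """Counts word-embedding parameters for the BigBirdEncoder layout.

    Counts only the word embedding table and, when factorized, the
    embedding_projection kernel and bias. Position/type embeddings and
    the embedding layer norm are excluded.
    """
    if embedding_width is None or embedding_width == hidden_size:
        # No projection layer is built in this case.
        return vocab_size * hidden_size
    table = vocab_size * embedding_width
    projection = embedding_width * hidden_size + hidden_size  # kernel + bias
    return table + projection


# Illustrative sizes (assumptions): a 50k vocabulary, the encoder's default
# hidden_size of 768, and a factorized embedding_width of 128.
full = word_embedding_params(50_000, 768)
factorized = word_embedding_params(50_000, 768, embedding_width=128)
print(full, factorized, round(full / factorized, 2))
```

With these sizes the factorized layout needs roughly a sixth of the word-embedding parameters. The saving grows with vocabulary size, because only the small projection scales with `hidden_size`.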
diff --git a/official/nlp/projects/bigbird/recomputing_dropout.py b/official/projects/bigbird/recomputing_dropout.py similarity index 96% rename from official/nlp/projects/bigbird/recomputing_dropout.py rename to official/projects/bigbird/recomputing_dropout.py index 3a0cfa31c2143d2dd06505badf7f66a5af658d7a..fb3e565b9662413a9f48584ac325ac15805f7fd3 100644 --- a/official/nlp/projects/bigbird/recomputing_dropout.py +++ b/official/projects/bigbird/recomputing_dropout.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,8 +17,8 @@ import numpy as np import tensorflow as tf -from official.nlp.projects.bigbird import recompute_grad as recompute_grad_lib -from official.nlp.projects.bigbird import stateless_dropout as stateless_dropout_lib +from official.projects.bigbird import recompute_grad as recompute_grad_lib +from official.projects.bigbird import stateless_dropout as stateless_dropout_lib # Reimplements internal function diff --git a/official/nlp/projects/bigbird/stateless_dropout.py b/official/projects/bigbird/stateless_dropout.py similarity index 98% rename from official/nlp/projects/bigbird/stateless_dropout.py rename to official/projects/bigbird/stateless_dropout.py index d61b313b5465d7eb2ada787c70ad97035fd098d4..49941253c646bce30fa173881ee7d04d9ee82b14 100644 --- a/official/nlp/projects/bigbird/stateless_dropout.py +++ b/official/projects/bigbird/stateless_dropout.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/centernet/README.md b/official/projects/centernet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..b6a182aa6cc7dc730e31c5362daa1801b125fcb1 --- /dev/null +++ b/official/projects/centernet/README.md @@ -0,0 +1,82 @@ +# CenterNet + +[![Paper](http://img.shields.io/badge/Paper-arXiv.1904.07850-B3181B?logo=arXiv)](https://arxiv.org/abs/1904.07850) + +CenterNet builds upon CornerNet, an anchor-free model for object detection. + +Many other models, such as YOLO and RetinaNet, use anchor boxes. These anchor +boxes are predefined to be close to the aspect ratios and scales of the objects +in the training dataset. Anchor-based models do not predict the bounding boxes +of objects directly. They instead predict the location and size/shape +refinements to a predefined anchor box. The detection generator then computes +the final confidences, positions, and sizes of the detections. + +CornerNet eliminates the need for anchor boxes. RetinaNet needs thousands of +anchor boxes to cover the most common ground-truth boxes. This adds +unnecessary complexity to the model, which slows down training and creates an +imbalance between positive and negative anchor boxes. Instead, CornerNet +predicts heatmaps for the top-left and bottom-right corners and pairs them to +form the final detection boxes for the objects. CenterNet removes even more +complexity by predicting object centers instead of corners, so only one set of +heatmaps (one heatmap per class) is needed to locate each object. CenterNet +shows that this can be done without a significant drop in accuracy. + + +## Environment setup + +The code can be run on multiple GPUs or TPUs with different distribution +strategies. See the TensorFlow distributed training +[guide](https://www.tensorflow.org/guide/distributed_training) for an overview +of `tf.distribute`. + +The code is compatible with TensorFlow 2.5+.
See `requirements.txt` for all +prerequisites. You can install them with the following command: +`pip install -r ./official/requirements.txt` + +## Training +To train the model on COCO, run the following command: + +``` +python3 -m official.projects.centernet.train \ + --mode=train_and_eval \ + --experiment=centernet_hourglass_coco \ + --model_dir={MODEL_DIR} \ + --config_file={CONFIG_FILE} +``` + +## Configurations + +In the following table, we report the mAP measured on the `coco-val2017` set. + +Backbone | Config name | mAP +:--------------- | :-----------------------------------------------| -------: +Hourglass-104 | `coco-centernet-hourglass-gpu.yaml` | 40.01 +Hourglass-104 | `coco-centernet-hourglass-tpu.yaml` | 40.5 + +**Note:** `float16` (`bfloat16` for TPU) is used in the provided configurations. + + +## Cite + +[CenterNet](https://arxiv.org/abs/1904.07850): +``` +@article{Zhou2019ObjectsAP, + title={Objects as Points}, + author={Xingyi Zhou and Dequan Wang and Philipp Kr{\"a}henb{\"u}hl}, + journal={ArXiv}, + year={2019}, + volume={abs/1904.07850} +} +``` + +[CornerNet](https://arxiv.org/abs/1808.01244): +``` +@article{Law2019CornerNetDO, + title={CornerNet: Detecting Objects as Paired Keypoints}, + author={Hei Law and J. Deng}, + journal={International Journal of Computer Vision}, + year={2019}, + volume={128}, + pages={642-656} +} +``` diff --git a/official/projects/centernet/__init__.py b/official/projects/centernet/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/centernet/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/centernet/common/__init__.py b/official/projects/centernet/common/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/centernet/common/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/centernet/common/registry_imports.py b/official/projects/centernet/common/registry_imports.py new file mode 100644 index 0000000000000000000000000000000000000000..49631514340480654dcd7706bda011f3cbedb08c --- /dev/null +++ b/official/projects/centernet/common/registry_imports.py @@ -0,0 +1,22 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""All necessary imports for registration.""" + +# pylint: disable=unused-import +from official.projects.centernet.configs import centernet +from official.projects.centernet.modeling import centernet_model +from official.projects.centernet.modeling.backbones import hourglass +from official.projects.centernet.tasks import centernet as centernet_task +from official.vision import registry_imports diff --git a/official/projects/centernet/configs/__init__.py b/official/projects/centernet/configs/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/centernet/configs/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ diff --git a/official/projects/centernet/configs/backbones.py b/official/projects/centernet/configs/backbones.py new file mode 100644 index 0000000000000000000000000000000000000000..170aa4969324ef107c9f4a5d156a70d72e746aca --- /dev/null +++ b/official/projects/centernet/configs/backbones.py @@ -0,0 +1,35 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Backbones configurations.""" + +import dataclasses + +from official.modeling import hyperparams +from official.vision.configs import backbones + + +@dataclasses.dataclass +class Hourglass(hyperparams.Config): + """Hourglass config.""" + model_id: int = 52 + input_channel_dims: int = 128 + num_hourglasses: int = 2 + initial_downsample: bool = True + activation: str = 'relu' + + +@dataclasses.dataclass +class Backbone(backbones.Backbone): + hourglass: Hourglass = Hourglass() diff --git a/official/projects/centernet/configs/centernet.py b/official/projects/centernet/configs/centernet.py new file mode 100644 index 0000000000000000000000000000000000000000..14f950e1285334c2a0e19d4a46754c8a464c9d98 --- /dev/null +++ b/official/projects/centernet/configs/centernet.py @@ -0,0 +1,226 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""CenterNet configuration definition.""" + +import dataclasses +import os +from typing import List, Optional, Tuple + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import hyperparams +from official.modeling import optimization +from official.projects.centernet.configs import backbones +from official.vision.configs import common + + +TfExampleDecoderLabelMap = common.TfExampleDecoderLabelMap + + +@dataclasses.dataclass +class TfExampleDecoder(hyperparams.Config): + regenerate_source_id: bool = False + + +@dataclasses.dataclass +class DataDecoder(hyperparams.OneOfConfig): + type: Optional[str] = 'simple_decoder' + simple_decoder: TfExampleDecoder = TfExampleDecoder() + label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap() + + +@dataclasses.dataclass +class Parser(hyperparams.Config): + """Config for parser.""" + bgr_ordering: bool = True + aug_rand_hflip: bool = True + aug_scale_min: float = 1.0 + aug_scale_max: float = 1.0 + aug_rand_saturation: bool = False + aug_rand_brightness: bool = False + aug_rand_hue: bool = False + aug_rand_contrast: bool = False + odapi_augmentation: bool = False + channel_means: Tuple[float, float, float] = dataclasses.field( + default_factory=lambda: (104.01362025, 114.03422265, 119.9165958)) + channel_stds: Tuple[float, float, float] = dataclasses.field( + default_factory=lambda: (73.6027665, 69.89082075, 70.9150767)) + + +@dataclasses.dataclass +class DataConfig(cfg.DataConfig): + """Input config for training.""" + input_path: str = '' 
+ global_batch_size: int = 32 + is_training: bool = True + dtype: str = 'float16' + decoder: DataDecoder = DataDecoder() + parser: Parser = Parser() + shuffle_buffer_size: int = 10000 + file_type: str = 'tfrecord' + drop_remainder: bool = True + + +@dataclasses.dataclass +class DetectionLoss(hyperparams.Config): + object_center_weight: float = 1.0 + offset_weight: float = 1.0 + scale_weight: float = 0.1 + + +@dataclasses.dataclass +class Losses(hyperparams.Config): + detection: DetectionLoss = DetectionLoss() + gaussian_iou: float = 0.7 + class_offset: int = 1 + + +@dataclasses.dataclass +class CenterNetHead(hyperparams.Config): + heatmap_bias: float = -2.19 + input_levels: List[str] = dataclasses.field( + default_factory=lambda: ['2_0', '2']) + + +@dataclasses.dataclass +class CenterNetDetectionGenerator(hyperparams.Config): + max_detections: int = 100 + peak_error: float = 1e-6 + peak_extract_kernel_size: int = 3 + class_offset: int = 1 + use_nms: bool = False + nms_pre_thresh: float = 0.1 + nms_thresh: float = 0.4 + use_reduction_sum: bool = True + + +@dataclasses.dataclass +class CenterNetModel(hyperparams.Config): + """Config for centernet model.""" + num_classes: int = 90 + max_num_instances: int = 128 + input_size: List[int] = dataclasses.field(default_factory=list) + backbone: backbones.Backbone = backbones.Backbone( + type='hourglass', hourglass=backbones.Hourglass(model_id=52)) + head: CenterNetHead = CenterNetHead() + # pylint: disable=line-too-long + detection_generator: CenterNetDetectionGenerator = CenterNetDetectionGenerator() + norm_activation: common.NormActivation = common.NormActivation( + norm_momentum=0.1, norm_epsilon=1e-5, use_sync_bn=True) + + +@dataclasses.dataclass +class CenterNetDetection(hyperparams.Config): + # use_centers is the only option currently implemented.
+ use_centers: bool = True + + +@dataclasses.dataclass +class CenterNetSubTasks(hyperparams.Config): + detection: CenterNetDetection = CenterNetDetection() + + +@dataclasses.dataclass +class CenterNetTask(cfg.TaskConfig): + """Config for centernet task.""" + model: CenterNetModel = CenterNetModel() + train_data: DataConfig = DataConfig(is_training=True) + validation_data: DataConfig = DataConfig(is_training=False) + subtasks: CenterNetSubTasks = CenterNetSubTasks() + losses: Losses = Losses() + gradient_clip_norm: float = 10.0 + per_category_metrics: bool = False + weight_decay: float = 5e-4 + # Load checkpoints + init_checkpoint: Optional[str] = None + init_checkpoint_modules: str = 'all' + annotation_file: Optional[str] = None + + def get_output_length_dict(self): + task_outputs = {} + if self.subtasks.detection and self.subtasks.detection.use_centers: + task_outputs.update({ + 'ct_heatmaps': self.model.num_classes, + 'ct_offset': 2, + 'ct_size': 2 + }) + else: + raise ValueError('Only detection with center points is currently supported.') + return task_outputs + + +COCO_INPUT_PATH_BASE = 'coco' +COCO_TRAIN_EXAMPLES = 118287 +COCO_VAL_EXAMPLES = 5000 + + +@exp_factory.register_config_factory('centernet_hourglass_coco') +def centernet_hourglass_coco() -> cfg.ExperimentConfig: + """COCO object detection with CenterNet.""" + train_batch_size = 128 + eval_batch_size = 8 + steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size + + config = cfg.ExperimentConfig( + task=CenterNetTask( + annotation_file=os.path.join(COCO_INPUT_PATH_BASE, + 'instances_val2017.json'), + model=CenterNetModel(), + train_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + parser=Parser(), + shuffle_buffer_size=2), + validation_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + shuffle_buffer_size=2), + ), + trainer=cfg.TrainerConfig( +
steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + train_steps=150 * steps_per_epoch, + validation_steps=COCO_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'adam', + 'adam': { + 'epsilon': 1e-7 + } + }, + 'learning_rate': { + 'type': 'cosine', + 'cosine': { + 'initial_learning_rate': 0.001, + 'decay_steps': 150 * steps_per_epoch + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 2000, + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + return config diff --git a/official/vision/beta/projects/centernet/configs/centernet_test.py b/official/projects/centernet/configs/centernet_test.py similarity index 84% rename from official/vision/beta/projects/centernet/configs/centernet_test.py rename to official/projects/centernet/configs/centernet_test.py index 93e3b8f02665dcb2e4fc4cca24f18fed426c9256..06fbadd56ab9e64e0eb925913b6846660825c994 100644 --- a/official/vision/beta/projects/centernet/configs/centernet_test.py +++ b/official/projects/centernet/configs/centernet_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -19,8 +19,8 @@ import tensorflow as tf from official.core import config_definitions as cfg from official.core import exp_factory -from official.vision.beta.projects.centernet.common import registry_imports # pylint: disable=unused-import -from official.vision.beta.projects.centernet.configs import centernet as exp_cfg +from official.projects.centernet.common import registry_imports # pylint: disable=unused-import +from official.projects.centernet.configs import centernet as exp_cfg class CenterNetConfigTest(tf.test.TestCase, parameterized.TestCase): diff --git a/official/vision/beta/projects/centernet/configs/experiments/coco-centernet-hourglass-gpu.yaml b/official/projects/centernet/configs/experiments/coco-centernet-hourglass-gpu.yaml similarity index 82% rename from official/vision/beta/projects/centernet/configs/experiments/coco-centernet-hourglass-gpu.yaml rename to official/projects/centernet/configs/experiments/coco-centernet-hourglass-gpu.yaml index 6483de509f3d6f5a632b2dfd4fef6dab128d3332..a4d665e7a91384bf2317da14bedd8fd612ca7398 100644 --- a/official/vision/beta/projects/centernet/configs/experiments/coco-centernet-hourglass-gpu.yaml +++ b/official/projects/centernet/configs/experiments/coco-centernet-hourglass-gpu.yaml @@ -38,11 +38,11 @@ task: per_category_metrics: false weight_decay: 0.0005 gradient_clip_norm: 10.0 - annotation_file: 'coco/instances_val2017.json' - init_checkpoint: '/placer/prod/scratch/home/tf-model-garden-dev/vision/centernet/extremenet_hg104_512x512_coco17/2021-10-19' + annotation_file: '/readahead/200M/placer/prod/home/tensorflow-performance-data/datasets/coco/instances_val2017.json' + init_checkpoint: gs://tf_model_garden/vision/centernet/extremenet_hg104_512x512_coco17 init_checkpoint_modules: 'backbone' train_data: - input_path: 'coco/train*' + input_path: '/readahead/200M/placer/prod/home/tensorflow-performance-data/datasets/coco/train*' drop_remainder: true dtype: 'float16' global_batch_size: 64 @@ -57,7 +57,7 @@ task: 
aug_rand_contrast: true odapi_augmentation: true validation_data: - input_path: 'coco/val*' + input_path: '/readahead/200M/placer/prod/home/tensorflow-performance-data/datasets/coco/val*' drop_remainder: false dtype: 'float16' global_batch_size: 16 diff --git a/official/vision/beta/projects/centernet/configs/experiments/coco-centernet-hourglass-tpu.yaml b/official/projects/centernet/configs/experiments/coco-centernet-hourglass-tpu.yaml similarity index 80% rename from official/vision/beta/projects/centernet/configs/experiments/coco-centernet-hourglass-tpu.yaml rename to official/projects/centernet/configs/experiments/coco-centernet-hourglass-tpu.yaml index 23456708349b8507a683f9ee24ca60317ba5e42d..5d60831e5ed840bc3d1385929e1b61152f903f03 100644 --- a/official/vision/beta/projects/centernet/configs/experiments/coco-centernet-hourglass-tpu.yaml +++ b/official/projects/centernet/configs/experiments/coco-centernet-hourglass-tpu.yaml @@ -37,11 +37,11 @@ task: per_category_metrics: false weight_decay: 0.0005 gradient_clip_norm: 10.0 - annotation_file: 'coco/instances_val2017.json' - init_checkpoint: '/placer/prod/scratch/home/tf-model-garden-dev/vision/centernet/extremenet_hg104_512x512_coco17/2021-10-19' + annotation_file: '/readahead/200M/placer/prod/home/tensorflow-performance-data/datasets/coco/instances_val2017.json' + init_checkpoint: gs://tf_model_garden/vision/centernet/extremenet_hg104_512x512_coco17 init_checkpoint_modules: 'backbone' train_data: - input_path: 'coco/train*' + input_path: '/readahead/200M/placer/prod/home/tensorflow-performance-data/datasets/coco/train*' drop_remainder: true dtype: 'bfloat16' global_batch_size: 128 @@ -56,14 +56,14 @@ task: aug_rand_contrast: true odapi_augmentation: true validation_data: - input_path: 'coco/val*' + input_path: '/readahead/200M/placer/prod/home/tensorflow-performance-data/datasets/coco/val*' drop_remainder: false dtype: 'bfloat16' - global_batch_size: 16 + global_batch_size: 64 is_training: false trainer: 
train_steps: 140000 - validation_steps: 78 # 5000 / 16 + validation_steps: 78 # 5000 / 64 steps_per_loop: 924 # 118287 / 128 validation_interval: 924 summary_interval: 924 diff --git a/official/projects/centernet/dataloaders/__init__.py b/official/projects/centernet/dataloaders/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/centernet/dataloaders/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/vision/beta/projects/centernet/dataloaders/centernet_input.py b/official/projects/centernet/dataloaders/centernet_input.py similarity index 96% rename from official/vision/beta/projects/centernet/dataloaders/centernet_input.py rename to official/projects/centernet/dataloaders/centernet_input.py index 373b7a87e364d0fa58350287e14f62aa9d0e10ea..b44d98d213cb88d06010ddc6dc38f90cf4d07f87 100644 --- a/official/vision/beta/projects/centernet/dataloaders/centernet_input.py +++ b/official/projects/centernet/dataloaders/centernet_input.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -18,13 +18,13 @@ from typing import Tuple import tensorflow as tf -from official.vision.beta.dataloaders import parser -from official.vision.beta.dataloaders import utils -from official.vision.beta.ops import box_ops -from official.vision.beta.ops import preprocess_ops -from official.vision.beta.projects.centernet.ops import box_list -from official.vision.beta.projects.centernet.ops import box_list_ops -from official.vision.beta.projects.centernet.ops import preprocess_ops as cn_prep_ops +from official.projects.centernet.ops import box_list +from official.projects.centernet.ops import box_list_ops +from official.projects.centernet.ops import preprocess_ops as cn_prep_ops +from official.vision.dataloaders import parser +from official.vision.dataloaders import utils +from official.vision.ops import box_ops +from official.vision.ops import preprocess_ops CHANNEL_MEANS = (104.01362025, 114.03422265, 119.9165958) diff --git a/official/projects/centernet/losses/__init__.py b/official/projects/centernet/losses/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/centernet/losses/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ diff --git a/official/vision/beta/projects/centernet/losses/centernet_losses.py b/official/projects/centernet/losses/centernet_losses.py similarity index 98% rename from official/vision/beta/projects/centernet/losses/centernet_losses.py rename to official/projects/centernet/losses/centernet_losses.py index a83f8ae8143b7824f16c6c1f0cd8c29f3ab924aa..4cb7b0fe8d0eb9df1a84da763cd0b8c5eb80f229 100644 --- a/official/vision/beta/projects/centernet/losses/centernet_losses.py +++ b/official/projects/centernet/losses/centernet_losses.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/beta/projects/centernet/losses/centernet_losses_test.py b/official/projects/centernet/losses/centernet_losses_test.py similarity index 96% rename from official/vision/beta/projects/centernet/losses/centernet_losses_test.py rename to official/projects/centernet/losses/centernet_losses_test.py index ac1e699a2af4eaab8fbcb6dc5d39201db324388a..3be0341d456b3fb70dea69c5a9055abf80269694 100644 --- a/official/vision/beta/projects/centernet/losses/centernet_losses_test.py +++ b/official/projects/centernet/losses/centernet_losses_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -17,7 +17,7 @@ import numpy as np import tensorflow as tf -from official.vision.beta.projects.centernet.losses import centernet_losses +from official.projects.centernet.losses import centernet_losses LOG_2 = np.log(2) LOG_3 = np.log(3) diff --git a/official/projects/centernet/modeling/__init__.py b/official/projects/centernet/modeling/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/centernet/modeling/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/centernet/modeling/backbones/__init__.py b/official/projects/centernet/modeling/backbones/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/centernet/modeling/backbones/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/vision/beta/projects/centernet/modeling/backbones/hourglass.py b/official/projects/centernet/modeling/backbones/hourglass.py similarity index 96% rename from official/vision/beta/projects/centernet/modeling/backbones/hourglass.py rename to official/projects/centernet/modeling/backbones/hourglass.py index b3f5ba394655f3a03ec31f1e7d53612ad8519394..c369e20fc64dbbebe2654ad1174b650daf38dddc 100644 --- a/official/vision/beta/projects/centernet/modeling/backbones/hourglass.py +++ b/official/projects/centernet/modeling/backbones/hourglass.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -19,10 +19,10 @@ from typing import Optional import tensorflow as tf from official.modeling import hyperparams -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.modeling.backbones import mobilenet -from official.vision.beta.modeling.layers import nn_blocks -from official.vision.beta.projects.centernet.modeling.layers import cn_nn_blocks +from official.projects.centernet.modeling.layers import cn_nn_blocks +from official.vision.modeling.backbones import factory +from official.vision.modeling.backbones import mobilenet +from official.vision.modeling.layers import nn_blocks HOURGLASS_SPECS = { 10: { diff --git a/official/vision/beta/projects/centernet/modeling/backbones/hourglass_test.py b/official/projects/centernet/modeling/backbones/hourglass_test.py similarity index 77% rename from official/vision/beta/projects/centernet/modeling/backbones/hourglass_test.py rename to official/projects/centernet/modeling/backbones/hourglass_test.py index 3e5af61024e5503278174a2da0b6a2d3283e50c5..3217608bd360a6ed8fd87a1ec7e118c83a518f32 100644 --- a/official/vision/beta/projects/centernet/modeling/backbones/hourglass_test.py +++ b/official/projects/centernet/modeling/backbones/hourglass_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -18,10 +18,10 @@ from absl.testing import parameterized import numpy as np import tensorflow as tf -from official.vision.beta.configs import common -from official.vision.beta.projects.centernet.common import registry_imports # pylint: disable=unused-import -from official.vision.beta.projects.centernet.configs import backbones -from official.vision.beta.projects.centernet.modeling.backbones import hourglass +from official.projects.centernet.common import registry_imports # pylint: disable=unused-import +from official.projects.centernet.configs import backbones +from official.projects.centernet.modeling.backbones import hourglass +from official.vision.configs import common class HourglassTest(tf.test.TestCase, parameterized.TestCase): diff --git a/official/vision/beta/projects/centernet/modeling/centernet_model.py b/official/projects/centernet/modeling/centernet_model.py similarity index 97% rename from official/vision/beta/projects/centernet/modeling/centernet_model.py rename to official/projects/centernet/modeling/centernet_model.py index e35adf9ac1dc893d6ad88e0d16aaf9cb8f29546d..3b8de7534b2639af37a9901d6f8912da8f0477af 100644 --- a/official/vision/beta/projects/centernet/modeling/centernet_model.py +++ b/official/projects/centernet/modeling/centernet_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/vision/beta/projects/centernet/modeling/centernet_model_test.py b/official/projects/centernet/modeling/centernet_model_test.py similarity index 80% rename from official/vision/beta/projects/centernet/modeling/centernet_model_test.py rename to official/projects/centernet/modeling/centernet_model_test.py index 6fa767f3e2c0876c7334a2f32d114b959ba97961..f4dc3cccc6130f7354732c8347a6db43b5617ce5 100644 --- a/official/vision/beta/projects/centernet/modeling/centernet_model_test.py +++ b/official/projects/centernet/modeling/centernet_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,12 +17,12 @@ from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.configs import common -from official.vision.beta.projects.centernet.configs import backbones -from official.vision.beta.projects.centernet.modeling import centernet_model -from official.vision.beta.projects.centernet.modeling.backbones import hourglass -from official.vision.beta.projects.centernet.modeling.heads import centernet_head -from official.vision.beta.projects.centernet.modeling.layers import detection_generator +from official.projects.centernet.configs import backbones +from official.projects.centernet.modeling import centernet_model +from official.projects.centernet.modeling.backbones import hourglass +from official.projects.centernet.modeling.heads import centernet_head +from official.projects.centernet.modeling.layers import detection_generator +from official.vision.configs import common class CenterNetTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/projects/centernet/modeling/heads/__init__.py b/official/projects/centernet/modeling/heads/__init__.py new file mode 100644 index 
0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/centernet/modeling/heads/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/vision/beta/projects/centernet/modeling/heads/centernet_head.py b/official/projects/centernet/modeling/heads/centernet_head.py similarity index 93% rename from official/vision/beta/projects/centernet/modeling/heads/centernet_head.py rename to official/projects/centernet/modeling/heads/centernet_head.py index d493076c7149f2ad9cc808c4bf95fdce307b4a43..37703c1c761baac3b3d8ebba446c303748330617 100644 --- a/official/vision/beta/projects/centernet/modeling/heads/centernet_head.py +++ b/official/projects/centernet/modeling/heads/centernet_head.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -14,11 +14,11 @@ """Contains the definitions of head for CenterNet.""" -from typing import Any, Mapping, Dict, List +from typing import Any, Dict, List, Mapping import tensorflow as tf -from official.vision.beta.projects.centernet.modeling.layers import cn_nn_blocks +from official.projects.centernet.modeling.layers import cn_nn_blocks class CenterNetHead(tf.keras.Model): @@ -61,7 +61,6 @@ class CenterNetHead(tf.keras.Model): self._heatmap_bias = heatmap_bias self._num_inputs = len(input_levels) - input_levels = sorted(self._input_specs.keys()) inputs = {level: tf.keras.layers.Input(shape=self._input_specs[level][1:]) for level in input_levels} outputs = {} diff --git a/official/vision/beta/projects/centernet/modeling/heads/centernet_head_test.py b/official/projects/centernet/modeling/heads/centernet_head_test.py similarity index 94% rename from official/vision/beta/projects/centernet/modeling/heads/centernet_head_test.py rename to official/projects/centernet/modeling/heads/centernet_head_test.py index f1497a7e9a9275b1bda18bdd8eeaa368f6ec4c77..269d8c9ba1264634517322326650568715e7b8fe 100644 --- a/official/vision/beta/projects/centernet/modeling/heads/centernet_head_test.py +++ b/official/projects/centernet/modeling/heads/centernet_head_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -18,7 +18,7 @@ from absl.testing import parameterized import numpy as np import tensorflow as tf -from official.vision.beta.projects.centernet.modeling.heads import centernet_head +from official.projects.centernet.modeling.heads import centernet_head class CenterNetHeadTest(tf.test.TestCase, parameterized.TestCase): diff --git a/official/projects/centernet/modeling/layers/__init__.py b/official/projects/centernet/modeling/layers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/centernet/modeling/layers/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/vision/beta/projects/centernet/modeling/layers/cn_nn_blocks.py b/official/projects/centernet/modeling/layers/cn_nn_blocks.py similarity index 99% rename from official/vision/beta/projects/centernet/modeling/layers/cn_nn_blocks.py rename to official/projects/centernet/modeling/layers/cn_nn_blocks.py index f8d395cb694423026dbed86591fd0e75f9473ed8..eba920e428397a59515abfbdbc1643264aeca84d 100644 --- a/official/vision/beta/projects/centernet/modeling/layers/cn_nn_blocks.py +++ b/official/projects/centernet/modeling/layers/cn_nn_blocks.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,7 +18,7 @@ from typing import List, Optional import tensorflow as tf -from official.vision.beta.modeling.layers import nn_blocks +from official.vision.modeling.layers import nn_blocks def _apply_blocks(inputs, blocks): diff --git a/official/vision/beta/projects/centernet/modeling/layers/cn_nn_blocks_test.py b/official/projects/centernet/modeling/layers/cn_nn_blocks_test.py similarity index 96% rename from official/vision/beta/projects/centernet/modeling/layers/cn_nn_blocks_test.py rename to official/projects/centernet/modeling/layers/cn_nn_blocks_test.py index 5ad90b496567e73ee643b110c3392c2b72324354..b66232d895f6a3561cd108037995641f23419a73 100644 --- a/official/vision/beta/projects/centernet/modeling/layers/cn_nn_blocks_test.py +++ b/official/projects/centernet/modeling/layers/cn_nn_blocks_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -21,8 +21,8 @@ from absl.testing import parameterized import numpy as np import tensorflow as tf -from official.vision.beta.modeling.layers import nn_blocks -from official.vision.beta.projects.centernet.modeling.layers import cn_nn_blocks +from official.projects.centernet.modeling.layers import cn_nn_blocks +from official.vision.modeling.layers import nn_blocks class HourglassBlockPyTorch(tf.keras.layers.Layer): diff --git a/official/projects/centernet/modeling/layers/detection_generator.py b/official/projects/centernet/modeling/layers/detection_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..899f97f0c021b4bb608c3e953f09610e634047fb --- /dev/null +++ b/official/projects/centernet/modeling/layers/detection_generator.py @@ -0,0 +1,339 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Detection generator for centernet. + +Parses predictions from the CenterNet head into the final bounding boxes, +confidences, and classes. 
This class contains repurposed methods from the +TensorFlow Object Detection API +in: https://github.com/tensorflow/models/blob/master/research/object_detection +/meta_architectures/center_net_meta_arch.py +""" + +from typing import Any, Mapping + +import tensorflow as tf + +from official.projects.centernet.ops import loss_ops +from official.projects.centernet.ops import nms_ops +from official.vision.ops import box_ops + + +class CenterNetDetectionGenerator(tf.keras.layers.Layer): + """CenterNet Detection Generator.""" + + def __init__(self, + input_image_dims: int = 512, + net_down_scale: int = 4, + max_detections: int = 100, + peak_error: float = 1e-6, + peak_extract_kernel_size: int = 3, + class_offset: int = 1, + use_nms: bool = False, + nms_pre_thresh: float = 0.1, + nms_thresh: float = 0.4, + **kwargs): + """Initialize CenterNet Detection Generator. + + Args: + input_image_dims: An `int` that specifies the input image size. + net_down_scale: An `int` that specifies the stride of the output. + max_detections: An `int` specifying the maximum number of bounding + boxes generated. This is an upper bound, so the number of generated + boxes may be less than this due to thresholding/non-maximum suppression. + peak_error: A `float` for determining non-valid heatmap locations to mask. + peak_extract_kernel_size: An `int` indicating the kernel size used when + performing max-pool over the heatmaps to detect valid center locations + from their neighbors. From the paper, set this to 3 to detect valid + locations that have responses greater than their 8-connected neighbors. + class_offset: An `int` offset to add to the class + prediction if the dataset labels have been shifted. + use_nms: A `bool` for whether or not to use non-maximum suppression to + filter the bounding boxes. + nms_pre_thresh: A `float` for pre-nms threshold. + nms_thresh: A `float` for nms threshold. + **kwargs: Additional keyword arguments to be passed.
+ """ + super(CenterNetDetectionGenerator, self).__init__(**kwargs) + + # Object center selection parameters + self._max_detections = max_detections + self._peak_error = peak_error + self._peak_extract_kernel_size = peak_extract_kernel_size + + # Used for adjusting class prediction + self._class_offset = class_offset + + # Box normalization parameters + self._net_down_scale = net_down_scale + self._input_image_dims = input_image_dims + + self._use_nms = use_nms + self._nms_pre_thresh = nms_pre_thresh + self._nms_thresh = nms_thresh + + def process_heatmap(self, + feature_map: tf.Tensor, + kernel_size: int) -> tf.Tensor: + """Processes the heatmap into peaks for box selection. + + Given a heatmap, this function first masks out nearby heatmap locations of + the same class using max-pooling such that, ideally, only one center for the + object remains: a location is kept only if its (sigmoid) score is within + `peak_error` of the max-pooled score. NOTE: Repurposed from Google OD API. + + Args: + feature_map: A Tensor with shape [batch_size, height, width, num_classes] + containing the raw (pre-sigmoid) center heatmap predictions. + kernel_size: An integer value for max-pool kernel size. + + Returns: + A Tensor with the same shape as the input but with non-valid center + prediction locations masked out. + """ + + feature_map = tf.math.sigmoid(feature_map) + if not kernel_size or kernel_size == 1: + feature_map_peaks = feature_map + else: + feature_map_max_pool = tf.nn.max_pool( + feature_map, + ksize=kernel_size, + strides=1, + padding='SAME') + + feature_map_peak_mask = tf.math.abs( + feature_map - feature_map_max_pool) < self._peak_error + + # Zero out everything that is not a peak.
+ feature_map_peaks = ( + feature_map * tf.cast(feature_map_peak_mask, feature_map.dtype)) + + return feature_map_peaks + + def get_top_k_peaks(self, + feature_map_peaks: tf.Tensor, + batch_size: int, + width: int, + num_classes: int, + k: int = 100): + """Gets the scores and indices of the top-k peaks from the feature map. + + This function flattens the feature map in order to retrieve the top-k + peaks, then computes the x, y, and class indices for those scores. + NOTE: Repurposed from Google OD API. + + Args: + feature_map_peaks: A `Tensor` with shape [batch_size, height, + width, num_classes] containing the processed center heatmap peaks. + batch_size: An `int` that indicates the batch size of the input. + width: An `int` that indicates the width (and also height) of the input. + num_classes: An `int` for the number of possible classes. This is also + the channel depth of the input. + k: An `int` that controls how many peaks to select. + + Returns: + top_scores: A Tensor with shape [batch_size, k] containing the top-k + scores. + y_indices: A Tensor with shape [batch_size, k] containing the top-k + y-indices corresponding to top_scores. + x_indices: A Tensor with shape [batch_size, k] containing the top-k + x-indices corresponding to top_scores. + channel_indices: A Tensor with shape [batch_size, k] containing the top-k + channel indices corresponding to top_scores. + """ + # Flatten the entire prediction per batch + feature_map_peaks_flat = tf.reshape(feature_map_peaks, [batch_size, -1]) + + # top_scores and top_indices have shape [batch_size, k] + top_scores, top_indices = tf.math.top_k(feature_map_peaks_flat, k=k) + + # Get x, y and channel indices corresponding to the top indices in the flat + # array.
+ y_indices, x_indices, channel_indices = ( + loss_ops.get_row_col_channel_indices_from_flattened_indices( + top_indices, width, num_classes)) + + return top_scores, y_indices, x_indices, channel_indices + + def get_boxes(self, + y_indices: tf.Tensor, + x_indices: tf.Tensor, + channel_indices: tf.Tensor, + height_width_predictions: tf.Tensor, + offset_predictions: tf.Tensor, + num_boxes: int): + """Organizes prediction information into the final bounding boxes. + + NOTE: Repurposed from Google OD API. + + Args: + y_indices: A Tensor with shape [batch_size, k] containing the top-k + y-indices corresponding to top_scores. + x_indices: A Tensor with shape [batch_size, k] containing the top-k + x-indices corresponding to top_scores. + channel_indices: A Tensor with shape [batch_size, k] containing the top-k + channel indices corresponding to top_scores. + height_width_predictions: A Tensor with shape [batch_size, height, + width, 2] containing the object size predictions. + offset_predictions: A Tensor with shape [batch_size, height, width, 2] + containing the object local offset predictions. + num_boxes: An `int` for the number of boxes. + + Returns: + boxes: A Tensor with shape [batch_size, num_boxes, 4] that contains the + bounding box coordinates in [y_min, x_min, y_max, x_max] format. + detection_classes: A Tensor with shape [batch_size, num_boxes] that + gives the class prediction for each box. + num_detections: Number of non-zero confidence detections made. + """ + # TF Lite does not support tf.gather with batch_dims > 0, so we need to use + # tf.gather_nd instead and here we prepare the indices for that.
+ + # shapes of heatmap output + shape = tf.shape(height_width_predictions) + batch_size, height, width = shape[0], shape[1], shape[2] + + # combined indices dtype=int32 + combined_indices = tf.stack([ + loss_ops.multi_range(batch_size, value_repetitions=num_boxes), + tf.reshape(y_indices, [-1]), + tf.reshape(x_indices, [-1]) + ], axis=1) + + new_height_width = tf.gather_nd(height_width_predictions, combined_indices) + new_height_width = tf.reshape(new_height_width, [batch_size, num_boxes, 2]) + height_width = tf.maximum(new_height_width, 0.0) + + # height and widths dtype=float32 + heights = height_width[..., 0] + widths = height_width[..., 1] + + # Get the offsets of center points + new_offsets = tf.gather_nd(offset_predictions, combined_indices) + offsets = tf.reshape(new_offsets, [batch_size, num_boxes, 2]) + + # offsets are dtype=float32 + y_offsets = offsets[..., 0] + x_offsets = offsets[..., 1] + + y_indices = tf.cast(y_indices, dtype=heights.dtype) + x_indices = tf.cast(x_indices, dtype=widths.dtype) + + detection_classes = channel_indices + self._class_offset + ymin = y_indices + y_offsets - heights / 2.0 + xmin = x_indices + x_offsets - widths / 2.0 + ymax = y_indices + y_offsets + heights / 2.0 + xmax = x_indices + x_offsets + widths / 2.0 + + ymin = tf.clip_by_value(ymin, 0., tf.cast(height, ymin.dtype)) + xmin = tf.clip_by_value(xmin, 0., tf.cast(width, xmin.dtype)) + ymax = tf.clip_by_value(ymax, 0., tf.cast(height, ymax.dtype)) + xmax = tf.clip_by_value(xmax, 0., tf.cast(width, xmax.dtype)) + boxes = tf.stack([ymin, xmin, ymax, xmax], axis=2) + + return boxes, detection_classes + + def convert_strided_predictions_to_normalized_boxes(self, boxes: tf.Tensor): + boxes = boxes * tf.cast(self._net_down_scale, boxes.dtype) + boxes = boxes / tf.cast(self._input_image_dims, boxes.dtype) + boxes = tf.clip_by_value(boxes, 0.0, 1.0) + return boxes + + def __call__(self, inputs): + # Get heatmaps from decoded outputs via final hourglass stack output + 
all_ct_heatmaps = inputs['ct_heatmaps'] + all_ct_sizes = inputs['ct_size'] + all_ct_offsets = inputs['ct_offset'] + + ct_heatmaps = all_ct_heatmaps[-1] + ct_sizes = all_ct_sizes[-1] + ct_offsets = all_ct_offsets[-1] + + shape = tf.shape(ct_heatmaps) + + _, width = shape[1], shape[2] + batch_size, num_channels = shape[0], shape[3] + + # Apply sigmoid, then keep only max-pool peaks (3x3 kernel by default) + peaks = self.process_heatmap( + feature_map=ct_heatmaps, + kernel_size=self._peak_extract_kernel_size) + + # Get top scores along with their x, y, and class + # Each has shape [batch_size, k] + scores, y_indices, x_indices, channel_indices = self.get_top_k_peaks( + feature_map_peaks=peaks, + batch_size=batch_size, + width=width, + num_classes=num_channels, + k=self._max_detections) + + # Parse the score and indices into bounding boxes + boxes, classes = self.get_boxes( + y_indices=y_indices, + x_indices=x_indices, + channel_indices=channel_indices, + height_width_predictions=ct_sizes, + offset_predictions=ct_offsets, + num_boxes=self._max_detections) + + # Normalize bounding boxes + boxes = self.convert_strided_predictions_to_normalized_boxes(boxes) + + # Apply nms + if self._use_nms: + boxes = tf.expand_dims(boxes, axis=-2) + multi_class_scores = tf.gather_nd( + peaks, tf.stack([y_indices, x_indices], -1), batch_dims=1) + + boxes, _, scores = nms_ops.nms( + boxes=boxes, + classes=multi_class_scores, + confidence=scores, + k=self._max_detections, + limit_pre_thresh=True, + pre_nms_thresh=self._nms_pre_thresh, + nms_thresh=self._nms_thresh) + + num_det = tf.reduce_sum(tf.cast(scores > 0, dtype=tf.int32), axis=1) + boxes = box_ops.denormalize_boxes( + boxes, [self._input_image_dims, self._input_image_dims]) + + return { + 'boxes': boxes, + 'classes': classes, + 'confidence': scores, + 'num_detections': num_det + } + + def get_config(self) -> Mapping[str, Any]: + config = { + 'max_detections': self._max_detections, + 'peak_error': self._peak_error, + 
'peak_extract_kernel_size': self._peak_extract_kernel_size,
+ 'class_offset': self._class_offset, + 'net_down_scale': self._net_down_scale, + 'input_image_dims': self._input_image_dims, + 'use_nms': self._use_nms, + 'nms_pre_thresh': self._nms_pre_thresh, + 'nms_thresh': self._nms_thresh + } + + base_config = super(CenterNetDetectionGenerator, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + @classmethod + def from_config(cls, config): + return cls(**config) diff --git a/official/projects/centernet/ops/__init__.py b/official/projects/centernet/ops/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/centernet/ops/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/vision/beta/projects/centernet/ops/box_list.py b/official/projects/centernet/ops/box_list.py similarity index 99% rename from official/vision/beta/projects/centernet/ops/box_list.py rename to official/projects/centernet/ops/box_list.py index 6de3b975d02e36ce88bed06e83772b8ee9c2c0e8..4e93b9fd631f6f37fc89c013e7b1dc2428e686d6 100644 --- a/official/vision/beta/projects/centernet/ops/box_list.py +++ b/official/projects/centernet/ops/box_list.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/beta/projects/centernet/ops/box_list_ops.py b/official/projects/centernet/ops/box_list_ops.py similarity index 98% rename from official/vision/beta/projects/centernet/ops/box_list_ops.py rename to official/projects/centernet/ops/box_list_ops.py index 998c32cf0292c41ac535819d74139c6dd8f7cdc3..d419be84af33c3a4d2f271c7822d3e01a3755513 100644 --- a/official/vision/beta/projects/centernet/ops/box_list_ops.py +++ b/official/projects/centernet/ops/box_list_ops.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,8 +16,8 @@ import tensorflow as tf -from official.vision.beta.ops import sampling_ops -from official.vision.beta.projects.centernet.ops import box_list +from official.projects.centernet.ops import box_list +from official.vision.ops import sampling_ops def _copy_extra_fields(boxlist_to_copy_to, boxlist_to_copy_from): diff --git a/official/vision/beta/projects/centernet/ops/loss_ops.py b/official/projects/centernet/ops/loss_ops.py similarity index 98% rename from official/vision/beta/projects/centernet/ops/loss_ops.py rename to official/projects/centernet/ops/loss_ops.py index dfb585f6ff632975b8c778eac098f81490707989..db7875c110e71d779cb1f768ed3d1a9ef02d8eb3 100644 --- a/official/vision/beta/projects/centernet/ops/loss_ops.py +++ b/official/projects/centernet/ops/loss_ops.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -16,7 +16,7 @@ import tensorflow as tf -from official.vision.beta.ops import sampling_ops +from official.vision.ops import sampling_ops def _get_shape(tensor, num_dims): diff --git a/official/vision/beta/projects/centernet/ops/nms_ops.py b/official/projects/centernet/ops/nms_ops.py similarity index 96% rename from official/vision/beta/projects/centernet/ops/nms_ops.py rename to official/projects/centernet/ops/nms_ops.py index c331b62159167464a23912ae8898b11d7de5466c..1da690b6f1a90c9da941882db2512732ad594bc3 100644 --- a/official/vision/beta/projects/centernet/ops/nms_ops.py +++ b/official/projects/centernet/ops/nms_ops.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,7 +16,7 @@ import tensorflow as tf -from official.vision.beta.projects.yolo.ops import box_ops +from official.projects.yolo.ops import box_ops NMS_TILE_SIZE = 512 diff --git a/official/projects/centernet/ops/preprocess_ops.py b/official/projects/centernet/ops/preprocess_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..38f16afb3e1c40c597756186ac06235f63fec470 --- /dev/null +++ b/official/projects/centernet/ops/preprocess_ops.py @@ -0,0 +1,496 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""Preprocessing ops imported from OD API.""" + +import functools + +import tensorflow as tf + +from official.projects.centernet.ops import box_list +from official.projects.centernet.ops import box_list_ops + + +def _get_or_create_preprocess_rand_vars(generator_func, + function_id, + preprocess_vars_cache, + key=''): + """Returns a tensor stored in preprocess_vars_cache or using generator_func. + + If the tensor was previously generated and appears in the PreprocessorCache, + the previously generated tensor will be returned. Otherwise, a new tensor + is generated using generator_func and stored in the cache. + + Args: + generator_func: A 0-argument function that generates a tensor. + function_id: identifier for the preprocessing function used. + preprocess_vars_cache: PreprocessorCache object that records previously + performed augmentations. Updated in-place. If this + function is called multiple times with the same + non-null cache, it will perform deterministically. + key: identifier for the variable stored. + + Returns: + The generated tensor. + """ + if preprocess_vars_cache is not None: + var = preprocess_vars_cache.get(function_id, key) + if var is None: + var = generator_func() + preprocess_vars_cache.update(function_id, key, var) + else: + var = generator_func() + return var + + +def _random_integer(minval, maxval, seed): + """Returns a random 0-D tensor between minval and maxval. + + Args: + minval: minimum value of the random tensor. + maxval: maximum value of the random tensor. + seed: random seed. + + Returns: + A random 0-D tensor between minval and maxval. 
+ """ + return tf.random.uniform( + [], minval=minval, maxval=maxval, dtype=tf.int32, seed=seed) + + +def _get_crop_border(border, size): + """Get the border of cropping.""" + + border = tf.cast(border, tf.float32) + size = tf.cast(size, tf.float32) + + i = tf.math.ceil(tf.math.log(2.0 * border / size) / tf.math.log(2.0)) + divisor = tf.pow(2.0, i) + divisor = tf.clip_by_value(divisor, 1, border) + divisor = tf.cast(divisor, tf.int32) + + return tf.cast(border, tf.int32) // divisor + + +def random_square_crop_by_scale(image, + boxes, + labels, + max_border=128, + scale_min=0.6, + scale_max=1.3, + num_scales=8, + seed=None, + preprocess_vars_cache=None): + """Randomly crop a square in proportion to scale and image size. + + Extract a square sized crop from an image whose side length is sampled by + randomly scaling the maximum spatial dimension of the image. If part of + the crop falls outside the image, it is filled with zeros. + The augmentation is borrowed from [1] + [1]: https://arxiv.org/abs/1904.07850 + + Args: + image: rank 3 float32 tensor containing 1 image -> + [height, width, channels]. + boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. + Boxes are in normalized form meaning their coordinates vary + between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax]. + Boxes on the crop boundary are clipped to the boundary and boxes + falling outside the crop are ignored. + labels: rank 1 int32 tensor containing the object classes. + max_border: The maximum size of the border. The border defines distance in + pixels to the image boundaries that will not be considered as a center of + a crop. To make sure that the border does not go over the center of the + image, we chose the border value by computing the minimum k, such that + (max_border / (2**k)) < image_dimension/2. + scale_min: float, the minimum value for scale. + scale_max: float, the maximum value for scale. 
+ num_scales: int, the number of discrete scale values to sample between + [scale_min, scale_max] + seed: random seed. + preprocess_vars_cache: PreprocessorCache object that records previously + performed augmentations. Updated in-place. If this + function is called multiple times with the same + non-null cache, it will perform deterministically. + + + Returns: + image: image which is the same rank as input image. + boxes: boxes which is the same rank as input boxes. + Boxes are in normalized form. + labels: new labels. + + """ + + img_shape = tf.shape(image) + height, width = img_shape[0], img_shape[1] + scales = tf.linspace(scale_min, scale_max, num_scales) + + scale = _get_or_create_preprocess_rand_vars( + lambda: scales[_random_integer(0, num_scales, seed)], + 'square_crop_scale', + preprocess_vars_cache, 'scale') + + image_size = scale * tf.cast(tf.maximum(height, width), tf.float32) + image_size = tf.cast(image_size, tf.int32) + h_border = _get_crop_border(max_border, height) + w_border = _get_crop_border(max_border, width) + + def y_function(): + y = _random_integer(h_border, + tf.cast(height, tf.int32) - h_border + 1, + seed) + return y + + def x_function(): + x = _random_integer(w_border, + tf.cast(width, tf.int32) - w_border + 1, + seed) + return x + + y_center = _get_or_create_preprocess_rand_vars( + y_function, + 'square_crop_scale', + preprocess_vars_cache, 'y_center') + + x_center = _get_or_create_preprocess_rand_vars( + x_function, + 'square_crop_scale', + preprocess_vars_cache, 'x_center') + + half_size = tf.cast(image_size / 2, tf.int32) + crop_ymin, crop_ymax = y_center - half_size, y_center + half_size + crop_xmin, crop_xmax = x_center - half_size, x_center + half_size + + ymin = tf.maximum(crop_ymin, 0) + xmin = tf.maximum(crop_xmin, 0) + ymax = tf.minimum(crop_ymax, height - 1) + xmax = tf.minimum(crop_xmax, width - 1) + + cropped_image = image[ymin:ymax, xmin:xmax] + offset_y = tf.maximum(0, ymin - crop_ymin) + offset_x = tf.maximum(0, xmin - 
crop_xmin) + + oy_i = offset_y + ox_i = offset_x + + output_image = tf.image.pad_to_bounding_box( + cropped_image, offset_height=oy_i, offset_width=ox_i, + target_height=image_size, target_width=image_size) + + if ymin == 0: + # We might be padding the image. + box_ymin = -offset_y + else: + box_ymin = crop_ymin + + if xmin == 0: + # We might be padding the image. + box_xmin = -offset_x + else: + box_xmin = crop_xmin + + box_ymax = box_ymin + image_size + box_xmax = box_xmin + image_size + + image_box = [box_ymin / height, box_xmin / width, + box_ymax / height, box_xmax / width] + boxlist = box_list.BoxList(boxes) + boxlist = box_list_ops.change_coordinate_frame(boxlist, image_box) + boxlist, indices = box_list_ops.prune_completely_outside_window( + boxlist, [0.0, 0.0, 1.0, 1.0]) + boxlist = box_list_ops.clip_to_window(boxlist, [0.0, 0.0, 1.0, 1.0], + filter_nonoverlapping=False) + + return_values = [output_image, + boxlist.get(), + tf.gather(labels, indices)] + + return return_values + + +def resize_to_range(image, + masks=None, + min_dimension=None, + max_dimension=None, + method=tf.image.ResizeMethod.BILINEAR, + pad_to_max_dimension=False, + per_channel_pad_value=(0, 0, 0)): + """Resizes an image so its dimensions are within the provided value. + + The output size can be described by two cases: + 1. If the image can be rescaled so its minimum dimension is equal to the + provided value without the other dimension exceeding max_dimension, + then do so. + 2. Otherwise, resize so the largest dimension is equal to max_dimension. + + Args: + image: A 3D tensor of shape [height, width, channels] + masks: (optional) rank 3 float32 tensor with shape + [num_instances, height, width] containing instance masks. + min_dimension: (optional) (scalar) desired size of the smaller image + dimension. + max_dimension: (optional) (scalar) maximum allowed size + of the larger image dimension. + method: (optional) interpolation method used in resizing. Defaults to + BILINEAR. 
+ pad_to_max_dimension: Whether to resize the image and pad it with + per_channel_pad_value so the resulting image has the spatial size + [max_dimension, max_dimension]. If masks are included they are + zero-padded to the same size. + per_channel_pad_value: A tuple of per-channel scalar values to use for + padding. Defaults to zeros. + + Returns: + A list [resized_image, resized_image_shape], with resized_masks appended + as a third element when masks is not None. + resized_image: A 3D tensor of shape [new_height, new_width, channels], + where the image has been resized (using `method`) so that + min(new_height, new_width) == min_dimension or + max(new_height, new_width) == max_dimension. + resized_image_shape: A 1D tensor of shape [3] containing the shape of the + resized image before any padding. + resized_masks: If masks is not None, a 3D tensor of shape + [num_instances, new_height, new_width]. + + Raises: + ValueError: if the image is not a 3D tensor. + """ + if len(image.get_shape()) != 3: + raise ValueError('Image should be 3D tensor') + + def _resize_landscape_image(image): + # resize a landscape image + return tf.image.resize( + image, tf.stack([min_dimension, max_dimension]), method=method, + preserve_aspect_ratio=True) + + def _resize_portrait_image(image): + # resize a portrait image + return tf.image.resize( + image, tf.stack([max_dimension, min_dimension]), method=method, + preserve_aspect_ratio=True) + + with tf.name_scope('ResizeToRange'): + if image.get_shape().is_fully_defined(): + if image.get_shape()[0] < image.get_shape()[1]: + new_image = _resize_landscape_image(image) + else: + new_image = _resize_portrait_image(image) + new_size = tf.constant(new_image.get_shape().as_list()) + else: + new_image = tf.cond( + tf.less(tf.shape(image)[0], tf.shape(image)[1]), + lambda: _resize_landscape_image(image), + lambda: _resize_portrait_image(image)) + new_size = tf.shape(new_image) + + if pad_to_max_dimension: + channels = tf.unstack(new_image, axis=2) + if len(channels) != 
len(per_channel_pad_value): + raise ValueError('Number of channels must be equal to the length of ' + 'per-channel pad value.') + new_image = tf.stack( + [ + tf.pad( # pylint: disable=g-complex-comprehension + channels[i], [[0, max_dimension - new_size[0]], + [0, max_dimension - new_size[1]]], + constant_values=per_channel_pad_value[i]) + for i in range(len(channels)) + ], + axis=2) + new_image.set_shape([max_dimension, max_dimension, len(channels)]) + + result = [new_image, new_size] + if masks is not None: + new_masks = tf.expand_dims(masks, 3) + new_masks = tf.image.resize( + new_masks, + new_size[:-1], + method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) + if pad_to_max_dimension: + new_masks = tf.image.pad_to_bounding_box( + new_masks, 0, 0, max_dimension, max_dimension) + new_masks = tf.squeeze(new_masks, 3) + result.append(new_masks) + + return result + + +def _augment_only_rgb_channels(image, augment_function): + """Augments only the RGB slice of an image with additional channels.""" + rgb_slice = image[:, :, :3] + augmented_rgb_slice = augment_function(rgb_slice) + image = tf.concat([augmented_rgb_slice, image[:, :, 3:]], -1) + return image + + +def random_adjust_brightness(image, + max_delta=0.2, + seed=None, + preprocess_vars_cache=None): + """Randomly adjusts brightness. + + Makes sure the output image is still between 0 and 255. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 255]. + max_delta: how much to change the brightness. A value between [0, 1). + seed: random seed. + preprocess_vars_cache: PreprocessorCache object that records previously + performed augmentations. Updated in-place. If this + function is called multiple times with the same + non-null cache, it will perform deterministically. + + Returns: + image: image which is the same shape as input image. 
+ """ + with tf.name_scope('RandomAdjustBrightness'): + generator_func = functools.partial(tf.random.uniform, [], + -max_delta, max_delta, seed=seed) + delta = _get_or_create_preprocess_rand_vars( + generator_func, + 'adjust_brightness', + preprocess_vars_cache) + + def _adjust_brightness(image): + image = tf.image.adjust_brightness(image / 255, delta) * 255 + image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=255.0) + return image + + image = _augment_only_rgb_channels(image, _adjust_brightness) + return image + + +def random_adjust_contrast(image, + min_delta=0.8, + max_delta=1.25, + seed=None, + preprocess_vars_cache=None): + """Randomly adjusts contrast. + + Makes sure the output image is still between 0 and 255. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 255]. + min_delta: see max_delta. + max_delta: how much to change the contrast. Contrast will change with a + value between min_delta and max_delta. This value will be + multiplied to the current contrast of the image. + seed: random seed. + preprocess_vars_cache: PreprocessorCache object that records previously + performed augmentations. Updated in-place. If this + function is called multiple times with the same + non-null cache, it will perform deterministically. + + Returns: + image: image which is the same shape as input image. 
+ """ + with tf.name_scope('RandomAdjustContrast'): + generator_func = functools.partial(tf.random.uniform, [], + min_delta, max_delta, seed=seed) + contrast_factor = _get_or_create_preprocess_rand_vars( + generator_func, + 'adjust_contrast', + preprocess_vars_cache) + + def _adjust_contrast(image): + image = tf.image.adjust_contrast(image / 255, contrast_factor) * 255 + image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=255.0) + return image + + image = _augment_only_rgb_channels(image, _adjust_contrast) + return image + + +def random_adjust_hue(image, + max_delta=0.02, + seed=None, + preprocess_vars_cache=None): + """Randomly adjusts hue. + + Makes sure the output image is still between 0 and 255. + + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 255]. + max_delta: shift hue by a random value drawn from [-max_delta, max_delta). + seed: random seed. + preprocess_vars_cache: PreprocessorCache object that records previously + performed augmentations. Updated in-place. If this + function is called multiple times with the same + non-null cache, it will perform deterministically. + + Returns: + image: image which is the same shape as input image. + """ + with tf.name_scope('RandomAdjustHue'): + generator_func = functools.partial(tf.random.uniform, [], + -max_delta, max_delta, seed=seed) + delta = _get_or_create_preprocess_rand_vars( + generator_func, + 'adjust_hue', + preprocess_vars_cache) + + def _adjust_hue(image): + image = tf.image.adjust_hue(image / 255, delta) * 255 + image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=255.0) + return image + + image = _augment_only_rgb_channels(image, _adjust_hue) + return image + + +def random_adjust_saturation(image, + min_delta=0.8, + max_delta=1.25, + seed=None, + preprocess_vars_cache=None): + """Randomly adjusts saturation. + + Makes sure the output image is still between 0 and 255.
+ + Args: + image: rank 3 float32 tensor contains 1 image -> [height, width, channels] + with pixel values varying between [0, 255]. + min_delta: see max_delta. + max_delta: how much to change the saturation. Saturation will change with a + value between min_delta and max_delta. This value will be + multiplied to the current saturation of the image. + seed: random seed. + preprocess_vars_cache: PreprocessorCache object that records previously + performed augmentations. Updated in-place. If this + function is called multiple times with the same + non-null cache, it will perform deterministically. + + Returns: + image: image which is the same shape as input image. + """ + with tf.name_scope('RandomAdjustSaturation'): + generator_func = functools.partial(tf.random.uniform, [], + min_delta, max_delta, seed=seed) + saturation_factor = _get_or_create_preprocess_rand_vars( + generator_func, + 'adjust_saturation', + preprocess_vars_cache) + + def _adjust_saturation(image): + image = tf.image.adjust_saturation(image / 255, saturation_factor) * 255 + image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=255.0) + return image + + image = _augment_only_rgb_channels(image, _adjust_saturation) + return image diff --git a/official/vision/beta/projects/centernet/ops/target_assigner.py b/official/projects/centernet/ops/target_assigner.py similarity index 99% rename from official/vision/beta/projects/centernet/ops/target_assigner.py rename to official/projects/centernet/ops/target_assigner.py index dd1cdc1710cdbed49a8af9e0108d5b837f8e86dc..0bbe39dffe663b2789278c903a0e1cec34321f84 100644 --- a/official/vision/beta/projects/centernet/ops/target_assigner.py +++ b/official/projects/centernet/ops/target_assigner.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,7 +18,7 @@ from typing import Dict, List import tensorflow as tf -from official.vision.beta.ops import sampling_ops +from official.vision.ops import sampling_ops def smallest_positive_root(a, b, c): diff --git a/official/vision/beta/projects/centernet/ops/target_assigner_test.py b/official/projects/centernet/ops/target_assigner_test.py similarity index 97% rename from official/vision/beta/projects/centernet/ops/target_assigner_test.py rename to official/projects/centernet/ops/target_assigner_test.py index 4d10dc0c65b64d80309b72af59b988630d7b19ce..36de86a4e1f1917e04ceaa6104053fe29c3baa33 100644 --- a/official/vision/beta/projects/centernet/ops/target_assigner_test.py +++ b/official/projects/centernet/ops/target_assigner_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,8 +17,8 @@ from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.ops import preprocess_ops -from official.vision.beta.projects.centernet.ops import target_assigner +from official.projects.centernet.ops import target_assigner +from official.vision.ops import preprocess_ops class TargetAssignerTest(tf.test.TestCase, parameterized.TestCase): diff --git a/official/projects/centernet/tasks/__init__.py b/official/projects/centernet/tasks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/centernet/tasks/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/centernet/tasks/centernet.py b/official/projects/centernet/tasks/centernet.py new file mode 100644 index 0000000000000000000000000000000000000000..fae44dae03ebe729403e1cc783c3405edfc2bf63 --- /dev/null +++ b/official/projects/centernet/tasks/centernet.py @@ -0,0 +1,425 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Centernet task definition.""" + +from typing import Any, List, Optional, Tuple + +from absl import logging +import tensorflow as tf + +from official.core import base_task +from official.core import input_reader +from official.core import task_factory +from official.projects.centernet.configs import centernet as exp_cfg +from official.projects.centernet.dataloaders import centernet_input +from official.projects.centernet.losses import centernet_losses +from official.projects.centernet.modeling import centernet_model +from official.projects.centernet.modeling.heads import centernet_head +from official.projects.centernet.modeling.layers import detection_generator +from official.projects.centernet.ops import loss_ops +from official.projects.centernet.ops import target_assigner +from official.vision.dataloaders import tf_example_decoder +from official.vision.dataloaders import tfds_factory +from official.vision.dataloaders import tf_example_label_map_decoder +from official.vision.evaluation import coco_evaluator +from official.vision.modeling.backbones import factory + + +@task_factory.register_task_cls(exp_cfg.CenterNetTask) +class CenterNetTask(base_task.Task): + """Task definition for centernet.""" + + def build_inputs(self, + params: exp_cfg.DataConfig, + input_context: Optional[tf.distribute.InputContext] = None): + """Build input dataset.""" + if params.tfds_name: + decoder = tfds_factory.get_detection_decoder(params.tfds_name) + else: + decoder_cfg = params.decoder.get() + if params.decoder.type == 'simple_decoder': + decoder = tf_example_decoder.TfExampleDecoder( + regenerate_source_id=decoder_cfg.regenerate_source_id) + elif params.decoder.type == 'label_map_decoder': + decoder = tf_example_label_map_decoder.TfExampleDecoderLabelMap( + label_map=decoder_cfg.label_map, + regenerate_source_id=decoder_cfg.regenerate_source_id) + else: + raise ValueError('Unknown decoder type: {}!'.format( + params.decoder.type)) + + parser = centernet_input.CenterNetParser( 
+ output_height=self.task_config.model.input_size[0], + output_width=self.task_config.model.input_size[1], + max_num_instances=self.task_config.model.max_num_instances, + bgr_ordering=params.parser.bgr_ordering, + channel_means=params.parser.channel_means, + channel_stds=params.parser.channel_stds, + aug_rand_hflip=params.parser.aug_rand_hflip, + aug_scale_min=params.parser.aug_scale_min, + aug_scale_max=params.parser.aug_scale_max, + aug_rand_hue=params.parser.aug_rand_hue, + aug_rand_brightness=params.parser.aug_rand_brightness, + aug_rand_contrast=params.parser.aug_rand_contrast, + aug_rand_saturation=params.parser.aug_rand_saturation, + odapi_augmentation=params.parser.odapi_augmentation, + dtype=params.dtype) + + reader = input_reader.InputReader( + params, + dataset_fn=tf.data.TFRecordDataset, + decoder_fn=decoder.decode, + parser_fn=parser.parse_fn(params.is_training)) + + dataset = reader.read(input_context=input_context) + + return dataset + + def build_model(self): + """get an instance of CenterNet.""" + model_config = self.task_config.model + input_specs = tf.keras.layers.InputSpec( + shape=[None] + model_config.input_size) + + l2_weight_decay = self.task_config.weight_decay + # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. 
+ # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) + # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) + l2_regularizer = (tf.keras.regularizers.l2( + l2_weight_decay / 2.0) if l2_weight_decay else None) + + backbone = factory.build_backbone( + input_specs=input_specs, + backbone_config=model_config.backbone, + norm_activation_config=model_config.norm_activation, + l2_regularizer=l2_regularizer) + + task_outputs = self.task_config.get_output_length_dict() + head_config = model_config.head + head = centernet_head.CenterNetHead( + input_specs=backbone.output_specs, + task_outputs=task_outputs, + input_levels=head_config.input_levels, + heatmap_bias=head_config.heatmap_bias) + + # output_specs is a dict + backbone_output_spec = backbone.output_specs[head_config.input_levels[-1]] + if len(backbone_output_spec) == 4: + bb_output_height = backbone_output_spec[1] + elif len(backbone_output_spec) == 3: + bb_output_height = backbone_output_spec[0] + else: + raise ValueError + self._net_down_scale = int(model_config.input_size[0] / bb_output_height) + dg_config = model_config.detection_generator + detect_generator_obj = detection_generator.CenterNetDetectionGenerator( + max_detections=dg_config.max_detections, + peak_error=dg_config.peak_error, + peak_extract_kernel_size=dg_config.peak_extract_kernel_size, + class_offset=dg_config.class_offset, + net_down_scale=self._net_down_scale, + input_image_dims=model_config.input_size[0], + use_nms=dg_config.use_nms, + nms_pre_thresh=dg_config.nms_pre_thresh, + nms_thresh=dg_config.nms_thresh) + + model = centernet_model.CenterNetModel( + backbone=backbone, + head=head, + detection_generator=detect_generator_obj) + + return model + + def initialize(self, model: tf.keras.Model): + """Loading pretrained checkpoint.""" + if not self.task_config.init_checkpoint: + return + + ckpt_dir_or_file = self.task_config.init_checkpoint + + # Restoring checkpoint. 
+ if tf.io.gfile.isdir(ckpt_dir_or_file): + ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) + + if self.task_config.init_checkpoint_modules == 'all': + ckpt = tf.train.Checkpoint(**model.checkpoint_items) + status = ckpt.restore(ckpt_dir_or_file) + status.assert_consumed() + elif self.task_config.init_checkpoint_modules == 'backbone': + ckpt = tf.train.Checkpoint(backbone=model.backbone) + status = ckpt.restore(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + else: + raise ValueError( + "Only 'all' or 'backbone' can be used to initialize the model.") + + logging.info('Finished loading pretrained checkpoint from %s', + ckpt_dir_or_file) + + def build_losses(self, + outputs, + labels, + aux_losses=None): + """Build losses.""" + input_size = self.task_config.model.input_size[0:2] + output_size = outputs['ct_heatmaps'][0].get_shape().as_list()[1:3] + + gt_label = tf.map_fn( + # pylint: disable=g-long-lambda + fn=lambda x: target_assigner.assign_centernet_targets( + labels=x, + input_size=input_size, + output_size=output_size, + num_classes=self.task_config.model.num_classes, + max_num_instances=self.task_config.model.max_num_instances, + gaussian_iou=self.task_config.losses.gaussian_iou, + class_offset=self.task_config.losses.class_offset), + elems=labels, + fn_output_signature={ + 'ct_heatmaps': tf.TensorSpec( + shape=[output_size[0], output_size[1], + self.task_config.model.num_classes], + dtype=tf.float32), + 'ct_offset': tf.TensorSpec( + shape=[self.task_config.model.max_num_instances, 2], + dtype=tf.float32), + 'size': tf.TensorSpec( + shape=[self.task_config.model.max_num_instances, 2], + dtype=tf.float32), + 'box_mask': tf.TensorSpec( + shape=[self.task_config.model.max_num_instances], + dtype=tf.int32), + 'box_indices': tf.TensorSpec( + shape=[self.task_config.model.max_num_instances, 2], + dtype=tf.int32), + } + ) + + losses = {} + + # Create loss functions + object_center_loss_fn = 
centernet_losses.PenaltyReducedLogisticFocalLoss() + localization_loss_fn = centernet_losses.L1LocalizationLoss() + + # Set up box indices so that they have a batch element as well + box_indices = loss_ops.add_batch_to_indices(gt_label['box_indices']) + + box_mask = tf.cast(gt_label['box_mask'], dtype=tf.float32) + num_boxes = tf.cast( + loss_ops.get_num_instances_from_weights(gt_label['box_mask']), + dtype=tf.float32) + + # Calculate center heatmap loss + output_unpad_image_shapes = tf.math.ceil( + tf.cast(labels['unpad_image_shapes'], + tf.float32) / self._net_down_scale) + valid_anchor_weights = loss_ops.get_valid_anchor_weights_in_flattened_image( + output_unpad_image_shapes, output_size[0], output_size[1]) + valid_anchor_weights = tf.expand_dims(valid_anchor_weights, 2) + + pred_ct_heatmap_list = outputs['ct_heatmaps'] + true_flattened_ct_heatmap = loss_ops.flatten_spatial_dimensions( + gt_label['ct_heatmaps']) + true_flattened_ct_heatmap = tf.cast(true_flattened_ct_heatmap, tf.float32) + + total_center_loss = 0.0 + for ct_heatmap in pred_ct_heatmap_list: + pred_flattened_ct_heatmap = loss_ops.flatten_spatial_dimensions( + ct_heatmap) + pred_flattened_ct_heatmap = tf.cast(pred_flattened_ct_heatmap, tf.float32) + total_center_loss += object_center_loss_fn( + target_tensor=true_flattened_ct_heatmap, + prediction_tensor=pred_flattened_ct_heatmap, + weights=valid_anchor_weights) + + center_loss = tf.reduce_sum(total_center_loss) / float( + len(pred_ct_heatmap_list) * num_boxes) + losses['ct_loss'] = center_loss + + # Calculate scale loss + pred_scale_list = outputs['ct_size'] + true_scale = tf.cast(gt_label['size'], tf.float32) + + total_scale_loss = 0.0 + for scale_map in pred_scale_list: + pred_scale = loss_ops.get_batch_predictions_from_indices(scale_map, + box_indices) + pred_scale = tf.cast(pred_scale, tf.float32) + # Only apply loss for boxes that appear in the ground truth + total_scale_loss += tf.reduce_sum( + localization_loss_fn(target_tensor=true_scale, 
+ prediction_tensor=pred_scale), + axis=-1) * box_mask + + scale_loss = tf.reduce_sum(total_scale_loss) / float( + len(pred_scale_list) * num_boxes) + losses['scale_loss'] = scale_loss + + # Calculate offset loss + pred_offset_list = outputs['ct_offset'] + true_offset = tf.cast(gt_label['ct_offset'], tf.float32) + + total_offset_loss = 0.0 + for offset_map in pred_offset_list: + pred_offset = loss_ops.get_batch_predictions_from_indices(offset_map, + box_indices) + pred_offset = tf.cast(pred_offset, tf.float32) + # Only apply loss for boxes that appear in the ground truth + total_offset_loss += tf.reduce_sum( + localization_loss_fn(target_tensor=true_offset, + prediction_tensor=pred_offset), + axis=-1) * box_mask + + offset_loss = tf.reduce_sum(total_offset_loss) / float( + len(pred_offset_list) * num_boxes) + losses['ct_offset_loss'] = offset_loss + + # Aggregate and finalize loss + loss_weights = self.task_config.losses.detection + total_loss = (loss_weights.object_center_weight * center_loss + + loss_weights.scale_weight * scale_loss + + loss_weights.offset_weight * offset_loss) + + if aux_losses: + total_loss += tf.add_n(aux_losses) + + losses['total_loss'] = total_loss + return losses + + def build_metrics(self, training=True): + metrics = [] + metric_names = ['total_loss', 'ct_loss', 'scale_loss', 'ct_offset_loss'] + for name in metric_names: + metrics.append(tf.keras.metrics.Mean(name, dtype=tf.float32)) + + if not training: + if (self.task_config.validation_data.tfds_name + and self.task_config.annotation_file): + raise ValueError( + "Can't evaluate using annotation file when TFDS is used.") + self.coco_metric = coco_evaluator.COCOEvaluator( + annotation_file=self.task_config.annotation_file, + include_mask=False, + per_category_metrics=self.task_config.per_category_metrics) + + return metrics + + def train_step(self, + inputs: Tuple[Any, Any], + model: tf.keras.Model, + optimizer: tf.keras.optimizers.Optimizer, + metrics: Optional[List[Any]] = None): + 
"""Does forward and backward. + + Args: + inputs: a (features, labels) tuple of input tensors. + model: the model, forward pass definition. + optimizer: the optimizer for this training step. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs. + """ + features, labels = inputs + + num_replicas = tf.distribute.get_strategy().num_replicas_in_sync + with tf.GradientTape() as tape: + outputs = model(features, training=True) + # Casting outputs to float32 is necessary when the mixed precision policy + # is mixed_float16 or mixed_bfloat16, so losses are computed in float32. + outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) + + losses = self.build_losses(outputs['raw_output'], labels) + + scaled_loss = losses['total_loss'] / num_replicas + # For mixed_precision policy, when LossScaleOptimizer is used, loss is + # scaled for numerical stability. + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + scaled_loss = optimizer.get_scaled_loss(scaled_loss) + + # Compute the gradients. + tvars = model.trainable_variables + gradients = tape.gradient(scaled_loss, tvars) + + # Unscale the gradients if the loss was scaled. + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + gradients = optimizer.get_unscaled_gradients(gradients) + + if self.task_config.gradient_clip_norm > 0.0: + gradients, _ = tf.clip_by_global_norm(gradients, + self.task_config.gradient_clip_norm) + + optimizer.apply_gradients(list(zip(gradients, tvars))) + + logs = {self.loss: losses['total_loss']} + + if metrics: + for m in metrics: + m.update_state(losses[m.name]) + logs.update({m.name: m.result()}) + + return logs + + def validation_step(self, + inputs: Tuple[Any, Any], + model: tf.keras.Model, + metrics: Optional[List[Any]] = None): + """Validation step. + + Args: + inputs: a (features, labels) tuple of input tensors. + model: the keras.Model. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs.
+ """ + features, labels = inputs + + outputs = model(features, training=False) + outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) + + losses = self.build_losses(outputs['raw_output'], labels) + + logs = {self.loss: losses['total_loss']} + + coco_model_outputs = { + 'detection_boxes': outputs['boxes'], + 'detection_scores': outputs['confidence'], + 'detection_classes': outputs['classes'], + 'num_detections': outputs['num_detections'], + 'source_id': labels['groundtruths']['source_id'], + 'image_info': labels['image_info'] + } + + logs.update({self.coco_metric.name: (labels['groundtruths'], + coco_model_outputs)}) + + if metrics: + for m in metrics: + m.update_state(losses[m.name]) + logs.update({m.name: m.result()}) + return logs + + def aggregate_logs(self, state=None, step_outputs=None): + if state is None: + self.coco_metric.reset_states() + state = self.coco_metric + self.coco_metric.update_state(step_outputs[self.coco_metric.name][0], + step_outputs[self.coco_metric.name][1]) + return state + + def reduce_aggregated_logs(self, aggregated_logs, global_step=None): + return self.coco_metric.result() diff --git a/official/projects/centernet/train.py b/official/projects/centernet/train.py new file mode 100644 index 0000000000000000000000000000000000000000..62474a33cf0f80242b17d0579d6da4390358844d --- /dev/null +++ b/official/projects/centernet/train.py @@ -0,0 +1,67 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""TensorFlow Model Garden Vision Centernet trainer.""" +from absl import app +from absl import flags +import gin + +from official.common import distribute_utils +from official.common import flags as tfm_flags +from official.core import task_factory +from official.core import train_lib +from official.core import train_utils +from official.modeling import performance +from official.projects.centernet.common import registry_imports # pylint: disable=unused-import + +FLAGS = flags.FLAGS + + +def main(_): + gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) + params = train_utils.parse_configuration(FLAGS) + + model_dir = FLAGS.model_dir + if 'train' in FLAGS.mode: + # Pure eval modes do not output yaml files. Otherwise continuous eval job + # may race against the train job for writing the same file. + train_utils.serialize_config(params, model_dir) + + # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16' + # can have significant impact on model speeds by utilizing float16 in case of + # GPUs, and bfloat16 in the case of TPUs. 
loss_scale takes effect only when + # dtype is float16 + if params.runtime.mixed_precision_dtype: + performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) + distribution_strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=params.runtime.distribution_strategy, + all_reduce_alg=params.runtime.all_reduce_alg, + num_gpus=params.runtime.num_gpus, + tpu_address=params.runtime.tpu) + with distribution_strategy.scope(): + task = task_factory.get_task(params.task, logging_dir=model_dir) + + train_lib.run_experiment( + distribution_strategy=distribution_strategy, + task=task, + mode=FLAGS.mode, + params=params, + model_dir=model_dir) + + train_utils.save_gin_config(FLAGS.mode, model_dir) + + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(main) diff --git a/official/projects/centernet/utils/__init__.py b/official/projects/centernet/utils/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/centernet/utils/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ diff --git a/official/projects/centernet/utils/checkpoints/__init__.py b/official/projects/centernet/utils/checkpoints/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/centernet/utils/checkpoints/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/vision/beta/projects/centernet/utils/checkpoints/config_classes.py b/official/projects/centernet/utils/checkpoints/config_classes.py similarity index 99% rename from official/vision/beta/projects/centernet/utils/checkpoints/config_classes.py rename to official/projects/centernet/utils/checkpoints/config_classes.py index 12b25d25e2e58de93347f05bfc8470cab445672a..5c67085f9f6f9478e7426ff743c64541afe35e1c 100644 --- a/official/vision/beta/projects/centernet/utils/checkpoints/config_classes.py +++ b/official/projects/centernet/utils/checkpoints/config_classes.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/vision/beta/projects/centernet/utils/checkpoints/config_data.py b/official/projects/centernet/utils/checkpoints/config_data.py similarity index 96% rename from official/vision/beta/projects/centernet/utils/checkpoints/config_data.py rename to official/projects/centernet/utils/checkpoints/config_data.py index 302f661f3464e13fdfbadd12fbdc2937c0c387b8..e5cffe83467e492e196dba5fa0550c3655f6cadb 100644 --- a/official/vision/beta/projects/centernet/utils/checkpoints/config_data.py +++ b/official/projects/centernet/utils/checkpoints/config_data.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,7 +19,7 @@ from typing import Dict, Optional import numpy as np -from official.vision.beta.projects.centernet.utils.checkpoints import config_classes +from official.projects.centernet.utils.checkpoints import config_classes Conv2DBNCFG = config_classes.Conv2DBNCFG HeadConvCFG = config_classes.HeadConvCFG diff --git a/official/vision/beta/projects/centernet/utils/checkpoints/load_weights.py b/official/projects/centernet/utils/checkpoints/load_weights.py similarity index 92% rename from official/vision/beta/projects/centernet/utils/checkpoints/load_weights.py rename to official/projects/centernet/utils/checkpoints/load_weights.py index 4bc387f4e02509709ac1c4d05d2d2d4c3b473699..e193ef4da24e92da8da877f3acda44a30b37d619 100644 --- a/official/vision/beta/projects/centernet/utils/checkpoints/load_weights.py +++ b/official/projects/centernet/utils/checkpoints/load_weights.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -14,11 +14,11 @@ """Functions used to load the ODAPI CenterNet checkpoint.""" -from official.vision.beta.modeling.backbones import mobilenet -from official.vision.beta.modeling.layers import nn_blocks -from official.vision.beta.projects.centernet.modeling.layers import cn_nn_blocks -from official.vision.beta.projects.centernet.utils.checkpoints import config_classes -from official.vision.beta.projects.centernet.utils.checkpoints import config_data +from official.projects.centernet.modeling.layers import cn_nn_blocks +from official.projects.centernet.utils.checkpoints import config_classes +from official.projects.centernet.utils.checkpoints import config_data +from official.vision.modeling.backbones import mobilenet +from official.vision.modeling.layers import nn_blocks Conv2DBNCFG = config_classes.Conv2DBNCFG HeadConvCFG = config_classes.HeadConvCFG diff --git a/official/vision/beta/projects/centernet/utils/checkpoints/read_checkpoints.py b/official/projects/centernet/utils/checkpoints/read_checkpoints.py similarity index 98% rename from official/vision/beta/projects/centernet/utils/checkpoints/read_checkpoints.py rename to official/projects/centernet/utils/checkpoints/read_checkpoints.py index 850b3382587e02f41c47d7176f96d7faa097d89a..4128f404600e383fc027323171a8e8c45ade6a3b 100644 --- a/official/vision/beta/projects/centernet/utils/checkpoints/read_checkpoints.py +++ b/official/projects/centernet/utils/checkpoints/read_checkpoints.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/vision/beta/projects/centernet/utils/tf2_centernet_checkpoint_converter.py b/official/projects/centernet/utils/tf2_centernet_checkpoint_converter.py similarity index 85% rename from official/vision/beta/projects/centernet/utils/tf2_centernet_checkpoint_converter.py rename to official/projects/centernet/utils/tf2_centernet_checkpoint_converter.py index 33c29efe2c307de51da3ce55fe1b82ff214b270c..d1afcf9d8d30b79403d436cb1d72e92a7f0d1448 100644 --- a/official/vision/beta/projects/centernet/utils/tf2_centernet_checkpoint_converter.py +++ b/official/projects/centernet/utils/tf2_centernet_checkpoint_converter.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,15 +19,15 @@ from absl import flags from absl import logging import tensorflow as tf -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.projects.centernet.common import registry_imports # pylint: disable=unused-import -from official.vision.beta.projects.centernet.configs import backbones -from official.vision.beta.projects.centernet.configs import centernet -from official.vision.beta.projects.centernet.modeling import centernet_model -from official.vision.beta.projects.centernet.modeling.heads import centernet_head -from official.vision.beta.projects.centernet.modeling.layers import detection_generator -from official.vision.beta.projects.centernet.utils.checkpoints import load_weights -from official.vision.beta.projects.centernet.utils.checkpoints import read_checkpoints +from official.projects.centernet.common import registry_imports # pylint: disable=unused-import +from official.projects.centernet.configs import backbones +from official.projects.centernet.configs import centernet +from official.projects.centernet.modeling 
import centernet_model +from official.projects.centernet.modeling.heads import centernet_head +from official.projects.centernet.modeling.layers import detection_generator +from official.projects.centernet.utils.checkpoints import load_weights +from official.projects.centernet.utils.checkpoints import read_checkpoints +from official.vision.modeling.backbones import factory FLAGS = flags.FLAGS diff --git a/official/projects/const_cl/README.md b/official/projects/const_cl/README.md new file mode 100644 index 0000000000000000000000000000000000000000..488b455fda9e606b51061e597546bc6783a5c4ac --- /dev/null +++ b/official/projects/const_cl/README.md @@ -0,0 +1,5 @@ +# Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision + +(WIP) This repository contains the official implementation of +[Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision](https://arxiv.org/abs/2112.05181) +in TF2. diff --git a/official/projects/cots_detector/README.md b/official/projects/cots_detector/README.md new file mode 100644 index 0000000000000000000000000000000000000000..550382bbcedd3cb48b827f8c803e6bd701ad264f --- /dev/null +++ b/official/projects/cots_detector/README.md @@ -0,0 +1,32 @@ +# Crown-of-Thorns Starfish Detection Pipeline + +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tensorflow/models/blob/master/official/projects/cots_detector/crown_of_thorns_starfish_detection_pipeline.ipynb?force_crab_mode=1) + +This repository shows how to detect crown-of-thorns starfish (COTS) using a +pre-trained COTS detector implemented in TensorFlow.
+ +![Underwater photo of coral reef with annotated boxes identifying detected +starfish](https://storage.googleapis.com/download.tensorflow.org/data/cots_detection/COTS_detected_sample.png) + +## Description + +Coral reefs are some of the most diverse and important ecosystems in the world; +however, they face a number of rising threats that have resulted in massive +global declines. In Australia, outbreaks of the coral-eating crown-of-thorns +starfish (COTS) have been shown to cause major coral loss: just 15 starfish +in a hectare can strip a reef of 90% of its coral tissue. While COTS +naturally exist in the Indo-Pacific, overfishing and excess nutrient run-off +have led to massive outbreaks that are devastating already vulnerable coral +communities. + +Controlling COTS populations is critical to promoting coral growth and +resilience, so Google teamed up with Australia’s national science agency, +[CSIRO](https://www.csiro.au/en/), to tackle this problem. We trained ML object +detection models to help scale underwater surveys, enabling the monitoring and +mapping of these harmful invertebrates with the ultimate goal of helping +control teams address and prioritize outbreaks. + +## Get started + +[Open the notebook in Colab](https://colab.research.google.com/github/tensorflow/models/blob/master/official/projects/cots_detector/crown_of_thorns_starfish_detection_pipeline.ipynb?force_crab_mode=1) +to run the COTS detection pipeline.
diff --git a/official/projects/cots_detector/crown_of_thorns_starfish_detection_pipeline.ipynb b/official/projects/cots_detector/crown_of_thorns_starfish_detection_pipeline.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..f9a790c60a633d7a51b0e11f4a5b8fc74fcadded --- /dev/null +++ b/official/projects/cots_detector/crown_of_thorns_starfish_detection_pipeline.ipynb @@ -0,0 +1,1473 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "xBH8CcrkV3IU" + }, + "outputs": [], + "source": [ + "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9CzbXNRovpbc" + }, + "source": [ + "# Crown-of-Thorns Starfish Detection Pipeline" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Lpb0yoNjiWhw" + }, + "source": [ + "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/official/projects/cots_detector/crown_of_thorns_starfish_detection_pipeline.ipynb?force_crab_mode=1\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n", + " \u003c/td\u003e\n", + " \u003ctd\u003e\n", + " \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/official/projects/cots_detector/crown_of_thorns_starfish_detection_pipeline.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView on GitHub\u003c/a\u003e\n", + " \u003c/td\u003e\n", + "\u003c/table\u003e" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GUQ1x137ysLD" + }, + "source": [ + "Coral reefs are some of the most diverse and important ecosystems in the world; however, they face a number of rising threats that have resulted in massive global declines. In Australia, outbreaks of the coral-eating crown-of-thorns starfish (COTS) have been shown to cause major coral loss: just 15 starfish in a hectare can strip a reef of 90% of its coral tissue. While COTS naturally exist in the Indo-Pacific, overfishing and excess nutrient run-off have led to massive outbreaks that are devastating already vulnerable coral communities.\n", + "\n", + "Controlling COTS populations is critical to promoting coral growth and resilience, so Google teamed up with Australia’s national science agency, [CSIRO](https://www.csiro.au/en/), to tackle this problem.
We trained ML object detection models to help scale underwater surveys, enabling the monitoring and mapping of these harmful invertebrates with the ultimate goal of helping control teams address and prioritize outbreaks." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jDiIX2xawkJw" + }, + "source": [ + "## About this notebook\n", + "\n", + "This notebook tutorial shows how to detect COTS using a pre-trained COTS detector implemented in TensorFlow. In addition to running the model on each frame of the video, the tracking code in this notebook aligns detections from frame to frame, creating a consistent track for each COTS. Each track is given an ID and a frame count. Here is an example image from a video of a reef showing labeled COTS.\n", + "\n", + "\u003cimg src=\"https://storage.googleapis.com/download.tensorflow.org/data/cots_detection/COTS_detected_sample.png\"\u003e" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YxCF1t-Skag8" + }, + "source": [ + "We recommend enabling a GPU to accelerate inference. On CPU, this runs for about 40 minutes, but on GPU it takes only about 10 minutes. (In Colab, it should already be set to GPU in the Runtime menu: *Runtime \u003e Change runtime type \u003e Hardware accelerator \u003e select \"GPU\"*)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a4R2T97u442o" + }, + "source": [ + "## Setup \n", + "\n", + "Install all needed packages."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "5Gs7XvCGlwlj" + }, + "outputs": [], + "source": [ + "# remove the existing datascience package to avoid package conflicts in the colab environment\n", + "!pip3 uninstall -y datascience\n", + "!pip3 install -q opencv-python\n", + "!pip3 install PILLOW" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "w-UQ87240x5R" + }, + "outputs": [], + "source": [ + "# Imports\n", + "import base64\n", + "import copy\n", + "import dataclasses\n", + "import glob\n", + "import logging\n", + "import mimetypes\n", + "import os\n", + "import pathlib\n", + "import subprocess\n", + "import time\n", + "import textwrap\n", + "from typing import Dict, Iterable, List, Optional, Tuple\n", + "\n", + "from absl import logging as absl_logging\n", + "from IPython import display\n", + "import cv2\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "import PIL.Image\n", + "import tensorflow as tf\n", + "from tqdm import tqdm" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gsSclJg4sJbX" + }, + "source": [ + "Define all needed variables." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "iKMCvnZEXBBT" + }, + "outputs": [], + "source": [ + "model_name = \"cots_1080_v1\" #@param [\"cots_1080_v1\", \"cots_720_v1\"]\n", + "test_sequence_name = \"test3\" #@param [\"test1\", \"test2\", \"test3\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ORLJSdLq4-gd" + }, + "outputs": [], + "source": [ + "cots_model = f\"https://storage.googleapis.com/download.tensorflow.org/models/cots_detection/{model_name}.zip\"\n", + "\n", + "# Alternatively, this dataset can be downloaded through CSIRO's Data Access Portal at https://data.csiro.au/collection/csiro:54830v2\n", + "sample_data_link = f\"https://storage.googleapis.com/download.tensorflow.org/data/cots_detection/sample_images.zip\"\n", + "\n", + "preview_video_path = \"preview.mp4\"\n", + "detection_small_video_path = \"COTS_detection.mp4\"\n", + "detection_csv_path = \"detections.csv\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FNwP3s-5xgaF" + }, + "source": [ + "You also need to retrieve the sample data. This sample data is made up of a series of chronological images." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "DF_c_ZMXdPRN" + }, + "outputs": [], + "source": [ + "sample_data_path = tf.keras.utils.get_file(origin=sample_data_link)\n", + "# Unzip data\n", + "!mkdir sample_images\n", + "!unzip -o -q {sample_data_path} -d sample_images" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ghf-4E5-ZiJn" + }, + "source": [ + "Convert the images to a video file:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "kCdWsbO1afIJ" + }, + "outputs": [], + "source": [ + "tmp_video_path = \"tmp_preview.mp4\"\n", + "\n", + "filenames = sorted(glob.glob(f\"sample_images/{test_sequence_name}/*.jpg\"))\n", + "img = cv2.imread(filenames[0])\n", + "height, width, layers = img.shape\n", + "size = (width, height)\n", + "\n", + "video_writer = cv2.VideoWriter(\n", + " filename=tmp_video_path,\n", + " fourcc=cv2.VideoWriter_fourcc(*\"MP4V\"), \n", + " fps=15, \n", + " frameSize=size)\n", + " \n", + "for filename in tqdm(filenames):\n", + " img = cv2.imread(filename)\n", + " video_writer.write(img)\n", + "cv2.destroyAllWindows()\n", + "video_writer.release()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cHsKpPyviWmF" + }, + "source": [ + "Re-encode the video, and reduce its size (Colab crashes if you try to embed the full size video)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "_li0qe-gh1iT" + }, + "outputs": [], + "source": [ + "subprocess.check_call([\n", + " \"ffmpeg\", \"-y\", \"-i\", tmp_video_path,\n", + " \"-vf\",\"scale=800:-1\",\n", + " \"-crf\", \"18\",\n", + " \"-preset\", \"veryfast\",\n", + " \"-vcodec\", \"libx264\", preview_video_path])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2ItoiHyYQGya" + }, + "source": [ + "The images you downloaded are frames of a movie showing a top view of a coral reef with crown-of-thorns starfish. 
Use the `base64` data-URL trick to embed the video in this notebook:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "u0fqXQUzdZCu" + }, + "outputs": [], + "source": [ + "def embed_video_file(path: os.PathLike) -\u003e display.HTML:\n", + " \"\"\"Embeds a file in the notebook as an html tag with a data-url.\"\"\"\n", + " path = pathlib.Path(path)\n", + " mime, unused_encoding = mimetypes.guess_type(str(path))\n", + " data = path.read_bytes()\n", + "\n", + " b64 = base64.b64encode(data).decode()\n", + " return display.HTML(\n", + " textwrap.dedent(\"\"\"\n", + " \u003cvideo width=\"640\" height=\"480\" controls\u003e\n", + " \u003csource src=\"data:{mime};base64,{b64}\" type=\"{mime}\"\u003e\n", + " Your browser does not support the video tag.\n", + " \u003c/video\u003e\n", + " \"\"\").format(mime=mime, b64=b64))\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "SiOsbr8xePkg" + }, + "outputs": [], + "source": [ + "embed_video_file(preview_video_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9Z0DTbWrZMZ-" + }, + "source": [ + "Can you see them? There are lots. The goal of the model is to put boxes around all of the starfish; the tracker then gives each starfish its own ID, which stays stable as the camera passes over it." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d0iALUwM0g2p" + }, + "source": [ + "## Load the model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fVq6vNBTxM62" + }, + "source": [ + "Download the trained COTS detection model that matches the `model_name` you selected earlier."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "No5jRA1TxXj0" + }, + "outputs": [], + "source": [ + "model_path = tf.keras.utils.get_file(origin=cots_model)\n", + "# Unzip model\n", + "!mkdir {model_name}\n", + "!unzip -o -q {model_path} -d {model_name}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ezyuSHK5ap__" + }, + "source": [ + "Load the trained model from disk and create the inference function, `model_fn`. This might take a little while." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "HXQnNjwl8Beu" + }, + "outputs": [], + "source": [ + "absl_logging.set_verbosity(absl_logging.ERROR)\n", + "\n", + "tf.config.optimizer.set_experimental_options({'auto_mixed_precision': True})\n", + "tf.config.optimizer.set_jit(True)\n", + "\n", + "model_fn = tf.saved_model.load(model_name).signatures['serving_default']" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OvLuznhUa7uG" + }, + "source": [ + "Here's one test image; how many COTS can you see?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "XmQF_2L_a7Hu" + }, + "outputs": [], + "source": [ + "example_frame_number = 52\n", + "image = tf.io.read_file(filenames[example_frame_number])\n", + "image = tf.io.decode_jpeg(image)\n", + "\n", + "# Caution: PIL and TF use \"RGB\" color order, while cv2 uses \"BGR\".\n", + "PIL.Image.fromarray(image.numpy())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KSOf4V8WhTHF" + }, + "source": [ + "## Raw model outputs\n", + "\n", + "Try running the model on the image. The model expects a batch of images, so add an outer `batch` dimension before calling the model.\n", + "\n", + "Note: The model only runs correctly with a batch size of 1.\n", + "\n", + "The result is a dictionary with a number of fields.
For all fields, the first dimension of the shape is the `batch` dimension." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "iqLHo8h0c2pW" + }, + "outputs": [], + "source": [ + "image_batch = image[tf.newaxis, ...]\n", + "result = model_fn(image_batch)\n", + "\n", + "print(f\"{'image_batch':20s}- shape: {image_batch.shape}\")\n", + "\n", + "for key, value in result.items():\n", + " print(f\"{key:20s}- shape: {value.shape}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0xuNoKLCjyDz" + }, + "source": [ + "The `num_detections` field is meant to give the number of valid detections, but here it is always 100: there are always 100 locations that _could_ be a COTS." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "nGCDZJQvkIOL" + }, + "outputs": [], + "source": [ + "print('\\nnum_detections: ', result['num_detections'].numpy())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cSd7JJYqkPz7" + }, + "source": [ + "Similarly, the `detection_classes` field is always `0`, since the model only detects one class: COTS." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JoY8bJrfkcuS" + }, + "outputs": [], + "source": [ + "print('detection_classes: \\n', result['detection_classes'].numpy())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "X2nVLSOokyog" + }, + "source": [ + "What actually matters here are the detection scores, which indicate the quality of each detection:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "iepEgCc2jsRD" + }, + "outputs": [], + "source": [ + "result['detection_scores'].numpy()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Fn2B0nbplAFy" + }, + "source": [ + "You need to choose a threshold that determines what counts as a good detection.
This frame has a few good detections:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "a30Uyc0WlK2a" + }, + "outputs": [], + "source": [ + "good_detections = result['detection_scores'] \u003e 0.4\n", + "good_detections.numpy()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Y_xrbQiAlWrK" + }, + "source": [ + "## Bounding boxes and detections\n", + "\n", + "Build a class to handle the detection boxes:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "S5inzqu-4JhT" + }, + "outputs": [], + "source": [ + "@dataclasses.dataclass(frozen=True)\n", + "class BBox:\n", + " x0: float\n", + " y0: float\n", + " x1: float\n", + " y1: float\n", + "\n", + " def replace(self, **kwargs):\n", + " d = self.__dict__.copy()\n", + " d.update(kwargs)\n", + " return type(self)(**d)\n", + "\n", + " @property\n", + " def center(self)-\u003e Tuple[float, float]:\n", + " return ((self.x0+self.x1)/2, (self.y0+self.y1)/2)\n", + " \n", + " @property\n", + " def width(self) -\u003e float:\n", + " return self.x1 - self.x0\n", + "\n", + " @property\n", + " def height(self) -\u003e float:\n", + " return self.y1 - self.y0\n", + "\n", + " @property\n", + " def area(self)-\u003e float:\n", + " return (self.x1 - self.x0 + 1) * (self.y1 - self.y0 + 1)\n", + " \n", + " def intersection(self, other)-\u003e Optional['BBox']:\n", + " x0 = max(self.x0, other.x0)\n", + " y0 = max(self.y0, other.y0)\n", + " x1 = min(self.x1, other.x1)\n", + " y1 = min(self.y1, other.y1)\n", + " if x0 \u003e x1 or y0 \u003e y1:\n", + " return None\n", + " return BBox(x0, y0, x1, y1)\n", + "\n", + " def iou(self, other):\n", + " intersection = self.intersection(other)\n", + " if intersection is None:\n", + " return 0\n", + " \n", + " ia = intersection.area\n", + "\n", + " return ia/(self.area + other.area - ia)\n", + " \n", + " def draw(self, image, label=None, color=(0, 140, 255)):\n", + " image = np.asarray(image)\n", + " 
cv2.rectangle(image, \n", + " (int(self.x0), int(self.y0)),\n", + " (int(self.x1), int(self.y1)),\n", + " color,\n", + " thickness=2)\n", + " if label is not None:\n", + " cv2.putText(image, str(label), \n", + " (int(self.x0), int(self.y0-10)),\n", + " cv2.FONT_HERSHEY_SIMPLEX,\n", + " 0.9, color, thickness=2)\n", + " return image" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2izYMR9Q6Dn0" + }, + "source": [ + "And a class to represent a `Detection`, with a method to create a list of detections from the model's output:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tybwY3eaY803" + }, + "outputs": [], + "source": [ + "@dataclasses.dataclass(frozen=True)\n", + "class Detection:\n", + " \"\"\"Detection dataclass.\"\"\"\n", + " class_id: int\n", + " score: float\n", + " bbox: BBox\n", + " threshold:float = 0.4\n", + "\n", + " def replace(self, **kwargs):\n", + " d = self.__dict__.copy()\n", + " d.update(kwargs)\n", + " return type(self)(**d)\n", + "\n", + " @classmethod\n", + " def process_model_output(\n", + " cls, image, detections: Dict[str, tf.Tensor]\n", + " ) -\u003e Iterable['Detection']:\n", + " \n", + " # The model only works on a batch size of 1.\n", + " detection_boxes = detections['detection_boxes'].numpy()[0]\n", + " detection_classes = detections['detection_classes'].numpy()[0].astype(np.int32)\n", + " detection_scores = detections['detection_scores'].numpy()[0]\n", + "\n", + " img_h, img_w = image.shape[0:2]\n", + "\n", + " valid_indices = detection_scores \u003e= cls.threshold\n", + " classes = detection_classes[valid_indices]\n", + " scores = detection_scores[valid_indices]\n", + " boxes = detection_boxes[valid_indices, :]\n", + " detections = []\n", + "\n", + " for class_id, score, box in zip(classes, scores, boxes):\n", + " detections.append(\n", + " Detection(\n", + " class_id=class_id,\n", + " score=score,\n", + " bbox=BBox(\n", + " x0=box[1] * img_w,\n", + " y0=box[0] * img_h,\n", + " 
x1=box[3] * img_w,\n", + " y1=box[2] * img_h,)))\n", + "\n", + " return detections" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QRZ9Q5meHl84" + }, + "source": [ + "## Preview some detections\n", + "\n", + "Now you can preview the model's output:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Px7AoFCn-psx" + }, + "outputs": [], + "source": [ + "detections = Detection.process_model_output(image, result)\n", + "\n", + "for n, det in enumerate(detections):\n", + " det.bbox.draw(image, label=n+1, color=(255, 140, 0))\n", + "\n", + "PIL.Image.fromarray(image.numpy())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "B1q_n1xJLm60" + }, + "source": [ + "That works well for one frame, but to count the number of COTS in a video, you'll need to track the detections from frame to frame. The raw detection indices are not stable; they're just sorted by detection score. Below, both sets of detections are overlaid on the second image, with the first frame's detections in white and the second frame's in orange. Note that the indices are not aligned.
The positions are shifted because of camera motion between the two frames:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PLtxJFPuLma0" + }, + "outputs": [], + "source": [ + "image2 = tf.io.read_file(filenames[example_frame_number+5]) # five frames later\n", + "image2 = tf.io.decode_jpeg(image2)\n", + "result2 = model_fn(image2[tf.newaxis, ...])\n", + "detections2 = Detection.process_model_output(image2, result2)\n", + "\n", + "for n, det in enumerate(detections):\n", + " det.bbox.draw(image2, label=n+1, color=(255, 255, 255))\n", + "\n", + "for n, det in enumerate(detections2):\n", + " det.bbox.draw(image2, label=n+1, color=(255, 140, 0))\n", + "\n", + "PIL.Image.fromarray(image2.numpy())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CoRxLon5MZ35" + }, + "source": [ + "## Use optical flow to align detections\n", + "\n", + "The two sets of bounding boxes above don't line up because of camera movement. \n", + "To see in more detail how detections are aligned, take the detections from the first image and run the optical flow step, `propagate_detections`, to move them into the second frame.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "wb_nkcPJJx2t" + }, + "outputs": [], + "source": [ + "def default_of_params():\n", + " its=20\n", + " eps=0.03\n", + " return {\n", + " 'winSize': (64,64),\n", + " 'maxLevel': 3,\n", + " 'criteria': (cv2.TermCriteria_COUNT + cv2.TermCriteria_EPS, its, eps)\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mHVPymG8F2ke" + }, + "outputs": [], + "source": [ + "def propagate_detections(detections, image1, image2, of_params=None):\n", + " if of_params is None:\n", + " of_params = default_of_params()\n", + "\n", + " bboxes = [det.bbox for det in detections]\n", + " centers = np.float32([[bbox.center for bbox in bboxes]])\n", + " widths = np.float32([[bbox.width for bbox in bboxes]])\n", + " heights = np.float32([[bbox.height for bbox in bboxes]])\n", + "\n", + "\n", + " new_centers, status, error = cv2.calcOpticalFlowPyrLK(\n", + " image1, image2, centers, None, **of_params)\n", + "\n", + " x0s = new_centers[...,0] - widths/2\n", + " x1s = new_centers[...,0] + widths/2\n", + " y0s = new_centers[...,1] - heights/2\n", + " y1s = new_centers[...,1] + heights/2\n", + "\n", + " updated_detections = []\n", + " for i, det in enumerate(detections):\n", + " det = det.replace(\n", + " bbox = BBox(x0s[0,i], y0s[0,i], x1s[0,i], y1s[0,i]))\n", + " updated_detections.append(det)\n", + " return updated_detections" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dCjgvoZnOcBu" + }, + "source": [ + "Now keep the white boxes for the initial detections and the orange boxes for the new set of detections, and add the optical-flow-propagated detections in green. Using optical flow to propagate the old detections to the new frame gives quite good alignment. It's this alignment between the old and new detections (between the green and orange boxes) that allows the tracker to make a persistent track for each COTS.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "aeTny8YnHwTw" + }, + "outputs": [], + "source": [ + "image = tf.io.read_file(filenames[example_frame_number])\n", + "image = tf.io.decode_jpeg(image).numpy()\n", + "image_gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)\n", + "\n", + "image2 = tf.io.read_file(filenames[example_frame_number+5]) # five frames later\n", + "image2 = tf.io.decode_jpeg(image2).numpy()\n", + "image2_gray = cv2.cvtColor(image2, cv2.COLOR_RGB2GRAY)\n", + "\n", + "updated_detections = propagate_detections(detections, image_gray, image2_gray)\n", + "\n", + "\n", + "for det in detections:\n", + " det.bbox.draw(image2, color=(255, 255, 255))\n", + "\n", + "for det in updated_detections:\n", + " det.bbox.draw(image2, color=(0, 255, 0))\n", + "\n", + "for det in detections2:\n", + " det.bbox.draw(image2, color=(255, 140, 0))\n", + "\n", + "PIL.Image.fromarray(image2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jbZ-7ICCENWG" + }, + "source": [ + "## Define **OpticalFlowTracker** class\n", + "\n", + "This class tracks the movement of each COTS object across the video frames.\n", + "\n", + "The tracker collects related detections into `Track` objects. \n", + "\n", + "The constructor is defined below; the remaining methods are attached to the class in the following cells.\n", + "\n", + "The `__init__` method just initializes the track counter (`track_id`) and sets some default values for the tracking and optical flow configurations.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3j2Ka1uGEoz4" + }, + "outputs": [], + "source": [ + "class OpticalFlowTracker:\n", + " \"\"\"Optical flow tracker.\"\"\"\n", + "\n", + " @classmethod\n", + " def add_method(cls, fun):\n", + " \"\"\"Attach a new method to the class.\"\"\"\n", + " setattr(cls, fun.__name__, fun)\n", + "\n", + "\n", + " def __init__(self, tid=1, ft=3.0, iou=0.5, tt=2.0, bb=32, of_params=None):\n", + " # Bookkeeping for the tracks.\n", + " # The running track count, incremented for each new track.\n", + " self.track_id = tid\n", + " self.tracks = []\n", + " self.prev_image = None\n", + " self.prev_time = None\n", + "\n", + " # Configuration for the track cleanup logic.\n", + " # How long to apply optical flow tracking without getting positive \n", + " # detections (sec).\n", + " self.track_flow_time = ft * 1000\n", + " # Required IoU overlap to link a detection to a track.\n", + " self.overlap_threshold = iou\n", + " # Used to detect if detector needs to be reset.\n", + " self.time_threshold = tt * 1000\n", + " self.border = bb\n", + "\n", + " if of_params is None:\n", + " of_params = default_of_params()\n", + " self.of_params = of_params\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yBLSv0Fi_JJD" + }, + "source": [ + "Internally the tracker will use small `Track` and `Tracklet` classes to organize the data. The `Tracklet` class is just a `Detection` with a timestamp, while a `Track` is a track ID, the most recent detection and a list of `Tracklet` objects forming the history of the track." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gCQFfAkaY_WN" + }, + "outputs": [], + "source": [ + "@dataclasses.dataclass(frozen=True)\n", + "class Tracklet:\n", + " timestamp:float\n", + " detection:Detection\n", + "\n", + " def replace(self, **kwargs):\n", + " d = self.__dict__.copy()\n", + " d.update(kwargs)\n", + " return type(self)(**d)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7qVW1a_YZBgL" + }, + "outputs": [], + "source": [ + "@dataclasses.dataclass(frozen=True)\n", + "class Track:\n", + " \"\"\"Tracker entries.\"\"\"\n", + " id:int\n", + " det: Detection\n", + " linked_dets:List[Tracklet] = dataclasses.field(default_factory=list)\n", + "\n", + " def replace(self, **kwargs):\n", + " d = self.__dict__.copy()\n", + " d.update(kwargs)\n", + " return type(self)(**d)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ntl_4oUp_1nD" + }, + "source": [ + "The tracker keeps a list of active `Track` objects.\n", + "\n", + "The main `update` method takes an image, along with the list of detections and the timestamp for that image. On each frame, it performs the following sub-tasks:\n", + "\n", + "* The tracker uses optical flow to calculate where each `Track` expects to see a new `Detection`.\n", + "* The tracker matches up the actual detections for the frame to the expected detections for each `Track`.\n", + "* If a detection doesn't get matched to an existing track, a new track is created for the detection.\n", + "* If a track stops getting assigned new detections, it is eventually removed. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "koZ0mjFTpiTv" + }, + "outputs": [], + "source": [ + "@OpticalFlowTracker.add_method\n", + "def update(self, image_bgr, detections, timestamp):\n", + " start = time.time()\n", + "\n", + " image = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)\n", + "\n", + " # Remove dead tracks.\n", + " self.tracks = self.cleanup_tracks(image, timestamp)\n", + "\n", + " # Run optical flow to update existing tracks.\n", + " if self.prev_time is not None:\n", + " self.tracks = self.propagate_tracks(image)\n", + "\n", + " # Update the track list based on the new detections.\n", + " self.apply_detections_to_tracks(image, detections, timestamp)\n", + "\n", + " self.prev_image = image\n", + " self.prev_time = timestamp\n", + "\n", + " return self.tracks" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U-6__zF2CHFS" + }, + "source": [ + "The `cleanup_tracks` method removes tracks that are too old or too close to the edge of the image. If too much time has passed since the previous update, it discards all tracks, resetting the tracker."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "HQBj8GihjF3-" + }, + "outputs": [], + "source": [ + "@OpticalFlowTracker.add_method\n", + "def cleanup_tracks(self, image, timestamp) -\u003e List[Track]:\n", + " image_w = image.shape[1]\n", + " image_h = image.shape[0]\n", + "\n", + " # Assume tracker is invalid if too much time has passed!\n", + " if (self.prev_time is not None and\n", + " timestamp - self.prev_time \u003e self.time_threshold):\n", + " logging.info(\n", + " 'Too much time since last update, resetting tracker.')\n", + " return []\n", + "\n", + " # Remove tracks that:\n", + " # - Touch the image edge.\n", + " # - Have existed for a long time without linking a real detection.\n", + " active_tracks = []\n", + " for track in self.tracks:\n", + " bbox = track.det.bbox\n", + " if (bbox.x0 \u003c self.border or bbox.y0 \u003c self.border or\n", + " bbox.x1 \u003e= (image_w - self.border) or\n", + " bbox.y1 \u003e= (image_h - self.border)):\n", + " logging.info(f'Removing track {track.id} because it\\'s near the border')\n", + " continue\n", + "\n", + " time_since_last_detection = timestamp - track.linked_dets[-1].timestamp\n", + " if (time_since_last_detection \u003e self.track_flow_time):\n", + " logging.info(f'Removing track {track.id} because it\\'s too old '\n", + " f'({time_since_last_detection:.0f} ms)')\n", + " continue\n", + "\n", + " active_tracks.append(track)\n", + "\n", + " return active_tracks" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DVzNcESxC6vY" + }, + "source": [ + "The `propagate_tracks` method uses optical flow to move each track's bounding box to its predicted location in the new image:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0GycdAflCs6v" + }, + "outputs": [], + "source": [ + "@OpticalFlowTracker.add_method\n", + "def propagate_tracks(self, image):\n", + " if not self.tracks:\n", + " return 
self.tracks[:]\n", + "\n", + " detections = [track.det for track in self.tracks]\n", + " detections = propagate_detections(detections, self.prev_image, image, self.of_params)\n", + "\n", + " return [track.replace(det=det) \n", + " for track, det in zip(self.tracks, detections)]\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uLbVeetwD0ph" + }, + "source": [ + "The `apply_detections_to_tracks` method compares each detection to the updated bounding box of each track. The detection is added to the best-matching track if their IoU exceeds `overlap_threshold`; otherwise, the detection starts a new track. Detections too close to the image border are skipped. \n", + "\n", + "If a track has no new detection assigned to it, its optical-flow-predicted detection is used instead." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "j6pfRhDRlApe" + }, + "outputs": [], + "source": [ + "@OpticalFlowTracker.add_method\n", + "def apply_detections_to_tracks(self, image, detections, timestamp):\n", + " image_w = image.shape[1]\n", + " image_h = image.shape[0]\n", + "\n", + " # Insert new detections.\n", + " detected_obj_track_ids = set()\n", + "\n", + " for detection in detections:\n", + " bbox = detection.bbox\n", + " if (bbox.x0 \u003c self.border or bbox.y0 \u003c self.border or\n", + " bbox.x1 \u003e= image_w - self.border or\n", + " bbox.y1 \u003e= image_h - self.border):\n", + " logging.debug('Skipping detection because it\\'s close to the border.')\n", + " continue\n", + "\n", + " # See if detection can be linked to an existing track.\n", + " linked = False\n", + " overlap_index = 0\n", + " overlap_max = -1000\n", + " for track_index, track in enumerate(self.tracks):\n", + " logging.debug('Testing track %d', track_index)\n", + " if track.det.class_id != detection.class_id:\n", + " continue\n", + " overlap = detection.bbox.iou(track.det.bbox)\n", + " if overlap \u003e overlap_max:\n", + " overlap_index = track_index\n", + " overlap_max = 
overlap\n", + "\n", + " # Link to existing track with maximal IoU.\n", + " if overlap_max \u003e self.overlap_threshold:\n", + " track = self.tracks[overlap_index]\n", + " self.tracks[overlap_index] = track.replace(det=detection)\n", + " track.linked_dets.append(Tracklet(timestamp, detection))\n", + " detected_obj_track_ids.add(track.id)\n", + " linked = True\n", + "\n", + " if not linked:\n", + " logging.info(f'Creating new track with ID {self.track_id}')\n", + " new_track = Track(self.track_id, detection)\n", + " new_track.linked_dets.append(Tracklet(timestamp, detection))\n", + " detected_obj_track_ids.add(self.track_id)\n", + " self.tracks.append(new_track)\n", + " self.track_id += 1\n", + "\n", + " for track in self.tracks:\n", + " # If the detector did not find the object but the tracker estimated it,\n", + " # add the estimated detection to that track's linked_dets.\n", + " if track.id not in detected_obj_track_ids:\n", + " track.linked_dets.append(Tracklet(timestamp, track.det))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gY0AH-KUHPlC" + }, + "source": [ + "## Test run the tracker\n", + "\n", + "Reload the test images and run the detector on them to test the tracker.\n", + "\n", + "On the first frame, it creates and returns one track per detection:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7Ekkj_XFGdfq" + }, + "outputs": [], + "source": [ + "example_frame_number = 52\n", + "image = tf.io.read_file(filenames[example_frame_number])\n", + "image = tf.io.decode_jpeg(image)\n", + "result = model_fn(image[tf.newaxis, ...])\n", + "detections = Detection.process_model_output(image, result)\n", + "\n", + "tracker = OpticalFlowTracker()\n", + "tracks = tracker.update(image.numpy(), detections, timestamp = 0)\n", + "\n", + "print(f'detections : {len(detections)}') \n", + "print(f'tracks : {len(tracks)}')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WovDYdNMII-n" + }, + "source": [ + 
"On the second frame, many of the detections are assigned to existing tracks:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7iFEKwgMGi5n" + }, + "outputs": [], + "source": [ + "image2 = tf.io.read_file(filenames[example_frame_number+5]) # five frames later\n", + "image2 = tf.io.decode_jpeg(image2)\n", + "result2 = model_fn(image2[tf.newaxis, ...])\n", + "detections2 = Detection.process_model_output(image2, result2)\n", + "\n", + "new_tracks = tracker.update(image2.numpy(), detections2, timestamp = 1000)\n", + "\n", + "print(f'detections : {len(detections2)}') \n", + "print(f'tracks : {len(new_tracks)}')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dbkedwiVrxnQ" + }, + "source": [ + "Now the track IDs should be consistent between the two frames:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "QexJR5gerw6q" + }, + "outputs": [], + "source": [ + "test_img = image2.numpy()\n", + "for track in tracks:\n", + " track.det.bbox.draw(test_img, label=track.id, color=(255, 255, 255))\n", + "\n", + "for track in new_tracks:\n", + " track.det.bbox.draw(test_img, label=track.id, color=(255, 140, 0))\n", + "\n", + "PIL.Image.fromarray(test_img)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OW5gGixy1osE" + }, + "source": [ + "## Perform the COTS detection inference and tracking" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "f21596933d08" + }, + "source": [ + "The main tracking loop performs the following steps: \n", + "\n", + "1. Load the images in order.\n", + "2. Run the model on the image.\n", + "3. Update the tracker with the new image and detections.\n", + "4. Keep information about each track (ID, current index and length) for analysis or display. 
\n", + "\n", + "The `TrackAnnotation` class below collects the data about each track:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "lESJE0qXxubm" + }, + "outputs": [], + "source": [ + "@dataclasses.dataclass(frozen=True)\n", + "class TrackAnnotation:\n", + " det: Detection\n", + " seq_id: int\n", + " seq_idx: int\n", + " seq_length: Optional[int] = None\n", + "\n", + " def replace(self, **kwargs):\n", + " d = self.__dict__.copy()\n", + " d.update(kwargs)\n", + " return type(self)(**d)\n", + "\n", + " def annotation_str(self):\n", + " return f\"{self.seq_id} ({self.seq_idx}/{self.seq_length})\"\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3863fb28cd34" + }, + "source": [ + "The `parse_image` function below takes `(index, filename)` pairs, loads each image as a tensor, and returns `(timestamp_ms, filename, image)` triples, assuming 30 fps:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Dn7efhr0GBGz" + }, + "outputs": [], + "source": [ + "# Read a JPEG image, decode it to a uint8 tensor, and compute its timestamp.\n", + "def parse_image(index, filename):\n", + " image = tf.io.read_file(filename)\n", + " image = tf.io.decode_jpeg(image)\n", + " timestamp_ms = 1000*index/30 # assuming 30fps\n", + " return (timestamp_ms, filename, image)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8f878e4b0852" + }, + "source": [ + "Here is the main tracker loop. Initially, the saved `TrackAnnotation` objects don't contain the track lengths; the lengths are collected separately in the `track_length_for_id` dict."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cqN8RGBgVbr4" + }, + "outputs": [], + "source": [ + "# Create a tracker object\n", + "tracker = OpticalFlowTracker(tid=1)\n", + "# Record tracking responses from the tracker\n", + "detection_result = []\n", + "# Record the length of each tracking sequence\n", + "track_length_for_id = {}\n", + "\n", + "# Create a data loader\n", + "file_list = sorted(glob.glob(f\"sample_images/{test_sequence_name}/*.jpg\"))\n", + "list_ds = tf.data.Dataset.from_tensor_slices(file_list).enumerate()\n", + "images_ds = list_ds.map(parse_image)\n", + "\n", + "# Traverse the dataset with batch size 1. The loop only processes element [0] of each batch, so don't change the batch size.\n", + "for timestamp_ms, file_path, images in tqdm(images_ds.batch(1, drop_remainder=True)):\n", + " # Get the detection results.\n", + " detections = Detection.process_model_output(images[0], model_fn(images))\n", + "\n", + " # Feed the detections and the corresponding timestamp to the tracker, and get the updated tracks.\n", + " tracks = tracker.update(images[0].numpy(), detections, timestamp_ms[0])\n", + " annotations = []\n", + " for track in tracks:\n", + " anno = TrackAnnotation(\n", + " det=track.det,\n", + " seq_id = track.id,\n", + " seq_idx = len(track.linked_dets)\n", + " )\n", + " annotations.append(anno)\n", + " track_length_for_id[track.id] = len(track.linked_dets)\n", + " \n", + " detection_result.append((file_path.numpy()[0].decode(), annotations))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "29306d7f32df" + }, + "source": [ + "Once the tracking loop has completed, you can update the track length (`seq_length`) for each annotation from the `track_length_for_id` dict:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oPSfnQ1o04Rx" + }, + "outputs": [], + "source": [ + "def update_annotation_lengths(detection_result, track_length_for_id):\n", + " new_result = []\n", + " for file_path, annotations in 
detection_result:\n", + " new_annotations = []\n", + " for anno in annotations:\n", + " anno = anno.replace(seq_length=track_length_for_id[anno.seq_id])\n", + " new_annotations.append(anno)\n", + " new_result.append((file_path, new_annotations))\n", + " return new_result" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "zda914lv1o_v" + }, + "outputs": [], + "source": [ + "detection_result = update_annotation_lengths(detection_result, track_length_for_id)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QkpmYRyFAMlM" + }, + "source": [ + "## Output the detection results and play the result video\n", + "\n", + "Once the inference is done, we draw the bounding boxes and track information onto each frame's image. Finally, we combine all frames into a video for visualisation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gWMJG7g95MGk" + }, + "outputs": [], + "source": [ + "detection_full_video_path = \"COTS_detection_full_size.mp4\"\n", + "detect_video_writer = cv2.VideoWriter(\n", + " filename=detection_full_video_path,\n", + " fourcc=cv2.VideoWriter_fourcc(*\"MP4V\"), \n", + " fps=15, \n", + " frameSize=size)\n", + "\n", + "for file_path, annotations in tqdm(detection_result):\n", + " image = cv2.imread(file_path)\n", + " for anno in annotations:\n", + " anno.det.bbox.draw(image, label=anno.annotation_str(), color=(0, 140, 255))\n", + " detect_video_writer.write(image)\n", + "cv2.destroyAllWindows()\n", + "\n", + "detect_video_writer.release()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9s1myz67jcV8" + }, + "outputs": [], + "source": [ + "subprocess.check_call([\n", + " \"ffmpeg\",\"-y\", \"-i\", detection_full_video_path,\n", + " \"-vf\",\"scale=800:-1\",\n", + " \"-crf\", \"18\",\n", + " \"-preset\", \"veryfast\",\n", + " \"-vcodec\", \"libx264\", detection_small_video_path])" + ] + }, + { + "cell_type": "code", +
"execution_count": null, + "metadata": { + "id": "wsK5cvX5jkL7" + }, + "outputs": [], + "source": [ + "embed_video_file(detection_small_video_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "n1oOgMR2zzIl" + }, + "source": [ + "The output video is now saved at `detection_full_video_path`. To download it, uncomment the following code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tyHucK8lbGXk" + }, + "outputs": [], + "source": [ + "#try:\n", + "# from google.colab import files\n", + "# files.download(detection_full_video_path)\n", + "#except ImportError:\n", + "# pass" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "collapsed_sections": [], + "name": "crown_of_thorns_starfish_detection_pipeline.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/official/projects/deepmac_maskrcnn/README.md b/official/projects/deepmac_maskrcnn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e0dc3cafa5b837aabb30948c19ff7571d82b8746 --- /dev/null +++ b/official/projects/deepmac_maskrcnn/README.md @@ -0,0 +1,129 @@ +# Mask R-CNN with deep mask heads + +This project brings insights from the DeepMAC model into the Mask R-CNN +architecture. Please see the paper +[The surprising impact of mask-head architecture on novel class segmentation](https://arxiv.org/abs/2104.00613) +for more details. + +## Code structure + +* This folder contains forks of a few Mask R-CNN files and repurposes them to + support deep mask heads. +* To see the benefits of using deep mask heads, it is important to train the + mask head with only ground-truth boxes. This is configured via the + `task.model.use_gt_boxes_for_masks` flag. +* The architecture of the mask head can be changed via the config value + `task.model.mask_head.convnet_variant`. Supported values are `"default"`, 
Supported values are `"default"`, + `"hourglass20"`, `"hourglass52"`, and `"hourglass100"`. +* The flag `task.model.mask_head.class_agnostic` trains the model in class + agnostic mode and `task.allowed_mask_class_ids` controls which classes are + allowed to have masks during training. +* Majority of experiments and ablations from the paper are perfomed with the + [DeepMAC model](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/deepmac.md) + in the Object Detection API code base. + +## Prerequisites + +### Prepare dataset + +Use [create_coco_tf_record.py](https://github.com/tensorflow/models/blob/master/official/vision/data/create_coco_tf_record.py) to create +the COCO dataset. The data needs to be store in a +[Google cloud storage bucket](https://cloud.google.com/storage/docs/creating-buckets) +so that it can be accessed by the TPU. + +### Start a TPU v3-32 instance + +See [TPU Quickstart](https://cloud.google.com/tpu/docs/quickstart) for +instructions. An example command would look like: + +```shell +ctpu up --name --zone --tpu-size=v3-32 --tf-version nightly +``` + +This model requires TF version `>= 2.5`. Currently, that is only available via a +`nightly` build on Cloud. + + +### Install requirements + +SSH into the TPU host with `gcloud compute ssh ` and execute the +following. + +```shell +$ git clone https://github.com/tensorflow/models.git +$ cd models +$ pip3 install -r official/requirements.txt +``` + +## Training Models + +The configurations can be found in the `configs/experiments` directory. You can +launch a training job by executing. + +```shell +$ export CONFIG=./official/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r50.yaml +$ export MODEL_DIR="gs://" +$ export ANNOTAION_FILE="gs://" +$ export TRAIN_DATA="gs://" +$ export EVAL_DATA="gs://" +# Overrides to access data. These can also be changed in the config file. 
+$ export OVERRIDES="task.validation_data.input_path=${EVAL_DATA},\ +task.train_data.input_path=${TRAIN_DATA},\ +task.annotation_file=${ANNOTAION_FILE},\ +runtime.distribution_strategy=tpu" + +$ python3 -m official.projects.deepmac_maskrcnn.train \ + --logtostderr \ + --mode=train_and_eval \ + --experiment=deep_mask_head_rcnn_resnetfpn_coco \ + --model_dir=$MODEL_DIR \ + --config_file=$CONFIG \ + --params_override=$OVERRIDES\ + --tpu= +``` + +`CONFIG_FILE` can be any file in the `configs/experiments` directory. +When using SpineNet models, please specify +`--experiment=deep_mask_head_rcnn_spinenet_coco` + +**Note:** The default eval batch size of 32 discards some samples during +validation. For accurate vaidation statistics, launch a dedicated eval job on +TPU `v3-8` and set batch size to 8. + +## Configurations + +In the following table, we report the Mask mAP of our models on the non-VOC +classes when only training with masks for the VOC calsses. Performance is +measured on the `coco-val2017` set. + +Backbone | Mask head | Config name | Mask mAP +:------------| :----------- | :-----------------------------------------------| -------: +ResNet-50 | Default | `deep_mask_head_rcnn_voc_r50.yaml` | 25.9 +ResNet-50 | Hourglass-52 | `deep_mask_head_rcnn_voc_r50_hg52.yaml` | 33.1 +ResNet-101 | Hourglass-52 | `deep_mask_head_rcnn_voc_r101_hg52.yaml` | 34.4 +SpienNet-143 | Hourglass-52 | `deep_mask_head_rcnn_voc_spinenet143_hg52.yaml` | 38.7 + +## Checkpoints +This model takes Image + boxes as input and produces per-box instance +masks as output. + +* [Mask-RCNN SpineNet backbone](https://storage.googleapis.com/tf_model_garden/vision/deepmac_maskrcnn/deepmarc_spinenet.zip) + +## See also + +* [DeepMAC model](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/deepmac.md) + in the Object Detection API code base. 
+* Project website - [git.io/deepmac](https://google.github.io/deepmac/) + +## Citation + +``` +@misc{birodkar2021surprising, + title={The surprising impact of mask-head architecture on novel class segmentation}, + author={Vighnesh Birodkar and Zhichao Lu and Siyang Li and Vivek Rathod and Jonathan Huang}, + year={2021}, + eprint={2104.00613}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` diff --git a/official/projects/deepmac_maskrcnn/__init__.py b/official/projects/deepmac_maskrcnn/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/deepmac_maskrcnn/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/deepmac_maskrcnn/common/__init__.py b/official/projects/deepmac_maskrcnn/common/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/deepmac_maskrcnn/common/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/deepmac_maskrcnn/common/registry_imports.py b/official/projects/deepmac_maskrcnn/common/registry_imports.py new file mode 100644 index 0000000000000000000000000000000000000000..018e01f61c158ba09ac461397ab1c4ec9cc0cebf --- /dev/null +++ b/official/projects/deepmac_maskrcnn/common/registry_imports.py @@ -0,0 +1,18 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Imports to configure Mask R-CNN with deep mask heads.""" + +# pylint: disable=unused-import +from official.projects.deepmac_maskrcnn.tasks import deep_mask_head_rcnn diff --git a/official/projects/deepmac_maskrcnn/configs/__init__.py b/official/projects/deepmac_maskrcnn/configs/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/deepmac_maskrcnn/configs/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/deepmac_maskrcnn/configs/deep_mask_head_rcnn.py b/official/projects/deepmac_maskrcnn/configs/deep_mask_head_rcnn.py new file mode 100644 index 0000000000000000000000000000000000000000..932e76dc883597658f0219b74410ef781b7703ae --- /dev/null +++ b/official/projects/deepmac_maskrcnn/configs/deep_mask_head_rcnn.py @@ -0,0 +1,196 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Configuration for Mask R-CNN with deep mask heads.""" + +import dataclasses +import os +from typing import Optional + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import optimization +from official.vision.configs import backbones +from official.vision.configs import common +from official.vision.configs import decoders +from official.vision.configs import maskrcnn as maskrcnn_config +from official.vision.configs import retinanet as retinanet_config + + +@dataclasses.dataclass +class DeepMaskHead(maskrcnn_config.MaskHead): + convnet_variant: str = 'default' + + +@dataclasses.dataclass +class DeepMaskHeadRCNN(maskrcnn_config.MaskRCNN): + mask_head: Optional[DeepMaskHead] = DeepMaskHead() + use_gt_boxes_for_masks: bool = False + + +@dataclasses.dataclass +class DeepMaskHeadRCNNTask(maskrcnn_config.MaskRCNNTask): + """Configuration for the deep mask head R-CNN task.""" + model: DeepMaskHeadRCNN = DeepMaskHeadRCNN() + + +@exp_factory.register_config_factory('deep_mask_head_rcnn_resnetfpn_coco') +def deep_mask_head_rcnn_resnetfpn_coco() -> cfg.ExperimentConfig: + """COCO object detection with Mask R-CNN with deep mask heads.""" + global_batch_size = 64 + steps_per_epoch = int(retinanet_config.COCO_TRAIN_EXAMPLES / + global_batch_size) + coco_val_samples = 5000 + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=DeepMaskHeadRCNNTask( + init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080', + init_checkpoint_modules='backbone', + annotation_file=os.path.join(maskrcnn_config.COCO_INPUT_PATH_BASE, + 'instances_val2017.json'), + model=DeepMaskHeadRCNN( + num_classes=91, input_size=[1024, 1024, 3], include_mask=True), # pytype: disable=wrong-keyword-args + losses=maskrcnn_config.Losses(l2_weight_decay=0.00004), + train_data=maskrcnn_config.DataConfig( + input_path=os.path.join(maskrcnn_config.COCO_INPUT_PATH_BASE, + 
'train*'), + is_training=True, + global_batch_size=global_batch_size, + parser=maskrcnn_config.Parser( + aug_rand_hflip=True, aug_scale_min=0.8, aug_scale_max=1.25)), + validation_data=maskrcnn_config.DataConfig( + input_path=os.path.join(maskrcnn_config.COCO_INPUT_PATH_BASE, + 'val*'), + is_training=False, + global_batch_size=8)), # pytype: disable=wrong-keyword-args + trainer=cfg.TrainerConfig( + train_steps=22500, + validation_steps=coco_val_samples // 8, + validation_interval=steps_per_epoch, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [15000, 20000], + 'values': [0.12, 0.012, 0.0012], + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 500, + 'warmup_learning_rate': 0.0067 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + return config + + +@exp_factory.register_config_factory('deep_mask_head_rcnn_spinenet_coco') +def deep_mask_head_rcnn_spinenet_coco() -> cfg.ExperimentConfig: + """COCO object detection with Mask R-CNN with SpineNet backbone.""" + steps_per_epoch = 463 + coco_val_samples = 5000 + train_batch_size = 256 + eval_batch_size = 8 + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=DeepMaskHeadRCNNTask( + annotation_file=os.path.join(maskrcnn_config.COCO_INPUT_PATH_BASE, + 'instances_val2017.json'), # pytype: disable=wrong-keyword-args + model=DeepMaskHeadRCNN( + backbone=backbones.Backbone( + type='spinenet', + spinenet=backbones.SpineNet( + model_id='49', + min_level=3, + max_level=7, + )), + decoder=decoders.Decoder( + type='identity', identity=decoders.Identity()), + anchor=maskrcnn_config.Anchor(anchor_size=3), + 
norm_activation=common.NormActivation(use_sync_bn=True), + num_classes=91, + input_size=[640, 640, 3], + min_level=3, + max_level=7, + include_mask=True), # pytype: disable=wrong-keyword-args + losses=maskrcnn_config.Losses(l2_weight_decay=0.00004), + train_data=maskrcnn_config.DataConfig( + input_path=os.path.join(maskrcnn_config.COCO_INPUT_PATH_BASE, + 'train*'), + is_training=True, + global_batch_size=train_batch_size, + parser=maskrcnn_config.Parser( + aug_rand_hflip=True, aug_scale_min=0.5, aug_scale_max=2.0)), + validation_data=maskrcnn_config.DataConfig( + input_path=os.path.join(maskrcnn_config.COCO_INPUT_PATH_BASE, + 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + drop_remainder=False)), # pytype: disable=wrong-keyword-args + trainer=cfg.TrainerConfig( + train_steps=steps_per_epoch * 350, + validation_steps=coco_val_samples // eval_batch_size, + validation_interval=steps_per_epoch, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [ + steps_per_epoch * 320, steps_per_epoch * 340 + ], + 'values': [0.32, 0.032, 0.0032], + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 2000, + 'warmup_learning_rate': 0.0067 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None', + 'task.model.min_level == task.model.backbone.spinenet.min_level', + 'task.model.max_level == task.model.backbone.spinenet.max_level', + ]) + return config diff --git a/official/vision/beta/projects/deepmac_maskrcnn/configs/deep_mask_head_rcnn_config_test.py b/official/projects/deepmac_maskrcnn/configs/deep_mask_head_rcnn_config_test.py similarity index 87% rename from 
official/vision/beta/projects/deepmac_maskrcnn/configs/deep_mask_head_rcnn_config_test.py rename to official/projects/deepmac_maskrcnn/configs/deep_mask_head_rcnn_config_test.py index 77920be603b84146fc7e648187827918f41ea496..03a7d52747349eb80fcf45685387c89d768f1706 100644 --- a/official/vision/beta/projects/deepmac_maskrcnn/configs/deep_mask_head_rcnn_config_test.py +++ b/official/projects/deepmac_maskrcnn/configs/deep_mask_head_rcnn_config_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,7 +16,7 @@ import tensorflow as tf -from official.vision.beta.projects.deepmac_maskrcnn.configs import deep_mask_head_rcnn +from official.projects.deepmac_maskrcnn.configs import deep_mask_head_rcnn class DeepMaskHeadRcnnConfigTest(tf.test.TestCase): diff --git a/official/vision/beta/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_nonvoc_spinenet143_hg52.yaml b/official/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_nonvoc_spinenet143_hg52.yaml similarity index 100% rename from official/vision/beta/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_nonvoc_spinenet143_hg52.yaml rename to official/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_nonvoc_spinenet143_hg52.yaml diff --git a/official/vision/beta/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r101_hg52.yaml b/official/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r101_hg52.yaml similarity index 100% rename from official/vision/beta/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r101_hg52.yaml rename to official/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r101_hg52.yaml diff --git 
a/official/vision/beta/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r50.yaml b/official/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r50.yaml similarity index 100% rename from official/vision/beta/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r50.yaml rename to official/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r50.yaml diff --git a/official/vision/beta/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r50_hg52.yaml b/official/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r50_hg52.yaml similarity index 100% rename from official/vision/beta/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r50_hg52.yaml rename to official/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r50_hg52.yaml diff --git a/official/vision/beta/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_spinenet143_hg52.yaml b/official/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_spinenet143_hg52.yaml similarity index 100% rename from official/vision/beta/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_spinenet143_hg52.yaml rename to official/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_spinenet143_hg52.yaml diff --git a/official/projects/deepmac_maskrcnn/modeling/__init__.py b/official/projects/deepmac_maskrcnn/modeling/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/deepmac_maskrcnn/modeling/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/deepmac_maskrcnn/modeling/heads/__init__.py b/official/projects/deepmac_maskrcnn/modeling/heads/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/deepmac_maskrcnn/modeling/heads/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ diff --git a/official/vision/beta/projects/deepmac_maskrcnn/modeling/heads/hourglass_network.py b/official/projects/deepmac_maskrcnn/modeling/heads/hourglass_network.py similarity index 99% rename from official/vision/beta/projects/deepmac_maskrcnn/modeling/heads/hourglass_network.py rename to official/projects/deepmac_maskrcnn/modeling/heads/hourglass_network.py index 8b73140457940d9a45c8f545f0bd085bda0daa8c..b6f3cac996d2d794d0a1e025f44641aa01f01e5a 100644 --- a/official/vision/beta/projects/deepmac_maskrcnn/modeling/heads/hourglass_network.py +++ b/official/projects/deepmac_maskrcnn/modeling/heads/hourglass_network.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/deepmac_maskrcnn/modeling/heads/instance_heads.py b/official/projects/deepmac_maskrcnn/modeling/heads/instance_heads.py new file mode 100644 index 0000000000000000000000000000000000000000..cec8bd3a49e10dd12e7346ecd7e23d8b759b1583 --- /dev/null +++ b/official/projects/deepmac_maskrcnn/modeling/heads/instance_heads.py @@ -0,0 +1,311 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Instance prediction heads.""" + +# Import libraries + +from absl import logging +import tensorflow as tf + +from official.modeling import tf_utils +from official.projects.deepmac_maskrcnn.modeling.heads import hourglass_network + + +class DeepMaskHead(tf.keras.layers.Layer): + """Creates a mask head.""" + + def __init__(self, + num_classes, + upsample_factor=2, + num_convs=4, + num_filters=256, + use_separable_conv=False, + activation='relu', + use_sync_bn=False, + norm_momentum=0.99, + norm_epsilon=0.001, + kernel_regularizer=None, + bias_regularizer=None, + class_agnostic=False, + convnet_variant='default', + **kwargs): + """Initializes a mask head. + + Args: + num_classes: An `int` of the number of classes. + upsample_factor: An `int` that indicates the upsample factor to generate + the final predicted masks. It should be >= 1. + num_convs: An `int` number that represents the number of the intermediate + convolution layers before the mask prediction layers. + num_filters: An `int` number that represents the number of filters of the + intermediate convolution layers. + use_separable_conv: A `bool` that indicates whether separable + convolution layers are used. + activation: A `str` that indicates which activation is used, e.g. 'relu', + 'swish', etc. + use_sync_bn: A `bool` that indicates whether to use synchronized batch + normalization across different replicas. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default is None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + class_agnostic: A `bool`. If set, we use a single channel mask head that + is shared between all classes. + convnet_variant: A `str` denoting the architecture of the network used in the + head. Supported options are 'default', 'hourglass20', 'hourglass52' + and 'hourglass100'.
+ **kwargs: Additional keyword arguments to be passed. + """ + super(DeepMaskHead, self).__init__(**kwargs) + self._config_dict = { + 'num_classes': num_classes, + 'upsample_factor': upsample_factor, + 'num_convs': num_convs, + 'num_filters': num_filters, + 'use_separable_conv': use_separable_conv, + 'activation': activation, + 'use_sync_bn': use_sync_bn, + 'norm_momentum': norm_momentum, + 'norm_epsilon': norm_epsilon, + 'kernel_regularizer': kernel_regularizer, + 'bias_regularizer': bias_regularizer, + 'class_agnostic': class_agnostic, + 'convnet_variant': convnet_variant, + } + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + self._activation = tf_utils.get_activation(activation) + + def _get_conv_op_and_kwargs(self): + conv_op = (tf.keras.layers.SeparableConv2D + if self._config_dict['use_separable_conv'] + else tf.keras.layers.Conv2D) + conv_kwargs = { + 'filters': self._config_dict['num_filters'], + 'kernel_size': 3, + 'padding': 'same', + } + if self._config_dict['use_separable_conv']: + conv_kwargs.update({ + 'depthwise_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'pointwise_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'bias_initializer': tf.zeros_initializer(), + 'depthwise_regularizer': self._config_dict['kernel_regularizer'], + 'pointwise_regularizer': self._config_dict['kernel_regularizer'], + 'bias_regularizer': self._config_dict['bias_regularizer'], + }) + else: + conv_kwargs.update({ + 'kernel_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'bias_initializer': tf.zeros_initializer(), + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + 'bias_regularizer': self._config_dict['bias_regularizer'], + }) + + return conv_op, conv_kwargs + + def 
_get_bn_op_and_kwargs(self): + + bn_op = (tf.keras.layers.experimental.SyncBatchNormalization + if self._config_dict['use_sync_bn'] + else tf.keras.layers.BatchNormalization) + bn_kwargs = { + 'axis': self._bn_axis, + 'momentum': self._config_dict['norm_momentum'], + 'epsilon': self._config_dict['norm_epsilon'], + } + + return bn_op, bn_kwargs + + def build(self, input_shape): + """Creates the variables of the head.""" + + conv_op, conv_kwargs = self._get_conv_op_and_kwargs() + + self._build_convnet_variant() + + self._deconv = tf.keras.layers.Conv2DTranspose( + filters=self._config_dict['num_filters'], + kernel_size=self._config_dict['upsample_factor'], + strides=self._config_dict['upsample_factor'], + padding='valid', + kernel_initializer=tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + bias_initializer=tf.zeros_initializer(), + kernel_regularizer=self._config_dict['kernel_regularizer'], + bias_regularizer=self._config_dict['bias_regularizer'], + name='mask-upsampling') + + bn_op, bn_kwargs = self._get_bn_op_and_kwargs() + self._deconv_bn = bn_op(name='mask-deconv-bn', **bn_kwargs) + + if self._config_dict['class_agnostic']: + num_filters = 1 + else: + num_filters = self._config_dict['num_classes'] + + conv_kwargs = { + 'filters': num_filters, + 'kernel_size': 1, + 'padding': 'valid', + } + if self._config_dict['use_separable_conv']: + conv_kwargs.update({ + 'depthwise_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'pointwise_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'bias_initializer': tf.zeros_initializer(), + 'depthwise_regularizer': self._config_dict['kernel_regularizer'], + 'pointwise_regularizer': self._config_dict['kernel_regularizer'], + 'bias_regularizer': self._config_dict['bias_regularizer'], + }) + else: + conv_kwargs.update({ + 'kernel_initializer': 
tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'bias_initializer': tf.zeros_initializer(), + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + 'bias_regularizer': self._config_dict['bias_regularizer'], + }) + self._mask_regressor = conv_op(name='mask-logits', **conv_kwargs) + + super(DeepMaskHead, self).build(input_shape) + + def call(self, inputs, training=None): + """Forward pass of mask branch for the Mask-RCNN model. + + Args: + inputs: A `list` of two tensors where + inputs[0]: A `tf.Tensor` of shape [batch_size, num_instances, + roi_height, roi_width, roi_channels], representing the ROI features. + inputs[1]: A `tf.Tensor` of shape [batch_size, num_instances], + representing the classes of the ROIs. + training: A `bool` indicating whether it is in `training` mode. + + Returns: + mask_outputs: A `tf.Tensor` of shape + [batch_size, num_instances, roi_height * upsample_factor, + roi_width * upsample_factor], representing the mask predictions. 
+ """ + roi_features, roi_classes = inputs + features_shape = tf.shape(roi_features) + batch_size, num_rois, height, width, filters = ( + features_shape[0], features_shape[1], features_shape[2], + features_shape[3], features_shape[4]) + if batch_size is None: + batch_size = tf.shape(roi_features)[0] + + x = tf.reshape(roi_features, [-1, height, width, filters]) + + x = self._call_convnet_variant(x) + + x = self._deconv(x) + x = self._deconv_bn(x) + x = self._activation(x) + + logits = self._mask_regressor(x) + + mask_height = height * self._config_dict['upsample_factor'] + mask_width = width * self._config_dict['upsample_factor'] + + if self._config_dict['class_agnostic']: + logits = tf.reshape(logits, [-1, num_rois, mask_height, mask_width, 1]) + else: + logits = tf.reshape( + logits, + [-1, num_rois, mask_height, mask_width, + self._config_dict['num_classes']]) + + batch_indices = tf.tile( + tf.expand_dims(tf.range(batch_size), axis=1), [1, num_rois]) + mask_indices = tf.tile( + tf.expand_dims(tf.range(num_rois), axis=0), [batch_size, 1]) + + if self._config_dict['class_agnostic']: + class_gather_indices = tf.zeros_like(roi_classes, dtype=tf.int32) + else: + class_gather_indices = tf.cast(roi_classes, dtype=tf.int32) + + gather_indices = tf.stack( + [batch_indices, mask_indices, class_gather_indices], + axis=2) + mask_outputs = tf.gather_nd( + tf.transpose(logits, [0, 1, 4, 2, 3]), gather_indices) + return mask_outputs + + def _build_convnet_variant(self): + + variant = self._config_dict['convnet_variant'] + if variant == 'default': + bn_op, bn_kwargs = self._get_bn_op_and_kwargs() + self._convs = [] + self._conv_norms = [] + for i in range(self._config_dict['num_convs']): + conv_name = 'mask-conv_{}'.format(i) + conv_op, conv_kwargs = self._get_conv_op_and_kwargs() + self._convs.append(conv_op(name=conv_name, **conv_kwargs)) + bn_name = 'mask-conv-bn_{}'.format(i) + self._conv_norms.append(bn_op(name=bn_name, **bn_kwargs)) + + elif variant == 'hourglass20': + 
logging.info('Using hourglass 20 network.') + self._hourglass = hourglass_network.hourglass_20( + self._config_dict['num_filters'], initial_downsample=False) + + elif variant == 'hourglass52': + logging.info('Using hourglass 52 network.') + self._hourglass = hourglass_network.hourglass_52( + self._config_dict['num_filters'], initial_downsample=False) + + elif variant == 'hourglass100': + logging.info('Using hourglass 100 network.') + self._hourglass = hourglass_network.hourglass_100( + self._config_dict['num_filters'], initial_downsample=False) + + else: + raise ValueError('Unknown ConvNet variant - {}'.format(variant)) + + def _call_convnet_variant(self, x): + + variant = self._config_dict['convnet_variant'] + if variant == 'default': + for conv, bn in zip(self._convs, self._conv_norms): + x = conv(x) + x = bn(x) + x = self._activation(x) + return x + elif variant == 'hourglass20': + return self._hourglass(x)[-1] + elif variant == 'hourglass52': + return self._hourglass(x)[-1] + elif variant == 'hourglass100': + return self._hourglass(x)[-1] + else: + raise ValueError('Unknown ConvNet variant - {}'.format(variant)) + + def get_config(self): + return self._config_dict + + @classmethod + def from_config(cls, config): + return cls(**config) diff --git a/official/projects/deepmac_maskrcnn/modeling/heads/instance_heads_test.py b/official/projects/deepmac_maskrcnn/modeling/heads/instance_heads_test.py new file mode 100644 index 0000000000000000000000000000000000000000..20cdc0fcab66ffc23bf465c435454e1b30540247 --- /dev/null +++ b/official/projects/deepmac_maskrcnn/modeling/heads/instance_heads_test.py @@ -0,0 +1,98 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for instance_heads.py.""" + +# Import libraries +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from official.projects.deepmac_maskrcnn.modeling.heads import instance_heads as deep_instance_heads + + +class MaskHeadTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (1, 1, False), + (1, 2, False), + (2, 1, False), + (2, 2, False), + ) + def test_forward(self, upsample_factor, num_convs, use_sync_bn): + mask_head = deep_instance_heads.DeepMaskHead( + num_classes=3, + upsample_factor=upsample_factor, + num_convs=num_convs, + num_filters=16, + use_separable_conv=False, + activation='relu', + use_sync_bn=use_sync_bn, + norm_momentum=0.99, + norm_epsilon=0.001, + kernel_regularizer=None, + bias_regularizer=None, + ) + roi_features = np.random.rand(2, 10, 14, 14, 16) + roi_classes = np.zeros((2, 10)) + masks = mask_head([roi_features, roi_classes]) + self.assertAllEqual( + masks.numpy().shape, + [2, 10, 14 * upsample_factor, 14 * upsample_factor]) + + def test_serialize_deserialize(self): + mask_head = deep_instance_heads.DeepMaskHead( + num_classes=3, + upsample_factor=2, + num_convs=1, + num_filters=256, + use_separable_conv=False, + activation='relu', + use_sync_bn=False, + norm_momentum=0.99, + norm_epsilon=0.001, + kernel_regularizer=None, + bias_regularizer=None, + ) + config = mask_head.get_config() + new_mask_head = deep_instance_heads.DeepMaskHead.from_config(config) + self.assertAllEqual( + mask_head.get_config(), new_mask_head.get_config()) + + def 
test_forward_class_agnostic(self): + mask_head = deep_instance_heads.DeepMaskHead( + num_classes=3, + class_agnostic=True + ) + roi_features = np.random.rand(2, 10, 14, 14, 16) + roi_classes = np.zeros((2, 10)) + masks = mask_head([roi_features, roi_classes]) + self.assertAllEqual(masks.numpy().shape, [2, 10, 28, 28]) + + def test_instance_head_hourglass(self): + mask_head = deep_instance_heads.DeepMaskHead( + num_classes=3, + class_agnostic=True, + convnet_variant='hourglass20', + num_filters=32, + upsample_factor=2 + ) + roi_features = np.random.rand(2, 10, 16, 16, 16) + roi_classes = np.zeros((2, 10)) + masks = mask_head([roi_features, roi_classes]) + self.assertAllEqual(masks.numpy().shape, [2, 10, 32, 32]) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/deepmac_maskrcnn/modeling/maskrcnn_model.py b/official/projects/deepmac_maskrcnn/modeling/maskrcnn_model.py new file mode 100644 index 0000000000000000000000000000000000000000..488485e2881c28c56d1d755a28929bc36bfef5ed --- /dev/null +++ b/official/projects/deepmac_maskrcnn/modeling/maskrcnn_model.py @@ -0,0 +1,221 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Mask R-CNN model.""" + +from typing import List, Mapping, Optional, Union + +# Import libraries + +from absl import logging +import tensorflow as tf + +from official.vision.modeling import maskrcnn_model + + +def resize_as(source, size): + + source = tf.transpose(source, (0, 2, 3, 1)) + source = tf.image.resize(source, (size, size)) + return tf.transpose(source, (0, 3, 1, 2)) + + +class DeepMaskRCNNModel(maskrcnn_model.MaskRCNNModel): + """The Mask R-CNN model.""" + + def __init__(self, + backbone: tf.keras.Model, + decoder: tf.keras.Model, + rpn_head: tf.keras.layers.Layer, + detection_head: Union[tf.keras.layers.Layer, + List[tf.keras.layers.Layer]], + roi_generator: tf.keras.layers.Layer, + roi_sampler: Union[tf.keras.layers.Layer, + List[tf.keras.layers.Layer]], + roi_aligner: tf.keras.layers.Layer, + detection_generator: tf.keras.layers.Layer, + mask_head: Optional[tf.keras.layers.Layer] = None, + mask_sampler: Optional[tf.keras.layers.Layer] = None, + mask_roi_aligner: Optional[tf.keras.layers.Layer] = None, + class_agnostic_bbox_pred: bool = False, + cascade_class_ensemble: bool = False, + min_level: Optional[int] = None, + max_level: Optional[int] = None, + num_scales: Optional[int] = None, + aspect_ratios: Optional[List[float]] = None, + anchor_size: Optional[float] = None, + use_gt_boxes_for_masks=False, + **kwargs): + """Initializes the Mask R-CNN model. + + Args: + backbone: `tf.keras.Model`, the backbone network. + decoder: `tf.keras.Model`, the decoder network. + rpn_head: the RPN head. + detection_head: the detection head or a list of heads. + roi_generator: the ROI generator. + roi_sampler: a single ROI sampler or a list of ROI samplers for cascade + detection heads. + roi_aligner: the ROI aligner. + detection_generator: the detection generator. + mask_head: the mask head. + mask_sampler: the mask sampler. + mask_roi_aligner: the ROI aligner for mask prediction.
+ class_agnostic_bbox_pred: if True, perform class agnostic bounding box + prediction. Needs to be `True` for Cascade RCNN models. + cascade_class_ensemble: if True, ensemble classification scores over all + detection heads. + min_level: Minimum level in output feature maps. + max_level: Maximum level in output feature maps. + num_scales: A number representing intermediate scales added on each level. + For instance, num_scales=2 adds one additional intermediate anchor + scales [2^0, 2^0.5] on each level. + aspect_ratios: A list representing the aspect ratio anchors added on each + level. The number indicates the ratio of width to height. For instance, + aspect_ratios=[1.0, 2.0, 0.5] adds three anchors on each scale level. + anchor_size: A number representing the scale of size of the base anchor to + the feature stride 2^level. + use_gt_boxes_for_masks: bool, if set, crop using groundtruth boxes instead + of proposals for training the mask head. + **kwargs: keyword arguments to be passed. + """ + super(DeepMaskRCNNModel, self).__init__( + backbone=backbone, + decoder=decoder, + rpn_head=rpn_head, + detection_head=detection_head, + roi_generator=roi_generator, + roi_sampler=roi_sampler, + roi_aligner=roi_aligner, + detection_generator=detection_generator, + mask_head=mask_head, + mask_sampler=mask_sampler, + mask_roi_aligner=mask_roi_aligner, + class_agnostic_bbox_pred=class_agnostic_bbox_pred, + cascade_class_ensemble=cascade_class_ensemble, + min_level=min_level, + max_level=max_level, + num_scales=num_scales, + aspect_ratios=aspect_ratios, + anchor_size=anchor_size, + **kwargs) + + self._config_dict['use_gt_boxes_for_masks'] = use_gt_boxes_for_masks + + def call(self, + images: tf.Tensor, + image_shape: tf.Tensor, + anchor_boxes: Optional[Mapping[str, tf.Tensor]] = None, + gt_boxes: Optional[tf.Tensor] = None, + gt_classes: Optional[tf.Tensor] = None, + gt_masks: Optional[tf.Tensor] = None, + training: Optional[bool] = None) -> Mapping[str, tf.Tensor]: +
model_outputs, intermediate_outputs = self._call_box_outputs( + images=images, image_shape=image_shape, anchor_boxes=anchor_boxes, + gt_boxes=gt_boxes, gt_classes=gt_classes, training=training) + if not self._include_mask: + return model_outputs + + model_mask_outputs = self._call_mask_outputs( + model_box_outputs=model_outputs, + features=model_outputs['decoder_features'], + current_rois=intermediate_outputs['current_rois'], + matched_gt_indices=intermediate_outputs['matched_gt_indices'], + matched_gt_boxes=intermediate_outputs['matched_gt_boxes'], + matched_gt_classes=intermediate_outputs['matched_gt_classes'], + gt_masks=gt_masks, + gt_classes=gt_classes, + gt_boxes=gt_boxes, + training=training) + model_outputs.update(model_mask_outputs) + return model_outputs + + def call_images_and_boxes(self, images, boxes): + """Predict masks given an image and bounding boxes.""" + + _, decoder_features = self._get_backbone_and_decoder_features(images) + boxes_shape = tf.shape(boxes) + batch_size, num_boxes = boxes_shape[0], boxes_shape[1] + classes = tf.zeros((batch_size, num_boxes), dtype=tf.int32) + + _, mask_probs = self._features_to_mask_outputs( + decoder_features, boxes, classes) + return { + 'detection_masks': mask_probs + } + + def _call_mask_outputs( + self, + model_box_outputs: Mapping[str, tf.Tensor], + features: tf.Tensor, + current_rois: tf.Tensor, + matched_gt_indices: tf.Tensor, + matched_gt_boxes: tf.Tensor, + matched_gt_classes: tf.Tensor, + gt_masks: tf.Tensor, + gt_classes: tf.Tensor, + gt_boxes: tf.Tensor, + training: Optional[bool] = None) -> Mapping[str, tf.Tensor]: + + model_outputs = dict(model_box_outputs) + if training: + if self._config_dict['use_gt_boxes_for_masks']: + mask_size = ( + self.mask_roi_aligner._config_dict['crop_size'] * # pylint:disable=protected-access + self.mask_head._config_dict['upsample_factor'] # pylint:disable=protected-access + ) + gt_masks = resize_as(source=gt_masks, size=mask_size) + + logging.info('Using GT class and 
mask targets.') + model_outputs.update({ + 'mask_class_targets': gt_classes, + 'mask_targets': gt_masks, + }) + else: + rois, roi_classes, roi_masks = self.mask_sampler( + current_rois, matched_gt_boxes, matched_gt_classes, + matched_gt_indices, gt_masks) + roi_masks = tf.stop_gradient(roi_masks) + model_outputs.update({ + 'mask_class_targets': roi_classes, + 'mask_targets': roi_masks, + }) + + else: + rois = model_outputs['detection_boxes'] + roi_classes = model_outputs['detection_classes'] + + # Mask RoI align. + if training and self._config_dict['use_gt_boxes_for_masks']: + logging.info('Using GT mask roi features.') + roi_aligner_boxes = gt_boxes + mask_head_classes = gt_classes + + else: + roi_aligner_boxes = rois + mask_head_classes = roi_classes + + mask_logits, mask_probs = self._features_to_mask_outputs( + features, roi_aligner_boxes, mask_head_classes) + + if training: + model_outputs.update({ + 'mask_outputs': mask_logits, + }) + else: + model_outputs.update({ + 'detection_masks': mask_probs, + }) + return model_outputs diff --git a/official/projects/deepmac_maskrcnn/modeling/maskrcnn_model_test.py b/official/projects/deepmac_maskrcnn/modeling/maskrcnn_model_test.py new file mode 100644 index 0000000000000000000000000000000000000000..08e9ab5376f36ac5c753504dfcee67438ab7299e --- /dev/null +++ b/official/projects/deepmac_maskrcnn/modeling/maskrcnn_model_test.py @@ -0,0 +1,153 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for maskrcnn_model.py.""" + +# Import libraries + +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from official.projects.deepmac_maskrcnn.modeling import maskrcnn_model +from official.projects.deepmac_maskrcnn.modeling.heads import instance_heads as deep_instance_heads +from official.vision.modeling.backbones import resnet +from official.vision.modeling.decoders import fpn +from official.vision.modeling.heads import dense_prediction_heads +from official.vision.modeling.heads import instance_heads +from official.vision.modeling.layers import detection_generator +from official.vision.modeling.layers import mask_sampler +from official.vision.modeling.layers import roi_aligner +from official.vision.modeling.layers import roi_generator +from official.vision.modeling.layers import roi_sampler +from official.vision.ops import anchor + + +def construct_model_and_anchors(image_size, use_gt_boxes_for_masks): + num_classes = 3 + min_level = 3 + max_level = 4 + num_scales = 3 + aspect_ratios = [1.0] + + anchor_boxes = anchor.Anchor( + min_level=min_level, + max_level=max_level, + num_scales=num_scales, + aspect_ratios=aspect_ratios, + anchor_size=3, + image_size=image_size).multilevel_boxes + num_anchors_per_location = len(aspect_ratios) * num_scales + + input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, 3]) + backbone = resnet.ResNet(model_id=50, input_specs=input_specs) + decoder = fpn.FPN( + min_level=min_level, + max_level=max_level, + input_specs=backbone.output_specs) + rpn_head = dense_prediction_heads.RPNHead( + min_level=min_level, + max_level=max_level, + num_anchors_per_location=num_anchors_per_location) + detection_head = instance_heads.DetectionHead( + num_classes=num_classes) + roi_generator_obj = roi_generator.MultilevelROIGenerator() + roi_sampler_obj = roi_sampler.ROISampler() + 
roi_aligner_obj = roi_aligner.MultilevelROIAligner() + detection_generator_obj = detection_generator.DetectionGenerator() + mask_head = deep_instance_heads.DeepMaskHead( + num_classes=num_classes, upsample_factor=2) + mask_sampler_obj = mask_sampler.MaskSampler( + mask_target_size=28, num_sampled_masks=1) + mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner(crop_size=14) + + model = maskrcnn_model.DeepMaskRCNNModel( + backbone, + decoder, + rpn_head, + detection_head, + roi_generator_obj, + roi_sampler_obj, + roi_aligner_obj, + detection_generator_obj, + mask_head, + mask_sampler_obj, + mask_roi_aligner_obj, + use_gt_boxes_for_masks=use_gt_boxes_for_masks) + + return model, anchor_boxes + + +class MaskRCNNModelTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (False, False,), + (False, True,), + (True, False,), + (True, True,), + ) + def test_forward(self, use_gt_boxes_for_masks, training): + image_size = (256, 256) + images = np.random.rand(2, image_size[0], image_size[1], 3) + image_shape = np.array([[224, 100], [100, 224]]) + model, anchor_boxes = construct_model_and_anchors( + image_size, use_gt_boxes_for_masks) + + gt_boxes = tf.zeros((2, 16, 4), dtype=tf.float32) + gt_masks = tf.zeros((2, 16, 32, 32)) + gt_classes = tf.zeros((2, 16), dtype=tf.int32) + results = model(images.astype(np.uint8), + image_shape, + anchor_boxes, + gt_boxes, + gt_classes, + gt_masks, + training=training) + + self.assertIn('rpn_boxes', results) + self.assertIn('rpn_scores', results) + if training: + self.assertIn('class_targets', results) + self.assertIn('box_targets', results) + self.assertIn('class_outputs', results) + self.assertIn('box_outputs', results) + self.assertIn('mask_outputs', results) + self.assertEqual(results['mask_targets'].shape, + results['mask_outputs'].shape) + else: + self.assertIn('detection_boxes', results) + self.assertIn('detection_scores', results) + self.assertIn('detection_classes', results) + 
self.assertIn('num_detections', results) + self.assertIn('detection_masks', results) + + @parameterized.parameters( + [(1, 5), (1, 10), (1, 15), (2, 5), (2, 10), (2, 15)] + ) + def test_image_and_boxes(self, batch_size, num_boxes): + image_size = (640, 640) + images = np.random.rand(batch_size, image_size[0], image_size[1], 3).astype( + np.float32) + model, _ = construct_model_and_anchors( + image_size, use_gt_boxes_for_masks=True) + + boxes = np.zeros((batch_size, num_boxes, 4), dtype=np.float32) + boxes[:, :, [2, 3]] = 1.0 + boxes = tf.constant(boxes) + results = model.call_images_and_boxes(images, boxes) + self.assertIn('detection_masks', results) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/deepmac_maskrcnn/serving/__init__.py b/official/projects/deepmac_maskrcnn/serving/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/deepmac_maskrcnn/serving/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+ diff --git a/official/projects/deepmac_maskrcnn/serving/detection.py b/official/projects/deepmac_maskrcnn/serving/detection.py new file mode 100644 index 0000000000000000000000000000000000000000..9e3bbfd28892173a015818c0906b7ce3e130a449 --- /dev/null +++ b/official/projects/deepmac_maskrcnn/serving/detection.py @@ -0,0 +1,139 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Detection input and model functions for serving/inference.""" + +from typing import Dict, Mapping, Text + +import tensorflow as tf + +from official.projects.deepmac_maskrcnn.configs import deep_mask_head_rcnn as cfg +from official.projects.deepmac_maskrcnn.modeling import maskrcnn_model +from official.projects.deepmac_maskrcnn.tasks import deep_mask_head_rcnn +from official.vision.ops import box_ops +from official.vision.serving import detection + + +def reverse_input_box_transformation(boxes, image_info): + """Reverse the Mask R-CNN model's input boxes transformation. + + Args: + boxes: A [batch_size, num_boxes, 4] float tensor of boxes in normalized + coordinates. + image_info: A [batch_size, 4, 2] `Tensor` that encodes the information of the image and the + applied preprocessing.
It is in the format of + [[original_height, original_width], [desired_height, desired_width], + [y_scale, x_scale], [y_offset, x_offset]], where [desired_height, + desired_width] is the actual scaled image size, and [y_scale, x_scale] is + the scaling factor, which is the ratio of + scaled dimension / original dimension. + + Returns: + boxes: Same shape as input `boxes` but in the absolute coordinate space of + the preprocessed image. + """ + # Reversing sequence from Detection_module.serve when + # output_normalized_coordinates=true + scale = image_info[:, 2:3, :] + scale = tf.tile(scale, [1, 1, 2]) + boxes = boxes * scale + height_width = image_info[:, 0:1, :] + return box_ops.denormalize_boxes(boxes, height_width) + + +class DetectionModule(detection.DetectionModule): + """Detection Module.""" + + def _build_model(self): + + if self._batch_size is None: + ValueError("batch_size can't be None for detection models") + if self.params.task.model.detection_generator.nms_version != 'batched': + ValueError('Only batched_nms is supported.') + input_specs = tf.keras.layers.InputSpec(shape=[self._batch_size] + + self._input_image_size + [3]) + + if isinstance(self.params.task.model, cfg.DeepMaskHeadRCNN): + model = deep_mask_head_rcnn.build_maskrcnn( + input_specs=input_specs, model_config=self.params.task.model) + else: + raise ValueError('Detection module not implemented for {} model.'.format( + type(self.params.task.model))) + + return model + + @tf.function + def inference_for_tflite_image_and_boxes( + self, images: tf.Tensor, boxes: tf.Tensor) -> Mapping[str, tf.Tensor]: + """A tf-function for serve_image_and_boxes. + + Args: + images: A [batch_size, height, width, channels] float tensor. + boxes: A [batch_size, num_boxes, 4] float tensor containing boxes + normalized to the input image. + + Returns: + result: A dict containing: + 'detection_masks': A [batch_size, num_boxes, mask_height, mask_width] + float tensor containing per-pixel mask probabilities. 
+ """ + + if not isinstance(self.model, maskrcnn_model.DeepMaskRCNNModel): + raise ValueError( + ('Can only use image and boxes input for DeepMaskRCNNModel, ' + 'Found {}'.format(type(self.model)))) + + return self.serve_image_and_boxes(images, boxes) + + def serve_image_and_boxes(self, images: tf.Tensor, boxes: tf.Tensor): + """Function used to export a model that consumes an image and boxes. + + The model predicts the class-agnostic masks at the given box locations. + + Args: + images: A [batch_size, height, width, channels] float tensor. + boxes: A [batch_size, num_boxes, 4] float tensor containing boxes + normalized to the input image. + + Returns: + result: A dict containing: + 'detection_masks': A [batch_size, num_boxes, mask_height, mask_width] + float tensor containing per-pixel mask probabilities. + """ + images, _, image_info = self.preprocess(images) + boxes = reverse_input_box_transformation(boxes, image_info) + result = self.model.call_images_and_boxes(images, boxes) + return result + + def get_inference_signatures(self, function_keys: Dict[Text, Text]): + signatures = {} + + if 'image_and_boxes_tensor' in function_keys: + def_name = function_keys['image_and_boxes_tensor'] + image_signature = tf.TensorSpec( + shape=[self._batch_size] + [None] * len(self._input_image_size) + + [self._num_channels], + dtype=tf.uint8) + boxes_signature = tf.TensorSpec(shape=[self._batch_size, None, 4], + dtype=tf.float32) + tf_function = self.inference_for_tflite_image_and_boxes + signatures[def_name] = tf_function.get_concrete_function( + image_signature, boxes_signature) + + function_keys.pop('image_and_boxes_tensor', None) + parent_signatures = super(DetectionModule, self).get_inference_signatures( + function_keys) + signatures.update(parent_signatures) + + return signatures diff --git a/official/projects/deepmac_maskrcnn/serving/detection_test.py b/official/projects/deepmac_maskrcnn/serving/detection_test.py new file mode 100644 index
0000000000000000000000000000000000000000..cd832e2821fcc126e9986faf53733069c146c2f1 --- /dev/null +++ b/official/projects/deepmac_maskrcnn/serving/detection_test.py @@ -0,0 +1,164 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Test for image detection export lib.""" + +import io +import os + +from absl.testing import parameterized +import numpy as np +from PIL import Image +import tensorflow as tf + +from official.core import exp_factory +from official.projects.deepmac_maskrcnn.serving import detection + + +class DetectionExportTest(tf.test.TestCase, parameterized.TestCase): + + def _get_detection_module(self, experiment_name, image_size=(640, 640)): + params = exp_factory.get_exp_config(experiment_name) + params.task.model.backbone.resnet.model_id = 18 + params.task.model.detection_generator.use_batched_nms = True + detection_module = detection.DetectionModule( + params, batch_size=1, input_image_size=list(image_size)) + return detection_module + + def _export_from_module(self, module, input_type, save_directory): + signatures = module.get_inference_signatures( + {input_type: 'serving_default'}) + tf.saved_model.save(module, save_directory, signatures=signatures) + + def _get_dummy_input(self, input_type, batch_size, image_size): + """Get dummy input for the given input type.""" + h, w = image_size + + if input_type == 'image_tensor': + return tf.zeros((batch_size, h, w, 3), dtype=np.uint8) + elif 
input_type == 'image_bytes': + image = Image.fromarray(np.zeros((h, w, 3), dtype=np.uint8)) + byte_io = io.BytesIO() + image.save(byte_io, 'PNG') + return [byte_io.getvalue() for b in range(batch_size)] + elif input_type == 'tf_example': + image_tensor = tf.zeros((h, w, 3), dtype=tf.uint8) + encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).numpy() + example = tf.train.Example( + features=tf.train.Features( + feature={ + 'image/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList(value=[encoded_jpeg])), + })).SerializeToString() + return [example for b in range(batch_size)] + + @parameterized.parameters( + ('image_tensor', 'deep_mask_head_rcnn_resnetfpn_coco', [640, 640]), + ('image_bytes', 'deep_mask_head_rcnn_resnetfpn_coco', [640, 384]), + ('tf_example', 'deep_mask_head_rcnn_resnetfpn_coco', [640, 640]), + ) + def test_export(self, input_type, experiment_name, image_size): + self.skipTest('a') + tmp_dir = self.get_temp_dir() + module = self._get_detection_module(experiment_name, image_size) + + self._export_from_module(module, input_type, tmp_dir) + + self.assertTrue(os.path.exists(os.path.join(tmp_dir, 'saved_model.pb'))) + self.assertTrue( + os.path.exists(os.path.join(tmp_dir, 'variables', 'variables.index'))) + self.assertTrue( + os.path.exists( + os.path.join(tmp_dir, 'variables', + 'variables.data-00000-of-00001'))) + + imported = tf.saved_model.load(tmp_dir) + detection_fn = imported.signatures['serving_default'] + + images = self._get_dummy_input( + input_type, batch_size=1, image_size=image_size) + + processed_images, anchor_boxes, image_info = module._build_inputs( + tf.zeros((224, 224, 3), dtype=tf.uint8)) + image_shape = image_info[1, :] + image_shape = tf.expand_dims(image_shape, 0) + processed_images = tf.expand_dims(processed_images, 0) + for l, l_boxes in anchor_boxes.items(): + anchor_boxes[l] = tf.expand_dims(l_boxes, 0) + + expected_outputs = module.model( + images=processed_images, + image_shape=image_shape, + 
anchor_boxes=anchor_boxes, + training=False) + outputs = detection_fn(tf.constant(images)) + + self.assertAllClose(outputs['num_detections'].numpy(), + expected_outputs['num_detections'].numpy()) + + @parameterized.parameters( + ('deep_mask_head_rcnn_resnetfpn_coco', [640, 640], 1), + ('deep_mask_head_rcnn_resnetfpn_coco', [640, 640], 5), + ('deep_mask_head_rcnn_spinenet_coco', [640, 384], 3), + ('deep_mask_head_rcnn_spinenet_coco', [640, 384], 9), + ) + def test_export_image_and_boxes(self, experiment_name, image_size, num_boxes): + tmp_dir = self.get_temp_dir() + module = self._get_detection_module(experiment_name) + + self._export_from_module(module, 'image_and_boxes_tensor', tmp_dir) + + self.assertTrue(os.path.exists(os.path.join(tmp_dir, 'saved_model.pb'))) + self.assertTrue( + os.path.exists(os.path.join(tmp_dir, 'variables', 'variables.index'))) + self.assertTrue( + os.path.exists( + os.path.join(tmp_dir, 'variables', + 'variables.data-00000-of-00001'))) + + imported = tf.saved_model.load(tmp_dir) + detection_fn = imported.signatures['serving_default'] + + images = self._get_dummy_input( + 'image_tensor', batch_size=1, image_size=image_size) + + processed_images, anchor_boxes, image_info = module._build_inputs( + tf.zeros(image_size + [3], dtype=tf.uint8)) + + image_shape = image_info[1, :] + image_shape = image_shape[tf.newaxis] + processed_images = processed_images[tf.newaxis] + image_info = image_info[tf.newaxis] + + for l, l_boxes in anchor_boxes.items(): + anchor_boxes[l] = tf.expand_dims(l_boxes, 0) + + boxes = np.zeros((1, num_boxes, 4), dtype=np.float32) + boxes[:, :, [2, 3]] = 1.0 + boxes = tf.constant(boxes) + + denormalized_boxes = detection.reverse_input_box_transformation( + boxes, image_info) + expected_outputs = module.model.call_images_and_boxes( + images=processed_images, boxes=denormalized_boxes) + outputs = detection_fn(images=tf.constant(images), boxes=boxes) + + self.assertAllClose(outputs['detection_masks'].numpy(), + 
expected_outputs['detection_masks'].numpy(), + rtol=1e-3, atol=1e-3) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/deepmac_maskrcnn/serving/export_saved_model.py b/official/projects/deepmac_maskrcnn/serving/export_saved_model.py new file mode 100644 index 0000000000000000000000000000000000000000..b88aa70eb8a59a51a29390a6199b214bef924acb --- /dev/null +++ b/official/projects/deepmac_maskrcnn/serving/export_saved_model.py @@ -0,0 +1,106 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Deepmac model export binary for serving/inference. 
+ +To export a trained checkpoint in saved_model format (shell script): + +CHECKPOINT_PATH=XX +EXPORT_DIR_PATH=XX +CONFIG_FILE_PATH=XX +export_saved_model --export_dir=${EXPORT_DIR_PATH}/ \ + --checkpoint_path=${CHECKPOINT_PATH} \ + --config_file=${CONFIG_FILE_PATH} \ + --batch_size=2 \ + --input_image_size=224,224 +To serve (python): +export_dir_path = XX +input_type = XX +input_images = XX +imported = tf.saved_model.load(export_dir_path) +model_fn = imported.signatures['serving_default'] +output = model_fn(input_images) +""" + +from absl import app +from absl import flags + +from official.core import exp_factory +from official.modeling import hyperparams +from official.projects.deepmac_maskrcnn.serving import detection +from official.projects.deepmac_maskrcnn.tasks import deep_mask_head_rcnn # pylint: disable=unused-import +from official.vision.serving import export_saved_model_lib + +FLAGS = flags.FLAGS + +flags.DEFINE_string('experiment', 'deep_mask_head_rcnn_resnetfpn_coco', + 'experiment type, e.g. deep_mask_head_rcnn_spinenet_coco') +flags.DEFINE_string('export_dir', None, 'The export directory.') +flags.DEFINE_string('checkpoint_path', None, 'Checkpoint path.') +flags.DEFINE_multi_string( + 'config_file', + default=None, + help='YAML/JSON files which specify overrides. The override order ' + 'follows the order of args. Note that each file ' + 'can be used as an override template to override the default parameters ' + 'specified in Python.
If the same parameter is specified in both ' + '`--config_file` and `--params_override`, `config_file` will be used ' + 'first, followed by params_override.') +flags.DEFINE_string( + 'params_override', '', + 'The JSON/YAML file or string which specifies the parameters to be overridden' + ' on top of `config_file` template.') +flags.DEFINE_integer('batch_size', None, 'The batch size.') +flags.DEFINE_string('input_type', 'image_tensor', + ('One of `image_tensor`, `image_bytes`, `tf_example` ' + 'or `image_and_boxes_tensor`.')) +flags.DEFINE_string( + 'input_image_size', '224,224', + 'The comma-separated string of two integers representing the height,width ' + 'of the input to the model.') + + +def main(_): + + params = exp_factory.get_exp_config(FLAGS.experiment) + for config_file in FLAGS.config_file or []: + params = hyperparams.override_params_dict( + params, config_file, is_strict=True) + if FLAGS.params_override: + params = hyperparams.override_params_dict( + params, FLAGS.params_override, is_strict=True) + + params.validate() + params.lock() + + export_module = detection.DetectionModule( + params=params, + batch_size=FLAGS.batch_size, + input_image_size=[int(x) for x in FLAGS.input_image_size.split(',')], + num_channels=3) + + export_saved_model_lib.export_inference_graph( + input_type=FLAGS.input_type, + batch_size=FLAGS.batch_size, + input_image_size=[int(x) for x in FLAGS.input_image_size.split(',')], + params=params, + checkpoint_path=FLAGS.checkpoint_path, + export_dir=FLAGS.export_dir, + export_module=export_module, + export_checkpoint_subdir='checkpoint', + export_saved_model_subdir='saved_model') + + +if __name__ == '__main__': + app.run(main) diff --git a/official/projects/deepmac_maskrcnn/tasks/__init__.py b/official/projects/deepmac_maskrcnn/tasks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/deepmac_maskrcnn/tasks/__init__.py @@ -0,0 +1,14
@@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/deepmac_maskrcnn/tasks/deep_mask_head_rcnn.py b/official/projects/deepmac_maskrcnn/tasks/deep_mask_head_rcnn.py new file mode 100644 index 0000000000000000000000000000000000000000..f15df962a025f87eac015b267a6bb7483dd7be18 --- /dev/null +++ b/official/projects/deepmac_maskrcnn/tasks/deep_mask_head_rcnn.py @@ -0,0 +1,194 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Mask R-CNN variant with support for deep mask heads.""" + +import tensorflow as tf + +from official.core import task_factory +from official.projects.deepmac_maskrcnn.configs import deep_mask_head_rcnn as deep_mask_head_rcnn_config +from official.projects.deepmac_maskrcnn.modeling import maskrcnn_model as deep_maskrcnn_model +from official.projects.deepmac_maskrcnn.modeling.heads import instance_heads as deep_instance_heads +from official.vision.modeling import backbones +from official.vision.modeling.decoders import factory as decoder_factory +from official.vision.modeling.heads import dense_prediction_heads +from official.vision.modeling.heads import instance_heads +from official.vision.modeling.layers import detection_generator +from official.vision.modeling.layers import mask_sampler +from official.vision.modeling.layers import roi_aligner +from official.vision.modeling.layers import roi_generator +from official.vision.modeling.layers import roi_sampler +from official.vision.tasks import maskrcnn + + +# Taken from modeling/factory.py +def build_maskrcnn(input_specs: tf.keras.layers.InputSpec, + model_config: deep_mask_head_rcnn_config.DeepMaskHeadRCNN, + l2_regularizer: tf.keras.regularizers.Regularizer = None): # pytype: disable=annotation-type-mismatch # typed-keras + """Builds Mask R-CNN model.""" + norm_activation_config = model_config.norm_activation + backbone = backbones.factory.build_backbone( + input_specs=input_specs, + backbone_config=model_config.backbone, + norm_activation_config=norm_activation_config, + l2_regularizer=l2_regularizer) + + decoder = decoder_factory.build_decoder( + input_specs=backbone.output_specs, + model_config=model_config, + l2_regularizer=l2_regularizer) + + rpn_head_config = model_config.rpn_head + roi_generator_config = model_config.roi_generator + roi_sampler_config = model_config.roi_sampler + roi_aligner_config = model_config.roi_aligner + detection_head_config = model_config.detection_head + generator_config = 
model_config.detection_generator + num_anchors_per_location = ( + len(model_config.anchor.aspect_ratios) * model_config.anchor.num_scales) + + rpn_head = dense_prediction_heads.RPNHead( + min_level=model_config.min_level, + max_level=model_config.max_level, + num_anchors_per_location=num_anchors_per_location, + num_convs=rpn_head_config.num_convs, + num_filters=rpn_head_config.num_filters, + use_separable_conv=rpn_head_config.use_separable_conv, + activation=norm_activation_config.activation, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon, + kernel_regularizer=l2_regularizer) + + detection_head = instance_heads.DetectionHead( + num_classes=model_config.num_classes, + num_convs=detection_head_config.num_convs, + num_filters=detection_head_config.num_filters, + use_separable_conv=detection_head_config.use_separable_conv, + num_fcs=detection_head_config.num_fcs, + fc_dims=detection_head_config.fc_dims, + activation=norm_activation_config.activation, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon, + kernel_regularizer=l2_regularizer) + + roi_generator_obj = roi_generator.MultilevelROIGenerator( + pre_nms_top_k=roi_generator_config.pre_nms_top_k, + pre_nms_score_threshold=roi_generator_config.pre_nms_score_threshold, + pre_nms_min_size_threshold=( + roi_generator_config.pre_nms_min_size_threshold), + nms_iou_threshold=roi_generator_config.nms_iou_threshold, + num_proposals=roi_generator_config.num_proposals, + test_pre_nms_top_k=roi_generator_config.test_pre_nms_top_k, + test_pre_nms_score_threshold=( + roi_generator_config.test_pre_nms_score_threshold), + test_pre_nms_min_size_threshold=( + roi_generator_config.test_pre_nms_min_size_threshold), + test_nms_iou_threshold=roi_generator_config.test_nms_iou_threshold, + 
test_num_proposals=roi_generator_config.test_num_proposals, + use_batched_nms=roi_generator_config.use_batched_nms) + + roi_sampler_obj = roi_sampler.ROISampler( + mix_gt_boxes=roi_sampler_config.mix_gt_boxes, + num_sampled_rois=roi_sampler_config.num_sampled_rois, + foreground_fraction=roi_sampler_config.foreground_fraction, + foreground_iou_threshold=roi_sampler_config.foreground_iou_threshold, + background_iou_high_threshold=( + roi_sampler_config.background_iou_high_threshold), + background_iou_low_threshold=( + roi_sampler_config.background_iou_low_threshold)) + + roi_aligner_obj = roi_aligner.MultilevelROIAligner( + crop_size=roi_aligner_config.crop_size, + sample_offset=roi_aligner_config.sample_offset) + + detection_generator_obj = detection_generator.DetectionGenerator( + apply_nms=True, + pre_nms_top_k=generator_config.pre_nms_top_k, + pre_nms_score_threshold=generator_config.pre_nms_score_threshold, + nms_iou_threshold=generator_config.nms_iou_threshold, + max_num_detections=generator_config.max_num_detections, + nms_version=generator_config.nms_version) + + if model_config.include_mask: + mask_head = deep_instance_heads.DeepMaskHead( + num_classes=model_config.num_classes, + upsample_factor=model_config.mask_head.upsample_factor, + num_convs=model_config.mask_head.num_convs, + num_filters=model_config.mask_head.num_filters, + use_separable_conv=model_config.mask_head.use_separable_conv, + activation=model_config.norm_activation.activation, + norm_momentum=model_config.norm_activation.norm_momentum, + norm_epsilon=model_config.norm_activation.norm_epsilon, + kernel_regularizer=l2_regularizer, + class_agnostic=model_config.mask_head.class_agnostic, + convnet_variant=model_config.mask_head.convnet_variant) + + mask_sampler_obj = mask_sampler.MaskSampler( + mask_target_size=( + model_config.mask_roi_aligner.crop_size * + model_config.mask_head.upsample_factor), + num_sampled_masks=model_config.mask_sampler.num_sampled_masks) + + mask_roi_aligner_obj = 
roi_aligner.MultilevelROIAligner( + crop_size=model_config.mask_roi_aligner.crop_size, + sample_offset=model_config.mask_roi_aligner.sample_offset) + else: + mask_head = None + mask_sampler_obj = None + mask_roi_aligner_obj = None + + model = deep_maskrcnn_model.DeepMaskRCNNModel( + backbone=backbone, + decoder=decoder, + rpn_head=rpn_head, + detection_head=detection_head, + roi_generator=roi_generator_obj, + roi_sampler=roi_sampler_obj, + roi_aligner=roi_aligner_obj, + detection_generator=detection_generator_obj, + mask_head=mask_head, + mask_sampler=mask_sampler_obj, + mask_roi_aligner=mask_roi_aligner_obj, + use_gt_boxes_for_masks=model_config.use_gt_boxes_for_masks) + return model + + +@task_factory.register_task_cls(deep_mask_head_rcnn_config.DeepMaskHeadRCNNTask) +class DeepMaskHeadRCNNTask(maskrcnn.MaskRCNNTask): + """Mask R-CNN with support for deep mask heads.""" + + def build_model(self): + """Build Mask R-CNN model.""" + + input_specs = tf.keras.layers.InputSpec( + shape=[None] + self.task_config.model.input_size) + + l2_weight_decay = self.task_config.losses.l2_weight_decay + # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. + # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) + # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) + l2_regularizer = (tf.keras.regularizers.l2( + l2_weight_decay / 2.0) if l2_weight_decay else None) + + model = build_maskrcnn( + input_specs=input_specs, + model_config=self.task_config.model, + l2_regularizer=l2_regularizer) + + if self.task_config.freeze_backbone: + model.backbone.trainable = False + + return model diff --git a/official/projects/deepmac_maskrcnn/train.py b/official/projects/deepmac_maskrcnn/train.py new file mode 100644 index 0000000000000000000000000000000000000000..ac866f51ded4ccfd185d9369432a74f602d44517 --- /dev/null +++ b/official/projects/deepmac_maskrcnn/train.py @@ -0,0 +1,71 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""TensorFlow Model Garden Vision training driver.""" + +from absl import app +from absl import flags +from absl import logging + +import gin + +from official.common import distribute_utils +from official.common import flags as tfm_flags +from official.core import task_factory +from official.core import train_lib +from official.core import train_utils +from official.modeling import performance +# pylint: disable=unused-import +from official.projects.deepmac_maskrcnn.common import registry_imports +# pylint: enable=unused-import + +FLAGS = flags.FLAGS + + +def main(_): + gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) + params = train_utils.parse_configuration(FLAGS) + model_dir = FLAGS.model_dir + if 'train' in FLAGS.mode: + # Pure eval modes do not output yaml files. Otherwise continuous eval job + # may race against the train job for writing the same file. + train_utils.serialize_config(params, model_dir) + + # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16' + # can have significant impact on model speeds by utilizing float16 in case of + # GPUs, and bfloat16 in the case of TPUs. 
loss_scale takes effect only when + # dtype is float16 + if params.runtime.mixed_precision_dtype: + performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) + distribution_strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=params.runtime.distribution_strategy, + all_reduce_alg=params.runtime.all_reduce_alg, + num_gpus=params.runtime.num_gpus, + tpu_address=params.runtime.tpu) + with distribution_strategy.scope(): + task = task_factory.get_task(params.task, logging_dir=model_dir) + logging.info('Training with task %s', task) + + train_lib.run_experiment( + distribution_strategy=distribution_strategy, + task=task, + mode=FLAGS.mode, + params=params, + model_dir=model_dir) + + train_utils.save_gin_config(FLAGS.mode, model_dir) + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(main) diff --git a/official/projects/detr/README.md b/official/projects/detr/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e8860f5e1eb881624608ba2c56b98538e7c7bf1d --- /dev/null +++ b/official/projects/detr/README.md @@ -0,0 +1,46 @@ +# End-to-End Object Detection with Transformers (DETR) + +[![DETR](https://img.shields.io/badge/DETR-arXiv.2005.12872-B3181B?)](https://arxiv.org/abs/2005.12872) + +TensorFlow 2 implementation of End-to-End Object Detection with Transformers. + +⚠️ Disclaimer: None of the datasets hyperlinked from this page are owned or +distributed by Google. The datasets are made available by third parties. +Please review the terms and conditions made available by the third parties +before using the data. + +## Scripts + +You can find the scripts to reproduce the following experiments in +`detr/experiments`.
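The epoch counts in the table below map onto the step values used by the configs and experiment scripts. The following is a minimal sketch of that arithmetic, assuming the COCO 2017 train split size (`COCO_TRAIN_EXAMPLES = 118287` in `configs/detr.py`), a global batch size of 64, and a learning-rate drop 100 epochs before the end, as in the `detr_coco` config. The `schedule` helper is hypothetical and is not part of the codebase.

```python
# Hypothetical helper: reproduces the epoch -> step arithmetic used by the
# DETR configs. Assumes COCO 2017 train (118,287 images) and a global batch
# size of 64, matching COCO_TRAIN_EXAMPLES and train_batch_size in
# configs/detr.py.
COCO_TRAIN_EXAMPLES = 118287
GLOBAL_BATCH_SIZE = 64


def schedule(total_epochs: int, decay_epochs_before_end: int = 100):
  """Returns (train_steps, lr_decay_boundary) for a stepwise LR schedule."""
  # Integer division drops the final partial batch of each epoch.
  steps_per_epoch = COCO_TRAIN_EXAMPLES // GLOBAL_BATCH_SIZE  # 1848
  train_steps = total_epochs * steps_per_epoch
  # The learning rate drops from 1e-4 to 1e-5 this many epochs before the end.
  decay_at = train_steps - decay_epochs_before_end * steps_per_epoch
  return train_steps, decay_at


print(schedule(300))  # (554400, 369600)
print(schedule(500))  # (924000, 739200)
```

Under these assumptions, the 300-epoch result reproduces the `trainer.train_steps=554400` and `boundaries="[369600]"` overrides in `detr_r50_300epochs.sh`, and the 500-epoch result matches the `detr_coco` defaults that `detr_r50_500epochs.sh` relies on.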
+ + +## DETR [COCO](https://cocodataset.org) ([ImageNet](https://www.image-net.org) pretrained) + +| Model | Resolution | Batch size | Epochs | Decay@ | Params (M) | Box AP | Dashboard | Checkpoint | Experiment | +| --------- | :--------: | ----------:| ------:| -----: | ---------: | -----: | --------: | ---------: | ---------: | +| DETR-ResNet-50 | 1333x1333 |64|300| 200 |41 | 40.6 | [tensorboard](https://tensorboard.dev/experiment/o2IEZnniRYu6pqViBeopIg/#scalars) | [ckpt](https://storage.googleapis.com/tf_model_garden/vision/detr/detr_resnet_50_300.tar.gz) | detr_r50_300epochs.sh | +| DETR-ResNet-50 | 1333x1333 |64|500| 400 |41 | 42.0| [tensorboard](https://tensorboard.dev/experiment/YFMDKpESR4yjocPh5HgfRw/) | [ckpt](https://storage.googleapis.com/tf_model_garden/vision/detr/detr_resnet_50_500.tar.gz) | detr_r50_500epochs.sh | +| DETR-ResNet-50 | 1333x1333 |64|300| 200 |41 | 40.6 | paper | NA | NA | +| DETR-ResNet-50 | 1333x1333 |64|500| 400 |41 | 42.0 | paper | NA | NA | +| DETR-DC5-ResNet-50 | 1333x1333 |64|500| 400 |41 | 43.3 | paper | NA | NA | + +## Need contribution: + +* Add DC5 support and update experiment table. + + +## Citing TensorFlow Model Garden + +If you find this codebase helpful in your research, please cite this repository. + +``` +@misc{tensorflowmodelgarden2020, + author = {Hongkun Yu and Chen Chen and Xianzhi Du and Yeqing Li and + Abdullah Rashwan and Le Hou and Pengchong Jin and Fan Yang and + Frederick Liu and Jaeyoun Kim and Jing Li}, + title = {{TensorFlow Model Garden}}, + howpublished = {\url{https://github.com/tensorflow/models}}, + year = {2020} +} +``` diff --git a/official/projects/detr/configs/detr.py b/official/projects/detr/configs/detr.py new file mode 100644 index 0000000000000000000000000000000000000000..bcdd50e95b0b0f659c10b1566cbdfb254ef1bee2 --- /dev/null +++ b/official/projects/detr/configs/detr.py @@ -0,0 +1,277 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""DETR configurations.""" + +import dataclasses +import os +from typing import List, Optional, Union + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import hyperparams +from official.projects.detr import optimization +from official.projects.detr.dataloaders import coco +from official.vision.configs import backbones +from official.vision.configs import common + + +@dataclasses.dataclass +class DataConfig(cfg.DataConfig): + """Input config for training.""" + input_path: str = '' + tfds_name: str = '' + tfds_split: str = 'train' + global_batch_size: int = 0 + is_training: bool = False + dtype: str = 'bfloat16' + decoder: common.DataDecoder = common.DataDecoder() + shuffle_buffer_size: int = 10000 + file_type: str = 'tfrecord' + drop_remainder: bool = True + + +@dataclasses.dataclass +class Losses(hyperparams.Config): + class_offset: int = 0 + lambda_cls: float = 1.0 + lambda_box: float = 5.0 + lambda_giou: float = 2.0 + background_cls_weight: float = 0.1 + l2_weight_decay: float = 1e-4 + + +@dataclasses.dataclass +class Detr(hyperparams.Config): + """DETR model definitions.""" + num_queries: int = 100 + hidden_size: int = 256 + num_classes: int = 91 # 0: background + num_encoder_layers: int = 6 + num_decoder_layers: int = 6 + input_size: List[int] = dataclasses.field(default_factory=list) + backbone: 
backbones.Backbone = backbones.Backbone(
type='resnet', resnet=backbones.ResNet(model_id=50, bn_trainable=False)) + norm_activation: common.NormActivation = common.NormActivation() + backbone_endpoint_name: str = '5' + + +@dataclasses.dataclass +class DetrTask(cfg.TaskConfig): + model: Detr = Detr() + train_data: cfg.DataConfig = cfg.DataConfig() + validation_data: cfg.DataConfig = cfg.DataConfig() + losses: Losses = Losses() + init_checkpoint: Optional[str] = None + init_checkpoint_modules: Union[str, List[str]] = 'all' # all, backbone + annotation_file: Optional[str] = None + per_category_metrics: bool = False + + +COCO_INPUT_PATH_BASE = 'coco' +COCO_TRAIN_EXAMPLES = 118287 +COCO_VAL_EXAMPLES = 5000 + + +@exp_factory.register_config_factory('detr_coco') +def detr_coco() -> cfg.ExperimentConfig: + """Config to get results that matches the paper.""" + train_batch_size = 64 + eval_batch_size = 64 + num_train_data = COCO_TRAIN_EXAMPLES + num_steps_per_epoch = num_train_data // train_batch_size + train_steps = 500 * num_steps_per_epoch # 500 epochs + decay_at = train_steps - 100 * num_steps_per_epoch # 400 epochs + config = cfg.ExperimentConfig( + task=DetrTask( + init_checkpoint='', + init_checkpoint_modules='backbone', + model=Detr( + num_classes=81, + input_size=[1333, 1333, 3], + norm_activation=common.NormActivation()), + losses=Losses(), + train_data=coco.COCODataConfig( + tfds_name='coco/2017', + tfds_split='train', + is_training=True, + global_batch_size=train_batch_size, + shuffle_buffer_size=1000, + ), + validation_data=coco.COCODataConfig( + tfds_name='coco/2017', + tfds_split='validation', + is_training=False, + global_batch_size=eval_batch_size, + drop_remainder=False)), + trainer=cfg.TrainerConfig( + train_steps=train_steps, + validation_steps=-1, + steps_per_loop=10000, + summary_interval=10000, + checkpoint_interval=10000, + validation_interval=10000, + max_to_keep=1, + best_checkpoint_export_subdir='best_ckpt', + best_checkpoint_eval_metric='AP', + 
optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'detr_adamw', + 'detr_adamw': { + 'weight_decay_rate': 1e-4, + 'global_clipnorm': 0.1, + # Avoid AdamW legacy behavior. + 'gradient_clip_norm': 0.0 + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [decay_at], + 'values': [0.0001, 1.0e-05] + } + }, + })), + restrictions=[ + 'task.train_data.is_training != None', + ]) + return config + + +@exp_factory.register_config_factory('detr_coco_tfrecord') +def detr_coco_tfrecord() -> cfg.ExperimentConfig: + """Config to get results that matches the paper.""" + train_batch_size = 64 + eval_batch_size = 64 + steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size + train_steps = 300 * steps_per_epoch # 300 epochs + decay_at = train_steps - 100 * steps_per_epoch # 200 epochs + config = cfg.ExperimentConfig( + task=DetrTask( + init_checkpoint='', + init_checkpoint_modules='backbone', + annotation_file=os.path.join(COCO_INPUT_PATH_BASE, + 'instances_val2017.json'), + model=Detr( + input_size=[1333, 1333, 3], + norm_activation=common.NormActivation()), + losses=Losses(), + train_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + shuffle_buffer_size=1000, + ), + validation_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + drop_remainder=False, + )), + trainer=cfg.TrainerConfig( + train_steps=train_steps, + validation_steps=COCO_VAL_EXAMPLES // eval_batch_size, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + validation_interval=5 * steps_per_epoch, + max_to_keep=1, + best_checkpoint_export_subdir='best_ckpt', + best_checkpoint_eval_metric='AP', + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'detr_adamw', + 'detr_adamw': { + 'weight_decay_rate': 1e-4, + 
'global_clipnorm': 0.1, + # Avoid AdamW legacy behavior. + 'gradient_clip_norm': 0.0 + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [decay_at], + 'values': [0.0001, 1.0e-05] + } + }, + })), + restrictions=[ + 'task.train_data.is_training != None', + ]) + return config + + +@exp_factory.register_config_factory('detr_coco_tfds') +def detr_coco_tfds() -> cfg.ExperimentConfig: + """Config to get results that matches the paper.""" + train_batch_size = 64 + eval_batch_size = 64 + steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size + train_steps = 300 * steps_per_epoch # 300 epochs + decay_at = train_steps - 100 * steps_per_epoch # 200 epochs + config = cfg.ExperimentConfig( + task=DetrTask( + init_checkpoint='', + init_checkpoint_modules='backbone', + model=Detr( + num_classes=81, + input_size=[1333, 1333, 3], + norm_activation=common.NormActivation()), + losses=Losses(class_offset=1), + train_data=DataConfig( + tfds_name='coco/2017', + tfds_split='train', + is_training=True, + global_batch_size=train_batch_size, + shuffle_buffer_size=1000, + ), + validation_data=DataConfig( + tfds_name='coco/2017', + tfds_split='validation', + is_training=False, + global_batch_size=eval_batch_size, + drop_remainder=False)), + trainer=cfg.TrainerConfig( + train_steps=train_steps, + validation_steps=COCO_VAL_EXAMPLES // eval_batch_size, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + validation_interval=5 * steps_per_epoch, + max_to_keep=1, + best_checkpoint_export_subdir='best_ckpt', + best_checkpoint_eval_metric='AP', + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'detr_adamw', + 'detr_adamw': { + 'weight_decay_rate': 1e-4, + 'global_clipnorm': 0.1, + # Avoid AdamW legacy behavior. 
+ 'gradient_clip_norm': 0.0 + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [decay_at], + 'values': [0.0001, 1.0e-05] + } + }, + })), + restrictions=[ + 'task.train_data.is_training != None', + ]) + return config diff --git a/official/projects/detr/configs/detr_test.py b/official/projects/detr/configs/detr_test.py new file mode 100644 index 0000000000000000000000000000000000000000..7a3f04d457689b5d8601ef8f8b2941840bcf088d --- /dev/null +++ b/official/projects/detr/configs/detr_test.py @@ -0,0 +1,51 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for detr.""" + +# pylint: disable=unused-import +from absl.testing import parameterized +import tensorflow as tf + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.projects.detr.configs import detr as exp_cfg +from official.projects.detr.dataloaders import coco + + +class DetrTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.parameters(('detr_coco',)) + def test_detr_configs_tfds(self, config_name): + config = exp_factory.get_exp_config(config_name) + self.assertIsInstance(config, cfg.ExperimentConfig) + self.assertIsInstance(config.task, exp_cfg.DetrTask) + self.assertIsInstance(config.task.train_data, coco.COCODataConfig) + config.task.train_data.is_training = None + with self.assertRaises(KeyError): + config.validate() + + @parameterized.parameters(('detr_coco_tfrecord'), ('detr_coco_tfds')) + def test_detr_configs(self, config_name): + config = exp_factory.get_exp_config(config_name) + self.assertIsInstance(config, cfg.ExperimentConfig) + self.assertIsInstance(config.task, exp_cfg.DetrTask) + self.assertIsInstance(config.task.train_data, cfg.DataConfig) + config.task.train_data.is_training = None + with self.assertRaises(KeyError): + config.validate() + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/detr/dataloaders/coco.py b/official/projects/detr/dataloaders/coco.py new file mode 100644 index 0000000000000000000000000000000000000000..e9c2e5fb37d524ae61664357310a546169595cd3 --- /dev/null +++ b/official/projects/detr/dataloaders/coco.py @@ -0,0 +1,157 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""COCO data loader for DETR.""" + +import dataclasses +from typing import Optional, Tuple +import tensorflow as tf + +from official.core import config_definitions as cfg +from official.core import input_reader +from official.vision.ops import box_ops +from official.vision.ops import preprocess_ops + + +@dataclasses.dataclass +class COCODataConfig(cfg.DataConfig): + """Data config for COCO.""" + output_size: Tuple[int, int] = (1333, 1333) + max_num_boxes: int = 100 + resize_scales: Tuple[int, ...] = ( + 480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800) + + +class COCODataLoader(): + """A class to load dataset for COCO detection task.""" + + def __init__(self, params: COCODataConfig): + self._params = params + + def preprocess(self, inputs): + """Preprocess COCO for DETR.""" + image = inputs['image'] + boxes = inputs['objects']['bbox'] + classes = inputs['objects']['label'] + 1 + is_crowd = inputs['objects']['is_crowd'] + + image = preprocess_ops.normalize_image(image) + if self._params.is_training: + image, boxes, _ = preprocess_ops.random_horizontal_flip(image, boxes) + + do_crop = tf.greater(tf.random.uniform([]), 0.5) + if do_crop: + # Rescale + boxes = box_ops.denormalize_boxes(boxes, tf.shape(image)[:2]) + index = tf.random.categorical(tf.zeros([1, 3]), 1)[0] + scales = tf.gather([400.0, 500.0, 600.0], index, axis=0) + short_side = scales[0] + image, image_info = preprocess_ops.resize_image(image, short_side) + boxes = preprocess_ops.resize_and_crop_boxes(boxes, + image_info[2, :], + image_info[1, :], + image_info[3, :]) + boxes = 
box_ops.normalize_boxes(boxes, image_info[1, :]) + + # Do cropping + shape = tf.cast(image_info[1], dtype=tf.int32) + h = tf.random.uniform( + [], 384, tf.math.minimum(shape[0], 600), dtype=tf.int32) + w = tf.random.uniform( + [], 384, tf.math.minimum(shape[1], 600), dtype=tf.int32) + i = tf.random.uniform([], 0, shape[0] - h + 1, dtype=tf.int32) + j = tf.random.uniform([], 0, shape[1] - w + 1, dtype=tf.int32) + image = tf.image.crop_to_bounding_box(image, i, j, h, w) + boxes = tf.clip_by_value( + (boxes[..., :] * tf.cast( + tf.stack([shape[0], shape[1], shape[0], shape[1]]), + dtype=tf.float32) - + tf.cast(tf.stack([i, j, i, j]), dtype=tf.float32)) / + tf.cast(tf.stack([h, w, h, w]), dtype=tf.float32), 0.0, 1.0) + scales = tf.constant( + self._params.resize_scales, + dtype=tf.float32) + index = tf.random.categorical(tf.zeros([1, 11]), 1)[0] + scales = tf.gather(scales, index, axis=0) + else: + scales = tf.constant([self._params.resize_scales[-1]], tf.float32) + + image_shape = tf.shape(image)[:2] + boxes = box_ops.denormalize_boxes(boxes, image_shape) + gt_boxes = boxes + short_side = scales[0] + image, image_info = preprocess_ops.resize_image( + image, + short_side, + max(self._params.output_size)) + boxes = preprocess_ops.resize_and_crop_boxes(boxes, + image_info[2, :], + image_info[1, :], + image_info[3, :]) + boxes = box_ops.normalize_boxes(boxes, image_info[1, :]) + + # Filters out ground truth boxes that are all zeros.
+ indices = box_ops.get_non_empty_box_indices(boxes) + boxes = tf.gather(boxes, indices) + classes = tf.gather(classes, indices) + is_crowd = tf.gather(is_crowd, indices) + boxes = box_ops.yxyx_to_cycxhw(boxes) + + image = tf.image.pad_to_bounding_box( + image, 0, 0, self._params.output_size[0], self._params.output_size[1]) + labels = { + 'classes': + preprocess_ops.clip_or_pad_to_fixed_size( + classes, self._params.max_num_boxes), + 'boxes': + preprocess_ops.clip_or_pad_to_fixed_size( + boxes, self._params.max_num_boxes) + } + if not self._params.is_training: + labels.update({ + 'id': + inputs['image/id'], + 'image_info': + image_info, + 'is_crowd': + preprocess_ops.clip_or_pad_to_fixed_size( + is_crowd, self._params.max_num_boxes), + 'gt_boxes': + preprocess_ops.clip_or_pad_to_fixed_size( + gt_boxes, self._params.max_num_boxes), + }) + + return image, labels + + def _transform_and_batch_fn( + self, + dataset, + input_context: Optional[tf.distribute.InputContext] = None): + """Preprocess and batch.""" + dataset = dataset.map( + self.preprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE) + per_replica_batch_size = input_context.get_per_replica_batch_size( + self._params.global_batch_size + ) if input_context else self._params.global_batch_size + dataset = dataset.batch( + per_replica_batch_size, drop_remainder=self._params.drop_remainder) + return dataset + + def load(self, input_context: Optional[tf.distribute.InputContext] = None): + """Returns a tf.dataset.Dataset.""" + reader = input_reader.InputReader( + params=self._params, + decoder_fn=None, + transform_and_batch_fn=self._transform_and_batch_fn) + return reader.read(input_context) diff --git a/official/projects/detr/dataloaders/coco_test.py b/official/projects/detr/dataloaders/coco_test.py new file mode 100644 index 0000000000000000000000000000000000000000..cad38e18c0091d84914449b9fef608134d4a36dc --- /dev/null +++ b/official/projects/detr/dataloaders/coco_test.py @@ -0,0 +1,111 @@ +# Copyright 2022 
The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for tensorflow_models.official.projects.detr.dataloaders.coco.""" + +from absl.testing import parameterized +import numpy as np +import tensorflow as tf +import tensorflow_datasets as tfds + +from official.projects.detr.dataloaders import coco + + +def _gen_fn(): + h = np.random.randint(0, 300) + w = np.random.randint(0, 300) + num_boxes = np.random.randint(0, 50) + return { + 'image': np.ones(shape=(h, w, 3), dtype=np.uint8), + 'image/id': np.random.randint(0, 100), + 'image/filename': 'test', + 'objects': { + 'is_crowd': np.ones(shape=(num_boxes), dtype=bool), + 'bbox': np.ones(shape=(num_boxes, 4), dtype=np.float32), + 'label': np.ones(shape=(num_boxes), dtype=np.int64), + 'id': np.ones(shape=(num_boxes), dtype=np.int64), + 'area': np.ones(shape=(num_boxes), dtype=np.int64), + } + } + + +class CocoDataloaderTest(tf.test.TestCase, parameterized.TestCase): + + def test_load_dataset(self): + output_size = 1280 + max_num_boxes = 100 + batch_size = 2 + data_config = coco.COCODataConfig( + tfds_name='coco/2017', + tfds_split='validation', + is_training=False, + global_batch_size=batch_size, + output_size=(output_size, output_size), + max_num_boxes=max_num_boxes, + ) + + num_examples = 10 + def as_dataset(self, *args, **kwargs): + del args + del kwargs + return tf.data.Dataset.from_generator( + lambda: (_gen_fn() for i in range(num_examples)),
output_types=self.info.features.dtype, + output_shapes=self.info.features.shape, + ) + + with tfds.testing.mock_data(num_examples=num_examples, + as_dataset_fn=as_dataset): + dataset = coco.COCODataLoader(data_config).load() + dataset_iter = iter(dataset) + images, labels = next(dataset_iter) + self.assertEqual(images.shape, (batch_size, output_size, output_size, 3)) + self.assertEqual(labels['classes'].shape, (batch_size, max_num_boxes)) + self.assertEqual(labels['boxes'].shape, (batch_size, max_num_boxes, 4)) + self.assertEqual(labels['id'].shape, (batch_size,)) + self.assertEqual( + labels['image_info'].shape, (batch_size, 4, 2)) + self.assertEqual(labels['is_crowd'].shape, (batch_size, max_num_boxes)) + + @parameterized.named_parameters( + ('training', True), + ('validation', False)) + def test_preprocess(self, is_training): + output_size = 1280 + max_num_boxes = 100 + batch_size = 2 + data_config = coco.COCODataConfig( + tfds_name='coco/2017', + tfds_split='validation', + is_training=is_training, + global_batch_size=batch_size, + output_size=(output_size, output_size), + max_num_boxes=max_num_boxes, + ) + + dl = coco.COCODataLoader(data_config) + inputs = _gen_fn() + image, label = dl.preprocess(inputs) + self.assertEqual(image.shape, (output_size, output_size, 3)) + self.assertEqual(label['classes'].shape, (max_num_boxes)) + self.assertEqual(label['boxes'].shape, (max_num_boxes, 4)) + if not is_training: + self.assertDTypeEqual(label['id'], int) + self.assertEqual( + label['image_info'].shape, (4, 2)) + self.assertEqual(label['is_crowd'].shape, (max_num_boxes)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/detr/dataloaders/detr_input.py b/official/projects/detr/dataloaders/detr_input.py new file mode 100644 index 0000000000000000000000000000000000000000..2085d56ac848193cd32fa6f60f21d32e1e2cd432 --- /dev/null +++ b/official/projects/detr/dataloaders/detr_input.py @@ -0,0 +1,175 @@ +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""COCO data loader for DETR.""" + +from typing import Tuple +import tensorflow as tf + +from official.vision.dataloaders import parser + +from official.vision.ops import box_ops +from official.vision.ops import preprocess_ops + +RESIZE_SCALES = (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800) + + +class Parser(parser.Parser): + """Parse an image and its annotations into a dictionary of tensors.""" + + def __init__(self, + class_offset: int = 0, + output_size: Tuple[int, int] = (1333, 1333), + max_num_boxes: int = 100, + resize_scales: Tuple[int, ...] = RESIZE_SCALES, + aug_rand_hflip=True): + self._class_offset = class_offset + self._output_size = output_size + self._max_num_boxes = max_num_boxes + self._resize_scales = resize_scales + self._aug_rand_hflip = aug_rand_hflip + + def _parse_train_data(self, data): + """Parses data for training and evaluation.""" + classes = data['groundtruth_classes'] + self._class_offset + boxes = data['groundtruth_boxes'] + is_crowd = data['groundtruth_is_crowd'] + + # Gets original image. + image = data['image'] + + # Normalizes image with mean and std pixel values. 
+ image = preprocess_ops.normalize_image(image) + image, boxes, _ = preprocess_ops.random_horizontal_flip(image, boxes) + + do_crop = tf.greater(tf.random.uniform([]), 0.5) + if do_crop: + # Rescale + boxes = box_ops.denormalize_boxes(boxes, tf.shape(image)[:2]) + index = tf.random.categorical(tf.zeros([1, 3]), 1)[0] + scales = tf.gather([400.0, 500.0, 600.0], index, axis=0) + short_side = scales[0] + image, image_info = preprocess_ops.resize_image(image, short_side) + boxes = preprocess_ops.resize_and_crop_boxes(boxes, image_info[2, :], + image_info[1, :], + image_info[3, :]) + boxes = box_ops.normalize_boxes(boxes, image_info[1, :]) + + # Do cropping + shape = tf.cast(image_info[1], dtype=tf.int32) + h = tf.random.uniform([], + 384, + tf.math.minimum(shape[0], 600), + dtype=tf.int32) + w = tf.random.uniform([], + 384, + tf.math.minimum(shape[1], 600), + dtype=tf.int32) + i = tf.random.uniform([], 0, shape[0] - h + 1, dtype=tf.int32) + j = tf.random.uniform([], 0, shape[1] - w + 1, dtype=tf.int32) + image = tf.image.crop_to_bounding_box(image, i, j, h, w) + boxes = tf.clip_by_value( + (boxes[..., :] * tf.cast( + tf.stack([shape[0], shape[1], shape[0], shape[1]]), + dtype=tf.float32) - + tf.cast(tf.stack([i, j, i, j]), dtype=tf.float32)) / + tf.cast(tf.stack([h, w, h, w]), dtype=tf.float32), 0.0, 1.0) + scales = tf.constant(self._resize_scales, dtype=tf.float32) + index = tf.random.categorical(tf.zeros([1, 11]), 1)[0] + scales = tf.gather(scales, index, axis=0) + + image_shape = tf.shape(image)[:2] + boxes = box_ops.denormalize_boxes(boxes, image_shape) + short_side = scales[0] + image, image_info = preprocess_ops.resize_image(image, short_side, + max(self._output_size)) + boxes = preprocess_ops.resize_and_crop_boxes(boxes, image_info[2, :], + image_info[1, :], + image_info[3, :]) + boxes = box_ops.normalize_boxes(boxes, image_info[1, :]) + + # Filters out ground truth boxes that are all zeros.
+ indices = box_ops.get_non_empty_box_indices(boxes) + boxes = tf.gather(boxes, indices) + classes = tf.gather(classes, indices) + is_crowd = tf.gather(is_crowd, indices) + boxes = box_ops.yxyx_to_cycxhw(boxes) + + image = tf.image.pad_to_bounding_box(image, 0, 0, self._output_size[0], + self._output_size[1]) + labels = { + 'classes': + preprocess_ops.clip_or_pad_to_fixed_size(classes, + self._max_num_boxes), + 'boxes': + preprocess_ops.clip_or_pad_to_fixed_size(boxes, self._max_num_boxes) + } + + return image, labels + + def _parse_eval_data(self, data): + """Parses data for training and evaluation.""" + classes = data['groundtruth_classes'] + boxes = data['groundtruth_boxes'] + is_crowd = data['groundtruth_is_crowd'] + + # Gets original image and its size. + image = data['image'] + + # Normalizes image with mean and std pixel values. + image = preprocess_ops.normalize_image(image) + + scales = tf.constant([self._resize_scales[-1]], tf.float32) + + image_shape = tf.shape(image)[:2] + boxes = box_ops.denormalize_boxes(boxes, image_shape) + gt_boxes = boxes + short_side = scales[0] + image, image_info = preprocess_ops.resize_image(image, short_side, + max(self._output_size)) + boxes = preprocess_ops.resize_and_crop_boxes(boxes, image_info[2, :], + image_info[1, :], + image_info[3, :]) + boxes = box_ops.normalize_boxes(boxes, image_info[1, :]) + + # Filters out ground truth boxes that are all zeros. 
+ indices = box_ops.get_non_empty_box_indices(boxes) + boxes = tf.gather(boxes, indices) + classes = tf.gather(classes, indices) + is_crowd = tf.gather(is_crowd, indices) + boxes = box_ops.yxyx_to_cycxhw(boxes) + + image = tf.image.pad_to_bounding_box(image, 0, 0, self._output_size[0], + self._output_size[1]) + labels = { + 'classes': + preprocess_ops.clip_or_pad_to_fixed_size(classes, + self._max_num_boxes), + 'boxes': + preprocess_ops.clip_or_pad_to_fixed_size(boxes, self._max_num_boxes) + } + labels.update({ + 'id': + int(data['source_id']), + 'image_info': + image_info, + 'is_crowd': + preprocess_ops.clip_or_pad_to_fixed_size(is_crowd, + self._max_num_boxes), + 'gt_boxes': + preprocess_ops.clip_or_pad_to_fixed_size(gt_boxes, + self._max_num_boxes), + }) + + return image, labels diff --git a/official/projects/detr/experiments/detr_r50_300epochs.sh b/official/projects/detr/experiments/detr_r50_300epochs.sh new file mode 100644 index 0000000000000000000000000000000000000000..162f974306bfdaed79cf0f092285bb614ec7fc69 --- /dev/null +++ b/official/projects/detr/experiments/detr_r50_300epochs.sh @@ -0,0 +1,6 @@ +#!/bin/bash +python3 official/projects/detr/train.py \ + --experiment=detr_coco \ + --mode=train_and_eval \ + --model_dir=/tmp/logging_dir/ \ + --params_override=task.init_checkpoint='gs://tf_model_garden/vision/resnet50_imagenet/ckpt-62400',trainer.train_steps=554400,trainer.optimizer_config.learning_rate.stepwise.boundaries="[369600]" diff --git a/official/projects/detr/experiments/detr_r50_500epochs.sh b/official/projects/detr/experiments/detr_r50_500epochs.sh new file mode 100644 index 0000000000000000000000000000000000000000..58036040578a73f35ef663c91aa5ab828c90cb4b --- /dev/null +++ b/official/projects/detr/experiments/detr_r50_500epochs.sh @@ -0,0 +1,6 @@ +#!/bin/bash +python3 official/projects/detr/train.py \ + --experiment=detr_coco \ + --mode=train_and_eval \ + --model_dir=/tmp/logging_dir/ \ + 
--params_override=task.init_checkpoint='gs://tf_model_garden/vision/resnet50_imagenet/ckpt-62400' diff --git a/official/projects/detr/modeling/detr.py b/official/projects/detr/modeling/detr.py new file mode 100644 index 0000000000000000000000000000000000000000..a3051aa796e8259d91040bfbf3a7687ddb7efd52 --- /dev/null +++ b/official/projects/detr/modeling/detr.py @@ -0,0 +1,297 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Implements End-to-End Object Detection with Transformers. + +Model paper: https://arxiv.org/abs/2005.12872 +This module does not support Keras de/serialization. Please use +tf.train.Checkpoint for object-based saving and loading and tf.saved_model.save +for graph serialization. +""" +import math +from typing import Any, List + +import tensorflow as tf + +from official.modeling import tf_utils +from official.projects.detr.modeling import transformer + + +def position_embedding_sine(attention_mask, + num_pos_features=256, + temperature=10000., + normalize=True, + scale=2 * math.pi): + """Sine-based positional embeddings for 2D images.
+ + Args: + attention_mask: a `bool` Tensor specifying the size of the input image to + the Transformer and which elements are padded, of size [batch_size, + height, width] + num_pos_features: an `int` specifying the number of positional features, + should be equal to the hidden size of the Transformer network + temperature: a `float` specifying the temperature of the positional + embedding. Any type that is converted to a `float` can also be accepted. + normalize: a `bool` determining whether the positional embeddings should be + normalized between [0, scale] before application of the sine and cos + functions. + scale: a `float` specifying the scale of the embeddings when `normalize` + is True, applied before the sine and cos functions. + + Returns: + embeddings: a `float32` tensor of shape [batch_size, height, width, + num_pos_features] containing the sine-based positional embeddings. + """ + if num_pos_features % 2 != 0: + raise ValueError( + "Number of embedding features (num_pos_features) must be even when " + "column and row embeddings are concatenated.") + num_pos_features = num_pos_features // 2 + + # Produce row and column embeddings based on total size of the image + # [batch_size, height, width] + attention_mask = tf.cast(attention_mask, tf.float32) + row_embedding = tf.cumsum(attention_mask, 1) + col_embedding = tf.cumsum(attention_mask, 2) + + if normalize: + eps = 1e-6 + row_embedding = row_embedding / (row_embedding[:, -1:, :] + eps) * scale + col_embedding = col_embedding / (col_embedding[:, :, -1:] + eps) * scale + + dim_t = tf.range(num_pos_features, dtype=row_embedding.dtype) + dim_t = tf.pow(temperature, 2 * (dim_t // 2) / num_pos_features) + + # Creates positional embeddings for each row and column position + # [batch_size, height, width, num_pos_features] + pos_row = tf.expand_dims(row_embedding, -1) / dim_t + pos_col = tf.expand_dims(col_embedding, -1) / dim_t + pos_row = tf.stack( + [tf.sin(pos_row[:, :, :, 0::2]), + tf.cos(pos_row[:, :, :, 1::2])], axis=4) +
pos_col = tf.stack( + [tf.sin(pos_col[:, :, :, 0::2]), + tf.cos(pos_col[:, :, :, 1::2])], axis=4) + + # final_shape = pos_row.shape.as_list()[:3] + [-1] + final_shape = tf_utils.get_shape_list(pos_row)[:3] + [-1] + pos_row = tf.reshape(pos_row, final_shape) + pos_col = tf.reshape(pos_col, final_shape) + output = tf.concat([pos_row, pos_col], -1) + + embeddings = tf.cast(output, tf.float32) + return embeddings + + +class DETR(tf.keras.Model): + """DETR model with Keras. + + DETR consists of backbone, query embedding, DETRTransformer, + class and box heads. + """ + + def __init__(self, + backbone, + backbone_endpoint_name, + num_queries, + hidden_size, + num_classes, + num_encoder_layers=6, + num_decoder_layers=6, + dropout_rate=0.1, + **kwargs): + super().__init__(**kwargs) + self._num_queries = num_queries + self._hidden_size = hidden_size + self._num_classes = num_classes + self._num_encoder_layers = num_encoder_layers + self._num_decoder_layers = num_decoder_layers + self._dropout_rate = dropout_rate + if hidden_size % 2 != 0: + raise ValueError("hidden_size must be a multiple of 2.") + self._backbone = backbone + self._backbone_endpoint_name = backbone_endpoint_name + + def build(self, input_shape=None): + self._input_proj = tf.keras.layers.Conv2D( + self._hidden_size, 1, name="detr/conv2d") + self._build_detection_decoder() + super().build(input_shape) + + def _build_detection_decoder(self): + """Builds detection decoder.""" + self._transformer = DETRTransformer( + num_encoder_layers=self._num_encoder_layers, + num_decoder_layers=self._num_decoder_layers, + dropout_rate=self._dropout_rate) + self._query_embeddings = self.add_weight( + "detr/query_embeddings", + shape=[self._num_queries, self._hidden_size], + initializer=tf.keras.initializers.RandomNormal(mean=0., stddev=1.), + dtype=tf.float32) + sqrt_k = math.sqrt(1.0 / self._hidden_size) + self._class_embed = tf.keras.layers.Dense( + self._num_classes, + 
kernel_initializer=tf.keras.initializers.RandomUniform(-sqrt_k, sqrt_k), + name="detr/cls_dense") + self._bbox_embed = [ + tf.keras.layers.Dense( + self._hidden_size, activation="relu", + kernel_initializer=tf.keras.initializers.RandomUniform( + -sqrt_k, sqrt_k), + name="detr/box_dense_0"), + tf.keras.layers.Dense( + self._hidden_size, activation="relu", + kernel_initializer=tf.keras.initializers.RandomUniform( + -sqrt_k, sqrt_k), + name="detr/box_dense_1"), + tf.keras.layers.Dense( + 4, kernel_initializer=tf.keras.initializers.RandomUniform( + -sqrt_k, sqrt_k), + name="detr/box_dense_2")] + self._sigmoid = tf.keras.layers.Activation("sigmoid") + + @property + def backbone(self) -> tf.keras.Model: + return self._backbone + + def get_config(self): + return { + "backbone": self._backbone, + "backbone_endpoint_name": self._backbone_endpoint_name, + "num_queries": self._num_queries, + "hidden_size": self._hidden_size, + "num_classes": self._num_classes, + "num_encoder_layers": self._num_encoder_layers, + "num_decoder_layers": self._num_decoder_layers, + "dropout_rate": self._dropout_rate, + } + + @classmethod + def from_config(cls, config): + return cls(**config) + + def _generate_image_mask(self, inputs: tf.Tensor, + target_shape: tf.Tensor) -> tf.Tensor: + """Generates image mask from input image.""" + mask = tf.expand_dims( + tf.cast(tf.not_equal(tf.reduce_sum(inputs, axis=-1), 0), inputs.dtype), + axis=-1) + mask = tf.image.resize( + mask, target_shape, method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) + return mask + + def call(self, inputs: tf.Tensor) -> List[Any]: + batch_size = tf.shape(inputs)[0] + features = self._backbone(inputs)[self._backbone_endpoint_name] + shape = tf.shape(features) + mask = self._generate_image_mask(inputs, shape[1: 3]) + + pos_embed = position_embedding_sine( + mask[:, :, :, 0], num_pos_features=self._hidden_size) + pos_embed = tf.reshape(pos_embed, [batch_size, -1, self._hidden_size]) + + features = tf.reshape( + 
self._input_proj(features), [batch_size, -1, self._hidden_size]) + mask = tf.reshape(mask, [batch_size, -1]) + + decoded_list = self._transformer({ + "inputs": + features, + "targets": + tf.tile( + tf.expand_dims(self._query_embeddings, axis=0), + (batch_size, 1, 1)), + "pos_embed": pos_embed, + "mask": mask, + }) + out_list = [] + for decoded in decoded_list: + decoded = tf.stack(decoded) + output_class = self._class_embed(decoded) + box_out = decoded + for layer in self._bbox_embed: + box_out = layer(box_out) + output_coord = self._sigmoid(box_out) + out = {"cls_outputs": output_class, "box_outputs": output_coord} + out_list.append(out) + return out_list + + +class DETRTransformer(tf.keras.layers.Layer): + """Encoder and Decoder of DETR.""" + + def __init__(self, num_encoder_layers=6, num_decoder_layers=6, + dropout_rate=0.1, **kwargs): + super().__init__(**kwargs) + self._dropout_rate = dropout_rate + self._num_encoder_layers = num_encoder_layers + self._num_decoder_layers = num_decoder_layers + + def build(self, input_shape=None): + if self._num_encoder_layers > 0: + self._encoder = transformer.TransformerEncoder( + attention_dropout_rate=self._dropout_rate, + dropout_rate=self._dropout_rate, + intermediate_dropout=self._dropout_rate, + norm_first=False, + num_layers=self._num_encoder_layers) + else: + self._encoder = None + + self._decoder = transformer.TransformerDecoder( + attention_dropout_rate=self._dropout_rate, + dropout_rate=self._dropout_rate, + intermediate_dropout=self._dropout_rate, + norm_first=False, + num_layers=self._num_decoder_layers) + super().build(input_shape) + + def get_config(self): + return { + "num_encoder_layers": self._num_encoder_layers, + "num_decoder_layers": self._num_decoder_layers, + "dropout_rate": self._dropout_rate, + } + + def call(self, inputs): + sources = inputs["inputs"] + targets = inputs["targets"] + pos_embed = inputs["pos_embed"] + mask = inputs["mask"] + input_shape = tf_utils.get_shape_list(sources) + 
source_attention_mask = tf.tile( + tf.expand_dims(mask, axis=1), [1, input_shape[1], 1]) + if self._encoder is not None: + memory = self._encoder( + sources, attention_mask=source_attention_mask, pos_embed=pos_embed) + else: + memory = sources + + target_shape = tf_utils.get_shape_list(targets) + cross_attention_mask = tf.tile( + tf.expand_dims(mask, axis=1), [1, target_shape[1], 1]) + target_shape = tf.shape(targets) + decoded = self._decoder( + tf.zeros_like(targets), + memory, + # TODO(b/199545430): self_attention_mask could be set to None when this + # bug is resolved. Passing ones for now. + self_attention_mask=tf.ones( + (target_shape[0], target_shape[1], target_shape[1])), + cross_attention_mask=cross_attention_mask, + return_all_decoder_outputs=True, + input_pos_embed=targets, + memory_pos_embed=pos_embed) + return decoded diff --git a/official/projects/detr/modeling/detr_test.py b/official/projects/detr/modeling/detr_test.py new file mode 100644 index 0000000000000000000000000000000000000000..90d6d64906c9ae63a84b65a95560dfc5c53bde71 --- /dev/null +++ b/official/projects/detr/modeling/detr_test.py @@ -0,0 +1,70 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for tensorflow_models.official.projects.detr.detr.""" +import tensorflow as tf +from official.projects.detr.modeling import detr +from official.vision.modeling.backbones import resnet + + +class DetrTest(tf.test.TestCase): + + def test_forward(self): + num_queries = 10 + hidden_size = 128 + num_classes = 10 + image_size = 640 + batch_size = 2 + backbone = resnet.ResNet(50, bn_trainable=False) + backbone_endpoint_name = '5' + model = detr.DETR(backbone, backbone_endpoint_name, num_queries, + hidden_size, num_classes) + outs = model(tf.ones((batch_size, image_size, image_size, 3))) + self.assertLen(outs, 6) # intermediate decoded outputs. + for out in outs: + self.assertAllEqual( + tf.shape(out['cls_outputs']), (batch_size, num_queries, num_classes)) + self.assertAllEqual( + tf.shape(out['box_outputs']), (batch_size, num_queries, 4)) + + def test_get_from_config_detr_transformer(self): + config = { + 'num_encoder_layers': 1, + 'num_decoder_layers': 2, + 'dropout_rate': 0.5, + } + detr_model = detr.DETRTransformer.from_config(config) + retrieved_config = detr_model.get_config() + + self.assertEqual(config, retrieved_config) + + def test_get_from_config_detr(self): + config = { + 'backbone': resnet.ResNet(50, bn_trainable=False), + 'backbone_endpoint_name': '5', + 'num_queries': 2, + 'hidden_size': 4, + 'num_classes': 10, + 'num_encoder_layers': 4, + 'num_decoder_layers': 5, + 'dropout_rate': 0.5, + } + detr_model = detr.DETR.from_config(config) + retrieved_config = detr_model.get_config() + + self.assertEqual(config, retrieved_config) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/detr/modeling/transformer.py b/official/projects/detr/modeling/transformer.py new file mode 100644 index 0000000000000000000000000000000000000000..d1eeb9aa118da33dd087c71cde95c66d0835b4a4 --- /dev/null +++ b/official/projects/detr/modeling/transformer.py @@ -0,0 +1,851 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Specialized Transformers for DETR. + +The position embeddings are added to the query and key for every self- and +cross-attention layer. +""" + +import tensorflow as tf + +from official.modeling import tf_utils +from official.nlp.modeling import layers +from official.nlp.modeling import models + + +class TransformerEncoder(tf.keras.layers.Layer): + """Transformer encoder. + + Transformer encoder is made up of N identical layers. Each layer is composed + of two sublayers: + 1. Self-attention layer + 2. Feedforward network (which is 2 fully-connected layers) + """ + + def __init__(self, + num_layers=6, + num_attention_heads=8, + intermediate_size=2048, + activation="relu", + dropout_rate=0.0, + attention_dropout_rate=0.0, + use_bias=False, + norm_first=True, + norm_epsilon=1e-6, + intermediate_dropout=0.0, + **kwargs): + """Initialize a Transformer encoder. + + Args: + num_layers: Number of layers. + num_attention_heads: Number of attention heads. + intermediate_size: Size of the intermediate (Feedforward) layer. + activation: Activation for the intermediate layer. + dropout_rate: Dropout probability. + attention_dropout_rate: Dropout probability for attention layers. + use_bias: Whether to enable use_bias in attention layer. If set False, + use_bias in attention layer is disabled. + norm_first: Whether to normalize inputs to attention and intermediate + dense layers.
If set False, output of attention and intermediate dense + layers is normalized. + norm_epsilon: Epsilon value to initialize normalization layers. + intermediate_dropout: Dropout probability for intermediate_dropout_layer. + **kwargs: Keyword arguments passed to tf.keras.layers.Layer. + """ + + super(TransformerEncoder, self).__init__(**kwargs) + self.num_layers = num_layers + self.num_attention_heads = num_attention_heads + self._intermediate_size = intermediate_size + self._activation = activation + self._dropout_rate = dropout_rate + self._attention_dropout_rate = attention_dropout_rate + self._use_bias = use_bias + self._norm_first = norm_first + self._norm_epsilon = norm_epsilon + self._intermediate_dropout = intermediate_dropout + + def build(self, input_shape): + """Implements build() for the layer.""" + self.encoder_layers = [] + for i in range(self.num_layers): + self.encoder_layers.append( + TransformerEncoderBlock( + num_attention_heads=self.num_attention_heads, + inner_dim=self._intermediate_size, + inner_activation=self._activation, + output_dropout=self._dropout_rate, + attention_dropout=self._attention_dropout_rate, + use_bias=self._use_bias, + norm_first=self._norm_first, + norm_epsilon=self._norm_epsilon, + inner_dropout=self._intermediate_dropout, + attention_initializer=tf_utils.clone_initializer( + models.seq2seq_transformer.attention_initializer( + input_shape[2])), + name=("layer_%d" % i))) + self.output_normalization = tf.keras.layers.LayerNormalization( + epsilon=self._norm_epsilon, dtype="float32") + super(TransformerEncoder, self).build(input_shape) + + def get_config(self): + config = { + "num_layers": self.num_layers, + "num_attention_heads": self.num_attention_heads, + "intermediate_size": self._intermediate_size, + "activation": self._activation, + "dropout_rate": self._dropout_rate, + "attention_dropout_rate": self._attention_dropout_rate, + "use_bias": self._use_bias, + "norm_first": self._norm_first, + "norm_epsilon":
self._norm_epsilon, + "intermediate_dropout": self._intermediate_dropout + } + base_config = super(TransformerEncoder, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, encoder_inputs, attention_mask=None, pos_embed=None): + """Return the output of the encoder. + + Args: + encoder_inputs: A tensor with shape `(batch_size, input_length, + hidden_size)`. + attention_mask: A mask for the encoder self-attention layer with shape + `(batch_size, input_length, input_length)`. + pos_embed: Position embedding to add to every encoder layer. + + Returns: + Output of encoder which is a `float32` tensor with shape + `(batch_size, input_length, hidden_size)`. + """ + for layer_idx in range(self.num_layers): + encoder_inputs = self.encoder_layers[layer_idx]( + [encoder_inputs, attention_mask, pos_embed]) + + output_tensor = encoder_inputs + output_tensor = self.output_normalization(output_tensor) + + return output_tensor + + +class TransformerEncoderBlock(tf.keras.layers.Layer): + """TransformerEncoderBlock layer. + + This layer implements the Transformer Encoder from + "Attention Is All You Need" (https://arxiv.org/abs/1706.03762), + which combines a `tf.keras.layers.MultiHeadAttention` layer with a + two-layer feedforward network. The only difference is that the position + embedding is added to the query and key of self-attention.
+ + References: + [Attention Is All You Need](https://arxiv.org/abs/1706.03762) + [BERT: Pre-training of Deep Bidirectional Transformers for Language + Understanding](https://arxiv.org/abs/1810.04805) + """ + + def __init__(self, + num_attention_heads, + inner_dim, + inner_activation, + output_range=None, + kernel_initializer="glorot_uniform", + bias_initializer="zeros", + kernel_regularizer=None, + bias_regularizer=None, + activity_regularizer=None, + kernel_constraint=None, + bias_constraint=None, + use_bias=True, + norm_first=False, + norm_epsilon=1e-12, + output_dropout=0.0, + attention_dropout=0.0, + inner_dropout=0.0, + attention_initializer=None, + attention_axes=None, + **kwargs): + """Initializes `TransformerEncoderBlock`. + + Args: + num_attention_heads: Number of attention heads. + inner_dim: The output dimension of the first Dense layer in a two-layer + feedforward network. + inner_activation: The activation for the first Dense layer in a two-layer + feedforward network. + output_range: the sequence output range, [0, output_range) for slicing the + target sequence. `None` means the target sequence is not sliced. + kernel_initializer: Initializer for dense layer kernels. + bias_initializer: Initializer for dense layer biases. + kernel_regularizer: Regularizer for dense layer kernels. + bias_regularizer: Regularizer for dense layer biases. + activity_regularizer: Regularizer for dense layer activity. + kernel_constraint: Constraint for dense layer kernels. + bias_constraint: Constraint for dense layer biases. + use_bias: Whether to enable use_bias in attention layer. If set False, + use_bias in attention layer is disabled. + norm_first: Whether to normalize inputs to attention and intermediate + dense layers. If set False, output of attention and intermediate dense + layers is normalized. + norm_epsilon: Epsilon value to initialize normalization layers. + output_dropout: Dropout probability for the post-attention and output + dropout.
+ attention_dropout: Dropout probability within the attention layer. + inner_dropout: Dropout probability for the first Dense layer in a + two-layer feedforward network. + attention_initializer: Initializer for kernels of attention layers. If set + `None`, attention layers use kernel_initializer as initializer for + kernel. + attention_axes: axes over which the attention is applied. `None` means + attention over all axes except batch, heads, and features. + **kwargs: Keyword arguments passed to tf.keras.layers.Layer. + """ + super().__init__(**kwargs) + + self._num_heads = num_attention_heads + self._inner_dim = inner_dim + self._inner_activation = inner_activation + self._attention_dropout = attention_dropout + self._attention_dropout_rate = attention_dropout + self._output_dropout = output_dropout + self._output_dropout_rate = output_dropout + self._output_range = output_range + self._kernel_initializer = tf.keras.initializers.get(kernel_initializer) + self._bias_initializer = tf.keras.initializers.get(bias_initializer) + self._kernel_regularizer = tf.keras.regularizers.get(kernel_regularizer) + self._bias_regularizer = tf.keras.regularizers.get(bias_regularizer) + self._activity_regularizer = tf.keras.regularizers.get(activity_regularizer) + self._kernel_constraint = tf.keras.constraints.get(kernel_constraint) + self._bias_constraint = tf.keras.constraints.get(bias_constraint) + self._use_bias = use_bias + self._norm_first = norm_first + self._norm_epsilon = norm_epsilon + self._inner_dropout = inner_dropout + if attention_initializer: + self._attention_initializer = tf.keras.initializers.get( + attention_initializer) + else: + self._attention_initializer = tf_utils.clone_initializer( + self._kernel_initializer) + self._attention_axes = attention_axes + + def build(self, input_shape): + if isinstance(input_shape, tf.TensorShape): + input_tensor_shape = input_shape + elif isinstance(input_shape, (list, tuple)): + input_tensor_shape = tf.TensorShape(input_shape[0]) + else: + raise ValueError(
+ "The type of input shape argument is not supported, got: %s" % + type(input_shape)) + einsum_equation = "abc,cd->abd" + if len(input_tensor_shape.as_list()) > 3: + einsum_equation = "...bc,cd->...bd" + hidden_size = input_tensor_shape[-1] + if hidden_size % self._num_heads != 0: + raise ValueError( + "The input size (%d) is not a multiple of the number of attention " + "heads (%d)" % (hidden_size, self._num_heads)) + self._attention_head_size = int(hidden_size // self._num_heads) + common_kwargs = dict( + bias_initializer=self._bias_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activity_regularizer=self._activity_regularizer, + kernel_constraint=self._kernel_constraint, + bias_constraint=self._bias_constraint) + self._attention_layer = tf.keras.layers.MultiHeadAttention( + num_heads=self._num_heads, + key_dim=self._attention_head_size, + dropout=self._attention_dropout, + use_bias=self._use_bias, + kernel_initializer=self._attention_initializer, + attention_axes=self._attention_axes, + name="self_attention", + **common_kwargs) + self._attention_dropout = tf.keras.layers.Dropout(rate=self._output_dropout) + # Use float32 in layernorm for numeric stability. + # It is probably safe in mixed_float16, but we haven't validated this yet. + self._attention_layer_norm = ( + tf.keras.layers.LayerNormalization( + name="self_attention_layer_norm", + axis=-1, + epsilon=self._norm_epsilon, + dtype=tf.float32)) + self._intermediate_dense = tf.keras.layers.EinsumDense( + einsum_equation, + output_shape=(None, self._inner_dim), + bias_axes="d", + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + name="intermediate", + **common_kwargs) + policy = tf.keras.mixed_precision.global_policy() + if policy.name == "mixed_bfloat16": + # bfloat16 causes BERT with the LAMB optimizer to not converge + # as well, so we use float32. + # TODO(b/154538392): Investigate this. 
+ policy = tf.float32 + self._intermediate_activation_layer = tf.keras.layers.Activation( + self._inner_activation, dtype=policy) + self._inner_dropout_layer = tf.keras.layers.Dropout( + rate=self._inner_dropout) + self._output_dense = tf.keras.layers.EinsumDense( + einsum_equation, + output_shape=(None, hidden_size), + bias_axes="d", + name="output", + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + **common_kwargs) + self._output_dropout = tf.keras.layers.Dropout(rate=self._output_dropout) + # Use float32 in layernorm for numeric stability. + self._output_layer_norm = tf.keras.layers.LayerNormalization( + name="output_layer_norm", + axis=-1, + epsilon=self._norm_epsilon, + dtype=tf.float32) + + super(TransformerEncoderBlock, self).build(input_shape) + + def get_config(self): + config = { + "num_attention_heads": + self._num_heads, + "inner_dim": + self._inner_dim, + "inner_activation": + self._inner_activation, + "output_dropout": + self._output_dropout_rate, + "attention_dropout": + self._attention_dropout_rate, + "output_range": + self._output_range, + "kernel_initializer": + tf.keras.initializers.serialize(self._kernel_initializer), + "bias_initializer": + tf.keras.initializers.serialize(self._bias_initializer), + "kernel_regularizer": + tf.keras.regularizers.serialize(self._kernel_regularizer), + "bias_regularizer": + tf.keras.regularizers.serialize(self._bias_regularizer), + "activity_regularizer": + tf.keras.regularizers.serialize(self._activity_regularizer), + "kernel_constraint": + tf.keras.constraints.serialize(self._kernel_constraint), + "bias_constraint": + tf.keras.constraints.serialize(self._bias_constraint), + "use_bias": + self._use_bias, + "norm_first": + self._norm_first, + "norm_epsilon": + self._norm_epsilon, + "inner_dropout": + self._inner_dropout, + "attention_initializer": + tf.keras.initializers.serialize(self._attention_initializer), + "attention_axes": + self._attention_axes, + } + base_config = 
super(TransformerEncoderBlock, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs): + """Transformer self-attention encoder block call. + + Args: + inputs: a list of three tensors, [`input tensor`, `attention mask`, + `pos embed`]: the sequence of embeddings, the self-attention mask (which + may be `None`), and the position embedding added to the query and key of + the self-attention layer. + + Returns: + An output tensor with the same dimensions as input/query tensor. + """ + input_tensor, attention_mask, pos_embed = inputs + + key_value = None + + if self._output_range: + if self._norm_first: + source_tensor = input_tensor[:, 0:self._output_range, :] + input_tensor = self._attention_layer_norm(input_tensor) + if key_value is not None: + key_value = self._attention_layer_norm(key_value) + target_tensor = input_tensor[:, 0:self._output_range, :] + if attention_mask is not None: + attention_mask = attention_mask[:, 0:self._output_range, :] + else: + if self._norm_first: + source_tensor = input_tensor + input_tensor = self._attention_layer_norm(input_tensor) + if key_value is not None: + key_value = self._attention_layer_norm(key_value) + target_tensor = input_tensor + + if key_value is None: + key_value = input_tensor + attention_output = self._attention_layer( + query=target_tensor + pos_embed, + key=key_value + pos_embed, + value=key_value, + attention_mask=attention_mask) + attention_output = self._attention_dropout(attention_output) + if self._norm_first: + attention_output = source_tensor + attention_output + else: + attention_output = self._attention_layer_norm(target_tensor + + attention_output) + if self._norm_first: + source_attention_output = attention_output + attention_output = self._output_layer_norm(attention_output) + inner_output = self._intermediate_dense(attention_output) + inner_output = self._intermediate_activation_layer(inner_output) + inner_output =
self._inner_dropout_layer(inner_output) + layer_output = self._output_dense(inner_output) + layer_output = self._output_dropout(layer_output) + + if self._norm_first: + return source_attention_output + layer_output + + # During mixed precision training, layer norm output is always fp32 for now. + # Casts fp32 for the subsequent add. + layer_output = tf.cast(layer_output, tf.float32) + return self._output_layer_norm(layer_output + attention_output) + + +class TransformerDecoder(tf.keras.layers.Layer): + """Transformer decoder. + + Like the encoder, the decoder is made up of N identical layers. + Each layer is composed of three sublayers: + 1. Self-attention layer + 2. Multi-headed attention layer combining encoder outputs with results from + the previous self-attention layer. + 3. Feedforward network (2 fully-connected layers) + """ + + def __init__(self, + num_layers=6, + num_attention_heads=8, + intermediate_size=2048, + activation="relu", + dropout_rate=0.0, + attention_dropout_rate=0.0, + use_bias=False, + norm_first=True, + norm_epsilon=1e-6, + intermediate_dropout=0.0, + **kwargs): + """Initialize a Transformer decoder. + + Args: + num_layers: Number of layers. + num_attention_heads: Number of attention heads. + intermediate_size: Size of the intermediate (Feedforward) layer. + activation: Activation for the intermediate layer. + dropout_rate: Dropout probability. + attention_dropout_rate: Dropout probability for attention layers. + use_bias: Whether to enable use_bias in attention layer. If set `False`, + use_bias in attention layer is disabled. + norm_first: Whether to normalize inputs to attention and intermediate + dense layers. If set `False`, output of attention and intermediate dense + layers is normalized. + norm_epsilon: Epsilon value to initialize normalization layers. + intermediate_dropout: Dropout probability for intermediate_dropout_layer. + **kwargs: Keyword arguments passed to tf.keras.layers.Layer.
+ """ + super(TransformerDecoder, self).__init__(**kwargs) + self.num_layers = num_layers + self.num_attention_heads = num_attention_heads + self._intermediate_size = intermediate_size + self._activation = activation + self._dropout_rate = dropout_rate + self._attention_dropout_rate = attention_dropout_rate + self._use_bias = use_bias + self._norm_first = norm_first + self._norm_epsilon = norm_epsilon + self._intermediate_dropout = intermediate_dropout + + def build(self, input_shape): + """Implements build() for the layer.""" + self.decoder_layers = [] + for i in range(self.num_layers): + self.decoder_layers.append( + TransformerDecoderBlock( + num_attention_heads=self.num_attention_heads, + intermediate_size=self._intermediate_size, + intermediate_activation=self._activation, + dropout_rate=self._dropout_rate, + attention_dropout_rate=self._attention_dropout_rate, + use_bias=self._use_bias, + norm_first=self._norm_first, + norm_epsilon=self._norm_epsilon, + intermediate_dropout=self._intermediate_dropout, + attention_initializer=tf_utils.clone_initializer( + models.seq2seq_transformer.attention_initializer( + input_shape[2])), + name=("layer_%d" % i))) + self.output_normalization = tf.keras.layers.LayerNormalization( + epsilon=self._norm_epsilon, dtype="float32") + super(TransformerDecoder, self).build(input_shape) + + def get_config(self): + config = { + "num_layers": self.num_layers, + "num_attention_heads": self.num_attention_heads, + "intermediate_size": self._intermediate_size, + "activation": self._activation, + "dropout_rate": self._dropout_rate, + "attention_dropout_rate": self._attention_dropout_rate, + "use_bias": self._use_bias, + "norm_first": self._norm_first, + "norm_epsilon": self._norm_epsilon, + "intermediate_dropout": self._intermediate_dropout + } + base_config = super(TransformerDecoder, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, + target, + memory, + self_attention_mask=None, + 
cross_attention_mask=None, + cache=None, + decode_loop_step=None, + return_all_decoder_outputs=False, + input_pos_embed=None, + memory_pos_embed=None): + """Return the output of the decoder layer stacks. + + Args: + target: A tensor with shape `(batch_size, target_length, hidden_size)`. + memory: A tensor with shape `(batch_size, input_length, hidden_size)`. + self_attention_mask: A tensor with shape `(batch_size, target_length, + target_length)`, the mask for the decoder self-attention layer. + cross_attention_mask: A tensor with shape `(batch_size, target_length, + input_length)`, the mask for the encoder-decoder attention layer. + cache: (Used for fast decoding) A nested dictionary storing previous + decoder self-attention values. The items are: + {layer_n: {"k": A tensor with shape `(batch_size, i, key_channels)`, + "v": A tensor with shape `(batch_size, i, value_channels)`}, + ...} + decode_loop_step: An integer, the step number of the decoding loop. Used + only for autoregressive inference on TPU. + return_all_decoder_outputs: Return all decoder layer outputs. Note that + the outputs are layer normed. This is useful when introducing per layer + auxiliary loss. + input_pos_embed: A tensor that is added to the query and key of the + self-attention layer. + memory_pos_embed: A tensor that is added to the query and key of the + cross-attention layer. + + Returns: + A float32 tensor with shape `(batch_size, target_length, hidden_size)`, + or a per-layer list of them if `return_all_decoder_outputs` is True. + """ + + output_tensor = target + decoder_outputs = [] + for layer_idx in range(self.num_layers): + transformer_inputs = [ + output_tensor, memory, cross_attention_mask, self_attention_mask, + input_pos_embed, memory_pos_embed + ] + # Gets the cache for decoding.
+ if cache is None: + output_tensor, _ = self.decoder_layers[layer_idx](transformer_inputs) + else: + cache_layer_idx = str(layer_idx) + output_tensor, cache[cache_layer_idx] = self.decoder_layers[layer_idx]( + transformer_inputs, + cache=cache[cache_layer_idx], + decode_loop_step=decode_loop_step) + if return_all_decoder_outputs: + decoder_outputs.append(self.output_normalization(output_tensor)) + + if return_all_decoder_outputs: + return decoder_outputs + else: + return self.output_normalization(output_tensor) + + +class TransformerDecoderBlock(tf.keras.layers.Layer): + """Single transformer layer for decoder. + + It has three sub-layers: + (1) a multi-head self-attention mechanism. + (2) an encoder-decoder attention mechanism. + (3) a positionwise fully connected feed-forward network. + """ + + def __init__(self, + num_attention_heads, + intermediate_size, + intermediate_activation, + dropout_rate=0.0, + attention_dropout_rate=0.0, + kernel_initializer="glorot_uniform", + bias_initializer="zeros", + kernel_regularizer=None, + bias_regularizer=None, + activity_regularizer=None, + kernel_constraint=None, + bias_constraint=None, + use_bias=True, + norm_first=False, + norm_epsilon=1e-12, + intermediate_dropout=0.0, + attention_initializer=None, + **kwargs): + """Initialize a Transformer decoder block. + + Args: + num_attention_heads: Number of attention heads. + intermediate_size: Size of the intermediate layer. + intermediate_activation: Activation for the intermediate layer. + dropout_rate: Dropout probability for the post-attention and output + dropout. + attention_dropout_rate: Dropout probability for within the attention + layer. + kernel_initializer: Initializer for dense layer kernels. + bias_initializer: Initializer for dense layer biases. + kernel_regularizer: Regularizer for dense layer kernels. + bias_regularizer: Regularizer for dense layer biases. + activity_regularizer: Regularizer for dense layer activity. + kernel_constraint: Constraint for dense layer kernels.
+ bias_constraint: Constraint for dense layer biases. + use_bias: Whether to enable use_bias in attention layer. If set False, + use_bias in attention layer is disabled. + norm_first: Whether to normalize inputs to attention and intermediate + dense layers. If set False, output of attention and intermediate dense + layers is normalized. + norm_epsilon: Epsilon value to initialize normalization layers. + intermediate_dropout: Dropout probability for intermediate_dropout_layer. + attention_initializer: Initializer for kernels of attention layers. If set + `None`, attention layers use kernel_initializer as initializer for + kernel. + **kwargs: keyword arguments passed to tf.keras.layers.Layer. + """ + super().__init__(**kwargs) + self.num_attention_heads = num_attention_heads + self.intermediate_size = intermediate_size + self.intermediate_activation = tf.keras.activations.get( + intermediate_activation) + self.dropout_rate = dropout_rate + self.attention_dropout_rate = attention_dropout_rate + self._kernel_initializer = tf.keras.initializers.get(kernel_initializer) + self._bias_initializer = tf.keras.initializers.get(bias_initializer) + self._kernel_regularizer = tf.keras.regularizers.get(kernel_regularizer) + self._bias_regularizer = tf.keras.regularizers.get(bias_regularizer) + self._activity_regularizer = tf.keras.regularizers.get(activity_regularizer) + self._kernel_constraint = tf.keras.constraints.get(kernel_constraint) + self._bias_constraint = tf.keras.constraints.get(bias_constraint) + self._use_bias = use_bias + self._norm_first = norm_first + self._norm_epsilon = norm_epsilon + self._intermediate_dropout = intermediate_dropout + if attention_initializer: + self._attention_initializer = tf.keras.initializers.get( + attention_initializer) + else: + self._attention_initializer = tf_utils.clone_initializer( + self._kernel_initializer) + self._cross_attention_cls = layers.attention.MultiHeadAttention + + def build(self, input_shape): + target_tensor_shape = 
tf.TensorShape(input_shape[0]) + if len(target_tensor_shape.as_list()) != 3: + raise ValueError("TransformerLayer expects a three-dimensional input of " + "shape [batch, sequence, width].") + hidden_size = target_tensor_shape[2] + if hidden_size % self.num_attention_heads != 0: + raise ValueError( + "The hidden size (%d) is not a multiple of the number of attention " + "heads (%d)" % (hidden_size, self.num_attention_heads)) + self.attention_head_size = int(hidden_size) // self.num_attention_heads + common_kwargs = dict( + bias_initializer=self._bias_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activity_regularizer=self._activity_regularizer, + kernel_constraint=self._kernel_constraint, + bias_constraint=self._bias_constraint) + # Self attention. + self.self_attention = layers.attention.CachedAttention( + num_heads=self.num_attention_heads, + key_dim=self.attention_head_size, + dropout=self.attention_dropout_rate, + use_bias=self._use_bias, + kernel_initializer=self._attention_initializer, + name="self_attention", + **common_kwargs) + self.self_attention_output_dense = tf.keras.layers.EinsumDense( + "abc,cd->abd", + output_shape=(None, hidden_size), + bias_axes="d", + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + name="output", + **common_kwargs) + self.self_attention_dropout = tf.keras.layers.Dropout( + rate=self.dropout_rate) + self.self_attention_layer_norm = ( + tf.keras.layers.LayerNormalization( + name="self_attention_layer_norm", + axis=-1, + epsilon=self._norm_epsilon, + dtype="float32")) + # Encoder-decoder attention. 
+ self.encdec_attention = self._cross_attention_cls( + num_heads=self.num_attention_heads, + key_dim=self.attention_head_size, + dropout=self.attention_dropout_rate, + output_shape=hidden_size, + use_bias=self._use_bias, + kernel_initializer=self._attention_initializer, + name="attention/encdec", + **common_kwargs) + + self.encdec_attention_dropout = tf.keras.layers.Dropout( + rate=self.dropout_rate) + self.encdec_attention_layer_norm = ( + tf.keras.layers.LayerNormalization( + name="attention/encdec_output_layer_norm", + axis=-1, + epsilon=self._norm_epsilon, + dtype="float32")) + + # Feed-forward projection. + self.intermediate_dense = tf.keras.layers.EinsumDense( + "abc,cd->abd", + output_shape=(None, self.intermediate_size), + bias_axes="d", + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + name="intermediate", + **common_kwargs) + self.intermediate_activation_layer = tf.keras.layers.Activation( + self.intermediate_activation) + self._intermediate_dropout_layer = tf.keras.layers.Dropout( + rate=self._intermediate_dropout) + self.output_dense = tf.keras.layers.EinsumDense( + "abc,cd->abd", + output_shape=(None, hidden_size), + bias_axes="d", + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + name="output", + **common_kwargs) + self.output_dropout = tf.keras.layers.Dropout(rate=self.dropout_rate) + self.output_layer_norm = tf.keras.layers.LayerNormalization( + name="output_layer_norm", + axis=-1, + epsilon=self._norm_epsilon, + dtype="float32") + super().build(input_shape) + + def get_config(self): + config = { + "num_attention_heads": + self.num_attention_heads, + "intermediate_size": + self.intermediate_size, + "intermediate_activation": + tf.keras.activations.serialize(self.intermediate_activation), + "dropout_rate": + self.dropout_rate, + "attention_dropout_rate": + self.attention_dropout_rate, + "kernel_initializer": + tf.keras.initializers.serialize(self._kernel_initializer), + "bias_initializer": + 
tf.keras.initializers.serialize(self._bias_initializer), + "kernel_regularizer": + tf.keras.regularizers.serialize(self._kernel_regularizer), + "bias_regularizer": + tf.keras.regularizers.serialize(self._bias_regularizer), + "activity_regularizer": + tf.keras.regularizers.serialize(self._activity_regularizer), + "kernel_constraint": + tf.keras.constraints.serialize(self._kernel_constraint), + "bias_constraint": + tf.keras.constraints.serialize(self._bias_constraint), + "use_bias": + self._use_bias, + "norm_first": + self._norm_first, + "norm_epsilon": + self._norm_epsilon, + "intermediate_dropout": + self._intermediate_dropout, + "attention_initializer": + tf.keras.initializers.serialize(self._attention_initializer) + } + base_config = super().get_config() + return dict(list(base_config.items()) + list(config.items())) + + def common_layers_with_encoder(self): + """Gets layer objects that can make a Transformer encoder block.""" + return [ + self.self_attention, self.self_attention_layer_norm, + self.intermediate_dense, self.output_dense, self.output_layer_norm + ] + + def call(self, inputs, cache=None, decode_loop_step=None): + input_tensor, memory, attention_mask, self_attention_mask, input_pos_embed, memory_pos_embed = inputs + source_tensor = input_tensor + if self._norm_first: + input_tensor = self.self_attention_layer_norm(input_tensor) + self_attention_output, cache = self.self_attention( + query=input_tensor + input_pos_embed, + key=input_tensor + input_pos_embed, + value=input_tensor, + attention_mask=self_attention_mask, + cache=cache, + decode_loop_step=decode_loop_step) + self_attention_output = self.self_attention_dropout(self_attention_output) + if self._norm_first: + self_attention_output = source_tensor + self_attention_output + else: + self_attention_output = self.self_attention_layer_norm( + input_tensor + self_attention_output) + if self._norm_first: + source_self_attention_output = self_attention_output + self_attention_output = 
self.encdec_attention_layer_norm( + self_attention_output) + cross_attn_inputs = dict( + query=self_attention_output + input_pos_embed, + key=memory + memory_pos_embed, + value=memory, + attention_mask=attention_mask) + attention_output = self.encdec_attention(**cross_attn_inputs) + attention_output = self.encdec_attention_dropout(attention_output) + if self._norm_first: + attention_output = source_self_attention_output + attention_output + else: + attention_output = self.encdec_attention_layer_norm( + self_attention_output + attention_output) + if self._norm_first: + source_attention_output = attention_output + attention_output = self.output_layer_norm(attention_output) + + intermediate_output = self.intermediate_dense(attention_output) + intermediate_output = self.intermediate_activation_layer( + intermediate_output) + intermediate_output = self._intermediate_dropout_layer(intermediate_output) + layer_output = self.output_dense(intermediate_output) + layer_output = self.output_dropout(layer_output) + if self._norm_first: + layer_output = source_attention_output + layer_output + else: + layer_output = self.output_layer_norm(layer_output + attention_output) + return layer_output, cache diff --git a/official/projects/detr/modeling/transformer_test.py b/official/projects/detr/modeling/transformer_test.py new file mode 100644 index 0000000000000000000000000000000000000000..0752403a2a816beca3da2f07b648214228a8cf80 --- /dev/null +++ b/official/projects/detr/modeling/transformer_test.py @@ -0,0 +1,263 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for transformer.""" + +import tensorflow as tf + +from official.projects.detr.modeling import transformer + + +class TransformerTest(tf.test.TestCase): + + def test_transformer_encoder_block(self): + batch_size = 2 + sequence_length = 100 + feature_size = 256 + num_attention_heads = 2 + inner_dim = 256 + inner_activation = 'relu' + model = transformer.TransformerEncoderBlock(num_attention_heads, inner_dim, + inner_activation) + input_tensor = tf.ones((batch_size, sequence_length, feature_size)) + attention_mask = tf.ones((batch_size, sequence_length, sequence_length), + dtype=tf.int64) + pos_embed = tf.ones((batch_size, sequence_length, feature_size)) + + out = model([input_tensor, attention_mask, pos_embed]) + self.assertAllEqual( + tf.shape(out), (batch_size, sequence_length, feature_size)) + + def test_transformer_encoder_block_get_config(self): + num_attention_heads = 2 + inner_dim = 256 + inner_activation = 'relu' + model = transformer.TransformerEncoderBlock(num_attention_heads, inner_dim, + inner_activation) + config = model.get_config() + expected_config = { + 'name': 'transformer_encoder_block', + 'trainable': True, + 'dtype': 'float32', + 'num_attention_heads': 2, + 'inner_dim': 256, + 'inner_activation': 'relu', + 'output_dropout': 0.0, + 'attention_dropout': 0.0, + 'output_range': None, + 'kernel_initializer': { + 'class_name': 'GlorotUniform', + 'config': { + 'seed': None} + }, + 'bias_initializer': { + 'class_name': 'Zeros', + 'config': {} + }, + 'kernel_regularizer': None, + 'bias_regularizer': None, + 'activity_regularizer': 
None, + 'kernel_constraint': None, + 'bias_constraint': None, + 'use_bias': True, + 'norm_first': False, + 'norm_epsilon': 1e-12, + 'inner_dropout': 0.0, + 'attention_initializer': { + 'class_name': 'GlorotUniform', + 'config': {'seed': None} + }, + 'attention_axes': None} + self.assertAllEqual(expected_config, config) + + def test_transformer_encoder(self): + batch_size = 2 + sequence_length = 100 + feature_size = 256 + num_layers = 2 + num_attention_heads = 2 + intermediate_size = 256 + model = transformer.TransformerEncoder( + num_layers=num_layers, + num_attention_heads=num_attention_heads, + intermediate_size=intermediate_size) + input_tensor = tf.ones((batch_size, sequence_length, feature_size)) + attention_mask = tf.ones((batch_size, sequence_length, sequence_length), + dtype=tf.int64) + pos_embed = tf.ones((batch_size, sequence_length, feature_size)) + out = model(input_tensor, attention_mask, pos_embed) + self.assertAllEqual( + tf.shape(out), (batch_size, sequence_length, feature_size)) + + def test_transformer_encoder_get_config(self): + num_layers = 2 + num_attention_heads = 2 + intermediate_size = 256 + model = transformer.TransformerEncoder( + num_layers=num_layers, + num_attention_heads=num_attention_heads, + intermediate_size=intermediate_size) + config = model.get_config() + expected_config = { + 'name': 'transformer_encoder', + 'trainable': True, + 'dtype': 'float32', + 'num_layers': 2, + 'num_attention_heads': 2, + 'intermediate_size': 256, + 'activation': 'relu', + 'dropout_rate': 0.0, + 'attention_dropout_rate': 0.0, + 'use_bias': False, + 'norm_first': True, + 'norm_epsilon': 1e-06, + 'intermediate_dropout': 0.0 + } + self.assertAllEqual(expected_config, config) + + def test_transformer_decoder_block(self): + batch_size = 2 + sequence_length = 100 + memory_length = 200 + feature_size = 256 + num_attention_heads = 2 + intermediate_size = 256 + intermediate_activation = 'relu' + model = transformer.TransformerDecoderBlock(num_attention_heads, + 
intermediate_size, + intermediate_activation) + input_tensor = tf.ones((batch_size, sequence_length, feature_size)) + memory = tf.ones((batch_size, memory_length, feature_size)) + attention_mask = tf.ones((batch_size, sequence_length, memory_length), + dtype=tf.int64) + self_attention_mask = tf.ones( + (batch_size, sequence_length, sequence_length), dtype=tf.int64) + input_pos_embed = tf.ones((batch_size, sequence_length, feature_size)) + memory_pos_embed = tf.ones((batch_size, memory_length, feature_size)) + + out, _ = model([ + input_tensor, memory, attention_mask, self_attention_mask, + input_pos_embed, memory_pos_embed + ]) + self.assertAllEqual( + tf.shape(out), (batch_size, sequence_length, feature_size)) + + def test_transformer_decoder_block_get_config(self): + num_attention_heads = 2 + intermediate_size = 256 + intermediate_activation = 'relu' + model = transformer.TransformerDecoderBlock(num_attention_heads, + intermediate_size, + intermediate_activation) + config = model.get_config() + expected_config = { + 'name': 'transformer_decoder_block', + 'trainable': True, + 'dtype': 'float32', + 'num_attention_heads': 2, + 'intermediate_size': 256, + 'intermediate_activation': 'relu', + 'dropout_rate': 0.0, + 'attention_dropout_rate': 0.0, + 'kernel_initializer': { + 'class_name': 'GlorotUniform', + 'config': { + 'seed': None + } + }, + 'bias_initializer': { + 'class_name': 'Zeros', + 'config': {} + }, + 'kernel_regularizer': None, + 'bias_regularizer': None, + 'activity_regularizer': None, + 'kernel_constraint': None, + 'bias_constraint': None, + 'use_bias': True, + 'norm_first': False, + 'norm_epsilon': 1e-12, + 'intermediate_dropout': 0.0, + 'attention_initializer': { + 'class_name': 'GlorotUniform', + 'config': { + 'seed': None + } + } + } + self.assertAllEqual(expected_config, config) + + def test_transformer_decoder(self): + batch_size = 2 + sequence_length = 100 + memory_length = 200 + feature_size = 256 + num_layers = 2 + num_attention_heads = 2 + 
intermediate_size = 256 + model = transformer.TransformerDecoder( + num_layers=num_layers, + num_attention_heads=num_attention_heads, + intermediate_size=intermediate_size) + input_tensor = tf.ones((batch_size, sequence_length, feature_size)) + memory = tf.ones((batch_size, memory_length, feature_size)) + attention_mask = tf.ones((batch_size, sequence_length, memory_length), + dtype=tf.int64) + self_attention_mask = tf.ones( + (batch_size, sequence_length, sequence_length), dtype=tf.int64) + input_pos_embed = tf.ones((batch_size, sequence_length, feature_size)) + memory_pos_embed = tf.ones((batch_size, memory_length, feature_size)) + + outs = model( + input_tensor, + memory, + self_attention_mask, + attention_mask, + return_all_decoder_outputs=True, + input_pos_embed=input_pos_embed, + memory_pos_embed=memory_pos_embed) + self.assertLen(outs, 2) # intermediate decoded outputs. + for out in outs: + self.assertAllEqual( + tf.shape(out), (batch_size, sequence_length, feature_size)) + + def test_transformer_decoder_get_config(self): + num_layers = 2 + num_attention_heads = 2 + intermediate_size = 256 + model = transformer.TransformerDecoder( + num_layers=num_layers, + num_attention_heads=num_attention_heads, + intermediate_size=intermediate_size) + config = model.get_config() + expected_config = { + 'name': 'transformer_decoder', + 'trainable': True, + 'dtype': 'float32', + 'num_layers': 2, + 'num_attention_heads': 2, + 'intermediate_size': 256, + 'activation': 'relu', + 'dropout_rate': 0.0, + 'attention_dropout_rate': 0.0, + 'use_bias': False, + 'norm_first': True, + 'norm_epsilon': 1e-06, + 'intermediate_dropout': 0.0 + } + self.assertAllEqual(expected_config, config) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/detr/ops/matchers.py b/official/projects/detr/ops/matchers.py new file mode 100644 index 0000000000000000000000000000000000000000..56f25585ed21208eaac0d5bc75d8ce731aef03a8 --- /dev/null +++ 
b/official/projects/detr/ops/matchers.py @@ -0,0 +1,489 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""TensorFlow implementation to solve the Linear Sum Assignment problem. + +The Linear Sum Assignment problem involves determining the minimum weight +matching for bipartite graphs. For example, this problem can be defined by +a 2D matrix C, where each element i,j determines the cost of matching worker i +with job j. The solution to the problem is a complete assignment of jobs to +workers, such that no job is assigned to more than one worker and no worker is +assigned more than one job, with minimum cost. + +This implementation builds on the Hungarian +Matching Algorithm (https://www.cse.ust.hk/~golin/COMP572/Notes/Matching.pdf). + +Based on the original implementation by Jiquan Ngiam. +""" +import tensorflow as tf +from official.modeling import tf_utils + + +def _prepare(weights): + """Prepare the cost matrix. + + To improve the computational efficiency of the algorithm, all weights are + shifted to be non-negative. Each element is reduced by the row / column + minimum. Note that neither operation will affect the resulting solution but + will provide a better starting point for the greedy assignment. This + corresponds to the pre-processing and step 1 of the Hungarian algorithm from + Wikipedia.
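+
+  For example, the cost matrix [[4, 1], [2, 3]] has row minima 1 and 2, so
+  the row step yields [[3, 0], [0, 1]]. Both column minima of that result are
+  0, so the column step leaves it unchanged, and the zero entries (0, 1) and
+  (1, 0) already form a complete zero-cost assignment.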
+ + Args: + weights: A float32 [batch_size, num_elems, num_elems] tensor, where each + inner matrix represents weights to be used for matching. + + Returns: + A prepared weights tensor of the same shape and dtype. + """ + # Since every worker needs a job and every job needs a worker, we can subtract + # the minimum from each. + weights -= tf.reduce_min(weights, axis=2, keepdims=True) + weights -= tf.reduce_min(weights, axis=1, keepdims=True) + return weights + + +def _greedy_assignment(adj_matrix): + """Greedily assigns workers to jobs based on an adjacency matrix. + + Starting with an adjacency matrix representing the available connections + in the bipartite graph, this function greedily chooses elements such + that each worker is matched to at most one job and each job is assigned to + at most one worker. Note that if the adjacency matrix has no available values + for a particular row/column, the corresponding job/worker may go unassigned. + + Args: + adj_matrix: A bool [batch_size, num_elems, num_elems] tensor, where each + element of the inner matrix represents whether the worker (row) can be + matched to the job (column). + + Returns: + A bool [batch_size, num_elems, num_elems] tensor, where each element of the + inner matrix represents whether the worker has been matched to the job. + Each row and column can have at most one true element. Some of the rows + and columns may not be matched. + """ + _, num_elems, _ = tf_utils.get_shape_list(adj_matrix, expected_rank=3) + adj_matrix = tf.transpose(adj_matrix, [1, 0, 2]) + + # Create a dynamic TensorArray containing the assignments for each worker/job + assignment = tf.TensorArray(tf.bool, num_elems) + + # Track which columns (jobs) are already assigned; updated on each iteration. + col_assigned = tf.zeros_like(adj_matrix[0, ...], dtype=tf.bool) + + # Iteratively assign each row using tf.foldl. Intuitively, this is a loop + # over rows, where we incrementally assign each row.
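+  # For example, if both rows of a 2x2 adjacency matrix are [True, False],
+  # row 0 takes column 0 and row 1 stays unassigned, because its only
+  # candidate column is already taken. This is why the greedy result may be
+  # only a partial assignment.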
+ def _assign_row(accumulator, row_adj): + # The accumulator holds the row index, the assignments so far, and the + # assigned columns. + idx, assignment, col_assigned = accumulator + + # Viable candidates are jobs not already assigned to another worker. + candidates = row_adj & (~col_assigned) + + # Pick one candidate column per row via argmax over the 0/1 candidate mask. + max_candidate_idx = tf.argmax( + tf.cast(candidates, tf.int32), axis=1, output_type=tf.int32) + + candidates_indicator = tf.one_hot( + max_candidate_idx, + num_elems, + on_value=True, + off_value=False, + dtype=tf.bool) + candidates_indicator &= candidates + + # Mark the chosen column as assigned and record this row's assignment. + col_assigned |= candidates_indicator + assignment = assignment.write(idx, candidates_indicator) + + return (idx + 1, assignment, col_assigned) + + _, assignment, _ = tf.foldl( + _assign_row, adj_matrix, (0, assignment, col_assigned), back_prop=False) + + assignment = assignment.stack() + assignment = tf.transpose(assignment, [1, 0, 2]) + return assignment + + +def _find_augmenting_path(assignment, adj_matrix): + """Finds an augmenting path given an assignment and an adjacency matrix. + + The augmenting path search starts from the unassigned workers, then goes on + to find jobs (via an unassigned pairing), then back again to workers (via an + existing pairing), and so on. The path alternates between unassigned and + existing pairings. Returns the state after the search. + + Note: In the state, worker and job indices are 1-indexed so that we can + use 0 to represent unreachable nodes. State contains the following keys: + + - jobs: A [batch_size, 1, num_elems] tensor containing the highest index + unassigned worker that can reach this job through a path. + - jobs_from_worker: A [batch_size, num_elems] tensor containing the worker + reached immediately before this job. + - workers: A [batch_size, num_elems, 1] tensor containing the highest index + unassigned worker that can reach this worker through a path.
+ - workers_from_job: A [batch_size, num_elems] tensor containing the job + reached immediately before this worker. + - new_jobs: A bool [batch_size, num_elems] tensor containing True if the + unassigned job can be reached via a path. + + State can be used to recover the path via backtracking. + + Args: + assignment: A bool [batch_size, num_elems, num_elems] tensor, where each + element of the inner matrix represents whether the worker has been matched + to the job. This may be a partial assignment. + adj_matrix: A bool [batch_size, num_elems, num_elems] tensor, where each + element of the inner matrix represents whether the worker (row) can be + matched to the job (column). + + Returns: + A state dict, which represents the outcome of running an augmenting + path search on the graph given the assignment. + """ + batch_size, num_elems, _ = tf_utils.get_shape_list( + assignment, expected_rank=3) + unassigned_workers = ~tf.reduce_any(assignment, axis=2, keepdims=True) + unassigned_jobs = ~tf.reduce_any(assignment, axis=1, keepdims=True) + + unassigned_pairings = tf.cast(adj_matrix & ~assignment, tf.int32) + existing_pairings = tf.cast(assignment, tf.int32) + + # Initialize unassigned workers to have non-zero ids, assigned workers will + # have ids = 0. 
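+  # For example, with num_elems = 3 and only worker 1 (0-indexed) unassigned,
+  # init_workers for that batch element is [[0], [2], [0]].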
+ worker_indices = tf.range(1, num_elems + 1, dtype=tf.int32) + init_workers = tf.tile(worker_indices[tf.newaxis, :, tf.newaxis], + [batch_size, 1, 1]) + init_workers *= tf.cast(unassigned_workers, tf.int32) + + state = { + "jobs": tf.zeros((batch_size, 1, num_elems), dtype=tf.int32), + "jobs_from_worker": tf.zeros((batch_size, num_elems), dtype=tf.int32), + "workers": init_workers, + "workers_from_job": tf.zeros((batch_size, num_elems), dtype=tf.int32) + } + + def _has_active_workers(state, curr_workers): + """Check if there are still active workers.""" + del state + return tf.reduce_sum(curr_workers) > 0 + + def _augment_step(state, curr_workers): + """Performs one search step.""" + + # Note: These steps could be potentially much faster if sparse matrices are + # supported. The unassigned_pairings and existing_pairings matrices can be + # very sparse. + + # Find potential jobs using current workers. + potential_jobs = curr_workers * unassigned_pairings + curr_jobs = tf.reduce_max(potential_jobs, axis=1, keepdims=True) + curr_jobs_from_worker = 1 + tf.argmax( + potential_jobs, axis=1, output_type=tf.int32) + + # Remove already accessible jobs from curr_jobs. + default_jobs = tf.zeros_like(state["jobs"], dtype=state["jobs"].dtype) + curr_jobs = tf.where(state["jobs"] > 0, default_jobs, curr_jobs) + curr_jobs_from_worker *= tf.cast(curr_jobs > 0, tf.int32)[:, 0, :] + + # Find potential workers from current jobs. + potential_workers = curr_jobs * existing_pairings + curr_workers = tf.reduce_max(potential_workers, axis=2, keepdims=True) + curr_workers_from_job = 1 + tf.argmax( + potential_workers, axis=2, output_type=tf.int32) + + # Remove already accessible workers from curr_workers. + default_workers = tf.zeros_like(state["workers"]) + curr_workers = tf.where( + state["workers"] > 0, default_workers, curr_workers) + curr_workers_from_job *= tf.cast(curr_workers > 0, tf.int32)[:, :, 0] + + # Update state so that we can backtrack later. 
+ state = state.copy() + state["jobs"] = tf.maximum(state["jobs"], curr_jobs) + state["jobs_from_worker"] = tf.maximum(state["jobs_from_worker"], + curr_jobs_from_worker) + state["workers"] = tf.maximum(state["workers"], curr_workers) + state["workers_from_job"] = tf.maximum(state["workers_from_job"], + curr_workers_from_job) + + return state, curr_workers + + state, _ = tf.while_loop( + _has_active_workers, + _augment_step, (state, init_workers), + back_prop=False) + + # Compute new jobs (unassigned jobs reachable through a path). These determine + # termination of the maximum bipartite matching and initialize backtracking. + new_jobs = (state["jobs"] > 0) & unassigned_jobs + state["new_jobs"] = new_jobs[:, 0, :] + return state + + +def _improve_assignment(assignment, state): + """Improves an assignment by backtracking the augmenting path using state. + + Args: + assignment: A bool [batch_size, num_elems, num_elems] tensor, where each + element of the inner matrix represents whether the worker has been matched + to the job. This may be a partial assignment. + state: A dict, which represents the outcome of running an augmenting path + search on the graph given the assignment. + + Returns: + A new assignment matrix of the same shape and type as assignment, where the + assignment has been updated using the augmenting path found. + """ + batch_size, num_elems, _ = tf_utils.get_shape_list(assignment, 3) + + # We store the current job id and iteratively backtrack using jobs_from_worker + # and workers_from_job until we reach an unassigned worker. We flip all the + # assignments on this path to discover a better overall assignment. + + # Note: The indices in state are 1-indexed, where 0 represents that the + # worker / job cannot be reached. + + # Obtain initial job indices based on new_jobs. + curr_job_idx = tf.argmax( + tf.cast(state["new_jobs"], tf.int32), axis=1, output_type=tf.int32) + + # Track whether an example is actively being backtracked.
Since we are + # operating on a batch, not all examples in the batch may be active. + active = tf.gather(state["new_jobs"], curr_job_idx, batch_dims=1) + batch_range = tf.range(0, batch_size, dtype=tf.int32) + + # The flip matrix tracks which assignments we need to flip, i.e. the edges + # on the augmenting path taken. We use an integer tensor here so that we can + # use tensor_scatter_nd_add to update the tensor, and then cast it back to + # bool after the loop. + flip_matrix = tf.zeros((batch_size, num_elems, num_elems), dtype=tf.int32) + + def _has_active_backtracks(flip_matrix, active, curr_job_idx): + """Check if there are still active backtracks.""" + del flip_matrix, curr_job_idx + return tf.reduce_any(active) + + def _backtrack_one_step(flip_matrix, active, curr_job_idx): + """Take one step in backtracking.""" + # Discover the worker that the job originated from; note that this worker + # must exist by construction. + curr_worker_idx = tf.gather( + state["jobs_from_worker"], curr_job_idx, batch_dims=1) - 1 + curr_worker_idx = tf.maximum(curr_worker_idx, 0) + update_indices = tf.stack([batch_range, curr_worker_idx, curr_job_idx], + axis=1) + update_indices = tf.maximum(update_indices, 0) + flip_matrix = tf.tensor_scatter_nd_add(flip_matrix, update_indices, + tf.cast(active, tf.int32)) + + # Discover the (potential) job that the worker originated from. + curr_job_idx = tf.gather( + state["workers_from_job"], curr_worker_idx, batch_dims=1) - 1 + # Note that jobs may not be active, and we track that here (before + # adjusting indices so that they are all >= 0 for gather).
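+    # For example, the unassigned worker that starts a path keeps
+    # workers_from_job = 0, so curr_job_idx becomes -1 there and that
+    # example's backtrack stops.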
+ active &= curr_job_idx >= 0 + curr_job_idx = tf.maximum(curr_job_idx, 0) + update_indices = tf.stack([batch_range, curr_worker_idx, curr_job_idx], + axis=1) + update_indices = tf.maximum(update_indices, 0) + flip_matrix = tf.tensor_scatter_nd_add(flip_matrix, update_indices, + tf.cast(active, tf.int32)) + + return flip_matrix, active, curr_job_idx + + flip_matrix, _, _ = tf.while_loop( + _has_active_backtracks, + _backtrack_one_step, (flip_matrix, active, curr_job_idx), + back_prop=False) + + flip_matrix = tf.cast(flip_matrix, tf.bool) + assignment = tf.math.logical_xor(assignment, flip_matrix) + + return assignment + + +def _maximum_bipartite_matching(adj_matrix, assignment=None): + """Performs maximum bipartite matching using augmented paths. + + Args: + adj_matrix: A bool [batch_size, num_elems, num_elems] tensor, where each + element of the inner matrix represents whether the worker (row) can be + matched to the job (column). + assignment: An optional bool [batch_size, num_elems, num_elems] tensor, + where each element of the inner matrix represents whether the worker has + been matched to the job. This may be a partial assignment. If specified, + this assignment will be used to seed the iterative algorithm. + + Returns: + A state dict representing the final augmenting path state search, and + a maximum bipartite matching assignment tensor. Note that the state outcome + can be used to compute a minimum vertex cover for the bipartite graph. 
+ """ + + if assignment is None: + assignment = _greedy_assignment(adj_matrix) + + state = _find_augmenting_path(assignment, adj_matrix) + + def _has_new_jobs(state, assignment): + del assignment + return tf.reduce_any(state["new_jobs"]) + + def _improve_assignment_and_find_new_path(state, assignment): + assignment = _improve_assignment(assignment, state) + state = _find_augmenting_path(assignment, adj_matrix) + return state, assignment + + state, assignment = tf.while_loop( + _has_new_jobs, + _improve_assignment_and_find_new_path, (state, assignment), + back_prop=False) + + return state, assignment + + +def _compute_cover(state, assignment): + """Computes a cover for the bipartite graph. + + We compute a cover using the construction provided at + https://en.wikipedia.org/wiki/K%C5%91nig%27s_theorem_(graph_theory)#Proof + which uses the outcome from the alternating path search. + + Args: + state: A state dict, which represents the outcome of running an augmenting + path search on the graph given the assignment. + assignment: A bool [batch_size, num_elems, num_elems] tensor, where each + element of the inner matrix represents whether the worker has been + matched to the job, e.g. the maximum matching returned together with + state by _maximum_bipartite_matching. + + Returns: + A tuple of (workers_cover, jobs_cover) corresponding to row and column + covers for the bipartite graph. workers_cover is a boolean tensor of shape + [batch_size, num_elems, 1] and jobs_cover is a boolean tensor of shape + [batch_size, 1, num_elems].
+ """ + assigned_workers = tf.reduce_any(assignment, axis=2, keepdims=True) + assigned_jobs = tf.reduce_any(assignment, axis=1, keepdims=True) + + reachable_workers = state["workers"] > 0 + reachable_jobs = state["jobs"] > 0 + + workers_cover = assigned_workers & (~reachable_workers) + jobs_cover = assigned_jobs & reachable_jobs + + return workers_cover, jobs_cover + + +def _update_weights_using_cover(workers_cover, jobs_cover, weights): + """Updates weights for Hungarian matching using a cover. + + We first find the minimum uncovered weight in each inner matrix. Then, we + subtract this from all the uncovered weights, and add it to all the doubly + covered weights. + + Args: + workers_cover: A boolean tensor of shape [batch_size, num_elems, 1]. + jobs_cover: A boolean tensor of shape [batch_size, 1, num_elems]. + weights: A float32 [batch_size, num_elems, num_elems] tensor, where each + inner matrix represents weights to be used for matching. + + Returns: + A new weight matrix with elements adjusted by the cover. + """ + max_value = tf.reduce_max(weights) + + covered = workers_cover | jobs_cover + double_covered = workers_cover & jobs_cover + + uncovered_weights = tf.where(covered, + tf.ones_like(weights) * max_value, weights) + min_weight = tf.reduce_min(uncovered_weights, axis=[-2, -1], keepdims=True) + + add_weight = tf.where(double_covered, + tf.ones_like(weights) * min_weight, + tf.zeros_like(weights)) + sub_weight = tf.where(covered, tf.zeros_like(weights), + tf.ones_like(weights) * min_weight) + + return weights + add_weight - sub_weight + + +def assert_rank(tensor, expected_rank, name=None): + """Raises an exception if the tensor rank is not of the expected rank. + + Args: + tensor: A tf.Tensor to check the rank of. + expected_rank: Python integer or list of integers, expected rank. + name: Optional name of the tensor for the error message. + + Raises: + ValueError: If the actual tensor rank doesn't match the expected rank.
+ """ + expected_rank_dict = {} + if isinstance(expected_rank, int): + expected_rank_dict[expected_rank] = True + else: + for x in expected_rank: + expected_rank_dict[x] = True + + actual_rank = len(tensor.shape) + if actual_rank not in expected_rank_dict: + raise ValueError( + "For the tensor `%s`, the actual tensor rank `%d` (shape = %s) is not " + "equal to the expected tensor rank `%s`" % + (name, actual_rank, str(tensor.shape), str(expected_rank))) + + +def hungarian_matching(weights): + """Computes the minimum linear sum assignment using the Hungarian algorithm. + + Args: + weights: A float32 [batch_size, num_elems, num_elems] tensor, where each + inner matrix represents weights to be used for matching. + + Returns: + A tuple of (weights, assignment). weights is the float32 + [batch_size, num_elems, num_elems] tensor of adjusted weights after the + final iteration. assignment is a bool [batch_size, num_elems, num_elems] + tensor, where each element of the inner matrix represents whether the + worker has been matched to the job. The returned matching will always be + a perfect matching. + """ + batch_size, num_elems, _ = tf_utils.get_shape_list(weights, 3) + + weights = _prepare(weights) + adj_matrix = tf.equal(weights, 0.) + state, assignment = _maximum_bipartite_matching(adj_matrix) + workers_cover, jobs_cover = _compute_cover(state, assignment) + + def _cover_incomplete(workers_cover, jobs_cover, *args): + del args + cover_sum = ( + tf.reduce_sum(tf.cast(workers_cover, tf.int32)) + + tf.reduce_sum(tf.cast(jobs_cover, tf.int32))) + return tf.less(cover_sum, batch_size * num_elems) + + def _update_weights_and_match(workers_cover, jobs_cover, weights, assignment): + weights = _update_weights_using_cover(workers_cover, jobs_cover, weights) + adj_matrix = tf.equal(weights, 0.)
+ state, assignment = _maximum_bipartite_matching(adj_matrix, assignment) + workers_cover, jobs_cover = _compute_cover(state, assignment) + return workers_cover, jobs_cover, weights, assignment + + workers_cover, jobs_cover, weights, assignment = tf.while_loop( + _cover_incomplete, + _update_weights_and_match, + (workers_cover, jobs_cover, weights, assignment), + back_prop=False) + return weights, assignment + diff --git a/official/projects/detr/ops/matchers_test.py b/official/projects/detr/ops/matchers_test.py new file mode 100644 index 0000000000000000000000000000000000000000..71c607123a31e6fd185c2d2528a7c17970e3b40a --- /dev/null +++ b/official/projects/detr/ops/matchers_test.py @@ -0,0 +1,95 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for tensorflow_models.official.projects.detr.ops.matchers.""" + +import numpy as np +from scipy import optimize +import tensorflow as tf + +from official.projects.detr.ops import matchers + + +class MatchersOpsTest(tf.test.TestCase): + + def testLinearSumAssignment(self): + """Check a simple 2D test case of the Linear Sum Assignment problem. + + Ensures that the implementation of the matching algorithm is correct + and functional on TPUs. 
+ """ + cost_matrix = np.array([[[4, 1, 3], [2, 0, 5], [3, 2, 2]]], + dtype=np.float32) + _, adjacency_matrix = matchers.hungarian_matching(tf.constant(cost_matrix)) + adjacency_output = adjacency_matrix.numpy() + + correct_output = np.array([ + [0, 1, 0], + [1, 0, 0], + [0, 0, 1], + ], dtype=bool) + self.assertAllEqual(adjacency_output[0], correct_output) + + def testBatchedLinearSumAssignment(self): + """Check a batched case of the Linear Sum Assignment Problem. + + Ensures that a correct solution is found for every input problem within + a batch. + """ + cost_matrix = np.array([ + [[4, 1, 3], [2, 0, 5], [3, 2, 2]], + [[1, 4, 3], [0, 2, 5], [2, 3, 2]], + [[1, 3, 4], [0, 5, 2], [2, 2, 3]], + ], + dtype=np.float32) + _, adjacency_matrix = matchers.hungarian_matching(tf.constant(cost_matrix)) + adjacency_output = adjacency_matrix.numpy() + + # Hand-solved correct output for the linear sum assignment problem. + correct_output = np.array([ + [[0, 1, 0], [1, 0, 0], [0, 0, 1]], + [[1, 0, 0], [0, 1, 0], [0, 0, 1]], + [[1, 0, 0], [0, 0, 1], [0, 1, 0]], + ], + dtype=bool) + self.assertAllClose(adjacency_output, correct_output) + + def testMaximumBipartiteMatching(self): + """Check that the maximum bipartite matching finds a perfect matching.""" + adj_matrix = tf.cast([[ + [1, 0, 0, 0, 1], + [0, 1, 0, 1, 0], + [0, 0, 1, 0, 0], + [0, 1, 0, 0, 0], + [1, 0, 0, 0, 0], + ]], tf.bool) + _, assignment = matchers._maximum_bipartite_matching(adj_matrix) + self.assertEqual(np.sum(assignment.numpy()), 5) + + def testAssignmentMatchesScipy(self): + """Check that the Linear Sum Assignment matches the SciPy implementation.""" + batch_size, num_elems = 2, 25 + weights = tf.random.uniform((batch_size, num_elems, num_elems), + minval=0., + maxval=1.)
+ weights, assignment = matchers.hungarian_matching(weights) + + for idx in range(batch_size): + _, scipy_assignment = optimize.linear_sum_assignment(weights.numpy()[idx]) + hungarian_assignment = np.where(assignment.numpy()[idx])[1] + + self.assertAllEqual(hungarian_assignment, scipy_assignment) + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/detr/optimization.py b/official/projects/detr/optimization.py new file mode 100644 index 0000000000000000000000000000000000000000..a9da1740d11a83f3989d85e78c04b04c50773539 --- /dev/null +++ b/official/projects/detr/optimization.py @@ -0,0 +1,147 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Customized optimizer to match paper results.""" + +import dataclasses +import tensorflow as tf +from official.modeling import optimization +from official.nlp import optimization as nlp_optimization + + +@dataclasses.dataclass +class DETRAdamWConfig(optimization.AdamWeightDecayConfig): + pass + + +@dataclasses.dataclass +class OptimizerConfig(optimization.OptimizerConfig): + detr_adamw: DETRAdamWConfig = DETRAdamWConfig() + + +@dataclasses.dataclass +class OptimizationConfig(optimization.OptimizationConfig): + """Configuration for optimizer and learning rate schedule. + + Attributes: + optimizer: optimizer oneof config. + ema: optional exponential moving average optimizer config, if specified, ema + optimizer will be used. 
+ learning_rate: learning rate oneof config. + warmup: warmup oneof config. + """ + optimizer: OptimizerConfig = OptimizerConfig() + + +# TODO(frederickliu): figure out how to make this configurable. +# TODO(frederickliu): Study if this is needed. +class _DETRAdamW(nlp_optimization.AdamWeightDecay): + """Custom AdamW that scales the backbone learning rate by 0.1. + + Variables whose name does not contain 'detr' (the backbone) use 0.1x the + learning rate. The code is copied from AdamWeightDecay and Adam, with this + learning rate scaling added. + """ + + def _resource_apply_dense(self, grad, var, apply_state=None): + lr_t, kwargs = self._get_lr(var.device, var.dtype.base_dtype, apply_state) + apply_state = kwargs['apply_state'] + if 'detr' not in var.name: + lr_t *= 0.1 + decay = self._decay_weights_op(var, lr_t, apply_state) + with tf.control_dependencies([decay]): + var_device, var_dtype = var.device, var.dtype.base_dtype + coefficients = ((apply_state or {}).get((var_device, var_dtype)) + or self._fallback_apply_state(var_device, var_dtype)) + + m = self.get_slot(var, 'm') + v = self.get_slot(var, 'v') + lr = coefficients[ + 'lr_t'] * 0.1 if 'detr' not in var.name else coefficients['lr_t'] + + if not self.amsgrad: + return tf.raw_ops.ResourceApplyAdam( + var=var.handle, + m=m.handle, + v=v.handle, + beta1_power=coefficients['beta_1_power'], + beta2_power=coefficients['beta_2_power'], + lr=lr, + beta1=coefficients['beta_1_t'], + beta2=coefficients['beta_2_t'], + epsilon=coefficients['epsilon'], + grad=grad, + use_locking=self._use_locking) + else: + vhat = self.get_slot(var, 'vhat') + return tf.raw_ops.ResourceApplyAdamWithAmsgrad( + var=var.handle, + m=m.handle, + v=v.handle, + vhat=vhat.handle, + beta1_power=coefficients['beta_1_power'], + beta2_power=coefficients['beta_2_power'], + lr=lr, + beta1=coefficients['beta_1_t'], + beta2=coefficients['beta_2_t'], + epsilon=coefficients['epsilon'], + grad=grad, + use_locking=self._use_locking) + + def 
_resource_apply_sparse(self, grad, var, indices, apply_state=None): + lr_t, kwargs = self._get_lr(var.device,
var.dtype.base_dtype, apply_state) + apply_state = kwargs['apply_state'] + if 'detr' not in var.name: + lr_t *= 0.1 + decay = self._decay_weights_op(var, lr_t, apply_state) + with tf.control_dependencies([decay]): + var_device, var_dtype = var.device, var.dtype.base_dtype + coefficients = ((apply_state or {}).get((var_device, var_dtype)) + or self._fallback_apply_state(var_device, var_dtype)) + + # m_t = beta1 * m + (1 - beta1) * g_t + m = self.get_slot(var, 'm') + m_scaled_g_values = grad * coefficients['one_minus_beta_1_t'] + m_t = tf.compat.v1.assign(m, m * coefficients['beta_1_t'], + use_locking=self._use_locking) + with tf.control_dependencies([m_t]): + m_t = self._resource_scatter_add(m, indices, m_scaled_g_values) + + # v_t = beta2 * v + (1 - beta2) * (g_t * g_t) + v = self.get_slot(var, 'v') + v_scaled_g_values = (grad * grad) * coefficients['one_minus_beta_2_t'] + v_t = tf.compat.v1.assign(v, v * coefficients['beta_2_t'], + use_locking=self._use_locking) + with tf.control_dependencies([v_t]): + v_t = self._resource_scatter_add(v, indices, v_scaled_g_values) + lr = coefficients[ + 'lr_t'] * 0.1 if 'detr' not in var.name else coefficients['lr_t'] + if not self.amsgrad: + v_sqrt = tf.sqrt(v_t) + var_update = tf.compat.v1.assign_sub( + var, lr * m_t / (v_sqrt + coefficients['epsilon']), + use_locking=self._use_locking) + return tf.group(*[var_update, m_t, v_t]) + else: + v_hat = self.get_slot(var, 'vhat') + v_hat_t = tf.maximum(v_hat, v_t) + with tf.control_dependencies([v_hat_t]): + v_hat_t = tf.compat.v1.assign( + v_hat, v_hat_t, use_locking=self._use_locking) + v_hat_sqrt = tf.sqrt(v_hat_t) + var_update = tf.compat.v1.assign_sub( + var, + lr* m_t / (v_hat_sqrt + coefficients['epsilon']), + use_locking=self._use_locking) + return tf.group(*[var_update, m_t, v_t, v_hat_t]) + +optimization.register_optimizer_cls('detr_adamw', _DETRAdamW) diff --git a/official/projects/detr/tasks/detection.py b/official/projects/detr/tasks/detection.py new file mode 100644 
index 0000000000000000000000000000000000000000..732b1801b881ef1a8060201434c118b1e71c1715 --- /dev/null +++ b/official/projects/detr/tasks/detection.py @@ -0,0 +1,402 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""DETR detection task definition.""" +from typing import Optional + +from absl import logging +import tensorflow as tf + +from official.common import dataset_fn +from official.core import base_task +from official.core import task_factory +from official.projects.detr.configs import detr as detr_cfg +from official.projects.detr.dataloaders import coco +from official.projects.detr.dataloaders import detr_input +from official.projects.detr.modeling import detr +from official.projects.detr.ops import matchers +from official.vision.dataloaders import input_reader_factory +from official.vision.dataloaders import tf_example_decoder +from official.vision.dataloaders import tfds_factory +from official.vision.dataloaders import tf_example_label_map_decoder +from official.vision.evaluation import coco_evaluator +from official.vision.modeling import backbones +from official.vision.ops import box_ops + + +@task_factory.register_task_cls(detr_cfg.DetrTask) +class DetectionTask(base_task.Task): + """A single-replica view of training procedure. 
+ + DETR task provides artifacts for training/evaluation procedures, including + loading/iterating over Datasets, initializing the model, calculating the loss, + post-processing, and customized metrics with reduction. + """ + + def build_model(self): + """Build DETR model.""" + + input_specs = tf.keras.layers.InputSpec(shape=[None] + + self._task_config.model.input_size) + + backbone = backbones.factory.build_backbone( + input_specs=input_specs, + backbone_config=self._task_config.model.backbone, + norm_activation_config=self._task_config.model.norm_activation) + + model = detr.DETR(backbone, + self._task_config.model.backbone_endpoint_name, + self._task_config.model.num_queries, + self._task_config.model.hidden_size, + self._task_config.model.num_classes, + self._task_config.model.num_encoder_layers, + self._task_config.model.num_decoder_layers) + return model + + def initialize(self, model: tf.keras.Model): + """Loads a pretrained checkpoint.""" + if not self._task_config.init_checkpoint: + return + + ckpt_dir_or_file = self._task_config.init_checkpoint + + # Restoring checkpoint.
+ if tf.io.gfile.isdir(ckpt_dir_or_file): + ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) + + if self._task_config.init_checkpoint_modules == 'all': + ckpt = tf.train.Checkpoint(**model.checkpoint_items) + status = ckpt.restore(ckpt_dir_or_file) + status.assert_consumed() + elif self._task_config.init_checkpoint_modules == 'backbone': + ckpt = tf.train.Checkpoint(backbone=model.backbone) + status = ckpt.restore(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + + logging.info('Finished loading pretrained checkpoint from %s', + ckpt_dir_or_file) + + def build_inputs(self, + params, + input_context: Optional[tf.distribute.InputContext] = None): + """Build input dataset.""" + if isinstance(params, coco.COCODataConfig): + dataset = coco.COCODataLoader(params).load(input_context) + else: + if params.tfds_name: + decoder = tfds_factory.get_detection_decoder(params.tfds_name) + else: + decoder_cfg = params.decoder.get() + if params.decoder.type == 'simple_decoder': + decoder = tf_example_decoder.TfExampleDecoder( + regenerate_source_id=decoder_cfg.regenerate_source_id) + elif params.decoder.type == 'label_map_decoder': + decoder = tf_example_label_map_decoder.TfExampleDecoderLabelMap( + label_map=decoder_cfg.label_map, + regenerate_source_id=decoder_cfg.regenerate_source_id) + else: + raise ValueError('Unknown decoder type: {}!'.format( + params.decoder.type)) + + parser = detr_input.Parser( + class_offset=self._task_config.losses.class_offset, + output_size=self._task_config.model.input_size[:2], + ) + + reader = input_reader_factory.input_reader_generator( + params, + dataset_fn=dataset_fn.pick_dataset_fn(params.file_type), + decoder_fn=decoder.decode, + parser_fn=parser.parse_fn(params.is_training)) + dataset = reader.read(input_context=input_context) + + return dataset + + def _compute_cost(self, cls_outputs, box_outputs, cls_targets, box_targets): + # Approximate classification cost with 1 - prob[target class]. 
+ # The 1 is a constant that doesn't change the matching, so it can be omitted. + # background: 0 + cls_cost = self._task_config.losses.lambda_cls * tf.gather( + -tf.nn.softmax(cls_outputs), cls_targets, batch_dims=1, axis=-1) + + # Compute the L1 cost between boxes. + paired_differences = self._task_config.losses.lambda_box * tf.abs( + tf.expand_dims(box_outputs, 2) - tf.expand_dims(box_targets, 1)) + box_cost = tf.reduce_sum(paired_differences, axis=-1) + + # Compute the GIoU cost between boxes. + giou_cost = self._task_config.losses.lambda_giou * -box_ops.bbox_generalized_overlap( + box_ops.cycxhw_to_yxyx(box_outputs), + box_ops.cycxhw_to_yxyx(box_targets)) + + total_cost = cls_cost + box_cost + giou_cost + + max_cost = ( + self._task_config.losses.lambda_cls * 0.0 + + self._task_config.losses.lambda_box * 4. + + self._task_config.losses.lambda_giou * 0.0) + + # Set pads to a large constant. + valid = tf.expand_dims( + tf.cast(tf.not_equal(cls_targets, 0), dtype=total_cost.dtype), axis=1) + total_cost = (1 - valid) * max_cost + valid * total_cost + + # Set inf or NaN to a large constant. + total_cost = tf.where( + tf.logical_or(tf.math.is_nan(total_cost), tf.math.is_inf(total_cost)), + max_cost * tf.ones_like(total_cost, dtype=total_cost.dtype), + total_cost) + + return total_cost + + def build_losses(self, outputs, labels, aux_losses=None): + """Build DETR losses.""" + cls_outputs = outputs['cls_outputs'] + box_outputs = outputs['box_outputs'] + cls_targets = labels['classes'] + box_targets = labels['boxes'] + + cost = self._compute_cost( + cls_outputs, box_outputs, cls_targets, box_targets) + + _, indices = matchers.hungarian_matching(cost) + indices = tf.stop_gradient(indices) + + target_index = tf.math.argmax(indices, axis=1) + cls_assigned = tf.gather(cls_outputs, target_index, batch_dims=1, axis=1) + box_assigned = tf.gather(box_outputs, target_index, batch_dims=1, axis=1) + + background = tf.equal(cls_targets, 0) + num_boxes = tf.reduce_sum(
tf.cast(tf.logical_not(background), tf.float32), axis=-1) + + # Down-weight background to account for class imbalance. + xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits( + labels=cls_targets, logits=cls_assigned) + cls_loss = self._task_config.losses.lambda_cls * tf.where( + background, self._task_config.losses.background_cls_weight * xentropy, + xentropy) + cls_weights = tf.where( + background, + self._task_config.losses.background_cls_weight * tf.ones_like(cls_loss), + tf.ones_like(cls_loss)) + + # Box loss is only calculated on non-background class. + l_1 = tf.reduce_sum(tf.abs(box_assigned - box_targets), axis=-1) + box_loss = self._task_config.losses.lambda_box * tf.where( + background, tf.zeros_like(l_1), l_1) + + # Giou loss is only calculated on non-background class. + giou = tf.linalg.diag_part(1.0 - box_ops.bbox_generalized_overlap( + box_ops.cycxhw_to_yxyx(box_assigned), + box_ops.cycxhw_to_yxyx(box_targets) + )) + giou_loss = self._task_config.losses.lambda_giou * tf.where( + background, tf.zeros_like(giou), giou) + + # Consider doing all reduce once in train_step to speed up. 
+ num_boxes_per_replica = tf.reduce_sum(num_boxes) + cls_weights_per_replica = tf.reduce_sum(cls_weights) + replica_context = tf.distribute.get_replica_context() + num_boxes_sum, cls_weights_sum = replica_context.all_reduce( + tf.distribute.ReduceOp.SUM, + [num_boxes_per_replica, cls_weights_per_replica]) + cls_loss = tf.math.divide_no_nan( + tf.reduce_sum(cls_loss), cls_weights_sum) + box_loss = tf.math.divide_no_nan( + tf.reduce_sum(box_loss), num_boxes_sum) + giou_loss = tf.math.divide_no_nan( + tf.reduce_sum(giou_loss), num_boxes_sum) + + aux_losses = tf.add_n(aux_losses) if aux_losses else 0.0 + + total_loss = cls_loss + box_loss + giou_loss + aux_losses + return total_loss, cls_loss, box_loss, giou_loss + + def build_metrics(self, training=True): + """Build detection metrics.""" + metrics = [] + metric_names = ['cls_loss', 'box_loss', 'giou_loss'] + for name in metric_names: + metrics.append(tf.keras.metrics.Mean(name, dtype=tf.float32)) + + if not training: + self.coco_metric = coco_evaluator.COCOEvaluator( + annotation_file=self._task_config.annotation_file, + include_mask=False, + need_rescale_bboxes=True, + per_category_metrics=self._task_config.per_category_metrics) + return metrics + + def train_step(self, inputs, model, optimizer, metrics=None): + """Does forward and backward. + + Args: + inputs: a dictionary of input tensors. + model: the model, forward pass definition. + optimizer: the optimizer for this training step. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs. + """ + features, labels = inputs + with tf.GradientTape() as tape: + outputs = model(features, training=True) + + loss = 0.0 + cls_loss = 0.0 + box_loss = 0.0 + giou_loss = 0.0 + + for output in outputs: + # Computes per-replica loss. 
+ layer_loss, layer_cls_loss, layer_box_loss, layer_giou_loss = self.build_losses( + outputs=output, labels=labels, aux_losses=model.losses) + loss += layer_loss + cls_loss += layer_cls_loss + box_loss += layer_box_loss + giou_loss += layer_giou_loss + + # Consider moving scaling logic from build_losses to here. + scaled_loss = loss + # For mixed_precision policy, when LossScaleOptimizer is used, loss is + # scaled for numerical stability. + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + scaled_loss = optimizer.get_scaled_loss(scaled_loss) + + tvars = model.trainable_variables + grads = tape.gradient(scaled_loss, tvars) + # Scales back gradient when LossScaleOptimizer is used. + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + grads = optimizer.get_unscaled_gradients(grads) + optimizer.apply_gradients(list(zip(grads, tvars))) + + # Multiply for logging. + # Since we expect the gradient replica sum to happen in the optimizer, + # the loss is scaled with global num_boxes and weights. + # To have it more interpretable/comparable we scale it back when logging. + num_replicas_in_sync = tf.distribute.get_strategy().num_replicas_in_sync + loss *= num_replicas_in_sync + cls_loss *= num_replicas_in_sync + box_loss *= num_replicas_in_sync + giou_loss *= num_replicas_in_sync + + # Trainer class handles loss metric for you. + logs = {self.loss: loss} + + all_losses = { + 'cls_loss': cls_loss, + 'box_loss': box_loss, + 'giou_loss': giou_loss, + } + + # Metric results will be added to logs for you. + if metrics: + for m in metrics: + m.update_state(all_losses[m.name]) + return logs + + def validation_step(self, inputs, model, metrics=None): + """Validation step. + + Args: + inputs: a dictionary of input tensors. + model: the keras.Model. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs.
+ """ + features, labels = inputs + + outputs = model(features, training=False)[-1] + loss, cls_loss, box_loss, giou_loss = self.build_losses( + outputs=outputs, labels=labels, aux_losses=model.losses) + + # Multiply for logging. + # Since we expect the gradient replica sum to happen in the optimizer, + # the loss is scaled with global num_boxes and weights. + # To have it more interpretable/comparable we scale it back when logging. + num_replicas_in_sync = tf.distribute.get_strategy().num_replicas_in_sync + loss *= num_replicas_in_sync + cls_loss *= num_replicas_in_sync + box_loss *= num_replicas_in_sync + giou_loss *= num_replicas_in_sync + + # Evaluator class handles loss metric for you. + logs = {self.loss: loss} + + predictions = { + 'detection_boxes': + box_ops.cycxhw_to_yxyx(outputs['box_outputs']) + * tf.expand_dims( + tf.concat([ + labels['image_info'][:, 1:2, 0], + labels['image_info'][:, 1:2, 1], + labels['image_info'][:, 1:2, 0], + labels['image_info'][:, 1:2, 1] + ], + axis=1), + axis=1), + 'detection_scores': + tf.math.reduce_max( + tf.nn.softmax(outputs['cls_outputs'])[:, :, 1:], axis=-1), + 'detection_classes': + tf.math.argmax(outputs['cls_outputs'][:, :, 1:], axis=-1) + 1, + # Fix this. It's not being used at the moment. 
+ 'num_detections': tf.reduce_sum( + tf.cast( + tf.math.greater(tf.math.reduce_max( + outputs['cls_outputs'], axis=-1), 0), tf.int32), axis=-1), + 'source_id': labels['id'], + 'image_info': labels['image_info'] + } + ground_truths = { + 'source_id': labels['id'], + 'height': labels['image_info'][:, 0:1, 0], + 'width': labels['image_info'][:, 0:1, 1], + 'num_detections': tf.reduce_sum( + tf.cast(tf.math.greater(labels['classes'], 0), tf.int32), axis=-1), + 'boxes': labels['gt_boxes'], + 'classes': labels['classes'], + 'is_crowds': labels['is_crowd'] + } + logs.update({'predictions': predictions, + 'ground_truths': ground_truths}) + + all_losses = { + 'cls_loss': cls_loss, + 'box_loss': box_loss, + 'giou_loss': giou_loss, + } + + # Metric results will be added to logs for you. + if metrics: + for m in metrics: + m.update_state(all_losses[m.name]) + return logs + + def aggregate_logs(self, state=None, step_outputs=None): + if state is None: + self.coco_metric.reset_states() + state = self.coco_metric + + state.update_state( + step_outputs['ground_truths'], + step_outputs['predictions']) + return state + + def reduce_aggregated_logs(self, aggregated_logs, global_step=None): + return aggregated_logs.result() diff --git a/official/projects/detr/tasks/detection_test.py b/official/projects/detr/tasks/detection_test.py new file mode 100644 index 0000000000000000000000000000000000000000..7e27f4db0ca9e2a095f1b0ae85469cfe38d47f0e --- /dev/null +++ b/official/projects/detr/tasks/detection_test.py @@ -0,0 +1,203 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for detection.""" + +import numpy as np +import tensorflow as tf +import tensorflow_datasets as tfds + +from official.projects.detr import optimization +from official.projects.detr.configs import detr as detr_cfg +from official.projects.detr.dataloaders import coco +from official.projects.detr.tasks import detection +from official.vision.configs import backbones + + +_NUM_EXAMPLES = 10 + + +def _gen_fn(): + h = np.random.randint(0, 300) + w = np.random.randint(0, 300) + num_boxes = np.random.randint(0, 50) + return { + 'image': np.ones(shape=(h, w, 3), dtype=np.uint8), + 'image/id': np.random.randint(0, 100), + 'image/filename': 'test', + 'objects': { + 'is_crowd': np.ones(shape=(num_boxes), dtype=bool), + 'bbox': np.ones(shape=(num_boxes, 4), dtype=np.float32), + 'label': np.ones(shape=(num_boxes), dtype=np.int64), + 'id': np.ones(shape=(num_boxes), dtype=np.int64), + 'area': np.ones(shape=(num_boxes), dtype=np.int64), + } + } + + +def _as_dataset(self, *args, **kwargs): + del args + del kwargs + return tf.data.Dataset.from_generator( + lambda: (_gen_fn() for i in range(_NUM_EXAMPLES)), + output_types=self.info.features.dtype, + output_shapes=self.info.features.shape, + ) + + +class DetectionTest(tf.test.TestCase): + + def test_train_step(self): + config = detr_cfg.DetrTask( + model=detr_cfg.Detr( + input_size=[1333, 1333, 3], + num_encoder_layers=1, + num_decoder_layers=1, + num_classes=81, + backbone=backbones.Backbone( + type='resnet', + resnet=backbones.ResNet(model_id=10, bn_trainable=False)) + ), + train_data=coco.COCODataConfig(
tfds_name='coco/2017', + tfds_split='validation', + is_training=True, + global_batch_size=2, + )) + with tfds.testing.mock_data(as_dataset_fn=_as_dataset): + task = detection.DetectionTask(config) + model = task.build_model() + dataset = task.build_inputs(config.train_data) + iterator = iter(dataset) + opt_cfg = optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'detr_adamw', + 'detr_adamw': { + 'weight_decay_rate': 1e-4, + 'global_clipnorm': 0.1, + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [120000], + 'values': [0.0001, 1.0e-05] + } + }, + }) + optimizer = detection.DetectionTask.create_optimizer(opt_cfg) + task.train_step(next(iterator), model, optimizer) + + def test_validation_step(self): + config = detr_cfg.DetrTask( + model=detr_cfg.Detr( + input_size=[1333, 1333, 3], + num_encoder_layers=1, + num_decoder_layers=1, + num_classes=81, + backbone=backbones.Backbone( + type='resnet', + resnet=backbones.ResNet(model_id=10, bn_trainable=False)) + ), + validation_data=coco.COCODataConfig( + tfds_name='coco/2017', + tfds_split='validation', + is_training=False, + global_batch_size=2, + )) + + with tfds.testing.mock_data(as_dataset_fn=_as_dataset): + task = detection.DetectionTask(config) + model = task.build_model() + metrics = task.build_metrics(training=False) + dataset = task.build_inputs(config.validation_data) + iterator = iter(dataset) + logs = task.validation_step(next(iterator), model, metrics) + state = task.aggregate_logs(step_outputs=logs) + task.reduce_aggregated_logs(state) + + +class DetectionTFDSTest(tf.test.TestCase): + + def test_train_step(self): + config = detr_cfg.DetrTask( + model=detr_cfg.Detr( + input_size=[1333, 1333, 3], + num_encoder_layers=1, + num_decoder_layers=1, + backbone=backbones.Backbone( + type='resnet', + resnet=backbones.ResNet(model_id=10, bn_trainable=False)) + ), + losses=detr_cfg.Losses(class_offset=1), + train_data=detr_cfg.DataConfig( + tfds_name='coco/2017', + 
tfds_split='validation', + is_training=True, + global_batch_size=2, + )) + with tfds.testing.mock_data(as_dataset_fn=_as_dataset): + task = detection.DetectionTask(config) + model = task.build_model() + dataset = task.build_inputs(config.train_data) + iterator = iter(dataset) + opt_cfg = optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'detr_adamw', + 'detr_adamw': { + 'weight_decay_rate': 1e-4, + 'global_clipnorm': 0.1, + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [120000], + 'values': [0.0001, 1.0e-05] + } + }, + }) + optimizer = detection.DetectionTask.create_optimizer(opt_cfg) + task.train_step(next(iterator), model, optimizer) + + def test_validation_step(self): + config = detr_cfg.DetrTask( + model=detr_cfg.Detr( + input_size=[1333, 1333, 3], + num_encoder_layers=1, + num_decoder_layers=1, + backbone=backbones.Backbone( + type='resnet', + resnet=backbones.ResNet(model_id=10, bn_trainable=False)) + ), + losses=detr_cfg.Losses(class_offset=1), + validation_data=detr_cfg.DataConfig( + tfds_name='coco/2017', + tfds_split='validation', + is_training=False, + global_batch_size=2, + )) + + with tfds.testing.mock_data(as_dataset_fn=_as_dataset): + task = detection.DetectionTask(config) + model = task.build_model() + metrics = task.build_metrics(training=False) + dataset = task.build_inputs(config.validation_data) + iterator = iter(dataset) + logs = task.validation_step(next(iterator), model, metrics) + state = task.aggregate_logs(step_outputs=logs) + task.reduce_aggregated_logs(state) + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/detr/train.py b/official/projects/detr/train.py new file mode 100644 index 0000000000000000000000000000000000000000..a34da6843b45cbc0f17c5b8166ad92092dd33066 --- /dev/null +++ b/official/projects/detr/train.py @@ -0,0 +1,70 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""TensorFlow Model Garden Vision training driver.""" + +from absl import app +from absl import flags +import gin + +from official.common import distribute_utils +from official.common import flags as tfm_flags +from official.core import task_factory +from official.core import train_lib +from official.core import train_utils +from official.modeling import performance +# pylint: disable=unused-import +from official.projects.detr.configs import detr +from official.projects.detr.tasks import detection +# pylint: enable=unused-import + +FLAGS = flags.FLAGS + + +def main(_): + gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) + params = train_utils.parse_configuration(FLAGS) + model_dir = FLAGS.model_dir + if 'train' in FLAGS.mode: + # Pure eval modes do not output yaml files. Otherwise continuous eval job + # may race against the train job for writing the same file. + train_utils.serialize_config(params, model_dir) + + # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16' + # can have significant impact on model speeds by utilizing float16 in case of + # GPUs, and bfloat16 in the case of TPUs. 
loss_scale takes effect only when + # dtype is float16 + if params.runtime.mixed_precision_dtype: + performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) + distribution_strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=params.runtime.distribution_strategy, + all_reduce_alg=params.runtime.all_reduce_alg, + num_gpus=params.runtime.num_gpus, + tpu_address=params.runtime.tpu) + with distribution_strategy.scope(): + task = task_factory.get_task(params.task, logging_dir=model_dir) + + train_lib.run_experiment( + distribution_strategy=distribution_strategy, + task=task, + mode=FLAGS.mode, + params=params, + model_dir=model_dir) + + train_utils.save_gin_config(FLAGS.mode, model_dir) + +if __name__ == '__main__': + tfm_flags.define_flags() + flags.mark_flags_as_required(['experiment', 'mode', 'model_dir']) + app.run(main) diff --git a/official/projects/edgetpu/nlp/__init__.py b/official/projects/edgetpu/nlp/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/edgetpu/nlp/__init__.py +++ b/official/projects/edgetpu/nlp/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/nlp/configs/__init__.py b/official/projects/edgetpu/nlp/configs/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/edgetpu/nlp/configs/__init__.py +++ b/official/projects/edgetpu/nlp/configs/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/nlp/configs/params.py b/official/projects/edgetpu/nlp/configs/params.py index fc8a5f4e9d9959f1d793c158dab60b6cf4002e55..39a83ba26eb11075158c4bc6e27ba6a4d7db8435 100644 --- a/official/projects/edgetpu/nlp/configs/params.py +++ b/official/projects/edgetpu/nlp/configs/params.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/nlp/experiments/downstream_tasks/mobilebert_baseline.yaml b/official/projects/edgetpu/nlp/experiments/downstream_tasks/mobilebert_baseline.yaml index 0b7d0563e4e7e80947ebf8f577df78f9f7032229..0b55beeaaed58b7fd3ee1c9ba3390429f3ccc8e0 100644 --- a/official/projects/edgetpu/nlp/experiments/downstream_tasks/mobilebert_baseline.yaml +++ b/official/projects/edgetpu/nlp/experiments/downstream_tasks/mobilebert_baseline.yaml @@ -13,7 +13,7 @@ task: num_attention_heads: 4 intermediate_size: 512 hidden_activation: relu - hidden_dropout_prob: 0.0 + hidden_dropout_prob: 0.1 attention_probs_dropout_prob: 0.1 intra_bottleneck_size: 128 initializer_range: 0.02 diff --git a/official/projects/edgetpu/nlp/experiments/downstream_tasks/mobilebert_edgetpu_xxs.yaml b/official/projects/edgetpu/nlp/experiments/downstream_tasks/mobilebert_edgetpu_xxs.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f1b1bfff412c4e8485e2bcfb7c883b47e834263e --- /dev/null +++ b/official/projects/edgetpu/nlp/experiments/downstream_tasks/mobilebert_edgetpu_xxs.yaml @@ -0,0 +1,23 @@ +# MobileBERT-EdgeTPU-XXS model. 
+task: + model: + encoder: + type: mobilebert + mobilebert: + word_vocab_size: 30522 + word_embed_size: 128 + type_vocab_size: 2 + max_sequence_length: 512 + num_blocks: 6 + hidden_size: 512 + num_attention_heads: 4 + intermediate_size: 1024 + hidden_activation: relu + hidden_dropout_prob: 0.1 + attention_probs_dropout_prob: 0.1 + intra_bottleneck_size: 128 + initializer_range: 0.02 + key_query_shared_bottleneck: true + num_feedforward_networks: 2 + normalization_type: no_norm + classifier_activation: false diff --git a/official/projects/edgetpu/nlp/experiments/mobilebert_edgetpu_xxs.yaml b/official/projects/edgetpu/nlp/experiments/mobilebert_edgetpu_xxs.yaml new file mode 100644 index 0000000000000000000000000000000000000000..86c26569339a5f147f5b9e18f47d5f636f643957 --- /dev/null +++ b/official/projects/edgetpu/nlp/experiments/mobilebert_edgetpu_xxs.yaml @@ -0,0 +1,142 @@ +layer_wise_distillation: + num_steps: 30000 + warmup_steps: 0 + initial_learning_rate: 1.5e-3 + end_learning_rate: 1.5e-3 + decay_steps: 30000 +end_to_end_distillation: + num_steps: 585000 + warmup_steps: 20000 + initial_learning_rate: 1.5e-3 + end_learning_rate: 1.5e-7 + decay_steps: 585000 + distill_ground_truth_ratio: 0.5 +optimizer: + optimizer: + lamb: + beta_1: 0.9 + beta_2: 0.999 + clipnorm: 1.0 + epsilon: 1.0e-06 + exclude_from_layer_adaptation: null + exclude_from_weight_decay: ['LayerNorm', 'bias', 'norm'] + global_clipnorm: null + name: LAMB + weight_decay_rate: 0.01 + type: lamb +orbit_config: + eval_interval: 1000 + eval_steps: -1 + mode: train + steps_per_loop: 1000 + total_steps: 825000 +runtime: + distribution_strategy: 'tpu' +student_model: + cls_heads: [{'activation': 'tanh', + 'cls_token_idx': 0, + 'dropout_rate': 0.0, + 'inner_dim': 512, + 'name': 'next_sentence', + 'num_classes': 2}] + encoder: + mobilebert: + attention_probs_dropout_prob: 0.1 + classifier_activation: false + hidden_activation: relu + hidden_dropout_prob: 0.0 + hidden_size: 512 + initializer_range: 0.02 + 
input_mask_dtype: int32 + intermediate_size: 1024 + intra_bottleneck_size: 128 + key_query_shared_bottleneck: true + max_sequence_length: 512 + normalization_type: no_norm + num_attention_heads: 4 + num_blocks: 6 + num_feedforward_networks: 2 + type_vocab_size: 2 + use_bottleneck_attention: false + word_embed_size: 128 + word_vocab_size: 30522 + type: mobilebert + mlm_activation: relu + mlm_initializer_range: 0.02 + mlm_output_weights_use_proj: true +teacher_model: + cls_heads: [] + encoder: + mobilebert: + attention_probs_dropout_prob: 0.1 + classifier_activation: false + hidden_activation: gelu + hidden_dropout_prob: 0.1 + hidden_size: 512 + initializer_range: 0.02 + input_mask_dtype: int32 + intermediate_size: 4096 + intra_bottleneck_size: 1024 + key_query_shared_bottleneck: false + max_sequence_length: 512 + normalization_type: layer_norm + num_attention_heads: 4 + num_blocks: 24 + num_feedforward_networks: 1 + type_vocab_size: 2 + use_bottleneck_attention: false + word_embed_size: 128 + word_vocab_size: 30522 + type: mobilebert + mlm_activation: gelu + mlm_initializer_range: 0.02 +teacher_model_init_checkpoint: gs://**/uncased_L-24_H-1024_B-512_A-4_teacher/tf2_checkpoint/bert_model.ckpt-1 +student_model_init_checkpoint: '' +train_datasest: + block_length: 1 + cache: false + cycle_length: null + deterministic: null + drop_remainder: true + enable_tf_data_service: false + global_batch_size: 2048 + input_path: gs://**/seq_512_mask_20/wikipedia.tfrecord*,gs://**/seq_512_mask_20/books.tfrecord* + is_training: true + max_predictions_per_seq: 20 + seq_length: 512 + sharding: true + shuffle_buffer_size: 100 + tf_data_service_address: null + tf_data_service_job_name: null + tfds_as_supervised: false + tfds_data_dir: '' + tfds_name: '' + tfds_skip_decoding_feature: '' + tfds_split: '' + use_next_sentence_label: true + use_position_id: false + use_v2_feature_names: false +eval_dataset: + block_length: 1 + cache: false + cycle_length: null + deterministic: null + 
drop_remainder: true + enable_tf_data_service: false + global_batch_size: 2048 + input_path: gs://**/seq_512_mask_20/wikipedia.tfrecord-00141-of-00500,gs://**/seq_512_mask_20/books.tfrecord-00141-of-00500 + is_training: false + max_predictions_per_seq: 20 + seq_length: 512 + sharding: true + shuffle_buffer_size: 100 + tf_data_service_address: null + tf_data_service_job_name: null + tfds_as_supervised: false + tfds_data_dir: '' + tfds_name: '' + tfds_skip_decoding_feature: '' + tfds_split: '' + use_next_sentence_label: true + use_position_id: false + use_v2_feature_names: false diff --git a/official/projects/edgetpu/nlp/mobilebert_edgetpu_trainer.py b/official/projects/edgetpu/nlp/mobilebert_edgetpu_trainer.py index 8e57ef9b17ac4ce22bcd83b138397a3533367528..2adeb246bf054845b9c769786138f274a07e7bb1 100644 --- a/official/projects/edgetpu/nlp/mobilebert_edgetpu_trainer.py +++ b/official/projects/edgetpu/nlp/mobilebert_edgetpu_trainer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/nlp/mobilebert_edgetpu_trainer_test.py b/official/projects/edgetpu/nlp/mobilebert_edgetpu_trainer_test.py index 82afbca221f3a4af70f2d43b399394ee61fb607f..b411c4946f3c5ff55400b8f028f704db19279c19 100644 --- a/official/projects/edgetpu/nlp/mobilebert_edgetpu_trainer_test.py +++ b/official/projects/edgetpu/nlp/mobilebert_edgetpu_trainer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/edgetpu/nlp/modeling/__init__.py b/official/projects/edgetpu/nlp/modeling/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/edgetpu/nlp/modeling/__init__.py +++ b/official/projects/edgetpu/nlp/modeling/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/nlp/modeling/edgetpu_layers.py b/official/projects/edgetpu/nlp/modeling/edgetpu_layers.py index fd1ea5cc7efdabda1a2b3a2ba73204fcc260275e..08900f6acc59960b7a10e445d0a9d64ed2e1cd52 100644 --- a/official/projects/edgetpu/nlp/modeling/edgetpu_layers.py +++ b/official/projects/edgetpu/nlp/modeling/edgetpu_layers.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -123,7 +123,7 @@ class EdgeTPUMultiHeadAttention(tf.keras.layers.MultiHeadAttention): """Builds multi-head dot-product attention computations. This function builds attributes necessary for `_compute_attention` to - costomize attention computation to replace the default dot-product + customize attention computation to replace the default dot-product attention. 
Args: diff --git a/official/projects/edgetpu/nlp/modeling/edgetpu_layers_test.py b/official/projects/edgetpu/nlp/modeling/edgetpu_layers_test.py index 477eea25862261224a2a385017b65e7444cd65d9..1ed5570d2d1132c0ea015a1e0ebd2a15aef310db 100644 --- a/official/projects/edgetpu/nlp/modeling/edgetpu_layers_test.py +++ b/official/projects/edgetpu/nlp/modeling/edgetpu_layers_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/nlp/modeling/encoder.py b/official/projects/edgetpu/nlp/modeling/encoder.py index 0693a0ac1f59912636e6463792316aaedf536e76..ea8e03f2bd6c2fe7d5d929817a7acd5a266f0e0b 100644 --- a/official/projects/edgetpu/nlp/modeling/encoder.py +++ b/official/projects/edgetpu/nlp/modeling/encoder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -161,7 +161,7 @@ class MobileBERTEncoder(tf.keras.Model): first_token = tf.squeeze(prev_output[:, 0:1, :], axis=1) if classifier_activation: - self._pooler_layer = tf.keras.layers.experimental.EinsumDense( + self._pooler_layer = tf.keras.layers.EinsumDense( 'ab,bc->ac', output_shape=hidden_size, activation=tf.tanh, diff --git a/official/projects/edgetpu/nlp/modeling/model_builder.py b/official/projects/edgetpu/nlp/modeling/model_builder.py index b78916dd2696b02cd92aab69925053a8ffc95899..de6f1ec597b4c228967c34b9efdc9da83d3c4ac2 100644 --- a/official/projects/edgetpu/nlp/modeling/model_builder.py +++ b/official/projects/edgetpu/nlp/modeling/model_builder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -85,6 +85,7 @@ def build_bert_pretrainer(pretrainer_cfg: params.PretrainerModelParams, activation=tf_utils.get_activation(pretrainer_cfg.mlm_activation), initializer=tf.keras.initializers.TruncatedNormal( stddev=pretrainer_cfg.mlm_initializer_range), + output_weights_use_proj=pretrainer_cfg.mlm_output_weights_use_proj, name='cls/predictions') pretrainer = edgetpu_pretrainer.MobileBERTEdgeTPUPretrainer( diff --git a/official/projects/edgetpu/nlp/modeling/model_builder_test.py b/official/projects/edgetpu/nlp/modeling/model_builder_test.py index 96461fb31391a90b0c8374feb4e89f1be0444556..159dd2d7b448141388bcec4a0f62bfcda19b7175 100644 --- a/official/projects/edgetpu/nlp/modeling/model_builder_test.py +++ b/official/projects/edgetpu/nlp/modeling/model_builder_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/nlp/modeling/pretrainer.py b/official/projects/edgetpu/nlp/modeling/pretrainer.py index 8a81021c0220d6f4d7047f906c990793a741bc4c..8607f3e817c105ca293b0d56aa0bea4b24f5077a 100644 --- a/official/projects/edgetpu/nlp/modeling/pretrainer.py +++ b/official/projects/edgetpu/nlp/modeling/pretrainer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/edgetpu/nlp/modeling/pretrainer_test.py b/official/projects/edgetpu/nlp/modeling/pretrainer_test.py index 67741cbf69a58414657b7702210eb85057843d8d..e896d0da1a90bdc3bf6d4acac1b434d58e47eb8e 100644 --- a/official/projects/edgetpu/nlp/modeling/pretrainer_test.py +++ b/official/projects/edgetpu/nlp/modeling/pretrainer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/nlp/run_mobilebert_edgetpu_train.py b/official/projects/edgetpu/nlp/run_mobilebert_edgetpu_train.py index 2a9e671f9f1ce89e5184f4395c5a8be369f480b0..812a0d051e686dbd33226c4082b312035938d44d 100644 --- a/official/projects/edgetpu/nlp/run_mobilebert_edgetpu_train.py +++ b/official/projects/edgetpu/nlp/run_mobilebert_edgetpu_train.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/nlp/serving/__init__.py b/official/projects/edgetpu/nlp/serving/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/edgetpu/nlp/serving/__init__.py +++ b/official/projects/edgetpu/nlp/serving/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/edgetpu/nlp/serving/export_tflite_squad.py b/official/projects/edgetpu/nlp/serving/export_tflite_squad.py index acd39198642410bb205995c6155256d63fca87ca..b66c54a84d4d706e5c354f40bda1b5083a6c5099 100644 --- a/official/projects/edgetpu/nlp/serving/export_tflite_squad.py +++ b/official/projects/edgetpu/nlp/serving/export_tflite_squad.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -135,7 +135,8 @@ def main(argv: Sequence[str]) -> None: checkpoint = tf.train.Checkpoint(**checkpoint_dict) checkpoint.restore(FLAGS.model_checkpoint).assert_existing_objects_matched() - model_for_serving = build_model_for_serving(model) + model_for_serving = build_model_for_serving(model, FLAGS.sequence_length, + FLAGS.batch_size) model_for_serving.summary() # TODO(b/194449109): Need to save the model to file and then convert tflite diff --git a/official/projects/edgetpu/nlp/serving/export_tflite_squad_test.py b/official/projects/edgetpu/nlp/serving/export_tflite_squad_test.py index 300c66b353c2573c44d3f4124b4c2fc277c25426..10c1b0d51a89e0476fac683f7c00e47315a54f9b 100644 --- a/official/projects/edgetpu/nlp/serving/export_tflite_squad_test.py +++ b/official/projects/edgetpu/nlp/serving/export_tflite_squad_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/edgetpu/nlp/utils/__init__.py b/official/projects/edgetpu/nlp/utils/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/edgetpu/nlp/utils/__init__.py +++ b/official/projects/edgetpu/nlp/utils/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/nlp/utils/utils.py b/official/projects/edgetpu/nlp/utils/utils.py index 95604502611c696ca7f8ec1158d29178f93677e1..ea0594a160eaa06af897c9a7cf177bee357159aa 100644 --- a/official/projects/edgetpu/nlp/utils/utils.py +++ b/official/projects/edgetpu/nlp/utils/utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/nlp/utils/utils_test.py b/official/projects/edgetpu/nlp/utils/utils_test.py index 37cb0efbd414b89d554370996ea0a2c619e461c0..82131baab74b7f34c3fe72285194109c3c432b1e 100644 --- a/official/projects/edgetpu/nlp/utils/utils_test.py +++ b/official/projects/edgetpu/nlp/utils/utils_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/edgetpu/vision/README.md b/official/projects/edgetpu/vision/README.md index d1a4d3fd7d3eb9836b70f53175c1abedd7baa958..5951a20d88039e1e371c6126ca63ae2f613e2232 100644 --- a/official/projects/edgetpu/vision/README.md +++ b/official/projects/edgetpu/vision/README.md @@ -78,10 +78,10 @@ models for 224x224 input resolution: Model (Checkpoint) | Accuracy (int8) | Pixel 6 Edge TPU Latency (ms) | tflite ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-------------: | :---------------------------: | :----: [MobileNetEdgeTPUv2-Tiny](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/tiny/mobilenet-edgetpu-v2-tiny.tar.gz) | 74.66% | 0.78 | [link](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/tiny/mobilenet_edgetpu_v2_tiny.tflite) -[MobileNetEdgeTPUv2-XS](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/tiny/mobilenet-edgetpu-v2-xs.tar.gz) | 75.79% | 0.82 | [link](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/tiny/mobilenet_edgetpu_v2_xs.tflite) -[MobileNetEdgeTPUv2-S](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/tiny/mobilenet-edgetpu-v2-s.tar.gz) | 77.36% | 1.03 | [link](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/tiny/mobilenet_edgetpu_v2_s.tflite) -[MobileNetEdgeTPUv2-M](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/tiny/mobilenet-edgetpu-v2-m.tar.gz) | 78.43% | 1.35 | 
[link](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/tiny/mobilenet_edgetpu_v2_m.tflite) -[MobileNetEdgeTPUv2-L](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/tiny/mobilenet-edgetpu-v2-l.tar.gz) | 79.00% | 1.64 | [link](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/tiny/mobilenet_edgetpu_v2_l.tflite) +[MobileNetEdgeTPUv2-XS](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/xs/mobilenet-edgetpu-v2-xs.tar.gz) | 75.79% | 0.82 | [link](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/xs/mobilenet_edgetpu_v2_xs.tflite) +[MobileNetEdgeTPUv2-S](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/s/mobilenet-edgetpu-v2-s.tar.gz) | 77.36% | 1.03 | [link](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/s/mobilenet_edgetpu_v2_s.tflite) +[MobileNetEdgeTPUv2-M](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/m/mobilenet-edgetpu-v2-m.tar.gz) | 78.43% | 1.35 | [link](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/m/mobilenet_edgetpu_v2_m.tflite) +[MobileNetEdgeTPUv2-L](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/l/mobilenet-edgetpu-v2-l.tar.gz) | 79.00% | 1.64 | [link](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v2/l/mobilenet_edgetpu_v2_l.tflite) [MobileNetEdgeTPU 
dm1.0](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v1/dm1p0/mobilenet-edgetpu-dm1p0.tar.gz) | 75.6% | 0.92 | [link](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v1/dm1p0/mobilenet_edgetpu.tflite) [MobileNetEdgeTPU dm1.25](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v1/dm1p25/mobilenet-edgetpu-dm1p25.tar.gz) | 77.06% | 1.20 | [link](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v1/dm1p25/mobilenet_edgetpu_dm1p25.tflite) [MobileNetEdgeTPU dm1.5](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v1/dm1p5/mobilenet-edgetpu-dm1p5.tar.gz) | 75.9% | 1.42 | [link](https://storage.cloud.google.com/tf_model_garden/models/edgetpu/checkpoint_and_tflite/vision/mobilenet-edgetpu-v1/dm1p5/mobilenet_edgetpu_dm1p5.tflite) diff --git a/official/projects/edgetpu/vision/__init__.py b/official/projects/edgetpu/vision/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/edgetpu/vision/__init__.py +++ b/official/projects/edgetpu/vision/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/edgetpu/vision/configs/__init__.py b/official/projects/edgetpu/vision/configs/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/edgetpu/vision/configs/__init__.py +++ b/official/projects/edgetpu/vision/configs/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/configs/mobilenet_edgetpu_config.py b/official/projects/edgetpu/vision/configs/mobilenet_edgetpu_config.py index 5ce1c3eb49f408ddb78424e121752d1155d4a002..5970e533c5ec9adfb926d9679ec0d8d82ff6d4a9 100644 --- a/official/projects/edgetpu/vision/configs/mobilenet_edgetpu_config.py +++ b/official/projects/edgetpu/vision/configs/mobilenet_edgetpu_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -24,8 +24,8 @@ from typing import Any, Mapping, Optional from official.core import config_definitions as cfg from official.core import exp_factory from official.modeling import optimization -from official.vision.beta.configs import common -from official.vision.beta.configs import image_classification as base_config +from official.vision.configs import common +from official.vision.configs import image_classification as base_config @dataclasses.dataclass diff --git a/official/projects/edgetpu/vision/configs/semantic_segmentation_config.py b/official/projects/edgetpu/vision/configs/semantic_segmentation_config.py index dbb5e41502535ad5782e71a39833305e480fd5a1..10012436d963935939252da1ff68d7028b7e05d1 100644 --- a/official/projects/edgetpu/vision/configs/semantic_segmentation_config.py +++ b/official/projects/edgetpu/vision/configs/semantic_segmentation_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Semantic segmentation configuration definition. 
The segmentation model is built using the mobilenet edgetpu v2 backbone and @@ -25,10 +24,10 @@ from official.core import config_definitions as cfg from official.core import exp_factory from official.modeling import hyperparams from official.modeling import optimization -from official.vision.beta.configs import backbones -from official.vision.beta.configs import common -from official.vision.beta.configs import decoders -from official.vision.beta.configs import semantic_segmentation as base_cfg +from official.vision.configs import backbones +from official.vision.configs import common +from official.vision.configs import decoders +from official.vision.configs import semantic_segmentation as base_cfg @dataclasses.dataclass diff --git a/official/projects/edgetpu/vision/configs/semantic_segmentation_searched_config.py b/official/projects/edgetpu/vision/configs/semantic_segmentation_searched_config.py index 44a5a4c04930ae244f2d8e82e22d0319874352ec..87213ff6dc73d977228ca6fe255f69b29f60631a 100644 --- a/official/projects/edgetpu/vision/configs/semantic_segmentation_searched_config.py +++ b/official/projects/edgetpu/vision/configs/semantic_segmentation_searched_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -26,8 +26,8 @@ from official.core import config_definitions as cfg from official.core import exp_factory from official.modeling import hyperparams from official.modeling import optimization -from official.vision.beta.configs import backbones -from official.vision.beta.configs import semantic_segmentation as base_cfg +from official.vision.configs import backbones +from official.vision.configs import semantic_segmentation as base_cfg # ADE 20K Dataset ADE20K_TRAIN_EXAMPLES = 20210 diff --git a/official/projects/edgetpu/vision/dataloaders/__init__.py b/official/projects/edgetpu/vision/dataloaders/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/edgetpu/vision/dataloaders/__init__.py +++ b/official/projects/edgetpu/vision/dataloaders/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/dataloaders/classification_input.py b/official/projects/edgetpu/vision/dataloaders/classification_input.py index 175d1900d0d6bdaade05fabab01e4345348877ff..1c7f532d93b5c34adbbdcc0bffe18d8da77dd834 100644 --- a/official/projects/edgetpu/vision/dataloaders/classification_input.py +++ b/official/projects/edgetpu/vision/dataloaders/classification_input.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -16,8 +16,8 @@ # Import libraries import tensorflow as tf -from official.vision.beta.dataloaders import classification_input -from official.vision.beta.ops import preprocess_ops +from official.vision.dataloaders import classification_input +from official.vision.ops import preprocess_ops MEAN_RGB = (0.5 * 255, 0.5 * 255, 0.5 * 255) STDDEV_RGB = (0.5 * 255, 0.5 * 255, 0.5 * 255) diff --git a/official/projects/edgetpu/vision/dataloaders/classification_input_test.py b/official/projects/edgetpu/vision/dataloaders/classification_input_test.py index 437f10b8d418fa03d4ab3a80c4e035ef92b36275..ecd552b072d12809f66f9b28faef970d17fed132 100644 --- a/official/projects/edgetpu/vision/dataloaders/classification_input_test.py +++ b/official/projects/edgetpu/vision/dataloaders/classification_input_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,8 +17,8 @@ from absl.testing import parameterized import tensorflow as tf from official.projects.edgetpu.vision.dataloaders import classification_input -from official.vision.beta.configs import common -from official.vision.beta.dataloaders import tfexample_utils +from official.vision.configs import common +from official.vision.dataloaders import tfexample_utils IMAGE_FIELD_KEY = 'image/encoded' LABEL_FIELD_KEY = 'image/class/label' diff --git a/official/projects/edgetpu/vision/modeling/__init__.py b/official/projects/edgetpu/vision/modeling/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/edgetpu/vision/modeling/__init__.py +++ b/official/projects/edgetpu/vision/modeling/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/modeling/backbones/__init__.py b/official/projects/edgetpu/vision/modeling/backbones/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/edgetpu/vision/modeling/backbones/__init__.py +++ b/official/projects/edgetpu/vision/modeling/backbones/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/modeling/backbones/mobilenet_edgetpu.py b/official/projects/edgetpu/vision/modeling/backbones/mobilenet_edgetpu.py index 0a2aafe3d554ea745c3ec49a668c609d6af9a279..2fa8d10e59719ca25a7000e3803f903bded0b17f 100644 --- a/official/projects/edgetpu/vision/modeling/backbones/mobilenet_edgetpu.py +++ b/official/projects/edgetpu/vision/modeling/backbones/mobilenet_edgetpu.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -22,7 +22,7 @@ import tensorflow as tf from official.modeling import hyperparams from official.projects.edgetpu.vision.modeling.mobilenet_edgetpu_v1_model import MobilenetEdgeTPU from official.projects.edgetpu.vision.modeling.mobilenet_edgetpu_v2_model import MobilenetEdgeTPUV2 -from official.vision.beta.modeling.backbones import factory +from official.vision.modeling.backbones import factory layers = tf.keras.layers diff --git a/official/projects/edgetpu/vision/modeling/backbones/mobilenet_edgetpu_test.py b/official/projects/edgetpu/vision/modeling/backbones/mobilenet_edgetpu_test.py index dea28630eb71878cb9bac789115919e2c2c311f8..9043aeb06089ef93e72c59efc4feb829d1f46a72 100644 --- a/official/projects/edgetpu/vision/modeling/backbones/mobilenet_edgetpu_test.py +++ b/official/projects/edgetpu/vision/modeling/backbones/mobilenet_edgetpu_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for MobileNet.""" # Import libraries diff --git a/official/projects/edgetpu/vision/modeling/common_modules.py b/official/projects/edgetpu/vision/modeling/common_modules.py index 878a5702fd3b09d683c1256f8a69f2b4262e1097..284a2e8e46fc3561e833bb5792392c50955b629f 100644 --- a/official/projects/edgetpu/vision/modeling/common_modules.py +++ b/official/projects/edgetpu/vision/modeling/common_modules.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/edgetpu/vision/modeling/custom_layers.py b/official/projects/edgetpu/vision/modeling/custom_layers.py index 7097fde2e4703c36ae8990cc7d1f43ee95e6b20f..3548dd335785724efd6c2083db7e99ae03ba434c 100644 --- a/official/projects/edgetpu/vision/modeling/custom_layers.py +++ b/official/projects/edgetpu/vision/modeling/custom_layers.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,6 +18,8 @@ import inspect from typing import Any, MutableMapping, Optional, Union, Tuple import tensorflow as tf +from official.modeling import tf_utils + class GroupConv2D(tf.keras.layers.Conv2D): """2D group convolution as a Keras Layer.""" @@ -168,7 +170,7 @@ class GroupConv2D(tf.keras.layers.Conv2D): self.add_weight( name='kernel_{}'.format(g), shape=self.group_kernel_shape, - initializer=self.kernel_initializer, + initializer=tf_utils.clone_initializer(self.kernel_initializer), regularizer=self.kernel_regularizer, constraint=self.kernel_constraint, trainable=True, @@ -178,7 +180,7 @@ class GroupConv2D(tf.keras.layers.Conv2D): self.add_weight( name='bias_{}'.format(g), shape=(self.group_output_channel,), - initializer=self.bias_initializer, + initializer=tf_utils.clone_initializer(self.bias_initializer), regularizer=self.bias_regularizer, constraint=self.bias_constraint, trainable=True, diff --git a/official/projects/edgetpu/vision/modeling/custom_layers_test.py b/official/projects/edgetpu/vision/modeling/custom_layers_test.py index ef39c563d19c950a48671e74e7f8819f0fa13f9e..c07ce224ee3ab18ad68815c0e0f45b40470e59ee 100644 --- a/official/projects/edgetpu/vision/modeling/custom_layers_test.py +++ b/official/projects/edgetpu/vision/modeling/custom_layers_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. 
All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/modeling/heads/__init__.py b/official/projects/edgetpu/vision/modeling/heads/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/edgetpu/vision/modeling/heads/__init__.py +++ b/official/projects/edgetpu/vision/modeling/heads/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/modeling/heads/bifpn_head.py b/official/projects/edgetpu/vision/modeling/heads/bifpn_head.py index ea6ba275b5678d2beefe549c78a3afc491bf478d..7af79d1e55b6449d70604129b0e04ec2ce096f66 100644 --- a/official/projects/edgetpu/vision/modeling/heads/bifpn_head.py +++ b/official/projects/edgetpu/vision/modeling/heads/bifpn_head.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -670,7 +670,7 @@ class SegClassNet(tf.keras.layers.Layer): self.min_level = min_level self.max_level = max_level self.fullres_output = fullres_output - self.fullres_conv_transpose = fullres_skip_connections + self.fullres_skip_connections = fullres_skip_connections self.fnode = FNode( 0, # Always use the first level with highest resolution. 
@@ -726,7 +726,7 @@ class SegClassNet(tf.keras.layers.Layer): if self.fullres_output: for i in reversed(range(self.min_level)): - if self.config.fullres_skip_connections: + if self.fullres_skip_connections: net = tf.keras.layers.Concatenate()([net, backbone_feats[i + 1]]) net = self.fullres_conv[str(i)](net) net = self.fullres_conv_transpose[str(i)](net) diff --git a/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v1_model.py b/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v1_model.py index c3b4a18cbe13da34fbdeeccb7985e1bfe0d5a1b9..fa3f36cc55aff3c8ab82da242a5be12d12cdf615 100644 --- a/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v1_model.py +++ b/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v1_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v1_model_blocks.py b/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v1_model_blocks.py index 7441db5b721a4a4f18333085bc0a42689e94b76a..29d93d3d92b758582add630fc3eb49d1951b2132 100644 --- a/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v1_model_blocks.py +++ b/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v1_model_blocks.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v1_model_test.py b/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v1_model_test.py index 775e56620bd1f1c0edda004c4fba5aef71cde6a2..a4ca070a908585349f2ccc86c3b5635da546e6cd 100644 --- a/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v1_model_test.py +++ b/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v1_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model.py b/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model.py index 4cb34b94a1abf2801374acd98349ecd50aac0b7e..9321cb47ec6d03068b29316a20382b3c0642ab24 100644 --- a/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model.py +++ b/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model_blocks.py b/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model_blocks.py index c5aa89d0e81c6a2a3d2463b47323f87e3681e632..a66c72a7c1fc1ec6aa851e42924cf520b6e80f7c 100644 --- a/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model_blocks.py +++ b/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model_blocks.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -26,6 +26,8 @@ from official.modeling.hyperparams import oneof from official.projects.edgetpu.vision.modeling import common_modules from official.projects.edgetpu.vision.modeling import custom_layers +InitializerType = Optional[Union[str, tf.keras.initializers.Initializer]] + @dataclasses.dataclass class BlockType(oneof.OneOfConfig): @@ -216,6 +218,8 @@ class ModelConfig(base_config.Config): stem_base_filters: int = 64 stem_kernel_size: int = 5 top_base_filters: int = 1280 + conv_kernel_initializer: InitializerType = None + dense_kernel_initializer: InitializerType = None blocks: Tuple[BlockConfig, ...] = ( # (input_filters, output_filters, kernel_size, num_repeat, # expand_ratio, strides, se_ratio, id_skip, fused_conv, conv_type) @@ -279,7 +283,8 @@ def mobilenet_edgetpu_v2_base( drop_connect_rate: float = 0.1, filter_size_overrides: Optional[Dict[int, int]] = None, block_op_overrides: Optional[Dict[int, Dict[int, Dict[str, Any]]]] = None, - block_group_overrides: Optional[Dict[int, Dict[str, Any]]] = None): + block_group_overrides: Optional[Dict[int, Dict[str, Any]]] = None, + topology: Optional[TopologyConfig] = None): """Creates MobilenetEdgeTPUV2 ModelConfig based on tuning parameters.""" config = ModelConfig() @@ -295,7 +300,7 @@ def mobilenet_edgetpu_v2_base( } config = config.replace(**param_overrides) - topology_config = TopologyConfig() + topology_config = TopologyConfig() if topology is None else topology if filter_size_overrides: for group_id in filter_size_overrides: topology_config.block_groups[group_id].filters = filter_size_overrides[ @@ -724,6 +729,7 @@ def conv2d_block_as_layers( use_bias: bool = False, activation: Any = None, depthwise: bool = False, + kernel_initializer: InitializerType = None, name: Optional[str] = None) -> List[tf.keras.layers.Layer]: """A conv2d followed by batch norm and an activation.""" 
batch_norm = common_modules.get_batch_norm(config.batch_norm) @@ -748,11 +754,13 @@ def conv2d_block_as_layers( sequential_layers: List[tf.keras.layers.Layer] = [] if depthwise: conv2d = tf.keras.layers.DepthwiseConv2D - init_kwargs.update({'depthwise_initializer': CONV_KERNEL_INITIALIZER}) + init_kwargs.update({'depthwise_initializer': kernel_initializer}) else: conv2d = tf.keras.layers.Conv2D - init_kwargs.update({'filters': conv_filters, - 'kernel_initializer': CONV_KERNEL_INITIALIZER}) + init_kwargs.update({ + 'filters': conv_filters, + 'kernel_initializer': kernel_initializer + }) sequential_layers.append(conv2d(**init_kwargs)) @@ -780,12 +788,21 @@ def conv2d_block(inputs: tf.Tensor, use_bias: bool = False, activation: Any = None, depthwise: bool = False, + kernel_initializer: Optional[InitializerType] = None, name: Optional[str] = None) -> tf.Tensor: """Compatibility with third_party/car/deep_nets.""" x = inputs - for layer in conv2d_block_as_layers(conv_filters, config, kernel_size, - strides, use_batch_norm, use_bias, - activation, depthwise, name): + for layer in conv2d_block_as_layers( + conv_filters=conv_filters, + config=config, + kernel_size=kernel_size, + strides=strides, + use_batch_norm=use_batch_norm, + use_bias=use_bias, + activation=activation, + depthwise=depthwise, + kernel_initializer=kernel_initializer, + name=name): x = layer(x) return x @@ -828,6 +845,9 @@ class _MbConvBlock: use_groupconv = block.conv_type == 'group' prefix = prefix or '' self.name = prefix + conv_kernel_initializer = ( + config.conv_kernel_initializer if config.conv_kernel_initializer + is not None else CONV_KERNEL_INITIALIZER) filters = block.input_filters * block.expand_ratio @@ -851,22 +871,26 @@ class _MbConvBlock: activation=activation, name=prefix + 'fused')) else: - self.expand_block.extend(conv2d_block_as_layers( - filters, - config, - kernel_size=block.kernel_size, - strides=block.strides, - activation=activation, - name=prefix + 'fused')) + 
self.expand_block.extend( + conv2d_block_as_layers( + conv_filters=filters, + config=config, + kernel_size=block.kernel_size, + strides=block.strides, + activation=activation, + kernel_initializer=conv_kernel_initializer, + name=prefix + 'fused')) else: if block.expand_ratio != 1: # Expansion phase with a pointwise conv - self.expand_block.extend(conv2d_block_as_layers( - filters, - config, - kernel_size=(1, 1), - activation=activation, - name=prefix + 'expand')) + self.expand_block.extend( + conv2d_block_as_layers( + conv_filters=filters, + config=config, + kernel_size=(1, 1), + activation=activation, + kernel_initializer=conv_kernel_initializer, + name=prefix + 'expand')) # Main kernel, after the expansion (if applicable, i.e. not fused). if use_depthwise: @@ -876,6 +900,7 @@ class _MbConvBlock: kernel_size=block.kernel_size, strides=block.strides, activation=activation, + kernel_initializer=conv_kernel_initializer, depthwise=True, name=prefix + 'depthwise')) elif use_groupconv: @@ -907,27 +932,30 @@ class _MbConvBlock: tf.keras.layers.Reshape(se_shape, name=prefix + 'se_reshape')) self.squeeze_excitation.extend( conv2d_block_as_layers( - num_reduced_filters, - config, + conv_filters=num_reduced_filters, + config=config, use_bias=True, use_batch_norm=False, activation=activation, + kernel_initializer=conv_kernel_initializer, name=prefix + 'se_reduce')) self.squeeze_excitation.extend( conv2d_block_as_layers( - filters, - config, + conv_filters=filters, + config=config, use_bias=True, use_batch_norm=False, activation='sigmoid', + kernel_initializer=conv_kernel_initializer, name=prefix + 'se_expand')) # Output phase self.project_block.extend( conv2d_block_as_layers( - block.output_filters, - config, + conv_filters=block.output_filters, + config=config, activation=None, + kernel_initializer=conv_kernel_initializer, name=prefix + 'project')) # Add identity so that quantization-aware training can insert quantization @@ -993,6 +1021,12 @@ def 
mobilenet_edgetpu_v2(image_input: tf.keras.layers.Input, activation = tf_utils.get_activation(config.activation) dropout_rate = config.dropout_rate drop_connect_rate = config.drop_connect_rate + conv_kernel_initializer = ( + config.conv_kernel_initializer if config.conv_kernel_initializer + is not None else CONV_KERNEL_INITIALIZER) + dense_kernel_initializer = ( + config.dense_kernel_initializer if config.dense_kernel_initializer + is not None else DENSE_KERNEL_INITIALIZER) num_classes = config.num_classes input_channels = config.input_channels rescale_input = config.rescale_input @@ -1010,12 +1044,13 @@ def mobilenet_edgetpu_v2(image_input: tf.keras.layers.Input, # Build stem x = conv2d_block( - x, - round_filters(stem_base_filters, config), - config, + inputs=x, + conv_filters=round_filters(stem_base_filters, config), + config=config, kernel_size=[stem_kernel_size, stem_kernel_size], strides=[2, 2], activation=activation, + kernel_initializer=conv_kernel_initializer, name='stem') # Build blocks @@ -1061,11 +1096,13 @@ def mobilenet_edgetpu_v2(image_input: tf.keras.layers.Input, if config.backbone_only: return backbone_levels # Build top - x = conv2d_block(x, - round_filters(top_base_filters, config), - config, - activation=activation, - name='top') + x = conv2d_block( + inputs=x, + conv_filters=round_filters(top_base_filters, config), + config=config, + activation=activation, + kernel_initializer=conv_kernel_initializer, + name='top') # Build classifier pool_size = (x.shape.as_list()[1], x.shape.as_list()[2]) @@ -1075,7 +1112,7 @@ def mobilenet_edgetpu_v2(image_input: tf.keras.layers.Input, x = tf.keras.layers.Conv2D( num_classes, 1, - kernel_initializer=DENSE_KERNEL_INITIALIZER, + kernel_initializer=dense_kernel_initializer, kernel_regularizer=tf.keras.regularizers.l2(weight_decay), bias_regularizer=tf.keras.regularizers.l2(weight_decay), name='logits')( diff --git a/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model_blocks_test.py 
b/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model_blocks_test.py new file mode 100644 index 0000000000000000000000000000000000000000..1ad600399d1e38c8eb5311b3a8a91e9c14065452 --- /dev/null +++ b/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model_blocks_test.py @@ -0,0 +1,72 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for mobilenet_edgetpu_v2_model_blocks.""" + +import tensorflow as tf + +from official.projects.edgetpu.vision.modeling import custom_layers +from official.projects.edgetpu.vision.modeling import mobilenet_edgetpu_v2_model_blocks + + +class MobilenetEdgetpuV2ModelBlocksTest(tf.test.TestCase): + + def setUp(self): + super().setUp() + self.model_config = mobilenet_edgetpu_v2_model_blocks.ModelConfig() + + def test_model_creatation(self): + model_input = tf.keras.layers.Input(shape=(224, 224, 1)) + model_output = mobilenet_edgetpu_v2_model_blocks.mobilenet_edgetpu_v2( + image_input=model_input, + config=self.model_config) + test_model = tf.keras.Model(inputs=model_input, outputs=model_output) + self.assertIsInstance(test_model, tf.keras.Model) + self.assertEqual(test_model.input.shape, (None, 224, 224, 1)) + self.assertEqual(test_model.output.shape, (None, 1001)) + + def test_model_with_customized_kernel_initializer(self): + self.model_config.conv_kernel_initializer = 'he_uniform' + self.model_config.dense_kernel_initializer = 
'glorot_normal' + model_input = tf.keras.layers.Input(shape=(224, 224, 1)) + model_output = mobilenet_edgetpu_v2_model_blocks.mobilenet_edgetpu_v2( + image_input=model_input, + config=self.model_config) + test_model = tf.keras.Model(inputs=model_input, outputs=model_output) + + conv_layer_stack = [] + for layer in test_model.layers: + if (isinstance(layer, tf.keras.layers.Conv2D) or + isinstance(layer, tf.keras.layers.DepthwiseConv2D) or + isinstance(layer, custom_layers.GroupConv2D)): + conv_layer_stack.append(layer) + self.assertGreater(len(conv_layer_stack), 2) + # The last Conv layer is used as a Dense layer. + for layer in conv_layer_stack[:-1]: + if isinstance(layer, custom_layers.GroupConv2D): + self.assertIsInstance(layer.kernel_initializer, + tf.keras.initializers.GlorotUniform) + elif isinstance(layer, tf.keras.layers.Conv2D): + self.assertIsInstance(layer.kernel_initializer, + tf.keras.initializers.HeUniform) + elif isinstance(layer, tf.keras.layers.DepthwiseConv2D): + self.assertIsInstance(layer.depthwise_initializer, + tf.keras.initializers.HeUniform) + + self.assertIsInstance(conv_layer_stack[-1].kernel_initializer, + tf.keras.initializers.GlorotNormal) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model_test.py b/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model_test.py index 004ffeb79b382e132a80906306f2403bdd614390..7044d7d93e5642176cf298237f92754759fa10c4 100644 --- a/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model_test.py +++ b/official/projects/edgetpu/vision/modeling/mobilenet_edgetpu_v2_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/edgetpu/vision/modeling/optimized_multiheadattention_layer.py b/official/projects/edgetpu/vision/modeling/optimized_multiheadattention_layer.py new file mode 100644 index 0000000000000000000000000000000000000000..c8f80fd8216b0fd5292a22452d43e71ccb7525bc --- /dev/null +++ b/official/projects/edgetpu/vision/modeling/optimized_multiheadattention_layer.py @@ -0,0 +1,164 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""MultiHeadAttention layer optimized for EdgeTPU. + +Compared to tf.keras.layers.MultiHeadAttention, this layer performs query-key +multiplication instead of key-query multiplication to remove an unnecessary +transpose. +""" +import math +import string +from typing import Optional, Tuple + +import numpy as np +import tensorflow as tf + +_CHR_IDX = string.ascii_lowercase + + +def _build_attention_equation( + rank: int, attn_axes: Tuple[int, ...]) -> Tuple[str, str, int]: + """Builds einsum equations for the attention computation. + + Query, key, value inputs after projection are expected to have the shape as: + `(bs, <non-attention dims>, <attention dims>, num_heads, channels)`. + `bs` and `<non-attention dims>` are treated as `<batch dims>`.
+ + The attention operations can be generalized: + (1) Query-key dot product: + `(<batch dims>, <query attention dims>, num_heads, channels), (<batch dims>, + <key attention dims>, num_heads, channels) -> (<batch dims>, + num_heads, <query attention dims>, <key attention dims>)` + (2) Combination: + `(<batch dims>, num_heads, <query attention dims>, <key attention dims>), + (<batch dims>, <value attention dims>, num_heads, channels) -> (<batch dims>, <query attention dims>, num_heads, channels)` + + Args: + rank: Rank of query, key, value tensors. + attn_axes: List/tuple of axes, `[-1, rank)`, that attention will be + applied to. + + Returns: + Einsum equations. + """ + target_notation = _CHR_IDX[:rank] + # `batch_dims` includes the head dim. + batch_dims = tuple(np.delete(range(rank), attn_axes + (rank - 1,))) + letter_offset = rank + source_notation = "" + for i in range(rank): + if i in batch_dims or i == rank - 1: + source_notation += target_notation[i] + else: + source_notation += _CHR_IDX[letter_offset] + letter_offset += 1 + + product_notation = "".join([target_notation[i] for i in batch_dims] + + [target_notation[i] for i in attn_axes] + + [source_notation[i] for i in attn_axes]) + dot_product_equation = "%s,%s->%s" % ( + target_notation, + source_notation, + product_notation, + ) + attn_scores_rank = len(product_notation) + combine_equation = "%s,%s->%s" % ( + product_notation, + source_notation, + target_notation, + ) + return dot_product_equation, combine_equation, attn_scores_rank + + +class OptimizedMultiHeadAttention(tf.keras.layers.MultiHeadAttention): + """MultiHeadAttention with query-key multiplication. + + Currently, this layer only works for self-attention but not for + cross-attention. TODO(b/243166060). + """ + + def _build_attention(self, rank: int) -> None: + """Builds multi-head dot-product attention computations. + + This function builds attributes necessary for `_compute_attention` to + customize attention computation to replace the default dot-product + attention. + + Args: + rank: the rank of query, key, value tensors.
+ """ + if self._attention_axes is None: + self._attention_axes = tuple(range(1, rank - 2)) + else: + self._attention_axes = tuple(self._attention_axes) + ( + self._dot_product_equation, + self._combine_equation, + attn_scores_rank, + ) = _build_attention_equation( + rank, attn_axes=self._attention_axes) + norm_axes = tuple( + range(attn_scores_rank - len(self._attention_axes), attn_scores_rank)) + self._softmax = tf.keras.layers.Softmax(axis=norm_axes) + self._dropout_layer = tf.keras.layers.Dropout(rate=self._dropout) + + def _compute_attention( + self, + query: tf.Tensor, + key: tf.Tensor, + value: tf.Tensor, + attention_mask: Optional[tf.Tensor] = None, + training: Optional[bool] = None) -> Tuple[tf.Tensor, tf.Tensor]: + """Applies Dot-product attention with query, key, value tensors. + + This function defines the computation inside `call` with projected + multi-head Q, K, V inputs. Users can override this function for + customized attention implementation. + + Args: + query: Projected query `Tensor` of shape `(B, T, N, key_dim)`. + key: Projected key `Tensor` of shape `(B, S, N, key_dim)`. + value: Projected value `Tensor` of shape `(B, S, N, value_dim)`. + attention_mask: a boolean mask of shape `(B, T, S)`, that prevents + attention to certain positions. It is generally not needed if the + `query` and `value` (and/or `key`) are masked. + training: Python boolean indicating whether the layer should behave in + training mode (adding dropout) or in inference mode (doing nothing). + + Returns: + attention_output: Multi-headed outputs of attention computation. + attention_scores: Multi-headed attention weights. + """ + # Note: Applying scalar multiply at the smaller end of einsum improves + # XLA performance, but may introduce slight numeric differences in + # the Transformer attention head. + query = tf.multiply(query, 1.0 / math.sqrt(float(self._key_dim))) + + # Take the dot product between "query" and "key" to get the raw + # attention scores. 
+ attention_scores = tf.einsum(self._dot_product_equation, query, key) + + attention_scores = self._masked_softmax(attention_scores, attention_mask) + + # This is actually dropping out entire tokens to attend to, which might + # seem a bit unusual, but is taken from the original Transformer paper. + attention_scores_dropout = self._dropout_layer( + attention_scores, training=training) + + # `context_layer` = [B, T, N, H] + attention_output = tf.einsum(self._combine_equation, + attention_scores_dropout, value) + return attention_output, attention_scores diff --git a/official/projects/edgetpu/vision/modeling/optimized_multiheadattention_layer_test.py b/official/projects/edgetpu/vision/modeling/optimized_multiheadattention_layer_test.py new file mode 100644 index 0000000000000000000000000000000000000000..4d4ca514867fbfd318f4ef9ec3f945fa853bf9d3 --- /dev/null +++ b/official/projects/edgetpu/vision/modeling/optimized_multiheadattention_layer_test.py @@ -0,0 +1,81 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for optimized_multiheadattention_layer.""" + +import numpy as np +import tensorflow as tf + +from official.projects.edgetpu.vision.modeling import optimized_multiheadattention_layer + +_BATCH_SIZE = 32 +_SEQ_LEN = 4 +_EMBEDDING_SIZE = 8 +_NUM_HEADS = 2 +_KEY_DIM = 2 + + +class OptimizedMultiheadattentionLayerTest(tf.test.TestCase): + + def test_same_output(self): + """Tests that OptimizedMultiHeadAttention returns the expected outputs.""" + + input_tensor_1 = tf.random.uniform((_BATCH_SIZE, _SEQ_LEN, _EMBEDDING_SIZE)) + input_tensor_2 = tf.random.uniform((_BATCH_SIZE, _SEQ_LEN, _EMBEDDING_SIZE)) + + # Instantiate layer and call with inputs to build. + orig_layer = tf.keras.layers.MultiHeadAttention( + num_heads=_NUM_HEADS, key_dim=_KEY_DIM) + _ = orig_layer(input_tensor_1, input_tensor_2) + opt_layer = optimized_multiheadattention_layer.OptimizedMultiHeadAttention( + num_heads=_NUM_HEADS, key_dim=_KEY_DIM) + _ = opt_layer(input_tensor_1, input_tensor_2) + + # Set the weights of the two layers to be the same. 
+ query_dense_weights = np.random.uniform( + size=(_EMBEDDING_SIZE, _NUM_HEADS, _KEY_DIM)) + query_dense_bias = np.random.uniform(size=(_NUM_HEADS, _KEY_DIM)) + key_dense_weights = np.random.uniform( + size=(_EMBEDDING_SIZE, _NUM_HEADS, _KEY_DIM)) + key_dense_bias = np.random.uniform(size=(_NUM_HEADS, _KEY_DIM)) + value_dense_weights = np.random.uniform( + size=(_EMBEDDING_SIZE, _NUM_HEADS, _KEY_DIM)) + value_dense_bias = np.random.uniform(size=(_NUM_HEADS, _KEY_DIM)) + attention_output_dense_weights = np.random.uniform( + size=(_NUM_HEADS, _KEY_DIM, _EMBEDDING_SIZE)) + attention_output_dense_bias = np.random.uniform(size=(_EMBEDDING_SIZE,)) + + orig_layer._query_dense.set_weights([query_dense_weights, query_dense_bias]) + orig_layer._key_dense.set_weights([key_dense_weights, key_dense_bias]) + orig_layer._value_dense.set_weights([value_dense_weights, value_dense_bias]) + orig_layer._output_dense.set_weights( + [attention_output_dense_weights, attention_output_dense_bias]) + + opt_layer._query_dense.set_weights([query_dense_weights, query_dense_bias]) + opt_layer._key_dense.set_weights([key_dense_weights, key_dense_bias]) + opt_layer._value_dense.set_weights([value_dense_weights, value_dense_bias]) + opt_layer._output_dense.set_weights( + [attention_output_dense_weights, attention_output_dense_bias]) + + # Calculate two sets of attention outputs and scores and compare. 
+ orig_attn_output, orig_attn_score = orig_layer( + input_tensor_1, input_tensor_2, return_attention_scores=True) + opt_attn_output, opt_attn_score = opt_layer( + input_tensor_1, input_tensor_2, return_attention_scores=True) + self.assertAllClose(orig_attn_output, opt_attn_output) + self.assertAllClose(orig_attn_score, opt_attn_score) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/edgetpu/vision/serving/__init__.py b/official/projects/edgetpu/vision/serving/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/edgetpu/vision/serving/__init__.py +++ b/official/projects/edgetpu/vision/serving/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/serving/export_tflite.py b/official/projects/edgetpu/vision/serving/export_tflite.py index 3014329e36e4ce547729c97d455eea69d2ac9e76..775b55f4ba9b27e39bddf844676fc0c5590cee7d 100644 --- a/official/projects/edgetpu/vision/serving/export_tflite.py +++ b/official/projects/edgetpu/vision/serving/export_tflite.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -60,6 +60,8 @@ flags.DEFINE_integer( 'image_size', 224, 'Size of the input image. Ideally should be the same as the image_size used ' 'in training config.') +flags.DEFINE_bool( + 'fix_batch_size', True, 'Whether to export the model with a fixed batch size of 1.') flags.DEFINE_string( 'output_layer', None, 'Layer name to take the output from.
Can be used to take the output from ' @@ -146,9 +148,11 @@ def run_export(): output_layer = model.get_layer(export_config.output_layer) model = tf.keras.Model(model.input, output_layer.output) + batch_size = 1 if FLAGS.fix_batch_size else None + model_input = tf.keras.Input( shape=(export_config.image_size, export_config.image_size, 3), - batch_size=1) + batch_size=batch_size) model_output = export_util.finalize_serving(model(model_input), export_config) model_for_inference = tf.keras.Model(model_input, model_output) diff --git a/official/projects/edgetpu/vision/serving/export_tflite_test.py b/official/projects/edgetpu/vision/serving/export_tflite_test.py index 179212c5b5f8b5906b2fb505ab32de98882c6034..6a0ae90629c0d6078cab9ae1ac4584a78298bac1 100644 --- a/official/projects/edgetpu/vision/serving/export_tflite_test.py +++ b/official/projects/edgetpu/vision/serving/export_tflite_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/serving/export_util.py b/official/projects/edgetpu/vision/serving/export_util.py index a98b149820a24894bf23cdece70cef42ff7885c8..5b208a40f403f796a1c5991069fcb896dc9fc4a3 100644 --- a/official/projects/edgetpu/vision/serving/export_util.py +++ b/official/projects/edgetpu/vision/serving/export_util.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -31,7 +31,7 @@ from official.projects.edgetpu.vision.modeling import custom_layers from official.projects.edgetpu.vision.modeling.backbones import mobilenet_edgetpu from official.projects.edgetpu.vision.tasks import image_classification from official.projects.edgetpu.vision.tasks import semantic_segmentation as edgetpu_semantic_segmentation -from official.vision.beta.tasks import semantic_segmentation +from official.vision.tasks import semantic_segmentation # pylint: enable=unused-import MEAN_RGB = [127.5, 127.5, 127.5] @@ -107,6 +107,12 @@ class ExportConfig(base_config.Config): def finalize_serving(model_output, export_config): """Adds extra layers based on the provided configuration.""" + if isinstance(model_output, dict): + return { + key: finalize_serving(model_output[key], export_config) + for key in model_output + } + finalize_method = export_config.finalize_method output_layer = model_output if not finalize_method or finalize_method[0] == 'none': @@ -183,8 +189,7 @@ def representative_dataset_gen(export_config): """Gets a python generator of numpy arrays for the given dataset.""" quantization_config = export_config.quantization_config dataset = tfds.builder( - quantization_config.dataset_name, - data_dir=quantization_config.dataset_dir) + quantization_config.dataset_name, try_gcs=True) dataset.download_and_prepare() data = dataset.as_dataset()[quantization_config.dataset_split] iterator = data.as_numpy_iterator() @@ -201,7 +206,8 @@ def configure_tflite_converter(export_config, converter): """Common code for picking up quantization parameters.""" quantization_config = export_config.quantization_config if quantization_config.quantize: - if quantization_config.dataset_dir is None: + if (quantization_config.dataset_dir is + None) and (quantization_config.dataset_name is None): raise ValueError( 'Must provide a representative dataset when quantizing the model.') converter.optimizations = [tf.lite.Optimize.DEFAULT] diff --git 
a/official/projects/edgetpu/vision/serving/tflite_imagenet_evaluator.py b/official/projects/edgetpu/vision/serving/tflite_imagenet_evaluator.py index b28adec43229a755bbff3cf24a9971b953f80d77..c4afb000b018255824dfe401f3d4ded9f248bacd 100644 --- a/official/projects/edgetpu/vision/serving/tflite_imagenet_evaluator.py +++ b/official/projects/edgetpu/vision/serving/tflite_imagenet_evaluator.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/serving/tflite_imagenet_evaluator_run.py b/official/projects/edgetpu/vision/serving/tflite_imagenet_evaluator_run.py index 5a8981da05c33b09a840d665c08fbf66dfc4f81b..f74f90ac2fb2e526af9aaa3e90e4f1fcd69f0437 100644 --- a/official/projects/edgetpu/vision/serving/tflite_imagenet_evaluator_run.py +++ b/official/projects/edgetpu/vision/serving/tflite_imagenet_evaluator_run.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/serving/tflite_imagenet_evaluator_test.py b/official/projects/edgetpu/vision/serving/tflite_imagenet_evaluator_test.py index f531069000911cfff0b427e2022ee59981940aee..3fcaffa453701c0b2910b3db1f33a13df3969a17 100644 --- a/official/projects/edgetpu/vision/serving/tflite_imagenet_evaluator_test.py +++ b/official/projects/edgetpu/vision/serving/tflite_imagenet_evaluator_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/tasks/__init__.py b/official/projects/edgetpu/vision/tasks/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/edgetpu/vision/tasks/__init__.py +++ b/official/projects/edgetpu/vision/tasks/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/edgetpu/vision/tasks/image_classification.py b/official/projects/edgetpu/vision/tasks/image_classification.py index cdb651a61b655daf4569b6a5efe8f4991dc828c1..6559368a2176c8acd5d2510504599635d89f9e2d 100644 --- a/official/projects/edgetpu/vision/tasks/image_classification.py +++ b/official/projects/edgetpu/vision/tasks/image_classification.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -28,8 +28,8 @@ from official.projects.edgetpu.vision.configs import mobilenet_edgetpu_config as from official.projects.edgetpu.vision.dataloaders import classification_input from official.projects.edgetpu.vision.modeling import mobilenet_edgetpu_v1_model from official.projects.edgetpu.vision.modeling import mobilenet_edgetpu_v2_model -from official.vision.beta.configs import image_classification as base_cfg -from official.vision.beta.dataloaders import input_reader_factory +from official.vision.configs import image_classification as base_cfg +from official.vision.dataloaders import input_reader_factory def _copy_recursively(src: str, dst: str) -> None: @@ -265,7 +265,7 @@ class EdgeTPUTask(base_task.Task): """Does forward and backward. Args: - inputs: A tuple of of input tensors of (features, labels). + inputs: A tuple of input tensors of (features, labels). model: A tf.keras.Model instance. optimizer: The optimizer for this training step. metrics: A nested structure of metrics objects. @@ -319,7 +319,7 @@ class EdgeTPUTask(base_task.Task): """Runs validatation step. Args: - inputs: A tuple of of input tensors of (features, labels). + inputs: A tuple of input tensors of (features, labels). model: A tf.keras.Model instance. metrics: A nested structure of metrics objects. diff --git a/official/projects/edgetpu/vision/tasks/image_classification_test.py b/official/projects/edgetpu/vision/tasks/image_classification_test.py index 8916fc92cad9d79bfe5225350fa6950afc18dd86..be250d9d405d8b6f6feac9202520c0b2b78a3a25 100644 --- a/official/projects/edgetpu/vision/tasks/image_classification_test.py +++ b/official/projects/edgetpu/vision/tasks/image_classification_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for image classification task.""" # pylint: disable=unused-import @@ -20,11 +19,11 @@ from absl.testing import parameterized import orbit import tensorflow as tf -from official.common import registry_imports from official.core import exp_factory from official.modeling import optimization from official.projects.edgetpu.vision.configs import mobilenet_edgetpu_config from official.projects.edgetpu.vision.tasks import image_classification +from official.vision import registry_imports # Dummy ImageNet TF dataset. diff --git a/official/projects/edgetpu/vision/tasks/semantic_segmentation.py b/official/projects/edgetpu/vision/tasks/semantic_segmentation.py index 28477f1bdff4b36b9499808d5d9b0fb3069862c4..d5cac8120fa85fcaaabdb9dd7e7e76426589c4a9 100644 --- a/official/projects/edgetpu/vision/tasks/semantic_segmentation.py +++ b/official/projects/edgetpu/vision/tasks/semantic_segmentation.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -27,11 +27,11 @@ from official.projects.edgetpu.vision.modeling import mobilenet_edgetpu_v1_model from official.projects.edgetpu.vision.modeling import mobilenet_edgetpu_v2_model from official.projects.edgetpu.vision.modeling.backbones import mobilenet_edgetpu # pylint: disable=unused-import from official.projects.edgetpu.vision.modeling.heads import bifpn_head -from official.vision.beta.dataloaders import input_reader_factory -from official.vision.beta.dataloaders import segmentation_input -from official.vision.beta.dataloaders import tfds_factory -from official.vision.beta.ops import preprocess_ops -from official.vision.beta.tasks import semantic_segmentation +from official.vision.dataloaders import input_reader_factory +from official.vision.dataloaders import segmentation_input +from official.vision.dataloaders import tfds_factory +from official.vision.ops import preprocess_ops +from official.vision.tasks import semantic_segmentation class ClassMappingParser(segmentation_input.Parser): diff --git a/official/projects/edgetpu/vision/tasks/semantic_segmentation_test.py b/official/projects/edgetpu/vision/tasks/semantic_segmentation_test.py index c3c637c02c92a62982d59357b3565ba40c691708..d12eb8dcdcd369370d456ce56816a4e9a52ec7de 100644 --- a/official/projects/edgetpu/vision/tasks/semantic_segmentation_test.py +++ b/official/projects/edgetpu/vision/tasks/semantic_segmentation_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Tests for semantic segmentation task.""" # pylint: disable=unused-import @@ -20,12 +19,12 @@ from absl.testing import parameterized import orbit import tensorflow as tf +from official import vision from official.core import exp_factory from official.modeling import optimization from official.projects.edgetpu.vision.configs import semantic_segmentation_config as seg_cfg from official.projects.edgetpu.vision.configs import semantic_segmentation_searched_config as autoseg_cfg from official.projects.edgetpu.vision.tasks import semantic_segmentation as img_seg_task -from official.vision import beta # Dummy ADE20K TF dataset. diff --git a/official/projects/edgetpu/vision/train.py b/official/projects/edgetpu/vision/train.py index 3b4a432e02a6ac71ac1935eee827c7902c628ea6..d08da93810d1274e62d9976e14440d9e75242c25 100644 --- a/official/projects/edgetpu/vision/train.py +++ b/official/projects/edgetpu/vision/train.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,16 +12,12 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """TensorFlow Model Garden Vision training for MobileNet-EdgeTPU.""" from absl import app from absl import flags import gin -# pylint: disable=unused-import -from official.common import registry_imports -# pylint: enable=unused-import from official.common import distribute_utils from official.common import flags as tfm_flags from official.core import task_factory @@ -35,6 +31,7 @@ from official.projects.edgetpu.vision.configs import semantic_segmentation_searc from official.projects.edgetpu.vision.modeling.backbones import mobilenet_edgetpu from official.projects.edgetpu.vision.tasks import image_classification from official.projects.edgetpu.vision.tasks import semantic_segmentation +from official.vision import registry_imports # pylint: enable=unused-import FLAGS = flags.FLAGS diff --git a/official/projects/labse/README.md b/official/projects/labse/README.md new file mode 100644 index 0000000000000000000000000000000000000000..bb6abcca16d8c27e2376e7c93dfad24abd2a0115 --- /dev/null +++ b/official/projects/labse/README.md @@ -0,0 +1,111 @@ +# Language-agnostic BERT Sentence Embedding + +The repository contains the implementation and experiment definition of `LaBSE`, +[Language-agnostic BERT Sentence Embedding](https://arxiv.org/pdf/2007.01852.pdf). +The implementation is provided by the paper author, Yinfei Yang. Note that +the cross-accelerator batch softmax is not implemented yet, so this +implementation does not fully reproduce the paper. + +Due to the data policy, the authors are not able to release the pre-training and +fine-tuning data for `LaBSE` training. + +## Requirements + +The starter code requires TensorFlow. If you haven't installed it yet, follow +the instructions on [tensorflow.org][1]. +This code has been tested with TensorFlow 2.8.0. Going forward, +we will continue to target the latest released version of TensorFlow.
+ +Please verify that you have Python 3.7+ and TensorFlow 2.8.0 or higher +installed by running the following commands: + +```sh +python --version +python -c 'import tensorflow as tf; print(tf.__version__)' +``` + +Refer to [these instructions][2] +for running the models in this repo. Make sure to add the models folder to your +Python path. + +[1]: https://www.tensorflow.org/install/ +[2]: +https://github.com/tensorflow/models/tree/master/official#running-the-models + +## Data + +The pre-training data should be multilingual and use the same format as BERT +pre-training. + +The fine-tuning data uses the following format: + +```text +{ # (tensorflow.Example) + features: { + feature: { + key : "src_raw" + value: { + bytes_list: { + value: [ "Foo. " ] + } + } + } + feature: { + key : "tgt_raw" + value: { + bytes_list: { + value: [ "Bar. " ] + } + } + } + } +} +``` + +## Train using the config file + +After you have generated your pre-training data, run the following command to start +pre-training: + +```bash +TPU=local +VOCAB=??? +INIT_CHECKPOINT=??? +PARAMS="task.train_data.input_path=/path/to/train/data" +PARAMS="${PARAMS},task.train_data.vocab_file=${VOCAB}" +PARAMS="${PARAMS},task.validation_data.input_path=/path/to/validation/data" +PARAMS="${PARAMS},task.validation_data.vocab_file=${VOCAB}" +PARAMS="${PARAMS},task.init_checkpoint=${INIT_CHECKPOINT}" +PARAMS="${PARAMS},runtime.distribution_strategy=tpu" + +python3 train.py \ + --experiment=labse/train \ + --config_file=./experiments/labse_bert_base.yaml \ + --config_file=./experiments/labse_base.yaml \ + --params_override=${PARAMS} \ + --tpu=${TPU} \ + --model_dir=/folder/to/hold/logs/and/models/ \ + --mode=train_and_eval +``` + +## Implementation + +We implement the encoder and layers using `tf.keras` APIs in the NLP +modeling library: + + * [dual_encoder.py](https://github.com/tensorflow/models/blob/master/official/nlp/tasks/dual_encoder.py) + contains the dual-encoder task used for LaBSE training.
+ + * [config_labse.py](https://github.com/tensorflow/models/blob/master/official/projects/labse/config_labse.py) + registers the LaBSE training experiment. + + * [train.py](https://github.com/tensorflow/models/blob/master/official/projects/labse/train.py) + is the program entry point. + + +## Pre-trained model through TF Hub + +If you are looking for pre-trained models, please check out: +https://tfhub.dev/google/LaBSE/2. +The TF Hub `SavedModel`s are exported with `export_tfhub.py` in +this repository. diff --git a/official/projects/labse/config_labse.py b/official/projects/labse/config_labse.py new file mode 100644 index 0000000000000000000000000000000000000000..4dba0e32a03c150b324a01bb4f8df217f6908ebe --- /dev/null +++ b/official/projects/labse/config_labse.py @@ -0,0 +1,68 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+ +# pylint: disable=g-doc-return-or-yield,line-too-long +"""LaBSE configurations.""" +import dataclasses +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import optimization +from official.nlp.data import dual_encoder_dataloader +from official.nlp.tasks import dual_encoder + +AdamWeightDecay = optimization.AdamWeightDecayConfig +PolynomialLr = optimization.PolynomialLrConfig +PolynomialWarmupConfig = optimization.PolynomialWarmupConfig + + +@dataclasses.dataclass +class LaBSEOptimizationConfig(optimization.OptimizationConfig): + """LaBSE optimization config.""" + optimizer: optimization.OptimizerConfig = optimization.OptimizerConfig( + type="adamw", adamw=AdamWeightDecay()) + learning_rate: optimization.LrConfig = optimization.LrConfig( + type="polynomial", + polynomial=PolynomialLr( + initial_learning_rate=1e-4, + decay_steps=1000000, + end_learning_rate=0.0)) + warmup: optimization.WarmupConfig = optimization.WarmupConfig( + type="polynomial", polynomial=PolynomialWarmupConfig(warmup_steps=10000)) + + +@exp_factory.register_config_factory("labse/train") +def labse_train() -> cfg.ExperimentConfig: + r"""Language-agnostic BERT sentence embedding. + + *Note*: this experiment does not use cross-accelerator global softmax, so it + does not reproduce the exact LaBSE training.
+ """ + config = cfg.ExperimentConfig( + task=dual_encoder.DualEncoderConfig( + train_data=dual_encoder_dataloader.DualEncoderDataConfig(), + validation_data=dual_encoder_dataloader.DualEncoderDataConfig( + is_training=False, drop_remainder=False)), + trainer=cfg.TrainerConfig( + optimizer_config=LaBSEOptimizationConfig( + learning_rate=optimization.LrConfig( + type="polynomial", + polynomial=PolynomialLr( + initial_learning_rate=3e-5, end_learning_rate=0.0)), + warmup=optimization.WarmupConfig( + type="polynomial", polynomial=PolynomialWarmupConfig()))), + restrictions=[ + "task.train_data.is_training != None", + "task.validation_data.is_training != None" + ]) + return config diff --git a/official/projects/labse/experiments/labse_base.yaml b/official/projects/labse/experiments/labse_base.yaml new file mode 100644 index 0000000000000000000000000000000000000000..a2487fd12580478b8854a1e422f1b569752aaeaf --- /dev/null +++ b/official/projects/labse/experiments/labse_base.yaml @@ -0,0 +1,85 @@ +task: + hub_module_url: '' + model: + bidirectional: true + max_sequence_length: 32 + logit_scale: 100 + logit_margin: 0.3 + init_checkpoint: 'the pre-trained BERT checkpoint using the labse vocab.' 
+ train_data: + drop_remainder: true + global_batch_size: 4096 + input_path: 'the path to train partition' + left_text_fields: ['src_raw'] + right_text_fields: ['tgt_raw'] + vocab_file: 'the path to vocab.txt' + lower_case: false + is_training: true + seq_length: 32 + sharding: false + cycle_length: 4 + shuffle_buffer_size: 1000 + tfds_as_supervised: false + tfds_data_dir: '' + tfds_name: '' + tfds_skip_decoding_feature: '' + tfds_split: '' + validation_data: + block_length: 1 + cache: false + cycle_length: 4 + drop_remainder: false + global_batch_size: 32000 + input_path: 'the path to validation partition' + left_text_fields: ['src_raw'] + right_text_fields: ['tgt_raw'] + vocab_file: 'the path to vocab.txt' + lower_case: false + is_training: false + seq_length: 32 + sharding: true + shuffle_buffer_size: 1000 + tfds_as_supervised: false + tfds_data_dir: '' + tfds_name: '' + tfds_skip_decoding_feature: '' + tfds_split: '' +trainer: + checkpoint_interval: 1000 + eval_tf_function: true + max_to_keep: 5 + optimizer_config: + learning_rate: + polynomial: + cycle: false + decay_steps: 500000 + end_learning_rate: 0.0 + initial_learning_rate: 1.0e-04 + name: PolynomialDecay + power: 1.0 + type: polynomial + optimizer: + adamw: + amsgrad: false + beta_1: 0.9 + beta_2: 0.999 + epsilon: 1.0e-05 + exclude_from_weight_decay: null + include_in_weight_decay: null + name: AdamWeightDecay + weight_decay_rate: 0.0 + gradient_clip_norm: 100 + type: adamw + warmup: + polynomial: + name: polynomial + power: 1 + warmup_steps: 5000 + type: polynomial + steps_per_loop: 1000 + summary_interval: 1000 + train_tf_function: true + train_tf_while_loop: true + train_steps: 500000 + validation_interval: 1000 + validation_steps: 100 diff --git a/official/projects/labse/experiments/labse_bert_base.yaml b/official/projects/labse/experiments/labse_bert_base.yaml new file mode 100644 index 0000000000000000000000000000000000000000..bd292cdf5c5b1ef1e2e67432e514fe63d4de2f17 --- /dev/null +++ 
b/official/projects/labse/experiments/labse_bert_base.yaml @@ -0,0 +1,15 @@ +task: + model: + encoder: + bert: + attention_dropout_rate: 0.1 + dropout_rate: 0.1 + hidden_activation: gelu + hidden_size: 768 + initializer_range: 0.02 + intermediate_size: 3072 + max_position_embeddings: 512 + num_attention_heads: 12 + num_layers: 12 + type_vocab_size: 2 + vocab_size: 501153 diff --git a/official/projects/labse/export_tfhub.py b/official/projects/labse/export_tfhub.py new file mode 100644 index 0000000000000000000000000000000000000000..6adb53c79840a0b05dd2130cd2529b728a503856 --- /dev/null +++ b/official/projects/labse/export_tfhub.py @@ -0,0 +1,161 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Exports the LaBSE model and its preprocessing as SavedModels for TF Hub. + +Example usage: +# Point this variable to your training results. +# Note that flag --do_lower_case is inferred from the name. +LaBSE_DIR= +# Step 1: export the core LaBSE model. +python3 ./export_tfhub.py \ + --bert_config_file ${LaBSE_DIR:?}/bert_config.json \ + --model_checkpoint_path ${LaBSE_DIR:?}/labse_model.ckpt \ + --vocab_file ${LaBSE_DIR:?}/vocab.txt \ + --export_type model --export_path /tmp/labse_model +# Step 2: export matching preprocessing (be sure to use same flags). 
+python3 ./export_tfhub.py \ + --vocab_file ${LaBSE_DIR:?}/vocab.txt \ + --export_type preprocessing --export_path /tmp/labse_preprocessing +""" + +from typing import Text + +from absl import app +from absl import flags +from absl import logging +import tensorflow as tf + +from official.legacy.bert import bert_models +from official.legacy.bert import configs +from official.nlp.modeling import models +from official.nlp.tasks import utils +from official.nlp.tools import export_tfhub_lib + +FLAGS = flags.FLAGS + +flags.DEFINE_enum("export_type", "model", ["model", "preprocessing"], + "The type of model to export") +flags.DEFINE_string("export_path", None, "TF-Hub SavedModel destination path.") +flags.DEFINE_string( + "bert_tfhub_module", None, + "Bert tfhub module to define core bert layers. Needed for --export_type " + "model.") +flags.DEFINE_string( + "bert_config_file", None, + "Bert configuration file to define core bert layers. It will not be used " + "if bert_tfhub_module is set. Needed for --export_type model.") +flags.DEFINE_string( + "model_checkpoint_path", None, "File path to TF model checkpoint. " + "Needed for --export_type model.") +flags.DEFINE_string( + "vocab_file", None, + "The vocabulary file that the BERT model was trained on. " + "Needed for both --export_type model and preprocessing.") +flags.DEFINE_bool( + "do_lower_case", None, + "Whether to lowercase before tokenization. If left as None, " + "do_lower_case will be enabled if 'uncased' appears in the " + "name of --vocab_file. " + "Needed for both --export_type model and preprocessing.") +flags.DEFINE_integer( + "default_seq_length", 128, + "The sequence length of preprocessing results from " + "top-level preprocess method. This is also the default " + "sequence length for the bert_pack_inputs subobject. "
+ "Needed for --export_type preprocessing.") +flags.DEFINE_bool( + "tokenize_with_offsets", False, # TODO(b/181866850) + "Whether to export a .tokenize_with_offsets subobject for " + "--export_type preprocessing.") +flags.DEFINE_bool( + "normalize", True, + "Parameter of the DualEncoder model: if True, normalizes the embedding " + "(pooled_output).") + + +def _get_do_lower_case(do_lower_case, vocab_file): + """Returns do_lower_case, replacing None by a guess from vocab file name.""" + if do_lower_case is None: + do_lower_case = "uncased" in vocab_file + logging.info("Using do_lower_case=%s based on name of vocab_file=%s", + do_lower_case, vocab_file) + return do_lower_case + + +def create_labse_model(bert_tfhub_module: Text, + bert_config: configs.BertConfig, + normalize: bool) -> tf.keras.Model: + """Creates a LaBSE Keras core model from a BERT configuration. + + Args: + bert_tfhub_module: The BERT TF-Hub module path. If it is not empty, the + LaBSE model is built on top of this module. + bert_config: A `BertConfig` used to create the core model when + bert_tfhub_module is empty. + normalize: Parameter of the DualEncoder model: if True, normalizes the + embedding (pooled_output). + + Returns: + A tuple of the LaBSE Keras model and its encoder network.
+ """ + if bert_tfhub_module: + encoder_network = utils.get_encoder_from_hub(bert_tfhub_module) + else: + encoder_network = bert_models.get_transformer_encoder( + bert_config, sequence_length=None) + + labse_model = models.DualEncoder( + network=encoder_network, + max_seq_length=None, + normalize=normalize, + output="predictions") + return labse_model, encoder_network # pytype: disable=bad-return-type # typed-keras + + +def export_labse_model(bert_tfhub_module: Text, bert_config: configs.BertConfig, + model_checkpoint_path: Text, hub_destination: Text, + vocab_file: Text, do_lower_case: bool, normalize: bool): + """Restores a tf.keras.Model and saves for TF-Hub.""" + core_model, encoder = create_labse_model( + bert_tfhub_module, bert_config, normalize) + checkpoint = tf.train.Checkpoint(encoder=encoder) + checkpoint.restore(model_checkpoint_path).assert_existing_objects_matched() + core_model.vocab_file = tf.saved_model.Asset(vocab_file) + core_model.do_lower_case = tf.Variable(do_lower_case, trainable=False) + core_model.save(hub_destination, include_optimizer=False, save_format="tf") + + +def main(_): + do_lower_case = export_tfhub_lib.get_do_lower_case(FLAGS.do_lower_case, + FLAGS.vocab_file) + if FLAGS.export_type == "model": + if FLAGS.bert_tfhub_module: + bert_config = None + else: + bert_config = configs.BertConfig.from_json_file(FLAGS.bert_config_file) + export_labse_model(FLAGS.bert_tfhub_module, bert_config, + FLAGS.model_checkpoint_path, FLAGS.export_path, + FLAGS.vocab_file, do_lower_case, FLAGS.normalize) + elif FLAGS.export_type == "preprocessing": + # LaBSE is still a BERT model, reuse the export_bert_preprocessing here. 
+ export_tfhub_lib.export_bert_preprocessing( + FLAGS.export_path, FLAGS.vocab_file, do_lower_case, + FLAGS.default_seq_length, FLAGS.tokenize_with_offsets) + else: + raise app.UsageError( + "Unknown value '%s' for flag --export_type" % FLAGS.export_type) + + +if __name__ == "__main__": + app.run(main) diff --git a/official/projects/labse/export_tfhub_test.py b/official/projects/labse/export_tfhub_test.py new file mode 100644 index 0000000000000000000000000000000000000000..f45c200441c22bc695a7559876f747054695b028 --- /dev/null +++ b/official/projects/labse/export_tfhub_test.py @@ -0,0 +1,111 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+ +"""Tests LaBSE's export_tfhub.""" + +import os + +# Import libraries +import numpy as np +import tensorflow as tf +import tensorflow_hub as hub +from official.legacy.bert import configs +from official.projects.labse import export_tfhub + + +class ExportModelTest(tf.test.TestCase): + + def test_export_model(self): + # Exports a savedmodel for TF-Hub + hidden_size = 16 + bert_config = configs.BertConfig( + vocab_size=100, + hidden_size=hidden_size, + intermediate_size=32, + max_position_embeddings=128, + num_attention_heads=2, + num_hidden_layers=1) + labse_model, encoder = export_tfhub.create_labse_model( + None, bert_config, normalize=True) + model_checkpoint_dir = os.path.join(self.get_temp_dir(), "checkpoint") + checkpoint = tf.train.Checkpoint(encoder=encoder) + checkpoint.save(os.path.join(model_checkpoint_dir, "test")) + model_checkpoint_path = tf.train.latest_checkpoint(model_checkpoint_dir) + + vocab_file = os.path.join(self.get_temp_dir(), "uncased_vocab.txt") + with tf.io.gfile.GFile(vocab_file, "w") as f: + f.write("dummy content") + + hub_destination = os.path.join(self.get_temp_dir(), "hub") + export_tfhub.export_labse_model( + None, # bert_tfhub_module + bert_config, + model_checkpoint_path, + hub_destination, + vocab_file, + do_lower_case=True, + normalize=True) + + # Restores a hub KerasLayer. + hub_layer = hub.KerasLayer(hub_destination, trainable=True) + + if hasattr(hub_layer, "resolved_object"): + # Checks meta attributes. + self.assertTrue(hub_layer.resolved_object.do_lower_case.numpy()) + with tf.io.gfile.GFile( + hub_layer.resolved_object.vocab_file.asset_path.numpy()) as f: + self.assertEqual("dummy content", f.read()) + # Checks the hub KerasLayer. 
+ for source_weight, hub_weight in zip(labse_model.trainable_weights, + hub_layer.trainable_weights): + self.assertAllClose(source_weight.numpy(), hub_weight.numpy()) + + seq_length = 10 + dummy_ids = np.zeros((2, seq_length), dtype=np.int32) + hub_outputs = hub_layer([dummy_ids, dummy_ids, dummy_ids]) + source_outputs = labse_model([dummy_ids, dummy_ids, dummy_ids]) + + self.assertEqual(hub_outputs["pooled_output"].shape, (2, hidden_size)) + self.assertEqual(hub_outputs["sequence_output"].shape, + (2, seq_length, hidden_size)) + for output_name in source_outputs: + self.assertAllClose(hub_outputs[output_name].numpy(), + source_outputs[output_name].numpy()) + + # Test that training=True makes a difference (activates dropout). + def _dropout_mean_stddev(training, num_runs=20): + input_ids = np.array([[14, 12, 42, 95, 99]], np.int32) + inputs = [input_ids, np.ones_like(input_ids), np.zeros_like(input_ids)] + outputs = np.concatenate([ + hub_layer(inputs, training=training)["pooled_output"] + for _ in range(num_runs) + ]) + return np.mean(np.std(outputs, axis=0)) + + self.assertLess(_dropout_mean_stddev(training=False), 1e-6) + self.assertGreater(_dropout_mean_stddev(training=True), 1e-3) + + # Test propagation of seq_length in shape inference.
+ input_word_ids = tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32) + input_mask = tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32) + input_type_ids = tf.keras.layers.Input(shape=(seq_length,), dtype=tf.int32) + outputs = hub_layer([input_word_ids, input_mask, input_type_ids]) + self.assertEqual(outputs["pooled_output"].shape.as_list(), + [None, hidden_size]) + self.assertEqual(outputs["sequence_output"].shape.as_list(), + [None, seq_length, hidden_size]) + + +if __name__ == "__main__": + tf.test.main() diff --git a/official/projects/labse/train.py b/official/projects/labse/train.py new file mode 100644 index 0000000000000000000000000000000000000000..7e9cc7d11c3df2c7697f607871fcf82c1a45d905 --- /dev/null +++ b/official/projects/labse/train.py @@ -0,0 +1,27 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""TensorFlow Model Garden LaBSE training driver; registers the LaBSE configs.""" + +# pylint: disable=unused-import +from absl import app + +from official.common import flags as tfm_flags +from official.nlp import tasks +from official.nlp import train +from official.projects.labse import config_labse + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(train.main) diff --git a/official/projects/longformer/README.md b/official/projects/longformer/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8db9a3ac8fb11f47c47370138e760e7134c37b65 --- /dev/null +++ b/official/projects/longformer/README.md @@ -0,0 +1,50 @@ +# Longformer: The Long-Document Transformer + +## Modifications from Hugging Face's Implementation + +All models require a `global_attention_size` to be specified in the config; global +attention is then applied to the first `global_attention_size` tokens of every sequence. +Per-sequence global attention sizes are not supported. +Fixing the size lets the model run on TPUs, where tensor shapes must be known ahead of time. + +`_get_global_attn_indices` in `longformer_attention.py` shows how the new global +attention indices are specified. All `tf.cond` calls were replaced with Python `if` +statements, since the global attention positions are now fixed at the start of each sequence. + +To load weights from a pre-trained Hugging Face Longformer, run +`utils/convert_pretrained_pytorch_checkpoint_to_tf.py` to create a checkpoint. \ +There is also a `utils/longformer_tokenizer_to_tfrecord.py` script that converts +data tokenized for the PyTorch Longformer into TFRecords. + +## Steps to Fine-tune on MNLI +#### Prepare the pre-trained checkpoint +Option 1. Use our saved checkpoint of `allenai/longformer-base-4096` stored in Cloud Storage: + +```bash +gsutil cp -r gs://model-garden-ucsd-zihan/longformer-4096 . +``` +Option 2.
Create it directly: + +```bash +python3 utils/convert_pretrained_pytorch_checkpoint_to_tf.py +``` +#### [Optional] Prepare the input file +```bash +python3 utils/longformer_tokenizer_to_tfrecord.py +``` +#### Training +Here we use the MNLI training data that we uploaded to Cloud Storage; you can replace it with the input files you generated. + +```bash +TRAIN_DATA=task.train_data.input_path=gs://model-garden-ucsd-zihan/longformer_allenai_mnli_train.tf_record,task.validation_data.input_path=gs://model-garden-ucsd-zihan/longformer_allenai_mnli_eval.tf_record +INIT_CHECKPOINT=longformer-4096/longformer +PYTHONPATH=/path/to/model/garden \ + python3 train.py \ + --experiment=longformer/glue \ + --config_file=experiments/glue_mnli_allenai.yaml \ + --params_override="${TRAIN_DATA},runtime.distribution_strategy=tpu,task.init_checkpoint=${INIT_CHECKPOINT}" \ + --tpu=local \ + --model_dir=/path/to/outputdir \ + --mode=train_and_eval +``` +This should take about 3 hours to run and reach an accuracy of about 86%.
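The fixed-size global attention described above can be illustrated with a small NumPy sketch. This is not the code in `longformer_attention.py`, and the helper name `fixed_global_attention_mask` is hypothetical; it only shows why a single `global_attention_size` keeps shapes static: every sequence marks the same leading positions as global, so the number of global tokens never depends on the input.

```python
import numpy as np


def fixed_global_attention_mask(batch_size: int, seq_length: int,
                                global_attention_size: int) -> np.ndarray:
    """Returns a [batch_size, seq_length] boolean mask; True marks global tokens.

    Hypothetical helper for illustration only: the first
    `global_attention_size` positions of every sequence attend globally,
    and all other positions use local (sliding-window) attention.
    """
    if not 0 <= global_attention_size <= seq_length:
        raise ValueError("global_attention_size must be in [0, seq_length].")
    # The same leading positions are global in every row, so the number of
    # global tokens is a constant rather than a data-dependent value.
    row = np.arange(seq_length) < global_attention_size
    return np.broadcast_to(row, (batch_size, seq_length))


# With global_attention_size: 1 (as in the MNLI configs), only the first
# token of each sequence is global.
mask = fixed_global_attention_mask(batch_size=2, seq_length=8,
                                   global_attention_size=1)
print(mask.sum(axis=1))  # [1 1]
```

By contrast, a mask derived from the input (for example, marking special tokens wherever they occur) could give each sequence a different number of global tokens, which would need dynamic shapes or data-dependent control flow such as `tf.cond`.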
diff --git a/official/projects/longformer/experiments/glue_mnli.yaml b/official/projects/longformer/experiments/glue_mnli.yaml new file mode 100644 index 0000000000000000000000000000000000000000..7c5540cfe9f144efc544570e8aea198e25b363d6 --- /dev/null +++ b/official/projects/longformer/experiments/glue_mnli.yaml @@ -0,0 +1,47 @@ +task: + hub_module_url: '' + model: + num_classes: 3 + encoder: + type: any + any: + max_position_embeddings: 512 + attention_window: [32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32] + global_attention_size: 1 + metric_type: 'accuracy' + train_data: + drop_remainder: true + global_batch_size: 32 + input_path: TODO + is_training: true + seq_length: 128 + validation_data: + drop_remainder: true + global_batch_size: 32 + input_path: TODO + is_training: false + seq_length: 128 +trainer: + checkpoint_interval: 1000 + continuous_eval_timeout: 7200 + optimizer_config: + learning_rate: + polynomial: + decay_steps: 61359 + end_learning_rate: 0.0 + initial_learning_rate: 3.0e-05 + power: 1.0 + type: polynomial + optimizer: + type: adamw + warmup: + polynomial: + power: 1 + warmup_steps: 6136 + type: polynomial + steps_per_loop: 100 + summary_interval: 100 + # Training data size 392,702 examples, 5 epochs. 
+ train_steps: 61359 + validation_interval: 2000 + validation_steps: 307 diff --git a/official/projects/longformer/experiments/glue_mnli_allenai.yaml b/official/projects/longformer/experiments/glue_mnli_allenai.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c3495786de838efb73fffe0df415e1687132f847 --- /dev/null +++ b/official/projects/longformer/experiments/glue_mnli_allenai.yaml @@ -0,0 +1,48 @@ +task: + hub_module_url: '' + model: + num_classes: 3 + encoder: + type: any + any: + max_position_embeddings: 4098 + attention_window: [128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128, 128] + global_attention_size: 1 + vocab_size: 50265 + metric_type: 'accuracy' + train_data: + drop_remainder: true + global_batch_size: 32 + input_path: TODO + is_training: true + seq_length: 512 + validation_data: + drop_remainder: true + global_batch_size: 32 + input_path: TODO + is_training: false + seq_length: 512 +trainer: + checkpoint_interval: 1000 + continuous_eval_timeout: 7200 + optimizer_config: + learning_rate: + polynomial: + decay_steps: 61359 + end_learning_rate: 0.0 + initial_learning_rate: 3.0e-05 + power: 1.0 + type: polynomial + optimizer: + type: adamw + warmup: + polynomial: + power: 1 + warmup_steps: 6136 + type: polynomial + steps_per_loop: 1000 + summary_interval: 1000 + # Training data size 392,702 examples, 5 epochs. 
+ train_steps: 61359 + validation_interval: 2000 + validation_steps: 307 diff --git a/official/projects/longformer/experiments/pretraining_512.yaml b/official/projects/longformer/experiments/pretraining_512.yaml new file mode 100644 index 0000000000000000000000000000000000000000..152d1356690800b160283075faf6d3ae2e9e8198 --- /dev/null +++ b/official/projects/longformer/experiments/pretraining_512.yaml @@ -0,0 +1,74 @@ +task: + init_checkpoint: "" + model: + cls_heads: + [ + { + activation: tanh, + cls_token_idx: 0, + dropout_rate: 0.1, + inner_dim: 768, + name: next_sentence, + num_classes: 2, + }, + ] + encoder: + type: any + any: + attention_dropout_rate: 0.1 + dropout_rate: 0.1 + embedding_size: 768 + hidden_activation: gelu + hidden_size: 768 + initializer_range: 0.02 + intermediate_size: 3072 + max_position_embeddings: 512 + num_attention_heads: 12 + num_layers: 12 + type_vocab_size: 2 + vocab_size: 30522 + attention_window: [32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32] + global_attention_size: 1 + train_data: + drop_remainder: true + global_batch_size: 256 + input_path: 
gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00000-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00001-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00002-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00003-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00004-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00005-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00006-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00007-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00008-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00009-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00010-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00011-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00012-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00013-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00014-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00015-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00016-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00017-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00018-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00019-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00020-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.
tfrecord-00021-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00022-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00023-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00024-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00025-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00026-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00027-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00028-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00029-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00030-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00031-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00032-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00033-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00034-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00035-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00036-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00037-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00038-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00039-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00040-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00041-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00042-of-00500,gs://tf_model_garden/nlp/data/research_data/
bert_pretrain/wikipedia.tfrecord-00043-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00044-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00045-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00046-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00047-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00048-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00049-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00050-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00051-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00052-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00053-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00054-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00055-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00056-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00057-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00058-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00059-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00060-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00061-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00062-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00063-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00064-of-00500,gs://tf_model_garden
/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00065-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00066-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00067-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00068-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00069-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00070-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00071-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00072-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00073-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00074-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00075-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00076-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00077-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00078-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00079-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00080-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00081-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00082-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00083-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00084-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00085-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00086-of-00
500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00087-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00088-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00089-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00090-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00091-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00092-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00093-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00094-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00095-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00096-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00097-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00098-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00099-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00100-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00101-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00102-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00103-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00104-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00105-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00106-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00107-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipe
dia.tfrecord-00108-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00109-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00110-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00111-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00112-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00113-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00114-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00115-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00116-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00117-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00118-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00119-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00120-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00121-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00122-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00123-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00124-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00125-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00126-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00127-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00128-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00129-of-00500,gs://tf_model_garden/nlp/data/research_d
ata/bert_pretrain/wikipedia.tfrecord-00130-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00131-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00132-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00133-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00134-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00135-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00136-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00137-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00138-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00139-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00140-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00141-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00142-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00143-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00144-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00145-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00146-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00147-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00148-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00149-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00150-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00151-of-00500,gs://tf_model_ga
rden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00152-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00153-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00154-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00155-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00156-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00157-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00158-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00159-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00160-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00161-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00162-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00163-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00164-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00165-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00166-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00167-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00168-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00169-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00170-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00171-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00172-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00173-o
f-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00174-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00175-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00176-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00177-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00178-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00179-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00180-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00181-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00182-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00183-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00184-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00185-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00186-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00187-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00188-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00189-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00190-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00191-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00192-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00193-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00194-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wi
kipedia.tfrecord-00195-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00196-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00197-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00198-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00199-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00200-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00201-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00202-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00203-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00204-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00205-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00206-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00207-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00208-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00209-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00210-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00211-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00212-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00213-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00214-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00215-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00216-of-00500,gs://tf_model_garden/nlp/data/resear
ch_data/bert_pretrain/wikipedia.tfrecord-00217-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00218-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00219-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00220-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00221-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00222-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00223-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00224-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00225-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00226-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00227-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00228-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00229-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00230-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00231-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00232-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00233-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00234-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00235-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00236-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00237-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00238-of-00500,gs://tf_mode
l_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00239-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00240-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00241-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00242-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00243-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00244-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00245-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00246-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00247-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00248-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00249-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00250-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00251-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00252-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00253-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00254-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00255-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00256-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00257-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00258-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00259-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-002
60-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00261-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00262-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00263-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00264-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00265-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00266-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00267-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00268-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00269-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00270-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00271-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00272-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00273-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00274-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00275-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00276-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00277-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00278-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00279-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00280-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00281-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrai
n/wikipedia.tfrecord-00282-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00283-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00284-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00285-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00286-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00287-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00288-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00289-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00290-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00291-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00292-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00293-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00294-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00295-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00296-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00297-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00298-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00299-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00300-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00301-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00302-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00303-of-00500,gs://tf_model_garden/nlp/data/re
search_data/bert_pretrain/wikipedia.tfrecord-00304-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00305-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00306-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00307-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00308-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00309-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00310-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00311-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00312-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00313-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00314-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00315-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00316-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00317-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00318-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00319-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00320-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00321-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00322-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00323-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00324-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00325-of-00500,gs://tf_
model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00326-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00327-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00328-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00329-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00330-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00331-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00332-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00333-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00334-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00335-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00336-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00337-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00338-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00339-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00340-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00341-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00342-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00343-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00344-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00345-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00346-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord
-00347-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00348-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00349-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00350-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00351-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00352-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00353-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00354-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00355-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00356-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00357-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00358-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00359-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00360-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00361-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00362-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00363-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00364-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00365-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00366-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00367-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00368-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pre
train/wikipedia.tfrecord-00369-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00370-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00371-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00372-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00373-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00374-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00375-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00376-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00377-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00378-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00379-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00380-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00381-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00382-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00383-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00384-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00385-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00386-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00387-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00388-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00389-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00390-of-00500,gs://tf_model_garden/nlp/dat
a/research_data/bert_pretrain/wikipedia.tfrecord-00391-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00392-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00393-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00394-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00395-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00396-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00397-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00398-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00399-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00400-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00401-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00402-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00403-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00404-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00405-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00406-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00407-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00408-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00409-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00410-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00411-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00412-of-00500,gs:/
/tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00413-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00414-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00415-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00416-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00417-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00418-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00419-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00420-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00421-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00422-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00423-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00424-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00425-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00426-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00427-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00428-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00429-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00430-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00431-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00432-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00433-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfre
cord-00434-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00435-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00436-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00437-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00438-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00439-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00440-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00441-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00442-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00443-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00444-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00445-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00446-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00447-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00448-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00449-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00450-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00451-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00452-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00453-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00454-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00455-of-00500,gs://tf_model_garden/nlp/data/research_data/bert
_pretrain/wikipedia.tfrecord-00456-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00457-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00458-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00459-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00460-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00461-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00462-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00463-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00464-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00465-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00466-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00467-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00468-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00469-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00470-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00471-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00472-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00473-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00474-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00475-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00476-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00477-of-00500,gs://tf_model_garden/nlp
/data/research_data/bert_pretrain/wikipedia.tfrecord-00478-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00479-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00480-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00481-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00482-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00483-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00484-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00485-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00486-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00487-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00488-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00489-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00490-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00491-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00492-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00493-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00494-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00495-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00496-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00497-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00498-of-00500,gs://tf_model_garden/nlp/data/research_data/bert_pretrain/wikipedia.tfrecord-00499-of-00500 
+
+    is_training: true
+    max_predictions_per_seq: 76
+    seq_length: 512
+    use_next_sentence_label: true
+    use_position_id: false
+  validation_data:
+    drop_remainder: true
+    global_batch_size: 256
+    input_path: TODO
+    is_training: false
+    max_predictions_per_seq: 76
+    seq_length: 512
+    use_next_sentence_label: true
+    use_position_id: false
+trainer:
+  checkpoint_interval: 20000
+  max_to_keep: 5
+  optimizer_config:
+    learning_rate:
+      polynomial:
+        cycle: false
+        decay_steps: 1000000
+        end_learning_rate: 0.0
+        initial_learning_rate: 0.0001
+        power: 1.0
+      type: polynomial
+    optimizer:
+      type: adamw
+    warmup:
+      polynomial:
+        power: 1
+        warmup_steps: 10000
+      type: polynomial
+  steps_per_loop: 50
+  summary_interval: 50
+  train_steps: 1000000
+  validation_interval: 1000
+  validation_steps: 64
diff --git a/official/projects/longformer/longformer.py b/official/projects/longformer/longformer.py
new file mode 100644
index 0000000000000000000000000000000000000000..76a491ccb5fd70c35a19aaab6935a13de2bcfcb9
--- /dev/null
+++ b/official/projects/longformer/longformer.py
@@ -0,0 +1,69 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Longformer model configurations and instantiation methods."""
+import dataclasses
+from typing import List
+
+import tensorflow as tf
+
+from official.modeling import tf_utils
+from official.modeling.hyperparams import base_config
+from official.nlp.configs import encoders
+from official.projects.longformer.longformer_encoder import LongformerEncoder
+
+
+@dataclasses.dataclass
+class LongformerEncoderConfig(encoders.BertEncoderConfig):
+  """Extra parameters for Longformer configs.
+
+  Attributes:
+    attention_window: list of ints representing the window size for each layer.
+    global_attention_size: the size of global attention used for each token.
+    pad_token_id: the token id for the pad token.
+  """
+  attention_window: List[int] = dataclasses.field(default_factory=list)
+  global_attention_size: int = 0
+  pad_token_id: int = 1
+
+
+@base_config.bind(LongformerEncoderConfig)
+def get_encoder(encoder_cfg: LongformerEncoderConfig):
+  """Gets a 'LongformerEncoder' object.
+
+  Args:
+    encoder_cfg: A 'LongformerEncoderConfig'.
+
+  Returns:
+    An encoder object.
+ """ + encoder = LongformerEncoder( + attention_window=encoder_cfg.attention_window, + global_attention_size=encoder_cfg.global_attention_size, + vocab_size=encoder_cfg.vocab_size, + hidden_size=encoder_cfg.hidden_size, + num_layers=encoder_cfg.num_layers, + num_attention_heads=encoder_cfg.num_attention_heads, + inner_dim=encoder_cfg.intermediate_size, + inner_activation=tf_utils.get_activation(encoder_cfg.hidden_activation), + output_dropout=encoder_cfg.dropout_rate, + attention_dropout=encoder_cfg.attention_dropout_rate, + max_sequence_length=encoder_cfg.max_position_embeddings, + type_vocab_size=encoder_cfg.type_vocab_size, + initializer=tf.keras.initializers.TruncatedNormal( + stddev=encoder_cfg.initializer_range), + output_range=encoder_cfg.output_range, + embedding_width=encoder_cfg.embedding_size, + norm_first=encoder_cfg.norm_first) + return encoder diff --git a/official/projects/longformer/longformer_attention.py b/official/projects/longformer/longformer_attention.py new file mode 100644 index 0000000000000000000000000000000000000000..f8d884220542b1daea63f7fde77bd7bee8b6c70e --- /dev/null +++ b/official/projects/longformer/longformer_attention.py @@ -0,0 +1,1082 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Longformer attention block. 
Modified from huggingface/transformers.""" + +# pylint: disable=g-classes-have-attributes + +import math +import string + +import numpy as np +import tensorflow as tf + +from official.modeling.tf_utils import get_shape_list + +_CHR_IDX = string.ascii_lowercase + + +def _build_attention_equation(rank, attn_axes): + """Builds einsum equations for the attention computation. + + Query, key, value inputs after projection are expected to have the shape as: + `(bs, <non-attention dims>, <attention dims>, num_heads, channels)`. + `bs` and `<non-attention dims>` are treated as `<batch dims>`. + The attention operations can be generalized: + (1) Query-key dot product: + `(<batch dims>, <query attention dims>, num_heads, channels), (<batch dims>, + <key attention dims>, num_heads, channels) -> (<batch dims>, + num_heads, <query attention dims>, <key attention dims>)` + (2) Combination: + `(<batch dims>, num_heads, <query attention dims>, <key attention dims>), + (<batch dims>, <value attention dims>, num_heads, channels) -> (<batch dims>, + <query attention dims>, num_heads, channels)` + + Args: + rank: Rank of query, key, value tensors. + attn_axes: Tuple of axes, `[-1, rank)`, that attention will be applied + to. + + Returns: + A tuple `(dot_product_equation, combine_equation, attn_scores_rank)`. + """ + target_notation = _CHR_IDX[:rank] + # `batch_dims` includes the head dim.
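+ # Worked example of the equations built below: rank=4 with attn_axes=(1,) + # gives dot_product_equation="aecd,abcd->acbe", + # combine_equation="acbe,aecd->abcd" and attn_scores_rank=4.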
+ batch_dims = tuple(np.delete(range(rank), attn_axes + (rank - 1,))) + letter_offset = rank + source_notation = "" + for i in range(rank): + if i in batch_dims or i == rank - 1: + source_notation += target_notation[i] + else: + source_notation += _CHR_IDX[letter_offset] + letter_offset += 1 + + product_notation = "".join([target_notation[i] for i in batch_dims] + + [target_notation[i] for i in attn_axes] + + [source_notation[i] for i in attn_axes]) + dot_product_equation = f"{source_notation},{target_notation}->{product_notation}" + attn_scores_rank = len(product_notation) + combine_equation = f"{product_notation},{source_notation}->{target_notation}" + return dot_product_equation, combine_equation, attn_scores_rank + + +def _build_proj_equation(free_dims, bound_dims, output_dims): + """Builds an einsum equation for projections inside multi-head attention.""" + input_str = "" + kernel_str = "" + output_str = "" + bias_axes = "" + letter_offset = 0 + for i in range(free_dims): + char = _CHR_IDX[i + letter_offset] + input_str += char + output_str += char + + letter_offset += free_dims + for i in range(bound_dims): + char = _CHR_IDX[i + letter_offset] + input_str += char + kernel_str += char + + letter_offset += bound_dims + for i in range(output_dims): + char = _CHR_IDX[i + letter_offset] + kernel_str += char + output_str += char + bias_axes += char + equation = f"{input_str},{kernel_str}->{output_str}" + + return equation, bias_axes, len(output_str) + + +def _get_output_shape(output_rank, known_last_dims): + return [None] * (output_rank - len(known_last_dims)) + list(known_last_dims) + + +@tf.keras.utils.register_keras_serializable(package="Text") +class LongformerAttention(tf.keras.layers.MultiHeadAttention): + """Longformer attention: sliding-window local plus optional global attention. 
+ + Args: + attention_window: int, the total (two-sided) local attention window; must + be even and positive. + layer_id: int, the index of this layer (used in error messages and the + layer config). + global_attention_size: int, the number of leading tokens in each sequence + that use global attention. + **kwargs: forwarded to `tf.keras.layers.MultiHeadAttention` (e.g. + `num_heads`, `key_dim`, `dropout`).
+ """ + + def __init__(self, attention_window, layer_id, global_attention_size, + **kwargs): + super().__init__(**kwargs) + self._layer_id = layer_id + self._attention_window = attention_window + assert (self._attention_window % 2 == 0), ( + f"`attention_window` for layer {self._layer_id} has to be an even " + f"value. Given {self._attention_window}") + assert (self._attention_window > 0), ( + f"`attention_window` for layer {self._layer_id} has to be positive. " + f"Given {self._attention_window}") + self._one_sided_attn_window_size = self._attention_window // 2 + self.global_attention_size = global_attention_size + + def _build_from_signature(self, query, value, key=None): + """Builds layers and variables. + + Once the method is called, self._built_from_signature will be set to True. + + Args: + query: Query tensor or TensorShape. + value: Value tensor or TensorShape. + key: Key tensor or TensorShape. + """ + self._built_from_signature = True + if hasattr(query, "shape"): + self._query_shape = tf.TensorShape(query.shape) + else: + self._query_shape = tf.TensorShape(query) + if hasattr(value, "shape"): + self._value_shape = tf.TensorShape(value.shape) + else: + self._value_shape = tf.TensorShape(value) + if key is None: + self._key_shape = self._value_shape + elif hasattr(key, "shape"): + self._key_shape = tf.TensorShape(key.shape) + else: + self._key_shape = tf.TensorShape(key) + + common_kwargs = dict( + kernel_initializer=self._kernel_initializer, + bias_initializer=self._bias_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activity_regularizer=self._activity_regularizer, + kernel_constraint=self._kernel_constraint, + bias_constraint=self._bias_constraint) + # Any setup work performed only once should happen in an `init_scope` + # to avoid creating symbolic Tensors that will later pollute any eager + # operations.
+ # with tf_utils.maybe_init_scope(self): + # TODO(crickwu): check whether tf_utils.maybe_init_scope(self) (keras) + # is needed. + free_dims = self._query_shape.rank - 1 + einsum_equation, bias_axes, output_rank = _build_proj_equation( + free_dims, bound_dims=1, output_dims=2) + self._query_dense = tf.keras.layers.EinsumDense( + einsum_equation, + output_shape=_get_output_shape(output_rank - 1, + [self._num_heads, self._key_dim]), + bias_axes=bias_axes if self._use_bias else None, + name="query", + **common_kwargs) + self._global_query_dense = tf.keras.layers.EinsumDense( + einsum_equation, + output_shape=_get_output_shape(output_rank - 1, + [self._num_heads, self._key_dim]), + bias_axes=bias_axes if self._use_bias else None, + name="global_query", + **common_kwargs) + einsum_equation, bias_axes, output_rank = _build_proj_equation( + self._key_shape.rank - 1, bound_dims=1, output_dims=2) + self._key_dense = tf.keras.layers.EinsumDense( + einsum_equation, + output_shape=_get_output_shape(output_rank - 1, + [self._num_heads, self._key_dim]), + bias_axes=bias_axes if self._use_bias else None, + name="key", + **common_kwargs) + self._global_key_dense = tf.keras.layers.EinsumDense( + einsum_equation, + output_shape=_get_output_shape(output_rank - 1, + [self._num_heads, self._key_dim]), + bias_axes=bias_axes if self._use_bias else None, + name="global_key", + **common_kwargs) + einsum_equation, bias_axes, output_rank = _build_proj_equation( + self._value_shape.rank - 1, bound_dims=1, output_dims=2) + self._value_dense = tf.keras.layers.EinsumDense( + einsum_equation, + output_shape=_get_output_shape(output_rank - 1, + [self._num_heads, self._value_dim]), + bias_axes=bias_axes if self._use_bias else None, + name="value", + **common_kwargs) + self._global_value_dense = tf.keras.layers.EinsumDense( + einsum_equation, + output_shape=_get_output_shape(output_rank - 1, + [self._num_heads, self._value_dim]), + bias_axes=bias_axes if self._use_bias else None, + 
 name="global_value", + **common_kwargs) + + # Builds the attention computations for multi-head dot product attention. + # These computations could be wrapped into the keras attention layer once + # it supports multi-head einsum computations. + self._build_attention(output_rank) + self._global_dropout_layer = tf.keras.layers.Dropout(rate=self._dropout) + # self._output_dense = self._make_output_dense( + # free_dims, common_kwargs, "attention_output") + self._output_dense = tf.keras.layers.Dense( + units=self._num_heads * self._key_dim, name="dense", **common_kwargs) + + def call(self, + hidden_states, + attention_mask=None, + is_index_masked=None, + is_index_global_attn=None, + training=None): + """Applies Longformer sliding-window (and optional global) attention. + + Query, key and value tensors are all projected from `hidden_states`. Each + position attends to `attention_window // 2` tokens on either side of + itself; when `global_attention_size > 0`, the first `global_attention_size` + tokens of each sequence also attend to, and are attended by, every position. + + Args: + hidden_states: inputs of shape `(B, S, hidden_size)` from which the query, + key and value tensors are projected. `S` must be a multiple of + `attention_window`. + attention_mask: an additive float mask, e.g. of shape `(B, S, 1, 1)`, with + large negative values at positions that must not be attended to (such + as padding) and zeros elsewhere. + is_index_masked: boolean tensor of shape `(B, S)`; `True` marks padded + positions. + is_index_global_attn: boolean tensor of shape `(B, S)`; `True` marks + positions that use global attention. + training: Python boolean indicating whether the layer should behave in + training mode (adding dropout) or in inference mode (doing nothing). + + Returns: + attention_output: tensor of shape `(B, S, num_heads * key_dim)`.
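+ + Example (illustrative shapes only; the sequence length must be a multiple + of `attention_window`): + layer = LongformerAttention( + attention_window=4, layer_id=0, global_attention_size=0, + num_heads=2, key_dim=8) + # hidden_states: float32 [2, 8, 16] + # attention_mask: float32 [2, 8, 1, 1], all zeros (nothing masked) + # is_index_masked, is_index_global_attn: bool [2, 8], all False + output = layer(hidden_states, attention_mask, is_index_masked, + is_index_global_attn) # output shape: [2, 8, 16]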
+ """ + if not self._built_from_signature: + self._build_from_signature( + query=hidden_states, value=hidden_states, key=hidden_states) + + # N = `num_attention_heads` + # H = `size_per_head` + # `query` = [B, T, N, H] + query = self._query_dense(hidden_states) + + # `key` = [B, S, N, H] + key = self._key_dense(hidden_states) + + # `value` = [B, S, N, H] + value = self._value_dense(hidden_states) + + # Note: Applying scalar multiply at the smaller end of einsum improves + # XLA performance, but may introduce slight numeric differences in + # the Transformer attention head. + query = tf.multiply(query, 1.0 / math.sqrt(float(self._key_dim))) + batch_size, seq_len, num_heads, head_dim = get_shape_list(query) + + # attn_scores = (batch_size, seq_len, num_heads, window*2+1) + attn_scores = self._sliding_chunks_query_key_matmul( + query, key, self._one_sided_attn_window_size) + + # diagonal mask with zeros everywhere and large negative values in place of + # padding + diagonal_mask = self._sliding_chunks_query_key_matmul( + tf.ones(get_shape_list(attention_mask)), + attention_mask, + self._one_sided_attn_window_size, + ) + + # add the padding mask to the local attention scores + attn_scores += diagonal_mask + + if tf.executing_eagerly(): + tf.debugging.assert_equal( + get_shape_list(attn_scores), + [ + batch_size, seq_len, self._num_heads, + self._one_sided_attn_window_size * 2 + 1 + ], + message=f"attn_scores should be of size " + f"({batch_size}, {seq_len}, {num_heads}, " + f"{self._one_sided_attn_window_size * 2 + 1})," + f" but is of size {get_shape_list(attn_scores)}", + ) + + # compute global attn indices required throughout the forward pass + ( + max_num_global_attn_indices, + is_index_global_attn_nonzero, + is_local_index_global_attn_nonzero, + is_local_index_no_global_attn_nonzero, + ) = self._get_global_attn_indices(is_index_global_attn, + self.global_attention_size) + # this function is only relevant for global attention + if self.global_attention_size > 0: + attn_scores = self._concat_with_global_key_attn_probs(
+ attn_scores=attn_scores, + query_vectors=query, + key_vectors=key, + max_num_global_attn_indices=max_num_global_attn_indices, + is_index_global_attn_nonzero=is_index_global_attn_nonzero, + is_local_index_global_attn_nonzero=is_local_index_global_attn_nonzero, + is_local_index_no_global_attn_nonzero=is_local_index_no_global_attn_nonzero, + ) + else: + pass + + attn_probs = tf.nn.softmax(attn_scores, axis=-1) + + # softmax sometimes inserts NaN if all positions are masked, + # replace them with 0 + # Make sure to create a mask with the proper shape: + # if is_global_attn==True => [batch_size, seq_len, self.num_heads, + # self.one_sided_attn_window_size * 2 + max_num_global_attn_indices + 1] + # if is_global_attn==False => [batch_size, seq_len, self.num_heads, + # self.one_sided_attn_window_size * 2 + 1] + if self.global_attention_size > 0: + masked_index = tf.tile( + is_index_masked[:, :, None, None], + (1, 1, self._num_heads, self._one_sided_attn_window_size * 2 + + max_num_global_attn_indices + 1), + ) + else: + masked_index = tf.tile( + is_index_masked[:, :, None, None], + (1, 1, self._num_heads, self._one_sided_attn_window_size * 2 + 1), + ) + + attn_probs = tf.where( + masked_index, + tf.zeros(get_shape_list(masked_index), dtype=attn_probs.dtype), + attn_probs, + ) + + layer_head_mask = None + if layer_head_mask is not None: + if tf.executing_eagerly(): + tf.debugging.assert_equal( + get_shape_list(layer_head_mask), + [self._num_heads], + message=f"Head mask for a single layer should be of size " + f"{(self._num_heads)}, but is " + f"{get_shape_list(layer_head_mask)}", + ) + + attn_probs = tf.reshape(layer_head_mask, (1, 1, -1, 1)) * attn_probs + + # apply dropout + attn_probs = self._dropout_layer(attn_probs, training=training) + value_vectors = tf.reshape( + value, (batch_size, seq_len, self._num_heads, self._key_dim)) + + # if global attention, compute sum of global and local attn + if self.global_attention_size > 0: + attn_output = 
self._compute_attn_output_with_global_indices( + value_vectors=value_vectors, + attn_probs=attn_probs, + max_num_global_attn_indices=max_num_global_attn_indices, + is_index_global_attn_nonzero=is_index_global_attn_nonzero, + is_local_index_global_attn_nonzero=is_local_index_global_attn_nonzero, + ) + else: + attn_output = self._sliding_chunks_matmul_attn_probs_value( + attn_probs, value_vectors, self._one_sided_attn_window_size) + + if tf.executing_eagerly(): + tf.debugging.assert_equal( + get_shape_list(attn_output), + [batch_size, seq_len, self._num_heads, head_dim], + message="Unexpected size", + ) + + attn_output = tf.reshape( + attn_output, + (batch_size, seq_len, self._num_heads * self._key_dim)) # FIXME + + # compute value for global attention and overwrite to attention output + # TODO(crickwu): remove the redundant computation + if self.global_attention_size > 0: + attn_output, global_attn_probs = self._compute_global_attn_output_from_hidden( # pylint: disable=unused-variable + attn_output=attn_output, + hidden_states=hidden_states, + max_num_global_attn_indices=max_num_global_attn_indices, + layer_head_mask=layer_head_mask, + is_local_index_global_attn_nonzero=is_local_index_global_attn_nonzero, + is_index_global_attn_nonzero=is_index_global_attn_nonzero, + is_local_index_no_global_attn_nonzero=is_local_index_no_global_attn_nonzero, + is_index_masked=is_index_masked, + training=training, + ) + else: + global_attn_probs = tf.zeros( + (batch_size, self._num_heads, max_num_global_attn_indices, seq_len)) + + # make sure that local attention probabilities are set to 0 for indices of + # global attn + if self.global_attention_size > 0: + masked_global_attn_index = tf.tile( + is_index_global_attn[:, :, None, None], + (1, 1, self._num_heads, self._one_sided_attn_window_size * 2 + + max_num_global_attn_indices + 1), + ) + else: + masked_global_attn_index = tf.tile( + is_index_global_attn[:, :, None, None], + (1, 1, self._num_heads, self._one_sided_attn_window_size 
 * 2 + 1), + ) + + attn_probs = tf.where( + masked_global_attn_index, + tf.zeros( + get_shape_list(masked_global_attn_index), dtype=attn_probs.dtype), + attn_probs, + ) + + # we can return extra information here + # (attn_output, attn_probs, global_attn_probs) + + return attn_output + + def get_config(self): + config = { + "layer_id": self._layer_id, + "attention_window": self._attention_window, + "global_attention_size": self.global_attention_size, + } + base_config = super().get_config() + return dict(list(base_config.items()) + list(config.items())) + + def _sliding_chunks_query_key_matmul(self, query, key, window_overlap): + """Matrix multiplication of query and key tensors. + + This multiplication uses a sliding window attention pattern. + + This implementation splits the input into overlapping chunks of size + 2w (e.g. 512 for pretrained Longformer) with an overlap of size + w = window_overlap. + + Args: + query: query tensor of shape `(batch_size, seq_len, num_heads, head_dim)`. + key: key tensor with the same shape as `query`. + window_overlap: int, the one-sided window size `w`. + + Returns: + diagonal_attention_scores: tensor of shape + `(batch_size, seq_len, num_heads, 2 * window_overlap + 1)`. + """ + batch_size, seq_len, num_heads, head_dim = get_shape_list(query) + + if tf.executing_eagerly(): + tf.debugging.assert_equal( + seq_len % (window_overlap * 2), + 0, + message=f"Sequence length should be multiple of {window_overlap * 2}.
" + f"Given {seq_len}", + ) + tf.debugging.assert_equal( + get_shape_list(query), + get_shape_list(key), + message=f"Shape of query and key should be equal, but got query: " + f"{get_shape_list(query)} and key: {get_shape_list(key)}", + ) + + chunks_count = seq_len // window_overlap - 1 + + # group batch_size and num_heads dimensions into one, + # then chunk seq_len into chunks of size window_overlap * 2 + query = tf.reshape( + tf.transpose(query, (0, 2, 1, 3)), + (batch_size * num_heads, seq_len, head_dim), + ) + key = tf.reshape( + tf.transpose(key, (0, 2, 1, 3)), + (batch_size * num_heads, seq_len, head_dim)) + chunked_query = self._chunk(query, window_overlap) + chunked_key = self._chunk(key, window_overlap) + + # matrix multiplication + # bcxd: batch_size * num_heads x chunks x 2window_overlap x head_dim + # bcyd: batch_size * num_heads x chunks x 2window_overlap x head_dim + # bcxy: batch_size * num_heads x chunks x 2window_overlap x 2window_overlap + chunked_query = tf.cast(chunked_query, dtype=chunked_key.dtype) + chunked_attention_scores = tf.einsum("bcxd,bcyd->bcxy", chunked_query, + chunked_key) # multiply + + # convert diagonals into columns + paddings = tf.convert_to_tensor([[0, 0], [0, 0], [0, 1], [0, 0]]) + diagonal_chunked_attention_scores = self._pad_and_transpose_last_two_dims( + chunked_attention_scores, paddings) + + # allocate space for the overall attention matrix where the chunks are + # combined. The last dimension + # has (window_overlap * 2 + 1) columns. The first (window_overlap) columns + # are the window_overlap lower triangles (attention from a word to + # window_overlap previous words). The following column is attention score + # from each word to itself, then + # followed by window_overlap columns for the upper triangle. 
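+ # For example, with window_overlap=2 the row for token i holds 2 * 2 + 1 = 5 + # scores, for keys i-2, i-1, i, i+1 and i+2, in that order.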
+ + # copy parts from diagonal_chunked_attention_scores into the combined matrix + # of attentions - copying the main diagonal and the upper triangle + # TODO(crickwu): This code is most likely not very efficient and should be + # improved. + diagonal_attn_scores_up_triang = tf.concat( + [ + diagonal_chunked_attention_scores[:, :, :window_overlap, : + window_overlap + 1], + diagonal_chunked_attention_scores[:, -1:, + window_overlap:, :window_overlap + + 1], + ], + axis=1, + ) + + # - copying the lower triangle + diagonal_attn_scores_low_triang = tf.concat( + [ + tf.zeros( + (batch_size * num_heads, 1, window_overlap, window_overlap), + dtype=diagonal_chunked_attention_scores.dtype, + ), + diagonal_chunked_attention_scores[:, :, -(window_overlap + 1):-1, + window_overlap + 1:], + ], + axis=1, + ) + diagonal_attn_scores_first_chunk = tf.concat( + [ + tf.roll( + diagonal_chunked_attention_scores, + shift=[1, window_overlap], + axis=[2, 3], + )[:, :, :window_overlap, :window_overlap], + tf.zeros( + (batch_size * num_heads, 1, window_overlap, window_overlap), + dtype=diagonal_chunked_attention_scores.dtype, + ), + ], + axis=1, + ) + first_chunk_mask = ( + tf.tile( + tf.range(chunks_count + 1)[None, :, None, None], + (batch_size * num_heads, 1, window_overlap, window_overlap), + ) < 1) + + diagonal_attn_scores_low_triang = tf.where( + first_chunk_mask, + diagonal_attn_scores_first_chunk, + diagonal_attn_scores_low_triang, + ) + + # merging upper and lower triangle + diagonal_attention_scores = tf.concat( + [diagonal_attn_scores_low_triang, diagonal_attn_scores_up_triang], + axis=-1) + + # separate batch_size and num_heads dimensions again + diagonal_attention_scores = tf.transpose( + tf.reshape( + diagonal_attention_scores, + (batch_size, num_heads, seq_len, 2 * window_overlap + 1), + ), + (0, 2, 1, 3), + ) + + diagonal_attention_scores = self._mask_invalid_locations( + diagonal_attention_scores, window_overlap) + + return diagonal_attention_scores + + @staticmethod + 
def _mask_invalid_locations(input_tensor, window_overlap): + # create correct upper triangle bool mask + mask_2d_upper = tf.reverse( + tf.linalg.band_part( + tf.ones(shape=(window_overlap, window_overlap + 1)), -1, 0), + axis=[0], + ) + + # pad to full matrix + padding = tf.convert_to_tensor( + [[0, get_shape_list(input_tensor)[1] - window_overlap], + [0, get_shape_list(input_tensor)[3] - window_overlap - 1]]) + + # create lower mask + mask_2d = tf.pad(mask_2d_upper, padding) + + # combine with upper mask + mask_2d = mask_2d + tf.reverse(mask_2d, axis=[0, 1]) + + # broadcast to full matrix + mask_4d = tf.tile(mask_2d[None, :, None, :], + (get_shape_list(input_tensor)[0], 1, 1, 1)) + + # inf tensor used for masking + inf_tensor = -float("inf") * tf.ones_like(input_tensor) + + # mask + input_tensor = tf.where( + tf.math.greater(mask_4d, 0), inf_tensor, input_tensor) + + return input_tensor + + def _sliding_chunks_matmul_attn_probs_value(self, attn_probs, value, + window_overlap): + """Same as _sliding_chunks_query_key_matmul but for attn_probs and value.""" + + batch_size, seq_len, num_heads, head_dim = get_shape_list(value) + + if tf.executing_eagerly(): + tf.debugging.assert_equal( + seq_len % (window_overlap * 2), + 0, + message="Seq_len has to be multiple of 2 * window_overlap", + ) + tf.debugging.assert_equal( + get_shape_list(attn_probs)[:3], + get_shape_list(value)[:3], + message="value and attn_probs must have same dims (except head_dim)", + ) + tf.debugging.assert_equal( + get_shape_list(attn_probs)[3], + 2 * window_overlap + 1, + message="attn_probs last dim has to be 2 * window_overlap + 1", + ) + + chunks_count = seq_len // window_overlap - 1 + + # group batch_size and num_heads dimensions into one, then chunk seq_len + # into chunks of size 2 window overlap + chunked_attn_probs = tf.reshape( + tf.transpose(attn_probs, (0, 2, 1, 3)), + ( + batch_size * num_heads, + seq_len // window_overlap, + window_overlap, + 2 * window_overlap + 1, + ), + ) + + # group 
batch_size and num_heads dimensions into one + value = tf.reshape( + tf.transpose(value, (0, 2, 1, 3)), + (batch_size * num_heads, seq_len, head_dim), + ) + + # pad seq_len with w at the beginning of the sequence and another window + # overlap at the end + paddings = tf.convert_to_tensor([[0, 0], [window_overlap, window_overlap], + [0, 0]]) + padded_value = tf.pad(value, paddings, constant_values=-1) + + # chunk padded_value into chunks of size 3 window overlap and an overlap of + # size window overlap + frame_size = 3 * window_overlap * head_dim + frame_hop_size = (get_shape_list(padded_value)[1] * head_dim - + frame_size) // chunks_count + chunked_value = tf.signal.frame( + tf.reshape(padded_value, (batch_size * num_heads, -1)), + frame_size, + frame_hop_size, + ) + chunked_value = tf.reshape( + chunked_value, + (batch_size * num_heads, chunks_count + 1, 3 * window_overlap, + head_dim), + ) + + if tf.executing_eagerly(): + tf.debugging.assert_equal( + get_shape_list(chunked_value), + [ + batch_size * num_heads, chunks_count + 1, 3 * window_overlap, + head_dim + ], + message="Chunked value has the wrong shape", + ) + + chunked_attn_probs = self._pad_and_diagonalize(chunked_attn_probs) + context = tf.einsum("bcwd,bcdh->bcwh", chunked_attn_probs, chunked_value) + context = tf.transpose( + tf.reshape(context, (batch_size, num_heads, seq_len, head_dim)), + (0, 2, 1, 3), + ) + + return context + + @staticmethod + def _pad_and_transpose_last_two_dims(hidden_states_padded, paddings): + """Pads rows and then flips rows and columns.""" + hidden_states_padded = tf.pad( + hidden_states_padded, paddings + ) # padding value is not important because it will be overwritten + batch_size, chunk_size, seq_length, hidden_dim = get_shape_list( + hidden_states_padded) + hidden_states_padded = tf.reshape( + hidden_states_padded, (batch_size, chunk_size, hidden_dim, seq_length)) + + return hidden_states_padded + + @staticmethod + def _pad_and_diagonalize(chunked_hidden_states): + 
 """Shifts every row 1 step right, converting columns into diagonals. + + Example:: + + chunked_hidden_states: [ 0.4983, 2.6918, -0.0071, 1.0492, + -1.8348, 0.7672, 0.2986, 0.0285, + -0.7584, 0.4206, -0.0405, 0.1599, + 2.0514, -1.1600, 0.5372, 0.2629 ] + window_overlap = num_rows = 4 + (pad & diagonalize) => + [ 0.4983, 2.6918, -0.0071, 1.0492, 0.0000, 0.0000, 0.0000, + 0.0000, -1.8348, 0.7672, 0.2986, 0.0285, 0.0000, 0.0000, + 0.0000, 0.0000, -0.7584, 0.4206, -0.0405, 0.1599, 0.0000, + 0.0000, 0.0000, 0.0000, 2.0514, -1.1600, 0.5372, 0.2629 ] + + Args: + chunked_hidden_states: tensor of shape + `(total_num_heads, num_chunks, window_overlap, hidden_dim)`. + + Returns: + padded_hidden_states: tensor of shape + `(total_num_heads, num_chunks, window_overlap, + window_overlap + hidden_dim - 1)`. + """ + total_num_heads, num_chunks, window_overlap, hidden_dim = get_shape_list( + chunked_hidden_states) + paddings = tf.convert_to_tensor([[0, 0], [0, 0], [0, 0], + [0, window_overlap + 1]]) + + chunked_hidden_states = tf.pad(chunked_hidden_states, paddings) + + chunked_hidden_states = tf.reshape(chunked_hidden_states, + (total_num_heads, num_chunks, -1)) + chunked_hidden_states = chunked_hidden_states[:, :, :-window_overlap] + chunked_hidden_states = tf.reshape( + chunked_hidden_states, + (total_num_heads, num_chunks, window_overlap, + window_overlap + hidden_dim), + ) + chunked_hidden_states = chunked_hidden_states[:, :, :, :-1] + + return chunked_hidden_states + + @staticmethod + def _chunk(hidden_states, window_overlap): + """Converts hidden states into overlapping chunks.
Chunk size = 2w, overlap size = w.""" + batch_size, seq_length, hidden_dim = get_shape_list(hidden_states) + num_output_chunks = 2 * (seq_length // (2 * window_overlap)) - 1 + + # define frame size and frame stride (similar to convolution) + frame_hop_size = window_overlap * hidden_dim + frame_size = 2 * frame_hop_size + hidden_states = tf.reshape(hidden_states, + (batch_size, seq_length * hidden_dim)) + + # chunk with overlap + chunked_hidden_states = tf.signal.frame(hidden_states, frame_size, + frame_hop_size) + + if tf.executing_eagerly(): + tf.debugging.assert_equal( + get_shape_list(chunked_hidden_states), + [batch_size, num_output_chunks, frame_size], + message=f"Make sure chunking is correctly applied. Chunked hidden " + f"states should have output dimension" + f" {[batch_size, num_output_chunks, frame_size]}, but got " + f"{get_shape_list(chunked_hidden_states)}.", + ) + + chunked_hidden_states = tf.reshape( + chunked_hidden_states, + (batch_size, num_output_chunks, 2 * window_overlap, hidden_dim), + ) + + return chunked_hidden_states + + @staticmethod + def _get_global_attn_indices(is_index_global_attn, global_attention_size): + """Computes global attn indices required throughout the forward pass.""" + # The first `global_attention_size` tokens of every sequence are global. 
+ + batch_size, _ = get_shape_list(is_index_global_attn) + + max_num_global_attn_indices = global_attention_size + + row_indices = tf.range(batch_size) + row_indices = tf.repeat( + tf.expand_dims(row_indices, axis=0), + repeats=[global_attention_size], + axis=0) + row_indices = tf.reshape(row_indices, + (batch_size * global_attention_size, 1)) + + col_indices = tf.range(global_attention_size) + col_indices = tf.repeat( + tf.expand_dims(col_indices, axis=1), repeats=[batch_size], axis=0) + + is_index_global_attn_nonzero = tf.concat((row_indices, col_indices), axis=1) + + # this is the same as `is_index_global_attn_nonzero`, + # since every sequence has the same number of global tokens
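+ # For example, batch_size=2 and global_attention_size=2 give the + # (batch, position) pairs [[0, 0], [1, 0], [0, 1], [1, 1]], i.e. the first + # two tokens of every sequence are global.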
is_local_index_global_attn_nonzero = tf.concat((row_indices, col_indices), + axis=1) + + # empty tensor + is_local_index_no_global_attn_nonzero = tf.reshape( + tf.expand_dims(tf.range(0), axis=1), (0, 2)) + return ( + max_num_global_attn_indices, + is_index_global_attn_nonzero, + is_local_index_global_attn_nonzero, + is_local_index_no_global_attn_nonzero, + ) + + def _concat_with_global_key_attn_probs( + self, + attn_scores, + key_vectors, + query_vectors, + max_num_global_attn_indices, + is_index_global_attn_nonzero, + is_local_index_global_attn_nonzero, + is_local_index_no_global_attn_nonzero, + ): + batch_size = get_shape_list(key_vectors)[0] + + # select global key vectors + global_key_vectors = tf.gather_nd(key_vectors, is_index_global_attn_nonzero) + + # create only global key vectors + key_vectors_only_global = tf.scatter_nd( + is_local_index_global_attn_nonzero, + global_key_vectors, + shape=( + batch_size, + max_num_global_attn_indices, + self._num_heads, + self._key_dim, + ), + ) + + # (batch_size, seq_len, num_heads, max_num_global_attn_indices) + attn_probs_from_global_key = tf.einsum("blhd,bshd->blhs", query_vectors, + key_vectors_only_global) + + # (batch_size, max_num_global_attn_indices, seq_len, num_heads) + attn_probs_from_global_key_trans = tf.transpose(attn_probs_from_global_key, + (0, 3, 1, 2)) + mask_shape = ( + get_shape_list(is_local_index_no_global_attn_nonzero)[0],) + tuple( + get_shape_list(attn_probs_from_global_key_trans)[-2:]) + mask = tf.ones(mask_shape) * -10000.0 + mask = tf.cast(mask, dtype=attn_probs_from_global_key_trans.dtype) + + # scatter mask + attn_probs_from_global_key_trans = tf.tensor_scatter_nd_update( + attn_probs_from_global_key_trans, + is_local_index_no_global_attn_nonzero, + mask, + ) + + # (batch_size, seq_len, num_heads, max_num_global_attn_indices) + attn_probs_from_global_key = tf.transpose(attn_probs_from_global_key_trans, + (0, 2, 3, 1)) + + # concat to attn_probs + # (batch_size, seq_len, num_heads, extra 
 attention count + 2*window+1) + attn_scores = tf.concat((attn_probs_from_global_key, attn_scores), axis=-1) + return attn_scores + + def _compute_attn_output_with_global_indices( + self, + value_vectors, + attn_probs, + max_num_global_attn_indices, + is_index_global_attn_nonzero, + is_local_index_global_attn_nonzero, + ): + batch_size = get_shape_list(attn_probs)[0] + + # keep only the attn probs for the global keys + attn_probs_only_global = attn_probs[:, :, :, :max_num_global_attn_indices] + + # select global value vectors + global_value_vectors = tf.gather_nd(value_vectors, + is_index_global_attn_nonzero) + + # create only global value vectors + value_vectors_only_global = tf.scatter_nd( + is_local_index_global_attn_nonzero, + global_value_vectors, + shape=( + batch_size, + max_num_global_attn_indices, + self._num_heads, + self._key_dim, + ), + ) + + # compute attn output from the global keys only + attn_output_only_global = tf.einsum("blhs,bshd->blhd", + attn_probs_only_global, + value_vectors_only_global) + # keep only the local (sliding-window) attn probs + attn_probs_without_global = attn_probs[:, :, :, + max_num_global_attn_indices:] + + # compute attn output from the local window + attn_output_without_global = self._sliding_chunks_matmul_attn_probs_value( + attn_probs_without_global, value_vectors, + self._one_sided_attn_window_size) + + return attn_output_only_global + attn_output_without_global + + def _compute_global_attn_output_from_hidden( + self, + attn_output, + hidden_states, + max_num_global_attn_indices, + layer_head_mask, + is_local_index_global_attn_nonzero, + is_index_global_attn_nonzero, + is_local_index_no_global_attn_nonzero, + is_index_masked, + training, + ): + batch_size, seq_len = get_shape_list(hidden_states)[:2] + + # prepare global hidden states + global_attn_hidden_states = tf.gather_nd(hidden_states, + is_index_global_attn_nonzero) + global_attn_hidden_states = tf.scatter_nd( + is_local_index_global_attn_nonzero, + 
global_attn_hidden_states, + shape=(batch_size, max_num_global_attn_indices, + self._num_heads * self._key_dim),
self._num_heads * self._key_dim), + ) + + # global key, query, value + global_query_vectors_only_global = self._global_query_dense( + global_attn_hidden_states) + global_key_vectors = self._global_key_dense(hidden_states) + global_value_vectors = self._global_value_dense(hidden_states) + + # normalize + global_query_vectors_only_global /= tf.math.sqrt( + tf.cast(self._key_dim, dtype=global_query_vectors_only_global.dtype)) + global_query_vectors_only_global = self.reshape_and_transpose( + global_query_vectors_only_global, batch_size) + global_key_vectors = self.reshape_and_transpose(global_key_vectors, + batch_size) + global_value_vectors = self.reshape_and_transpose(global_value_vectors, + batch_size) + + # compute attn scores + global_attn_scores = tf.matmul( + global_query_vectors_only_global, global_key_vectors, transpose_b=True) + + if tf.executing_eagerly(): + tf.debugging.assert_equal( + get_shape_list(global_attn_scores), + [batch_size * self._num_heads, max_num_global_attn_indices, seq_len], + message=f"global_attn_scores have the wrong size. 
 Size should be " + f"{(batch_size * self._num_heads, max_num_global_attn_indices, seq_len)}, " + f"but is {get_shape_list(global_attn_scores)}.", + ) + + global_attn_scores = tf.reshape( + global_attn_scores, + (batch_size, self._num_heads, max_num_global_attn_indices, seq_len), + ) + global_attn_scores_trans = tf.transpose(global_attn_scores, (0, 2, 1, 3)) + mask_shape = (get_shape_list(is_local_index_no_global_attn_nonzero)[0], + ) + tuple(get_shape_list(global_attn_scores_trans)[-2:]) + global_attn_mask = tf.ones(mask_shape) * -10000.0 + global_attn_mask = tf.cast( + global_attn_mask, dtype=global_attn_scores_trans.dtype) + + # scatter mask + global_attn_scores_trans = tf.tensor_scatter_nd_update( + global_attn_scores_trans, + is_local_index_no_global_attn_nonzero, + global_attn_mask, + ) + global_attn_scores = tf.transpose(global_attn_scores_trans, (0, 2, 1, 3)) + + # mask global attn scores + attn_mask = tf.tile(is_index_masked[:, None, None, :], + (1, get_shape_list(global_attn_scores)[1], 1, 1)) + global_attn_scores = tf.where(attn_mask, -10000.0, global_attn_scores) + global_attn_scores = tf.reshape( + global_attn_scores, + (batch_size * self._num_heads, max_num_global_attn_indices, seq_len), + ) + + # compute global attn probs + global_attn_probs_float = tf.nn.softmax(global_attn_scores, axis=-1) + + # apply layer head masking + if layer_head_mask is not None: + if tf.executing_eagerly(): + tf.debugging.assert_equal( + get_shape_list(layer_head_mask), + [self._num_heads], + message=f"Head mask for a single layer should be of size " + f"{(self._num_heads)}, but is {get_shape_list(layer_head_mask)}", + ) + global_attn_probs_float = tf.reshape( + layer_head_mask, + (1, -1, 1, 1)) * tf.reshape(global_attn_probs_float, + (batch_size, self._num_heads, + max_num_global_attn_indices, seq_len)) + global_attn_probs_float = tf.reshape( + global_attn_probs_float, + (batch_size * self._num_heads, max_num_global_attn_indices, seq_len)) + + # dropout + 
global_attn_probs =
self._global_dropout_layer( + global_attn_probs_float, training=training) + + # global attn output + global_attn_output = tf.matmul(global_attn_probs, global_value_vectors) + + if tf.executing_eagerly(): + tf.debugging.assert_equal( + get_shape_list(global_attn_output), + [ + batch_size * self._num_heads, max_num_global_attn_indices, + self._key_dim + ], + message=f"global_attn_output tensor has the wrong size. Size should be " + f"{(batch_size * self._num_heads, max_num_global_attn_indices, self._key_dim)}, " + f"but is {get_shape_list(global_attn_output)}.", + ) + + global_attn_output = tf.reshape( + global_attn_output, + (batch_size, self._num_heads, max_num_global_attn_indices, + self._key_dim), + ) + + # get only non zero global attn output + nonzero_global_attn_output = tf.gather_nd( + tf.transpose(global_attn_output, (0, 2, 1, 3)), + is_local_index_global_attn_nonzero, + ) + nonzero_global_attn_output = tf.reshape( + nonzero_global_attn_output, + (get_shape_list(is_local_index_global_attn_nonzero)[0], -1), + ) + + # overwrite values with global attention + attn_output = tf.tensor_scatter_nd_update(attn_output, + is_index_global_attn_nonzero, + nonzero_global_attn_output) + + global_attn_probs = tf.reshape( + global_attn_probs, + (batch_size, self._num_heads, max_num_global_attn_indices, seq_len)) + + attn_output = self._output_dense(attn_output) + + return attn_output, global_attn_probs + + def reshape_and_transpose(self, vector, batch_size): + return tf.reshape( + tf.transpose( + tf.reshape(vector, + (batch_size, -1, self._num_heads, self._key_dim)), + (0, 2, 1, 3), + ), + (batch_size * self._num_heads, -1, self._key_dim), + ) diff --git a/official/projects/longformer/longformer_attention_test.py b/official/projects/longformer/longformer_attention_test.py new file mode 100644 index 0000000000000000000000000000000000000000..9211987e62ab752954bb345f9daf635164cbd12d --- /dev/null +++ b/official/projects/longformer/longformer_attention_test.py @@ -0,0 +1,306 @@ 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for official.projects.longformer.longformer_attention.""" + +import numpy as np +import tensorflow as tf + +from official.modeling.tf_utils import get_shape_list +from official.projects.longformer import longformer_attention + + +def _create_mock_attention_data(num_heads, + key_dim, + value_dim, + q_seq_length, + kv_seq_length, + batch_size, + include_mask=False): + """Creates mock testing data. + + Args: + num_heads: `int`, Number of attention heads. + key_dim: `int`, Hidden size of the query. + value_dim: `int`, Hidden size of the key and value. + q_seq_length: `int`, query sequence length of the input. + kv_seq_length: `int`, key, value sequence length of the input. + batch_size: `int`, the batch size. + include_mask: optional `bool`, whether or not to include mask data. + + Returns: + A dictionary with `str` as keys and `Tensor` as values.
+ """ + query_shape = (batch_size, q_seq_length, key_dim) + value_shape = (batch_size, kv_seq_length, value_dim) + + data = dict( + query=tf.random.normal(shape=query_shape), + value=tf.random.normal(shape=value_shape), + key=tf.random.normal(shape=value_shape)) + + total_seq_length = kv_seq_length + + if include_mask: + mask_shape = (batch_size, num_heads, q_seq_length, total_seq_length) + mask_data = np.random.randint(2, size=mask_shape).astype('float32') + mask_data = dict(attention_mask=mask_data) + data.update(mask_data) + + return data + + +class LongformerAttentionTest(tf.test.TestCase): + + def setUp(self): + super(LongformerAttentionTest, self).setUp() + np.random.seed(0) + tf.random.set_seed(0) + + def _get_hidden_states(self): + return tf.convert_to_tensor( + [[ + [ + 4.98332758e-01, + 2.69175139e00, + -7.08081422e-03, + 1.04915401e00, + -1.83476661e00, + 7.67220476e-01, + 2.98580543e-01, + 2.84803992e-02, + ], + [ + -7.58357372e-01, + 4.20635998e-01, + -4.04739919e-02, + 1.59924145e-01, + 2.05135748e00, + -1.15997978e00, + 5.37166397e-01, + 2.62873606e-01, + ], + [ + -1.69438001e00, + 4.17574660e-01, + -1.49196962e00, + -1.76483717e00, + -1.94566312e-01, + -1.71183858e00, + 7.72903565e-01, + -1.11557056e00, + ], + [ + 5.44028163e-01, + 2.05466114e-01, + -3.63045868e-01, + 2.41865062e-01, + 3.20348382e-01, + -9.05611176e-01, + -1.92690727e-01, + -1.19917547e00, + ], + ]], + dtype=tf.float32, + ) + + def test_diagonalize(self): + hidden_states = self._get_hidden_states() + hidden_states = tf.reshape(hidden_states, + (1, 8, 4)) # set seq length = 8, hidden dim = 4 + chunked_hidden_states = longformer_attention.LongformerAttention._chunk( + hidden_states, window_overlap=2) + window_overlap_size = get_shape_list(chunked_hidden_states)[2] + self.assertEqual(window_overlap_size, 4) + + padded_hidden_states = longformer_attention.LongformerAttention._pad_and_diagonalize( + chunked_hidden_states) + + self.assertEqual( + get_shape_list(padded_hidden_states)[-1], 
+ get_shape_list(chunked_hidden_states)[-1] + window_overlap_size - 1) + + # first row => [0.4983, 2.6918, -0.0071, 1.0492, 0.0000, 0.0000, 0.0000] + tf.debugging.assert_near( + padded_hidden_states[0, 0, 0, :4], + chunked_hidden_states[0, 0, 0], + rtol=1e-3) + tf.debugging.assert_near( + padded_hidden_states[0, 0, 0, 4:], + tf.zeros((3,), dtype=tf.dtypes.float32), + rtol=1e-3) + + # last row => [0.0000, 0.0000, 0.0000, 2.0514, -1.1600, 0.5372, 0.2629] + tf.debugging.assert_near( + padded_hidden_states[0, 0, -1, 3:], + chunked_hidden_states[0, 0, -1], + rtol=1e-3) + tf.debugging.assert_near( + padded_hidden_states[0, 0, -1, :3], + tf.zeros((3,), dtype=tf.dtypes.float32), + rtol=1e-3) + + def test_pad_and_transpose_last_two_dims(self): + hidden_states = self._get_hidden_states() + self.assertEqual(get_shape_list(hidden_states), [1, 4, 8]) + + # pad along seq length dim + paddings = tf.constant([[0, 0], [0, 0], [0, 1], [0, 0]], + dtype=tf.dtypes.int32) + + hidden_states = longformer_attention.LongformerAttention._chunk( + hidden_states, window_overlap=2) + padded_hidden_states = longformer_attention.LongformerAttention._pad_and_transpose_last_two_dims( + hidden_states, paddings) + self.assertEqual(get_shape_list(padded_hidden_states), [1, 1, 8, 5]) + + expected_added_dim = tf.zeros((5,), dtype=tf.dtypes.float32) + tf.debugging.assert_near( + expected_added_dim, padded_hidden_states[0, 0, -1, :], rtol=1e-6) + tf.debugging.assert_near( + hidden_states[0, 0, -1, :], + tf.reshape(padded_hidden_states, (1, -1))[0, 24:32], + rtol=1e-6) + + def test_mask_invalid_locations(self): + hidden_states = self._get_hidden_states() + batch_size = 1 + seq_length = 8 + hidden_size = 4 + hidden_states = tf.reshape(hidden_states, + (batch_size, seq_length, hidden_size)) + hidden_states = longformer_attention.LongformerAttention._chunk( + hidden_states, window_overlap=2) + + hid_states_1 = longformer_attention.LongformerAttention._mask_invalid_locations( + hidden_states, 1) + hid_states_2
= longformer_attention.LongformerAttention._mask_invalid_locations( + hidden_states, 2) + hid_states_3 = longformer_attention.LongformerAttention._mask_invalid_locations( + hidden_states[:, :, :, :3], 2) + hid_states_4 = longformer_attention.LongformerAttention._mask_invalid_locations( + hidden_states[:, :, 2:, :], 2) + + self.assertEqual( + tf.math.reduce_sum( + tf.cast(tf.math.is_inf(hid_states_1), tf.dtypes.int32)), 8) + self.assertEqual( + tf.math.reduce_sum( + tf.cast(tf.math.is_inf(hid_states_2), tf.dtypes.int32)), 24) + self.assertEqual( + tf.math.reduce_sum( + tf.cast(tf.math.is_inf(hid_states_3), tf.dtypes.int32)), 24) + self.assertEqual( + tf.math.reduce_sum( + tf.cast(tf.math.is_inf(hid_states_4), tf.dtypes.int32)), 12) + + def test_chunk(self): + hidden_states = self._get_hidden_states() + batch_size = 1 + seq_length = 8 + hidden_size = 4 + hidden_states = tf.reshape(hidden_states, + (batch_size, seq_length, hidden_size)) + + chunked_hidden_states = longformer_attention.LongformerAttention._chunk( + hidden_states, window_overlap=2) + + # expected slices across chunk and seq length dim + expected_slice_along_seq_length = tf.convert_to_tensor( + [0.4983, -0.7584, -1.6944], dtype=tf.dtypes.float32) + expected_slice_along_chunk = tf.convert_to_tensor( + [0.4983, -1.8348, -0.7584, 2.0514], dtype=tf.dtypes.float32) + + self.assertEqual(get_shape_list(chunked_hidden_states), [1, 3, 4, 4]) + tf.debugging.assert_near( + chunked_hidden_states[0, :, 0, 0], + expected_slice_along_seq_length, + rtol=1e-3) + tf.debugging.assert_near( + chunked_hidden_states[0, 0, :, 0], + expected_slice_along_chunk, + rtol=1e-3) + + def test_layer_local_attn(self): + hidden_states = self._get_hidden_states() + batch_size, seq_length, _ = hidden_states.shape + layer = longformer_attention.LongformerAttention( + num_heads=2, + key_dim=4, + value_dim=4, + layer_id=0, + attention_window=4, + global_attention_size=0, + ) + + attention_mask = tf.zeros((batch_size, seq_length), 
dtype=tf.dtypes.float32) + is_index_global_attn = tf.math.greater(attention_mask, 1) + + attention_mask = tf.where( + tf.range(4)[None, :, None, None] > 1, -10000.0, + attention_mask[:, :, None, None]) + is_index_masked = tf.math.less(attention_mask[:, :, 0, 0], 0) + + output_hidden_states = layer( + hidden_states=hidden_states, + attention_mask=attention_mask, + is_index_masked=is_index_masked, + is_index_global_attn=is_index_global_attn, + )[0] + + self.assertTrue(output_hidden_states.shape, (1, 4, 8)) + + def test_layer_global_attn(self): + layer = longformer_attention.LongformerAttention( + num_heads=2, + key_dim=4, + value_dim=4, + layer_id=0, + attention_window=4, + global_attention_size=1, + ) + hidden_states = self._get_hidden_states() + + hidden_states = tf.concat( + [self._get_hidden_states(), + self._get_hidden_states() - 0.5], axis=0) + _, seq_length, _ = hidden_states.shape + + # create attn mask + attention_mask_1 = tf.zeros((1, 1, 1, seq_length), dtype=tf.dtypes.float32) + attention_mask_2 = tf.zeros((1, 1, 1, seq_length), dtype=tf.dtypes.float32) + + attention_mask_1 = tf.where( + tf.range(4)[None, :, None, None] == 0, 10000.0, attention_mask_1) + attention_mask_1 = tf.where( + tf.range(4)[None, :, None, None] > 2, -10000.0, attention_mask_1) + attention_mask_2 = tf.where( + tf.range(4)[None, :, None, None] == 0, 10000.0, attention_mask_2) + attention_mask = tf.concat([attention_mask_1, attention_mask_2], axis=0) + + is_index_masked = tf.math.less(attention_mask[:, :, 0, 0], 0) + is_index_global_attn = tf.math.greater(attention_mask[:, :, 0, 0], 0) + + output_hidden_states = layer( + hidden_states=hidden_states, + attention_mask=-tf.math.abs(attention_mask), + is_index_masked=is_index_masked, + is_index_global_attn=is_index_global_attn, + )[0] + + self.assertTrue(output_hidden_states.shape, (2, 4, 8)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/longformer/longformer_encoder.py 
b/official/projects/longformer/longformer_encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..c5a29dc4496d0574403188837a884a3be8d44112 --- /dev/null +++ b/official/projects/longformer/longformer_encoder.py @@ -0,0 +1,365 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Longformer encoder. Modified from huggingface/transformers.""" + +# pylint: disable=g-classes-have-attributes + +from typing import Any, Callable, List, Optional, Union + +from absl import logging +import tensorflow as tf + +from official.modeling.tf_utils import get_shape_list +from official.nlp.modeling import layers +from official.projects.longformer.longformer_encoder_block import LongformerEncoderBlock + + +_Initializer = Union[str, tf.keras.initializers.Initializer] +_approx_gelu = lambda x: tf.keras.activations.gelu(x, approximate=True) + + +class LongformerEncoder(tf.keras.layers.Layer): + """LongformerEncoder. + + Args: + vocab_size: The size of the token vocabulary. + attention_window: list of ints representing the window size for each layer. + global_attention_size: the number of leading tokens in each sequence that + use global attention. + pad_token_id: the token id for the pad token. + hidden_size: The size of the transformer hidden layers. + num_layers: The number of transformer layers. + num_attention_heads: The number of attention heads for each transformer. The
The + hidden size must be divisible by the number of attention heads. + max_sequence_length: The maximum sequence length that this encoder can + consume. If None, max_sequence_length uses the value from sequence length. + This determines the variable shape for positional embeddings. + type_vocab_size: The number of types that the 'type_ids' input can take. + inner_dim: The output dimension of the first Dense layer in a two-layer + feedforward network for each transformer. + inner_activation: The activation for the first Dense layer in a two-layer + feedforward network for each transformer. + output_dropout: Dropout probability for the post-attention and output + dropout. + attention_dropout: The dropout rate to use for the attention layers within + the transformer layers. + initializer: The initialzer to use for all weights in this encoder. + output_range: The sequence output range, [0, output_range), by slicing the + target sequence of the last transformer layer. `None` means the entire + target sequence will attend to the source sequence, which yields the full + output. + embedding_width: The width of the word embeddings. If the embedding width is + not equal to hidden size, embedding parameters will be factorized into two + matrices in the shape of ['vocab_size', 'embedding_width'] and + ['embedding_width', 'hidden_size'] ('embedding_width' is usually much + smaller than 'hidden_size'). + embedding_layer: An optional Layer instance which will be called to generate + embeddings for the input word IDs. + norm_first: Whether to normalize inputs to attention and intermediate dense + layers. If set False, output of attention and intermediate dense layers is + normalized. 
+ """ + + def __init__( + self, + vocab_size: int, + attention_window: Union[List[int], int] = 512, + global_attention_size: int = 0, + pad_token_id: int = 1, + hidden_size: int = 768, + num_layers: int = 12, + num_attention_heads: int = 12, + max_sequence_length: int = 512, + type_vocab_size: int = 16, + inner_dim: int = 3072, + inner_activation: Callable[..., Any] = _approx_gelu, + output_dropout: float = 0.1, + attention_dropout: float = 0.1, + initializer: _Initializer = tf.keras.initializers.TruncatedNormal( + stddev=0.02), + output_range: Optional[int] = None, + embedding_width: Optional[int] = None, + embedding_layer: Optional[tf.keras.layers.Layer] = None, + norm_first: bool = False, + **kwargs): + super().__init__(**kwargs) + # Longformer args + if isinstance(attention_window, int): + # A single int applies the same window size to every layer. + attention_window = [attention_window] * num_layers + self._attention_window = attention_window + self._global_attention_size = global_attention_size + self._pad_token_id = pad_token_id + + activation = tf.keras.activations.get(inner_activation) + initializer = tf.keras.initializers.get(initializer) + + if embedding_width is None: + embedding_width = hidden_size + + if embedding_layer is None: + self._embedding_layer = layers.OnDeviceEmbedding( + vocab_size=vocab_size, + embedding_width=embedding_width, + initializer=initializer, + name='word_embeddings') + else: + self._embedding_layer = embedding_layer + + self._position_embedding_layer = layers.PositionEmbedding( + initializer=initializer, + max_length=max_sequence_length, + name='position_embedding') + + self._type_embedding_layer = layers.OnDeviceEmbedding( + vocab_size=type_vocab_size, + embedding_width=embedding_width, + initializer=initializer, + use_one_hot=True, + name='type_embeddings') + + self._embedding_norm_layer = tf.keras.layers.LayerNormalization( + name='embeddings/layer_norm', axis=-1, epsilon=1e-12, dtype=tf.float32) + + self._embedding_dropout = tf.keras.layers.Dropout( + rate=output_dropout, name='embedding_dropout') + + # We project the 'embedding' output to 'hidden_size' if it is not already
# 'hidden_size'. + self._embedding_projection = None + if embedding_width != hidden_size: + self._embedding_projection = tf.keras.layers.EinsumDense( + '...x,xy->...y', + output_shape=hidden_size, + bias_axes='y', + kernel_initializer=initializer, + name='embedding_projection') + + self._transformer_layers = [] + self._attention_mask_layer = layers.SelfAttentionMask( + name='self_attention_mask') + for i in range(num_layers): + layer = LongformerEncoderBlock( + global_attention_size=global_attention_size, + num_attention_heads=num_attention_heads, + inner_dim=inner_dim, + inner_activation=inner_activation, + attention_window=attention_window[i], + layer_id=i, + output_dropout=output_dropout, + attention_dropout=attention_dropout, + norm_first=norm_first, + output_range=output_range if i == num_layers - 1 else None, + kernel_initializer=initializer, + name=f'transformer/layer_{i}') + self._transformer_layers.append(layer) + + self._pooler_layer = tf.keras.layers.Dense( + units=hidden_size, + activation='tanh', + kernel_initializer=initializer, + name='pooler_transform') + + self._config = { + 'vocab_size': vocab_size, + 'hidden_size': hidden_size, + 'num_layers': num_layers, + 'num_attention_heads': num_attention_heads, + 'max_sequence_length': max_sequence_length, + 'type_vocab_size': type_vocab_size, + 'inner_dim': inner_dim, + 'inner_activation': tf.keras.activations.serialize(activation), + 'output_dropout': output_dropout, + 'attention_dropout': attention_dropout, + 'initializer': tf.keras.initializers.serialize(initializer), + 'output_range': output_range, + 'embedding_width': embedding_width, + 'embedding_layer': embedding_layer, + 'norm_first': norm_first, + 'attention_window': attention_window, + 'global_attention_size': global_attention_size, + 'pad_token_id': pad_token_id, + } + self.inputs = dict( + input_word_ids=tf.keras.Input(shape=(None,), dtype=tf.int32), + input_mask=tf.keras.Input(shape=(None,), dtype=tf.int32), + 
input_type_ids=tf.keras.Input(shape=(None,), dtype=tf.int32)) + + def call(self, inputs): + word_embeddings = None + if isinstance(inputs, dict): + word_ids = inputs.get('input_word_ids') # input_ids + mask = inputs.get('input_mask') # attention_mask + type_ids = inputs.get('input_type_ids') # token_type_ids + word_embeddings = inputs.get('input_word_embeddings', + None) # input_embeds + else: + raise ValueError(f'Unexpected inputs type to {self.__class__}.') + + ( + padding_len, + word_ids, + mask, + type_ids, + word_embeddings, + ) = self._pad_to_window_size( + word_ids=word_ids, + mask=mask, + type_ids=type_ids, + word_embeddings=word_embeddings, + pad_token_id=self._pad_token_id) + + if word_embeddings is None: + word_embeddings = self._embedding_layer(word_ids) + # absolute position embeddings. + position_embeddings = self._position_embedding_layer(word_embeddings) + type_embeddings = self._type_embedding_layer(type_ids) + + embeddings = word_embeddings + position_embeddings + type_embeddings + embeddings = self._embedding_norm_layer(embeddings) + embeddings = self._embedding_dropout(embeddings) + + if self._embedding_projection is not None: + embeddings = self._embedding_projection(embeddings) + + batch_size, seq_len = get_shape_list(mask) + # create masks with fixed len global_attention_size + mask = tf.transpose( + tf.concat( + values=[ + tf.ones( + (self._global_attention_size, batch_size), tf.int32) * 2, + tf.transpose(mask)[self._global_attention_size:] + ], + axis=0)) + + is_index_masked = tf.math.less(mask, 1) + + is_index_global_attn = tf.transpose( + tf.concat( + values=[ + tf.ones((self._global_attention_size, batch_size), tf.bool), + tf.zeros((seq_len - self._global_attention_size, batch_size), + tf.bool) + ], + axis=0)) + + # Longformer + attention_mask = mask + extended_attention_mask = tf.reshape( + attention_mask, (tf.shape(mask)[0], tf.shape(mask)[1], 1, 1)) + attention_mask = tf.cast( + tf.math.abs(1 - extended_attention_mask), 
tf.dtypes.float32) * -10000.0 + + encoder_outputs = [] + x = embeddings + # TFLongformerEncoder + for layer in self._transformer_layers: + x = layer([x, attention_mask, is_index_masked, is_index_global_attn]) + encoder_outputs.append(x) + + last_encoder_output = encoder_outputs[-1] + if padding_len > 0: + last_encoder_output = last_encoder_output[:, :-padding_len] + first_token_tensor = last_encoder_output[:, 0, :] + pooled_output = self._pooler_layer(first_token_tensor) + + return dict( + sequence_output=last_encoder_output, + pooled_output=pooled_output, + encoder_outputs=encoder_outputs) + + def get_embedding_table(self): + return self._embedding_layer.embeddings + + def get_embedding_layer(self): + return self._embedding_layer + + def get_config(self): + return dict(self._config) + + @property + def transformer_layers(self): + """List of Transformer layers in the encoder.""" + return self._transformer_layers + + @property + def pooler_layer(self): + """The pooler dense layer after the transformer layers.""" + return self._pooler_layer + + @classmethod + def from_config(cls, config, custom_objects=None): + if 'embedding_layer' in config and config['embedding_layer'] is not None: + warn_string = ( + 'You are reloading a model that was saved with a ' + 'potentially-shared embedding layer object. If you continue to ' + 'train this model, the embedding layer will no longer be shared. ' + 'To work around this, load the model outside of the Keras API.') + print('WARNING: ' + warn_string) + logging.warning(warn_string) + + return cls(**config) + + def _pad_to_window_size( + self, + word_ids, + mask, + type_ids, + word_embeddings, + pad_token_id, + ): + # padding + attention_window = max(self._attention_window) + + assert (attention_window % + 2 == 0), ('`attention_window` should be an even value.'
+ f' Given {attention_window}.') + + input_shape = get_shape_list( + word_ids) if word_ids is not None else get_shape_list(word_embeddings) + batch_size, seq_len = input_shape[:2] + + if seq_len is not None: + padding_len = (attention_window - + seq_len % attention_window) % attention_window + else: + padding_len = 0 + + paddings = tf.convert_to_tensor([[0, 0], [0, padding_len]]) + + if word_ids is not None: + word_ids = tf.pad(word_ids, paddings, constant_values=pad_token_id) + + if word_embeddings is not None: + + def pad_embeddings(): + word_ids_padding = tf.fill((batch_size, padding_len), pad_token_id) + word_embeddings_padding = self._embedding_layer(word_ids_padding) + return tf.concat([word_embeddings, word_embeddings_padding], axis=-2) + + word_embeddings = tf.cond( + tf.math.greater(padding_len, 0), pad_embeddings, + lambda: word_embeddings) + + mask = tf.pad( + mask, paddings, + constant_values=0) # no attention on the padding tokens + token_type_ids = tf.pad( + type_ids, paddings, constant_values=0) # pad with token_type_id = 0 + + return ( + padding_len, + word_ids, + mask, + token_type_ids, + word_embeddings, + ) diff --git a/official/projects/longformer/longformer_encoder_block.py b/official/projects/longformer/longformer_encoder_block.py new file mode 100644 index 0000000000000000000000000000000000000000..1253477d37f69774aa36ea1fe8cbf3b83c12f300 --- /dev/null +++ b/official/projects/longformer/longformer_encoder_block.py @@ -0,0 +1,340 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""Longformer encoder block. Modified from huggingface/transformers.""" + +import tensorflow as tf +from official.projects.longformer.longformer_attention import LongformerAttention + + +@tf.keras.utils.register_keras_serializable(package="Text") +class LongformerEncoderBlock(tf.keras.layers.Layer): + """LongformerEncoderBlock. + + Args: + global_attention_size: The number of leading tokens that use global + attention. + num_attention_heads: Number of attention heads. + inner_dim: The output dimension of the first Dense layer in a two-layer + feedforward network. + inner_activation: The activation for the first Dense layer in a two-layer + feedforward network. + attention_window: The size of the local attention window for this layer. + layer_id: The index of this layer within the encoder. + output_range: the sequence output range, [0, output_range) for slicing the + target sequence. `None` means the target sequence is not sliced. + kernel_initializer: Initializer for dense layer kernels. + bias_initializer: Initializer for dense layer biases. + kernel_regularizer: Regularizer for dense layer kernels. + bias_regularizer: Regularizer for dense layer biases. + activity_regularizer: Regularizer for dense layer activity. + kernel_constraint: Constraint for dense layer kernels. + bias_constraint: Constraint for dense layer biases. + use_bias: Whether to enable use_bias in attention layer. If set False, + use_bias in attention layer is disabled. + norm_first: Whether to normalize inputs to attention and intermediate + dense layers. If set False, output of attention and intermediate dense + layers is normalized. + norm_epsilon: Epsilon value to initialize normalization layers. + output_dropout: Dropout probability for the post-attention and output + dropout. + attention_dropout: Dropout probability within the attention layer. + inner_dropout: Dropout probability for the first Dense layer in a + two-layer feedforward network. + attention_initializer: Initializer for kernels of attention layers.
If set + `None`, attention layers use kernel_initializer as initializer for + kernel. + attention_axes: axes over which the attention is applied. `None` means + attention over all axes except batch, heads, and features. + **kwargs: keyword arguments. + """ + + def __init__( + self, + global_attention_size, + num_attention_heads, + inner_dim, + inner_activation, + # Longformer + attention_window, + layer_id=0, + output_range=None, + kernel_initializer="glorot_uniform", + bias_initializer="zeros", + kernel_regularizer=None, + bias_regularizer=None, + activity_regularizer=None, + kernel_constraint=None, + bias_constraint=None, + use_bias=True, + norm_first=False, + norm_epsilon=1e-12, + output_dropout=0.0, + attention_dropout=0.0, + inner_dropout=0.0, + attention_initializer=None, + attention_axes=None, + **kwargs): + super().__init__(**kwargs) + + self.global_attention_size = global_attention_size + self._num_heads = num_attention_heads + self._inner_dim = inner_dim + self._inner_activation = inner_activation + # Longformer + self._attention_window = attention_window + self._layer_id = layer_id + self._attention_dropout = attention_dropout + self._attention_dropout_rate = attention_dropout + self._output_dropout = output_dropout + self._output_dropout_rate = output_dropout + self._output_range = output_range + self._kernel_initializer = tf.keras.initializers.get(kernel_initializer) + self._bias_initializer = tf.keras.initializers.get(bias_initializer) + self._kernel_regularizer = tf.keras.regularizers.get(kernel_regularizer) + self._bias_regularizer = tf.keras.regularizers.get(bias_regularizer) + self._activity_regularizer = tf.keras.regularizers.get(activity_regularizer) + self._kernel_constraint = tf.keras.constraints.get(kernel_constraint) + self._bias_constraint = tf.keras.constraints.get(bias_constraint) + self._use_bias = use_bias + self._norm_first = norm_first + self._norm_epsilon = norm_epsilon + self._inner_dropout = inner_dropout + if attention_initializer:
+ self._attention_initializer = tf.keras.initializers.get( + attention_initializer) + else: + self._attention_initializer = self._kernel_initializer + self._attention_axes = attention_axes + + def build(self, input_shape): + if isinstance(input_shape, tf.TensorShape): + input_tensor_shape = input_shape + elif isinstance(input_shape, (list, tuple)): + input_tensor_shape = tf.TensorShape(input_shape[0]) + else: + raise ValueError( + f"The type of input shape argument is not supported, got: " + f"{type(input_shape)}") + einsum_equation = "abc,cd->abd" + if len(input_tensor_shape.as_list()) > 3: + einsum_equation = "...bc,cd->...bd" + hidden_size = input_tensor_shape[-1] + if hidden_size % self._num_heads != 0: + raise ValueError( + f"The input size ({hidden_size}) is not a multiple of the number of attention " + f"heads ({self._num_heads})") + self._attention_head_size = int(hidden_size // self._num_heads) + common_kwargs = dict( + bias_initializer=self._bias_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activity_regularizer=self._activity_regularizer, + kernel_constraint=self._kernel_constraint, + bias_constraint=self._bias_constraint) + # TFLongformerSelfAttention + TFLongformerSelfOutput.dense + self._attention_layer = LongformerAttention( + # Longformer + layer_id=self._layer_id, + global_attention_size=self.global_attention_size, + attention_window=self._attention_window, + num_heads=self._num_heads, + key_dim=self._attention_head_size, + dropout=self._attention_dropout, + use_bias=self._use_bias, + kernel_initializer=self._attention_initializer, + attention_axes=self._attention_axes, + name="self_attention", + **common_kwargs) + # TFLongformerSelfOutput.dropout + self._attention_dropout = tf.keras.layers.Dropout(rate=self._output_dropout) + # Use float32 in layernorm for numeric stability. + # It is probably safe in mixed_float16, but we haven't validated this yet. 
+ # TFLongformerSelfOutput.Layernorm + self._attention_layer_norm = ( + tf.keras.layers.LayerNormalization( + name="self_attention_layer_norm", + axis=-1, + epsilon=self._norm_epsilon, + dtype=tf.float32)) + # TFLongformerIntermediate + # TFLongformerIntermediate.dense + self._intermediate_dense = tf.keras.layers.EinsumDense( + einsum_equation, + output_shape=(None, self._inner_dim), + bias_axes="d", + kernel_initializer=self._kernel_initializer, + name="intermediate", + **common_kwargs) + policy = tf.keras.mixed_precision.global_policy() + if policy.name == "mixed_bfloat16": + # bfloat16 causes BERT with the LAMB optimizer to not converge + # as well, so we use float32. + # TODO(b/154538392): Investigate this. + policy = tf.float32 + # TFLongformerIntermediate.intermediate_act_fn + self._intermediate_activation_layer = tf.keras.layers.Activation( + self._inner_activation, dtype=policy) + self._inner_dropout_layer = tf.keras.layers.Dropout( + rate=self._inner_dropout) + # TFLongformerOutput + # TFLongformerOutput.dense + self._output_dense = tf.keras.layers.EinsumDense( + einsum_equation, + output_shape=(None, hidden_size), + bias_axes="d", + name="output", + kernel_initializer=self._kernel_initializer, + **common_kwargs) + # TFLongformerOutput.dropout + self._output_dropout = tf.keras.layers.Dropout(rate=self._output_dropout) + # Use float32 in layernorm for numeric stability. 
+ # TFLongformerOutput.layernorm + self._output_layer_norm = tf.keras.layers.LayerNormalization( + name="output_layer_norm", + axis=-1, + epsilon=self._norm_epsilon, + dtype=tf.float32) + + super().build(input_shape) + + def get_config(self): + config = { + "num_attention_heads": + self._num_heads, + "inner_dim": + self._inner_dim, + "inner_activation": + self._inner_activation, + "output_dropout": + self._output_dropout_rate, + "attention_dropout": + self._attention_dropout_rate, + "output_range": + self._output_range, + "kernel_initializer": + tf.keras.initializers.serialize(self._kernel_initializer), + "bias_initializer": + tf.keras.initializers.serialize(self._bias_initializer), + "kernel_regularizer": + tf.keras.regularizers.serialize(self._kernel_regularizer), + "bias_regularizer": + tf.keras.regularizers.serialize(self._bias_regularizer), + "activity_regularizer": + tf.keras.regularizers.serialize(self._activity_regularizer), + "kernel_constraint": + tf.keras.constraints.serialize(self._kernel_constraint), + "bias_constraint": + tf.keras.constraints.serialize(self._bias_constraint), + "use_bias": + self._use_bias, + "norm_first": + self._norm_first, + "norm_epsilon": + self._norm_epsilon, + "inner_dropout": + self._inner_dropout, + "attention_initializer": + tf.keras.initializers.serialize(self._attention_initializer), + "attention_axes": + self._attention_axes, + } + base_config = super().get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs): + """Transformer self-attention encoder block call. + + Args: + inputs: a single tensor or a list of four tensors. A single tensor is the + sequence of embeddings, used without masks. A list is unpacked as + [`input tensor`, `attention mask`, `is_index_masked`, + `is_index_global_attn`]; the last two mark masked and global-attention + positions for the Longformer attention layer. Separate key/value inputs are not supported. 
+ + Returns: + An output tensor with the same dimensions as the input tensor (truncated to `output_range` positions when it is set).
+ """ + if isinstance(inputs, (list, tuple)): + if len(inputs) == 4: + ( + input_tensor, + attention_mask, + is_index_masked, + is_index_global_attn, + ) = inputs + key_value = None + elif len(inputs) == 5: + raise NotImplementedError("Separate key/value inputs are not supported.") + else: + raise ValueError( + f"Unexpected inputs to {self.__class__} with length {len(inputs)}" + ) + else: + input_tensor = inputs + attention_mask = None + is_index_masked = None + is_index_global_attn = None + key_value = None + + if self._output_range: + if self._norm_first: + source_tensor = input_tensor[:, 0:self._output_range, :] + input_tensor = self._attention_layer_norm(input_tensor) + if key_value is not None: + key_value = self._attention_layer_norm(key_value) + target_tensor = input_tensor[:, 0:self._output_range, :] + if attention_mask is not None: + attention_mask = attention_mask[:, 0:self._output_range, :] + if is_index_masked is not None: + is_index_masked = is_index_masked[:, 0:self._output_range] + if is_index_global_attn is not None: + is_index_global_attn = is_index_global_attn[:, 0:self._output_range] + else: + if self._norm_first: + source_tensor = input_tensor + input_tensor = self._attention_layer_norm(input_tensor) + if key_value is not None: + key_value = self._attention_layer_norm(key_value) + target_tensor = input_tensor + + if key_value is None: + key_value = input_tensor + attention_output = self._attention_layer( + hidden_states=target_tensor, + attention_mask=attention_mask, + is_index_masked=is_index_masked, + is_index_global_attn=is_index_global_attn, + ) + # TFLongformerAttention.TFLongformerSelfOutput.* - {.dense} + attention_output = self._attention_dropout(attention_output) + if self._norm_first: + attention_output = source_tensor + attention_output + else: + attention_output = self._attention_layer_norm(target_tensor + + attention_output) + if self._norm_first: + source_attention_output = attention_output + attention_output = 
self._output_layer_norm(attention_output) + #
TFLongformerIntermediate + inner_output = self._intermediate_dense(attention_output) + inner_output = self._intermediate_activation_layer(inner_output) + inner_output = self._inner_dropout_layer(inner_output) + # TFLongformerOutput + layer_output = self._output_dense(inner_output) + layer_output = self._output_dropout(layer_output) + + if self._norm_first: + return source_attention_output + layer_output + + # During mixed precision training, layer norm output is always fp32 for now. + # Casts fp32 for the subsequent add. + layer_output = tf.cast(layer_output, tf.float32) + return self._output_layer_norm(layer_output + attention_output) diff --git a/official/projects/longformer/longformer_encoder_test.py b/official/projects/longformer/longformer_encoder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..cf24d7c926bed5ec94c4f4528a72856ebd3b7d35 --- /dev/null +++ b/official/projects/longformer/longformer_encoder_test.py @@ -0,0 +1,97 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for official.projects.longformer.longformer_encoder.""" + +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from tensorflow.python.distribute import combinations +from official.projects.longformer.longformer_encoder import LongformerEncoder + + +class LongformerEncoderTest(parameterized.TestCase, tf.test.TestCase): + + def setUp(self): + super(LongformerEncoderTest, self).setUp() + np.random.seed(0) + tf.random.set_seed(0) + + @combinations.generate( + combinations.combine( + attention_window=[32, 128], global_attention_size=[0, 1, 2])) + def test_encoder(self, attention_window, global_attention_size): + sequence_length = 128 + batch_size = 2 + vocab_size = 1024 + hidden_size = 256 + network = LongformerEncoder( + global_attention_size=global_attention_size, + vocab_size=vocab_size, + attention_window=[attention_window], + hidden_size=hidden_size, + num_layers=1, + num_attention_heads=4, + max_sequence_length=512) + word_id_data = np.random.randint( + vocab_size, size=(batch_size, sequence_length), dtype=np.int32) + mask_data = np.random.randint( + 2, size=(batch_size, sequence_length), dtype=np.int32) + type_id_data = np.random.randint( + 2, size=(batch_size, sequence_length), dtype=np.int32) + inputs = { + 'input_word_ids': word_id_data, + 'input_mask': mask_data, + 'input_type_ids': type_id_data, + } + outputs = network(inputs) + self.assertEqual(outputs['sequence_output'].shape, + (batch_size, sequence_length, hidden_size)) + + @combinations.generate( + combinations.combine( + norm_first=[True, False], global_attention_size=[0, 1, 2])) + def test_norm_first(self, norm_first, global_attention_size): + sequence_length = 128 + batch_size = 2 + vocab_size = 1024 + hidden_size = 256 + network = LongformerEncoder( + global_attention_size=global_attention_size, + vocab_size=vocab_size, + attention_window=[32], + hidden_size=hidden_size, + num_layers=1, + num_attention_heads=4, + 
max_sequence_length=512, +
norm_first=norm_first) + word_id_data = np.random.randint( + vocab_size, size=(batch_size, sequence_length), dtype=np.int32) + mask_data = np.random.randint( + 2, size=(batch_size, sequence_length), dtype=np.int32) + type_id_data = np.random.randint( + 2, size=(batch_size, sequence_length), dtype=np.int32) + inputs = { + 'input_word_ids': word_id_data, + 'input_mask': mask_data, + 'input_type_ids': type_id_data, + } + outputs = network(inputs) + self.assertEqual(outputs['sequence_output'].shape, + (batch_size, sequence_length, hidden_size)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/longformer/longformer_experiments.py b/official/projects/longformer/longformer_experiments.py new file mode 100644 index 0000000000000000000000000000000000000000..e93672806d849bc7801b3d3f13f44af50b0ea320 --- /dev/null +++ b/official/projects/longformer/longformer_experiments.py @@ -0,0 +1,123 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Longformer experiments.""" +# pylint: disable=g-doc-return-or-yield,line-too-long +import dataclasses + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import optimization +from official.nlp.configs import bert +from official.nlp.configs import encoders +from official.nlp.data import pretrain_dataloader +from official.nlp.data import sentence_prediction_dataloader +from official.nlp.tasks import masked_lm +from official.nlp.tasks import sentence_prediction +from official.projects.longformer.longformer import LongformerEncoderConfig + + +AdamWeightDecay = optimization.AdamWeightDecayConfig +PolynomialLr = optimization.PolynomialLrConfig +PolynomialWarmupConfig = optimization.PolynomialWarmupConfig + + +@dataclasses.dataclass +class LongformerOptimizationConfig(optimization.OptimizationConfig): + """Longformer optimization configuration.""" + optimizer: optimization.OptimizerConfig = optimization.OptimizerConfig( + type='adamw', + adamw=AdamWeightDecay( + weight_decay_rate=0.01, + exclude_from_weight_decay=['LayerNorm', 'layer_norm', 'bias'], + epsilon=1e-6)) + learning_rate: optimization.LrConfig = optimization.LrConfig( + type='polynomial', + polynomial=PolynomialLr( + initial_learning_rate=1e-4, + decay_steps=1000000, + end_learning_rate=0.0)) + warmup: optimization.WarmupConfig = optimization.WarmupConfig( + type='polynomial', polynomial=PolynomialWarmupConfig(warmup_steps=10000)) + + +@exp_factory.register_config_factory('longformer/pretraining') +def longformer_pretraining() -> cfg.ExperimentConfig: + """Longformer pretraining experiment.""" + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(enable_xla=True), + task=masked_lm.MaskedLMConfig( + model=bert.PretrainerConfig( + encoder=encoders.EncoderConfig( + type='any', any=LongformerEncoderConfig()), + cls_heads=[ + bert.ClsHeadConfig( + inner_dim=768, + num_classes=2, + dropout_rate=0.1, + name='next_sentence') + ]), + 
train_data=pretrain_dataloader.BertPretrainDataConfig( + use_v2_feature_names=True), + validation_data=pretrain_dataloader.BertPretrainDataConfig( + use_v2_feature_names=True, is_training=False)), + trainer=cfg.TrainerConfig( + optimizer_config=LongformerOptimizationConfig(), train_steps=1000000), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + return config + + +@exp_factory.register_config_factory('longformer/glue') +def longformer_glue() -> cfg.ExperimentConfig: + """Longformer glue fine-tuning.""" + config = cfg.ExperimentConfig( + task=sentence_prediction.SentencePredictionConfig( + model=sentence_prediction.ModelConfig( + encoder=encoders.EncoderConfig( + type='any', any=LongformerEncoderConfig())), + train_data=sentence_prediction_dataloader + .SentencePredictionDataConfig(), + validation_data=sentence_prediction_dataloader + .SentencePredictionDataConfig( + is_training=False, drop_remainder=False)), + trainer=cfg.TrainerConfig( + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'adamw', + 'adamw': { + 'weight_decay_rate': + 0.01, + 'exclude_from_weight_decay': + ['LayerNorm', 'layer_norm', 'bias'], + } + }, + 'learning_rate': { + 'type': 'polynomial', + 'polynomial': { + 'initial_learning_rate': 3e-5, + 'end_learning_rate': 0.0, + } + }, + 'warmup': { + 'type': 'polynomial' + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + return config diff --git a/official/projects/longformer/train.py b/official/projects/longformer/train.py new file mode 100644 index 0000000000000000000000000000000000000000..5486c1902d8ec61613a88c554f4600fc20c0bcd7 --- /dev/null +++ b/official/projects/longformer/train.py @@ -0,0 +1,69 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""A customized training library for the specific task.""" + +from absl import app +from absl import flags +import gin + +from official.common import distribute_utils +from official.common import flags as tfm_flags +from official.core import task_factory +from official.core import train_lib +from official.core import train_utils +from official.modeling import performance +from official.projects.longformer import longformer_experiments # pylint: disable=unused-import + +FLAGS = flags.FLAGS + + +def main(_): + gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) + params = train_utils.parse_configuration(FLAGS) + model_dir = FLAGS.model_dir + if 'train' in FLAGS.mode: + # Pure eval modes do not output yaml files. Otherwise continuous eval job + # may race against the train job for writing the same file. + train_utils.serialize_config(params, model_dir) + + # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16' + # can have significant impact on model speeds by utilizing float16 in case of + # GPUs, and bfloat16 in the case of TPUs. 
loss_scale takes effect only when + # dtype is float16 + if params.runtime.mixed_precision_dtype: + performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) + distribution_strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=params.runtime.distribution_strategy, + all_reduce_alg=params.runtime.all_reduce_alg, + num_gpus=params.runtime.num_gpus, + tpu_address=params.runtime.tpu, + **params.runtime.model_parallelism()) + + with distribution_strategy.scope(): + task = task_factory.get_task(params.task, logging_dir=model_dir) + + train_lib.run_experiment( + distribution_strategy=distribution_strategy, + task=task, + mode=FLAGS.mode, + params=params, + model_dir=model_dir) + + train_utils.save_gin_config(FLAGS.mode, model_dir) + + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(main) diff --git a/official/projects/longformer/utils/convert_pretrained_pytorch_checkpoint_to_tf.py b/official/projects/longformer/utils/convert_pretrained_pytorch_checkpoint_to_tf.py new file mode 100644 index 0000000000000000000000000000000000000000..38fcc84e07792f5bc4970b30aba7e61ca20d10f8 --- /dev/null +++ b/official/projects/longformer/utils/convert_pretrained_pytorch_checkpoint_to_tf.py @@ -0,0 +1,200 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Converts a pre-trained PyTorch Longformer checkpoint into a TF encoder checkpoint.""" + +import os + +from absl import app +import numpy as np +import tensorflow as tf +import transformers + +from official.modeling import tf_utils +from official.projects.longformer.longformer import LongformerEncoderConfig +from official.projects.longformer.longformer_encoder import LongformerEncoder + + +def _get_pytorch_longformer_model(): + pretrained_lm = "allenai/longformer-base-4096" + + model = transformers.AutoModel.from_pretrained(pretrained_lm) + + return {n: p.data.numpy() for n, p in model.named_parameters()} + + +def _create_longformer_model(): + """Creates a Longformer model.""" + encoder_cfg = LongformerEncoderConfig() + encoder_cfg.vocab_size = 50265 + encoder_cfg.max_position_embeddings = 4098 + encoder_cfg.attention_window = [2] * encoder_cfg.num_layers + encoder_cfg.global_attention_size = 1 + encoder = LongformerEncoder( + attention_window=encoder_cfg.attention_window, + global_attention_size=encoder_cfg.global_attention_size, + vocab_size=encoder_cfg.vocab_size, + hidden_size=encoder_cfg.hidden_size, + num_layers=encoder_cfg.num_layers, + num_attention_heads=encoder_cfg.num_attention_heads, + inner_dim=encoder_cfg.intermediate_size, + inner_activation=tf_utils.get_activation(encoder_cfg.hidden_activation), + output_dropout=encoder_cfg.dropout_rate, + attention_dropout=encoder_cfg.attention_dropout_rate, + max_sequence_length=encoder_cfg.max_position_embeddings, + type_vocab_size=encoder_cfg.type_vocab_size, + initializer=tf.keras.initializers.TruncatedNormal( + stddev=encoder_cfg.initializer_range), + output_range=encoder_cfg.output_range, + embedding_width=encoder_cfg.embedding_size, + norm_first=encoder_cfg.norm_first) + return encoder + + +# pylint: disable=protected-access +def convert(encoder, allenai_model): + """Converts AllenAI Longformer weights into the Model Garden LongformerEncoder.""" + num_layers = 
encoder._config["num_layers"] + num_attention_heads =
encoder._config["num_attention_heads"] + hidden_size = encoder._config["hidden_size"] + head_size = hidden_size // num_attention_heads + assert head_size * num_attention_heads == hidden_size + encoder._embedding_layer.set_weights( + [allenai_model["embeddings.word_embeddings.weight"]]) + encoder._embedding_norm_layer.set_weights([ + allenai_model["embeddings.LayerNorm.weight"], + allenai_model["embeddings.LayerNorm.bias"] + ]) + encoder._type_embedding_layer.set_weights([ + np.repeat( + allenai_model["embeddings.token_type_embeddings.weight"], 2, axis=0) + ]) + encoder._position_embedding_layer.set_weights( + [allenai_model["embeddings.position_embeddings.weight"]]) + encoder._pooler_layer.set_weights([ + allenai_model["pooler.dense.weight"], allenai_model["pooler.dense.bias"] + ]) + for layer_num in range(num_layers): + encoder._transformer_layers[ + layer_num]._attention_layer._global_key_dense.set_weights([ + allenai_model[ + f"encoder.layer.{layer_num}.attention.self.key_global.weight"].T + .reshape( + (hidden_size, num_attention_heads, head_size)), allenai_model[ + f"encoder.layer.{layer_num}.attention.self.key_global.bias"] + .reshape((num_attention_heads, head_size)) + ]) + encoder._transformer_layers[ + layer_num]._attention_layer._global_query_dense.set_weights([ + allenai_model[ + f"encoder.layer.{layer_num}.attention.self.query_global.weight"] + .T.reshape((hidden_size, num_attention_heads, head_size)), + allenai_model[ + f"encoder.layer.{layer_num}.attention.self.query_global.bias"] + .reshape((num_attention_heads, head_size)) + ]) + encoder._transformer_layers[ + layer_num]._attention_layer._global_value_dense.set_weights([ + allenai_model[ + f"encoder.layer.{layer_num}.attention.self.value_global.weight"] + .T.reshape((hidden_size, num_attention_heads, head_size)), + allenai_model[ + f"encoder.layer.{layer_num}.attention.self.value_global.bias"] + .reshape((num_attention_heads, head_size)) + ]) + encoder._transformer_layers[ + 
layer_num]._attention_layer._key_dense.set_weights([ + allenai_model[ + f"encoder.layer.{layer_num}.attention.self.key.weight"].T + .reshape( + (hidden_size, num_attention_heads, head_size)), allenai_model[ + f"encoder.layer.{layer_num}.attention.self.key.bias"] + .reshape((num_attention_heads, head_size)) + ]) + encoder._transformer_layers[ + layer_num]._attention_layer._query_dense.set_weights([ + allenai_model[ + f"encoder.layer.{layer_num}.attention.self.query.weight"].T + .reshape((hidden_size, num_attention_heads, head_size)), + allenai_model[ + f"encoder.layer.{layer_num}.attention.self.query.bias"].reshape( + (num_attention_heads, head_size)) + ]) + encoder._transformer_layers[ + layer_num]._attention_layer._value_dense.set_weights([ + allenai_model[ + f"encoder.layer.{layer_num}.attention.self.value.weight"].T + .reshape((hidden_size, num_attention_heads, head_size)), + allenai_model[ + f"encoder.layer.{layer_num}.attention.self.value.bias"].reshape( + (num_attention_heads, head_size)) + ]) + encoder._transformer_layers[ + layer_num]._attention_layer._output_dense.set_weights([ + allenai_model[ + f"encoder.layer.{layer_num}.attention.output.dense.weight"].T, + allenai_model[ + f"encoder.layer.{layer_num}.attention.output.dense.bias"] + ]) + encoder._transformer_layers[layer_num]._attention_layer_norm.set_weights([ + allenai_model[ + f"encoder.layer.{layer_num}.attention.output.LayerNorm.weight"], + allenai_model[ + f"encoder.layer.{layer_num}.attention.output.LayerNorm.bias"] + ]) + encoder._transformer_layers[layer_num]._intermediate_dense.set_weights([ + allenai_model[f"encoder.layer.{layer_num}.intermediate.dense.weight"].T, + allenai_model[f"encoder.layer.{layer_num}.intermediate.dense.bias"] + ]) + encoder._transformer_layers[layer_num]._output_dense.set_weights([ + allenai_model[f"encoder.layer.{layer_num}.output.dense.weight"].T, + allenai_model[f"encoder.layer.{layer_num}.output.dense.bias"] + ]) +
encoder._transformer_layers[layer_num]._output_layer_norm.set_weights([ + allenai_model[f"encoder.layer.{layer_num}.output.LayerNorm.weight"], + allenai_model[f"encoder.layer.{layer_num}.output.LayerNorm.bias"] + ]) + + +def convert_checkpoint(output_path): + """Converts and saves the checkpoint.""" + output_dir, _ = os.path.split(output_path) + tf.io.gfile.makedirs(output_dir) + + encoder = _create_longformer_model() + allenai_model = _get_pytorch_longformer_model() + sequence_length = 128 + batch_size = 2 + word_id_data = np.random.randint( + 10, size=(batch_size, sequence_length), dtype=np.int32) + mask_data = np.random.randint( + 2, size=(batch_size, sequence_length), dtype=np.int32) + type_id_data = np.random.randint( + 2, size=(batch_size, sequence_length), dtype=np.int32) + inputs = { + "input_word_ids": word_id_data, + "input_mask": mask_data, + "input_type_ids": type_id_data, + } + encoder(inputs) # Builds the encoder's variables before weights are assigned. + convert(encoder, allenai_model) + tf.train.Checkpoint(encoder=encoder).write(output_path) + + +def main(_): + convert_checkpoint("longformer-4096/longformer") + + +if __name__ == "__main__": + app.run(main) diff --git a/official/projects/longformer/utils/longformer_tokenizer_to_tfrecord.py b/official/projects/longformer/utils/longformer_tokenizer_to_tfrecord.py new file mode 100644 index 0000000000000000000000000000000000000000..9fc85a391edc7d3beefbc64bbfd51fe04d2f3f73 --- /dev/null +++ b/official/projects/longformer/utils/longformer_tokenizer_to_tfrecord.py @@ -0,0 +1,112 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Converts Longformer training examples to TFRecord files.""" +import collections +import os + +import datasets +import tensorflow as tf +import transformers + +pretrained_lm = "allenai/longformer-base-4096" +task_name = "mnli" +save_path = "./" + +raw_datasets = datasets.load_dataset("glue", task_name, cache_dir=None) +label_list = raw_datasets["train"].features["label"].names +num_labels = len(label_list) + +tokenizer = transformers.AutoTokenizer.from_pretrained( + pretrained_lm, + use_fast=True, +) + +task_to_keys = { + "cola": ("sentence", None), + "mnli": ("premise", "hypothesis"), + "mrpc": ("sentence1", "sentence2"), + "qnli": ("question", "sentence"), + "qqp": ("question1", "question2"), + "rte": ("sentence1", "sentence2"), + "sst2": ("sentence", None), + "stsb": ("sentence1", "sentence2"), + "wnli": ("sentence1", "sentence2"), +} + +sentence1_key, sentence2_key = task_to_keys[task_name] +padding = "max_length" + +# Make sure this matches the model's input sequence length.
+max_seq_length = 512 + + +def preprocess_function(examples): + # Tokenize the texts + args = ((examples[sentence1_key],) if sentence2_key is None else + (examples[sentence1_key], examples[sentence2_key])) + result = tokenizer( + *args, padding=padding, max_length=max_seq_length, truncation=True) + return result + + +raw_datasets = raw_datasets.map( + preprocess_function, + batched=True, + desc="Running tokenizer on dataset", +) + +train_dataset = raw_datasets["train"] +eval_dataset = raw_datasets["validation_matched" if task_name == + "mnli" else "validation"] + +print("train_dataset", train_dataset[0]) +print("eval_dataset", eval_dataset[0]) + + +def file_based_convert_examples_to_features(examples, output_file): + """Converts tokenized dataset examples to a TFRecord file.""" + tf.io.gfile.makedirs(os.path.dirname(output_file)) + writer = tf.io.TFRecordWriter(output_file) + + for ex_index, example in enumerate(examples): + if ex_index % 10000 == 0: + print(f"Writing example {ex_index} of {len(examples)}") + + def create_int_feature(values): + f = tf.train.Feature(int64_list=tf.train.Int64List(value=list(values))) + return f + + features = collections.OrderedDict() + features["input_ids"] = create_int_feature(example["input_ids"]) + features["input_mask"] = create_int_feature(example["attention_mask"]) + features["segment_ids"] = create_int_feature([0] * + len(example["attention_mask"])) + features["label_ids"] = create_int_feature([example["label"]]) + features["is_real_example"] = create_int_feature([1]) + features["example_id"] = create_int_feature([example["idx"]]) + + tf_example = tf.train.Example(features=tf.train.Features(feature=features)) + writer.write(tf_example.SerializeToString()) + writer.close() + + +file_based_convert_examples_to_features( + train_dataset, + os.path.join(save_path, + f"{pretrained_lm.replace('/', '_')}_train.tf_record")) +file_based_convert_examples_to_features( + eval_dataset, + os.path.join(save_path, + f"{pretrained_lm.replace('/',
'_')}_eval.tf_record")) diff --git a/official/projects/mobilebert/__init__.py b/official/projects/mobilebert/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/mobilebert/__init__.py +++ b/official/projects/mobilebert/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/mobilebert/distillation.py b/official/projects/mobilebert/distillation.py index 731e32f938226c5233d2683a517e977a91c80d3b..68decad6c92756519226f4ff0c1a1f001962c8ad 100644 --- a/official/projects/mobilebert/distillation.py +++ b/official/projects/mobilebert/distillation.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -238,6 +238,9 @@ class BertDistillationTask(policies.ProgressivePolicy, base_task.Task): }) opt_factory = optimization.OptimizerFactory(params) optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) + if isinstance(optimizer, tf.keras.optimizers.experimental.Optimizer): + optimizer = tf.keras.__internal__.optimizers.convert_to_legacy_optimizer( + optimizer) return optimizer diff --git a/official/projects/mobilebert/distillation_test.py b/official/projects/mobilebert/distillation_test.py index 6b80ebaa30663e93f161e2a0c9afbc25a82701ab..1e6605ac1064929ae5556dd0ff3ed860d6cc6369 100644 --- a/official/projects/mobilebert/distillation_test.py +++ b/official/projects/mobilebert/distillation_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -153,7 +153,7 @@ class DistillationTest(tf.test.TestCase, parameterized.TestCase): eval_dataset = bert_distillation_task.get_eval_dataset(stage_id=0) eval_iterator = iter(eval_dataset) - optimizer = tf.keras.optimizers.SGD(lr=0.1) + optimizer = tf.keras.optimizers.legacy.SGD(learning_rate=0.1) # test train/val step for all stages, including the last pretraining stage for stage in range(student_block_num + 1): diff --git a/official/projects/mobilebert/export_tfhub.py b/official/projects/mobilebert/export_tfhub.py index 4f065a2a94488f5661fcec0e3405406ae500fbb3..184de577b57b105fe6ea3fd194f460339f4de149 100644 --- a/official/projects/mobilebert/export_tfhub.py +++ b/official/projects/mobilebert/export_tfhub.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/mobilebert/model_utils.py b/official/projects/mobilebert/model_utils.py index 0cd6448515771f5aa4a06cbe71908a7ac196933d..70be52a4f864822b828e35aa2daf99805ba9a32f 100644 --- a/official/projects/mobilebert/model_utils.py +++ b/official/projects/mobilebert/model_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
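In the Longformer encoder block earlier in this patch, `attention_window` and `global_attention_size` together define a sparse attention pattern. The NumPy sketch below builds that pattern as a dense boolean matrix so you can see which query/key pairs are allowed. It is a visualization aid only. It assumes, as in the Hugging Face Longformer implementation, that each token sees `attention_window // 2` neighbours on each side. It also assumes that the first `global_attention_size` tokens are the global ones. The function name is hypothetical. The point of Longformer is to avoid materializing this full matrix, so this is not how `LongformerAttention` computes attention.

```python
import numpy as np


def longformer_attention_pattern(seq_len: int, attention_window: int,
                                 global_attention_size: int) -> np.ndarray:
  """Returns a boolean (seq_len, seq_len) matrix; True means "may attend".

  Illustrative only (hypothetical helper): local tokens see
  attention_window // 2 neighbours on each side, and the first
  global_attention_size tokens attend to, and are attended by, every position.
  """
  if attention_window % 2:
    raise ValueError("attention_window is assumed to be even.")
  half_window = attention_window // 2
  positions = np.arange(seq_len)
  # Sliding-window band: query i may attend key j when |i - j| <= half_window.
  allowed = np.abs(positions[:, None] - positions[None, :]) <= half_window
  # Global tokens get full rows (they attend everywhere) and full columns
  # (every token attends to them).
  is_global = positions < global_attention_size
  allowed |= is_global[:, None] | is_global[None, :]
  return allowed


pattern = longformer_attention_pattern(seq_len=8, attention_window=2,
                                       global_attention_size=1)
print(pattern.astype(int))
print(pattern.sum(axis=1))  # allowed keys per query: [8 3 4 4 4 4 4 3]
```

Each non-global query row allows at most `attention_window + 1` local keys plus the global columns. For a fixed window, the cost therefore grows linearly with sequence length rather than quadratically.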
diff --git a/official/projects/mobilebert/run_distillation.py b/official/projects/mobilebert/run_distillation.py index 9fb7a9e670d2d191e9add1886443e72e6bdf9444..30aefcf8d7bde410a3c891ba1a567f094154d76c 100644 --- a/official/projects/mobilebert/run_distillation.py +++ b/official/projects/mobilebert/run_distillation.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/mobilebert/tf2_model_checkpoint_converter.py b/official/projects/mobilebert/tf2_model_checkpoint_converter.py index d3333ee31908c63541603ea2f4a1338b523263c1..eae8e8b2d2ad43090f560087b7113e8ca1e661a8 100644 --- a/official/projects/mobilebert/tf2_model_checkpoint_converter.py +++ b/official/projects/mobilebert/tf2_model_checkpoint_converter.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/mobilebert/utils.py b/official/projects/mobilebert/utils.py index d5c3e4067471de279ce2e3147ef655771447cb57..f9e2a924c6f1def16e2af8c3cd4ae463ff42d583 100644 --- a/official/projects/mobilebert/utils.py +++ b/official/projects/mobilebert/utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
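The Longformer checkpoint converter earlier in this patch maps each PyTorch `nn.Linear` weight onto a per-head kernel. PyTorch stores that weight as `(out_features, in_features)`, and the converter applies `.T.reshape((hidden_size, num_attention_heads, head_size))` to it. It also reshapes the bias to `(num_attention_heads, head_size)`. The NumPy sketch below checks that this layout is sound: contracting the hidden axis against the reshaped kernel reproduces `x @ W.T + b` split into heads. The helper name and the einsum equation `"abc,cde->abde"` are illustrative assumptions, not the internals of the Model Garden attention layer.

```python
import numpy as np


def linear_to_head_kernel(weight: np.ndarray, bias: np.ndarray,
                          num_heads: int):
  """Converts a PyTorch-style Linear weight/bias into per-head arrays.

  Hypothetical helper. weight has shape (out_features, in_features), as
  stored by torch.nn.Linear. Returns a kernel of shape
  (in_features, num_heads, head_size) and a bias of shape (num_heads, head_size).
  """
  out_features, in_features = weight.shape
  head_size, remainder = divmod(out_features, num_heads)
  if remainder:
    raise ValueError("out_features must be divisible by num_heads.")
  kernel = weight.T.reshape((in_features, num_heads, head_size))
  head_bias = bias.reshape((num_heads, head_size))
  return kernel, head_bias


rng = np.random.default_rng(0)
batch, seq_len, hidden, heads = 3, 5, 8, 2
weight = rng.standard_normal((hidden, hidden))  # (out_features, in_features)
bias = rng.standard_normal(hidden)
x = rng.standard_normal((batch, seq_len, hidden))

kernel, head_bias = linear_to_head_kernel(weight, bias, heads)

# Per-head projection: contract the hidden axis against the 3-D kernel.
projected = np.einsum("abc,cde->abde", x, kernel) + head_bias

# Reference: the PyTorch Linear computation, then a split into heads.
reference = (x @ weight.T + bias).reshape((batch, seq_len, heads,
                                           hidden // heads))

print(kernel.shape, projected.shape)  # (8, 2, 4) (3, 5, 2, 4)
print(np.allclose(projected, reference))  # True
```

NumPy reshapes in row-major order, so output unit `n * head_size + d` lands at head `n`, position `d`. This matches the way the reference computation splits heads, so each kernel must be paired with the bias from the same PyTorch module.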
diff --git a/official/projects/mosaic/README.md b/official/projects/mosaic/README.md new file mode 100644 index 0000000000000000000000000000000000000000..d63a22deaa9007d46e07ec59b8cf0e523c91093a --- /dev/null +++ b/official/projects/mosaic/README.md @@ -0,0 +1,121 @@ +# MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded Context + +[![Paper](http://img.shields.io/badge/Paper-arXiv.2112.11623-B3181B?logo=arXiv)](https://arxiv.org/abs/2112.11623) + +This repository is the official implementation of the following +paper. + +* [MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded Context](https://arxiv.org/abs/2112.11623) + +## Description + +MOSAIC is a neural network architecture for efficient and accurate semantic +image segmentation on mobile devices. MOSAIC is built from neural operations +that are commonly supported across diverse mobile hardware platforms, which +allows flexible deployment. With a simple asymmetric encoder-decoder +structure, consisting of an efficient multi-scale context encoder and a +light-weight hybrid decoder that recovers spatial details from aggregated +information, MOSAIC achieves a better balance between accuracy and +computational cost. Deployed on top of a tailored feature extraction backbone +based on a searched classification network, MOSAIC achieves a 5% absolute +accuracy gain on ADE20K with similar or lower latency compared to the current +industry-standard MLPerf Mobile v1.0 models and state-of-the-art +architectures. + +[MLPerf Mobile v2.0](https://mlcommons.org/en/inference-mobile-20/) included +MOSAIC as a new industry-standard benchmark model for image segmentation. +See the details [here](https://mlcommons.org/en/news/mlperf-inference-1q2022/). + +You can also refer to the [MLCommons GitHub repository](https://github.com/mlcommons/mobile_open/tree/main/vision/mosaic).
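The multi-scale context encoder described above pools the backbone features into a spatial pyramid. The plain-Python sketch below reproduces only the kernel/stride arithmetic of `MosaicEncoderBlock._get_bin_pool_kernel_and_stride` from this change. It assumes a 1024x2048 input at `output_stride=16`, which gives a 64x128 feature map as in the Cityscapes configs. The helper names `bin_pool_kernel_and_stride`, `num_valid_pool_outputs` and `pyramid_plan` are illustrative and are not part of the project.

```python
from typing import List, Tuple


def bin_pool_kernel_and_stride(input_size: int, num_of_bin: int) -> Tuple[int, int]:
  """Same arithmetic as `_get_bin_pool_kernel_and_stride` in mosaic_blocks.py."""
  bin_overlap = input_size % num_of_bin  # leftover pixels are folded into the kernel
  pooling_stride = input_size // num_of_bin
  pooling_kernel = pooling_stride + bin_overlap
  return pooling_kernel, pooling_stride


def num_valid_pool_outputs(input_size: int, kernel: int, stride: int) -> int:
  """Number of windows a 'valid'-padded pooling layer produces along one axis."""
  return (input_size - kernel) // stride + 1


def pyramid_plan(height: int, width: int, bin_nums: List[int]):
  """Returns (num_bins, (kernel_h, kernel_w), (stride_h, stride_w)) per level."""
  plan = []
  for num_bins in bin_nums:
    if num_bins == 1:
      continue  # a bin count of 1 is handled by global average pooling instead
    kernel_h, stride_h = bin_pool_kernel_and_stride(height, num_bins)
    kernel_w, stride_w = bin_pool_kernel_and_stride(width, num_bins)
    plan.append((num_bins, (kernel_h, kernel_w), (stride_h, stride_w)))
  return plan


# Assumed example: a 1024x2048 input at output_stride=16 -> 64x128 features.
for level in pyramid_plan(1024 // 16, 2048 // 16, [1, 4, 8, 16]):
  print(level)
```

This prints `(4, (16, 32), (16, 32))`, `(8, (8, 16), (8, 16))` and `(16, (4, 8), (4, 8))`. The remainder `input_size % num_of_bin` is folded into the kernel. As a result, the number of `'valid'` pooling windows still equals the bin count when the size is not divisible. For example, a 65-pixel axis with 4 bins gets kernel 17 and stride 16, which yields exactly 4 windows.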
+ +## History + +### Oct 13, 2022 + +* First release of MOSAIC in TensorFlow 2 including checkpoints that have been +  pretrained on Cityscapes. + +## Maintainers + +* Weijun Wang ([weijunw-g](https://github.com/weijunw-g)) +* Fang Yang ([fyangf](https://github.com/fyangf)) +* Shixin Luo ([luotigerlsx](https://github.com/luotigerlsx)) + +## Requirements + +[![Python](https://img.shields.io/pypi/pyversions/tensorflow.svg?style=plastic)](https://badge.fury.io/py/tensorflow) +[![tf-models-official PyPI](https://badge.fury.io/py/tf-models-official.svg)](https://badge.fury.io/py/tf-models-official) + +## Results + +The following table shows the mIoU measured on the `cityscapes` dataset. + +| Config | Backbone | Resolution | branch_filter_depths | pyramid_pool_bin_nums | mIoU | Download | +|-------------------------|:--------------------:|:----------:|:--------------------:|:---------------------:|:-----:|:--------:| +| Paper reference config | MobileNetMultiAVGSeg | 1024x2048 | [32, 32] | [4, 8, 16] | 75.98 | [ckpt](https://storage.googleapis.com/tf_model_garden/vision/mosaic/MobileNetMultiAVGSeg-r1024-ebf32-nogp.tar.gz)<br>[tensorboard](https://tensorboard.dev/experiment/okEog90bSwupajFgJwGEIw//#scalars) | +| Current best config | MobileNetMultiAVGSeg | 1024x2048 | [64, 64] | [1, 4, 8, 16] | 77.24 | [ckpt](https://storage.googleapis.com/tf_model_garden/vision/mosaic/MobileNetMultiAVGSeg-r1024-ebf64-gp.tar.gz)<br>[tensorboard](https://tensorboard.dev/experiment/l5hkV7JaQM23EXeOBT6oJg/#scalars) | + +* `branch_filter_depths`: the number of convolution channels in each branch at +  a pyramid level after `Spatial Pyramid Pooling` +* `pyramid_pool_bin_nums`: the number of bins at each level of the `Spatial +  Pyramid Pooling` + +## Training + +Training can run on Google Cloud Platform using Cloud TPU. Follow the +[Cloud TPU how-to guides](https://cloud.google.com/tpu/docs/how-to) to set up a +Cloud TPU, then launch training with: + +```shell +EXP_TYPE=mosaic_mnv35_cityscapes +EXP_NAME="" # You can give any name to the experiment. +TPU_NAME="" # The name assigned while creating a Cloud TPU +MODEL_DIR="gs://" +# Now launch the experiment. +python3 -m official.projects.mosaic.train \ +  --experiment=$EXP_TYPE \ +  --mode=train \ +  --tpu=$TPU_NAME \ +  --model_dir=$MODEL_DIR \ +  --config_file=official/projects/mosaic/configs/experiments/mosaic_mnv35_cityscapes_tfds_tpu.yaml +``` + +## Evaluation + +Run the following command for evaluation: + +```shell +EXP_TYPE=mosaic_mnv35_cityscapes +EXP_NAME="" # You can give any name to the experiment. +TPU_NAME="" # The name assigned while creating a Cloud TPU +MODEL_DIR="gs://" +# Now launch the experiment. +python3 -m official.projects.mosaic.train \ +  --experiment=$EXP_TYPE \ +  --mode=eval \ +  --tpu=$TPU_NAME \ +  --model_dir=$MODEL_DIR \ +  --config_file=official/projects/mosaic/configs/experiments/mosaic_mnv35_cityscapes_tfds_tpu.yaml +``` + +## License + +[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) + +This project is licensed under the terms of the **Apache License 2.0**. + +## Citation + +If you want to cite this repository in your work, please consider citing the +paper.
+ +``` +@article{weijun2021mosaic, +  title={MOSAIC: Mobile Segmentation via decoding Aggregated Information and +  encoded Context}, +  author={Weijun Wang and Andrew Howard}, +  journal={arXiv preprint arXiv:2112.11623}, +  year={2021}, +} +``` diff --git a/official/projects/mosaic/configs/experiments/mosaic_mnv35_cityscapes_tfds_tpu.yaml b/official/projects/mosaic/configs/experiments/mosaic_mnv35_cityscapes_tfds_tpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b2a240050492a8d00ac427d2f2bf48bd39872d80 --- /dev/null +++ b/official/projects/mosaic/configs/experiments/mosaic_mnv35_cityscapes_tfds_tpu.yaml @@ -0,0 +1,87 @@ +# Using TensorFlow Datasets: 'cityscapes/semantic_segmentation' +# Some expected flags to use with xmanager launcher: +# --experiment_type=mosaic_mnv35_cityscapes +# --tpu_topology=4x4 +# mIoU: 77.24% +runtime: +  distribution_strategy: 'tpu' +  mixed_precision_dtype: 'float32' +task: +  model: +    num_classes: 19 +    input_size: [null, null, 3] +    backbone: +      type: 'mobilenet' +      mobilenet: +        model_id: 'MobileNetMultiAVGSeg' +        output_intermediate_endpoints: true +        output_stride: 16 +    neck: +      branch_filter_depths: [64, 64] +      conv_kernel_sizes: [3, 5] +      pyramid_pool_bin_nums: [1, 4, 8, 16] +      dropout_rate: 0.0 +    head: +      num_classes: 19 +      decoder_input_levels: ['3/depthwise', '2/depthwise'] +      decoder_stage_merge_styles: ['concat_merge', 'sum_merge'] +      decoder_filters: [64, 64] +      decoder_projected_filters: [19, 19] +    norm_activation: +      activation: relu +      norm_epsilon: 0.001 +      norm_momentum: 0.99 +      use_sync_bn: true +  init_checkpoint: 'gs://tf_model_garden/vision/mobilenet/v3.5multiavg_seg_float/' +  init_checkpoint_modules: 'backbone' +  losses: +    l2_weight_decay: 1.0e-04 +  train_data: +    output_size: [1024, 2048] +    crop_size: [1024, 2048] +    input_path: '' +    tfds_name: 'cityscapes/semantic_segmentation' +    tfds_split: 'train' +    is_training: true +    global_batch_size: 32 +    dtype: 'float32' +    aug_rand_hflip: true +    aug_scale_max: 2.0 +
aug_scale_min: 0.5 + validation_data: + output_size: [1024, 2048] + input_path: '' + tfds_name: 'cityscapes/semantic_segmentation' + tfds_split: 'validation' + is_training: false + global_batch_size: 32 + dtype: 'float32' + drop_remainder: false + resize_eval_groundtruth: true +trainer: + optimizer_config: + learning_rate: + polynomial: + decay_steps: 100000 + initial_learning_rate: 0.1 + power: 0.9 + type: polynomial + optimizer: + sgd: + momentum: 0.9 + type: sgd + warmup: + linear: + name: linear + warmup_learning_rate: 0 + warmup_steps: 925 + type: linear + steps_per_loop: 92 # 2975 / 32 = 92 + summary_interval: 92 + train_steps: 100000 + validation_interval: 92 + validation_steps: 16 # 500 / 32 = 16 + checkpoint_interval: 92 + best_checkpoint_export_subdir: 'best_ckpt' + best_checkpoint_eval_metric: 'mean_iou' + best_checkpoint_metric_comp: 'higher' diff --git a/official/projects/mosaic/configs/mosaic_config.py b/official/projects/mosaic/configs/mosaic_config.py new file mode 100644 index 0000000000000000000000000000000000000000..4435a83d42b2540a5cce7d7ca904507e85729ec3 --- /dev/null +++ b/official/projects/mosaic/configs/mosaic_config.py @@ -0,0 +1,218 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
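The step counts in the YAML config above and in the `mosaic_mnv35_cityscapes()` factory follow from the Cityscapes split sizes (`CITYSCAPES_TRAIN_EXAMPLES = 2975`, `CITYSCAPES_VAL_EXAMPLES = 500`). The sketch below reproduces that arithmetic and evaluates the configured learning-rate schedule in plain Python. The decay function uses the standard non-cycling polynomial-decay formula with `initial_learning_rate: 0.1`, `decay_steps: 100000`, `end_learning_rate: 0.0` and `power: 0.9`. The warmup function assumes a linear ramp from `warmup_learning_rate` to the decayed rate reached at `warmup_steps`. That ramp is an assumption about Model Garden's `linear` warmup, not a copy of its implementation. The helper names are hypothetical.

```python
import math

CITYSCAPES_TRAIN_EXAMPLES = 2975
CITYSCAPES_VAL_EXAMPLES = 500


def python_config_steps(batch_size: int = 16, warmup_epochs: int = 5):
  """Step counts as computed in mosaic_mnv35_cityscapes() (floor division)."""
  steps_per_epoch = CITYSCAPES_TRAIN_EXAMPLES // batch_size
  validation_steps = CITYSCAPES_VAL_EXAMPLES // batch_size
  return steps_per_epoch, validation_steps, warmup_epochs * steps_per_epoch


def polynomial_decay(step: int, initial_lr: float = 0.1, decay_steps: int = 100000,
                     end_lr: float = 0.0, power: float = 0.9) -> float:
  """Non-cycling polynomial decay; the rate stays at end_lr after decay_steps."""
  step = min(step, decay_steps)
  return (initial_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr


def warmup_then_decay(step: int, warmup_steps: int, warmup_lr: float = 0.0) -> float:
  """Assumed linear warmup: ramp from warmup_lr to the decayed rate at warmup_steps."""
  if step < warmup_steps:
    target = polynomial_decay(warmup_steps)
    return warmup_lr + (target - warmup_lr) * step / warmup_steps
  return polynomial_decay(step)


print(python_config_steps())  # (185, 31, 925)
# YAML with global_batch_size 32: train loop rounds down, validation rounds up.
print(CITYSCAPES_TRAIN_EXAMPLES // 32, math.ceil(CITYSCAPES_VAL_EXAMPLES / 32))  # 92 16
print(warmup_then_decay(0, 925), polynomial_decay(100000))  # 0.0 0.0
```

With the Python defaults (batch size 16), this gives 185 steps per epoch, 31 validation steps and 925 warmup steps (5 epochs). The YAML keeps `warmup_steps: 925` even though its `global_batch_size` is 32, where one epoch is 92 steps. Its `validation_steps: 16` rounds 500 / 32 up. Because validation uses `drop_remainder: false`, all 500 validation images are still evaluated.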
+ +"""Configuration definition for Semantic Segmentation with MOSAIC.""" +import dataclasses +import os +from typing import List, Optional, Union + +import numpy as np +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import hyperparams +from official.modeling import optimization +from official.vision.configs import backbones +from official.vision.configs import common +from official.vision.configs import semantic_segmentation as seg_cfg + + +@dataclasses.dataclass +class MosaicDecoderHead(hyperparams.Config): +  """MOSAIC decoder head config for Segmentation.""" +  num_classes: int = 19 +  decoder_input_levels: List[str] = dataclasses.field(default_factory=list) +  decoder_stage_merge_styles: List[str] = dataclasses.field( +      default_factory=list) +  decoder_filters: List[int] = dataclasses.field(default_factory=list) +  decoder_projected_filters: List[int] = dataclasses.field(default_factory=list) +  encoder_end_level: int = 4 +  use_additional_classifier_layer: bool = False +  classifier_kernel_size: int = 1 +  activation: str = 'relu' +  kernel_initializer: str = 'glorot_uniform' +  interpolation: str = 'bilinear' + + +@dataclasses.dataclass +class MosaicEncoderNeck(hyperparams.Config): +  """MOSAIC encoder neck config for segmentation.""" +  encoder_input_level: Union[str, int] = '4' +  branch_filter_depths: List[int] = dataclasses.field(default_factory=list) +  conv_kernel_sizes: List[int] = dataclasses.field(default_factory=list) +  pyramid_pool_bin_nums: List[int] = dataclasses.field(default_factory=list) +  activation: str = 'relu' +  dropout_rate: float = 0.1 +  kernel_initializer: str = 'glorot_uniform' +  interpolation: str = 'bilinear' +  use_depthwise_convolution: bool = True + + +@dataclasses.dataclass +class MosaicSemanticSegmentationModel(hyperparams.Config): +  """MOSAIC semantic segmentation model config.""" +  num_classes: int = 19 +  input_size: List[int] = 
dataclasses.field(default_factory=list) +
  head: MosaicDecoderHead = MosaicDecoderHead() +  backbone: backbones.Backbone = backbones.Backbone( +      type='mobilenet', mobilenet=backbones.MobileNet()) +  neck: MosaicEncoderNeck = MosaicEncoderNeck() +  norm_activation: common.NormActivation = common.NormActivation( +      use_sync_bn=True, norm_momentum=0.99, norm_epsilon=0.001) + + +@dataclasses.dataclass +class MosaicSemanticSegmentationTask(seg_cfg.SemanticSegmentationTask): +  """The config for MOSAIC segmentation task.""" +  model: MosaicSemanticSegmentationModel = MosaicSemanticSegmentationModel() +  train_data: seg_cfg.DataConfig = seg_cfg.DataConfig(is_training=True) +  validation_data: seg_cfg.DataConfig = seg_cfg.DataConfig(is_training=False) +  losses: seg_cfg.Losses = seg_cfg.Losses() +  evaluation: seg_cfg.Evaluation = seg_cfg.Evaluation() +  train_input_partition_dims: List[int] = dataclasses.field( +      default_factory=list) +  eval_input_partition_dims: List[int] = dataclasses.field( +      default_factory=list) +  init_checkpoint: Optional[str] = None +  init_checkpoint_modules: Union[ +      str, List[str]] = 'all'  # all, backbone, and/or neck. +  export_config: seg_cfg.ExportConfig = seg_cfg.ExportConfig() + + +# Cityscapes Dataset (Download and process the dataset yourself) +CITYSCAPES_TRAIN_EXAMPLES = 2975 +CITYSCAPES_VAL_EXAMPLES = 500 +CITYSCAPES_INPUT_PATH_BASE = 'cityscapes/tfrecord' + + +@exp_factory.register_config_factory('mosaic_mnv35_cityscapes') +def mosaic_mnv35_cityscapes() -> cfg.ExperimentConfig: +  """Instantiates an experiment configuration for the image segmentation task. + +  This image segmentation experiment is conducted on the Cityscapes dataset. +  The model architecture is a MOSAIC encoder-decoder. The default backbone +  network is a mobilenet variant called Mobilenet_v3.5-MultiAvg on top of which +  the MOSAIC encoder-decoder can be deployed. All detailed configurations can be +  overridden by a .yaml file provided by the user to launch the experiments.
Please refer +  to .yaml examples in the path of ../configs/experiments/. + +  Returns: +    A particular instance of cfg.ExperimentConfig for MOSAIC model based +    image semantic segmentation task. +  """ +  train_batch_size = 16 +  eval_batch_size = 16 +  steps_per_epoch = CITYSCAPES_TRAIN_EXAMPLES // train_batch_size +  output_stride = 16 + +  backbone_output_level = int(np.log2(output_stride)) +  config = cfg.ExperimentConfig( +      task=MosaicSemanticSegmentationTask( +          model=MosaicSemanticSegmentationModel( +              # Cityscapes uses only 19 semantic classes for train/evaluation. +              # The void (background) class is ignored in train and evaluation. +              num_classes=19, +              input_size=[None, None, 3], +              backbone=backbones.Backbone( +                  type='mobilenet', +                  mobilenet=backbones.MobileNet( +                      model_id='MobileNetMultiAVGSeg', +                      output_intermediate_endpoints=True, +                      output_stride=output_stride)), +              neck=MosaicEncoderNeck( +                  encoder_input_level=backbone_output_level, +                  branch_filter_depths=[64, 64], +                  conv_kernel_sizes=[3, 5], +                  # Current best config; the paper reference uses [4, 8, 16]. +                  pyramid_pool_bin_nums=[1, 4, 8, 16], +                  activation='relu', +                  dropout_rate=0.1, +                  kernel_initializer='glorot_uniform', +                  interpolation='bilinear', +                  use_depthwise_convolution=True), +              head=MosaicDecoderHead( +                  num_classes=19, +                  decoder_input_levels=['3/depthwise', '2/depthwise'], +                  decoder_stage_merge_styles=['concat_merge', 'sum_merge'], +                  decoder_filters=[64, 64], +                  decoder_projected_filters=[19, 19], +                  encoder_end_level=backbone_output_level, +                  use_additional_classifier_layer=False, +                  
classifier_kernel_size=1, +                  activation='relu', +                  kernel_initializer='glorot_uniform', +                  interpolation='bilinear'), +              norm_activation=common.NormActivation( +                  activation='relu', +                  norm_momentum=0.99, +                  norm_epsilon=1e-3, +                  use_sync_bn=True)), +          losses=seg_cfg.Losses(l2_weight_decay=4e-5), +          train_data=seg_cfg.DataConfig( +              input_path=os.path.join(CITYSCAPES_INPUT_PATH_BASE, +                                      'train_fine**'), +              crop_size=[1024, 2048], +              output_size=[1024, 2048], +              is_training=True, +
global_batch_size=train_batch_size, + aug_scale_min=0.5, + aug_scale_max=2.0), + validation_data=seg_cfg.DataConfig( + input_path=os.path.join(CITYSCAPES_INPUT_PATH_BASE, 'val_fine*'), + output_size=[1024, 2048], + is_training=False, + global_batch_size=eval_batch_size, + resize_eval_groundtruth=True, + drop_remainder=False), + # Imagenet pre-trained Mobilenet_v3.5-MultiAvg checkpoint. + init_checkpoint='gs://tf_model_garden/vision/mobilenet/v3.5multiavg_seg_float/', + init_checkpoint_modules='backbone'), + trainer=cfg.TrainerConfig( + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + train_steps=100000, + validation_steps=CITYSCAPES_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + best_checkpoint_eval_metric='mean_iou', + best_checkpoint_export_subdir='best_ckpt', + best_checkpoint_metric_comp='higher', + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'polynomial', + 'polynomial': { + 'initial_learning_rate': 0.1, + 'decay_steps': 100000, + 'end_learning_rate': 0.0, + 'power': 0.9 + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 5 * steps_per_epoch, + 'warmup_learning_rate': 0 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + return config diff --git a/official/projects/mosaic/modeling/mosaic_blocks.py b/official/projects/mosaic/modeling/mosaic_blocks.py new file mode 100644 index 0000000000000000000000000000000000000000..076859bb2425e3f82839fed98527d034610d8909 --- /dev/null +++ b/official/projects/mosaic/modeling/mosaic_blocks.py @@ -0,0 +1,885 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Definitions of building blocks for MOSAIC model. + +Reference: + [MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded + Context](https://arxiv.org/pdf/2112.11623.pdf) +""" + +from typing import Any, Dict, List, Optional, Tuple, Union + +import tensorflow as tf + +from official.modeling import tf_utils + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class MultiKernelGroupConvBlock(tf.keras.layers.Layer): + """A multi-kernel grouped convolution block. + + This block is used in the segmentation neck introduced in MOSAIC. + Reference: + [MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded + Context](https://arxiv.org/pdf/2112.11623.pdf) + """ + + def __init__( + self, + output_filter_depths: Optional[List[int]] = None, + kernel_sizes: Optional[List[int]] = None, + use_sync_bn: bool = False, + batchnorm_momentum: float = 0.99, + batchnorm_epsilon: float = 0.001, + activation: str = 'relu', + kernel_initializer: str = 'GlorotUniform', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + use_depthwise_convolution: bool = True, + **kwargs): + """Initializes a Multi-kernel Grouped Convolution Block. + + Args: + output_filter_depths: A list of integers representing the numbers of + output channels or filter depths of convolution groups. + kernel_sizes: A list of integers denoting the convolution kernel sizes in + each convolution group. + use_sync_bn: A bool, whether or not to use sync batch normalization. + batchnorm_momentum: A float for the momentum in BatchNorm. 
Defaults to +        0.99. +      batchnorm_epsilon: A float for the epsilon value in BatchNorm. Defaults to +        0.001. +      activation: A `str` for the activation function type. Defaults to 'relu'. +      kernel_initializer: Kernel initializer for conv layers. Defaults to +        `glorot_uniform`. +      kernel_regularizer: Kernel regularizer for conv layers. Defaults to None. +      use_depthwise_convolution: Whether to use depthwise separable convolutions +        in each convolution group. +      **kwargs: Other keyword arguments for the layer. +    """ +    super(MultiKernelGroupConvBlock, self).__init__(**kwargs) + +    if output_filter_depths is None: +      output_filter_depths = [64, 64] +    if kernel_sizes is None: +      kernel_sizes = [3, 5] +    if len(output_filter_depths) != len(kernel_sizes): +      raise ValueError('The number of output groups must match #kernels.') +    self._output_filter_depths = output_filter_depths +    self._kernel_sizes = kernel_sizes +    self._num_groups = len(self._kernel_sizes) +    self._use_sync_bn = use_sync_bn +    self._batchnorm_momentum = batchnorm_momentum +    self._batchnorm_epsilon = batchnorm_epsilon +    self._activation = activation +    self._kernel_initializer = kernel_initializer +    self._kernel_regularizer = kernel_regularizer +    self._use_depthwise_convolution = use_depthwise_convolution +    # To apply BN before activation. Putting BN between conv and activation also +    # helps quantization where conv+bn+activation are fused into a single op.
+ self._activation_fn = tf_utils.get_activation(activation) + if self._use_sync_bn: + self._bn_op = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._bn_op = tf.keras.layers.BatchNormalization + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + self._group_split_axis = -1 + else: + self._bn_axis = 1 + self._group_split_axis = 1 + + def build(self, input_shape: tf.TensorShape) -> None: + """Builds the block with the given input shape.""" + input_channels = input_shape[self._group_split_axis] + if input_channels % self._num_groups != 0: + raise ValueError('The number of input channels must be divisible by ' + 'the number of groups for evenly group split.') + self._conv_branches = [] + if self._use_depthwise_convolution: + for i, conv_kernel_size in enumerate(self._kernel_sizes): + depthwise_conv = tf.keras.layers.DepthwiseConv2D( + kernel_size=(conv_kernel_size, conv_kernel_size), + depth_multiplier=1, + padding='same', + depthwise_regularizer=self._kernel_regularizer, + depthwise_initializer=self._kernel_initializer, + use_bias=False) + # Add BN->RELU after depthwise convolution. + batchnorm_op_depthwise = self._bn_op( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + activation_depthwise = self._activation_fn + feature_conv = tf.keras.layers.Conv2D( + filters=self._output_filter_depths[i], + kernel_size=(1, 1), + padding='same', + kernel_regularizer=self._kernel_regularizer, + kernel_initializer=self._kernel_initializer, + activation=None, + use_bias=False) + batchnorm_op = self._bn_op( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + # Use list manually as current QAT API does not support sequential model + # within a tf.keras.Sequential block, e.g. 
conv_branch = + # tf.keras.Sequential([depthwise_conv, feature_conv, batchnorm_op,]) + conv_branch = [ + depthwise_conv, + batchnorm_op_depthwise, + activation_depthwise, + feature_conv, + batchnorm_op, + ] + self._conv_branches.append(conv_branch) + else: + for i, conv_kernel_size in enumerate(self._kernel_sizes): + norm_conv = tf.keras.layers.Conv2D( + filters=self._output_filter_depths[i], + kernel_size=(conv_kernel_size, conv_kernel_size), + padding='same', + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + activation=None, + use_bias=False) + batchnorm_op = self._bn_op( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + conv_branch = [norm_conv, batchnorm_op] + self._conv_branches.append(conv_branch) + self._concat_groups = tf.keras.layers.Concatenate( + axis=self._group_split_axis) + + def call(self, + inputs: tf.Tensor, + training: Optional[bool] = None) -> tf.Tensor: + """Calls this group convolution block with the given inputs.""" + inputs_splits = tf.split(inputs, + num_or_size_splits=self._num_groups, + axis=self._group_split_axis) + output_branches = [] + for i, x in enumerate(inputs_splits): + conv_branch = self._conv_branches[i] + # Apply layers sequentially and manually. + for layer in conv_branch: + if isinstance(layer, tf.keras.layers.Layer): + x = layer(x, training=training) + else: + x = layer(x) + # Apply activation function after BN, which also helps quantization + # where conv+bn+activation are fused into a single op. 
+      x = self._activation_fn(x) +      output_branches.append(x) +    x = self._concat_groups(output_branches) +    return x + +  def get_config(self) -> Dict[str, Any]: +    """Returns a config dictionary for initialization from serialization.""" +    # `num_groups` is derived from `kernel_sizes` and is not an __init__ +    # argument, so it is not serialized (from_config would reject it). +    config = { +        'output_filter_depths': self._output_filter_depths, +        'kernel_sizes': self._kernel_sizes, +        'use_sync_bn': self._use_sync_bn, +        'batchnorm_momentum': self._batchnorm_momentum, +        'batchnorm_epsilon': self._batchnorm_epsilon, +        'activation': self._activation, +        'kernel_initializer': self._kernel_initializer, +        'kernel_regularizer': self._kernel_regularizer, +        'use_depthwise_convolution': self._use_depthwise_convolution, +    } +    base_config = super(MultiKernelGroupConvBlock, self).get_config() +    base_config.update(config) +    return base_config + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class MosaicEncoderBlock(tf.keras.layers.Layer): +  """Implements the encoder module/block of MOSAIC model. + +  It combines Spatial Pyramid Pooling with Multi-kernel Grouped Convolution +  layers (SpatialPyramidPoolingMultiKernelConv). +  References: +    [MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded +    context](https://arxiv.org/pdf/2112.11623.pdf) +  """ + +  def __init__( +      self, +      encoder_input_level: Optional[Union[str, int]] = '4', +      branch_filter_depths: Optional[List[int]] = None, +      conv_kernel_sizes: Optional[List[int]] = None, +      pyramid_pool_bin_nums: Optional[List[int]] = None, +      use_sync_bn: bool = False, +      batchnorm_momentum: float = 0.99, +      batchnorm_epsilon: float = 0.001, +      activation: str = 'relu', +      dropout_rate: float = 0.1, +      kernel_initializer: str = 'glorot_uniform', +      kernel_regularizer: 
Optional[tf.keras.regularizers.Regularizer] = None, +      interpolation: str = 'bilinear', +      use_depthwise_convolution: bool = True, +      **kwargs): +    """Initializes a MOSAIC encoder block which is deployed after a backbone.
+ +    Args: +      encoder_input_level: An optional `str` or integer specifying the level of +        backbone outputs as the input to the encoder. +      branch_filter_depths: A list of integers for the number of convolution +        channels in each branch at a pyramid level after SpatialPyramidPooling. +      conv_kernel_sizes: A list of integers representing the convolution kernel +        sizes in the Multi-kernel Convolution blocks in the encoder. +      pyramid_pool_bin_nums: A list of integers for the number of bins at each +        level of the Spatial Pyramid Pooling. +      use_sync_bn: A bool, whether or not to use sync batch normalization. +      batchnorm_momentum: A float for the momentum in BatchNorm. Defaults to +        0.99. +      batchnorm_epsilon: A float for the epsilon value in BatchNorm. Defaults to +        0.001. +      activation: A `str` for the activation function type. Defaults to 'relu'. +      dropout_rate: A float between 0 and 1. Fraction of the input units to drop +        out, which will be used directly as the `rate` of the Dropout layer at +        the end of the encoder. Defaults to 0.1. +      kernel_initializer: Kernel initializer for conv layers. Defaults to +        `glorot_uniform`. +      kernel_regularizer: Kernel regularizer for conv layers. Defaults to None. +      interpolation: The interpolation method for upsampling. Defaults to +        `bilinear`. +      use_depthwise_convolution: Use depthwise separable convolutions in the +        Multi-kernel Convolution blocks in the encoder. +      **kwargs: Other keyword arguments for the layer.
+ """ + super().__init__(**kwargs) + + self._encoder_input_level = str(encoder_input_level) + if branch_filter_depths is None: + branch_filter_depths = [64, 64] + self._branch_filter_depths = branch_filter_depths + if conv_kernel_sizes is None: + conv_kernel_sizes = [3, 5] + self._conv_kernel_sizes = conv_kernel_sizes + if pyramid_pool_bin_nums is None: + pyramid_pool_bin_nums = [1, 4, 8, 16] + self._pyramid_pool_bin_nums = pyramid_pool_bin_nums + self._use_sync_bn = use_sync_bn + self._batchnorm_momentum = batchnorm_momentum + self._batchnorm_epsilon = batchnorm_epsilon + self._activation = activation + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._interpolation = interpolation + self._use_depthwise_convolution = use_depthwise_convolution + self._activation_fn = tf_utils.get_activation(activation) + + if self._use_sync_bn: + self._bn_op = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._bn_op = tf.keras.layers.BatchNormalization + + self._dropout_rate = dropout_rate + if dropout_rate: + self._encoder_end_dropout_layer = tf.keras.layers.Dropout( + rate=dropout_rate) + else: + self._encoder_end_dropout_layer = None + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + self._channel_axis = -1 + else: + self._bn_axis = 1 + self._channel_axis = 1 + + def _get_bin_pool_kernel_and_stride( + self, + input_size: int, + num_of_bin: int) -> Tuple[int, int]: + """Calculates the kernel size and stride for spatial bin pooling. + + Args: + input_size: Input dimension (a scalar). + num_of_bin: The number of bins used for spatial bin pooling. + + Returns: + The Kernel and Stride for spatial bin pooling (a scalar). 
+    """ +    bin_overlap = int(input_size % num_of_bin) +    pooling_stride = int(input_size // num_of_bin) +    pooling_kernel = pooling_stride + bin_overlap +    return pooling_kernel, pooling_stride + +  def build( +      self, input_shape: Union[tf.TensorShape, Dict[str, +                                                    tf.TensorShape]]) -> None: +    """Builds this MOSAIC encoder block with the given single input shape.""" +    input_shape = ( +        input_shape[self._encoder_input_level] +        if isinstance(input_shape, dict) else input_shape) +    self._data_format = tf.keras.backend.image_data_format() +    if self._data_format == 'channels_last': +      height = input_shape[1] +      width = input_shape[2] +    else: +      height = input_shape[2] +      width = input_shape[3] + +    self._global_pool_branch = None +    self._spatial_pyramid = [] + +    for pyramid_pool_bin_num in self._pyramid_pool_bin_nums: +      if pyramid_pool_bin_num == 1: +        global_pool = tf.keras.layers.GlobalAveragePooling2D( +            data_format=self._data_format, keepdims=True) +        global_projection = tf.keras.layers.Conv2D( +            filters=max(self._branch_filter_depths), +            kernel_size=(1, 1), +            padding='same', +            activation=None, +            kernel_regularizer=self._kernel_regularizer, +            kernel_initializer=self._kernel_initializer, +            use_bias=False) +        batch_norm_global_branch = self._bn_op( +            axis=self._bn_axis, +            momentum=self._batchnorm_momentum, +            epsilon=self._batchnorm_epsilon) +        # Use list manually instead of tf.keras.Sequential([]) +        self._global_pool_branch = [ +            global_pool, +            global_projection, +            batch_norm_global_branch, +        ] +      else: +        if height < pyramid_pool_bin_num or width < pyramid_pool_bin_num: +          raise ValueError('The number of pooling bins must not exceed the ' +                           'input sizes.') 
+        assert pyramid_pool_bin_num >= 2, ( +            'Except for the global pooling, the number of bins in pyramid ' +            'pooling must be at least two.') +        pool_height, stride_height = self._get_bin_pool_kernel_and_stride( +            height, pyramid_pool_bin_num) +        pool_width, stride_width = self._get_bin_pool_kernel_and_stride( +            width,
pyramid_pool_bin_num) + bin_pool_level = tf.keras.layers.AveragePooling2D( + pool_size=(pool_height, pool_width), + strides=(stride_height, stride_width), + padding='valid', + data_format=self._data_format) + self._spatial_pyramid.append(bin_pool_level) + + # Grouped multi-kernel Convolution. + self._multi_kernel_group_conv = MultiKernelGroupConvBlock( + output_filter_depths=self._branch_filter_depths, + kernel_sizes=self._conv_kernel_sizes, + use_sync_bn=self._use_sync_bn, + batchnorm_momentum=self._batchnorm_momentum, + batchnorm_epsilon=self._batchnorm_epsilon, + activation=self._activation, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + use_depthwise_convolution=self._use_depthwise_convolution) + + # Encoder's final 1x1 feature projection. + # Considering the relatively large #channels merged before projection, + # enlarge the projection #channels to the sum of the filter depths of + # branches. + self._output_channels = sum(self._branch_filter_depths) + # Use list manually instead of tf.keras.Sequential([]). + self._encoder_projection = [ + tf.keras.layers.Conv2D( + filters=self._output_channels, + kernel_size=(1, 1), + padding='same', + activation=None, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + use_bias=False), + self._bn_op( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon), + ] + # Use the TF2 default feature alignment rule for bilinear resizing. 
+ self._upsample = tf.keras.layers.Resizing( + height, + width, + interpolation=self._interpolation, + crop_to_aspect_ratio=False) + self._concat_layer = tf.keras.layers.Concatenate(axis=self._channel_axis) + + def call(self, + inputs: Union[tf.Tensor, Dict[str, tf.Tensor]], + training: Optional[bool] = None) -> tf.Tensor: + """Calls this MOSAIC encoder block with the given input.""" + if training is None: + training = tf.keras.backend.learning_phase() + input_from_backbone_output = ( + inputs[self._encoder_input_level] + if isinstance(inputs, dict) else inputs) + branches = [] + # Original features from the final output of the backbone. + branches.append(input_from_backbone_output) + if self._spatial_pyramid: + for bin_pool_level in self._spatial_pyramid: + x = input_from_backbone_output + x = bin_pool_level(x) + x = self._multi_kernel_group_conv(x, training=training) + x = self._upsample(x) + branches.append(x) + if self._global_pool_branch is not None: + x = input_from_backbone_output + for layer in self._global_pool_branch: + x = layer(x, training=training) + x = self._activation_fn(x) + x = self._upsample(x) + branches.append(x) + x = self._concat_layer(branches) + for layer in self._encoder_projection: + x = layer(x, training=training) + x = self._activation_fn(x) + if self._encoder_end_dropout_layer is not None: + x = self._encoder_end_dropout_layer(x, training=training) + return x + + def get_config(self) -> Dict[str, Any]: + """Returns a config dictionary for initialization from serialization.""" + config = { + 'encoder_input_level': self._encoder_input_level, + 'branch_filter_depths': self._branch_filter_depths, + 'conv_kernel_sizes': self._conv_kernel_sizes, + 'pyramid_pool_bin_nums': self._pyramid_pool_bin_nums, + 'use_sync_bn': self._use_sync_bn, + 'batchnorm_momentum': self._batchnorm_momentum, + 'batchnorm_epsilon': self._batchnorm_epsilon, + 'activation': self._activation, + 'dropout_rate': self._dropout_rate, + 'kernel_initializer': 
self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'interpolation': self._interpolation, + 'use_depthwise_convolution': self._use_depthwise_convolution, + } + base_config = super().get_config() + base_config.update(config) + return base_config + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class DecoderSumMergeBlock(tf.keras.layers.Layer): + """Implements the decoder feature sum merge block of MOSAIC model. + + This block is used in the decoder of segmentation head introduced in MOSAIC. + It essentially merges a high-resolution feature map of a low semantic level + and a low-resolution feature map of a higher semantic level by 'Sum-Merge'. + """ + + def __init__( + self, + decoder_projected_depth: int, + output_size: Tuple[int, int] = (0, 0), + use_sync_bn: bool = False, + batchnorm_momentum: float = 0.99, + batchnorm_epsilon: float = 0.001, + activation: str = 'relu', + kernel_initializer: str = 'GlorotUniform', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + interpolation: str = 'bilinear', + **kwargs): + """Initialize a sum-merge block for one decoder stage. + + Args: + decoder_projected_depth: An integer representing the number of output + channels of this sum-merge block in the decoder. + output_size: A Tuple of integers representing the output height and width + of the feature maps from this sum-merge block. Defaults to (0, 0), + where the output size is set the same as the high-resolution branch. + use_sync_bn: A bool, whether or not to use sync batch normalization. + batchnorm_momentum: A float for the momentum in BatchNorm. Defaults to + 0.99. + batchnorm_epsilon: A float for the epsilon value in BatchNorm. Defaults to + 0.001. + activation: A `str` for the activation function type. Defaults to 'relu'. + kernel_initializer: Kernel initializer for conv layers. Defaults to + `glorot_uniform`. + kernel_regularizer: Kernel regularizer for conv layers. Defaults to None. 
+ interpolation: The interpolation method for upsampling. Defaults to + `bilinear`. + **kwargs: Other keyword arguments for the layer. + """ + super(DecoderSumMergeBlock, self).__init__(**kwargs) + + self._decoder_projected_depth = decoder_projected_depth + self._output_size = output_size + self._low_res_branch = [] + self._upsample_low_res = None + self._high_res_branch = [] + self._upsample_high_res = None + + self._use_sync_bn = use_sync_bn + self._batchnorm_momentum = batchnorm_momentum + self._batchnorm_epsilon = batchnorm_epsilon + self._activation = activation + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._interpolation = interpolation + # Apply BN before activation. Putting BN between conv and activation also + # helps quantization where conv+bn+activation are fused into a single op. + self._activation_fn = tf_utils.get_activation(activation) + if self._use_sync_bn: + self._bn_op = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._bn_op = tf.keras.layers.BatchNormalization + + self._bn_axis = ( + -1 + if tf.keras.backend.image_data_format() == 'channels_last' else 1) + self._channel_axis = ( + -1 + if tf.keras.backend.image_data_format() == 'channels_last' else 1) + self._add_layer = tf.keras.layers.Add() + + def build( + self, + input_shape: Tuple[tf.TensorShape, tf.TensorShape]) -> None: + """Builds the block with the given input shape.""" + # Assume same-level backbone features are concatenated before input.
+ low_res_input_shape = input_shape[0] + high_res_input_shape = input_shape[1] + low_res_channels = low_res_input_shape[self._channel_axis] + high_res_channels = high_res_input_shape[self._channel_axis] + + if low_res_channels != self._decoder_projected_depth: + low_res_feature_conv = tf.keras.layers.Conv2D( + filters=self._decoder_projected_depth, + kernel_size=(1, 1), + padding='same', + kernel_regularizer=self._kernel_regularizer, + kernel_initializer=self._kernel_initializer, + activation=None, + use_bias=False) + batchnorm_op = self._bn_op( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + self._low_res_branch.extend([ + low_res_feature_conv, + batchnorm_op, + ]) + if high_res_channels != self._decoder_projected_depth: + high_res_feature_conv = tf.keras.layers.Conv2D( + filters=self._decoder_projected_depth, + kernel_size=(1, 1), + padding='same', + kernel_regularizer=self._kernel_regularizer, + kernel_initializer=self._kernel_initializer, + activation=None, + use_bias=False) + batchnorm_op_high = self._bn_op( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + self._high_res_branch.extend([ + high_res_feature_conv, + batchnorm_op_high, + ]) + # Resize feature maps. 
+ if tf.keras.backend.image_data_format() == 'channels_last': + low_res_height = low_res_input_shape[1] + low_res_width = low_res_input_shape[2] + high_res_height = high_res_input_shape[1] + high_res_width = high_res_input_shape[2] + else: + low_res_height = low_res_input_shape[2] + low_res_width = low_res_input_shape[3] + high_res_height = high_res_input_shape[2] + high_res_width = high_res_input_shape[3] + if (self._output_size[0] == 0 or self._output_size[1] == 0): + self._output_size = (high_res_height, high_res_width) + if (low_res_height != self._output_size[0] or + low_res_width != self._output_size[1]): + self._upsample_low_res = tf.keras.layers.Resizing( + self._output_size[0], + self._output_size[1], + interpolation=self._interpolation, + crop_to_aspect_ratio=False) + if (high_res_height != self._output_size[0] or + high_res_width != self._output_size[1]): + self._upsample_high_res = tf.keras.layers.Resizing( + self._output_size[0], + self._output_size[1], + interpolation=self._interpolation, + crop_to_aspect_ratio=False) + + def call(self, + inputs: Tuple[tf.Tensor, tf.Tensor], + training: Optional[bool] = None) -> tf.Tensor: + """Calls this decoder sum-merge block with the given input. + + Args: + inputs: A Tuple of tensors consisting of a low-resolution higher-semantic + level feature map from the encoder as the first item and a higher + resolution lower-level feature map from the backbone as the second item. + training: a `bool` indicating whether it is in `training` mode. + Note: the first item of the input Tuple takes a lower-resolution feature map + and the second item of the input Tuple takes a higher-resolution branch. + + Returns: + A tensor representing the sum-merged decoder feature map. 
+ """ + if training is None: + training = tf.keras.backend.learning_phase() + x_low_res = inputs[0] + x_high_res = inputs[1] + if self._low_res_branch: + for layer in self._low_res_branch: + x_low_res = layer(x_low_res, training=training) + x_low_res = self._activation_fn(x_low_res) + if self._high_res_branch: + for layer in self._high_res_branch: + x_high_res = layer(x_high_res, training=training) + x_high_res = self._activation_fn(x_high_res) + if self._upsample_low_res is not None: + x_low_res = self._upsample_low_res(x_low_res) + if self._upsample_high_res is not None: + x_high_res = self._upsample_high_res(x_high_res) + output = self._add_layer([x_low_res, x_high_res]) + return output + + def get_config(self) -> Dict[str, Any]: + """Returns a config dictionary for initialization from serialization.""" + config = { + 'decoder_projected_depth': self._decoder_projected_depth, + 'output_size': self._output_size, + 'use_sync_bn': self._use_sync_bn, + 'batchnorm_momentum': self._batchnorm_momentum, + 'batchnorm_epsilon': self._batchnorm_epsilon, + 'activation': self._activation, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'interpolation': self._interpolation, + } + base_config = super(DecoderSumMergeBlock, self).get_config() + base_config.update(config) + return base_config + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class DecoderConcatMergeBlock(tf.keras.layers.Layer): + """Implements the decoder feature concat merge block of MOSAIC model. + + This block is used in the decoder of segmentation head introduced in MOSAIC. + It essentially merges a high-resolution feature map of a low semantic level + and a low-resolution feature of a higher semantic level by 'Concat-Merge'. 
+ """ + + def __init__( + self, + decoder_internal_depth: int, + decoder_projected_depth: int, + output_size: Tuple[int, int] = (0, 0), + use_sync_bn: bool = False, + batchnorm_momentum: float = 0.99, + batchnorm_epsilon: float = 0.001, + activation: str = 'relu', + kernel_initializer: str = 'GlorotUniform', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + interpolation: str = 'bilinear', + **kwargs): + """Initializes a concat-merge block for one decoder stage. + + Args: + decoder_internal_depth: An integer representing the number of internal + channels of this concat-merge block in the decoder. + decoder_projected_depth: An integer representing the number of output + channels of this concat-merge block in the decoder. + output_size: A Tuple of integers representing the output height and width + of the feature maps from this concat-merge block. Defaults to (0, 0), + where the output size is set the same as the high-resolution branch. + use_sync_bn: A bool, whether or not to use sync batch normalization. + batchnorm_momentum: A float for the momentum in BatchNorm. Defaults to + 0.99. + batchnorm_epsilon: A float for the epsilon value in BatchNorm. Defaults to + 0.001. + activation: A `str` for the activation function type. Defaults to 'relu'. + kernel_initializer: Kernel initializer for conv layers. Defaults to + `glorot_uniform`. + kernel_regularizer: Kernel regularizer for conv layers. Defaults to None. + interpolation: The interpolation method for upsampling. Defaults to + `bilinear`. + **kwargs: Other keyword arguments for the layer. 
+ """ + super(DecoderConcatMergeBlock, self).__init__(**kwargs) + + self._decoder_internal_depth = decoder_internal_depth + self._decoder_projected_depth = decoder_projected_depth + self._output_size = output_size + self._upsample_low_res = None + self._upsample_high_res = None + + self._use_sync_bn = use_sync_bn + self._batchnorm_momentum = batchnorm_momentum + self._batchnorm_epsilon = batchnorm_epsilon + self._activation = activation + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._interpolation = interpolation + # Apply BN before activation. Putting BN between conv and activation also + # helps quantization where conv+bn+activation are fused into a single op. + self._activation_fn = tf_utils.get_activation(activation) + if self._use_sync_bn: + self._bn_op = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._bn_op = tf.keras.layers.BatchNormalization + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + self._channel_axis = -1 + else: + self._bn_axis = 1 + self._channel_axis = 1 + + def build( + self, + input_shape: Tuple[tf.TensorShape, tf.TensorShape]) -> None: + """Builds this block with the given input shape.""" + # Assume same-level backbone features are concatenated before input. + low_res_input_shape = input_shape[0] + high_res_input_shape = input_shape[1] + # Set up resizing of feature maps before concatenation.
+ if tf.keras.backend.image_data_format() == 'channels_last': + low_res_height = low_res_input_shape[1] + low_res_width = low_res_input_shape[2] + high_res_height = high_res_input_shape[1] + high_res_width = high_res_input_shape[2] + else: + low_res_height = low_res_input_shape[2] + low_res_width = low_res_input_shape[3] + high_res_height = high_res_input_shape[2] + high_res_width = high_res_input_shape[3] + if (self._output_size[0] == 0 or self._output_size[1] == 0): + self._output_size = (high_res_height, high_res_width) + if (low_res_height != self._output_size[0] or + low_res_width != self._output_size[1]): + self._upsample_low_res = tf.keras.layers.Resizing( + self._output_size[0], + self._output_size[1], + interpolation=self._interpolation, + crop_to_aspect_ratio=False) + if (high_res_height != self._output_size[0] or + high_res_width != self._output_size[1]): + self._upsample_high_res = tf.keras.layers.Resizing( + self._output_size[0], + self._output_size[1], + interpolation=self._interpolation, + crop_to_aspect_ratio=False) + # Set up a 3-layer separable convolution block, i.e. + # 1x1->BN->RELU + Depthwise->BN->RELU + 1x1->BN->RELU.
+ initial_feature_conv = tf.keras.layers.Conv2D( + filters=self._decoder_internal_depth, + kernel_size=(1, 1), + padding='same', + kernel_regularizer=self._kernel_regularizer, + kernel_initializer=self._kernel_initializer, + activation=None, + use_bias=False) + batchnorm_op1 = self._bn_op( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + activation1 = self._activation_fn + depthwise_conv = tf.keras.layers.DepthwiseConv2D( + kernel_size=(3, 3), + depth_multiplier=1, + padding='same', + depthwise_regularizer=self._kernel_regularizer, + depthwise_initializer=self._kernel_initializer, + use_bias=False) + batchnorm_op2 = self._bn_op( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + activation2 = self._activation_fn + project_feature_conv = tf.keras.layers.Conv2D( + filters=self._decoder_projected_depth, + kernel_size=(1, 1), + padding='same', + kernel_regularizer=self._kernel_regularizer, + kernel_initializer=self._kernel_initializer, + activation=None, + use_bias=False) + batchnorm_op3 = self._bn_op( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + activation3 = self._activation_fn + self._feature_fusion_block = [ + initial_feature_conv, + batchnorm_op1, + activation1, + depthwise_conv, + batchnorm_op2, + activation2, + project_feature_conv, + batchnorm_op3, + activation3, + ] + self._concat_layer = tf.keras.layers.Concatenate(axis=self._channel_axis) + + def call(self, + inputs: Tuple[tf.Tensor, tf.Tensor], + training: Optional[bool] = None) -> tf.Tensor: + """Calls this concat-merge block with the given inputs. + + Args: + inputs: A Tuple of tensors consisting of a higher-level lower-resolution + feature map from the encoder as the first item and a lower-level + higher-resolution feature map from the backbone as the second item. 
+ training: a `Boolean` indicating whether it is in `training` mode.
+ + Returns: + A tensor representing the concat-merged decoder feature map. + """ + low_res_input = inputs[0] + high_res_input = inputs[1] + if self._upsample_low_res is not None: + low_res_input = self._upsample_low_res(low_res_input) + if self._upsample_high_res is not None: + high_res_input = self._upsample_high_res(high_res_input) + decoder_feature_list = [low_res_input, high_res_input] + x = self._concat_layer(decoder_feature_list) + for layer in self._feature_fusion_block: + if isinstance(layer, tf.keras.layers.Layer): + x = layer(x, training=training) + else: + x = layer(x) + return x + + def get_config(self) -> Dict[str, Any]: + """Returns a config dictionary for initialization from serialization.""" + config = { + 'decoder_internal_depth': self._decoder_internal_depth, + 'decoder_projected_depth': self._decoder_projected_depth, + 'output_size': self._output_size, + 'use_sync_bn': self._use_sync_bn, + 'batchnorm_momentum': self._batchnorm_momentum, + 'batchnorm_epsilon': self._batchnorm_epsilon, + 'activation': self._activation, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'interpolation': self._interpolation, + } + base_config = super(DecoderConcatMergeBlock, self).get_config() + base_config.update(config) + return base_config diff --git a/official/projects/mosaic/modeling/mosaic_blocks_test.py b/official/projects/mosaic/modeling/mosaic_blocks_test.py new file mode 100644 index 0000000000000000000000000000000000000000..9e1f168dc165275cb8da1fda8556999b199de86b --- /dev/null +++ b/official/projects/mosaic/modeling/mosaic_blocks_test.py @@ -0,0 +1,100 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for mosaic_blocks.""" + +# Import libraries +from absl.testing import parameterized +import tensorflow as tf + +from official.projects.mosaic.modeling import mosaic_blocks + + +class MosaicBlocksTest(parameterized.TestCase, tf.test.TestCase): + + def test_multi_kernel_group_conv_block(self): + block = mosaic_blocks.MultiKernelGroupConvBlock([64, 64], [3, 5]) + inputs = tf.ones([1, 4, 4, 448]) + outputs = block(inputs) + self.assertAllEqual(outputs.shape, [1, 4, 4, 128]) + + def test_mosaic_encoder_block(self): + block = mosaic_blocks.MosaicEncoderBlock( + encoder_input_level=4, + branch_filter_depths=[64, 64], + conv_kernel_sizes=[3, 5], + pyramid_pool_bin_nums=[1, 4, 8, 16]) + inputs = tf.ones([1, 32, 32, 448]) + outputs = block(inputs) + self.assertAllEqual(outputs.shape, [1, 32, 32, 128]) + + def test_mosaic_encoder_block_odd_input_overlap_pool(self): + block = mosaic_blocks.MosaicEncoderBlock( + encoder_input_level=4, + branch_filter_depths=[64, 64], + conv_kernel_sizes=[3, 5], + pyramid_pool_bin_nums=[1, 4, 8, 16]) + inputs = tf.ones([1, 31, 31, 448]) + outputs = block(inputs) + self.assertAllEqual(outputs.shape, [1, 31, 31, 128]) + + def test_mosaic_encoder_non_separable_block(self): + block = mosaic_blocks.MosaicEncoderBlock( + encoder_input_level=4, + branch_filter_depths=[64, 64], + conv_kernel_sizes=[3, 5], + pyramid_pool_bin_nums=[1, 4, 8, 16], + use_depthwise_convolution=False) + inputs = tf.ones([1, 32, 32, 448]) + outputs = block(inputs) + self.assertAllEqual(outputs.shape, [1, 32, 32, 128]) + + def 
test_mosaic_decoder_concat_merge_block(self): + concat_merge_block = mosaic_blocks.DecoderConcatMergeBlock(64, 32, [64, 64]) + inputs = [tf.ones([1, 32, 32, 128]), tf.ones([1, 64, 64, 192])] + outputs = concat_merge_block(inputs) + self.assertAllEqual(outputs.shape, [1, 64, 64, 32]) + + def test_mosaic_decoder_concat_merge_block_default_output_size(self): + concat_merge_block = mosaic_blocks.DecoderConcatMergeBlock(64, 32) + inputs = [tf.ones([1, 32, 32, 128]), tf.ones([1, 64, 64, 192])] + outputs = concat_merge_block(inputs) + self.assertAllEqual(outputs.shape, [1, 64, 64, 32]) + + def test_mosaic_decoder_concat_merge_block_default_output_size_4x(self): + concat_merge_block = mosaic_blocks.DecoderConcatMergeBlock(64, 32) + inputs = [tf.ones([1, 32, 32, 128]), tf.ones([1, 128, 128, 192])] + outputs = concat_merge_block(inputs) + self.assertAllEqual(outputs.shape, [1, 128, 128, 32]) + + def test_mosaic_decoder_concat_merge_block_default_output_size_4x_rec(self): + concat_merge_block = mosaic_blocks.DecoderConcatMergeBlock(64, 32) + inputs = [tf.ones([1, 32, 64, 128]), tf.ones([1, 128, 256, 64])] + outputs = concat_merge_block(inputs) + self.assertAllEqual(outputs.shape, [1, 128, 256, 32]) + + def test_mosaic_decoder_sum_merge_block(self): + sum_merge_block = mosaic_blocks.DecoderSumMergeBlock(32, [128, 128]) + inputs = [tf.ones([1, 64, 64, 32]), tf.ones([1, 128, 128, 64])] + outputs = sum_merge_block(inputs) + self.assertAllEqual(outputs.shape, [1, 128, 128, 32]) + + def test_mosaic_decoder_sum_merge_block_default_output_size(self): + sum_merge_block = mosaic_blocks.DecoderSumMergeBlock(32) + inputs = [tf.ones([1, 64, 64, 32]), tf.ones([1, 128, 128, 64])] + outputs = sum_merge_block(inputs) + self.assertAllEqual(outputs.shape, [1, 128, 128, 32]) + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/mosaic/modeling/mosaic_head.py b/official/projects/mosaic/modeling/mosaic_head.py new file mode 100644 index
0000000000000000000000000000000000000000..79e16ecf2a4464f4cbf79034fc420f34e44c1c06 --- /dev/null +++ b/official/projects/mosaic/modeling/mosaic_head.py @@ -0,0 +1,242 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains definitions of segmentation head of the MOSAIC model.""" +from typing import Any, Dict, List, Mapping, Optional, Tuple, Union + +import tensorflow as tf + +from official.modeling import tf_utils +from official.projects.mosaic.modeling import mosaic_blocks + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class MosaicDecoderHead(tf.keras.layers.Layer): + """Creates a MOSAIC decoder in segmentation head. 
+ + Reference: + [MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded + Context](https://arxiv.org/pdf/2112.11623.pdf) + """ + + def __init__( + self, + num_classes: int, + decoder_input_levels: Optional[List[str]] = None, + decoder_stage_merge_styles: Optional[List[str]] = None, + decoder_filters: Optional[List[int]] = None, + decoder_projected_filters: Optional[List[int]] = None, + encoder_end_level: Optional[int] = 4, + use_additional_classifier_layer: bool = False, + classifier_kernel_size: int = 1, + activation: str = 'relu', + use_sync_bn: bool = False, + batchnorm_momentum: float = 0.99, + batchnorm_epsilon: float = 0.001, + kernel_initializer: str = 'GlorotUniform', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + interpolation: str = 'bilinear', + bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + **kwargs): + """Initializes a MOSAIC segmentation head. + + Args: + num_classes: An `int` number of mask classification categories. The number + of classes does not include background class. + decoder_input_levels: A list of `str` specifying additional + input levels from the backbone outputs for mask refinement in decoder. + decoder_stage_merge_styles: A list of `str` specifying the merge style at + each stage of the decoder; merge styles can be 'concat_merge' or + 'sum_merge'. + decoder_filters: A list of integers specifying the number of channels used + at each decoder stage. Note: this only has an effect if the decoder merge + style is 'concat_merge'. + decoder_projected_filters: A list of integers specifying the number of + projected channels at the end of each decoder stage. + encoder_end_level: An optional integer specifying the output level of the + encoder stage, which is used if the input from the encoder to the + decoder head is a dictionary. + use_additional_classifier_layer: A `bool` specifying whether to use an + additional classifier layer or not.
It must be True if the final decoder + projected filters do not match the `num_classes`. + classifier_kernel_size: An `int` number to specify the kernel size of the + classifier layer. + activation: A `str` that indicates which activation is used, e.g. 'relu', + 'swish', etc. + use_sync_bn: A `bool` that indicates whether to use synchronized batch + normalization across different replicas. + batchnorm_momentum: A `float` of normalization momentum for the moving + average. + batchnorm_epsilon: A `float` added to variance to avoid dividing by zero. + kernel_initializer: Kernel initializer for conv layers. Defaults to + `glorot_uniform`. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default is None. + interpolation: The interpolation method for upsampling. Defaults to + `bilinear`. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + **kwargs: Additional keyword arguments to be passed. + """ + super(MosaicDecoderHead, self).__init__(**kwargs) + + # Assuming 'decoder_input_levels' are sorted in descending order and the + # other settings are listed in the same order as 'decoder_input_levels'.
+ if decoder_input_levels is None: + decoder_input_levels = ['3', '2'] + if decoder_stage_merge_styles is None: + decoder_stage_merge_styles = ['concat_merge', 'sum_merge'] + if decoder_filters is None: + decoder_filters = [64, 64] + if decoder_projected_filters is None: + decoder_projected_filters = [32, 32] + self._decoder_input_levels = decoder_input_levels + self._decoder_stage_merge_styles = decoder_stage_merge_styles + self._decoder_filters = decoder_filters + self._decoder_projected_filters = decoder_projected_filters + if (len(decoder_input_levels) != len(decoder_stage_merge_styles) or + len(decoder_input_levels) != len(decoder_filters) or + len(decoder_input_levels) != len(decoder_projected_filters)): + raise ValueError('The number of Decoder inputs and settings must match.') + self._merge_stages = [] + for (stage_merge_style, decoder_filter, + decoder_projected_filter) in zip(decoder_stage_merge_styles, + decoder_filters, + decoder_projected_filters): + if stage_merge_style == 'concat_merge': + concat_merge_stage = mosaic_blocks.DecoderConcatMergeBlock( + decoder_internal_depth=decoder_filter, + decoder_projected_depth=decoder_projected_filter, + output_size=(0, 0), + use_sync_bn=use_sync_bn, + batchnorm_momentum=batchnorm_momentum, + batchnorm_epsilon=batchnorm_epsilon, + activation=activation, + kernel_initializer=kernel_initializer, + kernel_regularizer=kernel_regularizer, + interpolation=interpolation) + self._merge_stages.append(concat_merge_stage) + elif stage_merge_style == 'sum_merge': + sum_merge_stage = mosaic_blocks.DecoderSumMergeBlock( + decoder_projected_depth=decoder_projected_filter, + output_size=(0, 0), + use_sync_bn=use_sync_bn, + batchnorm_momentum=batchnorm_momentum, + batchnorm_epsilon=batchnorm_epsilon, + activation=activation, + kernel_initializer=kernel_initializer, + kernel_regularizer=kernel_regularizer, + interpolation=interpolation) + self._merge_stages.append(sum_merge_stage) + else: + raise ValueError( + 'A stage merge style 
in MOSAIC Decoder can only be concat_merge ' + 'or sum_merge.') + + # Concat merge or sum merge does not require an additional classifier layer + # unless the final decoder projected filter does not match num_classes. + final_decoder_projected_filter = decoder_projected_filters[-1] + if (final_decoder_projected_filter != num_classes and + not use_additional_classifier_layer): + raise ValueError('Additional classifier layer is needed if final decoder ' + 'projected filters do not match num_classes!') + self._use_additional_classifier_layer = use_additional_classifier_layer + if use_additional_classifier_layer: + # This additional classification layer uses different kernel + # initializers and bias compared to earlier blocks. + self._pixelwise_classifier = tf.keras.layers.Conv2D( + name='pixelwise_classifier', + filters=num_classes, + kernel_size=classifier_kernel_size, + padding='same', + bias_initializer=tf.zeros_initializer(), + kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01), + kernel_regularizer=kernel_regularizer, + bias_regularizer=bias_regularizer, + use_bias=True) + self._activation_fn = tf_utils.get_activation(activation) + + self._config_dict = { + 'num_classes': num_classes, + 'decoder_input_levels': decoder_input_levels, + 'decoder_stage_merge_styles': decoder_stage_merge_styles, + 'decoder_filters': decoder_filters, + 'decoder_projected_filters': decoder_projected_filters, + 'encoder_end_level': encoder_end_level, + 'use_additional_classifier_layer': use_additional_classifier_layer, + 'classifier_kernel_size': classifier_kernel_size, + 'activation': activation, + 'use_sync_bn': use_sync_bn, + 'batchnorm_momentum': batchnorm_momentum, + 'batchnorm_epsilon': batchnorm_epsilon, + 'kernel_initializer': kernel_initializer, + 'kernel_regularizer': kernel_regularizer, + 'interpolation': interpolation, + 'bias_regularizer': bias_regularizer + } + + def call(self, + inputs: Tuple[Union[tf.Tensor, Mapping[str, tf.Tensor]], + Union[tf.Tensor,
Mapping[str, tf.Tensor]]], + training: Optional[bool] = None) -> tf.Tensor: + """Forward pass of the segmentation head. + + It supports a tuple of 2 elements. Each element is a tensor or a tensor + dictionary. The first holds the final (low-resolution) encoder endpoints, + and the second holds the higher-resolution backbone endpoints. + When inputs are tensors, they are from a single level of feature maps. + When inputs are dictionaries, they contain multiple levels of feature maps, + where the key is the level/index of the feature map. + Note: 'level' denotes the number of 2x downsampling steps in the backbone. + + Args: + inputs: A tuple of 2 elements, each element can either be a tensor + representing feature maps or a dictionary of tensors: + - key: A `str` of the level of the multilevel features. + - values: A `tf.Tensor` of the feature map tensors. + The first holds encoder endpoints and the second backbone endpoints. + training: a `Boolean` indicating whether it is in `training` mode. + Returns: + segmentation mask prediction logits: A `tf.Tensor` representing the + output logits before the final segmentation mask.
+ """ + + encoder_outputs = inputs[0] + backbone_outputs = inputs[1] + y = encoder_outputs[str( + self._config_dict['encoder_end_level'])] if isinstance( + encoder_outputs, dict) else encoder_outputs + if isinstance(backbone_outputs, dict): + for level, merge_stage in zip( + self._decoder_input_levels, self._merge_stages): + x = backbone_outputs[str(level)] + y = merge_stage([y, x], training=training) + else: + x = backbone_outputs + y = self._merge_stages[0]([y, x], training=training) + + if self._use_additional_classifier_layer: + y = self._pixelwise_classifier(y) + y = self._activation_fn(y) + + return y + + def get_config(self) -> Dict[str, Any]: + """Returns a config dictionary for initialization from serialization.""" + base_config = super().get_config() + base_config.update(self._config_dict) + return base_config + + @classmethod + def from_config(cls, config: Dict[str, Any]): + return cls(**config) diff --git a/official/projects/mosaic/modeling/mosaic_head_test.py b/official/projects/mosaic/modeling/mosaic_head_test.py new file mode 100644 index 0000000000000000000000000000000000000000..b8e15969181f6667e2254bbfb30ce9a75ab2b9ef --- /dev/null +++ b/official/projects/mosaic/modeling/mosaic_head_test.py @@ -0,0 +1,63 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for mosaic_head.""" + +# Import libraries +from absl.testing import parameterized +import tensorflow as tf + +from official.projects.mosaic.modeling import mosaic_head + + +class MosaicBlocksTest(parameterized.TestCase, tf.test.TestCase): + + def test_mosaic_head(self): + decoder_head = mosaic_head.MosaicDecoderHead( + num_classes=32, + decoder_input_levels=['3', '2'], + decoder_stage_merge_styles=['concat_merge', 'sum_merge'], + decoder_filters=[64, 64], + decoder_projected_filters=[32, 32]) + inputs = [ + tf.ones([1, 32, 32, 128]), { + '2': tf.ones([1, 128, 128, 64]), + '3': tf.ones([1, 64, 64, 192]) + } + ] + outputs = decoder_head(inputs) + self.assertAllEqual(outputs.shape, [1, 128, 128, 32]) + + def test_mosaic_head_3laterals(self): + decoder_head = mosaic_head.MosaicDecoderHead( + num_classes=32, + decoder_input_levels=[3, 2, 1], + decoder_stage_merge_styles=[ + 'concat_merge', 'concat_merge', 'sum_merge' + ], + decoder_filters=[64, 64, 64], + decoder_projected_filters=[32, 32, 32]) + inputs = [ + tf.ones([1, 32, 32, 128]), { + '1': tf.ones([1, 256, 256, 64]), + '2': tf.ones([1, 128, 128, 64]), + '3': tf.ones([1, 64, 64, 192]) + } + ] + outputs = decoder_head(inputs) + self.assertAllEqual(outputs.shape, [1, 256, 256, 32]) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/mosaic/modeling/mosaic_model.py b/official/projects/mosaic/modeling/mosaic_model.py new file mode 100644 index 0000000000000000000000000000000000000000..1d9a750c3da5acae201baf8bdf1e4111014a8432 --- /dev/null +++ b/official/projects/mosaic/modeling/mosaic_model.py @@ -0,0 +1,152 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Builds the overall MOSAIC segmentation models.""" +from typing import Any, Dict, Optional, Union + +import tensorflow as tf +from official.projects.mosaic.configs import mosaic_config +from official.projects.mosaic.modeling import mosaic_blocks +from official.projects.mosaic.modeling import mosaic_head +from official.vision.modeling import backbones + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class MosaicSegmentationModel(tf.keras.Model): + """A model class for segmentation using MOSAIC. + + Input images are passed through a backbone first. A MOSAIC neck encoder + network is then applied, and finally a MOSAIC segmentation head is applied on + the outputs of the backbone and neck encoder network. Feature fusion and + decoding are done in the segmentation head. + + Reference: + [MOSAIC: Mobile Segmentation via decoding Aggregated Information and encoded + Context](https://arxiv.org/pdf/2112.11623.pdf) + """ + + def __init__(self, + backbone: tf.keras.Model, + head: tf.keras.layers.Layer, + neck: Optional[tf.keras.layers.Layer] = None, + **kwargs): + """Segmentation initialization function. + + Args: + backbone: A backbone network. + head: A segmentation head, e.g. MOSAIC decoder. + neck: An optional neck encoder network, e.g. MOSAIC encoder. If it is not + provided, the decoder head will be connected directly with the backbone. + **kwargs: keyword arguments to be passed.
+ """ + super(MosaicSegmentationModel, self).__init__(**kwargs) + self._config_dict = { + 'backbone': backbone, + 'neck': neck, + 'head': head, + } + self.backbone = backbone + self.neck = neck + self.head = head + + def call(self, + inputs: tf.Tensor, + training: Optional[bool] = None) -> Dict[str, tf.Tensor]: + backbone_features = self.backbone(inputs) + + if self.neck is not None: + neck_features = self.neck(backbone_features, training=training) + else: + neck_features = backbone_features + + logits = self.head([neck_features, backbone_features], training=training) + outputs = {'logits': logits} + return outputs + + @property + def checkpoint_items( + self) -> Dict[str, Union[tf.keras.Model, tf.keras.layers.Layer]]: + """Returns a dictionary of items to be additionally checkpointed.""" + items = dict(backbone=self.backbone, head=self.head) + if self.neck is not None: + items.update(neck=self.neck) + return items + + def get_config(self) -> Dict[str, Any]: + """Returns a config dictionary for initialization from serialization.""" + base_config = super().get_config() + model_config = base_config + model_config.update(self._config_dict) + return model_config + + @classmethod + def from_config(cls, config, custom_objects=None): + return cls(**config) + + +def build_mosaic_segmentation_model( + input_specs: tf.keras.layers.InputSpec, + model_config: mosaic_config.MosaicSemanticSegmentationModel, + l2_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + backbone: Optional[tf.keras.Model] = None, + neck: Optional[tf.keras.layers.Layer] = None +) -> tf.keras.Model: + """Builds MOSAIC Segmentation model.""" + norm_activation_config = model_config.norm_activation + if backbone is None: + backbone = backbones.factory.build_backbone( + input_specs=input_specs, + backbone_config=model_config.backbone, + norm_activation_config=norm_activation_config, + l2_regularizer=l2_regularizer) + + if neck is None: + neck_config = model_config.neck + neck =
mosaic_blocks.MosaicEncoderBlock( + encoder_input_level=neck_config.encoder_input_level, + branch_filter_depths=neck_config.branch_filter_depths, + conv_kernel_sizes=neck_config.conv_kernel_sizes, + pyramid_pool_bin_nums=neck_config.pyramid_pool_bin_nums, + use_sync_bn=norm_activation_config.use_sync_bn, + batchnorm_momentum=norm_activation_config.norm_momentum, + batchnorm_epsilon=norm_activation_config.norm_epsilon, + activation=neck_config.activation, + dropout_rate=neck_config.dropout_rate, + kernel_initializer=neck_config.kernel_initializer, + kernel_regularizer=l2_regularizer, + interpolation=neck_config.interpolation, + use_depthwise_convolution=neck_config.use_depthwise_convolution) + + head_config = model_config.head + head = mosaic_head.MosaicDecoderHead( + num_classes=model_config.num_classes, + decoder_input_levels=head_config.decoder_input_levels, + decoder_stage_merge_styles=head_config.decoder_stage_merge_styles, + decoder_filters=head_config.decoder_filters, + decoder_projected_filters=head_config.decoder_projected_filters, + encoder_end_level=head_config.encoder_end_level, + use_additional_classifier_layer=head_config + .use_additional_classifier_layer, + classifier_kernel_size=head_config.classifier_kernel_size, + activation=head_config.activation, + use_sync_bn=norm_activation_config.use_sync_bn, + batchnorm_momentum=norm_activation_config.norm_momentum, + batchnorm_epsilon=norm_activation_config.norm_epsilon, + kernel_initializer=head_config.kernel_initializer, + kernel_regularizer=l2_regularizer, + interpolation=head_config.interpolation) + + model = MosaicSegmentationModel( + backbone=backbone, neck=neck, head=head) + return model diff --git a/official/projects/mosaic/modeling/mosaic_model_test.py b/official/projects/mosaic/modeling/mosaic_model_test.py new file mode 100644 index 0000000000000000000000000000000000000000..9ee8d7aa212bc82c3ac4949bf5d8ff0bd01fe862 --- /dev/null +++ b/official/projects/mosaic/modeling/mosaic_model_test.py @@ -0,0 
+1,108 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for the overall MOSAIC segmentation network modeling.""" + +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from official.projects.mosaic.modeling import mosaic_blocks +from official.projects.mosaic.modeling import mosaic_head +from official.projects.mosaic.modeling import mosaic_model +from official.vision.modeling import backbones + + +class SegmentationNetworkTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (128, [4, 8], [3, 2], ['concat_merge', 'sum_merge']), + (128, [1, 4, 8], [3, 2], ['concat_merge', 'sum_merge']), + (128, [1, 4, 8], [3, 2], ['sum_merge', 'sum_merge']), + (128, [1, 4, 8], [3, 2], ['concat_merge', 'concat_merge']), + (512, [1, 4, 8, 16], [3, 2], ['concat_merge', 'sum_merge']), + (256, [4, 8], [3, 2], ['concat_merge', 'sum_merge']), + (256, [1, 4, 8], [3, 2], ['concat_merge', 'sum_merge']), + (256, [1, 4, 8, 16], [3, 2], ['concat_merge', 'sum_merge']), + ) + def test_mosaic_segmentation_model(self, + input_size, + pyramid_pool_bin_nums, + decoder_input_levels, + decoder_stage_merge_styles): + """Test for building and calling of a MOSAIC segmentation network.""" + num_classes = 32 + inputs = np.random.rand(2, input_size, input_size, 3) + tf.keras.backend.set_image_data_format('channels_last') + backbone = 
backbones.MobileNet(model_id='MobileNetMultiAVGSeg') + encoder_input_level = 4 + + neck = mosaic_blocks.MosaicEncoderBlock( + encoder_input_level=encoder_input_level, + branch_filter_depths=[64, 64], + conv_kernel_sizes=[3, 5], + pyramid_pool_bin_nums=pyramid_pool_bin_nums) + head = mosaic_head.MosaicDecoderHead( + num_classes=num_classes, + decoder_input_levels=decoder_input_levels, + decoder_stage_merge_styles=decoder_stage_merge_styles, + decoder_filters=[64, 64], + decoder_projected_filters=[32, 32]) + + model = mosaic_model.MosaicSegmentationModel( + backbone=backbone, + head=head, + neck=neck, + ) + + # Calls the MOSAIC model. + outputs = model(inputs) + level = min(decoder_input_levels) + self.assertAllEqual( + [2, input_size // (2**level), input_size // (2**level), num_classes], + outputs['logits'].numpy().shape) + + def test_serialize_deserialize(self): + """Validate the mosaic network can be serialized and deserialized.""" + num_classes = 8 + backbone = backbones.ResNet(model_id=50) + neck = mosaic_blocks.MosaicEncoderBlock( + encoder_input_level=4, + branch_filter_depths=[64, 64], + conv_kernel_sizes=[3, 5], + pyramid_pool_bin_nums=[1, 4, 8, 16]) + head = mosaic_head.MosaicDecoderHead( + num_classes=num_classes, + decoder_input_levels=[3, 2], + decoder_stage_merge_styles=['concat_merge', 'sum_merge'], + decoder_filters=[64, 64], + decoder_projected_filters=[32, 8]) + model = mosaic_model.MosaicSegmentationModel( + backbone=backbone, + head=head, + neck=neck, + ) + + config = model.get_config() + new_model = mosaic_model.MosaicSegmentationModel.from_config(config) + + # Validate that the config can be forced to JSON. + _ = new_model.to_json() + + # If the serialization was successful, the new config should match the old. 
+ self.assertAllEqual(model.get_config(), new_model.get_config()) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/mosaic/mosaic_tasks.py b/official/projects/mosaic/mosaic_tasks.py new file mode 100644 index 0000000000000000000000000000000000000000..6b2af795dde44c8e9cf6cd8b0f8f9aca8acad823 --- /dev/null +++ b/official/projects/mosaic/mosaic_tasks.py @@ -0,0 +1,96 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Task definition for image semantic segmentation with MOSAIC models.""" + +from absl import logging +import tensorflow as tf + +from official.core import task_factory +from official.projects.mosaic.configs import mosaic_config +from official.projects.mosaic.modeling import mosaic_model +from official.vision.tasks import semantic_segmentation as seg_tasks + + +@task_factory.register_task_cls(mosaic_config.MosaicSemanticSegmentationTask) +class MosaicSemanticSegmentationTask(seg_tasks.SemanticSegmentationTask): + """A task for semantic segmentation using MOSAIC model.""" + + # Note: `build_model` is overridden to add an additional `training` flag + # indicating whether the model is built for `training` or `eval`. This makes + # sure the model is initialized with the proper `input_shape` when the model + # is trained and evaluated with different + # `input_shape`.
For example, the model is trained with cropping but + # evaluated with original shape. + def build_model(self, training: bool = True) -> tf.keras.Model: + """Builds MOSAIC segmentation model.""" + input_specs = tf.keras.layers.InputSpec( + shape=[None] + self.task_config.model.input_size) + + l2_weight_decay = self.task_config.losses.l2_weight_decay + # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. + # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) + # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) + l2_regularizer = (tf.keras.regularizers.l2( + l2_weight_decay / 2.0) if l2_weight_decay else None) + + model = mosaic_model.build_mosaic_segmentation_model( + input_specs=input_specs, + model_config=self.task_config.model, + l2_regularizer=l2_regularizer) + + # Note: Create a dummy input and call model instance to initialize. + # This ensures all the layers are built; otherwise some layers may be + # missing from the model and cannot be associated with variables from + # a loaded checkpoint. The input size is determined by whether the model + # is built for performing training or eval. + if training: + input_size = self.task_config.train_data.output_size + crop_size = self.task_config.train_data.crop_size + if crop_size: + input_size = crop_size + else: + input_size = self.task_config.validation_data.output_size + dummy_input = tf.ones(shape=[1] + input_size + [3]) + model(dummy_input) + + return model + + def initialize(self, model: tf.keras.Model): + """Loads pretrained checkpoint.""" + if not self.task_config.init_checkpoint: + return + + ckpt_dir_or_file = self.task_config.init_checkpoint + if tf.io.gfile.isdir(ckpt_dir_or_file): + ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) + + # Restoring checkpoint. 
+ if 'all' in self.task_config.init_checkpoint_modules: + ckpt = tf.train.Checkpoint(**model.checkpoint_items) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + else: + ckpt_items = {} + if 'backbone' in self.task_config.init_checkpoint_modules: + ckpt_items.update(backbone=model.backbone) + if 'neck' in self.task_config.init_checkpoint_modules: + ckpt_items.update(neck=model.neck) + + ckpt = tf.train.Checkpoint(**ckpt_items) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + + logging.info('Finished loading pretrained checkpoint from %s', + ckpt_dir_or_file) diff --git a/official/projects/mosaic/mosaic_tasks_test.py b/official/projects/mosaic/mosaic_tasks_test.py new file mode 100644 index 0000000000000000000000000000000000000000..b6dd9aebd06ebd729d352b3d226eb9c18084e591 --- /dev/null +++ b/official/projects/mosaic/mosaic_tasks_test.py @@ -0,0 +1,91 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for mosaic task.""" +# pylint: disable=unused-import +import os + +from absl.testing import parameterized +import orbit +import tensorflow as tf + +from official import vision +from official.core import exp_factory +from official.modeling import optimization +from official.projects.mosaic import mosaic_tasks +from official.projects.mosaic.configs import mosaic_config as exp_cfg +from official.vision.dataloaders import tfexample_utils + + +class MosaicTaskTest(parameterized.TestCase, tf.test.TestCase): + + def _create_test_tfrecord(self, tfrecord_file, example, num_samples): + examples = [example] * num_samples + tfexample_utils.dump_to_tfrecord( + record_file=tfrecord_file, tf_examples=examples) + + @parameterized.parameters( + ('mosaic_mnv35_cityscapes', True), + ('mosaic_mnv35_cityscapes', False), + ) + def test_semantic_segmentation_task(self, test_config, is_training): + """Tests mosaic task for training and eval using toy configs.""" + input_image_size = [1024, 2048] + test_tfrecord_file = os.path.join(self.get_temp_dir(), 'seg_test.tfrecord') + example = tfexample_utils.create_segmentation_test_example( + image_height=input_image_size[0], + image_width=input_image_size[1], + image_channel=3) + self._create_test_tfrecord( + tfrecord_file=test_tfrecord_file, example=example, num_samples=10) + config = exp_factory.get_exp_config(test_config) + # Modify config to suit local testing + config.task.model.input_size = [None, None, 3] + config.trainer.steps_per_loop = 1 + config.task.train_data.global_batch_size = 1 + config.task.validation_data.global_batch_size = 1 + config.task.train_data.output_size = [1024, 2048] + config.task.validation_data.output_size = [1024, 2048] + config.task.train_data.crop_size = [512, 512] + config.task.train_data.shuffle_buffer_size = 2 + config.task.validation_data.shuffle_buffer_size = 2 + config.task.validation_data.input_path = test_tfrecord_file + config.task.train_data.input_path = test_tfrecord_file + 
config.train_steps = 1 + config.task.model.num_classes = 256 + config.task.model.head.num_classes = 256 + config.task.model.head.decoder_projected_filters = [256, 256] + + task = mosaic_tasks.MosaicSemanticSegmentationTask(config.task) + model = task.build_model(training=is_training) + metrics = task.build_metrics(training=is_training) + + strategy = tf.distribute.get_strategy() + + data_config = config.task.train_data if is_training else config.task.validation_data + dataset = orbit.utils.make_distributed_dataset(strategy, task.build_inputs, + data_config) + iterator = iter(dataset) + opt_factory = optimization.OptimizerFactory(config.trainer.optimizer_config) + optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) + + if is_training: + logs = task.train_step(next(iterator), model, optimizer, metrics=metrics) + else: + logs = task.validation_step(next(iterator), model, metrics=metrics) + + self.assertIn('loss', logs) + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/mosaic/train.py b/official/projects/mosaic/train.py new file mode 100644 index 0000000000000000000000000000000000000000..ea07b6d80ef9030304f87df9670224d7e12f4667 --- /dev/null +++ b/official/projects/mosaic/train.py @@ -0,0 +1,103 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Training driver for MOSAIC models.""" + +from absl import app +from absl import flags +import gin + +from official.common import distribute_utils +from official.common import flags as tfm_flags +from official.core import base_trainer +from official.core import config_definitions +from official.core import task_factory +from official.core import train_lib +from official.core import train_utils +from official.modeling import performance +# Import MOSAIC libraries to register the model into tf.vision +# model garden factory. +# pylint: disable=unused-import +from official.projects.mosaic import mosaic_tasks +from official.projects.mosaic.modeling import mosaic_model +from official.vision import registry_imports +# pylint: enable=unused-import + +FLAGS = flags.FLAGS + + +# Note: we override `build_trainer` because of the customized `build_model` +# method in `MosaicSemanticSegmentationTask`. +def _build_mosaic_trainer(params: config_definitions.ExperimentConfig, + task: mosaic_tasks.MosaicSemanticSegmentationTask, + model_dir: str, train: bool, + evaluate: bool) -> base_trainer.Trainer: + """Creates custom trainer.""" + checkpoint_exporter = train_lib.maybe_create_best_ckpt_exporter( + params, model_dir) + model = task.build_model(train) + optimizer = train_utils.create_optimizer(task, params) + trainer = base_trainer.Trainer( + params, + task, + model=model, + optimizer=optimizer, + train=train, + evaluate=evaluate, + checkpoint_exporter=checkpoint_exporter) + return trainer + + +def main(_): + gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) + params = train_utils.parse_configuration(FLAGS) + model_dir = FLAGS.model_dir + if 'train' in FLAGS.mode: + # Pure eval modes do not output yaml files. Otherwise continuous eval job + # may race against the train job for writing the same file. + train_utils.serialize_config(params, model_dir) + + # Sets mixed_precision policy.
Using 'mixed_float16' or 'mixed_bfloat16' + # can have significant impact on model speeds by utilizing float16 in case of + # GPUs, and bfloat16 in the case of TPUs. loss_scale takes effect only when + # dtype is float16 + if params.runtime.mixed_precision_dtype: + performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) + distribution_strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=params.runtime.distribution_strategy, + all_reduce_alg=params.runtime.all_reduce_alg, + num_gpus=params.runtime.num_gpus, + tpu_address=params.runtime.tpu) + with distribution_strategy.scope(): + task = task_factory.get_task(params.task, logging_dir=model_dir) + mosaic_trainer = _build_mosaic_trainer( + task=task, + params=params, + model_dir=model_dir, + train='train' in FLAGS.mode, + evaluate='eval' in FLAGS.mode) + train_lib.run_experiment( + distribution_strategy=distribution_strategy, + task=task, + mode=FLAGS.mode, + params=params, + model_dir=model_dir, + trainer=mosaic_trainer) + + train_utils.save_gin_config(FLAGS.mode, model_dir) + +if __name__ == '__main__': + tfm_flags.define_flags() + flags.mark_flags_as_required(['experiment', 'mode', 'model_dir']) + app.run(main) diff --git a/official/projects/movinet/README.md b/official/projects/movinet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..36bcfe89de5a2cfc53bbbd090801f1e57e7d6a85 --- /dev/null +++ b/official/projects/movinet/README.md @@ -0,0 +1,444 @@ +# Mobile Video Networks (MoViNets) + +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tensorflow/models/blob/master/official/projects/movinet/movinet_tutorial.ipynb) +[![TensorFlow Hub](https://img.shields.io/badge/TF%20Hub-Models-FF6F00?logo=tensorflow)](https://tfhub.dev/google/collections/movinet) +[![Paper](http://img.shields.io/badge/Paper-arXiv.2103.11511-B3181B?logo=arXiv)](https://arxiv.org/abs/2103.11511) + +This 
repository is the official implementation of +[MoViNets: Mobile Video Networks for Efficient Video +Recognition](https://arxiv.org/abs/2103.11511). + +- **[UPDATE 2022-03-14] Quantized TF Lite models + [available on TF Hub](https://tfhub.dev/s?deployment-format=lite&q=movinet) + (also [see table](https://tfhub.dev/google/collections/movinet) for + quantized performance)** + +

+ +

+ +Create your own video plot like the one above with this [Colab notebook](https://colab.research.google.com/github/tensorflow/models/blob/master/official/projects/movinet/tools/plot_movinet_video_stream_predictions.ipynb). + +## Description + +Mobile Video Networks (MoViNets) are efficient video classification models +runnable on mobile devices. MoViNets demonstrate state-of-the-art accuracy and +efficiency on several large-scale video action recognition datasets. + +On [Kinetics 600](https://deepmind.com/research/open-source/kinetics), +MoViNet-A6 achieves 84.8% top-1 accuracy, outperforming recent +Vision Transformer models like [ViViT](https://arxiv.org/abs/2103.15691) (83.0%) +and [VATT](https://arxiv.org/abs/2104.11178) (83.6%) without any additional +training data, while using 10x fewer FLOPs. Streaming MoViNet-A0 achieves +72% accuracy while using 3x fewer FLOPs than MobileNetV3-large (68%). + +There is a large performance gap between accurate and efficient models for +video action recognition. On the one hand, 2D MobileNet +CNNs are fast and can operate on streaming video in real time, but tend to +be noisy and inaccurate. On the other hand, 3D CNNs are accurate, but are +memory- and computation-intensive and cannot operate on streaming video. + +MoViNets bridge this gap, producing: + +- State-of-the-art efficiency and accuracy across the model family (MoViNet-A0 +to A6). +- Streaming models with 3D causal convolutions that substantially reduce memory +usage. +- Temporal ensembles of models that boost efficiency even further. + +MoViNets also improve computational efficiency by outputting high-quality +predictions frame by frame, as opposed to the traditional multi-clip evaluation +approach that performs redundant computation and limits temporal scope. + +
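Frame-by-frame prediction rests on causal temporal convolutions that carry a stream buffer between calls. The sketch below is a minimal, hypothetical illustration of that idea under simplifying assumptions: one scalar feature per frame and a single 1D temporal kernel, rather than the (2+1)D convolutions over feature maps that MoViNets actually use. It shows that a convolution which keeps only the previous k-1 frames in a buffer yields exactly the same outputs as a causal convolution over the whole clip. The names `causal_conv_clip` and `StreamingCausalConv` are illustrative and are not part of the MoViNet codebase.

```python
from collections import deque
from typing import Deque, List, Sequence


def causal_conv_clip(frames: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Causal temporal convolution applied to a whole clip at once.

    Output t depends only on frames t-k+1 .. t. Frames before the start of the
    clip are treated as zeros (left zero-padding).
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(frames)
    return [sum(kernel[j] * padded[t + j] for j in range(k)) for t in range(len(frames))]


class StreamingCausalConv:
    """The same causal convolution, fed one frame at a time.

    A buffer holds the previous k-1 frames, so earlier frames are never
    reprocessed and memory stays constant regardless of clip length.
    """

    def __init__(self, kernel: Sequence[float]) -> None:
        self.kernel = list(kernel)
        size = len(self.kernel) - 1
        # A zero-initialized buffer mirrors the zero padding in causal_conv_clip.
        self.buffer: Deque[float] = deque([0.0] * size, maxlen=size)

    def step(self, frame: float) -> float:
        window = list(self.buffer) + [frame]
        out = sum(w * x for w, x in zip(self.kernel, window))
        self.buffer.append(frame)  # the oldest frame falls off automatically
        return out


frames = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.25, 0.25, 0.5]
stream = StreamingCausalConv(kernel)
streamed = [stream.step(f) for f in frames]
assert streamed == causal_conv_clip(frames, kernel)
```

Because each step only reads the fixed-size buffer and the new frame, memory does not grow with the number of frames, which is the property the streaming models exploit.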

+ +

+ +

+ +

+ +## History + +- **2022-03-14** Support quantized TF Lite models and add/update Colab +notebooks. +- **2021-07-12** Add TF Lite support and replace 3D stream models with +mobile-friendly (2+1)D stream. +- **2021-05-30** Add streaming MoViNet checkpoints and examples. +- **2021-05-11** Initial Commit. + +## Authors and Maintainers + +* Dan Kondratyuk ([@hyperparticle](https://github.com/hyperparticle)) +* Liangzhe Yuan ([@yuanliangzhe](https://github.com/yuanliangzhe)) +* Yeqing Li ([@yeqingli](https://github.com/yeqingli)) + +## Table of Contents + +- [Requirements](#requirements) +- [Results and Pretrained Weights](#results-and-pretrained-weights) + - [Kinetics 600](#kinetics-600) + - [Kinetics 400](#kinetics-400) +- [Prediction Examples](#prediction-examples) +- [TF Lite Example](#tf-lite-example) +- [Training and Evaluation](#training-and-evaluation) +- [References](#references) +- [License](#license) +- [Citation](#citation) + +## Requirements + +[![TensorFlow 2.4](https://img.shields.io/badge/TensorFlow-2.1-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.1.0) +[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB?logo=python)](https://www.python.org/downloads/release/python-360/) + +To install requirements: + +```shell +pip install -r requirements.txt +``` + +## Results and Pretrained Weights + +[![TensorFlow Hub](https://img.shields.io/badge/TF%20Hub-Models-FF6F00?logo=tensorflow)](https://tfhub.dev/google/collections/movinet) +[![TensorBoard](https://img.shields.io/badge/TensorBoard-dev-FF6F00?logo=tensorflow)](https://tensorboard.dev/experiment/Q07RQUlVRWOY4yDw3SnSkA/) + +### Kinetics 600 + +

+ +

+ +[tensorboard.dev summary](https://tensorboard.dev/experiment/Q07RQUlVRWOY4yDw3SnSkA/) +of training runs across all models. + +The table below summarizes the performance of each model on +[Kinetics 600](https://deepmind.com/research/open-source/kinetics) +and provides links to download pretrained models. All models are evaluated on +single clips with the same resolution as training. + +Note: MoViNet-A6 can be constructed as an ensemble of MoViNet-A4 and +MoViNet-A5. + +#### Base Models + +Base models implement standard 3D convolutions without stream buffers. Base +models are not recommended for fast inference on CPU or mobile due to +limited support for +[`tf.nn.conv3d`](https://www.tensorflow.org/api_docs/python/tf/nn/conv3d). +Instead, see the [streaming models section](#streaming-models). + +| Model Name | Top-1 Accuracy | Top-5 Accuracy | Input Shape | GFLOPs\* | Checkpoint | TF Hub SavedModel | +|------------|----------------|----------------|-------------|----------|------------|-------------------| +| MoViNet-A0-Base | 72.28 | 90.92 | 50 x 172 x 172 | 2.7 | [checkpoint (12 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a0/base/kinetics-600/classification/) | +| MoViNet-A1-Base | 76.69 | 93.40 | 50 x 172 x 172 | 6.0 | [checkpoint (18 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a1_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a1/base/kinetics-600/classification/) | +| MoViNet-A2-Base | 78.62 | 94.17 | 50 x 224 x 224 | 10 | [checkpoint (20 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a2_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a2/base/kinetics-600/classification/) | +| MoViNet-A3-Base | 81.79 | 95.67 | 120 x 256 x 256 | 57 | [checkpoint (29 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a3_base.tar.gz) | 
[tfhub](https://tfhub.dev/tensorflow/movinet/a3/base/kinetics-600/classification/) | +| MoViNet-A4-Base | 83.48 | 96.16 | 80 x 290 x 290 | 110 | [checkpoint (44 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a4_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a4/base/kinetics-600/classification/) | +| MoViNet-A5-Base | 84.27 | 96.39 | 120 x 320 x 320 | 280 | [checkpoint (72 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a5_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a5/base/kinetics-600/classification/) | + +\*GFLOPs per video on Kinetics 600. + +#### Streaming Models + +Streaming models implement causal (2+1)D convolutions with stream buffers. +Streaming models use (2+1)D convolution instead of 3D to utilize optimized +[`tf.nn.conv2d`](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d) +operations, which offer fast inference on CPU. Streaming models can be run on +individual frames or on larger video clips like base models. + +Note: A3, A4, and A5 models use a positional encoding in the squeeze-excitation +blocks, while A0, A1, and A2 do not. For the smaller models, accuracy is +unaffected without positional encoding, while for the larger models accuracy is +significantly worse without positional encoding. 
+ +| Model Name | Top-1 Accuracy | Top-5 Accuracy | Input Shape\* | GFLOPs\*\* | Checkpoint | TF Hub SavedModel | +|------------|----------------|----------------|---------------|------------|------------|-------------------| +| MoViNet-A0-Stream | 72.05 | 90.63 | 50 x 172 x 172 | 2.7 | [checkpoint (12 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a0/stream/kinetics-600/classification/) | +| MoViNet-A1-Stream | 76.45 | 93.25 | 50 x 172 x 172 | 6.0 | [checkpoint (18 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a1_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a1/stream/kinetics-600/classification/) | +| MoViNet-A2-Stream | 78.40 | 94.05 | 50 x 224 x 224 | 10 | [checkpoint (20 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a2_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a2/stream/kinetics-600/classification/) | +| MoViNet-A3-Stream | 80.09 | 94.84 | 120 x 256 x 256 | 57 | [checkpoint (29 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a3_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a3/stream/kinetics-600/classification/) | +| MoViNet-A4-Stream | 81.49 | 95.66 | 80 x 290 x 290 | 110 | [checkpoint (44 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a4_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a4/stream/kinetics-600/classification/) | +| MoViNet-A5-Stream | 82.37 | 95.79 | 120 x 320 x 320 | 280 | [checkpoint (72 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a5_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a5/stream/kinetics-600/classification/) | + +\*In streaming mode, the number of frames corresponds to the total accumulated +duration of the 10-second clip. + +\*\*GFLOPs per video on Kinetics 600.
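Read together with the first footnote, the frame dimension of each input shape implies a sampling rate. As a small worked example, assuming every clip spans exactly 10 seconds as the footnote states, dividing the frame count by the clip duration gives the frame rate each streaming model sees. The helper `implied_frame_rate` and the dictionary below are hypothetical and simply restate the table.

```python
# Assumption: each clip is exactly 10 seconds long, per the footnote above.
CLIP_SECONDS = 10.0

# Frame counts copied from the "Input Shape" column of the streaming table above.
FRAMES_PER_CLIP = {
    "MoViNet-A0-Stream": 50,
    "MoViNet-A1-Stream": 50,
    "MoViNet-A2-Stream": 50,
    "MoViNet-A3-Stream": 120,
    "MoViNet-A4-Stream": 80,
    "MoViNet-A5-Stream": 120,
}


def implied_frame_rate(num_frames: int, clip_seconds: float = CLIP_SECONDS) -> float:
    """Frames per second obtained by spreading num_frames evenly over the clip."""
    return num_frames / clip_seconds


for name, num_frames in FRAMES_PER_CLIP.items():
    print(f"{name}: {implied_frame_rate(num_frames):g} fps")
```

Under this assumption, A0 to A2 see 5 frames per second, A4 sees 8, and A3 and A5 see 12.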
+ +Note: current streaming model checkpoints have been updated with a slightly +different architecture. To download the old checkpoints, insert `_legacy` before +`.tar.gz` in the URL. E.g., `movinet_a0_stream_legacy.tar.gz`. + +##### TF Lite Streaming Models + +For convenience, we provide converted TF Lite models for inference on mobile +devices. See the [TF Lite Example](#tf-lite-example) to export and run your own +models. We also provide [quantized TF Lite binaries via TF Hub](https://tfhub.dev/s?deployment-format=lite&q=movinet). + +For reference, MoViNet-A0-Stream runs with a similar latency to +[MobileNetV3-Large](https://tfhub.dev/google/imagenet/mobilenet_v3_large_100_224/classification/) +with +5% accuracy on Kinetics 600. + +| Model Name | Input Shape | Pixel 4 Latency\* | x86 Latency\* | TF Lite Binary | +|------------|-------------|-------------------|---------------|----------------| +| MoViNet-A0-Stream | 1 x 1 x 172 x 172 | 22 ms | 16 ms | [TF Lite (13 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_stream.tflite) | +| MoViNet-A1-Stream | 1 x 1 x 172 x 172 | 42 ms | 33 ms | [TF Lite (45 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a1_stream.tflite) | +| MoViNet-A2-Stream | 1 x 1 x 224 x 224 | 200 ms | 66 ms | [TF Lite (53 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a2_stream.tflite) | +| MoViNet-A3-Stream | 1 x 1 x 256 x 256 | - | 120 ms | [TF Lite (73 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a3_stream.tflite) | +| MoViNet-A4-Stream | 1 x 1 x 290 x 290 | - | 300 ms | [TF Lite (101 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a4_stream.tflite) | +| MoViNet-A5-Stream | 1 x 1 x 320 x 320 | - | 450 ms | [TF Lite (153 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a5_stream.tflite) | + +\*Single-frame latency measured with unaltered float32 operations on a +single CPU
+core. Observed latency may differ depending on hardware +configuration. Measured on a stock Pixel 4 (Android 11) and x86 Intel Xeon +W-2135 CPU. + +### Kinetics 400 + +We also have checkpoints for Kinetics 400 models available. See the Kinetics 600 +sections for more details. To load checkpoints, set `num_classes=400`. + +#### Base Models + +| Model Name | Top-1 Accuracy | Top-5 Accuracy | Input Shape | GFLOPs\* | Checkpoint | +|------------|----------------|----------------|-------------|----------|------------| +| MoViNet-A0-Base | 69.40 | 89.18 | 50 x 172 x 172 | 2.7 | [checkpoint (12 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_base_k400.tar.gz) | +| MoViNet-A1-Base | 74.57 | 92.03 | 50 x 172 x 172 | 6.0 | [checkpoint (18 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a1_base_k400.tar.gz) | +| MoViNet-A2-Base | 75.91 | 92.63 | 50 x 224 x 224 | 10 | [checkpoint (20 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a2_base_k400.tar.gz) | +| MoViNet-A3-Base | 79.34 | 94.52 | 120 x 256 x 256 | 57 | [checkpoint (29 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a3_base_k400.tar.gz) | +| MoViNet-A4-Base | 80.64 | 94.93 | 80 x 290 x 290 | 110 | [checkpoint (44 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a4_base_k400.tar.gz) | +| MoViNet-A5-Base | 81.39 | 95.06 | 120 x 320 x 320 | 280 | [checkpoint (72 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a5_base_k400.tar.gz) | + +\*GFLOPs per video on Kinetics 400. + +## Prediction Examples + +Please check out our [Colab Notebook](https://colab.research.google.com/github/tensorflow/models/blob/master/official/projects/movinet/movinet_tutorial.ipynb) +to get started with MoViNets. + +This section provides examples of how to run prediction.
+ +For **base models**, run the following: + +```python +import tensorflow as tf + +from official.projects.movinet.modeling import movinet +from official.projects.movinet.modeling import movinet_model + +# Create backbone and model. +backbone = movinet.Movinet( + model_id='a0', + causal=False, + use_external_states=False, +) +model = movinet_model.MovinetClassifier( + backbone, num_classes=600, output_states=False) + +# Create your example input here. +# Refer to the paper for recommended input shapes. +inputs = tf.ones([1, 8, 172, 172, 3]) + +# [Optional] Build the model and load a pretrained checkpoint +model.build(inputs.shape) + +checkpoint_dir = '/path/to/checkpoint' +checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir) +checkpoint = tf.train.Checkpoint(model=model) +status = checkpoint.restore(checkpoint_path) +status.assert_existing_objects_matched() + +# Run the model prediction. +output = model(inputs) +prediction = tf.argmax(output, -1) +``` + +For **streaming models**, run the following: + +```python +import tensorflow as tf + +from official.projects.movinet.modeling import movinet +from official.projects.movinet.modeling import movinet_model + +model_id = 'a0' +use_positional_encoding = model_id in {'a3', 'a4', 'a5'} + +# Create backbone and model. +backbone = movinet.Movinet( + model_id=model_id, + causal=True, + conv_type='2plus1d', + se_type='2plus3d', + activation='hard_swish', + gating_activation='hard_sigmoid', + use_positional_encoding=use_positional_encoding, + use_external_states=True, +) + +model = movinet_model.MovinetClassifier( + backbone, + num_classes=600, + output_states=True) + +# Create your example input here. +# Refer to the paper for recommended input shapes. +inputs = tf.ones([1, 8, 172, 172, 3]) + +# [Optional] Build the model and load a pretrained checkpoint. 
+model.build(inputs.shape) + +checkpoint_dir = '/path/to/checkpoint' +checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir) +checkpoint = tf.train.Checkpoint(model=model) +status = checkpoint.restore(checkpoint_path) +status.assert_existing_objects_matched() + +# Split the video into individual frames. +# Note: we can also split into larger clips (e.g., 8-frame clips). +# Running on larger clips will slightly reduce latency overhead, but +# will consume more memory. +frames = tf.split(inputs, inputs.shape[1], axis=1) + +# Initialize the dict of states. All state tensors are initially zeros. +init_states = model.init_states(tf.shape(inputs)) + +# Run the model prediction by looping over each frame. +states = init_states +predictions = [] +for frame in frames: + output, states = model({**states, 'image': frame}) + predictions.append(output) + +# The video classification will simply be the last output of the model. +final_prediction = tf.argmax(predictions[-1], -1) + +# Alternatively, we can run the network on the entire input video. +# The output should be effectively the same +# (but it may differ a small amount due to floating point errors). +non_streaming_output, _ = model({**init_states, 'image': inputs}) +non_streaming_prediction = tf.argmax(non_streaming_output, -1) +``` + +## TF Lite Example + +This section outlines an example of how to export a model to run on mobile +devices with [TF Lite](https://www.tensorflow.org/lite). + +[Optional] Streaming models are typically trained with +`conv_type = 3d_2plus1d` for better training throughput. To achieve +better inference performance on CPU, convert the `3d_2plus1d` +checkpoint so that it is compatible with the `2plus1d` graph. +You can do this by running `tools/convert_3d_2plus1d.py`. + +First, convert to [TF SavedModel](https://www.tensorflow.org/guide/saved_model) +by running `export_saved_model.py`.
+For example, for `MoViNet-A0-Stream`, run: + +```shell +python3 export_saved_model.py \ + --model_id=a0 \ + --causal=True \ + --conv_type=2plus1d \ + --se_type=2plus3d \ + --activation=hard_swish \ + --gating_activation=hard_sigmoid \ + --use_positional_encoding=False \ + --num_classes=600 \ + --batch_size=1 \ + --num_frames=1 \ + --image_size=172 \ + --bundle_input_init_states_fn=False \ + --checkpoint_path=/path/to/checkpoint \ + --export_path=/tmp/movinet_a0_stream +``` + +Then the SavedModel can be converted to TF Lite using the [`TFLiteConverter`](https://www.tensorflow.org/lite/convert): + +```python +import tensorflow as tf + +saved_model_dir = '/tmp/movinet_a0_stream' +converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir) +tflite_model = converter.convert() + +with open('/tmp/movinet_a0_stream.tflite', 'wb') as f: + f.write(tflite_model) +``` + +To run with TF Lite using [tf.lite.Interpreter](https://www.tensorflow.org/lite/guide/inference#load_and_run_a_model_in_python) +with the Python API: + +```python +import tensorflow as tf + +# Create the interpreter and signature runner +interpreter = tf.lite.Interpreter('/tmp/movinet_a0_stream.tflite') +runner = interpreter.get_signature_runner() + +# Extract state names and create the initial (zero) states +def state_name(name: str) -> str: + return name[len('serving_default_'):-len(':0')] + +init_states = { + state_name(x['name']): tf.zeros(x['shape'], dtype=x['dtype']) + for x in interpreter.get_input_details() +} +del init_states['image'] + +# Insert your video clip here +video = tf.ones([1, 8, 172, 172, 3]) +clips = tf.split(video, video.shape[1], axis=1) + +# To run on a video, pass in one frame at a time +states = init_states +for clip in clips: + # Input shape: [1, 1, 172, 172, 3] + outputs = runner(**states, image=clip) + logits = outputs.pop('logits') + states = outputs +``` + +Follow the [official guide](https://www.tensorflow.org/lite/guide) to run a +model with TF Lite on your mobile device.
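The `state_name` helper in the TF Lite snippet assumes that each signature input tensor is named `serving_default_<name>:0`. This sketch isolates that naming step, plus the removal of the `image` input, so you can check which state names the runner expects.

The tensor names in `example_details` are hypothetical placeholders. Inspect `interpreter.get_input_details()` on your own exported model to see the real ones.

```python
from typing import Dict, List


def state_name(name: str) -> str:
  """Strips the 'serving_default_' prefix and ':0' suffix from a tensor name."""
  prefix, suffix = 'serving_default_', ':0'
  if not (name.startswith(prefix) and name.endswith(suffix)):
    # Fail loudly instead of silently slicing an unexpected name.
    raise ValueError(f'Unexpected input tensor name: {name!r}')
  return name[len(prefix):-len(suffix)]


def state_names(input_details: List[Dict]) -> List[str]:
  """Returns the names of all state inputs, excluding the 'image' input."""
  names = [state_name(d['name']) for d in input_details]
  return [n for n in names if n != 'image']


# Hypothetical input details; real entries come from
# interpreter.get_input_details() and also carry 'shape' and 'dtype'.
example_details = [
    {'name': 'serving_default_image:0'},
    {'name': 'serving_default_state_a:0'},
    {'name': 'serving_default_state_b:0'},
]
print(state_names(example_details))  # ['state_a', 'state_b']
```

Unlike the one-line slice, this version raises an error when a name does not follow the assumed pattern. That makes a naming mismatch from a differently exported model easier to spot.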
+ +## Training and Evaluation + +Run the following command for continuous training and evaluation. + +```shell +MODE=train_and_eval # Can also be 'train' if using a separate evaluator job +CONFIG_FILE=official/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml +python3 official/projects/movinet/train.py \ + --experiment=movinet_kinetics600 \ + --mode=${MODE} \ + --model_dir=/tmp/movinet_a0_base/ \ + --config_file=${CONFIG_FILE} +``` + +Run the following command for evaluation. + +```shell +MODE=eval # Can also be 'eval_continuous' for use during training +CONFIG_FILE=official/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml +python3 official/projects/movinet/train.py \ + --experiment=movinet_kinetics600 \ + --mode=${MODE} \ + --model_dir=/tmp/movinet_a0_base/ \ + --config_file=${CONFIG_FILE} +``` + +## License + +[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) + +This project is licensed under the terms of the **Apache License 2.0**. + +## Citation + +If you want to cite this code in your research paper, please use the following +information. + +``` +@article{kondratyuk2021movinets, + title={MoViNets: Mobile Video Networks for Efficient Video Recognition}, + author={Dan Kondratyuk and Liangzhe Yuan and Yandong Li and Li Zhang and Matthew Brown and Boqing Gong}, + journal={arXiv preprint arXiv:2103.11511}, + year={2021} +} +``` diff --git a/official/projects/movinet/__init__.py b/official/projects/movinet/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/movinet/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/movinet/configs/__init__.py b/official/projects/movinet/configs/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/movinet/configs/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/movinet/configs/movinet.py b/official/projects/movinet/configs/movinet.py new file mode 100644 index 0000000000000000000000000000000000000000..8db85b03f7c84f413ed87cbf0ef25d6e1b067068 --- /dev/null +++ b/official/projects/movinet/configs/movinet.py @@ -0,0 +1,149 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Definitions for MoViNet structures. + +Reference: "MoViNets: Mobile Video Networks for Efficient Video Recognition" +https://arxiv.org/pdf/2103.11511.pdf + +MoViNets are efficient video classification networks that are part of a model +family, ranging from the smallest model, MoViNet-A0, to the largest model, +MoViNet-A6. Each model has various width, depth, input resolution, and input +frame-rate associated with them. See the main paper for more details. +""" + +import dataclasses + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import hyperparams +from official.vision.configs import backbones_3d +from official.vision.configs import common +from official.vision.configs import video_classification + + +@dataclasses.dataclass +class Movinet(hyperparams.Config): + """Backbone config for Base MoViNet.""" + model_id: str = 'a0' + causal: bool = False + use_positional_encoding: bool = False + # Choose from ['3d', '2plus1d', '3d_2plus1d'] + # 3d: default 3D convolution + # 2plus1d: (2+1)D convolution with Conv2D (2D reshaping) + # 3d_2plus1d: (2+1)D convolution with Conv3D (no 2D reshaping) + conv_type: str = '3d' + # Choose from ['3d', '2d', '2plus3d'] + # 3d: default 3D global average pooling. + # 2d: 2D global average pooling. + # 2plus3d: concatenation of 2D and 3D global average pooling. 
+ se_type: str = '3d' + activation: str = 'swish' + gating_activation: str = 'sigmoid' + stochastic_depth_drop_rate: float = 0.2 + use_external_states: bool = False + average_pooling_type: str = '3d' + output_states: bool = True + + +@dataclasses.dataclass +class MovinetA0(Movinet): + """Backbone config for MoViNet-A0. + + Represents the smallest base MoViNet searched by NAS. + + Reference: https://arxiv.org/pdf/2103.11511.pdf + """ + model_id: str = 'a0' + + +@dataclasses.dataclass +class MovinetA1(Movinet): + """Backbone config for MoViNet-A1.""" + model_id: str = 'a1' + + +@dataclasses.dataclass +class MovinetA2(Movinet): + """Backbone config for MoViNet-A2.""" + model_id: str = 'a2' + + +@dataclasses.dataclass +class MovinetA3(Movinet): + """Backbone config for MoViNet-A3.""" + model_id: str = 'a3' + + +@dataclasses.dataclass +class MovinetA4(Movinet): + """Backbone config for MoViNet-A4.""" + model_id: str = 'a4' + + +@dataclasses.dataclass +class MovinetA5(Movinet): + """Backbone config for MoViNet-A5. + + Represents the largest base MoViNet searched by NAS. + """ + model_id: str = 'a5' + + +@dataclasses.dataclass +class MovinetT0(Movinet): + """Backbone config for MoViNet-T0. + + MoViNet-T0 is a smaller version of MoViNet-A0 for even faster processing. + """ + model_id: str = 't0' + + +@dataclasses.dataclass +class Backbone3D(backbones_3d.Backbone3D): + """Configuration for backbones. + + Attributes: + type: 'str', type of backbone to be used, one of the fields below. + movinet: movinet backbone config. + """ + type: str = 'movinet' + movinet: Movinet = Movinet() + + +@dataclasses.dataclass +class MovinetModel(video_classification.VideoClassificationModel): + """The MoViNet model config.""" + model_type: str = 'movinet' + backbone: Backbone3D = Backbone3D() + norm_activation: common.NormActivation = common.NormActivation( + activation=None, # legacy flag, not used.
+ norm_momentum=0.99, + norm_epsilon=1e-3, + use_sync_bn=True) + activation: str = 'swish' + output_states: bool = False + + +@exp_factory.register_config_factory('movinet_kinetics600') +def movinet_kinetics600() -> cfg.ExperimentConfig: + """Video classification on Videonet with MoViNet backbone.""" + exp = video_classification.video_classification_kinetics600() + exp.task.train_data.dtype = 'bfloat16' + exp.task.validation_data.dtype = 'bfloat16' + + model = MovinetModel() + exp.task.model = model + + return exp diff --git a/official/projects/movinet/configs/movinet_test.py b/official/projects/movinet/configs/movinet_test.py new file mode 100644 index 0000000000000000000000000000000000000000..6efd069c97f0dfa5bec4e31ed415ffb555dbb234 --- /dev/null +++ b/official/projects/movinet/configs/movinet_test.py @@ -0,0 +1,42 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for movinet video classification.""" + +from absl.testing import parameterized +import tensorflow as tf + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.projects.movinet.configs import movinet +from official.vision.configs import video_classification as exp_cfg + + +class MovinetConfigTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.parameters( + ('movinet_kinetics600',),) + def test_video_classification_configs(self, config_name): + config = exp_factory.get_exp_config(config_name) + self.assertIsInstance(config, cfg.ExperimentConfig) + self.assertIsInstance(config.task, exp_cfg.VideoClassificationTask) + self.assertIsInstance(config.task.model, movinet.MovinetModel) + self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig) + config.task.train_data.is_training = None + with self.assertRaises(KeyError): + config.validate() + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml b/official/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml similarity index 98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml rename to official/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml index bd7d3ce92a9d1b5dee6932a0be1e39e0f54e938f..368a5c8ca0d71a80e28d373a1d404fc9c0d242e8 100644 --- a/official/vision/beta/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml +++ b/official/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml @@ -18,6 +18,7 @@ task: norm_activation: use_sync_bn: true dropout_rate: 0.2 + activation: 'swish' train_data: name: kinetics600 variant_name: rgb diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_a0_k600_cpu_local.yaml b/official/projects/movinet/configs/yaml/movinet_a0_k600_cpu_local.yaml similarity index 98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_a0_k600_cpu_local.yaml 
rename to official/projects/movinet/configs/yaml/movinet_a0_k600_cpu_local.yaml index a144ac56e4df38adbe336e807fbf236ab2d4345d..b8f66ed68ae36be3b2b83e799dc20a9190bcda26 100644 --- a/official/vision/beta/projects/movinet/configs/yaml/movinet_a0_k600_cpu_local.yaml +++ b/official/projects/movinet/configs/yaml/movinet_a0_k600_cpu_local.yaml @@ -12,6 +12,7 @@ task: norm_activation: use_sync_bn: false dropout_rate: 0.5 + activation: 'swish' train_data: name: kinetics600 variant_name: rgb diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_a0_stream_k600_8x8.yaml b/official/projects/movinet/configs/yaml/movinet_a0_stream_k600_8x8.yaml similarity index 98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_a0_stream_k600_8x8.yaml rename to official/projects/movinet/configs/yaml/movinet_a0_stream_k600_8x8.yaml index 749f97af2392eedee98fa35c5040bad33bb5cda2..5f678a844d7ecc300ea8d56f7e07f15d3448394e 100644 --- a/official/vision/beta/projects/movinet/configs/yaml/movinet_a0_stream_k600_8x8.yaml +++ b/official/projects/movinet/configs/yaml/movinet_a0_stream_k600_8x8.yaml @@ -24,6 +24,7 @@ task: norm_activation: use_sync_bn: true dropout_rate: 0.2 + activation: 'hard_swish' train_data: name: kinetics600 variant_name: rgb diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_a1_k600_8x8.yaml b/official/projects/movinet/configs/yaml/movinet_a1_k600_8x8.yaml similarity index 98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_a1_k600_8x8.yaml rename to official/projects/movinet/configs/yaml/movinet_a1_k600_8x8.yaml index 8c097f49b4a06af4fbda6b7c45b1e51b441247fd..f25539d70bf887fc6120ee586ef58b4b6c905ea9 100644 --- a/official/vision/beta/projects/movinet/configs/yaml/movinet_a1_k600_8x8.yaml +++ b/official/projects/movinet/configs/yaml/movinet_a1_k600_8x8.yaml @@ -18,6 +18,7 @@ task: norm_activation: use_sync_bn: true dropout_rate: 0.5 + activation: 'swish' train_data: name: kinetics600 variant_name: 
rgb diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_a1_stream_k600_8x8.yaml b/official/projects/movinet/configs/yaml/movinet_a1_stream_k600_8x8.yaml similarity index 98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_a1_stream_k600_8x8.yaml rename to official/projects/movinet/configs/yaml/movinet_a1_stream_k600_8x8.yaml index 7f6e597368ac380e31f2086708f959917c2e2aa1..3948f3ed1ff4e643c0d1e917f78ea4d5398aee1e 100644 --- a/official/vision/beta/projects/movinet/configs/yaml/movinet_a1_stream_k600_8x8.yaml +++ b/official/projects/movinet/configs/yaml/movinet_a1_stream_k600_8x8.yaml @@ -24,6 +24,7 @@ task: norm_activation: use_sync_bn: true dropout_rate: 0.2 + activation: 'hard_swish' train_data: name: kinetics600 variant_name: rgb diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_a2_k600_8x8.yaml b/official/projects/movinet/configs/yaml/movinet_a2_k600_8x8.yaml similarity index 98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_a2_k600_8x8.yaml rename to official/projects/movinet/configs/yaml/movinet_a2_k600_8x8.yaml index 575772b9f3e62e2e8a462c7cb08bbb64e8ac44e0..a8728225ae25d081cf494b4bf552535931dd4012 100644 --- a/official/vision/beta/projects/movinet/configs/yaml/movinet_a2_k600_8x8.yaml +++ b/official/projects/movinet/configs/yaml/movinet_a2_k600_8x8.yaml @@ -18,6 +18,7 @@ task: norm_activation: use_sync_bn: true dropout_rate: 0.5 + activation: 'swish' train_data: name: kinetics600 variant_name: rgb diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_a2_stream_k600_8x8.yaml b/official/projects/movinet/configs/yaml/movinet_a2_stream_k600_8x8.yaml similarity index 98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_a2_stream_k600_8x8.yaml rename to official/projects/movinet/configs/yaml/movinet_a2_stream_k600_8x8.yaml index d5a1f9d9ebc97745743ff978e3be5d2d6cffb63c..9f707e98d9b9fb8d1fb32ff2957f4172b85ca65c 100644 --- 
a/official/vision/beta/projects/movinet/configs/yaml/movinet_a2_stream_k600_8x8.yaml +++ b/official/projects/movinet/configs/yaml/movinet_a2_stream_k600_8x8.yaml @@ -24,6 +24,7 @@ task: norm_activation: use_sync_bn: true dropout_rate: 0.5 + activation: 'hard_swish' train_data: name: kinetics600 variant_name: rgb diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_a3_k600_8x8.yaml b/official/projects/movinet/configs/yaml/movinet_a3_k600_8x8.yaml similarity index 98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_a3_k600_8x8.yaml rename to official/projects/movinet/configs/yaml/movinet_a3_k600_8x8.yaml index a4d34314695baa36a74fd515118ae3429cd81d4b..5a729cca8fae791ac57d1d894df3699fd1a78114 100644 --- a/official/vision/beta/projects/movinet/configs/yaml/movinet_a3_k600_8x8.yaml +++ b/official/projects/movinet/configs/yaml/movinet_a3_k600_8x8.yaml @@ -18,6 +18,7 @@ task: norm_activation: use_sync_bn: true dropout_rate: 0.5 + activation: 'swish' train_data: name: kinetics600 variant_name: rgb diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_a3_stream_k600_8x8.yaml b/official/projects/movinet/configs/yaml/movinet_a3_stream_k600_8x8.yaml similarity index 98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_a3_stream_k600_8x8.yaml rename to official/projects/movinet/configs/yaml/movinet_a3_stream_k600_8x8.yaml index 3f8336be0678240f7d65244e12496f7e5bee0917..c4778a084aff56282afbeebed6ee7a45d7167a25 100644 --- a/official/vision/beta/projects/movinet/configs/yaml/movinet_a3_stream_k600_8x8.yaml +++ b/official/projects/movinet/configs/yaml/movinet_a3_stream_k600_8x8.yaml @@ -25,6 +25,7 @@ task: norm_activation: use_sync_bn: true dropout_rate: 0.5 + activation: 'hard_swish' train_data: name: kinetics600 variant_name: rgb diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_a4_k600_8x8.yaml b/official/projects/movinet/configs/yaml/movinet_a4_k600_8x8.yaml similarity index 
98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_a4_k600_8x8.yaml rename to official/projects/movinet/configs/yaml/movinet_a4_k600_8x8.yaml index 102ccad4f5524a2160b004262bcc34cc2d06680f..e3eb772b80ceaa3c13d311649d6b5ebd860458c9 100644 --- a/official/vision/beta/projects/movinet/configs/yaml/movinet_a4_k600_8x8.yaml +++ b/official/projects/movinet/configs/yaml/movinet_a4_k600_8x8.yaml @@ -18,6 +18,7 @@ task: norm_activation: use_sync_bn: true dropout_rate: 0.5 + activation: 'swish' train_data: name: kinetics600 variant_name: rgb diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_a4_stream_k600_8x8.yaml b/official/projects/movinet/configs/yaml/movinet_a4_stream_k600_8x8.yaml similarity index 98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_a4_stream_k600_8x8.yaml rename to official/projects/movinet/configs/yaml/movinet_a4_stream_k600_8x8.yaml index ac72c65f7c5753b09f85aac75790139f1da05e74..0b3b687605c4a6544cff23882ee4503b8cef3b9f 100644 --- a/official/vision/beta/projects/movinet/configs/yaml/movinet_a4_stream_k600_8x8.yaml +++ b/official/projects/movinet/configs/yaml/movinet_a4_stream_k600_8x8.yaml @@ -25,6 +25,7 @@ task: norm_activation: use_sync_bn: true dropout_rate: 0.5 + activation: 'hard_swish' train_data: name: kinetics600 variant_name: rgb diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_a5_k600_8x8.yaml b/official/projects/movinet/configs/yaml/movinet_a5_k600_8x8.yaml similarity index 98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_a5_k600_8x8.yaml rename to official/projects/movinet/configs/yaml/movinet_a5_k600_8x8.yaml index 79c9d209d9177f59e05b1e0b7d81682cdad3197f..560fccab4cdb12890509531b33d7feedddf887e4 100644 --- a/official/vision/beta/projects/movinet/configs/yaml/movinet_a5_k600_8x8.yaml +++ b/official/projects/movinet/configs/yaml/movinet_a5_k600_8x8.yaml @@ -18,6 +18,7 @@ task: norm_activation: use_sync_bn: true dropout_rate: 
0.5 + activation: 'swish' train_data: name: kinetics600 variant_name: rgb diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_a5_stream_k600_8x8.yaml b/official/projects/movinet/configs/yaml/movinet_a5_stream_k600_8x8.yaml similarity index 98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_a5_stream_k600_8x8.yaml rename to official/projects/movinet/configs/yaml/movinet_a5_stream_k600_8x8.yaml index 13b7c4904c2c4e90021277701951c2e4858953ec..c44f9ba2653c82718932cc1f1ff85b2d5f0bf853 100644 --- a/official/vision/beta/projects/movinet/configs/yaml/movinet_a5_stream_k600_8x8.yaml +++ b/official/projects/movinet/configs/yaml/movinet_a5_stream_k600_8x8.yaml @@ -25,6 +25,7 @@ task: norm_activation: use_sync_bn: true dropout_rate: 0.5 + activation: 'hard_swish' train_data: name: kinetics600 variant_name: rgb diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_t0_k600_8x8.yaml b/official/projects/movinet/configs/yaml/movinet_t0_k600_8x8.yaml similarity index 98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_t0_k600_8x8.yaml rename to official/projects/movinet/configs/yaml/movinet_t0_k600_8x8.yaml index b6b190c8acb6f504ac489f9531f8f5af46f7ba65..fde13d342834f0c718b635b4a48e4bcf14940e6b 100644 --- a/official/vision/beta/projects/movinet/configs/yaml/movinet_t0_k600_8x8.yaml +++ b/official/projects/movinet/configs/yaml/movinet_t0_k600_8x8.yaml @@ -18,6 +18,7 @@ task: norm_activation: use_sync_bn: true dropout_rate: 0.2 + activation: 'swish' train_data: name: kinetics600 variant_name: rgb diff --git a/official/vision/beta/projects/movinet/configs/yaml/movinet_t0_stream_k600_8x8.yaml b/official/projects/movinet/configs/yaml/movinet_t0_stream_k600_8x8.yaml similarity index 98% rename from official/vision/beta/projects/movinet/configs/yaml/movinet_t0_stream_k600_8x8.yaml rename to official/projects/movinet/configs/yaml/movinet_t0_stream_k600_8x8.yaml index 
9f490e7c3d4049d6ef6d984bcfa2cfaeaf900488..b302850c66985225fa541ce07d735e399959687f 100644 --- a/official/vision/beta/projects/movinet/configs/yaml/movinet_t0_stream_k600_8x8.yaml +++ b/official/projects/movinet/configs/yaml/movinet_t0_stream_k600_8x8.yaml @@ -24,6 +24,7 @@ task: norm_activation: use_sync_bn: true dropout_rate: 0.2 + activation: 'hard_swish' train_data: name: kinetics600 variant_name: rgb diff --git a/official/projects/movinet/files/jumpingjack.gif b/official/projects/movinet/files/jumpingjack.gif new file mode 100644 index 0000000000000000000000000000000000000000..9527e431228e519afa0f307121b4d2d54dc69c58 Binary files /dev/null and b/official/projects/movinet/files/jumpingjack.gif differ diff --git a/official/projects/movinet/files/kinetics_600_labels.txt b/official/projects/movinet/files/kinetics_600_labels.txt new file mode 100644 index 0000000000000000000000000000000000000000..639e9c91fa8a941ea57942872fae55628d590b42 --- /dev/null +++ b/official/projects/movinet/files/kinetics_600_labels.txt @@ -0,0 +1,600 @@ +abseiling +acting in play +adjusting glasses +air drumming +alligator wrestling +answering questions +applauding +applying cream +archaeological excavation +archery +arguing +arm wrestling +arranging flowers +assembling bicycle +assembling computer +attending conference +auctioning +backflip (human) +baking cookies +bandaging +barbequing +bartending +base jumping +bathing dog +battle rope training +beatboxing +bee keeping +belly dancing +bench pressing +bending back +bending metal +biking through snow +blasting sand +blowdrying hair +blowing bubble gum +blowing glass +blowing leaves +blowing nose +blowing out candles +bobsledding +bodysurfing +bookbinding +bottling +bouncing on bouncy castle +bouncing on trampoline +bowling +braiding hair +breading or breadcrumbing +breakdancing +breaking boards +breathing fire +brush painting +brushing hair +brushing teeth +building cabinet +building lego +building sandcastle +building shed +bull fighting 
+bulldozing +bungee jumping +burping +busking +calculating +calligraphy +canoeing or kayaking +capoeira +capsizing +card stacking +card throwing +carrying baby +cartwheeling +carving ice +carving pumpkin +casting fishing line +catching fish +catching or throwing baseball +catching or throwing frisbee +catching or throwing softball +celebrating +changing gear in car +changing oil +changing wheel (not on bike) +checking tires +cheerleading +chewing gum +chiseling stone +chiseling wood +chopping meat +chopping vegetables +chopping wood +clam digging +clapping +clay pottery making +clean and jerk +cleaning gutters +cleaning pool +cleaning shoes +cleaning toilet +cleaning windows +climbing a rope +climbing ladder +climbing tree +coloring in +combing hair +contact juggling +contorting +cooking egg +cooking on campfire +cooking sausages (not on barbeque) +cooking scallops +cosplaying +counting money +country line dancing +cracking back +cracking knuckles +cracking neck +crawling baby +crossing eyes +crossing river +crying +cumbia +curling (sport) +curling hair +cutting apple +cutting nails +cutting orange +cutting pineapple +cutting watermelon +dancing ballet +dancing charleston +dancing gangnam style +dancing macarena +deadlifting +decorating the christmas tree +delivering mail +dining +directing traffic +disc golfing +diving cliff +docking boat +dodgeball +doing aerobics +doing jigsaw puzzle +doing laundry +doing nails +drawing +dribbling basketball +drinking shots +driving car +driving tractor +drooling +drop kicking +drumming fingers +dumpster diving +dunking basketball +dyeing eyebrows +dyeing hair +eating burger +eating cake +eating carrots +eating chips +eating doughnuts +eating hotdog +eating ice cream +eating spaghetti +eating watermelon +egg hunting +embroidering +exercising with an exercise ball +extinguishing fire +faceplanting +falling off bike +falling off chair +feeding birds +feeding fish +feeding goats +fencing (sport) +fidgeting +finger snapping +fixing 
bicycle +fixing hair +flint knapping +flipping pancake +fly tying +flying kite +folding clothes +folding napkins +folding paper +front raises +frying vegetables +geocaching +getting a haircut +getting a piercing +getting a tattoo +giving or receiving award +gold panning +golf chipping +golf driving +golf putting +gospel singing in church +grinding meat +grooming dog +grooming horse +gymnastics tumbling +hammer throw +hand washing clothes +head stand +headbanging +headbutting +high jump +high kick +historical reenactment +hitting baseball +hockey stop +holding snake +home roasting coffee +hopscotch +hoverboarding +huddling +hugging (not baby) +hugging baby +hula hooping +hurdling +hurling (sport) +ice climbing +ice fishing +ice skating +ice swimming +inflating balloons +installing carpet +ironing +ironing hair +javelin throw +jaywalking +jetskiing +jogging +juggling balls +juggling fire +juggling soccer ball +jumping bicycle +jumping into pool +jumping jacks +jumpstyle dancing +karaoke +kicking field goal +kicking soccer ball +kissing +kitesurfing +knitting +krumping +land sailing +laughing +lawn mower racing +laying bricks +laying concrete +laying stone +laying tiles +leatherworking +licking +lifting hat +lighting fire +lock picking +long jump +longboarding +looking at phone +luge +lunge +making a cake +making a sandwich +making balloon shapes +making bubbles +making cheese +making horseshoes +making jewelry +making paper aeroplanes +making pizza +making snowman +making sushi +making tea +making the bed +marching +marriage proposal +massaging back +massaging feet +massaging legs +massaging neck +massaging person's head +milking cow +moon walking +mopping floor +mosh pit dancing +motorcycling +mountain climber (exercise) +moving furniture +mowing lawn +mushroom foraging +needle felting +news anchoring +opening bottle (not wine) +opening door +opening present +opening refrigerator +opening wine bottle +packing +paragliding +parasailing +parkour +passing American 
football (in game) +passing american football (not in game) +passing soccer ball +peeling apples +peeling potatoes +person collecting garbage +petting animal (not cat) +petting cat +photobombing +photocopying +picking fruit +pillow fight +pinching +pirouetting +planing wood +planting trees +plastering +playing accordion +playing badminton +playing bagpipes +playing basketball +playing bass guitar +playing beer pong +playing blackjack +playing cello +playing chess +playing clarinet +playing controller +playing cricket +playing cymbals +playing darts +playing didgeridoo +playing dominoes +playing drums +playing field hockey +playing flute +playing gong +playing guitar +playing hand clapping games +playing harmonica +playing harp +playing ice hockey +playing keyboard +playing kickball +playing laser tag +playing lute +playing maracas +playing marbles +playing monopoly +playing netball +playing ocarina +playing organ +playing paintball +playing pan pipes +playing piano +playing pinball +playing ping pong +playing poker +playing polo +playing recorder +playing rubiks cube +playing saxophone +playing scrabble +playing squash or racquetball +playing tennis +playing trombone +playing trumpet +playing ukulele +playing violin +playing volleyball +playing with trains +playing xylophone +poking bellybutton +pole vault +polishing metal +popping balloons +pouring beer +preparing salad +presenting weather forecast +pull ups +pumping fist +pumping gas +punching bag +punching person (boxing) +push up +pushing car +pushing cart +pushing wheelbarrow +pushing wheelchair +putting in contact lenses +putting on eyeliner +putting on foundation +putting on lipstick +putting on mascara +putting on sari +putting on shoes +raising eyebrows +reading book +reading newspaper +recording music +repairing puncture +riding a bike +riding camel +riding elephant +riding mechanical bull +riding mule +riding or walking with horse +riding scooter +riding snow blower +riding unicycle +ripping paper 
+roasting marshmallows +roasting pig +robot dancing +rock climbing +rock scissors paper +roller skating +rolling pastry +rope pushdown +running on treadmill +sailing +salsa dancing +sanding floor +sausage making +sawing wood +scrambling eggs +scrapbooking +scrubbing face +scuba diving +separating eggs +setting table +sewing +shaking hands +shaking head +shaping bread dough +sharpening knives +sharpening pencil +shaving head +shaving legs +shearing sheep +shining flashlight +shining shoes +shooting basketball +shooting goal (soccer) +shopping +shot put +shoveling snow +shucking oysters +shuffling cards +shuffling feet +side kick +sign language interpreting +singing +sipping cup +situp +skateboarding +ski jumping +skiing crosscountry +skiing mono +skiing slalom +skipping rope +skipping stone +skydiving +slacklining +slapping +sled dog racing +sleeping +smashing +smelling feet +smoking +smoking hookah +smoking pipe +snatch weight lifting +sneezing +snorkeling +snowboarding +snowkiting +snowmobiling +somersaulting +spelunking +spinning poi +spray painting +springboard diving +square dancing +squat +standing on hands +staring +steer roping +sticking tongue out +stomping grapes +stretching arm +stretching leg +sucking lolly +surfing crowd +surfing water +sweeping floor +swimming backstroke +swimming breast stroke +swimming butterfly stroke +swimming front crawl +swing dancing +swinging baseball bat +swinging on something +sword fighting +sword swallowing +tackling +tagging graffiti +tai chi +talking on cell phone +tango dancing +tap dancing +tapping guitar +tapping pen +tasting beer +tasting food +tasting wine +testifying +texting +threading needle +throwing axe +throwing ball (not baseball or American football) +throwing discus +throwing knife +throwing snowballs +throwing tantrum +throwing water balloon +tickling +tie dying +tightrope walking +tiptoeing +tobogganing +tossing coin +training dog +trapezing +trimming or shaving beard +trimming shrubs +trimming trees 
+triple jump +twiddling fingers +tying bow tie +tying knot (not on a tie) +tying necktie +tying shoe laces +unboxing +unloading truck +using a microscope +using a paint roller +using a power drill +using a sledge hammer +using a wrench +using atm +using bagging machine +using circular saw +using inhaler +using puppets +using remote controller (not gaming) +using segway +vacuuming floor +visiting the zoo +wading through mud +wading through water +waiting in line +waking up +walking the dog +walking through snow +washing dishes +washing feet +washing hair +washing hands +watching tv +water skiing +water sliding +watering plants +waving hand +waxing back +waxing chest +waxing eyebrows +waxing legs +weaving basket +weaving fabric +welding +whistling +windsurfing +winking +wood burning (art) +wrapping present +wrestling +writing +yarn spinning +yawning +yoga +zumba diff --git a/official/projects/movinet/modeling/__init__.py b/official/projects/movinet/modeling/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/movinet/modeling/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ diff --git a/official/projects/movinet/modeling/movinet.py b/official/projects/movinet/modeling/movinet.py new file mode 100644 index 0000000000000000000000000000000000000000..eef9b7f0edae44a4b0a4c23b5b03c42c952fd3cd --- /dev/null +++ b/official/projects/movinet/modeling/movinet.py @@ -0,0 +1,740 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains definitions of Mobile Video Networks. + +Reference: https://arxiv.org/pdf/2103.11511.pdf +""" +import dataclasses +import math +from typing import Dict, Mapping, Optional, Sequence, Tuple, Union + +from absl import logging +import tensorflow as tf + +from official.modeling import hyperparams +from official.projects.movinet.modeling import movinet_layers +from official.vision.modeling.backbones import factory + +# Defines a set of kernel sizes and stride sizes to simplify and shorten +# architecture definitions for configs below. 
+KernelSize = Tuple[int, int, int] + +# K(ab) represents a 3D kernel of size (a, b, b) +K13: KernelSize = (1, 3, 3) +K15: KernelSize = (1, 5, 5) +K33: KernelSize = (3, 3, 3) +K53: KernelSize = (5, 3, 3) + +# S(ab) represents a 3D stride of size (a, b, b) +S11: KernelSize = (1, 1, 1) +S12: KernelSize = (1, 2, 2) +S22: KernelSize = (2, 2, 2) +S21: KernelSize = (2, 1, 1) + +# Type for a state container (map) +TensorMap = Mapping[str, tf.Tensor] + + +@dataclasses.dataclass +class BlockSpec: + """Configuration of a block.""" + + +@dataclasses.dataclass +class StemSpec(BlockSpec): + """Configuration of a Movinet stem.""" + filters: int = 0 + kernel_size: KernelSize = (0, 0, 0) + strides: KernelSize = (0, 0, 0) + + +@dataclasses.dataclass +class MovinetBlockSpec(BlockSpec): + """Configuration of a Movinet block.""" + base_filters: int = 0 + expand_filters: Sequence[int] = () + kernel_sizes: Sequence[KernelSize] = () + strides: Sequence[KernelSize] = () + + +@dataclasses.dataclass +class HeadSpec(BlockSpec): + """Configuration of a Movinet head.""" + project_filters: int = 0 + head_filters: int = 0 + + +# Block specs specify the architecture of each model +BLOCK_SPECS = { + 'a0': ( + StemSpec(filters=8, kernel_size=K13, strides=S12), + MovinetBlockSpec( + base_filters=8, + expand_filters=(24,), + kernel_sizes=(K15,), + strides=(S12,)), + MovinetBlockSpec( + base_filters=32, + expand_filters=(80, 80, 80), + kernel_sizes=(K33, K33, K33), + strides=(S12, S11, S11)), + MovinetBlockSpec( + base_filters=56, + expand_filters=(184, 112, 184), + kernel_sizes=(K53, K33, K33), + strides=(S12, S11, S11)), + MovinetBlockSpec( + base_filters=56, + expand_filters=(184, 184, 184, 184), + kernel_sizes=(K53, K33, K33, K33), + strides=(S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=104, + expand_filters=(384, 280, 280, 344), + kernel_sizes=(K53, K15, K15, K15), + strides=(S12, S11, S11, S11)), + HeadSpec(project_filters=480, head_filters=2048), + ), + 'a1': ( +
StemSpec(filters=16, kernel_size=K13, strides=S12), + MovinetBlockSpec( + base_filters=16, + expand_filters=(40, 40), + kernel_sizes=(K15, K33), + strides=(S12, S11)), + MovinetBlockSpec( + base_filters=40, + expand_filters=(96, 120, 96, 96), + kernel_sizes=(K33, K33, K33, K33), + strides=(S12, S11, S11, S11)), + MovinetBlockSpec( + base_filters=64, + expand_filters=(216, 128, 216, 168, 216), + kernel_sizes=(K53, K33, K33, K33, K33), + strides=(S12, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=64, + expand_filters=(216, 216, 216, 128, 128, 216), + kernel_sizes=(K53, K33, K33, K33, K15, K33), + strides=(S11, S11, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=136, + expand_filters=(456, 360, 360, 360, 456, 456, 544), + kernel_sizes=(K53, K15, K15, K15, K15, K33, K13), + strides=(S12, S11, S11, S11, S11, S11, S11)), + HeadSpec(project_filters=600, head_filters=2048), + ), + 'a2': ( + StemSpec(filters=16, kernel_size=K13, strides=S12), + MovinetBlockSpec( + base_filters=16, + expand_filters=(40, 40, 64), + kernel_sizes=(K15, K33, K33), + strides=(S12, S11, S11)), + MovinetBlockSpec( + base_filters=40, + expand_filters=(96, 120, 96, 96, 120), + kernel_sizes=(K33, K33, K33, K33, K33), + strides=(S12, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=72, + expand_filters=(240, 160, 240, 192, 240), + kernel_sizes=(K53, K33, K33, K33, K33), + strides=(S12, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=72, + expand_filters=(240, 240, 240, 240, 144, 240), + kernel_sizes=(K53, K33, K33, K33, K15, K33), + strides=(S11, S11, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=144, + expand_filters=(480, 384, 384, 480, 480, 480, 576), + kernel_sizes=(K53, K15, K15, K15, K15, K33, K13), + strides=(S12, S11, S11, S11, S11, S11, S11)), + HeadSpec(project_filters=640, head_filters=2048), + ), + 'a3': ( + StemSpec(filters=16, kernel_size=K13, strides=S12), + MovinetBlockSpec( + base_filters=16, + expand_filters=(40, 40, 64, 40), + 
kernel_sizes=(K15, K33, K33, K33), + strides=(S12, S11, S11, S11)), + MovinetBlockSpec( + base_filters=48, + expand_filters=(112, 144, 112, 112, 144, 144), + kernel_sizes=(K33, K33, K33, K15, K33, K33), + strides=(S12, S11, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=80, + expand_filters=(240, 152, 240, 192, 240), + kernel_sizes=(K53, K33, K33, K33, K33), + strides=(S12, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=88, + expand_filters=(264, 264, 264, 264, 160, 264, 264, 264), + kernel_sizes=(K53, K33, K33, K33, K15, K33, K33, K33), + strides=(S11, S11, S11, S11, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=168, + expand_filters=(560, 448, 448, 560, 560, 560, 448, 448, 560, 672), + kernel_sizes=(K53, K15, K15, K15, K15, K33, K15, K15, K33, K13), + strides=(S12, S11, S11, S11, S11, S11, S11, S11, S11, S11)), + HeadSpec(project_filters=744, head_filters=2048), + ), + 'a4': ( + StemSpec(filters=24, kernel_size=K13, strides=S12), + MovinetBlockSpec( + base_filters=24, + expand_filters=(64, 64, 96, 64, 96, 64), + kernel_sizes=(K15, K33, K33, K33, K33, K33), + strides=(S12, S11, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=56, + expand_filters=(168, 168, 136, 136, 168, 168, 168, 136, 136), + kernel_sizes=(K33, K33, K33, K33, K33, K33, K33, K15, K33), + strides=(S12, S11, S11, S11, S11, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=96, + expand_filters=(320, 160, 320, 192, 320, 160, 320, 256, 320), + kernel_sizes=(K53, K33, K33, K33, K33, K33, K33, K33, K33), + strides=(S12, S11, S11, S11, S11, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=96, + expand_filters=(320, 320, 320, 320, 192, 320, 320, 192, 320, 320), + kernel_sizes=(K53, K33, K33, K33, K15, K33, K33, K33, K33, K33), + strides=(S11, S11, S11, S11, S11, S11, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=192, + expand_filters=(640, 512, 512, 640, 640, 640, 512, 512, 640, 768, + 640, 640, 768), + kernel_sizes=(K53, K15, K15, K15, K15, 
K33, K15, K15, K15, K15, K15, + K33, K33), + strides=(S12, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11, + S11)), + HeadSpec(project_filters=856, head_filters=2048), + ), + 'a5': ( + StemSpec(filters=24, kernel_size=K13, strides=S12), + MovinetBlockSpec( + base_filters=24, + expand_filters=(64, 64, 96, 64, 96, 64), + kernel_sizes=(K15, K15, K33, K33, K33, K33), + strides=(S12, S11, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=64, + expand_filters=(192, 152, 152, 152, 192, 192, 192, 152, 152, 192, + 192), + kernel_sizes=(K53, K33, K33, K33, K33, K33, K33, K33, K33, K33, + K33), + strides=(S12, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=112, + expand_filters=(376, 224, 376, 376, 296, 376, 224, 376, 376, 296, + 376, 376, 376), + kernel_sizes=(K53, K33, K33, K33, K33, K33, K33, K33, K33, K33, K33, + K33, K33), + strides=(S12, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11, + S11)), + MovinetBlockSpec( + base_filters=120, + expand_filters=(376, 376, 376, 376, 224, 376, 376, 224, 376, 376, + 376), + kernel_sizes=(K53, K33, K33, K33, K15, K33, K33, K33, K33, K33, + K33), + strides=(S11, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=224, + expand_filters=(744, 744, 600, 600, 744, 744, 744, 896, 600, 600, + 896, 744, 744, 896, 600, 600, 744, 744), + kernel_sizes=(K53, K33, K15, K15, K15, K15, K33, K15, K15, K15, K15, + K15, K33, K15, K15, K15, K15, K33), + strides=(S12, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11, + S11, S11, S11, S11, S11, S11)), + HeadSpec(project_filters=992, head_filters=2048), + ), + 't0': ( + StemSpec(filters=8, kernel_size=K13, strides=S12), + MovinetBlockSpec( + base_filters=8, + expand_filters=(16,), + kernel_sizes=(K15,), + strides=(S12,)), + MovinetBlockSpec( + base_filters=32, + expand_filters=(72, 72), + kernel_sizes=(K33, K15), + strides=(S12, S11)), + MovinetBlockSpec( + base_filters=56, + expand_filters=(112, 112, 112), 
+ kernel_sizes=(K53, K15, K33), + strides=(S12, S11, S11)), + MovinetBlockSpec( + base_filters=56, + expand_filters=(184, 184, 184, 184), + kernel_sizes=(K53, K15, K33, K33), + strides=(S11, S11, S11, S11)), + MovinetBlockSpec( + base_filters=104, + expand_filters=(344, 344, 344, 344), + kernel_sizes=(K53, K15, K15, K33), + strides=(S12, S11, S11, S11)), + HeadSpec(project_filters=240, head_filters=1024), + ), +} + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class Movinet(tf.keras.Model): + """Class to build Movinet family model. + + Reference: https://arxiv.org/pdf/2103.11511.pdf + """ + + def __init__(self, + model_id: str = 'a0', + causal: bool = False, + use_positional_encoding: bool = False, + conv_type: str = '3d', + se_type: str = '3d', + input_specs: Optional[tf.keras.layers.InputSpec] = None, + activation: str = 'swish', + gating_activation: str = 'sigmoid', + use_sync_bn: bool = True, + norm_momentum: float = 0.99, + norm_epsilon: float = 0.001, + kernel_initializer: str = 'HeNormal', + kernel_regularizer: Optional[str] = None, + bias_regularizer: Optional[str] = None, + stochastic_depth_drop_rate: float = 0., + use_external_states: bool = False, + output_states: bool = True, + average_pooling_type: str = '3d', + **kwargs): + """MoViNet initialization function. + + Args: + model_id: name of MoViNet backbone model. + causal: use causal mode, with CausalConv and CausalSE operations. + use_positional_encoding: if True, adds a positional encoding before + temporal convolutions and the cumulative global average pooling + layers. + conv_type: '3d', '2plus1d', or '3d_2plus1d'. '3d' configures the network + to use the default 3D convolution. '2plus1d' uses (2+1)D convolution + with Conv2D operations and 2D reshaping (e.g., a 5x3x3 kernel becomes + 3x3 followed by 5x1 conv). '3d_2plus1d' uses (2+1)D convolution with + Conv3D and no 2D reshaping (e.g., a 5x3x3 kernel becomes 1x3x3 followed + by 5x1x1 conv). 
+ se_type: '3d', '2d', '2plus3d' or 'none'. '3d' uses the default 3D + spatiotemporal global average pooling for squeeze excitation. '2d' + uses 2D spatial global average pooling on each frame. '2plus3d' + concatenates both 3D and 2D global average pooling. + input_specs: the model input spec to use. + activation: name of the main activation function. + gating_activation: gating activation to use in squeeze excitation layers. + use_sync_bn: if True, use synchronized batch normalization. + norm_momentum: normalization momentum for the moving average. + norm_epsilon: small float added to variance to avoid dividing by + zero. + kernel_initializer: kernel_initializer for convolutional layers. + kernel_regularizer: tf.keras.regularizers.Regularizer object for Conv2D. + Defaults to None. + bias_regularizer: tf.keras.regularizers.Regularizer object for Conv2D. + Defaults to None. + stochastic_depth_drop_rate: the base rate for stochastic depth. + use_external_states: if True, expects states to be passed as additional + input. + output_states: if True, output intermediate states that can be used to run + the model in streaming mode. Inputting the output states of the + previous input clip with the current input clip will utilize a stream + buffer for streaming video. + average_pooling_type: The average pooling type. Currently supporting + ['3d', '2d', 'none']. + **kwargs: keyword arguments to be passed.
+ """ + block_specs = BLOCK_SPECS[model_id] + if input_specs is None: + input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, None, 3]) + + if conv_type not in ('3d', '2plus1d', '3d_2plus1d'): + raise ValueError('Unknown conv type: {}'.format(conv_type)) + if se_type not in ('3d', '2d', '2plus3d', 'none'): + raise ValueError('Unknown squeeze excitation type: {}'.format(se_type)) + + self._model_id = model_id + self._block_specs = block_specs + self._causal = causal + self._use_positional_encoding = use_positional_encoding + self._conv_type = conv_type + self._se_type = se_type + self._input_specs = input_specs + self._use_sync_bn = use_sync_bn + self._activation = activation + self._gating_activation = gating_activation + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + if use_sync_bn: + self._norm = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._norm = tf.keras.layers.BatchNormalization + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + self._stochastic_depth_drop_rate = stochastic_depth_drop_rate + self._use_external_states = use_external_states + self._output_states = output_states + self._average_pooling_type = average_pooling_type + + if self._use_external_states and not self._causal: + raise ValueError('External states should be used with causal mode.') + if not isinstance(block_specs[0], StemSpec): + raise ValueError( + 'Expected first spec to be StemSpec, got {}'.format(block_specs[0])) + if not isinstance(block_specs[-1], HeadSpec): + raise ValueError( + 'Expected final spec to be HeadSpec, got {}'.format(block_specs[-1])) + self._head_filters = block_specs[-1].head_filters + + state_specs = None + if use_external_states: + self._set_dtype_policy(input_specs.dtype) + state_specs = self.initial_state_specs(input_specs.shape) + + inputs, outputs = self._build_network(input_specs, state_specs=state_specs) + + 
super(Movinet, self).__init__(inputs=inputs, outputs=outputs, **kwargs) + + self._state_specs = state_specs + + def _build_network( + self, + input_specs: tf.keras.layers.InputSpec, + state_specs: Optional[Mapping[str, tf.keras.layers.InputSpec]] = None, + ) -> Tuple[TensorMap, Union[TensorMap, Tuple[TensorMap, TensorMap]]]: + """Builds the model network. + + Args: + input_specs: the model input spec to use. + state_specs: a dict mapping a state name to the corresponding state spec. + State names should match with the `state` input/output dict. + + Returns: + Inputs and outputs as a tuple. Inputs are expected to be a dict with + base input and states. Outputs are expected to be a dict of endpoints + and (optional) output states. + """ + state_specs = state_specs if state_specs is not None else {} + + image_input = tf.keras.Input(shape=input_specs.shape[1:], name='inputs') + + states = { + name: tf.keras.Input(shape=spec.shape[1:], dtype=spec.dtype, name=name) + for name, spec in state_specs.items() + } + + inputs = {**states, 'image': image_input} + endpoints = {} + + x = image_input + + num_layers = sum( + len(block.expand_filters) + for block in self._block_specs + if isinstance(block, MovinetBlockSpec)) + stochastic_depth_idx = 1 + for block_idx, block in enumerate(self._block_specs): + if isinstance(block, StemSpec): + layer_obj = movinet_layers.Stem( + block.filters, + block.kernel_size, + block.strides, + conv_type=self._conv_type, + causal=self._causal, + activation=self._activation, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + batch_norm_layer=self._norm, + batch_norm_momentum=self._norm_momentum, + batch_norm_epsilon=self._norm_epsilon, + state_prefix='state_stem', + name='stem') + x, states = layer_obj(x, states=states) + endpoints['stem'] = x + elif isinstance(block, MovinetBlockSpec): + if not (len(block.expand_filters) == len(block.kernel_sizes) == + len(block.strides)): + raise ValueError( + 'Lengths 
of block parameters differ: {}, {}, {}'.format( + len(block.expand_filters), + len(block.kernel_sizes), + len(block.strides))) + params = list(zip(block.expand_filters, + block.kernel_sizes, + block.strides)) + for layer_idx, layer in enumerate(params): + stochastic_depth_drop_rate = ( + self._stochastic_depth_drop_rate * stochastic_depth_idx / + num_layers) + expand_filters, kernel_size, strides = layer + name = f'block{block_idx-1}_layer{layer_idx}' + layer_obj = movinet_layers.MovinetBlock( + block.base_filters, + expand_filters, + kernel_size=kernel_size, + strides=strides, + causal=self._causal, + activation=self._activation, + gating_activation=self._gating_activation, + stochastic_depth_drop_rate=stochastic_depth_drop_rate, + conv_type=self._conv_type, + se_type=self._se_type, + use_positional_encoding= + self._use_positional_encoding and self._causal, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + batch_norm_layer=self._norm, + batch_norm_momentum=self._norm_momentum, + batch_norm_epsilon=self._norm_epsilon, + state_prefix=f'state_{name}', + name=name) + x, states = layer_obj(x, states=states) + + endpoints[name] = x + stochastic_depth_idx += 1 + elif isinstance(block, HeadSpec): + layer_obj = movinet_layers.Head( + project_filters=block.project_filters, + conv_type=self._conv_type, + activation=self._activation, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + batch_norm_layer=self._norm, + batch_norm_momentum=self._norm_momentum, + batch_norm_epsilon=self._norm_epsilon, + average_pooling_type=self._average_pooling_type, + state_prefix='state_head', + name='head') + x, states = layer_obj(x, states=states) + endpoints['head'] = x + else: + raise ValueError('Unknown block type {}'.format(block)) + + outputs = (endpoints, states) if self._output_states else endpoints + + return inputs, outputs + + def _get_initial_state_shapes( + self, + block_specs: 
Sequence[BlockSpec], + input_shape: Union[Sequence[int], tf.Tensor], + use_positional_encoding: bool = False) -> Dict[str, Sequence[int]]: + """Generates names and shapes for all input states. + + Args: + block_specs: sequence of specs used for creating a model. + input_shape: the expected 5D shape of the image input. + use_positional_encoding: whether the model will use positional encoding. + + Returns: + A dict mapping state names to state shapes. + """ + def divide_resolution(shape, num_downsamples): + """Downsamples the dimension to calculate strided convolution shape.""" + if shape is None: + return None + if isinstance(shape, tf.Tensor): + # Avoid using div and ceil to support tf lite + shape = tf.cast(shape, tf.float32) + resolution_divisor = 2 ** num_downsamples + resolution_multiplier = 0.5 ** num_downsamples + shape = ((shape + resolution_divisor - 1) * resolution_multiplier) + return tf.cast(shape, tf.int32) + else: + resolution_divisor = 2 ** num_downsamples + return math.ceil(shape / resolution_divisor) + + states = {} + num_downsamples = 0 + + for block_idx, block in enumerate(block_specs): + if isinstance(block, StemSpec): + if block.kernel_size[0] > 1: + states['state_stem_stream_buffer'] = ( + input_shape[0], + input_shape[1], + divide_resolution(input_shape[2], num_downsamples), + divide_resolution(input_shape[3], num_downsamples), + block.filters, + ) + num_downsamples += 1 + elif isinstance(block, MovinetBlockSpec): + block_idx -= 1 + params = list(zip( + block.expand_filters, + block.kernel_sizes, + block.strides)) + for layer_idx, layer in enumerate(params): + expand_filters, kernel_size, strides = layer + + # If we use a 2D kernel, we apply spatial downsampling + # before the buffer. 
+ if (tuple(strides[1:3]) != (1, 1) and + self._conv_type in ['2plus1d', '3d_2plus1d']): + num_downsamples += 1 + + prefix = f'state_block{block_idx}_layer{layer_idx}' + + if kernel_size[0] > 1: + states[f'{prefix}_stream_buffer'] = ( + input_shape[0], + kernel_size[0] - 1, + divide_resolution(input_shape[2], num_downsamples), + divide_resolution(input_shape[3], num_downsamples), + expand_filters, + ) + + if '3d' in self._se_type: + states[f'{prefix}_pool_buffer'] = ( + input_shape[0], 1, 1, 1, expand_filters, + ) + states[f'{prefix}_pool_frame_count'] = (1,) + + if use_positional_encoding: + name = f'{prefix}_pos_enc_frame_count' + states[name] = (1,) + + if strides[1] != strides[2]: + raise ValueError('Strides must match in the spatial dimensions, ' + 'got {}'.format(strides)) + + # If we use a 3D kernel, we apply spatial downsampling + # after the buffer. + if (tuple(strides[1:3]) != (1, 1) and + self._conv_type not in ['2plus1d', '3d_2plus1d']): + num_downsamples += 1 + elif isinstance(block, HeadSpec): + states['state_head_pool_buffer'] = ( + input_shape[0], 1, 1, 1, block.project_filters, + ) + states['state_head_pool_frame_count'] = (1,) + + return states + + def _get_state_dtype(self, name: str) -> str: + """Returns the dtype associated with a state.""" + if 'frame_count' in name: + return 'int32' + return self.dtype + + def initial_state_specs( + self, input_shape: Sequence[int]) -> Dict[str, tf.keras.layers.InputSpec]: + """Creates a mapping of state name to InputSpec from the input shape.""" + state_shapes = self._get_initial_state_shapes( + self._block_specs, + input_shape, + use_positional_encoding=self._use_positional_encoding) + + return { + name: tf.keras.layers.InputSpec( + shape=shape, dtype=self._get_state_dtype(name)) + for name, shape in state_shapes.items() + } + + def init_states(self, input_shape: Sequence[int]) -> Dict[str, tf.Tensor]: + """Returns initial states for the first call in streaming mode.""" + state_shapes =
self._get_initial_state_shapes( + self._block_specs, + input_shape, + use_positional_encoding=self._use_positional_encoding) + + states = { + name: tf.zeros(shape, dtype=self._get_state_dtype(name)) + for name, shape in state_shapes.items() + } + return states + + @property + def use_external_states(self) -> bool: + """Whether this model is expecting input states as additional input.""" + return self._use_external_states + + @property + def head_filters(self): + """The number of filters expected to be in the head classifier layer.""" + return self._head_filters + + @property + def conv_type(self): + """The expected convolution type (see __init__ for more details).""" + return self._conv_type + + def get_config(self): + config_dict = { + 'model_id': self._model_id, + 'causal': self._causal, + 'use_positional_encoding': self._use_positional_encoding, + 'conv_type': self._conv_type, + 'activation': self._activation, + 'use_sync_bn': self._use_sync_bn, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, + 'use_external_states': self._use_external_states, + 'output_states': self._output_states, + } + return config_dict + + @classmethod + def from_config(cls, config, custom_objects=None): + return cls(**config) + + +@factory.register_backbone_builder('movinet') +def build_movinet( + input_specs: tf.keras.layers.InputSpec, + backbone_config: hyperparams.Config, + norm_activation_config: hyperparams.Config, + l2_regularizer: tf.keras.regularizers.Regularizer = None) -> tf.keras.Model: # pytype: disable=annotation-type-mismatch # typed-keras + """Builds MoViNet backbone from a config.""" + backbone_type = backbone_config.type + backbone_cfg = backbone_config.get() + if backbone_type != 'movinet': + raise ValueError(f'Inconsistent backbone
type {backbone_type}') + if norm_activation_config.activation is not None: + logging.warning('norm_activation is not used in MoViNets, but specified: ' + '%s', norm_activation_config.activation) + logging.warning('norm_activation is ignored.') + + return Movinet( + model_id=backbone_cfg.model_id, + causal=backbone_cfg.causal, + use_positional_encoding=backbone_cfg.use_positional_encoding, + conv_type=backbone_cfg.conv_type, + se_type=backbone_cfg.se_type, + input_specs=input_specs, + activation=backbone_cfg.activation, + gating_activation=backbone_cfg.gating_activation, + output_states=backbone_cfg.output_states, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon, + kernel_regularizer=l2_regularizer, + stochastic_depth_drop_rate=backbone_cfg.stochastic_depth_drop_rate, + use_external_states=backbone_cfg.use_external_states, + average_pooling_type=backbone_cfg.average_pooling_type) diff --git a/official/vision/beta/projects/movinet/modeling/movinet_layers.py b/official/projects/movinet/modeling/movinet_layers.py similarity index 94% rename from official/vision/beta/projects/movinet/modeling/movinet_layers.py rename to official/projects/movinet/modeling/movinet_layers.py index 38179e7b3f748bb464bea7e2f5ee5f9284e6b424..af81c4cabb3a28d552411e21502a2b8efd5ec1f8 100644 --- a/official/vision/beta/projects/movinet/modeling/movinet_layers.py +++ b/official/projects/movinet/modeling/movinet_layers.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Contains common building blocks for MoViNets.
Reference: https://arxiv.org/pdf/2103.11511.pdf @@ -23,7 +22,7 @@ from typing import Any, Mapping, Optional, Sequence, Tuple, Union import tensorflow as tf from official.modeling import tf_utils -from official.vision.beta.modeling.layers import nn_layers +from official.vision.modeling.layers import nn_layers # Default kernel weight decay that may be overridden KERNEL_WEIGHT_DECAY = 1.5e-5 @@ -93,10 +92,9 @@ class MobileConv2D(tf.keras.layers.Layer): data_format: Optional[str] = None, dilation_rate: Union[int, Sequence[int]] = (1, 1), groups: int = 1, - activation: Optional[nn_layers.Activation] = None, use_bias: bool = True, - kernel_initializer: tf.keras.initializers.Initializer = 'glorot_uniform', - bias_initializer: tf.keras.initializers.Initializer = 'zeros', + kernel_initializer: str = 'glorot_uniform', + bias_initializer: str = 'zeros', kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, activity_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, @@ -105,6 +103,8 @@ class MobileConv2D(tf.keras.layers.Layer): use_depthwise: bool = False, use_temporal: bool = False, use_buffered_input: bool = False, # pytype: disable=annotation-type-mismatch # typed-keras + batch_norm_op: Optional[Any] = None, + activation_op: Optional[Any] = None, **kwargs): # pylint: disable=g-doc-args """Initializes mobile conv2d. @@ -117,6 +117,10 @@ class MobileConv2D(tf.keras.layers.Layer): use_buffered_input: if True, the input is expected to be padded beforehand. In effect, calling this layer will use 'valid' padding on the temporal dimension to simulate 'causal' padding. + batch_norm_op: A callable batch norm layer. If None, no batch + norm will be applied after the convolution. + activation_op: A callable activation layer. If None, no + activation will be applied after the convolution. **kwargs: keyword arguments to be passed to this layer.
Returns: @@ -130,7 +134,6 @@ class MobileConv2D(tf.keras.layers.Layer): self._data_format = data_format self._dilation_rate = dilation_rate self._groups = groups - self._activation = activation self._use_bias = use_bias self._kernel_initializer = kernel_initializer self._bias_initializer = bias_initializer @@ -142,6 +145,8 @@ class MobileConv2D(tf.keras.layers.Layer): self._use_depthwise = use_depthwise self._use_temporal = use_temporal self._use_buffered_input = use_buffered_input + self._batch_norm_op = batch_norm_op + self._activation_op = activation_op kernel_size = normalize_tuple(kernel_size, 2, 'kernel_size') @@ -156,7 +161,6 @@ class MobileConv2D(tf.keras.layers.Layer): depth_multiplier=1, data_format=data_format, dilation_rate=dilation_rate, - activation=activation, use_bias=use_bias, depthwise_initializer=kernel_initializer, bias_initializer=bias_initializer, @@ -175,7 +179,6 @@ class MobileConv2D(tf.keras.layers.Layer): data_format=data_format, dilation_rate=dilation_rate, groups=groups, - activation=activation, use_bias=use_bias, kernel_initializer=kernel_initializer, bias_initializer=bias_initializer, @@ -196,7 +199,6 @@ class MobileConv2D(tf.keras.layers.Layer): 'data_format': self._data_format, 'dilation_rate': self._dilation_rate, 'groups': self._groups, - 'activation': self._activation, 'use_bias': self._use_bias, 'kernel_initializer': self._kernel_initializer, 'bias_initializer': self._bias_initializer, @@ -229,6 +231,10 @@ class MobileConv2D(tf.keras.layers.Layer): x = tf.reshape(inputs, input_shape) x = self._conv(x) + if self._batch_norm_op is not None: + x = self._batch_norm_op(x) + if self._activation_op is not None: + x = self._activation_op(x) if self._use_temporal: output_shape = [ @@ -357,8 +363,20 @@ class ConvBlock(tf.keras.layers.Layer): padding = 'causal' if self._causal else 'same' self._groups = input_shape[-1] if self._depthwise else 1 - self._conv_temporal = None + self._batch_norm = None + self._batch_norm_temporal = None + if 
self._use_batch_norm: + self._batch_norm = self._batch_norm_layer( + momentum=self._batch_norm_momentum, + epsilon=self._batch_norm_epsilon, + name='bn') + if self._conv_type != '3d' and self._kernel_size[0] > 1: + self._batch_norm_temporal = self._batch_norm_layer( + momentum=self._batch_norm_momentum, + epsilon=self._batch_norm_epsilon, + name='bn_temporal') + self._conv_temporal = None if self._conv_type == '3d_2plus1d' and self._kernel_size[0] > 1: self._conv = nn_layers.Conv3D( self._filters, @@ -394,6 +412,8 @@ class ConvBlock(tf.keras.layers.Layer): kernel_initializer=self._kernel_initializer, kernel_regularizer=self._kernel_regularizer, use_buffered_input=False, + batch_norm_op=self._batch_norm, + activation_op=self._activation_layer, name='conv2d') if self._kernel_size[0] > 1: self._conv_temporal = MobileConv2D( @@ -408,6 +428,8 @@ class ConvBlock(tf.keras.layers.Layer): kernel_initializer=self._kernel_initializer, kernel_regularizer=self._kernel_regularizer, use_buffered_input=self._use_buffered_input, + batch_norm_op=self._batch_norm_temporal, + activation_op=self._activation_layer, name='conv2d_temporal') else: self._conv = nn_layers.Conv3D( @@ -422,37 +444,26 @@ class ConvBlock(tf.keras.layers.Layer): use_buffered_input=self._use_buffered_input, name='conv3d') - self._batch_norm = None - self._batch_norm_temporal = None - - if self._use_batch_norm: - self._batch_norm = self._batch_norm_layer( - momentum=self._batch_norm_momentum, - epsilon=self._batch_norm_epsilon, - name='bn') - if self._conv_type != '3d' and self._conv_temporal is not None: - self._batch_norm_temporal = self._batch_norm_layer( - momentum=self._batch_norm_momentum, - epsilon=self._batch_norm_epsilon, - name='bn_temporal') - super(ConvBlock, self).build(input_shape) def call(self, inputs): """Calls the layer with the given inputs.""" x = inputs + # bn_op and activation_op are folded into the '2plus1d' conv layer so that + # we do not explicitly call them here. 
+ # TODO(lzyuan): clean the conv layers api once the models are re-trained. x = self._conv(x) - if self._batch_norm is not None: + if self._batch_norm is not None and self._conv_type != '2plus1d': x = self._batch_norm(x) - if self._activation_layer is not None: + if self._activation_layer is not None and self._conv_type != '2plus1d': x = self._activation_layer(x) if self._conv_temporal is not None: x = self._conv_temporal(x) - if self._batch_norm_temporal is not None: + if self._batch_norm_temporal is not None and self._conv_type != '2plus1d': x = self._batch_norm_temporal(x) - if self._activation_layer is not None: + if self._activation_layer is not None and self._conv_type != '2plus1d': x = self._activation_layer(x) return x @@ -640,10 +651,13 @@ class StreamConvBlock(ConvBlock): if self._conv_temporal is None and self._stream_buffer is not None: x, states = self._stream_buffer(x, states=states) + # bn_op and activation_op are folded into the '2plus1d' conv layer so that + # we do not explicitly call them here. + # TODO(lzyuan): clean the conv layers api once the models are re-trained. 
x = self._conv(x) - if self._batch_norm is not None: + if self._batch_norm is not None and self._conv_type != '2plus1d': x = self._batch_norm(x) - if self._activation_layer is not None: + if self._activation_layer is not None and self._conv_type != '2plus1d': x = self._activation_layer(x) if self._conv_temporal is not None: @@ -653,9 +667,9 @@ class StreamConvBlock(ConvBlock): x, states = self._stream_buffer(x, states=states) x = self._conv_temporal(x) - if self._batch_norm_temporal is not None: + if self._batch_norm_temporal is not None and self._conv_type != '2plus1d': x = self._batch_norm_temporal(x) - if self._activation_layer is not None: + if self._activation_layer is not None and self._conv_type != '2plus1d': x = self._activation_layer(x) return x, states @@ -788,12 +802,14 @@ class StreamSqueezeExcitation(tf.keras.layers.Layer): states = dict(states) if states is not None else {} if self._se_type == '3d': - x, states = self._spatiotemporal_pool(inputs, states=states) + x, states = self._spatiotemporal_pool( + inputs, states=states, output_states=True) elif self._se_type == '2d': x = self._spatial_pool(inputs) elif self._se_type == '2plus3d': x_space = self._spatial_pool(inputs) - x, states = self._spatiotemporal_pool(x_space, states=states) + x, states = self._spatiotemporal_pool( + x_space, states=states, output_states=True) if not self._causal: x = tf.tile(x, [1, tf.shape(inputs)[1], 1, 1, 1]) @@ -885,7 +901,8 @@ class MobileBottleneck(tf.keras.layers.Layer): x = self._expansion_layer(inputs) x, states = self._feature_layer(x, states=states) - x, states = self._attention_layer(x, states=states) + if self._attention_layer is not None: + x, states = self._attention_layer(x, states=states) x = self._projection_layer(x) # Add identity so that the ops are ordered as written. 
This is useful for, @@ -1136,18 +1153,20 @@ class MovinetBlock(tf.keras.layers.Layer): batch_norm_momentum=self._batch_norm_momentum, batch_norm_epsilon=self._batch_norm_epsilon, name='projection') - self._attention = StreamSqueezeExcitation( - se_hidden_filters, - se_type=se_type, - activation=activation, - gating_activation=gating_activation, - causal=self._causal, - conv_type=conv_type, - use_positional_encoding=use_positional_encoding, - kernel_initializer=kernel_initializer, - kernel_regularizer=kernel_regularizer, - state_prefix=state_prefix, - name='se') + self._attention = None + if se_type != 'none': + self._attention = StreamSqueezeExcitation( + se_hidden_filters, + se_type=se_type, + activation=activation, + gating_activation=gating_activation, + causal=self._causal, + conv_type=conv_type, + use_positional_encoding=use_positional_encoding, + kernel_initializer=kernel_initializer, + kernel_regularizer=kernel_regularizer, + state_prefix=state_prefix, + name='se') def get_config(self): """Returns a dictionary containing the config used for initialization.""" @@ -1345,6 +1364,7 @@ class Head(tf.keras.layers.Layer): tf.keras.layers.BatchNormalization, batch_norm_momentum: float = 0.99, batch_norm_epsilon: float = 1e-3, + average_pooling_type: str = '3d', state_prefix: Optional[str] = None, # pytype: disable=annotation-type-mismatch # typed-keras **kwargs): """Implementation for video model head. @@ -1361,6 +1381,8 @@ class Head(tf.keras.layers.Layer): batch_norm_layer: class to use for batch norm. batch_norm_momentum: momentum of the batch norm operation. batch_norm_epsilon: epsilon of the batch norm operation. + average_pooling_type: The average pooling type. Currently supporting + ['3d', '2d', 'none']. state_prefix: a prefix string to identify states. **kwargs: keyword arguments to be passed to this layer. 
""" @@ -1387,8 +1409,16 @@ class Head(tf.keras.layers.Layer): batch_norm_momentum=self._batch_norm_momentum, batch_norm_epsilon=self._batch_norm_epsilon, name='project') - self._pool = nn_layers.GlobalAveragePool3D( - keepdims=True, causal=False, state_prefix=state_prefix) + if average_pooling_type.lower() == '3d': + self._pool = nn_layers.GlobalAveragePool3D( + keepdims=True, causal=False, state_prefix=state_prefix) + elif average_pooling_type.lower() == '2d': + self._pool = nn_layers.SpatialAveragePool3D(keepdims=True) + elif average_pooling_type == 'none': + self._pool = None + else: + raise ValueError( + '%s average_pooling_type is not supported.' % average_pooling_type) def get_config(self): """Returns a dictionary containing the config used for initialization.""" @@ -1422,7 +1452,11 @@ class Head(tf.keras.layers.Layer): """ states = dict(states) if states is not None else {} x = self._project(inputs) - return self._pool(x, states=states) + if self._pool is not None: + outputs = self._pool(x, states=states, output_states=True) + else: + outputs = (x, states) + return outputs @tf.keras.utils.register_keras_serializable(package='Vision') diff --git a/official/vision/beta/projects/movinet/modeling/movinet_layers_test.py b/official/projects/movinet/modeling/movinet_layers_test.py similarity index 80% rename from official/vision/beta/projects/movinet/modeling/movinet_layers_test.py rename to official/projects/movinet/modeling/movinet_layers_test.py index 472ad167571a56473f65f395b9ce2acdac75481f..b4027043c1acaf5a0cc71c8566acadda46c64eb6 100644 --- a/official/vision/beta/projects/movinet/modeling/movinet_layers_test.py +++ b/official/projects/movinet/modeling/movinet_layers_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -12,14 +12,13 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for movinet_layers.py.""" from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.modeling.layers import nn_layers -from official.vision.beta.projects.movinet.modeling import movinet_layers +from official.projects.movinet.modeling import movinet_layers +from official.vision.modeling.layers import nn_layers class MovinetLayersTest(parameterized.TestCase, tf.test.TestCase): @@ -64,6 +63,72 @@ class MovinetLayersTest(parameterized.TestCase, tf.test.TestCase): self.assertEqual(predicted.shape, expected.shape) self.assertAllClose(predicted, expected) + def test_mobile_conv2d_bn(self): + batch_norm_op = tf.keras.layers.BatchNormalization( + momentum=0.9, + epsilon=1., + name='bn') + conv2d = movinet_layers.MobileConv2D( + filters=3, + kernel_size=(3, 3), + strides=(1, 1), + padding='same', + kernel_initializer='ones', + use_bias=False, + use_depthwise=False, + use_temporal=False, + use_buffered_input=True, + batch_norm_op=batch_norm_op, + ) + + inputs = tf.ones([1, 2, 2, 2, 3]) + + predicted = conv2d(inputs) + + expected = tf.constant( + [[[[[8.48528, 8.48528, 8.48528], + [8.48528, 8.48528, 8.48528]], + [[8.48528, 8.48528, 8.48528], + [8.48528, 8.48528, 8.48528]]], + [[[8.48528, 8.48528, 8.48528], + [8.48528, 8.48528, 8.48528]], + [[8.48528, 8.48528, 8.48528], + [8.48528, 8.48528, 8.48528]]]]]) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + + def test_mobile_conv2d_activation(self): + conv2d = movinet_layers.MobileConv2D( + filters=3, + kernel_size=(3, 3), + strides=(1, 1), + padding='same', + kernel_initializer='ones', + use_bias=False, + use_depthwise=False, + use_temporal=False, + use_buffered_input=True, + activation_op=tf.nn.relu6, + ) + + inputs = tf.ones([1, 2, 2, 2, 3]) + + predicted = conv2d(inputs) + + expected = tf.constant( + 
[[[[[6., 6., 6.], + [6., 6., 6.]], + [[6., 6., 6.], + [6., 6., 6.]]], + [[[6., 6., 6.], + [6., 6., 6.]], + [[6., 6., 6.], + [6., 6., 6.]]]]]) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + def test_mobile_conv2d_temporal(self): conv2d = movinet_layers.MobileConv2D( filters=3, @@ -378,6 +443,35 @@ class MovinetLayersTest(parameterized.TestCase, tf.test.TestCase): self.assertEqual(predicted.shape, expected.shape) self.assertAllClose(predicted, expected) + def test_stream_movinet_block_none_se(self): + block = movinet_layers.MovinetBlock( + out_filters=3, + expand_filters=6, + kernel_size=(3, 3, 3), + strides=(1, 2, 2), + causal=True, + se_type='none', + state_prefix='test', + ) + + inputs = tf.range(4, dtype=tf.float32) + 1. + inputs = tf.reshape(inputs, [1, 4, 1, 1, 1]) + inputs = tf.tile(inputs, [1, 1, 2, 1, 3]) + expected, expected_states = block(inputs) + + for num_splits in [1, 2, 4]: + frames = tf.split(inputs, inputs.shape[1] // num_splits, axis=1) + states = {} + predicted = [] + for frame in frames: + x, states = block(frame, states=states) + predicted.append(x) + predicted = tf.concat(predicted, axis=1) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + self.assertAllEqual(list(expected_states.keys()), ['test_stream_buffer']) + def test_stream_classifier_head(self): head = movinet_layers.Head(project_filters=5) classifier_head = movinet_layers.ClassifierHead( diff --git a/official/vision/beta/projects/movinet/modeling/movinet_model.py b/official/projects/movinet/modeling/movinet_model.py similarity index 88% rename from official/vision/beta/projects/movinet/modeling/movinet_model.py rename to official/projects/movinet/modeling/movinet_model.py index e269306c0c1cf960d73c9c37f4daa36119c9ec9e..0b527f7c159a569358a6a3c52f91366684cc0069 100644 --- a/official/vision/beta/projects/movinet/modeling/movinet_model.py +++ 
b/official/projects/movinet/modeling/movinet_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -21,10 +21,10 @@ from typing import Any, Dict, Mapping, Optional, Sequence, Tuple, Union from absl import logging import tensorflow as tf -from official.vision.beta.modeling import backbones -from official.vision.beta.modeling import factory_3d as model_factory -from official.vision.beta.projects.movinet.configs import movinet as cfg -from official.vision.beta.projects.movinet.modeling import movinet_layers +from official.projects.movinet.configs import movinet as cfg +from official.projects.movinet.modeling import movinet_layers +from official.vision.modeling import backbones +from official.vision.modeling import factory_3d as model_factory @tf.keras.utils.register_keras_serializable(package='Vision') @@ -88,14 +88,13 @@ class MovinetClassifier(tf.keras.Model): # Move backbone after super() call so Keras is happy self._backbone = backbone - def _build_network( + def _build_backbone( self, backbone: tf.keras.Model, input_specs: Mapping[str, tf.keras.layers.InputSpec], state_specs: Optional[Mapping[str, tf.keras.layers.InputSpec]] = None, - ) -> Tuple[Mapping[str, tf.keras.Input], Union[Tuple[Mapping[ # pytype: disable=invalid-annotation # typed-keras - str, tf.Tensor], Mapping[str, tf.Tensor]], Mapping[str, tf.Tensor]]]: - """Builds the model network. + ) -> Tuple[Mapping[str, Any], Any, Any]: + """Builds the backbone network and gets states and endpoints. Args: backbone: the model backbone. @@ -104,9 +103,9 @@ class MovinetClassifier(tf.keras.Model): layer, will overwrite the contents of the buffer(s). Returns: - Inputs and outputs as a tuple. Inputs are expected to be a dict with - base input and states. 
Outputs are expected to be a dict of endpoints - and (optionally) output states. + inputs: a dict of input specs. + endpoints: a dict of model endpoints. + states: a dict of model states. """ state_specs = state_specs if state_specs is not None else {} @@ -145,7 +144,30 @@ class MovinetClassifier(tf.keras.Model): mismatched_shapes)) else: endpoints, states = backbone(inputs) + return inputs, endpoints, states + def _build_network( + self, + backbone: tf.keras.Model, + input_specs: Mapping[str, tf.keras.layers.InputSpec], + state_specs: Optional[Mapping[str, tf.keras.layers.InputSpec]] = None, + ) -> Tuple[Mapping[str, tf.keras.Input], Union[Tuple[Mapping[ # pytype: disable=invalid-annotation # typed-keras + str, tf.Tensor], Mapping[str, tf.Tensor]], Mapping[str, tf.Tensor]]]: + """Builds the model network. + + Args: + backbone: the model backbone. + input_specs: the model input spec to use. + state_specs: a dict of states such that, if any of the keys match for a + layer, will overwrite the contents of the buffer(s). + + Returns: + Inputs and outputs as a tuple. Inputs are expected to be a dict with + base input and states. Outputs are expected to be a dict of endpoints + and (optionally) output states. 
+ """ + inputs, endpoints, states = self._build_backbone( + backbone=backbone, input_specs=input_specs, state_specs=state_specs) x = endpoints['head'] x = movinet_layers.ClassifierHead( diff --git a/official/vision/beta/projects/movinet/modeling/movinet_model_test.py b/official/projects/movinet/modeling/movinet_model_test.py similarity index 97% rename from official/vision/beta/projects/movinet/modeling/movinet_model_test.py rename to official/projects/movinet/modeling/movinet_model_test.py index 7075b487ebce111a59b7a0321673429f12418b28..3187e38a3f62143dc5f7134d524385d91e902041 100644 --- a/official/vision/beta/projects/movinet/modeling/movinet_model_test.py +++ b/official/projects/movinet/modeling/movinet_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,15 +12,14 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for movinet_model.py.""" from absl.testing import parameterized import numpy as np import tensorflow as tf -from official.vision.beta.projects.movinet.modeling import movinet -from official.vision.beta.projects.movinet.modeling import movinet_model +from official.projects.movinet.modeling import movinet +from official.projects.movinet.modeling import movinet_model class MovinetModelTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/projects/movinet/modeling/movinet_test.py b/official/projects/movinet/modeling/movinet_test.py new file mode 100644 index 0000000000000000000000000000000000000000..0b082c00a62b42a4f7accacc928c47eadb4c4192 --- /dev/null +++ b/official/projects/movinet/modeling/movinet_test.py @@ -0,0 +1,225 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for movinet.py.""" + +from absl.testing import parameterized +import tensorflow as tf + +from official.projects.movinet.modeling import movinet + + +class MoViNetTest(parameterized.TestCase, tf.test.TestCase): + + def test_network_creation(self): + """Test creation of MoViNet family models.""" + tf.keras.backend.set_image_data_format('channels_last') + + network = movinet.Movinet( + model_id='a0', + causal=True, + ) + inputs = tf.keras.Input(shape=(8, 128, 128, 3), batch_size=1) + endpoints, states = network(inputs) + + self.assertAllEqual(endpoints['stem'].shape, [1, 8, 64, 64, 8]) + self.assertAllEqual(endpoints['block0_layer0'].shape, [1, 8, 32, 32, 8]) + self.assertAllEqual(endpoints['block1_layer0'].shape, [1, 8, 16, 16, 32]) + self.assertAllEqual(endpoints['block2_layer0'].shape, [1, 8, 8, 8, 56]) + self.assertAllEqual(endpoints['block3_layer0'].shape, [1, 8, 8, 8, 56]) + self.assertAllEqual(endpoints['block4_layer0'].shape, [1, 8, 4, 4, 104]) + self.assertAllEqual(endpoints['head'].shape, [1, 1, 1, 1, 480]) + + self.assertNotEmpty(states) + + def test_network_with_states(self): + """Test creation of MoViNet family models with states.""" + tf.keras.backend.set_image_data_format('channels_last') + + backbone = movinet.Movinet( + model_id='a0', + causal=True, + use_external_states=True, + ) + inputs = tf.ones([1, 8, 128, 128, 3]) + + init_states = backbone.init_states(tf.shape(inputs)) + endpoints, 
new_states = backbone({**init_states, 'image': inputs}) + + self.assertAllEqual(endpoints['stem'].shape, [1, 8, 64, 64, 8]) + self.assertAllEqual(endpoints['block0_layer0'].shape, [1, 8, 32, 32, 8]) + self.assertAllEqual(endpoints['block1_layer0'].shape, [1, 8, 16, 16, 32]) + self.assertAllEqual(endpoints['block2_layer0'].shape, [1, 8, 8, 8, 56]) + self.assertAllEqual(endpoints['block3_layer0'].shape, [1, 8, 8, 8, 56]) + self.assertAllEqual(endpoints['block4_layer0'].shape, [1, 8, 4, 4, 104]) + self.assertAllEqual(endpoints['head'].shape, [1, 1, 1, 1, 480]) + + self.assertNotEmpty(init_states) + self.assertNotEmpty(new_states) + + def test_movinet_stream(self): + """Test if the backbone can be run in streaming mode.""" + tf.keras.backend.set_image_data_format('channels_last') + + backbone = movinet.Movinet( + model_id='a0', + causal=True, + use_external_states=True, + ) + inputs = tf.ones([1, 5, 128, 128, 3]) + + init_states = backbone.init_states(tf.shape(inputs)) + expected_endpoints, _ = backbone({**init_states, 'image': inputs}) + + frames = tf.split(inputs, inputs.shape[1], axis=1) + + states = init_states + for frame in frames: + output, states = backbone({**states, 'image': frame}) + predicted_endpoints = output + + predicted = predicted_endpoints['head'] + + # The expected final output is simply the mean across frames + expected = expected_endpoints['head'] + expected = tf.reduce_mean(expected, 1, keepdims=True) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected, 1e-5, 1e-5) + + def test_movinet_stream_nse(self): + """Test if the backbone can be run in streaming mode w/o SE layer.""" + tf.keras.backend.set_image_data_format('channels_last') + + backbone = movinet.Movinet( + model_id='a0', + causal=True, + use_external_states=True, + se_type='none', + ) + inputs = tf.ones([1, 5, 128, 128, 3]) + + init_states = backbone.init_states(tf.shape(inputs)) + expected_endpoints, _ = backbone({**init_states, 'image': 
inputs}) + + frames = tf.split(inputs, inputs.shape[1], axis=1) + + states = init_states + for frame in frames: + output, states = backbone({**states, 'image': frame}) + predicted_endpoints = output + + predicted = predicted_endpoints['head'] + + # The expected final output is simply the mean across frames + expected = expected_endpoints['head'] + expected = tf.reduce_mean(expected, 1, keepdims=True) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected, 1e-5, 1e-5) + + # Check contents in the states dictionary. + state_keys = list(init_states.keys()) + self.assertIn('state_head_pool_buffer', state_keys) + self.assertIn('state_head_pool_frame_count', state_keys) + state_keys.remove('state_head_pool_buffer') + state_keys.remove('state_head_pool_frame_count') + # From now on, there are only 'stream_buffer' for the convolutions. + for state_key in state_keys: + self.assertIn( + 'stream_buffer', state_key, + msg=f'Expecting stream_buffer only, found {state_key}') + + def test_movinet_2plus1d_stream(self): + tf.keras.backend.set_image_data_format('channels_last') + + backbone = movinet.Movinet( + model_id='a0', + causal=True, + conv_type='2plus1d', + use_external_states=True, + ) + inputs = tf.ones([1, 5, 128, 128, 3]) + + init_states = backbone.init_states(tf.shape(inputs)) + expected_endpoints, _ = backbone({**init_states, 'image': inputs}) + + frames = tf.split(inputs, inputs.shape[1], axis=1) + + states = init_states + for frame in frames: + output, states = backbone({**states, 'image': frame}) + predicted_endpoints = output + + predicted = predicted_endpoints['head'] + + # The expected final output is simply the mean across frames + expected = expected_endpoints['head'] + expected = tf.reduce_mean(expected, 1, keepdims=True) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected, 1e-5, 1e-5) + + def test_movinet_3d_2plus1d_stream(self): + 
tf.keras.backend.set_image_data_format('channels_last') + + backbone = movinet.Movinet( + model_id='a0', + causal=True, + conv_type='3d_2plus1d', + use_external_states=True, + ) + inputs = tf.ones([1, 5, 128, 128, 3]) + + init_states = backbone.init_states(tf.shape(inputs)) + expected_endpoints, _ = backbone({**init_states, 'image': inputs}) + + frames = tf.split(inputs, inputs.shape[1], axis=1) + + states = init_states + for frame in frames: + output, states = backbone({**states, 'image': frame}) + predicted_endpoints = output + + predicted = predicted_endpoints['head'] + + # The expected final output is simply the mean across frames + expected = expected_endpoints['head'] + expected = tf.reduce_mean(expected, 1, keepdims=True) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected, 1e-5, 1e-5) + + def test_serialize_deserialize(self): + # Create a network object that sets all of its config options. + kwargs = dict( + model_id='a0', + causal=True, + use_positional_encoding=True, + use_external_states=True, + ) + network = movinet.Movinet(**kwargs) + + # Create another network object from the first object's config. + new_network = movinet.Movinet.from_config(network.get_config()) + + # Validate that the config can be forced to JSON. + _ = new_network.to_json() + + # If the serialization was successful, the new config should match the old. 
+ self.assertAllEqual(network.get_config(), new_network.get_config()) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/movinet/movinet_tutorial.ipynb b/official/projects/movinet/movinet_tutorial.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..a29cc72fac9d799ae03773253b736ccf7dc3cca0 --- /dev/null +++ b/official/projects/movinet/movinet_tutorial.ipynb @@ -0,0 +1,1112 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "3E96e1UKQ8uR" + }, + "source": [ + "# MoViNet Tutorial\n", + "\n", + "This notebook provides basic example code to build, run, and fine-tune [MoViNets (Mobile Video Networks)](https://arxiv.org/pdf/2103.11511.pdf).\n", + "\n", + "Pretrained models are provided by [TensorFlow Hub](https://tfhub.dev/google/collections/movinet/) and the [TensorFlow Model Garden](https://github.com/tensorflow/models/tree/master/official/projects/movinet), trained on [Kinetics 600](https://deepmind.com/research/open-source/kinetics) for video action classification. All models use TensorFlow 2 with Keras for inference and training.\n", + "\n", + "The following steps will be performed:\n", + "\n", + "1. [Running base model inference with TensorFlow Hub](#scrollTo=6g0tuFvf71S9\u0026line=8\u0026uniqifier=1)\n", + "2. [Running streaming model inference with TensorFlow Hub and plotting predictions](#scrollTo=ADrHPmwGcBZ5\u0026line=4\u0026uniqifier=1)\n", + "3. [Exporting a streaming model to TensorFlow Lite for mobile](#scrollTo=W3CLHvubvdSI\u0026line=3\u0026uniqifier=1)\n", + "4. [Fine-tuning a base model with the TensorFlow Model Garden](#scrollTo=_s-7bEoa3f8g\u0026line=11\u0026uniqifier=1)\n", + "\n", + "![jumping jacks plot](https://storage.googleapis.com/tf_model_garden/vision/movinet/artifacts/jumpingjacks_plot.gif)\n", + "\n", + "To generate video plots like the one above, see [section 2](#scrollTo=ADrHPmwGcBZ5\u0026line=4\u0026uniqifier=1)."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8_oLnvJy7kz5" + }, + "source": [ + "## Setup\n", + "\n", + "For inference on smaller models (A0-A2), CPU is sufficient for this Colab. For fine-tuning, it is recommended to run the models using GPUs.\n", + "\n", + "To select a GPU in Colab, choose `Runtime \u003e Change runtime type \u003e Hardware accelerator \u003e GPU` from the top menu." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "s3khsunT7kWa" + }, + "outputs": [], + "source": [ + "# Install packages\n", + "\n", + "# tf-models-official is the stable Model Garden package\n", + "# tf-models-nightly includes latest changes\n", + "!pip install -q tf-models-nightly\n", + "\n", + "# Install tfds nightly to download ucf101\n", + "!pip install -q tfds-nightly\n", + "\n", + "# Install the mediapy package for visualizing images/videos.\n", + "# See https://github.com/google/mediapy\n", + "!command -v ffmpeg \u003e/dev/null || (apt update \u0026\u0026 apt install -y ffmpeg)\n", + "!pip install -q mediapy\n", + "\n", + "# Due to a bug, we reinstall opencv\n", + "# See https://stackoverflow.com/q/70537488\n", + "!pip uninstall -q -y opencv-python-headless\n", + "!pip install -q \"opencv-python-headless\u003c4.3\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "dI_1csl6Q-gH" + }, + "outputs": [], + "source": [ + "# Run imports\n", + "import os\n", + "\n", + "import matplotlib as mpl\n", + "import matplotlib.pyplot as plt\n", + "import mediapy as media\n", + "import numpy as np\n", + "import PIL\n", + "import pandas as pd\n", + "import tensorflow as tf\n", + "import tensorflow_datasets as tfds\n", + "import tensorflow_hub as hub\n", + "import tqdm\n", + "\n", + "mpl.rcParams.update({\n", + " 'font.size': 10,\n", + "})" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OnFqOXazoWgy" + }, + "source": [ + "Run the cell below to define helper functions and
create variables." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "dx55NK3ZoZeh" + }, + "outputs": [], + "source": [ + "#@title Run this cell to set up some helper code.\n", + "\n", + "# Download Kinetics 600 label map\n", + "!wget https://raw.githubusercontent.com/tensorflow/models/f8af2291cced43fc9f1d9b41ddbf772ae7b0d7d2/official/projects/movinet/files/kinetics_600_labels.txt -O labels.txt -q\n", + "\n", + "with tf.io.gfile.GFile('labels.txt') as f:\n", + " lines = f.readlines()\n", + " KINETICS_600_LABELS_LIST = [line.strip() for line in lines]\n", + " KINETICS_600_LABELS = tf.constant(KINETICS_600_LABELS_LIST)\n", + "\n", + "def get_top_k(probs, k=5, label_map=KINETICS_600_LABELS):\n", + " \"\"\"Outputs the top k model labels and probabilities on the given video.\"\"\"\n", + " top_predictions = tf.argsort(probs, axis=-1, direction='DESCENDING')[:k]\n", + " top_labels = tf.gather(label_map, top_predictions, axis=-1)\n", + " top_labels = [label.decode('utf8') for label in top_labels.numpy()]\n", + " top_probs = tf.gather(probs, top_predictions, axis=-1).numpy()\n", + " return tuple(zip(top_labels, top_probs))\n", + "\n", + "def predict_top_k(model, video, k=5, label_map=KINETICS_600_LABELS):\n", + " \"\"\"Outputs the top k model labels and probabilities on the given video.\"\"\"\n", + " outputs = model.predict(video[tf.newaxis])[0]\n", + " probs = tf.nn.softmax(outputs)\n", + " return get_top_k(probs, k=k, label_map=label_map)\n", + "\n", + "def load_movinet_from_hub(model_id, model_mode, hub_version=3):\n", + " \"\"\"Loads a MoViNet model from TF Hub.\"\"\"\n", + " hub_url = f'https://tfhub.dev/tensorflow/movinet/{model_id}/{model_mode}/kinetics-600/classification/{hub_version}'\n", + "\n", + " encoder = hub.KerasLayer(hub_url, trainable=True)\n", + "\n", + " inputs = tf.keras.layers.Input(\n", + " shape=[None, None, None, 3],\n", + " dtype=tf.float32)\n", + "\n", + " if model_mode == 'base':\n", + " 
inputs = dict(image=inputs)\n", + " else:\n", + " # Define the state inputs, which is a dict that maps state names to tensors.\n", + " init_states_fn = encoder.resolved_object.signatures['init_states']\n", + " state_shapes = {\n", + " name: ([s if s \u003e 0 else None for s in state.shape], state.dtype)\n", + " for name, state in init_states_fn(tf.constant([0, 0, 0, 0, 3])).items()\n", + " }\n", + " states_input = {\n", + " name: tf.keras.Input(shape[1:], dtype=dtype, name=name)\n", + " for name, (shape, dtype) in state_shapes.items()\n", + " }\n", + "\n", + " # The inputs to the model are the states and the video\n", + " inputs = {**states_input, 'image': inputs}\n", + "\n", + " # Output shape: [batch_size, 600]\n", + " outputs = encoder(inputs)\n", + "\n", + " model = tf.keras.Model(inputs, outputs)\n", + " model.build([1, 1, 1, 1, 3])\n", + "\n", + " return model\n", + "\n", + "# Download example gif\n", + "!wget https://github.com/tensorflow/models/raw/f8af2291cced43fc9f1d9b41ddbf772ae7b0d7d2/official/projects/movinet/files/jumpingjack.gif -O jumpingjack.gif -q\n", + "\n", + "def load_gif(file_path, image_size=(224, 224)):\n", + " \"\"\"Loads a gif file into a TF tensor.\"\"\"\n", + " with tf.io.gfile.GFile(file_path, 'rb') as f:\n", + " video = tf.io.decode_gif(f.read())\n", + " video = tf.image.resize(video, image_size)\n", + " video = tf.cast(video, tf.float32) / 255.\n", + " return video\n", + "\n", + "def get_top_k_streaming_labels(probs, k=5, label_map=KINETICS_600_LABELS_LIST):\n", + " \"\"\"Returns the top-k labels over an entire video sequence.\n", + "\n", + " Args:\n", + " probs: probability tensor of shape (num_frames, num_classes) that represents\n", + " the probability of each class on each frame.\n", + " k: the number of top predictions to select.\n", + " label_map: a list of labels to map logit indices to label strings.\n", + "\n", + " Returns:\n", + " a tuple of the top-k probabilities, labels, and logit indices\n", + " \"\"\"\n", + " 
top_categories_last = tf.argsort(probs, -1, 'DESCENDING')[-1, :1]\n", + " categories = tf.argsort(probs, -1, 'DESCENDING')[:, :k]\n", + " categories = tf.reshape(categories, [-1])\n", + "\n", + " counts = sorted([\n", + " (i.numpy(), tf.reduce_sum(tf.cast(categories == i, tf.int32)).numpy())\n", + " for i in tf.unique(categories)[0]\n", + " ], key=lambda x: x[1], reverse=True)\n", + "\n", + " top_probs_idx = tf.constant([i for i, _ in counts[:k]])\n", + " top_probs_idx = tf.concat([top_categories_last, top_probs_idx], 0)\n", + " top_probs_idx = tf.unique(top_probs_idx)[0][:k+1]\n", + "\n", + " top_probs = tf.gather(probs, top_probs_idx, axis=-1)\n", + " top_probs = tf.transpose(top_probs, perm=(1, 0))\n", + " top_labels = tf.gather(label_map, top_probs_idx, axis=0)\n", + " top_labels = [label.decode('utf8') for label in top_labels.numpy()]\n", + "\n", + " return top_probs, top_labels, top_probs_idx\n", + "\n", + "def plot_streaming_top_preds_at_step(\n", + " top_probs,\n", + " top_labels,\n", + " step=None,\n", + " image=None,\n", + " legend_loc='lower left',\n", + " duration_seconds=10,\n", + " figure_height=500,\n", + " playhead_scale=0.8,\n", + " grid_alpha=0.3):\n", + " \"\"\"Generates a plot of the top video model predictions at a given time step.\n", + "\n", + " Args:\n", + " top_probs: a tensor of shape (k, num_frames) representing the top-k\n", + " probabilities over all frames.\n", + " top_labels: a list of length k that represents the top-k label strings.\n", + " step: the current time step in the range [0, num_frames].\n", + " image: the image frame to display at the current time step.\n", + " legend_loc: the placement location of the legend.\n", + " duration_seconds: the total duration of the video.\n", + " figure_height: the output figure height.\n", + " playhead_scale: scale value for the playhead.\n", + " grid_alpha: alpha value for the gridlines.\n", + "\n", + " Returns:\n", + " A tuple of the output numpy image, figure, and axes.\n", + " \"\"\"\n", 
+ " num_labels, num_frames = top_probs.shape\n", + " if step is None:\n", + " step = num_frames\n", + "\n", + " fig = plt.figure(figsize=(6.5, 7), dpi=300)\n", + " gs = mpl.gridspec.GridSpec(8, 1)\n", + " ax2 = plt.subplot(gs[:-3, :])\n", + " ax = plt.subplot(gs[-3:, :])\n", + "\n", + " if image is not None:\n", + " ax2.imshow(image, interpolation='nearest')\n", + " ax2.axis('off')\n", + "\n", + " preview_line_x = tf.linspace(0., duration_seconds, num_frames)\n", + " preview_line_y = top_probs\n", + "\n", + " line_x = preview_line_x[:step+1]\n", + " line_y = preview_line_y[:, :step+1]\n", + "\n", + " for i in range(num_labels):\n", + " ax.plot(preview_line_x, preview_line_y[i], label=None, linewidth='1.5',\n", + " linestyle=':', color='gray')\n", + " ax.plot(line_x, line_y[i], label=top_labels[i], linewidth='2.0')\n", + "\n", + "\n", + " ax.grid(which='major', linestyle=':', linewidth='1.0', alpha=grid_alpha)\n", + " ax.grid(which='minor', linestyle=':', linewidth='0.5', alpha=grid_alpha)\n", + "\n", + " min_height = tf.reduce_min(top_probs) * playhead_scale\n", + " max_height = tf.reduce_max(top_probs)\n", + " ax.vlines(preview_line_x[step], min_height, max_height, colors='red')\n", + " ax.scatter(preview_line_x[step], max_height, color='red')\n", + "\n", + " ax.legend(loc=legend_loc)\n", + "\n", + " plt.xlim(0, duration_seconds)\n", + " plt.ylabel('Probability')\n", + " plt.xlabel('Time (s)')\n", + " plt.yscale('log')\n", + "\n", + " fig.tight_layout()\n", + " fig.canvas.draw()\n", + "\n", + " data = np.frombuffer(fig.canvas.tostring_rgb(), dtype=np.uint8)\n", + " data = data.reshape(fig.canvas.get_width_height()[::-1] + (3,))\n", + " plt.close()\n", + "\n", + " figure_width = int(figure_height * data.shape[1] / data.shape[0])\n", + " image = PIL.Image.fromarray(data).resize([figure_width, figure_height])\n", + " image = np.array(image)\n", + "\n", + " return image, (fig, ax, ax2)\n", + "\n", + "def plot_streaming_top_preds(\n", + " probs,\n", + " video,\n", + " 
top_k=5,\n", + " video_fps=25.,\n", + " figure_height=500,\n", + " use_progbar=True):\n", + " \"\"\"Generates a video plot of the top video model predictions.\n", + "\n", + " Args:\n", + " probs: probability tensor of shape (num_frames, num_classes) that represents\n", + " the probability of each class on each frame.\n", + " video: the video to display in the plot.\n", + " top_k: the number of top predictions to select.\n", + " video_fps: the input video fps, used to compute the duration on the\n", + " time axis.\n", + " figure_height: the height of the output video.\n", + " use_progbar: display a progress bar.\n", + "\n", + " Returns:\n", + " A numpy array representing the output video.\n", + " \"\"\"\n", + " video_fps = float(video_fps)\n", + " figure_height = int(figure_height)\n", + " steps = video.shape[0]\n", + " duration = steps / video_fps\n", + "\n", + " top_probs, top_labels, _ = get_top_k_streaming_labels(probs, k=top_k)\n", + "\n", + " images = []\n", + " step_generator = tqdm.trange(steps) if use_progbar else range(steps)\n", + " for i in step_generator:\n", + " image, _ = plot_streaming_top_preds_at_step(\n", + " top_probs=top_probs,\n", + " top_labels=top_labels,\n", + " step=i,\n", + " image=video[i],\n", + " duration_seconds=duration,\n", + " figure_height=figure_height,\n", + " )\n", + " images.append(image)\n", + "\n", + " return np.array(images)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6g0tuFvf71S9" + }, + "source": [ + "## Running Base Model Inference with TensorFlow Hub\n", + "\n", + "We will load MoViNet-A2-Base from TensorFlow Hub as part of the [MoViNet collection](https://tfhub.dev/google/collections/movinet/).\n", + "\n", + "The following code will:\n", + "\n", + "- Load a MoViNet KerasLayer from [tfhub.dev](https://tfhub.dev).\n", + "- Wrap the layer in a [Keras Model](https://www.tensorflow.org/api_docs/python/tf/keras/Model).\n", + "- Load an example gif as a video.\n", + "- Classify the video and print the top-5 predicted classes."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KZKKNZVBpglJ" + }, + "outputs": [], + "source": [ + "model = load_movinet_from_hub('a2', 'base', hub_version=3)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7kU1_pL10l0B" + }, + "source": [ + "To provide a simple example video for classification, we can load a short gif of jumping jacks being performed.\n", + "\n", + "![jumping jacks](https://github.com/tensorflow/models/raw/f8af2291cced43fc9f1d9b41ddbf772ae7b0d7d2/official/projects/movinet/files/jumpingjack.gif)\n", + "\n", + "Attribution: Footage shared by [Coach Bobby Bluford](https://www.youtube.com/watch?v=-AxHpj-EuPg) on YouTube under the CC-BY license." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Iy0rKRrT723_" + }, + "outputs": [], + "source": [ + "video = load_gif('jumpingjack.gif', image_size=(172, 172))\n", + "\n", + "# Show video\n", + "print(video.shape)\n", + "media.show_video(video.numpy(), fps=5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "P0bZfrAsqPv2", + "outputId": "bd82571f-8dfd-4faf-ed10-e34708b0405d" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "jumping jacks 0.9166437\n", + "zumba 0.016020728\n", + "doing aerobics 0.008053946\n", + "dancing charleston 0.006083599\n", + "lunge 0.0035062772\n" + ] + } + ], + "source": [ + "# Run the model on the video and output the top 5 predictions\n", + "outputs = predict_top_k(model, video)\n", + "\n", + "for label, prob in outputs:\n", + " print(label, prob)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ADrHPmwGcBZ5" + }, + "source": [ + "## Run Streaming Model Inference with TensorFlow Hub and Plot Predictions\n", + "\n", + "We will load MoViNet-A2-Stream from TensorFlow Hub as part of the [MoViNet collection](https://tfhub.dev/google/collections/movinet/).\n", + "\n", + "The following code will:\n", +
"\n", + "- Load a MoViNet model from [tfhub.dev](https://tfhub.dev).\n", + "- Classify an example video and plot the streaming predictions over time." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tXWR13wthnK5" + }, + "outputs": [], + "source": [ + "model = load_movinet_from_hub('a2', 'stream', hub_version=3)\n", + "\n", + "# Create initial states for the stream model\n", + "init_states_fn = model.layers[-1].resolved_object.signatures['init_states']\n", + "init_states = init_states_fn(tf.shape(video[tf.newaxis]))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YqSkt7l8ltwt", + "outputId": "6ccf1dd6-95d1-43b1-efdb-2e931dd3a19d" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "100%|██████████| 13/13 [00:08\u003c00:00, 1.58it/s]\n", + "jumping jacks 0.9998123\n", + "zumba 0.00011835508\n", + "doing aerobics 3.3375818e-05\n", + "dancing charleston 4.9819987e-06\n", + "finger snapping 3.8673647e-06\n" + ] + } + ], + "source": [ + "# Insert your video clip here\n", + "video = load_gif('jumpingjack.gif', image_size=(172, 172))\n", + "clips = tf.split(video[tf.newaxis], video.shape[0], axis=1)\n", + "\n", + "all_logits = []\n", + "\n", + "# To run on a video, pass in one frame at a time\n", + "states = init_states\n", + "for clip in tqdm.tqdm(clips):\n", + " # Input shape: [1, 1, 172, 172, 3]\n", + " logits, states = model.predict({**states, 'image': clip}, verbose=0)\n", + " all_logits.append(logits)\n", + "\n", + "logits = tf.concat(all_logits, 0)\n", + "probs = tf.nn.softmax(logits)\n", + "\n", + "final_probs = probs[-1]\n", + "top_k = get_top_k(final_probs)\n", + "print()\n", + "for label, prob in top_k:\n", + " print(label, prob)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Xdox556CtMRb" + }, + "outputs": [], + "source": [ + "# Generate a plot and output to a video tensor\n", + "plot_video = 
plot_streaming_top_preds(probs, video, video_fps=8.)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NSStKE9klCs3" + }, + "outputs": [], + "source": [ + "# For gif format, set codec='gif'\n", + "media.show_video(plot_video, fps=3)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "W3CLHvubvdSI" + }, + "source": [ + "## Export a Streaming Model to TensorFlow Lite for Mobile\n", + "\n", + "We will convert a MoViNet-A0-Stream model to [TensorFlow Lite](https://www.tensorflow.org/lite).\n", + "\n", + "The following code will:\n", + "- Load a MoViNet-A0-Stream model.\n", + "- Convert the model to TF Lite.\n", + "- Run inference on an example video using the Python interpreter." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KH0j-07KVh06" + }, + "outputs": [], + "source": [ + "# Run imports\n", + "from official.vision.configs import video_classification\n", + "from official.projects.movinet.configs import movinet as movinet_configs\n", + "from official.projects.movinet.modeling import movinet\n", + "from official.projects.movinet.modeling import movinet_layers\n", + "from official.projects.movinet.modeling import movinet_model\n", + "from official.projects.movinet.tools import export_saved_model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "RLkV0xtPvfkY" + }, + "outputs": [], + "source": [ + "# Export to saved model\n", + "saved_model_dir = 'model'\n", + "tflite_filename = 'model.tflite'\n", + "input_shape = [1, 1, 172, 172, 3]\n", + "batch_size, num_frames, image_size, = input_shape[:3]\n", + "\n", + "tf.keras.backend.clear_session()\n", + "\n", + "# Create the model\n", + "input_specs = tf.keras.layers.InputSpec(shape=input_shape)\n", + "backbone = movinet.Movinet(\n", + " model_id='a0',\n", + " causal=True,\n", + " conv_type='2plus1d',\n", + " se_type='2plus3d',\n", + " input_specs=input_specs,\n", + " activation='hard_swish',\n", 
+ " gating_activation='hard_sigmoid',\n", + " use_sync_bn=False,\n", + " use_external_states=True)\n", + "model = movinet_model.MovinetClassifier(\n", + " backbone=backbone,\n", + " activation='hard_swish',\n", + " num_classes=600,\n", + " output_states=True,\n", + " input_specs=dict(image=input_specs))\n", + "model.build([1, 1, 1, 1, 3])\n", + "\n", + "# Extract pretrained weights\n", + "!wget https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_stream.tar.gz -O movinet_a0_stream.tar.gz -q\n", + "!tar -xvf movinet_a0_stream.tar.gz\n", + "\n", + "checkpoint_dir = 'movinet_a0_stream'\n", + "checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)\n", + "\n", + "# Convert to saved model\n", + "export_saved_model.export_saved_model(\n", + " model=model,\n", + " input_shape=input_shape,\n", + " export_path=saved_model_dir,\n", + " causal=True,\n", + " bundle_input_init_states_fn=False,\n", + " checkpoint_path=checkpoint_path)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gPg_6eMC8IwF" + }, + "outputs": [], + "source": [ + "# Convert to TF Lite\n", + "converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)\n", + "tflite_model = converter.convert()\n", + "\n", + "with open(tflite_filename, 'wb') as f:\n", + " f.write(tflite_model)\n", + "\n", + "# Create the interpreter and signature runner\n", + "interpreter = tf.lite.Interpreter(model_path=tflite_filename)\n", + "runner = interpreter.get_signature_runner()\n", + "\n", + "init_states = {\n", + " name: tf.zeros(x['shape'], dtype=x['dtype'])\n", + " for name, x in runner.get_input_details().items()\n", + "}\n", + "del init_states['image']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-TQ-7oSJIlTA", + "outputId": "a15519ff-d08c-40bc-fbea-d3a58169450c" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "jumping jacks 0.9791285\n", + "jogging 0.0019550633\n", + "riding 
unicycle 0.0017429002\n", + "passing soccer ball 0.0016952101\n", + "stretching arm 0.0014458151\n" + ] + } + ], + "source": [ + "# Insert your video clip here\n", + "video = load_gif('jumpingjack.gif', image_size=(172, 172))\n", + "clips = tf.split(video[tf.newaxis], video.shape[0], axis=1)\n", + "\n", + "# To run on a video, pass in one frame at a time\n", + "states = init_states\n", + "for clip in clips:\n", + " # Input shape: [1, 1, 172, 172, 3]\n", + " outputs = runner(**states, image=clip)\n", + " logits = outputs.pop('logits')[0]\n", + " states = outputs\n", + "\n", + "probs = tf.nn.softmax(logits)\n", + "top_k = get_top_k(probs)\n", + "print()\n", + "for label, prob in top_k:\n", + " print(label, prob)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_s-7bEoa3f8g" + }, + "source": [ + "## Fine-Tune a Base Model with the TensorFlow Model Garden\n", + "\n", + "We will fine-tune MoViNet-A0-Base on [UCF-101](https://www.crcv.ucf.edu/research/data-sets/ucf101/).\n", + "\n", + "The following code will:\n", + "\n", + "- Load the UCF-101 dataset with [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/ucf101).\n", + "- Create a simple [`tf.data.Dataset`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) pipeline for training and evaluation.\n", + "- Display some example videos from the dataset.\n", + "- Build a MoViNet model and load pretrained weights.\n", + "- Fine-tune the final classifier layers on UCF-101 and evaluate accuracy on the validation set." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "o7unW4WVr580" + }, + "source": [ + "### Load the UCF-101 Dataset with TensorFlow Datasets\n", + "\n", + "Calling `download_and_prepare()` will automatically download the dataset. This step may take up to 1 hour depending on the download and extraction speed. After downloading, the next cell will output information about the dataset."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2IHLbPAfrs5P" + }, + "outputs": [], + "source": [ + "# Run imports\n", + "import tensorflow_datasets as tfds\n", + "\n", + "from official.vision.configs import video_classification\n", + "from official.projects.movinet.configs import movinet as movinet_configs\n", + "from official.projects.movinet.modeling import movinet\n", + "from official.projects.movinet.modeling import movinet_layers\n", + "from official.projects.movinet.modeling import movinet_model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "FxM1vNYp_YAM" + }, + "outputs": [], + "source": [ + "dataset_name = 'ucf101'\n", + "\n", + "builder = tfds.builder(dataset_name)\n", + "\n", + "config = tfds.download.DownloadConfig(verify_ssl=False)\n", + "builder.download_and_prepare(download_config=config)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "executionInfo": { + "elapsed": 2957, + "status": "ok", + "timestamp": 1619748263684, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + }, + "user_tz": 360 + }, + "id": "boQHbcfDhXpJ", + "outputId": "eabc3307-d6bf-4f29-cc5a-c8dc6360701b" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Number of classes: 101\n", + "Number of examples for train: 9537\n", + "Number of examples for test: 3783\n", + "\n" + ] + }, + { + "data": { + "text/plain": [ + "tfds.core.DatasetInfo(\n", + " name='ucf101',\n", + " full_name='ucf101/ucf101_1_256/2.0.0',\n", + " description=\"\"\"\n", + " A 101-label video classification dataset.\n", + " \"\"\",\n", + " config_description=\"\"\"\n", + " 256x256 UCF with the first action recognition split.\n", + " \"\"\",\n", + " homepage='https://www.crcv.ucf.edu/data-sets/ucf101/',\n", + " data_path='/readahead/128M/placer/prod/home/tensorflow-datasets-cns-storage-owner/datasets/ucf101/ucf101_1_256/2.0.0',\n", + " download_size=6.48 
GiB,\n", + " dataset_size=Unknown size,\n", + " features=FeaturesDict({\n", + " 'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=101),\n", + " 'video': Video(Image(shape=(256, 256, 3), dtype=tf.uint8)),\n", + " }),\n", + " supervised_keys=None,\n", + " splits={\n", + " 'test': \u003cSplitInfo num_examples=3783, num_shards=32\u003e,\n", + " 'train': \u003cSplitInfo num_examples=9537, num_shards=64\u003e,\n", + " },\n", + " citation=\"\"\"@article{DBLP:journals/corr/abs-1212-0402,\n", + " author = {Khurram Soomro and\n", + " Amir Roshan Zamir and\n", + " Mubarak Shah},\n", + " title = {{UCF101:} {A} Dataset of 101 Human Actions Classes From Videos in\n", + " The Wild},\n", + " journal = {CoRR},\n", + " volume = {abs/1212.0402},\n", + " year = {2012},\n", + " url = {http://arxiv.org/abs/1212.0402},\n", + " archivePrefix = {arXiv},\n", + " eprint = {1212.0402},\n", + " timestamp = {Mon, 13 Aug 2018 16:47:45 +0200},\n", + " biburl = {https://dblp.org/rec/bib/journals/corr/abs-1212-0402},\n", + " bibsource = {dblp computer science bibliography, https://dblp.org}\n", + " }\"\"\",\n", + ")" + ] + }, + "execution_count": null, + "metadata": { + "tags": [] + }, + "output_type": "execute_result" + } + ], + "source": [ + "num_classes = builder.info.features['label'].num_classes\n", + "num_examples = {\n", + " name: split.num_examples\n", + " for name, split in builder.info.splits.items()\n", + "}\n", + "\n", + "print('Number of classes:', num_classes)\n", + "print('Number of examples for train:', num_examples['train'])\n", + "print('Number of examples for test:', num_examples['test'])\n", + "print()\n", + "\n", + "builder.info" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9cO_BCu9le3r" + }, + "outputs": [], + "source": [ + "# Build the training and evaluation datasets.\n", + "\n", + "batch_size = 8\n", + "num_frames = 8\n", + "frame_stride = 10\n", + "resolution = 172\n", + "\n", + "def format_features(features):\n", + " 
video = features['video']\n", + " video = video[:, ::frame_stride]\n", + " video = video[:, :num_frames]\n", + "\n", + " video = tf.reshape(video, [-1, video.shape[2], video.shape[3], 3])\n", + " video = tf.image.resize(video, (resolution, resolution))\n", + " video = tf.reshape(video, [-1, num_frames, resolution, resolution, 3])\n", + " video = tf.cast(video, tf.float32) / 255.\n", + "\n", + " label = tf.one_hot(features['label'], num_classes)\n", + " return (video, label)\n", + "\n", + "train_dataset = builder.as_dataset(\n", + " split='train',\n", + " batch_size=batch_size,\n", + " shuffle_files=True)\n", + "train_dataset = train_dataset.map(\n", + " format_features,\n", + " num_parallel_calls=tf.data.AUTOTUNE)\n", + "train_dataset = train_dataset.repeat()\n", + "train_dataset = train_dataset.prefetch(2)\n", + "\n", + "test_dataset = builder.as_dataset(\n", + " split='test',\n", + " batch_size=batch_size)\n", + "test_dataset = test_dataset.map(\n", + " format_features,\n", + " num_parallel_calls=tf.data.AUTOTUNE,\n", + " deterministic=True)\n", + "test_dataset = test_dataset.prefetch(2)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rToX7_Ymgh57" + }, + "source": [ + "Display some example videos from the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KG8Z7rUj06of" + }, + "outputs": [], + "source": [ + "videos, labels = next(iter(train_dataset))\n", + "media.show_videos(videos.numpy(), codec='gif', fps=5)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "R3RHeuHdsd_3" + }, + "source": [ + "### Build MoViNet-A0-Base and Load Pretrained Weights\n", + "\n", + "Here we create a MoViNet model using the open source code provided in [official/projects/movinet](https://github.com/tensorflow/models/tree/master/official/projects/movinet) and load the pretrained weights. We then freeze all layers except the final classifier head to speed up fine-tuning."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JpfxpeGSsbzJ" + }, + "outputs": [], + "source": [ + "model_id = 'a0'\n", + "\n", + "tf.keras.backend.clear_session()\n", + "\n", + "backbone = movinet.Movinet(model_id=model_id)\n", + "model = movinet_model.MovinetClassifier(backbone=backbone, num_classes=600)\n", + "model.build([1, 1, 1, 1, 3])\n", + "\n", + "# Load pretrained weights\n", + "!wget https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_base.tar.gz -O movinet_a0_base.tar.gz -q\n", + "!tar -xvf movinet_a0_base.tar.gz\n", + "\n", + "checkpoint_dir = 'movinet_a0_base'\n", + "checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)\n", + "checkpoint = tf.train.Checkpoint(model=model)\n", + "status = checkpoint.restore(checkpoint_path)\n", + "status.assert_existing_objects_matched()\n", + "\n", + "def build_classifier(backbone, num_classes, freeze_backbone=False):\n", + " \"\"\"Builds a classifier on top of a backbone model.\"\"\"\n", + " model = movinet_model.MovinetClassifier(\n", + " backbone=backbone,\n", + " num_classes=num_classes)\n", + " model.build([batch_size, num_frames, resolution, resolution, 3])\n", + "\n", + " if freeze_backbone:\n", + " for layer in model.layers[:-1]:\n", + " layer.trainable = False\n", + " model.layers[-1].trainable = True\n", + "\n", + " return model\n", + "\n", + "# Wrap the backbone with a new classifier to create a new classifier head\n", + "# with num_classes outputs (101 classes for UCF101).\n", + "# Freeze all layers except for the final classifier head.\n", + "model = build_classifier(backbone, num_classes, freeze_backbone=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ucntdu2xqgXB" + }, + "source": [ + "Configure fine-tuning with training/evaluation steps, loss object, metrics, learning rate, optimizer, and callbacks.\n", + "\n", + "Here we use 3 epochs. Training for more epochs should improve accuracy." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "WUYTw48BouTu" + }, + "outputs": [], + "source": [ + "num_epochs = 3\n", + "\n", + "train_steps = num_examples['train'] // batch_size\n", + "total_train_steps = train_steps * num_epochs\n", + "test_steps = num_examples['test'] // batch_size\n", + "\n", + "loss_obj = tf.keras.losses.CategoricalCrossentropy(\n", + " from_logits=True,\n", + " label_smoothing=0.1)\n", + "\n", + "metrics = [\n", + " tf.keras.metrics.TopKCategoricalAccuracy(\n", + " k=1, name='top_1', dtype=tf.float32),\n", + " tf.keras.metrics.TopKCategoricalAccuracy(\n", + " k=5, name='top_5', dtype=tf.float32),\n", + "]\n", + "\n", + "initial_learning_rate = 0.01\n", + "learning_rate = tf.keras.optimizers.schedules.CosineDecay(\n", + " initial_learning_rate, decay_steps=total_train_steps,\n", + ")\n", + "optimizer = tf.keras.optimizers.RMSprop(\n", + " learning_rate, rho=0.9, momentum=0.9, epsilon=1.0, clipnorm=1.0)\n", + "\n", + "model.compile(loss=loss_obj, optimizer=optimizer, metrics=metrics)\n", + "\n", + "callbacks = [\n", + " tf.keras.callbacks.TensorBoard(),\n", + "]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0IyAOOlcpHna" + }, + "source": [ + "Run the fine-tuning with Keras compile/fit. After fine-tuning the model, we should be able to achieve \u003e85% accuracy on the test set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "executionInfo": { + "elapsed": 982253, + "status": "ok", + "timestamp": 1619750139919, + "user": { + "displayName": "", + "photoUrl": "", + "userId": "" + }, + "user_tz": 360 + }, + "id": "Zecc_K3lga8I", + "outputId": "e4c5c61e-aa08-47db-c04c-42dea3efb545" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch 1/3\n", + "1192/1192 [==============================] - 551s 451ms/step - loss: 2.5050 - top_1: 0.6692 - top_5: 0.8753 - val_loss: 1.6310 - val_top_1: 0.8109 - val_top_5: 0.9701\n", + "Epoch 2/3\n", + "1192/1192 [==============================] - 533s 447ms/step - loss: 1.3336 - top_1: 0.9024 - top_5: 0.9906 - val_loss: 1.4576 - val_top_1: 0.8451 - val_top_5: 0.9740\n", + "Epoch 3/3\n", + "1192/1192 [==============================] - 531s 446ms/step - loss: 1.2298 - top_1: 0.9329 - top_5: 0.9943 - val_loss: 1.4351 - val_top_1: 0.8514 - val_top_5: 0.9762\n" + ] + } + ], + "source": [ + "results = model.fit(\n", + " train_dataset,\n", + " validation_data=test_dataset,\n", + " epochs=num_epochs,\n", + " steps_per_epoch=train_steps,\n", + " validation_steps=test_steps,\n", + " callbacks=callbacks,\n", + " validation_freq=1,\n", + " verbose=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XuH8XflmpU9d" + }, + "source": [ + "We can also view the training and evaluation progress in TensorBoard." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9fZhzhRJRd2J" + }, + "outputs": [], + "source": [ + "%reload_ext tensorboard\n", + "%tensorboard --logdir logs --port 0" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "last_runtime": { + "build_target": "//learning/deepmind/dm_python:dm_notebook3", + "kind": "private" + }, + "name": "movinet_tutorial.ipynb", + "provenance": [ + { + "file_id": "11msGCxFjxwioBOBJavP9alfTclUQCJf-", + "timestamp": 1617043059980 + } + ] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/official/vision/beta/projects/movinet/requirements.txt b/official/projects/movinet/requirements.txt similarity index 100% rename from official/vision/beta/projects/movinet/requirements.txt rename to official/projects/movinet/requirements.txt diff --git a/official/projects/movinet/tools/__init__.py b/official/projects/movinet/tools/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/movinet/tools/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ diff --git a/official/vision/beta/projects/movinet/tools/convert_3d_2plus1d.py b/official/projects/movinet/tools/convert_3d_2plus1d.py similarity index 91% rename from official/vision/beta/projects/movinet/tools/convert_3d_2plus1d.py rename to official/projects/movinet/tools/convert_3d_2plus1d.py index 4b126150bc9650e8080193f679e8cd8013e2c8fe..0349c6075174f716b3d27f88c58d288cb239ba1b 100644 --- a/official/vision/beta/projects/movinet/tools/convert_3d_2plus1d.py +++ b/official/projects/movinet/tools/convert_3d_2plus1d.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,8 +18,8 @@ from absl import app from absl import flags import tensorflow as tf -from official.vision.beta.projects.movinet.modeling import movinet -from official.vision.beta.projects.movinet.modeling import movinet_model +from official.projects.movinet.modeling import movinet +from official.projects.movinet.modeling import movinet_model flags.DEFINE_string( 'input_checkpoint_path', None, @@ -29,6 +29,8 @@ flags.DEFINE_string( 'Export path to save the saved_model file.') flags.DEFINE_string( 'model_id', 'a0', 'MoViNet model name.') +flags.DEFINE_string( + 'se_type', '2plus3d', 'MoViNet model SE type.') flags.DEFINE_bool( 'causal', True, 'Run the model in causal mode.') flags.DEFINE_bool( @@ -47,6 +49,7 @@ def main(_) -> None: model_id=FLAGS.model_id, causal=FLAGS.causal, conv_type='2plus1d', + se_type=FLAGS.se_type, use_positional_encoding=FLAGS.use_positional_encoding) model_2plus1d = movinet_model.MovinetClassifier( backbone=backbone_2plus1d, @@ -57,6 +60,7 @@ def main(_) -> None: model_id=FLAGS.model_id, causal=FLAGS.causal, conv_type='3d_2plus1d', + se_type=FLAGS.se_type, use_positional_encoding=FLAGS.use_positional_encoding) model_3d_2plus1d = 
movinet_model.MovinetClassifier( backbone=backbone_3d_2plus1d, diff --git a/official/vision/beta/projects/movinet/tools/convert_3d_2plus1d_test.py b/official/projects/movinet/tools/convert_3d_2plus1d_test.py similarity index 84% rename from official/vision/beta/projects/movinet/tools/convert_3d_2plus1d_test.py rename to official/projects/movinet/tools/convert_3d_2plus1d_test.py index d225e13a8704762d0690973d3e13ee4ba11441b7..d2899c9b960ca7991370e2cd991323a091065dba 100644 --- a/official/vision/beta/projects/movinet/tools/convert_3d_2plus1d_test.py +++ b/official/projects/movinet/tools/convert_3d_2plus1d_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,9 +19,9 @@ import os from absl import flags import tensorflow as tf -from official.vision.beta.projects.movinet.modeling import movinet -from official.vision.beta.projects.movinet.modeling import movinet_model -from official.vision.beta.projects.movinet.tools import convert_3d_2plus1d +from official.projects.movinet.modeling import movinet +from official.projects.movinet.modeling import movinet_model +from official.projects.movinet.tools import convert_3d_2plus1d FLAGS = flags.FLAGS @@ -36,7 +36,8 @@ class Convert3d2plus1dTest(tf.test.TestCase): model_3d_2plus1d = movinet_model.MovinetClassifier( backbone=movinet.Movinet( model_id='a0', - conv_type='3d_2plus1d'), + conv_type='3d_2plus1d', + se_type='2plus3d'), num_classes=600) model_3d_2plus1d.build([1, 1, 1, 1, 3]) save_checkpoint = tf.train.Checkpoint(model=model_3d_2plus1d) diff --git a/official/projects/movinet/tools/export_saved_model.py b/official/projects/movinet/tools/export_saved_model.py new file mode 100644 index 0000000000000000000000000000000000000000..86be661647760ca88e527446116cc6e481587b62 --- /dev/null +++ 
b/official/projects/movinet/tools/export_saved_model.py @@ -0,0 +1,299 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Exports models to tf.saved_model. + +Export example: + +```shell +python3 export_saved_model.py \ + --export_path=/tmp/movinet/ \ + --model_id=a0 \ + --causal=True \ + --conv_type="3d" \ + --num_classes=600 \ + --use_positional_encoding=False \ + --checkpoint_path="" +``` + +Export for TF Lite example (streaming mode: one frame per call, 172x172 input): + +```shell +python3 export_saved_model.py \ + --model_id=a0 \ + --causal=True \ + --conv_type=2plus1d \ + --se_type=2plus3d \ + --activation=hard_swish \ + --gating_activation=hard_sigmoid \ + --use_positional_encoding=False \ + --num_classes=600 \ + --batch_size=1 \ + --num_frames=1 \ + --image_size=172 \ + --bundle_input_init_states_fn=False \ + --checkpoint_path=/path/to/checkpoint \ + --export_path=/tmp/movinet_a0_stream +``` + +To use an exported saved_model, refer to export_saved_model_test.py.
+""" + +from typing import Optional, Tuple + +from absl import app +from absl import flags +import tensorflow as tf + +from official.projects.movinet.modeling import movinet +from official.projects.movinet.modeling import movinet_model + +flags.DEFINE_string( + 'export_path', '/tmp/movinet/', + 'Export path to save the saved_model file.') +flags.DEFINE_string( + 'model_id', 'a0', 'MoViNet model name.') +flags.DEFINE_bool( + 'causal', False, 'Run the model in causal mode.') +flags.DEFINE_string( + 'conv_type', '3d', + '3d, 2plus1d, or 3d_2plus1d. 3d configures the network ' + 'to use the default 3D convolution. 2plus1d uses (2+1)D convolution ' + 'with Conv2D operations and 2D reshaping (e.g., a 5x3x3 kernel becomes ' + '3x3 followed by 5x1 conv). 3d_2plus1d uses (2+1)D convolution with ' + 'Conv3D and no 2D reshaping (e.g., a 5x3x3 kernel becomes 1x3x3 ' + 'followed by 5x1x1 conv).') +flags.DEFINE_string( + 'se_type', '3d', + '3d, 2d, or 2plus3d. 3d uses the default 3D spatiotemporal global average ' + 'pooling for squeeze excitation. 2d uses 2D spatial global average pooling ' + 'on each frame. 2plus3d concatenates both 3D and 2D global average ' + 'pooling.') +flags.DEFINE_string( + 'activation', 'swish', + 'The main activation to use across layers.') +flags.DEFINE_string( + 'classifier_activation', 'swish', + 'The classifier activation to use.') +flags.DEFINE_string( + 'gating_activation', 'sigmoid', + 'The gating activation to use in squeeze-excitation layers.') +flags.DEFINE_bool( + 'use_positional_encoding', False, + 'Whether to use positional encoding (only applied when causal=True).') +flags.DEFINE_integer( + 'num_classes', 600, 'The number of classes for prediction.') +flags.DEFINE_integer( + 'batch_size', None, + 'The batch size of the input. Set to None for dynamic input.') +flags.DEFINE_integer( + 'num_frames', None, + 'The number of frames of the input.
Set to None for dynamic input.') +flags.DEFINE_integer( + 'image_size', None, + 'The resolution of the input. Set to None for dynamic input.') +flags.DEFINE_bool( + 'bundle_input_init_states_fn', True, + 'Add init_states as a function signature to the saved model. ' + 'This is not necessary if the input shape is static (e.g., for TF Lite).') +flags.DEFINE_string( + 'checkpoint_path', '', + 'Checkpoint path to load. Leave blank for default initialization.') + +FLAGS = flags.FLAGS + + +def export_saved_model( + model: tf.keras.Model, + input_shape: Tuple[int, int, int, int, int], + export_path: str = '/tmp/movinet/', + causal: bool = False, + bundle_input_init_states_fn: bool = True, + checkpoint_path: Optional[str] = None) -> None: + """Exports a MoViNet model to a saved model. + + Args: + model: the tf.keras.Model to export. + input_shape: The 5D spatiotemporal input shape of size + [batch_size, num_frames, image_height, image_width, num_channels]. + Set the field or a shape position in the field to None for dynamic input. + export_path: Export path to save the saved_model file. + causal: Run the model in causal mode. + bundle_input_init_states_fn: Add init_states as a function signature to the + saved model. This is not necessary if the input shape is static (e.g., + for TF Lite). + checkpoint_path: Checkpoint path to load. Leave blank to keep the model's + initialization. + """ + + # Use dimensions of 1 except the channels to export faster, + # since we only really need the last dimension to build and get the output + # states. These dimensions can be set to `None` once the model is built. + input_shape_concrete = [1 if s is None else s for s in input_shape] + model.build(input_shape_concrete) + + # Compile model to generate some internal Keras variables.
+ model.compile() + + if checkpoint_path: + checkpoint = tf.train.Checkpoint(model=model) + status = checkpoint.restore(checkpoint_path) + status.assert_existing_objects_matched() + + if causal: + # Call the model once to get the output states. Call again with `states` + # input to ensure that the inputs with the `states` argument are built + # with the full output state shapes. + input_image = tf.ones(input_shape_concrete) + _, states = model({ + **model.init_states(input_shape_concrete), 'image': input_image}) + _ = model({**states, 'image': input_image}) + + # Create a function to explicitly set the names of the outputs + def predict(inputs): + outputs, states = model(inputs) + return {**states, 'logits': outputs} + + specs = { + name: tf.TensorSpec(spec.shape, name=name, dtype=spec.dtype) + for name, spec in model.initial_state_specs( + input_shape).items() + } + specs['image'] = tf.TensorSpec( + input_shape, dtype=model.dtype, name='image') + + predict_fn = tf.function(predict, jit_compile=True) + predict_fn = predict_fn.get_concrete_function(specs) + + init_states_fn = tf.function(model.init_states, jit_compile=True) + init_states_fn = init_states_fn.get_concrete_function( + tf.TensorSpec([5], dtype=tf.int32)) + + if bundle_input_init_states_fn: + signatures = {'call': predict_fn, 'init_states': init_states_fn} + else: + signatures = predict_fn + + tf.keras.models.save_model( + model, export_path, signatures=signatures) + else: + _ = model(tf.ones(input_shape_concrete)) + tf.keras.models.save_model(model, export_path) + + +def build_and_export_saved_model( + export_path: str = '/tmp/movinet/', + model_id: str = 'a0', + causal: bool = False, + conv_type: str = '3d', + se_type: str = '3d', + activation: str = 'swish', + classifier_activation: str = 'swish', + gating_activation: str = 'sigmoid', + use_positional_encoding: bool = False, + num_classes: int = 600, + input_shape: Optional[Tuple[int, int, int, int, int]] = None, + bundle_input_init_states_fn: bool =
True, + checkpoint_path: Optional[str] = None) -> None: + """Builds and exports a MoViNet model to a saved model. + + Args: + export_path: Export path to save the saved_model file. + model_id: MoViNet model name. + causal: Run the model in causal mode. + conv_type: 3d, 2plus1d, or 3d_2plus1d. 3d configures the network + to use the default 3D convolution. 2plus1d uses (2+1)D convolution + with Conv2D operations and 2D reshaping (e.g., a 5x3x3 kernel becomes + 3x3 followed by 5x1 conv). 3d_2plus1d uses (2+1)D convolution with + Conv3D and no 2D reshaping (e.g., a 5x3x3 kernel becomes 1x3x3 + followed by 5x1x1 conv). + se_type: + 3d, 2d, or 2plus3d. 3d uses the default 3D spatiotemporal global average + pooling for squeeze excitation. 2d uses 2D spatial global average pooling + on each frame. 2plus3d concatenates both 3D and 2D global average + pooling. + activation: The main activation to use across layers. + classifier_activation: The classifier activation to use. + gating_activation: The gating activation to use in squeeze-excitation + layers. + use_positional_encoding: Whether to use positional encoding (only applied + when causal=True). + num_classes: The number of classes for prediction. + input_shape: The 5D spatiotemporal input shape of size + [batch_size, num_frames, image_height, image_width, num_channels]. + Set the field or a shape position in the field to None for dynamic input. + bundle_input_init_states_fn: Add init_states as a function signature to the + saved model. This is not necessary if the input shape is static (e.g., + for TF Lite). + checkpoint_path: Checkpoint path to load. Leave blank for default + initialization. 
+ """ + + input_specs = tf.keras.layers.InputSpec(shape=input_shape) + + # Override swish activation implementation to remove custom gradients + if activation == 'swish': + activation = 'simple_swish' + if classifier_activation == 'swish': + classifier_activation = 'simple_swish' + + backbone = movinet.Movinet( + model_id=model_id, + causal=causal, + use_positional_encoding=use_positional_encoding, + conv_type=conv_type, + se_type=se_type, + input_specs=input_specs, + activation=activation, + gating_activation=gating_activation, + use_sync_bn=False, + use_external_states=causal) + model = movinet_model.MovinetClassifier( + backbone, + num_classes=num_classes, + output_states=causal, + input_specs=dict(image=input_specs), + activation=classifier_activation) + + export_saved_model( + model=model, + input_shape=input_shape, + export_path=export_path, + causal=causal, + bundle_input_init_states_fn=bundle_input_init_states_fn, + checkpoint_path=checkpoint_path) + + +def main(_) -> None: + input_shape = ( + FLAGS.batch_size, FLAGS.num_frames, FLAGS.image_size, FLAGS.image_size, 3) + build_and_export_saved_model( + export_path=FLAGS.export_path, + model_id=FLAGS.model_id, + causal=FLAGS.causal, + conv_type=FLAGS.conv_type, + se_type=FLAGS.se_type, + activation=FLAGS.activation, + classifier_activation=FLAGS.classifier_activation, + gating_activation=FLAGS.gating_activation, + use_positional_encoding=FLAGS.use_positional_encoding, + num_classes=FLAGS.num_classes, + input_shape=input_shape, + bundle_input_init_states_fn=FLAGS.bundle_input_init_states_fn, + checkpoint_path=FLAGS.checkpoint_path) + print(' ----- Done. 
Saved Model is saved at {}'.format(FLAGS.export_path)) + + +if __name__ == '__main__': + app.run(main) diff --git a/official/vision/beta/projects/movinet/export_saved_model_test.py b/official/projects/movinet/tools/export_saved_model_test.py similarity index 97% rename from official/vision/beta/projects/movinet/export_saved_model_test.py rename to official/projects/movinet/tools/export_saved_model_test.py index cc620c505e2d5d65d2a824cd81acda7c5fa588a9..a06be1c9e5adc4f04d8f32f70aacb45efdde30f2 100644 --- a/official/vision/beta/projects/movinet/export_saved_model_test.py +++ b/official/projects/movinet/tools/export_saved_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,7 +18,7 @@ from absl import flags import tensorflow as tf import tensorflow_hub as hub -from official.vision.beta.projects.movinet import export_saved_model +from official.projects.movinet.tools import export_saved_model FLAGS = flags.FLAGS diff --git a/official/projects/movinet/tools/plot_movinet_video_stream_predictions.ipynb b/official/projects/movinet/tools/plot_movinet_video_stream_predictions.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..93d09a77bb5ade95a07dc01a6f3fc3e3e2e7088e --- /dev/null +++ b/official/projects/movinet/tools/plot_movinet_video_stream_predictions.ipynb @@ -0,0 +1,393 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "qwBHHt-XvPqn" + }, + "source": [ + "# Plot MoViNet Video Stream Predictions\n", + "\n", + "This notebook uses [MoViNets (Mobile Video Networks)](https://github.com/tensorflow/models/tree/master/official/projects/movinet) to predict a human action in a streaming video and outputs a visualization of predictions on each frame.\n", + "\n", + "Provide a video URL or upload 
your own to see how predictions change over time. All models can be run on CPU.\n", + "\n", + "Pretrained models are provided by [TensorFlow Hub](https://tfhub.dev/google/collections/movinet/) and the [TensorFlow Model Garden](https://github.com/tensorflow/models/tree/master/official/projects/movinet), trained on [Kinetics 600](https://deepmind.com/research/open-source/kinetics) for video action classification. All models use TensorFlow 2 with Keras for inference and training. See the [research paper](https://arxiv.org/pdf/2103.11511.pdf) for more details.\n", + "\n", + "Example output using [this gif](https://github.com/tensorflow/models/raw/f8af2291cced43fc9f1d9b41ddbf772ae7b0d7d2/official/projects/movinet/files/jumpingjack.gif) as input:\n", + "\n", + "![jumping jacks plot](https://storage.googleapis.com/tf_model_garden/vision/movinet/artifacts/jumpingjacks_plot.gif)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "ElvELd9mIfZe" + }, + "outputs": [], + "source": [ + "#@title Run this cell to initialize and set up a [MoViNet](https://github.com/tensorflow/models/tree/master/official/projects/movinet) model.\n", + "\n", + "\n", + "# Install the mediapy package for visualizing images/videos.\n", + "# See https://github.com/google/mediapy\n", + "!pip install -q mediapy\n", + "\n", + "# Run imports\n", + "import os\n", + "import io\n", + "\n", + "import matplotlib as mpl\n", + "import matplotlib.pyplot as plt\n", + "import mediapy as media\n", + "import numpy as np\n", + "import PIL.Image\n", + "import pandas as pd\n", + "import tensorflow as tf\n", + "import tensorflow_datasets as tfds\n", + "import tensorflow_hub as hub\n", + "import tqdm\n", + "from google.colab import files\n", + "import urllib.request\n", + "\n", + "mpl.rcParams.update({\n", + " 'font.size': 10,\n", + "})\n", + "\n", + "\n", + "# Download Kinetics 600 label map\n", + "!wget
https://raw.githubusercontent.com/tensorflow/models/f8af2291cced43fc9f1d9b41ddbf772ae7b0d7d2/official/projects/movinet/files/kinetics_600_labels.txt -O labels.txt -q\n", + "\n", + "with tf.io.gfile.GFile('labels.txt') as f:\n", + " lines = f.readlines()\n", + " KINETICS_600_LABELS_LIST = [line.strip() for line in lines]\n", + " KINETICS_600_LABELS = tf.constant(KINETICS_600_LABELS_LIST)\n", + "\n", + "def get_top_k(probs, k=5, label_map=KINETICS_600_LABELS):\n", + " \"\"\"Outputs the top k model labels and probabilities on the given video.\"\"\"\n", + " top_predictions = tf.argsort(probs, axis=-1, direction='DESCENDING')[:k]\n", + " top_labels = tf.gather(label_map, top_predictions, axis=-1)\n", + " top_labels = [label.decode('utf8') for label in top_labels.numpy()]\n", + " top_probs = tf.gather(probs, top_predictions, axis=-1).numpy()\n", + " return tuple(zip(top_labels, top_probs))\n", + "\n", + "def predict_top_k(model, video, k=5, label_map=KINETICS_600_LABELS):\n", + " \"\"\"Outputs the top k model labels and probabilities on the given video.\"\"\"\n", + " outputs = model.predict(video[tf.newaxis])[0]\n", + " probs = tf.nn.softmax(outputs)\n", + " return get_top_k(probs, k=k, label_map=label_map)\n", + "\n", + "def load_movinet_from_hub(model_id, model_mode, hub_version=3):\n", + " \"\"\"Loads a MoViNet model from TF Hub.\"\"\"\n", + " hub_url = f'https://tfhub.dev/tensorflow/movinet/{model_id}/{model_mode}/kinetics-600/classification/{hub_version}'\n", + "\n", + " encoder = hub.KerasLayer(hub_url, trainable=True)\n", + "\n", + " inputs = tf.keras.layers.Input(\n", + " shape=[None, None, None, 3],\n", + " dtype=tf.float32)\n", + "\n", + " if model_mode == 'base':\n", + " inputs = dict(image=inputs)\n", + " else:\n", + " # Define the state inputs, which is a dict that maps state names to tensors.\n", + " init_states_fn = encoder.resolved_object.signatures['init_states']\n", + " state_shapes = {\n", + " name: ([s if s \u003e 0 else None for s in state.shape], 
state.dtype)\n", + " for name, state in init_states_fn(tf.constant([0, 0, 0, 0, 3])).items()\n", + " }\n", + " states_input = {\n", + " name: tf.keras.Input(shape[1:], dtype=dtype, name=name)\n", + " for name, (shape, dtype) in state_shapes.items()\n", + " }\n", + "\n", + " # The inputs to the model are the states and the video\n", + " inputs = {**states_input, 'image': inputs}\n", + "\n", + " # Output shape: [batch_size, 600]\n", + " outputs = encoder(inputs)\n", + "\n", + " model = tf.keras.Model(inputs, outputs)\n", + " model.build([1, 1, 1, 1, 3])\n", + "\n", + " return model\n", + "\n", + "# Download example gif\n", + "!wget https://github.com/tensorflow/models/raw/f8af2291cced43fc9f1d9b41ddbf772ae7b0d7d2/official/projects/movinet/files/jumpingjack.gif -O jumpingjack.gif -q\n", + "\n", + "def load_gif(file_path, image_size=(224, 224)):\n", + " \"\"\"Loads a gif file into a TF tensor.\"\"\"\n", + " with tf.io.gfile.GFile(file_path, 'rb') as f:\n", + " video = tf.io.decode_gif(f.read())\n", + " video = tf.image.resize(video, image_size)\n", + " video = tf.cast(video, tf.float32) / 255.\n", + " return video\n", + "\n", + "def get_top_k_streaming_labels(probs, k=5, label_map=KINETICS_600_LABELS_LIST):\n", + " \"\"\"Returns the top-k labels over an entire video sequence.\n", + "\n", + " Args:\n", + " probs: probability tensor of shape (num_frames, num_classes) that represents\n", + " the probability of each class on each frame.\n", + " k: the number of top predictions to select.\n", + " label_map: a list of labels to map logit indices to label strings.\n", + "\n", + " Returns:\n", + " a tuple of the top-k probabilities, labels, and logit indices\n", + " \"\"\"\n", + " top_categories_last = tf.argsort(probs, -1, 'DESCENDING')[-1, :1]\n", + " categories = tf.argsort(probs, -1, 'DESCENDING')[:, :k]\n", + " categories = tf.reshape(categories, [-1])\n", + "\n", + " counts = sorted([\n", + " (i.numpy(), tf.reduce_sum(tf.cast(categories == i, tf.int32)).numpy())\n", + " 
for i in tf.unique(categories)[0]\n", + " ], key=lambda x: x[1], reverse=True)\n", + "\n", + " top_probs_idx = tf.constant([i for i, _ in counts[:k]])\n", + " top_probs_idx = tf.concat([top_categories_last, top_probs_idx], 0)\n", + " top_probs_idx = tf.unique(top_probs_idx)[0][:k+1]\n", + "\n", + " top_probs = tf.gather(probs, top_probs_idx, axis=-1)\n", + " top_probs = tf.transpose(top_probs, perm=(1, 0))\n", + " top_labels = tf.gather(label_map, top_probs_idx, axis=0)\n", + " top_labels = [label.decode('utf8') for label in top_labels.numpy()]\n", + "\n", + " return top_probs, top_labels, top_probs_idx\n", + "\n", + "def plot_streaming_top_preds_at_step(\n", + " top_probs,\n", + " top_labels,\n", + " step=None,\n", + " image=None,\n", + " legend_loc='lower left',\n", + " duration_seconds=10,\n", + " figure_height=500,\n", + " playhead_scale=0.8,\n", + " grid_alpha=0.3):\n", + " \"\"\"Generates a plot of the top video model predictions at a given time step.\n", + "\n", + " Args:\n", + " top_probs: a tensor of shape (k, num_frames) representing the top-k\n", + " probabilities over all frames.\n", + " top_labels: a list of length k that represents the top-k label strings.\n", + " step: the current time step in the range [0, num_frames).\n", + " image: the image frame to display at the current time step.\n", + " legend_loc: the placement location of the legend.\n", + " duration_seconds: the total duration of the video.\n", + " figure_height: the output figure height.\n", + " playhead_scale: scale value for the playhead.\n", + " grid_alpha: alpha value for the gridlines.\n", + "\n", + " Returns:\n", + " A tuple of the output numpy image, figure, and axes.\n", + " \"\"\"\n", + " num_labels, num_frames = top_probs.shape\n", + " if step is None:\n", + " step = num_frames - 1\n", + "\n", + " fig = plt.figure(figsize=(6.5, 7), dpi=300)\n", + " gs = mpl.gridspec.GridSpec(8, 1)\n", + " ax2 = plt.subplot(gs[:-3, :])\n", + " ax = plt.subplot(gs[-3:, :])\n", + "\n", + " if image is
not None:\n", + " ax2.imshow(image, interpolation='nearest')\n", + " ax2.axis('off')\n", + "\n", + " preview_line_x = tf.linspace(0., duration_seconds, num_frames)\n", + " preview_line_y = top_probs\n", + "\n", + " line_x = preview_line_x[:step+1]\n", + " line_y = preview_line_y[:, :step+1]\n", + "\n", + " for i in range(num_labels):\n", + " ax.plot(preview_line_x, preview_line_y[i], label=None, linewidth='1.5',\n", + " linestyle=':', color='gray')\n", + " ax.plot(line_x, line_y[i], label=top_labels[i], linewidth='2.0')\n", + "\n", + "\n", + " ax.grid(which='major', linestyle=':', linewidth='1.0', alpha=grid_alpha)\n", + " ax.grid(which='minor', linestyle=':', linewidth='0.5', alpha=grid_alpha)\n", + "\n", + " min_height = tf.reduce_min(top_probs) * playhead_scale\n", + " max_height = tf.reduce_max(top_probs)\n", + " ax.vlines(preview_line_x[step], min_height, max_height, colors='red')\n", + " ax.scatter(preview_line_x[step], max_height, color='red')\n", + "\n", + " ax.legend(loc=legend_loc)\n", + "\n", + " plt.xlim(0, duration_seconds)\n", + " plt.ylabel('Probability')\n", + " plt.xlabel('Time (s)')\n", + " plt.yscale('log')\n", + "\n", + " fig.tight_layout()\n", + " fig.canvas.draw()\n", + "\n", + " data = np.frombuffer(fig.canvas.tostring_rgb(), dtype=np.uint8)\n", + " data = data.reshape(fig.canvas.get_width_height()[::-1] + (3,))\n", + " plt.close()\n", + "\n", + " figure_width = int(figure_height * data.shape[1] / data.shape[0])\n", + " image = PIL.Image.fromarray(data).resize([figure_width, figure_height])\n", + " image = np.array(image)\n", + "\n", + " return image, (fig, ax, ax2)\n", + "\n", + "def plot_streaming_top_preds(\n", + " probs,\n", + " video,\n", + " top_k=5,\n", + " video_fps=25.,\n", + " figure_height=500,\n", + " use_progbar=True):\n", + " \"\"\"Generates a video plot of the top video model predictions.\n", + "\n", + " Args:\n", + " probs: probability tensor of shape (num_frames, num_classes) that represents\n", + " the probability of each 
class on each frame.\n", + " video: the video to display in the plot.\n", + " top_k: the number of top predictions to select.\n", + " video_fps: the input video fps.\n", + " Used to compute the duration shown on the time axis.\n", + " figure_height: the height of the output video.\n", + " use_progbar: display a progress bar.\n", + "\n", + " Returns:\n", + " A numpy array representing the output video.\n", + " \"\"\"\n", + " video_fps = float(video_fps)\n", + " figure_height = int(figure_height)\n", + " steps = video.shape[0]\n", + " duration = steps / video_fps\n", + "\n", + " top_probs, top_labels, _ = get_top_k_streaming_labels(probs, k=top_k)\n", + "\n", + " images = []\n", + " step_generator = tqdm.trange(steps) if use_progbar else range(steps)\n", + " for i in step_generator:\n", + " image, _ = plot_streaming_top_preds_at_step(\n", + " top_probs=top_probs,\n", + " top_labels=top_labels,\n", + " step=i,\n", + " image=video[i],\n", + " duration_seconds=duration,\n", + " figure_height=figure_height,\n", + " )\n", + " images.append(image)\n", + "\n", + " return np.array(images)\n", + "\n", + "def generate_plot(\n", + " model,\n", + " video_url=None,\n", + " resolution=224,\n", + " video_fps=25,\n", + " display_fps=25):\n", + " # Load the video\n", + " if not video_url:\n", + " video_bytes = list(files.upload().values())[0]\n", + " with open('video', 'wb') as f:\n", + " f.write(video_bytes)\n", + " else:\n", + " urllib.request.urlretrieve(video_url, \"video\")\n", + "\n", + " video = tf.cast(media.read_video('video'), tf.float32) / 255.\n", + " video = tf.image.resize(video, [resolution, resolution], preserve_aspect_ratio=True)\n", + "\n", + " # Create initial states for the stream model\n", + " init_states_fn = model.layers[-1].resolved_object.signatures['init_states']\n", + " init_states = init_states_fn(tf.shape(video[tf.newaxis]))\n", + "\n", + " clips = tf.split(video[tf.newaxis], video.shape[0], axis=1)\n", + "\n", + " all_logits = []\n", + "\n", + " print('Running the model on the video...')\n", + "\n", +
" # To run on a video, pass in one frame at a time\n", + " states = init_states\n", + " for clip in tqdm.tqdm(clips):\n", + " # Input shape: [1, 1, height, width, 3], one frame per call\n", + " logits, states = model.predict({**states, 'image': clip}, verbose=0)\n", + " all_logits.append(logits)\n", + "\n", + " logits = tf.concat(all_logits, 0)\n", + " probs = tf.nn.softmax(logits)\n", + "\n", + " print('Generating the plot...')\n", + "\n", + " # Generate a plot and output to a video tensor\n", + " plot_video = plot_streaming_top_preds(probs, video, video_fps=video_fps)\n", + " media.show_video(plot_video, fps=display_fps, codec='gif')\n", + "\n", + "model_size = 'm' #@param [\"xs\", \"s\", \"m\", \"l\", \"xl\", \"xxl\"]\n", + "\n", + "model_map = {\n", + " 'xs': 'a0',\n", + " 's': 'a1',\n", + " 'm': 'a2',\n", + " 'l': 'a3',\n", + " 'xl': 'a4',\n", + " 'xxl': 'a5',\n", + "}\n", + "movinet_model_id = model_map[model_size]\n", + "\n", + "model = load_movinet_from_hub(\n", + " movinet_model_id, 'stream', hub_version=3)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "cellView": "form", + "id": "jO6HrPk8pqo8" + }, + "outputs": [], + "source": [ + "#@title Generate a video plot.\n", + "\n", + "#@markdown You may add a video URL (gif or mp4) or leave the video_url field blank to upload your own file.\n", + "video_url = \"https://i.pinimg.com/originals/33/5e/31/335e31bc8ed52511da0cfb4bc44e95c7.gif\" #@param {type:\"string\"}\n", + "\n", + "#@markdown The base input resolution to the model. A good value is 224, but the best value may depend on the model size.\n", + "resolution = 224 #@param\n", + "#@markdown The fps of the input video.\n", + "video_fps = 12 #@param\n", + "#@markdown The fps to display the output plot.
Depending on the duration of the input video, it may help to use a lower fps.\n", + "display_fps = 12 #@param\n", + "\n", + "generate_plot(\n", + " model,\n", + " video_url=video_url,\n", + " resolution=resolution,\n", + " video_fps=video_fps,\n", + " display_fps=display_fps)" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "last_runtime": { + "build_target": "//learning/deepmind/dm_python:dm_notebook3", + "kind": "private" + }, + "name": "plot_movinet_video_stream_predictions.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/official/projects/movinet/tools/quantize_movinet.py b/official/projects/movinet/tools/quantize_movinet.py new file mode 100644 index 0000000000000000000000000000000000000000..5e34c3c9e524ead11ca144d511f7dee159582be1 --- /dev/null +++ b/official/projects/movinet/tools/quantize_movinet.py @@ -0,0 +1,331 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Generates example dataset for post-training quantization. 
+ +Example command line to run the script: + +```shell +python3 quantize_movinet.py \ +--saved_model_dir=${SAVED_MODEL_DIR} \ +--saved_model_with_states_dir=${SAVED_MODEL_WITH_STATES_DIR} \ +--output_dataset_dir=${OUTPUT_DATASET_DIR} \ +--output_tflite=${OUTPUT_TFLITE} \ +--quantization_mode='int_float_fallback' \ +--save_dataset_to_tfrecords=True +``` + +""" + +import functools +from typing import Any, Callable, Mapping, Optional + +from absl import app +from absl import flags +from absl import logging +import numpy as np +import tensorflow.compat.v2 as tf +import tensorflow_hub as hub + +from official.vision.configs import video_classification as video_classification_configs +from official.vision.tasks import video_classification + +tf.enable_v2_behavior() + +FLAGS = flags.FLAGS +flags.DEFINE_string( + 'saved_model_dir', None, 'The saved_model directory.') +flags.DEFINE_string( + 'saved_model_with_states_dir', None, + 'The directory of the saved_model with a state signature. ' + 'The saved_model_with_states is needed in order to get the initial state ' + 'shape and dtype while saved_model is used for the quantization.') +flags.DEFINE_string( + 'output_tflite', '/tmp/output.tflite', + 'The output tflite file path.') +flags.DEFINE_integer( + 'temporal_stride', 5, + 'Temporal stride used to generate input videos.') +flags.DEFINE_integer( + 'num_frames', 50, 'Number of frames in each input video.') +flags.DEFINE_integer( + 'image_size', 172, 'Frame height and width of the input videos.') +flags.DEFINE_string( + 'quantization_mode', None, + 'The quantization mode.
Can be one of "float16", "int8", ' + '"int_float_fallback" or None.') +flags.DEFINE_integer( + 'num_calibration_videos', 100, + 'Number of videos to run to generate example datasets.') +flags.DEFINE_integer( + 'num_samples_per_video', 3, + 'Number of samples drawn from a single video.') +flags.DEFINE_boolean( + 'save_dataset_to_tfrecords', False, + 'Whether to save the representative dataset to disk.') +flags.DEFINE_string( + 'output_dataset_dir', '/tmp/representative_dataset/', + 'The directory to store exported tfrecords.') +flags.DEFINE_integer( + 'max_saved_files', 100, + 'The maximum number of tfrecord files to save.') + + +def _bytes_feature(value): + """Returns a bytes_list from a string / byte.""" + if isinstance(value, type(tf.constant(0))): + value = value.numpy() # BytesList won't unpack string from an EagerTensor. + return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value])) + + +def _float_feature(value): + """Returns a float_list from a float / double.""" + return tf.train.Feature(float_list=tf.train.FloatList(value=value)) + + +def _int64_feature(value): + """Returns an int64_list from a bool / enum / int / uint.""" + return tf.train.Feature(int64_list=tf.train.Int64List(value=value)) + + +def _build_tf_example(feature): + return tf.train.Example( + features=tf.train.Features(feature=feature)).SerializeToString() + + +def save_to_tfrecord(input_frame: tf.Tensor, + input_states: Mapping[str, tf.Tensor], + frame_index: int, + predictions: tf.Tensor, + output_states: Mapping[str, tf.Tensor], + groundtruth_label_id: tf.Tensor, + output_dataset_dir: str, + file_index: int): + """Save results to tfrecord.""" + features = {} + features['frame_id'] = _int64_feature([frame_index]) + features['groundtruth_label'] = _int64_feature( + groundtruth_label_id.numpy().flatten().tolist()) + features['predictions'] = _float_feature( + predictions.numpy().flatten().tolist()) + image_string = tf.io.encode_png( + tf.squeeze(tf.cast(input_frame * 255., 
tf.uint8),
axis=[0, 1])) + features['image'] = _bytes_feature(image_string.numpy()) + + # Input/Output states at time T + for k, v in output_states.items(): + dtype = v[0].dtype + if dtype == tf.int32: + features['input/' + k] = _int64_feature( + input_states[k].numpy().flatten().tolist()) + features['output/' + k] = _int64_feature( + output_states[k].numpy().flatten().tolist()) + elif dtype == tf.float32: + features['input/' + k] = _float_feature( + input_states[k].numpy().flatten().tolist()) + features['output/' + k] = _float_feature( + output_states[k].numpy().flatten().tolist()) + else: + raise ValueError(f'Unrecognized dtype: {dtype}') + + tfe = _build_tf_example(features) + record_file = '{}/movinet_stream_{:06d}.tfrecords'.format( + output_dataset_dir, file_index) + logging.info('Saving to %s.', record_file) + with tf.io.TFRecordWriter(record_file) as writer: + writer.write(tfe) + + +def get_dataset() -> tf.data.Dataset: + """Gets dataset source.""" + config = video_classification_configs.video_classification_kinetics600() + + temporal_stride = FLAGS.temporal_stride + num_frames = FLAGS.num_frames + image_size = FLAGS.image_size + feature_shape = (num_frames, image_size, image_size, 3) + + config.task.validation_data.global_batch_size = 1 + config.task.validation_data.feature_shape = feature_shape + config.task.validation_data.temporal_stride = temporal_stride + config.task.train_data.min_image_size = int(1.125 * image_size) + config.task.validation_data.dtype = 'float32' + config.task.validation_data.drop_remainder = False + + task = video_classification.VideoClassificationTask(config.task) + + valid_dataset = task.build_inputs(config.task.validation_data) + valid_dataset = valid_dataset.map(lambda x, y: (x['image'], y)) + valid_dataset = valid_dataset.prefetch(32) + return valid_dataset + + +def stateful_representative_dataset_generator( + model: tf.keras.Model, + dataset_iter: Any, + init_states: Mapping[str, tf.Tensor], + 
save_dataset_to_tfrecords: bool = False, +
max_saved_files: int = 100, + output_dataset_dir: Optional[str] = None, + num_samples_per_video: int = 3, + num_calibration_videos: int = 100): + """Generates sample input data with states. + + Args: + model: the inference keras model. + dataset_iter: the dataset source. + init_states: the initial states for the model. + save_dataset_to_tfrecords: whether to save the representative dataset to + tfrecords on disk. + max_saved_files: the max number of saved tfrecords files. + output_dataset_dir: the directory to store the saved tfrecords. + num_samples_per_video: number of randomly sampled frames per video. + num_calibration_videos: number of calibration videos to run. + + Yields: + A dictionary of model inputs. + """ + counter = 0 + for i in range(num_calibration_videos): + if i % 100 == 0: + logging.info('Reading representative dataset id %d.', i) + + example_input, example_label = next(dataset_iter) + groundtruth_label_id = tf.argmax(example_label, axis=-1) + input_states = init_states + # split video into frames along the temporal dimension.
+ frames = tf.split(example_input, example_input.shape[1], axis=1) + + random_indices = np.random.randint( + low=1, high=len(frames), size=num_samples_per_video) + # always include the first frame + random_indices[0] = 0 + random_indices = set(random_indices) + + for frame_index, frame in enumerate(frames): + predictions, output_states = model({'image': frame, **input_states}) + if frame_index in random_indices: + if save_dataset_to_tfrecords and counter < max_saved_files: + save_to_tfrecord( + input_frame=frame, + input_states=input_states, + frame_index=frame_index, + predictions=predictions, + output_states=output_states, + groundtruth_label_id=groundtruth_label_id, + output_dataset_dir=output_dataset_dir, + file_index=counter) + yield {'image': frame, **input_states} + counter += 1 + + # update states for the next inference step + input_states = output_states + + +def get_tflite_converter( + saved_model_dir: str, + quantization_mode: str, + representative_dataset: Optional[Callable[..., Any]] = None +) -> tf.lite.TFLiteConverter: + """Gets tflite converter.""" + converter = tf.lite.TFLiteConverter.from_saved_model( + saved_model_dir=saved_model_dir) + converter.optimizations = [tf.lite.Optimize.DEFAULT] + + if quantization_mode == 'float16': + logging.info('Using float16 quantization.') + converter.target_spec.supported_types = [tf.float16] + + elif quantization_mode == 'int8': + logging.info('Using full integer quantization.') + converter.representative_dataset = representative_dataset + converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8] + converter.inference_input_type = tf.int8 + converter.inference_output_type = tf.int8 + + elif quantization_mode == 'int_float_fallback': + logging.info('Using integer quantization with floating-point fallback.') + converter.representative_dataset = representative_dataset + + else: + logging.info('Using dynamic range quantization.') + return converter + + +def quantize_movinet(dataset_fn): +
"""Quantizes Movinet.""" + valid_dataset = dataset_fn() + dataset_iter = iter(valid_dataset) + + # Load model + encoder = hub.KerasLayer(FLAGS.saved_model_with_states_dir, trainable=False) + inputs = tf.keras.layers.Input( + shape=[1, FLAGS.image_size, FLAGS.image_size, 3], + dtype=tf.float32, + name='image') + + # Define the state inputs, which is a dict that maps state names to tensors. + init_states_fn = encoder.resolved_object.signatures['init_states'] + state_shapes = { + name: ([s if s > 0 else None for s in state.shape], state.dtype) + for name, state in init_states_fn( + tf.constant([1, 1, FLAGS.image_size, FLAGS.image_size, 3])).items() + } + states_input = { + name: tf.keras.Input(shape[1:], dtype=dtype, name=name) + for name, (shape, dtype) in state_shapes.items() + } + + # The inputs to the model are the states and the video + inputs = {**states_input, 'image': inputs} + outputs = encoder(inputs) + model = tf.keras.Model(inputs, outputs, name='movinet_stream') + input_shape = tf.constant( + [1, FLAGS.num_frames, FLAGS.image_size, FLAGS.image_size, 3]) + init_states = init_states_fn(input_shape) + + # Configure the representative_dataset_fn. + representative_dataset = functools.partial( + stateful_representative_dataset_generator, + model=model, + dataset_iter=dataset_iter, + init_states=init_states, + save_dataset_to_tfrecords=FLAGS.save_dataset_to_tfrecords, + max_saved_files=FLAGS.max_saved_files, + output_dataset_dir=FLAGS.output_dataset_dir, + num_samples_per_video=FLAGS.num_samples_per_video, + num_calibration_videos=FLAGS.num_calibration_videos) + + converter = get_tflite_converter( + saved_model_dir=FLAGS.saved_model_dir, + quantization_mode=FLAGS.quantization_mode, + representative_dataset=representative_dataset) + + logging.info('Converting...') + tflite_buffer = converter.convert() + return tflite_buffer + + +def main(_): + tflite_buffer = quantize_movinet(dataset_fn=get_dataset) + + with open(FLAGS.output_tflite, 'wb') as f: + 
f.write(tflite_buffer) +
+ logging.info('tflite model written to %s', FLAGS.output_tflite) + +if __name__ == '__main__': + flags.mark_flag_as_required('saved_model_dir') + flags.mark_flag_as_required('saved_model_with_states_dir') + app.run(main) diff --git a/official/projects/movinet/train.py b/official/projects/movinet/train.py new file mode 100644 index 0000000000000000000000000000000000000000..ef42379ec7bc543f3910ed3ba4829ef14734cc5a --- /dev/null +++ b/official/projects/movinet/train.py @@ -0,0 +1,91 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Training driver. + +To train: + +CONFIG_FILE=official/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml +python3 official/projects/movinet/train.py \ + --experiment=movinet_kinetics600 \ + --mode=train \ + --model_dir=/tmp/movinet/ \ + --config_file=${CONFIG_FILE} \ + --params_override="" \ + --gin_file="" \ + --gin_params="" \ + --tpu="" \ + --tf_data_service="" +""" + +from absl import app +from absl import flags +import gin + +from official.common import distribute_utils +from official.common import flags as tfm_flags +from official.core import task_factory +from official.core import train_lib +from official.core import train_utils +from official.modeling import performance +# Import movinet libraries to register the backbone and model into tf.vision +# model garden factory. 
+# pylint: disable=unused-import +from official.projects.movinet.modeling import movinet +from official.projects.movinet.modeling import movinet_model +from official.vision import registry_imports +# pylint: enable=unused-import + +FLAGS = flags.FLAGS + + +def main(_): + gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) + params = train_utils.parse_configuration(FLAGS) + model_dir = FLAGS.model_dir + if 'train' in FLAGS.mode: + # Pure eval modes do not output yaml files. Otherwise continuous eval job + # may race against the train job for writing the same file. + train_utils.serialize_config(params, model_dir) + + if 'train_and_eval' in FLAGS.mode: + assert (params.task.train_data.feature_shape == + params.task.validation_data.feature_shape), ( + f'train {params.task.train_data.feature_shape} != validate ' + f'{params.task.validation_data.feature_shape}') + + # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16' + # can have significant impact on model speeds by utilizing float16 in case of + # GPUs, and bfloat16 in the case of TPUs. 
loss_scale takes effect only when + # dtype is float16 + if params.runtime.mixed_precision_dtype: + performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) + distribution_strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=params.runtime.distribution_strategy, + all_reduce_alg=params.runtime.all_reduce_alg, + num_gpus=params.runtime.num_gpus, + tpu_address=params.runtime.tpu) + with distribution_strategy.scope(): + task = task_factory.get_task(params.task, logging_dir=model_dir) + + train_lib.run_experiment( + distribution_strategy=distribution_strategy, + task=task, + mode=FLAGS.mode, + params=params, + model_dir=model_dir) + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(main) diff --git a/official/vision/beta/projects/movinet/train_test.py b/official/projects/movinet/train_test.py similarity index 93% rename from official/vision/beta/projects/movinet/train_test.py rename to official/projects/movinet/train_test.py index 5258c50cee9032bb082d3ed25388bcb5f48ef4e2..ad53802ac6593aca986ad40d4cd949f755a86a83 100644 --- a/official/vision/beta/projects/movinet/train_test.py +++ b/official/projects/movinet/train_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Tests for train.py.""" import json @@ -24,8 +23,8 @@ from absl import logging from absl.testing import flagsaver import tensorflow as tf -from official.vision.beta.dataloaders import tfexample_utils -from official.vision.beta.projects.movinet import train as train_lib +from official.projects.movinet import train as train_lib +from official.vision.dataloaders import tfexample_utils FLAGS = flags.FLAGS diff --git a/official/projects/mtop/README.md b/official/projects/mtop/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8efe09616216d8e72523c7df2a91120210b0a236 --- /dev/null +++ b/official/projects/mtop/README.md @@ -0,0 +1,11 @@ +# MTOP (All Birds with One Stone: Multi-task Text Classification for Efficient Inference with One Forward Pass) + +**Note:** This project is a work in progress; please stay tuned. + +MTOP is a multi-task text encoder method that makes predictions for all tasks +in a single forward pass. We propose prompt-based modules tailored to the +multi-task setting, a conditional pooler for flexible task representations, +and initialization from single-task models for effective knowledge transfer. +Our approach achieves superior performance on news tasks and the GLUE +benchmark. We also release a multi-task news dataset. + diff --git a/official/projects/nhnet/README.md b/official/projects/nhnet/README.md index f838d120fb8bcc419d5eaeb543675eb224cfddbd..88a7d1f9fe40765441b3aed44643c7f2b425cae9 100644 --- a/official/projects/nhnet/README.md +++ b/official/projects/nhnet/README.md @@ -36,7 +36,7 @@ will crawl and extract news articles on a local machine.
First, install the `news-please` CLI (requires python 3.x) ```shell -$ pip3 install news-please +$ pip3 install news-please==1.4.26 ``` Next, run the crawler with our provided [config and URL list](https://github.com/google-research-datasets/NewSHead/releases) diff --git a/official/projects/nhnet/__init__.py b/official/projects/nhnet/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/nhnet/__init__.py +++ b/official/projects/nhnet/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/nhnet/configs.py b/official/projects/nhnet/configs.py index 0f58dce8a9a1304aa00367cc759f6446cbb6a081..fa0a787f9a4a9e81fe0e8c737c9050c0391fd9a6 100644 --- a/official/projects/nhnet/configs.py +++ b/official/projects/nhnet/configs.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/nhnet/configs_test.py b/official/projects/nhnet/configs_test.py index 6ed2b24adbed457670fee85e36c0ad75bf9f9631..54678ddecf2703d1aef90d4c70944d7d0a0c6e57 100644 --- a/official/projects/nhnet/configs_test.py +++ b/official/projects/nhnet/configs_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/nhnet/decoder.py b/official/projects/nhnet/decoder.py index c937feac1003bc9b0d0ff00033d44245b6201786..dc1d8e3fd86351fe609ebd1449e83c75b0e28ca9 100644 --- a/official/projects/nhnet/decoder.py +++ b/official/projects/nhnet/decoder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/nhnet/decoder_test.py b/official/projects/nhnet/decoder_test.py index 4d70bbadf0a0b67fa6a997bc9181cebbac277ec1..1c0feb81abc7a300128931e1b2d52d4ff416400c 100644 --- a/official/projects/nhnet/decoder_test.py +++ b/official/projects/nhnet/decoder_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/nhnet/evaluation.py b/official/projects/nhnet/evaluation.py index 8435d2c0a24dab13da3a66a587235fc099c1a77e..c762aeb54897456159a92dd73343af4420207896 100644 --- a/official/projects/nhnet/evaluation.py +++ b/official/projects/nhnet/evaluation.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/nhnet/input_pipeline.py b/official/projects/nhnet/input_pipeline.py index d61ea688e2d9dc83083f5ddd1e1df109dc8e65d5..3bfe2bc511370ea4838c67faf58e23d87d56648d 100644 --- a/official/projects/nhnet/input_pipeline.py +++ b/official/projects/nhnet/input_pipeline.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/nhnet/models.py b/official/projects/nhnet/models.py index 96f2ab30c288e54eeef66bfbbba21de721cc109b..6832a96404e7abef73a9502c57c656850e6867f3 100644 --- a/official/projects/nhnet/models.py +++ b/official/projects/nhnet/models.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/nhnet/models_test.py b/official/projects/nhnet/models_test.py index ac4783722a3b49493e262a3f833fcf7c3e116bf4..3f487d08943c54d1cbc3fb5cbb0a31d0b6938543 100644 --- a/official/projects/nhnet/models_test.py +++ b/official/projects/nhnet/models_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/nhnet/optimizer.py b/official/projects/nhnet/optimizer.py index 03375c3b22134e566dd1ce28120a2897cf8a1b1d..85a9a79448d5325fdf272cde406af4441e54d09b 100644 --- a/official/projects/nhnet/optimizer.py +++ b/official/projects/nhnet/optimizer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. 
All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/nhnet/raw_data_process.py b/official/projects/nhnet/raw_data_process.py index c845b08c2b447fef8f03ba3fe505735bea8850fa..3f5d15eab10df7c7e99b8207132a21cff226d94a 100644 --- a/official/projects/nhnet/raw_data_process.py +++ b/official/projects/nhnet/raw_data_process.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/nhnet/raw_data_processor.py b/official/projects/nhnet/raw_data_processor.py index 73a00ba158cb2aa098516880ef6d18dd1ef2636e..1e3316e8be74cf229c7e817af23089abcea1ab03 100644 --- a/official/projects/nhnet/raw_data_processor.py +++ b/official/projects/nhnet/raw_data_processor.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -22,8 +22,8 @@ import urllib.parse import tensorflow as tf -from official.nlp.bert import tokenization from official.nlp.data import classifier_data_lib +from official.nlp.tools import tokenization class RawDataProcessor(object): diff --git a/official/projects/nhnet/trainer.py b/official/projects/nhnet/trainer.py index 183f05ef01e29b533a5ecbdd2b4075ee8f8df567..35d4eea637c39a80a34afbd3fe9172ba3b5a51f5 100644 --- a/official/projects/nhnet/trainer.py +++ b/official/projects/nhnet/trainer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -120,9 +120,13 @@ class Trainer(tf.keras.Model): tvars = self.trainable_variables grads = tape.gradient(scaled_loss, tvars) self.optimizer.apply_gradients(list(zip(grads, tvars))) + if isinstance(self.optimizer, tf.keras.optimizers.experimental.Optimizer): + learning_rate = self.optimizer.learning_rate + else: + learning_rate = self.optimizer._decayed_lr(var_dtype=tf.float32) return { "training_loss": loss, - "learning_rate": self.optimizer._decayed_lr(var_dtype=tf.float32) + "learning_rate": learning_rate, } diff --git a/official/projects/nhnet/trainer_test.py b/official/projects/nhnet/trainer_test.py index 6adddbbdba63385da56887fe8268065a1a08a2a0..886c8b4cf2a1773e03f577f089a4ab3dac056845 100644 --- a/official/projects/nhnet/trainer_test.py +++ b/official/projects/nhnet/trainer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/nhnet/utils.py b/official/projects/nhnet/utils.py index f23b2bef21cd171e432e03eff4e93077c478c076..23c3d571e70e4f1b21d461a56e0fa7bc28bac6e6 100644 --- a/official/projects/nhnet/utils.py +++ b/official/projects/nhnet/utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -18,8 +18,8 @@ from typing import Optional, Text from absl import logging import tensorflow as tf +from official.legacy.bert import configs from official.modeling.hyperparams import params_dict -from official.nlp.bert import configs from official.projects.nhnet import configs as nhnet_configs diff --git a/official/projects/panoptic/README.md b/official/projects/panoptic/README.md new file mode 100644 index 0000000000000000000000000000000000000000..ebe1c2a93ddd97c355730239a20f0ee76a8f716b --- /dev/null +++ b/official/projects/panoptic/README.md @@ -0,0 +1,114 @@ +# Panoptic Segmentation + +## Description + +Panoptic Segmentation combines two distinct vision tasks: semantic +segmentation and instance segmentation. These tasks are unified such that each +pixel in the image is assigned the label of the class it belongs to, and also +the instance identifier of the object it is a part of. + +## Environment setup +The code can be run on multiple GPUs or TPUs with different distribution +strategies. See the TensorFlow distributed training +[guide](https://www.tensorflow.org/guide/distributed_training) for an overview +of `tf.distribute`. + +The code is compatible with TensorFlow 2.6+. See requirements.txt for all +prerequisites.
+ +```bash +$ git clone https://github.com/tensorflow/models.git +$ cd models +$ pip3 install -r official/requirements.txt +$ export PYTHONPATH=$(pwd) +``` + +## Preparing Dataset +```bash +$ ./official/vision/beta/data/process_coco_panoptic.sh +``` + +## Launch Training +```bash +$ export MODEL_DIR="gs://" +$ export TPU_NAME="" +$ export ANNOTATION_FILE="gs://" +$ export TRAIN_DATA="gs://" +$ export EVAL_DATA="gs://" +$ export OVERRIDES="task.validation_data.input_path=${EVAL_DATA},\ +task.train_data.input_path=${TRAIN_DATA},\ +task.annotation_file=${ANNOTATION_FILE},\ +runtime.distribution_strategy=tpu" + + +$ python3 train.py \ + --experiment panoptic_fpn_coco \ + --config_file configs/experiments/r50fpn_1x_coco.yaml \ + --mode train \ + --model_dir $MODEL_DIR \ + --tpu $TPU_NAME \ + --params_override=$OVERRIDES +``` + +## Launch Evaluation +```bash +$ export MODEL_DIR="gs://" +$ export NUM_GPUS="" +$ export PRECISION="" +$ export ANNOTATION_FILE="gs://" +$ export TRAIN_DATA="gs://" +$ export EVAL_DATA="gs://" +$ export OVERRIDES="task.validation_data.input_path=${EVAL_DATA}, \ +task.train_data.input_path=${TRAIN_DATA}, \ +task.annotation_file=${ANNOTATION_FILE}, \ +runtime.distribution_strategy=mirrored, \ +runtime.mixed_precision_dtype=$PRECISION, \ +runtime.num_gpus=$NUM_GPUS" + + +$ python3 train.py \ + --experiment panoptic_fpn_coco \ + --config_file configs/experiments/r50fpn_1x_coco.yaml \ + --mode eval \ + --model_dir $MODEL_DIR \ + --params_override=$OVERRIDES +``` +**Note**: The [PanopticSegmentationGenerator](https://github.com/tensorflow/models/blob/ac7f9e7f2d0508913947242bad3e23ef7cae5a43/official/projects/panoptic/modeling/layers/panoptic_segmentation_generator.py#L22) layer uses dynamic shapes and hence generating panoptic masks is not supported on Cloud TPUs. Running evaluation on Cloud TPUs is not supported for the same reason. However, training is supported on both Cloud TPUs and GPUs. 
+## Pretrained Models +### Panoptic FPN +Backbone | Schedule | Experiment name | Box mAP | Mask mAP | Overall PQ | Things PQ | Stuff PQ | Checkpoints +:------------| :----------- | :---------------------------| ------- | ---------- | ---------- | --------- | -------- | ------------: +ResNet-50 | 1x | `panoptic_fpn_coco` | 38.19 | 34.25 | 39.14 | 45.42 | 29.65 | [ckpt](gs://tf_model_garden/vision/panoptic/panoptic_fpn/panoptic_fpn_1x) +ResNet-50 | 3x | `panoptic_fpn_coco` | 40.64 | 36.29 | 40.91 | 47.68 | 30.69 | [ckpt](gs://tf_model_garden/vision/panoptic/panoptic_fpn/panoptic_fpn_3x) + +**Note**: Here 1x schedule refers to ~12 epochs + +### Panoptic Deeplab +Backbone | Experiment name | Overall PQ | Things PQ | Stuff PQ | Checkpoints +:---------------------| :-------------------------------| ---------- | --------- | -------- | ------------: +Dilated ResNet-50 | `panoptic_deeplab_resnet_coco` | 36.80 | 37.51 | 35.73 | [ckpt](gs://tf_model_garden/vision/panoptic/panoptic_deeplab/coco/resnet50) +Dilated ResNet-101 | `panoptic_deeplab_resnet_coco` | 38.39 | 39.47 | 36.75 | [ckpt](gs://tf_model_garden/vision/panoptic/panoptic_deeplab/coco/resnet101) +MobileNetV3 Large | `panoptic_deeplab_mobilenetv3_large_coco` | 30.50 | 30.10 | 31.10 | [ckpt](gs://tf_model_garden/vision/panoptic/panoptic_deeplab/coco/mobilenetv3_large) +MobileNetV3 Small | `panoptic_deeplab_mobilenetv3_small_coco` | 25.06 | 23.46 | 27.48 | [ckpt](gs://tf_model_garden/vision/panoptic/panoptic_deeplab/coco/mobilenetv3_small) + + +___ +## Citation +``` +@misc{kirillov2019panoptic, + title={Panoptic Feature Pyramid Networks}, + author={Alexander Kirillov and Ross Girshick and Kaiming He and Piotr Dollár}, + year={2019}, + eprint={1901.02446}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} + +@article{Cheng2020PanopticDeepLabAS, + title={Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation}, + author={Bowen Cheng and Maxwell D. 
Collins and Yukun Zhu and Ting Liu and Thomas S. Huang and Hartwig Adam and Liang-Chieh Chen}, + journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, + year={2020}, + pages={12472-12482} +} +``` diff --git a/official/projects/panoptic/__init__.py b/official/projects/panoptic/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/panoptic/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/panoptic/configs/__init__.py b/official/projects/panoptic/configs/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/panoptic/configs/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/vision/beta/projects/panoptic_maskrcnn/configs/experiments/r50fpn_1x_coco.yaml b/official/projects/panoptic/configs/experiments/r50fpn_1x_coco.yaml similarity index 100% rename from official/vision/beta/projects/panoptic_maskrcnn/configs/experiments/r50fpn_1x_coco.yaml rename to official/projects/panoptic/configs/experiments/r50fpn_1x_coco.yaml diff --git a/official/vision/beta/projects/panoptic_maskrcnn/configs/experiments/r50fpn_3x_coco.yaml b/official/projects/panoptic/configs/experiments/r50fpn_3x_coco.yaml similarity index 100% rename from official/vision/beta/projects/panoptic_maskrcnn/configs/experiments/r50fpn_3x_coco.yaml rename to official/projects/panoptic/configs/experiments/r50fpn_3x_coco.yaml diff --git a/official/projects/panoptic/configs/panoptic_deeplab.py b/official/projects/panoptic/configs/panoptic_deeplab.py new file mode 100644 index 0000000000000000000000000000000000000000..b64fb62410030719d7501cccd18ce1051c28c4ed --- /dev/null +++ b/official/projects/panoptic/configs/panoptic_deeplab.py @@ -0,0 +1,670 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Panoptic Deeplab configuration definition.""" +import dataclasses +import os +from typing import List, Optional, Union + +import numpy as np + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import hyperparams +from official.modeling import optimization +from official.vision.configs import common +from official.vision.configs import decoders +from official.vision.configs import backbones + + +_COCO_INPUT_PATH_BASE = 'coco/tfrecords' +_COCO_TRAIN_EXAMPLES = 118287 +_COCO_VAL_EXAMPLES = 5000 + + +@dataclasses.dataclass +class Parser(hyperparams.Config): + """Panoptic deeplab parser.""" + ignore_label: int = 0 + # If resize_eval_groundtruth is set to False, original image sizes are used + # for eval. In that case, groundtruth_padded_size has to be specified too to + # allow for batching the variable input sizes of images. + resize_eval_groundtruth: bool = True + groundtruth_padded_size: List[int] = dataclasses.field(default_factory=list) + aug_scale_min: float = 1.0 + aug_scale_max: float = 1.0 + aug_rand_hflip: bool = True + aug_type: common.Augmentation = common.Augmentation() + sigma: float = 8.0 + small_instance_area_threshold: int = 4096 + small_instance_weight: float = 3.0 + dtype = 'float32' + + +@dataclasses.dataclass +class TfExampleDecoder(common.TfExampleDecoder): + """A simple TF Example decoder config.""" + panoptic_category_mask_key: str = 'image/panoptic/category_mask' + panoptic_instance_mask_key: str = 'image/panoptic/instance_mask' + + +@dataclasses.dataclass +class DataDecoder(common.DataDecoder): + """Data decoder config.""" + simple_decoder: TfExampleDecoder = TfExampleDecoder() + + +@dataclasses.dataclass +class DataConfig(cfg.DataConfig): + """Input config for training.""" + decoder: DataDecoder = DataDecoder() + parser: Parser = Parser() + input_path: str = '' + drop_remainder: bool = True + file_type: str = 'tfrecord' + is_training: bool = True + global_batch_size: int = 
1 + + +@dataclasses.dataclass +class PanopticDeeplabHead(hyperparams.Config): + """Panoptic Deeplab head config.""" + level: int = 3 + num_convs: int = 2 + num_filters: int = 256 + kernel_size: int = 5 + use_depthwise_convolution: bool = False + upsample_factor: int = 1 + low_level: List[int] = dataclasses.field(default_factory=lambda: [3, 2]) + low_level_num_filters: List[int] = dataclasses.field( + default_factory=lambda: [64, 32]) + fusion_num_output_filters: int = 256 + + +@dataclasses.dataclass +class SemanticHead(PanopticDeeplabHead): + """Semantic head config.""" + prediction_kernel_size: int = 1 + + +@dataclasses.dataclass +class InstanceHead(PanopticDeeplabHead): + """Instance head config.""" + prediction_kernel_size: int = 1 + + +@dataclasses.dataclass +class PanopticDeeplabPostProcessor(hyperparams.Config): + """Panoptic Deeplab PostProcessing config.""" + output_size: List[int] = dataclasses.field( + default_factory=list) + center_score_threshold: float = 0.1 + thing_class_ids: List[int] = dataclasses.field(default_factory=list) + label_divisor: int = 256 * 256 * 256 + stuff_area_limit: int = 4096 + ignore_label: int = 0 + nms_kernel: int = 7 + keep_k_centers: int = 200 + rescale_predictions: bool = True + + +@dataclasses.dataclass +class PanopticDeeplab(hyperparams.Config): + """Panoptic Deeplab model config.""" + num_classes: int = 2 + input_size: List[int] = dataclasses.field(default_factory=list) + min_level: int = 3 + max_level: int = 6 + norm_activation: common.NormActivation = common.NormActivation() + backbone: backbones.Backbone = backbones.Backbone( + type='resnet', resnet=backbones.ResNet()) + decoder: decoders.Decoder = decoders.Decoder(type='aspp') + semantic_head: SemanticHead = SemanticHead() + instance_head: InstanceHead = InstanceHead() + shared_decoder: bool = False + generate_panoptic_masks: bool = True + post_processor: PanopticDeeplabPostProcessor = PanopticDeeplabPostProcessor() + + +@dataclasses.dataclass +class 
Losses(hyperparams.Config): + label_smoothing: float = 0.0 + ignore_label: int = 0 + class_weights: List[float] = dataclasses.field(default_factory=list) + l2_weight_decay: float = 1e-4 + top_k_percent_pixels: float = 0.15 + segmentation_loss_weight: float = 1.0 + center_heatmap_loss_weight: float = 200 + center_offset_loss_weight: float = 0.01 + + +@dataclasses.dataclass +class Evaluation(hyperparams.Config): + """Evaluation config.""" + ignored_label: int = 0 + max_instances_per_category: int = 256 + offset: int = 256 * 256 * 256 + is_thing: List[float] = dataclasses.field( + default_factory=list) + rescale_predictions: bool = True + report_per_class_pq: bool = False + + report_per_class_iou: bool = False + report_train_mean_iou: bool = True # Turning this off can speed up training. + + +@dataclasses.dataclass +class PanopticDeeplabTask(cfg.TaskConfig): + """Panoptic deeplab task config.""" + model: PanopticDeeplab = PanopticDeeplab() + train_data: DataConfig = DataConfig(is_training=True) + validation_data: DataConfig = DataConfig( + is_training=False, + drop_remainder=False) + losses: Losses = Losses() + init_checkpoint: Optional[str] = None + init_checkpoint_modules: Union[ + str, List[str]] = 'all' # all, backbone, and/or decoder + evaluation: Evaluation = Evaluation() + + +@exp_factory.register_config_factory('panoptic_deeplab_resnet_coco') +def panoptic_deeplab_resnet_coco() -> cfg.ExperimentConfig: + """COCO panoptic segmentation with Panoptic Deeplab.""" + train_steps = 200000 + train_batch_size = 64 + eval_batch_size = 1 + steps_per_epoch = _COCO_TRAIN_EXAMPLES // train_batch_size + validation_steps = _COCO_VAL_EXAMPLES // eval_batch_size + + num_panoptic_categories = 201 + num_thing_categories = 91 + ignore_label = 0 + + is_thing = [False] + for idx in range(1, num_panoptic_categories): + is_thing.append(True if idx <= num_thing_categories else False) + + input_size = [640, 640, 3] + output_stride = 16 + aspp_dilation_rates = [6, 12, 18] + multigrid = 
[1, 2, 4] + stem_type = 'v1' + level = int(np.log2(output_stride)) + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig( + mixed_precision_dtype='bfloat16', enable_xla=True), + task=PanopticDeeplabTask( + init_checkpoint='gs://tf_model_garden/vision/panoptic/panoptic_deeplab/imagenet/resnet50_v1/ckpt-436800', # pylint: disable=line-too-long + init_checkpoint_modules=['backbone'], + model=PanopticDeeplab( + num_classes=num_panoptic_categories, + input_size=input_size, + backbone=backbones.Backbone( + type='dilated_resnet', dilated_resnet=backbones.DilatedResNet( + model_id=50, + stem_type=stem_type, + output_stride=output_stride, + multigrid=multigrid, + se_ratio=0.25, + last_stage_repeats=1, + stochastic_depth_drop_rate=0.2)), + decoder=decoders.Decoder( + type='aspp', + aspp=decoders.ASPP( + level=level, + num_filters=256, + pool_kernel_size=input_size[:2], + dilation_rates=aspp_dilation_rates, + use_depthwise_convolution=True, + dropout_rate=0.1)), + semantic_head=SemanticHead( + level=level, + num_convs=1, + num_filters=256, + kernel_size=5, + use_depthwise_convolution=True, + upsample_factor=1, + low_level=[3, 2], + low_level_num_filters=[64, 32], + fusion_num_output_filters=256, + prediction_kernel_size=1), + instance_head=InstanceHead( + level=level, + num_convs=1, + num_filters=32, + kernel_size=5, + use_depthwise_convolution=True, + upsample_factor=1, + low_level=[3, 2], + low_level_num_filters=[32, 16], + fusion_num_output_filters=128, + prediction_kernel_size=1), + shared_decoder=False, + generate_panoptic_masks=True, + post_processor=PanopticDeeplabPostProcessor( + output_size=input_size[:2], + center_score_threshold=0.1, + thing_class_ids=list(range(1, num_thing_categories)), + label_divisor=256, + stuff_area_limit=4096, + ignore_label=ignore_label, + nms_kernel=41, + keep_k_centers=200, + rescale_predictions=True)), + losses=Losses( + label_smoothing=0.0, + ignore_label=ignore_label, + l2_weight_decay=0.0, + top_k_percent_pixels=0.2, +
segmentation_loss_weight=1.0, + center_heatmap_loss_weight=200, + center_offset_loss_weight=0.01), + train_data=DataConfig( + input_path=os.path.join(_COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + parser=Parser( + aug_scale_min=0.5, + aug_scale_max=1.5, + aug_rand_hflip=True, + aug_type=common.Augmentation( + type='autoaug', + autoaug=common.AutoAugment( + augmentation_name='panoptic_deeplab_policy')), + sigma=8.0, + small_instance_area_threshold=4096, + small_instance_weight=3.0)), + validation_data=DataConfig( + input_path=os.path.join(_COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + parser=Parser( + resize_eval_groundtruth=False, + groundtruth_padded_size=[640, 640], + aug_scale_min=1.0, + aug_scale_max=1.0, + aug_rand_hflip=False, + aug_type=None, + sigma=8.0, + small_instance_area_threshold=4096, + small_instance_weight=3.0), + drop_remainder=False), + evaluation=Evaluation( + ignored_label=ignore_label, + max_instances_per_category=256, + offset=256*256*256, + is_thing=is_thing, + rescale_predictions=True, + report_per_class_pq=False, + report_per_class_iou=False, + report_train_mean_iou=False)), + trainer=cfg.TrainerConfig( + train_steps=train_steps, + validation_steps=validation_steps, + validation_interval=steps_per_epoch, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'adam', + }, + 'learning_rate': { + 'type': 'polynomial', + 'polynomial': { + 'initial_learning_rate': 0.0005, + 'decay_steps': train_steps, + 'end_learning_rate': 0.0, + 'power': 0.9 + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 2000, + 'warmup_learning_rate': 0 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + return config + + 
+@exp_factory.register_config_factory('panoptic_deeplab_mobilenetv3_large_coco') +def panoptic_deeplab_mobilenetv3_large_coco() -> cfg.ExperimentConfig: + """COCO panoptic segmentation with Panoptic Deeplab.""" + train_steps = 200000 + train_batch_size = 64 + eval_batch_size = 1 + steps_per_epoch = _COCO_TRAIN_EXAMPLES // train_batch_size + validation_steps = _COCO_VAL_EXAMPLES // eval_batch_size + + num_panoptic_categories = 201 + num_thing_categories = 91 + ignore_label = 0 + + is_thing = [False] + for idx in range(1, num_panoptic_categories): + is_thing.append(True if idx <= num_thing_categories else False) + + input_size = [640, 640, 3] + output_stride = 16 + aspp_dilation_rates = [6, 12, 18] + level = int(np.log2(output_stride)) + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig( + mixed_precision_dtype='float32', enable_xla=True), + task=PanopticDeeplabTask( + init_checkpoint='gs://tf_model_garden/vision/panoptic/panoptic_deeplab/imagenet/mobilenetv3_large/ckpt-156000', + init_checkpoint_modules=['backbone'], + model=PanopticDeeplab( + num_classes=num_panoptic_categories, + input_size=input_size, + backbone=backbones.Backbone( + type='mobilenet', mobilenet=backbones.MobileNet( + model_id='MobileNetV3Large', + filter_size_scale=1.0, + stochastic_depth_drop_rate=0.0, + output_stride=output_stride)), + decoder=decoders.Decoder( + type='aspp', + aspp=decoders.ASPP( + level=level, + num_filters=256, + pool_kernel_size=input_size[:2], + dilation_rates=aspp_dilation_rates, + use_depthwise_convolution=True, + dropout_rate=0.1)), + semantic_head=SemanticHead( + level=level, + num_convs=1, + num_filters=256, + kernel_size=5, + use_depthwise_convolution=True, + upsample_factor=1, + low_level=[3, 2], + low_level_num_filters=[64, 32], + fusion_num_output_filters=256, + prediction_kernel_size=1), + instance_head=InstanceHead( + level=level, + num_convs=1, + num_filters=32, + kernel_size=5, + use_depthwise_convolution=True, + upsample_factor=1, +
low_level=[3, 2], + low_level_num_filters=[32, 16], + fusion_num_output_filters=128, + prediction_kernel_size=1), + shared_decoder=False, + generate_panoptic_masks=True, + post_processor=PanopticDeeplabPostProcessor( + output_size=input_size[:2], + center_score_threshold=0.1, + thing_class_ids=list(range(1, num_thing_categories)), + label_divisor=256, + stuff_area_limit=4096, + ignore_label=ignore_label, + nms_kernel=41, + keep_k_centers=200, + rescale_predictions=True)), + losses=Losses( + label_smoothing=0.0, + ignore_label=ignore_label, + l2_weight_decay=0.0, + top_k_percent_pixels=0.2, + segmentation_loss_weight=1.0, + center_heatmap_loss_weight=200, + center_offset_loss_weight=0.01), + train_data=DataConfig( + input_path=os.path.join(_COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + parser=Parser( + aug_scale_min=0.5, + aug_scale_max=2.0, + aug_rand_hflip=True, + aug_type=common.Augmentation( + type='autoaug', + autoaug=common.AutoAugment( + augmentation_name='panoptic_deeplab_policy')), + sigma=8.0, + small_instance_area_threshold=4096, + small_instance_weight=3.0)), + validation_data=DataConfig( + input_path=os.path.join(_COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + parser=Parser( + resize_eval_groundtruth=False, + groundtruth_padded_size=[640, 640], + aug_scale_min=1.0, + aug_scale_max=1.0, + aug_rand_hflip=False, + aug_type=None, + sigma=8.0, + small_instance_area_threshold=4096, + small_instance_weight=3.0), + drop_remainder=False), + evaluation=Evaluation( + ignored_label=ignore_label, + max_instances_per_category=256, + offset=256*256*256, + is_thing=is_thing, + rescale_predictions=True, + report_per_class_pq=False, + report_per_class_iou=False, + report_train_mean_iou=False)), + trainer=cfg.TrainerConfig( + train_steps=train_steps, + validation_steps=validation_steps, + validation_interval=steps_per_epoch, + steps_per_loop=steps_per_epoch, + 
summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'adam', + }, + 'learning_rate': { + 'type': 'polynomial', + 'polynomial': { + 'initial_learning_rate': 0.001, + 'decay_steps': train_steps, + 'end_learning_rate': 0.0, + 'power': 0.9 + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 2000, + 'warmup_learning_rate': 0 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + return config + + +@exp_factory.register_config_factory('panoptic_deeplab_mobilenetv3_small_coco') +def panoptic_deeplab_mobilenetv3_small_coco() -> cfg.ExperimentConfig: + """COCO panoptic segmentation with Panoptic Deeplab.""" + train_steps = 200000 + train_batch_size = 64 + eval_batch_size = 1 + steps_per_epoch = _COCO_TRAIN_EXAMPLES // train_batch_size + validation_steps = _COCO_VAL_EXAMPLES // eval_batch_size + + num_panoptic_categories = 201 + num_thing_categories = 91 + ignore_label = 0 + + is_thing = [False] + for idx in range(1, num_panoptic_categories): + is_thing.append(True if idx <= num_thing_categories else False) + + input_size = [640, 640, 3] + output_stride = 16 + aspp_dilation_rates = [6, 12, 18] + level = int(np.log2(output_stride)) + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig( + mixed_precision_dtype='float32', enable_xla=True), + task=PanopticDeeplabTask( + init_checkpoint='gs://tf_model_garden/vision/panoptic/panoptic_deeplab/imagenet/mobilenetv3_small/ckpt-312000', + init_checkpoint_modules=['backbone'], + model=PanopticDeeplab( + num_classes=num_panoptic_categories, + input_size=input_size, + backbone=backbones.Backbone( + type='mobilenet', mobilenet=backbones.MobileNet( + model_id='MobileNetV3Small', + filter_size_scale=1.0, + stochastic_depth_drop_rate=0.0, + output_stride=output_stride)), + decoder=decoders.Decoder( + type='aspp', + aspp=decoders.ASPP( +
level=level, + num_filters=256, + pool_kernel_size=input_size[:2], + dilation_rates=aspp_dilation_rates, + use_depthwise_convolution=True, + dropout_rate=0.1)), + semantic_head=SemanticHead( + level=level, + num_convs=1, + num_filters=256, + kernel_size=5, + use_depthwise_convolution=True, + upsample_factor=1, + low_level=[3, 2], + low_level_num_filters=[64, 32], + fusion_num_output_filters=256, + prediction_kernel_size=1), + instance_head=InstanceHead( + level=level, + num_convs=1, + num_filters=32, + kernel_size=5, + use_depthwise_convolution=True, + upsample_factor=1, + low_level=[3, 2], + low_level_num_filters=[32, 16], + fusion_num_output_filters=128, + prediction_kernel_size=1), + shared_decoder=False, + generate_panoptic_masks=True, + post_processor=PanopticDeeplabPostProcessor( + output_size=input_size[:2], + center_score_threshold=0.1, + thing_class_ids=list(range(1, num_thing_categories)), + label_divisor=256, + stuff_area_limit=4096, + ignore_label=ignore_label, + nms_kernel=41, + keep_k_centers=200, + rescale_predictions=True)), + losses=Losses( + label_smoothing=0.0, + ignore_label=ignore_label, + l2_weight_decay=0.0, + top_k_percent_pixels=0.2, + segmentation_loss_weight=1.0, + center_heatmap_loss_weight=200, + center_offset_loss_weight=0.01), + train_data=DataConfig( + input_path=os.path.join(_COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + parser=Parser( + aug_scale_min=0.5, + aug_scale_max=2.0, + aug_rand_hflip=True, + aug_type=common.Augmentation( + type='autoaug', + autoaug=common.AutoAugment( + augmentation_name='panoptic_deeplab_policy')), + sigma=8.0, + small_instance_area_threshold=4096, + small_instance_weight=3.0)), + validation_data=DataConfig( + input_path=os.path.join(_COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + parser=Parser( + resize_eval_groundtruth=False, + groundtruth_padded_size=[640, 640], + aug_scale_min=1.0, + aug_scale_max=1.0, + 
aug_rand_hflip=False, + aug_type=None, + sigma=8.0, + small_instance_area_threshold=4096, + small_instance_weight=3.0), + drop_remainder=False), + evaluation=Evaluation( + ignored_label=ignore_label, + max_instances_per_category=256, + offset=256*256*256, + is_thing=is_thing, + rescale_predictions=True, + report_per_class_pq=False, + report_per_class_iou=False, + report_train_mean_iou=False)), + trainer=cfg.TrainerConfig( + train_steps=train_steps, + validation_steps=validation_steps, + validation_interval=steps_per_epoch, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'adam', + }, + 'learning_rate': { + 'type': 'polynomial', + 'polynomial': { + 'initial_learning_rate': 0.001, + 'decay_steps': train_steps, + 'end_learning_rate': 0.0, + 'power': 0.9 + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 2000, + 'warmup_learning_rate': 0 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + return config diff --git a/official/projects/panoptic/configs/panoptic_maskrcnn.py b/official/projects/panoptic/configs/panoptic_maskrcnn.py new file mode 100644 index 0000000000000000000000000000000000000000..a27bcb7755f2bd34d469ee73b4400cc7324fce3f --- /dev/null +++ b/official/projects/panoptic/configs/panoptic_maskrcnn.py @@ -0,0 +1,259 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""Panoptic Mask R-CNN configuration definition.""" + +import dataclasses +import os +from typing import List, Optional + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import hyperparams +from official.modeling import optimization +from official.projects.deepmac_maskrcnn.configs import deep_mask_head_rcnn as deepmac_maskrcnn +from official.vision.configs import common +from official.vision.configs import maskrcnn +from official.vision.configs import semantic_segmentation + + +SEGMENTATION_MODEL = semantic_segmentation.SemanticSegmentationModel +SEGMENTATION_HEAD = semantic_segmentation.SegmentationHead + +_COCO_INPUT_PATH_BASE = 'coco/tfrecords' +_COCO_TRAIN_EXAMPLES = 118287 +_COCO_VAL_EXAMPLES = 5000 + +# pytype: disable=wrong-keyword-args + + +@dataclasses.dataclass +class Parser(maskrcnn.Parser): + """Panoptic Mask R-CNN parser config.""" + # If segmentation_resize_eval_groundtruth is set to False, original image + # sizes are used for eval. In that case, + # segmentation_groundtruth_padded_size has to be specified too to allow for + # batching the variable input sizes of images. + segmentation_resize_eval_groundtruth: bool = True + segmentation_groundtruth_padded_size: List[int] = dataclasses.field( + default_factory=list) + segmentation_ignore_label: int = 255 + panoptic_ignore_label: int = 0 + # Setting this to true will enable parsing category_mask and instance_mask. + include_panoptic_masks: bool = True + + +@dataclasses.dataclass +class TfExampleDecoder(common.TfExampleDecoder): + """A simple TF Example decoder config.""" + # Setting this to true will enable decoding category_mask and instance_mask. 
+ include_panoptic_masks: bool = True + panoptic_category_mask_key: str = 'image/panoptic/category_mask' + panoptic_instance_mask_key: str = 'image/panoptic/instance_mask' + + +@dataclasses.dataclass +class DataDecoder(common.DataDecoder): + """Data decoder config.""" + simple_decoder: TfExampleDecoder = TfExampleDecoder() + + +@dataclasses.dataclass +class DataConfig(maskrcnn.DataConfig): + """Input config for training.""" + decoder: DataDecoder = DataDecoder() + parser: Parser = Parser() + + +@dataclasses.dataclass +class PanopticSegmentationGenerator(hyperparams.Config): + """Panoptic segmentation generator config.""" + output_size: List[int] = dataclasses.field( + default_factory=list) + mask_binarize_threshold: float = 0.5 + score_threshold: float = 0.5 + things_overlap_threshold: float = 0.5 + stuff_area_threshold: float = 4096.0 + things_class_label: int = 1 + void_class_label: int = 0 + void_instance_id: int = 0 + rescale_predictions: bool = False + + +@dataclasses.dataclass +class PanopticMaskRCNN(deepmac_maskrcnn.DeepMaskHeadRCNN): + """Panoptic Mask R-CNN model config.""" + segmentation_model: semantic_segmentation.SemanticSegmentationModel = ( + SEGMENTATION_MODEL(num_classes=2)) + include_mask = True + shared_backbone: bool = True + shared_decoder: bool = True + stuff_classes_offset: int = 0 + generate_panoptic_masks: bool = True + panoptic_segmentation_generator: PanopticSegmentationGenerator = PanopticSegmentationGenerator() # pylint:disable=line-too-long + + +@dataclasses.dataclass +class Losses(maskrcnn.Losses): + """Panoptic Mask R-CNN loss config.""" + semantic_segmentation_label_smoothing: float = 0.0 + semantic_segmentation_ignore_label: int = 255 + semantic_segmentation_gt_is_matting_map: bool = False + semantic_segmentation_class_weights: List[float] = dataclasses.field( + default_factory=list) + semantic_segmentation_use_groundtruth_dimension: bool = True + semantic_segmentation_top_k_percent_pixels: float = 1.0 + 
instance_segmentation_weight: float = 1.0 + semantic_segmentation_weight: float = 0.5 + + +@dataclasses.dataclass +class PanopticQualityEvaluator(hyperparams.Config): + """Panoptic Quality Evaluator config.""" + num_categories: int = 2 + ignored_label: int = 0 + max_instances_per_category: int = 256 + offset: int = 256 * 256 * 256 + is_thing: List[float] = dataclasses.field( + default_factory=list) + rescale_predictions: bool = False + report_per_class_metrics: bool = False + + +@dataclasses.dataclass +class PanopticMaskRCNNTask(maskrcnn.MaskRCNNTask): + """Panoptic Mask R-CNN task config.""" + model: PanopticMaskRCNN = PanopticMaskRCNN() + train_data: DataConfig = DataConfig(is_training=True) + validation_data: DataConfig = DataConfig(is_training=False, + drop_remainder=False) + segmentation_evaluation: semantic_segmentation.Evaluation = semantic_segmentation.Evaluation() # pylint: disable=line-too-long + losses: Losses = Losses() + init_checkpoint: Optional[str] = None + segmentation_init_checkpoint: Optional[str] = None + + # 'init_checkpoint_modules' controls the modules that need to be initialized + # from checkpoint paths given by 'init_checkpoint' and/or + # 'segmentation_init_checkpoint'.
Supports modules: + # 'backbone': Initialize MaskRCNN backbone + # 'segmentation_backbone': Initialize segmentation backbone + # 'segmentation_decoder': Initialize segmentation decoder + # 'all': Initialize all modules + init_checkpoint_modules: Optional[List[str]] = dataclasses.field( + default_factory=list) + panoptic_quality_evaluator: PanopticQualityEvaluator = PanopticQualityEvaluator() # pylint: disable=line-too-long + + +@exp_factory.register_config_factory('panoptic_fpn_coco') +def panoptic_fpn_coco() -> cfg.ExperimentConfig: + """COCO panoptic segmentation with Panoptic Mask R-CNN.""" + train_batch_size = 64 + eval_batch_size = 8 + steps_per_epoch = _COCO_TRAIN_EXAMPLES // train_batch_size + validation_steps = _COCO_VAL_EXAMPLES // eval_batch_size + + # coco panoptic dataset has category ids ranging from [0-200] inclusive. + # 0 is not used and represents the background class + # ids 1-91 represent thing categories (91) + # ids 92-200 represent stuff categories (109) + # for the segmentation task, we continue using id=0 for the background + # and map all thing categories to id=1, the remaining 109 stuff categories + # are shifted by an offset=90 given by num_thing classes - 1. 
This shifting + # will make all the stuff categories begin from id=2 and end at id=110 + num_panoptic_categories = 201 + num_thing_categories = 91 + num_semantic_segmentation_classes = 111 + + is_thing = [False] + for idx in range(1, num_panoptic_categories): + is_thing.append(True if idx <= num_thing_categories else False) + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig( + mixed_precision_dtype='float32', enable_xla=True), + task=PanopticMaskRCNNTask( + init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080', # pylint: disable=line-too-long + init_checkpoint_modules=['backbone'], + model=PanopticMaskRCNN( + num_classes=91, input_size=[1024, 1024, 3], + panoptic_segmentation_generator=PanopticSegmentationGenerator( + output_size=[640, 640], rescale_predictions=True), + stuff_classes_offset=90, + segmentation_model=SEGMENTATION_MODEL( + num_classes=num_semantic_segmentation_classes, + head=SEGMENTATION_HEAD( + level=2, + num_convs=0, + num_filters=128, + decoder_min_level=2, + decoder_max_level=6, + feature_fusion='panoptic_fpn_fusion'))), + losses=Losses(l2_weight_decay=0.00004), + train_data=DataConfig( + input_path=os.path.join(_COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + parser=Parser( + aug_rand_hflip=True, aug_scale_min=0.8, aug_scale_max=1.25)), + validation_data=DataConfig( + input_path=os.path.join(_COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + parser=Parser( + segmentation_resize_eval_groundtruth=False, + segmentation_groundtruth_padded_size=[640, 640]), + drop_remainder=False), + annotation_file=os.path.join(_COCO_INPUT_PATH_BASE, + 'instances_val2017.json'), + segmentation_evaluation=semantic_segmentation.Evaluation( + report_per_class_iou=False, report_train_mean_iou=False), + panoptic_quality_evaluator=PanopticQualityEvaluator( + num_categories=num_panoptic_categories, + ignored_label=0, + is_thing=is_thing, + 
rescale_predictions=True)), + trainer=cfg.TrainerConfig( + train_steps=22500, + validation_steps=validation_steps, + validation_interval=steps_per_epoch, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [15000, 20000], + 'values': [0.12, 0.012, 0.0012], + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 500, + 'warmup_learning_rate': 0.0067 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + return config diff --git a/official/projects/panoptic/dataloaders/panoptic_deeplab_input.py b/official/projects/panoptic/dataloaders/panoptic_deeplab_input.py new file mode 100644 index 0000000000000000000000000000000000000000..6729c9e0d28692debca454b0e96552986341fdce --- /dev/null +++ b/official/projects/panoptic/dataloaders/panoptic_deeplab_input.py @@ -0,0 +1,359 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Data parser and processing for Panoptic Deeplab.""" + +from typing import List, Optional + +import numpy as np +import tensorflow as tf + +from official.vision.configs import common +from official.vision.dataloaders import parser +from official.vision.dataloaders import tf_example_decoder +from official.vision.ops import augment +from official.vision.ops import preprocess_ops + + +def _compute_gaussian_from_std(sigma): + """Computes the Gaussian and its size from a given standard deviation.""" + size = int(6 * sigma + 3) + x = np.arange(size, dtype=np.float64) + y = x[:, np.newaxis] + x0, y0 = 3 * sigma + 1, 3 * sigma + 1 + gaussian = tf.constant( + np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * sigma**2)), + dtype=tf.float32) + return gaussian, size + + +class TfExampleDecoder(tf_example_decoder.TfExampleDecoder): + """Tensorflow Example proto decoder.""" + + def __init__( + self, + regenerate_source_id: bool, + panoptic_category_mask_key: str = 'image/panoptic/category_mask', + panoptic_instance_mask_key: str = 'image/panoptic/instance_mask'): + super(TfExampleDecoder, + self).__init__( + include_mask=True, + regenerate_source_id=regenerate_source_id) + self._panoptic_category_mask_key = panoptic_category_mask_key + self._panoptic_instance_mask_key = panoptic_instance_mask_key + + self._panoptic_keys_to_features = { + panoptic_category_mask_key: + tf.io.FixedLenFeature((), tf.string, default_value=''), + panoptic_instance_mask_key: + tf.io.FixedLenFeature((), tf.string, default_value='') + } + + def decode(self, serialized_example): + decoded_tensors = super(TfExampleDecoder, + self).decode(serialized_example) + parsed_tensors = tf.io.parse_single_example( + serialized_example, self._panoptic_keys_to_features) + + category_mask = tf.io.decode_image( + parsed_tensors[self._panoptic_category_mask_key], channels=1) + instance_mask = tf.io.decode_image( + parsed_tensors[self._panoptic_instance_mask_key], channels=1) + category_mask.set_shape([None, None, 1]) +
instance_mask.set_shape([None, None, 1]) + + decoded_tensors.update({ + 'groundtruth_panoptic_category_mask': category_mask, + 'groundtruth_panoptic_instance_mask': instance_mask + }) + return decoded_tensors + + +class Parser(parser.Parser): + """Parser to parse an image and its annotations into a dictionary of tensors.""" + + def __init__( + self, + output_size: List[int], + resize_eval_groundtruth: bool = True, + groundtruth_padded_size: Optional[List[int]] = None, + ignore_label: int = 0, + aug_rand_hflip: bool = False, + aug_scale_min: float = 1.0, + aug_scale_max: float = 1.0, + aug_type: Optional[common.Augmentation] = None, + sigma: float = 8.0, + small_instance_area_threshold: int = 4096, + small_instance_weight: float = 3.0, + dtype: str = 'float32'): + """Initializes parameters for parsing annotations in the dataset. + + Args: + output_size: `Tensor` or `list` for [height, width] of output image. The + output_size should be divisible by the largest feature stride 2^max_level. + resize_eval_groundtruth: `bool`, if True, eval groundtruth masks are + resized to output_size. + groundtruth_padded_size: `Tensor` or `list` for [height, width]. When + resize_eval_groundtruth is set to False, the groundtruth masks are + padded to this size. + ignore_label: `int`, pixels with this label are not used for training + or evaluation. + aug_rand_hflip: `bool`, if True, augment training with random + horizontal flip. + aug_scale_min: `float`, the minimum scale applied to `output_size` for + data augmentation during training. + aug_scale_max: `float`, the maximum scale applied to `output_size` for + data augmentation during training. + aug_type: An optional Augmentation object with params for AutoAugment. + sigma: `float`, standard deviation for generating 2D Gaussian to encode + centers. + small_instance_area_threshold: `int`, small instance area threshold. + small_instance_weight: `float`, small instance weight. + dtype: `str`, data type.
One of {`bfloat16`, `float32`, `float16`}. + """ + self._output_size = output_size + self._resize_eval_groundtruth = resize_eval_groundtruth + if (not resize_eval_groundtruth) and (groundtruth_padded_size is None): + raise ValueError( + 'groundtruth_padded_size ([height, width]) needs to be ' + 'specified when resize_eval_groundtruth is False.') + self._groundtruth_padded_size = groundtruth_padded_size + self._ignore_label = ignore_label + + # Data augmentation. + self._aug_rand_hflip = aug_rand_hflip + self._aug_scale_min = aug_scale_min + self._aug_scale_max = aug_scale_max + + if aug_type and aug_type.type: + if aug_type.type == 'autoaug': + self._augmenter = augment.AutoAugment( + augmentation_name=aug_type.autoaug.augmentation_name, + cutout_const=aug_type.autoaug.cutout_const, + translate_const=aug_type.autoaug.translate_const) + else: + raise ValueError('Augmentation policy {} not supported.'.format( + aug_type.type)) + else: + self._augmenter = None + + self._dtype = dtype + + self._sigma = sigma + self._gaussian, self._gaussian_size = _compute_gaussian_from_std( + self._sigma) + self._gaussian = tf.reshape(self._gaussian, shape=[-1]) + self._small_instance_area_threshold = small_instance_area_threshold + self._small_instance_weight = small_instance_weight + + def _resize_and_crop_mask(self, mask, image_info, is_training): + """Resizes and crops mask using `image_info` dict.""" + height = image_info[0][0] + width = image_info[0][1] + mask = tf.reshape(mask, shape=[1, height, width, 1]) + mask += 1 + + if is_training or self._resize_eval_groundtruth: + image_scale = image_info[2, :] + offset = image_info[3, :] + mask = preprocess_ops.resize_and_crop_masks( + mask, + image_scale, + self._output_size, + offset) + else: + mask = tf.image.pad_to_bounding_box( + mask, 0, 0, + self._groundtruth_padded_size[0], + self._groundtruth_padded_size[1]) + mask -= 1 + + # Assign ignore label to the padded region.
+ mask = tf.where( + tf.equal(mask, -1), + self._ignore_label * tf.ones_like(mask), + mask) + mask = tf.squeeze(mask, axis=0) + return mask + + def _parse_data(self, data, is_training): + image = data['image'] + + if self._augmenter is not None and is_training: + image = self._augmenter.distort(image) + + image = preprocess_ops.normalize_image(image) + + category_mask = tf.cast( + data['groundtruth_panoptic_category_mask'][:, :, 0], + dtype=tf.float32) + instance_mask = tf.cast( + data['groundtruth_panoptic_instance_mask'][:, :, 0], + dtype=tf.float32) + + # Flips image randomly during training. + if self._aug_rand_hflip and is_training: + masks = tf.stack([category_mask, instance_mask], axis=0) + image, _, masks = preprocess_ops.random_horizontal_flip( + image=image, masks=masks) + category_mask = masks[0] + instance_mask = masks[1] + + # Resizes and crops image. + image, image_info = preprocess_ops.resize_and_crop_image( + image, + self._output_size, + self._output_size, + aug_scale_min=self._aug_scale_min if is_training else 1.0, + aug_scale_max=self._aug_scale_max if is_training else 1.0) + + category_mask = self._resize_and_crop_mask( + category_mask, + image_info, + is_training=is_training) + instance_mask = self._resize_and_crop_mask( + instance_mask, + image_info, + is_training=is_training) + + (instance_centers_heatmap, + instance_centers_offset, + semantic_weights) = self._encode_centers_and_offets( + instance_mask=instance_mask[:, :, 0]) + + # Cast image and labels as self._dtype + image = tf.cast(image, dtype=self._dtype) + category_mask = tf.cast(category_mask, dtype=self._dtype) + instance_mask = tf.cast(instance_mask, dtype=self._dtype) + instance_centers_heatmap = tf.cast( + instance_centers_heatmap, dtype=self._dtype) + instance_centers_offset = tf.cast( + instance_centers_offset, dtype=self._dtype) + + valid_mask = tf.not_equal( + category_mask, self._ignore_label) + things_mask = tf.not_equal( + instance_mask, self._ignore_label) + + labels = { + 
'category_mask': category_mask, + 'instance_mask': instance_mask, + 'instance_centers_heatmap': instance_centers_heatmap, + 'instance_centers_offset': instance_centers_offset, + 'semantic_weights': semantic_weights, + 'valid_mask': valid_mask, + 'things_mask': things_mask, + 'image_info': image_info + } + return image, labels + + def _parse_train_data(self, data): + """Parses data for training.""" + return self._parse_data(data=data, is_training=True) + + def _parse_eval_data(self, data): + """Parses data for evaluation.""" + return self._parse_data(data=data, is_training=False) + + def _encode_centers_and_offets(self, instance_mask): + """Generates center heatmaps and offsets from instance id mask. + + Args: + instance_mask: `tf.Tensor` of shape [height, width] representing + groundtruth instance id mask. + Returns: + instance_centers_heatmap: `tf.Tensor` of shape [height, width, 1] + instance_centers_offset: `tf.Tensor` of shape [height, width, 2] + """ + shape = tf.shape(instance_mask) + height, width = shape[0], shape[1] + + padding_start = int(3 * self._sigma + 1) + padding_end = int(3 * self._sigma + 2) + + # padding should be equal to self._gaussian_size which is calculated + # as size = int(6 * sigma + 3) + padding = padding_start + padding_end + + instance_centers_heatmap = tf.zeros( + shape=[height + padding, width + padding], + dtype=tf.float32) + centers_offset_y = tf.zeros( + shape=[height, width], + dtype=tf.float32) + centers_offset_x = tf.zeros( + shape=[height, width], + dtype=tf.float32) + semantic_weights = tf.ones( + shape=[height, width], + dtype=tf.float32) + + unique_instance_ids, _ = tf.unique(tf.reshape(instance_mask, [-1])) + + # The following method for encoding center heatmaps and offsets is inspired + # by the reference implementation available at + # https://github.com/google-research/deeplab2/blob/main/data/sample_generator.py # pylint: disable=line-too-long + for instance_id in unique_instance_ids: + if instance_id ==
self._ignore_label: + continue + + mask = tf.equal(instance_mask, instance_id) + mask_area = tf.reduce_sum(tf.cast(mask, dtype=tf.float32)) + mask_indices = tf.cast(tf.where(mask), dtype=tf.float32) + mask_center = tf.reduce_mean(mask_indices, axis=0) + mask_center_y = tf.cast(tf.round(mask_center[0]), dtype=tf.int32) + mask_center_x = tf.cast(tf.round(mask_center[1]), dtype=tf.int32) + + if mask_area < self._small_instance_area_threshold: + semantic_weights = tf.where( + mask, + self._small_instance_weight, + semantic_weights) + + gaussian_size = self._gaussian_size + indices_y = tf.range(mask_center_y, mask_center_y + gaussian_size) + indices_x = tf.range(mask_center_x, mask_center_x + gaussian_size) + + indices = tf.stack(tf.meshgrid(indices_y, indices_x)) + indices = tf.reshape( + indices, shape=[2, gaussian_size * gaussian_size]) + indices = tf.transpose(indices) + + instance_centers_heatmap = tf.tensor_scatter_nd_max( + tensor=instance_centers_heatmap, + indices=indices, + updates=self._gaussian) + + centers_offset_y = tf.tensor_scatter_nd_update( + tensor=centers_offset_y, + indices=tf.cast(mask_indices, dtype=tf.int32), + updates=tf.cast(mask_center_y, dtype=tf.float32) - mask_indices[:, 0]) + + centers_offset_x = tf.tensor_scatter_nd_update( + tensor=centers_offset_x, + indices=tf.cast(mask_indices, dtype=tf.int32), + updates=tf.cast(mask_center_x, dtype=tf.float32) - mask_indices[:, 1]) + + instance_centers_heatmap = instance_centers_heatmap[ + padding_start:padding_start + height, + padding_start:padding_start + width] + instance_centers_heatmap = tf.expand_dims(instance_centers_heatmap, axis=-1) + + instance_centers_offset = tf.stack( + [centers_offset_y, centers_offset_x], + axis=-1) + + return (instance_centers_heatmap, + instance_centers_offset, + semantic_weights) diff --git a/official/vision/beta/projects/panoptic_maskrcnn/dataloaders/panoptic_maskrcnn_input.py b/official/projects/panoptic/dataloaders/panoptic_maskrcnn_input.py similarity index 88% 
rename from official/vision/beta/projects/panoptic_maskrcnn/dataloaders/panoptic_maskrcnn_input.py rename to official/projects/panoptic/dataloaders/panoptic_maskrcnn_input.py index 4df17b483cf0cd5590c32301b63f7861fc5d2419..ac207e14e5cc81f3978dfbb9f491a99d5f285240 100644 --- a/official/vision/beta/projects/panoptic_maskrcnn/dataloaders/panoptic_maskrcnn_input.py +++ b/official/projects/panoptic/dataloaders/panoptic_maskrcnn_input.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,50 +16,63 @@ import tensorflow as tf -from official.vision.beta.dataloaders import maskrcnn_input -from official.vision.beta.dataloaders import tf_example_decoder -from official.vision.beta.ops import preprocess_ops +from official.vision.dataloaders import maskrcnn_input +from official.vision.dataloaders import tf_example_decoder +from official.vision.ops import preprocess_ops class TfExampleDecoder(tf_example_decoder.TfExampleDecoder): """Tensorflow Example proto decoder.""" - def __init__(self, regenerate_source_id, - mask_binarize_threshold, include_panoptic_masks): + def __init__( + self, + regenerate_source_id: bool, + mask_binarize_threshold: float, + include_panoptic_masks: bool, + panoptic_category_mask_key: str = 'image/panoptic/category_mask', + panoptic_instance_mask_key: str = 'image/panoptic/instance_mask'): super(TfExampleDecoder, self).__init__( include_mask=True, regenerate_source_id=regenerate_source_id, mask_binarize_threshold=None) self._include_panoptic_masks = include_panoptic_masks + self._panoptic_category_mask_key = panoptic_category_mask_key + self._panoptic_instance_mask_key = panoptic_instance_mask_key keys_to_features = { 'image/segmentation/class/encoded': tf.io.FixedLenFeature((), tf.string, default_value='')} if 
include_panoptic_masks: keys_to_features.update({ - 'image/panoptic/category_mask': + panoptic_category_mask_key: tf.io.FixedLenFeature((), tf.string, default_value=''), - 'image/panoptic/instance_mask': - tf.io.FixedLenFeature((), tf.string, default_value='')}) + panoptic_instance_mask_key: + tf.io.FixedLenFeature((), tf.string, default_value='') + }) self._segmentation_keys_to_features = keys_to_features + def decode_segmentation_mask(self, parsed_tensors): + segmentation_mask = tf.io.decode_image( + parsed_tensors['image/segmentation/class/encoded'], channels=1) + segmentation_mask.set_shape([None, None, 1]) + return segmentation_mask + def decode(self, serialized_example): decoded_tensors = super(TfExampleDecoder, self).decode(serialized_example) parsed_tensors = tf.io.parse_single_example( serialized_example, self._segmentation_keys_to_features) - segmentation_mask = tf.io.decode_image( - parsed_tensors['image/segmentation/class/encoded'], - channels=1) - segmentation_mask.set_shape([None, None, 1]) - decoded_tensors.update({'groundtruth_segmentation_mask': segmentation_mask}) + decoded_tensors.update({ + 'groundtruth_segmentation_mask': + self.decode_segmentation_mask(parsed_tensors) + }) if self._include_panoptic_masks: category_mask = tf.io.decode_image( - parsed_tensors['image/panoptic/category_mask'], + parsed_tensors[self._panoptic_category_mask_key], channels=1) instance_mask = tf.io.decode_image( - parsed_tensors['image/panoptic/instance_mask'], + parsed_tensors[self._panoptic_instance_mask_key], channels=1) category_mask.set_shape([None, None, 1]) instance_mask.set_shape([None, None, 1]) @@ -214,18 +227,21 @@ class Parser(maskrcnn_input.Parser): are supposed to be used in computing the segmentation loss while training. """ + # (height, width, num_channels = 1) + # All the operations below support num_channels >= 1. segmentation_mask = data['groundtruth_segmentation_mask'] # Flips image randomly during training. 
if self.aug_rand_hflip: masks = data['groundtruth_instance_masks'] + num_image_channels = data['image'].shape.as_list()[-1] image_mask = tf.concat([data['image'], segmentation_mask], axis=2) image_mask, boxes, masks = preprocess_ops.random_horizontal_flip( image_mask, data['groundtruth_boxes'], masks) - segmentation_mask = image_mask[:, :, -1:] - image = image_mask[:, :, :-1] + image = image_mask[:, :, :num_image_channels] + segmentation_mask = image_mask[:, :, num_image_channels:] data['image'] = image data['groundtruth_boxes'] = boxes @@ -237,14 +253,14 @@ class Parser(maskrcnn_input.Parser): image_scale = image_info[2, :] offset = image_info[3, :] - segmentation_mask = tf.reshape( - segmentation_mask, shape=[1, data['height'], data['width']]) + # (height, width, num_channels = 1) segmentation_mask = tf.cast(segmentation_mask, tf.float32) # Pad label and make sure the padded region assigned to the ignore label. # The label is first offset by +1 and then padded with 0. segmentation_mask += 1 - segmentation_mask = tf.expand_dims(segmentation_mask, axis=3) + # (1, height, width, num_channels = 1) + segmentation_mask = tf.expand_dims(segmentation_mask, axis=0) segmentation_mask = preprocess_ops.resize_and_crop_masks( segmentation_mask, image_scale, self._output_size, offset) segmentation_mask -= 1 @@ -252,6 +268,7 @@ class Parser(maskrcnn_input.Parser): tf.equal(segmentation_mask, -1), self._segmentation_ignore_label * tf.ones_like(segmentation_mask), segmentation_mask) + # (height, width, num_channels = 1) segmentation_mask = tf.squeeze(segmentation_mask, axis=0) segmentation_valid_mask = tf.not_equal( segmentation_mask, self._segmentation_ignore_label) @@ -284,9 +301,13 @@ class Parser(maskrcnn_input.Parser): shape [height_l, width_l, 4] representing anchor boxes at each level. """ + def _process_mask(mask, ignore_label, image_info): + # (height, width, num_channels = 1) + # All the operations below support num_channels >= 1. 
mask = tf.cast(mask, dtype=tf.float32) - mask = tf.reshape(mask, shape=[1, data['height'], data['width'], 1]) + # (1, height, width, num_channels = 1) + mask = tf.expand_dims(mask, axis=0) mask += 1 if self._segmentation_resize_eval_groundtruth: @@ -307,12 +328,14 @@ class Parser(maskrcnn_input.Parser): tf.equal(mask, -1), ignore_label * tf.ones_like(mask), mask) + # (height, width, num_channels = 1) mask = tf.squeeze(mask, axis=0) return mask image, labels = super(Parser, self)._parse_eval_data(data) image_info = labels['image_info'] + # (height, width, num_channels = 1) segmentation_mask = _process_mask( data['groundtruth_segmentation_mask'], self._segmentation_ignore_label, image_info) diff --git a/official/projects/panoptic/losses/panoptic_deeplab_losses.py b/official/projects/panoptic/losses/panoptic_deeplab_losses.py new file mode 100644 index 0000000000000000000000000000000000000000..f109bf9d5414a0003af6c79325cf451f64752270 --- /dev/null +++ b/official/projects/panoptic/losses/panoptic_deeplab_losses.py @@ -0,0 +1,148 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Losses used for panoptic deeplab model.""" + +import tensorflow as tf + +from official.modeling import tf_utils +from official.projects.panoptic.ops import mask_ops + +EPSILON = 1e-5 + + +class WeightedBootstrappedCrossEntropyLoss: + """Weighted semantic segmentation loss.""" + + def __init__(self, label_smoothing, class_weights, ignore_label, + top_k_percent_pixels=1.0): + self._top_k_percent_pixels = top_k_percent_pixels + self._class_weights = class_weights + self._ignore_label = ignore_label + self._label_smoothing = label_smoothing + + def __call__(self, logits, labels, sample_weight=None): + _, _, _, num_classes = logits.get_shape().as_list() + + logits = tf.image.resize( + logits, tf.shape(labels)[1:3], + method=tf.image.ResizeMethod.BILINEAR) + + valid_mask = tf.not_equal(labels, self._ignore_label) + normalizer = tf.reduce_sum(tf.cast(valid_mask, tf.float32)) + EPSILON + # Assign pixel with ignore label to class 0 (background). The loss on the + # pixel will later be masked out. 
+ labels = tf.where(valid_mask, labels, tf.zeros_like(labels)) + + labels = tf.squeeze(tf.cast(labels, tf.int32), axis=3) + valid_mask = tf.squeeze(tf.cast(valid_mask, tf.float32), axis=3) + onehot_labels = tf.one_hot(labels, num_classes) + onehot_labels = onehot_labels * ( + 1 - self._label_smoothing) + self._label_smoothing / num_classes + cross_entropy_loss = tf.nn.softmax_cross_entropy_with_logits( + labels=onehot_labels, logits=logits) + + if not self._class_weights: + class_weights = [1] * num_classes + else: + class_weights = self._class_weights + + if num_classes != len(class_weights): + raise ValueError( + 'Length of class_weights should be {}'.format(num_classes)) + + weight_mask = tf.einsum('...y,y->...', + tf.one_hot(labels, num_classes, dtype=tf.float32), + tf.constant(class_weights, tf.float32)) + valid_mask *= weight_mask + + if sample_weight is not None: + valid_mask *= sample_weight + + cross_entropy_loss *= tf.cast(valid_mask, tf.float32) + + if self._top_k_percent_pixels >= 1.0: + loss = tf.reduce_sum(cross_entropy_loss) / normalizer + else: + loss = self._compute_top_k_loss(cross_entropy_loss) + return loss + + def _compute_top_k_loss(self, loss): + """Computes top k loss.""" + batch_size = tf.shape(loss)[0] + loss = tf.reshape(loss, shape=[batch_size, -1]) + + top_k_pixels = tf.cast( + self._top_k_percent_pixels * + tf.cast(tf.shape(loss)[-1], dtype=tf.float32), + dtype=tf.int32) + + # shape: [batch_size, top_k_pixels] + per_sample_top_k_loss = tf.map_fn( + fn=lambda x: tf.nn.top_k(x, k=top_k_pixels, sorted=False)[0], + elems=loss, + parallel_iterations=32, + fn_output_signature=tf.float32) + + # shape: [batch_size] + per_sample_normalizer = tf.reduce_sum( + tf.cast( + tf.not_equal(per_sample_top_k_loss, 0.0), + dtype=tf.float32), + axis=-1) + EPSILON + per_sample_normalized_loss = tf.reduce_sum( + per_sample_top_k_loss, axis=-1) / per_sample_normalizer + + normalized_loss = tf_utils.safe_mean(per_sample_normalized_loss) + return normalized_loss
+ + +class CenterHeatmapLoss: + """Center heatmap loss.""" + + def __init__(self): + self._loss_fn = tf.losses.mean_squared_error + + def __call__(self, logits, labels, sample_weight=None): + _, height, width, _ = labels.get_shape().as_list() + logits = tf.image.resize( + logits, + size=[height, width], + method=tf.image.ResizeMethod.BILINEAR) + + loss = self._loss_fn(y_true=labels, y_pred=logits) + + if sample_weight is not None: + loss *= sample_weight + + return tf_utils.safe_mean(loss) + + +class CenterOffsetLoss: + """Center offset loss.""" + + def __init__(self): + self._loss_fn = tf.losses.mean_absolute_error + + def __call__(self, logits, labels, sample_weight=None): + _, height, width, _ = labels.get_shape().as_list() + logits = mask_ops.resize_and_rescale_offsets( + logits, target_size=[height, width]) + + loss = self._loss_fn(y_true=labels, y_pred=logits) + + if sample_weight is not None: + loss *= sample_weight + + return tf_utils.safe_mean(loss) diff --git a/official/projects/panoptic/modeling/factory.py b/official/projects/panoptic/modeling/factory.py new file mode 100644 index 0000000000000000000000000000000000000000..d9769c18f91ec40e78e3975aa29288e265e4abaf --- /dev/null +++ b/official/projects/panoptic/modeling/factory.py @@ -0,0 +1,252 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Factory method to build panoptic segmentation model.""" +from typing import Optional + +import tensorflow as tf + +from official.projects.deepmac_maskrcnn.tasks import deep_mask_head_rcnn +from official.projects.panoptic.configs import panoptic_deeplab as panoptic_deeplab_cfg +from official.projects.panoptic.configs import panoptic_maskrcnn as panoptic_maskrcnn_cfg +from official.projects.panoptic.modeling import panoptic_deeplab_model +from official.projects.panoptic.modeling import panoptic_maskrcnn_model +from official.projects.panoptic.modeling.heads import panoptic_deeplab_heads +from official.projects.panoptic.modeling.layers import panoptic_deeplab_merge +from official.projects.panoptic.modeling.layers import panoptic_segmentation_generator +from official.vision.modeling import backbones +from official.vision.modeling.decoders import factory as decoder_factory +from official.vision.modeling.heads import segmentation_heads + + +def build_panoptic_maskrcnn( + input_specs: tf.keras.layers.InputSpec, + model_config: panoptic_maskrcnn_cfg.PanopticMaskRCNN, + l2_regularizer: tf.keras.regularizers.Regularizer = None) -> tf.keras.Model: # pytype: disable=annotation-type-mismatch # typed-keras + """Builds Panoptic Mask R-CNN model. + + This factory function builds the mask rcnn first, builds the non-shared + semantic segmentation layers, and finally combines the two models to form + the panoptic segmentation model. + + Args: + input_specs: `tf.keras.layers.InputSpec` specs of the input tensor. + model_config: Config instance for the panoptic maskrcnn model. + l2_regularizer: Optional `tf.keras.regularizers.Regularizer`, if specified, + the model is built with the provided regularization layer. + Returns: + tf.keras.Model for the panoptic segmentation model. + """ + norm_activation_config = model_config.norm_activation + segmentation_config = model_config.segmentation_model + + # Builds the maskrcnn model. 
+ maskrcnn_model = deep_mask_head_rcnn.build_maskrcnn( + input_specs=input_specs, + model_config=model_config, + l2_regularizer=l2_regularizer) + + # Builds the semantic segmentation branch. + if not model_config.shared_backbone: + segmentation_backbone = backbones.factory.build_backbone( + input_specs=input_specs, + backbone_config=segmentation_config.backbone, + norm_activation_config=norm_activation_config, + l2_regularizer=l2_regularizer) + segmentation_decoder_input_specs = segmentation_backbone.output_specs + else: + segmentation_backbone = None + segmentation_decoder_input_specs = maskrcnn_model.backbone.output_specs + + if not model_config.shared_decoder: + segmentation_decoder = decoder_factory.build_decoder( + input_specs=segmentation_decoder_input_specs, + model_config=segmentation_config, + l2_regularizer=l2_regularizer) + decoder_config = segmentation_decoder.get_config() + else: + segmentation_decoder = None + decoder_config = maskrcnn_model.decoder.get_config() + + segmentation_head_config = segmentation_config.head + detection_head_config = model_config.detection_head + postprocessing_config = model_config.panoptic_segmentation_generator + + segmentation_head = segmentation_heads.SegmentationHead( + num_classes=segmentation_config.num_classes, + level=segmentation_head_config.level, + num_convs=segmentation_head_config.num_convs, + prediction_kernel_size=segmentation_head_config.prediction_kernel_size, + num_filters=segmentation_head_config.num_filters, + upsample_factor=segmentation_head_config.upsample_factor, + feature_fusion=segmentation_head_config.feature_fusion, + decoder_min_level=segmentation_head_config.decoder_min_level, + decoder_max_level=segmentation_head_config.decoder_max_level, + low_level=segmentation_head_config.low_level, + low_level_num_filters=segmentation_head_config.low_level_num_filters, + activation=norm_activation_config.activation, + use_sync_bn=norm_activation_config.use_sync_bn, + 
+ norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon, + num_decoder_filters=decoder_config['num_filters'], + kernel_regularizer=l2_regularizer) + + if model_config.generate_panoptic_masks: + max_num_detections = model_config.detection_generator.max_num_detections + mask_binarize_threshold = postprocessing_config.mask_binarize_threshold + panoptic_segmentation_generator_obj = ( + panoptic_segmentation_generator.PanopticSegmentationGeneratorV2( + output_size=postprocessing_config.output_size, + max_num_detections=max_num_detections, + stuff_classes_offset=model_config.stuff_classes_offset, + mask_binarize_threshold=mask_binarize_threshold, + score_threshold=postprocessing_config.score_threshold, + things_overlap_threshold=postprocessing_config + .things_overlap_threshold, + things_class_label=postprocessing_config.things_class_label, + stuff_area_threshold=postprocessing_config.stuff_area_threshold, + void_class_label=postprocessing_config.void_class_label, + void_instance_id=postprocessing_config.void_instance_id, + rescale_predictions=postprocessing_config.rescale_predictions)) + else: + panoptic_segmentation_generator_obj = None + + # Combines the maskrcnn and segmentation models to build the panoptic + # segmentation model.
+ + model = panoptic_maskrcnn_model.PanopticMaskRCNNModel( + backbone=maskrcnn_model.backbone, + decoder=maskrcnn_model.decoder, + rpn_head=maskrcnn_model.rpn_head, + detection_head=maskrcnn_model.detection_head, + roi_generator=maskrcnn_model.roi_generator, + roi_sampler=maskrcnn_model.roi_sampler, + roi_aligner=maskrcnn_model.roi_aligner, + detection_generator=maskrcnn_model.detection_generator, + panoptic_segmentation_generator=panoptic_segmentation_generator_obj, + mask_head=maskrcnn_model.mask_head, + mask_sampler=maskrcnn_model.mask_sampler, + mask_roi_aligner=maskrcnn_model.mask_roi_aligner, + segmentation_backbone=segmentation_backbone, + segmentation_decoder=segmentation_decoder, + segmentation_head=segmentation_head, + class_agnostic_bbox_pred=detection_head_config.class_agnostic_bbox_pred, + cascade_class_ensemble=detection_head_config.cascade_class_ensemble, + min_level=model_config.min_level, + max_level=model_config.max_level, + num_scales=model_config.anchor.num_scales, + aspect_ratios=model_config.anchor.aspect_ratios, + anchor_size=model_config.anchor.anchor_size) + return model + + +def build_panoptic_deeplab( + input_specs: tf.keras.layers.InputSpec, + model_config: panoptic_deeplab_cfg.PanopticDeeplab, + l2_regularizer: Optional[tf.keras.regularizers.Regularizer] = None +) -> tf.keras.Model: + """Builds Panoptic Deeplab model. + + + Args: + input_specs: `tf.keras.layers.InputSpec` specs of the input tensor. + model_config: Config instance for the panoptic deeplab model. + l2_regularizer: Optional `tf.keras.regularizers.Regularizer`, if specified, + the model is built with the provided regularization layer. + Returns: + tf.keras.Model for the panoptic segmentation model. 
+ """ + norm_activation_config = model_config.norm_activation + backbone = backbones.factory.build_backbone( + input_specs=input_specs, + backbone_config=model_config.backbone, + norm_activation_config=norm_activation_config, + l2_regularizer=l2_regularizer) + + semantic_decoder = decoder_factory.build_decoder( + input_specs=backbone.output_specs, + model_config=model_config, + l2_regularizer=l2_regularizer) + + if model_config.shared_decoder: + instance_decoder = None + else: + # semantic and instance share the same decoder type + instance_decoder = decoder_factory.build_decoder( + input_specs=backbone.output_specs, + model_config=model_config, + l2_regularizer=l2_regularizer) + + semantic_head_config = model_config.semantic_head + instance_head_config = model_config.instance_head + + semantic_head = panoptic_deeplab_heads.SemanticHead( + num_classes=model_config.num_classes, + level=semantic_head_config.level, + num_convs=semantic_head_config.num_convs, + kernel_size=semantic_head_config.kernel_size, + prediction_kernel_size=semantic_head_config.prediction_kernel_size, + num_filters=semantic_head_config.num_filters, + use_depthwise_convolution=semantic_head_config.use_depthwise_convolution, + upsample_factor=semantic_head_config.upsample_factor, + low_level=semantic_head_config.low_level, + low_level_num_filters=semantic_head_config.low_level_num_filters, + fusion_num_output_filters=semantic_head_config.fusion_num_output_filters, + activation=norm_activation_config.activation, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon, + kernel_regularizer=l2_regularizer) + + instance_head = panoptic_deeplab_heads.InstanceHead( + level=instance_head_config.level, + num_convs=instance_head_config.num_convs, + kernel_size=instance_head_config.kernel_size, + prediction_kernel_size=instance_head_config.prediction_kernel_size, + num_filters=instance_head_config.num_filters, 
+      use_depthwise_convolution=instance_head_config.use_depthwise_convolution,
+      upsample_factor=instance_head_config.upsample_factor,
+      low_level=instance_head_config.low_level,
+      low_level_num_filters=instance_head_config.low_level_num_filters,
+      fusion_num_output_filters=instance_head_config.fusion_num_output_filters,
+      activation=norm_activation_config.activation,
+      use_sync_bn=norm_activation_config.use_sync_bn,
+      norm_momentum=norm_activation_config.norm_momentum,
+      norm_epsilon=norm_activation_config.norm_epsilon,
+      kernel_regularizer=l2_regularizer)
+
+  if model_config.generate_panoptic_masks:
+    post_processing_config = model_config.post_processor
+    post_processor = panoptic_deeplab_merge.PostProcessor(
+        output_size=post_processing_config.output_size,
+        center_score_threshold=post_processing_config.center_score_threshold,
+        thing_class_ids=post_processing_config.thing_class_ids,
+        label_divisor=post_processing_config.label_divisor,
+        stuff_area_limit=post_processing_config.stuff_area_limit,
+        ignore_label=post_processing_config.ignore_label,
+        nms_kernel=post_processing_config.nms_kernel,
+        keep_k_centers=post_processing_config.keep_k_centers,
+        rescale_predictions=post_processing_config.rescale_predictions)
+  else:
+    post_processor = None
+
+  model = panoptic_deeplab_model.PanopticDeeplabModel(
+      backbone=backbone,
+      semantic_decoder=semantic_decoder,
+      instance_decoder=instance_decoder,
+      semantic_head=semantic_head,
+      instance_head=instance_head,
+      post_processor=post_processor)
+
+  return model
diff --git a/official/projects/panoptic/modeling/heads/panoptic_deeplab_heads.py b/official/projects/panoptic/modeling/heads/panoptic_deeplab_heads.py
new file mode 100644
index 0000000000000000000000000000000000000000..93113a333e8644d771b19f9e00a117c3cdfaa55f
--- /dev/null
+++ b/official/projects/panoptic/modeling/heads/panoptic_deeplab_heads.py
@@ -0,0 +1,434 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Contains definitions for Panoptic Deeplab heads."""
+
+from typing import List, Mapping, Optional, Tuple, Union
+import tensorflow as tf
+
+from official.modeling import tf_utils
+from official.projects.panoptic.modeling.layers import fusion_layers
+from official.vision.ops import spatial_transform_ops
+
+
+class PanopticDeeplabHead(tf.keras.layers.Layer):
+  """Creates a panoptic deeplab head."""
+
+  def __init__(
+      self,
+      level: Union[int, str],
+      num_convs: int = 2,
+      num_filters: int = 256,
+      kernel_size: int = 3,
+      use_depthwise_convolution: bool = False,
+      upsample_factor: int = 1,
+      low_level: Optional[List[int]] = None,
+      low_level_num_filters: Optional[List[int]] = None,
+      fusion_num_output_filters: int = 256,
+      activation: str = 'relu',
+      use_sync_bn: bool = False,
+      norm_momentum: float = 0.99,
+      norm_epsilon: float = 0.001,
+      kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
+      bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
+      **kwargs):
+    """Initializes a panoptic deeplab head.
+
+    Args:
+      level: An `int` or `str`, level to use to build head.
+      num_convs: An `int` number of stacked convolutions before the last
+        prediction layer.
+      num_filters: An `int` number to specify the number of filters used.
+        Default is 256.
+      kernel_size: An `int` number to specify the kernel size of the
+        stacked convolutions before the last prediction layer.
+      use_depthwise_convolution: A `bool`, whether to use depthwise separable
+        convolutions.
+      upsample_factor: An `int` number to specify the upsampling factor to
+        generate finer mask. Default 1 means no upsampling is applied.
+      low_level: A list of `int` backbone levels whose features are fused with
+        the decoder output.
+      low_level_num_filters: A list of `int`, the reduced number of filters for
+        each low level feature before fusing it with higher level features.
+        One entry per level in `low_level`.
+      fusion_num_output_filters: An `int` number to specify the number of
+        filters used by output layer of fusion module. Default is 256.
+      activation: A `str` that indicates which activation is used, e.g. 'relu',
+        'swish', etc.
+      use_sync_bn: A `bool` that indicates whether to use synchronized batch
+        normalization across different replicas.
+      norm_momentum: A `float` of normalization momentum for the moving average.
+      norm_epsilon: A `float` added to variance to avoid dividing by zero.
+      kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for
+        Conv2D. Default is None.
+      bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D.
+      **kwargs: Additional keyword arguments to be passed.
+    """
+    super(PanopticDeeplabHead, self).__init__(**kwargs)
+
+    self._config_dict = {
+        'level': level,
+        'num_convs': num_convs,
+        'num_filters': num_filters,
+        'kernel_size': kernel_size,
+        'use_depthwise_convolution': use_depthwise_convolution,
+        'upsample_factor': upsample_factor,
+        'low_level': low_level,
+        'low_level_num_filters': low_level_num_filters,
+        'fusion_num_output_filters': fusion_num_output_filters,
+        'activation': activation,
+        'use_sync_bn': use_sync_bn,
+        'norm_momentum': norm_momentum,
+        'norm_epsilon': norm_epsilon,
+        'kernel_regularizer': kernel_regularizer,
+        'bias_regularizer': bias_regularizer
+    }
+    if tf.keras.backend.image_data_format() == 'channels_last':
+      self._bn_axis = -1
+    else:
+      self._bn_axis = 1
+    self._activation = tf_utils.get_activation(activation)
+
+  def build(self, input_shape: Union[tf.TensorShape, List[tf.TensorShape]]):
+    """Creates the variables of the head."""
+    kernel_size = self._config_dict['kernel_size']
+    use_depthwise_convolution = self._config_dict['use_depthwise_convolution']
+    random_initializer = tf.keras.initializers.RandomNormal(stddev=0.01)
+    conv_op = tf.keras.layers.Conv2D
+    conv_kwargs = {
+        'kernel_size': kernel_size if not use_depthwise_convolution else 1,
+        'padding': 'same',
+        'use_bias': True,
+        'kernel_initializer': random_initializer,
+        'kernel_regularizer': self._config_dict['kernel_regularizer'],
+    }
+    bn_op = (tf.keras.layers.experimental.SyncBatchNormalization
+             if self._config_dict['use_sync_bn']
+             else tf.keras.layers.BatchNormalization)
+    bn_kwargs = {
+        'axis': self._bn_axis,
+        'momentum': self._config_dict['norm_momentum'],
+        'epsilon': self._config_dict['norm_epsilon'],
+    }
+
+    self._panoptic_deeplab_fusion = fusion_layers.PanopticDeepLabFusion(
+        level=self._config_dict['level'],
+        low_level=self._config_dict['low_level'],
+        num_projection_filters=self._config_dict['low_level_num_filters'],
+        num_output_filters=self._config_dict['fusion_num_output_filters'],
+        use_depthwise_convolution=self
+        ._config_dict['use_depthwise_convolution'],
+        activation=self._config_dict['activation'],
+        use_sync_bn=self._config_dict['use_sync_bn'],
+        norm_momentum=self._config_dict['norm_momentum'],
+        norm_epsilon=self._config_dict['norm_epsilon'],
+        kernel_regularizer=self._config_dict['kernel_regularizer'],
+        bias_regularizer=self._config_dict['bias_regularizer'])
+
+    # Stacked convolution layers.
+    self._convs = []
+    self._norms = []
+    for i in range(self._config_dict['num_convs']):
+      if use_depthwise_convolution:
+        self._convs.append(
+            tf.keras.layers.DepthwiseConv2D(
+                name='panoptic_deeplab_head_depthwise_conv_{}'.format(i),
+                kernel_size=kernel_size,
+                padding='same',
+                use_bias=True,
+                depthwise_initializer=random_initializer,
+                depthwise_regularizer=self._config_dict['kernel_regularizer'],
+                depth_multiplier=1))
+        norm_name = 'panoptic_deeplab_head_depthwise_norm_{}'.format(i)
+        self._norms.append(bn_op(name=norm_name, **bn_kwargs))
+      conv_name = 'panoptic_deeplab_head_conv_{}'.format(i)
+      self._convs.append(
+          conv_op(
+              name=conv_name,
+              filters=self._config_dict['num_filters'],
+              **conv_kwargs))
+      norm_name = 'panoptic_deeplab_head_norm_{}'.format(i)
+      self._norms.append(bn_op(name=norm_name, **bn_kwargs))
+
+    super().build(input_shape)
+
+  def call(self, inputs: Tuple[Union[tf.Tensor, Mapping[str, tf.Tensor]],
+                               Union[tf.Tensor, Mapping[str, tf.Tensor]]],
+           training=None):
+    """Forward pass of the head.
+
+    It expects a tuple of 2 dictionaries: the first holds the backbone
+    endpoints and the second holds the decoder endpoints. Each dictionary maps
+    a `str` level to the feature map at that level. The fusion layer reads the
+    decoder feature map at `level` and the backbone feature maps at each level
+    in `low_level`.
+
+    Args:
+      inputs: A tuple of 2 dictionaries of tensors (backbone endpoints,
+        decoder endpoints):
+        - key: A `str` of the level of the multilevel features.
+        - values: A `tf.Tensor` of the feature map tensors, whose shape is
+            [batch, height_l, width_l, channels].
+      training: A bool, runs the model in training/eval mode.
+
+    Returns:
+      A `tf.Tensor` of the fused backbone and decoder features.
+    """
+    if training is None:
+      training = tf.keras.backend.learning_phase()
+
+    x = self._panoptic_deeplab_fusion(inputs, training=training)
+
+    for conv, norm in zip(self._convs, self._norms):
+      x = conv(x)
+      x = norm(x, training=training)
+      x = self._activation(x)
+
+    if self._config_dict['upsample_factor'] > 1:
+      x = spatial_transform_ops.nearest_upsampling(
+          x, scale=self._config_dict['upsample_factor'])
+
+    return x
+
+  def get_config(self):
+    base_config = super().get_config()
+    return dict(list(base_config.items()) + list(self._config_dict.items()))
+
+  @classmethod
+  def from_config(cls, config):
+    return cls(**config)
+
+
+@tf.keras.utils.register_keras_serializable(package='Vision')
+class SemanticHead(PanopticDeeplabHead):
+  """Creates a semantic head."""
+
+  def __init__(
+      self,
+      num_classes: int,
+      level: Union[int, str],
+      num_convs: int = 2,
+      num_filters: int = 256,
+      kernel_size: int = 3,
+      prediction_kernel_size: int = 3,
+      use_depthwise_convolution: bool = False,
+      upsample_factor: int = 1,
+      low_level: Optional[List[int]] = None,
+      low_level_num_filters: Optional[List[int]] = None,
+      fusion_num_output_filters: int = 256,
+      activation: str = 'relu',
+      use_sync_bn: bool = False,
+      norm_momentum: float = 0.99,
+      norm_epsilon: float = 0.001,
+      kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
+      bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
+      **kwargs):
+    """Initializes a semantic head.
+
+    Args:
+      num_classes: An `int` number of mask classification categories. The number
+        of classes does not include background class.
+      level: An `int` or `str`, level to use to build head.
+      num_convs: An `int` number of stacked convolutions before the last
+        prediction layer.
+      num_filters: An `int` number to specify the number of filters used.
+        Default is 256.
+      kernel_size: An `int` number to specify the kernel size of the
+        stacked convolutions before the last prediction layer.
+      prediction_kernel_size: An `int` number to specify the kernel size of the
+        prediction layer.
+      use_depthwise_convolution: A `bool`, whether to use depthwise separable
+        convolutions.
+      upsample_factor: An `int` number to specify the upsampling factor to
+        generate finer mask. Default 1 means no upsampling is applied.
+      low_level: A list of `int` backbone levels whose features are fused with
+        the decoder output.
+      low_level_num_filters: A list of `int`, the reduced number of filters for
+        each low level feature before fusing it with higher level features.
+        One entry per level in `low_level`.
+      fusion_num_output_filters: An `int` number to specify the number of
+        filters used by output layer of fusion module. Default is 256.
+      activation: A `str` that indicates which activation is used, e.g. 'relu',
+        'swish', etc.
+      use_sync_bn: A `bool` that indicates whether to use synchronized batch
+        normalization across different replicas.
+      norm_momentum: A `float` of normalization momentum for the moving average.
+      norm_epsilon: A `float` added to variance to avoid dividing by zero.
+      kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for
+        Conv2D. Default is None.
+      bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D.
+      **kwargs: Additional keyword arguments to be passed.
+    """
+    super(SemanticHead, self).__init__(
+        level=level,
+        num_convs=num_convs,
+        num_filters=num_filters,
+        use_depthwise_convolution=use_depthwise_convolution,
+        kernel_size=kernel_size,
+        upsample_factor=upsample_factor,
+        low_level=low_level,
+        low_level_num_filters=low_level_num_filters,
+        fusion_num_output_filters=fusion_num_output_filters,
+        activation=activation,
+        use_sync_bn=use_sync_bn,
+        norm_momentum=norm_momentum,
+        norm_epsilon=norm_epsilon,
+        kernel_regularizer=kernel_regularizer,
+        bias_regularizer=bias_regularizer,
+        **kwargs)
+    self._config_dict.update({
+        'num_classes': num_classes,
+        'prediction_kernel_size': prediction_kernel_size})
+
+  def build(self, input_shape: Union[tf.TensorShape, List[tf.TensorShape]]):
+    """Creates the variables of the semantic head."""
+    super(SemanticHead, self).build(input_shape)
+    self._classifier = tf.keras.layers.Conv2D(
+        name='semantic_output',
+        filters=self._config_dict['num_classes'],
+        kernel_size=self._config_dict['prediction_kernel_size'],
+        padding='same',
+        bias_initializer=tf.zeros_initializer(),
+        kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01),
+        kernel_regularizer=self._config_dict['kernel_regularizer'],
+        bias_regularizer=self._config_dict['bias_regularizer'])
+
+  def call(self, inputs: Tuple[Union[tf.Tensor, Mapping[str, tf.Tensor]],
+                               Union[tf.Tensor, Mapping[str, tf.Tensor]]],
+           training=None):
+    """Forward pass of the head."""
+
+    if training is None:
+      training = tf.keras.backend.learning_phase()
+    x = super(SemanticHead, self).call(inputs, training=training)
+    outputs = self._classifier(x)
+    return outputs
+
+
+@tf.keras.utils.register_keras_serializable(package='Vision')
+class InstanceHead(PanopticDeeplabHead):
+  """Creates an instance head."""
+
+  def __init__(
+      self,
+      level: Union[int, str],
+      num_convs: int = 2,
+      num_filters: int = 256,
+      kernel_size: int = 3,
+      prediction_kernel_size: int = 3,
+      use_depthwise_convolution: bool = False,
+      upsample_factor: int = 1,
+      low_level: Optional[List[int]] = None,
+      low_level_num_filters: Optional[List[int]] = None,
+      fusion_num_output_filters: int = 256,
+      activation: str = 'relu',
+      use_sync_bn: bool = False,
+      norm_momentum: float = 0.99,
+      norm_epsilon: float = 0.001,
+      kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
+      bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
+      **kwargs):
+    """Initializes an instance center head.
+
+    Args:
+      level: An `int` or `str`, level to use to build head.
+      num_convs: An `int` number of stacked convolutions before the last
+        prediction layer.
+      num_filters: An `int` number to specify the number of filters used.
+        Default is 256.
+      kernel_size: An `int` number to specify the kernel size of the
+        stacked convolutions before the last prediction layer.
+      prediction_kernel_size: An `int` number to specify the kernel size of the
+        prediction layer.
+      use_depthwise_convolution: A `bool`, whether to use depthwise separable
+        convolutions.
+      upsample_factor: An `int` number to specify the upsampling factor to
+        generate finer mask. Default 1 means no upsampling is applied.
+      low_level: A list of `int` backbone levels whose features are fused with
+        the decoder output.
+      low_level_num_filters: A list of `int`, the reduced number of filters for
+        each low level feature before fusing it with higher level features.
+        One entry per level in `low_level`.
+      fusion_num_output_filters: An `int` number to specify the number of
+        filters used by output layer of fusion module. Default is 256.
+      activation: A `str` that indicates which activation is used, e.g. 'relu',
+        'swish', etc.
+      use_sync_bn: A `bool` that indicates whether to use synchronized batch
+        normalization across different replicas.
+      norm_momentum: A `float` of normalization momentum for the moving average.
+      norm_epsilon: A `float` added to variance to avoid dividing by zero.
+      kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for
+        Conv2D. Default is None.
+      bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D.
+      **kwargs: Additional keyword arguments to be passed.
+    """
+    super(InstanceHead, self).__init__(
+        level=level,
+        num_convs=num_convs,
+        num_filters=num_filters,
+        use_depthwise_convolution=use_depthwise_convolution,
+        kernel_size=kernel_size,
+        upsample_factor=upsample_factor,
+        low_level=low_level,
+        low_level_num_filters=low_level_num_filters,
+        fusion_num_output_filters=fusion_num_output_filters,
+        activation=activation,
+        use_sync_bn=use_sync_bn,
+        norm_momentum=norm_momentum,
+        norm_epsilon=norm_epsilon,
+        kernel_regularizer=kernel_regularizer,
+        bias_regularizer=bias_regularizer,
+        **kwargs)
+    self._config_dict.update({
+        'prediction_kernel_size': prediction_kernel_size})
+
+  def build(self, input_shape: Union[tf.TensorShape, List[tf.TensorShape]]):
+    """Creates the variables of the instance head."""
+    super(InstanceHead, self).build(input_shape)
+    self._instance_center_prediction_conv = tf.keras.layers.Conv2D(
+        name='instance_centers_heatmap',
+        filters=1,
+        kernel_size=self._config_dict['prediction_kernel_size'],
+        padding='same',
+        bias_initializer=tf.zeros_initializer(),
+        kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01),
+        kernel_regularizer=self._config_dict['kernel_regularizer'],
+        bias_regularizer=self._config_dict['bias_regularizer'])
+
+    self._instance_center_regression_conv = tf.keras.layers.Conv2D(
+        name='instance_centers_offset',
+        filters=2,
+        kernel_size=self._config_dict['prediction_kernel_size'],
+        padding='same',
+        bias_initializer=tf.zeros_initializer(),
+        kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01),
+        kernel_regularizer=self._config_dict['kernel_regularizer'],
+        bias_regularizer=self._config_dict['bias_regularizer'])
+
+  def call(self, inputs: Tuple[Union[tf.Tensor, Mapping[str, tf.Tensor]],
+                               Union[tf.Tensor, Mapping[str,
+                                                        tf.Tensor]]],
+           training=None):
+    """Forward pass of the head."""
+
+    if training is None:
+      training = tf.keras.backend.learning_phase()
+
+    x = super(InstanceHead, self).call(inputs, training=training)
+    instance_centers_heatmap = self._instance_center_prediction_conv(x)
+    instance_centers_offset = self._instance_center_regression_conv(x)
+    outputs = {
+        'instance_centers_heatmap': instance_centers_heatmap,
+        'instance_centers_offset': instance_centers_offset
+    }
+    return outputs
diff --git a/official/projects/panoptic/modeling/layers/fusion_layers.py b/official/projects/panoptic/modeling/layers/fusion_layers.py
new file mode 100644
index 0000000000000000000000000000000000000000..42db299738d6f444c1ac3223c7863a38a41ebba0
--- /dev/null
+++ b/official/projects/panoptic/modeling/layers/fusion_layers.py
@@ -0,0 +1,180 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Contains feature fusion blocks for panoptic segmentation models."""
+from typing import Any, Callable, Dict, List, Mapping, Optional, Union
+
+import tensorflow as tf
+
+from official.modeling import tf_utils
+
+
+# Type annotations.
+States = Dict[str, tf.Tensor]
+Activation = Union[str, Callable]
+
+
+class PanopticDeepLabFusion(tf.keras.layers.Layer):
+  """Creates a Panoptic-DeepLab feature fusion layer.
+
+  This implements the feature fusion introduced in the paper:
+  Cheng et al. Panoptic-DeepLab
+  (https://arxiv.org/pdf/1911.10194.pdf)
+  """
+
+  def __init__(
+      self,
+      level: int,
+      low_level: List[int],
+      num_projection_filters: List[int],
+      num_output_filters: int = 256,
+      use_depthwise_convolution: bool = False,
+      activation: str = 'relu',
+      use_sync_bn: bool = False,
+      norm_momentum: float = 0.99,
+      norm_epsilon: float = 0.001,
+      kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
+      bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
+      interpolation: str = 'bilinear',
+      **kwargs):
+    """Initializes a Panoptic-DeepLab feature fusion layer.
+
+    Args:
+      level: An `int` level at which the decoder was applied.
+      low_level: A list of `int` backbone levels to use in feature fusion.
+      num_projection_filters: A list of `int` with number of filters for
+        projection conv2d layers.
+      num_output_filters: An `int` number of filters in output conv2d layers.
+      use_depthwise_convolution: A `bool`, whether to use depthwise separable
+        convolutions.
+      activation: A `str` name of the activation function.
+      use_sync_bn: A `bool` that indicates whether to use synchronized batch
+        normalization across different replicas.
+      norm_momentum: A `float` of normalization momentum for the moving average.
+      norm_epsilon: A `float` added to variance to avoid dividing by zero.
+      kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for
+        Conv2D. Default is None.
+      bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D.
+      interpolation: A `str` interpolation method for upsampling. Defaults to
+        `bilinear`.
+      **kwargs: Additional keyword arguments to be passed.
+    Returns:
+      A `float` `tf.Tensor` of shape [batch_size, feature_height, feature_width,
+        feature_channel].
+    """
+    super(PanopticDeepLabFusion, self).__init__(**kwargs)
+
+    self._config_dict = {
+        'level': level,
+        'low_level': low_level,
+        'num_projection_filters': num_projection_filters,
+        'num_output_filters': num_output_filters,
+        'use_depthwise_convolution': use_depthwise_convolution,
+        'activation': activation,
+        'use_sync_bn': use_sync_bn,
+        'norm_momentum': norm_momentum,
+        'norm_epsilon': norm_epsilon,
+        'kernel_regularizer': kernel_regularizer,
+        'bias_regularizer': bias_regularizer,
+        'interpolation': interpolation
+    }
+    if tf.keras.backend.image_data_format() == 'channels_last':
+      self._channel_axis = -1
+    else:
+      self._channel_axis = 1
+    self._activation = tf_utils.get_activation(activation)
+
+  def build(self, input_shape: List[tf.TensorShape]):
+    conv_op = tf.keras.layers.Conv2D
+    conv_kwargs = {
+        'padding': 'same',
+        'use_bias': True,
+        'kernel_initializer': tf.initializers.VarianceScaling(),
+        'kernel_regularizer': self._config_dict['kernel_regularizer'],
+    }
+    bn_op = (tf.keras.layers.experimental.SyncBatchNormalization
+             if self._config_dict['use_sync_bn']
+             else tf.keras.layers.BatchNormalization)
+    bn_kwargs = {
+        'axis': self._channel_axis,
+        'momentum': self._config_dict['norm_momentum'],
+        'epsilon': self._config_dict['norm_epsilon'],
+    }
+
+    self._projection_convs = []
+    self._projection_norms = []
+    self._fusion_convs = []
+    self._fusion_norms = []
+    for i in range(len(self._config_dict['low_level'])):
+      self._projection_convs.append(
+          conv_op(
+              filters=self._config_dict['num_projection_filters'][i],
+              kernel_size=1,
+              **conv_kwargs))
+      if self._config_dict['use_depthwise_convolution']:
+        depthwise_initializer = tf.keras.initializers.RandomNormal(stddev=0.01)
+        fusion_conv = tf.keras.Sequential([
+            tf.keras.layers.DepthwiseConv2D(
+                kernel_size=5,
+                padding='same',
+                use_bias=True,
+                depthwise_initializer=depthwise_initializer,
+                depthwise_regularizer=self._config_dict['kernel_regularizer'],
+                depth_multiplier=1),
+            bn_op(**bn_kwargs),
+            conv_op(
+                filters=self._config_dict['num_output_filters'],
+                kernel_size=1,
+                **conv_kwargs)])
+      else:
+        fusion_conv = conv_op(
+            filters=self._config_dict['num_output_filters'],
+            kernel_size=5,
+            **conv_kwargs)
+      self._fusion_convs.append(fusion_conv)
+      self._projection_norms.append(bn_op(**bn_kwargs))
+      self._fusion_norms.append(bn_op(**bn_kwargs))
+
+  def call(self, inputs, training=None):
+    if training is None:
+      training = tf.keras.backend.learning_phase()
+
+    backbone_output = inputs[0]
+    decoder_output = inputs[1][str(self._config_dict['level'])]
+
+    x = decoder_output
+    for i in range(len(self._config_dict['low_level'])):
+      feature = backbone_output[str(self._config_dict['low_level'][i])]
+      feature = self._projection_convs[i](feature)
+      feature = self._projection_norms[i](feature, training=training)
+      feature = self._activation(feature)
+
+      shape = tf.shape(feature)
+      x = tf.image.resize(
+          x, size=[shape[1], shape[2]],
+          method=self._config_dict['interpolation'])
+      x = tf.cast(x, dtype=feature.dtype)
+      x = tf.concat([x, feature], axis=self._channel_axis)
+
+      x = self._fusion_convs[i](x)
+      x = self._fusion_norms[i](x, training=training)
+      x = self._activation(x)
+    return x
+
+  def get_config(self) -> Mapping[str, Any]:
+    return self._config_dict
+
+  @classmethod
+  def from_config(cls, config, custom_objects=None):
+    return cls(**config)
diff --git a/official/projects/panoptic/modeling/layers/panoptic_deeplab_merge.py b/official/projects/panoptic/modeling/layers/panoptic_deeplab_merge.py
new file mode 100644
index 0000000000000000000000000000000000000000..764b15e19ef64e58537b8af9b6595756c635a6d5
--- /dev/null
+++ b/official/projects/panoptic/modeling/layers/panoptic_deeplab_merge.py
@@ -0,0 +1,568 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""This file contains functions to post-process Panoptic-DeepLab results.
+
+Note that the postprocessing class and the supporting functions are branched
+from:
+https://github.com/google-research/deeplab2/blob/main/model/post_processor/panoptic_deeplab.py
+with minor changes.
+"""
+
+import functools
+from typing import Dict, List, Text, Tuple
+
+import tensorflow as tf
+
+from official.projects.panoptic.ops import mask_ops
+
+
+def _add_zero_padding(input_tensor: tf.Tensor, kernel_size: int,
+                      rank: int) -> tf.Tensor:
+  """Adds zero-padding to the input_tensor."""
+  pad_total = kernel_size - 1
+  pad_begin = pad_total // 2
+  pad_end = pad_total - pad_begin
+  if rank == 3:
+    return tf.pad(
+        input_tensor,
+        paddings=[[pad_begin, pad_end], [pad_begin, pad_end], [0, 0]])
+  else:
+    return tf.pad(
+        input_tensor,
+        paddings=[[0, 0], [pad_begin, pad_end], [pad_begin, pad_end], [0, 0]])
+
+
+def _get_semantic_predictions(semantic_logits: tf.Tensor) -> tf.Tensor:
+  """Computes the semantic classes from the predictions.
+
+  Args:
+    semantic_logits: A tf.Tensor of shape [batch, height, width, classes].
+  Returns:
+    A tf.Tensor containing the semantic class prediction of shape
+      [batch, height, width].
+  """
+  return tf.argmax(semantic_logits, axis=-1, output_type=tf.int32)
+
+
+def _get_instance_centers_from_heatmap(
+    center_heatmap: tf.Tensor,
+    center_threshold: float,
+    nms_kernel_size: int,
+    keep_k_centers: int) -> Tuple[tf.Tensor, tf.Tensor]:
+  """Computes a list of instance centers.
+
+  Args:
+    center_heatmap: A tf.Tensor of shape [height, width, 1].
+    center_threshold: A float setting the threshold for the center heatmap.
+    nms_kernel_size: An integer specifying the nms kernel size.
+    keep_k_centers: An integer specifying the number of centers to keep (K).
+      Non-positive values will keep all centers.
+  Returns:
+    A tuple of
+    - tf.Tensor of shape [N, 2] containing N center coordinates (after
+      non-maximum suppression) in (y, x) order.
+    - tf.Tensor of shape [height, width] containing the center heatmap after
+      non-maximum suppression.
+  """
+  # Threshold center map.
+  center_heatmap = tf.where(
+      tf.greater(center_heatmap, center_threshold), center_heatmap, 0.0)
+
+  # Non-maximum suppression.
+  padded_map = _add_zero_padding(center_heatmap, nms_kernel_size, rank=3)
+  pooled_center_heatmap = tf.keras.backend.pool2d(
+      tf.expand_dims(padded_map, 0),
+      pool_size=(nms_kernel_size, nms_kernel_size),
+      strides=(1, 1),
+      padding='valid',
+      pool_mode='max')
+  center_heatmap = tf.where(
+      tf.equal(pooled_center_heatmap, center_heatmap), center_heatmap, 0.0)
+  center_heatmap = tf.squeeze(center_heatmap, axis=[0, 3])
+
+  # `centers` is of shape (N, 2) with (y, x) order of the second dimension.
+  centers = tf.where(tf.greater(center_heatmap, 0.0))
+
+  if keep_k_centers > 0 and tf.shape(centers)[0] > keep_k_centers:
+    topk_scores, _ = tf.math.top_k(
+        tf.reshape(center_heatmap, [-1]), keep_k_centers, sorted=True)
+    centers = tf.where(tf.greater(center_heatmap, topk_scores[-1]))
+
+  return centers, center_heatmap
+
+
+def _find_closest_center_per_pixel(centers: tf.Tensor,
+                                   center_offsets: tf.Tensor) -> tf.Tensor:
+  """Assigns all pixels to their closest center.
+
+  Args:
+    centers: A tf.Tensor of shape [N, 2] containing N centers with coordinate
+      order (y, x).
+    center_offsets: A tf.Tensor of shape [height, width, 2].
+  Returns:
+    A tf.Tensor of shape [height, width] containing the index of the closest
+      center, per pixel.
+  """
+  height = tf.shape(center_offsets)[0]
+  width = tf.shape(center_offsets)[1]
+
+  x_coord, y_coord = tf.meshgrid(tf.range(width), tf.range(height))
+  coord = tf.stack([y_coord, x_coord], axis=-1)
+
+  center_per_pixel = tf.cast(coord, tf.float32) + center_offsets
+
+  # centers: [N, 2] -> [N, 1, 2].
+  # center_per_pixel: [H, W, 2] -> [1, H*W, 2].
+  centers = tf.cast(tf.expand_dims(centers, 1), tf.float32)
+  center_per_pixel = tf.reshape(center_per_pixel, [height*width, 2])
+  center_per_pixel = tf.expand_dims(center_per_pixel, 0)
+
+  # distances: [N, H*W].
+  distances = tf.norm(centers - center_per_pixel, axis=-1)
+
+  return tf.reshape(tf.argmin(distances, axis=0), [height, width])
+
+
+def _get_instances_from_heatmap_and_offset(
+    semantic_segmentation: tf.Tensor, center_heatmap: tf.Tensor,
+    center_offsets: tf.Tensor, center_threshold: float,
+    thing_class_ids: tf.Tensor, nms_kernel_size: int,
+    keep_k_centers: int) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor]:
+  """Computes the instance assignment per pixel.
+
+  Args:
+    semantic_segmentation: A tf.Tensor containing the semantic labels of shape
+      [height, width].
+    center_heatmap: A tf.Tensor of shape [height, width, 1].
+    center_offsets: A tf.Tensor of shape [height, width, 2].
+    center_threshold: A float setting the threshold for the center heatmap.
+    thing_class_ids: A tf.Tensor of shape [N] containing N thing indices.
+    nms_kernel_size: An integer specifying the nms kernel size.
+    keep_k_centers: An integer specifying the number of centers to keep.
+      Non-positive values will keep all centers.
+  Returns:
+    A tuple of:
+    - tf.Tensor containing the instance segmentation (filtered with the `thing`
+      segmentation from the semantic segmentation output) with shape
+      [height, width].
+    - tf.Tensor containing the processed centermap with shape [height, width].
+    - tf.Tensor containing instance scores (where higher "score" is a reasonable
+      signal of a higher confidence detection.) Will be of shape [height, width]
+      with the score for a pixel being the score of the instance it belongs to.
+      The scores will be zero for pixels in background/"stuff" regions.
+  """
+  thing_segmentation = tf.zeros_like(semantic_segmentation)
+  for thing_id in thing_class_ids:
+    thing_segmentation = tf.where(tf.equal(semantic_segmentation, thing_id),
+                                  1,
+                                  thing_segmentation)
+
+  centers, processed_center_heatmap = _get_instance_centers_from_heatmap(
+      center_heatmap, center_threshold, nms_kernel_size, keep_k_centers)
+  if tf.shape(centers)[0] == 0:
+    return (tf.zeros_like(semantic_segmentation), processed_center_heatmap,
+            tf.zeros_like(processed_center_heatmap))
+
+  instance_center_index = _find_closest_center_per_pixel(
+      centers, center_offsets)
+  # Instance IDs should start with 1. So we use the index into the centers, but
+  # shifted by 1.
+  instance_segmentation = tf.cast(instance_center_index, tf.int32) + 1
+
+  # The value of the heatmap at an instance's center is used as the score
+  # for that instance.
+  instance_scores = tf.gather_nd(processed_center_heatmap, centers)
+  # This will map the instance scores back to the image space: where each pixel
+  # has a value equal to the score of its instance.
+ flat_center_index = tf.reshape(instance_center_index, [-1]) + instance_score_map = tf.gather(instance_scores, flat_center_index) + instance_score_map = tf.reshape(instance_score_map, + tf.shape(instance_segmentation)) + instance_score_map *= tf.cast(thing_segmentation, tf.float32) + + return (thing_segmentation * instance_segmentation, processed_center_heatmap, + instance_score_map) + + +@tf.function +def _get_panoptic_predictions( + semantic_logits: tf.Tensor, center_heatmap: tf.Tensor, + center_offsets: tf.Tensor, center_threshold: float, + thing_class_ids: tf.Tensor, label_divisor: int, stuff_area_limit: int, + void_label: int, nms_kernel_size: int, keep_k_centers: int +) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]: + """Computes the semantic class and instance ID per pixel. + + Args: + semantic_logits: A tf.Tensor of shape [batch, height, width, classes]. + center_heatmap: A tf.Tensor of shape [batch, height, width, 1]. + center_offsets: A tf.Tensor of shape [batch, height, width, 2]. + center_threshold: A float setting the threshold for the center heatmap. + thing_class_ids: A tf.Tensor of shape [N] containing N thing indices. + label_divisor: An integer specifying the label divisor of the dataset. + stuff_area_limit: An integer specifying the minimum number of pixels that + stuff regions need to have. A stuff region will be included in the + panoptic prediction only if its area is at least this limit; otherwise, + it will be re-assigned as void_label. + void_label: An integer specifying the void label. + nms_kernel_size: An integer specifying the nms kernel size. + keep_k_centers: An integer specifying the number of centers to keep. + Non-positive values will keep all centers. + Returns: + A tuple of: + - the panoptic prediction as tf.Tensor with shape [batch, height, width]. + - the centermap prediction as tf.Tensor with shape [batch, height, width]. + - the instance score maps as tf.Tensor with shape [batch, height, width].
+ - the instance prediction as tf.Tensor with shape [batch, height, width]. + """ + semantic_prediction = _get_semantic_predictions(semantic_logits) + batch_size = tf.shape(semantic_logits)[0] + + instance_map_lists = tf.TensorArray( + tf.int32, size=batch_size, dynamic_size=False) + center_map_lists = tf.TensorArray( + tf.float32, size=batch_size, dynamic_size=False) + instance_score_map_lists = tf.TensorArray( + tf.float32, size=batch_size, dynamic_size=False) + + for i in tf.range(batch_size): + (instance_map, center_map, + instance_score_map) = _get_instances_from_heatmap_and_offset( + semantic_prediction[i, ...], center_heatmap[i, ...], + center_offsets[i, ...], center_threshold, thing_class_ids, + nms_kernel_size, keep_k_centers) + instance_map_lists = instance_map_lists.write(i, instance_map) + center_map_lists = center_map_lists.write(i, center_map) + instance_score_map_lists = instance_score_map_lists.write( + i, instance_score_map) + + # This does not work with unknown shapes. + instance_maps = instance_map_lists.stack() + center_maps = center_map_lists.stack() + instance_score_maps = instance_score_map_lists.stack() + + panoptic_prediction = _merge_semantic_and_instance_maps( + semantic_prediction, instance_maps, thing_class_ids, label_divisor, + stuff_area_limit, void_label) + return (panoptic_prediction, center_maps, instance_score_maps, instance_maps) + + +@tf.function +def _merge_semantic_and_instance_maps( + semantic_prediction: tf.Tensor, + instance_maps: tf.Tensor, + thing_class_ids: tf.Tensor, + label_divisor: int, + stuff_area_limit: int, + void_label: int) -> tf.Tensor: + """Merges semantic and instance maps to obtain panoptic segmentation. + + This function merges the semantic segmentation and class-agnostic + instance segmentation to form the panoptic segmentation. In particular, + the class label of each instance mask is inferred from the majority + votes from the corresponding pixels in the semantic segmentation. 
This + operation was first proposed in the DeeperLab paper and adopted by + Panoptic-DeepLab. + - DeeperLab: Single-Shot Image Parser, T-J Yang, et al. arXiv:1902.05093. + - Panoptic-DeepLab, B. Cheng, et al. In CVPR, 2020. + Note that this function only supports batch = 1 for simplicity. Additionally, + this function has a slightly different implementation from the provided + TensorFlow implementation `merge_ops` but with similar performance. This + function is mainly used as a backup solution when you cannot successfully + compile the provided TensorFlow implementation. To reproduce our results, + please use the provided TensorFlow implementation instead (i.e., + `merge_ops.merge_semantic_and_instance_maps`, not this function). + + Args: + semantic_prediction: A tf.Tensor of shape [batch, height, width]. + instance_maps: A tf.Tensor of shape [batch, height, width]. + thing_class_ids: A tf.Tensor of shape [N] containing N thing indices. + label_divisor: An integer specifying the label divisor of the dataset. + stuff_area_limit: An integer specifying the minimum number of pixels that + stuff regions need to have. A stuff region will be included in the + panoptic prediction only if its area is at least this limit; otherwise, + it will be re-assigned as void_label. + void_label: An integer specifying the void label. + Returns: + panoptic_prediction: A tf.Tensor with shape [batch, height, width]. + """ + prediction_shape = semantic_prediction.get_shape().as_list() + # This implementation only supports batch size of 1. Since model construction + # might lose batch size information (and leave it to None), override it here. + prediction_shape[0] = 1 + semantic_prediction = tf.ensure_shape(semantic_prediction, prediction_shape) + instance_maps = tf.ensure_shape(instance_maps, prediction_shape) + + # Default panoptic_prediction to have semantic label = void_label.
+ panoptic_prediction = tf.ones_like( + semantic_prediction) * void_label * label_divisor + + # Start to paste predicted `thing` regions to panoptic_prediction. + # Infer `thing` segmentation regions from semantic prediction. + semantic_thing_segmentation = tf.zeros_like(semantic_prediction, + dtype=tf.bool) + for thing_class in thing_class_ids: + semantic_thing_segmentation = tf.math.logical_or( + semantic_thing_segmentation, + semantic_prediction == thing_class) + # Keep track of how many instances for each semantic label. + num_instance_per_semantic_label = tf.TensorArray( + tf.int32, size=0, dynamic_size=True, clear_after_read=False) + instance_ids, _ = tf.unique(tf.reshape(instance_maps, [-1])) + for instance_id in instance_ids: + # Instance ID 0 is reserved for crowd region. + if instance_id == 0: + continue + thing_mask = tf.math.logical_and(instance_maps == instance_id, + semantic_thing_segmentation) + if tf.reduce_sum(tf.cast(thing_mask, tf.int32)) == 0: + continue + semantic_bin_counts = tf.math.bincount( + tf.boolean_mask(semantic_prediction, thing_mask)) + semantic_majority = tf.cast( + tf.math.argmax(semantic_bin_counts), tf.int32) + + while num_instance_per_semantic_label.size() <= semantic_majority: + num_instance_per_semantic_label = num_instance_per_semantic_label.write( + num_instance_per_semantic_label.size(), 0) + + new_instance_id = ( + num_instance_per_semantic_label.read(semantic_majority) + 1) + num_instance_per_semantic_label = num_instance_per_semantic_label.write( + semantic_majority, new_instance_id) + panoptic_prediction = tf.where( + thing_mask, + tf.ones_like(panoptic_prediction) * semantic_majority * label_divisor + + new_instance_id, + panoptic_prediction) + + # Done with `num_instance_per_semantic_label` tensor array. + num_instance_per_semantic_label.close() + + # Start to paste predicted `stuff` regions to panoptic prediction. 
+ instance_stuff_regions = instance_maps == 0 + semantic_ids, _ = tf.unique(tf.reshape(semantic_prediction, [-1])) + for semantic_id in semantic_ids: + if tf.reduce_sum(tf.cast(thing_class_ids == semantic_id, tf.int32)) > 0: + continue + # Check stuff area. + stuff_mask = tf.math.logical_and(semantic_prediction == semantic_id, + instance_stuff_regions) + stuff_area = tf.reduce_sum(tf.cast(stuff_mask, tf.int32)) + if stuff_area >= stuff_area_limit: + panoptic_prediction = tf.where( + stuff_mask, + tf.ones_like(panoptic_prediction) * semantic_id * label_divisor, + panoptic_prediction) + + return panoptic_prediction + + +class PostProcessor(tf.keras.layers.Layer): + """Panoptic-DeepLab post-processor layer.""" + + def __init__( + self, + output_size: List[int], + center_score_threshold: float, + thing_class_ids: List[int], + label_divisor: int, + stuff_area_limit: int, + ignore_label: int, + nms_kernel: int, + keep_k_centers: int, + rescale_predictions: bool, + **kwargs): + """Initializes a Panoptic-DeepLab post-processor. + + Args: + output_size: A `List` of integers that represent the height and width of + the output mask. + center_score_threshold: A float setting the threshold for the center + heatmap. + thing_class_ids: An integer list of shape [N] containing N thing indices. + label_divisor: An integer specifying the label divisor of the dataset. + stuff_area_limit: An integer specifying the minimum number of pixels that + stuff regions need to have. A stuff region will be included in the + panoptic prediction only if its area is at least this limit; + otherwise, it will be re-assigned as void_label. + ignore_label: An integer specifying the void label. + nms_kernel: An integer specifying the nms kernel size. + keep_k_centers: An integer specifying the number of centers to keep. + Non-positive values will keep all centers. + rescale_predictions: `bool`, whether to scale back predictions to original + image sizes.
If True, image_info is used to rescale predictions. + **kwargs: additional keyword arguments. + """ + super(PostProcessor, self).__init__(**kwargs) + + self._config_dict = { + 'output_size': output_size, + 'center_score_threshold': center_score_threshold, + 'thing_class_ids': thing_class_ids, + 'label_divisor': label_divisor, + 'stuff_area_limit': stuff_area_limit, + 'ignore_label': ignore_label, + 'nms_kernel': nms_kernel, + 'keep_k_centers': keep_k_centers, + 'rescale_predictions': rescale_predictions + } + self._post_processor = functools.partial( + _get_panoptic_predictions, + center_threshold=center_score_threshold, + thing_class_ids=tf.convert_to_tensor(thing_class_ids), + label_divisor=label_divisor, + stuff_area_limit=stuff_area_limit, + void_label=ignore_label, + nms_kernel_size=nms_kernel, + keep_k_centers=keep_k_centers) + + def _resize_and_pad_masks(self, mask, image_info): + """Resizes masks to the original image shape and pads to `output_size`. + + Args: + mask: a padded mask tensor. + image_info: a tensor that holds information about original and + preprocessed images. + Returns: + resized and padded masks: tf.Tensor. + """ + rescale_size = tf.cast( + tf.math.ceil(image_info[1, :] / image_info[2, :]), tf.int32) + image_shape = tf.cast(image_info[0, :], tf.int32) + offsets = tf.cast(image_info[3, :], tf.int32) + + mask = tf.image.resize( + mask, + rescale_size, + method='bilinear') + mask = tf.image.crop_to_bounding_box( + mask, + offsets[0], offsets[1], + image_shape[0], + image_shape[1]) + mask = tf.image.pad_to_bounding_box( + mask, 0, 0, + self._config_dict['output_size'][0], + self._config_dict['output_size'][1]) + return mask + + def _resize_and_pad_offset_mask(self, mask, image_info): + """Rescales and resizes offset masks and pads to `output_size`. + + Args: + mask: a padded offset mask tensor. + image_info: a tensor that holds information about original and + preprocessed images. + Returns: + rescaled, resized and padded masks: tf.Tensor.
+ """ + rescale_size = tf.cast( + tf.math.ceil(image_info[1, :] / image_info[2, :]), tf.int32) + image_shape = tf.cast(image_info[0, :], tf.int32) + offsets = tf.cast(image_info[3, :], tf.int32) + + mask = mask_ops.resize_and_rescale_offsets( + tf.expand_dims(mask, axis=0), + rescale_size)[0] + mask = tf.image.crop_to_bounding_box( + mask, + offsets[0], offsets[1], + image_shape[0], + image_shape[1]) + mask = tf.image.pad_to_bounding_box( + mask, 0, 0, + self._config_dict['output_size'][0], + self._config_dict['output_size'][1]) + return mask + + def call( + self, + result_dict: Dict[Text, tf.Tensor], + image_info: tf.Tensor) -> Dict[Text, tf.Tensor]: + """Performs the post-processing given model predicted results. + + Args: + result_dict: A dictionary of tf.Tensor containing model results. The dict + must contain: + - segmentation_outputs + - instance_centers_heatmap + - instance_centers_offset + image_info: A tf.Tensor of image infos. + + Returns: + The post-processed dict of tf.Tensor, containing the following keys: + - panoptic_outputs + - category_mask + - instance_mask + - instance_centers + - instance_scores + """ + if self._config_dict['rescale_predictions']: + segmentation_outputs = tf.map_fn( + fn=lambda x: self._resize_and_pad_masks(x[0], x[1]), + elems=(result_dict['segmentation_outputs'], image_info), + fn_output_signature=tf.float32, + parallel_iterations=32) + instance_centers_heatmap = tf.map_fn( + fn=lambda x: self._resize_and_pad_masks(x[0], x[1]), + elems=(result_dict['instance_centers_heatmap'], image_info), + fn_output_signature=tf.float32, + parallel_iterations=32) + instance_centers_offset = tf.map_fn( + fn=lambda x: self._resize_and_pad_offset_mask(x[0], x[1]), + elems=(result_dict['instance_centers_offset'], image_info), + fn_output_signature=tf.float32, + parallel_iterations=32) + else: + segmentation_outputs = tf.image.resize( + result_dict['segmentation_outputs'], + size=self._config_dict['output_size'], + method='bilinear')
instance_centers_heatmap = tf.image.resize( + result_dict['instance_centers_heatmap'], + size=self._config_dict['output_size'], + method='bilinear') + instance_centers_offset = mask_ops.resize_and_rescale_offsets( + result_dict['instance_centers_offset'], + target_size=self._config_dict['output_size']) + + processed_dict = {} + + (processed_dict['panoptic_outputs'], + processed_dict['instance_centers'], + processed_dict['instance_scores'], + _) = self._post_processor( + tf.nn.softmax(segmentation_outputs, axis=-1), + instance_centers_heatmap, + instance_centers_offset) + + label_divisor = self._config_dict['label_divisor'] + processed_dict['category_mask'] = ( + processed_dict['panoptic_outputs'] // label_divisor) + processed_dict['instance_mask'] = ( + processed_dict['panoptic_outputs'] % label_divisor) + + processed_dict.update({ + 'segmentation_outputs': result_dict['segmentation_outputs']}) + + return processed_dict + + def get_config(self): + return self._config_dict + + @classmethod + def from_config(cls, config): + return cls(**config) diff --git a/official/projects/panoptic/modeling/layers/panoptic_segmentation_generator.py b/official/projects/panoptic/modeling/layers/panoptic_segmentation_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..cecc4661d330076029cf429aece197fdabf1c84d --- /dev/null +++ b/official/projects/panoptic/modeling/layers/panoptic_segmentation_generator.py @@ -0,0 +1,617 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains definitions of postprocessing layers to generate panoptic segmentations.""" + +from typing import Any, Dict, List, Optional, Tuple + +import tensorflow as tf + +from official.projects.panoptic.modeling.layers import paste_masks +from official.vision.ops import spatial_transform_ops + + +def _batch_count_ones(masks: tf.Tensor, + dtype: tf.dtypes.DType = tf.int32) -> tf.Tensor: + """Counts the ones/trues for each mask in the batch. + + Args: + masks: A tensor in shape (..., height, width) with an arbitrary number of + batch dimensions. + dtype: DType of the resulting tensor. Default is tf.int32. + + Returns: + A tensor which contains the count of non-zero elements for each mask in the + batch. The rank of the resulting tensor is equal to rank(masks) - 2. + """ + masks_shape = masks.get_shape().as_list() + if len(masks_shape) < 2: + raise ValueError( + 'Expected the input masks (..., height, width) has rank >= 2, was: %s' % + masks_shape) + return tf.reduce_sum(tf.cast(masks, dtype), axis=[-2, -1]) + + +class PanopticSegmentationGenerator(tf.keras.layers.Layer): + """Panoptic segmentation generator layer.""" + + def __init__( + self, + output_size: List[int], + max_num_detections: int, + stuff_classes_offset: int, + mask_binarize_threshold: float = 0.5, + score_threshold: float = 0.5, + things_overlap_threshold: float = 0.5, + stuff_area_threshold: float = 4096, + things_class_label: int = 1, + void_class_label: int = 0, + void_instance_id: int = -1, + rescale_predictions: bool = False, + **kwargs): + """Generates panoptic segmentation masks. + + Args: + output_size: A `List` of integers that represent the height and width of + the output mask. + max_num_detections: `int` for maximum number of detections.
+ stuff_classes_offset: An `int` that is added to the output of the + semantic segmentation mask to make sure that the stuff class ids do not + overlap with the thing class ids of the MaskRCNN outputs. + mask_binarize_threshold: A `float` threshold to binarize mask scores. + score_threshold: A `float` representing the threshold for deciding + when to remove objects based on score. + things_overlap_threshold: A `float` representing a threshold for deciding + to ignore a thing if overlap is above the threshold. + stuff_area_threshold: A `float` representing a threshold for deciding + to ignore a stuff class if its area is below this threshold. + things_class_label: An `int` that represents a single merged category of + all thing classes in the semantic segmentation output. + void_class_label: An `int` that is used to represent empty or unlabelled + regions of the mask. + void_instance_id: An `int` that is used to denote regions that are not + assigned to any thing class. That is, void_instance_id is assigned to + both stuff regions and empty regions. + rescale_predictions: `bool`, whether to scale back prediction to original + image sizes. If True, image_info is used to rescale predictions. + **kwargs: additional keyword arguments.
+ """ + self._output_size = output_size + self._max_num_detections = max_num_detections + self._stuff_classes_offset = stuff_classes_offset + self._mask_binarize_threshold = mask_binarize_threshold + self._score_threshold = score_threshold + self._things_overlap_threshold = things_overlap_threshold + self._stuff_area_threshold = stuff_area_threshold + self._things_class_label = things_class_label + self._void_class_label = void_class_label + self._void_instance_id = void_instance_id + self._rescale_predictions = rescale_predictions + + self._config_dict = { + 'output_size': output_size, + 'max_num_detections': max_num_detections, + 'stuff_classes_offset': stuff_classes_offset, + 'mask_binarize_threshold': mask_binarize_threshold, + 'score_threshold': score_threshold, + 'things_class_label': things_class_label, + 'void_class_label': void_class_label, + 'void_instance_id': void_instance_id, + 'rescale_predictions': rescale_predictions + } + super().__init__(**kwargs) + + def build(self, input_shape: tf.TensorShape): + grid_sampler = paste_masks.BilinearGridSampler(align_corners=False) + self._paste_masks_fn = paste_masks.PasteMasks( + output_size=self._output_size, grid_sampler=grid_sampler) + super().build(input_shape) + + def _generate_panoptic_masks( + self, boxes: tf.Tensor, scores: tf.Tensor, classes: tf.Tensor, + detections_masks: tf.Tensor, + segmentation_mask: tf.Tensor) -> Dict[str, tf.Tensor]: + """Generates panoptic masks for a single image. + + This function implements the following steps to merge instance and semantic + segmentation masks described in https://arxiv.org/pdf/1901.02446.pdf + Steps: + 1. resolving overlaps between different instances based on their + confidence scores + 2. resolving overlaps between instance and semantic segmentation + outputs in favor of instances + 3. removing any stuff regions labeled other or under a given area + threshold. 
+ Args: + boxes: A `tf.Tensor` of shape [num_rois, 4], representing the bounding + boxes for detected objects. + scores: A `tf.Tensor` of shape [num_rois], representing the + confidence scores for each object. + classes: A `tf.Tensor` of shape [num_rois], representing the class + for each object. + detections_masks: A `tf.Tensor` of shape + [num_rois, mask_height, mask_width, 1], representing the cropped mask + for each object. + segmentation_mask: A `tf.Tensor` of shape [height, width, 1], representing + the semantic segmentation output. + Returns: + Dict with the following keys: + - category_mask: A `tf.Tensor` for category masks. + - instance_mask: A `tf.Tensor` for instance masks. + """ + + # Offset stuff class predictions + segmentation_mask = tf.where( + tf.logical_or( + tf.equal(segmentation_mask, self._things_class_label), + tf.equal(segmentation_mask, self._void_class_label)), + segmentation_mask, + segmentation_mask + self._stuff_classes_offset + ) + # sort instances by their scores + sorted_indices = tf.argsort(scores, direction='DESCENDING') + + mask_shape = self._output_size + [1] + category_mask = tf.ones(mask_shape, + dtype=tf.float32) * self._void_class_label + instance_mask = tf.ones( + mask_shape, dtype=tf.float32) * self._void_instance_id + + # filter instances with low confidence + sorted_scores = tf.sort(scores, direction='DESCENDING') + + valid_indices = tf.where(sorted_scores > self._score_threshold) + + # if no instance has sufficient confidence score, skip merging + # instance segmentation masks + if tf.shape(valid_indices)[0] > 0: + loop_end_idx = valid_indices[-1, 0] + 1 + loop_end_idx = tf.minimum( + tf.cast(loop_end_idx, dtype=tf.int32), + self._max_num_detections) + pasted_masks = self._paste_masks_fn(( + detections_masks[:loop_end_idx], + boxes[:loop_end_idx])) + + # add things segmentation to panoptic masks + for i in range(loop_end_idx): + # we process instances in descending order, which will make sure + # the overlaps are resolved
based on confidence score + instance_idx = sorted_indices[i] + + pasted_mask = pasted_masks[instance_idx] + + class_id = tf.cast(classes[instance_idx], dtype=tf.float32) + + # convert sigmoid scores to binary values + binary_mask = tf.greater( + pasted_mask, self._mask_binarize_threshold) + + # filter empty instance masks + if not tf.reduce_sum(tf.cast(binary_mask, tf.float32)) > 0: + continue + + overlap = tf.logical_and( + binary_mask, + tf.not_equal(category_mask, self._void_class_label)) + binary_mask_area = tf.reduce_sum( + tf.cast(binary_mask, dtype=tf.float32)) + overlap_area = tf.reduce_sum( + tf.cast(overlap, dtype=tf.float32)) + + # skip instances that have a big enough overlap with instances with + # higher scores + if overlap_area / binary_mask_area > self._things_overlap_threshold: + continue + + # fill empty regions in category_mask represented by + # void_class_label with class_id of the instance. + category_mask = tf.where( + tf.logical_and( + binary_mask, tf.equal(category_mask, self._void_class_label)), + tf.ones_like(category_mask) * class_id, category_mask) + + # fill empty regions in the instance_mask represented by + # void_instance_id with the id of the instance, starting from 1 + instance_mask = tf.where( + tf.logical_and( + binary_mask, + tf.equal(instance_mask, self._void_instance_id)), + tf.ones_like(instance_mask) * + tf.cast(instance_idx + 1, tf.float32), instance_mask) + + stuff_class_ids = tf.unique(tf.reshape(segmentation_mask, [-1])).y + for stuff_class_id in stuff_class_ids: + if stuff_class_id == self._things_class_label: + continue + + stuff_mask = tf.logical_and( + tf.equal(segmentation_mask, stuff_class_id), + tf.equal(category_mask, self._void_class_label)) + + stuff_mask_area = tf.reduce_sum( + tf.cast(stuff_mask, dtype=tf.float32)) + + if stuff_mask_area < self._stuff_area_threshold: + continue + + category_mask = tf.where( + stuff_mask, + tf.ones_like(category_mask) * stuff_class_id, + category_mask) + + results = {
'category_mask': category_mask[:, :, 0], + 'instance_mask': instance_mask[:, :, 0] + } + return results + + def _resize_and_pad_masks(self, mask, image_info): + """Resizes masks to the original image shape and pads to `output_size`. + + Args: + mask: a padded mask tensor. + image_info: a tensor that holds information about original and + preprocessed images. + Returns: + resized and padded masks: tf.Tensor. + """ + rescale_size = tf.cast( + tf.math.ceil(image_info[1, :] / image_info[2, :]), tf.int32) + image_shape = tf.cast(image_info[0, :], tf.int32) + offsets = tf.cast(image_info[3, :], tf.int32) + + mask = tf.image.resize( + mask, + rescale_size, + method='bilinear') + mask = tf.image.crop_to_bounding_box( + mask, + offsets[0], offsets[1], + image_shape[0], + image_shape[1]) + mask = tf.image.pad_to_bounding_box( + mask, 0, 0, self._output_size[0], self._output_size[1]) + return mask + + def call(self, + inputs: Dict[str, tf.Tensor], + image_info: Optional[tf.Tensor] = None) -> Dict[str, tf.Tensor]: + detections = inputs + + batched_scores = detections['detection_scores'] + batched_classes = detections['detection_classes'] + batched_detections_masks = tf.expand_dims( + detections['detection_masks'], axis=-1) + batched_boxes = detections['detection_boxes'] + batched_segmentation_masks = tf.cast( + detections['segmentation_outputs'], dtype=tf.float32) + + if self._rescale_predictions: + scale = tf.tile( + tf.cast(image_info[:, 2:3, :], dtype=batched_boxes.dtype), + multiples=[1, 1, 2]) + batched_boxes /= scale + + batched_segmentation_masks = tf.map_fn( + fn=lambda x: self._resize_and_pad_masks(x[0], x[1]), + elems=( + batched_segmentation_masks, + image_info), + fn_output_signature=tf.float32, + parallel_iterations=32) + else: + batched_segmentation_masks = tf.image.resize( + batched_segmentation_masks, + size=self._output_size, + method='bilinear') + + batched_segmentation_masks = tf.expand_dims(tf.cast( + tf.argmax(batched_segmentation_masks, axis=-1),
dtype=tf.float32), axis=-1) + + panoptic_masks = tf.map_fn( + fn=lambda x: self._generate_panoptic_masks( # pylint:disable=g-long-lambda + x[0], x[1], x[2], x[3], x[4]), + elems=( + batched_boxes, + batched_scores, + batched_classes, + batched_detections_masks, + batched_segmentation_masks), + fn_output_signature={ + 'category_mask': tf.float32, + 'instance_mask': tf.float32 + }, parallel_iterations=32) + + for k, v in panoptic_masks.items(): + panoptic_masks[k] = tf.cast(v, dtype=tf.int32) + + return panoptic_masks + + def get_config(self) -> Dict[str, Any]: + return self._config_dict + + @classmethod + def from_config(cls, config: Dict[str, + Any]) -> 'PanopticSegmentationGenerator': + return cls(**config) + + +class PanopticSegmentationGeneratorV2(tf.keras.layers.Layer): + """Panoptic segmentation generator layer V2.""" + + def __init__(self, + output_size: List[int], + max_num_detections: int, + stuff_classes_offset: int, + mask_binarize_threshold: float = 0.5, + score_threshold: float = 0.5, + things_overlap_threshold: float = 0.5, + stuff_area_threshold: float = 4096, + things_class_label: int = 1, + void_class_label: int = 0, + void_instance_id: int = -1, + rescale_predictions: bool = False, + **kwargs): + """Generates panoptic segmentation masks. + + Args: + output_size: A `List` of integers that represent the height and width of + the output mask. + max_num_detections: `int` for maximum number of detections. + stuff_classes_offset: An `int` that is added to the output of the semantic + segmentation mask to make sure that the stuff class ids do not overlap + with the thing class ids of the MaskRCNN outputs. + mask_binarize_threshold: A `float` threshold to binarize mask scores. + score_threshold: A `float` representing the threshold for deciding when to + remove objects based on score. + things_overlap_threshold: A `float` representing a threshold for deciding + to ignore a thing if overlap is above the threshold.
+ stuff_area_threshold: A `float` representing a threshold for deciding + to ignore a stuff class if its area is below this threshold. + things_class_label: An `int` that represents a single merged category of + all thing classes in the semantic segmentation output. + void_class_label: An `int` that is used to represent empty or unlabelled + regions of the mask. + void_instance_id: An `int` that is used to denote regions that are not + assigned to any thing class. That is, void_instance_id is assigned to + both stuff regions and empty regions. + rescale_predictions: `bool`, whether to scale back prediction to original + image sizes. If True, image_info is used to rescale predictions. + **kwargs: additional keyword arguments. + """ + self._output_size = output_size + self._max_num_detections = max_num_detections + self._stuff_classes_offset = stuff_classes_offset + self._mask_binarize_threshold = mask_binarize_threshold + self._score_threshold = score_threshold + self._things_overlap_threshold = things_overlap_threshold + self._stuff_area_threshold = stuff_area_threshold + self._things_class_label = things_class_label + self._void_class_label = void_class_label + self._void_instance_id = void_instance_id + self._rescale_predictions = rescale_predictions + + self._config_dict = { + 'output_size': output_size, + 'max_num_detections': max_num_detections, + 'stuff_classes_offset': stuff_classes_offset, + 'mask_binarize_threshold': mask_binarize_threshold, + 'score_threshold': score_threshold, + 'things_class_label': things_class_label, + 'void_class_label': void_class_label, + 'void_instance_id': void_instance_id, + 'rescale_predictions': rescale_predictions + } + super().__init__(**kwargs) + + def call(self, + inputs: Dict[str, tf.Tensor], + image_info: Optional[tf.Tensor] = None) -> Dict[str, tf.Tensor]: + """Generates panoptic segmentation masks.""" + # (batch_size, num_rois, 4) in absolute coordinates.
+ detection_boxes = tf.cast(inputs['detection_boxes'], tf.float32) + # (batch_size, num_rois) + detection_classes = tf.cast(inputs['detection_classes'], tf.int32) + # (batch_size, num_rois) + detection_scores = tf.cast(inputs['detection_scores'], tf.float32) + # (batch_size, num_rois, mask_height, mask_width) + detections_masks = tf.cast(inputs['detection_masks'], tf.float32) + # (batch_size, height, width, num_semantic_classes) + segmentation_outputs = tf.cast(inputs['segmentation_outputs'], tf.float32) + + if self._rescale_predictions: + # (batch_size, 2) + original_size = tf.cast(image_info[:, 0, :], tf.float32) + desired_size = tf.cast(image_info[:, 1, :], tf.float32) + image_scale = tf.cast(image_info[:, 2, :], tf.float32) + offset = tf.cast(image_info[:, 3, :], tf.float32) + rescale_size = tf.math.ceil(desired_size / image_scale) + # (batch_size, output_height, output_width, num_semantic_classes) + segmentation_outputs = ( + spatial_transform_ops.bilinear_resize_with_crop_and_pad( + segmentation_outputs, + rescale_size, + crop_offset=offset, + crop_size=original_size, + output_size=self._output_size)) + # (batch_size, 1, 4) + image_scale = tf.tile(image_scale, multiples=[1, 2])[:, tf.newaxis] + detection_boxes /= image_scale + else: + # (batch_size, output_height, output_width, num_semantic_classes) + segmentation_outputs = tf.image.resize( + segmentation_outputs, size=self._output_size, method='bilinear') + + # (batch_size, output_height, output_width) + instance_mask, instance_category_mask = self._generate_instances( + detection_boxes, detection_classes, detection_scores, detections_masks) + + # (batch_size, output_height, output_width) + stuff_category_mask = self._generate_stuffs(segmentation_outputs) + + # (batch_size, output_height, output_width) + category_mask = tf.where((stuff_category_mask != self._void_class_label) & + (instance_category_mask == self._void_class_label), + stuff_category_mask + self._stuff_classes_offset, + instance_category_mask) 
+ + return {'instance_mask': instance_mask, 'category_mask': category_mask} + + def _generate_instances( + self, detection_boxes: tf.Tensor, detection_classes: tf.Tensor, + detection_scores: tf.Tensor, + detections_masks: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]: + """Generates instance & category masks from instance segmentation outputs.""" + batch_size = tf.shape(detections_masks)[0] + num_rois = tf.shape(detections_masks)[1] + mask_height = tf.shape(detections_masks)[2] + mask_width = tf.shape(detections_masks)[3] + output_height = self._output_size[0] + output_width = self._output_size[1] + + # (batch_size, num_rois, mask_height, mask_width) + detections_masks = detections_masks * ( + tf.cast((detection_scores > self._score_threshold) & + (detection_classes != self._void_class_label), + detections_masks.dtype)[:, :, tf.newaxis, tf.newaxis]) + + # Resizes and copies the detections_masks to the bounding boxes in the + # output canvas. + # (batch_size, num_rois, output_height, output_width) + pasted_detection_masks = tf.reshape( + spatial_transform_ops.bilinear_resize_to_bbox( + tf.reshape(detections_masks, [-1, mask_height, mask_width]), + tf.reshape(detection_boxes, [-1, 4]), self._output_size), + shape=[-1, num_rois, output_height, output_width]) + + # (batch_size, num_rois, output_height, output_width) + instance_binary_masks = ( + pasted_detection_masks > self._mask_binarize_threshold) + + # Sorts detection related tensors by scores. 
+ # (batch_size, num_rois) + sorted_detection_indices = tf.argsort( + detection_scores, axis=1, direction='DESCENDING') + # (batch_size, num_rois) + sorted_detection_classes = tf.gather( + detection_classes, sorted_detection_indices, batch_dims=1) + # (batch_size, num_rois, output_height, output_width) + sorted_instance_binary_masks = tf.gather( + instance_binary_masks, sorted_detection_indices, batch_dims=1) + # (batch_size, num_rois) + instance_areas = _batch_count_ones( + sorted_instance_binary_masks, dtype=tf.float32) + + init_loop_vars = ( + 0, # i: the loop counter + tf.ones([batch_size, output_height, output_width], dtype=tf.int32) * + self._void_instance_id, # combined_instance_mask + tf.ones([batch_size, output_height, output_width], dtype=tf.int32) * + self._void_class_label # combined_category_mask + ) + + def _copy_instances_loop_body( + i: int, combined_instance_mask: tf.Tensor, + combined_category_mask: tf.Tensor) -> Tuple[int, tf.Tensor, tf.Tensor]: + """Iterates the sorted detections and copies the instances.""" + # (batch_size, output_height, output_width) + instance_binary_mask = sorted_instance_binary_masks[:, i] + + # Masks out the instances that have a big enough overlap with the other + # instances with higher scores. + # (batch_size, ) + overlap_areas = _batch_count_ones( + (combined_instance_mask != self._void_instance_id) + & instance_binary_mask, + dtype=tf.float32) + # (batch_size, ) + instance_overlap_threshold_mask = tf.math.divide_no_nan( + overlap_areas, instance_areas[:, i]) < self._things_overlap_threshold + # (batch_size, output_height, output_width) + instance_binary_mask &= ( + instance_overlap_threshold_mask[:, tf.newaxis, tf.newaxis] + & (combined_instance_mask == self._void_instance_id)) + + # Updates combined_instance_mask. 
+ # (batch_size, ) + instance_id = tf.cast( + sorted_detection_indices[:, i] + 1, # starting from 1 + dtype=combined_instance_mask.dtype) + # (batch_size, output_height, output_width) + combined_instance_mask = tf.where(instance_binary_mask, + instance_id[:, tf.newaxis, tf.newaxis], + combined_instance_mask) + + # Updates combined_category_mask. + # (batch_size, ) + class_id = tf.cast( + sorted_detection_classes[:, i], dtype=combined_category_mask.dtype) + # (batch_size, output_height, output_width) + combined_category_mask = tf.where(instance_binary_mask, + class_id[:, tf.newaxis, tf.newaxis], + combined_category_mask) + + # Returns the updated loop vars. + return ( + i + 1, # Increment the loop counter i + combined_instance_mask, + combined_category_mask) + + # Iterates over all num_rois sorted detections, including the last one. + # (batch_size, output_height, output_width) + _, instance_mask, category_mask = tf.while_loop( + cond=lambda i, *_: i < num_rois, + body=_copy_instances_loop_body, + loop_vars=init_loop_vars, + parallel_iterations=32, + maximum_iterations=num_rois) + return instance_mask, category_mask + + def _generate_stuffs(self, segmentation_outputs: tf.Tensor) -> tf.Tensor: + """Generates category mask from semantic segmentation outputs.""" + num_semantic_classes = tf.shape(segmentation_outputs)[3] + + # (batch_size, output_height, output_width) + segmentation_masks = tf.argmax( + segmentation_outputs, axis=-1, output_type=tf.int32) + stuff_binary_masks = (segmentation_masks != self._things_class_label) & ( + segmentation_masks != self._void_class_label) + # (batch_size, num_semantic_classes, output_height, output_width) + stuff_class_binary_masks = ((tf.one_hot( + segmentation_masks, num_semantic_classes, axis=1, dtype=tf.int32) == 1) + & tf.expand_dims(stuff_binary_masks, axis=1)) + + # Masks out the stuff classes whose area is below the given threshold.
+ # (batch_size, num_semantic_classes) + stuff_class_areas = _batch_count_ones( + stuff_class_binary_masks, dtype=tf.float32) + # (batch_size, num_semantic_classes, output_height, output_width) + stuff_class_binary_masks &= tf.greater( + stuff_class_areas, self._stuff_area_threshold)[:, :, tf.newaxis, + tf.newaxis] + # (batch_size, output_height, output_width) + stuff_binary_masks = tf.reduce_any(stuff_class_binary_masks, axis=1) + + # (batch_size, output_height, output_width) + return tf.where(stuff_binary_masks, segmentation_masks, + tf.ones_like(segmentation_masks) * self._void_class_label) + + def get_config(self) -> Dict[str, Any]: + return self._config_dict + + @classmethod + def from_config(cls, config: Dict[str, + Any]) -> 'PanopticSegmentationGeneratorV2': + return cls(**config) diff --git a/official/vision/beta/projects/panoptic_maskrcnn/modeling/layers/paste_masks.py b/official/projects/panoptic/modeling/layers/paste_masks.py similarity index 98% rename from official/vision/beta/projects/panoptic_maskrcnn/modeling/layers/paste_masks.py rename to official/projects/panoptic/modeling/layers/paste_masks.py index 1a750be5941a993a49bd7d4a388272cb50dca402..e46ffae8ca362dcc82e0c5e69d70ac59fd00bd8e 100644 --- a/official/vision/beta/projects/panoptic_maskrcnn/modeling/layers/paste_masks.py +++ b/official/projects/panoptic/modeling/layers/paste_masks.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
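The merge rule in `PanopticSegmentationGeneratorV2` above can be sketched in pure Python. The rule has three steps: score-ordered greedy pasting of instance masks with an overlap test, area-thresholded stuff classes, and stuff filling only pixels no instance claimed. This is a toy under stated assumptions, not the layer's implementation. It works on tiny nested lists instead of batched tensors, and it skips resizing masks into their boxes. It also takes semantic labels that are already argmaxed. The helper names `paste_instances`, `filter_stuff`, and `merge`, along with the label values (void = 0), are hypothetical.

```python
"""Toy sketch of the panoptic merge done by PanopticSegmentationGeneratorV2.

Assumptions (hypothetical, for illustration): void class label and void
instance id are both 0; masks are already pasted onto the output canvas;
semantic input is already argmaxed per pixel.
"""
from typing import Dict, List, Tuple

Grid = List[List[int]]
BoolGrid = List[List[bool]]

VOID_CLASS = 0     # assumed void class label
VOID_INSTANCE = 0  # assumed void instance id


def paste_instances(masks: List[BoolGrid], classes: List[int],
                    scores: List[float], score_threshold: float,
                    overlap_threshold: float) -> Tuple[Grid, Grid]:
  """Pastes binary instance masks greedily in descending score order."""
  height, width = len(masks[0]), len(masks[0][0])
  instance_mask = [[VOID_INSTANCE] * width for _ in range(height)]
  category_mask = [[VOID_CLASS] * width for _ in range(height)]
  order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
  for idx in order:
    # Low-score and void-class detections contribute nothing.
    if scores[idx] <= score_threshold or classes[idx] == VOID_CLASS:
      continue
    mask = masks[idx]
    area = sum(v for row in mask for v in row)
    overlap = sum(1 for y in range(height) for x in range(width)
                  if mask[y][x] and instance_mask[y][x] != VOID_INSTANCE)
    # Like tf.math.divide_no_nan: an empty mask yields a ratio of 0.
    ratio = overlap / area if area else 0.0
    if ratio >= overlap_threshold:
      continue  # Overlaps higher-scoring instances too much.
    for y in range(height):
      for x in range(width):
        # Only unclaimed pixels are written; instance ids start from 1.
        if mask[y][x] and instance_mask[y][x] == VOID_INSTANCE:
          instance_mask[y][x] = idx + 1
          category_mask[y][x] = classes[idx]
  return instance_mask, category_mask


def filter_stuff(semantic: Grid, things_label: int,
                 area_threshold: int) -> Grid:
  """Keeps stuff pixels whose class area is greater than area_threshold."""
  areas: Dict[int, int] = {}
  for row in semantic:
    for c in row:
      if c not in (things_label, VOID_CLASS):
        areas[c] = areas.get(c, 0) + 1
  return [[c if c in areas and areas[c] > area_threshold else VOID_CLASS
           for c in row] for row in semantic]


def merge(stuff: Grid, instance_category: Grid, stuff_offset: int) -> Grid:
  """Stuff (shifted by stuff_offset) fills only pixels no instance claimed."""
  return [[s + stuff_offset if s != VOID_CLASS and c == VOID_CLASS else c
           for s, c in zip(s_row, c_row)]
          for s_row, c_row in zip(stuff, instance_category)]


if __name__ == '__main__':
  T, F = True, False
  masks = [
      [[T, T, F], [F, F, F]],  # detection 0
      [[F, T, F], [F, T, F]],  # detection 1: half overlaps detection 0
      [[F, F, T], [F, F, T]],  # detection 2: below the score threshold
  ]
  inst, cat = paste_instances(masks, [5, 7, 9], [0.9, 0.8, 0.1],
                              score_threshold=0.5, overlap_threshold=0.6)
  stuff = filter_stuff([[1, 1, 2], [3, 3, 2]], things_label=3,
                       area_threshold=1)
  print(inst, cat, merge(stuff, cat, stuff_offset=10))
```

Detection 1 overlaps detection 0 on one of its two pixels, so its ratio is 0.5. That is below 0.6, so it keeps only its unclaimed pixel. Stuff then fills the remaining void pixels of the instance category map. Greedy pasting means a higher-scoring instance always wins a contested pixel, which is why the TF layer sorts detections by score before its `tf.while_loop`.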
diff --git a/official/projects/panoptic/modeling/panoptic_deeplab_model.py b/official/projects/panoptic/modeling/panoptic_deeplab_model.py new file mode 100644 index 0000000000000000000000000000000000000000..47ea1d714310f6ea019758bdaf140f29ab21ebd0 --- /dev/null +++ b/official/projects/panoptic/modeling/panoptic_deeplab_model.py @@ -0,0 +1,122 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Build Panoptic Deeplab model.""" +from typing import Any, Mapping, Optional, Union + +import tensorflow as tf +from official.projects.panoptic.modeling.layers import panoptic_deeplab_merge + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class PanopticDeeplabModel(tf.keras.Model): + """Panoptic Deeplab model.""" + + def __init__( + self, + backbone: tf.keras.Model, + semantic_decoder: tf.keras.Model, + semantic_head: tf.keras.layers.Layer, + instance_head: tf.keras.layers.Layer, + instance_decoder: Optional[tf.keras.Model] = None, + post_processor: Optional[panoptic_deeplab_merge.PostProcessor] = None, + **kwargs): + """Panoptic deeplab model initializer. + + Args: + backbone: a backbone network. + semantic_decoder: a decoder network. E.g. FPN. + semantic_head: segmentation head. + instance_head: instance center head. + instance_decoder: Optional decoder network for instance predictions. + post_processor: Optional post processor layer. + **kwargs: keyword arguments to be passed. 
+ """ + super(PanopticDeeplabModel, self).__init__(**kwargs) + + self._config_dict = { + 'backbone': backbone, + 'semantic_decoder': semantic_decoder, + 'instance_decoder': instance_decoder, + 'semantic_head': semantic_head, + 'instance_head': instance_head, + 'post_processor': post_processor + } + self.backbone = backbone + self.semantic_decoder = semantic_decoder + self.instance_decoder = instance_decoder + self.semantic_head = semantic_head + self.instance_head = instance_head + self.post_processor = post_processor + + def call( + self, inputs: tf.Tensor, + image_info: tf.Tensor, + training: bool = None): + if training is None: + training = tf.keras.backend.learning_phase() + + backbone_features = self.backbone(inputs, training=training) + + semantic_features = self.semantic_decoder( + backbone_features, training=training) + + if self.instance_decoder is None: + instance_features = semantic_features + else: + instance_features = self.instance_decoder( + backbone_features, training=training) + + segmentation_outputs = self.semantic_head( + (backbone_features, semantic_features), + training=training) + instance_outputs = self.instance_head( + (backbone_features, instance_features), + training=training) + + outputs = { + 'segmentation_outputs': segmentation_outputs, + 'instance_centers_heatmap': + instance_outputs['instance_centers_heatmap'], + 'instance_centers_offset': + instance_outputs['instance_centers_offset'], + } + if training: + return outputs + + if self.post_processor is not None: + panoptic_masks = self.post_processor(outputs, image_info) + outputs.update(panoptic_masks) + return outputs + + @property + def checkpoint_items( + self) -> Mapping[str, Union[tf.keras.Model, tf.keras.layers.Layer]]: + """Returns a dictionary of items to be additionally checkpointed.""" + items = dict( + backbone=self.backbone, + semantic_decoder=self.semantic_decoder, + semantic_head=self.semantic_head, + instance_head=self.instance_head) + if self.instance_decoder is not 
None: + items.update(instance_decoder=self.instance_decoder) + + return items + + def get_config(self) -> Mapping[str, Any]: + return self._config_dict + + @classmethod + def from_config(cls, config, custom_objects=None): + return cls(**config) diff --git a/official/vision/beta/projects/panoptic_maskrcnn/modeling/panoptic_maskrcnn_model.py b/official/projects/panoptic/modeling/panoptic_maskrcnn_model.py similarity index 94% rename from official/vision/beta/projects/panoptic_maskrcnn/modeling/panoptic_maskrcnn_model.py rename to official/projects/panoptic/modeling/panoptic_maskrcnn_model.py index 713ae62b203c3ced67b343eacca7af441285fd49..309f7fa7bf7404be1cfd17481b734a6f90f4e5a0 100644 --- a/official/vision/beta/projects/panoptic_maskrcnn/modeling/panoptic_maskrcnn_model.py +++ b/official/projects/panoptic/modeling/panoptic_maskrcnn_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -18,10 +18,10 @@ from typing import List, Mapping, Optional, Union import tensorflow as tf -from official.vision.beta.modeling import maskrcnn_model +from official.projects.deepmac_maskrcnn.modeling import maskrcnn_model -class PanopticMaskRCNNModel(maskrcnn_model.MaskRCNNModel): +class PanopticMaskRCNNModel(maskrcnn_model.DeepMaskRCNNModel): """The Panoptic Segmentation model.""" def __init__( @@ -49,7 +49,8 @@ class PanopticMaskRCNNModel(maskrcnn_model.MaskRCNNModel): max_level: Optional[int] = None, num_scales: Optional[int] = None, aspect_ratios: Optional[List[float]] = None, - anchor_size: Optional[float] = None, # pytype: disable=annotation-type-mismatch # typed-keras + anchor_size: Optional[float] = None, + use_gt_boxes_for_masks: bool = False, # pytype: disable=annotation-type-mismatch # typed-keras **kwargs): """Initializes the Panoptic Mask R-CNN model. @@ -94,6 +95,7 @@ class PanopticMaskRCNNModel(maskrcnn_model.MaskRCNNModel): aspect_ratios=[1.0, 2.0, 0.5] adds three anchors on each scale level. anchor_size: A number representing the scale of size of the base anchor to the feature stride 2^level. + use_gt_boxes_for_masks: `bool`, whether to use only gt boxes for masks. **kwargs: keyword arguments to be passed. """ super(PanopticMaskRCNNModel, self).__init__( @@ -115,6 +117,7 @@ class PanopticMaskRCNNModel(maskrcnn_model.MaskRCNNModel): num_scales=num_scales, aspect_ratios=aspect_ratios, anchor_size=anchor_size, + use_gt_boxes_for_masks=use_gt_boxes_for_masks, **kwargs) self._config_dict.update({ diff --git a/official/projects/panoptic/ops/mask_ops.py b/official/projects/panoptic/ops/mask_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..9bcc542e8ebf0724f985f95f6082848420b7e4ba --- /dev/null +++ b/official/projects/panoptic/ops/mask_ops.py @@ -0,0 +1,55 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Utility functions for masks.""" + +import tensorflow as tf + + +def resize_and_rescale_offsets(input_tensor: tf.Tensor, target_size): + """Bilinearly resizes and rescales the offsets. + + Reference: + https://github.com/google-research/deeplab2/blob/main/model/utils.py#L157 + + Args: + input_tensor: A tf.Tensor of shape [batch, height, width, 2]. + target_size: A list or tuple or 1D tf.Tensor that specifies the height and + width after resizing. + + Returns: + The input_tensor resized to shape `[batch, target_height, target_width, 2]`. + Moreover, the offsets along the y-axis are rescaled by a factor equal to + (target_height - 1) / (reference_height - 1) and the offsets along the + x-axis are rescaled by a factor equal to + (target_width - 1) / (reference_width - 1). 
+ """ + input_size_y = tf.shape(input_tensor)[1] + input_size_x = tf.shape(input_tensor)[2] + dtype = input_tensor.dtype + + scale_y = tf.cast(target_size[0] - 1, dtype=dtype) / tf.cast( + input_size_y - 1, dtype=dtype) + scale_x = tf.cast(target_size[1] - 1, dtype=dtype) / tf.cast( + input_size_x - 1, dtype=dtype) + + target_y, target_x = tf.split( + value=input_tensor, num_or_size_splits=2, axis=3) + target_y *= scale_y + target_x *= scale_x + input_tensor = tf.concat([target_y, target_x], 3) + return tf.image.resize( + input_tensor, + size=target_size, + method=tf.image.ResizeMethod.BILINEAR) diff --git a/official/projects/panoptic/serving/export_saved_model.py b/official/projects/panoptic/serving/export_saved_model.py new file mode 100644 index 0000000000000000000000000000000000000000..9808c64804980f71dc4912a684660a780cd7207d --- /dev/null +++ b/official/projects/panoptic/serving/export_saved_model.py @@ -0,0 +1,130 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Panoptic MaskRCNN model export binary for serving/inference.
+ +To export a trained checkpoint in saved_model format (shell script): + +CHECKPOINT_PATH=XX +EXPORT_DIR_PATH=XX +CONFIG_FILE_PATH=XX +export_saved_model --export_dir=${EXPORT_DIR_PATH}/ \ + --checkpoint_path=${CHECKPOINT_PATH} \ + --config_file=${CONFIG_FILE_PATH} \ + --batch_size=2 \ + --input_image_size=224,224 +To serve (python): +export_dir_path = XX +input_type = XX +input_images = XX +imported = tf.saved_model.load(export_dir_path) +model_fn = imported.signatures['serving_default'] +output = model_fn(input_images) +""" + +from absl import app +from absl import flags +import tensorflow as tf + +from official.core import exp_factory +from official.modeling import hyperparams +# pylint: disable=unused-import +from official.projects.panoptic.configs import panoptic_deeplab as panoptic_deeplab_cfg +from official.projects.panoptic.configs import panoptic_maskrcnn as panoptic_maskrcnn_cfg +# pylint: enable=unused-import +from official.projects.panoptic.modeling import factory +from official.projects.panoptic.serving import panoptic_deeplab +from official.projects.panoptic.serving import panoptic_maskrcnn +# pylint: disable=unused-import +from official.projects.panoptic.tasks import panoptic_deeplab as panoptic_deeplab_task +from official.projects.panoptic.tasks import panoptic_maskrcnn as panoptic_maskrcnn_task +# pylint: enable=unused-import +from official.vision.serving import export_saved_model_lib + +FLAGS = flags.FLAGS + +flags.DEFINE_string('model', 'panoptic_maskrcnn', + 'model type, one of panoptic_maskrcnn and panoptic_deeplab') +flags.DEFINE_string('experiment', 'panoptic_fpn_coco', + 'experiment type, e.g. panoptic_fpn_coco') +flags.DEFINE_string('export_dir', None, 'The export directory.') +flags.DEFINE_string('checkpoint_path', None, 'Checkpoint path.') +flags.DEFINE_multi_string( + 'config_file', + default=None, + help='YAML/JSON files which specify overrides. The override order ' + 'follows the order of args.
 Note that each file ' + 'can be used as an override template to override the default parameters ' + 'specified in Python. If the same parameter is specified in both ' + '`--config_file` and `--params_override`, `config_file` will be used ' + 'first, followed by params_override.') +flags.DEFINE_string( + 'params_override', '', + 'The JSON/YAML file or string which specifies the parameter to be overridden' + ' on top of `config_file` template.') +flags.DEFINE_integer('batch_size', None, 'The batch size.') +flags.DEFINE_string('input_type', 'image_tensor', + 'One of `image_tensor`, `image_bytes`, `tf_example`.') +flags.DEFINE_string( + 'input_image_size', '224,224', + 'The comma-separated string of two integers representing the height,width ' + 'of the input to the model.') + + +def main(_): + + params = exp_factory.get_exp_config(FLAGS.experiment) + for config_file in FLAGS.config_file or []: + params = hyperparams.override_params_dict( + params, config_file, is_strict=True) + if FLAGS.params_override: + params = hyperparams.override_params_dict( + params, FLAGS.params_override, is_strict=True) + + params.validate() + params.lock() + + input_image_size = [int(x) for x in FLAGS.input_image_size.split(',')] + input_specs = tf.keras.layers.InputSpec( + shape=[FLAGS.batch_size, *input_image_size, 3]) + + if FLAGS.model == 'panoptic_deeplab': + build_model = factory.build_panoptic_deeplab + panoptic_module = panoptic_deeplab.PanopticSegmentationModule + elif FLAGS.model == 'panoptic_maskrcnn': + build_model = factory.build_panoptic_maskrcnn + panoptic_module = panoptic_maskrcnn.PanopticSegmentationModule + else: + raise ValueError('Unsupported model type: %s' % FLAGS.model) + + model = build_model(input_specs=input_specs, model_config=params.task.model) + export_module = panoptic_module( + params=params, + model=model, + batch_size=FLAGS.batch_size, + input_image_size=input_image_size, + num_channels=3) +
export_saved_model_lib.export_inference_graph( + input_type=FLAGS.input_type, + batch_size=FLAGS.batch_size, + input_image_size=input_image_size, + params=params, + checkpoint_path=FLAGS.checkpoint_path, + export_dir=FLAGS.export_dir, + export_module=export_module, + export_checkpoint_subdir='checkpoint', + export_saved_model_subdir='saved_model') + +if __name__ == '__main__': + app.run(main) diff --git a/official/projects/panoptic/serving/panoptic_deeplab.py b/official/projects/panoptic/serving/panoptic_deeplab.py new file mode 100644 index 0000000000000000000000000000000000000000..0c2ad0607be6d25d55b72dc5263a4ce394d36fe9 --- /dev/null +++ b/official/projects/panoptic/serving/panoptic_deeplab.py @@ -0,0 +1,103 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Panoptic Segmentation input and model functions for serving/inference.""" + +from typing import List + +import tensorflow as tf + +from official.core import config_definitions as cfg +from official.projects.panoptic.modeling import factory +from official.projects.panoptic.modeling import panoptic_deeplab_model +from official.vision.serving import semantic_segmentation + + +class PanopticSegmentationModule( + semantic_segmentation.SegmentationModule): + """Panoptic Deeplab Segmentation Module.""" + + def __init__(self, + params: cfg.ExperimentConfig, + *, + model: tf.keras.Model, + batch_size: int, + input_image_size: List[int], + num_channels: int = 3): + """Initializes panoptic segmentation module for export.""" + + if batch_size is None: + raise ValueError('batch_size cannot be None for panoptic segmentation ' + 'model.') + if not isinstance(model, panoptic_deeplab_model.PanopticDeeplabModel): + raise ValueError('PanopticSegmentationModule module not ' + 'implemented for {} model.'.format(type(model))) + params.task.train_data.preserve_aspect_ratio = True + super(PanopticSegmentationModule, self).__init__( + params=params, + model=model, + batch_size=batch_size, + input_image_size=input_image_size, + num_channels=num_channels) + + def _build_model(self): + input_specs = tf.keras.layers.InputSpec(shape=[self._batch_size] + + self._input_image_size + [3]) + + return factory.build_panoptic_deeplab( + input_specs=input_specs, + model_config=self.params.task.model, + l2_regularizer=None) + + def serve(self, images: tf.Tensor): + """Cast image to float and run inference. + + Args: + images: uint8 Tensor of shape [batch_size, None, None, 3] + + Returns: + A dict of semantic and panoptic segmentation output tensors.
+ """ + if self._input_type != 'tflite': + with tf.device('cpu:0'): + images = tf.cast(images, dtype=tf.float32) + images_spec = tf.TensorSpec( + shape=self._input_image_size + [3], dtype=tf.float32) + image_info_spec = tf.TensorSpec(shape=[4, 2], dtype=tf.float32) + + images, image_info = tf.nest.map_structure( + tf.identity, + tf.map_fn( + self._build_inputs, + elems=images, + fn_output_signature=(images_spec, image_info_spec), + parallel_iterations=32)) + + outputs = self.model.call( + inputs=images, image_info=image_info, training=False) + + masks = outputs['segmentation_outputs'] + masks = tf.image.resize(masks, self._input_image_size, method='bilinear') + classes = tf.math.argmax(masks, axis=-1) + scores = tf.nn.softmax(masks, axis=-1) + final_outputs = { + 'semantic_logits': masks, + 'semantic_scores': scores, + 'semantic_classes': classes, + 'image_info': image_info, + 'panoptic_category_mask': outputs['category_mask'], + 'panoptic_instance_mask': outputs['instance_mask'], + } + + return final_outputs diff --git a/official/projects/panoptic/serving/panoptic_maskrcnn.py b/official/projects/panoptic/serving/panoptic_maskrcnn.py new file mode 100644 index 0000000000000000000000000000000000000000..d62b073259aad4e9a5ce687461f13a17724d1c7e --- /dev/null +++ b/official/projects/panoptic/serving/panoptic_maskrcnn.py @@ -0,0 +1,145 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Panoptic Segmentation input and model functions for serving/inference.""" + +from typing import List + +import tensorflow as tf + +from official.core import config_definitions as cfg +from official.projects.panoptic.modeling import panoptic_maskrcnn_model +from official.vision.serving import detection + + +class PanopticSegmentationModule(detection.DetectionModule): + """Panoptic Segmentation Module.""" + + def __init__(self, + params: cfg.ExperimentConfig, + *, + model: tf.keras.Model, + batch_size: int, + input_image_size: List[int], + num_channels: int = 3): + """Initializes panoptic segmentation module for export.""" + + if batch_size is None: + raise ValueError('batch_size cannot be None for panoptic segmentation ' + 'model.') + if not isinstance(model, panoptic_maskrcnn_model.PanopticMaskRCNNModel): + raise ValueError('PanopticSegmentationModule module not implemented for ' + '{} model.'.format(type(model))) + + super(PanopticSegmentationModule, self).__init__( + params=params, + model=model, + batch_size=batch_size, + input_image_size=input_image_size, + num_channels=num_channels) + + def serve(self, images: tf.Tensor): + """Cast image to float and run inference. + + Args: + images: uint8 Tensor of shape [batch_size, None, None, 3] + Returns: + Tensor holding detection output logits. + """ + model_params = self.params.task.model + with tf.device('cpu:0'): + images = tf.cast(images, dtype=tf.float32) + + # Tensor Specs for map_fn outputs (images, anchor_boxes, and image_info). 
+ images_spec = tf.TensorSpec(shape=self._input_image_size + [3], + dtype=tf.float32) + + num_anchors = model_params.anchor.num_scales * len( + model_params.anchor.aspect_ratios) * 4 + anchor_shapes = [] + for level in range(model_params.min_level, model_params.max_level + 1): + anchor_level_spec = tf.TensorSpec( + shape=[ + self._input_image_size[0] // 2**level, + self._input_image_size[1] // 2**level, num_anchors + ], + dtype=tf.float32) + anchor_shapes.append((str(level), anchor_level_spec)) + + image_info_spec = tf.TensorSpec(shape=[4, 2], dtype=tf.float32) + + images, anchor_boxes, image_info = tf.nest.map_structure( + tf.identity, + tf.map_fn( + self._build_inputs, + elems=images, + fn_output_signature=(images_spec, dict(anchor_shapes), + image_info_spec), + parallel_iterations=32)) + + # To work around a keras.Model limitation when saving a model with layers + # that have multiple inputs, we use `model.call` here to run the forward + # pass. Note that this bypasses some Keras logic that runs in `__call__`. + detections = self.model.call( + images=images, + image_info=image_info, + anchor_boxes=anchor_boxes, + training=False) + + detections.pop('rpn_boxes') + detections.pop('rpn_scores') + detections.pop('cls_outputs') + detections.pop('box_outputs') + detections.pop('backbone_features') + detections.pop('decoder_features') + + # Normalize detection boxes to [0, 1]. Here we first map them to the + # original image size, then normalize them to [0, 1].
+ detections['detection_boxes'] = ( + detections['detection_boxes'] / + tf.tile(image_info[:, 2:3, :], [1, 1, 2]) / + tf.tile(image_info[:, 0:1, :], [1, 1, 2])) + + if model_params.detection_generator.apply_nms: + final_outputs = { + 'detection_boxes': detections['detection_boxes'], + 'detection_scores': detections['detection_scores'], + 'detection_classes': detections['detection_classes'], + 'num_detections': detections['num_detections'] + } + else: + final_outputs = { + 'decoded_boxes': detections['decoded_boxes'], + 'decoded_box_scores': detections['decoded_box_scores'] + } + masks = detections['segmentation_outputs'] + masks = tf.image.resize(masks, self._input_image_size, method='bilinear') + classes = tf.math.argmax(masks, axis=-1) + scores = tf.nn.softmax(masks, axis=-1) + final_outputs.update({ + 'detection_masks': detections['detection_masks'], + 'semantic_logits': masks, + 'semantic_scores': scores, + 'semantic_classes': classes, + 'image_info': image_info + }) + if model_params.generate_panoptic_masks: + final_outputs.update({ + 'panoptic_category_mask': + detections['panoptic_outputs']['category_mask'], + 'panoptic_instance_mask': + detections['panoptic_outputs']['instance_mask'], + }) + + return final_outputs diff --git a/official/projects/panoptic/tasks/__init__.py b/official/projects/panoptic/tasks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/panoptic/tasks/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/panoptic/tasks/panoptic_deeplab.py b/official/projects/panoptic/tasks/panoptic_deeplab.py new file mode 100644 index 0000000000000000000000000000000000000000..cb6ddf1322ff69e2feff7158700a4d4a0173e267 --- /dev/null +++ b/official/projects/panoptic/tasks/panoptic_deeplab.py @@ -0,0 +1,387 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Panoptic Deeplab task definition.""" +from typing import Any, Dict, List, Mapping, Optional, Tuple + +from absl import logging +import tensorflow as tf + +from official.common import dataset_fn +from official.core import base_task +from official.core import task_factory +from official.projects.panoptic.configs import panoptic_deeplab as exp_cfg +from official.projects.panoptic.dataloaders import panoptic_deeplab_input +from official.projects.panoptic.losses import panoptic_deeplab_losses +from official.projects.panoptic.modeling import factory +from official.vision.dataloaders import input_reader_factory +from official.vision.evaluation import panoptic_quality_evaluator +from official.vision.evaluation import segmentation_metrics + + +@task_factory.register_task_cls(exp_cfg.PanopticDeeplabTask) +class PanopticDeeplabTask(base_task.Task): + """A task for Panoptic Deeplab.""" + + def build_model(self): + """Builds panoptic deeplab model.""" + input_specs = tf.keras.layers.InputSpec( + shape=[None] + self.task_config.model.input_size) + + l2_weight_decay = self.task_config.losses.l2_weight_decay + # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. + # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) + # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) + l2_regularizer = (tf.keras.regularizers.l2( + l2_weight_decay / 2.0) if l2_weight_decay else None) + + model = factory.build_panoptic_deeplab( + input_specs=input_specs, + model_config=self.task_config.model, + l2_regularizer=l2_regularizer) + return model + + def initialize(self, model: tf.keras.Model): + """Loads pretrained checkpoint.""" + if not self.task_config.init_checkpoint: + return + + ckpt_dir_or_file = self.task_config.init_checkpoint + if tf.io.gfile.isdir(ckpt_dir_or_file): + ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) + + # Restoring checkpoint. 
+ if 'all' in self.task_config.init_checkpoint_modules: + ckpt = tf.train.Checkpoint(**model.checkpoint_items) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + else: + ckpt_items = {} + if 'backbone' in self.task_config.init_checkpoint_modules: + ckpt_items.update(backbone=model.backbone) + if 'decoder' in self.task_config.init_checkpoint_modules: + ckpt_items.update(semantic_decoder=model.semantic_decoder) + if not self.task_config.model.shared_decoder: + ckpt_items.update(instance_decoder=model.instance_decoder) + + ckpt = tf.train.Checkpoint(**ckpt_items) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + + logging.info('Finished loading pretrained checkpoint from %s', + ckpt_dir_or_file) + + def build_inputs(self, + params: exp_cfg.DataConfig, + input_context: Optional[tf.distribute.InputContext] = None): + """Builds panoptic deeplab input.""" + decoder_cfg = params.decoder.get() + + if params.decoder.type == 'simple_decoder': + decoder = panoptic_deeplab_input.TfExampleDecoder( + regenerate_source_id=decoder_cfg.regenerate_source_id, + panoptic_category_mask_key=decoder_cfg.panoptic_category_mask_key, + panoptic_instance_mask_key=decoder_cfg.panoptic_instance_mask_key) + else: + raise ValueError('Unknown decoder type: {}!'.format(params.decoder.type)) + + parser = panoptic_deeplab_input.Parser( + output_size=self.task_config.model.input_size[:2], + ignore_label=params.parser.ignore_label, + resize_eval_groundtruth=params.parser.resize_eval_groundtruth, + groundtruth_padded_size=params.parser.groundtruth_padded_size, + aug_scale_min=params.parser.aug_scale_min, + aug_scale_max=params.parser.aug_scale_max, + aug_rand_hflip=params.parser.aug_rand_hflip, + aug_type=params.parser.aug_type, + sigma=params.parser.sigma, + dtype=params.parser.dtype) + + reader = input_reader_factory.input_reader_generator( + params, + 
 dataset_fn=dataset_fn.pick_dataset_fn(params.file_type), + decoder_fn=decoder.decode, + parser_fn=parser.parse_fn(params.is_training)) + + dataset = reader.read(input_context=input_context) + + return dataset + + def build_losses(self, + labels: Mapping[str, tf.Tensor], + model_outputs: Mapping[str, tf.Tensor], + aux_losses: Optional[Any] = None): + """Panoptic deeplab losses. + + Args: + labels: a dictionary of groundtruth tensors. + model_outputs: Output logits from panoptic deeplab. + aux_losses: auxiliary loss tensors, i.e. `losses` in keras.Model. + + Returns: + A dictionary of loss tensors, including `total_loss` and `model_loss`. + """ + loss_config = self._task_config.losses + segmentation_loss_fn = ( + panoptic_deeplab_losses.WeightedBootstrappedCrossEntropyLoss( + loss_config.label_smoothing, + loss_config.class_weights, + loss_config.ignore_label, + top_k_percent_pixels=loss_config.top_k_percent_pixels)) + instance_center_heatmap_loss_fn = panoptic_deeplab_losses.CenterHeatmapLoss( + ) + instance_center_offset_loss_fn = panoptic_deeplab_losses.CenterOffsetLoss() + + semantic_weights = tf.cast( + labels['semantic_weights'], + dtype=model_outputs['instance_centers_heatmap'].dtype) + things_mask = tf.cast( + tf.squeeze(labels['things_mask'], axis=3), + dtype=model_outputs['instance_centers_heatmap'].dtype) + valid_mask = tf.cast( + tf.squeeze(labels['valid_mask'], axis=3), + dtype=model_outputs['instance_centers_heatmap'].dtype) + + segmentation_loss = segmentation_loss_fn( + model_outputs['segmentation_outputs'], + labels['category_mask'], + sample_weight=semantic_weights) + instance_center_heatmap_loss = instance_center_heatmap_loss_fn( + model_outputs['instance_centers_heatmap'], + labels['instance_centers_heatmap'], + sample_weight=valid_mask) + instance_center_offset_loss = instance_center_offset_loss_fn( + model_outputs['instance_centers_offset'], + labels['instance_centers_offset'], + sample_weight=things_mask) + + model_loss = ( + loss_config.segmentation_loss_weight * segmentation_loss +
loss_config.center_heatmap_loss_weight * instance_center_heatmap_loss + + loss_config.center_offset_loss_weight * instance_center_offset_loss) + + total_loss = model_loss + if aux_losses: + total_loss += tf.add_n(aux_losses) + + losses = { + 'total_loss': total_loss, + 'model_loss': model_loss, + 'segmentation_loss': segmentation_loss, + 'instance_center_heatmap_loss': instance_center_heatmap_loss, + 'instance_center_offset_loss': instance_center_offset_loss + } + + return losses + + def build_metrics(self, training: bool = True) -> List[ + tf.keras.metrics.Metric]: + """Build metrics.""" + eval_config = self.task_config.evaluation + metrics = [] + if training: + metric_names = [ + 'total_loss', + 'segmentation_loss', + 'instance_center_heatmap_loss', + 'instance_center_offset_loss', + 'model_loss'] + for name in metric_names: + metrics.append(tf.keras.metrics.Mean(name, dtype=tf.float32)) + + if eval_config.report_train_mean_iou: + self.train_mean_iou = segmentation_metrics.MeanIoU( + name='train_mean_iou', + num_classes=self.task_config.model.num_classes, + rescale_predictions=False, + dtype=tf.float32) + else: + rescale_predictions = (not self.task_config.validation_data.parser + .resize_eval_groundtruth) + self.perclass_iou_metric = segmentation_metrics.PerClassIoU( + name='per_class_iou', + num_classes=self.task_config.model.num_classes, + rescale_predictions=rescale_predictions, + dtype=tf.float32) + + if self.task_config.model.generate_panoptic_masks: + self.panoptic_quality_metric = ( + panoptic_quality_evaluator.PanopticQualityEvaluator( + num_categories=self.task_config.model.num_classes, + ignored_label=eval_config.ignored_label, + max_instances_per_category=eval_config + .max_instances_per_category, + offset=eval_config.offset, + is_thing=eval_config.is_thing, + rescale_predictions=eval_config.rescale_predictions)) + + return metrics + + def train_step( + self, + inputs: Tuple[Any, Any], + model: tf.keras.Model, + optimizer: 
 tf.keras.optimizers.Optimizer, + metrics: Optional[List[Any]] = None) -> Dict[str, Any]: + """Does forward and backward. + + Args: + inputs: a tuple of (images, labels), where labels is a dictionary of groundtruth tensors. + model: the model, forward pass definition. + optimizer: the optimizer for this training step. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs. + """ + images, labels = inputs + num_replicas = tf.distribute.get_strategy().num_replicas_in_sync + + with tf.GradientTape() as tape: + outputs = model( + inputs=images, + image_info=labels['image_info'], + training=True) + outputs = tf.nest.map_structure( + lambda x: tf.cast(x, tf.float32), outputs) + + # Computes per-replica loss. + losses = self.build_losses( + labels=labels, + model_outputs=outputs, + aux_losses=model.losses) + scaled_loss = losses['total_loss'] / num_replicas + + # For mixed_precision policy, when LossScaleOptimizer is used, loss is + # scaled for numerical stability. + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + scaled_loss = optimizer.get_scaled_loss(scaled_loss) + + tvars = model.trainable_variables + grads = tape.gradient(scaled_loss, tvars) + # Scales back gradient when LossScaleOptimizer is used.
+ if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + grads = optimizer.get_unscaled_gradients(grads) + optimizer.apply_gradients(list(zip(grads, tvars))) + + logs = {self.loss: losses['total_loss']} + + if metrics: + for m in metrics: + m.update_state(losses[m.name]) + + if self.task_config.evaluation.report_train_mean_iou: + segmentation_labels = { + 'masks': labels['category_mask'], + 'valid_masks': labels['valid_mask'], + 'image_info': labels['image_info'] + } + self.process_metrics( + metrics=[self.train_mean_iou], + labels=segmentation_labels, + model_outputs=outputs['segmentation_outputs']) + logs.update({ + self.train_mean_iou.name: + self.train_mean_iou.result() + }) + + return logs + + def validation_step( + self, + inputs: Tuple[Any, Any], + model: tf.keras.Model, + metrics: Optional[List[Any]] = None) -> Dict[str, Any]: + """Validation step. + + Args: + inputs: a tuple of (images, labels), where labels is a dictionary of groundtruth tensors. + model: the keras.Model. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs.
+ """ + images, labels = inputs + + outputs = model( + inputs=images, + image_info=labels['image_info'], + training=False) + + logs = {self.loss: 0} + segmentation_labels = { + 'masks': labels['category_mask'], + 'valid_masks': labels['valid_mask'], + 'image_info': labels['image_info'] + } + + self.perclass_iou_metric.update_state(segmentation_labels, + outputs['segmentation_outputs']) + + if self.task_config.model.generate_panoptic_masks: + pq_metric_labels = { + 'category_mask': tf.squeeze(labels['category_mask'], axis=3), + 'instance_mask': tf.squeeze(labels['instance_mask'], axis=3), + 'image_info': labels['image_info'] + } + panoptic_outputs = { + 'category_mask': + outputs['category_mask'], + 'instance_mask': + outputs['instance_mask'], + } + logs.update({ + self.panoptic_quality_metric.name: + (pq_metric_labels, panoptic_outputs)}) + return logs + + def aggregate_logs(self, state=None, step_outputs=None): + if state is None: + self.perclass_iou_metric.reset_states() + state = [self.perclass_iou_metric] + if self.task_config.model.generate_panoptic_masks: + self.panoptic_quality_metric.reset_states() + state += [self.panoptic_quality_metric] + + if self.task_config.model.generate_panoptic_masks: + self.panoptic_quality_metric.update_state( + step_outputs[self.panoptic_quality_metric.name][0], + step_outputs[self.panoptic_quality_metric.name][1]) + + return state + + def reduce_aggregated_logs(self, aggregated_logs, global_step=None): + result = {} + ious = self.perclass_iou_metric.result() + if self.task_config.evaluation.report_per_class_iou: + for i, value in enumerate(ious.numpy()): + result.update({'segmentation_iou/class_{}'.format(i): value}) + + # Computes mean IoU + result.update({'segmentation_mean_iou': tf.reduce_mean(ious).numpy()}) + + if self.task_config.model.generate_panoptic_masks: + panoptic_quality_results = self.panoptic_quality_metric.result() + for k, value in panoptic_quality_results.items(): + if k.endswith('per_class'): + if self.task_config.evaluation.report_per_class_pq: + for
i, per_class_value in enumerate(value): + metric_key = 'panoptic_quality/{}/class_{}'.format(k, i) + result[metric_key] = per_class_value + else: + continue + else: + result['panoptic_quality/{}'.format(k)] = value + + return result diff --git a/official/projects/panoptic/tasks/panoptic_maskrcnn.py b/official/projects/panoptic/tasks/panoptic_maskrcnn.py new file mode 100644 index 0000000000000000000000000000000000000000..137888c0844ec2f09739a1f18f64cee4839bcf3a --- /dev/null +++ b/official/projects/panoptic/tasks/panoptic_maskrcnn.py @@ -0,0 +1,451 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ + """Panoptic MaskRCNN task definition.""" +from typing import Any, Dict, List, Mapping, Optional, Tuple + +from absl import logging +import tensorflow as tf + +from official.common import dataset_fn +from official.core import task_factory +from official.projects.panoptic.configs import panoptic_maskrcnn as exp_cfg +from official.projects.panoptic.dataloaders import panoptic_maskrcnn_input +from official.projects.panoptic.modeling import factory +from official.vision.dataloaders import input_reader_factory +from official.vision.evaluation import panoptic_quality_evaluator +from official.vision.evaluation import segmentation_metrics +from official.vision.losses import segmentation_losses +from official.vision.tasks import maskrcnn + + +@task_factory.register_task_cls(exp_cfg.PanopticMaskRCNNTask) +class PanopticMaskRCNNTask(maskrcnn.MaskRCNNTask): + + """A single-replica view of the training procedure. + + Panoptic Mask R-CNN task provides artifacts for training/evaluation procedures, + including loading/iterating over Datasets, initializing the model, calculating + the loss, post-processing, and customized metrics with reduction. + """ + + def build_model(self) -> tf.keras.Model: + """Build Panoptic Mask R-CNN model.""" + + input_specs = tf.keras.layers.InputSpec( + shape=[None] + self.task_config.model.input_size) + + l2_weight_decay = self.task_config.losses.l2_weight_decay + # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss.
+ # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) + # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) + l2_regularizer = (tf.keras.regularizers.l2( + l2_weight_decay / 2.0) if l2_weight_decay else None) + + model = factory.build_panoptic_maskrcnn( + input_specs=input_specs, + model_config=self.task_config.model, + l2_regularizer=l2_regularizer) + + if self.task_config.freeze_backbone: + model.backbone.trainable = False + + return model + + def initialize(self, model: tf.keras.Model) -> None: + """Loading pretrained checkpoint.""" + + if not self.task_config.init_checkpoint: + return + + def _get_checkpoint_path(checkpoint_dir_or_file): + checkpoint_path = checkpoint_dir_or_file + if tf.io.gfile.isdir(checkpoint_dir_or_file): + checkpoint_path = tf.train.latest_checkpoint( + checkpoint_dir_or_file) + return checkpoint_path + + for init_module in self.task_config.init_checkpoint_modules: + # Restoring checkpoint. + if init_module == 'all': + checkpoint_path = _get_checkpoint_path( + self.task_config.init_checkpoint) + ckpt = tf.train.Checkpoint(**model.checkpoint_items) + status = ckpt.read(checkpoint_path) + status.expect_partial().assert_existing_objects_matched() + + elif init_module == 'backbone': + checkpoint_path = _get_checkpoint_path( + self.task_config.init_checkpoint) + ckpt = tf.train.Checkpoint(backbone=model.backbone) + status = ckpt.read(checkpoint_path) + status.expect_partial().assert_existing_objects_matched() + + elif init_module == 'segmentation_backbone': + checkpoint_path = _get_checkpoint_path( + self.task_config.segmentation_init_checkpoint) + ckpt = tf.train.Checkpoint( + segmentation_backbone=model.segmentation_backbone) + status = ckpt.read(checkpoint_path) + status.expect_partial().assert_existing_objects_matched() + + elif init_module == 'segmentation_decoder': + checkpoint_path = _get_checkpoint_path( + self.task_config.segmentation_init_checkpoint) + ckpt = tf.train.Checkpoint( + 
 segmentation_decoder=model.segmentation_decoder) + status = ckpt.read(checkpoint_path) + status.expect_partial().assert_existing_objects_matched() + + else: + raise ValueError( + "Only 'all', 'backbone', 'segmentation_backbone' and/or " + "'segmentation_decoder' can be used to initialize the model, but " + "got {}".format(init_module)) + logging.info('Finished loading pretrained checkpoint from %s for %s', + checkpoint_path, init_module) + + def build_inputs( + self, + params: exp_cfg.DataConfig, + input_context: Optional[tf.distribute.InputContext] = None + ) -> tf.data.Dataset: + """Build input dataset.""" + decoder_cfg = params.decoder.get() + if params.decoder.type == 'simple_decoder': + decoder = panoptic_maskrcnn_input.TfExampleDecoder( + regenerate_source_id=decoder_cfg.regenerate_source_id, + mask_binarize_threshold=decoder_cfg.mask_binarize_threshold, + include_panoptic_masks=decoder_cfg.include_panoptic_masks, + panoptic_category_mask_key=decoder_cfg.panoptic_category_mask_key, + panoptic_instance_mask_key=decoder_cfg.panoptic_instance_mask_key) + else: + raise ValueError('Unknown decoder type: {}!'.format(params.decoder.type)) + + parser = panoptic_maskrcnn_input.Parser( + output_size=self.task_config.model.input_size[:2], + min_level=self.task_config.model.min_level, + max_level=self.task_config.model.max_level, + num_scales=self.task_config.model.anchor.num_scales, + aspect_ratios=self.task_config.model.anchor.aspect_ratios, + anchor_size=self.task_config.model.anchor.anchor_size, + dtype=params.dtype, + rpn_match_threshold=params.parser.rpn_match_threshold, + rpn_unmatched_threshold=params.parser.rpn_unmatched_threshold, + rpn_batch_size_per_im=params.parser.rpn_batch_size_per_im, + rpn_fg_fraction=params.parser.rpn_fg_fraction, + aug_rand_hflip=params.parser.aug_rand_hflip, + aug_scale_min=params.parser.aug_scale_min, + aug_scale_max=params.parser.aug_scale_max, + skip_crowd_during_training=params.parser.skip_crowd_during_training,
max_num_instances=params.parser.max_num_instances, + mask_crop_size=params.parser.mask_crop_size, + segmentation_resize_eval_groundtruth=params.parser + .segmentation_resize_eval_groundtruth, + segmentation_groundtruth_padded_size=params.parser + .segmentation_groundtruth_padded_size, + segmentation_ignore_label=params.parser.segmentation_ignore_label, + panoptic_ignore_label=params.parser.panoptic_ignore_label, + include_panoptic_masks=params.parser.include_panoptic_masks) + + reader = input_reader_factory.input_reader_generator( + params, + dataset_fn=dataset_fn.pick_dataset_fn(params.file_type), + decoder_fn=decoder.decode, + parser_fn=parser.parse_fn(params.is_training)) + dataset = reader.read(input_context=input_context) + + return dataset + + def build_losses(self, + outputs: Mapping[str, Any], + labels: Mapping[str, Any], + aux_losses: Optional[Any] = None) -> Dict[str, tf.Tensor]: + """Build Panoptic Mask R-CNN losses.""" + params = self.task_config.losses + + use_groundtruth_dimension = ( + params.semantic_segmentation_use_groundtruth_dimension) + + segmentation_loss_fn = segmentation_losses.SegmentationLoss( + label_smoothing=params.semantic_segmentation_label_smoothing, + class_weights=params.semantic_segmentation_class_weights, + ignore_label=params.semantic_segmentation_ignore_label, + gt_is_matting_map=params.semantic_segmentation_gt_is_matting_map, + use_groundtruth_dimension=use_groundtruth_dimension, + top_k_percent_pixels=params.semantic_segmentation_top_k_percent_pixels) + + instance_segmentation_weight = params.instance_segmentation_weight + semantic_segmentation_weight = params.semantic_segmentation_weight + + losses = super(PanopticMaskRCNNTask, self).build_losses( + outputs=outputs, + labels=labels, + aux_losses=None) + maskrcnn_loss = losses['model_loss'] + segmentation_loss = segmentation_loss_fn( + outputs['segmentation_outputs'], + labels['gt_segmentation_mask']) + + model_loss = ( + instance_segmentation_weight * maskrcnn_loss + + 
semantic_segmentation_weight * segmentation_loss) + + total_loss = model_loss + if aux_losses: + reg_loss = tf.reduce_sum(aux_losses) + total_loss = model_loss + reg_loss + + losses.update({ + 'total_loss': total_loss, + 'maskrcnn_loss': maskrcnn_loss, + 'segmentation_loss': segmentation_loss, + 'model_loss': model_loss, + }) + return losses + + def build_metrics(self, training: bool = True) -> List[ + tf.keras.metrics.Metric]: + """Build detection metrics.""" + metrics = [] + num_segmentation_classes = ( + self.task_config.model.segmentation_model.num_classes) + if training: + metric_names = [ + 'total_loss', + 'rpn_score_loss', + 'rpn_box_loss', + 'frcnn_cls_loss', + 'frcnn_box_loss', + 'mask_loss', + 'maskrcnn_loss', + 'segmentation_loss', + 'model_loss' + ] + for name in metric_names: + metrics.append(tf.keras.metrics.Mean(name, dtype=tf.float32)) + + if self.task_config.segmentation_evaluation.report_train_mean_iou: + self.segmentation_train_mean_iou = segmentation_metrics.MeanIoU( + name='train_mean_iou', + num_classes=num_segmentation_classes, + rescale_predictions=False, + dtype=tf.float32) + + else: + if self.task_config.use_coco_metrics: + self._build_coco_metrics() + + rescale_predictions = (not self.task_config.validation_data.parser + .segmentation_resize_eval_groundtruth) + + self.segmentation_perclass_iou_metric = segmentation_metrics.PerClassIoU( + name='per_class_iou', + num_classes=num_segmentation_classes, + rescale_predictions=rescale_predictions, + dtype=tf.float32) + + if self.task_config.model.generate_panoptic_masks: + if not self.task_config.validation_data.parser.include_panoptic_masks: + raise ValueError('`include_panoptic_masks` should be set to True when' + ' computing panoptic quality.') + pq_config = self.task_config.panoptic_quality_evaluator + self.panoptic_quality_metric = ( + panoptic_quality_evaluator.PanopticQualityEvaluator( + num_categories=pq_config.num_categories, + ignored_label=pq_config.ignored_label, + 
 max_instances_per_category=pq_config.max_instances_per_category, + offset=pq_config.offset, + is_thing=pq_config.is_thing, + rescale_predictions=pq_config.rescale_predictions)) + + return metrics + + def train_step(self, + inputs: Tuple[Any, Any], + model: tf.keras.Model, + optimizer: tf.keras.optimizers.Optimizer, + metrics: Optional[List[Any]] = None) -> Dict[str, Any]: + """Does forward and backward. + + Args: + inputs: a tuple of (images, labels), where labels is a dictionary of groundtruth tensors. + model: the model, forward pass definition. + optimizer: the optimizer for this training step. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs. + """ + images, labels = inputs + num_replicas = tf.distribute.get_strategy().num_replicas_in_sync + + with tf.GradientTape() as tape: + outputs = model( + images, + image_info=labels['image_info'], + anchor_boxes=labels['anchor_boxes'], + gt_boxes=labels['gt_boxes'], + gt_classes=labels['gt_classes'], + gt_masks=(labels['gt_masks'] if self.task_config.model.include_mask + else None), + training=True) + outputs = tf.nest.map_structure( + lambda x: tf.cast(x, tf.float32), outputs) + + # Computes per-replica loss. + losses = self.build_losses( + outputs=outputs, labels=labels, aux_losses=model.losses) + scaled_loss = losses['total_loss'] / num_replicas + + # For mixed_precision policy, when LossScaleOptimizer is used, loss is + # scaled for numerical stability. + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + scaled_loss = optimizer.get_scaled_loss(scaled_loss) + + tvars = model.trainable_variables + grads = tape.gradient(scaled_loss, tvars) + # Scales back gradient when LossScaleOptimizer is used.
+ if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + grads = optimizer.get_unscaled_gradients(grads) + optimizer.apply_gradients(list(zip(grads, tvars))) + + logs = {self.loss: losses['total_loss']} + + if metrics: + for m in metrics: + m.update_state(losses[m.name]) + + if self.task_config.segmentation_evaluation.report_train_mean_iou: + segmentation_labels = { + 'masks': labels['gt_segmentation_mask'], + 'valid_masks': labels['gt_segmentation_valid_mask'], + 'image_info': labels['image_info'] + } + self.process_metrics( + metrics=[self.segmentation_train_mean_iou], + labels=segmentation_labels, + model_outputs=outputs['segmentation_outputs']) + logs.update({ + self.segmentation_train_mean_iou.name: + self.segmentation_train_mean_iou.result() + }) + + return logs + + def validation_step(self, + inputs: Tuple[Any, Any], + model: tf.keras.Model, + metrics: Optional[List[Any]] = None) -> Dict[str, Any]: + """Validation step. + + Args: + inputs: a tuple of (images, labels), where labels is a dictionary of groundtruth tensors. + model: the keras.Model. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs.
+ """ + images, labels = inputs + + outputs = model( + images, + anchor_boxes=labels['anchor_boxes'], + image_info=labels['image_info'], + training=False) + + logs = {self.loss: 0} + if self._task_config.use_coco_metrics: + coco_model_outputs = { + 'detection_masks': outputs['detection_masks'], + 'detection_boxes': outputs['detection_boxes'], + 'detection_scores': outputs['detection_scores'], + 'detection_classes': outputs['detection_classes'], + 'num_detections': outputs['num_detections'], + 'source_id': labels['groundtruths']['source_id'], + 'image_info': labels['image_info'] + } + logs.update( + {self.coco_metric.name: (labels['groundtruths'], coco_model_outputs)}) + + segmentation_labels = { + 'masks': labels['groundtruths']['gt_segmentation_mask'], + 'valid_masks': labels['groundtruths']['gt_segmentation_valid_mask'], + 'image_info': labels['image_info'] + } + + self.segmentation_perclass_iou_metric.update_state( + segmentation_labels, outputs['segmentation_outputs']) + + if self.task_config.model.generate_panoptic_masks: + pq_metric_labels = { + 'category_mask': labels['groundtruths']['gt_panoptic_category_mask'], + 'instance_mask': labels['groundtruths']['gt_panoptic_instance_mask'], + 'image_info': labels['image_info'] + } + logs.update({ + self.panoptic_quality_metric.name: + (pq_metric_labels, outputs['panoptic_outputs'])}) + return logs + + def aggregate_logs(self, state=None, step_outputs=None): + if state is None: + self.segmentation_perclass_iou_metric.reset_states() + state = [self.segmentation_perclass_iou_metric] + if self.task_config.use_coco_metrics: + self.coco_metric.reset_states() + state.append(self.coco_metric) + if self.task_config.model.generate_panoptic_masks: + self.panoptic_quality_metric.reset_states() + state.append(self.panoptic_quality_metric) + + if self.task_config.use_coco_metrics: + self.coco_metric.update_state(step_outputs[self.coco_metric.name][0], + step_outputs[self.coco_metric.name][1]) + + if 
self.task_config.model.generate_panoptic_masks: + self.panoptic_quality_metric.update_state( + step_outputs[self.panoptic_quality_metric.name][0], + step_outputs[self.panoptic_quality_metric.name][1]) + + return state + + def reduce_aggregated_logs(self, aggregated_logs, global_step=None): + result = super().reduce_aggregated_logs( + aggregated_logs=aggregated_logs, global_step=global_step) + + ious = self.segmentation_perclass_iou_metric.result() + if self.task_config.segmentation_evaluation.report_per_class_iou: + for i, value in enumerate(ious.numpy()): + result.update({'segmentation_iou/class_{}'.format(i): value}) + # Computes mean IoU + result.update({'segmentation_mean_iou': tf.reduce_mean(ious).numpy()}) + + if self.task_config.model.generate_panoptic_masks: + report_per_class_metrics = ( + self.task_config.panoptic_quality_evaluator.report_per_class_metrics) + panoptic_quality_results = self.panoptic_quality_metric.result() + for k, value in panoptic_quality_results.items(): + if k.endswith('per_class'): + if report_per_class_metrics: + for i, per_class_value in enumerate(value): + metric_key = 'panoptic_quality/{}/class_{}'.format(k, i) + result[metric_key] = per_class_value + else: + continue + else: + result['panoptic_quality/{}'.format(k)] = value + + return result diff --git a/official/projects/panoptic/train.py b/official/projects/panoptic/train.py new file mode 100644 index 0000000000000000000000000000000000000000..f8287bd73bed58a43a702a1dbe2fb689a089099e --- /dev/null +++ b/official/projects/panoptic/train.py @@ -0,0 +1,30 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Panoptic Mask R-CNN and Panoptic DeepLab trainer.""" + +from absl import app + +from official.common import flags as tfm_flags +# pylint: disable=unused-import +from official.projects.panoptic.configs import panoptic_deeplab +from official.projects.panoptic.configs import panoptic_maskrcnn +from official.projects.panoptic.tasks import panoptic_deeplab as panoptic_deeplab_task +from official.projects.panoptic.tasks import panoptic_maskrcnn as panoptic_maskrcnn_task +from official.vision import train +# pylint: enable=unused-import + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(train.main) diff --git a/official/projects/pruning/README.md b/official/projects/pruning/README.md new file mode 100644 index 0000000000000000000000000000000000000000..f5539584614b8acad4fbc25dfe8d772cd0aa69ef --- /dev/null +++ b/official/projects/pruning/README.md @@ -0,0 +1,44 @@ +# Training with Pruning +[TOC] + +⚠️ Disclaimer: All datasets hyperlinked from this page are not owned or +distributed by Google. The datasets are made available by third parties. +Please review the terms and conditions made available by the third parties +before using the data. + +## Overview + +This project includes pruning code for TensorFlow models. +The examples show how to apply the Model Optimization Toolkit's +[pruning API](https://www.tensorflow.org/model_optimization/guide/pruning).
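The pruning configs in this project (for example `imagenet_mobilenetv2_pruning_gpu.yaml`) select a `PolynomialDecay` pruning schedule through `begin_step`, `end_step`, `initial_sparsity`, `final_sparsity`, and `frequency`. As a rough illustration of how those fields interact, here is a minimal pure-Python sketch of the schedule. It assumes the polynomial formula used by the Model Optimization Toolkit's `PolynomialDecay` with its default `power=3`; the helper names `target_sparsity` and `is_pruning_step` are hypothetical and this is not the toolkit's implementation.

```python
"""Sketch of a polynomial-decay pruning schedule (hypothetical helpers).

Assumes the formula of TFMOT's PolynomialDecay with its default power=3;
an illustration only, not the TFMOT implementation.
"""


def target_sparsity(step: int, initial_sparsity: float, final_sparsity: float,
                    begin_step: int, end_step: int, power: int = 3) -> float:
  """Returns the target sparsity at `step`, ramping from initial to final."""
  # Fraction of the pruning window completed, clipped to [0, 1].
  progress = (step - begin_step) / (end_step - begin_step)
  progress = min(1.0, max(0.0, progress))
  # Sparsity rises quickly early on and flattens out near end_step.
  return (final_sparsity +
          (initial_sparsity - final_sparsity) * (1.0 - progress) ** power)


def is_pruning_step(step: int, begin_step: int, end_step: int,
                    frequency: int) -> bool:
  """Returns True if pruning masks would be updated at `step`."""
  in_window = begin_step <= step <= end_step
  return in_window and (step - begin_step) % frequency == 0


if __name__ == '__main__':
  # Schedule values taken from imagenet_mobilenetv2_pruning_gpu.yaml.
  schedule = dict(initial_sparsity=0.2, final_sparsity=0.5,
                  begin_step=0, end_step=80000)
  for step in (0, 20000, 40000, 80000, 125100):
    print(step, round(target_sparsity(step, **schedule), 4),
          is_pruning_step(step, 0, 80000, frequency=400))
```

With the cubic default, most of the sparsity increase happens early in the pruning window, which leaves the remaining steps for the network to recover accuracy at close to the final sparsity.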
+ +## How to train a model + +```bash +EXPERIMENT=xxx # Change this for your run, for example, 'resnet_imagenet_pruning' +CONFIG_FILE=xxx # Change this for your run, for example, path of imagenet_resnet50_pruning_gpu.yaml +MODEL_DIR=xxx # Change this for your run, for example, /tmp/model_dir +python3 train.py \ + --experiment=${EXPERIMENT} \ + --config_file=${CONFIG_FILE} \ + --model_dir=${MODEL_DIR} \ + --mode=train_and_eval +``` + +## Accuracy +
+Figure: Comparison of ImageNet top-1 accuracy for the classification models.
+ + Note: The Top-1 model accuracy is measured on the validation set of [ImageNet](https://www.image-net.org/). + +## Pre-trained Models + +### Image Classification + +| Model | Resolution | Top-1 Accuracy (Dense) | Top-1 Accuracy (50% sparsity) | Top-1 Accuracy (80% sparsity) | Config | Download | +|---|---|---|---|---|---|---| +| MobileNetV2 | 224x224 | 72.768% | 71.334% | 61.378% | [config](https://github.com/tensorflow/models/blob/master/official/projects/pruning/configs/experiments/image_classification/imagenet_mobilenetv2_pruning_gpu.yaml) | [TFLite (50% sparsity)](https://storage.googleapis.com/tf_model_garden/vision/mobilenet/v2_1.0_float/mobilenet_v2_0.5_pruned_1.00_224_float.tflite) | +| ResNet50 | 224x224 | 76.704% | 76.61% | 75.508% | [config](https://github.com/tensorflow/models/blob/master/official/projects/pruning/configs/experiments/image_classification/imagenet_resnet50_pruning_gpu.yaml) | [TFLite (80% sparsity)](https://storage.googleapis.com/tf_model_garden/vision/resnet50_imagenet/resnet_50_0.8_pruned_224_float.tflite) | diff --git a/official/projects/pruning/configs/__init__.py b/official/projects/pruning/configs/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..4425d4bd55b14430b52bb8cca8a0d50c61cd329f --- /dev/null +++ b/official/projects/pruning/configs/__init__.py @@ -0,0 +1,17 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Configs package definition.""" + +from official.projects.pruning.configs import image_classification diff --git a/official/projects/pruning/configs/experiments/image_classification/imagenet_mobilenetv2_pruning_gpu.yaml b/official/projects/pruning/configs/experiments/image_classification/imagenet_mobilenetv2_pruning_gpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..af1dca89d30f1967156558bc758ba3bf80095df7 --- /dev/null +++ b/official/projects/pruning/configs/experiments/image_classification/imagenet_mobilenetv2_pruning_gpu.yaml @@ -0,0 +1,59 @@ +# MobileNetV2_1.0 ImageNet classification. 
+runtime: + distribution_strategy: 'mirrored' + mixed_precision_dtype: 'float32' + loss_scale: 'dynamic' +task: + model: + num_classes: 1001 + input_size: [224, 224, 3] + backbone: + type: 'mobilenet' + mobilenet: + model_id: 'MobileNetV2' + filter_size_scale: 1.0 + dropout_rate: 0.1 + losses: + l2_weight_decay: 0 + one_hot: true + label_smoothing: 0.1 + train_data: + input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*' + is_training: true + global_batch_size: 1024 + dtype: 'float32' + validation_data: + input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*' + is_training: false + global_batch_size: 1024 + dtype: 'float32' + drop_remainder: false + pruning: + pretrained_original_checkpoint: 'gs://**/mobilenetv2_gpu/22984194/ckpt-625500' + pruning_schedule: 'PolynomialDecay' + begin_step: 0 + end_step: 80000 + initial_sparsity: 0.2 + final_sparsity: 0.5 + frequency: 400 +trainer: + # Top1 accuracy 71.33% after 17hr for 8 GPUs with pruning. + # Pretrained network without pruning has Top1 accuracy 72.77% + train_steps: 125100 # 50 epoch + validation_steps: 98 + validation_interval: 2502 + steps_per_loop: 2502 + summary_interval: 2502 + checkpoint_interval: 2502 + optimizer_config: + learning_rate: + type: 'exponential' + exponential: + initial_learning_rate: 0.04 + decay_steps: 5004 + decay_rate: 0.85 + staircase: true + warmup: + type: 'linear' + linear: + warmup_steps: 0 diff --git a/official/projects/pruning/configs/experiments/image_classification/imagenet_resnet50_pruning_gpu.yaml b/official/projects/pruning/configs/experiments/image_classification/imagenet_resnet50_pruning_gpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..dfa298a8a44f93bdf17d5a3a3dc55432e0365355 --- /dev/null +++ b/official/projects/pruning/configs/experiments/image_classification/imagenet_resnet50_pruning_gpu.yaml @@ -0,0 +1,60 @@ +runtime: + distribution_strategy: 'mirrored' + mixed_precision_dtype: 'float32' + loss_scale: 
'dynamic' +task: + model: + num_classes: 1001 + input_size: [224, 224, 3] + backbone: + type: 'resnet' + resnet: + model_id: 50 + losses: + l2_weight_decay: 0 + one_hot: true + label_smoothing: 0.1 + train_data: + input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*' + is_training: true + global_batch_size: 1024 + dtype: 'float32' + validation_data: + input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*' + is_training: false + global_batch_size: 1024 + dtype: 'float32' + drop_remainder: false + pruning: + pretrained_original_checkpoint: 'gs://**/resnet_classifier_gpu/ckpt-56160' + pruning_schedule: 'PolynomialDecay' + begin_step: 0 + end_step: 40000 + initial_sparsity: 0.2 + final_sparsity: 0.8 + frequency: 40 +trainer: + # Top1 accuracy 75.508% after 7hr for 8 GPUs with pruning. + # Pretrained network without pruning has Top1 accuracy 76.7% + train_steps: 50000 + validation_steps: 50 + validation_interval: 1251 + steps_per_loop: 1251 + summary_interval: 1251 + checkpoint_interval: 1251 + optimizer_config: + optimizer: + type: 'sgd' + sgd: + momentum: 0.9 + learning_rate: + type: 'exponential' + exponential: + initial_learning_rate: 0.01 + decay_steps: 2502 + decay_rate: 0.9 + staircase: true + warmup: + type: 'linear' + linear: + warmup_steps: 0 diff --git a/official/projects/pruning/configs/image_classification.py b/official/projects/pruning/configs/image_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..4ab5952a85f739732ae7a55252f0200e0da9a3f5 --- /dev/null +++ b/official/projects/pruning/configs/image_classification.py @@ -0,0 +1,80 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Image classification configuration definition.""" +import dataclasses + +from typing import Optional, Tuple + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import hyperparams +from official.vision.configs import image_classification + + +@dataclasses.dataclass +class PruningConfig(hyperparams.Config): + """Pruning parameters. + + Attributes: + pretrained_original_checkpoint: The pretrained checkpoint location of the + original model. + pruning_schedule: A string that indicates the name of `PruningSchedule` + object that controls pruning rate throughout training. Currently available + options are `PolynomialDecay` and `ConstantSparsity`. + begin_step: Step at which to begin pruning. + end_step: Step at which to end pruning. + initial_sparsity: Sparsity ratio at which pruning begins. + final_sparsity: Sparsity ratio at which pruning ends. + frequency: Number of training steps between sparsity adjustments. + sparsity_m_by_n: Structured sparsity specification. It specifies m zeros + over n consecutive weight elements.
+ """ + pretrained_original_checkpoint: Optional[str] = None + pruning_schedule: str = 'PolynomialDecay' + begin_step: int = 0 + end_step: int = 1000 + initial_sparsity: float = 0.0 + final_sparsity: float = 0.1 + frequency: int = 100 + sparsity_m_by_n: Optional[Tuple[int, int]] = None + + +@dataclasses.dataclass +class ImageClassificationTask(image_classification.ImageClassificationTask): + pruning: Optional[PruningConfig] = None + + +@exp_factory.register_config_factory('resnet_imagenet_pruning') +def image_classification_imagenet() -> cfg.ExperimentConfig: + """Builds an image classification config for the resnet with pruning.""" + config = image_classification.image_classification_imagenet() + task = ImageClassificationTask.from_args( + pruning=PruningConfig(), **config.task.as_dict()) + config.task = task + runtime = cfg.RuntimeConfig(enable_xla=False) + config.runtime = runtime + + return config + + +@exp_factory.register_config_factory('mobilenet_imagenet_pruning') +def image_classification_imagenet_mobilenet() -> cfg.ExperimentConfig: + """Builds an image classification config for the mobilenetV2 with pruning.""" + config = image_classification.image_classification_imagenet_mobilenet() + task = ImageClassificationTask.from_args( + pruning=PruningConfig(), **config.task.as_dict()) + config.task = task + + return config diff --git a/official/projects/pruning/configs/image_classification_test.py b/official/projects/pruning/configs/image_classification_test.py new file mode 100644 index 0000000000000000000000000000000000000000..f52505d27cc3718c8f9161a006bd55c177c6ff0b --- /dev/null +++ b/official/projects/pruning/configs/image_classification_test.py @@ -0,0 +1,48 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for image_classification.""" +# pylint: disable=unused-import +from absl.testing import parameterized +import tensorflow as tf + +from official import vision +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.projects.pruning.configs import image_classification as pruning_exp_cfg +from official.vision.configs import image_classification as exp_cfg + + +class ImageClassificationConfigTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.parameters( + ('resnet_imagenet_pruning',), + ('mobilenet_imagenet_pruning',), + ) + def test_image_classification_configs(self, config_name): + config = exp_factory.get_exp_config(config_name) + self.assertIsInstance(config, cfg.ExperimentConfig) + self.assertIsInstance(config.task, exp_cfg.ImageClassificationTask) + self.assertIsInstance(config.task, pruning_exp_cfg.ImageClassificationTask) + self.assertIsInstance(config.task.pruning, pruning_exp_cfg.PruningConfig) + self.assertIsInstance(config.task.model, exp_cfg.ImageClassificationModel) + self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig) + config.validate() + config.task.train_data.is_training = None + with self.assertRaises(KeyError): + config.validate() + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/pruning/registry_imports.py b/official/projects/pruning/registry_imports.py new file mode 100644 index 0000000000000000000000000000000000000000..847cca42e1417637cf2268ed1f00840579ea5f37 --- /dev/null +++
b/official/projects/pruning/registry_imports.py @@ -0,0 +1,18 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""All necessary imports for registration on pruning project.""" +# pylint: disable=unused-import +from official.projects.pruning import configs +from official.projects.pruning.tasks import image_classification diff --git a/official/projects/pruning/tasks/__init__.py b/official/projects/pruning/tasks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..9320e3087f57565a200bb7ea2b89ed083fabf42f --- /dev/null +++ b/official/projects/pruning/tasks/__init__.py @@ -0,0 +1,17 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tasks package definition.""" + +from official.projects.pruning.tasks import image_classification diff --git a/official/projects/pruning/tasks/image_classification.py b/official/projects/pruning/tasks/image_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..6b81788289af41d38c76b68af7acf5902ea4b49b --- /dev/null +++ b/official/projects/pruning/tasks/image_classification.py @@ -0,0 +1,147 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+ +"""Image classification task definition.""" +from absl import logging +import tensorflow as tf +import tensorflow_model_optimization as tfmot + +from official.core import task_factory +from official.projects.pruning.configs import image_classification as exp_cfg +from official.vision.modeling.backbones import mobilenet +from official.vision.modeling.layers import nn_blocks +from official.vision.tasks import image_classification + + +@task_factory.register_task_cls(exp_cfg.ImageClassificationTask) +class ImageClassificationTask(image_classification.ImageClassificationTask): + """A task for image classification with pruning.""" + _BLOCK_LAYER_SUFFIX_MAP = { + mobilenet.Conv2DBNBlock: ('conv2d/kernel:0',), + nn_blocks.BottleneckBlock: ( + 'conv2d/kernel:0', + 'conv2d_1/kernel:0', + 'conv2d_2/kernel:0', + 'conv2d_3/kernel:0', + ), + nn_blocks.InvertedBottleneckBlock: ( + 'conv2d/kernel:0', + 'conv2d_1/kernel:0', + 'conv2d_2/kernel:0', + 'conv2d_3/kernel:0', + 'depthwise_conv2d/depthwise_kernel:0', + ), + nn_blocks.ResidualBlock: ( + 'conv2d/kernel:0', + 'conv2d_1/kernel:0', + 'conv2d_2/kernel:0', + ), + } + + def build_model(self) -> tf.keras.Model: + """Builds classification model with pruning.""" + model = super(ImageClassificationTask, self).build_model() + if self.task_config.pruning is None: + return model + + pruning_cfg = self.task_config.pruning + + prunable_model = tf.keras.models.clone_model( + model, + clone_function=self._make_block_prunable, + ) + + original_checkpoint = pruning_cfg.pretrained_original_checkpoint + if original_checkpoint is not None: + ckpt = tf.train.Checkpoint(model=prunable_model, **model.checkpoint_items) + status = ckpt.read(original_checkpoint) + status.expect_partial().assert_existing_objects_matched() + + pruning_params = {} + if pruning_cfg.sparsity_m_by_n is not None: + pruning_params['sparsity_m_by_n'] = pruning_cfg.sparsity_m_by_n + + if pruning_cfg.pruning_schedule == 'PolynomialDecay': + pruning_params['pruning_schedule'] 
= tfmot.sparsity.keras.PolynomialDecay( + initial_sparsity=pruning_cfg.initial_sparsity, + final_sparsity=pruning_cfg.final_sparsity, + begin_step=pruning_cfg.begin_step, + end_step=pruning_cfg.end_step, + frequency=pruning_cfg.frequency) + elif pruning_cfg.pruning_schedule == 'ConstantSparsity': + pruning_params[ + 'pruning_schedule'] = tfmot.sparsity.keras.ConstantSparsity( + target_sparsity=pruning_cfg.final_sparsity, + begin_step=pruning_cfg.begin_step, + frequency=pruning_cfg.frequency) + else: + raise NotImplementedError( + 'Only PolynomialDecay and ConstantSparsity are currently supported. Got %s.' + % pruning_cfg.pruning_schedule) + + pruned_model = tfmot.sparsity.keras.prune_low_magnitude( + prunable_model, **pruning_params) + + # Log pruned and unpruned weight names for debugging purposes. + prunable_layers = collect_prunable_layers(pruned_model) + pruned_weights = [] + for layer in prunable_layers: + pruned_weights += [weight.name for weight, _, _ in layer.pruning_vars] + unpruned_weights = [ + weight.name + for weight in pruned_model.weights + if weight.name not in pruned_weights + ] + + logging.info( + '%d / %d weights are pruned.\nPruned weights: [ \n%s \n],\n' + 'Unpruned weights: [ \n%s \n],', + len(pruned_weights), len(model.weights), ', '.join(pruned_weights), + ', '.join(unpruned_weights)) + + return pruned_model + + def _make_block_prunable( + self, layer: tf.keras.layers.Layer) -> tf.keras.layers.Layer: + if isinstance(layer, tf.keras.Model): + return tf.keras.models.clone_model( + layer, input_tensors=None, clone_function=self._make_block_prunable) + + if layer.__class__ not in self._BLOCK_LAYER_SUFFIX_MAP: + return layer + + prunable_weights = [] + for layer_suffix in self._BLOCK_LAYER_SUFFIX_MAP[layer.__class__]: + for weight in layer.weights: + if weight.name.endswith(layer_suffix): + prunable_weights.append(weight) + + def get_prunable_weights(): + return prunable_weights + + layer.get_prunable_weights = get_prunable_weights + + return layer +
+ +def collect_prunable_layers(model): + """Recursively collect the prunable layers in the model.""" + prunable_layers = [] + for layer in model.layers: + if isinstance(layer, tf.keras.Model): + prunable_layers += collect_prunable_layers(layer) + if layer.__class__.__name__ == 'PruneLowMagnitude': + prunable_layers.append(layer) + + return prunable_layers diff --git a/official/projects/pruning/tasks/image_classification_test.py b/official/projects/pruning/tasks/image_classification_test.py new file mode 100644 index 0000000000000000000000000000000000000000..09c5a0aa05a7ef2f144a312aedbe4dbb6c5033db --- /dev/null +++ b/official/projects/pruning/tasks/image_classification_test.py @@ -0,0 +1,201 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for image classification task.""" + +# pylint: disable=unused-import +import os +import tempfile + +from absl.testing import parameterized +import numpy as np +import orbit +import tensorflow as tf + +import tensorflow_model_optimization as tfmot +from official import vision +from official.core import actions +from official.core import exp_factory +from official.modeling import optimization +from official.projects.pruning.tasks import image_classification as img_cls_task +from official.vision.dataloaders import tfexample_utils + + +class ImageClassificationTaskTest(tf.test.TestCase, parameterized.TestCase): + + def _validate_model_pruned(self, model, config_name): + + pruning_weight_names = [] + prunable_layers = img_cls_task.collect_prunable_layers(model) + for layer in prunable_layers: + for weight, _, _ in layer.pruning_vars: + pruning_weight_names.append(weight.name) + if config_name == 'resnet_imagenet_pruning': + # Conv2D : 1 + # BottleneckBlockGroup : 4+3+3 = 10 + # BottleneckBlockGroup1 : 4+3+3+3 = 13 + # BottleneckBlockGroup2 : 4+3+3+3+3+3 = 19 + # BottleneckBlockGroup3 : 4+3+3 = 10 + # FullyConnected : 1 + # Total : 54 + self.assertLen(pruning_weight_names, 54) + elif config_name == 'mobilenet_imagenet_pruning': + # Conv2DBN = 1 + # InvertedBottleneckBlockGroup = 2 + # InvertedBottleneckBlockGroup1~16 = 48 + # Conv2DBN = 1 + # FullyConnected : 1 + # Total : 53 + self.assertLen(pruning_weight_names, 53) + + def _check_2x4_sparsity(self, model): + + def _is_pruned_2_by_4(weights): + if weights.shape.rank == 2: + prepared_weights = tf.transpose(weights) + elif weights.shape.rank == 4: + perm_weights = tf.transpose(weights, perm=[3, 0, 1, 2]) + prepared_weights = tf.reshape(perm_weights, + [-1, perm_weights.shape[-1]]) + + prepared_weights_np = prepared_weights.numpy() + + for row in range(0, prepared_weights_np.shape[0]): + for col in range(0, prepared_weights_np.shape[1], 4): + if np.count_nonzero(prepared_weights_np[row, col:col + 4]) > 2: + 
return False + return True + + prunable_layers = img_cls_task.collect_prunable_layers(model) + for layer in prunable_layers: + for weight, _, _ in layer.pruning_vars: + if weight.shape[-2] % 4 == 0: + self.assertTrue(_is_pruned_2_by_4(weight)) + + def _validate_metrics(self, logs, metrics): + for metric in metrics: + logs[metric.name] = metric.result() + self.assertIn('loss', logs) + self.assertIn('accuracy', logs) + self.assertIn('top_5_accuracy', logs) + + def _create_test_tfrecord(self, test_tfrecord_file, num_samples, + input_image_size): + example = tf.train.Example.FromString( + tfexample_utils.create_classification_example( + image_height=input_image_size[0], image_width=input_image_size[1])) + examples = [example] * num_samples + tfexample_utils.dump_to_tfrecord( + record_file=test_tfrecord_file, tf_examples=examples) + + @parameterized.parameters(('resnet_imagenet_pruning'), + ('mobilenet_imagenet_pruning')) + def testTaskWithUnstructuredSparsity(self, config_name): + test_tfrecord_file = os.path.join(self.get_temp_dir(), 'cls_test.tfrecord') + self._create_test_tfrecord( + test_tfrecord_file=test_tfrecord_file, + num_samples=10, + input_image_size=[224, 224]) + config = exp_factory.get_exp_config(config_name) + config.task.train_data.global_batch_size = 2 + config.task.validation_data.input_path = test_tfrecord_file + config.task.train_data.input_path = test_tfrecord_file + + task = img_cls_task.ImageClassificationTask(config.task) + model = task.build_model() + + metrics = task.build_metrics() + strategy = tf.distribute.get_strategy() + + dataset = orbit.utils.make_distributed_dataset(strategy, task.build_inputs, + config.task.train_data) + + iterator = iter(dataset) + opt_factory = optimization.OptimizerFactory(config.trainer.optimizer_config) + optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) + + if isinstance(optimizer, optimization.ExponentialMovingAverage + ) and not optimizer.has_shadow_copy: + optimizer.shadow_copy(model) 
+ + if config.task.pruning: + # This auxiliary initialization is required to prune the model; it is + # normally done by the train library. + actions.PruningAction( + export_dir=tempfile.gettempdir(), model=model, optimizer=optimizer) + + # Check that all layers and target weights are successfully pruned. + self._validate_model_pruned(model, config_name) + + logs = task.train_step(next(iterator), model, optimizer, metrics=metrics) + self._validate_metrics(logs, metrics) + + logs = task.validation_step(next(iterator), model, metrics=metrics) + self._validate_metrics(logs, metrics) + + @parameterized.parameters(('resnet_imagenet_pruning'), + ('mobilenet_imagenet_pruning')) + def testTaskWithStructuredSparsity(self, config_name): + test_tfrecord_file = os.path.join(self.get_temp_dir(), 'cls_test.tfrecord') + self._create_test_tfrecord( + test_tfrecord_file=test_tfrecord_file, + num_samples=10, + input_image_size=[224, 224]) + config = exp_factory.get_exp_config(config_name) + config.task.train_data.global_batch_size = 2 + config.task.validation_data.input_path = test_tfrecord_file + config.task.train_data.input_path = test_tfrecord_file + + # Add structured sparsity + config.task.pruning.sparsity_m_by_n = (2, 4) + config.task.pruning.frequency = 1 + + task = img_cls_task.ImageClassificationTask(config.task) + model = task.build_model() + + metrics = task.build_metrics() + strategy = tf.distribute.get_strategy() + + dataset = orbit.utils.make_distributed_dataset(strategy, task.build_inputs, + config.task.train_data) + + iterator = iter(dataset) + opt_factory = optimization.OptimizerFactory(config.trainer.optimizer_config) + optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) + + if isinstance(optimizer, optimization.ExponentialMovingAverage + ) and not optimizer.has_shadow_copy: + optimizer.shadow_copy(model) + + # This auxiliary initialization is required to prune the model; it is + # normally done by the train library.
+ pruning_actions = actions.PruningAction( + export_dir=tempfile.gettempdir(), model=model, optimizer=optimizer) + + # Check that all layers and target weights are successfully pruned. + self._validate_model_pruned(model, config_name) + + logs = task.train_step(next(iterator), model, optimizer, metrics=metrics) + self._validate_metrics(logs, metrics) + + logs = task.validation_step(next(iterator), model, metrics=metrics) + self._validate_metrics(logs, metrics) + + pruning_actions.update_pruning_step.on_epoch_end(batch=None) + # Check whether the weights are pruned in a 2x4 pattern. + self._check_2x4_sparsity(model) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/pruning/train.py b/official/projects/pruning/train.py new file mode 100644 index 0000000000000000000000000000000000000000..e1d6e3c9416232ecefaa7530387bbcb23084e1e5 --- /dev/null +++ b/official/projects/pruning/train.py @@ -0,0 +1,29 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""TensorFlow Model Garden Vision training driver, including pruning configs.""" + +from absl import app + +from official.common import flags as tfm_flags +# The custom pruning configs and tasks are imported only for their +# registration side effects, which connect them to the training binary.
+from official.projects.pruning import configs # pylint: disable=unused-import +from official.projects.pruning.tasks import image_classification # pylint: disable=unused-import +from official.vision import train + + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(train.main) diff --git a/official/projects/qat/vision/README.md b/official/projects/qat/vision/README.md new file mode 100644 index 0000000000000000000000000000000000000000..cfe4b1a76028b19a10e1f5f916eb577494d7d7d3 --- /dev/null +++ b/official/projects/qat/vision/README.md @@ -0,0 +1,63 @@ +# Quantization Aware Training Project for Computer Vision Models + +⚠️ Disclaimer: None of the datasets hyperlinked from this page are owned or +distributed by Google. The datasets are made available by third parties. +Please review the terms and conditions made available by the third parties +before using the data. + +## Overview + +This project includes quantization aware training code for Computer Vision +models. These examples show how to apply the Model Optimization Toolkit's +[quantization aware training API](https://www.tensorflow.org/model_optimization/guide/quantization/training). + +Note: Currently, we support a limited number of ML tasks and models (e.g., image +classification and semantic segmentation). +We will keep adding support for other ML tasks and models in future releases. + +## How to train a model + +``` +EXPERIMENT=xxx # Change this for your run, for example, 'mobilenet_imagenet_qat' +CONFIG_FILE=xxx # Change this for your run, for example, the path to imagenet_mobilenetv2_qat_gpu.yaml +MODEL_DIR=xxx # Change this for your run, for example, /tmp/model_dir +python3 train.py \ +--experiment=${EXPERIMENT} \ +--config_file=${CONFIG_FILE} \ +--model_dir=${MODEL_DIR} \ +--mode=train_and_eval +``` + +## Image Classification +
+ +
Comparison of ImageNet top-1 accuracy for the classification models
+
+ +Note: The Top-1 model accuracy is measured on the validation set of [ImageNet](https://www.image-net.org/). + + +### Pre-trained Models + +|Model |Resolution|Top-1 Accuracy (FP32)|Top-1 Accuracy (Int8/PTQ)|Top-1 Accuracy (Int8/QAT)|Config |Download | +|----------------------|----------|---------------------|-------------------------|-------------------------|--------|----------| +|MobileNetV2 |224x224 |72.782% |72.392% |72.792% |[config](https://github.com/tensorflow/models/blob/master/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv2_qat_gpu.yaml) |[TFLite(Int8/QAT)](https://storage.googleapis.com/tf_model_garden/vision/mobilenet/v2_1.0_int8/mobilenet_v2_1.00_224_int8.tflite) | +|ResNet50 |224x224 |76.710% |76.420% |77.200% |[config](https://github.com/tensorflow/models/blob/master/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu.yaml) |[TFLite(Int8/QAT)](https://storage.googleapis.com/tf_model_garden/vision/resnet50_imagenet/resnet_50_224_int8.tflite) | +|MobileNetV3.5 MultiAVG|224x224 |75.212% |74.122% |75.130% |[config](https://github.com/tensorflow/models/blob/master/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv3.5_qat_gpu.yaml)|[TFLite(Int8/QAT)](https://storage.googleapis.com/tf_model_garden/vision/mobilenet/v3.5multiavg_1.0_int8/mobilenet_v3.5multiavg_1.00_224_int8.tflite)| + +## Semantic Segmentation + + +Models are pretrained on the COCO train set. Two datasets, the Pascal VOC +segmentation dataset and the Cityscapes dataset (DeepLab v3+ only), are used +to train and evaluate models.
Model accuracy is measured on the full Pascal VOC segmentation +validation set. + +### Pre-trained Models + +model | resolution | mIoU | mIoU (FP32) | mIoU (FP16) | mIoU (INT8) | mIoU (QAT INT8) | download (tflite)| +:------------------------- | :--------: | ----: | ----------: | ----------: | ----------: | --------------: | ----------------: +MobileNet v2 + DeepLab v3 | 512x512 | 75.27 | 75.30 | 75.32 | 73.95 | 74.68 | [FP32](https://storage.googleapis.com/tf_model_garden/vision/qat/deeplabv3_mobilenetv2_pascal_coco_0.21/model_none.tflite) \| [FP16](https://storage.googleapis.com/tf_model_garden/vision/qat/deeplabv3_mobilenetv2_pascal_coco_0.21/model_fp16.tflite) \| [INT8](https://storage.googleapis.com/tf_model_garden/vision/qat/deeplabv3_mobilenetv2_pascal_coco_0.21model_int8_full.tflite) \| [QAT INT8](https://storage.googleapis.com/tf_model_garden/vision/qat/deeplabv3_mobilenetv2_pascal_coco_0.21/Fmodel_default.tflite) +MobileNet v2 + DeepLab v3+ | 1024x2048 | 73.82 | 73.84 | 73.65 | 72.33 | 73.49 | [FP32](https://storage.googleapis.com/tf_model_garden/vision/qat/mnv2_deeplabv3plus_cityscapes/model_none.tflite) \| [FP16](https://storage.googleapis.com/tf_model_garden/vision/qat/mnv2_deeplabv3plus_cityscapes/Fmodel_fp16.tflite) \| [INT8](https://storage.googleapis.com/tf_model_garden/vision/qat/mnv2_deeplabv3plus_cityscapes/model_int8_full.tflite) \| [QAT INT8](https://storage.googleapis.com/tf_model_garden/vision/qat/mnv2_deeplabv3plus_cityscapes/Fmodel_default.tflite) + diff --git a/official/projects/qat/vision/configs/__init__.py b/official/projects/qat/vision/configs/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..1d3c061ac778f0e1a5dc864cfddb8c7eda3c7aea --- /dev/null +++ b/official/projects/qat/vision/configs/__init__.py @@ -0,0 +1,19 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Configs package definition.""" + +from official.projects.qat.vision.configs import image_classification +from official.projects.qat.vision.configs import retinanet +from official.projects.qat.vision.configs import semantic_segmentation diff --git a/official/projects/qat/vision/configs/common.py b/official/projects/qat/vision/configs/common.py new file mode 100644 index 0000000000000000000000000000000000000000..c370226169e669569ad85b5dfe04402681c400dc --- /dev/null +++ b/official/projects/qat/vision/configs/common.py @@ -0,0 +1,43 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Common quantization configuration definition.""" + +import dataclasses +from typing import Optional + +from official.modeling import hyperparams + + +@dataclasses.dataclass +class Quantization(hyperparams.Config): + """Quantization parameters.
+ + Attributes: + pretrained_original_checkpoint: A string indicating the pretrained checkpoint + location. + change_num_bits: A `bool` indicating whether to manually set num_bits. + num_bits_weight: An `int` number of bits for weights. Defaults to 8. + num_bits_activation: An `int` number of bits for activations. Defaults to 8. + quantize_detection_decoder: A `bool` indicating whether to quantize the + detection decoder. Only applies to detection models. + quantize_detection_head: A `bool` indicating whether to quantize the + detection head. Only applies to detection models. + """ + pretrained_original_checkpoint: Optional[str] = None + change_num_bits: bool = False + num_bits_weight: int = 8 + num_bits_activation: int = 8 + quantize_detection_decoder: bool = False + quantize_detection_head: bool = False diff --git a/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv2_qat_gpu.yaml b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv2_qat_gpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c22848a5cb5d5017ca0e92b988a1ba23560ba9c0 --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv2_qat_gpu.yaml @@ -0,0 +1,53 @@ +runtime: + distribution_strategy: 'mirrored' + mixed_precision_dtype: 'float32' + loss_scale: 'dynamic' +task: + model: + num_classes: 1001 + input_size: [224, 224, 3] + backbone: + type: 'mobilenet' + mobilenet: + model_id: 'MobileNetV2' + filter_size_scale: 1.0 + dropout_rate: 0.1 + losses: + l2_weight_decay: 0.0000001 + one_hot: true + label_smoothing: 0.1 + train_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/train*' + is_training: true + global_batch_size: 512 # 64 * 8 + dtype: 'float32' + validation_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/valid*' + is_training: false +
global_batch_size: 512 # 64 * 8 + dtype: 'float32' + drop_remainder: false + quantization: + pretrained_original_checkpoint: 'gs://**/mobilenetv2_gpu/22984194/ckpt-625500' +trainer: + # With the settings below, QAT reaches an accuracy of 0.7279 after 43 hours on 8 GPUs. + train_steps: 250200 + validation_steps: 98 + validation_interval: 2502 + steps_per_loop: 2502 + summary_interval: 2502 + checkpoint_interval: 2502 + optimizer_config: + learning_rate: + type: 'exponential' + exponential: + decay_rate: 0.9 + decay_steps: 1251 + initial_learning_rate: 0.0001 + name: 'ExponentialDecay' + offset: 0 + staircase: true + warmup: + type: 'linear' + linear: + warmup_steps: 0 diff --git a/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv2_qat_gpu_batch256.yaml b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv2_qat_gpu_batch256.yaml new file mode 100644 index 0000000000000000000000000000000000000000..cf081190d18bbf2e36f2ac674f788e966660c039 --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv2_qat_gpu_batch256.yaml @@ -0,0 +1,53 @@ +runtime: + distribution_strategy: 'mirrored' + mixed_precision_dtype: 'float32' + loss_scale: 'dynamic' +task: + model: + num_classes: 1001 + input_size: [224, 224, 3] + backbone: + type: 'mobilenet' + mobilenet: + model_id: 'MobileNetV2' + filter_size_scale: 1.0 + dropout_rate: 0.0 # changed from 0.2 to 0.0 + losses: + l2_weight_decay: 0.0000001 + one_hot: true + label_smoothing: 0.1 + train_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/train*' + is_training: true + global_batch_size: 256 + dtype: 'float32' + validation_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/valid*' + is_training: false + global_batch_size: 256 + dtype: 'float32' + drop_remainder: false + quantization: +
pretrained_original_checkpoint: 'gs://**/mobilenetv2_gpu/22984194/ckpt-625500' +trainer: + # With the settings below, QAT reaches a top-1 accuracy of 0.7251 at 420336 steps after + # 1 day 19 hours of training on 8 GPUs, which is higher than the PTQ result for MobileNetV2. + train_steps: 1000800 # 200 epochs + validation_steps: 196 # ceil(NUM_EXAMPLES (50000) / global_batch_size (256)) + validation_interval: 5004 # 1 epoch + steps_per_loop: 5004 # NUM_EXAMPLES (1281167) // global_batch_size (256) + summary_interval: 5004 # 1 epoch + checkpoint_interval: 5004 # 1 epoch + max_to_keep: 200 + optimizer_config: + learning_rate: + type: 'exponential' + exponential: + initial_learning_rate: 0.0001 + decay_steps: 1251 # steps_per_epoch // 4 + decay_rate: 0.96 + staircase: true + warmup: + type: 'linear' + linear: + warmup_steps: 0 diff --git a/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv2_qat_gpu_batch512.yaml b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv2_qat_gpu_batch512.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c1ab24a5e7cfe901a8a196b3c97eacb915b97290 --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv2_qat_gpu_batch512.yaml @@ -0,0 +1,53 @@ +runtime: + distribution_strategy: 'mirrored' + mixed_precision_dtype: 'float32' + loss_scale: 'dynamic' +task: + model: + num_classes: 1001 + input_size: [224, 224, 3] + backbone: + type: 'mobilenet' + mobilenet: + model_id: 'MobileNetV2' + filter_size_scale: 1.0 + dropout_rate: 0.0 # changed from 0.2 to 0.0 + losses: + l2_weight_decay: 0.0000001 + one_hot: true + label_smoothing: 0.1 + train_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/train*' + is_training: true + global_batch_size: 512 + dtype: 'float32' + validation_data: + input_path:
'/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/valid*' + is_training: false + global_batch_size: 512 + dtype: 'float32' + drop_remainder: false + quantization: + pretrained_original_checkpoint: 'gs://**/mobilenetv2_gpu/22984194/ckpt-625500' +trainer: + # With the settings below, QAT reaches a top-1 accuracy of 0.7266 at 312750 steps after + # 1 day 22 hours of training on 8 GPUs, which is higher than the PTQ result for MobileNetV2. + train_steps: 500400 # 200 epochs + validation_steps: 98 # ceil(NUM_EXAMPLES (50000) / global_batch_size (512)) + validation_interval: 2502 # 1 epoch + steps_per_loop: 2502 # NUM_EXAMPLES (1281167) // global_batch_size (512) + summary_interval: 2502 # 1 epoch + checkpoint_interval: 2502 # 1 epoch + max_to_keep: 200 + optimizer_config: + learning_rate: + type: 'exponential' + exponential: + initial_learning_rate: 0.0002 + decay_steps: 1251 # steps_per_epoch // 2 + decay_rate: 0.96 + staircase: true + warmup: + type: 'linear' + linear: + warmup_steps: 0 diff --git a/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv3.5_qat_gpu.yaml b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv3.5_qat_gpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..946b034d177ab936addd97180bbdbf7ce6291f3d --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv3.5_qat_gpu.yaml @@ -0,0 +1,53 @@ +runtime: + distribution_strategy: 'mirrored' + mixed_precision_dtype: 'float32' + loss_scale: 'dynamic' +task: + model: + num_classes: 1001 + input_size: [224, 224, 3] + backbone: + type: 'mobilenet' + mobilenet: + model_id: 'MobileNetMultiAVG' + filter_size_scale: 1.0 + dropout_rate: 0.3 + losses: + l2_weight_decay: 0.000001 + one_hot: true + label_smoothing: 0.1 + train_data: + input_path:
'/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/train*' + is_training: true + global_batch_size: 512 + dtype: 'float32' + validation_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/valid*' + is_training: false + global_batch_size: 512 + dtype: 'float32' + drop_remainder: false + quantization: + pretrained_original_checkpoint: 'gs://**/tf2_mhave_nobias_bn_aug05/28334857/ckpt-156000' +trainer: + # With the settings below, QAT reaches an accuracy of 0.7513 after 30 hours on 8 GPUs. + train_steps: 250200 + validation_steps: 98 + validation_interval: 2502 + steps_per_loop: 2502 + summary_interval: 2502 + checkpoint_interval: 2502 + optimizer_config: + learning_rate: + type: 'exponential' + exponential: + decay_rate: 0.9 + decay_steps: 1251 + initial_learning_rate: 0.0004 + name: 'ExponentialDecay' + offset: 0 + staircase: true + warmup: + type: 'linear' + linear: + warmup_steps: 0 diff --git a/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv3large_qat_tpu.yaml b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv3large_qat_tpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..d5070c33a6ca6e04d00a356ab5842b6a44b0ee0b --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_mobilenetv3large_qat_tpu.yaml @@ -0,0 +1,63 @@ +runtime: + distribution_strategy: 'tpu' + mixed_precision_dtype: 'float32' +task: + model: + num_classes: 1001 + input_size: [224, 224, 3] + backbone: + type: 'mobilenet' + mobilenet: + model_id: 'MobileNetV3Large' + filter_size_scale: 1.0 + dropout_rate: 0.3 + losses: + l2_weight_decay: 1.0e-06 # 1/10 of original value.
+ one_hot: true + label_smoothing: 0.1 + train_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/train*' + is_training: true + global_batch_size: 4096 + dtype: 'float32' + aug_rand_hflip: true + drop_remainder: true + validation_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/valid*' + is_training: false + global_batch_size: 4096 + dtype: 'float32' + drop_remainder: false + aug_rand_hflip: true + quantization: + pretrained_original_checkpoint: 'gs://**/mobilenetv3_baseline_31/ckpt-156000' +trainer: + # With the settings below, QAT reaches an accuracy of 0.7443 after ~2 hours on 4x4 DF. + train_steps: 62400 + validation_steps: 13 + validation_interval: 312 + steps_per_loop: 312 + summary_interval: 312 + checkpoint_interval: 312 + optimizer_config: + learning_rate: + cosine: + alpha: 0.0 + decay_steps: 62400 + initial_learning_rate: 0.0003 # 1/10 of original lr.
+ name: CosineDecay + offset: 0 + type: cosine + optimizer: + adamw: + amsgrad: false + beta_1: 0.9 + beta_2: 0.999 + epsilon: 1.0e-07 + gradient_clip_norm: 1.0 + weight_decay_rate: 0.0 + type: adamw + warmup: + type: 'linear' + linear: + warmup_steps: 0 diff --git a/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu.yaml b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..01ab5eb09caf665d803fe719c64cbda0a61206cf --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu.yaml @@ -0,0 +1,52 @@ +runtime: + distribution_strategy: 'mirrored' + mixed_precision_dtype: 'float32' + loss_scale: 'dynamic' +task: + model: + num_classes: 1001 + input_size: [224, 224, 3] + backbone: + type: 'resnet' + resnet: + model_id: 50 + losses: + l2_weight_decay: 0.0001 + one_hot: true + label_smoothing: 0.1 + train_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/train*' + is_training: true + global_batch_size: 256 + dtype: 'float32' + validation_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/valid*' + is_training: false + global_batch_size: 256 + dtype: 'float32' + drop_remainder: false + quantization: + pretrained_original_checkpoint: 'gs://**/resnet_classifier_gpu/ckpt-56160' +trainer: + # With the settings below, QAT reaches a top-1 accuracy of 0.7720 after 5 days of training + # on 8 GPUs, which is higher than the non-quantized float32 ResNet.
+ train_steps: 449280 + validation_steps: 200 + validation_interval: 5000 + steps_per_loop: 5000 + summary_interval: 5000 + checkpoint_interval: 5000 + optimizer_config: + optimizer: + type: 'sgd' + sgd: + momentum: 0.9 + learning_rate: + type: 'stepwise' + stepwise: + boundaries: [150000, 300000, 400000] + values: [0.08, 0.008, 0.0008, 0.00008] + warmup: + type: 'linear' + linear: + warmup_steps: 40000 diff --git a/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu_fast.yaml b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu_fast.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1912477bb79fe35b2efb90df56aeea91cd5a20c1 --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu_fast.yaml @@ -0,0 +1,54 @@ +runtime: + distribution_strategy: 'mirrored' + mixed_precision_dtype: 'float32' + loss_scale: 'dynamic' +task: + model: + num_classes: 1001 + input_size: [224, 224, 3] + backbone: + type: 'resnet' + resnet: + model_id: 50 + losses: + l2_weight_decay: 0.0001 + one_hot: true + label_smoothing: 0.1 + train_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/train*' + is_training: true + global_batch_size: 256 + dtype: 'float32' + validation_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/valid*' + is_training: false + global_batch_size: 256 + dtype: 'float32' + drop_remainder: false + quantization: + pretrained_original_checkpoint: 'gs://**/resnet_classifier_gpu/ckpt-56160' +trainer: + # With the settings below, QAT matches the accuracy of the non-quantized float32 model after + # around 160k steps, which takes 1d 15h on 8 GPUs.
+ train_steps: 449280 + validation_steps: 200 + validation_interval: 5000 + steps_per_loop: 5000 + summary_interval: 5000 + checkpoint_interval: 5000 + optimizer_config: + optimizer: + type: 'sgd' + sgd: + momentum: 0.9 + learning_rate: + type: 'exponential' + exponential: + initial_learning_rate: 0.016 + decay_steps: 25000 + decay_rate: 0.5 + staircase: true + warmup: + type: 'linear' + linear: + warmup_steps: 1000 diff --git a/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu_fast_4x4.yaml b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu_fast_4x4.yaml new file mode 100644 index 0000000000000000000000000000000000000000..12d4a2c9921403fbfc86f1eeef6fac6d1c9d1ad3 --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu_fast_4x4.yaml @@ -0,0 +1,57 @@ +runtime: + distribution_strategy: 'mirrored' + mixed_precision_dtype: 'float32' + loss_scale: 'dynamic' +task: + model: + num_classes: 1001 + input_size: [224, 224, 3] + backbone: + type: 'resnet' + resnet: + model_id: 50 + losses: + l2_weight_decay: 0.0001 + one_hot: true + label_smoothing: 0.1 + train_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/train*' + is_training: true + global_batch_size: 256 + dtype: 'float32' + validation_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/valid*' + is_training: false + global_batch_size: 256 + dtype: 'float32' + drop_remainder: false + quantization: + pretrained_original_checkpoint: 'gs://**/resnet_classifier_gpu/ckpt-56160' + change_num_bits: true + num_bits_weight: 4 + num_bits_activation: 4 +trainer: + # With the settings below, QAT reaches a top-1 accuracy of 0.6822 at 205k steps on 8 GPUs. + # TODO: Please change the configs when training is done.
+ train_steps: 449280 + validation_steps: 200 + validation_interval: 5000 + steps_per_loop: 5000 + summary_interval: 5000 + checkpoint_interval: 5000 + optimizer_config: + optimizer: + type: 'sgd' + sgd: + momentum: 0.9 + learning_rate: + type: 'exponential' + exponential: + initial_learning_rate: 0.016 + decay_steps: 25000 + decay_rate: 0.5 + staircase: true + warmup: + type: 'linear' + linear: + warmup_steps: 1000 diff --git a/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu_fast_4x8.yaml b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu_fast_4x8.yaml new file mode 100644 index 0000000000000000000000000000000000000000..ca739d41497271dcbe3c05b05da841a77ff75b39 --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu_fast_4x8.yaml @@ -0,0 +1,57 @@ +runtime: + distribution_strategy: 'mirrored' + mixed_precision_dtype: 'float32' + loss_scale: 'dynamic' +task: + model: + num_classes: 1001 + input_size: [224, 224, 3] + backbone: + type: 'resnet' + resnet: + model_id: 50 + losses: + l2_weight_decay: 0.0001 + one_hot: true + label_smoothing: 0.1 + train_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/train*' + is_training: true + global_batch_size: 256 + dtype: 'float32' + validation_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/valid*' + is_training: false + global_batch_size: 256 + dtype: 'float32' + drop_remainder: false + quantization: + pretrained_original_checkpoint: 'gs://**/resnet_classifier_gpu/ckpt-56160' + change_num_bits: true + num_bits_weight: 4 + num_bits_activation: 8 +trainer: + # With the settings below, QAT reaches a top-1 accuracy of 0.7575 at 220k steps on 8 GPUs. + # TODO: Please change the configs when training is done.
+ train_steps: 449280 + validation_steps: 200 + validation_interval: 5000 + steps_per_loop: 5000 + summary_interval: 5000 + checkpoint_interval: 5000 + optimizer_config: + optimizer: + type: 'sgd' + sgd: + momentum: 0.9 + learning_rate: + type: 'exponential' + exponential: + initial_learning_rate: 0.016 + decay_steps: 25000 + decay_rate: 0.5 + staircase: true + warmup: + type: 'linear' + linear: + warmup_steps: 1000 diff --git a/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu_fast_6x6.yaml b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu_fast_6x6.yaml new file mode 100644 index 0000000000000000000000000000000000000000..88512f6002e81f293a660444b6bf9f6d37aff4e2 --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/image_classification/imagenet_resnet50_qat_gpu_fast_6x6.yaml @@ -0,0 +1,57 @@ +runtime: + distribution_strategy: 'mirrored' + mixed_precision_dtype: 'float32' + loss_scale: 'dynamic' +task: + model: + num_classes: 1001 + input_size: [224, 224, 3] + backbone: + type: 'resnet' + resnet: + model_id: 50 + losses: + l2_weight_decay: 0.0001 + one_hot: true + label_smoothing: 0.1 + train_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/train*' + is_training: true + global_batch_size: 256 + dtype: 'float32' + validation_data: + input_path: '/readahead/200M/placer/prod/home/distbelief/imagenet-tensorflow/imagenet-2012-tfrecord/valid*' + is_training: false + global_batch_size: 256 + dtype: 'float32' + drop_remainder: false + quantization: + pretrained_original_checkpoint: 'gs://**/resnet_classifier_gpu/ckpt-56160' + change_num_bits: true + num_bits_weight: 6 + num_bits_activation: 6 +trainer: + # With the settings below, QAT reaches a top-1 accuracy of 0.7607 at 190k steps on 8 GPUs. + # TODO: Please change the configs when training is done.
+ train_steps: 449280 + validation_steps: 200 + validation_interval: 5000 + steps_per_loop: 5000 + summary_interval: 5000 + checkpoint_interval: 5000 + optimizer_config: + optimizer: + type: 'sgd' + sgd: + momentum: 0.9 + learning_rate: + type: 'exponential' + exponential: + initial_learning_rate: 0.016 + decay_steps: 25000 + decay_rate: 0.5 + staircase: true + warmup: + type: 'linear' + linear: + warmup_steps: 1000 diff --git a/official/projects/qat/vision/configs/experiments/retinanet/coco_mobilenetv2_qat_tpu_e2e.yaml b/official/projects/qat/vision/configs/experiments/retinanet/coco_mobilenetv2_qat_tpu_e2e.yaml new file mode 100644 index 0000000000000000000000000000000000000000..7238f9357f1871de54d92d54a81eefbdb1f0d9b3 --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/retinanet/coco_mobilenetv2_qat_tpu_e2e.yaml @@ -0,0 +1,72 @@ +# --experiment_type=retinanet_mobile_coco_qat +# COCO mAP: 23.02 from QAT training and 21.62 from the TFLite after conversion. +# QAT only supports float32 tpu due to fake-quant op. 
+runtime: + distribution_strategy: 'tpu' + mixed_precision_dtype: 'float32' +task: + losses: + l2_weight_decay: 0.0 + model: + anchor: + anchor_size: 3 + aspect_ratios: [0.5, 1.0, 2.0] + num_scales: 3 + backbone: + mobilenet: + model_id: 'MobileNetV2' + filter_size_scale: 1.0 + type: 'mobilenet' + decoder: + type: 'fpn' + fpn: + num_filters: 128 + use_separable_conv: true + use_keras_layer: true + head: + num_convs: 4 + num_filters: 128 + use_separable_conv: true + input_size: [256, 256, 3] + max_level: 7 + min_level: 3 + norm_activation: + activation: 'relu6' + norm_epsilon: 0.001 + norm_momentum: 0.99 + use_sync_bn: true + train_data: + dtype: 'float32' + global_batch_size: 256 + is_training: true + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.5 + validation_data: + dtype: 'float32' + global_batch_size: 256 + is_training: false + drop_remainder: false + quantization: + pretrained_original_checkpoint: 'gs://**/coco_mobilenetv2_mobile_tpu/ckpt-277200' + quantize_detection_decoder: true + quantize_detection_head: true +trainer: + best_checkpoint_eval_metric: AP + best_checkpoint_export_subdir: best_ckpt + best_checkpoint_metric_comp: higher + optimizer_config: + learning_rate: + type: 'exponential' + exponential: + decay_rate: 0.96 + decay_steps: 231 + initial_learning_rate: 0.5 + name: 'ExponentialDecay' + offset: 0 + staircase: true + steps_per_loop: 462 + train_steps: 46200 + validation_interval: 462 + validation_steps: 20 diff --git a/official/projects/qat/vision/configs/experiments/retinanet/coco_spinenet49_mobile_qat_gpu.yaml b/official/projects/qat/vision/configs/experiments/retinanet/coco_spinenet49_mobile_qat_gpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..3bfcfb57d3304d501c85ddd7a6509380d0709b65 --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/retinanet/coco_spinenet49_mobile_qat_gpu.yaml @@ -0,0 +1,64 @@ +# --experiment_type=retinanet_mobile_coco_qat +runtime: + 
distribution_strategy: 'mirrored' + mixed_precision_dtype: 'float32' +task: + losses: + l2_weight_decay: 3.0e-05 + model: + anchor: + anchor_size: 3 + aspect_ratios: [0.5, 1.0, 2.0] + num_scales: 3 + backbone: + spinenet_mobile: + stochastic_depth_drop_rate: 0.2 + model_id: '49' + se_ratio: 0.2 + use_keras_upsampling_2d: true + type: 'spinenet_mobile' + decoder: + type: 'identity' + head: + num_convs: 4 + num_filters: 48 + use_separable_conv: true + input_size: [384, 384, 3] + max_level: 7 + min_level: 3 + norm_activation: + activation: 'swish' + norm_epsilon: 0.001 + norm_momentum: 0.99 + use_sync_bn: true + train_data: + dtype: 'float32' + global_batch_size: 128 + is_training: true + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.5 + validation_data: + dtype: 'float32' + global_batch_size: 8 + is_training: false + quantization: + pretrained_original_checkpoint: 'gs://**/coco_spinenet49_mobile_tpu/ckpt-277200' +trainer: + checkpoint_interval: 924 + optimizer_config: + learning_rate: + stepwise: + boundaries: [531300, 545160] + values: [0.0016, 0.00016, 0.000016] + type: 'stepwise' + warmup: + linear: + warmup_learning_rate: 0.0000335 + warmup_steps: 4000 + steps_per_loop: 924 + train_steps: 554400 + validation_interval: 924 + validation_steps: 1250 + summary_interval: 924 diff --git a/official/projects/qat/vision/configs/experiments/retinanet/coco_spinenet49_mobile_qat_tpu.yaml b/official/projects/qat/vision/configs/experiments/retinanet/coco_spinenet49_mobile_qat_tpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..0ce3d5462101fdca40a6800633c45c306d6b1c05 --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/retinanet/coco_spinenet49_mobile_qat_tpu.yaml @@ -0,0 +1,66 @@ +# --experiment_type=retinanet_mobile_coco_qat +# COCO mAP: 24.7 +# QAT only supports float32 tpu due to fake-quant op. 
+runtime: + distribution_strategy: 'tpu' + mixed_precision_dtype: 'float32' +task: + losses: + l2_weight_decay: 3.0e-05 + model: + anchor: + anchor_size: 3 + aspect_ratios: [0.5, 1.0, 2.0] + num_scales: 3 + backbone: + spinenet_mobile: + stochastic_depth_drop_rate: 0.2 + model_id: '49' + se_ratio: 0.2 + use_keras_upsampling_2d: true + type: 'spinenet_mobile' + decoder: + type: 'identity' + head: + num_convs: 4 + num_filters: 48 + use_separable_conv: true + input_size: [384, 384, 3] + max_level: 7 + min_level: 3 + norm_activation: + activation: 'swish' + norm_epsilon: 0.001 + norm_momentum: 0.99 + use_sync_bn: true + train_data: + dtype: 'float32' + global_batch_size: 128 + is_training: true + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.5 + validation_data: + dtype: 'float32' + global_batch_size: 16 + is_training: false + quantization: + pretrained_original_checkpoint: 'gs://**/coco_spinenet49_mobile_tpu_33884721/ckpt-277200' +trainer: + checkpoint_interval: 924 + optimizer_config: + learning_rate: + stepwise: + boundaries: [531300, 545160] + values: [0.0016, 0.00016, 0.000016] + type: 'stepwise' + warmup: + linear: + warmup_learning_rate: 0.0000335 + warmup_steps: 4000 + steps_per_loop: 924 + train_steps: 554400 + validation_interval: 924 + validation_steps: 1250 + summary_interval: 924 diff --git a/official/projects/qat/vision/configs/experiments/retinanet/coco_spinenet49_mobile_qat_tpu_e2e.yaml b/official/projects/qat/vision/configs/experiments/retinanet/coco_spinenet49_mobile_qat_tpu_e2e.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b4374f9cd6118bee60bd75a6f3dce2c5eb04c15a --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/retinanet/coco_spinenet49_mobile_qat_tpu_e2e.yaml @@ -0,0 +1,67 @@ +# --experiment_type=retinanet_mobile_coco_qat +# COCO mAP: 23.2 +# QAT only supports float32 tpu due to fake-quant op. 
+runtime: + distribution_strategy: 'tpu' + mixed_precision_dtype: 'float32' +task: + losses: + l2_weight_decay: 3.0e-05 + model: + anchor: + anchor_size: 3 + aspect_ratios: [0.5, 1.0, 2.0] + num_scales: 3 + backbone: + spinenet_mobile: + stochastic_depth_drop_rate: 0.2 + model_id: '49' + se_ratio: 0.2 + use_keras_upsampling_2d: true + type: 'spinenet_mobile' + decoder: + type: 'identity' + head: + num_convs: 4 + num_filters: 48 + use_separable_conv: true + input_size: [384, 384, 3] + max_level: 7 + min_level: 3 + norm_activation: + activation: 'swish' + norm_epsilon: 0.001 + norm_momentum: 0.99 + use_sync_bn: true + train_data: + dtype: 'float32' + global_batch_size: 256 + is_training: true + parser: + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.5 + validation_data: + dtype: 'float32' + global_batch_size: 16 + is_training: false + quantization: + pretrained_original_checkpoint: 'gs://**/coco_spinenet49_mobile_tpu_33884721/ckpt-277200' + quantize_detection_head: true +trainer: + checkpoint_interval: 462 + optimizer_config: + learning_rate: + stepwise: + boundaries: [263340, 272580] + values: [0.032, 0.0032, 0.00032] + type: 'stepwise' + warmup: + linear: + warmup_learning_rate: 0.00067 + warmup_steps: 2000 + steps_per_loop: 462 + train_steps: 277200 + validation_interval: 462 + validation_steps: 625 + summary_interval: 924 diff --git a/official/projects/qat/vision/configs/experiments/semantic_segmentation/deeplabv3_mobilenetv2_pascal_qat_gpu.yaml b/official/projects/qat/vision/configs/experiments/semantic_segmentation/deeplabv3_mobilenetv2_pascal_qat_gpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..ca8bf7873cdf53fd202147370e6572ea5a6c4204 --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/semantic_segmentation/deeplabv3_mobilenetv2_pascal_qat_gpu.yaml @@ -0,0 +1,81 @@ +# --experiment_type=mnv2_deeplabv3_pascal_qat +# Use 8 v100 GPUs for training and 4 v100 GPUs for eval. 
+# mIoU (unquantized fp32): 74.78 +runtime: + distribution_strategy: 'mirrored' + mixed_precision_dtype: 'float32' + loss_scale: 'dynamic' +task: + model: + num_classes: 21 + input_size: [512, 512, 3] + backbone: + type: 'mobilenet' + mobilenet: + model_id: 'MobileNetV2' + output_stride: 16 + decoder: + aspp: + dilation_rates: [] + level: 4 + pool_kernel_size: null + output_tensor: true + type: 'aspp' + head: + feature_fusion: null + num_convs: 0 + norm_activation: + activation: relu + norm_epsilon: 0.001 + norm_momentum: 0.99 + use_sync_bn: true + losses: + l2_weight_decay: 4.0e-07 # 1/100 of original value. + train_data: + output_size: [512, 512] + crop_size: [512, 512] + input_path: 'gs://**/pascal_voc_seg/train_aug*' + is_training: true + global_batch_size: 16 + dtype: 'float32' + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.5 + validation_data: + output_size: [512, 512] + input_path: 'gs://**/pascal_voc_seg/val*' + is_training: false + global_batch_size: 16 + dtype: 'float32' + drop_remainder: false + resize_eval_groundtruth: false + groundtruth_padded_size: [512, 512] + quantization: + pretrained_original_checkpoint: 'gs://**/deeplabv3_mobilenetv2_pascal_coco_0.21/29808901/best_ckpt/best_ckpt-54' + init_checkpoint: null +trainer: + optimizer_config: + learning_rate: + polynomial: + decay_steps: 13240 + initial_learning_rate: 0.00007 # 1/100 of original lr. 
+ power: 0.9 + type: polynomial + optimizer: + sgd: + momentum: 0.9 + type: sgd + warmup: + linear: + name: linear + warmup_steps: 0 # No warmup + type: linear + best_checkpoint_eval_metric: 'mean_iou' + best_checkpoint_export_subdir: 'best_ckpt' + best_checkpoint_metric_comp: 'higher' + steps_per_loop: 662 + summary_interval: 662 + train_steps: 13240 + validation_interval: 662 + validation_steps: 90 + checkpoint_interval: 662 diff --git a/official/projects/qat/vision/configs/experiments/semantic_segmentation/deeplabv3_mobilenetv2_pascal_qat_tpu.yaml b/official/projects/qat/vision/configs/experiments/semantic_segmentation/deeplabv3_mobilenetv2_pascal_qat_tpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..7776a836436d6d89a8974d1cb8efd5311609a69c --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/semantic_segmentation/deeplabv3_mobilenetv2_pascal_qat_tpu.yaml @@ -0,0 +1,80 @@ +# --experiment_type=mnv2_deeplabv3_pascal_qat +# Use 4x2 DF for training and eval. +# mIoU (unquantized fp32): 74.69 +runtime: + distribution_strategy: 'tpu' + mixed_precision_dtype: 'float32' +task: + model: + num_classes: 21 + input_size: [512, 512, 3] + backbone: + type: 'mobilenet' + mobilenet: + model_id: 'MobileNetV2' + output_stride: 16 + decoder: + aspp: + dilation_rates: [] + level: 4 + pool_kernel_size: null + output_tensor: true + type: 'aspp' + head: + feature_fusion: null + num_convs: 0 + norm_activation: + activation: relu + norm_epsilon: 0.001 + norm_momentum: 0.99 + use_sync_bn: true + losses: + l2_weight_decay: 4.0e-07 # 1/100 of original value. 
+ train_data: + output_size: [512, 512] + crop_size: [512, 512] + input_path: 'gs://**/pascal_voc_seg/train_aug*' + is_training: true + global_batch_size: 16 + dtype: 'float32' + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.5 + validation_data: + output_size: [512, 512] + input_path: 'gs://**/pascal_voc_seg/val*' + is_training: false + global_batch_size: 16 + dtype: 'float32' + drop_remainder: false + resize_eval_groundtruth: false + groundtruth_padded_size: [512, 512] + quantization: + pretrained_original_checkpoint: 'gs://**/deeplabv3_mobilenetv2_pascal_coco_0.21/29808901/best_ckpt/best_ckpt-54' + init_checkpoint: null +trainer: + optimizer_config: + learning_rate: + polynomial: + decay_steps: 13240 + initial_learning_rate: 0.00007 # 1/100 of original lr. + power: 0.9 + type: polynomial + optimizer: + sgd: + momentum: 0.9 + type: sgd + warmup: + linear: + name: linear + warmup_steps: 0 # No warmup + type: linear + best_checkpoint_eval_metric: 'mean_iou' + best_checkpoint_export_subdir: 'best_ckpt' + best_checkpoint_metric_comp: 'higher' + steps_per_loop: 662 + summary_interval: 662 + train_steps: 13240 + validation_interval: 662 + validation_steps: 90 + checkpoint_interval: 662 diff --git a/official/projects/qat/vision/configs/experiments/semantic_segmentation/deeplabv3plus_mobilenetv2_cityscapes_qat_tpu.yaml b/official/projects/qat/vision/configs/experiments/semantic_segmentation/deeplabv3plus_mobilenetv2_cityscapes_qat_tpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..b9bdbae6f04be513d195afdd9126e87ee68aed5a --- /dev/null +++ b/official/projects/qat/vision/configs/experiments/semantic_segmentation/deeplabv3plus_mobilenetv2_cityscapes_qat_tpu.yaml @@ -0,0 +1,89 @@ +# --experiment_type=mnv2_deeplabv3plus_cityscapes_qat +# Use 4x2 DF for training and eval. 
+# mIoU (unquantized fp32): 73.84 +runtime: + distribution_strategy: 'tpu' + mixed_precision_dtype: 'float32' +task: + model: + num_classes: 19 + input_size: [1024, 2048, 3] + backbone: + type: 'mobilenet' + mobilenet: + model_id: 'MobileNetV2' + output_stride: 16 + output_intermediate_endpoints: true + decoder: + aspp: + dilation_rates: [] + level: 4 + pool_kernel_size: [512, 1024] + output_tensor: true + type: 'aspp' + head: + feature_fusion: 'deeplabv3plus' + low_level: '2/depthwise' + low_level_num_filters: 48 + level: 4 + num_convs: 2 + use_depthwise_convolution: true + norm_activation: + activation: relu + norm_epsilon: 0.001 + norm_momentum: 0.99 + use_sync_bn: true + losses: + l2_weight_decay: 4.0e-07 # 1/100 of original value. + train_data: + output_size: [1024, 2048] + crop_size: [] + input_path: '' + tfds_name: 'cityscapes/semantic_segmentation' + tfds_split: 'train' + is_training: true + global_batch_size: 16 + dtype: 'float32' + aug_rand_hflip: true + aug_scale_max: 2.0 + aug_scale_min: 0.5 + validation_data: + output_size: [1024, 2048] + input_path: '' + tfds_name: 'cityscapes/semantic_segmentation' + tfds_split: 'validation' + is_training: false + global_batch_size: 16 + dtype: 'float32' + drop_remainder: false + resize_eval_groundtruth: true + quantization: + pretrained_original_checkpoint: 'gs://**/deeplabv3plus_mobilenetv2_cityscapes/29814723/best_ckpt/best_ckpt-408' + init_checkpoint: null +trainer: + optimizer_config: + learning_rate: + polynomial: + decay_steps: 20000 + initial_learning_rate: 0.0001 # 1/100 of original lr. 
+ power: 0.9 + type: polynomial + optimizer: + sgd: + momentum: 0.9 + type: sgd + warmup: + linear: + name: linear + warmup_learning_rate: 0 + warmup_steps: 0 # No warmup + type: linear + steps_per_loop: 185 + summary_interval: 185 + train_steps: 20000 + validation_interval: 185 + validation_steps: 31 + checkpoint_interval: 185 + best_checkpoint_export_subdir: 'best_ckpt' + best_checkpoint_eval_metric: 'mean_iou' + best_checkpoint_metric_comp: 'higher' diff --git a/official/projects/qat/vision/configs/image_classification.py b/official/projects/qat/vision/configs/image_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..b3f0356970c99551771084a5e60657919e8c77fa --- /dev/null +++ b/official/projects/qat/vision/configs/image_classification.py @@ -0,0 +1,52 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Image classification configuration definition.""" + +import dataclasses +from typing import Optional + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.projects.qat.vision.configs import common +from official.vision.configs import image_classification + + +@dataclasses.dataclass +class ImageClassificationTask(image_classification.ImageClassificationTask): + quantization: Optional[common.Quantization] = None + + +@exp_factory.register_config_factory('resnet_imagenet_qat') +def image_classification_imagenet() -> cfg.ExperimentConfig: + """Builds an image classification config for ResNet with QAT.""" + config = image_classification.image_classification_imagenet() + task = ImageClassificationTask.from_args( + quantization=common.Quantization(), **config.task.as_dict()) + config.task = task + runtime = cfg.RuntimeConfig(enable_xla=False) + config.runtime = runtime + + return config + + +@exp_factory.register_config_factory('mobilenet_imagenet_qat') +def image_classification_imagenet_mobilenet() -> cfg.ExperimentConfig: + """Builds an image classification config for MobileNetV2 with QAT.""" + config = image_classification.image_classification_imagenet_mobilenet() + task = ImageClassificationTask.from_args( + quantization=common.Quantization(), **config.task.as_dict()) + config.task = task + + return config diff --git a/official/projects/qat/vision/configs/image_classification_test.py b/official/projects/qat/vision/configs/image_classification_test.py new file mode 100644 index 0000000000000000000000000000000000000000..6bddd78f0a9fac28d945a59bc3f959efbc1b71e5 --- /dev/null +++ b/official/projects/qat/vision/configs/image_classification_test.py @@ -0,0 +1,48 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for image_classification.""" +# pylint: disable=unused-import +from absl.testing import parameterized +import tensorflow as tf + +from official import vision +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.projects.qat.vision.configs import common +from official.projects.qat.vision.configs import image_classification as qat_exp_cfg +from official.vision.configs import image_classification as exp_cfg + + +class ImageClassificationConfigTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.parameters( + ('resnet_imagenet_qat',), + ('mobilenet_imagenet_qat',), + ) + def test_image_classification_configs(self, config_name): + config = exp_factory.get_exp_config(config_name) + self.assertIsInstance(config, cfg.ExperimentConfig) + self.assertIsInstance(config.task, qat_exp_cfg.ImageClassificationTask) + self.assertIsInstance(config.task.model, + exp_cfg.ImageClassificationModel) + self.assertIsInstance(config.task.quantization, common.Quantization) + self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig) + config.task.train_data.is_training = None + with self.assertRaisesRegex(KeyError, 'Found inconsistency between key'): + config.validate() + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/qat/vision/configs/retinanet.py b/official/projects/qat/vision/configs/retinanet.py new file mode 100644 index 0000000000000000000000000000000000000000..36dfa4bf8e01bfc090cd47a19ba42560f648efc1 --- /dev/null +++ 
b/official/projects/qat/vision/configs/retinanet.py @@ -0,0 +1,47 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""RetinaNet configuration definition.""" +import dataclasses +from typing import Optional + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.projects.qat.vision.configs import common +from official.vision.configs import retinanet +from official.vision.configs import backbones + + +@dataclasses.dataclass +class RetinaNetTask(retinanet.RetinaNetTask): + quantization: Optional[common.Quantization] = None + + +@exp_factory.register_config_factory('retinanet_mobile_coco_qat') +def retinanet_mobile_coco() -> cfg.ExperimentConfig: + """Generates a config for COCO OD RetinaNet for mobile with QAT.""" + config = retinanet.retinanet_spinenet_mobile_coco() + task = RetinaNetTask.from_args( + quantization=common.Quantization(), **config.task.as_dict()) + task.model.backbone = backbones.Backbone( + type='spinenet_mobile', + spinenet_mobile=backbones.SpineNetMobile( + model_id='49', + stochastic_depth_drop_rate=0.2, + min_level=3, + max_level=7, + use_keras_upsampling_2d=True)) + config.task = task + + return config diff --git a/official/projects/qat/vision/configs/retinanet_test.py b/official/projects/qat/vision/configs/retinanet_test.py new file mode 100644 index 
0000000000000000000000000000000000000000..6d7dde0fd3b0fc2d9e244145c371ba2b3d82c601 --- /dev/null +++ b/official/projects/qat/vision/configs/retinanet_test.py @@ -0,0 +1,47 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for retinanet.""" +# pylint: disable=unused-import +from absl.testing import parameterized +import tensorflow as tf + +from official import vision +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.projects.qat.vision.configs import common +from official.projects.qat.vision.configs import retinanet as qat_exp_cfg +from official.vision.configs import retinanet as exp_cfg + + +class RetinaNetConfigTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.parameters( + ('retinanet_mobile_coco_qat',), + ) + def test_retinanet_configs(self, config_name): + config = exp_factory.get_exp_config(config_name) + self.assertIsInstance(config, cfg.ExperimentConfig) + self.assertIsInstance(config.task, qat_exp_cfg.RetinaNetTask) + self.assertIsInstance(config.task.model, exp_cfg.RetinaNet) + self.assertIsInstance(config.task.quantization, common.Quantization) + self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig) + config.validate() + config.task.train_data.is_training = None + with self.assertRaisesRegex(KeyError, 'Found inconsistency between key'): + config.validate() + + +if __name__ == '__main__': + 
tf.test.main() diff --git a/official/projects/qat/vision/configs/semantic_segmentation.py b/official/projects/qat/vision/configs/semantic_segmentation.py new file mode 100644 index 0000000000000000000000000000000000000000..0bfe94b4549040f7e54fe956625d25eab5eba6f5 --- /dev/null +++ b/official/projects/qat/vision/configs/semantic_segmentation.py @@ -0,0 +1,57 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Semantic segmentation configuration definition.""" +import dataclasses +from typing import Optional + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.projects.qat.vision.configs import common +from official.vision.configs import semantic_segmentation + + +@dataclasses.dataclass +class SemanticSegmentationTask(semantic_segmentation.SemanticSegmentationTask): + quantization: Optional[common.Quantization] = None + + +@exp_factory.register_config_factory('mnv2_deeplabv3_pascal_qat') +def mnv2_deeplabv3_pascal() -> cfg.ExperimentConfig: + """Generates a config for MobileNet v2 + deeplab v3 with QAT.""" + config = semantic_segmentation.mnv2_deeplabv3_pascal() + task = SemanticSegmentationTask.from_args( + quantization=common.Quantization(), **config.task.as_dict()) + config.task = task + return config + + +@exp_factory.register_config_factory('mnv2_deeplabv3_cityscapes_qat') +def mnv2_deeplabv3_cityscapes() -> cfg.ExperimentConfig: + """Generates a config
for MobileNet v2 + deeplab v3 with QAT.""" + config = semantic_segmentation.mnv2_deeplabv3_cityscapes() + task = SemanticSegmentationTask.from_args( + quantization=common.Quantization(), **config.task.as_dict()) + config.task = task + return config + + +@exp_factory.register_config_factory('mnv2_deeplabv3plus_cityscapes_qat') +def mnv2_deeplabv3plus_cityscapes() -> cfg.ExperimentConfig: + """Generates a config for MobileNet v2 + deeplab v3+ with QAT.""" + config = semantic_segmentation.mnv2_deeplabv3plus_cityscapes() + task = SemanticSegmentationTask.from_args( + quantization=common.Quantization(), **config.task.as_dict()) + config.task = task + return config diff --git a/official/projects/qat/vision/configs/semantic_segmentation_test.py b/official/projects/qat/vision/configs/semantic_segmentation_test.py new file mode 100644 index 0000000000000000000000000000000000000000..55659fe62347d9d4c3eda4fb196c2c0dc676dd38 --- /dev/null +++ b/official/projects/qat/vision/configs/semantic_segmentation_test.py @@ -0,0 +1,47 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for semantic_segmentation.""" +# pylint: disable=unused-import +from absl.testing import parameterized +import tensorflow as tf + +from official import vision +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.projects.qat.vision.configs import common +from official.projects.qat.vision.configs import semantic_segmentation as qat_exp_cfg +from official.vision.configs import semantic_segmentation as exp_cfg + + +class SemanticSegmentationConfigTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.parameters(('mnv2_deeplabv3_pascal_qat',), + ('mnv2_deeplabv3_cityscapes_qat',), + ('mnv2_deeplabv3plus_cityscapes_qat',)) + def test_semantic_segmentation_configs(self, config_name): + config = exp_factory.get_exp_config(config_name) + self.assertIsInstance(config, cfg.ExperimentConfig) + self.assertIsInstance(config.task, qat_exp_cfg.SemanticSegmentationTask) + self.assertIsInstance(config.task.model, exp_cfg.SemanticSegmentationModel) + self.assertIsInstance(config.task.quantization, common.Quantization) + self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig) + config.validate() + config.task.train_data.is_training = None + with self.assertRaisesRegex(KeyError, 'Found inconsistency between key'): + config.validate() + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/qat/vision/modeling/__init__.py b/official/projects/qat/vision/modeling/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..aa57bbd868386cbe542c0ea680d75d9e01062df8 --- /dev/null +++ b/official/projects/qat/vision/modeling/__init__.py @@ -0,0 +1,17 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Modeling package definition.""" + +from official.projects.qat.vision.modeling import layers diff --git a/official/projects/qat/vision/modeling/factory.py b/official/projects/qat/vision/modeling/factory.py new file mode 100644 index 0000000000000000000000000000000000000000..81b175e02541abfbd4eaee1d64e84fadafd52a2f --- /dev/null +++ b/official/projects/qat/vision/modeling/factory.py @@ -0,0 +1,267 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Factory methods to build models.""" +# Import libraries + +import tensorflow as tf + +import tensorflow_model_optimization as tfmot +from official.projects.qat.vision.configs import common +from official.projects.qat.vision.modeling import segmentation_model as qat_segmentation_model +from official.projects.qat.vision.modeling.heads import dense_prediction_heads as dense_prediction_heads_qat +from official.projects.qat.vision.modeling.layers import nn_layers as qat_nn_layers +from official.projects.qat.vision.n_bit import schemes as n_bit_schemes +from official.projects.qat.vision.quantization import configs as qat_configs +from official.projects.qat.vision.quantization import helper +from official.projects.qat.vision.quantization import schemes +from official.vision import configs +from official.vision.modeling import classification_model +from official.vision.modeling import retinanet_model +from official.vision.modeling.decoders import aspp +from official.vision.modeling.decoders import fpn +from official.vision.modeling.heads import dense_prediction_heads +from official.vision.modeling.heads import segmentation_heads +from official.vision.modeling.layers import nn_layers + + +def build_qat_classification_model( + model: tf.keras.Model, + quantization: common.Quantization, + input_specs: tf.keras.layers.InputSpec, + model_config: configs.image_classification.ImageClassificationModel, + l2_regularizer: tf.keras.regularizers.Regularizer = None +) -> tf.keras.Model: # pytype: disable=annotation-type-mismatch # typed-keras + """Applies model optimization techniques. + + Args: + model: The model to apply model optimization techniques to. + quantization: The Quantization config. + input_specs: `tf.keras.layers.InputSpec` specs of the input tensor. + model_config: The model config. + l2_regularizer: tf.keras.regularizers.Regularizer object. Defaults to None. + + Returns: + model: The model with optimization techniques applied.
+ """ + original_checkpoint = quantization.pretrained_original_checkpoint + if original_checkpoint: + ckpt = tf.train.Checkpoint( + model=model, + **model.checkpoint_items) + status = ckpt.read(original_checkpoint) + status.expect_partial().assert_existing_objects_matched() + + scope_dict = { + 'L2': tf.keras.regularizers.l2, + } + with tfmot.quantization.keras.quantize_scope(scope_dict): + annotated_backbone = tfmot.quantization.keras.quantize_annotate_model( + model.backbone) + if quantization.change_num_bits: + backbone = tfmot.quantization.keras.quantize_apply( + annotated_backbone, + scheme=n_bit_schemes.DefaultNBitQuantizeScheme( + num_bits_weight=quantization.num_bits_weight, + num_bits_activation=quantization.num_bits_activation)) + else: + backbone = tfmot.quantization.keras.quantize_apply( + annotated_backbone, + scheme=schemes.Default8BitQuantizeScheme()) + + norm_activation_config = model_config.norm_activation + backbone_optimized_model = classification_model.ClassificationModel( + backbone=backbone, + num_classes=model_config.num_classes, + input_specs=input_specs, + dropout_rate=model_config.dropout_rate, + kernel_regularizer=l2_regularizer, + add_head_batch_norm=model_config.add_head_batch_norm, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon) + for from_layer, to_layer in zip( + model.layers, backbone_optimized_model.layers): + if from_layer != model.backbone: + to_layer.set_weights(from_layer.get_weights()) + + with tfmot.quantization.keras.quantize_scope(scope_dict): + def apply_quantization_to_dense(layer): + if isinstance(layer, (tf.keras.layers.Dense, + tf.keras.layers.Dropout, + tf.keras.layers.GlobalAveragePooling2D)): + return tfmot.quantization.keras.quantize_annotate_layer(layer) + return layer + + annotated_model = tf.keras.models.clone_model( + backbone_optimized_model, + clone_function=apply_quantization_to_dense, + ) + + if 
+ quantization.change_num_bits: + optimized_model = tfmot.quantization.keras.quantize_apply( + annotated_model, + scheme=n_bit_schemes.DefaultNBitQuantizeScheme( + num_bits_weight=quantization.num_bits_weight, + num_bits_activation=quantization.num_bits_activation)) + + else: + optimized_model = tfmot.quantization.keras.quantize_apply( + annotated_model) + + return optimized_model + + +def _clone_function_for_fpn(layer): + if isinstance(layer, ( + tf.keras.layers.BatchNormalization, + tf.keras.layers.experimental.SyncBatchNormalization)): + return tfmot.quantization.keras.quantize_annotate_layer( + qat_nn_layers.BatchNormalizationWrapper(layer), + qat_configs.Default8BitOutputQuantizeConfig()) + if isinstance(layer, tf.keras.layers.UpSampling2D): + return layer + return tfmot.quantization.keras.quantize_annotate_layer(layer) + + +def build_qat_retinanet( + model: tf.keras.Model, quantization: common.Quantization, + model_config: configs.retinanet.RetinaNet) -> tf.keras.Model: + """Applies quantization-aware training to a RetinaNet model. + + Args: + model: The model to apply quantization-aware training to. + quantization: The Quantization config. + model_config: The model config. + + Returns: + The model with quantization-aware training applied.
+ """ + + original_checkpoint = quantization.pretrained_original_checkpoint + if original_checkpoint is not None: + ckpt = tf.train.Checkpoint( + model=model, + **model.checkpoint_items) + status = ckpt.read(original_checkpoint) + status.expect_partial().assert_existing_objects_matched() + + scope_dict = { + 'L2': tf.keras.regularizers.l2, + 'BatchNormalizationWrapper': qat_nn_layers.BatchNormalizationWrapper, + } + with tfmot.quantization.keras.quantize_scope(scope_dict): + annotated_backbone = tfmot.quantization.keras.quantize_annotate_model( + model.backbone) + optimized_backbone = tfmot.quantization.keras.quantize_apply( + annotated_backbone, + scheme=schemes.Default8BitQuantizeScheme()) + decoder = model.decoder + if quantization.quantize_detection_decoder: + if not isinstance(decoder, fpn.FPN): + raise ValueError('Currently only supports FPN.') + + decoder = tf.keras.models.clone_model( + decoder, + clone_function=_clone_function_for_fpn, + ) + decoder = tfmot.quantization.keras.quantize_apply(decoder) + decoder = tfmot.quantization.keras.remove_input_range(decoder) + + head = model.head + if quantization.quantize_detection_head: + if not isinstance(head, dense_prediction_heads.RetinaNetHead): + raise ValueError('Currently only supports RetinaNetHead.') + head = ( + dense_prediction_heads_qat.RetinaNetHeadQuantized.from_config( + head.get_config())) + + optimized_model = retinanet_model.RetinaNetModel( + optimized_backbone, + decoder, + head, + model.detection_generator, + min_level=model_config.min_level, + max_level=model_config.max_level, + num_scales=model_config.anchor.num_scales, + aspect_ratios=model_config.anchor.aspect_ratios, + anchor_size=model_config.anchor.anchor_size) + + if quantization.quantize_detection_head: + # Call the model with dummy input to build the head part. 
+ dummy_input = tf.zeros([1] + model_config.input_size) + optimized_model(dummy_input, training=True) + helper.copy_original_weights(model.head, optimized_model.head) + return optimized_model + + +def build_qat_segmentation_model( + model: tf.keras.Model, quantization: common.Quantization, + input_specs: tf.keras.layers.InputSpec) -> tf.keras.Model: + """Applies quantization-aware training to a segmentation model. + + Args: + model: The model to apply quantization-aware training to. + quantization: The Quantization config. + input_specs: The shape specifications of the input tensor. + + Returns: + The model with quantization-aware training applied. + """ + + original_checkpoint = quantization.pretrained_original_checkpoint + if original_checkpoint is not None: + ckpt = tf.train.Checkpoint(model=model, **model.checkpoint_items) + status = ckpt.read(original_checkpoint) + status.expect_partial().assert_existing_objects_matched() + + # Build a quantization-compatible model. + model = qat_segmentation_model.SegmentationModelQuantized( + model.backbone, model.decoder, model.head, input_specs) + + scope_dict = { + 'L2': tf.keras.regularizers.l2, + } + + # Apply QAT to backbone (a tf.keras.Model) first. + with tfmot.quantization.keras.quantize_scope(scope_dict): + annotated_backbone = tfmot.quantization.keras.quantize_annotate_model( + model.backbone) + optimized_backbone = tfmot.quantization.keras.quantize_apply( + annotated_backbone, scheme=schemes.Default8BitQuantizeScheme()) + backbone_optimized_model = qat_segmentation_model.SegmentationModelQuantized( + optimized_backbone, model.decoder, model.head, input_specs) + + # Copy over the weights of all remaining (non-backbone) layers.
+ for from_layer, to_layer in zip(model.layers, + backbone_optimized_model.layers): + if from_layer != model.backbone: + to_layer.set_weights(from_layer.get_weights()) + + with tfmot.quantization.keras.quantize_scope(scope_dict): + + def apply_quantization_to_layers(layer): + if isinstance(layer, (segmentation_heads.SegmentationHead, + nn_layers.SpatialPyramidPooling, aspp.ASPP)): + return tfmot.quantization.keras.quantize_annotate_layer(layer) + return layer + + annotated_model = tf.keras.models.clone_model( + backbone_optimized_model, + clone_function=apply_quantization_to_layers, + ) + optimized_model = tfmot.quantization.keras.quantize_apply( + annotated_model, scheme=schemes.Default8BitQuantizeScheme()) + + return optimized_model diff --git a/official/projects/qat/vision/modeling/factory_test.py b/official/projects/qat/vision/modeling/factory_test.py new file mode 100644 index 0000000000000000000000000000000000000000..ae7aa90c7cbe9557841ec36df3a3190b0d6227b0 --- /dev/null +++ b/official/projects/qat/vision/modeling/factory_test.py @@ -0,0 +1,251 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for factory.py.""" + +# Import libraries + +from absl.testing import parameterized +import tensorflow as tf + +from official.projects.qat.vision.configs import common +from official.projects.qat.vision.modeling import factory as qat_factory +from official.projects.qat.vision.modeling.heads import dense_prediction_heads as qat_dense_prediction_heads +from official.vision.configs import backbones +from official.vision.configs import decoders +from official.vision.configs import image_classification as classification_cfg +from official.vision.configs import retinanet as retinanet_cfg +from official.vision.configs import semantic_segmentation as semantic_segmentation_cfg +from official.vision.modeling import factory +from official.vision.modeling.decoders import fpn +from official.vision.modeling.heads import dense_prediction_heads + + +class ClassificationModelBuilderTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + ('resnet', (224, 224), 5e-5), + ('resnet', (224, 224), None), + ('resnet', (None, None), 5e-5), + ('resnet', (None, None), None), + ('mobilenet', (224, 224), 5e-5), + ('mobilenet', (224, 224), None), + ('mobilenet', (None, None), 5e-5), + ('mobilenet', (None, None), None), + ) + def test_builder(self, backbone_type, input_size, weight_decay): + num_classes = 2 + input_specs = tf.keras.layers.InputSpec( + shape=[None, input_size[0], input_size[1], 3]) + model_config = classification_cfg.ImageClassificationModel( + num_classes=num_classes, + backbone=backbones.Backbone(type=backbone_type)) + l2_regularizer = ( + tf.keras.regularizers.l2(weight_decay) if weight_decay else None) + model = factory.build_classification_model( + input_specs=input_specs, + model_config=model_config, + l2_regularizer=l2_regularizer) + + quantization_config = common.Quantization() + _ = qat_factory.build_qat_classification_model( + model=model, + input_specs=input_specs, + quantization=quantization_config, + model_config=model_config, + 
l2_regularizer=l2_regularizer) + + +class RetinaNetBuilderTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + ('spinenet_mobile', 'identity', (640, 640), False, False), + ('spinenet_mobile', 'identity', (640, 640), True, False), + ('mobilenet', 'fpn', (640, 640), True, False), + ('mobilenet', 'fpn', (640, 640), True, True), + ) + def test_builder(self, + backbone_type, + decoder_type, + input_size, + quantize_detection_head, + quantize_detection_decoder): + num_classes = 2 + input_specs = tf.keras.layers.InputSpec( + shape=[None, input_size[0], input_size[1], 3]) + + if backbone_type == 'spinenet_mobile': + backbone_config = backbones.Backbone( + type=backbone_type, + spinenet_mobile=backbones.SpineNetMobile( + model_id='49', + stochastic_depth_drop_rate=0.2, + min_level=3, + max_level=7, + use_keras_upsampling_2d=True)) + elif backbone_type == 'mobilenet': + backbone_config = backbones.Backbone( + type=backbone_type, + mobilenet=backbones.MobileNet( + model_id='MobileNetV2', + filter_size_scale=1.0)) + else: + raise ValueError( + 'backbone_type {} is not supported'.format(backbone_type)) + + if decoder_type == 'identity': + decoder_config = decoders.Decoder(type=decoder_type) + elif decoder_type == 'fpn': + decoder_config = decoders.Decoder( + type=decoder_type, + fpn=decoders.FPN( + num_filters=128, + use_separable_conv=True, + use_keras_layer=True)) + else: + raise ValueError( + 'decoder_type {} is not supported'.format(decoder_type)) + + model_config = retinanet_cfg.RetinaNet( + num_classes=num_classes, + input_size=[input_size[0], input_size[1], 3], + backbone=backbone_config, + decoder=decoder_config, + head=retinanet_cfg.RetinaNetHead( + attribute_heads=None, + use_separable_conv=True)) + + l2_regularizer = tf.keras.regularizers.l2(5e-5) + # Build the original float32 retinanet model. 
+ model = factory.build_retinanet( + input_specs=input_specs, + model_config=model_config, + l2_regularizer=l2_regularizer) + + # Call the model with dummy input to build the head part. + dummy_input = tf.zeros([1] + model_config.input_size) + model(dummy_input, training=True) + + # Build the QAT model from the original model with quantization config. + qat_model = qat_factory.build_qat_retinanet( + model=model, + quantization=common.Quantization( + quantize_detection_decoder=quantize_detection_decoder, + quantize_detection_head=quantize_detection_head), + model_config=model_config) + + if quantize_detection_head: + # The head becomes a RetinaNetHeadQuantized when we apply quantization. + self.assertIsInstance(qat_model.head, + qat_dense_prediction_heads.RetinaNetHeadQuantized) + else: + # The head stays a RetinaNetHead if we don't quantize the head part. + self.assertIsInstance( + qat_model.head, dense_prediction_heads.RetinaNetHead) + self.assertNotIsInstance( + qat_model.head, qat_dense_prediction_heads.RetinaNetHeadQuantized) + + if decoder_type == 'fpn': + if quantize_detection_decoder: + # The FPN decoder becomes a generic Keras functional model after + # quantization is applied.
+ self.assertNotIsInstance(qat_model.decoder, fpn.FPN) + else: + self.assertIsInstance(qat_model.decoder, fpn.FPN) + + +class SegmentationModelBuilderTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + ('mobilenet', (512, 512), 5e-5),) + def test_deeplabv3_builder(self, backbone_type, input_size, weight_decay): + num_classes = 21 + input_specs = tf.keras.layers.InputSpec( + shape=[None, input_size[0], input_size[1], 3]) + model_config = semantic_segmentation_cfg.SemanticSegmentationModel( + num_classes=num_classes, + backbone=backbones.Backbone( + type=backbone_type, + mobilenet=backbones.MobileNet( + model_id='MobileNetV2', output_stride=16)), + decoder=decoders.Decoder( + type='aspp', + aspp=decoders.ASPP( + level=4, + num_filters=256, + dilation_rates=[], + spp_layer_version='v1', + output_tensor=True)), + head=semantic_segmentation_cfg.SegmentationHead( + level=4, + low_level=2, + num_convs=1, + upsample_factor=2, + use_depthwise_convolution=True)) + l2_regularizer = ( + tf.keras.regularizers.l2(weight_decay) if weight_decay else None) + model = factory.build_segmentation_model( + input_specs=input_specs, + model_config=model_config, + l2_regularizer=l2_regularizer) + quantization_config = common.Quantization() + _ = qat_factory.build_qat_segmentation_model( + model=model, quantization=quantization_config, input_specs=input_specs) + + @parameterized.parameters( + ('mobilenet', (512, 1024), 5e-5),) + def test_deeplabv3plus_builder(self, backbone_type, input_size, weight_decay): + num_classes = 19 + input_specs = tf.keras.layers.InputSpec( + shape=[None, input_size[0], input_size[1], 3]) + model_config = semantic_segmentation_cfg.SemanticSegmentationModel( + num_classes=num_classes, + backbone=backbones.Backbone( + type=backbone_type, + mobilenet=backbones.MobileNet( + model_id='MobileNetV2', + output_stride=16, + output_intermediate_endpoints=True)), + decoder=decoders.Decoder( + type='aspp', + aspp=decoders.ASPP( + level=4, + 
num_filters=256, + dilation_rates=[], + pool_kernel_size=[512, 1024], + use_depthwise_convolution=False, + spp_layer_version='v1', + output_tensor=True)), + head=semantic_segmentation_cfg.SegmentationHead( + level=4, + num_convs=2, + feature_fusion='deeplabv3plus', + use_depthwise_convolution=True, + low_level='2/depthwise', + low_level_num_filters=48, + prediction_kernel_size=1, + upsample_factor=1, + num_filters=256)) + l2_regularizer = ( + tf.keras.regularizers.l2(weight_decay) if weight_decay else None) + model = factory.build_segmentation_model( + input_specs=input_specs, + model_config=model_config, + l2_regularizer=l2_regularizer) + quantization_config = common.Quantization() + _ = qat_factory.build_qat_segmentation_model( + model=model, quantization=quantization_config, input_specs=input_specs) + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/qat/vision/modeling/heads/__init__.py b/official/projects/qat/vision/modeling/heads/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..a7216cab7dc4f51486c6ef516a081d841e1d174e --- /dev/null +++ b/official/projects/qat/vision/modeling/heads/__init__.py @@ -0,0 +1,18 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ + # Lint as: python3 +"""Heads package definition.""" + +from official.projects.qat.vision.modeling.heads.dense_prediction_heads import RetinaNetHeadQuantized diff --git a/official/projects/qat/vision/modeling/heads/dense_prediction_heads.py b/official/projects/qat/vision/modeling/heads/dense_prediction_heads.py new file mode 100644 index 0000000000000000000000000000000000000000..9ffe46f68284f24067506a6b81f8f8d3a5e3595f --- /dev/null +++ b/official/projects/qat/vision/modeling/heads/dense_prediction_heads.py @@ -0,0 +1,438 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains definitions of dense prediction heads.""" +from __future__ import annotations + +import copy +from typing import Any, Dict, List, Mapping, Optional, Union, Type + +# Import libraries + +import numpy as np +import tensorflow as tf + +import tensorflow_model_optimization as tfmot +from official.modeling import tf_utils +from official.projects.qat.vision.quantization import configs +from official.projects.qat.vision.quantization import helper + + +class SeparableConv2DQuantized(tf.keras.layers.Layer): + """Quantized SeparableConv2D.""" + + def __init__(self, + name: Optional[str] = None, + last_quantize: bool = False, + **conv_kwargs): + """Initializes a SeparableConv2DQuantized. + + Args: + name: The name of the layer. + last_quantize: A `bool` indicating whether to quantize the output.
+ **conv_kwargs: Keyword arguments to be used for conv and dwconv. + """ + + super().__init__(name=name) + self._conv_kwargs = copy.deepcopy(conv_kwargs) + self._name = name + self._last_quantize = last_quantize + + def build(self, input_shape: Union[tf.TensorShape, List[tf.TensorShape]]): + """Creates the child layers of the layer.""" + depthwise_conv2d_quantized = helper.quantize_wrapped_layer( + tf.keras.layers.DepthwiseConv2D, + configs.Default8BitConvQuantizeConfig( + ['depthwise_kernel'], [], True)) + conv2d_quantized = helper.quantize_wrapped_layer( + tf.keras.layers.Conv2D, + configs.Default8BitConvQuantizeConfig( + ['kernel'], [], self._last_quantize)) + + dwconv_kwargs = self._conv_kwargs.copy() + # Depthwise conv input filters are always equal to output filters. + # The filters argument is only needed for the point-wise conv2d op. + del dwconv_kwargs['filters'] + dwconv_kwargs.update({ + 'activation': None, + 'use_bias': False, + }) + self.dw_conv = depthwise_conv2d_quantized(name='dw', **dwconv_kwargs) + + conv_kwargs = self._conv_kwargs.copy() + conv_kwargs.update({ + 'kernel_size': (1, 1), + 'strides': (1, 1), + 'padding': 'valid', + 'groups': 1, + }) + + self.conv = conv2d_quantized(name='pw', **conv_kwargs) + + def call(self, inputs: tf.Tensor) -> tf.Tensor: + """Calls the separable conv layer.""" + x = self.dw_conv(inputs) + outputs = self.conv(x) + return outputs + + def get_config(self) -> Dict[str, Any]: + """Returns the config of the layer.""" + config = self._conv_kwargs.copy() + config.update({ + 'name': self._name, + 'last_quantize': self._last_quantize, + }) + return config + + @classmethod + def from_config( + cls: Type[SeparableConv2DQuantized], + config: Dict[str, Any]) -> SeparableConv2DQuantized: + """Creates a layer from its config.""" + return cls(**config) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class RetinaNetHeadQuantized(tf.keras.layers.Layer): + """Creates a RetinaNet quantized 
head.""" + + def
+ __init__( + self, + min_level: int, + max_level: int, + num_classes: int, + num_anchors_per_location: int, + num_convs: int = 4, + num_filters: int = 256, + attribute_heads: Optional[List[Dict[str, Any]]] = None, + use_separable_conv: bool = False, + activation: str = 'relu', + use_sync_bn: bool = False, + norm_momentum: float = 0.99, + norm_epsilon: float = 0.001, + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + num_params_per_anchor: int = 4, + share_classification_heads: bool = False, + **kwargs): + """Initializes a RetinaNet quantized head. + + Args: + min_level: An `int` number of minimum feature level. + max_level: An `int` number of maximum feature level. + num_classes: An `int` number of classes to predict. + num_anchors_per_location: An `int` number of anchors per pixel + location. + num_convs: An `int` number that represents the number of intermediate + conv layers before the prediction. + num_filters: An `int` number that represents the number of filters of the + intermediate conv layers. + attribute_heads: If not None, a list that contains a dict for each + additional attribute head. Each dict consists of 3 key-value pairs: + `name`, `type` ('regression' or 'classification'), and `size` (number + of predicted values for each instance). + use_separable_conv: A `bool` that indicates whether the separable + convolution layers are used. + activation: A `str` that indicates which activation is used, e.g. 'relu', + 'swish', etc. + use_sync_bn: A `bool` that indicates whether to use synchronized batch + normalization across different replicas. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default is None.
+ bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + num_params_per_anchor: Number of parameters required to specify an anchor + box. For example, `num_params_per_anchor` would be 4 for axis-aligned + anchor boxes specified by their y-centers, x-centers, heights, and + widths. + share_classification_heads: A `bool` that indicates whether to + share weights among the main and attribute classification heads. Not + used in the QAT model. + **kwargs: Additional keyword arguments to be passed. + """ + del share_classification_heads + + super().__init__(**kwargs) + self._config_dict = { + 'min_level': min_level, + 'max_level': max_level, + 'num_classes': num_classes, + 'num_anchors_per_location': num_anchors_per_location, + 'num_convs': num_convs, + 'num_filters': num_filters, + 'attribute_heads': attribute_heads, + 'use_separable_conv': use_separable_conv, + 'activation': activation, + 'use_sync_bn': use_sync_bn, + 'norm_momentum': norm_momentum, + 'norm_epsilon': norm_epsilon, + 'kernel_regularizer': kernel_regularizer, + 'bias_regularizer': bias_regularizer, + 'num_params_per_anchor': num_params_per_anchor, + } + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + self._activation = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(activation, use_keras_layer=True), + configs.Default8BitActivationQuantizeConfig()) + + def build(self, input_shape: Union[tf.TensorShape, List[tf.TensorShape]]): + """Creates the variables of the head.""" + if self._config_dict['use_separable_conv']: + conv_op = SeparableConv2DQuantized + else: + conv_op = helper.quantize_wrapped_layer( + tf.keras.layers.Conv2D, + configs.Default8BitConvQuantizeConfig( + ['kernel'], ['activation'], False)) + conv_kwargs = { + 'filters': self._config_dict['num_filters'], + 'kernel_size': 3, + 'padding': 'same', + 'bias_initializer': tf.zeros_initializer(), + 'bias_regularizer':
self._config_dict['bias_regularizer'], + } + if not self._config_dict['use_separable_conv']: + conv_kwargs.update({ + 'kernel_initializer': tf.keras.initializers.RandomNormal( + stddev=0.01), + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + }) + + base_bn_op = (tf.keras.layers.experimental.SyncBatchNormalization + if self._config_dict['use_sync_bn'] + else tf.keras.layers.BatchNormalization) + bn_op = helper.norm_by_activation( + self._config_dict['activation'], + helper.quantize_wrapped_layer( + base_bn_op, configs.Default8BitOutputQuantizeConfig()), + helper.quantize_wrapped_layer( + base_bn_op, configs.NoOpQuantizeConfig())) + + bn_kwargs = { + 'axis': self._bn_axis, + 'momentum': self._config_dict['norm_momentum'], + 'epsilon': self._config_dict['norm_epsilon'], + } + + # Class net. + self._cls_convs = [] + self._cls_norms = [] + for level in range( + self._config_dict['min_level'], self._config_dict['max_level'] + 1): + this_level_cls_norms = [] + for i in range(self._config_dict['num_convs']): + if level == self._config_dict['min_level']: + cls_conv_name = 'classnet-conv_{}'.format(i) + self._cls_convs.append(conv_op(name=cls_conv_name, **conv_kwargs)) + cls_norm_name = 'classnet-conv-norm_{}_{}'.format(level, i) + this_level_cls_norms.append(bn_op(name=cls_norm_name, **bn_kwargs)) + self._cls_norms.append(this_level_cls_norms) + + classifier_kwargs = { + 'filters': ( + self._config_dict['num_classes'] * + self._config_dict['num_anchors_per_location']), + 'kernel_size': 3, + 'padding': 'same', + 'bias_initializer': tf.constant_initializer(-np.log((1 - 0.01) / 0.01)), + 'bias_regularizer': self._config_dict['bias_regularizer'], + } + if not self._config_dict['use_separable_conv']: + classifier_kwargs.update({ + 'kernel_initializer': tf.keras.initializers.RandomNormal(stddev=1e-5), + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + }) + self._classifier = conv_op( + name='scores', last_quantize=True, **classifier_kwargs) + + # 
Box net. + self._box_convs = [] + self._box_norms = [] + for level in range( + self._config_dict['min_level'], self._config_dict['max_level'] + 1): + this_level_box_norms = [] + for i in range(self._config_dict['num_convs']): + if level == self._config_dict['min_level']: + box_conv_name = 'boxnet-conv_{}'.format(i) + self._box_convs.append(conv_op(name=box_conv_name, **conv_kwargs)) + box_norm_name = 'boxnet-conv-norm_{}_{}'.format(level, i) + this_level_box_norms.append(bn_op(name=box_norm_name, **bn_kwargs)) + self._box_norms.append(this_level_box_norms) + + box_regressor_kwargs = { + 'filters': (self._config_dict['num_params_per_anchor'] * + self._config_dict['num_anchors_per_location']), + 'kernel_size': 3, + 'padding': 'same', + 'bias_initializer': tf.zeros_initializer(), + 'bias_regularizer': self._config_dict['bias_regularizer'], + } + if not self._config_dict['use_separable_conv']: + box_regressor_kwargs.update({ + 'kernel_initializer': tf.keras.initializers.RandomNormal( + stddev=1e-5), + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + }) + self._box_regressor = conv_op( + name='boxes', last_quantize=True, **box_regressor_kwargs) + + # Attribute learning nets. + if self._config_dict['attribute_heads']: + self._att_predictors = {} + self._att_convs = {} + self._att_norms = {} + + for att_config in self._config_dict['attribute_heads']: + att_name = att_config['name'] + att_type = att_config['type'] + att_size = att_config['size'] + att_convs_i = [] + att_norms_i = [] + + # Build conv and norm layers. 
+ for level in range(self._config_dict['min_level'], + self._config_dict['max_level'] + 1): + this_level_att_norms = [] + for i in range(self._config_dict['num_convs']): + if level == self._config_dict['min_level']: + att_conv_name = '{}-conv_{}'.format(att_name, i) + att_convs_i.append(conv_op(name=att_conv_name, **conv_kwargs)) + att_norm_name = '{}-conv-norm_{}_{}'.format(att_name, level, i) + this_level_att_norms.append(bn_op(name=att_norm_name, **bn_kwargs)) + att_norms_i.append(this_level_att_norms) + self._att_convs[att_name] = att_convs_i + self._att_norms[att_name] = att_norms_i + + # Build the final prediction layer. + att_predictor_kwargs = { + 'filters': + (att_size * self._config_dict['num_anchors_per_location']), + 'kernel_size': 3, + 'padding': 'same', + 'bias_initializer': tf.zeros_initializer(), + 'bias_regularizer': self._config_dict['bias_regularizer'], + } + if att_type == 'regression': + att_predictor_kwargs.update( + {'bias_initializer': tf.zeros_initializer()}) + elif att_type == 'classification': + att_predictor_kwargs.update({ + 'bias_initializer': + tf.constant_initializer(-np.log((1 - 0.01) / 0.01)) + }) + else: + raise ValueError( + 'Attribute head type {} not supported.'.format(att_type)) + + if not self._config_dict['use_separable_conv']: + att_predictor_kwargs.update({ + 'kernel_initializer': + tf.keras.initializers.RandomNormal(stddev=1e-5), + 'kernel_regularizer': + self._config_dict['kernel_regularizer'], + }) + + self._att_predictors[att_name] = conv_op( + name='{}_attributes'.format(att_name), **att_predictor_kwargs) + + super().build(input_shape) + + def call(self, features: Mapping[str, tf.Tensor]): + """Forward pass of the RetinaNet quantized head. + + Args: + features: A `dict` of `tf.Tensor` where + - key: A `str` of the level of the multilevel features. + - values: A `tf.Tensor`, the feature map tensors, whose shape is + [batch, height_l, width_l, channels]. 
+ + Returns: + scores: A `dict` of `tf.Tensor` which includes scores of the predictions. + - key: A `str` of the level of the multilevel predictions. + - values: A `tf.Tensor` of the box scores predicted from a particular + feature level, whose shape is + [batch, height_l, width_l, num_classes * num_anchors_per_location]. + boxes: A `dict` of `tf.Tensor` which includes coordinates of the + predictions. + - key: A `str` of the level of the multilevel predictions. + - values: A `tf.Tensor` of the box coordinates predicted from a particular + feature level, whose shape is + [batch, height_l, width_l, + num_params_per_anchor * num_anchors_per_location]. + attributes: a dict of (attribute_name, attribute_prediction). Each + `attribute_prediction` is a dict of: + - key: `str`, the level of the multilevel predictions. + - values: `Tensor`, the attribute predictions from a particular feature + level, whose shape is + [batch, height_l, width_l, + attribute_size * num_anchors_per_location]. + Can be an empty dictionary if no attribute learning is required. + """ + scores = {} + boxes = {} + if self._config_dict['attribute_heads']: + attributes = { + att_config['name']: {} + for att_config in self._config_dict['attribute_heads'] + } + else: + attributes = {} + + for i, level in enumerate( + range(self._config_dict['min_level'], + self._config_dict['max_level'] + 1)): + this_level_features = features[str(level)] + + # class net. + x = this_level_features + for conv, norm in zip(self._cls_convs, self._cls_norms[i]): + x = conv(x) + x = norm(x) + x = self._activation(x) + scores[str(level)] = self._classifier(x) + + # box net. + x = this_level_features + for conv, norm in zip(self._box_convs, self._box_norms[i]): + x = conv(x) + x = norm(x) + x = self._activation(x) + boxes[str(level)] = self._box_regressor(x) + + # attribute nets.
+ if self._config_dict['attribute_heads']: + for att_config in self._config_dict['attribute_heads']: + att_name = att_config['name'] + x = this_level_features + for conv, norm in zip(self._att_convs[att_name], + self._att_norms[att_name][i]): + x = conv(x) + x = norm(x) + x = self._activation(x) + attributes[att_name][str(level)] = self._att_predictors[att_name](x) + + return scores, boxes, attributes + + def get_config(self): + return self._config_dict + + @classmethod + def from_config(cls, config): + return cls(**config) + diff --git a/official/projects/qat/vision/modeling/heads/dense_prediction_heads_test.py b/official/projects/qat/vision/modeling/heads/dense_prediction_heads_test.py new file mode 100644 index 0000000000000000000000000000000000000000..911fb17b56748dea043380fe3745613ebc500fbe --- /dev/null +++ b/official/projects/qat/vision/modeling/heads/dense_prediction_heads_test.py @@ -0,0 +1,92 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# Lint as: python3 +"""Tests for dense_prediction_heads.py.""" + +# Import libraries +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from official.projects.qat.vision.modeling.heads import dense_prediction_heads + + +class RetinaNetHeadQuantizedTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (False, False, False), + (False, True, False), + (True, False, True), + (True, True, True), + ) + def test_forward(self, use_separable_conv, use_sync_bn, has_att_heads): + if has_att_heads: + attribute_heads = [dict(name='depth', type='regression', size=1)] + else: + attribute_heads = None + + retinanet_head = dense_prediction_heads.RetinaNetHeadQuantized( + min_level=3, + max_level=4, + num_classes=3, + num_anchors_per_location=3, + num_convs=2, + num_filters=256, + attribute_heads=attribute_heads, + use_separable_conv=use_separable_conv, + activation='relu', + use_sync_bn=use_sync_bn, + norm_momentum=0.99, + norm_epsilon=0.001, + kernel_regularizer=None, + bias_regularizer=None, + ) + features = { + '3': np.random.rand(2, 128, 128, 16), + '4': np.random.rand(2, 64, 64, 16), + } + scores, boxes, attributes = retinanet_head(features) + self.assertAllEqual(scores['3'].numpy().shape, [2, 128, 128, 9]) + self.assertAllEqual(scores['4'].numpy().shape, [2, 64, 64, 9]) + self.assertAllEqual(boxes['3'].numpy().shape, [2, 128, 128, 12]) + self.assertAllEqual(boxes['4'].numpy().shape, [2, 64, 64, 12]) + if has_att_heads: + for att in attributes.values(): + self.assertAllEqual(att['3'].numpy().shape, [2, 128, 128, 3]) + self.assertAllEqual(att['4'].numpy().shape, [2, 64, 64, 3]) + + def test_serialize_deserialize(self): + retinanet_head = dense_prediction_heads.RetinaNetHeadQuantized( + min_level=3, + max_level=7, + num_classes=3, + num_anchors_per_location=9, + num_convs=2, + num_filters=16, + attribute_heads=None, + use_separable_conv=False, + activation='relu', + use_sync_bn=False, + norm_momentum=0.99, + 
+ norm_epsilon=0.001, + kernel_regularizer=None, + bias_regularizer=None, + ) + config = retinanet_head.get_config() + new_retinanet_head = ( + dense_prediction_heads.RetinaNetHeadQuantized.from_config(config)) + self.assertAllEqual( + retinanet_head.get_config(), new_retinanet_head.get_config()) + diff --git a/official/projects/qat/vision/modeling/layers/__init__.py b/official/projects/qat/vision/modeling/layers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..534843dd65845190da1dc2e7fa724cb6c9a7f14a --- /dev/null +++ b/official/projects/qat/vision/modeling/layers/__init__.py @@ -0,0 +1,19 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Layers package definition.""" + +from official.projects.qat.vision.modeling.layers.nn_blocks import BottleneckBlockQuantized +from official.projects.qat.vision.modeling.layers.nn_blocks import Conv2DBNBlockQuantized +from official.projects.qat.vision.modeling.layers.nn_blocks import InvertedBottleneckBlockQuantized diff --git a/official/projects/qat/vision/modeling/layers/nn_blocks.py b/official/projects/qat/vision/modeling/layers/nn_blocks.py new file mode 100644 index 0000000000000000000000000000000000000000..a5e2c14536321f8feb6c74dcfee38120db30df8b --- /dev/null +++ b/official/projects/qat/vision/modeling/layers/nn_blocks.py @@ -0,0 +1,717 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains quantized neural blocks for QAT.""" +from typing import Any, Dict, Optional, Sequence, Tuple, Union + +# Import libraries + +from absl import logging +import tensorflow as tf + +import tensorflow_model_optimization as tfmot +from official.modeling import tf_utils +from official.projects.qat.vision.modeling.layers import nn_layers as qat_nn_layers +from official.projects.qat.vision.quantization import configs +from official.projects.qat.vision.quantization import helper +from official.vision.modeling.layers import nn_layers + + +# This class is a copy of modeling.layers.nn_blocks.BottleneckBlock to apply
+@tf.keras.utils.register_keras_serializable(package='Vision') +class BottleneckBlockQuantized(tf.keras.layers.Layer): + """A quantized standard bottleneck block.""" + + def __init__(self, + filters: int, + strides: int, + dilation_rate: int = 1, + use_projection: bool = False, + se_ratio: Optional[float] = None, + resnetd_shortcut: bool = False, + stochastic_depth_drop_rate: Optional[float] = None, + kernel_initializer: str = 'VarianceScaling', + kernel_regularizer: tf.keras.regularizers.Regularizer = None, + bias_regularizer: tf.keras.regularizers.Regularizer = None, + activation: str = 'relu', + use_sync_bn: bool = False, + norm_momentum: float = 0.99, + norm_epsilon: float = 0.001, + bn_trainable: bool = True, # pytype: disable=annotation-type-mismatch # typed-keras + **kwargs): + """Initializes a standard bottleneck block with BN after convolutions. + + Args: + filters: An `int` number of filters for the first two convolutions. Note + that the third and final convolution will use 4 times as many filters. + strides: An `int` block stride. If greater than 1, this block will + ultimately downsample the input. + dilation_rate: An `int` dilation_rate of convolutions. Default to 1. + use_projection: A `bool` for whether this block should use a projection + shortcut (versus the default identity shortcut). This is usually `True` + for the first block of a block group, which may change the number of + filters and the resolution. + se_ratio: A `float` or None. Ratio of the Squeeze-and-Excitation layer. + resnetd_shortcut: A `bool`. If True, apply the resnetd style modification + to the shortcut connection. + stochastic_depth_drop_rate: A `float` or None. If not None, drop rate for + the stochastic depth layer. + kernel_initializer: A `str` of kernel_initializer for convolutional + layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default to None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2d. 
+ Default to None. + activation: A `str` name of the activation function. + use_sync_bn: A `bool`. If True, use synchronized batch normalization. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + bn_trainable: A `bool` that indicates whether batch norm layers should be + trainable. Default to True. + **kwargs: Additional keyword arguments to be passed. + """ + super(BottleneckBlockQuantized, self).__init__(**kwargs) + + self._filters = filters + self._strides = strides + self._dilation_rate = dilation_rate + self._use_projection = use_projection + self._se_ratio = se_ratio + self._resnetd_shortcut = resnetd_shortcut + self._use_sync_bn = use_sync_bn + self._activation = activation + self._stochastic_depth_drop_rate = stochastic_depth_drop_rate + self._kernel_initializer = kernel_initializer + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + + norm_layer = ( + tf.keras.layers.experimental.SyncBatchNormalization + if use_sync_bn else tf.keras.layers.BatchNormalization) + self._norm_with_quantize = helper.BatchNormalizationQuantized(norm_layer) + self._norm = helper.BatchNormalizationNoQuantized(norm_layer) + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + self._bn_trainable = bn_trainable + + def build(self, input_shape: Optional[Union[Sequence[int], tf.Tensor]]): + """Build variables and child layers to prepare for calling.""" + if self._use_projection: + if self._resnetd_shortcut: + self._shortcut0 = tf.keras.layers.AveragePooling2D( + pool_size=2, strides=self._strides, padding='same') + self._shortcut1 = helper.Conv2DQuantized( + filters=self._filters * 4, + kernel_size=1, + strides=1, + use_bias=False, + kernel_initializer=self._kernel_initializer, + 
kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=helper.NoOpActivation()) + else: + self._shortcut = helper.Conv2DQuantized( + filters=self._filters * 4, + kernel_size=1, + strides=self._strides, + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=helper.NoOpActivation()) + + self._norm0 = self._norm_with_quantize( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + + self._conv1 = helper.Conv2DQuantized( + filters=self._filters, + kernel_size=1, + strides=1, + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=helper.NoOpActivation()) + self._norm1 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + self._activation1 = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(self._activation, use_keras_layer=True), + configs.Default8BitActivationQuantizeConfig()) + + self._conv2 = helper.Conv2DQuantized( + filters=self._filters, + kernel_size=3, + strides=self._strides, + dilation_rate=self._dilation_rate, + padding='same', + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=helper.NoOpActivation()) + self._norm2 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + self._activation2 = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(self._activation, use_keras_layer=True), + configs.Default8BitActivationQuantizeConfig()) + + self._conv3 = helper.Conv2DQuantized( + filters=self._filters * 4, + kernel_size=1, + 
strides=1, + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=helper.NoOpActivation()) + self._norm3 = self._norm_with_quantize( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + self._activation3 = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(self._activation, use_keras_layer=True), + configs.Default8BitActivationQuantizeConfig()) + + if self._se_ratio and self._se_ratio > 0 and self._se_ratio <= 1: + self._squeeze_excitation = qat_nn_layers.SqueezeExcitationQuantized( + in_filters=self._filters * 4, + out_filters=self._filters * 4, + se_ratio=self._se_ratio, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + else: + self._squeeze_excitation = None + + if self._stochastic_depth_drop_rate: + self._stochastic_depth = nn_layers.StochasticDepth( + self._stochastic_depth_drop_rate) + else: + self._stochastic_depth = None + self._add = tfmot.quantization.keras.QuantizeWrapperV2( + tf.keras.layers.Add(), + configs.Default8BitQuantizeConfig([], [], True)) + + super(BottleneckBlockQuantized, self).build(input_shape) + + def get_config(self) -> Dict[str, Any]: + """Get a config of this layer.""" + config = { + 'filters': self._filters, + 'strides': self._strides, + 'dilation_rate': self._dilation_rate, + 'use_projection': self._use_projection, + 'se_ratio': self._se_ratio, + 'resnetd_shortcut': self._resnetd_shortcut, + 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'activation': self._activation, + 'use_sync_bn': self._use_sync_bn, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 
'bn_trainable': self._bn_trainable + } + base_config = super(BottleneckBlockQuantized, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call( + self, + inputs: tf.Tensor, + training: Optional[Union[bool, tf.Tensor]] = None) -> tf.Tensor: + """Run the BottleneckBlockQuantized logic.""" + shortcut = inputs + if self._use_projection: + if self._resnetd_shortcut: + shortcut = self._shortcut0(shortcut) + shortcut = self._shortcut1(shortcut) + else: + shortcut = self._shortcut(shortcut) + shortcut = self._norm0(shortcut) + + x = self._conv1(inputs) + x = self._norm1(x) + x = self._activation1(x) + + x = self._conv2(x) + x = self._norm2(x) + x = self._activation2(x) + + x = self._conv3(x) + x = self._norm3(x) + + if self._squeeze_excitation: + x = self._squeeze_excitation(x) + + if self._stochastic_depth: + x = self._stochastic_depth(x, training=training) + + x = self._add([x, shortcut]) + return self._activation3(x) + + +# This class is copied from modeling.backbones.mobilenet.Conv2DBNBlock and applies +# QAT. +@tf.keras.utils.register_keras_serializable(package='Vision') +class Conv2DBNBlockQuantized(tf.keras.layers.Layer): + """A quantized convolution block with batch normalization.""" + + def __init__( + self, + filters: int, + kernel_size: int = 3, + strides: int = 1, + use_bias: bool = False, + use_explicit_padding: bool = False, + activation: str = 'relu6', + kernel_initializer: str = 'VarianceScaling', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + use_normalization: bool = True, + use_sync_bn: bool = False, + norm_momentum: float = 0.99, + norm_epsilon: float = 0.001, + **kwargs): + """A convolution block with batch normalization. + + Args: + filters: An `int` number of filters for the convolution.
+ kernel_size: An `int` specifying the height and width of the 2D + convolution window. + strides: An `int` of block stride. If greater than 1, this block will + ultimately downsample the input. + use_bias: If True, use bias in the convolution layer. + use_explicit_padding: Use 'VALID' padding for convolutions, but prepad + inputs so that the output dimensions are the same as if 'SAME' padding + were used. + activation: A `str` name of the activation function. + kernel_initializer: A `str` for kernel initializer of convolutional + layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default to None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + Default to None. + use_normalization: If True, use batch normalization. + use_sync_bn: If True, use synchronized batch normalization. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + **kwargs: Additional keyword arguments to be passed. 
+ """ + super(Conv2DBNBlockQuantized, self).__init__(**kwargs) + self._filters = filters + self._kernel_size = kernel_size + self._strides = strides + self._activation = activation + self._use_bias = use_bias + self._use_explicit_padding = use_explicit_padding + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + self._use_normalization = use_normalization + self._use_sync_bn = use_sync_bn + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + + if use_explicit_padding and kernel_size > 1: + self._padding = 'valid' + else: + self._padding = 'same' + + norm_layer = ( + tf.keras.layers.experimental.SyncBatchNormalization + if use_sync_bn else tf.keras.layers.BatchNormalization) + self._norm_with_quantize = helper.BatchNormalizationQuantized(norm_layer) + self._norm = helper.BatchNormalizationNoQuantized(norm_layer) + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + + def get_config(self) -> Dict[str, Any]: + """Get a config of this layer.""" + config = { + 'filters': self._filters, + 'strides': self._strides, + 'kernel_size': self._kernel_size, + 'use_bias': self._use_bias, + 'use_explicit_padding': self._use_explicit_padding, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'activation': self._activation, + 'use_sync_bn': self._use_sync_bn, + 'use_normalization': self._use_normalization, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon + } + base_config = super(Conv2DBNBlockQuantized, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def build(self, input_shape: Optional[Union[Sequence[int], tf.Tensor]]): + """Build variables and child layers to prepare for calling.""" + if self._use_explicit_padding and self._kernel_size > 1: + 
padding_size = nn_layers.get_padding_for_kernel_size(self._kernel_size) + self._pad = tf.keras.layers.ZeroPadding2D(padding_size) + conv2d_quantized = ( + helper.Conv2DQuantized + if self._use_normalization else helper.Conv2DOutputQuantized) + + self._conv0 = conv2d_quantized( + filters=self._filters, + kernel_size=self._kernel_size, + strides=self._strides, + padding=self._padding, + use_bias=self._use_bias, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=helper.NoOpActivation()) + if self._use_normalization: + self._norm0 = helper.norm_by_activation(self._activation, + self._norm_with_quantize, + self._norm)( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + self._activation_layer = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(self._activation, use_keras_layer=True), + configs.Default8BitActivationQuantizeConfig()) + super(Conv2DBNBlockQuantized, self).build(input_shape) + + def call( + self, + inputs: tf.Tensor, + training: Optional[Union[bool, tf.Tensor]] = None) -> tf.Tensor: + """Run the Conv2DBNBlockQuantized logic.""" + if self._use_explicit_padding and self._kernel_size > 1: + inputs = self._pad(inputs) + x = self._conv0(inputs) + if self._use_normalization: + x = self._norm0(x) + return self._activation_layer(x) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class InvertedBottleneckBlockQuantized(tf.keras.layers.Layer): + """A quantized inverted bottleneck block.""" + + def __init__(self, + in_filters, + out_filters, + expand_ratio, + strides, + kernel_size=3, + se_ratio=None, + stochastic_depth_drop_rate=None, + kernel_initializer='VarianceScaling', + kernel_regularizer=None, + bias_regularizer=None, + activation='relu', + se_inner_activation='relu', + se_gating_activation='sigmoid', + se_round_down_protect=True, + expand_se_in_filters=False, + depthwise_activation=None,
use_sync_bn=False, + dilation_rate=1, + divisible_by=1, + regularize_depthwise=False, + use_depthwise=True, + use_residual=True, + norm_momentum=0.99, + norm_epsilon=0.001, + output_intermediate_endpoints=False, + **kwargs): + """Initializes an inverted bottleneck block with BN after convolutions. + + Args: + in_filters: An `int` number of filters of the input tensor. + out_filters: An `int` number of filters of the output tensor. + expand_ratio: An `int` of expand_ratio for an inverted bottleneck block. + strides: An `int` block stride. If greater than 1, this block will + ultimately downsample the input. + kernel_size: An `int` kernel_size of the depthwise conv layer. + se_ratio: A `float` or None. If not None, se ratio for the squeeze and + excitation layer. + stochastic_depth_drop_rate: A `float` or None. If not None, drop rate for + the stochastic depth layer. + kernel_initializer: A `str` of kernel_initializer for convolutional + layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default to None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + Default to None. + activation: A `str` name of the activation function. + se_inner_activation: A `str` name of squeeze-excitation inner activation. + se_gating_activation: A `str` name of squeeze-excitation gating + activation. + se_round_down_protect: A `bool` of whether to prevent rounding down by + more than 10% in the SE layer. + expand_se_in_filters: A `bool` of whether or not to expand in_filter in + the squeeze and excitation layer. + depthwise_activation: A `str` name of the activation function for + depthwise only. + use_sync_bn: A `bool`. If True, use synchronized batch normalization. + dilation_rate: An `int` that specifies the dilation rate to use for + dilated convolution. The same value is used for all spatial + dimensions. + divisible_by: An `int` that ensures all inner dimensions are divisible by + this number.
+ regularize_depthwise: A `bool` of whether or not to apply regularization on + the depthwise convolution kernel. + use_depthwise: A `bool` of whether to use a depthwise convolution. If False, + a fused convolution is used instead. + use_residual: A `bool` of whether to include a residual connection between + input and output. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + output_intermediate_endpoints: A `bool` of whether or not to output the + intermediate endpoints. + **kwargs: Additional keyword arguments to be passed. + """ + super(InvertedBottleneckBlockQuantized, self).__init__(**kwargs) + + self._in_filters = in_filters + self._out_filters = out_filters + self._expand_ratio = expand_ratio + self._strides = strides + self._kernel_size = kernel_size + self._se_ratio = se_ratio + self._divisible_by = divisible_by + self._stochastic_depth_drop_rate = stochastic_depth_drop_rate + self._dilation_rate = dilation_rate + self._use_sync_bn = use_sync_bn + self._regularize_depthwise = regularize_depthwise + self._use_depthwise = use_depthwise + self._use_residual = use_residual + self._activation = activation + self._se_inner_activation = se_inner_activation + self._se_gating_activation = se_gating_activation + self._se_round_down_protect = se_round_down_protect + self._depthwise_activation = depthwise_activation + self._kernel_initializer = kernel_initializer + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + self._expand_se_in_filters = expand_se_in_filters + self._output_intermediate_endpoints = output_intermediate_endpoints + + norm_layer = ( + tf.keras.layers.experimental.SyncBatchNormalization + if use_sync_bn else tf.keras.layers.BatchNormalization) + self._norm_with_quantize = helper.BatchNormalizationQuantized(norm_layer) + self._norm = helper.BatchNormalizationNoQuantized(norm_layer) + + if
tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + if not depthwise_activation: + self._depthwise_activation = activation + if regularize_depthwise: + self._depthsize_regularizer = kernel_regularizer + else: + self._depthsize_regularizer = None + + def build(self, input_shape: Optional[Union[Sequence[int], tf.Tensor]]): + """Build variables and child layers to prepare for calling.""" + expand_filters = self._in_filters + if self._expand_ratio > 1: + # First 1x1 conv for channel expansion. + expand_filters = nn_layers.make_divisible( + self._in_filters * self._expand_ratio, self._divisible_by) + + expand_kernel = 1 if self._use_depthwise else self._kernel_size + expand_stride = 1 if self._use_depthwise else self._strides + + self._conv0 = helper.Conv2DQuantized( + filters=expand_filters, + kernel_size=expand_kernel, + strides=expand_stride, + padding='same', + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=helper.NoOpActivation()) + self._norm0 = helper.norm_by_activation(self._activation, + self._norm_with_quantize, + self._norm)( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + self._activation_layer = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(self._activation, use_keras_layer=True), + configs.Default8BitActivationQuantizeConfig()) + if self._use_depthwise: + # Depthwise conv. 
+ self._conv1 = helper.DepthwiseConv2DQuantized( + kernel_size=(self._kernel_size, self._kernel_size), + strides=self._strides, + padding='same', + depth_multiplier=1, + dilation_rate=self._dilation_rate, + use_bias=False, + depthwise_initializer=self._kernel_initializer, + depthwise_regularizer=self._depthsize_regularizer, + bias_regularizer=self._bias_regularizer, + activation=helper.NoOpActivation()) + self._norm1 = helper.norm_by_activation(self._depthwise_activation, + self._norm_with_quantize, + self._norm)( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + self._depthwise_activation_layer = ( + tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(self._depthwise_activation, + use_keras_layer=True), + configs.Default8BitActivationQuantizeConfig())) + + # Squeeze and excitation. + if self._se_ratio and self._se_ratio > 0 and self._se_ratio <= 1: + logging.info('Use Squeeze and excitation.') + in_filters = self._in_filters + if self._expand_se_in_filters: + in_filters = expand_filters + self._squeeze_excitation = qat_nn_layers.SqueezeExcitationQuantized( + in_filters=in_filters, + out_filters=expand_filters, + se_ratio=self._se_ratio, + divisible_by=self._divisible_by, + round_down_protect=self._se_round_down_protect, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=self._se_inner_activation, + gating_activation=self._se_gating_activation) + else: + self._squeeze_excitation = None + + # Last 1x1 conv. 
+ self._conv2 = helper.Conv2DQuantized( + filters=self._out_filters, + kernel_size=1, + strides=1, + padding='same', + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=helper.NoOpActivation()) + self._norm2 = self._norm_with_quantize( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + + if self._stochastic_depth_drop_rate: + self._stochastic_depth = nn_layers.StochasticDepth( + self._stochastic_depth_drop_rate) + else: + self._stochastic_depth = None + self._add = tfmot.quantization.keras.QuantizeWrapperV2( + tf.keras.layers.Add(), + configs.Default8BitQuantizeConfig([], [], True)) + + super(InvertedBottleneckBlockQuantized, self).build(input_shape) + + def get_config(self) -> Dict[str, Any]: + """Get a config of this layer.""" + config = { + 'in_filters': self._in_filters, + 'out_filters': self._out_filters, + 'expand_ratio': self._expand_ratio, + 'strides': self._strides, + 'kernel_size': self._kernel_size, + 'se_ratio': self._se_ratio, + 'divisible_by': self._divisible_by, + 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'activation': self._activation, + 'se_inner_activation': self._se_inner_activation, + 'se_gating_activation': self._se_gating_activation, + 'se_round_down_protect': self._se_round_down_protect, + 'expand_se_in_filters': self._expand_se_in_filters, + 'depthwise_activation': self._depthwise_activation, + 'dilation_rate': self._dilation_rate, + 'use_sync_bn': self._use_sync_bn, + 'regularize_depthwise': self._regularize_depthwise, + 'use_depthwise': self._use_depthwise, + 'use_residual': self._use_residual, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'output_intermediate_endpoints': 
self._output_intermediate_endpoints + } + base_config = super(InvertedBottleneckBlockQuantized, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call( + self, + inputs: tf.Tensor, + training: Optional[Union[bool, tf.Tensor]] = None + ) -> Union[tf.Tensor, Tuple[tf.Tensor, Dict[str, tf.Tensor]]]: + """Run the InvertedBottleneckBlockQuantized logic.""" + endpoints = {} + shortcut = inputs + if self._expand_ratio > 1: + x = self._conv0(inputs) + x = self._norm0(x) + x = self._activation_layer(x) + else: + x = inputs + + if self._use_depthwise: + x = self._conv1(x) + x = self._norm1(x) + x = self._depthwise_activation_layer(x) + if self._output_intermediate_endpoints: + endpoints['depthwise'] = x + + if self._squeeze_excitation: + x = self._squeeze_excitation(x) + + x = self._conv2(x) + x = self._norm2(x) + + if (self._use_residual and self._in_filters == self._out_filters and + self._strides == 1): + if self._stochastic_depth: + x = self._stochastic_depth(x, training=training) + x = self._add([x, shortcut]) + + if self._output_intermediate_endpoints: + return x, endpoints + return x diff --git a/official/projects/qat/vision/modeling/layers/nn_blocks_test.py b/official/projects/qat/vision/modeling/layers/nn_blocks_test.py new file mode 100644 index 0000000000000000000000000000000000000000..be7389b7aedf2b62337bd84fd07d0db757738dfa --- /dev/null +++ b/official/projects/qat/vision/modeling/layers/nn_blocks_test.py @@ -0,0 +1,95 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for nn_blocks.""" + +from typing import Any, Iterable, Tuple +# Import libraries +from absl.testing import parameterized +import tensorflow as tf + +from tensorflow.python.distribute import combinations +from tensorflow.python.distribute import strategy_combinations +from official.projects.qat.vision.modeling.layers import nn_blocks + + +def distribution_strategy_combinations() -> Iterable[Tuple[Any, ...]]: + """Returns the combinations of end-to-end tests to run.""" + return combinations.combine( + distribution=[ + strategy_combinations.default_strategy, + strategy_combinations.cloud_tpu_strategy, + strategy_combinations.one_device_strategy_gpu, + ], + ) + + +class NNBlocksTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (nn_blocks.BottleneckBlockQuantized, 1, False, 0.0, None), + (nn_blocks.BottleneckBlockQuantized, 2, True, 0.2, 0.25), + ) + def test_bottleneck_block_creation(self, block_fn, strides, use_projection, + stochastic_depth_drop_rate, se_ratio): + input_size = 128 + filter_size = 256 + inputs = tf.keras.Input( + shape=(input_size, input_size, filter_size * 4), batch_size=1) + block = block_fn( + filter_size, + strides, + use_projection=use_projection, + se_ratio=se_ratio, + stochastic_depth_drop_rate=stochastic_depth_drop_rate) + + features = block(inputs) + + self.assertAllEqual( + [1, input_size // strides, input_size // strides, filter_size * 4], + features.shape.as_list()) + + @parameterized.parameters( + (nn_blocks.InvertedBottleneckBlockQuantized, 1, 1, None, None), + 
(nn_blocks.InvertedBottleneckBlockQuantized, 6, 1, None, None), + (nn_blocks.InvertedBottleneckBlockQuantized, 1, 2, None, None), + (nn_blocks.InvertedBottleneckBlockQuantized, 1, 1, 0.2, None), + (nn_blocks.InvertedBottleneckBlockQuantized, 1, 1, None, 0.2), + ) + def test_invertedbottleneck_block_creation( + self, block_fn, expand_ratio, strides, se_ratio, + stochastic_depth_drop_rate): + input_size = 128 + in_filters = 24 + out_filters = 40 + inputs = tf.keras.Input( + shape=(input_size, input_size, in_filters), batch_size=1) + block = block_fn( + in_filters=in_filters, + out_filters=out_filters, + expand_ratio=expand_ratio, + strides=strides, + se_ratio=se_ratio, + stochastic_depth_drop_rate=stochastic_depth_drop_rate, + output_intermediate_endpoints=False) + + features = block(inputs) + + self.assertAllEqual( + [1, input_size // strides, input_size // strides, out_filters], + features.shape.as_list()) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/qat/vision/modeling/layers/nn_layers.py b/official/projects/qat/vision/modeling/layers/nn_layers.py new file mode 100644 index 0000000000000000000000000000000000000000..139432de61f67325123065a9d07074a427962310 --- /dev/null +++ b/official/projects/qat/vision/modeling/layers/nn_layers.py @@ -0,0 +1,794 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Contains common building blocks for neural networks.""" + +import enum +from typing import Callable, Dict, List, Mapping, Optional, Sequence, Tuple, Union, Any + +import tensorflow as tf + +import tensorflow_model_optimization as tfmot +from official.modeling import tf_utils +from official.projects.qat.vision.quantization import configs +from official.projects.qat.vision.quantization import helper +from official.vision.modeling.decoders import aspp +from official.vision.modeling.layers import nn_layers + + +# Type annotations. +States = Dict[str, tf.Tensor] +Activation = Union[str, Callable] + + +# String constants. +class FeatureFusion(str, enum.Enum): + PYRAMID_FUSION = 'pyramid_fusion' + PANOPTIC_FPN_FUSION = 'panoptic_fpn_fusion' + DEEPLABV3PLUS = 'deeplabv3plus' + DEEPLABV3PLUS_SUM_TO_MERGE = 'deeplabv3plus_sum_to_merge' + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class SqueezeExcitationQuantized( + helper.LayerQuantizerHelper, + tf.keras.layers.Layer): + """Creates a squeeze and excitation layer.""" + + def __init__(self, + in_filters, + out_filters, + se_ratio, + divisible_by=1, + use_3d_input=False, + kernel_initializer='VarianceScaling', + kernel_regularizer=None, + bias_regularizer=None, + activation='relu', + gating_activation='sigmoid', + round_down_protect=True, + **kwargs): + """Initializes a squeeze and excitation layer. + + Args: + in_filters: An `int` number of filters of the input tensor. + out_filters: An `int` number of filters of the output tensor. + se_ratio: A `float` or None. If not None, se ratio for the squeeze and + excitation layer. + divisible_by: An `int` that ensures all inner dimensions are divisible by + this number. + use_3d_input: A `bool` of whether input is 2D or 3D image. + kernel_initializer: A `str` of kernel_initializer for convolutional + layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default to None. 
+ bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + Default to None. + activation: A `str` name of the activation function. + gating_activation: A `str` name of the activation function for the final + gating function. + round_down_protect: A `bool` of whether to prevent rounding down by more + than 10%. + **kwargs: Additional keyword arguments to be passed. + """ + super().__init__(**kwargs) + + self._in_filters = in_filters + self._out_filters = out_filters + self._se_ratio = se_ratio + self._divisible_by = divisible_by + self._round_down_protect = round_down_protect + self._use_3d_input = use_3d_input + self._activation = activation + self._gating_activation = gating_activation + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + if tf.keras.backend.image_data_format() == 'channels_last': + if not use_3d_input: + self._spatial_axis = [1, 2] + else: + self._spatial_axis = [1, 2, 3] + else: + if not use_3d_input: + self._spatial_axis = [2, 3] + else: + self._spatial_axis = [2, 3, 4] + + def _create_gating_activation_layer(self): + if self._gating_activation == 'hard_sigmoid': + # Convert hard_sigmoid activation to quantizable keras layers so each op + # can be properly quantized. + # Formula is hard_sigmoid(x) = relu6(x + 3) * 0.16667.
+ self._add_quantizer('add_three') + self._add_quantizer('divide_six') + self._relu6 = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation('relu6', use_keras_layer=True), + configs.Default8BitActivationQuantizeConfig()) + else: + self._gating_activation_layer = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation( + self._gating_activation, use_keras_layer=True), + configs.Default8BitActivationQuantizeConfig()) + + def _apply_gating_activation_layer( + self, x: tf.Tensor, training: bool) -> tf.Tensor: + if self._gating_activation == 'hard_sigmoid': + x = self._apply_quantizer('add_three', x + 3.0, training) + x = self._relu6(x) + x = self._apply_quantizer('divide_six', x * 0.16667, training) + else: + x = self._gating_activation_layer(x) + return x + + def build(self, input_shape): + num_reduced_filters = nn_layers.make_divisible( + max(1, int(self._in_filters * self._se_ratio)), + divisor=self._divisible_by, + round_down_protect=self._round_down_protect) + + self._se_reduce = helper.Conv2DQuantized( + filters=num_reduced_filters, + kernel_size=1, + strides=1, + padding='same', + use_bias=True, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=helper.NoOpActivation()) + + self._se_expand = helper.Conv2DOutputQuantized( + filters=self._out_filters, + kernel_size=1, + strides=1, + padding='same', + use_bias=True, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=helper.NoOpActivation()) + + self._multiply = tfmot.quantization.keras.QuantizeWrapperV2( + tf.keras.layers.Multiply(), + configs.Default8BitQuantizeConfig([], [], True)) + self._reduce_mean_quantizer = ( + tfmot.quantization.keras.quantizers.MovingAverageQuantizer( + num_bits=8, per_axis=False, symmetric=False,
narrow_range=False)) + self._reduce_mean_quantizer_vars = self._reduce_mean_quantizer.build( + None, 'reduce_mean_quantizer_vars', self) + + self._activation_layer = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(self._activation, use_keras_layer=True), + configs.Default8BitActivationQuantizeConfig()) + self._create_gating_activation_layer() + + self._build_quantizer_vars() + super().build(input_shape) + + def get_config(self): + config = { + 'in_filters': self._in_filters, + 'out_filters': self._out_filters, + 'se_ratio': self._se_ratio, + 'divisible_by': self._divisible_by, + 'use_3d_input': self._use_3d_input, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'activation': self._activation, + 'gating_activation': self._gating_activation, + 'round_down_protect': self._round_down_protect, + } + base_config = super().get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs, training=None): + x = tf.reduce_mean(inputs, self._spatial_axis, keepdims=True) + x = self._reduce_mean_quantizer( + x, training, self._reduce_mean_quantizer_vars) + x = self._activation_layer(self._se_reduce(x)) + x = self._apply_gating_activation_layer(self._se_expand(x), training) + x = self._multiply([x, inputs]) + return x + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class SegmentationHeadQuantized(tf.keras.layers.Layer): + """Creates a segmentation head.""" + + def __init__( + self, + num_classes: int, + level: Union[int, str], + num_convs: int = 2, + num_filters: int = 256, + use_depthwise_convolution: bool = False, + prediction_kernel_size: int = 1, + upsample_factor: int = 1, + feature_fusion: Optional[str] = None, + decoder_min_level: Optional[int] = None, + decoder_max_level: Optional[int] = None, + low_level: int = 2, + low_level_num_filters: int = 48, + num_decoder_filters: int = 256, + 
activation: str = 'relu', + use_sync_bn: bool = False, + norm_momentum: float = 0.99, + norm_epsilon: float = 0.001, + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + **kwargs): + """Initializes a segmentation head. + + Args: + num_classes: An `int` number of mask classification categories. The number + of classes does not include the background class. + level: An `int` or `str`, level to use to build the segmentation head. + num_convs: An `int` number of stacked convolutions before the last + prediction layer. + num_filters: An `int` number to specify the number of filters used. + Default is 256. + use_depthwise_convolution: A `bool` to specify whether to use depthwise + separable convolutions. + prediction_kernel_size: An `int` number to specify the kernel size of the + prediction layer. + upsample_factor: An `int` number to specify the upsampling factor to + generate a finer mask. Default 1 means no upsampling is applied. + feature_fusion: One of `deeplabv3plus`, `deeplabv3plus_sum_to_merge`, + `pyramid_fusion`, or None. If `deeplabv3plus`, features from + decoder_features[level] will be fused with low level feature maps from + the backbone. If `pyramid_fusion`, multiscale features will be resized and + fused at the target level. + decoder_min_level: An `int` of minimum level from decoder to use in + feature fusion. It is only used when feature_fusion is set to + `panoptic_fpn_fusion`. + decoder_max_level: An `int` of maximum level from decoder to use in + feature fusion. It is only used when feature_fusion is set to + `panoptic_fpn_fusion`. + low_level: An `int` of backbone level to be used for feature fusion. It is + used when feature_fusion is set to `deeplabv3plus`. + low_level_num_filters: An `int` of reduced number of filters for the low + level features before fusing them with higher level features. It is only + used when feature_fusion is set to `deeplabv3plus`.
+ num_decoder_filters: An `int` of number of filters in the decoder outputs. + It is only used when feature_fusion is set to `panoptic_fpn_fusion`. + activation: A `str` that indicates which activation is used, e.g. 'relu', + 'swish', etc. + use_sync_bn: A `bool` that indicates whether to use synchronized batch + normalization across different replicas. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default is None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + **kwargs: Additional keyword arguments to be passed. + """ + super().__init__(**kwargs) + + self._config_dict = { + 'num_classes': num_classes, + 'level': level, + 'num_convs': num_convs, + 'num_filters': num_filters, + 'use_depthwise_convolution': use_depthwise_convolution, + 'prediction_kernel_size': prediction_kernel_size, + 'upsample_factor': upsample_factor, + 'feature_fusion': feature_fusion, + 'decoder_min_level': decoder_min_level, + 'decoder_max_level': decoder_max_level, + 'low_level': low_level, + 'low_level_num_filters': low_level_num_filters, + 'num_decoder_filters': num_decoder_filters, + 'activation': activation, + 'use_sync_bn': use_sync_bn, + 'norm_momentum': norm_momentum, + 'norm_epsilon': norm_epsilon, + 'kernel_regularizer': kernel_regularizer, + 'bias_regularizer': bias_regularizer, + } + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + self._activation_layer = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(activation, use_keras_layer=True), + configs.Default8BitActivationQuantizeConfig()) + + def build(self, input_shape: Sequence[tf.TensorShape]): + """Creates the variables of the segmentation head.""" + # When input_shape is a list/tuple, the first corresponds to backbone + # features used 
for resizing the decoder features (the second) if feature + # fusion type is `deeplabv3plus`. + backbone_shape = input_shape[0] + use_depthwise_convolution = self._config_dict['use_depthwise_convolution'] + random_initializer = tf.keras.initializers.RandomNormal(stddev=0.01) + conv_kwargs = { + 'kernel_size': 3 if not use_depthwise_convolution else 1, + 'padding': 'same', + 'use_bias': False, + 'kernel_initializer': random_initializer, + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + } + + norm_layer = ( + tf.keras.layers.experimental.SyncBatchNormalization + if self._config_dict['use_sync_bn'] else + tf.keras.layers.BatchNormalization) + norm_with_quantize = helper.BatchNormalizationQuantized(norm_layer) + norm_no_quantize = helper.BatchNormalizationNoQuantized(norm_layer) + norm = helper.norm_by_activation(self._config_dict['activation'], + norm_with_quantize, norm_no_quantize) + + bn_kwargs = { + 'axis': self._bn_axis, + 'momentum': self._config_dict['norm_momentum'], + 'epsilon': self._config_dict['norm_epsilon'], + } + + if self._config_dict['feature_fusion'] in [ + FeatureFusion.DEEPLABV3PLUS, FeatureFusion.DEEPLABV3PLUS_SUM_TO_MERGE + ]: + # Deeplabv3+ feature fusion layers. + self._dlv3p_conv = helper.Conv2DQuantized( + kernel_size=1, + padding='same', + use_bias=False, + kernel_initializer=tf_utils.clone_initializer(random_initializer), + kernel_regularizer=self._config_dict['kernel_regularizer'], + name='segmentation_head_deeplabv3p_fusion_conv', + filters=self._config_dict['low_level_num_filters'], + activation=helper.NoOpActivation()) + + self._dlv3p_norm = norm( + name='segmentation_head_deeplabv3p_fusion_norm', **bn_kwargs) + + # Segmentation head layers. 
+ self._convs = [] + self._norms = [] + for i in range(self._config_dict['num_convs']): + if use_depthwise_convolution: + self._convs.append( + helper.DepthwiseConv2DQuantized( + name='segmentation_head_depthwise_conv_{}'.format(i), + kernel_size=3, + padding='same', + use_bias=False, + depthwise_initializer=tf_utils.clone_initializer( + random_initializer), + depthwise_regularizer=self._config_dict['kernel_regularizer'], + depth_multiplier=1, + activation=helper.NoOpActivation())) + norm_name = 'segmentation_head_depthwise_norm_{}'.format(i) + self._norms.append(norm(name=norm_name, **bn_kwargs)) + conv_name = 'segmentation_head_conv_{}'.format(i) + self._convs.append( + helper.Conv2DQuantized( + name=conv_name, + filters=self._config_dict['num_filters'], + activation=helper.NoOpActivation(), + **conv_kwargs)) + norm_name = 'segmentation_head_norm_{}'.format(i) + self._norms.append(norm(name=norm_name, **bn_kwargs)) + + self._classifier = helper.Conv2DOutputQuantized( + name='segmentation_output', + filters=self._config_dict['num_classes'], + kernel_size=self._config_dict['prediction_kernel_size'], + padding='same', + bias_initializer=tf.zeros_initializer(), + kernel_initializer=tf_utils.clone_initializer(random_initializer), + kernel_regularizer=self._config_dict['kernel_regularizer'], + bias_regularizer=self._config_dict['bias_regularizer'], + activation=helper.NoOpActivation()) + + self._upsampling_layer = helper.UpSampling2DQuantized( + size=(self._config_dict['upsample_factor'], + self._config_dict['upsample_factor']), + interpolation='nearest') + self._resizing_layer = helper.ResizingQuantized( + backbone_shape[1], backbone_shape[2], interpolation='bilinear') + + self._concat_layer = helper.ConcatenateQuantized(axis=self._bn_axis) + self._add_layer = tfmot.quantization.keras.QuantizeWrapperV2( + tf.keras.layers.Add(), configs.Default8BitQuantizeConfig([], [], True)) + + super().build(input_shape) + + def call(self, inputs: Tuple[Union[tf.Tensor, Mapping[str, 
tf.Tensor]], + Union[tf.Tensor, Mapping[str, tf.Tensor]]]): + """Forward pass of the segmentation head. + + It supports either a tuple of 2 tensors or a tuple of 2 dictionaries. The + first is backbone endpoints, and the second is decoder endpoints. When + inputs are tensors, they are from a single level of feature maps. When + inputs are dictionaries, they contain multiple levels of feature maps, where + the key is the level of the feature map. + + Args: + inputs: A tuple of 2 feature map tensors of shape + [batch, height_l, width_l, channels] or 2 dictionaries of tensors: + - key: A `str` of the level of the multilevel features. + - values: A `tf.Tensor` of the feature map tensors, whose shape is + [batch, height_l, width_l, channels]. + + Returns: + segmentation prediction mask: A `tf.Tensor` of the segmentation mask + scores predicted from input features. + """ + if self._config_dict['feature_fusion'] in ( + FeatureFusion.PYRAMID_FUSION, FeatureFusion.PANOPTIC_FPN_FUSION): + raise ValueError( + 'The feature fusion methods `pyramid_fusion` and ' + '`panoptic_fpn_fusion` are not supported in QAT.') + + backbone_output = inputs[0] + decoder_output = inputs[1] + if self._config_dict['feature_fusion'] in { + FeatureFusion.DEEPLABV3PLUS, FeatureFusion.DEEPLABV3PLUS_SUM_TO_MERGE + }: + # deeplabv3+ feature fusion.
+ x = decoder_output[str(self._config_dict['level'])] if isinstance( + decoder_output, dict) else decoder_output + y = backbone_output[str(self._config_dict['low_level'])] if isinstance( + backbone_output, dict) else backbone_output + y = self._dlv3p_norm(self._dlv3p_conv(y)) + y = self._activation_layer(y) + x = self._resizing_layer(x) + x = tf.cast(x, dtype=y.dtype) + if self._config_dict['feature_fusion'] == FeatureFusion.DEEPLABV3PLUS: + x = self._concat_layer([x, y]) + else: + x = self._add_layer([x, y]) + else: + x = decoder_output[str(self._config_dict['level'])] if isinstance( + decoder_output, dict) else decoder_output + + for conv, norm in zip(self._convs, self._norms): + x = conv(x) + x = norm(x) + x = self._activation_layer(x) + if self._config_dict['upsample_factor'] > 1: + # Use keras layer for nearest upsampling so it is QAT compatible. + x = self._upsampling_layer(x) + + return self._classifier(x) + + def get_config(self): + base_config = super().get_config() + return dict(list(base_config.items()) + list(self._config_dict.items())) + + @classmethod + def from_config(cls, config): + return cls(**config) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class SpatialPyramidPoolingQuantized(nn_layers.SpatialPyramidPooling): + """Implements the quantized Atrous Spatial Pyramid Pooling. 
+ + References: + [Rethinking Atrous Convolution for Semantic Image Segmentation]( + https://arxiv.org/pdf/1706.05587.pdf) + [Encoder-Decoder with Atrous Separable Convolution for Semantic Image + Segmentation](https://arxiv.org/pdf/1802.02611.pdf) + """ + + def __init__( + self, + output_channels: int, + dilation_rates: List[int], + pool_kernel_size: Optional[List[int]] = None, + use_sync_bn: bool = False, + batchnorm_momentum: float = 0.99, + batchnorm_epsilon: float = 0.001, + activation: str = 'relu', + dropout: float = 0.5, + kernel_initializer: str = 'GlorotUniform', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + interpolation: str = 'bilinear', + use_depthwise_convolution: bool = False, + **kwargs): + """Initializes `SpatialPyramidPoolingQuantized`. + + Args: + output_channels: Number of channels produced by SpatialPyramidPooling. + dilation_rates: A list of integers for parallel dilated conv. + pool_kernel_size: A list of integers or None. If None, global average + pooling is applied, otherwise an average pooling of pool_kernel_size is + applied. + use_sync_bn: A bool, whether or not to use sync batch normalization. + batchnorm_momentum: A float for the momentum in BatchNorm. Defaults to + 0.99. + batchnorm_epsilon: A float for the epsilon value in BatchNorm. Defaults to + 0.001. + activation: A `str` for type of activation to be used. Defaults to 'relu'. + dropout: A float for the dropout rate before output. Defaults to 0.5. + kernel_initializer: Kernel initializer for conv layers. Defaults to + `glorot_uniform`. + kernel_regularizer: Kernel regularizer for conv layers. Defaults to None. + interpolation: The interpolation method for upsampling. Defaults to + `bilinear`. + use_depthwise_convolution: If True, uses depthwise separable + convolutions in the spatial pyramid pooling. [Encoder-Decoder with + Atrous Separable Convolution for Semantic Image Segmentation]( + https://arxiv.org/pdf/1802.02611.pdf) + **kwargs: Other keyword arguments for the layer.
+ """ + super().__init__( + output_channels=output_channels, + dilation_rates=dilation_rates, + use_sync_bn=use_sync_bn, + batchnorm_momentum=batchnorm_momentum, + batchnorm_epsilon=batchnorm_epsilon, + activation=activation, + dropout=dropout, + kernel_initializer=kernel_initializer, + kernel_regularizer=kernel_regularizer, + interpolation=interpolation, + pool_kernel_size=pool_kernel_size, + use_depthwise_convolution=use_depthwise_convolution) + + self._activation_fn = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(activation, use_keras_layer=True), + configs.Default8BitActivationQuantizeConfig()) + self._activation_fn_no_quant = ( + tf_utils.get_activation(activation, use_keras_layer=True)) + + def build(self, input_shape): + height = input_shape[1] + width = input_shape[2] + channels = input_shape[3] + + norm_layer = ( + tf.keras.layers.experimental.SyncBatchNormalization + if self._use_sync_bn else tf.keras.layers.BatchNormalization) + norm_with_quantize = helper.BatchNormalizationQuantized(norm_layer) + norm_no_quantize = helper.BatchNormalizationNoQuantized(norm_layer) + norm = helper.norm_by_activation(self._activation, norm_with_quantize, + norm_no_quantize) + + self.aspp_layers = [] + + conv1 = helper.Conv2DQuantized( + filters=self._output_channels, + kernel_size=(1, 1), + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + use_bias=False, + activation=helper.NoOpActivation()) + norm1 = norm( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + + self.aspp_layers.append([conv1, norm1]) + + for dilation_rate in self._dilation_rates: + leading_layers = [] + kernel_size = (3, 3) + if self._use_depthwise_convolution: + leading_layers += [ + helper.DepthwiseConv2DOutputQuantized( + depth_multiplier=1, + kernel_size=kernel_size, + padding='same', + depthwise_regularizer=self._kernel_regularizer, + 
depthwise_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + dilation_rate=dilation_rate, + use_bias=False, + activation=helper.NoOpActivation()) + ] + kernel_size = (1, 1) + conv_dilation = leading_layers + [ + helper.Conv2DQuantized( + filters=self._output_channels, + kernel_size=kernel_size, + padding='same', + kernel_regularizer=self._kernel_regularizer, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + dilation_rate=dilation_rate, + use_bias=False, + activation=helper.NoOpActivation()) + ] + norm_dilation = norm( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + + self.aspp_layers.append(conv_dilation + [norm_dilation]) + + if self._pool_kernel_size is None: + pooling = [ + helper.GlobalAveragePooling2DQuantized(), + helper.ReshapeQuantized((1, 1, channels)) + ] + else: + pooling = [helper.AveragePooling2DQuantized(self._pool_kernel_size)] + + conv2 = helper.Conv2DQuantized( + filters=self._output_channels, + kernel_size=(1, 1), + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + use_bias=False, + activation=helper.NoOpActivation()) + norm2 = norm( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + + self.aspp_layers.append(pooling + [conv2, norm2]) + self._resizing_layer = helper.ResizingQuantized( + height, width, interpolation=self._interpolation) + + self._projection = [ + helper.Conv2DQuantized( + filters=self._output_channels, + kernel_size=(1, 1), + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + use_bias=False, + activation=helper.NoOpActivation()), + norm( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + ] + self._dropout_layer = tf.keras.layers.Dropout(rate=self._dropout) + self._concat_layer = 
helper.ConcatenateQuantized(axis=-1) + + def call(self, + inputs: tf.Tensor, + training: Optional[bool] = None) -> tf.Tensor: + if training is None: + training = tf.keras.backend.learning_phase() + result = [] + for i, layers in enumerate(self.aspp_layers): + x = inputs + for layer in layers: + # Apply layers sequentially. + x = layer(x, training=training) + x = self._activation_fn(x) + + # Apply resize layer to the end of the last set of layers. + if i == len(self.aspp_layers) - 1: + x = self._resizing_layer(x) + + result.append(tf.cast(x, inputs.dtype)) + x = self._concat_layer(result) + for layer in self._projection: + x = layer(x, training=training) + x = self._activation_fn(x) + return self._dropout_layer(x) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class ASPPQuantized(aspp.ASPP): + """Creates a quantized Atrous Spatial Pyramid Pooling (ASPP) layer.""" + + def __init__( + self, + level: int, + dilation_rates: List[int], + num_filters: int = 256, + pool_kernel_size: Optional[int] = None, + use_sync_bn: bool = False, + norm_momentum: float = 0.99, + norm_epsilon: float = 0.001, + activation: str = 'relu', + dropout_rate: float = 0.0, + kernel_initializer: str = 'VarianceScaling', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + interpolation: str = 'bilinear', + use_depthwise_convolution: bool = False, + spp_layer_version: str = 'v1', + output_tensor: bool = True, + **kwargs): + """Initializes an Atrous Spatial Pyramid Pooling (ASPP) layer. + + Args: + level: An `int` level to apply ASPP. + dilation_rates: A `list` of dilation rates. + num_filters: An `int` number of output filters in ASPP. + pool_kernel_size: A `list` of [height, width] of pooling kernel size or + None. Pooling size is with respect to original image size, it will be + scaled down by 2**level. If None, global average pooling is used. + use_sync_bn: A `bool`. If True, use synchronized batch normalization. 
+ norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + activation: A `str` activation to be used in ASPP. + dropout_rate: A `float` rate for dropout regularization. + kernel_initializer: A `str` name of kernel_initializer for convolutional + layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default is None. + interpolation: A `str` of interpolation method. It should be one of + `bilinear`, `nearest`, `bicubic`, `area`, `lanczos3`, `lanczos5`, + `gaussian`, or `mitchellcubic`. + use_depthwise_convolution: If True, depthwise separable convolutions will + be added to the Atrous Spatial Pyramid Pooling. + spp_layer_version: A `str` of spatial pyramid pooling layer version. + output_tensor: Whether to output a single tensor or a dictionary of + tensors. Default is True. + **kwargs: Additional keyword arguments to be passed. + """ + super().__init__( + level=level, + dilation_rates=dilation_rates, + num_filters=num_filters, + pool_kernel_size=pool_kernel_size, + use_sync_bn=use_sync_bn, + norm_momentum=norm_momentum, + norm_epsilon=norm_epsilon, + activation=activation, + dropout_rate=dropout_rate, + kernel_initializer=kernel_initializer, + kernel_regularizer=kernel_regularizer, + interpolation=interpolation, + use_depthwise_convolution=use_depthwise_convolution, + spp_layer_version=spp_layer_version, + output_tensor=output_tensor, + **kwargs) + + self._aspp_layer = SpatialPyramidPoolingQuantized + + def call(self, inputs: Union[tf.Tensor, Mapping[str, + tf.Tensor]]) -> tf.Tensor: + """Calls the Atrous Spatial Pyramid Pooling (ASPP) layer on an input. + + If output_tensor is False, the output of ASPP is a dict of + {`level`: `tf.Tensor`} even if only one level is present, which keeps it + compatible with the rest of the segmentation model interfaces. If + output_tensor is True, a single tensor is output.
+ + Args: + inputs: A `tf.Tensor` of shape [batch, height_l, width_l, filter_size] or + a `dict` of `tf.Tensor` where + - key: A `str` of the level of the multilevel feature maps. + - values: A `tf.Tensor` of shape [batch, height_l, width_l, + filter_size]. + + Returns: + A `tf.Tensor` of shape [batch, height_l, width_l, filter_size] or a `dict` + of `tf.Tensor` where + - key: A `str` of the level of the multilevel feature maps. + - values: A `tf.Tensor` of output of ASPP module. + """ + level = str(self._config_dict['level']) + backbone_output = inputs[level] if isinstance(inputs, dict) else inputs + outputs = self.aspp(backbone_output) + return outputs if self._config_dict['output_tensor'] else {level: outputs} + + +class BatchNormalizationWrapper(tf.keras.layers.Wrapper): + """A wrapper that keeps a batch normalization layer from being folded. + + It inserts an identity 1x1 depthwise conv right before the normalization, + so the normalization op is folded into that identity depthwise conv layer. + + Note that it is only used when batch normalization folding does not work. + The normalization is then quantized as a 1x1 depthwise conv layer that + behaves the same as the normalization in inference mode (basically a + per-channel multiply and add). + """ + + def call(self, inputs: tf.Tensor, *args: Any, **kwargs: Any) -> tf.Tensor: + channels = tf.shape(inputs)[-1] + x = tf.nn.depthwise_conv2d( + inputs, tf.ones([1, 1, channels, 1]), [1, 1, 1, 1], 'VALID') + outputs = self.layer.call(x, *args, **kwargs) + return outputs diff --git a/official/projects/qat/vision/modeling/layers/nn_layers_test.py b/official/projects/qat/vision/modeling/layers/nn_layers_test.py new file mode 100644 index 0000000000000000000000000000000000000000..002a2e2920ec113e0d62e22ba251cd5a75dc2fcd --- /dev/null +++ b/official/projects/qat/vision/modeling/layers/nn_layers_test.py @@ -0,0 +1,107 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for nn_layers.""" + +# Import libraries +from absl.testing import parameterized +import tensorflow as tf + +from official.projects.qat.vision.modeling.layers import nn_layers + + +class NNLayersTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + ('deeplabv3plus', 1, 128, 128), + ('deeplabv3plus', 2, 128, 128), + ('deeplabv3', 1, 128, 64), + ('deeplabv3', 2, 128, 64), + ('deeplabv3plus_sum_to_merge', 1, 64, 128), + ('deeplabv3plus_sum_to_merge', 2, 64, 128), + ) + def test_segmentation_head_creation(self, feature_fusion, upsample_factor, + low_level_num_filters, expected_shape): + input_size = 128 + decoder_output_size = input_size // 2 + + decoder_output = tf.random.uniform( + (2, decoder_output_size, decoder_output_size, 64), dtype=tf.float32) + backbone_output = tf.random.uniform((2, input_size, input_size, 32), + dtype=tf.float32) + segmentation_head = nn_layers.SegmentationHeadQuantized( + num_classes=5, + level=4, + upsample_factor=upsample_factor, + low_level=2, + low_level_num_filters=low_level_num_filters, + feature_fusion=feature_fusion) + + features = segmentation_head((backbone_output, decoder_output)) + + self.assertAllEqual([ + 2, expected_shape * upsample_factor, expected_shape * upsample_factor, 5 + ], features.shape.as_list()) + + @parameterized.parameters( + (None, []), + (None, [6, 12, 18]), + ([32, 32], [6, 12, 18]), + ) + def
test_spatial_pyramid_pooling_creation(self, pool_kernel_size, + dilation_rates): + inputs = tf.keras.Input(shape=(64, 64, 128), dtype=tf.float32) + layer = nn_layers.SpatialPyramidPoolingQuantized( + output_channels=256, + dilation_rates=dilation_rates, + pool_kernel_size=pool_kernel_size) + output = layer(inputs) + self.assertAllEqual([None, 64, 64, 256], output.shape) + + @parameterized.parameters( + (3, [6, 12, 18, 24], 128), + (3, [6, 12, 18], 128), + (3, [6, 12], 256), + (4, [], 128), + (4, [6, 12, 18], 128), + (4, [], 256), + ) + def test_aspp_creation(self, level, dilation_rates, num_filters): + input_size = 128 // 2**level + tf.keras.backend.set_image_data_format('channels_last') + endpoints = tf.random.uniform( + shape=(2, input_size, input_size, 64), dtype=tf.float32) + + network = nn_layers.ASPPQuantized( + level=level, dilation_rates=dilation_rates, num_filters=num_filters) + + feats = network(endpoints) + + self.assertAllEqual([2, input_size, input_size, num_filters], + feats.shape.as_list()) + + @parameterized.parameters(False, True) + def test_bnorm_wrapper_creation(self, use_sync_bn): + inputs = tf.keras.Input(shape=(64, 64, 128), dtype=tf.float32) + if use_sync_bn: + norm = tf.keras.layers.experimental.SyncBatchNormalization(axis=-1) + else: + norm = tf.keras.layers.BatchNormalization(axis=-1) + layer = nn_layers.BatchNormalizationWrapper(norm) + output = layer(inputs) + self.assertAllEqual([None, 64, 64, 128], output.shape) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/qat/vision/modeling/segmentation_model.py b/official/projects/qat/vision/modeling/segmentation_model.py new file mode 100644 index 0000000000000000000000000000000000000000..99511e2275dd5d01b162b503567ea1fb9038e1db --- /dev/null +++ b/official/projects/qat/vision/modeling/segmentation_model.py @@ -0,0 +1,84 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Build segmentation models.""" +from typing import Any, Mapping, Union + +# Import libraries +import tensorflow as tf + +layers = tf.keras.layers + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class SegmentationModelQuantized(tf.keras.Model): + """A Segmentation class model. + + Input images are passed through backbone first. Decoder network is then + applied, and finally, segmentation head is applied on the output of the + decoder network. Layers such as ASPP should be part of decoder. Any feature + fusion is done as part of the segmentation head (i.e. deeplabv3+ feature + fusion is not part of the decoder, instead it is part of the segmentation + head). This way, different feature fusion techniques can be combined with + different backbones, and decoders. + """ + + def __init__(self, backbone: tf.keras.Model, decoder: tf.keras.layers.Layer, + head: tf.keras.layers.Layer, + input_specs: tf.keras.layers.InputSpec, **kwargs): + """Segmentation initialization function. + + Args: + backbone: a backbone network. + decoder: a decoder network. E.g. FPN. + head: segmentation head. + input_specs: The shape specifications of input tensor. + **kwargs: keyword arguments to be passed. 
+ """ + inputs = tf.keras.Input(shape=input_specs.shape[1:], name=input_specs.name) + backbone_features = backbone(inputs) + + if decoder: + backbone_feature = backbone_features[str(decoder.get_config()['level'])] + decoder_feature = decoder(backbone_feature) + else: + decoder_feature = backbone_features + + backbone_feature = backbone_features[str(head.get_config()['low_level'])] + x = {'logits': head((backbone_feature, decoder_feature))} + super().__init__(inputs=inputs, outputs=x, **kwargs) + self._config_dict = { + 'backbone': backbone, + 'decoder': decoder, + 'head': head, + } + self.backbone = backbone + self.decoder = decoder + self.head = head + + @property + def checkpoint_items( + self) -> Mapping[str, Union[tf.keras.Model, tf.keras.layers.Layer]]: + """Returns a dictionary of items to be additionally checkpointed.""" + items = dict(backbone=self.backbone, head=self.head) + if self.decoder is not None: + items.update(decoder=self.decoder) + return items + + def get_config(self) -> Mapping[str, Any]: + return self._config_dict + + @classmethod + def from_config(cls, config, custom_objects=None): + return cls(**config) diff --git a/official/projects/qat/vision/n_bit/__init__.py b/official/projects/qat/vision/n_bit/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..569b809ec7d5e74454e698639ab48b7c1822937b --- /dev/null +++ b/official/projects/qat/vision/n_bit/__init__.py @@ -0,0 +1,21 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""Configs package definition.""" + +from official.projects.qat.vision.n_bit import configs +from official.projects.qat.vision.n_bit import schemes +from official.projects.qat.vision.n_bit.nn_blocks import BottleneckBlockNBitQuantized +from official.projects.qat.vision.n_bit.nn_blocks import Conv2DBNBlockNBitQuantized +from official.projects.qat.vision.n_bit.nn_blocks import InvertedBottleneckBlockNBitQuantized diff --git a/official/projects/qat/vision/n_bit/configs.py b/official/projects/qat/vision/n_bit/configs.py new file mode 100644 index 0000000000000000000000000000000000000000..941c3690f34f2cb258af65be9226edddab40ec0e --- /dev/null +++ b/official/projects/qat/vision/n_bit/configs.py @@ -0,0 +1,380 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Default N-bit QuantizeConfigs.""" +from typing import Sequence, Callable, Tuple, Any, Dict + +import tensorflow as tf +import tensorflow_model_optimization as tfmot + + +Quantizer = tfmot.quantization.keras.quantizers.Quantizer +Layer = tf.keras.layers.Layer +Activation = Callable[[tf.Tensor], tf.Tensor] +WeightAndQuantizer = Tuple[tf.Variable, Quantizer] +ActivationAndQuantizer = Tuple[Activation, Quantizer] + + +class DefaultNBitOutputQuantizeConfig( + tfmot.quantization.keras.QuantizeConfig): + """QuantizeConfig which only quantizes the output from a layer.""" + + def __init__(self, num_bits_weight: int = 8, num_bits_activation: int = 8): + self._num_bits_weight = num_bits_weight + self._num_bits_activation = num_bits_activation + + def get_weights_and_quantizers( + self, layer: Layer) -> Sequence[WeightAndQuantizer]: + return [] + + def get_activations_and_quantizers( + self, layer: Layer) -> Sequence[ActivationAndQuantizer]: + return [] + + def set_quantize_weights(self, + layer: Layer, + quantize_weights: Sequence[tf.Tensor]): + pass + + def set_quantize_activations(self, + layer: Layer, + quantize_activations: Sequence[Activation]): + pass + + def get_output_quantizers(self, layer: Layer) -> Sequence[Quantizer]: + return [ + tfmot.quantization.keras.quantizers.MovingAverageQuantizer( + num_bits=self._num_bits_activation, per_axis=False, + symmetric=False, narrow_range=False) # activation/output + ] + + def get_config(self) -> Dict[str, Any]: + return { + 'num_bits_weight': self._num_bits_weight, + 'num_bits_activation': self._num_bits_activation, + } + + +class NoOpQuantizeConfig(tfmot.quantization.keras.QuantizeConfig): + """QuantizeConfig which does not quantize any part of the layer.""" + + def __init__(self, num_bits_weight: int = 8, num_bits_activation: int = 8): + self._num_bits_weight = num_bits_weight + self._num_bits_activation = num_bits_activation + + def get_weights_and_quantizers( + self, layer: Layer) -> Sequence[WeightAndQuantizer]: +
return [] + + def get_activations_and_quantizers( + self, layer: Layer) -> Sequence[ActivationAndQuantizer]: + return [] + + def set_quantize_weights( + self, + layer: Layer, + quantize_weights: Sequence[tf.Tensor]): + pass + + def set_quantize_activations( + self, + layer: Layer, + quantize_activations: Sequence[Activation]): + pass + + def get_output_quantizers(self, layer: Layer) -> Sequence[Quantizer]: + return [] + + def get_config(self) -> Dict[str, Any]: + return { + 'num_bits_weight': self._num_bits_weight, + 'num_bits_activation': self._num_bits_activation, + } + + +class DefaultNBitQuantizeConfig(tfmot.quantization.keras.QuantizeConfig): + """QuantizeConfig for non-recurrent Keras layers.""" + + def __init__(self, + weight_attrs: Sequence[str], + activation_attrs: Sequence[str], + quantize_output: bool, + num_bits_weight: int = 8, + num_bits_activation: int = 8): + """Initializes a default N-bit quantize config.""" + self.weight_attrs = weight_attrs + self.activation_attrs = activation_attrs + self.quantize_output = quantize_output + self._num_bits_weight = num_bits_weight + self._num_bits_activation = num_bits_activation + + # TODO(pulkitb): For some layers such as Conv2D, per_axis should be True. + # Add mapping for which layers support per_axis.
+    self.weight_quantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer(
+        num_bits=num_bits_weight, per_axis=False,
+        symmetric=True, narrow_range=True)  # weight
+    self.activation_quantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
+        num_bits=num_bits_activation, per_axis=False,
+        symmetric=False, narrow_range=False)  # activation/output
+
+  def get_weights_and_quantizers(
+      self, layer: Layer) -> Sequence[WeightAndQuantizer]:
+    """See base class."""
+    return [(getattr(layer, weight_attr), self.weight_quantizer)
+            for weight_attr in self.weight_attrs]
+
+  def get_activations_and_quantizers(
+      self, layer: Layer) -> Sequence[ActivationAndQuantizer]:
+    """See base class."""
+    return [(getattr(layer, activation_attr), self.activation_quantizer)
+            for activation_attr in self.activation_attrs]
+
+  def set_quantize_weights(
+      self,
+      layer: Layer,
+      quantize_weights: Sequence[tf.Tensor]):
+    """See base class."""
+    if len(self.weight_attrs) != len(quantize_weights):
+      raise ValueError(
+          '`set_quantize_weights` called on layer {} with {} '
+          'weight parameters, but layer expects {} values.'.format(
+              layer.name, len(quantize_weights), len(self.weight_attrs)))
+
+    for weight_attr, weight in zip(self.weight_attrs, quantize_weights):
+      current_weight = getattr(layer, weight_attr)
+      if current_weight.shape != weight.shape:
+        raise ValueError('Existing layer weight shape {} is incompatible with '
+                         'provided weight shape {}'.format(
+                             current_weight.shape, weight.shape))
+
+      setattr(layer, weight_attr, weight)
+
+  def set_quantize_activations(
+      self,
+      layer: Layer,
+      quantize_activations: Sequence[Activation]):
+    """See base class."""
+    if len(self.activation_attrs) != len(quantize_activations):
+      raise ValueError(
+          '`set_quantize_activations` called on layer {} with {} '
+          'activation parameters, but layer expects {} values.'.format(
+              layer.name, len(quantize_activations),
+              len(self.activation_attrs)))
+
+    for activation_attr, activation in
zip(
+        self.activation_attrs, quantize_activations):
+      setattr(layer, activation_attr, activation)
+
+  def get_output_quantizers(self, layer: Layer) -> Sequence[Quantizer]:
+    """See base class."""
+    if self.quantize_output:
+      return [self.activation_quantizer]
+    return []
+
+  @classmethod
+  def from_config(cls, config: Dict[str, Any]) -> object:
+    """Instantiates a `DefaultNBitQuantizeConfig` from its config.
+
+    Args:
+      config: Output of `get_config()`.
+
+    Returns:
+      A `DefaultNBitQuantizeConfig` instance.
+    """
+    return cls(**config)
+
+  def get_config(self) -> Dict[str, Any]:
+    """Get a config for this quantize config."""
+    # TODO(pulkitb): Add weight and activation quantizer to config.
+    # Currently it's created internally, but ideally the quantizers should be
+    # part of the constructor and passed in from the registry.
+    return {
+        'weight_attrs': self.weight_attrs,
+        'activation_attrs': self.activation_attrs,
+        'quantize_output': self.quantize_output,
+        'num_bits_weight': self._num_bits_weight,
+        'num_bits_activation': self._num_bits_activation
+    }
+
+  def __eq__(self, other):
+    if not isinstance(other, DefaultNBitQuantizeConfig):
+      return False
+
+    return (self.weight_attrs == other.weight_attrs and
+            self.activation_attrs == other.activation_attrs and
+            self.weight_quantizer == other.weight_quantizer and
+            self.activation_quantizer == other.activation_quantizer and
+            self.quantize_output == other.quantize_output)
+
+  def __ne__(self, other):
+    return not self.__eq__(other)
+
+
+class DefaultNBitConvWeightsQuantizer(
+    tfmot.quantization.keras.quantizers.LastValueQuantizer):
+  """Quantizer for handling weights in Conv2D/DepthwiseConv2D layers."""
+
+  def __init__(self, num_bits_weight: int = 8, num_bits_activation: int = 8):
+    """Construct LastValueQuantizer with params specific for TFLite Convs."""
+
+    super(DefaultNBitConvWeightsQuantizer, self).__init__(
+        num_bits=num_bits_weight, per_axis=True,
+        symmetric=True, narrow_range=True)  # weight
self._num_bits_weight = num_bits_weight + self._num_bits_activation = num_bits_activation + + def build(self, + tensor_shape: tf.TensorShape, + name: str, + layer: Layer): + """Build min/max quantization variables.""" + min_weight = layer.add_weight( + name + '_min', + shape=(tensor_shape[-1],), + initializer=tf.keras.initializers.Constant(-6.0), + trainable=False) + max_weight = layer.add_weight( + name + '_max', + shape=(tensor_shape[-1],), + initializer=tf.keras.initializers.Constant(6.0), + trainable=False) + + return {'min_var': min_weight, 'max_var': max_weight} + + +class NoQuantizer(tfmot.quantization.keras.quantizers.Quantizer): + """Dummy quantizer for explicitly not quantize.""" + + def __call__(self, inputs, training, weights, **kwargs): + return tf.identity(inputs) + + def get_config(self): + return {} + + def build(self, tensor_shape, name, layer): + return {} + + +class DefaultNBitConvQuantizeConfig(DefaultNBitQuantizeConfig): + """QuantizeConfig for Conv2D/DepthwiseConv2D layers.""" + + def __init__(self, + weight_attrs: Sequence[str], + activation_attrs: Sequence[str], + quantize_output: bool, + num_bits_weight: int = 8, + num_bits_activation: int = 8): + """Initializes default N-bit quantization config for the conv layer.""" + super().__init__(weight_attrs=weight_attrs, + activation_attrs=activation_attrs, + quantize_output=quantize_output, + num_bits_weight=num_bits_weight, + num_bits_activation=num_bits_activation) + self._num_bits_weight = num_bits_weight + self._num_bits_activation = num_bits_activation + + self.weight_quantizer = DefaultNBitConvWeightsQuantizer( + num_bits_weight=num_bits_weight, + num_bits_activation=num_bits_activation) + + +class DefaultNBitActivationQuantizeConfig( + tfmot.quantization.keras.QuantizeConfig): + """QuantizeConfig for keras.layers.Activation. + + `keras.layers.Activation` needs a separate `QuantizeConfig` since the + decision to quantize depends on the specific activation type. 
+  """
+
+  def __init__(self, num_bits_weight: int = 8, num_bits_activation: int = 8):
+    self._num_bits_weight = num_bits_weight
+    self._num_bits_activation = num_bits_activation
+
+  def _assert_activation_layer(self, layer: Layer):
+    if not isinstance(layer, tf.keras.layers.Activation):
+      raise RuntimeError(
+          'DefaultNBitActivationQuantizeConfig can only be used with '
+          '`keras.layers.Activation`.')
+
+  def get_weights_and_quantizers(
+      self, layer: Layer) -> Sequence[WeightAndQuantizer]:
+    """See base class."""
+    self._assert_activation_layer(layer)
+    return []
+
+  def get_activations_and_quantizers(
+      self, layer: Layer) -> Sequence[ActivationAndQuantizer]:
+    """See base class."""
+    self._assert_activation_layer(layer)
+    return []
+
+  def set_quantize_weights(
+      self,
+      layer: Layer,
+      quantize_weights: Sequence[tf.Tensor]):
+    """See base class."""
+    self._assert_activation_layer(layer)
+
+  def set_quantize_activations(
+      self,
+      layer: Layer,
+      quantize_activations: Sequence[Activation]):
+    """See base class."""
+    self._assert_activation_layer(layer)
+
+  def get_output_quantizers(self, layer: Layer) -> Sequence[Quantizer]:
+    """See base class."""
+    self._assert_activation_layer(layer)
+
+    if not hasattr(layer.activation, '__name__'):
+      raise ValueError('Activation {} not supported by '
+                       'DefaultNBitActivationQuantizeConfig.'.format(
+                           layer.activation))
+
+    # This code is copied from the TFMOT repo, with relu6 added for MobileNet.
+    if layer.activation.__name__ in ['relu', 'relu6', 'swish']:
+      # 'relu' should generally get fused into the previous layer.
+ return [tfmot.quantization.keras.quantizers.MovingAverageQuantizer( + num_bits=self._num_bits_activation, per_axis=False, + symmetric=False, narrow_range=False)] # activation/output + elif layer.activation.__name__ in ['linear', 'softmax', 'sigmoid']: + return [] + + raise ValueError('Activation {} not supported by ' + 'DefaultNBitActivationQuantizeConfig.'.format( + layer.activation)) + + def get_config(self) -> Dict[str, Any]: + """Get a config for this quantizer config.""" + return { + 'num_bits_weight': self._num_bits_weight, + 'num_bits_activation': self._num_bits_activation, + } + + +def _types_dict(): + return { + 'DefaultNBitOutputQuantizeConfig': + DefaultNBitOutputQuantizeConfig, + 'NoOpQuantizeConfig': + NoOpQuantizeConfig, + 'DefaultNBitQuantizeConfig': + DefaultNBitQuantizeConfig, + 'DefaultNBitConvWeightsQuantizer': + DefaultNBitConvWeightsQuantizer, + 'DefaultNBitConvQuantizeConfig': + DefaultNBitConvQuantizeConfig, + 'DefaultNBitActivationQuantizeConfig': + DefaultNBitActivationQuantizeConfig, + } diff --git a/official/projects/qat/vision/n_bit/configs_test.py b/official/projects/qat/vision/n_bit/configs_test.py new file mode 100644 index 0000000000000000000000000000000000000000..5390f8d9c47fccde8034604c53e192292c8ee321 --- /dev/null +++ b/official/projects/qat/vision/n_bit/configs_test.py @@ -0,0 +1,224 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+"""Tests for configs.py."""
+
+# Import libraries
+
+import numpy as np
+import tensorflow as tf
+
+import tensorflow_model_optimization as tfmot
+
+from official.projects.qat.vision.n_bit import configs
+
+
+class _TestHelper(object):
+
+  def _convert_list(self, list_of_tuples):
+    """Transforms a list of 2-tuples to a tuple of 2 lists.
+
+    `QuantizeConfig` methods return a list of 2-tuples in the form
+    [(weight1, quantizer1), (weight2, quantizer2)]. This function converts
+    it into a 2-tuple of lists: ([weight1, weight2], [quantizer1, quantizer2]).
+
+    Args:
+      list_of_tuples: List of 2-tuples.
+
+    Returns:
+      2-tuple of lists.
+    """
+    list1 = []
+    list2 = []
+    for a, b in list_of_tuples:
+      list1.append(a)
+      list2.append(b)
+
+    return list1, list2
+
+  # TODO(pulkitb): Consider asserting on full equality for quantizers.
+
+  def _assert_weight_quantizers(self, quantizer_list):
+    for quantizer in quantizer_list:
+      self.assertIsInstance(
+          quantizer,
+          tfmot.quantization.keras.quantizers.LastValueQuantizer)
+
+  def _assert_activation_quantizers(self, quantizer_list):
+    for quantizer in quantizer_list:
+      self.assertIsInstance(
+          quantizer,
+          tfmot.quantization.keras.quantizers.MovingAverageQuantizer)
+
+  def _assert_kernel_equality(self, a, b):
+    self.assertAllEqual(a.numpy(), b.numpy())
+
+
+class DefaultNBitQuantizeConfigTest(tf.test.TestCase, _TestHelper):
+
+  def _simple_dense_layer(self):
+    layer = tf.keras.layers.Dense(2)
+    layer.build(input_shape=(3,))
+    return layer
+
+  def testGetsQuantizeWeightsAndQuantizers(self):
+    layer = self._simple_dense_layer()
+    num_bits_weight = 4
+    num_bits_activation = 4
+
+    quantize_config = configs.DefaultNBitQuantizeConfig(
+        ['kernel'], ['activation'], False, num_bits_weight, num_bits_activation)
+    (weights, weight_quantizers) = self._convert_list(
+        quantize_config.get_weights_and_quantizers(layer))
+
+    self._assert_weight_quantizers(weight_quantizers)
+    self.assertEqual([layer.kernel], weights)
+
+  def
testGetsQuantizeActivationsAndQuantizers(self): + layer = self._simple_dense_layer() + num_bits_weight = 4 + num_bits_activation = 4 + + quantize_config = configs.DefaultNBitQuantizeConfig( + ['kernel'], ['activation'], False, num_bits_weight, num_bits_activation) + (activations, activation_quantizers) = self._convert_list( + quantize_config.get_activations_and_quantizers(layer)) + + self._assert_activation_quantizers(activation_quantizers) + self.assertEqual([layer.activation], activations) + + def testSetsQuantizeWeights(self): + layer = self._simple_dense_layer() + quantize_kernel = tf.keras.backend.variable( + np.ones(layer.kernel.shape.as_list())) + num_bits_weight = 4 + num_bits_activation = 4 + + quantize_config = configs.DefaultNBitQuantizeConfig( + ['kernel'], ['activation'], False, num_bits_weight, num_bits_activation) + quantize_config.set_quantize_weights(layer, [quantize_kernel]) + + self._assert_kernel_equality(layer.kernel, quantize_kernel) + + def testSetsQuantizeActivations(self): + layer = self._simple_dense_layer() + quantize_activation = tf.keras.activations.relu + num_bits_weight = 4 + num_bits_activation = 4 + + quantize_config = configs.DefaultNBitQuantizeConfig( + ['kernel'], ['activation'], False, num_bits_weight, num_bits_activation) + quantize_config.set_quantize_activations(layer, [quantize_activation]) + + self.assertEqual(layer.activation, quantize_activation) + + def testSetsQuantizeWeights_ErrorOnWrongNumberOfWeights(self): + layer = self._simple_dense_layer() + quantize_kernel = tf.keras.backend.variable( + np.ones(layer.kernel.shape.as_list())) + num_bits_weight = 4 + num_bits_activation = 4 + + quantize_config = configs.DefaultNBitQuantizeConfig( + ['kernel'], ['activation'], False, num_bits_weight, num_bits_activation) + + with self.assertRaises(ValueError): + quantize_config.set_quantize_weights(layer, []) + + with self.assertRaises(ValueError): + quantize_config.set_quantize_weights(layer, + [quantize_kernel, quantize_kernel]) 
+ + def testSetsQuantizeWeights_ErrorOnWrongShapeOfWeight(self): + layer = self._simple_dense_layer() + quantize_kernel = tf.keras.backend.variable(np.ones([1, 2])) + num_bits_weight = 4 + num_bits_activation = 4 + + quantize_config = configs.DefaultNBitQuantizeConfig( + ['kernel'], ['activation'], False, num_bits_weight, num_bits_activation) + + with self.assertRaises(ValueError): + quantize_config.set_quantize_weights(layer, [quantize_kernel]) + + def testSetsQuantizeActivations_ErrorOnWrongNumberOfActivations(self): + layer = self._simple_dense_layer() + quantize_activation = tf.keras.activations.relu + num_bits_weight = 4 + num_bits_activation = 4 + + quantize_config = configs.DefaultNBitQuantizeConfig( + ['kernel'], ['activation'], False, num_bits_weight, num_bits_activation) + + with self.assertRaises(ValueError): + quantize_config.set_quantize_activations(layer, []) + + with self.assertRaises(ValueError): + quantize_config.set_quantize_activations( + layer, [quantize_activation, quantize_activation]) + + def testGetsResultQuantizers_ReturnsQuantizer(self): + layer = self._simple_dense_layer() + num_bits_weight = 4 + num_bits_activation = 4 + quantize_config = configs.DefaultNBitQuantizeConfig( + [], [], True, num_bits_weight, num_bits_activation) + + output_quantizers = quantize_config.get_output_quantizers(layer) + + self.assertLen(output_quantizers, 1) + self._assert_activation_quantizers(output_quantizers) + + def testGetsResultQuantizers_EmptyWhenFalse(self): + layer = self._simple_dense_layer() + num_bits_weight = 4 + num_bits_activation = 4 + quantize_config = configs.DefaultNBitQuantizeConfig( + [], [], False, num_bits_weight, num_bits_activation) + + output_quantizers = quantize_config.get_output_quantizers(layer) + + self.assertEqual([], output_quantizers) + + def testSerialization(self): + num_bits_weight = 4 + num_bits_activation = 4 + quantize_config = configs.DefaultNBitQuantizeConfig( + ['kernel'], ['activation'], False, num_bits_weight, 
num_bits_activation) + + expected_config = { + 'class_name': 'DefaultNBitQuantizeConfig', + 'config': { + 'weight_attrs': ['kernel'], + 'activation_attrs': ['activation'], + 'quantize_output': False, + 'num_bits_weight': 4, + 'num_bits_activation': 4 + } + } + serialized_quantize_config = tf.keras.utils.serialize_keras_object( + quantize_config) + + self.assertEqual(expected_config, serialized_quantize_config) + + quantize_config_from_config = tf.keras.utils.deserialize_keras_object( + serialized_quantize_config, + module_objects=globals(), + custom_objects=configs._types_dict()) + + self.assertEqual(quantize_config, quantize_config_from_config) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/qat/vision/n_bit/nn_blocks.py b/official/projects/qat/vision/n_bit/nn_blocks.py new file mode 100644 index 0000000000000000000000000000000000000000..6f168fab7aa039eec43e7206e509c6a316abfaaa --- /dev/null +++ b/official/projects/qat/vision/n_bit/nn_blocks.py @@ -0,0 +1,799 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+"""Contains quantized neural blocks for the QAT."""
+from typing import Any, Dict, Optional, Sequence, Union
+
+# Import libraries
+
+from absl import logging
+import tensorflow as tf
+
+import tensorflow_model_optimization as tfmot
+from official.modeling import tf_utils
+from official.projects.qat.vision.n_bit import configs
+from official.projects.qat.vision.n_bit import nn_layers as qat_nn_layers
+from official.vision.modeling.layers import nn_layers
+
+
+class NoOpActivation:
+  """No-op activation which simply returns the incoming tensor.
+
+  This activation is needed to distinguish it from `keras.activations.linear`,
+  which does the same thing. The main difference is that NoOpActivation should
+  not have any quantize operation applied to it.
+  """
+
+  def __call__(self, x: tf.Tensor) -> tf.Tensor:
+    return x
+
+  def get_config(self) -> Dict[str, Any]:
+    """Get a config of this object."""
+    return {}
+
+  def __eq__(self, other: Any) -> bool:
+    if not other or not isinstance(other, NoOpActivation):
+      return False
+
+    return True
+
+  def __ne__(self, other: Any) -> bool:
+    return not self.__eq__(other)
+
+
+def _quantize_wrapped_layer(cls, quantize_config):
+  def constructor(*arg, **kwargs):
+    return tfmot.quantization.keras.QuantizeWrapperV2(
+        cls(*arg, **kwargs),
+        quantize_config)
+  return constructor
+
+
+# This class is copied from modeling.layers.nn_blocks.BottleneckBlock, with
+# QAT applied.
+@tf.keras.utils.register_keras_serializable(package='Vision')
+class BottleneckBlockNBitQuantized(tf.keras.layers.Layer):
+  """A quantized standard bottleneck block."""
+
+  def __init__(self,
+               filters: int,
+               strides: int,
+               dilation_rate: int = 1,
+               use_projection: bool = False,
+               se_ratio: Optional[float] = None,
+               resnetd_shortcut: bool = False,
+               stochastic_depth_drop_rate: Optional[float] = None,
+               kernel_initializer: str = 'VarianceScaling',
+               kernel_regularizer: tf.keras.regularizers.Regularizer = None,
+               bias_regularizer: tf.keras.regularizers.Regularizer = None,
+               activation: str = 'relu',
+               use_sync_bn: bool = False,
+               norm_momentum: float = 0.99,
+               norm_epsilon: float = 0.001,
+               bn_trainable: bool = True,
+               num_bits_weight: int = 8,
+               num_bits_activation: int = 8,  # pytype: disable=annotation-type-mismatch  # typed-keras
+               **kwargs):
+    """Initializes a standard bottleneck block with BN after convolutions.
+
+    Args:
+      filters: An `int` number of filters for the first two convolutions. Note
+        that the third and final convolution will use 4 times as many filters.
+      strides: An `int` block stride. If greater than 1, this block will
+        ultimately downsample the input.
+      dilation_rate: An `int` dilation rate of convolutions. Defaults to 1.
+      use_projection: A `bool` for whether this block should use a projection
+        shortcut (versus the default identity shortcut). This is usually `True`
+        for the first block of a block group, which may change the number of
+        filters and the resolution.
+      se_ratio: A `float` or None. Ratio of the Squeeze-and-Excitation layer.
+      resnetd_shortcut: A `bool`. If True, apply the resnetd style modification
+        to the shortcut connection.
+      stochastic_depth_drop_rate: A `float` or None. If not None, drop rate for
+        the stochastic depth layer.
+      kernel_initializer: A `str` of kernel_initializer for convolutional
+        layers.
+      kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for
+        Conv2D. Defaults to None.
+      bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D.
+        Defaults to None.
+      activation: A `str` name of the activation function.
+      use_sync_bn: A `bool`. If True, use synchronized batch normalization.
+      norm_momentum: A `float` of normalization momentum for the moving average.
+      norm_epsilon: A `float` added to variance to avoid dividing by zero.
+      bn_trainable: A `bool` that indicates whether batch norm layers should be
+        trainable. Defaults to True.
+      num_bits_weight: An `int` number of weight bits. Defaults to 8.
+      num_bits_activation: An `int` number of activation bits. Defaults to 8.
+      **kwargs: Additional keyword arguments to be passed.
+    """
+    super().__init__(**kwargs)
+
+    self._filters = filters
+    self._strides = strides
+    self._dilation_rate = dilation_rate
+    self._use_projection = use_projection
+    self._se_ratio = se_ratio
+    self._resnetd_shortcut = resnetd_shortcut
+    self._use_sync_bn = use_sync_bn
+    self._activation = activation
+    self._stochastic_depth_drop_rate = stochastic_depth_drop_rate
+    self._kernel_initializer = kernel_initializer
+    self._norm_momentum = norm_momentum
+    self._norm_epsilon = norm_epsilon
+    self._kernel_regularizer = kernel_regularizer
+    self._bias_regularizer = bias_regularizer
+    self._num_bits_weight = num_bits_weight
+    self._num_bits_activation = num_bits_activation
+    if use_sync_bn:
+      self._norm = _quantize_wrapped_layer(
+          tf.keras.layers.experimental.SyncBatchNormalization,
+          configs.NoOpQuantizeConfig())
+      self._norm_with_quantize = _quantize_wrapped_layer(
+          tf.keras.layers.experimental.SyncBatchNormalization,
+          configs.DefaultNBitOutputQuantizeConfig(
+              num_bits_weight=self._num_bits_weight,
+              num_bits_activation=self._num_bits_activation))
+    else:
+      self._norm = _quantize_wrapped_layer(
+          tf.keras.layers.BatchNormalization,
+          configs.NoOpQuantizeConfig())
+      self._norm_with_quantize = _quantize_wrapped_layer(
+          tf.keras.layers.BatchNormalization,
+          configs.DefaultNBitOutputQuantizeConfig(
num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + self._bn_trainable = bn_trainable + + def build(self, input_shape: Optional[Union[Sequence[int], tf.Tensor]]): + """Build variables and child layers to prepare for calling.""" + conv2d_quantized = _quantize_wrapped_layer( + tf.keras.layers.Conv2D, + configs.DefaultNBitConvQuantizeConfig( + ['kernel'], ['activation'], False, + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + if self._use_projection: + if self._resnetd_shortcut: + self._shortcut0 = tf.keras.layers.AveragePooling2D( + pool_size=2, strides=self._strides, padding='same') + self._shortcut1 = conv2d_quantized( + filters=self._filters * 4, + kernel_size=1, + strides=1, + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=NoOpActivation()) + else: + self._shortcut = conv2d_quantized( + filters=self._filters * 4, + kernel_size=1, + strides=self._strides, + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=NoOpActivation()) + + self._norm0 = self._norm_with_quantize( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + + self._conv1 = conv2d_quantized( + filters=self._filters, + kernel_size=1, + strides=1, + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=NoOpActivation()) + self._norm1 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + self._activation1 = 
tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(self._activation, use_keras_layer=True), + configs.DefaultNBitActivationQuantizeConfig( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + + self._conv2 = conv2d_quantized( + filters=self._filters, + kernel_size=3, + strides=self._strides, + dilation_rate=self._dilation_rate, + padding='same', + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=NoOpActivation()) + self._norm2 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + self._activation2 = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(self._activation, use_keras_layer=True), + configs.DefaultNBitActivationQuantizeConfig( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + + self._conv3 = conv2d_quantized( + filters=self._filters * 4, + kernel_size=1, + strides=1, + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=NoOpActivation()) + self._norm3 = self._norm_with_quantize( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + self._activation3 = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(self._activation, use_keras_layer=True), + configs.DefaultNBitActivationQuantizeConfig( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + + if self._se_ratio and self._se_ratio > 0 and self._se_ratio <= 1: + self._squeeze_excitation = qat_nn_layers.SqueezeExcitationNBitQuantized( + in_filters=self._filters * 4, + out_filters=self._filters * 4, + se_ratio=self._se_ratio, + 
kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation) + else: + self._squeeze_excitation = None + + if self._stochastic_depth_drop_rate: + self._stochastic_depth = nn_layers.StochasticDepth( + self._stochastic_depth_drop_rate) + else: + self._stochastic_depth = None + self._add = tfmot.quantization.keras.QuantizeWrapperV2( + tf.keras.layers.Add(), + configs.DefaultNBitQuantizeConfig( + [], [], True, + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + + super().build(input_shape) + + def get_config(self) -> Dict[str, Any]: + """Get a config of this layer.""" + config = { + 'filters': self._filters, + 'strides': self._strides, + 'dilation_rate': self._dilation_rate, + 'use_projection': self._use_projection, + 'se_ratio': self._se_ratio, + 'resnetd_shortcut': self._resnetd_shortcut, + 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'activation': self._activation, + 'use_sync_bn': self._use_sync_bn, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'bn_trainable': self._bn_trainable, + 'num_bits_weight': self._num_bits_weight, + 'num_bits_activation': self._num_bits_activation + } + base_config = super().get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call( + self, + inputs: tf.Tensor, + training: Optional[Union[bool, tf.Tensor]] = None) -> tf.Tensor: + """Run the BottleneckBlockQuantized logics.""" + shortcut = inputs + if self._use_projection: + if self._resnetd_shortcut: + shortcut = self._shortcut0(shortcut) + shortcut = self._shortcut1(shortcut) + else: + shortcut = self._shortcut(shortcut) + shortcut = self._norm0(shortcut) + + 
x = self._conv1(inputs) + x = self._norm1(x) + x = self._activation1(x) + + x = self._conv2(x) + x = self._norm2(x) + x = self._activation2(x) + + x = self._conv3(x) + x = self._norm3(x) + + if self._squeeze_excitation: + x = self._squeeze_excitation(x) + + if self._stochastic_depth: + x = self._stochastic_depth(x, training=training) + + x = self._add([x, shortcut]) + return self._activation3(x) + + +# This class is copied from modeling.backbones.mobilenet.Conv2DBNBlock and apply +# QAT. +@tf.keras.utils.register_keras_serializable(package='Vision') +class Conv2DBNBlockNBitQuantized(tf.keras.layers.Layer): + """A quantized convolution block with batch normalization.""" + + def __init__( + self, + filters: int, + kernel_size: int = 3, + strides: int = 1, + use_bias: bool = False, + activation: str = 'relu6', + kernel_initializer: str = 'VarianceScaling', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + use_normalization: bool = True, + use_sync_bn: bool = False, + norm_momentum: float = 0.99, + norm_epsilon: float = 0.001, + num_bits_weight: int = 8, + num_bits_activation: int = 8, + **kwargs): + """A convolution block with batch normalization. + + Args: + filters: An `int` number of filters for the first two convolutions. Note + that the third and final convolution will use 4 times as many filters. + kernel_size: An `int` specifying the height and width of the 2D + convolution window. + strides: An `int` of block stride. If greater than 1, this block will + ultimately downsample the input. + use_bias: If True, use bias in the convolution layer. + activation: A `str` name of the activation function. + kernel_initializer: A `str` for kernel initializer of convolutional + layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default to None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + Default to None. 
+      use_normalization: If True, use batch normalization.
+      use_sync_bn: If True, use synchronized batch normalization.
+      norm_momentum: A `float` of normalization momentum for the moving average.
+      norm_epsilon: A `float` added to variance to avoid dividing by zero.
+      num_bits_weight: An `int` number of weight bits. Defaults to 8.
+      num_bits_activation: An `int` number of activation bits. Defaults to 8.
+      **kwargs: Additional keyword arguments to be passed.
+    """
+    super().__init__(**kwargs)
+    self._filters = filters
+    self._kernel_size = kernel_size
+    self._strides = strides
+    self._activation = activation
+    self._use_bias = use_bias
+    self._kernel_initializer = kernel_initializer
+    self._kernel_regularizer = kernel_regularizer
+    self._bias_regularizer = bias_regularizer
+    self._use_normalization = use_normalization
+    self._use_sync_bn = use_sync_bn
+    self._norm_momentum = norm_momentum
+    self._norm_epsilon = norm_epsilon
+    self._num_bits_weight = num_bits_weight
+    self._num_bits_activation = num_bits_activation
+
+    if use_sync_bn:
+      self._norm = _quantize_wrapped_layer(
+          tf.keras.layers.experimental.SyncBatchNormalization,
+          configs.NoOpQuantizeConfig())
+    else:
+      self._norm = _quantize_wrapped_layer(
+          tf.keras.layers.BatchNormalization,
+          configs.NoOpQuantizeConfig())
+    if tf.keras.backend.image_data_format() == 'channels_last':
+      self._bn_axis = -1
+    else:
+      self._bn_axis = 1
+
+  def get_config(self) -> Dict[str, Any]:
+    """Get a config of this layer."""
+    config = {
+        'filters': self._filters,
+        'strides': self._strides,
+        'kernel_size': self._kernel_size,
+        'use_bias': self._use_bias,
+        'kernel_initializer': self._kernel_initializer,
+        'kernel_regularizer': self._kernel_regularizer,
+        'bias_regularizer': self._bias_regularizer,
+        'activation': self._activation,
+        'use_sync_bn': self._use_sync_bn,
+        'use_normalization': self._use_normalization,
+        'norm_momentum': self._norm_momentum,
+        'norm_epsilon': self._norm_epsilon,
+        'num_bits_weight':
self._num_bits_weight, + 'num_bits_activation': self._num_bits_activation + } + base_config = super().get_config() + return dict(list(base_config.items()) + list(config.items())) + + def build(self, input_shape: Optional[Union[Sequence[int], tf.Tensor]]): + """Build variables and child layers to prepare for calling.""" + conv2d_quantized = _quantize_wrapped_layer( + tf.keras.layers.Conv2D, + configs.DefaultNBitConvQuantizeConfig( + ['kernel'], ['activation'], False, + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + self._conv0 = conv2d_quantized( + filters=self._filters, + kernel_size=self._kernel_size, + strides=self._strides, + padding='same', + use_bias=self._use_bias, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=NoOpActivation()) + if self._use_normalization: + self._norm0 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + self._activation_layer = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(self._activation, use_keras_layer=True), + configs.DefaultNBitActivationQuantizeConfig( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + + super(Conv2DBNBlockNBitQuantized, self).build(input_shape) + + def call( + self, + inputs: tf.Tensor, + training: Optional[Union[bool, tf.Tensor]] = None) -> tf.Tensor: + """Run the Conv2DBNBlockNBitQuantized logic.""" + x = self._conv0(inputs) + if self._use_normalization: + x = self._norm0(x) + return self._activation_layer(x) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class InvertedBottleneckBlockNBitQuantized(tf.keras.layers.Layer): + """A quantized inverted bottleneck block.""" + + def __init__(self, + in_filters, + out_filters, + expand_ratio, + strides, + kernel_size=3, + se_ratio=None, + stochastic_depth_drop_rate=None, +
kernel_initializer='VarianceScaling', + kernel_regularizer=None, + bias_regularizer=None, + activation='relu', + se_inner_activation='relu', + se_gating_activation='sigmoid', + expand_se_in_filters=False, + depthwise_activation=None, + use_sync_bn=False, + dilation_rate=1, + divisible_by=1, + regularize_depthwise=False, + use_depthwise=True, + use_residual=True, + norm_momentum=0.99, + norm_epsilon=0.001, + num_bits_weight: int = 8, + num_bits_activation: int = 8, + **kwargs): + """Initializes an inverted bottleneck block with BN after convolutions. + + Args: + in_filters: An `int` number of filters of the input tensor. + out_filters: An `int` number of filters of the output tensor. + expand_ratio: An `int` of expand_ratio for an inverted bottleneck block. + strides: An `int` block stride. If greater than 1, this block will + ultimately downsample the input. + kernel_size: An `int` kernel_size of the depthwise conv layer. + se_ratio: A `float` or None. If not None, se ratio for the squeeze and + excitation layer. + stochastic_depth_drop_rate: A `float` or None. If not None, drop rate for + the stochastic depth layer. + kernel_initializer: A `str` of kernel_initializer for convolutional + layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default to None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2d. + Default to None. + activation: A `str` name of the activation function. + se_inner_activation: A `str` name of squeeze-excitation inner activation. + se_gating_activation: A `str` name of squeeze-excitation gating + activation. + expand_se_in_filters: A `bool` of whether or not to expand in_filter in + squeeze and excitation layer. + depthwise_activation: A `str` name of the activation function for + depthwise only. + use_sync_bn: A `bool`. If True, use synchronized batch normalization. + dilation_rate: An `int` that specifies the dilation rate to use for + dilated convolution. 
An `int` to specify the same value for all spatial + dimensions.
+ divisible_by: An `int` that ensures all inner dimensions are divisible by + this number. + regularize_depthwise: A `bool` of whether or not to apply regularization on + depthwise. + use_depthwise: A `bool` of whether to use depthwise convolutions. If False, + fused convolutions are used instead. + use_residual: A `bool` of whether to include residual connection between + input and output. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + num_bits_weight: An `int` number of bits for the weight. Default to 8. + num_bits_activation: An `int` number of bits for the activation. Default to 8. + **kwargs: Additional keyword arguments to be passed. + """ + super().__init__(**kwargs) + + self._in_filters = in_filters + self._out_filters = out_filters + self._expand_ratio = expand_ratio + self._strides = strides + self._kernel_size = kernel_size + self._se_ratio = se_ratio + self._divisible_by = divisible_by + self._stochastic_depth_drop_rate = stochastic_depth_drop_rate + self._dilation_rate = dilation_rate + self._use_sync_bn = use_sync_bn + self._regularize_depthwise = regularize_depthwise + self._use_depthwise = use_depthwise + self._use_residual = use_residual + self._activation = activation + self._se_inner_activation = se_inner_activation + self._se_gating_activation = se_gating_activation + self._depthwise_activation = depthwise_activation + self._kernel_initializer = kernel_initializer + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + self._expand_se_in_filters = expand_se_in_filters + self._num_bits_weight = num_bits_weight + self._num_bits_activation = num_bits_activation + + if use_sync_bn: + self._norm = _quantize_wrapped_layer( + tf.keras.layers.experimental.SyncBatchNormalization, +
configs.NoOpQuantizeConfig()) + self._norm_with_quantize = _quantize_wrapped_layer( + tf.keras.layers.experimental.SyncBatchNormalization, + configs.DefaultNBitOutputQuantizeConfig( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + else: + self._norm = _quantize_wrapped_layer( + tf.keras.layers.BatchNormalization, + configs.NoOpQuantizeConfig()) + self._norm_with_quantize = _quantize_wrapped_layer( + tf.keras.layers.BatchNormalization, + configs.DefaultNBitOutputQuantizeConfig( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + if not depthwise_activation: + self._depthwise_activation = activation + if regularize_depthwise: + self._depthsize_regularizer = kernel_regularizer + else: + self._depthsize_regularizer = None + + def build(self, input_shape: Optional[Union[Sequence[int], tf.Tensor]]): + """Build variables and child layers to prepare for calling.""" + conv2d_quantized = _quantize_wrapped_layer( + tf.keras.layers.Conv2D, + configs.DefaultNBitConvQuantizeConfig( + ['kernel'], ['activation'], False, + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + depthwise_conv2d_quantized = _quantize_wrapped_layer( + tf.keras.layers.DepthwiseConv2D, + configs.DefaultNBitConvQuantizeConfig( + ['depthwise_kernel'], ['activation'], False, + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + expand_filters = self._in_filters + if self._expand_ratio > 1: + # First 1x1 conv for channel expansion. 
+ expand_filters = nn_layers.make_divisible( + self._in_filters * self._expand_ratio, self._divisible_by) + + expand_kernel = 1 if self._use_depthwise else self._kernel_size + expand_stride = 1 if self._use_depthwise else self._strides + + self._conv0 = conv2d_quantized( + filters=expand_filters, + kernel_size=expand_kernel, + strides=expand_stride, + padding='same', + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=NoOpActivation()) + self._norm0 = self._norm_with_quantize( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + self._activation_layer = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(self._activation, use_keras_layer=True), + configs.DefaultNBitActivationQuantizeConfig( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + + if self._use_depthwise: + # Depthwise conv. + self._conv1 = depthwise_conv2d_quantized( + kernel_size=(self._kernel_size, self._kernel_size), + strides=self._strides, + padding='same', + depth_multiplier=1, + dilation_rate=self._dilation_rate, + use_bias=False, + depthwise_initializer=self._kernel_initializer, + depthwise_regularizer=self._depthsize_regularizer, + bias_regularizer=self._bias_regularizer, + activation=NoOpActivation()) + self._norm1 = self._norm_with_quantize( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + self._depthwise_activation_layer = ( + tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(self._depthwise_activation, + use_keras_layer=True), + configs.DefaultNBitActivationQuantizeConfig( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation))) + + # Squeeze and excitation. 
+ if self._se_ratio and self._se_ratio > 0 and self._se_ratio <= 1: + logging.info('Use Squeeze and excitation.') + in_filters = self._in_filters + if self._expand_se_in_filters: + in_filters = expand_filters + self._squeeze_excitation = qat_nn_layers.SqueezeExcitationNBitQuantized( + in_filters=in_filters, + out_filters=expand_filters, + se_ratio=self._se_ratio, + divisible_by=self._divisible_by, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=self._se_inner_activation, + gating_activation=self._se_gating_activation, + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation) + else: + self._squeeze_excitation = None + + # Last 1x1 conv. + self._conv2 = conv2d_quantized( + filters=self._out_filters, + kernel_size=1, + strides=1, + padding='same', + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=NoOpActivation()) + self._norm2 = self._norm_with_quantize( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + + if self._stochastic_depth_drop_rate: + self._stochastic_depth = nn_layers.StochasticDepth( + self._stochastic_depth_drop_rate) + else: + self._stochastic_depth = None + self._add = tf.keras.layers.Add() + + super().build(input_shape) + + def get_config(self) -> Dict[str, Any]: + """Get a config of this layer.""" + config = { + 'in_filters': self._in_filters, + 'out_filters': self._out_filters, + 'expand_ratio': self._expand_ratio, + 'strides': self._strides, + 'kernel_size': self._kernel_size, + 'se_ratio': self._se_ratio, + 'divisible_by': self._divisible_by, + 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 
'activation': self._activation, + 'se_inner_activation': self._se_inner_activation, + 'se_gating_activation': self._se_gating_activation, + 'expand_se_in_filters': self._expand_se_in_filters, + 'depthwise_activation': self._depthwise_activation, + 'dilation_rate': self._dilation_rate, + 'use_sync_bn': self._use_sync_bn, + 'regularize_depthwise': self._regularize_depthwise, + 'use_depthwise': self._use_depthwise, + 'use_residual': self._use_residual, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'num_bits_weight': self._num_bits_weight, + 'num_bits_activation': self._num_bits_activation + } + base_config = super().get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call( + self, + inputs: tf.Tensor, + training: Optional[Union[bool, tf.Tensor]] = None) -> tf.Tensor: + """Run the InvertedBottleneckBlockNBitQuantized logic.""" + shortcut = inputs + if self._expand_ratio > 1: + x = self._conv0(inputs) + x = self._norm0(x) + x = self._activation_layer(x) + else: + x = inputs + + if self._use_depthwise: + x = self._conv1(x) + x = self._norm1(x) + x = self._depthwise_activation_layer(x) + + if self._squeeze_excitation: + x = self._squeeze_excitation(x) + + x = self._conv2(x) + x = self._norm2(x) + + if (self._use_residual and + self._in_filters == self._out_filters and + self._strides == 1): + if self._stochastic_depth: + x = self._stochastic_depth(x, training=training) + x = self._add([x, shortcut]) + + return x diff --git a/official/projects/qat/vision/n_bit/nn_blocks_test.py b/official/projects/qat/vision/n_bit/nn_blocks_test.py new file mode 100644 index 0000000000000000000000000000000000000000..e5778b4414f4d1acbdd979d4cde11e8fc9fb33ff --- /dev/null +++ b/official/projects/qat/vision/n_bit/nn_blocks_test.py @@ -0,0 +1,99 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for nn_blocks.""" + +from typing import Any, Iterable, Tuple +# Import libraries +from absl.testing import parameterized +import tensorflow as tf + +from tensorflow.python.distribute import combinations +from tensorflow.python.distribute import strategy_combinations +from official.projects.qat.vision.n_bit import nn_blocks + + +def distribution_strategy_combinations() -> Iterable[Tuple[Any, ...]]: + """Returns the combinations of end-to-end tests to run.""" + return combinations.combine( + distribution=[ + strategy_combinations.default_strategy, + strategy_combinations.cloud_tpu_strategy, + strategy_combinations.one_device_strategy_gpu, + ], + ) + + +class NNBlocksTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (nn_blocks.BottleneckBlockNBitQuantized, 1, False, 0.0, None, 4, 4), + (nn_blocks.BottleneckBlockNBitQuantized, 2, True, 0.2, 0.25, 4, 4), + ) + def test_bottleneck_block_creation(self, block_fn, strides, use_projection, + stochastic_depth_drop_rate, se_ratio, + num_bits_weight, num_bits_activation): + input_size = 128 + filter_size = 256 + inputs = tf.keras.Input( + shape=(input_size, input_size, filter_size * 4), batch_size=1) + block = block_fn( + filter_size, + strides, + use_projection=use_projection, + se_ratio=se_ratio, + stochastic_depth_drop_rate=stochastic_depth_drop_rate, + num_bits_weight=num_bits_weight, + num_bits_activation=num_bits_activation) + + 
features = block(inputs) + + self.assertAllEqual( + [1, input_size // strides, input_size // strides, filter_size * 4], + features.shape.as_list()) + + @parameterized.parameters( + (nn_blocks.InvertedBottleneckBlockNBitQuantized, 1, 1, None, None, 4, 4), + (nn_blocks.InvertedBottleneckBlockNBitQuantized, 6, 1, None, None, 4, 4), + (nn_blocks.InvertedBottleneckBlockNBitQuantized, 1, 2, None, None, 4, 4), + (nn_blocks.InvertedBottleneckBlockNBitQuantized, 1, 1, 0.2, None, 4, 4), + (nn_blocks.InvertedBottleneckBlockNBitQuantized, 1, 1, None, 0.2, 4, 4), + ) + def test_invertedbottleneck_block_creation( + self, block_fn, expand_ratio, strides, se_ratio, + stochastic_depth_drop_rate, num_bits_weight, num_bits_activation): + input_size = 128 + in_filters = 24 + out_filters = 40 + inputs = tf.keras.Input( + shape=(input_size, input_size, in_filters), batch_size=1) + block = block_fn( + in_filters=in_filters, + out_filters=out_filters, + expand_ratio=expand_ratio, + strides=strides, + se_ratio=se_ratio, + stochastic_depth_drop_rate=stochastic_depth_drop_rate, + num_bits_weight=num_bits_weight, + num_bits_activation=num_bits_activation) + + features = block(inputs) + + self.assertAllEqual( + [1, input_size // strides, input_size // strides, out_filters], + features.shape.as_list()) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/qat/vision/n_bit/nn_layers.py b/official/projects/qat/vision/n_bit/nn_layers.py new file mode 100644 index 0000000000000000000000000000000000000000..feef66e7cd0280302c78835c837eafac4b373187 --- /dev/null +++ b/official/projects/qat/vision/n_bit/nn_layers.py @@ -0,0 +1,215 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains common building blocks for neural networks.""" + +from typing import Any, Callable, Dict, Union + +import tensorflow as tf +import tensorflow_model_optimization as tfmot + +from official.modeling import tf_utils +from official.projects.qat.vision.n_bit import configs +from official.vision.modeling.layers import nn_layers + +# Type annotations. +States = Dict[str, tf.Tensor] +Activation = Union[str, Callable] + + +class NoOpActivation: + """No-op activation which simply returns the incoming tensor. + + This activation is required to distinguish between `keras.activations.linear` + which does the same thing. The main difference is that NoOpActivation should + not have any quantize operation applied to it. 
+ """ + + def __call__(self, x: tf.Tensor) -> tf.Tensor: + return x + + def get_config(self) -> Dict[str, Any]: + """Get a config of this object.""" + return {} + + def __eq__(self, other: Any) -> bool: + return isinstance(other, NoOpActivation) + + def __ne__(self, other: Any) -> bool: + return not self.__eq__(other) + + +def _quantize_wrapped_layer(cls, quantize_config): + def constructor(*arg, **kwargs): + return tfmot.quantization.keras.QuantizeWrapperV2( + cls(*arg, **kwargs), + quantize_config) + return constructor + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class SqueezeExcitationNBitQuantized(tf.keras.layers.Layer): + """Creates a squeeze and excitation layer.""" + + def __init__(self, + in_filters, + out_filters, + se_ratio, + divisible_by=1, + use_3d_input=False, + kernel_initializer='VarianceScaling', + kernel_regularizer=None, + bias_regularizer=None, + activation='relu', + gating_activation='sigmoid', + num_bits_weight=8, + num_bits_activation=8, + **kwargs): + """Initializes a squeeze and excitation layer. + + Args: + in_filters: An `int` number of filters of the input tensor. + out_filters: An `int` number of filters of the output tensor. + se_ratio: A `float` or None. If not None, se ratio for the squeeze and + excitation layer. + divisible_by: An `int` that ensures all inner dimensions are divisible by + this number. + use_3d_input: A `bool` of whether input is 2D or 3D image. + kernel_initializer: A `str` of kernel_initializer for convolutional + layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default to None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2d. + Default to None. + activation: A `str` name of the activation function. + gating_activation: A `str` name of the activation function for final + gating function. + num_bits_weight: An `int` number of bits for the weight. Default to 8. + num_bits_activation: An `int` number of bits for the activation.
Default to 8. + **kwargs: Additional keyword arguments to be passed. + """ + super().__init__(**kwargs) + + self._in_filters = in_filters + self._out_filters = out_filters + self._se_ratio = se_ratio + self._divisible_by = divisible_by + self._use_3d_input = use_3d_input + self._activation = activation + self._gating_activation = gating_activation + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + self._num_bits_weight = num_bits_weight + self._num_bits_activation = num_bits_activation + if tf.keras.backend.image_data_format() == 'channels_last': + if not use_3d_input: + self._spatial_axis = [1, 2] + else: + self._spatial_axis = [1, 2, 3] + else: + if not use_3d_input: + self._spatial_axis = [2, 3] + else: + self._spatial_axis = [2, 3, 4] + self._activation_layer = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(activation, use_keras_layer=True), + configs.DefaultNBitActivationQuantizeConfig( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + self._gating_activation_layer = tfmot.quantization.keras.QuantizeWrapperV2( + tf_utils.get_activation(gating_activation, use_keras_layer=True), + configs.DefaultNBitActivationQuantizeConfig( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + + def build(self, input_shape): + conv2d_quantized = _quantize_wrapped_layer( + tf.keras.layers.Conv2D, + configs.DefaultNBitConvQuantizeConfig( + ['kernel'], ['activation'], False, + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + conv2d_quantized_output_quantized = _quantize_wrapped_layer( + tf.keras.layers.Conv2D, + configs.DefaultNBitConvQuantizeConfig( + ['kernel'], ['activation'], True, + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + num_reduced_filters = nn_layers.make_divisible( + max(1, 
int(self._in_filters * self._se_ratio)), + divisor=self._divisible_by) + + self._se_reduce = conv2d_quantized( + filters=num_reduced_filters, + kernel_size=1, + strides=1, + padding='same', + use_bias=True, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=NoOpActivation()) + + self._se_expand = conv2d_quantized_output_quantized( + filters=self._out_filters, + kernel_size=1, + strides=1, + padding='same', + use_bias=True, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=NoOpActivation()) + + self._multiply = tfmot.quantization.keras.QuantizeWrapperV2( + tf.keras.layers.Multiply(), + configs.DefaultNBitQuantizeConfig( + [], [], True, num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)) + self._reduce_mean_quantizer = ( + tfmot.quantization.keras.quantizers.MovingAverageQuantizer( + num_bits=self._num_bits_activation, per_axis=False, + symmetric=False, narrow_range=False)) # activation/output + self._reduce_mean_quantizer_vars = self._reduce_mean_quantizer.build( + None, 'reduce_mean_quantizer_vars', self) + + super().build(input_shape) + + def get_config(self): + config = { + 'in_filters': self._in_filters, + 'out_filters': self._out_filters, + 'se_ratio': self._se_ratio, + 'divisible_by': self._divisible_by, + 'use_3d_input': self._use_3d_input, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'activation': self._activation, + 'gating_activation': self._gating_activation, + 'num_bits_weight': self._num_bits_weight, + 'num_bits_activation': self._num_bits_activation + } + base_config = super().get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs, training=None): + x = 
tf.reduce_mean(inputs, self._spatial_axis, keepdims=True) + x = self._reduce_mean_quantizer( + x, training, self._reduce_mean_quantizer_vars) + x = self._activation_layer(self._se_reduce(x)) + x = self._gating_activation_layer(self._se_expand(x)) + x = self._multiply([x, inputs]) + return x diff --git a/official/projects/qat/vision/n_bit/schemes.py b/official/projects/qat/vision/n_bit/schemes.py new file mode 100644 index 0000000000000000000000000000000000000000..31661f89e23335f62be3f3677693cfcef09590d6 --- /dev/null +++ b/official/projects/qat/vision/n_bit/schemes.py @@ -0,0 +1,223 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Quantization schemes.""" +from typing import Type + +# Import libraries + +import tensorflow as tf + +import tensorflow_model_optimization as tfmot +from official.projects.qat.vision.n_bit import configs +from official.projects.qat.vision.n_bit import nn_blocks + +keras = tf.keras +default_n_bit_transforms = tfmot.quantization.keras.experimental.default_n_bit.default_n_bit_transforms +_LayerNode = tfmot.quantization.keras.graph_transformations.transforms.LayerNode +_LayerPattern = tfmot.quantization.keras.graph_transformations.transforms.LayerPattern +_ModelTransformer = tfmot.quantization.keras.graph_transformations.model_transformer.ModelTransformer + +_QUANTIZATION_WEIGHT_NAMES = [ + 'output_max', 'output_min', 'optimizer_step', + 'kernel_min', 'kernel_max', + 'depthwise_kernel_min', 'depthwise_kernel_max', + 'reduce_mean_quantizer_vars_min', 'reduce_mean_quantizer_vars_max'] + +_ORIGINAL_WEIGHT_NAME = [ + 'kernel', 'depthwise_kernel', + 'gamma', 'beta', 'moving_mean', 'moving_variance', + 'bias'] + + +class CustomLayerQuantize( + tfmot.quantization.keras.graph_transformations.transforms.Transform): + """Add QAT support for Keras Custom layer.""" + + def __init__(self, + original_layer_pattern: str, + quantized_layer_class: Type[keras.layers.Layer], + num_bits_weight: int = 8, + num_bits_activation: int = 8): + super().__init__() + self._original_layer_pattern = original_layer_pattern + self._quantized_layer_class = quantized_layer_class + self._num_bits_weight = num_bits_weight + self._num_bits_activation = num_bits_activation + + def pattern(self) -> _LayerPattern: + """See base class.""" + return _LayerPattern(self._original_layer_pattern) + + def _is_quantization_weight_name(self, name): + simple_name = name.split('/')[-1].split(':')[0] + if simple_name in _QUANTIZATION_WEIGHT_NAMES: + return True + if simple_name in _ORIGINAL_WEIGHT_NAME: + return False + raise ValueError(f'Variable name {simple_name} is not supported on ' + 
f'CustomLayerQuantize({self._original_layer_pattern}) ' + 'transform.') + + def replacement(self, match_layer: _LayerNode) -> _LayerNode: + """See base class.""" + bottleneck_layer = match_layer.layer + bottleneck_config = bottleneck_layer['config'] + bottleneck_config['num_bits_weight'] = self._num_bits_weight + bottleneck_config['num_bits_activation'] = self._num_bits_activation + bottleneck_names_and_weights = list(match_layer.names_and_weights) + quantized_layer = self._quantized_layer_class( + **bottleneck_config) + dummy_input_shape = [1, 1, 1, 1] + quantized_layer.compute_output_shape(dummy_input_shape) + quantized_names_and_weights = zip( + [weight.name for weight in quantized_layer.weights], + quantized_layer.get_weights()) + match_idx = 0 + names_and_weights = [] + for name_and_weight in quantized_names_and_weights: + if not self._is_quantization_weight_name(name=name_and_weight[0]): + name_and_weight = bottleneck_names_and_weights[match_idx] + match_idx = match_idx + 1 + names_and_weights.append(name_and_weight) + + if match_idx != len(bottleneck_names_and_weights): + raise ValueError('{}/{} of Bottleneck weights were transformed.'.format( + match_idx, len(bottleneck_names_and_weights))) + quantized_layer_config = keras.layers.serialize(quantized_layer) + quantized_layer_config['name'] = quantized_layer_config['config']['name'] + layer_metadata = { + 'quantize_config': + configs.DefaultNBitOutputQuantizeConfig( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation)} + + return _LayerNode( + quantized_layer_config, + metadata=layer_metadata, + names_and_weights=names_and_weights) + + +class QuantizeLayoutTransform( + tfmot.quantization.keras.QuantizeLayoutTransform): + """Default model transformations.""" + + def __init__(self, num_bits_weight: int = 8, num_bits_activation: int = 8): + self._num_bits_weight = num_bits_weight + self._num_bits_activation = num_bits_activation + + def apply(self, model, layer_quantize_map): +
"""Implement default N-bit transforms. + + Currently this means the following. + 1. Pull activations into layers, and fuse activations. (TODO) + 2. Modify range in incoming layers for Concat. (TODO) + 3. Fuse Conv2D/DepthwiseConv2D + BN into a single layer. + + Args: + model: Keras model to be quantized. + layer_quantize_map: Map with keys as layer names, and values as dicts + containing custom `QuantizeConfig`s which may have been passed with + layers. + + Returns: + (Transformed Keras model to better match TensorFlow Lite backend, updated + layer quantize map.) + """ + + transforms = [ + default_n_bit_transforms.InputLayerQuantize( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.SeparableConv1DQuantize( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.SeparableConvQuantize( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.Conv2DReshapeBatchNormReLUQuantize( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.Conv2DReshapeBatchNormActivationQuantize( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.Conv2DBatchNormReLUQuantize( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.Conv2DBatchNormActivationQuantize( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.Conv2DReshapeBatchNormQuantize( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.Conv2DBatchNormQuantize( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.ConcatTransform6Inputs( +
num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.ConcatTransform5Inputs( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.ConcatTransform4Inputs( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.ConcatTransform3Inputs( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.ConcatTransform( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.LayerReLUQuantize( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + default_n_bit_transforms.LayerReluActivationQuantize( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + CustomLayerQuantize( + 'Vision>BottleneckBlock', + nn_blocks.BottleneckBlockNBitQuantized, + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + CustomLayerQuantize( + 'Vision>InvertedBottleneckBlock', + nn_blocks.InvertedBottleneckBlockNBitQuantized, + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation), + CustomLayerQuantize( + 'Vision>Conv2DBNBlock', + nn_blocks.Conv2DBNBlockNBitQuantized, + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation) + ] + return _ModelTransformer(model, transforms, set(layer_quantize_map.keys()), + layer_quantize_map).transform() + + +class DefaultNBitQuantizeScheme(tfmot.quantization.keras.experimental + .default_n_bit.DefaultNBitQuantizeScheme): + """Default N-bit Scheme.""" + + def __init__(self, num_bits_weight: int = 8, num_bits_activation: int = 8): + super(DefaultNBitQuantizeScheme, self).__init__( + num_bits_weight=num_bits_weight, + num_bits_activation=num_bits_activation) + self._num_bits_weight = 
num_bits_weight + self._num_bits_activation = num_bits_activation + + def get_layout_transformer(self): + return QuantizeLayoutTransform( + num_bits_weight=self._num_bits_weight, + num_bits_activation=self._num_bits_activation) + diff --git a/official/projects/qat/vision/quantization/__init__.py b/official/projects/qat/vision/quantization/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..67c06b5c83222383a661fce9c58ce2a763e39c07 --- /dev/null +++ b/official/projects/qat/vision/quantization/__init__.py @@ -0,0 +1,15 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Configs package definition.""" diff --git a/official/projects/qat/vision/quantization/configs.py b/official/projects/qat/vision/quantization/configs.py new file mode 100644 index 0000000000000000000000000000000000000000..17eeb9c3fcc8a9216da1e5b1f0d73e65ed0b88cb --- /dev/null +++ b/official/projects/qat/vision/quantization/configs.py @@ -0,0 +1,337 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Default 8-bit QuantizeConfigs.""" +from typing import Sequence, Callable, Tuple, Any, Dict + +import tensorflow as tf +import tensorflow_model_optimization as tfmot + + +Quantizer = tfmot.quantization.keras.quantizers.Quantizer +Layer = tf.keras.layers.Layer +Activation = Callable[[tf.Tensor], tf.Tensor] +WeightAndQuantizer = Tuple[tf.Variable, Quantizer] +ActivationAndQuantizer = Tuple[Activation, Quantizer] + + +class Default8BitOutputQuantizeConfig(tfmot.quantization.keras.QuantizeConfig): + """QuantizeConfig which only quantizes the output from a layer.""" + + def get_weights_and_quantizers( + self, layer: Layer) -> Sequence[WeightAndQuantizer]: + return [] + + def get_activations_and_quantizers( + self, layer: Layer) -> Sequence[ActivationAndQuantizer]: + return [] + + def set_quantize_weights(self, + layer: Layer, + quantize_weights: Sequence[tf.Tensor]): + pass + + def set_quantize_activations(self, + layer: Layer, + quantize_activations: Sequence[Activation]): + pass + + def get_output_quantizers(self, layer: Layer) -> Sequence[Quantizer]: + return [ + tfmot.quantization.keras.quantizers.MovingAverageQuantizer( + num_bits=8, per_axis=False, symmetric=False, narrow_range=False) + ] + + def get_config(self) -> Dict[str, Any]: + return {} + + +class NoOpQuantizeConfig(tfmot.quantization.keras.QuantizeConfig): + """QuantizeConfig which does not quantize any part of the layer.""" + + def get_weights_and_quantizers( + self, layer: Layer) -> Sequence[WeightAndQuantizer]: + return [] + + def get_activations_and_quantizers( + self, layer: Layer) -> 
Sequence[ActivationAndQuantizer]: + return [] + + def set_quantize_weights( + self, + layer: Layer, + quantize_weights: Sequence[tf.Tensor]): + pass + + def set_quantize_activations( + self, + layer: Layer, + quantize_activations: Sequence[Activation]): + pass + + def get_output_quantizers(self, layer: Layer) -> Sequence[Quantizer]: + return [] + + def get_config(self) -> Dict[str, Any]: + return {} + + +class Default8BitQuantizeConfig(tfmot.quantization.keras.QuantizeConfig): + """QuantizeConfig for non recurrent Keras layers.""" + + def __init__(self, + weight_attrs: Sequence[str], + activation_attrs: Sequence[str], + quantize_output: bool): + """Initializes a default 8bit quantize config.""" + self.weight_attrs = weight_attrs + self.activation_attrs = activation_attrs + self.quantize_output = quantize_output + + # TODO(pulkitb): For some layers such as Conv2D, per_axis should be True. + # Add mapping for which layers support per_axis. + self.weight_quantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer( + num_bits=8, per_axis=False, symmetric=True, narrow_range=True) + self.activation_quantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer( + num_bits=8, per_axis=False, symmetric=False, narrow_range=False) + + def get_weights_and_quantizers( + self, layer: Layer) -> Sequence[WeightAndQuantizer]: + """See base class.""" + return [(getattr(layer, weight_attr), self.weight_quantizer) + for weight_attr in self.weight_attrs] + + def get_activations_and_quantizers( + self, layer: Layer) -> Sequence[ActivationAndQuantizer]: + """See base class.""" + return [(getattr(layer, activation_attr), self.activation_quantizer) + for activation_attr in self.activation_attrs] + + def set_quantize_weights( + self, + layer: Layer, + quantize_weights: Sequence[tf.Tensor]): + """See base class.""" + if len(self.weight_attrs) != len(quantize_weights): + raise ValueError( + '`set_quantize_weights` called on layer {} with {} ' + 'weight parameters, but layer 
expects {} values.'.format( + layer.name, len(quantize_weights), len(self.weight_attrs))) + + for weight_attr, weight in zip(self.weight_attrs, quantize_weights): + current_weight = getattr(layer, weight_attr) + if current_weight.shape != weight.shape: + raise ValueError('Existing layer weight shape {} is incompatible with ' + 'provided weight shape {}'.format( + current_weight.shape, weight.shape)) + + setattr(layer, weight_attr, weight) + + def set_quantize_activations( + self, + layer: Layer, + quantize_activations: Sequence[Activation]): + """See base class.""" + if len(self.activation_attrs) != len(quantize_activations): + raise ValueError( + '`set_quantize_activations` called on layer {} with {} ' + 'activation parameters, but layer expects {} values.'.format( + layer.name, len(quantize_activations), + len(self.activation_attrs))) + + for activation_attr, activation in zip( + self.activation_attrs, quantize_activations): + setattr(layer, activation_attr, activation) + + def get_output_quantizers(self, layer: Layer) -> Sequence[Quantizer]: + """See base class.""" + if self.quantize_output: + return [self.activation_quantizer] + return [] + + @classmethod + def from_config(cls, config: Dict[str, Any]) -> object: + """Instantiates a `Default8BitQuantizeConfig` from its config. + + Args: + config: Output of `get_config()`. + + Returns: + A `Default8BitQuantizeConfig` instance. + """ + return cls(**config) + + def get_config(self) -> Dict[str, Any]: + """Get a config for this quantize config.""" + # TODO(pulkitb): Add weight and activation quantizer to config. + # Currently it's created internally, but ideally the quantizers should be + # part of the constructor and passed in from the registry.
+ return { + 'weight_attrs': self.weight_attrs, + 'activation_attrs': self.activation_attrs, + 'quantize_output': self.quantize_output + } + + def __eq__(self, other): + if not isinstance(other, Default8BitQuantizeConfig): + return False + + return (self.weight_attrs == other.weight_attrs and + self.activation_attrs == other.activation_attrs and + self.weight_quantizer == other.weight_quantizer and + self.activation_quantizer == other.activation_quantizer and + self.quantize_output == other.quantize_output) + + def __ne__(self, other): + return not self.__eq__(other) + + +class Default8BitConvWeightsQuantizer( + tfmot.quantization.keras.quantizers.LastValueQuantizer): + """Quantizer for handling weights in Conv2D/DepthwiseConv2D layers.""" + + def __init__(self): + """Construct LastValueQuantizer with params specific for TFLite Convs.""" + + super(Default8BitConvWeightsQuantizer, self).__init__( + num_bits=8, per_axis=True, symmetric=True, narrow_range=True) + + def build(self, + tensor_shape: tf.TensorShape, + name: str, + layer: Layer): + """Build min/max quantization variables.""" + min_weight = layer.add_weight( + name + '_min', + shape=(tensor_shape[-1],), + initializer=tf.keras.initializers.Constant(-6.0), + trainable=False) + max_weight = layer.add_weight( + name + '_max', + shape=(tensor_shape[-1],), + initializer=tf.keras.initializers.Constant(6.0), + trainable=False) + + return {'min_var': min_weight, 'max_var': max_weight} + + +class NoQuantizer(tfmot.quantization.keras.quantizers.Quantizer): + """Dummy quantizer that explicitly skips quantization.""" + + def __call__(self, inputs, training, weights, **kwargs): + return tf.identity(inputs) + + def get_config(self): + return {} + + def build(self, tensor_shape, name, layer): + return {} + + +class Default8BitConvQuantizeConfig(Default8BitQuantizeConfig): + """QuantizeConfig for Conv2D/DepthwiseConv2D layers.""" + + def __init__(self, + weight_attrs: Sequence[str], + activation_attrs: Sequence[str], +
quantize_output: bool): + """Initializes default 8bit quantization config for the conv layer.""" + super().__init__(weight_attrs, activation_attrs, quantize_output) + + self.weight_quantizer = Default8BitConvWeightsQuantizer() + + +class Default8BitActivationQuantizeConfig( + tfmot.quantization.keras.QuantizeConfig): + """QuantizeConfig for keras.layers.Activation. + + `keras.layers.Activation` needs a separate `QuantizeConfig` since the + decision to quantize depends on the specific activation type. + """ + + def _assert_activation_layer(self, layer: Layer): + if not isinstance(layer, tf.keras.layers.Activation): + raise RuntimeError( + 'Default8BitActivationQuantizeConfig can only be used with ' + '`keras.layers.Activation`.') + + def get_weights_and_quantizers( + self, layer: Layer) -> Sequence[WeightAndQuantizer]: + """See base class.""" + self._assert_activation_layer(layer) + return [] + + def get_activations_and_quantizers( + self, layer: Layer) -> Sequence[ActivationAndQuantizer]: + """See base class.""" + self._assert_activation_layer(layer) + return [] + + def set_quantize_weights( + self, + layer: Layer, + quantize_weights: Sequence[tf.Tensor]): + """See base class.""" + self._assert_activation_layer(layer) + + def set_quantize_activations( + self, + layer: Layer, + quantize_activations: Sequence[Activation]): + """See base class.""" + self._assert_activation_layer(layer) + + def get_output_quantizers(self, layer: Layer) -> Sequence[Quantizer]: + """See base class.""" + self._assert_activation_layer(layer) + + if not hasattr(layer.activation, '__name__'): + raise ValueError('Activation {} not supported by ' + 'Default8BitActivationQuantizeConfig.'.format( + layer.activation)) + + # This code is copied from TFMOT repo, but added relu6 to support mobilenet. + if layer.activation.__name__ in ['relu', 'relu6', 'swish', 'hard_swish']: + # 'relu' should generally get fused into the previous layer. 
+ return [tfmot.quantization.keras.quantizers.MovingAverageQuantizer( + num_bits=8, per_axis=False, symmetric=False, narrow_range=False)] + elif layer.activation.__name__ in [ + 'linear', 'softmax', 'sigmoid', 'hard_sigmoid' + ]: + return [] + + raise ValueError('Activation {} not supported by ' + 'Default8BitActivationQuantizeConfig.'.format( + layer.activation)) + + def get_config(self) -> Dict[str, Any]: + """Get a config for this quantizer config.""" + return {} + + +def _types_dict(): + return { + 'Default8BitOutputQuantizeConfig': + Default8BitOutputQuantizeConfig, + 'NoOpQuantizeConfig': + NoOpQuantizeConfig, + 'Default8BitQuantizeConfig': + Default8BitQuantizeConfig, + 'Default8BitConvWeightsQuantizer': + Default8BitConvWeightsQuantizer, + 'Default8BitConvQuantizeConfig': + Default8BitConvQuantizeConfig, + 'Default8BitActivationQuantizeConfig': + Default8BitActivationQuantizeConfig, + } diff --git a/official/projects/qat/vision/quantization/configs_test.py b/official/projects/qat/vision/quantization/configs_test.py new file mode 100644 index 0000000000000000000000000000000000000000..d23a65e3a1621890bb17e0a5774f737a6a666ed4 --- /dev/null +++ b/official/projects/qat/vision/quantization/configs_test.py @@ -0,0 +1,202 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for configs.py.""" + +# Import libraries + +import numpy as np +import tensorflow as tf + +import tensorflow_model_optimization as tfmot + +from official.projects.qat.vision.quantization import configs + + +class _TestHelper(object): + + def _convert_list(self, list_of_tuples): + """Transforms a list of 2-tuples to a tuple of 2 lists. + + `QuantizeConfig` methods return a list of 2-tuples in the form + [(weight1, quantizer1), (weight2, quantizer2)]. This function converts + it into a 2-tuple of lists: ([weight1, weight2], [quantizer1, quantizer2]). + + Args: + list_of_tuples: List of 2-tuples. + + Returns: + 2-tuple of lists. + """ + list1 = [] + list2 = [] + for a, b in list_of_tuples: + list1.append(a) + list2.append(b) + + return list1, list2 + + # TODO(pulkitb): Consider asserting on full equality for quantizers. + + def _assert_weight_quantizers(self, quantizer_list): + for quantizer in quantizer_list: + self.assertIsInstance( + quantizer, + tfmot.quantization.keras.quantizers.LastValueQuantizer) + + def _assert_activation_quantizers(self, quantizer_list): + for quantizer in quantizer_list: + self.assertIsInstance( + quantizer, + tfmot.quantization.keras.quantizers.MovingAverageQuantizer) + + def _assert_kernel_equality(self, a, b): + self.assertAllEqual(a.numpy(), b.numpy()) + + +class Default8BitQuantizeConfigTest(tf.test.TestCase, _TestHelper): + + def _simple_dense_layer(self): + layer = tf.keras.layers.Dense(2) + layer.build(input_shape=(3,)) + return layer + + def testGetsQuantizeWeightsAndQuantizers(self): + layer = self._simple_dense_layer() + + quantize_config = configs.Default8BitQuantizeConfig( + ['kernel'], ['activation'], False) + (weights, weight_quantizers) = self._convert_list( + quantize_config.get_weights_and_quantizers(layer)) + + self._assert_weight_quantizers(weight_quantizers) + self.assertEqual([layer.kernel], weights) + + def testGetsQuantizeActivationsAndQuantizers(self): + layer = self._simple_dense_layer() + +
quantize_config = configs.Default8BitQuantizeConfig( + ['kernel'], ['activation'], False) + (activations, activation_quantizers) = self._convert_list( + quantize_config.get_activations_and_quantizers(layer)) + + self._assert_activation_quantizers(activation_quantizers) + self.assertEqual([layer.activation], activations) + + def testSetsQuantizeWeights(self): + layer = self._simple_dense_layer() + quantize_kernel = tf.keras.backend.variable( + np.ones(layer.kernel.shape.as_list())) + + quantize_config = configs.Default8BitQuantizeConfig( + ['kernel'], ['activation'], False) + quantize_config.set_quantize_weights(layer, [quantize_kernel]) + + self._assert_kernel_equality(layer.kernel, quantize_kernel) + + def testSetsQuantizeActivations(self): + layer = self._simple_dense_layer() + quantize_activation = tf.keras.activations.relu + + quantize_config = configs.Default8BitQuantizeConfig( + ['kernel'], ['activation'], False) + quantize_config.set_quantize_activations(layer, [quantize_activation]) + + self.assertEqual(layer.activation, quantize_activation) + + def testSetsQuantizeWeights_ErrorOnWrongNumberOfWeights(self): + layer = self._simple_dense_layer() + quantize_kernel = tf.keras.backend.variable( + np.ones(layer.kernel.shape.as_list())) + + quantize_config = configs.Default8BitQuantizeConfig( + ['kernel'], ['activation'], False) + + with self.assertRaises(ValueError): + quantize_config.set_quantize_weights(layer, []) + + with self.assertRaises(ValueError): + quantize_config.set_quantize_weights(layer, + [quantize_kernel, quantize_kernel]) + + def testSetsQuantizeWeights_ErrorOnWrongShapeOfWeight(self): + layer = self._simple_dense_layer() + quantize_kernel = tf.keras.backend.variable(np.ones([1, 2])) + + quantize_config = configs.Default8BitQuantizeConfig( + ['kernel'], ['activation'], False) + + with self.assertRaises(ValueError): + quantize_config.set_quantize_weights(layer, [quantize_kernel]) + + def 
testSetsQuantizeActivations_ErrorOnWrongNumberOfActivations(self): + layer = self._simple_dense_layer() + quantize_activation = tf.keras.activations.relu + + quantize_config = configs.Default8BitQuantizeConfig( + ['kernel'], ['activation'], False) + + with self.assertRaises(ValueError): + quantize_config.set_quantize_activations(layer, []) + + with self.assertRaises(ValueError): + quantize_config.set_quantize_activations( + layer, [quantize_activation, quantize_activation]) + + def testGetsResultQuantizers_ReturnsQuantizer(self): + layer = self._simple_dense_layer() + quantize_config = configs.Default8BitQuantizeConfig( + [], [], True) + + output_quantizers = quantize_config.get_output_quantizers(layer) + + self.assertLen(output_quantizers, 1) + self._assert_activation_quantizers(output_quantizers) + + def testGetsResultQuantizers_EmptyWhenFalse(self): + layer = self._simple_dense_layer() + quantize_config = configs.Default8BitQuantizeConfig( + [], [], False) + + output_quantizers = quantize_config.get_output_quantizers(layer) + + self.assertEqual([], output_quantizers) + + def testSerialization(self): + quantize_config = configs.Default8BitQuantizeConfig( + ['kernel'], ['activation'], False) + + expected_config = { + 'class_name': 'Default8BitQuantizeConfig', + 'config': { + 'weight_attrs': ['kernel'], + 'activation_attrs': ['activation'], + 'quantize_output': False + } + } + serialized_quantize_config = tf.keras.utils.serialize_keras_object( + quantize_config) + + self.assertEqual(expected_config, serialized_quantize_config) + + quantize_config_from_config = tf.keras.utils.deserialize_keras_object( + serialized_quantize_config, + module_objects=globals(), + custom_objects=configs._types_dict()) + + self.assertEqual(quantize_config, quantize_config_from_config) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/qat/vision/quantization/helper.py b/official/projects/qat/vision/quantization/helper.py new file mode 100644 index 
0000000000000000000000000000000000000000..3b8958caa8f1f1dcf1e530a3875b231f3e8949ff --- /dev/null +++ b/official/projects/qat/vision/quantization/helper.py @@ -0,0 +1,179 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Quantization helpers.""" +from typing import Any, Dict + +import tensorflow as tf + +import tensorflow_model_optimization as tfmot +from official.projects.qat.vision.quantization import configs + + +_QUANTIZATION_WEIGHT_NAMES = [ + 'output_max', 'output_min', 'optimizer_step', 'kernel_min', 'kernel_max', + 'add_three_min', 'add_three_max', 'divide_six_min', 'divide_six_max', + 'depthwise_kernel_min', 'depthwise_kernel_max', + 'reduce_mean_quantizer_vars_min', 'reduce_mean_quantizer_vars_max', + 'quantize_layer_min', 'quantize_layer_max', + 'quantize_layer_1_min', 'quantize_layer_1_max', + 'quantize_layer_2_min', 'quantize_layer_2_max', + 'quantize_layer_3_min', 'quantize_layer_3_max', + 'post_activation_min', 'post_activation_max', +] + +_ORIGINAL_WEIGHT_NAME = [ + 'kernel', 'depthwise_kernel', 'gamma', 'beta', 'moving_mean', + 'moving_variance', 'bias' +] + + +def is_quantization_weight_name(name: str) -> bool: + simple_name = name.split('/')[-1].split(':')[0] + if simple_name in _QUANTIZATION_WEIGHT_NAMES: + return True + if simple_name in _ORIGINAL_WEIGHT_NAME: + return False + raise ValueError('Variable name {} is not supported.'.format(simple_name)) + + +def 
copy_original_weights(original_model: tf.keras.Model, + quantized_model: tf.keras.Model): + """Copies the original model weights to the quantized model.""" + original_weight_value = original_model.get_weights() + weight_values = quantized_model.get_weights() + + original_idx = 0 + for idx, weight in enumerate(quantized_model.weights): + if not is_quantization_weight_name(weight.name): + if original_idx >= len(original_weight_value): + raise ValueError('Not enough original model weights.') + weight_values[idx] = original_weight_value[original_idx] + original_idx = original_idx + 1 + + if original_idx < len(original_weight_value): + raise ValueError('Not enough quantized model weights.') + + quantized_model.set_weights(weight_values) + + +class LayerQuantizerHelper(object): + """Helper class that handles quantizers.""" + + def __init__(self, *args, **kwargs): + self._quantizers = {} + self._quantizer_vars = {} + super().__init__(*args, **kwargs) + + def _all_value_quantizer(self): + return tfmot.quantization.keras.quantizers.AllValuesQuantizer( + num_bits=8, per_axis=False, symmetric=False, narrow_range=False) + + def _moving_average_quantizer(self): + return tfmot.quantization.keras.quantizers.MovingAverageQuantizer( + num_bits=8, per_axis=False, symmetric=False, narrow_range=False) + + def _add_quantizer(self, name, all_value_quantizer=False): + if all_value_quantizer: + self._quantizers[name] = self._all_value_quantizer() + else: + self._quantizers[name] = self._moving_average_quantizer() + + def _apply_quantizer(self, name, inputs, training, **kwargs): + return self._quantizers[name]( + inputs, training, self._quantizer_vars[name], **kwargs) + + def _build_quantizer_vars(self): + for name in self._quantizers: + self._quantizer_vars[name] = self._quantizers[name].build( + tensor_shape=None, name=name, layer=self) + + +class NoOpActivation: + """No-op activation which simply returns the incoming tensor.
+ + This activation is required to distinguish it from + `keras.activations.linear`, which does the same thing. The main difference is + that NoOpActivation should not have any quantize operation applied to it. + """ + + def __call__(self, x: tf.Tensor) -> tf.Tensor: + return x + + def get_config(self) -> Dict[str, Any]: + """Get a config of this object.""" + return {} + + def __eq__(self, other: Any) -> bool: + if not other or not isinstance(other, NoOpActivation): + return False + + return True + + def __ne__(self, other: Any) -> bool: + return not self.__eq__(other) + + +def quantize_wrapped_layer(cls, quantize_config): + + def constructor(*arg, **kwargs): + return tfmot.quantization.keras.QuantizeWrapperV2( + cls(*arg, **kwargs), quantize_config) + + return constructor + + +def norm_by_activation(activation, norm_quantized, norm_no_quantized): + if activation not in ['relu', 'relu6']: + return norm_quantized + else: + return norm_no_quantized + + +Conv2DQuantized = quantize_wrapped_layer( + tf.keras.layers.Conv2D, + configs.Default8BitConvQuantizeConfig(['kernel'], ['activation'], False)) +Conv2DOutputQuantized = quantize_wrapped_layer( + tf.keras.layers.Conv2D, + configs.Default8BitConvQuantizeConfig(['kernel'], ['activation'], True)) +DepthwiseConv2DQuantized = quantize_wrapped_layer( + tf.keras.layers.DepthwiseConv2D, + configs.Default8BitConvQuantizeConfig(['depthwise_kernel'], ['activation'], + False)) +DepthwiseConv2DOutputQuantized = quantize_wrapped_layer( + tf.keras.layers.DepthwiseConv2D, + configs.Default8BitConvQuantizeConfig(['depthwise_kernel'], ['activation'], + True)) +GlobalAveragePooling2DQuantized = quantize_wrapped_layer( + tf.keras.layers.GlobalAveragePooling2D, + configs.Default8BitQuantizeConfig([], [], True)) +AveragePooling2DQuantized = quantize_wrapped_layer( + tf.keras.layers.AveragePooling2D, + configs.Default8BitQuantizeConfig([], [], True)) +ResizingQuantized = quantize_wrapped_layer( + tf.keras.layers.Resizing, +
configs.Default8BitQuantizeConfig([], [], True)) +ConcatenateQuantized = quantize_wrapped_layer( + tf.keras.layers.Concatenate, configs.Default8BitQuantizeConfig([], [], + True)) +UpSampling2DQuantized = quantize_wrapped_layer( + tf.keras.layers.UpSampling2D, configs.Default8BitQuantizeConfig([], [], + True)) +ReshapeQuantized = quantize_wrapped_layer( + tf.keras.layers.Reshape, configs.Default8BitQuantizeConfig([], [], True)) + +# pylint:disable=g-long-lambda +BatchNormalizationQuantized = lambda norm_layer: quantize_wrapped_layer( + norm_layer, configs.Default8BitOutputQuantizeConfig()) +BatchNormalizationNoQuantized = lambda norm_layer: quantize_wrapped_layer( + norm_layer, configs.NoOpQuantizeConfig()) diff --git a/official/projects/qat/vision/quantization/helper_test.py b/official/projects/qat/vision/quantization/helper_test.py new file mode 100644 index 0000000000000000000000000000000000000000..3f9c372dfdbed81d294b7973c5af9b4d03613d2d --- /dev/null +++ b/official/projects/qat/vision/quantization/helper_test.py @@ -0,0 +1,54 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for helper.""" +import numpy as np +import tensorflow as tf + +import tensorflow_model_optimization as tfmot +from official.projects.qat.vision.quantization import helper + + +class HelperTest(tf.test.TestCase): + + def create_simple_model(self): + return tf.keras.models.Sequential([ + tf.keras.layers.Dense(8, input_shape=(16,)), + ]) + + def test_copy_original_weights_for_simple_model_with_custom_weights(self): + one_model = self.create_simple_model() + one_weights = [np.ones_like(weight) for weight in one_model.get_weights()] + one_model.set_weights(one_weights) + + qat_model = tfmot.quantization.keras.quantize_model( + self.create_simple_model()) + zero_weights = [np.zeros_like(weight) for weight in qat_model.get_weights()] + qat_model.set_weights(zero_weights) + + helper.copy_original_weights(one_model, qat_model) + + qat_model_weights = qat_model.get_weights() + count = 0 + for idx, weight in enumerate(qat_model.weights): + if not helper.is_quantization_weight_name(weight.name): + self.assertAllEqual( + qat_model_weights[idx], np.ones_like(qat_model_weights[idx])) + count += 1 + self.assertLen(one_model.weights, count) + self.assertGreater(len(qat_model.weights), len(one_model.weights)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/qat/vision/quantization/layer_transforms.py b/official/projects/qat/vision/quantization/layer_transforms.py new file mode 100644 index 0000000000000000000000000000000000000000..c093f92e6dd53a436facdd3829dc367f0c316371 --- /dev/null +++ b/official/projects/qat/vision/quantization/layer_transforms.py @@ -0,0 +1,115 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains custom quantization layer transforms.""" +from typing import Any, Type, Mapping, List, Union, Tuple + +import tensorflow as tf +import tensorflow_model_optimization as tfmot +from official.projects.qat.vision.modeling.layers import nn_blocks as quantized_nn_blocks +from official.projects.qat.vision.modeling.layers import nn_layers as quantized_nn_layers +from official.projects.qat.vision.quantization import configs +from official.projects.qat.vision.quantization import helper + +keras = tf.keras +LayerNode = tfmot.quantization.keras.graph_transformations.transforms.LayerNode +LayerPattern = tfmot.quantization.keras.graph_transformations.transforms.LayerPattern + +_LAYER_NAMES = [ + 'Vision>Conv2DBNBlock', 'Vision>InvertedBottleneckBlock', + 'Vision>SegmentationHead', 'Vision>SpatialPyramidPooling', 'Vision>ASPP' +] + + +class CustomLayerQuantize( + tfmot.quantization.keras.graph_transformations.transforms.Transform): + """Add QAT support for Keras Custom layer.""" + + def __init__(self, original_layer_pattern: str, + quantized_layer_class: Type[keras.layers.Layer]): + super(CustomLayerQuantize, self).__init__() + self._original_layer_pattern = original_layer_pattern + self._quantized_layer_class = quantized_layer_class + + def pattern(self) -> LayerPattern: + """See base class.""" + return LayerPattern(self._original_layer_pattern) + + def _create_layer_metadata( + self, layer_class_name: str + ) -> Mapping[str, tfmot.quantization.keras.QuantizeConfig]: + if layer_class_name in _LAYER_NAMES: + layer_metadata = {'quantize_config': 
configs.NoOpQuantizeConfig()} + else: + layer_metadata = { + 'quantize_config': configs.Default8BitOutputQuantizeConfig() + } + return layer_metadata + + def _create_dummy_input_shape( + self, quantized_layer: tf.keras.layers.Layer + ) -> Union[List[int], Tuple[Any, Any]]: + dummy_input_shape = [1, 128, 128, 1] + # SegmentationHead layer requires a tuple of 2 tensors. + if isinstance(quantized_layer, + quantized_nn_layers.SegmentationHeadQuantized): + dummy_input_shape = ([1, 1, 1, 1], [1, 1, 1, 1]) + return dummy_input_shape + + def replacement(self, match_layer: LayerNode) -> LayerNode: + """See base class.""" + bottleneck_layer = match_layer.layer + bottleneck_config = bottleneck_layer['config'] + bottleneck_names_and_weights = list(match_layer.names_and_weights) + quantized_layer = self._quantized_layer_class(**bottleneck_config) + dummy_input_shape = self._create_dummy_input_shape(quantized_layer) + quantized_layer.compute_output_shape(dummy_input_shape) + quantized_names_and_weights = zip( + [weight.name for weight in quantized_layer.weights], + quantized_layer.get_weights()) + match_idx = 0 + names_and_weights = [] + for name_and_weight in quantized_names_and_weights: + if not helper.is_quantization_weight_name(name=name_and_weight[0]): + name_and_weight = bottleneck_names_and_weights[match_idx] + match_idx = match_idx + 1 + names_and_weights.append(name_and_weight) + + if match_idx != len(bottleneck_names_and_weights): + raise ValueError('{}/{} of Bottleneck weights were transformed.'.format( + match_idx, len(bottleneck_names_and_weights))) + quantized_layer_config = keras.layers.serialize(quantized_layer) + quantized_layer_config['name'] = quantized_layer_config['config']['name'] + + layer_metadata = self._create_layer_metadata(bottleneck_layer['class_name']) + + return LayerNode( + quantized_layer_config, + metadata=layer_metadata, + names_and_weights=names_and_weights) + + +CUSTOM_TRANSFORMS = [ + CustomLayerQuantize('Vision>BottleneckBlock', +
quantized_nn_blocks.BottleneckBlockQuantized), + CustomLayerQuantize('Vision>InvertedBottleneckBlock', + quantized_nn_blocks.InvertedBottleneckBlockQuantized), + CustomLayerQuantize('Vision>Conv2DBNBlock', + quantized_nn_blocks.Conv2DBNBlockQuantized), + CustomLayerQuantize('Vision>SegmentationHead', + quantized_nn_layers.SegmentationHeadQuantized), + CustomLayerQuantize('Vision>SpatialPyramidPooling', + quantized_nn_layers.SpatialPyramidPoolingQuantized), + CustomLayerQuantize('Vision>ASPP', quantized_nn_layers.ASPPQuantized) +] diff --git a/official/projects/qat/vision/quantization/schemes.py b/official/projects/qat/vision/quantization/schemes.py new file mode 100644 index 0000000000000000000000000000000000000000..fca03e4cbf96139d488a7be615dbd4f08ffc4f1a --- /dev/null +++ b/official/projects/qat/vision/quantization/schemes.py @@ -0,0 +1,76 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Quantization schemes.""" +# Import libraries + +import tensorflow_model_optimization as tfmot +from official.projects.qat.vision.quantization import layer_transforms + + +default_8bit_transforms = tfmot.quantization.keras.default_8bit.default_8bit_transforms + + +class QuantizeLayoutTransform( + tfmot.quantization.keras.QuantizeLayoutTransform): + """Default model transformations.""" + + def apply(self, model, layer_quantize_map): + """Implement default 8-bit transforms. + + Currently this means the following. 
+ 1. Pull activations into layers and fuse them. (TODO) + 2. Modify range in incoming layers for Concat. (TODO) + 3. Fuse Conv2D/DepthwiseConv2D + BN into a single layer. + + Args: + model: Keras model to be quantized. + layer_quantize_map: Map with keys as layer names, and values as dicts + containing custom `QuantizeConfig`s which may have been passed with + layers. + + Returns: + (Transformed Keras model to better match the TensorFlow Lite backend, + updated layer quantize map.) + """ + + transforms = [ + default_8bit_transforms.InputLayerQuantize(), + default_8bit_transforms.SeparableConv1DQuantize(), + default_8bit_transforms.SeparableConvQuantize(), + default_8bit_transforms.Conv2DReshapeBatchNormReLUQuantize(), + default_8bit_transforms.Conv2DReshapeBatchNormActivationQuantize(), + default_8bit_transforms.Conv2DBatchNormReLUQuantize(), + default_8bit_transforms.Conv2DBatchNormActivationQuantize(), + default_8bit_transforms.Conv2DReshapeBatchNormQuantize(), + default_8bit_transforms.Conv2DBatchNormQuantize(), + default_8bit_transforms.ConcatTransform6Inputs(), + default_8bit_transforms.ConcatTransform5Inputs(), + default_8bit_transforms.ConcatTransform4Inputs(), + default_8bit_transforms.ConcatTransform3Inputs(), + default_8bit_transforms.ConcatTransform(), + default_8bit_transforms.LayerReLUQuantize(), + default_8bit_transforms.LayerReluActivationQuantize() + ] + transforms += layer_transforms.CUSTOM_TRANSFORMS + return tfmot.quantization.keras.graph_transformations.model_transformer.ModelTransformer( + model, transforms, + set(layer_quantize_map.keys()), layer_quantize_map).transform() + + +class Default8BitQuantizeScheme( + tfmot.quantization.keras.default_8bit.Default8BitQuantizeScheme): + + def get_layout_transformer(self): + return QuantizeLayoutTransform() diff --git a/official/projects/qat/vision/registry_imports.py b/official/projects/qat/vision/registry_imports.py new file mode 100644 index
0000000000000000000000000000000000000000..2c93ccd9afebc2b697413e24b36391a6c54344d1 --- /dev/null +++ b/official/projects/qat/vision/registry_imports.py @@ -0,0 +1,21 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""All necessary imports for registration on qat project.""" +# pylint: disable=unused-import +from official.projects.qat.vision import configs +from official.projects.qat.vision.modeling import layers +from official.projects.qat.vision.tasks import image_classification +from official.projects.qat.vision.tasks import retinanet +from official.projects.qat.vision.tasks import semantic_segmentation diff --git a/official/projects/qat/vision/serving/export_module.py b/official/projects/qat/vision/serving/export_module.py new file mode 100644 index 0000000000000000000000000000000000000000..2f24337c0bc954069fa5bc808dc251ef51803b40 --- /dev/null +++ b/official/projects/qat/vision/serving/export_module.py @@ -0,0 +1,68 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Export modules for QAT model serving/inference.""" +from absl import logging +import tensorflow as tf + +from official.projects.qat.vision.modeling import factory as qat_factory +from official.vision import configs +from official.vision.serving import detection +from official.vision.serving import image_classification +from official.vision.serving import semantic_segmentation + + +class ClassificationModule(image_classification.ClassificationModule): + """Classification Module.""" + + def _build_model(self): + model = super()._build_model() + input_specs = tf.keras.layers.InputSpec(shape=[self._batch_size] + + self._input_image_size + [3]) + return qat_factory.build_qat_classification_model( + model, self.params.task.quantization, input_specs, + self.params.task.model) + + +class SegmentationModule(semantic_segmentation.SegmentationModule): + """Segmentation Module.""" + + def _build_model(self): + model = super()._build_model() + input_specs = tf.keras.layers.InputSpec(shape=[self._batch_size] + + self._input_image_size + [3]) + return qat_factory.build_qat_segmentation_model( + model, self.params.task.quantization, input_specs) + + +class DetectionModule(detection.DetectionModule): + """Detection Module.""" + + def _build_model(self): + if self.params.task.model.detection_generator.nms_version != 'tflite': + self.params.task.model.detection_generator.nms_version = 'tflite' + logging.info('Set `nms_version` to `tflite` because only TFLite NMS is ' + 'supported for QAT detection models.') + + model = super()._build_model() + + if 
isinstance(self.params.task.model, configs.retinanet.RetinaNet): + model = qat_factory.build_qat_retinanet(model, + self.params.task.quantization, + self.params.task.model) + else: + raise ValueError('Detection module not implemented for {} model.'.format( + type(self.params.task.model))) + + return model diff --git a/official/projects/qat/vision/serving/export_saved_model.py b/official/projects/qat/vision/serving/export_saved_model.py new file mode 100644 index 0000000000000000000000000000000000000000..85336d2c25afda882446a344702cdae33e6e67b7 --- /dev/null +++ b/official/projects/qat/vision/serving/export_saved_model.py @@ -0,0 +1,138 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Vision models export binary for serving/inference. 
+ +To export a trained checkpoint in saved_model format (shell script): + +EXPERIMENT_TYPE=XX +CHECKPOINT_PATH=XX +EXPORT_DIR_PATH=XX +export_saved_model --experiment=${EXPERIMENT_TYPE} \ + --export_dir=${EXPORT_DIR_PATH}/ \ + --checkpoint_path=${CHECKPOINT_PATH} \ + --batch_size=2 \ + --input_image_size=224,224 + +To serve (python): + +export_dir_path = XX +input_type = XX +input_images = XX +imported = tf.saved_model.load(export_dir_path) +model_fn = imported.signatures['serving_default'] +output = model_fn(input_images) +""" +from absl import app +from absl import flags + +from official.core import exp_factory +from official.modeling import hyperparams +from official.projects.qat.vision import registry_imports # pylint: disable=unused-import +from official.projects.qat.vision.serving import export_module +from official.vision import configs +from official.vision.serving import export_saved_model_lib + +FLAGS = flags.FLAGS + +_EXPERIMENT = flags.DEFINE_string( + 'experiment', None, 'experiment type, e.g. retinanet_resnetfpn_coco') +_EXPORT_DIR = flags.DEFINE_string('export_dir', None, 'The export directory.') +_CHECKPOINT_PATH = flags.DEFINE_string('checkpoint_path', None, + 'Checkpoint path.') +_CONFIG_FILE = flags.DEFINE_multi_string( + 'config_file', + default=None, + help='YAML/JSON files which specify overrides. The override order ' + 'follows the order of args. Note that each file ' + 'can be used as an override template to override the default parameters ' + 'specified in Python.
If the same parameter is specified in both ' + '`--config_file` and `--params_override`, `config_file` will be used ' + 'first, followed by params_override.') +_PARAMS_OVERRIDE = flags.DEFINE_string( + 'params_override', '', + 'The JSON/YAML file or string which specifies the parameter to be overridden' + ' on top of `config_file` template.') +_BATCH_SIZE = flags.DEFINE_integer('batch_size', None, 'The batch size.') +_IMAGE_TYPE = flags.DEFINE_string( + 'input_type', 'image_tensor', + 'One of `image_tensor`, `image_bytes`, `tf_example` and `tflite`.') +_INPUT_IMAGE_SIZE = flags.DEFINE_string( + 'input_image_size', '224,224', + 'The comma-separated string of two integers representing the height,width ' + 'of the input to the model.') +_EXPORT_CHECKPOINT_SUBDIR = flags.DEFINE_string( + 'export_checkpoint_subdir', 'checkpoint', + 'The subdirectory for checkpoints.') +_EXPORT_SAVED_MODEL_SUBDIR = flags.DEFINE_string( + 'export_saved_model_subdir', 'saved_model', + 'The subdirectory for saved model.') +_LOG_MODEL_FLOPS_AND_PARAMS = flags.DEFINE_bool( + 'log_model_flops_and_params', False, + 'If true, logs model flops and parameters.') +_INPUT_NAME = flags.DEFINE_string( + 'input_name', None, + 'Input tensor name in signature def.
Defaults to None, which ' + 'produces the input tensor name `inputs`.') + + +def main(_): + + params = exp_factory.get_exp_config(_EXPERIMENT.value) + for config_file in _CONFIG_FILE.value or []: + params = hyperparams.override_params_dict( + params, config_file, is_strict=True) + if _PARAMS_OVERRIDE.value: + params = hyperparams.override_params_dict( + params, _PARAMS_OVERRIDE.value, is_strict=True) + + params.validate() + params.lock() + + input_image_size = [int(x) for x in _INPUT_IMAGE_SIZE.value.split(',')] + + if isinstance(params.task, + configs.image_classification.ImageClassificationTask): + export_module_cls = export_module.ClassificationModule + elif isinstance(params.task, configs.retinanet.RetinaNetTask): + export_module_cls = export_module.DetectionModule + elif isinstance(params.task, + configs.semantic_segmentation.SemanticSegmentationTask): + export_module_cls = export_module.SegmentationModule + else: + raise TypeError(f'Export module for {type(params.task)} is not supported.') + + module = export_module_cls( + params=params, + batch_size=_BATCH_SIZE.value, + input_image_size=input_image_size, + input_type=_IMAGE_TYPE.value, + num_channels=3) + + export_saved_model_lib.export_inference_graph( + input_type=_IMAGE_TYPE.value, + batch_size=_BATCH_SIZE.value, + input_image_size=input_image_size, + params=params, + checkpoint_path=_CHECKPOINT_PATH.value, + export_dir=_EXPORT_DIR.value, + export_checkpoint_subdir=_EXPORT_CHECKPOINT_SUBDIR.value, + export_saved_model_subdir=_EXPORT_SAVED_MODEL_SUBDIR.value, + export_module=module, + log_model_flops_and_params=_LOG_MODEL_FLOPS_AND_PARAMS.value, + input_name=_INPUT_NAME.value) + + +if __name__ == '__main__': + app.run(main) diff --git a/official/projects/qat/vision/serving/export_tflite.py b/official/projects/qat/vision/serving/export_tflite.py new file mode 100644 index 0000000000000000000000000000000000000000..884ac0dad26f6caeb332cbdf145c1f01d16c3bb3 --- /dev/null +++
b/official/projects/qat/vision/serving/export_tflite.py @@ -0,0 +1,23 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Binary to convert a saved model to TFLite model for the QAT model.""" + +from absl import app + +from official.projects.qat.vision import registry_imports # pylint: disable=unused-import +from official.vision.serving import export_tflite + +if __name__ == '__main__': + app.run(export_tflite.main) diff --git a/official/projects/qat/vision/tasks/__init__.py b/official/projects/qat/vision/tasks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..42350ee0790bca9d71df31c52e7e3940f24f0bc6 --- /dev/null +++ b/official/projects/qat/vision/tasks/__init__.py @@ -0,0 +1,17 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tasks package definition.""" + +from official.projects.qat.vision.tasks import image_classification diff --git a/official/projects/qat/vision/tasks/image_classification.py b/official/projects/qat/vision/tasks/image_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..20824dcef552e3299bba689a80aa5fafad9d550c --- /dev/null +++ b/official/projects/qat/vision/tasks/image_classification.py @@ -0,0 +1,49 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Image classification task definition.""" +import tensorflow as tf + +from official.core import task_factory +from official.projects.qat.vision.configs import image_classification as exp_cfg +from official.projects.qat.vision.modeling import factory +from official.vision.tasks import image_classification + + +@task_factory.register_task_cls(exp_cfg.ImageClassificationTask) +class ImageClassificationTask(image_classification.ImageClassificationTask): + """A task for image classification with QAT.""" + + def build_model(self) -> tf.keras.Model: + """Builds classification model with QAT.""" + input_specs = tf.keras.layers.InputSpec( + shape=[None] + self.task_config.model.input_size) + + l2_weight_decay = self.task_config.losses.l2_weight_decay + # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. 
+ # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) + # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) + l2_regularizer = (tf.keras.regularizers.l2( + l2_weight_decay / 2.0) if l2_weight_decay else None) + + model = super(ImageClassificationTask, self).build_model() + if self.task_config.quantization: + model = factory.build_qat_classification_model( + model, + self.task_config.quantization, + input_specs=input_specs, + model_config=self.task_config.model, + l2_regularizer=l2_regularizer) + + return model diff --git a/official/projects/qat/vision/tasks/image_classification_test.py b/official/projects/qat/vision/tasks/image_classification_test.py new file mode 100644 index 0000000000000000000000000000000000000000..eac971da3236d8e1523eb13830fdc35c243efb70 --- /dev/null +++ b/official/projects/qat/vision/tasks/image_classification_test.py @@ -0,0 +1,79 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for image classification task.""" + +# pylint: disable=unused-import +import os + +from absl.testing import parameterized +import orbit +import tensorflow as tf + +from official import vision +from official.core import exp_factory +from official.modeling import optimization +from official.projects.qat.vision.tasks import image_classification as img_cls_task +from official.vision.dataloaders import tfexample_utils + + +class ImageClassificationTaskTest(tf.test.TestCase, parameterized.TestCase): + + def _create_test_tfrecord(self, tfrecord_file, example, num_samples): + examples = [example] * num_samples + tfexample_utils.dump_to_tfrecord( + record_file=tfrecord_file, tf_examples=examples) + + @parameterized.parameters(('resnet_imagenet_qat'), + ('mobilenet_imagenet_qat')) + def test_task(self, config_name): + input_image_size = [224, 224] + test_tfrecord_file = os.path.join(self.get_temp_dir(), 'cls_test.tfrecord') + example = tf.train.Example.FromString( + tfexample_utils.create_classification_example( + image_height=input_image_size[0], image_width=input_image_size[1])) + self._create_test_tfrecord( + tfrecord_file=test_tfrecord_file, example=example, num_samples=10) + + config = exp_factory.get_exp_config(config_name) + config.task.train_data.global_batch_size = 2 + config.task.validation_data.input_path = test_tfrecord_file + config.task.train_data.input_path = test_tfrecord_file + task = img_cls_task.ImageClassificationTask(config.task) + model = task.build_model() + metrics = task.build_metrics() + strategy = tf.distribute.get_strategy() + + dataset = orbit.utils.make_distributed_dataset(strategy, task.build_inputs, + config.task.train_data) + + iterator = iter(dataset) + opt_factory = optimization.OptimizerFactory(config.trainer.optimizer_config) + optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) + logs = task.train_step(next(iterator), model, optimizer, metrics=metrics) + for metric in metrics: + logs[metric.name] = 
metric.result() + self.assertIn('loss', logs) + self.assertIn('accuracy', logs) + self.assertIn('top_5_accuracy', logs) + logs = task.validation_step(next(iterator), model, metrics=metrics) + for metric in metrics: + logs[metric.name] = metric.result() + self.assertIn('loss', logs) + self.assertIn('accuracy', logs) + self.assertIn('top_5_accuracy', logs) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/qat/vision/tasks/retinanet.py b/official/projects/qat/vision/tasks/retinanet.py new file mode 100644 index 0000000000000000000000000000000000000000..5798bdec10b24ff02999a6991ede131771c1c695 --- /dev/null +++ b/official/projects/qat/vision/tasks/retinanet.py @@ -0,0 +1,40 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""RetinaNet task definition.""" +import tensorflow as tf + +from official.core import task_factory +from official.projects.qat.vision.configs import retinanet as exp_cfg +from official.projects.qat.vision.modeling import factory +from official.vision.tasks import retinanet + + +@task_factory.register_task_cls(exp_cfg.RetinaNetTask) +class RetinaNetTask(retinanet.RetinaNetTask): + """A task for RetinaNet object detection with QAT.""" + + def build_model(self) -> tf.keras.Model: + """Builds RetinaNet model with QAT.""" + model = super(RetinaNetTask, self).build_model() + # Call the model with dummy input to build the head part. 
+ dummy_input = tf.zeros([1] + self.task_config.model.input_size) + model(dummy_input, training=True) + + if self.task_config.quantization: + model = factory.build_qat_retinanet( + model, + self.task_config.quantization, + model_config=self.task_config.model) + return model diff --git a/official/projects/qat/vision/tasks/retinanet_test.py b/official/projects/qat/vision/tasks/retinanet_test.py new file mode 100644 index 0000000000000000000000000000000000000000..03b3694c94d022964888dcff3fdd5df7fb96cb55 --- /dev/null +++ b/official/projects/qat/vision/tasks/retinanet_test.py @@ -0,0 +1,87 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+ +"""Tests for RetinaNet task.""" +# pylint: disable=unused-import +import os + +from absl.testing import parameterized +import orbit +import tensorflow as tf + +from official import vision +from official.core import exp_factory +from official.modeling import optimization +from official.projects.qat.vision.tasks import retinanet +from official.vision.configs import retinanet as exp_cfg +from official.vision.dataloaders import tfexample_utils + + +class RetinaNetTaskTest(parameterized.TestCase, tf.test.TestCase): + + def _create_test_tfrecord(self, tfrecord_file, example, num_samples): + examples = [example] * num_samples + tfexample_utils.dump_to_tfrecord( + record_file=tfrecord_file, tf_examples=examples) + + @parameterized.parameters( + ('retinanet_mobile_coco_qat', True), + ('retinanet_mobile_coco_qat', False), + ) + def test_retinanet_task(self, test_config, is_training): + """RetinaNet task test for training and val using toy configs.""" + input_image_size = [384, 384] + test_tfrecord_file = os.path.join(self.get_temp_dir(), 'det_test.tfrecord') + example = tfexample_utils.create_detection_test_example( + image_height=input_image_size[0], + image_width=input_image_size[1], + image_channel=3, + num_instances=10) + self._create_test_tfrecord( + tfrecord_file=test_tfrecord_file, example=example, num_samples=10) + config = exp_factory.get_exp_config(test_config) + # modify config to suit local testing + config.task.model.input_size = [128, 128, 3] + config.trainer.steps_per_loop = 1 + config.task.train_data.global_batch_size = 1 + config.task.validation_data.global_batch_size = 1 + config.task.train_data.shuffle_buffer_size = 2 + config.task.validation_data.shuffle_buffer_size = 2 + config.task.validation_data.input_path = test_tfrecord_file + config.task.train_data.input_path = test_tfrecord_file + config.task.annotation_file = None + config.train_steps = 1 + + task = retinanet.RetinaNetTask(config.task) + model = task.build_model() + 
self.assertLen(model.weights, 2393) + metrics = task.build_metrics(training=is_training) + + strategy = tf.distribute.get_strategy() + + data_config = config.task.train_data if is_training else config.task.validation_data + dataset = orbit.utils.make_distributed_dataset(strategy, task.build_inputs, + data_config) + iterator = iter(dataset) + opt_factory = optimization.OptimizerFactory(config.trainer.optimizer_config) + optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) + + if is_training: + task.train_step(next(iterator), model, optimizer, metrics=metrics) + else: + task.validation_step(next(iterator), model, metrics=metrics) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/qat/vision/tasks/semantic_segmentation.py b/official/projects/qat/vision/tasks/semantic_segmentation.py new file mode 100644 index 0000000000000000000000000000000000000000..a2c1bd0e972fd2f84c74ff59cb82e916885a1c6e --- /dev/null +++ b/official/projects/qat/vision/tasks/semantic_segmentation.py @@ -0,0 +1,36 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Semantic segmentation task definition.""" +import tensorflow as tf + +from official.core import task_factory +from official.projects.qat.vision.configs import semantic_segmentation as exp_cfg +from official.projects.qat.vision.modeling import factory +from official.vision.tasks import semantic_segmentation + + +@task_factory.register_task_cls(exp_cfg.SemanticSegmentationTask) +class SemanticSegmentationTask(semantic_segmentation.SemanticSegmentationTask): + """A task for semantic segmentation with QAT.""" + + def build_model(self) -> tf.keras.Model: + """Builds semantic segmentation model with QAT.""" + model = super().build_model() + input_specs = tf.keras.layers.InputSpec(shape=[None] + + self.task_config.model.input_size) + if self.task_config.quantization: + model = factory.build_qat_segmentation_model( + model, self.task_config.quantization, input_specs) + return model diff --git a/official/projects/qat/vision/train.py b/official/projects/qat/vision/train.py new file mode 100644 index 0000000000000000000000000000000000000000..453cb9fe4e0addafc929ed661ba75f463f2b03e5 --- /dev/null +++ b/official/projects/qat/vision/train.py @@ -0,0 +1,26 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""TensorFlow Model Garden Vision training driver, including QAT configs.""" + +from absl import app + +from official.common import flags as tfm_flags +from official.projects.qat.vision import registry_imports # pylint: disable=unused-import +from official.vision import train + + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(train.main) diff --git a/official/projects/roformer/__init__.py b/official/projects/roformer/__init__.py index a25710c222e3327cb20e000db5df5c5651c4a2cc..ba97902e7ec1e12871c0fad301b9ce48c92cf1d1 100644 --- a/official/projects/roformer/__init__.py +++ b/official/projects/roformer/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/roformer/roformer.py b/official/projects/roformer/roformer.py index d0f4da23c19812f9a9662769575f9e48ad4945bd..0474de3dac87fcdd8458bc4ba2aaca1e38aed7e7 100644 --- a/official/projects/roformer/roformer.py +++ b/official/projects/roformer/roformer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/roformer/roformer_attention.py b/official/projects/roformer/roformer_attention.py index 2eec24db539ccda12d5eab65e414f7f0cbde0d2f..dc3be9507037eea20b4b019a1faa1e6a9b764ad0 100644 --- a/official/projects/roformer/roformer_attention.py +++ b/official/projects/roformer/roformer_attention.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,7 +16,7 @@ # pylint: disable=g-classes-have-attributes import tensorflow as tf -EinsumDense = tf.keras.layers.experimental.EinsumDense +EinsumDense = tf.keras.layers.EinsumDense MultiHeadAttention = tf.keras.layers.MultiHeadAttention diff --git a/official/projects/roformer/roformer_attention_test.py b/official/projects/roformer/roformer_attention_test.py index 92d6b9001e7df10612278314e3c748471b713751..d131e876a7a645c24bfb778f1846605b7fdad7e5 100644 --- a/official/projects/roformer/roformer_attention_test.py +++ b/official/projects/roformer/roformer_attention_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/roformer/roformer_encoder.py b/official/projects/roformer/roformer_encoder.py index a81aa17aae9556753ea714d38178205357296ff4..0e683e7f09a183bc79aa7249b764064ec8aaf6f5 100644 --- a/official/projects/roformer/roformer_encoder.py +++ b/official/projects/roformer/roformer_encoder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -19,6 +19,7 @@ import collections from absl import logging import tensorflow as tf +from official.modeling import tf_utils from official.nlp.modeling import layers from official.projects.roformer import roformer_encoder_block @@ -115,7 +116,7 @@ class RoformerEncoder(tf.keras.Model): embedding_layer_inst = layers.on_device_embedding.OnDeviceEmbedding( vocab_size=vocab_size, embedding_width=embedding_width, - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), name='word_embeddings') else: embedding_layer_inst = embedding_layer @@ -125,7 +126,7 @@ class RoformerEncoder(tf.keras.Model): type_embedding_layer = layers.on_device_embedding.OnDeviceEmbedding( vocab_size=type_vocab_size, embedding_width=embedding_width, - initializer=initializer, + initializer=tf_utils.clone_initializer(initializer), use_one_hot=True, name='type_embeddings') type_embeddings = type_embedding_layer(type_ids) @@ -142,11 +143,11 @@ class RoformerEncoder(tf.keras.Model): # We project the 'embedding' output to 'hidden_size' if it is not already # 'hidden_size'. 
if embedding_width != hidden_size: - embedding_projection = tf.keras.layers.experimental.EinsumDense( + embedding_projection = tf.keras.layers.EinsumDense( '...x,xy->...y', output_shape=hidden_size, bias_axes='y', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='embedding_projection') embeddings = embedding_projection(embeddings) else: @@ -171,7 +172,7 @@ class RoformerEncoder(tf.keras.Model): attention_dropout=attention_dropout, norm_first=norm_first, output_range=transformer_output_range, - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='roformer/layer_%d' % i) transformer_layers.append(layer) data = layer([data, attention_mask]) @@ -185,7 +186,7 @@ class RoformerEncoder(tf.keras.Model): pooler_layer = tf.keras.layers.Dense( units=hidden_size, activation='tanh', - kernel_initializer=initializer, + kernel_initializer=tf_utils.clone_initializer(initializer), name='pooler_transform') cls_output = pooler_layer(first_token_tensor) diff --git a/official/projects/roformer/roformer_encoder_block.py b/official/projects/roformer/roformer_encoder_block.py index 20826c41e71eca94654d9facdb34f2101af3294b..91714917fc4d4f8517bdfcd13818e32aa10621e4 100644 --- a/official/projects/roformer/roformer_encoder_block.py +++ b/official/projects/roformer/roformer_encoder_block.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -15,6 +15,7 @@ """Roformer TransformerEncoder block layer.""" import tensorflow as tf +from official.modeling import tf_utils from official.projects.roformer import roformer_attention @@ -111,7 +112,8 @@ class RoformerEncoderBlock(tf.keras.layers.Layer): self._attention_initializer = tf.keras.initializers.get( attention_initializer) else: - self._attention_initializer = self._kernel_initializer + self._attention_initializer = tf_utils.clone_initializer( + self._kernel_initializer) self._attention_axes = attention_axes def build(self, input_shape): @@ -160,11 +162,11 @@ class RoformerEncoderBlock(tf.keras.layers.Layer): axis=-1, epsilon=self._norm_epsilon, dtype=tf.float32)) - self._intermediate_dense = tf.keras.layers.experimental.EinsumDense( + self._intermediate_dense = tf.keras.layers.EinsumDense( einsum_equation, output_shape=(None, self._inner_dim), bias_axes="d", - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), name="intermediate", **common_kwargs) policy = tf.keras.mixed_precision.global_policy() @@ -177,12 +179,12 @@ class RoformerEncoderBlock(tf.keras.layers.Layer): self._inner_activation, dtype=policy) self._inner_dropout_layer = tf.keras.layers.Dropout( rate=self._inner_dropout) - self._output_dense = tf.keras.layers.experimental.EinsumDense( + self._output_dense = tf.keras.layers.EinsumDense( einsum_equation, output_shape=(None, hidden_size), bias_axes="d", name="output", - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), **common_kwargs) self._output_dropout = tf.keras.layers.Dropout(rate=self._output_dropout) # Use float32 in layernorm for numeric stability. 
diff --git a/official/projects/roformer/roformer_encoder_block_test.py b/official/projects/roformer/roformer_encoder_block_test.py index f4833f96aa0dfb8cc9dd713d378be04cb5032ab1..99dd2b00c6cbb3a8f7e835093c8f10c1109e7c62 100644 --- a/official/projects/roformer/roformer_encoder_block_test.py +++ b/official/projects/roformer/roformer_encoder_block_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/roformer/roformer_encoder_test.py b/official/projects/roformer/roformer_encoder_test.py index 7c4d4f5d6081a2ba45b74414fd459a0af62f5cae..7fc77f3cf4e123af1da4f35d29425cd4a0aebb74 100644 --- a/official/projects/roformer/roformer_encoder_test.py +++ b/official/projects/roformer/roformer_encoder_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/roformer/roformer_experiments.py b/official/projects/roformer/roformer_experiments.py index a16c26be4a77c52bb3bc5428bfb7710395d5f4b0..cb095847d3d47573763b8b45195e0b823abeb186 100644 --- a/official/projects/roformer/roformer_experiments.py +++ b/official/projects/roformer/roformer_experiments.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/roformer/train.py b/official/projects/roformer/train.py index 7bd5dde0b14dba9fe1875e303f44a4daad8fc6b8..6ea0aec4b35cce61b53839f53f82b655132e4d59 100644 --- a/official/projects/roformer/train.py +++ b/official/projects/roformer/train.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/s3d/configs/s3d.py b/official/projects/s3d/configs/s3d.py new file mode 100644 index 0000000000000000000000000000000000000000..1dcd1424c2c92dbff6155c6a8a3cbb427c49ed4a --- /dev/null +++ b/official/projects/s3d/configs/s3d.py @@ -0,0 +1,98 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""S3D model configurations.""" +import dataclasses +from typing import Text + +from official.modeling import hyperparams +from official.vision.configs import backbones_3d +from official.vision.configs import video_classification + + +@dataclasses.dataclass +class S3D(hyperparams.Config): + """S3D backbone config. + + Attributes: + final_endpoint: Specifies the endpoint to construct the network up to. 
It + can be one of ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1', + 'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b', 'Mixed_3c', + 'MaxPool_4a_3x3', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e', + 'Mixed_4f', 'MaxPool_5a_2x2', 'Mixed_5b', 'Mixed_5c'] + first_temporal_kernel_size: Specifies the temporal kernel size for the first + conv3d filter. A larger value slows down the model but provides little + accuracy improvement. Must be set to one of 1, 3, 5 or 7. + temporal_conv_start_at: Specifies the first conv block to use separable 3D + convs rather than 2D convs (implemented as [1, k, k] 3D conv). This is + used to construct the inverted pyramid models. 'Conv2d_2c_3x3' is the + first valid block to use separable 3D convs. If the provided block name is + not present, all valid blocks will use separable 3D convs. + gating_start_at: Specifies the first conv block to use self gating. + 'Conv2d_2c_3x3' is the first valid block to use self gating. + swap_pool_and_1x1x1: If True, in Branch_3 1x1x1 convolution is performed + first, then followed by max pooling. 1x1x1 convolution is used to reduce + the number of filters. Thus, max pooling is performed on fewer filters. + gating_style: Self gating can be applied after each branch and/or after each + inception cell. It can be one of ['BRANCH', 'CELL', 'BRANCH_AND_CELL']. + use_sync_bn: If True, use synchronized batch normalization. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero.
+ temporal_conv_type: It can be one of ['3d', '2+1d', '1+2d', '1+1+1d'] where + '3d' is SPATIOTEMPORAL 3d convolution, '2+1d' is SPATIAL_TEMPORAL_SEPARATE + with 2D convolution on the spatial dimensions followed by 1D convolution + on the temporal dimension, '1+2d' is TEMPORAL_SPATIAL_SEPARATE with 1D + convolution on the temporal dimension followed by 2D convolution on the + spatial dimensions, and '1+1+1d' is FULLY_SEPARATE with 1D convolutions on + the horizontal, vertical, and temporal dimensions, respectively. + depth_multiplier: Float multiplier for the depth (number of channels) for + all convolution ops. The value must be greater than zero. Typical usage + will be to set this value in (0, 1) to reduce the number of parameters or + computation cost of the model. + """ + final_endpoint: Text = 'Mixed_5c' + first_temporal_kernel_size: int = 3 + temporal_conv_start_at: Text = 'Conv2d_2c_3x3' + gating_start_at: Text = 'Conv2d_2c_3x3' + swap_pool_and_1x1x1: bool = True + gating_style: Text = 'CELL' + use_sync_bn: bool = False + norm_momentum: float = 0.999 + norm_epsilon: float = 0.001 + temporal_conv_type: Text = '2+1d' + depth_multiplier: float = 1.0 + + +@dataclasses.dataclass +class Backbone3D(backbones_3d.Backbone3D): + """Configuration for backbones. + + Attributes: + type: 'str', type of backbone to be used, one of the fields below. + s3d: s3d backbone config. + """ + type: str = 's3d' + s3d: S3D = S3D() + + +@dataclasses.dataclass +class S3DModel(video_classification.VideoClassificationModel): + """The S3D model config. + + Attributes: + model_type: 'str', type of model to be used. + backbone: backbone config.
+ """ + model_type: str = 's3d' + backbone: Backbone3D = Backbone3D() diff --git a/official/projects/s3d/modeling/inception_utils.py b/official/projects/s3d/modeling/inception_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..8ffa051c724bf9a1f204f7ce1678f2474b96c6bc --- /dev/null +++ b/official/projects/s3d/modeling/inception_utils.py @@ -0,0 +1,536 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains modules related to Inception networks.""" +from typing import Callable, Dict, Optional, Sequence, Set, Text, Tuple, Type, Union + +import tensorflow as tf + +from official.modeling import tf_utils +from official.projects.s3d.modeling import net_utils +from official.vision.modeling.layers import nn_blocks_3d + +INCEPTION_V1_CONV_ENDPOINTS = [ + 'Conv2d_1a_7x7', 'Conv2d_2c_3x3', 'Mixed_3b', 'Mixed_3c', 'Mixed_4b', + 'Mixed_4c', 'Mixed_4d', 'Mixed_4e', 'Mixed_4f', 'Mixed_5b', 'Mixed_5c' +] + +# Mapping from endpoint to branch filters. The endpoint shapes below are +# specific for input 64x224x224. 
+INCEPTION_V1_ARCH_SKELETON = [ + ('Mixed_3b', [[64], [96, 128], [16, 32], [32]]), # 32x28x28x256 + ('Mixed_3c', [[128], [128, 192], [32, 96], [64]]), # 32x28x28x480 + ('MaxPool_4a_3x3', [[3, 3, 3], [2, 2, 2]]), # 16x14x14x480 + ('Mixed_4b', [[192], [96, 208], [16, 48], [64]]), # 16x14x14x512 + ('Mixed_4c', [[160], [112, 224], [24, 64], [64]]), # 16x14x14x512 + ('Mixed_4d', [[128], [128, 256], [24, 64], [64]]), # 16x14x14x512 + ('Mixed_4e', [[112], [144, 288], [32, 64], [64]]), # 16x14x14x528 + ('Mixed_4f', [[256], [160, 320], [32, 128], [128]]), # 16x14x14x832 + ('MaxPool_5a_2x2', [[2, 2, 2], [2, 2, 2]]), # 8x7x7x832 + ('Mixed_5b', [[256], [160, 320], [32, 128], [128]]), # 8x7x7x832 + ('Mixed_5c', [[384], [192, 384], [48, 128], [128]]), # 8x7x7x1024 +] + +INCEPTION_V1_LOCAL_SKELETON = [ + ('MaxPool_5a_2x2_local', [[2, 2, 2], [2, 2, 2]]), # 8x7x7x832 + ('Mixed_5b_local', [[256], [160, 320], [32, 128], [128]]), # 8x7x7x832 + ('Mixed_5c_local', [[384], [192, 384], [48, 128], [128]]), # 8x7x7x1024 +] + +initializers = tf.keras.initializers +regularizers = tf.keras.regularizers + + +def inception_v1_stem_cells( + inputs: tf.Tensor, + depth_multiplier: float, + final_endpoint: Text, + temporal_conv_endpoints: Optional[Set[Text]] = None, + self_gating_endpoints: Optional[Set[Text]] = None, + temporal_conv_type: Text = '3d', + first_temporal_kernel_size: int = 7, + use_sync_bn: bool = False, + norm_momentum: float = 0.999, + norm_epsilon: float = 0.001, + temporal_conv_initializer: Union[ + Text, initializers.Initializer] = initializers.TruncatedNormal( + mean=0.0, stddev=0.01), + kernel_initializer: Union[Text, + initializers.Initializer] = 'truncated_normal', + kernel_regularizer: Union[Text, regularizers.Regularizer] = 'l2', + parameterized_conv_layer: Type[ + net_utils.ParameterizedConvLayer] = net_utils.ParameterizedConvLayer, + layer_naming_fn: Callable[[Text], Text] = lambda end_point: None, +) -> Tuple[tf.Tensor, Dict[Text, tf.Tensor]]: + """Stem cells used in the 
original I3D/S3D model. + + Args: + inputs: A 5-D float tensor of size [batch_size, num_frames, height, width, + channels]. + depth_multiplier: A float to reduce/increase number of channels. + final_endpoint: Specifies the endpoint to construct the network up to. It + can be one of ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1', + 'Conv2d_2c_3x3', 'MaxPool_3a_3x3']. + temporal_conv_endpoints: Specifies the endpoints where to perform temporal + convolution. + self_gating_endpoints: Specifies the endpoints where to perform self gating. + temporal_conv_type: '3d' for I3D model and '2+1d' for S3D model. + first_temporal_kernel_size: temporal kernel size of the first convolution + layer. + use_sync_bn: If True, use synchronized batch normalization. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + temporal_conv_initializer: Weight initializer for temporal convolution + inside the cell. It only applies to 2+1d and 1+2d cases. + kernel_initializer: Weight initializer for convolutional layers other than + temporal convolution. + kernel_regularizer: Weight regularizer for all convolutional layers. + parameterized_conv_layer: class for parameterized conv layer. + layer_naming_fn: function to customize conv / pooling layer names given + endpoint name of the block. This is mainly used to create a model that is + compatible with TF1 checkpoints. + + Returns: + A tuple of the output tensor and a dictionary of endpoint activations.
+ """ + + if temporal_conv_endpoints is None: + temporal_conv_endpoints = set() + if self_gating_endpoints is None: + self_gating_endpoints = set() + if use_sync_bn: + batch_norm = tf.keras.layers.experimental.SyncBatchNormalization + else: + batch_norm = tf.keras.layers.BatchNormalization + if tf.keras.backend.image_data_format() == 'channels_last': + bn_axis = -1 + else: + bn_axis = 1 + + end_points = {} + # batch_size x 32 x 112 x 112 x 64 + end_point = 'Conv2d_1a_7x7' + net = tf.keras.layers.Conv3D( + filters=net_utils.apply_depth_multiplier(64, depth_multiplier), + kernel_size=[first_temporal_kernel_size, 7, 7], + strides=[2, 2, 2], + padding='same', + use_bias=False, + kernel_initializer=tf_utils.clone_initializer(kernel_initializer), + kernel_regularizer=kernel_regularizer, + name=layer_naming_fn(end_point))( + inputs) + net = batch_norm( + axis=bn_axis, + momentum=norm_momentum, + epsilon=norm_epsilon, + scale=False, + gamma_initializer='ones', + name=layer_naming_fn(end_point + '/BatchNorm'))( + net) + net = tf.nn.relu(net) + end_points[end_point] = net + if final_endpoint == end_point: + return net, end_points + # batch_size x 32 x 56 x 56 x 64 + end_point = 'MaxPool_2a_3x3' + net = tf.keras.layers.MaxPool3D( + pool_size=[1, 3, 3], + strides=[1, 2, 2], + padding='same', + name=layer_naming_fn(end_point))( + net) + end_points[end_point] = net + if final_endpoint == end_point: + return net, end_points + # batch_size x 32 x 56 x 56 x 64 + end_point = 'Conv2d_2b_1x1' + net = tf.keras.layers.Conv3D( + filters=net_utils.apply_depth_multiplier(64, depth_multiplier), + strides=[1, 1, 1], + kernel_size=[1, 1, 1], + padding='same', + use_bias=False, + kernel_initializer=tf_utils.clone_initializer(kernel_initializer), + kernel_regularizer=kernel_regularizer, + name=layer_naming_fn(end_point))( + net) + net = batch_norm( + axis=bn_axis, + momentum=norm_momentum, + epsilon=norm_epsilon, + scale=False, + gamma_initializer='ones', + name=layer_naming_fn(end_point + 
'/BatchNorm'))( + net) + net = tf.nn.relu(net) + end_points[end_point] = net + if final_endpoint == end_point: + return net, end_points + # batch_size x 32 x 56 x 56 x 192 + end_point = 'Conv2d_2c_3x3' + if end_point not in temporal_conv_endpoints: + temporal_conv_type = '2d' + net = parameterized_conv_layer( + conv_type=temporal_conv_type, + kernel_size=3, + filters=net_utils.apply_depth_multiplier(192, depth_multiplier), + strides=[1, 1, 1], + rates=[1, 1, 1], + use_sync_bn=use_sync_bn, + norm_momentum=norm_momentum, + norm_epsilon=norm_epsilon, + temporal_conv_initializer=temporal_conv_initializer, + kernel_initializer=tf_utils.clone_initializer(kernel_initializer), + kernel_regularizer=kernel_regularizer, + name=layer_naming_fn(end_point))( + net) + if end_point in self_gating_endpoints: + net = nn_blocks_3d.SelfGating( + filters=net_utils.apply_depth_multiplier(192, depth_multiplier), + name=layer_naming_fn(end_point + '/self_gating'))( + net) + end_points[end_point] = net + if final_endpoint == end_point: + return net, end_points + # batch_size x 32 x 28 x 28 x 192 + end_point = 'MaxPool_3a_3x3' + net = tf.keras.layers.MaxPool3D( + pool_size=[1, 3, 3], + strides=[1, 2, 2], + padding='same', + name=layer_naming_fn(end_point))( + net) + end_points[end_point] = net + return net, end_points + + +def _construct_branch_3_layers( + channels: int, + swap_pool_and_1x1x1: bool, + pool_type: Text, + batch_norm_layer: tf.keras.layers.Layer, + kernel_initializer: Union[Text, initializers.Initializer], + kernel_regularizer: Union[Text, regularizers.Regularizer], +): + """Helper function for Branch 3 inside Inception module.""" + kernel_size = [1, 3, 3] if pool_type == '2d' else [3] * 3 + + conv = tf.keras.layers.Conv3D( + filters=channels, + kernel_size=[1, 1, 1], + padding='same', + use_bias=False, + kernel_initializer=kernel_initializer, + kernel_regularizer=kernel_regularizer) + activation = tf.keras.layers.Activation('relu') + pool = tf.keras.layers.MaxPool3D( + 
pool_size=kernel_size, strides=[1, 1, 1], padding='same') + if swap_pool_and_1x1x1: + branch_3_layers = [conv, batch_norm_layer, activation, pool] + else: + branch_3_layers = [pool, conv, batch_norm_layer, activation] + return branch_3_layers + + +class InceptionV1CellLayer(tf.keras.layers.Layer): + """A single TensorFlow 2 cell used in the original I3D/S3D model.""" + + def __init__( + self, + branch_filters: Sequence[Sequence[int]], + conv_type: Text = '3d', + temporal_dilation_rate: int = 1, + swap_pool_and_1x1x1: bool = False, + use_self_gating_on_branch: bool = False, + use_self_gating_on_cell: bool = False, + use_sync_bn: bool = False, + norm_momentum: float = 0.999, + norm_epsilon: float = 0.001, + temporal_conv_initializer: Union[ + Text, initializers.Initializer] = initializers.TruncatedNormal( + mean=0.0, stddev=0.01), + kernel_initializer: Union[Text, + initializers.Initializer] = 'truncated_normal', + kernel_regularizer: Union[Text, regularizers.Regularizer] = 'l2', + parameterized_conv_layer: Type[ + net_utils.ParameterizedConvLayer] = net_utils.ParameterizedConvLayer, + **kwargs): + """A cell structure inspired by Inception V1. + + Args: + branch_filters: Specifies the number of filters in four branches + (Branch_0, Branch_1, Branch_2, Branch_3). Single number for Branch_0 and + Branch_3. Branch_1 and Branch_2 each need two numbers, + one for 1x1x1 and one for 3x3x3. + conv_type: The type of parameterized convolution. Currently, we support + '2d', '3d', '2+1d', '1+2d'. + temporal_dilation_rate: The dilation rate for temporal convolution. + swap_pool_and_1x1x1: A boolean flag indicating whether to swap the + order of convolution and max pooling in Branch_3. + use_self_gating_on_branch: Whether or not to apply self gating on each + branch of the inception cell. + use_self_gating_on_cell: Whether or not to apply self gating on each cell + after the concatenation of all branches.
+ use_sync_bn: If True, use synchronized batch normalization. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + temporal_conv_initializer: Weight initializer for temporal convolution + inside the cell. It only applies to 2+1d and 1+2d cases. + kernel_initializer: Weight initializer for convolutional layers other than + temporal convolution. + kernel_regularizer: Weight regularizer for all convolutional layers. + parameterized_conv_layer: class for parameterized conv layer. + **kwargs: keyword arguments to be passed. + + Returns: + out_tensor: A 5-D float tensor of size [batch_size, num_frames, height, + width, channels]. + """ + super(InceptionV1CellLayer, self).__init__(**kwargs) + + self._branch_filters = branch_filters + self._conv_type = conv_type + self._temporal_dilation_rate = temporal_dilation_rate + self._swap_pool_and_1x1x1 = swap_pool_and_1x1x1 + self._use_self_gating_on_branch = use_self_gating_on_branch + self._use_self_gating_on_cell = use_self_gating_on_cell + self._use_sync_bn = use_sync_bn + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._temporal_conv_initializer = temporal_conv_initializer + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._parameterized_conv_layer = parameterized_conv_layer + if use_sync_bn: + self._norm = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._norm = tf.keras.layers.BatchNormalization + + if tf.keras.backend.image_data_format() == 'channels_last': + self._channel_axis = -1 + else: + self._channel_axis = 1 + + def _build_branch_params(self): + branch_0_params = [ + # Conv3D + dict( + filters=self._branch_filters[0][0], + kernel_size=[1, 1, 1], + padding='same', + use_bias=False, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer), + # norm + dict( 
+ axis=self._channel_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + scale=False, + gamma_initializer='ones'), + # relu + dict(), + ] + branch_1_params = [ + # Conv3D + dict( + filters=self._branch_filters[1][0], + kernel_size=[1, 1, 1], + padding='same', + use_bias=False, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer), + # norm + dict( + axis=self._channel_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + scale=False, + gamma_initializer='ones'), + # relu + dict(), + # ParameterizedConvLayer + dict( + conv_type=self._conv_type, + kernel_size=3, + filters=self._branch_filters[1][1], + strides=[1, 1, 1], + rates=[self._temporal_dilation_rate, 1, 1], + use_sync_bn=self._use_sync_bn, + norm_momentum=self._norm_momentum, + norm_epsilon=self._norm_epsilon, + temporal_conv_initializer=self._temporal_conv_initializer, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer), + ] + branch_2_params = [ + # Conv3D + dict( + filters=self._branch_filters[2][0], + kernel_size=[1, 1, 1], + padding='same', + use_bias=False, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer), + # norm + dict( + axis=self._channel_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + scale=False, + gamma_initializer='ones'), + # relu + dict(), + # ParameterizedConvLayer + dict( + conv_type=self._conv_type, + kernel_size=3, + filters=self._branch_filters[2][1], + strides=[1, 1, 1], + rates=[self._temporal_dilation_rate, 1, 1], + use_sync_bn=self._use_sync_bn, + norm_momentum=self._norm_momentum, + norm_epsilon=self._norm_epsilon, + temporal_conv_initializer=self._temporal_conv_initializer, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer) + ] + 
branch_3_params = [ + # Conv3D + dict( + filters=self._branch_filters[3][0], + kernel_size=[1, 1, 1], + padding='same', + use_bias=False, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer), + # norm + dict( + axis=self._channel_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + scale=False, + gamma_initializer='ones'), + # relu + dict(), + # pool + dict( + pool_size=([1, 3, 3] if self._conv_type == '2d' else [3] * 3), + strides=[1, 1, 1], + padding='same') + ] + + if self._use_self_gating_on_branch: + branch_0_params.append(dict(filters=self._branch_filters[0][0])) + branch_1_params.append(dict(filters=self._branch_filters[1][1])) + branch_2_params.append(dict(filters=self._branch_filters[2][1])) + branch_3_params.append(dict(filters=self._branch_filters[3][0])) + + out_gating_params = [] + if self._use_self_gating_on_cell: + out_channels = ( + self._branch_filters[0][0] + self._branch_filters[1][1] + + self._branch_filters[2][1] + self._branch_filters[3][0]) + out_gating_params.append(dict(filters=out_channels)) + + return [ + branch_0_params, branch_1_params, branch_2_params, branch_3_params, + out_gating_params + ] + + def build(self, input_shape): + branch_params = self._build_branch_params() + + self._branch_0_layers = [ + tf.keras.layers.Conv3D(**branch_params[0][0]), + self._norm(**branch_params[0][1]), + tf.keras.layers.Activation('relu', **branch_params[0][2]), + ] + + self._branch_1_layers = [ + tf.keras.layers.Conv3D(**branch_params[1][0]), + self._norm(**branch_params[1][1]), + tf.keras.layers.Activation('relu', **branch_params[1][2]), + self._parameterized_conv_layer(**branch_params[1][3]), + ] + + self._branch_2_layers = [ + tf.keras.layers.Conv3D(**branch_params[2][0]), + self._norm(**branch_params[2][1]), + tf.keras.layers.Activation('relu', **branch_params[2][2]), + self._parameterized_conv_layer(**branch_params[2][3]) + ] + + if self._swap_pool_and_1x1x1: 
+ self._branch_3_layers = [ + tf.keras.layers.Conv3D(**branch_params[3][0]), + self._norm(**branch_params[3][1]), + tf.keras.layers.Activation('relu', **branch_params[3][2]), + tf.keras.layers.MaxPool3D(**branch_params[3][3]), + ] + else: + self._branch_3_layers = [ + tf.keras.layers.MaxPool3D(**branch_params[3][3]), + tf.keras.layers.Conv3D(**branch_params[3][0]), + self._norm(**branch_params[3][1]), + tf.keras.layers.Activation('relu', **branch_params[3][2]), + ] + + if self._use_self_gating_on_branch: + self._branch_0_layers.append( + nn_blocks_3d.SelfGating(**branch_params[0][-1])) + self._branch_1_layers.append( + nn_blocks_3d.SelfGating(**branch_params[1][-1])) + self._branch_2_layers.append( + nn_blocks_3d.SelfGating(**branch_params[2][-1])) + self._branch_3_layers.append( + nn_blocks_3d.SelfGating(**branch_params[3][-1])) + + if self._use_self_gating_on_cell: + self.cell_self_gating = nn_blocks_3d.SelfGating(**branch_params[4][0]) + + super(InceptionV1CellLayer, self).build(input_shape) + + def call(self, inputs): + x = inputs + for layer in self._branch_0_layers: + x = layer(x) + branch_0 = x + + x = inputs + for layer in self._branch_1_layers: + x = layer(x) + branch_1 = x + + x = inputs + for layer in self._branch_2_layers: + x = layer(x) + branch_2 = x + + x = inputs + for layer in self._branch_3_layers: + x = layer(x) + branch_3 = x + out_tensor = tf.concat([branch_0, branch_1, branch_2, branch_3], + axis=self._channel_axis) + if self._use_self_gating_on_cell: + out_tensor = self.cell_self_gating(out_tensor) + return out_tensor diff --git a/official/projects/s3d/modeling/inception_utils_test.py b/official/projects/s3d/modeling/inception_utils_test.py new file mode 100644 index 0000000000000000000000000000000000000000..3fa79658dba0a4d5589201b000fef299aea2f88f --- /dev/null +++ b/official/projects/s3d/modeling/inception_utils_test.py @@ -0,0 +1,84 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +from absl.testing import parameterized +import tensorflow as tf + +from official.projects.s3d.modeling import inception_utils + + +class InceptionUtilsTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters((1.0, 3, {'Conv2d_1a_7x7', 'Conv2d_2c_3x3'}), + (0.5, 5, {'Conv2d_1a_7x7', 'Conv2d_2c_3x3'}), + (0.25, 7, {'Conv2d_1a_7x7', 'Conv2d_2c_3x3'})) + def test_s3d_stem_cells(self, depth_multiplier, first_temporal_kernel_size, + temporal_conv_endpoints): + batch_size = 1 + num_frames = 64 + height, width = 224, 224 + + inputs = tf.keras.layers.Input( + shape=(num_frames, height, width, 3), batch_size=batch_size) + + outputs, output_endpoints = inception_utils.inception_v1_stem_cells( + inputs, + depth_multiplier, + 'Mixed_5c', + temporal_conv_endpoints=temporal_conv_endpoints, + self_gating_endpoints={'Conv2d_2c_3x3'}, + first_temporal_kernel_size=first_temporal_kernel_size) + self.assertListEqual(outputs.shape.as_list(), + [batch_size, 32, 28, 28, int(192 * depth_multiplier)]) + + expected_endpoints = { + 'Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1', 'Conv2d_2c_3x3', + 'MaxPool_3a_3x3' + } + self.assertSetEqual(expected_endpoints, set(output_endpoints.keys())) + + @parameterized.parameters( + ('3d', True, True, True), + ('2d', False, False, True), + ('1+2d', True, False, False), + ('2+1d', False, True, False), + ) + def test_inception_v1_cell_endpoint_match(self, conv_type, + 
swap_pool_and_1x1x1, + use_self_gating_on_branch, + use_self_gating_on_cell): + batch_size = 5 + num_frames = 32 + channels = 128 + height, width = 28, 28 + + inputs = tf.keras.layers.Input( + shape=(num_frames, height, width, channels), batch_size=batch_size) + + inception_v1_cell_layer = inception_utils.InceptionV1CellLayer( + [[64], [96, 128], [16, 32], [32]], + conv_type=conv_type, + swap_pool_and_1x1x1=swap_pool_and_1x1x1, + use_self_gating_on_branch=use_self_gating_on_branch, + use_self_gating_on_cell=use_self_gating_on_cell, + name='test') + outputs = inception_v1_cell_layer(inputs) + + # self.assertTrue(net.op.name.startswith('test')) + self.assertListEqual(outputs.shape.as_list(), + [batch_size, 32, 28, 28, 256]) + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/s3d/modeling/net_utils.py b/official/projects/s3d/modeling/net_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..ce586da4420d9209ce9ff7bf21943832797b16c6 --- /dev/null +++ b/official/projects/s3d/modeling/net_utils.py @@ -0,0 +1,221 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Commonly used TensorFlow 2 network blocks.""" +from typing import Any, Text, Sequence, Union + +import tensorflow as tf +from official.modeling import tf_utils + +WEIGHT_INITIALIZER = { + 'Xavier': tf.keras.initializers.GlorotUniform, + 'Gaussian': lambda: tf.keras.initializers.RandomNormal(stddev=0.01), +} + +initializers = tf.keras.initializers +regularizers = tf.keras.regularizers + + +def make_set_from_start_endpoint(start_endpoint: Text, + endpoints: Sequence[Text]): + """Makes a subset of endpoints from the given starting position.""" + if start_endpoint not in endpoints: + return set() + start_index = endpoints.index(start_endpoint) + return set(endpoints[start_index:]) + + +def apply_depth_multiplier(d: Union[int, Sequence[Any]], + depth_multiplier: float): + """Applies depth_multiplier recursively to ints.""" + if isinstance(d, int): + return int(d * depth_multiplier) + else: + return [apply_depth_multiplier(x, depth_multiplier) for x in d] + + +class ParameterizedConvLayer(tf.keras.layers.Layer): + """Convolution layer based on the input conv_type.""" + + def __init__( + self, + conv_type: Text, + kernel_size: int, + filters: int, + strides: Sequence[int], + rates: Sequence[int], + use_sync_bn: bool = False, + norm_momentum: float = 0.999, + norm_epsilon: float = 0.001, + temporal_conv_initializer: Union[ + Text, initializers.Initializer] = 'glorot_uniform', + kernel_initializer: Union[Text, + initializers.Initializer] = 'truncated_normal', + kernel_regularizer: Union[Text, regularizers.Regularizer] = 'l2', + **kwargs): + super(ParameterizedConvLayer, self).__init__(**kwargs) + self._conv_type = conv_type + self._kernel_size = kernel_size + self._filters = filters + self._strides = strides + self._rates = rates + self._use_sync_bn = use_sync_bn + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + if use_sync_bn: + self._norm = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._norm = 
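`make_set_from_start_endpoint` and `apply_depth_multiplier` are plain Python, so their behaviour can be checked without TensorFlow. The sketch below restates them with built-in types and exercises them on a short endpoint list. That list is a hand-picked, illustrative subset, not the real `INCEPTION_V1_CONV_ENDPOINTS`.

```python
from typing import Any, Sequence, Union


def make_set_from_start_endpoint(start_endpoint: str,
                                 endpoints: Sequence[str]) -> set:
  """Returns every endpoint from start_endpoint onward (empty if absent)."""
  if start_endpoint not in endpoints:
    return set()
  start_index = endpoints.index(start_endpoint)
  return set(endpoints[start_index:])


def apply_depth_multiplier(d: Union[int, Sequence[Any]],
                           depth_multiplier: float):
  """Scales every int in a (possibly nested) filter spec, truncating."""
  if isinstance(d, int):
    return int(d * depth_multiplier)
  return [apply_depth_multiplier(x, depth_multiplier) for x in d]


# Illustrative endpoint names, listed in network order.
endpoints = ['Conv2d_1a_7x7', 'Conv2d_2b_1x1', 'Conv2d_2c_3x3', 'Mixed_3b']
print(sorted(make_set_from_start_endpoint('Conv2d_2c_3x3', endpoints)))
# ['Conv2d_2c_3x3', 'Mixed_3b']
print(apply_depth_multiplier([[64], [96, 128], [16, 32], [32]], 0.5))
# [[32], [48, 64], [8, 16], [16]]
```

Because `int()` truncates, small multipliers round channel counts down rather than to the nearest integer.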
tf.keras.layers.BatchNormalization + if tf.keras.backend.image_data_format() == 'channels_last': + self._channel_axis = -1 + else: + self._channel_axis = 1 + self._temporal_conv_initializer = temporal_conv_initializer + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + + def _build_conv_layer_params(self, input_shape): + """Builds params for conv layers.""" + conv_layer_params = [] + if self._conv_type == '3d': + conv_layer_params.append( + dict( + filters=self._filters, + kernel_size=[self._kernel_size] * 3, + strides=self._strides, + dilation_rate=self._rates, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + )) + elif self._conv_type == '2d': + conv_layer_params.append( + dict( + filters=self._filters, + kernel_size=[1, self._kernel_size, self._kernel_size], + strides=[1, self._strides[1], self._strides[2]], + dilation_rate=[1, self._rates[1], self._rates[2]], + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + )) + elif self._conv_type == '1+2d': + channels_in = input_shape[self._channel_axis] + conv_layer_params.append( + dict( + filters=channels_in, + kernel_size=[self._kernel_size, 1, 1], + strides=[self._strides[0], 1, 1], + dilation_rate=[self._rates[0], 1, 1], + kernel_initializer=tf_utils.clone_initializer( + self._temporal_conv_initializer), + )) + conv_layer_params.append( + dict( + filters=self._filters, + kernel_size=[1, self._kernel_size, self._kernel_size], + strides=[1, self._strides[1], self._strides[2]], + dilation_rate=[1, self._rates[1], self._rates[2]], + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + )) + elif self._conv_type == '2+1d': + conv_layer_params.append( + dict( + filters=self._filters, + kernel_size=[1, self._kernel_size, self._kernel_size], + strides=[1, self._strides[1], self._strides[2]], + dilation_rate=[1, self._rates[1], self._rates[2]], + kernel_initializer=tf_utils.clone_initializer( + 
self._kernel_initializer), + )) + conv_layer_params.append( + dict( + filters=self._filters, + kernel_size=[self._kernel_size, 1, 1], + strides=[self._strides[0], 1, 1], + dilation_rate=[self._rates[0], 1, 1], + kernel_initializer=tf_utils.clone_initializer( + self._temporal_conv_initializer), + )) + elif self._conv_type == '1+1+1d': + conv_layer_params.append( + dict( + filters=self._filters, + kernel_size=[1, 1, self._kernel_size], + strides=[1, 1, self._strides[2]], + dilation_rate=[1, 1, self._rates[2]], + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + )) + conv_layer_params.append( + dict( + filters=self._filters, + kernel_size=[1, self._kernel_size, 1], + strides=[1, self._strides[1], 1], + dilation_rate=[1, self._rates[1], 1], + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + )) + conv_layer_params.append( + dict( + filters=self._filters, + kernel_size=[self._kernel_size, 1, 1], + strides=[self._strides[0], 1, 1], + dilation_rate=[self._rates[0], 1, 1], + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + )) + else: + raise ValueError('Unsupported conv_type: {}'.format(self._conv_type)) + return conv_layer_params + + def _build_norm_layer_params(self, conv_param): + """Builds params for the norm layer after one conv layer.""" + return dict( + axis=self._channel_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + scale=False, + gamma_initializer='ones') + + def _build_activation_layer_params(self, conv_param): + """Builds params for the activation layer after one conv layer.""" + return {} + + def _append_conv_layer(self, param): + """Appends conv, normalization and activation layers.""" + self._parameterized_conv_layers.append( + tf.keras.layers.Conv3D( + padding='same', + use_bias=False, + kernel_regularizer=self._kernel_regularizer, + **param, + )) + norm_layer_params = self._build_norm_layer_params(param) + 
self._parameterized_conv_layers.append(self._norm(**norm_layer_params)) + + relu_layer_params = self._build_activation_layer_params(param) + self._parameterized_conv_layers.append( + tf.keras.layers.Activation('relu', **relu_layer_params)) + + def build(self, input_shape): + self._parameterized_conv_layers = [] + for conv_layer_param in self._build_conv_layer_params(input_shape): + self._append_conv_layer(conv_layer_param) + super(ParameterizedConvLayer, self).build(input_shape) + + def call(self, inputs): + x = inputs + for layer in self._parameterized_conv_layers: + x = layer(x) + return x diff --git a/official/projects/s3d/modeling/net_utils_test.py b/official/projects/s3d/modeling/net_utils_test.py new file mode 100644 index 0000000000000000000000000000000000000000..d45c1142878f3313da45ad334c842e232cc1bde0 --- /dev/null +++ b/official/projects/s3d/modeling/net_utils_test.py @@ -0,0 +1,68 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
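`_build_conv_layer_params` splits one convolution into a sequence of `Conv3D` layers whose kernels and strides cover different (time, height, width) axes. `_append_conv_layer` then gives every one of those layers `padding='same'`. Under that padding, an axis of size n becomes ceil(n / stride), so the output shape depends only on where the strides land. The sketch below models just that. `CONV_FACTORIZATIONS` and `factorized_output_shape` are hypothetical names, and filters per layer, dilation and initializers are deliberately left out. Its results can be compared against the expected shapes in `net_utils_test`.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Per-layer axis masks in (time, height, width) order, mirroring the layer
# sequence built by _build_conv_layer_params: 1 means the layer applies the
# configured stride on that axis, 0 means its stride is 1 there.
CONV_FACTORIZATIONS: Dict[str, List[Tuple[int, int, int]]] = {
    '3d': [(1, 1, 1)],
    '2d': [(0, 1, 1)],
    '1+2d': [(1, 0, 0), (0, 1, 1)],
    '2+1d': [(0, 1, 1), (1, 0, 0)],
    '1+1+1d': [(0, 0, 1), (0, 1, 0), (1, 0, 0)],
}


def factorized_output_shape(conv_type: str, input_shape: Sequence[int],
                            filters: int,
                            strides: Sequence[int]) -> List[int]:
  """Output shape of a [batch, T, H, W, C] input after the conv stack.

  With padding='same', each layer maps an axis of size n to ceil(n / stride);
  the kernel size does not affect the shape, so it is not a parameter here.
  """
  if conv_type not in CONV_FACTORIZATIONS:
    raise ValueError(f'Unsupported conv_type: {conv_type}')
  batch, *dims, _ = input_shape
  for mask in CONV_FACTORIZATIONS[conv_type]:
    dims = [math.ceil(size / (stride if applies else 1))
            for size, stride, applies in zip(dims, strides, mask)]
  # The last layer of every factorization has `filters` output channels.
  return [batch, *dims, filters]


print(factorized_output_shape('2+1d', [5, 32, 28, 28, 128], 256, [2, 1, 2]))
# [5, 16, 28, 14, 256]
```

Note how `'2d'` ignores the temporal stride entirely, which is why the `('2d', [2, 2, 2])` test case keeps all 32 frames.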
+ + +from absl import logging +from absl.testing import parameterized +import tensorflow as tf + +from official.projects.s3d.modeling import net_utils + + +class Tf2NetUtilsTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + ('3d', [2, 1, 1], [5, 16, 28, 28, 256]), + ('3d', [2, 2, 2], [5, 16, 14, 14, 256]), + ('3d', [1, 2, 1], [5, 32, 14, 28, 256]), + ('2d', [2, 2, 2], [5, 32, 14, 14, 256]), + ('2d', [1, 1, 2], [5, 32, 28, 14, 256]), + ('1+2d', [2, 2, 2], [5, 16, 14, 14, 256]), + ('1+2d', [2, 1, 1], [5, 16, 28, 28, 256]), + ('1+2d', [1, 1, 1], [5, 32, 28, 28, 256]), + ('1+2d', [1, 1, 2], [5, 32, 28, 14, 256]), + ('2+1d', [2, 2, 2], [5, 16, 14, 14, 256]), + ('2+1d', [1, 1, 1], [5, 32, 28, 28, 256]), + ('2+1d', [2, 1, 2], [5, 16, 28, 14, 256]), + ('1+1+1d', [2, 2, 2], [5, 16, 14, 14, 256]), + ('1+1+1d', [1, 1, 1], [5, 32, 28, 28, 256]), + ('1+1+1d', [2, 1, 2], [5, 16, 28, 14, 256]), + ) + def test_parameterized_conv_layer_creation(self, conv_type, strides, + expected_shape): + batch_size = 5 + temporal_size = 32 + spatial_size = 28 + channels = 128 + + kernel_size = 3 + filters = 256 + rates = [1, 1, 1] + + name = 'ParameterizedConv' + + inputs = tf.keras.Input( + shape=(temporal_size, spatial_size, spatial_size, channels), + batch_size=batch_size) + parameterized_conv_layer = net_utils.ParameterizedConvLayer( + conv_type, kernel_size, filters, strides, rates, name=name) + + features = parameterized_conv_layer(inputs) + logging.info(features.shape.as_list()) + logging.info([w.name for w in parameterized_conv_layer.weights]) + + self.assertAllEqual(features.shape.as_list(), expected_shape) + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/s3d/modeling/s3d.py b/official/projects/s3d/modeling/s3d.py new file mode 100644 index 0000000000000000000000000000000000000000..9b76ad177ed43108c3b3a06477871738592eddff --- /dev/null +++ b/official/projects/s3d/modeling/s3d.py @@ -0,0 +1,355 @@ +# Copyright 2022 The TensorFlow 
Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains the TensorFlow 2 definition of the S3D model. + +The S3D model is described in the following paper: +https://arxiv.org/abs/1712.04851. +""" +from typing import Any, Dict, Mapping, Optional, Sequence, Text, Tuple, Union + +import tensorflow as tf + +from official.modeling import hyperparams +from official.projects.s3d.configs import s3d as cfg +from official.projects.s3d.modeling import inception_utils +from official.projects.s3d.modeling import net_utils +from official.vision.modeling import factory_3d as model_factory +from official.vision.modeling.backbones import factory as backbone_factory + +initializers = tf.keras.initializers +regularizers = tf.keras.regularizers + + +class S3D(tf.keras.Model): + """Class to build S3D family models.""" + + def __init__(self, + input_specs: tf.keras.layers.InputSpec, + final_endpoint: Text = 'Mixed_5c', + first_temporal_kernel_size: int = 3, + temporal_conv_start_at: Text = 'Conv2d_2c_3x3', + gating_start_at: Text = 'Conv2d_2c_3x3', + swap_pool_and_1x1x1: bool = True, + gating_style: Text = 'CELL', + use_sync_bn: bool = False, + norm_momentum: float = 0.999, + norm_epsilon: float = 0.001, + temporal_conv_initializer: Union[ + Text, + initializers.Initializer] = initializers.TruncatedNormal( + mean=0.0, stddev=0.01), + temporal_conv_type: Text = '2+1d', + kernel_initializer: Union[ + Text, + initializers.Initializer] =
+ initializers.TruncatedNormal( + mean=0.0, stddev=0.01), + kernel_regularizer: Union[Text, regularizers.Regularizer] = 'l2', + depth_multiplier: float = 1.0, + **kwargs): + """Constructor. + + Args: + input_specs: `tf.keras.layers.InputSpec` specs of the input tensor. + final_endpoint: Specifies the endpoint to construct the network up to. + first_temporal_kernel_size: Temporal kernel size of the first convolution + layer. + temporal_conv_start_at: Specifies the endpoint from which to start + performing temporal convolution. + gating_start_at: Specifies the endpoint from which to start performing + self gating. + swap_pool_and_1x1x1: A boolean flag indicating whether to swap the order + of convolution and max pooling in Branch_3 of the Inception v1 cell. + gating_style: A string that specifies whether self gating is applied after + each branch and/or after each cell. It can be one of ['BRANCH', 'CELL', + 'BRANCH_AND_CELL']. + use_sync_bn: If True, use synchronized batch normalization. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + temporal_conv_initializer: Weight initializer for temporal convolutional + layers. + temporal_conv_type: The type of parameterized convolution. Currently, we + support '2d', '3d', '2+1d', '1+2d'. + kernel_initializer: Weight initializer for convolutional layers other than + temporal convolution. + kernel_regularizer: Weight regularizer for all convolutional layers. + depth_multiplier: A float to reduce/increase number of channels. + **kwargs: Keyword arguments passed to `tf.keras.Model`.
+ """ + + self._input_specs = input_specs + self._final_endpoint = final_endpoint + self._first_temporal_kernel_size = first_temporal_kernel_size + self._temporal_conv_start_at = temporal_conv_start_at + self._gating_start_at = gating_start_at + self._swap_pool_and_1x1x1 = swap_pool_and_1x1x1 + self._gating_style = gating_style + self._use_sync_bn = use_sync_bn + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._temporal_conv_initializer = temporal_conv_initializer + self._temporal_conv_type = temporal_conv_type + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._depth_multiplier = depth_multiplier + + self._temporal_conv_endpoints = net_utils.make_set_from_start_endpoint( + temporal_conv_start_at, inception_utils.INCEPTION_V1_CONV_ENDPOINTS) + self._self_gating_endpoints = net_utils.make_set_from_start_endpoint( + gating_start_at, inception_utils.INCEPTION_V1_CONV_ENDPOINTS) + + inputs = tf.keras.Input(shape=input_specs.shape[1:]) + net, end_points = inception_utils.inception_v1_stem_cells( + inputs, + depth_multiplier, + final_endpoint, + temporal_conv_endpoints=self._temporal_conv_endpoints, + self_gating_endpoints=self._self_gating_endpoints, + temporal_conv_type=self._temporal_conv_type, + first_temporal_kernel_size=self._first_temporal_kernel_size, + use_sync_bn=self._use_sync_bn, + norm_momentum=self._norm_momentum, + norm_epsilon=self._norm_epsilon, + temporal_conv_initializer=self._temporal_conv_initializer, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + parameterized_conv_layer=self._get_parameterized_conv_layer_impl(), + layer_naming_fn=self._get_layer_naming_fn(), + ) + + for end_point, filters in inception_utils.INCEPTION_V1_ARCH_SKELETON: + net, end_points = self._s3d_cell(net, end_point, end_points, filters) + if end_point == final_endpoint: + break + + if final_endpoint not in end_points: + raise ValueError( + 
'Unrecognized final endpoint %s (available endpoints: %s).' % + (final_endpoint, end_points.keys())) + + super(S3D, self).__init__(inputs=inputs, outputs=end_points, **kwargs) + + def _s3d_cell( + self, + net: tf.Tensor, + end_point: Text, + end_points: Dict[Text, tf.Tensor], + filters: Union[int, Sequence[Any]], + non_local_block: Optional[tf.keras.layers.Layer] = None, + attention_cell: Optional[tf.keras.layers.Layer] = None, + attention_cell_super_graph: Optional[tf.keras.layers.Layer] = None + ) -> Tuple[tf.Tensor, Dict[Text, tf.Tensor]]: + if end_point.startswith('Mixed'): + conv_type = ( + self._temporal_conv_type + if end_point in self._temporal_conv_endpoints else '2d') + use_self_gating_on_branch = ( + end_point in self._self_gating_endpoints and + (self._gating_style == 'BRANCH' or + self._gating_style == 'BRANCH_AND_CELL')) + use_self_gating_on_cell = ( + end_point in self._self_gating_endpoints and + (self._gating_style == 'CELL' or + self._gating_style == 'BRANCH_AND_CELL')) + net = self._get_inception_v1_cell_layer_impl()( + branch_filters=net_utils.apply_depth_multiplier( + filters, self._depth_multiplier), + conv_type=conv_type, + temporal_dilation_rate=1, + swap_pool_and_1x1x1=self._swap_pool_and_1x1x1, + use_self_gating_on_branch=use_self_gating_on_branch, + use_self_gating_on_cell=use_self_gating_on_cell, + use_sync_bn=self._use_sync_bn, + norm_momentum=self._norm_momentum, + norm_epsilon=self._norm_epsilon, + kernel_initializer=self._kernel_initializer, + temporal_conv_initializer=self._temporal_conv_initializer, + kernel_regularizer=self._kernel_regularizer, + name=self._get_layer_naming_fn()(end_point))( + net) + else: + net = tf.keras.layers.MaxPool3D( + pool_size=filters[0], + strides=filters[1], + padding='same', + name=self._get_layer_naming_fn()(end_point))( + net) + end_points[end_point] = net + if non_local_block: + # TODO(b/182299420): Implement non local block in TF2. 
+ raise NotImplementedError('Non local block is not implemented yet.') + if attention_cell: + # TODO(b/182299420): Implement attention cell in TF2. + raise NotImplementedError('Attention cell is not implemented yet.') + if attention_cell_super_graph: + # TODO(b/182299420): Implement attention cell super graph in TF2. + raise NotImplementedError('Attention cell super graph is not implemented' + ' yet.') + return net, end_points + + def get_config(self): + config_dict = { + 'input_specs': self._input_specs, + 'final_endpoint': self._final_endpoint, + 'first_temporal_kernel_size': self._first_temporal_kernel_size, + 'temporal_conv_start_at': self._temporal_conv_start_at, + 'gating_start_at': self._gating_start_at, + 'swap_pool_and_1x1x1': self._swap_pool_and_1x1x1, + 'gating_style': self._gating_style, + 'use_sync_bn': self._use_sync_bn, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'temporal_conv_initializer': self._temporal_conv_initializer, + 'temporal_conv_type': self._temporal_conv_type, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'depth_multiplier': self._depth_multiplier + } + return config_dict + + @classmethod + def from_config(cls, config, custom_objects=None): + return cls(**config) + + @property + def output_specs(self): + """A dict of {level: TensorShape} pairs for the model output.""" + return self._output_specs + + def _get_inception_v1_cell_layer_impl(self): + return inception_utils.InceptionV1CellLayer + + def _get_parameterized_conv_layer_impl(self): + return net_utils.ParameterizedConvLayer + + def _get_layer_naming_fn(self): + return lambda end_point: None + + +class S3DModel(tf.keras.Model): + """An S3D model builder.""" + + def __init__(self, + backbone: tf.keras.Model, + num_classes: int, + input_specs: Mapping[Text, tf.keras.layers.InputSpec], + final_endpoint: Text = 'Mixed_5c', + dropout_rate: float = 0.0, + **kwargs): + """Constructor. 
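In `_s3d_cell`, an Inception (`Mixed_*`) endpoint gets self-gating only if it appears in the gating endpoint set derived from `gating_start_at`. `gating_style` then decides whether the gate goes on each branch, on the whole cell, or both. The truth table can be sketched as a small pure function. `gating_flags` is a hypothetical name, and the endpoint set below is an illustrative subset.

```python
from typing import AbstractSet, Tuple


def gating_flags(end_point: str, self_gating_endpoints: AbstractSet[str],
                 gating_style: str) -> Tuple[bool, bool]:
  """Returns (use_self_gating_on_branch, use_self_gating_on_cell).

  Mirrors the boolean expressions in S3D._s3d_cell: gating only applies to
  endpoints in self_gating_endpoints, and the style picks the placement.
  An unrecognized style silently disables gating, as in the original.
  """
  gated = end_point in self_gating_endpoints
  on_branch = gated and gating_style in ('BRANCH', 'BRANCH_AND_CELL')
  on_cell = gated and gating_style in ('CELL', 'BRANCH_AND_CELL')
  return on_branch, on_cell


# Illustrative subset of the endpoints at or after gating_start_at.
gated_endpoints = {'Conv2d_2c_3x3', 'Mixed_3b', 'Mixed_3c'}
print(gating_flags('Mixed_3b', gated_endpoints, 'CELL'))  # (False, True)
print(gating_flags('Mixed_3c', gated_endpoints, 'BRANCH_AND_CELL'))  # (True, True)
print(gating_flags('Mixed_4b', gated_endpoints, 'BRANCH'))  # (False, False)
```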
+ + Args: + backbone: S3D backbone Keras Model. + num_classes: `int` number of possible classes for video classification. + input_specs: A mapping from input name (e.g. 'image') to + `tf.keras.layers.InputSpec` specs of the input tensors. + final_endpoint: Specifies the backbone endpoint whose features are pooled + for classification. + dropout_rate: `float` between 0 and 1. Fraction of the input units to + drop. Note that dropout_rate = 1.0 - dropout_keep_prob. + **kwargs: Keyword arguments passed to `tf.keras.Model`. + """ + self._self_setattr_tracking = False + self._backbone = backbone + self._num_classes = num_classes + self._input_specs = input_specs + self._final_endpoint = final_endpoint + self._dropout_rate = dropout_rate + self._config_dict = { + 'backbone': backbone, + 'num_classes': num_classes, + 'input_specs': input_specs, + 'final_endpoint': final_endpoint, + 'dropout_rate': dropout_rate, + } + + inputs = { + k: tf.keras.Input(shape=v.shape[1:]) for k, v in input_specs.items() + } + streams = self._backbone(inputs['image']) + + pool = tf.math.reduce_mean(streams[self._final_endpoint], axis=[1, 2, 3]) + fc = tf.keras.layers.Dropout(dropout_rate)(pool) + logits = tf.keras.layers.Dense(**self._build_dense_layer_params())(fc) + + super(S3DModel, self).__init__(inputs=inputs, outputs=logits, **kwargs) + + @property + def checkpoint_items(self): + """Returns a dictionary of items to be additionally checkpointed.""" + return dict(backbone=self.backbone) + + @property + def backbone(self): + return self._backbone + + def get_config(self): + return self._config_dict + + @classmethod + def from_config(cls, config, custom_objects=None): + return cls(**config) + + def _build_dense_layer_params(self): + return dict(units=self._num_classes, kernel_regularizer='l2') + + +@backbone_factory.register_backbone_builder('s3d') +def build_s3d( + input_specs: tf.keras.layers.InputSpec, + backbone_config: hyperparams.Config, + norm_activation_config: hyperparams.Config, + l2_regularizer: 
tf.keras.regularizers.Regularizer = None +) ->
tf.keras.Model: # pytype: disable=annotation-type-mismatch # typed-keras + """Builds S3D backbone.""" + + backbone_type = backbone_config.type + backbone_cfg = backbone_config.get() + assert backbone_type == 's3d' + del norm_activation_config + + backbone = S3D( + input_specs=input_specs, + final_endpoint=backbone_cfg.final_endpoint, + first_temporal_kernel_size=backbone_cfg.first_temporal_kernel_size, + temporal_conv_start_at=backbone_cfg.temporal_conv_start_at, + gating_start_at=backbone_cfg.gating_start_at, + swap_pool_and_1x1x1=backbone_cfg.swap_pool_and_1x1x1, + gating_style=backbone_cfg.gating_style, + use_sync_bn=backbone_cfg.use_sync_bn, + norm_momentum=backbone_cfg.norm_momentum, + norm_epsilon=backbone_cfg.norm_epsilon, + temporal_conv_type=backbone_cfg.temporal_conv_type, + kernel_regularizer=l2_regularizer, + depth_multiplier=backbone_cfg.depth_multiplier) + return backbone + + +@model_factory.register_model_builder('s3d') +def build_s3d_model( + input_specs: tf.keras.layers.InputSpec, + model_config: cfg.S3DModel, + num_classes: int, + l2_regularizer: tf.keras.regularizers.Regularizer = None +) -> tf.keras.Model: # pytype: disable=annotation-type-mismatch # typed-keras + """Builds S3D model with classification layer.""" + input_specs_dict = {'image': input_specs} + backbone = build_s3d(input_specs, model_config.backbone, + model_config.norm_activation, l2_regularizer) + + model = S3DModel( + backbone, + num_classes=num_classes, + input_specs=input_specs_dict, + dropout_rate=model_config.dropout_rate) + return model diff --git a/official/projects/s3d/modeling/s3d_test.py b/official/projects/s3d/modeling/s3d_test.py new file mode 100644 index 0000000000000000000000000000000000000000..d9565aa4700d922d4192452a457578573e072253 --- /dev/null +++ b/official/projects/s3d/modeling/s3d_test.py @@ -0,0 +1,106 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
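`S3DModel` turns the backbone's `final_endpoint` feature map into logits in three steps. It averages over axes `[1, 2, 3]` (time, height, width), applies dropout, and feeds the pooled vector to a `Dense` layer with `num_classes` units. The pure-Python sketch below traces that inference path on nested lists. Dropout is omitted because Keras applies it only during training. `global_average_pool_3d` and `dense` are illustrative helpers, not library functions.

```python
from typing import List

# [batch, T, H, W, C] as nested Python lists.
Tensor5D = List[List[List[List[List[float]]]]]


def global_average_pool_3d(x: Tensor5D) -> List[List[float]]:
  """Averages a [batch, T, H, W, C] nested list over T, H and W."""
  pooled = []
  for clip in x:
    # Flatten all (t, h, w) positions into a list of channel vectors.
    cells = [vec for frame in clip for row in frame for vec in row]
    channels = len(cells[0])
    pooled.append([sum(v[c] for v in cells) / len(cells)
                   for c in range(channels)])
  return pooled


def dense(x: List[List[float]], weights: List[List[float]],
          bias: List[float]) -> List[List[float]]:
  """Computes x @ weights + bias, with weights shaped [in_features, classes]."""
  return [[sum(xi * weights[i][k] for i, xi in enumerate(row)) + bias[k]
           for k in range(len(bias))] for row in x]


# One clip with T=2, H=1, W=1, C=2.
features = [[[[[1.0, 2.0]]], [[[3.0, 4.0]]]]]
pooled = global_average_pool_3d(features)
print(pooled)  # [[2.0, 3.0]]
print(dense(pooled, weights=[[1.0], [1.0]], bias=[0.5]))  # [[5.5]]
```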
+# + +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for S3D model.""" + +from absl.testing import parameterized +import tensorflow as tf + +from official.projects.s3d.modeling import s3d + + +class S3dTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (7, 224, 224, 3), + (7, 128, 128, 3), + (7, 256, 256, 3), + (7, 192, 192, 3), + (64, 224, 224, 3), + (32, 224, 224, 3), + (64, 224, 224, 11), + (32, 224, 224, 11), + ) + def test_build(self, num_frames, height, width, first_temporal_kernel_size): + batch_size = 5 + + input_shape = [batch_size, num_frames, height, width, 3] + input_specs = tf.keras.layers.InputSpec(shape=input_shape) + network = s3d.S3D( + input_specs=input_specs + ) + inputs = tf.keras.Input(shape=input_shape[1:], batch_size=input_shape[0]) + endpoints = network(inputs) + + temporal_1a = (num_frames - 1)//2 + 1 + expected_shapes = { + 'Conv2d_1a_7x7': [5, temporal_1a, height//2, width//2, 64], + 'Conv2d_2b_1x1': [5, temporal_1a, height//4, width//4, 64], + 'Conv2d_2c_3x3': [5, temporal_1a, height//4, width//4, 192], + 'MaxPool_2a_3x3': [5, temporal_1a, height//4, width//4, 64], + 'MaxPool_3a_3x3': [5, temporal_1a, height//8, width//8, 192], + 'Mixed_3b': [5, temporal_1a, height//8, width//8, 256], + 'Mixed_3c': [5, temporal_1a, height//8, width//8, 480], + 'MaxPool_4a_3x3': [5, temporal_1a//2, height//16, width//16, 480], + 'Mixed_4b': [5, temporal_1a//2, height//16, width//16, 512], + 'Mixed_4c': [5,
temporal_1a//2, height//16, width//16, 512], + 'Mixed_4d': [5, temporal_1a//2, height//16, width//16, 512], + 'Mixed_4e': [5, temporal_1a//2, height//16, width//16, 528], + 'Mixed_4f': [5, temporal_1a//2, height//16, width//16, 832], + 'MaxPool_5a_2x2': [5, temporal_1a//4, height//32, width//32, 832], + 'Mixed_5b': [5, temporal_1a//4, height//32, width//32, 832], + 'Mixed_5c': [5, temporal_1a//4, height//32, width//32, 1024], + } + + output_shapes = dict() + for end_point, output_tensor in endpoints.items(): + output_shapes[end_point] = output_tensor.shape.as_list() + self.assertDictEqual(output_shapes, expected_shapes) + + def test_serialize_deserialize(self): + # Create a network object that sets all of its config options. + kwargs = dict( + input_specs=tf.keras.layers.InputSpec(shape=(5, 64, 224, 224, 3)), + final_endpoint='Mixed_5c', + first_temporal_kernel_size=3, + temporal_conv_start_at='Conv2d_2c_3x3', + gating_start_at='Conv2d_2c_3x3', + swap_pool_and_1x1x1=True, + gating_style='CELL', + use_sync_bn=False, + norm_momentum=0.999, + norm_epsilon=0.001, + temporal_conv_initializer=tf.keras.initializers.TruncatedNormal( + mean=0.0, stddev=0.01), + temporal_conv_type='2+1d', + kernel_initializer='truncated_normal', + kernel_regularizer='l2', + depth_multiplier=1.0 + ) + network = s3d.S3D(**kwargs) + + expected_config = dict(kwargs) + self.assertEqual(network.get_config(), expected_config) + + # Create another network object from the first object's config. + new_network = s3d.S3D.from_config(network.get_config()) + + # Validate that the config can be forced to JSON. + _ = new_network.to_json() + + # If the serialization was successful, the new config should match the old. 
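The expected shapes in `test_build` follow from five downsampling endpoints. Their (time, height, width) strides can be read off the test. For example, `temporal_1a = (num_frames - 1)//2 + 1` equals ceil(num_frames / 2), which implies a temporal stride of 2 at `Conv2d_1a_7x7`. The sketch below recomputes the sizes assuming `'same'` padding, i.e. ceil division at every stride. `DOWNSAMPLING` and `downsampled_sizes` are hypothetical names, and the stride table is inferred from the test rather than taken from `inception_utils`.

```python
import math
from typing import Dict, Tuple

# (time, height, width) strides of the downsampling endpoints, inferred from
# the expected shapes in s3d_test; the other endpoints keep the size unchanged.
DOWNSAMPLING = (
    ('Conv2d_1a_7x7', (2, 2, 2)),
    ('MaxPool_2a_3x3', (1, 2, 2)),
    ('MaxPool_3a_3x3', (1, 2, 2)),
    ('MaxPool_4a_3x3', (2, 2, 2)),
    ('MaxPool_5a_2x2', (2, 2, 2)),
)


def downsampled_sizes(num_frames: int, height: int,
                      width: int) -> Dict[str, Tuple[int, int, int]]:
  """(T, H, W) after each downsampling endpoint, assuming 'same' padding."""
  size = (num_frames, height, width)
  sizes = {}
  for end_point, strides in DOWNSAMPLING:
    size = tuple(math.ceil(s / k) for s, k in zip(size, strides))
    sizes[end_point] = size
  return sizes


print(downsampled_sizes(64, 224, 224)['MaxPool_5a_2x2'])  # (8, 7, 7)
```

For the clip sizes used in the test, ceil and floor division agree at every step, which is why the test can use `//`.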
+ self.assertAllEqual(network.get_config(), new_network.get_config()) + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/s3d/train.py b/official/projects/s3d/train.py new file mode 100644 index 0000000000000000000000000000000000000000..5f1819fc2ea3abe9df5764312120f4cb3ce7a392 --- /dev/null +++ b/official/projects/s3d/train.py @@ -0,0 +1,30 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""TensorFlow Model Garden Vision training driver for S3D.""" + +from absl import app + +from official.common import flags as tfm_flags +# pylint: disable=unused-import +from official.projects.s3d.configs.google import s3d as s3d_config +from official.projects.s3d.modeling import s3d +from official.projects.s3d.tasks.google import automl_video_classification +from official.vision import registry_imports +# pylint: enable=unused-import +from official.vision import train + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(train.main) diff --git a/official/projects/simclr/README.md b/official/projects/simclr/README.md new file mode 100644 index 0000000000000000000000000000000000000000..3644532032a78aefdd7376b1669687a93354dd2c --- /dev/null +++ b/official/projects/simclr/README.md @@ -0,0 +1,78 @@ +# Simple Framework for Contrastive Learning + +[![Paper](http://img.shields.io/badge/Paper-arXiv.2002.05709-B3181B?logo=arXiv)](https://arxiv.org/abs/2002.05709) 
+[![Paper](http://img.shields.io/badge/Paper-arXiv.2006.10029-B3181B?logo=arXiv)](https://arxiv.org/abs/2006.10029) + +
+ SimCLR Illustration +
+
+ An illustration of SimCLR (from our blog here). +
+ +## Environment setup + +The code can be run on multiple GPUs or TPUs with different distribution +strategies. See the TensorFlow distributed training +[guide](https://www.tensorflow.org/guide/distributed_training) for an overview +of `tf.distribute`. + +The code is compatible with TensorFlow 2.4+. See `requirements.txt` for all +prerequisites, which you can install with +`pip install -r ./official/requirements.txt`. + +## Pretraining +To pretrain the model on ImageNet, run the following command: + +``` +python3 -m official.projects.simclr.train \ + --mode=train_and_eval \ + --experiment=simclr_pretraining \ + --model_dir={MODEL_DIR} \ + --config_file={CONFIG_FILE} +``` + +An example config file can be found [here](./configs/experiments/imagenet_simclr_pretrain_gpu.yaml). + + +## Semi-supervised learning and fine-tuning the whole network + +You can access the 1% and 10% ImageNet subsets used for semi-supervised learning via +[TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/imagenet2012_subset). +You can also find the image IDs of these subsets in `imagenet_subsets/`. + +To fine-tune the whole network, run the following command: + +``` +python3 -m official.projects.simclr.train \ + --mode=train_and_eval \ + --experiment=simclr_finetuning \ + --model_dir={MODEL_DIR} \ + --config_file={CONFIG_FILE} +``` + +An example config file can be found [here](./configs/experiments/imagenet_simclr_finetune_gpu.yaml).
+ +## Cite + +[SimCLR paper](https://arxiv.org/abs/2002.05709): + +``` +@article{chen2020simple, + title={A Simple Framework for Contrastive Learning of Visual Representations}, + author={Chen, Ting and Kornblith, Simon and Norouzi, Mohammad and Hinton, Geoffrey}, + journal={arXiv preprint arXiv:2002.05709}, + year={2020} +} +``` + +[SimCLRv2 paper](https://arxiv.org/abs/2006.10029): + +``` +@article{chen2020big, + title={Big Self-Supervised Models are Strong Semi-Supervised Learners}, + author={Chen, Ting and Kornblith, Simon and Swersky, Kevin and Norouzi, Mohammad and Hinton, Geoffrey}, + journal={arXiv preprint arXiv:2006.10029}, + year={2020} +} +``` diff --git a/official/projects/simclr/common/registry_imports.py b/official/projects/simclr/common/registry_imports.py new file mode 100644 index 0000000000000000000000000000000000000000..16b4a55c19eadbe0ba793467306915eb5684ace5 --- /dev/null +++ b/official/projects/simclr/common/registry_imports.py @@ -0,0 +1,22 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""All necessary imports for registration.""" + +# pylint: disable=unused-import +from official.projects.simclr.configs import simclr +from official.projects.simclr.losses import contrastive_losses +from official.projects.simclr.modeling import simclr_model +from official.projects.simclr.tasks import simclr as simclr_task +from official.vision import registry_imports diff --git a/official/vision/beta/projects/simclr/configs/experiments/cifar_simclr_pretrain.yaml b/official/projects/simclr/configs/experiments/cifar_simclr_pretrain.yaml similarity index 100% rename from official/vision/beta/projects/simclr/configs/experiments/cifar_simclr_pretrain.yaml rename to official/projects/simclr/configs/experiments/cifar_simclr_pretrain.yaml diff --git a/official/vision/beta/projects/simclr/configs/experiments/imagenet_simclr_finetune_gpu.yaml b/official/projects/simclr/configs/experiments/imagenet_simclr_finetune_gpu.yaml similarity index 100% rename from official/vision/beta/projects/simclr/configs/experiments/imagenet_simclr_finetune_gpu.yaml rename to official/projects/simclr/configs/experiments/imagenet_simclr_finetune_gpu.yaml diff --git a/official/vision/beta/projects/simclr/configs/experiments/imagenet_simclr_finetune_tpu.yaml b/official/projects/simclr/configs/experiments/imagenet_simclr_finetune_tpu.yaml similarity index 100% rename from official/vision/beta/projects/simclr/configs/experiments/imagenet_simclr_finetune_tpu.yaml rename to official/projects/simclr/configs/experiments/imagenet_simclr_finetune_tpu.yaml diff --git a/official/vision/beta/projects/simclr/configs/experiments/imagenet_simclr_multitask_tpu.yaml b/official/projects/simclr/configs/experiments/imagenet_simclr_multitask_tpu.yaml similarity index 100% rename from official/vision/beta/projects/simclr/configs/experiments/imagenet_simclr_multitask_tpu.yaml rename to official/projects/simclr/configs/experiments/imagenet_simclr_multitask_tpu.yaml diff --git 
a/official/vision/beta/projects/simclr/configs/experiments/imagenet_simclr_pretrain_gpu.yaml b/official/projects/simclr/configs/experiments/imagenet_simclr_pretrain_gpu.yaml similarity index 100% rename from official/vision/beta/projects/simclr/configs/experiments/imagenet_simclr_pretrain_gpu.yaml rename to official/projects/simclr/configs/experiments/imagenet_simclr_pretrain_gpu.yaml diff --git a/official/vision/beta/projects/simclr/configs/experiments/imagenet_simclr_pretrain_tpu.yaml b/official/projects/simclr/configs/experiments/imagenet_simclr_pretrain_tpu.yaml similarity index 100% rename from official/vision/beta/projects/simclr/configs/experiments/imagenet_simclr_pretrain_tpu.yaml rename to official/projects/simclr/configs/experiments/imagenet_simclr_pretrain_tpu.yaml diff --git a/official/vision/beta/projects/simclr/configs/multitask_config.py b/official/projects/simclr/configs/multitask_config.py similarity index 90% rename from official/vision/beta/projects/simclr/configs/multitask_config.py rename to official/projects/simclr/configs/multitask_config.py index 8cf00d5afb1dc0bf441ea780ef116b02136a346f..59f6f752a3fe60d3c6d82d8103f7a6e65abed7ce 100644 --- a/official/vision/beta/projects/simclr/configs/multitask_config.py +++ b/official/projects/simclr/configs/multitask_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -20,10 +20,10 @@ from typing import List, Tuple from official.core import exp_factory from official.modeling import hyperparams from official.modeling.multitask import configs as multitask_configs -from official.vision.beta.configs import backbones -from official.vision.beta.configs import common -from official.vision.beta.projects.simclr.configs import simclr as simclr_configs -from official.vision.beta.projects.simclr.modeling import simclr_model +from official.projects.simclr.configs import simclr as simclr_configs +from official.projects.simclr.modeling import simclr_model +from official.vision.configs import backbones +from official.vision.configs import common @dataclasses.dataclass diff --git a/official/vision/beta/projects/simclr/configs/multitask_config_test.py b/official/projects/simclr/configs/multitask_config_test.py similarity index 84% rename from official/vision/beta/projects/simclr/configs/multitask_config_test.py rename to official/projects/simclr/configs/multitask_config_test.py index 666cd759962f0e150a5c29853e6f16659b40c9d7..d4cfded59b3a3d300046b78c01fcf4979629151b 100644 --- a/official/vision/beta/projects/simclr/configs/multitask_config_test.py +++ b/official/projects/simclr/configs/multitask_config_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -18,8 +18,8 @@ import tensorflow as tf from official.core import exp_factory from official.modeling.multitask import configs as multitask_configs -from official.vision.beta.projects.simclr.configs import multitask_config as simclr_multitask_config -from official.vision.beta.projects.simclr.configs import simclr as exp_cfg +from official.projects.simclr.configs import multitask_config as simclr_multitask_config +from official.projects.simclr.configs import simclr as exp_cfg class MultitaskConfigTest(tf.test.TestCase): diff --git a/official/projects/simclr/configs/simclr.py b/official/projects/simclr/configs/simclr.py new file mode 100644 index 0000000000000000000000000000000000000000..23c071a8f12650ed1ecc77bcf3476cbc684f9f47 --- /dev/null +++ b/official/projects/simclr/configs/simclr.py @@ -0,0 +1,318 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""SimCLR configurations.""" +import dataclasses +import os +from typing import List, Optional + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import hyperparams +from official.modeling import optimization +from official.projects.simclr.modeling import simclr_model +from official.vision.configs import backbones +from official.vision.configs import common + + +@dataclasses.dataclass +class Decoder(hyperparams.Config): + decode_label: bool = True + + +@dataclasses.dataclass +class Parser(hyperparams.Config): + """Parser config.""" + aug_rand_crop: bool = True + aug_rand_hflip: bool = True + aug_color_distort: bool = True + aug_color_jitter_strength: float = 1.0 + aug_color_jitter_impl: str = 'simclrv2' # 'simclrv1' or 'simclrv2' + aug_rand_blur: bool = True + parse_label: bool = True + test_crop: bool = True + mode: str = simclr_model.PRETRAIN + + +@dataclasses.dataclass +class DataConfig(cfg.DataConfig): + """Training data config.""" + input_path: str = '' + global_batch_size: int = 0 + is_training: bool = True + dtype: str = 'float32' + shuffle_buffer_size: int = 10000 + cycle_length: int = 10 + # simclr specific configs + parser: Parser = Parser() + decoder: Decoder = Decoder() + # Useful for sanity-checking that no labels are used during pretraining: + # sets all labels to zero (default False keeps the original labels). + input_set_label_to_zero: bool = False + + +@dataclasses.dataclass +class ProjectionHead(hyperparams.Config): + proj_output_dim: int = 128 + num_proj_layers: int = 3 + ft_proj_idx: int = 1 # layer of the projection head to use for fine-tuning.
+ + +@dataclasses.dataclass +class SupervisedHead(hyperparams.Config): + num_classes: int = 1001 + zero_init: bool = False + + +@dataclasses.dataclass +class ContrastiveLoss(hyperparams.Config): + projection_norm: bool = True + temperature: float = 0.1 + l2_weight_decay: float = 0.0 + + +@dataclasses.dataclass +class ClassificationLosses(hyperparams.Config): + label_smoothing: float = 0.0 + one_hot: bool = True + l2_weight_decay: float = 0.0 + + +@dataclasses.dataclass +class Evaluation(hyperparams.Config): + top_k: int = 5 + one_hot: bool = True + + +@dataclasses.dataclass +class SimCLRModel(hyperparams.Config): + """SimCLR model config.""" + input_size: List[int] = dataclasses.field(default_factory=list) + backbone: backbones.Backbone = backbones.Backbone( + type='resnet', resnet=backbones.ResNet()) + projection_head: ProjectionHead = ProjectionHead( + proj_output_dim=128, num_proj_layers=3, ft_proj_idx=1) + supervised_head: SupervisedHead = SupervisedHead(num_classes=1001) + norm_activation: common.NormActivation = common.NormActivation( + norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False) + mode: str = simclr_model.PRETRAIN + backbone_trainable: bool = True + + +@dataclasses.dataclass +class SimCLRPretrainTask(cfg.TaskConfig): + """SimCLR pretraining task config.""" + model: SimCLRModel = SimCLRModel(mode=simclr_model.PRETRAIN) + train_data: DataConfig = DataConfig( + parser=Parser(mode=simclr_model.PRETRAIN), is_training=True) + validation_data: DataConfig = DataConfig( + parser=Parser(mode=simclr_model.PRETRAIN), is_training=False) + loss: ContrastiveLoss = ContrastiveLoss() + evaluation: Evaluation = Evaluation() + init_checkpoint: Optional[str] = None + # all or backbone + init_checkpoint_modules: str = 'all' + + +@dataclasses.dataclass +class SimCLRFinetuneTask(cfg.TaskConfig): + """SimCLR fine-tuning task config.""" + model: SimCLRModel = SimCLRModel( + mode=simclr_model.FINETUNE, + supervised_head=SupervisedHead(num_classes=1001, zero_init=True)) +
train_data: DataConfig = DataConfig( + parser=Parser(mode=simclr_model.FINETUNE), is_training=True) + validation_data: DataConfig = DataConfig( + parser=Parser(mode=simclr_model.FINETUNE), is_training=False) + loss: ClassificationLosses = ClassificationLosses() + evaluation: Evaluation = Evaluation() + init_checkpoint: Optional[str] = None + # all, backbone_projection or backbone + init_checkpoint_modules: str = 'backbone_projection' + + +@exp_factory.register_config_factory('simclr_pretraining') +def simclr_pretraining() -> cfg.ExperimentConfig: + """SimCLR pretraining general config.""" + return cfg.ExperimentConfig( + task=SimCLRPretrainTask(), + trainer=cfg.TrainerConfig(), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + +@exp_factory.register_config_factory('simclr_finetuning') +def simclr_finetuning() -> cfg.ExperimentConfig: + """SimCLR fine-tuning general config.""" + return cfg.ExperimentConfig( + task=SimCLRFinetuneTask(), + trainer=cfg.TrainerConfig(), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + +IMAGENET_TRAIN_EXAMPLES = 1281167 +IMAGENET_VAL_EXAMPLES = 50000 +IMAGENET_INPUT_PATH_BASE = 'imagenet-2012-tfrecord' + + +@exp_factory.register_config_factory('simclr_pretraining_imagenet') +def simclr_pretraining_imagenet() -> cfg.ExperimentConfig: + """SimCLR ResNet-50 pretraining config on ImageNet.""" + train_batch_size = 4096 + eval_batch_size = 4096 + steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size + return cfg.ExperimentConfig( + task=SimCLRPretrainTask( + model=SimCLRModel( + mode=simclr_model.PRETRAIN, + backbone_trainable=True, + input_size=[224, 224, 3], + backbone=backbones.Backbone( + type='resnet', resnet=backbones.ResNet(model_id=50)), + projection_head=ProjectionHead( + proj_output_dim=128, num_proj_layers=3, ft_proj_idx=1), + supervised_head=SupervisedHead(num_classes=1001), + norm_activation=common.NormActivation( +
norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=True)), + loss=ContrastiveLoss(), + evaluation=Evaluation(), + train_data=DataConfig( + parser=Parser(mode=simclr_model.PRETRAIN), + decoder=Decoder(decode_label=True), + input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size), + validation_data=DataConfig( + parser=Parser(mode=simclr_model.PRETRAIN), + decoder=Decoder(decode_label=True), + input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'), + is_training=False, + global_batch_size=eval_batch_size), + ), + trainer=cfg.TrainerConfig( + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + train_steps=500 * steps_per_epoch, + validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'lars', + 'lars': { + 'momentum': + 0.9, + 'weight_decay_rate': + 0.000001, + 'exclude_from_weight_decay': [ + 'batch_normalization', 'bias' + ] + } + }, + 'learning_rate': { + 'type': 'cosine', + 'cosine': { + # 0.2 * BatchSize / 256 + 'initial_learning_rate': 0.2 * train_batch_size / 256, + # train_steps - warmup_steps + 'decay_steps': 475 * steps_per_epoch + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + # 5% of total epochs + 'warmup_steps': 25 * steps_per_epoch + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + +@exp_factory.register_config_factory('simclr_finetuning_imagenet') +def simclr_finetuning_imagenet() -> cfg.ExperimentConfig: + """SimCLR ResNet-50 fine-tuning config on ImageNet.""" + train_batch_size = 1024 + eval_batch_size = 1024 + steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size + pretrain_model_base = '' + return cfg.ExperimentConfig( + task=SimCLRFinetuneTask( + model=SimCLRModel( + mode=simclr_model.FINETUNE, + backbone_trainable=True, +
input_size=[224, 224, 3], + backbone=backbones.Backbone( + type='resnet', resnet=backbones.ResNet(model_id=50)), + projection_head=ProjectionHead( + proj_output_dim=128, num_proj_layers=3, ft_proj_idx=1), + supervised_head=SupervisedHead(num_classes=1001, zero_init=True), + norm_activation=common.NormActivation( + norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False)), + loss=ClassificationLosses(), + evaluation=Evaluation(), + train_data=DataConfig( + parser=Parser(mode=simclr_model.FINETUNE), + input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size), + validation_data=DataConfig( + parser=Parser(mode=simclr_model.FINETUNE), + input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'), + is_training=False, + global_batch_size=eval_batch_size), + init_checkpoint=pretrain_model_base, + # all, backbone_projection or backbone + init_checkpoint_modules='backbone_projection'), + trainer=cfg.TrainerConfig( + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + train_steps=60 * steps_per_epoch, + validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'lars', + 'lars': { + 'momentum': + 0.9, + 'weight_decay_rate': + 0.0, + 'exclude_from_weight_decay': [ + 'batch_normalization', 'bias' + ] + } + }, + 'learning_rate': { + 'type': 'cosine', + 'cosine': { + # 0.01 × BatchSize / 512 + 'initial_learning_rate': 0.01 * train_batch_size / 512, + 'decay_steps': 60 * steps_per_epoch + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) diff --git a/official/vision/beta/projects/simclr/configs/simclr_test.py b/official/projects/simclr/configs/simclr_test.py similarity index 86% rename from official/vision/beta/projects/simclr/configs/simclr_test.py rename to 
official/projects/simclr/configs/simclr_test.py index 5a6518018e33a715a9118eef431e40e14f28fe36..af3dfbf5729f1633c681a7c20c556e7d6dc2fb1f 100644 --- a/official/vision/beta/projects/simclr/configs/simclr_test.py +++ b/official/projects/simclr/configs/simclr_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,8 +19,8 @@ import tensorflow as tf from official.core import config_definitions as cfg from official.core import exp_factory -from official.vision.beta.projects.simclr.common import registry_imports # pylint: disable=unused-import -from official.vision.beta.projects.simclr.configs import simclr as exp_cfg +from official.projects.simclr.common import registry_imports # pylint: disable=unused-import +from official.projects.simclr.configs import simclr as exp_cfg class SimCLRConfigTest(tf.test.TestCase, parameterized.TestCase): diff --git a/official/projects/simclr/dataloaders/preprocess_ops.py b/official/projects/simclr/dataloaders/preprocess_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..081621466e81955cec26634b51bb944c05094b7c --- /dev/null +++ b/official/projects/simclr/dataloaders/preprocess_ops.py @@ -0,0 +1,349 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""Preprocessing ops.""" +import functools +import tensorflow as tf + +CROP_PROPORTION = 0.875 # Standard for ImageNet. + + +def random_apply(func, p, x): + """Randomly apply function func to x with probability p.""" + return tf.cond( + tf.less( + tf.random.uniform([], minval=0, maxval=1, dtype=tf.float32), + tf.cast(p, tf.float32)), lambda: func(x), lambda: x) + + +def random_brightness(image, max_delta, impl='simclrv2'): + """Randomly changes brightness: multiplicative (simclrv2) or additive (simclrv1).""" + if impl == 'simclrv2': + factor = tf.random.uniform([], tf.maximum(1.0 - max_delta, 0), + 1.0 + max_delta) + image = image * factor + elif impl == 'simclrv1': + image = tf.image.random_brightness(image, max_delta=max_delta) + else: + raise ValueError('Unknown impl {} for random brightness.'.format(impl)) + return image + + +def to_grayscale(image, keep_channels=True): + image = tf.image.rgb_to_grayscale(image) + if keep_channels: + image = tf.tile(image, [1, 1, 3]) + return image + + +def color_jitter_nonrand(image, + brightness=0, + contrast=0, + saturation=0, + hue=0, + impl='simclrv2'): + """Distorts the color of the image (jittering order is fixed). + + Args: + image: The input image tensor. + brightness: A float, specifying the brightness for color jitter. + contrast: A float, specifying the contrast for color jitter. + saturation: A float, specifying the saturation for color jitter. + hue: A float, specifying the hue for color jitter. + impl: 'simclrv1' or 'simclrv2'. Whether to use simclrv1 or simclrv2's + version of random brightness. + + Returns: + The distorted image tensor.
+ """ + with tf.name_scope('distort_color'): + def apply_transform(i, x, brightness, contrast, saturation, hue): + """Apply the i-th transformation.""" + if brightness != 0 and i == 0: + x = random_brightness(x, max_delta=brightness, impl=impl) + elif contrast != 0 and i == 1: + x = tf.image.random_contrast( + x, lower=1 - contrast, upper=1 + contrast) + elif saturation != 0 and i == 2: + x = tf.image.random_saturation( + x, lower=1 - saturation, upper=1 + saturation) + elif hue != 0 and i == 3: + x = tf.image.random_hue(x, max_delta=hue) + return x + + for i in range(4): + image = apply_transform(i, image, brightness, contrast, saturation, hue) + image = tf.clip_by_value(image, 0., 1.) + return image + + +def color_jitter_rand(image, + brightness=0, + contrast=0, + saturation=0, + hue=0, + impl='simclrv2'): + """Distorts the color of the image (jittering order is random). + + Args: + image: The input image tensor. + brightness: A float, specifying the brightness for color jitter. + contrast: A float, specifying the contrast for color jitter. + saturation: A float, specifying the saturation for color jitter. + hue: A float, specifying the hue for color jitter. + impl: 'simclrv1' or 'simclrv2'. Whether to use simclrv1 or simclrv2's + version of random brightness. + + Returns: + The distorted image tensor.
+ """ + with tf.name_scope('distort_color'): + def apply_transform(i, x): + """Apply the i-th transformation.""" + + def brightness_foo(): + if brightness == 0: + return x + else: + return random_brightness(x, max_delta=brightness, impl=impl) + + def contrast_foo(): + if contrast == 0: + return x + else: + return tf.image.random_contrast(x, lower=1 - contrast, + upper=1 + contrast) + + def saturation_foo(): + if saturation == 0: + return x + else: + return tf.image.random_saturation( + x, lower=1 - saturation, upper=1 + saturation) + + def hue_foo(): + if hue == 0: + return x + else: + return tf.image.random_hue(x, max_delta=hue) + + x = tf.cond(tf.less(i, 2), + lambda: tf.cond(tf.less(i, 1), brightness_foo, contrast_foo), + lambda: tf.cond(tf.less(i, 3), saturation_foo, hue_foo)) + return x + + perm = tf.random.shuffle(tf.range(4)) + for i in range(4): + image = apply_transform(perm[i], image) + image = tf.clip_by_value(image, 0., 1.) + return image + + +def color_jitter(image, strength, random_order=True, impl='simclrv2'): + """Distorts the color of the image. + + Args: + image: The input image tensor. + strength: the floating number for the strength of the color augmentation. + random_order: A bool, specifying whether to randomize the jittering order. + impl: 'simclrv1' or 'simclrv2'. Whether to use simclrv1 or simclrv2's + version of random brightness. + + Returns: + The distorted image tensor. 
+ """ + brightness = 0.8 * strength + contrast = 0.8 * strength + saturation = 0.8 * strength + hue = 0.2 * strength + if random_order: + return color_jitter_rand( + image, brightness, contrast, saturation, hue, impl=impl) + else: + return color_jitter_nonrand( + image, brightness, contrast, saturation, hue, impl=impl) + + +def random_color_jitter(image, + p=1.0, + color_jitter_strength=1.0, + impl='simclrv2'): + """Perform random color jitter.""" + def _transform(image): + color_jitter_t = functools.partial( + color_jitter, strength=color_jitter_strength, impl=impl) + image = random_apply(color_jitter_t, p=0.8, x=image) + return random_apply(to_grayscale, p=0.2, x=image) + + return random_apply(_transform, p=p, x=image) + + +def gaussian_blur(image, kernel_size, sigma, padding='SAME'): + """Blurs the given image with separable convolution. + + + Args: + image: Tensor of shape [height, width, channels] and dtype float to blur. + kernel_size: Integer Tensor for the size of the blur kernel. This should + be an odd number. If it is an even number, the actual kernel size will be + size + 1. + sigma: Sigma value for the Gaussian operator. + padding: Padding to use for the convolution. Typically 'SAME' or 'VALID'. + + Returns: + A Tensor representing the blurred image. + """ + radius = tf.cast(kernel_size / 2, dtype=tf.int32) + kernel_size = radius * 2 + 1 + x = tf.cast(tf.range(-radius, radius + 1), dtype=tf.float32) + blur_filter = tf.exp(-tf.pow(x, 2.0) / + (2.0 * tf.pow(tf.cast(sigma, dtype=tf.float32), 2.0))) + blur_filter /= tf.reduce_sum(blur_filter) + # One vertical and one horizontal filter.
+ blur_v = tf.reshape(blur_filter, [kernel_size, 1, 1, 1]) + blur_h = tf.reshape(blur_filter, [1, kernel_size, 1, 1]) + num_channels = tf.shape(image)[-1] + blur_h = tf.tile(blur_h, [1, 1, num_channels, 1]) + blur_v = tf.tile(blur_v, [1, 1, num_channels, 1]) + expand_batch_dim = image.shape.ndims == 3 + if expand_batch_dim: + # TensorFlow requires batched input to convolutions, which we can fake with + # an extra dimension. + image = tf.expand_dims(image, axis=0) + blurred = tf.nn.depthwise_conv2d( + image, blur_h, strides=[1, 1, 1, 1], padding=padding) + blurred = tf.nn.depthwise_conv2d( + blurred, blur_v, strides=[1, 1, 1, 1], padding=padding) + if expand_batch_dim: + blurred = tf.squeeze(blurred, axis=0) + return blurred + + +def random_blur(image, height, width, p=0.5): + """Randomly blur an image. + + Args: + image: `Tensor` representing an image of arbitrary size. + height: Image height; the blur kernel size is `height // 10`. + width: Image width (unused). + p: Probability of applying this transformation. + + Returns: + A preprocessed image `Tensor`. + """ + del width + + def _transform(image): + sigma = tf.random.uniform([], 0.1, 2.0, dtype=tf.float32) + return gaussian_blur( + image, kernel_size=height // 10, sigma=sigma, padding='SAME') + + return random_apply(_transform, p=p, x=image) + + +def distorted_bounding_box_crop(image, + bbox, + min_object_covered=0.1, + aspect_ratio_range=(0.75, 1.33), + area_range=(0.05, 1.0), + max_attempts=100, + scope=None): + """Generates cropped_image using one of the bboxes randomly distorted. + + See `tf.image.sample_distorted_bounding_box` for more documentation. + + Args: + image: `Tensor` of image data. + bbox: `Tensor` of bounding boxes arranged `[1, num_boxes, coords]` + where each coordinate is [0, 1) and the coordinates are arranged + as `[ymin, xmin, ymax, xmax]`. If num_boxes is 0 then use the whole + image. + min_object_covered: An optional `float`. Defaults to `0.1`.
The cropped + area of the image must contain at least this fraction of any bounding + box supplied. + aspect_ratio_range: An optional list of `float`s. The cropped area of the + image must have an aspect ratio = width / height within this range. + area_range: An optional list of `float`s. The cropped area of the image + must contain a fraction of the supplied image within this range. + max_attempts: An optional `int`. Number of attempts at generating a cropped + region of the image of the specified constraints. After `max_attempts` + failures, return the entire image. + scope: Optional `str` for name scope. + Returns: + The cropped image `Tensor`. + """ + with tf.name_scope(scope or 'distorted_bounding_box_crop'): + shape = tf.shape(image) + sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( + shape, + bounding_boxes=bbox, + min_object_covered=min_object_covered, + aspect_ratio_range=aspect_ratio_range, + area_range=area_range, + max_attempts=max_attempts, + use_image_if_no_bounding_boxes=True) + bbox_begin, bbox_size, _ = sample_distorted_bounding_box + + # Crop the image to the specified bounding box. + offset_y, offset_x, _ = tf.unstack(bbox_begin) + target_height, target_width, _ = tf.unstack(bbox_size) + image = tf.image.crop_to_bounding_box( + image, offset_y, offset_x, target_height, target_width) + + return image + + +def crop_and_resize(image, height, width): + """Make a random crop and resize it to height `height` and width `width`. + + Args: + image: Tensor representing the image. + height: Desired image height. + width: Desired image width. + + Returns: + A `height` x `width` x channels Tensor holding a random crop of `image`. + """ + bbox = tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4]) + aspect_ratio = width / height + image = distorted_bounding_box_crop( + image, + bbox, + min_object_covered=0.1, + aspect_ratio_range=(3. / 4 * aspect_ratio, 4. / 3.
* aspect_ratio), + area_range=(0.08, 1.0), + max_attempts=100, + scope=None) + return tf.image.resize([image], [height, width], + method=tf.image.ResizeMethod.BICUBIC)[0] + + +def random_crop_with_resize(image, height, width, p=1.0): + """Randomly crop and resize an image. + + Args: + image: `Tensor` representing an image of arbitrary size. + height: Height of output image. + width: Width of output image. + p: Probability of applying this transformation. + + Returns: + A preprocessed image `Tensor`. + """ + + def _transform(image): # pylint: disable=missing-docstring + image = crop_and_resize(image, height, width) + return image + + return random_apply(_transform, p=p, x=image) diff --git a/official/vision/beta/projects/simclr/dataloaders/simclr_input.py b/official/projects/simclr/dataloaders/simclr_input.py similarity index 95% rename from official/vision/beta/projects/simclr/dataloaders/simclr_input.py rename to official/projects/simclr/dataloaders/simclr_input.py index 4170b2e681649090810efb049256c2d7c8fe9187..8585f5dada772c5b716a7c940fcaef33c1d10d9b 100644 --- a/official/vision/beta/projects/simclr/dataloaders/simclr_input.py +++ b/official/projects/simclr/dataloaders/simclr_input.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -40,11 +40,11 @@ from typing import List import tensorflow as tf -from official.vision.beta.dataloaders import decoder -from official.vision.beta.dataloaders import parser -from official.vision.beta.ops import preprocess_ops -from official.vision.beta.projects.simclr.dataloaders import preprocess_ops as simclr_preprocess_ops -from official.vision.beta.projects.simclr.modeling import simclr_model +from official.projects.simclr.dataloaders import preprocess_ops as simclr_preprocess_ops +from official.projects.simclr.modeling import simclr_model +from official.vision.dataloaders import decoder +from official.vision.dataloaders import parser +from official.vision.ops import preprocess_ops class Decoder(decoder.Decoder): diff --git a/official/vision/beta/projects/simclr/heads/simclr_head.py b/official/projects/simclr/heads/simclr_head.py similarity index 97% rename from official/vision/beta/projects/simclr/heads/simclr_head.py rename to official/projects/simclr/heads/simclr_head.py index 947fc38e980bc6c089aa786a646241b27be7810b..7dc37f0dbe65790ae04f97227537136e4c017aaf 100644 --- a/official/vision/beta/projects/simclr/heads/simclr_head.py +++ b/official/projects/simclr/heads/simclr_head.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -14,11 +14,11 @@ """SimCLR prediction heads.""" -from typing import Text, Optional +from typing import Optional, Text import tensorflow as tf -from official.vision.beta.projects.simclr.modeling.layers import nn_blocks +from official.projects.simclr.modeling.layers import nn_blocks regularizers = tf.keras.regularizers layers = tf.keras.layers diff --git a/official/vision/beta/projects/simclr/heads/simclr_head_test.py b/official/projects/simclr/heads/simclr_head_test.py similarity index 96% rename from official/vision/beta/projects/simclr/heads/simclr_head_test.py rename to official/projects/simclr/heads/simclr_head_test.py index 1c8f92603ad26f3971933017f5e27f517ce48645..1ff7582c82d5c0ae7d3cb10b6172b0b59c222258 100644 --- a/official/vision/beta/projects/simclr/heads/simclr_head_test.py +++ b/official/projects/simclr/heads/simclr_head_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -17,7 +17,7 @@ from absl.testing import parameterized import numpy as np import tensorflow as tf -from official.vision.beta.projects.simclr.heads import simclr_head +from official.projects.simclr.heads import simclr_head class ProjectionHeadTest(tf.test.TestCase, parameterized.TestCase): diff --git a/official/vision/beta/projects/simclr/losses/contrastive_losses.py b/official/projects/simclr/losses/contrastive_losses.py similarity index 98% rename from official/vision/beta/projects/simclr/losses/contrastive_losses.py rename to official/projects/simclr/losses/contrastive_losses.py index af528265c6e20b9d2a0f519e4f2b92d82a471dc5..f16a7b723f59405ba082febd2e2160738f15b614 100644 --- a/official/vision/beta/projects/simclr/losses/contrastive_losses.py +++ b/official/projects/simclr/losses/contrastive_losses.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/beta/projects/simclr/losses/contrastive_losses_test.py b/official/projects/simclr/losses/contrastive_losses_test.py similarity index 94% rename from official/vision/beta/projects/simclr/losses/contrastive_losses_test.py rename to official/projects/simclr/losses/contrastive_losses_test.py index 815a3d01ef8c1906c50d47ba249408af4e029d28..364936ed3caeeb3766ecb1575bbef28b6e5042f5 100644 --- a/official/vision/beta/projects/simclr/losses/contrastive_losses_test.py +++ b/official/projects/simclr/losses/contrastive_losses_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -17,7 +17,7 @@ from absl.testing import parameterized import numpy as np import tensorflow as tf -from official.vision.beta.projects.simclr.losses import contrastive_losses +from official.projects.simclr.losses import contrastive_losses class ContrastiveLossesTest(tf.test.TestCase, parameterized.TestCase): diff --git a/official/projects/simclr/modeling/layers/nn_blocks.py b/official/projects/simclr/modeling/layers/nn_blocks.py new file mode 100644 index 0000000000000000000000000000000000000000..013a7be5201e4a0b15f0c595e22f471ca43098cb --- /dev/null +++ b/official/projects/simclr/modeling/layers/nn_blocks.py @@ -0,0 +1,133 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +
+"""Contains common building blocks for simclr neural networks.""" +from typing import Text, Optional + +import tensorflow as tf + +from official.modeling import tf_utils + +regularizers = tf.keras.regularizers + + +class DenseBN(tf.keras.layers.Layer): + """Modified Dense layer to help build simclr system. + + The layer is a standard combination of Dense, BatchNorm and Activation.
+ """ + + def __init__( + self, + output_dim: int, + use_bias: bool = True, + use_normalization: bool = False, + use_sync_bn: bool = False, + norm_momentum: float = 0.99, + norm_epsilon: float = 0.001, + activation: Optional[Text] = 'relu', + kernel_initializer: Text = 'VarianceScaling', + kernel_regularizer: Optional[regularizers.Regularizer] = None, + bias_regularizer: Optional[regularizers.Regularizer] = None, + name='linear_layer', + **kwargs): + """Customized Dense layer. + + Args: + output_dim: `int` size of output dimension. + use_bias: if True, use bias in the dense layer. + use_normalization: if True, use batch normalization. + use_sync_bn: if True, use synchronized batch normalization. + norm_momentum: `float` normalization momentum for the moving average. + norm_epsilon: `float` small float added to variance to avoid dividing by + zero. + activation: `str` name of the activation function. + kernel_initializer: kernel initializer for the dense layer. + kernel_regularizer: tf.keras.regularizers.Regularizer object for the dense + layer kernel. Defaults to None. + bias_regularizer: tf.keras.regularizers.Regularizer object for the dense + layer bias. Defaults to None. + name: `str`, name of the layer. + **kwargs: keyword arguments to be passed to the base Layer. + """ + # Note: use_bias is ignored for the dense layer when + # use_normalization=True. However, it is still used for batch norm.
+ super(DenseBN, self).__init__(**kwargs) + self._output_dim = output_dim + self._use_bias = use_bias + self._use_normalization = use_normalization + self._use_sync_bn = use_sync_bn + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._activation = activation + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + self._name = name + + if use_sync_bn: + self._norm = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._norm = tf.keras.layers.BatchNormalization + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + if activation: + self._activation_fn = tf_utils.get_activation(activation) + else: + self._activation_fn = None + + def get_config(self): + config = { + 'output_dim': self._output_dim, + 'use_bias': self._use_bias, + 'activation': self._activation, + 'use_sync_bn': self._use_sync_bn, + 'use_normalization': self._use_normalization, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + } + base_config = super(DenseBN, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def build(self, input_shape): + self._dense0 = tf.keras.layers.Dense( + self._output_dim, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + use_bias=self._use_bias and not self._use_normalization) + + if self._use_normalization: + self._norm0 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + center=self._use_bias, + scale=True) + + super(DenseBN, self).build(input_shape) + + def call(self, inputs, training=None): + assert inputs.shape.ndims == 2, inputs.shape + x = 
self._dense0(inputs) + if self._use_normalization: + x = self._norm0(x) + if self._activation: + x = self._activation_fn(x) + return x diff --git a/official/projects/simclr/modeling/layers/nn_blocks_test.py b/official/projects/simclr/modeling/layers/nn_blocks_test.py new file mode 100644 index 0000000000000000000000000000000000000000..f8d830dfbf91b58c6fe4be410ab76d346f1f20bb --- /dev/null +++ b/official/projects/simclr/modeling/layers/nn_blocks_test.py @@ -0,0 +1,58 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +from absl.testing import parameterized + +import tensorflow as tf + +from official.projects.simclr.modeling.layers import nn_blocks + + +class DenseBNTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.parameters( + (64, True, True), + (64, True, False), + (64, False, True), + ) + def test_pass_through(self, output_dim, use_bias, use_normalization): + test_layer = nn_blocks.DenseBN( + output_dim=output_dim, + use_bias=use_bias, + use_normalization=use_normalization + ) + + x = tf.keras.Input(shape=(64,)) + out_x = test_layer(x) + + self.assertAllEqual(out_x.shape.as_list(), [None, output_dim]) + + # kernel of the dense layer + train_var_len = 1 + if use_normalization: + if use_bias: + # batch norm introduces two trainable variables (gamma and beta) + train_var_len += 2 + else: + # center is False when use_bias is False, so only gamma is trainable + train_var_len += 1 + else: + if use_bias: + # bias of the dense layer + train_var_len += 1 + self.assertLen(test_layer.trainable_variables, train_var_len) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/beta/projects/simclr/modeling/multitask_model.py b/official/projects/simclr/modeling/multitask_model.py similarity index 91% rename from official/vision/beta/projects/simclr/modeling/multitask_model.py rename to official/projects/simclr/modeling/multitask_model.py index a971e85c89c952cef20c778138ee443fe9443cf3..0c814dba7382a645eb96c71ad8ea751cf0e7a5cb 100644 --- a/official/vision/beta/projects/simclr/modeling/multitask_model.py +++ b/official/projects/simclr/modeling/multitask_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
@@ -14,15 +14,15 @@ """Multi-task image multi-taskSimCLR model definition.""" from typing import Dict, Text -from absl import logging +from absl import logging import tensorflow as tf from official.modeling.multitask import base_model -from official.vision.beta.modeling import backbones -from official.vision.beta.projects.simclr.configs import multitask_config as simclr_multitask_config -from official.vision.beta.projects.simclr.heads import simclr_head -from official.vision.beta.projects.simclr.modeling import simclr_model +from official.projects.simclr.configs import multitask_config as simclr_multitask_config +from official.projects.simclr.heads import simclr_head +from official.projects.simclr.modeling import simclr_model +from official.vision.modeling import backbones PROJECTION_OUTPUT_KEY = 'projection_outputs' SUPERVISED_OUTPUT_KEY = 'supervised_outputs' @@ -110,8 +110,9 @@ class SimCLRMTModel(base_model.MultiTaskBaseModel): pretrained_items = dict( backbone=self._backbone, projection_head=self._projection_head) else: - assert ("Only 'backbone_projection' or 'backbone' can be used to " - 'initialize the model.') + raise ValueError( + "Only 'backbone_projection' or 'backbone' can be used to " + 'initialize the model.') ckpt = tf.train.Checkpoint(**pretrained_items) status = ckpt.read(ckpt_dir_or_file) diff --git a/official/vision/beta/projects/simclr/modeling/multitask_model_test.py b/official/projects/simclr/modeling/multitask_model_test.py similarity index 83% rename from official/vision/beta/projects/simclr/modeling/multitask_model_test.py rename to official/projects/simclr/modeling/multitask_model_test.py index 0190145a8974a6ebf8ea834317bf087f7d8676ce..365ae5a3517bc7bc1daef553109cfccbccf6e2c4 100644 --- a/official/vision/beta/projects/simclr/modeling/multitask_model_test.py +++ b/official/projects/simclr/modeling/multitask_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,9 +18,9 @@ import os.path import tensorflow as tf -from official.vision.beta.projects.simclr.configs import multitask_config -from official.vision.beta.projects.simclr.modeling import multitask_model -from official.vision.beta.projects.simclr.modeling import simclr_model +from official.projects.simclr.configs import multitask_config +from official.projects.simclr.modeling import multitask_model +from official.projects.simclr.modeling import simclr_model class MultitaskModelTest(tf.test.TestCase): diff --git a/official/vision/beta/projects/simclr/modeling/simclr_model.py b/official/projects/simclr/modeling/simclr_model.py similarity index 98% rename from official/vision/beta/projects/simclr/modeling/simclr_model.py rename to official/projects/simclr/modeling/simclr_model.py index 25db8a9f33c2611a4e3aa29da23d209256951d6a..da8a6e3572cfd3ef9e4c9bfb687e4b9151e7058a 100644 --- a/official/vision/beta/projects/simclr/modeling/simclr_model.py +++ b/official/projects/simclr/modeling/simclr_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/vision/beta/projects/simclr/modeling/simclr_model_test.py b/official/projects/simclr/modeling/simclr_model_test.py similarity index 89% rename from official/vision/beta/projects/simclr/modeling/simclr_model_test.py rename to official/projects/simclr/modeling/simclr_model_test.py index ee8724ebadec7a44f7acc67bcf5bd8d3734c1da8..42f104be3adef12bdd469d8e078928988b7509c2 100644 --- a/official/vision/beta/projects/simclr/modeling/simclr_model_test.py +++ b/official/projects/simclr/modeling/simclr_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -14,13 +14,12 @@ """Test for SimCLR model.""" from absl.testing import parameterized - import numpy as np import tensorflow as tf -from official.vision.beta.modeling import backbones -from official.vision.beta.projects.simclr.heads import simclr_head -from official.vision.beta.projects.simclr.modeling import simclr_model +from official.projects.simclr.heads import simclr_head +from official.projects.simclr.modeling import simclr_model +from official.vision.modeling import backbones class SimCLRModelTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/beta/projects/simclr/multitask_train.py b/official/projects/simclr/multitask_train.py similarity index 89% rename from official/vision/beta/projects/simclr/multitask_train.py rename to official/projects/simclr/multitask_train.py index 77fb621a87910aca736cab4eba80688f06a278bd..fec106dcac6ce4a8779a8744d24515cec3b36e95 100644 --- a/official/vision/beta/projects/simclr/multitask_train.py +++ b/official/projects/simclr/multitask_train.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -25,9 +25,9 @@ from official.modeling.multitask import multitask from official.modeling.multitask import train_lib # pylint: disable=unused-import -from official.vision.beta.projects.simclr.common import registry_imports -from official.vision.beta.projects.simclr.configs import multitask_config -from official.vision.beta.projects.simclr.modeling import multitask_model +from official.projects.simclr.common import registry_imports +from official.projects.simclr.configs import multitask_config +from official.projects.simclr.modeling import multitask_model # pylint: enable=unused-import FLAGS = flags.FLAGS diff --git a/official/projects/simclr/tasks/simclr.py b/official/projects/simclr/tasks/simclr.py new file mode 100644 index 0000000000000000000000000000000000000000..cf52fa1fbe04c25f19c9a39e3fa8de44d0887736 --- /dev/null +++ b/official/projects/simclr/tasks/simclr.py @@ -0,0 +1,635 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Image SimCLR task definition. 
+ +SimCLR training has two different modes: +- pretrain +- fine-tuning + +The two modes differ in the following components of the task definition: +- training data format +- training loss +- projection_head and/or supervised_head +""" +from typing import Dict, Optional + +from absl import logging +import tensorflow as tf + +from official.core import base_task +from official.core import config_definitions +from official.core import input_reader +from official.core import task_factory +from official.modeling import optimization +from official.modeling import performance +from official.modeling import tf_utils +from official.projects.simclr.configs import simclr as exp_cfg +from official.projects.simclr.dataloaders import simclr_input +from official.projects.simclr.heads import simclr_head +from official.projects.simclr.losses import contrastive_losses +from official.projects.simclr.modeling import simclr_model +from official.vision.modeling import backbones + +OptimizationConfig = optimization.OptimizationConfig +RuntimeConfig = config_definitions.RuntimeConfig + + +@task_factory.register_task_cls(exp_cfg.SimCLRPretrainTask) +class SimCLRPretrainTask(base_task.Task): + """A task for SimCLR image pretraining.""" + + def create_optimizer(self, + optimizer_config: OptimizationConfig, + runtime_config: Optional[RuntimeConfig] = None): + """Creates a TF optimizer from configurations. + + Args: + optimizer_config: the parameters of the Optimization settings. + runtime_config: the parameters of the runtime. + + Returns: + A tf.optimizers.Optimizer object. + """ + if (optimizer_config.optimizer.type == 'lars' and + self.task_config.loss.l2_weight_decay > 0.0): + raise ValueError('The l2_weight_decay cannot be used together with lars ' + 'optimizer.
Please set it to 0.') + + opt_factory = optimization.OptimizerFactory(optimizer_config) + optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) + # Configuring optimizer when loss_scale is set in runtime config. This helps + # avoid overflow/underflow for float16 computations. + if runtime_config and runtime_config.loss_scale: + optimizer = performance.configure_optimizer( + optimizer, + use_float16=runtime_config.mixed_precision_dtype == 'float16', + loss_scale=runtime_config.loss_scale) + + return optimizer + + def build_model(self): + model_config = self.task_config.model + input_specs = tf.keras.layers.InputSpec(shape=[None] + + model_config.input_size) + + l2_weight_decay = self.task_config.loss.l2_weight_decay + # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. + # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) + # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) + l2_regularizer = ( + tf.keras.regularizers.l2(l2_weight_decay / + 2.0) if l2_weight_decay else None) + + # Build backbone + backbone = backbones.factory.build_backbone( + input_specs=input_specs, + backbone_config=model_config.backbone, + norm_activation_config=model_config.norm_activation, + l2_regularizer=l2_regularizer) + + # Build projection head + norm_activation_config = model_config.norm_activation + projection_head_config = model_config.projection_head + projection_head = simclr_head.ProjectionHead( + proj_output_dim=projection_head_config.proj_output_dim, + num_proj_layers=projection_head_config.num_proj_layers, + ft_proj_idx=projection_head_config.ft_proj_idx, + kernel_regularizer=l2_regularizer, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon) + + # Build supervised head + supervised_head_config = model_config.supervised_head + if supervised_head_config: + if supervised_head_config.zero_init: +
s_kernel_initializer = 'zeros' + else: + s_kernel_initializer = 'random_uniform' + supervised_head = simclr_head.ClassificationHead( + num_classes=supervised_head_config.num_classes, + kernel_initializer=s_kernel_initializer, + kernel_regularizer=l2_regularizer) + else: + supervised_head = None + + model = simclr_model.SimCLRModel( + input_specs=input_specs, + backbone=backbone, + projection_head=projection_head, + supervised_head=supervised_head, + mode=model_config.mode, + backbone_trainable=model_config.backbone_trainable) + + logging.info(model.get_config()) + + return model + + def initialize(self, model: tf.keras.Model): + """Loading pretrained checkpoint.""" + if not self.task_config.init_checkpoint: + return + + ckpt_dir_or_file = self.task_config.init_checkpoint + if tf.io.gfile.isdir(ckpt_dir_or_file): + ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) + + # Restoring checkpoint. + if self.task_config.init_checkpoint_modules == 'all': + ckpt = tf.train.Checkpoint(**model.checkpoint_items) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + elif self.task_config.init_checkpoint_modules == 'backbone': + ckpt = tf.train.Checkpoint(backbone=model.backbone) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + else: + raise ValueError( + "Only 'all' or 'backbone' can be used to initialize the model.") + + logging.info('Finished loading pretrained checkpoint from %s', + ckpt_dir_or_file) + + def build_inputs(self, params, input_context=None): + input_size = self.task_config.model.input_size + + if params.tfds_name: + decoder = simclr_input.TFDSDecoder(params.decoder.decode_label) + else: + decoder = simclr_input.Decoder(params.decoder.decode_label) + + parser = simclr_input.Parser( + output_size=input_size[:2], + aug_rand_crop=params.parser.aug_rand_crop, + aug_rand_hflip=params.parser.aug_rand_hflip, + 
aug_color_distort=params.parser.aug_color_distort, + aug_color_jitter_strength=params.parser.aug_color_jitter_strength, + aug_color_jitter_impl=params.parser.aug_color_jitter_impl, + aug_rand_blur=params.parser.aug_rand_blur, + parse_label=params.parser.parse_label, + test_crop=params.parser.test_crop, + mode=params.parser.mode, + dtype=params.dtype) + + reader = input_reader.InputReader( + params, + dataset_fn=tf.data.TFRecordDataset, + decoder_fn=decoder.decode, + parser_fn=parser.parse_fn(params.is_training)) + + dataset = reader.read(input_context=input_context) + + return dataset + + def build_losses(self, + labels, + model_outputs, + aux_losses=None) -> Dict[str, tf.Tensor]: + # Compute the contrastive loss + con_losses_obj = contrastive_losses.ContrastiveLoss( + projection_norm=self.task_config.loss.projection_norm, + temperature=self.task_config.loss.temperature) + # The projection outputs from the model have shape (2 * bsz, project_dim), + # with the two augmented views stacked along the batch axis. + projection_outputs = model_outputs[simclr_model.PROJECTION_OUTPUT_KEY] + projection1, projection2 = tf.split(projection_outputs, 2, 0) + contrast_loss, (contrast_logits, contrast_labels) = con_losses_obj( + projection1=projection1, projection2=projection2) + + contrast_accuracy = tf.equal( + tf.argmax(contrast_labels, axis=1), tf.argmax(contrast_logits, axis=1)) + contrast_accuracy = tf.reduce_mean(tf.cast(contrast_accuracy, tf.float32)) + + contrast_prob = tf.nn.softmax(contrast_logits) + contrast_entropy = -tf.reduce_mean( + tf.reduce_sum(contrast_prob * tf.math.log(contrast_prob + 1e-8), -1)) + + model_loss = contrast_loss + + losses = { + 'contrast_loss': contrast_loss, + 'contrast_accuracy': contrast_accuracy, + 'contrast_entropy': contrast_entropy + } + + if self.task_config.model.supervised_head is not None: + outputs = model_outputs[simclr_model.SUPERVISED_OUTPUT_KEY] + labels = tf.concat([labels, labels], 0) + + if self.task_config.evaluation.one_hot: + sup_loss =
tf.keras.losses.CategoricalCrossentropy( + from_logits=True, reduction=tf.keras.losses.Reduction.NONE)(labels, + outputs) + else: + sup_loss = tf.keras.losses.SparseCategoricalCrossentropy( + from_logits=True, reduction=tf.keras.losses.Reduction.NONE)(labels, + outputs) + sup_loss = tf.reduce_mean(sup_loss) + + label_acc = tf.equal( + tf.argmax(labels, axis=1), tf.argmax(outputs, axis=1)) + label_acc = tf.reduce_mean(tf.cast(label_acc, tf.float32)) + + model_loss = contrast_loss + sup_loss + + losses.update({ + 'accuracy': label_acc, + 'supervised_loss': sup_loss, + }) + + total_loss = model_loss + if aux_losses: + reg_loss = tf.reduce_sum(aux_losses) + total_loss = model_loss + reg_loss + + losses['total_loss'] = total_loss + + return losses + + def build_metrics(self, training=True): + + if training: + metrics = [] + metric_names = [ + 'total_loss', 'contrast_loss', 'contrast_accuracy', 'contrast_entropy' + ] + if self.task_config.model.supervised_head: + metric_names.extend(['supervised_loss', 'accuracy']) + for name in metric_names: + metrics.append(tf.keras.metrics.Mean(name, dtype=tf.float32)) + else: + k = self.task_config.evaluation.top_k + if self.task_config.evaluation.one_hot: + metrics = [ + tf.keras.metrics.CategoricalAccuracy(name='accuracy'), + tf.keras.metrics.TopKCategoricalAccuracy( + k=k, name='top_{}_accuracy'.format(k)) + ] + else: + metrics = [ + tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), + tf.keras.metrics.SparseTopKCategoricalAccuracy( + k=k, name='top_{}_accuracy'.format(k)) + ] + return metrics + + def train_step(self, inputs, model, optimizer, metrics=None): + features, labels = inputs + + # To do a sanity check that we absolutely use no labels when pretraining, we + # can set the labels here to zero. 
+ if self.task_config.train_data.input_set_label_to_zero: + labels *= 0 + + if (self.task_config.model.supervised_head is not None and + self.task_config.evaluation.one_hot): + num_classes = self.task_config.model.supervised_head.num_classes + labels = tf.one_hot(labels, num_classes) + + num_replicas = tf.distribute.get_strategy().num_replicas_in_sync + with tf.GradientTape() as tape: + outputs = model(features, training=True) + # Casting the outputs to float32 is necessary when mixed_precision is + # mixed_float16 or mixed_bfloat16 to ensure the outputs are float32. + outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) + + # Computes per-replica loss. + losses = self.build_losses( + model_outputs=outputs, labels=labels, aux_losses=model.losses) + + scaled_loss = losses['total_loss'] / num_replicas + # For mixed_precision policy, when LossScaleOptimizer is used, loss is + # scaled for numerical stability. + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + scaled_loss = optimizer.get_scaled_loss(scaled_loss) + + tvars = model.trainable_variables + logging.info('Trainable variables:') + for var in tvars: + logging.info(var.name) + grads = tape.gradient(scaled_loss, tvars) + # Scales back gradient when LossScaleOptimizer is used.
+ if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + grads = optimizer.get_unscaled_gradients(grads) + optimizer.apply_gradients(list(zip(grads, tvars))) + + logs = {self.loss: losses['total_loss']} + + for m in metrics: + m.update_state(losses[m.name]) + logs.update({m.name: m.result()}) + + return logs + + def validation_step(self, inputs, model, metrics=None): + if self.task_config.model.supervised_head is None: + raise ValueError( + 'Evaluation during pretraining requires a supervised head.') + + features, labels = inputs + if self.task_config.evaluation.one_hot: + num_classes = self.task_config.model.supervised_head.num_classes + labels = tf.one_hot(labels, num_classes) + + outputs = model( + features, training=False)[simclr_model.SUPERVISED_OUTPUT_KEY] + outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) + + logs = {self.loss: 0} + + if metrics: + self.process_metrics(metrics, labels, outputs) + logs.update({m.name: m.result() for m in metrics}) + elif model.compiled_metrics: + self.process_compiled_metrics(model.compiled_metrics, labels, outputs) + logs.update({m.name: m.result() for m in model.metrics}) + + return logs + + +@task_factory.register_task_cls(exp_cfg.SimCLRFinetuneTask) +class SimCLRFinetuneTask(base_task.Task): + """A task for image classification.""" + + def create_optimizer(self, + optimizer_config: OptimizationConfig, + runtime_config: Optional[RuntimeConfig] = None): + """Creates a TF optimizer from configurations. + + Args: + optimizer_config: the parameters of the Optimization settings. + runtime_config: the parameters of the runtime. + + Returns: + A tf.optimizers.Optimizer object. + """ + if (optimizer_config.optimizer.type == 'lars' and + self.task_config.loss.l2_weight_decay > 0.0): + raise ValueError('The l2_weight_decay cannot be used together with lars ' + 'optimizer.
Please set it to 0.') + + opt_factory = optimization.OptimizerFactory(optimizer_config) + optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) + # Configuring optimizer when loss_scale is set in runtime config. This helps + # avoid overflow/underflow for float16 computations. + if runtime_config and runtime_config.loss_scale: + optimizer = performance.configure_optimizer( + optimizer, + use_float16=runtime_config.mixed_precision_dtype == 'float16', + loss_scale=runtime_config.loss_scale) + + return optimizer + + def build_model(self): + model_config = self.task_config.model + input_specs = tf.keras.layers.InputSpec(shape=[None] + + model_config.input_size) + + l2_weight_decay = self.task_config.loss.l2_weight_decay + # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. + # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) + # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) + l2_regularizer = ( + tf.keras.regularizers.l2(l2_weight_decay / + 2.0) if l2_weight_decay else None) + + backbone = backbones.factory.build_backbone( + input_specs=input_specs, + backbone_config=model_config.backbone, + norm_activation_config=model_config.norm_activation, + l2_regularizer=l2_regularizer) + + norm_activation_config = model_config.norm_activation + projection_head_config = model_config.projection_head + projection_head = simclr_head.ProjectionHead( + proj_output_dim=projection_head_config.proj_output_dim, + num_proj_layers=projection_head_config.num_proj_layers, + ft_proj_idx=projection_head_config.ft_proj_idx, + kernel_regularizer=l2_regularizer, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon) + + supervised_head_config = model_config.supervised_head + if supervised_head_config.zero_init: + s_kernel_initializer = 'zeros' + else: + s_kernel_initializer = 'random_uniform' + supervised_head =
simclr_head.ClassificationHead( + num_classes=supervised_head_config.num_classes, + kernel_initializer=s_kernel_initializer, + kernel_regularizer=l2_regularizer) + + model = simclr_model.SimCLRModel( + input_specs=input_specs, + backbone=backbone, + projection_head=projection_head, + supervised_head=supervised_head, + mode=model_config.mode, + backbone_trainable=model_config.backbone_trainable) + + logging.info(model.get_config()) + + return model + + def initialize(self, model: tf.keras.Model): + """Loading pretrained checkpoint.""" + if not self.task_config.init_checkpoint: + return + + ckpt_dir_or_file = self.task_config.init_checkpoint + if tf.io.gfile.isdir(ckpt_dir_or_file): + ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) + + # Restoring checkpoint. + if self.task_config.init_checkpoint_modules == 'all': + ckpt = tf.train.Checkpoint(**model.checkpoint_items) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + elif self.task_config.init_checkpoint_modules == 'backbone_projection': + ckpt = tf.train.Checkpoint( + backbone=model.backbone, projection_head=model.projection_head) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + elif self.task_config.init_checkpoint_modules == 'backbone': + ckpt = tf.train.Checkpoint(backbone=model.backbone) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + else: + raise ValueError( + "Only 'all', 'backbone_projection' or 'backbone' can be used to " + 'initialize the model.') + + # If the checkpoint is from pretraining, reset backbone_trainable to the + # fine-tuning setting. + model.backbone_trainable = self.task_config.model.backbone_trainable + logging.info('Finished loading pretrained checkpoint from %s', + ckpt_dir_or_file) + + def build_inputs(self, params, input_context=None): + input_size = self.task_config.model.input_size + + if params.tfds_name: + decoder =
simclr_input.TFDSDecoder(params.decoder.decode_label) + else: + decoder = simclr_input.Decoder(params.decoder.decode_label) + parser = simclr_input.Parser( + output_size=input_size[:2], + parse_label=params.parser.parse_label, + test_crop=params.parser.test_crop, + mode=params.parser.mode, + dtype=params.dtype) + + reader = input_reader.InputReader( + params, + dataset_fn=tf.data.TFRecordDataset, + decoder_fn=decoder.decode, + parser_fn=parser.parse_fn(params.is_training)) + + dataset = reader.read(input_context=input_context) + + return dataset + + def build_losses(self, labels, model_outputs, aux_losses=None): + """Categorical or sparse categorical cross-entropy loss. + + Args: + labels: Ground-truth labels, one-hot encoded when `loss.one_hot` is set. + model_outputs: Output logits of the classifier. + aux_losses: Auxiliary loss tensors, i.e. `losses` in keras.Model. + + Returns: + The total loss tensor. + """ + losses_config = self.task_config.loss + if losses_config.one_hot: + total_loss = tf.keras.losses.categorical_crossentropy( + labels, + model_outputs, + from_logits=True, + label_smoothing=losses_config.label_smoothing) + else: + total_loss = tf.keras.losses.sparse_categorical_crossentropy( + labels, model_outputs, from_logits=True) + + total_loss = tf_utils.safe_mean(total_loss) + if aux_losses: + total_loss += tf.add_n(aux_losses) + + return total_loss + + def build_metrics(self, training=True): + """Gets streaming metrics for training/validation.""" + k = self.task_config.evaluation.top_k + if self.task_config.evaluation.one_hot: + metrics = [ + tf.keras.metrics.CategoricalAccuracy(name='accuracy'), + tf.keras.metrics.TopKCategoricalAccuracy( + k=k, name='top_{}_accuracy'.format(k)) + ] + else: + metrics = [ + tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), + tf.keras.metrics.SparseTopKCategoricalAccuracy( + k=k, name='top_{}_accuracy'.format(k)) + ] + return metrics + + def train_step(self, inputs, model, optimizer, metrics=None): + """Does forward and backward.
+ + Args: + inputs: a tuple of (features, labels) tensors. + model: the model, forward pass definition. + optimizer: the optimizer for this training step. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs. + """ + features, labels = inputs + if self.task_config.loss.one_hot: + num_classes = self.task_config.model.supervised_head.num_classes + labels = tf.one_hot(labels, num_classes) + + num_replicas = tf.distribute.get_strategy().num_replicas_in_sync + with tf.GradientTape() as tape: + outputs = model( + features, training=True)[simclr_model.SUPERVISED_OUTPUT_KEY] + # Casting the outputs to float32 is necessary when the mixed precision + # policy is mixed_float16 or mixed_bfloat16, so the loss uses float32. + outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) + + # Computes per-replica loss. + loss = self.build_losses( + model_outputs=outputs, labels=labels, aux_losses=model.losses) + # Scales the loss because the default gradient all-reduce sums gradients + # across replicas inside the optimizer. + scaled_loss = loss / num_replicas + + # For mixed_precision policy, when LossScaleOptimizer is used, loss is + # scaled for numerical stability. + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + scaled_loss = optimizer.get_scaled_loss(scaled_loss) + + tvars = model.trainable_variables + logging.info('Trainable variables:') + for var in tvars: + logging.info(var.name) + grads = tape.gradient(scaled_loss, tvars) + # Unscales the gradients before apply_gradients when LossScaleOptimizer + # is used.
+ if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + grads = optimizer.get_unscaled_gradients(grads) + optimizer.apply_gradients(list(zip(grads, tvars))) + + logs = {self.loss: loss} + if metrics: + self.process_metrics(metrics, labels, outputs) + logs.update({m.name: m.result() for m in metrics}) + elif model.compiled_metrics: + self.process_compiled_metrics(model.compiled_metrics, labels, outputs) + logs.update({m.name: m.result() for m in model.metrics}) + return logs + + def validation_step(self, inputs, model, metrics=None): + """Validation step. + + Args: + inputs: a tuple of (features, labels) tensors. + model: the keras.Model. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs. + """ + features, labels = inputs + if self.task_config.loss.one_hot: + num_classes = self.task_config.model.supervised_head.num_classes + labels = tf.one_hot(labels, num_classes) + + outputs = self.inference_step(features, + model)[simclr_model.SUPERVISED_OUTPUT_KEY] + outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) + loss = self.build_losses( + model_outputs=outputs, labels=labels, aux_losses=model.losses) + + logs = {self.loss: loss} + if metrics: + self.process_metrics(metrics, labels, outputs) + logs.update({m.name: m.result() for m in metrics}) + elif model.compiled_metrics: + self.process_compiled_metrics(model.compiled_metrics, labels, outputs) + logs.update({m.name: m.result() for m in model.metrics}) + return logs diff --git a/official/projects/simclr/train.py b/official/projects/simclr/train.py new file mode 100644 index 0000000000000000000000000000000000000000..3114af5c3d2b4be93927a6ab7a70a70df8b3e49d --- /dev/null +++ b/official/projects/simclr/train.py @@ -0,0 +1,66 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""TensorFlow Model Garden Vision SimCLR trainer.""" +from absl import app +from absl import flags +import gin + +from official.common import distribute_utils +from official.common import flags as tfm_flags +from official.core import task_factory +from official.core import train_lib +from official.core import train_utils +from official.modeling import performance +from official.projects.simclr.common import registry_imports # pylint: disable=unused-import + +FLAGS = flags.FLAGS + + +def main(_): + gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) + print(FLAGS.experiment) + params = train_utils.parse_configuration(FLAGS) + + model_dir = FLAGS.model_dir + if 'train' in FLAGS.mode: + # Pure eval modes do not output yaml files. Otherwise continuous eval job + # may race against the train job for writing the same file. + train_utils.serialize_config(params, model_dir) + + # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16' + # can have significant impact on model speeds by utilizing float16 in case of + # GPUs, and bfloat16 in the case of TPUs. 
loss_scale takes effect only when + # dtype is float16 + if params.runtime.mixed_precision_dtype: + performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) + distribution_strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=params.runtime.distribution_strategy, + all_reduce_alg=params.runtime.all_reduce_alg, + num_gpus=params.runtime.num_gpus, + tpu_address=params.runtime.tpu) + with distribution_strategy.scope(): + task = task_factory.get_task(params.task, logging_dir=model_dir) + + train_lib.run_experiment( + distribution_strategy=distribution_strategy, + task=task, + mode=FLAGS.mode, + params=params, + model_dir=model_dir) + + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(main) diff --git a/official/nlp/projects/teams/README.md b/official/projects/teams/README.md similarity index 100% rename from official/nlp/projects/teams/README.md rename to official/projects/teams/README.md diff --git a/official/projects/teams/__init__.py b/official/projects/teams/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/teams/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ diff --git a/official/nlp/projects/teams/experiments/base/glue_mnli.yaml b/official/projects/teams/experiments/base/glue_mnli.yaml similarity index 100% rename from official/nlp/projects/teams/experiments/base/glue_mnli.yaml rename to official/projects/teams/experiments/base/glue_mnli.yaml diff --git a/official/nlp/projects/teams/experiments/base/squad_v1.yaml b/official/projects/teams/experiments/base/squad_v1.yaml similarity index 100% rename from official/nlp/projects/teams/experiments/base/squad_v1.yaml rename to official/projects/teams/experiments/base/squad_v1.yaml diff --git a/official/nlp/projects/teams/experiments/base/squad_v2.yaml b/official/projects/teams/experiments/base/squad_v2.yaml similarity index 100% rename from official/nlp/projects/teams/experiments/base/squad_v2.yaml rename to official/projects/teams/experiments/base/squad_v2.yaml diff --git a/official/nlp/projects/teams/experiments/base/wiki_books_pretrain.yaml b/official/projects/teams/experiments/base/wiki_books_pretrain.yaml similarity index 100% rename from official/nlp/projects/teams/experiments/base/wiki_books_pretrain.yaml rename to official/projects/teams/experiments/base/wiki_books_pretrain.yaml diff --git a/official/nlp/projects/teams/experiments/small/glue_mnli.yaml b/official/projects/teams/experiments/small/glue_mnli.yaml similarity index 100% rename from official/nlp/projects/teams/experiments/small/glue_mnli.yaml rename to official/projects/teams/experiments/small/glue_mnli.yaml diff --git a/official/nlp/projects/teams/experiments/small/squad_v1.yaml b/official/projects/teams/experiments/small/squad_v1.yaml similarity index 100% rename from official/nlp/projects/teams/experiments/small/squad_v1.yaml rename to official/projects/teams/experiments/small/squad_v1.yaml diff --git a/official/nlp/projects/teams/experiments/small/squad_v2.yaml b/official/projects/teams/experiments/small/squad_v2.yaml similarity index 100% rename from 
official/nlp/projects/teams/experiments/small/squad_v2.yaml rename to official/projects/teams/experiments/small/squad_v2.yaml diff --git a/official/nlp/projects/teams/experiments/small/wiki_books_pretrain.yaml b/official/projects/teams/experiments/small/wiki_books_pretrain.yaml similarity index 100% rename from official/nlp/projects/teams/experiments/small/wiki_books_pretrain.yaml rename to official/projects/teams/experiments/small/wiki_books_pretrain.yaml diff --git a/official/nlp/projects/teams/experiments/teams_en_uncased_base.yaml b/official/projects/teams/experiments/teams_en_uncased_base.yaml similarity index 100% rename from official/nlp/projects/teams/experiments/teams_en_uncased_base.yaml rename to official/projects/teams/experiments/teams_en_uncased_base.yaml diff --git a/official/nlp/projects/teams/experiments/teams_en_uncased_small.yaml b/official/projects/teams/experiments/teams_en_uncased_small.yaml similarity index 100% rename from official/nlp/projects/teams/experiments/teams_en_uncased_small.yaml rename to official/projects/teams/experiments/teams_en_uncased_small.yaml diff --git a/official/nlp/projects/teams/teams.py b/official/projects/teams/teams.py similarity index 98% rename from official/nlp/projects/teams/teams.py rename to official/projects/teams/teams.py index e5aed0a7a5235f98ff3a65c01309d1df92681aca..d2833cfe5def06bf7e654889e3c0b1c515fb3418 100644 --- a/official/nlp/projects/teams/teams.py +++ b/official/projects/teams/teams.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/nlp/projects/teams/teams_experiments.py b/official/projects/teams/teams_experiments.py similarity index 96% rename from official/nlp/projects/teams/teams_experiments.py rename to official/projects/teams/teams_experiments.py index 3c9df9b4d13af417e5d953c8a5bc1fe261f99015..030e1393918786edcf83d346ab227c7bd2c186cd 100644 --- a/official/nlp/projects/teams/teams_experiments.py +++ b/official/projects/teams/teams_experiments.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 # pylint: disable=g-doc-return-or-yield,line-too-long """TEAMS experiments.""" import dataclasses @@ -24,10 +23,10 @@ from official.nlp.configs import encoders from official.nlp.data import pretrain_dataloader from official.nlp.data import question_answering_dataloader from official.nlp.data import sentence_prediction_dataloader -from official.nlp.projects.teams import teams -from official.nlp.projects.teams import teams_task from official.nlp.tasks import question_answering from official.nlp.tasks import sentence_prediction +from official.projects.teams import teams +from official.projects.teams import teams_task AdamWeightDecay = optimization.AdamWeightDecayConfig PolynomialLr = optimization.PolynomialLrConfig diff --git a/official/nlp/projects/teams/teams_pretrainer.py b/official/projects/teams/teams_pretrainer.py similarity index 98% rename from official/nlp/projects/teams/teams_pretrainer.py rename to official/projects/teams/teams_pretrainer.py index 727c0184f01067524291b3e760a6b5d9b4f9f5b5..ea8121f9256ff1d0626815eabebe8c33455a7bd2 100644 --- a/official/nlp/projects/teams/teams_pretrainer.py +++ 
b/official/projects/teams/teams_pretrainer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -58,15 +58,16 @@ class ReplacedTokenDetectionHead(tf.keras.layers.Layer): intermediate_activation=self.activation, dropout_rate=self.hidden_cfg['dropout_rate'], attention_dropout_rate=self.hidden_cfg['attention_dropout_rate'], - kernel_initializer=self.initializer, + kernel_initializer=tf_utils.clone_initializer(self.initializer), name='transformer/layer_%d_rtd' % i)) self.dense = tf.keras.layers.Dense( self.hidden_size, activation=self.activation, - kernel_initializer=self.initializer, + kernel_initializer=tf_utils.clone_initializer(self.initializer), name='transform/rtd_dense') self.rtd_head = tf.keras.layers.Dense( - units=1, kernel_initializer=self.initializer, + units=1, + kernel_initializer=tf_utils.clone_initializer(self.initializer), name='transform/rtd_head') if output not in ('predictions', 'logits'): diff --git a/official/nlp/projects/teams/teams_pretrainer_test.py b/official/projects/teams/teams_pretrainer_test.py similarity index 98% rename from official/nlp/projects/teams/teams_pretrainer_test.py rename to official/projects/teams/teams_pretrainer_test.py index 643038509f16ff8a11d0d99e718992fbb26a66c2..9a1fc2029d8bd8a9b7263705a2f4d52376361376 100644 --- a/official/nlp/projects/teams/teams_pretrainer_test.py +++ b/official/projects/teams/teams_pretrainer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -20,7 +20,7 @@ from tensorflow.python.keras import keras_parameterized # pylint: disable=g-dir from official.modeling import activations from official.nlp.modeling.networks import encoder_scaffold from official.nlp.modeling.networks import packed_sequence_embedding -from official.nlp.projects.teams import teams_pretrainer +from official.projects.teams import teams_pretrainer # This decorator runs the test in V1, V2-Eager, and V2-Functional mode. It diff --git a/official/nlp/projects/teams/teams_task.py b/official/projects/teams/teams_task.py similarity index 98% rename from official/nlp/projects/teams/teams_task.py rename to official/projects/teams/teams_task.py index c14ba0a09f4d8fbfba9dfd1448c494dd32b1d3ff..c8da8c8274395f0d76db1dd2e21e5dd02020a187 100644 --- a/official/nlp/projects/teams/teams_task.py +++ b/official/projects/teams/teams_task.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -23,8 +23,8 @@ from official.core import task_factory from official.modeling import tf_utils from official.nlp.data import pretrain_dataloader from official.nlp.modeling import layers -from official.nlp.projects.teams import teams -from official.nlp.projects.teams import teams_pretrainer +from official.projects.teams import teams +from official.projects.teams import teams_pretrainer @dataclasses.dataclass diff --git a/official/nlp/projects/teams/teams_task_test.py b/official/projects/teams/teams_task_test.py similarity index 92% rename from official/nlp/projects/teams/teams_task_test.py rename to official/projects/teams/teams_task_test.py index 329fd3fe7cd50976d545c13443607af88edf73c3..df3c93a0f92b8b79653d21231aac1d50cc07efd3 100644 --- a/official/nlp/projects/teams/teams_task_test.py +++ b/official/projects/teams/teams_task_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,8 +19,8 @@ import tensorflow as tf from official.nlp.configs import encoders from official.nlp.data import pretrain_dataloader -from official.nlp.projects.teams import teams -from official.nlp.projects.teams import teams_task +from official.projects.teams import teams +from official.projects.teams import teams_task class TeamsPretrainTaskTest(tf.test.TestCase, parameterized.TestCase): diff --git a/official/projects/teams/train.py b/official/projects/teams/train.py new file mode 100644 index 0000000000000000000000000000000000000000..b13afe537e514197edcafbc6cdb8e3aae96f7c26 --- /dev/null +++ b/official/projects/teams/train.py @@ -0,0 +1,28 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""TensorFlow Model Garden Teams training driver that registers Teams configs.""" + +# pylint: disable=unused-import +from absl import app + +from official.common import flags as tfm_flags +from official.nlp import tasks +from official.nlp import train +from official.projects.teams import teams_experiments +from official.projects.teams import teams_task + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(train.main) diff --git a/official/projects/text_classification_example/classification_data_loader.py b/official/projects/text_classification_example/classification_data_loader.py index fea67e026a5d702d853628b2fd233eb54c43ff20..fa142bc03b595a7e24603208fbc0742fbe830cdb 100644 --- a/official/projects/text_classification_example/classification_data_loader.py +++ b/official/projects/text_classification_example/classification_data_loader.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
diff --git a/official/projects/text_classification_example/classification_example.py b/official/projects/text_classification_example/classification_example.py index da0eccb750c43c6ab012617e8071a5d9e01fde98..b8600a4e043f95b4c8b6bc8ad58750d7de88a5ec 100644 --- a/official/projects/text_classification_example/classification_example.py +++ b/official/projects/text_classification_example/classification_example.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/text_classification_example/classification_example_test.py b/official/projects/text_classification_example/classification_example_test.py index d26ece724459fc1475ada0998b682e9dda13d71c..4de434f531df9da31f5f2d5586ebf498ae1ec680 100644 --- a/official/projects/text_classification_example/classification_example_test.py +++ b/official/projects/text_classification_example/classification_example_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/text_classification_example/train.py b/official/projects/text_classification_example/train.py index bfa28b5c6252775b7e1b04c52140be79b886f49a..c2e8e16558cdd0484e29796957584e225f4ba4ee 100644 --- a/official/projects/text_classification_example/train.py +++ b/official/projects/text_classification_example/train.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/projects/tn_bert/README.md b/official/projects/tn_bert/README.md similarity index 100% rename from official/nlp/projects/tn_bert/README.md rename to official/projects/tn_bert/README.md diff --git a/official/projects/token_dropping/README.md b/official/projects/token_dropping/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4e8eb007ee1cf9d0e787720e2ffa9c4a72b5e0f1 --- /dev/null +++ b/official/projects/token_dropping/README.md @@ -0,0 +1,104 @@ +# Token Dropping for Efficient BERT Pretraining + +This is the official implementation of the token dropping method +[Pang et al. Token Dropping for Efficient BERT Pretraining. ACL 2022](#reference). + +Token dropping aims to accelerate the pretraining of transformer +models such as BERT without degrading their performance on downstream tasks. In +particular, we drop unimportant tokens starting from an intermediate layer in +the model, to make the model focus on important tokens more efficiently with its +limited computational resources. The dropped tokens are later picked up by the +last layer of the model, so that the model still produces full-length sequences. +We leverage the already built-in masked language modeling (MLM) loss and its +dynamics to identify unimportant tokens with practically no computational +overhead. In our experiments, this simple approach reduces the pretraining cost +of BERT by 25% while achieving slightly better overall fine-tuning performance +on standard downstream tasks. + +A BERT model pretrained using this token dropping method is no different from +a BERT model pretrained in the conventional way: a BERT checkpoint pretrained +with token dropping can be viewed and used as a normal BERT checkpoint, for +example for fine-tuning. Thus, this README file only illustrates how to run +token dropping for pretraining.
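The selection idea above can be illustrated with a small, framework-free sketch. It assumes, based on the `token_loss_init_value`, `token_loss_beta`, `token_keep_k`, `token_allow_list` and `token_deny_list` arguments of `TokenDropBertEncoder` (defaults mirrored from `bert_en_uncased_base_token_drop.yaml`), that each vocabulary id tracks an exponential moving average of its MLM loss and that the positions whose tokens have the highest average loss are kept. The helpers `update_token_losses` and `select_kept_positions` are hypothetical; this is a simplified illustration, not the code in `encoder.py` or `masked_lm.py`.

```python
# Hypothetical, simplified sketch of loss-based token selection for token
# dropping. The function names are illustrative and do not exist in the
# Model Garden; the real logic lives in encoder.py and masked_lm.py.
from typing import Dict, List, Sequence


def update_token_losses(avg_loss: Dict[int, float],
                        token_ids: Sequence[int],
                        mlm_losses: Sequence[float],
                        beta: float = 0.995,
                        init_value: float = 10.0) -> None:
  """Updates an exponential moving average of the MLM loss per vocab id."""
  for tok, loss in zip(token_ids, mlm_losses):
    prev = avg_loss.get(tok, init_value)  # Unseen ids start at init_value.
    avg_loss[tok] = beta * prev + (1.0 - beta) * loss


def select_kept_positions(token_ids: Sequence[int],
                          avg_loss: Dict[int, float],
                          keep_k: int,
                          allow_list: Sequence[int] = (100, 101, 102, 103),
                          deny_list: Sequence[int] = (0,),
                          init_value: float = 10.0) -> List[int]:
  """Returns the sorted positions of at most keep_k tokens to keep."""

  def importance(tok: int) -> float:
    if tok in allow_list:  # [UNK], [CLS], [SEP], [MASK]: always ranked first.
      return float('inf')
    return avg_loss.get(tok, init_value)  # Higher average loss = important.

  # Deny-listed ids such as [PAD] are never candidates for keeping.
  candidates = [i for i, tok in enumerate(token_ids) if tok not in deny_list]
  # sorted() is stable, so ties keep their original left-to-right order.
  ranked = sorted(candidates, key=lambda i: importance(token_ids[i]),
                  reverse=True)
  return sorted(ranked[:keep_k])


# Usage: beta=0.5 (instead of 0.995) only to make the averages move visibly.
avg = {}
update_token_losses(avg, [2023, 3899], [0.5, 8.0], beta=0.5)
ids = [101, 2023, 3899, 7592, 102, 0, 0]
print(select_kept_positions(ids, avg, keep_k=4))  # [0, 2, 3, 4]
```

In this sketch, ids that have never been masked and predicted fall back to `token_loss_init_value`, so rare tokens rank as important until their loss is observed, while the low-loss token at position 1 is the one dropped.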
+ +### Requirements + +The starter code requires TensorFlow. If you haven't installed it yet, follow +the instructions on [tensorflow.org][1]. +This code has been tested with TensorFlow 2.5.0. Going forward, +we will continue to target the latest released version of TensorFlow. + +Please verify that you have Python 3.6+ and TensorFlow 2.5.0 or higher +installed by running the following commands: + +```sh +python --version +python -c 'import tensorflow as tf; print(tf.__version__)' +``` + +Refer to the [instructions here][2] +for using the model in this repo. Make sure to add the models folder to your +Python path. + +[1]: https://www.tensorflow.org/install/ +[2]: +https://github.com/tensorflow/models/tree/master/official#running-the-models + +Then, you need to generate pretraining data. See +[these instructions]( +https://github.com/tensorflow/models/blob/27fb855b027ead16d2616dcb59c67409a2176b7f/official/legacy/bert/README.md#pre-training) +on how to do that. + +## Train using the config file + +After you have generated your pretraining data, run the following command to +start pretraining: + +```bash +PARAMS="task.train_data.input_path=/path/to/train/data" +PARAMS="${PARAMS},task.validation_data.input_path=/path/to/validation/data" +PARAMS="${PARAMS},runtime.distribution_strategy=tpu" + +python3 train.py \ + --experiment=token_drop_bert/pretraining \ + --config_file=wiki_books_pretrain_sequence_pack.yaml \ + --config_file=bert_en_uncased_base_token_drop.yaml \ + --params_override=${PARAMS} \ + --tpu=local \ + --model_dir=/folder/to/hold/logs/and/models/ \ + --mode=train_and_eval +``` + +## Implementation + +We implement the encoder and layers using `tf.keras` APIs in the NLP +modeling library: + + * [masked_lm.py](https://github.com/tensorflow/models/blob/master/official/projects/token_dropping/masked_lm.py) + contains the BERT pretraining task.
+ + * [experiment_configs.py](https://github.com/tensorflow/models/blob/master/official/projects/token_dropping/experiment_configs.py) + registers the token dropping experiment. + + * [encoder.py](https://github.com/tensorflow/models/blob/master/official/projects/token_dropping/encoder.py) + contains the BERT encoder that supports token dropping. + + * [encoder_config.py](https://github.com/tensorflow/models/blob/master/official/projects/token_dropping/encoder_config.py) + contains the config and method for instantiating the token dropping BERT + encoder. + + * [train.py](https://github.com/tensorflow/models/blob/master/official/projects/token_dropping/train.py) + is the program entry point. + +## Reference + +Please cite our paper: + +``` +@article{hou2022token, + title={Token Dropping for Efficient BERT Pretraining}, + author={Pang, Richard Yuanzhe and Hou, Le and Zhou, Tianyi and Wu, Yuexin and Song, Xinying and Song, Xiaodan and Zhou, Denny}, + journal={arXiv preprint arXiv:2203.13240}, + year={2022} +} +``` diff --git a/official/projects/token_dropping/bert_en_uncased_base_token_drop.yaml b/official/projects/token_dropping/bert_en_uncased_base_token_drop.yaml new file mode 100644 index 0000000000000000000000000000000000000000..718aacb22cec58d7bd74b79a4cd3b8ead4db66be --- /dev/null +++ b/official/projects/token_dropping/bert_en_uncased_base_token_drop.yaml @@ -0,0 +1,26 @@ +task: + model: + encoder: + type: any + any: + token_allow_list: !!python/tuple + - 100 # [UNK] + - 101 # [CLS] + - 102 # [SEP] + - 103 # [MASK] + token_deny_list: !!python/tuple + - 0 # [PAD] + attention_dropout_rate: 0.1 + dropout_rate: 0.1 + hidden_activation: gelu + hidden_size: 768 + initializer_range: 0.02 + intermediate_size: 3072 + max_position_embeddings: 512 + num_attention_heads: 12 + num_layers: 12 + type_vocab_size: 2 + vocab_size: 30522 + token_loss_init_value: 10.0 + token_loss_beta: 0.995 + token_keep_k: 256 diff --git a/official/projects/token_dropping/encoder.py
b/official/projects/token_dropping/encoder.py new file mode 100644 index 0000000000000000000000000000000000000000..c83d21f5e0538e19c89258898607e4b24fdf0b30 --- /dev/null +++ b/official/projects/token_dropping/encoder.py @@ -0,0 +1,400 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Transformer-based BERT encoder network.""" +# pylint: disable=g-classes-have-attributes + +from typing import Any, Callable, Optional, Union, Tuple +from absl import logging +import tensorflow as tf + +from official.modeling import tf_utils +from official.nlp.modeling import layers + + +_Initializer = Union[str, tf.keras.initializers.Initializer] +_Activation = Union[str, Callable[..., Any]] + +_approx_gelu = lambda x: tf.keras.activations.gelu(x, approximate=True) + + +class TokenDropBertEncoder(tf.keras.layers.Layer): + """Bi-directional Transformer-based encoder network with token dropping. + + During pretraining, we drop unimportant tokens starting from an intermediate + layer in the model, to make the model focus on important tokens more + efficiently with its limited computational resources. The dropped tokens are + later picked up by the last layer of the model, so that the model still + produces full-length sequences. This approach reduces the pretraining cost of + BERT by 25% while achieving better overall fine-tuning performance on standard + downstream tasks. 
+ + Args: + vocab_size: The size of the token vocabulary. + hidden_size: The size of the transformer hidden layers. + num_layers: The number of transformer layers. + num_attention_heads: The number of attention heads for each transformer. The + hidden size must be divisible by the number of attention heads. + max_sequence_length: The maximum sequence length that this encoder can + consume. If None, max_sequence_length uses the value from sequence length. + This determines the variable shape for positional embeddings. + type_vocab_size: The number of types that the 'type_ids' input can take. + inner_dim: The output dimension of the first Dense layer in a two-layer + feedforward network for each transformer. + inner_activation: The activation for the first Dense layer in a two-layer + feedforward network for each transformer. + output_dropout: Dropout probability for the post-attention and output + dropout. + attention_dropout: The dropout rate to use for the attention layers within + the transformer layers. + token_loss_init_value: The default loss value of a token that has never been + masked and predicted. + token_loss_beta: The running-average factor used to compute the average + loss value of a token. + token_keep_k: The number of tokens to keep in the intermediate layers. The + rest are dropped in those layers. + token_allow_list: The list of token-ids that should never be dropped. In the + BERT English vocab, token-ids 1 to 998 are reserved for special tokens such + as [CLS] and [SEP]. By default, token_allow_list contains [UNK], [CLS], + [SEP] and [MASK] (token-ids 100 to 103). + token_deny_list: The list of token-ids that should always be dropped. In the + BERT English vocab, token-id=0 means [PAD]. By default, token_deny_list + contains only [PAD]. + initializer: The initializer to use for all weights in this encoder. + output_range: The sequence output range, [0, output_range), by slicing the + target sequence of the last transformer layer.
`None` means the entire + target sequence will attend to the source sequence, which yields the full + output. + embedding_width: The width of the word embeddings. If the embedding width is + not equal to hidden size, embedding parameters will be factorized into two + matrices in the shape of ['vocab_size', 'embedding_width'] and + ['embedding_width', 'hidden_size'] ('embedding_width' is usually much + smaller than 'hidden_size'). + embedding_layer: An optional Layer instance which will be called to generate + embeddings for the input word IDs. + norm_first: Whether to normalize inputs to attention and intermediate dense + layers. If set False, output of attention and intermediate dense layers is + normalized. + with_dense_inputs: Whether to accept dense embeddings as the input. + """ + + def __init__( + self, + vocab_size: int, + hidden_size: int = 768, + num_layers: int = 12, + num_attention_heads: int = 12, + max_sequence_length: int = 512, + type_vocab_size: int = 16, + inner_dim: int = 3072, + inner_activation: _Activation = _approx_gelu, + output_dropout: float = 0.1, + attention_dropout: float = 0.1, + token_loss_init_value: float = 10.0, + token_loss_beta: float = 0.995, + token_keep_k: int = 256, + token_allow_list: Tuple[int, ...] = (100, 101, 102, 103), + token_deny_list: Tuple[int, ...] = (0,), + initializer: _Initializer = tf.keras.initializers.TruncatedNormal( + stddev=0.02), + output_range: Optional[int] = None, + embedding_width: Optional[int] = None, + embedding_layer: Optional[tf.keras.layers.Layer] = None, + norm_first: bool = False, + with_dense_inputs: bool = False, + **kwargs): + # Pops kwargs that are used in V1 implementation. 
+ if 'dict_outputs' in kwargs: + kwargs.pop('dict_outputs') + if 'return_all_encoder_outputs' in kwargs: + kwargs.pop('return_all_encoder_outputs') + if 'intermediate_size' in kwargs: + inner_dim = kwargs.pop('intermediate_size') + if 'activation' in kwargs: + inner_activation = kwargs.pop('activation') + if 'dropout_rate' in kwargs: + output_dropout = kwargs.pop('dropout_rate') + if 'attention_dropout_rate' in kwargs: + attention_dropout = kwargs.pop('attention_dropout_rate') + super().__init__(**kwargs) + + if output_range is not None: + logging.warning('`output_range` is available as an argument for `call()`. ' + 'Passing `output_range` to __init__ is deprecated.') + + activation = tf.keras.activations.get(inner_activation) + initializer = tf.keras.initializers.get(initializer) + + if embedding_width is None: + embedding_width = hidden_size + + if embedding_layer is None: + self._embedding_layer = layers.OnDeviceEmbedding( + vocab_size=vocab_size, + embedding_width=embedding_width, + initializer=tf_utils.clone_initializer(initializer), + name='word_embeddings') + else: + self._embedding_layer = embedding_layer + + self._position_embedding_layer = layers.PositionEmbedding( + initializer=tf_utils.clone_initializer(initializer), + max_length=max_sequence_length, + name='position_embedding') + + self._type_embedding_layer = layers.OnDeviceEmbedding( + vocab_size=type_vocab_size, + embedding_width=embedding_width, + initializer=tf_utils.clone_initializer(initializer), + use_one_hot=True, + name='type_embeddings') + + self._embedding_norm_layer = tf.keras.layers.LayerNormalization( + name='embeddings/layer_norm', axis=-1, epsilon=1e-12, dtype=tf.float32) + + self._embedding_dropout = tf.keras.layers.Dropout( + rate=output_dropout, name='embedding_dropout') + + # We project the 'embedding' output to 'hidden_size' if it is not already + # 'hidden_size'.
+ self._embedding_projection = None + if embedding_width != hidden_size: + self._embedding_projection = tf.keras.layers.EinsumDense( + '...x,xy->...y', + output_shape=hidden_size, + bias_axes='y', + kernel_initializer=tf_utils.clone_initializer(initializer), + name='embedding_projection') + + # The first 999 token-ids are special tokens such as [PAD], [CLS], [SEP]. + # We want to always drop [PAD], and never drop [CLS] or [SEP]. + init_importance = tf.constant(token_loss_init_value, shape=(vocab_size,)) + if token_allow_list: + init_importance = tf.tensor_scatter_nd_update( + tensor=init_importance, + indices=[[x] for x in token_allow_list], + updates=[1.0e4 for x in token_allow_list]) + if token_deny_list: + init_importance = tf.tensor_scatter_nd_update( + tensor=init_importance, + indices=[[x] for x in token_deny_list], + updates=[-1.0e4 for x in token_deny_list]) + self._token_importance_embed = layers.TokenImportanceWithMovingAvg( + vocab_size=vocab_size, + init_importance=init_importance, + moving_average_beta=token_loss_beta) + + self._token_separator = layers.SelectTopK(top_k=token_keep_k) + self._transformer_layers = [] + self._num_layers = num_layers + self._attention_mask_layer = layers.SelfAttentionMask( + name='self_attention_mask') + for i in range(num_layers): + layer = layers.TransformerEncoderBlock( + num_attention_heads=num_attention_heads, + inner_dim=inner_dim, + inner_activation=inner_activation, + output_dropout=output_dropout, + attention_dropout=attention_dropout, + norm_first=norm_first, + kernel_initializer=tf_utils.clone_initializer(initializer), + name='transformer/layer_%d' % i) + self._transformer_layers.append(layer) + + self._pooler_layer = tf.keras.layers.Dense( + units=hidden_size, + activation='tanh', + kernel_initializer=tf_utils.clone_initializer(initializer), + name='pooler_transform') + + self._config = { + 'vocab_size': vocab_size, + 'hidden_size': hidden_size, + 'num_layers': num_layers, + 'num_attention_heads':
num_attention_heads, + 'max_sequence_length': max_sequence_length, + 'type_vocab_size': type_vocab_size, + 'inner_dim': inner_dim, + 'inner_activation': tf.keras.activations.serialize(activation), + 'output_dropout': output_dropout, + 'attention_dropout': attention_dropout, + 'token_loss_init_value': token_loss_init_value, + 'token_loss_beta': token_loss_beta, + 'token_keep_k': token_keep_k, + 'token_allow_list': token_allow_list, + 'token_deny_list': token_deny_list, + 'initializer': tf.keras.initializers.serialize(initializer), + 'output_range': output_range, + 'embedding_width': embedding_width, + 'embedding_layer': embedding_layer, + 'norm_first': norm_first, + 'with_dense_inputs': with_dense_inputs, + } + if with_dense_inputs: + self.inputs = dict( + input_word_ids=tf.keras.Input(shape=(None,), dtype=tf.int32), + input_mask=tf.keras.Input(shape=(None,), dtype=tf.int32), + input_type_ids=tf.keras.Input(shape=(None,), dtype=tf.int32), + dense_inputs=tf.keras.Input( + shape=(None, embedding_width), dtype=tf.float32), + dense_mask=tf.keras.Input(shape=(None,), dtype=tf.int32), + dense_type_ids=tf.keras.Input(shape=(None,), dtype=tf.int32), + ) + else: + self.inputs = dict( + input_word_ids=tf.keras.Input(shape=(None,), dtype=tf.int32), + input_mask=tf.keras.Input(shape=(None,), dtype=tf.int32), + input_type_ids=tf.keras.Input(shape=(None,), dtype=tf.int32)) + + def call(self, inputs, output_range: Optional[tf.Tensor] = None): + if isinstance(inputs, dict): + word_ids = inputs.get('input_word_ids') + mask = inputs.get('input_mask') + type_ids = inputs.get('input_type_ids') + + dense_inputs = inputs.get('dense_inputs', None) + dense_mask = inputs.get('dense_mask', None) + dense_type_ids = inputs.get('dense_type_ids', None) + else: + raise ValueError('Unexpected inputs type to %s.' % self.__class__) + + word_embeddings = self._embedding_layer(word_ids) + + if dense_inputs is not None: + # Concat the dense embeddings at sequence end. 
+ word_embeddings = tf.concat([word_embeddings, dense_inputs], axis=1) + type_ids = tf.concat([type_ids, dense_type_ids], axis=1) + mask = tf.concat([mask, dense_mask], axis=1) + + # Absolute position embeddings. + position_embeddings = self._position_embedding_layer(word_embeddings) + type_embeddings = self._type_embedding_layer(type_ids) + + embeddings = word_embeddings + position_embeddings + type_embeddings + embeddings = self._embedding_norm_layer(embeddings) + embeddings = self._embedding_dropout(embeddings) + + if self._embedding_projection is not None: + embeddings = self._embedding_projection(embeddings) + + attention_mask = self._attention_mask_layer(embeddings, mask) + + encoder_outputs = [] + x = embeddings + + # Get token routing. + token_importance = self._token_importance_embed(word_ids) + selected, not_selected = self._token_separator(token_importance) + + # For a 12-layer BERT: + # 1. All tokens first go through 5 transformer layers, then + # 2. Only important tokens go through 1 transformer layer with cross + # attention to unimportant tokens, then + # 3. Only important tokens go through 5 transformer layers without cross + # attention. + # 4. Finally, all tokens go through the last layer. + + # Step 1. + for i, layer in enumerate(self._transformer_layers[:self._num_layers // 2 - + 1]): + x = layer([x, attention_mask], + output_range=output_range if i == self._num_layers - + 1 else None) + encoder_outputs.append(x) + + # Step 2. + # First, separate important and non-important tokens.
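The four-step schedule in the comments follows directly from the slice bounds used in `call()`: `[:num_layers // 2 - 1]`, `[num_layers // 2 - 1]`, `[num_layers // 2:-1]`, and `[-1]`. The sketch below is a hypothetical helper (`token_drop_schedule` is not part of the library) that reproduces those slices in plain Python, so you can check which layer indices see all tokens and which see only the kept tokens for a given depth. It assumes `num_layers >= 2`.

```python
from typing import Dict, List


def token_drop_schedule(num_layers: int) -> Dict[str, List[int]]:
  """Hypothetical helper: layer indices used by each token-dropping step.

  Mirrors the slicing in TokenDropBertEncoder.call(); assumes num_layers >= 2.
  """
  layers = list(range(num_layers))
  half = num_layers // 2
  return {
      # Step 1: all tokens.
      "all_tokens": layers[:half - 1],
      # Step 2: kept tokens, attending to all tokens.
      "cross_attention": [layers[half - 1]],
      # Step 3: kept tokens only.
      "kept_tokens_only": layers[half:-1],
      # Step 4: all tokens again, so the output is full length.
      "final_all_tokens": [layers[-1]],
  }


schedule = token_drop_schedule(12)
# {'all_tokens': [0, 1, 2, 3, 4], 'cross_attention': [5],
#  'kept_tokens_only': [6, 7, 8, 9, 10], 'final_all_tokens': [11]}
print(schedule)
```

For the 3-layer networks used in `encoder_test.py`, step 1 is empty and each remaining step uses exactly one layer.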
+ x_selected = tf.gather(x, selected, batch_dims=1, axis=1) + mask_selected = tf.gather(mask, selected, batch_dims=1, axis=1) + attention_mask_token_drop = self._attention_mask_layer( + x_selected, mask_selected) + + x_not_selected = tf.gather(x, not_selected, batch_dims=1, axis=1) + mask_not_selected = tf.gather(mask, not_selected, batch_dims=1, axis=1) + attention_mask_token_pass = self._attention_mask_layer( + x_selected, tf.concat([mask_selected, mask_not_selected], axis=1)) + x_all = tf.concat([x_selected, x_not_selected], axis=1) + + # Then, call transformer layer with cross attention. + x_selected = self._transformer_layers[self._num_layers // 2 - 1]( + [x_selected, x_all, attention_mask_token_pass], + output_range=output_range if self._num_layers // 2 - + 1 == self._num_layers - 1 else None) + encoder_outputs.append(x_selected) + + # Step 3. + for i, layer in enumerate(self._transformer_layers[self._num_layers // + 2:-1]): + x_selected = layer([x_selected, attention_mask_token_drop], + output_range=output_range if i == self._num_layers - 1 + else None) + encoder_outputs.append(x_selected) + + # Step 4. + # First, merge important and non-important tokens. + x_not_selected = tf.cast(x_not_selected, dtype=x_selected.dtype) + x = tf.concat([x_selected, x_not_selected], axis=1) + indices = tf.concat([selected, not_selected], axis=1) + reverse_indices = tf.argsort(indices) + x = tf.gather(x, reverse_indices, batch_dims=1, axis=1) + + # Then, call transformer layer with all tokens. 
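Splitting and re-merging tokens works because the argsort of the concatenated `[selected, not_selected]` indices is the inverse permutation, so gathering with it puts every token back in its original position. The NumPy sketch below illustrates this under stated assumptions: `np.take_along_axis` stands in for `tf.gather(..., batch_dims=1, axis=1)`, a plain descending argsort stands in for `layers.SelectTopK` (whose exact behavior is not shown in this diff), and `split_and_merge` is a hypothetical name.

```python
import numpy as np


def split_and_merge(x: np.ndarray, importance: np.ndarray, keep_k: int):
  """Hypothetical sketch: split tokens by importance, then restore order.

  x: [batch, seq_len, hidden] token representations.
  importance: [batch, seq_len] per-token scores.
  """
  # Stand-in for SelectTopK: the keep_k highest-scoring positions, then the rest.
  order = np.argsort(-importance, axis=1, kind="stable")
  selected, not_selected = order[:, :keep_k], order[:, keep_k:]

  # Equivalent of tf.gather(x, idx, batch_dims=1, axis=1).
  x_selected = np.take_along_axis(x, selected[:, :, None], axis=1)
  x_not_selected = np.take_along_axis(x, not_selected[:, :, None], axis=1)

  # ... the intermediate transformer layers would update x_selected here ...

  # Merge: concatenate, then undo the permutation via argsort of the indices.
  merged = np.concatenate([x_selected, x_not_selected], axis=1)
  indices = np.concatenate([selected, not_selected], axis=1)
  reverse_indices = np.argsort(indices, axis=1)
  restored = np.take_along_axis(merged, reverse_indices[:, :, None], axis=1)
  return restored, selected


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 6, 4))
importance = rng.normal(size=(2, 6))
restored, selected = split_and_merge(x, importance, keep_k=3)
print(np.array_equal(restored, x))  # True
```

Because only a permutation is applied, the merged tensor matches the input exactly when no layer modifies the kept tokens; in the encoder, the kept tokens carry their updated representations back to their original positions before the last layer.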
+ x = self._transformer_layers[-1]([x, attention_mask], + output_range=output_range) + encoder_outputs.append(x) + + last_encoder_output = encoder_outputs[-1] + first_token_tensor = last_encoder_output[:, 0, :] + pooled_output = self._pooler_layer(first_token_tensor) + + return dict( + sequence_output=encoder_outputs[-1], + pooled_output=pooled_output, + encoder_outputs=encoder_outputs) + + def record_mlm_loss(self, mlm_ids: tf.Tensor, mlm_losses: tf.Tensor): + self._token_importance_embed.update_token_importance( + token_ids=mlm_ids, importance=mlm_losses) + + def get_embedding_table(self): + return self._embedding_layer.embeddings + + def get_embedding_layer(self): + return self._embedding_layer + + def get_config(self): + return dict(self._config) + + @property + def transformer_layers(self): + """List of Transformer layers in the encoder.""" + return self._transformer_layers + + @property + def pooler_layer(self): + """The pooler dense layer after the transformer layers.""" + return self._pooler_layer + + @classmethod + def from_config(cls, config, custom_objects=None): + if 'embedding_layer' in config and config['embedding_layer'] is not None: + warn_string = ( + 'You are reloading a model that was saved with a ' + 'potentially-shared embedding layer object. If you continue to ' + 'train this model, the embedding layer will no longer be shared. ' + 'To work around this, load the model outside of the Keras API.') + print('WARNING: ' + warn_string) + logging.warning(warn_string) + + return cls(**config) diff --git a/official/projects/token_dropping/encoder_config.py b/official/projects/token_dropping/encoder_config.py new file mode 100644 index 0000000000000000000000000000000000000000..b7809d46f817c03f48d94cc7357a7df12784deac --- /dev/null +++ b/official/projects/token_dropping/encoder_config.py @@ -0,0 +1,67 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Token dropping encoder configuration and instantiation.""" +import dataclasses +from typing import Tuple +import tensorflow as tf + +from official.modeling import tf_utils +from official.modeling.hyperparams import base_config +from official.nlp.configs import encoders +from official.projects.token_dropping import encoder + + +@dataclasses.dataclass +class TokenDropBertEncoderConfig(encoders.BertEncoderConfig): + token_loss_init_value: float = 10.0 + token_loss_beta: float = 0.995 + token_keep_k: int = 256 + token_allow_list: Tuple[int, ...] = (100, 101, 102, 103) + token_deny_list: Tuple[int, ...] = (0,) + + +@base_config.bind(TokenDropBertEncoderConfig) +def get_encoder(encoder_cfg: TokenDropBertEncoderConfig): + """Instantiates 'TokenDropBertEncoder'. + + Args: + encoder_cfg: A 'TokenDropBertEncoderConfig'. + + Returns: + An 'encoder.TokenDropBertEncoder' object.
+ """ + return encoder.TokenDropBertEncoder( + vocab_size=encoder_cfg.vocab_size, + hidden_size=encoder_cfg.hidden_size, + num_layers=encoder_cfg.num_layers, + num_attention_heads=encoder_cfg.num_attention_heads, + intermediate_size=encoder_cfg.intermediate_size, + activation=tf_utils.get_activation(encoder_cfg.hidden_activation), + dropout_rate=encoder_cfg.dropout_rate, + attention_dropout_rate=encoder_cfg.attention_dropout_rate, + max_sequence_length=encoder_cfg.max_position_embeddings, + type_vocab_size=encoder_cfg.type_vocab_size, + initializer=tf.keras.initializers.TruncatedNormal( + stddev=encoder_cfg.initializer_range), + output_range=encoder_cfg.output_range, + embedding_width=encoder_cfg.embedding_size, + return_all_encoder_outputs=encoder_cfg.return_all_encoder_outputs, + dict_outputs=True, + norm_first=encoder_cfg.norm_first, + token_loss_init_value=encoder_cfg.token_loss_init_value, + token_loss_beta=encoder_cfg.token_loss_beta, + token_keep_k=encoder_cfg.token_keep_k, + token_allow_list=encoder_cfg.token_allow_list, + token_deny_list=encoder_cfg.token_deny_list) diff --git a/official/projects/token_dropping/encoder_test.py b/official/projects/token_dropping/encoder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..0cfcb594bf87fc180b764767d987a1fdf2d75f56 --- /dev/null +++ b/official/projects/token_dropping/encoder_test.py @@ -0,0 +1,524 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for transformer-based bert encoder network.""" + +# Import libraries +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from tensorflow.python.keras import keras_parameterized # pylint: disable=g-direct-tensorflow-import +from official.nlp.modeling.networks import bert_encoder +from official.projects.token_dropping import encoder + + +# This decorator runs the test in V1, V2-Eager, and V2-Functional mode. It +# guarantees forward compatibility of this code for the V2 switchover. +@keras_parameterized.run_all_keras_modes +class TokenDropBertEncoderTest(keras_parameterized.TestCase): + + def tearDown(self): + super(TokenDropBertEncoderTest, self).tearDown() + tf.keras.mixed_precision.set_global_policy("float32") + + def test_dict_outputs_network_creation(self): + hidden_size = 32 + sequence_length = 21 + # Create a small BertEncoder for testing. + test_network = encoder.TokenDropBertEncoder( + vocab_size=100, + hidden_size=hidden_size, + num_attention_heads=2, + num_layers=3, + token_keep_k=2, + token_allow_list=(), + token_deny_list=()) + # Create the inputs (note that the first dimension is implicit). 
+ word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + dict_outputs = test_network( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids)) + data = dict_outputs["sequence_output"] + pooled = dict_outputs["pooled_output"] + + self.assertIsInstance(test_network.transformer_layers, list) + self.assertLen(test_network.transformer_layers, 3) + self.assertIsInstance(test_network.pooler_layer, tf.keras.layers.Dense) + + expected_data_shape = [None, sequence_length, hidden_size] + expected_pooled_shape = [None, hidden_size] + self.assertAllEqual(expected_data_shape, data.shape.as_list()) + self.assertAllEqual(expected_pooled_shape, pooled.shape.as_list()) + + # The default output dtype is float32. + self.assertAllEqual(tf.float32, data.dtype) + self.assertAllEqual(tf.float32, pooled.dtype) + + def test_dict_outputs_all_encoder_outputs_network_creation(self): + hidden_size = 32 + sequence_length = 21 + # Create a small BertEncoder for testing. + test_network = encoder.TokenDropBertEncoder( + vocab_size=100, + hidden_size=hidden_size, + num_attention_heads=2, + num_layers=3, + dict_outputs=True, + token_keep_k=sequence_length, + token_allow_list=(), + token_deny_list=()) + # Create the inputs (note that the first dimension is implicit). 
+ word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + dict_outputs = test_network( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids)) + all_encoder_outputs = dict_outputs["encoder_outputs"] + pooled = dict_outputs["pooled_output"] + + expected_data_shape = [None, sequence_length, hidden_size] + expected_pooled_shape = [None, hidden_size] + self.assertLen(all_encoder_outputs, 3) + for data in all_encoder_outputs: + self.assertAllEqual(expected_data_shape, data.shape.as_list()) + self.assertAllEqual(expected_pooled_shape, pooled.shape.as_list()) + + # The default output dtype is float32. + self.assertAllEqual(tf.float32, all_encoder_outputs[-1].dtype) + self.assertAllEqual(tf.float32, pooled.dtype) + + def test_dict_outputs_network_creation_with_float16_dtype(self): + hidden_size = 32 + sequence_length = 21 + tf.keras.mixed_precision.set_global_policy("mixed_float16") + # Create a small BertEncoder for testing. + test_network = encoder.TokenDropBertEncoder( + vocab_size=100, + hidden_size=hidden_size, + num_attention_heads=2, + num_layers=4, + dict_outputs=True, + token_keep_k=2, + token_allow_list=(), + token_deny_list=()) + # Create the inputs (note that the first dimension is implicit). 
+ word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + dict_outputs = test_network( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids)) + data = dict_outputs["sequence_output"] + pooled = dict_outputs["pooled_output"] + + expected_data_shape = [None, sequence_length, hidden_size] + expected_pooled_shape = [None, hidden_size] + self.assertAllEqual(expected_data_shape, data.shape.as_list()) + self.assertAllEqual(expected_pooled_shape, pooled.shape.as_list()) + + # If float_dtype is set to float16, the data output is float32 (from a layer + # norm) and pool output should be float16. + self.assertAllEqual(tf.float32, data.dtype) + self.assertAllEqual(tf.float16, pooled.dtype) + + @parameterized.named_parameters( + ("all_sequence_encoder", None, 21), + ("output_range_encoder", 1, 1), + ) + def test_dict_outputs_network_invocation( + self, output_range, out_seq_len): + hidden_size = 32 + sequence_length = 21 + vocab_size = 57 + num_types = 7 + # Create a small BertEncoder for testing. + test_network = encoder.TokenDropBertEncoder( + vocab_size=vocab_size, + hidden_size=hidden_size, + num_attention_heads=2, + num_layers=3, + type_vocab_size=num_types, + dict_outputs=True, + token_keep_k=2, + token_allow_list=(), + token_deny_list=()) + # Create the inputs (note that the first dimension is implicit). 
+ word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + dict_outputs = test_network( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids), + output_range=output_range) + data = dict_outputs["sequence_output"] + pooled = dict_outputs["pooled_output"] + + # Create a model based off of this network: + model = tf.keras.Model([word_ids, mask, type_ids], [data, pooled]) + + # Invoke the model. We can't validate the output data here (the model is too + # complex) but this will catch structural runtime errors. + batch_size = 3 + word_id_data = np.random.randint( + vocab_size, size=(batch_size, sequence_length)) + mask_data = np.random.randint(2, size=(batch_size, sequence_length)) + type_id_data = np.random.randint( + num_types, size=(batch_size, sequence_length)) + outputs = model.predict([word_id_data, mask_data, type_id_data]) + self.assertEqual(outputs[0].shape[1], out_seq_len) + + # Creates a BertEncoder with max_sequence_length != sequence_length + max_sequence_length = 128 + test_network = encoder.TokenDropBertEncoder( + vocab_size=vocab_size, + hidden_size=hidden_size, + max_sequence_length=max_sequence_length, + num_attention_heads=2, + num_layers=3, + type_vocab_size=num_types, + dict_outputs=True, + token_keep_k=2, + token_allow_list=(), + token_deny_list=()) + dict_outputs = test_network( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids)) + data = dict_outputs["sequence_output"] + pooled = dict_outputs["pooled_output"] + model = tf.keras.Model([word_ids, mask, type_ids], [data, pooled]) + outputs = model.predict([word_id_data, mask_data, type_id_data]) + self.assertEqual(outputs[0].shape[1], sequence_length) + + # Creates a BertEncoder with embedding_width != hidden_size + test_network = encoder.TokenDropBertEncoder( + vocab_size=vocab_size, + hidden_size=hidden_size, 
+ max_sequence_length=max_sequence_length, + num_attention_heads=2, + num_layers=3, + type_vocab_size=num_types, + embedding_width=16, + dict_outputs=True, + token_keep_k=2, + token_allow_list=(), + token_deny_list=()) + dict_outputs = test_network( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids)) + data = dict_outputs["sequence_output"] + pooled = dict_outputs["pooled_output"] + model = tf.keras.Model([word_ids, mask, type_ids], [data, pooled]) + outputs = model.predict([word_id_data, mask_data, type_id_data]) + self.assertEqual(outputs[0].shape[-1], hidden_size) + self.assertTrue(hasattr(test_network, "_embedding_projection")) + + def test_network_creation(self): + hidden_size = 32 + sequence_length = 21 + # Create a small BertEncoder for testing. + test_network = encoder.TokenDropBertEncoder( + vocab_size=100, + hidden_size=hidden_size, + num_attention_heads=2, + num_layers=3, + token_keep_k=2, + token_allow_list=(), + token_deny_list=()) + # Create the inputs (note that the first dimension is implicit). + word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + dict_outputs = test_network( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids)) + data = dict_outputs["sequence_output"] + pooled = dict_outputs["pooled_output"] + + self.assertIsInstance(test_network.transformer_layers, list) + self.assertLen(test_network.transformer_layers, 3) + self.assertIsInstance(test_network.pooler_layer, tf.keras.layers.Dense) + + expected_data_shape = [None, sequence_length, hidden_size] + expected_pooled_shape = [None, hidden_size] + self.assertAllEqual(expected_data_shape, data.shape.as_list()) + self.assertAllEqual(expected_pooled_shape, pooled.shape.as_list()) + + # The default output dtype is float32. 
+ self.assertAllEqual(tf.float32, data.dtype) + self.assertAllEqual(tf.float32, pooled.dtype) + + test_network = encoder.TokenDropBertEncoder( + vocab_size=100, + hidden_size=hidden_size, + num_attention_heads=2, + num_layers=3, + token_keep_k=2, + token_allow_list=(), + token_deny_list=()) + # Create the inputs (note that the first dimension is implicit). + inputs = dict( + input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids) + _ = test_network(inputs) + + def test_all_encoder_outputs_network_creation(self): + hidden_size = 32 + sequence_length = 21 + # Create a small BertEncoder for testing. + test_network = encoder.TokenDropBertEncoder( + vocab_size=100, + hidden_size=hidden_size, + num_attention_heads=2, + num_layers=3, + return_all_encoder_outputs=True, + token_keep_k=sequence_length, + token_allow_list=(), + token_deny_list=()) + # Create the inputs (note that the first dimension is implicit). + word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + dict_outputs = test_network( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids)) + all_encoder_outputs = dict_outputs["encoder_outputs"] + pooled = dict_outputs["pooled_output"] + + expected_data_shape = [None, sequence_length, hidden_size] + expected_pooled_shape = [None, hidden_size] + self.assertLen(all_encoder_outputs, 3) + for data in all_encoder_outputs: + self.assertAllEqual(expected_data_shape, data.shape.as_list()) + self.assertAllEqual(expected_pooled_shape, pooled.shape.as_list()) + + # The default output dtype is float32. 
+ self.assertAllEqual(tf.float32, all_encoder_outputs[-1].dtype) + self.assertAllEqual(tf.float32, pooled.dtype) + + def test_network_creation_with_float16_dtype(self): + hidden_size = 32 + sequence_length = 21 + tf.keras.mixed_precision.set_global_policy("mixed_float16") + # Create a small BertEncoder for testing. + test_network = encoder.TokenDropBertEncoder( + vocab_size=100, + hidden_size=hidden_size, + num_attention_heads=2, + num_layers=4, + token_keep_k=2, + token_allow_list=(), + token_deny_list=()) + # Create the inputs (note that the first dimension is implicit). + word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + dict_outputs = test_network( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids)) + data = dict_outputs["sequence_output"] + pooled = dict_outputs["pooled_output"] + + expected_data_shape = [None, sequence_length, hidden_size] + expected_pooled_shape = [None, hidden_size] + self.assertAllEqual(expected_data_shape, data.shape.as_list()) + self.assertAllEqual(expected_pooled_shape, pooled.shape.as_list()) + + # If float_dtype is set to float16, the data output is float32 (from a layer + # norm) and pool output should be float16. + self.assertAllEqual(tf.float32, data.dtype) + self.assertAllEqual(tf.float16, pooled.dtype) + + @parameterized.named_parameters( + ("all_sequence", None, 21), + ("output_range", 1, 1), + ) + def test_network_invocation(self, output_range, out_seq_len): + hidden_size = 32 + sequence_length = 21 + vocab_size = 57 + num_types = 7 + # Create a small BertEncoder for testing. 
+ test_network = encoder.TokenDropBertEncoder( + vocab_size=vocab_size, + hidden_size=hidden_size, + num_attention_heads=2, + num_layers=3, + type_vocab_size=num_types, + token_keep_k=2, + token_allow_list=(), + token_deny_list=()) + # Create the inputs (note that the first dimension is implicit). + word_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + mask = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + type_ids = tf.keras.Input(shape=(sequence_length,), dtype=tf.int32) + dict_outputs = test_network( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids), + output_range=output_range) + data = dict_outputs["sequence_output"] + pooled = dict_outputs["pooled_output"] + + # Create a model based off of this network: + model = tf.keras.Model([word_ids, mask, type_ids], [data, pooled]) + + # Invoke the model. We can't validate the output data here (the model is too + # complex) but this will catch structural runtime errors. + batch_size = 3 + word_id_data = np.random.randint( + vocab_size, size=(batch_size, sequence_length)) + mask_data = np.random.randint(2, size=(batch_size, sequence_length)) + type_id_data = np.random.randint( + num_types, size=(batch_size, sequence_length)) + outputs = model.predict([word_id_data, mask_data, type_id_data]) + self.assertEqual(outputs[0].shape[1], out_seq_len) + + # Creates a BertEncoder with max_sequence_length != sequence_length + max_sequence_length = 128 + test_network = encoder.TokenDropBertEncoder( + vocab_size=vocab_size, + hidden_size=hidden_size, + max_sequence_length=max_sequence_length, + num_attention_heads=2, + num_layers=3, + type_vocab_size=num_types, + token_keep_k=2, + token_allow_list=(), + token_deny_list=()) + dict_outputs = test_network( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids)) + data = dict_outputs["sequence_output"] + pooled = dict_outputs["pooled_output"] + model = tf.keras.Model([word_ids, mask, type_ids], [data, pooled]) + 
outputs = model.predict([word_id_data, mask_data, type_id_data]) + self.assertEqual(outputs[0].shape[1], sequence_length) + + # Creates a BertEncoder with embedding_width != hidden_size + test_network = encoder.TokenDropBertEncoder( + vocab_size=vocab_size, + hidden_size=hidden_size, + max_sequence_length=max_sequence_length, + num_attention_heads=2, + num_layers=3, + type_vocab_size=num_types, + embedding_width=16, + token_keep_k=2, + token_allow_list=(), + token_deny_list=()) + dict_outputs = test_network( + dict(input_word_ids=word_ids, input_mask=mask, input_type_ids=type_ids)) + data = dict_outputs["sequence_output"] + pooled = dict_outputs["pooled_output"] + model = tf.keras.Model([word_ids, mask, type_ids], [data, pooled]) + outputs = model.predict([word_id_data, mask_data, type_id_data]) + self.assertEqual(outputs[0].shape[-1], hidden_size) + self.assertTrue(hasattr(test_network, "_embedding_projection")) + + +class TokenDropCompatibilityTest(tf.test.TestCase): + + def tearDown(self): + super().tearDown() + tf.keras.mixed_precision.set_global_policy("float32") + + def test_checkpoint_forward_compatible(self): + batch_size = 3 + + hidden_size = 32 + sequence_length = 21 + vocab_size = 57 + num_types = 7 + + kwargs = dict( + vocab_size=vocab_size, + hidden_size=hidden_size, + num_attention_heads=2, + num_layers=3, + type_vocab_size=num_types, + output_range=None) + + word_id_data = np.random.randint( + vocab_size, size=(batch_size, sequence_length)) + mask_data = np.random.randint(2, size=(batch_size, sequence_length)) + type_id_data = np.random.randint( + num_types, size=(batch_size, sequence_length)) + data = dict( + input_word_ids=word_id_data, + input_mask=mask_data, + input_type_ids=type_id_data) + + old_net = bert_encoder.BertEncoderV2(**kwargs) + old_net_outputs = old_net(data) + ckpt = tf.train.Checkpoint(net=old_net) + path = ckpt.save(self.get_temp_dir()) + new_net = encoder.TokenDropBertEncoder( + token_keep_k=sequence_length, + 
token_allow_list=(), + token_deny_list=(), + **kwargs) + new_ckpt = tf.train.Checkpoint(net=new_net) + status = new_ckpt.restore(path) + status.assert_existing_objects_matched() + # assert_consumed will fail because the old model has redundant nodes. + new_net_outputs = new_net(data) + + self.assertAllEqual(old_net_outputs.keys(), new_net_outputs.keys()) + for key in old_net_outputs: + self.assertAllClose(old_net_outputs[key], new_net_outputs[key]) + + def test_keras_model_checkpoint_forward_compatible(self): + batch_size = 3 + + hidden_size = 32 + sequence_length = 21 + vocab_size = 57 + num_types = 7 + + kwargs = dict( + vocab_size=vocab_size, + hidden_size=hidden_size, + num_attention_heads=2, + num_layers=3, + type_vocab_size=num_types, + output_range=None) + + word_id_data = np.random.randint( + vocab_size, size=(batch_size, sequence_length)) + mask_data = np.random.randint(2, size=(batch_size, sequence_length)) + type_id_data = np.random.randint( + num_types, size=(batch_size, sequence_length)) + data = dict( + input_word_ids=word_id_data, + input_mask=mask_data, + input_type_ids=type_id_data) + + old_net = bert_encoder.BertEncoderV2(**kwargs) + inputs = old_net.inputs + outputs = old_net(inputs) + old_model = tf.keras.Model(inputs=inputs, outputs=outputs) + old_model_outputs = old_model(data) + ckpt = tf.train.Checkpoint(net=old_model) + path = ckpt.save(self.get_temp_dir()) + new_net = encoder.TokenDropBertEncoder( + token_keep_k=sequence_length, + token_allow_list=(), + token_deny_list=(), + **kwargs) + inputs = new_net.inputs + outputs = new_net(inputs) + new_model = tf.keras.Model(inputs=inputs, outputs=outputs) + new_ckpt = tf.train.Checkpoint(net=new_model) + new_ckpt.restore(path) + + new_model_outputs = new_model(data) + + self.assertAllEqual(old_model_outputs.keys(), new_model_outputs.keys()) + for key in old_model_outputs: + self.assertAllClose(old_model_outputs[key], new_model_outputs[key]) + + +if __name__ == "__main__": + tf.test.main() diff 
--git a/official/projects/token_dropping/experiment_configs.py b/official/projects/token_dropping/experiment_configs.py new file mode 100644 index 0000000000000000000000000000000000000000..3f2fd6a85b7900905bef6b0a48c58ca69c190fe6 --- /dev/null +++ b/official/projects/token_dropping/experiment_configs.py @@ -0,0 +1,72 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Token dropping BERT experiment configurations. + +Only pretraining configs are provided. Token dropping BERT checkpoints can be +loaded directly into the regular BERT, so use the regular BERT for finetuning.
+""" +# pylint: disable=g-doc-return-or-yield,line-too-long +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import optimization +from official.nlp.configs import bert +from official.nlp.configs import encoders +from official.nlp.data import pretrain_dataloader +from official.projects.token_dropping import encoder_config +from official.projects.token_dropping import masked_lm + + +@exp_factory.register_config_factory('token_drop_bert/pretraining') +def token_drop_bert_pretraining() -> cfg.ExperimentConfig: + """BERT pretraining with token dropping.""" + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(enable_xla=True), + task=masked_lm.TokenDropMaskedLMConfig( + model=bert.PretrainerConfig( + encoder=encoders.EncoderConfig( + any=encoder_config.TokenDropBertEncoderConfig( + vocab_size=30522, num_layers=1, token_keep_k=64), + type='any')), + train_data=pretrain_dataloader.BertPretrainDataConfig(), + validation_data=pretrain_dataloader.BertPretrainDataConfig( + is_training=False)), + trainer=cfg.TrainerConfig( + train_steps=1000000, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'adamw', + 'adamw': { + 'weight_decay_rate': + 0.01, + 'exclude_from_weight_decay': + ['LayerNorm', 'layer_norm', 'bias'], + } + }, + 'learning_rate': { + 'type': 'polynomial', + 'polynomial': { + 'initial_learning_rate': 1e-4, + 'end_learning_rate': 0.0, + } + }, + 'warmup': { + 'type': 'polynomial' + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + return config diff --git a/official/projects/token_dropping/masked_lm.py b/official/projects/token_dropping/masked_lm.py new file mode 100644 index 0000000000000000000000000000000000000000..f159a216d3724b0ceb852d5a7db66fd75c5d9456 --- /dev/null +++ b/official/projects/token_dropping/masked_lm.py @@ -0,0 +1,124 @@ +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Masked language modeling task.""" + +import dataclasses +from typing import Tuple +import tensorflow as tf + +from official.core import task_factory +from official.nlp.tasks import masked_lm + + +@dataclasses.dataclass +class TokenDropMaskedLMConfig(masked_lm.MaskedLMConfig): + """The model config.""" + pass + + +@task_factory.register_task_cls(TokenDropMaskedLMConfig) +class TokenDropMaskedLMTask(masked_lm.MaskedLMTask): + """Task object for masked language modeling.""" + + def build_losses(self, + labels, + model_outputs, + metrics, + aux_losses=None) -> Tuple[tf.Tensor, tf.Tensor]: + """Returns the total loss and the per-token masked LM losses.""" + with tf.name_scope('MaskedLMTask/losses'): + metrics = dict([(metric.name, metric) for metric in metrics]) + lm_prediction_losses = tf.keras.losses.sparse_categorical_crossentropy( + labels['masked_lm_ids'], + tf.cast(model_outputs['mlm_logits'], tf.float32), + from_logits=True) + lm_label_weights = labels['masked_lm_weights'] + lm_numerator_loss = tf.reduce_sum(lm_prediction_losses * + lm_label_weights) + lm_denominator_loss = tf.reduce_sum(lm_label_weights) + mlm_loss = tf.math.divide_no_nan(lm_numerator_loss, lm_denominator_loss) + metrics['lm_example_loss'].update_state(mlm_loss) + if 'next_sentence_labels' in labels: + sentence_labels = labels['next_sentence_labels'] + sentence_outputs = tf.cast( + 
model_outputs['next_sentence'], dtype=tf.float32) + sentence_loss =
tf.reduce_mean( + tf.keras.losses.sparse_categorical_crossentropy( + sentence_labels, sentence_outputs, from_logits=True)) + metrics['next_sentence_loss'].update_state(sentence_loss) + total_loss = mlm_loss + sentence_loss + else: + total_loss = mlm_loss + + if aux_losses: + total_loss += tf.add_n(aux_losses) + return total_loss, lm_prediction_losses + + def train_step(self, inputs, model: tf.keras.Model, + optimizer: tf.keras.optimizers.Optimizer, metrics): + """Does forward and backward. + + Args: + inputs: a dictionary of input tensors. + model: the model, forward pass definition. + optimizer: the optimizer for this training step. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs. + """ + with tf.GradientTape() as tape: + outputs = model(inputs, training=True) + # Computes per-replica loss. + loss, lm_prediction_losses = self.build_losses( + labels=inputs, + model_outputs=outputs, + metrics=metrics, + aux_losses=model.losses) + model.encoder_network.record_mlm_loss( + mlm_ids=inputs['masked_lm_ids'], + mlm_losses=lm_prediction_losses) + if self.task_config.scale_loss: + # Scales loss as the default gradients allreduce performs sum inside the + # optimizer. + scaled_loss = loss / tf.distribute.get_strategy().num_replicas_in_sync + tvars = model.trainable_variables + if self.task_config.scale_loss: + grads = tape.gradient(scaled_loss, tvars) + else: + grads = tape.gradient(loss, tvars) + optimizer.apply_gradients(list(zip(grads, tvars))) + self.process_metrics(metrics, inputs, outputs) + return {self.loss: loss} + + def validation_step(self, inputs, model: tf.keras.Model, metrics): + """Validation step. + + Args: + inputs: a dictionary of input tensors. + model: the keras.Model. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs.
+ """ + outputs = self.inference_step(inputs, model) + loss, _ = self.build_losses( + labels=inputs, + model_outputs=outputs, + metrics=metrics, + aux_losses=model.losses) + self.process_metrics(metrics, inputs, outputs) + return {self.loss: loss} diff --git a/official/projects/token_dropping/masked_lm_test.py b/official/projects/token_dropping/masked_lm_test.py new file mode 100644 index 0000000000000000000000000000000000000000..2c0ea5af948e01c3e89a25a4b14e690a1589d25b --- /dev/null +++ b/official/projects/token_dropping/masked_lm_test.py @@ -0,0 +1,63 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for official.projects.token_dropping.masked_lm.""" + +import tensorflow as tf + +from official.nlp.configs import bert +from official.nlp.configs import encoders +from official.nlp.data import pretrain_dataloader +from official.projects.token_dropping import encoder_config +from official.projects.token_dropping import masked_lm + + +class MLMTaskTest(tf.test.TestCase): + + def test_task(self): + config = masked_lm.TokenDropMaskedLMConfig( + init_checkpoint=self.get_temp_dir(), + scale_loss=True, + model=bert.PretrainerConfig( + encoder=encoders.EncoderConfig( + any=encoder_config.TokenDropBertEncoderConfig( + vocab_size=30522, num_layers=1, token_keep_k=64), + type="any"), + cls_heads=[ + bert.ClsHeadConfig( + inner_dim=10, num_classes=2, name="next_sentence") + ]), + train_data=pretrain_dataloader.BertPretrainDataConfig( + input_path="dummy", + max_predictions_per_seq=20, + seq_length=128, + global_batch_size=1)) + task = masked_lm.TokenDropMaskedLMTask(config) + model = task.build_model() + metrics = task.build_metrics() + dataset = task.build_inputs(config.train_data) + + iterator = iter(dataset) + optimizer = tf.keras.optimizers.SGD(learning_rate=0.1) + task.train_step(next(iterator), model, optimizer, metrics=metrics) + task.validation_step(next(iterator), model, metrics=metrics) + + # Saves a checkpoint. + ckpt = tf.train.Checkpoint(model=model, **model.checkpoint_items) + ckpt.save(config.init_checkpoint) + task.initialize(model) + + +if __name__ == "__main__": + tf.test.main() diff --git a/official/projects/token_dropping/train.py b/official/projects/token_dropping/train.py new file mode 100644 index 0000000000000000000000000000000000000000..e84d45f77247a6c10a8c241bfd1f31b7b53a073f --- /dev/null +++ b/official/projects/token_dropping/train.py @@ -0,0 +1,69 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""A customized training binary for running token dropping experiments.""" + +from absl import app +from absl import flags +import gin + +from official.common import distribute_utils +from official.common import flags as tfm_flags +from official.core import task_factory +from official.core import train_lib +from official.core import train_utils +from official.modeling import performance +from official.projects.token_dropping import experiment_configs # pylint: disable=unused-import + +FLAGS = flags.FLAGS + + +def main(_): + gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) + params = train_utils.parse_configuration(FLAGS) + model_dir = FLAGS.model_dir + if 'train' in FLAGS.mode: + # Pure eval modes do not output yaml files. Otherwise continuous eval job + # may race against the train job for writing the same file. + train_utils.serialize_config(params, model_dir) + + # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16' + # can have significant impact on model speeds by utilizing float16 in case of + # GPUs, and bfloat16 in the case of TPUs. 
loss_scale takes effect only when + # dtype is float16 + if params.runtime.mixed_precision_dtype: + performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) + distribution_strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=params.runtime.distribution_strategy, + all_reduce_alg=params.runtime.all_reduce_alg, + num_gpus=params.runtime.num_gpus, + tpu_address=params.runtime.tpu, + **params.runtime.model_parallelism()) + + with distribution_strategy.scope(): + task = task_factory.get_task(params.task, logging_dir=model_dir) + + train_lib.run_experiment( + distribution_strategy=distribution_strategy, + task=task, + mode=FLAGS.mode, + params=params, + model_dir=model_dir) + + train_utils.save_gin_config(FLAGS.mode, model_dir) + + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(main) diff --git a/official/projects/token_dropping/wiki_books_pretrain.yaml b/official/projects/token_dropping/wiki_books_pretrain.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f585ca353571695f10ecac77e5d90e309d60802d --- /dev/null +++ b/official/projects/token_dropping/wiki_books_pretrain.yaml @@ -0,0 +1,48 @@ +task: + init_checkpoint: '' + model: + cls_heads: [{activation: tanh, cls_token_idx: 0, dropout_rate: 0.1, inner_dim: 768, name: next_sentence, num_classes: 2}] + train_data: + drop_remainder: true + global_batch_size: 512 + input_path: /path-to-data/wikipedia.tfrecord*,/path-to-data/books.tfrecord* + is_training: true + max_predictions_per_seq: 76 + seq_length: 512 + use_next_sentence_label: true + use_position_id: false + use_v2_feature_names: true + validation_data: + drop_remainder: false + global_batch_size: 512 + input_path: /path-to-data/wikipedia.tfrecord*,/path-to-data/books.tfrecord* + is_training: false + max_predictions_per_seq: 76 + seq_length: 512 + use_next_sentence_label: true + use_position_id: false + use_v2_feature_names: true +trainer: + checkpoint_interval: 20000 + max_to_keep: 
5 + optimizer_config: + learning_rate: + polynomial: + cycle: false + decay_steps: 1000000 + end_learning_rate: 0.0 + initial_learning_rate: 0.0001 + power: 1.0 + type: polynomial + optimizer: + type: adamw + warmup: + polynomial: + power: 1 + warmup_steps: 10000 + type: polynomial + steps_per_loop: 1000 + summary_interval: 1000 + train_steps: 1000000 + validation_interval: 1000 + validation_steps: 64 diff --git a/official/projects/token_dropping/wiki_books_pretrain_sequence_pack.yaml b/official/projects/token_dropping/wiki_books_pretrain_sequence_pack.yaml new file mode 100644 index 0000000000000000000000000000000000000000..8904d57192c4aa1bb3ea1968f117d04e179ba4d9 --- /dev/null +++ b/official/projects/token_dropping/wiki_books_pretrain_sequence_pack.yaml @@ -0,0 +1,48 @@ +task: + init_checkpoint: '' + model: + cls_heads: [] + train_data: + drop_remainder: true + global_batch_size: 512 + input_path: /path-to-packed-data/wikipedia.tfrecord*,/path-to-packed-data/books.tfrecord* + is_training: true + max_predictions_per_seq: 76 + seq_length: 512 + use_next_sentence_label: false + use_position_id: false + use_v2_feature_names: true + validation_data: + drop_remainder: false + global_batch_size: 512 + input_path: /path-to-packed-data/wikipedia.tfrecord*,/path-to-packed-data/books.tfrecord* + is_training: false + max_predictions_per_seq: 76 + seq_length: 512 + use_next_sentence_label: false + use_position_id: false + use_v2_feature_names: true +trainer: + checkpoint_interval: 20000 + max_to_keep: 5 + optimizer_config: + learning_rate: + polynomial: + cycle: false + decay_steps: 1000000 + end_learning_rate: 0.0 + initial_learning_rate: 0.0001 + power: 1.0 + type: polynomial + optimizer: + type: adamw + warmup: + polynomial: + power: 1 + warmup_steps: 10000 + type: polynomial + steps_per_loop: 1000 + summary_interval: 1000 + train_steps: 1000000 + validation_interval: 1000 + validation_steps: 64 diff --git a/official/projects/triviaqa/__init__.py 
b/official/projects/triviaqa/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/triviaqa/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/nlp/projects/triviaqa/dataset.py b/official/projects/triviaqa/dataset.py similarity index 99% rename from official/nlp/projects/triviaqa/dataset.py rename to official/projects/triviaqa/dataset.py index 1623991266b1a2af318ff07874463f9c3047ceb6..706bbb6779bfd34fc483e9279fdaa8ba6ac215bd 100644 --- a/official/nlp/projects/triviaqa/dataset.py +++ b/official/projects/triviaqa/dataset.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -23,7 +23,7 @@ import six import tensorflow as tf import tensorflow_datasets.public_api as tfds -from official.nlp.projects.triviaqa import preprocess +from official.projects.triviaqa import preprocess _CITATION = """ @article{2017arXivtriviaqa, diff --git a/official/nlp/projects/triviaqa/download_and_prepare.py b/official/projects/triviaqa/download_and_prepare.py similarity index 94% rename from official/nlp/projects/triviaqa/download_and_prepare.py rename to official/projects/triviaqa/download_and_prepare.py index 98b3e4befd41d74e0f2042636ca9c97afaae2eca..1a3140c3dd840c57a081c358620998829408e123 100644 --- a/official/nlp/projects/triviaqa/download_and_prepare.py +++ b/official/projects/triviaqa/download_and_prepare.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -21,7 +21,7 @@ from absl import logging import apache_beam as beam import tensorflow_datasets as tfds -from official.nlp.projects.triviaqa import dataset # pylint: disable=unused-import +from official.projects.triviaqa import dataset # pylint: disable=unused-import flags.DEFINE_integer('sequence_length', 4096, 'Max number of tokens.') diff --git a/official/nlp/projects/triviaqa/evaluate.py b/official/projects/triviaqa/evaluate.py similarity index 92% rename from official/nlp/projects/triviaqa/evaluate.py rename to official/projects/triviaqa/evaluate.py index 2afdeacd903fa67655b77cc394de6c86bec5e7d5..6d19c58e6062c2c2abf50b517e66484bb2f36b02 100644 --- a/official/nlp/projects/triviaqa/evaluate.py +++ b/official/projects/triviaqa/evaluate.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -20,7 +20,7 @@ from absl import flags from absl import logging import tensorflow as tf -from official.nlp.projects.triviaqa import evaluation +from official.projects.triviaqa import evaluation flags.DEFINE_string('gold_path', None, 'Path to golden validation, i.e. wikipedia-dev.json.') diff --git a/official/nlp/projects/triviaqa/evaluation.py b/official/projects/triviaqa/evaluation.py similarity index 98% rename from official/nlp/projects/triviaqa/evaluation.py rename to official/projects/triviaqa/evaluation.py index fb987f4cce3656bac3cb28504d795228ab484f42..80218cab90b0a60e09513268ecf6a2edf39c08c6 100644 --- a/official/nlp/projects/triviaqa/evaluation.py +++ b/official/projects/triviaqa/evaluation.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/projects/triviaqa/inputs.py b/official/projects/triviaqa/inputs.py similarity index 99% rename from official/nlp/projects/triviaqa/inputs.py rename to official/projects/triviaqa/inputs.py index 30a2f29e746b5e8f0974bd4ffdd8a32fe434d2af..426a11541107add368ab312a14eefe932d1a7937 100644 --- a/official/nlp/projects/triviaqa/inputs.py +++ b/official/projects/triviaqa/inputs.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -20,7 +20,7 @@ import tensorflow as tf import tensorflow_datasets as tfds from official.modeling import tf_utils -from official.nlp.projects.triviaqa import dataset # pylint: disable=unused-import +from official.projects.triviaqa import dataset # pylint: disable=unused-import def _flatten_dims(tensor: tf.Tensor, diff --git a/official/nlp/projects/triviaqa/modeling.py b/official/projects/triviaqa/modeling.py similarity index 98% rename from official/nlp/projects/triviaqa/modeling.py rename to official/projects/triviaqa/modeling.py index 9a2c711352b4248b667ef5c882a5244f84f79de4..4df0f1b2b0173e79e82431ffbfcc382c88dd67ac 100644 --- a/official/nlp/projects/triviaqa/modeling.py +++ b/official/projects/triviaqa/modeling.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/projects/triviaqa/predict.py b/official/projects/triviaqa/predict.py similarity index 96% rename from official/nlp/projects/triviaqa/predict.py rename to official/projects/triviaqa/predict.py index bc4f5dad87792f8bb1c80ba4609cbbec526c03c0..16ccdb83faef10793915f0518cda890ef9f1d121 100644 --- a/official/nlp/projects/triviaqa/predict.py +++ b/official/projects/triviaqa/predict.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -27,9 +27,9 @@ import tensorflow_datasets as tfds import sentencepiece as spm from official.nlp.configs import encoders # pylint: disable=unused-import -from official.nlp.projects.triviaqa import evaluation -from official.nlp.projects.triviaqa import inputs -from official.nlp.projects.triviaqa import prediction +from official.projects.triviaqa import evaluation +from official.projects.triviaqa import inputs +from official.projects.triviaqa import prediction flags.DEFINE_string('data_dir', None, 'TensorFlow Datasets directory.') diff --git a/official/nlp/projects/triviaqa/prediction.py b/official/projects/triviaqa/prediction.py similarity index 97% rename from official/nlp/projects/triviaqa/prediction.py rename to official/projects/triviaqa/prediction.py index f9ebd729fa7698bf71af1b4b2efa3a70f79a42ad..f2c96954fabf9b07f304dcd41c08cbce33e4349c 100644 --- a/official/nlp/projects/triviaqa/prediction.py +++ b/official/projects/triviaqa/prediction.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/nlp/projects/triviaqa/preprocess.py b/official/projects/triviaqa/preprocess.py similarity index 99% rename from official/nlp/projects/triviaqa/preprocess.py rename to official/projects/triviaqa/preprocess.py index 45406a68f7724b3ec17b4356e890cef67c843574..fb16ef8a058591893f53771324cdc28bfc212ada 100644 --- a/official/nlp/projects/triviaqa/preprocess.py +++ b/official/projects/triviaqa/preprocess.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -30,8 +30,8 @@ import numpy as np import tensorflow.io.gfile as gfile import sentencepiece as spm -from official.nlp.projects.triviaqa import evaluation -from official.nlp.projects.triviaqa import sentencepiece_pb2 +from official.projects.triviaqa import evaluation +from official.projects.triviaqa import sentencepiece_pb2 @dataclasses.dataclass diff --git a/official/nlp/projects/triviaqa/sentencepiece_pb2.py b/official/projects/triviaqa/sentencepiece_pb2.py similarity index 99% rename from official/nlp/projects/triviaqa/sentencepiece_pb2.py rename to official/projects/triviaqa/sentencepiece_pb2.py index 518e907792e1dd36d222182f39f3bd49b81afb4f..080682d35d871f264d173813477b83f743b2d776 100755 --- a/official/nlp/projects/triviaqa/sentencepiece_pb2.py +++ b/official/projects/triviaqa/sentencepiece_pb2.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/triviaqa/train.py b/official/projects/triviaqa/train.py new file mode 100644 index 0000000000000000000000000000000000000000..ff84f8dc205cba624f7c17cb9bf002bcfb9aa152 --- /dev/null +++ b/official/projects/triviaqa/train.py @@ -0,0 +1,384 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""TriviaQA training script.""" +import collections +import contextlib +import functools +import json +import operator +import os + +from absl import app +from absl import flags +from absl import logging +import gin +import tensorflow as tf +import tensorflow_datasets as tfds + +import sentencepiece as spm +from official.nlp import optimization as nlp_optimization +from official.nlp.configs import encoders +from official.projects.triviaqa import evaluation +from official.projects.triviaqa import inputs +from official.projects.triviaqa import modeling +from official.projects.triviaqa import prediction + +flags.DEFINE_string('data_dir', None, 'Data directory for TensorFlow Datasets.') + +flags.DEFINE_string( + 'validation_gold_path', None, + 'Path to the gold validation file, usually wikipedia-dev.json.') + +flags.DEFINE_string('model_dir', None, + 'Directory for checkpoints and summaries.') + +flags.DEFINE_string('model_config_path', None, + 'JSON file containing model configuration.') + +flags.DEFINE_string('sentencepiece_model_path', None, + 'Path to the SentencePiece model.') + +flags.DEFINE_enum('encoder', 'bigbird', + ['bert', 'bigbird', 'albert', 'mobilebert'], + 'Which transformer encoder model to use.') + +flags.DEFINE_integer('bigbird_block_size', 64, + 'Size of blocks for sparse block attention.') + +flags.DEFINE_string('init_checkpoint_path', None, + 'Path from which to initialize weights.') + +flags.DEFINE_integer('train_sequence_length', 4096, + 'Maximum number of tokens for training.') + +flags.DEFINE_integer('train_global_sequence_length', 320, + 'Maximum number of global tokens for training.') + +flags.DEFINE_integer('validation_sequence_length', 4096, + 'Maximum number of tokens for validation.') + +flags.DEFINE_integer('validation_global_sequence_length', 320, + 'Maximum number of global tokens for validation.') + +flags.DEFINE_integer('batch_size', 32, 'Size of batch.') + +flags.DEFINE_string('master', '', 'Address of the TPU master.') +
+flags.DEFINE_integer('decode_top_k', 8, + 'Maximum number of tokens to consider for begin/end.') + +flags.DEFINE_integer('decode_max_size', 16, + 'Maximum number of sentence pieces in an answer.') + +flags.DEFINE_float('dropout_rate', 0.1, 'Dropout rate for hidden layers.') + +flags.DEFINE_float('attention_dropout_rate', 0.3, + 'Dropout rate for attention layers.') + +flags.DEFINE_float('label_smoothing', 1e-1, 'Degree of label smoothing.') + +flags.DEFINE_multi_string( + 'gin_bindings', [], + 'Gin bindings to override the values set in the config files') + +FLAGS = flags.FLAGS + + +@contextlib.contextmanager +def worker_context(): + if FLAGS.master: + with tf.device('/job:worker') as d: + yield d + else: + yield + + +def read_sentencepiece_model(path): + with tf.io.gfile.GFile(path, 'rb') as file: + processor = spm.SentencePieceProcessor() + processor.LoadFromSerializedProto(file.read()) + return processor + + +# Rename old BERT v1 configuration parameters. +_MODEL_CONFIG_REPLACEMENTS = { + 'num_hidden_layers': 'num_layers', + 'attention_probs_dropout_prob': 'attention_dropout_rate', + 'hidden_dropout_prob': 'dropout_rate', + 'hidden_act': 'hidden_activation', + 'window_size': 'block_size', +} + + +def read_model_config(encoder, + path, + bigbird_block_size=None) -> encoders.EncoderConfig: + """Merges the JSON configuration into the encoder configuration.""" + with tf.io.gfile.GFile(path) as f: + model_config = json.load(f) + for key, value in _MODEL_CONFIG_REPLACEMENTS.items(): + if key in model_config: + model_config[value] = model_config.pop(key) + model_config['attention_dropout_rate'] = FLAGS.attention_dropout_rate + model_config['dropout_rate'] = FLAGS.dropout_rate + model_config['block_size'] = bigbird_block_size + encoder_config = encoders.EncoderConfig(type=encoder) + # Override the default config with those loaded from the JSON file. 
+ encoder_config_keys = encoder_config.get().as_dict().keys() + overrides = {} + for key, value in model_config.items(): + if key in encoder_config_keys: + overrides[key] = value + else: + logging.warning('Ignoring config parameter %s=%s', key, value) + encoder_config.get().override(overrides) + return encoder_config + + +@gin.configurable(denylist=[ + 'model', + 'strategy', + 'train_dataset', + 'model_dir', + 'init_checkpoint_path', + 'evaluate_fn', +]) +def fit(model, + strategy, + train_dataset, + model_dir, + init_checkpoint_path=None, + evaluate_fn=None, + learning_rate=1e-5, + learning_rate_polynomial_decay_rate=1., + weight_decay_rate=1e-1, + num_warmup_steps=5000, + num_decay_steps=51000, + num_epochs=6): + """Train and evaluate.""" + hparams = dict( + learning_rate=learning_rate, + num_decay_steps=num_decay_steps, + num_warmup_steps=num_warmup_steps, + num_epochs=num_epochs, + weight_decay_rate=weight_decay_rate, + dropout_rate=FLAGS.dropout_rate, + attention_dropout_rate=FLAGS.attention_dropout_rate, + label_smoothing=FLAGS.label_smoothing) + logging.info(hparams) + learning_rate_schedule = nlp_optimization.WarmUp( + learning_rate, + tf.keras.optimizers.schedules.PolynomialDecay( + learning_rate, + num_decay_steps, + end_learning_rate=0., + power=learning_rate_polynomial_decay_rate), num_warmup_steps) + with strategy.scope(): + optimizer = nlp_optimization.AdamWeightDecay( + learning_rate_schedule, + weight_decay_rate=weight_decay_rate, + epsilon=1e-6, + exclude_from_weight_decay=['LayerNorm', 'layer_norm', 'bias']) + model.compile(optimizer, loss=modeling.SpanOrCrossEntropyLoss()) + + def init_fn(init_checkpoint_path): + ckpt = tf.train.Checkpoint(encoder=model.encoder) + ckpt.restore(init_checkpoint_path).assert_existing_objects_matched() + + with worker_context(): + ckpt_manager = tf.train.CheckpointManager( + tf.train.Checkpoint(model=model, optimizer=optimizer), + model_dir, + max_to_keep=None, + init_fn=(functools.partial(init_fn, 
init_checkpoint_path) + if init_checkpoint_path else None)) + with strategy.scope(): + ckpt_manager.restore_or_initialize() + val_summary_writer = tf.summary.create_file_writer( + os.path.join(model_dir, 'val')) + best_exact_match = 0. + for epoch in range(len(ckpt_manager.checkpoints), num_epochs): + model.fit( + train_dataset, + callbacks=[ + tf.keras.callbacks.TensorBoard(model_dir, write_graph=False), + ]) + ckpt_path = ckpt_manager.save() + if evaluate_fn is None: + continue + metrics = evaluate_fn() + logging.info('Epoch %d: %s', epoch + 1, metrics) + if best_exact_match < metrics['exact_match']: + best_exact_match = metrics['exact_match'] + model.save(os.path.join(model_dir, 'export'), include_optimizer=False) + logging.info('Exporting %s as SavedModel.', ckpt_path) + with val_summary_writer.as_default(): + for name, data in metrics.items(): + tf.summary.scalar(name, data, epoch + 1) + + +def evaluate(sp_processor, features_map_fn, labels_map_fn, logits_fn, + decode_logits_fn, split_and_pad_fn, distribute_strategy, + validation_dataset, ground_truth): + """Run evaluation.""" + loss_metric = tf.keras.metrics.Mean() + + @tf.function + def update_loss(y, logits): + loss_fn = modeling.SpanOrCrossEntropyLoss( + reduction=tf.keras.losses.Reduction.NONE) + return loss_metric(loss_fn(y, logits)) + + predictions = collections.defaultdict(list) + for _, (features, labels) in validation_dataset.enumerate(): + token_ids = features['token_ids'] + y = labels_map_fn(token_ids, labels) + x = split_and_pad_fn(features_map_fn(features)) + logits = tf.concat( + distribute_strategy.experimental_local_results(logits_fn(x)), 0) + logits = logits[:features['token_ids'].shape[0]] + update_loss(y, logits) + end_limit = token_ids.row_lengths() - 1 # inclusive + begin, end, scores = decode_logits_fn(logits, end_limit) + answers = prediction.decode_answer(features['context'], begin, end, + features['token_offsets'], + end_limit).numpy() + for _, (qid, token_id, offset, score, answer) 
in enumerate( + zip(features['qid'].numpy(), + tf.gather(features['token_ids'], begin, batch_dims=1).numpy(), + tf.gather(features['token_offsets'], begin, batch_dims=1).numpy(), + scores, answers)): + if not answer: + continue + if sp_processor.IdToPiece(int(token_id)).startswith('▁') and offset > 0: + answer = answer[1:] + predictions[qid.decode('utf-8')].append((score, answer.decode('utf-8'))) + predictions = { + qid: evaluation.normalize_answer( + sorted(answers, key=operator.itemgetter(0), reverse=True)[0][1]) + for qid, answers in predictions.items() + } + metrics = evaluation.evaluate_triviaqa(ground_truth, predictions, mute=True) + metrics['loss'] = loss_metric.result().numpy() + return metrics + + +def main(argv): + if len(argv) > 1: + raise app.UsageError('Too many command-line arguments.') + gin.parse_config(FLAGS.gin_bindings) + model_config = read_model_config( + FLAGS.encoder, + FLAGS.model_config_path, + bigbird_block_size=FLAGS.bigbird_block_size) + logging.info(model_config.get().as_dict()) + # Configure input processing. + sp_processor = read_sentencepiece_model(FLAGS.sentencepiece_model_path) + features_map_fn = functools.partial( + inputs.features_map_fn, + local_radius=FLAGS.bigbird_block_size, + relative_pos_max_distance=24, + use_hard_g2l_mask=True, + padding_id=sp_processor.PieceToId(''), + eos_id=sp_processor.PieceToId(''), + null_id=sp_processor.PieceToId(''), + cls_id=sp_processor.PieceToId(''), + sep_id=sp_processor.PieceToId('')) + train_features_map_fn = tf.function( + functools.partial( + features_map_fn, + sequence_length=FLAGS.train_sequence_length, + global_sequence_length=FLAGS.train_global_sequence_length), + autograph=False) + train_labels_map_fn = tf.function( + functools.partial( + inputs.labels_map_fn, sequence_length=FLAGS.train_sequence_length)) + # Connect to TPU cluster. 
+ if FLAGS.master: + resolver = tf.distribute.cluster_resolver.TPUClusterResolver(FLAGS.master) + tf.config.experimental_connect_to_cluster(resolver) + tf.tpu.experimental.initialize_tpu_system(resolver) + strategy = tf.distribute.TPUStrategy(resolver) + else: + strategy = tf.distribute.MirroredStrategy() + # Initialize datasets. + with worker_context(): + _ = tf.random.get_global_generator() + train_dataset = inputs.read_batches( + FLAGS.data_dir, + tfds.Split.TRAIN, + FLAGS.batch_size, + shuffle=True, + drop_final_batch=True) + validation_dataset = inputs.read_batches(FLAGS.data_dir, + tfds.Split.VALIDATION, + FLAGS.batch_size) + + def train_map_fn(x, y): + features = train_features_map_fn(x) + labels = modeling.smooth_labels(FLAGS.label_smoothing, + train_labels_map_fn(x['token_ids'], y), + features['question_lengths'], + features['token_ids']) + return features, labels + + train_dataset = train_dataset.map(train_map_fn, 16).prefetch(16) + # Initialize model and compile. + with strategy.scope(): + model = modeling.TriviaQaModel(model_config, FLAGS.train_sequence_length) + logits_fn = tf.function( + functools.partial(prediction.distributed_logits_fn, model)) + decode_logits_fn = tf.function( + functools.partial(prediction.decode_logits, FLAGS.decode_top_k, + FLAGS.decode_max_size)) + split_and_pad_fn = tf.function( + functools.partial(prediction.split_and_pad, strategy, FLAGS.batch_size)) + # Evaluation strategy. 
+ with tf.io.gfile.GFile(FLAGS.validation_gold_path) as f: + ground_truth = { + datum['QuestionId']: datum['Answer'] for datum in json.load(f)['Data'] + } + validation_features_map_fn = tf.function( + functools.partial( + features_map_fn, + sequence_length=FLAGS.validation_sequence_length, + global_sequence_length=FLAGS.validation_global_sequence_length), + autograph=False) + validation_labels_map_fn = tf.function( + functools.partial( + inputs.labels_map_fn, + sequence_length=FLAGS.validation_sequence_length)) + evaluate_fn = functools.partial( + evaluate, + sp_processor=sp_processor, + features_map_fn=validation_features_map_fn, + labels_map_fn=validation_labels_map_fn, + logits_fn=logits_fn, + decode_logits_fn=decode_logits_fn, + split_and_pad_fn=split_and_pad_fn, + distribute_strategy=strategy, + validation_dataset=validation_dataset, + ground_truth=ground_truth) + logging.info('Model initialized. Beginning training fit loop.') + fit(model, strategy, train_dataset, FLAGS.model_dir, + FLAGS.init_checkpoint_path, evaluate_fn) + + +if __name__ == '__main__': + flags.mark_flags_as_required([ + 'model_config_path', 'model_dir', 'sentencepiece_model_path', + 'validation_gold_path' + ]) + app.run(main) diff --git a/official/projects/unified_detector/README.md b/official/projects/unified_detector/README.md new file mode 100644 index 0000000000000000000000000000000000000000..31c2980d5a5f52373d77831114b9227ac6805466 --- /dev/null +++ b/official/projects/unified_detector/README.md @@ -0,0 +1,163 @@ +# Towards End-to-End Unified Scene Text Detection and Layout Analysis + +![unified detection](docs/images/task.png) + +[![UnifiedDetector](https://img.shields.io/badge/UnifiedDetector-arxiv.2203.15143-green)](https://arxiv.org/abs/2203.15143) + +Official TensorFlow 2 implementation of the paper `Towards End-to-End Unified +Scene Text Detection and Layout Analysis`. 
If you encounter any issues using the +code, you are welcome to submit them to the Issues tab or send emails directly +to us: `hiertext@google.com`. + +## Installation + +### Set up TensorFlow Models + +```bash +# (Optional) Create and enter a virtual environment +pip3 install --user virtualenv +virtualenv -p python3 unified_detector +source ./unified_detector/bin/activate + +# First clone the TensorFlow Models project: +git clone https://github.com/tensorflow/models.git + +# Install the requirements of TensorFlow Models and this repo: +cd models +pip3 install -r official/requirements.txt +pip3 install -r official/projects/unified_detector/requirements.txt + +# Compile the protos +# If `protoc` is not installed, please follow: https://grpc.io/docs/protoc-installation/ +export PYTHONPATH=${PYTHONPATH}:${PWD}/research/ +cd research/object_detection/ +protoc protos/string_int_label_map.proto --python_out=. +``` + +### Set up Deeplab2 + +```bash +# Clone Deeplab2 anywhere you like +cd +git clone https://github.com/google-research/deeplab2.git + +# Compile the protos +protoc deeplab2/*.proto --python_out=. + +# Add to PYTHONPATH the directory where deeplab2 sits. +export PYTHONPATH=${PYTHONPATH}:${PWD} +``` + +## Running the model on some images using the provided checkpoint. 
+ +### Download the checkpoint + +Model | Input Resolution | #object query | line PQ (val) | paragraph PQ (val) | line PQ (test) | paragraph PQ (test) +---------------------------------------------------------------------------------------------------------------------------------- | ---------------- | ------------- | ------------- | ------------------ | -------------- | ------------------- +Unified-Detector-Line ([ckpt](https://storage.cloud.google.com/tf_model_garden/vision/unified_detector/unified_detector_ckpt.tgz)) | 1024 | 384 | 61.04 | 52.84 | 62.20 | 53.52 + +### Demo on single images + +```bash +# run from `models/` +python3 -m official.projects.unified_detector.run_inference \ +--gin_file=official/projects/unified_detector/configs/gin_files/unified_detector_model.gin \ +--ckpt_path= \ +--img_file= \ +--output_path=/demo.jsonl \ +--vis_dir= + +``` + +The output will be stored in JSONL, in the same hierarchical format required +by the evaluation script of the HierText dataset. There will also be +visualizations of the word/line/paragraph boundaries. Note that the unified +detector produces line-level masks and an affinity matrix for grouping lines +into paragraphs. For visualization purposes, we split each line mask into pixel +groups, defined as connected components of pixels. We visualize these +groups as `words`, though they do not necessarily correspond to actual words. +We visualize lines and paragraphs as groupings of these `words` using +axis-aligned bounding boxes. + +## Inference and Evaluation on the HierText dataset + +### Download the HierText dataset + +Clone the [HierText repo](https://github.com/google-research-datasets/hiertext) +and download the dataset. The `requirements.txt` in this folder already covers +those in the HierText repo, so there is no need to create a new virtual +environment again. + +### Inference and eval + +The following commands run the model on the validation set and compute the +score.
Note that the test set annotations have not been released yet, so only the +validation set is used here for demo purposes. + +#### Inference + +```bash +# Run from `models/` +python3 -m official.projects.unified_detector.run_inference \ +--gin_file=official/projects/unified_detector/configs/gin_files/unified_detector_model.gin \ +--ckpt_path= \ +--img_dir= \ +--output_path=/validation_output.jsonl + +``` + +#### Evaluation + +```bash +# Run from `hiertext/` +python3 eval.py \ +--gt=gt/validation.jsonl \ +--result=/validation_output.jsonl \ +--output=./validation-score.txt \ +--mask_stride=1 \ +--eval_lines \ +--eval_paragraphs \ +--num_workers=0 + +``` + +## Train new models. + +First, you will need to convert the HierText dataset into TFRecords: + +```bash +# Run from `models/official/projects/unified_detector/data_conversion` +CUDA_VISIBLE_DEVICES='' python3 convert.py \ +--gt_file=/path/to/gt.jsonl \ +--img_dir=/path/to/image \ +--out_file=/path/to/tfrecords/file-prefix + +``` + +To train the unified detector, run the following script: + +```bash +# Run from `models/` +python3 -m official.projects.unified_detector.train \ +--mode=train \ +--experiment=unified_detector \ +--model_dir='' \ +--gin_file='official/projects/unified_detector/configs/gin_files/unified_detector_train.gin' \ +--gin_file='official/projects/unified_detector/configs/gin_files/unified_detector_model.gin' \ +--gin_params='InputFn.input_paths = ["/path/to/tfrecords/file-prefix*"]' + +``` + +## Citation + +Please cite our [paper](https://arxiv.org/pdf/2203.15143.pdf) if you find this +work helpful: + +``` +@inproceedings{long2022towards, + title={Towards End-to-End Unified Scene Text Detection and Layout Analysis}, + author={Long, Shangbang and Qin, Siyang and Panteleev, Dmitry and Bissacco, Alessandro and Fujii, Yasuhisa and Raptis, Michalis}, + booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, + year={2022} +} +``` diff --git
a/official/projects/unified_detector/configs/gin_files/unified_detector_model.gin b/official/projects/unified_detector/configs/gin_files/unified_detector_model.gin new file mode 100644 index 0000000000000000000000000000000000000000..4dfcbc1d71c8d72b4b651a43066c73401e42a8fd --- /dev/null +++ b/official/projects/unified_detector/configs/gin_files/unified_detector_model.gin @@ -0,0 +1,43 @@ +# Defining the unified detector models. + +# Model +## Backbone +num_slots = 384 +SyncBatchNormalization.momentum = 0.95 + +get_max_deep_lab_backbone.num_slots = %num_slots + +## Decoder +intermediate_filters = 256 +num_entity_class = 3 # C + 1 (bkg) + 1 (void) + +_get_decoder_head.atrous_rates = (6, 12, 18) +_get_decoder_head.pixel_space_dim = 128 +_get_decoder_head.pixel_space_intermediate = %intermediate_filters +_get_decoder_head.num_classes = %num_entity_class +_get_decoder_head.aux_sem_intermediate = %intermediate_filters +_get_decoder_head.low_level = [ + {'feature_key': 'res3', 'channels_project': 64,}, + {'feature_key': 'res2', 'channels_project': 32,},] +_get_decoder_head.norm_fn = @SyncBatchNormalization +_get_embed_head.norm_fn = @LayerNorm + +# Loss +# pq loss +alpha = 0.75 +tau = 0.3 +_entity_mask_loss.alpha = %alpha +_instance_discrimination_loss.tau = %tau +_paragraph_grouping_loss.tau = %tau +_paragraph_grouping_loss.loss_mode = 'balanced' + + +# Other Model setting +UniversalDetector.mask_threshold = 0.4 +UniversalDetector.class_threshold = 0.5 +UniversalDetector.filter_area = 32 +universal_detection_loss_weights.loss_segmentation_word = 1e0 +universal_detection_loss_weights.loss_inst_dist = 1e0 +universal_detection_loss_weights.loss_mask_id = 1e-4 +universal_detection_loss_weights.loss_pq = 3e0 +universal_detection_loss_weights.loss_para = 1e0 diff --git a/official/projects/unified_detector/configs/gin_files/unified_detector_train.gin b/official/projects/unified_detector/configs/gin_files/unified_detector_train.gin new file mode 100644 index 
0000000000000000000000000000000000000000..384fa4cbbafb3449aec1286281d8b30ff75580a3 --- /dev/null +++ b/official/projects/unified_detector/configs/gin_files/unified_detector_train.gin @@ -0,0 +1,22 @@ +# Defining the input pipeline of unified detector. + +# ===== ===== Model ===== ===== +# Internal import 2. +OcrTask.model_fn = @UniversalDetector + +# ===== ===== Data pipeline ===== ===== +InputFn.parser_fn = @UniDetectorParserFn +InputFn.dataset_type = 'tfrecord' +InputFn.batch_size = 256 + +# Internal import 3. + +UniDetectorParserFn.output_dimension = 1024 +# Simple data augmentation for now. +UniDetectorParserFn.rot90_probability = 0.0 +UniDetectorParserFn.use_color_distortion = True +UniDetectorParserFn.crop_min_scale = 0.5 +UniDetectorParserFn.crop_max_scale = 1.5 +UniDetectorParserFn.crop_min_aspect = 0.8 +UniDetectorParserFn.crop_max_aspect = 1.25 +UniDetectorParserFn.max_num_instance = 384 diff --git a/official/projects/unified_detector/configs/ocr_config.py b/official/projects/unified_detector/configs/ocr_config.py new file mode 100644 index 0000000000000000000000000000000000000000..7500900c16c2f6de5e175bce5524c1708f0e55a3 --- /dev/null +++ b/official/projects/unified_detector/configs/ocr_config.py @@ -0,0 +1,78 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""OCR tasks and models configurations.""" + +import dataclasses +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import optimization + + +@dataclasses.dataclass +class OcrTaskConfig(cfg.TaskConfig): + train_data: cfg.DataConfig = dataclasses.field(default_factory=cfg.DataConfig) + model_call_needs_labels: bool = False + + +@exp_factory.register_config_factory('unified_detector') +def unified_detector() -> cfg.ExperimentConfig: + """Configurations for trainer of unified detector.""" + total_train_steps = 100000 + summary_interval = steps_per_loop = 200 + checkpoint_interval = 2000 + warmup_steps = 1000 + config = cfg.ExperimentConfig( + # Input pipeline and model are configured through Gin. + task=OcrTaskConfig(train_data=cfg.DataConfig(is_training=True)), + trainer=cfg.TrainerConfig( + train_steps=total_train_steps, + steps_per_loop=steps_per_loop, + summary_interval=summary_interval, + checkpoint_interval=checkpoint_interval, + max_to_keep=1, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'adamw', + 'adamw': { + 'weight_decay_rate': 0.05, + 'include_in_weight_decay': [ + '^((?!depthwise).)*(kernel|weights):0$', + ], + 'exclude_from_weight_decay': [ + '(^((?!kernel).)*:0)|(depthwise_kernel)', + ], + 'gradient_clip_norm': 10., + }, + }, + 'learning_rate': { + 'type': 'cosine', + 'cosine': { + 'initial_learning_rate': 1e-3, + 'decay_steps': total_train_steps - warmup_steps, + 'alpha': 1e-2, + 'offset': warmup_steps, + }, + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_learning_rate': 1e-5, + 'warmup_steps': warmup_steps, + } + }, + }), + ), + ) + return config diff --git a/official/projects/unified_detector/data_conversion/convert.py b/official/projects/unified_detector/data_conversion/convert.py new file mode 100644 index 0000000000000000000000000000000000000000..ad133cd55cab2021e42ed6d1f72db3aadbd977d7 --- /dev/null +++
b/official/projects/unified_detector/data_conversion/convert.py @@ -0,0 +1,66 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Script to convert HierText to TFExamples. + +This script is only intended to run locally. + +python3 data_conversion/convert.py \ +--gt_file=/path/to/gt.jsonl \ +--img_dir=/path/to/image \ +--out_file=/path/to/tfrecords/file-prefix + +""" + +import json +import os +import random + +from absl import app +from absl import flags +import tensorflow as tf +import tqdm +import utils + + +_GT_FILE = flags.DEFINE_string('gt_file', None, 'Path to the GT file') +_IMG_DIR = flags.DEFINE_string('img_dir', None, 'Path to the image folder.') +_OUT_FILE = flags.DEFINE_string('out_file', None, 'Path for the tfrecords.') +_NUM_SHARD = flags.DEFINE_integer( + 'num_shard', 100, 'The number of shards of tfrecords.') + + +def main(unused_argv) -> None: + annotations = json.load(open(_GT_FILE.value))['annotations'] + random.shuffle(annotations) + n_sample = len(annotations) + n_shards = _NUM_SHARD.value + n_sample_per_shard = (n_sample - 1) // n_shards + 1 + + for shard in tqdm.tqdm(range(n_shards)): + output_path = f'{_OUT_FILE.value}-{shard:05}-{n_shards:05}.tfrecords' + annotation_subset = annotations[ + shard * n_sample_per_shard : (shard + 1) * n_sample_per_shard] + + with tf.io.TFRecordWriter(output_path) as file_writer: + for annotation in annotation_subset: + img_file_path =
os.path.join(_IMG_DIR.value, + f"{annotation['image_id']}.jpg") + tfexample = utils.convert_to_tfe(img_file_path, annotation) + file_writer.write(tfexample) + + +if __name__ == '__main__': + flags.mark_flags_as_required(['gt_file', 'img_dir', 'out_file']) + app.run(main) diff --git a/official/projects/unified_detector/data_conversion/utils.py b/official/projects/unified_detector/data_conversion/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..a7fee2f56352b552da99ba70bf27a5866ca82991 --- /dev/null +++ b/official/projects/unified_detector/data_conversion/utils.py @@ -0,0 +1,182 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Utilities to convert data to TFExamples and store in TFRecords.""" + +from typing import Any, Dict, List, Tuple, Union + + +import cv2 +import numpy as np +import tensorflow as tf + + +def encode_image( + image_tensor: np.ndarray, + encoding_type: str = 'png') -> bytes: + """Encode image tensor into byte string.""" + if encoding_type == 'jpg': + image_encoded = tf.image.encode_jpeg(tf.constant(image_tensor)) + elif encoding_type == 'png': + image_encoded = tf.image.encode_png(tf.constant(image_tensor)) + else: + raise ValueError('Invalid encoding type.') + if tf.executing_eagerly(): + image_encoded = image_encoded.numpy() + else: + image_encoded = image_encoded.eval() + return image_encoded + + +def int64_feature(value: Union[int, List[int]]) -> tf.train.Feature: + if not isinstance(value, list): + value = [value] + return tf.train.Feature(int64_list=tf.train.Int64List(value=value)) + + +def float_feature(value: Union[float, List[float]]) -> tf.train.Feature: + if not isinstance(value, list): + value = [value] + return tf.train.Feature(float_list=tf.train.FloatList(value=value)) + + +def bytes_feature(value: Union[Union[bytes, str], List[Union[bytes, str]]] + ) -> tf.train.Feature: + if not isinstance(value, list): + value = [value] + for i in range(len(value)): + if not isinstance(value[i], bytes): + value[i] = value[i].encode('utf-8') + return tf.train.Feature(bytes_list=tf.train.BytesList(value=value)) + + +def annotation_to_entities(annotation: Dict[str, Any]) -> List[Dict[str, Any]]: + """Flatten the annotation dict to a list of 'entities'.""" + entities = [] + for paragraph in annotation['paragraphs']: + paragraph_id = len(entities) + paragraph['type'] = 3 # 3 for paragraph + paragraph['parent_id'] = -1 + entities.append(paragraph) + + for line in paragraph['lines']: + line_id = len(entities) + line['type'] = 2 # 2 for line + line['parent_id'] = paragraph_id + entities.append(line) + + for word in line['words']: + word['type'] = 1
# 1 for word + word['parent_id'] = line_id + entities.append(word) + + return entities + + +def draw_entity_mask( + entities: List[Dict[str, Any]], + image_shape: Tuple[int, int, int]) -> np.ndarray: + """Draw entity id mask. + + Args: + entities: A list of entity objects. Should be output from + `annotation_to_entities`. + image_shape: The shape of the input image. + Returns: + A (H, W, 3) entity id mask of the same height/width as the image. Each pixel + (i, j, :) encodes the entity id of one pixel. Only word entities are + rendered. 0 for non-text pixels; word entity ids start from 1. + """ + instance_mask = np.zeros(image_shape, dtype=np.uint8) + for i, entity in enumerate(entities): + # only draw word masks + if entity['type'] != 1: + continue + vertices = np.array(entity['vertices']) + # the pixel value is actually 1 + position in entities + entity_id = i + 1 + if entity_id >= 65536: + # As entity_id is encoded in the last two channels, it should be less than + # 256**2=65536. + raise ValueError( + (f'Entity ID overflow: {entity_id}. Currently only entity_id<65536 ' + 'are supported.')) + + # use the last two channels to encode the entity id. 
+ color = [0, entity_id // 256, entity_id % 256] + instance_mask = cv2.fillPoly(instance_mask, + [np.round(vertices).astype('int32')], color) + return instance_mask + + +def convert_to_tfe(img_file_name: str, + annotation: Dict[str, Any]) -> tf.train.Example: + """Convert the annotation dict into a TFExample.""" + + img = cv2.imread(img_file_name) + img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) + h, w, c = img.shape + encoded_img = encode_image(img) + + entities = annotation_to_entities(annotation) + masks = draw_entity_mask(entities, img.shape) + encoded_mask = encode_image(masks) + + # encode attributes + parent = [] + classes = [] + content_type = [] + text = [] + vertices = [] + + for entity in entities: + parent.append(entity['parent_id']) + classes.append(entity['type']) + # 0 for annotated; 8 for not annotated + content_type.append((0 if entity['legible'] else 8)) + text.append(entity.get('text', '')) + v = np.array(entity['vertices']) + vertices.append(','.join(str(float(n)) for n in v.reshape(-1))) + + example = tf.train.Example( + features=tf.train.Features( + feature={ + # input images + 'image/encoded': bytes_feature(encoded_img), + # image format + 'image/format': bytes_feature('png'), + # image width + 'image/width': int64_feature([w]), + # image height + 'image/height': int64_feature([h]), + # image channels + 'image/channels': int64_feature([c]), + # image key + 'image/source_id': bytes_feature(annotation['image_id']), + # HxWx3 tensors: channel 2-3 encodes the id of the word entity. + 'image/additional_channels/encoded': bytes_feature(encoded_mask), + # format of the additional channels + 'image/additional_channels/format': bytes_feature('png'), + 'image/object/parent': int64_feature(parent), + # word / line / paragraph / symbol / ... + 'image/object/classes': int64_feature(classes), + # text / handwritten / not-annotated / ... 
+ 'image/object/content_type': int64_feature(content_type), + # string text transcription + 'image/object/text': bytes_feature(text), + # comma separated coordinates, (x,y) * n + 'image/object/vertices': bytes_feature(vertices), + })).SerializeToString() + + return example diff --git a/official/projects/unified_detector/data_loaders/autoaugment.py b/official/projects/unified_detector/data_loaders/autoaugment.py new file mode 100644 index 0000000000000000000000000000000000000000..26a4d838f47c3fa26717cb02492ecf1282c25682 --- /dev/null +++ b/official/projects/unified_detector/data_loaders/autoaugment.py @@ -0,0 +1,753 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""AutoAugment and RandAugment policies for enhanced image preprocessing. + +AutoAugment Reference: https://arxiv.org/abs/1805.09501 +RandAugment Reference: https://arxiv.org/abs/1909.13719 + +This library is adapted from: +`https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/autoaugment.py`. +Several changes are made. They are inspired by the TIMM library: +https://github.com/rwightman/pytorch-image-models/tree/master/timm/data + +Changes include: +(1) Random Erasing / Cutout is added, and separated from the random augmentation + pool (not sampled as an operation). +(2) For `posterize` and `solarize`, the arguments are changed such that the + level of corruption increases as the `magnitude` argument increases. 
+(3) `color`, `contrast`, `brightness`, `sharpness` are randomly enhanced or + diminished. +(4) Magnitude is randomly sampled from a normal distribution. +(5) Operations are applied with a probability. +""" + +import inspect +import math +import tensorflow as tf +import tensorflow_addons.image as tfa_image + +# This signifies the max integer that the controller RNN could predict for the +# augmentation scheme. +_MAX_LEVEL = 10. + + +def policy_v0(): + """Autoaugment policy that was used in AutoAugment Paper.""" + # Each tuple is an augmentation operation of the form + # (operation, probability, magnitude). Each element in policy is a + # sub-policy that will be applied sequentially on the image. + policy = [ + [('Equalize', 0.8, 1), ('ShearY', 0.8, 4)], + [('Color', 0.4, 9), ('Equalize', 0.6, 3)], + [('Color', 0.4, 1), ('Rotate', 0.6, 8)], + [('Solarize', 0.8, 3), ('Equalize', 0.4, 7)], + [('Solarize', 0.4, 2), ('Solarize', 0.6, 2)], + [('Color', 0.2, 0), ('Equalize', 0.8, 8)], + [('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)], + [('ShearX', 0.2, 9), ('Rotate', 0.6, 8)], + [('Color', 0.6, 1), ('Equalize', 1.0, 2)], + [('Invert', 0.4, 9), ('Rotate', 0.6, 0)], + [('Equalize', 1.0, 9), ('ShearY', 0.6, 3)], + [('Color', 0.4, 7), ('Equalize', 0.6, 0)], + [('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)], + [('Solarize', 0.6, 8), ('Color', 0.6, 9)], + [('Solarize', 0.2, 4), ('Rotate', 0.8, 9)], + [('Rotate', 1.0, 7), ('TranslateY', 0.8, 9)], + [('ShearX', 0.0, 0), ('Solarize', 0.8, 4)], + [('ShearY', 0.8, 0), ('Color', 0.6, 4)], + [('Color', 1.0, 0), ('Rotate', 0.6, 2)], + [('Equalize', 0.8, 4), ('Equalize', 0.0, 8)], + [('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)], + [('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)], + [('Posterize', 0.8, 2), ('Solarize', 0.6, 10)], + [('Solarize', 0.6, 8), ('Equalize', 0.6, 1)], + [('Color', 0.8, 6), ('Rotate', 0.4, 5)], + ] + return policy + + +def policy_vtest(): + """Autoaugment test policy for debugging.""" + # Each tuple is an 
augmentation operation of the form + # (operation, probability, magnitude). Each element in policy is a + # sub-policy that will be applied sequentially on the image. + policy = [ + [('TranslateX', 1.0, 4), ('Equalize', 1.0, 10)], + ] + return policy + + +# pylint: disable=g-long-lambda +blend = tf.function(lambda i1, i2, factor: tf.cast( + tfa_image.blend(tf.cast(i1, tf.float32), tf.cast(i2, tf.float32), factor), + tf.uint8)) +# pylint: enable=g-long-lambda + + +def random_erase(image, + prob, + min_area=0.02, + max_area=1 / 3, + min_aspect=1 / 3, + max_aspect=10 / 3, + mode='pixel'): + """The random erasing augmentation: https://arxiv.org/pdf/1708.04896.pdf. + + This augmentation is applied after image normalization. + + Args: + image: Input image after all other augmentation and normalization. It has + type tf.float32. + prob: Probability of applying the random erasing operation. + min_area: Minimum erased area, as a fraction of the image area. + max_area: Maximum erased area, as a fraction of the image area. + min_aspect: Minimum aspect ratio (width / height) of the erased area. + max_aspect: Maximum aspect ratio (width / height) of the erased area. + mode: How the erased area is filled. 'pixel' fills it with per-pixel + truncated normal noise (mean 0, stddev 1); any other value fills it with + zeros. + + Returns: + Randomly erased image.
+ """ + + image_height = tf.shape(image)[0] + image_width = tf.shape(image)[1] + image_area = tf.cast(image_width * image_height, tf.float32) + + # Sample width, height + erase_area = tf.random.uniform([], min_area, max_area) * image_area + log_max_target_ar = tf.math.log( + tf.minimum( + tf.math.divide( + tf.math.square(tf.cast(image_width, tf.float32)), erase_area), + max_aspect)) + log_min_target_ar = tf.math.log( + tf.maximum( + tf.math.divide(erase_area, + tf.math.square(tf.cast(image_height, tf.float32))), + min_aspect)) + erase_aspect_ratio = tf.math.exp( + tf.random.uniform([], log_min_target_ar, log_max_target_ar)) + erase_h = tf.cast(tf.math.sqrt(erase_area / erase_aspect_ratio), tf.int32) + erase_w = tf.cast(tf.math.sqrt(erase_area * erase_aspect_ratio), tf.int32) + + # Sample (left, top) of the rectangle to erase + erase_left = tf.random.uniform( + shape=[], minval=0, maxval=image_width - erase_w, dtype=tf.int32) + erase_top = tf.random.uniform( + shape=[], minval=0, maxval=image_height - erase_h, dtype=tf.int32) + pad_right = image_width - erase_w - erase_left + pad_bottom = image_height - erase_h - erase_top + mask = tf.pad( + tf.zeros([erase_h, erase_w], dtype=image.dtype), + [[erase_top, pad_bottom], [erase_left, pad_right]], + constant_values=1) + mask = tf.expand_dims(mask, -1) # [H, W, 1] + if mode == 'pixel': + fill = tf.random.truncated_normal( + tf.shape(image), 0.0, 1.0, dtype=image.dtype) + else: + fill = tf.zeros(tf.shape(image), dtype=image.dtype) + + should_apply_op = tf.cast( + tf.floor(tf.random.uniform([], dtype=tf.float32) + prob), tf.bool) + augmented_image = tf.cond(should_apply_op, + lambda: mask * image + (1 - mask) * fill, + lambda: image) + return augmented_image + + +def solarize(image, threshold=128): + # For each pixel in the image, keep the pixel + # if the value is less than the threshold. + # Otherwise, replace it with 255 minus the pixel value (i.e. invert it).
+ return tf.where(image < threshold, image, 255 - image) + + +def solarize_add(image, addition=0, threshold=128): + # For each pixel in the image less than threshold + # we add 'addition' amount to it and then clip the + # pixel value to be between 0 and 255. The value + # of 'addition' is between -128 and 128. + added_image = tf.cast(image, tf.int64) + addition + added_image = tf.cast(tf.clip_by_value(added_image, 0, 255), tf.uint8) + return tf.where(image < threshold, added_image, image) + + +def color(image, factor): + """Equivalent of PIL Color.""" + degenerate = tf.image.grayscale_to_rgb(tf.image.rgb_to_grayscale(image)) + return blend(degenerate, image, factor) + + +def contrast(image, factor): + """Equivalent of PIL Contrast.""" + degenerate = tf.image.rgb_to_grayscale(image) + # Cast before calling tf.histogram. + degenerate = tf.cast(degenerate, tf.int32) + + # Compute the grayscale histogram, then compute the mean pixel value, + # and create a constant image size of that value. Use that as the + # blending degenerate target of the original image. + hist = tf.histogram_fixed_width(degenerate, [0, 255], nbins=256) + mean = tf.reduce_sum(tf.cast(hist, tf.float32)) / 256.0 + degenerate = tf.ones_like(degenerate, dtype=tf.float32) * mean + degenerate = tf.clip_by_value(degenerate, 0.0, 255.0) + degenerate = tf.image.grayscale_to_rgb(tf.cast(degenerate, tf.uint8)) + return blend(degenerate, image, factor) + + +def brightness(image, factor): + """Equivalent of PIL Brightness.""" + degenerate = tf.zeros_like(image) + return blend(degenerate, image, factor) + + +def posterize(image, bits): + """Equivalent of PIL Posterize. Smaller `bits` means larger degradation.""" + shift = 8 - bits + return tf.bitwise.left_shift(tf.bitwise.right_shift(image, shift), shift) + + +def rotate(image, degrees, replace): + """Rotates the image by degrees either clockwise or counterclockwise. + + Args: + image: An image Tensor of type uint8. 
+ degrees: Float, a scalar angle in degrees to rotate all images by. If + degrees is positive the image will be rotated counterclockwise, otherwise + it will be rotated clockwise (this follows `tfa_image.rotate`). + replace: A one or three value 1D tensor to fill empty pixels caused by the + rotate operation. + + Returns: + The rotated version of image. + """ + # Convert from degrees to radians. + degrees_to_radians = math.pi / 180.0 + radians = degrees * degrees_to_radians + + # In practice, we should randomize the rotation degrees by flipping + # it negatively half the time, but that's done on 'degrees' outside + # of the function. + if isinstance(replace, list) or isinstance(replace, tuple): + replace = replace[0] + image = tfa_image.rotate(image, radians, fill_value=replace) + return image + + +def translate_x(image, pixels, replace): + """Equivalent of PIL Translate in X dimension.""" + return tfa_image.translate_xy(image, [-pixels, 0], replace) + + +def translate_y(image, pixels, replace): + """Equivalent of PIL Translate in Y dimension.""" + return tfa_image.translate_xy(image, [0, -pixels], replace) + + +def autocontrast(image): + """Implements Autocontrast function from PIL using TF ops. + + Args: + image: A 3D uint8 tensor. + + Returns: + The image after it has had autocontrast applied to it and will be of type + uint8. + """ + + def scale_channel(image): + """Scale the 2D image using the autocontrast rule.""" + # A possibly cheaper version can be done using cumsum/unique_with_counts + # over the histogram values, rather than iterating over the entire image + # to compute mins and maxes. + lo = tf.cast(tf.reduce_min(image), tf.float32) + hi = tf.cast(tf.reduce_max(image), tf.float32) + + # Scale the image, making the lowest value 0 and the highest value 255.
+ def scale_values(im): + scale = 255.0 / (hi - lo) + offset = -lo * scale + im = tf.cast(im, tf.float32) * scale + offset + im = tf.clip_by_value(im, 0.0, 255.0) + return tf.cast(im, tf.uint8) + + result = tf.cond(hi > lo, lambda: scale_values(image), lambda: image) + return result + + # Assumes RGB for now. Scales each channel independently + # and then stacks the result. + s1 = scale_channel(image[:, :, 0]) + s2 = scale_channel(image[:, :, 1]) + s3 = scale_channel(image[:, :, 2]) + image = tf.stack([s1, s2, s3], 2) + return image + + +def sharpness(image, factor): + """Implements Sharpness function from PIL using TF ops.""" + orig_image = image + image = tf.cast(image, tf.float32) + # Make image 4D for conv operation. + image = tf.expand_dims(image, 0) + # SMOOTH PIL Kernel. + kernel = tf.constant([[1, 1, 1], [1, 5, 1], [1, 1, 1]], + dtype=tf.float32, + shape=[3, 3, 1, 1]) / 13. + # Tile across channel dimension. + kernel = tf.tile(kernel, [1, 1, 3, 1]) + strides = [1, 1, 1, 1] + with tf.device('/cpu:0'): + # Some augmentation that uses depth-wise conv will cause crashing when + # training on GPU. See (b/156242594) for details. + degenerate = tf.nn.depthwise_conv2d(image, kernel, strides, padding='VALID') + degenerate = tf.clip_by_value(degenerate, 0.0, 255.0) + degenerate = tf.squeeze(tf.cast(degenerate, tf.uint8), [0]) + + # For the borders of the resulting image, fill in the values of the + # original image. + mask = tf.ones_like(degenerate) + padded_mask = tf.pad(mask, [[1, 1], [1, 1], [0, 0]]) + padded_degenerate = tf.pad(degenerate, [[1, 1], [1, 1], [0, 0]]) + result = tf.where(tf.equal(padded_mask, 1), padded_degenerate, orig_image) + + # Blend the final result. 
+ return blend(result, orig_image, factor) + + +def equalize(image): + """Implements Equalize function from PIL using TF ops.""" + + def scale_channel(im, c): + """Scale the data in the channel to implement equalize.""" + im = tf.cast(im[:, :, c], tf.int32) + # Compute the histogram of the image channel. + histo = tf.histogram_fixed_width(im, [0, 255], nbins=256) + + # For the purposes of computing the step, keep only the nonzero bins. + nonzero = tf.where(tf.not_equal(histo, 0)) + nonzero_histo = tf.reshape(tf.gather(histo, nonzero), [-1]) + step = (tf.reduce_sum(nonzero_histo) - nonzero_histo[-1]) // 255 + + def build_lut(histo, step): + # Compute the cumulative sum, shifting by step // 2 + # and then normalization by step. + lut = (tf.cumsum(histo) + (step // 2)) // step + # Shift lut, prepending with 0. + lut = tf.concat([[0], lut[:-1]], 0) + # Clip the counts to be in range. This is done + # in the C code for image.point. + return tf.clip_by_value(lut, 0, 255) + + # If step is zero, return the original image. Otherwise, build + # lut from the full histogram and step and then index from it. + result = tf.cond( + tf.equal(step, 0), lambda: im, + lambda: tf.gather(build_lut(histo, step), im)) + + return tf.cast(result, tf.uint8) + + # Assumes RGB for now. Scales each channel independently + # and then stacks the result.
+ s1 = scale_channel(image, 0) + s2 = scale_channel(image, 1) + s3 = scale_channel(image, 2) + image = tf.stack([s1, s2, s3], 2) + return image + + +def invert(image): + """Inverts the image pixels.""" + image = tf.convert_to_tensor(image) + return 255 - image + + +NAME_TO_FUNC = { + 'AutoContrast': autocontrast, + 'Equalize': equalize, + 'Invert': invert, + 'Rotate': rotate, + 'Posterize': posterize, + 'PosterizeIncreasing': posterize, + 'Solarize': solarize, + 'SolarizeIncreasing': solarize, + 'SolarizeAdd': solarize_add, + 'Color': color, + 'ColorIncreasing': color, + 'Contrast': contrast, + 'ContrastIncreasing': contrast, + 'Brightness': brightness, + 'BrightnessIncreasing': brightness, + 'Sharpness': sharpness, + 'SharpnessIncreasing': sharpness, + 'ShearX': tfa_image.shear_x, + 'ShearY': tfa_image.shear_y, + 'TranslateX': translate_x, + 'TranslateY': translate_y, + 'Cutout': tfa_image.random_cutout, + 'Hue': tf.image.adjust_hue, +} + + +def _randomly_negate_tensor(tensor): + """With 50% prob turn the tensor negative.""" + should_flip = tf.cast(tf.floor(tf.random.uniform([]) + 0.5), tf.bool) + final_tensor = tf.cond(should_flip, lambda: -tensor, lambda: tensor) + return final_tensor + + +def _rotate_level_to_arg(level): + level = (level / _MAX_LEVEL) * 30. + level = _randomly_negate_tensor(level) + return (level,) + + +def _shrink_level_to_arg(level): + """Converts level to ratio by which we shrink the image content.""" + if level == 0: + return (1.0,) # if level is zero, do not shrink the image + # Maximum shrinking ratio is 2.9. + level = 2. / (_MAX_LEVEL / level) + 0.9 + return (level,) + + +def _enhance_level_to_arg(level): + return ((level / _MAX_LEVEL) * 1.8 + 0.1,) + + +def _enhance_increasing_level_to_arg(level): + level = (level / _MAX_LEVEL) * .9 + level = 1.0 + _randomly_negate_tensor(level) + return (level,) + + +def _shear_level_to_arg(level): + level = (level / _MAX_LEVEL) * 0.3 + # Flip level to negative with 50% chance. 
+ level = _randomly_negate_tensor(level) + return (level,) + + +def _translate_level_to_arg(level, translate_const): + level = level / _MAX_LEVEL * translate_const + # Flip level to negative with 50% chance. + level = _randomly_negate_tensor(level) + return (level,) + + +def _posterize_level_to_arg(level): + return (tf.cast(level / _MAX_LEVEL * 4, tf.uint8),) + + +def _posterize_increase_level_to_arg(level): + return (4 - _posterize_level_to_arg(level)[0],) + + +def _solarize_level_to_arg(level): + return (tf.cast(level / _MAX_LEVEL * 256, tf.uint8),) + + +def _solarize_increase_level_to_arg(level): + return (256 - _solarize_level_to_arg(level)[0],) + + +def _solarize_add_level_to_arg(level): + return (tf.cast(level / _MAX_LEVEL * 110, tf.int64),) + + +def _cutout_arg(level, cutout_size): + pad_size = tf.cast(level / _MAX_LEVEL * cutout_size, tf.int32) + return (2 * pad_size, 2 * pad_size) + + +def level_to_arg(hparams): + return { + 'AutoContrast': + lambda level: (), + 'Equalize': + lambda level: (), + 'Invert': + lambda level: (), + 'Rotate': + _rotate_level_to_arg, + 'Posterize': + _posterize_level_to_arg, + 'PosterizeIncreasing': + _posterize_increase_level_to_arg, + 'Solarize': + _solarize_level_to_arg, + 'SolarizeIncreasing': + _solarize_increase_level_to_arg, + 'SolarizeAdd': + _solarize_add_level_to_arg, + 'Color': + _enhance_level_to_arg, + 'ColorIncreasing': + _enhance_increasing_level_to_arg, + 'Contrast': + _enhance_level_to_arg, + 'ContrastIncreasing': + _enhance_increasing_level_to_arg, + 'Brightness': + _enhance_level_to_arg, + 'BrightnessIncreasing': + _enhance_increasing_level_to_arg, + 'Sharpness': + _enhance_level_to_arg, + 'SharpnessIncreasing': + _enhance_increasing_level_to_arg, + 'ShearX': + _shear_level_to_arg, + 'ShearY': + _shear_level_to_arg, + # pylint:disable=g-long-lambda + 'Cutout': + lambda level: _cutout_arg(level, hparams['cutout_const']), + # pylint:disable=g-long-lambda + 'TranslateX': + lambda level: 
_translate_level_to_arg(level, hparams['translate_const' + ]), + 'TranslateY': + lambda level: _translate_level_to_arg(level, hparams['translate_const' + ]), + 'Hue': + lambda level: ((level / _MAX_LEVEL) * 0.25,), + # pylint:enable=g-long-lambda + } + + +def _parse_policy_info(name, prob, level, replace_value, augmentation_hparams): + """Return the function that corresponds to `name` and update `level` param.""" + func = NAME_TO_FUNC[name] + args = level_to_arg(augmentation_hparams)[name](level) + + # Add in replace arg if it is required for the function that is being called. + # pytype:disable=wrong-arg-types + if 'replace' in inspect.signature(func).parameters.keys(): # pylint: disable=deprecated-method + args = tuple(list(args) + [replace_value]) + # pytype:enable=wrong-arg-types + + return (func, prob, args) + + +def _apply_func_with_prob(func, image, args, prob): + """Apply `func` to image w/ `args` as input with probability `prob`.""" + assert isinstance(args, tuple) + + # Apply the function with probability `prob`. + should_apply_op = tf.cast( + tf.floor(tf.random.uniform([], dtype=tf.float32) + prob), tf.bool) + augmented_image = tf.cond(should_apply_op, lambda: func(image, *args), + lambda: image) + return augmented_image + + +def select_and_apply_random_policy(policies, image): + """Select a random policy from `policies` and apply it to `image`.""" + policy_to_select = tf.random.uniform([], maxval=len(policies), dtype=tf.int32) + # Note that using tf.case instead of tf.conds would result in significantly + # larger graphs and would even break export for some larger policies. + for (i, policy) in enumerate(policies): + image = tf.cond( + tf.equal(i, policy_to_select), + lambda selected_policy=policy: selected_policy(image), + lambda: image) + return image + + +def build_and_apply_nas_policy(policies, image, augmentation_hparams): + """Build a policy from the given policies passed in and apply to image. 
+ + Args: + policies: list of lists of tuples in the form `(func, prob, level)`, `func` + is a string name of the augmentation function, `prob` is the probability + of applying the `func` operation, `level` is the input argument for + `func`. + image: tf.Tensor that the resulting policy will be applied to. + augmentation_hparams: Hparams associated with the NAS learned policy. + + Returns: + A version of image that now has data augmentation applied to it based on + the `policies` passed into the function. + """ + replace_value = [128, 128, 128] + + # func is the string name of the augmentation function, prob is the + # probability of applying the operation and level is the parameter associated + # with the tf op. + + # tf_policies are functions that take in an image and return an augmented + # image. + tf_policies = [] + for policy in policies: + tf_policy = [] + # Link string name to the correct python function and make sure the correct + # argument is passed into that function. + for policy_info in policy: + policy_info = list(policy_info) + [replace_value, augmentation_hparams] + + tf_policy.append(_parse_policy_info(*policy_info)) + # Now build the tf policy that will apply the augmentation procedure + # on image. + def make_final_policy(tf_policy_): + + def final_policy(image_): + for func, prob, args in tf_policy_: + image_ = _apply_func_with_prob(func, image_, args, prob) + return image_ + + return final_policy + + tf_policies.append(make_final_policy(tf_policy)) + + augmented_image = select_and_apply_random_policy(tf_policies, image) + return augmented_image + + +def distort_image_with_autoaugment(image, augmentation_name): + """Applies the AutoAugment policy to `image`. + + AutoAugment is from the paper: https://arxiv.org/abs/1805.09501. + + Args: + image: `Tensor` of shape [height, width, 3] representing an image. + augmentation_name: The name of the AutoAugment policy to use. The available + options are `v0` and `test`.
`v0` is the policy used for all of the + results in the paper and was found to achieve the best results on the COCO + dataset. `test` is a single-sub-policy policy intended for debugging. + Any other name raises a `ValueError`. + + Returns: + The augmented version of `image`. + """ + available_policies = {'v0': policy_v0, 'test': policy_vtest} + if augmentation_name not in available_policies: + raise ValueError('Invalid augmentation_name: {}'.format(augmentation_name)) + + policy = available_policies[augmentation_name]() + # Hparams that will be used for AutoAugment. + augmentation_hparams = dict(cutout_const=100, translate_const=250) + + return build_and_apply_nas_policy(policy, image, augmentation_hparams) + + +# Cutout is implemented separately. +_RAND_TRANSFORMS = [ + 'AutoContrast', + 'Equalize', + 'Invert', + 'Rotate', + 'Posterize', + 'Solarize', + 'Color', + 'Contrast', + 'Brightness', + 'Sharpness', + 'ShearX', + 'ShearY', + 'TranslateX', + 'TranslateY', + 'SolarizeAdd', + 'Hue', +] + +# Cutout is implemented separately. +_RAND_INCREASING_TRANSFORMS = [ + 'AutoContrast', + 'Equalize', + 'Invert', + 'Rotate', + 'PosterizeIncreasing', + 'SolarizeIncreasing', + 'SolarizeAdd', + 'ColorIncreasing', + 'ContrastIncreasing', + 'BrightnessIncreasing', + 'SharpnessIncreasing', + 'ShearX', + 'ShearY', + 'TranslateX', + 'TranslateY', + 'Hue', +] + +# These augmentations change the spatial layout of the image, so they are not +# suitable for detection tasks. +_NON_COLOR_DISTORTION_OPS = [ + 'Rotate', + 'ShearX', + 'ShearY', + 'TranslateX', + 'TranslateY', +] + + +def distort_image_with_randaugment(image, + num_layers, + magnitude, + mag_std, + inc, + prob, + color_only=False): + """Applies the RandAugment policy to `image`.
+ + RandAugment is from the paper https://arxiv.org/abs/1909.13719. + + Args: + image: `Tensor` of shape [height, width, 3] representing an image. The image + should have uint8 type in [0, 255]. + num_layers: Integer, the number of augmentation transformations to apply + sequentially to an image. Represented as (N) in the paper. Usually best + values will be in the range [1, 3]. + magnitude: Integer, shared magnitude across all augmentation operations. + Represented as (M) in the paper. Usually best values are in the range [5, + 30]. In this implementation, the magnitude sampled for each layer is + clipped to [0, `_MAX_LEVEL`] (i.e. [0, 10]). + mag_std: Standard deviation of the normal distribution (with mean + `magnitude`) from which the magnitude is sampled for each layer. + inc: Whether to use the "increasing" variants of the augmentations, whose + strength increases as the magnitude increases. + prob: Probability of applying each selected augmentation. + color_only: Whether to only apply operations that distort color and do not + change spatial layouts. + + Returns: + The augmented version of `image`. + """ + replace_value = [128] * 3 + augmentation_hparams = dict(cutout_const=40, translate_const=100) + available_ops = _RAND_INCREASING_TRANSFORMS if inc else _RAND_TRANSFORMS + if color_only: + available_ops = list( + filter(lambda op: op not in _NON_COLOR_DISTORTION_OPS, available_ops)) + + for layer_num in range(num_layers): + op_to_select = tf.random.uniform([], + maxval=len(available_ops), + dtype=tf.int32) + random_magnitude = tf.clip_by_value( + tf.random.normal([], magnitude, mag_std), 0., _MAX_LEVEL) + with tf.name_scope('randaug_layer_{}'.format(layer_num)): + for (i, op_name) in enumerate(available_ops): + func, _, args = _parse_policy_info(op_name, prob, random_magnitude, + replace_value, augmentation_hparams) + image = tf.cond( + tf.equal(i, op_to_select), + # pylint:disable=g-long-lambda + lambda s_func=func, s_args=args: _apply_func_with_prob( + s_func, image, s_args, prob), + # pylint:enable=g-long-lambda + lambda: image) + return image diff --git a/official/projects/unified_detector/data_loaders/input_reader.py
b/official/projects/unified_detector/data_loaders/input_reader.py new file mode 100644 index 0000000000000000000000000000000000000000..a850f9ca30b44f53a298d5bde00e5c02629fe11a --- /dev/null +++ b/official/projects/unified_detector/data_loaders/input_reader.py @@ -0,0 +1,270 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Input data reader. + +Creates a tf.data.Dataset object from multiple input SSTables or TFRecord files +and uses a provided data parser function to decode the serialized tf.Example +and optionally run data augmentation. +""" + +import os +from typing import Any, Callable, List, Optional, Sequence, Union + +import gin +from six.moves import map +import tensorflow as tf + +from official.common import dataset_fn +from research.object_detection.utils import label_map_util +from official.core import config_definitions as cfg +from official.projects.unified_detector.data_loaders import universal_detection_parser # pylint: disable=unused-import + +FuncType = Callable[..., Any] + + +@gin.configurable(denylist=['is_training']) +class InputFn(object): + """Input data reader class. + + Creates a tf.data.Dataset object from multiple datasets (optionally performs + weighted sampling between different datasets), parses the tf.Example message + using `parser_fn`. The datasets can either be stored in SSTable or TfRecord.
+ """ + + def __init__(self, + is_training: bool, + batch_size: Optional[int] = None, + data_root: str = '', + input_paths: List[str] = gin.REQUIRED, + dataset_type: str = 'tfrecord', + use_sampling: bool = False, + sampling_weights: Optional[Sequence[Union[int, float]]] = None, + cycle_length: Optional[int] = 64, + shuffle_buffer_size: Optional[int] = 512, + parser_fn: Optional[FuncType] = None, + parser_num_parallel_calls: Optional[int] = 64, + max_intra_op_parallelism: Optional[int] = None, + label_map_proto_path: Optional[str] = None, + input_filter_fns: Optional[List[FuncType]] = None, + input_training_filter_fns: Optional[Sequence[FuncType]] = None, + dense_to_ragged_batch: bool = False, + data_validator_fn: Optional[Callable[[Sequence[str]], + None]] = None): + """Input reader constructor. + + Args: + is_training: Boolean indicating TRAIN or EVAL. + batch_size: Input data batch size. Ignored if batch size is passed through + params. In that case, this can be None. + data_root: All the relative input paths are based on this location. + input_paths: Input file patterns. + dataset_type: Can be 'sstable' or 'tfrecord'. + use_sampling: Whether to perform weighted sampling between different + datasets. + sampling_weights: Unnormalized sampling weights. Their number should be + equal to the number of `input_paths`. + cycle_length: The number of input Datasets to interleave from in parallel. + If set to None, `tf.data.AUTOTUNE` is used. + shuffle_buffer_size: The random shuffle buffer size. + parser_fn: The function to run decoding and data augmentation. The + function takes `is_training` as an input, which is passed from here. + parser_num_parallel_calls: The number of parallel calls for `parser_fn`. + The number of CPU cores is the suggested value. If set to None, + `tf.data.AUTOTUNE` is used. + max_intra_op_parallelism: If set (truthy), intra-op parallelism of + functions run on slices of the input is limited to 1; the value itself + is not used.
+ label_map_proto_path: Path to a StringIntLabelMap which will be used to + decode the input data. + input_filter_fns: A list of functions on the dataset points which returns + true for valid data. + input_training_filter_fns: A list of functions on the dataset points which + returns true for valid data used only for training. + dense_to_ragged_batch: Whether to use ragged batching for MPNN format. + data_validator_fn: If not None, used to validate the data specified by + input_paths. + + Raises: + ValueError for invalid input_paths. + """ + self._is_training = is_training + + if data_root: + # If an input path is absolute this does not change it. + input_paths = [os.path.join(data_root, value) for value in input_paths] + + self._input_paths = input_paths + # Disables datasets sampling during eval. + self._batch_size = batch_size + if is_training: + self._use_sampling = use_sampling + else: + self._use_sampling = False + self._sampling_weights = sampling_weights + self._cycle_length = (cycle_length if cycle_length else tf.data.AUTOTUNE) + self._shuffle_buffer_size = shuffle_buffer_size + self._parser_num_parallel_calls = ( + parser_num_parallel_calls + if parser_num_parallel_calls else tf.data.AUTOTUNE) + self._max_intra_op_parallelism = max_intra_op_parallelism + self._label_map_proto_path = label_map_proto_path + if label_map_proto_path: + name_to_id = label_map_util.get_label_map_dict(label_map_proto_path) + self._lookup_str_keys = list(name_to_id.keys()) + self._lookup_int_values = list(name_to_id.values()) + self._parser_fn = parser_fn + self._input_filter_fns = input_filter_fns or [] + if is_training and input_training_filter_fns: + self._input_filter_fns.extend(input_training_filter_fns) + self._dataset_type = dataset_type + self._dense_to_ragged_batch = dense_to_ragged_batch + + if data_validator_fn is not None: + data_validator_fn(self._input_paths) + + @property + def batch_size(self): + return self._batch_size + + def __call__( + self, + params: 
cfg.DataConfig, + input_context: Optional[tf.distribute.InputContext] = None + ) -> tf.data.Dataset: + """Read and parse input datasets, return a tf.data.Dataset object.""" + # TPUEstimator passes the batch size through params. + if params is not None and 'batch_size' in params: + batch_size = params['batch_size'] + else: + batch_size = self._batch_size + + per_replica_batch_size = input_context.get_per_replica_batch_size( + batch_size) if input_context else batch_size + + with tf.name_scope('input_reader'): + dataset = self._build_dataset_from_records() + dataset_parser_fn = self._build_dataset_parser_fn() + + dataset = dataset.map( + dataset_parser_fn, num_parallel_calls=self._parser_num_parallel_calls) + for filter_fn in self._input_filter_fns: + dataset = dataset.filter(filter_fn) + + if self._dense_to_ragged_batch: + dataset = dataset.apply( + tf.data.experimental.dense_to_ragged_batch( + batch_size=per_replica_batch_size, drop_remainder=True)) + else: + dataset = dataset.batch(per_replica_batch_size, drop_remainder=True) + dataset = dataset.prefetch(tf.data.AUTOTUNE) + + return dataset + + def _fetch_dataset(self, filename: str) -> tf.data.Dataset: + """Fetch dataset depending on type. + + Args: + filename: Location of dataset. + + Returns: + Tf Dataset. + """ + + data_cls = dataset_fn.pick_dataset_fn(self._dataset_type) + + data = data_cls([filename]) + return data + + def _build_dataset_parser_fn(self) -> Callable[..., tf.Tensor]: + """Depending on label_map and storage type, build a parser_fn.""" + # Parse the fetched records to input tensors for model function. 
+ if self._label_map_proto_path: + lookup_initializer = tf.lookup.KeyValueTensorInitializer( + keys=tf.constant(self._lookup_str_keys, dtype=tf.string), + values=tf.constant(self._lookup_int_values, dtype=tf.int32)) + name_to_id_table = tf.lookup.StaticHashTable( + initializer=lookup_initializer, default_value=0) + parser_fn = self._parser_fn( + is_training=self._is_training, label_lookup_table=name_to_id_table) + else: + parser_fn = self._parser_fn(is_training=self._is_training) + + return parser_fn + + def _build_dataset_from_records(self) -> tf.data.Dataset: + """Build a tf.data.Dataset object from input SSTables. + + If the input data come from multiple SSTables, use the user defined sampling + weights to perform sampling. For example, if the sampling weights are + [1., 2.], the second dataset will be sampled twice as often as the first + one. + + Returns: + Dataset built from SSTables. + + Raises: + ValueError for inability to find SSTable files. + """ + all_file_patterns = [] + if self._use_sampling: + for file_pattern in self._input_paths: + all_file_patterns.append([file_pattern]) + # Normalize sampling probabilities. + total_weight = sum(self._sampling_weights) + sampling_probabilities = [ + float(w) / total_weight for w in self._sampling_weights + ] + else: + all_file_patterns.append(self._input_paths) + + datasets = [] + for file_pattern in all_file_patterns: + filenames = sum(list(map(tf.io.gfile.glob, file_pattern)), []) + if not filenames: + raise ValueError( + f'Error trying to read input files for file pattern {file_pattern}') + # Create a dataset of filenames and shuffle the files. In each epoch, + # the file order is shuffled again. This may help if + # per_host_input_for_training = false on TPU.
+ dataset = tf.data.Dataset.list_files( + file_pattern, shuffle=self._is_training) + + if self._is_training: + dataset = dataset.repeat() + + if self._max_intra_op_parallelism: + # Disable intra-op parallelism to optimize for throughput instead of + # latency. + options = tf.data.Options() + options.experimental_threading.max_intra_op_parallelism = 1 + dataset = dataset.with_options(options) + + dataset = dataset.interleave( + self._fetch_dataset, + cycle_length=self._cycle_length, + num_parallel_calls=self._cycle_length, + deterministic=(not self._is_training)) + + if self._is_training: + dataset = dataset.shuffle(self._shuffle_buffer_size) + + datasets.append(dataset) + + if self._use_sampling: + assert len(datasets) == len(sampling_probabilities) + dataset = tf.data.experimental.sample_from_datasets( + datasets, sampling_probabilities) + else: + dataset = datasets[0] + + return dataset diff --git a/official/projects/unified_detector/data_loaders/tf_example_decoder.py b/official/projects/unified_detector/data_loaders/tf_example_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..1057cfd4d6c2df8ca7bd2e24069959f7e8108112 --- /dev/null +++ b/official/projects/unified_detector/data_loaders/tf_example_decoder.py @@ -0,0 +1,320 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+"""TensorFlow Example proto decoder for GOCR."""
+
+from typing import List, Optional, Sequence, Tuple, Union
+
+import tensorflow as tf
+from official.projects.unified_detector.utils.typing import TensorDict
+from official.vision.dataloaders import decoder
+
+
+class TfExampleDecoder(decoder.Decoder):
+  """TensorFlow Example proto decoder."""
+
+  def __init__(self,
+               use_instance_mask: bool = False,
+               additional_class_names: Optional[Sequence[str]] = None,
+               additional_regression_names: Optional[Sequence[str]] = None,
+               num_additional_channels: int = 0):
+    """Constructor.
+
+    keys_to_features is a dictionary mapping the names of the tf.Example
+    fields to tf features, possibly with defaults.
+
+    Uses fixed length for scalars and variable length for vectors.
+
+    Args:
+      use_instance_mask: if False, prevents decoding of the instance mask, which
+        can take a lot of resources.
+      additional_class_names: If not None, a list of additional class names.
+        For each additional class name n, the feature image/object/${n} is
+        expected to be an int vector of length one and is mapped to the tensor
+        dict key groundtruth_${n}.
+      additional_regression_names: If not None, a list of additional regression
+        output names. For each additional regression name n, the feature
+        image/object/${n} is expected to be a float vector and is mapped to the
+        tensor dict key groundtruth_${n}.
+      num_additional_channels: The number of additional channels of information
+        present in the tf.Example proto.
+    """
+    self._num_additional_channels = num_additional_channels
+    self._use_instance_mask = use_instance_mask
+
+    self.keys_to_features = {}
+    # Map names in the final tensor dict (output of `self.decode()`) to names in
+    # tf examples, e.g. 'groundtruth_text' -> 'image/object/text'
+    self.name_to_key = {}
+
+    if use_instance_mask:
+      self.keys_to_features.update({
+          'image/object/mask': tf.io.VarLenFeature(tf.string),
+      })
+
+    # Now we have lists of standard types.
+    # To add new features, just add entries here.
+    # The tuple elements are (example name, tensor name, default value).
+    # If the items_to_handlers part is already set up use None for
+    # the tensor name.
+    # There are other tensor names listed as None which we probably
+    # want to discuss and specify.
+    scalar_strings = [
+        ('image/encoded', None, ''),
+        ('image/format', None, 'jpg'),
+        ('image/additional_channels/encoded', None, ''),
+        ('image/additional_channels/format', None, 'png'),
+        ('image/label_type', 'label_type', ''),
+        ('image/key', 'key', ''),
+        ('image/source_id', 'source_id', ''),
+    ]
+    vector_strings = [
+        ('image/attributes', None, ''),
+        ('image/object/text', 'groundtruth_text', ''),
+        ('image/object/encoded_text', 'groundtruth_encoded_text', ''),
+        ('image/object/vertices', 'groundtruth_vertices', ''),
+        ('image/object/object_type', None, ''),
+        ('image/object/language', 'language', ''),
+        ('image/object/reorderer_type', None, ''),
+        ('image/label_map_path', 'label_map_path', '')
+    ]
+    scalar_ints = [
+        ('image/height', None, 1),
+        ('image/width', None, 1),
+        ('image/channels', None, 3),
+    ]
+    vector_ints = [
+        ('image/object/classes', 'groundtruth_classes', 0),
+        ('image/object/frame_id', 'frame_id', 0),
+        ('image/object/track_id', 'track_id', 0),
+        ('image/object/content_type', 'groundtruth_content_type', 0),
+    ]
+    if additional_class_names:
+      vector_ints += [('image/object/%s' % name, 'groundtruth_%s' % name, 0)
+                      for name in additional_class_names]
+    # This one is not yet needed:
+    # scalar_floats = [
+    # ]
+    vector_floats = [
+        ('image/object/weight', 'groundtruth_weight', 0),
+        ('image/object/rbox_tl_x', None, 0),
+        ('image/object/rbox_tl_y', None, 0),
+        ('image/object/rbox_width', None, 0),
+        ('image/object/rbox_height', None, 0),
+        ('image/object/rbox_angle', None, 0),
+        ('image/object/bbox/xmin', None, 0),
+        ('image/object/bbox/xmax', None, 0),
+        ('image/object/bbox/ymin', None, 0),
+        ('image/object/bbox/ymax', None, 0),
+    ]
+    if additional_regression_names:
+      vector_floats += [('image/object/%s' % name, 'groundtruth_%s' % name, 0)
+                        for name in additional_regression_names]
+
+    self._init_scalar_features(scalar_strings, tf.string)
+    self._init_vector_features(vector_strings, tf.string)
+    self._init_scalar_features(scalar_ints, tf.int64)
+    self._init_vector_features(vector_ints, tf.int64)
+    self._init_vector_features(vector_floats, tf.float32)
+
+  def _init_scalar_features(
+      self,
+      feature_list: List[Tuple[str, Optional[str], Union[str, int, float]]],
+      ftype: tf.dtypes.DType) -> None:
+    for entry in feature_list:
+      self.keys_to_features[entry[0]] = tf.io.FixedLenFeature(
+          (), ftype, default_value=entry[2])
+      if entry[1] is not None:
+        self.name_to_key[entry[1]] = entry[0]
+
+  def _init_vector_features(
+      self,
+      feature_list: List[Tuple[str, Optional[str], Union[str, int, float]]],
+      ftype: tf.dtypes.DType) -> None:
+    for entry in feature_list:
+      self.keys_to_features[entry[0]] = tf.io.VarLenFeature(ftype)
+      if entry[1] is not None:
+        self.name_to_key[entry[1]] = entry[0]
+
+  def _decode_png_instance_masks(self, keys_to_tensors: TensorDict)-> tf.Tensor:
+    """Decode PNG instance segmentation masks and stack into dense tensor.
+
+    The instance segmentation masks are reshaped to [num_instances, height,
+    width].
+
+    Args:
+      keys_to_tensors: A dictionary from keys to tensors.
+
+    Returns:
+      A 3-D float tensor of shape [num_instances, height, width] with values
+      in {0, 1}.
+    """
+
+    def decode_png_mask(image_buffer):
+      image = tf.squeeze(
+          tf.image.decode_image(image_buffer, channels=1), axis=2)
+      image.set_shape([None, None])
+      # TF2 replacement for the removed TF1 `tf.to_float`.
+      image = tf.cast(tf.greater(image, 0), tf.float32)
+      return image
+
+    png_masks = keys_to_tensors['image/object/mask']
+    height = keys_to_tensors['image/height']
+    width = keys_to_tensors['image/width']
+    if isinstance(png_masks, tf.SparseTensor):
+      png_masks = tf.sparse.to_dense(png_masks, default_value='')
+    return tf.cond(
+        tf.greater(tf.size(png_masks), 0),
+        lambda: tf.map_fn(decode_png_mask, png_masks, dtype=tf.float32),
+        lambda: tf.zeros(tf.cast(tf.stack([0, height, width]), tf.int32)))
+
+  def _decode_image(self,
+                    parsed_tensors: TensorDict,
+                    channel: int = 3) -> TensorDict:
+    """Decodes the image and sets its shape (H, W are dynamic; C is fixed)."""
+    image = tf.io.decode_image(parsed_tensors['image/encoded'],
+                               channels=channel)
+    image.set_shape([None, None, channel])
+    return {'image': image}
+
+  def _decode_additional_channels(self,
+                                  parsed_tensors: TensorDict,
+                                  channel: int = 3) -> TensorDict:
+    """Decodes the additional channels and sets their static shape."""
+    channels = tf.io.decode_image(
+        parsed_tensors['image/additional_channels/encoded'], channels=channel)
+    channels.set_shape([None, None, channel])
+    return {'additional_channels': channels}
+
+  def _decode_boxes(self, parsed_tensors: TensorDict) -> TensorDict:
+    """Concat box coordinates in the format of [ymin, xmin, ymax, xmax]."""
+    xmin = parsed_tensors['image/object/bbox/xmin']
+    xmax = parsed_tensors['image/object/bbox/xmax']
+    ymin = parsed_tensors['image/object/bbox/ymin']
+    ymax = parsed_tensors['image/object/bbox/ymax']
+    return {
+        'groundtruth_aligned_boxes': tf.stack([ymin, xmin, ymax, xmax], axis=-1)
+    }
+
+  def _decode_rboxes(self, parsed_tensors: TensorDict) -> TensorDict:
+    """Concat rbox coordinates: [left, top, box_width, box_height, angle]."""
+    top_left_x = parsed_tensors['image/object/rbox_tl_x']
+    top_left_y = parsed_tensors['image/object/rbox_tl_y']
+    width = parsed_tensors['image/object/rbox_width']
+    height = parsed_tensors['image/object/rbox_height']
+    angle = parsed_tensors['image/object/rbox_angle']
+    return {
+        'groundtruth_boxes':
+            tf.stack([top_left_x, top_left_y, width, height, angle], axis=-1)
+    }
+
+  def _decode_masks(self, parsed_tensors: TensorDict) -> TensorDict:
+    """Decodes a set of PNG masks into tf.float32 tensors."""
+
+    def _decode_png_mask(png_bytes):
+      mask = tf.squeeze(
+          tf.io.decode_png(png_bytes, channels=1, dtype=tf.uint8), axis=-1)
+      mask = tf.cast(mask, dtype=tf.float32)
+      mask.set_shape([None, None])
+      return mask
+
+    height = parsed_tensors['image/height']
+    width = parsed_tensors['image/width']
+    masks = parsed_tensors['image/object/mask']
+    masks = tf.cond(
+        pred=tf.greater(tf.size(input=masks), 0),
+        true_fn=lambda: tf.map_fn(_decode_png_mask, masks, dtype=tf.float32),
+        false_fn=lambda: tf.zeros([0, height, width], dtype=tf.float32))
+    return {'groundtruth_instance_masks': masks}
+
+  def decode(self, tf_example_string_tensor: tf.string):
+    """Decodes serialized tensorflow example and returns a tensor dictionary.
+
+    Args:
+      tf_example_string_tensor: A string tensor holding a serialized tensorflow
+        example proto.
+
+    Returns:
+      A dictionary with a subset of the following, depending on the inputs:
+        image: A uint8 tensor of shape [height, width, 3] containing the image.
+        source_id: A string tensor containing the image fingerprint.
+        key: A string tensor containing the unique sha256 hash key.
+        label_type: Either `full` or `partial`. `full` means all the text is
+          fully labeled, `partial` otherwise. Currently, this is used by the
+          E2E model. If an input image is fully labeled, we update the weights
+          of both the detector and the recognizer. Otherwise, only the
+          recognizer part of the model is trained.
+        groundtruth_text: A string tensor list, the original transcriptions.
+        groundtruth_encoded_text: A string tensor list, the class ids for the
+          atoms in the text, after applying the reordering algorithm, in string
+          form. For example "90,71,85,69,86,85,93,90,71,91,1,71,85,93,90,71".
+          This depends on the class label map provided to the conversion
+          program. These are 0 based, with -1 for OOV symbols.
+        groundtruth_classes: An int64 tensor of shape [num_boxes] containing the
+          class id. Note this is 1 based; 0 is reserved for background class.
+        groundtruth_content_type: An int64 tensor of shape [num_boxes] with the
+          content type. Values correspond to PageLayoutEntity::ContentType.
+        groundtruth_weight: A float32 tensor of shape [num_boxes], either 0 or
+          1. If a region has weight 0, it will be ignored when computing the
+          losses.
+        groundtruth_boxes: A float tensor of shape [num_boxes, 5] containing the
+          groundtruth rotated rectangles. Each row is in [left, top, box_width,
+          box_height, angle] order, in absolute coordinates.
+        groundtruth_aligned_boxes: A float tensor of shape [num_boxes, 4]
+          containing the groundtruth axis-aligned rectangles. Each row is in
+          [ymin, xmin, ymax, xmax] order. Currently, this is used to store
+          groundtruth symbol boxes.
+        groundtruth_vertices: A string tensor list containing encoded
+          normalized box or polygon coordinates, e.g. `x1,y1,x2,y2,x3,y3,x4,y4`.
+        groundtruth_instance_masks: A float tensor of shape [num_boxes, height,
+          width] containing binarized image-sized instance segmentation masks.
+          `1.0` for positive region, `0.0` otherwise. None if not in tfe.
+        frame_id: An int64 tensor of shape [num_boxes], either `0` or `1`.
+          `0` means object comes from first image, `1` means second.
+        track_id: An int64 tensor of shape [num_boxes], where value indicates
+          identity across frame indices.
+        additional_channels: A uint8 tensor of shape [H, W, C] representing some
+          features.
+    """
+    parsed_tensors = tf.io.parse_single_example(
+        serialized=tf_example_string_tensor, features=self.keys_to_features)
+    for k in parsed_tensors:
+      if isinstance(parsed_tensors[k], tf.SparseTensor):
+        if parsed_tensors[k].dtype == tf.string:
+          parsed_tensors[k] = tf.sparse.to_dense(
+              parsed_tensors[k], default_value='')
+        else:
+          parsed_tensors[k] = tf.sparse.to_dense(
+              parsed_tensors[k], default_value=0)
+
+    decoded_tensors = {}
+    decoded_tensors.update(self._decode_image(parsed_tensors))
+    decoded_tensors.update(self._decode_rboxes(parsed_tensors))
+    decoded_tensors.update(self._decode_boxes(parsed_tensors))
+    if self._use_instance_mask:
+      decoded_tensors[
+          'groundtruth_instance_masks'] = self._decode_png_instance_masks(
+              parsed_tensors)
+    if self._num_additional_channels:
+      decoded_tensors.update(self._decode_additional_channels(
+          parsed_tensors, self._num_additional_channels))
+
+    # other attributes:
+    for key in self.name_to_key:
+      if key not in decoded_tensors:
+        decoded_tensors[key] = parsed_tensors[self.name_to_key[key]]
+
+    if 'groundtruth_instance_masks' not in decoded_tensors:
+      decoded_tensors['groundtruth_instance_masks'] = None
+
+    return decoded_tensors
diff --git a/official/projects/unified_detector/data_loaders/universal_detection_parser.py b/official/projects/unified_detector/data_loaders/universal_detection_parser.py
new file mode 100644
index 0000000000000000000000000000000000000000..7a4011360656a68c91d9e723f486975f4a1f0d4c
--- /dev/null
+++ b/official/projects/unified_detector/data_loaders/universal_detection_parser.py
@@ -0,0 +1,606 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Data parser for universal detector."""
+
+import enum
+import functools
+from typing import Any, Tuple
+
+import gin
+import tensorflow as tf
+
+from official.projects.unified_detector.data_loaders import autoaugment
+from official.projects.unified_detector.data_loaders import tf_example_decoder
+from official.projects.unified_detector.utils import utilities
+from official.projects.unified_detector.utils.typing import NestedTensorDict
+from official.projects.unified_detector.utils.typing import TensorDict
+
+
+@gin.constants_from_enum
+class DetectionClass(enum.IntEnum):
+  """As in `PageLayoutEntity.EntityType`."""
+  WORD = 0
+  LINE = 2
+  PARAGRAPH = 3
+  BLOCK = 4
+
+
+NOT_ANNOTATED_ID = 8
+
+
+def _erase(mask: tf.Tensor,
+           feature: tf.Tensor,
+           min_val: float = 0.,
+           max_val: float = 256.) -> tf.Tensor:
+  """Erases the feature maps with a mask.
+
+  Erase feature maps with a mask and replace the erased area with uniform random
+  noise. The mask can have a different size from the feature maps.
+
+  Args:
+    mask: an (h, w) binary mask of the pixels to erase. Value 1 represents
+      pixels to erase.
+    feature: the (H, W, C) feature maps to erase from.
+    min_val: The minimum value of random noise.
+    max_val: The maximum value of random noise.
+
+  Returns:
+    The (H, W, C) feature maps, with pixels in mask replaced with noise. It's
+    equal to mask * noise + (1 - mask) * feature.
+  """
+  h, w, c = utilities.resolve_shape(feature)
+  resized_mask = tf.image.resize(
+      tf.tile(tf.expand_dims(tf.cast(mask, tf.float32), -1), (1, 1, c)), (h, w))
+  erased = tf.where(
+      condition=(resized_mask > 0.5),
+      x=tf.cast(tf.random.uniform((h, w, c), min_val, max_val), feature.dtype),
+      y=feature)
+  return erased
+
+
+@gin.configurable(denylist=['is_training'])
+class UniDetectorParserFn(object):
+  """Data parser for universal detector."""
+
+  def __init__(
+      self,
+      is_training: bool,
+      output_dimension: int = 1025,
+      mask_dimension: int = -1,
+      max_num_instance: int = 128,
+      rot90_probability: float = 0.5,
+      use_color_distortion: bool = True,
+      randaug_mag: float = 5.,
+      randaug_std: float = 0.5,
+      randaug_layer: int = 2,
+      randaug_prob: float = 0.5,
+      use_cropping: bool = True,
+      crop_min_scale: float = 0.5,
+      crop_max_scale: float = 1.5,
+      crop_min_aspect: float = 4 / 5,
+      crop_max_aspect: float = 5 / 4,
+      is_shape_defined: bool = True,
+      use_tpu: bool = True,
+      detection_unit: DetectionClass = DetectionClass.LINE,
+  ):
+    """Constructor.
+
+    Args:
+      is_training: bool indicating TRAIN or EVAL.
+      output_dimension: The size of input images.
+      mask_dimension: The size of the output mask. If negative or zero, it is
+        set to output_dimension.
+      max_num_instance: The maximum number of instances to output. If it's
+        negative, padding or truncating will not be performed.
+      rot90_probability: The probability of rotating by a multiple of 90 degrees.
+      use_color_distortion: Whether to apply color distortions to images (via
+        autoaugment).
+      randaug_mag: (autoaugment parameter) Color distortion magnitude. Note
+        that this value should be set conservatively, as some color distortions
+        (e.g. posterize) can easily make text illegible.
+      randaug_std: (autoaugment parameter) Randomness in color distortion
+        magnitude.
+      randaug_layer: (autoaugment parameter) Number of color distortion
+        operations.
+      randaug_prob: (autoaugment parameter) Probability of applying each
+        distortion operation.
+      use_cropping: Bool, whether to use random cropping and resizing in
+        training.
+      crop_min_scale: The minimum scale of a random crop.
+      crop_max_scale: The maximum scale of a random crop. If >1, it means the
+        images are downsampled.
+      crop_min_aspect: The minimum aspect ratio of a random crop.
+      crop_max_aspect: The maximum aspect ratio of a random crop.
+      is_shape_defined: Whether to define the static shapes for all features and
+        labels. This must be set to True in TPU training as it requires static
+        shapes for all tensors.
+      use_tpu: Whether the inputs are fed to a TPU device.
+      detection_unit: Which unit (word, line, or paragraph) is regarded as an
+        entity. The instance masks will be at that level.
+    """
+    if is_training and max_num_instance < 0:
+      raise ValueError('In TRAIN mode, padding/truncation is required.')
+
+    self._is_training = is_training
+    self._output_dimension = output_dimension
+    self._mask_dimension = (
+        mask_dimension if mask_dimension > 0 else output_dimension)
+    self._max_num_instance = max_num_instance
+    self._decoder = tf_example_decoder.TfExampleDecoder(
+        num_additional_channels=3, additional_class_names=['parent'])
+    self._use_color_distortion = use_color_distortion
+    self._rot90_probability = rot90_probability
+    self._randaug_mag = randaug_mag
+    self._randaug_std = randaug_std
+    self._randaug_layer = randaug_layer
+    self._randaug_prob = randaug_prob
+    self._use_cropping = use_cropping
+    self._crop_min_scale = crop_min_scale
+    self._crop_max_scale = crop_max_scale
+    self._crop_min_aspect = crop_min_aspect
+    self._crop_max_aspect = crop_max_aspect
+    self._is_shape_defined = is_shape_defined
+    self._use_tpu = use_tpu
+    self._detection_unit = detection_unit
+
+  def __call__(self, value: str) -> Tuple[TensorDict, NestedTensorDict]:
+    """Parses the data.
+
+    Args:
+      value: The serialized data sample.
+
+    Returns:
+      Two dicts for features and labels.
+      features:
+        'source_id': id of the sample; only present when `use_tpu` is False
+        'images': the normalized images, (output_dimension, output_dimension, 3)
+      labels:
+        See `_prepare_labels` for its content.
+    """
+    data = self._decoder.decode(value)
+    features = {}
+    labels = {}
+    self._preprocess(data, features, labels)
+    self._rot90k(data, features, labels)
+    self._crop_and_resize(data, features, labels)
+    self._color_distortion_and_normalize(data, features, labels)
+    self._prepare_labels(data, features, labels)
+    self._define_shapes(features, labels)
+    return features, labels
+
+  def _preprocess(self, data: TensorDict, features: TensorDict,
+                  unused_labels: TensorDict):
+    """All kinds of preprocessing of the decoded data dict."""
+    # (1) Decode the entity_id_mask: an H*W*1 mask; each pixel equals
+    # (1 + position) of the entity in the GT entity list. The IDs
+    # (which can be larger than 255) are stored in the last two channels.
+    data['additional_channels'] = tf.cast(data['additional_channels'], tf.int32)
+    entity_id_mask = (
+        data['additional_channels'][:, :, -2:-1] * 256 +
+        data['additional_channels'][:, :, -1:])
+    data['entity_id_mask'] = entity_id_mask
+
+    # (2) Write image id. Used in evaluation.
+    if not self._use_tpu:
+      features['source_id'] = data['source_id']
+
+    # (3) Block mask: area without annotation
+    data['image'] = _erase(
+        data['additional_channels'][:, :, 0],
+        data['image'],
+        min_val=0.,
+        max_val=256.)
+
+  def _rot90k(self, data: TensorDict, unused_features: TensorDict,
+              unused_labels: TensorDict):
+    """Rotate the image, gt_bboxes, masks by 90k degrees."""
+    if not self._is_training:
+      return
+
+    rotate_90_choice = tf.random.uniform([])
+
+    def _rotate():
+      """Rotation.
+
+      These will be rotated:
+        image,
+        rbox,
+        entity_id_mask,
+        TODO(longshangbang): rotate vertices.
+
+      Returns:
+        The rotated tensors of the above fields.
+      """
+      k = tf.random.uniform([], 1, 4, dtype=tf.int32)
+      h, w, _ = utilities.resolve_shape(data['image'])
+      # Image
+      rotated_img = tf.image.rot90(data['image'], k=k, name='image_rot90k')
+      # Box
+      rotate_box_op = functools.partial(
+          utilities.rotate_rboxes90,
+          rboxes=data['groundtruth_boxes'],
+          image_width=w,
+          image_height=h)
+      rotated_boxes = tf.switch_case(
+          k - 1,  # Indices start with 1.
+          branch_fns=[
+              lambda: rotate_box_op(rotation_count=1),
+              lambda: rotate_box_op(rotation_count=2),
+              lambda: rotate_box_op(rotation_count=3)
+          ])
+      # Mask
+      rotated_mask = tf.image.rot90(
+          data['entity_id_mask'], k=k, name='mask_rot90k')
+      return rotated_img, rotated_boxes, rotated_mask
+
+    # pylint: disable=g-long-lambda
+    (data['image'], data['groundtruth_boxes'],
+     data['entity_id_mask']) = tf.cond(
+         rotate_90_choice < self._rot90_probability, _rotate, lambda:
+         (data['image'], data['groundtruth_boxes'], data['entity_id_mask']))
+    # pylint: enable=g-long-lambda
+
+  def _crop_and_resize(self, data: TensorDict, unused_features: TensorDict,
+                       unused_labels: TensorDict):
+    """Perform random cropping and resizing."""
+    # TODO(longshangbang): resize & translate box as well
+    # TODO(longshangbang): resize & translate vertices as well
+
+    # Get cropping target.
+    h, w = utilities.resolve_shape(data['image'])[:2]
+    left, top, crop_w, crop_h, pad_w, pad_h = self._get_crop_box(
+        tf.cast(h, tf.float32), tf.cast(w, tf.float32))
+
+    # Crop the image. (Pad the images if the crop box is larger than image.)
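+    # Worked example (illustrative values only): in eval mode, a 300x200
+    # (h x w) image gives left = top = 0, crop_w = 200, crop_h = 300,
+    # pad_w = 100 and pad_h = 0. Since pad_left and pad_top are 0 in eval,
+    # all 100 padding columns (filled with 127) go on the right, and the
+    # padded image is a 300x300 square before resizing.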
+    if self._is_training:
+      # padding left, top, right, bottom
+      pad_left = tf.random.uniform([], 0, pad_w + 1, dtype=tf.int32)
+      pad_top = tf.random.uniform([], 0, pad_h + 1, dtype=tf.int32)
+    else:
+      pad_left = 0
+      pad_top = 0
+    cropped_img = tf.image.crop_to_bounding_box(data['image'], top, left,
+                                                crop_h, crop_w)
+    padded_img = tf.pad(
+        cropped_img,
+        [[pad_top, pad_h - pad_top], [pad_left, pad_w - pad_left], [0, 0]],
+        constant_values=127)
+
+    # Resize images
+    data['resized_image'] = tf.image.resize(
+        padded_img, (self._output_dimension, self._output_dimension))
+    data['resized_image'] = tf.cast(data['resized_image'], tf.uint8)
+
+    # Crop the masks
+    cropped_masks = tf.image.crop_to_bounding_box(data['entity_id_mask'], top,
+                                                  left, crop_h, crop_w)
+    padded_masks = tf.pad(
+        cropped_masks,
+        [[pad_top, pad_h - pad_top], [pad_left, pad_w - pad_left], [0, 0]])
+
+    # Resize masks
+    data['resized_masks'] = tf.image.resize(
+        padded_masks, (self._mask_dimension, self._mask_dimension),
+        method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
+    data['resized_masks'] = tf.squeeze(data['resized_masks'], -1)
+
+  def _get_crop_box(
+      self, h: tf.Tensor,
+      w: tf.Tensor) -> Tuple[Any, Any, tf.Tensor, tf.Tensor, Any, Any]:
+    """Get the cropping box.
+
+    Args:
+      h: The height of the image to crop. Should be float type.
+      w: The width of the image to crop. Should be float type.
+
+    Returns:
+      A tuple representing (left, top, crop_w, crop_h, pad_w, pad_h).
+      Then in `self._crop_and_resize`, a crop will be extracted with bounding
+      box from top-left corner (left, top) and with size (crop_w, crop_h). This
+      crop will then be padded by (pad_w, pad_h); when random cropping is not
+      applied, the padded result is square. All outputs are cast to int32.
+    """
+    if not self._is_training or not self._use_cropping:
+      # cast back to integers.
+      w = tf.cast(w, tf.int32)
+      h = tf.cast(h, tf.int32)
+      side = tf.maximum(w, h)
+      return 0, 0, w, h, side - w, side - h
+
+    # Get box size
+    scale = tf.random.uniform([], self._crop_min_scale, self._crop_max_scale)
+    max_edge = tf.maximum(w, h)
+    long_edge = max_edge * scale
+
+    sqrt_aspect_ratio = tf.math.sqrt(
+        tf.random.uniform([], self._crop_min_aspect, self._crop_max_aspect))
+    box_h = long_edge / sqrt_aspect_ratio
+    box_w = long_edge * sqrt_aspect_ratio
+
+    # Get box location
+    left = tf.random.uniform([], 0., tf.maximum(0., w - box_w))
+    top = tf.random.uniform([], 0., tf.maximum(0., h - box_h))
+    # Get crop & pad
+    crop_w = tf.minimum(box_w, w - left)
+    crop_h = tf.minimum(box_h, h - top)
+    pad_w = box_w - crop_w
+    pad_h = box_h - crop_h
+    return (tf.cast(left, tf.int32), tf.cast(top, tf.int32),
+            tf.cast(crop_w, tf.int32), tf.cast(crop_h, tf.int32),
+            tf.cast(pad_w, tf.int32), tf.cast(pad_h, tf.int32))
+
+  def _color_distortion_and_normalize(self, data: TensorDict,
+                                      features: TensorDict,
+                                      unused_labels: TensorDict):
+    """Distorts colors (in training only) and normalizes the image."""
+    if self._is_training and self._use_color_distortion:
+      data['resized_image'] = autoaugment.distort_image_with_randaugment(
+          data['resized_image'], self._randaug_layer, self._randaug_mag,
+          self._randaug_std, True, self._randaug_prob, True)
+    # Normalize
+    features['images'] = utilities.normalize_image_to_range(
+        data['resized_image'])
+
+  def _prepare_labels(self, data: TensorDict, features: TensorDict,
+                      labels: TensorDict):
+    """This function prepares the labels.
+
+    The following targets are added to labels['segmentation_output']:
+      'gt_word_score': A (h, w) float32 mask for textness score. 1 for word,
+        0 for bkg.
+
+    The following targets are added to labels['instance_labels']:
+      'num_instance': A float scalar tensor for the total number of
+        instances. It is bounded by the maximum number of instances allowed.
+        It includes the special background instance, so it equals
+        (1 + number of entities).
+      'masks': A (h, w) int64 mask of instance IDs. The value of each pixel is
+        the index of the instance it belongs to. A value of `0` means bkg.
+      'classes': A (max_num,) int tensor indicating the classes of each
+        instance:
+          2 for background
+          1 for text entity
+          0 for non-object
+      'masks_sizes': A (max_num,) float tensor for the size of all masks.
+      'gt_weights': 0.0 if the instance is difficult / has no text annotation.
+
+    The following targets are added to labels['paragraph_labels']:
+      'paragraph_ids': A (max_num,) integer tensor for paragraph id. If `-1`,
+        there is no paragraph label for this text.
+      'has_para_ids': A float scalar; 1.0 if the sample has paragraph labels.
+
+    Args:
+      data: The data dictionary.
+      features: The feature dict.
+      labels: The label dict.
+    """
+    # Segmentation labels:
+    self._get_segmentation_labels(data, features, labels)
+    # Instance labels:
+    self._get_instance_labels(data, features, labels)
+
+  def _get_segmentation_labels(self, data: TensorDict,
+                               unused_features: TensorDict,
+                               labels: NestedTensorDict):
+    labels['segmentation_output'] = {
+        'gt_word_score': tf.cast((data['resized_masks'] > 0), tf.float32)
+    }
+
+  def _get_instance_labels(self, data: TensorDict, features: TensorDict,
+                           labels: NestedTensorDict):
+    """Generate the labels for text entity detection."""
+
+    labels['instance_labels'] = {}
+    # (1) Depending on `detection_unit`:
+    # Convert the word-id map to line-id map or use the word-id map directly.
+    # Word entity ids start from 1 in the map, so pad a -1 at the beginning of
+    # the parent list to counter this offset.
+    padded_parent = tf.concat(
+        [tf.constant([-1]),
+         tf.cast(data['groundtruth_parent'], tf.int32)], 0)
+    if self._detection_unit == DetectionClass.WORD:
+      entity_id_mask = data['resized_masks']
+    elif self._detection_unit == DetectionClass.LINE:
+      # The pixel value is entity_id + 1, shape = [H, W]; 0 for background.
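+      # Worked example (illustrative values only): if groundtruth_parent is
+      # [2, 2, -1], padded_parent is [-1, 2, 2, -1]. A pixel labeled 1
+      # (entity 0, a word) maps to padded_parent[1] + 1 = 3, which is the label
+      # of entity 2 (its line); background pixels stay 0 because
+      # padded_parent[0] + 1 = 0.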
+      # correctness:
+      # 0s in data['resized_masks'] --> padded_parent[0] == -1
+      # i-th entity in plp.entities --> i+1 in data['resized_masks']
+      #                             --> padded_parent[i+1]
+      #                             --> data['groundtruth_parent'][i]
+      #                             --> the parent of i-th entity
+      entity_id_mask = tf.gather(padded_parent, data['resized_masks']) + 1
+    elif self._detection_unit == DetectionClass.PARAGRAPH:
+      # directly segmenting paragraphs; two hops here.
+      entity_id_mask = tf.gather(padded_parent, data['resized_masks']) + 1
+      entity_id_mask = tf.gather(padded_parent, entity_id_mask) + 1
+    else:
+      raise ValueError(f'No such detection unit: {self._detection_unit}')
+    data['entity_id_mask'] = entity_id_mask
+
+    # (2) Get individual masks for entities.
+    entity_selection_mask = tf.equal(data['groundtruth_classes'],
+                                     self._detection_unit)
+    num_all_entity = utilities.resolve_shape(data['groundtruth_classes'])[0]
+    # entity_ids is a 1-D tensor for IDs of all entities of a certain type.
+    entity_ids = tf.boolean_mask(
+        tf.range(num_all_entity, dtype=tf.int32), entity_selection_mask)  # (N,)
+    # +1 to match the entity ids in entity_id_mask
+    entity_ids = tf.reshape(entity_ids, (-1, 1, 1)) + 1
+    individual_masks = tf.expand_dims(entity_id_mask, 0)
+    individual_masks = tf.equal(entity_ids, individual_masks)  # (N, H, W), bool
+    # TODO(longshangbang): replace with real mask sizes computing.
+    # Currently, we use full-resolution masks for individual_masks. In order to
+    # compute mask sizes, we need to convert individual_masks to int/float type.
+    # This will cause OOM because the mask is too large.
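+    # Note: as a result, `masks_sizes` below is a per-entity 0/1 indicator
+    # (1.0 if the entity's mask has at least one pixel), not a pixel count.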
+    masks_sizes = tf.cast(
+        tf.reduce_any(individual_masks, axis=[1, 2]), tf.float32)
+    # remove empty masks (usually caused by cropping)
+    non_empty_masks_ids = tf.not_equal(masks_sizes, 0)
+    valid_masks = tf.boolean_mask(individual_masks, non_empty_masks_ids)
+    valid_entity_ids = tf.boolean_mask(entity_ids, non_empty_masks_ids)[:, 0, 0]
+
+    # (3) Write num of instance
+    num_instance = tf.reduce_sum(tf.cast(non_empty_masks_ids, tf.float32))
+    num_instance_and_bkg = num_instance + 1
+    if self._max_num_instance >= 0:
+      num_instance_and_bkg = tf.minimum(num_instance_and_bkg,
+                                        self._max_num_instance)
+    labels['instance_labels']['num_instance'] = num_instance_and_bkg
+
+    # (4) Write instance masks
+    num_entity_int = tf.cast(num_instance, tf.int32)
+    max_num_entities = self._max_num_instance - 1  # Spare 1 for bkg.
+    pad_num = tf.maximum(max_num_entities - num_entity_int, 0)
+    padded_valid_masks = tf.pad(valid_masks, [[0, pad_num], [0, 0], [0, 0]])
+
+    # If there are more instances than allowed, randomly sample some.
+    # `random_selection_mask` is a 0/1 array; the number of 1s is at most
+    # `self._max_num_instance - 1`; if not bound, it's an array with all 1s.
+    if self._max_num_instance >= 0:
+      padded_size = num_entity_int + pad_num
+      random_selection = tf.random.uniform((padded_size,), dtype=tf.float32)
+      selected_indices = tf.math.top_k(random_selection, k=max_num_entities)[1]
+      random_selection_mask = tf.scatter_nd(
+          indices=tf.expand_dims(selected_indices, axis=-1),
+          updates=tf.ones((max_num_entities,), dtype=tf.bool),
+          shape=(padded_size,))
+    else:
+      random_selection_mask = tf.ones((num_entity_int,), dtype=tf.bool)
+    random_discard_mask = tf.logical_not(random_selection_mask)
+
+    kept_masks = tf.boolean_mask(padded_valid_masks, random_selection_mask)
+    erased_masks = tf.boolean_mask(padded_valid_masks, random_discard_mask)
+    erased_masks = tf.cast(tf.reduce_any(erased_masks, axis=0), tf.float32)
+    # Erase the text instances that were not sampled (omitted).
+ features['images'] = _erase(erased_masks, features['images'], -1., 1.) + labels['segmentation_output']['gt_word_score'] *= 1. - erased_masks + kept_masks_and_bkg = tf.concat( + [ + tf.math.logical_not( + tf.reduce_any(kept_masks, axis=0, keepdims=True)), # bkg + kept_masks, + ], + 0) + labels['instance_labels']['masks'] = tf.argmax(kept_masks_and_bkg, axis=0) + + # (5) Write mask size + # TODO(longshangbang): replace with real masks sizes + masks_sizes = tf.cast( + tf.reduce_any(kept_masks_and_bkg, axis=[1, 2]), tf.float32) + labels['instance_labels']['masks_sizes'] = masks_sizes + # (6) Write classes. + classes = tf.ones((num_instance,), dtype=tf.int32) + classes = tf.concat([tf.constant(2, tf.int32, (1,)), classes], 0) # bkg + if self._max_num_instance >= 0: + classes = utilities.truncate_or_pad(classes, self._max_num_instance, 0) + labels['instance_labels']['classes'] = classes + + # (7) gt-weights + selected_ids = tf.boolean_mask(valid_entity_ids, + random_selection_mask[:num_entity_int]) + + if self._detection_unit != DetectionClass.PARAGRAPH: + gt_text = tf.gather(data['groundtruth_text'], selected_ids - 1) + gt_weights = tf.cast(tf.strings.length(gt_text) > 0, tf.float32) + else: + text_types = tf.concat( + [ + tf.constant([8]), + tf.cast(data['groundtruth_content_type'], tf.int32), + # TODO(longshangbang): temp solution for tfes with no para labels + tf.constant(8, shape=(1000,)), + ], + 0) + para_types = tf.gather(text_types, selected_ids) + + gt_weights = tf.cast( + tf.not_equal(para_types, NOT_ANNOTATED_ID), tf.float32) + + gt_weights = tf.concat([tf.constant(1., shape=(1,)), gt_weights], 0) # bkg + if self._max_num_instance >= 0: + gt_weights = utilities.truncate_or_pad( + gt_weights, self._max_num_instance, 0) + labels['instance_labels']['gt_weights'] = gt_weights + + # (8) get paragraph label + # In this step, an array `{p_i}` is generated. `p_i` is an integer that + # indicates the group of paragraph which i-th text belongs to. 
`p_i` == -1 + # if this instance is non-text or it has no paragraph labels. + # word -> line -> paragraph + if self._detection_unit == DetectionClass.WORD: + num_hop = 2 + elif self._detection_unit == DetectionClass.LINE: + num_hop = 1 + elif self._detection_unit == DetectionClass.PARAGRAPH: + num_hop = 0 + else: + raise ValueError(f'No such detection unit: {self._detection_unit}. ' + 'Note that this error should have been raised in ' + 'previous lines, not here!') + para_ids = tf.identity(selected_ids) # == id in plp + 1 + for _ in range(num_hop): + para_ids = tf.gather(padded_parent, para_ids) + 1 + + text_types = tf.concat( + [ + tf.constant([8]), + tf.cast(data['groundtruth_content_type'], tf.int32), + # TODO(longshangbang): tricks for tfes that have not para labels + tf.constant(8, shape=(1000,)), + ], + 0) + para_types = tf.gather(text_types, para_ids) + + para_ids = para_ids - 1 # revert to id in plp.entities; -1 for no labels + valid_para = tf.cast(tf.not_equal(para_types, NOT_ANNOTATED_ID), tf.int32) + para_ids = valid_para * para_ids + (1 - valid_para) * (-1) + para_ids = tf.concat([tf.constant([-1]), para_ids], 0) # add bkg + + has_para_ids = tf.cast(tf.reduce_sum(valid_para) > 0, tf.float32) + + if self._max_num_instance >= 0: + para_ids = utilities.truncate_or_pad( + para_ids, self._max_num_instance, 0, -1) + labels['paragraph_labels'] = { + 'paragraph_ids': para_ids, + 'has_para_ids': has_para_ids + } + + def _define_shapes(self, features: TensorDict, labels: TensorDict): + """Define the tensor shapes for TPU compiling.""" + if not self._is_shape_defined: + return + features['images'] = tf.ensure_shape( + features['images'], (self._output_dimension, self._output_dimension, 3)) + labels['segmentation_output']['gt_word_score'] = tf.ensure_shape( + labels['segmentation_output']['gt_word_score'], + (self._mask_dimension, self._mask_dimension)) + labels['instance_labels']['num_instance'] = tf.ensure_shape( + labels['instance_labels']['num_instance'], []) + 
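The random subsampling in step (4) draws one uniform score per (padded) slot and keeps the `top_k` largest, which amounts to choosing `max_num_entities` slots uniformly at random without replacement. Below is a minimal pure-Python sketch of that selection mask, assuming plain lists in place of tensors; the function name `random_selection_mask` is hypothetical and the sketch is not the parser's own code.

```python
import random
from typing import List, Optional


def random_selection_mask(num_entities: int,
                          max_num_instance: int,
                          seed: Optional[int] = None) -> List[bool]:
  """List-based analogue of the top_k + scatter_nd selection (hypothetical).

  One slot is reserved for the background, so at most
  `max_num_instance - 1` entities are kept. A negative `max_num_instance`
  means "unbounded" and every entity is kept.
  """
  if max_num_instance < 0:
    return [True] * num_entities
  max_num_entities = max(max_num_instance - 1, 0)
  # Pad with empty slots so there are at least `max_num_entities` candidates.
  padded_size = max(num_entities, max_num_entities)
  rng = random.Random(seed)
  scores = [rng.random() for _ in range(padded_size)]
  # Indices of the k largest random scores, as tf.math.top_k would return.
  top_k = sorted(range(padded_size), key=lambda i: scores[i],
                 reverse=True)[:max_num_entities]
  mask = [False] * padded_size
  for i in top_k:
    mask[i] = True
  return mask


mask = random_selection_mask(num_entities=5, max_num_instance=4, seed=0)
print(len(mask), sum(mask))  # prints: 5 3
```

When there are fewer real entities than `max_num_entities`, every slot (real and padded) is selected, so no real entity is dropped; this mirrors how `tf.pad` tops up `padded_valid_masks` before sampling.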
if self._max_num_instance >= 0: + labels['instance_labels']['masks_sizes'] = tf.ensure_shape( + labels['instance_labels']['masks_sizes'], (self._max_num_instance,)) + labels['instance_labels']['masks'] = tf.ensure_shape( + labels['instance_labels']['masks'], + (self._mask_dimension, self._mask_dimension)) + labels['instance_labels']['classes'] = tf.ensure_shape( + labels['instance_labels']['classes'], (self._max_num_instance,)) + labels['instance_labels']['gt_weights'] = tf.ensure_shape( + labels['instance_labels']['gt_weights'], (self._max_num_instance,)) + labels['paragraph_labels']['paragraph_ids'] = tf.ensure_shape( + labels['paragraph_labels']['paragraph_ids'], + (self._max_num_instance,)) + labels['paragraph_labels']['has_para_ids'] = tf.ensure_shape( + labels['paragraph_labels']['has_para_ids'], []) diff --git a/official/projects/unified_detector/docs/images/task.png b/official/projects/unified_detector/docs/images/task.png new file mode 100644 index 0000000000000000000000000000000000000000..342ecef630c424adb051ad46a8ee8bb85f7969e5 Binary files /dev/null and b/official/projects/unified_detector/docs/images/task.png differ diff --git a/official/projects/unified_detector/external_configurables.py b/official/projects/unified_detector/external_configurables.py new file mode 100644 index 0000000000000000000000000000000000000000..b10fc66a66256348c09098913352868063d9b76a --- /dev/null +++ b/official/projects/unified_detector/external_configurables.py @@ -0,0 +1,22 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Wrap external code in gin.""" + +import gin +import gin.tf.external_configurables +import tensorflow as tf + +# Tensorflow. +gin.external_configurable(tf.keras.layers.experimental.SyncBatchNormalization) diff --git a/official/projects/unified_detector/modeling/universal_detector.py b/official/projects/unified_detector/modeling/universal_detector.py new file mode 100644 index 0000000000000000000000000000000000000000..a076701af78b679ffa3e83fcea5e191b42cec2bf --- /dev/null +++ b/official/projects/unified_detector/modeling/universal_detector.py @@ -0,0 +1,888 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Universal detector implementation.""" + +from typing import Any, Dict, Optional, Sequence, Tuple, Union + +import gin +import tensorflow as tf + +from deeplab2 import config_pb2 +from deeplab2.model.decoder import max_deeplab as max_deeplab_head +from deeplab2.model.encoder import axial_resnet_instances +from deeplab2.model.loss import matchers_ops +from official.legacy.transformer import transformer +from official.projects.unified_detector.utils import typing +from official.projects.unified_detector.utils import utilities + + +EPSILON = 1e-6 + + +@gin.configurable +def universal_detection_loss_weights( + loss_segmentation_word: float = 1e0, + loss_inst_dist: float = 1e0, + loss_mask_id: float = 1e-4, + loss_pq: float = 3e0, + loss_para: float = 1e0) -> Dict[str, float]: + """A function that returns a dict for the weights of loss terms.""" + return { + "loss_segmentation_word": loss_segmentation_word, + "loss_inst_dist": loss_inst_dist, + "loss_mask_id": loss_mask_id, + "loss_pq": loss_pq, + "loss_para": loss_para, + } + + +@gin.configurable +class LayerNorm(tf.keras.layers.LayerNormalization): + """A wrapper to allow passing the `training` argument. + + The normalization layers in the MaX-DeepLab implementation are passed with + the `training` argument. This wrapper enables the usage of LayerNorm. 
+ """ + + def call(self, + inputs: tf.Tensor, + training: Optional[bool] = None) -> tf.Tensor: + del training + return super().call(inputs) + + +@gin.configurable +def get_max_deep_lab_backbone(num_slots: int = 128): + return axial_resnet_instances.get_model( + "max_deeplab_s", + bn_layer=LayerNorm, + block_group_config={ + "drop_path_schedule": "linear", + "axial_use_recompute_grad": False + }, + backbone_use_transformer_beyond_stride=16, + extra_decoder_use_transformer_beyond_stride=16, + num_mask_slots=num_slots, + max_num_mask_slots=num_slots) + + +@gin.configurable +class UniversalDetector(tf.keras.layers.Layer): + """Universal Detector.""" + loss_items = ("loss_pq", "loss_inst_dist", "loss_para", "loss_mask_id", + "loss_segmentation_word") + + def __init__(self, + backbone_fn: tf.keras.layers.Layer = get_max_deep_lab_backbone, + mask_threshold: float = 0.4, + class_threshold: float = 0.5, + filter_area: float = 32, + **kwargs: Any): + """Constructor. + + Args: + backbone_fn: The function to initialize a backbone. + mask_threshold: Masks are thresholded with this value. + class_threshold: Classification heads are thresholded with this value. + filter_area: In inference, detections with area smaller than this + threshold will be removed. + **kwargs: other keyword arguments passed to the base class. + """ + super().__init__(**kwargs) + + # Model + self._backbone_fn = backbone_fn() + self._decoder = _get_decoder_head() + self._class_embed_head, self._para_embed_head = _get_embed_head() + self._para_head, self._para_proj = _get_para_head() + + # Losses + # self._max_deeplab_loss = _get_max_deeplab_loss() + self._loss_weights = universal_detection_loss_weights() + + # Post-processing + self._mask_threshold = mask_threshold + self._class_threshold = class_threshold + self._filter_area = filter_area + + def _preprocess_labels(self, labels: typing.TensorDict): + # Preprocessing + # Convert the integer masks to one-hot encoded masks.
+ num_instances = utilities.resolve_shape( + labels["instance_labels"]["masks_sizes"])[1] + labels["instance_labels"]["masks"] = tf.one_hot( + labels["instance_labels"]["masks"], + depth=num_instances, + axis=1, + dtype=tf.float32) # (B, N, H, W) + + def compute_losses( + self, labels: typing.NestedTensorDict, outputs: typing.NestedTensorDict + ) -> Tuple[tf.Tensor, typing.NestedTensorDict]: + """Computes the loss. + + Args: + labels: A dictionary of ground-truth labels. + outputs: Output from self.call(). + + Returns: + A scalar total loss tensor and a dictionary for individual losses. + """ + loss_dict = {} + + self._preprocess_labels(labels) + + # Main loss: PQ loss. + _entity_mask_loss(loss_dict, labels["instance_labels"], + outputs["instance_output"]) + # Auxiliary loss 1: semantic loss + _semantic_loss(loss_dict, labels["segmentation_output"], + outputs["segmentation_output"]) + # Auxiliary loss 2: instance discrimination + _instance_discrimination_loss(loss_dict, labels["instance_labels"], outputs) + # Auxiliary loss 3: mask id + _mask_id_xent_loss(loss_dict, labels["instance_labels"], outputs) + # Auxiliary loss 4: paragraph grouping + _paragraph_grouping_loss(loss_dict, labels, outputs) + + weighted_loss = [self._loss_weights[k] * v for k, v in loss_dict.items()] + total_loss = sum(weighted_loss) + return total_loss, loss_dict + + def call(self, + features: typing.TensorDict, + training: bool = False) -> typing.NestedTensorDict: + """Forward pass of the model. + + Args: + features: The input features: {"images": tf.Tensor}. Shape = [B, H, W, C] + training: Whether it's training mode. + + Returns: + A dictionary of output with this structure: + { + "max_deep_lab": { + All the max deeplab outputs are here, including both backbone and + decoder. 
+ } + "segmentation_output": { + "word_score": tf.Tensor, [B, h, w], + } + "instance_output": { + "cls_logits": tf.Tensor, [B, N, C], + "mask_id_logits": tf.Tensor, [B, H, W, N], + "cls_prob": tf.Tensor, [B, N, C], + "mask_id_prob": tf.Tensor, [B, H, W, N], + } + "postprocessed": { + "classes": A (B, N) tensor for the class ids. Zero for non-firing + slots. + "binary_masks": A (B, H, W, N) tensor for the N binary masks. Masks + for void cls are set to zero. + "confidence": A (B, N) float tensor for the confidence of "classes". + "mask_area": A (B, N) float tensor for the area of each mask. + } + "transformer_group_feature": (B, N, C) float tensor (normalized), + "para_affinity": (B, N, N) float tensor. + } + + Class-0 is for void. Class-(C-1) is for background. Class-1~(C-2) is for + valid classes. + """ + # backbone + backbone_output = self._backbone_fn(features["images"], training) + # split instance embedding and paragraph embedding; + # then perform paragraph grouping + para_fts = self._get_para_outputs(backbone_output, training) + affinity = tf.linalg.matmul(para_fts, para_fts, transpose_b=True) + # text detection head + decoder_output = self._decoder(backbone_output, training) + output_dict = { + "max_deep_lab": decoder_output, + "transformer_group_feature": para_fts, + "para_affinity": affinity, + } + input_shape = utilities.resolve_shape(features["images"]) + self._get_semantic_outputs(output_dict, input_shape) + self._get_instance_outputs(output_dict, input_shape) + self._postprocess(output_dict) + + return output_dict + + def _get_para_outputs(self, outputs: typing.TensorDict, + training: bool) -> tf.Tensor: + """Apply the paragraph head. + + This function first splits the features for instance classification and + instance grouping. Then, the additional grouping branch (transformer layers) + is applied to further encode the grouping features. Finally, a tensor of + normalized grouping features is returned. 
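In `call`, the paragraph affinity is `para_fts @ para_fts^T` over L2-normalized slot features, so each entry is a cosine similarity in [-1, 1]; the grouping loss later divides it by the temperature `tau`. A minimal list-based sketch of that computation for a single image, assuming plain Python lists instead of (B, N, C) tensors (the helper names are hypothetical):

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize(rows: Matrix, eps: float = 1e-12) -> Matrix:
  """Scales each row to unit L2 norm, like tf.math.l2_normalize(axis=-1)."""
  out = []
  for row in rows:
    norm = math.sqrt(max(sum(x * x for x in row), eps))
    out.append([x / norm for x in row])
  return out


def para_affinity(features: Matrix) -> Matrix:
  """(N, C) slot features -> (N, N) matrix of pairwise cosine similarities."""
  f = l2_normalize(features)
  return [[sum(a * b for a, b in zip(fi, fj)) for fj in f] for fi in f]


aff = para_affinity([[3.0, 4.0], [6.0, 8.0], [-4.0, 3.0]])
# aff[0][1] is ~1.0 (same direction); aff[0][2] is ~0.0 (orthogonal).
```

Because the features are normalized first, the affinity depends only on direction, not magnitude, which is why slots 0 and 1 above are treated as fully similar.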
+ + Args: + outputs: output dictionary from the backbone. + training: Whether the model is in training mode. + + Returns: + The normalized paragraph embedding vector of shape (B, N, C). + """ + # Project the object embeddings into classification feature and grouping + # feature. + fts = outputs["transformer_class_feature"] # B,N,C + class_feature = self._class_embed_head(fts, training) + group_feature = self._para_embed_head(fts, training) + outputs["transformer_class_feature"] = class_feature + outputs["transformer_group_feature"] = group_feature + + # Feed the grouping features into the additional group-encoding branch. + # First we need to build the attention_bias which is used by the standard + # transformer encoder. + input_shape = utilities.resolve_shape(group_feature) + b = input_shape[0] + n = int(input_shape[1]) + seq_len = tf.constant(n, shape=(b,)) + padding_mask = utilities.get_padding_mask_from_valid_lengths( + seq_len, n, tf.float32) + attention_bias = utilities.get_transformer_attention_bias(padding_mask) + group_feature = self._para_proj( + self._para_head(group_feature, attention_bias, None, training)) + return tf.math.l2_normalize(group_feature, axis=-1) + + def _get_semantic_outputs(self, outputs: typing.NestedTensorDict, + input_shape: tf.TensorShape): + """Add `segmentation_output` to outputs. + + Args: + outputs: A dictionary of outputs. + input_shape: The shape of the input images. + """ + h, w = input_shape[1:3] + # B, H/4, W/4, C + semantic_logits = outputs["max_deep_lab"]["semantic_logits"] + textness, unused_logits = tf.split(semantic_logits, [2, -1], -1) + # Channel[0:2], textness. c0: non-textness, c1: textness.
+ word_score = tf.nn.softmax(textness, -1, "word_score")[:, :, :, 1:2] + word_score = tf.squeeze(tf.image.resize(word_score, (h, w)), -1) + # Channel[2:] is not used yet. + outputs["segmentation_output"] = {"word_score": word_score} + + def _get_instance_outputs(self, outputs: typing.NestedTensorDict, + input_shape: tf.TensorShape): + """Add `instance_output` to outputs. + + Args: + outputs: A dictionary of outputs. + input_shape: The shape of the input images. + The following fields are added to outputs["instance_output"]: + "cls_logits": tf.Tensor, [B, N, C]. + "mask_id_logits": tf.Tensor, [B, H, W, N]. + "cls_prob": tf.Tensor, [B, N, C], softmax probability. + "mask_id_prob": tf.Tensor, [B, H, W, N], softmax probability. They are + used in training. Masks are all resized to full resolution. + """ + # Get instance_output + h, w = input_shape[1:3] + ## Classes + class_logits = outputs["max_deep_lab"]["transformer_class_logits"] + # The MaX-DeepLab repo uses the last logit for void, but we use index 0. + # Therefore we shift the logits here. + class_logits = tf.roll(class_logits, shift=1, axis=-1) + class_prob = tf.nn.softmax(class_logits) + + ## Masks + mask_id_logits = outputs["max_deep_lab"]["pixel_space_mask_logits"] + mask_id_prob = tf.nn.softmax(mask_id_logits) + mask_id_logits = tf.image.resize(mask_id_logits, (h, w)) + mask_id_prob = tf.image.resize(mask_id_prob, (h, w)) + outputs["instance_output"] = { + "cls_logits": class_logits, + "mask_id_logits": mask_id_logits, + "cls_prob": class_prob, + "mask_id_prob": mask_id_prob, + } + + def _postprocess(self, outputs: typing.NestedTensorDict): + """Post-processes (filters) the outputs. + + Args: + outputs: A dictionary of outputs. + The following fields are added to outputs["postprocessed"]: + "classes": A (B,N) integer tensor for the class ids. + "binary_masks": A (B, H, W, N) tensor for the N binarized 0/1 masks. Masks + for void cls are set to zero.
+ "confidence": A (B, N) float tensor for the confidence of "classes". + "mask_area": A (B, N) float tensor for the area of each mask. They are + used in inference / visualization. + """ + # Get postprocessed outputs + outputs["postprocessed"] = {} + + ## Masks: + mask_id_prob = outputs["instance_output"]["mask_id_prob"] + mask_max_prob = tf.reduce_max(mask_id_prob, axis=-1, keepdims=True) + thresholded_binary_masks = tf.cast( + tf.math.logical_and( + tf.equal(mask_max_prob, mask_id_prob), + tf.greater_equal(mask_max_prob, self._mask_threshold)), tf.float32) + area = tf.reduce_sum(thresholded_binary_masks, axis=(1, 2)) # (B, N) + ## Classification: + cls_prob = outputs["instance_output"]["cls_prob"] + cls_max_prob = tf.reduce_max(cls_prob, axis=-1) # B, N + cls_max_id = tf.cast(tf.argmax(cls_prob, axis=-1), tf.float32) # B, N + + ## filtering + c = utilities.resolve_shape(cls_prob)[2] + non_void = tf.reduce_all( + tf.stack( + [ + tf.greater_equal(area, self._filter_area), # mask large enough. + tf.not_equal(cls_max_id, 0), # class-0 is for non-object. + tf.not_equal(cls_max_id, + c - 1), # class-(c-1) is for background (last). + tf.greater_equal(cls_max_prob, + self._class_threshold) # prob >= thr + ], + axis=-1), + axis=-1) + non_void = tf.cast(non_void, tf.float32) + + # Storing + outputs["postprocessed"]["classes"] = tf.cast(cls_max_id * non_void, + tf.int32) + b, n = utilities.resolve_shape(non_void) + outputs["postprocessed"]["binary_masks"] = ( + thresholded_binary_masks * tf.reshape(non_void, (b, 1, 1, n))) + outputs["postprocessed"]["confidence"] = cls_max_prob + outputs["postprocessed"]["mask_area"] = area + + def _coloring(self, masks: tf.Tensor) -> tf.Tensor: + """Coloring segmentation masks. + + Used in visualization. + + Args: + masks: A float binary tensor of shape (B, H, W, N), representing `B` + samples, with `N` masks of size `H*W` each. Each of the `N` masks will + be assigned a random color. + + Returns: + A (b, h, w, 3) float tensor in [0., 1.] 
for the coloring result. + """ + b, h, w, n = utilities.resolve_shape(masks) + palette = tf.random.uniform((1, n, 3), 0.5, 1.) + colored = tf.reshape( + tf.matmul(tf.reshape(masks, (b, -1, n)), palette), (b, h, w, 3)) + return colored + + def visualize(self, + outputs: typing.NestedTensorDict, + labels: Optional[typing.TensorDict] = None): + """Visualizes the outputs and labels. + + Args: + outputs: A dictionary of outputs. + labels: A dictionary of labels. + The following dict is added to outputs["visualization"]: { + "instance": { + "pred": A (B, H, W, 3) tensor for the visualized map in [0,1]. + "gt": A (B, H, W, 3) tensor for the visualized map in [0,1], if labels + are present. + "concat": Concatenation of "pred" and "gt" along the width axis, if + labels are present. } + "seg-text": {... Similar to above, but the shape is (B, H, W, 1).} } All + of these tensors have a rank of 4 (B, H, W, C). + """ + + outputs["visualization"] = {} + # 1. prediction + # 1.1 instance mask + binary_masks = outputs["postprocessed"]["binary_masks"] + outputs["visualization"]["instance"] = { + "pred": self._coloring(binary_masks), + } + # 1.2 text-seg + outputs["visualization"]["seg-text"] = { + "pred": + tf.expand_dims(outputs["segmentation_output"]["word_score"], -1), + } + + # 2. labels + if labels is not None: + # 2.1 instance mask + # (B, N, H, W) -> (B, H, W, N); the first one is bkg, so it is removed. + gt_masks = tf.transpose(labels["instance_labels"]["masks"][:, 1:], + (0, 2, 3, 1)) + outputs["visualization"]["instance"]["gt"] = self._coloring(gt_masks) + # 2.2 text-seg + outputs["visualization"]["seg-text"]["gt"] = tf.expand_dims( + labels["segmentation_output"]["gt_word_score"], -1) + + # 3. concat (only when labels are present, since "gt" is required) + for v in (outputs["visualization"].values() if labels is not None else ()): + # Resize so that the sizes align. The prediction always has stride=1 + # resolution, so we make gt align with pred instead of vice versa.
+ v["concat"] = tf.concat( + [v["pred"], + tf.image.resize(v["gt"], + tf.shape(v["pred"])[1:3])], + axis=2) + + @tf.function + def serve(self, image_tensor: tf.Tensor) -> typing.NestedTensorDict: + """Method to be exported for SavedModel. + + Args: + image_tensor: A float32 normalized tensor representing an image of shape + [1, height, width, channels]. + + Returns: + A dict of outputs: + classes: (B, N) int32 tensor == o["postprocessed"]["classes"] + masks: (B, H, W, N) float32 tensor == o["postprocessed"]["binary_masks"] + groups: (B, N, N) float32 tensor == o["para_affinity"] + confidence: A (B, N) float tensor == o["postprocessed"]["confidence"] + mask_area: A (B, N) float tensor == o["postprocessed"]["mask_area"] + """ + features = {"images": image_tensor} + nn_outputs = self(features, False) + outputs = { + "classes": nn_outputs["postprocessed"]["classes"], + "masks": nn_outputs["postprocessed"]["binary_masks"], + "confidence": nn_outputs["postprocessed"]["confidence"], + "mask_area": nn_outputs["postprocessed"]["mask_area"], + "groups": nn_outputs["para_affinity"], + } + return outputs + + +@gin.configurable() +def _get_decoder_head( + atrous_rates: Sequence[int] = (6, 12, 18), + pixel_space_dim: int = 128, + pixel_space_intermediate: int = 256, + low_level: Sequence[Dict[str, Union[str, int]]] = ({ + "feature_key": "res3", + "channels_project": 64, + }, { + "feature_key": "res2", + "channels_project": 32, + }), + num_classes=3, + aux_sem_intermediate=256, + norm_fn=tf.keras.layers.BatchNormalization, +) -> max_deeplab_head.MaXDeepLab: + """Get the MaX-DeepLab prediction head. + + Args: + atrous_rates: Dilation rates for the atrous convs in the semantic head. + pixel_space_dim: The dimension for the final panoptic features. + pixel_space_intermediate: The dimension for the layer before + `pixel_space_dim` (i.e. the separable 5x5 layer). + low_level: A list of dicts for the feature pyramid in forming the semantic + output.
Each dict represents one skip-path from the backbone. + num_classes: Number of classes (entities + bkg) including void. For example, + if we only want to detect words, then `num_classes` = 3 (1 for word, 1 for + bkg, and 1 for void). + aux_sem_intermediate: Similar to `pixel_space_intermediate`, but for the + auxiliary semantic output head. + norm_fn: The normalization function used in the head. + + Returns: + A MaX-DeepLab decoder head (as a keras layer). + """ + + # Initialize the configs. + configs = config_pb2.ModelOptions() + configs.decoder.feature_key = "feature_semantic" + configs.decoder.atrous_rates.extend(atrous_rates) + configs.max_deeplab.pixel_space_head.output_channels = pixel_space_dim + configs.max_deeplab.pixel_space_head.head_channels = pixel_space_intermediate + for low_level_config in low_level: + low_level_ = configs.max_deeplab.auxiliary_low_level.add() + low_level_.feature_key = low_level_config["feature_key"] + low_level_.channels_project = low_level_config["channels_project"] + configs.max_deeplab.auxiliary_semantic_head.output_channels = num_classes + configs.max_deeplab.auxiliary_semantic_head.head_channels = aux_sem_intermediate + + return max_deeplab_head.MaXDeepLab(configs.decoder, + configs.max_deeplab, 0, norm_fn) + + +class PseudoLayer(tf.keras.layers.Layer): + """Pseudo layer for ablation study. + + The `call()` function has the same argument signature as a transformer + encoder stack. `unused_ph1` and `unused_ph2` are placeholders for this + purpose. When studying the effectiveness of using a transformer as the + grouping branch, this PseudoLayer can replace the transformer to serve as a + no-transformer baseline. + + To use a single projection layer instead of a transformer, simply set + `extra_fc` to True.
+ """ + + def __init__(self, extra_fc: bool): + super().__init__(name="extra_fc") + self._extra_fc = extra_fc + if extra_fc: + self._layer = tf.keras.Sequential([ + tf.keras.layers.Dense(256, activation="relu"), + tf.keras.layers.LayerNormalization(), + ]) + + def call(self, + fts: tf.Tensor, + unused_ph1: Optional[tf.Tensor], + unused_ph2: Optional[tf.Tensor], + training: Optional[bool] = None) -> tf.Tensor: + """See base class.""" + if self._extra_fc: + return self._layer(fts, training) + return fts + + +@gin.configurable() +def _get_embed_head( + dimension=256, + norm_fn=tf.keras.layers.BatchNormalization +) -> Tuple[tf.keras.Sequential, tf.keras.Sequential]: + """Projection layers to get instance & grouping features.""" + instance_head = tf.keras.Sequential([ + tf.keras.layers.Dense(dimension, use_bias=False), + norm_fn(), + tf.keras.layers.ReLU(), + ]) + grouping_head = tf.keras.Sequential([ + tf.keras.layers.Dense(dimension, use_bias=False), + norm_fn(), + tf.keras.layers.ReLU(), + ]) + return instance_head, grouping_head + + +@gin.configurable() +def _get_para_head( + dimension=128, + num_layer=3, + extra_fc=False) -> Tuple[tf.keras.layers.Layer, tf.keras.layers.Layer]: + """Get the additional para head. + + Args: + dimension: The dimension of the final output. + num_layer: The number of transformer layers. + extra_fc: Whether an extra fully-connected layer is used when + num_layer=0. + + Returns: + An encoder and a projection layer for the grouping features.
+ """ + if num_layer > 0: + encoder = transformer.EncoderStack( + params={ + "hidden_size": 256, + "num_hidden_layers": num_layer, + "num_heads": 4, + "filter_size": 512, + "initializer_gain": 1.0, + "attention_dropout": 0.1, + "relu_dropout": 0.1, + "layer_postprocess_dropout": 0.1, + "allow_ffn_pad": True, + }) + else: + encoder = PseudoLayer(extra_fc) + dense = tf.keras.layers.Dense(dimension) + return encoder, dense + + +def _dice_sim(pred: tf.Tensor, ground_truth: tf.Tensor) -> tf.Tensor: + """Dice coefficient for mask similarity. + + Args: + pred: The predicted mask. [B, N, H, W], in [0, 1]. + ground_truth: The ground-truth mask. [B, N, H, W], in [0, 1] or {0, 1}. + + Returns: + A similarity matrix: m[b, i, j] is the Dice similarity between pred `i` + and gt `j` in batch `b`. + """ + b, n = utilities.resolve_shape(pred)[:2] + ground_truth = tf.reshape( + tf.transpose(ground_truth, (0, 2, 3, 1)), (b, -1, n)) # B, HW, N + pred = tf.reshape(pred, (b, n, -1)) # B, N, HW + numerator = tf.matmul(pred, ground_truth) * 2. + # TODO(longshangbang): The official implementation does not square the scores. + # Experiments are needed to determine which one is better. + denominator = ( + tf.math.reduce_sum(tf.math.square(ground_truth), 1, keepdims=True) + + tf.math.reduce_sum(tf.math.square(pred), 2, keepdims=True)) + return (numerator + EPSILON) / (denominator + EPSILON) + + +def _semantic_loss( + loss_dict: Dict[str, tf.Tensor], + labels: tf.Tensor, + outputs: tf.Tensor, +): + """Auxiliary semantic loss. + + Currently, these losses are added: + (1) text/non-text heatmap + + Args: + loss_dict: A dictionary for the loss. The values are loss scalars. + labels: The label dictionary containing: + `gt_word_score`: (B, H, W) tensor for the text/non-text map.
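`_dice_sim` builds an N_pred by N_gt matrix of soft Dice coefficients, 2·Σ p·g / (Σ p² + Σ g²), with `EPSILON` added to numerator and denominator so that two empty masks score 1 instead of producing NaN. A list-based sketch for one image on flattened masks, assuming the squared-denominator variant used here (the function name `dice_sim` is hypothetical):

```python
from typing import List, Sequence

EPSILON = 1e-6


def dice_sim(pred: Sequence[Sequence[float]],
             gt: Sequence[Sequence[float]]) -> List[List[float]]:
  """m[i][j] = soft Dice similarity between flattened pred mask i and gt mask j."""
  def dice(p: Sequence[float], g: Sequence[float]) -> float:
    numerator = 2.0 * sum(a * b for a, b in zip(p, g))
    denominator = sum(a * a for a in p) + sum(b * b for b in g)
    return (numerator + EPSILON) / (denominator + EPSILON)
  return [[dice(p, g) for g in gt] for p in pred]


pred = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
gt = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
m = dice_sim(pred, gt)
# m[0][0] ~= 1.0 (identical), m[0][1] ~= 0.0 (disjoint),
# m[1][1] ~= 2.0 / (0.5 + 2.0) = 0.8 (soft prediction).
```

`_semantic_loss` applies the same formula with a single mask per image and reports `1 - mean(dice)` as `loss_segmentation_word`.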
+ outputs: The output dictionary containing: + `word_score`: (B, H, W) prediction tensor for `gt_word_score` + """ + pred = tf.expand_dims(outputs["word_score"], 1) + gt = tf.expand_dims(labels["gt_word_score"], 1) + loss_dict["loss_segmentation_word"] = 1. - tf.reduce_mean(_dice_sim(pred, gt)) + + +@gin.configurable +def _entity_mask_loss(loss_dict: Dict[str, tf.Tensor], + labels: tf.Tensor, + outputs: tf.Tensor, + alpha: float = gin.REQUIRED): + """PQ loss for entity-mask training. + + This method adds the PQ loss term to loss_dict directly. The match result will + also be stored in outputs (As a [B, N_pred, N_gt] float tensor). + + Args: + loss_dict: A dictionary for the loss. The values are loss scalars. + labels: A dict containing: `num_instance` - (B,) `masks` - (B, N, H, W) + `classes` - (B, N) + outputs: A dict containing: + `cls_prob`: (B, N, C) + `mask_id_prob`: (B, H, W, N) + `cls_logits`: (B, N, C) + `mask_id_logits`: (B, H, W, N) + alpha: Weight for pos/neg balance. + """ + # Classification score: (B, N, N) + # in batch b, the probability of prediction i being class of gt j, i.e.: + # score[b, i, j] = pred_cls[b, i, gt_cls[b, j]] + gt_cls = labels["classes"] # (B, N) + pred_cls = outputs["cls_prob"] # (B, N, C) + b, n = utilities.resolve_shape(pred_cls)[:2] + # indices[b, i, j] = gt_cls[b, j] + indices = tf.tile(tf.expand_dims(gt_cls, 1), (1, n, 1)) + cls_score = tf.gather(pred_cls, tf.cast(indices, tf.int32), batch_dims=2) + + # Mask score (dice): (B, N, N) + # mask_score[b, i, j]: dice-similarity for pred i and gt j in batch b. + mask_score = _dice_sim( + tf.transpose(outputs["mask_id_prob"], (0, 3, 1, 2)), labels["masks"]) + + # Get similarity matrix and matching. 
+ # masked_similarity[b, i, j] = -1 (below valid scores) if j >= num_instance[b] + similarity = cls_score * mask_score + padded_mask = tf.cast(tf.reshape(tf.range(n), (1, 1, n)), tf.float32) + padded_mask = tf.cast( + tf.math.greater_equal(padded_mask, + tf.reshape(labels["num_instance"], (b, 1, 1))), + tf.float32) + # The constant value for padding has no effect. + masked_similarity = similarity * (1. - padded_mask) + padded_mask * (-1.) + matched_mask = matchers_ops.hungarian_matching(-masked_similarity) + matched_mask = tf.cast(matched_mask, tf.float32) * (1 - padded_mask) + outputs["matched_mask"] = matched_mask + # Pos loss + loss_pos = ( + tf.stop_gradient(cls_score) * (-mask_score) + + tf.stop_gradient(mask_score) * (-tf.math.log(cls_score))) + loss_pos = tf.reduce_sum(loss_pos * matched_mask, axis=[1, 2]) # (B,) + # Neg loss + matched_pred = tf.cast(tf.reduce_sum(matched_mask, axis=2) > 0, + tf.float32) # (B, N) + # Class 0 is the void class. + log_loss = -tf.nn.log_softmax(outputs["cls_logits"])[:, :, 0] # (B, N) + loss_neg = tf.reduce_sum(log_loss * (1. - matched_pred), axis=-1) # (B,) + + loss_pq = (alpha * loss_pos + (1 - alpha) * loss_neg) / n + loss_pq = tf.reduce_mean(loss_pq) + loss_dict["loss_pq"] = loss_pq + + +@gin.configurable +def _instance_discrimination_loss(loss_dict: Dict[str, Any], + labels: Dict[str, Any], + outputs: Dict[str, Any], + tau: float = gin.REQUIRED): + """Instance discrimination loss. + + This method adds the ID loss term to loss_dict directly. + + Args: + loss_dict: A dictionary for the loss. The values are loss scalars. + labels: The label dictionary. + outputs: The output dictionary.
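`_entity_mask_loss` scores each (prediction, ground truth) pair by `cls_score * mask_score`, masks padded ground-truth columns to -1, finds a one-to-one matching, and then combines a positive term on matched pairs with a void-class term on unmatched predictions. The sketch below reproduces only the forward value for one image. It swaps in a brute-force permutation search for `matchers_ops.hungarian_matching` (same optimum, exponential cost, tiny N only); the function names are hypothetical, and the `alpha=0.75` in the example is arbitrary because the source leaves `alpha` as `gin.REQUIRED`.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def match_slots(similarity: Sequence[Sequence[float]],
                num_gt: int) -> List[Tuple[int, int]]:
  """Brute-force maximum-similarity matching on a square N x N matrix.

  Gt columns j >= num_gt are padding: they score -1 and are dropped from the
  returned (pred, gt) pairs.
  """
  n = len(similarity)
  masked = [[similarity[i][j] if j < num_gt else -1.0 for j in range(n)]
            for i in range(n)]
  best = max(itertools.permutations(range(n)),
             key=lambda perm: sum(masked[i][perm[i]] for i in range(n)))
  return [(i, j) for i, j in enumerate(best) if j < num_gt]


def pq_style_loss(cls_score: Sequence[Sequence[float]],
                  mask_score: Sequence[Sequence[float]],
                  void_prob: Sequence[float],
                  num_gt: int,
                  alpha: float) -> Tuple[float, List[Tuple[int, int]]]:
  """Forward value of the PQ-style loss for one image.

  cls_score[i][j]: probability that pred i has gt j's class.
  mask_score[i][j]: Dice similarity between pred i and gt j.
  void_prob[i]: probability of class 0 (void) for pred i.
  tf.stop_gradient only changes gradients, so it is omitted here.
  """
  n = len(cls_score)
  sim = [[cls_score[i][j] * mask_score[i][j] for j in range(n)]
         for i in range(n)]
  pairs = match_slots(sim, num_gt)
  loss_pos = sum(cls_score[i][j] * -mask_score[i][j] +
                 mask_score[i][j] * -math.log(cls_score[i][j])
                 for i, j in pairs)
  matched = {i for i, _ in pairs}
  # Unmatched predictions are pushed towards the void class.
  loss_neg = sum(-math.log(void_prob[i]) for i in range(n) if i not in matched)
  return (alpha * loss_pos + (1 - alpha) * loss_neg) / n, pairs


cls_score = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.5], [0.3, 0.3, 0.5]]
mask_score = [[0.9, 0.1, 0.0], [0.1, 0.7, 0.0], [0.2, 0.2, 0.0]]
loss, pairs = pq_style_loss(cls_score, mask_score, [0.05, 0.05, 0.9],
                            num_gt=2, alpha=0.75)
print(pairs)  # prints: [(0, 0), (1, 1)]
```

Prediction 2 is left unmatched, so it contributes only through `-log(void_prob[2])`. In practice the Hungarian algorithm solves the same assignment in polynomial time.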
+ tau: The temperature term in the loss. + """ + # The normalized feature, shape=(B, H/4, W/4, D) + g = outputs["max_deep_lab"]["pixel_space_normalized_feature"] + b, h, w = utilities.resolve_shape(g)[:3] + # The ground-truth masks, shape=(B, N, H, W) --> (B, N, H/4, W/4) + m = labels["masks"] + m = tf.image.resize( + tf.transpose(m, (0, 2, 3, 1)), (h, w), + tf.image.ResizeMethod.NEAREST_NEIGHBOR) + m = tf.transpose(m, (0, 3, 1, 2)) + # The number of ground-truth instances (K), shape=(B,) + num = labels["num_instance"] + n = utilities.resolve_shape(m)[1] # max number of predictions + # is_void[b, i] = 1 if instance i in batch b is a padded slot. + is_void = tf.cast(tf.expand_dims(tf.range(n), 0), tf.float32) # (1, n) + is_void = tf.cast( + tf.math.greater_equal(is_void, tf.expand_dims(num, 1)), tf.float32) + + # (B, N, D) + t = tf.math.l2_normalize(tf.einsum("bhwd,bnhw->bnd", g, m), axis=-1) + inst_dist_logits = tf.einsum("bhwd,bid->bhwi", g, t) / tau # (B, H, W, N) + inst_dist_logits = inst_dist_logits - 100. * tf.reshape(is_void, (b, 1, 1, n)) + mask_id = tf.cast( + tf.einsum("bnhw,n->bhw", m, tf.range(n, dtype=tf.float32)), tf.int32) + loss_map = tf.nn.sparse_softmax_cross_entropy_with_logits( + labels=mask_id, logits=inst_dist_logits) # B, H, W + valid_mask = tf.reduce_sum(m, axis=1) + loss_inst_dist = ( + (tf.reduce_sum(loss_map * valid_mask, axis=[1, 2]) + EPSILON) / + (tf.reduce_sum(valid_mask, axis=[1, 2]) + EPSILON)) + loss_dict["loss_inst_dist"] = tf.reduce_mean(loss_inst_dist) + + +@gin.configurable +def _paragraph_grouping_loss( + loss_dict: Dict[str, Any], + labels: Dict[str, Any], + outputs: Dict[str, Any], + tau: float = gin.REQUIRED, + loss_mode="vanilla", + fl_alpha: float = 0.25, + fl_gamma: float = 2., +): + """Paragraph grouping loss. + + This method adds the paragraph grouping loss term to loss_dict directly. + + Args: + loss_dict: A dictionary for the loss. The values are loss scalars. + labels: The label dictionary.
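`_instance_discrimination_loss` averages the normalized pixel embeddings inside each ground-truth mask into a prototype, scores every pixel against every prototype with a temperature-scaled cosine similarity (padded slots get a -100 penalty), and applies a per-pixel softmax cross-entropy towards the slot that owns the pixel. A single-image, list-based sketch of that computation, assuming already-normalized embeddings and one-hot masks (names are hypothetical):

```python
import math
from typing import List, Sequence

EPSILON = 1e-6


def _normalize(v: Sequence[float]) -> List[float]:
  norm = math.sqrt(max(sum(x * x for x in v), 1e-12))
  return [x / norm for x in v]


def instance_discrimination_loss(g: Sequence[Sequence[float]],
                                 masks: Sequence[Sequence[float]],
                                 num_instance: int,
                                 tau: float) -> float:
  """Single-image sketch of the instance discrimination loss.

  g: P per-pixel embeddings, assumed already L2-normalized.
  masks: N one-hot masks over the P pixels, masks[k][p] in {0, 1}.
  num_instance: slots k >= num_instance are padding ("void").
  """
  n, num_pixels, dim = len(masks), len(g), len(g[0])
  # Prototype of each slot: normalized sum of the embeddings inside its mask.
  protos = [
      _normalize([sum(masks[k][p] * g[p][d] for p in range(num_pixels))
                  for d in range(dim)])
      for k in range(n)
  ]
  total, weight = 0.0, 0.0
  for p in range(num_pixels):
    # Cosine logits with temperature; padded slots get a -100 penalty.
    logits = [
        sum(a * b for a, b in zip(g[p], protos[k])) / tau -
        (100.0 if k >= num_instance else 0.0) for k in range(n)
    ]
    label = int(sum(k * masks[k][p] for k in range(n)))  # owning slot index
    valid = sum(masks[k][p] for k in range(n))  # 0 if no mask covers pixel p
    top = max(logits)
    log_z = top + math.log(sum(math.exp(x - top) for x in logits))
    total += (log_z - logits[label]) * valid  # softmax cross-entropy
    weight += valid
  return (total + EPSILON) / (weight + EPSILON)
```

If the embeddings cluster by instance, the loss approaches 0; if two instances share the same embedding directions, each pixel is split evenly between them and the loss approaches log 2.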
+ outputs: The output dictionary. + tau: The temperature term in the loss. + loss_mode: The type of loss: "vanilla", "balanced", or "focal". + fl_alpha: The alpha value in the focal loss. + fl_gamma: The gamma value in the focal loss. + """ + if "paragraph_labels" not in labels: + loss_dict["loss_para"] = 0. + return + # step 1: + # obtain the paragraph labels for each prediction + # (batch, pred, gt) + matched_matrix = outputs["instance_output"]["matched_mask"] # B, N, N + para_label_gt = labels["paragraph_labels"]["paragraph_ids"] # B, N + has_para_label_gt = ( + labels["paragraph_labels"]["has_para_ids"][:, tf.newaxis, tf.newaxis]) + # '0' means no paragraph labels + pred_label_gt = tf.einsum("bij,bj->bi", matched_matrix, + tf.cast(para_label_gt + 1, tf.float32)) + pred_label_gt_pad_col = tf.expand_dims(pred_label_gt, -1) # b,n,1 + pred_label_gt_pad_row = tf.expand_dims(pred_label_gt, 1) # b,1,n + gt_affinity = tf.cast( + tf.equal(pred_label_gt_pad_col, pred_label_gt_pad_row), tf.float32) + gt_affinity_mask = ( + has_para_label_gt * pred_label_gt_pad_col * pred_label_gt_pad_row) + gt_affinity_mask = tf.cast(tf.not_equal(gt_affinity_mask, 0.), tf.float32) + + # step 2: + # get affinity matrix + affinity = outputs["para_affinity"] + + # step 3: + # compute loss + loss_fn = tf.keras.losses.BinaryCrossentropy( + from_logits=True, + label_smoothing=0, + axis=-1, + reduction=tf.keras.losses.Reduction.NONE, + name="para_dist") + affinity = tf.reshape(affinity, (-1, 1)) # (b*n*n, 1) + gt_affinity = tf.reshape(gt_affinity, (-1, 1)) # (b*n*n, 1) + gt_affinity_mask = tf.reshape(gt_affinity_mask, (-1,)) # (b*n*n,) + pointwise_loss = loss_fn(gt_affinity, affinity / tau) # (b*n*n,) + + if loss_mode == "vanilla": + loss = ( + tf.reduce_sum(pointwise_loss * gt_affinity_mask) / + (tf.reduce_sum(gt_affinity_mask) + EPSILON)) + elif loss_mode == "balanced": + # pos + pos_mask = gt_affinity_mask * gt_affinity[:, 0] + pos_loss = ( + tf.reduce_sum(pointwise_loss * pos_mask) / + (tf.reduce_sum(pos_mask) + EPSILON)) + # neg + neg_mask =
gt_affinity_mask * (1. - gt_affinity[:, 0]) + neg_loss = ( + tf.reduce_sum(pointwise_loss * neg_mask) / + (tf.reduce_sum(neg_mask) + EPSILON)) + loss = 0.25 * pos_loss + 0.75 * neg_loss + elif loss_mode == "focal": + alpha_wt = fl_alpha * gt_affinity + (1. - fl_alpha) * (1. - gt_affinity) + prob_pos = tf.math.sigmoid(affinity / tau) + pt = prob_pos * gt_affinity + (1. - prob_pos) * (1. - gt_affinity) + fl_loss_pw = tf.stop_gradient( + alpha_wt * tf.pow(1. - pt, fl_gamma))[:, 0] * pointwise_loss + loss = ( + tf.reduce_sum(fl_loss_pw * gt_affinity_mask) / + (tf.reduce_sum(gt_affinity_mask) + EPSILON)) + else: + raise ValueError(f"Not supported loss mode: {loss_mode}") + + loss_dict["loss_para"] = loss + + +def _mask_id_xent_loss(loss_dict: Dict[str, Any], labels: Dict[str, Any], + outputs: Dict[str, Any]): + """Mask ID loss. + + This method adds the mask ID loss term to loss_dict directly. + + Args: + loss_dict: A dictionary for the loss. The values are loss scalars. + labels: The label dictionary. + outputs: The output dictionary. 
+ """ + # (B, N, H, W) + mask_gt = labels["masks"] + # B, H, W, N + mask_id_logits = outputs["instance_output"]["mask_id_logits"] + # B, N, N + matched_matrix = outputs["instance_output"]["matched_mask"] + # B, N + gt_to_pred_id = tf.cast(tf.math.argmax(matched_matrix, axis=1), tf.float32) + # B, H, W + mask_id_labels = tf.cast( + tf.einsum("bnhw,bn->bhw", mask_gt, gt_to_pred_id), tf.int32) + loss_map = tf.nn.sparse_softmax_cross_entropy_with_logits( + labels=mask_id_labels, logits=mask_id_logits) + valid_mask = tf.reduce_sum(mask_gt, axis=1) + loss_mask_id = ( + (tf.reduce_sum(loss_map * valid_mask, axis=[1, 2]) + EPSILON) / + (tf.reduce_sum(valid_mask, axis=[1, 2]) + EPSILON)) + loss_dict["loss_mask_id"] = tf.reduce_mean(loss_mask_id) diff --git a/official/projects/unified_detector/registry_imports.py b/official/projects/unified_detector/registry_imports.py new file mode 100644 index 0000000000000000000000000000000000000000..93eebcca58e586c42379014d7d74f81e603c473e --- /dev/null +++ b/official/projects/unified_detector/registry_imports.py @@ -0,0 +1,21 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""All necessary imports for registration.""" + +# pylint: disable=unused-import +from official.projects.unified_detector import external_configurables +from official.projects.unified_detector.configs import ocr_config +from official.projects.unified_detector.tasks import ocr_task +from official.vision import registry_imports diff --git a/official/projects/unified_detector/requirements.txt b/official/projects/unified_detector/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..081993d5d7486bced189dc115828ce23865d9099 --- /dev/null +++ b/official/projects/unified_detector/requirements.txt @@ -0,0 +1,8 @@ +tf-nightly +gin-config +opencv-python==4.2.0.32 +absl-py>=1.0.0 +shapely>=1.8.1 +apache_beam>=2.37.0 +matplotlib>=3.5.1 +notebook>=6.4.10 diff --git a/official/projects/unified_detector/run_inference.py b/official/projects/unified_detector/run_inference.py new file mode 100644 index 0000000000000000000000000000000000000000..13a0c67c7ec4f81f18f1cb1acfadb685eb767316 --- /dev/null +++ b/official/projects/unified_detector/run_inference.py @@ -0,0 +1,222 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +r"""A binary to run unified detector.""" + +import json +import os +from typing import Any, Dict, List, Sequence, Tuple + +from absl import app +from absl import flags +from absl import logging + +import cv2 +import gin +import numpy as np +import tensorflow as tf +import tqdm + +from official.projects.unified_detector import external_configurables # pylint: disable=unused-import +from official.projects.unified_detector.modeling import universal_detector +from official.projects.unified_detector.utils import utilities + + +# Group two lines into the same paragraph if their affinity score is higher than this. +_PARA_GROUP_THR = 0.5 + + +# MODEL spec +_GIN_FILE = flags.DEFINE_string( + 'gin_file', None, 'Path to the Gin file that defines the model.') +_CKPT_PATH = flags.DEFINE_string( + 'ckpt_path', None, 'Path to the checkpoint directory.') +_IMG_SIZE = flags.DEFINE_integer( + 'img_size', 1024, 'Size of the image fed to the model.') + +# Input & Output +# Note that all images specified by `img_file` and `img_dir` will be processed. +_IMG_FILE = flags.DEFINE_multi_string('img_file', [], 'Paths to the images.') +_IMG_DIR = flags.DEFINE_multi_string( + 'img_dir', [], 'Paths to the image directories.') +_OUTPUT_PATH = flags.DEFINE_string('output_path', None, 'Path for the output.') +_VIS_DIR = flags.DEFINE_string( + 'vis_dir', None, 'Path for the visualization output.') + + +def _preprocess(raw_image: np.ndarray) -> Tuple[np.ndarray, float]: + """Converts a raw image to a padded, resized, normalized ndarray and scale ratio.""" + # (1) convert to tf.Tensor and float32. + img_tensor = tf.convert_to_tensor(raw_image, dtype=tf.float32) + + # (2) pad to square. 
+ height, width = img_tensor.shape[:2] + maximum_side = tf.maximum(height, width) + height_pad = maximum_side - height + width_pad = maximum_side - width + img_tensor = tf.pad( + img_tensor, [[0, height_pad], [0, width_pad], [0, 0]], + constant_values=127) + ratio = maximum_side / _IMG_SIZE.value + # (3) resize long side to the maximum length. + img_tensor = tf.image.resize( + img_tensor, (_IMG_SIZE.value, _IMG_SIZE.value)) + img_tensor = tf.cast(img_tensor, tf.uint8) + + # (4) normalize + img_tensor = utilities.normalize_image_to_range(img_tensor) + + # (5) Add batch dimension and return as numpy array. + return tf.expand_dims(img_tensor, 0).numpy(), float(ratio) + + +def load_model() -> tf.keras.layers.Layer: + """Parses the Gin config and restores the model weights from the checkpoint.""" + gin.parse_config_file(_GIN_FILE.value) + model = universal_detector.UniversalDetector() + ckpt = tf.train.Checkpoint(model=model) + ckpt_path = _CKPT_PATH.value + logging.info('Load ckpt from: %s', ckpt_path) + ckpt.restore(ckpt_path).expect_partial() + return model + + +def inference(img_file: str, + model: tf.keras.layers.Layer) -> List[Dict[str, Any]]: + """Runs inference on one image and returns the detected paragraphs.""" + img = cv2.cvtColor(cv2.imread(img_file), cv2.COLOR_BGR2RGB) + img_ndarray, ratio = _preprocess(img) + + output_dict = model.serve(img_ndarray) + class_tensor = output_dict['classes'].numpy() + mask_tensor = output_dict['masks'].numpy() + group_tensor = output_dict['groups'].numpy() + + indices = np.where(class_tensor[0])[0].tolist() # indices of positive slots. + mask_list = [ + mask_tensor[0, :, :, index] for index in indices] # List of mask ndarray.
+ + # Form lines and words + lines = [] + line_indices = [] + for index, mask in tqdm.tqdm(zip(indices, mask_list)): + line = { + 'words': [], + 'text': '', + } + + contours, _ = cv2.findContours( + (mask > 0.).astype(np.uint8), + cv2.RETR_TREE, + cv2.CHAIN_APPROX_SIMPLE)[-2:] + for contour in contours: + if (isinstance(contour, np.ndarray) and + len(contour.shape) == 3 and + contour.shape[0] > 2 and + contour.shape[1] == 1 and + contour.shape[2] == 2): + cnt_list = (contour[:, 0] * ratio).astype(np.int32).tolist() + line['words'].append({'text': '', 'vertices': cnt_list}) + else: + logging.error('Invalid contour: %s, discarded', str(contour)) + if line['words']: + lines.append(line) + line_indices.append(index) + + # Form paragraphs + line_grouping = utilities.DisjointSet(len(line_indices)) + affinity = group_tensor[0][line_indices][:, line_indices] + for i1, i2 in zip(*np.where(affinity > _PARA_GROUP_THR)): + line_grouping.union(i1, i2) + + line_groups = line_grouping.to_group() + paragraphs = [] + for line_group in line_groups: + paragraph = {'lines': []} + for id_ in line_group: + paragraph['lines'].append(lines[id_]) + if paragraph['lines']: + paragraphs.append(paragraph) + + return paragraphs + + +def main(argv: Sequence[str]) -> None: + if len(argv) > 1: + raise app.UsageError('Too many command-line arguments.') + + # Get list of images + img_lists = [] + img_lists.extend(_IMG_FILE.value) + for img_dir in _IMG_DIR.value: + img_lists.extend(tf.io.gfile.glob(os.path.join(img_dir, '*'))) + + logging.info('Total number of input images: %d', len(img_lists)) + + model = load_model() + + vis_dis = _VIS_DIR.value + + output = {'annotations': []} + for img_file in tqdm.tqdm(img_lists): + output['annotations'].append({ + 'image_id': img_file.split('/')[-1].split('.')[0], + 'paragraphs': inference(img_file, model), + }) + + if vis_dis: + key = output['annotations'][-1]['image_id'] + paragraphs = output['annotations'][-1]['paragraphs'] + img = cv2.cvtColor(cv2.imread(img_file),
 cv2.COLOR_BGR2RGB) + word_bnds = [] + line_bnds = [] + para_bnds = [] + for paragraph in paragraphs: + paragraph_points_list = [] + for line in paragraph['lines']: + line_points_list = [] + for word in line['words']: + word_bnds.append( + np.array(word['vertices'], np.int32).reshape((-1, 1, 2))) + line_points_list.extend(word['vertices']) + paragraph_points_list.extend(line_points_list) + + line_points = np.array(line_points_list, np.int32) # (N,2) + left = int(np.min(line_points[:, 0])) + top = int(np.min(line_points[:, 1])) + right = int(np.max(line_points[:, 0])) + bottom = int(np.max(line_points[:, 1])) + line_bnds.append( + np.array([[[left, top]], [[right, top]], [[right, bottom]], + [[left, bottom]]], np.int32)) + para_points = np.array(paragraph_points_list, np.int32) # (N,2) + left = int(np.min(para_points[:, 0])) + top = int(np.min(para_points[:, 1])) + right = int(np.max(para_points[:, 0])) + bottom = int(np.max(para_points[:, 1])) + para_bnds.append( + np.array([[[left, top]], [[right, top]], [[right, bottom]], + [[left, bottom]]], np.int32)) + + for name, bnds in zip(['paragraph', 'line', 'word'], + [para_bnds, line_bnds, word_bnds]): + # cv2.polylines draws in place, so draw on a copy to keep each + # saved image limited to a single level of boxes. + vis = cv2.polylines(img.copy(), bnds, True, (0, 0, 255), 2) + cv2.imwrite(os.path.join(vis_dis, f'{key}-{name}.jpg'), + cv2.cvtColor(vis, cv2.COLOR_RGB2BGR)) + + with tf.io.gfile.GFile(_OUTPUT_PATH.value, mode='w') as f: + f.write(json.dumps(output, ensure_ascii=False, indent=2)) + + +if __name__ == '__main__': + flags.mark_flags_as_required(['gin_file', 'ckpt_path', 'output_path']) + app.run(main) diff --git a/official/projects/unified_detector/tasks/all_models.py b/official/projects/unified_detector/tasks/all_models.py new file mode 100644 index 0000000000000000000000000000000000000000..cc75f37956cca273a64a399647fd101ed13178e4 --- /dev/null +++ b/official/projects/unified_detector/tasks/all_models.py @@ -0,0 +1,23 @@ +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved.
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Import all models. + +All model files are imported here so that they can be referenced in Gin. Also, +importing here avoids making ocr_task.py too messy. +""" + +# pylint: disable=unused-import + +from official.projects.unified_detector.modeling import universal_detector diff --git a/official/projects/unified_detector/tasks/ocr_task.py b/official/projects/unified_detector/tasks/ocr_task.py new file mode 100644 index 0000000000000000000000000000000000000000..8462fd4284ccf1fdec5ad80889a2c5292aea1d1d --- /dev/null +++ b/official/projects/unified_detector/tasks/ocr_task.py @@ -0,0 +1,108 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Task definition for OCR.""" + +from typing import Callable, Dict, Optional, Sequence, Tuple, Union + +import gin +import tensorflow as tf + +from official.core import base_task +from official.core import config_definitions as cfg +from official.core import task_factory +from official.projects.unified_detector.configs import ocr_config +from official.projects.unified_detector.data_loaders import input_reader +from official.projects.unified_detector.tasks import all_models # pylint: disable=unused-import +from official.projects.unified_detector.utils import typing + +NestedTensorDict = typing.NestedTensorDict +ModelType = Union[tf.keras.layers.Layer, tf.keras.Model] + + +@task_factory.register_task_cls(ocr_config.OcrTaskConfig) +@gin.configurable +class OcrTask(base_task.Task): + """Defines the OCR training task.""" + + def __init__(self, + params: cfg.TaskConfig, + logging_dir: Optional[str] = None, + name: Optional[str] = None, + model_fn: Callable[..., ModelType] = gin.REQUIRED): + super().__init__(params, logging_dir, name) + self._model_fn = model_fn + # Per-instance list, so loss names are not shared across task instances. + self._loss_items = [] + + def build_model(self) -> ModelType: + """Builds and returns the model, and records its loss items.""" + model = self._model_fn() + self._loss_items.extend(model.loss_items) + return model + + def build_inputs( + self, + params: cfg.DataConfig, + input_context: Optional[tf.distribute.InputContext] = None + ) -> tf.data.Dataset: + """Build the tf.data.Dataset instance.""" + return input_reader.InputFn(is_training=params.is_training)({}, + input_context) + + def build_metrics(self, + training: bool = True) -> Sequence[tf.keras.metrics.Metric]: + """Build the metrics (currently, only for loss summaries in TensorBoard).""" + del training + metrics = [] + # Add loss items + for name in self._loss_items: + metrics.append(tf.keras.metrics.Mean(name, dtype=tf.float32)) + # TODO(longshangbang): add evaluation metrics + return metrics + + def train_step( + self, + inputs:
Tuple[NestedTensorDict, NestedTensorDict], + model: ModelType, + optimizer: tf.keras.optimizers.Optimizer, + metrics: Optional[Sequence[tf.keras.metrics.Metric]] = None + ) -> Dict[str, tf.Tensor]: + features, labels = inputs + input_dict = {"features": features} + if self.task_config.model_call_needs_labels: + input_dict["labels"] = labels + + is_mixed_precision = isinstance(optimizer, + tf.keras.mixed_precision.LossScaleOptimizer) + + with tf.GradientTape() as tape: + outputs = model(**input_dict, training=True) + loss, loss_dict = model.compute_losses(labels=labels, outputs=outputs) + loss = loss / tf.distribute.get_strategy().num_replicas_in_sync + if is_mixed_precision: + loss = optimizer.get_scaled_loss(loss) + + tvars = model.trainable_variables + grads = tape.gradient(loss, tvars) + if is_mixed_precision: + grads = optimizer.get_unscaled_gradients(grads) + + optimizer.apply_gradients(list(zip(grads, tvars))) + + logs = {"loss": loss} + if metrics: + for m in metrics: + m.update_state(loss_dict[m.name]) + return logs diff --git a/official/projects/unified_detector/train.py b/official/projects/unified_detector/train.py new file mode 100644 index 0000000000000000000000000000000000000000..03ce0e2a18ab907a0d8ff427c3a8ab3eef498206 --- /dev/null +++ b/official/projects/unified_detector/train.py @@ -0,0 +1,70 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""TensorFlow Model Garden Vision training driver.""" + +from absl import app +from absl import flags +import gin + +from official.common import distribute_utils +from official.common import flags as tfm_flags +from official.core import task_factory +from official.core import train_lib +from official.core import train_utils +from official.modeling import performance +# pylint: disable=unused-import +from official.projects.unified_detector import registry_imports +# pylint: enable=unused-import + +FLAGS = flags.FLAGS + + +def main(_): + gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) + params = train_utils.parse_configuration(FLAGS) + model_dir = FLAGS.model_dir + if 'train' in FLAGS.mode: + # Pure eval modes do not output yaml files. Otherwise continuous eval job + # may race against the train job for writing the same file. + train_utils.serialize_config(params, model_dir) + + # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16' + # can have significant impact on model speeds by utilizing float16 in case of + # GPUs, and bfloat16 in the case of TPUs. 
loss_scale takes effect only when + # dtype is float16 + if params.runtime.mixed_precision_dtype: + performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) + distribution_strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=params.runtime.distribution_strategy, + all_reduce_alg=params.runtime.all_reduce_alg, + num_gpus=params.runtime.num_gpus, + tpu_address=params.runtime.tpu) + with distribution_strategy.scope(): + task = task_factory.get_task(params.task, logging_dir=model_dir) + + train_lib.run_experiment( + distribution_strategy=distribution_strategy, + task=task, + mode=FLAGS.mode, + params=params, + model_dir=model_dir) + + train_utils.save_gin_config(FLAGS.mode, model_dir) + + +if __name__ == '__main__': + tfm_flags.define_flags() + flags.mark_flags_as_required(['experiment', 'mode', 'model_dir']) + app.run(main) diff --git a/official/projects/unified_detector/utils/typing.py b/official/projects/unified_detector/utils/typing.py new file mode 100644 index 0000000000000000000000000000000000000000..00bcb50e95610f9d0d055e3effd7f1076ca74c54 --- /dev/null +++ b/official/projects/unified_detector/utils/typing.py @@ -0,0 +1,28 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Typing extension.""" + +from typing import Dict, Union + +import numpy as np +import tensorflow as tf + +NpDict = Dict[str, np.ndarray] +FeaturesAndLabelsType = Dict[str, Dict[str, tf.Tensor]] +TensorDict = Dict[Union[str, int], tf.Tensor] +NestedTensorDict = Dict[ + Union[str, int], + Union[tf.Tensor, + TensorDict]] diff --git a/official/projects/unified_detector/utils/utilities.py b/official/projects/unified_detector/utils/utilities.py new file mode 100644 index 0000000000000000000000000000000000000000..2a8c20a8711a28a9e4e30fa76b2d3605665c9103 --- /dev/null +++ b/official/projects/unified_detector/utils/utilities.py @@ -0,0 +1,235 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Utility functions.""" + +import collections +from typing import List, Optional, Union + +import tensorflow as tf + + +def resolve_shape( + tensor: tf.Tensor, + resolve_batch_size: bool = True) -> List[Union[tf.Tensor, int]]: + """Fully resolves the shape of the tensor. + + Args: + tensor: The tensor for which to resolve the shape. + resolve_batch_size: If True, fully resolve the batch size. If False, + return the batch size if it is statically known and -1 otherwise. This + can be more efficient when converting a model to TFLite. + + Returns: + A list containing the static dimension where possible and the dynamic + dimension otherwise. 
+ """ + with tf.name_scope('resolve_shape'): + shape = tensor.get_shape().as_list() + if None in shape: + shape_dynamic = tf.shape(tensor) + if shape[0] is None: + shape[0] = shape_dynamic[0] if resolve_batch_size else -1 + for i in range(1, len(shape)): + if shape[i] is None: + shape[i] = shape_dynamic[i] + return shape + + +def set_shape_dim(tensor: tf.Tensor, index: int, size: int) -> None: + """Set value of index-th element of tensor shape to size.""" + shape = tensor.get_shape().as_list() + if len(shape) <= index: + raise ValueError( + 'Tensor rank must be at least %d. Got %d' % (index + 1, len(shape))) + shape[index] = size + tensor.set_shape(shape) + + +def truncate_or_pad(input_tensor: tf.Tensor, + new_size: int, + axis: int = 1, + constant_value: Union[int, float] = 0) -> tf.Tensor: + """Truncates or pads (with `constant_value`) `axis` of the input to `new_size`.""" + rank = len(input_tensor.shape) + + if rank <= axis: + raise ValueError( + 'Tensor rank must be at least %d. Got %d' % (axis + 1, rank)) + + orig_size = tf.shape(input_tensor)[axis] + + def _new_size(dim): + if dim == axis: + return new_size + n = tf.shape(input_tensor)[dim] + return -1 if n is None else n + + def _truncate(): + begin = [0] * rank + size = [_new_size(dim) for dim in range(rank)] + return tf.slice(input_tensor, begin, size) + + def _pad(): + padding = [[0, 0] for _ in range(rank)] + padding[axis][1] = new_size - orig_size + return tf.pad(input_tensor, padding, constant_values=constant_value) + + output = tf.cond(orig_size >= new_size, _truncate, _pad) + if isinstance(new_size, int): + set_shape_dim(output, axis, new_size) + return output + + +def rotate_rboxes90(rboxes: tf.Tensor, + image_width: int, + image_height: int, + rotation_count: int = 1) -> tf.Tensor: + """Rotate oriented rectangles counter-clockwise by multiples of 90 degrees.""" + image_width = tf.cast(image_width, dtype=tf.float32) + image_height = tf.cast(image_height, dtype=tf.float32) + + rotation_count = rotation_count % 4 + x,
y, w, h, angle = tf.split(rboxes, 5, axis=1) + + if rotation_count == 0: + return rboxes + elif rotation_count == 1: + angle = tf.where(angle < -90.0, angle + 270, angle - 90) + return tf.concat([y, image_width - x - 1, w, h, angle], axis=1) + elif rotation_count == 2: + angle = tf.where(angle < 0.0, angle + 180, angle - 180) + return tf.concat([image_width - x - 1, image_height - y - 1, w, h, angle], + axis=1) + else: + angle = tf.where(angle > 90.0, angle - 270, angle + 90) + return tf.concat([image_height - y - 1, x, w, h, angle], axis=1) + + +def normalize_image_to_range(image: tf.Tensor, + original_minval: int = 0, + original_maxval: int = 255, + target_minval: float = -1.0, + target_maxval: float = 1.0) -> tf.Tensor: + """Normalizes pixel values in the image. + + Moves the pixel values from the current [original_minval, original_maxval] + range to the [target_minval, target_maxval] range. + + Args: + image: A tensor of shape [height, width, channels]. Input will be converted + to float32 type before normalization. + original_minval: current image minimum value. + original_maxval: current image maximum value. + target_minval: target image minimum value. + target_maxval: target image maximum value. + + Returns: + A float tensor with the same shape as the input image. 
+ """ + if image.dtype != tf.float32: + image = tf.cast(image, dtype=tf.float32) + + original_minval = float(original_minval) + original_maxval = float(original_maxval) + target_minval = float(target_minval) + target_maxval = float(target_maxval) + image = tf.subtract(image, original_minval) + image = tf.multiply(image, (target_maxval - target_minval) / + (original_maxval - original_minval)) + image = tf.add(image, target_minval) + + return image + + +def get_padding_mask_from_valid_lengths( + valid_lengths: tf.Tensor, + max_length: Optional[int] = None, + dtype: tf.dtypes.DType = tf.bool) -> tf.Tensor: + """Gets a 2D mask of the padded region from valid lengths. + + Args: + valid_lengths: A 1D int tensor containing the valid length of each row. + max_length: (optional, int) The maximum length of each row. If `None`, the + maximum value in `valid_lengths` will be used. + dtype: The output dtype. + + Returns: + 2D padded region mask. + """ + with tf.name_scope('get_padding_mask_from_valid_lengths'): + if max_length is None: + max_length = tf.reduce_max(valid_lengths) + padding_mask = tf.logical_not(tf.sequence_mask(valid_lengths, max_length)) + + return tf.cast(padding_mask, dtype=dtype) + + +def get_transformer_attention_bias(padding_mask: tf.Tensor) -> tf.Tensor: + """Gets attention bias. + + Bias tensor that is added to the pre-softmax multi-headed attention logits, + which has shape [batch_size, num_attention_heads, max_length, max_length]. + The tensor is zero at non-padded locations, and -1e9 (a stand-in for + negative infinity) at padded locations. + + Args: + padding_mask: A [batch_size, max_length] float tensor, the padding mask. + + Returns: + Attention bias tensor of shape [batch_size, 1, 1, max_length]. + """ + with tf.name_scope('attention_bias'): + # Uses -1e9 to represent -infinity. We do not actually use -Inf, since we + # want to be able to multiply these values by zero to get zero.
+ # (-Inf * 0 = NaN) + attention_bias = padding_mask * -1e9 + attention_bias = tf.expand_dims( + tf.expand_dims(attention_bias, axis=1), axis=1) + + return attention_bias + + +class DisjointSet: + """A disjoint set implementation.""" + + def __init__(self, num_elements: int): + self._num_elements = num_elements + self._parent = list(range(num_elements)) + + def find(self, item: int) -> int: + if self._parent[item] == item: + return item + else: + self._parent[item] = self.find(self._parent[item]) + return self._parent[item] + + def union(self, i1: int, i2: int) -> None: + r1 = self.find(i1) + r2 = self.find(i2) + self._parent[r1] = r2 + + def to_group(self) -> List[List[int]]: + """Return the grouping results. + + Returns: + A list of integer lists. Each list represents the IDs belonging to the + same group. + """ + groups = collections.defaultdict(list) + for i in range(self._num_elements): + r = self.find(i) + groups[r].append(i) + return list(groups.values()) diff --git a/official/projects/video_ssl/README.md b/official/projects/video_ssl/README.md new file mode 100644 index 0000000000000000000000000000000000000000..86b00d819cc046818857aca6d9532cf25cef5baa --- /dev/null +++ b/official/projects/video_ssl/README.md @@ -0,0 +1,59 @@ +# Spatiotemporal Contrastive Video Representation Learning + +[![Paper](http://img.shields.io/badge/Paper-arXiv.2008.03800-B3181B?logo=arXiv)](https://arxiv.org/abs/2008.03800) + +This repository is the official TF2 implementation of [Spatiotemporal Contrastive Video Representation Learning](https://arxiv.org/abs/2008.03800). + +
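CVRL's training signal is a contrastive loss over pairs of clips. As a rough, framework-free illustration of that idea (not the TensorFlow code in this project), the sketch below computes an InfoNCE-style loss in plain Python. It assumes `view_a[i]` and `view_b[i]` are embeddings of two augmented clips from the same video, and treats every other clip in `view_b` as a negative for anchor `view_a[i]`. The function names, the list-of-floats embeddings, and the default `temperature=0.1` are illustrative assumptions; the repository's actual loss (for example, whether it is symmetric or draws negatives from both views) may differ.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scales a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def info_nce_loss(view_a: Sequence[Sequence[float]],
                  view_b: Sequence[Sequence[float]],
                  temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch of clip-embedding pairs (illustrative).

    view_a[i] and view_b[i] are assumed to embed two augmented clips of the
    same video (the positive pair); view_b[j] for j != i comes from another
    video and acts as a negative for anchor view_a[i].
    """
    if len(view_a) != len(view_b) or not view_a:
        raise ValueError("Expected two non-empty batches of equal size.")
    anchors = [l2_normalize(v) for v in view_a]
    candidates = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Cosine similarity to every candidate clip, sharpened by the temperature.
        logits = [sum(p * q for p, q in zip(anchor, c)) / temperature
                  for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Softmax cross-entropy with the same-video clip (index i) as the target.
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Two videos, two clips each: correctly paired views vs. deliberately swapped ones.
clips_a = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce_loss(clips_a, [[0.9, 0.1], [0.1, 0.9]]))  # close to 0 (about 1.5e-4)
print(info_nce_loss(clips_a, [[0.1, 0.9], [0.9, 0.1]]))  # about 8.83
```

Lowering `temperature` sharpens the softmax, so other videos whose clips look similar to the anchor are penalized more strongly.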
+ +## Description + +We present a self-supervised Contrastive Video Representation Learning (CVRL) +method to learn spatiotemporal visual representations from unlabeled videos. Our +representations are learned using a contrastive loss, where two augmented clips +from the same short video are pulled together in the embedding space, while +clips from different videos are pushed away. CVRL significantly closes the gap +between unsupervised and supervised video representation learning. + +Here we release the code and pre-trained models. + + +## Experimental Results + +### Kinetics-600 top-1 linear classification accuracy + +
+ + +## Pre-trained Model Checkpoints + +We provide model checkpoints pre-trained on unlabeled RGB videos from +Kinetics-400 and Kinetics-600. All models are trained from scratch with random +initialization. + +We also provide a baseline model checkpoint of "ImageNet inflated" we used in +the paper. The model has the same architecture as 3D-ResNet-50 (R3D-50), with +model weights inflated from a 2D ResNet-50 pre-trained on ImageNet. + +| Model | Parameters | Dataset | Epochs | K400 Linear Eval. | K600 Linear Eval. | Checkpoint | +| :--------------: | :----: | :--: | :--: |:-----------: | :----------: | :----------: | +| R3D-50 (1x) | 31.7M | ImageNet | - | 53.5% | 54.7% | [ckpt (127 MB)](https://storage.googleapis.com/tf_model_garden/vision/cvrl/imagenet.tar.gz) | +| R3D-50 (1x) | 31.7M | Kinetics-400 | 200 | 63.8% | - | [ckpt (127 MB)](https://storage.googleapis.com/tf_model_garden/vision/cvrl/r3d_1x_k400_200ep.tar.gz) | +| R3D-50 (1x) | 31.7M | Kinetics-400 | 800 | 66.1% | - | [ckpt (127 MB)](https://storage.googleapis.com/tf_model_garden/vision/cvrl/r3d_1x_k400_800ep.tar.gz) | +| R3D-50 (1x) | 31.7M | Kinetics-600 | 800 | 68.5% | 70.4% | [ckpt (127 MB)](https://storage.googleapis.com/tf_model_garden/vision/cvrl/r3d_1x_k600_800ep.tar.gz) | + + +## Citation + +``` +@inproceedings{qian2021spatiotemporal, + title={Spatiotemporal contrastive video representation learning}, + author={Qian, Rui and Meng, Tianjian and Gong, Boqing and Yang, Ming-Hsuan and Wang, Huisheng and Belongie, Serge and Cui, Yin}, + booktitle={CVPR}, + year={2021} +} +``` diff --git a/official/projects/video_ssl/configs/__init__.py b/official/projects/video_ssl/configs/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..976989d6c849e215702132b12c2c88f290d62646 --- /dev/null +++ b/official/projects/video_ssl/configs/__init__.py @@ -0,0 +1,17 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Configs package definition."""
+
+from official.projects.video_ssl.configs import video_ssl
diff --git a/official/vision/beta/projects/video_ssl/configs/experiments/cvrl_linear_eval_k600.yaml b/official/projects/video_ssl/configs/experiments/cvrl_linear_eval_k600.yaml
similarity index 100%
rename from official/vision/beta/projects/video_ssl/configs/experiments/cvrl_linear_eval_k600.yaml
rename to official/projects/video_ssl/configs/experiments/cvrl_linear_eval_k600.yaml
diff --git a/official/vision/beta/projects/video_ssl/configs/experiments/cvrl_pretrain_k600_200ep.yaml b/official/projects/video_ssl/configs/experiments/cvrl_pretrain_k600_200ep.yaml
similarity index 100%
rename from official/vision/beta/projects/video_ssl/configs/experiments/cvrl_pretrain_k600_200ep.yaml
rename to official/projects/video_ssl/configs/experiments/cvrl_pretrain_k600_200ep.yaml
diff --git a/official/vision/beta/projects/video_ssl/configs/video_ssl.py b/official/projects/video_ssl/configs/video_ssl.py
similarity index 96%
rename from official/vision/beta/projects/video_ssl/configs/video_ssl.py
rename to official/projects/video_ssl/configs/video_ssl.py
index b2dcb22cef59ad3ddc3a484c3d787f53dc687df0..80bf506d7f77cd489a37dc4973e27cfc426c4774 100644
--- a/official/vision/beta/projects/video_ssl/configs/video_ssl.py
+++ b/official/projects/video_ssl/configs/video_ssl.py
@@ -1,4 +1,4 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -12,7 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-# Lint as: python3
 """Video classification configuration definition."""
 
 
@@ -20,8 +19,8 @@ import dataclasses
 
 from official.core import config_definitions as cfg
 from official.core import exp_factory
-from official.vision.beta.configs import common
-from official.vision.beta.configs import video_classification
+from official.vision.configs import common
+from official.vision.configs import video_classification
 
 
 Losses = video_classification.Losses
diff --git a/official/vision/beta/projects/video_ssl/configs/video_ssl_test.py b/official/projects/video_ssl/configs/video_ssl_test.py
similarity index 91%
rename from official/vision/beta/projects/video_ssl/configs/video_ssl_test.py
rename to official/projects/video_ssl/configs/video_ssl_test.py
index d6e3eeac2f8849a4d2a92bac97cd2c05949f0da4..3b11ddec1302c115d03a6425ba80535ae2e23562 100644
--- a/official/vision/beta/projects/video_ssl/configs/video_ssl_test.py
+++ b/official/projects/video_ssl/configs/video_ssl_test.py
@@ -1,4 +1,4 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -12,16 +12,15 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-# Lint as: python3
 
 # pylint: disable=unused-import
 from absl.testing import parameterized
 import tensorflow as tf
 
+from official import vision
 from official.core import config_definitions as cfg
 from official.core import exp_factory
-from official.vision import beta
-from official.vision.beta.projects.video_ssl.configs import video_ssl as exp_cfg
+from official.projects.video_ssl.configs import video_ssl as exp_cfg
 
 
 class VideoClassificationConfigTest(tf.test.TestCase, parameterized.TestCase):
diff --git a/official/vision/beta/projects/video_ssl/dataloaders/video_ssl_input.py b/official/projects/video_ssl/dataloaders/video_ssl_input.py
similarity index 94%
rename from official/vision/beta/projects/video_ssl/dataloaders/video_ssl_input.py
rename to official/projects/video_ssl/dataloaders/video_ssl_input.py
index dc7bd88ed7a13caf0d67c6fe6d6a58e5bb1b3837..f49f712342b419d8380d792653f658a22f11d6b5 100644
--- a/official/vision/beta/projects/video_ssl/dataloaders/video_ssl_input.py
+++ b/official/projects/video_ssl/dataloaders/video_ssl_input.py
@@ -1,4 +1,4 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -12,18 +12,16 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-# Lint as: python3
 """Parser for video and label datasets."""
 
 from typing import Dict, Optional, Tuple
 
 from absl import logging
 import tensorflow as tf
-
-from official.vision.beta.dataloaders import video_input
-from official.vision.beta.ops import preprocess_ops_3d
-from official.vision.beta.projects.video_ssl.configs import video_ssl as exp_cfg
-from official.vision.beta.projects.video_ssl.ops import video_ssl_preprocess_ops
+from official.projects.video_ssl.configs import video_ssl as exp_cfg
+from official.projects.video_ssl.ops import video_ssl_preprocess_ops
+from official.vision.dataloaders import video_input
+from official.vision.ops import preprocess_ops_3d
 
 IMAGE_KEY = 'image/encoded'
 LABEL_KEY = 'clip/label/index'
@@ -130,6 +128,9 @@ def _process_image(image: tf.Tensor,
 
   # Self-supervised pre-training augmentations.
   if is_training and is_ssl:
+    if zero_centering_image:
+      image_1 = 0.5 * (image_1 + 1.0)
+      image_2 = 0.5 * (image_2 + 1.0)
     # Temporally consistent color jittering.
     image_1 = video_ssl_preprocess_ops.random_color_jitter_3d(image_1)
     image_2 = video_ssl_preprocess_ops.random_color_jitter_3d(image_2)
@@ -141,6 +142,8 @@ def _process_image(image: tf.Tensor,
     image_2 = video_ssl_preprocess_ops.random_solarization(image_2)
     image = tf.concat([image_1, image_2], axis=0)
     image = tf.clip_by_value(image, 0., 1.)
+    if zero_centering_image:
+      image = 2 * (image - 0.5)
 
   return image
 
@@ -235,7 +238,8 @@ class Parser(video_input.Parser):
         stride=self._stride,
         num_test_clips=self._num_test_clips,
         min_resize=self._min_resize,
-        crop_size=self._crop_size)
+        crop_size=self._crop_size,
+        zero_centering_image=self._zero_centering_image)
     image = tf.cast(image, dtype=self._dtype)
     features = {'image': image}
 
@@ -257,7 +261,8 @@ class Parser(video_input.Parser):
         num_test_clips=self._num_test_clips,
         min_resize=self._min_resize,
         crop_size=self._crop_size,
-        num_crops=self._num_crops)
+        num_crops=self._num_crops,
+        zero_centering_image=self._zero_centering_image)
     image = tf.cast(image, dtype=self._dtype)
     features = {'image': image}
 
diff --git a/official/vision/beta/projects/video_ssl/dataloaders/video_ssl_input_test.py b/official/projects/video_ssl/dataloaders/video_ssl_input_test.py
similarity index 93%
rename from official/vision/beta/projects/video_ssl/dataloaders/video_ssl_input_test.py
rename to official/projects/video_ssl/dataloaders/video_ssl_input_test.py
index abf6478968b0ac5d28caa691f57ac38407709f03..951f5bd0ec3fcef25fd4a7f06125d27330462463 100644
--- a/official/vision/beta/projects/video_ssl/dataloaders/video_ssl_input_test.py
+++ b/official/projects/video_ssl/dataloaders/video_ssl_input_test.py
@@ -1,4 +1,4 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -12,7 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-# Lint as: python3
 
 import io
 
@@ -21,8 +20,8 @@ import numpy as np
 from PIL import Image
 import tensorflow as tf
 
-from official.vision.beta.projects.video_ssl.configs import video_ssl as exp_cfg
-from official.vision.beta.projects.video_ssl.dataloaders import video_ssl_input
+from official.projects.video_ssl.configs import video_ssl as exp_cfg
+from official.projects.video_ssl.dataloaders import video_ssl_input
 
 
 AUDIO_KEY = 'features/audio'
diff --git a/official/projects/video_ssl/losses/losses.py b/official/projects/video_ssl/losses/losses.py
new file mode 100644
index 0000000000000000000000000000000000000000..2aa2085b80e4392d055230505b5ccc2852562e0d
--- /dev/null
+++ b/official/projects/video_ssl/losses/losses.py
@@ -0,0 +1,135 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Define losses."""
+
+# Import libraries
+import tensorflow as tf
+from tensorflow.compiler.tf2xla.python import xla
+
+
+def contrastive_loss(hidden,
+                     num_replicas,
+                     normalize_hidden,
+                     temperature,
+                     model,
+                     weight_decay):
+  """Computes contrastive loss.
+
+  Args:
+    hidden: embedding of video clips after projection head.
+    num_replicas: number of distributed replicas.
+    normalize_hidden: whether or not to l2 normalize the hidden vector.
+    temperature: temperature in the InfoNCE contrastive loss.
+    model: keras model for calculating weight decay.
+    weight_decay: weight decay parameter.
+
+  Returns:
+    A loss scalar.
+    The logits for contrastive prediction task.
+    The labels for contrastive prediction task.
+  """
+  large_num = 1e9
+
+  hidden1, hidden2 = tf.split(hidden, num_or_size_splits=2, axis=0)
+  if normalize_hidden:
+    hidden1 = tf.math.l2_normalize(hidden1, -1)
+    hidden2 = tf.math.l2_normalize(hidden2, -1)
+  batch_size = tf.shape(hidden1)[0]
+
+  if num_replicas == 1:
+    # This is the local version
+    hidden1_large = hidden1
+    hidden2_large = hidden2
+    labels = tf.one_hot(tf.range(batch_size), batch_size * 2)
+    masks = tf.one_hot(tf.range(batch_size), batch_size)
+
+  else:
+    # This is the cross-tpu version.
+    hidden1_large = tpu_cross_replica_concat(hidden1, num_replicas)
+    hidden2_large = tpu_cross_replica_concat(hidden2, num_replicas)
+    enlarged_batch_size = tf.shape(hidden1_large)[0]
+    replica_id = tf.cast(tf.cast(xla.replica_id(), tf.uint32), tf.int32)
+    labels_idx = tf.range(batch_size) + replica_id * batch_size
+    labels = tf.one_hot(labels_idx, enlarged_batch_size * 2)
+    masks = tf.one_hot(labels_idx, enlarged_batch_size)
+
+  logits_aa = tf.matmul(hidden1, hidden1_large, transpose_b=True) / temperature
+  logits_aa = logits_aa - tf.cast(masks, logits_aa.dtype) * large_num
+  logits_bb = tf.matmul(hidden2, hidden2_large, transpose_b=True) / temperature
+  logits_bb = logits_bb - tf.cast(masks, logits_bb.dtype) * large_num
+  logits_ab = tf.matmul(hidden1, hidden2_large, transpose_b=True) / temperature
+  logits_ba = tf.matmul(hidden2, hidden1_large, transpose_b=True) / temperature
+
+  loss_a = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
+      labels, tf.concat([logits_ab, logits_aa], 1)))
+  loss_b = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
+      labels, tf.concat([logits_ba, logits_bb], 1)))
+  loss = loss_a + loss_b
+
+  l2_loss = weight_decay * tf.add_n([
+      tf.nn.l2_loss(v)
+      for v in model.trainable_variables
+      if 'kernel' in v.name
+  ])
+
+  total_loss = loss + tf.cast(l2_loss, loss.dtype)
+
+  contrast_prob = tf.nn.softmax(logits_ab)
+  contrast_entropy = - tf.reduce_mean(
+      tf.reduce_sum(contrast_prob * tf.math.log(contrast_prob + 1e-8), -1))
+
+  contrast_acc = tf.equal(tf.argmax(labels, 1), tf.argmax(logits_ab, axis=1))
+  contrast_acc = tf.reduce_mean(tf.cast(contrast_acc, tf.float32))
+
+  return {
+      'total_loss': total_loss,
+      'contrastive_loss': loss,
+      'reg_loss': l2_loss,
+      'contrast_acc': contrast_acc,
+      'contrast_entropy': contrast_entropy,
+  }
+
+
+def tpu_cross_replica_concat(tensor, num_replicas):
+  """Reduce a concatenation of the `tensor` across TPU cores.
+
+  Args:
+    tensor: tensor to concatenate.
+    num_replicas: number of TPU device replicas.
+
+  Returns:
+    Tensor of the same rank as `tensor` with first dimension `num_replicas`
+    times larger.
+  """
+  with tf.name_scope('tpu_cross_replica_concat'):
+    # This creates a tensor that is like the input tensor but has an added
+    # replica dimension as the outermost dimension. On each replica it will
+    # contain the local values and zeros for all other values that need to be
+    # fetched from other replicas.
+    ext_tensor = tf.scatter_nd(
+        indices=[[xla.replica_id()]],
+        updates=[tensor],
+        shape=[num_replicas] + tensor.shape.as_list())
+
+    # As every value is only present on one replica and 0 in all others, adding
+    # them all together will result in the full tensor on all replicas.
+    replica_context = tf.distribute.get_replica_context()
+    ext_tensor = replica_context.all_reduce(tf.distribute.ReduceOp.SUM,
+                                            ext_tensor)
+
+    # Flatten the replica dimension.
+    # The first dimension size will be: tensor.shape[0] * num_replicas
+    # Using [-1] trick to support also scalar input.
+    return tf.reshape(ext_tensor, [-1] + ext_tensor.shape.as_list()[2:])
diff --git a/official/vision/beta/projects/video_ssl/modeling/video_ssl_model.py b/official/projects/video_ssl/modeling/video_ssl_model.py
similarity index 94%
rename from official/vision/beta/projects/video_ssl/modeling/video_ssl_model.py
rename to official/projects/video_ssl/modeling/video_ssl_model.py
index 01a604d9192d1a5ca224b1eaa0cfac75ce22a725..7faf9e6a289b3eb376b4efbb19693869236ea9c5 100644
--- a/official/vision/beta/projects/video_ssl/modeling/video_ssl_model.py
+++ b/official/projects/video_ssl/modeling/video_ssl_model.py
@@ -1,4 +1,4 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -20,9 +20,9 @@ from typing import Mapping, Optional
 import tensorflow as tf
 
 from official.modeling import tf_utils
-from official.vision.beta.modeling import backbones
-from official.vision.beta.modeling import factory_3d as model_factory
-from official.vision.beta.projects.video_ssl.configs import video_ssl as video_ssl_cfg
+from official.projects.video_ssl.configs import video_ssl as video_ssl_cfg
+from official.vision.modeling import backbones
+from official.vision.modeling import factory_3d as model_factory
 
 layers = tf.keras.layers
 
@@ -53,7 +53,7 @@ class VideoSSLModel(tf.keras.Model):
       hidden_dim: `int` number of hidden units in MLP.
       hidden_layer_num: `int` number of hidden layers in MLP.
       hidden_norm_args: `dict` for batchnorm arguments in MLP.
-      projection_dim: `int` number of ouput dimension for MLP.
+      projection_dim: `int` number of output dimension for MLP.
       input_specs: `tf.keras.layers.InputSpec` specs of the input tensor.
       dropout_rate: `float` rate for dropout regularization.
       aggregate_endpoints: `bool` aggregate all end ponits or only use the
diff --git a/official/vision/beta/projects/video_ssl/ops/video_ssl_preprocess_ops.py b/official/projects/video_ssl/ops/video_ssl_preprocess_ops.py
similarity index 99%
rename from official/vision/beta/projects/video_ssl/ops/video_ssl_preprocess_ops.py
rename to official/projects/video_ssl/ops/video_ssl_preprocess_ops.py
index 253798e856b580fa0edd94443dd994c662d31843..f6a2ef3aa311a7e7596a22e95baa62b974d91982 100644
--- a/official/vision/beta/projects/video_ssl/ops/video_ssl_preprocess_ops.py
+++ b/official/projects/video_ssl/ops/video_ssl_preprocess_ops.py
@@ -1,4 +1,4 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -12,10 +12,9 @@ # See the License for the specific language governing permissions and # limitations under the License. - import tensorflow as tf -from official.vision.beta.ops import preprocess_ops_3d -from official.vision.beta.projects.video_ssl.ops import video_ssl_preprocess_ops +from official.projects.video_ssl.ops import video_ssl_preprocess_ops +from official.vision.ops import preprocess_ops_3d class VideoSslPreprocessOpsTest(tf.test.TestCase): diff --git a/official/projects/video_ssl/tasks/__init__.py b/official/projects/video_ssl/tasks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..cb7092512589db9a854ac1127a0694f9ffd61eed --- /dev/null +++ b/official/projects/video_ssl/tasks/__init__.py @@ -0,0 +1,18 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+"""Tasks package definition."""
+
+from official.projects.video_ssl.tasks import linear_eval
+from official.projects.video_ssl.tasks import pretrain
diff --git a/official/vision/beta/projects/video_ssl/tasks/linear_eval.py b/official/projects/video_ssl/tasks/linear_eval.py
similarity index 88%
rename from official/vision/beta/projects/video_ssl/tasks/linear_eval.py
rename to official/projects/video_ssl/tasks/linear_eval.py
index dc245e44cf161db98ab97c1e6febbce2895376d7..5d7849422c765b8832b764b07128054962cde64f 100644
--- a/official/vision/beta/projects/video_ssl/tasks/linear_eval.py
+++ b/official/projects/video_ssl/tasks/linear_eval.py
@@ -1,4 +1,4 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -12,7 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-# Lint as: python3
 """Video ssl linear evaluation task definition."""
 from typing import Any, Optional, List, Tuple
 from absl import logging
@@ -20,9 +19,9 @@ import tensorflow as tf
 
 # pylint: disable=unused-import
 from official.core import task_factory
-from official.vision.beta.projects.video_ssl.configs import video_ssl as exp_cfg
-from official.vision.beta.projects.video_ssl.modeling import video_ssl_model
-from official.vision.beta.tasks import video_classification
+from official.projects.video_ssl.configs import video_ssl as exp_cfg
+from official.projects.video_ssl.modeling import video_ssl_model
+from official.vision.tasks import video_classification
 
 
 @task_factory.register_task_cls(exp_cfg.VideoSSLEvalTask)
diff --git a/official/vision/beta/projects/video_ssl/tasks/pretrain.py b/official/projects/video_ssl/tasks/pretrain.py
similarity index 92%
rename from official/vision/beta/projects/video_ssl/tasks/pretrain.py
rename to official/projects/video_ssl/tasks/pretrain.py
index b82b2624ab3026e8c2bb4fc3eb590b2188fad633..f58db11ce5836ef0ca0912bf13eea928bb97f200 100644
--- a/official/vision/beta/projects/video_ssl/tasks/pretrain.py
+++ b/official/projects/video_ssl/tasks/pretrain.py
@@ -1,4 +1,4 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -12,7 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-# Lint as: python3
 """Video ssl pretrain task definition."""
 from absl import logging
 import tensorflow as tf
@@ -20,12 +19,13 @@ import tensorflow as tf
 # pylint: disable=unused-import
 from official.core import input_reader
 from official.core import task_factory
-from official.vision.beta.modeling import factory_3d
-from official.vision.beta.projects.video_ssl.configs import video_ssl as exp_cfg
-from official.vision.beta.projects.video_ssl.dataloaders import video_ssl_input
-from official.vision.beta.projects.video_ssl.losses import losses
-from official.vision.beta.projects.video_ssl.modeling import video_ssl_model
-from official.vision.beta.tasks import video_classification
+from official.projects.video_ssl.configs import video_ssl as exp_cfg
+from official.projects.video_ssl.dataloaders import video_ssl_input
+from official.projects.video_ssl.losses import losses
+from official.projects.video_ssl.modeling import video_ssl_model
+from official.vision.modeling import factory_3d
+from official.vision.tasks import video_classification
+# pylint: enable=unused-import
 
 
 @task_factory.register_task_cls(exp_cfg.VideoSSLPretrainTask)
diff --git a/official/vision/beta/projects/video_ssl/tasks/pretrain_test.py b/official/projects/video_ssl/tasks/pretrain_test.py
similarity index 91%
rename from official/vision/beta/projects/video_ssl/tasks/pretrain_test.py
rename to official/projects/video_ssl/tasks/pretrain_test.py
index e6ec40d577e0c73a22e77ed827106d0a76cfaac5..5f0bdbbb38fe5b8f799d6e8cf60d0c53cd83d028 100644
--- a/official/vision/beta/projects/video_ssl/tasks/pretrain_test.py
+++ b/official/projects/video_ssl/tasks/pretrain_test.py
@@ -1,4 +1,4 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -12,7 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-# Lint as: python3
 
 import functools
 import os
@@ -22,12 +21,13 @@ import orbit
 import tensorflow as tf
 
 # pylint: disable=unused-import
+from official import vision
 from official.core import exp_factory
 from official.core import task_factory
 from official.modeling import optimization
-from official.vision import beta
-from official.vision.beta.dataloaders import tfexample_utils
-from official.vision.beta.projects.video_ssl.tasks import pretrain
+from official.projects.video_ssl.tasks import pretrain
+from official.vision.dataloaders import tfexample_utils
+# pylint: enable=unused-import
 
 
 class VideoClassificationTaskTest(tf.test.TestCase):
diff --git a/official/projects/video_ssl/train.py b/official/projects/video_ssl/train.py
new file mode 100644
index 0000000000000000000000000000000000000000..5d1f4e8f58ffabd58ec4ce152626bc583f184042
--- /dev/null
+++ b/official/projects/video_ssl/train.py
@@ -0,0 +1,77 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Training driver."""
+
+from absl import app
+from absl import flags
+import gin
+
+# pylint: disable=unused-import
+from official.common import distribute_utils
+from official.common import flags as tfm_flags
+from official.core import task_factory
+from official.core import train_lib
+from official.core import train_utils
+from official.modeling import performance
+from official.projects.video_ssl.modeling import video_ssl_model
+from official.projects.video_ssl.tasks import linear_eval
+from official.projects.video_ssl.tasks import pretrain
+from official.vision import registry_imports
+# pylint: disable=unused-import
+
+FLAGS = flags.FLAGS
+
+
+def main(_):
+  gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params)
+  params = train_utils.parse_configuration(FLAGS)
+  model_dir = FLAGS.model_dir
+  if 'train' in FLAGS.mode:
+    # Pure eval modes do not output yaml files. Otherwise continuous eval job
+    # may race against the train job for writing the same file.
+    train_utils.serialize_config(params, model_dir)
+
+  if 'train_and_eval' in FLAGS.mode:
+    assert (params.task.train_data.feature_shape ==
+            params.task.validation_data.feature_shape), (
+                f'train {params.task.train_data.feature_shape} != validate '
+                f'{params.task.validation_data.feature_shape}')
+
+  # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16'
+  # can have significant impact on model speeds by utilizing float16 in case of
+  # GPUs, and bfloat16 in the case of TPUs. loss_scale takes effect only when
+  # dtype is float16
+  if params.runtime.mixed_precision_dtype:
+    performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype)
+  distribution_strategy = distribute_utils.get_distribution_strategy(
+      distribution_strategy=params.runtime.distribution_strategy,
+      all_reduce_alg=params.runtime.all_reduce_alg,
+      num_gpus=params.runtime.num_gpus,
+      tpu_address=params.runtime.tpu)
+  with distribution_strategy.scope():
+    task = task_factory.get_task(params.task, logging_dir=model_dir)
+
+  train_lib.run_experiment(
+      distribution_strategy=distribution_strategy,
+      task=task,
+      mode=FLAGS.mode,
+      params=params,
+      model_dir=model_dir)
+
+  train_utils.save_gin_config(FLAGS.mode, model_dir)
+
+if __name__ == '__main__':
+  tfm_flags.define_flags()
+  app.run(main)
diff --git a/official/vision/beta/projects/vit/README.md b/official/projects/vit/README.md
similarity index 100%
rename from official/vision/beta/projects/vit/README.md
rename to official/projects/vit/README.md
diff --git a/official/projects/vit/configs/__init__.py b/official/projects/vit/configs/__init__.py
new file mode 100644
index 0000000000000000000000000000000000000000..fa295acb873151f754c8e345b7d8de3191060cf3
--- /dev/null
+++ b/official/projects/vit/configs/__init__.py
@@ -0,0 +1,17 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Configs package definition."""
+
+from official.projects.vit.configs import image_classification
diff --git a/official/projects/vit/configs/backbones.py b/official/projects/vit/configs/backbones.py
new file mode 100644
index 0000000000000000000000000000000000000000..a9169d58303939eda3f13611c0423ec058e51114
--- /dev/null
+++ b/official/projects/vit/configs/backbones.py
@@ -0,0 +1,57 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Backbones configurations."""
+import dataclasses
+from typing import Optional, Tuple
+
+from official.modeling import hyperparams
+
+
+@dataclasses.dataclass
+class Transformer(hyperparams.Config):
+  """Transformer config."""
+  mlp_dim: int = 1
+  num_heads: int = 1
+  num_layers: int = 1
+  attention_dropout_rate: float = 0.0
+  dropout_rate: float = 0.1
+
+
+@dataclasses.dataclass
+class VisionTransformer(hyperparams.Config):
+  """VisionTransformer config."""
+  model_name: str = 'vit-b16'
+  # pylint: disable=line-too-long
+  pooler: str = 'token'  # 'token', 'gap' or 'none'. If set to 'token', an extra classification token is added to sequence.
+  # pylint: enable=line-too-long
+  representation_size: int = 0
+  hidden_size: int = 1
+  patch_size: int = 16
+  transformer: Transformer = Transformer()
+  init_stochastic_depth_rate: float = 0.0
+  original_init: bool = True
+  pos_embed_shape: Optional[Tuple[int, int]] = None
+
+
+@dataclasses.dataclass
+class Backbone(hyperparams.OneOfConfig):
+  """Configuration for backbones.
+
+  Attributes:
+    type: 'str', type of backbone be used, one the of fields below.
+    vit: vit backbone config.
+  """
+  type: Optional[str] = None
+  vit: VisionTransformer = VisionTransformer()
diff --git a/official/projects/vit/configs/image_classification.py b/official/projects/vit/configs/image_classification.py
new file mode 100644
index 0000000000000000000000000000000000000000..950fe92c4325f465f22445a283910c052d5be064
--- /dev/null
+++ b/official/projects/vit/configs/image_classification.py
@@ -0,0 +1,274 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Image classification configuration definition."""
+import dataclasses
+import os
+from typing import Optional
+
+from official.core import config_definitions as cfg
+from official.core import exp_factory
+from official.core import task_factory
+from official.modeling import hyperparams
+from official.modeling import optimization
+from official.projects.vit.configs import backbones
+from official.vision.configs import common
+from official.vision.configs import image_classification as img_cls_cfg
+from official.vision.tasks import image_classification
+
+# pytype: disable=wrong-keyword-args
+
+DataConfig = img_cls_cfg.DataConfig
+
+
+@dataclasses.dataclass
+class ImageClassificationModel(img_cls_cfg.ImageClassificationModel):
+  """The model config."""
+  backbone: backbones.Backbone = backbones.Backbone(
+      type='vit', vit=backbones.VisionTransformer())
+
+
+@dataclasses.dataclass
+class Losses(hyperparams.Config):
+  loss_weight: float = 1.0
+  one_hot: bool = True
+  label_smoothing: float = 0.0
+  l2_weight_decay: float = 0.0
+  soft_labels: bool = False
+
+
+@dataclasses.dataclass
+class Evaluation(hyperparams.Config):
+  top_k: int = 5
+
+
+@dataclasses.dataclass
+class ImageClassificationTask(cfg.TaskConfig):
+  """The task config. Same as the classification task for convnets."""
+  model: ImageClassificationModel = ImageClassificationModel()
+  train_data: DataConfig = DataConfig(is_training=True)
+  validation_data: DataConfig = DataConfig(is_training=False)
+  losses: Losses = Losses()
+  evaluation: Evaluation = Evaluation()
+  init_checkpoint: Optional[str] = None
+  init_checkpoint_modules: str = 'all'  # all or backbone
+  freeze_backbone: bool = False
+
+
+IMAGENET_TRAIN_EXAMPLES = 1281167
+IMAGENET_VAL_EXAMPLES = 50000
+IMAGENET_INPUT_PATH_BASE = 'imagenet-2012-tfrecord'
+
+# TODO(b/177942984): integrate the experiments to TF-vision.
+task_factory.register_task_cls(ImageClassificationTask)( + image_classification.ImageClassificationTask) + + +@exp_factory.register_config_factory('legacy_deit_imagenet_pretrain') +def image_classification_imagenet_deit_pretrain() -> cfg.ExperimentConfig: + """Image classification on imagenet with vision transformer.""" + train_batch_size = 4096 # originally was 1024 but 4096 better for tpu v3-32 + eval_batch_size = 4096 # originally was 1024 but 4096 better for tpu v3-32 + num_classes = 1001 + label_smoothing = 0.1 + steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size + config = cfg.ExperimentConfig( + task=ImageClassificationTask( + model=ImageClassificationModel( + num_classes=num_classes, + input_size=[224, 224, 3], + kernel_initializer='zeros', + backbone=backbones.Backbone( + type='vit', + vit=backbones.VisionTransformer( + model_name='vit-b16', + representation_size=768, + init_stochastic_depth_rate=0.1, + original_init=False, + transformer=backbones.Transformer( + dropout_rate=0.0, attention_dropout_rate=0.0)))), + losses=Losses( + l2_weight_decay=0.0, + label_smoothing=label_smoothing, + one_hot=False, + soft_labels=True), + train_data=DataConfig( + input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + aug_type=common.Augmentation( + type='randaug', + randaug=common.RandAugment( + magnitude=9, exclude_ops=['Cutout'])), + mixup_and_cutmix=common.MixupAndCutmix( + label_smoothing=label_smoothing)), + validation_data=DataConfig( + input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'), + is_training=False, + global_batch_size=eval_batch_size)), + trainer=cfg.TrainerConfig( + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + train_steps=300 * steps_per_epoch, + validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 
'optimizer': { + 'type': 'adamw', + 'adamw': { + 'weight_decay_rate': 0.05, + 'include_in_weight_decay': r'.*(kernel|weight):0$', + 'gradient_clip_norm': 0.0 + } + }, + 'learning_rate': { + 'type': 'cosine', + 'cosine': { + 'initial_learning_rate': 0.0005 * train_batch_size / 512, + 'decay_steps': 300 * steps_per_epoch, + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 5 * steps_per_epoch, + 'warmup_learning_rate': 0 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + return config + + +@exp_factory.register_config_factory('legacy_vit_imagenet_pretrain') +def image_classification_imagenet_vit_pretrain() -> cfg.ExperimentConfig: + """Image classification on imagenet with vision transformer.""" + train_batch_size = 4096 + eval_batch_size = 4096 + steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size + config = cfg.ExperimentConfig( + task=ImageClassificationTask( + model=ImageClassificationModel( + num_classes=1001, + input_size=[224, 224, 3], + kernel_initializer='zeros', + backbone=backbones.Backbone( + type='vit', + vit=backbones.VisionTransformer( + model_name='vit-b16', representation_size=768))), + losses=Losses(l2_weight_decay=0.0), + train_data=DataConfig( + input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size), + validation_data=DataConfig( + input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'), + is_training=False, + global_batch_size=eval_batch_size)), + trainer=cfg.TrainerConfig( + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + train_steps=300 * steps_per_epoch, + validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'adamw', + 'adamw': { + 'weight_decay_rate': 0.3, + 'include_in_weight_decay': 
r'.*(kernel|weight):0$', + 'gradient_clip_norm': 0.0 + } + }, + 'learning_rate': { + 'type': 'cosine', + 'cosine': { + 'initial_learning_rate': 0.003 * train_batch_size / 4096, + 'decay_steps': 300 * steps_per_epoch, + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 10000, + 'warmup_learning_rate': 0 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + return config + + +@exp_factory.register_config_factory('legacy_vit_imagenet_finetune') +def image_classification_imagenet_vit_finetune() -> cfg.ExperimentConfig: + """Image classification on imagenet with vision transformer.""" + train_batch_size = 512 + eval_batch_size = 512 + steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size + config = cfg.ExperimentConfig( + task=ImageClassificationTask( + model=ImageClassificationModel( + num_classes=1001, + input_size=[384, 384, 3], + backbone=backbones.Backbone( + type='vit', + vit=backbones.VisionTransformer(model_name='vit-b16'))), + losses=Losses(l2_weight_decay=0.0), + train_data=DataConfig( + input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size), + validation_data=DataConfig( + input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'), + is_training=False, + global_batch_size=eval_batch_size)), + trainer=cfg.TrainerConfig( + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + train_steps=20000, + validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9, + 'global_clipnorm': 1.0, + } + }, + 'learning_rate': { + 'type': 'cosine', + 'cosine': { + 'initial_learning_rate': 0.003, + 'decay_steps': 20000, + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 
+ 'task.validation_data.is_training != None' + ]) + + return config diff --git a/official/projects/vit/modeling/nn_blocks.py b/official/projects/vit/modeling/nn_blocks.py new file mode 100644 index 0000000000000000000000000000000000000000..891c9ac2426baab96ccfa26ce93407108b472ce5 --- /dev/null +++ b/official/projects/vit/modeling/nn_blocks.py @@ -0,0 +1,120 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Keras-based TransformerEncoder block layer.""" +import tensorflow as tf + +from official.nlp import modeling +from official.vision.modeling.layers.nn_layers import StochasticDepth + + +class TransformerEncoderBlock(modeling.layers.TransformerEncoderBlock): + """TransformerEncoderBlock layer with stochastic depth.""" + + def __init__(self, + *args, + stochastic_depth_drop_rate=0.0, + return_attention=False, + **kwargs): + """Initializes TransformerEncoderBlock.""" + super().__init__(*args, **kwargs) + self._stochastic_depth_drop_rate = stochastic_depth_drop_rate + self._return_attention = return_attention + + def build(self, input_shape): + if self._stochastic_depth_drop_rate: + self._stochastic_depth = StochasticDepth(self._stochastic_depth_drop_rate) + else: + self._stochastic_depth = lambda x, *args, **kwargs: tf.identity(x) + + super().build(input_shape) + + def get_config(self): + config = {"stochastic_depth_drop_rate": self._stochastic_depth_drop_rate, + "return_attention": self._return_attention} + base_config = super().get_config() +
return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs, training=None): + """Transformer self-attention encoder block call.""" + if isinstance(inputs, (list, tuple)): + if len(inputs) == 2: + input_tensor, attention_mask = inputs + key_value = None + elif len(inputs) == 3: + input_tensor, key_value, attention_mask = inputs + else: + raise ValueError("Unexpected inputs to %s with length at %d" % + (self.__class__, len(inputs))) + else: + input_tensor, key_value, attention_mask = (inputs, None, None) + + if self._output_range: + if self._norm_first: + source_tensor = input_tensor[:, 0:self._output_range, :] + input_tensor = self._attention_layer_norm(input_tensor) + if key_value is not None: + key_value = self._attention_layer_norm(key_value) + target_tensor = input_tensor[:, 0:self._output_range, :] + if attention_mask is not None: + attention_mask = attention_mask[:, 0:self._output_range, :] + else: + if self._norm_first: + source_tensor = input_tensor + input_tensor = self._attention_layer_norm(input_tensor) + if key_value is not None: + key_value = self._attention_layer_norm(key_value) + target_tensor = input_tensor + + if key_value is None: + key_value = input_tensor + attention_output, attention_scores = self._attention_layer( + query=target_tensor, value=key_value, attention_mask=attention_mask, + return_attention_scores=True) + attention_output = self._attention_dropout(attention_output) + + if self._norm_first: + attention_output = source_tensor + self._stochastic_depth( + attention_output, training=training) + else: + attention_output = self._attention_layer_norm( + target_tensor + + self._stochastic_depth(attention_output, training=training)) + + if self._norm_first: + source_attention_output = attention_output + attention_output = self._output_layer_norm(attention_output) + inner_output = self._intermediate_dense(attention_output) + inner_output = self._intermediate_activation_layer(inner_output) + inner_output = 
self._inner_dropout_layer(inner_output) + layer_output = self._output_dense(inner_output) + layer_output = self._output_dropout(layer_output) + + if self._norm_first: + if self._return_attention: + return source_attention_output + self._stochastic_depth( + layer_output, training=training), attention_scores + else: + return source_attention_output + self._stochastic_depth( + layer_output, training=training) + + # During mixed precision training, layer norm output is always fp32 for now. + # Casts fp32 for the subsequent add. + layer_output = tf.cast(layer_output, tf.float32) + if self._return_attention: + return self._output_layer_norm(layer_output + self._stochastic_depth( + attention_output, training=training)), attention_scores + else: + return self._output_layer_norm(layer_output + self._stochastic_depth( + attention_output, training=training)) diff --git a/official/projects/vit/modeling/transformer_scaffold.py b/official/projects/vit/modeling/transformer_scaffold.py new file mode 100644 index 0000000000000000000000000000000000000000..edf13a6235ef3c4e29e6f429c8eea86b0ea3ae54 --- /dev/null +++ b/official/projects/vit/modeling/transformer_scaffold.py @@ -0,0 +1,161 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Keras-based Scaffold TransformerEncoder block for vision models. 
+ +This implementation is subclassed from NLP TransformerScaffold to support +customized `attention_layer` and `feedforward_layer`. In addition, this +implementation has a few features to better support vision use cases: +1. `stochastic_depth_drop_rate` to suppress model overfitting. +2. `return_attention_scores`, optionally returns the attention scores. +3. `ffn_has_residual_connection`, clearly defines whether the feedforward network + has a residual connection, to avoid ambiguity. +""" +from typing import List, Optional, Tuple, Union + +import gin +import tensorflow as tf + +from official.nlp import modeling +from official.vision.modeling.layers.nn_layers import StochasticDepth + + +@tf.keras.utils.register_keras_serializable(package="Vision") +@gin.configurable +class TransformerScaffold(modeling.layers.TransformerScaffold): + """TransformerScaffold layer for vision applications. + + This layer is a subclass of NLP TransformerScaffold. + + Attributes: + stochastic_depth_drop_rate: Drop rate for the residual connections. + return_attention_scores: Whether to also return the attention scores. + ffn_has_residual_connection: Whether the feedforward network has an internal + residual connection and layer norm. If False, the residual connection and + the layer norm op are called inside TransformerScaffold.
+ """ + + def __init__(self, + *args, + stochastic_depth_drop_rate: float = 0.0, + return_attention_scores: bool = False, + ffn_has_residual_connection: bool = False, + **kwargs): + """Initializes TransformerScaffold.""" + super().__init__(*args, **kwargs) + self._stochastic_depth_drop_rate = stochastic_depth_drop_rate + self._return_attention_scores = return_attention_scores + self._ffn_has_residual_connection = ffn_has_residual_connection + + def build(self, input_shape: Union[tf.TensorShape, List[int]]): + if self._stochastic_depth_drop_rate: + self._stochastic_depth = StochasticDepth(self._stochastic_depth_drop_rate) + else: + self._stochastic_depth = lambda x, *args, **kwargs: tf.identity(x) + + super().build(input_shape) + + def get_config(self): + config = {"stochastic_depth_drop_rate": self._stochastic_depth_drop_rate, + "return_attention_scores": self._return_attention_scores, + "ffn_has_residual_connection": self._ffn_has_residual_connection} + base_config = super().get_config() + base_config.update(config) + return base_config + + def call( + self, + inputs: tf.Tensor, + training: Optional[bool] = None + ) -> Union[tf.Tensor, Tuple[tf.Tensor, tf.Tensor]]: + """Transformer self-attention encoder block call.""" + if isinstance(inputs, (list, tuple)): + if len(inputs) == 2: + input_tensor, attention_mask = inputs + key_value = None + elif len(inputs) == 3: + input_tensor, key_value, attention_mask = inputs + else: + raise ValueError("Unexpected inputs to %s with length at %d" % + (self.__class__, len(inputs))) + else: + input_tensor, key_value, attention_mask = (inputs, None, None) + + if key_value is None: + key_value = input_tensor + + if self._norm_first: + source_tensor = input_tensor + input_tensor = self._attention_layer_norm(input_tensor, training=training) + + attention_layer_output = self._attention_layer( + query=input_tensor, + value=key_value, + attention_mask=attention_mask, + training=training,
return_attention_scores=self._return_attention_scores) + if isinstance(attention_layer_output, tuple): + # `attention_layer_output` contains two tensors when + # `return_attention_scores` is True. + attention_output, attention_scores = attention_layer_output + else: + attention_output = attention_layer_output + attention_output = self._attention_dropout(attention_output, + training=training) + + if self._norm_first: + source_attention_output = source_tensor + self._stochastic_depth( + attention_output, training=training) + attention_output = self._output_layer_norm(source_attention_output, + training=training) + else: + attention_output = self._attention_layer_norm( + input_tensor + + self._stochastic_depth(attention_output, training=training), + training=training) + + if self._feedforward_block is None: + intermediate_output = self._intermediate_dense(attention_output) + intermediate_output = self._intermediate_activation_layer( + intermediate_output) + layer_output = self._output_dense(intermediate_output, training=training) + layer_output = self._output_dropout(layer_output, training=training) + else: + layer_output = self._feedforward_block(attention_output, + training=training) + + # During mixed precision training, layer norm output is always fp32 for now. + # Casts fp32 for the subsequent add. 
+ layer_output = tf.cast(layer_output, tf.float32) + + if self._norm_first: + if self._ffn_has_residual_connection: + raise ValueError( + "In the case of `norm_first`, the residual connection should be " + "done in the TransformerScaffold call function, not in the FFN's " + "call function.") + output = source_attention_output + self._stochastic_depth( + layer_output, training=training) + else: + if self._ffn_has_residual_connection: + output = self._stochastic_depth(layer_output, training=training) + else: + output = self._output_layer_norm( + attention_output + self._stochastic_depth( + layer_output, training=training)) + + if self._return_attention_scores: + return output, attention_scores + else: + return output diff --git a/official/projects/vit/modeling/transformer_scaffold_test.py b/official/projects/vit/modeling/transformer_scaffold_test.py new file mode 100644 index 0000000000000000000000000000000000000000..632918b06ec6790252ae1496a8efe80f9cdffc3f --- /dev/null +++ b/official/projects/vit/modeling/transformer_scaffold_test.py @@ -0,0 +1,518 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+ +"""Tests for Keras-based transformer block layer.""" + +import numpy as np +import tensorflow as tf + +from tensorflow.python.keras import keras_parameterized # pylint: disable=g-direct-tensorflow-import +from official.nlp import modeling +from official.projects.vit.modeling import transformer_scaffold + +TransformerScaffold = transformer_scaffold.TransformerScaffold + + +# Test class that wraps a standard attention layer. If this layer is called +# at any point, the list passed to the config object will be filled with a +# boolean 'True'. We register this class as a Keras serializable so we can +# test serialization below. +@tf.keras.utils.register_keras_serializable(package='TestOnlyAttention') +class ValidatedAttentionLayer(modeling.layers.attention.MultiHeadAttention): + + def __init__(self, call_list, **kwargs): + super(ValidatedAttentionLayer, self).__init__(**kwargs) + self.list = call_list + + def call(self, + query, + value, + attention_mask=None, + return_attention_scores=False,): + self.list.append(True) + return super(ValidatedAttentionLayer, self).call( + query, + value, + attention_mask=attention_mask, + return_attention_scores=return_attention_scores) + + def get_config(self): + config = super(ValidatedAttentionLayer, self).get_config() + config['call_list'] = [] + return config + + +# Test class implements a simple feedforward layer. If this layer is called +# at any point, the list passed to the config object will be filled with a +# boolean 'True'. We register this class as a Keras serializable so we can +# test serialization below. 
+@tf.keras.utils.register_keras_serializable(package='TestOnlyFeedforward') +class ValidatedFeedforwardLayer(tf.keras.layers.Layer): + + def __init__(self, call_list, activation, **kwargs): + super(ValidatedFeedforwardLayer, self).__init__(**kwargs) + self.list = call_list + self.activation = activation + + def build(self, input_shape): + hidden_size = input_shape.as_list()[-1] + self._feedforward_dense = tf.keras.layers.EinsumDense( + '...x,xy->...y', + output_shape=hidden_size, + bias_axes='y', + activation=self.activation, + name='feedforward') + + def call(self, inputs): + self.list.append(True) + return self._feedforward_dense(inputs) + + def get_config(self): + config = super(ValidatedFeedforwardLayer, self).get_config() + config['call_list'] = [] + config['activation'] = self.activation + return config + + +# This decorator runs the test in V1, V2-Eager, and V2-Functional mode. It +# guarantees forward compatibility of this code for the V2 switchover. +@keras_parameterized.run_all_keras_modes +class TransformerLayerTest(keras_parameterized.TestCase): + + def tearDown(self): + super(TransformerLayerTest, self).tearDown() + tf.keras.mixed_precision.set_global_policy('float32') + + def test_layer_creation(self): + sequence_length = 21 + width = 80 + + call_list = [] + attention_layer_cfg = { + 'num_heads': 10, + 'key_dim': 8, + 'call_list': call_list, + } + test_layer = TransformerScaffold( + attention_cls=ValidatedAttentionLayer, + attention_cfg=attention_layer_cfg, + num_attention_heads=10, + inner_dim=2048, + inner_activation='relu') + + # Create a 3-dimensional input (the first dimension is implicit). + data_tensor = tf.keras.Input(shape=(sequence_length, width)) + output_tensor = test_layer(data_tensor) + # The default output of a transformer layer should be the same as the input. 
+ self.assertEqual(data_tensor.shape.as_list(), output_tensor.shape.as_list()) + + # If call_list[0] exists and is True, the passed layer class was + # instantiated from the given config properly. + self.assertNotEmpty(call_list) + self.assertTrue(call_list[0], "The passed layer class wasn't instantiated.") + + def test_layer_creation_with_feedforward_cls(self): + sequence_length = 21 + width = 80 + + call_list = [] + attention_layer_cfg = { + 'num_heads': 10, + 'key_dim': 8, + 'call_list': call_list, + } + feedforward_call_list = [] + feedforward_layer_cfg = { + 'activation': 'relu', + 'call_list': feedforward_call_list, + } + test_layer = TransformerScaffold( + attention_cls=ValidatedAttentionLayer, + attention_cfg=attention_layer_cfg, + feedforward_cls=ValidatedFeedforwardLayer, + feedforward_cfg=feedforward_layer_cfg, + num_attention_heads=10, + inner_dim=None, + inner_activation=None) + + # Create a 3-dimensional input (the first dimension is implicit). + data_tensor = tf.keras.Input(shape=(sequence_length, width)) + output_tensor = test_layer(data_tensor) + # The default output of a transformer layer should be the same as the input. + self.assertEqual(data_tensor.shape.as_list(), output_tensor.shape.as_list()) + + # If call_list[0] exists and is True, the passed layer class was + # instantiated from the given config properly. 
+ self.assertNotEmpty(call_list) + self.assertTrue(call_list[0], "The passed layer class wasn't instantiated.") + self.assertNotEmpty(feedforward_call_list) + self.assertTrue(feedforward_call_list[0], + "The passed layer class wasn't instantiated.") + + def test_layer_creation_with_mask(self): + sequence_length = 21 + width = 80 + + call_list = [] + attention_layer_cfg = { + 'num_heads': 10, + 'key_dim': 8, + 'call_list': call_list, + } + test_layer = TransformerScaffold( + attention_cls=ValidatedAttentionLayer, + attention_cfg=attention_layer_cfg, + num_attention_heads=10, + inner_dim=2048, + inner_activation='relu') + + # Create a 3-dimensional input (the first dimension is implicit). + data_tensor = tf.keras.Input(shape=(sequence_length, width)) + # Create a 2-dimensional input (the first dimension is implicit). + mask_tensor = tf.keras.Input(shape=(sequence_length, sequence_length)) + output_tensor = test_layer([data_tensor, mask_tensor]) + # The default output of a transformer layer should be the same as the input. + self.assertEqual(data_tensor.shape.as_list(), output_tensor.shape.as_list()) + # If call_list[0] exists and is True, the passed layer class was + # instantiated from the given config properly. + self.assertNotEmpty(call_list) + self.assertTrue(call_list[0], "The passed layer class wasn't instantiated.") + + def test_layer_invocation(self): + sequence_length = 21 + width = 80 + + call_list = [] + attention_layer_cfg = { + 'num_heads': 10, + 'key_dim': 8, + 'call_list': call_list, + } + test_layer = TransformerScaffold( + attention_cls=ValidatedAttentionLayer, + attention_cfg=attention_layer_cfg, + num_attention_heads=10, + inner_dim=2048, + inner_activation='relu') + + # Create a 3-dimensional input (the first dimension is implicit). + data_tensor = tf.keras.Input(shape=(sequence_length, width)) + output_tensor = test_layer(data_tensor) + + # Create a model from the test layer. 
+ model = tf.keras.Model(data_tensor, output_tensor) + + # Invoke the model on test data. We can't validate the output data itself + # (the NN is too complex) but this will rule out structural runtime errors. + batch_size = 6 + input_data = 10 * np.random.random_sample( + (batch_size, sequence_length, width)) + _ = model.predict(input_data) + # If call_list[0] exists and is True, the passed layer class was + # instantiated from the given config properly. + self.assertNotEmpty(call_list) + self.assertTrue(call_list[0], "The passed layer class wasn't instantiated.") + + def test_layer_invocation_with_feedforward_cls(self): + sequence_length = 21 + width = 80 + + call_list = [] + attention_layer_cfg = { + 'num_heads': 10, + 'key_dim': 8, + 'call_list': call_list, + } + feedforward_call_list = [] + feedforward_layer_cfg = { + 'activation': 'relu', + 'call_list': feedforward_call_list, + } + feedforward_layer = ValidatedFeedforwardLayer(**feedforward_layer_cfg) + test_layer = TransformerScaffold( + attention_cls=ValidatedAttentionLayer, + attention_cfg=attention_layer_cfg, + feedforward_cls=feedforward_layer, + num_attention_heads=10, + inner_dim=None, + inner_activation=None) + + # Create a 3-dimensional input (the first dimension is implicit). + data_tensor = tf.keras.Input(shape=(sequence_length, width)) + # Create a 2-dimensional input (the first dimension is implicit). + mask_tensor = tf.keras.Input(shape=(sequence_length, sequence_length)) + output_tensor = test_layer([data_tensor, mask_tensor]) + + # Create a model from the test layer. + model = tf.keras.Model([data_tensor, mask_tensor], output_tensor) + + # Invoke the model on test data. We can't validate the output data itself + # (the NN is too complex) but this will rule out structural runtime errors. 
+ batch_size = 6 + input_data = 10 * np.random.random_sample( + (batch_size, sequence_length, width)) + # The attention mask should be of shape (batch, from_seq_len, to_seq_len), + # which here is (batch, sequence_length, sequence_length) + mask_data = np.random.randint( + 2, size=(batch_size, sequence_length, sequence_length)) + _ = model.predict([input_data, mask_data]) + # If call_list[0] exists and is True, the passed layer class was + # instantiated from the given config properly. + self.assertNotEmpty(call_list) + self.assertTrue(call_list[0], "The passed layer class wasn't instantiated.") + self.assertNotEmpty(feedforward_call_list) + self.assertTrue(feedforward_call_list[0], + "The passed layer class wasn't instantiated.") + + def test_layer_invocation_with_mask(self): + sequence_length = 21 + width = 80 + + call_list = [] + attention_layer_cfg = { + 'num_heads': 10, + 'key_dim': 8, + 'call_list': call_list, + } + test_layer = TransformerScaffold( + attention_cls=ValidatedAttentionLayer, + attention_cfg=attention_layer_cfg, + num_attention_heads=10, + inner_dim=2048, + inner_activation='relu') + + # Create a 3-dimensional input (the first dimension is implicit). + data_tensor = tf.keras.Input(shape=(sequence_length, width)) + # Create a 2-dimensional input (the first dimension is implicit). + mask_tensor = tf.keras.Input(shape=(sequence_length, sequence_length)) + output_tensor = test_layer([data_tensor, mask_tensor]) + + # Create a model from the test layer. + model = tf.keras.Model([data_tensor, mask_tensor], output_tensor) + + # Invoke the model on test data. We can't validate the output data itself + # (the NN is too complex) but this will rule out structural runtime errors. 
+ batch_size = 6 + input_data = 10 * np.random.random_sample( + (batch_size, sequence_length, width)) + # The attention mask should be of shape (batch, from_seq_len, to_seq_len), + # which here is (batch, sequence_length, sequence_length) + mask_data = np.random.randint( + 2, size=(batch_size, sequence_length, sequence_length)) + _ = model.predict([input_data, mask_data]) + # If call_list[0] exists and is True, the passed layer class was + # instantiated from the given config properly. + self.assertNotEmpty(call_list) + self.assertTrue(call_list[0], "The passed layer class wasn't instantiated.") + + def test_layer_invocation_with_float16_dtype(self): + tf.keras.mixed_precision.set_global_policy('mixed_float16') + sequence_length = 21 + width = 80 + + call_list = [] + attention_layer_cfg = { + 'num_heads': 10, + 'key_dim': 8, + 'call_list': call_list, + } + test_layer = TransformerScaffold( + attention_cls=ValidatedAttentionLayer, + attention_cfg=attention_layer_cfg, + num_attention_heads=10, + inner_dim=2048, + inner_activation='relu') + + # Create a 3-dimensional input (the first dimension is implicit). + data_tensor = tf.keras.Input(shape=(sequence_length, width)) + # Create a 2-dimensional input (the first dimension is implicit). + mask_tensor = tf.keras.Input(shape=(sequence_length, sequence_length)) + output_tensor = test_layer([data_tensor, mask_tensor]) + + # Create a model from the test layer. + model = tf.keras.Model([data_tensor, mask_tensor], output_tensor) + + # Invoke the model on test data. We can't validate the output data itself + # (the NN is too complex) but this will rule out structural runtime errors. 
+ batch_size = 6 + input_data = (10 * np.random.random_sample( + (batch_size, sequence_length, width))) + # The attention mask should be of shape (batch, from_seq_len, to_seq_len), + # which here is (batch, sequence_length, sequence_length) + mask_data = np.random.randint( + 2, size=(batch_size, sequence_length, sequence_length)) + _ = model.predict([input_data, mask_data]) + # If call_list[0] exists and is True, the passed layer class was + # instantiated from the given config properly. + self.assertNotEmpty(call_list) + self.assertTrue(call_list[0], "The passed layer class wasn't instantiated.") + + def test_transform_with_initializer(self): + sequence_length = 21 + width = 80 + + call_list = [] + attention_layer_cfg = { + 'num_heads': 10, + 'key_dim': 8, + 'call_list': call_list, + } + test_layer = TransformerScaffold( + attention_cls=ValidatedAttentionLayer, + attention_cfg=attention_layer_cfg, + num_attention_heads=10, + inner_dim=2048, + inner_activation='relu', + kernel_initializer=tf.keras.initializers.TruncatedNormal(stddev=0.02)) + + # Create a 3-dimensional input (the first dimension is implicit). + data_tensor = tf.keras.Input(shape=(sequence_length, width)) + output = test_layer(data_tensor) + # The default output of a transformer layer should be the same as the input. + self.assertEqual(data_tensor.shape.as_list(), output.shape.as_list()) + # If call_list[0] exists and is True, the passed layer class was + # instantiated from the given config properly. 
+ self.assertNotEmpty(call_list) + self.assertTrue(call_list[0]) + + def test_layer_restoration_from_config(self): + sequence_length = 21 + width = 80 + + call_list = [] + attention_layer_cfg = { + 'num_heads': 10, + 'key_dim': 8, + 'call_list': call_list, + 'name': 'test_layer', + } + test_layer = TransformerScaffold( + attention_cls=ValidatedAttentionLayer, + attention_cfg=attention_layer_cfg, + num_attention_heads=10, + inner_dim=2048, + inner_activation='relu') + + # Create a 3-dimensional input (the first dimension is implicit). + data_tensor = tf.keras.Input(shape=(sequence_length, width)) + # Create a 2-dimensional input (the first dimension is implicit). + mask_tensor = tf.keras.Input(shape=(sequence_length, sequence_length)) + output_tensor = test_layer([data_tensor, mask_tensor]) + + # Create a model from the test layer. + model = tf.keras.Model([data_tensor, mask_tensor], output_tensor) + + # Invoke the model on test data. We can't validate the output data itself + # (the NN is too complex) but this will rule out structural runtime errors. + batch_size = 6 + input_data = 10 * np.random.random_sample( + (batch_size, sequence_length, width)) + # The attention mask should be of shape (batch, from_seq_len, to_seq_len), + # which here is (batch, sequence_length, sequence_length) + mask_data = np.random.randint( + 2, size=(batch_size, sequence_length, sequence_length)) + pre_serialization_output = model.predict([input_data, mask_data]) + + # Serialize the model config. Pass the serialized data through json to + # ensure that we can serialize this layer to disk. + serialized_data = model.get_config() + + # Create a new model from the old config, and copy the weights. These models + # should have identical outputs. 
+ new_model = tf.keras.Model.from_config(serialized_data) + new_model.set_weights(model.get_weights()) + output = new_model.predict([input_data, mask_data]) + + self.assertAllClose(pre_serialization_output, output) + + # If the layer was configured correctly, it should have a list attribute + # (since it should have the custom class and config passed to it). + new_model.summary() + new_call_list = new_model.get_layer( + name='transformer_scaffold')._attention_layer.list + self.assertNotEmpty(new_call_list) + self.assertTrue(new_call_list[0], + "The passed layer class wasn't instantiated.") + + def test_layer_with_feedforward_cls_restoration_from_config(self): + sequence_length = 21 + width = 80 + + call_list = [] + attention_layer_cfg = { + 'num_heads': 10, + 'key_dim': 8, + 'call_list': call_list, + 'name': 'test_layer', + } + feedforward_call_list = [] + feedforward_layer_cfg = { + 'activation': 'relu', + 'call_list': feedforward_call_list, + } + test_layer = TransformerScaffold( + attention_cls=ValidatedAttentionLayer, + attention_cfg=attention_layer_cfg, + feedforward_cls=ValidatedFeedforwardLayer, + feedforward_cfg=feedforward_layer_cfg, + num_attention_heads=10, + inner_dim=None, + inner_activation=None) + + # Create a 3-dimensional input (the first dimension is implicit). + data_tensor = tf.keras.Input(shape=(sequence_length, width)) + # Create a 3-dimensional mask input (the first dimension is implicit). + mask_tensor = tf.keras.Input(shape=(sequence_length, sequence_length)) + output_tensor = test_layer([data_tensor, mask_tensor]) + + # Create a model from the test layer. + model = tf.keras.Model([data_tensor, mask_tensor], output_tensor) + + # Invoke the model on test data. We can't validate the output data itself + # (the NN is too complex) but this will rule out structural runtime errors.
+ batch_size = 6 + input_data = 10 * np.random.random_sample( + (batch_size, sequence_length, width)) + # The attention mask should be of shape (batch, from_seq_len, to_seq_len), + # which here is (batch, sequence_length, sequence_length) + mask_data = np.random.randint( + 2, size=(batch_size, sequence_length, sequence_length)) + pre_serialization_output = model.predict([input_data, mask_data]) + + serialized_data = model.get_config() + # Create a new model from the old config, and copy the weights. These models + # should have identical outputs. + new_model = tf.keras.Model.from_config(serialized_data) + new_model.set_weights(model.get_weights()) + output = new_model.predict([input_data, mask_data]) + + self.assertAllClose(pre_serialization_output, output) + + # If the layer was configured correctly, it should have a list attribute + # (since it should have the custom class and config passed to it). + new_model.summary() + new_call_list = new_model.get_layer( + name='transformer_scaffold')._attention_layer.list + self.assertNotEmpty(new_call_list) + self.assertTrue(new_call_list[0], + "The passed layer class wasn't instantiated.") + new_feedforward_call_list = new_model.get_layer( + name='transformer_scaffold')._feedforward_block.list + self.assertNotEmpty(new_feedforward_call_list) + self.assertTrue(new_feedforward_call_list[0], + "The passed layer class wasn't instantiated.") + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/vit/modeling/vit.py b/official/projects/vit/modeling/vit.py new file mode 100644 index 0000000000000000000000000000000000000000..4245711dc921a1c6d4adc5f4c18d782767a1d502 --- /dev/null +++ b/official/projects/vit/modeling/vit.py @@ -0,0 +1,323 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""VisionTransformer models.""" +from typing import Optional, Tuple + +from absl import logging + +# import immutabledict +import tensorflow as tf + +from official.modeling import activations +from official.projects.vit.modeling import nn_blocks +from official.projects.vit.modeling.vit_specs import VIT_SPECS +from official.vision.modeling.backbones import factory +from official.vision.modeling.layers import nn_layers + +layers = tf.keras.layers + + +class AddPositionEmbs(tf.keras.layers.Layer): + """Adds (optionally learned) positional embeddings to the inputs.""" + + def __init__(self, + posemb_init: Optional[tf.keras.initializers.Initializer] = None, + posemb_origin_shape: Optional[Tuple[int, int]] = None, + posemb_target_shape: Optional[Tuple[int, int]] = None, + **kwargs): + """Constructs the positional embedding module. + + The length of the learnable positional embeddings is set when the layer is + built, from posemb_origin_shape (if provided) or otherwise from + inputs_shape. If the input sequence length differs from the positional + embeddings length, the embeddings are interpolated from posemb_origin_shape + to posemb_target_shape during the forward call, so both shapes must be + provided in that case. + + Args: + posemb_init: The positional embedding initializer. + posemb_origin_shape: The intended positional embedding shape. + posemb_target_shape: The target shape that the positional embedding may + be interpolated to. + **kwargs: Other keyword arguments passed to the base Layer.
+ """ + super().__init__(**kwargs) + self.posemb_init = posemb_init + self.posemb_origin_shape = posemb_origin_shape + self.posemb_target_shape = posemb_target_shape + + def build(self, inputs_shape): + if self.posemb_origin_shape is not None: + pos_emb_length = self.posemb_origin_shape[0] * self.posemb_origin_shape[1] + else: + pos_emb_length = inputs_shape[1] + pos_emb_shape = (1, pos_emb_length, inputs_shape[2]) + self.pos_embedding = self.add_weight( + 'pos_embedding', pos_emb_shape, initializer=self.posemb_init) + + def _interpolate(self, pos_embedding: tf.Tensor, from_shape: Tuple[int, int], + to_shape: Tuple[int, int]) -> tf.Tensor: + """Interpolates the positional embeddings.""" + logging.info('Interpolating positional embedding from shape %s to %s', + from_shape, to_shape) + grid_emb = tf.reshape(pos_embedding, [1] + list(from_shape) + [-1]) + # NOTE: Using BILINEAR interpolation by default. + grid_emb = tf.image.resize(grid_emb, to_shape) + return tf.reshape(grid_emb, [1, to_shape[0] * to_shape[1], -1]) + + def call(self, inputs, inputs_positions=None): + del inputs_positions + pos_embedding = self.pos_embedding + # inputs.shape is (batch_size, seq_len, emb_dim). + if inputs.shape[1] != pos_embedding.shape[1]: + pos_embedding = self._interpolate( + pos_embedding, + from_shape=self.posemb_origin_shape, + to_shape=self.posemb_target_shape) + pos_embedding = tf.cast(pos_embedding, inputs.dtype) + + return inputs + pos_embedding + + +class TokenLayer(tf.keras.layers.Layer): + """A simple layer to wrap token parameters.""" + + def build(self, inputs_shape): + self.cls = self.add_weight( + 'cls', (1, 1, inputs_shape[-1]), initializer='zeros') + + def call(self, inputs): + cls = tf.cast(self.cls, inputs.dtype) + cls = cls + tf.zeros_like(inputs[:, 0:1]) # A hacky way to tile.
+ x = tf.concat([cls, inputs], axis=1) + return x + + +class Encoder(tf.keras.layers.Layer): + """Transformer Encoder.""" + + def __init__(self, + num_layers, + mlp_dim, + num_heads, + dropout_rate=0.1, + attention_dropout_rate=0.1, + kernel_regularizer=None, + inputs_positions=None, + init_stochastic_depth_rate=0.0, + kernel_initializer='glorot_uniform', + add_pos_embed=True, + pos_embed_origin_shape=None, + pos_embed_target_shape=None, + **kwargs): + super().__init__(**kwargs) + self._num_layers = num_layers + self._mlp_dim = mlp_dim + self._num_heads = num_heads + self._dropout_rate = dropout_rate + self._attention_dropout_rate = attention_dropout_rate + self._kernel_regularizer = kernel_regularizer + self._inputs_positions = inputs_positions + self._init_stochastic_depth_rate = init_stochastic_depth_rate + self._kernel_initializer = kernel_initializer + self._add_pos_embed = add_pos_embed + self._pos_embed_origin_shape = pos_embed_origin_shape + self._pos_embed_target_shape = pos_embed_target_shape + + def build(self, input_shape): + if self._add_pos_embed: + self._pos_embed = AddPositionEmbs( + posemb_init=tf.keras.initializers.RandomNormal(stddev=0.02), + posemb_origin_shape=self._pos_embed_origin_shape, + posemb_target_shape=self._pos_embed_target_shape, + name='posembed_input') + self._dropout = layers.Dropout(rate=self._dropout_rate) + + self._encoder_layers = [] + # Set layer norm epsilons to 1e-6 to be consistent with JAX implementation. 
+ # https://flax.readthedocs.io/en/latest/_autosummary/flax.deprecated.nn.LayerNorm.html + for i in range(self._num_layers): + encoder_layer = nn_blocks.TransformerEncoderBlock( + inner_activation=activations.gelu, + num_attention_heads=self._num_heads, + inner_dim=self._mlp_dim, + output_dropout=self._dropout_rate, + attention_dropout=self._attention_dropout_rate, + kernel_regularizer=self._kernel_regularizer, + kernel_initializer=self._kernel_initializer, + norm_first=True, + stochastic_depth_drop_rate=nn_layers.get_stochastic_depth_rate( + self._init_stochastic_depth_rate, i + 1, self._num_layers), + norm_epsilon=1e-6) + self._encoder_layers.append(encoder_layer) + self._norm = layers.LayerNormalization(epsilon=1e-6) + super().build(input_shape) + + def call(self, inputs, training=None): + x = inputs + if self._add_pos_embed: + x = self._pos_embed(x, inputs_positions=self._inputs_positions) + x = self._dropout(x, training=training) + + for encoder_layer in self._encoder_layers: + x = encoder_layer(x, training=training) + x = self._norm(x) + return x + + def get_config(self): + config = super().get_config() + updates = { + 'num_layers': self._num_layers, + 'mlp_dim': self._mlp_dim, + 'num_heads': self._num_heads, + 'dropout_rate': self._dropout_rate, + 'attention_dropout_rate': self._attention_dropout_rate, + 'kernel_regularizer': self._kernel_regularizer, + 'inputs_positions': self._inputs_positions, + 'init_stochastic_depth_rate': self._init_stochastic_depth_rate, + 'kernel_initializer': self._kernel_initializer, + 'add_pos_embed': self._add_pos_embed, + 'pos_embed_origin_shape': self._pos_embed_origin_shape, + 'pos_embed_target_shape': self._pos_embed_target_shape, + } + config.update(updates) + return config + + +class VisionTransformer(tf.keras.Model): + """Class to build VisionTransformer family model.""" + + def __init__(self, + mlp_dim=3072, + num_heads=12, + num_layers=12, + attention_dropout_rate=0.0, + dropout_rate=0.1, + 
+ init_stochastic_depth_rate=0.0, + input_specs=layers.InputSpec(shape=[None, None, None, 3]), + patch_size=16, + hidden_size=768, + representation_size=0, + pooler='token', + kernel_regularizer=None, + original_init: bool = True, + pos_embed_shape: Optional[Tuple[int, int]] = None): + """VisionTransformer initialization function.""" + self._mlp_dim = mlp_dim + self._num_heads = num_heads + self._num_layers = num_layers + self._hidden_size = hidden_size + self._patch_size = patch_size + + inputs = tf.keras.Input(shape=input_specs.shape[1:]) + + x = layers.Conv2D( + filters=hidden_size, + kernel_size=patch_size, + strides=patch_size, + padding='valid', + kernel_regularizer=kernel_regularizer, + kernel_initializer='lecun_normal' if original_init else 'he_uniform')( + inputs) + if tf.keras.backend.image_data_format() == 'channels_last': + rows_axis, cols_axis = (1, 2) + else: + rows_axis, cols_axis = (2, 3) + # The reshape below assumes the data_format is 'channels_last', so + # transpose to that. Once the data is flattened by the reshape, the + # data_format is irrelevant, so no need to update + # tf.keras.backend.image_data_format. + x = tf.transpose(x, perm=[0, 2, 3, 1]) + + # x is now in channels_last layout, so its spatial axes are 1 and 2. + pos_embed_target_shape = (x.shape[1], x.shape[2]) + seq_len = (input_specs.shape[rows_axis] // patch_size) * ( + input_specs.shape[cols_axis] // patch_size) + x = tf.reshape(x, [-1, seq_len, hidden_size]) + + # If we want to add a class token, add it here.
+ if pooler == 'token': + x = TokenLayer(name='cls')(x) + + x = Encoder( + num_layers=num_layers, + mlp_dim=mlp_dim, + num_heads=num_heads, + dropout_rate=dropout_rate, + attention_dropout_rate=attention_dropout_rate, + kernel_regularizer=kernel_regularizer, + kernel_initializer='glorot_uniform' if original_init else dict( + class_name='TruncatedNormal', config=dict(stddev=.02)), + init_stochastic_depth_rate=init_stochastic_depth_rate, + pos_embed_origin_shape=pos_embed_shape, + pos_embed_target_shape=pos_embed_target_shape)( + x) + + if pooler == 'token': + x = x[:, 0] + elif pooler == 'gap': + x = tf.reduce_mean(x, axis=1) + elif pooler == 'none': + x = tf.identity(x, name='encoded_tokens') + else: + raise ValueError(f'unrecognized pooler type: {pooler}') + + if representation_size: + x = tf.keras.layers.Dense( + representation_size, + kernel_regularizer=kernel_regularizer, + name='pre_logits', + kernel_initializer='lecun_normal' if original_init else 'he_uniform')( + x) + x = tf.nn.tanh(x) + else: + x = tf.identity(x, name='pre_logits') + + if pooler == 'none': + endpoints = {'encoded_tokens': x} + else: + endpoints = { + 'pre_logits': + tf.reshape(x, [-1, 1, 1, representation_size or hidden_size]) + } + super(VisionTransformer, self).__init__(inputs=inputs, outputs=endpoints) + + +@factory.register_backbone_builder('legacy_vit') +def build_vit(input_specs, + backbone_config, + norm_activation_config, + l2_regularizer=None): + """Build ViT model.""" + del norm_activation_config + backbone_type = backbone_config.type + backbone_cfg = backbone_config.get() + assert backbone_type == 'legacy_vit', (f'Inconsistent backbone type ' + f'{backbone_type}') + backbone_cfg.override(VIT_SPECS[backbone_cfg.model_name]) + + return VisionTransformer( + mlp_dim=backbone_cfg.transformer.mlp_dim, + num_heads=backbone_cfg.transformer.num_heads, + num_layers=backbone_cfg.transformer.num_layers, + attention_dropout_rate=backbone_cfg.transformer.attention_dropout_rate, + 
dropout_rate=backbone_cfg.transformer.dropout_rate, + init_stochastic_depth_rate=backbone_cfg.init_stochastic_depth_rate, + input_specs=input_specs, + patch_size=backbone_cfg.patch_size, + hidden_size=backbone_cfg.hidden_size, + representation_size=backbone_cfg.representation_size, + pooler=backbone_cfg.pooler, + kernel_regularizer=l2_regularizer, + original_init=backbone_cfg.original_init, + pos_embed_shape=backbone_cfg.pos_embed_shape) diff --git a/official/projects/vit/modeling/vit_specs.py b/official/projects/vit/modeling/vit_specs.py new file mode 100644 index 0000000000000000000000000000000000000000..060bc2d09be50b4dd10b2891a518a6e948d435b1 --- /dev/null +++ b/official/projects/vit/modeling/vit_specs.py @@ -0,0 +1,68 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""VisionTransformer backbone specs.""" +import immutabledict + + +VIT_SPECS = immutabledict.immutabledict({ + 'vit-ti16': + dict( + hidden_size=192, + patch_size=16, + transformer=dict(mlp_dim=768, num_heads=3, num_layers=12), + ), + 'vit-s16': + dict( + hidden_size=384, + patch_size=16, + transformer=dict(mlp_dim=1536, num_heads=6, num_layers=12), + ), + 'vit-b16': + dict( + hidden_size=768, + patch_size=16, + transformer=dict(mlp_dim=3072, num_heads=12, num_layers=12), + ), + 'vit-b32': + dict( + hidden_size=768, + patch_size=32, + transformer=dict(mlp_dim=3072, num_heads=12, num_layers=12), + ), + 'vit-l16': + dict( + hidden_size=1024, + patch_size=16, + transformer=dict(mlp_dim=4096, num_heads=16, num_layers=24), + ), + 'vit-l32': + dict( + hidden_size=1024, + patch_size=32, + transformer=dict(mlp_dim=4096, num_heads=16, num_layers=24), + ), + 'vit-h14': + dict( + hidden_size=1280, + patch_size=14, + transformer=dict(mlp_dim=5120, num_heads=16, num_layers=32), + ), + 'vit-g14': + dict( + hidden_size=1664, + patch_size=14, + transformer=dict(mlp_dim=8192, num_heads=16, num_layers=48), + ), +}) diff --git a/official/projects/vit/modeling/vit_test.py b/official/projects/vit/modeling/vit_test.py new file mode 100644 index 0000000000000000000000000000000000000000..3d1aa8eb065a6f853f890d8f4c7b1f448b9e919b --- /dev/null +++ b/official/projects/vit/modeling/vit_test.py @@ -0,0 +1,73 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for VIT.""" + +from absl.testing import parameterized +import tensorflow as tf + +from official.projects.vit.modeling import vit + + +class VisionTransformerTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (224, 85798656), + (256, 85844736), + ) + def test_network_creation(self, input_size, params_count): + """Test creation of VisionTransformer family models.""" + tf.keras.backend.set_image_data_format('channels_last') + input_specs = tf.keras.layers.InputSpec( + shape=[2, input_size, input_size, 3]) + network = vit.VisionTransformer(input_specs=input_specs) + + inputs = tf.keras.Input(shape=(input_size, input_size, 3), batch_size=1) + _ = network(inputs) + self.assertEqual(network.count_params(), params_count) + + def test_network_none_pooler(self): + tf.keras.backend.set_image_data_format('channels_last') + input_size = 256 + input_specs = tf.keras.layers.InputSpec( + shape=[2, input_size, input_size, 3]) + network = vit.VisionTransformer( + input_specs=input_specs, + patch_size=16, + pooler='none', + representation_size=128, + pos_embed_shape=(14, 14)) # (224 // 16) + + inputs = tf.keras.Input(shape=(input_size, input_size, 3), batch_size=1) + output = network(inputs)['encoded_tokens'] + self.assertEqual(output.shape, [1, 256, 128]) + + def test_posembedding_interpolation(self): + tf.keras.backend.set_image_data_format('channels_last') + input_size = 256 + input_specs = tf.keras.layers.InputSpec( + shape=[2, input_size, input_size, 3]) + network = vit.VisionTransformer( + input_specs=input_specs, + patch_size=16, + pooler='gap', + pos_embed_shape=(14, 14)) # (224 // 16) + + inputs = tf.keras.Input(shape=(input_size, input_size, 3), batch_size=1) + output = network(inputs)['pre_logits'] + self.assertEqual(output.shape, [1, 1, 1, 768]) + + +if __name__ == '__main__': + tf.test.main() diff --git 
a/official/projects/vit/train.py b/official/projects/vit/train.py new file mode 100644 index 0000000000000000000000000000000000000000..2ef9ebdfff1b1057f13cd4d7fe993134517b3c8c --- /dev/null +++ b/official/projects/vit/train.py @@ -0,0 +1,27 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""TensorFlow Model Garden Vision training driver, including ViT configs.""" + +from absl import app + +from official.common import flags as tfm_flags +from official.projects.vit import configs # pylint: disable=unused-import +from official.projects.vit.modeling import vit # pylint: disable=unused-import +from official.vision import train + + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(train.main) diff --git a/official/projects/volumetric_models/configs/backbones.py b/official/projects/volumetric_models/configs/backbones.py index 7fb357d6884995bac0736312765a98e5d21e17ce..faae1882e3af4333b576e201b89cac22268a80da 100644 --- a/official/projects/volumetric_models/configs/backbones.py +++ b/official/projects/volumetric_models/configs/backbones.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
@@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Backbones configurations.""" import dataclasses from typing import Optional, Sequence diff --git a/official/projects/volumetric_models/configs/decoders.py b/official/projects/volumetric_models/configs/decoders.py index b5d0adea7cb8878329283539839fa9dddf25a932..828eaa9898c14953b4067f8c25d7c606d56cfc5f 100644 --- a/official/projects/volumetric_models/configs/decoders.py +++ b/official/projects/volumetric_models/configs/decoders.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Decoders configurations.""" import dataclasses from typing import Optional, Sequence diff --git a/official/projects/volumetric_models/configs/semantic_segmentation_3d.py b/official/projects/volumetric_models/configs/semantic_segmentation_3d.py index 713f9ea3510ffff64130c511ae7f1afb4a2797d5..3f6987f43bc3d795e25d75d4ab0c4dc6258dc518 100644 --- a/official/projects/volumetric_models/configs/semantic_segmentation_3d.py +++ b/official/projects/volumetric_models/configs/semantic_segmentation_3d.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Semantic segmentation configuration definition.""" import dataclasses @@ -23,7 +22,7 @@ from official.modeling import hyperparams from official.modeling import optimization from official.projects.volumetric_models.configs import backbones from official.projects.volumetric_models.configs import decoders -from official.vision.beta.configs import common +from official.vision.configs import common @dataclasses.dataclass diff --git a/official/projects/volumetric_models/configs/semantic_segmentation_3d_test.py b/official/projects/volumetric_models/configs/semantic_segmentation_3d_test.py index fd56c2a672350980f4aafa5ac02b620d609a053a..e54b0f98f451688c6a9812a33048f00b990223db 100644 --- a/official/projects/volumetric_models/configs/semantic_segmentation_3d_test.py +++ b/official/projects/volumetric_models/configs/semantic_segmentation_3d_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for semantic_segmentation.""" # pylint: disable=unused-import diff --git a/official/projects/volumetric_models/dataloaders/segmentation_input_3d.py b/official/projects/volumetric_models/dataloaders/segmentation_input_3d.py index 381159684cea6bb523bcc4c872f1ab86e8704848..1d7ba4bfb86c800d5040a4d9520dbc6fa74d7d43 100644 --- a/official/projects/volumetric_models/dataloaders/segmentation_input_3d.py +++ b/official/projects/volumetric_models/dataloaders/segmentation_input_3d.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,8 +16,8 @@ from typing import Any, Dict, Sequence, Tuple import tensorflow as tf -from official.vision.beta.dataloaders import decoder -from official.vision.beta.dataloaders import parser +from official.vision.dataloaders import decoder +from official.vision.dataloaders import parser class Decoder(decoder.Decoder): diff --git a/official/projects/volumetric_models/dataloaders/segmentation_input_3d_test.py b/official/projects/volumetric_models/dataloaders/segmentation_input_3d_test.py index 931867f447b4f5cb3c80172984ef135271b6fad0..7a71de141cd360caa899cfd052cd51292572360f 100644 --- a/official/projects/volumetric_models/dataloaders/segmentation_input_3d_test.py +++ b/official/projects/volumetric_models/dataloaders/segmentation_input_3d_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -20,7 +20,7 @@ from absl.testing import parameterized import tensorflow as tf from official.projects.volumetric_models.dataloaders import segmentation_input_3d -from official.vision.beta.dataloaders import tfexample_utils +from official.vision.dataloaders import tfexample_utils class InputReaderTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/projects/volumetric_models/evaluation/segmentation_metrics.py b/official/projects/volumetric_models/evaluation/segmentation_metrics.py index 9d53c7de869a8593baa99898cb7f9d928ae24aad..fc265720f8b2ee75706aadbacf4b1274a40d70d0 100644 --- a/official/projects/volumetric_models/evaluation/segmentation_metrics.py +++ b/official/projects/volumetric_models/evaluation/segmentation_metrics.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. 
All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/volumetric_models/evaluation/segmentation_metrics_test.py b/official/projects/volumetric_models/evaluation/segmentation_metrics_test.py index cf93b7556ae5b7ccda23a5e14063291fccc84c8b..1eac720016f06aaaa30af9d618d567e0f8aadd37 100644 --- a/official/projects/volumetric_models/evaluation/segmentation_metrics_test.py +++ b/official/projects/volumetric_models/evaluation/segmentation_metrics_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/volumetric_models/losses/segmentation_losses.py b/official/projects/volumetric_models/losses/segmentation_losses.py index fad3d695b77c24c752c573754f8ac527a1748ea7..6d422a9243e5f9370f14aadde159be8c5000f627 100644 --- a/official/projects/volumetric_models/losses/segmentation_losses.py +++ b/official/projects/volumetric_models/losses/segmentation_losses.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/volumetric_models/losses/segmentation_losses_test.py b/official/projects/volumetric_models/losses/segmentation_losses_test.py index ef047f064e5b7190a99a1005609e4ed6baf0883b..f2f444c2b238e51ea336b2c24a2305ce1dca5790 100644 --- a/official/projects/volumetric_models/losses/segmentation_losses_test.py +++ b/official/projects/volumetric_models/losses/segmentation_losses_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/volumetric_models/modeling/backbones/__init__.py b/official/projects/volumetric_models/modeling/backbones/__init__.py index 08bfc21705e73b63cccd1f1a179757bd6880cf9b..8e220167f1d189e088239c348c991d981b01a2d9 100644 --- a/official/projects/volumetric_models/modeling/backbones/__init__.py +++ b/official/projects/volumetric_models/modeling/backbones/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Backbones package definition.""" from official.projects.volumetric_models.modeling.backbones.unet_3d import UNet3D diff --git a/official/projects/volumetric_models/modeling/backbones/unet_3d.py b/official/projects/volumetric_models/modeling/backbones/unet_3d.py index d5539c9df35a93925338a78f614bb5806c1b0209..c675315aa37b152257cc3d84b886bb13a6e1bd34 100644 --- a/official/projects/volumetric_models/modeling/backbones/unet_3d.py +++ b/official/projects/volumetric_models/modeling/backbones/unet_3d.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -25,7 +25,7 @@ from typing import Any, Mapping, Sequence import tensorflow as tf from official.modeling import hyperparams from official.projects.volumetric_models.modeling import nn_blocks_3d -from official.vision.beta.modeling.backbones import factory +from official.vision.modeling.backbones import factory layers = tf.keras.layers diff --git a/official/projects/volumetric_models/modeling/backbones/unet_3d_test.py b/official/projects/volumetric_models/modeling/backbones/unet_3d_test.py index e42b3d1cbadada1b7489c78c98bd13683a2612ef..01e86a9d1a37e4c98deef399860569351c4bd10a 100644 --- a/official/projects/volumetric_models/modeling/backbones/unet_3d_test.py +++ b/official/projects/volumetric_models/modeling/backbones/unet_3d_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Tests for 3D UNet backbone.""" # Import libraries diff --git a/official/projects/volumetric_models/modeling/decoders/__init__.py b/official/projects/volumetric_models/modeling/decoders/__init__.py index ef86bd5206cd89b31a2187a0484949672412c470..f699cffbcc986c25fabaee28172ee033d32b938e 100644 --- a/official/projects/volumetric_models/modeling/decoders/__init__.py +++ b/official/projects/volumetric_models/modeling/decoders/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Decoders package definition.""" from official.projects.volumetric_models.modeling.decoders.unet_3d_decoder import UNet3DDecoder diff --git a/official/projects/volumetric_models/modeling/decoders/factory.py b/official/projects/volumetric_models/modeling/decoders/factory.py index 759779caa6a2aae21e0ff8267756a37b4d2f65a1..0e1d17c42996122aea8471abd1e1bd3f1222d1c6 100644 --- a/official/projects/volumetric_models/modeling/decoders/factory.py +++ b/official/projects/volumetric_models/modeling/decoders/factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/volumetric_models/modeling/decoders/factory_test.py b/official/projects/volumetric_models/modeling/decoders/factory_test.py index 50fbd1b2bd76120d506b0f2854040924dc317ddb..bcd4df694507f8d677afaee04e6fc3dc4b19ebb7 100644 --- a/official/projects/volumetric_models/modeling/decoders/factory_test.py +++ b/official/projects/volumetric_models/modeling/decoders/factory_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/volumetric_models/modeling/decoders/unet_3d_decoder.py b/official/projects/volumetric_models/modeling/decoders/unet_3d_decoder.py index a83724fe5c5df3a47bb389dc794151a79fec4d87..0f6d43972d568e8329560327807a2e0b03104fd5 100644 --- a/official/projects/volumetric_models/modeling/decoders/unet_3d_decoder.py +++ b/official/projects/volumetric_models/modeling/decoders/unet_3d_decoder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -95,7 +95,7 @@ class UNet3DDecoder(tf.keras.Model): channel_dim = 1 # Build 3D UNet. - inputs = self._build_input_pyramid(input_specs, model_id) + inputs = self._build_input_pyramid(input_specs, model_id) # pytype: disable=wrong-arg-types # dynamic-method-lookup # Add levels with up-convolution or up-sampling. 
x = inputs[str(model_id)] diff --git a/official/projects/volumetric_models/modeling/decoders/unet_3d_decoder_test.py b/official/projects/volumetric_models/modeling/decoders/unet_3d_decoder_test.py index 39dea0887d9df4b2cffa03af8cd09afc1d04fb9b..d901a6e46eb18b1caeeca94179dd40a42b825413 100644 --- a/official/projects/volumetric_models/modeling/decoders/unet_3d_decoder_test.py +++ b/official/projects/volumetric_models/modeling/decoders/unet_3d_decoder_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for 3D UNet decoder.""" # Import libraries diff --git a/official/projects/volumetric_models/modeling/factory.py b/official/projects/volumetric_models/modeling/factory.py index caff2e09f97949c5ef0f917380f5db9cb2b797f7..640ba067531606b01997202d7361f476c832486f 100644 --- a/official/projects/volumetric_models/modeling/factory.py +++ b/official/projects/volumetric_models/modeling/factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -13,7 +13,7 @@ # limitations under the License. 
"""Factory methods to build models.""" - +from typing import Sequence, Union # Import libraries import tensorflow as tf @@ -21,14 +21,16 @@ import tensorflow as tf from official.modeling import hyperparams from official.projects.volumetric_models.modeling.decoders import factory as decoder_factory from official.projects.volumetric_models.modeling.heads import segmentation_heads_3d -from official.vision.beta.modeling import segmentation_model -from official.vision.beta.modeling.backbones import factory as backbone_factory +from official.vision.modeling import segmentation_model +from official.vision.modeling.backbones import factory as backbone_factory def build_segmentation_model_3d( - input_specs: tf.keras.layers.InputSpec, + input_specs: Union[tf.keras.layers.InputSpec, + Sequence[tf.keras.layers.InputSpec]], model_config: hyperparams.Config, - l2_regularizer: tf.keras.regularizers.Regularizer = None) -> tf.keras.Model: # pytype: disable=annotation-type-mismatch # typed-keras + l2_regularizer: tf.keras.regularizers.Regularizer = None +) -> tf.keras.Model: # pytype: disable=annotation-type-mismatch # typed-keras """Builds Segmentation model.""" norm_activation_config = model_config.norm_activation backbone = backbone_factory.build_backbone( diff --git a/official/projects/volumetric_models/modeling/factory_test.py b/official/projects/volumetric_models/modeling/factory_test.py index 86000af99f8a12b64b8d2f0f324c6cb869fb36e8..2de27eeb833a24fe1951a5505ebeabe04fb16421 100644 --- a/official/projects/volumetric_models/modeling/factory_test.py +++ b/official/projects/volumetric_models/modeling/factory_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/volumetric_models/modeling/heads/segmentation_heads_3d.py b/official/projects/volumetric_models/modeling/heads/segmentation_heads_3d.py index 3154a271aef61854fedcc3fd0308f84efdf67f76..f2358d64199f32149caa861a0126f931f580cf07 100644 --- a/official/projects/volumetric_models/modeling/heads/segmentation_heads_3d.py +++ b/official/projects/volumetric_models/modeling/heads/segmentation_heads_3d.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -88,7 +88,7 @@ class SegmentationHead3D(tf.keras.layers.Layer): self._bn_axis = -1 else: self._bn_axis = 1 - self._activation = tf_utils.get_activation(activation) + self._activation = tf_utils.get_activation(activation, use_keras_layer=True) def build(self, input_shape: Union[tf.TensorShape, Sequence[tf.TensorShape]]): """Creates the variables of the segmentation head.""" diff --git a/official/projects/volumetric_models/modeling/heads/segmentation_heads_3d_test.py b/official/projects/volumetric_models/modeling/heads/segmentation_heads_3d_test.py index 4dc35fdeb68dc90ab202ead18eb47c7129b0619a..6c3aee1ee92ef302cb0a3d5aa2d80599787d5eda 100644 --- a/official/projects/volumetric_models/modeling/heads/segmentation_heads_3d_test.py +++ b/official/projects/volumetric_models/modeling/heads/segmentation_heads_3d_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Tests for segmentation_heads.py.""" from absl.testing import parameterized diff --git a/official/projects/volumetric_models/modeling/nn_blocks_3d.py b/official/projects/volumetric_models/modeling/nn_blocks_3d.py index b2f6dbd790a2f2ace74f4eb1453c8045cbb38e53..a7c459b0dc18f36b9ee640295cd67b2f51e880f5 100644 --- a/official/projects/volumetric_models/modeling/nn_blocks_3d.py +++ b/official/projects/volumetric_models/modeling/nn_blocks_3d.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -20,7 +20,7 @@ from typing import Sequence, Union import tensorflow as tf from official.modeling import tf_utils -from official.vision.beta.modeling.layers import nn_layers +from official.vision.modeling.layers import nn_layers @tf.keras.utils.register_keras_serializable(package='Vision') diff --git a/official/projects/volumetric_models/modeling/nn_blocks_3d_test.py b/official/projects/volumetric_models/modeling/nn_blocks_3d_test.py index 759757ceca4bbd8060af53a90a746e28985634a1..cd18c6b27d04a89205925e6dc9829c578b664d54 100644 --- a/official/projects/volumetric_models/modeling/nn_blocks_3d_test.py +++ b/official/projects/volumetric_models/modeling/nn_blocks_3d_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Tests for 3D volumeric convoluion blocks.""" # Import libraries diff --git a/official/projects/volumetric_models/modeling/segmentation_model_test.py b/official/projects/volumetric_models/modeling/segmentation_model_test.py index 3bfc94882c141cb7bb6f8923d13de3db2d5d158d..f5df0a4241d8d55bc4b65646a28b5d7571aa53d6 100644 --- a/official/projects/volumetric_models/modeling/segmentation_model_test.py +++ b/official/projects/volumetric_models/modeling/segmentation_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for segmentation network.""" from absl.testing import parameterized @@ -21,7 +20,7 @@ import tensorflow as tf from official.projects.volumetric_models.modeling import backbones from official.projects.volumetric_models.modeling import decoders from official.projects.volumetric_models.modeling.heads import segmentation_heads_3d -from official.vision.beta.modeling import segmentation_model +from official.vision.modeling import segmentation_model class SegmentationNetworkUNet3DTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/projects/volumetric_models/registry_imports.py b/official/projects/volumetric_models/registry_imports.py index 06a551227a0e5838aa8e12852dae11dc96c41490..461a0028f1496ee21a0ca0dba99bdbd846f43d91 100644 --- a/official/projects/volumetric_models/registry_imports.py +++ b/official/projects/volumetric_models/registry_imports.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -15,8 +15,8 @@ """All necessary imports for registration.""" # pylint: disable=unused-import -from official.common import registry_imports from official.projects.volumetric_models.configs import semantic_segmentation_3d as semantic_segmentation_3d_cfg from official.projects.volumetric_models.modeling import backbones from official.projects.volumetric_models.modeling import decoders from official.projects.volumetric_models.tasks import semantic_segmentation_3d +from official.vision import registry_imports diff --git a/official/projects/volumetric_models/serving/export_saved_model.py b/official/projects/volumetric_models/serving/export_saved_model.py index 1a01003857523ee414ef9fcd8b33c5b7d5839695..0fcf249ef9ec93b2403aa8daf7d24e6473a4503a 100644 --- a/official/projects/volumetric_models/serving/export_saved_model.py +++ b/official/projects/volumetric_models/serving/export_saved_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -43,7 +43,7 @@ from official.common import registry_imports # pylint: disable=unused-import from official.core import exp_factory from official.modeling import hyperparams from official.projects.volumetric_models.serving import semantic_segmentation_3d -from official.vision.beta.serving import export_saved_model_lib +from official.vision.serving import export_saved_model_lib FLAGS = flags.FLAGS diff --git a/official/projects/volumetric_models/serving/semantic_segmentation_3d.py b/official/projects/volumetric_models/serving/semantic_segmentation_3d.py index 4d6097095560378b5aee60514a77f61fd96e4a27..a85399c43b6a3c3d6ac0b25a1a0c1010f1fa87f1 100644 --- a/official/projects/volumetric_models/serving/semantic_segmentation_3d.py +++ b/official/projects/volumetric_models/serving/semantic_segmentation_3d.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -22,7 +22,7 @@ import tensorflow as tf from official.projects.volumetric_models.modeling import backbones from official.projects.volumetric_models.modeling import decoders from official.projects.volumetric_models.modeling import factory -from official.vision.beta.serving import export_base +from official.vision.serving import export_base class SegmentationModule(export_base.ExportModule): diff --git a/official/projects/volumetric_models/serving/semantic_segmentation_3d_test.py b/official/projects/volumetric_models/serving/semantic_segmentation_3d_test.py index 11097bf219d90ceafb4a7851065be9e675bb7ee4..4001b829d68fb7296be8f2e479ffd146465e6314 100644 --- a/official/projects/volumetric_models/serving/semantic_segmentation_3d_test.py +++ b/official/projects/volumetric_models/serving/semantic_segmentation_3d_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/volumetric_models/tasks/semantic_segmentation_3d.py b/official/projects/volumetric_models/tasks/semantic_segmentation_3d.py index a6222ab0d9e0b3c2dd3c2ed73e6e141f731a52ea..928d5d26cbfebf7de84518a2601f94820adfb214 100644 --- a/official/projects/volumetric_models/tasks/semantic_segmentation_3d.py +++ b/official/projects/volumetric_models/tasks/semantic_segmentation_3d.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Image segmentation task definition.""" from typing import Any, Dict, Mapping, Optional, Sequence, Union diff --git a/official/projects/volumetric_models/tasks/semantic_segmentation_3d_test.py b/official/projects/volumetric_models/tasks/semantic_segmentation_3d_test.py index a7fec218d811c49f944bec2c8500047485d5bf1e..08cf0e693d28ee6ef8c7f88c4c35f8ca330b7dd7 100644 --- a/official/projects/volumetric_models/tasks/semantic_segmentation_3d_test.py +++ b/official/projects/volumetric_models/tasks/semantic_segmentation_3d_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Tests for semantic segmentation task.""" # pylint: disable=unused-import @@ -30,7 +29,7 @@ from official.projects.volumetric_models.evaluation import segmentation_metrics from official.projects.volumetric_models.modeling import backbones from official.projects.volumetric_models.modeling import decoders from official.projects.volumetric_models.tasks import semantic_segmentation_3d as img_seg_task -from official.vision.beta.dataloaders import tfexample_utils +from official.vision.dataloaders import tfexample_utils class SemanticSegmentationTaskTest(tf.test.TestCase, parameterized.TestCase): diff --git a/official/projects/volumetric_models/train.py b/official/projects/volumetric_models/train.py index b84569e1b8404f3bfa9f31ae813dac1725f24d61..e04956fb722e122ca40ba65b279e2dfda348ba76 100644 --- a/official/projects/volumetric_models/train.py +++ b/official/projects/volumetric_models/train.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,7 +19,7 @@ import gin # pylint: disable=unused-import from official.common import flags as tfm_flags from official.projects.volumetric_models import registry_imports # pylint: disable=unused-import -from official.vision.beta import train +from official.vision import train def main(_): diff --git a/official/projects/volumetric_models/train_test.py b/official/projects/volumetric_models/train_test.py index 47a32ec54b86cad29b24afac625f9a0a5ec0441e..50e8fa7e49a9e3812410d5c2ecc90d5ac9af3b3d 100644 --- a/official/projects/volumetric_models/train_test.py +++ b/official/projects/volumetric_models/train_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -21,7 +21,7 @@ from absl import logging from absl.testing import flagsaver import tensorflow as tf from official.projects.volumetric_models import train as train_lib -from official.vision.beta.dataloaders import tfexample_utils +from official.vision.dataloaders import tfexample_utils FLAGS = flags.FLAGS diff --git a/official/projects/waste_identification_ml/README.md b/official/projects/waste_identification_ml/README.md new file mode 100644 index 0000000000000000000000000000000000000000..edbe262ca5b5cd3489846a852b9b71759b2fc20e --- /dev/null +++ b/official/projects/waste_identification_ml/README.md @@ -0,0 +1,35 @@ +# CircularNet + +Instance segmentation models for identification of recyclables on conveyor +belts. + +Note: These are demo models built on limited datasets. If you’re interested in +updated versions of the models, or in using models trained on specific +materials, reach out to waste-innovation-external@google.com + +## Overview + +CircularNet is built using Mask RCNN, which is a deep learning model for +instance image segmentation, where the goal is to assign instance level labels +(e.g. person1, person2, cat) to every pixel in an input image. + +Mask RCNN algorithm is available in the TensorFlow Model Garden which is a +repository with a number of different implementations of state-of-the-art models +and modeling solutions for TensorFlow users. + +## Model Categories + +- Material Type - Identifies the high level material type (e.g. plastic, paper + etc) of an object +- Material Form - Categorizes objects based on the form factor (e.g. cup, + bottle, bag etc) +- Plastic Type - Identifies the plastic resin type of the object (e.g. 
PET, + HDPE, LDPE, etc) + +## Model paths in GCP buckets + +| Model categories | Model backbone | Model type | GCP bucket path | +| ------ | ------ | ----- | ------ | +| Material Model | Resnet | saved model & TFLite | [click here](https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/material_model.zip) | +| Material Form model | Resnet | saved model & TFLite | [click here](https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/material_form_model.zip) | +|Plastic model | Resnet| saved model & TFLite | [click here](https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/plastic_types_model.zip) | diff --git a/official/projects/waste_identification_ml/model_conversion/checkpoints_to_savedModel_to_tflite.ipynb b/official/projects/waste_identification_ml/model_conversion/checkpoints_to_savedModel_to_tflite.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..9cf5417dd6b5b38ac2f59d0805f4194e55549d10 --- /dev/null +++ b/official/projects/waste_identification_ml/model_conversion/checkpoints_to_savedModel_to_tflite.ipynb @@ -0,0 +1,346 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "wm0ezXfhdp2P" + }, + "source": [ + "# Convert Tensorflow model checkpoints to saved model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KAy1ciItdrDB" + }, + "source": [ + "Given the checkpoints exported from Tensorflow model training, our goal is to convert those checkpoints into a saved model for inference purposes.\u003cbr\u003e\n", + "A checkpoint is a binary file that contains the values of the weights, biases, gradients and all other saved variables. Checkpoint files have the extension .ckpt.
Checkpoints do not contain any description of the computation defined by the model and thus are typically only useful when source code that will use the saved parameter values is available.\u003cbr\u003e\n", + "A saved model contains a complete tensorflow program, including trained parameters and computation. It does not require the original model building code to run, which makes it useful for sharing or deploying with TFLite, tensorflow.js, Tensorflow Serving or Tensorflow Hub." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Lg4uH43Qds0q" + }, + "source": [ + "**Note** - We also assume that the script will be used as a Google Colab notebook. But this can be changed according to the needs of users. They can modify this in case they are working on their local workstation, remote server or any other database. This colab notebook can be changed to a regular jupyter notebook running on a local machine according to the need of the users." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BofYg406d4LV" + }, + "source": [ + "## Import libraries \u0026 clone the TF model directory" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oegDgL7yaaAq" + }, + "outputs": [], + "source": [ + "# install model-garden official and RESTART RUNTIME of the colab\n", + "!pip install tf-models-official" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fMROz-xXdx6c" + }, + "outputs": [], + "source": [ + "import os\n", + "from google.colab import drive\n", + "import yaml\n", + "import tensorflow as tf" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 1197, + "status": "ok", + "timestamp": 1659384634603, + "user": { + "displayName": "Umair Sabir", + "userId": "06940594206388957365" + }, + "user_tz": 420 + }, + "id": "gz1ajpHgeAJT", + "outputId": 
"1187e44e-82eb-4be1-8adc-1b50f6d7d0ed" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount(\"/content/gdrive\", force_remount=True).\n", + "ln: failed to create symbolic link '/mydrive/My Drive': File exists\n", + "Successful\n" + ] + } + ], + "source": [ + "# use this if your model and data are stored in the google drive\n", + "drive.mount('/content/gdrive')\n", + "\n", + "try:\n", + " !ln -s /content/gdrive/My\\ Drive/ /mydrive\n", + " print('Successful')\n", + "except Exception as e:\n", + " print(e)\n", + " print('Not successful')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 290, + "status": "ok", + "timestamp": 1659384648663, + "user": { + "displayName": "Umair Sabir", + "userId": "06940594206388957365" + }, + "user_tz": 420 + }, + "id": "rRGalo90e2my", + "outputId": "f0305c8b-cf06-4637-a83c-05f5471313e7" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "fatal: destination path 'models' already exists and is not an empty directory.\n" + ] + } + ], + "source": [ + "# Clone the tensorflow models repository\n", + "!git clone --depth 1 https://github.com/tensorflow/models" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 189, + "status": "ok", + "timestamp": 1659384681521, + "user": { + "displayName": "Umair Sabir", + "userId": "06940594206388957365" + }, + "user_tz": 420 + }, + "id": "HalXsX7BqdyX", + "outputId": "cb5555e4-0b77-4036-9230-1c01fcf1afaf" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/content/models\n" + ] + } + ], + "source": [ + "%cd models" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": 
"8o7QzpHOsFHS" + }, + "source": [ + "## **MUST CHANGE** - Define the parameters according to your need" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xak5WwMXppDF" + }, + "outputs": [], + "source": [ + "# this parameter depends on the backbone you will be using. In our \n", + "# case we used resnet backbone\n", + "EXPERIMENT_TYPE = 'maskrcnn_resnetfpn_coco' #@param {type:\"string\"}\n", + "\n", + "# path to the folder where all the files and checkpoints after model training \n", + "# are exported to\n", + "CHECKPOINT_PATH = '/mydrive/plastics_model/version_1/' #@param {type:\"string\"}\n", + "\n", + "# path where the saved model will be exported to\n", + "EXPORT_DIR_PATH = '/mydrive/plastics_model/experiment/' #@param {type:\"string\"}\n", + "\n", + "# config files are always stored with the checkpoints\n", + "CONFIG_FILE= CHECKPOINT_PATH + 'params.yaml'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ddZzH5KhqSAy" + }, + "outputs": [], + "source": [ + "# config files are always stored with the checkpoints\n", + "# read the params.yaml file in order to get the height and width of an image\n", + "with open(CONFIG_FILE) as f:\n", + " my_dict = yaml.safe_load(f)\n", + "\n", + "HEIGHT = my_dict['task']['model']['input_size'][0]\n", + "WIDTH = my_dict['task']['model']['input_size'][1]\n", + "print(HEIGHT, WIDTH)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZWrbqHqJt947" + }, + "source": [ + "## calling the function to convert checkpoints to saved model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "k3wuYiVte4t_" + }, + "outputs": [], + "source": [ + "# run the conversion command\n", + "!python -m official.vision.serving.export_saved_model --experiment=$EXPERIMENT_TYPE \\\n", + " --export_dir=$EXPORT_DIR_PATH \\\n", + " --checkpoint_path=$CHECKPOINT_PATH \\\n", + " --batch_size=1 \\\n", + " --input_image_size=$HEIGHT,$WIDTH 
\\\n", + " --input_type=tflite \\\n", + " --config_file=$CONFIG_FILE" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jQb13BbW78bs" + }, + "source": [ + "# Convert saved model to TF Lite model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QEKaddZ58Ab6" + }, + "source": [ + "Given the saved model after Tensorflow model training, our goal is to convert saved model to TFLite for inference purpose on edge devices. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZXpSX-w_8A1c" + }, + "source": [ + "Tensorflow Lite is a set of tools that enables on-device machine learning by helping developers run their models on mobile, embedded and edge devices." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "20sV-bx59GRD" + }, + "source": [ + "## **MUST CHANGE** - Define the parameters according to your need" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "icoDtIin9REv" + }, + "outputs": [], + "source": [ + "# path where the tflite model will be written with its name\n", + "TFLITE_PATH = '/mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/model_version_1/plastics_model/tflite_fan/model.tflite' #@param {type:\"string\"}\n", + "\n", + "# path where saved model parameters are saved\n", + "SAVED_MODEL_DIR = EXPORT_DIR_PATH + '/saved_model/'" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YE1Z13xs9NtJ" + }, + "source": [ + "## conversion of saved model to tflite" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1LY9yoUP6Gr4" + }, + "outputs": [], + "source": [ + "converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir=SAVED_MODEL_DIR) \n", + "tflite_model = converter.convert() \n", + "with open(TFLITE_PATH, 'wb') as f:\n", + " f.write(tflite_model)" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "checkpoints_to_saved_model_to_tflite.ipynb", + 
"provenance": [] + }, + "gpuClass": "standard", + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/official/projects/waste_identification_ml/model_inference/TFHub_saved_model_inference.ipynb b/official/projects/waste_identification_ml/model_inference/TFHub_saved_model_inference.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..fbc8e09a2a1781a874d7724325955fa8923cdb22 --- /dev/null +++ b/official/projects/waste_identification_ml/model_inference/TFHub_saved_model_inference.ipynb @@ -0,0 +1,1215 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "rOvvWAVTkMR7" + }, + "source": [ + "# Welcome to the Waste Identification Colab\n", + "\n", + "Welcome to the Instance Segmentation Colab! This notebook will take you through the steps of running an \"out-of-the-box\" Mask RCNN Instance Segmentation model on images." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HVTXSC07QwfG" + }, + "source": [ + "Given 3 different Mask RCNN models for the material type, material form type and plastic type, your goal is to perform inference with any of the models and visualize the results. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AQUsAE0TRkmh" + }, + "source": [ + "To finish this task, a proper path for the saved models and a single image needs to be provided. The path to the labels on which the models are trained is in the waste_identification_ml directory inside the Tensorflow Model Garden repository. The label files are inferred automatically once you select the ML model by which you want to do the inference." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vPs64QA1Zdov" + }, + "source": [ + "## Imports and Setup\n", + "\n", + "Let's start with the base imports." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yn5_uV1HLvaz" + }, + "outputs": [], + "source": [ + "import os\n", + "import pathlib\n", + "import cv2\n", + "import logging\n", + "logging.disable(logging.WARNING)\n", + "\n", + "import matplotlib\n", + "import matplotlib.pyplot as plt\n", + "\n", + "import numpy as np\n", + "from six import BytesIO\n", + "from PIL import Image\n", + "from six.moves.urllib.request import urlopen\n", + "\n", + "import tensorflow as tf\n", + "import tensorflow_hub as hub\n", + "\n", + "tf.get_logger().setLevel('ERROR')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "14bNk1gzh0TN" + }, + "source": [ + "## Visualization tools\n", + "\n", + "To visualize the images with the detected bounding boxes and segmentation masks, we will use the TensorFlow Object Detection API. To install it, we first clone the TensorFlow Models repository." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oi28cqGGFWnY", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c2b27919-e013-4d47-be4a-4f62291b8ac6" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Cloning into 'models'...\n", + "remote: Enumerating objects: 3451, done.\u001b[K\n", + "remote: Counting objects: 100% (3451/3451), done.\u001b[K\n", + "remote: Compressing objects: 100% (2891/2891), done.\u001b[K\n", + "remote: Total 3451 (delta 896), reused 1476 (delta 503), pack-reused 0\u001b[K\n", + "Receiving objects: 100% (3451/3451), 46.92 MiB | 15.79 MiB/s, done.\n", + "Resolving deltas: 100% (896/896), done.\n", + "Checking out files: 100% (3125/3125), done.\n" + ] + } + ], + "source": [ + "# Clone the TensorFlow Models repository\n", + "!git clone --depth 1 https://github.com/tensorflow/models" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yX3pb_pXDjYA" + }, + "source": [ + "Install the Object Detection API." + ] + }, + { + "cell_type": "code", +
"execution_count": null, + "metadata": { + "id": "NwdsBdGhFanc", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c507dccb-4658-4964-fcce-baed5a817db9" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Reading package lists...\n", + "Building dependency tree...\n", + "Reading state information...\n", + "protobuf-compiler is already the newest version (3.0.0-9.1ubuntu1).\n", + "The following package was automatically installed and is no longer required:\n", + " libnvidia-common-460\n", + "Use 'sudo apt autoremove' to remove it.\n", + "0 upgraded, 0 newly installed, 0 to remove and 19 not upgraded.\n", + "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", + "Processing /content/models/research\n", + "Collecting avro-python3\n", + " Downloading avro-python3-1.10.2.tar.gz (38 kB)\n", + "Collecting apache-beam\n", + " Downloading apache_beam-2.40.0-cp37-cp37m-manylinux2010_x86_64.whl (10.9 MB)\n", + "Requirement already satisfied: pillow in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (7.1.2)\n", + "Requirement already satisfied: lxml in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (4.9.1)\n", + "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (3.2.2)\n", + "Requirement already satisfied: Cython in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (0.29.32)\n", + "Requirement already satisfied: contextlib2 in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (0.5.5)\n", + "Collecting tf-slim\n", + " Downloading tf_slim-1.1.0-py2.py3-none-any.whl (352 kB)\n", + "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (1.15.0)\n", + "Requirement already satisfied: pycocotools in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (2.0.4)\n", + 
"Collecting lvis\n", + " Downloading lvis-0.5.3-py3-none-any.whl (14 kB)\n", + "Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (1.7.3)\n", + "Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (1.3.5)\n", + "Collecting tf-models-official>=2.5.1\n", + " Downloading tf_models_official-2.9.2-py2.py3-none-any.whl (2.1 MB)\n", + "Collecting tensorflow_io\n", + " Downloading tensorflow_io-0.26.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (25.9 MB)\n", + "Requirement already satisfied: keras in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (2.8.0)\n", + "Collecting pyparsing==2.4.7\n", + " Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)\n", + "Requirement already satisfied: oauth2client in /usr/local/lib/python3.7/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (4.1.3)\n", + "Requirement already satisfied: opencv-python-headless in /usr/local/lib/python3.7/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (4.6.0.66)\n", + "Collecting sacrebleu\n", + " Downloading sacrebleu-2.2.0-py3-none-any.whl (116 kB)\n", + "Requirement already satisfied: tensorflow-datasets in /usr/local/lib/python3.7/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (4.6.0)\n", + "Collecting py-cpuinfo>=3.3.0\n", + " Downloading py-cpuinfo-8.0.0.tar.gz (99 kB)\n", + "Requirement already satisfied: numpy>=1.20 in /usr/local/lib/python3.7/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (1.21.6)\n", + "Requirement already satisfied: gin-config in /usr/local/lib/python3.7/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (0.5.0)\n", + "Collecting pyyaml<6.0,>=5.1\n", + " Downloading PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636 kB)\n", + "Requirement already satisfied: tensorflow-hub>=0.6.0 in /usr/local/lib/python3.7/dist-packages 
(from tf-models-official>=2.5.1->object-detection==0.1) (0.12.0)\n", + "Collecting seqeval\n", + " Downloading seqeval-1.2.2.tar.gz (43 kB)\n", + "Collecting tensorflow~=2.9.0\n", + " Downloading tensorflow-2.9.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (511.7 MB)\n", + "Collecting tensorflow-text~=2.9.0\n", + " Downloading tensorflow_text-2.9.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.6 MB)\n", + "Requirement already satisfied: psutil>=5.4.3 in /usr/local/lib/python3.7/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (5.4.8)\n", + "Requirement already satisfied: google-api-python-client>=1.6.7 in /usr/local/lib/python3.7/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (1.12.11)\n", + "Collecting sentencepiece\n", + " Downloading sentencepiece-0.1.97-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n", + "Collecting tensorflow-addons\n", + " Downloading tensorflow_addons-0.17.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)\n", + "Collecting tensorflow-model-optimization>=0.4.1\n", + " Downloading tensorflow_model_optimization-0.7.3-py2.py3-none-any.whl (238 kB)\n", + "Requirement already satisfied: kaggle>=1.3.9 in /usr/local/lib/python3.7/dist-packages (from tf-models-official>=2.5.1->object-detection==0.1) (1.5.12)\n", + "Requirement already satisfied: google-auth<3dev,>=1.16.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (1.35.0)\n", + "Requirement already satisfied: google-auth-httplib2>=0.0.3 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (0.0.4)\n", + "Requirement already satisfied: httplib2<1dev,>=0.15.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (0.17.4)\n", + "Requirement already 
satisfied: google-api-core<3dev,>=1.21.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (1.31.6)\n", + "Requirement already satisfied: uritemplate<4dev,>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (3.0.1)\n", + "Requirement already satisfied: pytz in /usr/local/lib/python3.7/dist-packages (from google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (2022.1)\n", + "Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (2.23.0)\n", + "Requirement already satisfied: setuptools>=40.3.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (57.4.0)\n", + "Requirement already satisfied: packaging>=14.3 in /usr/local/lib/python3.7/dist-packages (from google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (21.3)\n", + "Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (1.56.4)\n", + "Requirement already satisfied: protobuf<4.0.0dev,>=3.12.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (3.17.3)\n", + "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.7/dist-packages (from google-auth<3dev,>=1.16.0->google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (0.2.8)\n", 
+ "Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from google-auth<3dev,>=1.16.0->google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (4.2.4)\n", + "Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.7/dist-packages (from google-auth<3dev,>=1.16.0->google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (4.9)\n", + "Requirement already satisfied: certifi in /usr/local/lib/python3.7/dist-packages (from kaggle>=1.3.9->tf-models-official>=2.5.1->object-detection==0.1) (2022.6.15)\n", + "Requirement already satisfied: python-slugify in /usr/local/lib/python3.7/dist-packages (from kaggle>=1.3.9->tf-models-official>=2.5.1->object-detection==0.1) (6.1.2)\n", + "Requirement already satisfied: python-dateutil in /usr/local/lib/python3.7/dist-packages (from kaggle>=1.3.9->tf-models-official>=2.5.1->object-detection==0.1) (2.8.2)\n", + "Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from kaggle>=1.3.9->tf-models-official>=2.5.1->object-detection==0.1) (4.64.0)\n", + "Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/dist-packages (from kaggle>=1.3.9->tf-models-official>=2.5.1->object-detection==0.1) (1.24.3)\n", + "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.7/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3dev,>=1.16.0->google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (0.4.8)\n", + "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (3.0.4)\n", + "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from 
requests<3.0.0dev,>=2.18.0->google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (2.10)\n", + "Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (0.26.0)\n", + "Requirement already satisfied: h5py>=2.9.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (3.1.0)\n", + "Collecting gast<=0.4.0,>=0.2.1\n", + " Downloading gast-0.4.0-py3-none-any.whl (9.8 kB)\n", + "Requirement already satisfied: google-pasta>=0.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (0.2.0)\n", + "Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (3.3.0)\n", + "Collecting keras\n", + " Downloading keras-2.9.0-py2.py3-none-any.whl (1.6 MB)\n", + "Requirement already satisfied: grpcio<2.0,>=1.24.3 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (1.47.0)\n", + "Collecting tensorboard<2.10,>=2.9\n", + " Downloading tensorboard-2.9.1-py3-none-any.whl (5.8 MB)\n", + "Collecting tensorflow-estimator<2.10.0,>=2.9.0rc0\n", + " Downloading tensorflow_estimator-2.9.0-py2.py3-none-any.whl (438 kB)\n", + "Requirement already satisfied: keras-preprocessing>=1.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (1.1.2)\n", + "Collecting flatbuffers<2,>=1.12\n", + " Downloading flatbuffers-1.12-py2.py3-none-any.whl (15 kB)\n", + "Requirement already satisfied: typing-extensions>=3.6.6 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (4.1.1)\n", + "Requirement 
already satisfied: libclang>=13.0.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (14.0.6)\n", + "Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (1.1.0)\n", + "Requirement already satisfied: absl-py>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (1.2.0)\n", + "Requirement already satisfied: wrapt>=1.11.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (1.14.1)\n", + "Requirement already satisfied: astunparse>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (1.6.3)\n", + "Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/local/lib/python3.7/dist-packages (from astunparse>=1.6.0->tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (0.37.1)\n", + "Requirement already satisfied: cached-property in /usr/local/lib/python3.7/dist-packages (from h5py>=2.9.0->tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (1.5.2)\n", + "Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (0.4.6)\n", + "Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (1.0.1)\n", + "Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.7/dist-packages (from tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (3.4.1)\n", + "Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in 
/usr/local/lib/python3.7/dist-packages (from tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (1.8.1)\n", + "Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (0.6.1)\n", + "Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.7/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (1.3.1)\n", + "Requirement already satisfied: importlib-metadata>=4.4 in /usr/local/lib/python3.7/dist-packages (from markdown>=2.6.8->tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (4.12.0)\n", + "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (3.8.1)\n", + "Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official>=2.5.1->object-detection==0.1) (3.2.0)\n", + "Requirement already satisfied: dm-tree~=0.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow-model-optimization>=0.4.1->tf-models-official>=2.5.1->object-detection==0.1) (0.1.7)\n", + "Collecting cloudpickle<3,>=2.1.0\n", + " Downloading cloudpickle-2.1.0-py3-none-any.whl (25 kB)\n", + "Collecting fastavro<2,>=0.23.6\n", + " Downloading fastavro-1.5.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB)\n", + "Collecting pymongo<4.0.0,>=3.8.0\n", + " Downloading pymongo-3.12.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (508 kB)\n", + "Collecting proto-plus<2,>=1.7.1\n", + " Downloading 
proto_plus-1.22.0-py3-none-any.whl (47 kB)\n", + "Collecting hdfs<3.0.0,>=2.1.0\n", + " Downloading hdfs-2.7.0-py3-none-any.whl (34 kB)\n", + "Requirement already satisfied: pyarrow<8.0.0,>=0.15.1 in /usr/local/lib/python3.7/dist-packages (from apache-beam->object-detection==0.1) (6.0.1)\n", + "Requirement already satisfied: crcmod<2.0,>=1.7 in /usr/local/lib/python3.7/dist-packages (from apache-beam->object-detection==0.1) (1.7)\n", + "Requirement already satisfied: pydot<2,>=1.2.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam->object-detection==0.1) (1.3.0)\n", + "Collecting requests<3.0.0dev,>=2.18.0\n", + " Downloading requests-2.28.1-py3-none-any.whl (62 kB)\n", + "Collecting orjson<4.0\n", + " Downloading orjson-3.7.11-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (275 kB)\n", + "Collecting dill<0.3.2,>=0.3.1.1\n", + " Downloading dill-0.3.1.1.tar.gz (151 kB)\n", + "Collecting docopt\n", + " Downloading docopt-0.6.2.tar.gz (25 kB)\n", + "Collecting protobuf<4.0.0dev,>=3.12.0\n", + " Downloading protobuf-3.19.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)\n", + "Requirement already satisfied: charset-normalizer<3,>=2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1) (2.1.0)\n", + "Requirement already satisfied: cycler>=0.10.0 in /usr/local/lib/python3.7/dist-packages (from lvis->object-detection==0.1) (0.11.0)\n", + "Requirement already satisfied: opencv-python>=4.1.0.25 in /usr/local/lib/python3.7/dist-packages (from lvis->object-detection==0.1) (4.6.0.66)\n", + "Requirement already satisfied: kiwisolver>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from lvis->object-detection==0.1) (1.4.4)\n", + "Requirement already satisfied: text-unidecode>=1.3 in /usr/local/lib/python3.7/dist-packages (from 
python-slugify->kaggle>=1.3.9->tf-models-official>=2.5.1->object-detection==0.1) (1.3)\n", + "Collecting portalocker\n", + " Downloading portalocker-2.5.1-py2.py3-none-any.whl (15 kB)\n", + "Requirement already satisfied: tabulate>=0.8.9 in /usr/local/lib/python3.7/dist-packages (from sacrebleu->tf-models-official>=2.5.1->object-detection==0.1) (0.8.10)\n", + "Collecting colorama\n", + " Downloading colorama-0.4.5-py2.py3-none-any.whl (16 kB)\n", + "Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from sacrebleu->tf-models-official>=2.5.1->object-detection==0.1) (2022.6.2)\n", + "Requirement already satisfied: scikit-learn>=0.21.3 in /usr/local/lib/python3.7/dist-packages (from seqeval->tf-models-official>=2.5.1->object-detection==0.1) (1.0.2)\n", + "Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.21.3->seqeval->tf-models-official>=2.5.1->object-detection==0.1) (1.1.0)\n", + "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.21.3->seqeval->tf-models-official>=2.5.1->object-detection==0.1) (3.1.0)\n", + "Requirement already satisfied: typeguard>=2.7 in /usr/local/lib/python3.7/dist-packages (from tensorflow-addons->tf-models-official>=2.5.1->object-detection==0.1) (2.7.1)\n", + "Requirement already satisfied: tensorflow-metadata in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets->tf-models-official>=2.5.1->object-detection==0.1) (1.9.0)\n", + "Requirement already satisfied: importlib-resources in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets->tf-models-official>=2.5.1->object-detection==0.1) (5.9.0)\n", + "Requirement already satisfied: promise in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets->tf-models-official>=2.5.1->object-detection==0.1) (2.3)\n", + "Requirement already satisfied: toml in /usr/local/lib/python3.7/dist-packages (from 
tensorflow-datasets->tf-models-official>=2.5.1->object-detection==0.1) (0.10.2)\n", + "Requirement already satisfied: etils[epath] in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets->tf-models-official>=2.5.1->object-detection==0.1) (0.7.1)\n", + "Building wheels for collected packages: object-detection, py-cpuinfo, dill, avro-python3, docopt, seqeval\n", + " Building wheel for object-detection (setup.py): started\n", + " Building wheel for object-detection (setup.py): finished with status 'done'\n", + " Created wheel for object-detection: filename=object_detection-0.1-py3-none-any.whl size=1694955 sha256=5b07d9d3f9b50a4e043579877792dcc5f911757c859a40646bfe44b1411c2b0b\n", + " Stored in directory: /tmp/pip-ephem-wheel-cache-_e25017f/wheels/fa/a4/d2/e9a5057e414fd46c8e543d2706cd836d64e1fcd9eccceb2329\n", + " Building wheel for py-cpuinfo (setup.py): started\n", + " Building wheel for py-cpuinfo (setup.py): finished with status 'done'\n", + " Created wheel for py-cpuinfo: filename=py_cpuinfo-8.0.0-py3-none-any.whl size=22257 sha256=16043574dd7ea707666388babd039116f8de2bbe60cc7c7bcc35fd39aa978651\n", + " Stored in directory: /root/.cache/pip/wheels/d2/f1/1f/041add21dc9c4220157f1bd2bd6afe1f1a49524c3396b94401\n", + " Building wheel for dill (setup.py): started\n", + " Building wheel for dill (setup.py): finished with status 'done'\n", + " Created wheel for dill: filename=dill-0.3.1.1-py3-none-any.whl size=78544 sha256=ed98e6a3a08f7b25f291c3ded9aa4b85fccbf64a0e52816961d5579e44e7594b\n", + " Stored in directory: /root/.cache/pip/wheels/a4/61/fd/c57e374e580aa78a45ed78d5859b3a44436af17e22ca53284f\n", + " Building wheel for avro-python3 (setup.py): started\n", + " Building wheel for avro-python3 (setup.py): finished with status 'done'\n", + " Created wheel for avro-python3: filename=avro_python3-1.10.2-py3-none-any.whl size=44010 sha256=6a76a37f9b0865055470da37fca662fff8934ac3a2776a1f01d09fac2703eb34\n", + " Stored in directory: 
/root/.cache/pip/wheels/d6/e5/b1/6b151d9b535ee50aaa6ab27d145a0104b6df02e5636f0376da\n", + " Building wheel for docopt (setup.py): started\n", + " Building wheel for docopt (setup.py): finished with status 'done'\n", + " Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13723 sha256=c1fd090fa0616d3d71b524dc465344600ef9cd2cfcff4f9558571a397f0fd7ad\n", + " Stored in directory: /root/.cache/pip/wheels/72/b0/3f/1d95f96ff986c7dfffe46ce2be4062f38ebd04b506c77c81b9\n", + " Building wheel for seqeval (setup.py): started\n", + " Building wheel for seqeval (setup.py): finished with status 'done'\n", + " Created wheel for seqeval: filename=seqeval-1.2.2-py3-none-any.whl size=16180 sha256=6a692bf82ea13a55bc0bb10a37d5b444a675ed7f77d72e893fa02a97246e9204\n", + " Stored in directory: /root/.cache/pip/wheels/05/96/ee/7cac4e74f3b19e3158dce26a20a1c86b3533c43ec72a549fd7\n", + "Successfully built object-detection py-cpuinfo dill avro-python3 docopt seqeval\n", + "Installing collected packages: requests, pyparsing, protobuf, tensorflow-estimator, tensorboard, keras, gast, flatbuffers, tensorflow, portalocker, docopt, dill, colorama, tf-slim, tensorflow-text, tensorflow-model-optimization, tensorflow-addons, seqeval, sentencepiece, sacrebleu, pyyaml, pymongo, py-cpuinfo, proto-plus, orjson, hdfs, fastavro, cloudpickle, tf-models-official, tensorflow-io, lvis, avro-python3, apache-beam, object-detection\n", + " Attempting uninstall: requests\n", + " Found existing installation: requests 2.23.0\n", + " Uninstalling requests-2.23.0:\n", + " Successfully uninstalled requests-2.23.0\n", + " Attempting uninstall: pyparsing\n", + " Found existing installation: pyparsing 3.0.9\n", + " Uninstalling pyparsing-3.0.9:\n", + " Successfully uninstalled pyparsing-3.0.9\n", + " Attempting uninstall: protobuf\n", + " Found existing installation: protobuf 3.17.3\n", + " Uninstalling protobuf-3.17.3:\n", + " Successfully uninstalled protobuf-3.17.3\n", + " Attempting uninstall: 
tensorflow-estimator\n", + " Found existing installation: tensorflow-estimator 2.8.0\n", + " Uninstalling tensorflow-estimator-2.8.0:\n", + " Successfully uninstalled tensorflow-estimator-2.8.0\n", + " Attempting uninstall: tensorboard\n", + " Found existing installation: tensorboard 2.8.0\n", + " Uninstalling tensorboard-2.8.0:\n", + " Successfully uninstalled tensorboard-2.8.0\n", + " Attempting uninstall: keras\n", + " Found existing installation: keras 2.8.0\n", + " Uninstalling keras-2.8.0:\n", + " Successfully uninstalled keras-2.8.0\n", + " Attempting uninstall: gast\n", + " Found existing installation: gast 0.5.3\n", + " Uninstalling gast-0.5.3:\n", + " Successfully uninstalled gast-0.5.3\n", + " Attempting uninstall: flatbuffers\n", + " Found existing installation: flatbuffers 2.0\n", + " Uninstalling flatbuffers-2.0:\n", + " Successfully uninstalled flatbuffers-2.0\n", + " Attempting uninstall: tensorflow\n", + " Found existing installation: tensorflow 2.8.2+zzzcolab20220719082949\n", + " Uninstalling tensorflow-2.8.2+zzzcolab20220719082949:\n", + " Successfully uninstalled tensorflow-2.8.2+zzzcolab20220719082949\n", + " Attempting uninstall: dill\n", + " Found existing installation: dill 0.3.5.1\n", + " Uninstalling dill-0.3.5.1:\n", + " Successfully uninstalled dill-0.3.5.1\n", + " Attempting uninstall: pyyaml\n", + " Found existing installation: PyYAML 3.13\n", + " Uninstalling PyYAML-3.13:\n", + " Successfully uninstalled PyYAML-3.13\n", + " Attempting uninstall: pymongo\n", + " Found existing installation: pymongo 4.2.0\n", + " Uninstalling pymongo-4.2.0:\n", + " Successfully uninstalled pymongo-4.2.0\n", + " Attempting uninstall: cloudpickle\n", + " Found existing installation: cloudpickle 1.3.0\n", + " Uninstalling cloudpickle-1.3.0:\n", + " Successfully uninstalled cloudpickle-1.3.0\n", + "Successfully installed apache-beam-2.40.0 avro-python3-1.10.2 cloudpickle-2.1.0 colorama-0.4.5 dill-0.3.1.1 docopt-0.6.2 fastavro-1.5.4 flatbuffers-1.12 
gast-0.4.0 hdfs-2.7.0 keras-2.9.0 lvis-0.5.3 object-detection-0.1 orjson-3.7.11 portalocker-2.5.1 proto-plus-1.22.0 protobuf-3.19.4 py-cpuinfo-8.0.0 pymongo-3.12.3 pyparsing-2.4.7 pyyaml-5.4.1 requests-2.28.1 sacrebleu-2.2.0 sentencepiece-0.1.97 seqeval-1.2.2 tensorboard-2.9.1 tensorflow-2.9.1 tensorflow-addons-0.17.1 tensorflow-estimator-2.9.0 tensorflow-io-0.26.0 tensorflow-model-optimization-0.7.3 tensorflow-text-2.9.0 tf-models-official-2.9.2 tf-slim-1.1.0\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "\n", + "WARNING: apt does not have a stable CLI interface. Use with caution in scripts.\n", + "\n", + " DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.\n", + " pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.\n", + "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", + "gym 0.17.3 requires cloudpickle<1.7.0,>=1.2.0, but you have cloudpickle 2.1.0 which is incompatible.\n" + ] + } + ], + "source": [ + "%%bash\n", + "sudo apt install -y protobuf-compiler\n", + "cd models/research/\n", + "protoc object_detection/protos/*.proto --python_out=.\n", + "cp object_detection/packages/tf2/setup.py .\n", + "python -m pip install ." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3yDNgIx-kV7X" + }, + "source": [ + "Now we can import the dependencies we will need later" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2JCeQU3fkayh" + }, + "outputs": [], + "source": [ + "from object_detection.utils import label_map_util\n", + "from object_detection.utils import visualization_utils as viz_utils\n", + "from object_detection.utils import ops as utils_ops\n", + "\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XRUr9Aiwuho7" + }, + "source": [ + "## Import pre-trained models from the Waste Identification project" + ] + }, + { + "cell_type": "code", + "source": [ + "!wget https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/material_model.zip \n", + "!wget https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/material_form_model.zip \n", + "!wget https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/plastic_types_model.zip " + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "7AnetxTI6BdR", + "outputId": "dfef214d-9c6f-4464-95fa-5f0ccc9a17ba" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "--2022-08-12 22:53:17-- https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/material_model.zip\n", + "Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.103.128, 142.250.159.128, 142.251.120.128, ...\n", + "Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.103.128|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 521320844 (497M) [application/zip]\n", + "Saving to: ‘material_model.zip’\n", + "\n", + "material_model.zip 100%[===================>] 497.17M 97.0MB/s in 5.4s \n", + "\n", + "2022-08-12 22:53:23 (92.2 MB/s) - ‘material_model.zip’ saved [521320844/521320844]\n", + "\n", + "--2022-08-12 22:53:23-- https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/material_form_model.zip\n", + "Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.103.128, 142.250.159.128, 142.251.120.128, ...\n", + "Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.103.128|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 523568744 (499M) [application/zip]\n", + "Saving to: ‘material_form_model.zip’\n", + "\n", + "material_form_model 100%[===================>] 499.31M 147MB/s in 3.4s \n", + "\n", + "2022-08-12 22:53:27 (146 MB/s) - ‘material_form_model.zip’ saved [523568744/523568744]\n", + "\n", + "--2022-08-12 22:53:27-- https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/plastic_types_model.zip\n", + "Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.103.128, 142.250.159.128, 142.251.120.128, ...\n", + "Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.103.128|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 521268394 (497M) [application/zip]\n", + "Saving to: ‘plastic_types_model.zip’\n", + "\n", + "plastic_types_model 100%[===================>] 497.12M 137MB/s in 4.0s \n", + "\n", + "2022-08-12 22:53:31 (124 MB/s) - ‘plastic_types_model.zip’ saved [521268394/521268394]\n", + "\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "%%bash\n", + "mkdir material material_form plastic_type\n", + "unzip material_model.zip -d material/\n", + "unzip material_form_model.zip -d material_form/\n", + "unzip plastic_types_model.zip -d plastic_type/" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "SFFo0fdQtg26", + "outputId": "8a8843e2-b251-4d2a-84ed-34bae99e22cb" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Archive: material_model.zip\n", + " creating: material/saved_model/\n", + " inflating: material/saved_model/params.yaml \n", + " creating: material/saved_model/saved_model/\n", + " inflating: material/saved_model/saved_model/saved_model.pb \n", + " creating: material/saved_model/saved_model/variables/\n", + " inflating: material/saved_model/saved_model/variables/variables.data-00000-of-00001 \n", + " inflating: material/saved_model/saved_model/variables/variables.index \n", + " creating: material/saved_model/checkpoint/\n", + " inflating: material/saved_model/checkpoint/ckpt-1.data-00000-of-00001 \n", + " inflating: material/saved_model/checkpoint/checkpoint \n", + " inflating: material/saved_model/checkpoint/ckpt-1.index \n", + " creating: material/tflite_model/\n", + " inflating: material/tflite_model/model.tflite \n", + "Archive: material_form_model.zip\n", + " creating: material_form/saved_model/\n", + " inflating: material_form/saved_model/params.yaml \n", + " creating: material_form/saved_model/saved_model/\n", + " inflating: material_form/saved_model/saved_model/saved_model.pb \n", + " creating: 
material_form/saved_model/saved_model/variables/\n", + " inflating: material_form/saved_model/saved_model/variables/variables.data-00000-of-00001 \n", + " inflating: material_form/saved_model/saved_model/variables/variables.index \n", + " creating: material_form/saved_model/checkpoint/\n", + " inflating: material_form/saved_model/checkpoint/ckpt-1.data-00000-of-00001 \n", + " inflating: material_form/saved_model/checkpoint/checkpoint \n", + " inflating: material_form/saved_model/checkpoint/ckpt-1.index \n", + " creating: material_form/tflite_model/\n", + " inflating: material_form/tflite_model/model.tflite \n", + "Archive: plastic_types_model.zip\n", + " creating: plastic_type/saved_model/\n", + " inflating: plastic_type/saved_model/params.yaml \n", + " creating: plastic_type/saved_model/saved_model/\n", + " inflating: plastic_type/saved_model/saved_model/saved_model.pb \n", + " creating: plastic_type/saved_model/saved_model/variables/\n", + " inflating: plastic_type/saved_model/saved_model/variables/variables.data-00000-of-00001 \n", + " inflating: plastic_type/saved_model/saved_model/variables/variables.index \n", + " creating: plastic_type/saved_model/checkpoint/\n", + " inflating: plastic_type/saved_model/checkpoint/ckpt-1.data-00000-of-00001 \n", + " inflating: plastic_type/saved_model/checkpoint/checkpoint \n", + " inflating: plastic_type/saved_model/checkpoint/ckpt-1.index \n", + " creating: plastic_type/tflite_model/\n", + " inflating: plastic_type/tflite_model/model.tflite \n" + ] + } + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Ey-8Ij2sKjkD" + }, + "outputs": [], + "source": [ + "ALL_MODELS = {\n", + "'material_model' : 'material/saved_model/saved_model/',\n", + "'material_form_model' : 'material_form/saved_model/saved_model/',\n", + "'plastic_model' : 'plastic_type/saved_model/saved_model/'\n", + "}\n", + "\n", + "# path to an image\n", + "IMAGES_FOR_TEST = {\n", + " 'Image1' : 
'models/official/projects/waste_identification_ml/pre_processing/config/sample_images/image_2.png'\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IogyryF2lFBL" + }, + "source": [ + "## Utilities\n", + "\n", + "Run the following cell to define helper functions that later cells use:\n", + "\n", + "- Helper functions to load an image and normalize it for the model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9XXfEdD9PMKn" + }, + "outputs": [], + "source": [ + "# Pre-processing helper functions\n", + "\n", + "def normalize_image(image,\n", + " offset=(0.485, 0.456, 0.406),\n", + " scale=(0.229, 0.224, 0.225)):\n", + " \"\"\"Normalizes each channel by subtracting `offset` (mean) and dividing by `scale` (std).\"\"\"\n", + " with tf.name_scope('normalize_image'):\n", + " image = tf.image.convert_image_dtype(image, dtype=tf.float32)\n", + " offset = tf.constant(offset)\n", + " offset = tf.expand_dims(offset, axis=0)\n", + " offset = tf.expand_dims(offset, axis=0)\n", + " image -= offset\n", + "\n", + " scale = tf.constant(scale)\n", + " scale = tf.expand_dims(scale, axis=0)\n", + " scale = tf.expand_dims(scale, axis=0)\n", + " image /= scale\n", + " return image\n", + "\n", + " \n", + "def load_image_into_numpy_array(path):\n", + " \"\"\"Load an image from file into a numpy array.\n", + "\n", + " Puts the image into a numpy array to feed into the TensorFlow model.\n", + " The array has shape (1, height, width, channels): a batch dimension\n", + " of 1 followed by the image dimensions, with channels=3 for RGB.\n", + "\n", + " Args:\n", + " path: a local file path or an http(s) URL of the image\n", + "\n", + " Returns:\n", + " uint8 numpy array with shape (1, h, w, 3)\n", + " \"\"\"\n", + " image = None\n", + " if path.startswith('http'):\n", + " response = urlopen(path)\n", + " image_data = response.read()\n", + " image_data = BytesIO(image_data)\n", + " image = Image.open(image_data)\n", + " else:\n", + " image_data = tf.io.gfile.GFile(path, 'rb').read()\n", + " image = Image.open(BytesIO(image_data))\n", + "\n", + 
" (im_width, im_height) = image.size\n", + " return np.array(image.getdata()).reshape(\n", + " (1, im_height, im_width, 3)).astype(np.uint8)\n", + "\n", + "\n", + "def build_inputs_for_segmentation(image):\n", + " \"\"\"Builds segmentation model inputs for serving.\"\"\"\n", + " # Normalizes image with mean and std pixel values.\n", + " image = normalize_image(image)\n", + " return image" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6917xnUSlp9x" + }, + "source": [ + "## Build an instance segmentation model and load pre-trained model weights\n", + "\n", + "Here we choose which instance segmentation model to use.\n", + "To try a different model later, change the selection in the next cell and re-run the cells that follow.\n", + "Three models are available:\n", + "1. **material_model**: identifies the top-level material category of an object, such as plastic or metal.\n", + "2. **material_form_model**: identifies the product or form of an object, such as a plate or bottle.\n", + "3. **plastic_model**: identifies the plastic type, such as HDPE or PETE."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "HtwrSqvakTNn", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0a6d2e8b-30a3-4c0d-955a-6a410bdb0b0b" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Selected model:material_form_model\n", + "Local SavedModel path: material_form/saved_model/saved_model/\n" + ] + } + ], + "source": [ + "# @title Model Selection { display-mode: \"form\", run: \"auto\" }\n", + "model_display_name = 'material_form_model' # @param ['material_model','material_form_model','plastic_model']\n", + "model_handle = ALL_MODELS[model_display_name]\n", + "\n", + "print('Selected model:'+ model_display_name)\n", + "print('Local SavedModel path: {}'.format(model_handle))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NKtD0IeclbL5" + }, + "source": [ + "### Load label map data (for plotting)\n", + "\n", + "Label maps map index numbers to category names, so that when the model predicts class `2`, we know that it corresponds to `Bottle`. 
Here we use utility functions from the Object Detection API, but anything that returns a dictionary mapping integer IDs to string labels would work.\n", + "\n", + "For simplicity, we load the label map files from the same `models` repository that provided the Object Detection API code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3Kwqa0T1NTUf", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "54b701ab-c936-46b0-c76e-d0cc48e4b87c" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Labels selected for material_form_model\n", + "\n", + "\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{1: {'id': 1, 'name': 'Flexibles'},\n", + " 2: {'id': 2, 'name': 'Bottle'},\n", + " 3: {'id': 3, 'name': 'Jar'},\n", + " 4: {'id': 4, 'name': 'Carton'},\n", + " 5: {'id': 5, 'name': 'Sachets-&-Pouch'},\n", + " 6: {'id': 6, 'name': 'Blister-pack'},\n", + " 7: {'id': 7, 'name': 'Tray'},\n", + " 8: {'id': 8, 'name': 'Tube'},\n", + " 9: {'id': 9, 'name': 'Can'},\n", + " 10: {'id': 10, 'name': 'Tub'},\n", + " 11: {'id': 11, 'name': 'Cosmetic'},\n", + " 12: {'id': 12, 'name': 'Box'},\n", + " 13: {'id': 13, 'name': 'Clothes'},\n", + " 14: {'id': 14, 'name': 'Bulb'},\n", + " 15: {'id': 15, 'name': 'Cup-&-glass'},\n", + " 16: {'id': 16, 'name': 'Book-&-magazine'},\n", + " 17: {'id': 17, 'name': 'Bag'},\n", + " 18: {'id': 18, 'name': 'Lid'},\n", + " 19: {'id': 19, 'name': 'Clamshell'},\n", + " 20: {'id': 20, 'name': 'Mirror'},\n", + " 21: {'id': 21, 'name': 'Tangler'},\n", + " 22: {'id': 22, 'name': 'Cutlery'},\n", + " 23: {'id': 23, 'name': 'Cassette-&-tape'},\n", + " 24: {'id': 24, 'name': 'Electronic-devices'},\n", + " 25: {'id': 25, 'name': 'Battery'},\n", + " 26: {'id': 26, 'name': 'Pen-&-pencil'},\n", + " 27: {'id': 27, 'name': 'Paper-products'},\n", + " 28: {'id': 28, 'name': 'Foot-wear'},\n", + " 29: {'id': 29, 'name': 'Scissor'},\n", + " 30: {'id': 30, 'name': 'Toys'},\n", + " 31: 
{'id': 31, 'name': 'Brush'},\n", + " 32: {'id': 32, 'name': 'Pipe'},\n", + " 33: {'id': 33, 'name': 'Foil'},\n", + " 34: {'id': 34, 'name': 'Hangers'}}" + ] + }, + "metadata": {}, + "execution_count": 11 + } + ], + "source": [ + "# @title Labels for the above model { display-mode: \"form\", run: \"auto\" }\n", + "\n", + "if model_display_name == 'material_model':\n", + " PATH_TO_LABELS = './models/official/projects/waste_identification_ml/pre_processing/config/data/material_labels.pbtxt'\n", + "elif model_display_name == 'material_form_model':\n", + " PATH_TO_LABELS = './models/official/projects/waste_identification_ml/pre_processing/config/data/material_form_labels.pbtxt'\n", + "elif model_display_name == 'plastic_model':\n", + " PATH_TO_LABELS = './models/official/projects/waste_identification_ml/pre_processing/config/data/plastic_type_labels.pbtxt'\n", + "\n", + "print('Labels selected for',model_display_name)\n", + "print('\\n')\n", + "category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)\n", + "category_index" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "muhUt-wWL582" + }, + "source": [ + "## Loading the selected model\n", + "\n", + "Pass the selected model path to `hub.load` from the TensorFlow Hub library, which loads the local SavedModel into memory.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "rBuD07fLlcEO", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "6103a711-791a-4ed5-fcf5-96fc6de76976" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "loading model...\n", + "model loaded!\n" + ] + } + ], + "source": [ + "print('loading model...')\n", + "hub_model = hub.load(model_handle)\n", + "print('model loaded!')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GIawRDKPPnd4" + }, + "source": [ + "## Loading an image\n", + "\n", + "Let's try the model on a 
simple image.\n", + "\n", + "Here are some simple things to try out if you are curious:\n", + "* Try running inference on your own images: upload them to Colab and load them the same way as in the cell below.\n", + "* Modify some of the input images and see if detection still works. Some simple things to try out here include flipping the image horizontally, or converting to grayscale (note that we still expect the input image to have 3 channels).\n", + "\n", + "**Be careful:** the model expects 3-channel (RGB) images, so in an image with an alpha channel the alpha counts as a 4th channel. Convert such images to RGB before loading them.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hX-AWUQ1wIEr", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 666 + }, + "outputId": "3c0ddf39-c51a-4f46-cad7-8d21ebcaf4f2" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "min: 0 max: 255\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": { + "needs_background": "light" + } + } + ], + "source": [ + "#@title Image Selection (don't forget to execute the cell!) { display-mode: \"form\"}\n", + "selected_image = 'Image1' # @param ['Image1']\n", + "flip_image_horizontally = False #@param {type:\"boolean\"}\n", + "convert_image_to_grayscale = False #@param {type:\"boolean\"}\n", + "\n", + "image_path = IMAGES_FOR_TEST[selected_image]\n", + "image_np = load_image_into_numpy_array(image_path)\n", + "\n", + "# Flip horizontally\n", + "if flip_image_horizontally:\n", + " image_np[0] = np.fliplr(image_np[0]).copy()\n", + "\n", + "# Convert image to grayscale\n", + "if convert_image_to_grayscale:\n", + " image_np[0] = np.tile(\n", + " np.mean(image_np[0], 2, keepdims=True), (1, 1, 3)).astype(np.uint8)\n", + "\n", + "print('min:',np.min(image_np[0]), 'max:', np.max(image_np[0]))\n", + "plt.figure(figsize=(24,32))\n", + "plt.imshow(image_np[0])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dkkBAgGcX65P" + }, + "source": [ + "## Pre-processing an image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "97zIaKAhX-92", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c630f46d-9bb5-4fd6-d59a-b346d547bbd5" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(512, 1024)\n" + ] + } + ], + "source": [ + "# Get the input size (height, width) that the instance segmentation model expects\n", + "hub_model_fn = hub_model.signatures[\"serving_default\"]\n", + "height = hub_model_fn.structured_input_signature[1]['inputs'].shape[1]\n", + "width = hub_model_fn.structured_input_signature[1]['inputs'].shape[2]\n", + "input_size = (height, width)\n", + "print(input_size)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-K0V6KWiYYpD", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": 
"6f27f7c1-11f4-42c1-b848-71dc44f6646d" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "TensorShape([1, 512, 1024, 3])" + ] + }, + "metadata": {}, + "execution_count": 15 + } + ], + "source": [ + "# Apply the same pre-processing that was used when training the model\n", + "image_np_cp = cv2.resize(image_np[0], input_size[::-1], interpolation=cv2.INTER_AREA)  # cv2.resize takes (width, height)\n", + "image_np = build_inputs_for_segmentation(image_np_cp)\n", + "image_np = tf.expand_dims(image_np, axis=0)\n", + "image_np.get_shape()" + ] + }, + { + "cell_type": "markdown", + "source": [ + "The pre-processed image looks much darker. This is because pre-processing normalizes the pixel values, and matplotlib clips the resulting values to the [0, 1] range when it displays them." + ], + "metadata": { + "id": "-O6fHWIh4C8r" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ga1lccBpdxpd", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 583 + }, + "outputId": "7356f5db-625b-4aa8-ac38-293e99fe14fc" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": { + "needs_background": "light" + } + } + ], + "source": [ + "# Display the pre-processed image\n", + "plt.figure(figsize=(24,32))\n", + "plt.imshow(image_np[0])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FTHsFjR6HNwb" + }, + "source": [ + "## Running inference\n", + "\n", + "To run inference, call the loaded model's `serving_default` signature on the pre-processed image." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Gb_siXKcnnGC", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "3c4d3952-74d8-49c8-98c6-70966ff241e8" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "dict_keys(['image_info', 'detection_scores', 'num_detections', 'detection_masks', 'detection_boxes', 'detection_classes'])\n" + ] + } + ], + "source": [ + "# Run inference with the model's serving signature\n", + "results = hub_model_fn(image_np)\n", + "\n", + "# Convert each output tensor to a NumPy array;\n", + "# the available output keys are printed below\n", + "result = {key:value.numpy() for key,value in results.items()}\n", + "print(result.keys())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IZ5VYaBoeeFM" + }, + "source": [ + "## Visualizing the results\n", + "\n", + "Here we use the TensorFlow Object Detection API to draw the bounding boxes and instance masks from the inference step.\n", + "\n", + "The full documentation of `visualize_boxes_and_labels_on_image_array` is in [visualization_utils.py](https://github.com/tensorflow/models/blob/master/research/object_detection/utils/visualization_utils.py).\n", + "\n", + "For example, you can set `min_score_thresh` to a different value between 0 and 1: lower values let more detections through, and higher values filter more of them out."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PMzURFjxxqF7" + }, + "outputs": [], + "source": [ + "# Visualization parameters\n", + "label_id_offset = 0\n", + "min_score_thresh = 0.6\n", + "use_normalized_coordinates = True\n", + "\n", + "if use_normalized_coordinates:\n", + " # Normalize boxes: divide y (columns 0, 2) by height and x (columns 1, 3) by width\n", + " result['detection_boxes'][0][:,[0,2]] /= height\n", + " result['detection_boxes'][0][:,[1,3]] /= width" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "FILNrrDy0kUg", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 583 + }, + "outputId": "5ba415a2-ddaa-4b26-ec58-041387751498" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": { + "needs_background": "light" + } + } + ], + "source": [ + "# Visualize detections and masks\n", + "if 'detection_masks' in result:\n", + " # Convert the NumPy arrays to tensors for the mask reframing op\n", + " detection_masks = tf.convert_to_tensor(result['detection_masks'][0])\n", + " detection_boxes = tf.convert_to_tensor(result['detection_boxes'][0])\n", + "\n", + " detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(\n", + " detection_masks, detection_boxes,\n", + " image_np.shape[1], image_np.shape[2])\n", + " detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,\n", + " np.uint8)\n", + "\n", + " result['detection_masks_reframed'] = detection_masks_reframed.numpy()\n", + "viz_utils.visualize_boxes_and_labels_on_image_array(\n", + " image_np_cp,\n", + " result['detection_boxes'][0],\n", + " (result['detection_classes'][0] + label_id_offset).astype(int),\n", + " result['detection_scores'][0],\n", + " category_index=category_index,\n", + " use_normalized_coordinates=use_normalized_coordinates,\n", + " max_boxes_to_draw=200,\n", + " min_score_thresh=min_score_thresh,\n", + " agnostic_mode=False,\n", + " instance_masks=result.get('detection_masks_reframed', None),\n", + " line_thickness=2)\n", + "\n", + "plt.figure(figsize=(24,32))\n", + "plt.imshow(image_np_cp)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c75cSAeJ5JAQ" + }, + "source": [ + "## Visualizing the masks only" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tt7RxYqhLpn9", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 602 + }, + "outputId": "c0b1444d-062c-4bd8-db6f-3b01d11e6890" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Total number of objects found are: 26\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAABWMAAALACAYAAAD2e2C+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzde5iN9f7/8de9Zs2agzEnxhiMIUNDhonKlETpRONMqIS0+5WS0kEO1S7tyt4lKe0OSudspbZOUqRIW4ldGpRxGGMYhhwGE3O6f39U65sdMWOt9VmH5+O61rW771nrvp85zG69516f27JtWwAAAAAAAAAA73KYDgAAAAAAAACAUMAwFgAAAAAAAAB8gGEsAAAAAAAAAPgAw1gAAAAAAAAA8AGGsQAAAAAAAADgAwxjAQAAAAAAAMAHvDKMtSzrUsuyfrQsa71lWXd54xwAAAAAAAAAEEgs27Y9e0DLCpO0TtJFkgolLZc02LbtNR49EQAAAAAAAAAEEG9cGXuWpPW2bW+0bbtM0ixJvbxwHgAAAAAAAAAIGE4vHLOhpC2/2y6U1OHPXmBZlmcvzwUAAAAAAAAAQ2zbto623xvD2BNiWdZ1kq4zdX4AAAAAAAAA8CVvDGO3Skr93XajX/cdwbbtZyU9K3FlLAAAAAAAAIDg5401Y5dLam5ZVlPLslySBkl61wvnAQAAAAAAAICA4fErY23brrAs6yZJ8yWFSXrBtu3Vnj4PAAAAAAAAAAQSy7bNrxDAMgUAAAAAAAAAgsWxbuDljWUKAAAAAAAAAAD/g2EsAAAAAAAAAPgAw1gAAAAAAAAA8AGGsQAAAAAAAADgAwxjAQAAAAAAAMAHGMbiTzVp0kTNmjUznQEAAAAAAAAEPMu2bdMNsizLfESIsyzrqPtff/11xcbGKicn5w9f84c/OwAAAAAAAIC/sW37qMM2hrHQKaecoq+//vqoX6tdu7Ysy1JJSckR+3/66SdlZGQwkAUAAAAAAAD+x7GGsU5fh8C/9OjRQ9dcc43q1Knzp8/736/HxMTopZde0t13363Nmzd7MxEAAAAAAAAICqwZG8Kys7PVu3dv9e7du9qvjYiI0JAhQ9S9e3elp6d7oQ4AAAAAAAAILixTEGLCwsIUHh4uSfr666+VmZl50secMmWKJkyYIEk6dOjQSR8PAAAAAAAACGSsGQtJ0tVXX62nn35akhQZGXnMG3dVR0VFhcrLyyVJGRkZKigoOOljAgAAAAAAAIHqWMNYlikIIQ8//LBuv/12RUVFKSoqyiODWElyOp3uY7700kvq06ePR44LAAAAAAAABBNu4BUCnE6n+vfvr549e6ply5ZePVeXLl20bds2RUZGSpLmzJmjsrIyr54TAAAAAAAACAQsUxDEatWqJZfLpVq1amnDhg1yuVw+PX9VVZVatGih3bt3q7y8XAcOHPDp+QEAAAAAAAATWKYgBM2YMUO7du3S5s2bfT6IlSSHw6F169Zp165devPNN31+fgAAAAAAAMCfcGVsEAoLC9OCBQvUtm1bJSQkmM6RJJWUlOjHH3+UJN18881atmyZ4SIAAAAAAADAO451ZSzD2CBTv359DR48WA8++KB73VZ/88wzz+iHH344Yt/KlSu1ePFiQ0UAAAAAAACA5zCMDXINGzaU0+lUu3bt9Pbbb5vOqbYZM2bogQceOGJfWVmZioqKDBUBAAAAAAAANcMwNsht2bJFjRo1Mp3hUf/973/Vrl070xkAAAAAAABAtTCMDUI33nijbrnlFklS06ZNFRYWZrjIs8rKylRQUODe/vLLLzV06FCDRQAAAAAAAMDxMYwNMrfccov69u2rTp06mU7xmc2bN+uNN95wb3/33XeaNWuWwSIAAAAAAADgjxjGBgmn06lWrVrp7bffVrNmzUznGDVv3jzddddd7u1du3Zp27ZtBosAAAAAAAAAhrEBz
bL+7/eufv36DByPYdq0ae5lG/zhzzUAAAAAAABCE8PYAFWrVi2tW7dO4eHhkiSHw6E6deoYrvJPP//8sw4cOCBJys7O1saNGw0XAQAAAAAAIBQdaxjr8HUITlxWVpb++c9/qn79+kpKSlJSUhKD2D8RFRXl/nWaPHmyunXrZjoJAAAAAAAAcHOaDsD/iYuLU7t27dzb5557roYMGWKwKHD1799f+/bt06FDhyRJS5cuVVlZmeEqAAAAAAAAhDKWKfATTqdTnTp10qeffmo6JejYtq20tDRt3bpVVVVVpnMAAAAAAAAQ5FimwM898sgj+vDDD01nBCXLsvTjjz9ylTEAAAAAAACM4spYQ6688koNGjTIvZ2Zmam0tDSDRcFv1apV+uCDDzR+/HjTKQAAAAAAAAhix7oyljVjfaRZs2Zq3769e7tPnz7KyckxWBR62rRpI4fDoZUrV+qdd95RZWWl6SQAAAAAAACEEK6M9bK4uDhJ0rXXXqtHHnnEcA0kqby8XE2aNNHBgwdVXl6u0tJS00kAAAAAAAAIIse6MpZhrBfFxsaquLhYTqdTlmXJ4WCJXn/x21Wxb7311hHLRQAAAAAAAAAni2GsDzidTs2fP1/R0dGSpLCwMJ155pmGq/Bndu/erZUrV+qSSy5RVVWV6RwAAAAAAAAEAdaM9ZIRI0YcMXzt2LGjIiIiDFfhRCUmJurcc8/VqFGj9K9//Uvbt283nQQAAAAAAIAgFfBXxkZERKhevXqezJEkVVVVaevWrapXr96fDldXrlypunXrevz88L2OHTvqyy+/NJ0BAAAAAACAABe0V8Z27txZ8+fP9/hx9+7dq7p16+rdd99Vhw4dPH58AAAAAAAAAKHF76+MTU1N1YIFC4752ujoaDVq1MjjTVVVVVq/fr0aN26syMhIjx8f/mfLli167rnnNGnSJNMpAAAAAAAACGABcWVsVlaW+vXrd8S+xMREtWjRwuctDofDyHlhTmpqqleWvAAAAAAAAAAkPxvGZmZmauLEiaYzAAAAAAAAAMDjHKYDAAAAAAAAACAU+NWVsYBJPXr00OLFi01nAAAAAAAAIEgxjEVIW7FihZ555hlJ0vLly1VSUmK4CAAAAAAAAMGKYSxCyt69e/Xdd9+5tz/77DM999xzBosAAAAAAAAQKhjGIihVVFTItu0/7F++fLkuvvhiA0UAAAAAAAAIdQxjEZTOOOMMbdiw4Q/7KysrDdQAAAAAAAAADGMRwJYsWaJHH330qF9bv369Dh486OMiAAAAAAAA4NgYxsIjDhw4oPnz5/v0nF988YXmzp3r03MCAAAAAAAANcUwFtVSWlqqioqKP+zPz89X//79DRQBAAAAAAAAgYFhLKqlb9++WrhwoekMAAAAAAAAIOAwjMUfrF27Vn/5y1+O+rXc3NyjXhkLAAAAAAAA4M/51TD2xx9/1MyZMzV8+HDTKSFl3759eu2119zbmzdv1tKlSw0WAQAAAAAAAMHHsm3bdIMsy3JHJCQkaNWqVUpOTlZ4eLjJrKBVVVWl7du3u7c3btyoTp06GSwCAAAAAAAAgodt29bR9vvdMPY3K1asULt27UzkBL3i4mLVr19f/vB7DwAAAAAAAASbgBvGNmnSRDfeeKNuv/12E0lB5eeff1Z2drbKysokSRUVFVq/fr3hKgAAAAAAACA4HWsY61drxv5efn6+3nnnHR08eFCSNHr0aMXHxxuuCjzr1q3Tiy++qNWrV6uystJ0DgAAAAAAABCy/PbK2P/18ccfq2HDhoqKilLTpk19kRXwCgsL9e677+rGG280nQIAAAAAAACEjIBbpuBY2rVrpxUrVngzJ2gMHDhQs2fPNp0BAAAAAAAAhJSgGcaGh4crOTlZq1atUkJCgjezAlZlZaVOO+00bd68WYcOHTKdAwAAAAAAAISUgFsz9ljKy8tVWFioMWPGKDIyUq1atdKoUaNMZ/mNTZs2afLkydq0aZP7hl0AAAAAA
AAAzAu4K2P/11lnnaWpU6eqQ4cOcjgcnswKOBs3btQnn3yi66+/3nQKAAAAAAAAELKCZpmCo4mMjNTu3bsVFRXlqaSAUlFRIUm6+eab9c9//tNwDQAAAAAAABDagmaZAhwpLy9P7du3lyTWhwUAAAAAAAD8GMPYAHXrrbdqy5Yt2r9/v/bv3286Bwh6t99+u7Kzs2v02qqqKg0bNkylpaUergIAAAAAAIGEYWyAOXz4sObNm6c5c+Zoy5YtpnOAoGdZlrp3766+ffvq7LPPrtExqqqqNHfuXB08eFDbt2/XsmXLPFwJAAAAAAACAWvGBpDy8nIVFBQoPT3ddAoQEsLCwhQbG6vNmzerdu3aHjnmRx99pAEDBujAgQMeOR4AAAAAAPA/x1oz1uHrENTcSy+9pJYtW5rOAEJGhw4dtGPHDo8NYiXpkksu0ZYtW+RyuTx2TAAAAAAAEBgYxgaIm2++WZMnT1Z5ebnpFCBkWJal8PBwjx8zNjZWCxcuVLt27Tx6bAAAAAAA4N8Yxvq58vJyPf/885o3b57Wr19vOgeABzgcDp177rm68sordd5555nOAQAAAAAAPsINvPzc4cOHdcMNN3BFLBCExowZo/j4eC1evNh0CgAAAAAA8AGujAUAAAAAAAAAH2AY68c+/fRTdejQgatiAQMmTJigl19+2XQGAAAAAAAIIgxj/dRbb72lGTNmaM2aNaZTgJDUoEEDnXLKKV4/T1ZWlm699VavnwcAAAAAAJgX8GvGRkdHKz09XQ5H8MyV8/Ly9Oyzz+qTTz4xnQKEpGbNmikhIcEn52rXrp1SU1M1depU2bbtk3MCAAAAAAAzLH94829ZVo0jLr74Ys2fP9+TOUZVVFQoMTFR+/fvN50ChKytW7eqQYMGPjvfzp07lZyczDAWAAAAAIAgYdu2dbT9AX856eeff6709HQdOnTIdMpJ+/rrr9WkSRMdOHDAdAoAH6pTp44KCgqUkZFhOgUAAAAAAHhRwC9TcPjwYW3dujXgryibPXu2Zs+era1bt5pOAeBjDodDjRo1ksvlMp0CAAAAAAC8KOCHscFgxYoVmjVrlt555x3TKUBIi4qKUtu2bRmKAgAAAAAAr2AYa1hFRYV69OihoqIi0ylAyEtLS9N//vMfY+d3OBxyOByqqqoy1gAAAAAAALwn4NeMDWQ//vij6tatq+3bt5tOAeAHlixZorvvvtt0BgAAAAAA8JKguDK2rKxMQ4YM0cSJE5WVlWWso7KyUtdcc80J30yspKRE+/bt83IVgBPRv39/jRgxwmhDTEyMIiMjjTYAAAAAAADvCYphbFVVlebMmaPWrVtry5Yt7v2dOnVSfHx8tY+3bds2rVixotqvq6io0OzZs094GAvAP3Tp0kV9+/bVpZdeajpFzZs31/nnn69FixaZTgEAAAAAAB5m2bZtukGWZXklYvHixWrfvn21X/fmm29q2LBhng8C4Fcsy1JUVJRWrFihjIwM0zluGzduVGZmpqRfrvyvqKgwXAQAAAAAAKrDtm3raPuDehgbHh4uyzrqv/efqqqqYvgBhICUlBTl5+fX+HuFN5WVlUmSbr/9dj3xxBOGawAAAAAAQHWE5DAWAI6lZ8+eGjt2rM455xzTKX9q/fr1WrBggW644QbTKQAAAAAA4AQdaxjr8HUIAJh22WWXqX///n4/iJWk9PR0XXrppRo+fLhcLpfpHAAAAAAAcBK4MhZAyLAsS0lJSXrvvfd01llnmc6plqqqKrVs2VKbN2/W4cOHTecAAAAAAIA/wZWxAEJeTEyMCgsLA24QK0kOh0M//PCD+vbtazoFAAAAAADUEFfGAggJF154oR577DG1bt3adMpJKSgo0N69e3Xo0CF16tTJfaMvAAAAAADgP451ZazT1yEA4Gv9+vVT3759A34QK0mNGzdW48aN9fPPP8vh4MMNAAAAAAAEEoaxAILeVVddpd69e5vOAAAAAAAAIY7LqgAAAAAAAADAB
7gyFkDQcjqdWrVqlZo0aWI6BQAAAAAAgCtjAQQvy7KUmpqqqKgo0ykAAAAAAAAMYwEAAAAAAADAFxjGAgAAAAAAAIAPMIwFAAAAAAAAAB9gGAsgKGVkZOiVV15RZGSk6RQAAAAAAABJktN0AAB4Q7169TRw4EDTGQAAAAAAAG5cGQsAAAAAAAAAPsAwFgAAAAAAAAB8gGEsAAAAAAAAAPgAw1gAAAAAAAAA8AFu4BVknE6n4uLivH4e27a1e/dur58HAAAAAAAACBYMY4PMGWecoS+//NLr59m5c6fq168v27a9fi4AAAAAAAAgGDCMDRAzZsxQu3btjvu8WrVqybIsr/fUqVNHK1eulG3bevHFFzVt2jSvnxMAAAAAAAAIZAxj/VSvXr3UokUL9/YFF1ygpk2bGiw6UlhYmLKysiRJ+/btU0REhCTpmWeeUUlJick0AAAAAAAAwC8xjPVDTZo00ciRI3XxxRebTjkhXbp0UZcuXSRJb775JsNYwMt+/vlnbdq0SVVVVaZTAAAAAABANTCM9TNOp1OrVq1S7dq1TacA8FOfffaZunfvbjoDAAAAAABUk8N0AH4xadIkbdy4UXl5eYqJiTGdU2NLlizRtddeazoDAAAAAAAA8DtcGWtQcnKy7rjjDknShRde6FdrwtZUo0aNFBcXZzoDAAAAAAAA8DsMYw1p0KCBOnbsqNtuu810CgAAAAAAAAAfYJkCAyzL0vXXX6/Zs2ebTvEKy7JkWZbpDAAAAAAAAMCvMIw14Msvv9Ttt99uOsNr7rvvPi1YsMB0BgAAAAAAAOBXGMYaEB8fr6ioKNMZXhMdHc26sQAAAAAAAMD/YM1YH4qIiFDnzp1Vq1Yt0ylAQGvcuLEyMjKO+rXPPvtMZWVlPi4CAAAAAAA4PoaxPlS3bl3Nnz/fdAYQsCIjIyVJ/fr105QpU476nKZNm2rLli2+zAIAAAAAADghDGMBBIS4uDgVFhYqLCxMTuexv3WtXbtWI0eO1IYNG3xYBwAAAAAAcHwMY+EVzZs314cffqiBAwdq//79pnMQgJxOp9566y331bBOp1O1atWSZVl/+rrIyEiNGTNGpaWlvsj0ub///e+aPXu26QwAAAAAAFADDGPhFbGxsbrkkkvkcrlMpyAAJScnq1u3brr00ksVERFR7de3bt3aC1X+4dtvv9WKFStMZwAAAAAAgBpgGAvAb8THx8vhcCg7O1szZ840nQMAAAAAAOBRDGMB+AXLsrR27VrVq1fvuEsRAAAAAAAABKLjDmMty3pBUo6kYtu2W/+6L1HSvyQ1kZQv6XLbtvdYv0xQHpfUXVKppGG2ba/0Tjpw8rp27aqHHnqoWq/59ttvdd1113mpKLQ5HA45HA7TGQAAAAAAAF5xIlfGvijpSUkv/27fXZIW2rb9sGVZd/26PVZSN0nNf310kPTPX/8X8BsOh0MjR45UeHi4srKydOaZZ1br9XXr1tWtt94qSXrvvfe0fv16b2SGhOHDhys+Pt69HR0dbbAGnpaVlaXzzz+/2q8rKyvTU089Jdu2vVAFAAAAAIA5xx3G2ra92LKsJv+zu5ekLr/+80uSPtMvw9hekl62f3kHvcyyrHjLslJs2y7yVDACh2VZSk1N1aFDh3Tw4EHTOZKkyMhINWrUSJMnT67x4K9p06aaMmWKJKm0tFT79u3Tzp07PZkZ9MLCwpSamqp7771XaWlppnPgIbVq1VJSUpJ7u1evXvrrX/9a7eMcOHBAH330kSorK1VSUqLdu3d7sBIAAAAAAHNq+nng5N8NWLdLSv71nxtK2vK75xX+ug8hyLIs/fe//1WfPn1Mp7hlZ2crLy/PY1dgPv3003rmmWc8cqxQ0rBhQ23atIlBbJDJycnRpk2b3I+aDGIlKSYmRuvXr9emTZs0adIkz0YCAAAAAGDQSd/Ay7Zt27Ksan+W1LKs6ySx8CZ85sEHH9TQoUM9ftyLLrpI3377rc444wxVVFR4/PjBZ
M6cOWrbtq2cTu4dGAyys7P16quvurdjYmI8fo6rrrpKWVlZ6tixo8ePDQAAAACAr9V0IrLjt+UHLMtKkVT86/6tklJ/97xGv+77A9u2n5X0rCTVZJgLVMfEiRN12WWXqUGDBh4/dkxMjFq2bKmHHnpI06dPV35+vsfPEeiioqJ0zz336Oyzz1ZKSorpHNRQSkqKbrnlFvd2WlqamjVr5tVzxsbGqk2bNpo8ebIeeeQRlgQBAAAAAAS0mg5j35U0VNLDv/7v3N/tv8myrFn65cZd+1gvFqZZlqWbbrpJycnJx39yDblcLt1+++2aO3cuw9jfOeWUUxQbG6vY2FjdeeedcjhqujIKTEtJSdF5552nO++80+fnjomJ0Z133qmZM2cyjAUAAAAABLTjDmMty3pDv9ysq65lWYWS7tUvQ9jZlmWNkLRZ0uW/Pv1DSd0lrZdUKmm4F5oRYCzLkmVZxu6MblmWkfNCevLJJ9WtWzfTGaih3//dGT16tMaOHWuwRgzzAQAAAAAB77jvbG3bHmzbdopt2+G2bTeybft527Z/sm27q23bzW3bvtC27d2/Pte2bftG27ab2badadv2N97/V4C/mz59umbNmmXk3BkZGdq+fbvq1atn5PxAoLr++uu1Y8cO92PUqFGmk7RkyRLddtttpjMAAAAAAKgx7qIDr6tdu7bi4+N9ft5+/fppyJAhSkpK8tk5x48fr1mzZunll1/22Tn9TatWrXTHHXdIkjIzMw3XBA/btjVq1Ch99dVXXjtHrVq1NG3aNDkcDp122mk+/btzIhITE1WrVi3TGQAAAAAA1BjDWASt9u3bq1evXj49Z7du3bR58+aQHca2atVKOTk5GjZsmOmUoGPbtl5//XXt2bPHK8evX7++zj77bA0bNozlAAAAAAAA8BKGsfAJh8Mhl8ulsrIyn5wvPDxcYWFhPjkX/s+dd96poUOHms5ANbhcLknSJZdcohdffNFszAkICwtTeHi4ysvLTacAAAAAAFBtDGPhExdccIGKiorUsGFDHTp0yOvn++abb9SqVSuvnwcIZBEREdq6dauioqLkdAbG/x2MGzdOOTk5at++vekUAAAAAACqLTDefSPgORwORUdHe/08SUlJevHFF5Wenh4ww6VgYFmWXn/9dZ177rmmU/Anfvt9io2NlfTL38uEhISAWpYgPDxcLVq00HvvvaehQ4dq9+7dppMAAAAAADhhTKsQVKKiotS9e3fTGSGjbdu2ysjIkMPh0GWXXabatWubTgpKu3fv1vz582u0zEdOTo77plfB8vsUExOjnJwcDRw4UB9//LE2bNhgOgkAAAAAgBPCMBZBIyIiwn3FH7wvNjZWI0aM0KhRo0ynBL0NGzboiiuuOOHnR0REKCoqSpI0ffp0NW7c2FtpRj311FMaM2aMnnvuOR04cMB0DgAAAAAAxxU4n00FjuO6667Td999ZzojZKxYsUI33XST6QwcxU033aRdu3Zp165dQTuI/c2jjz6q999/33QGAAAAAAAnhGEsfCYiIkKLFy9Wdna2x489Y8YM3XbbbQG19mWgczgcsizLdEbQmzp1qq655prjPi8uLk5ffvmlli1bpptvvllhYWEKCwvzQaFZlmWpXbt2WrJkiU/WpQYAAAAA4GSwTAF8xrIsnXnmmYqLi/P4sVu3bq20tDSPHxdHysrKUlZWll588UXTKSFh5syZ+ve//63c3NxjPqd3795q3LixatWqpezs7JAckNeuXVvnnHOObrrpJs2ZM4c1ZAEAAAAAfothLAKaZVlq2LChXC6X6ZSQ0LVrV40ePVoLFy6U08m3D2+pqKhQUVGR7r77bm3duvWIryUmJrpvyCVJY8aMUadOnXyd6HccDocmT56svLw8hrEAAAAAAL/FNAUBLS4uTvn5+SHxcWx/kZqaqoKCAtMZQS0/P1/Nmzc/6teeeOKJat3MCwAAAAAA+A8W2PSh7du3q3nz5tq4caPplKCQk5Ojb
srPz7/sfb/55htNnDjRTWXXz+l0avTo0brnnnsYxnqAZcuW6bPPPjOdcZ7x48crOjpaQUFBsixLjRs39rgPjGBWUlKStm3bZjoDAGAIw1h4hddff52z08IjhIWFac2aNWrbtq2WL18ud667bfKIXE/jcrl07733KikpyXQKith//z2vUaOG1q1bZ7AG12vQoEGmE4pNwQkY+LcZFzFs2DANGzZMwcHBOnnypMLDw00nwUPYtq2PP/5YQ4cONZ0CADCEj2gB4BrMnj1b7733ntv2FxAQoN27d3NUrM6e6KdChQpKSUkxnYIiFhMTo2PHjun48eM6fvy4Vq5caToJuKSePXvqmWeeMZ0BD5efn6/Y2FgtXLjQdAo8ROvWrfXOO++YzgAAGMSRsQBwDSIiItSxY0cFBQWpf//+btlnZGSkQkND3bIvT/Xtt99q6tSpOnnypOkUXKe4uDgNGDDgvOvCwsIUHR1tqKhohYeHa8qUKRoyZIgOHDhgOgfFICMjQwsXLtSzzz6rsWPHcoQsLik1NVUjRozQ7t27C9a9hv9KT09Xdna26QwAgEEMYwHgGtWuXVvh4eHFPoyNiIhQy5YtFRwcXKz78VS5ublasWKFbNvWrFmzNGfOHNNJuAZly5bVLbfcUnD5tttuU+/evQ0WFa+QkBD17t1bo0ePZhjrwxITE/X111/ro48+YhhrQEJCgtf8fC1dulT5+flq2LCh7rjjDv6+AADgxxjGAsB1sCxLJUqUkCQ5HA45HI4iffygoCDddNNN+uGHH4r0cb2F0+nUkSNH1K5dO9MpKKSQkJCLnqimadOm+vHHHw0UAcXLtm3l5OQoLCyMAZub9e7dW4mJiaYzCi0hIUGdOnXSkSNHVKpUKf6++KHc3Fy/PbEjAOD/Y81YALgOMTExSk1NVWpqql5++eUif/w333xTS5YsKfLH9RYTJ07UjTfeaDoDV+HXX38t+Jn4781fP1CA70tLS1N0dLTWr19vOgVeICMjQxUqVNCaNWtMp8DNkpKSVK5cOe3cudN0CgDAMIaxAHCdwsLCFBYWpt69e+v777/X999/X2TrXgYHB/vtOrEDBw7UBx98oJycHNMpuIKOHTsW/N2vWbNmwc/Ef28hISGmM4Fik52drf79++vTTz81neIXkpOTdd9993ntiRyzs7P1f//3fxo/frzpFLiRbdvKzs6WbdumUwAAhrFMAQAUkZo1a6pmzZqSzp5le+HChdq1a9c1P16nTp1Ur169osrzCgcPHlRCQoIkae7cudf13w/Fp1q1arrtttsKLt97773q0KGDwSJcqzJlyui+++4zneF2S5cuVXJycpE+5qpVq1SmTBlVqFBBnTp1KtLHxv+3fft2/fjjj1qwYIHplOuyevVqhYeHq0yZMpLOfqgVERFhuArFZffu3Vq4cKHpDACAh7A84ZM5y7LMR8CjTZ48WX369DGdAVyVf/zjHxo3bpwyMzOv6fsTExNVvXr1Iq7yTOnp6bJtW7Nnz9ajjz5qOgf/IygoSOHh4QWXO3XqpMmTJxss8h5xcXEe/evrDRs21JYtW0xnuN0999yjRYsWFctjV69e3avWMfU277zzjgYPHmw6o8itWbNGjRo1KliHHr5lwoQJeuaZZ0xnAADczLbtiy4QzzIFAFBM3nrrLX3//femMzxefn6+qlevrujoaD322GOmc3ARrVu31okTJwq2SZMmmU4CAJ/SsmVLvfvuu6YzAACAG7BMAQAUE8uy1KRJE61evVp33323srKyTCd5nPXr1+vJJ59UVlYWZxf2ILVq1dL06dMLLkdERCgggM9vgcJISkrSLbfcolmzZqlKlSqmc3xK7969tWzZMtMZxcLlcmnChAnatm2bvvrqK9M5KEJPPPFEsR2JDwDwTgxjAaAYlS5dWs2bN1e/fv3073//W3v27DGd5BE++eQTnTlzRnv37tW6detM5/i98PBw9e3bt+BypUqVdMsttxgsAopX165dF
RAQUCxrOObl5Wnt2rWaMGGCOnfurGbNmhX5PvzV1q1bdfjwYdMZxebo0aNavny5PvjgA/Xt21elS5c2nYQi8Ntvv+nAgQOmMwAAHoRhLAAUM8uy9Oabbyo9PV2nTp3S8ePHTScZ8fufX5JefvllnThxwnCR/woICFCVKlVkWWeXMKpUqZJGjRpluAruVqZMGd1www2mM4x4+umnVapUqWI9oc4bb7yh0qVLM4zFVUlOTtZf//pXtWrVSvXr12cgCwCAD+J3DgHATcaNG+fXa22OHDlSsbGxio2NZRBrWEREhPbt26fExEQlJibql19+MZ0EAwYOHMjZvQEP1bx5c33xxRemMwAAQDHgyFgAcKP4+Hht3rxZcXFxys/PN51T7JKTk9WmTRtJ0smTJw3X+KdWrVpp6tSp510XEBCgwMBAQ0WA/xg5cqRWrVqlOXPmmE6BFxoyZIhGjhx53nXt2rXTuHHjDBXhamRlZemWW25RYmKi6RQAgIdhGAuvMGPGDDmdzvPWNAS8UcmSJVWvXj29+eabmjBhgvbu3Ws6qViMGzdOhw4d0qlTp1gn15BXXnlFpUqVUmxsrGrVqmU6B/A4TZs21ZAhQzRs2LBi28fJkyd16NChYnt8+Lbjx49fsLRRQECAXnzxxUI/RqdOndSiRYvL3uc///mPFi5cqFdeeeWaOnGhrVu3atKkSdq1axcnKAUAXIBhLLzCggULlJqaqubNm6thw4YF6xwC3ig4OFgDBw7Uzp07lZub61MnI3G5XNq6davGjx+vrVu3ms7xC8HBwapfv/551wUEBOhvf/ubypYta6gKnq5u3bqqUKGC6QyjGjZsqEqVKmn48OGybbvY9pOdna3Nmzfrxhtv5Ij0a+R0OrVt2zbl5OSYTjFu165devvttwt9f8uyVLJkycveZ/78+froo4/0pz/96Yqvsfl7XDg7d+5kLXYAwCVZxfnis9ARlmU+Al4hNDRUqampV3xRCXiLyZMnFxzx/b//Hu/fv1+xsbEGqq6NbdtKT09X+fLl5XQ6Tef4rP99o1y9enXt37/fUA2uJC4uTuvXrzedcYEdO3aobt26pjOMO3nypMqXL1+sw1jp7M9tcnKy3w/Ar9WJEydUoUKFYv//hMsLDAzU8ePHVbZsWQ6MuIJvvvlG3bt3N50BADDMtu2LPmFyAi8AMKhnz55KSUlRSkqKGjRocN5tcXFx+uSTTwyVXZ1evXopJiZGderUYRBbjLp161bw9+X3bd26daazAFyBbdtq0KCBvvrqK9MpwDVzOp2qU6eOYmJi9NBDD5nO8Vh9+vRhaTUAwGWxTAG8Sn5+vp555hkNHDhQN910k+kc4LqVKFFCJUqUkCS98cYb+vzzzzVz5kxJUmpqqiZNmqRjx4557DpuDodD/fr1088//3zBuna4Po8//rhatmx53nW1atVS+fLlDRXhWrz22mv64osvNGPGDNMpMOzkyZMaO3asDh8+rIEDB5rO8RqrVq3SRx99xFGxHuLEiROSpBUrVuiZZ57RmDFjFBTEW0rp7LC6X79+Wrp0qTIyMkznAAA8GM+c8Coul0vTpk1TvXr1ZFmWGjZsaDoJKDKdO3dWdna2jhw5otWrV0uS1qxZo/T0dLVq1UrS2bXaYmJiTGYWSEtL06+//qrJkycrNzfXdI5Xa9GihcLCws677sEHH9Tdd99tqAhF5f7779fu3bsZxkKSlJCQoKysLDVr1ky33367AgL4JbXL2bhxo+bMmaMvv/zSdAr+x+HDhzV58mR17txZwcHBqlSp0gXrl/sbl8ulzz77TFlZWaZTAAAejjVj4bUeeOABffbZZwoJCTGdAhSpPXv2qHbt2he9bcqUKerZs6ckGf2773A4tGTJErVv395YgzcJDg6+7Pp6u3btUvXq1d1YBHcaPXq0Bg8erLy8PNMpBVgz9ix3rRn7v0JCQpScnKwyZcowkL2EvLw8tW/fXkuXLjWdgkL4y1/+ok8//VTSl
Z/zfJFt2zp9+rQqVarEMBYAUOBSa8YyjIXXCgwMVExMjA4dOsQbGfiUyw1jQ0NDFRgYqLCwMB05ckShoaFurjvr6aef1tSpUzmzdSHNnTtX8fHxl7w9LCzM7964+hOHw6Fdu3bpxhtvNJ1SgGHsWaaGsZJUsmRJzZw5kw+1LiI7O1tVqlRRenq6XC6X6RwUQlBQUMGHxKtXr/a75cQSEhJ077336syZM6ZTAAAe5FLDWJYpgNdyOp06duyY7r//fr333nu8qYRf+H05gJycHHXp0kVvvvmmmjRp4taGXr16afny5Qxi/8vjjz+uTp06XfL2Fi1aqGTJkm4sgicJCgq6YBkK4MyZMxoyZIg2b96sQYMGmc7xGGvXrtWrr77KINbLOBwOORwOSdKzzz6rvn37qk+fPmaj3OSjjz7SF198wSAWAFBoDGPh1RwOh+bPn68WLVqoY8eObh9KAcWhdOnSeuCBBzRv3rxLvrB3uVxasGCBbr75Zu3atUuhoaGXHQYWhfT0dC1cuFBz587lxBT/pWPHjurSpYvuu+8+0ynAFYWFhelPf/qTwsPDTad4hJCQED3wwAP68ccflZaW5vb9r127VjVr1nT7fj3ZsWPHtGDBAtMZuA4JCQmKjIxUhQoV/OK5cfPmzQVr/QMAUBgsUwCf8dRTT+ndd9/lDSZ8RmxsrA4cOFCo+0ZGRmrPnj0KCAhQaGioSpQoUaQtOTk5Wrt2rdq0aVOkj+sLDhw4oGrVqpnOgIfbv3+/atSoYTpDVapU0aFDh0xneJy4uDitX7/eyL67dOmiKVOmKCIiwsj+PcmZM2c0f/58PfDAA6ZTUARiY2O1ceNGRURE+OxSPBkZGerfv7+mTp1qOgUA4IEutUwBC23CZ3zyySdq1KiR6QzAiNTUVJUvX15RUVF67bXXivzx//nPf+qOO+4o8scFAH/33XffqUaNGnI6naZTjHvyySfVo0cP0xkoIomJiYqKitLBgwdNpxSbBg0aaNq0aaYzAABehmEsfIZt2zp69KhatmzJUT/wS06nU06nU59//rlatmypli1bKikp6boft3v37po6dSpr9wFAMbBtW+np6WrdurW2bt1qOsfttm/fXvCc9cMPP/Bc42OcTqc6d+6sr776ynRKsXA6nUZOAAgA8G6sGQufkpeXp9WrV+vTTz9Vp06dFBcXZzoJcLujR4/q6NGjkqSPP/5YUVFRkiTLsvT4448XegmDzMxMTZkyRcuWLdOJEyeKrRcA/J3T6dTq1as1adIkde7cWbfffrvpJLdYsWKFZs+ezXqbPm7jxo2aMWOGAgICfGYJitTUVH3xxRc6ffq06RQAgBdiGAufNGzYMOXl5alKlSqKiYkxnQMYM3To0IKvLcvS7bffrnLlyqlEiRIqX778Rb8nJSVFeXl5OnLkiPr37++uVADwe6NGjVJaWppq166tSpUqmc4pVklJSZo6daomTpxoOgVuMGvWLO3bt08tW7aUJEVFRalkyZKGq65NRkaG1q9fz2skAMA1Y5kC+KwRI0bo7rvvNp0BeAzbttW4cWNVq1ZNvXr1uuT9OnbsqGrVqhW8YQIAuM+UKVN06623ms4odi1atGAQ62c2bdqkatWqqVq1avr2229N51yzd999V+3atTOdAQDwYgxj4dN27dqlevXq6eTJk6ZTAI+SkJCgunXrXnTbsmWL6Tyv0rZtW86iDK/Qq1cvLV++3HQGCiEpKUl169ZVYmKi6ZQi9fHHHxc81/y+nA7809///nfVrVtXjRs3Vm5urukcAADcimUK4NPy8vK0c+dODR8+XL169WINWXiF1NRUvf/++0pLSyu2fZw5c0a7du0qtsf3J/v27SvW/1fwDZGRkRo2bFix/2xfTrly5VSjRg0j+8bVcTgc2rVrl0aMGKEePXrojjvuMJ10TdatW6dZs2YVXF65ciXPPZAkJScnKzk5WWFhYV51AqzRo0dr2bJlpjMAAF6OYSz8wujRo1W2b
FlFRkbqD3/4g+kc4LJSU1P1xhtvmM7AVUhKStK+ffsYdOGSSpYsqc6dO+vjjz9meI9CmzBhgoKDg1W5cmXVqlXLdE6huFwubd++XZL0/fff83yGSypdurTq1q0ry7JMp1yRw+HQjh079MEHH/jcEesAAPezPOGTSMuyzEfAL9x5551aunSp6Qzgsvbs2aPatWubzsBVat68udasWWM6Ax5q//79xof1/fr104cffmi0wVPFxcVp/fr1pjMuqWHDhl6zhExWVpaioqKUl5dnOgUern379lqwYIHpjEI5fPiwqlatajoDAOBlbNu+6CeOrBkLv7Jq1SpVq1ZNZ86cMZ0CwMds2LBBVapU0alTp0ynAPAxO3bs0A033KDjx4+bTrmkp556SpUqVVKtWrUYxOKKxowZo+nTp5vOAADACIax8Ct5eXk6cuSInnvuOW3evNl0DnCBH374Qa+88orpDFyD/Px8HT16VP3799d//vMf0znwMNHR0frkk08UHR1tOgVeyOFwKCkpSS+88IJWrFhhOuc8LpdLAwYM0OLFi5WcnKyUlBTTSfBglmVp1KhRat++vcqVK2c6p1CWLl2qQYMGmc4AAPgQ1oyF33G5XJoyZYrq1KmjoKAgNWjQwHQSUGDjxo36+uuvTWfgGtm2rWnTpql69eoKCwtTw4YNTSfBQ4SHh+uJJ57Q22+/rRMnTrh9/02aNPGaNUdxaZ9//rkqVqyoiIgINW7c2GjLnj17dOTIETmdTk2cOFGnT5822gPPFx4ermbNmqlv374KDw83nVNoW7du1Zdffmk6AwDgQxjGwm+99NJLWrdunb788ksFBwebzgHkcDjkcDhMZ6AIDBs2TGvXrtX3339vOgWQJI0fP14tWrQwnYEiMHLkSP36669avHixkdcv+fn5kqT3339f48ePd/v+4Z0CAwN144036qeffjKdclUcDoecTqfpDACAj2EYC782e/ZsxcbG6uDBgwoMDDSdAz/Xpk0bjz6BDADAM6xcuVIVK1bUgQMHVLp0abftd/fu3br55pslSbm5uW7bL7zf8OHDNWDAANMZV+2ee+7RqlWrTGcAAHwMw1j4NYfDoWPHjqlLly565513VK9ePdNJ8EMZGRn6y1/+ou3bt3PSEx+yfv16PfDAA/riiy84+h5AkXI6nUpLS1OPHj0UFBSkVq1aFcualunp6erTp0/B5YyMDGVlZRX5fuDbPvnkE8XHxyssLMx0SqGdOXNGvXr10saNG/ngAQBQ5BjGwu85HA7NnTtXLVq00H333acmTZqYToIfSUxM1NKlSzV79mzTKShiKSkpmjdvnlwul+kUAD7Itm3Nnz9f0tl/b2rWrClJuvPOOxUVFXVNj/n7a6Lf/91KT0/n+QnXrFSpUmrfvr06d+6s8uXLm84ptMOHD+unn37S7NmzeQ4HABQLhrHAOS+//LISExP1/vvvu/VX/uC/zpw5o3nz5qlfv36mU1CMMjIyFBkZyVIofs62bWVmZvLGHsVi9erV6tatmyRp4cKF17w+cFZWlh588MGCdWGBaxUSEqJatWrpm2++MZ1yVc6cOaMlS5acd0Q4AABFzbJt+/J3sKwSkn6WFKqzw9tvbNt+zbKsP0j6SlKUpHWSetu2nWdZVqikaZLiJJ2U9KBt24lX2MflIwA3sSxL1apVU2JioukU+IF77rlHS5YsYTjj44KCgrRw4ULFx8ebToFBiYmJql27trGT9P3yyy+cwOsy4uLifGbN7sDAQFmWdc3fz4kkURQGDhyot99+2+s+iOzSpYvmzp3LSbsAAEXCtu2LvigLKMT35kqKt227saQmktpbltVC0ghJo2zbriUpTVLfc/fvKynt3PWjzt0P8Aq2bSspKUmtW7dW69atNXnyZNNJ8EE5OTlq27atfv31VwaxfsDhcKh///4aPny46RQY8tVXX6lbt24MueAWTqdTDofjmjfgek2ePFnPP/+81w1ipbM/PwxiAQDF7YrLF
NhnD539faX+4HObLSle0kPnrp8q6XVJ4yV1Ove1JH0j6SPLsiz7SofgAh4iLy9PK1eulCSVLVtWYWFh6tGjh+Eq+IoDBw5o1qxZSkhI4E2vH9m2bZvmzZunypUr69FHHzWdAzfIzc3VpEmTZNu2Fi9erHXr1plOAnBO48aN1apVK9MZRjkcDk2cOLFYPhRu0qSJqlevXuSPW5xcLpcmTpyo/fv3m04BAPiBQq0Za1lWoM4uRVBL0lhJeyWl27b9+yThsKTK576uLOmQJNm27bAs65TOLmVw4n8e80lJT17vHwAoTvPnz9fu3bt1++23q2LFigoIKMzB5MClbdmyRf/3f/9nOgMGrFmzRnv37lWfPn2u61eI4bmSkpL0+2fPqampeu6552T6s2jLslSxYkUFBwcb7QA8SXx8vN5//33TGUZlZWVpypQpRTqMDQgIUExMjIKCvOu0JHl5eUpKStKAAQOUnZ1tOgcA4AcK9Uxp27ZTUhPLsspKmiWp3vXu2LbtTyR9IrFmLDzbrl27VKVKFR05ckSVKlUynQMA8EDZ2dmqWbOmx72Rj4iI0MGDB71uOALA+0RFRenw4cNed/BCQkKC7rrrLtMZAAA/clXPlLZtp0taKqmlpLKWZf3+yr6KpCPnvj4iqaoknbu9jM6eyAvwWqaPbIJveOGFF/TMM8+YzoBBaWlpatCggX777TfTKSgigwcPVv369dW0aVPl5OSYzrkojsQG8L9KlSqlLVu2qFmzZkX6uPx7AwDAlV3xMAnLsspLyrdtO92yrDBJ7XT2pFxLJXWT9JWkRyTNPvctc85d/uXc7T+xXix8wXvvvaeePXsqLi7OdAq8yOHDhzVx4kRJ0o8//qjDhw8bLoJJTqdTO3bs8NihHS4u48KXfgAAIABJREFULS1NH3744UVv++GHH7Rjxw43FwHA9bEsS3Xr1lWpUqWK5PGaNGmihx56yOuGsXPmzNHMmTNNZwAA/ExhfmetkqSp59aNDZA0w7bteZZlbZf0lWVZwyVtkDTx3P0nSvrMsqw9klIlceYj+IT33ntPpUuXVlhYWMELWG/7NSy4V3JyslasWKHXX3/ddAo8zL59+1S5cmVVqFDBdAouY9++fcrJydGhQ4f4OQaAy2jSpIn+/ve/m84olNzcXO3du1eSNGXKFM2aNctwEQDA31xxGGvb9mZJTS9y/T5JzS9yfY6k7kVSB3iYoUOHaujQoQoODtbJkycVHh5uOgke7M0339SYMWNMZ8ADdevWTYMGDdKIESNMp+Ayunbtqk2bNpnOAAAUoR07dqhJkyamMwAAfozD+oBrkJ+frzp16mjBggWmU+ChbrnlFk2aNMl0BjzYuHHj1LJlS9MZfu3ZZ59V5cqVL7lt27bNdCKAIrZgwQKOdP8vM2fO1MiRI01nuM3bb7+tdu3amc4AAPg5Tq0LXKPk5GR98MEHOnTokJ588knTOfAQx44d02uvvabffvtNp0+fNp0DD5aVlaVt27bpmWee0RtvvKHIyEjTST7tpZdeUlpa2nnXLVq0SEePHjVUBMCE8uXLKyIiwnSGx4iKilLZsmVNZ7hNRkaGjh8/bjoDAODnGMYC12HhwoXKzs5Wo0aNdOutt3rdSQtQtA4fPqwVK1ZowoQJplPgJTIzMzVhwgQNHjyYYWwRyc/P15o1ay64fuLEiTp27JiBInOio6MVFxfHcxMASFq3bh0nUgUAeASGscB1+vnnn3XPPfcoJSVFJUqU4E2vH3I4HJKkyZMn69VXXzVcA2/kcDjkcrk4KeBVcDqdsm37gutTUlJ0++23X/Q2fxMfH6+vv/7adAYAGOdwOPTggw8WnLgLAACTeNcHFIHMzEzFxMRo9erVplPgZseOHVNUVJQiIyP1xhtvmM6Bl2rSpInGjh1rOsOrPPbYY4qMjLxgq1+/PoNYAECBw4cPKzIyUvv27TOdAgCAJI6MBYpMZmamXnzxRUVHR6tcuXL69NNPTSehmC1ev
FijR49WRkaG6RR4udOnTysvL890hsfbuXOnXn75ZUnS6tWrlZmZabgIACBJS5cu1WOPPaaJEyd61G+JzZs3T+PGjeP5AgDgURjGAkVo+fLlkqRy5crpj3/8o+666y6Fh4cbrkJxSEhI0Lfffqv58+ebToGP2Lp1q5YtW6Y777zTdIrbrVixQidPnrzi/Xbs2KGZM2e6oQj+pm3btsrNzdW2bdtMpwBe6cCBA5o3b57pDEln1w7//vvvZdu25syZowULFphOAgDgPAxjgWKQlpamLl26aMuWLWrYsKHpHBSDV155pWD4DhSFKVOmaOPGjVqxYoVKly5tOqfYZGVlXXDdoEGDWOYFRo0cOVIxMTEaNGiQ6RQA18HhcCg5OVldu3aVy+UynQMAwEUxjAUAwENs2rRJ0dHROnr0qCIjI03nFDmHw6Hq1atf8Ouiv58EDwCA6zFz5kz17t2bQSwAwKMxjAWKUZ8+ffTss8/qscceM52CIpCQkFCwXuWmTZsM18AX2bat3Nxc/fGPf9Q///lPtWvXznTSVevdu7cOHjx40dtcLpdOnTolp9Pp5ioAQHFLT0/X7bffrsmTJ6tWrVpGGlwul/Lz843sGwCAwmIYCxSjdevW6cCBA6YzcB1s29a0adOUl5enDRs26OeffzadBD+wevVqffnll3I4HOrQoYPpnIvKz8/XtGnTLjj6aNGiRUpJSTFUBQAwJT8/XwkJCZo2bZo6duyo5s2bu2W/LpdLU6dOlcPh0K+//uqWfQIAcD0YxgLFLDMzUydOnFB0dLTpFFyl/Px8paSkqF+/fpyFF243efJk7du3T02bNpUkRUZGKiQkxO0dGRkZOnPmzAXXZ2Zm6qmnnuIoVwDAeYYNG6bc3Fy3DWPz8/P13HPPKTs72y37AwDgegWYDgB83ahRo9SmTRvTGbgGa9asUdWqVRnEwpjly5erUqVKqlSpkpYuXWqk4W9/+1tBw39vderUYRALAAAAAFeJI2MBN9i/f79uuukmLVmyRBUqVDCdA8ALPfXUU+rZs6feeuutIn3czz//XCNGjLjk7UeOHCnS/cE9xowZo06dOpnOAIBi88orr2j27NmybVs5OTmmcwAAKDSGsYAb5ObmauvWrRo5cqQefPBBxcXFmU7CZYwePVpZWVms9wuPcuDAAS1YsEClSpWSJPXt21eVKlUq1Pc6nU6NHDnyoic1WbVqlbZu3VqkrTCvevXqqlq1qukMACgW7733nubPn8/zFwDAKzGMBdzo3XffVVhYmMqXL69q1aqZzsF/OXnypE6cOCHbtvXWW2/p2LFjppOAC2zatEmbNm2SJNWvX18NGzYs1Pfl5ubq9ddf58ghAIBXy83N1b59+zR8+HClp6ebzgEA4JowjAXc7J///KcWL16slStXmk7BfxkzZoyGDh1qOgMotG7duplOAADArbZv366bb77ZdAYAANeFE3gBBqxdu1axsbGcGMpDxMfHa9SoUaYzAAAAfML48ePVunXrIn3M4cOHq3379kX6mAAAmMAwFjAgLy9PBw8e1MCBA7Vx40bTOX4rIyND/fv318aNG5WRkWE6BwAAwCdkZmYqOTm5SB8zIyODZaQAAD6BZQoAQ2zb1ieffKLq1asrLCxMdevWNZ3kN7Zu3aqsrCwdP35cY8aMMZ0DAACAy9iwYYOOHj1qOgMAgCLBMBYw7OWXX9bq1as1a9YsBQYGms7xWbZty+VySZKeeuoprVq1ynARAAAArsTpdKpr165KTEw0nQIAQJFgmQLAAyxYsEDVqlVTfn6+6RSfNXfuXEVFRSkqKkpr1qwxnQMAAIArOHDggKKionTw4EHTKQAAFBmGsYAHcDgcSklJ0UMPPaTt27ebzvE5I0eO1IgRI3Tq1CmdOnVKTqfTdBIAFIugoCBNmzZNcXFxplMA4LrMmzdPzz33nE6dOlXw200AAPgChrGAh3A6nfrmm280e/Zsbdq0yXSOT7BtWwsWLNDMmTNZl
gCAXwgICNCf//xn3XDDDaZTAPi506dPa+7cucrOzr7q7/3555/17bffav78+cVQBgCAWawZC3iYl156STt37tT48eMVFhZmOscr5eXlyeFwKC8vTw888ICysrJMJwHFIigoSCEhIdf1GL//rAAAUJSSk5N1//33a//+/YqNjS309505c0YDBgzQxo0biy8OAACDODIW8ECfffaZ6tatK9u2Tad4pb/+9a8qV66cYmJiGMTCpz333HNKS0u7ru2tt94y/ccAAECSlJqaqqioKH5LDADg0xjGAh7I5XIpOTlZ8fHx2rdvn+kcr7B//361bdtWbdu21Xfffae8vDyO9oNPmzBhgp5//nmFhIRc1/bAAw/oq6++Mv3HAQD4qJ49e+rf//73Fe/3448/qlOnTsrJyeGABACAT2OZAsBD5efna9myZfr8889VpUoVRUREqFu3bqazPMaZM2fOGyAdOXJEy5YtMxcEuEHNmjV1++23S5Lat2+v6tWrX/djVqlSRe3atdOjjz4qSVq3bp02b9583Y8LAIAkrV69WjNnzlRoaKjuv//+i95n4cKF+ve//62EhAQ31wEA4H4MYwEP99prr0mSatSoodtvv13R0dEKCPDvg9qzs7O1a9cu9e3b13QK4DZly5bVPffco3HjxhX5Y0dGRmrSpEmSpDfffFOHDx9Wampqke8HAOCfvv76a+3cuVMtWrSQJEVERCggIEDp6emSpLfffpsP1QEAfsO/JzqAF9m3b58qVqyoo0ePmk4xbuLEibr55ptNZwBu9eWXXxbLIPZ/vfTSS1q6dGmx7wcA4F82btyoihUrqmLFivrmm2+0dOnSgssMYgEA/oQjYwEvYtu236+h9Ze//EVLlizx+/8O8B+hoaFasWKF6tWr57Z91qlTRxs3btRdd92lkydPum2/AADf9vvrt5deekmBgYG8ngMA+CWOjAW8zNixY7V27VrTGW6Xm5urESNGaNmyZRwdDL8RGxurwYMHq2nTpgoPD3fbfkuUKKFGjRopKIjPbAEARe/QoUNKTEw0nQEAgBG8ywK8zIgRIxQcHKyKFSuqSpUqpnOKXVJSkk6fPq3MzEy9/PLLcjqdppMAt4iJidGdd96poUOHGtm/ZVn6wx/+oOzsbGVkZBhpAAAAAABfw5GxgBcaPny4unXrZjrDLZ566inVrl1bN998M4NY+JWRI0dq8uTJRht++eUXde/e3WgDAAAAAPgSjowFvNSGDRtUvXp1SdJHH32kP/3pT4aLilZ+fr5uuukmHTx40HQK4FaWZWnjxo2qVauW6RQAAAAAQBFjGAt4qby8vIJB5fjx45WcnKwnnnjCcFXR2L17t8aMGaO9e/fK4XCYzgHcpmrVqvrb3/6mOnXqqESJEqZzJEk9evRQw4YN5XQ69dJLLykvL890ElDk3nvvPc2ZM8d0BgAAAPwAw1jAByxYsECnTp1S06ZNFRcXJ8uyTCdds/3792vRokUaM2aM6RTArapWrar4+HgNGDDAdMp57r77bt19991yOByaO3euNm3apPT0dNNZuIiIiAjddNNNCghgFaqrNX36dK1fv950BgAAAPwAr9YBH7Fq1SrdcccdysnJMZ1yTVwul1wul4YNG6bnnnvOdA7gVgEBAfrrX/+qKVOmmE65pKCgIC1btkx33HGHV3/g48vi4uKUkJDgMUdVA/AOlmXxIQ4AAG7Esy7gQ86cOaMbbrhBy5cvN51yVbZu3aro6GhFR0friy++MJ0DuN369eu95kOIzz//XJMmTTKdAQAoIr169dL27dv5oA0AADdhmQLAx6Snp2vo0KF68MEH9dRTT5nOucCCBQv02WefnXddenq60tLSDBUB5lSoUEGjR49WzZo1FRoaajqnUEqXLq277rpL//rXv/Tkk0/K5XKZTgIAXIeQkBCVLVvWdAYAAH6DYSzgg5YuXaqgoCDVqlVLd911l+mcAr/88ou+++47ffnll6ZTAI9QunRp9ezZ0
3TGVatatap69uyp7777TgkJCawhCwAAAACFxDAW8FGLFi3Shg0btH//fgUGBiooKEjBwcFu73A6nQVnX+/fv7/Wrl3r9gbAEwUFBXn12p4lS5bU3LlzFR8fr4SEBOXn55tOAlDMLMsqsn+3cnJyZNt2kTxWYYWFhbE2KgAAMI5XI4APO3HihKKjo1WuXDm99tprRhpmzpypcuXKqVy5clq3bp2RBsATDRgwQBs2bDCdcd0WLlyod955x3QGADeoXbu20tLSimSrUqWKW9vLlSunkydPqkmTJm7dLwAAwP/iyFjAx+Xm5kqSpk+fXnBU6meffaaYmJhi2d/69ev1j3/8o+ByUlJSQQOA/y8wMFAhISGmM65bcHCw/vznP6tixYpeueQC/FtGRoa6d++u3bt3X3DbE088oe7du8u2bXXv3l0ZGRkGCj3Dp59+qurVq6tUqVJFtr719OnT9f7772vWrFlF8nhXYlmWQkNDOUkVAAAwjmEs4CcOHDigAwcOSDp7NvTo6GhFRUWpY8eOhX6MrVu3XvHo1m3btmnRokXX1QrAu1StWlXt27fXI488olmzZvn10AreJT8/X4sWLZJt27rxxhvVrFmzgtu6dOmidu3aybZt9e7dW1lZWRd8/969e5WQkODO5GJTtWpVxcfHX/S29u3bF/mRrK1bt9bBgwcVEREh27b11VdfFSxrBAAA4MsYxgJ+aODAgZKkJk2aqGXLlgXXBwcHKyIiouBybm7ueW8+v/32W2PLHQDwbGXLltWUKVO0c+dObdq0SdnZ2aaTgMvKzc1VWlqaoqOj5XK51K1bN73++usX3M+yLH300UcXfYyZM2fqt99+u+x+nE7nRU9yFxUVVfC1bdtKTU29uj/AdShVqtQFa7+2bdtWU6ZMcVuDJD300EN66KGH5HK5tHLlSh06dIiBLAAA8H22bRvfJNlsbGxmNsuyCrYOHTrY/23atGnn3W5ZlvFeNjZf2QYNGmT7IpfLZQ8ZMsT4f19/3Nq2bWv6f79XmTZtml2+fHnb4XDYLpfLdrlc1/Q4v3/vpbb169df8P8qJCTEzszMLLhPamqqHRgY6La/K+PGjbtoq0kul8vu0aNHsf2ZIyMjbafTafTP6MmSk5N5ncfGxsbGxlbEm32JOShHxgJ+zv6vMxknJCSoadOmBZfT0tLcfqZjAN7Nsiw9++yzatSokbp37246B7ioZ599VpK0ZMkSBQYGXtdjXWkN0nr16mn9+vXnXRcQEKBSpUoVfG9ERITWrl172edc27bVrl27az6CNjY2Vt9++62ks0sSeNraqZZleVwTAABAcWAYC6BAZmamNm7caDoDgJerWLGi6tWrZzoDuKSmTZsqPDxcN910U7HvKyws7LwPOi8mMDBQTZo0ueJjDRgw4KJr1xZGhQoVrthhWseOHZWTk+O2k3oBAACYwDAWAAAUudDQUNWoUUMHDhyQ0+k0nQOc54knnjCdcE1effVV0wnF6qGHHtINN9ygDRs2KDEx0XQOAABAsQgwHQAAAHxP7dq1tXfvXlWoUMF0CgAvcuedd2r79u0KDQ01nQIAAFAsGMYCAIBis2bNGvXs2dN0BgAvUqJECe3YsUOtWrUynQIAAFDkGMYCAGDATz/9pLfeest0RrGrWrWqHnvsMY0cOVIjRozgaDcAV2RZlmJjYxUWFmY6BQAAoMixZiwAAAasXbtWSUlJuueee3TTTTcpJCTEdFKxufvuu3X33XfL4XBo/vz52rx5s9LT001nAfBwderU0d69e1k/tpidOHFCmzZtkm3bplMAAPALHBkLAIAhR44cUbNmzXT48GHTKW4RFBSk5cuXq23btqZTAHiBcePG6fXXXzed4fNmz56te++913QGAAB+g2EsAACGxcXFady4caYzAAAAAADFjGEsAACGpaenKzs723SG2wwYMED9+/c3nQEft3//fvXp00dZWVmmU3Ad2rRpo7Fjx8qyLNMpAAAARYJhLAAAc
Ks77rhD3bt3V9u2bRmwoNicOHFCn332mfLz802n4DrUqFFDDzzwgOkMAACAIsMwFgAAD5Cfn6+8vDzTGW7TunVrzZ49W6VLl1ZAAC9HAFxeWFgYH94AAACfwLsfAAA8wKuvvqr4+HjTGW4VHh6uY8eOqUWLFqZTAHiw6OhonThxQo0aNTKdAgAAcN0YxgIA4AHy8/O1ZcsW3XfffcrMzDSd4zYlSpTQqFGj9Pzzz5tOAeDBwsLCNG7cOD366KOmUwAAAK5LkOkAAABwVkZGhhYuXOhXyxVIUvPmzZWVlaWkpCTNnDnTdA58iG3bmjFjhu69917FxsaazsF1atWqlVJTU5WWlqbvvvvOdA4AAMA14chYAAA8iG3bSk9P97uBbHx8vMaOHavIyEgFBgaazoGXO3PmjDIyMmTbtp5++mn9+OOPysrKMp2FItCxY0e99957pjMAAACuGcNYAAA8iG3bqlOnjmbMmGE6xe1iYmJ0/Phx1a1b13QKvNwrr7yidu3aFVx++umn1aNHD4NFAAAAwFkMYwEA8DAul0tDhgxRv379TKe4XUBAAGdMx3WzbVu2bZ93OSEhQW3atPG7o859UeXKlfXrr7+qUqVKhbp/hw4dtHjxYgUE8NYHAACYxysSAAA8UGJiohYtWqQPP/xQTqfTdI5b/eUvf1GLFi1MZ8DHnDp1SmvWrNHo0aN18OBB0zm4DqGhobrlllv07LPPKi4u7or3j4qKUtOmTd1QBgAAcGUMYwEA8FA7d+7UoEGDtGfPHmVnZ5vOcZtBgwapZ8+eio2NVWxsLGvI4qpFRkZe9KjJ/Px8DR48WDt37jRQhaL2yiuv6M477zSdAQAAcFUYxgIA4MFyc3NVr149/fzzz6ZT3Kp///7av3+/9u/fr4oVK5rOgZcZMmSIZs+ebToDAAAAuADDWAAAvMCjjz6qF154wXSGEStXrtTDDz9sOgNeplGjRtq1a5fKlClzwW2PPPKI/v73vxuoQlF78cUX9d1335nOAAAAKDSGsQAAeIGkpCQtXLhQQ4cOPe/ERP6gevXqeuSRR/TMM8+YToEXCQ0NVa1atfTPf/5TTZo0Oe+2pKQkpaSkGCpDUYqKilKVKlVMZwAAABQaw1gAALzE9u3b9f7772vDhg1+tYasJLVr1069evUynQEvY1mW+vfvrw4dOqhGjRqmc1BMwsLC1LhxY9aXBgAAXoFhLAAAXiQjI0NxcXHaunWr6RS3syxLlmWZzoAXevPNNzV27Fj+/vioBg0aaP369SpbtqzpFAAAgCtiGAsAgBe699579eqrr5rOcKtbbrlFR44cUXh4uOkUeKH4+HgdOHBAoaGhplMAAADgxxjGAgDghdLS0pSVlWU6w62CgoJUsWJFjRs3Ts2aNTOdAy8TEhKiG264QZ9++qkaNGigVatWqX///n63BrOvCggI0JgxY9SiRQvTKQAAAJfFMBYAAC918OBBJSQkmM5wK8uy1KtXL91///1q1KiR6Rx4mcDAQPXu3VuVK1fW3r17NX36dNNJKEI9e/ZUrVq1TGcAAABcFsNYAAC81MyZM9W7d2/TGUYMGTJEb7zxhkJCQkynwAsFBQUpKCjIdAYAAAD8EMNYAADglTp06KDDhw8rODjYdAq8zKxZszR+/HjTGQAAAPBDDGMBAPBiKSkpuu+++5ScnGw6xe0CAwMVGRmpOXPmqGHDhqZz4EVCQ0M5kRcAAACMYBgLAIAXy87O1oIFC3T69GnTKUYEBgaqffv2ioqKMp0CLxMbG6suXbqYzkARu/XWW3XHHXeYzgAAALgkFssCAMAHZGRkKDc312+P9gsPD1fJkiV15swZ0ynwEm3atFGbNm1MZ6CIPf/886pTp46WL19uOgUAAOCiODIWAAAf0KxZM3344YemM4yZPXu2Ro8ebToDAAAAAC6LYSwAAD7A5XJpzJgxeuSRR0ynGBEQEKCAAF7WAAAAAPBsvGsBAMBHHDp0SEuWLNGHH36on
Jwc0zlud+ONN+rxxx83nQEvsWPHDn3yySemMwAAAOBnGMYCAOBDjhw5or/+9a/auXOn362f2qJFC7311luqVq2aqlatqjJlyphOggf7z3/+o5deekm2bZtOAQAAgB9hGAsAgI+xbVtNmjTRjBkzTKe4XXR0tA4cOKCDBw9q4MCBpnMAAAAA4DwMYwEA8FH/+Mc/1KdPH9MZxjz77LP68ccfTWcAAAAAQAGGsQAA+KiUlBQdPnzYdIYxkZGRiouL07Bhw1iyAAAAAIBHYBgLAIAPy8rK0tatW/12XczIyEi98sorioiIMJ0CD7J//34dOnTIdAYAAAD8UJDpAAAAUHzWrFmjZs2a6cSJEypdurTpHGMsyzKdAA/y1FNPadGiRYqKijKdAgAAAD/DkbEAAPi43Nxc1axZU0uWLDGdYsy6dev0+OOPm84AAAAA4OcYxgIA4AeOHTumd955R59++qnpFCOio6PVt29fvfjii6ZTYFB+fr6eeeYZbdu2zXQKAAAA/BTLFAAA4Cd+/PFHOZ1O1a9fX61btzad43YtWrRQmTJltHLlSq1atUoOh8N0EtzM4XBoypQpysnJMZ0CAAAAP8WRsQAA+JElS5boT3/6k5xOp+kUI+rXr6+ffvpJkZGRCgjgZRAAAAAA9+JdCAAA8CuBgYHav3+/unTpYjoFAAAAgJ9hGAsAgJ85ffq0unbtql27dplOMaZkyZJ68cUXNWTIENMpAAAAAPwIa8YCAOBn8vPzNWfOHN16662677771KRJE9NJRsTFxSklJcV0BtwkOTlZP/30k98u0QEAAADPwJGxAAD4qZdfflmTJ082nQG4xYYNG/Twww8rPz/fdAoAAAD8GMNYAAAAAAAAAHADhrEAAPixf//73+ratavpDCNee+01/e1vfzOdAQAAAMCPMIwFAMCPJSUladmyZRo3bpxOnTplOsetdu7cqR07dpjOAAAAAOBHGMYCAODn0tLS9Nxzz2nLli3KysoyneMWSUlJys7ONp0BN0lLS9PJkydNZwAAAAAMYwEAwFlt2rTRmDFjTGcUO6fTqQYNGmjOnDmmU+Am/fr1U+/evU1nAAAAAAxjAQDA/zdq1Ch17tzZdEax2bx5sxo2bKiMjAzTKXAjl8tlOgEAAACQJAWZDgAAAJ7j+PHj2rt3r+mMYpOdnc06sQAAAACM4chYAABwnpycHG3fvl1Op9N0CgAAAAD4FIaxAADgPHv27FHDhg114sQJ0ykAAAAA4FMYxgIAgAvYtq1GjRpp+vTpplOK1M0336xDhw4pIiLCdArcwOVyqX79+po1a5bpFAAAAEASa8YCAIBLOHbsmD7++GMlJyfrhRdeMJ1TJIKDg1W5cmV98MEHys3NveL9Fy5cyCDPy6WkpCgnJ8d0BgAAACCJYSwAALiMn3/+WRkZGWrevLlatWqlgADv/6Uay7LUp0+fQt03PDxcx48fL7i8f/9+HTlypJjKAMC9tmzZol27dpnOAADAr3j/OyoAAFCsNm7cqPj4eGVkZMi2bdM5bvXQQw9pxYoVBVvPnj0VFMRn2QB8Q79+/fTOO++YzgAAwK8wjAUAAFeUn5+vatWq6YcffjCdYtTw4cO1ePFi0xkAAAAAvBTDWAAAUCiZmZlyOBymM4wKDQ1V48aN9c0336h06dKmcwBcwZAhQzRgwADTGQAAAAUYxgIAgEL75Zdf9Ouvv5rOMKps2bLq2rWrunbtqqpVq5rOwSWkp6dr1qxZysvLM50Cg2677TY1a9bMdAYAAEABhrEAAKDQ3nrrLb377rumM4yzLEtTp07Vvffeq7CwMNM5uIjdu3erW7fX5t+JAAAgAElEQVRuOn36tOkUAAAAoADDWAAAgGs0YcIEzZgxw3QGAAAAAC/BMBYAAFyVZcuWqUOHDnK5XKZTjAsMDNRtt92mFStWaMWKFUZ/HXr9+vWKj49XTk6OsQYAAAAAlxdkOgAAAHiXEydO6JdffpFt26ZTPEK5c
uXUunVrSVLv3r0VHh6upUuXur3j1KlTWrlyJUNy+LWlS5dq/vz5pjMAAAAuiWEsAAC4ai6XS0lJSYqJiVFwcLDpHI/Rv39/3XDDDdq+fbtSUlJM5wB+Z9KkSfr8889NZwAAAFwSyxQAAICrlpmZqapVq+o///mP6RSP061bN23btk0BAbzMAgAAAHA+3iUAAIBr9vDDD2v48OGmMzxO2bJltWXLFtWrV890il8aMWKEevToYToDAAAAuADDWAAAcM0SExM1d+5cjRo1ynSKRwkMDFSDBg00YMAAtWvXznSO30lOTta+fftMZ8CALl266OGHHzadAQAAcEmFXjPWsqxASWslHbFtu6NlWX+Q9JWkKEnrJPW2bTvPsqxQSdMkxUk6KelB27YTi7wcAAB4hF9//VWJiYnq0KGDatSooZCQENNJHuPpp59WaGio9u7dy3AQcIOuXbuqdOnS+uKLL0ynAAAAXNTVHBk7QNJv/3V5hKRRtm3XkpQmqe+56/tKSjt3/ahz9wMAAD7s2LFjql+/PgPHi3j00Ue1ePFi0xkAAAAAPEChhrGWZVWR9EdJ/zp32ZIUL+mbc3eZKqnzua87nbusc7ffde7+AADAx91xxx0sWXAR1apV06FDh1StWjXTKQAAAAAMKuyRsaMlDZLkOnc5SlK6bduOc5cPS6p87uvKkg5J0rnbT527PwAA8HHHjh1TZmam6QyPExgYqCpVqmj48OG66667TOcAAAAAMOSKw1jLsjpKOmbb9rqi3LFlWU9alrXWsqy1Rfm4AADArEOHDmnjxo2mMzxS79691blzZzVu3Nh0CgAAAAADCnNk7G2S7rcsK1FnT9gVL+kDSWUty/r9BGBVJB059/URSVUl6dztZXT2RF7nsW37E9u2m9m23ey6/gQAAMCj/Otf/9LDDz8sp9NpOsUjPf/885o8ebLpDAAAAAAGXHEYa9v2i7ZtV7FtO1ZSD0k/2bb9sKSlkrqdu9sjkmaf+3rOucs6d/tPtm3bRVoNAAA82o4dOxQVFaWUlBTTKQAAAADgMQq7ZuzFDJb0gmVZe3R2TdiJ566fKCnq3PUvSPrH9SUCAABv43K5dOrUKT3++ONavHix6RyP84c//EFff/21wsPDTacAAAAAcKOgK9/l/7Nte5mkZee+3iep+UXukyOpexG0AQAALzdv3jxVrVpVJUqUUOvWrU3neIyyZcuqe/fumjNnjpYvX67Dhw+bTgIAAADgBtdzZCwAAMAVjR8/XoMHD1ZWVpbpFI9iWZY+//xz3XfffSpVqpRKlSplOgkAAABAMWMYCwAAit0vv/yimJgYBrIXMW7cOKWlpSkpKUlhYWGmcwAAAAAUI4axAID/1969x+lc5/8ff77nmgMap8k4JTPk2GodcpZyDOVQshsllFI2Lfa3DrFqW9rOOmizKUSig7RSK6QSWpWQRoaGHErDjMHMYE7X9fn9YZqvRA5zXdf7M9f1uN9u3cx1zTWfz0Pl4zOv+VzvDxBwjuPo+PHj6tGjh1avXm07x1U8Ho+ioqIUGxurDz74QC1b/moVKAAAAAAhgmEsAAAICsdxtGbNGi1YsEArV660neM6xhhdffXVuvjii22nlGhvvPGGkpKSbGcAAAAAp3VeN/ACAAAorunTpys1NVWdO3e2neJKFSpUUPny5XXkyBHbKSXSuHHjtHv3btsZAAAAwGlxZSwAAICLzJ8/Xy+88ILtDAAAAAABwDAWAAAE3cqVK9WmTRsVFBTYTnGl66+/XqtXr1ZEBKdqAAAAQCjhDB8AAARdZmamNm7cqEcffVSPPPKIli1bZjvJVcqXL68rr7xS999/v6pVq2Y7BwAAAICfsGYsAACwIjc3V5MmTZIkDRgwQPXq1VOtWrUsV7lH6dKlNWXKFCUnJ+vTTz9VWlqa7SQAAAAAxcSVsQAAwLoFCxaoXbt2tjNcaeHChZowYYLtDAAAAAB+wDAWAAC4wv79+1WzZk1t377ddorrD
B06VP/73/9sZwAIEceOHVOdOnW0bt062ykAAIQdhrEAAMAVfD6f9u7dq7y8PNsprlO2bFn97ne/07PPPqtKlSqd9jV16tTRU089pejo6CDXAShpfD6f9uzZo9zcXNspAACEHdaMBQAArvLNN9+oQoUKqlGjhu0UVylbtqz+/Oc/a9q0aUpPT//V5y+99FKNGDHCQhngHjt27FBycrLtjLAQERGh5s2byxhz2s/n5OTo66+/DnIVAADuxzAWAAC4yi233KK//e1veuihhxQRwZt4AJy7v//975o3b57tjLBw0UUXafXq1We8Gn/nzp2qV6+evF5vkMsAAHA3vsMBAACu8+STT6pNmza2MwAAp9G1a1ft3bv3N5dFSUxMVHp6umrWrBnEMgAA3I9hLAAAcJ2cnBwlJyfrlltu0aFDh2znuMrTTz+tPn362M4AEKbuu+8+TZw4UeXLl//N10VERKhChQqaNm2aevToEaQ6AADcj2EsAABwpczMTC1YsEDvvfee9uzZYzvHNXr27KnLL7/cdgaAMNWuXTtdc8015/z63r17q0GDBgEsAgCgZGEYCwAAXG3QoEFatGgRd/0+SXR0tEqVKmU7o0TzeDwqU6aM7QwgLHDMAgDg/zCMBQAArjdmzBh169bNdoZrTJo0SatXr7adUaL1799fKSkpZ7wTPAD/mTJlilauXGk7AwAAV2AYCwAAXK+goECbNm1St27dlJ2dbTvHOo/HowYNGmjFihWqUKGCJGnTpk3q0aOHcnJyLNeVDBEREb958yEA/hMZGalGjRpp+fLlKlu2rO0cAACsYhgLAABKhCNHjuijjz6S1+u1neIKsbGx6tKliwYNGqQ6dero0KFD+uijj+Tz+WynAQhhn3zyiT788MPz/rpy5cqpS5cuGjx4sGrXrh2AMgAASoZI2wEAQkdsbKxf1gPLz8/XkSNH/FAEINQ4jqODBw+qdOnSXNVY6Nlnn5UxRq+88oqOHz9uO8can8+njIwMhvVAgP373//W7t271aVLl/P+WmOMpk2bpsOHD2vnzp0BqAMAwP0YxgLwm2nTpmnw4MHF3s5HH310QSf4AEKf1+tVnTp19Oabb6pfv362c1zjmWeeUa9evXTdddfZTrFm586dqlevnhzHsZ0CAAAAnBHDWAAXZMmSJbrkkkt+8VxCQoJfboTSqlUrbdiw4Yyff/311/X4448Xez8ASibHcTRmzBg9/PDDKlOmjFatWqXISE5pWrVqpc8//zys71jOIBYoGaZMmaJWrVrpvvvus50CAEDQ8Z0LgHPWvHlzderUSZLUtm1bxcXFBWQ/sbGxatq06Rk/f/LbcF9++WVlZGQEpAOAe+3atUuSFB0drSeeeEIDBw7UpZdeajfKstjYWDVp0kSStGjRIqWkpFguCq709PTzev2WLVs0bdo0hkGABQkJCapbt67tDAAArGAYC+CMYmNjVaVKlaLHffv21f3332+x6IS2bduqbdu2kqSNGzdq586dysvL0969ey2XAQi2vLw8TZgwQQkJCeratavi4+NtJ7nCsmXLtHjxYu3fv992imtt375dc+fO1YgRI/zyrg4A56dMmTKqXbu2vv/+e65qBwCElQjbAQDc68Ybb1RKSkrRP24YxJ5q+fLlSklJ0X//+1/bKQAsuvXWW115jLLlxRdf1HPPPWc7w9VuvPFGffnllwxiAUvat2+vpKQkxcTE2E4BACCoGMYCOK25c+dq6tSptjPOWf369fX999/r+++/V//+/W3nAIB1PXv21ObNm+XxeGynAAAAACjEMBZAkfr162vq1KmaOnWqrr76alWqVMl20jmLiopSYmKiEhMTNXToUE2dOlVPPPEEV1sAYeTzzz/Xgw8+aDvDNcqUKaN69erpySefVEJCgu0cAC6RlJSk+++/X16v13YKAABhiTVjAahJkybyeDxq27atRo8ebTun2Lp06aIuXbqooKBAS5Ys0ebNm3X48GHbWQACLCkpS
fv27VOvXr30u9/9TqVLl7adZF1MTIxGjRqlzZs368MPP2RtbQDasWOHnn/++WJtIzMzUxs3blSTJk2KtdRHRESEmjVrpqSkJGVmZharCQCAkoIrY4EwFxkZqZUrV2r9+vUht75gZGSkVq1apY4dO7ImIBAmMjIy1KJFC23fvl0+n08+n892kivMmjVLY8eOVUQEp34Aim/t2rW66qqrlJ+fX6ztxMTEaO3atWrdurWfygAAcD/OyIEw1qRJE+3fv18VK1a0nRJQc+fO1ezZs21nAAiiq6++WvHx8WrVqpXtFNe466679PXXX9vOAAAAAMIaw1ggTA0YMED//Oc/FRcXF/JXjcbGxqpjx46aMWMGV4UBYSIzM1MZGRnatm2bbrvtNmVkZNhOsi4mJka1a9fWq6++qldffVV9+vSxnQQAkqRx48bp7rvvtp0BAEBQMJUAwlSLFi3Uo0cP2xlBU7NmTQ0cOFA9evRQhQoVbOcACJKsrCzNmzdPR48etZ3iCmXKlNHAgQM1cOBA9e3bV+3atbOdBADq1KkTxyMAQNhgGAuEoZiYGHk8HtsZQVe6dGm99957atq0qaKiomznAAii3Nxc7hx+ikGDBmnGjBkqVaqU7RQAAAAgbDCMBcLQ119/rXvvvdd2hjUffPCBnnrqKdsZAIKoUaNGmj59uu0M12nYsKEyMjJUpUoV2ykAAABAWGAYC4SRatWqadmyZUpISAjLK2N/Fh0dzZWxQJjJzc3V888/r5EjR9pOcRVjjEqXLq033ngjrJauAeA+Xbp00VtvvRXy9zIAAIBhLBBGypQpo2uvvZa3pEqqV6+ebrzxRtsZAIJo27Ztev/99zVv3jzl5+fbznGVa665Rv369VPXrl1tpwAIU9WqVVPHjh0ZxgIAQh7DWCBMlCpVihtXnaRTp056/vnnbWcACLIdO3ZoyJAh2rt3r/Ly8mznuModd9yhhx56SHFxcbZTAq5s2bKKjY21nQEAAIAwxDAWCBP33XefvvzyS9sZAGCd1+tV3bp19c4779hOcZ02bdpo3759IT+oXLhwof71r3/ZzgAAAEAYYhgLhAljDG/7OkV8fLzWr1+vxMRE2ykAgszn82nChAkaNWqU7RTXiY6O1qeffqoOHTrYTvG7mJgYrV27Vm3atOHvRMCFKlSooC+++EINGjSwnQIAQMAwjAXCwG233ab27dvbznCdqKgoXXnllSpTpoztFAAW7Ny5U8uWLdNzzz0nr9drO8c1jDFq2rSpBg8erF69etnO8ZvExESNHj1aLVu2VNmyZW3nADgNj8fDuRkAIOQxjAXCwF/+8hf17NnTdgYAuE5ycrLGjx+vHTt2KCcnx3aOqwwZMkR/+tOfVLNmTdspxVapUiV16NBBjzzyiCIjI23nAAAAIIwxjAUAAGHt+PHjql+/vtauXWs7xXW6d++upKSkEj/AfPbZZzV79mzbGQAAAIBK9pk1AAAAcAYRERFav3696tatazsFAAAAkMSVsUBIu+iii/Too4+qevXqtlMAwPVeeukljRs3TpMnT7ad4ioxMTF65JFHlJCQYDvlvNSoUUOPPvqoGjZsqNjYWNs5QMjJy8vThAkTlJKS4vdt//nPf9b111/v9+0CAOAGXBkLhLAyZcpo7Nix3DH6LBo2bKiMjAylpqbaTgFg0RtvvCFJqlixoq6//no1bNhQpUuXtlxlX3R0tP7617/q22+/1Ycffqi9e/faTjqrSy65RB06dNCYMWNspwAhq6CgQE899ZR69OihOnXq+HXbgwcPVnp6ut5//32/bhcAADfgylgAYW/hwoW65557bGcAcIlDhw7pyiuvVHJyshzHkeM4tpNcYdasWZo4caLrf8BnjNGYMWP06quv2k4BXIdjGgAA9nFlLAAAwGl06tRJkZGRSkxM1Jdffmk7xxUGDx6sdu3a6YorrrCdckZfffWVGjRoYDsDcKWhQ4fq7bfftp0BAEBY48pYAACA0zh8+LDS0
9O1bds2DRkyROnp6baTrCtVqpQuu+wyvfLKK65bj7xy5cp65ZVXVLduXZaXAM4gKytLmZmZtjPOSbdu3fTEE0/YzgAAwO8YxgIAAPyGrKwszZkzR9nZ2bZTXKF06dIaPHiwunXrpsTERNs5kqSEhAR1795dgwcP5mZdQIho1KiRbrrpJtsZAAD4HcNYAGEvLy9PBQUFtjMAuFxeXp5yc3OVn59vO8UVZs2apSFDhtjOkCQNGDBAc+bMsZ0BAAAAnBVrxgIIe23atNHmzZttZwBwucaNG8sYoyuvvFKrV6+2nQMAAACgBOLKWABhLycnhytjAZxVTk6Ojh8/rm+++Ua9evVi2QJJt9xyi/79739bbZg2bZprrtAFAAAAzoZhLAAAwHk4cuSI3n//fS1YsEC7du2ynWNV3bp11atXLw0YMCDoN82KiYlR//791bt3b9WvXz+o+wYAAAAuFMNYAACA8+Q4joYNG6Zly5aF/RWy1atX1/z585WQkKCYmJig7bdcuXKaP3++atasGbR9Agguj8ejihUrKiKCb1sBAKGDv9UAAAAu0PDhw/XHP/7RdoYrbNmyRbfffrvtDAAhpGbNmkpLS+OHLgCAkMIwFgAA4AI5jqPPPvtM7du3V15enu0cqyIiIjR+/HjNmDEj4Pvq1auXli1bJmNMwPcFwC6Px8OVsQCAkMLfagAAAMVw5MgRff7553r22We1Z88e2zlWJSQkqEuXLvrzn/+s6OjogO2nUqVKatq0acC2j5LpxRdf1NatW21nhJ23335bK1assJ0BAECJwTAWAACgmPLz8zV27Fh9+umnOnjwoO0cq2rVqqWnnnpKpUqVsp2CMOLz+TRx4kR99dVXtlPCzvTp0zV//vyA7qN69eoqW7ZsQPcBAECwMIwFAADwk9tuu02jR4+2nQEAIWX16tW6++67bWcAAOAXDGMBhK0DBw6obt26SklJsZ0CIIT85z//UYsWLeQ4ju0UayIjI7VhwwZ169bNdgqAEMEa0QCAUMEwFkDYKigoUEpKStjfdAdwg5YtW+r++++3neEXWVlZ2rJliyZMmKB9+/bZzrHmsssu07BhwzRw4EDbKQhxP/zwgyZOnKhjx47ZTnEtn8+nf/zjH0pKSrKdAgBA2Iu0HQAAAMJPuXLllJiYWPS4V69euueee/Tf//636IrS3Nxcbdu2zVJh8Rw/flyPPvqoGjdurGuuuUbVqlWznWRF37595fF4NG/ePNspCGH79+/Xo48+ajvD1Xw+n6ZOnaojR47YTgEAIOwxjAUQtnw+n+0EIOz8/DbTzp07a9GiRb/6/KZNm4o+Tk5O1uWXXy5JJfYt/wMGDND/+3//T08++aTtFGuMMTLGlNj/hnA//t8CAAAlCcsUAAhLc+bMUdOmTW1nAGElMjJSKSkpSk1N1ezZs8/6+rp16yo1NVWpqakaMmRI4AMREN27d9eOHTsUFRVlOwUh6O9//7uuvfZa2xkAAADnjGEsgLB0/Phxpaen284AwkadOnX00ksvqWbNmqpcubLKly9/1q/xeDyqXLmyKleurGHDhmnmzJmaMWOGSpcuHYRi/1m2bJnGjRtnO8Oa6OhoXXrppZoxY4bq169vOwchZNy4cXr77bd16NChM75m6tSpevnll4NYFZ7WrFmjUaNGBXQff/jDH/Tggw8GdB8AAAQDw1gAYWfjxo3avn277QwgbNSrV0/XXXedhgwZosjIC1shqU2bNrrjjjt0xx13qHPnzurQoYMaNWrk59LASEpK0ty5c/XJJ58oJyfHdo4VkZGRGjJkiHr06OGXgWxqaqrWrFnjhzKURDk5Ofrkk080Z86cs96Qavny5Vq1alWQysJXSkqK5s+fH9AlI1q0aKE+ffoEbPsAAAQLa8YCIS43N1cxMTFF6zRCuu+++7R27VrbGUBYiIqK0pgxY3TnnXf6ZXsej0dLliyRJC1ZskT9+vWTJOXn57t63cjU1FR17NhRycnJqlWrl
owxYfm2/aefflqXX365hg0bVqztLF26VOvXr9fevXsVHR3N33FhxOfzaffu3erYseM5f43X61V+fn5Y/pkDAADuw5WxQAhLS0tTpUqVznrVCAAEytatWwO23ut1112ngwcP6uDBg2rXrl1A9uFvzZo1U1xcnDp06GA7pcRLS0vTxRdfrC1btthOQRA9++yzatas2Xl9zVtvvaXLLruMG3cCAABXYBgLhLijR4/K6/XaznCF48eP68Ybb9TWrVttpwAhr2bNmnr33XdVo0aNC16a4Gw8Ho9iY2MVGxurxx57TCNGjAjIfvzp2LFjOnr0qJKSktSnTx9lZWXZTgq6bt26ac6cOX7Z1tGjRzVixAjNnTvXL9uDu40aNUozZszQsWPHzuvrCgoKlJqaqj59+qh3796aOXNmgArdKTk5WTfeeKOOHj1qOwUAAIhlCoCwsHz5ckVEROj3v/+97RSrCgoK9N5776mgoMB2ChDSfve73+naa69Vr169grbPtm3bKicnR3v27NG7774btP1eqMzMTC1ZskRvvvmmOnfurMTERNtJQVOzZk317t1bN998s5YuXarMzMxibW/VqlWqUKGCKlWqpOuuu85PlXCLXbt26YsvvpAkLV68WLt27bqg7eTn5+u9996TdGKpg7JlyxZ9rnHjxiF9c7mMjIyi3zsAALDPuGF9NWOM/QggxI0cOVKTJ0/+xTcf4eTnq2Jq1arFMBYIoNjYWE2aNEljx461sv+0tDTVq1dPmZmZJeYtydOnT9fAgQMVGxtrOyXorrjiCr8tpVO7dm1t2LCh6PHPV06jZMrMzJTjOFqwYIGGDx8e0H099NBDGjlypCSpXLlyIbUGcU5OjlavXq1rr7024PuKj4/X/v37A/rvb+PGjee9TAUAALY4jnPavxQZxgJhwhijunXratu2bbZTrFi0aJH++Mc/smQDEGBbtmxRw4YNrQ4zvF6vmjRpUmLWy46IiFDXrl31wQcf2E4JOn8OY6UTA9iftWjRQv/73//8tm0Ez/HjxxUfH6+cnBw5jhPwH6xERETIGKOIiAjt27dPlSpVCuj+gum+++7TCy+8EJQfTjGMBQDgl840jGWZAiBMOI6j3bt3q02bNnrnnXdUtWpV20kBV1BQoGuvvVbHjx9XRkYGg1gggOLj47V48WLVrl3b+lVlJw/kSgKfz6d169bpmmuu0fLlyxUTE2M7KWjmzZun6dOn68UXX/TL9k4+zn/zzTdq06aNX7brZo8//rjat29vO6NYfD6fevToUbRkhc/n07FjxxSsi0Z+HlR6vV716NFDkZGRatGihZ577rmg7D+QvF5v0N4lcOjQIbVt21Zz585V3bp1g7JPAABKIoaxQBjJzc3VunXrNGPGDPXp00eNGze2nRQw+/bt08KFC/XZZ58pNzfXdg4Q0ho1aqQ+ffqExeArUI4cOaLPPvtML7zwgm666SbVrFnTdlJQNG7cWDVq1AjIto8ePap169YFZNtuMm/ePH399de/ej42NlZDhgwJftB5OnDggN544w2tWbPmvG/MFQjr16+XdGKw+Pzzz1uuKb5vvvkmaPsqKCjQunXrAnqjsPj4eA0fPlyzZ89WTk5OwPYDAEAgsUwBEKYeeOAB3XvvvapcubLtFL87fPiwVq5cqX79+tlOAUJefHy8hg0bpilTpthO+QV/v/09mObOnavrr79ecXFxtlMCbv/+/Zo6daoef/xx2ykhp2rVqvryyy+LHkdFRalKlSoWi34pNTVVBQUF2rRpU1Bv9ofA27hxo5o0aRKw7TuOo/j4eB08eDBg+wAAwB9YMxbArzRv3vwX36iFimHDhumll16ynQGEhbVr16pt27a2M36lJA9jJWnAgAGaP3++7YyAK+n/nUqSyy+/XFu2bLGdUSQhIUF79uyxnYEAYBgLAMAJZxrGRgQ7BIB7JCUlqWHDhjpy5IjtFL9p37693nzzTdsZQEjr1q2bkpOTlZycrKZNm9rOC
UnvvfeeWrVqFbS1HhH6UlJS1KBBAzVo0ECvv/66lYYZM2YUNezbt89KAwAAgG2sGQuEsZycHG3btk2TJ0/WbbfdVmLXkF25cqU++eQTSSeuxgjkWmVAuPJ4PJowYYKio6PVsGFD1a9f33bSaWVnZ+vxxx/XgQMHbKcUS1ZWljZv3qwHHnhA99xzT8DWVUX4yMvL07Zt2ySdWArjbFfJtmjRQr179y7WPjds2KB33nmn6PHatWuLGgAAAMIVw1ggzDmOo6eeekrx8fGqUKGCEhISbCedl5SUFL311lt+uxM3gF+qXr26KlasqKioKN1///0qXbq07aTfdPToUU2ZMiVod2EPpJycHD388MNq1KiROnTooKpVq9pOQohYunSpli5d+puv+cMf/qDatWsXaz/vv/++69aTBgAAsI01YwEU6datmz744APbGeelQYMGXGUDBNDLL7+soUOH2s44Z/v371e1atVCYhh7spEjR+qZZ56xneF3rBkLhB7WjAUA4ARu4AW/iouLU1JSkow57f9X56Vbt27avHmzH6pQXNHR0apevbqSk5MVExNjO+eMZs2apYkTJ0qS0tLS5PV6LRcBoaVBgwb6+OOPJUnly5d3/dWwJwvVYWyZMmXUsGFDrV+/3naKX6Slpalx48ZKS0tTQUGB7RwAfnTxxRdr1KhR+tvf/haQ7TOMBQCUFGcaxrJMAX7T0KFD1aJFi189X7p0aVWrVs0v+3jwwQeVlpZW9HjSpEm/eIzgycvL0969e3XvvR88IPoAABjYSURBVPcqMjJSrVu31pAhQ2xn6auvvtJLL71U9DgpKUmpqakWi4DQ9I9//EOVK1fWxRdfXCLfEr9mzRrNmDEj5AaxknTs2DFt27ZN99xzjyZPnqz4+HjbScXi9Xr1008/2c4AEAAHDx5UVlaW7QwAAFyLYSyKtGzZUqVKlfrFcwMGDFDnzp0Dut++ffv+4vEnn3xSdIddn8+nzz//XPn5+QFtwP/xer2aOXOmJGnXrl2qX7++2rRpE/QOx3G0du1a+Xw+ffzxx6wJCwRAkyZNVK5cuaLHt99+e4m9UdTmzZu1ZMkSvfrqq7ZTAiY7O1svvviirrnmGl111VW69NJLbSddkNTUVK1bt852BgAAAGAFyxSEuYiICHk8HknSjh07XPeNneM4uuSSS5Seni7HcXgrowVxcXHat2+fIiIifvH/SyCcPHTPzc1VlSpVdOzYsYDtDwhHxhhFRp74WeyaNWvUsmVLy0X+0aFDB61atcp2RtBMnjxZ48ePL/pvWRL8fIyfOXOmhg8fbrkGQCCNHTtWjz32WEC2zTIFAICS4kzLFEQEOwTuMnr0aGVkZCgjI8OVV0MZY7R9+3ZlZGRo0aJFtnPCUkZGhipVqqS4uDhNmjQpYPtxHEcNGjRQXFyc4uLiVK1aNQaxQADcdNNNRcf95s2b287BBZo8ebLat29vO+OcZWdnq2rVqoqLi9OoUaNs5wAAAADWlJzLKeA35cuX1+zZs2WMUb169RQbG2s76Tf93Ne6dWu98847kqQpU6boq6++spkVVrKzsyVJb731lrZu3Vr0/ODBg3XDDTec9/bS09N11113/er5H3/8Ubm5uRceCuC0oqKi9OqrryomJkY1atRw/XEfZ5eXl6dvv/1WN954oyRpxIgRAV9WqLiysrJYdggIE4sWLVJmZqamT59uOwUAANdhGBuGYmJidMMNN8iY014t7Vrx8fFFg7+kpCTVrFmz6HOffPKJDh06FPSmVq1aqXr16uf9dRkZGSXy7bQpKSlKSUkpelyuXLkLulFOWlqa/vOf//gzDcBpNGvWTAkJCYqOjtYNN9ygmJgY20l+5/V6tWTJEqWnp9tOCbrMzMyiY+kll1yimJgYXXXVVZarfu2nn37Sxx9/LJ/PZzsFQJCkpKQE9GaKPXv21Mcff6w9e/YEbB8AAAQKa8aGmaioKF166aVKSUkpccPY39KpUyd99tlnAb+qs
lSpUoqKiip6/Nprr6lXr17nvZ3PP/9cXbt29WfaWeXl5XHVKRBGYmNjNX36dA0cONB2SkBlZ2crLi6OKy4lXX311XrvvfdUtmxZ2ylFjh8/rsWLF2vAgAG2UwAE2WWXXfaLH+L725AhQzRnzpyAbR8AgOI605qxXBkbZm699Va99NJLITWIlaQVK1boueee01/+8peA7mfq1Km/eHv9hd7MqmXLlsrIyPBX1jl54oknNGHChKDuE4Adxhh99913qlKliu0UBNHq1atVtWpVpaWlqUyZMrZzJJ0Ylrz99tu2MwAAAADXYBgbZk6+i3Yo8Xg8uvnmm9WiRQtJ0oABA/TDDz/4dR/vvPOOWrdu7Zd/fzb+O9x2223ndLOX3NxcXXfddcrLywtCFQB/q1OnjmbPnq1KlSqF3A/eTrVmzRqNGzdOBQUFtlNcwXEcHT9+XF27dtVjjz3miiULvF6vvF6v7QwAAADANUJvKoewVb169aL1W++44w4dOHBABw8e1FtvvVWs7VasWFH9+/dXp06dVK5cOX+kWlGjRg3VqFHjrK/Lz8/X3Xfffda3/L722mvKysryVx4AP4mNjXXFEC4YDh48qM8++8x2hqs4jqPPPvtM8+fPLxrMBltqamrRWrY7duwI+v4BAAAAN2MYi5D00EMPSZKSk5O1du1aSdLhw4d17Nix895W9erV9cILL/i1z82ioqL03HPPnfV1mzZtuqCbJmRnZyszM1OSVLlyZR07dkzZ2dnnvR0Av1auXDlVrlzZdgZcYPr06frhhx90xRVXqGrVqkHZ5/79++X1evXll19q+PDhQdknAPcqKCjQvn37VKVKlQte2gsAgFDEMBYhrUGDBvrxxx8lSXfeeadmzpxpuSh0/O9//7ugr3vmmWc0evRoSdLatWs1a9YsPfLII/5MA8LWxIkTNXbsWNsZcIklS5aoYcOGSk9PD8ogpFmzZtq3b1/A9wOgZNi9e7cuueQSff/990pMTLSdAwCAa0TYDgCCZfLkyZo9e/Z5fc2IESP07rvvBqgoPA0aNEhbtmzRli1blJCQoFGjRhU9PvmfTZs2qXTp0rZzgRLjgw8+0NChQ21nwGUyMzPVqFEjff311wHbx86dO9WwYUMdOHAgYPsAAAAAQgVXxiJsVKtWTR06dNADDzwgSXr99de1ffv2M75++PDh6tevn2rXrh2sxLAQFxenuLi4oseVK1c+7duqvV6vJkyYoPz8fG3ZsoW7cQNnUL58eY0cOVItW7ZUxYoVbecEzcKFC7Vo0SLbGa7n8/mUnJys559/XjfddJO6d+/ut21Pnz5daWlpSktLU3Jyst+2CwAAAIQyhrEIK4mJiUXryaalpeno0aNFyxj8LDIyUnXq1NGYMWNUq1YtG5mQ5PF49Le//U2S9OabbzKMBc6gYsWKRce1cPLGG29o4cKFtjNKjJdfflnHjx9XnTp1VKdOHb9s8/nnn9e3337rl20BAAAA4YJlChC2XnjhBT3zzDO/er5KlSraunUrg1gAQEh57bXX1LFjR79tz+fz+W1bAAAAQLhgGAvA9Xr16qVvv/2WO/ECp7jrrru0bt062xkoQfbt26caNWpox44dF/T1r776qmrUqKEaNWooJSXFz3UAQlHr1q310ksv2c4AAMA1WKYAYa1p06Z66qmnNGbMGPl8Pl1zzTW64447bGfhFKVLl1b16tVtZwCuMm7cOPXu3VtVqlSxnYISxOfz6ccff9SkSZNUoUIFVa9evWhJmDPJzc3VX//6V3m9Xm3ZsuVXy/sAwG/Zv3+/srOzbWcAAOAaDGMR1i677DINHTpUY8eOlSQ1atRIgwYNslwFAGcWGRmpli1b6vbbb1f9+vVt56CEWrBggSSpdu3a6tChwy8+FxERoVatWum7775Tenq6jh07punTp8vr9VooBQAAAEILw9gw4ziOvF4vb/c+DY/Hw78Xl4uKimIYgLBXrlw5rV69WhER4bvSkNfrleM4t
jNCws6dO9W+fftfPBcVFaUDBw5o/PjxWrx4saUyAAAAIDQxjA0zr732mlavXq3vvvtOxhjbOa6ydOnSX31DCvcoX7689u/fr86dO2v9+vW2cwBYUlBQoFq1aik1NdV2SsjKz89XzZo1dfz4cdspAAAAQMgJ38tqwlR+fr6ysrJsZ7jSRRddpFKlStnOwG8oV66cIiP5GRLC19VXX61XXnklrK+KlaSsrCwVFBTYzghp/DsG4E+vvPKKxo8f79dtjhgxQg888IBftwkAQDAw1QAAoARo3bq1brjhBvXq1ct2CgAA52Xz5s2Kjo726zabN2+uQ4cO+XWbAAAEA8NYAABKgL///e/q1q2b7QwAAAAAQDGE9/scAQAAAAAAACBIGMYirK1atUo333yzVqxYoUaNGtnOAYBfiY6O1vLly9WyZUvbKQAAXLBt27apU6dO3L8CABD2WKYAYS0tLU3r1q3T1VdfLY/HYzsHAH4lIiJC7du35waDAIASLSsrS6tWrVJ+fr7tFAAArOLKWAAAXCo6OlpVqlSRMcZ2CgAAAADADxjGAgDgUh06dNCuXbsUExNjOwUAAAAA4AcsUwAAgAs98sgj6t+/v+0MAAAAAIAfcWUsAAAuYozR6NGj1aNHDyUmJtrOAQDAbxzH0dSpU5WUlOSX7dWuXVvjxo1TZCTXGAEASg6GsQhrsbGxqlOnju0MAChijNHEiRPVuHFj2ykAAPiV4zh6+OGHtWnTJr9s77LLLtNDDz2kqKgov2wPAIBgYBiLsNa9e3etX79eHo/HdgoAAAAAAABCHMNYhK077rhDw4cPt50BAEUaNWqk3bt3q2LFirZTAAAAAAABwOI6CFvp6encoRyAa9xwww3q37+/atSoYTsFAICAmjt3rrKysrgwAgAQlrgyFgAAF7jqqqt08803285wtezsbH3xxRcqKCiwnQIAKIYVK1Zo8eLFtjMAALCCK2MBALDM4/EoIoKfj57N5s2b1a5dO9sZAACXiYyMlDFGjuPYTgEA4Kz4zg8AAMs2btyoP/3pT7YzAAAocWJiYrR371517tzZdgoAAOeEYWwYOnLkiAYMGKA9e/bYTgHO2bFjx3TLLbfou+++s50C+F3ZsmVZw/osZs2apQceeMB2BgDAhcqXL6+oqCjbGQAAnJNzGsYaY3YZY74xxmwyxqwvfC7OGLPCGPNd4a8VC583xpjnjDEpxpjNxphmgfwN4Pzl5ubqjTfe0OHDh22nAOcsPz9fb775pg4ePGg7BfCbMmXKqGfPnipTpoztFFf78MMPtXDhQq1cudJ2CgDAT9LS0rR06VL5fD7bKQAABNX5XBnb0XGcJo7jNC98PF7SSsdx6kpaWfhYknpIqlv4zzBJ0/0VCwBAqIiMjFStWrW0ZMkSVa5c2XaOqw0ZMkRLly61nQEA8KMNGzbopptuUl5enu0UAACCqjjLFPSRNKfw4zmSbjjp+bnOCeskVTDGVCvGfgAACDkjR47Uhg0bbGcAAAAAAILoXIexjqTlxpivjDHDCp+r4jjOT4Ufp0qqUvjxJZL2nvS1PxQ+B5e56667NHPmTNsZABCWPB6PoqOjbWe42p49e9SpUyelp6fbTgEAAAAAv4g8x9dd5TjOj8aYypJWGGOST/6k4ziOMcY5nx0XDnWHnfWFCJgvvvhC8fHxqlixovr27Ws7BwDCRu/evdW8efOzvzCMff3113r//ff18ccf204BAARIQUGB5s6dq549e6p69erF2lbXrl117NgxrVq1yk91AAAExjkNYx3H+bHw1wPGmHcktZS03xhTzXGcnwqXIThQ+PIfJV160pfXKHzu1G3OkDRDks53kAv/ef/997V161a1b99eF198sSIiirNyRcmRnp6uvLw87l5eQuTm5iotLc12BlBsxhhVqlRJU6ZM0RVXXGE7x9UWL16sBx980HYGACCA8vPzdffdd2vZsmXFHsaOHj1aVatWZRgLAHC9s07ejDEXGWPK/
vyxpGslJUl6V9LgwpcNlrS48ON3JQ0yJ7SWdOSk5QzgQjt37lSVKlX044+/mpmHpPz8fNWqVUvLli2znYJz9O6776pevXryer22U4Biufjii/XTTz8xiAUAAACAMHUuV8ZWkfSOMebn1893HOcDY8yXkt40xgyVtFvSHwtf/19J10lKkXRM0u1+r4bfOY4jxwmfC5TD6fcaCsLt/0+Epl69eumf//ynPB6P7RTX69evn9auXWs7AwAAAAD87qzDWMdxdkpqfJrnD0rqfJrnHUn3+qUOQfXCCy+oQoUKql69ugYNGmQ7BwBCRv/+/dW3b181atTIdoqrZWdn61//+pdWr16tAwcOnP0LAAAh4fXXX1deXp569uxpOwUAgIA71xt4IQw89thjkqTf//73ateunWrVqhU2a8jCvX766SelpqbazgCK5c4771Tnzr/6+SVOkZWVpfHjx9vOAAAE2ezZs1VQUFDsYWzZsmWVmJioXbt2+ScMAIAAYNKGX9m8ebPq16+vI0eO2E4BdPfdd2vkyJG2MwAAAOByPXv21IYNG7igBADgavwthdPyer36/e9/r3fffdd2CgCUWKVKldL27dvVrl072ymuN2/ePLVs2dJ2BgAAAAAEFMNYnNEPP/ygf//73/rLX/6i8ePHq6CgwHYSwojX69X48eO1ZcsW2ynABWnQoIEef/xx1a5dW6VKlbKd43rZ2dn64YcfbGcAACz58ssvNWnSJG7aCgAIeQxj8ZuWLl2qp59+Ws8884y++OILZWZm2k4qlqysLK1fv14+n892Cs7C6/Vq2rRp2rlzp+0U4LzVrl1b3bt313333SePx2M7x/WSk5O1e/du2xkAAIuSk5M1ffp0hrEAgJDHDbxwTnJzc9WuXTstX7686CY0JXEtpq+++kodO3a0nQEghEVEROjhhx9W//79baeUCD6fT3fddZfWrFljOwUAYJnjOPL5fMX+PsPj8XDxBQDAtUreNA1W3XTTTYqPj1f9+vX5qTUAnCIiIkLfffed+vbtazulRMjOzla1atW0bt062ykAABc4dOiQqlSpos2bN1/wNipUqKD9+/erWbNmfiwDAMB/uDIW5yUrK0uSdPToUd16662SpN69e5eIK8Befvllvf7667YzAIQwx3E0btw4RUVF2U4pEQoKCpSWlsYP9wAAkk78PZqRkVGse1UYY1SxYkVFRvKtLgDAnfgbChckNzdXCxYskCTl5+erfPnylovObuHChVq5cuUvnktNTdXSpUstFeG3FBQUyOv12s4AzovjOFq4cKHtDAAASrS1a9dq//79xdrG4cOH/VQDAIB/GTdcjWKMsR8BAAAAAAAAAH7gOI453fNuuTI2XdLRwl8BQJIqiWMCgF/iuADgZBwTAJyK4wKAU9k6LiSc6ROuuDJWkowx6x3HaW67A4A7cEwAcCqOCwBOxjEBwKk4LgA4lRuPCxG2AwAAAAAAAAAgHDCMBQAAAAAAAIAgcNMwdobtAACuwjEBwKk4LgA4GccEAKfiuADgVK47LrhmzVgAAAAAAAAACGVuujIWAAAAAAAAAEKW9WGsMaa7MWabMSbFGDPedg+A4DDGXGqM+dgY860xZosxZmTh83HGmBXGmO8Kf61Y+LwxxjxXeKzYbIxpZvd3ACAQjDEeY8xGY8x7hY9rGWM+L/yz/4YxJrrw+ZjCxymFn0+02Q0gMIwxFYwxC40xycaYrcaYNpwrAOHLGDO68HuHJGPMAmNMKc4VgPBijJlljDlgjEk66bnzPjcwxgwufP13xpjBwfw9WB3GGmM8kv4lqYekyyUNMMZcbrMJQNAUSPp/juNcLqm1pHsL//yPl7TScZy6klYWPpZOHCfqFv4zTNL04CcDCIKRkrae9PgxSU87jlNH0iFJQwufHyrpUOHzTxe+DkDoeVbSB47jNJDUWCeOD5wrAGHIGHOJpD9Lau44TiNJHkn9xbkCEG5ekdT9lOfO69zAGBMn6UFJrSS1l
PTgzwPcYLB9ZWxLSSmO4+x0HCdP0uuS+lhuAhAEjuP85DjOhsKPs3Tim6tLdOIYMKfwZXMk3VD4cR9Jc50T1kmqYIypFuRsAAFkjKkh6XpJLxc+NpI6SVpY+JJTjwk/HysWSupc+HoAIcIYU17S1ZJmSpLjOHmO4xwW5wpAOIuUVNoYEympjKSfxLkCEFYcx/lUUsYpT5/vuUE3SSscx8lwHOeQpBX69YA3YGwPYy+RtPekxz8UPgcgjBS+ZaippM8lVXEc56fCT6VKqlL4MccLIPQ9I2msJF/h44slHXYcp6Dw8cl/7ouOCYWfP1L4egCho5akNEmzC5cvedkYc5E4VwDCkuM4P0p6UtIenRjCHpH0lThXAHD+5wZWzxlsD2MBhDljTKyktyWNchwn8+TPOY7jSHKshAEIKmNMT0kHHMf5ynYLANeIlNRM0nTHcZpKOqr/e9uhJM4VgHBS+BbiPjrxg5rqki5SEK9kA1AylIRzA9vD2B8lXXrS4xqFzwEIA8aYKJ0YxL7mOM6iwqf3//yWwsJfDxQ+z/ECCG3tJPU2xuzSiWWLOunEWpEVCt+KKP3yz33RMaHw8+UlHQxmMICA+0HSD47jfF74eKFODGc5VwDCUxdJ3zuOk+Y4Tr6kRTpx/sC5AoDzPTewes5gexj7paS6hXc/jNaJxbfftdwEIAgK12uaKWmr4zhTT/rUu5J+vpPhYEmLT3p+UOHdEFtLOnLS2xAAlHCO49zvOE4Nx3ESdeJ84CPHcW6V9LGkfoUvO/WY8POxol/h6139E3AA58dxnFRJe40x9Quf6izpW3GuAISrPZJaG2PKFH4v8fMxgXMFAOd7brBM0rXGmIqFV91fW/hcUBjbxyJjzHU6sUacR9Isx3EethoEICiMMVdJWi3pG/3f+pATdGLd2Dcl1ZS0W9IfHcfJKDzhel4n3op0TNLtjuOsD3o4gIAzxnSQ9FfHcXoaY2rrxJWycZI2ShroOE6uMaaUpFd1Yr3pDEn9HcfZaasZQGAYY5roxE39oiXtlHS7TlxQwrkCEIaMMQ9JullSgU6cF9ypE+s8cq4AhAljzAJJHSRVkrRf0oOS/qPzPDcwxtyhEzMISXrYcZzZQfs92B7GAgAAAAAAAEA4sL1MAQAAAAAAAACEBYaxAAAAAAAAABAEDGMBAAAAAAAAIAgYxgIAAAAAAABAEDCMBQAAAAAAAIAgYBgLAAAAAAAAAEHAMBYAAAAAAAAAgoBhLAAAAAAAAAAEwf8HNCUpE4InlJoAAAAASUVORK5CYII=\n" + }, + "metadata": { + "needs_background": "light" + } + } + ], + "source": [ + "# collecting all masks and saving\n", + "\n", + "mask_count = np.sum(result['detection_scores'][0] >= min_score_thresh)\n", + "print('Total number of objects found are:', mask_count)\n", + "mask = np.zeros_like(detection_masks_reframed[0])\n", + "for i in range(mask_count):\n", + " if result['detection_scores'][0][i] >= min_score_thresh:\n", + " mask += detection_masks_reframed[i]\n", + "\n", + "mask = tf.clip_by_value(mask, 0,1)\n", + "plt.figure(figsize=(24,32))\n", + "plt.imshow(mask,cmap='gray')\n", + "plt.show()" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "provenance": [] + }, + 
 "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "gpuClass": "standard" + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/official/projects/waste_identification_ml/model_inference/saved_model_inference.ipynb b/official/projects/waste_identification_ml/model_inference/saved_model_inference.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..c7005d8128078bae11e623d7c116471992907d47 --- /dev/null +++ b/official/projects/waste_identification_ml/model_inference/saved_model_inference.ipynb @@ -0,0 +1,1142 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "rOvvWAVTkMR7" + }, + "source": [ + "# Waste identification with instance segmentation in TensorFlow\n", + "\n", + "Welcome to the Instance Segmentation Colab! This notebook walks you through the steps of running an \"out-of-the-box\" Mask RCNN instance segmentation model on images." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HVTXSC07QwfG" + }, + "source": [ + "Given three Mask RCNN models, one each for material type, material form type, and plastic type, your goal is to run inference with any one of them and visualize the results." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AQUsAE0TRkmh" + }, + "source": [ + "To complete this task, you need to provide valid paths to the saved models and to a single input image. The label files that the models were trained on are in the waste_identification_ml directory of the TensorFlow Model Garden repository, and the matching label file is selected automatically once you choose the model to run inference with." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vPs64QA1Zdov" + }, + "source": [ + "## Imports and Setup\n", + "\n", + "Let's start with the base imports."
 + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OtfgxYR-oT8J" + }, + "outputs": [], + "source": [ + "# Install the official Model Garden package\n", + "!pip install tf-models-official" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yn5_uV1HLvaz" + }, + "outputs": [], + "source": [ + "import os\n", + "import pathlib\n", + "import cv2\n", + "import logging\n", + "logging.disable(logging.WARNING)\n", + "\n", + "import matplotlib\n", + "import matplotlib.pyplot as plt\n", + "\n", + "import numpy as np\n", + "from six import BytesIO\n", + "from PIL import Image\n", + "from six.moves.urllib.request import urlopen\n", + "\n", + "from official.vision.ops.preprocess_ops import normalize_image\n", + "\n", + "import tensorflow as tf" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "14bNk1gzh0TN" + }, + "source": [ + "## Visualization tools\n", + "\n", + "To visualize the images with the detected bounding boxes and segmentation masks, we will use the TensorFlow Object Detection API. To install it, we will clone the TensorFlow models repository."
 + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "oi28cqGGFWnY", + "outputId": "b3a95e8c-9597-4a03-ec9e-651a1f5dfabb" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cloning into 'models'...\n", + "remote: Enumerating objects: 3444, done.\u001b[K\n", + "remote: Counting objects: 100% (3444/3444), done.\u001b[K\n", + "remote: Compressing objects: 100% (2889/2889), done.\u001b[K\n", + "remote: Total 3444 (delta 894), reused 1458 (delta 498), pack-reused 0\u001b[K\n", + "Receiving objects: 100% (3444/3444), 43.78 MiB | 18.58 MiB/s, done.\n", + "Resolving deltas: 100% (894/894), done.\n" + ] + } + ], + "source": [ + "# Clone the TensorFlow models repository\n", + "!git clone --depth 1 https://github.com/tensorflow/models" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yX3pb_pXDjYA" + }, + "source": [ + "Installing the Object Detection API" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "NwdsBdGhFanc", + "outputId": "dabaca83-793a-4141-a31c-eb75c5f05ba0" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Reading package lists...\n", + "Building dependency tree...\n", + "Reading state information...\n", + "protobuf-compiler is already the newest version (3.0.0-9.1ubuntu1).\n", + "The following package was automatically installed and is no longer required:\n", + " libnvidia-common-460\n", + "Use 'sudo apt autoremove' to remove it.\n", + "0 upgraded, 0 newly installed, 0 to remove and 19 not upgraded.\n", + "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", + "Processing /content/models/research\n", + "Collecting avro-python3\n", + " Downloading avro-python3-1.10.2.tar.gz (38 kB)\n", + "Collecting apache-beam\n", + " Downloading 
apache_beam-2.40.0-cp37-cp37m-manylinux2010_x86_64.whl (10.9 MB)\n", + "Requirement already satisfied: pillow in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (7.1.2)\n", + "Requirement already satisfied: lxml in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (4.9.1)\n", + "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (3.2.2)\n", + "Requirement already satisfied: Cython in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (0.29.32)\n", + "Requirement already satisfied: contextlib2 in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (0.5.5)\n", + "Requirement already satisfied: tf-slim in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (1.1.0)\n", + "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (1.15.0)\n", + "Requirement already satisfied: pycocotools in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (2.0.4)\n", + "Collecting lvis\n", + " Downloading lvis-0.5.3-py3-none-any.whl (14 kB)\n", + "Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (1.7.3)\n", + "Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (1.3.5)\n", + "Requirement already satisfied: tf-models-official\u003e=2.5.1 in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (2.9.2)\n", + "Collecting tensorflow_io\n", + " Downloading tensorflow_io-0.26.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (25.9 MB)\n", + "Requirement already satisfied: keras in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (2.9.0)\n", + "Collecting pyparsing==2.4.7\n", + " Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)\n", + "Requirement already satisfied: numpy\u003e=1.20 in 
/usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.21.6)\n", + "Requirement already satisfied: sacrebleu in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.2.0)\n", + "Requirement already satisfied: gin-config in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.5.0)\n", + "Requirement already satisfied: tensorflow-text~=2.9.0 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.9.0)\n", + "Requirement already satisfied: tensorflow-datasets in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.6.0)\n", + "Requirement already satisfied: sentencepiece in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.1.97)\n", + "Requirement already satisfied: oauth2client in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.1.3)\n", + "Requirement already satisfied: psutil\u003e=5.4.3 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (5.4.8)\n", + "Requirement already satisfied: seqeval in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.2.2)\n", + "Requirement already satisfied: py-cpuinfo\u003e=3.3.0 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (8.0.0)\n", + "Requirement already satisfied: google-api-python-client\u003e=1.6.7 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.12.11)\n", + "Requirement already satisfied: tensorflow-model-optimization\u003e=0.4.1 in /usr/local/lib/python3.7/dist-packages (from 
tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.7.3)\n", + "Requirement already satisfied: pyyaml\u003c6.0,\u003e=5.1 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (5.4.1)\n", + "Requirement already satisfied: tensorflow-hub\u003e=0.6.0 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.12.0)\n", + "Requirement already satisfied: kaggle\u003e=1.3.9 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.5.12)\n", + "Requirement already satisfied: tensorflow-addons in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.17.1)\n", + "Requirement already satisfied: tensorflow~=2.9.0 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.9.1)\n", + "Requirement already satisfied: opencv-python-headless in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.6.0.66)\n", + "Requirement already satisfied: google-auth-httplib2\u003e=0.0.3 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.0.4)\n", + "Requirement already satisfied: uritemplate\u003c4dev,\u003e=3.0.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.0.1)\n", + "Requirement already satisfied: google-auth\u003c3dev,\u003e=1.16.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.35.0)\n", + "Requirement already satisfied: google-api-core\u003c3dev,\u003e=1.21.0 in /usr/local/lib/python3.7/dist-packages (from 
google-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.31.6)\n", + "Requirement already satisfied: httplib2\u003c1dev,\u003e=0.15.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.17.4)\n", + "Requirement already satisfied: packaging\u003e=14.3 in /usr/local/lib/python3.7/dist-packages (from google-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (21.3)\n", + "Requirement already satisfied: setuptools\u003e=40.3.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (57.4.0)\n", + "Requirement already satisfied: requests\u003c3.0.0dev,\u003e=2.18.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.23.0)\n", + "Requirement already satisfied: protobuf\u003c4.0.0dev,\u003e=3.12.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.17.3)\n", + "Requirement already satisfied: googleapis-common-protos\u003c2.0dev,\u003e=1.6.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.56.4)\n", + "Requirement already satisfied: pytz in /usr/local/lib/python3.7/dist-packages (from google-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2022.1)\n", + "Requirement 
already satisfied: cachetools\u003c5.0,\u003e=2.0.0 in /usr/local/lib/python3.7/dist-packages (from google-auth\u003c3dev,\u003e=1.16.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.2.4)\n", + "Requirement already satisfied: pyasn1-modules\u003e=0.2.1 in /usr/local/lib/python3.7/dist-packages (from google-auth\u003c3dev,\u003e=1.16.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.2.8)\n", + "Requirement already satisfied: rsa\u003c5,\u003e=3.1.4 in /usr/local/lib/python3.7/dist-packages (from google-auth\u003c3dev,\u003e=1.16.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.9)\n", + "Requirement already satisfied: certifi in /usr/local/lib/python3.7/dist-packages (from kaggle\u003e=1.3.9-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2022.6.15)\n", + "Requirement already satisfied: python-dateutil in /usr/local/lib/python3.7/dist-packages (from kaggle\u003e=1.3.9-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.8.2)\n", + "Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/dist-packages (from kaggle\u003e=1.3.9-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.24.3)\n", + "Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from kaggle\u003e=1.3.9-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.64.0)\n", + "Requirement already satisfied: python-slugify in /usr/local/lib/python3.7/dist-packages (from kaggle\u003e=1.3.9-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (6.1.2)\n", + "Requirement already satisfied: pyasn1\u003c0.5.0,\u003e=0.4.6 in /usr/local/lib/python3.7/dist-packages (from 
pyasn1-modules\u003e=0.2.1-\u003egoogle-auth\u003c3dev,\u003e=1.16.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.4.8)\n", + "Requirement already satisfied: idna\u003c3,\u003e=2.5 in /usr/local/lib/python3.7/dist-packages (from requests\u003c3.0.0dev,\u003e=2.18.0-\u003egoogle-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.10)\n", + "Requirement already satisfied: chardet\u003c4,\u003e=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests\u003c3.0.0dev,\u003e=2.18.0-\u003egoogle-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.0.4)\n", + "Requirement already satisfied: flatbuffers\u003c2,\u003e=1.12 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.12)\n", + "Requirement already satisfied: wrapt\u003e=1.11.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.14.1)\n", + "Requirement already satisfied: gast\u003c=0.4.0,\u003e=0.2.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.4.0)\n", + "Requirement already satisfied: astunparse\u003e=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.6.3)\n", + "Requirement already satisfied: termcolor\u003e=1.1.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.1.0)\n", + "Requirement already satisfied: typing-extensions\u003e=3.6.6 in /usr/local/lib/python3.7/dist-packages (from 
tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.1.1)\n", + "Requirement already satisfied: libclang\u003e=13.0.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (14.0.6)\n", + "Requirement already satisfied: absl-py\u003e=1.0.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.2.0)\n", + "Requirement already satisfied: grpcio\u003c2.0,\u003e=1.24.3 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.47.0)\n", + "Requirement already satisfied: keras-preprocessing\u003e=1.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.1.2)\n", + "Requirement already satisfied: google-pasta\u003e=0.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.2.0)\n", + "Requirement already satisfied: tensorflow-io-gcs-filesystem\u003e=0.23.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.26.0)\n", + "Requirement already satisfied: tensorflow-estimator\u003c2.10.0,\u003e=2.9.0rc0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.9.0)\n", + "Requirement already satisfied: opt-einsum\u003e=2.3.2 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.3.0)\n", + "Requirement already satisfied: h5py\u003e=2.9.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.1.0)\n", + "Requirement already satisfied: 
tensorboard\u003c2.10,\u003e=2.9 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.9.1)\n", + "Requirement already satisfied: wheel\u003c1.0,\u003e=0.23.0 in /usr/local/lib/python3.7/dist-packages (from astunparse\u003e=1.6.0-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.37.1)\n", + "Requirement already satisfied: cached-property in /usr/local/lib/python3.7/dist-packages (from h5py\u003e=2.9.0-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.5.2)\n", + "Requirement already satisfied: tensorboard-data-server\u003c0.7.0,\u003e=0.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.6.1)\n", + "Requirement already satisfied: google-auth-oauthlib\u003c0.5,\u003e=0.4.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.4.6)\n", + "Requirement already satisfied: werkzeug\u003e=1.0.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.0.1)\n", + "Requirement already satisfied: tensorboard-plugin-wit\u003e=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.8.1)\n", + "Requirement already satisfied: markdown\u003e=2.6.8 in /usr/local/lib/python3.7/dist-packages (from tensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.4.1)\n", + "Requirement already satisfied: requests-oauthlib\u003e=0.7.0 in /usr/local/lib/python3.7/dist-packages (from 
google-auth-oauthlib\u003c0.5,\u003e=0.4.1-\u003etensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.3.1)\n", + "Requirement already satisfied: importlib-metadata\u003e=4.4 in /usr/local/lib/python3.7/dist-packages (from markdown\u003e=2.6.8-\u003etensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.12.0)\n", + "Requirement already satisfied: zipp\u003e=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata\u003e=4.4-\u003emarkdown\u003e=2.6.8-\u003etensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.8.1)\n", + "Requirement already satisfied: oauthlib\u003e=3.0.0 in /usr/local/lib/python3.7/dist-packages (from requests-oauthlib\u003e=0.7.0-\u003egoogle-auth-oauthlib\u003c0.5,\u003e=0.4.1-\u003etensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.2.0)\n", + "Requirement already satisfied: dm-tree~=0.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow-model-optimization\u003e=0.4.1-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.1.7)\n", + "Collecting hdfs\u003c3.0.0,\u003e=2.1.0\n", + " Downloading hdfs-2.7.0-py3-none-any.whl (34 kB)\n", + "Requirement already satisfied: pyarrow\u003c8.0.0,\u003e=0.15.1 in /usr/local/lib/python3.7/dist-packages (from apache-beam-\u003eobject-detection==0.1) (6.0.1)\n", + "Collecting dill\u003c0.3.2,\u003e=0.3.1.1\n", + " Downloading dill-0.3.1.1.tar.gz (151 kB)\n", + "Requirement already satisfied: pydot\u003c2,\u003e=1.2.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam-\u003eobject-detection==0.1) (1.3.0)\n", + "Collecting requests\u003c3.0.0dev,\u003e=2.18.0\n", + " Downloading requests-2.28.1-py3-none-any.whl (62 kB)\n", + "Collecting fastavro\u003c2,\u003e=0.23.6\n", + " 
Downloading fastavro-1.5.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB)\n", + "Requirement already satisfied: crcmod\u003c2.0,\u003e=1.7 in /usr/local/lib/python3.7/dist-packages (from apache-beam-\u003eobject-detection==0.1) (1.7)\n", + "Collecting pymongo\u003c4.0.0,\u003e=3.8.0\n", + " Downloading pymongo-3.12.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (508 kB)\n", + "Collecting proto-plus\u003c2,\u003e=1.7.1\n", + " Downloading proto_plus-1.22.0-py3-none-any.whl (47 kB)\n", + "Collecting orjson\u003c4.0\n", + " Downloading orjson-3.7.11-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (275 kB)\n", + "Collecting cloudpickle\u003c3,\u003e=2.1.0\n", + " Downloading cloudpickle-2.1.0-py3-none-any.whl (25 kB)\n", + "Collecting docopt\n", + " Downloading docopt-0.6.2.tar.gz (25 kB)\n", + "Collecting protobuf\u003c4.0.0dev,\u003e=3.12.0\n", + " Downloading protobuf-3.19.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)\n", + "Requirement already satisfied: charset-normalizer\u003c3,\u003e=2 in /usr/local/lib/python3.7/dist-packages (from requests\u003c3.0.0dev,\u003e=2.18.0-\u003egoogle-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.1.0)\n", + "Requirement already satisfied: opencv-python\u003e=4.1.0.25 in /usr/local/lib/python3.7/dist-packages (from lvis-\u003eobject-detection==0.1) (4.6.0.66)\n", + "Requirement already satisfied: cycler\u003e=0.10.0 in /usr/local/lib/python3.7/dist-packages (from lvis-\u003eobject-detection==0.1) (0.11.0)\n", + "Requirement already satisfied: kiwisolver\u003e=1.1.0 in /usr/local/lib/python3.7/dist-packages (from lvis-\u003eobject-detection==0.1) (1.4.4)\n", + "Requirement already satisfied: text-unidecode\u003e=1.3 in /usr/local/lib/python3.7/dist-packages (from python-slugify-\u003ekaggle\u003e=1.3.9-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) 
(1.3)\n", + "Requirement already satisfied: colorama in /usr/local/lib/python3.7/dist-packages (from sacrebleu-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.4.5)\n", + "Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from sacrebleu-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2022.6.2)\n", + "Requirement already satisfied: portalocker in /usr/local/lib/python3.7/dist-packages (from sacrebleu-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.5.1)\n", + "Requirement already satisfied: tabulate\u003e=0.8.9 in /usr/local/lib/python3.7/dist-packages (from sacrebleu-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.8.10)\n", + "Requirement already satisfied: scikit-learn\u003e=0.21.3 in /usr/local/lib/python3.7/dist-packages (from seqeval-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.0.2)\n", + "Requirement already satisfied: joblib\u003e=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn\u003e=0.21.3-\u003eseqeval-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.1.0)\n", + "Requirement already satisfied: threadpoolctl\u003e=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn\u003e=0.21.3-\u003eseqeval-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.1.0)\n", + "Requirement already satisfied: typeguard\u003e=2.7 in /usr/local/lib/python3.7/dist-packages (from tensorflow-addons-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.7.1)\n", + "Requirement already satisfied: importlib-resources in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (5.9.0)\n", + "Requirement already satisfied: tensorflow-metadata in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.9.0)\n", + "Requirement 
already satisfied: toml in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.10.2)\n", + "Requirement already satisfied: etils[epath] in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.6.0)\n", + "Requirement already satisfied: promise in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.3)\n", + "Building wheels for collected packages: object-detection, dill, avro-python3, docopt\n", + " Building wheel for object-detection (setup.py): started\n", + " Building wheel for object-detection (setup.py): finished with status 'done'\n", + " Created wheel for object-detection: filename=object_detection-0.1-py3-none-any.whl size=1694955 sha256=e6df096d57a88411b4a823975e0e0162fd70255bf3577e0cca1488d57f27b72a\n", + " Stored in directory: /tmp/pip-ephem-wheel-cache-yb2jlxju/wheels/fa/a4/d2/e9a5057e414fd46c8e543d2706cd836d64e1fcd9eccceb2329\n", + " Building wheel for dill (setup.py): started\n", + " Building wheel for dill (setup.py): finished with status 'done'\n", + " Created wheel for dill: filename=dill-0.3.1.1-py3-none-any.whl size=78544 sha256=3ab6f7fa5e9e0a4b6080a20211a4d9b769c2d0d16cba5c4bc403206ec046bf7c\n", + " Stored in directory: /root/.cache/pip/wheels/a4/61/fd/c57e374e580aa78a45ed78d5859b3a44436af17e22ca53284f\n", + " Building wheel for avro-python3 (setup.py): started\n", + " Building wheel for avro-python3 (setup.py): finished with status 'done'\n", + " Created wheel for avro-python3: filename=avro_python3-1.10.2-py3-none-any.whl size=44010 sha256=6e5591fba8971fe694d841f174b8b7e520bd81ed1427d77ecd3f8e2093905308\n", + " Stored in directory: /root/.cache/pip/wheels/d6/e5/b1/6b151d9b535ee50aaa6ab27d145a0104b6df02e5636f0376da\n", + " Building wheel for docopt (setup.py): started\n", + " Building wheel for docopt 
(setup.py): finished with status 'done'\n", + " Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13723 sha256=647cd63412960e299ff35e70444f1fea18e72685cd9a12ed6ccadcb4c3b7f156\n", + " Stored in directory: /root/.cache/pip/wheels/72/b0/3f/1d95f96ff986c7dfffe46ce2be4062f38ebd04b506c77c81b9\n", + "Successfully built object-detection dill avro-python3 docopt\n", + "Installing collected packages: requests, pyparsing, protobuf, docopt, dill, pymongo, proto-plus, orjson, hdfs, fastavro, cloudpickle, tensorflow-io, lvis, avro-python3, apache-beam, object-detection\n", + " Attempting uninstall: requests\n", + " Found existing installation: requests 2.23.0\n", + " Uninstalling requests-2.23.0:\n", + " Successfully uninstalled requests-2.23.0\n", + " Attempting uninstall: pyparsing\n", + " Found existing installation: pyparsing 3.0.9\n", + " Uninstalling pyparsing-3.0.9:\n", + " Successfully uninstalled pyparsing-3.0.9\n", + " Attempting uninstall: protobuf\n", + " Found existing installation: protobuf 3.17.3\n", + " Uninstalling protobuf-3.17.3:\n", + " Successfully uninstalled protobuf-3.17.3\n", + " Attempting uninstall: dill\n", + " Found existing installation: dill 0.3.5.1\n", + " Uninstalling dill-0.3.5.1:\n", + " Successfully uninstalled dill-0.3.5.1\n", + " Attempting uninstall: pymongo\n", + " Found existing installation: pymongo 4.2.0\n", + " Uninstalling pymongo-4.2.0:\n", + " Successfully uninstalled pymongo-4.2.0\n", + " Attempting uninstall: cloudpickle\n", + " Found existing installation: cloudpickle 1.3.0\n", + " Uninstalling cloudpickle-1.3.0:\n", + " Successfully uninstalled cloudpickle-1.3.0\n", + "Successfully installed apache-beam-2.40.0 avro-python3-1.10.2 cloudpickle-2.1.0 dill-0.3.1.1 docopt-0.6.2 fastavro-1.5.4 hdfs-2.7.0 lvis-0.5.3 object-detection-0.1 orjson-3.7.11 proto-plus-1.22.0 protobuf-3.19.4 pymongo-3.12.3 pyparsing-2.4.7 requests-2.28.1 tensorflow-io-0.26.0\n" + ] + }, + { + "name": "stderr", + "output_type": 
"stream", + "text": [ + "\n", + "WARNING: apt does not have a stable CLI interface. Use with caution in scripts.\n", + "\n", + " DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.\n", + " pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.\n", + "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", + "gym 0.17.3 requires cloudpickle\u003c1.7.0,\u003e=1.2.0, but you have cloudpickle 2.1.0 which is incompatible.\n" + ] + } + ], + "source": [ + "%%bash\n", + "sudo apt install -y protobuf-compiler\n", + "cd models/research/\n", + "protoc object_detection/protos/*.proto --python_out=.\n", + "cp object_detection/packages/tf2/setup.py .\n", + "python -m pip install ." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3yDNgIx-kV7X" + }, + "source": [ + "Now we can import the dependencies we will need later." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2JCeQU3fkayh" + }, + "outputs": [], + "source": [ + "from object_detection.utils import label_map_util\n", + "from object_detection.utils import visualization_utils as viz_utils\n", + "from object_detection.utils import ops as utils_ops\n", + "\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XRUr9Aiwuho7" + }, + "source": [ + "## Import pre-trained models from the Waste Identification project" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "BWMh8UWl7eZA", + "outputId": "b095cad1-89e2-4b60-dab6-2f5ff0ea94ac" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2022-08-10 22:47:36-- https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/material_model.zip\n", + "Resolving storage.googleapis.com (storage.googleapis.com)... 173.194.217.128, 108.177.11.128, 108.177.12.128, ...\n", + "Connecting to storage.googleapis.com (storage.googleapis.com)|173.194.217.128|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 521320844 (497M) [application/zip]\n", + "Saving to: ‘material_model.zip’\n", + "\n", + "material_model.zip 100%[===================\u003e] 497.17M 217MB/s in 2.3s \n", + "\n", + "2022-08-10 22:47:38 (217 MB/s) - ‘material_model.zip’ saved [521320844/521320844]\n", + "\n", + "--2022-08-10 22:47:39-- https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/material_form_model.zip\n", + "Resolving storage.googleapis.com (storage.googleapis.com)... 
173.194.217.128, 108.177.11.128, 108.177.12.128, ...\n", + "Connecting to storage.googleapis.com (storage.googleapis.com)|173.194.217.128|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 523568744 (499M) [application/zip]\n", + "Saving to: ‘material_form_model.zip’\n", + "\n", + "material_form_model 100%[===================\u003e] 499.31M 131MB/s in 4.0s \n", + "\n", + "2022-08-10 22:47:43 (125 MB/s) - ‘material_form_model.zip’ saved [523568744/523568744]\n", + "\n", + "--2022-08-10 22:47:43-- https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/plastic_types_model.zip\n", + "Resolving storage.googleapis.com (storage.googleapis.com)... 173.194.217.128, 108.177.11.128, 108.177.12.128, ...\n", + "Connecting to storage.googleapis.com (storage.googleapis.com)|173.194.217.128|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 521268394 (497M) [application/zip]\n", + "Saving to: ‘plastic_types_model.zip’\n", + "\n", + "plastic_types_model 100%[===================\u003e] 497.12M 152MB/s in 3.3s \n", + "\n", + "2022-08-10 22:47:46 (152 MB/s) - ‘plastic_types_model.zip’ saved [521268394/521268394]\n", + "\n" + ] + } + ], + "source": [ + "# Download the model weights from Google Cloud Storage\n", + "!wget https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/material_model.zip\n", + "!wget https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/material_form_model.zip\n", + "!wget https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/plastic_types_model.zip" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "1oiSODmn7gh-", + "outputId": "a17329e3-371a-4ca6-ac04-0ab39ddfc07c" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Archive: material_model.zip\n", + " creating: 
material/saved_model/\n", + " inflating: material/saved_model/params.yaml \n", + " creating: material/saved_model/saved_model/\n", + " inflating: material/saved_model/saved_model/saved_model.pb \n", + " creating: material/saved_model/saved_model/variables/\n", + " inflating: material/saved_model/saved_model/variables/variables.data-00000-of-00001 \n", + " inflating: material/saved_model/saved_model/variables/variables.index \n", + " creating: material/saved_model/checkpoint/\n", + " inflating: material/saved_model/checkpoint/ckpt-1.data-00000-of-00001 \n", + " inflating: material/saved_model/checkpoint/checkpoint \n", + " inflating: material/saved_model/checkpoint/ckpt-1.index \n", + " creating: material/tflite_model/\n", + " inflating: material/tflite_model/model.tflite \n", + "Archive: material_form_model.zip\n", + " creating: material_form/saved_model/\n", + " inflating: material_form/saved_model/params.yaml \n", + " creating: material_form/saved_model/saved_model/\n", + " inflating: material_form/saved_model/saved_model/saved_model.pb \n", + " creating: material_form/saved_model/saved_model/variables/\n", + " inflating: material_form/saved_model/saved_model/variables/variables.data-00000-of-00001 \n", + " inflating: material_form/saved_model/saved_model/variables/variables.index \n", + " creating: material_form/saved_model/checkpoint/\n", + " inflating: material_form/saved_model/checkpoint/ckpt-1.data-00000-of-00001 \n", + " inflating: material_form/saved_model/checkpoint/checkpoint \n", + " inflating: material_form/saved_model/checkpoint/ckpt-1.index \n", + " creating: material_form/tflite_model/\n", + " inflating: material_form/tflite_model/model.tflite \n", + "Archive: plastic_types_model.zip\n", + " creating: plastic_type/saved_model/\n", + " inflating: plastic_type/saved_model/params.yaml \n", + " creating: plastic_type/saved_model/saved_model/\n", + " inflating: plastic_type/saved_model/saved_model/saved_model.pb \n", + " creating: 
plastic_type/saved_model/saved_model/variables/\n", + " inflating: plastic_type/saved_model/saved_model/variables/variables.data-00000-of-00001 \n", + " inflating: plastic_type/saved_model/saved_model/variables/variables.index \n", + " creating: plastic_type/saved_model/checkpoint/\n", + " inflating: plastic_type/saved_model/checkpoint/ckpt-1.data-00000-of-00001 \n", + " inflating: plastic_type/saved_model/checkpoint/checkpoint \n", + " inflating: plastic_type/saved_model/checkpoint/ckpt-1.index \n", + " creating: plastic_type/tflite_model/\n", + " inflating: plastic_type/tflite_model/model.tflite \n" + ] + } + ], + "source": [ + "%%bash\n", + "# Unzip each model archive into its own folder\n", + "mkdir material material_form plastic_type\n", + "unzip material_model.zip -d material/\n", + "unzip material_form_model.zip -d material_form/\n", + "unzip plastic_types_model.zip -d plastic_type/" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Ey-8Ij2sKjkD" + }, + "outputs": [], + "source": [ + "ALL_MODELS = {\n", + "'material_model' : 'material/saved_model/saved_model/',\n", + "'material_form_model' : 'material_form/saved_model/saved_model/',\n", + "'plastic_model' : 'plastic_type/saved_model/saved_model/'\n", + "}\n", + "\n", + "# Path to the test image\n", + "IMAGES_FOR_TEST = {\n", + " 'Image1' : 'models/official/projects/waste_identification_ml/pre_processing/config/sample_images/image_2.png'\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IogyryF2lFBL" + }, + "source": [ + "## Utilities\n", + "\n", + "Run the following cell to create some utilities that will be needed later:\n", + "\n", + "- Helper methods to load an image and to normalize it for the segmentation model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9XXfEdD9PMKn" + }, + "outputs": [], + "source": [ + "# Helper functions to load and preprocess images\n", + "\n", + "def load_image_into_numpy_array(path):\n", + " \"\"\"Load an image from a file path or URL into a numpy array.\n", + "\n", + " Puts image 
into a numpy array to feed into the TensorFlow graph.\n", + " The image is converted to RGB first, so the array has shape\n", + " (1, height, width, channels), where channels=3.\n", + "\n", + " Args:\n", + " path: the file path or URL of the image\n", + "\n", + " Returns:\n", + " uint8 numpy array with shape (1, h, w, 3)\n", + " \"\"\"\n", + " image = None\n", + " if path.startswith('http'):\n", + " response = urlopen(path)\n", + " image_data = response.read()\n", + " image_data = BytesIO(image_data)\n", + " image = Image.open(image_data)\n", + " else:\n", + " image_data = tf.io.gfile.GFile(path, 'rb').read()\n", + " image = Image.open(BytesIO(image_data))\n", + "\n", + " (im_width, im_height) = image.size\n", + " return np.array(image.convert('RGB').getdata()).reshape(\n", + " (1, im_height, im_width, 3)).astype(np.uint8)\n", + "\n", + "\n", + "def build_inputs_for_segmentation(image):\n", + " \"\"\"Builds segmentation model inputs for serving.\"\"\"\n", + " # Normalizes image with mean and std pixel values.\n", + " image = normalize_image(image)\n", + " return image" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6917xnUSlp9x" + }, + "source": [ + "## Select an instance segmentation model\n", + "\n", + "Here we choose which instance segmentation model to use.\n", + "If you want to try one of the other models later, just change the selection in the next cell and execute the following cells again."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "HtwrSqvakTNn", + "outputId": "94710cf5-1077-4921-d703-04eb23f6cd18" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Selected model: material_form_model\n", + "Model path: material_form/saved_model/saved_model/\n" + ] + } + ], + "source": [ + "# @title Model Selection { display-mode: \"form\", run: \"auto\" }\n", + "model_display_name = 'material_form_model' # @param ['material_model','material_form_model','plastic_model']\n", + "model_handle = ALL_MODELS[model_display_name]\n", + "\n", + "print('Selected model: ' + model_display_name)\n", + "print('Model path: {}'.format(model_handle))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NKtD0IeclbL5" + }, + "source": [ + "### Load label map data (for plotting)\n", + "\n", + "Label maps map index numbers to category names, so that when the model predicts `5`, we know that this corresponds to `Sachets-\u0026-Pouch` (for the material form model). 
Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine.\n", + "\n", + "For simplicity, we load the label files from the same repository that we cloned for the Object Detection API code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "3Kwqa0T1NTUf", + "outputId": "69ce362e-0e37-4dff-bb20-970404c80714" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Labels selected for material_form_model\n", + "\n", + "\n" + ] + }, + { + "data": { + "text/plain": [ + "{1: {'id': 1, 'name': 'Flexibles'},\n", + " 2: {'id': 2, 'name': 'Bottle'},\n", + " 3: {'id': 3, 'name': 'Jar'},\n", + " 4: {'id': 4, 'name': 'Carton'},\n", + " 5: {'id': 5, 'name': 'Sachets-\u0026-Pouch'},\n", + " 6: {'id': 6, 'name': 'Blister-pack'},\n", + " 7: {'id': 7, 'name': 'Tray'},\n", + " 8: {'id': 8, 'name': 'Tube'},\n", + " 9: {'id': 9, 'name': 'Can'},\n", + " 10: {'id': 10, 'name': 'Tub'},\n", + " 11: {'id': 11, 'name': 'Cosmetic'},\n", + " 12: {'id': 12, 'name': 'Box'},\n", + " 13: {'id': 13, 'name': 'Clothes'},\n", + " 14: {'id': 14, 'name': 'Bulb'},\n", + " 15: {'id': 15, 'name': 'Cup-\u0026-glass'},\n", + " 16: {'id': 16, 'name': 'Book-\u0026-magazine'},\n", + " 17: {'id': 17, 'name': 'Bag'},\n", + " 18: {'id': 18, 'name': 'Lid'},\n", + " 19: {'id': 19, 'name': 'Clamshell'},\n", + " 20: {'id': 20, 'name': 'Mirror'},\n", + " 21: {'id': 21, 'name': 'Tangler'},\n", + " 22: {'id': 22, 'name': 'Cutlery'},\n", + " 23: {'id': 23, 'name': 'Cassette-\u0026-tape'},\n", + " 24: {'id': 24, 'name': 'Electronic-devices'},\n", + " 25: {'id': 25, 'name': 'Battery'},\n", + " 26: {'id': 26, 'name': 'Pen-\u0026-pencil'},\n", + " 27: {'id': 27, 'name': 'Paper-products'},\n", + " 28: {'id': 28, 'name': 'Foot-wear'},\n", + " 29: {'id': 29, 'name': 'Scissor'},\n", + " 30: {'id': 30, 'name': 'Toys'},\n", + " 31: {'id': 
31, 'name': 'Brush'},\n", + " 32: {'id': 32, 'name': 'Pipe'},\n", + " 33: {'id': 33, 'name': 'Foil'},\n", + " 34: {'id': 34, 'name': 'Hangers'}}" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# @title Labels for the above model { display-mode: \"form\", run: \"auto\" }\n", + "\n", + "if model_display_name == 'material_model':\n", + " PATH_TO_LABELS = './models/official/projects/waste_identification_ml/pre_processing/config/data/material_labels.pbtxt'\n", + "elif model_display_name == 'material_form_model':\n", + " PATH_TO_LABELS = './models/official/projects/waste_identification_ml/pre_processing/config/data/material_form_labels.pbtxt'\n", + "elif model_display_name == 'plastic_model':\n", + " PATH_TO_LABELS = './models/official/projects/waste_identification_ml/pre_processing/config/data/plastic_type_labels.pbtxt'\n", + "\n", + "print('Labels selected for', model_display_name)\n", + "print('\\n')\n", + "category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)\n", + "category_index" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "muhUt-wWL582" + }, + "source": [ + "## Loading the selected model\n", + "\n", + "Here we just need the path of the selected model; `tf.saved_model.load` loads the SavedModel from that directory into memory.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "rBuD07fLlcEO", + "outputId": "80bb02b6-90d6-4e86-89cc-0a8a57941b33" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "loading model...\n", + "model loaded!\n" + ] + } + ], + "source": [ + "print('loading model...')\n", + "model = tf.saved_model.load(model_handle)\n", + "print('model loaded!')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GIawRDKPPnd4" + }, + "source": [ + "## Loading an image\n", + 
"\n", + "Let's try the model on a simple image.\n", + "\n", + "Here are some simple things to try out if you are curious:\n", + "* Try running inference on your own images: upload them to Colab and load them the same way as in the cell below.\n", + "* Modify some of the input images and see if detection still works. Some simple things to try here include flipping the image horizontally or converting it to grayscale (note that the input image is still expected to have 3 channels).\n", + "\n", + "**Be careful:** the model expects 3-channel images, so if you use an image with an alpha channel, the alpha will count as a 4th channel.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 822 + }, + "id": "hX-AWUQ1wIEr", + "outputId": "e98a9bfb-d334-493b-8f0d-f3d3282b6d5d" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "min: 0 max: 255\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "\u003cFigure size 1728x2304 with 1 Axes\u003e" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "#@title Image Selection (don't forget to execute the cell!) 
{ display-mode: \"form\"}\n", + "selected_image = 'Image1' # @param ['Image1']\n", + "flip_image_horizontally = False #@param {type:\"boolean\"}\n", + "convert_image_to_grayscale = False #@param {type:\"boolean\"}\n", + "\n", + "image_path = IMAGES_FOR_TEST[selected_image]\n", + "image_np = load_image_into_numpy_array(image_path)\n", + "\n", + "# Flip horizontally\n", + "if flip_image_horizontally:\n", + " image_np[0] = np.fliplr(image_np[0]).copy()\n", + "\n", + "# Convert image to grayscale\n", + "if convert_image_to_grayscale:\n", + " image_np[0] = np.tile(\n", + " np.mean(image_np[0], 2, keepdims=True), (1, 1, 3)).astype(np.uint8)\n", + "\n", + "print('min:', np.min(image_np[0]), 'max:', np.max(image_np[0]))\n", + "plt.figure(figsize=(24,32))\n", + "plt.imshow(image_np[0])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dkkBAgGcX65P" + }, + "source": [ + "## Pre-processing an image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "97zIaKAhX-92", + "outputId": "6b3df1b7-4b4a-45d4-c5cc-6a72b987448b" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(512, 1024)\n" + ] + } + ], + "source": [ + "# Get the input size (height, width) that the instance segmentation model expects\n", + "detection_fn = model.signatures['serving_default']\n", + "height = detection_fn.structured_input_signature[1]['inputs'].shape[1]\n", + "width = detection_fn.structured_input_signature[1]['inputs'].shape[2]\n", + "input_size = (height, width)\n", + "print(input_size)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "-K0V6KWiYYpD", + "outputId": "4192b95a-2fe9-41da-a0eb-f1b6d26c904a" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "TensorShape([1, 512, 1024, 3])" + ] + }, + "execution_count": 16, + 
"metadata": {}, + 
"output_type": "execute_result" + } + ], + "source": [ + "# Resize to the model input size (cv2.resize takes (width, height)) and apply the pre-processing used during training\n", + "image_np_cp = cv2.resize(image_np[0], input_size[::-1], interpolation=cv2.INTER_AREA)\n", + "image_np = build_inputs_for_segmentation(image_np_cp)\n", + "image_np = tf.expand_dims(image_np, axis=0)\n", + "image_np.get_shape()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 721 + }, + "id": "ga1lccBpdxpd", + "outputId": "315ba6fd-bb2e-4403-b911-7036dfac48b1" + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "\u003cFigure size 1728x2304 with 1 Axes\u003e" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Display the pre-processed image\n", + "plt.figure(figsize=(24,32))\n", + "plt.imshow(image_np[0])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FTHsFjR6HNwb" + }, + "source": [ + "## Running inference\n", + "\n", + "To run inference, we just need to call the `serving_default` signature of the loaded model (`detection_fn`)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Gb_siXKcnnGC", + "outputId": "f26565c2-484e-43a7-83d7-d682a4cad7df" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "dict_keys(['num_detections', 'detection_classes', 'detection_scores', 'detection_masks', 'image_info', 'detection_boxes'])\n" + ] + } + ], + "source": [ + "# Run inference\n", + "results = detection_fn(image_np)\n", + "\n", + "# Convert the output tensors to numpy arrays. Different models may return\n", + "# different outputs, so print the keys returned by this one.\n", + "result = {key: value.numpy() for key, value in results.items()}\n", + "print(result.keys())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IZ5VYaBoeeFM" + }, + "source": [ + "## Visualizing the results\n", + "\n", + "Here is where we need the TensorFlow Object Detection API, to draw the bounding boxes and segmentation masks from the inference step.\n", + "\n", + "The full documentation of the visualization utilities can be found [here](https://github.com/tensorflow/models/blob/master/research/object_detection/utils/visualization_utils.py).\n", + "\n", + "Here you can, for example, set `min_score_thresh` to other values (between 0 and 1) to let more detections through or to filter out more detections."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PMzURFjxxqF7" + }, + "outputs": [], + "source": [ + "# Select parameters for visualization\n", + "label_id_offset = 0\n", + "min_score_thresh = 0.6\n", + "use_normalized_coordinates = True\n", + "\n", + "if use_normalized_coordinates:\n", + " # Normalize y (columns 0, 2) and x (columns 1, 3) box coordinates to [0, 1]; re-running this cell divides them again\n", + " result['detection_boxes'][0][:,[0,2]] /= height\n", + " result['detection_boxes'][0][:,[1,3]] /= width" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 721 + }, + "id": "FILNrrDy0kUg", + "outputId": "d3be2e7c-4e00-4d90-acd9-637aff9c970f" + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "\u003cFigure size 1728x2304 with 1 Axes\u003e" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Visualize detections and masks\n", + "if 'detection_masks' in result:\n", + " # Convert the numpy arrays back to tensors\n", + " detection_masks = tf.convert_to_tensor(result['detection_masks'][0])\n", + " detection_boxes = tf.convert_to_tensor(result['detection_boxes'][0])\n", + "\n", + " detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(\n", + " detection_masks, detection_boxes,\n", + " image_np.shape[1], image_np.shape[2])\n", + " detection_masks_reframed = tf.cast(detection_masks_reframed \u003e 0.5,\n", + " np.uint8)\n", + "\n", + " result['detection_masks_reframed'] = detection_masks_reframed.numpy()\n", + "viz_utils.visualize_boxes_and_labels_on_image_array(\n", + " image_np_cp,\n", + " result['detection_boxes'][0],\n", + " (result['detection_classes'][0] + label_id_offset).astype(int),\n", + " result['detection_scores'][0],\n", + " category_index=category_index,\n", + " use_normalized_coordinates=use_normalized_coordinates,\n", + " 
max_boxes_to_draw=200,\n", + " min_score_thresh=min_score_thresh,\n", + 
agnostic_mode=False,\n", + " instance_masks=result.get('detection_masks_reframed', None),\n", + " line_thickness=2)\n", + "\n", + "plt.figure(figsize=(24,32))\n", + "plt.imshow(image_np_cp)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c75cSAeJ5JAQ" + }, + "source": [ + "## Visualizing the masks only" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 738 + }, + "id": "tt7RxYqhLpn9", + "outputId": "54c554d4-e732-4466-d582-476b46dcea37" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total number of objects found: 26\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "\u003cFigure size 1728x2304 with 1 Axes\u003e" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Combine the masks of all detections above the score threshold into one binary mask\n", + "\n", + "mask_count = np.sum(result['detection_scores'][0] \u003e= min_score_thresh)\n", + "print('Total number of objects found:', mask_count)\n", + "mask = np.zeros_like(detection_masks_reframed[0])\n", + "for i in range(mask_count):\n", + " if result['detection_scores'][0][i] \u003e= min_score_thresh:\n", + " mask += detection_masks_reframed[i]\n", + "\n", + "mask = tf.clip_by_value(mask, 0, 1)\n", + "plt.figure(figsize=(24,32))\n", + "plt.imshow(mask, cmap='gray')\n", + "plt.show()" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "saved_model_inference.ipynb", + "provenance": [] + }, + "gpuClass": "standard", + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/official/projects/waste_identification_ml/model_inference/tflite_model_inference.ipynb b/official/projects/waste_identification_ml/model_inference/tflite_model_inference.ipynb new file mode 100644 index 
0000000000000000000000000000000000000000..5618e89f35c2a2a64d671d07c7be9763f10bf912 --- /dev/null +++ b/official/projects/waste_identification_ml/model_inference/tflite_model_inference.ipynb @@ -0,0 +1,1178 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "rOvvWAVTkMR7" + }, + "source": [ + "# Waste identification with instance segmentation in TensorFlow\n", + "\n", + "Welcome to the Instance Segmentation Colab! This notebook walks you through the steps of running an \"out-of-the-box\" Mask R-CNN instance segmentation model on images." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HVTXSC07QwfG" + }, + "source": [ + "Given three Mask R-CNN models, trained to identify material type, material form, and plastic type respectively, your goal is to run inference with any one of them and visualize the results." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AQUsAE0TRkmh" + }, + "source": [ + "To complete this task, you need to provide valid paths to the TF Lite models and to a single input image. The label files that the models were trained on are in the waste_identification_ml directory of the TensorFlow Model Garden repository; the matching label file is selected automatically once you choose the model to run inference with." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vPs64QA1Zdov" + }, + "source": [ + "## Imports and Setup\n", + "\n", + "Let's start with the base imports."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Xk4FU-jx9kc3" + }, + "outputs": [], + "source": [ + "# Install the official Model Garden package\n", + "!pip install tf-models-official" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "yn5_uV1HLvaz" + }, + "outputs": [], + "source": [ + "import cv2\n", + "\n", + "import matplotlib\n", + "import matplotlib.pyplot as plt\n", + "from pprint import pprint\n", + "\n", + "import numpy as np\n", + "from six import BytesIO\n", + "from PIL import Image\n", + "from six.moves.urllib.request import urlopen\n", + "\n", + "from official.vision.ops.preprocess_ops import normalize_image\n", + "\n", + "import tensorflow as tf" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "14bNk1gzh0TN" + }, + "source": [ + "## Visualization tools\n", + "\n", + "To visualize the images with their detected boxes and segmentation masks, we will use the TensorFlow Object Detection API. To install it, we will clone the repository."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "oi28cqGGFWnY", + "outputId": "0d35a3d1-0615-4a69-b861-c98228bfae26" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cloning into 'models'...\n", + "remote: Enumerating objects: 3444, done.\u001b[K\n", + "remote: Counting objects: 100% (3444/3444), done.\u001b[K\n", + "remote: Compressing objects: 100% (2888/2888), done.\u001b[K\n", + "remote: Total 3444 (delta 894), reused 1456 (delta 499), pack-reused 0\u001b[K\n", + "Receiving objects: 100% (3444/3444), 43.78 MiB | 20.57 MiB/s, done.\n", + "Resolving deltas: 100% (894/894), done.\n" + ] + } + ], + "source": [ + "# Clone the tensorflow models repository\n", + "!git clone --depth 1 https://github.com/tensorflow/models" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yX3pb_pXDjYA" + }, + "source": [ + "Installing the Object Detection API" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "NwdsBdGhFanc", + "outputId": "534bf43d-6325-463f-bac6-764d0271977b" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Reading package lists...\n", + "Building dependency tree...\n", + "Reading state information...\n", + "protobuf-compiler is already the newest version (3.0.0-9.1ubuntu1).\n", + "The following package was automatically installed and is no longer required:\n", + " libnvidia-common-460\n", + "Use 'sudo apt autoremove' to remove it.\n", + "0 upgraded, 0 newly installed, 0 to remove and 19 not upgraded.\n", + "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", + "Processing /content/models/research\n", + "Collecting avro-python3\n", + " Downloading avro-python3-1.10.2.tar.gz (38 kB)\n", + "Collecting apache-beam\n", + " Downloading 
apache_beam-2.40.0-cp37-cp37m-manylinux2010_x86_64.whl (10.9 MB)\n", + "Requirement already satisfied: pillow in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (7.1.2)\n", + "Requirement already satisfied: lxml in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (4.9.1)\n", + "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (3.2.2)\n", + "Requirement already satisfied: Cython in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (0.29.32)\n", + "Requirement already satisfied: contextlib2 in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (0.5.5)\n", + "Requirement already satisfied: tf-slim in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (1.1.0)\n", + "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (1.15.0)\n", + "Requirement already satisfied: pycocotools in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (2.0.4)\n", + "Collecting lvis\n", + " Downloading lvis-0.5.3-py3-none-any.whl (14 kB)\n", + "Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (1.7.3)\n", + "Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (1.3.5)\n", + "Requirement already satisfied: tf-models-official\u003e=2.5.1 in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (2.9.2)\n", + "Collecting tensorflow_io\n", + " Downloading tensorflow_io-0.26.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (25.9 MB)\n", + "Requirement already satisfied: keras in /usr/local/lib/python3.7/dist-packages (from object-detection==0.1) (2.9.0)\n", + "Collecting pyparsing==2.4.7\n", + " Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)\n", + "Requirement already satisfied: gin-config in /usr/local/lib/python3.7/dist-packages 
(from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.5.0)\n", + "Requirement already satisfied: seqeval in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.2.2)\n", + "Requirement already satisfied: sentencepiece in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.1.97)\n", + "Requirement already satisfied: tensorflow-model-optimization\u003e=0.4.1 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.7.3)\n", + "Requirement already satisfied: numpy\u003e=1.20 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.21.6)\n", + "Requirement already satisfied: tensorflow~=2.9.0 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.9.1)\n", + "Requirement already satisfied: psutil\u003e=5.4.3 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (5.4.8)\n", + "Requirement already satisfied: pyyaml\u003c6.0,\u003e=5.1 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (5.4.1)\n", + "Requirement already satisfied: kaggle\u003e=1.3.9 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.5.12)\n", + "Requirement already satisfied: py-cpuinfo\u003e=3.3.0 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (8.0.0)\n", + "Requirement already satisfied: tensorflow-hub\u003e=0.6.0 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.12.0)\n", + "Requirement already satisfied: oauth2client in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) 
(4.1.3)\n", + "Requirement already satisfied: tensorflow-datasets in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.6.0)\n", + "Requirement already satisfied: tensorflow-addons in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.17.1)\n", + "Requirement already satisfied: google-api-python-client\u003e=1.6.7 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.12.11)\n", + "Requirement already satisfied: sacrebleu in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.2.0)\n", + "Requirement already satisfied: tensorflow-text~=2.9.0 in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.9.0)\n", + "Requirement already satisfied: opencv-python-headless in /usr/local/lib/python3.7/dist-packages (from tf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.6.0.66)\n", + "Requirement already satisfied: uritemplate\u003c4dev,\u003e=3.0.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.0.1)\n", + "Requirement already satisfied: google-api-core\u003c3dev,\u003e=1.21.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.31.6)\n", + "Requirement already satisfied: httplib2\u003c1dev,\u003e=0.15.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.17.4)\n", + "Requirement already satisfied: google-auth\u003c3dev,\u003e=1.16.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) 
(1.35.0)\n", + "Requirement already satisfied: google-auth-httplib2\u003e=0.0.3 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.0.4)\n", + "Requirement already satisfied: requests\u003c3.0.0dev,\u003e=2.18.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.23.0)\n", + "Requirement already satisfied: setuptools\u003e=40.3.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (57.4.0)\n", + "Requirement already satisfied: packaging\u003e=14.3 in /usr/local/lib/python3.7/dist-packages (from google-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (21.3)\n", + "Requirement already satisfied: protobuf\u003c4.0.0dev,\u003e=3.12.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.17.3)\n", + "Requirement already satisfied: googleapis-common-protos\u003c2.0dev,\u003e=1.6.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.56.4)\n", + "Requirement already satisfied: pytz in /usr/local/lib/python3.7/dist-packages (from google-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2022.1)\n", + "Requirement already satisfied: rsa\u003c5,\u003e=3.1.4 in /usr/local/lib/python3.7/dist-packages (from 
google-auth\u003c3dev,\u003e=1.16.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.9)\n", + "Requirement already satisfied: pyasn1-modules\u003e=0.2.1 in /usr/local/lib/python3.7/dist-packages (from google-auth\u003c3dev,\u003e=1.16.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.2.8)\n", + "Requirement already satisfied: cachetools\u003c5.0,\u003e=2.0.0 in /usr/local/lib/python3.7/dist-packages (from google-auth\u003c3dev,\u003e=1.16.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.2.4)\n", + "Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from kaggle\u003e=1.3.9-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.64.0)\n", + "Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/dist-packages (from kaggle\u003e=1.3.9-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.24.3)\n", + "Requirement already satisfied: certifi in /usr/local/lib/python3.7/dist-packages (from kaggle\u003e=1.3.9-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2022.6.15)\n", + "Requirement already satisfied: python-dateutil in /usr/local/lib/python3.7/dist-packages (from kaggle\u003e=1.3.9-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.8.2)\n", + "Requirement already satisfied: python-slugify in /usr/local/lib/python3.7/dist-packages (from kaggle\u003e=1.3.9-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (6.1.2)\n", + "Requirement already satisfied: pyasn1\u003c0.5.0,\u003e=0.4.6 in /usr/local/lib/python3.7/dist-packages (from pyasn1-modules\u003e=0.2.1-\u003egoogle-auth\u003c3dev,\u003e=1.16.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.4.8)\n", + "Requirement already satisfied: 
idna\u003c3,\u003e=2.5 in /usr/local/lib/python3.7/dist-packages (from requests\u003c3.0.0dev,\u003e=2.18.0-\u003egoogle-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.10)\n", + "Requirement already satisfied: chardet\u003c4,\u003e=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests\u003c3.0.0dev,\u003e=2.18.0-\u003egoogle-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.0.4)\n", + "Requirement already satisfied: libclang\u003e=13.0.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (14.0.6)\n", + "Requirement already satisfied: tensorflow-estimator\u003c2.10.0,\u003e=2.9.0rc0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.9.0)\n", + "Requirement already satisfied: absl-py\u003e=1.0.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.2.0)\n", + "Requirement already satisfied: gast\u003c=0.4.0,\u003e=0.2.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.4.0)\n", + "Requirement already satisfied: keras-preprocessing\u003e=1.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.1.2)\n", + "Requirement already satisfied: h5py\u003e=2.9.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.1.0)\n", + "Requirement already satisfied: tensorboard\u003c2.10,\u003e=2.9 in /usr/local/lib/python3.7/dist-packages (from 
tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.9.1)\n", + "Requirement already satisfied: google-pasta\u003e=0.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.2.0)\n", + "Requirement already satisfied: wrapt\u003e=1.11.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.14.1)\n", + "Requirement already satisfied: grpcio\u003c2.0,\u003e=1.24.3 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.47.0)\n", + "Requirement already satisfied: astunparse\u003e=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.6.3)\n", + "Requirement already satisfied: flatbuffers\u003c2,\u003e=1.12 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.12)\n", + "Requirement already satisfied: tensorflow-io-gcs-filesystem\u003e=0.23.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.26.0)\n", + "Requirement already satisfied: opt-einsum\u003e=2.3.2 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.3.0)\n", + "Requirement already satisfied: termcolor\u003e=1.1.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.1.0)\n", + "Requirement already satisfied: typing-extensions\u003e=3.6.6 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.1.1)\n", + "Requirement already satisfied: 
wheel\u003c1.0,\u003e=0.23.0 in /usr/local/lib/python3.7/dist-packages (from astunparse\u003e=1.6.0-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.37.1)\n", + "Requirement already satisfied: cached-property in /usr/local/lib/python3.7/dist-packages (from h5py\u003e=2.9.0-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.5.2)\n", + "Requirement already satisfied: markdown\u003e=2.6.8 in /usr/local/lib/python3.7/dist-packages (from tensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.4.1)\n", + "Requirement already satisfied: google-auth-oauthlib\u003c0.5,\u003e=0.4.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.4.6)\n", + "Requirement already satisfied: tensorboard-data-server\u003c0.7.0,\u003e=0.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.6.1)\n", + "Requirement already satisfied: tensorboard-plugin-wit\u003e=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.8.1)\n", + "Requirement already satisfied: werkzeug\u003e=1.0.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.0.1)\n", + "Requirement already satisfied: requests-oauthlib\u003e=0.7.0 in /usr/local/lib/python3.7/dist-packages (from google-auth-oauthlib\u003c0.5,\u003e=0.4.1-\u003etensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.3.1)\n", + "Requirement already 
satisfied: importlib-metadata\u003e=4.4 in /usr/local/lib/python3.7/dist-packages (from markdown\u003e=2.6.8-\u003etensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (4.12.0)\n", + "Requirement already satisfied: zipp\u003e=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata\u003e=4.4-\u003emarkdown\u003e=2.6.8-\u003etensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.8.1)\n", + "Requirement already satisfied: oauthlib\u003e=3.0.0 in /usr/local/lib/python3.7/dist-packages (from requests-oauthlib\u003e=0.7.0-\u003egoogle-auth-oauthlib\u003c0.5,\u003e=0.4.1-\u003etensorboard\u003c2.10,\u003e=2.9-\u003etensorflow~=2.9.0-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.2.0)\n", + "Requirement already satisfied: dm-tree~=0.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow-model-optimization\u003e=0.4.1-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.1.7)\n", + "Collecting requests\u003c3.0.0dev,\u003e=2.18.0\n", + " Downloading requests-2.28.1-py3-none-any.whl (62 kB)\n", + "Collecting hdfs\u003c3.0.0,\u003e=2.1.0\n", + " Downloading hdfs-2.7.0-py3-none-any.whl (34 kB)\n", + "Requirement already satisfied: pydot\u003c2,\u003e=1.2.0 in /usr/local/lib/python3.7/dist-packages (from apache-beam-\u003eobject-detection==0.1) (1.3.0)\n", + "Collecting proto-plus\u003c2,\u003e=1.7.1\n", + " Downloading proto_plus-1.22.0-py3-none-any.whl (47 kB)\n", + "Collecting orjson\u003c4.0\n", + " Downloading orjson-3.7.11-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (275 kB)\n", + "Requirement already satisfied: crcmod\u003c2.0,\u003e=1.7 in /usr/local/lib/python3.7/dist-packages (from apache-beam-\u003eobject-detection==0.1) (1.7)\n", + "Collecting cloudpickle\u003c3,\u003e=2.1.0\n", + " Downloading cloudpickle-2.1.0-py3-none-any.whl (25 kB)\n", + "Collecting 
dill\u003c0.3.2,\u003e=0.3.1.1\n", + " Downloading dill-0.3.1.1.tar.gz (151 kB)\n", + "Collecting fastavro\u003c2,\u003e=0.23.6\n", + " Downloading fastavro-1.5.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB)\n", + "Requirement already satisfied: pyarrow\u003c8.0.0,\u003e=0.15.1 in /usr/local/lib/python3.7/dist-packages (from apache-beam-\u003eobject-detection==0.1) (6.0.1)\n", + "Collecting pymongo\u003c4.0.0,\u003e=3.8.0\n", + " Downloading pymongo-3.12.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (508 kB)\n", + "Collecting docopt\n", + " Downloading docopt-0.6.2.tar.gz (25 kB)\n", + "Collecting protobuf\u003c4.0.0dev,\u003e=3.12.0\n", + " Downloading protobuf-3.19.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)\n", + "Requirement already satisfied: charset-normalizer\u003c3,\u003e=2 in /usr/local/lib/python3.7/dist-packages (from requests\u003c3.0.0dev,\u003e=2.18.0-\u003egoogle-api-core\u003c3dev,\u003e=1.21.0-\u003egoogle-api-python-client\u003e=1.6.7-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.1.0)\n", + "Requirement already satisfied: kiwisolver\u003e=1.1.0 in /usr/local/lib/python3.7/dist-packages (from lvis-\u003eobject-detection==0.1) (1.4.4)\n", + "Requirement already satisfied: opencv-python\u003e=4.1.0.25 in /usr/local/lib/python3.7/dist-packages (from lvis-\u003eobject-detection==0.1) (4.6.0.66)\n", + "Requirement already satisfied: cycler\u003e=0.10.0 in /usr/local/lib/python3.7/dist-packages (from lvis-\u003eobject-detection==0.1) (0.11.0)\n", + "Requirement already satisfied: text-unidecode\u003e=1.3 in /usr/local/lib/python3.7/dist-packages (from python-slugify-\u003ekaggle\u003e=1.3.9-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.3)\n", + "Requirement already satisfied: portalocker in /usr/local/lib/python3.7/dist-packages (from sacrebleu-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.5.1)\n", + "Requirement already 
satisfied: regex in /usr/local/lib/python3.7/dist-packages (from sacrebleu-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2022.6.2)\n", + "Requirement already satisfied: colorama in /usr/local/lib/python3.7/dist-packages (from sacrebleu-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.4.5)\n", + "Requirement already satisfied: tabulate\u003e=0.8.9 in /usr/local/lib/python3.7/dist-packages (from sacrebleu-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.8.10)\n", + "Requirement already satisfied: scikit-learn\u003e=0.21.3 in /usr/local/lib/python3.7/dist-packages (from seqeval-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.0.2)\n", + "Requirement already satisfied: joblib\u003e=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn\u003e=0.21.3-\u003eseqeval-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.1.0)\n", + "Requirement already satisfied: threadpoolctl\u003e=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn\u003e=0.21.3-\u003eseqeval-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (3.1.0)\n", + "Requirement already satisfied: typeguard\u003e=2.7 in /usr/local/lib/python3.7/dist-packages (from tensorflow-addons-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.7.1)\n", + "Requirement already satisfied: importlib-resources in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (5.9.0)\n", + "Requirement already satisfied: etils[epath] in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.6.0)\n", + "Requirement already satisfied: toml in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (0.10.2)\n", + "Requirement already satisfied: promise in 
/usr/local/lib/python3.7/dist-packages (from tensorflow-datasets-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (2.3)\n", + "Requirement already satisfied: tensorflow-metadata in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets-\u003etf-models-official\u003e=2.5.1-\u003eobject-detection==0.1) (1.9.0)\n", + "Building wheels for collected packages: object-detection, dill, avro-python3, docopt\n", + " Building wheel for object-detection (setup.py): started\n", + " Building wheel for object-detection (setup.py): finished with status 'done'\n", + " Created wheel for object-detection: filename=object_detection-0.1-py3-none-any.whl size=1694955 sha256=bbe6ac88d20695351c8d1ff4cfa94b837bf70b4a1ea34d2500087845a1a518f7\n", + " Stored in directory: /tmp/pip-ephem-wheel-cache-fosu29b4/wheels/fa/a4/d2/e9a5057e414fd46c8e543d2706cd836d64e1fcd9eccceb2329\n", + " Building wheel for dill (setup.py): started\n", + " Building wheel for dill (setup.py): finished with status 'done'\n", + " Created wheel for dill: filename=dill-0.3.1.1-py3-none-any.whl size=78544 sha256=8600ea58bf6db3cb069ab8ec62f0ac32b9785185e5c756a953626ba68cd67e08\n", + " Stored in directory: /root/.cache/pip/wheels/a4/61/fd/c57e374e580aa78a45ed78d5859b3a44436af17e22ca53284f\n", + " Building wheel for avro-python3 (setup.py): started\n", + " Building wheel for avro-python3 (setup.py): finished with status 'done'\n", + " Created wheel for avro-python3: filename=avro_python3-1.10.2-py3-none-any.whl size=44010 sha256=4b5cf56fe1a1eecb4dc5deeef369333aec8389555dbc4bf86959cc48ca96adab\n", + " Stored in directory: /root/.cache/pip/wheels/d6/e5/b1/6b151d9b535ee50aaa6ab27d145a0104b6df02e5636f0376da\n", + " Building wheel for docopt (setup.py): started\n", + " Building wheel for docopt (setup.py): finished with status 'done'\n", + " Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13723 sha256=c9e261014ce1176b800ebe8923f8660f712dd8e5dc07d7975998f863881c2b3d\n", + " 
Stored in directory: /root/.cache/pip/wheels/72/b0/3f/1d95f96ff986c7dfffe46ce2be4062f38ebd04b506c77c81b9\n", + "Successfully built object-detection dill avro-python3 docopt\n", + "Installing collected packages: requests, pyparsing, protobuf, docopt, dill, pymongo, proto-plus, orjson, hdfs, fastavro, cloudpickle, tensorflow-io, lvis, avro-python3, apache-beam, object-detection\n", + " Attempting uninstall: requests\n", + " Found existing installation: requests 2.23.0\n", + " Uninstalling requests-2.23.0:\n", + " Successfully uninstalled requests-2.23.0\n", + " Attempting uninstall: pyparsing\n", + " Found existing installation: pyparsing 3.0.9\n", + " Uninstalling pyparsing-3.0.9:\n", + " Successfully uninstalled pyparsing-3.0.9\n", + " Attempting uninstall: protobuf\n", + " Found existing installation: protobuf 3.17.3\n", + " Uninstalling protobuf-3.17.3:\n", + " Successfully uninstalled protobuf-3.17.3\n", + " Attempting uninstall: dill\n", + " Found existing installation: dill 0.3.5.1\n", + " Uninstalling dill-0.3.5.1:\n", + " Successfully uninstalled dill-0.3.5.1\n", + " Attempting uninstall: pymongo\n", + " Found existing installation: pymongo 4.2.0\n", + " Uninstalling pymongo-4.2.0:\n", + " Successfully uninstalled pymongo-4.2.0\n", + " Attempting uninstall: cloudpickle\n", + " Found existing installation: cloudpickle 1.3.0\n", + " Uninstalling cloudpickle-1.3.0:\n", + " Successfully uninstalled cloudpickle-1.3.0\n", + "Successfully installed apache-beam-2.40.0 avro-python3-1.10.2 cloudpickle-2.1.0 dill-0.3.1.1 docopt-0.6.2 fastavro-1.5.4 hdfs-2.7.0 lvis-0.5.3 object-detection-0.1 orjson-3.7.11 proto-plus-1.22.0 protobuf-3.19.4 pymongo-3.12.3 pyparsing-2.4.7 requests-2.28.1 tensorflow-io-0.26.0\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\n", + "WARNING: apt does not have a stable CLI interface. 
Use with caution in scripts.\n", + "\n", + " DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.\n", + " pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.\n", + "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", + "gym 0.17.3 requires cloudpickle\u003c1.7.0,\u003e=1.2.0, but you have cloudpickle 2.1.0 which is incompatible.\n" + ] + } + ], + "source": [ + "%%bash\n", + "sudo apt install -y protobuf-compiler\n", + "cd models/research/\n", + "protoc object_detection/protos/*.proto --python_out=.\n", + "cp object_detection/packages/tf2/setup.py .\n", + "python -m pip install ." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3yDNgIx-kV7X" + }, + "source": [ + "Now we can import the dependencies we will need later" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2JCeQU3fkayh" + }, + "outputs": [], + "source": [ + "from object_detection.utils import label_map_util\n", + "from object_detection.utils import visualization_utils as viz_utils\n", + "from object_detection.utils import ops as utils_ops\n", + "\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XRUr9Aiwuho7" + }, + "source": [ + "## Import pre-trained models from the Waste Identification project" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ZSLPDKwV7G9i", + "outputId": "c261518a-7667-472d-e22d-9328160cefa3" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2022-08-10 
22:46:06-- https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/material_model.zip\n", + "Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.2.128, 2607:f8b0:4023:c0d::80\n", + "Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.2.128|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 521320844 (497M) [application/zip]\n", + "Saving to: ‘material_model.zip’\n", + "\n", + "material_model.zip 100%[===================\u003e] 497.17M 131MB/s in 3.8s \n", + "\n", + "2022-08-10 22:46:10 (131 MB/s) - ‘material_model.zip’ saved [521320844/521320844]\n", + "\n", + "--2022-08-10 22:46:10-- https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/material_form_model.zip\n", + "Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.2.128, 2607:f8b0:4023:c0d::80\n", + "Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.2.128|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 523568744 (499M) [application/zip]\n", + "Saving to: ‘material_form_model.zip’\n", + "\n", + "material_form_model 100%[===================\u003e] 499.31M 130MB/s in 4.4s \n", + "\n", + "2022-08-10 22:46:15 (113 MB/s) - ‘material_form_model.zip’ saved [523568744/523568744]\n", + "\n", + "--2022-08-10 22:46:15-- https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/plastic_types_model.zip\n", + "Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.2.128, 2607:f8b0:4023:c0d::80\n", + "Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.2.128|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 521268394 (497M) [application/zip]\n", + "Saving to: ‘plastic_types_model.zip’\n", + "\n", + "plastic_types_model 100%[===================\u003e] 497.12M 159MB/s in 3.1s \n", + "\n", + "2022-08-10 22:46:18 (159 MB/s) - ‘plastic_types_model.zip’ saved [521268394/521268394]\n", + "\n" + ] + } + ], + "source": [ + "# download the pre-trained model weights from Google Cloud Storage\n", + "!wget https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/material_model.zip\n", + "!wget https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/material_form_model.zip\n", + "!wget https://storage.googleapis.com/tf_model_garden/vision/waste_identification_ml/plastic_types_model.zip" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "RkC_Pk197QlC", + "outputId": "8c1775bf-82c8-44bc-b92f-768b744802b9" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Archive: material_model.zip\n", + " creating: material/saved_model/\n", + " inflating: material/saved_model/params.yaml \n", + " creating: material/saved_model/saved_model/\n", + " inflating: material/saved_model/saved_model/saved_model.pb \n", + " creating: material/saved_model/saved_model/variables/\n", + " inflating: material/saved_model/saved_model/variables/variables.data-00000-of-00001 \n", + " inflating: material/saved_model/saved_model/variables/variables.index \n", + " creating: material/saved_model/checkpoint/\n", + " inflating: material/saved_model/checkpoint/ckpt-1.data-00000-of-00001 \n", + " inflating: material/saved_model/checkpoint/checkpoint \n", + " inflating: material/saved_model/checkpoint/ckpt-1.index \n", + " creating: material/tflite_model/\n", + " inflating: material/tflite_model/model.tflite \n", + "Archive: material_form_model.zip\n", + " creating: material_form/saved_model/\n", + " inflating: 
material_form/saved_model/params.yaml \n", + " creating: material_form/saved_model/saved_model/\n", + " inflating: material_form/saved_model/saved_model/saved_model.pb \n", + " creating: material_form/saved_model/saved_model/variables/\n", + " inflating: material_form/saved_model/saved_model/variables/variables.data-00000-of-00001 \n", + " inflating: material_form/saved_model/saved_model/variables/variables.index \n", + " creating: material_form/saved_model/checkpoint/\n", + " inflating: material_form/saved_model/checkpoint/ckpt-1.data-00000-of-00001 \n", + " inflating: material_form/saved_model/checkpoint/checkpoint \n", + " inflating: material_form/saved_model/checkpoint/ckpt-1.index \n", + " creating: material_form/tflite_model/\n", + " inflating: material_form/tflite_model/model.tflite \n", + "Archive: plastic_types_model.zip\n", + " creating: plastic_type/saved_model/\n", + " inflating: plastic_type/saved_model/params.yaml \n", + " creating: plastic_type/saved_model/saved_model/\n", + " inflating: plastic_type/saved_model/saved_model/saved_model.pb \n", + " creating: plastic_type/saved_model/saved_model/variables/\n", + " inflating: plastic_type/saved_model/saved_model/variables/variables.data-00000-of-00001 \n", + " inflating: plastic_type/saved_model/saved_model/variables/variables.index \n", + " creating: plastic_type/saved_model/checkpoint/\n", + " inflating: plastic_type/saved_model/checkpoint/ckpt-1.data-00000-of-00001 \n", + " inflating: plastic_type/saved_model/checkpoint/checkpoint \n", + " inflating: plastic_type/saved_model/checkpoint/ckpt-1.index \n", + " creating: plastic_type/tflite_model/\n", + " inflating: plastic_type/tflite_model/model.tflite \n" + ] + } + ], + "source": [ + "%%bash\n", + "# unzip the downloaded archives into separate folders\n", + "mkdir material material_form plastic_type\n", + "unzip material_model.zip -d material/\n", + "unzip material_form_model.zip -d material_form/\n", + "unzip plastic_types_model.zip -d plastic_type/" + ] + }, + { + "cell_type": 
"code", + "execution_count": null, + "metadata": { + "id": "Ey-8Ij2sKjkD" + }, + "outputs": [], + "source": [ + "ALL_MODELS = {\n", + "'material_model' : 'material/tflite_model/model.tflite',\n", + "'material_form_model' : 'material_form/tflite_model/model.tflite',\n", + "'plastic_model' : 'plastic_type/tflite_model/model.tflite'\n", + "}\n", + "\n", + "# path to an image\n", + "IMAGES_FOR_TEST = {\n", + " 'Image1' : 'models/official/projects/waste_identification_ml/pre_processing/config/sample_images/image_2.png'\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IogyryF2lFBL" + }, + "source": [ + "## Utilities\n", + "\n", + "Run the following cell to define some helper functions that will be needed later:\n", + "\n", + "- Helper methods to load an image and to build the segmentation model inputs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9XXfEdD9PMKn" + }, + "outputs": [], + "source": [ + "# Helper functions for loading and pre-processing images\n", + "\n", + "def load_image_into_numpy_array(path):\n", + " \"\"\"Load an image from file into a numpy array.\n", + "\n", + " Puts image into numpy array to feed into tensorflow graph.\n", + " Note that by convention we put it into a numpy array with shape\n", + " (1, height, width, channels), where channels=3 for RGB.\n", + "\n", + " Args:\n", + " path: the file path or URL of the image\n", + "\n", + " Returns:\n", + " uint8 numpy array with shape (1, h, w, 3)\n", + " \"\"\"\n", + " image = None\n", + " if(path.startswith('http')):\n", + " response = urlopen(path)\n", + " image_data = response.read()\n", + " image_data = BytesIO(image_data)\n", + " image = Image.open(image_data)\n", + " else:\n", + " image_data = tf.io.gfile.GFile(path, 'rb').read()\n", + " image = Image.open(BytesIO(image_data))\n", + "\n", + " (im_width, im_height) = image.size\n", + " return np.array(image.getdata()).reshape(\n", + " (1, im_height, im_width, 3)).astype(np.uint8)\n", + "\n", + "\n", + 
"def build_inputs_for_segmentation(image):\n", + " \"\"\"Builds 
segmentation model inputs for serving.\"\"\"\n", + " # Normalizes image with mean and std pixel values.\n", + " image = normalize_image(image)\n", + " return image" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6917xnUSlp9x" + }, + "source": [ + "## Select a pre-trained instance segmentation model\n", + "\n", + "Here we choose which instance segmentation model to use.\n", + "To try a different model later, change the selection in the next cell and re-run the cells that follow." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "HtwrSqvakTNn", + "outputId": "7f40f0ed-a0aa-45e3-8d43-e4f857e058d1" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Selected model:material_form_model\n", + "TFLite model path: material_form/tflite_model/model.tflite\n" + ] + } + ], + "source": [ + "# @title Model Selection { display-mode: \"form\", run: \"auto\" }\n", + "model_display_name = 'material_form_model' # @param ['material_model','material_form_model','plastic_model']\n", + "model_handle = ALL_MODELS[model_display_name]\n", + "\n", + "print('Selected model:'+ model_display_name)\n", + "print('TFLite model path: {}'.format(model_handle))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NKtD0IeclbL5" + }, + "source": [ + "### Load label map data (for plotting).\n", + "\n", + "Label maps map index numbers to category names, so that when our model predicts `7`, we know that this corresponds to `Tray`. 
Here we use the Object Detection API's utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would work.\n", + "\n", + "For simplicity, we load the label map files from the `models` repository that we cloned earlier for the Object Detection API code." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "3Kwqa0T1NTUf", + "outputId": "d8d12557-7308-4227-d33e-1b7514bac381" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Labels selected for material_form_model\n", + "\n", + "\n" + ] + }, + { + "data": { + "text/plain": [ + "{1: {'id': 1, 'name': 'Flexibles'},\n", + " 2: {'id': 2, 'name': 'Bottle'},\n", + " 3: {'id': 3, 'name': 'Jar'},\n", + " 4: {'id': 4, 'name': 'Carton'},\n", + " 5: {'id': 5, 'name': 'Sachets-\u0026-Pouch'},\n", + " 6: {'id': 6, 'name': 'Blister-pack'},\n", + " 7: {'id': 7, 'name': 'Tray'},\n", + " 8: {'id': 8, 'name': 'Tube'},\n", + " 9: {'id': 9, 'name': 'Can'},\n", + " 10: {'id': 10, 'name': 'Tub'},\n", + " 11: {'id': 11, 'name': 'Cosmetic'},\n", + " 12: {'id': 12, 'name': 'Box'},\n", + " 13: {'id': 13, 'name': 'Clothes'},\n", + " 14: {'id': 14, 'name': 'Bulb'},\n", + " 15: {'id': 15, 'name': 'Cup-\u0026-glass'},\n", + " 16: {'id': 16, 'name': 'Book-\u0026-magazine'},\n", + " 17: {'id': 17, 'name': 'Bag'},\n", + " 18: {'id': 18, 'name': 'Lid'},\n", + " 19: {'id': 19, 'name': 'Clamshell'},\n", + " 20: {'id': 20, 'name': 'Mirror'},\n", + " 21: {'id': 21, 'name': 'Tangler'},\n", + " 22: {'id': 22, 'name': 'Cutlery'},\n", + " 23: {'id': 23, 'name': 'Cassette-\u0026-tape'},\n", + " 24: {'id': 24, 'name': 'Electronic-devices'},\n", + " 25: {'id': 25, 'name': 'Battery'},\n", + " 26: {'id': 26, 'name': 'Pen-\u0026-pencil'},\n", + " 27: {'id': 27, 'name': 'Paper-products'},\n", + " 28: {'id': 28, 'name': 'Foot-wear'},\n", + " 29: {'id': 29, 'name': 'Scissor'},\n", + " 30: {'id': 30, 'name': 'Toys'},\n", + " 31: {'id': 
31, 'name': 'Brush'},\n", + " 32: {'id': 32, 'name': 'Pipe'},\n", + " 33: {'id': 33, 'name': 'Foil'},\n", + " 34: {'id': 34, 'name': 'Hangers'}}" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# @title Labels for the above model { display-mode: \"form\", run: \"auto\" }\n", + "\n", + "if model_display_name == 'material_model':\n", + " PATH_TO_LABELS = './models/official/projects/waste_identification_ml/pre_processing/config/data/material_labels.pbtxt'\n", + "elif model_display_name == 'material_form_model':\n", + " PATH_TO_LABELS = './models/official/projects/waste_identification_ml/pre_processing/config/data/material_form_labels.pbtxt'\n", + "elif model_display_name == 'plastic_model':\n", + " PATH_TO_LABELS = './models/official/projects/waste_identification_ml/pre_processing/config/data/plastic_type_labels.pbtxt'\n", + "\n", + "print('Labels selected for',model_display_name)\n", + "print('\\n')\n", + "category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)\n", + "category_index" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "muhUt-wWL582" + }, + "source": [ + "## Loading the selected model\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "rBuD07fLlcEO", + "outputId": "9f5392ec-91a0-42c4-a3cb-4976f19eb99e" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "loading model...\n", + "model loaded!\n" + ] + } + ], + "source": [ + "print('loading model...')\n", + "interpreter = tf.lite.Interpreter(model_path=model_handle)\n", + "runner = interpreter.get_signature_runner()\n", + "print('model loaded!')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "0s0Rne2xA4i1", + "outputId": 
"48083ea1-aea7-48f5-9dab-57ac4fa5ed83" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'serving_default': {'inputs': ['inputs'],\n", + " 'outputs': ['detection_boxes',\n", + " 'detection_classes',\n", + " 'detection_masks',\n", + " 'detection_scores',\n", + " 'image_info',\n", + " 'num_detections']}}\n" + ] + } + ], + "source": [ + "# get signature list\n", + "pprint(interpreter.get_signature_list())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GIawRDKPPnd4" + }, + "source": [ + "## Loading an image\n", + "\n", + "Let's try the model on a sample image.\n", + "\n", + "Here are some simple things to try out if you are curious:\n", + "* Try running inference on your own images: upload them to Colab and load them the same way as in the cell below.\n", + "* Modify the input image and see if detection still works. Some simple things to try here include flipping the image horizontally or converting it to grayscale (note that the model still expects the input image to have 3 channels).\n", + "\n", + "**Be careful:** the model expects 3-channel (RGB) images, so in an image with an alpha channel the alpha would count as a 4th channel.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 822 + }, + "id": "hX-AWUQ1wIEr", + "outputId": "f0ccad9b-e6fc-46fd-c975-8acdf261f482" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "min: 0 max: 255\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "\u003cFigure size 1728x2304 with 1 Axes\u003e" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "#@title Image Selection (don't forget to execute the cell!) 
{ display-mode: \"form\"}\n", + "selected_image = 'Image1' # @param ['Image1']\n", + "flip_image_horizontally = False #@param {type:\"boolean\"}\n", + "convert_image_to_grayscale = False #@param {type:\"boolean\"}\n", + "\n", + "image_path = IMAGES_FOR_TEST[selected_image]\n", + "image_np = load_image_into_numpy_array(image_path)\n", + "\n", + "# Flip horizontally\n", + "if flip_image_horizontally:\n", + " image_np[0] = np.fliplr(image_np[0]).copy()\n", + "\n", + "# Convert image to grayscale\n", + "if convert_image_to_grayscale:\n", + " image_np[0] = np.tile(\n", + " np.mean(image_np[0], 2, keepdims=True), (1, 1, 3)).astype(np.uint8)\n", + "\n", + "print('min:',np.min(image_np[0]), 'max:', np.max(image_np[0]))\n", + "plt.figure(figsize=(24,32))\n", + "plt.imshow(image_np[0])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dkkBAgGcX65P" + }, + "source": [ + "## Pre-processing an image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "97zIaKAhX-92", + "outputId": "9156b8c5-6ca4-4e69-fdd1-60f82b0ee4b2" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "height = 512, width = 1024\n" + ] + } + ], + "source": [ + "# get the input image size that the instance segmentation model expects\n", + "\n", + "# Get model details.\n", + "input_details = interpreter.get_input_details()\n", + "output_details = interpreter.get_output_details()\n", + "\n", + "# read height and width\n", + "height = input_details[0]['shape'][1]\n", + "width = input_details[0]['shape'][2]\n", + "\n", + "# verify input height and width\n", + "input_size = (height, width)\n", + "print('height = {}, width = {}'.format(height, width))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "-K0V6KWiYYpD", + "outputId": 
"f5788aa0-6a0d-48e1-a0a8-cf49738420ec" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "TensorShape([1, 512, 1024, 3])" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# apply the same pre-processing that was used when training the model\n", + "image_np_cp = cv2.resize(image_np[0], input_size[::-1], interpolation=cv2.INTER_AREA)\n", + "image_np = build_inputs_for_segmentation(image_np_cp)\n", + "image_np = tf.expand_dims(image_np, axis=0)\n", + "image_np.get_shape()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 738 + }, + "id": "ga1lccBpdxpd", + "outputId": "3268aa20-3cb3-4f4f-cdcc-c06b8867d016" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "\u003cFigure size 1728x2304 with 1 Axes\u003e" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# display pre-processed image\n", + "plt.figure(figsize=(24,32))\n", + "plt.imshow(image_np[0])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FTHsFjR6HNwb" + }, + "source": [ + "## Running inference\n", + "\n", + "To run inference, we just need to call the signature runner of the loaded TFLite model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Gb_siXKcnnGC", + "outputId": "1f521cea-3020-43f6-ae79-dec5b601fe1a" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "dict_keys(['detection_boxes', 'detection_classes', 'detection_masks', 'detection_scores', 'image_info', 'num_detections'])\n" + ] + } + ], + "source": [ + "# running inference\n", + "result = runner(inputs=image_np)\n", + "print(result.keys())" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IZ5VYaBoeeFM" + }, + "source": [ + "## Visualizing the results\n", + "\n", + "Here we use the TensorFlow Object Detection API to draw the bounding boxes and masks produced by the inference step.\n", + "\n", + "The full documentation of this method can be found [here](https://github.com/tensorflow/models/blob/master/research/object_detection/utils/visualization_utils.py).\n", + "\n", + "You can, for example, set `min_score_thresh` to other values (between 0 and 1): lower values let more detections through, while higher values filter more of them out." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "PMzURFjxxqF7" + }, + "outputs": [], + "source": [ + "# selecting parameters for visualization\n", + "label_id_offset = 0\n", + "min_score_thresh =0.9\n", + "use_normalized_coordinates=True\n", + "\n", + "if use_normalized_coordinates:\n", + " # Normalizing detection boxes\n", + " result['detection_boxes'][0][:,[0,2]] /= height\n", + " result['detection_boxes'][0][:,[1,3]] /= width" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 721 + }, + "id": "FILNrrDy0kUg", + "outputId": "95575a29-e498-4889-f2e0-e82fb918602d" + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "\u003cFigure size 1728x2304 with 1 Axes\u003e" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Visualize detection and masks\n", + "if 'detection_masks' in result:\n", + " # we need to convert np.arrays to tensors\n", + " detection_masks = tf.convert_to_tensor(result['detection_masks'][0])\n", + " detection_boxes = tf.convert_to_tensor(result['detection_boxes'][0])\n", + "\n", + " detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(\n", + " detection_masks, detection_boxes,\n", + " image_np.shape[1], image_np.shape[2])\n", + " detection_masks_reframed = tf.cast(detection_masks_reframed \u003e 0.5,\n", + " np.uint8)\n", + "\n", + " result['detection_masks_reframed'] = detection_masks_reframed.numpy()\n", + "viz_utils.visualize_boxes_and_labels_on_image_array(\n", + " image_np_cp,\n", + " result['detection_boxes'][0],\n", + " (result['detection_classes'][0] + label_id_offset).astype(int),\n", + " result['detection_scores'][0],\n", + " category_index=category_index,\n", + " use_normalized_coordinates=use_normalized_coordinates,\n", + " max_boxes_to_draw=200,\n", + " min_score_thresh=min_score_thresh,\n", + " 
agnostic_mode=False,\n", + " instance_masks=result.get('detection_masks_reframed', None),\n", + " line_thickness=2)\n", + "\n", + "plt.figure(figsize=(24,32))\n", + "plt.imshow(image_np_cp)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c75cSAeJ5JAQ" + }, + "source": [ + "## Visualizing the masks only" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 738 + }, + "id": "tt7RxYqhLpn9", + "outputId": "c1e79bc1-ed27-45de-d4c5-a5ea43117e98" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total number of objects found: 26\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "\u003cFigure size 1728x2304 with 1 Axes\u003e" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# combine all masks above the score threshold into a single binary mask\n", + "\n", + "mask_count = np.sum(result['detection_scores'][0] \u003e= min_score_thresh)\n", + "print('Total number of objects found:', mask_count)\n", + "mask = np.zeros_like(detection_masks_reframed[0])\n", + "for i in range(mask_count):\n", + " if result['detection_scores'][0][i] \u003e= min_score_thresh:\n", + " mask += detection_masks_reframed[i]\n", + "\n", + "mask = tf.clip_by_value(mask, 0,1)\n", + "plt.figure(figsize=(24,32))\n", + "plt.imshow(mask,cmap='gray')\n", + "plt.show()" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "tflite_model_inference.ipynb", + "provenance": [] + }, + "gpuClass": "standard", + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/official/projects/waste_identification_ml/pre_processing/bb_to_mask_to_coco.ipynb b/official/projects/waste_identification_ml/pre_processing/bb_to_mask_to_coco.ipynb new file mode 100644 index 
0000000000000000000000000000000000000000..84609e7e82559ddfe2133b5462f57c7a5f933cd6 --- /dev/null +++ b/official/projects/waste_identification_ml/pre_processing/bb_to_mask_to_coco.ipynb @@ -0,0 +1,938 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "bXCInrb_b96Y" + }, + "source": [ + "# Convert Bounding Boxes to Masks" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6kN0IgH7cWlq" + }, + "source": [ + "The goal is to find the mask of an object from its bounding box coordinates, and then use the mask and the image to create a COCO-format JSON file. This file is needed to build a dataset for training an instance segmentation model." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eXwFSgoLeoQL" + }, + "source": [ + "\n", + "To find the mask of an object inside an image, we use a state-of-the-art algorithm called Deep MAC. The inputs to the [Deep MAC](https://arxiv.org/abs/2104.00613) algorithm are an image and the normalized bounding box coordinates of the object; its output is a mask. We use openly available Deep MAC weights, pre-trained with a SpineNet backbone, to detect the masks. The original Deep MAC inference script can be [found here](https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/deepmac_colab.ipynb); we modified it for our project's needs." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eCPPN3JOeszb" + }, + "source": [ + "\n", + "The output mask and its corresponding image will then be used to create a COCO-format JSON annotation file with an open-source library called Imantics." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RHjzxxIUfSgg" + }, + "source": [ + "## Import libraries & clone the TF models directory" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "LX6NoABvFht4", + "outputId": "97eafac3-96f2-45c7-dd6d-05828bee1b35" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", + "Collecting tf-models-official\n", + " Downloading tf_models_official-2.9.2-py2.py3-none-any.whl (2.1 MB)\n", + "\u001b[K |████████████████████████████████| 2.1 MB 5.1 MB/s \n", + "\u001b[?25hRequirement already satisfied: gin-config in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (0.5.0)\n", + "Collecting py-cpuinfo>=3.3.0\n", + " Downloading py-cpuinfo-8.0.0.tar.gz (99 kB)\n", + "\u001b[K |████████████████████████████████| 99 kB 8.6 MB/s \n", + "\u001b[?25hCollecting seqeval\n", + " Downloading seqeval-1.2.2.tar.gz (43 kB)\n", + "\u001b[K |████████████████████████████████| 43 kB 1.7 MB/s \n", + "\u001b[?25hRequirement already satisfied: Pillow in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (7.1.2)\n", + "Requirement already satisfied: tensorflow-hub>=0.6.0 in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (0.12.0)\n", + "Requirement already satisfied: pycocotools in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (2.0.4)\n", + "Requirement already satisfied: scipy>=0.19.1 in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (1.7.3)\n", + "Requirement already satisfied: tensorflow-datasets in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (4.6.0)\n", + "Requirement already satisfied: oauth2client in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (4.1.3)\n", + "Collecting 
tensorflow-text~=2.9.0\n", + " Downloading tensorflow_text-2.9.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.6 MB)\n", + "\u001b[K |████████████████████████████████| 4.6 MB 56.7 MB/s \n", + "\u001b[?25hCollecting tensorflow-addons\n", + " Downloading tensorflow_addons-0.17.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)\n", + "\u001b[K |████████████████████████████████| 1.1 MB 41.8 MB/s \n", + "\u001b[?25hRequirement already satisfied: Cython in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (0.29.32)\n", + "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (3.2.2)\n", + "Collecting tensorflow-model-optimization>=0.4.1\n", + " Downloading tensorflow_model_optimization-0.7.3-py2.py3-none-any.whl (238 kB)\n", + "\u001b[K |████████████████████████████████| 238 kB 8.8 MB/s \n", + "\u001b[?25hRequirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (1.15.0)\n", + "Requirement already satisfied: kaggle>=1.3.9 in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (1.5.12)\n", + "Requirement already satisfied: google-api-python-client>=1.6.7 in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (1.12.11)\n", + "Requirement already satisfied: numpy>=1.20 in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (1.21.6)\n", + "Collecting tensorflow~=2.9.0\n", + " Downloading tensorflow-2.9.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (511.7 MB)\n", + "\u001b[K |████████████████████████████████| 511.7 MB 6.1 kB/s \n", + "\u001b[?25hCollecting pyyaml<6.0,>=5.1\n", + " Downloading PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636 kB)\n", + "\u001b[K |████████████████████████████████| 636 kB 69.9 MB/s \n", + "\u001b[?25hRequirement already satisfied: pandas>=0.22.0 in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (1.3.5)\n", + "Collecting 
tf-slim>=1.1.0\n", + " Downloading tf_slim-1.1.0-py2.py3-none-any.whl (352 kB)\n", + "\u001b[K |████████████████████████████████| 352 kB 60.2 MB/s \n", + "\u001b[?25hRequirement already satisfied: opencv-python-headless in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (4.6.0.66)\n", + "Collecting sacrebleu\n", + " Downloading sacrebleu-2.2.0-py3-none-any.whl (116 kB)\n", + "\u001b[K |████████████████████████████████| 116 kB 46.9 MB/s \n", + "\u001b[?25hRequirement already satisfied: psutil>=5.4.3 in /usr/local/lib/python3.7/dist-packages (from tf-models-official) (5.4.8)\n", + "Collecting sentencepiece\n", + " Downloading sentencepiece-0.1.97-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n", + "\u001b[K |████████████████████████████████| 1.3 MB 20.9 MB/s \n", + "\u001b[?25hRequirement already satisfied: httplib2<1dev,>=0.15.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client>=1.6.7->tf-models-official) (0.17.4)\n", + "Requirement already satisfied: google-auth<3dev,>=1.16.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client>=1.6.7->tf-models-official) (1.35.0)\n", + "Requirement already satisfied: google-api-core<3dev,>=1.21.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client>=1.6.7->tf-models-official) (1.31.6)\n", + "Requirement already satisfied: uritemplate<4dev,>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client>=1.6.7->tf-models-official) (3.0.1)\n", + "Requirement already satisfied: google-auth-httplib2>=0.0.3 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client>=1.6.7->tf-models-official) (0.0.4)\n", + "Requirement already satisfied: protobuf<4.0.0dev,>=3.12.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official) (3.17.3)\n", + "Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /usr/local/lib/python3.7/dist-packages 
(from google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official) (2.23.0)\n", + "Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official) (1.56.4)\n", + "Requirement already satisfied: packaging>=14.3 in /usr/local/lib/python3.7/dist-packages (from google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official) (21.3)\n", + "Requirement already satisfied: setuptools>=40.3.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official) (57.4.0)\n", + "Requirement already satisfied: pytz in /usr/local/lib/python3.7/dist-packages (from google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official) (2022.1)\n", + "Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from google-auth<3dev,>=1.16.0->google-api-python-client>=1.6.7->tf-models-official) (4.2.4)\n", + "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.7/dist-packages (from google-auth<3dev,>=1.16.0->google-api-python-client>=1.6.7->tf-models-official) (0.2.8)\n", + "Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.7/dist-packages (from google-auth<3dev,>=1.16.0->google-api-python-client>=1.6.7->tf-models-official) (4.9)\n", + "Requirement already satisfied: python-slugify in /usr/local/lib/python3.7/dist-packages (from kaggle>=1.3.9->tf-models-official) (6.1.2)\n", + "Requirement already satisfied: python-dateutil in /usr/local/lib/python3.7/dist-packages (from kaggle>=1.3.9->tf-models-official) (2.8.2)\n", + "Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/dist-packages (from kaggle>=1.3.9->tf-models-official) (1.24.3)\n", + "Requirement already satisfied: certifi in /usr/local/lib/python3.7/dist-packages (from 
kaggle>=1.3.9->tf-models-official) (2022.6.15)\n", + "Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from kaggle>=1.3.9->tf-models-official) (4.64.0)\n", + "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging>=14.3->google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official) (3.0.9)\n", + "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.7/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3dev,>=1.16.0->google-api-python-client>=1.6.7->tf-models-official) (0.4.8)\n", + "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official) (3.0.4)\n", + "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<3dev,>=1.21.0->google-api-python-client>=1.6.7->tf-models-official) (2.10)\n", + "Requirement already satisfied: grpcio<2.0,>=1.24.3 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official) (1.47.0)\n", + "Collecting flatbuffers<2,>=1.12\n", + " Downloading flatbuffers-1.12-py2.py3-none-any.whl (15 kB)\n", + "Requirement already satisfied: wrapt>=1.11.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official) (1.14.1)\n", + "Collecting gast<=0.4.0,>=0.2.1\n", + " Downloading gast-0.4.0-py3-none-any.whl (9.8 kB)\n", + "Requirement already satisfied: typing-extensions>=3.6.6 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official) (4.1.1)\n", + "Collecting tensorboard<2.10,>=2.9\n", + " Downloading tensorboard-2.9.1-py3-none-any.whl (5.8 MB)\n", + "\u001b[K |████████████████████████████████| 5.8 MB 42.9 MB/s \n", + "\u001b[?25hRequirement already satisfied: h5py>=2.9.0 in /usr/local/lib/python3.7/dist-packages 
(from tensorflow~=2.9.0->tf-models-official) (3.1.0)\n", + "Collecting tensorflow-estimator<2.10.0,>=2.9.0rc0\n", + " Downloading tensorflow_estimator-2.9.0-py2.py3-none-any.whl (438 kB)\n", + "\u001b[K |████████████████████████████████| 438 kB 56.1 MB/s \n", + "\u001b[?25hRequirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official) (0.26.0)\n", + "Requirement already satisfied: google-pasta>=0.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official) (0.2.0)\n", + "Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official) (3.3.0)\n", + "Requirement already satisfied: keras-preprocessing>=1.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official) (1.1.2)\n", + "Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official) (1.1.0)\n", + "Requirement already satisfied: absl-py>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official) (1.2.0)\n", + "Collecting keras<2.10.0,>=2.9.0rc0\n", + " Downloading keras-2.9.0-py2.py3-none-any.whl (1.6 MB)\n", + "\u001b[K |████████████████████████████████| 1.6 MB 6.0 MB/s \n", + "\u001b[?25hRequirement already satisfied: astunparse>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official) (1.6.3)\n", + "Requirement already satisfied: libclang>=13.0.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow~=2.9.0->tf-models-official) (14.0.6)\n", + "Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/local/lib/python3.7/dist-packages (from astunparse>=1.6.0->tensorflow~=2.9.0->tf-models-official) (0.37.1)\n", + "Requirement already satisfied: cached-property in /usr/local/lib/python3.7/dist-packages (from 
h5py>=2.9.0->tensorflow~=2.9.0->tf-models-official) (1.5.2)\n", + "Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official) (1.0.1)\n", + "Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official) (1.8.1)\n", + "Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.7/dist-packages (from tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official) (3.4.1)\n", + "Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official) (0.4.6)\n", + "Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official) (0.6.1)\n", + "Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.7/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official) (1.3.1)\n", + "Requirement already satisfied: importlib-metadata>=4.4 in /usr/local/lib/python3.7/dist-packages (from markdown>=2.6.8->tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official) (4.12.0)\n", + "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official) (3.8.1)\n", + "Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.10,>=2.9->tensorflow~=2.9.0->tf-models-official) (3.2.0)\n", + "Requirement already satisfied: dm-tree~=0.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow-model-optimization>=0.4.1->tf-models-official) 
(0.1.7)\n", + "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->tf-models-official) (0.11.0)\n", + "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->tf-models-official) (1.4.4)\n", + "Requirement already satisfied: text-unidecode>=1.3 in /usr/local/lib/python3.7/dist-packages (from python-slugify->kaggle>=1.3.9->tf-models-official) (1.3)\n", + "Collecting colorama\n", + " Downloading colorama-0.4.5-py2.py3-none-any.whl (16 kB)\n", + "Requirement already satisfied: tabulate>=0.8.9 in /usr/local/lib/python3.7/dist-packages (from sacrebleu->tf-models-official) (0.8.10)\n", + "Collecting portalocker\n", + " Downloading portalocker-2.5.1-py2.py3-none-any.whl (15 kB)\n", + "Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from sacrebleu->tf-models-official) (2022.6.2)\n", + "Requirement already satisfied: lxml in /usr/local/lib/python3.7/dist-packages (from sacrebleu->tf-models-official) (4.9.1)\n", + "Requirement already satisfied: scikit-learn>=0.21.3 in /usr/local/lib/python3.7/dist-packages (from seqeval->tf-models-official) (1.0.2)\n", + "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.21.3->seqeval->tf-models-official) (3.1.0)\n", + "Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.21.3->seqeval->tf-models-official) (1.1.0)\n", + "Requirement already satisfied: typeguard>=2.7 in /usr/local/lib/python3.7/dist-packages (from tensorflow-addons->tf-models-official) (2.7.1)\n", + "Requirement already satisfied: promise in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets->tf-models-official) (2.3)\n", + "Requirement already satisfied: tensorflow-metadata in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets->tf-models-official) (1.9.0)\n", + "Requirement already satisfied: 
toml in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets->tf-models-official) (0.10.2)\n", + "Requirement already satisfied: importlib-resources in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets->tf-models-official) (5.9.0)\n", + "Requirement already satisfied: etils[epath] in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets->tf-models-official) (0.6.0)\n", + "Requirement already satisfied: dill in /usr/local/lib/python3.7/dist-packages (from tensorflow-datasets->tf-models-official) (0.3.5.1)\n", + "Building wheels for collected packages: py-cpuinfo, seqeval\n", + " Building wheel for py-cpuinfo (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Created wheel for py-cpuinfo: filename=py_cpuinfo-8.0.0-py3-none-any.whl size=22257 sha256=9abdaf2ab73c28e554cac77b22ceb8dbb7e5da8afbd765dd97cea1b1fb9279d9\n", + " Stored in directory: /root/.cache/pip/wheels/d2/f1/1f/041add21dc9c4220157f1bd2bd6afe1f1a49524c3396b94401\n", + " Building wheel for seqeval (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", + " Created wheel for seqeval: filename=seqeval-1.2.2-py3-none-any.whl size=16180 sha256=26e42d24a45f20224fdfd75829909e94dba226915eccc43a309861a340461060\n", + " Stored in directory: /root/.cache/pip/wheels/05/96/ee/7cac4e74f3b19e3158dce26a20a1c86b3533c43ec72a549fd7\n", + "Successfully built py-cpuinfo seqeval\n", + "Installing collected packages: tensorflow-estimator, tensorboard, keras, gast, flatbuffers, tensorflow, portalocker, colorama, tf-slim, tensorflow-text, tensorflow-model-optimization, tensorflow-addons, seqeval, sentencepiece, sacrebleu, pyyaml, py-cpuinfo, tf-models-official\n", + " Attempting uninstall: tensorflow-estimator\n", + " Found existing installation: tensorflow-estimator 2.8.0\n", + " Uninstalling tensorflow-estimator-2.8.0:\n", + " Successfully uninstalled tensorflow-estimator-2.8.0\n", + " Attempting uninstall: tensorboard\n", + " Found existing installation: tensorboard 2.8.0\n", + " Uninstalling tensorboard-2.8.0:\n", + " Successfully uninstalled tensorboard-2.8.0\n", + " Attempting uninstall: keras\n", + " Found existing installation: keras 2.8.0\n", + " Uninstalling keras-2.8.0:\n", + " Successfully uninstalled keras-2.8.0\n", + " Attempting uninstall: gast\n", + " Found existing installation: gast 0.5.3\n", + " Uninstalling gast-0.5.3:\n", + " Successfully uninstalled gast-0.5.3\n", + " Attempting uninstall: flatbuffers\n", + " Found existing installation: flatbuffers 2.0\n", + " Uninstalling flatbuffers-2.0:\n", + " Successfully uninstalled flatbuffers-2.0\n", + " Attempting uninstall: tensorflow\n", + " Found existing installation: tensorflow 2.8.2+zzzcolab20220719082949\n", + " Uninstalling tensorflow-2.8.2+zzzcolab20220719082949:\n", + " Successfully uninstalled tensorflow-2.8.2+zzzcolab20220719082949\n", + " Attempting uninstall: pyyaml\n", + " Found existing installation: PyYAML 3.13\n", + " Uninstalling PyYAML-3.13:\n", + " Successfully uninstalled PyYAML-3.13\n", + "Successfully installed 
colorama-0.4.5 flatbuffers-1.12 gast-0.4.0 keras-2.9.0 portalocker-2.5.1 py-cpuinfo-8.0.0 pyyaml-5.4.1 sacrebleu-2.2.0 sentencepiece-0.1.97 seqeval-1.2.2 tensorboard-2.9.1 tensorflow-2.9.1 tensorflow-addons-0.17.1 tensorflow-estimator-2.9.0 tensorflow-model-optimization-0.7.3 tensorflow-text-2.9.0 tf-models-official-2.9.2 tf-slim-1.1.0\n", + "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", + "Collecting imantics\n", + " Downloading imantics-0.1.12.tar.gz (13 kB)\n", + "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from imantics) (1.21.6)\n", + "Requirement already satisfied: opencv-python>=3 in /usr/local/lib/python3.7/dist-packages (from imantics) (4.6.0.66)\n", + "Requirement already satisfied: lxml in /usr/local/lib/python3.7/dist-packages (from imantics) (4.9.1)\n", + "Collecting xmljson\n", + " Downloading xmljson-0.2.1-py2.py3-none-any.whl (10 kB)\n", + "Building wheels for collected packages: imantics\n", + " Building wheel for imantics (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", + " Created wheel for imantics: filename=imantics-0.1.12-py3-none-any.whl size=16033 sha256=fe346841a0a21db0c7d8936ed1098c7d1dfd74596a23facca44fdfa9a88c263f\n", + " Stored in directory: /root/.cache/pip/wheels/da/7c/3e/296fe3ed4eb3bd713e91dee0d0549f12f316d49939a64bdc96\n", + "Successfully built imantics\n", + "Installing collected packages: xmljson, imantics\n", + "Successfully installed imantics-0.1.12 xmljson-0.2.1\n" + ] + } + ], + "source": [ + "# install additional libraries\n", + "!pip install tf-models-official\n", + "!pip3 install imantics" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "CXfMBkXvHyjg" + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import logging\n", + "logging.disable(logging.WARNING)\n", + "\n", + "from matplotlib import pyplot as plt\n", + "from matplotlib import patches\n", + "from PIL import Image\n", + "import numpy as np\n", + "import random\n", + "from skimage import color\n", + "from skimage.color import rgb_colors\n", + "from skimage import transform\n", + "from skimage import util\n", + "import tensorflow as tf\n", + "import warnings\n", + "from imantics import Mask, Category, Image as imantics_Image\n", + "import json\n", + "tf.compat.v1.enable_eager_execution()\n", + "\n", + "\n", + "COLORS = ([rgb_colors.cyan, rgb_colors.orange, rgb_colors.pink,\n", + " rgb_colors.purple, rgb_colors.limegreen , rgb_colors.crimson] + \n", + " [(color) for (name, color) in color.color_dict.items()])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dflD6h1vWW4G" + }, + "source": [ + "## Visualization functions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "dOCWTzvSFnk8" + }, + "outputs": [], + "source": [ + "def reframe_box_masks_to_image_masks(box_masks, boxes, image_height,\n", + " image_width, resize_method='bilinear'):\n", + " \"\"\"Transforms the box masks back to full image masks.\n", + 
"\n", + " Embeds masks in bounding boxes of larger masks whose shapes correspond to\n", + " image shape.\n", + "\n", + " Args:\n", + " box_masks: A tensor of size [num_masks, mask_height, mask_width].\n", + " boxes: A tf.float32 tensor of size [num_masks, 4] containing the box\n", + " corners. Row i contains [ymin, xmin, ymax, xmax] of the box\n", + " corresponding to mask i. Note that the box corners are in\n", + " normalized coordinates.\n", + " image_height: Image height. The output mask will have the same height as\n", + " the image height.\n", + " image_width: Image width. The output mask will have the same width as the\n", + " image width.\n", + " resize_method: The resize method, either 'bilinear' or 'nearest'. Note that\n", + " 'bilinear' is only respected if box_masks is a float.\n", + "\n", + " Returns:\n", + " A tensor of size [num_masks, image_height, image_width] with the same dtype\n", + " as `box_masks`.\n", + " \"\"\"\n", + " resize_method = 'nearest' if box_masks.dtype == tf.uint8 else resize_method\n", + " def reframe_box_masks_to_image_masks_default():\n", + " \"\"\"The default function when there are more than 0 box masks.\"\"\"\n", + "\n", + " num_boxes = tf.shape(box_masks)[0]\n", + " box_masks_expanded = tf.expand_dims(box_masks, axis=3)\n", + "\n", + " resized_crops = tf.image.crop_and_resize(\n", + " image=box_masks_expanded,\n", + " boxes=reframe_image_corners_relative_to_boxes(boxes),\n", + " box_indices=tf.range(num_boxes),\n", + " crop_size=[image_height, image_width],\n", + " method=resize_method,\n", + " extrapolation_value=0)\n", + " return tf.cast(resized_crops, box_masks.dtype)\n", + "\n", + " image_masks = tf.cond(\n", + " tf.shape(box_masks)[0] > 0,\n", + " reframe_box_masks_to_image_masks_default,\n", + " lambda: tf.zeros([0, image_height, image_width, 1], box_masks.dtype))\n", + " return tf.squeeze(image_masks, axis=3)\n", + "\n", + "def reframe_image_corners_relative_to_boxes(boxes):\n", + " \"\"\"Reframe the image corners 
([0, 0, 1, 1]) to be relative to boxes.\n", + "\n", + " The local coordinate frame of each box is assumed to be relative to\n", + " its own four corners.\n", + "\n", + " Args:\n", + " boxes: A float tensor of [num_boxes, 4] of (ymin, xmin, ymax, xmax)\n", + " coordinates in relative coordinate space of each bounding box.\n", + "\n", + " Returns:\n", + " reframed_boxes: Reframed boxes with the same shape as the input.\n", + " \"\"\"\n", + " ymin, xmin, ymax, xmax = (boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3])\n", + "\n", + " height = tf.maximum(ymax - ymin, 1e-4)\n", + " width = tf.maximum(xmax - xmin, 1e-4)\n", + "\n", + " ymin_out = (0 - ymin) / height\n", + " xmin_out = (0 - xmin) / width\n", + " ymax_out = (1 - ymin) / height\n", + " xmax_out = (1 - xmin) / width\n", + " return tf.stack([ymin_out, xmin_out, ymax_out, xmax_out], axis=1)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "G-gUJ2qffiiH" + }, + "source": [ + "## Utility functions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-cWctY5cyUKC" + }, + "outputs": [], + "source": [ + "def read_image(path):\n", + " \"\"\"Read an image file into a uint8 NumPy array.\"\"\"\n", + " with tf.io.gfile.GFile(path, 'rb') as f:\n", + " img = Image.open(f)\n", + " return np.array(img, dtype=np.uint8)\n", + "\n", + "def resize_for_display(image, max_height=600):\n", + " height, width, _ = image.shape\n", + " width = int(width * max_height / height)\n", + " with warnings.catch_warnings():\n", + " warnings.simplefilter(\"ignore\", UserWarning)\n", + " return util.img_as_ubyte(transform.resize(image, (max_height, width)))\n", + "\n", + "\n", + "def get_mask_prediction_function(model):\n", + " \"\"\"Get a single-image mask prediction function for a model.\"\"\"\n", + "\n", + " detection_fn = model.signatures['serving_default']\n", + "\n", + "\n", + " @tf.function\n", + " def predict_masks(image, boxes):\n", + " height, width, _ = image.shape.as_list()\n", 
+ " batch = image[tf.newaxis]\n", + " boxes = boxes[tf.newaxis]\n", + " detections = detection_fn(images=batch, boxes=boxes)\n", + " masks = detections['detection_masks']\n", + " return reframe_box_masks_to_image_masks(masks[0], boxes[0],\n", + " height, width)\n", + " \n", + " return predict_masks\n", + "\n", + "\n", + "def display(im):\n", + " plt.figure(figsize=(16, 12))\n", + " plt.imshow(im)\n", + " plt.show()\n", + "\n", + "def plot_image_annotations(image, boxes, masks=None, darken_image=0.7):\n", + " fig, ax = plt.subplots(figsize=(16, 12))\n", + " ax.set_axis_on()\n", + " image = (image * darken_image).astype(np.uint8)\n", + " ax.imshow(image)\n", + "\n", + " height, width, _ = image.shape\n", + "\n", + " num_colors = len(COLORS)\n", + " color_index = 0\n", + " boxes = boxes[:20]\n", + " \n", + " masks_list = masks if masks is not None else [None] * len(boxes)\n", + " for box, mask in zip(boxes, masks_list):\n", + " ymin, xmin, ymax, xmax = box\n", + " ymin *= height\n", + " ymax *= height\n", + " xmin *= width\n", + " xmax *= width\n", + "\n", + " color = COLORS[color_index]\n", + " color = np.array(color)\n", + " rect = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,\n", + " linewidth=2.5, edgecolor=color, facecolor='none')\n", + " ax.add_patch(rect)\n", + "\n", + " if masks is not None:\n", + " mask = (mask > 0.5).astype(np.float32)\n", + " color_image = np.ones_like(image) * color[np.newaxis, np.newaxis, :]\n", + " color_and_mask = np.concatenate(\n", + " [color_image, mask[:, :, np.newaxis]], axis=2)\n", + "\n", + " ax.imshow(color_and_mask, alpha=0.5)\n", + "\n", + " color_index = (color_index + 1) % num_colors\n", + "\n", + " return ax" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1Dn44FimfmId" + }, + "source": [ + "## Import pre-trained Deep MAC weights" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "A4jhuP2zADfS", + 
"outputId": "ae74d5c5-54b8-48af-ea58-197fef886557" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "--2022-08-11 23:49:21-- https://storage.googleapis.com/tf_model_garden/vision/deepmac_maskrcnn/deepmarc_spinenet.zip\n", + "Resolving storage.googleapis.com (storage.googleapis.com)... 173.194.213.128, 173.194.214.128, 173.194.215.128, ...\n", + "Connecting to storage.googleapis.com (storage.googleapis.com)|173.194.213.128|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 314902579 (300M) [application/zip]\n", + "Saving to: ‘deepmarc_spinenet.zip’\n", + "\n", + "deepmarc_spinenet.z 100%[===================>] 300.31M 142MB/s in 2.1s \n", + "\n", + "2022-08-11 23:49:24 (142 MB/s) - ‘deepmarc_spinenet.zip’ saved [314902579/314902579]\n", + "\n", + "Archive: deepmarc_spinenet.zip\n", + " creating: deepmarc_spinenet/\n", + " creating: deepmarc_spinenet/variables/\n", + " inflating: deepmarc_spinenet/variables/variables.data-00000-of-00001 \n", + " inflating: deepmarc_spinenet/variables/variables.index \n", + " creating: deepmarc_spinenet/assets/\n", + " inflating: deepmarc_spinenet/saved_model.pb \n" + ] + } + ], + "source": [ + "!wget https://storage.googleapis.com/tf_model_garden/vision/deepmac_maskrcnn/deepmarc_spinenet.zip\n", + "!unzip deepmarc_spinenet.zip" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PLLKy18bfzEW" + }, + "source": [ + "## Load the model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "3RGiRcAc7LDU" + }, + "outputs": [], + "source": [ + "MODEL = '/content/deepmarc_spinenet/'\n", + "model = tf.saved_model.load(MODEL)\n", + "prediction_function = get_mask_prediction_function(model)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "l8suwOvRf5P5" + }, + "source": [ + "## MUST CHANGE - Modify the path of an image according to your convenience" + ] + }, + { + "cell_type": "code", + "execution_count": null, 
+ "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "FP8ETRBUFupE", + "outputId": "4c529603-ea75-480f-a119-90443cd665aa" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " % Total % Received % Xferd Average Speed Time Time Time Current\n", + " Dload Upload Total Spent Left Speed\n", + "100 1235k 100 1235k 0 0 3687k 0 --:--:-- --:--:-- --:--:-- 3676k\n" + ] + } + ], + "source": [ + "# download a sample image\n", + "!curl -O https://raw.githubusercontent.com/tensorflow/models/master/official/projects/waste_identification_ml/pre_processing/config/sample_images/image_3.jpg " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "QlzhbuxxysPy" + }, + "outputs": [], + "source": [ + "# path to the image\n", + "IMAGE_PATH = 'image_3.jpg' #@param {type:\"string\"}\n", + "\n", + "# bounding box in pixel coordinates, in [ymin, xmin, ymax, xmax] order\n", + "BB_CORD = [175.0, 815.06625, 948.0, 1630.125] #@param {type:\"raw\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "cL-IU_LN0ycU", + "outputId": "5595008a-833d-4d72-f534-6dbfedac4806" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "2048 2592\n", + "0.08544921875 0.3144545717592592 0.462890625 0.62890625\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[0.08544922, 0.31445457, 0.46289062, 0.62890625]])" + ] + }, + "metadata": {}, + "execution_count": 9 + } + ], + "source": [ + "# get the height and width of the image\n", + "im = read_image(IMAGE_PATH)\n", + "height, width, _ = im.shape\n", + "print(height, width)\n", + "\n", + "# convert the bounding box from pixel to normalized [0, 1] coordinates\n", + "YMIN, XMIN, YMAX, XMAX = BB_CORD[0], BB_CORD[1], BB_CORD[2], BB_CORD[3]\n", + "YMIN_NOR, XMIN_NOR, YMAX_NOR, XMAX_NOR = YMIN/height, XMIN/width, YMAX/height, 
XMAX/width\n", + "print(YMIN_NOR, XMIN_NOR, YMAX_NOR, XMAX_NOR)\n", + "\n", + "# reshape the coordinates\n", + "boxes = np.array([YMIN_NOR, XMIN_NOR, YMAX_NOR, XMAX_NOR]).reshape(1,4)\n", + "boxes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 723 + }, + "id": "wo1ogZ_r5w1-", + "outputId": "cab79b62-5493-4fae-f065-6eed106ba0a7" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": {}, + "execution_count": 10 + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": { + "needs_background": "light" + } + } + ], + "source": [ + "%matplotlib inline\n", + "# display bounding box over an image\n", + "plot_image_annotations(im, boxes)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lzhbOoyWgJSO" + }, + "source": [ + "## Doing the inference and showing the results" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "m6Yyhh6irf1n" + }, + "outputs": [], + "source": [ + "masks = prediction_function(tf.convert_to_tensor(im),\n", + " tf.convert_to_tensor(boxes, dtype=tf.float32))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 704 + }, + "id": "Uq6vbiRirlLB", + "outputId": "e3074a1a-37d2-41e7-8b1a-a222a5e28646" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": { + "needs_background": "light" + } + } + ], + "source": [ + "plot_image_annotations(im, boxes, masks.numpy())\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WcCAReeIgPBt" + }, + "source": [ + "## Get the mask" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 704 + }, + "id": "nTkvNK6xrzm5", + "outputId": "ca31e5c4-1b7a-44be-d537-7fcfafe5b867" + }, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "iVBORw0KGgoAAAANSUhEUgAAA2gAAAKvCAYAAAAfjOyFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdZ5ieZYE24OuemSRDChCKGBKQEBI6UkJXLFgAEewLiwUsEYFVwL7q4q64ioiNFRFdRRHFwiqgiCK6oksLoERqCCRAQq9pJJnyfD8Y/AJSQjKZ55l5z/M45sg799uu+ZOZ673LU6qqCgAAAPVrqzsAAAAAj1HQAAAAGkJBAwAAaAgFDQAAoCEUNAAAgIZQ0AAAABpiwAtaKWWfUspNpZRZpZSPDfT7AwAANFUZyOuglVLak8xM8sokc5NMT3JwVVXXD1gIAACAhhroGbRdksyqqurWqqqWJTkryYEDnAEAAKCROgb4/cYnuWO57+cm2fWZnjC8jKg6M2q1hgIAABgoS7Ioy6ql5anuG+iCtkJKKdOSTEuSzozMrmXvmhMBAAD0j8uri572voFe4jgvyUbLfT+hb+wJqqo6raqqqVVVTR2WEQMWDgAAoE4DXdCmJ5lcSplYShme5KAk5w5wBgAAgEYa0CWOVVV1l1KOSvKbJO1JvlNV1XUDmQEAAKCpBnwPWlVV5yc5f6DfFwAAoOkG/ELVAAAAPDUFDQAAoCEUNAAAgIZQ0AAAABpCQQMAAGgIBQ0AAKAhFDQAAICGUNAAAAAaQkEDAABoCAUNAACgIRQ0AACAhlDQAAAAGkJBAwAAaAgFDQAAoCEUNAAAgIZQ0AAAABpCQQMAAGgIBQ0AAKAhFDQAAICGUNAAAAAaQkEDAABoCAUNAACgIRQ0AACAhlDQAAAAGkJBAwAAaAgFDQAAoCEUNAAAgIZQ0AAAABpCQQMAAGgIBQ0AAKAhFDQAAICGUNAAAAAaQkEDAABoCAUNAACgIRQ0AACAhlDQAAAAGkJBAwAAaAgFDQAAoCEUNAAAgIZQ0AAAABpCQQMAAGgIBQ0AAKAhFDQAAICGUNAAAAAaQkEDAABoCAUNAACgIRQ0AACAhlDQAAAAGkJBAwAAaAgFDQAAoCEUNAAAgIZQ0AAAABpCQQMAAGiIjroDAEArKMOGp22TCUkpjw1UVXpn356qu7veYAA0ioIGAAPgriOn5odHn5TO0pMkWVR1ZNonjs5aZ15WczIAmkRBA4DVqZSUqdvkre/+TbYevsYT7lr6lodTftxhFg2Av7MHDQD6WynpGPf8PHLIbpn9w+3y+Z98Ox9e55Z/eNg5O3w78984NWlrryEkAE1UqqqqO8MzWrOsU+1a9q47BgA8s1LSu9f2mbNfZ9o2WZTP7HBODhx1f0aUYc/4tBuWLc7+fz4yW3zi/nTfdscAhQWgTpdXF2V+9WB5qvsUNADoBwsO2i1nnPDFTBo2eqWef8DN+6Tnn6p0331PPycDoGmeqaBZ4ggAq2q37XLwp3690uUsSX662Xm5/1Wb9mMoAAYjh4QAwEpoX2/dZP11MueN6+ff3vajHDTmoVV6vRFlWO7doydrf7+fAgIwKCloAPAcVHtun5sPHZZj9/htXj36+kzqWCPtpX8WpBy6+59z2Zh107tgQb+8HgCDj4IGACuilCzdb2p2+o+r8ttxV/cNjurXt3jfOtNzydbvSy6b0a+vC8DgoaABwLMoI0bklv/YMWf/05ez3fDO1fY+67atkTmvHZVNXLsaoGU5JAQAnkpbe9rHjs2yfXbOradvkb8csnrLWZK0l7b8+5vOSscmG6/W9wGgucygAcCTVLu/MHM/2JOPbXNBXjvqVxnbPjLJ6i1nj3vz6Ady6mltGfXO8emeO29A3hOA5jCDBgDLad/geSnHP5Dr9/hB3r7m/X3lbADfv7Tloq3/J7d+eZ20r73WgL43A
PVT0AAgSdvIkel98Q5ZfMbInL/FubVmaS9tuXaP7+W2I7auNQcAA09BA6A1lZK2kSPTts0Wuf3Te2TyxV357pkn53+3+UW/HZu/KtpLW45463npGL9h3VEAGED2oAHQcto3m5j7vjIse42blSPW+3Y26RjZV8pG1x3tCQ5f67ac8o7XZsJ/3ll3FAAGiIIGQGspJTd8bN3M3vHbfQPNKmXLay9tefvBF+aPp2yYnocfqTsOAANgpddwlFI2KqX8oZRyfSnlulLKB/rGP11KmVdK+Wvf137LPefjpZRZpZSbSimv7o8fAABWVPuaa+a+w3fLL19xct1RVtjhY2fk1mO2TunwmSpAK1iV/+27k3ywqqqrSyljklxVSrmw774vV1X1xeUfXErZKslBSbZOsmGS35VSplRV1bMKGQDgGbWvuWa6t900s/55RPaeem1+Mv7LGd22Rt2xVthabWvkz+88MbuP+FAmnzQrPffdV3ckAFajlS5oVVXdleSuvtsLSik3JBn/DE85MMlZVVUtTTK7lDIryS5JLl3ZDADwdEpHR+Yds0sOe8cFOXDM+Zk07PGljANzPbP+tF77qNz4tq9nm0nvyMb/9GDS67NNgKGqX46pKqVskmSHJJf3DR1VSplRSvlOKWVs39j4JHcs97S5eZpCV0qZVkq5spRyZVeW9kdEAFpI+/rrZ9bnp+YP7z8xx65z63LlbPBqL215/WbXpG34sLqjALAarXJBK6WMTnJ2kqOrqpqf5BtJJiXZPo/NsJ30XF+zqqrTqqqaWlXV1GEZsaoRAWgh7WuumRs/tWluOviUrNc+qu44/erwdS9Jzw6b1x0DgNVolQpaKWVYHitnZ1ZV9T9JUlXVPVVV9VRV1ZvkW3lsGWOSzEuy0XJPn9A3BgCrpq097VtOzkOH7p6un6+Va99wciOuZdbfNu4YnZsPHV53DABWo5Xeg1ZKKUn+O8kNVVV9abnxcX3705Lk9Umu7bt9bpIfllK+lMcOCZmc5IqVfX8ASJKUktnH75Kf/fOXs1lHW0a2DU8ydEvMd1/x3zlx4mvTPfu2uqMAsBqsyseLeyZ5W5KXP+lI/S+UUv5WSpmR5GVJjkmSqqquS/KTJNcnuSDJkU5wBGBVVbtvl2/802nZbnhnXzkb2vbs7Mrtb3qmM7kAGMxKVVV1Z3hGa5Z1ql3L3nXHAKBh2kaNyuKXb523nPDrHLn2Hc/+hCHk6w9vlF+9bpf0zLyl7igArITLq4syv3qwPNV9Q2+BPgBDWvuWk3PLibtnmz8tzllf/1LLlbMkOXLtOzL7syPTvtnEuqMA0M9W5ULVADCgyg5b55CzLsghYx7oGxn8x+evrBv2PCNTjn97Jh5ckoavhgFgxZlBA2BQaNtmi7zwO9ctV8743I6/SMfzN6g7BgD9SEEDoNHaRo7MvI/ukdf/9OKcsMFf647TKPuPeiAPvmyTumMA0I8scQSgccqIEclWm+XuF6+VCW+YnSs2+0pLnND4XI0owzLi0LtTzh6RaunSuuMA0A8UNAAapXR05KZTt8kf9/5qxrWvkWGlPUP5umar6idbnpk37XNM1jjHpUUBhgJLHAFolIUH7pTfvfyr2bhjdF8545k8r31U2o6897FZRwAGPQUNgNq1dXamY+ILctu/75Hjv/CtTBrWuqczroyfbXlmbv/ITmnr7Kw7CgCryBJHAGrTNnJkHj5wu4w/YlaOGf/z7DYiaS8+O3yu1msflUvfe1L22umdGf/2eemZP7/uSACsJL8FAahHW3tu+dQLc8GJX87PJv0ue3a2KWerYK22NXLRTt/Oo7tPqTsKAKvAb0IABlzbmDGZ+7Fd88dDTsxabWvUHWfIWK99VG7b3749gMFMQQNg4LS1p+tVUzPhd72ZfsRXMq7DXrP+dvK+30v7lEl1xwBgJdmDBsBq1/bCLXP/Tmtn6QEP5+c7fKXvEBBH568Orxm5JEcdu242/5c7UnUtqzsOAM+RggbAa
lWmbpOP/viH2b1zaUaUYUnMmq1uf9v/a3nVH47OmB9fVncUAJ4jSxwBWG3axozJbR8teekavX3ljIEwuq0zSw55yLH7AIOQggZAv2sbMyZt222RWd+clBl7nF53nJb05x3PyN3v2rHuGAA8R5Y4AtBvOl6wUW76l/F5y96X5KCxF2a74Z1JnCpYh5Ftw/PeI8/JuT/eIj33P1B3HABWkIIGQL8oI0bkxs+ul1tefmrfiOV1dXvrmrfkxzvtm+G/UdAABgtLHAFYZe3rr5+ZJ26fa176jbqjsJzRbZ25cy97/wAGEwUNgJXW1tmZOcfvngP+eH1ueuMpGd1m1qxpDtj3srSPHVt3DABWkIIGwHNWOjpy75F7pDp/3Vxx6Jdy+NrzMqzYa9ZEx29wRe46eMu6YwCwguxBA+A5aV9//cz64Ga55JATs177qCRr1B2JZzCiDMsr3n1prvvp+um577664wDwLMygAbDCSkdHbjh+k9zwtq/3lTMGg89vcFVu+sSkpM0sJ0DTKWgAPKu2kSNT7bl9bj1j61y931fTXvz6GEzaS1sufeNJmfntHdI2SrEGaDK/YQF4Wu1rr5WZ39glE/5Q8s0f/ldmvuR7Gds+su5YrITntY/K3171X3nktdvWHQWAZ2APGgBP6473bJ2ZB5zcdwDI6LrjsIpGt3Xm3qnJmLPqTgLA0zGDBsA/KB0dWfbqqfnwu37idMYhZpudZ6d0+HwWoKkUNAD+rm3UqDz4zt1zz9mb5TunfSVvX/P+uiPRz/7jBeekbcqmdccA4Gn4CA2AJEnbNlvk3s/15k87fDUj24bHksahabvhnbn50HWz6UfqTgLAUzGDBkDattkiO5xxfa7a6Sd95Yyh7HWvuCxtY8bUHQOAp6CgAbSwMmx4Hn777nnlWVfkPzeYUXccBsix6/0pDx24dd0xAHgKljgCtKAybHiyzeTMPGZELnvZF/M8F51uKeM6Ruf1H70of/zLjum57qa64wCwHAUNoIW0r71Wbj5lYnbdZE7+bfxpmTJsVBLlrBV9dN2bc/8Zo3P96zdK92131B0HgD4KGkALefjVW+bSF5+U9doVM5J/f97l2X/yjhmmoAE0hj1oAK2glCzdd+e86VO/7StnkIxsG5679hxRdwwAlmMGDWCoKiUdz98gD75skzzw2kfzs91PznbDO+tORcOM3vn+pK096e2pOwoAUdAAhqTS0ZGbvzg1n9vvRzlw1HkZUYYlUc74R5/f8uyctOkB6Zk1u+4oAERBAxhy2kaOzO1Hb59L33hi3+mMw+qORIPtOmJRFmz7vIxU0AAawR40gCGirbMzCw7aLWtcMCqXHHGSo/NZIaPbOlO9977HLr0AQO3MoAEMUmXEiLRtslHuffH6WbBJstcrZ+Ss8V/OWm1rJFmj7ngMIudt/YPs9ukPZuJx01N1d9cdB6ClKWgAg9Di1++aiR+5IR8f9/1sNmxEhpX2vnsUM567se0jc9nbT8qeSz6UjT5zSd1xAFqaJY4Ag0hbZ2ceeetu+cwXv5Xvv+DibDl85HLlDFbe2PaROfWwU1LtuX3dUQBamhk0gEGgbdSoLNh3m6x55B355WYnuZYZq8Venclde4zMhv9XdxKA1qWgATRQ+5prpuuFk3LPzmtkwVbLcvBOV+TY9b7UV8yUM1af8fvelnxleKquZXVHAWhJChpAg7SNGpV5731hjnrPL7LfqPMzrn1k2svjq9EVM1a/f93kVzlhg9eke+68uqMAtCQFDaAh2tdbNzd8fmJu3PerfReWHl13JFrQriO68vDuEzL6pwoaQB0cEgJQo/a110r75E1z99F7ZI/fz8vMfb/ZV86gHiPKsNz58ioppe4oAC3JDBrAQCkl7ZM3Te+Yztz14rWyYLOevHOvP+aNa/46mw4b1lfMnMhI/b736tNy/G7vSLn0mrqjALQcBQ1ggCzdZ2o+//VTM6Hj0SftLRtZay54sr06k23/62+5YZ/103Pff
XXHAWgpljgCDIRdts0BJ16U3TrbM6Fj9HLlDJrp88+fnps+MSlpM6sLMJD8hQCwmpSOjlR7vDBzPrt7Djj9f3PsOrfWHQlW2LDSnj+84Yt56G271B0FoKVY4gjQj9rXXSeLd5uURyYOy8j9784Ptvx6Jg5zGiOD08Ydo3Pfi7oz9nt1JwFoHQoaQD9Zsv8uedMJv8lha52f0W2dfaPKGYPbW6ZOzzUjRqRaurTuKAAtwRJHgFVUOjqy+A275hNf+W7+Zexty5UzGPymrfPnlM0n1h0DoGWYQQNYSR3jN8z9L39Blrzx4Zy340nZuMNsGUPPxh1r5J49xmb9GXUnAWgNChrACuoY9/ws2Wp87tp9RHq3W5DPbn9O9h35i4xsGx5LGRmqhpX2rPmGu1K+a5kjwEBQ0ABWQNuYMSk/KvnZpJMztn3565YNry0TDJTvbv6DvG+79yXT/1Z3FIAhzx40gGfRsekmuem/Jufsyec+qZxBa5g4bHRuere9lQADQUEDeAqPX8Ps1hN2z3t+87vc/IpvZ0QZVncsqM2PXnlqys7b1h0DYMizxBGgT+noyKP77Jj5m3Sk8zX35AdbfT2T/n4NM59n0dp262zPjqddk7+8bav0Xntj3XEAhiwFDSCPlbObTt4x0/f/ctZrH9U36uAPWN5/bjAjkz62UzZ7a91JAIYuHwkDLa9j001yy/e3eVI5A57KW7e9Im1jxtQdA2DIMoMGtJ5S0rHhuDzwso3z4Gsezed2+nleN+rhtBflDJ7NG9e6KldselhyzQ11RwEYkhQ0oGV0jHt+7jh40yzZeVH+Y8dzc8Cox69hllhQACtmu+GduWOfsRl/Td1JAIYmBQ1oCe1TJmXDM+7OuRP+K+3l8TLmGmawMjbZd3a6v+LC1QCrg4+MgSGtdHSkfevNs9EP7sy3Nvq/5coZsLL+c5OfJ9tOrjsGwJBkBg0YkjpesFEe3GN8Hn7DonxzpzOyl2vsQr/ZbnhnHvnMo1n79Z3pXbKk7jgAQ4qCBgwZ7euuk7vfvHmWvGJBPrbNBTlkzL1mzGA1uWDbH2Tn09+bzY6cm54HHqw7DsCQoaABQ0L7uutk3n9vkKt3/vpypUw5g9VlrbY1ct2Lv5ttjjkqm3zy0rrjAAwZ/noBBr+29sz64Oa5euczzZjBABpW2vPqfa5MGTGi7igAQ8Yq/yVTSplTSvlbKeWvpZQr+8bWKaVcWEq5ue/fsX3jpZTytVLKrFLKjFLKjqv6/kCLa2vPvI/smj++9UTlDGrwyQ3+N4v3eWHdMQCGjP76a+ZlVVVtX1XV1L7vP5bkoqqqJie5qO/7JNk3yeS+r2lJvtFP7w+0qAVv2TkXHPGFjOsYXXcUaEnPax+Vhe96JCml7igAQ8Lq+rj5wCTf67v9vSSvW278+9VjLkuydill3GrKALSAsX+6PT+Zv13dMaClnbzNj9K23RZ1xwAYEvqjoFVJfltKuaqUMq1vbIOqqu7qu313kg36bo9Pcsdyz53bNwawUrrn3ZnvnLFPuqqeuqNAy9qzsy03HjHaLBpAP+iPgvaiqqp2zGPLF48spey1/J1VVVV5rMStsFLKtFLKlaWUK7uytB8iAkPZC866I5cuba87BrS03+zzlTzwrt2UNIBVtMoFraqqeX3/3pvk50l2SXLP40sX+/69t+/h85JstNzTJ/SNPfk1T6uqampVVVOHxclQwDPrufveHHfLgXXHgJY2Zdio/PBTX8wdn9xdSQNYBatU0Eopo0opYx6/neRVSa5Ncm6Sd/Q97B1Jzum7fW6St/ed5rhbkkeWWwoJsFKqpUuz6MwNLXOEmk0ZNirvP/ictI8ZU3cUgEFrVWfQNkjy51LKNUmuSPKrqqouSPL5JK8spdyc5BV93yfJ+UluTTIrybeSHLGK7w+QJFnvlzPzgTv3rDsGtLwd15iTTHh+3TEABq2OVXlyVVW3JvmHi59UVfVAkr2fYrxKcuSqvCfAU+m5/4HMe
ceUfPTMJTlhg7/WHQda1k7D2zPv1evl+dfPrDsKwKDkqq7AkNFz/cxc9m+75KwFY+uOAi2rvbRlnf3mpXSs0mfAAC1LQQOGlM7zrsjXP/6W3LBscd1RoGWdsNnP0jZ5Yt0xAAYlBQ0Yckb+/IoceNnhdceAlrXT8Pbc8dr1644BMCgpaMDQU1VZ55cjs7TqqjsJtKT20pYTp/13lu63c91RAAYdBQ0Yktb5y0P58YJxdceAlrXPyKX58NfOUNIAniMFDRiSeq67KT/e70WZeM60LOxdUnccaEmvGbkku352etrXXqvuKACDhoIGDFndt87J5kdfk92nv7PuKNCyDl/nz+mdNKHuGACDhoIGDGnV0qXZ4Gudmdu9sO4o0JImDhud2/cxgwawohQ0YMgbdsVN+czdr6w7BrSsbfa9KWXEiLpjAAwKChow5PUuWpSrvrl9FvcuqzsKtKTjNzo32WqzumMADAoKGtASNrhoXmYsa687BrSkKcNG5ab3jkpKqTsKQOMpaEBL6L59Xt4x/bC6Y0DL+t2+X8q9R+6ets7OuqMANJqCBrSG3p5M+tSiTPne+3KXA0NgwE0aNjq//+gXc+PXt0np6Kg7DkBjKWhAy+iZeUsmfuKKvPKUj+TenkV1x4GWM7Z9ZH7/iq+k2nHLuqMANJaCBrSW3p5s9JWr8+m79647CbSkCR1r5N6pY+qOAdBYChrQcnqXLMnvLtyh7hjQkoaV9gx7zX1Jm0N7AJ6Kgga0pE1+uTi324sGtfjCFmenfdIL6o4B0EgKGtCS2q64Lq+/5p11x4CWtGdnV+a95vl1xwBoJAUNaElVd3c2+FBvtrnskLqjQMsZVtrz5nf+Ph0TxtcdBaBxFDSgZfXcNCsbf2xJTn7IUisYaJ9c78a0n9mT7r13sh8NYDkKGtDSembekrM/9Kr835LeuqNAyzl38gU57lv/nfbNN607CkBjKGhAyxvx6+l555lH1h0DWtJencm9e65XdwyAxlDQAJJsct7CzO5yqiPUoWv/h1M6OuqOAdAIChpAknLNzPzHXfvWHQNa0nk7fitdL3lh3TEAGkFBA0hSLV2a2z45JS+a8YYs7l1WdxxoKRt3jM6cw+wDBUgUNIC/G/a7qzLmdXdm6/OOSk/lj0UYSD9/0TfS/fKd6o4BUDsFDWA5vUuWZKv/vDNnLHARXRhI2w3vzNSTrsqS/XepOwpArRQ0gCfpvmNu/v33r6s7BrScEzb4a7568slZ/IZd644CUBsFDeApTD5jaa5b9mjdMaDlbD9iRO7e1Z8nQOvyPyDAUyiXX5tDr31H3TGgJW252+ykrb3uGAC1UNAAnkpvT5ZdtF7u7VlUdxJoOe/e8OJ0vGBC3TEAaqGgATyN53/18ux2zrFKGgyw14xcmJnv3bDuGAC1UNAAnk5vTya//8rs/6kP5aJHLbeCgdJe2vL5N5yZjokvqDsKwIBT0ACeSW9Pxp5+ad77P9NcGw0G0OtGPZybjhpnLxrQchQ0gBUw5Zt354JHR9YdA1pGe2nLJW85KbO+uHPa11u37jgAA0ZBA1gBPbfMyYf/+qa6Y0BLeV77qMz8p1My722b1x0FYMAoaAAroqqy8eeqvP/OnS11hAHUXtoy4sGq7hgAA0ZBA1hB1VXXZdY+a2b7K95adxRoGTO7FmXtmxfXHQNgwChoAM9Bz/0PZKNP9uTiJXUngaGvp+rNvmd/MOWya+uOAjBgFDSA56jn+pl5xwXT6o4BQ9rsroXZ6s+HZvMTbk16e+qOAzBgOuoOADDoVFW2OGV+Lt4n2auz7jAw9Lzu5ldn4afHZ5M/XpMe5QxoMWbQAFZC73U3ZdqVb6s7BgwpFy9Jpnzvfen65/a0/+FqM2dASzKDBrAyqiqjLxydvKjuIDA0zOxalH/98LGZePal6a47DECNzKABrKQNfndnDrh5nyzuXVZ3FBjULnq0PW845cMZ9fMr644CUDsFDWAldc++LV37LcjW5x2lp
MFKOGvB2Gz623flxH86OONPuMSSRoBY4giwSnoXLcoWx8zIdmtNy6yXnl53HBg0jpi3W+a8dUImz7w6VeVC1ACPM4MGsIp6lyzJxG+U3NW9sO4oMCicPv95mfPWCem5aVainAE8gYIG0A86pt+Qd93ylrpjQKPd37Mox923dU477g2PlTMA/oEljgD9oHfJktx97ibp+VBv2ovPvmB5XVVPtv7TYRn/3eFZ4/KbM+bhy+qOBNBY/ooA6CfjL7gvv1i0dt0xoFG6qp5s8ft3Z9J7Zmf4b65Mz8OP1B0JoNEUNIB+0nPDzfnicf+ce3sW1R0FGqGn6s2H7941W3z4zvQuWFB3HIBBQUED6Edr/fTKvOrED+fcRSPrjgK1md21MDte+U/Z9tSjcvMbN0z33ffUHQlg0LAHDaAfVd3d2eBrl+TLMw9J53+dnleN7Ko7EgyY2V0L8+HbX5cHPjsxz/vdX1J135TuukMBDDJm0ABWg+EXTM+RZ7+77hgwYM5csG4Oe98xWfiyhzP8gumpulUzgJWhoAGsJpucvyRzXRuNFnDB4hH57uEHZsT5ihnAqlLQAFaTjitn5sUXHu0C1gxZC3uX5IQHJucz/3pY2v9wdd1xAIYEBQ1gNeldtChT3vOXvPKUjyhpDCkLe5dk7+sPyH5HvT9/fOlGGf0T1zUD6C8KGsDq1NuT8V+4PHue88G6k0C/2f7iwzP8wAeyxi+uSM8DD9YdB2BIUdAAVrfenkw+89HM7jKLxuB36sPjs/knH07vItf7A1gdFDSAAVCuvD4HX3do3TFgpdzfsyj73PiabHnaETnnoBen+9Y5dUcCGLJcBw1gAFTd3ek6Z/0sfWFXRpRhdceBFfb+O3fO1Z/bMWN+dU02XjIvvXUHAhjizKABDJAN/mdWfr7weXXHgBWytOrKpN8fllkHTciosy9P75IldUcCaAkKGsAA6X3ooXz8D2/O0qqr7ijwjJZWXdnil0dkyrSb0jNrdt1xAFqKggYwQKru7mxx7LXZ4tfvU9JorOuWPZqdT/pAtjjmuvQuXlx3HICWYw8awADqXbw4W7z/+mz1zWm5Ze/v1h0H/m5p1ZV9rn9TlnxnXMb96BJ7zQBqYgYNYID1Ll6cKSctzQ3LzEstMCYAACAASURBVE7QDDO7FmXnkz6QztfdnzV/5KLTAHVS0ABqUF03K5+644C6Y0Du71mU157xoYz78qWubQbQAAoaQA2qrmWZ9ZMp9qJRi56qN2cvXDNTLn573nLY+7Ppf16TVFXdsQCIPWgAtdnwBzfkK9O2ykfXvbnuKLSQnqo3U6/852x49KOZOOdvSVXZbwbQIGbQAGrS89BDOfWSl5pFY0C9aMabM+6d96Z79m1mzQAaSEEDqNGWH5uVV1/35rpj0AIe6lmc0+c/L2M+Mzo9DzxYdxwAnsZKF7RSyuallL8u9zW/lHJ0KeXTpZR5y43vt9xzPl5KmVVKuamU8ur++REABq+ehx7KqHcsycuuO7DuKAxRM5YtyaYXvjOvn/aB/ORF26ZcOqPuSAA8g1L1w/KGUkp7knlJdk1yWJKFVVV98UmP2SrJj5LskmTDJL9LMqWqqp5neu01yzrVrmXvVc4I0GQdG03IKy+4NkePnVN3FIaInqo3Jz+8aX7xkVdmxK+vTnqf8dctAAPo8uqizK8eLE91X38tcdw7yS1VVd32DI85MMlZVVUtrapqdpJZeaysAbS87jvm5uQL96k7BkPEI72PZvOzjsiFr9giI341XTkDGET6q6AdlMdmxx53VCllRinlO6WUsX1j45Pcsdxj5vaNAZDkBb/syr09rkPFqrl4SbLL6cdms49dne677q47DgDP0SoXtFLK8CQHJPlp39A3kkxKsn2Su5KctBKvOa2UcmUp5cquLF3ViACDwvA/XZtDZh5UdwwGoa6qJ195aJNMPHdaPve6g7LJJy9N1bWs7lgArIT+mEHbN8nVVVXdkyRVVd1TVVVPVVW9S
b6V/7+McV6SjZZ73oS+sX9QVdVpVVVNrapq6rCM6IeIAM1XLV2ato+vnYNmvzw9lStTsWJmLFuSHb72L/ntSydnyuFXpHfGjXVHAmAV9EdBOzjLLW8spYxb7r7XJ7m27/a5SQ4qpYwopUxMMjnJFf3w/gBDxxV/y/z9q7xtjsOReHYzli3Ju487JuNPuCQ9991XdxwA+kHHqjy5lDIqySuTvHe54S+UUrZPUiWZ8/h9VVVdV0r5SZLrk3QnOfLZTnAEaEU9Dz2UOV/bLbO/cF4mDhtddxwa6Kqly/LDB3fL/31pl4w989K64wDQj/rlmP3VyTH7QEsqJXOO3y3XH/r1tJf+Os+Jwe6WroV55S8/mC1OeTjVnLnpXeRQGYDB6JmO2V+lGTQAVpOqyman3pGfvmndHDTmobrTULO53QvzpmsPTefJYzP5givS0/APVwFYeT6WBWio7rnz8qmrDqw7BjX7/vz1csiRx2btA2/PiF9PT5QzgCFNQQNoqqrKuB8Pz1Hzds3SqqvuNAygpVVXZixbkjfMemW+f/gB6TzvCsfmA7QIe9AAGq5tzJjc9oFt89tpX8iEDoeGDHXH3bd1zj3tJRl34T3pvfW2VN3ddUcCoJ/ZgwYwiPUuWJCNT7gye01+f2595XfqjsNqdNDsl2fBIaPzvDmXxDHHAK3JEkeAQaDqWpZNv5vc1b2w7iisBot7l2Xby/85C946Jt1zbq87DgA1UtAABolhV92cL963V90x6Cc9VW9mdy3Mu25/UaZ+4+iMP+iWdM++re5YANTMEkeAQaJ3wYJc/rldc/rxc3PomvfWHYeVdPGS5EM3vDndv1wv61+1MG0zbs5GSy5Js3eEAzBQHBICMMi0T5mU8WfcnW9t9H91R+E56Kl6s/d1b8iIf1szueK6pNcuM4BW9UyHhFjiCDDI9My8JXccvkkuW+IP/MFice+y7DD9kIw8eH5y2QzlDICnpaABDELVX6/P2356VBb2Lqk7Cs/g/p5F2e+m/bLXcR/I+LfNTc8DD9YdCYCGs8QRYJBq6+xM+wVj88spv647Ck+yuHdZPnTXXpl+yg5Z94zprmUGwBNY4ggwBPUuWZJFn5+Qix5trzsKy/nfR9sy9dSjM/vlw7LOdy9VzgB4ThQ0gEFs+AXT8+kPvTs3LFtcdxSSPNSzOB/99Huz0fGXpHfBgrrjADAIKWgAg9zIc67MW075UP73Uf+l1+WR3kdz/P1b5MX/9aGMPeuquuMAMIjZgwYwRNw/bff86d++mpFtw+uO0jIW9y7LrtMPzeifrpmxv7w+PfPn1x0JgEHAHjSAFrDBWdflX+/Zo+4YLaOr6snWvzwqEw6+NWv+8DLlDIB+oaABDBE98+fnV7/bue4YLePUhzfNVv9+e3qXuNQBAP1HQQMYQib9bGHeO3f39FS9dUcZku7vWZR/nv2ybPnNI/KL978i3XfdXXckAIYYBQ1gCKmm/y13vHZMNj/riNzfs6juOEPKqQ+Pzz6f/lAefPnibPzvl6TjIoeBAND/FDSAIabnnnuz2UevzM6/OqbuKEPCvT2LMuXit+fc1++edf/70lRLl9YdCYAhTEEDGIKq7u5M/v7S3N69sO4og9rSqisvOuND2fRtN6bnpll1xwGgBShoAENU2/QbcvD1b687xqA0u2th3nPHntntsx/IZp+/LlXXsrojAdAiOuoOAMDqUXUty7CT181d31iYcR2j644zKNzfsyg7n3dMpnx/Scpfbsrzll6SnrpDAdBSzKABDGGdv70me/35qHRVasazub9nUfY67cPZ/P1XJ5fNsNcMgFooaABDWNW1LJOPmJ3tLjnU0fvPYGHvkuz6x6Oy0WcvT9XdXXccAFqYggYwxPU8/Egmvuf27DD9kLqjNM7FS5JJvz8sr/rg0Zny3plJr5lGAOqloAG0gJ6HH8n4T1b5/vz16o7SGL9dPCyf+sC0bPbWv2TMjy9L7yLXjQOgfgoaQIvovfbGf
PbsN9cdoxEuerQ9x3/wsHSed0XdUQDgCRQ0gBbygl8ubulroy2tuvL1hzfKvx/7rqxxjnIGQPMoaAAtpO3qG/OSC47JI72P1h1lwJ368Pjs+W/vz/l7b62cAdBYChpAC6mWLs0W7/9b9vriB7Owd0ndcQbEFUu7MvGCd+fcA3bJuv99abrvurvuSADwtFyoGqDF9C5Zkg1P+2t22+tduXa3M+uOs1odd9/Wuew9O2bK9KvSU1V1xwGAZ2UGDaAF9S5enHFfGpYrlnbVHWW1uejR9vzu+BclV/wtUc4AGCTMoAG0qLY//zWHf+H9+fMnvpqRbcPrjtMvLlvSk3+5/uAs/r/1svF5D2b0tZfXHQkAnhMFDaCFbfDdv2SbLf8lFx14UiYOG113nJXWU/Vmx+mHZMPjknVm3Jh1qpnprTsUAKwESxwBWljvkiWZcuxVOfhfP5S/Ll1ad5yV0lP15pi7ds2EYxan95obLGcEYFBT0ABaXNXdnbXOvCxv+tnRWVoNnj1pS6uuXPRoezb/4zsza9+10z37trojAcAqs8QRgCTJlBNm5sRXb5tPrndj3VGe1tKqK39e0plv3f2S3PjjLbLh+Xdl0q1/S09vT93RAKBfKGgAJEl67n8gZ/7Py/PJac0raF1VT6bd8dJcc/o2Gffbu9I9545s0HtJ1DIAhhpLHAH4u40vWJi7uhfWHeMJFvcuy5Rz35d79h+e9U+9NN23zknMmAEwRCloAPxd+81z86P529UdI0ly3bJH86/3bJc9vnB0tjhmRnruf6DuSACw2lniCMDf9Tz0SP784KQcu86tA/7e9/csyqkP7ZTvX79Lhv9ldDa64KGU2fOywfxLHJkPQMtQ0AD4/3p7MueHm2Xpp87PiDJsQN7yoZ7F2fdvb88aX107a1w6MxPnz3gsyoC8OwA0iyWOADzB8394XV593ZsH5L3edfuLsv8Hj83YN8zN8N9cmZ758wfkfQGgqcygAfAEPfPnZ+GPt07X1j0ZVtr773Wr3jzQ+2iS5LyFk/L1m1+ScUcuyug7LjNbBgB9FDQAnqiUPPiSpf1Szm7pWpjj79o3F1+2dUbPacu4Pz2S9CZtd96X9e+blW6nMQLAEyhoADxB29ab54wXfTursgr+oZ7F2evKd2eDr3Wm408zsln3ZUmSqu9+tQwAnpqCBsAT3Hrw2OzZuWpblHc695hM+cBVqbq7/17KAIBnp6AB8P+1tWenl974nJ+2uHdZfrRg43zp+r2z+L5R2eozc9Ld3b0aAgLA0KagAfAEi7uHr9Dj5nYvzM8XbJ0vT39FNjyvI2v+cVYm3H9dkkQ1A4CVo6AB8P/19uT6SzZNJj/9Q7qqnux97ZvS/uX10vl/N2bygquS2FcGAP3BddAAeIIRD5X0VP948H1P1ZtfLe7Mlj88KqMPejjDL5ie3gULakgIAEOXGTQAnmDjM+dk2xFHpXrSKfvtS5KNz747k26+1GwZAKwmChoAT9A9785s9Jk7n/I+xQwAVi9LHAEAABpCQQMAAGgIBQ0AAKAhFDQAAICGUNAAAAAaQkEDAABoCAUNAACgIRQ0AACAhlDQAAAAGkJBAwAAaAgFDQAAoCEUNAAAgIZQ0AAAABpCQQMAAGgIBQ0AAKAhFDQAAICGWKGCVkr5Tinl3lLKtcuNrVNKubCUcnPfv2P7xksp5WullFmllBmllB2Xe847+h5/cynlHf3/4wAAAAxeKzqDdnqSfZ409rEkF1VVNTnJRX3fJ8m+SSb3fU1L8o3ksUKX5LgkuybZJclxj5c6AAAAVrCgVVV1cZIHnzR8YJLv9d3+XpLXLTf+/eoxlyVZu5QyLsmrk1xYVdWDVVU9lOTC/GPpAwAAaFkdq/DcDaqquqvv9t1JNui7PT7JHcs9bm7f2NON/4NSyrQ8NvuWzoxchYgAAACDR78cElJVVZWk6o/X6nu906qqmlpV1dRhGdFfLwsAANBoq1LQ7ulbupi+f+/tG5+XZ
KPlHjehb+zpxgEAAMiqFbRzkzx+EuM7kpyz3Pjb+05z3C3JI31LIX+T5FWllLF9h4O8qm8MAACArOAetFLKj5K8NMl6pZS5eew0xs8n+Ukp5V1Jbkvylr6Hn59kvySzkixOcliSVFX1YCnlM0mm9z3uP6qqevLBIwAAAC2rPLZ9rLnWLOtUu5a9644BAADQLy6vLsr86sHyVPf1yyEhAAAArDoFDQAAoCEUNAAAgIZQ0AAAABpCQQMAAGgIBQ0AAKAhFDQAAICGUNAAAAAaQkEDAABoCAUNAACgIRQ0AACAhlDQAAAAGkJBAwAAaAgFDQAAoCEUNAAAgIZQ0AAAABpCQQMAAGgIBQ0AAKAhFDQAAICGUNAAAAAaQkEDAABoCAUNAACgIRQ0AACAhlDQAAAAGkJBAwAAaAgFDQAAoCEUNAAAgIZQ0AAAABpCQQMAAGgIBQ0AAKAhFDQAAICGUNAAAAAaQkEDAABoCAUNAACgIRQ0AACAhlDQAAAAGkJBAwAAaAgFDQAAoCEUNAAAgIZQ0AAAABpCQQMAAGgIBQ0AAKAhFDQAAICGUNAAAAAaQkEDAABoCAUNAACgIRQ0AACAhlDQAAAAGkJBAwAAaAgFDQAAoCEUNAAAgIZQ0AAAABpCQQMAAGgIBQ0AAKAhFDQAAICGUNAAAAAaQkEDAABoCAUNAACgIRQ0AACAhlDQAAAAGkJBAwAAaAgFDQAAoCEUNAAAgIZQ0AAAABpCQQMAAGgIBQ0AAKAhFDQAAICGUNAAAAAaQkEDAABoCAUNAACgIZ61oJVSvlNKubeUcu1yYyeWUm4spcwopfy8lLJ23/gmpZRHSyl/7fs6dbnn7FRK+VspZVYp5WullLJ6fiQAAIDBaUVm0E5Pss+Txi5Msk1VVdslmZnk48vdd0tVVdv3fR2+3Pg3krwnyeS+rye/JgAAQEt71oJWVdXFSR580thvq6rq7vv2siQTnuk1SinjkqxZVdVlVVVVSb6f5HUrFxkAAGBo6o89aO9M8uvlvp9YSvlLKeWPpZQX942NTzJ3ucfM7RsDAACgT8eqPLmU8okk3UnO7Bu6K8nGVVU9UErZKckvSilbr8TrTksyLUk6M3JVIgIAAAwaK13QSimHJtk/yd59yxZTVdXSJEv7bl9VSrklyZQk8/LEZZAT+saeUlVVpyU5LUnWLOtUK5sRAABgMFmpJY6llH2SfCTJAVVVLV5ufP1SSnvf7U3z2GEgt1ZVdVeS+aWU3fpOb3x7knNWOT0AAMAQ8qwzaKWUHyV5aZL1SilzkxyXx05tHJHkwr7T8i/rO7FxryT/UUrpStKb5PCqqh4/YOSIPHYi5Bp5bM/a8vvWAAAAWl7pW53YWGuWdapdy951xwAAAOgXl1cXZX714FNeF7o/TnEEAACgHyhoAAAADaGgAQAANISCBgAA0BAKGgAAQEMoaAAAAA2hoAEAADSEggYAANAQChoAAEBDKGgAAAANoaABAAA0hIIGAADQEAoaAABAQyhoAAAADaGgAQAANISCBgAA0BAKGgAAQEMoaAAAAA2hoAEAADSEggYAANAQChoAAEBDKGgAAAANoaABAAA0hIIGAADQEAoaAABAQyhoAAAADaGgAQAANISCBgAA0BAKGgAAQEMoaAAAAA2hoAEAADSEggYAANAQChoAAEBDKGgAAAANoaABAAA0hIIGAADQEAoaAABAQyhoAAAADaGgAQAANISCBgAA0BAKGgAAQEMoaAAAAA2hoAEAADSEggYAANAQChoAAEBDKGgAAAANoaABAAA0hIIGAADQEAoaAABAQyhoAAAADaGgAQAANISCBgAA0BAKGgAAQEMoaAAAAA2hoAEAADSEggYAANAQChoAAEBDKGgAAAANoaABAAA0hIIGAADQEAoaAABAQyhoAAAADaGgAQAANISCBgAA0BAKGgAAQEMoaAAAAA2hoAEAADSEggYAANAQChoAAEBDKGgAAAANoaABAAA0xLMWt
FLKd0op95ZSrl1u7NOllHmllL/2fe233H0fL6XMKqXcVEp59XLj+/SNzSqlfKz/fxQAAIDBbUVm0E5Pss9TjH+5qqrt+77OT5JSylZJDkqydd9zTimltJdS2pN8Pcm+SbZKcnDfYwEAAOjT8WwPqKrq4lLKJiv4egcmOauqqqVJZpdSZiXZpe++WVVV3ZokpZSz+h57/XNODAAAMEStyh60o0opM/qWQI7tGxuf5I7lHjO3b+zpxp9SKWVaKeXKUsqVXVm6ChEBAAAGj5UtaN9IMinJ9knuSnJSvyVKUlXVaVVVTa2qauqwjOjPlwYAAGisZ13i+FSqqrrn8dullG8l+WXft/OSbLTcQyf0jeUZxgEAAMhKzqCVUsYt9+3rkzx+wuO5SQ4qpYwopUxMMjnJFUmmJ5lcSplYShmexw4SOXflYwMAAAw9zzqDVkr5UZKXJlmvlDI3yXFJXlpK2T5JlWROkvcmSVVV15VSfpLHDv/oTnJkVVU9fa9zVJLfJGlP8p2qqq7r958GAABgECtVVdWd4RmtWdapdi171x0DAACgX1xeXZT51YPlqe5blVMcAQAA6EcKGgAAQEMoaAAAAA2hoAEAADSEggYAANAQChoAAEBDKGgAAAANoaABAAA0hIIGAADQEAoaAABAQyhoAAAADaGgAQAANISCBgAA0BAKGgAAQEMoaAAAAA2hoAEAADSEggYAANAQChoAAEBDKGgAAAANoaABAAA0hIIGAADQEAoaAABAQyhoAAAADaGgAf+vvfuLubuu7wD+/oQyL3TENhrCChtsqUuYFwgNJXEaE7eC3JTtgsGFdIaMLcNFk13MeYPRG7JMk5ElLGySQeJwZOrggs11xOjN6CjY8HesxWFsU2lcDcW4MHGfXZxfzRHb8jzt8/R8T/t6JU/O7/n8fs95vqf5nO953jnf8y0AAIMQ0AAAAAYhoAEAAAxCQAMAABiEgAYAADAIAQ0AAGAQAhoAAMAgBDQAAIBBCGgAAACDENAAAAAGIaABAAAMQkADAAAYhIAGAAAwCAENAABgEAIaAADAIAQ0AACAQQhoAAAAgxDQAAAABiGgAQAADEJAAwAAGISABgAAMAgBDQAAYBACGgAAwCAENAAAgEEIaAAAAIMQ0AAAAAYhoAEAAAxCQAMAABiEgAYAADAIAQ0AAGAQAhoAAMAgBDQAAIBBCGgAAACDENAAAAAGIaABAAAMQkADAAAYhIAGAAAwCAENAABgEAIaAADAIAQ0AACAQQhoAAAAgxDQAAAABiGgAQAADEJAAwAAGISABgAAMAgBDQAAYBACGgAAwCAENAAAgEG8aUCrqnur6nBVPTNX+/uq2jt9vVRVe6f6pVX1P3Pn/mruZ66qqqeran9V3VVVtT4PCQAAYDltWME1f5vkL5Pcf6zQ3b9z7LiqPpvklbnrX+zuK45zP3cn+b0ku5M8kuS6JP+0+iEDAACcnd70HbTu/kaSI8c7N70LdmOSB052H1V1UZILuvux7u7Mwt4Nqx8uAADA2et0P4P2viQvd/e+udplVfXNqvp6Vb1vqm1OcmDumgNT7biq6raq2lNVe36U105ziAAAAMthJUscT+bm/PS7Z4eS/GJ3/3dVXZXkH6vq11Z7p919T5J7kuSC2tSnOUYAAIClcMoBrao2JPntJFcdq3X3a8nsLa/ufqKqXkzyriQHk1w89+MXTzUAAAAmp7PE8TeS/Ed3/2TpYlW9s6rOm45/OcmWJN/q7kNJjlbVNdPn1m5J8tBp/G4AAICzzkq22X8gyb8l+dWqOlBVt06nbsrPbg7y/iRPTdvu/0OSP+juYxuM/GGSv0myP8mLsYMjAADAT6nZporjuqA29bb64KKHAQAAsCZ296M52keO+/9Cn+4ujgAAAKwRAQ0AAGAQAhoAAMAgBDQAAIBBCGgAAACDENAAAAAGIaABAAAMQkADAAAYhIAGAAAwCAENAABgEAIaAADAIAQ0AACAQQhoAAAAg
xDQAAAABiGgAQAADEJAAwAAGISABgAAMAgBDQAAYBACGgAAwCAENAAAgEEIaAAAAIMQ0AAAAAYhoAEAAAxCQAMAABiEgAYAADAIAQ0AAGAQAhoAAMAgBDQAAIBBCGgAAACDENAAAAAGIaABAAAMQkADAAAYhIAGAAAwCAENAABgEAIaAADAIAQ0AACAQQhoAAAAgxDQAAAABiGgAQAADEJAAwAAGISABgAAMAgBDQAAYBACGgAAwCAENAAAgEEIaAAAAIMQ0AAAAAYhoAEAAAxCQAMAABiEgAYAADAIAQ0AAGAQAhoAAMAgBDQAAIBBCGgAAACDENAAAAAGIaABAAAMQkADAAAYhIAGAAAwCAENAABgEAIaAADAIAQ0AACAQQhoAAAAgxDQAAAABiGgAQAADEJAAwAAGISABgAAMAgBDQAAYBACGgAAwCAENAAAgEEIaAAAAIMQ0AAAAAbxpgGtqi6pqq9V1XNV9WxVfWyqb6qqXVW1b7rdONWrqu6qqv1V9VRVXTl3Xzun6/dV1c71e1gAAADLZyXvoL2e5I+7+/Ik1yS5vaouT/KJJI9295Ykj07fJ8mHkmyZvm5LcncyC3RJ7kiyLcnVSe44FuoAAABYQUDr7kPd/eR0/GqS55NsTrIjyX3TZfcluWE63pHk/p55LMnbq+qiJNcm2dXdR7r7+0l2JbluTR8NAADAElvVZ9Cq6tIk70myO8mF3X1oOvXdJBdOx5uTfGfuxw5MtRPVAQAAyCoCWlW9LcmXkny8u4/On+vuTtJrNaiquq2q9lTVnh/ltbW6WwAAgKGtKKBV1fmZhbMvdPeXp/LL09LFTLeHp/rBJJfM/fjFU+1E9Z/R3fd099bu3np+3rLSxwIAALDUVrKLYyX5fJLnu/tzc6ceTnJsJ8adSR6aq98y7eZ4TZJXpqWQX02yvao2TpuDbJ9qAAAAJNmwgmvem+TDSZ6uqr1T7ZNJ7kzyYFXdmuTbSW6czj2S5Pok+5P8MMlHkqS7j1TVZ5I8Pl336e4+siaPAgAA4CxQs4+PjeuC2tTb6oOLHgYAAMCa2N2P5mgfqeOdW9UujgAAAKwfAQ0AAGAQAhoAAMAgBDQAAIBBCGgAAACDENAAAAAGIaABAAAMQkADAAAYhIAGAAAwCAENAABgENXdix7DSVXVq0leWPQ4OCe9I8n3Fj0Izkl6j0XSfyyK3mNRFtF7v9Td7zzeiQ1neCCn4oXu3rroQXDuqao9eo9F0Hsskv5jUfQeizJa71niCAAAMAgBDQAAYBDLENDuWfQAOGfpPRZF77FI+o9F0XssylC9N/wmIQAAAOeKZXgHDQAA4JwwbECrquuq6oWq2l9Vn1j0eDj7VNVLVfV0Ve2tqj1TbVNV7aqqfdPtxqleVXXX1I9PVdWVix09y6aq7q2qw1X1zFxt1f1WVTun6/dV1c5FPBaWywl671NVdXCa//ZW1fVz5/506r0XquraubrXZValqi6pqq9V1XNV9WxVfWyqm/tYVyfpvaWY+4Zc4lhV5yX5zyS/meRAkseT3Nzdzy10YJxVquqlJFu7+3tztT9LcqS775yehBu7+0+mJ/AfJbk+ybYkf9Hd2xYxbpZTVb0/yQ+S3N/d755qq+q3qtqUZE+SrUk6yRNJruru7y/gIbEkTtB7n0ryg+7+8zdce3mSB5JcneQXkvxrkndNp70usypVdVGSi7r7yar6+czmrBuS/G7Mfayjk/TejVmCuW/Ud9CuTrK/u7/V3f+b5ItJdix4TJwbdiS5bzq+L7Mn87H6/T3zWJK3T09+WJHu/kaSI28or7bfrk2yq7uPTH+Y7Epy3fqPnmV2gt47kR1Jvtjdr3X3fyXZn9lrstdlVq27D3X3k9Pxq0meT7I55j7W2Ul670SGmvtGDWibk3xn7vsDOfk/KpyKTvIvVfVEVd021S7s7kPT8XeTXDgd60nWw2r7TR+ylj46LSO799gSs+g91klVXZrkPUl2x9zHGfSG3
kuWYO4bNaDBmfDr3X1lkg8luX1aBvQTPVv/O94aYM5K+o0z7O4kv5LkiiSHknx2scPhbFZVb0vypSQfvwn3JgAAAc5JREFU7+6j8+fMfayn4/TeUsx9owa0g0kumfv+4qkGa6a7D063h5N8JbO3sV8+tnRxuj08Xa4nWQ+r7Td9yJro7pe7+8fd/X9J/jqz+S/Re6yxqjo/sz+Qv9DdX57K5j7W3fF6b1nmvlED2uNJtlTVZVX1c0luSvLwgsfEWaSq3jp9aDRV9dYk25M8k1mfHdsdameSh6bjh5PcMu0wdU2SV+aWZ8CpWm2/fTXJ9qraOC3L2D7VYFXe8Bna38ps/ktmvXdTVb2lqi5LsiXJv8frMqegqirJ55M8392fmztl7mNdnaj3lmXu27Dev+BUdPfrVfXRzJ585yW5t7ufXfCwOLtcmOQrs+dvNiT5u+7+56p6PMmDVXVrkm9ntttPkjyS2a5S+5P8MMlHzvyQWWZV9UCSDyR5R1UdSHJHkjuzin7r7iNV9ZnMXjCS5NPdvdLNHzhHnaD3PlBVV2S2tOylJL+fJN39bFU9mOS5JK8nub27fzzdj9dlVuu9ST6c5Omq2jvVPhlzH+vvRL138zLMfUNusw8AAHAuGnWJIwAAwDlHQAMAABiEgAYAADAIAQ0AAGAQAhoAAMAgBDQAAIBBCGgAAACDENAAAAAG8f9mI9faKie39wAAAABJRU5ErkJggg==\n" + }, + "metadata": { + "needs_background": "light" + } + } + ], + "source": [ + "mask = masks[0].numpy().reshape(im.shape[0], im.shape[1])\n", + "mask = np.where(mask > 0.90, 1, 0)\n", + "mask = np.array(mask, dtype=np.uint8)\n", + "display(mask)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gWp0pMCRgRsc" + }, + "source": [ + "# Convert Mask & Image to COCO JSON\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "g-7HTpM3sLbs", + "outputId": "00ffe574-644f-4c90-87b0-be89e682132b" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "64187" + ] + }, + "metadata": {}, + "execution_count": 14 + } + ], + "source": [ + "# use of imantics library\n", + "\n", + "# give the path of an image\n", + "image = imantics_Image.from_path(IMAGE_PATH)\n", + "\n", + "# array of the mask\n", + "mask = Mask(mask)\n", + "\n", + "# define the category of an object\n", + "image.add(mask, category=Category(\"Category Name\")) \n", + "\n", + "# create a dict of coco\n", + "coco_json = image.export(style='coco')\n", + "coco_json.keys()\n", + "\n", + "# write coco_json dict to coco.json\n", + 
"open('coco.json', \"w\").write(json.dumps(coco_json, indent=4))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "4t8BD-isKTK7", + "outputId": "01722bbb-84e1-41f5-f0b9-1d3af20e9d6d" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[{'color': '#1fab35',\n", + " 'id': 1,\n", + " 'metadata': {},\n", + " 'name': 'Category Name',\n", + " 'supercategory': None}]" + ] + }, + "metadata": {}, + "execution_count": 15 + } + ], + "source": [ + "# display the categories\n", + "coco_json['categories']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "36i8AEKzKabO", + "outputId": "5eac0496-1ec3-4a53-ee6e-717e673e49d6" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[{'coco_url': None,\n", + " 'date_captured': None,\n", + " 'fickr_url': None,\n", + " 'file_name': 'image_3.jpg',\n", + " 'height': 2048,\n", + " 'id': 0,\n", + " 'license': None,\n", + " 'metadata': {},\n", + " 'path': 'image_3.jpg',\n", + " 'width': 2592}]" + ] + }, + "metadata": {}, + "execution_count": 16 + } + ], + "source": [ + "# display image information\n", + "coco_json['images']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "FzQMbZqGKfH8", + "outputId": "d1686f0f-5053-4b73-8353-9cc590635abc" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(833, 188, 784, 747)" + ] + }, + "metadata": {}, + "execution_count": 17 + } + ], + "source": [ + "# display bounding box\n", + "coco_json['annotations'][0]['bbox']" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "bb_to_mask_to_coco.ipynb", + "provenance": [] + }, + "gpuClass": "standard", + "kernelspec": { + "display_name": 
"Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/official/projects/waste_identification_ml/pre_processing/coco_to_tfrecord.ipynb b/official/projects/waste_identification_ml/pre_processing/coco_to_tfrecord.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..4cc32e7cfbb32df4bd1e419f09fa2e6fbe67af14 --- /dev/null +++ b/official/projects/waste_identification_ml/pre_processing/coco_to_tfrecord.ipynb @@ -0,0 +1,397 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Conversion of COCO annotation JSON file to TFRecords" + ], + "metadata": { + "id": "SsIv6LYT84gm" + } + }, + { + "cell_type": "markdown", + "source": [ + "Given a COCO-format annotation JSON file, your goal is to convert it into the TFRecord format required to train the Mask R-CNN model.\n", + "\n", + "To accomplish this, you will clone the TensorFlow Model Garden repository. The TensorFlow Model Garden is a repository with implementations of many state-of-the-art (SOTA) models and modeling solutions for TensorFlow users.\n", + "\n", + "This notebook is an end-to-end example. When you run it, it takes COCO-annotated JSON train and test files as input and converts them into TFRecord files. If your training and validation data are large, you can also write sharded TFRecord files, which are easier for the training pipeline to read and access." + ], + "metadata": { + "id": "zl7o2xEW9IbX" + } + }, + { + "cell_type": "markdown", + "source": [ + "**Note** - This example assumes that all data is stored on Google Drive, that outputs are also written to Google Drive, and that the notebook runs in Google Colab. If you are working on a local workstation, a remote server, or another storage system, change the paths accordingly; the notebook can also be run as a regular Jupyter notebook on a local machine." + ], + "metadata": { + "id": "g3OHfWQBpYVB" + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CRwVTTPuED_1" + }, + "source": [ + "## Install dependencies and connect to your Google Drive" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hdRAEurMA3zi", + "outputId": "7212e558-af5d-4cb2-dd1f-6e634f5fca0a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Collecting tensorflow-addons\n", + " Downloading tensorflow_addons-0.16.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)\n", + "\u001b[?25l\r\u001b[K |████████████████████████████████| 1.1 MB 5.1 MB/s \n", + "\u001b[?25hRequirement already satisfied: typeguard>=2.7 in /usr/local/lib/python3.7/dist-packages (from tensorflow-addons) (2.7.1)\n", + "Installing collected packages: tensorflow-addons\n", + "Successfully installed tensorflow-addons-0.16.1\n" + ] + } + ], + "source": [ + "!pip install tf-nightly\n", + "!pip install tensorflow-addons" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bBN0CZWlD7zl" + }, + "outputs": [], + "source": [ + "# import libraries\n", + "from google.colab import drive\n", + "import sys" + ] + }, + { + "cell_type": "code", + "source": [ + "# The \"opencv-python-headless\" version must match the installed \"opencv-python\" version\n", + "import pkg_resources\n", + "version_number = 
pkg_resources.get_distribution(\"opencv-python\").version\n", + "\n", + "!pip install opencv-python-headless==$version_number" + ], + "metadata": { + "id": "leap_jk5fq_v", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "b5608bb5-24df-4fb1-9885-649ceca98a26" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Collecting opencv-python-headless==4.1.2.30\n", + " Downloading opencv_python_headless-4.1.2.30-cp37-cp37m-manylinux1_x86_64.whl (21.8 MB)\n", + "\u001b[K |████████████████████████████████| 21.8 MB 62.9 MB/s \n", + "\u001b[?25hRequirement already satisfied: numpy>=1.14.5 in /usr/local/lib/python3.7/dist-packages (from opencv-python-headless==4.1.2.30) (1.21.6)\n", + "Installing collected packages: opencv-python-headless\n", + "Successfully installed opencv-python-headless-4.1.2.30\n" + ] + } + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "i80tEP0pEJif", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "cb0d8dde-8852-49eb-e6d7-33653722eee0" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Mounted at /content/gdrive\n", + "Successful\n" + ] + } + ], + "source": [ + "# Connect to Google Drive\n", + "drive.mount('/content/gdrive')\n", + "\n", + "# Create a /mydrive shortcut (symlink) to the Google Drive root folder\n", + "try:\n", + " !ln -s /content/gdrive/My\\ Drive/ /mydrive\n", + " print('Successful')\n", + "except Exception as e:\n", + " print(e)\n", + " print('Not successful')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "w40-VpWXU-Hu" + }, + "source": [ + "## Clone the TensorFlow Model Garden repository" + ] + }, + { + "cell_type": "code", + "source": [ + "# Clone the TensorFlow Model Garden repository, which contains the config files and scripts for this project.
\n", + "# project folder name is - 'waste_identification_ml'\n", + "!git clone https://github.com/tensorflow/models.git " + ], + "metadata": { + "id": "Vh42KtozpqeT" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "# Go to the model folder\n", + "%cd models" + ], + "metadata": { + "id": "wm-k6-S4pr_B" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xNe2NuqjV4uW" + }, + "source": [ + "## Create TFRecord for training data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "J9Nz75g0oJkI" + }, + "outputs": [], + "source": [ + "training_images_folder = '/mydrive/gtech/total_images/' #@param {type:\"string\"}\n", + "training_annotation_file = '/mydrive/gtech/_train.json' #@param {type:\"string\"}\n", + "output_folder = '/mydrive/gtech/train/' #@param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "mjsai7PDAxgp", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "c78c7eaa-36e0-48e0-ba2c-3e674bdc5402" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "I0422 00:06:23.072771 139705362556800 create_coco_tf_record.py:494] writing to output path: /mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/train/\n", + "I0422 00:06:25.089654 139705362556800 create_coco_tf_record.py:366] Building bounding box index.\n", + "I0422 00:06:25.115955 139705362556800 create_coco_tf_record.py:377] 0 images are missing bboxes.\n", + "I0422 00:07:39.273266 139705362556800 tfrecord_lib.py:168] On image 0\n", + "I0422 00:09:03.214606 139705362556800 tfrecord_lib.py:168] On image 100\n", + "I0422 00:10:14.332473 139705362556800 tfrecord_lib.py:168] On image 200\n", + "I0422 00:11:11.556596 139705362556800 tfrecord_lib.py:168] On image 300\n", + "I0422 00:12:11.437826 139705362556800 tfrecord_lib.py:168] On 
image 400\n", + "I0422 00:13:13.166231 139705362556800 tfrecord_lib.py:168] On image 500\n", + "I0422 00:14:21.695016 139705362556800 tfrecord_lib.py:168] On image 600\n", + "I0422 00:15:24.191824 139705362556800 tfrecord_lib.py:168] On image 700\n", + "I0422 00:16:48.620902 139705362556800 tfrecord_lib.py:168] On image 800\n", + "I0422 00:17:48.565592 139705362556800 tfrecord_lib.py:168] On image 900\n", + "I0422 00:18:41.091029 139705362556800 tfrecord_lib.py:168] On image 1000\n", + "I0422 00:19:39.844225 139705362556800 tfrecord_lib.py:168] On image 1100\n", + "I0422 00:20:45.108587 139705362556800 tfrecord_lib.py:168] On image 1200\n", + "I0422 00:22:13.738559 139705362556800 tfrecord_lib.py:168] On image 1300\n", + "I0422 00:23:13.147292 139705362556800 tfrecord_lib.py:168] On image 1400\n", + "I0422 00:24:06.315325 139705362556800 tfrecord_lib.py:168] On image 1500\n", + "I0422 00:24:59.421572 139705362556800 tfrecord_lib.py:168] On image 1600\n", + "I0422 00:25:45.958540 139705362556800 tfrecord_lib.py:168] On image 1700\n", + "I0422 00:26:35.475085 139705362556800 tfrecord_lib.py:168] On image 1800\n", + "I0422 00:27:38.255803 139705362556800 tfrecord_lib.py:168] On image 1900\n", + "I0422 00:28:37.250636 139705362556800 tfrecord_lib.py:168] On image 2000\n", + "I0422 00:29:38.937792 139705362556800 tfrecord_lib.py:168] On image 2100\n", + "I0422 00:30:24.683607 139705362556800 tfrecord_lib.py:168] On image 2200\n", + "I0422 00:31:13.964802 139705362556800 tfrecord_lib.py:168] On image 2300\n", + "I0422 00:32:06.411041 139705362556800 tfrecord_lib.py:168] On image 2400\n", + "I0422 00:33:06.038232 139705362556800 tfrecord_lib.py:168] On image 2500\n", + "I0422 00:34:15.721037 139705362556800 tfrecord_lib.py:168] On image 2600\n", + "I0422 00:35:19.886712 139705362556800 tfrecord_lib.py:168] On image 2700\n", + "I0422 00:36:32.834578 139705362556800 tfrecord_lib.py:168] On image 2800\n", + "I0422 00:38:00.137243 139705362556800 tfrecord_lib.py:168] On image 
2900\n", + "I0422 00:39:24.083769 139705362556800 tfrecord_lib.py:168] On image 3000\n", + "I0422 00:40:47.815561 139705362556800 tfrecord_lib.py:168] On image 3100\n", + "I0422 00:42:01.868806 139705362556800 tfrecord_lib.py:168] On image 3200\n", + "I0422 00:43:10.464518 139705362556800 tfrecord_lib.py:168] On image 3300\n", + "I0422 00:44:08.492330 139705362556800 tfrecord_lib.py:168] On image 3400\n", + "I0422 00:45:06.637591 139705362556800 tfrecord_lib.py:168] On image 3500\n", + "I0422 00:46:17.144057 139705362556800 tfrecord_lib.py:168] On image 3600\n", + "I0422 00:47:34.219212 139705362556800 tfrecord_lib.py:168] On image 3700\n", + "I0422 00:48:47.535176 139705362556800 tfrecord_lib.py:168] On image 3800\n", + "I0422 00:49:44.018001 139705362556800 tfrecord_lib.py:168] On image 3900\n", + "I0422 00:50:46.843277 139705362556800 tfrecord_lib.py:168] On image 4000\n", + "I0422 00:51:42.749161 139705362556800 tfrecord_lib.py:168] On image 4100\n", + "I0422 00:52:29.118489 139705362556800 tfrecord_lib.py:168] On image 4200\n", + "I0422 00:53:12.499863 139705362556800 tfrecord_lib.py:168] On image 4300\n", + "I0422 00:54:02.751904 139705362556800 tfrecord_lib.py:168] On image 4400\n", + "I0422 00:54:54.855237 139705362556800 tfrecord_lib.py:168] On image 4500\n", + "I0422 00:56:11.432259 139705362556800 tfrecord_lib.py:168] On image 4600\n", + "I0422 00:57:12.901312 139705362556800 tfrecord_lib.py:168] On image 4700\n", + "I0422 00:58:15.347571 139705362556800 tfrecord_lib.py:168] On image 4800\n", + "I0422 00:59:13.046698 139705362556800 tfrecord_lib.py:168] On image 4900\n", + "I0422 01:00:38.408758 139705362556800 tfrecord_lib.py:168] On image 5000\n", + "I0422 01:02:03.484946 139705362556800 tfrecord_lib.py:168] On image 5100\n", + "I0422 01:02:57.290261 139705362556800 tfrecord_lib.py:168] On image 5200\n", + "I0422 01:03:54.188467 139705362556800 tfrecord_lib.py:168] On image 5300\n", + "I0422 01:04:49.160263 139705362556800 tfrecord_lib.py:168] On image 
5400\n", + "I0422 01:05:46.782065 139705362556800 tfrecord_lib.py:168] On image 5500\n", + "I0422 01:07:00.913060 139705362556800 tfrecord_lib.py:168] On image 5600\n", + "I0422 01:08:05.558512 139705362556800 tfrecord_lib.py:168] On image 5700\n", + "I0422 01:09:09.658477 139705362556800 tfrecord_lib.py:168] On image 5800\n", + "I0422 01:10:10.147291 139705362556800 tfrecord_lib.py:168] On image 5900\n", + "I0422 01:11:11.286698 139705362556800 tfrecord_lib.py:168] On image 6000\n", + "I0422 01:12:08.696386 139705362556800 tfrecord_lib.py:168] On image 6100\n", + "I0422 01:13:02.225769 139705362556800 tfrecord_lib.py:168] On image 6200\n", + "I0422 01:13:55.910152 139705362556800 tfrecord_lib.py:168] On image 6300\n", + "I0422 01:14:47.861520 139705362556800 tfrecord_lib.py:181] Finished writing, skipped 8 annotations.\n", + "I0422 01:14:47.862285 139705362556800 create_coco_tf_record.py:529] Finished writing, skipped 8 annotations.\n" + ] + } + ], + "source": [ + "# run the script to convert your json file to TFRecord file\n", + "# --num_shards (how many TFRecord sharded files you want)\n", + "!python3 -m official.vision.data.create_coco_tf_record --logtostderr \\\n", + " --image_dir=$training_images_folder \\\n", + " --object_annotations_file=$training_annotation_file \\\n", + " --output_file_prefix=$output_folder \\\n", + " --num_shards=100 \\\n", + " --include_masks=True \\\n", + " --num_processes=0" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zwazp89SojMA" + }, + "source": [ + "## Create TFRecord for validation data" + ] + }, + { + "cell_type": "code", + "source": [ + "validation_images_folder = '/mydrive/gtech/total_images/' #@param {type:\"string\"}\n", + "validation_annotation_file = '/mydrive/gtech/_val.json' #@param {type:\"string\"}\n", + "output_folder = '/mydrive/gtech/val/' #@param {type:\"string\"}" + ], + "metadata": { + "id": "OVQn5DiFBUfv" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + 
"execution_count": null, + "metadata": { + "id": "nWbKeLoVwXbi", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "63f4fc03-43b1-424e-dfb2-200f9bbdf1e5" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "I0421 20:53:39.071351 140304098097024 create_coco_tf_record.py:494] writing to output path: /mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/val/\n", + "I0421 20:53:40.622877 140304098097024 create_coco_tf_record.py:366] Building bounding box index.\n", + "I0421 20:53:40.627101 140304098097024 create_coco_tf_record.py:377] 0 images are missing bboxes.\n", + "I0421 20:54:41.275259 140304098097024 tfrecord_lib.py:168] On image 0\n", + "I0421 20:56:53.052898 140304098097024 tfrecord_lib.py:168] On image 100\n", + "I0421 20:59:01.886727 140304098097024 tfrecord_lib.py:168] On image 200\n", + "I0421 21:01:12.356394 140304098097024 tfrecord_lib.py:168] On image 300\n", + "I0421 21:03:03.635432 140304098097024 tfrecord_lib.py:168] On image 400\n", + "I0421 21:05:04.787051 140304098097024 tfrecord_lib.py:168] On image 500\n", + "I0421 21:06:52.991898 140304098097024 tfrecord_lib.py:168] On image 600\n", + "I0421 21:09:02.626780 140304098097024 tfrecord_lib.py:168] On image 700\n", + "I0421 21:11:39.070799 140304098097024 tfrecord_lib.py:168] On image 800\n", + "I0421 21:13:58.603258 140304098097024 tfrecord_lib.py:168] On image 900\n", + "I0421 21:16:23.214870 140304098097024 tfrecord_lib.py:168] On image 1000\n", + "I0421 21:18:25.072518 140304098097024 tfrecord_lib.py:168] On image 1100\n", + "I0421 21:20:29.223420 140304098097024 tfrecord_lib.py:168] On image 1200\n", + "I0421 21:22:34.431273 140304098097024 tfrecord_lib.py:168] On image 1300\n", + "I0421 21:24:29.066092 140304098097024 tfrecord_lib.py:168] On image 1400\n", + "I0421 21:26:33.851860 140304098097024 tfrecord_lib.py:168] On image 1500\n", + "I0421 21:28:25.426244 140304098097024 tfrecord_lib.py:168] On image 
1600\n", + "I0421 21:28:59.923923 140304098097024 tfrecord_lib.py:181] Finished writing, skipped 2 annotations.\n", + "I0421 21:28:59.924295 140304098097024 create_coco_tf_record.py:529] Finished writing, skipped 2 annotations.\n" + ] + } + ], + "source": [ + "# run the script to convert your json file to TFRecord file\n", + "# --num_shards (how many TFRecord sharded files you want)\n", + "!python3 -m official.vision.data.create_coco_tf_record --logtostderr \\\n", + " --image_dir=$validation_images_folder \\\n", + " --object_annotations_file=$validation_annotation_file \\\n", + " --output_file_prefix=$output_folder \\\n", + " --num_shards=100 \\\n", + " --include_masks=True \\\n", + " --num_processes=0" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "collapsed_sections": [], + "machine_shape": "hm", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/official/projects/waste_identification_ml/pre_processing/config/categories_list_of_dictionaries.py b/official/projects/waste_identification_ml/pre_processing/config/categories_list_of_dictionaries.py new file mode 100644 index 0000000000000000000000000000000000000000..104e8e7d3b907637bcd7d2c58d9a2277523b6d47 --- /dev/null +++ b/official/projects/waste_identification_ml/pre_processing/config/categories_list_of_dictionaries.py @@ -0,0 +1,90 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Create a list of dictionaries for categories according to the taxonomy. + +Example usage: + build_material(MATERIAL_LIST, 'material-types') + build_material(MATERIAL_FORM_LIST, 'material-form-types') + build_material(MATERIAL_SUBCATEGORY_LIST, 'material-subcategory-types') + build_material(PLASTICS_SUBCATEGORY_LIST, 'plastic-types') +""" +#!
/usr/bin/env python + +from typing import List, Dict, Union + +MATERIAL_LIST = [ + 'Inorganic-wastes', 'Textiles', 'Rubber-and-Leather', 'Wood', 'Food', + 'Plastics', 'Yard-trimming', 'Fiber', 'Glass', 'Metals' +] + +MATERIAL_FORM_LIST = [ + 'Flexibles', 'Bottle', 'Jar', 'Carton', 'Sachets-&-Pouch', 'Blister-pack', + 'Tray', 'Tube', 'Can', 'Tub', 'Cosmetic', 'Box', 'Clothes', 'Bulb', + 'Cup-&-glass', 'Book-&-magazine', 'Bag', 'Lid', 'Clamshell', 'Mirror', + 'Tangler', 'Cutlery', 'Cassette-&-tape', 'Electronic-devices', 'Battery', + 'Pen-&-pencil', 'Paper-products', 'Foot-wear', 'Scissor', 'Toys', 'Brush', + 'Pipe', 'Foil', 'Hangers' +] + +MATERIAL_SUBCATEGORY_LIST = [ + 'HDPE_Flexible_Color', 'HDPE_Rigid_Color', 'LDPE_Flexible_Color', + 'LDPE_Rigid_Color', 'PP_Flexible_Color', 'PP_Rigid_Color', 'PETE', 'PS', + 'PVC', 'Others-MLP', 'Others-Tetrapak', 'Others-HIPC', 'Aluminium', + 'Ferrous_Iron', 'Ferrous_Steel', 'Non-ferrous_Lead', 'Non-ferrous_Copper', + 'Non-ferrous_Zinc' +] + +PLASTICS_SUBCATEGORY_LIST = [ + 'HDPE', 'PETE', 'LDPE', 'PS', 'PP', 'PVC', 'Others-MLP', 'Others-Tetrapak', + 'Others-HIPC' +] + + +def build_material(category_list: List[str], + supercategory: str) -> List[Dict[str, Union[int, str]]]: + """Creates a list of dictionaries for the category classes. 
+ + Args: + category_list: list of categories from MATERIAL_LIST, MATERIAL_FORM_LIST, + MATERIAL_SUBCATEGORY_LIST, PLASTICS_SUBCATEGORY_LIST + supercategory: supercategory can be 'material-types', 'material-form-types', + 'material-subcategory-types', 'material-form-subcategory-types', + 'plastic-types' + + Returns: + List of dictionaries with 'id' (starting at 1), 'name' and 'supercategory'. + """ + list_of_dictionaries = [] + for num, m in enumerate(category_list, start=1): + list_of_dictionaries.append({ + 'id': num, + 'name': m, + 'supercategory': supercategory + }) + return list_of_dictionaries diff --git a/official/projects/waste_identification_ml/pre_processing/config/config.ini b/official/projects/waste_identification_ml/pre_processing/config/config.ini new file mode 100644 index 0000000000000000000000000000000000000000..9c24c1c4de16dbfb660220e283f5f8708ac8e44f --- /dev/null +++ b/official/projects/waste_identification_ml/pre_processing/config/config.ini @@ -0,0 +1,24 @@ +[config] +config_folder_path = /mydrive/TFVision/pre-processing/config/ + +[paths] +annotation_path = /mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/18012022/annotations_18012022_coco.json +images_folder_path = /mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/18012022/images +new_annotation_path = /mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/18012022/material_annotations_18012022_coco.json + +[merge] +input_files = 
/mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/20122021/material_annotations_20122021_coco.json,/mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/27122021/material_annotations_27122021_coco.json,/mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/03012022/material_annotations_03012022_coco.json,/mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/10012022/material_annotations_10012022_coco.json,/mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/18012022/material_annotations_18012022_coco.json +output_file = /mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/output.json + +[split] +input_file = /mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/output.json +output_folder = /mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/ + +[tfrecord] +tensorflow_model_folder = /mydrive/TFVision/ +training_data_folder = /mydrive/TFVision/tfrecords/train/ +validation_data_folder = /mydrive/TFVision/tfrecords/val/ +training_images_folder = /mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/Total_images/train/ +training_annotation_file = /mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/_train.json +validation_images_folder = /mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/Total_images/validation/ +validation_annotation_file = /mydrive/gtech/MRFs/Recykal/Latest_sharing_by_sanket/Google_Recykal/Taxonomy_version_2/_val.json diff --git a/official/projects/waste_identification_ml/pre_processing/config/data/material_form_labels.pbtxt b/official/projects/waste_identification_ml/pre_processing/config/data/material_form_labels.pbtxt new file mode 100644 index 
0000000000000000000000000000000000000000..2aa36d5f18e8cb57f662b3b00613015d2c56bb3e --- /dev/null +++ b/official/projects/waste_identification_ml/pre_processing/config/data/material_form_labels.pbtxt @@ -0,0 +1,137 @@ + +item { +id:1 +name:'Flexibles' +} +item { +id:2 +name:'Bottle' +} +item { +id:3 +name:'Jar' +} +item { +id:4 +name:'Carton' +} +item { +id:5 +name:'Sachets-&-Pouch' +} +item { +id:6 +name:'Blister-pack' +} +item { +id:7 +name:'Tray' +} +item { +id:8 +name:'Tube' +} +item { +id:9 +name:'Can' +} +item { +id:10 +name:'Tub' +} +item { +id:11 +name:'Cosmetic' +} +item { +id:12 +name:'Box' +} +item { +id:13 +name:'Clothes' +} +item { +id:14 +name:'Bulb' +} +item { +id:15 +name:'Cup-&-glass' +} +item { +id:16 +name:'Book-&-magazine' +} +item { +id:17 +name:'Bag' +} +item { +id:18 +name:'Lid' +} +item { +id:19 +name:'Clamshell' +} +item { +id:20 +name:'Mirror' +} +item { +id:21 +name:'Tangler' +} +item { +id:22 +name:'Cutlery' +} +item { +id:23 +name:'Cassette-&-tape' +} +item { +id:24 +name:'Electronic-devices' +} +item { +id:25 +name:'Battery' +} +item { +id:26 +name:'Pen-&-pencil' +} +item { +id:27 +name:'Paper-products' +} +item { +id:28 +name:'Foot-wear' +} +item { +id:29 +name:'Scissor' +} +item { +id:30 +name:'Toys' +} +item { +id:31 +name:'Brush' +} +item { +id:32 +name:'Pipe' +} +item { +id:33 +name:'Foil' +} +item { +id:34 +name:'Hangers' +} diff --git a/official/projects/waste_identification_ml/pre_processing/config/data/material_labels.pbtxt b/official/projects/waste_identification_ml/pre_processing/config/data/material_labels.pbtxt new file mode 100644 index 0000000000000000000000000000000000000000..287949f761b97e6aaffafc7c33646029bb4a2b88 --- /dev/null +++ b/official/projects/waste_identification_ml/pre_processing/config/data/material_labels.pbtxt @@ -0,0 +1,41 @@ + +item { +id:1 +name:'Inorganic-waste' +} +item { +id:2 +name:'Textiles' +} +item { +id:3 +name:'Rubber-and-Leather' +} +item { +id:4 +name:'Wood' +} +item { +id:5 +name:'Food' +} 
+item { +id:6 +name:'Plastics' +} +item { +id:7 +name:'Yard-trimming' +} +item { +id:8 +name:'Fiber' +} +item { +id:9 +name:'Glass' +} +item { +id:10 +name:'Metals' +} diff --git a/official/projects/waste_identification_ml/pre_processing/config/data/plastic_type_labels.pbtxt b/official/projects/waste_identification_ml/pre_processing/config/data/plastic_type_labels.pbtxt new file mode 100644 index 0000000000000000000000000000000000000000..568b07555affaaf434781e9c694da9baf9039aea --- /dev/null +++ b/official/projects/waste_identification_ml/pre_processing/config/data/plastic_type_labels.pbtxt @@ -0,0 +1,37 @@ + +item { +id:1 +name:'HDPE' +} +item { +id:2 +name:'PETE' +} +item { +id:3 +name:'LDPE' +} +item { +id:4 +name:'PS' +} +item { +id:5 +name:'PP' +} +item { +id:6 +name:'PVC' +} +item { +id:7 +name:'Others-MLP' +} +item { +id:8 +name:'Others-Tetrapak' +} +item { +id:9 +name:'Others-HIPC' +} diff --git a/official/projects/waste_identification_ml/pre_processing/config/sample_images/ffdeb4cd-43ba-4ca0-a1e6-aa5824005f44.jpg b/official/projects/waste_identification_ml/pre_processing/config/sample_images/ffdeb4cd-43ba-4ca0-a1e6-aa5824005f44.jpg new file mode 100644 index 0000000000000000000000000000000000000000..10bcd073e8f7a55a3a2b623c60230b988a00e519 Binary files /dev/null and b/official/projects/waste_identification_ml/pre_processing/config/sample_images/ffdeb4cd-43ba-4ca0-a1e6-aa5824005f44.jpg differ diff --git a/official/projects/waste_identification_ml/pre_processing/config/sample_images/image_2.png b/official/projects/waste_identification_ml/pre_processing/config/sample_images/image_2.png new file mode 100644 index 0000000000000000000000000000000000000000..a63244352c616ff974a3539b2e8e5c85b91608e3 Binary files /dev/null and b/official/projects/waste_identification_ml/pre_processing/config/sample_images/image_2.png differ diff --git a/official/projects/waste_identification_ml/pre_processing/config/sample_images/image_3.jpg 
b/official/projects/waste_identification_ml/pre_processing/config/sample_images/image_3.jpg new file mode 100644 index 0000000000000000000000000000000000000000..34d5e2be84b37677bf5bf62b686a570b52cb6ca6 Binary files /dev/null and b/official/projects/waste_identification_ml/pre_processing/config/sample_images/image_3.jpg differ diff --git a/official/projects/waste_identification_ml/pre_processing/config/sample_json/dataset.json b/official/projects/waste_identification_ml/pre_processing/config/sample_json/dataset.json new file mode 100644 index 0000000000000000000000000000000000000000..e33f12c33df9bed4d8e78f72a06762fead9783a1 --- /dev/null +++ b/official/projects/waste_identification_ml/pre_processing/config/sample_json/dataset.json @@ -0,0 +1 @@ +{"images":[{"height":2048,"width":2592,"id":1,"file_name":"ffdeb4cd-43ba-4ca0-a1e6-aa5824005f44.jpg"},{"height":1080,"width":1920,"id":2,"file_name":"image_2.png"}],"annotations":[{"iscrowd":0,"image_id":1,"bbox":[832,255,729,697],"segmentation":[[994,255,990,398,989,536,996,565,971,624,832,772,852,801,937,817,1113,870,1247,903,1304,941,1363,941,1392,951,1413,834,1458,711,1511,572,1561,401,1561,386,1377,337,1059,258]],"category_id":0,"id":1,"area":311359},{"iscrowd":0,"image_id":2,"bbox":[84,305,105,116],"segmentation":[[84,346,87,377,100,399,137,420,174,417,189,378,186,340,169,319,147,304,115,311,94,325]],"category_id":1,"id":2,"area":9352},{"iscrowd":0,"image_id":2,"bbox":[671,80,232,105],"segmentation":[[689,107,697,174,742,177,798,185,887,172,903,162,897,98,887,88,858,80,829,80,810,104,766,103,671,104]],"category_id":2,"id":3,"area":17023},{"iscrowd":0,"image_id":2,"bbox":[646,235,234,376],"segmentation":[[645,243,655,282,652,311,655,346,661,383,679,419,710,483,745,554,760,590,768,603,774,607,790,611,831,591,871,556,879,530,836,465,760,311,697,261,668,235]],"category_id":3,"id":4,"area":41260},{"iscrowd":0,"image_id":2,"bbox":[54,640,342,284],"segmentation":[[355,640,187,761,60,875,53,888,87,914,105,924,160,891,227,845,28
1,798,303,774,348,745,394,704,395,678]],"category_id":4,"id":5,"area":30998},{"iscrowd":0,"image_id":2,"bbox":[513,622,248,202],"segmentation":[[513,733,631,665,660,664,687,659,724,641,731,632,745,622,761,646,761,667,744,678,716,693,684,714,676,743,550,824]],"category_id":5,"id":6,"area":19383}],"categories":[{"id":0,"name":"plastics_HDPE_flexible_color_SAchets-&-pouch_pouch","supercategory":"plastics_HDPE_flexible_color_SAchets-&-pouch_pouch"},{"id":1,"name":"Plastics_HDPE_Rigid_Blue_Lid_Bottle-Cap_Na_Na","supercategory":"Plastics_HDPE_Rigid_Blue_Lid_Bottle-Cap_Na_Na"},{"id":2,"name":"Plastics_peTE_Na_Clear_Bottle_Shampoo-Bottle_250Ml_Vlcc","supercategory":"Plastics_peTE_Na_Clear_Bottle_Shampoo-Bottle_250Ml_Vlcc"},{"id":3,"name":"Plastics_na_Rigid_Blue_Bottle_Hair-Oil-Bottle-500Ml_Parachute","supercategory":"Plastics_na_Rigid_Blue_Bottle_Hair-Oil-Bottle-500Ml_Parachute"},{"id":4,"name":"Plastics_HDPE_Rigid_Na_Cosmetic_Comb_Na_Na","supercategory":"Plastics_HDPE_Rigid_Na_Cosmetic_Comb_Na_Na"},{"id":5,"name":"Plastics_PETE_Na_Clear_Bottle_Energy-Drink-Bottle_250Ml_Sting-Energy","supercategory":"Plastics_PETE_Na_Clear_Bottle_Energy-Drink-Bottle_250Ml_Sting-Energy"}]} \ No newline at end of file diff --git a/official/projects/waste_identification_ml/pre_processing/config/sample_json/ffdeb4cd-43ba-4ca0-a1e6-aa5824005f44.json b/official/projects/waste_identification_ml/pre_processing/config/sample_json/ffdeb4cd-43ba-4ca0-a1e6-aa5824005f44.json new file mode 100644 index 0000000000000000000000000000000000000000..1b816e53340ae23b0d4b865d300e7099309e25fe --- /dev/null +++ b/official/projects/waste_identification_ml/pre_processing/config/sample_json/ffdeb4cd-43ba-4ca0-a1e6-aa5824005f44.json @@ -0,0 +1,98 @@ +{ + "version": "4.5.13", + "flags": {}, + "shapes": [ + { + "label": "plastics_HDPE_flexible_color_SAchets-&-pouch_pouch", + "points": [ + [ + 994.3103448275863, + 255.0689655172414 + ], + [ + 990.8620689655174, + 398.1724137931035 + ], + [ + 989.1379310344828, + 
536.1034482758621 + ], + [ + 996.0344827586207, + 565.4137931034484 + ], + [ + 971.8965517241381, + 624.0344827586207 + ], + [ + 832.2413793103449, + 772.3103448275863 + ], + [ + 852.9310344827588, + 801.6206896551724 + ], + [ + 937.4137931034484, + 817.1379310344828 + ], + [ + 1113.2758620689656, + 870.5862068965517 + ], + [ + 1247.7586206896553, + 903.344827586207 + ], + [ + 1304.6551724137933, + 941.2758620689656 + ], + [ + 1363.2758620689656, + 941.2758620689656 + ], + [ + 1392.586206896552, + 951.6206896551724 + ], + [ + 1413.2758620689656, + 834.3793103448277 + ], + [ + 1458.1034482758623, + 711.9655172413794 + ], + [ + 1511.5517241379312, + 572.3103448275863 + ], + [ + 1561.5517241379312, + 401.62068965517244 + ], + [ + 1561.5517241379312, + 386.1034482758621 + ], + [ + 1377.0689655172414, + 337.82758620689657 + ], + [ + 1059.8275862068967, + 258.51724137931035 + ] + ], + "group_id": null, + "shape_type": "polygon", + "flags": {} + } + ], + "imagePath": "ffdeb4cd-43ba-4ca0-a1e6-aa5824005f44.jpg", + "imageData": "", + "imageHeight": 2048, + "imageWidth": 2592 +} \ No newline at end of file diff --git a/official/projects/waste_identification_ml/pre_processing/config/visualization.py b/official/projects/waste_identification_ml/pre_processing/config/visualization.py new file mode 100644 index 0000000000000000000000000000000000000000..b8a79c86c3d2f22fab0284f88e184f13049f7ff3 --- /dev/null +++ b/official/projects/waste_identification_ml/pre_processing/config/visualization.py @@ -0,0 +1,109 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Visualizes the category distribution in an annotated JSON file.""" + +#! /usr/bin/env python3 + +import json +import numpy as np +import pandas as pd + + +def data_creation(path: str) -> pd.DataFrame: + """Creates a DataFrame with per-category object and image counts. + + Args: + path: path to the annotated JSON file. + + Returns: + DataFrame indexed by category name with object and image counts. + """ + # load the annotation file + with open(path) as json_file: + data = json.load(json_file) + + # collect category names and the category/image ID of every annotation + category_names = [i['name'] for i in data['categories']] + category_ids = [i['category_id'] for i in data['annotations']] + image_ids = [i['image_id'] for i in data['annotations']] + + # per-category counts; assumes category IDs run from 1 to N + df = pd.DataFrame( + list(zip(category_ids, image_ids)), columns=['category_ids', 'image_ids']) + df = df.groupby('category_ids').agg( + object_count=('category_ids', 'count'), + image_count=('image_ids', 'nunique')) + df = df.reindex(range(1, len(data['categories']) + 1), fill_value=0) + df.index = category_names + return df + + +def visualize_detailed_counts_horizontally(path: str) -> None: + """Plot a vertical bar graph showing the counts of images & categories. + + Args: + path: path to the annotated JSON file.
+ """ + df = data_creation(path) + ax = df.plot( + kind='bar', + figsize=(40, 10), + xlabel='Categories', + ylabel='Counts', + width=0.8, + linewidth=1, + edgecolor='white') # rot = 0 for horizontal labeling + for p in ax.patches: + ax.annotate( + text=np.round(p.get_height()), + xy=(p.get_x() + p.get_width() / 2., p.get_height()), + ha='center', + va='top', + xytext=(4, 14), + textcoords='offset points') + + +def visualize_detailed_counts_vertically(path: str) -> None: + """Plot a horizontal bar graph showing the counts of images & categories. + + Args: + path: path to the annotated JSON file. + """ + df = data_creation(path) + ax = df.plot( + kind='barh', + figsize=(15, 40), + xlabel='Categories', + ylabel='Counts', + width=0.6) + for p in ax.patches: + ax.annotate( + str(p.get_width()), (p.get_x() + p.get_width(), p.get_y()), + xytext=(4, 6), + textcoords='offset points') + + +def visualize_annotation_file(path: str) -> None: + """Plot a bar graph showing the category distribution. + + Args: + path: path to the annotated JSON file. 
+ """ + df = data_creation(path) + df['object_count'].plot.bar( + figsize=(20, 5), + width=0.5, + xlabel='Material types', + ylabel='count of material types') diff --git a/official/projects/waste_identification_ml/pre_processing/json_preparation.ipynb b/official/projects/waste_identification_ml/pre_processing/json_preparation.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..9fb846a5a7b9bf921051ad00054788f539dbef81 --- /dev/null +++ b/official/projects/waste_identification_ml/pre_processing/json_preparation.ipynb @@ -0,0 +1,750 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "0JmF5ohLbPlF" + }, + "source": [ + "# Pre-processing a COCO JSON annotation file" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uXwwz3PlbUX2" + }, + "source": [ + "Given a single COCO-annotated JSON file, the goal is to pre-process it to remove noise and convert it into a form suitable for training an ML model. This notebook also checks whether the annotated images are broken or missing." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "E1SxGZD2bv8E" + }, + "source": [ + "The COCO annotation file includes the following:\n", + "\n", + "1. Names of the images.\n", + "\n", + "2. Dimensions of the images.\n", + "\n", + "3. Classes (categories) of the annotated objects.\n", + "\n", + "4. Names of the supercategories of the classes.\n", + "\n", + "5. Area covered by the segmented pixels in an image.\n", + "\n", + "6. Bounding box coordinates.\n", + "\n", + "7. Annotated segmentation coordinates." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "j0v31gxTbweO" + }, + "source": [ + "Real-world annotation files contain a lot of noise. Image names may be wrong. Images listed in an annotation file may be missing from the image folder, which disrupts model training. The fields within an annotation file may be inconsistent with each other. 
Even the files in the image folder may be broken or truncated, which causes errors when the images are read. The goal is to eliminate all of these problems." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PyFn96EKb7A-" + }, + "source": [ + "In particular, the values stored under the different keys must be consistent with one another. This notebook helps you achieve that." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "W6aXxxox0DDa" + }, + "source": [ + "## Import labels and sample JSON file\n", + "To get the full set of classes for material, material_form and plastic_type, we download the category definitions (`categories_list_of_dictionaries.py`) from the waste_identification_ml project in the TensorFlow Model Garden.\n", + "We also download a noisy sample JSON file and a sample image to use as an example." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "WluEHMZYm0zM", + "outputId": "b8c4738c-4636-4c56-c6ea-b4e4da92474c", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + " % Total % Received % Xferd Average Speed Time Time Time Current\n", + " Dload Upload Total Spent Left Speed\n", + "\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 3536 100 3536 0 0 29714 0 --:--:-- --:--:-- --:--:-- 29714\n", + " % Total % Received % Xferd Average Speed Time Time Time Current\n", + " Dload Upload Total Spent Left Speed\n", + "\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 2427 100 2427 0 0 18248 0 --:--:-- --:--:-- --:--:-- 18248\n", + " % Total % Received % Xferd Average Speed Time Time Time Current\n", + " Dload Upload Total Spent Left Speed\n", + "\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 3303k 100 3303k 0 0 14.1M 0 --:--:-- --:--:-- --:--:-- 14.2M\n" + ] + } + ], + "source": [ + "%%bash\n", + "curl -O 
https://raw.githubusercontent.com/tensorflow/models/master/official/projects/waste_identification_ml/pre_processing/config/categories_list_of_dictionaries.py\n", + "\n", + "curl -O https://raw.githubusercontent.com/tensorflow/models/master/official/projects/waste_identification_ml/pre_processing/config/sample_json/dataset.json\n", + "\n", + "mkdir image_folder\n", + "\n", + "curl -o image_folder/image_2.png https://raw.githubusercontent.com/tensorflow/models/master/official/projects/waste_identification_ml/pre_processing/config/sample_images/image_2.png" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MRhCAFlVcRm0" + }, + "source": [ + "## Import the required libraries" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Mnxbo8GBcN2O" + }, + "outputs": [], + "source": [ + "import glob\n", + "import tqdm\n", + "import json\n", + "from PIL import Image\n", + "import subprocess\n", + "import copy\n", + "import os\n", + "from google.colab import files\n", + "from categories_list_of_dictionaries import *" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "f-05VwsL0mCi" + }, + "outputs": [], + "source": [ + "# reading labels \n", + "\n", + "images_folder_path = 'image_folder/' #@param {type:\"string\"}\n", + "list_of_material = build_material(MATERIAL_LIST,'material-types')\n", + "list_of_material_form = build_material(MATERIAL_FORM_LIST,'material-form-types')\n", + "list_of_plastic_type = build_material(PLASTICS_SUBCATEGORY_LIST,'plastic-types')" + ] + }, + { + "cell_type": "code", + "source": [ + "# common labeling typo errors\n", + "_KNOWN_TYPOS = {\n", + " 'and': '&',\n", + " 'Cassete': 'Cassette',\n", + " 'Toy':'Toys',\n", + " 'Mug-&-Tub':'tub',\n", + " 'Toyss':'toys'\n", + "}\n", + "_KNOWN_TYPOS" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "p7xwZDoc5rZU", + "outputId": "33d060a4-f4aa-475a-e140-30bc222553a3" + }, + "execution_count": 
null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'and': '&',\n", + " 'Cassete': 'Cassette',\n", + " 'Toy': 'Toys',\n", + " 'Mug-&-Tub': 'tub',\n", + " 'Toyss': 'toys'}" + ] + }, + "metadata": {}, + "execution_count": 4 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "958ZSjT_eG_b" + }, + "source": [ + "## Utility functions" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tGOCdeiucUgq" + }, + "outputs": [], + "source": [ + "def read_json(file):\n", + " \"\"\"Read any JSON file.\n", + "\n", + " Args:\n", + " file: path to the file\n", + " \"\"\"\n", + " with open(file) as json_file:\n", + " data = json.load(json_file)\n", + " return data\n", + "\n", + "\n", + "def search_dict_value(dic, id):\n", + " \"\"\"Return the first key in the dictionary that maps to the given value.\n", + "\n", + " Args:\n", + " dic: mapping to search by value.\n", + " id: value to search for.\n", + " \"\"\"\n", + " key_list = list(dic.keys())\n", + " val_list = list(dic.values())\n", + " position = val_list.index(id)\n", + " return key_list[position]\n", + "\n", + "\n", + "def delete_truncated_images(folder_path: str) -> None:\n", + " \"\"\"Find and delete truncated images.\n", + "\n", + " Args:\n", + " folder_path: path to the folder where images are saved.\n", + " \"\"\"\n", + " # list the contents of the images folder\n", + " files = glob.glob(folder_path + '/*')\n", + " print('Total number of files in the folder:', len(files))\n", + "\n", + " num = 0\n", + "\n", + " # try to read every image file and delete it from the directory if it is broken\n", + " for file in tqdm.tqdm(files):\n", + " if file.endswith(('.png','.jpg')):\n", + " try:\n", + " img = Image.open(file)\n", + " img.verify()\n", + " except Exception:\n", + " num = num + 1\n", + " subprocess.run(['rm', file])\n", + " print('Broken file name: ' + file)\n", + " if num == 0:\n", + " print('\\nNo broken images found')\n", + " else:\n", + " 
print('Total number of broken images found:', num)\n", + "\n", + "\n", + "def spelling_correction(dic):\n", + " \"\"\"Correct common spelling mistakes in the category names, in place.\"\"\"\n", + " for i in dic['categories']:\n", + " for old, new in _KNOWN_TYPOS.items():\n", + " i['name'] = i['name'].replace(old, new)\n", + "\n", + "\n", + "def labeling_correction(dic, num, labels_dict):\n", + " \"\"\"Match annotated labels against the reference labels and flag the mistakes.\n", + "\n", + " Map each annotated category ID to the corresponding reference ID so that the\n", + " categories are aligned.\n", + "\n", + " Args:\n", + " dic: JSON file read as a dictionary\n", + " num: keyword position inside the label\n", + " labels_dict: mapping from reference category ID to lowercase label name\n", + " \"\"\"\n", + " incorrect_labels = []\n", + " mapping_list = []\n", + " for i in dic['categories']:\n", + " if i['name'].split('_')[num].lower() in labels_dict.values():\n", + " id = i['id']\n", + " name = i['name'].split('_')[num]\n", + " id_match = search_dict_value(labels_dict, i['name'].split('_')[num].lower())\n", + " mapping_list.append((id, name, id_match))\n", + " else:\n", + " id = i['id']\n", + " incorrect_labels.append(id)\n", + " return mapping_list, incorrect_labels\n", + "\n", + "\n", + "def images_key(dic):\n", + " \"\"\"Align the data within the dictionary in the 'images' key.\n", + "\n", + " The 'image_id' field in the 'annotations' key refers to 'id' in the 'images' key of the dictionary. 
This function \n", + " also removes all image entries from the 'images' key whose 'id' does not \n", + " match any 'image_id' in the 'annotations' key.\n", + "\n", + " Args:\n", + " dic: annotation file read as a dictionary\n", + " \"\"\"\n", + " image_ids = set(i['image_id'] for i in dic['annotations'])\n", + " new_images = [i for i in dic['images'] if i['id'] in image_ids]\n", + " return new_images\n", + "\n", + "\n", + "def annotations_key(dic, incorrect_labels, mapping_dict):\n", + " \"\"\"Align the data within the dictionary in the 'annotations' key.\n", + "\n", + " The 'category_id' in the 'annotations' key refers to 'id'\n", + " in the 'categories' key of the dictionary.\n", + "\n", + " Args:\n", + " dic: annotation file read as a dictionary\n", + " \"\"\"\n", + " new_annotation = []\n", + "\n", + " for i in dic['annotations']:\n", + " id = i['category_id']\n", + " if id not in incorrect_labels:\n", + " new_id = [i[2] for i in mapping_dict if i[0] == id][0]\n", + " i['category_id'] = new_id\n", + " new_annotation.append(i)\n", + " return new_annotation\n", + "\n", + "\n", + "def annotated_images(folder_path, dic):\n", + " \"\"\"Remove image entries listed in the annotation file but missing from the image folder.\n", + "\n", + " Args:\n", + " folder_path: path of an image folder.\n", + " \"\"\"\n", + " # read the file names from the directory\n", + " files = glob.glob(folder_path + '/*')\n", + " files = set(map(os.path.basename, files))\n", + "\n", + " # keep only the images whose files exist in the folder\n", + " dic['images'] = [i for i in dic['images'] if i['file_name'] in files]\n", + " return dic\n", + "\n", + "\n", + "def image_annotation_key(dic):\n", + " \"\"\"Check that the same images are present in both the \"images\" and \"annotations\" keys. 
\n", + "\n", + " Find the image IDs that appear in only one of the \"images\" and \"annotations\" keys\n", + " and remove their entries so that both keys refer to the same set of images.\n", + "\n", + " Args:\n", + " dic: annotation file read as a dictionary\n", + " \"\"\"\n", + " images_id = [i['id'] for i in dic['images']]\n", + " annotation_id = [i['image_id'] for i in dic['annotations']]\n", + " common_list = set(images_id).intersection(annotation_id)\n", + " dic['images'] = [i for i in dic['images'] if i['id'] in common_list]\n", + " dic['annotations'] = [i for i in dic['annotations'] if i['image_id'] in common_list]\n", + " return dic" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0OoDmNC22ycz" + }, + "source": [ + "## Find and delete truncated images from the image folder." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "bUUu3F6I20w3", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "21ad1270-0486-4eb8-b4ed-07d7dd99b936" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Total number of files in the folder: 1\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "100%|██████████| 1/1 [00:00<00:00, 30.21it/s]" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "No broken images found\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "\n" + ] + } + ], + "source": [ + "delete_truncated_images(images_folder_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "65XuyPBSea7-" + }, + "source": [ + "## Perform operations on the file\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "l-uMtZK2edPY", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "428298ff-af02-44c2-c06b-7223646a70db" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "dict_keys(['images', 'annotations', 
'categories'])\n" + ] + } + ], + "source": [ + "# read the JSON file; it should contain at least the three keys shown below\n", + "path_to_json = 'dataset.json' #@param {type:\"string\"}\n", + "data = read_json(path_to_json)\n", + "print(data.keys())\n", + "\n", + "# keep an unmodified copy to compare against the results at the end\n", + "data_preprocessing = copy.deepcopy(data)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "G8w7MfDtvDIq", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "4c33d866-e788-4b59-cdb2-b3b929c25f7b" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "100%|██████████| 6/6 [00:00<00:00, 51463.85it/s]" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "Total number of wrong annotated labels are 5\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "\n" + ] + } + ], + "source": [ + "# check for labeling mistakes: every annotated label should have 6 keywords joined by '_'\n", + "num = 0\n", + "for i in tqdm.tqdm(data['categories']):\n", + " if len(i['name'].split('_')) != 6:\n", + " num += 1\n", + "print('\\nTotal number of wrong annotated labels are', num)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "q2jOWegZxPEp", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "0d44c754-aead-422a-c6bc-3fa304acd2b5" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "Total number of labels which has less than 6 keywords are 0\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[{'id': 0,\n", + " 'name': 'plastics_HDPE_flexible_color_SAchets-&-pouch_pouch',\n", + " 'supercategory': 'plastics_HDPE_flexible_color_SAchets-&-pouch_pouch'},\n", + " {'id': 1,\n", + " 'name': 'Plastics_HDPE_Rigid_Blue_Lid_Bottle-Cap_Na_Na',\n", + " 'supercategory': 
'Plastics_HDPE_Rigid_Blue_Lid_Bottle-Cap_Na_Na'},\n", + " {'id': 2,\n", + " 'name': 'Plastics_peTE_Na_Clear_Bottle_Shampoo-Bottle_250Ml_Vlcc',\n", + " 'supercategory': 'Plastics_peTE_Na_Clear_Bottle_Shampoo-Bottle_250Ml_Vlcc'},\n", + " {'id': 3,\n", + " 'name': 'Plastics_na_Rigid_Blue_Bottle_Hair-Oil-Bottle-500Ml_Parachute',\n", + " 'supercategory': 'Plastics_na_Rigid_Blue_Bottle_Hair-Oil-Bottle-500Ml_Parachute'},\n", + " {'id': 4,\n", + " 'name': 'Plastics_HDPE_Rigid_Na_Cosmetic_Comb_Na_Na',\n", + " 'supercategory': 'Plastics_HDPE_Rigid_Na_Cosmetic_Comb_Na_Na'},\n", + " {'id': 5,\n", + " 'name': 'Plastics_PETE_Na_Clear_Bottle_Energy-Drink-Bottle_250Ml_Sting-Energy',\n", + " 'supercategory': 'Plastics_PETE_Na_Clear_Bottle_Energy-Drink-Bottle_250Ml_Sting-Energy'}]" + ] + }, + "metadata": {}, + "execution_count": 9 + } + ], + "source": [ + "# remove category labels that have fewer than 6 keywords\n", + "categories = []\n", + "num = 0\n", + "for i in data['categories']:\n", + " if len(i['name'].split('_')) >= 6:\n", + " categories.append(i)\n", + " else:\n", + " num += 1\n", + "print('\\nTotal number of labels which has less than 6 keywords are', num)\n", + "data['categories'] = categories\n", + "\n", + "# display categories after removing the labels\n", + "data['categories']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "qup_-ReIz-iv", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "55963fc5-89f9-4767-b10e-b8ca8ea7a65d" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "100%|██████████| 6/6 [00:00<00:00, 48026.38it/s]\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[{'id': 0,\n", + " 'name': 'plastics_HDPE_flexible_color_SAchets-&-pouch_pouch',\n", + " 'supercategory': 'plastics_HDPE_flexible_color_SAchets-&-pouch_pouch'},\n", + " {'id': 1,\n", + " 'name': 'Plastics_HDPE_Rigid_Blue_Lid_Bottle-Cap-Na-Na',\n", + " 'supercategory': 
'Plastics_HDPE_Rigid_Blue_Lid_Bottle-Cap_Na_Na'},\n", + " {'id': 2,\n", + " 'name': 'Plastics_peTE_Na_Clear_Bottle_Shampoo-Bottle-250Ml-Vlcc',\n", + " 'supercategory': 'Plastics_peTE_Na_Clear_Bottle_Shampoo-Bottle_250Ml_Vlcc'},\n", + " {'id': 3,\n", + " 'name': 'Plastics_na_Rigid_Blue_Bottle_Hair-Oil-Bottle-500Ml-Parachute',\n", + " 'supercategory': 'Plastics_na_Rigid_Blue_Bottle_Hair-Oil-Bottle-500Ml_Parachute'},\n", + " {'id': 4,\n", + " 'name': 'Plastics_HDPE_Rigid_Na_Cosmetic_Comb-Na-Na',\n", + " 'supercategory': 'Plastics_HDPE_Rigid_Na_Cosmetic_Comb_Na_Na'},\n", + " {'id': 5,\n", + " 'name': 'Plastics_PETE_Na_Clear_Bottle_Energy-Drink-Bottle-250Ml-Sting-Energy',\n", + " 'supercategory': 'Plastics_PETE_Na_Clear_Bottle_Energy-Drink-Bottle_250Ml_Sting-Energy'}]" + ] + }, + "metadata": {}, + "execution_count": 10 + } + ], + "source": [ + "# In the collected data, most issues occur from the 6th keyword onward, which is\n", + "# the subcategory of the material form, so all of these keywords are joined with '-'.\n", + "\n", + "for i in tqdm.tqdm(data['categories']):\n", + " l1 = i['name'].split('_')[:5]\n", + " l2 = i['name'].split('_')[5:]\n", + " l1.append('-'.join(l2))\n", + " i['name'] = '_'.join(l1)\n", + "\n", + "# display categories after making corrections\n", + "data['categories']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Ro8KNGaGFv7k", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "2dcd1d5b-2387-49be-e1ab-45b796169e4b" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Dictionary characteristics before processing :\n", + "images: 2 categories: 6 annotations: 6\n", + "\n", + "Dictionary characteristics after processing of material_type_annotation :\n", + "images: 1 categories: 10 annotations: 5\n", + "\n", + "Dictionary characteristics after processing of material_form_type_annotation :\n", + "images: 1 categories: 34 annotations: 5\n", + "\n", + "Dictionary characteristics 
after processing of plastic_type_annotation :\n", + "images: 1 categories: 9 annotations: 4\n" + ] + } + ], + "source": [ + "print('Dictionary characteristics before processing :')\n", + "print('images:', len(data_preprocessing['images']), 'categories:', len(data_preprocessing['categories']), 'annotations:', len(data_preprocessing['annotations']))\n", + "\n", + "list_of_categories = [(list_of_material, 0, 'material_type_annotation.json'),\n", + " (list_of_material_form, 4, 'material_form_type_annotation.json'),\n", + " (list_of_plastic_type, 1, 'plastic_type_annotation.json')]\n", + "\n", + "for m in list_of_categories:\n", + "\n", + " data_processing = copy.deepcopy(data)\n", + "\n", + " # create a dict mapping each reference ID to its label, converting all labels\n", + " # to lower case to avoid case-sensitivity issues\n", + " labels_dict = dict([(i['id'], i['name'].lower()) for i in m[0]])\n", + "\n", + " # correct common spelling mistakes\n", + " spelling_correction(data_processing)\n", + "\n", + " # create a mapping table from each annotated label to its reference label\n", + " # and find the incorrect labels\n", + " mapping_dict, incorrect_labels = labeling_correction(data_processing, m[1], labels_dict)\n", + "\n", + " # replace the 'categories' key with the reference categories\n", + " data_processing['categories'] = m[0]\n", + "\n", + " # update the 'annotations' key\n", + " data_processing['annotations'] = annotations_key(data_processing, incorrect_labels, mapping_dict)\n", + "\n", + " # update the 'images' key\n", + " data_processing['images'] = images_key(data_processing)\n", + "\n", + " # remove entries from the 'images' key whose files are missing from the image folder\n", + " data_processing = annotated_images(images_folder_path, data_processing)\n", + "\n", + " # align the 'images' and 'annotations' keys\n", + " data_processing = image_annotation_key(data_processing)\n", + "\n", + " # write to a new JSON file\n", + " with open(m[2], 'w') as opened_file:\n", + " opened_file.write(json.dumps(data_processing, 
indent=4))\n", + "\n", + " print('\\nDictionary characteristics after processing of', m[2].replace('.json', ''), ':')\n", + " print('images:', len(data_processing['images']), 'categories:', len(data_processing['categories']), 'annotations:', len(data_processing['annotations']))" + ] + }, + { + "cell_type": "code", + "source": [ + "# View the final JSON file\n", + "try:\n", + " files.view(m[2]) # use files.download to download the file\n", + "except ImportError:\n", + " pass" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 17 + }, + "id": "nC6XzQYL15Ki", + "outputId": "ea8e0a7d-79bc-4321-8a63-4855afc190e1" + }, + "execution_count": 13, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "" + ], + "application/javascript": [ + "\n", + " ((filepath) => {{\n", + " if (!google.colab.kernel.accessAllowed) {{\n", + " return;\n", + " }}\n", + " google.colab.files.view(filepath);\n", + " }})(\"/content/plastic_type_annotation.json\")" + ] + }, + "metadata": {} + } + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "json_preparation.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/official/projects/waste_identification_ml/pre_processing/labelme_to_coco.ipynb b/official/projects/waste_identification_ml/pre_processing/labelme_to_coco.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..d0ca6a305425fc109fd17f0aa4ec3382178deaff --- /dev/null +++ b/official/projects/waste_identification_ml/pre_processing/labelme_to_coco.ipynb @@ -0,0 +1,224 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "wINm_lPYZhlO" + }, + "source": [ + "# Convert labelme annotations to COCO JSON format" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jlmasaZNtJ6C" + 
}, + "source": [ + "Given images and their corresponding annotation JSON files exported from the \"labelme\" tool, our goal is to generate COCO-format JSON files." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "M-LDdP-NtT3F" + }, + "source": [ + "We use labelme2coco, an open source library, to convert all the JSON files from the \"labelme\" tool to COCO JSON format." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RojnXi7lfLSA" + }, + "source": [ + "Put the annotated JSON files and their corresponding images in the same folder, create a separate folder for the output COCO JSON file, and then run labelme2coco to export the output." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Ud4OzlNMtHv7" + }, + "outputs": [], + "source": [ + "# install the library, then RESTART the runtime\n", + "!pip install labelme2coco" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "nWgcQzpHt4hq" + }, + "outputs": [], + "source": [ + "# import the libraries\n", + "import labelme2coco\n", + "import json" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "executionInfo": { + "elapsed": 56, + "status": "ok", + "timestamp": 1660342052001, + "user": { + "displayName": "", + "userId": "" + }, + "user_tz": 420 + }, + "id": "3Wt3WELFXO5o" + }, + "outputs": [], + "source": [ + "# download a sample annotation file exported from the \"labelme\" tool\n", + "!curl -O https://raw.githubusercontent.com/tensorflow/models/master/official/projects/\\\n", + "waste_identification_ml/pre_processing/config/sample_json/ffdeb4cd-43ba-4ca0-a1e6-aa5824005f44.json\n", + "\n", + "# download the corresponding image file referenced in the annotation file\n", + "!curl -O https://raw.githubusercontent.com/tensorflow/models/master/official/projects/waste_identification_ml/\\\n", + 
"pre_processing/config/sample_images/ffdeb4cd-43ba-4ca0-a1e6-aa5824005f44.jpg" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "9A8aivRPXXqF", + "outputId": "047b6300-5e58-4d0d-c993-a3a29ce98966" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "dict_keys(['version', 'flags', 'shapes', 'imagePath', 'imageData', 'imageHeight', 'imageWidth'])" + ] + }, + "execution_count": 65, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# check the content of the file exported from the \"labelme\" tool\n", + "with open('ffdeb4cd-43ba-4ca0-a1e6-aa5824005f44.json') as json_file:\n", + " data = json.load(json_file)\n", + "data.keys()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2wQKjve3XpG7" + }, + "outputs": [], + "source": [ + "%%bash\n", + "\n", + "# create the input directory for the labelme annotations and image files\n", + "mkdir labelme_folder\n", + "\n", + "# create the output directory\n", + "mkdir export_dir\n", + "\n", + "# put all the annotation files exported from the \"labelme\" tool and their corresponding images in the labelme_folder\n", + "mv ffdeb4cd-43ba-4ca0-a1e6-aa5824005f44.json labelme_folder/\n", + "mv ffdeb4cd-43ba-4ca0-a1e6-aa5824005f44.jpg labelme_folder/" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "i5G7dOOycExg", + "outputId": "f235fe9c-7ad8-439a-c89d-26a331ecc31c" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 1 listed files in folder .\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Converting labelme annotations to COCO format: 100%|██████████| 1/1 [00:00\u003c00:00, 171.26it/s]\n" + ] + } + ], + "source": [ + "input = '/content/labelme_folder/'\n", + "output = '/content/export_dir/'\n", + "\n", + "# it will 
combine all the JSON files and convert them to COCO JSON format\n", + "labelme2coco.convert(input, output)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "S6jqV6PtbOtA", + "outputId": "15e02124-6558-4427-ec88-5f0bccbbb33e" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "dict_keys(['images', 'annotations', 'categories'])\n", + "[{'id': 0, 'name': 'plastics_HDPE_flexible_color_SAchets-\u0026-pouch_pouch', 'supercategory': 'plastics_HDPE_flexible_color_SAchets-\u0026-pouch_pouch'}]\n", + "[{'height': 2048, 'width': 2592, 'id': 1, 'file_name': 'ffdeb4cd-43ba-4ca0-a1e6-aa5824005f44.jpg'}]\n", + "[832, 255, 729, 697]\n" + ] + } + ], + "source": [ + "# check the content of the output COCO JSON file\n", + "with open('export_dir/dataset.json') as json_file:\n", + " data = json.load(json_file)\n", + "print(data.keys())\n", + "print(data['categories'])\n", + "print(data['images'])\n", + "print(data['annotations'][0]['bbox'])" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "labelme_to_coco.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/official/projects/waste_identification_ml/pre_processing/merge_coco_files.ipynb b/official/projects/waste_identification_ml/pre_processing/merge_coco_files.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..3a940f1de3e301fdf620a1040c12996a2c248c85 --- /dev/null +++ b/official/projects/waste_identification_ml/pre_processing/merge_coco_files.ipynb @@ -0,0 +1,392 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "0cTM_BOrBUSU" + }, + "source": [ + "# Merge multiple COCO annotation JSON files into one file. 
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cwJuts2DBaaU" + }, + "source": [ + "Given multiple COCO annotated JSON files, your goal is to merge them into one COCO annotated JSON file.\n", + "\n", + "A merged file keeps all the data in one place, which makes it easy to split into training and validation JSON files according to a percentage ratio. If you already have a separate validation COCO annotated JSON file, you can use this notebook to merge the remaining files into a single training COCO annotated JSON file.\n", + "\n", + "This notebook uses a third-party library (pyodi) to merge two files at a time, and applies it recursively to combine any number of JSON files.\n", + "\n", + "This notebook is an end-to-end example. When you run it, it takes all of the input JSON files and outputs one merged JSON file." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aDE5tjSUm1qu" + }, + "source": [ + "**Note** - In this example, we assume that all the data is stored on Google Drive and that the outputs are also written to Google Drive. We also assume that the notebook runs in Google Colab. If you are working on a local workstation, a remote server, or another storage system, adapt the paths accordingly; the notebook can also be run as a regular Jupyter notebook on a local machine."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BM-tYHTlWhDQ" + }, + "source": [ + "## **MUST DO** - Install the package and restart the runtime" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "zegGuOmQGBOr" + }, + "outputs": [], + "source": [ + "# install the Python Object Detection Insights (pyodi) library to merge multiple COCO annotation files\n", + "!pip install pyodi\n", + "\n", + "# RESTART THE RUNTIME in order to use this library" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "l7eLOdQ5F33b" + }, + "source": [ + "## Run the commands below to connect to your Google Drive" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "h5soS-6URktT" + }, + "outputs": [], + "source": [ + "# import other libraries\n", + "from google.colab import drive\n", + "import pyodi\n", + "import subprocess\n", + "import sys\n", + "import os\n", + "import json\n", + "import numpy as np\n", + "import pandas as pd" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "mTXqVbFdlqxi", + "outputId": "b12566b2-458f-4673-eb97-7cf30075e258" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Mounted at /content/gdrive\n", + "Successful\n" + ] + } + ], + "source": [ + "# connect to Google Drive\n", + "drive.mount('/content/gdrive')\n", + "\n", + "# create /mydrive as a shortcut to the root of My Drive\n", + "try:\n", + " !ln -s /content/gdrive/My\\ Drive/ /mydrive\n", + " print('Successful')\n", + "except Exception as e:\n", + " print(e)\n", + " print('Not successful')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "U-80gvViT833" + }, + "source": [ + "## Visualization function" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ipOuE69eT6Vk" + }, + "outputs": [], + "source": [ + "def data_creation(path: str) -\u003e pd.DataFrame:\n", + " 
\"\"\"Create a dataframe with the object and image counts for each category.\n", + " Args:\n", + " path: path to the annotated JSON file.\n", + " Returns:\n", + " dataframe with the object and image counts per category, indexed by category name.\n", + " \"\"\"\n", + " # get annotation file data into a variable\n", + " with open(path) as json_file:\n", + " data = json.load(json_file)\n", + "\n", + " # collect the category and image IDs of every annotation in the file\n", + " category_names = [i['name'] for i in data['categories']]\n", + " category_ids = [i['category_id'] for i in data['annotations']]\n", + " image_ids = [i['image_id'] for i in data['annotations']]\n", + "\n", + " # create a dataframe\n", + " df = pd.DataFrame(\n", + " list(zip(category_ids, image_ids)), columns=['category_ids', 'image_ids'])\n", + " df = df.groupby('category_ids').agg(\n", + " object_count=('category_ids', 'count'),\n", + " image_count=('image_ids', 'nunique'))\n", + " df = df.reindex(range(1, len(data['categories']) + 1), fill_value=0)\n", + " df.index = category_names\n", + " return df\n", + "\n", + "def visualize_detailed_counts_horizontally(path: str) -\u003e None:\n", + " \"\"\"Plot a vertical bar graph showing the object and image counts per category.\n", + " Args:\n", + " path: path to the annotated JSON file.\n", + " \"\"\"\n", + " df = data_creation(path)\n", + " ax = df.plot(\n", + " kind='bar',\n", + " figsize=(40, 10),\n", + " xlabel='Categories',\n", + " ylabel='Counts',\n", + " width=0.8,\n", + " linewidth=1,\n", + " edgecolor='white') # rot = 0 for horizontal labeling\n", + " for p in ax.patches:\n", + " ax.annotate(\n", + " text=np.round(p.get_height()),\n", + " xy=(p.get_x() + p.get_width() / 2., p.get_height()),\n", + " ha='center',\n", + " va='top',\n", + " xytext=(4, 14),\n", + " textcoords='offset points')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4hF5F_QE627R" + }, + "source": [ + "## Define the paths of inputs and outputs" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "metadata": { + "id": "qUE6cHse3zOT" + }, + "outputs": [], + "source": [ + "def list_full_paths(directory):\n", + " \"\"\"Return the full paths of all the files in a directory.\n", + " Args:\n", + " directory: folder containing all the files to be merged.\n", + " \"\"\"\n", + " return [os.path.join(directory, file) for file in os.listdir(directory)]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "a5UVAF--2nas" + }, + "outputs": [], + "source": [ + "folder_with_jsons = '/mydrive/TFHub/jsons/' #@param {type:\"string\"}\n", + "output_merged_file = '/mydrive/TFHub/jsons/merged.json' #@param {type:\"string\"}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "80gZnQCU30gK", + "outputId": "d3fdb152-c63a-42b5-f5bf-f4d50eb16d02" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "8" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# get the absolute paths of all the JSON files to be merged\n", + "list_of_json_files = list_full_paths(folder_with_jsons)\n", + "len(list_of_json_files)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6ot4VWOcSTWO" + }, + "source": [ + "## Merge the files" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hngtw3K6S7Qx" + }, + "outputs": [], + "source": [ + "def merge_two_files(file1, file2, output_file):\n", + " \"\"\"Merge two COCO annotation files with pyodi.\n", + "\n", + " Args:\n", + " file1: path of the 1st COCO annotation JSON file\n", + " file2: path of the 2nd COCO annotation JSON file\n", + " output_file: path of the output COCO annotation JSON file after merge\n", + "\n", + " Returns:\n", + " Path of the merged COCO annotation JSON file\n", + " \"\"\"\n", + " subprocess.run(['pyodi', 'coco', 'merge', file1, file2, output_file])\n", + " return 
output_file\n", + "\n", + "def merge_multiple_files(list_of_files, output_file_path):\n", + " \"\"\"Recursively merge multiple COCO annotation JSON files.\n", + "\n", + " Args:\n", + " list_of_files: list of all the COCO annotation JSON files to be merged\n", + " output_file_path: path of the merged output COCO annotation JSON file\n", + "\n", + " Returns:\n", + " Path of the merged COCO annotation JSON file\n", + " \"\"\"\n", + " if len(list_of_files) == 2:\n", + " return merge_two_files(list_of_files[0], list_of_files[1], output_file_path)\n", + "\n", + " else:\n", + " return merge_two_files(list_of_files[0], merge_multiple_files(list_of_files[1:], output_file_path), output_file_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "__Wj0rO2D-HT" + }, + "source": [ + "Running the code below merges all the files into a single COCO annotation file at `output_merged_file` (by default, in the same directory as the input files)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ZMKFOjchVMlH", + "outputId": "815a83a3-43ae-478d-abaa-58181f034b94" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total number of files to merge : 8\n", + "Merge Done\n" + ] + } + ], + "source": [ + "# merge all the JSON files into one\n", + "print('Total number of files to merge :', len(list_of_json_files))\n", + "merge_multiple_files(list_of_json_files, output_merged_file)\n", + "\n", + "print('Merge Done')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vf4ESHiDEQYF" + }, + "source": [ + "## Visualize the results" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 429 + }, + "id": "UBwMf-W0EqG1", + "outputId": "9bfbfb43-2ad2-4c18-8d75-05ea56a9573e" + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "\u003cFigure size 2880x720 with 1 Axes\u003e" + ] + }, +
"metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# visualize the merged COCO annotated JSON file\n", + "visualize_detailed_counts_horizontally(output_merged_file)" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/official/projects/waste_identification_ml/pre_processing/split_coco_files.ipynb b/official/projects/waste_identification_ml/pre_processing/split_coco_files.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..5e83659ac4fbbfb9fe6fe42939dd321cea4a7db0 --- /dev/null +++ b/official/projects/waste_identification_ml/pre_processing/split_coco_files.ipynb @@ -0,0 +1,358 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "m0qQu-luFmB5" + }, + "source": [ + "# Split one COCO annotation JSON file into training and validation JSON files." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9NGkWKGrF3pc" + }, + "source": [ + "Given a single COCO annotated JSON file, the goal is to split it into training and validation COCO annotated JSON files.\n", + "\n", + "The output files will later be converted to TFRecord files using another notebook.\n", + "\n", + "This notebook uses a third-party library, pyodi, to accomplish this task. The library randomly splits the JSON file according to a given ratio; here, the validation file contains 20% of the data.\n", + "\n", + "This notebook is an end-to-end example: when you run it, it takes one JSON file and splits it into a train JSON file and a val JSON file."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GIjj-vE-n1e3" + }, + "source": [ + "**Note** - In this example, we assume that all the data is stored on Google Drive and that the outputs are also written to Google Drive. We also assume that the notebook runs in Google Colab. Users working on a local workstation, a remote server, or another storage system can change the paths accordingly, and the notebook can also be run as a regular Jupyter notebook on a local machine." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QElyM7FtWv5E" + }, + "source": [ + "## **MUST DO** - Install and restart the runtime" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "WMy_xu64FJ1j" + }, + "outputs": [], + "source": [ + "# install the Python Object Detection Insights (pyodi) library to split COCO annotation files\n", + "!pip install pyodi\n", + "\n", + "# RESTART THE RUNTIME in order to use this library" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tySpWIuVFPj0" + }, + "source": [ + "## Run the command below to connect to your Google Drive" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "RfJAkMY9FSPz" + }, + "outputs": [], + "source": [ + "# import other libraries\n", + "from google.colab import drive\n", + "import pyodi\n", + "import subprocess\n", + "import sys\n", + "import os\n", + "import json\n", + "import numpy as np\n", + "import pandas as pd" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "AOLmsOOZFVdJ", + "outputId": "f7f6dba8-0872-4d21-d55d-2b95c42a06a4" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Mounted at /content/gdrive\n", + "Successful\n" + ] + } + ], + "source": [ + "# connect to Google Drive\n",
+ "drive.mount('/content/gdrive')\n", + "\n", + "# making an alias for the root path\n", + "try:\n", + " !ln -s /content/gdrive/My\\ Drive/ /mydrive\n", + " print('Successful')\n", + "except Exception as e:\n", + " print(e)\n", + " print('Not successful')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v0vJRt5qUOD_" + }, + "source": [ + "## Visualization function" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "HbNMcLBmUOZ2" + }, + "outputs": [], + "source": [ + "def data_creation(path: str) -\u003e pd.DataFrame:\n", + " \"\"\"Create a dataframe with the occurrences of images and categories.\n", + " Args:\n", + " path: path to the annotated JSON file.\n", + " Returns:\n", + " dataframe with the counts of images and categories.\n", + " \"\"\"\n", + " # get annotation file data into a variable\n", + " with open(path) as json_file:\n", + " data = json.load(json_file)\n", + "\n", + " # count the occurrences of each category and image in the annotation file\n", + " category_names = [i['name'] for i in data['categories']]\n", + " category_ids = [i['category_id'] for i in data['annotations']]\n", + " image_ids = [i['image_id'] for i in data['annotations']]\n", + "\n", + " # create a dataframe\n", + " df = pd.DataFrame(\n", + " list(zip(category_ids, image_ids)), columns=['category_ids', 'image_ids'])\n", + " df = df.groupby('category_ids').agg(\n", + " object_count=('category_ids', 'count'),\n", + " image_count=('image_ids', 'nunique'))\n", + " df = df.reindex(range(1, len(data['categories']) + 1), fill_value=0)\n", + " df.index = category_names\n", + " return df\n", + "\n", + "def visualize_detailed_counts_horizontally(path: str) -\u003e None:\n", + " \"\"\"Plot a vertical bar graph showing the counts of images \u0026 categories.\n", + " Args:\n", + " path: path to the annotated JSON file.\n", + " \"\"\"\n", + " df = data_creation(path)\n", + " ax = df.plot(\n", + " kind='bar',\n", + " figsize=(40, 10),\n", +
+ " xlabel='Categories',\n", + " ylabel='Counts',\n", + " width=0.8,\n", + " linewidth=1,\n", + " edgecolor='white') # rot = 0 for horizontal labeling\n", + " for p in ax.patches:\n", + " ax.annotate(\n", + " text=np.round(p.get_height()),\n", + " xy=(p.get_x() + p.get_width() / 2., p.get_height()),\n", + " ha='center',\n", + " va='top',\n", + " xytext=(4, 14),\n", + " textcoords='offset points')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gmTfyRQo9pT3" + }, + "source": [ + "## Define the paths of inputs and outputs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "nl5MrEPR9q9x" + }, + "outputs": [], + "source": [ + "input_file = '/mydrive/TFHub/jsons/merged.json' #@param {type:\"string\"}\n", + "output_folder = '/mydrive/TFHub/jsons/' #@param {type:\"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2E7P4_2eFaPB" + }, + "source": [ + "## Split the COCO annotation file into train and val COCO files" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "9HLYrO4JGKFm", + "outputId": "a31b04fa-0d7c-4c22-cd18-58672d5a29e7" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[32m2022-09-09 21:40:00.173\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mpyodi.apps.coco.coco_split\u001b[0m:\u001b[36mrandom_split\u001b[0m:\u001b[36m183\u001b[0m - \u001b[1mGathering images...\u001b[0m\n", + "\u001b[32m2022-09-09 21:40:00.192\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mpyodi.apps.coco.coco_split\u001b[0m:\u001b[36mrandom_split\u001b[0m:\u001b[36m194\u001b[0m - \u001b[1mGathering annotations...\u001b[0m\n", + "\u001b[32m2022-09-09 21:40:11.078\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mpyodi.apps.coco.coco_split\u001b[0m:\u001b[36mrandom_split\u001b[0m:\u001b[36m218\u001b[0m - \u001b[1mSaving splits to file...\u001b[0m\n", + "/mydrive/TFHub/jsons/_train.json\n",
+ "/mydrive/TFHub/jsons/_val.json\n" + ] + } + ], + "source": [ + "# split a COCO annotation file into train and val files\n", + "!pyodi coco random-split $input_file $output_folder --val-percentage 0.2\n", + "\n", + "# there will be two files with name '_train.json' and '_val.json' in the output_folder" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wLnDJLIuMf8o" + }, + "source": [ + "## Visualization" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 429 + }, + "id": "2dNcl3XCMLDX", + "outputId": "4dd5fa70-ddda-4afa-911d-b22eced089da" + }, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "\u003cFigure size 2880x720 with 1 Axes\u003e" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# visualization of the input COCO annotated JSON file\n", + "visualize_detailed_counts_horizontally(input_file)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 878 + }, + "id": "GHZZ3aLbMO35", + "outputId": "89b78d18-112b-413b-a417-71599ff87a53" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Train JSON\n", + "Validation JSON\n" + ] + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "\u003cFigure size 2880x720 with 1 Axes\u003e" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "\u003cFigure size 2880x720 with 1 Axes\u003e" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# visualization of the training COCO annotated JSON file\n", + "print('Train JSON')\n", + "visualize_detailed_counts_horizontally(output_folder + '_train.json')\n", + "\n", + "print('Validation 
JSON')\n", + "# visualization of the validation COCO annotated JSON file\n", + "visualize_detailed_counts_horizontally(output_folder + '_val.json')" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "provenance": [] + }, + "gpuClass": "standard", + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/official/projects/yolo/README.md b/official/projects/yolo/README.md new file mode 100644 index 0000000000000000000000000000000000000000..1dc55188c3caa08c077ea93540d03a53959be2d9 --- /dev/null +++ b/official/projects/yolo/README.md @@ -0,0 +1,81 @@ +# YOLO Object Detectors, You Only Look Once + +[![Paper](http://img.shields.io/badge/Paper-arXiv.1804.02767-B3181B?logo=arXiv)](https://arxiv.org/abs/1804.02767) +[![Paper](http://img.shields.io/badge/Paper-arXiv.2004.10934-B3181B?logo=arXiv)](https://arxiv.org/abs/2004.10934) + +This repository is an unofficial implementation of the following papers. +However, we took great care to ensure that every component we built matches +the original papers and the original repository exactly. + +* [YOLOv3: An Incremental Improvement](https://arxiv.org/abs/1804.02767) + +* [YOLOv4: Optimal Speed and Accuracy of Object Detection](https://arxiv.org/abs/2004.10934) + +## Description + +YOLO v1, the original implementation, was released in 2015 and introduced a +groundbreaking algorithm that could quickly process images and locate objects +in a single pass through the detector. The original implementation used a +backbone derived from state-of-the-art object classifiers of the time, such as +[GoogLeNet](https://arxiv.org/abs/1409.4842) and +[VGG](https://arxiv.org/abs/1409.1556).
More attention was given to the novel +YOLO detection head, which performed object detection with a single pass over an +image. However, the network was limited: it could predict at most 98 bounding +boxes per image (a 7x7 grid with two boxes per cell), it was evaluated on the +20 classes of PASCAL VOC, and it could only make predictions at one scale. These +limitations made YOLO v1 less versatile, so over the following years the +developers continued to update and develop the model. + +YOLO v3 and v4 are the most up-to-date and capable versions of the YOLO +network family. YOLO v3 uses a custom backbone, Darknet53, which builds on +ideas from the ResNet paper to improve its predictions (YOLO v4 uses its CSP +variant, CSPDarknet53). The new backbone also allows objects to be detected at +multiple scales. In the new detection head, the model predicts bounding boxes +using a set of anchor box priors (anchor boxes) as suggestions. Multiscale +predictions combined with anchor boxes allow the network to make more than +10,000 object predictions on a single image (10,647 for a 416x416 input to +YOLO v3). Finally, the new loss function pushes the network to make better +predictions by using Intersection over Union (IOU) to inform the model's +confidence rather than relying on the mean squared error for the entire +output. + + +## Authors + +* Vishnu Samardh Banna ([@GitHub vishnubanna](https://github.com/vishnubanna)) +* Anirudh Vegesana ([@GitHub anivegesana](https://github.com/anivegesana)) +* Akhil Chinnakotla ([@GitHub The-Indian-Chinna](https://github.com/The-Indian-Chinna)) +* Tristan Yan ([@GitHub Tyan3001](https://github.com/Tyan3001)) +* Naveen Vivek ([@GitHub naveen-vivek](https://github.com/naveen-vivek)) + +## Table of Contents + +* [Our Goal](#our-goal) +* [Models in the library](#models-in-the-library) +* [References](#references) + + +## Our Goal + +Our goal with this model conversion is to provide implementations of the backbone +and the YOLO head.
We built the model so that the YOLO head can be +connected to a new, more powerful backbone if desired. + +## Models in the library + +| Object Detectors | Classifiers | +| :--------------: | :--------------: | +| Yolo-v3 | Darknet53 | +| Yolo-v3 tiny | CSPDarknet53 | +| Yolo-v3 spp | | +| Yolo-v4 | | +| Yolo-v4 tiny | | +| Yolo-v4 csp | | +| Yolo-v4 large | | + +## Model Zoo + + +## Requirements +[![TensorFlow 2.6](https://img.shields.io/badge/TensorFlow-2.6-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.6.0) +[![Python 3.8](https://img.shields.io/badge/Python-3.8-3776AB)](https://www.python.org/downloads/release/python-380/) diff --git a/official/projects/yolo/__init__.py b/official/projects/yolo/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/yolo/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/yolo/common/__init__.py b/official/projects/yolo/common/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/yolo/common/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/yolo/common/registry_imports.py b/official/projects/yolo/common/registry_imports.py new file mode 100644 index 0000000000000000000000000000000000000000..6c4e0298fa108553aba0403d49e48b6e5d076113 --- /dev/null +++ b/official/projects/yolo/common/registry_imports.py @@ -0,0 +1,36 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""All necessary imports for registration.""" + +# pylint: disable=unused-import +# pylint: disable=g-bad-import-order +from official.vision import registry_imports + +# import configs +from official.projects.yolo.configs import darknet_classification +from official.projects.yolo.configs import yolo as yolo_config + +# import modeling components +from official.projects.yolo.modeling.backbones import darknet +from official.projects.yolo.modeling.decoders import yolo_decoder + +# import tasks +from official.projects.yolo.tasks import image_classification +from official.projects.yolo.tasks import yolo as yolo_task + +# import optimization packages +from official.projects.yolo.optimization import optimizer_factory +from official.projects.yolo.optimization.configs import optimizer_config +from official.projects.yolo.optimization.configs import optimization_config diff --git a/official/projects/yolo/configs/__init__.py b/official/projects/yolo/configs/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/yolo/configs/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ diff --git a/official/projects/yolo/configs/backbones.py b/official/projects/yolo/configs/backbones.py new file mode 100644 index 0000000000000000000000000000000000000000..f397809a12dcc6f4b19b46eefe0d324466c7352a --- /dev/null +++ b/official/projects/yolo/configs/backbones.py @@ -0,0 +1,36 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Backbones configurations.""" +import dataclasses +from official.modeling import hyperparams +from official.vision.configs import backbones + + +@dataclasses.dataclass +class Darknet(hyperparams.Config): + """DarkNet config.""" + model_id: str = 'cspdarknet53' + width_scale: float = 1.0 + depth_scale: float = 1.0 + dilate: bool = False + min_level: int = 3 + max_level: int = 5 + use_separable_conv: bool = False + use_reorg_input: bool = False + + +@dataclasses.dataclass +class Backbone(backbones.Backbone): + darknet: Darknet = Darknet() diff --git a/official/vision/beta/projects/yolo/configs/darknet_classification.py b/official/projects/yolo/configs/darknet_classification.py similarity index 90% rename from official/vision/beta/projects/yolo/configs/darknet_classification.py rename to official/projects/yolo/configs/darknet_classification.py index d3022d522e5d6be9069d12e4ecbaf0f7d904b9cb..1b534bada42a5070cc1287c8234fc7478454466b 100644 --- a/official/vision/beta/projects/yolo/configs/darknet_classification.py +++ 
b/official/projects/yolo/configs/darknet_classification.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -20,9 +20,9 @@ from typing import List, Optional from official.core import config_definitions as cfg from official.core import exp_factory from official.modeling import hyperparams -from official.vision.beta.configs import common -from official.vision.beta.configs import image_classification as imc -from official.vision.beta.projects.yolo.configs import backbones +from official.projects.yolo.configs import backbones +from official.vision.configs import common +from official.vision.configs import image_classification as imc @dataclasses.dataclass diff --git a/official/projects/yolo/configs/decoders.py b/official/projects/yolo/configs/decoders.py new file mode 100644 index 0000000000000000000000000000000000000000..2a796a1e29b7d06dfb65ee8be9d76624fb30e0a9 --- /dev/null +++ b/official/projects/yolo/configs/decoders.py @@ -0,0 +1,48 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Decoders configurations.""" +import dataclasses +from typing import Optional +from official.modeling import hyperparams +from official.vision.configs import decoders + + +@dataclasses.dataclass +class YoloDecoder(hyperparams.Config): + """Builds Yolo decoder. + + If the name or version is specified, the input parameters are ignored and + the defaults for that version and name are used. + """ + version: Optional[str] = None + type: Optional[str] = None + use_fpn: Optional[bool] = None + use_spatial_attention: bool = False + use_separable_conv: bool = False + csp_stack: Optional[bool] = None + fpn_depth: Optional[int] = None + max_fpn_depth: Optional[int] = None + max_csp_stack: Optional[int] = None + fpn_filter_scale: Optional[int] = None + path_process_len: Optional[int] = None + max_level_process_len: Optional[int] = None + embed_spp: Optional[bool] = None + activation: Optional[str] = 'same' + + +@dataclasses.dataclass +class Decoder(decoders.Decoder): + type: Optional[str] = 'yolo_decoder' + yolo_decoder: YoloDecoder = YoloDecoder() diff --git a/official/vision/beta/projects/yolo/configs/experiments/darknet/csp_darknet53.yaml b/official/projects/yolo/configs/experiments/darknet/csp_darknet53.yaml similarity index 100% rename from official/vision/beta/projects/yolo/configs/experiments/darknet/csp_darknet53.yaml rename to official/projects/yolo/configs/experiments/darknet/csp_darknet53.yaml diff --git a/official/vision/beta/projects/yolo/configs/experiments/darknet/csp_darknet53_tfds.yaml b/official/projects/yolo/configs/experiments/darknet/csp_darknet53_tfds.yaml similarity index 100% rename from official/vision/beta/projects/yolo/configs/experiments/darknet/csp_darknet53_tfds.yaml rename to official/projects/yolo/configs/experiments/darknet/csp_darknet53_tfds.yaml diff --git a/official/vision/beta/projects/yolo/configs/experiments/darknet/darknet53.yaml b/official/projects/yolo/configs/experiments/darknet/darknet53.yaml similarity index 100% rename from
official/vision/beta/projects/yolo/configs/experiments/darknet/darknet53.yaml rename to official/projects/yolo/configs/experiments/darknet/darknet53.yaml diff --git a/official/vision/beta/projects/yolo/configs/experiments/darknet/darknet53_tfds.yaml b/official/projects/yolo/configs/experiments/darknet/darknet53_tfds.yaml similarity index 100% rename from official/vision/beta/projects/yolo/configs/experiments/darknet/darknet53_tfds.yaml rename to official/projects/yolo/configs/experiments/darknet/darknet53_tfds.yaml diff --git a/official/vision/beta/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_csp_640_tpu.yaml b/official/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_csp_640_tpu.yaml similarity index 100% rename from official/vision/beta/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_csp_640_tpu.yaml rename to official/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_csp_640_tpu.yaml diff --git a/official/vision/beta/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_l_p5_896_tpu.yaml b/official/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_l_p5_896_tpu.yaml similarity index 100% rename from official/vision/beta/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_l_p5_896_tpu.yaml rename to official/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_l_p5_896_tpu.yaml diff --git a/official/vision/beta/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_l_p6_1280_tpu.yaml b/official/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_l_p6_1280_tpu.yaml similarity index 100% rename from official/vision/beta/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_l_p6_1280_tpu.yaml rename to official/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_l_p6_1280_tpu.yaml diff --git a/official/vision/beta/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_l_p7_1536_tpu.yaml 
b/official/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_l_p7_1536_tpu.yaml similarity index 100% rename from official/vision/beta/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_l_p7_1536_tpu.yaml rename to official/projects/yolo/configs/experiments/scaled-yolo/detection/yolo_l_p7_1536_tpu.yaml diff --git a/official/vision/beta/projects/yolo/configs/experiments/scaled-yolo/tpu/640.yaml b/official/projects/yolo/configs/experiments/scaled-yolo/tpu/640.yaml similarity index 100% rename from official/vision/beta/projects/yolo/configs/experiments/scaled-yolo/tpu/640.yaml rename to official/projects/yolo/configs/experiments/scaled-yolo/tpu/640.yaml diff --git a/official/vision/beta/projects/yolo/configs/experiments/yolov4/detection/yolov4_512_tpu.yaml b/official/projects/yolo/configs/experiments/yolov4/detection/yolov4_512_tpu.yaml old mode 100755 new mode 100644 similarity index 100% rename from official/vision/beta/projects/yolo/configs/experiments/yolov4/detection/yolov4_512_tpu.yaml rename to official/projects/yolo/configs/experiments/yolov4/detection/yolov4_512_tpu.yaml diff --git a/official/vision/beta/projects/yolo/configs/experiments/yolov4/imagenet_pretraining/cspdarknet53_256_tpu.yaml b/official/projects/yolo/configs/experiments/yolov4/imagenet_pretraining/cspdarknet53_256_tpu.yaml similarity index 100% rename from official/vision/beta/projects/yolo/configs/experiments/yolov4/imagenet_pretraining/cspdarknet53_256_tpu.yaml rename to official/projects/yolo/configs/experiments/yolov4/imagenet_pretraining/cspdarknet53_256_tpu.yaml diff --git a/official/projects/yolo/configs/yolo.py b/official/projects/yolo/configs/yolo.py new file mode 100644 index 0000000000000000000000000000000000000000..c24ff9719bee47c6ab40e39d9da6e1b402658262 --- /dev/null +++ b/official/projects/yolo/configs/yolo.py @@ -0,0 +1,519 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""YOLO configuration definition.""" +import dataclasses +import os +from typing import Any, List, Optional, Union + +import numpy as np + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import hyperparams +from official.projects.yolo import optimization +from official.projects.yolo.configs import backbones +from official.projects.yolo.configs import decoders +from official.vision.configs import common + + +# pytype: disable=annotation-type-mismatch + +MIN_LEVEL = 1 +MAX_LEVEL = 7 +GLOBAL_SEED = 1000 + + +def _build_dict(min_level, max_level, value): + vals = {str(key): value for key in range(min_level, max_level + 1)} + vals['all'] = None + return lambda: vals + + +def _build_path_scales(min_level, max_level): + return lambda: {str(key): 2**key for key in range(min_level, max_level + 1)} + + +@dataclasses.dataclass +class FPNConfig(hyperparams.Config): + """FPN config.""" + all: Optional[Any] = None + + def get(self): + """Allow for a key for each level or a single key for all the levels.""" + values = self.as_dict() + if 'all' in values and values['all'] is not None: + for key in values: + if key != 'all': + values[key] = values['all'] + return values + + +# pylint: disable=missing-class-docstring +@dataclasses.dataclass +class TfExampleDecoder(hyperparams.Config): + regenerate_source_id: bool = False + coco91_to_80: bool = True + + 
+@dataclasses.dataclass +class TfExampleDecoderLabelMap(hyperparams.Config): + regenerate_source_id: bool = False + label_map: str = '' + + +@dataclasses.dataclass +class DataDecoder(hyperparams.OneOfConfig): + type: Optional[str] = 'simple_decoder' + simple_decoder: TfExampleDecoder = TfExampleDecoder() + label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap() + + +@dataclasses.dataclass +class Mosaic(hyperparams.Config): + mosaic_frequency: float = 0.0 + mixup_frequency: float = 0.0 + mosaic_center: float = 0.2 + mosaic_crop_mode: Optional[str] = None + aug_scale_min: float = 1.0 + aug_scale_max: float = 1.0 + jitter: float = 0.0 + + +@dataclasses.dataclass +class Parser(hyperparams.Config): + max_num_instances: int = 200 + letter_box: Optional[bool] = True + random_flip: bool = True + random_pad: float = False + jitter: float = 0.0 + aug_scale_min: float = 1.0 + aug_scale_max: float = 1.0 + aug_rand_saturation: float = 0.0 + aug_rand_brightness: float = 0.0 + aug_rand_hue: float = 0.0 + aug_rand_angle: float = 0.0 + aug_rand_translate: float = 0.0 + aug_rand_perspective: float = 0.0 + use_tie_breaker: bool = True + best_match_only: bool = False + anchor_thresh: float = -0.01 + area_thresh: float = 0.1 + mosaic: Mosaic = Mosaic() + + +@dataclasses.dataclass +class DataConfig(cfg.DataConfig): + """Input config for training.""" + global_batch_size: int = 64 + input_path: str = '' + tfds_name: str = '' + tfds_split: str = '' + global_batch_size: int = 1 + is_training: bool = True + dtype: str = 'float16' + decoder: DataDecoder = DataDecoder() + parser: Parser = Parser() + shuffle_buffer_size: int = 10000 + tfds_download: bool = True + cache: bool = False + drop_remainder: bool = True + file_type: str = 'tfrecord' + + +@dataclasses.dataclass +class YoloHead(hyperparams.Config): + """Parameterization for the YOLO Head.""" + smart_bias: bool = True + + +@dataclasses.dataclass +class YoloDetectionGenerator(hyperparams.Config): + box_type: FPNConfig = 
dataclasses.field( + default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 'original')) + scale_xy: FPNConfig = dataclasses.field( + default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 1.0)) + path_scales: FPNConfig = dataclasses.field( + default_factory=_build_path_scales(MIN_LEVEL, MAX_LEVEL)) + nms_type: str = 'greedy' + iou_thresh: float = 0.001 + nms_thresh: float = 0.6 + max_boxes: int = 200 + pre_nms_points: int = 5000 + + +@dataclasses.dataclass +class YoloLoss(hyperparams.Config): + ignore_thresh: FPNConfig = dataclasses.field( + default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 0.0)) + truth_thresh: FPNConfig = dataclasses.field( + default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 1.0)) + box_loss_type: FPNConfig = dataclasses.field( + default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 'ciou')) + iou_normalizer: FPNConfig = dataclasses.field( + default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 1.0)) + cls_normalizer: FPNConfig = dataclasses.field( + default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 1.0)) + object_normalizer: FPNConfig = dataclasses.field( + default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 1.0)) + max_delta: FPNConfig = dataclasses.field( + default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, np.inf)) + objectness_smooth: FPNConfig = dataclasses.field( + default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 0.0)) + label_smoothing: float = 0.0 + use_scaled_loss: bool = True + update_on_repeat: bool = True + + +@dataclasses.dataclass +class Box(hyperparams.Config): + box: List[int] = dataclasses.field(default_factory=list) + + +@dataclasses.dataclass +class AnchorBoxes(hyperparams.Config): + boxes: Optional[List[Box]] = None + level_limits: Optional[List[int]] = None + anchors_per_scale: int = 3 + + generate_anchors: bool = False + scaling_mode: str = 'sqrt' + box_generation_mode: str = 'per_level' + num_samples: int = 1024 + + def get(self, min_level, max_level): + """Distribute the anchor boxes, in order, to each level. 
+ + Args: + min_level: `int` the lowest output level.
+ max_level: `int` the highest output level. + Returns: + anchors_per_level: A `Dict[List[int]]` of the anchor boxes for each level. + self.level_limits: A `List[int]` of the box size limits to link to each + level under anchor-free conditions. + """ + if self.level_limits is None: + boxes = [box.box for box in self.boxes] + else: + boxes = [[1.0, 1.0]] * ((max_level - min_level) + 1) + self.anchors_per_scale = 1 + + anchors_per_level = dict() + start = 0 + for i in range(min_level, max_level + 1): + anchors_per_level[str(i)] = boxes[start:start + self.anchors_per_scale] + start += self.anchors_per_scale + return anchors_per_level, self.level_limits + + def set_boxes(self, boxes): + self.boxes = [Box(box=box) for box in boxes] + + +@dataclasses.dataclass +class Yolo(hyperparams.Config): + input_size: Optional[List[int]] = dataclasses.field( + default_factory=lambda: [512, 512, 3]) + backbone: backbones.Backbone = backbones.Backbone( + type='darknet', darknet=backbones.Darknet(model_id='cspdarknet53')) + decoder: decoders.Decoder = decoders.Decoder( + type='yolo_decoder', + yolo_decoder=decoders.YoloDecoder(version='v4', type='regular')) + head: YoloHead = YoloHead() + detection_generator: YoloDetectionGenerator = YoloDetectionGenerator() + loss: YoloLoss = YoloLoss() + norm_activation: common.NormActivation = common.NormActivation( + activation='mish', + use_sync_bn=True, + norm_momentum=0.99, + norm_epsilon=0.001) + num_classes: int = 80 + anchor_boxes: AnchorBoxes = AnchorBoxes() + darknet_based_model: bool = False + + +@dataclasses.dataclass +class YoloTask(cfg.TaskConfig): + per_category_metrics: bool = False + smart_bias_lr: float = 0.0 + model: Yolo = Yolo() + train_data: DataConfig = DataConfig(is_training=True) + validation_data: DataConfig = DataConfig(is_training=False) + weight_decay: float = 0.0 + annotation_file: Optional[str] = None + init_checkpoint: Optional[str] = None + init_checkpoint_modules: Union[ + str, List[str]] = 'all' # all, backbone,
and/or decoder + gradient_clip_norm: float = 0.0 + seed = GLOBAL_SEED + + +COCO_INPUT_PATH_BASE = 'coco' +COCO_TRAIN_EXAMPLES = 118287 +COCO_VAL_EXAMPLES = 5000 + + +@exp_factory.register_config_factory('yolo') +def yolo() -> cfg.ExperimentConfig: + """Yolo general config.""" + return cfg.ExperimentConfig( + task=YoloTask(), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + +@exp_factory.register_config_factory('yolo_darknet') +def yolo_darknet() -> cfg.ExperimentConfig: + """COCO object detection with YOLOv3 and v4.""" + train_batch_size = 256 + eval_batch_size = 8 + train_epochs = 300 + steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size + validation_interval = 5 + + max_num_instances = 200 + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=YoloTask( + smart_bias_lr=0.1, + init_checkpoint='', + init_checkpoint_modules='backbone', + annotation_file=None, + weight_decay=0.0, + model=Yolo( + darknet_based_model=True, + norm_activation=common.NormActivation(use_sync_bn=True), + head=YoloHead(smart_bias=True), + loss=YoloLoss(use_scaled_loss=False, update_on_repeat=True), + anchor_boxes=AnchorBoxes( + anchors_per_scale=3, + boxes=[ + Box(box=[12, 16]), + Box(box=[19, 36]), + Box(box=[40, 28]), + Box(box=[36, 75]), + Box(box=[76, 55]), + Box(box=[72, 146]), + Box(box=[142, 110]), + Box(box=[192, 243]), + Box(box=[459, 401]) + ])), + train_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + dtype='float32', + parser=Parser( + letter_box=False, + aug_rand_saturation=1.5, + aug_rand_brightness=1.5, + aug_rand_hue=0.1, + use_tie_breaker=True, + best_match_only=False, + anchor_thresh=0.4, + area_thresh=0.1, + max_num_instances=max_num_instances, + mosaic=Mosaic( + mosaic_frequency=0.75, + mixup_frequency=0.0, + mosaic_crop_mode='crop', + mosaic_center=0.2))), + 
validation_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + drop_remainder=True, + dtype='float32', + parser=Parser( + letter_box=False, + use_tie_breaker=True, + best_match_only=False, + anchor_thresh=0.4, + area_thresh=0.1, + max_num_instances=max_num_instances, + ))), + trainer=cfg.TrainerConfig( + train_steps=train_epochs * steps_per_epoch, + validation_steps=COCO_VAL_EXAMPLES // eval_batch_size, + validation_interval=validation_interval * steps_per_epoch, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'ema': { + 'average_decay': 0.9998, + 'trainable_weights_only': False, + 'dynamic_decay': True, + }, + 'optimizer': { + 'type': 'sgd_torch', + 'sgd_torch': { + 'momentum': 0.949, + 'momentum_start': 0.949, + 'nesterov': True, + 'warmup_steps': 1000, + 'weight_decay': 0.0005, + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [ + 240 * steps_per_epoch + ], + 'values': [ + 0.00131 * train_batch_size / 64.0, + 0.000131 * train_batch_size / 64.0, + ] + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 1000, + 'warmup_learning_rate': 0 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + return config + + +@exp_factory.register_config_factory('scaled_yolo') +def scaled_yolo() -> cfg.ExperimentConfig: + """COCO object detection with YOLOv4-csp and v4.""" + train_batch_size = 256 + eval_batch_size = 8 + train_epochs = 300 + warmup_epochs = 3 + + validation_interval = 5 + steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size + + max_num_instances = 300 + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=YoloTask( + smart_bias_lr=0.1, + init_checkpoint_modules='', + annotation_file=None, + 
weight_decay=0.0, + model=Yolo( + darknet_based_model=False, + norm_activation=common.NormActivation( + activation='mish', + use_sync_bn=True, + norm_epsilon=0.001, + norm_momentum=0.97), + head=YoloHead(smart_bias=True), + loss=YoloLoss(use_scaled_loss=True), + anchor_boxes=AnchorBoxes( + anchors_per_scale=3, + boxes=[ + Box(box=[12, 16]), + Box(box=[19, 36]), + Box(box=[40, 28]), + Box(box=[36, 75]), + Box(box=[76, 55]), + Box(box=[72, 146]), + Box(box=[142, 110]), + Box(box=[192, 243]), + Box(box=[459, 401]) + ])), + train_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + dtype='float32', + parser=Parser( + aug_rand_saturation=0.7, + aug_rand_brightness=0.4, + aug_rand_hue=0.015, + letter_box=True, + use_tie_breaker=True, + best_match_only=True, + anchor_thresh=4.0, + random_pad=False, + area_thresh=0.1, + max_num_instances=max_num_instances, + mosaic=Mosaic( + mosaic_crop_mode='scale', + mosaic_frequency=1.0, + mixup_frequency=0.0, + ))), + validation_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + drop_remainder=True, + dtype='float32', + parser=Parser( + letter_box=True, + use_tie_breaker=True, + best_match_only=True, + anchor_thresh=4.0, + area_thresh=0.1, + max_num_instances=max_num_instances, + ))), + trainer=cfg.TrainerConfig( + train_steps=train_epochs * steps_per_epoch, + validation_steps=COCO_VAL_EXAMPLES // eval_batch_size, + validation_interval=validation_interval * steps_per_epoch, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=5 * steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'ema': { + 'average_decay': 0.9999, + 'trainable_weights_only': False, + 'dynamic_decay': True, + }, + 'optimizer': { + 'type': 'sgd_torch', + 'sgd_torch': { + 'momentum': 0.937, + 'momentum_start': 0.8, + 'nesterov': True, + 
'warmup_steps': steps_per_epoch * warmup_epochs, + 'weight_decay': 0.0005, + } + }, + 'learning_rate': { + 'type': 'cosine', + 'cosine': { + 'initial_learning_rate': 0.01, + 'alpha': 0.2, + 'decay_steps': train_epochs * steps_per_epoch, + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': steps_per_epoch * warmup_epochs, + 'warmup_learning_rate': 0 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + return config diff --git a/official/projects/yolo/dataloaders/__init__.py b/official/projects/yolo/dataloaders/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ba97902e7ec1e12871c0fad301b9ce48c92cf1d1 --- /dev/null +++ b/official/projects/yolo/dataloaders/__init__.py @@ -0,0 +1,15 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + diff --git a/official/projects/yolo/dataloaders/classification_input.py b/official/projects/yolo/dataloaders/classification_input.py new file mode 100644 index 0000000000000000000000000000000000000000..e1737dba35417a18de64eb2c9d7a91a1665a5f0d --- /dev/null +++ b/official/projects/yolo/dataloaders/classification_input.py @@ -0,0 +1,92 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Classification decoder and parser.""" +import tensorflow as tf +from official.vision.dataloaders import classification_input +from official.vision.ops import preprocess_ops + + +class Parser(classification_input.Parser): + """Parser to parse an image and its annotations into a dictionary of tensors.""" + + def _parse_train_image(self, decoded_tensors): + """Parses image data for training.""" + image_bytes = decoded_tensors[self._image_field_key] + + if self._decode_jpeg_only: + image_shape = tf.image.extract_jpeg_shape(image_bytes) + + # Crops image. + cropped_image = preprocess_ops.random_crop_image_v2( + image_bytes, image_shape) + image = tf.cond( + tf.reduce_all(tf.equal(tf.shape(cropped_image), image_shape)), + lambda: preprocess_ops.center_crop_image_v2(image_bytes, image_shape), + lambda: cropped_image) + else: + # Decodes image. + image = tf.io.decode_image(image_bytes, channels=3) + image.set_shape([None, None, 3]) + + # Crops image. + cropped_image = preprocess_ops.random_crop_image(image) + + image = tf.cond( + tf.reduce_all(tf.equal(tf.shape(cropped_image), tf.shape(image))), + lambda: preprocess_ops.center_crop_image(image), + lambda: cropped_image) + + if self._aug_rand_hflip: + image = tf.image.random_flip_left_right(image) + + # Resizes image. + image = tf.image.resize( + image, self._output_size, method=tf.image.ResizeMethod.BILINEAR) + image.set_shape([self._output_size[0], self._output_size[1], 3]) + + # Apply autoaug or randaug. 
+ if self._augmenter is not None: + image = self._augmenter.distort(image) + + # Convert image to self._dtype. + image = tf.image.convert_image_dtype(image, self._dtype) + image = image / 255.0 + return image + + def _parse_eval_image(self, decoded_tensors): + """Parses image data for evaluation.""" + image_bytes = decoded_tensors[self._image_field_key] + + if self._decode_jpeg_only: + image_shape = tf.image.extract_jpeg_shape(image_bytes) + + # Center crops. + image = preprocess_ops.center_crop_image_v2(image_bytes, image_shape) + else: + # Decodes image. + image = tf.io.decode_image(image_bytes, channels=3) + image.set_shape([None, None, 3]) + + # Center crops. + image = preprocess_ops.center_crop_image(image) + + image = tf.image.resize( + image, self._output_size, method=tf.image.ResizeMethod.BILINEAR) + image.set_shape([self._output_size[0], self._output_size[1], 3]) + + # Convert image to self._dtype. + image = tf.image.convert_image_dtype(image, self._dtype) + image = image / 255.0 + return image diff --git a/official/projects/yolo/dataloaders/tf_example_decoder.py b/official/projects/yolo/dataloaders/tf_example_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..8578b663f5c28e9872b9fd37a60febb8909fd380 --- /dev/null +++ b/official/projects/yolo/dataloaders/tf_example_decoder.py @@ -0,0 +1,119 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tensorflow Example proto decoder for object detection. + +A decoder to decode string tensors containing serialized tensorflow.Example +protos for object detection. +""" +import tensorflow as tf + +from official.vision.dataloaders import tf_example_decoder + + +def _coco91_to_80(classif, box, areas, iscrowds): + """Function used to reduce COCO 91 to COCO 80 (2017 to 2014 format).""" + # Vector where index i corresponds to the original COCO class ID class_ids[i]. + class_ids = [ + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, + 23, 24, 25, 27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, + 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, + 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85, + 86, 87, 88, 89, 90 + ] + new_classes = tf.expand_dims(tf.convert_to_tensor(class_ids), axis=0) + + # Reshape the classes in order to build a class mask. + classes = tf.expand_dims(classif, axis=-1) + + # One-hot encode the classifications to match the 80 class format. + ind = classes == tf.cast(new_classes, classes.dtype) + + # Select the max values. + selected_class = tf.reshape( + tf.math.argmax(tf.cast(ind, tf.float32), axis=-1), [-1]) + ind = tf.where(tf.reduce_any(ind, axis=-1)) + + # Gather the valid instances. + classif = tf.gather_nd(selected_class, ind) + box = tf.gather_nd(box, ind) + areas = tf.gather_nd(areas, ind) + iscrowds = tf.gather_nd(iscrowds, ind) + + # Restate the number of viable detections; ideally it should be the same. + num_detections = tf.shape(classif)[0] + return classif, box, areas, iscrowds, num_detections + + +class TfExampleDecoder(tf_example_decoder.TfExampleDecoder): + """Tensorflow Example proto decoder.""" + + def __init__(self, + coco91_to_80=None, + include_mask=False, + regenerate_source_id=False, + mask_binarize_threshold=None): + """Initialize the example decoder.
+ + Args: + coco91_to_80: `bool` indicating whether to convert coco from its 91 class + format to the 80 class format. + include_mask: `bool` indicating if the decoder should also decode instance + masks for instance segmentation. + regenerate_source_id: `bool` indicating if the source id needs to be + recreated for each image sample. + mask_binarize_threshold: `float` for binarizing mask values. + """ + if coco91_to_80 and include_mask: + raise ValueError('If masks are included you cannot convert coco from the ' + '91 class format to the 80 class format.') + + self._coco91_to_80 = coco91_to_80 + super().__init__( + include_mask=include_mask, + regenerate_source_id=regenerate_source_id, + mask_binarize_threshold=mask_binarize_threshold) + + def decode(self, serialized_example): + """Decode the serialized example. + + Args: + serialized_example: a single serialized tf.Example string. + + Returns: + decoded_tensors: a dictionary of tensors with the following fields: + - source_id: a string scalar tensor. + - image: a uint8 tensor of shape [None, None, 3]. + - height: an integer scalar tensor. + - width: an integer scalar tensor. + - groundtruth_classes: an int64 tensor of shape [None]. + - groundtruth_is_crowd: a bool tensor of shape [None]. + - groundtruth_area: a float32 tensor of shape [None]. + - groundtruth_boxes: a float32 tensor of shape [None, 4]. + - groundtruth_instance_masks: a float32 tensor of shape + [None, None, None]. + - groundtruth_instance_masks_png: a string tensor of shape [None].
+ """ + decoded_tensors = super().decode(serialized_example) + + if self._coco91_to_80: + (decoded_tensors['groundtruth_classes'], + decoded_tensors['groundtruth_boxes'], + decoded_tensors['groundtruth_area'], + decoded_tensors['groundtruth_is_crowd'], + _) = _coco91_to_80(decoded_tensors['groundtruth_classes'], + decoded_tensors['groundtruth_boxes'], + decoded_tensors['groundtruth_area'], + decoded_tensors['groundtruth_is_crowd']) + return decoded_tensors diff --git a/official/vision/beta/projects/yolo/dataloaders/yolo_input.py b/official/projects/yolo/dataloaders/yolo_input.py old mode 100755 new mode 100644 similarity index 97% rename from official/vision/beta/projects/yolo/dataloaders/yolo_input.py rename to official/projects/yolo/dataloaders/yolo_input.py index 112fc1bf0559fb53ef3aa2343e7a23bdbe700701..bf59606b6743f7959c15b70adc8659527c71b6e1 --- a/official/vision/beta/projects/yolo/dataloaders/yolo_input.py +++ b/official/projects/yolo/dataloaders/yolo_input.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -15,12 +15,12 @@ """Detection Data parser and processing for YOLO.""" import tensorflow as tf -from official.vision.beta.dataloaders import parser -from official.vision.beta.dataloaders import utils -from official.vision.beta.ops import box_ops as bbox_ops -from official.vision.beta.ops import preprocess_ops -from official.vision.beta.projects.yolo.ops import anchor -from official.vision.beta.projects.yolo.ops import preprocessing_ops +from official.projects.yolo.ops import anchor +from official.projects.yolo.ops import preprocessing_ops +from official.vision.dataloaders import parser +from official.vision.dataloaders import utils +from official.vision.ops import box_ops as bbox_ops +from official.vision.ops import preprocess_ops class Parser(parser.Parser): diff --git a/official/projects/yolo/losses/__init__.py b/official/projects/yolo/losses/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/yolo/losses/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ diff --git a/official/vision/beta/projects/yolo/losses/yolo_loss.py b/official/projects/yolo/losses/yolo_loss.py old mode 100755 new mode 100644 similarity index 99% rename from official/vision/beta/projects/yolo/losses/yolo_loss.py rename to official/projects/yolo/losses/yolo_loss.py index aac117bdf58ae8eb3e38bd6cddc1ebb65d141565..f917b2fe476fc54d59ff12e5bed054bc654f9cb4 --- a/official/vision/beta/projects/yolo/losses/yolo_loss.py +++ b/official/projects/yolo/losses/yolo_loss.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,9 +19,9 @@ import functools import tensorflow as tf -from official.vision.beta.projects.yolo.ops import box_ops -from official.vision.beta.projects.yolo.ops import loss_utils -from official.vision.beta.projects.yolo.ops import math_ops +from official.projects.yolo.ops import box_ops +from official.projects.yolo.ops import loss_utils +from official.projects.yolo.ops import math_ops class YoloLossBase(object, metaclass=abc.ABCMeta): @@ -323,7 +323,7 @@ class DarknetLoss(YoloLossBase): grid_points = tf.stop_gradient(grid_points) anchor_grid = tf.stop_gradient(anchor_grid) - # Split all the ground truths to use as seperate items in loss computation. + # Split all the ground truths to use as separate items in loss computation. 
(true_box, ind_mask, true_class) = tf.split(y_true, [4, 1, 1], axis=-1) true_conf = tf.squeeze(true_conf, axis=-1) true_class = tf.squeeze(true_class, axis=-1) diff --git a/official/vision/beta/projects/yolo/losses/yolo_loss_test.py b/official/projects/yolo/losses/yolo_loss_test.py old mode 100755 new mode 100644 similarity index 95% rename from official/vision/beta/projects/yolo/losses/yolo_loss_test.py rename to official/projects/yolo/losses/yolo_loss_test.py index b94901812697ac3b0ee3a845792c778e93c043e9..28ba20ffa0fd3e9d25c9fa3de95badd8ca624ded --- a/official/vision/beta/projects/yolo/losses/yolo_loss_test.py +++ b/official/projects/yolo/losses/yolo_loss_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,7 +17,7 @@ from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.projects.yolo.losses import yolo_loss +from official.projects.yolo.losses import yolo_loss class YoloDecoderTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/projects/yolo/modeling/__init__.py b/official/projects/yolo/modeling/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/yolo/modeling/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/yolo/modeling/backbones/__init__.py b/official/projects/yolo/modeling/backbones/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/yolo/modeling/backbones/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/vision/beta/projects/yolo/modeling/backbones/darknet.py b/official/projects/yolo/modeling/backbones/darknet.py similarity index 99% rename from official/vision/beta/projects/yolo/modeling/backbones/darknet.py rename to official/projects/yolo/modeling/backbones/darknet.py index 7adcb0960565ce9a9af632a76b1eacfc0eca7101..bff572f07405c48de27b90675ebb13e255269f5a 100644 --- a/official/vision/beta/projects/yolo/modeling/backbones/darknet.py +++ b/official/projects/yolo/modeling/backbones/darknet.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -36,11 +36,12 @@ Darknets are used mainly for object detection in: """ import collections + import tensorflow as tf from official.modeling import hyperparams -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.projects.yolo.modeling.layers import nn_blocks +from official.projects.yolo.modeling.layers import nn_blocks +from official.vision.modeling.backbones import factory class BlockConfig: diff --git a/official/vision/beta/projects/yolo/modeling/backbones/darknet_test.py b/official/projects/yolo/modeling/backbones/darknet_test.py similarity index 96% rename from official/vision/beta/projects/yolo/modeling/backbones/darknet_test.py rename to official/projects/yolo/modeling/backbones/darknet_test.py index 9441b06a31162a6b3d52886b9764e9cb85858f26..61eb718910ffe661c4fd139cc6eec732c3023bf7 100644 --- a/official/vision/beta/projects/yolo/modeling/backbones/darknet_test.py +++ b/official/projects/yolo/modeling/backbones/darknet_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Tests for yolo.""" from absl.testing import parameterized @@ -21,7 +20,7 @@ import tensorflow as tf from tensorflow.python.distribute import combinations from tensorflow.python.distribute import strategy_combinations -from official.vision.beta.projects.yolo.modeling.backbones import darknet +from official.projects.yolo.modeling.backbones import darknet class DarknetTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/projects/yolo/modeling/decoders/__init__.py b/official/projects/yolo/modeling/decoders/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/yolo/modeling/decoders/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/vision/beta/projects/yolo/modeling/decoders/yolo_decoder.py b/official/projects/yolo/modeling/decoders/yolo_decoder.py similarity index 98% rename from official/vision/beta/projects/yolo/modeling/decoders/yolo_decoder.py rename to official/projects/yolo/modeling/decoders/yolo_decoder.py index 8aaab13f6d6f5d6453ea1a67bde00e6e9dbb3115..5ce0c01610e94c33dfe48aa56c8809a6db89fafb 100644 --- a/official/vision/beta/projects/yolo/modeling/decoders/yolo_decoder.py +++ b/official/projects/yolo/modeling/decoders/yolo_decoder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. 
All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -13,13 +13,13 @@ # limitations under the License. """Feature Pyramid Network and Path Aggregation variants used in YOLO.""" -from typing import Mapping, Union, Optional +from typing import Mapping, Optional, Union import tensorflow as tf from official.modeling import hyperparams -from official.vision.beta.modeling.decoders import factory -from official.vision.beta.projects.yolo.modeling.layers import nn_blocks +from official.projects.yolo.modeling.layers import nn_blocks +from official.vision.modeling.decoders import factory # model configurations # the structure is as follows. model version, {v3, v4, v#, ... etc} @@ -613,7 +613,7 @@ def build_yolo_decoder( '{yolo_model.YOLO_MODELS[decoder_cfg.version].keys()}' 'or specify a custom decoder config using YoloDecoder.') - base_model = YOLO_MODELS[decoder_cfg.version][decoder_cfg.type] + base_model = YOLO_MODELS[decoder_cfg.version][decoder_cfg.type].copy() cfg_dict = decoder_cfg.as_dict() for key in base_model: diff --git a/official/vision/beta/projects/yolo/modeling/decoders/yolo_decoder_test.py b/official/projects/yolo/modeling/decoders/yolo_decoder_test.py similarity index 96% rename from official/vision/beta/projects/yolo/modeling/decoders/yolo_decoder_test.py rename to official/projects/yolo/modeling/decoders/yolo_decoder_test.py index 611c458594566b70ab538edac8a71371da812add..4ab47e95ca98e46d3892bffe26a165372278e39a 100644 --- a/official/vision/beta/projects/yolo/modeling/decoders/yolo_decoder_test.py +++ b/official/projects/yolo/modeling/decoders/yolo_decoder_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for YOLO.""" # Import libraries @@ -21,7 +20,7 @@ import tensorflow as tf from tensorflow.python.distribute import combinations from tensorflow.python.distribute import strategy_combinations -from official.vision.beta.projects.yolo.modeling.decoders import yolo_decoder as decoders +from official.projects.yolo.modeling.decoders import yolo_decoder as decoders class YoloDecoderTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/projects/yolo/modeling/factory.py b/official/projects/yolo/modeling/factory.py new file mode 100644 index 0000000000000000000000000000000000000000..d18b8aa9fdb0427c3baab10270867f300f1e3930 --- /dev/null +++ b/official/projects/yolo/modeling/factory.py @@ -0,0 +1,95 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Contains common factory functions for yolo neural networks.""" + +from absl import logging + +from official.projects.yolo.configs import yolo +from official.projects.yolo.modeling import yolo_model +from official.projects.yolo.modeling.heads import yolo_head +from official.projects.yolo.modeling.layers import detection_generator +from official.vision.modeling.backbones import factory as backbone_factory +from official.vision.modeling.decoders import factory as decoder_factory + + +def build_yolo_detection_generator(model_config: yolo.Yolo, anchor_boxes): + """Builds yolo detection generator.""" + model = detection_generator.YoloLayer( + classes=model_config.num_classes, + anchors=anchor_boxes, + iou_thresh=model_config.detection_generator.iou_thresh, + nms_thresh=model_config.detection_generator.nms_thresh, + max_boxes=model_config.detection_generator.max_boxes, + pre_nms_points=model_config.detection_generator.pre_nms_points, + nms_type=model_config.detection_generator.nms_type, + box_type=model_config.detection_generator.box_type.get(), + path_scale=model_config.detection_generator.path_scales.get(), + scale_xy=model_config.detection_generator.scale_xy.get(), + label_smoothing=model_config.loss.label_smoothing, + use_scaled_loss=model_config.loss.use_scaled_loss, + update_on_repeat=model_config.loss.update_on_repeat, + truth_thresh=model_config.loss.truth_thresh.get(), + loss_type=model_config.loss.box_loss_type.get(), + max_delta=model_config.loss.max_delta.get(), + iou_normalizer=model_config.loss.iou_normalizer.get(), + cls_normalizer=model_config.loss.cls_normalizer.get(), + object_normalizer=model_config.loss.object_normalizer.get(), + ignore_thresh=model_config.loss.ignore_thresh.get(), + objectness_smooth=model_config.loss.objectness_smooth.get()) + return model + + +def build_yolo_head(input_specs, model_config: yolo.Yolo, l2_regularization): + """Builds yolo head.""" + min_level = min(map(int, input_specs.keys())) + max_level = max(map(int,
input_specs.keys())) + head = yolo_head.YoloHead( + min_level=min_level, + max_level=max_level, + classes=model_config.num_classes, + boxes_per_level=model_config.anchor_boxes.anchors_per_scale, + norm_momentum=model_config.norm_activation.norm_momentum, + norm_epsilon=model_config.norm_activation.norm_epsilon, + kernel_regularizer=l2_regularization, + smart_bias=model_config.head.smart_bias) + return head + + +def build_yolo(input_specs, model_config, l2_regularization): + """Builds yolo model.""" + backbone = model_config.backbone.get() + anchor_dict, _ = model_config.anchor_boxes.get( + backbone.min_level, backbone.max_level) + backbone = backbone_factory.build_backbone(input_specs, model_config.backbone, + model_config.norm_activation, + l2_regularization) + decoder = decoder_factory.build_decoder(backbone.output_specs, model_config, + l2_regularization) + + head = build_yolo_head(decoder.output_specs, model_config, l2_regularization) + detection_generator_obj = build_yolo_detection_generator(model_config, + anchor_dict) + + model = yolo_model.Yolo( + backbone=backbone, + decoder=decoder, + head=head, + detection_generator=detection_generator_obj) + model.build(input_specs.shape) + + model.summary(print_fn=logging.info) + + losses = detection_generator_obj.get_losses() + return model, losses diff --git a/official/projects/yolo/modeling/heads/__init__.py b/official/projects/yolo/modeling/heads/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/yolo/modeling/heads/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/vision/beta/projects/yolo/modeling/heads/yolo_head.py b/official/projects/yolo/modeling/heads/yolo_head.py similarity index 97% rename from official/vision/beta/projects/yolo/modeling/heads/yolo_head.py rename to official/projects/yolo/modeling/heads/yolo_head.py index 23d41a045e83ced984db7e8654ebb64618b8f16e..27141799435199baf1a9de085e5352d359ef41d0 100644 --- a/official/vision/beta/projects/yolo/modeling/heads/yolo_head.py +++ b/official/projects/yolo/modeling/heads/yolo_head.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -15,7 +15,7 @@ """Yolo heads.""" import tensorflow as tf -from official.vision.beta.projects.yolo.modeling.layers import nn_blocks +from official.projects.yolo.modeling.layers import nn_blocks class YoloHead(tf.keras.layers.Layer): diff --git a/official/vision/beta/projects/yolo/modeling/heads/yolo_head_test.py b/official/projects/yolo/modeling/heads/yolo_head_test.py similarity index 93% rename from official/vision/beta/projects/yolo/modeling/heads/yolo_head_test.py rename to official/projects/yolo/modeling/heads/yolo_head_test.py index 8c5414e5d849aafbb503dc293c1769e0ec1b9455..d96ef96331d4e4f618d78222bbf9db874130c966 100644 --- a/official/vision/beta/projects/yolo/modeling/heads/yolo_head_test.py +++ b/official/projects/yolo/modeling/heads/yolo_head_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,14 +12,13 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for yolo heads.""" # Import libraries from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.projects.yolo.modeling.heads import yolo_head as heads +from official.projects.yolo.modeling.heads import yolo_head as heads class YoloDecoderTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/projects/yolo/modeling/layers/__init__.py b/official/projects/yolo/modeling/layers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/yolo/modeling/layers/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/yolo/modeling/layers/detection_generator.py b/official/projects/yolo/modeling/layers/detection_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..afb6fcd0eb557939987a1002a76d1be4edf0a85b --- /dev/null +++ b/official/projects/yolo/modeling/layers/detection_generator.py @@ -0,0 +1,307 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Contains common building blocks for yolo layer (detection layer).""" +import tensorflow as tf + +from official.projects.yolo.losses import yolo_loss +from official.projects.yolo.ops import box_ops +from official.projects.yolo.ops import loss_utils +from official.vision.modeling.layers import detection_generator + + +class YoloLayer(tf.keras.Model): + """Yolo layer (detection generator).""" + + def __init__(self, + anchors, + classes, + iou_thresh=0.0, + ignore_thresh=0.7, + truth_thresh=1.0, + nms_thresh=0.6, + max_delta=10.0, + loss_type='ciou', + iou_normalizer=1.0, + cls_normalizer=1.0, + object_normalizer=1.0, + use_scaled_loss=False, + update_on_repeat=False, + pre_nms_points=5000, + label_smoothing=0.0, + max_boxes=200, + box_type='original', + path_scale=None, + scale_xy=None, + nms_type='greedy', + objectness_smooth=False, + **kwargs): + """Parameters for the loss functions used at each detection head output. + + Args: + anchors: `Dict[str, List[List[float]]]` mapping each level to the anchor + boxes used at that level. + classes: `int` for the number of classes. + iou_thresh: `float` to use multiple anchors per object if + IoU(Obj, Anchor) > iou_thresh. + ignore_thresh: `float` for the IOU value over which the loss is not + propagated, and a detection is assumed to have been made. + truth_thresh: `float` for the IOU value over which the loss is propagated + despite a detection being made. + nms_thresh: `float` for the minimum IOU value for an overlap. + max_delta: gradient clipping value to apply to the box loss. + loss_type: `str` for the type of IoU loss to use, one of {ciou, diou, + giou, iou}. + iou_normalizer: `float` for how much to scale the loss on the IOU or the + boxes. + cls_normalizer: `float` for how much to scale the loss on the classes. + object_normalizer: `float` for how much to scale the loss on the detection + map. + use_scaled_loss: `bool` for whether to use the scaled loss + or the traditional loss.
+ update_on_repeat: `bool` indicating how to handle repeated + indices at a given [j, i] location. Setting this to True gives more + consistent mAP; setting it to False improves recall by 1-2% but + sacrifices some mAP. + pre_nms_points: `int` number of top candidate detections per class before + NMS. + label_smoothing: `float` for how much to smooth the loss on the classes. + max_boxes: `int` for the maximum number of boxes retained over all + classes. + box_type: `dict` mapping each level to one of 3 box types that affect + training differently: {original, scaled, anchor_free}. The original + method decodes the boxes by applying an exponential to the model width + and height maps, then scaling the maps by the anchor boxes. This method + is used in Yolo-v4, Yolo-v3, and their counterparts. The scaled method + squares the width and height and scales both by a fixed factor of 4. + This method is used in the Scaled Yolo models, as well as Yolov4-CSP. + Finally, anchor_free is like the original method but does not apply an + activation function to the boxes; it is used for some of the newer + anchor-free versions of YOLO. + path_scale: `dict` mapping each level to its stride. Defaults to + 2**level for each level. + scale_xy: `dict` of `float` values indicating how far each grid cell can + predict a center outside of its own unit extent. A value of 1.2 widens + the range in which a cell can predict a center by 20% (0.1 of a cell on + each side): the center can range from -(value - 1)/2 to + 1 + (value - 1)/2. This value is set in the yolo filter and reused here. + There should be one value of scale_xy for each level from min_level to + max_level. Defaults to 1.0 for every level. + nms_type: `str` for which non-max suppression to use: 'greedy' uses + tf.image.combined_non_max_suppression, any other value uses TPU NMS. + objectness_smooth: `float` for how much to smooth the loss on the + detection map. + **kwargs: Additional keyword arguments.
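The `scale_xy` entry in the `YoloLayer` docstring describes how far a cell may place a box center beyond its own boundaries. The short sketch below makes that concrete for one coordinate. It assumes the sigmoid-scaling rule used by Scaled-YOLOv4 and Darknet (`sigmoid(t) * s - 0.5 * (s - 1)`); it is not checked against `loss_utils.get_predicted_box`, and `decode_center` and `center_range` are hypothetical helper names.

```python
import math


def sigmoid(x: float) -> float:
    # Plain logistic function; fine for the moderate logits used here.
    return 1.0 / (1.0 + math.exp(-x))


def decode_center(t: float, grid_index: int, scale_xy: float = 1.0) -> float:
    """Decodes one raw center logit into grid-cell units (assumed rule).

    offset = sigmoid(t) * scale_xy - 0.5 * (scale_xy - 1), then add the cell
    index. With scale_xy == 1.0 this is the classic sigmoid(t) + grid_index.
    """
    offset = sigmoid(t) * scale_xy - 0.5 * (scale_xy - 1.0)
    return grid_index + offset


def center_range(scale_xy: float) -> tuple:
    """Returns the (min, max) offset a cell can predict for its center."""
    extra = 0.5 * (scale_xy - 1.0)
    return -extra, 1.0 + extra


# With scale_xy=1.2 a cell can reach 0.1 of a cell past each of its edges.
print(center_range(1.2))  # approximately (-0.1, 1.1)
print(decode_center(0.0, 3, scale_xy=1.2))  # approximately 3.5, the cell center
```

Subtracting `0.5 * (s - 1)` keeps the range symmetric around the cell, so a zero logit still decodes to the cell center regardless of `scale_xy`.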
+ """ + super().__init__(**kwargs) + self._anchors = anchors + self._thresh = iou_thresh + self._ignore_thresh = ignore_thresh + self._truth_thresh = truth_thresh + self._iou_normalizer = iou_normalizer + self._cls_normalizer = cls_normalizer + self._object_normalizer = object_normalizer + self._objectness_smooth = objectness_smooth + self._nms_thresh = nms_thresh + self._max_boxes = max_boxes + self._max_delta = max_delta + self._classes = classes + self._loss_type = loss_type + + self._use_scaled_loss = use_scaled_loss + self._update_on_repeat = update_on_repeat + + self._pre_nms_points = pre_nms_points + self._label_smoothing = label_smoothing + + self._keys = list(anchors.keys()) + self._len_keys = len(self._keys) + self._box_type = box_type + self._path_scale = path_scale or {key: 2**int(key) for key in self._keys} + + self._nms_type = nms_type + self._scale_xy = scale_xy or {key: 1.0 for key, _ in anchors.items()} + + self._generator = {} + self._len_mask = {} + for key in self._keys: + anchors = self._anchors[key] + self._generator[key] = loss_utils.GridGenerator( + anchors, scale_anchors=self._path_scale[key]) + self._len_mask[key] = len(anchors) + return + + def parse_prediction_path(self, key, inputs): + shape_ = tf.shape(inputs) + shape = inputs.get_shape().as_list() + batchsize, height, width = shape_[0], shape[1], shape[2] + + if height is None or width is None: + height, width = shape_[1], shape_[2] + + generator = self._generator[key] + len_mask = self._len_mask[key] + scale_xy = self._scale_xy[key] + + # reshape the yolo output to (batchsize, + # height, + # width, + # number_anchors, + # remaining_points) + data = tf.reshape(inputs, [-1, height, width, len_mask, self._classes + 5]) + + # use the grid generator to get the formatted anchor boxes and grid points + # in shape [1, height, width, 2] + centers, anchors = generator(height, width, batchsize, dtype=data.dtype) + + # split the yolo detections into boxes, object score 
map, classes + boxes,
obns_scores, class_scores = tf.split( + data, [4, 1, self._classes], axis=-1) + + # determine the number of classes + classes = class_scores.get_shape().as_list()[-1] + + # configurable to use the new coordinates in scaled Yolo v4 or not + _, _, boxes = loss_utils.get_predicted_box( + tf.cast(height, data.dtype), + tf.cast(width, data.dtype), + boxes, + anchors, + centers, + scale_xy, + stride=self._path_scale[key], + darknet=False, + box_type=self._box_type[key]) + + # convert boxes from yolo(x, y, w, h) to tensorflow(ymin, xmin, ymax, xmax) + boxes = box_ops.xcycwh_to_yxyx(boxes) + + # activate the detection (objectness) map + obns_scores = tf.math.sigmoid(obns_scores) + + # convert detection map to class detection probabilities + class_scores = tf.math.sigmoid(class_scores) * obns_scores + + # flatten predictions to [batchsize, N, -1] for non-max suppression + fill = height * width * len_mask + boxes = tf.reshape(boxes, [-1, fill, 4]) + class_scores = tf.reshape(class_scores, [-1, fill, classes]) + obns_scores = tf.reshape(obns_scores, [-1, fill]) + return obns_scores, boxes, class_scores + + def call(self, inputs): + boxes = [] + class_scores = [] + object_scores = [] + levels = list(inputs.keys()) + min_level = int(min(levels)) + max_level = int(max(levels)) + + # aggregate boxes over each scale + for i in range(min_level, max_level + 1): + key = str(i) + object_scores_, boxes_, class_scores_ = self.parse_prediction_path( + key, inputs[key]) + boxes.append(boxes_) + class_scores.append(class_scores_) + object_scores.append(object_scores_) + + # collate all predictions + boxes = tf.concat(boxes, axis=1) + object_scores = tf.concat(object_scores, axis=1) + class_scores = tf.concat(class_scores, axis=1) + + # get masks to threshold all the predictions + object_mask = tf.cast(object_scores > self._thresh, object_scores.dtype) + class_mask = tf.cast(class_scores > self._thresh, class_scores.dtype) + + # apply the threshold masks to all the predictions + object_scores *= object_mask
+ class_scores *= (tf.expand_dims(object_mask, axis=-1) * class_mask) + + # apply nms + if self._nms_type == 'greedy': + # greedy NMS + boxes = tf.cast(boxes, dtype=tf.float32) + class_scores = tf.cast(class_scores, dtype=tf.float32) + boxes, object_scores_, class_scores, num_detections = ( + tf.image.combined_non_max_suppression( + tf.expand_dims(boxes, axis=-2), + class_scores, + self._pre_nms_points, + self._max_boxes, + iou_threshold=self._nms_thresh, + score_threshold=self._thresh)) + # cast the boxes and predictions back to the original datatype + boxes = tf.cast(boxes, object_scores.dtype) + class_scores = tf.cast(class_scores, object_scores.dtype) + object_scores = tf.cast(object_scores_, object_scores.dtype) + else: + # TPU NMS + boxes = tf.cast(boxes, dtype=tf.float32) + class_scores = tf.cast(class_scores, dtype=tf.float32) + (boxes, confidence, classes, + num_detections) = detection_generator._generate_detections_v2( # pylint:disable=protected-access + tf.expand_dims(boxes, axis=-2), + class_scores, + pre_nms_top_k=self._pre_nms_points, + max_num_detections=self._max_boxes, + nms_iou_threshold=self._nms_thresh, + pre_nms_score_threshold=self._thresh) + boxes = tf.cast(boxes, object_scores.dtype) + class_scores = tf.cast(classes, object_scores.dtype) + object_scores = tf.cast(confidence, object_scores.dtype) + + # format and return + return { + 'bbox': boxes, + 'classes': class_scores, + 'confidence': object_scores, + 'num_detections': num_detections, + } + + def get_losses(self): + """Generates a dictionary of losses to apply to each path. + + Done in the detection generator because all parameters are the same + across both loss and detection generator.
+ + Returns: + Dict[str, tf.Tensor] of losses + """ + loss = yolo_loss.YoloLoss( + keys=self._keys, + classes=self._classes, + anchors=self._anchors, + path_strides=self._path_scale, + truth_thresholds=self._truth_thresh, + ignore_thresholds=self._ignore_thresh, + loss_types=self._loss_type, + iou_normalizers=self._iou_normalizer, + cls_normalizers=self._cls_normalizer, + object_normalizers=self._object_normalizer, + objectness_smooths=self._objectness_smooth, + box_types=self._box_type, + max_deltas=self._max_delta, + scale_xys=self._scale_xy, + use_scaled_loss=self._use_scaled_loss, + update_on_repeat=self._update_on_repeat, + label_smoothing=self._label_smoothing) + return loss + + def get_config(self): + return { + 'anchors': [list(a) for a in self._anchors], + 'thresh': self._thresh, + 'max_boxes': self._max_boxes, + } diff --git a/official/projects/yolo/modeling/layers/detection_generator_test.py b/official/projects/yolo/modeling/layers/detection_generator_test.py new file mode 100644 index 0000000000000000000000000000000000000000..bed13cc660cf800304850732104bfaf86db5674a --- /dev/null +++ b/official/projects/yolo/modeling/layers/detection_generator_test.py @@ -0,0 +1,61 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for yolo detection generator.""" +from absl.testing import parameterized +import tensorflow as tf + +from official.projects.yolo.modeling.layers import detection_generator as dg + + +class YoloDecoderTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (True), + (False), + ) + def test_network_creation(self, nms): + """Test creation of the YOLO detection generator.""" + tf.keras.backend.set_image_data_format('channels_last') + input_shape = { + '3': [1, 52, 52, 255], + '4': [1, 26, 26, 255], + '5': [1, 13, 13, 255] + } + classes = 80 + anchors = { + '3': [[12.0, 19.0], [31.0, 46.0], [96.0, 54.0]], + '4': [[46.0, 114.0], [133.0, 127.0], [79.0, 225.0]], + '5': [[301.0, 150.0], [172.0, 286.0], [348.0, 340.0]] + } + + box_type = {key: 'scaled' for key in anchors.keys()} + + layer = dg.YoloLayer(anchors, classes, box_type=box_type, max_boxes=10) + + inputs = {} + for key in input_shape: + inputs[key] = tf.ones(input_shape[key], dtype=tf.float32) + + endpoints = layer(inputs) + + boxes = endpoints['bbox'] + classes = endpoints['classes'] + + self.assertAllEqual(boxes.shape.as_list(), [1, 10, 4]) + self.assertAllEqual(classes.shape.as_list(), [1, 10]) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/yolo/modeling/layers/nn_blocks.py b/official/projects/yolo/modeling/layers/nn_blocks.py new file mode 100644 index 0000000000000000000000000000000000000000..91f956f33084895a3d49d0d6878712ab8524ce5f --- /dev/null +++ b/official/projects/yolo/modeling/layers/nn_blocks.py @@ -0,0 +1,1718 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains common building blocks for yolo neural networks.""" +from typing import Callable, List, Tuple + +import tensorflow as tf + +from official.modeling import tf_utils +from official.vision.ops import spatial_transform_ops + + +class Identity(tf.keras.layers.Layer): + + def call(self, inputs): + return inputs + + +class ConvBN(tf.keras.layers.Layer): + """ConvBN block. + + Modified convolution layer to match that of the Darknet library. + The layer is a standard combination of Conv, BatchNorm and Activation; + however, the conv only uses a bias when batch normalization is disabled. + Cross Stage Partial networks (CSPNets) were proposed in: + [1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, + Ping-Yang Chen, Jun-Wei Hsieh + CSPNet: A New Backbone that can Enhance Learning Capability of CNN. + arXiv:1911.11929 + """ + + def __init__(self, + filters=1, + kernel_size=(1, 1), + strides=(1, 1), + padding='same', + dilation_rate=(1, 1), + kernel_initializer='VarianceScaling', + bias_initializer='zeros', + bias_regularizer=None, + kernel_regularizer=None, + use_separable_conv=False, + use_bn=True, + use_sync_bn=False, + norm_momentum=0.99, + norm_epsilon=0.001, + activation='leaky', + leaky_alpha=0.1, + **kwargs): + """ConvBN initializer. + + Args: + filters: integer for output depth, or the number of features to learn. + kernel_size: integer or tuple for the shape of the weight matrix or kernel + to learn. + strides: integer or tuple for how far to move the kernel after each + application.
+ padding: string 'valid' or 'same'; if 'same', pad the image, else do + not. + dilation_rate: tuple for the dilation rate, i.e. how many pixels in the + feature map to skip between kernel elements. + kernel_initializer: string to indicate which function to use to initialize + weights. + bias_initializer: string to indicate which function to use to initialize + bias. + bias_regularizer: string to indicate which function to use to regularize + bias. + kernel_regularizer: string to indicate which function to use to + regularize weights. + use_separable_conv: `bool` whether to use separable convs. + use_bn: boolean for whether to use batch normalization. + use_sync_bn: boolean for whether to sync batch normalization statistics + of all batch norm layers to the model's global statistics + (across all input batches). + norm_momentum: float for momentum to use for batch normalization. + norm_epsilon: float for batch normalization epsilon. + activation: string or None for activation function to use in layer, + if None activation is replaced by linear. + leaky_alpha: float to use as alpha if activation function is leaky. + **kwargs: Keyword Arguments.
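`ConvBN` drops the convolution bias whenever batch normalization is enabled, and its `fuse()` method later folds the BatchNorm statistics back into the kernel and a new bias (`b_bn = beta - gamma * moving_mean / base`). The NumPy sketch below shows that folding arithmetic. It assumes a Keras-layout kernel `[kh, kw, in_channels, out_channels]` and per-output-channel BN parameters. `fold_bn_into_conv` is a hypothetical helper and is not part of `nn_blocks`.

```python
import numpy as np


def fold_bn_into_conv(kernel, gamma, beta, moving_mean, moving_variance,
                      epsilon=1e-3):
    """Folds inference-mode BatchNorm into a bias-free convolution (sketch).

    kernel: Keras layout [kh, kw, in_channels, out_channels].
    gamma, beta, moving_mean, moving_variance: shape [out_channels].
    Returns (fused_kernel, fused_bias) such that
    conv(x, fused_kernel) + fused_bias == BN(conv(x, kernel)).
    """
    std = np.sqrt(moving_variance + epsilon)
    fused_kernel = kernel * (gamma / std)  # broadcasts over out_channels
    fused_bias = beta - gamma * moving_mean / std
    return fused_kernel, fused_bias


# Check with a 1x1 convolution, which reduces to a matmul over channels.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
kernel = rng.normal(size=(1, 1, 3, 4))
gamma, beta, mean = rng.normal(size=(3, 4))
var = rng.uniform(0.5, 2.0, size=4)

reference = gamma * (x @ kernel[0, 0] - mean) / np.sqrt(var + 1e-3) + beta
fused_k, fused_b = fold_bn_into_conv(kernel, gamma, beta, mean, var)
print(np.allclose(x @ fused_k[0, 0] + fused_b, reference))  # True
```

Because convolution is linear in its kernel, scaling each output channel's filter by `gamma / std` matches scaling the conv output. This is why the fused layer needs only a single conv with a bias at inference time.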
+ """ + + # convolution params + self._filters = filters + self._kernel_size = kernel_size + self._strides = strides + self._padding = padding + self._dilation_rate = dilation_rate + + if kernel_initializer == 'VarianceScaling': + # to match pytorch initialization method + self._kernel_initializer = tf.keras.initializers.VarianceScaling( + scale=1 / 3, mode='fan_in', distribution='uniform') + else: + self._kernel_initializer = kernel_initializer + + self._bias_initializer = bias_initializer + self._kernel_regularizer = kernel_regularizer + + self._bias_regularizer = bias_regularizer + + # batch normalization params + self._use_bn = use_bn + self._use_separable_conv = use_separable_conv + self._use_sync_bn = use_sync_bn + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + + ksize = self._kernel_size + if not isinstance(ksize, List) and not isinstance(ksize, Tuple): + ksize = [ksize] + if use_separable_conv and not all([a == 1 for a in ksize]): + self._conv_base = tf.keras.layers.SeparableConv2D + else: + self._conv_base = tf.keras.layers.Conv2D + + if use_sync_bn: + self._bn_base = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._bn_base = tf.keras.layers.BatchNormalization + + if tf.keras.backend.image_data_format() == 'channels_last': + # format: (batch_size, height, width, channels) + self._bn_axis = -1 + else: + # format: (batch_size, channels, width, height) + self._bn_axis = 1 + + # activation params + self._activation = activation + self._leaky_alpha = leaky_alpha + self._fuse = False + + super().__init__(**kwargs) + + def build(self, input_shape): + use_bias = not self._use_bn + + self.conv = self._conv_base( + filters=self._filters, + kernel_size=self._kernel_size, + strides=self._strides, + padding=self._padding, + dilation_rate=self._dilation_rate, + use_bias=use_bias, + kernel_initializer=self._kernel_initializer, + bias_initializer=self._bias_initializer, + kernel_regularizer=self._kernel_regularizer, + 
bias_regularizer=self._bias_regularizer) + + if self._use_bn: + self.bn = self._bn_base( + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + axis=self._bn_axis) + else: + self.bn = None + + if self._activation == 'leaky': + self._activation_fn = tf.keras.layers.LeakyReLU(alpha=self._leaky_alpha) + elif self._activation == 'mish': + self._activation_fn = lambda x: x * tf.math.tanh(tf.math.softplus(x)) + else: + self._activation_fn = tf_utils.get_activation(self._activation) + + def call(self, x): + x = self.conv(x) + if self._use_bn and not self._fuse: + x = self.bn(x) + x = self._activation_fn(x) + return x + + def fuse(self): + if self.bn is not None and not self._use_separable_conv: + # Fuse convolution and batchnorm, gives me +2 to 3 FPS 2ms latency. + # layers: https://tehnokv.com/posts/fusing-batchnorm-and-conv/ + if self._fuse: + return + + self._fuse = True + conv_weights = self.conv.get_weights()[0] + gamma, beta, moving_mean, moving_variance = self.bn.get_weights() + + self.conv.use_bias = True + infilters = conv_weights.shape[-2] + self.conv.build([None, None, None, infilters]) + + base = tf.sqrt(self._norm_epsilon + moving_variance) + w_conv_base = tf.transpose(conv_weights, perm=(3, 2, 0, 1)) + w_conv = tf.reshape(w_conv_base, [conv_weights.shape[-1], -1]) + + w_bn = tf.linalg.diag(gamma / base) + w_conv = tf.reshape(tf.matmul(w_bn, w_conv), w_conv_base.get_shape()) + w_conv = tf.transpose(w_conv, perm=(2, 3, 1, 0)) + + b_bn = beta - gamma * moving_mean / base + + self.conv.set_weights([w_conv, b_bn]) + del self.bn + + self.trainable = False + self.conv.trainable = False + self.bn = None + return + + def get_config(self): + # used to store/share parameters to reconstruct the model + layer_config = { + 'filters': self._filters, + 'kernel_size': self._kernel_size, + 'strides': self._strides, + 'padding': self._padding, + 'dilation_rate': self._dilation_rate, + 'kernel_initializer': self._kernel_initializer, + 'bias_initializer': 
self._bias_initializer, + 'bias_regularizer': self._bias_regularizer, + 'kernel_regularizer': self._kernel_regularizer, + 'use_bn': self._use_bn, + 'use_sync_bn': self._use_sync_bn, + 'use_separable_conv': self._use_separable_conv, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'activation': self._activation, + 'leaky_alpha': self._leaky_alpha + } + layer_config.update(super().get_config()) + return layer_config + + +class DarkResidual(tf.keras.layers.Layer): + """Darknet block with Residual connection for Yolo v3 Backbone.""" + + def __init__(self, + filters=1, + filter_scale=2, + dilation_rate=1, + kernel_initializer='VarianceScaling', + bias_initializer='zeros', + kernel_regularizer=None, + bias_regularizer=None, + use_bn=True, + use_sync_bn=False, + use_separable_conv=False, + norm_momentum=0.99, + norm_epsilon=0.001, + activation='leaky', + leaky_alpha=0.1, + sc_activation='linear', + downsample=False, + **kwargs): + """Dark Residual initializer. + + Args: + filters: integer for output depth, or the number of features to learn. + filter_scale: `int` divisor for the number of filters in the 1x1 + bottleneck conv (filters // filter_scale). + dilation_rate: int or tuple to indicate how much to modulate kernel + weights and how many pixels in a feature map to skip. + kernel_initializer: string to indicate which function to use to initialize + weights. + bias_initializer: string to indicate which function to use to initialize + bias. + kernel_regularizer: string to indicate which function to use to + regularize weights. + bias_regularizer: string to indicate which function to use to regularize + bias. + use_bn: boolean for whether to use batch normalization. + use_sync_bn: boolean for whether to sync batch normalization statistics + of all batch norm layers to the model's global statistics + (across all input batches). + use_separable_conv: `bool` whether to use separable convs. 
+ norm_momentum: float for momentum to use for batch normalization. + norm_epsilon: float for batch normalization epsilon.
+ activation: string or None for activation function to use in layer, + if None activation is replaced by linear. + leaky_alpha: float to use as alpha if activation function is leaky. + sc_activation: string for activation function to use in layer. + downsample: boolean for if image input is larger than layer output, set + downsample to True so the dimensions are forced to match. + **kwargs: Keyword Arguments. + """ + + # downsample + self._downsample = downsample + + # ConvBN params + self._filters = filters + self._filter_scale = filter_scale + self._kernel_initializer = kernel_initializer + self._bias_initializer = bias_initializer + self._bias_regularizer = bias_regularizer + self._use_bn = use_bn + self._use_sync_bn = use_sync_bn + self._use_separable_conv = use_separable_conv + self._kernel_regularizer = kernel_regularizer + + # normal params + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._dilation_rate = dilation_rate if isinstance(dilation_rate, + int) else dilation_rate[0] + + # activation params + self._conv_activation = activation + self._leaky_alpha = leaky_alpha + self._sc_activation = sc_activation + + super().__init__(**kwargs) + + def build(self, input_shape): + dark_conv_args = { + 'kernel_initializer': self._kernel_initializer, + 'bias_initializer': self._bias_initializer, + 'bias_regularizer': self._bias_regularizer, + 'use_bn': self._use_bn, + 'use_sync_bn': self._use_sync_bn, + 'use_separable_conv': self._use_separable_conv, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'activation': self._conv_activation, + 'kernel_regularizer': self._kernel_regularizer, + 'leaky_alpha': self._leaky_alpha + } + if self._downsample: + if self._dilation_rate > 1: + dilation_rate = 1 + if self._dilation_rate // 2 > 0: + dilation_rate = self._dilation_rate // 2 + down_stride = 1 + else: + dilation_rate = 1 + down_stride = 2 + + self._dconv = ConvBN( + filters=self._filters, + kernel_size=(3, 3), + 
strides=down_stride, + dilation_rate=dilation_rate, + padding='same', + **dark_conv_args) + + self._conv1 = ConvBN( + filters=self._filters // self._filter_scale, + kernel_size=(1, 1), + strides=(1, 1), + padding='same', + **dark_conv_args) + + self._conv2 = ConvBN( + filters=self._filters, + kernel_size=(3, 3), + strides=(1, 1), + dilation_rate=self._dilation_rate, + padding='same', + **dark_conv_args) + + self._shortcut = tf.keras.layers.Add() + if self._sc_activation == 'leaky': + self._activation_fn = tf.keras.layers.LeakyReLU(alpha=self._leaky_alpha) + elif self._sc_activation == 'mish': + self._activation_fn = lambda x: x * tf.math.tanh(tf.math.softplus(x)) + else: + self._activation_fn = tf_utils.get_activation(self._sc_activation) + super().build(input_shape) + + def call(self, inputs, training=None): + if self._downsample: + inputs = self._dconv(inputs) + x = self._conv1(inputs) + x = self._conv2(x) + x = self._shortcut([x, inputs]) + return self._activation_fn(x) + + def get_config(self): + # used to store/share parameters to reconstruct the model + layer_config = { + 'filters': self._filters, + 'filter_scale': self._filter_scale, + 'kernel_initializer': self._kernel_initializer, + 'bias_initializer': self._bias_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'dilation_rate': self._dilation_rate, + 'use_bn': self._use_bn, + 'use_sync_bn': self._use_sync_bn, + 'use_separable_conv': self._use_separable_conv, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'activation': self._conv_activation, + 'leaky_alpha': self._leaky_alpha, + 'sc_activation': self._sc_activation, + 'downsample': self._downsample, + } + layer_config.update(super().get_config()) + return layer_config + + +class CSPTiny(tf.keras.layers.Layer): + """CSP Tiny layer. + + A small convolution block proposed in CSPNet. 
The layer uses + shortcuts, routing (concatenation), and feature grouping in order to improve + gradient variability and allow for highly efficient, low-power residual + learning for small networks.
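+
+  Usage sketch (assumes channels_last inputs and the default groups=2,
+  group_id=1; the shapes follow from `call`):
+
+    x = tf.zeros([1, 32, 32, 64])
+    out, partial = CSPTiny(filters=64, downsample=True)(x)
+    # out: (1, 16, 16, 128), convlayer1 (64) + convlayer4 (64) channels,
+    # halved spatially by the stride-2 max pool.
+    # partial: (1, 32, 32, 64), the convlayer4 output before the pool.
+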
+ Cross Stage Partial networks (CSPNets) were proposed in: + [1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, + Ping-Yang Chen, Jun-Wei Hsieh + CSPNet: A New Backbone that can Enhance Learning Capability of CNN. + arXiv:1911.11929 + """ + + def __init__(self, + filters=1, + kernel_initializer='VarianceScaling', + bias_initializer='zeros', + bias_regularizer=None, + kernel_regularizer=None, + use_bn=True, + dilation_rate=1, + use_sync_bn=False, + use_separable_conv=False, + group_id=1, + groups=2, + norm_momentum=0.99, + norm_epsilon=0.001, + activation='leaky', + downsample=True, + leaky_alpha=0.1, + **kwargs): + """Initializer for CSPTiny block. + + Args: + filters: integer for output depth, or the number of features to learn. + kernel_initializer: string to indicate which function to use to initialize + weights. + bias_initializer: string to indicate which function to use to initialize + bias. + bias_regularizer: string to indicate which function to use to regularize + bias. + kernel_regularizer: string to indicate which function to use to + regularize weights. + use_bn: boolean for whether to use batch normalization. + dilation_rate: `int`, dilation rate for conv layers. + use_sync_bn: boolean for whether to sync batch normalization statistics + of all batch norm layers to the model's global statistics + (across all input batches). + use_separable_conv: `bool` whether to use separable convs. + group_id: integer for which group of features to pass through the CSP + tiny stack. + groups: integer for how many splits there should be in the convolution + feature stack output. + norm_momentum: float for momentum to use for batch normalization. + norm_epsilon: float for batch normalization epsilon. + activation: string or None for activation function to use in layer, + if None activation is replaced by linear. + downsample: boolean for whether to halve the spatial size of the output + with a stride-2 max pool.
+ leaky_alpha: float to use as alpha if activation function is leaky. + **kwargs: Keyword Arguments. + """ + + # ConvBN params + self._filters = filters + self._kernel_initializer = kernel_initializer + self._bias_initializer = bias_initializer + self._bias_regularizer = bias_regularizer + self._use_bn = use_bn + self._dilation_rate = dilation_rate + self._use_sync_bn = use_sync_bn + self._use_separable_conv = use_separable_conv + self._kernel_regularizer = kernel_regularizer + self._groups = groups + self._group_id = group_id + self._downsample = downsample + + # normal params + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + + # activation params + self._conv_activation = activation + self._leaky_alpha = leaky_alpha + + super().__init__(**kwargs) + + def build(self, input_shape): + dark_conv_args = { + 'kernel_initializer': self._kernel_initializer, + 'bias_initializer': self._bias_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'use_bn': self._use_bn, + 'use_sync_bn': self._use_sync_bn, + 'use_separable_conv': self._use_separable_conv, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'activation': self._conv_activation, + 'leaky_alpha': self._leaky_alpha + } + self._convlayer1 = ConvBN( + filters=self._filters, + kernel_size=(3, 3), + strides=(1, 1), + padding='same', + **dark_conv_args) + + self._convlayer2 = ConvBN( + filters=self._filters // 2, + kernel_size=(3, 3), + strides=(1, 1), + padding='same', + **dark_conv_args) + + self._convlayer3 = ConvBN( + filters=self._filters // 2, + kernel_size=(3, 3), + strides=(1, 1), + padding='same', + **dark_conv_args) + + self._convlayer4 = ConvBN( + filters=self._filters, + kernel_size=(1, 1), + strides=(1, 1), + padding='same', + **dark_conv_args) + + if self._downsample: + self._maxpool = tf.keras.layers.MaxPool2D( + pool_size=2, strides=2, padding='same', data_format=None) + + 
super().build(input_shape) + + def call(self, inputs, training=None): + x1 = self._convlayer1(inputs) + x1_group = tf.split(x1, self._groups, axis=-1)[self._group_id] + x2 = self._convlayer2(x1_group) # grouping + x3 = self._convlayer3(x2) + x4 = tf.concat([x3, x2], axis=-1) # csp partial using grouping + x5 = self._convlayer4(x4) + x = tf.concat([x1, x5], axis=-1) # csp connect + if self._downsample: + x = self._maxpool(x) + return x, x5 + + +class CSPRoute(tf.keras.layers.Layer): + """CSPRoute block. + + Downsampling layer that takes the place of the downsampling done in residual + networks. This is the first of the 2 layers needed to convert any residual + network model to a CSPNet. At the start of a new level, this CSPRoute layer + creates a learned identity that acts as a cross stage connection and is used + to inform the inputs to the next stage. It is called cross stage partial + because the number of filters required in every intermediate residual layer + is reduced by half. The sister layer (CSPConnect) takes the partial generated + by this layer and concatenates it with the output of the final residual layer + in the stack to create a full feature-level output. This concatenation merges + the partial blocks of 2 levels as input to the next, which makes the gradients + of each level more distinct and reduces the number of parameters required by + each level by 50% while keeping accuracy consistent. + + Cross Stage Partial networks (CSPNets) were proposed in: + [1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, + Ping-Yang Chen, Jun-Wei Hsieh + CSPNet: A New Backbone that can Enhance Learning Capability of CNN.
+ arXiv:1911.11929 + """ + + def __init__(self, + filters, + filter_scale=2, + activation='mish', + kernel_initializer='VarianceScaling', + bias_initializer='zeros', + bias_regularizer=None, + kernel_regularizer=None, + dilation_rate=1, + use_bn=True, + use_sync_bn=False, + use_separable_conv=False, + norm_momentum=0.99, + norm_epsilon=0.001, + downsample=True, + leaky_alpha=0.1, + **kwargs): + """CSPRoute layer initializer. + + Args: + filters: integer for output depth, or the number of features to learn. + filter_scale: integer divisor for the number of filters in the partial + feature stack (filters // filter_scale). + activation: string for activation function to use in layer. + kernel_initializer: string to indicate which function to use to + initialize weights. + bias_initializer: string to indicate which function to use to initialize + bias. + bias_regularizer: string to indicate which function to use to regularize + bias. + kernel_regularizer: string to indicate which function to use to + regularize weights. + dilation_rate: dilation rate for conv layers. + use_bn: boolean for whether to use batch normalization. + use_sync_bn: boolean for whether to sync batch normalization statistics + of all batch norm layers to the model's global statistics + (across all input batches). + use_separable_conv: `bool` whether to use separable convs. + norm_momentum: float for momentum to use for batch normalization. + norm_epsilon: float for batch normalization epsilon. + downsample: boolean for whether to downsample the input. + leaky_alpha: `float`, for leaky alpha value. + **kwargs: Keyword Arguments.
+ """ + + super().__init__(**kwargs) + # layer params + self._filters = filters + self._filter_scale = filter_scale + self._activation = activation + + # convolution params + self._kernel_initializer = kernel_initializer + self._bias_initializer = bias_initializer + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + self._dilation_rate = dilation_rate + self._use_bn = use_bn + self._use_sync_bn = use_sync_bn + self._use_separable_conv = use_separable_conv + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._downsample = downsample + self._leaky_alpha = leaky_alpha + + def build(self, input_shape): + dark_conv_args = { + 'kernel_initializer': self._kernel_initializer, + 'bias_initializer': self._bias_initializer, + 'bias_regularizer': self._bias_regularizer, + 'use_bn': self._use_bn, + 'use_sync_bn': self._use_sync_bn, + 'use_separable_conv': self._use_separable_conv, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'activation': self._activation, + 'kernel_regularizer': self._kernel_regularizer, + 'leaky_alpha': self._leaky_alpha, + } + if self._downsample: + if self._dilation_rate > 1: + dilation_rate = 1 + if self._dilation_rate // 2 > 0: + dilation_rate = self._dilation_rate // 2 + down_stride = 1 + else: + dilation_rate = 1 + down_stride = 2 + + self._conv1 = ConvBN( + filters=self._filters, + kernel_size=(3, 3), + strides=down_stride, + dilation_rate=dilation_rate, + **dark_conv_args) + + self._conv2 = ConvBN( + filters=self._filters // self._filter_scale, + kernel_size=(1, 1), + strides=(1, 1), + **dark_conv_args) + + self._conv3 = ConvBN( + filters=self._filters // self._filter_scale, + kernel_size=(1, 1), + strides=(1, 1), + **dark_conv_args) + + def call(self, inputs, training=None): + if self._downsample: + inputs = self._conv1(inputs) + y = self._conv2(inputs) + x = self._conv3(inputs) + return (x, y) + + +class CSPConnect(tf.keras.layers.Layer): +
"""CSPConnect block. + + Sister layer to the CSPRoute layer. Merges the partial feature stacks + generated by the CSPRoute layer and the final output of the + residual stack. Suggested in the CSPNet paper. + Cross Stage Partial networks (CSPNets) were proposed in: + [1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, + Ping-Yang Chen, Jun-Wei Hsieh + CSPNet: A New Backbone that can Enhance Learning Capability of CNN. + arXiv:1911.11929 + """ + + def __init__(self, + filters, + filter_scale=2, + drop_final=False, + drop_first=False, + activation='mish', + kernel_size=(1, 1), + kernel_initializer='VarianceScaling', + bias_initializer='zeros', + bias_regularizer=None, + kernel_regularizer=None, + dilation_rate=1, + use_bn=True, + use_sync_bn=False, + use_separable_conv=False, + norm_momentum=0.99, + norm_epsilon=0.001, + leaky_alpha=0.1, + **kwargs): + """Initializer for CSPConnect block. + + Args: + filters: integer for output depth, or the number of features to learn. + filter_scale: integer divisor for the number of filters in the partial + feature stack (filters // filter_scale). + drop_final: `bool`, whether to drop final conv layer. + drop_first: `bool`, whether to drop first conv layer. + activation: string for activation function to use in layer. + kernel_size: `Tuple`, kernel size for conv layers. + kernel_initializer: string to indicate which function to use to initialize + weights. + bias_initializer: string to indicate which function to use to initialize + bias. + bias_regularizer: string to indicate which function to use to regularize + bias. + kernel_regularizer: string to indicate which function to use to + regularize weights. + dilation_rate: `int`, dilation rate for conv layers. + use_bn: boolean for whether to use batch normalization. + use_sync_bn: boolean for whether to sync batch normalization statistics + of all batch norm layers to the model's global + statistics (across all input batches).
+ use_separable_conv: `bool` whether to use separable convs. + norm_momentum: float for momentum to use for batch normalization. + norm_epsilon: float for batch normalization epsilon. + leaky_alpha: `float`, for leaky alpha value. + **kwargs: Keyword Arguments. + """ + + super().__init__(**kwargs) + # layer params + self._filters = filters + self._filter_scale = filter_scale + self._activation = activation + + # convolution params + self._kernel_size = kernel_size + self._kernel_initializer = kernel_initializer + self._bias_initializer = bias_initializer + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + self._use_bn = use_bn + self._use_sync_bn = use_sync_bn + self._use_separable_conv = use_separable_conv + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._drop_final = drop_final + self._drop_first = drop_first + self._leaky_alpha = leaky_alpha + + def build(self, input_shape): + dark_conv_args = { + 'kernel_initializer': self._kernel_initializer, + 'bias_initializer': self._bias_initializer, + 'bias_regularizer': self._bias_regularizer, + 'use_bn': self._use_bn, + 'use_sync_bn': self._use_sync_bn, + 'use_separable_conv': self._use_separable_conv, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'activation': self._activation, + 'kernel_regularizer': self._kernel_regularizer, + 'leaky_alpha': self._leaky_alpha, + } + if not self._drop_first: + self._conv1 = ConvBN( + filters=self._filters // self._filter_scale, + kernel_size=self._kernel_size, + strides=(1, 1), + **dark_conv_args) + self._concat = tf.keras.layers.Concatenate(axis=-1) + + if not self._drop_final: + self._conv2 = ConvBN( + filters=self._filters, + kernel_size=(1, 1), + strides=(1, 1), + **dark_conv_args) + + def call(self, inputs, training=None): + x_prev, x_csp = inputs + if not self._drop_first: + x_prev = self._conv1(x_prev) + x = self._concat([x_prev, x_csp]) + + # skipped if drop_final is True +
if not self._drop_final: + x = self._conv2(x) + return x + + +class CSPStack(tf.keras.layers.Layer): + """CSP Stack layer. + + CSP full stack that combines the route and the connect layers, for when you + just want to quickly wrap an existing callable or list of layers to + make it a cross stage partial. Added for ease of use. You should be able + to wrap any layer stack with a CSP, independent of whether it belongs + to the Darknet family. If filter_scale = 2, then the blocks in the stack + passed into the CSP stack should also have filters = filters / filter_scale. + Cross Stage Partial networks (CSPNets) were proposed in: + + [1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, + Ping-Yang Chen, Jun-Wei Hsieh + CSPNet: A New Backbone that can Enhance Learning Capability of CNN. + arXiv:1911.11929 + """ + + def __init__(self, + filters, + model_to_wrap=None, + filter_scale=2, + activation='mish', + kernel_initializer='VarianceScaling', + bias_initializer='zeros', + bias_regularizer=None, + kernel_regularizer=None, + downsample=True, + use_bn=True, + use_sync_bn=False, + use_separable_conv=False, + norm_momentum=0.99, + norm_epsilon=0.001, + **kwargs): + """CSPStack layer initializer. + + Args: + filters: integer for output depth, or the number of filters for the conv + layers. + model_to_wrap: callable Model or a list of callable objects that will + process the output of CSPRoute and be input into CSPConnect. A list will + be called sequentially. + filter_scale: integer divisor for the number of filters in the partial + feature stack (filters // filter_scale). + activation: string for activation function to use in layer. + kernel_initializer: string to indicate which function to use to initialize + weights. + bias_initializer: string to indicate which function to use to initialize + bias. + bias_regularizer: string to indicate which function to use to regularize + bias. + kernel_regularizer: string to indicate which function to use to + regularize weights. + downsample: boolean for whether to downsample the input.
+ use_bn: boolean for whether to use batch normalization. + use_sync_bn: boolean for whether to sync batch normalization statistics of + all batch norm layers to the model's global statistics (across all input + batches). + use_separable_conv: `bool` whether to use separable convs. + norm_momentum: float for momentum to use for batch normalization. + norm_epsilon: float for batch normalization epsilon. + **kwargs: Keyword Arguments. + + Raises: + TypeError: if model_to_wrap is not a callable or a list of callables. + """ + + super().__init__(**kwargs) + # layer params + self._filters = filters + self._filter_scale = filter_scale + self._activation = activation + self._downsample = downsample + + # convolution params + self._kernel_initializer = kernel_initializer + self._bias_initializer = bias_initializer + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + self._use_bn = use_bn + self._use_sync_bn = use_sync_bn + self._use_separable_conv = use_separable_conv + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + + if model_to_wrap is None: + self._model_to_wrap = [] + elif isinstance(model_to_wrap, Callable): + self._model_to_wrap = [model_to_wrap] + elif isinstance(model_to_wrap, List): + self._model_to_wrap = model_to_wrap + else: + raise TypeError( + 'the input to the CSPStack must be a list of layers that we can ' + + 'iterate through, or a callable') + + def build(self, input_shape): + dark_conv_args = { + 'filters': self._filters, + 'filter_scale': self._filter_scale, + 'activation': self._activation, + 'kernel_initializer': self._kernel_initializer, + 'bias_initializer': self._bias_initializer, + 'bias_regularizer': self._bias_regularizer, + 'use_bn': self._use_bn, + 'use_sync_bn': self._use_sync_bn, + 'use_separable_conv': self._use_separable_conv, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'kernel_regularizer': self._kernel_regularizer, + } + self._route =
CSPRoute(downsample=self._downsample, **dark_conv_args) + self._connect = CSPConnect(**dark_conv_args) + + def call(self, inputs, training=None): + x, x_route = self._route(inputs) + for layer in self._model_to_wrap: + x = layer(x) + x = self._connect([x, x_route]) + return x + + +class PathAggregationBlock(tf.keras.layers.Layer): + """Path Aggregation block.""" + + def __init__(self, + filters=1, + drop_final=True, + kernel_initializer='VarianceScaling', + bias_initializer='zeros', + bias_regularizer=None, + kernel_regularizer=None, + use_bn=True, + use_sync_bn=False, + use_separable_conv=False, + inverted=False, + norm_momentum=0.99, + norm_epsilon=0.001, + activation='leaky', + leaky_alpha=0.1, + downsample=False, + upsample=False, + upsample_size=2, + **kwargs): + """Initializer for path aggregation block. + + Args: + filters: integer for output depth, or the number of features to learn. + drop_final: if True, do not create the last convolution block. + kernel_initializer: string to indicate which function to use to initialize + weights. + bias_initializer: string to indicate which function to use to initialize + bias. + bias_regularizer: string to indicate which function to use to regularize + bias. + kernel_regularizer: string to indicate which function to use to + regularize weights. + use_bn: boolean for whether to use batch normalization. + use_sync_bn: boolean for whether to sync batch normalization statistics + of all batch norm layers to the model's global statistics + (across all input batches). + use_separable_conv: `bool` whether to use separable convs. + inverted: boolean for inverting the order of the convolutions. + norm_momentum: float for momentum to use for batch normalization. + norm_epsilon: float for batch normalization epsilon. + activation: string or None for activation function to use in layer, + if None activation is replaced by linear. + leaky_alpha: float to use as alpha if activation function is leaky.
+ downsample: `bool` for whether to downsample and merge. + upsample: `bool` for whether to upsample and merge. + upsample_size: `int` how much to upsample in order to match shapes. + **kwargs: Keyword Arguments. + """ + + # Darkconv params + self._filters = filters + self._kernel_initializer = kernel_initializer + self._bias_initializer = bias_initializer + self._bias_regularizer = bias_regularizer + self._kernel_regularizer = kernel_regularizer + self._use_bn = use_bn + self._use_sync_bn = use_sync_bn + self._use_separable_conv = use_separable_conv + + # Normalization params + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + + # Activation params + self._conv_activation = activation + self._leaky_alpha = leaky_alpha + self._downsample = downsample + self._upsample = upsample + self._upsample_size = upsample_size + self._drop_final = drop_final + + # Block params + self._inverted = inverted + + super().__init__(**kwargs) + + def _build_regular(self, input_shape, kwargs): + if self._downsample: + self._conv = ConvBN( + filters=self._filters, + kernel_size=(3, 3), + strides=(2, 2), + padding='same', + **kwargs) + else: + self._conv = ConvBN( + filters=self._filters, + kernel_size=(1, 1), + strides=(1, 1), + padding='same', + **kwargs) + + if not self._drop_final: + self._conv_concat = ConvBN( + filters=self._filters, + kernel_size=(1, 1), + strides=(1, 1), + padding='same', + **kwargs) + + def _build_reversed(self, input_shape, kwargs): + if self._downsample: + self._conv_prev = ConvBN( + filters=self._filters, + kernel_size=(3, 3), + strides=(2, 2), + padding='same', + **kwargs) + else: + self._conv_prev = ConvBN( + filters=self._filters, + kernel_size=(1, 1), + strides=(1, 1), + padding='same', + **kwargs) + + self._conv_route = ConvBN( + filters=self._filters, + kernel_size=(1, 1), + strides=(1, 1), + padding='same', + **kwargs) + + if not self._drop_final: + self._conv_sync = ConvBN( + filters=self._filters, + kernel_size=(1, 1), + strides=(1,
1), + padding='same', + **kwargs) + + def build(self, input_shape): + dark_conv_args = { + 'kernel_initializer': self._kernel_initializer, + 'bias_initializer': self._bias_initializer, + 'bias_regularizer': self._bias_regularizer, + 'use_bn': self._use_bn, + 'use_sync_bn': self._use_sync_bn, + 'use_separable_conv': self._use_separable_conv, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'activation': self._conv_activation, + 'kernel_regularizer': self._kernel_regularizer, + 'leaky_alpha': self._leaky_alpha, + } + + if self._inverted: + self._build_reversed(input_shape, dark_conv_args) + else: + self._build_regular(input_shape, dark_conv_args) + + self._concat = tf.keras.layers.Concatenate() + super().build(input_shape) + + def _call_regular(self, inputs, training=None): + input_to_convolve, input_to_concat = inputs + x_prev = self._conv(input_to_convolve) + if self._upsample: + x_prev = spatial_transform_ops.nearest_upsampling(x_prev, + self._upsample_size) + x = self._concat([x_prev, input_to_concat]) + + # used in CSP conversion + if not self._drop_final: + x = self._conv_concat(x) + return x_prev, x + + def _call_reversed(self, inputs, training=None): + x_route, x_prev = inputs + x_prev = self._conv_prev(x_prev) + if self._upsample: + x_prev = spatial_transform_ops.nearest_upsampling(x_prev, + self._upsample_size) + x_route = self._conv_route(x_route) + x = self._concat([x_route, x_prev]) + if not self._drop_final: + x = self._conv_sync(x) + return x_prev, x + + def call(self, inputs, training=None): + # done this way to prevent confusion in AutoGraph + if self._inverted: + return self._call_reversed(inputs, training=training) + else: + return self._call_regular(inputs, training=training) + + +class SPP(tf.keras.layers.Layer): + """Spatial Pyramid Pooling. + + A non-aggregated SPP layer that uses max pooling.
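+
+  Usage sketch (assumes channels_last inputs): every pool uses stride 1 and
+  'same' padding, so the spatial size is unchanged and the output depth is
+  (len(sizes) + 1) * input depth.
+
+    x = tf.zeros([1, 16, 16, 8])
+    y = SPP(sizes=[5, 9, 13])(x)
+    # y: (1, 16, 16, 32), three pooled copies plus the input, 4 * 8 channels.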
+ """ + + def __init__(self, sizes, **kwargs): + if not sizes: + raise ValueError( + 'At least one maxpool size must be specified in the SPP block.') + self._sizes = list(reversed(sizes)) + super().__init__(**kwargs) + + def build(self, input_shape): + maxpools = [] + for size in self._sizes: + maxpools.append( + tf.keras.layers.MaxPool2D( + pool_size=(size, size), + strides=(1, 1), + padding='same', + data_format=None)) + self._maxpools = maxpools + super().build(input_shape) + + def call(self, inputs, training=None): + outputs = [] + for maxpool in self._maxpools: + outputs.append(maxpool(inputs)) + outputs.append(inputs) + concat_output = tf.keras.layers.concatenate(outputs) + return concat_output + + def get_config(self): + # store sizes in their original order so from_config rebuilds the same layer + layer_config = {'sizes': list(reversed(self._sizes))} + layer_config.update(super().get_config()) + return layer_config + + +class SAM(tf.keras.layers.Layer): + """Spatial Attention Model. + + [1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon + CBAM: Convolutional Block Attention Module.
arXiv:1807.06521 + + implementation of the Spatial Attention Model (SAM) + """ + + def __init__(self, + use_pooling=False, + filter_match=False, + filters=1, + kernel_size=(1, 1), + strides=(1, 1), + padding='same', + dilation_rate=(1, 1), + kernel_initializer='VarianceScaling', + bias_initializer='zeros', + bias_regularizer=None, + kernel_regularizer=None, + use_bn=True, + use_sync_bn=True, + use_separable_conv=False, + norm_momentum=0.99, + norm_epsilon=0.001, + activation='sigmoid', + output_activation=None, + leaky_alpha=0.1, + **kwargs): + + # use_pooling + self._use_pooling = use_pooling + self._filters = filters + self._output_activation = output_activation + self._leaky_alpha = leaky_alpha + + self.dark_conv_args = { + 'kernel_size': kernel_size, + 'strides': strides, + 'padding': padding, + 'dilation_rate': dilation_rate, + 'kernel_initializer': kernel_initializer, + 'bias_initializer': bias_initializer, + 'bias_regularizer': bias_regularizer, + 'use_bn': use_bn, + 'use_sync_bn': use_sync_bn, + 'use_separable_conv': use_separable_conv, + 'norm_momentum': norm_momentum, + 'norm_epsilon': norm_epsilon, + 'activation': activation, + 'kernel_regularizer': kernel_regularizer, + 'leaky_alpha': leaky_alpha + } + + super().__init__(**kwargs) + + def build(self, input_shape): + if self._filters == -1: + self._filters = input_shape[-1] + self._conv = ConvBN(filters=self._filters, **self.dark_conv_args) + if self._output_activation == 'leaky': + self._activation_fn = tf.keras.layers.LeakyReLU(alpha=self._leaky_alpha) + elif self._output_activation == 'mish': + self._activation_fn = lambda x: x * tf.math.tanh(tf.math.softplus(x)) + else: + self._activation_fn = tf_utils.get_activation(self._output_activation) + + def call(self, inputs, training=None): + if self._use_pooling: + depth_max = tf.reduce_max(inputs, axis=-1, keepdims=True) + depth_avg = tf.reduce_mean(inputs, axis=-1, keepdims=True) + input_maps = tf.concat([depth_avg, depth_max], axis=-1) + else: + 
input_maps = inputs + + attention_mask = self._conv(input_maps) + return self._activation_fn(inputs * attention_mask) + + +class CAM(tf.keras.layers.Layer): + """Channel Attention Model. + + [1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon + CBAM: Convolutional Block Attention Module. arXiv:1807.06521 + + Implementation of the Channel Attention Model (CAM) + """ + + def __init__(self, + reduction_ratio=1.0, + kernel_initializer='VarianceScaling', + bias_initializer='zeros', + bias_regularizer=None, + kernel_regularizer=None, + use_bn=False, + use_sync_bn=False, + use_bias=False, + norm_momentum=0.99, + norm_epsilon=0.001, + mlp_activation='linear', + activation='sigmoid', + leaky_alpha=0.1, + **kwargs): + + self._reduction_ratio = reduction_ratio + + # use_pooling + if use_sync_bn: + self._bn = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._bn = tf.keras.layers.BatchNormalization + + if not use_bn: + self._bn = Identity + self._bn_args = {} + else: + self._bn_args = { + 'momentum': norm_momentum, + 'epsilon': norm_epsilon, + } + + self._mlp_args = { + 'use_bias': use_bias, + 'kernel_initializer': kernel_initializer, + 'bias_initializer': bias_initializer, + 'bias_regularizer': bias_regularizer, + 'activation': mlp_activation, + 'kernel_regularizer': kernel_regularizer, + } + + self._leaky_alpha = leaky_alpha + self._activation = activation + + super().__init__(**kwargs) + + def build(self, input_shape): + self._filters = input_shape[-1] + + self._mlp = tf.keras.Sequential([ + tf.keras.layers.Dense(self._filters, **self._mlp_args), + self._bn(**self._bn_args), + tf.keras.layers.Dense( + int(self._filters * self._reduction_ratio), **self._mlp_args), + self._bn(**self._bn_args), + tf.keras.layers.Dense(self._filters, **self._mlp_args), + self._bn(**self._bn_args), + ]) + + if self._activation == 'leaky': + self._activation_fn = tf.keras.layers.LeakyReLU(alpha=self._leaky_alpha) + elif self._activation == 'mish': + self._activation_fn = 
lambda x: x * tf.math.tanh(tf.math.softplus(x)) + else: + self._activation_fn = tf_utils.get_activation(self._activation) + + def call(self, inputs, training=None): + depth_max = self._mlp(tf.reduce_max(inputs, axis=(1, 2))) + depth_avg = self._mlp(tf.reduce_mean(inputs, axis=(1, 2))) + channel_mask = self._activation_fn(depth_avg + depth_max) + + channel_mask = tf.expand_dims(channel_mask, axis=1) + attention_mask = tf.expand_dims(channel_mask, axis=1) + + return inputs * attention_mask + + +class CBAM(tf.keras.layers.Layer): + """Convolutional Block Attention Module. + + [1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon + CBAM: Convolutional Block Attention Module. arXiv:1807.06521 + + implementation of the Convolution Block Attention Module (CBAM) + """ + + def __init__(self, + use_pooling=False, + filters=1, + reduction_ratio=1.0, + kernel_size=(1, 1), + strides=(1, 1), + padding='same', + dilation_rate=(1, 1), + kernel_initializer='VarianceScaling', + bias_initializer='zeros', + bias_regularizer=None, + kernel_regularizer=None, + use_bn=True, + use_sync_bn=False, + use_separable_conv=False, + norm_momentum=0.99, + norm_epsilon=0.001, + mlp_activation=None, + activation='sigmoid', + leaky_alpha=0.1, + **kwargs): + + # use_pooling + + self._sam_args = { + 'use_pooling': use_pooling, + 'filters': filters, + 'kernel_size': kernel_size, + 'strides': strides, + 'padding': padding, + 'dilation_rate': dilation_rate, + 'use_separable_conv': use_separable_conv, + } + + self._cam_args = { + 'reduction_ratio': reduction_ratio, + 'mlp_activation': mlp_activation + } + + self._common_args = { + 'kernel_initializer': kernel_initializer, + 'bias_initializer': bias_initializer, + 'bias_regularizer': bias_regularizer, + 'use_bn': use_bn, + 'use_sync_bn': use_sync_bn, + 'norm_momentum': norm_momentum, + 'norm_epsilon': norm_epsilon, + 'activation': activation, + 'kernel_regularizer': kernel_regularizer, + 'leaky_alpha': leaky_alpha + } + + 
self._cam_args.update(self._common_args) + self._sam_args.update(self._common_args) + super().__init__(**kwargs) + + def build(self, input_shape): + self._cam = CAM(**self._cam_args) + self._sam = SAM(**self._sam_args) + + def call(self, inputs, training=None): + return self._sam(self._cam(inputs)) + + +class DarkRouteProcess(tf.keras.layers.Layer): + """Dark Route Process block. + + Processes darknet outputs and connects the backbone to the head generically. + Abstracts the repetition of DarkConv objects that is common in YOLO. + + It is used like the following: + + x = ConvBN(1024, (3, 3), (1, 1))(x) + proc = DarkRouteProcess(filters=1024, + repetitions=3, + insert_spp=False)(x) + """ + + def __init__(self, + filters=2, + repetitions=2, + insert_spp=False, + insert_sam=False, + insert_cbam=False, + csp_stack=0, + csp_scale=2, + kernel_initializer='VarianceScaling', + bias_initializer='zeros', + bias_regularizer=None, + kernel_regularizer=None, + use_sync_bn=False, + use_separable_conv=False, + norm_momentum=0.99, + norm_epsilon=0.001, + block_invert=False, + activation='leaky', + leaky_alpha=0.1, + spp_keys=None, + **kwargs): + """DarkRouteProcess initializer. + + Args: + filters: the number of filters to be used in all subsequent layers; + filters should be the depth of the tensor input into this layer, + as no downsampling can be done within this layer object. + repetitions: number of times to repeat the processing nodes. + for tiny: 1 repetition, no spp allowed. + for spp: insert_spp = True, and allow for 6 repetitions. + for regular: insert_spp = False, and allow for 6 repetitions. + insert_spp: bool if true add the spatial pyramid pooling layer. + insert_sam: bool if true add spatial attention module to path. + insert_cbam: bool if true add convolutional block attention + module to path (currently unused by this layer). + csp_stack: int for the number of sequential layers, starting from 0, + that you would like to convert into the Cross Stage + Partial (CSP) type.
+ csp_scale: int for how much to downscale the number of filters + only for the csp layers in the csp section of the processing + path. A value of 2 indicates that each layer that is in the CSP + stack will have filters = filters/2. + kernel_initializer: method to use to initialize kernel weights. + bias_initializer: method to use to initialize the bias of the conv + layers. + bias_regularizer: string to indicate which function to use to regularize + bias. + kernel_regularizer: string to indicate which function to use to + regularize weights. + use_sync_bn: bool if true use the sync batch normalization. + use_separable_conv: `bool` whether to use separable convs. + norm_momentum: batch norm parameter; see the TensorFlow documentation. + norm_epsilon: batch norm parameter; see the TensorFlow documentation. + block_invert: bool used for switching between the even and odd + repetitions of layers. By default the repetition starts with a + 1x1 conv with filters/2, followed by a 3x3 conv with filters, + alternating so that each 3x3 is preceded by a 1x1 squeeze. + block_invert swaps the 1x1/2 3x3/1 ordering to a 3x3/1 1x1/2 + ordering; it is forced on when the total repetitions equal 1. + All other parameters keep their effects. + activation: activation function to use in processing. + leaky_alpha: if using the leaky activation function, the alpha to use in + processing the ReLU input. + spp_keys: List[int] of the sampling levels to be applied by + the Spatial Pyramid Pooling Layer. By default it is + [5, 9, 13] indicating a 5x5 pooling followed by 9x9 + followed by 13x13 then followed by the standard concatenation + and convolution. + **kwargs: Keyword Arguments.
+ """ + + super().__init__(**kwargs) + # darkconv params + self._filters = filters + self._use_sync_bn = use_sync_bn + self._use_separable_conv = use_separable_conv + self._kernel_initializer = kernel_initializer + self._bias_initializer = bias_initializer + self._bias_regularizer = bias_regularizer + self._kernel_regularizer = kernel_regularizer + + # normalization params + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + + # activation params + self._activation = activation + self._leaky_alpha = leaky_alpha + + repetitions += (2 * int(insert_spp)) + if repetitions == 1: + block_invert = True + + self._repetitions = repetitions + self.layer_list, self.outputs = self._get_base_layers() + + if csp_stack > 0: + self._csp_scale = csp_scale + csp_stack += (2 * int(insert_spp)) + self._csp_filters = lambda x: x // csp_scale + self._convert_csp(self.layer_list, self.outputs, csp_stack) + block_invert = False + + self._csp_stack = csp_stack + + if block_invert: + self._conv1_filters = lambda x: x + self._conv2_filters = lambda x: x // 2 + self._conv1_kernel = (3, 3) + self._conv2_kernel = (1, 1) + else: + self._conv1_filters = lambda x: x // 2 + self._conv2_filters = lambda x: x + self._conv1_kernel = (1, 1) + self._conv2_kernel = (3, 3) + + # inserting SPP always adds to the total number of layers, never replaces + if insert_spp: + self._spp_keys = spp_keys if spp_keys is not None else [5, 9, 13] + self.layer_list = self._insert_spp(self.layer_list) + + if repetitions > 1: + self.outputs[-2] = True + + if insert_sam: + self.layer_list = self._insert_sam(self.layer_list, self.outputs) + self._repetitions += 1 + self.outputs[-1] = True + + def _get_base_layers(self): + layer_list = [] + outputs = [] + for i in range(self._repetitions): + layers = ['conv1'] * ((i + 1) % 2) + ['conv2'] * (i % 2) + layer_list.extend(layers) + outputs = [False] + outputs + return layer_list, outputs + + def _insert_spp(self, layer_list): + if len(layer_list) <= 3: +
layer_list[1] = 'spp' + else: + layer_list[3] = 'spp' + return layer_list + + def _convert_csp(self, layer_list, outputs, csp_stack_size): + layer_list[0] = 'csp_route' + layer_list.insert(csp_stack_size - 1, 'csp_connect') + outputs.insert(csp_stack_size - 1, False) + return layer_list, outputs + + def _insert_sam(self, layer_list, outputs): + if len(layer_list) >= 2 and layer_list[-2] != 'spp': + layer_list.insert(-2, 'sam') + outputs.insert(-1, True) + else: + layer_list.insert(-1, 'sam') + outputs.insert(-1, False) + return layer_list + + def _conv1(self, filters, kwargs, csp=False): + if csp: + filters_ = self._csp_filters + else: + filters_ = self._conv1_filters + + x1 = ConvBN( + filters=filters_(filters), + kernel_size=self._conv1_kernel, + strides=(1, 1), + padding='same', + use_bn=True, + **kwargs) + return x1 + + def _conv2(self, filters, kwargs, csp=False): + if csp: + filters_ = self._csp_filters + else: + filters_ = self._conv2_filters + + x1 = ConvBN( + filters=filters_(filters), + kernel_size=self._conv2_kernel, + strides=(1, 1), + padding='same', + use_bn=True, + **kwargs) + return x1 + + def _csp_route(self, filters, kwargs): + x1 = CSPRoute( + filters=filters, + filter_scale=self._csp_scale, + downsample=False, + **kwargs) + return x1 + + def _csp_connect(self, filters, kwargs): + x1 = CSPConnect(filters=filters, drop_final=True, drop_first=True, **kwargs) + return x1 + + def _spp(self, filters, kwargs): + x1 = SPP(self._spp_keys) + return x1 + + def _sam(self, filters, kwargs): + x1 = SAM(filters=-1, use_pooling=False, use_bn=True, **kwargs) + return x1 + + def build(self, input_shape): + dark_conv_args = { + 'activation': self._activation, + 'kernel_initializer': self._kernel_initializer, + 'bias_initializer': self._bias_initializer, + 'bias_regularizer': self._bias_regularizer, + 'use_sync_bn': self._use_sync_bn, + 'use_separable_conv': self._use_separable_conv, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 
'kernel_regularizer': self._kernel_regularizer, + 'leaky_alpha': self._leaky_alpha, + } + + csp = False + self.layers = [] + for layer in self.layer_list: + if layer == 'csp_route': + self.layers.append(self._csp_route(self._filters, dark_conv_args)) + csp = True + elif layer == 'csp_connect': + self.layers.append(self._csp_connect(self._filters, dark_conv_args)) + csp = False + elif layer == 'conv1': + self.layers.append(self._conv1(self._filters, dark_conv_args, csp=csp)) + elif layer == 'conv2': + self.layers.append(self._conv2(self._filters, dark_conv_args, csp=csp)) + elif layer == 'spp': + self.layers.append(self._spp(self._filters, dark_conv_args)) + elif layer == 'sam': + self.layers.append(self._sam(-1, dark_conv_args)) + + self._lim = len(self.layers) + super().build(input_shape) + + def _call_regular(self, inputs, training=None): + # check efficiency + x = inputs + x_prev = x + output_prev = True + + for (layer, output) in zip(self.layers, self.outputs): + if output_prev: + x_prev = x + x = layer(x) + output_prev = output + return x_prev, x + + def _call_csp(self, inputs, training=None): + # check efficiency + x = inputs + x_prev = x + output_prev = True + x_route = None + + for i, (layer, output) in enumerate(zip(self.layers, self.outputs)): + if output_prev: + x_prev = x + if i == 0: + x, x_route = layer(x) + elif i == self._csp_stack - 1: + x = layer([x, x_route]) + else: + x = layer(x) + output_prev = output + return x_prev, x + + def call(self, inputs, training=None): + if self._csp_stack > 0: + return self._call_csp(inputs, training=training) + else: + return self._call_regular(inputs) + + +class Reorg(tf.keras.layers.Layer): + """Splits a high resolution image into 4 lower resolution images. + + Used in YOLOR to process very high resolution inputs efficiently. + For example, an input image of [1280, 1280, 3] will become [640, 640, 12]; + the images are sampled in such a way that all of the spatial information is + retained.
+ """ + + def call(self, x, training=None): + return tf.concat([ + x[..., ::2, ::2, :], x[..., 1::2, ::2, :], x[..., ::2, 1::2, :], + x[..., 1::2, 1::2, :] + ], + axis=-1) diff --git a/official/projects/yolo/modeling/layers/nn_blocks_test.py b/official/projects/yolo/modeling/layers/nn_blocks_test.py new file mode 100644 index 0000000000000000000000000000000000000000..98b001612dfdd90c0f7b422643fe429c6bd2d569 --- /dev/null +++ b/official/projects/yolo/modeling/layers/nn_blocks_test.py @@ -0,0 +1,305 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from official.projects.yolo.modeling.layers import nn_blocks + + +class CSPConnectTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.named_parameters(('same', 224, 224, 64, 1), + ('downsample', 224, 224, 64, 2)) + def test_pass_through(self, width, height, filters, mod): + x = tf.keras.Input(shape=(width, height, filters)) + test_layer = nn_blocks.CSPRoute(filters=filters, filter_scale=mod) + test_layer2 = nn_blocks.CSPConnect(filters=filters, filter_scale=mod) + outx, px = test_layer(x) + outx = test_layer2([outx, px]) + print(outx) + print(outx.shape.as_list()) + self.assertAllEqual( + outx.shape.as_list(), + [None, np.ceil(width // 2), + np.ceil(height // 2), (filters)]) + + @parameterized.named_parameters(('same', 224, 224, 64, 1), + ('downsample', 224, 224, 128, 2)) + def test_gradient_pass_though(self, filters, width, height, mod): + loss = tf.keras.losses.MeanSquaredError() + optimizer = tf.keras.optimizers.SGD() + test_layer = nn_blocks.CSPRoute(filters, filter_scale=mod) + path_layer = nn_blocks.CSPConnect(filters, filter_scale=mod) + + init = tf.random_normal_initializer() + x = tf.Variable( + initial_value=init(shape=(1, width, height, filters), dtype=tf.float32)) + y = tf.Variable( + initial_value=init( + shape=(1, int(np.ceil(width // 2)), int(np.ceil(height // 2)), + filters), + dtype=tf.float32)) + + with tf.GradientTape() as tape: + x_hat, x_prev = test_layer(x) + x_hat = path_layer([x_hat, x_prev]) + grad_loss = loss(x_hat, y) + grad = tape.gradient(grad_loss, test_layer.trainable_variables) + optimizer.apply_gradients(zip(grad, test_layer.trainable_variables)) + + self.assertNotIn(None, grad) + + +class CSPRouteTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.named_parameters(('same', 224, 224, 64, 1), + ('downsample', 224, 224, 64, 2)) + def test_pass_through(self, width, height, filters, mod): + x = 
tf.keras.Input(shape=(width, height, filters)) + test_layer = nn_blocks.CSPRoute(filters=filters, filter_scale=mod) + outx, _ = test_layer(x) + print(outx) + print(outx.shape.as_list()) + self.assertAllEqual( + outx.shape.as_list(), + [None, np.ceil(width // 2), + np.ceil(height // 2), (filters / mod)]) + + @parameterized.named_parameters(('same', 224, 224, 64, 1), + ('downsample', 224, 224, 128, 2)) + def test_gradient_pass_though(self, filters, width, height, mod): + loss = tf.keras.losses.MeanSquaredError() + optimizer = tf.keras.optimizers.SGD() + test_layer = nn_blocks.CSPRoute(filters, filter_scale=mod) + path_layer = nn_blocks.CSPConnect(filters, filter_scale=mod) + + init = tf.random_normal_initializer() + x = tf.Variable( + initial_value=init(shape=(1, width, height, filters), dtype=tf.float32)) + y = tf.Variable( + initial_value=init( + shape=(1, int(np.ceil(width // 2)), int(np.ceil(height // 2)), + filters), + dtype=tf.float32)) + + with tf.GradientTape() as tape: + x_hat, x_prev = test_layer(x) + x_hat = path_layer([x_hat, x_prev]) + grad_loss = loss(x_hat, y) + grad = tape.gradient(grad_loss, test_layer.trainable_variables) + optimizer.apply_gradients(zip(grad, test_layer.trainable_variables)) + + self.assertNotIn(None, grad) + + +class ConvBNTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.named_parameters( + ('valid', (3, 3), 'valid', (1, 1)), ('same', (3, 3), 'same', (1, 1)), + ('downsample', (3, 3), 'same', (2, 2)), ('test', (1, 1), 'valid', (1, 1))) + def test_pass_through(self, kernel_size, padding, strides): + if padding == 'same': + pad_const = 1 + else: + pad_const = 0 + x = tf.keras.Input(shape=(224, 224, 3)) + test_layer = nn_blocks.ConvBN( + filters=64, + kernel_size=kernel_size, + padding=padding, + strides=strides, + trainable=False) + outx = test_layer(x) + print(outx.shape.as_list()) + test = [ + None, + int((224 - kernel_size[0] + (2 * pad_const)) / strides[0] + 1), + int((224 - kernel_size[1] + (2 * pad_const)) / 
strides[1] + 1), 64 + ] + print(test) + self.assertAllEqual(outx.shape.as_list(), test) + + @parameterized.named_parameters(('filters', 3)) + def test_gradient_pass_though(self, filters): + loss = tf.keras.losses.MeanSquaredError() + optimizer = tf.keras.optimizers.SGD() + with tf.device('/CPU:0'): + test_layer = nn_blocks.ConvBN(filters, kernel_size=(3, 3), padding='same') + + init = tf.random_normal_initializer() + x = tf.Variable( + initial_value=init(shape=(1, 224, 224, 3), dtype=tf.float32)) + y = tf.Variable( + initial_value=init(shape=(1, 224, 224, filters), dtype=tf.float32)) + + with tf.GradientTape() as tape: + x_hat = test_layer(x) + grad_loss = loss(x_hat, y) + grad = tape.gradient(grad_loss, test_layer.trainable_variables) + optimizer.apply_gradients(zip(grad, test_layer.trainable_variables)) + self.assertNotIn(None, grad) + + +class DarkResidualTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.named_parameters(('same', 224, 224, 64, False), + ('downsample', 223, 223, 32, True), + ('oddball', 223, 223, 32, False)) + def test_pass_through(self, width, height, filters, downsample): + mod = 1 + if downsample: + mod = 2 + x = tf.keras.Input(shape=(width, height, filters)) + test_layer = nn_blocks.DarkResidual(filters=filters, downsample=downsample) + outx = test_layer(x) + print(outx) + print(outx.shape.as_list()) + self.assertAllEqual( + outx.shape.as_list(), + [None, np.ceil(width / mod), + np.ceil(height / mod), filters]) + + @parameterized.named_parameters(('same', 64, 224, 224, False), + ('downsample', 32, 223, 223, True), + ('oddball', 32, 223, 223, False)) + def test_gradient_pass_though(self, filters, width, height, downsample): + loss = tf.keras.losses.MeanSquaredError() + optimizer = tf.keras.optimizers.SGD() + test_layer = nn_blocks.DarkResidual(filters, downsample=downsample) + + if downsample: + mod = 2 + else: + mod = 1 + + init = tf.random_normal_initializer() + x = tf.Variable( + initial_value=init(shape=(1, width, height, 
filters), dtype=tf.float32)) + y = tf.Variable( + initial_value=init( + shape=(1, int(np.ceil(width / mod)), int(np.ceil(height / mod)), + filters), + dtype=tf.float32)) + + with tf.GradientTape() as tape: + x_hat = test_layer(x) + grad_loss = loss(x_hat, y) + grad = tape.gradient(grad_loss, test_layer.trainable_variables) + optimizer.apply_gradients(zip(grad, test_layer.trainable_variables)) + + self.assertNotIn(None, grad) + + +class DarkSppTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.named_parameters(('RouteProcessSpp', 224, 224, 3, [5, 9, 13]), + ('test1', 300, 300, 10, [2, 3, 4, 5]), + ('test2', 256, 256, 5, [10])) + def test_pass_through(self, width, height, channels, sizes): + x = tf.keras.Input(shape=(width, height, channels)) + test_layer = nn_blocks.SPP(sizes=sizes) + outx = test_layer(x) + self.assertAllEqual(outx.shape.as_list(), + [None, width, height, channels * (len(sizes) + 1)]) + return + + @parameterized.named_parameters(('RouteProcessSpp', 224, 224, 3, [5, 9, 13]), + ('test1', 300, 300, 10, [2, 3, 4, 5]), + ('test2', 256, 256, 5, [10])) + def test_gradient_pass_though(self, width, height, channels, sizes): + loss = tf.keras.losses.MeanSquaredError() + optimizer = tf.keras.optimizers.SGD() + test_layer = nn_blocks.SPP(sizes=sizes) + + init = tf.random_normal_initializer() + x = tf.Variable( + initial_value=init( + shape=(1, width, height, channels), dtype=tf.float32)) + y = tf.Variable( + initial_value=init( + shape=(1, width, height, channels * (len(sizes) + 1)), + dtype=tf.float32)) + + with tf.GradientTape() as tape: + x_hat = test_layer(x) + grad_loss = loss(x_hat, y) + grad = tape.gradient(grad_loss, test_layer.trainable_variables) + optimizer.apply_gradients(zip(grad, test_layer.trainable_variables)) + + self.assertNotIn(None, grad) + return + + +class DarkRouteProcessTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.named_parameters( + ('test1', 224, 224, 64, 7, False), ('test2', 223, 223, 32, 3, 
False), + ('tiny', 223, 223, 16, 1, False), ('spp', 224, 224, 64, 7, False)) + def test_pass_through(self, width, height, filters, repetitions, spp): + x = tf.keras.Input(shape=(width, height, filters)) + test_layer = nn_blocks.DarkRouteProcess( + filters=filters, repetitions=repetitions, insert_spp=spp) + outx = test_layer(x) + self.assertLen(outx, 2, msg='len(outx) != 2') + if repetitions == 1: + filter_y1 = filters + else: + filter_y1 = filters // 2 + self.assertAllEqual( + outx[1].shape.as_list(), [None, width, height, filter_y1]) + self.assertAllEqual( + filters % 2, + 0, + msg='Output of a DarkRouteProcess layer has an odd number of filters') + self.assertAllEqual(outx[0].shape.as_list(), [None, width, height, filters]) + + @parameterized.named_parameters( + ('test1', 224, 224, 64, 7, False), ('test2', 223, 223, 32, 3, False), + ('tiny', 223, 223, 16, 1, False), ('spp', 224, 224, 64, 7, False)) + def test_gradient_pass_though(self, width, height, filters, repetitions, spp): + loss = tf.keras.losses.MeanSquaredError() + optimizer = tf.keras.optimizers.SGD() + test_layer = nn_blocks.DarkRouteProcess( + filters=filters, repetitions=repetitions, insert_spp=spp) + + if repetitions == 1: + filter_y1 = filters + else: + filter_y1 = filters // 2 + + init = tf.random_normal_initializer() + x = tf.Variable( + initial_value=init(shape=(1, width, height, filters), dtype=tf.float32)) + y_0 = tf.Variable( + initial_value=init(shape=(1, width, height, filters), dtype=tf.float32)) + y_1 = tf.Variable( + initial_value=init( + shape=(1, width, height, filter_y1), dtype=tf.float32)) + + with tf.GradientTape() as tape: + x_hat_0, x_hat_1 = test_layer(x) + grad_loss_0 = loss(x_hat_0, y_0) + grad_loss_1 = loss(x_hat_1, y_1) + grad = tape.gradient([grad_loss_0, grad_loss_1], + test_layer.trainable_variables) + optimizer.apply_gradients(zip(grad, test_layer.trainable_variables)) + + self.assertNotIn(None, grad) + return + + +if __name__ == '__main__': + tf.test.main() diff --git 
a/official/vision/beta/projects/yolo/modeling/yolo_model.py b/official/projects/yolo/modeling/yolo_model.py similarity index 95% rename from official/vision/beta/projects/yolo/modeling/yolo_model.py rename to official/projects/yolo/modeling/yolo_model.py index 06f79750ea8434c9ff6380b7264fbda1583d03ad..30929386b47a74ce1fd38fe19de21fe49e5acf13 100644 --- a/official/vision/beta/projects/yolo/modeling/yolo_model.py +++ b/official/projects/yolo/modeling/yolo_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,7 +16,7 @@ from typing import Mapping, Union import tensorflow as tf -from official.vision.beta.projects.yolo.modeling.layers import nn_blocks +from official.projects.yolo.modeling.layers import nn_blocks class Yolo(tf.keras.Model): diff --git a/official/projects/yolo/ops/__init__.py b/official/projects/yolo/ops/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..ba97902e7ec1e12871c0fad301b9ce48c92cf1d1 --- /dev/null +++ b/official/projects/yolo/ops/__init__.py @@ -0,0 +1,15 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ + diff --git a/official/projects/yolo/ops/anchor.py b/official/projects/yolo/ops/anchor.py new file mode 100644 index 0000000000000000000000000000000000000000..aeaf4d393ebf437a5a0c3f500f67d6bc04c62c08 --- /dev/null +++ b/official/projects/yolo/ops/anchor.py @@ -0,0 +1,481 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Yolo Anchor labeler.""" +import numpy as np +import tensorflow as tf + +from official.projects.yolo.ops import box_ops +from official.projects.yolo.ops import loss_utils +from official.projects.yolo.ops import preprocessing_ops + +INF = 10000000 + + +def get_best_anchor(y_true, + anchors, + stride, + width=1, + height=1, + iou_thresh=0.25, + best_match_only=False, + use_tie_breaker=True): + """Get the correct anchor that is associated with each box using IOU. + + Args: + y_true: tf.Tensor[] for the list of bounding boxes in the yolo format. + anchors: list or tensor for the anchor boxes to be used in prediction found + via Kmeans. + stride: `int` stride for the anchors. + width: int for the image width. + height: int for the image height. + iou_thresh: `float` the minimum iou threshold to use for selecting boxes for + each level. + best_match_only: `bool` if the box only has one match and it is less than + the iou threshold, when set to True, this match will be dropped as no + anchors can be linked to it.
+ use_tie_breaker: `bool` if there are many anchors for a given box, then + attempt to use all of them; if False, only the first matching anchor will be + used. + Returns: + A tuple of tf.Tensors: the anchor indexes matched to each ground truth box + (-1 where unmatched) and the corresponding match values. + """ + with tf.name_scope('get_best_anchor'): + width = tf.cast(width, dtype=tf.float32) + height = tf.cast(height, dtype=tf.float32) + scaler = tf.convert_to_tensor([width, height]) + + # scale the box width and height to the level's output size + true_wh = tf.cast(y_true[..., 2:4], dtype=tf.float32) * scaler + + # scale down from large anchor to small anchor type + anchors = tf.cast(anchors, dtype=tf.float32) / stride + + k = tf.shape(anchors)[0] + + anchors = tf.concat([tf.zeros_like(anchors), anchors], axis=-1) + truth_comp = tf.concat([tf.zeros_like(true_wh), true_wh], axis=-1) + + if iou_thresh >= 1.0: + anchors = tf.expand_dims(anchors, axis=-2) + truth_comp = tf.expand_dims(truth_comp, axis=-3) + + aspect = truth_comp[..., 2:4] / anchors[..., 2:4] + aspect = tf.where(tf.math.is_nan(aspect), tf.zeros_like(aspect), aspect) + aspect = tf.maximum(aspect, 1 / aspect) + aspect = tf.where(tf.math.is_nan(aspect), tf.zeros_like(aspect), aspect) + aspect = tf.reduce_max(aspect, axis=-1) + + values, indexes = tf.math.top_k( + tf.transpose(-aspect, perm=[1, 0]), + k=tf.cast(k, dtype=tf.int32), + sorted=True) + values = -values + ind_mask = tf.cast(values < iou_thresh, dtype=indexes.dtype) + else: + truth_comp = box_ops.xcycwh_to_yxyx(truth_comp) + anchors = box_ops.xcycwh_to_yxyx(anchors) + iou_raw = box_ops.aggregated_comparitive_iou( + truth_comp, + anchors, + iou_type=3, + ) + values, indexes = tf.math.top_k( + iou_raw, k=tf.cast(k, dtype=tf.int32), sorted=True) + ind_mask = tf.cast(values >= iou_thresh, dtype=indexes.dtype) + + # pad the indexes such that all matches failing the threshold are -1: + # add one, multiply by the mask to zero all the bad locations, then + # subtract 1, making all the bad locations -1.
+ if best_match_only: + iou_index = ((indexes[..., 0:] + 1) * ind_mask[..., 0:]) - 1 + elif use_tie_breaker: + iou_index = tf.concat([ + tf.expand_dims(indexes[..., 0], axis=-1), + ((indexes[..., 1:] + 1) * ind_mask[..., 1:]) - 1 + ], + axis=-1) + else: + iou_index = tf.concat([ + tf.expand_dims(indexes[..., 0], axis=-1), + tf.zeros_like(indexes[..., 1:]) - 1 + ], + axis=-1) + + return tf.cast(iou_index, dtype=tf.float32), tf.cast(values, dtype=tf.float32) + + +class YoloAnchorLabeler: + """Anchor labeler for the Yolo Models.""" + + def __init__(self, + anchors=None, + anchor_free_level_limits=None, + level_strides=None, + center_radius=None, + max_num_instances=200, + match_threshold=0.25, + best_matches_only=False, + use_tie_breaker=True, + darknet=False, + dtype='float32'): + """Initialization for anchor labeler. + + Args: + anchors: `Dict[List[Union[int, float]]]` values for each anchor box. + anchor_free_level_limits: `List` the box sizes that will be allowed at + each FPN level as is done in the FCOS and YOLOX paper for anchor free + box assignment. + level_strides: `Dict[int]` for how much the model scales down the images + at each level. + center_radius: `Dict[float]` for radius around each box center to search + for extra centers in each level. + max_num_instances: `int` for the number of boxes to compute loss on. + match_threshold: `float` indicating the threshold over which an anchor + will be considered for prediction. At zero, all the anchors will be used, + and at 1.0 only the best will be used. For anchor thresholds larger than + 1.0 we stop using the IOU for anchor comparison and resort directly to + comparing the width and height; this is used for the scaled models. + best_matches_only: `boolean` indicating how boxes are selected for + optimization. + use_tie_breaker: `boolean` indicating whether to use the anchor threshold + value. + darknet: `boolean` indicating which data pipeline to use.
Setting to True + swaps the pipeline to output images relative to Yolov4 and older. + dtype: `str` indicating the output datatype of the data pipeline, selecting + from {"float32", "float16", "bfloat16"}. + """ + self.anchors = anchors + self.masks = self._get_mask() + self.anchor_free_level_limits = self._get_level_limits( + anchor_free_level_limits) + + if darknet and self.anchor_free_level_limits is None: + center_radius = None + + self.keys = self.anchors.keys() + if self.anchor_free_level_limits is not None: + maxim = 2000 + match_threshold = -0.01 + self.num_instances = {key: maxim for key in self.keys} + elif not darknet: + self.num_instances = { + key: (6 - i) * max_num_instances for i, key in enumerate(self.keys) + } + else: + self.num_instances = {key: max_num_instances for key in self.keys} + + self.center_radius = center_radius + self.level_strides = level_strides + self.match_threshold = match_threshold + self.best_matches_only = best_matches_only + self.use_tie_breaker = use_tie_breaker + self.dtype = dtype + + def _get_mask(self): + """For each level get indexes of each anchor for box search across levels.""" + masks = {} + start = 0 + + minimum = int(min(self.anchors.keys())) + maximum = int(max(self.anchors.keys())) + for i in range(minimum, maximum + 1): + per_scale = len(self.anchors[str(i)]) + masks[str(i)] = list(range(start, per_scale + start)) + start += per_scale + return masks + + def _get_level_limits(self, level_limits): + """For each level, the receptive field range for anchor free box placement.""" + if level_limits is not None: + level_limits_dict = {} + level_limits = [0.0] + level_limits + [np.inf] + + for i, key in enumerate(self.anchors.keys()): + level_limits_dict[key] = level_limits[i:i + 2] + else: + level_limits_dict = None + return level_limits_dict + + def _tie_breaking_search(self, anchors, mask, boxes, classes): + """After search, link each anchor ind to the correct map in ground truth.""" + mask = tf.cast(tf.reshape(mask, [1, 1,
1, -1]), anchors.dtype) + anchors = tf.expand_dims(anchors, axis=-1) + viable = tf.where(tf.squeeze(anchors == mask, axis=0)) + + gather_id, _, anchor_id = tf.split(viable, 3, axis=-1) + + boxes = tf.gather_nd(boxes, gather_id) + classes = tf.gather_nd(classes, gather_id) + + classes = tf.expand_dims(classes, axis=-1) + classes = tf.cast(classes, boxes.dtype) + anchor_id = tf.cast(anchor_id, boxes.dtype) + return boxes, classes, anchor_id + + def _get_anchor_id(self, + key, + boxes, + classes, + width, + height, + stride, + iou_index=None): + """Find the object anchor assignments in an anchor based paradigm.""" + + # find the best anchor + anchors = self.anchors[key] + num_anchors = len(anchors) + if self.best_matches_only: + # get the best anchor for each box + iou_index, _ = get_best_anchor( + boxes, + anchors, + stride, + width=width, + height=height, + best_match_only=True, + iou_thresh=self.match_threshold) + mask = range(num_anchors) + else: + # search is done across FPN levels, get the mask of anchor indexes + # corresponding to this level.
+ mask = self.masks[key] + + # search for the correct box to use + (boxes, classes, + anchors) = self._tie_breaking_search(iou_index, mask, boxes, classes) + return boxes, classes, anchors, num_anchors + + def _get_centers(self, boxes, classes, anchors, width, height, scale_xy): + """Find the object center assignments in an anchor based paradigm.""" + offset = tf.cast(0.5 * (scale_xy - 1), boxes.dtype) + + grid_xy, _ = tf.split(boxes, 2, axis=-1) + wh_scale = tf.cast(tf.convert_to_tensor([width, height]), boxes.dtype) + + grid_xy = grid_xy * wh_scale + centers = tf.math.floor(grid_xy) + + if offset != 0.0: + clamp = lambda x, ma: tf.maximum( # pylint:disable=g-long-lambda + tf.minimum(x, tf.cast(ma, x.dtype)), tf.zeros_like(x)) + + grid_xy_index = grid_xy - centers + positive_shift = ((grid_xy_index < offset) & (grid_xy > 1.)) + negative_shift = ((grid_xy_index > (1 - offset)) & (grid_xy < + (wh_scale - 1.))) + + zero, _ = tf.split(tf.ones_like(positive_shift), 2, axis=-1) + shift_mask = tf.concat([zero, positive_shift, negative_shift], axis=-1) + offset = tf.cast([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]], + offset.dtype) * offset + + num_shifts = tf.shape(shift_mask) + num_shifts = num_shifts[-1] + boxes = tf.tile(tf.expand_dims(boxes, axis=-2), [1, num_shifts, 1]) + classes = tf.tile(tf.expand_dims(classes, axis=-2), [1, num_shifts, 1]) + anchors = tf.tile(tf.expand_dims(anchors, axis=-2), [1, num_shifts, 1]) + + shift_mask = tf.cast(shift_mask, boxes.dtype) + shift_ind = shift_mask * tf.range(0, num_shifts, dtype=boxes.dtype) + shift_ind = shift_ind - (1 - shift_mask) + shift_ind = tf.expand_dims(shift_ind, axis=-1) + + boxes_and_centers = tf.concat([boxes, classes, anchors, shift_ind], + axis=-1) + boxes_and_centers = tf.reshape(boxes_and_centers, [-1, 7]) + _, center_ids = tf.split(boxes_and_centers, [6, 1], axis=-1) + + select = tf.where(center_ids >= 0) + select, _ = tf.split(select, 2, axis=-1) + + boxes_and_centers = tf.gather_nd(boxes_and_centers, 
select) + + center_ids = tf.gather_nd(center_ids, select) + center_ids = tf.cast(center_ids, tf.int32) + shifts = tf.gather_nd(offset, center_ids) + + boxes, classes, anchors, _ = tf.split( + boxes_and_centers, [4, 1, 1, 1], axis=-1) + grid_xy, _ = tf.split(boxes, 2, axis=-1) + centers = tf.math.floor(grid_xy * wh_scale - shifts) + centers = clamp(centers, wh_scale - 1) + + x, y = tf.split(centers, 2, axis=-1) + centers = tf.cast(tf.concat([y, x, anchors], axis=-1), tf.int32) + return boxes, classes, centers + + def _get_anchor_free(self, key, boxes, classes, height, width, stride, + center_radius): + """Find the box assignments in an anchor free paradigm.""" + level_limits = self.anchor_free_level_limits[key] + gen = loss_utils.GridGenerator(anchors=[[1, 1]], scale_anchors=stride) + grid_points = gen(width, height, 1, boxes.dtype)[0] + grid_points = tf.squeeze(grid_points, axis=0) + box_list = boxes + class_list = classes + + grid_points = (grid_points + 0.5) * stride + x_centers, y_centers = grid_points[..., 0], grid_points[..., 1] + boxes *= (tf.convert_to_tensor([width, height, width, height]) * stride) + + tlbr_boxes = box_ops.xcycwh_to_yxyx(boxes) + + boxes = tf.reshape(boxes, [1, 1, -1, 4]) + tlbr_boxes = tf.reshape(tlbr_boxes, [1, 1, -1, 4]) + if self.use_tie_breaker: + area = tf.reduce_prod(boxes[..., 2:], axis=-1) + + # check if the box is in the receptive field of this FPN level + b_t = y_centers - tlbr_boxes[..., 0] + b_l = x_centers - tlbr_boxes[..., 1] + b_b = tlbr_boxes[..., 2] - y_centers + b_r = tlbr_boxes[..., 3] - x_centers + box_delta = tf.stack([b_t, b_l, b_b, b_r], axis=-1) + if level_limits is not None: + max_reg_targets_per_im = tf.reduce_max(box_delta, axis=-1) + gt_min = max_reg_targets_per_im >= level_limits[0] + gt_max = max_reg_targets_per_im <= level_limits[1] + is_in_boxes = tf.logical_and(gt_min, gt_max) + else: + is_in_boxes = tf.reduce_min(box_delta, axis=-1) > 0.0 + is_in_boxes_all = tf.reduce_any(is_in_boxes, axis=(0, 1),
keepdims=True) + + # check if the center is in the receptive field of this FPN level + c_t = y_centers - (boxes[..., 1] - center_radius * stride) + c_l = x_centers - (boxes[..., 0] - center_radius * stride) + c_b = (boxes[..., 1] + center_radius * stride) - y_centers + c_r = (boxes[..., 0] + center_radius * stride) - x_centers + centers_delta = tf.stack([c_t, c_l, c_b, c_r], axis=-1) + is_in_centers = tf.reduce_min(centers_delta, axis=-1) > 0.0 + is_in_centers_all = tf.reduce_any(is_in_centers, axis=(0, 1), keepdims=True) + + # collate all masks to get the final locations + is_in_index = tf.logical_or(is_in_boxes_all, is_in_centers_all) + is_in_boxes_and_center = tf.logical_and(is_in_boxes, is_in_centers) + is_in_boxes_and_center = tf.logical_and(is_in_index, is_in_boxes_and_center) + + if self.use_tie_breaker: + boxes_all = tf.cast(is_in_boxes_and_center, area.dtype) + boxes_all = ((boxes_all * area) + ((1 - boxes_all) * INF)) + boxes_min = tf.reduce_min(boxes_all, axis=-1, keepdims=True) + boxes_min = tf.where(boxes_min == INF, -1.0, boxes_min) + is_in_boxes_and_center = boxes_all == boxes_min + + # construct the index update grid + reps = tf.reduce_sum(tf.cast(is_in_boxes_and_center, tf.int16), axis=-1) + indexes = tf.cast(tf.where(is_in_boxes_and_center), tf.int32) + y, x, t = tf.split(indexes, 3, axis=-1) + + boxes = tf.gather_nd(box_list, t) + classes = tf.cast(tf.gather_nd(class_list, t), boxes.dtype) + reps = tf.gather_nd(reps, tf.concat([y, x], axis=-1)) + reps = tf.cast(tf.expand_dims(reps, axis=-1), boxes.dtype) + classes = tf.cast(tf.expand_dims(classes, axis=-1), boxes.dtype) + conf = tf.ones_like(classes) + + # return the samples and the indexes + samples = tf.concat([boxes, conf, classes], axis=-1) + indexes = tf.concat([y, x, tf.zeros_like(t)], axis=-1) + return indexes, samples + + def build_label_per_path(self, + key, + boxes, + classes, + width, + height, + iou_index=None): + """Builds the labels for one path.""" + stride =
self.level_strides[key] + scale_xy = self.center_radius[key] if self.center_radius is not None else 1 + + width = tf.cast(width // stride, boxes.dtype) + height = tf.cast(height // stride, boxes.dtype) + + if self.anchor_free_level_limits is None: + (boxes, classes, anchors, num_anchors) = self._get_anchor_id( + key, boxes, classes, width, height, stride, iou_index=iou_index) + boxes, classes, centers = self._get_centers(boxes, classes, anchors, + width, height, scale_xy) + ind_mask = tf.ones_like(classes) + updates = tf.concat([boxes, ind_mask, classes], axis=-1) + else: + num_anchors = 1 + (centers, updates) = self._get_anchor_free(key, boxes, classes, height, + width, stride, scale_xy) + boxes, ind_mask, classes = tf.split(updates, [4, 1, 1], axis=-1) + + width = tf.cast(width, tf.int32) + height = tf.cast(height, tf.int32) + full = tf.zeros([height, width, num_anchors, 1], dtype=classes.dtype) + full = tf.tensor_scatter_nd_add(full, centers, ind_mask) + + num_instances = int(self.num_instances[key]) + centers = preprocessing_ops.pad_max_instances( + centers, num_instances, pad_value=0, pad_axis=0) + updates = preprocessing_ops.pad_max_instances( + updates, num_instances, pad_value=0, pad_axis=0) + + updates = tf.cast(updates, self.dtype) + full = tf.cast(full, self.dtype) + return centers, updates, full + + def __call__(self, boxes, classes, width, height): + """Builds the labels for a single image, not functional in batch mode. + + Args: + boxes: `Tensor` of shape [None, 4] indicating the object locations in an + image. + classes: `Tensor` of shape [None] indicating each object's class. + width: `int` for the image's width. + height: `int` for the image's height. + + Returns: + centers: `Tensor` of shape [None, 3] of indexes in the final grid where + boxes are located. + updates: `Tensor` of shape [None, 6] holding the values to place in the final grid.
+ full: `Tensor` of [height/stride, width/stride, num_anchors, 1] holding + a mask of where boxes are located for confidence losses. + """ + indexes = {} + updates = {} + true_grids = {} + iou_index = None + + boxes = box_ops.yxyx_to_xcycwh(boxes) + if not self.best_matches_only and self.anchor_free_level_limits is None: + # stitch and search boxes across fpn levels + anchorsvec = [] + for stitch in self.anchors: + anchorsvec.extend(self.anchors[stitch]) + + stride = tf.cast([width, height], boxes.dtype) + # get the best anchor for each box + iou_index, _ = get_best_anchor( + boxes, + anchorsvec, + stride, + width=1.0, + height=1.0, + best_match_only=False, + use_tie_breaker=self.use_tie_breaker, + iou_thresh=self.match_threshold) + + for key in self.keys: + indexes[key], updates[key], true_grids[key] = self.build_label_per_path( + key, boxes, classes, width, height, iou_index=iou_index) + return indexes, updates, true_grids diff --git a/official/projects/yolo/ops/box_ops.py b/official/projects/yolo/ops/box_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..2ccbc5eb5f8b7ef4a9d8b8032c99a1e1e9f85e23 --- /dev/null +++ b/official/projects/yolo/ops/box_ops.py @@ -0,0 +1,322 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+ +"""Yolo box ops.""" +import math +import tensorflow as tf +from official.projects.yolo.ops import math_ops + + +def yxyx_to_xcycwh(box: tf.Tensor): + """Converts boxes from yxyx to x_center, y_center, width, height. + + Args: + box: any `Tensor` whose last dimension is 4 representing the coordinates of + boxes in ymin, xmin, ymax, xmax. + + Returns: + box: a `Tensor` whose shape is the same as `box` in new format. + """ + with tf.name_scope('yxyx_to_xcycwh'): + ymin, xmin, ymax, xmax = tf.split(box, 4, axis=-1) + x_center = (xmax + xmin) / 2 + y_center = (ymax + ymin) / 2 + width = xmax - xmin + height = ymax - ymin + box = tf.concat([x_center, y_center, width, height], axis=-1) + return box + + +def xcycwh_to_yxyx(box: tf.Tensor): + """Converts boxes from x_center, y_center, width, height to yxyx format. + + Args: + box: any `Tensor` whose last dimension is 4 representing the coordinates of + boxes in x_center, y_center, width, height. + + Returns: + box: a `Tensor` whose shape is the same as `box` in new format. + """ + with tf.name_scope('xcycwh_to_yxyx'): + xy, wh = tf.split(box, 2, axis=-1) + xy_min = xy - wh / 2 + xy_max = xy + wh / 2 + x_min, y_min = tf.split(xy_min, 2, axis=-1) + x_max, y_max = tf.split(xy_max, 2, axis=-1) + box = tf.concat([y_min, x_min, y_max, x_max], axis=-1) + return box + + +def intersect_and_union(box1, box2, yxyx=False): + """Calculates the intersection and union between box1 and box2. + + Args: + box1: any `Tensor` whose last dimension is 4 representing the coordinates of + boxes. + box2: any `Tensor` whose last dimension is 4 representing the coordinates of + boxes. + yxyx: a `bool` indicating whether the input box is of the format x_center + y_center, width, height or y_min, x_min, y_max, x_max. + + Returns: + intersection: a `Tensor` who represents the intersection. + union: a `Tensor` who represents the union. 
+ """ + if not yxyx: + box1_area = tf.reduce_prod(tf.split(box1, 2, axis=-1)[-1], axis=-1) + box2_area = tf.reduce_prod(tf.split(box2, 2, axis=-1)[-1], axis=-1) + box1 = xcycwh_to_yxyx(box1) + box2 = xcycwh_to_yxyx(box2) + + b1mi, b1ma = tf.split(box1, 2, axis=-1) + b2mi, b2ma = tf.split(box2, 2, axis=-1) + intersect_mins = tf.math.maximum(b1mi, b2mi) + intersect_maxes = tf.math.minimum(b1ma, b2ma) + intersect_wh = tf.math.maximum(intersect_maxes - intersect_mins, 0.0) + intersection = tf.reduce_prod(intersect_wh, axis=-1) + + if yxyx: + box1_area = tf.reduce_prod(b1ma - b1mi, axis=-1) + box2_area = tf.reduce_prod(b2ma - b2mi, axis=-1) + union = box1_area + box2_area - intersection + return intersection, union + + +def smallest_encompassing_box(box1, box2, yxyx=False, clip=False): + """Calculates the smallest box that encompasses box1 and box2. + + Args: + box1: any `Tensor` whose last dimension is 4 representing the coordinates of + boxes. + box2: any `Tensor` whose last dimension is 4 representing the coordinates of + boxes. + yxyx: a `bool` indicating whether the input box is of the format x_center + y_center, width, height or y_min, x_min, y_max, x_max. + clip: a `bool`, whether or not to clip boxes. + + Returns: + box_c: a `Tensor` whose last dimension is 4 representing the coordinates of + boxes, the return format is y_min, x_min, y_max, x_max if yxyx is set to + True. In other words, it matches the input format.
+ """ + if not yxyx: + box1 = xcycwh_to_yxyx(box1) + box2 = xcycwh_to_yxyx(box2) + + b1mi, b1ma = tf.split(box1, 2, axis=-1) + b2mi, b2ma = tf.split(box2, 2, axis=-1) + + bcmi = tf.math.minimum(b1mi, b2mi) + bcma = tf.math.maximum(b1ma, b2ma) + box_c = tf.concat([bcmi, bcma], axis=-1) + + if not yxyx: + box_c = yxyx_to_xcycwh(box_c) + + if clip: + bca = tf.reduce_prod(bcma - bcmi, keepdims=True, axis=-1) + box_c = tf.where(bca <= 0.0, tf.zeros_like(box_c), box_c) + return bcmi, bcma, box_c + + +def compute_iou(box1, box2, yxyx=False): + """Calculates the intersection over union between box1 and box2. + + Args: + box1: any `Tensor` whose last dimension is 4 representing the coordinates of + boxes. + box2: any `Tensor` whose last dimension is 4 representing the coordinates of + boxes. + yxyx: a `bool` indicating whether the input box is of the format x_center + y_center, width, height or y_min, x_min, y_max, x_max. + + Returns: + iou: a `Tensor` who represents the intersection over union. + """ + with tf.name_scope('iou'): + intersection, union = intersect_and_union(box1, box2, yxyx=yxyx) + iou = math_ops.divide_no_nan(intersection, union) + return iou + + +def compute_giou(box1, box2, yxyx=False): + """Calculates the General intersection over union between box1 and box2. + + Args: + box1: any `Tensor` whose last dimension is 4 representing the coordinates of + boxes. + box2: any `Tensor` whose last dimension is 4 representing the coordinates of + boxes. + yxyx: a `bool` indicating whether the input box is of the format x_center + y_center, width, height or y_min, x_min, y_max, x_max. + + Returns: + giou: a `Tensor` who represents the General intersection over union. 
+ """ + with tf.name_scope('giou'): + if not yxyx: + yxyx1 = xcycwh_to_yxyx(box1) + yxyx2 = xcycwh_to_yxyx(box2) + else: + yxyx1, yxyx2 = box1, box2 + + cmi, cma, _ = smallest_encompassing_box(yxyx1, yxyx2, yxyx=True) + intersection, union = intersect_and_union(yxyx1, yxyx2, yxyx=True) + iou = math_ops.divide_no_nan(intersection, union) + + bcwh = cma - cmi + c = tf.math.reduce_prod(bcwh, axis=-1) + + regularization = math_ops.divide_no_nan((c - union), c) + giou = iou - regularization + return iou, giou + + +def compute_diou(box1, box2, beta=1.0, yxyx=False): + """Calculates the distance intersection over union between box1 and box2. + + Args: + box1: any `Tensor` whose last dimension is 4 representing the coordinates of + boxes. + box2: any `Tensor` whose last dimension is 4 representing the coordinates of + boxes. + beta: a `float` indicating the amount to scale the distance iou + regularization term. + yxyx: a `bool` indicating whether the input box is of the format x_center + y_center, width, height or y_min, x_min, y_max, x_max. + + Returns: + diou: a `Tensor` who represents the distance intersection over union. 
+ """ + with tf.name_scope('diou'): + # compute center distance + if not yxyx: + xycc1, xycc2 = box1, box2 + yxyx1 = xcycwh_to_yxyx(box1) + yxyx2 = xcycwh_to_yxyx(box2) + else: + yxyx1, yxyx2 = box1, box2 + xycc1 = yxyx_to_xcycwh(box1) + xycc2 = yxyx_to_xcycwh(box2) + + cmi, cma, _ = smallest_encompassing_box(yxyx1, yxyx2, yxyx=True) + intersection, union = intersect_and_union(yxyx1, yxyx2, yxyx=True) + iou = math_ops.divide_no_nan(intersection, union) + + b1xy, _ = tf.split(xycc1, 2, axis=-1) + b2xy, _ = tf.split(xycc2, 2, axis=-1) + bcwh = cma - cmi + + center_dist = tf.reduce_sum((b1xy - b2xy)**2, axis=-1) + c_diag = tf.reduce_sum(bcwh**2, axis=-1) + + regularization = math_ops.divide_no_nan(center_dist, c_diag) + diou = iou - regularization**beta + return iou, diou + + +def compute_ciou(box1, box2, yxyx=False, darknet=False): + """Calculates the complete intersection over union between box1 and box2. + + Args: + box1: any `Tensor` whose last dimension is 4 representing the coordinates of + boxes. + box2: any `Tensor` whose last dimension is 4 representing the coordinates of + boxes. + yxyx: a `bool` indicating whether the input box is of the format x_center + y_center, width, height or y_min, x_min, y_max, x_max. + darknet: a `bool` indicating whether the calling function is the YOLO + darknet loss. + + Returns: + ciou: a `Tensor` who represents the complete intersection over union. + """ + with tf.name_scope('ciou'): + if not yxyx: + xycc1, xycc2 = box1, box2 + yxyx1 = xcycwh_to_yxyx(box1) + yxyx2 = xcycwh_to_yxyx(box2) + else: + yxyx1, yxyx2 = box1, box2 + xycc1 = yxyx_to_xcycwh(box1) + xycc2 = yxyx_to_xcycwh(box2) + + # Build the smallest encompassing box.
+ cmi, cma, _ = smallest_encompassing_box(yxyx1, yxyx2, yxyx=True) + intersection, union = intersect_and_union(yxyx1, yxyx2, yxyx=True) + iou = math_ops.divide_no_nan(intersection, union) + + b1xy, b1w, b1h = tf.split(xycc1, [2, 1, 1], axis=-1) + b2xy, b2w, b2h = tf.split(xycc2, [2, 1, 1], axis=-1) + bchw = cma - cmi + + # Center regularization + center_dist = tf.reduce_sum((b1xy - b2xy)**2, axis=-1) + c_diag = tf.reduce_sum(bchw**2, axis=-1) + regularization = math_ops.divide_no_nan(center_dist, c_diag) + + # Compute aspect ratio consistency + terma = math_ops.divide_no_nan(b1w, b1h) # gt + termb = math_ops.divide_no_nan(b2w, b2h) # pred + arcterm = tf.squeeze( + tf.math.pow(tf.math.atan(termb) - tf.math.atan(terma), 2), axis=-1) + v = (4 / math.pi**2) * arcterm + + # Compute the aspect ratio weight, should be treated as a constant + a = tf.stop_gradient(math_ops.divide_no_nan(v, 1 - iou + v)) + + if darknet: + grad_scale = tf.stop_gradient(tf.square(b2w) + tf.square(b2h)) + v *= tf.squeeze(grad_scale, axis=-1) + + ciou = iou - regularization - (v * a) + return iou, ciou + + +def aggregated_comparitive_iou(boxes1, boxes2=None, iou_type=0, beta=0.6): + """Calculates the IOU between two sets of boxes. + + Similar to bbox_overlap but far more versatile. + + Args: + boxes1: a `Tensor` of shape [batch size, N, 4] representing the coordinates + of boxes. + boxes2: a `Tensor` of shape [batch size, N, 4] representing the coordinates + of boxes. + iou_type: `int` or `str` for the iou version to use: 0 or 'diou' is distance + iou, 1 or 'giou' is the general iou, 2 or 'ciou' is the complete iou, and + any other value uses the standard iou. + beta: `float` for the scaling quantity to apply to distance iou + regularization. + + Returns: + iou: a `Tensor` who represents the intersection over union of the + expected/input type.
+ """ + boxes1 = tf.expand_dims(boxes1, axis=-2) + + if boxes2 is not None: + boxes2 = tf.expand_dims(boxes2, axis=-3) + else: + boxes2 = tf.transpose(boxes1, perm=(0, 2, 1, 3)) + + if iou_type == 0 or iou_type == 'diou': # diou + _, iou = compute_diou(boxes1, boxes2, beta=beta, yxyx=True) + elif iou_type == 1 or iou_type == 'giou': # giou + _, iou = compute_giou(boxes1, boxes2, yxyx=True) + elif iou_type == 2 or iou_type == 'ciou': # ciou + _, iou = compute_ciou(boxes1, boxes2, yxyx=True) + else: + iou = compute_iou(boxes1, boxes2, yxyx=True) + return iou diff --git a/official/vision/beta/projects/yolo/ops/box_ops_test.py b/official/projects/yolo/ops/box_ops_test.py similarity index 94% rename from official/vision/beta/projects/yolo/ops/box_ops_test.py rename to official/projects/yolo/ops/box_ops_test.py index afba1ee53c191666a122f904d8847dd5cfb0335c..5374cde9877a2faaf91db7a96783cccfeb3b52dc 100644 --- a/official/vision/beta/projects/yolo/ops/box_ops_test.py +++ b/official/projects/yolo/ops/box_ops_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,7 +17,7 @@ from absl.testing import parameterized import numpy as np import tensorflow as tf -from official.vision.beta.projects.yolo.ops import box_ops +from official.projects.yolo.ops import box_ops class InputUtilsTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/projects/yolo/ops/kmeans_anchors.py b/official/projects/yolo/ops/kmeans_anchors.py new file mode 100644 index 0000000000000000000000000000000000000000..30278c6ab5284ac3d165f300599c78b12ab7145a --- /dev/null +++ b/official/projects/yolo/ops/kmeans_anchors.py @@ -0,0 +1,317 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""K-means for generation of anchor boxes for YOLO.""" +import logging + +import numpy as np +import tensorflow as tf + +from official.core import input_reader +from official.projects.yolo.ops import box_ops + + +def _iou(x, centroids_x, iou_type="iou"): + """Compute the WH IOU between the ground truths and the centroids.""" + + # set the center of the boxes to zeros + x = tf.concat([tf.zeros_like(x), x], axis=-1) + centroids = tf.concat([tf.zeros_like(centroids_x), centroids_x], axis=-1) + + # compute IOU + if iou_type == "iou": + iou, _ = box_ops.compute_giou(x, centroids) + else: + _, iou = box_ops.compute_giou(x, centroids) + return iou + + +class AnchorKMeans: + """Box Anchor K-means.""" + + @property + def boxes(self): + return self._boxes.numpy() + + def get_box_from_dataset(self, dataset, num_samples=-1): + """Load all the boxes in the dataset into memory.""" + box_list = [] + + for i, sample in enumerate(dataset): + if num_samples > 0 and i >= num_samples: + break + width = sample["width"] + height = sample["height"] + boxes = sample["groundtruth_boxes"] + + # convert the box format from yxyx to xywh to allow + # kmeans by width height IOU + scale = tf.cast([width, height], boxes.dtype) + + # scale the boxes then remove excessively small boxes that are + # less than 1 pixel in width or height + boxes = box_ops.yxyx_to_xcycwh(boxes)[..., 2:] * scale + boxes = boxes[tf.reduce_max(boxes, axis=-1) >= 1] / scale +
box_list.append(boxes) + + # loading is slow, so log the current iteration as a progress bar + tf.print("loading sample: ", i, end="\r") + + box_list = tf.concat(box_list, axis=0) + inds = tf.argsort(tf.reduce_prod(box_list, axis=-1), axis=0) + box_list = tf.gather(box_list, inds, axis=0) + self._boxes = box_list + + def get_init_centroids(self, boxes, k): + """Initialize centroids by splitting the sorted boxes into k groups.""" + box_num = tf.shape(boxes)[0] + + # fixed_means + split = box_num // k + bn2 = split * k + boxes = boxes[:bn2, :] + cluster_groups = tf.split(boxes, k, axis=0) + clusters = [] + for c in cluster_groups: + clusters.append(tf.reduce_mean(c, axis=0)) + clusters = tf.convert_to_tensor(clusters).numpy() + return clusters + + def iou(self, boxes, clusters): + """Computes iou.""" + # broadcast the clusters to the same shape as the boxes + n = tf.shape(boxes)[0] + k = tf.shape(clusters)[0] + boxes = tf.repeat(boxes, k, axis=0) + boxes = tf.reshape(boxes, (n, k, -1)) + boxes = tf.cast(boxes, tf.float32) + + clusters = tf.tile(clusters, [n, 1]) + clusters = tf.reshape(clusters, (n, k, -1)) + clusters = tf.cast(clusters, tf.float32) + + # compute the IOU + return _iou(boxes, clusters) + + def maximization(self, boxes, clusters, assignments): + """K-means maximization term.""" + for i in range(clusters.shape[0]): + hold = tf.math.reduce_mean(boxes[assignments == i], axis=0) + clusters = tf.tensor_scatter_nd_update(clusters, [[i]], [hold]) + return clusters + + def _kmeans(self, boxes, clusters, k, max_iters=1000): + """Run Kmeans on arbitrary boxes and clusters with k centers.""" + assignments = tf.zeros((boxes.shape[0]), dtype=tf.int64) - 1 + dists = tf.zeros((boxes.shape[0], k)) + num_iters = 1 + + # do one iteration outside of the optimization loop + dists = 1 - self.iou(boxes, clusters) + curr = tf.math.argmin(dists, axis=-1) + clusters = self.maximization(boxes, clusters, curr) + + # iterate the boxes until the clusters no longer change + while
not tf.math.reduce_all(curr == assignments) and num_iters < max_iters: + # get the distance + assignments = curr + dists = 1 - self.iou(boxes, clusters) + curr = tf.math.argmin(dists, axis=-1) + clusters = self.maximization(boxes, clusters, curr) + tf.print("k-Means box generation iteration: ", num_iters, end="\r") + num_iters += 1 + + tf.print("k-Means box generation iteration: ", num_iters, end="\n") + assignments = curr + + # sort the clusters by area then get the final assignments + clusters = tf.convert_to_tensor( + np.array(sorted(clusters.numpy(), key=lambda x: x[0] * x[1]))) + dists = 1 - self.iou(boxes, clusters) + assignments = tf.math.argmin(dists, axis=-1) + return clusters, assignments + + def run_kmeans(self, k, boxes, clusters=None): + """Kmeans Wrapping function.""" + if clusters is None: + clusters = self.get_init_centroids(boxes, k) + clusters, assignments = self._kmeans(boxes, clusters, k) + return clusters.numpy(), assignments.numpy() + + def _avg_iou(self, boxes, clusters, assignments): + """Compute the IOU between each centroid and the boxes assigned to it.""" + ious = [] + num_boxes = [] + clusters1 = tf.split(clusters, clusters.shape[0], axis=0) + for i, c in enumerate(clusters1): + hold = boxes[assignments == i] + iou = tf.reduce_mean(self.iou(hold, c)).numpy() + ious.append(iou) + num_boxes.append(hold.shape[0]) + + clusters = np.floor(np.array(sorted(clusters, key=lambda x: x[0] * x[1]))) + print("boxes: ", clusters.tolist()) + print("iou over cluster : ", ious) + print("boxes per cluster: ", num_boxes) + print("dataset avgiou: ", np.mean(ious)) + return ious + + def avg_iou_total(self, boxes, clusters): + clusters = tf.convert_to_tensor(clusters) + dists = 1 - self.iou(boxes, clusters) + assignments = tf.math.argmin(dists, axis=-1) + ious = self._avg_iou(boxes, clusters, assignments) + return clusters, assignments, ious + + def get_boxes(self, boxes_, clusters, assignments=None): + """Given the clusters, return the boxes in each cluster."""
+ if assignments is None: + dists = 1 - self.iou(boxes_, np.array(clusters)) + assignments = tf.math.argmin(dists, axis=-1) + boxes = [] + clusters = tf.split(clusters, clusters.shape[0], axis=0) + for i, _ in enumerate(clusters): + hold = boxes_[assignments == i] + if hasattr(hold, "numpy"): + hold = hold.numpy() + boxes.append(hold) + return boxes + + def __call__(self, + dataset, + k, + anchors_per_scale=None, + scaling_mode="sqrt_log", + box_generation_mode="across_level", + image_resolution=(512, 512, 3), + num_samples=-1): + """Run k-means on the boxes for a given input resolution. + + Args: + dataset: `tf.data.Dataset` for the decoded object detection dataset. Each + sample must have the keys 'groundtruth_boxes', 'width' and 'height'. + k: `int` for the number of centroids to generate. + anchors_per_scale: `int` for how many anchor boxes to use per level. + scaling_mode: `str` for the type of box scaling to use when generating + anchor boxes. Must be in the set {sqrt, default}. + box_generation_mode: `str` for the type of kmeans to use when generating + anchor boxes. Must be in the set {across_level, per_level, even_split}. + image_resolution: `List[int]` for the resolution of the boxes to run + k-means for. + num_samples: `int` for number of samples to process in the dataset. + + Returns: + boxes: `List[List[int]]` of shape [k, 2] for the anchor boxes to use for + box predictions.
+ """ + self.get_box_from_dataset(dataset, num_samples=num_samples) + + if scaling_mode == "sqrt": + boxes_ls = tf.math.sqrt(self._boxes.numpy()) + else: + boxes_ls = self._boxes.numpy() + + if isinstance(image_resolution, int): + image_resolution = [image_resolution, image_resolution] + else: + image_resolution = image_resolution[:2] + image_resolution = image_resolution[::-1] + + if box_generation_mode == "even_split": + clusters = self.get_init_centroids(boxes_ls, k) + dists = 1 - self.iou(boxes_ls, np.array(clusters)) + assignments = tf.math.argmin(dists, axis=-1) + elif box_generation_mode == "across_level": + clusters = self.get_init_centroids(boxes_ls, k) + clusters, assignments = self.run_kmeans(k, boxes_ls, clusters) + else: + # generate a box region for each FPN level + clusters = self.get_init_centroids(boxes_ls, k//anchors_per_scale) + + # square off the clusters + clusters += np.roll(clusters, 1, axis=-1) + clusters /= 2 + + # for each contained box set, compute K means + boxes_sets = self.get_boxes(boxes_ls, clusters) + clusters = [] + for boxes in boxes_sets: + cluster_set, assignments = self.run_kmeans(anchors_per_scale, boxes) + clusters.extend(cluster_set) + clusters = np.array(clusters) + + dists = 1 - self.iou(boxes_ls, np.array(clusters)) + assignments = tf.math.argmin(dists, axis=-1) + + if scaling_mode == "sqrt": + clusters = tf.square(clusters) + + self._boxes *= tf.convert_to_tensor(image_resolution, self._boxes.dtype) + clusters = self.maximization(self._boxes, clusters, assignments) + if hasattr(clusters, "numpy"): + clusters = clusters.numpy() + _, _, _ = self.avg_iou_total(self._boxes, clusters) + clusters = np.floor(np.array(sorted(clusters, key=lambda x: x[0] * x[1]))) + return clusters.tolist() + + +class BoxGenInputReader(input_reader.InputReader): + """Input reader that returns a tf.data.Dataset instance.""" + + def read(self, + k, + anchors_per_scale, + scaling_mode="sqrt", + box_generation_mode="across_level", + 
image_resolution=(512, 512, 3), + num_samples=-1): + """Run k-means on the boxes for a given input resolution. + + Args: + k: `int` for the number of centroids to generate. + anchors_per_scale: `int` for how many anchor boxes to use per level. + scaling_mode: `str` for the type of box scaling to use when generating + anchor boxes. Must be in the set {sqrt, none}. By default we use sqrt + to get an even distribution of anchor boxes across FPN levels. + box_generation_mode: `str` for the type of kmeans to use when generating + anchor boxes. Must be in the set {across_level, per_level, even_split}. + image_resolution: `List[int]` for the resolution of the boxes to run + k-means for. + num_samples: `int` for the number of samples to use for kmeans, + typically about 5000 samples are all that are needed, but for the best + results use -1 to run the entire dataset. + + Returns: + boxes: `List[List[int]]` of shape [k, 2] for the anchor boxes to use for + box predictions. + """ + self._is_training = False + dataset = super().read() + dataset = dataset.unbatch() + + kmeans_gen = AnchorKMeans() + boxes = kmeans_gen( + dataset, + k, + anchors_per_scale=anchors_per_scale, + image_resolution=image_resolution, + scaling_mode=scaling_mode, + box_generation_mode=box_generation_mode, + num_samples=num_samples) + del kmeans_gen # free the memory + del dataset + + logging.info("clustering complete -> default boxes used ::") + logging.info(boxes) + return boxes diff --git a/official/projects/yolo/ops/kmeans_anchors_test.py b/official/projects/yolo/ops/kmeans_anchors_test.py new file mode 100644 index 0000000000000000000000000000000000000000..c372f27dd695b3402092eca3afb65e3f04e279fb --- /dev/null +++ b/official/projects/yolo/ops/kmeans_anchors_test.py @@ -0,0 +1,44 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""kmeans_test tests.""" +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from official.projects.yolo.ops import kmeans_anchors + + +class KMeansTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters((9, 3, 100)) + def test_kmeans(self, k, anchors_per_scale, samples): + sample_list = [] + for _ in range(samples): + boxes = tf.convert_to_tensor(np.random.uniform(0, 1, [k * 100, 4])) + sample_list.append({ + "groundtruth_boxes": boxes, + "width": 10, + "height": 10 + }) + + kmeans = kmeans_anchors.AnchorKMeans() + cl = kmeans( + sample_list, k, anchors_per_scale, image_resolution=[512, 512, 3]) + cl = tf.convert_to_tensor(cl) + self.assertAllEqual(tf.shape(cl).numpy(), [k, 2]) + + +if __name__ == "__main__": + tf.test.main() diff --git a/official/projects/yolo/ops/loss_utils.py b/official/projects/yolo/ops/loss_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..7e97ed4b11cdc4b2cd17d429d5a1523f5df42c06 --- /dev/null +++ b/official/projects/yolo/ops/loss_utils.py @@ -0,0 +1,632 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Yolo loss utility functions."""
+
+import numpy as np
+import tensorflow as tf
+
+from official.projects.yolo.ops import box_ops
+from official.projects.yolo.ops import math_ops
+
+
+@tf.custom_gradient
+def sigmoid_bce(y, x_prime, label_smoothing):
+  """Applies the Sigmoid Cross Entropy Loss.
+
+  Implements the same derivative as that found in the Darknet C library.
+  The derivative of this method is not the same as the standard binary cross
+  entropy with logits function.
+
+  The BCE with logits function equation is as follows:
+    x = 1 / (1 + exp(-x_prime))
+    bce = -y * log(x) - (1 - y) * log(1 - x)
+
+  The standard BCE with logits function derivative is as follows:
+    dloss = -y/x + (1-y)/(1-x)
+    dsigmoid = x * (1 - x)
+    dx = dloss * dsigmoid
+
+  This derivative can be reduced simply to:
+    dx = (-y + x)
+
+  This simplification is used by the darknet library in order to improve
+  training stability. The gradient is almost the same
+  as tf.keras.losses.binary_crossentropy but varies slightly and
+  yields different performance.
+
+  Args:
+    y: `Tensor` holding ground truth data.
+    x_prime: `Tensor` holding the predictions prior to application of the
+      sigmoid operation.
+    label_smoothing: float value between 0.0 and 1.0 indicating the amount of
+      smoothing to apply to the data.
+
+  Returns:
+    bce: Tensor of the applied loss values.
+    delta: callable function indicating the custom gradient for this operation.
+  """
+
+  eps = 1e-9
+  x = tf.math.sigmoid(x_prime)
+  y = tf.stop_gradient(y * (1 - label_smoothing) + 0.5 * label_smoothing)
+  bce = -y * tf.math.log(x + eps) - (1 - y) * tf.math.log(1 - x + eps)
+
+  def delta(dpass):
+    x = tf.math.sigmoid(x_prime)
+    dx = (-y + x) * dpass
+    dy = tf.zeros_like(y)
+    return dy, dx, 0.0
+
+  return bce, delta
+
+
+def apply_mask(mask, x, value=0):
+  """This function is used for gradient masking.
+
+  The YOLO loss function makes extensive use of dynamically shaped tensors.
+  To allow this use case on the TPU while preserving the gradient correctly
+  for back propagation, this function uses a tf.where operation to hard-set
+  masked-out locations (where mask is False) to `value` with zero gradient.
+
+  Args:
+    mask: A `Tensor` with the same shape as x used to select values of
+      importance.
+    x: A `Tensor` with the same shape as mask that will be getting masked.
+    value: `float` constant written to the masked-out locations.
+
+  Returns:
+    x: A masked `Tensor` with the same shape as x.
+  """
+  mask = tf.cast(mask, tf.bool)
+  masked = tf.where(mask, x, tf.zeros_like(x) + value)
+  return masked
+
+
+def build_grid(indexes, truths, preds, ind_mask, update=False, grid=None):
+  """This function is used to broadcast elements into the output shape.
+
+  This function is used to broadcast a list of truths into the correct index
+  in the output shape. This is used for the ground truth map construction in
+  the scaled loss and the classification map in the darknet loss.
+
+  Args:
+    indexes: A `Tensor` for the indexes.
+    truths: A `Tensor` for the ground truth.
+    preds: A `Tensor` for the predictions.
+    ind_mask: A `Tensor` for the index masks.
+    update: A `bool` for updating the grid.
+    grid: A `Tensor` for the grid.
+
+  Returns:
+    grid: A `Tensor` representing the augmented grid.
+  """
+  # this function is used to broadcast all the indexes into the correct
+  # ground truth mask, used for iou detection map
+  # in the scaled loss and the classification mask in the darknet loss
+  num_flatten = tf.shape(preds)[-1]
+
+  # is there a way to verify that we are not on the CPU?
+  ind_mask = tf.cast(ind_mask, indexes.dtype)
+
+  # find all the batch indexes using the cumulated sum of a ones tensor
+  # cumsum(ones) - 1 yields the zero-indexed batches
+  bhep = tf.reduce_max(tf.ones_like(indexes), axis=-1, keepdims=True)
+  bhep = tf.math.cumsum(bhep, axis=0) - 1
+
+  # concatenate the batch indexes to the indexes
+  indexes = tf.concat([bhep, indexes], axis=-1)
+  indexes = apply_mask(tf.cast(ind_mask, indexes.dtype), indexes)
+  indexes = (indexes + (ind_mask - 1))
+
+  # mask truths
+  truths = apply_mask(tf.cast(ind_mask, truths.dtype), truths)
+  truths = (truths + (tf.cast(ind_mask, truths.dtype) - 1))
+
+  # reshape the indexes into the correct shape for the loss,
+  # just flatten all indexes but the last
+  indexes = tf.reshape(indexes, [-1, 4])
+
+  # also flatten the ground truth value on all axes but the last
+  truths = tf.reshape(truths, [-1, num_flatten])
+
+  # build a zero grid in the same shape as the predictions
+  if grid is None:
+    grid = tf.zeros_like(preds)
+  # remove invalid values from the truths that may have
+  # come up from computation, invalid = nan and inf
+  truths = math_ops.rm_nan_inf(truths)
+
+  # scatter update the zero grid
+  if update:
+    grid = tf.tensor_scatter_nd_update(grid, indexes, truths)
+  else:
+    grid = tf.tensor_scatter_nd_max(grid, indexes, truths)
+
+  # stop gradient and return to avoid TPU errors and save compute
+  # resources
+  return grid
+
+
+class GridGenerator:
+  """Grid generator that generates anchor grids for box decoding."""
+
+  def __init__(self, anchors, scale_anchors=None):
+    """Initialize Grid Generator.
+
+    Args:
+      anchors: A `List[List[int]]` for the anchor boxes that are used in the
+        model at all levels.
+      scale_anchors: An `int` for how much to scale this level to get the
+        original input shape.
+    """
+    self.dtype = tf.keras.backend.floatx()
+    self._scale_anchors = scale_anchors
+    self._anchors = tf.convert_to_tensor(anchors)
+    return
+
+  def _build_grid_points(self, lheight, lwidth, anchors, dtype):
+    """Generate a grid of fixed grid edges for box center decoding."""
+    with tf.name_scope('center_grid'):
+      y = tf.range(0, lheight)
+      x = tf.range(0, lwidth)
+      x_left = tf.tile(
+          tf.transpose(tf.expand_dims(x, axis=-1), perm=[1, 0]), [lheight, 1])
+      y_left = tf.tile(tf.expand_dims(y, axis=-1), [1, lwidth])
+      x_y = tf.stack([x_left, y_left], axis=-1)
+      x_y = tf.cast(x_y, dtype=dtype)
+      num = tf.shape(anchors)[0]
+      x_y = tf.expand_dims(
+          tf.tile(tf.expand_dims(x_y, axis=-2), [1, 1, num, 1]), axis=0)
+    return x_y
+
+  def _build_anchor_grid(self, height, width, anchors, dtype):
+    """Get the transformed anchor boxes for each grid location."""
+    with tf.name_scope('anchor_grid'):
+      num = tf.shape(anchors)[0]
+      anchors = tf.cast(anchors, dtype=dtype)
+      anchors = tf.reshape(anchors, [1, 1, 1, num, 2])
+      anchors = tf.tile(anchors, [1, tf.cast(height, tf.int32),
+                                  tf.cast(width, tf.int32), 1, 1])
+    return anchors
+
+  def _extend_batch(self, grid, batch_size):
+    return tf.tile(grid, [batch_size, 1, 1, 1, 1])
+
+  def __call__(self, height, width, batch_size, dtype=None):
+    if dtype is None:
+      self.dtype = tf.keras.backend.floatx()
+    else:
+      self.dtype = dtype
+    grid_points = self._build_grid_points(height, width, self._anchors,
+                                          self.dtype)
+    anchor_grid = self._build_anchor_grid(
+        height, width,
+        tf.cast(self._anchors, self.dtype) /
+        tf.cast(self._scale_anchors, self.dtype), self.dtype)
+
+    grid_points = self._extend_batch(grid_points, batch_size)
+    anchor_grid = self._extend_batch(anchor_grid, batch_size)
+    return grid_points, anchor_grid
+
+
+TILE_SIZE = 50
+
+
+class
PairWiseSearch:
+  """Apply a pairwise search between the ground truth and the predictions.
+
+  The goal is to indicate the locations where the predictions overlap with
+  ground truth for dynamic ground truth associations.
+  """
+
+  def __init__(self,
+               iou_type='iou',
+               any_match=True,
+               min_conf=0.0,
+               track_boxes=False,
+               track_classes=False):
+    """Initialization of Pair Wise Search.
+
+    Args:
+      iou_type: An `str` for the iou type to use.
+      any_match: A `bool` for any match (no class matching).
+      min_conf: A `float` for the minimum confidence threshold.
+      track_boxes: A `bool` for dynamic box assignment.
+      track_classes: A `bool` for dynamic class assignment.
+    """
+    self.iou_type = iou_type
+    self._any = any_match
+    self._min_conf = min_conf
+    self._track_boxes = track_boxes
+    self._track_classes = track_classes
+    return
+
+  def box_iou(self, true_box, pred_box):
+    # based on the type of loss, compute the iou for a box
+    # self.iou_type indicates the type of iou to use
+    if self.iou_type == 'giou':
+      _, iou = box_ops.compute_giou(true_box, pred_box)
+    elif self.iou_type == 'ciou':
+      _, iou = box_ops.compute_ciou(true_box, pred_box)
+    else:
+      iou = box_ops.compute_iou(true_box, pred_box)
+    return iou
+
+  def _search_body(self, pred_box, pred_class, boxes, classes, running_boxes,
+                   running_classes, max_iou, idx):
+    """Main search fn."""
+
+    # capture the batch size to be used, and gather a slice of
+    # boxes from the ground truth.
Currently TILE_SIZE = 50, to
+    # save memory
+    batch_size = tf.shape(boxes)[0]
+    box_slice = tf.slice(boxes, [0, idx * TILE_SIZE, 0],
+                         [batch_size, TILE_SIZE, 4])
+
+    # match the dimensions of the slice to the model predictions
+    # shape: [batch_size, 1, 1, 1, TILE_SIZE, 4]
+    box_slice = tf.expand_dims(box_slice, axis=1)
+    box_slice = tf.expand_dims(box_slice, axis=1)
+    box_slice = tf.expand_dims(box_slice, axis=1)
+
+    box_grid = tf.expand_dims(pred_box, axis=-2)
+
+    # capture the classes
+    class_slice = tf.slice(classes, [0, idx * TILE_SIZE],
+                           [batch_size, TILE_SIZE])
+    class_slice = tf.expand_dims(class_slice, axis=1)
+    class_slice = tf.expand_dims(class_slice, axis=1)
+    class_slice = tf.expand_dims(class_slice, axis=1)
+
+    iou = self.box_iou(box_slice, box_grid)
+
+    if self._min_conf > 0.0:
+      if not self._any:
+        class_grid = tf.expand_dims(pred_class, axis=-2)
+        class_mask = tf.one_hot(
+            tf.cast(class_slice, tf.int32),
+            depth=tf.shape(pred_class)[-1],
+            dtype=pred_class.dtype)
+        class_mask = tf.reduce_any(tf.equal(class_mask, class_grid), axis=-1)
+      else:
+        class_mask = tf.reduce_max(pred_class, axis=-1, keepdims=True)
+      class_mask = tf.cast(class_mask, iou.dtype)
+      iou *= class_mask
+
+    max_iou_ = tf.concat([max_iou, iou], axis=-1)
+    max_iou = tf.reduce_max(max_iou_, axis=-1, keepdims=True)
+    ind = tf.expand_dims(tf.argmax(max_iou_, axis=-1), axis=-1)
+
+    if self._track_boxes:
+      running_boxes = tf.expand_dims(running_boxes, axis=-2)
+      box_slice = tf.zeros_like(running_boxes) + box_slice
+      box_slice = tf.concat([running_boxes, box_slice], axis=-2)
+      running_boxes = tf.gather_nd(box_slice, ind, batch_dims=4)
+
+    if self._track_classes:
+      running_classes = tf.expand_dims(running_classes, axis=-1)
+      class_slice = tf.zeros_like(running_classes) + class_slice
+      class_slice = tf.concat([running_classes, class_slice], axis=-1)
+      running_classes = tf.gather_nd(class_slice, ind, batch_dims=4)
+
+    return (pred_box, pred_class, boxes, classes, running_boxes,
+
running_classes, max_iou, idx + 1)
+
+  def __call__(self,
+               pred_boxes,
+               pred_classes,
+               boxes,
+               classes,
+               clip_thresh=0.0):
+    num_boxes = tf.shape(boxes)[-2]
+    num_tiles = (num_boxes // TILE_SIZE) - 1
+
+    if self._min_conf > 0.0:
+      pred_classes = tf.cast(pred_classes > self._min_conf, pred_classes.dtype)
+
+    def _loop_cond(unused_pred_box, unused_pred_class, boxes, unused_classes,
+                   unused_running_boxes, unused_running_classes, unused_max_iou,
+                   idx):
+
+      # continue only while the current slice has boxes that are not all zeros
+      batch_size = tf.shape(boxes)[0]
+      box_slice = tf.slice(boxes, [0, idx * TILE_SIZE, 0],
+                           [batch_size, TILE_SIZE, 4])
+
+      return tf.logical_and(idx < num_tiles,
+                            tf.math.greater(tf.reduce_sum(box_slice), 0))
+
+    running_boxes = tf.zeros_like(pred_boxes)
+    running_classes = tf.zeros_like(tf.reduce_sum(running_boxes, axis=-1))
+    max_iou = tf.zeros_like(tf.reduce_sum(running_boxes, axis=-1))
+    max_iou = tf.expand_dims(max_iou, axis=-1)
+
+    (pred_boxes, pred_classes, boxes, classes, running_boxes, running_classes,
+     max_iou, _) = tf.while_loop(_loop_cond, self._search_body, [
+         pred_boxes, pred_classes, boxes, classes, running_boxes,
+         running_classes, max_iou,
+         tf.constant(0)
+     ])
+
+    mask = tf.cast(max_iou > clip_thresh, running_boxes.dtype)
+    running_boxes *= mask
+    running_classes *= tf.squeeze(mask, axis=-1)
+    max_iou *= mask
+    max_iou = tf.squeeze(max_iou, axis=-1)
+    mask = tf.squeeze(mask, axis=-1)
+
+    return (tf.stop_gradient(running_boxes), tf.stop_gradient(running_classes),
+            tf.stop_gradient(max_iou), tf.stop_gradient(mask))
+
+
+def average_iou(iou):
+  """Computes the average intersection over union, ignoring zero locations.
+
+  Locations where the iou is zero are not counted in the average.
+
+  Args:
+    iou: A `Tensor` representing the iou values.
+
+  Returns:
+    tf.stop_gradient(avg_iou): A `Tensor` representing average
+      intersection over union.
+  """
+  iou_sum = tf.reduce_sum(iou, axis=tf.range(1, tf.shape(tf.shape(iou))[0]))
+  counts = tf.cast(
+      tf.math.count_nonzero(iou, axis=tf.range(1,
+                                               tf.shape(tf.shape(iou))[0])),
+      iou.dtype)
+  avg_iou = tf.reduce_mean(math_ops.divide_no_nan(iou_sum, counts))
+  return tf.stop_gradient(avg_iou)
+
+
+def _scale_boxes(encoded_boxes, width, height, anchor_grid, grid_points,
+                 scale_xy):
+  """Decodes model boxes, applying an exponential to width and height maps."""
+  # split the boxes
+  pred_xy = encoded_boxes[..., 0:2]
+  pred_wh = encoded_boxes[..., 2:4]
+
+  # build a scaling tensor to get the offset of the box relative to the image
+  scaler = tf.convert_to_tensor([height, width, height, width])
+  scale_xy = tf.cast(scale_xy, encoded_boxes.dtype)
+
+  # apply the sigmoid
+  pred_xy = tf.math.sigmoid(pred_xy)
+
+  # scale the centers and find the offset of each box relative to
+  # their center pixel
+  pred_xy = pred_xy * scale_xy - 0.5 * (scale_xy - 1)
+
+  # add the offsets to the grid points, a tensor that holds
+  # the relative location of each pixel
+  box_xy = grid_points + pred_xy
+
+  # scale the width and height of the predictions and correlate them
+  # to anchor boxes
+  box_wh = tf.math.exp(pred_wh) * anchor_grid
+
+  # build the final predicted box
+  scaled_box = tf.concat([box_xy, box_wh], axis=-1)
+  pred_box = scaled_box / scaler
+
+  # shift scaled boxes
+  scaled_box = tf.concat([pred_xy, box_wh], axis=-1)
+  return (scaler, scaled_box, pred_box)
+
+
+@tf.custom_gradient
+def _darknet_boxes(encoded_boxes, width, height, anchor_grid, grid_points,
+                   max_delta, scale_xy):
+  """Wrapper for _scale_boxes to implement a custom gradient."""
+  (scaler, scaled_box, pred_box) = _scale_boxes(encoded_boxes, width, height,
+                                                anchor_grid, grid_points,
+                                                scale_xy)
+
+  def delta(unused_dy_scaler, dy_scaled, dy):
+    dy_xy, dy_wh = tf.split(dy, 2, axis=-1)
+    dy_xy_, dy_wh_ = tf.split(dy_scaled, 2, axis=-1)
+
+    # add all the gradients that may have been applied to the
+    #
boxes and those that have been applied to the width and height
+    dy_wh += dy_wh_
+    dy_xy += dy_xy_
+
+    # propagate the exponential applied to the width and height in
+    # order to ensure the gradient propagated is of the correct
+    # magnitude
+    pred_wh = encoded_boxes[..., 2:4]
+    dy_wh *= tf.math.exp(pred_wh)
+
+    dbox = tf.concat([dy_xy, dy_wh], axis=-1)
+
+    # apply the gradient clipping to xy and wh
+    dbox = math_ops.rm_nan_inf(dbox)
+    delta = tf.cast(max_delta, dbox.dtype)
+    dbox = tf.clip_by_value(dbox, -delta, delta)
+    return dbox, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
+
+  return (scaler, scaled_box, pred_box), delta
+
+
+def _new_coord_scale_boxes(encoded_boxes, width, height, anchor_grid,
+                           grid_points, scale_xy):
+  """Decodes model boxes by squaring and scaling the width and height maps."""
+  # split the boxes
+  pred_xy = encoded_boxes[..., 0:2]
+  pred_wh = encoded_boxes[..., 2:4]
+
+  # build a scaling tensor to get the offset of the box relative to the image
+  scaler = tf.convert_to_tensor([height, width, height, width])
+  scale_xy = tf.cast(scale_xy, pred_xy.dtype)
+
+  # apply the sigmoid
+  pred_xy = tf.math.sigmoid(pred_xy)
+  pred_wh = tf.math.sigmoid(pred_wh)
+
+  # scale the xy offset predictions according to the config
+  pred_xy = pred_xy * scale_xy - 0.5 * (scale_xy - 1)
+
+  # find the true offset from the grid points and the scaler
+  # where the grid points are the relative offset of each pixel
+  # within the image
+  box_xy = grid_points + pred_xy
+
+  # decode the width and height of the boxes and correlate them
+  # to the anchor boxes
+  box_wh = (2 * pred_wh)**2 * anchor_grid
+
+  # build the final boxes
+  scaled_box = tf.concat([box_xy, box_wh], axis=-1)
+  pred_box = scaled_box / scaler
+
+  # shift scaled boxes
+  scaled_box = tf.concat([pred_xy, box_wh], axis=-1)
+  return (scaler, scaled_box, pred_box)
+
+
+@tf.custom_gradient
+def _darknet_new_coord_boxes(encoded_boxes, width, height, anchor_grid,
+                             grid_points, max_delta, scale_xy):
+  """Wrapper for
_new_coord_scale_boxes to implement a custom gradient."""
+  (scaler, scaled_box,
+   pred_box) = _new_coord_scale_boxes(encoded_boxes, width, height, anchor_grid,
+                                      grid_points, scale_xy)
+
+  def delta(unused_dy_scaler, dy_scaled, dy):
+    dy_xy, dy_wh = tf.split(dy, 2, axis=-1)
+    dy_xy_, dy_wh_ = tf.split(dy_scaled, 2, axis=-1)
+
+    # add all the gradients that may have been applied to the
+    # boxes and those that have been applied to the width and height
+    dy_wh += dy_wh_
+    dy_xy += dy_xy_
+
+    dbox = tf.concat([dy_xy, dy_wh], axis=-1)
+
+    # apply the gradient clipping to xy and wh
+    dbox = math_ops.rm_nan_inf(dbox)
+    delta = tf.cast(max_delta, dbox.dtype)
+    dbox = tf.clip_by_value(dbox, -delta, delta)
+    return dbox, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
+
+  return (scaler, scaled_box, pred_box), delta
+
+
+def _anchor_free_scale_boxes(encoded_boxes,
+                             width,
+                             height,
+                             stride,
+                             grid_points,
+                             darknet=False):
+  """Decode model boxes using FPN stride under anchor free conditions."""
+  del darknet
+  # split the boxes
+  pred_xy = encoded_boxes[..., 0:2]
+  pred_wh = encoded_boxes[..., 2:4]
+
+  # build a scaling tensor to get the offset of the box relative to the image
+  scaler = tf.convert_to_tensor([height, width, height, width])
+
+  # add the offsets to the grid points, a tensor that holds
+  # the relative location of each pixel
+  box_xy = (grid_points + pred_xy)
+
+  # exponentiate the width and height predictions (no anchor
+  # boxes are used in the anchor free case)
+  box_wh = tf.math.exp(pred_wh)
+
+  # build the final predicted box
+  scaled_box = tf.concat([box_xy, box_wh], axis=-1)
+
+  # scale the boxes by the stride so their gradients are properly scaled
+  scaled_box = scaled_box * tf.cast(stride, scaled_box.dtype)
+  pred_box = scaled_box / tf.cast(scaler * stride, scaled_box.dtype)
+  return (scaler, scaled_box, pred_box)
+
+
+def get_predicted_box(width,
+                      height,
+                      encoded_boxes,
+                      anchor_grid,
+                      grid_points,
+                      scale_xy,
+                      stride,
+                      darknet=False,
+                      box_type='original',
+                      max_delta=np.inf):
+  """Decodes the
predicted boxes from the model format to a usable format.
+
+  This function decodes the model outputs into the [x, y, w, h] format for
+  use in the loss function as well as for use within the detection generator.
+
+  Args:
+    width: A `float` scalar indicating the width of the prediction layer.
+    height: A `float` scalar indicating the height of the prediction layer.
+    encoded_boxes: A `Tensor` of shape [..., height, width, 4] holding encoded
+      boxes.
+    anchor_grid: A `Tensor` of shape [..., 1, 1, 2] holding the anchor boxes
+      organized for box decoding, box width and height.
+    grid_points: A `Tensor` of shape [..., height, width, 2] holding the grid
+      offsets used for decoding the box centers.
+    scale_xy: A `float` scalar used to indicate the range for each center
+      outside of its given [..., i, j, 4] index, where i and j are indexing
+      pixels along the height and width of the predicted output map.
+    stride: An `int` defining the downsampling stride relative to the input
+      image.
+    darknet: A `bool` used to select between custom gradient and default
+      autograd.
+    box_type: An `str` box encoding, one of 'original', 'scaled', 'anchor_free'.
+    max_delta: A `float` scalar used for gradient clipping in back propagation.
+
+  Returns:
+    scaler: A `Tensor` of shape [4] returned to allow the scaling of the ground
+      truth boxes to be of the same magnitude as the decoded predicted boxes.
+    scaled_box: A `Tensor` of shape [..., height, width, 4] with the predicted
+      boxes.
+    pred_box: A `Tensor` of shape [..., height, width, 4] with the predicted
+      boxes divided by the scaler parameter used to put all boxes in the [0, 1]
+      range.
+  """
+  if box_type == 'anchor_free':
+    (scaler, scaled_box, pred_box) = _anchor_free_scale_boxes(
+        encoded_boxes, width, height, stride, grid_points, darknet=darknet)
+  elif darknet:
+
+    # pylint:disable=unbalanced-tuple-unpacking
+    # if we are using the darknet loss we use a custom gradient instead of
+    # propagating through the decoding of the box
+    if box_type == 'scaled':
+      (scaler, scaled_box,
+       pred_box) = _darknet_new_coord_boxes(encoded_boxes, width, height,
+                                            anchor_grid, grid_points, max_delta,
+                                            scale_xy)
+    else:
+      (scaler, scaled_box,
+       pred_box) = _darknet_boxes(encoded_boxes, width, height, anchor_grid,
+                                  grid_points, max_delta, scale_xy)
+  else:
+    # if we are using the scaled loss we should propagate the decoding of
+    # the boxes
+    if box_type == 'scaled':
+      (scaler, scaled_box,
+       pred_box) = _new_coord_scale_boxes(encoded_boxes, width, height,
+                                          anchor_grid, grid_points, scale_xy)
+    else:
+      (scaler, scaled_box, pred_box) = _scale_boxes(encoded_boxes, width,
+                                                    height, anchor_grid,
+                                                    grid_points, scale_xy)
+
+  return (scaler, scaled_box, pred_box)
diff --git a/official/vision/beta/projects/yolo/ops/math_ops.py b/official/projects/yolo/ops/math_ops.py
similarity index 96%
rename from official/vision/beta/projects/yolo/ops/math_ops.py
rename to official/projects/yolo/ops/math_ops.py
index 8350acf2c4f2d96e0ba923e926e3a1bfb9423168..7a42288c15cef4989f98011364a5605ccaee61ce 100644
--- a/official/vision/beta/projects/yolo/ops/math_ops.py
+++ b/official/projects/yolo/ops/math_ops.py
@@ -1,4 +1,4 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
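`_scale_boxes` in `loss_utils.py` decodes each prediction in three steps. It applies a sigmoid to the center offsets, stretches them by `scale_xy`, and multiplies the exponential of the width and height by the anchor. The NumPy sketch below illustrates these steps; it is not part of this change. Several details are assumptions made here:

- `decode_boxes` is a hypothetical helper.
- The `[height, width, num_anchors, 4]` layout is assumed.
- The anchors are taken to be already divided by the level stride.
- x and w are normalized by the map width, and y and h by its height.

The sketch also leaves out the custom-gradient wrappers.

```python
import numpy as np


def sigmoid(x):
  """Logistic function."""
  return 1.0 / (1.0 + np.exp(-x))


def decode_boxes(encoded, anchors, scale_xy=1.0):
  """Hypothetical helper: decodes raw [tx, ty, tw, th] outputs.

  Args:
    encoded: float array [height, width, num_anchors, 4] of raw predictions.
    anchors: float array [num_anchors, 2] of anchor (w, h) in grid units,
      i.e. pixel anchors already divided by the level stride (assumption).
    scale_xy: lets centers land up to 0.5 * (scale_xy - 1) cells outside
      their own cell.

  Returns:
    Array [height, width, num_anchors, 4] of [x, y, w, h] in [0, 1] units.
  """
  height, width = encoded.shape[:2]
  # grid[i, j] == [j, i]: the (x, y) corner of each cell, broadcast over anchors.
  xs, ys = np.meshgrid(np.arange(width), np.arange(height))
  grid = np.stack([xs, ys], axis=-1)[:, :, None, :].astype(encoded.dtype)

  # Centers: sigmoid offset, stretched by scale_xy and re-centered on the cell.
  pred_xy = sigmoid(encoded[..., 0:2]) * scale_xy - 0.5 * (scale_xy - 1)
  box_xy = grid + pred_xy
  # Sizes: exponential of the raw output, multiplied by the anchor size.
  box_wh = np.exp(encoded[..., 2:4]) * anchors

  scaled = np.concatenate([box_xy, box_wh], axis=-1)
  # Assumption: x and w are normalized by width, y and h by height.
  scaler = np.array([width, height, width, height], dtype=encoded.dtype)
  return scaled / scaler


# All-zero logits: each box sits at its cell center with the anchor's size.
boxes = decode_boxes(np.zeros((4, 4, 1, 4)), np.array([[2.0, 1.0]]))
print(boxes[0, 0, 0])  # values: 0.125, 0.125, 0.5, 0.25
```

With all-zero logits, the center offset is 0.5 for any `scale_xy`. Every box therefore lands at its cell center with exactly the anchor size, which makes a quick sanity check of the grid layout.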
diff --git a/official/vision/beta/projects/yolo/ops/mosaic.py b/official/projects/yolo/ops/mosaic.py old mode 100755 new mode 100644 similarity index 98% rename from official/vision/beta/projects/yolo/ops/mosaic.py rename to official/projects/yolo/ops/mosaic.py index cf386cd610b100304c4cbdc2d595eb601c83355f..963244bd973d9af14f3c836feefc449d0521c7e2 --- a/official/vision/beta/projects/yolo/ops/mosaic.py +++ b/official/projects/yolo/ops/mosaic.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -14,12 +14,13 @@ """Mosaic op.""" import random + import tensorflow as tf import tensorflow_addons as tfa -from official.vision.beta.ops import box_ops -from official.vision.beta.ops import preprocess_ops -from official.vision.beta.projects.yolo.ops import preprocessing_ops +from official.projects.yolo.ops import preprocessing_ops +from official.vision.ops import box_ops +from official.vision.ops import preprocess_ops class Mosaic: diff --git a/official/vision/beta/projects/yolo/ops/preprocessing_ops.py b/official/projects/yolo/ops/preprocessing_ops.py old mode 100755 new mode 100644 similarity index 99% rename from official/vision/beta/projects/yolo/ops/preprocessing_ops.py rename to official/projects/yolo/ops/preprocessing_ops.py index 981fbcac97dd607c9c667d219282b75b286eacba..93c8b1569228fe630f0cdebd90fc89eae8d6355e --- a/official/vision/beta/projects/yolo/ops/preprocessing_ops.py +++ b/official/projects/yolo/ops/preprocessing_ops.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -19,7 +19,7 @@ import numpy as np import tensorflow as tf import tensorflow_addons as tfa -from official.vision.beta.ops import box_ops as bbox_ops +from official.vision.ops import box_ops as bbox_ops PAD_VALUE = 114 GLOBAL_SEED_SET = False diff --git a/official/vision/beta/projects/yolo/ops/preprocessing_ops_test.py b/official/projects/yolo/ops/preprocessing_ops_test.py old mode 100755 new mode 100644 similarity index 96% rename from official/vision/beta/projects/yolo/ops/preprocessing_ops_test.py rename to official/projects/yolo/ops/preprocessing_ops_test.py index 43cca574b7f1f1630c117445dc82b7d04f19e7a2..8c3fdb011accb8726caaeb1103256c4bf4426d08 --- a/official/vision/beta/projects/yolo/ops/preprocessing_ops_test.py +++ b/official/projects/yolo/ops/preprocessing_ops_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,8 +17,8 @@ from absl.testing import parameterized import numpy as np import tensorflow as tf -from official.vision.beta.ops import box_ops as bbox_ops -from official.vision.beta.projects.yolo.ops import preprocessing_ops +from official.projects.yolo.ops import preprocessing_ops +from official.vision.ops import box_ops as bbox_ops class InputUtilsTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/projects/yolo/optimization/__init__.py b/official/projects/yolo/optimization/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..c98ae619a644edb22bab5cd6118e95034f90e70c --- /dev/null +++ b/official/projects/yolo/optimization/__init__.py @@ -0,0 +1,22 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Optimization package definition.""" + +# pylint: disable=wildcard-import +from official.modeling.optimization.configs.learning_rate_config import * +from official.modeling.optimization.ema_optimizer import ExponentialMovingAverage +from official.projects.yolo.optimization.configs.optimization_config import * +from official.projects.yolo.optimization.configs.optimizer_config import * +from official.projects.yolo.optimization.optimizer_factory import OptimizerFactory as YoloOptimizerFactory diff --git a/official/projects/yolo/optimization/configs/__init__.py b/official/projects/yolo/optimization/configs/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/yolo/optimization/configs/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ diff --git a/official/vision/beta/projects/yolo/optimization/configs/optimization_config.py b/official/projects/yolo/optimization/configs/optimization_config.py old mode 100755 new mode 100644 similarity index 92% rename from official/vision/beta/projects/yolo/optimization/configs/optimization_config.py rename to official/projects/yolo/optimization/configs/optimization_config.py index 92b8d1a79b1021dfed990ddefeb31120066cf22a..8ebdbcfd487d481639909aae75c065eeff1905f6 --- a/official/vision/beta/projects/yolo/optimization/configs/optimization_config.py +++ b/official/projects/yolo/optimization/configs/optimization_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -22,7 +22,7 @@ import dataclasses from typing import Optional from official.modeling.optimization.configs import optimization_config as optimization_cfg -from official.vision.beta.projects.yolo.optimization.configs import optimizer_config as opt_cfg +from official.projects.yolo.optimization.configs import optimizer_config as opt_cfg @dataclasses.dataclass diff --git a/official/vision/beta/projects/yolo/optimization/configs/optimizer_config.py b/official/projects/yolo/optimization/configs/optimizer_config.py old mode 100755 new mode 100644 similarity index 97% rename from official/vision/beta/projects/yolo/optimization/configs/optimizer_config.py rename to official/projects/yolo/optimization/configs/optimizer_config.py index c1124ee44430ead06eee1a95459a9f2c3ddd98b3..46c9609649cc8fde3fafb646d3863018420c984a --- a/official/vision/beta/projects/yolo/optimization/configs/optimizer_config.py +++ b/official/projects/yolo/optimization/configs/optimizer_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
diff --git a/official/projects/yolo/optimization/optimizer_factory.py b/official/projects/yolo/optimization/optimizer_factory.py
new file mode 100644
index 0000000000000000000000000000000000000000..4fe3c330cbba841af0bed4a73c3421e2c0e67083
--- /dev/null
+++ b/official/projects/yolo/optimization/optimizer_factory.py
@@ -0,0 +1,99 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Optimizer factory class."""
+
+import gin
+
+from official.modeling.optimization import ema_optimizer
+from official.modeling.optimization import optimizer_factory
+from official.projects.yolo.optimization import sgd_torch
+
+optimizer_factory.OPTIMIZERS_CLS.update({
+    'sgd_torch': sgd_torch.SGDTorch,
+})
+
+OPTIMIZERS_CLS = optimizer_factory.OPTIMIZERS_CLS
+LR_CLS = optimizer_factory.LR_CLS
+WARMUP_CLS = optimizer_factory.WARMUP_CLS
+
+
+class OptimizerFactory(optimizer_factory.OptimizerFactory):
+  """Optimizer factory class.
+
+  This class builds the learning rate and optimizer based on an optimization
+  config. To use this class, you need to do the following:
+  (1) Define an optimization config, which includes the optimizer and the
+      learning rate schedule.
+  (2) Initialize the class using the optimization config.
+  (3) Build learning rate.
+ (4) Build optimizer. + + This is a typical example of using this class: + params = { + 'optimizer': { + 'type': 'sgd', + 'sgd': {'momentum': 0.9} + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': {'boundaries': [10000, 20000], + 'values': [0.1, 0.01, 0.001]} + }, + 'warmup': { + 'type': 'linear', + 'linear': {'warmup_steps': 500, 'warmup_learning_rate': 0.01} + } + } + opt_config = OptimizationConfig(params) + opt_factory = OptimizerFactory(opt_config) + lr = opt_factory.build_learning_rate() + optimizer = opt_factory.build_optimizer(lr) + """ + + def get_bias_lr_schedule(self, bias_lr): + """Build the bias learning rate schedule. + + Builds learning rate from config. Learning rate schedule is built according + to the learning rate config. If learning rate type is constant, + lr_config.learning_rate is used as the base learning rate. + + Args: + bias_lr: learning rate to warm up from (used as `warmup_learning_rate`). + + Returns: + tf.keras.optimizers.schedules.LearningRateSchedule instance, or + lr_config.learning_rate if the type is constant and no warmup is set.
+ """ + if self._lr_type == 'constant': + lr = self._lr_config.learning_rate + else: + lr = LR_CLS[self._lr_type](**self._lr_config.as_dict()) + + if self._warmup_config: + if self._warmup_type != 'linear': + raise ValueError('Smart Bias is currently only supported with a ' + 'linear warmup.') + warm_up_cfg = self._warmup_config.as_dict() + warm_up_cfg['warmup_learning_rate'] = bias_lr + lr = WARMUP_CLS['linear'](lr, **warm_up_cfg) + return lr + + @gin.configurable + def add_ema(self, optimizer): + """Add EMA to the optimizer independently of the build optimizer method.""" + if self._use_ema: + optimizer = ema_optimizer.ExponentialMovingAverage( + optimizer, **self._ema_config.as_dict()) + return optimizer diff --git a/official/vision/beta/projects/yolo/optimization/sgd_torch.py b/official/projects/yolo/optimization/sgd_torch.py similarity index 98% rename from official/vision/beta/projects/yolo/optimization/sgd_torch.py rename to official/projects/yolo/optimization/sgd_torch.py index 289dc7a6d011a3944af1d8f6cf4a3feb483e8644..d537a08af5d0bbf3a562f0c630ad8c61d5eb103e 100644 --- a/official/vision/beta/projects/yolo/optimization/sgd_torch.py +++ b/official/projects/yolo/optimization/sgd_torch.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -43,7 +43,7 @@ def _var_key(var): return var._unique_id -class SGDTorch(tf.keras.optimizers.Optimizer): +class SGDTorch(tf.keras.optimizers.legacy.Optimizer): """Optimizer that simulates the SGD module used in pytorch.
diff --git a/official/projects/yolo/serving/__init__.py b/official/projects/yolo/serving/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/yolo/serving/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/yolo/serving/export_module_factory.py b/official/projects/yolo/serving/export_module_factory.py new file mode 100644 index 0000000000000000000000000000000000000000..e1488687f469ce3ea36caccead8bcbca4e2a2430 --- /dev/null +++ b/official/projects/yolo/serving/export_module_factory.py @@ -0,0 +1,245 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Factory for YOLO export modules.""" + +from typing import Any, Callable, Dict, List, Optional, Text, Union + +import tensorflow as tf + +from official.core import config_definitions as cfg +from official.core import export_base +from official.projects.yolo.configs.yolo import YoloTask +from official.projects.yolo.modeling import factory as yolo_factory +from official.projects.yolo.modeling.backbones import darknet # pylint: disable=unused-import +from official.projects.yolo.modeling.decoders import yolo_decoder # pylint: disable=unused-import +from official.projects.yolo.serving import model_fn as yolo_model_fn +from official.vision import configs +from official.vision.dataloaders import classification_input +from official.vision.modeling import factory +from official.vision.serving import export_utils + + +class ExportModule(export_base.ExportModule): + """Base Export Module.""" + + def __init__(self, + params: cfg.ExperimentConfig, + model: tf.keras.Model, + input_signature: Union[tf.TensorSpec, Dict[str, tf.TensorSpec]], + preprocessor: Optional[Callable[..., Any]] = None, + inference_step: Optional[Callable[..., Any]] = None, + postprocessor: Optional[Callable[..., Any]] = None, + eval_postprocessor: Optional[Callable[..., Any]] = None): + """Initializes a module for export. + + Args: + params: A dataclass for parameters to the module. + model: A tf.keras.Model instance to be exported. + input_signature: tf.TensorSpec, e.g. + tf.TensorSpec(shape=[None, 224, 224, 3], dtype=tf.uint8) + preprocessor: An optional callable to preprocess the inputs. + inference_step: An optional callable to forward-pass the model. + postprocessor: An optional callable to postprocess the model outputs. + eval_postprocessor: An optional callable to postprocess model outputs + used for model evaluation. 
+ """ + super().__init__( + params, + model=model, + preprocessor=preprocessor, + inference_step=inference_step, + postprocessor=postprocessor) + self.eval_postprocessor = eval_postprocessor + self.input_signature = input_signature + + @tf.function + def serve(self, inputs: Any) -> Any: + x = self.preprocessor(inputs=inputs) if self.preprocessor else inputs + x = self.inference_step(x) + x = self.postprocessor(x) if self.postprocessor else x + return x + + @tf.function + def serve_eval(self, inputs: Any) -> Any: + x = self.preprocessor(inputs=inputs) if self.preprocessor else inputs + x = self.inference_step(x) + x = self.eval_postprocessor(x) if self.eval_postprocessor else x + return x + + def get_inference_signatures( + self, function_keys: Dict[Text, Text]): + """Gets defined function signatures. + + Args: + function_keys: A dictionary whose keys name the functions to create + signatures for and whose values are the signature keys to return them under. + + Returns: + A dictionary with key as signature key and value as concrete functions + that can be used for tf.saved_model.save.
+ """ + signatures = {} + for _, def_name in function_keys.items(): + if 'eval' in def_name and self.eval_postprocessor: + signatures[def_name] = self.serve_eval.get_concrete_function( + self.input_signature) + else: + signatures[def_name] = self.serve.get_concrete_function( + self.input_signature) + return signatures + + +def create_classification_export_module( + params: cfg.ExperimentConfig, + input_type: str, + batch_size: int, + input_image_size: List[int], + num_channels: int = 3) -> ExportModule: + """Creates classification export module.""" + input_signature = export_utils.get_image_input_signatures( + input_type, batch_size, input_image_size, num_channels) + input_specs = tf.keras.layers.InputSpec(shape=[batch_size] + + input_image_size + [num_channels]) + + model = factory.build_classification_model( + input_specs=input_specs, + model_config=params.task.model, + l2_regularizer=None) + + def preprocess_fn(inputs): + image_tensor = export_utils.parse_image(inputs, input_type, + input_image_size, num_channels) + # If input_type is `tflite`, do not apply image preprocessing. 
+ if input_type == 'tflite': + return image_tensor + + def preprocess_image_fn(inputs): + return classification_input.Parser.inference_fn(inputs, input_image_size, + num_channels) + + images = tf.map_fn( + preprocess_image_fn, + elems=image_tensor, + fn_output_signature=tf.TensorSpec( + shape=input_image_size + [num_channels], dtype=tf.float32)) + + return images + + def postprocess_fn(logits): + probs = tf.nn.softmax(logits) + return {'logits': logits, 'probs': probs} + + export_module = ExportModule( + params, + model=model, + input_signature=input_signature, + preprocessor=preprocess_fn, + postprocessor=postprocess_fn) + return export_module + + +def create_yolo_export_module( + params: cfg.ExperimentConfig, + input_type: str, + batch_size: int, + input_image_size: List[int], + num_channels: int = 3) -> ExportModule: + """Creates YOLO export module.""" + input_signature = export_utils.get_image_input_signatures( + input_type, batch_size, input_image_size, num_channels) + input_specs = tf.keras.layers.InputSpec(shape=[batch_size] + + input_image_size + [num_channels]) + model, _ = yolo_factory.build_yolo( + input_specs=input_specs, + model_config=params.task.model, + l2_regularization=None) + + def preprocess_fn(inputs): + image_tensor = export_utils.parse_image(inputs, input_type, + input_image_size, num_channels) + # If input_type is `tflite`, do not apply image preprocessing. + if input_type == 'tflite': + return image_tensor + + def preprocess_image_fn(inputs): + image = tf.cast(inputs, dtype=tf.float32) + image = image / 255. 
+ (image, image_info) = yolo_model_fn.letterbox( + image, + input_image_size, + letter_box=params.task.validation_data.parser.letter_box) + return image, image_info + + images_spec = tf.TensorSpec(shape=input_image_size + [num_channels], dtype=tf.float32) + + image_info_spec = tf.TensorSpec(shape=[4, 2], dtype=tf.float32) + + images, image_info = tf.nest.map_structure( + tf.identity, + tf.map_fn( + preprocess_image_fn, + elems=image_tensor, + fn_output_signature=(images_spec, image_info_spec), + parallel_iterations=32)) + + return images, image_info + + def inference_steps(inputs, model): + images, image_info = inputs + detection = model(images, training=False) + detection['bbox'] = yolo_model_fn.undo_info( + detection['bbox'], + detection['num_detections'], + image_info, + expand=False) + + final_outputs = { + 'detection_boxes': detection['bbox'], + 'detection_scores': detection['confidence'], + 'detection_classes': detection['classes'], + 'num_detections': detection['num_detections'] + } + + return final_outputs + + export_module = ExportModule( + params, + model=model, + input_signature=input_signature, + preprocessor=preprocess_fn, + inference_step=inference_steps) + + return export_module + + +def get_export_module(params: cfg.ExperimentConfig, + input_type: str, + batch_size: Optional[int], + input_image_size: List[int], + num_channels: int = 3) -> ExportModule: + """Factory for export modules.""" + if isinstance(params.task, + configs.image_classification.ImageClassificationTask): + export_module = create_classification_export_module(params, input_type, + batch_size, + input_image_size, + num_channels) + elif isinstance(params.task, YoloTask): + export_module = create_yolo_export_module(params, input_type, batch_size, + input_image_size, num_channels) + else: + raise ValueError('Export module not implemented for {} task.'.format( + type(params.task))) + return export_module diff --git a/official/projects/yolo/serving/export_saved_model.py
b/official/projects/yolo/serving/export_saved_model.py new file mode 100644 index 0000000000000000000000000000000000000000..5f6eee932ed4d26e433a2feb0eba8e88f0f2ebb8 --- /dev/null +++ b/official/projects/yolo/serving/export_saved_model.py @@ -0,0 +1,107 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""YOLO model export binary for serving/inference. + +To export a trained checkpoint in saved_model format (shell script): + +CHECKPOINT_PATH=XX +EXPORT_DIR_PATH=XX +CONFIG_FILE_PATH=XX +export_saved_model --export_dir=${EXPORT_DIR_PATH}/ \ + --checkpoint_path=${CHECKPOINT_PATH} \ + --config_file=${CONFIG_FILE_PATH} \ + --batch_size=2 \ + --input_image_size=224,224 +To serve (python): +export_dir_path = XX +input_type = XX +input_images = XX +imported = tf.saved_model.load(export_dir_path) +model_fn = imported.signatures['serving_default'] +output = model_fn(input_images) +""" + +from absl import app +from absl import flags + +from official.core import exp_factory +from official.modeling import hyperparams +from official.projects.yolo.configs import yolo as cfg # pylint: disable=unused-import +from official.projects.yolo.serving import export_module_factory +from official.projects.yolo.tasks import yolo as task # pylint: disable=unused-import +from official.vision.serving import export_saved_model_lib + +FLAGS = flags.FLAGS + +flags.DEFINE_string('experiment', 'scaled_yolo', + 'experiment type, e.g.
scaled_yolo') +flags.DEFINE_string('export_dir', None, 'The export directory.') +flags.DEFINE_string('checkpoint_path', None, 'Checkpoint path.') +flags.DEFINE_multi_string( + 'config_file', + default=None, + help='YAML/JSON files that specify overrides. The override order ' + 'follows the order of args. Note that each file ' + 'can be used as an override template to override the default parameters ' + 'specified in Python. If the same parameter is specified in both ' + '`--config_file` and `--params_override`, `config_file` will be used ' + 'first, followed by params_override.') +flags.DEFINE_string( + 'params_override', '', + 'The JSON/YAML file or string that specifies the parameters to be overridden' + ' on top of the `config_file` template.') +flags.DEFINE_integer('batch_size', 1, 'The batch size.') +flags.DEFINE_string('input_type', 'image_tensor', + 'One of `image_tensor`, `image_bytes`, `tf_example`.') +flags.DEFINE_string( + 'input_image_size', '224,224', + 'The comma-separated string of two integers representing the height,width ' + 'of the input to the model.') + + +def main(_): + + params = exp_factory.get_exp_config(FLAGS.experiment) + for config_file in FLAGS.config_file or []: + params = hyperparams.override_params_dict( + params, config_file, is_strict=True) + if FLAGS.params_override: + params = hyperparams.override_params_dict( + params, FLAGS.params_override, is_strict=True) + + params.validate() + params.lock() + + input_image_size = [int(x) for x in FLAGS.input_image_size.split(',')] + + export_module = export_module_factory.get_export_module( + params=params, + input_type=FLAGS.input_type, + batch_size=FLAGS.batch_size, + input_image_size=input_image_size, + num_channels=3) + + export_saved_model_lib.export_inference_graph( + input_type=FLAGS.input_type, + batch_size=FLAGS.batch_size, + input_image_size=input_image_size, + params=params, + checkpoint_path=FLAGS.checkpoint_path, + export_dir=FLAGS.export_dir, +
export_module=export_module) + + +if __name__ == '__main__': + app.run(main) diff --git a/official/projects/yolo/serving/model_fn.py b/official/projects/yolo/serving/model_fn.py new file mode 100644 index 0000000000000000000000000000000000000000..e65f95bfc08c00d4baa3fc37ed73e2f1e77b986e --- /dev/null +++ b/official/projects/yolo/serving/model_fn.py @@ -0,0 +1,82 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""YOLO input and model functions for serving/inference.""" + +from typing import List, Tuple + +import tensorflow as tf + +from official.projects.yolo.ops import preprocessing_ops +from official.vision.ops import box_ops + + +def letterbox(image: tf.Tensor, + desired_size: List[int], + letter_box: bool = True) -> Tuple[tf.Tensor, tf.Tensor]: + """Letter box an image for image serving.""" + + with tf.name_scope('letter_box'): + image_size = tf.cast(preprocessing_ops.get_image_shape(image), tf.float32) + + scaled_size = tf.cast(desired_size, image_size.dtype) + if letter_box: + scale = tf.minimum(scaled_size[0] / image_size[0], + scaled_size[1] / image_size[1]) + scaled_size = tf.round(image_size * scale) + else: + scale = 1.0 + + # Computes 2D image_scale. 
+ image_scale = scaled_size / image_size + image_offset = tf.cast((desired_size - scaled_size) * 0.5, tf.int32) + offset = (scaled_size - desired_size) * 0.5 + scaled_image = tf.image.resize( + image, tf.cast(scaled_size, tf.int32), method='nearest') + + output_image = tf.image.pad_to_bounding_box(scaled_image, image_offset[0], + image_offset[1], + desired_size[0], + desired_size[1]) + + image_info = tf.stack([ + image_size, + tf.cast(desired_size, dtype=tf.float32), image_scale, + tf.cast(offset, tf.float32) + ]) + return output_image, image_info + + +def undo_info(boxes: tf.Tensor, + num_detections: int, + info: tf.Tensor, + expand: bool = True) -> tf.Tensor: + """Clip and normalize boxes for serving.""" + + mask = tf.sequence_mask(num_detections, maxlen=tf.shape(boxes)[1]) + boxes = tf.cast(tf.expand_dims(mask, axis=-1), boxes.dtype) * boxes + + if expand: + info = tf.cast(tf.expand_dims(info, axis=0), boxes.dtype) + inshape = tf.expand_dims(info[:, 1, :], axis=1) + ogshape = tf.expand_dims(info[:, 0, :], axis=1) + scale = tf.expand_dims(info[:, 2, :], axis=1) + offset = tf.expand_dims(info[:, 3, :], axis=1) + + boxes = box_ops.denormalize_boxes(boxes, inshape) + boxes += tf.tile(offset, [1, 1, 2]) + boxes /= tf.tile(scale, [1, 1, 2]) + boxes = box_ops.clip_boxes(boxes, ogshape) + boxes = box_ops.normalize_boxes(boxes, ogshape) + return boxes diff --git a/official/projects/yolo/tasks/__init__.py b/official/projects/yolo/tasks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/projects/yolo/tasks/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/projects/yolo/tasks/image_classification.py b/official/projects/yolo/tasks/image_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..86647342282ac6f555aab2bea5bd5b12e8a2055f --- /dev/null +++ b/official/projects/yolo/tasks/image_classification.py @@ -0,0 +1,65 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Image classification task definition.""" +from official.common import dataset_fn +from official.core import task_factory +from official.projects.yolo.configs import darknet_classification as exp_cfg +from official.projects.yolo.dataloaders import classification_input +from official.vision.dataloaders import classification_input as classification_input_base +from official.vision.dataloaders import input_reader_factory +from official.vision.dataloaders import tfds_factory +from official.vision.tasks import image_classification + + +@task_factory.register_task_cls(exp_cfg.ImageClassificationTask) +class ImageClassificationTask(image_classification.ImageClassificationTask): + """A task for image classification.""" + + def build_inputs(self, params, input_context=None): + """Builds classification input.""" + + num_classes = self.task_config.model.num_classes + input_size = self.task_config.model.input_size + image_field_key = self.task_config.train_data.image_field_key + label_field_key = self.task_config.train_data.label_field_key + is_multilabel = self.task_config.train_data.is_multilabel + + if params.tfds_name: + decoder = tfds_factory.get_classification_decoder(params.tfds_name) + else: + decoder = classification_input_base.Decoder( + image_field_key=image_field_key, + label_field_key=label_field_key, + is_multilabel=is_multilabel) + + parser = classification_input.Parser( + output_size=input_size[:2], + num_classes=num_classes, + image_field_key=image_field_key, + label_field_key=label_field_key, + decode_jpeg_only=params.decode_jpeg_only, + aug_rand_hflip=params.aug_rand_hflip, + aug_type=params.aug_type, + is_multilabel=is_multilabel, + dtype=params.dtype) + + reader = input_reader_factory.input_reader_generator( + params, + dataset_fn=dataset_fn.pick_dataset_fn(params.file_type), + decoder_fn=decoder.decode, + parser_fn=parser.parse_fn(params.is_training)) + + dataset = reader.read(input_context=input_context) + return dataset diff --git 
a/official/vision/beta/projects/yolo/tasks/task_utils.py b/official/projects/yolo/tasks/task_utils.py similarity index 95% rename from official/vision/beta/projects/yolo/tasks/task_utils.py rename to official/projects/yolo/tasks/task_utils.py index d759f3f1f53bf247caa5317738e19a151e469d50..9a14f49104b085d487d045e39806ee5a850f9509 100644 --- a/official/vision/beta/projects/yolo/tasks/task_utils.py +++ b/official/projects/yolo/tasks/task_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/yolo/tasks/yolo.py b/official/projects/yolo/tasks/yolo.py new file mode 100644 index 0000000000000000000000000000000000000000..826c95f88b00441cc2dcc74bc7718a9be9cb9694 --- /dev/null +++ b/official/projects/yolo/tasks/yolo.py @@ -0,0 +1,449 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Contains classes used to train Yolo.""" + +import collections +from typing import Optional + +from absl import logging +import tensorflow as tf + +from official.common import dataset_fn +from official.core import base_task +from official.core import config_definitions +from official.core import input_reader +from official.core import task_factory +from official.modeling import performance +from official.projects.yolo import optimization +from official.projects.yolo.configs import yolo as exp_cfg +from official.projects.yolo.dataloaders import tf_example_decoder +from official.projects.yolo.dataloaders import yolo_input +from official.projects.yolo.modeling import factory +from official.projects.yolo.ops import kmeans_anchors +from official.projects.yolo.ops import mosaic +from official.projects.yolo.ops import preprocessing_ops +from official.projects.yolo.tasks import task_utils +from official.vision.dataloaders import tfds_factory +from official.vision.dataloaders import tf_example_label_map_decoder +from official.vision.evaluation import coco_evaluator +from official.vision.ops import box_ops + +OptimizationConfig = optimization.OptimizationConfig +RuntimeConfig = config_definitions.RuntimeConfig + + +@task_factory.register_task_cls(exp_cfg.YoloTask) +class YoloTask(base_task.Task): + """A single-replica view of the training procedure. + + YOLO task provides artifacts for training/evaluation procedures, including + loading/iterating over Datasets, initializing the model, calculating the loss, + post-processing, and customized metrics with reduction.
+ """ + + def __init__(self, params, logging_dir: Optional[str] = None): + super().__init__(params, logging_dir) + self.coco_metric = None + self._loss_fn = None + self._model = None + self._coco_91_to_80 = False + self._metrics = [] + + # globally set the random seed + preprocessing_ops.set_random_seeds(seed=params.seed) + + if self.task_config.model.anchor_boxes.generate_anchors: + self.generate_anchors() + return + + def generate_anchors(self): + """Generate anchor boxes for an arbitrary object detection dataset.""" + input_size = self.task_config.model.input_size + anchor_cfg = self.task_config.model.anchor_boxes + backbone = self.task_config.model.backbone.get() + + dataset = self.task_config.train_data + decoder = self._get_data_decoder(dataset) + + num_anchors = backbone.max_level - backbone.min_level + 1 + num_anchors *= anchor_cfg.anchors_per_scale + + gbs = dataset.global_batch_size + dataset.global_batch_size = 1 + box_reader = kmeans_anchors.BoxGenInputReader( + dataset, + dataset_fn=tf.data.TFRecordDataset, + decoder_fn=decoder.decode) + + boxes = box_reader.read( + k=num_anchors, + anchors_per_scale=anchor_cfg.anchors_per_scale, + image_resolution=input_size, + scaling_mode=anchor_cfg.scaling_mode, + box_generation_mode=anchor_cfg.box_generation_mode, + num_samples=anchor_cfg.num_samples) + + dataset.global_batch_size = gbs + + with open('anchors.txt', 'w') as f: + f.write(f'input resolution: {input_size} \n boxes: \n {boxes}') + logging.info('INFO: boxes have been saved to anchors.txt, make sure to save ' + 'them and update the boxes field in your yaml config file.') + + anchor_cfg.set_boxes(boxes) + return boxes + + def build_model(self): + """Build an instance of Yolo.""" + + model_base_cfg = self.task_config.model + l2_weight_decay = self.task_config.weight_decay / 2.0 + + input_size = model_base_cfg.input_size.copy() + input_specs = tf.keras.layers.InputSpec(shape=[None] + input_size) + l2_regularizer = ( + tf.keras.regularizers.l2(l2_weight_decay)
if l2_weight_decay else None) + model, losses = factory.build_yolo( + input_specs, model_base_cfg, l2_regularizer) + + # save for later usage within the task. + self._loss_fn = losses + self._model = model + return model + + def _get_data_decoder(self, params): + """Get a decoder object to decode the dataset.""" + if params.tfds_name: + decoder = tfds_factory.get_detection_decoder(params.tfds_name) + else: + decoder_cfg = params.decoder.get() + if params.decoder.type == 'simple_decoder': + self._coco_91_to_80 = decoder_cfg.coco91_to_80 + decoder = tf_example_decoder.TfExampleDecoder( + coco91_to_80=decoder_cfg.coco91_to_80, + regenerate_source_id=decoder_cfg.regenerate_source_id) + elif params.decoder.type == 'label_map_decoder': + decoder = tf_example_label_map_decoder.TfExampleDecoderLabelMap( + label_map=decoder_cfg.label_map, + regenerate_source_id=decoder_cfg.regenerate_source_id) + else: + raise ValueError('Unknown decoder type: {}!'.format( + params.decoder.type)) + return decoder + + def build_inputs(self, params, input_context=None): + """Build input dataset.""" + model = self.task_config.model + + # get anchor boxes dict based on the model's min and max levels + backbone = model.backbone.get() + anchor_dict, level_limits = model.anchor_boxes.get(backbone.min_level, + backbone.max_level) + + params.seed = self.task_config.seed + # set shared parameters between mosaic and yolo_input + base_config = dict( + letter_box=params.parser.letter_box, + aug_rand_translate=params.parser.aug_rand_translate, + aug_rand_angle=params.parser.aug_rand_angle, + aug_rand_perspective=params.parser.aug_rand_perspective, + area_thresh=params.parser.area_thresh, + random_flip=params.parser.random_flip, + seed=params.seed, + ) + + # get the decoder + decoder = self._get_data_decoder(params) + + # init Mosaic + sample_fn = mosaic.Mosaic( + output_size=model.input_size, + mosaic_frequency=params.parser.mosaic.mosaic_frequency, + mixup_frequency=params.parser.mosaic.mixup_frequency, +
jitter=params.parser.mosaic.jitter, + mosaic_center=params.parser.mosaic.mosaic_center, + mosaic_crop_mode=params.parser.mosaic.mosaic_crop_mode, + aug_scale_min=params.parser.mosaic.aug_scale_min, + aug_scale_max=params.parser.mosaic.aug_scale_max, + **base_config) + + # init Parser + parser = yolo_input.Parser( + output_size=model.input_size, + anchors=anchor_dict, + use_tie_breaker=params.parser.use_tie_breaker, + jitter=params.parser.jitter, + aug_scale_min=params.parser.aug_scale_min, + aug_scale_max=params.parser.aug_scale_max, + aug_rand_hue=params.parser.aug_rand_hue, + aug_rand_saturation=params.parser.aug_rand_saturation, + aug_rand_brightness=params.parser.aug_rand_brightness, + max_num_instances=params.parser.max_num_instances, + scale_xy=model.detection_generator.scale_xy.get(), + expanded_strides=model.detection_generator.path_scales.get(), + darknet=model.darknet_based_model, + best_match_only=params.parser.best_match_only, + anchor_t=params.parser.anchor_thresh, + random_pad=params.parser.random_pad, + level_limits=level_limits, + dtype=params.dtype, + **base_config) + + # init the dataset reader + reader = input_reader.InputReader( + params, + dataset_fn=dataset_fn.pick_dataset_fn(params.file_type), + decoder_fn=decoder.decode, + sample_fn=sample_fn.mosaic_fn(is_training=params.is_training), + parser_fn=parser.parse_fn(params.is_training)) + dataset = reader.read(input_context=input_context) + return dataset + + def build_metrics(self, training=True): + """Build detection metrics.""" + metrics = [] + + backbone = self.task_config.model.backbone.get() + metric_names = collections.defaultdict(list) + for key in range(backbone.min_level, backbone.max_level + 1): + key = str(key) + metric_names[key].append('loss') + metric_names[key].append('avg_iou') + metric_names[key].append('avg_obj') + + metric_names['net'].append('box') + metric_names['net'].append('class') + metric_names['net'].append('conf') + + for _, key in enumerate(metric_names.keys()): + 
metrics.append(task_utils.ListMetrics(metric_names[key], name=key)) + + self._metrics = metrics + if not training: + annotation_file = self.task_config.annotation_file + if self._coco_91_to_80: + annotation_file = None + self.coco_metric = coco_evaluator.COCOEvaluator( + annotation_file=annotation_file, + include_mask=False, + need_rescale_bboxes=False, + per_category_metrics=self._task_config.per_category_metrics) + + return metrics + + def build_losses(self, outputs, labels, aux_losses=None): + """Build YOLO losses.""" + return self._loss_fn(labels, outputs) + + def train_step(self, inputs, model, optimizer, metrics=None): + """Train Step. + + Runs the forward pass and back-propagates the loss through the model. + + Args: + inputs: a dictionary of input tensors. + model: the model, forward pass definition. + optimizer: the optimizer for this training step. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs. + """ + image, label = inputs + + with tf.GradientTape(persistent=False) as tape: + # Compute a prediction + y_pred = model(image, training=True) + + # Cast to float32 for gradient computation + y_pred = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), y_pred) + + # Get the total loss + (scaled_loss, metric_loss, + loss_metrics) = self.build_losses(y_pred['raw_output'], label) + + # Scale the loss for numerical stability + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + scaled_loss = optimizer.get_scaled_loss(scaled_loss) + + # Compute the gradient + train_vars = model.trainable_variables + gradients = tape.gradient(scaled_loss, train_vars) + + # Get unscaled gradients if we are using the loss scale optimizer on fp16 + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + gradients = optimizer.get_unscaled_gradients(gradients) + + # Apply gradients to the model + optimizer.apply_gradients(zip(gradients, train_vars)) + logs = {self.loss: 
metric_loss} + + # Compute all metrics + if metrics: + for m in
metrics: + m.update_state(loss_metrics[m.name]) + logs.update({m.name: m.result()}) + return logs + + def _reorg_boxes(self, boxes, info, num_detections): + """Scale and clean boxes prior to evaluation.""" + mask = tf.sequence_mask(num_detections, maxlen=tf.shape(boxes)[1]) + mask = tf.cast(tf.expand_dims(mask, axis=-1), boxes.dtype) + + # Denormalize the boxes by the shape of the image + inshape = tf.expand_dims(info[:, 1, :], axis=1) + ogshape = tf.expand_dims(info[:, 0, :], axis=1) + scale = tf.expand_dims(info[:, 2, :], axis=1) + offset = tf.expand_dims(info[:, 3, :], axis=1) + + boxes = box_ops.denormalize_boxes(boxes, inshape) + boxes = box_ops.clip_boxes(boxes, inshape) + boxes += tf.tile(offset, [1, 1, 2]) + boxes /= tf.tile(scale, [1, 1, 2]) + boxes = box_ops.clip_boxes(boxes, ogshape) + + # Mask out padded detections: valid boxes are kept, padded boxes become -1 + boxes *= mask + boxes += (mask - 1) + return boxes + + def validation_step(self, inputs, model, metrics=None): + """Validation step. + + Args: + inputs: a dictionary of input tensors. + model: the keras.Model. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs.
+ """ + image, label = inputs + + # Step the model once + y_pred = model(image, training=False) + y_pred = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), y_pred) + (_, metric_loss, loss_metrics) = self.build_losses(y_pred['raw_output'], + label) + logs = {self.loss: metric_loss} + + # Reorganize and rescale the boxes + info = label['groundtruths']['image_info'] + boxes = self._reorg_boxes(y_pred['bbox'], info, y_pred['num_detections']) + + # Build the input for the COCO evaluation metric + coco_model_outputs = { + 'detection_boxes': boxes, + 'detection_scores': y_pred['confidence'], + 'detection_classes': y_pred['classes'], + 'num_detections': y_pred['num_detections'], + 'source_id': label['groundtruths']['source_id'], + 'image_info': label['groundtruths']['image_info'] + } + + # Compute all metrics + if metrics: + logs.update( + {self.coco_metric.name: (label['groundtruths'], coco_model_outputs)}) + for m in metrics: + m.update_state(loss_metrics[m.name]) + logs.update({m.name: m.result()}) + return logs + + def aggregate_logs(self, state=None, step_outputs=None): + """Aggregate COCO metric results across steps.""" + if not state: + self.coco_metric.reset_states() + state = self.coco_metric + self.coco_metric.update_state(step_outputs[self.coco_metric.name][0], + step_outputs[self.coco_metric.name][1]) + return state + + def reduce_aggregated_logs(self, aggregated_logs, global_step=None): + """Reduce logs and remove unneeded items. 
Update with COCO results.""" + res = self.coco_metric.result() + return res + + def initialize(self, model: tf.keras.Model): + """Load a pretrained checkpoint.""" + + if not self.task_config.init_checkpoint: + logging.info('Training from Scratch.') + return + + ckpt_dir_or_file = self.task_config.init_checkpoint + if tf.io.gfile.isdir(ckpt_dir_or_file): + ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) + + # Restoring checkpoint.
+ if self.task_config.init_checkpoint_modules == 'all': + ckpt = tf.train.Checkpoint(**model.checkpoint_items) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + else: + ckpt_items = {} + if 'backbone' in self.task_config.init_checkpoint_modules: + ckpt_items.update(backbone=model.backbone) + if 'decoder' in self.task_config.init_checkpoint_modules: + ckpt_items.update(decoder=model.decoder) + + ckpt = tf.train.Checkpoint(**ckpt_items) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + + logging.info('Finished loading pretrained checkpoint from %s', + ckpt_dir_or_file) + + def create_optimizer(self, + optimizer_config: OptimizationConfig, + runtime_config: Optional[RuntimeConfig] = None): + """Creates a TF optimizer from configurations. + + Args: + optimizer_config: the parameters of the Optimization settings. + runtime_config: the parameters of the runtime. + + Returns: + A tf.optimizers.Optimizer object.
+ """ + opt_factory = optimization.YoloOptimizerFactory(optimizer_config) + # pylint: disable=protected-access + ema = opt_factory._use_ema + opt_factory._use_ema = False + + opt_type = opt_factory._optimizer_type + if opt_type == 'sgd_torch': + optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) + optimizer.set_bias_lr( + opt_factory.get_bias_lr_schedule(self._task_config.smart_bias_lr)) + optimizer.search_and_set_variable_groups(self._model.trainable_variables) + else: + optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) + opt_factory._use_ema = ema + + if ema: + logging.info('EMA is enabled.') + optimizer = opt_factory.add_ema(optimizer) + + # pylint: enable=protected-access + + if runtime_config and runtime_config.loss_scale: + use_float16 = runtime_config.mixed_precision_dtype == 'float16' + optimizer = performance.configure_optimizer( + optimizer, + use_float16=use_float16, + loss_scale=runtime_config.loss_scale) + + return optimizer diff --git a/official/projects/yolo/train.py b/official/projects/yolo/train.py new file mode 100644 index 0000000000000000000000000000000000000000..38d4f029ca4f05f958a7cea8fe857ad1f749fe32 --- /dev/null +++ b/official/projects/yolo/train.py @@ -0,0 +1,29 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""TensorFlow Model Garden Vision training driver.""" + +from absl import app +from absl import flags + +from official.common import flags as tfm_flags +from official.projects.yolo.common import registry_imports # pylint: disable=unused-import +from official.vision import train + +FLAGS = flags.FLAGS + + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(train.main) diff --git a/official/projects/yt8m/__init__.py b/official/projects/yt8m/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/yt8m/__init__.py +++ b/official/projects/yt8m/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/yt8m/configs/__init__.py b/official/projects/yt8m/configs/__init__.py index 2785613f22bdd5886332e53a03d96f3d529b7fd9..d34bc0957cdfa5234fe5d98b45c181d49d7f5d6d 100644 --- a/official/projects/yt8m/configs/__init__.py +++ b/official/projects/yt8m/configs/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/yt8m/configs/yt8m.py b/official/projects/yt8m/configs/yt8m.py index e367779d1ade85eef4b1361fdf18fc41cc29e43c..63f8eb5a30f4c6247aba6c84a951f1e9cbcd9e7c 100644 --- a/official/projects/yt8m/configs/yt8m.py +++ b/official/projects/yt8m/configs/yt8m.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -21,7 +21,7 @@ from official.core import config_definitions as cfg from official.core import exp_factory from official.modeling import hyperparams from official.modeling import optimization -from official.vision.beta.configs import common +from official.vision.configs import common FLAGS = flags.FLAGS @@ -35,13 +35,46 @@ YT8M_VAL_PATH = 'gs://youtube8m-ml/3/frame/validate/validate*.tfrecord' @dataclasses.dataclass class DataConfig(cfg.DataConfig): - """The base configuration for building datasets.""" + """The base configuration for building datasets. + + Attributes: + name: Dataset name. + split: dataset split, 'train' or 'valid'. + feature_sizes: shape (length) of each feature specified in feature_names. + feature_names: names of the features in the tf.SequenceExample. + feature_sources: whether each feature is read from 'context' or 'feature'. + feature_dtypes: dtype of decoded feature. + feature_from_bytes: decode feature from bytes or as dtype list. + label_field: name of field to read from tf.SequenceExample. + segment_size: Number of frames in each segment. + segment_labels: Use segment level label. Default: False, video level label. + include_video_id: `True` means include video id (string) in the input to + the model. + temporal_stride: Not used. To be deprecated. + max_frames: Maximum number of frames in an input example. It is used to crop + the input in the temporal dimension. + num_frames: Number of frames in a single input example. + num_classes: Number of classes to classify. Assuming it is a classification + task. + num_devices: Not used. To be deprecated. + input_path: The path to the input. + is_training: Whether this data is used for training or not. + num_examples: Number of examples in the dataset. It is used to compute the + steps for train or eval.
Set the value to `-1` to make the experiment run + until the end of the dataset. + file_type: type of input files. + """ name: Optional[str] = 'yt8m' split: Optional[str] = None feature_sizes: Tuple[int, ...] = (1024, 128) feature_names: Tuple[str, ...] = ('rgb', 'audio') + feature_sources: Tuple[str, ...] = ('feature', 'feature') + feature_dtypes: Tuple[str, ...] = ('uint8', 'uint8') + feature_from_bytes: Tuple[bool, ...] = (True, True) + label_field: str = 'labels' segment_size: int = 1 segment_labels: bool = False + include_video_id: bool = False temporal_stride: int = 1 max_frames: int = 300 num_frames: int = 300 # set smaller to allow random sample (Parser) @@ -49,12 +82,13 @@ class DataConfig(cfg.DataConfig): num_devices: int = 1 input_path: str = '' is_training: bool = True - random_seed: int = 123 num_examples: int = -1 + file_type: str = 'tfrecord' def yt8m(is_training): """YT8M dataset configs.""" + # pylint: disable=unexpected-keyword-arg return DataConfig( num_frames=30, temporal_stride=1, @@ -62,8 +96,10 @@ def yt8m(is_training): segment_size=5, is_training=is_training, split='train' if is_training else 'valid', + drop_remainder=is_training, # pytype: disable=wrong-keyword-args num_examples=YT8M_TRAIN_EXAMPLES if is_training else YT8M_VAL_EXAMPLES, input_path=YT8M_TRAIN_PATH if is_training else YT8M_VAL_PATH) + # pylint: enable=unexpected-keyword-arg @dataclasses.dataclass @@ -118,24 +154,26 @@ def add_trainer( eval_batch_size: int, learning_rate: float = 0.0001, train_epochs: int = 50, + num_train_examples: int = YT8M_TRAIN_EXAMPLES, + num_val_examples: int = YT8M_VAL_EXAMPLES, ): """Add and config a trainer to the experiment config.""" - if YT8M_TRAIN_EXAMPLES <= 0: + if num_train_examples <= 0: raise ValueError('Wrong train dataset size {!r}'.format( experiment.task.train_data)) - if YT8M_VAL_EXAMPLES <= 0: + if num_val_examples <= 0: raise ValueError('Wrong validation dataset size {!r}'.format( experiment.task.validation_data))
experiment.task.train_data.global_batch_size = train_batch_size experiment.task.validation_data.global_batch_size = eval_batch_size - steps_per_epoch = YT8M_TRAIN_EXAMPLES // train_batch_size - steps_per_loop = 30 + steps_per_epoch = num_train_examples // train_batch_size + steps_per_loop = 500 experiment.trainer = cfg.TrainerConfig( steps_per_loop=steps_per_loop, summary_interval=steps_per_loop, checkpoint_interval=steps_per_loop, train_steps=train_epochs * steps_per_epoch, - validation_steps=YT8M_VAL_EXAMPLES // eval_batch_size, + validation_steps=num_val_examples // eval_batch_size, validation_interval=steps_per_loop, optimizer_config=optimization.OptimizationConfig({ 'optimizer': { @@ -176,14 +214,16 @@ def yt8m_experiment() -> cfg.ExperimentConfig: 'task.train_data.num_classes == task.validation_data.num_classes', 'task.train_data.feature_sizes != None', 'task.train_data.feature_names != None', + 'task.train_data.feature_sources != None', + 'task.train_data.feature_dtypes != None', ]) # Per TPUv3 Core batch size 16GB HBM. `factor` in range(1, 26) factor = 1 - num_cores = 32 # for TPU 4x4 + num_cores = 32 # for TPUv3 4x4 train_per_core_bs = 32 * factor train_bs = train_per_core_bs * num_cores - eval_per_core_bs = 32 * 50 # multiplier<=100 + eval_per_core_bs = 4 * 50 # multiplier<=100 eval_bs = eval_per_core_bs * num_cores # based lr=0.0001 for bs=512 return add_trainer( diff --git a/official/projects/yt8m/configs/yt8m_test.py b/official/projects/yt8m/configs/yt8m_test.py new file mode 100644 index 0000000000000000000000000000000000000000..04a153de0bcc93496110b21e67ce27f4b36cb0ba --- /dev/null +++ b/official/projects/yt8m/configs/yt8m_test.py @@ -0,0 +1,40 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from absl.testing import parameterized +import tensorflow as tf + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import hyperparams +from official.projects.yt8m.configs import yt8m # pylint: disable=unused-import +from official.projects.yt8m.configs.yt8m import yt8m as exp_cfg + + +class YT8MTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.parameters( + ('yt8m_experiment',),) + def test_yt8m_configs(self, config_name): + config = exp_factory.get_exp_config(config_name) + self.assertIsInstance(config, cfg.ExperimentConfig) + self.assertIsInstance(config.task, cfg.TaskConfig) + self.assertIsInstance(config.task.model, hyperparams.Config) + self.assertIsInstance(config.task.train_data, cfg.DataConfig) + config.task.train_data.is_training = None + with self.assertRaises(KeyError): + config.validate() + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/yt8m/dataloaders/utils.py b/official/projects/yt8m/dataloaders/utils.py index eda1a4ab23ac9e8ccf27f03ebcc86ae6be7b34e6..d37f51ef2c0b440c096baa4727a692a40bf9d632 100644 --- a/official/projects/yt8m/dataloaders/utils.py +++ b/official/projects/yt8m/dataloaders/utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -15,11 +15,12 @@ """Contains a collection of util functions for training and evaluating.""" from absl import logging -import numpy +import numpy as np import tensorflow as tf +from official.vision.dataloaders import tfexample_utils -def Dequantize(feat_vector, max_quantized_value=2, min_quantized_value=-2): +def dequantize(feat_vector, max_quantized_value=2, min_quantized_value=-2): """Dequantize the feature from the byte format to the float format. Args: @@ -37,7 +38,7 @@ def Dequantize(feat_vector, max_quantized_value=2, min_quantized_value=-2): return feat_vector * scalar + bias -def MakeSummary(name, value): +def make_summary(name, value): """Creates a tf.Summary proto with the given name and value.""" summary = tf.Summary() val = summary.value.add() @@ -46,10 +47,10 @@ def MakeSummary(name, value): return summary -def AddGlobalStepSummary(summary_writer, - global_step_val, - global_step_info_dict, - summary_scope="Eval"): +def add_global_step_summary(summary_writer, + global_step_val, + global_step_info_dict, + summary_scope="Eval"): """Add the global_step summary to the Tensorboard. 
Args: @@ -68,19 +69,19 @@ def AddGlobalStepSummary(summary_writer, examples_per_second = global_step_info_dict.get("examples_per_second", -1) summary_writer.add_summary( - MakeSummary("GlobalStep/" + summary_scope + "_Hit@1", this_hit_at_one), + make_summary("GlobalStep/" + summary_scope + "_Hit@1", this_hit_at_one), global_step_val) summary_writer.add_summary( - MakeSummary("GlobalStep/" + summary_scope + "_Perr", this_perr), + make_summary("GlobalStep/" + summary_scope + "_Perr", this_perr), global_step_val) summary_writer.add_summary( - MakeSummary("GlobalStep/" + summary_scope + "_Loss", this_loss), + make_summary("GlobalStep/" + summary_scope + "_Loss", this_loss), global_step_val) if examples_per_second != -1: summary_writer.add_summary( - MakeSummary("GlobalStep/" + summary_scope + "_Example_Second", - examples_per_second), global_step_val) + make_summary("GlobalStep/" + summary_scope + "_Example_Second", + examples_per_second), global_step_val) summary_writer.flush() info = ( @@ -91,10 +92,10 @@ def AddGlobalStepSummary(summary_writer, return info -def AddEpochSummary(summary_writer, - global_step_val, - epoch_info_dict, - summary_scope="Eval"): +def add_epoch_summary(summary_writer, + global_step_val, + epoch_info_dict, + summary_scope="Eval"): """Add the epoch summary to the Tensorboard. 
Args: @@ -113,21 +114,21 @@ def AddEpochSummary(summary_writer, avg_loss = epoch_info_dict["avg_loss"] aps = epoch_info_dict["aps"] gap = epoch_info_dict["gap"] - mean_ap = numpy.mean(aps) + mean_ap = np.mean(aps) summary_writer.add_summary( - MakeSummary("Epoch/" + summary_scope + "_Avg_Hit@1", avg_hit_at_one), + make_summary("Epoch/" + summary_scope + "_Avg_Hit@1", avg_hit_at_one), global_step_val) summary_writer.add_summary( - MakeSummary("Epoch/" + summary_scope + "_Avg_Perr", avg_perr), + make_summary("Epoch/" + summary_scope + "_Avg_Perr", avg_perr), global_step_val) summary_writer.add_summary( - MakeSummary("Epoch/" + summary_scope + "_Avg_Loss", avg_loss), + make_summary("Epoch/" + summary_scope + "_Avg_Loss", avg_loss), global_step_val) summary_writer.add_summary( - MakeSummary("Epoch/" + summary_scope + "_MAP", mean_ap), global_step_val) + make_summary("Epoch/" + summary_scope + "_MAP", mean_ap), global_step_val) summary_writer.add_summary( - MakeSummary("Epoch/" + summary_scope + "_GAP", gap), global_step_val) + make_summary("Epoch/" + summary_scope + "_GAP", gap), global_step_val) summary_writer.flush() info = ("epoch/eval number {0} | Avg_Hit@1: {1:.3f} | Avg_PERR: {2:.3f} " @@ -137,7 +138,7 @@ def AddEpochSummary(summary_writer, return info -def GetListOfFeatureNamesAndSizes(feature_names, feature_sizes): +def get_list_of_feature_names_and_sizes(feature_names, feature_sizes): """Extract the list of feature names and the dimensionality. Args: @@ -163,53 +164,53 @@ def GetListOfFeatureNamesAndSizes(feature_names, feature_sizes): return list_of_feature_names, list_of_feature_sizes -def ClipGradientNorms(gradients_to_variables, max_norm): - """Clips the gradients by the given value. 
+def make_yt8m_example(num_segment: int = 5) -> tf.train.SequenceExample: + """Generate fake data for unit tests.""" + rgb = np.random.randint(low=256, size=1024, dtype=np.uint8) + audio = np.random.randint(low=256, size=128, dtype=np.uint8) - Args: - gradients_to_variables: A list of gradient to variable pairs (tuples). - max_norm: the maximum norm value. - - Returns: - A list of clipped gradient to variable pairs. - """ - clipped_grads_and_vars = [] - for grad, var in gradients_to_variables: - if grad is not None: - if isinstance(grad, tf.IndexedSlices): - tmp = tf.clip_by_norm(grad.values, max_norm) - grad = tf.IndexedSlices(tmp, grad.indices, grad.dense_shape) - else: - grad = tf.clip_by_norm(grad, max_norm) - clipped_grads_and_vars.append((grad, var)) - return clipped_grads_and_vars - - -def CombineGradients(tower_grads): - """Calculate the combined gradient for each shared variable across all towers. - - Note that this function provides a synchronization point across all towers. + seq_example = tf.train.SequenceExample() + seq_example.context.feature["id"].bytes_list.value[:] = [b"id001"] + seq_example.context.feature["labels"].int64_list.value[:] = [1, 2, 3, 4] + seq_example.context.feature["segment_labels"].int64_list.value[:] = ( + [4] * num_segment) + seq_example.context.feature["segment_start_times"].int64_list.value[:] = [ + i * 5 for i in range(num_segment) + ] + seq_example.context.feature["segment_scores"].float_list.value[:] = ( + [0.5] * num_segment) + tfexample_utils.put_bytes_list_to_feature( + seq_example, rgb.tobytes(), key="rgb", repeat_num=120) + tfexample_utils.put_bytes_list_to_feature( + seq_example, audio.tobytes(), key="audio", repeat_num=120) + + return seq_example + + +# TODO(yeqing): Move the test related functions to test_utils. 
+def make_example_with_float_features( + num_segment: int = 5) -> tf.train.SequenceExample: + """Generate fake data for unit tests.""" + rgb = np.random.rand(1, 2048).astype(np.float32) + audio = np.random.rand(256).astype(np.float32) + + seq_example = tf.train.SequenceExample() + seq_example.context.feature["id"].bytes_list.value[:] = [b"id001"] + seq_example.context.feature["clip/label/index"].int64_list.value[:] = [ + 1, 2, 3, 4 + ] + seq_example.context.feature["segment_labels"].int64_list.value[:] = ( + [4] * num_segment) + seq_example.context.feature["segment_start_times"].int64_list.value[:] = [ + i * 5 for i in range(num_segment) + ] + seq_example.context.feature["segment_scores"].float_list.value[:] = ( + [0.] * num_segment) + seq_example.context.feature[ + "VIDEO_EMBEDDING/context_feature/floats"].float_list.value[:] = ( + audio.tolist()) - Args: - tower_grads: List of lists of (gradient, variable) tuples. The outer list is - over individual gradients. The inner list is over the gradient calculation - for each tower. + tfexample_utils.put_float_list_to_feature( + seq_example, rgb.tolist(), key="FEATURE/feature/floats") - Returns: - List of pairs of (gradient, variable) where the gradient has been summed - across all towers. 
- """ - filtered_grads = [ - [x for x in grad_list if x[0] is not None] for grad_list in tower_grads - ] - final_grads = [] - for i in range(len(filtered_grads[0])): - grads = [filtered_grads[t][i] for t in range(len(filtered_grads))] - grad = tf.stack([x[0] for x in grads], 0) - grad = tf.reduce_sum(grad, 0) - final_grads.append(( - grad, - filtered_grads[0][i][1], - )) - - return final_grads + return seq_example diff --git a/official/projects/yt8m/dataloaders/yt8m_input.py b/official/projects/yt8m/dataloaders/yt8m_input.py index 0ea305d425f2b71c4e94912fc18b6803b6b05f90..443e4a1e20f9b41bb99d6c764a58ddeea09e596b 100644 --- a/official/projects/yt8m/dataloaders/yt8m_input.py +++ b/official/projects/yt8m/dataloaders/yt8m_input.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -22,18 +22,17 @@ back into a range between min_quantized_value and max_quantized_value. link for details: https://research.google.com/youtube8m/download.html """ - -from typing import Dict +from typing import Any, Dict import tensorflow as tf from official.projects.yt8m.dataloaders import utils -from official.vision.beta.configs import video_classification as exp_cfg -from official.vision.beta.dataloaders import decoder -from official.vision.beta.dataloaders import parser +from official.vision.configs import video_classification as exp_cfg +from official.vision.dataloaders import decoder +from official.vision.dataloaders import parser def resize_axis(tensor, axis, new_size, fill_value=0): - """Truncates or pads a tensor to new_size on on a given axis. + """Truncates or pads a tensor to new_size on a given axis. Truncate or extend tensor such that tensor.shape[axis] == new_size. If the size increases, the padding will be performed at the end, using fill_value. 
@@ -81,13 +80,14 @@ def _process_segment_and_label(video_matrix, num_frames, contexts, num_frames: Number of frames per subclip. contexts: context information extracted from decoder segment_labels: if we read segment labels instead. - segment_size: the segment_size used for reading segments. + segment_size: the segment_size used for reading segments. Segment length. num_classes: a positive integer for the number of classes. Returns: output: dictionary containing batch information """ # Partition frame-level feature matrix to segment-level feature matrix. + batch_video_ids = None if segment_labels: start_times = contexts["segment_start_times"].values # Here we assume all the segments that started at the same start time has @@ -101,8 +101,9 @@ def _process_segment_and_label(video_matrix, num_frames, contexts, batch_video_matrix = tf.gather_nd(video_matrix, tf.expand_dims(range_mtx, axis=-1)) num_segment = tf.shape(batch_video_matrix)[0] - batch_video_ids = tf.reshape( - tf.tile([contexts["id"]], [num_segment]), (num_segment,)) + if "id" in contexts: + batch_video_ids = tf.reshape( + tf.tile([contexts["id"]], [num_segment]), (num_segment,)) batch_frames = tf.reshape( tf.tile([segment_size], [num_segment]), (num_segment,)) batch_frames = tf.cast(tf.expand_dims(batch_frames, 1), tf.float32) @@ -134,32 +135,35 @@ def _process_segment_and_label(video_matrix, num_frames, contexts, sparse_labels, default_value=False, validate_indices=False) # convert to batch format. 
- batch_video_ids = tf.expand_dims(contexts["id"], 0) + if "id" in contexts: + batch_video_ids = tf.expand_dims(contexts["id"], 0) batch_video_matrix = tf.expand_dims(video_matrix, 0) batch_labels = tf.expand_dims(labels, 0) batch_frames = tf.expand_dims(num_frames, 0) batch_label_weights = None output_dict = { - "video_ids": batch_video_ids, "video_matrix": batch_video_matrix, "labels": batch_labels, "num_frames": batch_frames, } + if batch_video_ids is not None: + output_dict["video_ids"] = batch_video_ids if batch_label_weights is not None: output_dict["label_weights"] = batch_label_weights return output_dict -def _get_video_matrix(features, feature_size, max_frames, max_quantized_value, - min_quantized_value): +def _get_video_matrix(features, feature_size, dtype, max_frames, + max_quantized_value, min_quantized_value): """Decodes features from an input string and quantizes it. Args: - features: raw feature values - feature_size: length of each frame feature vector - max_frames: number of frames (rows) in the output feature_matrix + features: raw feature values. + feature_size: length of each frame feature vector. + dtype: raw type of the feature. + max_frames: number of frames (rows) in the output feature_matrix. max_quantized_value: the maximum of the quantized value. min_quantized_value: the minimum of the quantized value. 
@@ -167,25 +171,27 @@ def _get_video_matrix(features, feature_size, max_frames, max_quantized_value, feature_matrix: matrix of all frame-features num_frames: number of frames in the sequence """ - decoded_features = tf.reshape( - tf.cast(tf.io.decode_raw(features, tf.uint8), tf.float32), - [-1, feature_size]) + decoded_features = tf.reshape(features, [-1, feature_size]) num_frames = tf.math.minimum(tf.shape(decoded_features)[0], max_frames) - feature_matrix = utils.Dequantize(decoded_features, max_quantized_value, - min_quantized_value) + if dtype.is_integer: + feature_matrix = utils.dequantize(decoded_features, max_quantized_value, + min_quantized_value) + else: + feature_matrix = decoded_features feature_matrix = resize_axis(feature_matrix, 0, max_frames) return feature_matrix, num_frames -def _concat_features(features, feature_names, feature_sizes, max_frames, - max_quantized_value, min_quantized_value): +def _concat_features(features, feature_names, feature_sizes, feature_dtypes, + max_frames, max_quantized_value, min_quantized_value): """Loads (potentially) different types of features and concatenates them. Args: features: raw feature values feature_names: list of feature names feature_sizes: list of features sizes + feature_dtypes: dtype of the feature. max_frames: number of frames in the sequence max_quantized_value: the maximum of the quantized value. min_quantized_value: the minimum of the quantized value. 
@@ -201,17 +207,20 @@ def _concat_features(features, feature_names, feature_sizes, max_frames, assert len(feature_names) == len(feature_sizes), ( "length of feature_names (={}) != length of feature_sizes (={})".format( len(feature_names), len(feature_sizes))) + assert len(feature_names) == len(feature_dtypes), ( + "length of feature_names (={}) != length of feature_dtypes (={})".format( + len(feature_names), len(feature_dtypes))) num_frames = -1 # the number of frames in the video feature_matrices = [None] * num_features # an array of different features - for feature_index in range(num_features): + for i in range(num_features): feature_matrix, num_frames_in_this_feature = _get_video_matrix( - features[feature_names[feature_index]], feature_sizes[feature_index], - max_frames, max_quantized_value, min_quantized_value) + features[feature_names[i]], feature_sizes[i], + tf.dtypes.as_dtype(feature_dtypes[i]), max_frames, max_quantized_value, + min_quantized_value) if num_frames == -1: num_frames = num_frames_in_this_feature - - feature_matrices[feature_index] = feature_matrix + feature_matrices[i] = feature_matrix # cap the number of frames at self.max_frames num_frames = tf.minimum(num_frames, max_frames) @@ -223,7 +232,7 @@ def _concat_features(features, feature_names, feature_sizes, max_frames, class Decoder(decoder.Decoder): - """A tf.Example decoder for classification task.""" + """A tf.train.SequenceExample decoder for classification task.""" def __init__( self, @@ -232,9 +241,22 @@ class Decoder(decoder.Decoder): self._segment_labels = input_params.segment_labels self._feature_names = input_params.feature_names - self._context_features = { - "id": tf.io.FixedLenFeature([], tf.string), - } + self._feature_sources = input_params.feature_sources + self._feature_sizes = input_params.feature_sizes + self._feature_dtypes = input_params.feature_dtypes + self._feature_from_bytes = input_params.feature_from_bytes + self._include_video_id = 
input_params.include_video_id +
self._label_field = input_params.label_field + + assert len(self._feature_names) == len(self._feature_sources), ( + "length of feature_names (={}) != length of feature_sources (={})".format( + len(self._feature_names), len(self._feature_sources))) + + self._context_features = {} + self._sequence_features = {} + if self._include_video_id: + self._context_features["id"] = tf.io.FixedLenFeature([], tf.string) + if self._segment_labels: self._context_features.update({ # There is no need to read end-time given we always assume the segment @@ -244,22 +266,50 @@ class Decoder(decoder.Decoder): "segment_scores": tf.io.VarLenFeature(tf.float32) }) else: - self._context_features.update({"labels": tf.io.VarLenFeature(tf.int64)}) - - self._sequence_features = { - feature_name: tf.io.FixedLenSequenceFeature([], dtype=tf.string) - for feature_name in self._feature_names - } - - def decode(self, serialized_example): - """Parses a single tf.Example into image and label tensors.""" + self._add_labels_specification() + for i, name in enumerate(self._feature_names): + if self._feature_from_bytes[i]: + feature_type = tf.io.FixedLenSequenceFeature([], dtype=tf.string) + else: + dtype = tf.dtypes.as_dtype(self._feature_dtypes[i]) + feature_shape = [self._feature_sizes[i]] + if self._feature_sources[i] == "feature": + feature_type = tf.io.FixedLenSequenceFeature(feature_shape, dtype) + else: + feature_type = tf.io.FixedLenFeature(feature_shape, dtype) + if self._feature_sources[i] == "feature": + self._sequence_features[name] = feature_type + elif self._feature_sources[i] == "context": + self._context_features[name] = feature_type + else: + raise ValueError( + f"Unknown feature source {self._feature_sources[i]} for {name}") + + def _add_labels_specification(self): + if not self._label_field: + raise ValueError(f"Invalid label field: {self._label_field}!") + self._context_features.update( + {self._label_field: 
tf.io.VarLenFeature(tf.int64)}) + + def decode(self, + serialized_example:
tf.train.SequenceExample) -> Dict[str, Any]: + """Parses a single tf.train.SequenceExample into video and label tensors.""" contexts, features = tf.io.parse_single_sequence_example( serialized_example, context_features=self._context_features, sequence_features=self._sequence_features) - - return {"contexts": contexts, "features": features} + decoded_tensor = {**contexts, **features} + for i, name in enumerate(self._feature_names): + # Decode raw-byte features; convert VarLen (sparse) features to dense. + if self._feature_from_bytes[i]: + dtype = tf.dtypes.as_dtype(self._feature_dtypes[i]) + decoded_tensor[name] = tf.cast( + tf.io.decode_raw(decoded_tensor[name], dtype), tf.float32) + else: + if isinstance(decoded_tensor[name], tf.SparseTensor): + decoded_tensor[name] = tf.sparse.to_dense(decoded_tensor[name]) + return decoded_tensor class Parser(parser.Parser): @@ -278,14 +328,14 @@ class Parser(parser.Parser): min_quantized_value=-2, ): self._num_classes = input_params.num_classes + self._label_field = input_params.label_field self._segment_size = input_params.segment_size self._segment_labels = input_params.segment_labels + self._include_video_id = input_params.include_video_id self._feature_names = input_params.feature_names self._feature_sizes = input_params.feature_sizes - self.stride = input_params.temporal_stride + self._feature_dtypes = input_params.feature_dtypes self._max_frames = input_params.max_frames - self._num_frames = input_params.num_frames - self._seed = input_params.random_seed self._max_quantized_value = max_quantized_value self._min_quantized_value = min_quantized_value @@ -293,27 +343,46 @@ class Parser(parser.Parser): """Parses data for training.""" # loads (potentially) different types of features and concatenates them self.video_matrix, self.num_frames = _concat_features( - decoded_tensors["features"], self._feature_names, self._feature_sizes, - self._max_frames, self._max_quantized_value, self._min_quantized_value) - output_dict =
_process_segment_and_label(self.video_matrix, self.num_frames, - decoded_tensors["contexts"], - self._segment_labels, - self._segment_size, - self._num_classes) - return output_dict + decoded_tensors, self._feature_names, self._feature_sizes, + self._feature_dtypes, self._max_frames, self._max_quantized_value, + self._min_quantized_value) + if not self._include_video_id and "id" in decoded_tensors: + del decoded_tensors["id"] + + return self._process_label(self.video_matrix, self.num_frames, + decoded_tensors) def _parse_eval_data(self, decoded_tensors): """Parses data for evaluation.""" # loads (potentially) different types of features and concatenates them self.video_matrix, self.num_frames = _concat_features( - decoded_tensors["features"], self._feature_names, self._feature_sizes, - self._max_frames, self._max_quantized_value, self._min_quantized_value) - output_dict = _process_segment_and_label(self.video_matrix, self.num_frames, - decoded_tensors["contexts"], + decoded_tensors, self._feature_names, self._feature_sizes, + self._feature_dtypes, self._max_frames, self._max_quantized_value, + self._min_quantized_value) + if not self._include_video_id and "id" in decoded_tensors: + del decoded_tensors["id"] + + return self._process_label(self.video_matrix, self.num_frames, + decoded_tensors) + + def _process_label(self, video_matrix, num_frames, contexts): + """Processes a batched Tensor of frames. + + Args: + video_matrix: video feature matrix. + num_frames: number of frames in this video. + contexts: context information extracted from decoder.
+ + Returns: + output: dictionary containing batch information + """ + if self._label_field and not self._segment_labels: + contexts["labels"] = contexts[self._label_field] + output_dict = _process_segment_and_label(video_matrix, num_frames, contexts, self._segment_labels, self._segment_size, self._num_classes) - return output_dict # batched + return output_dict def parse_fn(self, is_training): """Returns a parse fn that reads and parses raw tensors from the decoder. @@ -337,50 +406,6 @@ class Parser(parser.Parser): return parse -class PostBatchProcessor(): - """Processes a video and label dataset which is batched.""" - - def __init__(self, input_params: exp_cfg.DataConfig): - self.segment_labels = input_params.segment_labels - self.num_classes = input_params.num_classes - self.segment_size = input_params.segment_size - - def post_fn(self, batched_tensors): - """Processes batched Tensors.""" - video_ids = batched_tensors["video_ids"] - video_matrix = batched_tensors["video_matrix"] - labels = batched_tensors["labels"] - num_frames = batched_tensors["num_frames"] - label_weights = None - - if self.segment_labels: - # [batch x num_segment x segment_size x num_features] - # -> [batch * num_segment x segment_size x num_features] - video_ids = tf.reshape(video_ids, [-1]) - video_matrix = tf.reshape(video_matrix, [-1, self.segment_size, 1152]) - labels = tf.reshape(labels, [-1, self.num_classes]) - num_frames = tf.reshape(num_frames, [-1, 1]) - - label_weights = tf.reshape(batched_tensors["label_weights"], - [-1, self.num_classes]) - - else: - video_matrix = tf.squeeze(video_matrix) - labels = tf.squeeze(labels) - - batched_tensors = { - "video_ids": video_ids, - "video_matrix": video_matrix, - "labels": labels, - "num_frames": num_frames, - } - - if label_weights is not None: - batched_tensors["label_weights"] = label_weights - - return batched_tensors - - class TransformBatcher(): """Performs manual batching on input dataset.""" @@ -388,32 +413,84 @@ class 
TransformBatcher(): self._segment_labels = input_params.segment_labels self._global_batch_size = input_params.global_batch_size self._is_training = input_params.is_training + self._include_video_id = input_params.include_video_id + self._drop_remainder = input_params.drop_remainder def batch_fn(self, dataset, input_context): """Add padding when segment_labels is true.""" per_replica_batch_size = input_context.get_per_replica_batch_size( self._global_batch_size) if input_context else self._global_batch_size if not self._segment_labels: - dataset = dataset.batch(per_replica_batch_size, drop_remainder=True) + dataset = dataset.batch( + per_replica_batch_size, drop_remainder=self._drop_remainder) else: # add padding pad_shapes = { - "video_ids": [None], "video_matrix": [None, None, None], "labels": [None, None], "num_frames": [None, None], "label_weights": [None, None] } pad_values = { - "video_ids": None, "video_matrix": 0.0, "labels": -1.0, "num_frames": 0.0, "label_weights": 0.0 } + if self._include_video_id: + pad_shapes["video_ids"] = [None] + pad_values["video_ids"] = None dataset = dataset.padded_batch( per_replica_batch_size, padded_shapes=pad_shapes, - drop_remainder=True, + drop_remainder=self._drop_remainder, padding_values=pad_values) return dataset + + +class PostBatchProcessor(): + """Processes a video and label dataset which is batched.""" + + def __init__(self, input_params: exp_cfg.DataConfig): + self.segment_labels = input_params.segment_labels + self.num_classes = input_params.num_classes + self.segment_size = input_params.segment_size + self.num_features = sum(input_params.feature_sizes) + + def post_fn(self, batched_tensors: Dict[str, + tf.Tensor]) -> Dict[str, tf.Tensor]: + """Processes batched Tensors.""" + video_ids = batched_tensors.get("video_ids", None) + video_matrix = batched_tensors["video_matrix"] + labels = batched_tensors["labels"] + num_frames = batched_tensors["num_frames"] + + if self.segment_labels: + # [batch x num_segment x 
segment_size x num_features] + # -> [batch * num_segment x segment_size x num_features] + if video_ids is not None: + video_ids = tf.reshape(video_ids, [-1]) + video_matrix = tf.reshape(video_matrix, + [-1, self.segment_size, self.num_features]) + labels = tf.reshape(labels, [-1, self.num_classes]) + num_frames = tf.reshape(num_frames, [-1, 1]) + batched_tensors["label_weights"] = tf.reshape( + batched_tensors["label_weights"], [-1, self.num_classes]) + else: + # NOTE(b/237445211): Must provide axis argument to tf.squeeze. + video_matrix = tf.squeeze(video_matrix, axis=1) + labels = tf.squeeze(labels, axis=1) + num_frames = tf.reshape(num_frames, [-1, 1]) + if "label_weights" in batched_tensors: + batched_tensors["label_weights"] = tf.squeeze( + batched_tensors["label_weights"], axis=1) + + batched_tensors.update({ + "video_matrix": video_matrix, + "labels": labels, + "num_frames": num_frames, + }) + if video_ids is not None: + batched_tensors["video_ids"] = video_ids + + return batched_tensors diff --git a/official/projects/yt8m/dataloaders/yt8m_input_test.py b/official/projects/yt8m/dataloaders/yt8m_input_test.py new file mode 100644 index 0000000000000000000000000000000000000000..b89dbb9950b3071b56c122aab3f8b923298e9c59 --- /dev/null +++ b/official/projects/yt8m/dataloaders/yt8m_input_test.py @@ -0,0 +1,200 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os + +from absl import logging +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from official.core import input_reader +from official.projects.yt8m.configs import yt8m as yt8m_configs +from official.projects.yt8m.dataloaders import utils +from official.projects.yt8m.dataloaders import yt8m_input +from official.vision.dataloaders import tfexample_utils + + +class Yt8mInputTest(parameterized.TestCase, tf.test.TestCase): + + def setUp(self): + super().setUp() + self._model_dir = os.path.join(self.get_temp_dir(), 'model_dir') + tf.io.gfile.makedirs(self._model_dir) + + data_dir = os.path.join(self.get_temp_dir(), 'data') + tf.io.gfile.makedirs(data_dir) + self.data_path = os.path.join(data_dir, 'data.tfrecord') + self.num_segment = 6 + examples = [utils.make_yt8m_example(self.num_segment) for _ in range(8)] + tfexample_utils.dump_to_tfrecord(self.data_path, tf_examples=examples) + + def create_input_reader(self, params): + decoder = yt8m_input.Decoder(input_params=params) + decoder_fn = decoder.decode + parser = yt8m_input.Parser(input_params=params) + parser_fn = parser.parse_fn(params.is_training) + postprocess = yt8m_input.PostBatchProcessor(input_params=params) + postprocess_fn = postprocess.post_fn + transform_batch = yt8m_input.TransformBatcher(input_params=params) + batch_fn = transform_batch.batch_fn + + return input_reader.InputReader( + params, + dataset_fn=tf.data.TFRecordDataset, + decoder_fn=decoder_fn, + parser_fn=parser_fn, + postprocess_fn=postprocess_fn, + transform_and_batch_fn=batch_fn) + + @parameterized.parameters((True,), (False,)) + def test_read_video_level_input(self, include_video_id): + params = yt8m_configs.yt8m(is_training=False) + params.global_batch_size = 4 + params.segment_labels = False + params.input_path = self.data_path + params.include_video_id = include_video_id + reader = self.create_input_reader(params) + + dataset = reader.read() + iterator = iter(dataset) + example = 
next(iterator) + + for k, v in example.items(): + logging.info('DEBUG read example %r %r %r', k, v.shape, type(v)) + if include_video_id: + self.assertCountEqual( + ['video_matrix', 'labels', 'num_frames', 'video_ids'], example.keys()) + else: + self.assertCountEqual(['video_matrix', 'labels', 'num_frames'], + example.keys()) + batch_size = params.global_batch_size + self.assertEqual(example['video_matrix'].shape.as_list(), + [batch_size, params.max_frames, + sum(params.feature_sizes)]) + self.assertEqual(example['labels'].shape.as_list(), + [batch_size, params.num_classes]) + # Check non-empty labels. + self.assertGreater(np.nonzero(example['labels'][0].numpy())[0].shape[0], 0) + + self.assertEqual(example['num_frames'].shape.as_list(), [batch_size, 1]) + if include_video_id: + self.assertEqual(example['video_ids'].shape.as_list(), [batch_size, 1]) + + @parameterized.parameters((True,), (False,)) + def test_read_segment_level_input(self, include_video_id): + params = yt8m_configs.yt8m(is_training=False) + params.global_batch_size = 4 + params.segment_labels = True + params.input_path = self.data_path + params.include_video_id = include_video_id + reader = self.create_input_reader(params) + + dataset = reader.read() + iterator = iter(dataset) + example = next(iterator) + + for k, v in example.items(): + logging.info('DEBUG read example %r %r %r', k, v.shape, type(v)) + if include_video_id: + self.assertCountEqual([ + 'video_matrix', 'labels', 'num_frames', 'label_weights', 'video_ids' + ], example.keys()) + else: + self.assertCountEqual( + ['video_matrix', 'labels', 'num_frames', 'label_weights'], + example.keys()) + batch_size = params.global_batch_size * self.num_segment + self.assertEqual( + example['video_matrix'].shape.as_list(), + [batch_size, params.segment_size, + sum(params.feature_sizes)]) + self.assertEqual(example['labels'].shape.as_list(), + [batch_size, params.num_classes]) + 
self.assertGreater(np.nonzero(example['labels'][0].numpy())[0].shape[0], 0)
+ self.assertEqual(example['num_frames'].shape.as_list(), [batch_size, 1]) + self.assertEqual(example['label_weights'].shape.as_list(), + [batch_size, params.num_classes]) + if include_video_id: + self.assertEqual(example['video_ids'].shape.as_list(), [batch_size]) + + @parameterized.parameters((True,), (False,)) + def test_read_video_level_float_input(self, include_video_id): + data_dir = os.path.join(self.get_temp_dir(), 'data2') + tf.io.gfile.makedirs(data_dir) + data_path = os.path.join(data_dir, 'data2.tfrecord') + examples = [ + utils.make_example_with_float_features(self.num_segment) + for _ in range(8) + ] + tfexample_utils.dump_to_tfrecord(data_path, tf_examples=examples) + + params = yt8m_configs.yt8m(is_training=False) + params.global_batch_size = 4 + params.segment_labels = False + params.input_path = data_path + params.num_frames = 2 + params.max_frames = 2 + params.feature_names = ('VIDEO_EMBEDDING/context_feature/floats', + 'FEATURE/feature/floats') + params.feature_sources = ('context', 'feature') + params.feature_dtypes = ('float32', 'float32') + params.feature_sizes = (256, 2048) + params.feature_from_bytes = (False, False) + params.label_field = 'clip/label/index' + params.include_video_id = include_video_id + reader = self.create_input_reader(params) + + dataset = reader.read() + iterator = iter(dataset) + example = next(iterator) + + for k, v in example.items(): + logging.info('DEBUG read example %r %r %r', k, v.shape, type(v)) + logging.info('DEBUG read example %r', example['video_matrix'][0, 0, :]) + if include_video_id: + self.assertCountEqual( + ['video_matrix', 'labels', 'num_frames', 'video_ids'], example.keys()) + else: + self.assertCountEqual(['video_matrix', 'labels', 'num_frames'], + example.keys()) + + # Check tensor values. 
+ expected_context = examples[0].context.feature[ + 'VIDEO_EMBEDDING/context_feature/floats'].float_list.value + expected_feature = examples[0].feature_lists.feature_list[ + 'FEATURE/feature/floats'].feature[0].float_list.value + expected_labels = examples[0].context.feature[ + params.label_field].int64_list.value + self.assertAllEqual(expected_feature, + example['video_matrix'][0, 0, params.feature_sizes[0]:]) + self.assertAllEqual(expected_context, + example['video_matrix'][0, 0, :params.feature_sizes[0]]) + self.assertAllEqual( + np.nonzero(example['labels'][0, :].numpy())[0], expected_labels) + self.assertGreater(np.nonzero(example['labels'][0].numpy())[0].shape[0], 0) + + # Check tensor shape. + batch_size = params.global_batch_size + self.assertEqual(example['video_matrix'].shape.as_list(), + [batch_size, params.max_frames, + sum(params.feature_sizes)]) + self.assertEqual(example['labels'].shape.as_list(), + [batch_size, params.num_classes]) + self.assertEqual(example['num_frames'].shape.as_list(), [batch_size, 1]) + if include_video_id: + self.assertEqual(example['video_ids'].shape.as_list(), [batch_size, 1]) + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/yt8m/eval_utils/average_precision_calculator.py b/official/projects/yt8m/eval_utils/average_precision_calculator.py index 9bf1123793d5ee8f726687a9681936c7d49553bf..4e47962629e85d13344e2f7dea1373ecd97a53da 100644 --- a/official/projects/yt8m/eval_utils/average_precision_calculator.py +++ b/official/projects/yt8m/eval_utils/average_precision_calculator.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -268,6 +268,5 @@ class AveragePrecisionCalculator(object): The normalized prediction. 
""" denominator = numpy.max(predictions) - numpy.min(predictions) - ret = (predictions - numpy.min(predictions)) / numpy.max( - denominator, epsilon) + ret = (predictions - numpy.min(predictions)) / max(denominator, epsilon) return ret diff --git a/official/projects/yt8m/eval_utils/eval_util.py b/official/projects/yt8m/eval_utils/eval_util.py index 617aeda2c25924d2a09ffef3f57f6489de4c82cf..9fdb9e608344396e08680a29c815b85916c121d4 100644 --- a/official/projects/yt8m/eval_utils/eval_util.py +++ b/official/projects/yt8m/eval_utils/eval_util.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -13,6 +13,7 @@ # limitations under the License. """Provides functions to help with evaluating models.""" +import logging import numpy as np import tensorflow as tf from official.projects.yt8m.eval_utils import average_precision_calculator as ap_calculator @@ -57,6 +58,9 @@ def calculate_precision_at_equal_recall_rate(predictions, actuals): """ aggregated_precision = 0.0 num_videos = actuals.shape[0] + if num_videos == 0: + logging.warning("Num_videos is 0, returning 0.0 aggregated_precision.") + return aggregated_precision for row in np.arange(num_videos): num_labels = int(np.sum(actuals[row])) top_indices = np.argpartition(predictions[row], -num_labels)[-num_labels:] @@ -99,8 +103,8 @@ def top_k_by_class(predictions, labels, k=20): Args: predictions: A numpy matrix containing the outputs of the model. Dimensions are 'batch' x 'num_classes'. - labels: A numpy matrix containing the ground truth labels. - Dimensions are 'batch' x 'num_classes'. + labels: A numpy matrix containing the ground truth labels. Dimensions are + 'batch' x 'num_classes'. k: the top k non-zero entries to preserve in each prediction. 
Returns: @@ -139,9 +143,10 @@ def top_k_triplets(predictions, labels, k=20): Args: predictions: A numpy matrix containing the outputs of the model. Dimensions are 'batch' x 'num_classes'. - labels: A numpy matrix containing the ground truth labels. - Dimensions are 'batch' x 'num_classes'. + labels: A numpy matrix containing the ground truth labels. Dimensions are + 'batch' x 'num_classes'. k: The number top predictions to pick. + Returns: a sparse list of tuples in (prediction, class) format. """ @@ -171,7 +176,7 @@ class EvaluationMetrics(object): self.sum_hit_at_one = 0.0 self.sum_perr = 0.0 self.map_calculator = map_calculator.MeanAveragePrecisionCalculator( - num_class, top_n=top_n) + num_class, filter_empty_classes=False, top_n=top_n) self.global_ap_calculator = ap_calculator.AveragePrecisionCalculator() self.top_k = top_k self.num_examples = 0 @@ -213,9 +218,13 @@ class EvaluationMetrics(object): return {"hit_at_one": mean_hit_at_one, "perr": mean_perr} - def get(self): + def get(self, return_per_class_ap=False): """Calculate the evaluation metrics for the whole epoch. + Args: + return_per_class_ap: whether to also return the per-class AP values for + more detailed analysis. Defaults to `False`. + Raises: ValueError: If no examples were accumulated. @@ -239,6 +248,10 @@ class EvaluationMetrics(object): "map": mean_ap, "gap": gap } + + if return_per_class_ap: + epoch_info_dict["per_class_ap"] = aps + return epoch_info_dict def clear(self): diff --git a/official/projects/yt8m/eval_utils/eval_util_test.py b/official/projects/yt8m/eval_utils/eval_util_test.py new file mode 100644 index 0000000000000000000000000000000000000000..8a1431e9426f8579cebf71251d3ad5aee59b4ada --- /dev/null +++ b/official/projects/yt8m/eval_utils/eval_util_test.py @@ -0,0 +1,70 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from absl import logging +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from official.projects.yt8m.eval_utils.average_precision_calculator import AveragePrecisionCalculator + + +class YT8MAveragePrecisionCalculatorTest(parameterized.TestCase, + tf.test.TestCase): + + def setUp(self): + super().setUp() + # Rows have different lengths, so an object array is required. + self.prediction = np.array([ + [0.98, 0.88, 0.77, 0.65, 0.64, 0.59, 0.45, 0.43, 0.20, 0.05], + [0.878, 0.832, 0.759, 0.621, 0.458, 0.285, 0.134], + [0.98], + [0.56], + ], dtype=object) + self.raw_prediction = np.random.rand(5, 10) + np.random.randint( + low=0, high=10, size=(5, 10)) + self.ground_truth = np.array([[1, 1, 0, 0, 0, 1, 1, 0, 0, 1], + [1, 0, 1, 0, 0, 1, 0], [1], [0]], dtype=object) + + self.expected_ap = np.array([ + 0.714, + 0.722, + 1.000, + 0.000, + ]) + + def test_ap_calculator_ap(self): + + # Compare the computed average precision with the expected values. 
+ for i, _ in enumerate(self.ground_truth): + calculator = AveragePrecisionCalculator() + ap = calculator.ap(self.prediction[i], self.ground_truth[i]) + logging.info('DEBUG %dth AP: %r', i + 1, ap) + self.assertAlmostEqual(ap, self.expected_ap[i], delta=1e-3) + + def test_ap_calculator_zero_one_normalize(self): + for i, _ in enumerate(self.raw_prediction): + calculator = AveragePrecisionCalculator() + logging.info('%r', self.raw_prediction[i]) + normalized_score = calculator._zero_one_normalize(self.raw_prediction[i]) + self.assertAllInRange(normalized_score, lower_bound=0.0, upper_bound=1.0) +
@parameterized.parameters((None,), (3,), (5,), (10,), (20,)) + def test_ap_calculator_ap_at_n(self, n): + for i, _ in enumerate(self.ground_truth): + calculator = AveragePrecisionCalculator(n) + ap = calculator.ap_at_n(self.prediction[i], self.ground_truth[i], n) + logging.info('DEBUG %dth AP: %r', i + 1, ap) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/projects/yt8m/eval_utils/mean_average_precision_calculator.py b/official/projects/yt8m/eval_utils/mean_average_precision_calculator.py index 6004522195c2a036a16aebb63ceef794794eea18..a5ed00000c98f5c6173de7850335aa0370746fe2 100644 --- a/official/projects/yt8m/eval_utils/mean_average_precision_calculator.py +++ b/official/projects/yt8m/eval_utils/mean_average_precision_calculator.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
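The `average_precision_calculator.py` change above replaces `numpy.max(denominator, epsilon)` with the built-in `max(denominator, epsilon)`. The second positional argument of `numpy.max` is `axis`, so the old call would interpret `epsilon` as an axis rather than as a lower bound. The sketch below is a minimal standalone illustration of that normalization together with a non-interpolated AP@n. It assumes the same ranking semantics that the new test's `expected_ap` values imply. The function names are hypothetical and are not the module's API. Unlike the library, it breaks ties with a stable sort instead of shuffling.

```python
import numpy as np


def zero_one_normalize(predictions, epsilon=1e-7):
  """Rescales scores to [0, 1]; epsilon guards against a zero range."""
  predictions = np.asarray(predictions, dtype=np.float64)
  denominator = np.max(predictions) - np.min(predictions)
  # Built-in max() compares two scalars; np.max(x, eps) would instead treat
  # eps as the `axis` argument.
  return (predictions - np.min(predictions)) / max(denominator, epsilon)


def average_precision_at_n(predictions, actuals, n=None):
  """Non-interpolated AP over the top-n ranked predictions (hypothetical)."""
  predictions = np.asarray(predictions, dtype=np.float64)
  actuals = np.asarray(actuals)
  # Rank by descending score; a stable sort keeps ties in input order.
  order = np.argsort(-predictions, kind="stable")
  num_pos = int(np.sum(actuals > 0))
  if num_pos == 0:
    return 0.0  # No positives: AP is defined as 0 here.
  if n is not None:
    num_pos = min(num_pos, n)
    order = order[:n]
  hits = 0
  ap = 0.0
  for rank, idx in enumerate(order, start=1):
    if actuals[idx] > 0:
      hits += 1
      ap += hits / rank  # Precision at each recall step.
  return ap / num_pos
```

For the first test row, the positives sit at ranks 1, 2, 6, 7 and 10, so AP = (1 + 1 + 3/6 + 4/7 + 5/10) / 5 = 5/7 ≈ 0.714. That matches `expected_ap[0]` in the test.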
diff --git a/official/projects/yt8m/experiments/yt8m.yaml b/official/projects/yt8m/experiments/yt8m.yaml index c099f23f90b3c94b874e60713b63de6ebd1c1c3a..c4d2ed2ea50d7a0ac240f2f57f8d80d51d9b6bc6 100644 --- a/official/projects/yt8m/experiments/yt8m.yaml +++ b/official/projects/yt8m/experiments/yt8m.yaml @@ -27,7 +27,6 @@ task: num_devices: 1 input_path: 'gs://youtube8m-ml/2/frame/train/train*.tfrecord' is_training: true - random_seed: 123 validation_data: name: 'yt8m' split: 'train' @@ -46,7 +45,6 @@ task: num_devices: 1 input_path: 'gs://youtube8m-ml/3/frame/validate/validate*.tfrecord' is_training: false - random_seed: 123 losses: name: 'binary_crossentropy' from_logits: false diff --git a/official/projects/yt8m/modeling/__init__.py b/official/projects/yt8m/modeling/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/projects/yt8m/modeling/__init__.py +++ b/official/projects/yt8m/modeling/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/yt8m/modeling/nn_layers.py b/official/projects/yt8m/modeling/nn_layers.py new file mode 100644 index 0000000000000000000000000000000000000000..67638db3a8cf254868d646d99381a8a633170bac --- /dev/null +++ b/official/projects/yt8m/modeling/nn_layers.py @@ -0,0 +1,119 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains model definitions.""" +from typing import Any, Dict, Optional + +import tensorflow as tf +from official.projects.yt8m.modeling import yt8m_model_utils as utils + +layers = tf.keras.layers + + +class LogisticModel(): + """Logistic model with L2 regularization.""" + + def create_model(self, model_input, vocab_size, l2_penalty=1e-8): + """Creates a logistic model. + + Args: + model_input: 'batch' x 'num_features' matrix of input features. + vocab_size: The number of classes in the dataset. + l2_penalty: L2 weight regularization ratio. + + Returns: + A dictionary with a tensor containing the probability predictions of the + model in the 'predictions' key. The dimensions of the tensor are + batch_size x num_classes. + """ + output = layers.Dense( + vocab_size, + activation=tf.nn.sigmoid, + kernel_regularizer=tf.keras.regularizers.l2(l2_penalty))( + model_input) + return {"predictions": output} + + +class MoeModel(): + """A softmax over a mixture of logistic models (with L2 regularization).""" + + def create_model(self, + model_input, + vocab_size, + num_mixtures: int = 2, + use_input_context_gate: bool = False, + use_output_context_gate: bool = False, + normalizer_fn=None, + normalizer_params: Optional[Dict[str, Any]] = None, + l2_penalty: float = 1e-5): + """Creates a Mixture of (Logistic) Experts model. + + The model consists of a per-class softmax distribution over a + configurable number of logistic classifiers. One of the classifiers + in the mixture is not trained, and always predicts 0. 
+ Args: + model_input: 'batch_size' x 'num_features' matrix of input features. + vocab_size: The number of classes in the dataset. + num_mixtures: The number of mixtures (excluding a dummy 'expert' that + always predicts the non-existence of an entity). + use_input_context_gate: if True apply context gate layer to the input. + use_output_context_gate: if True apply context gate layer to the output. + normalizer_fn: normalization op constructor (e.g. batch norm). + normalizer_params: parameters to the `normalizer_fn`. + l2_penalty: How much to penalize the squared magnitudes of parameter + values. + + Returns: + A dictionary with a tensor containing the probability predictions + of the model in the 'predictions' key. The dimensions of the tensor + are batch_size x num_classes. + """ + if use_input_context_gate: + model_input = utils.context_gate( + model_input, + normalizer_fn=normalizer_fn, + normalizer_params=normalizer_params, + ) + + gate_activations = layers.Dense( + vocab_size * (num_mixtures + 1), + activation=None, + bias_initializer=None, + kernel_regularizer=tf.keras.regularizers.l2(l2_penalty))( + model_input) + expert_activations = layers.Dense( + vocab_size * num_mixtures, + activation=None, + kernel_regularizer=tf.keras.regularizers.l2(l2_penalty))( + model_input) + + gating_distribution = tf.nn.softmax( + tf.reshape( + gate_activations, + [-1, num_mixtures + 1])) # (Batch * #Labels) x (num_mixtures + 1) + expert_distribution = tf.nn.sigmoid( + tf.reshape(expert_activations, + [-1, num_mixtures])) # (Batch * #Labels) x num_mixtures + + final_probabilities_by_class_and_batch = tf.reduce_sum( + gating_distribution[:, :num_mixtures] * expert_distribution, 1) + final_probabilities = tf.reshape(final_probabilities_by_class_and_batch, + [-1, vocab_size]) + if use_output_context_gate: + final_probabilities = utils.context_gate( + final_probabilities, + normalizer_fn=normalizer_fn, + normalizer_params=normalizer_params, + ) + return {"predictions": 
final_probabilities} diff --git a/official/projects/yt8m/modeling/yt8m_agg_models.py b/official/projects/yt8m/modeling/yt8m_agg_models.py deleted file mode 100644 index 0e46fc5fd4562834473cca572345f0675233e1d2..0000000000000000000000000000000000000000 --- a/official/projects/yt8m/modeling/yt8m_agg_models.py +++ /dev/null @@ -1,119 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Contains model definitions.""" -from typing import Any, Dict, Optional - -import tensorflow as tf -from official.projects.yt8m.modeling import yt8m_model_utils as utils - -layers = tf.keras.layers - - -class LogisticModel(): - """Logistic model with L2 regularization.""" - - def create_model(self, model_input, vocab_size, l2_penalty=1e-8): - """Creates a logistic model. - - Args: - model_input: 'batch' x 'num_features' matrix of input features. - vocab_size: The number of classes in the dataset. - l2_penalty: L2 weight regularization ratio. - - Returns: - A dictionary with a tensor containing the probability predictions of the - model in the 'predictions' key. The dimensions of the tensor are - batch_size x num_classes. 
- """ - output = layers.Dense( - vocab_size, - activation=tf.nn.sigmoid, - kernel_regularizer=tf.keras.regularizers.l2(l2_penalty))( - model_input) - return {"predictions": output} - - -class MoeModel(): - """A softmax over a mixture of logistic models (with L2 regularization).""" - - def create_model(self, - model_input, - vocab_size, - num_mixtures: int = 2, - use_input_context_gate: bool = False, - use_output_context_gate: bool = False, - normalizer_fn=None, - normalizer_params: Optional[Dict[str, Any]] = None, - l2_penalty: float = 1e-5): - """Creates a Mixture of (Logistic) Experts model. - - The model consists of a per-class softmax distribution over a - configurable number of logistic classifiers. One of the classifiers - in the mixture is not trained, and always predicts 0. - Args: - model_input: 'batch_size' x 'num_features' matrix of input features. - vocab_size: The number of classes in the dataset. - num_mixtures: The number of mixtures (excluding a dummy 'expert' that - always predicts the non-existence of an entity). - use_input_context_gate: if True apply context gate layer to the input. - use_output_context_gate: if True apply context gate layer to the output. - normalizer_fn: normalization op constructor (e.g. batch norm). - normalizer_params: parameters to the `normalizer_fn`. - l2_penalty: How much to penalize the squared magnitudes of parameter - values. - - Returns: - A dictionary with a tensor containing the probability predictions - of the model in the 'predictions' key. The dimensions of the tensor - are batch_size x num_classes. 
- """ - if use_input_context_gate: - model_input = utils.context_gate( - model_input, - normalizer_fn=normalizer_fn, - normalizer_params=normalizer_params, - ) - - gate_activations = layers.Dense( - vocab_size * (num_mixtures + 1), - activation=None, - bias_initializer=None, - kernel_regularizer=tf.keras.regularizers.l2(l2_penalty))( - model_input) - expert_activations = layers.Dense( - vocab_size * num_mixtures, - activation=None, - kernel_regularizer=tf.keras.regularizers.l2(l2_penalty))( - model_input) - - gating_distribution = tf.nn.softmax( - tf.reshape( - gate_activations, - [-1, num_mixtures + 1])) # (Batch * #Labels) x (num_mixtures + 1) - expert_distribution = tf.nn.sigmoid( - tf.reshape(expert_activations, - [-1, num_mixtures])) # (Batch * #Labels) x num_mixtures - - final_probabilities_by_class_and_batch = tf.reduce_sum( - gating_distribution[:, :num_mixtures] * expert_distribution, 1) - final_probabilities = tf.reshape(final_probabilities_by_class_and_batch, - [-1, vocab_size]) - if use_output_context_gate: - final_probabilities = utils.context_gate( - final_probabilities, - normalizer_fn=normalizer_fn, - normalizer_params=normalizer_params, - ) - return {"predictions": final_probabilities} diff --git a/official/projects/yt8m/modeling/yt8m_model.py b/official/projects/yt8m/modeling/yt8m_model.py index 2259c9ece5403f4cc6e1b5bd49d12473cc353e6b..84668f05d119380bd4e9c56ddee02d748d4343c3 100644 --- a/official/projects/yt8m/modeling/yt8m_model.py +++ b/official/projects/yt8m/modeling/yt8m_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -16,9 +16,10 @@ from typing import Optional import tensorflow as tf + from official.modeling import tf_utils from official.projects.yt8m.configs import yt8m as yt8m_cfg -from official.projects.yt8m.modeling import yt8m_agg_models +from official.projects.yt8m.modeling import nn_layers from official.projects.yt8m.modeling import yt8m_model_utils as utils layers = tf.keras.layers @@ -38,9 +39,10 @@ class DbofModel(tf.keras.Model): def __init__( self, params: yt8m_cfg.DbofModel, - num_frames=30, - num_classes=3862, - input_specs=layers.InputSpec(shape=[None, None, 1152]), + num_frames: int = 30, + num_classes: int = 3862, + input_specs: layers.InputSpec = layers.InputSpec( + shape=[None, None, 1152]), kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, activation: str = "relu", use_sync_bn: bool = False, @@ -63,12 +65,11 @@ class DbofModel(tf.keras.Model): norm_epsilon: A `float` added to variance to avoid dividing by zero. **kwargs: keyword arguments to be passed. """ - + del num_frames self._self_setattr_tracking = False self._config_dict = { "input_specs": input_specs, "num_classes": num_classes, - "num_frames": num_frames, "params": params } self._num_classes = num_classes @@ -78,26 +79,24 @@ class DbofModel(tf.keras.Model): self._norm = layers.experimental.SyncBatchNormalization else: self._norm = layers.BatchNormalization - if tf.keras.backend.image_data_format() == "channels_last": - bn_axis = -1 - else: - bn_axis = 1 + bn_axis = -1 # [batch_size x num_frames x num_features] feature_size = input_specs.shape[-1] # shape 'excluding' batch_size model_input = tf.keras.Input(shape=self._input_specs.shape[1:]) - reshaped_input = tf.reshape(model_input, [-1, feature_size]) - tf.summary.histogram("input_hist", model_input) + # normalize input features + input_data = tf.nn.l2_normalize(model_input, -1) + tf.summary.histogram("input_hist", input_data) # configure model if params.add_batch_norm: - reshaped_input = self._norm( + input_data = 
self._norm( axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon, name="input_bn")( - reshaped_input) + input_data) # activation = reshaped input * cluster weights if params.cluster_size > 0: @@ -106,7 +105,7 @@ class DbofModel(tf.keras.Model): kernel_regularizer=kernel_regularizer, kernel_initializer=tf.random_normal_initializer( stddev=1 / tf.sqrt(tf.cast(feature_size, tf.float32))))( - reshaped_input) + input_data) if params.add_batch_norm: activation = self._norm( @@ -140,7 +139,7 @@ class DbofModel(tf.keras.Model): pooling_method=pooling_method, hidden_layer_size=params.context_gate_cluster_bottleneck_size, kernel_regularizer=kernel_regularizer) - activation = tf.reshape(activation, [-1, num_frames, params.cluster_size]) + activation = utils.frame_pooling(activation, params.pooling_method) # activation = activation * hidden1_weights @@ -170,7 +169,7 @@ class DbofModel(tf.keras.Model): activation = self._act_fn(activation) tf.summary.histogram("hidden1_output", activation) - aggregated_model = getattr(yt8m_agg_models, + aggregated_model = getattr(nn_layers, params.yt8m_agg_classifier_model) norm_args = dict(axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon) output = aggregated_model().create_model( diff --git a/official/projects/yt8m/modeling/yt8m_model_test.py b/official/projects/yt8m/modeling/yt8m_model_test.py index 3c957618bba2b7b6dff97f15e66865d895a832a5..b204ec6cf91f0849229bd43bd9f0d7e8afedd305 100644 --- a/official/projects/yt8m/modeling/yt8m_model_test.py +++ b/official/projects/yt8m/modeling/yt8m_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -34,7 +34,7 @@ class YT8MNetworkTest(parameterized.TestCase, tf.test.TestCase): num_frames: number of frames. 
feature_dims: indicates total dimension size of the features. """ - input_specs = tf.keras.layers.InputSpec(shape=[num_frames, feature_dims]) + input_specs = tf.keras.layers.InputSpec(shape=[None, None, feature_dims]) num_classes = 3862 model = yt8m_model.DbofModel( @@ -44,7 +44,7 @@ class YT8MNetworkTest(parameterized.TestCase, tf.test.TestCase): input_specs=input_specs) # batch = 2 -> arbitrary value for test - inputs = np.random.rand(2 * num_frames, feature_dims) + inputs = np.random.rand(2, num_frames, feature_dims) logits = model(inputs) self.assertAllEqual([2, num_classes], logits.numpy().shape) diff --git a/official/projects/yt8m/modeling/yt8m_model_utils.py b/official/projects/yt8m/modeling/yt8m_model_utils.py index c8497e9c1a59c27b4f9d2e1631e68e61d236cc18..d56fe44ba9e47de5a6b448301667109b5a5ce02a 100644 --- a/official/projects/yt8m/modeling/yt8m_model_utils.py +++ b/official/projects/yt8m/modeling/yt8m_model_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/projects/yt8m/tasks/__init__.py b/official/projects/yt8m/tasks/__init__.py index fe6bc09d2cd80d575ce40326be41a0a6d0220d9e..85df31a45b8c1dee9b096e975c519963e796c07e 100644 --- a/official/projects/yt8m/tasks/__init__.py +++ b/official/projects/yt8m/tasks/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/projects/yt8m/tasks/yt8m_task.py b/official/projects/yt8m/tasks/yt8m_task.py index 1c7ee005336275a29b8ed791dacac46b367c3845..0a1b82a6767106a3ee8bb4da9ae6ccf1ef395380 100644 --- a/official/projects/yt8m/tasks/yt8m_task.py +++ b/official/projects/yt8m/tasks/yt8m_task.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -13,6 +13,8 @@ # limitations under the License. """Video classification task definition.""" +from typing import Dict, List, Optional, Tuple + from absl import logging import tensorflow as tf @@ -95,31 +97,46 @@ class YT8MTask(base_task.Task): return dataset - def build_losses(self, labels, model_outputs, aux_losses=None): + def build_losses(self, + labels, + model_outputs, + label_weights=None, + aux_losses=None): """Sigmoid Cross Entropy. Args: labels: tensor containing truth labels. model_outputs: output logits of the classifier. + label_weights: optional tensor of label weights. aux_losses: tensor containing auxiliarly loss tensors, i.e. `losses` in keras.Model. Returns: - Tensors: The total loss, model loss tensors. + A dict of tensors containing the total loss and model loss tensors. """ losses_config = self.task_config.losses model_loss = tf.keras.losses.binary_crossentropy( labels, model_outputs, from_logits=losses_config.from_logits, - label_smoothing=losses_config.label_smoothing) + label_smoothing=losses_config.label_smoothing, + axis=None) + + if label_weights is None: + model_loss = tf_utils.safe_mean(model_loss) + else: + model_loss = model_loss * label_weights + # Manually compute weighted mean loss.
+ total_loss = tf.reduce_sum(model_loss) + total_weight = tf.cast( + tf.reduce_sum(label_weights), dtype=total_loss.dtype) + model_loss = tf.math.divide_no_nan(total_loss, total_weight) - model_loss = tf_utils.safe_mean(model_loss) total_loss = model_loss if aux_losses: total_loss += tf.add_n(aux_losses) - return total_loss, model_loss + return {'total_loss': total_loss, 'model_loss': model_loss} def build_metrics(self, training=True): """Gets streaming metrics for training/validation. @@ -130,10 +147,10 @@ class YT8MTask(base_task.Task): top_n: A positive Integer specifying the average precision at n, or None to use all provided data points. Args: - training: bool value, true for training mode, false for eval/validation. + training: Bool value, true for training mode, false for eval/validation. Returns: - list of strings that indicate metrics to be used + A list of strings that indicate metrics to be used. """ metrics = [] metric_names = ['total_loss', 'model_loss'] @@ -149,15 +166,48 @@ class YT8MTask(base_task.Task): return metrics + def process_metrics(self, + metrics: List[tf.keras.metrics.Metric], + labels: tf.Tensor, + outputs: tf.Tensor, + model_losses: Optional[Dict[str, tf.Tensor]] = None, + label_weights: Optional[tf.Tensor] = None, + training: bool = True, + **kwargs) -> Dict[str, Tuple[tf.Tensor, ...]]: + """Updates metrics. + + Args: + metrics: Evaluation metrics to be updated. + labels: A tensor containing truth labels. + outputs: Model output logits of the classifier. + model_losses: An optional dict of model losses. + label_weights: Optional label weights, can be broadcast into shape of + outputs/labels. + training: Bool indicates if in training mode. + **kwargs: Additional input arguments. + + Returns: + Updated dict of metrics log. 
+ """ + if model_losses is None: + model_losses = {} + + logs = {} + if not training: + logs.update({self.avg_prec_metric.name: (labels, outputs)}) + + for m in metrics: + m.update_state(model_losses[m.name]) + logs[m.name] = m.result() + return logs + def train_step(self, inputs, model, optimizer, metrics=None): """Does forward and backward. Args: - inputs: a dictionary of input tensors. output_dict = { - "video_ids": batch_video_ids, - "video_matrix": batch_video_matrix, - "labels": batch_labels, - "num_frames": batch_frames, } + inputs: a dictionary of input tensors. output_dict = { "video_ids": + batch_video_ids, "video_matrix": batch_video_matrix, "labels": + batch_labels, "num_frames": batch_frames, } model: the model, forward pass definition. optimizer: the optimizer for this training step. metrics: a nested structure of metrics objects. @@ -167,10 +217,7 @@ class YT8MTask(base_task.Task): """ features, labels = inputs['video_matrix'], inputs['labels'] num_frames = inputs['num_frames'] - - # Normalize input features. - feature_dim = len(features.shape) - 1 - features = tf.nn.l2_normalize(features, feature_dim) + label_weights = inputs.get('label_weights', None) # sample random frames / random sequence num_frames = tf.cast(num_frames, tf.float32) @@ -187,26 +234,28 @@ class YT8MTask(base_task.Task): # Casting output layer as float32 is necessary when mixed_precision is # mixed_float16 or mixed_bfloat16 to ensure output is casted as float32. outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) - # Computes per-replica loss - loss, model_loss = self.build_losses( - model_outputs=outputs, labels=labels, aux_losses=model.losses) + all_losses = self.build_losses( + model_outputs=outputs, + labels=labels, + label_weights=label_weights, + aux_losses=model.losses) + + loss = all_losses['total_loss'] # Scales loss as the default gradients allreduce performs sum inside the # optimizer. 
scaled_loss = loss / num_replicas # For mixed_precision policy, when LossScaleOptimizer is used, loss is # scaled for numerical stability. - if isinstance(optimizer, - tf.keras.mixed_precision.LossScaleOptimizer): + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): scaled_loss = optimizer.get_scaled_loss(scaled_loss) tvars = model.trainable_variables grads = tape.gradient(scaled_loss, tvars) # Scales back gradient before apply_gradients when LossScaleOptimizer is # used. - if isinstance(optimizer, - tf.keras.mixed_precision.LossScaleOptimizer): + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): grads = optimizer.get_unscaled_gradients(grads) # Apply gradient clipping. @@ -217,12 +266,14 @@ class YT8MTask(base_task.Task): logs = {self.loss: loss} - all_losses = {'total_loss': loss, 'model_loss': model_loss} - - if metrics: - for m in metrics: - m.update_state(all_losses[m.name]) - logs.update({m.name: m.result()}) + logs.update( + self.process_metrics( + metrics, + labels=labels, + outputs=outputs, + model_losses=all_losses, + label_weights=label_weights, + training=True)) return logs @@ -230,11 +281,9 @@ class YT8MTask(base_task.Task): """Validatation step. Args: - inputs: a dictionary of input tensors. output_dict = { - "video_ids": batch_video_ids, - "video_matrix": batch_video_matrix, - "labels": batch_labels, - "num_frames": batch_frames, } + inputs: a dictionary of input tensors. output_dict = { "video_ids": + batch_video_ids, "video_matrix": batch_video_matrix, "labels": + batch_labels, "num_frames": batch_frames, } model: the model, forward definition metrics: a nested structure of metrics objects. @@ -243,10 +292,7 @@ class YT8MTask(base_task.Task): """ features, labels = inputs['video_matrix'], inputs['labels'] num_frames = inputs['num_frames'] - - # Normalize input features. 
- feature_dim = len(features.shape) - 1 - features = tf.nn.l2_normalize(features, feature_dim) + label_weights = inputs.get('label_weights', None) # sample random frames (None, 5, 1152) -> (None, 30, 1152) sample_frames = self.task_config.validation_data.num_frames @@ -260,23 +306,28 @@ class YT8MTask(base_task.Task): outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) if self.task_config.validation_data.segment_labels: # workaround to ignore the unrated labels. - outputs *= inputs['label_weights'] + outputs *= label_weights # remove padding outputs = outputs[~tf.reduce_all(labels == -1, axis=1)] labels = labels[~tf.reduce_all(labels == -1, axis=1)] - loss, model_loss = self.build_losses( - model_outputs=outputs, labels=labels, aux_losses=model.losses) - logs = {self.loss: loss} + all_losses = self.build_losses( + labels=labels, + model_outputs=outputs, + label_weights=label_weights, + aux_losses=model.losses) - all_losses = {'total_loss': loss, 'model_loss': model_loss} + logs = {self.loss: all_losses['total_loss']} - logs.update({self.avg_prec_metric.name: (labels, outputs)}) + logs.update( + self.process_metrics( + metrics, + labels=labels, + outputs=outputs, + model_losses=all_losses, + label_weights=inputs.get('label_weights', None), + training=False)) - if metrics: - for m in metrics: - m.update_state(all_losses[m.name]) - logs.update({m.name: m.result()}) return logs def inference_step(self, inputs, model): diff --git a/official/projects/yt8m/train.py b/official/projects/yt8m/train.py index e3b1abe031643926a5a73c2723e360505074bcb9..2145a1826510186b8cdb96db5525b0afcba000a3 100644 --- a/official/projects/yt8m/train.py +++ b/official/projects/yt8m/train.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -21,7 +21,7 @@ from official.common import flags as tfm_flags from official.projects.yt8m.configs import yt8m from official.projects.yt8m.tasks import yt8m_task # pylint: enable=unused-import -from official.vision.beta import train +from official.vision import train if __name__ == '__main__': diff --git a/official/projects/yt8m/train_test.py b/official/projects/yt8m/train_test.py index 2699d36a3ca306e09ac903106d401e0b07b3f7c2..773a6385440d2d54fd8bfba355d94c76103e4383 100644 --- a/official/projects/yt8m/train_test.py +++ b/official/projects/yt8m/train_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,50 +12,37 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 - import json import os from absl import flags from absl.testing import flagsaver -import numpy as np +from absl.testing import parameterized import tensorflow as tf from official.projects.yt8m import train as train_lib -from official.vision.beta.dataloaders import tfexample_utils +from official.projects.yt8m.dataloaders import utils +from official.vision.dataloaders import tfexample_utils FLAGS = flags.FLAGS -def make_yt8m_example(): - rgb = np.random.randint(low=256, size=1024, dtype=np.uint8) - audio = np.random.randint(low=256, size=128, dtype=np.uint8) - - seq_example = tf.train.SequenceExample() - seq_example.context.feature['id'].bytes_list.value[:] = [b'id001'] - seq_example.context.feature['labels'].int64_list.value[:] = [1, 2, 3, 4] - tfexample_utils.put_bytes_list_to_feature( - seq_example, rgb.tobytes(), key='rgb', repeat_num=120) - tfexample_utils.put_bytes_list_to_feature( - seq_example, audio.tobytes(), key='audio', repeat_num=120) - - return seq_example - - -class 
TrainTest(tf.test.TestCase): +class TrainTest(parameterized.TestCase, tf.test.TestCase): def setUp(self): - super(TrainTest, self).setUp() + super().setUp() self._model_dir = os.path.join(self.get_temp_dir(), 'model_dir') tf.io.gfile.makedirs(self._model_dir) data_dir = os.path.join(self.get_temp_dir(), 'data') tf.io.gfile.makedirs(data_dir) self._data_path = os.path.join(data_dir, 'data.tfrecord') - examples = [make_yt8m_example() for _ in range(8)] + examples = [utils.make_yt8m_example() for _ in range(8)] tfexample_utils.dump_to_tfrecord(self._data_path, tf_examples=examples) - def test_run(self): + @parameterized.named_parameters( + dict(testcase_name='segment', use_segment_level_labels=True), + dict(testcase_name='video', use_segment_level_labels=False)) + def test_train_and_eval(self, use_segment_level_labels): saved_flag_values = flagsaver.save_flag_values() train_lib.tfm_flags.define_flags() FLAGS.mode = 'train' @@ -88,13 +75,15 @@ class TrainTest(tf.test.TestCase): }, 'validation_data': { 'input_path': self._data_path, + 'segment_labels': use_segment_level_labels, 'global_batch_size': 4, } } }) FLAGS.params_override = params_override - train_lib.train.main('unused_args') + with train_lib.train.gin.unlock_config(): + train_lib.train.main('unused_args') FLAGS.mode = 'eval' diff --git a/official/recommendation/README.md b/official/recommendation/README.md index ea2abfadcab2902025ff65e1797ab38646f79082..59c73b5a8a4aacc84cbe6fe83d2966b2576b2836 100644 --- a/official/recommendation/README.md +++ b/official/recommendation/README.md @@ -17,7 +17,7 @@ Some abbreviations used the code base include: - ml-20m: MovieLens 20 million dataset ## Dataset -The [MovieLens datasets](http://files.grouplens.org/datasets/movielens/) are used for model training and evaluation. Specifically, we use two datasets: **ml-1m** (short for MovieLens 1 million) and **ml-20m** (short for MovieLens 20 million). 
+The [MovieLens datasets](https://files.grouplens.org/datasets/movielens/) are used for model training and evaluation. Specifically, we use two datasets: **ml-1m** (short for MovieLens 1 million) and **ml-20m** (short for MovieLens 20 million). ### ml-1m ml-1m dataset contains 1,000,209 anonymous ratings of approximately 3,706 movies made by 6,040 users who joined MovieLens in 2000. All ratings are contained in the file "ratings.dat" without header row, and are in the following format: diff --git a/official/recommendation/__init__.py b/official/recommendation/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/recommendation/__init__.py +++ b/official/recommendation/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/constants.py b/official/recommendation/constants.py index a7aae736c2dd36b0f5e321f741f6bb9b75f8e95c..bfbcf52ccceef56cd35ef83e7e9b9dfbf2265168 100644 --- a/official/recommendation/constants.py +++ b/official/recommendation/constants.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/create_ncf_data.py b/official/recommendation/create_ncf_data.py index bc411cbd8b03380baf9ef0e3e9481a4b97a90b66..013d0499740ff42011ed3eaf4b65e3c624dc7aa2 100644 --- a/official/recommendation/create_ncf_data.py +++ b/official/recommendation/create_ncf_data.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/data_pipeline.py b/official/recommendation/data_pipeline.py index dae2d44a1dc972feb7d4dcb6f2e0dc8b093cb525..78f2a892f275e0a60ead5ad15c1e400e5cdb5ecf 100644 --- a/official/recommendation/data_pipeline.py +++ b/official/recommendation/data_pipeline.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/data_preprocessing.py b/official/recommendation/data_preprocessing.py index d14bf6ae24dcafe059a2f7a46d59a508ecc64f0f..394935acf268fd14ca0046d7bcbf56fefabe9b7e 100644 --- a/official/recommendation/data_preprocessing.py +++ b/official/recommendation/data_preprocessing.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/data_test.py b/official/recommendation/data_test.py index 31e0ae4d2113cde0191c36223537b63957ecbfa9..841be8e5818cc958532f3e03bd09dfc554ab826a 100644 --- a/official/recommendation/data_test.py +++ b/official/recommendation/data_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/recommendation/movielens.py b/official/recommendation/movielens.py index f50820e1fec2021c85fda4fe37e0bc9c78b9a249..fb9e595176cb62ed1ca01dc72a6a9900350b2b65 100644 --- a/official/recommendation/movielens.py +++ b/official/recommendation/movielens.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -49,7 +49,7 @@ RATINGS_FILE = "ratings.csv" MOVIES_FILE = "movies.csv" # URL to download dataset -_DATA_URL = "http://files.grouplens.org/datasets/movielens/" +_DATA_URL = "https://files.grouplens.org/datasets/movielens/" GENRE_COLUMN = "genres" ITEM_COLUMN = "item_id" # movies diff --git a/official/recommendation/ncf_common.py b/official/recommendation/ncf_common.py index 43d6a88f1231dc2948365b31fc230521dcdaa512..f1677bf15ad82b4994b7cedbafb24baf3606db06 100644 --- a/official/recommendation/ncf_common.py +++ b/official/recommendation/ncf_common.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/ncf_input_pipeline.py b/official/recommendation/ncf_input_pipeline.py index 93f950bcee827d6ee43cb598c56aafc2ec455fc9..194a83866f4861c0cb19b9ee4e8c82fd6f71dd07 100644 --- a/official/recommendation/ncf_input_pipeline.py +++ b/official/recommendation/ncf_input_pipeline.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/recommendation/ncf_keras_main.py b/official/recommendation/ncf_keras_main.py index 2590df4ce32037dec1f7542767b0ebbcdc089ef2..268ce09343c87d488223bcce00fdb0579fa71e92 100644 --- a/official/recommendation/ncf_keras_main.py +++ b/official/recommendation/ncf_keras_main.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/ncf_test.py b/official/recommendation/ncf_test.py index b37d0c1dcc486e8badaff7d5e3c941625245bec2..2f6f0865cb4621618615362372f96fc4bac67b38 100644 --- a/official/recommendation/ncf_test.py +++ b/official/recommendation/ncf_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/neumf_model.py b/official/recommendation/neumf_model.py index 93da37fb1451961267a2212a7bbe7d26741593a6..b739546ed1352c5fa23fcd2c29bf753f0c245f58 100644 --- a/official/recommendation/neumf_model.py +++ b/official/recommendation/neumf_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -37,6 +37,7 @@ import sys from six.moves import xrange # pylint: disable=redefined-builtin import tensorflow as tf +from tensorflow import estimator as tf_estimator from typing import Any, Dict, Text from official.recommendation import constants as rconst @@ -85,7 +86,7 @@ def neumf_model_fn(features, labels, mode, params): # Softmax with the first column of zeros is equivalent to sigmoid. softmax_logits = ncf_common.convert_to_softmax_logits(logits) - if mode == tf.estimator.ModeKeys.EVAL: + if mode == tf_estimator.ModeKeys.EVAL: duplicate_mask = tf.cast(features[rconst.DUPLICATE_MASK], tf.float32) return _get_estimator_spec_with_metrics( logits, @@ -95,7 +96,7 @@ def neumf_model_fn(features, labels, mode, params): params["match_mlperf"], use_tpu_spec=params["use_tpu"]) - elif mode == tf.estimator.ModeKeys.TRAIN: + elif mode == tf_estimator.ModeKeys.TRAIN: labels = tf.cast(labels, tf.int32) valid_pt_mask = features[rconst.VALID_POINT_MASK] @@ -124,7 +125,7 @@ def neumf_model_fn(features, labels, mode, params): update_ops = tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.UPDATE_OPS) train_op = tf.group(minimize_op, update_ops) - return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op) + return tf_estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op) else: raise NotImplementedError @@ -260,13 +261,13 @@ def _get_estimator_spec_with_metrics(logits: tf.Tensor, match_mlperf) if use_tpu_spec: - return tf.estimator.tpu.TPUEstimatorSpec( - mode=tf.estimator.ModeKeys.EVAL, + return tf_estimator.tpu.TPUEstimatorSpec( + mode=tf_estimator.ModeKeys.EVAL, loss=cross_entropy, eval_metrics=(metric_fn, [in_top_k, ndcg, metric_weights])) - return tf.estimator.EstimatorSpec( - mode=tf.estimator.ModeKeys.EVAL, + return tf_estimator.EstimatorSpec( + mode=tf_estimator.ModeKeys.EVAL, loss=cross_entropy, eval_metric_ops=metric_fn(in_top_k, ndcg, metric_weights)) diff --git a/official/recommendation/popen_helper.py 
b/official/recommendation/popen_helper.py index c13c795e7833f536fedee381fb740ab76ab00ab8..4004c207fab40507563d3831f69e3eb9d748b96b 100644 --- a/official/recommendation/popen_helper.py +++ b/official/recommendation/popen_helper.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/ranking/__init__.py b/official/recommendation/ranking/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/recommendation/ranking/__init__.py +++ b/official/recommendation/ranking/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/ranking/common.py b/official/recommendation/ranking/common.py index 43b290a3ac69ad6dc1ae25ccf713910cc1154395..f7bdf49ea5a86d6bf8bb0251400e02dd3af520bd 100644 --- a/official/recommendation/ranking/common.py +++ b/official/recommendation/ranking/common.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/recommendation/ranking/configs/__init__.py b/official/recommendation/ranking/configs/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/recommendation/ranking/configs/__init__.py +++ b/official/recommendation/ranking/configs/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/ranking/configs/config.py b/official/recommendation/ranking/configs/config.py index 02b89a3196c50dde09f1384a19bca9b142be4e40..d7fa5807dc163188e8f374bea4b40985379d7062 100644 --- a/official/recommendation/ranking/configs/config.py +++ b/official/recommendation/ranking/configs/config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/ranking/configs/config_test.py b/official/recommendation/ranking/configs/config_test.py index df65051dc39cba6950d14c71dcd97cf71cacdc8f..890a1943b8e2940ad89f3308bcaec29a077bdc2d 100644 --- a/official/recommendation/ranking/configs/config_test.py +++ b/official/recommendation/ranking/configs/config_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/recommendation/ranking/data/__init__.py b/official/recommendation/ranking/data/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/recommendation/ranking/data/__init__.py +++ b/official/recommendation/ranking/data/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/ranking/data/data_pipeline.py b/official/recommendation/ranking/data/data_pipeline.py index f6ba33d7223504163d234775dcd972575fb140ef..8a8a4a1b6e8bea7b93d5e4e46ab7400f4d084de8 100644 --- a/official/recommendation/ranking/data/data_pipeline.py +++ b/official/recommendation/ranking/data/data_pipeline.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/ranking/data/data_pipeline_test.py b/official/recommendation/ranking/data/data_pipeline_test.py index 015d49e553c926218aa26d45c8d3e85152bcf5da..d33f1564da3bc49ce188b9dd153acb5d9e82ed32 100644 --- a/official/recommendation/ranking/data/data_pipeline_test.py +++ b/official/recommendation/ranking/data/data_pipeline_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/recommendation/ranking/preprocessing/criteo_preprocess.py b/official/recommendation/ranking/preprocessing/criteo_preprocess.py index ccaec5dd9d69f9a1184236ad4cabafe743f20c8e..7f0f5ae5e4762f01285316852742dea01f92a605 100644 --- a/official/recommendation/ranking/preprocessing/criteo_preprocess.py +++ b/official/recommendation/ranking/preprocessing/criteo_preprocess.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/ranking/preprocessing/setup.py b/official/recommendation/ranking/preprocessing/setup.py index 36fc5dd49943cdfdfc75e8c27c30a761f642d71e..37184cdddc777834cc7514b6189cc12df25f3c87 100644 --- a/official/recommendation/ranking/preprocessing/setup.py +++ b/official/recommendation/ranking/preprocessing/setup.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/ranking/preprocessing/shard_rebalancer.py b/official/recommendation/ranking/preprocessing/shard_rebalancer.py index 8f19ae74a9ed442641f073658b0e997360359496..51465025952565b0eeaaea1d32e586fc176e48b7 100644 --- a/official/recommendation/ranking/preprocessing/shard_rebalancer.py +++ b/official/recommendation/ranking/preprocessing/shard_rebalancer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/recommendation/ranking/task.py b/official/recommendation/ranking/task.py index 42152647d7b3a5dd308652b2cd55968c00d078bc..d6dc9577adb7ba16ea26224f2672f55c5acfb9c9 100644 --- a/official/recommendation/ranking/task.py +++ b/official/recommendation/ranking/task.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -133,9 +133,9 @@ class RankingTask(base_task.Task): decay_steps=lr_config.decay_steps, decay_start_steps=lr_config.decay_start_steps) - dense_optimizer = tf.keras.optimizers.Adam() + dense_optimizer = tf.keras.optimizers.legacy.Adam() embedding_optimizer = tf.keras.optimizers.get( - self.optimizer_config.embedding_optimizer) + self.optimizer_config.embedding_optimizer, use_legacy_optimizer=True) embedding_optimizer.learning_rate = lr_callable feature_config = _get_tpu_embedding_feature_config( diff --git a/official/recommendation/ranking/task_test.py b/official/recommendation/ranking/task_test.py index 1ef4fe673be17094d53b27032f88a78910e1c91d..426f468d217931821e7236139f3026b1d82257a3 100644 --- a/official/recommendation/ranking/task_test.py +++ b/official/recommendation/ranking/task_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/ranking/train.py b/official/recommendation/ranking/train.py index 595a01a574bdfa74f85f3e19809078f827665bf2..5ae322a71e674edcbe45b660fa75e060b108a5e0 100644 --- a/official/recommendation/ranking/train.py +++ b/official/recommendation/ranking/train.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. 
All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/recommendation/ranking/train_test.py b/official/recommendation/ranking/train_test.py index 1e0c1dad70958ae36990ef1af697161a3f7ffc3c..81d9f718d974dacdb05e2791b5e742d219acda37 100644 --- a/official/recommendation/ranking/train_test.py +++ b/official/recommendation/ranking/train_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -125,6 +125,8 @@ class TrainTest(parameterized.TestCase, tf.test.TestCase): interaction=interaction, use_orbit=use_orbit, strategy=strategy) + + default_mode = FLAGS.mode # Training. FLAGS.mode = 'train' train.main('unused_args') @@ -134,6 +136,7 @@ class TrainTest(parameterized.TestCase, tf.test.TestCase): # Evaluation. FLAGS.mode = 'eval' train.main('unused_args') + FLAGS.mode = default_mode if __name__ == '__main__': diff --git a/official/recommendation/stat_utils.py b/official/recommendation/stat_utils.py index 3f8c8050dad910bbadabe981b951ca8782c301f7..a565ce9df266304d77223242cb545ee38ac790db 100644 --- a/official/recommendation/stat_utils.py +++ b/official/recommendation/stat_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
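The `train_test.py` hunk saves `FLAGS.mode` before the train and eval calls and restores it afterwards, so later parameterized cases start from the original value. A minimal sketch of a slightly more robust variant follows. It assumes a plain attribute-holding object can stand in for absl's `FLAGS`; `FakeFlags`, `overridden` and the `'default'` mode are hypothetical names for illustration. The restore happens in a `finally` block, so the value is reset even if the code under test (here, `train.main`) raises.

```python
import contextlib


class FakeFlags:
  """Hypothetical stand-in for absl's global FLAGS object."""

  def __init__(self, mode):
    self.mode = mode


@contextlib.contextmanager
def overridden(obj, name, value):
  """Sets `obj.<name>` to `value` for the duration of a `with` block."""
  saved = getattr(obj, name)
  setattr(obj, name, value)
  try:
    yield obj
  finally:
    # Runs whether the body returns normally or raises, so the next test
    # always sees the original value.
    setattr(obj, name, saved)


FLAGS = FakeFlags(mode='default')

with overridden(FLAGS, 'mode', 'train'):
  print(FLAGS.mode)  # train
print(FLAGS.mode)  # default
```

absl also ships `absl.testing.flagsaver` for saving and restoring real flag values around a test. Check its documentation before relying on a specific override syntax.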
diff --git a/official/requirements.txt b/official/requirements.txt index 1d623484a91de37f50f1f4f4b17c0d1aa3df759b..3c4babbc0c5540f63140579be087e82a69f3cdfb 100644 --- a/official/requirements.txt +++ b/official/requirements.txt @@ -1,7 +1,7 @@ six google-api-python-client>=1.6.7 kaggle>=1.3.9 -numpy>=1.15.4 +numpy>=1.20 oauth2client pandas>=0.22.0 psutil>=5.4.3 @@ -11,7 +11,6 @@ tensorflow-hub>=0.6.0 tensorflow-model-optimization>=0.4.1 tensorflow-datasets tensorflow-addons -dataclasses;python_version<"3.7" gin-config tf_slim>=1.1.0 Cython @@ -19,10 +18,12 @@ matplotlib # Loader becomes a required positional argument in 6.0 in yaml.load pyyaml>=5.1,<6.0 # CV related dependencies -opencv-python-headless +opencv-python-headless==4.5.2.52 Pillow pycocotools # NLP related dependencies seqeval sentencepiece sacrebleu +# Projects/vit dependencies +immutabledict diff --git a/official/utils/__init__.py b/official/utils/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/utils/__init__.py +++ b/official/utils/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/docs/README.md b/official/utils/docs/README.md new file mode 100644 index 0000000000000000000000000000000000000000..cc47f08fab53a64ca06141c34ee78630ee1598b4 --- /dev/null +++ b/official/utils/docs/README.md @@ -0,0 +1,12 @@ +# Docs generation scripts for TensorFlow Models + +The scripts here are used to generate api-reference pages for tensorflow.org. 
+ +The scripts require tensorflow_docs, which can be installed directly from +github: + +``` +$> pip install -U git+https://github.com/tensorflow/docs +$> python build_all_api_docs.py --output_dir=/tmp/tfm_docs +``` + diff --git a/official/utils/docs/__init__.py b/official/utils/docs/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/utils/docs/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/utils/docs/build_api_docs_lib.py b/official/utils/docs/build_api_docs_lib.py deleted file mode 100644 index 0bff8b0117770c5ea70b105d37aa06b20d4823b5..0000000000000000000000000000000000000000 --- a/official/utils/docs/build_api_docs_lib.py +++ /dev/null @@ -1,54 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -r"""Common library for API docs builder.""" - -import tensorflow as tf -from tensorflow_docs.api_generator import doc_controls - - -def hide_module_model_and_layer_methods(): - """Hide methods and properties defined in the base classes of Keras layers. - - We hide all methods and properties of the base classes, except: - - `__init__` is always documented. - - `call` is always documented, as it can carry important information for - complex layers. - """ - module_contents = list(tf.Module.__dict__.items()) - model_contents = list(tf.keras.Model.__dict__.items()) - layer_contents = list(tf.keras.layers.Layer.__dict__.items()) - - for name, obj in module_contents + layer_contents + model_contents: - if name == '__init__': - # Always document __init__. - continue - - if name == 'call': - # Always document `call`. - if hasattr(obj, doc_controls._FOR_SUBCLASS_IMPLEMENTERS): # pylint: disable=protected-access - delattr(obj, doc_controls._FOR_SUBCLASS_IMPLEMENTERS) # pylint: disable=protected-access - continue - - # Otherwise, exclude from documentation. - if isinstance(obj, property): - obj = obj.fget - - if isinstance(obj, (staticmethod, classmethod)): - obj = obj.__func__ - - try: - doc_controls.do_not_doc_in_subclasses(obj) - except AttributeError: - pass diff --git a/official/utils/docs/build_nlp_api_docs.py b/official/utils/docs/build_nlp_api_docs.py deleted file mode 100644 index 45af252c3cd23572d35d647afc94609bcb566603..0000000000000000000000000000000000000000 --- a/official/utils/docs/build_nlp_api_docs.py +++ /dev/null @@ -1,90 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -r"""Tool to generate api_docs for tensorflow_models/official library. - -Example: - -$> pip install -U git+https://github.com/tensorflow/docs -$> python build_nlp_api_docs \ - --output_dir=/tmp/api_docs -""" - -import os - -from absl import app -from absl import flags -from absl import logging -from tensorflow_docs.api_generator import generate_lib -from tensorflow_docs.api_generator import public_api - -from official.nlp import modeling as tfnlp -import build_api_docs_lib - -FLAGS = flags.FLAGS - -flags.DEFINE_string('output_dir', None, 'Where to write the resulting docs to.') -flags.DEFINE_string( - 'code_url_prefix', - 'https://github.com/tensorflow/models/blob/master/official/nlp/modeling/', - 'The url prefix for links to code.') - -flags.DEFINE_bool('search_hints', True, - 'Include metadata search hints in the generated files') - -flags.DEFINE_string('site_path', '/api_docs/python', - 'Path prefix in the _toc.yaml') - - -PROJECT_SHORT_NAME = 'tfnlp' -PROJECT_FULL_NAME = 'TensorFlow Official Models - NLP Modeling Library' - - -def gen_api_docs(code_url_prefix, site_path, output_dir, project_short_name, - project_full_name, search_hints): - """Generates api docs for the tensorflow docs package.""" - build_api_docs_lib.hide_module_model_and_layer_methods() - del tfnlp.layers.MultiHeadAttention - del tfnlp.layers.EinsumDense - - doc_generator = generate_lib.DocGenerator( - root_title=project_full_name, - py_modules=[(project_short_name, tfnlp)], - base_dir=os.path.dirname(tfnlp.__file__), - code_url_prefix=code_url_prefix, - search_hints=search_hints, - 
site_path=site_path, - callbacks=[public_api.explicit_package_contents_filter], - ) - - doc_generator.build(output_dir) - logging.info('Output docs to: %s', output_dir) - - -def main(argv): - if len(argv) > 1: - raise app.UsageError('Too many command-line arguments.') - - gen_api_docs( - code_url_prefix=FLAGS.code_url_prefix, - site_path=FLAGS.site_path, - output_dir=FLAGS.output_dir, - project_short_name=PROJECT_SHORT_NAME, - project_full_name=PROJECT_FULL_NAME, - search_hints=FLAGS.search_hints) - - -if __name__ == '__main__': - flags.mark_flag_as_required('output_dir') - app.run(main) diff --git a/official/utils/docs/build_orbit_api_docs.py b/official/utils/docs/build_orbit_api_docs.py new file mode 100644 index 0000000000000000000000000000000000000000..c7f25715e8c2a1a98452735e84ce3e8dde73d5e5 --- /dev/null +++ b/official/utils/docs/build_orbit_api_docs.py @@ -0,0 +1,119 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Tool to generate api_docs for tensorflow_models/official library. 
+ +Example: + +$> pip install -U git+https://github.com/tensorflow/docs +$> python build_orbit_api_docs.py --output_dir=/tmp/api_docs +""" +from absl import app +from absl import flags +from absl import logging + +import orbit + +import tensorflow as tf +from tensorflow_docs.api_generator import doc_controls +from tensorflow_docs.api_generator import generate_lib +from tensorflow_docs.api_generator import public_api + +FLAGS = flags.FLAGS + +flags.DEFINE_string('output_dir', None, 'Where to write the resulting docs to.') +flags.DEFINE_string('code_url_prefix', + 'https://github.com/tensorflow/models/blob/master/orbit', + 'The url prefix for links to code.') + +flags.DEFINE_bool('search_hints', True, + 'Include metadata search hints in the generated files') + +flags.DEFINE_string('site_path', '/api_docs/python', + 'Path prefix in the _toc.yaml') + + +PROJECT_SHORT_NAME = 'orbit' +PROJECT_FULL_NAME = 'Orbit' + + +def hide_module_model_and_layer_methods(): + """Hide methods and properties defined in the base classes of Keras layers. + + We hide all methods and properties of the base classes, except: + - `__init__` is always documented. + - `call` is always documented, as it can carry important information for + complex layers. + """ + module_contents = list(tf.Module.__dict__.items()) + model_contents = list(tf.keras.Model.__dict__.items()) + layer_contents = list(tf.keras.layers.Layer.__dict__.items()) + + for name, obj in module_contents + layer_contents + model_contents: + if name == '__init__': + # Always document __init__. + continue + + if name == 'call': + # Always document `call`. + if hasattr(obj, doc_controls._FOR_SUBCLASS_IMPLEMENTERS): # pylint: disable=protected-access + delattr(obj, doc_controls._FOR_SUBCLASS_IMPLEMENTERS) # pylint: disable=protected-access + continue + + # Otherwise, exclude from documentation. 
+ if isinstance(obj, property): + obj = obj.fget + + if isinstance(obj, (staticmethod, classmethod)): + obj = obj.__func__ + + try: + doc_controls.do_not_doc_in_subclasses(obj) + except AttributeError: + pass + + +def gen_api_docs(code_url_prefix, site_path, output_dir, project_short_name, + project_full_name, search_hints): + """Generates api docs for the tensorflow docs package.""" + + doc_generator = generate_lib.DocGenerator( + root_title=project_full_name, + py_modules=[(project_short_name, orbit)], + code_url_prefix=code_url_prefix, + search_hints=search_hints, + site_path=site_path, + callbacks=[public_api.explicit_package_contents_filter], + ) + + doc_generator.build(output_dir) + logging.info('Output docs to: %s', output_dir) + + +def main(argv): + if len(argv) > 1: + raise app.UsageError('Too many command-line arguments.') + + gen_api_docs( + code_url_prefix=FLAGS.code_url_prefix, + site_path=FLAGS.site_path, + output_dir=FLAGS.output_dir, + project_short_name=PROJECT_SHORT_NAME, + project_full_name=PROJECT_FULL_NAME, + search_hints=FLAGS.search_hints) + + +if __name__ == '__main__': + flags.mark_flag_as_required('output_dir') + app.run(main) diff --git a/official/utils/docs/build_tfm_api_docs.py b/official/utils/docs/build_tfm_api_docs.py new file mode 100644 index 0000000000000000000000000000000000000000..fe55ebd11760c80e0074ec5248a743027362c655 --- /dev/null +++ b/official/utils/docs/build_tfm_api_docs.py @@ -0,0 +1,197 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Tool to generate api_docs for tensorflow_models/official library. + +Example: + +$> pip install -U git+https://github.com/tensorflow/docs +$> python build_tfm_api_docs.py --output_dir=/tmp/api_docs +""" + +import pathlib + +from absl import app +from absl import flags +from absl import logging + +import tensorflow as tf +from tensorflow_docs.api_generator import doc_controls +from tensorflow_docs.api_generator import generate_lib +from tensorflow_docs.api_generator import parser +from tensorflow_docs.api_generator import public_api +from tensorflow_docs.api_generator.pretty_docs import base_page +from tensorflow_docs.api_generator.pretty_docs import function_page + +import tensorflow_models as tfm + +FLAGS = flags.FLAGS + +flags.DEFINE_string('output_dir', None, 'Where to write the resulting docs to.') +flags.DEFINE_string( + 'code_url_prefix', + 'https://github.com/tensorflow/models/blob/master/tensorflow_models', + 'The url prefix for links to code.') + +flags.DEFINE_bool('search_hints', True, + 'Include metadata search hints in the generated files') + +flags.DEFINE_string('site_path', '/api_docs/python', + 'Path prefix in the _toc.yaml') + + +PROJECT_SHORT_NAME = 'tfm' +PROJECT_FULL_NAME = 'TensorFlow Modeling Library' + + +class ExpFactoryInfo(function_page.FunctionPageInfo): + """Customize the page for the experiment factory.""" + + def collect_docs(self): + super().collect_docs() + self.doc.docstring_parts.append(self.make_factory_options_table()) + + def make_factory_options_table(self): + lines = [ + '', + 'Allowed values for `exp_name`:', + '', + # The indent is important here, it keeps the site's markdown parser + # from switching to HTML mode.
+ '  <table>\n', + '<tr><th>exp_name</th><th>Description</th></tr>', + ] + reference_resolver = self.parser_config.reference_resolver + api_tree = self.parser_config.api_tree + for name, fn in sorted(tfm.core.exp_factory._REGISTERED_CONFIGS.items()): # pylint: disable=protected-access + fn_api_node = api_tree.node_for_object(fn) + if fn_api_node is None: + location = parser.get_defined_in(self.py_object, self.parser_config) + link = base_page.small_source_link(location, name) + else: + link = reference_resolver.python_link(name, fn_api_node.full_name) + doc = fn.__doc__ + if doc: + doc = doc.splitlines()[0] + else: + doc = '' + + lines.append(f'<tr><td>{link}</td><td>{doc}</td></tr>') + + lines.append('</table>') + return '\n'.join(lines) + + + def hide_module_model_and_layer_methods(): + """Hide methods and properties defined in the base classes of Keras layers. + + We hide all methods and properties of the base classes, except: + - `__init__` is always documented. + - `call` is always documented, as it can carry important information for + complex layers. + """ + module_contents = list(tf.Module.__dict__.items()) + model_contents = list(tf.keras.Model.__dict__.items()) + layer_contents = list(tf.keras.layers.Layer.__dict__.items()) + + for name, obj in module_contents + layer_contents + model_contents: + if name == '__init__': + # Always document __init__. + continue + + if name == 'call': + # Always document `call`. + if hasattr(obj, doc_controls._FOR_SUBCLASS_IMPLEMENTERS): # pylint: disable=protected-access + delattr(obj, doc_controls._FOR_SUBCLASS_IMPLEMENTERS) # pylint: disable=protected-access + continue + + # Otherwise, exclude from documentation. 
+ if isinstance(obj, property): + obj = obj.fget + + if isinstance(obj, (staticmethod, classmethod)): + obj = obj.__func__ + + try: + doc_controls.do_not_doc_in_subclasses(obj) + except AttributeError: + pass + + + def custom_filter(path, parent, children): + if len(path) <= 2: + # Don't filter the contents of the top level `tfm.vision` package.
+ return children + else: + return public_api.explicit_package_contents_filter(path, parent, children) + + +def gen_api_docs(code_url_prefix, site_path, output_dir, project_short_name, + project_full_name, search_hints): + """Generates api docs for the tensorflow docs package.""" + hide_module_model_and_layer_methods() + del tfm.nlp.layers.MultiHeadAttention + del tfm.nlp.layers.EinsumDense + + doc_controls.set_custom_page_builder_cls(tfm.core.exp_factory.get_exp_config, + ExpFactoryInfo) + + url_parts = code_url_prefix.strip('/').split('/') + url_parts = url_parts[:url_parts.index('tensorflow_models')] + url_parts.append('official') + + official_url_prefix = '/'.join(url_parts) + + tfm_base_dir = pathlib.Path(tfm.__file__).parent + + # The `layers` submodule (and others) are actually defined in the `official` + # package. Find the path to `official`. + official_base_dir = [ + p for p in pathlib.Path(tfm.vision.layers.__file__).parents + if p.name == 'official' + ][0] + + doc_generator = generate_lib.DocGenerator( + root_title=project_full_name, + py_modules=[(project_short_name, tfm)], + base_dir=[tfm_base_dir, official_base_dir], + code_url_prefix=[ + code_url_prefix, + official_url_prefix, + ], + search_hints=search_hints, + site_path=site_path, + callbacks=[custom_filter], + ) + + doc_generator.build(output_dir) + logging.info('Output docs to: %s', output_dir) + + +def main(argv): + if len(argv) > 1: + raise app.UsageError('Too many command-line arguments.') + + gen_api_docs( + code_url_prefix=FLAGS.code_url_prefix, + site_path=FLAGS.site_path, + output_dir=FLAGS.output_dir, + project_short_name=PROJECT_SHORT_NAME, + project_full_name=PROJECT_FULL_NAME, + search_hints=FLAGS.search_hints) + + +if __name__ == '__main__': + flags.mark_flag_as_required('output_dir') + app.run(main) diff --git a/official/utils/docs/build_vision_api_docs.py b/official/utils/docs/build_vision_api_docs.py deleted file mode 100644 index 
514da657c2a54bfe1821e0dc0fcde1cff0e42720..0000000000000000000000000000000000000000 --- a/official/utils/docs/build_vision_api_docs.py +++ /dev/null @@ -1,87 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -r"""Tool to generate api_docs for tensorflow_models/official library. - -Example: - -$> pip install -U git+https://github.com/tensorflow/docs -$> python build_vision_api_docs \ - --output_dir=/tmp/api_docs -""" - -import os - -from absl import app -from absl import flags -from absl import logging -from tensorflow_docs.api_generator import generate_lib -from tensorflow_docs.api_generator import public_api - -import build_api_docs_lib -from official.vision.beta import modeling as tfvision - -FLAGS = flags.FLAGS - -flags.DEFINE_string('output_dir', None, 'Where to write the resulting docs to.') -flags.DEFINE_string( - 'code_url_prefix', - 'https://github.com/tensorflow/models/blob/master/official/vision/beta/modeling/', - 'The url prefix for links to code.') - -flags.DEFINE_bool('search_hints', True, - 'Include metadata search hints in the generated files') - -flags.DEFINE_string('site_path', 'tfvision/api_docs/python', - 'Path prefix in the _toc.yaml') - -PROJECT_SHORT_NAME = 'tfvision' -PROJECT_FULL_NAME = 'TensorFlow Official Models - Vision Modeling Library' - - -def gen_api_docs(code_url_prefix, site_path, output_dir, project_short_name, - project_full_name, search_hints): - """Generates api 
docs for the tensorflow docs package.""" - build_api_docs_lib.hide_module_model_and_layer_methods() - - doc_generator = generate_lib.DocGenerator( - root_title=project_full_name, - py_modules=[(project_short_name, tfvision)], - base_dir=os.path.dirname(tfvision.__file__), - code_url_prefix=code_url_prefix, - search_hints=search_hints, - site_path=site_path, - callbacks=[public_api.explicit_package_contents_filter], - ) - - doc_generator.build(output_dir) - logging.info('Output docs to: %s', output_dir) - - -def main(argv): - if len(argv) > 1: - raise app.UsageError('Too many command-line arguments.') - - gen_api_docs( - code_url_prefix=FLAGS.code_url_prefix, - site_path=FLAGS.site_path, - output_dir=FLAGS.output_dir, - project_short_name=PROJECT_SHORT_NAME, - project_full_name=PROJECT_FULL_NAME, - search_hints=FLAGS.search_hints) - - -if __name__ == '__main__': - flags.mark_flag_as_required('output_dir') - app.run(main) diff --git a/official/utils/flags/__init__.py b/official/utils/flags/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/utils/flags/__init__.py +++ b/official/utils/flags/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/flags/_base.py b/official/utils/flags/_base.py index b8e1dc09a9dc49f5a50c2e3640f9974a00edf042..8e8f5b4d3cf21a5f3069656ee939ccde646465d5 100644 --- a/official/utils/flags/_base.py +++ b/official/utils/flags/_base.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
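In `gen_api_docs` of `build_tfm_api_docs.py`, the link prefix for the `official` package is derived from `--code_url_prefix` by cutting the URL at the `tensorflow_models` path segment and appending `official`. The on-disk `official` directory is located by walking up the parents of a module file. The sketch below reproduces those two steps using only the standard library. `derive_official_url_prefix` and `find_official_dir` are hypothetical helper names, and `PurePosixPath` stands in for the script's `pathlib.Path` so the example runs without the package installed.

```python
import pathlib


def derive_official_url_prefix(code_url_prefix: str) -> str:
  """Replaces the `tensorflow_models` segment (and anything after it) with `official`.

  Raises ValueError if `tensorflow_models` is not a path segment.
  """
  url_parts = code_url_prefix.strip('/').split('/')
  url_parts = url_parts[:url_parts.index('tensorflow_models')]
  url_parts.append('official')
  return '/'.join(url_parts)


def find_official_dir(module_file: str) -> pathlib.PurePosixPath:
  """Returns the nearest ancestor directory named `official`.

  Raises IndexError if there is none, like the `[0]` indexing in the script.
  """
  return [
      p for p in pathlib.PurePosixPath(module_file).parents
      if p.name == 'official'
  ][0]


print(derive_official_url_prefix(
    'https://github.com/tensorflow/models/blob/master/tensorflow_models'))
# https://github.com/tensorflow/models/blob/master/official
print(find_official_dir('/venv/site-packages/official/vision/layers/__init__.py'))
# /venv/site-packages/official
```

The resulting pair of base directories and URL prefixes is what lets the doc generator link symbols re-exported under `tfm` back to their source files in `official`.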
diff --git a/official/utils/flags/_benchmark.py b/official/utils/flags/_benchmark.py index abbe0a0b1a0ff990b00a677d320d8dffe8d22459..97adf5632680b8fdeb840ea32a3faf28c9ed85f8 100644 --- a/official/utils/flags/_benchmark.py +++ b/official/utils/flags/_benchmark.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/flags/_conventions.py b/official/utils/flags/_conventions.py index a42ff42a2a1d5fc5791f9fb4865cf403f6218767..fa3d186d540bfe8357cf7be1a2ccf7635cf32de9 100644 --- a/official/utils/flags/_conventions.py +++ b/official/utils/flags/_conventions.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/flags/_device.py b/official/utils/flags/_device.py index 9d76f48717d77d6b02be0dd622f46de76c2c03f3..09e004b720d7b1cd4df0100de1aec1366373b21e 100644 --- a/official/utils/flags/_device.py +++ b/official/utils/flags/_device.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/flags/_distribution.py b/official/utils/flags/_distribution.py index 848e550cfed602cc692a975ab5e358fe2c638ddd..76ec5bb283e824a0f16e2e683c3812c6181ecf28 100644 --- a/official/utils/flags/_distribution.py +++ b/official/utils/flags/_distribution.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/flags/_misc.py b/official/utils/flags/_misc.py index 744e3628bfdf9265a2132f4e607846687003e320..fc25d7bbf93d7330ea203b25d7f158ac8960880e 100644 --- a/official/utils/flags/_misc.py +++ b/official/utils/flags/_misc.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/flags/_performance.py b/official/utils/flags/_performance.py index 5c05577beacfa280d9777b7387419f2b08c57167..6ccfd4a9f89a109c85d8be76252b99d1f10cd5dc 100644 --- a/official/utils/flags/_performance.py +++ b/official/utils/flags/_performance.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/flags/core.py b/official/utils/flags/core.py index d864b957b30f901e751f365e118f07228e6cddf6..36a244da2392fe9f6e6cabfb9fe2ef73c7b6f004 100644 --- a/official/utils/flags/core.py +++ b/official/utils/flags/core.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/utils/flags/flags_test.py b/official/utils/flags/flags_test.py index 11bc2ab4ce0aa39f1148e5992880531c6f63cbe3..f8c639c396380205f10fab9fb00417254a1758db 100644 --- a/official/utils/flags/flags_test.py +++ b/official/utils/flags/flags_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/hyperparams_flags.py b/official/utils/hyperparams_flags.py index e47bd8f066466f08502dd9a2757fb2afc078a508..d3428e0f9b894537d769e36399b88f2cfce41d68 100644 --- a/official/utils/hyperparams_flags.py +++ b/official/utils/hyperparams_flags.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/misc/__init__.py b/official/utils/misc/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/utils/misc/__init__.py +++ b/official/utils/misc/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/misc/keras_utils.py b/official/utils/misc/keras_utils.py index a5b20c8a3ebc36387e2997f67b2d411894c5ca57..c3e8d12b038c71b90447eb6173f9198dfb8b705b 100644 --- a/official/utils/misc/keras_utils.py +++ b/official/utils/misc/keras_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/misc/model_helpers.py b/official/utils/misc/model_helpers.py index 4c310588b39e32f23748772c64aa7ee9b4e987f2..f5065ceaef175063b09f59a659f726944fd6418f 100644 --- a/official/utils/misc/model_helpers.py +++ b/official/utils/misc/model_helpers.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/misc/model_helpers_test.py b/official/utils/misc/model_helpers_test.py index dd01c3431766d0ba00647ca2081c3f5687f2bfd5..6d5a3e84e224dbddd289437e5208138b1078fb75 100644 --- a/official/utils/misc/model_helpers_test.py +++ b/official/utils/misc/model_helpers_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/testing/__init__.py b/official/utils/testing/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/utils/testing/__init__.py +++ b/official/utils/testing/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/utils/testing/integration.py b/official/utils/testing/integration.py index 763de50bef6a7ade0c27c2deca8597649f276719..84af32a015195ccba0adfb693a827b515843abf4 100644 --- a/official/utils/testing/integration.py +++ b/official/utils/testing/integration.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/testing/mock_task.py b/official/utils/testing/mock_task.py index b99b96d694cb5bfacef34eacec714b2a8337a8aa..dd7e493e197b75984337992f701ca003a235fb58 100644 --- a/official/utils/testing/mock_task.py +++ b/official/utils/testing/mock_task.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/utils/testing/scripts/presubmit.sh b/official/utils/testing/scripts/presubmit.sh index b94683f48aae5a3ffc8de0ac8d7eb5899a608444..7d9a0aec0f3f8a1d38faccc7a0904f7134c88655 100755 --- a/official/utils/testing/scripts/presubmit.sh +++ b/official/utils/testing/scripts/presubmit.sh @@ -32,7 +32,7 @@ py_test() { echo "===========Running Python test============" # Skipping Ranking tests, TODO(b/189265753) remove it once the issue is fixed. 
-  for test_file in `find official/ -name '*test.py' -print | grep -v 'official/recommendation/ranking'`
+  for test_file in `find official/ -name '*test.py' -print | grep -v -E 'official/(recommendation/ranking|legacy)'`
   do
     echo "####=======Testing ${test_file}=======####"
     ${PY_BINARY} "${test_file}"
diff --git a/official/vision/MODEL_GARDEN.md b/official/vision/MODEL_GARDEN.md
new file mode 100644
index 0000000000000000000000000000000000000000..0ce10df86c3fad54d421270a93e6167195512f0e
--- /dev/null
+++ b/official/vision/MODEL_GARDEN.md
@@ -0,0 +1,217 @@
+# TF-Vision Model Garden
+
+⚠️ Disclaimer: None of the datasets hyperlinked from this page are owned or
+distributed by Google. The datasets are made available by third parties.
+Please review the terms and conditions made available by the third parties
+before using the data.
+
+## Introduction
+
+The TF-Vision modeling library for computer vision provides a collection of
+baselines and checkpoints for image classification, object detection, and
+segmentation.
+
+## Image Classification
+
+### ImageNet Baselines
+
+#### ResNet models trained with vanilla settings
+
+* Models are trained from scratch with a batch size of 4096 and an initial
+  learning rate of 1.6.
+* Linear warmup is applied for the first 5 epochs.
+* Models are trained with L2 weight regularization and ReLU activation.
+
+| Model | Resolution | Epochs | Top-1 | Top-5 | Download |
+| ------------ |:-------------:|--------:|--------:|--------:|---------:|
+| ResNet-50 | 224x224 | 90 | 76.1 | 92.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) |
+| ResNet-50 | 224x224 | 200 | 77.1 | 93.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) |
+| ResNet-101 | 224x224 | 200 | 78.3 | 94.2 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet101_tpu.yaml) |
+| ResNet-152 | 224x224 | 200 | 78.7 | 94.3 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet152_tpu.yaml) |
+
+#### ResNet-RS models trained with various settings
+
+We support state-of-the-art [ResNet-RS](https://arxiv.org/abs/2103.07579) image
+classification models with the following features:
+
+* ResNet-RS architectural changes and Swish activation. (Note that ResNet-RS
+  adopts ReLU activation in the paper.)
+* Regularization methods including RandAugment, 4e-5 weight decay, stochastic
+  depth, label smoothing and dropout.
+* New training methods including a 350-epoch schedule, cosine learning rate decay
+  and an exponential moving average (EMA) of model weights.
+* Configs are in this [directory](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification).
+ +| Model | Resolution | Params (M) | Top-1 | Top-5 | Download | +| --------- | :--------: | ---------: | ----: | ----: | --------:| +| ResNet-RS-50 | 160x160 | 35.7 | 79.1 | 94.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs50_i160.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-50-i160.tar.gz) | +| ResNet-RS-101 | 160x160 | 63.7 | 80.2 | 94.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs101_i160.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i160.tar.gz) | +| ResNet-RS-101 | 192x192 | 63.7 | 81.3 | 95.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs101_i192.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i192.tar.gz) | +| ResNet-RS-152 | 192x192 | 86.8 | 81.9 | 95.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i192.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i192.tar.gz) | +| ResNet-RS-152 | 224x224 | 86.8 | 82.5 | 96.1 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i224.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i224.tar.gz) | +| ResNet-RS-152 | 256x256 | 86.8 | 83.1 | 96.3 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i256.tar.gz) | +| ResNet-RS-200 | 256x256 | 93.4 | 83.5 | 96.6 | 
[config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs200_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-200-i256.tar.gz) |
+| ResNet-RS-270 | 256x256 | 130.1 | 83.6 | 96.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs270_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-270-i256.tar.gz) |
+| ResNet-RS-350 | 256x256 | 164.3 | 83.7 | 96.7 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i256.tar.gz) |
+| ResNet-RS-350 | 320x320 | 164.3 | 84.2 | 96.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs350_i320.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i320.tar.gz) |
+
+
+#### Vision Transformer (ViT)
+
+We support [ViT](https://arxiv.org/abs/2010.11929) and [DeiT](https://arxiv.org/abs/2012.12877) implementations in a TF
+Vision
+[project](https://github.com/tensorflow/models/tree/master/official/projects/vit).
ViT models trained under the DeiT settings:
+
+Model | Resolution | Top-1 | Top-5 |
+--------- | :--------: | ----: | ----: |
+ViT-s16 | 224x224 | 79.4 | 94.7 |
+ViT-b16 | 224x224 | 81.8 | 95.8 |
+ViT-l16 | 224x224 | 82.2 | 95.8 |
+
+
+## Object Detection and Instance Segmentation
+
+### Common Settings and Notes
+
+* We provide models adopting [ResNet-FPN](https://arxiv.org/abs/1612.03144) and
+  [SpineNet](https://arxiv.org/abs/1912.05027) backbones based on detection frameworks:
+  * [RetinaNet](https://arxiv.org/abs/1708.02002) and [RetinaNet-RS](https://arxiv.org/abs/2107.00057)
+  * [Mask R-CNN](https://arxiv.org/abs/1703.06870)
+  * [Cascade RCNN](https://arxiv.org/abs/1712.00726) and [Cascade RCNN-RS](https://arxiv.org/abs/2107.00057)
+* Models are all trained on [COCO](https://cocodataset.org/) train2017 and
+  evaluated on [COCO](https://cocodataset.org/) val2017.
+* Training details:
+  * Models fine-tuned from [ImageNet](https://www.image-net.org/) pretrained
+    checkpoints adopt the 12- or 36-epoch schedule. Models trained from scratch
+    adopt the 350-epoch schedule.
+  * The default training data augmentation implements horizontal flipping and
+    scale jittering with a random scale between [0.5, 2.0].
+  * Unless otherwise noted, all models are trained with L2 weight regularization
+    and ReLU activation.
+  * We use a batch size of 256 and a stepwise learning rate that decays at the
+    last 30 and 10 epochs.
+  * Input images are made square by resizing the long side of each image to
+    the target size and then padding the short side with zeros.
+ +### COCO Object Detection Baselines + +#### RetinaNet (ImageNet pretrained) + +| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download | +| ------------ |:-------------:| -------:|--------------:|-----------:|-------:|---------:| +| R50-FPN | 640x640 | 12 | 97.0 | 34.0 | 34.3 | config| +| R50-FPN | 640x640 | 72 | 97.0 | 34.0 | 36.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/retinanet.py#L187-L258) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/retinanet/retinanet-resnet50fpn.tar.gz) | + +#### RetinaNet (Trained from scratch) with training features including: + +* Stochastic depth with drop rate 0.2. +* Swish activation. + +| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download | +| ------------ |:-------------:| -------:|--------------:|-----------:|--------:|---------:| +| SpineNet-49 | 640x640 | 500 | 85.4| 28.5 | 44.2 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet49_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)| +| SpineNet-96 | 1024x1024 | 500 | 265.4 | 43.0 | 48.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet96_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)| +| SpineNet-143 | 1280x1280 | 500 | 524.0 | 67.0 | 50.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet143_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)| + +#### Mobile-size RetinaNet (Trained from scratch): + +| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download | +| ----------- | :--------: | -----: | --------: | ---------: | -----: | --------:| +| MobileNetv2 | 256x256 | 
600 | - | 2.27 | 23.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_mobilenetv2_tpu.yaml) | +| Mobile SpineNet-49 | 384x384 | 600 | 1.0 | 2.32 | 28.1 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet49_mobile_tpu.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/retinanet/spinenet49mobile.tar.gz) | + +### Instance Segmentation Baselines + +#### Mask R-CNN (Trained from scratch) + +| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Mask AP | Download | +| ------------ |:-------------:| -------:|-----------:|-----------:|-------:|--------:|---------:| +| ResNet50-FPN | 640x640 | 350 | 227.7 | 46.3 | 42.3 | 37.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/r50fpn_640_coco_scratch_tpu4x4.yaml) | +| SpineNet-49 | 640x640 | 350 | 215.7 | 40.8 | 42.6 | 37.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet49_mrcnn_tpu.yaml) | +| SpineNet-96 | 1024x1024 | 500 | 315.0 | 55.2 | 48.1 | 42.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet96_mrcnn_tpu.yaml) | +| SpineNet-143 | 1280x1280 | 500 | 498.8 | 79.2 | 49.3 | 43.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet143_mrcnn_tpu.yaml) | + + +#### Cascade RCNN-RS (Trained from scratch) + +| Backbone | Resolution | Epochs | Params (M) | Box AP | Mask AP | Download +------------ | :--------: | -----: | ---------: | -----: | ------: | -------: +| SpineNet-49 | 640x640 | 500 | 56.4 | 46.4 | 40.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet49_cascadercnn_tpu.yaml)| +| SpineNet-96 | 1024x1024 | 500 | 70.8 | 50.9 | 43.8 | 
[config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet96_cascadercnn_tpu.yaml)|
+| SpineNet-143 | 1280x1280 | 500 | 94.9 | 51.9 | 45.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet143_cascadercnn_tpu.yaml)|
+
+## Semantic Segmentation
+
+* We support [DeepLabV3](https://arxiv.org/pdf/1706.05587.pdf) and
+  [DeepLabV3+](https://arxiv.org/pdf/1802.02611.pdf) architectures, with
+  Dilated ResNet backbones.
+* Backbones are pre-trained on ImageNet.
+
+### PASCAL-VOC
+
+| Model | Backbone | Resolution | Steps | mIoU | Download |
+| ---------- | :----------------: | :--------: | ----: | ---: | --------:|
+| DeepLabV3 | Dilated ResNet-101 | 512x512 | 30k | 78.7 | |
+| DeepLabV3+ | Dilated ResNet-101 | 512x512 | 30k | 79.2 | |
+
+### CITYSCAPES
+
+| Model | Backbone | Resolution | Steps | mIoU | Download |
+| ---------- | :----------------: | :--------: | ----: | ----: | --------:|
+| DeepLabV3+ | Dilated ResNet-101 | 1024x2048 | 90k | 78.79 | |
+
+## Video Classification
+
+### Common Settings and Notes
+
+* We provide models for video classification with the following backbones:
+  * SlowOnly in
+    [SlowFast Networks for Video Recognition](https://arxiv.org/abs/1812.03982).
+  * ResNet-3D (R3D) in
+    [Spatiotemporal Contrastive Video Representation Learning](https://arxiv.org/abs/2008.03800).
+  * ResNet-3D-RS (R3D-RS) in
+    [Revisiting 3D ResNets for Video Recognition](https://arxiv.org/pdf/2109.01696.pdf).
+  * Mobile Video Networks (MoViNets) in
+    [MoViNets: Mobile Video Networks for Efficient Video Recognition](https://arxiv.org/abs/2103.11511).
+
+* Training and evaluation details (SlowFast and ResNet):
+  * All models are trained from scratch on the vision (RGB) modality for 200
+    epochs.
+  * We use a batch size of 1024 and cosine learning rate decay with linear warmup
+ * We follow [SlowFast](https://arxiv.org/abs/1812.03982) to perform 30-view + evaluation. + +### Kinetics-400 Action Recognition Baselines + +| Model | Input (frame x stride) | Top-1 | Top-5 | Download | +| -------- |:----------------------:|--------:|--------:|---------:| +| SlowOnly | 8 x 8 | 74.1 | 91.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_slowonly8x8_tpu.yaml) | +| SlowOnly | 16 x 4 | 75.6 | 92.1 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_slowonly16x4_tpu.yaml) | +| R3D-50 | 32 x 2 | 77.0 | 93.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_3d-resnet50_tpu.yaml) | +| R3D-RS-50 | 32 x 2 | 78.2 | 93.7 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_resnet3drs_50_tpu.yaml) | +| R3D-RS-101 | 32 x 2 | 79.5 | 94.2 | - +| R3D-RS-152 | 32 x 2 | 79.9 | 94.3 | - +| R3D-RS-200 | 32 x 2 | 80.4 | 94.4 | - +| R3D-RS-200 | 48 x 2 | 81.0 | - | - +| MoViNet-A0-Base | 50 x 5 | 69.40 | 89.18 | - +| MoViNet-A1-Base | 50 x 5 | 74.57 | 92.03 | - +| MoViNet-A2-Base | 50 x 5 | 75.91 | 92.63 | - +| MoViNet-A3-Base | 120 x 2 | 79.34 | 94.52 | - +| MoViNet-A4-Base | 80 x 3 | 80.64 | 94.93 | - +| MoViNet-A5-Base | 120 x 2 | 81.39 | 95.06 | - + +### Kinetics-600 Action Recognition Baselines + +| Model | Input (frame x stride) | Top-1 | Top-5 | Download | +| -------- |:----------------------:|--------:|--------:|---------:| +| SlowOnly | 8 x 8 | 77.3 | 93.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k600_slowonly8x8_tpu.yaml) | +| R3D-50 | 32 x 2 | 79.5 | 94.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k600_3d-resnet50_tpu.yaml) | +| 
R3D-RS-200 | 32 x 2 | 83.1 | - | -
+| R3D-RS-200 | 48 x 2 | 83.8 | - | -
+| MoViNet-A0-Base | 50 x 5 | 72.05 | 90.92 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml) |
+| MoViNet-A1-Base | 50 x 5 | 76.69 | 93.40 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a1_k600_8x8.yaml) |
+| MoViNet-A2-Base | 50 x 5 | 78.62 | 94.17 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a2_k600_8x8.yaml) |
+| MoViNet-A3-Base | 120 x 2 | 81.79 | 95.67 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a3_k600_8x8.yaml) |
+| MoViNet-A4-Base | 80 x 3 | 83.48 | 96.16 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a4_k600_8x8.yaml) |
+| MoViNet-A5-Base | 120 x 2 | 84.27 | 96.39 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a5_k600_8x8.yaml) |
diff --git a/official/vision/README.md b/official/vision/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..57365b3c16fd77f51a22b7107c9312a210b6106e
--- /dev/null
+++ b/official/vision/README.md
@@ -0,0 +1,295 @@
+# TF-Vision Model Garden
+
+⚠️ Disclaimer: None of the datasets hyperlinked from this page are owned or
+distributed by Google. The datasets are made available by third parties.
+Please review the terms and conditions made available by the third parties
+before using the data.
+
+## Table of Contents
+
+- [Introduction](#introduction)
+- [Image Classification](#image-classification)
+  * [ResNet models trained with vanilla settings](#resnet-models-trained-with-vanilla-settings)
+  * [ResNet-RS models trained with various settings](#resnet-rs-models-trained-with-various-settings)
+  * [Vision Transformer (ViT)](#vision-transformer-ViT)
+- [Object Detection and Instance Segmentation](#object-detection-and-instance-segmentation)
+  * [Common Settings and Notes](#Common-Settings-and-Notes)
+- [COCO Object Detection Baselines](#COCO-Object-Detection-Baselines)
+  * [RetinaNet (ImageNet pretrained)](#RetinaNet-ImageNet-pretrained)
+  * [RetinaNet (Trained from scratch)](#RetinaNet-Trained-from-scratch)
+  * [Mobile-size RetinaNet (Trained from scratch)](#Mobile-size-RetinaNet-Trained-from-scratch)
+- [Instance Segmentation Baselines](#Instance-Segmentation-Baselines)
+  * [Mask R-CNN (Trained from scratch)](#Mask-R-CNN-Trained-from-scratch)
+  * [Cascade RCNN-RS (Trained from scratch)](#Cascade-RCNN-RS-Trained-from-scratch)
+- [Semantic Segmentation](#semantic-segmentation)
+  * [PASCAL-VOC](#PASCAL-VOC)
+  * [CITYSCAPES](#CITYSCAPES)
+- [Video Classification](#video-classification)
+  * [Common Settings and Notes](#Common-Settings-and-Notes-1)
+  * [Kinetics-400 Action Recognition Baselines](#Kinetics-400-Action-Recognition-Baselines)
+  * [Kinetics-600 Action Recognition Baselines](#Kinetics-600-Action-Recognition-Baselines)
+
+## Introduction
+
+The TF-Vision modeling library for computer vision provides a collection of
+baselines and checkpoints for image classification, object detection, and
+segmentation.
+
+## Image Classification
+
+### ResNet models trained with vanilla settings
+
+
+
+* Models are trained from scratch with a batch size of 4096 and an initial
+  learning rate of 1.6.
+* Linear warmup is applied for the first 5 epochs.
+* Models are trained with L2 weight regularization and ReLU activation.
+
+| Model | Resolution | Epochs | Top-1 | Top-5 | Download |
+| ------------ |:-------------:|--------:|--------:|--------:|---------:|
+| ResNet-50 | 224x224 | 90 | 76.1 | 92.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) |
+| ResNet-50 | 224x224 | 200 | 77.1 | 93.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) |
+| ResNet-101 | 224x224 | 200 | 78.3 | 94.2 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet101_tpu.yaml) |
+| ResNet-152 | 224x224 | 200 | 78.7 | 94.3 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnet152_tpu.yaml) |
+
+
+ +### ResNet-RS models trained with various settings + +
+
+We support state-of-the-art [ResNet-RS](https://arxiv.org/abs/2103.07579) image
+classification models with the following features:
+
+* ResNet-RS architectural changes and Swish activation. (Note that ResNet-RS
+  adopts ReLU activation in the paper.)
+* Regularization methods including RandAugment, 4e-5 weight decay, stochastic
+  depth, label smoothing and dropout.
+* New training methods including a 350-epoch schedule, cosine learning rate decay
+  and an exponential moving average (EMA) of model weights.
+* Configs are in this [directory](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification).
+
+| Model | Resolution | Params (M) | Top-1 | Top-5 | Download |
+| --------- | :--------: | ---------: | ----: | ----: | --------:|
+| ResNet-RS-50 | 160x160 | 35.7 | 79.1 | 94.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs50_i160.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-50-i160.tar.gz) |
+| ResNet-RS-101 | 160x160 | 63.7 | 80.2 | 94.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs101_i160.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i160.tar.gz) |
+| ResNet-RS-101 | 192x192 | 63.7 | 81.3 | 95.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs101_i192.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i192.tar.gz) |
+| ResNet-RS-152 | 192x192 | 86.8 | 81.9 | 95.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i192.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i192.tar.gz) |
+| ResNet-RS-152 | 224x224 | 86.8 | 82.5 | 96.1 |
[config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i224.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i224.tar.gz) |
+| ResNet-RS-152 | 256x256 | 86.8 | 83.1 | 96.3 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i256.tar.gz) |
+| ResNet-RS-200 | 256x256 | 93.4 | 83.5 | 96.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs200_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-200-i256.tar.gz) |
+| ResNet-RS-270 | 256x256 | 130.1 | 83.6 | 96.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs270_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-270-i256.tar.gz) |
+| ResNet-RS-350 | 256x256 | 164.3 | 83.7 | 96.7 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i256.tar.gz) |
+| ResNet-RS-350 | 320x320 | 164.3 | 84.2 | 96.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/image_classification/imagenet_resnetrs350_i320.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i320.tar.gz) |
+
+
+ +### Vision Transformer (ViT) + +
+ +We support [ViT](https://arxiv.org/abs/2010.11929) and [DeiT](https://arxiv.org/abs/2012.12877) implementations in a TF +Vision +[project](https://github.com/tensorflow/models/tree/master/official/projects/vit). The table below lists ViT models trained with the DeiT settings: + +| Model | Resolution | Top-1 | Top-5 | +| --------- | :--------: | ----: | ----: | +| ViT-s16 | 224x224 | 79.4 | 94.7 | +| ViT-b16 | 224x224 | 81.8 | 95.8 | +| ViT-l16 | 224x224 | 82.2 | 95.8 | + +
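The `16` in ViT-s16, ViT-b16 and ViT-l16 is the patch size: each 224x224 image is split into non-overlapping 16x16 patches, and each patch becomes one token. The sketch below counts those tokens, assuming a single learnable class token as in the original ViT paper; the `vit_num_tokens` helper is hypothetical and not part of the Model Garden API.

```python
def vit_num_tokens(image_size: int, patch_size: int,
                   class_token: bool = True) -> int:
  """Number of tokens a ViT encoder processes for a square image."""
  if image_size % patch_size != 0:
    raise ValueError("image_size must be divisible by patch_size")
  patches_per_side = image_size // patch_size
  return patches_per_side * patches_per_side + (1 if class_token else 0)


print(vit_num_tokens(224, 16))  # 197 = 14 * 14 patches + 1 class token
```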
+ +## Object Detection and Instance Segmentation + +### Common Settings and Notes + +
+ +* We provide models with [ResNet-FPN](https://arxiv.org/abs/1612.03144) + and [SpineNet](https://arxiv.org/abs/1912.05027) backbones built on the + following detection frameworks: + * [RetinaNet](https://arxiv.org/abs/1708.02002) and + [RetinaNet-RS](https://arxiv.org/abs/2107.00057) + * [Mask R-CNN](https://arxiv.org/abs/1703.06870) + * [Cascade RCNN](https://arxiv.org/abs/1712.00726) and + [Cascade RCNN-RS](https://arxiv.org/abs/2107.00057) +* All models are trained on [COCO](https://cocodataset.org/) train2017 and + evaluated on [COCO](https://cocodataset.org/) val2017. +* Training details: + * Models fine-tuned from [ImageNet](https://www.image-net.org/) pretrained + checkpoints adopt the 12- or 36-epoch schedule. Models trained from + scratch adopt the 350-epoch schedule. + * The default training data augmentation implements horizontal flipping + and scale jittering with a random scale between [0.5, 2.0]. + * Unless noted, all models are trained with L2 weight regularization and + ReLU activation. + * We use a batch size of 256 and a stepwise learning rate schedule that + decays at the last 30 and 10 epochs. + * We use square images as input by resizing the long side of each image to + the target size and then padding the short side with zeros. + +
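The square-input preprocessing described above can be sketched as a size computation. This is a simplified, hypothetical helper rather than the Model Garden input pipeline: it assumes the zero padding goes on the bottom and right edges, and it leaves out the random scale jittering and cropping applied during training.

```python
from typing import Tuple


def resize_long_side_and_pad(height: int, width: int,
                             target_size: int) -> Tuple[int, int, int, int]:
  """Returns (new_height, new_width, pad_bottom, pad_right).

  The long side is scaled to `target_size`, the aspect ratio is preserved, and
  the short side is zero-padded so the output is target_size x target_size.
  """
  scale = target_size / max(height, width)
  new_height = round(height * scale)
  new_width = round(width * scale)
  return new_height, new_width, target_size - new_height, target_size - new_width


# A 480x640 image prepared for a 640x640 model input.
print(resize_long_side_and_pad(480, 640, 640))  # (480, 640, 160, 0)
```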
+ +## COCO Object Detection Baselines + +### RetinaNet (ImageNet pretrained) + +
+ +| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download | +| ------------ |:-------------:| -------:|--------------:|-----------:|-------:|---------:| +| R50-FPN | 640x640 | 12 | 97.0 | 34.0 | 34.3 | config| +| R50-FPN | 640x640 | 72 | 97.0 | 34.0 | 36.8 | config \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/retinanet/retinanet-resnet50fpn.tar.gz) | + +
+ +### RetinaNet (Trained from scratch) + +
+ +These models are trained from scratch with the following features: +* Stochastic depth with drop rate 0.2. +* Swish activation. + +| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download | +| ------------ |:-------------:| -------:|--------------:|-----------:|--------:|---------:| +| SpineNet-49 | 640x640 | 500 | 85.4 | 28.5 | 44.2 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet49_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)| +| SpineNet-96 | 1024x1024 | 500 | 265.4 | 43.0 | 48.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet96_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)| +| SpineNet-143 | 1280x1280 | 500 | 524.0 | 67.0 | 50.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet143_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)| + +
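Both training features above fit in a few lines. Swish is x * sigmoid(x). For stochastic depth, the sketch assumes the common convention of scaling the drop rate linearly with block depth, so only the last block uses the full rate of 0.2; this is an assumption, not a statement about how these configs distribute the rate, and both helpers are hypothetical.

```python
import math
from typing import List


def swish(x: float) -> float:
  """Swish activation, x * sigmoid(x), written to avoid overflow in exp."""
  if x >= 0:
    return x / (1.0 + math.exp(-x))
  z = math.exp(x)
  return x * z / (1.0 + z)


def stochastic_depth_rates(drop_rate: float, num_blocks: int) -> List[float]:
  """Per-block drop probabilities under a linear-scaling rule (assumption)."""
  return [drop_rate * (i + 1) / num_blocks for i in range(num_blocks)]


rates = stochastic_depth_rates(0.2, num_blocks=4)  # about 0.05, 0.1, 0.15, 0.2
```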
+ +### Mobile-size RetinaNet (Trained from scratch) + +
+ +| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download | +| ----------- | :--------: | -----: | --------: | ---------: | -----: | --------:| +| MobileNetv2 | 256x256 | 600 | - | 2.27 | 23.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_mobilenetv2_tpu.yaml) | +| Mobile SpineNet-49 | 384x384 | 600 | 1.0 | 2.32 | 28.1 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/retinanet/coco_spinenet49_mobile_tpu.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/retinanet/spinenet49mobile.tar.gz) | + +
+ +## Instance Segmentation Baselines + +### Mask R-CNN (Trained from scratch) + +
+ +| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Mask AP | Download | +| ------------ |:-------------:| -------:|-----------:|-----------:|-------:|--------:|---------:| +| ResNet50-FPN | 640x640 | 350 | 227.7 | 46.3 | 42.3 | 37.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/r50fpn_640_coco_scratch_tpu4x4.yaml) | +| SpineNet-49 | 640x640 | 350 | 215.7 | 40.8 | 42.6 | 37.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet49_mrcnn_tpu.yaml) | +| SpineNet-96 | 1024x1024 | 500 | 315.0 | 55.2 | 48.1 | 42.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet96_mrcnn_tpu.yaml) | +| SpineNet-143 | 1280x1280 | 500 | 498.8 | 79.2 | 49.3 | 43.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet143_mrcnn_tpu.yaml) | + +
+ +### Cascade RCNN-RS (Trained from scratch) + +
+ +| Backbone | Resolution | Epochs | Params (M) | Box AP | Mask AP | Download | +| ------------ | :--------: | -----: | ---------: | -----: | ------: | -------: | +| SpineNet-49 | 640x640 | 500 | 56.4 | 46.4 | 40.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet49_cascadercnn_tpu.yaml)| +| SpineNet-96 | 1024x1024 | 500 | 70.8 | 50.9 | 43.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet96_cascadercnn_tpu.yaml)| +| SpineNet-143 | 1280x1280 | 500 | 94.9 | 51.9 | 45.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/maskrcnn/coco_spinenet143_cascadercnn_tpu.yaml)| + +
+ +## Semantic Segmentation + +* We support [DeepLabV3](https://arxiv.org/pdf/1706.05587.pdf) and + [DeepLabV3+](https://arxiv.org/pdf/1802.02611.pdf) architectures, with + Dilated ResNet backbones. +* Backbones are pre-trained on ImageNet. + +### PASCAL-VOC + +
+ +| Model | Backbone | Resolution | Steps | mIoU | Download | +| ---------- | :----------------: | :--------: | ----: | ---: | --------:| +| DeepLabV3 | Dilated ResNet-101 | 512x512 | 30k | 78.7 | | +| DeepLabV3+ | Dilated ResNet-101 | 512x512 | 30k | 79.2 | | + +
+ +### CITYSCAPES + +
+ +| Model | Backbone | Resolution | Steps | mIoU | Download | +| ---------- | :----------------: | :--------: | ----: | ----: | --------:| +| DeepLabV3+ | Dilated ResNet-101 | 1024x2048 | 90k | 78.79 | | + +
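The mIoU columns above average the per-class intersection over union, TP / (TP + FP + FN), across classes. The following minimal sketch computes it from a pixel confusion matrix. The `mean_iou` helper is hypothetical, and skipping classes that appear in neither the ground truth nor the predictions is an assumption about how absent classes are handled. The tables report the result as a percentage.

```python
from typing import List, Sequence


def mean_iou(confusion: Sequence[Sequence[int]]) -> float:
  """Mean IoU from a confusion matrix where confusion[gt][pred] counts pixels."""
  num_classes = len(confusion)
  ious: List[float] = []
  for c in range(num_classes):
    tp = confusion[c][c]
    fn = sum(confusion[c]) - tp  # class-c pixels predicted as something else
    fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # wrongly called c
    denom = tp + fp + fn
    if denom > 0:  # skip classes absent from both ground truth and predictions
      ious.append(tp / denom)
  return sum(ious) / len(ious) if ious else 0.0


print(mean_iou([[3, 1], [1, 5]]))  # mean of 3/5 and 5/7
```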
+ +## Video Classification + +### Common Settings and Notes + +
+ +* We provide models for video classification with the following backbones: + * SlowOnly in + [SlowFast Networks for Video Recognition](https://arxiv.org/abs/1812.03982). + * ResNet-3D (R3D) in + [Spatiotemporal Contrastive Video Representation Learning](https://arxiv.org/abs/2008.03800). + * ResNet-3D-RS (R3D-RS) in + [Revisiting 3D ResNets for Video Recognition](https://arxiv.org/pdf/2109.01696.pdf). + * Mobile Video Networks (MoViNets) in + [MoViNets: Mobile Video Networks for Efficient Video Recognition](https://arxiv.org/abs/2103.11511). + +* Training and evaluation details (SlowFast and ResNet): + * All models are trained from scratch on the RGB (vision) modality for 200 + epochs. + * We use a batch size of 1024 and cosine learning rate decay with linear + warmup in the first 5 epochs. + * We follow [SlowFast](https://arxiv.org/abs/1812.03982) and perform 30-view + evaluation. + +
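The learning rate schedule described above (linear warmup over the first 5 epochs, then cosine decay over the remaining epochs) can be sketched as a function of the epoch. The peak learning rate is not stated here, so `base_lr` is a placeholder; warming up from zero and decaying to exactly zero are assumptions, and the configs define the actual schedule. The helper name is hypothetical.

```python
import math


def warmup_cosine_lr(step: int, total_steps: int, warmup_steps: int,
                     base_lr: float) -> float:
  """Linear warmup to base_lr, then cosine decay to 0 at total_steps."""
  if warmup_steps > 0 and step < warmup_steps:
    return base_lr * step / warmup_steps
  progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
  progress = min(progress, 1.0)  # hold at 0 after the schedule ends
  return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# 200 epochs with 5 warmup epochs; base_lr=1.0 is illustrative only.
schedule = [warmup_cosine_lr(e, 200, 5, base_lr=1.0) for e in range(201)]
```

Here the step unit is an epoch; a per-iteration schedule works the same way with steps counted in iterations.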
+ +### Kinetics-400 Action Recognition Baselines + +
+ +| Model | Input (frame x stride) | Top-1 | Top-5 | Download | +| -------- |:----------------------:|--------:|--------:|---------:| +| SlowOnly | 8 x 8 | 74.1 | 91.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_slowonly8x8_tpu.yaml) | +| SlowOnly | 16 x 4 | 75.6 | 92.1 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_slowonly16x4_tpu.yaml) | +| R3D-50 | 32 x 2 | 77.0 | 93.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_3d-resnet50_tpu.yaml) | +| R3D-RS-50 | 32 x 2 | 78.2 | 93.7 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k400_resnet3drs_50_tpu.yaml) | +| R3D-RS-101 | 32 x 2 | 79.5 | 94.2 | - +| R3D-RS-152 | 32 x 2 | 79.9 | 94.3 | - +| R3D-RS-200 | 32 x 2 | 80.4 | 94.4 | - +| R3D-RS-200 | 48 x 2 | 81.0 | - | - +| MoViNet-A0-Base | 50 x 5 | 69.40 | 89.18 | - +| MoViNet-A1-Base | 50 x 5 | 74.57 | 92.03 | - +| MoViNet-A2-Base | 50 x 5 | 75.91 | 92.63 | - +| MoViNet-A3-Base | 120 x 2 | 79.34 | 94.52 | - +| MoViNet-A4-Base | 80 x 3 | 80.64 | 94.93 | - +| MoViNet-A5-Base | 120 x 2 | 81.39 | 95.06 | - + +
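In the `Input (frame x stride)` column, a value such as `8 x 8` means 8 frames taken every 8th frame of the source video. The minimal sketch below lists which source frames one such clip uses; the helper is hypothetical, and choosing the clip's start position is left out.

```python
from typing import List


def clip_frame_indices(start: int, num_frames: int, stride: int) -> List[int]:
  """Source-frame indices for one clip of `num_frames` frames, `stride` apart."""
  return [start + i * stride for i in range(num_frames)]


print(clip_frame_indices(start=0, num_frames=8, stride=8))
# [0, 8, 16, 24, 32, 40, 48, 56]: an 8 x 8 clip drawn from a 64-frame window
```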
+ +### Kinetics-600 Action Recognition Baselines + +
+ +| Model | Input (frame x stride) | Top-1 | Top-5 | Download | +| -------- |:----------------------:|--------:|--------:|---------:| +| SlowOnly | 8 x 8 | 77.3 | 93.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k600_slowonly8x8_tpu.yaml) | +| R3D-50 | 32 x 2 | 79.5 | 94.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/configs/experiments/video_classification/k600_3d-resnet50_tpu.yaml) | +| R3D-RS-200 | 32 x 2 | 83.1 | - | - +| R3D-RS-200 | 48 x 2 | 83.8 | - | - +| MoViNet-A0-Base | 50 x 5 | 72.05 | 90.92 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml) | +| MoViNet-A1-Base | 50 x 5 | 76.69 | 93.40 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a1_k600_8x8.yaml) | +| MoViNet-A2-Base | 50 x 5 | 78.62 | 94.17 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a2_k600_8x8.yaml) | +| MoViNet-A3-Base | 120 x 2 | 81.79 | 95.67 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a3_k600_8x8.yaml) | +| MoViNet-A4-Base | 80 x 3 | 83.48 | 96.16 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a4_k600_8x8.yaml) | +| MoViNet-A5-Base | 120 x 2 | 84.27 | 96.39 | [config](https://github.com/tensorflow/models/blob/master/official/projects/movinet/configs/yaml/movinet_a5_k600_8x8.yaml) | +
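The 30-view evaluation mentioned in the training notes follows SlowFast: each video is scored on 10 temporal clips times 3 spatial crops, and the per-view predictions are averaged. A minimal sketch of the averaging step follows; the helper is hypothetical, and it assumes each view has already been turned into per-class scores (for example, softmax probabilities).

```python
from typing import List, Sequence

NUM_TEMPORAL_CLIPS = 10
NUM_SPATIAL_CROPS = 3
NUM_VIEWS = NUM_TEMPORAL_CLIPS * NUM_SPATIAL_CROPS  # 30


def average_view_scores(view_scores: Sequence[Sequence[float]]) -> List[float]:
  """Averages per-class scores over all evaluation views of one video."""
  if not view_scores:
    raise ValueError("need at least one view")
  num_classes = len(view_scores[0])
  return [sum(view[c] for view in view_scores) / len(view_scores)
          for c in range(num_classes)]


scores = [[1.0, 0.0]] * 15 + [[0.0, 1.0]] * 15  # 30 toy views, 2 classes
print(average_view_scores(scores))  # [0.5, 0.5]
```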
diff --git a/official/vision/__init__.py b/official/vision/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..b691d1b83e054e871ee4711ba23c9eaef9c32b50 100644 --- a/official/vision/__init__.py +++ b/official/vision/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,3 +12,7 @@ # See the License for the specific language governing permissions and # limitations under the License. +"""Vision package definition.""" +# pylint: disable=unused-import +from official.vision import configs +from official.vision import tasks diff --git a/official/vision/beta/MODEL_GARDEN.md b/official/vision/beta/MODEL_GARDEN.md deleted file mode 100644 index ac429bab6fbdf2d69c5fcf1185c1d8c40f049954..0000000000000000000000000000000000000000 --- a/official/vision/beta/MODEL_GARDEN.md +++ /dev/null @@ -1,182 +0,0 @@ -# TF-Vision Model Garden - -## Introduction - -TF-Vision modeling library for computer vision provides a collection of -baselines and checkpoints for image classification, object detection, and -segmentation. - -## Image Classification - -### ImageNet Baselines - -#### ResNet models trained with vanilla settings - -* Models are trained from scratch with batch size 4096 and 1.6 initial learning - rate. -* Linear warmup is applied for the first 5 epochs. -* Models trained with l2 weight regularization and ReLU activation. 
- -| Model | Resolution | Epochs | Top-1 | Top-5 | Download | -| ------------ |:-------------:|--------:|--------:|--------:|---------:| -| ResNet-50 | 224x224 | 90 | 76.1 | 92.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) | -| ResNet-50 | 224x224 | 200 | 77.1 | 93.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml) | -| ResNet-101 | 224x224 | 200 | 78.3 | 94.2 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet101_tpu.yaml) | -| ResNet-152 | 224x224 | 200 | 78.7 | 94.3 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnet152_tpu.yaml) | - -#### ResNet-RS models trained with various settings - -We support state-of-the-art [ResNet-RS](https://arxiv.org/abs/2103.07579) image -classification models with features: - -* ResNet-RS architectural changes and Swish activation. (Note that ResNet-RS - adopts ReLU activation in the paper.) -* Regularization methods including Random Augment, 4e-5 weight decay, stochastic -depth, label smoothing and dropout. -* New training methods including a 350-epoch schedule, cosine learning rate and - EMA. -* Configs are in this [directory](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification). 
- -| Model | Resolution | Params (M) | Top-1 | Top-5 | Download | -| --------- | :--------: | ---------: | ----: | ----: | --------:| -| ResNet-RS-50 | 160x160 | 35.7 | 79.1 | 94.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs50_i160.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-50-i160.tar.gz) | -| ResNet-RS-101 | 160x160 | 63.7 | 80.2 | 94.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs101_i160.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i160.tar.gz) | -| ResNet-RS-101 | 192x192 | 63.7 | 81.3 | 95.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs101_i192.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-101-i192.tar.gz) | -| ResNet-RS-152 | 192x192 | 86.8 | 81.9 | 95.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i192.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i192.tar.gz) | -| ResNet-RS-152 | 224x224 | 86.8 | 82.5 | 96.1 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i224.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i224.tar.gz) | -| ResNet-RS-152 | 256x256 | 86.8 | 83.1 | 96.3 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-152-i256.tar.gz) | -| ResNet-RS-200 | 256x256 
| 93.4 | 83.5 | 96.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs200_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-200-i256.tar.gz) | -| ResNet-RS-270 | 256x256 | 130.1 | 83.6 | 96.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs270_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-270-i256.tar.gz) | -| ResNet-RS-350 | 256x256 | 164.3 | 83.7 | 96.7 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i256.tar.gz) | -| ResNet-RS-350 | 320x320 | 164.3 | 84.2 | 96.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs420_i256.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/resnet-rs/resnet-rs-350-i320.tar.gz) | - -## Object Detection and Instance Segmentation - -### Common Settings and Notes - -* We provide models adopting [ResNet-FPN](https://arxiv.org/abs/1612.03144) and - [SpineNet](https://arxiv.org/abs/1912.05027) backbones based on detection frameworks: - * [RetinaNet](https://arxiv.org/abs/1708.02002) and [RetinaNet-RS](https://arxiv.org/abs/2107.00057) - * [Mask R-CNN](https://arxiv.org/abs/1703.06870) - * [Cascade RCNN](https://arxiv.org/abs/1712.00726) and [Cascade RCNN-RS](https://arxiv.org/abs/2107.00057) - -* Models are all trained on COCO train2017 and evaluated on COCO val2017. -* Training details: - * Models finetuned from ImageNet pretrained checkpoints adopt the 12 or 36 - epochs schedule. Models trained from scratch adopt the 350 epochs schedule. 
- * The default training data augmentation implements horizontal flipping and - scale jittering with a random scale between [0.5, 2.0]. - * Unless noted, all models are trained with l2 weight regularization and ReLU - activation. - * We use batch size 256 and stepwise learning rate that decays at the last 30 - and 10 epoch. - * We use square image as input by resizing the long side of an image to the - target size then padding the short side with zeros. - -### COCO Object Detection Baselines - -#### RetinaNet (ImageNet pretrained) - -| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download | -| ------------ |:-------------:| -------:|--------------:|-----------:|-------:|---------:| -| R50-FPN | 640x640 | 12 | 97.0 | 34.0 | 34.3 | config| -| R50-FPN | 640x640 | 72 | 97.0 | 34.0 | 36.8 | config \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/retinanet/retinanet-resnet50fpn.tar.gz) | - -#### RetinaNet (Trained from scratch) with training features including: - -* Stochastic depth with drop rate 0.2. -* Swish activation. 
- -| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download | -| ------------ |:-------------:| -------:|--------------:|-----------:|--------:|---------:| -| SpineNet-49 | 640x640 | 500 | 85.4| 28.5 | 44.2 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_spinenet49_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)| -| SpineNet-96 | 1024x1024 | 500 | 265.4 | 43.0 | 48.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_spinenet96_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)| -| SpineNet-143 | 1280x1280 | 500 | 524.0 | 67.0 | 50.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_spinenet143_tpu.yaml) \| [TB.dev](https://tensorboard.dev/experiment/n2UN83TkTdyKZn3slCWulg/#scalars&_smoothingWeight=0)| - -#### Mobile-size RetinaNet (Trained from scratch): - -| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download | -| ----------- | :--------: | -----: | --------: | ---------: | -----: | --------:| -| MobileNetv2 | 256x256 | 600 | - | 2.27 | 23.5 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_mobilenetv2_tpu.yaml) | -| Mobile SpineNet-49 | 384x384 | 600 | 1.0 | 2.32 | 28.1 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/retinanet/coco_spinenet49_mobile_tpu.yaml) \| [ckpt](https://storage.cloud.google.com/tf_model_garden/vision/retinanet/spinenet49mobile.tar.gz) | - -### Instance Segmentation Baselines - -#### Mask R-CNN (Trained from scratch) - -| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Mask AP | Download | -| ------------ |:-------------:| 
-------:|-----------:|-----------:|-------:|--------:|---------:| -ResNet50-FPN | 640x640 | 350 | 227.7 | 46.3 | 42.3 | 37.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/r50fpn_640_coco_scratch_tpu4x4.yaml) | -| SpineNet-49 | 640x640 | 350 | 215.7 | 40.8 | 42.6 | 37.9 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_mrcnn_tpu.yaml) | -SpineNet-96 | 1024x1024 | 500 | 315.0 | 55.2 | 48.1 | 42.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet96_mrcnn_tpu.yaml) | -SpineNet-143 | 1280x1280 | 500 | 498.8 | 79.2 | 49.3 | 43.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet143_mrcnn_tpu.yaml) | - - -#### Cascade RCNN-RS (Trained from scratch) - -backbone | resolution | epochs | params (M) | box AP | mask AP | download ------------- | :--------: | -----: | ---------: | -----: | ------: | -------: -SpineNet-49 | 640x640 | 500 | 56.4 | 46.4 | 40.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_cascadercnn_tpu.yaml)| -SpineNet-143 | 1280x1280 | 500 | 94.9 | 51.9 | 45.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet143_cascadercnn_tpu.yaml)| - -## Semantic Segmentation - -* We support [DeepLabV3](https://arxiv.org/pdf/1706.05587.pdf) and - [DeepLabV3+](https://arxiv.org/pdf/1802.02611.pdf) architectures, with - Dilated ResNet backbones. -* Backbones are pre-trained on ImageNet. 
- -### PASCAL-VOC - -| Model | Backbone | Resolution | Steps | mIoU | Download | -| ---------- | :----------------: | :--------: | ----: | ---: | --------:| -| DeepLabV3 | Dilated Resnet-101 | 512x512 | 30k | 78.7 | | -| DeepLabV3+ | Dilated Resnet-101 | 512x512 | 30k | 79.2 | | - -### CITYSCAPES - -| Model | Backbone | Resolution | Steps | mIoU | Download | -| ---------- | :----------------: | :--------: | ----: | ----: | --------:| -| DeepLabV3+ | Dilated Resnet-101 | 1024x2048 | 90k | 78.79 | | - -## Video Classification - -### Common Settings and Notes - -* We provide models for video classification with backbones: - * SlowOnly in - [SlowFast Networks for Video Recognition](https://arxiv.org/abs/1812.03982). - * ResNet-3D (R3D) in - [Spatiotemporal Contrastive Video Representation Learning](https://arxiv.org/abs/2008.03800). - * ResNet-3D-RS (R3D-RS) in - [Revisiting 3D ResNets for Video Recognition](https://arxiv.org/pdf/2109.01696.pdf). - -* Training and evaluation details: - * All models are trained from scratch with vision modality (RGB) for 200 - epochs. - * We use batch size of 1024 and cosine learning rate decay with linear warmup - in first 5 epochs. - * We follow [SlowFast](https://arxiv.org/abs/1812.03982) to perform 30-view - evaluation. 
- -### Kinetics-400 Action Recognition Baselines - -| Model | Input (frame x stride) | Top-1 | Top-5 | Download | -| -------- |:----------------------:|--------:|--------:|---------:| -| SlowOnly | 8 x 8 | 74.1 | 91.4 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k400_slowonly8x8_tpu.yaml) | -| SlowOnly | 16 x 4 | 75.6 | 92.1 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k400_slowonly16x4_tpu.yaml) | -| R3D-50 | 32 x 2 | 77.0 | 93.0 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k400_3d-resnet50_tpu.yaml) | -| R3D-RS-50 | 32 x 2 | 78.2 | 93.7 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k400_resnet3drs_50_tpu.yaml) | -| R3D-RS-101 | 32 x 2 | 79.5 | 94.2 | - -| R3D-RS-152 | 32 x 2 | 79.9 | 94.3 | - -| R3D-RS-200 | 32 x 2 | 80.4 | 94.4 | - -| R3D-RS-200 | 48 x 2 | 81.0 | - | - - -### Kinetics-600 Action Recognition Baselines - -| Model | Input (frame x stride) | Top-1 | Top-5 | Download | -| -------- |:----------------------:|--------:|--------:|---------:| -| SlowOnly | 8 x 8 | 77.3 | 93.6 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k600_slowonly8x8_tpu.yaml) | -| R3D-50 | 32 x 2 | 79.5 | 94.8 | [config](https://github.com/tensorflow/models/blob/master/official/vision/beta/configs/experiments/video_classification/k600_3d-resnet50_tpu.yaml) | -| R3D-RS-200 | 32 x 2 | 83.1 | - | - -| R3D-RS-200 | 48 x 2 | 83.8 | - | - diff --git a/official/vision/beta/README.md b/official/vision/beta/README.md deleted file mode 100644 index 065323b844c30c1e01b803694ec010dbae077380..0000000000000000000000000000000000000000 --- a/official/vision/beta/README.md +++ /dev/null @@ -1 +0,0 @@ -This directory contains 
the new design of TF model garden vision framework. diff --git a/official/vision/beta/__init__.py b/official/vision/beta/__init__.py deleted file mode 100644 index 91f07553490b4602e6a97aba939748b1a2dbef3e..0000000000000000000000000000000000000000 --- a/official/vision/beta/__init__.py +++ /dev/null @@ -1,19 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Vision package definition.""" -# Lint as: python3 -# pylint: disable=unused-import -from official.vision.beta import configs -from official.vision.beta import tasks diff --git a/official/vision/beta/configs/__init__.py b/official/vision/beta/configs/__init__.py deleted file mode 100644 index 925339330799dcf09d804daf73ad957370e5f6d2..0000000000000000000000000000000000000000 --- a/official/vision/beta/configs/__init__.py +++ /dev/null @@ -1,22 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Configs package definition.""" - -from official.vision.beta.configs import image_classification -from official.vision.beta.configs import maskrcnn -from official.vision.beta.configs import retinanet -from official.vision.beta.configs import semantic_segmentation -from official.vision.beta.configs import video_classification diff --git a/official/vision/beta/configs/backbones.py b/official/vision/beta/configs/backbones.py deleted file mode 100644 index b4eb0c52269f846f677d405099fb96a7ff0c5a4f..0000000000000000000000000000000000000000 --- a/official/vision/beta/configs/backbones.py +++ /dev/null @@ -1,132 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Backbones configurations.""" -import dataclasses -from typing import Optional, List - -# Import libraries - -from official.modeling import hyperparams - - -@dataclasses.dataclass -class ResNet(hyperparams.Config): - """ResNet config.""" - model_id: int = 50 - depth_multiplier: float = 1.0 - stem_type: str = 'v0' - se_ratio: float = 0.0 - stochastic_depth_drop_rate: float = 0.0 - scale_stem: bool = True - resnetd_shortcut: bool = False - replace_stem_max_pool: bool = False - bn_trainable: bool = True - - -@dataclasses.dataclass -class DilatedResNet(hyperparams.Config): - """DilatedResNet config.""" - model_id: int = 50 - output_stride: int = 16 - multigrid: Optional[List[int]] = None - stem_type: str = 'v0' - last_stage_repeats: int = 1 - se_ratio: float = 0.0 - stochastic_depth_drop_rate: float = 0.0 - - -@dataclasses.dataclass -class EfficientNet(hyperparams.Config): - """EfficientNet config.""" - model_id: str = 'b0' - se_ratio: float = 0.0 - stochastic_depth_drop_rate: float = 0.0 - - -@dataclasses.dataclass -class MobileNet(hyperparams.Config): - """Mobilenet config.""" - model_id: str = 'MobileNetV2' - filter_size_scale: float = 1.0 - stochastic_depth_drop_rate: float = 0.0 - output_stride: Optional[int] = None - output_intermediate_endpoints: bool = False - - -@dataclasses.dataclass -class SpineNet(hyperparams.Config): - """SpineNet config.""" - model_id: str = '49' - stochastic_depth_drop_rate: float = 0.0 - min_level: int = 3 - max_level: int = 7 - - -@dataclasses.dataclass -class SpineNetMobile(hyperparams.Config): - """SpineNet config.""" - model_id: str = '49' - stochastic_depth_drop_rate: float = 0.0 - se_ratio: float = 0.2 - expand_ratio: int = 6 - min_level: int = 3 - max_level: int = 7 - # If use_keras_upsampling_2d is True, model uses UpSampling2D keras layer - # instead of optimized custom TF op. It makes model be more keras style. 
We - # set this flag to True when we apply QAT from model optimization toolkit - # that requires the model should use keras layers. - use_keras_upsampling_2d: bool = False - - -@dataclasses.dataclass -class RevNet(hyperparams.Config): - """RevNet config.""" - # Specifies the depth of RevNet. - model_id: int = 56 - - -@dataclasses.dataclass -class MobileDet(hyperparams.Config): - """Mobiledet config.""" - model_id: str = 'MobileDetCPU' - filter_size_scale: float = 1.0 - - -@dataclasses.dataclass -class Backbone(hyperparams.OneOfConfig): - """Configuration for backbones. - - Attributes: - type: 'str', type of backbone be used, one of the fields below. - resnet: resnet backbone config. - dilated_resnet: dilated resnet backbone for semantic segmentation config. - revnet: revnet backbone config. - efficientnet: efficientnet backbone config. - spinenet: spinenet backbone config. - spinenet_mobile: mobile spinenet backbone config. - mobilenet: mobilenet backbone config. - mobiledet: mobiledet backbone config. - """ - type: Optional[str] = None - resnet: ResNet = ResNet() - dilated_resnet: DilatedResNet = DilatedResNet() - revnet: RevNet = RevNet() - efficientnet: EfficientNet = EfficientNet() - spinenet: SpineNet = SpineNet() - spinenet_mobile: SpineNetMobile = SpineNetMobile() - mobilenet: MobileNet = MobileNet() - mobiledet: MobileDet = MobileDet() - diff --git a/official/vision/beta/configs/common.py b/official/vision/beta/configs/common.py deleted file mode 100644 index 1013bea8de780f04d9eda8735e4f32a0ec88d2e6..0000000000000000000000000000000000000000 --- a/official/vision/beta/configs/common.py +++ /dev/null @@ -1,137 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Common configurations."""
-
-import dataclasses
-from typing import List, Optional
-
-# Import libraries
-
-from official.core import config_definitions as cfg
-from official.modeling import hyperparams
-
-
-@dataclasses.dataclass
-class TfExampleDecoder(hyperparams.Config):
-  """A simple TF Example decoder config."""
-  regenerate_source_id: bool = False
-  mask_binarize_threshold: Optional[float] = None
-
-
-@dataclasses.dataclass
-class TfExampleDecoderLabelMap(hyperparams.Config):
-  """TF Example decoder with label map config."""
-  regenerate_source_id: bool = False
-  mask_binarize_threshold: Optional[float] = None
-  label_map: str = ''
-
-
-@dataclasses.dataclass
-class DataDecoder(hyperparams.OneOfConfig):
-  """Data decoder config.
-
-  Attributes:
-    type: 'str', type of data decoder be used, one of the fields below.
-    simple_decoder: simple TF Example decoder config.
-    label_map_decoder: TF Example decoder with label map config.
-  """
-  type: Optional[str] = 'simple_decoder'
-  simple_decoder: TfExampleDecoder = TfExampleDecoder()
-  label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap()
-
-
-@dataclasses.dataclass
-class RandAugment(hyperparams.Config):
-  """Configuration for RandAugment."""
-  num_layers: int = 2
-  magnitude: float = 10
-  cutout_const: float = 40
-  translate_const: float = 10
-  magnitude_std: float = 0.0
-  prob_to_apply: Optional[float] = None
-  exclude_ops: List[str] = dataclasses.field(default_factory=list)
-
-
-@dataclasses.dataclass
-class AutoAugment(hyperparams.Config):
-  """Configuration for AutoAugment."""
-  augmentation_name: str = 'v0'
-  cutout_const: float = 100
-  translate_const: float = 250
-
-
-@dataclasses.dataclass
-class RandomErasing(hyperparams.Config):
-  """Configuration for RandomErasing."""
-  probability: float = 0.25
-  min_area: float = 0.02
-  max_area: float = 1 / 3
-  min_aspect: float = 0.3
-  max_aspect = None
-  min_count = 1
-  max_count = 1
-  trials = 10
-
-
-@dataclasses.dataclass
-class MixupAndCutmix(hyperparams.Config):
-  """Configuration for MixupAndCutmix."""
-  mixup_alpha: float = .8
-  cutmix_alpha: float = 1.
-  prob: float = 1.0
-  switch_prob: float = 0.5
-  label_smoothing: float = 0.1
-
-
-@dataclasses.dataclass
-class Augmentation(hyperparams.OneOfConfig):
-  """Configuration for input data augmentation.
-
-  Attributes:
-    type: 'str', type of augmentation be used, one of the fields below.
-    randaug: RandAugment config.
-    autoaug: AutoAugment config.
-  """
-  type: Optional[str] = None
-  randaug: RandAugment = RandAugment()
-  autoaug: AutoAugment = AutoAugment()
-
-
-@dataclasses.dataclass
-class NormActivation(hyperparams.Config):
-  activation: str = 'relu'
-  use_sync_bn: bool = True
-  norm_momentum: float = 0.99
-  norm_epsilon: float = 0.001
-
-
-@dataclasses.dataclass
-class PseudoLabelDataConfig(cfg.DataConfig):
-  """Psuedo Label input config for training."""
-  input_path: str = ''
-  data_ratio: float = 1.0  # Per-batch ratio of pseudo-labeled to labeled data.
-  is_training: bool = True
-  dtype: str = 'float32'
-  shuffle_buffer_size: int = 10000
-  cycle_length: int = 10
-  aug_rand_hflip: bool = True
-  aug_type: Optional[
-      Augmentation] = None  # Choose from AutoAugment and RandAugment.
-  file_type: str = 'tfrecord'
-
-  # Keep for backward compatibility.
-  aug_policy: Optional[str] = None  # None, 'autoaug', or 'randaug'.
-  randaug_magnitude: Optional[int] = 10
diff --git a/official/vision/beta/configs/decoders.py b/official/vision/beta/configs/decoders.py
deleted file mode 100644
index 131be37785f21f1326a7f4f9c8ee2c6bb85f18c1..0000000000000000000000000000000000000000
--- a/official/vision/beta/configs/decoders.py
+++ /dev/null
@@ -1,72 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Decoders configurations."""
-import dataclasses
-from typing import List, Optional
-
-# Import libraries
-
-from official.modeling import hyperparams
-
-
-@dataclasses.dataclass
-class Identity(hyperparams.Config):
-  """Identity config."""
-  pass
-
-
-@dataclasses.dataclass
-class FPN(hyperparams.Config):
-  """FPN config."""
-  num_filters: int = 256
-  fusion_type: str = 'sum'
-  use_separable_conv: bool = False
-
-
-@dataclasses.dataclass
-class NASFPN(hyperparams.Config):
-  """NASFPN config."""
-  num_filters: int = 256
-  num_repeats: int = 5
-  use_separable_conv: bool = False
-
-
-@dataclasses.dataclass
-class ASPP(hyperparams.Config):
-  """ASPP config."""
-  level: int = 4
-  dilation_rates: List[int] = dataclasses.field(default_factory=list)
-  dropout_rate: float = 0.0
-  num_filters: int = 256
-  use_depthwise_convolution: bool = False
-  pool_kernel_size: Optional[List[int]] = None  # Use global average pooling.
-  spp_layer_version: str = 'v1'
-  output_tensor: bool = False
-
-
-@dataclasses.dataclass
-class Decoder(hyperparams.OneOfConfig):
-  """Configuration for decoders.
-
-  Attributes:
-    type: 'str', type of decoder be used, one of the fields below.
-    fpn: fpn config.
-  """
-  type: Optional[str] = None
-  fpn: FPN = FPN()
-  nasfpn: NASFPN = NASFPN()
-  identity: Identity = Identity()
-  aspp: ASPP = ASPP()
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_mobilenetv3large_tpu.yaml b/official/vision/beta/configs/experiments/image_classification/imagenet_mobilenetv3large_tpu.yaml
deleted file mode 100644
index a8fb0c95d55f7eec2b45f076881e9beb6ee559dc..0000000000000000000000000000000000000000
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_mobilenetv3large_tpu.yaml
+++ /dev/null
@@ -1,53 +0,0 @@
-# MobileNetV3-large_1.0 ImageNet classification: 74.96% top-1.
-runtime:
-  distribution_strategy: 'tpu'
-  mixed_precision_dtype: 'bfloat16'
-task:
-  model:
-    num_classes: 1001
-    input_size: [224, 224, 3]
-    backbone:
-      type: 'mobilenet'
-      mobilenet:
-        model_id: 'MobileNetV3Large'
-        filter_size_scale: 1.0
-    dropout_rate: 0.2
-  losses:
-    l2_weight_decay: 0.00001
-    one_hot: true
-    label_smoothing: 0.1
-  train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
-    is_training: true
-    global_batch_size: 4096
-    dtype: 'bfloat16'
-    # Enables Inception-style pre-processing.
-    decode_jpeg_only: false
-  validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
-    is_training: false
-    global_batch_size: 4096
-    dtype: 'bfloat16'
-    drop_remainder: false
-    # Enables Inception-style pre-processing.
-    decode_jpeg_only: false
-trainer:
-  train_steps: 156000  # 500 epochs
-  validation_steps: 13
-  validation_interval: 312
-  steps_per_loop: 312  # NUM_EXAMPLES (1281167) // global_batch_size
-  summary_interval: 312
-  checkpoint_interval: 312
-  optimizer_config:
-    learning_rate:
-      type: 'cosine'
-      cosine:
-        alpha: 0.0
-        decay_steps: 156000
-        initial_learning_rate: 0.5
-        name: CosineDecay
-        offset: 0
-    warmup:
-      type: 'linear'
-      linear:
-        warmup_steps: 5000
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_gpu.yaml b/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_gpu.yaml
deleted file mode 100644
index dd6a4dc1618bfa4dbcd30410966ee365284b7cf8..0000000000000000000000000000000000000000
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_gpu.yaml
+++ /dev/null
@@ -1,48 +0,0 @@
-runtime:
-  distribution_strategy: 'mirrored'
-  mixed_precision_dtype: 'float16'
-  loss_scale: 'dynamic'
-task:
-  model:
-    num_classes: 1001
-    input_size: [224, 224, 3]
-    backbone:
-      type: 'resnet'
-      resnet:
-        model_id: 50
-  losses:
-    l2_weight_decay: 0.0001
-    one_hot: true
-    label_smoothing: 0.1
-  train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
-    is_training: true
-    global_batch_size: 2048
-    dtype: 'float16'
-  validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
-    is_training: false
-    global_batch_size: 2048
-    dtype: 'float16'
-    drop_remainder: false
-trainer:
-  train_steps: 56160
-  validation_steps: 25
-  validation_interval: 625
-  steps_per_loop: 625
-  summary_interval: 625
-  checkpoint_interval: 625
-  optimizer_config:
-    optimizer:
-      type: 'sgd'
-      sgd:
-        momentum: 0.9
-    learning_rate:
-      type: 'stepwise'
-      stepwise:
-        boundaries: [18750, 37500, 50000]
-        values: [0.8, 0.08, 0.008, 0.0008]
-    warmup:
-      type: 'linear'
-      linear:
-        warmup_steps: 3125
diff --git a/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet96_casrcnn_tpu.yaml b/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet96_casrcnn_tpu.yaml
deleted file mode 100644
index 612608333c3e02652d665335314ea6359cc5267d..0000000000000000000000000000000000000000
--- a/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet96_casrcnn_tpu.yaml
+++ /dev/null
@@ -1,56 +0,0 @@
-runtime:
-  distribution_strategy: 'tpu'
-  mixed_precision_dtype: 'bfloat16'
-task:
-  init_checkpoint: null
-  train_data:
-    global_batch_size: 256
-    parser:
-      aug_rand_hflip: true
-      aug_scale_min: 0.1
-      aug_scale_max: 2.0
-  losses:
-    l2_weight_decay: 0.00004
-  model:
-    anchor:
-      anchor_size: 3.0
-      num_scales: 3
-    min_level: 3
-    max_level: 7
-    input_size: [1024, 1024, 3]
-    backbone:
-      spinenet:
-        stochastic_depth_drop_rate: 0.2
-        model_id: '96'
-      type: 'spinenet'
-    decoder:
-      type: 'identity'
-    detection_head:
-      cascade_class_ensemble: true
-      class_agnostic_bbox_pred: true
-    rpn_head:
-      num_convs: 2
-      num_filters: 256
-    roi_sampler:
-      cascade_iou_thresholds: [0.7]
-      foreground_iou_threshold: 0.6
-    norm_activation:
-      norm_epsilon: 0.001
-      norm_momentum: 0.99
-      use_sync_bn: true
-      activation: 'swish'
-    detection_generator:
-      pre_nms_top_k: 1000
-trainer:
-  train_steps: 231000
-  optimizer_config:
-    learning_rate:
-      type: 'stepwise'
-      stepwise:
-        boundaries: [219450, 226380]
-        values: [0.32, 0.032, 0.0032]
-    warmup:
-      type: 'linear'
-      linear:
-        warmup_steps: 2000
-        warmup_learning_rate: 0.0067
diff --git a/official/vision/beta/configs/image_classification.py b/official/vision/beta/configs/image_classification.py
deleted file mode 100644
index be12bff026ab2251b61b25ed46541c4e2e3e3904..0000000000000000000000000000000000000000
--- a/official/vision/beta/configs/image_classification.py
+++ /dev/null
@@ -1,398 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Image classification configuration definition."""
-import dataclasses
-import os
-from typing import List, Optional
-
-from official.core import config_definitions as cfg
-from official.core import exp_factory
-from official.modeling import hyperparams
-from official.modeling import optimization
-from official.vision.beta.configs import common
-from official.vision.beta.configs import backbones
-
-
-@dataclasses.dataclass
-class DataConfig(cfg.DataConfig):
-  """Input config for training."""
-  input_path: str = ''
-  global_batch_size: int = 0
-  is_training: bool = True
-  dtype: str = 'float32'
-  shuffle_buffer_size: int = 10000
-  cycle_length: int = 10
-  is_multilabel: bool = False
-  aug_rand_hflip: bool = True
-  aug_type: Optional[
-      common.Augmentation] = None  # Choose from AutoAugment and RandAugment.
-  color_jitter: float = 0.
-  random_erasing: Optional[common.RandomErasing] = None
-  file_type: str = 'tfrecord'
-  image_field_key: str = 'image/encoded'
-  label_field_key: str = 'image/class/label'
-  decode_jpeg_only: bool = True
-  mixup_and_cutmix: Optional[common.MixupAndCutmix] = None
-  decoder: Optional[common.DataDecoder] = common.DataDecoder()
-
-  # Keep for backward compatibility.
-  aug_policy: Optional[str] = None  # None, 'autoaug', or 'randaug'.
-  randaug_magnitude: Optional[int] = 10
-
-
-@dataclasses.dataclass
-class ImageClassificationModel(hyperparams.Config):
-  """The model config."""
-  num_classes: int = 0
-  input_size: List[int] = dataclasses.field(default_factory=list)
-  backbone: backbones.Backbone = backbones.Backbone(
-      type='resnet', resnet=backbones.ResNet())
-  dropout_rate: float = 0.0
-  norm_activation: common.NormActivation = common.NormActivation(
-      use_sync_bn=False)
-  # Adds a BatchNormalization layer pre-GlobalAveragePooling in classification
-  add_head_batch_norm: bool = False
-  kernel_initializer: str = 'random_uniform'
-
-
-@dataclasses.dataclass
-class Losses(hyperparams.Config):
-  loss_weight: float = 1.0
-  one_hot: bool = True
-  label_smoothing: float = 0.0
-  l2_weight_decay: float = 0.0
-  soft_labels: bool = False
-
-
-@dataclasses.dataclass
-class Evaluation(hyperparams.Config):
-  top_k: int = 5
-
-
-@dataclasses.dataclass
-class ImageClassificationTask(cfg.TaskConfig):
-  """The task config."""
-  model: ImageClassificationModel = ImageClassificationModel()
-  train_data: DataConfig = DataConfig(is_training=True)
-  validation_data: DataConfig = DataConfig(is_training=False)
-  losses: Losses = Losses()
-  evaluation: Evaluation = Evaluation()
-  init_checkpoint: Optional[str] = None
-  init_checkpoint_modules: str = 'all'  # all or backbone
-  model_output_keys: Optional[List[int]] = dataclasses.field(
-      default_factory=list)
-
-
-@exp_factory.register_config_factory('image_classification')
-def image_classification() -> cfg.ExperimentConfig:
-  """Image classification general."""
-  return cfg.ExperimentConfig(
-      task=ImageClassificationTask(),
-      trainer=cfg.TrainerConfig(),
-      restrictions=[
-          'task.train_data.is_training != None',
-          'task.validation_data.is_training != None'
-      ])
-
-
-IMAGENET_TRAIN_EXAMPLES = 1281167
-IMAGENET_VAL_EXAMPLES = 50000
-IMAGENET_INPUT_PATH_BASE = 'imagenet-2012-tfrecord'
-
-
-@exp_factory.register_config_factory('resnet_imagenet')
-def image_classification_imagenet() -> cfg.ExperimentConfig:
-  """Image classification on imagenet with resnet."""
-  train_batch_size = 4096
-  eval_batch_size = 4096
-  steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
-  config = cfg.ExperimentConfig(
-      runtime=cfg.RuntimeConfig(enable_xla=True),
-      task=ImageClassificationTask(
-          model=ImageClassificationModel(
-              num_classes=1001,
-              input_size=[224, 224, 3],
-              backbone=backbones.Backbone(
-                  type='resnet', resnet=backbones.ResNet(model_id=50)),
-              norm_activation=common.NormActivation(
-                  norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False)),
-          losses=Losses(l2_weight_decay=1e-4),
-          train_data=DataConfig(
-              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
-              is_training=True,
-              global_batch_size=train_batch_size),
-          validation_data=DataConfig(
-              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
-              is_training=False,
-              global_batch_size=eval_batch_size)),
-      trainer=cfg.TrainerConfig(
-          steps_per_loop=steps_per_epoch,
-          summary_interval=steps_per_epoch,
-          checkpoint_interval=steps_per_epoch,
-          train_steps=90 * steps_per_epoch,
-          validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
-          validation_interval=steps_per_epoch,
-          optimizer_config=optimization.OptimizationConfig({
-              'optimizer': {
-                  'type': 'sgd',
-                  'sgd': {
-                      'momentum': 0.9
-                  }
-              },
-              'learning_rate': {
-                  'type': 'stepwise',
-                  'stepwise': {
-                      'boundaries': [
-                          30 * steps_per_epoch, 60 * steps_per_epoch,
-                          80 * steps_per_epoch
-                      ],
-                      'values': [
-                          0.1 * train_batch_size / 256,
-                          0.01 * train_batch_size / 256,
-                          0.001 * train_batch_size / 256,
-                          0.0001 * train_batch_size / 256,
-                      ]
-                  }
-              },
-              'warmup': {
-                  'type': 'linear',
-                  'linear': {
-                      'warmup_steps': 5 * steps_per_epoch,
-                      'warmup_learning_rate': 0
-                  }
-              }
-          })),
-      restrictions=[
-          'task.train_data.is_training != None',
-          'task.validation_data.is_training != None'
-      ])
-
-  return config
-
-
-@exp_factory.register_config_factory('resnet_rs_imagenet')
-def image_classification_imagenet_resnetrs() -> cfg.ExperimentConfig:
-  """Image classification on imagenet with resnet-rs."""
-  train_batch_size = 4096
-  eval_batch_size = 4096
-  steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
-  config = cfg.ExperimentConfig(
-      task=ImageClassificationTask(
-          model=ImageClassificationModel(
-              num_classes=1001,
-              input_size=[160, 160, 3],
-              backbone=backbones.Backbone(
-                  type='resnet',
-                  resnet=backbones.ResNet(
-                      model_id=50,
-                      stem_type='v1',
-                      resnetd_shortcut=True,
-                      replace_stem_max_pool=True,
-                      se_ratio=0.25,
-                      stochastic_depth_drop_rate=0.0)),
-              dropout_rate=0.25,
-              norm_activation=common.NormActivation(
-                  norm_momentum=0.0,
-                  norm_epsilon=1e-5,
-                  use_sync_bn=False,
-                  activation='swish')),
-          losses=Losses(l2_weight_decay=4e-5, label_smoothing=0.1),
-          train_data=DataConfig(
-              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
-              is_training=True,
-              global_batch_size=train_batch_size,
-              aug_type=common.Augmentation(
-                  type='randaug', randaug=common.RandAugment(magnitude=10))),
-          validation_data=DataConfig(
-              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
-              is_training=False,
-              global_batch_size=eval_batch_size)),
-      trainer=cfg.TrainerConfig(
-          steps_per_loop=steps_per_epoch,
-          summary_interval=steps_per_epoch,
-          checkpoint_interval=steps_per_epoch,
-          train_steps=350 * steps_per_epoch,
-          validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
-          validation_interval=steps_per_epoch,
-          optimizer_config=optimization.OptimizationConfig({
-              'optimizer': {
-                  'type': 'sgd',
-                  'sgd': {
-                      'momentum': 0.9
-                  }
-              },
-              'ema': {
-                  'average_decay': 0.9999,
-                  'trainable_weights_only': False,
-              },
-              'learning_rate': {
-                  'type': 'cosine',
-                  'cosine': {
-                      'initial_learning_rate': 1.6,
-                      'decay_steps': 350 * steps_per_epoch
-                  }
-              },
-              'warmup': {
-                  'type': 'linear',
-                  'linear': {
-                      'warmup_steps': 5 * steps_per_epoch,
-                      'warmup_learning_rate': 0
-                  }
-              }
-          })),
-      restrictions=[
-          'task.train_data.is_training != None',
-          'task.validation_data.is_training != None'
-      ])
-  return config
-
-
-@exp_factory.register_config_factory('revnet_imagenet')
-def image_classification_imagenet_revnet() -> cfg.ExperimentConfig:
-  """Returns a revnet config for image classification on imagenet."""
-  train_batch_size = 4096
-  eval_batch_size = 4096
-  steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
-
-  config = cfg.ExperimentConfig(
-      task=ImageClassificationTask(
-          model=ImageClassificationModel(
-              num_classes=1001,
-              input_size=[224, 224, 3],
-              backbone=backbones.Backbone(
-                  type='revnet', revnet=backbones.RevNet(model_id=56)),
-              norm_activation=common.NormActivation(
-                  norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False),
-              add_head_batch_norm=True),
-          losses=Losses(l2_weight_decay=1e-4),
-          train_data=DataConfig(
-              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
-              is_training=True,
-              global_batch_size=train_batch_size),
-          validation_data=DataConfig(
-              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
-              is_training=False,
-              global_batch_size=eval_batch_size)),
-      trainer=cfg.TrainerConfig(
-          steps_per_loop=steps_per_epoch,
-          summary_interval=steps_per_epoch,
-          checkpoint_interval=steps_per_epoch,
-          train_steps=90 * steps_per_epoch,
-          validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
-          validation_interval=steps_per_epoch,
-          optimizer_config=optimization.OptimizationConfig({
-              'optimizer': {
-                  'type': 'sgd',
-                  'sgd': {
-                      'momentum': 0.9
-                  }
-              },
-              'learning_rate': {
-                  'type': 'stepwise',
-                  'stepwise': {
-                      'boundaries': [
-                          30 * steps_per_epoch, 60 * steps_per_epoch,
-                          80 * steps_per_epoch
-                      ],
-                      'values': [0.8, 0.08, 0.008, 0.0008]
-                  }
-              },
-              'warmup': {
-                  'type': 'linear',
-                  'linear': {
-                      'warmup_steps': 5 * steps_per_epoch,
-                      'warmup_learning_rate': 0
-                  }
-              }
-          })),
-      restrictions=[
-          'task.train_data.is_training != None',
-          'task.validation_data.is_training != None'
-      ])
-
-  return config
-
-
-@exp_factory.register_config_factory('mobilenet_imagenet')
-def image_classification_imagenet_mobilenet() -> cfg.ExperimentConfig:
-  """Image classification on imagenet with mobilenet."""
-  train_batch_size = 4096
-  eval_batch_size = 4096
-  steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
-  config = cfg.ExperimentConfig(
-      task=ImageClassificationTask(
-          model=ImageClassificationModel(
-              num_classes=1001,
-              dropout_rate=0.2,
-              input_size=[224, 224, 3],
-              backbone=backbones.Backbone(
-                  type='mobilenet',
-                  mobilenet=backbones.MobileNet(
-                      model_id='MobileNetV2', filter_size_scale=1.0)),
-              norm_activation=common.NormActivation(
-                  norm_momentum=0.997, norm_epsilon=1e-3, use_sync_bn=False)),
-          losses=Losses(l2_weight_decay=1e-5, label_smoothing=0.1),
-          train_data=DataConfig(
-              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
-              is_training=True,
-              global_batch_size=train_batch_size),
-          validation_data=DataConfig(
-              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
-              is_training=False,
-              global_batch_size=eval_batch_size)),
-      trainer=cfg.TrainerConfig(
-          steps_per_loop=steps_per_epoch,
-          summary_interval=steps_per_epoch,
-          checkpoint_interval=steps_per_epoch,
-          train_steps=500 * steps_per_epoch,
-          validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
-          validation_interval=steps_per_epoch,
-          optimizer_config=optimization.OptimizationConfig({
-              'optimizer': {
-                  'type': 'rmsprop',
-                  'rmsprop': {
-                      'rho': 0.9,
-                      'momentum': 0.9,
-                      'epsilon': 0.002,
-                  }
-              },
-              'learning_rate': {
-                  'type': 'exponential',
-                  'exponential': {
-                      'initial_learning_rate':
-                          0.008 * (train_batch_size // 128),
-                      'decay_steps':
-                          int(2.5 * steps_per_epoch),
-                      'decay_rate':
-                          0.98,
-                      'staircase':
-                          True
-                  }
-              },
-              'warmup': {
-                  'type': 'linear',
-                  'linear': {
-                      'warmup_steps': 5 * steps_per_epoch,
-                      'warmup_learning_rate': 0
-                  }
-              },
-          })),
-      restrictions=[
-          'task.train_data.is_training != None',
-          'task.validation_data.is_training != None'
-      ])
-
-  return config
diff --git a/official/vision/beta/configs/image_classification_test.py b/official/vision/beta/configs/image_classification_test.py
deleted file mode 100644
index 81109dc4ee0bbb895afc811bc5d0bea431ab5535..0000000000000000000000000000000000000000
--- a/official/vision/beta/configs/image_classification_test.py
+++ /dev/null
@@ -1,49 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Tests for image_classification."""
-# pylint: disable=unused-import
-from absl.testing import parameterized
-import tensorflow as tf
-
-from official.core import config_definitions as cfg
-from official.core import exp_factory
-from official.vision import beta
-from official.vision.beta.configs import image_classification as exp_cfg
-
-
-class ImageClassificationConfigTest(tf.test.TestCase, parameterized.TestCase):
-
-  @parameterized.parameters(
-      ('resnet_imagenet',),
-      ('resnet_rs_imagenet',),
-      ('revnet_imagenet',),
-      ('mobilenet_imagenet'),
-  )
-  def test_image_classification_configs(self, config_name):
-    config = exp_factory.get_exp_config(config_name)
-    self.assertIsInstance(config, cfg.ExperimentConfig)
-    self.assertIsInstance(config.task, exp_cfg.ImageClassificationTask)
-    self.assertIsInstance(config.task.model,
-                          exp_cfg.ImageClassificationModel)
-    self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig)
-    config.validate()
-    config.task.train_data.is_training = None
-    with self.assertRaises(KeyError):
-      config.validate()
-
-
-if __name__ == '__main__':
-  tf.test.main()
diff --git a/official/vision/beta/configs/maskrcnn.py b/official/vision/beta/configs/maskrcnn.py
deleted file mode 100644
index 3cb491d01dc4c5268d3dcd83f8a894916f6c96f9..0000000000000000000000000000000000000000
--- a/official/vision/beta/configs/maskrcnn.py
+++ /dev/null
@@ -1,523 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""R-CNN(-RS) configuration definition."""
-
-import dataclasses
-import os
-from typing import List, Optional, Union
-
-from official.core import config_definitions as cfg
-from official.core import exp_factory
-from official.modeling import hyperparams
-from official.modeling import optimization
-from official.vision.beta.configs import common
-from official.vision.beta.configs import decoders
-from official.vision.beta.configs import backbones
-
-
-# pylint: disable=missing-class-docstring
-@dataclasses.dataclass
-class Parser(hyperparams.Config):
-  num_channels: int = 3
-  match_threshold: float = 0.5
-  unmatched_threshold: float = 0.5
-  aug_rand_hflip: bool = False
-  aug_scale_min: float = 1.0
-  aug_scale_max: float = 1.0
-  skip_crowd_during_training: bool = True
-  max_num_instances: int = 100
-  rpn_match_threshold: float = 0.7
-  rpn_unmatched_threshold: float = 0.3
-  rpn_batch_size_per_im: int = 256
-  rpn_fg_fraction: float = 0.5
-  mask_crop_size: int = 112
-
-
-@dataclasses.dataclass
-class DataConfig(cfg.DataConfig):
-  """Input config for training."""
-  input_path: str = ''
-  global_batch_size: int = 0
-  is_training: bool = False
-  dtype: str = 'bfloat16'
-  decoder: common.DataDecoder = common.DataDecoder()
-  parser: Parser = Parser()
-  shuffle_buffer_size: int = 10000
-  file_type: str = 'tfrecord'
-  drop_remainder: bool = True
-  # Number of examples in the data set, it's used to create the annotation file.
-  num_examples: int = -1
-
-
-@dataclasses.dataclass
-class Anchor(hyperparams.Config):
-  num_scales: int = 1
-  aspect_ratios: List[float] = dataclasses.field(
-      default_factory=lambda: [0.5, 1.0, 2.0])
-  anchor_size: float = 8.0
-
-
-@dataclasses.dataclass
-class RPNHead(hyperparams.Config):
-  num_convs: int = 1
-  num_filters: int = 256
-  use_separable_conv: bool = False
-
-
-@dataclasses.dataclass
-class DetectionHead(hyperparams.Config):
-  num_convs: int = 4
-  num_filters: int = 256
-  use_separable_conv: bool = False
-  num_fcs: int = 1
-  fc_dims: int = 1024
-  class_agnostic_bbox_pred: bool = False  # Has to be True for Cascade RCNN.
-  # If additional IoUs are passed in 'cascade_iou_thresholds'
-  # then ensemble the class probabilities from all heads.
-  cascade_class_ensemble: bool = False
-
-
-@dataclasses.dataclass
-class ROIGenerator(hyperparams.Config):
-  pre_nms_top_k: int = 2000
-  pre_nms_score_threshold: float = 0.0
-  pre_nms_min_size_threshold: float = 0.0
-  nms_iou_threshold: float = 0.7
-  num_proposals: int = 1000
-  test_pre_nms_top_k: int = 1000
-  test_pre_nms_score_threshold: float = 0.0
-  test_pre_nms_min_size_threshold: float = 0.0
-  test_nms_iou_threshold: float = 0.7
-  test_num_proposals: int = 1000
-  use_batched_nms: bool = False
-
-
-@dataclasses.dataclass
-class ROISampler(hyperparams.Config):
-  mix_gt_boxes: bool = True
-  num_sampled_rois: int = 512
-  foreground_fraction: float = 0.25
-  foreground_iou_threshold: float = 0.5
-  background_iou_high_threshold: float = 0.5
-  background_iou_low_threshold: float = 0.0
-  # IoU thresholds for additional FRCNN heads in Cascade mode.
-  # `foreground_iou_threshold` is the first threshold.
-  cascade_iou_thresholds: Optional[List[float]] = None
-
-
-@dataclasses.dataclass
-class ROIAligner(hyperparams.Config):
-  crop_size: int = 7
-  sample_offset: float = 0.5
-
-
-@dataclasses.dataclass
-class DetectionGenerator(hyperparams.Config):
-  apply_nms: bool = True
-  pre_nms_top_k: int = 5000
-  pre_nms_score_threshold: float = 0.05
-  nms_iou_threshold: float = 0.5
-  max_num_detections: int = 100
-  nms_version: str = 'v2'  # `v2`, `v1`, `batched`
-  use_cpu_nms: bool = False
-  soft_nms_sigma: Optional[float] = None  # Only works when nms_version='v1'.
-
-
-@dataclasses.dataclass
-class MaskHead(hyperparams.Config):
-  upsample_factor: int = 2
-  num_convs: int = 4
-  num_filters: int = 256
-  use_separable_conv: bool = False
-  class_agnostic: bool = False
-
-
-@dataclasses.dataclass
-class MaskSampler(hyperparams.Config):
-  num_sampled_masks: int = 128
-
-
-@dataclasses.dataclass
-class MaskROIAligner(hyperparams.Config):
-  crop_size: int = 14
-  sample_offset: float = 0.5
-
-
-@dataclasses.dataclass
-class MaskRCNN(hyperparams.Config):
-  num_classes: int = 0
-  input_size: List[int] = dataclasses.field(default_factory=list)
-  min_level: int = 2
-  max_level: int = 6
-  anchor: Anchor = Anchor()
-  include_mask: bool = True
-  backbone: backbones.Backbone = backbones.Backbone(
-      type='resnet', resnet=backbones.ResNet())
-  decoder: decoders.Decoder = decoders.Decoder(
-      type='fpn', fpn=decoders.FPN())
-  rpn_head: RPNHead = RPNHead()
-  detection_head: DetectionHead = DetectionHead()
-  roi_generator: ROIGenerator = ROIGenerator()
-  roi_sampler: ROISampler = ROISampler()
-  roi_aligner: ROIAligner = ROIAligner()
-  detection_generator: DetectionGenerator = DetectionGenerator()
-  mask_head: Optional[MaskHead] = MaskHead()
-  mask_sampler: Optional[MaskSampler] = MaskSampler()
-  mask_roi_aligner: Optional[MaskROIAligner] = MaskROIAligner()
-  norm_activation: common.NormActivation = common.NormActivation(
-      norm_momentum=0.997,
-      norm_epsilon=0.0001,
-      use_sync_bn=True)
-
-
-@dataclasses.dataclass
-class Losses(hyperparams.Config):
-  loss_weight: float = 1.0
-  rpn_huber_loss_delta: float = 1. / 9.
-  frcnn_huber_loss_delta: float = 1.
-  l2_weight_decay: float = 0.0
-  rpn_score_weight: float = 1.0
-  rpn_box_weight: float = 1.0
-  frcnn_class_weight: float = 1.0
-  frcnn_box_weight: float = 1.0
-  mask_weight: float = 1.0
-
-
-@dataclasses.dataclass
-class MaskRCNNTask(cfg.TaskConfig):
-  model: MaskRCNN = MaskRCNN()
-  train_data: DataConfig = DataConfig(is_training=True)
-  validation_data: DataConfig = DataConfig(is_training=False,
-                                           drop_remainder=False)
-  losses: Losses = Losses()
-  init_checkpoint: Optional[str] = None
-  init_checkpoint_modules: Union[
-      str, List[str]] = 'all'  # all, backbone, and/or decoder
-  annotation_file: Optional[str] = None
-  per_category_metrics: bool = False
-  # If set, we only use masks for the specified class IDs.
-  allowed_mask_class_ids: Optional[List[int]] = None
-  # If set, the COCO metrics will be computed.
-  use_coco_metrics: bool = True
-  # If set, the Waymo Open Dataset evaluator would be used.
-  use_wod_metrics: bool = False
-
-
-COCO_INPUT_PATH_BASE = 'coco'
-
-
-@exp_factory.register_config_factory('fasterrcnn_resnetfpn_coco')
-def fasterrcnn_resnetfpn_coco() -> cfg.ExperimentConfig:
-  """COCO object detection with Faster R-CNN."""
-  steps_per_epoch = 500
-  coco_val_samples = 5000
-  train_batch_size = 64
-  eval_batch_size = 8
-
-  config = cfg.ExperimentConfig(
-      runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'),
-      task=MaskRCNNTask(
-          init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080',
-          init_checkpoint_modules='backbone',
-          annotation_file=os.path.join(COCO_INPUT_PATH_BASE,
-                                       'instances_val2017.json'),
-          model=MaskRCNN(
-              num_classes=91,
-              input_size=[1024, 1024, 3],
-              include_mask=False,
-              mask_head=None,
-              mask_sampler=None,
-              mask_roi_aligner=None),
-          losses=Losses(l2_weight_decay=0.00004),
-          train_data=DataConfig(
-              input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'),
-              is_training=True,
-              global_batch_size=train_batch_size,
-              parser=Parser(
-                  aug_rand_hflip=True, aug_scale_min=0.8, aug_scale_max=1.25)),
-          validation_data=DataConfig(
-              input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'),
-              is_training=False,
-              global_batch_size=eval_batch_size,
-              drop_remainder=False)),
-      trainer=cfg.TrainerConfig(
-          train_steps=22500,
-          validation_steps=coco_val_samples // eval_batch_size,
-          validation_interval=steps_per_epoch,
-          steps_per_loop=steps_per_epoch,
-          summary_interval=steps_per_epoch,
-          checkpoint_interval=steps_per_epoch,
-          optimizer_config=optimization.OptimizationConfig({
-              'optimizer': {
-                  'type': 'sgd',
-                  'sgd': {
-                      'momentum': 0.9
-                  }
-              },
-              'learning_rate': {
-                  'type': 'stepwise',
-                  'stepwise': {
-                      'boundaries': [15000, 20000],
-                      'values': [0.12, 0.012, 0.0012],
-                  }
-              },
-              'warmup': {
-                  'type': 'linear',
-                  'linear': {
-                      'warmup_steps': 500,
-                      'warmup_learning_rate': 0.0067
-                  }
-              }
-          })),
-      restrictions=[
-          'task.train_data.is_training != None',
-          'task.validation_data.is_training != None'
-      ])
-
return config - - -@exp_factory.register_config_factory('maskrcnn_resnetfpn_coco') -def maskrcnn_resnetfpn_coco() -> cfg.ExperimentConfig: - """COCO object detection with Mask R-CNN.""" - steps_per_epoch = 500 - coco_val_samples = 5000 - train_batch_size = 64 - eval_batch_size = 8 - - config = cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig( - mixed_precision_dtype='bfloat16', enable_xla=True), - task=MaskRCNNTask( - init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080', - init_checkpoint_modules='backbone', - annotation_file=os.path.join(COCO_INPUT_PATH_BASE, - 'instances_val2017.json'), - model=MaskRCNN( - num_classes=91, input_size=[1024, 1024, 3], include_mask=True), - losses=Losses(l2_weight_decay=0.00004), - train_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size, - parser=Parser( - aug_rand_hflip=True, aug_scale_min=0.8, aug_scale_max=1.25)), - validation_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), - is_training=False, - global_batch_size=eval_batch_size, - drop_remainder=False)), - trainer=cfg.TrainerConfig( - train_steps=22500, - validation_steps=coco_val_samples // eval_batch_size, - validation_interval=steps_per_epoch, - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'stepwise', - 'stepwise': { - 'boundaries': [15000, 20000], - 'values': [0.12, 0.012, 0.0012], - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 500, - 'warmup_learning_rate': 0.0067 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - return config - - -@exp_factory.register_config_factory('maskrcnn_spinenet_coco') -def 
maskrcnn_spinenet_coco() -> cfg.ExperimentConfig: - """COCO object detection with Mask R-CNN with SpineNet backbone.""" - steps_per_epoch = 463 - coco_val_samples = 5000 - train_batch_size = 256 - eval_batch_size = 8 - - config = cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), - task=MaskRCNNTask( - annotation_file=os.path.join(COCO_INPUT_PATH_BASE, - 'instances_val2017.json'), - model=MaskRCNN( - backbone=backbones.Backbone( - type='spinenet', - spinenet=backbones.SpineNet( - model_id='49', - min_level=3, - max_level=7, - )), - decoder=decoders.Decoder( - type='identity', identity=decoders.Identity()), - anchor=Anchor(anchor_size=3), - norm_activation=common.NormActivation(use_sync_bn=True), - num_classes=91, - input_size=[640, 640, 3], - min_level=3, - max_level=7, - include_mask=True), - losses=Losses(l2_weight_decay=0.00004), - train_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size, - parser=Parser( - aug_rand_hflip=True, aug_scale_min=0.5, aug_scale_max=2.0)), - validation_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), - is_training=False, - global_batch_size=eval_batch_size, - drop_remainder=False)), - trainer=cfg.TrainerConfig( - train_steps=steps_per_epoch * 350, - validation_steps=coco_val_samples // eval_batch_size, - validation_interval=steps_per_epoch, - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'stepwise', - 'stepwise': { - 'boundaries': [ - steps_per_epoch * 320, steps_per_epoch * 340 - ], - 'values': [0.32, 0.032, 0.0032], - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 2000, - 'warmup_learning_rate': 0.0067 - } - } - })), - restrictions=[ - 
'task.train_data.is_training != None', - 'task.validation_data.is_training != None', - 'task.model.min_level == task.model.backbone.spinenet.min_level', - 'task.model.max_level == task.model.backbone.spinenet.max_level', - ]) - return config - - -@exp_factory.register_config_factory('cascadercnn_spinenet_coco') -def cascadercnn_spinenet_coco() -> cfg.ExperimentConfig: - """COCO object detection with Cascade RCNN-RS with SpineNet backbone.""" - steps_per_epoch = 463 - coco_val_samples = 5000 - train_batch_size = 256 - eval_batch_size = 8 - - config = cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), - task=MaskRCNNTask( - annotation_file=os.path.join(COCO_INPUT_PATH_BASE, - 'instances_val2017.json'), - model=MaskRCNN( - backbone=backbones.Backbone( - type='spinenet', - spinenet=backbones.SpineNet( - model_id='49', - min_level=3, - max_level=7, - )), - decoder=decoders.Decoder( - type='identity', identity=decoders.Identity()), - roi_sampler=ROISampler(cascade_iou_thresholds=[0.6, 0.7]), - detection_head=DetectionHead( - class_agnostic_bbox_pred=True, cascade_class_ensemble=True), - anchor=Anchor(anchor_size=3), - norm_activation=common.NormActivation( - use_sync_bn=True, activation='swish'), - num_classes=91, - input_size=[640, 640, 3], - min_level=3, - max_level=7, - include_mask=True), - losses=Losses(l2_weight_decay=0.00004), - train_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size, - parser=Parser( - aug_rand_hflip=True, aug_scale_min=0.1, aug_scale_max=2.5)), - validation_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), - is_training=False, - global_batch_size=eval_batch_size, - drop_remainder=False)), - trainer=cfg.TrainerConfig( - train_steps=steps_per_epoch * 500, - validation_steps=coco_val_samples // eval_batch_size, - validation_interval=steps_per_epoch, - steps_per_loop=steps_per_epoch, - 
summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'stepwise', - 'stepwise': { - 'boundaries': [ - steps_per_epoch * 475, steps_per_epoch * 490 - ], - 'values': [0.32, 0.032, 0.0032], - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 2000, - 'warmup_learning_rate': 0.0067 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None', - 'task.model.min_level == task.model.backbone.spinenet.min_level', - 'task.model.max_level == task.model.backbone.spinenet.max_level', - ]) - return config diff --git a/official/vision/beta/configs/retinanet.py b/official/vision/beta/configs/retinanet.py deleted file mode 100644 index e0c793496278c0a6f44db207becb71ee4d76d8e0..0000000000000000000000000000000000000000 --- a/official/vision/beta/configs/retinanet.py +++ /dev/null @@ -1,422 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""RetinaNet configuration definition.""" - -import dataclasses -import os -from typing import List, Optional, Union - -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.modeling import hyperparams -from official.modeling import optimization -from official.vision.beta.configs import common -from official.vision.beta.configs import decoders -from official.vision.beta.configs import backbones - - -# pylint: disable=missing-class-docstring -# Keep for backward compatibility. -@dataclasses.dataclass -class TfExampleDecoder(common.TfExampleDecoder): - """A simple TF Example decoder config.""" - - -# Keep for backward compatibility. -@dataclasses.dataclass -class TfExampleDecoderLabelMap(common.TfExampleDecoderLabelMap): - """TF Example decoder with label map config.""" - - -# Keep for backward compatibility. -@dataclasses.dataclass -class DataDecoder(common.DataDecoder): - """Data decoder config.""" - - -@dataclasses.dataclass -class Parser(hyperparams.Config): - num_channels: int = 3 - match_threshold: float = 0.5 - unmatched_threshold: float = 0.5 - aug_rand_hflip: bool = False - aug_scale_min: float = 1.0 - aug_scale_max: float = 1.0 - skip_crowd_during_training: bool = True - max_num_instances: int = 100 - # Can choose AutoAugment and RandAugment. - # TODO(b/205346436) Support RandAugment. - aug_type: Optional[common.Augmentation] = None - - # Keep for backward compatibility. Not used. 
- aug_policy: Optional[str] = None - - -@dataclasses.dataclass -class DataConfig(cfg.DataConfig): - """Input config for training.""" - input_path: str = '' - global_batch_size: int = 0 - is_training: bool = False - dtype: str = 'bfloat16' - decoder: common.DataDecoder = common.DataDecoder() - parser: Parser = Parser() - shuffle_buffer_size: int = 10000 - file_type: str = 'tfrecord' - - -@dataclasses.dataclass -class Anchor(hyperparams.Config): - num_scales: int = 3 - aspect_ratios: List[float] = dataclasses.field( - default_factory=lambda: [0.5, 1.0, 2.0]) - anchor_size: float = 4.0 - - -@dataclasses.dataclass -class Losses(hyperparams.Config): - loss_weight: float = 1.0 - focal_loss_alpha: float = 0.25 - focal_loss_gamma: float = 1.5 - huber_loss_delta: float = 0.1 - box_loss_weight: int = 50 - l2_weight_decay: float = 0.0 - - -@dataclasses.dataclass -class AttributeHead(hyperparams.Config): - name: str = '' - type: str = 'regression' - size: int = 1 - - -@dataclasses.dataclass -class RetinaNetHead(hyperparams.Config): - num_convs: int = 4 - num_filters: int = 256 - use_separable_conv: bool = False - attribute_heads: List[AttributeHead] = dataclasses.field(default_factory=list) - - -@dataclasses.dataclass -class DetectionGenerator(hyperparams.Config): - apply_nms: bool = True - pre_nms_top_k: int = 5000 - pre_nms_score_threshold: float = 0.05 - nms_iou_threshold: float = 0.5 - max_num_detections: int = 100 - nms_version: str = 'v2' # `v2`, `v1`, `batched`. - use_cpu_nms: bool = False - soft_nms_sigma: Optional[float] = None # Only works when nms_version='v1'. 
- - -@dataclasses.dataclass -class RetinaNet(hyperparams.Config): - num_classes: int = 0 - input_size: List[int] = dataclasses.field(default_factory=list) - min_level: int = 3 - max_level: int = 7 - anchor: Anchor = Anchor() - backbone: backbones.Backbone = backbones.Backbone( - type='resnet', resnet=backbones.ResNet()) - decoder: decoders.Decoder = decoders.Decoder( - type='fpn', fpn=decoders.FPN()) - head: RetinaNetHead = RetinaNetHead() - detection_generator: DetectionGenerator = DetectionGenerator() - norm_activation: common.NormActivation = common.NormActivation() - - -@dataclasses.dataclass -class ExportConfig(hyperparams.Config): - output_normalized_coordinates: bool = False - cast_num_detections_to_float: bool = False - cast_detection_classes_to_float: bool = False - - -@dataclasses.dataclass -class RetinaNetTask(cfg.TaskConfig): - model: RetinaNet = RetinaNet() - train_data: DataConfig = DataConfig(is_training=True) - validation_data: DataConfig = DataConfig(is_training=False) - losses: Losses = Losses() - init_checkpoint: Optional[str] = None - init_checkpoint_modules: Union[ - str, List[str]] = 'all' # all, backbone, and/or decoder - annotation_file: Optional[str] = None - per_category_metrics: bool = False - export_config: ExportConfig = ExportConfig() - - -@exp_factory.register_config_factory('retinanet') -def retinanet() -> cfg.ExperimentConfig: - """RetinaNet general config.""" - return cfg.ExperimentConfig( - task=RetinaNetTask(), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - -COCO_INPUT_PATH_BASE = 'coco' -COCO_TRAIN_EXAMPLES = 118287 -COCO_VAL_EXAMPLES = 5000 - - -@exp_factory.register_config_factory('retinanet_resnetfpn_coco') -def retinanet_resnetfpn_coco() -> cfg.ExperimentConfig: - """COCO object detection with RetinaNet.""" - train_batch_size = 256 - eval_batch_size = 8 - steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size - - config = cfg.ExperimentConfig( - 
runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), - task=RetinaNetTask( - init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080', - init_checkpoint_modules='backbone', - annotation_file=os.path.join(COCO_INPUT_PATH_BASE, - 'instances_val2017.json'), - model=RetinaNet( - num_classes=91, - input_size=[640, 640, 3], - norm_activation=common.NormActivation(use_sync_bn=False), - min_level=3, - max_level=7), - losses=Losses(l2_weight_decay=1e-4), - train_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size, - parser=Parser( - aug_rand_hflip=True, aug_scale_min=0.8, aug_scale_max=1.2)), - validation_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), - is_training=False, - global_batch_size=eval_batch_size)), - trainer=cfg.TrainerConfig( - train_steps=72 * steps_per_epoch, - validation_steps=COCO_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'stepwise', - 'stepwise': { - 'boundaries': [ - 57 * steps_per_epoch, 67 * steps_per_epoch - ], - 'values': [ - 0.32 * train_batch_size / 256.0, - 0.032 * train_batch_size / 256.0, - 0.0032 * train_batch_size / 256.0 - ], - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 500, - 'warmup_learning_rate': 0.0067 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - return config - - -@exp_factory.register_config_factory('retinanet_spinenet_coco') -def retinanet_spinenet_coco() -> cfg.ExperimentConfig: - """COCO object detection with RetinaNet using SpineNet backbone.""" - train_batch_size = 256 - eval_batch_size = 8 
- steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size - input_size = 640 - - config = cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig(mixed_precision_dtype='float32'), - task=RetinaNetTask( - annotation_file=os.path.join(COCO_INPUT_PATH_BASE, - 'instances_val2017.json'), - model=RetinaNet( - backbone=backbones.Backbone( - type='spinenet', - spinenet=backbones.SpineNet( - model_id='49', - stochastic_depth_drop_rate=0.2, - min_level=3, - max_level=7)), - decoder=decoders.Decoder( - type='identity', identity=decoders.Identity()), - anchor=Anchor(anchor_size=3), - norm_activation=common.NormActivation( - use_sync_bn=True, activation='swish'), - num_classes=91, - input_size=[input_size, input_size, 3], - min_level=3, - max_level=7), - losses=Losses(l2_weight_decay=4e-5), - train_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size, - parser=Parser( - aug_rand_hflip=True, aug_scale_min=0.1, aug_scale_max=2.0)), - validation_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), - is_training=False, - global_batch_size=eval_batch_size)), - trainer=cfg.TrainerConfig( - train_steps=500 * steps_per_epoch, - validation_steps=COCO_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'stepwise', - 'stepwise': { - 'boundaries': [ - 475 * steps_per_epoch, 490 * steps_per_epoch - ], - 'values': [ - 0.32 * train_batch_size / 256.0, - 0.032 * train_batch_size / 256.0, - 0.0032 * train_batch_size / 256.0 - ], - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 2000, - 'warmup_learning_rate': 0.0067 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 
'task.validation_data.is_training != None', - 'task.model.min_level == task.model.backbone.spinenet.min_level', - 'task.model.max_level == task.model.backbone.spinenet.max_level', - ]) - - return config - - -@exp_factory.register_config_factory('retinanet_mobile_coco') -def retinanet_spinenet_mobile_coco() -> cfg.ExperimentConfig: - """COCO object detection with mobile RetinaNet.""" - train_batch_size = 256 - eval_batch_size = 8 - steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size - input_size = 384 - - config = cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig(mixed_precision_dtype='float32'), - task=RetinaNetTask( - annotation_file=os.path.join(COCO_INPUT_PATH_BASE, - 'instances_val2017.json'), - model=RetinaNet( - backbone=backbones.Backbone( - type='spinenet_mobile', - spinenet_mobile=backbones.SpineNetMobile( - model_id='49', - stochastic_depth_drop_rate=0.2, - min_level=3, - max_level=7, - use_keras_upsampling_2d=False)), - decoder=decoders.Decoder( - type='identity', identity=decoders.Identity()), - head=RetinaNetHead(num_filters=48, use_separable_conv=True), - anchor=Anchor(anchor_size=3), - norm_activation=common.NormActivation( - use_sync_bn=True, activation='swish'), - num_classes=91, - input_size=[input_size, input_size, 3], - min_level=3, - max_level=7), - losses=Losses(l2_weight_decay=3e-5), - train_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size, - parser=Parser( - aug_rand_hflip=True, aug_scale_min=0.1, aug_scale_max=2.0)), - validation_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), - is_training=False, - global_batch_size=eval_batch_size)), - trainer=cfg.TrainerConfig( - train_steps=600 * steps_per_epoch, - validation_steps=COCO_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - 
optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'stepwise', - 'stepwise': { - 'boundaries': [ - 575 * steps_per_epoch, 590 * steps_per_epoch - ], - 'values': [ - 0.32 * train_batch_size / 256.0, - 0.032 * train_batch_size / 256.0, - 0.0032 * train_batch_size / 256.0 - ], - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 2000, - 'warmup_learning_rate': 0.0067 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None', - ]) - - return config diff --git a/official/vision/beta/configs/retinanet_test.py b/official/vision/beta/configs/retinanet_test.py deleted file mode 100644 index bc860088a5d7a9a64115cebb6cf37bb238737534..0000000000000000000000000000000000000000 --- a/official/vision/beta/configs/retinanet_test.py +++ /dev/null @@ -1,46 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Tests for retinanet.""" -# pylint: disable=unused-import -from absl.testing import parameterized -import tensorflow as tf - -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.vision import beta -from official.vision.beta.configs import retinanet as exp_cfg - - -class RetinaNetConfigTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.parameters( - ('retinanet_resnetfpn_coco',), - ('retinanet_spinenet_coco',), - ('retinanet_mobile_coco',), - ) - def test_retinanet_configs(self, config_name): - config = exp_factory.get_exp_config(config_name) - self.assertIsInstance(config, cfg.ExperimentConfig) - self.assertIsInstance(config.task, exp_cfg.RetinaNetTask) - self.assertIsInstance(config.task.model, exp_cfg.RetinaNet) - self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig) - config.validate() - config.task.train_data.is_training = None - with self.assertRaisesRegex(KeyError, 'Found inconsistncy between key'): - config.validate() - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/configs/semantic_segmentation.py b/official/vision/beta/configs/semantic_segmentation.py deleted file mode 100644 index 0543fcc13d2d2e891561ad46ddbefb10e7e60e39..0000000000000000000000000000000000000000 --- a/official/vision/beta/configs/semantic_segmentation.py +++ /dev/null @@ -1,713 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Semantic segmentation configuration definition.""" -import dataclasses -import os -from typing import List, Optional, Union - -import numpy as np -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.modeling import hyperparams -from official.modeling import optimization -from official.vision.beta.configs import common -from official.vision.beta.configs import decoders -from official.vision.beta.configs import backbones - - -@dataclasses.dataclass -class DataConfig(cfg.DataConfig): - """Input config for training.""" - output_size: List[int] = dataclasses.field(default_factory=list) - # If crop_size is specified, image will be resized first to - # output_size, then crop of size crop_size will be cropped. - crop_size: List[int] = dataclasses.field(default_factory=list) - input_path: str = '' - global_batch_size: int = 0 - is_training: bool = True - dtype: str = 'float32' - shuffle_buffer_size: int = 1000 - cycle_length: int = 10 - # If resize_eval_groundtruth is set to False, original image sizes are used - # for eval. In that case, groundtruth_padded_size has to be specified too to - # allow for batching the variable input sizes of images. 
- resize_eval_groundtruth: bool = True - groundtruth_padded_size: List[int] = dataclasses.field(default_factory=list) - aug_scale_min: float = 1.0 - aug_scale_max: float = 1.0 - aug_rand_hflip: bool = True - preserve_aspect_ratio: bool = True - aug_policy: Optional[str] = None - drop_remainder: bool = True - file_type: str = 'tfrecord' - decoder: Optional[common.DataDecoder] = common.DataDecoder() - - -@dataclasses.dataclass -class SegmentationHead(hyperparams.Config): - """Segmentation head config.""" - level: int = 3 - num_convs: int = 2 - num_filters: int = 256 - use_depthwise_convolution: bool = False - prediction_kernel_size: int = 1 - upsample_factor: int = 1 - feature_fusion: Optional[ - str] = None # None, deeplabv3plus, panoptic_fpn_fusion or pyramid_fusion - # deeplabv3plus feature fusion params - low_level: Union[int, str] = 2 - low_level_num_filters: int = 48 - # panoptic_fpn_fusion params - decoder_min_level: Optional[Union[int, str]] = None - decoder_max_level: Optional[Union[int, str]] = None - - -@dataclasses.dataclass -class MaskScoringHead(hyperparams.Config): - """Mask Scoring head config.""" - num_convs: int = 4 - num_filters: int = 128 - fc_input_size: List[int] = dataclasses.field(default_factory=list) - num_fcs: int = 2 - fc_dims: int = 1024 - - -@dataclasses.dataclass -class SemanticSegmentationModel(hyperparams.Config): - """Semantic segmentation model config.""" - num_classes: int = 0 - input_size: List[int] = dataclasses.field(default_factory=list) - min_level: int = 3 - max_level: int = 6 - head: SegmentationHead = SegmentationHead() - backbone: backbones.Backbone = backbones.Backbone( - type='resnet', resnet=backbones.ResNet()) - decoder: decoders.Decoder = decoders.Decoder(type='identity') - mask_scoring_head: Optional[MaskScoringHead] = None - norm_activation: common.NormActivation = common.NormActivation() - - -@dataclasses.dataclass -class Losses(hyperparams.Config): - loss_weight: float = 1.0 - label_smoothing: float = 0.0 - 
ignore_label: int = 255 - class_weights: List[float] = dataclasses.field(default_factory=list) - l2_weight_decay: float = 0.0 - use_groundtruth_dimension: bool = True - top_k_percent_pixels: float = 1.0 - - -@dataclasses.dataclass -class Evaluation(hyperparams.Config): - report_per_class_iou: bool = True - report_train_mean_iou: bool = True # Turning this off can speed up training. - - -@dataclasses.dataclass -class SemanticSegmentationTask(cfg.TaskConfig): - """The model config.""" - model: SemanticSegmentationModel = SemanticSegmentationModel() - train_data: DataConfig = DataConfig(is_training=True) - validation_data: DataConfig = DataConfig(is_training=False) - losses: Losses = Losses() - evaluation: Evaluation = Evaluation() - train_input_partition_dims: List[int] = dataclasses.field( - default_factory=list) - eval_input_partition_dims: List[int] = dataclasses.field( - default_factory=list) - init_checkpoint: Optional[str] = None - init_checkpoint_modules: Union[ - str, List[str]] = 'all' # all, backbone, and/or decoder - - -@exp_factory.register_config_factory('semantic_segmentation') -def semantic_segmentation() -> cfg.ExperimentConfig: - """Semantic segmentation general.""" - return cfg.ExperimentConfig( - task=SemanticSegmentationTask(), - trainer=cfg.TrainerConfig(), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - -# PASCAL VOC 2012 Dataset -PASCAL_TRAIN_EXAMPLES = 10582 -PASCAL_VAL_EXAMPLES = 1449 -PASCAL_INPUT_PATH_BASE = 'pascal_voc_seg' - - -@exp_factory.register_config_factory('seg_deeplabv3_pascal') -def seg_deeplabv3_pascal() -> cfg.ExperimentConfig: - """Image segmentation on pascal voc with resnet deeplabv3.""" - train_batch_size = 16 - eval_batch_size = 8 - steps_per_epoch = PASCAL_TRAIN_EXAMPLES // train_batch_size - output_stride = 16 - aspp_dilation_rates = [12, 24, 36] # [6, 12, 18] if output_stride = 16 - multigrid = [1, 2, 4] - stem_type = 'v1' - level = 
int(np.math.log2(output_stride)) - config = cfg.ExperimentConfig( - task=SemanticSegmentationTask( - model=SemanticSegmentationModel( - num_classes=21, - input_size=[None, None, 3], - backbone=backbones.Backbone( - type='dilated_resnet', dilated_resnet=backbones.DilatedResNet( - model_id=101, output_stride=output_stride, - multigrid=multigrid, stem_type=stem_type)), - decoder=decoders.Decoder( - type='aspp', aspp=decoders.ASPP( - level=level, dilation_rates=aspp_dilation_rates)), - head=SegmentationHead(level=level, num_convs=0), - norm_activation=common.NormActivation( - activation='swish', - norm_momentum=0.9997, - norm_epsilon=1e-3, - use_sync_bn=True)), - losses=Losses(l2_weight_decay=1e-4), - train_data=DataConfig( - input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'train_aug*'), - # TODO(arashwan): test changing size to 513 to match deeplab. - output_size=[512, 512], - is_training=True, - global_batch_size=train_batch_size, - aug_scale_min=0.5, - aug_scale_max=2.0), - validation_data=DataConfig( - input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'val*'), - output_size=[512, 512], - is_training=False, - global_batch_size=eval_batch_size, - resize_eval_groundtruth=False, - groundtruth_padded_size=[512, 512], - drop_remainder=False), - # resnet101 - init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/deeplab/deeplab_resnet101_imagenet/ckpt-62400', - init_checkpoint_modules='backbone'), - trainer=cfg.TrainerConfig( - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - train_steps=45 * steps_per_epoch, - validation_steps=PASCAL_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'polynomial', - 'polynomial': { - 'initial_learning_rate': 0.007, - 'decay_steps': 45 * steps_per_epoch, - 'end_learning_rate': 0.0, - 'power': 0.9 - } - }, - 
'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 5 * steps_per_epoch, - 'warmup_learning_rate': 0 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - return config - - -@exp_factory.register_config_factory('seg_deeplabv3plus_pascal') -def seg_deeplabv3plus_pascal() -> cfg.ExperimentConfig: - """Image segmentation on pascal voc with resnet deeplabv3+.""" - train_batch_size = 16 - eval_batch_size = 8 - steps_per_epoch = PASCAL_TRAIN_EXAMPLES // train_batch_size - output_stride = 16 - aspp_dilation_rates = [6, 12, 18] - multigrid = [1, 2, 4] - stem_type = 'v1' - level = int(np.math.log2(output_stride)) - config = cfg.ExperimentConfig( - task=SemanticSegmentationTask( - model=SemanticSegmentationModel( - num_classes=21, - input_size=[None, None, 3], - backbone=backbones.Backbone( - type='dilated_resnet', dilated_resnet=backbones.DilatedResNet( - model_id=101, output_stride=output_stride, - stem_type=stem_type, multigrid=multigrid)), - decoder=decoders.Decoder( - type='aspp', - aspp=decoders.ASPP( - level=level, dilation_rates=aspp_dilation_rates)), - head=SegmentationHead( - level=level, - num_convs=2, - feature_fusion='deeplabv3plus', - low_level=2, - low_level_num_filters=48), - norm_activation=common.NormActivation( - activation='swish', - norm_momentum=0.9997, - norm_epsilon=1e-3, - use_sync_bn=True)), - losses=Losses(l2_weight_decay=1e-4), - train_data=DataConfig( - input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'train_aug*'), - output_size=[512, 512], - is_training=True, - global_batch_size=train_batch_size, - aug_scale_min=0.5, - aug_scale_max=2.0), - validation_data=DataConfig( - input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'val*'), - output_size=[512, 512], - is_training=False, - global_batch_size=eval_batch_size, - resize_eval_groundtruth=False, - groundtruth_padded_size=[512, 512], - drop_remainder=False), - # resnet101 - 
init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/deeplab/deeplab_resnet101_imagenet/ckpt-62400', - init_checkpoint_modules='backbone'), - trainer=cfg.TrainerConfig( - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - train_steps=45 * steps_per_epoch, - validation_steps=PASCAL_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'polynomial', - 'polynomial': { - 'initial_learning_rate': 0.007, - 'decay_steps': 45 * steps_per_epoch, - 'end_learning_rate': 0.0, - 'power': 0.9 - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 5 * steps_per_epoch, - 'warmup_learning_rate': 0 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - return config - - -@exp_factory.register_config_factory('seg_resnetfpn_pascal') -def seg_resnetfpn_pascal() -> cfg.ExperimentConfig: - """Image segmentation on pascal voc with resnet-fpn.""" - train_batch_size = 256 - eval_batch_size = 32 - steps_per_epoch = PASCAL_TRAIN_EXAMPLES // train_batch_size - config = cfg.ExperimentConfig( - task=SemanticSegmentationTask( - model=SemanticSegmentationModel( - num_classes=21, - input_size=[512, 512, 3], - min_level=3, - max_level=7, - backbone=backbones.Backbone( - type='resnet', resnet=backbones.ResNet(model_id=50)), - decoder=decoders.Decoder(type='fpn', fpn=decoders.FPN()), - head=SegmentationHead(level=3, num_convs=3), - norm_activation=common.NormActivation( - activation='swish', - use_sync_bn=True)), - losses=Losses(l2_weight_decay=1e-4), - train_data=DataConfig( - input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'train_aug*'), - is_training=True, - global_batch_size=train_batch_size, - aug_scale_min=0.2, - aug_scale_max=1.5), - validation_data=DataConfig( - 
input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'val*'), - is_training=False, - global_batch_size=eval_batch_size, - resize_eval_groundtruth=False, - groundtruth_padded_size=[512, 512], - drop_remainder=False), - ), - trainer=cfg.TrainerConfig( - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - train_steps=450 * steps_per_epoch, - validation_steps=PASCAL_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'polynomial', - 'polynomial': { - 'initial_learning_rate': 0.007, - 'decay_steps': 450 * steps_per_epoch, - 'end_learning_rate': 0.0, - 'power': 0.9 - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 5 * steps_per_epoch, - 'warmup_learning_rate': 0 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - return config - - -@exp_factory.register_config_factory('mnv2_deeplabv3_pascal') -def mnv2_deeplabv3_pascal() -> cfg.ExperimentConfig: - """Image segmentation on pascal with mobilenetv2 deeplabv3.""" - train_batch_size = 16 - eval_batch_size = 16 - steps_per_epoch = PASCAL_TRAIN_EXAMPLES // train_batch_size - output_stride = 16 - aspp_dilation_rates = [] - level = int(np.math.log2(output_stride)) - pool_kernel_size = [] - - config = cfg.ExperimentConfig( - task=SemanticSegmentationTask( - model=SemanticSegmentationModel( - num_classes=21, - input_size=[None, None, 3], - backbone=backbones.Backbone( - type='mobilenet', - mobilenet=backbones.MobileNet( - model_id='MobileNetV2', output_stride=output_stride)), - decoder=decoders.Decoder( - type='aspp', - aspp=decoders.ASPP( - level=level, - dilation_rates=aspp_dilation_rates, - pool_kernel_size=pool_kernel_size)), - head=SegmentationHead(level=level, num_convs=0), - 
norm_activation=common.NormActivation( - activation='relu', - norm_momentum=0.99, - norm_epsilon=1e-3, - use_sync_bn=True)), - losses=Losses(l2_weight_decay=4e-5), - train_data=DataConfig( - input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'train_aug*'), - output_size=[512, 512], - is_training=True, - global_batch_size=train_batch_size, - aug_scale_min=0.5, - aug_scale_max=2.0), - validation_data=DataConfig( - input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'val*'), - output_size=[512, 512], - is_training=False, - global_batch_size=eval_batch_size, - resize_eval_groundtruth=False, - groundtruth_padded_size=[512, 512], - drop_remainder=False), - # mobilenetv2 - init_checkpoint='gs://tf_model_garden/cloud/vision-2.0/deeplab/deeplabv3_mobilenetv2_coco/best_ckpt-63', - init_checkpoint_modules=['backbone', 'decoder']), - trainer=cfg.TrainerConfig( - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - train_steps=30000, - validation_steps=PASCAL_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - best_checkpoint_eval_metric='mean_iou', - best_checkpoint_export_subdir='best_ckpt', - best_checkpoint_metric_comp='higher', - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'polynomial', - 'polynomial': { - 'initial_learning_rate': 0.007 * train_batch_size / 16, - 'decay_steps': 30000, - 'end_learning_rate': 0.0, - 'power': 0.9 - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 5 * steps_per_epoch, - 'warmup_learning_rate': 0 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - return config - - -# Cityscapes Dataset (Download and process the dataset yourself) -CITYSCAPES_TRAIN_EXAMPLES = 2975 -CITYSCAPES_VAL_EXAMPLES = 500 -CITYSCAPES_INPUT_PATH_BASE = 'cityscapes' - - 
-@exp_factory.register_config_factory('seg_deeplabv3plus_cityscapes') -def seg_deeplabv3plus_cityscapes() -> cfg.ExperimentConfig: - """Image segmentation on cityscapes with resnet deeplabv3+.""" - train_batch_size = 16 - eval_batch_size = 16 - steps_per_epoch = CITYSCAPES_TRAIN_EXAMPLES // train_batch_size - output_stride = 16 - aspp_dilation_rates = [6, 12, 18] - multigrid = [1, 2, 4] - stem_type = 'v1' - level = int(np.math.log2(output_stride)) - config = cfg.ExperimentConfig( - task=SemanticSegmentationTask( - model=SemanticSegmentationModel( - # Cityscapes uses only 19 semantic classes for train/evaluation. - # The void (background) class is ignored in train and evaluation. - num_classes=19, - input_size=[None, None, 3], - backbone=backbones.Backbone( - type='dilated_resnet', dilated_resnet=backbones.DilatedResNet( - model_id=101, output_stride=output_stride, - stem_type=stem_type, multigrid=multigrid)), - decoder=decoders.Decoder( - type='aspp', - aspp=decoders.ASPP( - level=level, dilation_rates=aspp_dilation_rates, - pool_kernel_size=[512, 1024])), - head=SegmentationHead( - level=level, - num_convs=2, - feature_fusion='deeplabv3plus', - low_level=2, - low_level_num_filters=48), - norm_activation=common.NormActivation( - activation='swish', - norm_momentum=0.99, - norm_epsilon=1e-3, - use_sync_bn=True)), - losses=Losses(l2_weight_decay=1e-4), - train_data=DataConfig( - input_path=os.path.join(CITYSCAPES_INPUT_PATH_BASE, - 'train_fine**'), - crop_size=[512, 1024], - output_size=[1024, 2048], - is_training=True, - global_batch_size=train_batch_size, - aug_scale_min=0.5, - aug_scale_max=2.0), - validation_data=DataConfig( - input_path=os.path.join(CITYSCAPES_INPUT_PATH_BASE, 'val_fine*'), - output_size=[1024, 2048], - is_training=False, - global_batch_size=eval_batch_size, - resize_eval_groundtruth=True, - drop_remainder=False), - # resnet101 - init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/deeplab/deeplab_resnet101_imagenet/ckpt-62400', - 
init_checkpoint_modules='backbone'), - trainer=cfg.TrainerConfig( - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - train_steps=500 * steps_per_epoch, - validation_steps=CITYSCAPES_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'polynomial', - 'polynomial': { - 'initial_learning_rate': 0.01, - 'decay_steps': 500 * steps_per_epoch, - 'end_learning_rate': 0.0, - 'power': 0.9 - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 5 * steps_per_epoch, - 'warmup_learning_rate': 0 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - return config - - -@exp_factory.register_config_factory('mnv2_deeplabv3_cityscapes') -def mnv2_deeplabv3_cityscapes() -> cfg.ExperimentConfig: - """Image segmentation on cityscapes with mobilenetv2 deeplabv3.""" - train_batch_size = 16 - eval_batch_size = 16 - steps_per_epoch = CITYSCAPES_TRAIN_EXAMPLES // train_batch_size - output_stride = 16 - aspp_dilation_rates = [] - pool_kernel_size = [512, 1024] - - level = int(np.math.log2(output_stride)) - config = cfg.ExperimentConfig( - task=SemanticSegmentationTask( - model=SemanticSegmentationModel( - # Cityscapes uses only 19 semantic classes for train/evaluation. - # The void (background) class is ignored in train and evaluation. 
- num_classes=19, - input_size=[None, None, 3], - backbone=backbones.Backbone( - type='mobilenet', - mobilenet=backbones.MobileNet( - model_id='MobileNetV2', output_stride=output_stride)), - decoder=decoders.Decoder( - type='aspp', - aspp=decoders.ASPP( - level=level, - dilation_rates=aspp_dilation_rates, - pool_kernel_size=pool_kernel_size)), - head=SegmentationHead(level=level, num_convs=0), - norm_activation=common.NormActivation( - activation='relu', - norm_momentum=0.99, - norm_epsilon=1e-3, - use_sync_bn=True)), - losses=Losses(l2_weight_decay=4e-5), - train_data=DataConfig( - input_path=os.path.join(CITYSCAPES_INPUT_PATH_BASE, - 'train_fine**'), - crop_size=[512, 1024], - output_size=[1024, 2048], - is_training=True, - global_batch_size=train_batch_size, - aug_scale_min=0.5, - aug_scale_max=2.0), - validation_data=DataConfig( - input_path=os.path.join(CITYSCAPES_INPUT_PATH_BASE, 'val_fine*'), - output_size=[1024, 2048], - is_training=False, - global_batch_size=eval_batch_size, - resize_eval_groundtruth=True, - drop_remainder=False), - # Coco pre-trained mobilenetv2 checkpoint - init_checkpoint='gs://tf_model_garden/cloud/vision-2.0/deeplab/deeplabv3_mobilenetv2_coco/best_ckpt-63', - init_checkpoint_modules='backbone'), - trainer=cfg.TrainerConfig( - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - train_steps=100000, - validation_steps=CITYSCAPES_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - best_checkpoint_eval_metric='mean_iou', - best_checkpoint_export_subdir='best_ckpt', - best_checkpoint_metric_comp='higher', - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'polynomial', - 'polynomial': { - 'initial_learning_rate': 0.01, - 'decay_steps': 100000, - 'end_learning_rate': 0.0, - 'power': 0.9 - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 
5 * steps_per_epoch, - 'warmup_learning_rate': 0 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - return config - - -@exp_factory.register_config_factory('mnv2_deeplabv3plus_cityscapes') -def mnv2_deeplabv3plus_cityscapes() -> cfg.ExperimentConfig: - """Image segmentation on cityscapes with mobilenetv2 deeplabv3plus.""" - config = mnv2_deeplabv3_cityscapes() - config.task.model.head = SegmentationHead( - level=4, - num_convs=2, - feature_fusion='deeplabv3plus', - use_depthwise_convolution=True, - low_level='2/depthwise', - low_level_num_filters=48) - config.task.model.backbone.mobilenet.output_intermediate_endpoints = True - return config diff --git a/official/vision/beta/configs/semantic_segmentation_test.py b/official/vision/beta/configs/semantic_segmentation_test.py deleted file mode 100644 index 9652582ce000dc52c26af307c319d00c3669f0fa..0000000000000000000000000000000000000000 --- a/official/vision/beta/configs/semantic_segmentation_test.py +++ /dev/null @@ -1,46 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Tests for semantic_segmentation.""" - -# pylint: disable=unused-import -from absl.testing import parameterized -import tensorflow as tf - -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.vision import beta -from official.vision.beta.configs import semantic_segmentation as exp_cfg - - -class ImageSegmentationConfigTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.parameters(('seg_deeplabv3_pascal',), - ('seg_deeplabv3plus_pascal',)) - def test_semantic_segmentation_configs(self, config_name): - config = exp_factory.get_exp_config(config_name) - self.assertIsInstance(config, cfg.ExperimentConfig) - self.assertIsInstance(config.task, exp_cfg.SemanticSegmentationTask) - self.assertIsInstance(config.task.model, - exp_cfg.SemanticSegmentationModel) - self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig) - config.validate() - config.task.train_data.is_training = None - with self.assertRaises(KeyError): - config.validate() - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/configs/video_classification.py b/official/vision/beta/configs/video_classification.py deleted file mode 100644 index e196d4d60b8dceda04e9975dfa42174322f7675c..0000000000000000000000000000000000000000 --- a/official/vision/beta/configs/video_classification.py +++ /dev/null @@ -1,371 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Video classification configuration definition.""" -import dataclasses -from typing import Optional, Tuple -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.modeling import hyperparams -from official.modeling import optimization -from official.vision.beta.configs import backbones_3d -from official.vision.beta.configs import common - - -@dataclasses.dataclass -class DataConfig(cfg.DataConfig): - """The base configuration for building datasets.""" - name: Optional[str] = None - file_type: Optional[str] = 'tfrecord' - compressed_input: bool = False - split: str = 'train' - variant_name: Optional[str] = None - feature_shape: Tuple[int, ...] = (64, 224, 224, 3) - temporal_stride: int = 1 - random_stride_range: int = 0 - num_test_clips: int = 1 - num_test_crops: int = 1 - num_classes: int = -1 - num_examples: int = -1 - global_batch_size: int = 128 - data_format: str = 'channels_last' - dtype: str = 'float32' - one_hot: bool = True - shuffle_buffer_size: int = 64 - cache: bool = False - input_path: str = '' - is_training: bool = True - cycle_length: int = 10 - drop_remainder: bool = True - min_image_size: int = 256 - is_multilabel: bool = False - output_audio: bool = False - audio_feature: str = '' - audio_feature_shape: Tuple[int, ...] 
= (-1,) - aug_min_aspect_ratio: float = 0.5 - aug_max_aspect_ratio: float = 2.0 - aug_min_area_ratio: float = 0.49 - aug_max_area_ratio: float = 1.0 - aug_type: Optional[str] = None # 'autoaug', 'randaug', or None - image_field_key: str = 'image/encoded' - label_field_key: str = 'clip/label/index' - - -def kinetics400(is_training): - """Generated Kinectics 400 dataset configs.""" - return DataConfig( - name='kinetics400', - num_classes=400, - is_training=is_training, - split='train' if is_training else 'valid', - drop_remainder=is_training, - num_examples=215570 if is_training else 17706, - feature_shape=(64, 224, 224, 3) if is_training else (250, 224, 224, 3)) - - -def kinetics600(is_training): - """Generated Kinectics 600 dataset configs.""" - return DataConfig( - name='kinetics600', - num_classes=600, - is_training=is_training, - split='train' if is_training else 'valid', - drop_remainder=is_training, - num_examples=366016 if is_training else 27780, - feature_shape=(64, 224, 224, 3) if is_training else (250, 224, 224, 3)) - - -def kinetics700(is_training): - """Generated Kinectics 600 dataset configs.""" - return DataConfig( - name='kinetics700', - num_classes=700, - is_training=is_training, - split='train' if is_training else 'valid', - drop_remainder=is_training, - num_examples=522883 if is_training else 33441, - feature_shape=(64, 224, 224, 3) if is_training else (250, 224, 224, 3)) - - -def kinetics700_2020(is_training): - """Generated Kinectics 600 dataset configs.""" - return DataConfig( - name='kinetics700', - num_classes=700, - is_training=is_training, - split='train' if is_training else 'valid', - drop_remainder=is_training, - num_examples=535982 if is_training else 33640, - feature_shape=(64, 224, 224, 3) if is_training else (250, 224, 224, 3)) - - -@dataclasses.dataclass -class VideoClassificationModel(hyperparams.Config): - """The model config.""" - model_type: str = 'video_classification' - backbone: backbones_3d.Backbone3D = 
backbones_3d.Backbone3D( - type='resnet_3d', resnet_3d=backbones_3d.ResNet3D50()) - norm_activation: common.NormActivation = common.NormActivation( - use_sync_bn=False) - dropout_rate: float = 0.2 - aggregate_endpoints: bool = False - require_endpoints: Optional[Tuple[str, ...]] = None - - -@dataclasses.dataclass -class Losses(hyperparams.Config): - one_hot: bool = True - label_smoothing: float = 0.0 - l2_weight_decay: float = 0.0 - - -@dataclasses.dataclass -class Metrics(hyperparams.Config): - use_per_class_recall: bool = False - - -@dataclasses.dataclass -class VideoClassificationTask(cfg.TaskConfig): - """The task config.""" - model: VideoClassificationModel = VideoClassificationModel() - train_data: DataConfig = DataConfig(is_training=True, drop_remainder=True) - validation_data: DataConfig = DataConfig( - is_training=False, drop_remainder=False) - losses: Losses = Losses() - metrics: Metrics = Metrics() - init_checkpoint: Optional[str] = None - init_checkpoint_modules: str = 'all' # all or backbone - # Spatial Partitioning fields. 
- train_input_partition_dims: Optional[Tuple[int, ...]] = None - eval_input_partition_dims: Optional[Tuple[int, ...]] = None - - -def add_trainer(experiment: cfg.ExperimentConfig, - train_batch_size: int, - eval_batch_size: int, - learning_rate: float = 1.6, - train_epochs: int = 44, - warmup_epochs: int = 5): - """Add and config a trainer to the experiment config.""" - if experiment.task.train_data.num_examples <= 0: - raise ValueError('Wrong train dataset size {!r}'.format( - experiment.task.train_data)) - if experiment.task.validation_data.num_examples <= 0: - raise ValueError('Wrong validation dataset size {!r}'.format( - experiment.task.validation_data)) - experiment.task.train_data.global_batch_size = train_batch_size - experiment.task.validation_data.global_batch_size = eval_batch_size - steps_per_epoch = experiment.task.train_data.num_examples // train_batch_size - experiment.trainer = cfg.TrainerConfig( - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - train_steps=train_epochs * steps_per_epoch, - validation_steps=experiment.task.validation_data.num_examples // - eval_batch_size, - validation_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9, - 'nesterov': True, - } - }, - 'learning_rate': { - 'type': 'cosine', - 'cosine': { - 'initial_learning_rate': learning_rate, - 'decay_steps': train_epochs * steps_per_epoch, - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': warmup_epochs * steps_per_epoch, - 'warmup_learning_rate': 0 - } - } - })) - return experiment - - -@exp_factory.register_config_factory('video_classification') -def video_classification() -> cfg.ExperimentConfig: - """Video classification general.""" - return cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), - task=VideoClassificationTask(), - trainer=cfg.TrainerConfig(), - 
restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None', - 'task.train_data.num_classes == task.validation_data.num_classes', - ]) - - -@exp_factory.register_config_factory('video_classification_ucf101') -def video_classification_ucf101() -> cfg.ExperimentConfig: - """Video classification on UCF-101 with resnet.""" - train_dataset = DataConfig( - name='ucf101', - num_classes=101, - is_training=True, - split='train', - drop_remainder=True, - num_examples=9537, - temporal_stride=2, - feature_shape=(32, 224, 224, 3)) - train_dataset.tfds_name = 'ucf101' - train_dataset.tfds_split = 'train' - validation_dataset = DataConfig( - name='ucf101', - num_classes=101, - is_training=True, - split='test', - drop_remainder=False, - num_examples=3783, - temporal_stride=2, - feature_shape=(32, 224, 224, 3)) - validation_dataset.tfds_name = 'ucf101' - validation_dataset.tfds_split = 'test' - task = VideoClassificationTask( - model=VideoClassificationModel( - backbone=backbones_3d.Backbone3D( - type='resnet_3d', resnet_3d=backbones_3d.ResNet3D50()), - norm_activation=common.NormActivation( - norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False)), - losses=Losses(l2_weight_decay=1e-4), - train_data=train_dataset, - validation_data=validation_dataset) - config = cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), - task=task, - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None', - 'task.train_data.num_classes == task.validation_data.num_classes', - ]) - add_trainer( - config, - train_batch_size=64, - eval_batch_size=16, - learning_rate=0.8, - train_epochs=100) - return config - - -@exp_factory.register_config_factory('video_classification_kinetics400') -def video_classification_kinetics400() -> cfg.ExperimentConfig: - """Video classification on Kinectics 400 with resnet.""" - train_dataset = kinetics400(is_training=True) - validation_dataset = 
kinetics400(is_training=False) - task = VideoClassificationTask( - model=VideoClassificationModel( - backbone=backbones_3d.Backbone3D( - type='resnet_3d', resnet_3d=backbones_3d.ResNet3D50()), - norm_activation=common.NormActivation( - norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False)), - losses=Losses(l2_weight_decay=1e-4), - train_data=train_dataset, - validation_data=validation_dataset) - config = cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), - task=task, - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None', - 'task.train_data.num_classes == task.validation_data.num_classes', - ]) - add_trainer(config, train_batch_size=1024, eval_batch_size=64) - return config - - -@exp_factory.register_config_factory('video_classification_kinetics600') -def video_classification_kinetics600() -> cfg.ExperimentConfig: - """Video classification on Kinectics 600 with resnet.""" - train_dataset = kinetics600(is_training=True) - validation_dataset = kinetics600(is_training=False) - task = VideoClassificationTask( - model=VideoClassificationModel( - backbone=backbones_3d.Backbone3D( - type='resnet_3d', resnet_3d=backbones_3d.ResNet3D50()), - norm_activation=common.NormActivation( - norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False)), - losses=Losses(l2_weight_decay=1e-4), - train_data=train_dataset, - validation_data=validation_dataset) - config = cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), - task=task, - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None', - 'task.train_data.num_classes == task.validation_data.num_classes', - ]) - add_trainer(config, train_batch_size=1024, eval_batch_size=64) - return config - - -@exp_factory.register_config_factory('video_classification_kinetics700') -def video_classification_kinetics700() -> cfg.ExperimentConfig: - """Video classification on Kinectics 700 with 
resnet.""" - train_dataset = kinetics700(is_training=True) - validation_dataset = kinetics700(is_training=False) - task = VideoClassificationTask( - model=VideoClassificationModel( - backbone=backbones_3d.Backbone3D( - type='resnet_3d', resnet_3d=backbones_3d.ResNet3D50()), - norm_activation=common.NormActivation( - norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False)), - losses=Losses(l2_weight_decay=1e-4), - train_data=train_dataset, - validation_data=validation_dataset) - config = cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), - task=task, - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None', - 'task.train_data.num_classes == task.validation_data.num_classes', - ]) - add_trainer(config, train_batch_size=1024, eval_batch_size=64) - return config - - -@exp_factory.register_config_factory('video_classification_kinetics700_2020') -def video_classification_kinetics700_2020() -> cfg.ExperimentConfig: - """Video classification on Kinectics 700 2020 with resnet.""" - train_dataset = kinetics700_2020(is_training=True) - validation_dataset = kinetics700_2020(is_training=False) - task = VideoClassificationTask( - model=VideoClassificationModel( - backbone=backbones_3d.Backbone3D( - type='resnet_3d', resnet_3d=backbones_3d.ResNet3D50()), - norm_activation=common.NormActivation( - norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False)), - losses=Losses(l2_weight_decay=1e-4), - train_data=train_dataset, - validation_data=validation_dataset) - config = cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), - task=task, - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None', - 'task.train_data.num_classes == task.validation_data.num_classes', - ]) - add_trainer(config, train_batch_size=1024, eval_batch_size=64) - return config diff --git a/official/vision/beta/configs/video_classification_test.py 
b/official/vision/beta/configs/video_classification_test.py deleted file mode 100644 index f2ce2118920161aeb76fae66d9c44049936bdabe..0000000000000000000000000000000000000000 --- a/official/vision/beta/configs/video_classification_test.py +++ /dev/null @@ -1,45 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Tests for video_classification.""" - -# pylint: disable=unused-import -from absl.testing import parameterized -import tensorflow as tf - -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.vision import beta -from official.vision.beta.configs import video_classification as exp_cfg - - -class VideoClassificationConfigTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.parameters(('video_classification',), - ('video_classification_kinetics600',)) - def test_video_classification_configs(self, config_name): - config = exp_factory.get_exp_config(config_name) - self.assertIsInstance(config, cfg.ExperimentConfig) - self.assertIsInstance(config.task, exp_cfg.VideoClassificationTask) - self.assertIsInstance(config.task.model, exp_cfg.VideoClassificationModel) - self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig) - config.validate() - config.task.train_data.is_training = None - with self.assertRaises(KeyError): - config.validate() - - -if __name__ == '__main__': - tf.test.main() diff --git 
a/official/vision/beta/data/__init__.py b/official/vision/beta/data/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/data/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/dataloaders/__init__.py b/official/vision/beta/dataloaders/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/dataloaders/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- diff --git a/official/vision/beta/dataloaders/classification_input.py b/official/vision/beta/dataloaders/classification_input.py deleted file mode 100644 index c0dc6fdb8ddf6dedb1e4ece6acb96ee13ca3e92d..0000000000000000000000000000000000000000 --- a/official/vision/beta/dataloaders/classification_input.py +++ /dev/null @@ -1,273 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Classification decoder and parser.""" -from typing import Any, Dict, List, Optional -# Import libraries -import tensorflow as tf - -from official.vision.beta.configs import common -from official.vision.beta.dataloaders import decoder -from official.vision.beta.dataloaders import parser -from official.vision.beta.ops import augment -from official.vision.beta.ops import preprocess_ops - -MEAN_RGB = (0.485 * 255, 0.456 * 255, 0.406 * 255) -STDDEV_RGB = (0.229 * 255, 0.224 * 255, 0.225 * 255) - -DEFAULT_IMAGE_FIELD_KEY = 'image/encoded' -DEFAULT_LABEL_FIELD_KEY = 'image/class/label' - - -class Decoder(decoder.Decoder): - """A tf.Example decoder for classification task.""" - - def __init__(self, - image_field_key: str = DEFAULT_IMAGE_FIELD_KEY, - label_field_key: str = DEFAULT_LABEL_FIELD_KEY, - is_multilabel: bool = False, - keys_to_features: Optional[Dict[str, Any]] = None): - if not keys_to_features: - keys_to_features = { - image_field_key: - tf.io.FixedLenFeature((), tf.string, default_value=''), - } - if is_multilabel: 
- keys_to_features.update( - {label_field_key: tf.io.VarLenFeature(dtype=tf.int64)}) - else: - keys_to_features.update({ - label_field_key: - tf.io.FixedLenFeature((), tf.int64, default_value=-1) - }) - self._keys_to_features = keys_to_features - - def decode(self, serialized_example): - return tf.io.parse_single_example( - serialized_example, self._keys_to_features) - - -class Parser(parser.Parser): - """Parser to parse an image and its annotations into a dictionary of tensors.""" - - def __init__(self, - output_size: List[int], - num_classes: float, - image_field_key: str = DEFAULT_IMAGE_FIELD_KEY, - label_field_key: str = DEFAULT_LABEL_FIELD_KEY, - decode_jpeg_only: bool = True, - aug_rand_hflip: bool = True, - aug_type: Optional[common.Augmentation] = None, - color_jitter: float = 0., - random_erasing: Optional[common.RandomErasing] = None, - is_multilabel: bool = False, - dtype: str = 'float32'): - """Initializes parameters for parsing annotations in the dataset. - - Args: - output_size: `Tensor` or `list` for [height, width] of output image. The - output_size should be divided by the largest feature stride 2^max_level. - num_classes: `float`, number of classes. - image_field_key: `str`, the key name to encoded image in tf.Example. - label_field_key: `str`, the key name to label in tf.Example. - decode_jpeg_only: `bool`, if True, only JPEG format is decoded, this is - faster than decoding other types. Default is True. - aug_rand_hflip: `bool`, if True, augment training with random - horizontal flip. - aug_type: An optional Augmentation object to choose from AutoAugment and - RandAugment. - color_jitter: Magnitude of color jitter. If > 0, the value is used to - generate random scale factor for brightness, contrast and saturation. - See `preprocess_ops.color_jitter` for more details. - random_erasing: if not None, augment input image by random erasing. See - `augment.RandomErasing` for more details. 
- is_multilabel: A `bool`, whether or not each example has multiple labels. - dtype: `str`, cast output image in dtype. It can be 'float32', 'float16', - or 'bfloat16'. - """ - self._output_size = output_size - self._aug_rand_hflip = aug_rand_hflip - self._num_classes = num_classes - self._image_field_key = image_field_key - if dtype == 'float32': - self._dtype = tf.float32 - elif dtype == 'float16': - self._dtype = tf.float16 - elif dtype == 'bfloat16': - self._dtype = tf.bfloat16 - else: - raise ValueError('dtype {!r} is not supported!'.format(dtype)) - if aug_type: - if aug_type.type == 'autoaug': - self._augmenter = augment.AutoAugment( - augmentation_name=aug_type.autoaug.augmentation_name, - cutout_const=aug_type.autoaug.cutout_const, - translate_const=aug_type.autoaug.translate_const) - elif aug_type.type == 'randaug': - self._augmenter = augment.RandAugment( - num_layers=aug_type.randaug.num_layers, - magnitude=aug_type.randaug.magnitude, - cutout_const=aug_type.randaug.cutout_const, - translate_const=aug_type.randaug.translate_const, - prob_to_apply=aug_type.randaug.prob_to_apply, - exclude_ops=aug_type.randaug.exclude_ops) - else: - raise ValueError('Augmentation policy {} not supported.'.format( - aug_type.type)) - else: - self._augmenter = None - self._label_field_key = label_field_key - self._color_jitter = color_jitter - if random_erasing: - self._random_erasing = augment.RandomErasing( - probability=random_erasing.probability, - min_area=random_erasing.min_area, - max_area=random_erasing.max_area, - min_aspect=random_erasing.min_aspect, - max_aspect=random_erasing.max_aspect, - min_count=random_erasing.min_count, - max_count=random_erasing.max_count, - trials=random_erasing.trials) - else: - self._random_erasing = None - self._is_multilabel = is_multilabel - self._decode_jpeg_only = decode_jpeg_only - - def _parse_train_data(self, decoded_tensors): - """Parses data for training.""" - image = self._parse_train_image(decoded_tensors) - label = 
tf.cast(decoded_tensors[self._label_field_key], dtype=tf.int32) - if self._is_multilabel: - if isinstance(label, tf.sparse.SparseTensor): - label = tf.sparse.to_dense(label) - label = tf.reduce_sum(tf.one_hot(label, self._num_classes), axis=0) - return image, label - - def _parse_eval_data(self, decoded_tensors): - """Parses data for evaluation.""" - image = self._parse_eval_image(decoded_tensors) - label = tf.cast(decoded_tensors[self._label_field_key], dtype=tf.int32) - if self._is_multilabel: - if isinstance(label, tf.sparse.SparseTensor): - label = tf.sparse.to_dense(label) - label = tf.reduce_sum(tf.one_hot(label, self._num_classes), axis=0) - return image, label - - def _parse_train_image(self, decoded_tensors): - """Parses image data for training.""" - image_bytes = decoded_tensors[self._image_field_key] - - if self._decode_jpeg_only: - image_shape = tf.image.extract_jpeg_shape(image_bytes) - - # Crops image. - cropped_image = preprocess_ops.random_crop_image_v2( - image_bytes, image_shape) - image = tf.cond( - tf.reduce_all(tf.equal(tf.shape(cropped_image), image_shape)), - lambda: preprocess_ops.center_crop_image_v2(image_bytes, image_shape), - lambda: cropped_image) - else: - # Decodes image. - image = tf.io.decode_image(image_bytes, channels=3) - image.set_shape([None, None, 3]) - - # Crops image. - cropped_image = preprocess_ops.random_crop_image(image) - - image = tf.cond( - tf.reduce_all(tf.equal(tf.shape(cropped_image), tf.shape(image))), - lambda: preprocess_ops.center_crop_image(image), - lambda: cropped_image) - - if self._aug_rand_hflip: - image = tf.image.random_flip_left_right(image) - - # Color jitter. - if self._color_jitter > 0: - image = preprocess_ops.color_jitter(image, self._color_jitter, - self._color_jitter, - self._color_jitter) - - # Resizes image. 
- image = tf.image.resize( - image, self._output_size, method=tf.image.ResizeMethod.BILINEAR) - image.set_shape([self._output_size[0], self._output_size[1], 3]) - - # Apply autoaug or randaug. - if self._augmenter is not None: - image = self._augmenter.distort(image) - - # Normalizes image with mean and std pixel values. - image = preprocess_ops.normalize_image(image, - offset=MEAN_RGB, - scale=STDDEV_RGB) - - # Random erasing after the image has been normalized - if self._random_erasing is not None: - image = self._random_erasing.distort(image) - - # Convert image to self._dtype. - image = tf.image.convert_image_dtype(image, self._dtype) - - return image - - def _parse_eval_image(self, decoded_tensors): - """Parses image data for evaluation.""" - image_bytes = decoded_tensors[self._image_field_key] - - if self._decode_jpeg_only: - image_shape = tf.image.extract_jpeg_shape(image_bytes) - - # Center crops. - image = preprocess_ops.center_crop_image_v2(image_bytes, image_shape) - else: - # Decodes image. - image = tf.io.decode_image(image_bytes, channels=3) - image.set_shape([None, None, 3]) - - # Center crops. - image = preprocess_ops.center_crop_image(image) - - image = tf.image.resize( - image, self._output_size, method=tf.image.ResizeMethod.BILINEAR) - image.set_shape([self._output_size[0], self._output_size[1], 3]) - - # Normalizes image with mean and std pixel values. - image = preprocess_ops.normalize_image(image, - offset=MEAN_RGB, - scale=STDDEV_RGB) - - # Convert image to self._dtype. 
- image = tf.image.convert_image_dtype(image, self._dtype) - - return image - - @classmethod - def inference_fn(cls, - image: tf.Tensor, - input_image_size: List[int], - num_channels: int = 3) -> tf.Tensor: - """Builds image model inputs for serving.""" - - image = tf.cast(image, dtype=tf.float32) - image = preprocess_ops.center_crop_image(image) - image = tf.image.resize( - image, input_image_size, method=tf.image.ResizeMethod.BILINEAR) - - # Normalizes image with mean and std pixel values. - image = preprocess_ops.normalize_image( - image, offset=MEAN_RGB, scale=STDDEV_RGB) - image.set_shape(input_image_size + [num_channels]) - return image diff --git a/official/vision/beta/dataloaders/input_reader.py b/official/vision/beta/dataloaders/input_reader.py deleted file mode 100644 index 99698cb5cd5873c4b4badc8e8891e2389dea1393..0000000000000000000000000000000000000000 --- a/official/vision/beta/dataloaders/input_reader.py +++ /dev/null @@ -1,179 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Dataset reader for vision model garden.""" - -from typing import Any, Callable, Optional, Tuple - -import tensorflow as tf - -from official.core import config_definitions as cfg -from official.core import input_reader - - -def calculate_batch_sizes(total_batch_size: int, - pseudo_label_ratio: float) -> Tuple[int, int]: - """Calculates labeled and pseudo-labeled dataset batch sizes. 
- - Returns (labeled_batch_size, pseudo_labeled_batch_size) given a - total batch size and pseudo-label data ratio. - - Args: - total_batch_size: The total batch size for all data. - pseudo_label_ratio: A non-negative float ratio of pseudo-labeled - to labeled data in a batch. - - Returns: - (labeled_batch_size, pseudo_labeled_batch_size) as ints. - - Raises: - ValueError: If total_batch_size is negative. - ValueError: If pseudo_label_ratio is negative. - """ - if total_batch_size < 0: - raise ValueError('Invalid total_batch_size: {}'.format(total_batch_size)) - if pseudo_label_ratio < 0.0: - raise ValueError( - 'Invalid pseudo_label_ratio: {}'.format(pseudo_label_ratio)) - - ratio_factor = pseudo_label_ratio / (1.0 + pseudo_label_ratio) - pseudo_labeled_batch_size = int(round(total_batch_size * ratio_factor)) - labeled_batch_size = total_batch_size - pseudo_labeled_batch_size - return labeled_batch_size, pseudo_labeled_batch_size - - -class CombinationDatasetInputReader(input_reader.InputReader): - """Combination dataset input reader.""" - - def __init__(self, - params: cfg.DataConfig, - dataset_fn=tf.data.TFRecordDataset, - pseudo_label_dataset_fn=tf.data.TFRecordDataset, - decoder_fn: Optional[Callable[..., Any]] = None, - sample_fn: Optional[Callable[..., Any]] = None, - parser_fn: Optional[Callable[..., Any]] = None, - transform_and_batch_fn: Optional[Callable[ - [tf.data.Dataset, Optional[tf.distribute.InputContext]], - tf.data.Dataset]] = None, - postprocess_fn: Optional[Callable[..., Any]] = None): - """Initializes an CombinationDatasetInputReader instance. - - This class mixes a labeled and pseudo-labeled dataset. The params - must contain "pseudo_label_data.input_path" to specify the - pseudo-label dataset files and "pseudo_label_data.data_ratio" - to specify a per-batch mixing ratio of pseudo-label examples to - labeled dataset examples. - - Args: - params: A config_definitions.DataConfig object. 
- dataset_fn: A `tf.data.Dataset` that consumes the input files. For - example, it can be `tf.data.TFRecordDataset`. - pseudo_label_dataset_fn: A `tf.data.Dataset` that consumes the input - files. For example, it can be `tf.data.TFRecordDataset`. - decoder_fn: An optional `callable` that takes the serialized data string - and decodes them into the raw tensor dictionary. - sample_fn: An optional `callable` that takes a `tf.data.Dataset` object as - input and outputs the transformed dataset. It performs sampling on the - decoded raw tensors dict before the parser_fn. - parser_fn: An optional `callable` that takes the decoded raw tensors dict - and parse them into a dictionary of tensors that can be consumed by the - model. It will be executed after decoder_fn. - transform_and_batch_fn: An optional `callable` that takes a - `tf.data.Dataset` object and an optional `tf.distribute.InputContext` as - input, and returns a `tf.data.Dataset` object. It will be executed after - `parser_fn` to transform and batch the dataset; if None, after - `parser_fn` is executed, the dataset will be batched into per-replica - batch size. - postprocess_fn: A optional `callable` that processes batched tensors. It - will be executed after batching. - - Raises: - ValueError: If drop_remainder is False. 
- """ - super().__init__(params=params, - dataset_fn=dataset_fn, - decoder_fn=decoder_fn, - sample_fn=sample_fn, - parser_fn=parser_fn, - transform_and_batch_fn=transform_and_batch_fn, - postprocess_fn=postprocess_fn) - - self._pseudo_label_file_pattern = params.pseudo_label_data.input_path - self._pseudo_label_dataset_fn = pseudo_label_dataset_fn - self._pseudo_label_data_ratio = params.pseudo_label_data.data_ratio - self._pseudo_label_matched_files = input_reader.match_files( - self._pseudo_label_file_pattern) - if not self._drop_remainder: - raise ValueError( - 'Must use drop_remainder=True with CombinationDatasetInputReader') - - def read( - self, - input_context: Optional[tf.distribute.InputContext] = None - ) -> tf.data.Dataset: - """Generates a tf.data.Dataset object.""" - - labeled_batch_size, pl_batch_size = calculate_batch_sizes( - self._global_batch_size, self._pseudo_label_data_ratio) - - if not labeled_batch_size and pl_batch_size: - raise ValueError( - 'Invalid batch_size: {} and pseudo_label_data_ratio: {}, ' - 'resulting in a 0 batch size for one of the datasets.'.format( - self._global_batch_size, self._pseudo_label_data_ratio)) - - def _read_decode_and_parse_dataset(matched_files, dataset_fn, batch_size, - input_context, tfds_builder): - dataset = self._read_data_source(matched_files, dataset_fn, input_context, - tfds_builder) - return self._decode_and_parse_dataset(dataset, batch_size, input_context) - - labeled_dataset = _read_decode_and_parse_dataset( - matched_files=self._matched_files, - dataset_fn=self._dataset_fn, - batch_size=labeled_batch_size, - input_context=input_context, - tfds_builder=self._tfds_builder) - - pseudo_labeled_dataset = _read_decode_and_parse_dataset( - matched_files=self._pseudo_label_matched_files, - dataset_fn=self._pseudo_label_dataset_fn, - batch_size=pl_batch_size, - input_context=input_context, - tfds_builder=False) - - def concat_fn(d1, d2): - return tf.nest.map_structure( - lambda x1, x2: tf.concat([x1, x2], 
axis=0), d1, d2) - - dataset_concat = tf.data.Dataset.zip( - (labeled_dataset, pseudo_labeled_dataset)) - dataset_concat = dataset_concat.map( - concat_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE) - - def maybe_map_fn(dataset, fn): - return dataset if fn is None else dataset.map( - fn, num_parallel_calls=tf.data.experimental.AUTOTUNE) - - dataset_concat = maybe_map_fn(dataset_concat, self._postprocess_fn) - dataset_concat = self._maybe_apply_data_service(dataset_concat, - input_context) - - if self._deterministic is not None: - options = tf.data.Options() - options.experimental_deterministic = self._deterministic - dataset_concat = dataset_concat.with_options(options) - - return dataset_concat.prefetch(tf.data.experimental.AUTOTUNE) diff --git a/official/vision/beta/dataloaders/tf_example_decoder.py b/official/vision/beta/dataloaders/tf_example_decoder.py deleted file mode 100644 index e636f56151698225479a5888a13c165063babbb5..0000000000000000000000000000000000000000 --- a/official/vision/beta/dataloaders/tf_example_decoder.py +++ /dev/null @@ -1,176 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Tensorflow Example proto decoder for object detection. - -A decoder to decode string tensors containing serialized tensorflow.Example -protos for object detection. 
-""" -import tensorflow as tf - -from official.vision.beta.dataloaders import decoder - - -def _generate_source_id(image_bytes): - # Hashing using 22 bits since float32 has only 23 mantissa bits. - return tf.strings.as_string( - tf.strings.to_hash_bucket_fast(image_bytes, 2 ** 22 - 1)) - - -class TfExampleDecoder(decoder.Decoder): - """Tensorflow Example proto decoder.""" - - def __init__(self, - include_mask=False, - regenerate_source_id=False, - mask_binarize_threshold=None): - self._include_mask = include_mask - self._regenerate_source_id = regenerate_source_id - self._keys_to_features = { - 'image/encoded': tf.io.FixedLenFeature((), tf.string), - 'image/height': tf.io.FixedLenFeature((), tf.int64), - 'image/width': tf.io.FixedLenFeature((), tf.int64), - 'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32), - 'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32), - 'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32), - 'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32), - 'image/object/class/label': tf.io.VarLenFeature(tf.int64), - 'image/object/area': tf.io.VarLenFeature(tf.float32), - 'image/object/is_crowd': tf.io.VarLenFeature(tf.int64), - } - self._mask_binarize_threshold = mask_binarize_threshold - if include_mask: - self._keys_to_features.update({ - 'image/object/mask': tf.io.VarLenFeature(tf.string), - }) - if not regenerate_source_id: - self._keys_to_features.update({ - 'image/source_id': tf.io.FixedLenFeature((), tf.string), - }) - - def _decode_image(self, parsed_tensors): - """Decodes the image and set its static shape.""" - image = tf.io.decode_image(parsed_tensors['image/encoded'], channels=3) - image.set_shape([None, None, 3]) - return image - - def _decode_boxes(self, parsed_tensors): - """Concat box coordinates in the format of [ymin, xmin, ymax, xmax].""" - xmin = parsed_tensors['image/object/bbox/xmin'] - xmax = parsed_tensors['image/object/bbox/xmax'] - ymin = parsed_tensors['image/object/bbox/ymin'] - ymax = 
parsed_tensors['image/object/bbox/ymax'] - return tf.stack([ymin, xmin, ymax, xmax], axis=-1) - - def _decode_classes(self, parsed_tensors): - return parsed_tensors['image/object/class/label'] - - def _decode_areas(self, parsed_tensors): - xmin = parsed_tensors['image/object/bbox/xmin'] - xmax = parsed_tensors['image/object/bbox/xmax'] - ymin = parsed_tensors['image/object/bbox/ymin'] - ymax = parsed_tensors['image/object/bbox/ymax'] - height = tf.cast(parsed_tensors['image/height'], dtype=tf.float32) - width = tf.cast(parsed_tensors['image/width'], dtype=tf.float32) - return tf.cond( - tf.greater(tf.shape(parsed_tensors['image/object/area'])[0], 0), - lambda: parsed_tensors['image/object/area'], - lambda: (xmax - xmin) * (ymax - ymin) * height * width) - - def _decode_masks(self, parsed_tensors): - """Decode a set of PNG masks to the tf.float32 tensors.""" - - def _decode_png_mask(png_bytes): - mask = tf.squeeze( - tf.io.decode_png(png_bytes, channels=1, dtype=tf.uint8), axis=-1) - mask = tf.cast(mask, dtype=tf.float32) - mask.set_shape([None, None]) - return mask - - height = parsed_tensors['image/height'] - width = parsed_tensors['image/width'] - masks = parsed_tensors['image/object/mask'] - return tf.cond( - pred=tf.greater(tf.size(input=masks), 0), - true_fn=lambda: tf.map_fn(_decode_png_mask, masks, dtype=tf.float32), - false_fn=lambda: tf.zeros([0, height, width], dtype=tf.float32)) - - def decode(self, serialized_example): - """Decode the serialized example. - - Args: - serialized_example: a single serialized tf.Example string. - - Returns: - decoded_tensors: a dictionary of tensors with the following fields: - - source_id: a string scalar tensor. - - image: a uint8 tensor of shape [None, None, 3]. - - height: an integer scalar tensor. - - width: an integer scalar tensor. - - groundtruth_classes: a int64 tensor of shape [None]. - - groundtruth_is_crowd: a bool tensor of shape [None]. - - groundtruth_area: a float32 tensor of shape [None]. 
- - groundtruth_boxes: a float32 tensor of shape [None, 4]. - - groundtruth_instance_masks: a float32 tensor of shape - [None, None, None]. - - groundtruth_instance_masks_png: a string tensor of shape [None]. - """ - parsed_tensors = tf.io.parse_single_example( - serialized=serialized_example, features=self._keys_to_features) - for k in parsed_tensors: - if isinstance(parsed_tensors[k], tf.SparseTensor): - if parsed_tensors[k].dtype == tf.string: - parsed_tensors[k] = tf.sparse.to_dense( - parsed_tensors[k], default_value='') - else: - parsed_tensors[k] = tf.sparse.to_dense( - parsed_tensors[k], default_value=0) - - if self._regenerate_source_id: - source_id = _generate_source_id(parsed_tensors['image/encoded']) - else: - source_id = tf.cond( - tf.greater(tf.strings.length(parsed_tensors['image/source_id']), 0), - lambda: parsed_tensors['image/source_id'], - lambda: _generate_source_id(parsed_tensors['image/encoded'])) - image = self._decode_image(parsed_tensors) - boxes = self._decode_boxes(parsed_tensors) - classes = self._decode_classes(parsed_tensors) - areas = self._decode_areas(parsed_tensors) - is_crowds = tf.cond( - tf.greater(tf.shape(parsed_tensors['image/object/is_crowd'])[0], 0), - lambda: tf.cast(parsed_tensors['image/object/is_crowd'], dtype=tf.bool), - lambda: tf.zeros_like(classes, dtype=tf.bool)) - if self._include_mask: - masks = self._decode_masks(parsed_tensors) - - if self._mask_binarize_threshold is not None: - masks = tf.cast(masks > self._mask_binarize_threshold, tf.float32) - - decoded_tensors = { - 'source_id': source_id, - 'image': image, - 'height': parsed_tensors['image/height'], - 'width': parsed_tensors['image/width'], - 'groundtruth_classes': classes, - 'groundtruth_is_crowd': is_crowds, - 'groundtruth_area': areas, - 'groundtruth_boxes': boxes, - } - if self._include_mask: - decoded_tensors.update({ - 'groundtruth_instance_masks': masks, - 'groundtruth_instance_masks_png': parsed_tensors['image/object/mask'], - }) - return 
decoded_tensors diff --git a/official/vision/beta/dataloaders/tfexample_utils.py b/official/vision/beta/dataloaders/tfexample_utils.py deleted file mode 100644 index ddb78eb7de418f9c7b57ec65c5d64abeea5deea3..0000000000000000000000000000000000000000 --- a/official/vision/beta/dataloaders/tfexample_utils.py +++ /dev/null @@ -1,269 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Utility functions to create tf.Example and tf.SequnceExample for test. - -Example:video classification end-to-end test -i.e. from reading input file to train and eval. - -```python -class FooTrainTest(tf.test.TestCase): - - def setUp(self): - super(TrainTest, self).setUp() - - # Write the fake tf.train.SequenceExample to file for test. - data_dir = os.path.join(self.get_temp_dir(), 'data') - tf.io.gfile.makedirs(data_dir) - self._data_path = os.path.join(data_dir, 'data.tfrecord') - examples = [ - tfexample_utils.make_video_test_example( - image_shape=(36, 36, 3), - audio_shape=(20, 128), - label=random.randint(0, 100)) for _ in range(2) - ] - tfexample_utils.dump_to_tfrecord(self._data_path, tf_examples=examples) - - def test_foo(self): - dataset = tf.data.TFRecordDataset(self._data_path) - ... 
- -``` - -""" -import io -from typing import Sequence, Union - -import numpy as np -from PIL import Image -import tensorflow as tf - -IMAGE_KEY = 'image/encoded' -CLASSIFICATION_LABEL_KEY = 'image/class/label' -LABEL_KEY = 'clip/label/index' -AUDIO_KEY = 'features/audio' -DUMP_SOURCE_ID = b'123' - - -def encode_image(image_array: np.array, fmt: str) -> bytes: - image = Image.fromarray(image_array) - with io.BytesIO() as output: - image.save(output, format=fmt) - return output.getvalue() - - -def make_image_bytes(shape: Sequence[int], fmt: str = 'JPEG') -> bytes: - """Generates image and return bytes in specified format.""" - random_image = np.random.randint(0, 256, size=shape, dtype=np.uint8) - return encode_image(random_image, fmt=fmt) - - -def put_int64_to_context(seq_example: tf.train.SequenceExample, - label: int = 0, - key: str = LABEL_KEY): - """Puts int64 to SequenceExample context with key.""" - seq_example.context.feature[key].int64_list.value[:] = [label] - - -def put_bytes_list_to_feature(seq_example: tf.train.SequenceExample, - raw_image_bytes: bytes, - key: str = IMAGE_KEY, - repeat_num: int = 2): - """Puts bytes list to SequenceExample context with key.""" - for _ in range(repeat_num): - seq_example.feature_lists.feature_list.get_or_create( - key).feature.add().bytes_list.value[:] = [raw_image_bytes] - - -def put_float_list_to_feature(seq_example: tf.train.SequenceExample, - value: Sequence[Sequence[float]], key: str): - """Puts float list to SequenceExample context with key.""" - for s in value: - seq_example.feature_lists.feature_list.get_or_create( - key).feature.add().float_list.value[:] = s - - -def make_video_test_example(image_shape: Sequence[int] = (263, 320, 3), - audio_shape: Sequence[int] = (10, 256), - label: int = 42): - """Generates data for testing video models (inc. 
RGB, audio, & label).""" - raw_image_bytes = make_image_bytes(shape=image_shape) - random_audio = np.random.normal(size=audio_shape).tolist() - - seq_example = tf.train.SequenceExample() - put_int64_to_context(seq_example, label=label, key=LABEL_KEY) - put_bytes_list_to_feature( - seq_example, raw_image_bytes, key=IMAGE_KEY, repeat_num=4) - - put_float_list_to_feature(seq_example, value=random_audio, key=AUDIO_KEY) - return seq_example - - -def dump_to_tfrecord(record_file: str, - tf_examples: Sequence[Union[tf.train.Example, - tf.train.SequenceExample]]): - """Writes serialized Example to TFRecord file with path.""" - with tf.io.TFRecordWriter(record_file) as writer: - for tf_example in tf_examples: - writer.write(tf_example.SerializeToString()) - - -def _encode_image(image_array: np.ndarray, fmt: str) -> bytes: - """Util function to encode an image.""" - image = Image.fromarray(image_array) - with io.BytesIO() as output: - image.save(output, format=fmt) - return output.getvalue() - - -def create_classification_example( - image_height: int, - image_width: int, - image_format: str = 'JPEG', - is_multilabel: bool = False) -> tf.train.Example: - """Creates image and labels for image classification input pipeline.""" - image = _encode_image( - np.uint8(np.random.rand(image_height, image_width, 3) * 255), - fmt=image_format) - labels = [0, 1] if is_multilabel else [0] - serialized_example = tf.train.Example( - features=tf.train.Features( - feature={ - IMAGE_KEY: (tf.train.Feature( - bytes_list=tf.train.BytesList(value=[image]))), - CLASSIFICATION_LABEL_KEY: (tf.train.Feature( - int64_list=tf.train.Int64List(value=labels))), - })).SerializeToString() - return serialized_example - - -def create_3d_image_test_example(image_height: int, image_width: int, - image_volume: int, - image_channel: int) -> tf.train.Example: - """Creates 3D image and label.""" - images = np.random.rand(image_height, image_width, image_volume, - image_channel) - images = images.astype(np.float32) - 
- labels = np.random.randint( - low=2, size=(image_height, image_width, image_volume, image_channel)) - labels = labels.astype(np.float32) - - feature = { - IMAGE_KEY: (tf.train.Feature( - bytes_list=tf.train.BytesList(value=[images.tobytes()]))), - CLASSIFICATION_LABEL_KEY: (tf.train.Feature( - bytes_list=tf.train.BytesList(value=[labels.tobytes()]))) - } - return tf.train.Example(features=tf.train.Features(feature=feature)) - - -def create_detection_test_example(image_height: int, image_width: int, - image_channel: int, - num_instances: int) -> tf.train.Example: - """Creates and returns a test example containing box and mask annotations. - - Args: - image_height: The height of test image. - image_width: The width of test image. - image_channel: The channel of test image. - num_instances: The number of object instances per image. - - Returns: - A tf.train.Example for testing. - """ - image = make_image_bytes([image_height, image_width, image_channel]) - if num_instances == 0: - xmins = [] - xmaxs = [] - ymins = [] - ymaxs = [] - labels = [] - areas = [] - is_crowds = [] - masks = [] - labels_text = [] - else: - xmins = list(np.random.rand(num_instances)) - xmaxs = list(np.random.rand(num_instances)) - ymins = list(np.random.rand(num_instances)) - ymaxs = list(np.random.rand(num_instances)) - labels_text = [b'class_1'] * num_instances - labels = list(np.random.randint(100, size=num_instances)) - areas = [(xmax - xmin) * (ymax - ymin) * image_height * image_width - for xmin, xmax, ymin, ymax in zip(xmins, xmaxs, ymins, ymaxs)] - is_crowds = [0] * num_instances - masks = [] - for _ in range(num_instances): - mask = make_image_bytes([image_height, image_width], fmt='PNG') - masks.append(mask) - return tf.train.Example( - features=tf.train.Features( - feature={ - 'image/encoded': (tf.train.Feature( - bytes_list=tf.train.BytesList(value=[image]))), - 'image/source_id': (tf.train.Feature( - bytes_list=tf.train.BytesList(value=[DUMP_SOURCE_ID]))), - 'image/height': 
(tf.train.Feature( - int64_list=tf.train.Int64List(value=[image_height]))), - 'image/width': (tf.train.Feature( - int64_list=tf.train.Int64List(value=[image_width]))), - 'image/object/bbox/xmin': (tf.train.Feature( - float_list=tf.train.FloatList(value=xmins))), - 'image/object/bbox/xmax': (tf.train.Feature( - float_list=tf.train.FloatList(value=xmaxs))), - 'image/object/bbox/ymin': (tf.train.Feature( - float_list=tf.train.FloatList(value=ymins))), - 'image/object/bbox/ymax': (tf.train.Feature( - float_list=tf.train.FloatList(value=ymaxs))), - 'image/object/class/label': (tf.train.Feature( - int64_list=tf.train.Int64List(value=labels))), - 'image/object/class/text': (tf.train.Feature( - bytes_list=tf.train.BytesList(value=labels_text))), - 'image/object/is_crowd': (tf.train.Feature( - int64_list=tf.train.Int64List(value=is_crowds))), - 'image/object/area': (tf.train.Feature( - float_list=tf.train.FloatList(value=areas))), - 'image/object/mask': (tf.train.Feature( - bytes_list=tf.train.BytesList(value=masks))), - })) - - -def create_segmentation_test_example(image_height: int, image_width: int, - image_channel: int) -> tf.train.Example: - """Creates and returns a test example containing mask annotations. - - Args: - image_height: The height of test image. - image_width: The width of test image. - image_channel: The channel of test image. - - Returns: - A tf.train.Example for testing. 
- """ - image = make_image_bytes([image_height, image_width, image_channel]) - mask = make_image_bytes([image_height, image_width], fmt='PNG') - return tf.train.Example( - features=tf.train.Features( - feature={ - 'image/encoded': (tf.train.Feature( - bytes_list=tf.train.BytesList(value=[image]))), - 'image/segmentation/class/encoded': (tf.train.Feature( - bytes_list=tf.train.BytesList(value=[mask]))), - 'image/height': (tf.train.Feature( - int64_list=tf.train.Int64List(value=[image_height]))), - 'image/width': (tf.train.Feature( - int64_list=tf.train.Int64List(value=[image_width]))) - })) diff --git a/official/vision/beta/dataloaders/utils.py b/official/vision/beta/dataloaders/utils.py deleted file mode 100644 index 3cc4e084c9e9aa7f7c2a0b62d2830b0015c33134..0000000000000000000000000000000000000000 --- a/official/vision/beta/dataloaders/utils.py +++ /dev/null @@ -1,69 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Data loader utils.""" -from typing import Dict - -# Import libraries -import tensorflow as tf - -from official.vision.beta.ops import preprocess_ops - - -def process_source_id(source_id: tf.Tensor) -> tf.Tensor: - """Processes source_id to the right format. - - Args: - source_id: A `tf.Tensor` that contains the source ID. It can be empty. - - Returns: - A formatted source ID. 
- """ - if source_id.dtype == tf.string: - source_id = tf.strings.to_number(source_id, tf.int64) - with tf.control_dependencies([source_id]): - source_id = tf.cond( - pred=tf.equal(tf.size(input=source_id), 0), - true_fn=lambda: tf.cast(tf.constant(-1), tf.int64), - false_fn=lambda: tf.identity(source_id)) - return source_id - - -def pad_groundtruths_to_fixed_size(groundtruths: Dict[str, tf.Tensor], - size: int) -> Dict[str, tf.Tensor]: - """Pads the first dimension of groundtruths labels to the fixed size. - - Args: - groundtruths: A dictionary of {`str`: `tf.Tensor`} that contains groundtruth - annotations of `boxes`, `is_crowds`, `areas` and `classes`. - size: An `int` that specifies the expected size of the first dimension of - padded tensors. - - Returns: - A dictionary of the same keys as input and padded tensors as values. - - """ - groundtruths['boxes'] = preprocess_ops.clip_or_pad_to_fixed_size( - groundtruths['boxes'], size, -1) - groundtruths['is_crowds'] = preprocess_ops.clip_or_pad_to_fixed_size( - groundtruths['is_crowds'], size, 0) - groundtruths['areas'] = preprocess_ops.clip_or_pad_to_fixed_size( - groundtruths['areas'], size, -1) - groundtruths['classes'] = preprocess_ops.clip_or_pad_to_fixed_size( - groundtruths['classes'], size, -1) - if 'attributes' in groundtruths: - for k, v in groundtruths['attributes'].items(): - groundtruths['attributes'][k] = preprocess_ops.clip_or_pad_to_fixed_size( - v, size, -1) - return groundtruths diff --git a/official/vision/beta/evaluation/__init__.py b/official/vision/beta/evaluation/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/evaluation/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/evaluation/coco_evaluator.py b/official/vision/beta/evaluation/coco_evaluator.py deleted file mode 100644 index 03793bdcd798568824cce827f6329f33a9dd6304..0000000000000000000000000000000000000000 --- a/official/vision/beta/evaluation/coco_evaluator.py +++ /dev/null @@ -1,336 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""The COCO-style evaluator. - -The following snippet demonstrates the use of interfaces: - - evaluator = COCOEvaluator(...) - for _ in range(num_evals): - for _ in range(num_batches_per_eval): - predictions, groundtruth = predictor.predict(...) # pop a batch. - evaluator.update_state(groundtruths, predictions) - evaluator.result() # finish one full eval and reset states. 
- -See also: https://github.com/cocodataset/cocoapi/ -""" - -import atexit -import tempfile -# Import libraries -from absl import logging -import numpy as np -from pycocotools import cocoeval -import six -import tensorflow as tf - -from official.vision.beta.evaluation import coco_utils - - -class COCOEvaluator(object): - """COCO evaluation metric class.""" - - def __init__(self, - annotation_file, - include_mask, - need_rescale_bboxes=True, - per_category_metrics=False): - """Constructs COCO evaluation class. - - The class provides the interface to COCO metrics_fn. The - _update_op() takes detections from each image and push them to - self.detections. The _evaluate() loads a JSON file in COCO annotation format - as the groundtruths and runs COCO evaluation. - - Args: - annotation_file: a JSON file that stores annotations of the eval dataset. - If `annotation_file` is None, groundtruth annotations will be loaded - from the dataloader. - include_mask: a boolean to indicate whether or not to include the mask - eval. - need_rescale_bboxes: If true bboxes in `predictions` will be rescaled back - to absolute values (`image_info` is needed in this case). - per_category_metrics: Whether to return per category metrics. 
- """ - if annotation_file: - if annotation_file.startswith('gs://'): - _, local_val_json = tempfile.mkstemp(suffix='.json') - tf.io.gfile.remove(local_val_json) - - tf.io.gfile.copy(annotation_file, local_val_json) - atexit.register(tf.io.gfile.remove, local_val_json) - else: - local_val_json = annotation_file - self._coco_gt = coco_utils.COCOWrapper( - eval_type=('mask' if include_mask else 'box'), - annotation_file=local_val_json) - self._annotation_file = annotation_file - self._include_mask = include_mask - self._per_category_metrics = per_category_metrics - self._metric_names = [ - 'AP', 'AP50', 'AP75', 'APs', 'APm', 'APl', 'ARmax1', 'ARmax10', - 'ARmax100', 'ARs', 'ARm', 'ARl' - ] - self._required_prediction_fields = [ - 'source_id', 'num_detections', 'detection_classes', 'detection_scores', - 'detection_boxes' - ] - self._need_rescale_bboxes = need_rescale_bboxes - if self._need_rescale_bboxes: - self._required_prediction_fields.append('image_info') - self._required_groundtruth_fields = [ - 'source_id', 'height', 'width', 'classes', 'boxes' - ] - if self._include_mask: - mask_metric_names = ['mask_' + x for x in self._metric_names] - self._metric_names.extend(mask_metric_names) - self._required_prediction_fields.extend(['detection_masks']) - self._required_groundtruth_fields.extend(['masks']) - - self.reset_states() - - @property - def name(self): - return 'coco_metric' - - def reset_states(self): - """Resets internal states for a fresh run.""" - self._predictions = {} - if not self._annotation_file: - self._groundtruths = {} - - def result(self): - """Evaluates detection results, and reset_states.""" - metric_dict = self.evaluate() - # Cleans up the internal variables in order for a fresh eval next time. - self.reset_states() - return metric_dict - - def evaluate(self): - """Evaluates with detections from all images with COCO API. - - Returns: - coco_metric: float numpy array with shape [24] representing the - coco-style evaluation metrics (box and mask). 
- """ - if not self._annotation_file: - logging.info('There is no annotation_file in COCOEvaluator.') - gt_dataset = coco_utils.convert_groundtruths_to_coco_dataset( - self._groundtruths) - coco_gt = coco_utils.COCOWrapper( - eval_type=('mask' if self._include_mask else 'box'), - gt_dataset=gt_dataset) - else: - logging.info('Using annotation file: %s', self._annotation_file) - coco_gt = self._coco_gt - coco_predictions = coco_utils.convert_predictions_to_coco_annotations( - self._predictions) - coco_dt = coco_gt.loadRes(predictions=coco_predictions) - image_ids = [ann['image_id'] for ann in coco_predictions] - - coco_eval = cocoeval.COCOeval(coco_gt, coco_dt, iouType='bbox') - coco_eval.params.imgIds = image_ids - coco_eval.evaluate() - coco_eval.accumulate() - coco_eval.summarize() - coco_metrics = coco_eval.stats - - if self._include_mask: - mcoco_eval = cocoeval.COCOeval(coco_gt, coco_dt, iouType='segm') - mcoco_eval.params.imgIds = image_ids - mcoco_eval.evaluate() - mcoco_eval.accumulate() - mcoco_eval.summarize() - mask_coco_metrics = mcoco_eval.stats - - if self._include_mask: - metrics = np.hstack((coco_metrics, mask_coco_metrics)) - else: - metrics = coco_metrics - - metrics_dict = {} - for i, name in enumerate(self._metric_names): - metrics_dict[name] = metrics[i].astype(np.float32) - - # Adds metrics per category. - if self._per_category_metrics: - metrics_dict.update(self._retrieve_per_category_metrics(coco_eval)) - - if self._include_mask: - metrics_dict.update(self._retrieve_per_category_metrics( - mcoco_eval, prefix='mask')) - - return metrics_dict - - def _retrieve_per_category_metrics(self, coco_eval, prefix=''): - """Retrieves and per-category metrics and retuns them in a dict. - - Args: - coco_eval: a cocoeval.COCOeval object containing evaluation data. - prefix: str, A string used to prefix metric names. - - Returns: - metrics_dict: A dictionary with per category metrics. 
- """ - - metrics_dict = {} - if prefix: - prefix = prefix + ' ' - - if hasattr(coco_eval, 'category_stats'): - for category_index, category_id in enumerate(coco_eval.params.catIds): - if self._annotation_file: - coco_category = self._coco_gt.cats[category_id] - # if 'name' is available use it, otherwise use `id` - category_display_name = coco_category.get('name', category_id) - else: - category_display_name = category_id - - metrics_dict[prefix + 'Precision mAP ByCategory/{}'.format( - category_display_name - )] = coco_eval.category_stats[0][category_index].astype(np.float32) - metrics_dict[prefix + 'Precision mAP ByCategory@50IoU/{}'.format( - category_display_name - )] = coco_eval.category_stats[1][category_index].astype(np.float32) - metrics_dict[prefix + 'Precision mAP ByCategory@75IoU/{}'.format( - category_display_name - )] = coco_eval.category_stats[2][category_index].astype(np.float32) - metrics_dict[prefix + 'Precision mAP ByCategory (small) /{}'.format( - category_display_name - )] = coco_eval.category_stats[3][category_index].astype(np.float32) - metrics_dict[prefix + 'Precision mAP ByCategory (medium) /{}'.format( - category_display_name - )] = coco_eval.category_stats[4][category_index].astype(np.float32) - metrics_dict[prefix + 'Precision mAP ByCategory (large) /{}'.format( - category_display_name - )] = coco_eval.category_stats[5][category_index].astype(np.float32) - metrics_dict[prefix + 'Recall AR@1 ByCategory/{}'.format( - category_display_name - )] = coco_eval.category_stats[6][category_index].astype(np.float32) - metrics_dict[prefix + 'Recall AR@10 ByCategory/{}'.format( - category_display_name - )] = coco_eval.category_stats[7][category_index].astype(np.float32) - metrics_dict[prefix + 'Recall AR@100 ByCategory/{}'.format( - category_display_name - )] = coco_eval.category_stats[8][category_index].astype(np.float32) - metrics_dict[prefix + 'Recall AR (small) ByCategory/{}'.format( - category_display_name - )] = 
coco_eval.category_stats[9][category_index].astype(np.float32) - metrics_dict[prefix + 'Recall AR (medium) ByCategory/{}'.format( - category_display_name - )] = coco_eval.category_stats[10][category_index].astype(np.float32) - metrics_dict[prefix + 'Recall AR (large) ByCategory/{}'.format( - category_display_name - )] = coco_eval.category_stats[11][category_index].astype(np.float32) - - return metrics_dict - - def _process_predictions(self, predictions): - image_scale = np.tile(predictions['image_info'][:, 2:3, :], (1, 1, 2)) - predictions['detection_boxes'] = ( - predictions['detection_boxes'].astype(np.float32)) - predictions['detection_boxes'] /= image_scale - if 'detection_outer_boxes' in predictions: - predictions['detection_outer_boxes'] = ( - predictions['detection_outer_boxes'].astype(np.float32)) - predictions['detection_outer_boxes'] /= image_scale - - def _convert_to_numpy(self, groundtruths, predictions): - """Converts tesnors to numpy arrays.""" - if groundtruths: - labels = tf.nest.map_structure(lambda x: x.numpy(), groundtruths) - numpy_groundtruths = {} - for key, val in labels.items(): - if isinstance(val, tuple): - val = np.concatenate(val) - numpy_groundtruths[key] = val - else: - numpy_groundtruths = groundtruths - - if predictions: - outputs = tf.nest.map_structure(lambda x: x.numpy(), predictions) - numpy_predictions = {} - for key, val in outputs.items(): - if isinstance(val, tuple): - val = np.concatenate(val) - numpy_predictions[key] = val - else: - numpy_predictions = predictions - - return numpy_groundtruths, numpy_predictions - - def update_state(self, groundtruths, predictions): - """Update and aggregate detection results and groundtruth data. - - Args: - groundtruths: a dictionary of Tensors including the fields below. - See also different parsers under `../dataloader` for more details. - Required fields: - - source_id: a numpy array of int or string of shape [batch_size]. - - height: a numpy array of int of shape [batch_size]. 
- - width: a numpy array of int of shape [batch_size]. - - num_detections: a numpy array of int of shape [batch_size]. - - boxes: a numpy array of float of shape [batch_size, K, 4]. - - classes: a numpy array of int of shape [batch_size, K]. - Optional fields: - - is_crowds: a numpy array of int of shape [batch_size, K]. If the - field is absent, it is assumed that this instance is not crowd. - - areas: a numy array of float of shape [batch_size, K]. If the - field is absent, the area is calculated using either boxes or - masks depending on which one is available. - - masks: a numpy array of float of shape - [batch_size, K, mask_height, mask_width], - predictions: a dictionary of tensors including the fields below. - See different parsers under `../dataloader` for more details. - Required fields: - - source_id: a numpy array of int or string of shape [batch_size]. - - image_info [if `need_rescale_bboxes` is True]: a numpy array of - float of shape [batch_size, 4, 2]. - - num_detections: a numpy array of - int of shape [batch_size]. - - detection_boxes: a numpy array of float of shape [batch_size, K, 4]. - - detection_classes: a numpy array of int of shape [batch_size, K]. - - detection_scores: a numpy array of float of shape [batch_size, K]. - Optional fields: - - detection_masks: a numpy array of float of shape - [batch_size, K, mask_height, mask_width]. - Raises: - ValueError: if the required prediction or groundtruth fields are not - present in the incoming `predictions` or `groundtruths`. 
- """ - groundtruths, predictions = self._convert_to_numpy(groundtruths, - predictions) - for k in self._required_prediction_fields: - if k not in predictions: - raise ValueError( - 'Missing the required key `{}` in predictions!'.format(k)) - if self._need_rescale_bboxes: - self._process_predictions(predictions) - for k, v in six.iteritems(predictions): - if k not in self._predictions: - self._predictions[k] = [v] - else: - self._predictions[k].append(v) - - if not self._annotation_file: - assert groundtruths - for k in self._required_groundtruth_fields: - if k not in groundtruths: - raise ValueError( - 'Missing the required key `{}` in groundtruths!'.format(k)) - for k, v in six.iteritems(groundtruths): - if k not in self._groundtruths: - self._groundtruths[k] = [v] - else: - self._groundtruths[k].append(v) diff --git a/official/vision/beta/evaluation/coco_utils.py b/official/vision/beta/evaluation/coco_utils.py deleted file mode 100644 index d1e9c1f6d58568861957a46fa9505f885b40e203..0000000000000000000000000000000000000000 --- a/official/vision/beta/evaluation/coco_utils.py +++ /dev/null @@ -1,400 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Util functions related to pycocotools and COCO eval.""" - -import copy -import json - -# Import libraries - -from absl import logging -import numpy as np -from PIL import Image -from pycocotools import coco -from pycocotools import mask as mask_api -import six -import tensorflow as tf - -from official.common import dataset_fn -from official.vision.beta.dataloaders import tf_example_decoder -from official.vision.beta.ops import box_ops -from official.vision.beta.ops import mask_ops - - -class COCOWrapper(coco.COCO): - """COCO wrapper class. - - This class wraps COCO API object, which provides the following additional - functionalities: - 1. Support string type image id. - 2. Support loading the groundtruth dataset using the external annotation - dictionary. - 3. Support loading the prediction results using the external annotation - dictionary. - """ - - def __init__(self, eval_type='box', annotation_file=None, gt_dataset=None): - """Instantiates a COCO-style API object. - - Args: - eval_type: either 'box' or 'mask'. - annotation_file: a JSON file that stores annotations of the eval dataset. - This is required if `gt_dataset` is not provided. - gt_dataset: the groundtruth eval datatset in COCO API format. - """ - if ((annotation_file and gt_dataset) or - ((not annotation_file) and (not gt_dataset))): - raise ValueError('One and only one of `annotation_file` and `gt_dataset` ' - 'needs to be specified.') - - if eval_type not in ['box', 'mask']: - raise ValueError('The `eval_type` can only be either `box` or `mask`.') - - coco.COCO.__init__(self, annotation_file=annotation_file) - self._eval_type = eval_type - if gt_dataset: - self.dataset = gt_dataset - self.createIndex() - - def loadRes(self, predictions): - """Loads result file and return a result api object. - - Args: - predictions: a list of dictionary each representing an annotation in COCO - format. The required fields are `image_id`, `category_id`, `score`, - `bbox`, `segmentation`. 
- - Returns: - res: result COCO api object. - - Raises: - ValueError: if the set of image id from predctions is not the subset of - the set of image id of the groundtruth dataset. - """ - res = coco.COCO() - res.dataset['images'] = copy.deepcopy(self.dataset['images']) - res.dataset['categories'] = copy.deepcopy(self.dataset['categories']) - - image_ids = [ann['image_id'] for ann in predictions] - if set(image_ids) != (set(image_ids) & set(self.getImgIds())): - raise ValueError('Results do not correspond to the current dataset!') - for ann in predictions: - x1, x2, y1, y2 = [ann['bbox'][0], ann['bbox'][0] + ann['bbox'][2], - ann['bbox'][1], ann['bbox'][1] + ann['bbox'][3]] - if self._eval_type == 'box': - ann['area'] = ann['bbox'][2] * ann['bbox'][3] - ann['segmentation'] = [ - [x1, y1, x1, y2, x2, y2, x2, y1]] - elif self._eval_type == 'mask': - ann['area'] = mask_api.area(ann['segmentation']) - - res.dataset['annotations'] = copy.deepcopy(predictions) - res.createIndex() - return res - - -def convert_predictions_to_coco_annotations(predictions): - """Converts a batch of predictions to annotations in COCO format. - - Args: - predictions: a dictionary of lists of numpy arrays including the following - fields. K below denotes the maximum number of instances per image. - Required fields: - - source_id: a list of numpy arrays of int or string of shape - [batch_size]. - - num_detections: a list of numpy arrays of int of shape [batch_size]. - - detection_boxes: a list of numpy arrays of float of shape - [batch_size, K, 4], where coordinates are in the original image - space (not the scaled image space). - - detection_classes: a list of numpy arrays of int of shape - [batch_size, K]. - - detection_scores: a list of numpy arrays of float of shape - [batch_size, K]. - Optional fields: - - detection_masks: a list of numpy arrays of float of shape - [batch_size, K, mask_height, mask_width]. - - Returns: - coco_predictions: prediction in COCO annotation format. 
- """ - coco_predictions = [] - num_batches = len(predictions['source_id']) - max_num_detections = predictions['detection_classes'][0].shape[1] - use_outer_box = 'detection_outer_boxes' in predictions - for i in range(num_batches): - predictions['detection_boxes'][i] = box_ops.yxyx_to_xywh( - predictions['detection_boxes'][i]) - if use_outer_box: - predictions['detection_outer_boxes'][i] = box_ops.yxyx_to_xywh( - predictions['detection_outer_boxes'][i]) - mask_boxes = predictions['detection_outer_boxes'] - else: - mask_boxes = predictions['detection_boxes'] - - batch_size = predictions['source_id'][i].shape[0] - for j in range(batch_size): - if 'detection_masks' in predictions: - image_masks = mask_ops.paste_instance_masks( - predictions['detection_masks'][i][j], - mask_boxes[i][j], - int(predictions['image_info'][i][j, 0, 0]), - int(predictions['image_info'][i][j, 0, 1])) - binary_masks = (image_masks > 0.0).astype(np.uint8) - encoded_masks = [ - mask_api.encode(np.asfortranarray(binary_mask)) - for binary_mask in list(binary_masks)] - for k in range(max_num_detections): - ann = {} - ann['image_id'] = predictions['source_id'][i][j] - ann['category_id'] = predictions['detection_classes'][i][j, k] - ann['bbox'] = predictions['detection_boxes'][i][j, k] - ann['score'] = predictions['detection_scores'][i][j, k] - if 'detection_masks' in predictions: - ann['segmentation'] = encoded_masks[k] - coco_predictions.append(ann) - - for i, ann in enumerate(coco_predictions): - ann['id'] = i + 1 - - return coco_predictions - - -def convert_groundtruths_to_coco_dataset(groundtruths, label_map=None): - """Converts groundtruths to the dataset in COCO format. - - Args: - groundtruths: a dictionary of numpy arrays including the fields below. - Note that each element in the list represent the number for a single - example without batch dimension. K below denotes the actual number of - instances for each image. 
- Required fields: - - source_id: a list of numpy arrays of int or string of shape - [batch_size]. - - height: a list of numpy arrays of int of shape [batch_size]. - - width: a list of numpy arrays of int of shape [batch_size]. - - num_detections: a list of numpy arrays of int of shape [batch_size]. - - boxes: a list of numpy arrays of float of shape [batch_size, K, 4], - where coordinates are in the original image space (not the - normalized coordinates). - - classes: a list of numpy arrays of int of shape [batch_size, K]. - Optional fields: - - is_crowds: a list of numpy arrays of int of shape [batch_size, K]. If - th field is absent, it is assumed that this instance is not crowd. - - areas: a list of numy arrays of float of shape [batch_size, K]. If the - field is absent, the area is calculated using either boxes or - masks depending on which one is available. - - masks: a list of numpy arrays of string of shape [batch_size, K], - label_map: (optional) a dictionary that defines items from the category id - to the category name. If `None`, collect the category mappping from the - `groundtruths`. - - Returns: - coco_groundtruths: the groundtruth dataset in COCO format. - """ - source_ids = np.concatenate(groundtruths['source_id'], axis=0) - heights = np.concatenate(groundtruths['height'], axis=0) - widths = np.concatenate(groundtruths['width'], axis=0) - gt_images = [{'id': int(i), 'height': int(h), 'width': int(w)} for i, h, w - in zip(source_ids, heights, widths)] - - gt_annotations = [] - num_batches = len(groundtruths['source_id']) - for i in range(num_batches): - logging.info( - 'convert_groundtruths_to_coco_dataset: Processing annotation %d', i) - max_num_instances = groundtruths['classes'][i].shape[1] - batch_size = groundtruths['source_id'][i].shape[0] - for j in range(batch_size): - num_instances = groundtruths['num_detections'][i][j] - if num_instances > max_num_instances: - logging.warning( - 'num_groundtruths is larger than max_num_instances, %d v.s. 
%d', - num_instances, max_num_instances) - num_instances = max_num_instances - for k in range(int(num_instances)): - ann = {} - ann['image_id'] = int(groundtruths['source_id'][i][j]) - if 'is_crowds' in groundtruths: - ann['iscrowd'] = int(groundtruths['is_crowds'][i][j, k]) - else: - ann['iscrowd'] = 0 - ann['category_id'] = int(groundtruths['classes'][i][j, k]) - boxes = groundtruths['boxes'][i] - ann['bbox'] = [ - float(boxes[j, k, 1]), - float(boxes[j, k, 0]), - float(boxes[j, k, 3] - boxes[j, k, 1]), - float(boxes[j, k, 2] - boxes[j, k, 0])] - if 'areas' in groundtruths: - ann['area'] = float(groundtruths['areas'][i][j, k]) - else: - ann['area'] = float( - (boxes[j, k, 3] - boxes[j, k, 1]) * - (boxes[j, k, 2] - boxes[j, k, 0])) - if 'masks' in groundtruths: - if isinstance(groundtruths['masks'][i][j, k], tf.Tensor): - mask = Image.open( - six.BytesIO(groundtruths['masks'][i][j, k].numpy())) - width, height = mask.size - np_mask = ( - np.array(mask.getdata()).reshape(height, - width).astype(np.uint8)) - else: - mask = Image.open( - six.BytesIO(groundtruths['masks'][i][j, k])) - width, height = mask.size - np_mask = ( - np.array(mask.getdata()).reshape(height, - width).astype(np.uint8)) - np_mask[np_mask > 0] = 255 - encoded_mask = mask_api.encode(np.asfortranarray(np_mask)) - ann['segmentation'] = encoded_mask - # Ensure the content of `counts` is JSON serializable string. 
- if 'counts' in ann['segmentation']: - ann['segmentation']['counts'] = six.ensure_str( - ann['segmentation']['counts']) - if 'areas' not in groundtruths: - ann['area'] = mask_api.area(encoded_mask) - gt_annotations.append(ann) - - for i, ann in enumerate(gt_annotations): - ann['id'] = i + 1 - - if label_map: - gt_categories = [{'id': i, 'name': label_map[i]} for i in label_map] - else: - category_ids = [gt['category_id'] for gt in gt_annotations] - gt_categories = [{'id': i} for i in set(category_ids)] - - gt_dataset = { - 'images': gt_images, - 'categories': gt_categories, - 'annotations': copy.deepcopy(gt_annotations), - } - return gt_dataset - - -class COCOGroundtruthGenerator: - """Generates the groundtruth annotations from a single example.""" - - def __init__(self, file_pattern, file_type, num_examples, include_mask, - regenerate_source_id=False): - self._file_pattern = file_pattern - self._num_examples = num_examples - self._include_mask = include_mask - self._dataset_fn = dataset_fn.pick_dataset_fn(file_type) - self._regenerate_source_id = regenerate_source_id - - def _parse_single_example(self, example): - """Parses a single serialized tf.Example proto. - - Args: - example: a serialized tf.Example proto string. - - Returns: - A dictionary of groundtruth with the following fields: - source_id: a scalar tensor of int64 representing the image source_id. - height: a scalar tensor of int64 representing the image height. - width: a scalar tensor of int64 representing the image width. - boxes: a float tensor of shape [K, 4], representing the groundtruth - boxes in absolute coordinates with respect to the original image size. - classes: a int64 tensor of shape [K], representing the class labels of - each instances. - is_crowds: a bool tensor of shape [K], indicating whether the instance - is crowd. - areas: a float tensor of shape [K], indicating the area of each - instance. 
- masks: a string tensor of shape [K], containing the bytes of the png - mask of each instance. - """ - decoder = tf_example_decoder.TfExampleDecoder( - include_mask=self._include_mask, - regenerate_source_id=self._regenerate_source_id) - decoded_tensors = decoder.decode(example) - - image = decoded_tensors['image'] - image_size = tf.shape(image)[0:2] - boxes = box_ops.denormalize_boxes( - decoded_tensors['groundtruth_boxes'], image_size) - - source_id = decoded_tensors['source_id'] - if source_id.dtype is tf.string: - source_id = tf.strings.to_number(source_id, out_type=tf.int64) - - groundtruths = { - 'source_id': source_id, - 'height': decoded_tensors['height'], - 'width': decoded_tensors['width'], - 'num_detections': tf.shape(decoded_tensors['groundtruth_classes'])[0], - 'boxes': boxes, - 'classes': decoded_tensors['groundtruth_classes'], - 'is_crowds': decoded_tensors['groundtruth_is_crowd'], - 'areas': decoded_tensors['groundtruth_area'], - } - if self._include_mask: - groundtruths.update({ - 'masks': decoded_tensors['groundtruth_instance_masks_png'], - }) - return groundtruths - - def _build_pipeline(self): - """Builds data pipeline to generate groundtruth annotations.""" - dataset = tf.data.Dataset.list_files(self._file_pattern, shuffle=False) - dataset = dataset.interleave( - map_func=lambda filename: self._dataset_fn(filename).prefetch(1), - cycle_length=None, - num_parallel_calls=tf.data.experimental.AUTOTUNE) - - dataset = dataset.take(self._num_examples) - dataset = dataset.map(self._parse_single_example, - num_parallel_calls=tf.data.experimental.AUTOTUNE) - dataset = dataset.batch(1, drop_remainder=False) - dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE) - return dataset - - def __call__(self): - return self._build_pipeline() - - -def scan_and_generator_annotation_file(file_pattern: str, - file_type: str, - num_samples: int, - include_mask: bool, - annotation_file: str, - regenerate_source_id: bool = False): - """Scans and generate the 
COCO-style annotation JSON file given a dataset.""" - groundtruth_generator = COCOGroundtruthGenerator( - file_pattern, file_type, num_samples, include_mask, regenerate_source_id) - generate_annotation_file(groundtruth_generator, annotation_file) - - -def generate_annotation_file(groundtruth_generator, - annotation_file): - """Generates COCO-style annotation JSON file given a groundtruth generator.""" - groundtruths = {} - logging.info('Loading groundtruth annotations from dataset to memory...') - for i, groundtruth in enumerate(groundtruth_generator()): - logging.info('generate_annotation_file: Processing annotation %d', i) - for k, v in six.iteritems(groundtruth): - if k not in groundtruths: - groundtruths[k] = [v] - else: - groundtruths[k].append(v) - gt_dataset = convert_groundtruths_to_coco_dataset(groundtruths) - - logging.info('Saving groundtruth annotations to the JSON file...') - with tf.io.gfile.GFile(annotation_file, 'w') as f: - f.write(json.dumps(gt_dataset)) - logging.info('Done saving the JSON file...') diff --git a/official/vision/beta/evaluation/iou.py b/official/vision/beta/evaluation/iou.py deleted file mode 100644 index b1d94e7ea446cb292a5ea7e3722a5ab1df696138..0000000000000000000000000000000000000000 --- a/official/vision/beta/evaluation/iou.py +++ /dev/null @@ -1,129 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""IOU Metrics used for semantic segmentation models.""" - -import numpy as np -import tensorflow as tf - - -class PerClassIoU(tf.keras.metrics.Metric): - """Computes the per-class Intersection-Over-Union metric. - - Mean Intersection-Over-Union is a common evaluation metric for semantic image - segmentation, which first computes the IOU for each semantic class. - IOU is defined as follows: - IOU = true_positive / (true_positive + false_positive + false_negative). - The predictions are accumulated in a confusion matrix, weighted by - `sample_weight` and the metric is then calculated from it. - - If `sample_weight` is `None`, weights default to 1. - Use `sample_weight` of 0 to mask values. - - Example: - - >>> # cm = [[1, 1], - >>> # [1, 1]] - >>> # sum_row = [2, 2], sum_col = [2, 2], true_positives = [1, 1] - >>> # iou = true_positives / (sum_row + sum_col - true_positives)) - >>> # result = [(1 / (2 + 2 - 1), 1 / (2 + 2 - 1)] = 0.33 - >>> m = tf.keras.metrics.MeanIoU(num_classes=2) - >>> m.update_state([0, 0, 1, 1], [0, 1, 0, 1]) - >>> m.result().numpy() - [0.33333334, 0.33333334] - - """ - - def __init__(self, num_classes, name=None, dtype=None): - """Initializes `PerClassIoU`. - - Args: - num_classes: The possible number of labels the prediction task can have. - This value must be provided, since a confusion matrix of dimension = - [num_classes, num_classes] will be allocated. - name: (Optional) string name of the metric instance. - dtype: (Optional) data type of the metric result. - - """ - - super(PerClassIoU, self).__init__(name=name, dtype=dtype) - self.num_classes = num_classes - - # Variable to accumulate the predictions in the confusion matrix. - self.total_cm = self.add_weight( - 'total_confusion_matrix', - shape=(num_classes, num_classes), - initializer=tf.compat.v1.zeros_initializer) - - def update_state(self, y_true, y_pred, sample_weight=None): - """Accumulates the confusion matrix statistics. - - Args: - y_true: The ground truth values. 
- y_pred: The predicted values. - sample_weight: Optional weighting of each example. Defaults to 1. Can be a - `Tensor` whose rank is either 0, or the same rank as `y_true`, and must - be broadcastable to `y_true`. - - Returns: - IOU per class. - """ - - y_true = tf.cast(y_true, self._dtype) - y_pred = tf.cast(y_pred, self._dtype) - - # Flatten the input if its rank > 1. - if y_pred.shape.ndims > 1: - y_pred = tf.reshape(y_pred, [-1]) - - if y_true.shape.ndims > 1: - y_true = tf.reshape(y_true, [-1]) - - if sample_weight is not None: - sample_weight = tf.cast(sample_weight, self._dtype) - if sample_weight.shape.ndims > 1: - sample_weight = tf.reshape(sample_weight, [-1]) - - # Accumulate the prediction to current confusion matrix. - current_cm = tf.math.confusion_matrix( - y_true, - y_pred, - self.num_classes, - weights=sample_weight, - dtype=self._dtype) - return self.total_cm.assign_add(current_cm) - - def result(self): - """Compute the mean intersection-over-union via the confusion matrix.""" - sum_over_row = tf.cast( - tf.reduce_sum(self.total_cm, axis=0), dtype=self._dtype) - sum_over_col = tf.cast( - tf.reduce_sum(self.total_cm, axis=1), dtype=self._dtype) - true_positives = tf.cast( - tf.linalg.tensor_diag_part(self.total_cm), dtype=self._dtype) - - # sum_over_row + sum_over_col = - # 2 * true_positives + false_positives + false_negatives. 
- denominator = sum_over_row + sum_over_col - true_positives - - return tf.math.divide_no_nan(true_positives, denominator) - - def reset_states(self): - tf.keras.backend.set_value( - self.total_cm, np.zeros((self.num_classes, self.num_classes))) - - def get_config(self): - config = {'num_classes': self.num_classes} - base_config = super(PerClassIoU, self).get_config() - return dict(list(base_config.items()) + list(config.items())) diff --git a/official/vision/beta/evaluation/segmentation_metrics.py b/official/vision/beta/evaluation/segmentation_metrics.py deleted file mode 100644 index ae1131dd227009686ac52ccbdfb66c8051ba2da9..0000000000000000000000000000000000000000 --- a/official/vision/beta/evaluation/segmentation_metrics.py +++ /dev/null @@ -1,227 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Metrics for segmentation.""" -import tensorflow as tf - -from official.vision.beta.evaluation import iou - - -class MeanIoU(tf.keras.metrics.MeanIoU): - """Mean IoU metric for semantic segmentation. - - This class utilizes tf.keras.metrics.MeanIoU to perform batched mean iou when - both input images and groundtruth masks are resized to the same size - (rescale_predictions=False). It also computes mean iou on groundtruth original - sizes, in which case, each prediction is rescaled back to the original image - size. 
- """ - - def __init__( - self, num_classes, rescale_predictions=False, name=None, dtype=None): - """Constructs Segmentation evaluator class. - - Args: - num_classes: `int`, number of classes. - rescale_predictions: `bool`, whether to scale back prediction to original - image sizes. If True, y_true['image_info'] is used to rescale - predictions. - name: `str`, name of the metric instance.. - dtype: data type of the metric result. - """ - self._rescale_predictions = rescale_predictions - super().__init__(num_classes=num_classes, name=name, dtype=dtype) - - def update_state(self, y_true, y_pred): - """Updates metric state. - - Args: - y_true: `dict`, dictionary with the following name, and key values. - - masks: [batch, width, height, 1], groundtruth masks. - - valid_masks: [batch, width, height, 1], valid elements in the mask. - - image_info: [batch, 4, 2], a tensor that holds information about - original and preprocessed images. Each entry is in the format of - [[original_height, original_width], [input_height, input_width], - [y_scale, x_scale], [y_offset, x_offset]], where [desired_height, - desired_width] is the actual scaled image size, and [y_scale, x_scale] - is the scaling factor, which is the ratio of scaled dimension / - original dimension. - y_pred: Tensor [batch, width_p, height_p, num_classes], predicated masks. - """ - predictions = y_pred - masks = y_true['masks'] - valid_masks = y_true['valid_masks'] - images_info = y_true['image_info'] - - if isinstance(predictions, tuple) or isinstance(predictions, list): - predictions = tf.concat(predictions, axis=0) - masks = tf.concat(masks, axis=0) - valid_masks = tf.concat(valid_masks, axis=0) - images_info = tf.concat(images_info, axis=0) - - # Ignore mask elements is set to zero for argmax op. - masks = tf.where(valid_masks, masks, tf.zeros_like(masks)) - - if self._rescale_predictions: - # This part can only run on cpu/gpu due to dynamic image resizing. 
- for i in range(tf.shape(predictions)[0]): - mask = masks[i] - valid_mask = valid_masks[i] - predicted_mask = predictions[i] - image_info = images_info[i] - - rescale_size = tf.cast( - tf.math.ceil(image_info[1, :] / image_info[2, :]), tf.int32) - image_shape = tf.cast(image_info[0, :], tf.int32) - offsets = tf.cast(image_info[3, :], tf.int32) - - predicted_mask = tf.image.resize( - predicted_mask, - rescale_size, - method=tf.image.ResizeMethod.BILINEAR) - - predicted_mask = tf.image.crop_to_bounding_box(predicted_mask, - offsets[0], offsets[1], - image_shape[0], - image_shape[1]) - mask = tf.image.crop_to_bounding_box(mask, 0, 0, image_shape[0], - image_shape[1]) - valid_mask = tf.image.crop_to_bounding_box(valid_mask, 0, 0, - image_shape[0], - image_shape[1]) - - predicted_mask = tf.argmax(predicted_mask, axis=2) - flatten_predictions = tf.reshape(predicted_mask, shape=[1, -1]) - flatten_masks = tf.reshape(mask, shape=[1, -1]) - flatten_valid_masks = tf.reshape(valid_mask, shape=[1, -1]) - super(MeanIoU, self).update_state( - flatten_masks, flatten_predictions, - tf.cast(flatten_valid_masks, tf.float32)) - - else: - predictions = tf.image.resize( - predictions, - tf.shape(masks)[1:3], - method=tf.image.ResizeMethod.BILINEAR) - predictions = tf.argmax(predictions, axis=3) - flatten_predictions = tf.reshape(predictions, shape=[-1]) - flatten_masks = tf.reshape(masks, shape=[-1]) - flatten_valid_masks = tf.reshape(valid_masks, shape=[-1]) - - super().update_state(flatten_masks, flatten_predictions, - tf.cast(flatten_valid_masks, tf.float32)) - - -class PerClassIoU(iou.PerClassIoU): - """Per Class IoU metric for semantic segmentation. - - This class utilizes iou.PerClassIoU to perform batched per class - iou when both input images and groundtruth masks are resized to the same size - (rescale_predictions=False). It also computes per class iou on groundtruth - original sizes, in which case, each prediction is rescaled back to the - original image size. 
- """ - - def __init__( - self, num_classes, rescale_predictions=False, name=None, dtype=None): - """Constructs Segmentation evaluator class. - - Args: - num_classes: `int`, number of classes. - rescale_predictions: `bool`, whether to scale back prediction to original - image sizes. If True, y_true['image_info'] is used to rescale - predictions. - name: `str`, name of the metric instance.. - dtype: data type of the metric result. - """ - self._rescale_predictions = rescale_predictions - super().__init__(num_classes=num_classes, name=name, dtype=dtype) - - def update_state(self, y_true, y_pred): - """Updates metric state. - - Args: - y_true: `dict`, dictionary with the following name, and key values. - - masks: [batch, width, height, 1], groundtruth masks. - - valid_masks: [batch, width, height, 1], valid elements in the mask. - - image_info: [batch, 4, 2], a tensor that holds information about - original and preprocessed images. Each entry is in the format of - [[original_height, original_width], [input_height, input_width], - [y_scale, x_scale], [y_offset, x_offset]], where [desired_height, - desired_width] is the actual scaled image size, and [y_scale, x_scale] - is the scaling factor, which is the ratio of scaled dimension / - original dimension. - y_pred: Tensor [batch, width_p, height_p, num_classes], predicated masks. - """ - predictions = y_pred - masks = y_true['masks'] - valid_masks = y_true['valid_masks'] - images_info = y_true['image_info'] - - if isinstance(predictions, tuple) or isinstance(predictions, list): - predictions = tf.concat(predictions, axis=0) - masks = tf.concat(masks, axis=0) - valid_masks = tf.concat(valid_masks, axis=0) - images_info = tf.concat(images_info, axis=0) - - # Ignore mask elements is set to zero for argmax op. - masks = tf.where(valid_masks, masks, tf.zeros_like(masks)) - - if self._rescale_predictions: - # This part can only run on cpu/gpu due to dynamic image resizing. 
- for i in range(tf.shape(predictions)[0]): - mask = masks[i] - valid_mask = valid_masks[i] - predicted_mask = predictions[i] - image_info = images_info[i] - - rescale_size = tf.cast( - tf.math.ceil(image_info[1, :] / image_info[2, :]), tf.int32) - image_shape = tf.cast(image_info[0, :], tf.int32) - offsets = tf.cast(image_info[3, :], tf.int32) - - predicted_mask = tf.image.resize( - predicted_mask, - rescale_size, - method=tf.image.ResizeMethod.BILINEAR) - - predicted_mask = tf.image.crop_to_bounding_box(predicted_mask, - offsets[0], offsets[1], - image_shape[0], - image_shape[1]) - mask = tf.image.crop_to_bounding_box(mask, 0, 0, image_shape[0], - image_shape[1]) - valid_mask = tf.image.crop_to_bounding_box(valid_mask, 0, 0, - image_shape[0], - image_shape[1]) - - predicted_mask = tf.argmax(predicted_mask, axis=2) - flatten_predictions = tf.reshape(predicted_mask, shape=[1, -1]) - flatten_masks = tf.reshape(mask, shape=[1, -1]) - flatten_valid_masks = tf.reshape(valid_mask, shape=[1, -1]) - super().update_state(flatten_masks, flatten_predictions, - tf.cast(flatten_valid_masks, tf.float32)) - - else: - predictions = tf.image.resize( - predictions, - tf.shape(masks)[1:3], - method=tf.image.ResizeMethod.BILINEAR) - predictions = tf.argmax(predictions, axis=3) - flatten_predictions = tf.reshape(predictions, shape=[-1]) - flatten_masks = tf.reshape(masks, shape=[-1]) - flatten_valid_masks = tf.reshape(valid_masks, shape=[-1]) - - super().update_state(flatten_masks, flatten_predictions, - tf.cast(flatten_valid_masks, tf.float32)) diff --git a/official/vision/beta/evaluation/segmentation_metrics_test.py b/official/vision/beta/evaluation/segmentation_metrics_test.py deleted file mode 100644 index 76f63e40a812d6441d6e62c75df088b9f5d6549b..0000000000000000000000000000000000000000 --- a/official/vision/beta/evaluation/segmentation_metrics_test.py +++ /dev/null @@ -1,77 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Tests for segmentation_metrics.""" - -from absl.testing import parameterized -import numpy as np -import tensorflow as tf - -from official.vision.beta.evaluation import segmentation_metrics - - -class SegmentationMetricsTest(parameterized.TestCase, tf.test.TestCase): - - def _create_test_data(self): - y_pred_cls0 = np.expand_dims( - np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]], dtype=np.uint16), - axis=(0, -1)) - y_pred_cls1 = np.expand_dims( - np.array([[0, 0, 0], [0, 0, 1], [0, 0, 1]], dtype=np.uint16), - axis=(0, -1)) - y_pred = np.concatenate((y_pred_cls0, y_pred_cls1), axis=-1) - - y_true = { - 'masks': - np.expand_dims( - np.array([[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], - [0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1], - [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1]], - dtype=np.uint16), - axis=(0, -1)), - 'valid_masks': - np.ones([1, 6, 6, 1], dtype=np.uint16), - 'image_info': - np.array([[[6, 6], [3, 3], [0.5, 0.5], [0, 0]]], dtype=np.float32) - } - return y_pred, y_true - - @parameterized.parameters(True, False) - def test_mean_iou_metric(self, rescale_predictions): - tf.config.experimental_run_functions_eagerly(True) - mean_iou_metric = segmentation_metrics.MeanIoU( - num_classes=2, rescale_predictions=rescale_predictions) - y_pred, y_true = self._create_test_data() - # Disable autograph for correct coverage statistics. 
- update_fn = tf.autograph.experimental.do_not_convert( - mean_iou_metric.update_state) - update_fn(y_true=y_true, y_pred=y_pred) - miou = mean_iou_metric.result() - self.assertAlmostEqual(miou.numpy(), 0.762, places=3) - - @parameterized.parameters(True, False) - def test_per_class_mean_iou_metric(self, rescale_predictions): - per_class_iou_metric = segmentation_metrics.PerClassIoU( - num_classes=2, rescale_predictions=rescale_predictions) - y_pred, y_true = self._create_test_data() - # Disable autograph for correct coverage statistics. - update_fn = tf.autograph.experimental.do_not_convert( - per_class_iou_metric.update_state) - update_fn(y_true=y_true, y_pred=y_pred) - per_class_miou = per_class_iou_metric.result() - self.assertAllClose(per_class_miou.numpy(), [0.857, 0.667], atol=1e-3) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/losses/__init__.py b/official/vision/beta/losses/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/losses/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- diff --git a/official/vision/beta/losses/loss_utils.py b/official/vision/beta/losses/loss_utils.py deleted file mode 100644 index 70bc1ce5cad1d26de41a41b4d58750fb6c9c2928..0000000000000000000000000000000000000000 --- a/official/vision/beta/losses/loss_utils.py +++ /dev/null @@ -1,42 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Losses utilities for detection models.""" - -import tensorflow as tf - - -def multi_level_flatten(multi_level_inputs, last_dim=None): - """Flattens a multi-level input. - - Args: - multi_level_inputs: Ordered Dict with level to [batch, d1, ..., dm]. - last_dim: Whether the output should be [batch_size, None], or [batch_size, - None, last_dim]. Defaults to `None`. 
- - Returns: - Concatenated output [batch_size, None], or [batch_size, None, dm] - """ - flattened_inputs = [] - batch_size = None - for level in multi_level_inputs.keys(): - single_input = multi_level_inputs[level] - if batch_size is None: - batch_size = single_input.shape[0] or tf.shape(single_input)[0] - if last_dim is not None: - flattened_input = tf.reshape(single_input, [batch_size, -1, last_dim]) - else: - flattened_input = tf.reshape(single_input, [batch_size, -1]) - flattened_inputs.append(flattened_input) - return tf.concat(flattened_inputs, axis=1) diff --git a/official/vision/beta/losses/segmentation_losses.py b/official/vision/beta/losses/segmentation_losses.py deleted file mode 100644 index 215fa183e839da1a319a2afbc378bd3a325c653c..0000000000000000000000000000000000000000 --- a/official/vision/beta/losses/segmentation_losses.py +++ /dev/null @@ -1,134 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Losses used for segmentation models.""" - -# Import libraries -import tensorflow as tf - -from official.modeling import tf_utils - -EPSILON = 1e-5 - - -class SegmentationLoss: - """Semantic segmentation loss.""" - - def __init__(self, label_smoothing, class_weights, ignore_label, - use_groundtruth_dimension, top_k_percent_pixels=1.0): - self._top_k_percent_pixels = top_k_percent_pixels - self._class_weights = class_weights - self._ignore_label = ignore_label - self._use_groundtruth_dimension = use_groundtruth_dimension - self._label_smoothing = label_smoothing - - def __call__(self, logits, labels): - _, height, width, num_classes = logits.get_shape().as_list() - - if self._use_groundtruth_dimension: - # TODO(arashwan): Test using align corners to match deeplab alignment. - logits = tf.image.resize( - logits, tf.shape(labels)[1:3], - method=tf.image.ResizeMethod.BILINEAR) - else: - labels = tf.image.resize( - labels, (height, width), - method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) - - valid_mask = tf.not_equal(labels, self._ignore_label) - normalizer = tf.reduce_sum(tf.cast(valid_mask, tf.float32)) + EPSILON - # Assign pixel with ignore label to class 0 (background). The loss on the - # pixel will later be masked out. 
- labels = tf.where(valid_mask, labels, tf.zeros_like(labels)) - - labels = tf.squeeze(tf.cast(labels, tf.int32), axis=3) - valid_mask = tf.squeeze(tf.cast(valid_mask, tf.float32), axis=3) - onehot_labels = tf.one_hot(labels, num_classes) - onehot_labels = onehot_labels * ( - 1 - self._label_smoothing) + self._label_smoothing / num_classes - cross_entropy_loss = tf.nn.softmax_cross_entropy_with_logits( - labels=onehot_labels, logits=logits) - - if not self._class_weights: - class_weights = [1] * num_classes - else: - class_weights = self._class_weights - - if num_classes != len(class_weights): - raise ValueError( - 'Length of class_weights should be {}'.format(num_classes)) - - weight_mask = tf.einsum('...y,y->...', - tf.one_hot(labels, num_classes, dtype=tf.float32), - tf.constant(class_weights, tf.float32)) - valid_mask *= weight_mask - cross_entropy_loss *= tf.cast(valid_mask, tf.float32) - - if self._top_k_percent_pixels >= 1.0: - loss = tf.reduce_sum(cross_entropy_loss) / normalizer - else: - cross_entropy_loss = tf.reshape(cross_entropy_loss, shape=[-1]) - top_k_pixels = tf.cast( - self._top_k_percent_pixels * - tf.cast(tf.size(cross_entropy_loss), tf.float32), tf.int32) - top_k_losses, _ = tf.math.top_k( - cross_entropy_loss, k=top_k_pixels, sorted=True) - normalizer = tf.reduce_sum( - tf.cast(tf.not_equal(top_k_losses, 0.0), tf.float32)) + EPSILON - loss = tf.reduce_sum(top_k_losses) / normalizer - - return loss - - -def get_actual_mask_scores(logits, labels, ignore_label): - """Gets actual mask scores.""" - _, height, width, num_classes = logits.get_shape().as_list() - batch_size = tf.shape(logits)[0] - logits = tf.stop_gradient(logits) - labels = tf.image.resize( - labels, (height, width), - method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) - predicted_labels = tf.argmax(logits, -1, output_type=tf.int32) - flat_predictions = tf.reshape(predicted_labels, [batch_size, -1]) - flat_labels = tf.cast(tf.reshape(labels, [batch_size, -1]), tf.int32) - - 
one_hot_predictions = tf.one_hot( - flat_predictions, num_classes, on_value=True, off_value=False) - one_hot_labels = tf.one_hot( - flat_labels, num_classes, on_value=True, off_value=False) - keep_mask = tf.not_equal(flat_labels, ignore_label) - keep_mask = tf.expand_dims(keep_mask, 2) - - overlap = tf.logical_and(one_hot_predictions, one_hot_labels) - overlap = tf.logical_and(overlap, keep_mask) - overlap = tf.reduce_sum(tf.cast(overlap, tf.float32), axis=1) - union = tf.logical_or(one_hot_predictions, one_hot_labels) - union = tf.logical_and(union, keep_mask) - union = tf.reduce_sum(tf.cast(union, tf.float32), axis=1) - actual_scores = tf.divide(overlap, tf.maximum(union, EPSILON)) - return actual_scores - - -class MaskScoringLoss: - """Mask Scoring loss.""" - - def __init__(self, ignore_label): - self._ignore_label = ignore_label - self._mse_loss = tf.keras.losses.MeanSquaredError( - reduction=tf.keras.losses.Reduction.NONE) - - def __call__(self, predicted_scores, logits, labels): - actual_scores = get_actual_mask_scores(logits, labels, self._ignore_label) - loss = tf_utils.safe_mean(self._mse_loss(actual_scores, predicted_scores)) - return loss diff --git a/official/vision/beta/modeling/__init__.py b/official/vision/beta/modeling/__init__.py deleted file mode 100644 index 3215829950349ce4201620a687e27ca57a61e437..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/__init__.py +++ /dev/null @@ -1,21 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Modeling package definition.""" - -from official.vision.beta.modeling import backbones -from official.vision.beta.modeling import decoders -from official.vision.beta.modeling import heads -from official.vision.beta.modeling import layers diff --git a/official/vision/beta/modeling/backbones/__init__.py b/official/vision/beta/modeling/backbones/__init__.py deleted file mode 100644 index 26c02f62d88f1723b53ea143126c4ec4076f498a..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/backbones/__init__.py +++ /dev/null @@ -1,26 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Backbones package definition.""" - -from official.vision.beta.modeling.backbones.efficientnet import EfficientNet -from official.vision.beta.modeling.backbones.mobiledet import MobileDet -from official.vision.beta.modeling.backbones.mobilenet import MobileNet -from official.vision.beta.modeling.backbones.resnet import ResNet -from official.vision.beta.modeling.backbones.resnet_3d import ResNet3D -from official.vision.beta.modeling.backbones.resnet_deeplab import DilatedResNet -from official.vision.beta.modeling.backbones.revnet import RevNet -from official.vision.beta.modeling.backbones.spinenet import SpineNet -from official.vision.beta.modeling.backbones.spinenet_mobile import SpineNetMobile diff --git a/official/vision/beta/modeling/backbones/factory.py b/official/vision/beta/modeling/backbones/factory.py deleted file mode 100644 index 324301266e6a0bb01fd9ca375bfaba47be5b8c09..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/backbones/factory.py +++ /dev/null @@ -1,113 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Backbone registers and factory method. - -One can regitered a new backbone model by the following two steps: - -1 Import the factory and register the build in the backbone file. -2 Import the backbone class and add a build in __init__.py. 
- -``` -# my_backbone.py - -from modeling.backbones import factory - -class MyBackbone(): - ... - -@factory.register_backbone_builder('my_backbone') -def build_my_backbone(): - return MyBackbone() - -# backbones/__init__.py adds import -from modeling.backbones.my_backbone import MyBackbone -``` - -If one wants the MyBackbone class to be used only by those binary -then don't imported the backbone module in backbones/__init__.py, but import it -in place that uses it. - - -""" -from typing import Sequence, Union - -# Import libraries - -import tensorflow as tf - -from official.core import registry -from official.modeling import hyperparams - - -_REGISTERED_BACKBONE_CLS = {} - - -def register_backbone_builder(key: str): - """Decorates a builder of backbone class. - - The builder should be a Callable (a class or a function). - This decorator supports registration of backbone builder as follows: - - ``` - class MyBackbone(tf.keras.Model): - pass - - @register_backbone_builder('mybackbone') - def builder(input_specs, config, l2_reg): - return MyBackbone(...) - - # Builds a MyBackbone object. - my_backbone = build_backbone_3d(input_specs, config, l2_reg) - ``` - - Args: - key: A `str` of key to look up the builder. - - Returns: - A callable for using as class decorator that registers the decorated class - for creation from an instance of task_config_cls. - """ - return registry.register(_REGISTERED_BACKBONE_CLS, key) - - -def build_backbone(input_specs: Union[tf.keras.layers.InputSpec, - Sequence[tf.keras.layers.InputSpec]], - backbone_config: hyperparams.Config, - norm_activation_config: hyperparams.Config, - l2_regularizer: tf.keras.regularizers.Regularizer = None, - **kwargs) -> tf.keras.Model: # pytype: disable=annotation-type-mismatch # typed-keras - """Builds backbone from a config. - - Args: - input_specs: A (sequence of) `tf.keras.layers.InputSpec` of input. - backbone_config: A `OneOfConfig` of backbone config. 
- norm_activation_config: A config for normalization/activation layer. - l2_regularizer: A `tf.keras.regularizers.Regularizer` object. Default to - None. - **kwargs: Additional keyword args to be passed to backbone builder. - - Returns: - A `tf.keras.Model` instance of the backbone. - """ - backbone_builder = registry.lookup(_REGISTERED_BACKBONE_CLS, - backbone_config.type) - - return backbone_builder( - input_specs=input_specs, - backbone_config=backbone_config, - norm_activation_config=norm_activation_config, - l2_regularizer=l2_regularizer, - **kwargs) diff --git a/official/vision/beta/modeling/backbones/factory_test.py b/official/vision/beta/modeling/backbones/factory_test.py deleted file mode 100644 index 81a7455d37c31a3c9a3f42c1381ecfad2bc12a9d..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/backbones/factory_test.py +++ /dev/null @@ -1,228 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Tests for factory functions.""" -# Import libraries -from absl.testing import parameterized -import tensorflow as tf - -from tensorflow.python.distribute import combinations -from official.vision.beta.configs import backbones as backbones_cfg -from official.vision.beta.configs import backbones_3d as backbones_3d_cfg -from official.vision.beta.configs import common as common_cfg -from official.vision.beta.modeling import backbones -from official.vision.beta.modeling.backbones import factory - - -class FactoryTest(tf.test.TestCase, parameterized.TestCase): - - @combinations.generate( - combinations.combine(model_id=[18, 34, 50, 101, 152],)) - def test_resnet_creation(self, model_id): - """Test creation of ResNet models.""" - - network = backbones.ResNet( - model_id=model_id, se_ratio=0.0, norm_momentum=0.99, norm_epsilon=1e-5) - - backbone_config = backbones_cfg.Backbone( - type='resnet', - resnet=backbones_cfg.ResNet(model_id=model_id, se_ratio=0.0)) - norm_activation_config = common_cfg.NormActivation( - norm_momentum=0.99, norm_epsilon=1e-5, use_sync_bn=False) - - factory_network = factory.build_backbone( - input_specs=tf.keras.layers.InputSpec(shape=[None, None, None, 3]), - backbone_config=backbone_config, - norm_activation_config=norm_activation_config) - - network_config = network.get_config() - factory_network_config = factory_network.get_config() - - self.assertEqual(network_config, factory_network_config) - - @combinations.generate( - combinations.combine( - model_id=['b0', 'b1', 'b2', 'b3', 'b4', 'b5', 'b6', 'b7'], - se_ratio=[0.0, 0.25], - )) - def test_efficientnet_creation(self, model_id, se_ratio): - """Test creation of EfficientNet models.""" - - network = backbones.EfficientNet( - model_id=model_id, - se_ratio=se_ratio, - norm_momentum=0.99, - norm_epsilon=1e-5) - - backbone_config = backbones_cfg.Backbone( - type='efficientnet', - efficientnet=backbones_cfg.EfficientNet( - model_id=model_id, se_ratio=se_ratio)) - 
norm_activation_config = common_cfg.NormActivation( - norm_momentum=0.99, norm_epsilon=1e-5, use_sync_bn=False) - - factory_network = factory.build_backbone( - input_specs=tf.keras.layers.InputSpec(shape=[None, None, None, 3]), - backbone_config=backbone_config, - norm_activation_config=norm_activation_config) - - network_config = network.get_config() - factory_network_config = factory_network.get_config() - - self.assertEqual(network_config, factory_network_config) - - @combinations.generate( - combinations.combine( - model_id=['MobileNetV1', 'MobileNetV2', - 'MobileNetV3Large', 'MobileNetV3Small', - 'MobileNetV3EdgeTPU'], - filter_size_scale=[1.0, 0.75], - )) - def test_mobilenet_creation(self, model_id, filter_size_scale): - """Test creation of Mobilenet models.""" - - network = backbones.MobileNet( - model_id=model_id, - filter_size_scale=filter_size_scale, - norm_momentum=0.99, - norm_epsilon=1e-5) - - backbone_config = backbones_cfg.Backbone( - type='mobilenet', - mobilenet=backbones_cfg.MobileNet( - model_id=model_id, filter_size_scale=filter_size_scale)) - norm_activation_config = common_cfg.NormActivation( - norm_momentum=0.99, norm_epsilon=1e-5, use_sync_bn=False) - - factory_network = factory.build_backbone( - input_specs=tf.keras.layers.InputSpec(shape=[None, None, None, 3]), - backbone_config=backbone_config, - norm_activation_config=norm_activation_config) - - network_config = network.get_config() - factory_network_config = factory_network.get_config() - - self.assertEqual(network_config, factory_network_config) - - @combinations.generate(combinations.combine(model_id=['49'],)) - def test_spinenet_creation(self, model_id): - """Test creation of SpineNet models.""" - input_size = 128 - min_level = 3 - max_level = 7 - - input_specs = tf.keras.layers.InputSpec( - shape=[None, input_size, input_size, 3]) - network = backbones.SpineNet( - input_specs=input_specs, - min_level=min_level, - max_level=max_level, - norm_momentum=0.99, - norm_epsilon=1e-5) - - 
backbone_config = backbones_cfg.Backbone( - type='spinenet', - spinenet=backbones_cfg.SpineNet(model_id=model_id)) - norm_activation_config = common_cfg.NormActivation( - norm_momentum=0.99, norm_epsilon=1e-5, use_sync_bn=False) - - factory_network = factory.build_backbone( - input_specs=tf.keras.layers.InputSpec( - shape=[None, input_size, input_size, 3]), - backbone_config=backbone_config, - norm_activation_config=norm_activation_config) - - network_config = network.get_config() - factory_network_config = factory_network.get_config() - - self.assertEqual(network_config, factory_network_config) - - @combinations.generate( - combinations.combine(model_id=[38, 56, 104],)) - def test_revnet_creation(self, model_id): - """Test creation of RevNet models.""" - network = backbones.RevNet( - model_id=model_id, norm_momentum=0.99, norm_epsilon=1e-5) - - backbone_config = backbones_cfg.Backbone( - type='revnet', - revnet=backbones_cfg.RevNet(model_id=model_id)) - norm_activation_config = common_cfg.NormActivation( - norm_momentum=0.99, norm_epsilon=1e-5, use_sync_bn=False) - - factory_network = factory.build_backbone( - input_specs=tf.keras.layers.InputSpec(shape=[None, None, None, 3]), - backbone_config=backbone_config, - norm_activation_config=norm_activation_config) - - network_config = network.get_config() - factory_network_config = factory_network.get_config() - - self.assertEqual(network_config, factory_network_config) - - @combinations.generate(combinations.combine(model_type=['resnet_3d'],)) - def test_resnet_3d_creation(self, model_type): - """Test creation of ResNet 3D models.""" - backbone_cfg = backbones_3d_cfg.Backbone3D(type=model_type).get() - temporal_strides = [] - temporal_kernel_sizes = [] - for block_spec in backbone_cfg.block_specs: - temporal_strides.append(block_spec.temporal_strides) - temporal_kernel_sizes.append(block_spec.temporal_kernel_sizes) - - _ = backbones.ResNet3D( - model_id=backbone_cfg.model_id, - temporal_strides=temporal_strides, - 
temporal_kernel_sizes=temporal_kernel_sizes, - norm_momentum=0.99, - norm_epsilon=1e-5) - - @combinations.generate( - combinations.combine( - model_id=[ - 'MobileDetCPU', - 'MobileDetDSP', - 'MobileDetEdgeTPU', - 'MobileDetGPU'], - filter_size_scale=[1.0, 0.75], - )) - def test_mobiledet_creation(self, model_id, filter_size_scale): - """Test creation of Mobiledet models.""" - - network = backbones.MobileDet( - model_id=model_id, - filter_size_scale=filter_size_scale, - norm_momentum=0.99, - norm_epsilon=1e-5) - - backbone_config = backbones_cfg.Backbone( - type='mobiledet', - mobiledet=backbones_cfg.MobileDet( - model_id=model_id, filter_size_scale=filter_size_scale)) - norm_activation_config = common_cfg.NormActivation( - norm_momentum=0.99, norm_epsilon=1e-5, use_sync_bn=False) - - factory_network = factory.build_backbone( - input_specs=tf.keras.layers.InputSpec(shape=[None, None, None, 3]), - backbone_config=backbone_config, - norm_activation_config=norm_activation_config) - - network_config = network.get_config() - factory_network_config = factory_network.get_config() - - self.assertEqual(network_config, factory_network_config) - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/modeling/backbones/resnet.py b/official/vision/beta/modeling/backbones/resnet.py deleted file mode 100644 index 4c77ec8d9f108634bc96c9f351eff8554a4b5558..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/backbones/resnet.py +++ /dev/null @@ -1,432 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Contains definitions of ResNet and ResNet-RS models.""" - -from typing import Callable, Optional - -# Import libraries -import tensorflow as tf - -from official.modeling import hyperparams -from official.modeling import tf_utils -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.modeling.layers import nn_blocks -from official.vision.beta.modeling.layers import nn_layers - -layers = tf.keras.layers - -# Specifications for different ResNet variants. -# Each entry specifies block configurations of the particular ResNet variant. -# Each element in the block configuration is in the following format: -# (block_fn, num_filters, block_repeats) -RESNET_SPECS = { - 10: [ - ('residual', 64, 1), - ('residual', 128, 1), - ('residual', 256, 1), - ('residual', 512, 1), - ], - 18: [ - ('residual', 64, 2), - ('residual', 128, 2), - ('residual', 256, 2), - ('residual', 512, 2), - ], - 34: [ - ('residual', 64, 3), - ('residual', 128, 4), - ('residual', 256, 6), - ('residual', 512, 3), - ], - 50: [ - ('bottleneck', 64, 3), - ('bottleneck', 128, 4), - ('bottleneck', 256, 6), - ('bottleneck', 512, 3), - ], - 101: [ - ('bottleneck', 64, 3), - ('bottleneck', 128, 4), - ('bottleneck', 256, 23), - ('bottleneck', 512, 3), - ], - 152: [ - ('bottleneck', 64, 3), - ('bottleneck', 128, 8), - ('bottleneck', 256, 36), - ('bottleneck', 512, 3), - ], - 200: [ - ('bottleneck', 64, 3), - ('bottleneck', 128, 24), - ('bottleneck', 256, 36), - ('bottleneck', 512, 3), - ], - 270: [ - ('bottleneck', 64, 4), - ('bottleneck', 128, 29), - ('bottleneck', 256, 
53), - ('bottleneck', 512, 4), - ], - 350: [ - ('bottleneck', 64, 4), - ('bottleneck', 128, 36), - ('bottleneck', 256, 72), - ('bottleneck', 512, 4), - ], - 420: [ - ('bottleneck', 64, 4), - ('bottleneck', 128, 44), - ('bottleneck', 256, 87), - ('bottleneck', 512, 4), - ], -} - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class ResNet(tf.keras.Model): - """Creates ResNet and ResNet-RS family models. - - This implements the Deep Residual Network from: - Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. - Deep Residual Learning for Image Recognition. - (https://arxiv.org/pdf/1512.03385) and - Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, - Tsung-Yi Lin, Jonathon Shlens, Barret Zoph. - Revisiting ResNets: Improved Training and Scaling Strategies. - (https://arxiv.org/abs/2103.07579). - """ - - def __init__( - self, - model_id: int, - input_specs: tf.keras.layers.InputSpec = layers.InputSpec( - shape=[None, None, None, 3]), - depth_multiplier: float = 1.0, - stem_type: str = 'v0', - resnetd_shortcut: bool = False, - replace_stem_max_pool: bool = False, - se_ratio: Optional[float] = None, - init_stochastic_depth_rate: float = 0.0, - scale_stem: bool = True, - activation: str = 'relu', - use_sync_bn: bool = False, - norm_momentum: float = 0.99, - norm_epsilon: float = 0.001, - kernel_initializer: str = 'VarianceScaling', - kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - bn_trainable: bool = True, - **kwargs): - """Initializes a ResNet model. - - Args: - model_id: An `int` of the depth of ResNet backbone model. - input_specs: A `tf.keras.layers.InputSpec` of the input tensor. - depth_multiplier: A `float` of the depth multiplier to uniformaly scale up - all layers in channel size. This argument is also referred to as - `width_multiplier` in (https://arxiv.org/abs/2103.07579). - stem_type: A `str` of stem type of ResNet. 
Default to `v0`. If set to - `v1`, use ResNet-D type stem (https://arxiv.org/abs/1812.01187). - resnetd_shortcut: A `bool` of whether to use ResNet-D shortcut in - downsampling blocks. - replace_stem_max_pool: A `bool` of whether to replace the max pool in stem - with a stride-2 conv, - se_ratio: A `float` or None. Ratio of the Squeeze-and-Excitation layer. - init_stochastic_depth_rate: A `float` of initial stochastic depth rate. - scale_stem: A `bool` of whether to scale stem layers. - activation: A `str` name of the activation function. - use_sync_bn: If True, use synchronized batch normalization. - norm_momentum: A `float` of normalization momentum for the moving average. - norm_epsilon: A small `float` added to variance to avoid dividing by zero. - kernel_initializer: A str for kernel initializer of convolutional layers. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for - Conv2D. Default to None. - bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. - Default to None. - bn_trainable: A `bool` that indicates whether batch norm layers should be - trainable. Default to True. - **kwargs: Additional keyword arguments to be passed. 
- """ - self._model_id = model_id - self._input_specs = input_specs - self._depth_multiplier = depth_multiplier - self._stem_type = stem_type - self._resnetd_shortcut = resnetd_shortcut - self._replace_stem_max_pool = replace_stem_max_pool - self._se_ratio = se_ratio - self._init_stochastic_depth_rate = init_stochastic_depth_rate - self._scale_stem = scale_stem - self._use_sync_bn = use_sync_bn - self._activation = activation - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - if use_sync_bn: - self._norm = layers.experimental.SyncBatchNormalization - else: - self._norm = layers.BatchNormalization - self._kernel_initializer = kernel_initializer - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - self._bn_trainable = bn_trainable - - if tf.keras.backend.image_data_format() == 'channels_last': - bn_axis = -1 - else: - bn_axis = 1 - - # Build ResNet. - inputs = tf.keras.Input(shape=input_specs.shape[1:]) - - stem_depth_multiplier = self._depth_multiplier if scale_stem else 1.0 - if stem_type == 'v0': - x = layers.Conv2D( - filters=int(64 * stem_depth_multiplier), - kernel_size=7, - strides=2, - use_bias=False, - padding='same', - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - inputs) - x = self._norm( - axis=bn_axis, - momentum=norm_momentum, - epsilon=norm_epsilon, - trainable=bn_trainable)( - x) - x = tf_utils.get_activation(activation, use_keras_layer=True)(x) - elif stem_type == 'v1': - x = layers.Conv2D( - filters=int(32 * stem_depth_multiplier), - kernel_size=3, - strides=2, - use_bias=False, - padding='same', - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - inputs) - x = self._norm( - axis=bn_axis, - momentum=norm_momentum, - epsilon=norm_epsilon, - trainable=bn_trainable)( - x) - x = 
tf_utils.get_activation(activation, use_keras_layer=True)(x) - x = layers.Conv2D( - filters=int(32 * stem_depth_multiplier), - kernel_size=3, - strides=1, - use_bias=False, - padding='same', - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - x) - x = self._norm( - axis=bn_axis, - momentum=norm_momentum, - epsilon=norm_epsilon, - trainable=bn_trainable)( - x) - x = tf_utils.get_activation(activation, use_keras_layer=True)(x) - x = layers.Conv2D( - filters=int(64 * stem_depth_multiplier), - kernel_size=3, - strides=1, - use_bias=False, - padding='same', - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - x) - x = self._norm( - axis=bn_axis, - momentum=norm_momentum, - epsilon=norm_epsilon, - trainable=bn_trainable)( - x) - x = tf_utils.get_activation(activation, use_keras_layer=True)(x) - else: - raise ValueError('Stem type {} not supported.'.format(stem_type)) - - if replace_stem_max_pool: - x = layers.Conv2D( - filters=int(64 * self._depth_multiplier), - kernel_size=3, - strides=2, - use_bias=False, - padding='same', - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - x) - x = self._norm( - axis=bn_axis, - momentum=norm_momentum, - epsilon=norm_epsilon, - trainable=bn_trainable)( - x) - x = tf_utils.get_activation(activation, use_keras_layer=True)(x) - else: - x = layers.MaxPool2D(pool_size=3, strides=2, padding='same')(x) - - endpoints = {} - for i, spec in enumerate(RESNET_SPECS[model_id]): - if spec[0] == 'residual': - block_fn = nn_blocks.ResidualBlock - elif spec[0] == 'bottleneck': - block_fn = nn_blocks.BottleneckBlock - else: - raise ValueError('Block fn `{}` is not supported.'.format(spec[0])) - x = self._block_group( - inputs=x, - filters=int(spec[1] * self._depth_multiplier), - strides=(1 if i 
== 0 else 2), - block_fn=block_fn, - block_repeats=spec[2], - stochastic_depth_drop_rate=nn_layers.get_stochastic_depth_rate( - self._init_stochastic_depth_rate, i + 2, 5), - name='block_group_l{}'.format(i + 2)) - endpoints[str(i + 2)] = x - - self._output_specs = {l: endpoints[l].get_shape() for l in endpoints} - - super(ResNet, self).__init__(inputs=inputs, outputs=endpoints, **kwargs) - - def _block_group(self, - inputs: tf.Tensor, - filters: int, - strides: int, - block_fn: Callable[..., tf.keras.layers.Layer], - block_repeats: int = 1, - stochastic_depth_drop_rate: float = 0.0, - name: str = 'block_group'): - """Creates one group of blocks for the ResNet model. - - Args: - inputs: A `tf.Tensor` of size `[batch, channels, height, width]`. - filters: An `int` number of filters for the first convolution of the - layer. - strides: An `int` stride to use for the first convolution of the layer. - If greater than 1, this layer will downsample the input. - block_fn: The type of block group. Either `nn_blocks.ResidualBlock` or - `nn_blocks.BottleneckBlock`. - block_repeats: An `int` number of blocks contained in the layer. - stochastic_depth_drop_rate: A `float` of drop rate of the current block - group. - name: A `str` name for the block. - - Returns: - The output `tf.Tensor` of the block layer. 
- """ - x = block_fn( - filters=filters, - strides=strides, - use_projection=True, - stochastic_depth_drop_rate=stochastic_depth_drop_rate, - se_ratio=self._se_ratio, - resnetd_shortcut=self._resnetd_shortcut, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer, - activation=self._activation, - use_sync_bn=self._use_sync_bn, - norm_momentum=self._norm_momentum, - norm_epsilon=self._norm_epsilon, - bn_trainable=self._bn_trainable)( - inputs) - - for _ in range(1, block_repeats): - x = block_fn( - filters=filters, - strides=1, - use_projection=False, - stochastic_depth_drop_rate=stochastic_depth_drop_rate, - se_ratio=self._se_ratio, - resnetd_shortcut=self._resnetd_shortcut, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer, - activation=self._activation, - use_sync_bn=self._use_sync_bn, - norm_momentum=self._norm_momentum, - norm_epsilon=self._norm_epsilon, - bn_trainable=self._bn_trainable)( - x) - - return tf.keras.layers.Activation('linear', name=name)(x) - - def get_config(self): - config_dict = { - 'model_id': self._model_id, - 'depth_multiplier': self._depth_multiplier, - 'stem_type': self._stem_type, - 'resnetd_shortcut': self._resnetd_shortcut, - 'replace_stem_max_pool': self._replace_stem_max_pool, - 'activation': self._activation, - 'se_ratio': self._se_ratio, - 'init_stochastic_depth_rate': self._init_stochastic_depth_rate, - 'scale_stem': self._scale_stem, - 'use_sync_bn': self._use_sync_bn, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'kernel_initializer': self._kernel_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'bias_regularizer': self._bias_regularizer, - 'bn_trainable': self._bn_trainable - } - return config_dict - - @classmethod - def from_config(cls, config, custom_objects=None): - return cls(**config) - - @property - def 
output_specs(self): - """A dict of {level: TensorShape} pairs for the model output.""" - return self._output_specs - - -@factory.register_backbone_builder('resnet') -def build_resnet( - input_specs: tf.keras.layers.InputSpec, - backbone_config: hyperparams.Config, - norm_activation_config: hyperparams.Config, - l2_regularizer: tf.keras.regularizers.Regularizer = None) -> tf.keras.Model: # pytype: disable=annotation-type-mismatch # typed-keras - """Builds ResNet backbone from a config.""" - backbone_type = backbone_config.type - backbone_cfg = backbone_config.get() - assert backbone_type == 'resnet', (f'Inconsistent backbone type ' - f'{backbone_type}') - - return ResNet( - model_id=backbone_cfg.model_id, - input_specs=input_specs, - depth_multiplier=backbone_cfg.depth_multiplier, - stem_type=backbone_cfg.stem_type, - resnetd_shortcut=backbone_cfg.resnetd_shortcut, - replace_stem_max_pool=backbone_cfg.replace_stem_max_pool, - se_ratio=backbone_cfg.se_ratio, - init_stochastic_depth_rate=backbone_cfg.stochastic_depth_drop_rate, - scale_stem=backbone_cfg.scale_stem, - activation=norm_activation_config.activation, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon, - kernel_regularizer=l2_regularizer, - bn_trainable=backbone_cfg.bn_trainable) diff --git a/official/vision/beta/modeling/backbones/spinenet.py b/official/vision/beta/modeling/backbones/spinenet.py deleted file mode 100644 index ac458dae778ab56cba1ad57da9027bd52af244bc..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/backbones/spinenet.py +++ /dev/null @@ -1,572 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Contains definitions of SpineNet Networks.""" - -import math -from typing import Any, List, Optional, Tuple - -# Import libraries - -from absl import logging -import tensorflow as tf - -from official.modeling import hyperparams -from official.modeling import tf_utils -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.modeling.layers import nn_blocks -from official.vision.beta.modeling.layers import nn_layers -from official.vision.beta.ops import spatial_transform_ops - -layers = tf.keras.layers - -FILTER_SIZE_MAP = { - 1: 32, - 2: 64, - 3: 128, - 4: 256, - 5: 256, - 6: 256, - 7: 256, -} - -# The fixed SpineNet architecture discovered by NAS. -# Each element represents a specification of a building block: -# (block_level, block_fn, (input_offset0, input_offset1), is_output). 
-SPINENET_BLOCK_SPECS = [ - (2, 'bottleneck', (0, 1), False), - (4, 'residual', (0, 1), False), - (3, 'bottleneck', (2, 3), False), - (4, 'bottleneck', (2, 4), False), - (6, 'residual', (3, 5), False), - (4, 'bottleneck', (3, 5), False), - (5, 'residual', (6, 7), False), - (7, 'residual', (6, 8), False), - (5, 'bottleneck', (8, 9), False), - (5, 'bottleneck', (8, 10), False), - (4, 'bottleneck', (5, 10), True), - (3, 'bottleneck', (4, 10), True), - (5, 'bottleneck', (7, 12), True), - (7, 'bottleneck', (5, 14), True), - (6, 'bottleneck', (12, 14), True), - (2, 'bottleneck', (2, 13), True), -] - -SCALING_MAP = { - '49S': { - 'endpoints_num_filters': 128, - 'filter_size_scale': 0.65, - 'resample_alpha': 0.5, - 'block_repeats': 1, - }, - '49': { - 'endpoints_num_filters': 256, - 'filter_size_scale': 1.0, - 'resample_alpha': 0.5, - 'block_repeats': 1, - }, - '96': { - 'endpoints_num_filters': 256, - 'filter_size_scale': 1.0, - 'resample_alpha': 0.5, - 'block_repeats': 2, - }, - '143': { - 'endpoints_num_filters': 256, - 'filter_size_scale': 1.0, - 'resample_alpha': 1.0, - 'block_repeats': 3, - }, - # SpineNet-143 with 1.3x filter_size_scale. 
- '143L': { - 'endpoints_num_filters': 256, - 'filter_size_scale': 1.3, - 'resample_alpha': 1.0, - 'block_repeats': 3, - }, - '190': { - 'endpoints_num_filters': 512, - 'filter_size_scale': 1.3, - 'resample_alpha': 1.0, - 'block_repeats': 4, - }, -} - - -class BlockSpec(object): - """A container class that specifies the block configuration for SpineNet.""" - - def __init__(self, level: int, block_fn: str, input_offsets: Tuple[int, int], - is_output: bool): - self.level = level - self.block_fn = block_fn - self.input_offsets = input_offsets - self.is_output = is_output - - -def build_block_specs( - block_specs: Optional[List[Tuple[Any, ...]]] = None) -> List[BlockSpec]: - """Builds the list of BlockSpec objects for SpineNet.""" - if not block_specs: - block_specs = SPINENET_BLOCK_SPECS - logging.info('Building SpineNet block specs: %s', block_specs) - return [BlockSpec(*b) for b in block_specs] - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class SpineNet(tf.keras.Model): - """Creates a SpineNet family model. - - This implements: - Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, - Yin Cui, Quoc V. Le, Xiaodan Song. - SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization. 
- (https://arxiv.org/abs/1912.05027) - """ - - def __init__( - self, - input_specs: tf.keras.layers.InputSpec = tf.keras.layers.InputSpec( - shape=[None, None, None, 3]), - min_level: int = 3, - max_level: int = 7, - block_specs: List[BlockSpec] = build_block_specs(), - endpoints_num_filters: int = 256, - resample_alpha: float = 0.5, - block_repeats: int = 1, - filter_size_scale: float = 1.0, - init_stochastic_depth_rate: float = 0.0, - kernel_initializer: str = 'VarianceScaling', - kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - activation: str = 'relu', - use_sync_bn: bool = False, - norm_momentum: float = 0.99, - norm_epsilon: float = 0.001, - **kwargs): - """Initializes a SpineNet model. - - Args: - input_specs: A `tf.keras.layers.InputSpec` of the input tensor. - min_level: An `int` of min level for output mutiscale features. - max_level: An `int` of max level for output mutiscale features. - block_specs: A list of block specifications for the SpineNet model - discovered by NAS. - endpoints_num_filters: An `int` of feature dimension for the output - endpoints. - resample_alpha: A `float` of resampling factor in cross-scale connections. - block_repeats: An `int` of number of blocks contained in the layer. - filter_size_scale: A `float` of multiplier for the filters (number of - channels) for all convolution ops. The value must be greater than zero. - Typical usage will be to set this value in (0, 1) to reduce the number - of parameters or computation cost of the model. - init_stochastic_depth_rate: A `float` of initial stochastic depth rate. - kernel_initializer: A str for kernel initializer of convolutional layers. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for - Conv2D. Default to None. - bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. - Default to None. - activation: A `str` name of the activation function. 
- use_sync_bn: If True, use synchronized batch normalization. - norm_momentum: A `float` of normalization momentum for the moving average. - norm_epsilon: A small `float` added to variance to avoid dividing by zero. - **kwargs: Additional keyword arguments to be passed. - """ - self._input_specs = input_specs - self._min_level = min_level - self._max_level = max_level - self._block_specs = block_specs - self._endpoints_num_filters = endpoints_num_filters - self._resample_alpha = resample_alpha - self._block_repeats = block_repeats - self._filter_size_scale = filter_size_scale - self._init_stochastic_depth_rate = init_stochastic_depth_rate - self._kernel_initializer = kernel_initializer - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - self._activation = activation - self._use_sync_bn = use_sync_bn - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - if activation == 'relu': - self._activation_fn = tf.nn.relu - elif activation == 'swish': - self._activation_fn = tf.nn.swish - else: - raise ValueError('Activation {} not implemented.'.format(activation)) - self._init_block_fn = 'bottleneck' - self._num_init_blocks = 2 - - if use_sync_bn: - self._norm = layers.experimental.SyncBatchNormalization - else: - self._norm = layers.BatchNormalization - - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - - # Build SpineNet. 
- inputs = tf.keras.Input(shape=input_specs.shape[1:]) - - net = self._build_stem(inputs=inputs) - input_width = input_specs.shape[2] - if input_width is None: - max_stride = max(map(lambda b: b.level, block_specs)) - input_width = 2 ** max_stride - net = self._build_scale_permuted_network(net=net, input_width=input_width) - endpoints = self._build_endpoints(net=net) - - self._output_specs = {l: endpoints[l].get_shape() for l in endpoints} - super(SpineNet, self).__init__(inputs=inputs, outputs=endpoints) - - def _block_group(self, - inputs: tf.Tensor, - filters: int, - strides: int, - block_fn_cand: str, - block_repeats: int = 1, - stochastic_depth_drop_rate: Optional[float] = None, - name: str = 'block_group'): - """Creates one group of blocks for the SpineNet model.""" - block_fn_candidates = { - 'bottleneck': nn_blocks.BottleneckBlock, - 'residual': nn_blocks.ResidualBlock, - } - block_fn = block_fn_candidates[block_fn_cand] - _, _, _, num_filters = inputs.get_shape().as_list() - - if block_fn_cand == 'bottleneck': - use_projection = not (num_filters == (filters * 4) and strides == 1) - else: - use_projection = not (num_filters == filters and strides == 1) - - x = block_fn( - filters=filters, - strides=strides, - use_projection=use_projection, - stochastic_depth_drop_rate=stochastic_depth_drop_rate, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer, - activation=self._activation, - use_sync_bn=self._use_sync_bn, - norm_momentum=self._norm_momentum, - norm_epsilon=self._norm_epsilon)( - inputs) - for _ in range(1, block_repeats): - x = block_fn( - filters=filters, - strides=1, - use_projection=False, - stochastic_depth_drop_rate=stochastic_depth_drop_rate, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer, - activation=self._activation, - use_sync_bn=self._use_sync_bn, - 
norm_momentum=self._norm_momentum, - norm_epsilon=self._norm_epsilon)( - x) - return tf.identity(x, name=name) - - def _build_stem(self, inputs): - """Builds SpineNet stem.""" - x = layers.Conv2D( - filters=64, - kernel_size=7, - strides=2, - use_bias=False, - padding='same', - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - inputs) - x = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon)( - x) - x = tf_utils.get_activation(self._activation_fn)(x) - x = layers.MaxPool2D(pool_size=3, strides=2, padding='same')(x) - - net = [] - # Build the initial level 2 blocks. - for i in range(self._num_init_blocks): - x = self._block_group( - inputs=x, - filters=int(FILTER_SIZE_MAP[2] * self._filter_size_scale), - strides=1, - block_fn_cand=self._init_block_fn, - block_repeats=self._block_repeats, - name='stem_block_{}'.format(i + 1)) - net.append(x) - return net - - def _build_scale_permuted_network(self, - net, - input_width, - weighted_fusion=False): - """Builds scale-permuted network.""" - net_sizes = [int(math.ceil(input_width / 2**2))] * len(net) - net_block_fns = [self._init_block_fn] * len(net) - num_outgoing_connections = [0] * len(net) - - endpoints = {} - for i, block_spec in enumerate(self._block_specs): - # Find out specs for the target block. - target_width = int(math.ceil(input_width / 2**block_spec.level)) - target_num_filters = int(FILTER_SIZE_MAP[block_spec.level] * - self._filter_size_scale) - target_block_fn = block_spec.block_fn - - # Resample then merge input0 and input1. 
- parents = [] - input0 = block_spec.input_offsets[0] - input1 = block_spec.input_offsets[1] - - x0 = self._resample_with_alpha( - inputs=net[input0], - input_width=net_sizes[input0], - input_block_fn=net_block_fns[input0], - target_width=target_width, - target_num_filters=target_num_filters, - target_block_fn=target_block_fn, - alpha=self._resample_alpha) - parents.append(x0) - num_outgoing_connections[input0] += 1 - - x1 = self._resample_with_alpha( - inputs=net[input1], - input_width=net_sizes[input1], - input_block_fn=net_block_fns[input1], - target_width=target_width, - target_num_filters=target_num_filters, - target_block_fn=target_block_fn, - alpha=self._resample_alpha) - parents.append(x1) - num_outgoing_connections[input1] += 1 - - # Merge 0 outdegree blocks to the output block. - if block_spec.is_output: - for j, (j_feat, - j_connections) in enumerate(zip(net, num_outgoing_connections)): - if j_connections == 0 and (j_feat.shape[2] == target_width and - j_feat.shape[3] == x0.shape[3]): - parents.append(j_feat) - num_outgoing_connections[j] += 1 - - # pylint: disable=g-direct-tensorflow-import - if weighted_fusion: - dtype = parents[0].dtype - parent_weights = [ - tf.nn.relu(tf.cast(tf.Variable(1.0, name='block{}_fusion{}'.format( - i, j)), dtype=dtype)) for j in range(len(parents))] - weights_sum = tf.add_n(parent_weights) - parents = [ - parents[i] * parent_weights[i] / (weights_sum + 0.0001) - for i in range(len(parents)) - ] - - # Fuse all parent nodes then build a new block. 
- x = tf_utils.get_activation(self._activation_fn)(tf.add_n(parents)) - x = self._block_group( - inputs=x, - filters=target_num_filters, - strides=1, - block_fn_cand=target_block_fn, - block_repeats=self._block_repeats, - stochastic_depth_drop_rate=nn_layers.get_stochastic_depth_rate( - self._init_stochastic_depth_rate, i + 1, len(self._block_specs)), - name='scale_permuted_block_{}'.format(i + 1)) - - net.append(x) - net_sizes.append(target_width) - net_block_fns.append(target_block_fn) - num_outgoing_connections.append(0) - - # Save output feats. - if block_spec.is_output: - if block_spec.level in endpoints: - raise ValueError('Duplicate feats found for output level {}.'.format( - block_spec.level)) - if (block_spec.level < self._min_level or - block_spec.level > self._max_level): - logging.warning( - 'SpineNet output level out of range [min_level, max_level] = ' - '[%s, %s] will not be used for further processing.', - self._min_level, self._max_level) - endpoints[str(block_spec.level)] = x - - return endpoints - - def _build_endpoints(self, net): - """Matches filter size for endpoints before sharing conv layers.""" - endpoints = {} - for level in range(self._min_level, self._max_level + 1): - x = layers.Conv2D( - filters=self._endpoints_num_filters, - kernel_size=1, - strides=1, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - net[str(level)]) - x = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon)( - x) - x = tf_utils.get_activation(self._activation_fn)(x) - endpoints[str(level)] = x - return endpoints - - def _resample_with_alpha(self, - inputs, - input_width, - input_block_fn, - target_width, - target_num_filters, - target_block_fn, - alpha=0.5): - """Matches resolution and feature dimension.""" - _, _, _, input_num_filters = inputs.get_shape().as_list() - if input_block_fn == 'bottleneck': - 
input_num_filters /= 4 - new_num_filters = int(input_num_filters * alpha) - - x = layers.Conv2D( - filters=new_num_filters, - kernel_size=1, - strides=1, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - inputs) - x = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon)( - x) - x = tf_utils.get_activation(self._activation_fn)(x) - - # Spatial resampling. - if input_width > target_width: - x = layers.Conv2D( - filters=new_num_filters, - kernel_size=3, - strides=2, - padding='SAME', - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - x) - x = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon)( - x) - x = tf_utils.get_activation(self._activation_fn)(x) - input_width /= 2 - while input_width > target_width: - x = layers.MaxPool2D(pool_size=3, strides=2, padding='SAME')(x) - input_width /= 2 - elif input_width < target_width: - scale = target_width // input_width - x = spatial_transform_ops.nearest_upsampling(x, scale=scale) - - # Last 1x1 conv to match filter size. 
- if target_block_fn == 'bottleneck': - target_num_filters *= 4 - x = layers.Conv2D( - filters=target_num_filters, - kernel_size=1, - strides=1, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - x) - x = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon)( - x) - return x - - def get_config(self): - config_dict = { - 'min_level': self._min_level, - 'max_level': self._max_level, - 'endpoints_num_filters': self._endpoints_num_filters, - 'resample_alpha': self._resample_alpha, - 'block_repeats': self._block_repeats, - 'filter_size_scale': self._filter_size_scale, - 'init_stochastic_depth_rate': self._init_stochastic_depth_rate, - 'kernel_initializer': self._kernel_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'bias_regularizer': self._bias_regularizer, - 'activation': self._activation, - 'use_sync_bn': self._use_sync_bn, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon - } - return config_dict - - @classmethod - def from_config(cls, config, custom_objects=None): - return cls(**config) - - @property - def output_specs(self): - """A dict of {level: TensorShape} pairs for the model output.""" - return self._output_specs - - -@factory.register_backbone_builder('spinenet') -def build_spinenet( - input_specs: tf.keras.layers.InputSpec, - backbone_config: hyperparams.Config, - norm_activation_config: hyperparams.Config, - l2_regularizer: tf.keras.regularizers.Regularizer = None) -> tf.keras.Model: - """Builds SpineNet backbone from a config.""" - backbone_type = backbone_config.type - backbone_cfg = backbone_config.get() - assert backbone_type == 'spinenet', (f'Inconsistent backbone type ' - f'{backbone_type}') - - model_id = backbone_cfg.model_id - if model_id not in SCALING_MAP: - raise ValueError( - 'SpineNet-{} is not a valid architecture.'.format(model_id)) - scaling_params 
= SCALING_MAP[model_id] - - return SpineNet( - input_specs=input_specs, - min_level=backbone_cfg.min_level, - max_level=backbone_cfg.max_level, - endpoints_num_filters=scaling_params['endpoints_num_filters'], - resample_alpha=scaling_params['resample_alpha'], - block_repeats=scaling_params['block_repeats'], - filter_size_scale=scaling_params['filter_size_scale'], - init_stochastic_depth_rate=backbone_cfg.stochastic_depth_drop_rate, - kernel_regularizer=l2_regularizer, - activation=norm_activation_config.activation, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon) diff --git a/official/vision/beta/modeling/decoders/__init__.py b/official/vision/beta/modeling/decoders/__init__.py deleted file mode 100644 index 1678aacb488552ad96ef8cd595f94986b61774b7..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/decoders/__init__.py +++ /dev/null @@ -1,20 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Decoders package definition.""" - -from official.vision.beta.modeling.decoders.aspp import ASPP -from official.vision.beta.modeling.decoders.fpn import FPN -from official.vision.beta.modeling.decoders.nasfpn import NASFPN diff --git a/official/vision/beta/modeling/decoders/factory.py b/official/vision/beta/modeling/decoders/factory.py deleted file mode 100644 index a5de4107a3c57386a99fbba36e73140d75bebd76..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/decoders/factory.py +++ /dev/null @@ -1,135 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Decoder registers and factory method. - -One can register a new decoder model by the following two steps: - -1 Import the factory and register the build in the decoder file. -2 Import the decoder class and add a build in __init__.py. - -``` -# my_decoder.py - -from modeling.decoders import factory - -class MyDecoder(): - ... - -@factory.register_decoder_builder('my_decoder') -def build_my_decoder(): - return MyDecoder() - -# decoders/__init__.py adds import -from modeling.decoders.my_decoder import MyDecoder -``` - -If one wants the MyDecoder class to be used only by those binary -then don't imported the decoder module in decoders/__init__.py, but import it -in place that uses it. 
-""" -from typing import Any, Callable, Mapping, Optional, Union - -# Import libraries - -import tensorflow as tf - -from official.core import registry -from official.modeling import hyperparams - -_REGISTERED_DECODER_CLS = {} - - -def register_decoder_builder(key: str) -> Callable[..., Any]: - """Decorates a builder of decoder class. - - The builder should be a Callable (a class or a function). - This decorator supports registration of decoder builder as follows: - - ``` - class MyDecoder(tf.keras.Model): - pass - - @register_decoder_builder('mydecoder') - def builder(input_specs, config, l2_reg): - return MyDecoder(...) - - # Builds a MyDecoder object. - my_decoder = build_decoder_3d(input_specs, config, l2_reg) - ``` - - Args: - key: A `str` of key to look up the builder. - - Returns: - A callable for using as class decorator that registers the decorated class - for creation from an instance of task_config_cls. - """ - return registry.register(_REGISTERED_DECODER_CLS, key) - - -@register_decoder_builder('identity') -def build_identity( - input_specs: Optional[Mapping[str, tf.TensorShape]] = None, - model_config: Optional[hyperparams.Config] = None, - l2_regularizer: Optional[tf.keras.regularizers.Regularizer] = None) -> None: - """Builds identity decoder from a config. - - All the input arguments are not used by identity decoder but kept here to - ensure the interface is consistent. - - Args: - input_specs: A `dict` of input specifications. A dictionary consists of - {level: TensorShape} from a backbone. - model_config: A `OneOfConfig` of model config. - l2_regularizer: A `tf.keras.regularizers.Regularizer` object. Default to - None. - - Returns: - An instance of the identity decoder. - """ - del input_specs, model_config, l2_regularizer # Unused by identity decoder. 
- - -def build_decoder( - input_specs: Mapping[str, tf.TensorShape], - model_config: hyperparams.Config, - l2_regularizer: tf.keras.regularizers.Regularizer = None, - **kwargs) -> Union[None, tf.keras.Model, tf.keras.layers.Layer]: # pytype: disable=annotation-type-mismatch # typed-keras - """Builds decoder from a config. - - A decoder can be a keras.Model, a keras.layers.Layer, or None. If it is not - None, the decoder will take features from the backbone as input and generate - decoded feature maps. If it is None, such as an identity decoder, the decoder - is skipped and features from the backbone are regarded as model output. - - Args: - input_specs: A `dict` of input specifications. A dictionary consists of - {level: TensorShape} from a backbone. - model_config: A `OneOfConfig` of model config. - l2_regularizer: A `tf.keras.regularizers.Regularizer` object. Default to - None. - **kwargs: Additional keyword args to be passed to decoder builder. - - Returns: - An instance of the decoder. - """ - decoder_builder = registry.lookup(_REGISTERED_DECODER_CLS, - model_config.decoder.type) - - return decoder_builder( - input_specs=input_specs, - model_config=model_config, - l2_regularizer=l2_regularizer, - **kwargs) diff --git a/official/vision/beta/modeling/decoders/factory_test.py b/official/vision/beta/modeling/decoders/factory_test.py deleted file mode 100644 index ea97e59e86e50a3de1dda7ccdc0b049046a0cafc..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/decoders/factory_test.py +++ /dev/null @@ -1,159 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Tests for decoder factory functions.""" - -from absl.testing import parameterized -import tensorflow as tf - -from tensorflow.python.distribute import combinations -from official.vision.beta import configs -from official.vision.beta.configs import decoders as decoders_cfg -from official.vision.beta.modeling import decoders -from official.vision.beta.modeling.decoders import factory - - -class FactoryTest(tf.test.TestCase, parameterized.TestCase): - - @combinations.generate( - combinations.combine( - num_filters=[128, 256], use_separable_conv=[True, False])) - def test_fpn_decoder_creation(self, num_filters, use_separable_conv): - """Test creation of FPN decoder.""" - min_level = 3 - max_level = 7 - input_specs = {} - for level in range(min_level, max_level): - input_specs[str(level)] = tf.TensorShape( - [1, 128 // (2**level), 128 // (2**level), 3]) - - network = decoders.FPN( - input_specs=input_specs, - num_filters=num_filters, - use_separable_conv=use_separable_conv, - use_sync_bn=True) - - model_config = configs.retinanet.RetinaNet() - model_config.min_level = min_level - model_config.max_level = max_level - model_config.num_classes = 10 - model_config.input_size = [None, None, 3] - model_config.decoder = decoders_cfg.Decoder( - type='fpn', - fpn=decoders_cfg.FPN( - num_filters=num_filters, use_separable_conv=use_separable_conv)) - - factory_network = factory.build_decoder( - input_specs=input_specs, model_config=model_config) - - network_config = network.get_config() - factory_network_config = factory_network.get_config() - - 
self.assertEqual(network_config, factory_network_config) - - @combinations.generate( - combinations.combine( - num_filters=[128, 256], - num_repeats=[3, 5], - use_separable_conv=[True, False])) - def test_nasfpn_decoder_creation(self, num_filters, num_repeats, - use_separable_conv): - """Test creation of NASFPN decoder.""" - min_level = 3 - max_level = 7 - input_specs = {} - for level in range(min_level, max_level): - input_specs[str(level)] = tf.TensorShape( - [1, 128 // (2**level), 128 // (2**level), 3]) - - network = decoders.NASFPN( - input_specs=input_specs, - num_filters=num_filters, - num_repeats=num_repeats, - use_separable_conv=use_separable_conv, - use_sync_bn=True) - - model_config = configs.retinanet.RetinaNet() - model_config.min_level = min_level - model_config.max_level = max_level - model_config.num_classes = 10 - model_config.input_size = [None, None, 3] - model_config.decoder = decoders_cfg.Decoder( - type='nasfpn', - nasfpn=decoders_cfg.NASFPN( - num_filters=num_filters, - num_repeats=num_repeats, - use_separable_conv=use_separable_conv)) - - factory_network = factory.build_decoder( - input_specs=input_specs, model_config=model_config) - - network_config = network.get_config() - factory_network_config = factory_network.get_config() - - self.assertEqual(network_config, factory_network_config) - - @combinations.generate( - combinations.combine( - level=[3, 4], - dilation_rates=[[6, 12, 18], [6, 12]], - num_filters=[128, 256])) - def test_aspp_decoder_creation(self, level, dilation_rates, num_filters): - """Test creation of ASPP decoder.""" - input_specs = {'1': tf.TensorShape([1, 128, 128, 3])} - - network = decoders.ASPP( - level=level, - dilation_rates=dilation_rates, - num_filters=num_filters, - use_sync_bn=True) - - model_config = configs.semantic_segmentation.SemanticSegmentationModel() - model_config.num_classes = 10 - model_config.input_size = [None, None, 3] - model_config.decoder = decoders_cfg.Decoder( - type='aspp', - 
aspp=decoders_cfg.ASPP( - level=level, dilation_rates=dilation_rates, - num_filters=num_filters)) - - factory_network = factory.build_decoder( - input_specs=input_specs, model_config=model_config) - - network_config = network.get_config() - factory_network_config = factory_network.get_config() - # Due to calling `super().get_config()` in aspp layer, everything but the - # the name of two layer instances are the same, so we force equal name so it - # will not give false alarm. - factory_network_config['name'] = network_config['name'] - - self.assertEqual(network_config, factory_network_config) - - def test_identity_decoder_creation(self): - """Test creation of identity decoder.""" - model_config = configs.retinanet.RetinaNet() - model_config.num_classes = 2 - model_config.input_size = [None, None, 3] - - model_config.decoder = decoders_cfg.Decoder( - type='identity', identity=decoders_cfg.Identity()) - - factory_network = factory.build_decoder( - input_specs=None, model_config=model_config) - - self.assertIsNone(factory_network) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/modeling/decoders/fpn.py b/official/vision/beta/modeling/decoders/fpn.py deleted file mode 100644 index f96dec04e461671447cb624036e921678917653f..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/decoders/fpn.py +++ /dev/null @@ -1,246 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Contains the definitions of Feature Pyramid Networks (FPN).""" -from typing import Any, Mapping, Optional - -# Import libraries -from absl import logging -import tensorflow as tf - -from official.modeling import hyperparams -from official.modeling import tf_utils -from official.vision.beta.modeling.decoders import factory -from official.vision.beta.ops import spatial_transform_ops - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class FPN(tf.keras.Model): - """Creates a Feature Pyramid Network (FPN). - - This implemets the paper: - Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, and - Serge Belongie. - Feature Pyramid Networks for Object Detection. - (https://arxiv.org/pdf/1612.03144) - """ - - def __init__( - self, - input_specs: Mapping[str, tf.TensorShape], - min_level: int = 3, - max_level: int = 7, - num_filters: int = 256, - fusion_type: str = 'sum', - use_separable_conv: bool = False, - activation: str = 'relu', - use_sync_bn: bool = False, - norm_momentum: float = 0.99, - norm_epsilon: float = 0.001, - kernel_initializer: str = 'VarianceScaling', - kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - **kwargs): - """Initializes a Feature Pyramid Network (FPN). - - Args: - input_specs: A `dict` of input specifications. A dictionary consists of - {level: TensorShape} from a backbone. - min_level: An `int` of minimum level in FPN output feature maps. - max_level: An `int` of maximum level in FPN output feature maps. - num_filters: An `int` number of filters in FPN layers. - fusion_type: A `str` of `sum` or `concat`. Whether performing sum or - concat for feature fusion. - use_separable_conv: A `bool`. If True use separable convolution for - convolution in FPN layers. 
- activation: A `str` name of the activation function. - use_sync_bn: A `bool`. If True, use synchronized batch normalization. - norm_momentum: A `float` of normalization momentum for the moving average. - norm_epsilon: A `float` added to variance to avoid dividing by zero. - kernel_initializer: A `str` name of kernel_initializer for convolutional - layers. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for - Conv2D. Default is None. - bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. - **kwargs: Additional keyword arguments to be passed. - """ - self._config_dict = { - 'input_specs': input_specs, - 'min_level': min_level, - 'max_level': max_level, - 'num_filters': num_filters, - 'fusion_type': fusion_type, - 'use_separable_conv': use_separable_conv, - 'activation': activation, - 'use_sync_bn': use_sync_bn, - 'norm_momentum': norm_momentum, - 'norm_epsilon': norm_epsilon, - 'kernel_initializer': kernel_initializer, - 'kernel_regularizer': kernel_regularizer, - 'bias_regularizer': bias_regularizer, - } - if use_separable_conv: - conv2d = tf.keras.layers.SeparableConv2D - else: - conv2d = tf.keras.layers.Conv2D - if use_sync_bn: - norm = tf.keras.layers.experimental.SyncBatchNormalization - else: - norm = tf.keras.layers.BatchNormalization - activation_fn = tf.keras.layers.Activation( - tf_utils.get_activation(activation)) - - # Build input feature pyramid. - if tf.keras.backend.image_data_format() == 'channels_last': - bn_axis = -1 - else: - bn_axis = 1 - - # Get input feature pyramid from backbone. - logging.info('FPN input_specs: %s', input_specs) - inputs = self._build_input_pyramid(input_specs, min_level) - backbone_max_level = min(int(max(inputs.keys())), max_level) - - # Build lateral connections. 
- feats_lateral = {} - for level in range(min_level, backbone_max_level + 1): - feats_lateral[str(level)] = conv2d( - filters=num_filters, - kernel_size=1, - padding='same', - kernel_initializer=kernel_initializer, - kernel_regularizer=kernel_regularizer, - bias_regularizer=bias_regularizer)( - inputs[str(level)]) - - # Build top-down path. - feats = {str(backbone_max_level): feats_lateral[str(backbone_max_level)]} - for level in range(backbone_max_level - 1, min_level - 1, -1): - feat_a = spatial_transform_ops.nearest_upsampling( - feats[str(level + 1)], 2) - feat_b = feats_lateral[str(level)] - - if fusion_type == 'sum': - feats[str(level)] = feat_a + feat_b - elif fusion_type == 'concat': - feats[str(level)] = tf.concat([feat_a, feat_b], axis=-1) - else: - raise ValueError('Fusion type {} not supported.'.format(fusion_type)) - - # TODO(xianzhi): consider to remove bias in conv2d. - # Build post-hoc 3x3 convolution kernel. - for level in range(min_level, backbone_max_level + 1): - feats[str(level)] = conv2d( - filters=num_filters, - strides=1, - kernel_size=3, - padding='same', - kernel_initializer=kernel_initializer, - kernel_regularizer=kernel_regularizer, - bias_regularizer=bias_regularizer)( - feats[str(level)]) - - # TODO(xianzhi): consider to remove bias in conv2d. - # Build coarser FPN levels introduced for RetinaNet. - for level in range(backbone_max_level + 1, max_level + 1): - feats_in = feats[str(level - 1)] - if level > backbone_max_level + 1: - feats_in = activation_fn(feats_in) - feats[str(level)] = conv2d( - filters=num_filters, - strides=2, - kernel_size=3, - padding='same', - kernel_initializer=kernel_initializer, - kernel_regularizer=kernel_regularizer, - bias_regularizer=bias_regularizer)( - feats_in) - - # Apply batch norm layers. 
- for level in range(min_level, max_level + 1): - feats[str(level)] = norm( - axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)( - feats[str(level)]) - - self._output_specs = { - str(level): feats[str(level)].get_shape() - for level in range(min_level, max_level + 1) - } - - super(FPN, self).__init__(inputs=inputs, outputs=feats, **kwargs) - - def _build_input_pyramid(self, input_specs: Mapping[str, tf.TensorShape], - min_level: int): - assert isinstance(input_specs, dict) - if min(input_specs.keys()) > str(min_level): - raise ValueError( - 'Backbone min level should be less or equal to FPN min level') - - inputs = {} - for level, spec in input_specs.items(): - inputs[level] = tf.keras.Input(shape=spec[1:]) - return inputs - - def get_config(self) -> Mapping[str, Any]: - return self._config_dict - - @classmethod - def from_config(cls, config, custom_objects=None): - return cls(**config) - - @property - def output_specs(self) -> Mapping[str, tf.TensorShape]: - """A dict of {level: TensorShape} pairs for the model output.""" - return self._output_specs - - -@factory.register_decoder_builder('fpn') -def build_fpn_decoder( - input_specs: Mapping[str, tf.TensorShape], - model_config: hyperparams.Config, - l2_regularizer: Optional[tf.keras.regularizers.Regularizer] = None -) -> tf.keras.Model: - """Builds FPN decoder from a config. - - Args: - input_specs: A `dict` of input specifications. A dictionary consists of - {level: TensorShape} from a backbone. - model_config: A OneOfConfig. Model config. - l2_regularizer: A `tf.keras.regularizers.Regularizer` instance. Default to - None. - - Returns: - A `tf.keras.Model` instance of the FPN decoder. - - Raises: - ValueError: If the model_config.decoder.type is not `fpn`. - """ - decoder_type = model_config.decoder.type - decoder_cfg = model_config.decoder.get() - if decoder_type != 'fpn': - raise ValueError(f'Inconsistent decoder type {decoder_type}. 
' - 'Need to be `fpn`.') - norm_activation_config = model_config.norm_activation - return FPN( - input_specs=input_specs, - min_level=model_config.min_level, - max_level=model_config.max_level, - num_filters=decoder_cfg.num_filters, - fusion_type=decoder_cfg.fusion_type, - use_separable_conv=decoder_cfg.use_separable_conv, - activation=norm_activation_config.activation, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon, - kernel_regularizer=l2_regularizer) diff --git a/official/vision/beta/modeling/factory.py b/official/vision/beta/modeling/factory.py deleted file mode 100644 index c91a1abceed0249fccb4912e931012e0fa5596c9..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/factory.py +++ /dev/null @@ -1,385 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Factory methods to build models.""" - -from typing import Optional - -import tensorflow as tf - -from official.vision.beta.configs import image_classification as classification_cfg -from official.vision.beta.configs import maskrcnn as maskrcnn_cfg -from official.vision.beta.configs import retinanet as retinanet_cfg -from official.vision.beta.configs import semantic_segmentation as segmentation_cfg -from official.vision.beta.modeling import backbones -from official.vision.beta.modeling import classification_model -from official.vision.beta.modeling import decoders -from official.vision.beta.modeling import maskrcnn_model -from official.vision.beta.modeling import retinanet_model -from official.vision.beta.modeling import segmentation_model -from official.vision.beta.modeling.heads import dense_prediction_heads -from official.vision.beta.modeling.heads import instance_heads -from official.vision.beta.modeling.heads import segmentation_heads -from official.vision.beta.modeling.layers import detection_generator -from official.vision.beta.modeling.layers import mask_sampler -from official.vision.beta.modeling.layers import roi_aligner -from official.vision.beta.modeling.layers import roi_generator -from official.vision.beta.modeling.layers import roi_sampler - - -def build_classification_model( - input_specs: tf.keras.layers.InputSpec, - model_config: classification_cfg.ImageClassificationModel, - l2_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - skip_logits_layer: bool = False, - backbone: Optional[tf.keras.Model] = None) -> tf.keras.Model: - """Builds the classification model.""" - norm_activation_config = model_config.norm_activation - if not backbone: - backbone = backbones.factory.build_backbone( - input_specs=input_specs, - backbone_config=model_config.backbone, - norm_activation_config=norm_activation_config, - l2_regularizer=l2_regularizer) - - model = classification_model.ClassificationModel( - backbone=backbone, - 
num_classes=model_config.num_classes, - input_specs=input_specs, - dropout_rate=model_config.dropout_rate, - kernel_initializer=model_config.kernel_initializer, - kernel_regularizer=l2_regularizer, - add_head_batch_norm=model_config.add_head_batch_norm, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon, - skip_logits_layer=skip_logits_layer) - return model - - -def build_maskrcnn(input_specs: tf.keras.layers.InputSpec, - model_config: maskrcnn_cfg.MaskRCNN, - l2_regularizer: Optional[ - tf.keras.regularizers.Regularizer] = None, - backbone: Optional[tf.keras.Model] = None, - decoder: Optional[tf.keras.Model] = None) -> tf.keras.Model: - """Builds Mask R-CNN model.""" - norm_activation_config = model_config.norm_activation - if not backbone: - backbone = backbones.factory.build_backbone( - input_specs=input_specs, - backbone_config=model_config.backbone, - norm_activation_config=norm_activation_config, - l2_regularizer=l2_regularizer) - backbone_features = backbone(tf.keras.Input(input_specs.shape[1:])) - - if not decoder: - decoder = decoders.factory.build_decoder( - input_specs=backbone.output_specs, - model_config=model_config, - l2_regularizer=l2_regularizer) - - rpn_head_config = model_config.rpn_head - roi_generator_config = model_config.roi_generator - roi_sampler_config = model_config.roi_sampler - roi_aligner_config = model_config.roi_aligner - detection_head_config = model_config.detection_head - generator_config = model_config.detection_generator - num_anchors_per_location = ( - len(model_config.anchor.aspect_ratios) * model_config.anchor.num_scales) - - rpn_head = dense_prediction_heads.RPNHead( - min_level=model_config.min_level, - max_level=model_config.max_level, - num_anchors_per_location=num_anchors_per_location, - num_convs=rpn_head_config.num_convs, - num_filters=rpn_head_config.num_filters, - 
use_separable_conv=rpn_head_config.use_separable_conv, - activation=norm_activation_config.activation, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon, - kernel_regularizer=l2_regularizer) - - detection_head = instance_heads.DetectionHead( - num_classes=model_config.num_classes, - num_convs=detection_head_config.num_convs, - num_filters=detection_head_config.num_filters, - use_separable_conv=detection_head_config.use_separable_conv, - num_fcs=detection_head_config.num_fcs, - fc_dims=detection_head_config.fc_dims, - class_agnostic_bbox_pred=detection_head_config.class_agnostic_bbox_pred, - activation=norm_activation_config.activation, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon, - kernel_regularizer=l2_regularizer, - name='detection_head') - - if decoder: - decoder_features = decoder(backbone_features) - rpn_head(decoder_features) - - if roi_sampler_config.cascade_iou_thresholds: - detection_head_cascade = [detection_head] - for cascade_num in range(len(roi_sampler_config.cascade_iou_thresholds)): - detection_head = instance_heads.DetectionHead( - num_classes=model_config.num_classes, - num_convs=detection_head_config.num_convs, - num_filters=detection_head_config.num_filters, - use_separable_conv=detection_head_config.use_separable_conv, - num_fcs=detection_head_config.num_fcs, - fc_dims=detection_head_config.fc_dims, - class_agnostic_bbox_pred=detection_head_config - .class_agnostic_bbox_pred, - activation=norm_activation_config.activation, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon, - kernel_regularizer=l2_regularizer, - name='detection_head_{}'.format(cascade_num + 1)) - - detection_head_cascade.append(detection_head) - detection_head = 
detection_head_cascade - - roi_generator_obj = roi_generator.MultilevelROIGenerator( - pre_nms_top_k=roi_generator_config.pre_nms_top_k, - pre_nms_score_threshold=roi_generator_config.pre_nms_score_threshold, - pre_nms_min_size_threshold=( - roi_generator_config.pre_nms_min_size_threshold), - nms_iou_threshold=roi_generator_config.nms_iou_threshold, - num_proposals=roi_generator_config.num_proposals, - test_pre_nms_top_k=roi_generator_config.test_pre_nms_top_k, - test_pre_nms_score_threshold=( - roi_generator_config.test_pre_nms_score_threshold), - test_pre_nms_min_size_threshold=( - roi_generator_config.test_pre_nms_min_size_threshold), - test_nms_iou_threshold=roi_generator_config.test_nms_iou_threshold, - test_num_proposals=roi_generator_config.test_num_proposals, - use_batched_nms=roi_generator_config.use_batched_nms) - - roi_sampler_cascade = [] - roi_sampler_obj = roi_sampler.ROISampler( - mix_gt_boxes=roi_sampler_config.mix_gt_boxes, - num_sampled_rois=roi_sampler_config.num_sampled_rois, - foreground_fraction=roi_sampler_config.foreground_fraction, - foreground_iou_threshold=roi_sampler_config.foreground_iou_threshold, - background_iou_high_threshold=( - roi_sampler_config.background_iou_high_threshold), - background_iou_low_threshold=( - roi_sampler_config.background_iou_low_threshold)) - roi_sampler_cascade.append(roi_sampler_obj) - # Initialize addtional roi simplers for cascade heads. 
- if roi_sampler_config.cascade_iou_thresholds: - for iou in roi_sampler_config.cascade_iou_thresholds: - roi_sampler_obj = roi_sampler.ROISampler( - mix_gt_boxes=False, - num_sampled_rois=roi_sampler_config.num_sampled_rois, - foreground_iou_threshold=iou, - background_iou_high_threshold=iou, - background_iou_low_threshold=0.0, - skip_subsampling=True) - roi_sampler_cascade.append(roi_sampler_obj) - - roi_aligner_obj = roi_aligner.MultilevelROIAligner( - crop_size=roi_aligner_config.crop_size, - sample_offset=roi_aligner_config.sample_offset) - - detection_generator_obj = detection_generator.DetectionGenerator( - apply_nms=generator_config.apply_nms, - pre_nms_top_k=generator_config.pre_nms_top_k, - pre_nms_score_threshold=generator_config.pre_nms_score_threshold, - nms_iou_threshold=generator_config.nms_iou_threshold, - max_num_detections=generator_config.max_num_detections, - nms_version=generator_config.nms_version, - use_cpu_nms=generator_config.use_cpu_nms, - soft_nms_sigma=generator_config.soft_nms_sigma) - - if model_config.include_mask: - mask_head = instance_heads.MaskHead( - num_classes=model_config.num_classes, - upsample_factor=model_config.mask_head.upsample_factor, - num_convs=model_config.mask_head.num_convs, - num_filters=model_config.mask_head.num_filters, - use_separable_conv=model_config.mask_head.use_separable_conv, - activation=model_config.norm_activation.activation, - norm_momentum=model_config.norm_activation.norm_momentum, - norm_epsilon=model_config.norm_activation.norm_epsilon, - kernel_regularizer=l2_regularizer, - class_agnostic=model_config.mask_head.class_agnostic) - - mask_sampler_obj = mask_sampler.MaskSampler( - mask_target_size=( - model_config.mask_roi_aligner.crop_size * - model_config.mask_head.upsample_factor), - num_sampled_masks=model_config.mask_sampler.num_sampled_masks) - - mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner( - crop_size=model_config.mask_roi_aligner.crop_size, - 
sample_offset=model_config.mask_roi_aligner.sample_offset) - else: - mask_head = None - mask_sampler_obj = None - mask_roi_aligner_obj = None - - model = maskrcnn_model.MaskRCNNModel( - backbone=backbone, - decoder=decoder, - rpn_head=rpn_head, - detection_head=detection_head, - roi_generator=roi_generator_obj, - roi_sampler=roi_sampler_cascade, - roi_aligner=roi_aligner_obj, - detection_generator=detection_generator_obj, - mask_head=mask_head, - mask_sampler=mask_sampler_obj, - mask_roi_aligner=mask_roi_aligner_obj, - class_agnostic_bbox_pred=detection_head_config.class_agnostic_bbox_pred, - cascade_class_ensemble=detection_head_config.cascade_class_ensemble, - min_level=model_config.min_level, - max_level=model_config.max_level, - num_scales=model_config.anchor.num_scales, - aspect_ratios=model_config.anchor.aspect_ratios, - anchor_size=model_config.anchor.anchor_size) - return model - - -def build_retinanet( - input_specs: tf.keras.layers.InputSpec, - model_config: retinanet_cfg.RetinaNet, - l2_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - backbone: Optional[tf.keras.Model] = None, - decoder: Optional[tf.keras.regularizers.Regularizer] = None -) -> tf.keras.Model: - """Builds RetinaNet model.""" - norm_activation_config = model_config.norm_activation - if not backbone: - backbone = backbones.factory.build_backbone( - input_specs=input_specs, - backbone_config=model_config.backbone, - norm_activation_config=norm_activation_config, - l2_regularizer=l2_regularizer) - backbone_features = backbone(tf.keras.Input(input_specs.shape[1:])) - - if not decoder: - decoder = decoders.factory.build_decoder( - input_specs=backbone.output_specs, - model_config=model_config, - l2_regularizer=l2_regularizer) - - head_config = model_config.head - generator_config = model_config.detection_generator - num_anchors_per_location = ( - len(model_config.anchor.aspect_ratios) * model_config.anchor.num_scales) - - head = dense_prediction_heads.RetinaNetHead( - 
min_level=model_config.min_level, - max_level=model_config.max_level, - num_classes=model_config.num_classes, - num_anchors_per_location=num_anchors_per_location, - num_convs=head_config.num_convs, - num_filters=head_config.num_filters, - attribute_heads=[ - cfg.as_dict() for cfg in (head_config.attribute_heads or []) - ], - use_separable_conv=head_config.use_separable_conv, - activation=norm_activation_config.activation, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon, - kernel_regularizer=l2_regularizer) - - # Builds decoder and head so that their trainable weights are initialized - if decoder: - decoder_features = decoder(backbone_features) - _ = head(decoder_features) - - detection_generator_obj = detection_generator.MultilevelDetectionGenerator( - apply_nms=generator_config.apply_nms, - pre_nms_top_k=generator_config.pre_nms_top_k, - pre_nms_score_threshold=generator_config.pre_nms_score_threshold, - nms_iou_threshold=generator_config.nms_iou_threshold, - max_num_detections=generator_config.max_num_detections, - nms_version=generator_config.nms_version, - use_cpu_nms=generator_config.use_cpu_nms, - soft_nms_sigma=generator_config.soft_nms_sigma) - - model = retinanet_model.RetinaNetModel( - backbone, - decoder, - head, - detection_generator_obj, - min_level=model_config.min_level, - max_level=model_config.max_level, - num_scales=model_config.anchor.num_scales, - aspect_ratios=model_config.anchor.aspect_ratios, - anchor_size=model_config.anchor.anchor_size) - return model - - -def build_segmentation_model( - input_specs: tf.keras.layers.InputSpec, - model_config: segmentation_cfg.SemanticSegmentationModel, - l2_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - backbone: Optional[tf.keras.regularizers.Regularizer] = None, - decoder: Optional[tf.keras.regularizers.Regularizer] = None -) -> tf.keras.Model: - """Builds Segmentation model.""" - 
norm_activation_config = model_config.norm_activation - if not backbone: - backbone = backbones.factory.build_backbone( - input_specs=input_specs, - backbone_config=model_config.backbone, - norm_activation_config=norm_activation_config, - l2_regularizer=l2_regularizer) - - if not decoder: - decoder = decoders.factory.build_decoder( - input_specs=backbone.output_specs, - model_config=model_config, - l2_regularizer=l2_regularizer) - - head_config = model_config.head - - head = segmentation_heads.SegmentationHead( - num_classes=model_config.num_classes, - level=head_config.level, - num_convs=head_config.num_convs, - prediction_kernel_size=head_config.prediction_kernel_size, - num_filters=head_config.num_filters, - use_depthwise_convolution=head_config.use_depthwise_convolution, - upsample_factor=head_config.upsample_factor, - feature_fusion=head_config.feature_fusion, - low_level=head_config.low_level, - low_level_num_filters=head_config.low_level_num_filters, - activation=norm_activation_config.activation, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon, - kernel_regularizer=l2_regularizer) - - mask_scoring_head = None - if model_config.mask_scoring_head: - mask_scoring_head = segmentation_heads.MaskScoring( - num_classes=model_config.num_classes, - **model_config.mask_scoring_head.as_dict(), - activation=norm_activation_config.activation, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon, - kernel_regularizer=l2_regularizer) - - model = segmentation_model.SegmentationModel( - backbone, decoder, head, mask_scoring_head=mask_scoring_head) - return model diff --git a/official/vision/beta/modeling/factory_test.py b/official/vision/beta/modeling/factory_test.py deleted file mode 100644 index 
79127f1b5b4d9f9c7d6a37eb6d1523082dadbf1f..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/factory_test.py +++ /dev/null @@ -1,132 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Tests for factory.py.""" - -# Import libraries -from absl.testing import parameterized -import tensorflow as tf - -from official.vision.beta.configs import backbones -from official.vision.beta.configs import backbones_3d -from official.vision.beta.configs import image_classification as classification_cfg -from official.vision.beta.configs import maskrcnn as maskrcnn_cfg -from official.vision.beta.configs import retinanet as retinanet_cfg -from official.vision.beta.configs import video_classification as video_classification_cfg -from official.vision.beta.modeling import factory -from official.vision.beta.modeling import factory_3d - - -class ClassificationModelBuilderTest(parameterized.TestCase, tf.test.TestCase): - - @parameterized.parameters( - ('resnet', (224, 224), 5e-5), - ('resnet', (224, 224), None), - ('resnet', (None, None), 5e-5), - ('resnet', (None, None), None), - ) - def test_builder(self, backbone_type, input_size, weight_decay): - num_classes = 2 - input_specs = tf.keras.layers.InputSpec( - shape=[None, input_size[0], input_size[1], 3]) - model_config = classification_cfg.ImageClassificationModel( - num_classes=num_classes, - 
backbone=backbones.Backbone(type=backbone_type)) - l2_regularizer = ( - tf.keras.regularizers.l2(weight_decay) if weight_decay else None) - _ = factory.build_classification_model( - input_specs=input_specs, - model_config=model_config, - l2_regularizer=l2_regularizer) - - -class MaskRCNNBuilderTest(parameterized.TestCase, tf.test.TestCase): - - @parameterized.parameters( - ('resnet', (640, 640)), - ('resnet', (None, None)), - ) - def test_builder(self, backbone_type, input_size): - num_classes = 2 - input_specs = tf.keras.layers.InputSpec( - shape=[None, input_size[0], input_size[1], 3]) - model_config = maskrcnn_cfg.MaskRCNN( - num_classes=num_classes, - backbone=backbones.Backbone(type=backbone_type)) - l2_regularizer = tf.keras.regularizers.l2(5e-5) - _ = factory.build_maskrcnn( - input_specs=input_specs, - model_config=model_config, - l2_regularizer=l2_regularizer) - - -class RetinaNetBuilderTest(parameterized.TestCase, tf.test.TestCase): - - @parameterized.parameters( - ('resnet', (640, 640), False), - ('resnet', (None, None), True), - ) - def test_builder(self, backbone_type, input_size, has_att_heads): - num_classes = 2 - input_specs = tf.keras.layers.InputSpec( - shape=[None, input_size[0], input_size[1], 3]) - if has_att_heads: - attribute_heads_config = [ - retinanet_cfg.AttributeHead(name='att1'), - retinanet_cfg.AttributeHead( - name='att2', type='classification', size=2), - ] - else: - attribute_heads_config = None - model_config = retinanet_cfg.RetinaNet( - num_classes=num_classes, - backbone=backbones.Backbone(type=backbone_type), - head=retinanet_cfg.RetinaNetHead( - attribute_heads=attribute_heads_config)) - l2_regularizer = tf.keras.regularizers.l2(5e-5) - _ = factory.build_retinanet( - input_specs=input_specs, - model_config=model_config, - l2_regularizer=l2_regularizer) - if has_att_heads: - self.assertEqual(model_config.head.attribute_heads[0].as_dict(), - dict(name='att1', type='regression', size=1)) - 
self.assertEqual(model_config.head.attribute_heads[1].as_dict(), - dict(name='att2', type='classification', size=2)) - - -class VideoClassificationModelBuilderTest(parameterized.TestCase, - tf.test.TestCase): - - @parameterized.parameters( - ('resnet_3d', (8, 224, 224), 5e-5), - ('resnet_3d', (None, None, None), 5e-5), - ) - def test_builder(self, backbone_type, input_size, weight_decay): - input_specs = tf.keras.layers.InputSpec( - shape=[None, input_size[0], input_size[1], input_size[2], 3]) - model_config = video_classification_cfg.VideoClassificationModel( - backbone=backbones_3d.Backbone3D(type=backbone_type)) - l2_regularizer = ( - tf.keras.regularizers.l2(weight_decay) if weight_decay else None) - _ = factory_3d.build_video_classification_model( - input_specs=input_specs, - model_config=model_config, - num_classes=2, - l2_regularizer=l2_regularizer) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/modeling/heads/__init__.py b/official/vision/beta/modeling/heads/__init__.py deleted file mode 100644 index 881fc1120e85f5bc38c04e103e885285e22c7a8c..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/heads/__init__.py +++ /dev/null @@ -1,22 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Heads package definition.""" - -from official.vision.beta.modeling.heads.dense_prediction_heads import RetinaNetHead -from official.vision.beta.modeling.heads.dense_prediction_heads import RPNHead -from official.vision.beta.modeling.heads.instance_heads import DetectionHead -from official.vision.beta.modeling.heads.instance_heads import MaskHead -from official.vision.beta.modeling.heads.segmentation_heads import SegmentationHead diff --git a/official/vision/beta/modeling/heads/dense_prediction_heads.py b/official/vision/beta/modeling/heads/dense_prediction_heads.py deleted file mode 100644 index 60e19c92fc4c82042f8ba6fde62e9db5b2e26d2d..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/heads/dense_prediction_heads.py +++ /dev/null @@ -1,517 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Contains definitions of dense prediction heads.""" - -from typing import Any, Dict, List, Mapping, Optional, Union - -# Import libraries - -import numpy as np -import tensorflow as tf - -from official.modeling import tf_utils - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class RetinaNetHead(tf.keras.layers.Layer): - """Creates a RetinaNet head.""" - - def __init__( - self, - min_level: int, - max_level: int, - num_classes: int, - num_anchors_per_location: int, - num_convs: int = 4, - num_filters: int = 256, - attribute_heads: Optional[List[Dict[str, Any]]] = None, - use_separable_conv: bool = False, - activation: str = 'relu', - use_sync_bn: bool = False, - norm_momentum: float = 0.99, - norm_epsilon: float = 0.001, - kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - num_params_per_anchor: int = 4, - **kwargs): - """Initializes a RetinaNet head. - - Args: - min_level: An `int` number of minimum feature level. - max_level: An `int` number of maximum feature level. - num_classes: An `int` number of classes to predict. - num_anchors_per_location: An `int` number of number of anchors per pixel - location. - num_convs: An `int` number that represents the number of the intermediate - conv layers before the prediction. - num_filters: An `int` number that represents the number of filters of the - intermediate conv layers. - attribute_heads: If not None, a list that contains a dict for each - additional attribute head. Each dict consists of 3 key-value pairs: - `name`, `type` ('regression' or 'classification'), and `size` (number - of predicted values for each instance). - use_separable_conv: A `bool` that indicates whether the separable - convolution layers is used. - activation: A `str` that indicates which activation is used, e.g. 'relu', - 'swish', etc. 
- use_sync_bn: A `bool` that indicates whether to use synchronized batch - normalization across different replicas. - norm_momentum: A `float` of normalization momentum for the moving average. - norm_epsilon: A `float` added to variance to avoid dividing by zero. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for - Conv2D. Default is None. - bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. - num_params_per_anchor: Number of parameters required to specify an anchor - box. For example, `num_params_per_anchor` would be 4 for axis-aligned - anchor boxes specified by their y-centers, x-centers, heights, and - widths. - **kwargs: Additional keyword arguments to be passed. - """ - super(RetinaNetHead, self).__init__(**kwargs) - self._config_dict = { - 'min_level': min_level, - 'max_level': max_level, - 'num_classes': num_classes, - 'num_anchors_per_location': num_anchors_per_location, - 'num_convs': num_convs, - 'num_filters': num_filters, - 'attribute_heads': attribute_heads, - 'use_separable_conv': use_separable_conv, - 'activation': activation, - 'use_sync_bn': use_sync_bn, - 'norm_momentum': norm_momentum, - 'norm_epsilon': norm_epsilon, - 'kernel_regularizer': kernel_regularizer, - 'bias_regularizer': bias_regularizer, - 'num_params_per_anchor': num_params_per_anchor, - } - - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - self._activation = tf_utils.get_activation(activation) - - def build(self, input_shape: Union[tf.TensorShape, List[tf.TensorShape]]): - """Creates the variables of the head.""" - conv_op = (tf.keras.layers.SeparableConv2D - if self._config_dict['use_separable_conv'] - else tf.keras.layers.Conv2D) - conv_kwargs = { - 'filters': self._config_dict['num_filters'], - 'kernel_size': 3, - 'padding': 'same', - 'bias_initializer': tf.zeros_initializer(), - 'bias_regularizer': self._config_dict['bias_regularizer'], - } - if not 
self._config_dict['use_separable_conv']: - conv_kwargs.update({ - 'kernel_initializer': tf.keras.initializers.RandomNormal( - stddev=0.01), - 'kernel_regularizer': self._config_dict['kernel_regularizer'], - }) - bn_op = (tf.keras.layers.experimental.SyncBatchNormalization - if self._config_dict['use_sync_bn'] - else tf.keras.layers.BatchNormalization) - bn_kwargs = { - 'axis': self._bn_axis, - 'momentum': self._config_dict['norm_momentum'], - 'epsilon': self._config_dict['norm_epsilon'], - } - - # Class net. - self._cls_convs = [] - self._cls_norms = [] - for level in range( - self._config_dict['min_level'], self._config_dict['max_level'] + 1): - this_level_cls_norms = [] - for i in range(self._config_dict['num_convs']): - if level == self._config_dict['min_level']: - cls_conv_name = 'classnet-conv_{}'.format(i) - self._cls_convs.append(conv_op(name=cls_conv_name, **conv_kwargs)) - cls_norm_name = 'classnet-conv-norm_{}_{}'.format(level, i) - this_level_cls_norms.append(bn_op(name=cls_norm_name, **bn_kwargs)) - self._cls_norms.append(this_level_cls_norms) - - classifier_kwargs = { - 'filters': ( - self._config_dict['num_classes'] * - self._config_dict['num_anchors_per_location']), - 'kernel_size': 3, - 'padding': 'same', - 'bias_initializer': tf.constant_initializer(-np.log((1 - 0.01) / 0.01)), - 'bias_regularizer': self._config_dict['bias_regularizer'], - } - if not self._config_dict['use_separable_conv']: - classifier_kwargs.update({ - 'kernel_initializer': tf.keras.initializers.RandomNormal(stddev=1e-5), - 'kernel_regularizer': self._config_dict['kernel_regularizer'], - }) - self._classifier = conv_op(name='scores', **classifier_kwargs) - - # Box net. 
- self._box_convs = [] - self._box_norms = [] - for level in range( - self._config_dict['min_level'], self._config_dict['max_level'] + 1): - this_level_box_norms = [] - for i in range(self._config_dict['num_convs']): - if level == self._config_dict['min_level']: - box_conv_name = 'boxnet-conv_{}'.format(i) - self._box_convs.append(conv_op(name=box_conv_name, **conv_kwargs)) - box_norm_name = 'boxnet-conv-norm_{}_{}'.format(level, i) - this_level_box_norms.append(bn_op(name=box_norm_name, **bn_kwargs)) - self._box_norms.append(this_level_box_norms) - - box_regressor_kwargs = { - 'filters': (self._config_dict['num_params_per_anchor'] * - self._config_dict['num_anchors_per_location']), - 'kernel_size': 3, - 'padding': 'same', - 'bias_initializer': tf.zeros_initializer(), - 'bias_regularizer': self._config_dict['bias_regularizer'], - } - if not self._config_dict['use_separable_conv']: - box_regressor_kwargs.update({ - 'kernel_initializer': tf.keras.initializers.RandomNormal( - stddev=1e-5), - 'kernel_regularizer': self._config_dict['kernel_regularizer'], - }) - self._box_regressor = conv_op(name='boxes', **box_regressor_kwargs) - - # Attribute learning nets. - if self._config_dict['attribute_heads']: - self._att_predictors = {} - self._att_convs = {} - self._att_norms = {} - - for att_config in self._config_dict['attribute_heads']: - att_name = att_config['name'] - att_type = att_config['type'] - att_size = att_config['size'] - att_convs_i = [] - att_norms_i = [] - - # Build conv and norm layers. 
- for level in range(self._config_dict['min_level'], - self._config_dict['max_level'] + 1): - this_level_att_norms = [] - for i in range(self._config_dict['num_convs']): - if level == self._config_dict['min_level']: - att_conv_name = '{}-conv_{}'.format(att_name, i) - att_convs_i.append(conv_op(name=att_conv_name, **conv_kwargs)) - att_norm_name = '{}-conv-norm_{}_{}'.format(att_name, level, i) - this_level_att_norms.append(bn_op(name=att_norm_name, **bn_kwargs)) - att_norms_i.append(this_level_att_norms) - self._att_convs[att_name] = att_convs_i - self._att_norms[att_name] = att_norms_i - - # Build the final prediction layer. - att_predictor_kwargs = { - 'filters': - (att_size * self._config_dict['num_anchors_per_location']), - 'kernel_size': 3, - 'padding': 'same', - 'bias_initializer': tf.zeros_initializer(), - 'bias_regularizer': self._config_dict['bias_regularizer'], - } - if att_type == 'regression': - att_predictor_kwargs.update( - {'bias_initializer': tf.zeros_initializer()}) - elif att_type == 'classification': - att_predictor_kwargs.update({ - 'bias_initializer': - tf.constant_initializer(-np.log((1 - 0.01) / 0.01)) - }) - else: - raise ValueError( - 'Attribute head type {} not supported.'.format(att_type)) - - if not self._config_dict['use_separable_conv']: - att_predictor_kwargs.update({ - 'kernel_initializer': - tf.keras.initializers.RandomNormal(stddev=1e-5), - 'kernel_regularizer': - self._config_dict['kernel_regularizer'], - }) - - self._att_predictors[att_name] = conv_op( - name='{}_attributes'.format(att_name), **att_predictor_kwargs) - - super(RetinaNetHead, self).build(input_shape) - - def call(self, features: Mapping[str, tf.Tensor]): - """Forward pass of the RetinaNet head. - - Args: - features: A `dict` of `tf.Tensor` where - - key: A `str` of the level of the multilevel features. - - values: A `tf.Tensor`, the feature map tensors, whose shape is - [batch, height_l, width_l, channels]. 
- - Returns: - scores: A `dict` of `tf.Tensor` which includes scores of the predictions. - - key: A `str` of the level of the multilevel predictions. - - values: A `tf.Tensor` of the box scores predicted from a particular - feature level, whose shape is - [batch, height_l, width_l, num_classes * num_anchors_per_location]. - boxes: A `dict` of `tf.Tensor` which includes coordinates of the - predictions. - - key: A `str` of the level of the multilevel predictions. - - values: A `tf.Tensor` of the box scores predicted from a particular - feature level, whose shape is - [batch, height_l, width_l, - num_params_per_anchor * num_anchors_per_location]. - attributes: a dict of (attribute_name, attribute_prediction). Each - `attribute_prediction` is a dict of: - - key: `str`, the level of the multilevel predictions. - - values: `Tensor`, the box scores predicted from a particular feature - level, whose shape is - [batch, height_l, width_l, - attribute_size * num_anchors_per_location]. - Can be an empty dictionary if no attribute learning is required. - """ - scores = {} - boxes = {} - if self._config_dict['attribute_heads']: - attributes = { - att_config['name']: {} - for att_config in self._config_dict['attribute_heads'] - } - else: - attributes = {} - - for i, level in enumerate( - range(self._config_dict['min_level'], - self._config_dict['max_level'] + 1)): - this_level_features = features[str(level)] - - # class net. - x = this_level_features - for conv, norm in zip(self._cls_convs, self._cls_norms[i]): - x = conv(x) - x = norm(x) - x = self._activation(x) - scores[str(level)] = self._classifier(x) - - # box net. - x = this_level_features - for conv, norm in zip(self._box_convs, self._box_norms[i]): - x = conv(x) - x = norm(x) - x = self._activation(x) - boxes[str(level)] = self._box_regressor(x) - - # attribute nets. 
- if self._config_dict['attribute_heads']: - for att_config in self._config_dict['attribute_heads']: - att_name = att_config['name'] - x = this_level_features - for conv, norm in zip(self._att_convs[att_name], - self._att_norms[att_name][i]): - x = conv(x) - x = norm(x) - x = self._activation(x) - attributes[att_name][str(level)] = self._att_predictors[att_name](x) - - return scores, boxes, attributes - - def get_config(self): - return self._config_dict - - @classmethod - def from_config(cls, config): - return cls(**config) - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class RPNHead(tf.keras.layers.Layer): - """Creates a Region Proposal Network (RPN) head.""" - - def __init__( - self, - min_level: int, - max_level: int, - num_anchors_per_location: int, - num_convs: int = 1, - num_filters: int = 256, - use_separable_conv: bool = False, - activation: str = 'relu', - use_sync_bn: bool = False, - norm_momentum: float = 0.99, - norm_epsilon: float = 0.001, - kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - **kwargs): - """Initializes a Region Proposal Network head. - - Args: - min_level: An `int` number of minimum feature level. - max_level: An `int` number of maximum feature level. - num_anchors_per_location: An `int` number of number of anchors per pixel - location. - num_convs: An `int` number that represents the number of the intermediate - convolution layers before the prediction. - num_filters: An `int` number that represents the number of filters of the - intermediate convolution layers. - use_separable_conv: A `bool` that indicates whether the separable - convolution layers is used. - activation: A `str` that indicates which activation is used, e.g. 'relu', - 'swish', etc. - use_sync_bn: A `bool` that indicates whether to use synchronized batch - normalization across different replicas. 
- norm_momentum: A `float` of normalization momentum for the moving average. - norm_epsilon: A `float` added to variance to avoid dividing by zero. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for - Conv2D. Default is None. - bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. - **kwargs: Additional keyword arguments to be passed. - """ - super(RPNHead, self).__init__(**kwargs) - self._config_dict = { - 'min_level': min_level, - 'max_level': max_level, - 'num_anchors_per_location': num_anchors_per_location, - 'num_convs': num_convs, - 'num_filters': num_filters, - 'use_separable_conv': use_separable_conv, - 'activation': activation, - 'use_sync_bn': use_sync_bn, - 'norm_momentum': norm_momentum, - 'norm_epsilon': norm_epsilon, - 'kernel_regularizer': kernel_regularizer, - 'bias_regularizer': bias_regularizer, - } - - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - self._activation = tf_utils.get_activation(activation) - - def build(self, input_shape): - """Creates the variables of the head.""" - conv_op = (tf.keras.layers.SeparableConv2D - if self._config_dict['use_separable_conv'] - else tf.keras.layers.Conv2D) - conv_kwargs = { - 'filters': self._config_dict['num_filters'], - 'kernel_size': 3, - 'padding': 'same', - 'bias_initializer': tf.zeros_initializer(), - 'bias_regularizer': self._config_dict['bias_regularizer'], - } - if not self._config_dict['use_separable_conv']: - conv_kwargs.update({ - 'kernel_initializer': tf.keras.initializers.RandomNormal( - stddev=0.01), - 'kernel_regularizer': self._config_dict['kernel_regularizer'], - }) - bn_op = (tf.keras.layers.experimental.SyncBatchNormalization - if self._config_dict['use_sync_bn'] - else tf.keras.layers.BatchNormalization) - bn_kwargs = { - 'axis': self._bn_axis, - 'momentum': self._config_dict['norm_momentum'], - 'epsilon': self._config_dict['norm_epsilon'], - } - - self._convs = [] - self._norms = 
[] - for level in range( - self._config_dict['min_level'], self._config_dict['max_level'] + 1): - this_level_norms = [] - for i in range(self._config_dict['num_convs']): - if level == self._config_dict['min_level']: - conv_name = 'rpn-conv_{}'.format(i) - self._convs.append(conv_op(name=conv_name, **conv_kwargs)) - norm_name = 'rpn-conv-norm_{}_{}'.format(level, i) - this_level_norms.append(bn_op(name=norm_name, **bn_kwargs)) - self._norms.append(this_level_norms) - - classifier_kwargs = { - 'filters': self._config_dict['num_anchors_per_location'], - 'kernel_size': 1, - 'padding': 'valid', - 'bias_initializer': tf.zeros_initializer(), - 'bias_regularizer': self._config_dict['bias_regularizer'], - } - if not self._config_dict['use_separable_conv']: - classifier_kwargs.update({ - 'kernel_initializer': tf.keras.initializers.RandomNormal( - stddev=1e-5), - 'kernel_regularizer': self._config_dict['kernel_regularizer'], - }) - self._classifier = conv_op(name='rpn-scores', **classifier_kwargs) - - box_regressor_kwargs = { - 'filters': 4 * self._config_dict['num_anchors_per_location'], - 'kernel_size': 1, - 'padding': 'valid', - 'bias_initializer': tf.zeros_initializer(), - 'bias_regularizer': self._config_dict['bias_regularizer'], - } - if not self._config_dict['use_separable_conv']: - box_regressor_kwargs.update({ - 'kernel_initializer': tf.keras.initializers.RandomNormal( - stddev=1e-5), - 'kernel_regularizer': self._config_dict['kernel_regularizer'], - }) - self._box_regressor = conv_op(name='rpn-boxes', **box_regressor_kwargs) - - super(RPNHead, self).build(input_shape) - - def call(self, features: Mapping[str, tf.Tensor]): - """Forward pass of the RPN head. - - Args: - features: A `dict` of `tf.Tensor` where - - key: A `str` of the level of the multilevel features. - - values: A `tf.Tensor`, the feature map tensors, whose shape is [batch, - height_l, width_l, channels]. - - Returns: - scores: A `dict` of `tf.Tensor` which includes scores of the predictions. 
- - key: A `str` of the level of the multilevel predictions. - - values: A `tf.Tensor` of the box scores predicted from a particular - feature level, whose shape is - [batch, height_l, width_l, num_classes * num_anchors_per_location]. - boxes: A `dict` of `tf.Tensor` which includes coordinates of the - predictions. - - key: A `str` of the level of the multilevel predictions. - - values: A `tf.Tensor` of the box scores predicted from a particular - feature level, whose shape is - [batch, height_l, width_l, 4 * num_anchors_per_location]. - """ - scores = {} - boxes = {} - for i, level in enumerate( - range(self._config_dict['min_level'], - self._config_dict['max_level'] + 1)): - x = features[str(level)] - for conv, norm in zip(self._convs, self._norms[i]): - x = conv(x) - x = norm(x) - x = self._activation(x) - scores[str(level)] = self._classifier(x) - boxes[str(level)] = self._box_regressor(x) - return scores, boxes - - def get_config(self): - return self._config_dict - - @classmethod - def from_config(cls, config): - return cls(**config) diff --git a/official/vision/beta/modeling/heads/dense_prediction_heads_test.py b/official/vision/beta/modeling/heads/dense_prediction_heads_test.py deleted file mode 100644 index ee940c550e38700b3ee2d6b6d313a0b6448637d3..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/heads/dense_prediction_heads_test.py +++ /dev/null @@ -1,148 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Tests for dense_prediction_heads.py."""
-
-# Import libraries
-from absl.testing import parameterized
-import numpy as np
-import tensorflow as tf
-
-from official.vision.beta.modeling.heads import dense_prediction_heads
-
-
-class RetinaNetHeadTest(parameterized.TestCase, tf.test.TestCase):
-
-  @parameterized.parameters(
-      (False, False, False),
-      (False, True, False),
-      (True, False, True),
-      (True, True, True),
-  )
-  def test_forward(self, use_separable_conv, use_sync_bn, has_att_heads):
-    if has_att_heads:
-      attribute_heads = [dict(name='depth', type='regression', size=1)]
-    else:
-      attribute_heads = None
-
-    retinanet_head = dense_prediction_heads.RetinaNetHead(
-        min_level=3,
-        max_level=4,
-        num_classes=3,
-        num_anchors_per_location=3,
-        num_convs=2,
-        num_filters=256,
-        attribute_heads=attribute_heads,
-        use_separable_conv=use_separable_conv,
-        activation='relu',
-        use_sync_bn=use_sync_bn,
-        norm_momentum=0.99,
-        norm_epsilon=0.001,
-        kernel_regularizer=None,
-        bias_regularizer=None,
-    )
-    features = {
-        '3': np.random.rand(2, 128, 128, 16),
-        '4': np.random.rand(2, 64, 64, 16),
-    }
-    scores, boxes, attributes = retinanet_head(features)
-    self.assertAllEqual(scores['3'].numpy().shape, [2, 128, 128, 9])
-    self.assertAllEqual(scores['4'].numpy().shape, [2, 64, 64, 9])
-    self.assertAllEqual(boxes['3'].numpy().shape, [2, 128, 128, 12])
-    self.assertAllEqual(boxes['4'].numpy().shape, [2, 64, 64, 12])
-    if has_att_heads:
-      for att in attributes.values():
-        self.assertAllEqual(att['3'].numpy().shape, [2, 128, 128, 3])
-        self.assertAllEqual(att['4'].numpy().shape, [2, 64, 64, 3])
-
-  def test_serialize_deserialize(self):
-    retinanet_head = dense_prediction_heads.RetinaNetHead(
-        min_level=3,
-        max_level=7,
-        num_classes=3,
-        num_anchors_per_location=9,
-        num_convs=2,
-        num_filters=16,
-        attribute_heads=None,
-        use_separable_conv=False,
-        activation='relu',
-        use_sync_bn=False,
-        norm_momentum=0.99,
-        norm_epsilon=0.001,
-        kernel_regularizer=None,
-        bias_regularizer=None,
-    )
-    config = retinanet_head.get_config()
-    new_retinanet_head = (
-        dense_prediction_heads.RetinaNetHead.from_config(config))
-    self.assertAllEqual(
-        retinanet_head.get_config(), new_retinanet_head.get_config())
-
-
-class RpnHeadTest(parameterized.TestCase, tf.test.TestCase):
-
-  @parameterized.parameters(
-      (False, False),
-      (False, True),
-      (True, False),
-      (True, True),
-  )
-  def test_forward(self, use_separable_conv, use_sync_bn):
-    rpn_head = dense_prediction_heads.RPNHead(
-        min_level=3,
-        max_level=4,
-        num_anchors_per_location=3,
-        num_convs=2,
-        num_filters=256,
-        use_separable_conv=use_separable_conv,
-        activation='relu',
-        use_sync_bn=use_sync_bn,
-        norm_momentum=0.99,
-        norm_epsilon=0.001,
-        kernel_regularizer=None,
-        bias_regularizer=None,
-    )
-    features = {
-        '3': np.random.rand(2, 128, 128, 16),
-        '4': np.random.rand(2, 64, 64, 16),
-    }
-    scores, boxes = rpn_head(features)
-    self.assertAllEqual(scores['3'].numpy().shape, [2, 128, 128, 3])
-    self.assertAllEqual(scores['4'].numpy().shape, [2, 64, 64, 3])
-    self.assertAllEqual(boxes['3'].numpy().shape, [2, 128, 128, 12])
-    self.assertAllEqual(boxes['4'].numpy().shape, [2, 64, 64, 12])
-
-  def test_serialize_deserialize(self):
-    rpn_head = dense_prediction_heads.RPNHead(
-        min_level=3,
-        max_level=7,
-        num_anchors_per_location=9,
-        num_convs=2,
-        num_filters=16,
-        use_separable_conv=False,
-        activation='relu',
-        use_sync_bn=False,
-        norm_momentum=0.99,
-        norm_epsilon=0.001,
-        kernel_regularizer=None,
-        bias_regularizer=None,
-    )
-    config = rpn_head.get_config()
-    new_rpn_head = dense_prediction_heads.RPNHead.from_config(config)
-    self.assertAllEqual(rpn_head.get_config(), new_rpn_head.get_config())
-
-
-if __name__ == '__main__':
-  tf.test.main()
diff --git a/official/vision/beta/modeling/heads/instance_heads.py b/official/vision/beta/modeling/heads/instance_heads.py
deleted file mode 100644
index fd492dd22a6d30b727b6c1cc2c67979337329307..0000000000000000000000000000000000000000
--- a/official/vision/beta/modeling/heads/instance_heads.py
+++ /dev/null
@@ -1,444 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Contains definitions of instance prediction heads."""
-
-from typing import List, Union, Optional
-# Import libraries
-import tensorflow as tf
-
-from official.modeling import tf_utils
-
-
-@tf.keras.utils.register_keras_serializable(package='Vision')
-class DetectionHead(tf.keras.layers.Layer):
-  """Creates a detection head."""
-
-  def __init__(
-      self,
-      num_classes: int,
-      num_convs: int = 0,
-      num_filters: int = 256,
-      use_separable_conv: bool = False,
-      num_fcs: int = 2,
-      fc_dims: int = 1024,
-      class_agnostic_bbox_pred: bool = False,
-      activation: str = 'relu',
-      use_sync_bn: bool = False,
-      norm_momentum: float = 0.99,
-      norm_epsilon: float = 0.001,
-      kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
-      bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
-      **kwargs):
-    """Initializes a detection head.
-
-    Args:
-      num_classes: An `int` for the number of classes.
-      num_convs: An `int` number that represents the number of the intermediate
-        convolution layers before the FC layers.
-      num_filters: An `int` number that represents the number of filters of the
-        intermediate convolution layers.
-      use_separable_conv: A `bool` that indicates whether the separable
-        convolution layers is used.
-      num_fcs: An `int` number that represents the number of FC layers before
-        the predictions.
-      fc_dims: An `int` number that represents the number of dimension of the FC
-        layers.
-      class_agnostic_bbox_pred: `bool`, indicating whether bboxes should be
-        predicted for every class or not.
-      activation: A `str` that indicates which activation is used, e.g. 'relu',
-        'swish', etc.
-      use_sync_bn: A `bool` that indicates whether to use synchronized batch
-        normalization across different replicas.
-      norm_momentum: A `float` of normalization momentum for the moving average.
-      norm_epsilon: A `float` added to variance to avoid dividing by zero.
-      kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for
-        Conv2D. Default is None.
-      bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D.
-      **kwargs: Additional keyword arguments to be passed.
-    """
-    super(DetectionHead, self).__init__(**kwargs)
-    self._config_dict = {
-        'num_classes': num_classes,
-        'num_convs': num_convs,
-        'num_filters': num_filters,
-        'use_separable_conv': use_separable_conv,
-        'num_fcs': num_fcs,
-        'fc_dims': fc_dims,
-        'class_agnostic_bbox_pred': class_agnostic_bbox_pred,
-        'activation': activation,
-        'use_sync_bn': use_sync_bn,
-        'norm_momentum': norm_momentum,
-        'norm_epsilon': norm_epsilon,
-        'kernel_regularizer': kernel_regularizer,
-        'bias_regularizer': bias_regularizer,
-    }
-
-    if tf.keras.backend.image_data_format() == 'channels_last':
-      self._bn_axis = -1
-    else:
-      self._bn_axis = 1
-    self._activation = tf_utils.get_activation(activation)
-
-  def build(self, input_shape: Union[tf.TensorShape, List[tf.TensorShape]]):
-    """Creates the variables of the head."""
-    conv_op = (tf.keras.layers.SeparableConv2D
-               if self._config_dict['use_separable_conv']
-               else tf.keras.layers.Conv2D)
-    conv_kwargs = {
-        'filters': self._config_dict['num_filters'],
-        'kernel_size': 3,
-        'padding': 'same',
-    }
-    if self._config_dict['use_separable_conv']:
-      conv_kwargs.update({
-          'depthwise_initializer': tf.keras.initializers.VarianceScaling(
-              scale=2, mode='fan_out', distribution='untruncated_normal'),
-          'pointwise_initializer': tf.keras.initializers.VarianceScaling(
-              scale=2, mode='fan_out', distribution='untruncated_normal'),
-          'bias_initializer': tf.zeros_initializer(),
-          'depthwise_regularizer': self._config_dict['kernel_regularizer'],
-          'pointwise_regularizer': self._config_dict['kernel_regularizer'],
-          'bias_regularizer': self._config_dict['bias_regularizer'],
-      })
-    else:
-      conv_kwargs.update({
-          'kernel_initializer': tf.keras.initializers.VarianceScaling(
-              scale=2, mode='fan_out', distribution='untruncated_normal'),
-          'bias_initializer': tf.zeros_initializer(),
-          'kernel_regularizer': self._config_dict['kernel_regularizer'],
-          'bias_regularizer': self._config_dict['bias_regularizer'],
-      })
-    bn_op = (tf.keras.layers.experimental.SyncBatchNormalization
-             if self._config_dict['use_sync_bn']
-             else tf.keras.layers.BatchNormalization)
-    bn_kwargs = {
-        'axis': self._bn_axis,
-        'momentum': self._config_dict['norm_momentum'],
-        'epsilon': self._config_dict['norm_epsilon'],
-    }
-
-    self._convs = []
-    self._conv_norms = []
-    for i in range(self._config_dict['num_convs']):
-      conv_name = 'detection-conv_{}'.format(i)
-      self._convs.append(conv_op(name=conv_name, **conv_kwargs))
-      bn_name = 'detection-conv-bn_{}'.format(i)
-      self._conv_norms.append(bn_op(name=bn_name, **bn_kwargs))
-
-    self._fcs = []
-    self._fc_norms = []
-    for i in range(self._config_dict['num_fcs']):
-      fc_name = 'detection-fc_{}'.format(i)
-      self._fcs.append(
-          tf.keras.layers.Dense(
-              units=self._config_dict['fc_dims'],
-              kernel_initializer=tf.keras.initializers.VarianceScaling(
-                  scale=1 / 3.0, mode='fan_out', distribution='uniform'),
-              kernel_regularizer=self._config_dict['kernel_regularizer'],
-              bias_regularizer=self._config_dict['bias_regularizer'],
-              name=fc_name))
-      bn_name = 'detection-fc-bn_{}'.format(i)
-      self._fc_norms.append(bn_op(name=bn_name, **bn_kwargs))
-
-    self._classifier = tf.keras.layers.Dense(
-        units=self._config_dict['num_classes'],
-        kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01),
-        bias_initializer=tf.zeros_initializer(),
-        kernel_regularizer=self._config_dict['kernel_regularizer'],
-        bias_regularizer=self._config_dict['bias_regularizer'],
-        name='detection-scores')
-
-    num_box_outputs = (4 if self._config_dict['class_agnostic_bbox_pred'] else
-                       self._config_dict['num_classes'] * 4)
-    self._box_regressor = tf.keras.layers.Dense(
-        units=num_box_outputs,
-        kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.001),
-        bias_initializer=tf.zeros_initializer(),
-        kernel_regularizer=self._config_dict['kernel_regularizer'],
-        bias_regularizer=self._config_dict['bias_regularizer'],
-        name='detection-boxes')
-
-    super(DetectionHead, self).build(input_shape)
-
-  def call(self, inputs: tf.Tensor, training: bool = None):
-    """Forward pass of box and class branches for the Mask-RCNN model.
-
-    Args:
-      inputs: A `tf.Tensor` of the shape [batch_size, num_instances, roi_height,
-        roi_width, roi_channels], representing the ROI features.
-      training: a `bool` indicating whether it is in `training` mode.
-
-    Returns:
-      class_outputs: A `tf.Tensor` of the shape
-        [batch_size, num_rois, num_classes], representing the class predictions.
-      box_outputs: A `tf.Tensor` of the shape
-        [batch_size, num_rois, num_classes * 4], representing the box
-        predictions.
-    """
-    roi_features = inputs
-    _, num_rois, height, width, filters = roi_features.get_shape().as_list()
-
-    x = tf.reshape(roi_features, [-1, height, width, filters])
-    for conv, bn in zip(self._convs, self._conv_norms):
-      x = conv(x)
-      x = bn(x)
-      x = self._activation(x)
-
-    _, _, _, filters = x.get_shape().as_list()
-    x = tf.reshape(x, [-1, num_rois, height * width * filters])
-
-    for fc, bn in zip(self._fcs, self._fc_norms):
-      x = fc(x)
-      x = bn(x)
-      x = self._activation(x)
-
-    classes = self._classifier(x)
-    boxes = self._box_regressor(x)
-    return classes, boxes
-
-  def get_config(self):
-    return self._config_dict
-
-  @classmethod
-  def from_config(cls, config):
-    return cls(**config)
-
-
-@tf.keras.utils.register_keras_serializable(package='Vision')
-class MaskHead(tf.keras.layers.Layer):
-  """Creates a mask head."""
-
-  def __init__(
-      self,
-      num_classes: int,
-      upsample_factor: int = 2,
-      num_convs: int = 4,
-      num_filters: int = 256,
-      use_separable_conv: bool = False,
-      activation: str = 'relu',
-      use_sync_bn: bool = False,
-      norm_momentum: float = 0.99,
-      norm_epsilon: float = 0.001,
-      kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
-      bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None,
-      class_agnostic: bool = False,
-      **kwargs):
-    """Initializes a mask head.
-
-    Args:
-      num_classes: An `int` of the number of classes.
-      upsample_factor: An `int` that indicates the upsample factor to generate
-        the final predicted masks. It should be >= 1.
-      num_convs: An `int` number that represents the number of the intermediate
-        convolution layers before the mask prediction layers.
-      num_filters: An `int` number that represents the number of filters of the
-        intermediate convolution layers.
-      use_separable_conv: A `bool` that indicates whether the separable
-        convolution layers is used.
-      activation: A `str` that indicates which activation is used, e.g. 'relu',
-        'swish', etc.
-      use_sync_bn: A `bool` that indicates whether to use synchronized batch
-        normalization across different replicas.
-      norm_momentum: A `float` of normalization momentum for the moving average.
-      norm_epsilon: A `float` added to variance to avoid dividing by zero.
-      kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for
-        Conv2D. Default is None.
-      bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D.
-      class_agnostic: A `bool`. If set, we use a single channel mask head that
-        is shared between all classes.
-      **kwargs: Additional keyword arguments to be passed.
-    """
-    super(MaskHead, self).__init__(**kwargs)
-    self._config_dict = {
-        'num_classes': num_classes,
-        'upsample_factor': upsample_factor,
-        'num_convs': num_convs,
-        'num_filters': num_filters,
-        'use_separable_conv': use_separable_conv,
-        'activation': activation,
-        'use_sync_bn': use_sync_bn,
-        'norm_momentum': norm_momentum,
-        'norm_epsilon': norm_epsilon,
-        'kernel_regularizer': kernel_regularizer,
-        'bias_regularizer': bias_regularizer,
-        'class_agnostic': class_agnostic
-    }
-
-    if tf.keras.backend.image_data_format() == 'channels_last':
-      self._bn_axis = -1
-    else:
-      self._bn_axis = 1
-    self._activation = tf_utils.get_activation(activation)
-
-  def build(self, input_shape: Union[tf.TensorShape, List[tf.TensorShape]]):
-    """Creates the variables of the head."""
-    conv_op = (tf.keras.layers.SeparableConv2D
-               if self._config_dict['use_separable_conv']
-               else tf.keras.layers.Conv2D)
-    conv_kwargs = {
-        'filters': self._config_dict['num_filters'],
-        'kernel_size': 3,
-        'padding': 'same',
-    }
-    if self._config_dict['use_separable_conv']:
-      conv_kwargs.update({
-          'depthwise_initializer': tf.keras.initializers.VarianceScaling(
-              scale=2, mode='fan_out', distribution='untruncated_normal'),
-          'pointwise_initializer': tf.keras.initializers.VarianceScaling(
-              scale=2, mode='fan_out', distribution='untruncated_normal'),
-          'bias_initializer': tf.zeros_initializer(),
-          'depthwise_regularizer': self._config_dict['kernel_regularizer'],
-          'pointwise_regularizer': self._config_dict['kernel_regularizer'],
-          'bias_regularizer': self._config_dict['bias_regularizer'],
-      })
-    else:
-      conv_kwargs.update({
-          'kernel_initializer': tf.keras.initializers.VarianceScaling(
-              scale=2, mode='fan_out', distribution='untruncated_normal'),
-          'bias_initializer': tf.zeros_initializer(),
-          'kernel_regularizer': self._config_dict['kernel_regularizer'],
-          'bias_regularizer': self._config_dict['bias_regularizer'],
-      })
-    bn_op = (tf.keras.layers.experimental.SyncBatchNormalization
-             if self._config_dict['use_sync_bn']
-             else tf.keras.layers.BatchNormalization)
-    bn_kwargs = {
-        'axis': self._bn_axis,
-        'momentum': self._config_dict['norm_momentum'],
-        'epsilon': self._config_dict['norm_epsilon'],
-    }
-
-    self._convs = []
-    self._conv_norms = []
-    for i in range(self._config_dict['num_convs']):
-      conv_name = 'mask-conv_{}'.format(i)
-      self._convs.append(conv_op(name=conv_name, **conv_kwargs))
-      bn_name = 'mask-conv-bn_{}'.format(i)
-      self._conv_norms.append(bn_op(name=bn_name, **bn_kwargs))
-
-    self._deconv = tf.keras.layers.Conv2DTranspose(
-        filters=self._config_dict['num_filters'],
-        kernel_size=self._config_dict['upsample_factor'],
-        strides=self._config_dict['upsample_factor'],
-        padding='valid',
-        kernel_initializer=tf.keras.initializers.VarianceScaling(
-            scale=2, mode='fan_out', distribution='untruncated_normal'),
-        bias_initializer=tf.zeros_initializer(),
-        kernel_regularizer=self._config_dict['kernel_regularizer'],
-        bias_regularizer=self._config_dict['bias_regularizer'],
-        name='mask-upsampling')
-    self._deconv_bn = bn_op(name='mask-deconv-bn', **bn_kwargs)
-
-    if self._config_dict['class_agnostic']:
-      num_filters = 1
-    else:
-      num_filters = self._config_dict['num_classes']
-
-    conv_kwargs = {
-        'filters': num_filters,
-        'kernel_size': 1,
-        'padding': 'valid',
-    }
-    if self._config_dict['use_separable_conv']:
-      conv_kwargs.update({
-          'depthwise_initializer': tf.keras.initializers.VarianceScaling(
-              scale=2, mode='fan_out', distribution='untruncated_normal'),
-          'pointwise_initializer': tf.keras.initializers.VarianceScaling(
-              scale=2, mode='fan_out', distribution='untruncated_normal'),
-          'bias_initializer': tf.zeros_initializer(),
-          'depthwise_regularizer': self._config_dict['kernel_regularizer'],
-          'pointwise_regularizer': self._config_dict['kernel_regularizer'],
-          'bias_regularizer': self._config_dict['bias_regularizer'],
-      })
-    else:
-      conv_kwargs.update({
-          'kernel_initializer': tf.keras.initializers.VarianceScaling(
-              scale=2, mode='fan_out', distribution='untruncated_normal'),
-          'bias_initializer': tf.zeros_initializer(),
-          'kernel_regularizer': self._config_dict['kernel_regularizer'],
-          'bias_regularizer': self._config_dict['bias_regularizer'],
-      })
-    self._mask_regressor = conv_op(name='mask-logits', **conv_kwargs)
-
-    super(MaskHead, self).build(input_shape)
-
-  def call(self, inputs: List[tf.Tensor], training: bool = None):
-    """Forward pass of mask branch for the Mask-RCNN model.
-
-    Args:
-      inputs: A `list` of two tensors where
-        inputs[0]: A `tf.Tensor` of shape [batch_size, num_instances,
-          roi_height, roi_width, roi_channels], representing the ROI features.
-        inputs[1]: A `tf.Tensor` of shape [batch_size, num_instances],
-          representing the classes of the ROIs.
-      training: A `bool` indicating whether it is in `training` mode.
-
-    Returns:
-      mask_outputs: A `tf.Tensor` of shape
-        [batch_size, num_instances, roi_height * upsample_factor,
-         roi_width * upsample_factor], representing the mask predictions.
-    """
-    roi_features, roi_classes = inputs
-    batch_size, num_rois, height, width, filters = (
-        roi_features.get_shape().as_list())
-    if batch_size is None:
-      batch_size = tf.shape(roi_features)[0]
-
-    x = tf.reshape(roi_features, [-1, height, width, filters])
-    for conv, bn in zip(self._convs, self._conv_norms):
-      x = conv(x)
-      x = bn(x)
-      x = self._activation(x)
-
-    x = self._deconv(x)
-    x = self._deconv_bn(x)
-    x = self._activation(x)
-
-    logits = self._mask_regressor(x)
-
-    mask_height = height * self._config_dict['upsample_factor']
-    mask_width = width * self._config_dict['upsample_factor']
-
-    if self._config_dict['class_agnostic']:
-      logits = tf.reshape(logits, [-1, num_rois, mask_height, mask_width, 1])
-    else:
-      logits = tf.reshape(
-          logits,
-          [-1, num_rois, mask_height, mask_width,
-           self._config_dict['num_classes']])
-
-    batch_indices = tf.tile(
-        tf.expand_dims(tf.range(batch_size), axis=1), [1, num_rois])
-    mask_indices = tf.tile(
-        tf.expand_dims(tf.range(num_rois), axis=0), [batch_size, 1])
-
-    if self._config_dict['class_agnostic']:
-      class_gather_indices = tf.zeros_like(roi_classes, dtype=tf.int32)
-    else:
-      class_gather_indices = tf.cast(roi_classes, dtype=tf.int32)
-
-    gather_indices = tf.stack(
-        [batch_indices, mask_indices, class_gather_indices],
-        axis=2)
-    mask_outputs = tf.gather_nd(
-        tf.transpose(logits, [0, 1, 4, 2, 3]), gather_indices)
-    return mask_outputs
-
-  def get_config(self):
-    return self._config_dict
-
-  @classmethod
-  def from_config(cls, config):
-    return cls(**config)
diff --git a/official/vision/beta/modeling/heads/instance_heads_test.py b/official/vision/beta/modeling/heads/instance_heads_test.py
deleted file mode 100644
index 2f87705ecae7e9a63e45410cf84e8546511540ab..0000000000000000000000000000000000000000
--- a/official/vision/beta/modeling/heads/instance_heads_test.py
+++ /dev/null
@@ -1,135 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Tests for instance_heads.py."""
-
-# Import libraries
-from absl.testing import parameterized
-import numpy as np
-import tensorflow as tf
-
-from official.vision.beta.modeling.heads import instance_heads
-
-
-class DetectionHeadTest(parameterized.TestCase, tf.test.TestCase):
-
-  @parameterized.parameters(
-      (0, 0, False, False),
-      (0, 1, False, False),
-      (1, 0, False, False),
-      (1, 1, False, False),
-  )
-  def test_forward(self, num_convs, num_fcs, use_separable_conv, use_sync_bn):
-    detection_head = instance_heads.DetectionHead(
-        num_classes=3,
-        num_convs=num_convs,
-        num_filters=16,
-        use_separable_conv=use_separable_conv,
-        num_fcs=num_fcs,
-        fc_dims=4,
-        activation='relu',
-        use_sync_bn=use_sync_bn,
-        norm_momentum=0.99,
-        norm_epsilon=0.001,
-        kernel_regularizer=None,
-        bias_regularizer=None,
-    )
-    roi_features = np.random.rand(2, 10, 128, 128, 16)
-    scores, boxes = detection_head(roi_features)
-    self.assertAllEqual(scores.numpy().shape, [2, 10, 3])
-    self.assertAllEqual(boxes.numpy().shape, [2, 10, 12])
-
-  def test_serialize_deserialize(self):
-    detection_head = instance_heads.DetectionHead(
-        num_classes=91,
-        num_convs=0,
-        num_filters=256,
-        use_separable_conv=False,
-        num_fcs=2,
-        fc_dims=1024,
-        activation='relu',
-        use_sync_bn=False,
-        norm_momentum=0.99,
-        norm_epsilon=0.001,
-        kernel_regularizer=None,
-        bias_regularizer=None,
-    )
-    config = detection_head.get_config()
-    new_detection_head = instance_heads.DetectionHead.from_config(config)
-    self.assertAllEqual(
-        detection_head.get_config(), new_detection_head.get_config())
-
-
-class MaskHeadTest(parameterized.TestCase, tf.test.TestCase):
-
-  @parameterized.parameters(
-      (1, 1, False),
-      (1, 2, False),
-      (2, 1, False),
-      (2, 2, False),
-  )
-  def test_forward(self, upsample_factor, num_convs, use_sync_bn):
-    mask_head = instance_heads.MaskHead(
-        num_classes=3,
-        upsample_factor=upsample_factor,
-        num_convs=num_convs,
-        num_filters=16,
-        use_separable_conv=False,
-        activation='relu',
-        use_sync_bn=use_sync_bn,
-        norm_momentum=0.99,
-        norm_epsilon=0.001,
-        kernel_regularizer=None,
-        bias_regularizer=None,
-    )
-    roi_features = np.random.rand(2, 10, 14, 14, 16)
-    roi_classes = np.zeros((2, 10))
-    masks = mask_head([roi_features, roi_classes])
-    self.assertAllEqual(
-        masks.numpy().shape,
-        [2, 10, 14 * upsample_factor, 14 * upsample_factor])
-
-  def test_serialize_deserialize(self):
-    mask_head = instance_heads.MaskHead(
-        num_classes=3,
-        upsample_factor=2,
-        num_convs=1,
-        num_filters=256,
-        use_separable_conv=False,
-        activation='relu',
-        use_sync_bn=False,
-        norm_momentum=0.99,
-        norm_epsilon=0.001,
-        kernel_regularizer=None,
-        bias_regularizer=None,
-    )
-    config = mask_head.get_config()
-    new_mask_head = instance_heads.MaskHead.from_config(config)
-    self.assertAllEqual(
-        mask_head.get_config(), new_mask_head.get_config())
-
-  def test_forward_class_agnostic(self):
-    mask_head = instance_heads.MaskHead(
-        num_classes=3,
-        class_agnostic=True
-    )
-    roi_features = np.random.rand(2, 10, 14, 14, 16)
-    roi_classes = np.zeros((2, 10))
-    masks = mask_head([roi_features, roi_classes])
-    self.assertAllEqual(masks.numpy().shape, [2, 10, 28, 28])
-
-
-if __name__ == '__main__':
-  tf.test.main()
diff --git a/official/vision/beta/modeling/layers/__init__.py b/official/vision/beta/modeling/layers/__init__.py
deleted file mode 100644
index 4e74bf6083c023cc76432d0afb1f829658d53f44..0000000000000000000000000000000000000000
--- a/official/vision/beta/modeling/layers/__init__.py
+++ /dev/null
@@ -1,44 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Layers package definition."""
-
-from official.vision.beta.modeling.layers.box_sampler import BoxSampler
-from official.vision.beta.modeling.layers.detection_generator import DetectionGenerator
-from official.vision.beta.modeling.layers.detection_generator import MultilevelDetectionGenerator
-from official.vision.beta.modeling.layers.mask_sampler import MaskSampler
-from official.vision.beta.modeling.layers.nn_blocks import BottleneckBlock
-from official.vision.beta.modeling.layers.nn_blocks import BottleneckResidualInner
-from official.vision.beta.modeling.layers.nn_blocks import DepthwiseSeparableConvBlock
-from official.vision.beta.modeling.layers.nn_blocks import InvertedBottleneckBlock
-from official.vision.beta.modeling.layers.nn_blocks import ResidualBlock
-from official.vision.beta.modeling.layers.nn_blocks import ResidualInner
-from official.vision.beta.modeling.layers.nn_blocks import ReversibleLayer
-from official.vision.beta.modeling.layers.nn_blocks_3d import BottleneckBlock3D
-from official.vision.beta.modeling.layers.nn_blocks_3d import SelfGating
-from official.vision.beta.modeling.layers.nn_layers import CausalConvMixin
-from official.vision.beta.modeling.layers.nn_layers import Conv2D
-from official.vision.beta.modeling.layers.nn_layers import Conv3D
-from official.vision.beta.modeling.layers.nn_layers import DepthwiseConv2D
-from official.vision.beta.modeling.layers.nn_layers import GlobalAveragePool3D
-from official.vision.beta.modeling.layers.nn_layers import PositionalEncoding
-from official.vision.beta.modeling.layers.nn_layers import Scale
-from official.vision.beta.modeling.layers.nn_layers import SpatialAveragePool3D
-from official.vision.beta.modeling.layers.nn_layers import SqueezeExcitation
-from official.vision.beta.modeling.layers.nn_layers import StochasticDepth
-from official.vision.beta.modeling.layers.nn_layers import TemporalSoftmaxPool
-from official.vision.beta.modeling.layers.roi_aligner import MultilevelROIAligner
-from official.vision.beta.modeling.layers.roi_generator import MultilevelROIGenerator
-from official.vision.beta.modeling.layers.roi_sampler import ROISampler
diff --git a/official/vision/beta/modeling/layers/detection_generator.py b/official/vision/beta/modeling/layers/detection_generator.py
deleted file mode 100644
index 0460706c98cdd89392f1f588ba5cbfb8195dda76..0000000000000000000000000000000000000000
--- a/official/vision/beta/modeling/layers/detection_generator.py
+++ /dev/null
@@ -1,852 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Contains definitions of generators to generate the final detections."""
-import contextlib
-from typing import List, Optional, Mapping
-# Import libraries
-import tensorflow as tf
-
-from official.vision.beta.ops import box_ops
-from official.vision.beta.ops import nms
-from official.vision.beta.ops import preprocess_ops
-
-
-def _generate_detections_v1(boxes: tf.Tensor,
-                            scores: tf.Tensor,
-                            attributes: Optional[Mapping[str,
-                                                         tf.Tensor]] = None,
-                            pre_nms_top_k: int = 5000,
-                            pre_nms_score_threshold: float = 0.05,
-                            nms_iou_threshold: float = 0.5,
-                            max_num_detections: int = 100,
-                            soft_nms_sigma: Optional[float] = None):
-  """Generates the final detections given the model outputs.
-
-  The implementation unrolls the batch dimension and process images one by one.
-  It required the batch dimension to be statically known and it is TPU
-  compatible.
-
-  Args:
-    boxes: A `tf.Tensor` with shape `[batch_size, N, num_classes, 4]` or
-      `[batch_size, N, 1, 4]` for box predictions on all feature levels. The
-      N is the number of total anchors on all levels.
-    scores: A `tf.Tensor` with shape `[batch_size, N, num_classes]`, which
-      stacks class probability on all feature levels. The N is the number of
-      total anchors on all levels. The num_classes is the number of classes
-      predicted by the model. Note that the class_outputs here is the raw score.
-    attributes: None or a dict of (attribute_name, attributes) pairs. Each
-      attributes is a `tf.Tensor` with shape
-      `[batch_size, N, num_classes, attribute_size]` or
-      `[batch_size, N, 1, attribute_size]` for attribute predictions on all
-      feature levels. The N is the number of total anchors on all levels. Can
-      be None if no attribute learning is required.
-    pre_nms_top_k: An `int` number of top candidate detections per class before
-      NMS.
-    pre_nms_score_threshold: A `float` representing the threshold for deciding
-      when to remove boxes based on score.
-    nms_iou_threshold: A `float` representing the threshold for deciding whether
-      boxes overlap too much with respect to IOU.
-    max_num_detections: A scalar representing maximum number of boxes retained
-      over all classes.
-    soft_nms_sigma: A `float` representing the sigma parameter for Soft NMS.
-      When soft_nms_sigma=0.0 (which is default), we fall back to standard NMS.
-
-  Returns:
-    nms_boxes: A `float` type `tf.Tensor` of shape
-      `[batch_size, max_num_detections, 4]` representing top detected boxes in
-      `[y1, x1, y2, x2]`.
-    nms_scores: A `float` type `tf.Tensor` of shape
-      `[batch_size, max_num_detections]` representing sorted confidence scores
-      for detected boxes. The values are between `[0, 1]`.
-    nms_classes: An `int` type `tf.Tensor` of shape
-      `[batch_size, max_num_detections]` representing classes for detected
-      boxes.
-    valid_detections: An `int` type `tf.Tensor` of shape `[batch_size]` only the
-      top `valid_detections` boxes are valid detections.
-    nms_attributes: None or a dict of (attribute_name, attributes). Each
-      attribute is a `float` type `tf.Tensor` of shape
-      `[batch_size, max_num_detections, attribute_size]` representing attribute
-      predictions for detected boxes. Can be an empty dict if no attribute
-      learning is required.
- """ - with tf.name_scope('generate_detections'): - batch_size = scores.get_shape().as_list()[0] - nmsed_boxes = [] - nmsed_classes = [] - nmsed_scores = [] - valid_detections = [] - if attributes: - nmsed_attributes = {att_name: [] for att_name in attributes.keys()} - else: - nmsed_attributes = {} - - for i in range(batch_size): - (nmsed_boxes_i, nmsed_scores_i, nmsed_classes_i, valid_detections_i, - nmsed_att_i) = _generate_detections_per_image( - boxes[i], - scores[i], - attributes={ - att_name: att[i] for att_name, att in attributes.items() - } if attributes else {}, - pre_nms_top_k=pre_nms_top_k, - pre_nms_score_threshold=pre_nms_score_threshold, - nms_iou_threshold=nms_iou_threshold, - max_num_detections=max_num_detections, - soft_nms_sigma=soft_nms_sigma) - nmsed_boxes.append(nmsed_boxes_i) - nmsed_scores.append(nmsed_scores_i) - nmsed_classes.append(nmsed_classes_i) - valid_detections.append(valid_detections_i) - if attributes: - for att_name in attributes.keys(): - nmsed_attributes[att_name].append(nmsed_att_i[att_name]) - - nmsed_boxes = tf.stack(nmsed_boxes, axis=0) - nmsed_scores = tf.stack(nmsed_scores, axis=0) - nmsed_classes = tf.stack(nmsed_classes, axis=0) - valid_detections = tf.stack(valid_detections, axis=0) - if attributes: - for att_name in attributes.keys(): - nmsed_attributes[att_name] = tf.stack(nmsed_attributes[att_name], axis=0) - - return nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections, nmsed_attributes - - -def _generate_detections_per_image( - boxes: tf.Tensor, - scores: tf.Tensor, - attributes: Optional[Mapping[str, tf.Tensor]] = None, - pre_nms_top_k: int = 5000, - pre_nms_score_threshold: float = 0.05, - nms_iou_threshold: float = 0.5, - max_num_detections: int = 100, - soft_nms_sigma: Optional[float] = None): - """Generates the final detections per image given the model outputs. 
- - Args: - boxes: A `tf.Tensor` with shape `[N, num_classes, 4]` or `[N, 1, 4]`, which - stacks box predictions on all feature levels.
The N is the number of total - anchors on all levels. - scores: A `tf.Tensor` with shape `[N, num_classes]`, which stacks class - probability on all feature levels. The N is the number of total anchors on - all levels. The num_classes is the number of classes predicted by the - model. Note that the class_outputs here is the raw score. - attributes: If not None, a dict of `tf.Tensor`. Each value is in shape - `[N, num_classes, attribute_size]` or `[N, 1, attribute_size]` of - attribute predictions on all feature levels. The N is the number of total - anchors on all levels. - pre_nms_top_k: An `int` number of top candidate detections per class before - NMS. - pre_nms_score_threshold: A `float` representing the threshold for deciding - when to remove boxes based on score. - nms_iou_threshold: A `float` representing the threshold for deciding whether - boxes overlap too much with respect to IOU. - max_num_detections: A `scalar` representing maximum number of boxes retained - over all classes. - soft_nms_sigma: A `float` representing the sigma parameter for Soft NMS. - When soft_nms_sigma=0.0, we fall back to standard NMS. - If set to None, `tf.image.non_max_suppression_padded` is called instead. - - Returns: - nms_boxes: A `float` tf.Tensor of shape `[max_num_detections, 4]` - representing top detected boxes in `[y1, x1, y2, x2]`. - nms_scores: A `float` tf.Tensor of shape `[max_num_detections]` representing - sorted confidence scores for detected boxes. The values are between [0, - 1]. - nms_classes: An `int` tf.Tensor of shape `[max_num_detections]` representing - classes for detected boxes. - valid_detections: A scalar `int` tf.Tensor; only the top - `valid_detections` boxes are valid detections. - nms_attributes: None or a dict. Each value is a `float` tf.Tensor of shape - `[max_num_detections, attribute_size]` representing attribute predictions - for detected boxes. Can be an empty dict if `attributes` is None.
- """ - nmsed_boxes = [] - nmsed_scores = [] - nmsed_classes = [] - num_classes_for_box = boxes.get_shape().as_list()[1] - num_classes = scores.get_shape().as_list()[1] - if attributes: - nmsed_attributes = {att_name: [] for att_name in attributes.keys()} - else: - nmsed_attributes = {} - - for i in range(num_classes): - boxes_i = boxes[:, min(num_classes_for_box - 1, i)] - scores_i = scores[:, i] - # Obtains pre_nms_top_k before running NMS. - scores_i, indices = tf.nn.top_k( - scores_i, k=tf.minimum(tf.shape(scores_i)[-1], pre_nms_top_k)) - boxes_i = tf.gather(boxes_i, indices) - - if soft_nms_sigma is not None: - (nmsed_indices_i, - nmsed_scores_i) = tf.image.non_max_suppression_with_scores( - tf.cast(boxes_i, tf.float32), - tf.cast(scores_i, tf.float32), - max_num_detections, - iou_threshold=nms_iou_threshold, - score_threshold=pre_nms_score_threshold, - soft_nms_sigma=soft_nms_sigma, - name='nms_detections_' + str(i)) - nmsed_boxes_i = tf.gather(boxes_i, nmsed_indices_i) - nmsed_boxes_i = preprocess_ops.clip_or_pad_to_fixed_size( - nmsed_boxes_i, max_num_detections, 0.0) - nmsed_scores_i = preprocess_ops.clip_or_pad_to_fixed_size( - nmsed_scores_i, max_num_detections, -1.0) - else: - (nmsed_indices_i, - nmsed_num_valid_i) = tf.image.non_max_suppression_padded( - tf.cast(boxes_i, tf.float32), - tf.cast(scores_i, tf.float32), - max_num_detections, - iou_threshold=nms_iou_threshold, - score_threshold=pre_nms_score_threshold, - pad_to_max_output_size=True, - name='nms_detections_' + str(i)) - nmsed_boxes_i = tf.gather(boxes_i, nmsed_indices_i) - nmsed_scores_i = tf.gather(scores_i, nmsed_indices_i) - # Sets scores of invalid boxes to -1. 
- nmsed_scores_i = tf.where( - tf.less(tf.range(max_num_detections), [nmsed_num_valid_i]), - nmsed_scores_i, -tf.ones_like(nmsed_scores_i)) - - nmsed_classes_i = tf.fill([max_num_detections], i) - nmsed_boxes.append(nmsed_boxes_i) - nmsed_scores.append(nmsed_scores_i) - nmsed_classes.append(nmsed_classes_i) - if attributes: - for att_name, att in attributes.items(): - num_classes_for_attr = att.get_shape().as_list()[1] - att_i = att[:, min(num_classes_for_attr - 1, i)] - att_i = tf.gather(att_i, indices) - nmsed_att_i = tf.gather(att_i, nmsed_indices_i) - nmsed_att_i = preprocess_ops.clip_or_pad_to_fixed_size( - nmsed_att_i, max_num_detections, 0.0) - nmsed_attributes[att_name].append(nmsed_att_i) - - # Concats results from all classes and sort them. - nmsed_boxes = tf.concat(nmsed_boxes, axis=0) - nmsed_scores = tf.concat(nmsed_scores, axis=0) - nmsed_classes = tf.concat(nmsed_classes, axis=0) - nmsed_scores, indices = tf.nn.top_k( - nmsed_scores, k=max_num_detections, sorted=True) - nmsed_boxes = tf.gather(nmsed_boxes, indices) - nmsed_classes = tf.gather(nmsed_classes, indices) - valid_detections = tf.reduce_sum( - tf.cast(tf.greater(nmsed_scores, -1), tf.int32)) - if attributes: - for att_name in attributes.keys(): - nmsed_attributes[att_name] = tf.concat(nmsed_attributes[att_name], axis=0) - nmsed_attributes[att_name] = tf.gather(nmsed_attributes[att_name], - indices) - - return nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections, nmsed_attributes - - -def _select_top_k_scores(scores_in: tf.Tensor, pre_nms_num_detections: int): - """Selects top_k scores and indices for each class. - - Args: - scores_in: A `tf.Tensor` with shape `[batch_size, N, num_classes]`, which - stacks class logit outputs on all feature levels. The N is the number of - total anchors on all levels. The num_classes is the number of classes - predicted by the model. - pre_nms_num_detections: Number of candidates before NMS. 
- - Returns: - top_k_scores, top_k_indices: Two `tf.Tensor`s, each with shape - `[batch_size, pre_nms_num_detections, num_classes]`. - """ - batch_size, num_anchors, num_class = scores_in.get_shape().as_list() - if batch_size is None: - batch_size = tf.shape(scores_in)[0] - scores_trans = tf.transpose(scores_in, perm=[0, 2, 1]) - scores_trans = tf.reshape(scores_trans, [-1, num_anchors]) - - top_k_scores, top_k_indices = tf.nn.top_k( - scores_trans, k=pre_nms_num_detections, sorted=True) - - top_k_scores = tf.reshape(top_k_scores, - [batch_size, num_class, pre_nms_num_detections]) - top_k_indices = tf.reshape(top_k_indices, - [batch_size, num_class, pre_nms_num_detections]) - - return tf.transpose(top_k_scores, - [0, 2, 1]), tf.transpose(top_k_indices, [0, 2, 1]) - - -def _generate_detections_v2(boxes: tf.Tensor, - scores: tf.Tensor, - pre_nms_top_k: int = 5000, - pre_nms_score_threshold: float = 0.05, - nms_iou_threshold: float = 0.5, - max_num_detections: int = 100): - """Generates the final detections given the model outputs. - - This implementation unrolls the classes dimension while using the tf.while_loop - to implement the batched NMS, so that it can be parallelized at the batch - dimension. It should give better performance compared to the v1 implementation. - It is TPU compatible. - - Args: - boxes: A `tf.Tensor` with shape `[batch_size, N, num_classes, 4]` or - `[batch_size, N, 1, 4]`, which stacks box predictions on all feature levels. The - N is the number of total anchors on all levels. - scores: A `tf.Tensor` with shape `[batch_size, N, num_classes]`, which - stacks class probability on all feature levels. The N is the number of - total anchors on all levels. The num_classes is the number of classes - predicted by the model. Note that the class_outputs here is the raw score. - pre_nms_top_k: An `int` number of top candidate detections per class before - NMS. - pre_nms_score_threshold: A `float` representing the threshold for deciding - when to remove boxes based on score.
- nms_iou_threshold: A `float` representing the threshold for deciding whether - boxes overlap too much with respect to IOU. - max_num_detections: A `scalar` representing maximum number of boxes retained - over all classes. - - Returns: - nms_boxes: A `float` tf.Tensor of shape [batch_size, max_num_detections, 4] - representing top detected boxes in [y1, x1, y2, x2]. - nms_scores: A `float` tf.Tensor of shape [batch_size, max_num_detections] - representing sorted confidence scores for detected boxes. The values are - between [0, 1]. - nms_classes: An `int` tf.Tensor of shape [batch_size, max_num_detections] - representing classes for detected boxes. - valid_detections: An `int` tf.Tensor of shape [batch_size] only the top - `valid_detections` boxes are valid detections. - """ - with tf.name_scope('generate_detections'): - nmsed_boxes = [] - nmsed_classes = [] - nmsed_scores = [] - valid_detections = [] - batch_size, _, num_classes_for_box, _ = boxes.get_shape().as_list() - if batch_size is None: - batch_size = tf.shape(boxes)[0] - _, total_anchors, num_classes = scores.get_shape().as_list() - # Selects top pre_nms_num scores and indices before NMS. - scores, indices = _select_top_k_scores( - scores, min(total_anchors, pre_nms_top_k)) - for i in range(num_classes): - boxes_i = boxes[:, :, min(num_classes_for_box - 1, i), :] - scores_i = scores[:, :, i] - # Obtains pre_nms_top_k before running NMS. - boxes_i = tf.gather(boxes_i, indices[:, :, i], batch_dims=1, axis=1) - - # Filter out scores. 
- boxes_i, scores_i = box_ops.filter_boxes_by_scores( - boxes_i, scores_i, min_score_threshold=pre_nms_score_threshold) - - (nmsed_scores_i, nmsed_boxes_i) = nms.sorted_non_max_suppression_padded( - tf.cast(scores_i, tf.float32), - tf.cast(boxes_i, tf.float32), - max_num_detections, - iou_threshold=nms_iou_threshold) - nmsed_classes_i = tf.fill([batch_size, max_num_detections], i) - nmsed_boxes.append(nmsed_boxes_i) - nmsed_scores.append(nmsed_scores_i) - nmsed_classes.append(nmsed_classes_i) - nmsed_boxes = tf.concat(nmsed_boxes, axis=1) - nmsed_scores = tf.concat(nmsed_scores, axis=1) - nmsed_classes = tf.concat(nmsed_classes, axis=1) - nmsed_scores, indices = tf.nn.top_k( - nmsed_scores, k=max_num_detections, sorted=True) - nmsed_boxes = tf.gather(nmsed_boxes, indices, batch_dims=1, axis=1) - nmsed_classes = tf.gather(nmsed_classes, indices, batch_dims=1) - valid_detections = tf.reduce_sum( - input_tensor=tf.cast(tf.greater(nmsed_scores, 0.0), tf.int32), axis=1) - return nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections - - -def _generate_detections_batched(boxes: tf.Tensor, scores: tf.Tensor, - pre_nms_score_threshold: float, - nms_iou_threshold: float, - max_num_detections: int): - """Generates detected boxes with scores and classes for one-stage detector. - - The function takes the outputs of multi-level ConvNets and anchor boxes and - generates detected boxes. Note that this uses batched NMS, which is not - currently supported on TPU. - - Args: - boxes: A `tf.Tensor` with shape `[batch_size, N, num_classes, 4]` or - `[batch_size, N, 1, 4]`, which stacks box predictions on all feature levels. The - N is the number of total anchors on all levels. - scores: A `tf.Tensor` with shape `[batch_size, N, num_classes]`, which - stacks class probability on all feature levels. The N is the number of - total anchors on all levels. The num_classes is the number of classes - predicted by the model. Note that the class_outputs here is the raw score.
- pre_nms_score_threshold: A `float` representing the threshold for deciding - when to remove boxes based on score. - nms_iou_threshold: A `float` representing the threshold for deciding whether - boxes overlap too much with respect to IOU. - max_num_detections: A `scalar` representing maximum number of boxes retained - over all classes. - - Returns: - nms_boxes: A `float` tf.Tensor of shape [batch_size, max_num_detections, 4] - representing top detected boxes in [y1, x1, y2, x2]. - nms_scores: A `float` tf.Tensor of shape [batch_size, max_num_detections] - representing sorted confidence scores for detected boxes. The values are - between [0, 1]. - nms_classes: An `int` tf.Tensor of shape [batch_size, max_num_detections] - representing classes for detected boxes. - valid_detections: An `int` tf.Tensor of shape [batch_size] only the top - `valid_detections` boxes are valid detections. - """ - with tf.name_scope('generate_detections'): - nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections = ( - tf.image.combined_non_max_suppression( - boxes, - scores, - max_output_size_per_class=max_num_detections, - max_total_size=max_num_detections, - iou_threshold=nms_iou_threshold, - score_threshold=pre_nms_score_threshold, - pad_per_class=False, - clip_boxes=False)) - nmsed_classes = tf.cast(nmsed_classes, tf.int32) - return nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class DetectionGenerator(tf.keras.layers.Layer): - """Generates the final detected boxes with scores and classes.""" - - def __init__(self, - apply_nms: bool = True, - pre_nms_top_k: int = 5000, - pre_nms_score_threshold: float = 0.05, - nms_iou_threshold: float = 0.5, - max_num_detections: int = 100, - nms_version: str = 'v2', - use_cpu_nms: bool = False, - soft_nms_sigma: Optional[float] = None, - **kwargs): - """Initializes a detection generator. 
- - Args: - apply_nms: A `bool` of whether or not to apply non-maximum suppression. - If False, the decoded boxes and their scores are returned. - pre_nms_top_k: An `int` of the number of top-scoring proposals to be kept - before applying NMS. - pre_nms_score_threshold: A `float` of the score threshold to apply before - applying NMS. Proposals whose scores are below this threshold are - thrown away. - nms_iou_threshold: A `float` in [0, 1], the NMS IoU threshold. - max_num_detections: An `int` of the final number of total detections to - generate. - nms_version: A string, one of `batched`, `v1` or `v2`, specifying the NMS version. - use_cpu_nms: A `bool` of whether or not to force NMS to run on CPU. - soft_nms_sigma: A `float` representing the sigma parameter for Soft NMS. - When soft_nms_sigma=0.0, we fall back to standard NMS. Only used by NMS `v1`. - **kwargs: Additional keyword arguments passed to Layer. - """ - self._config_dict = { - 'apply_nms': apply_nms, - 'pre_nms_top_k': pre_nms_top_k, - 'pre_nms_score_threshold': pre_nms_score_threshold, - 'nms_iou_threshold': nms_iou_threshold, - 'max_num_detections': max_num_detections, - 'nms_version': nms_version, - 'use_cpu_nms': use_cpu_nms, - 'soft_nms_sigma': soft_nms_sigma, - } - super(DetectionGenerator, self).__init__(**kwargs) - - def __call__(self, - raw_boxes: tf.Tensor, - raw_scores: tf.Tensor, - anchor_boxes: tf.Tensor, - image_shape: tf.Tensor, - regression_weights: Optional[List[float]] = None, - bbox_per_class: bool = True): - """Generates final detections. - - Args: - raw_boxes: A `tf.Tensor` of shape of `[batch_size, K, num_classes * 4]` - representing the class-specific box coordinates relative to anchors. - raw_scores: A `tf.Tensor` of shape of `[batch_size, K, num_classes]` - representing the class logits before applying score activation. - anchor_boxes: A `tf.Tensor` of shape of `[batch_size, K, 4]` representing - the corresponding anchor boxes w.r.t `box_outputs`.
- image_shape: A `tf.Tensor` of shape of `[batch_size, 2]` storing the image - height and width w.r.t. the scaled image, i.e. the same image space as - `box_outputs` and `anchor_boxes`. - regression_weights: A list of four float numbers to scale coordinates. - bbox_per_class: A `bool`. If True, perform per-class box regression. - - Returns: - If `apply_nms` = True, the return is a dictionary with keys: - `detection_boxes`: A `float` tf.Tensor of shape - [batch, max_num_detections, 4] representing top detected boxes in - [y1, x1, y2, x2]. - `detection_scores`: A `float` `tf.Tensor` of shape - [batch, max_num_detections] representing sorted confidence scores for - detected boxes. The values are between [0, 1]. - `detection_classes`: An `int` tf.Tensor of shape - [batch, max_num_detections] representing classes for detected boxes. - `num_detections`: An `int` tf.Tensor of shape [batch]; only the first - `num_detections` boxes are valid detections. - If `apply_nms` = False, the return is a dictionary with keys: - `decoded_boxes`: A `float` tf.Tensor of shape [batch, num_raw_boxes, 4] - representing all the decoded boxes. - `decoded_box_scores`: A `float` tf.Tensor of shape - [batch, num_raw_boxes] representing scores of all the decoded boxes. - """ - box_scores = tf.nn.softmax(raw_scores, axis=-1) - - # Removes the background class.
- box_scores_shape = tf.shape(box_scores) - box_scores_shape_list = box_scores.get_shape().as_list() - batch_size = box_scores_shape[0] - num_locations = box_scores_shape_list[1] - num_classes = box_scores_shape_list[-1] - - box_scores = tf.slice(box_scores, [0, 0, 1], [-1, -1, -1]) - - if bbox_per_class: - num_detections = num_locations * (num_classes - 1) - raw_boxes = tf.reshape(raw_boxes, - [batch_size, num_locations, num_classes, 4]) - raw_boxes = tf.slice(raw_boxes, [0, 0, 1, 0], [-1, -1, -1, -1]) - anchor_boxes = tf.tile( - tf.expand_dims(anchor_boxes, axis=2), [1, 1, num_classes - 1, 1]) - raw_boxes = tf.reshape(raw_boxes, [batch_size, num_detections, 4]) - anchor_boxes = tf.reshape(anchor_boxes, [batch_size, num_detections, 4]) - - # Box decoding. - decoded_boxes = box_ops.decode_boxes( - raw_boxes, anchor_boxes, weights=regression_weights) - - # Box clipping. - decoded_boxes = box_ops.clip_boxes( - decoded_boxes, tf.expand_dims(image_shape, axis=1)) - - if bbox_per_class: - decoded_boxes = tf.reshape( - decoded_boxes, [batch_size, num_locations, num_classes - 1, 4]) - else: - decoded_boxes = tf.expand_dims(decoded_boxes, axis=2) - - if not self._config_dict['apply_nms']: - return { - 'decoded_boxes': decoded_boxes, - 'decoded_box_scores': box_scores, - } - - # Optionally force the NMS to run on CPU.
- if self._config_dict['use_cpu_nms']: - nms_context = tf.device('cpu:0') - else: - nms_context = contextlib.nullcontext() - - with nms_context: - if self._config_dict['nms_version'] == 'batched': - (nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections) = ( - _generate_detections_batched( - decoded_boxes, box_scores, - self._config_dict['pre_nms_score_threshold'], - self._config_dict['nms_iou_threshold'], - self._config_dict['max_num_detections'])) - elif self._config_dict['nms_version'] == 'v1': - (nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections, _) = ( - _generate_detections_v1( - decoded_boxes, - box_scores, - pre_nms_top_k=self._config_dict['pre_nms_top_k'], - pre_nms_score_threshold=self - ._config_dict['pre_nms_score_threshold'], - nms_iou_threshold=self._config_dict['nms_iou_threshold'], - max_num_detections=self._config_dict['max_num_detections'], - soft_nms_sigma=self._config_dict['soft_nms_sigma'])) - elif self._config_dict['nms_version'] == 'v2': - (nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections) = ( - _generate_detections_v2( - decoded_boxes, - box_scores, - pre_nms_top_k=self._config_dict['pre_nms_top_k'], - pre_nms_score_threshold=self - ._config_dict['pre_nms_score_threshold'], - nms_iou_threshold=self._config_dict['nms_iou_threshold'], - max_num_detections=self._config_dict['max_num_detections'])) - else: - raise ValueError('NMS version {} not supported.'.format( - self._config_dict['nms_version'])) - - # Adds 1 to offset the background class which has index 0. 
- nmsed_classes += 1 - - return { - 'num_detections': valid_detections, - 'detection_boxes': nmsed_boxes, - 'detection_classes': nmsed_classes, - 'detection_scores': nmsed_scores, - } - - def get_config(self): - return self._config_dict - - @classmethod - def from_config(cls, config): - return cls(**config) - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class MultilevelDetectionGenerator(tf.keras.layers.Layer): - """Generates detected boxes with scores and classes for one-stage detector.""" - - def __init__(self, - apply_nms: bool = True, - pre_nms_top_k: int = 5000, - pre_nms_score_threshold: float = 0.05, - nms_iou_threshold: float = 0.5, - max_num_detections: int = 100, - nms_version: str = 'v1', - use_cpu_nms: bool = False, - soft_nms_sigma: Optional[float] = None, - **kwargs): - """Initializes a multi-level detection generator. - - Args: - apply_nms: A `bool` of whether or not to apply non-maximum suppression. If - False, the decoded boxes and their scores are returned. - pre_nms_top_k: An `int` of the number of top-scoring proposals to be kept - before applying NMS. - pre_nms_score_threshold: A `float` of the score threshold to apply before - applying NMS. Proposals whose scores are below this threshold are thrown - away. - nms_iou_threshold: A `float` in [0, 1], the NMS IoU threshold. - max_num_detections: An `int` of the final number of total detections to - generate. - nms_version: A string, one of `batched`, `v1` or `v2`, specifying the NMS version. - use_cpu_nms: A `bool` of whether or not to force NMS to run on CPU. - soft_nms_sigma: A `float` representing the sigma parameter for Soft NMS. - When soft_nms_sigma=0.0, we fall back to standard NMS. Only used by NMS `v1`. - **kwargs: Additional keyword arguments passed to Layer.
- """ - self._config_dict = { - 'apply_nms': apply_nms, - 'pre_nms_top_k': pre_nms_top_k, - 'pre_nms_score_threshold': pre_nms_score_threshold, - 'nms_iou_threshold': nms_iou_threshold, - 'max_num_detections': max_num_detections, - 'nms_version': nms_version, - 'use_cpu_nms': use_cpu_nms, - 'soft_nms_sigma': soft_nms_sigma, - } - super(MultilevelDetectionGenerator, self).__init__(**kwargs) - - def _decode_multilevel_outputs( - self, - raw_boxes: Mapping[str, tf.Tensor], - raw_scores: Mapping[str, tf.Tensor], - anchor_boxes: tf.Tensor, - image_shape: tf.Tensor, - raw_attributes: Optional[Mapping[str, tf.Tensor]] = None): - """Collects dict of multilevel boxes, scores, attributes into lists.""" - boxes = [] - scores = [] - if raw_attributes: - attributes = {att_name: [] for att_name in raw_attributes.keys()} - else: - attributes = {} - - levels = list(raw_boxes.keys()) - min_level = int(min(levels)) - max_level = int(max(levels)) - for i in range(min_level, max_level + 1): - raw_boxes_i = raw_boxes[str(i)] - raw_scores_i = raw_scores[str(i)] - batch_size = tf.shape(raw_boxes_i)[0] - (_, feature_h_i, feature_w_i, - num_anchors_per_locations_times_4) = raw_boxes_i.get_shape().as_list() - num_locations = feature_h_i * feature_w_i - num_anchors_per_locations = num_anchors_per_locations_times_4 // 4 - num_classes = raw_scores_i.get_shape().as_list( - )[-1] // num_anchors_per_locations - - # Applies score transformation and remove the implicit background class. - scores_i = tf.sigmoid( - tf.reshape(raw_scores_i, [ - batch_size, num_locations * num_anchors_per_locations, num_classes - ])) - scores_i = tf.slice(scores_i, [0, 0, 1], [-1, -1, -1]) - - # Box decoding. - # The anchor boxes are shared for all data in a batch. - # One stage detector only supports class agnostic box regression. 
- anchor_boxes_i = tf.reshape( - anchor_boxes[str(i)], - [batch_size, num_locations * num_anchors_per_locations, 4]) - raw_boxes_i = tf.reshape( - raw_boxes_i, - [batch_size, num_locations * num_anchors_per_locations, 4]) - boxes_i = box_ops.decode_boxes(raw_boxes_i, anchor_boxes_i) - - # Box clipping. - boxes_i = box_ops.clip_boxes( - boxes_i, tf.expand_dims(image_shape, axis=1)) - - boxes.append(boxes_i) - scores.append(scores_i) - - if raw_attributes: - for att_name, raw_att in raw_attributes.items(): - attribute_size = raw_att[str( - i)].get_shape().as_list()[-1] // num_anchors_per_locations - att_i = tf.reshape(raw_att[str(i)], [ - batch_size, num_locations * num_anchors_per_locations, - attribute_size - ]) - attributes[att_name].append(att_i) - - boxes = tf.concat(boxes, axis=1) - boxes = tf.expand_dims(boxes, axis=2) - scores = tf.concat(scores, axis=1) - - if raw_attributes: - for att_name in raw_attributes.keys(): - attributes[att_name] = tf.concat(attributes[att_name], axis=1) - attributes[att_name] = tf.expand_dims(attributes[att_name], axis=2) - - return boxes, scores, attributes - - def __call__(self, - raw_boxes: Mapping[str, tf.Tensor], - raw_scores: Mapping[str, tf.Tensor], - anchor_boxes: tf.Tensor, - image_shape: tf.Tensor, - raw_attributes: Optional[Mapping[str, tf.Tensor]] = None): - """Generates final detections. - - Args: - raw_boxes: A `dict` with keys representing FPN levels and values - representing box tensors of shape `[batch, feature_h, feature_w, - num_anchors * 4]`. - raw_scores: A `dict` with keys representing FPN levels and values - representing logit tensors of shape `[batch, feature_h, feature_w, - num_anchors * num_classes]`. - anchor_boxes: A `tf.Tensor` of shape of [batch_size, K, 4] representing - the corresponding anchor boxes w.r.t `box_outputs`. - image_shape: A `tf.Tensor` of shape of [batch_size, 2] storing the image - height and width w.r.t. the scaled image, i.e. 
the same image space as - `box_outputs` and `anchor_boxes`.
- raw_attributes: If not None, a `dict` of (attribute_name, - attribute_prediction) pairs. `attribute_prediction` is a dict that - contains keys representing FPN levels and values representing tensors of - shape `[batch, feature_h, feature_w, num_anchors * attribute_size]`. - - Returns: - If `apply_nms` = True, the return is a dictionary with keys: - `detection_boxes`: A `float` tf.Tensor of shape - [batch, max_num_detections, 4] representing top detected boxes in - [y1, x1, y2, x2]. - `detection_scores`: A `float` tf.Tensor of shape - [batch, max_num_detections] representing sorted confidence scores for - detected boxes. The values are between [0, 1]. - `detection_classes`: An `int` tf.Tensor of shape - [batch, max_num_detections] representing classes for detected boxes. - `num_detections`: An `int` tf.Tensor of shape [batch]; only the first - `num_detections` boxes are valid detections. - `detection_attributes`: A dict. Each value in the dict is a `float` - tf.Tensor of shape [batch, max_num_detections, attribute_size] - representing attribute predictions for detected boxes. - If `apply_nms` = False, the return is a dictionary with keys: - `decoded_boxes`: A `float` tf.Tensor of shape [batch, num_raw_boxes, 4] - representing all the decoded boxes. - `decoded_box_scores`: A `float` tf.Tensor of shape - [batch, num_raw_boxes] representing scores of all the decoded boxes. - `decoded_box_attributes`: A dict. Each value in the dict is a - `float` tf.Tensor of shape [batch, num_raw_boxes, attribute_size] - representing attribute predictions of all the decoded boxes. - """ - boxes, scores, attributes = self._decode_multilevel_outputs( - raw_boxes, raw_scores, anchor_boxes, image_shape, raw_attributes) - - if not self._config_dict['apply_nms']: - return { - 'decoded_boxes': boxes, - 'decoded_box_scores': scores, - 'decoded_box_attributes': attributes, - } - - # Optionally force the NMS to run on CPU.
- if self._config_dict['use_cpu_nms']: - nms_context = tf.device('cpu:0') - else: - nms_context = contextlib.nullcontext() - - with nms_context: - if raw_attributes and (self._config_dict['nms_version'] != 'v1'): - raise ValueError( - 'Attribute learning is only supported for NMSv1 but NMS {} is used.' - .format(self._config_dict['nms_version'])) - if self._config_dict['nms_version'] == 'batched': - (nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections) = ( - _generate_detections_batched( - boxes, scores, self._config_dict['pre_nms_score_threshold'], - self._config_dict['nms_iou_threshold'], - self._config_dict['max_num_detections'])) - # Set `nmsed_attributes` to an empty dict for batched NMS. - nmsed_attributes = {} - elif self._config_dict['nms_version'] == 'v1': - (nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections, - nmsed_attributes) = ( - _generate_detections_v1( - boxes, - scores, - attributes=attributes if raw_attributes else None, - pre_nms_top_k=self._config_dict['pre_nms_top_k'], - pre_nms_score_threshold=self - ._config_dict['pre_nms_score_threshold'], - nms_iou_threshold=self._config_dict['nms_iou_threshold'], - max_num_detections=self._config_dict['max_num_detections'], - soft_nms_sigma=self._config_dict['soft_nms_sigma'])) - elif self._config_dict['nms_version'] == 'v2': - (nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections) = ( - _generate_detections_v2( - boxes, - scores, - pre_nms_top_k=self._config_dict['pre_nms_top_k'], - pre_nms_score_threshold=self - ._config_dict['pre_nms_score_threshold'], - nms_iou_threshold=self._config_dict['nms_iou_threshold'], - max_num_detections=self._config_dict['max_num_detections'])) - # Set `nmsed_attributes` to an empty dict for v2. - nmsed_attributes = {} - else: - raise ValueError('NMS version {} not supported.'.format( - self._config_dict['nms_version'])) - - # Adds 1 to offset the background class which has index 0.
- nmsed_classes += 1 - - return { - 'num_detections': valid_detections, - 'detection_boxes': nmsed_boxes, - 'detection_classes': nmsed_classes, - 'detection_scores': nmsed_scores, - 'detection_attributes': nmsed_attributes, - } - - def get_config(self): - return self._config_dict - - @classmethod - def from_config(cls, config): - return cls(**config) diff --git a/official/vision/beta/modeling/layers/detection_generator_test.py b/official/vision/beta/modeling/layers/detection_generator_test.py deleted file mode 100644 index 7660cb537f54f3b1b9fbffa6dbeee586d580a73d..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/layers/detection_generator_test.py +++ /dev/null @@ -1,249 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Tests for detection_generator.py.""" -# Import libraries - -from absl.testing import parameterized -import numpy as np -import tensorflow as tf - -from official.vision.beta.modeling.layers import detection_generator -from official.vision.beta.ops import anchor - - -class SelectTopKScoresTest(tf.test.TestCase): - - def testSelectTopKScores(self): - pre_nms_num_boxes = 2 - scores_data = [[[0.2, 0.2], [0.1, 0.9], [0.5, 0.1], [0.3, 0.5]]] - scores_in = tf.constant(scores_data, dtype=tf.float32) - top_k_scores, top_k_indices = detection_generator._select_top_k_scores( - scores_in, pre_nms_num_detections=pre_nms_num_boxes) - expected_top_k_scores = np.array([[[0.5, 0.9], [0.3, 0.5]]], - dtype=np.float32) - - expected_top_k_indices = [[[2, 1], [3, 3]]] - - self.assertAllEqual(top_k_scores.numpy(), expected_top_k_scores) - self.assertAllEqual(top_k_indices.numpy(), expected_top_k_indices) - - -class DetectionGeneratorTest( - parameterized.TestCase, tf.test.TestCase): - - @parameterized.product( - nms_version=['batched', 'v1', 'v2'], - use_cpu_nms=[True, False], - soft_nms_sigma=[None, 0.1]) - def testDetectionsOutputShape(self, nms_version, use_cpu_nms, soft_nms_sigma): - max_num_detections = 10 - num_classes = 4 - pre_nms_top_k = 5000 - pre_nms_score_threshold = 0.01 - batch_size = 1 - kwargs = { - 'apply_nms': True, - 'pre_nms_top_k': pre_nms_top_k, - 'pre_nms_score_threshold': pre_nms_score_threshold, - 'nms_iou_threshold': 0.5, - 'max_num_detections': max_num_detections, - 'nms_version': nms_version, - 'use_cpu_nms': use_cpu_nms, - 'soft_nms_sigma': soft_nms_sigma, - } - generator = detection_generator.DetectionGenerator(**kwargs) - - cls_outputs_all = ( - np.random.rand(84, num_classes) - 0.5) * 3 # random 84 x num_classes outputs. - box_outputs_all = np.random.rand(84, 4 * num_classes) # random 84 per-class boxes. - anchor_boxes_all = np.random.rand(84, 4) # random 84 boxes.
- class_outputs = tf.reshape( - tf.convert_to_tensor(cls_outputs_all, dtype=tf.float32), - [1, 84, num_classes]) - box_outputs = tf.reshape( - tf.convert_to_tensor(box_outputs_all, dtype=tf.float32), - [1, 84, 4 * num_classes]) - anchor_boxes = tf.reshape( - tf.convert_to_tensor(anchor_boxes_all, dtype=tf.float32), - [1, 84, 4]) - image_info = tf.constant( - [[[1000, 1000], [100, 100], [0.1, 0.1], [0, 0]]], - dtype=tf.float32) - results = generator( - box_outputs, class_outputs, anchor_boxes, image_info[:, 1, :]) - boxes = results['detection_boxes'] - classes = results['detection_classes'] - scores = results['detection_scores'] - valid_detections = results['num_detections'] - - self.assertEqual(boxes.numpy().shape, (batch_size, max_num_detections, 4)) - self.assertEqual(scores.numpy().shape, (batch_size, max_num_detections,)) - self.assertEqual(classes.numpy().shape, (batch_size, max_num_detections,)) - self.assertEqual(valid_detections.numpy().shape, (batch_size,)) - - def test_serialize_deserialize(self): - kwargs = { - 'apply_nms': True, - 'pre_nms_top_k': 1000, - 'pre_nms_score_threshold': 0.1, - 'nms_iou_threshold': 0.5, - 'max_num_detections': 10, - 'nms_version': 'v2', - 'use_cpu_nms': False, - 'soft_nms_sigma': None, - } - generator = detection_generator.DetectionGenerator(**kwargs) - - expected_config = dict(kwargs) - self.assertEqual(generator.get_config(), expected_config) - - new_generator = ( - detection_generator.DetectionGenerator.from_config( - generator.get_config())) - - self.assertAllEqual(generator.get_config(), new_generator.get_config()) - - -class MultilevelDetectionGeneratorTest( - parameterized.TestCase, tf.test.TestCase): - - @parameterized.parameters( - ('batched', False, True, None), - ('batched', False, False, None), - ('v2', False, True, None), - ('v2', False, False, None), - ('v1', True, True, 0.0), - ('v1', True, False, 0.1), - ('v1', True, False, None), - ) - def testDetectionsOutputShape(self, nms_version, has_att_heads, 
use_cpu_nms, - soft_nms_sigma): - min_level = 4 - max_level = 6 - num_scales = 2 - max_num_detections = 10 - aspect_ratios = [1.0, 2.0] - anchor_scale = 2.0 - output_size = [64, 64] - num_classes = 4 - pre_nms_top_k = 5000 - pre_nms_score_threshold = 0.01 - batch_size = 1 - kwargs = { - 'apply_nms': True, - 'pre_nms_top_k': pre_nms_top_k, - 'pre_nms_score_threshold': pre_nms_score_threshold, - 'nms_iou_threshold': 0.5, - 'max_num_detections': max_num_detections, - 'nms_version': nms_version, - 'use_cpu_nms': use_cpu_nms, - 'soft_nms_sigma': soft_nms_sigma, - } - - input_anchor = anchor.build_anchor_generator(min_level, max_level, - num_scales, aspect_ratios, - anchor_scale) - anchor_boxes = input_anchor(output_size) - cls_outputs_all = ( - np.random.rand(84, num_classes) - 0.5) * 3 # random 84x3 outputs. - box_outputs_all = np.random.rand(84, 4) # random 84 boxes. - class_outputs = { - '4': - tf.reshape( - tf.convert_to_tensor(cls_outputs_all[0:64], dtype=tf.float32), - [1, 8, 8, num_classes]), - '5': - tf.reshape( - tf.convert_to_tensor(cls_outputs_all[64:80], dtype=tf.float32), - [1, 4, 4, num_classes]), - '6': - tf.reshape( - tf.convert_to_tensor(cls_outputs_all[80:84], dtype=tf.float32), - [1, 2, 2, num_classes]), - } - box_outputs = { - '4': tf.reshape(tf.convert_to_tensor( - box_outputs_all[0:64], dtype=tf.float32), [1, 8, 8, 4]), - '5': tf.reshape(tf.convert_to_tensor( - box_outputs_all[64:80], dtype=tf.float32), [1, 4, 4, 4]), - '6': tf.reshape(tf.convert_to_tensor( - box_outputs_all[80:84], dtype=tf.float32), [1, 2, 2, 4]), - } - if has_att_heads: - att_outputs_all = np.random.rand(84, 1) # random attributes. 
- att_outputs = { - 'depth': { - '4': - tf.reshape( - tf.convert_to_tensor( - att_outputs_all[0:64], dtype=tf.float32), - [1, 8, 8, 1]), - '5': - tf.reshape( - tf.convert_to_tensor( - att_outputs_all[64:80], dtype=tf.float32), - [1, 4, 4, 1]), - '6': - tf.reshape( - tf.convert_to_tensor( - att_outputs_all[80:84], dtype=tf.float32), - [1, 2, 2, 1]), - } - } - else: - att_outputs = None - image_info = tf.constant([[[1000, 1000], [100, 100], [0.1, 0.1], [0, 0]]], - dtype=tf.float32) - generator = detection_generator.MultilevelDetectionGenerator(**kwargs) - results = generator(box_outputs, class_outputs, anchor_boxes, - image_info[:, 1, :], att_outputs) - boxes = results['detection_boxes'] - classes = results['detection_classes'] - scores = results['detection_scores'] - valid_detections = results['num_detections'] - - self.assertEqual(boxes.numpy().shape, (batch_size, max_num_detections, 4)) - self.assertEqual(scores.numpy().shape, (batch_size, max_num_detections,)) - self.assertEqual(classes.numpy().shape, (batch_size, max_num_detections,)) - self.assertEqual(valid_detections.numpy().shape, (batch_size,)) - if has_att_heads: - for att in results['detection_attributes'].values(): - self.assertEqual(att.numpy().shape, (batch_size, max_num_detections, 1)) - - def test_serialize_deserialize(self): - kwargs = { - 'apply_nms': True, - 'pre_nms_top_k': 1000, - 'pre_nms_score_threshold': 0.1, - 'nms_iou_threshold': 0.5, - 'max_num_detections': 10, - 'nms_version': 'v2', - 'use_cpu_nms': False, - 'soft_nms_sigma': None, - } - generator = detection_generator.MultilevelDetectionGenerator(**kwargs) - - expected_config = dict(kwargs) - self.assertEqual(generator.get_config(), expected_config) - - new_generator = ( - detection_generator.MultilevelDetectionGenerator.from_config( - generator.get_config())) - - self.assertAllEqual(generator.get_config(), new_generator.get_config()) - - -if __name__ == '__main__': - tf.test.main() diff --git 
a/official/vision/beta/modeling/layers/nn_blocks.py b/official/vision/beta/modeling/layers/nn_blocks.py deleted file mode 100644 index 2d33011249887bd62881c5c328b52ed60735b9a4..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/layers/nn_blocks.py +++ /dev/null @@ -1,1511 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Contains common building blocks for neural networks.""" - -from typing import Any, Callable, Dict, List, Optional, Tuple, Union, Text - -# Import libraries -from absl import logging -import tensorflow as tf - -from official.modeling import tf_utils -from official.vision.beta.modeling.layers import nn_layers - - -def _pad_strides(strides: int, axis: int) -> Tuple[int, int, int, int]: - """Converts int to len 4 strides (`tf.nn.avg_pool` uses length 4).""" - if axis == 1: - return (1, 1, strides, strides) - else: - return (1, strides, strides, 1) - - -def _maybe_downsample(x: tf.Tensor, out_filter: int, strides: int, - axis: int) -> tf.Tensor: - """Downsamples feature map and 0-pads tensor if in_filter != out_filter.""" - data_format = 'NCHW' if axis == 1 else 'NHWC' - strides = _pad_strides(strides, axis=axis) - - x = tf.nn.avg_pool(x, strides, strides, 'VALID', data_format=data_format) - - in_filter = x.shape[axis] - if in_filter < out_filter: - # Pad on channel dimension with 0s: half on top half on bottom. 
- pad_size = [(out_filter - in_filter) // 2, (out_filter - in_filter) // 2] - if axis == 1: - x = tf.pad(x, [[0, 0], pad_size, [0, 0], [0, 0]]) - else: - x = tf.pad(x, [[0, 0], [0, 0], [0, 0], pad_size]) - - return x + 0. - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class ResidualBlock(tf.keras.layers.Layer): - """A residual block.""" - - def __init__(self, - filters, - strides, - use_projection=False, - se_ratio=None, - resnetd_shortcut=False, - stochastic_depth_drop_rate=None, - kernel_initializer='VarianceScaling', - kernel_regularizer=None, - bias_regularizer=None, - activation='relu', - use_explicit_padding: bool = False, - use_sync_bn=False, - norm_momentum=0.99, - norm_epsilon=0.001, - bn_trainable=True, - **kwargs): - """Initializes a residual block with BN after convolutions. - - Args: - filters: An `int` number of filters for the first two convolutions. Note - that the third and final convolution will use 4 times as many filters. - strides: An `int` block stride. If greater than 1, this block will - ultimately downsample the input. - use_projection: A `bool` for whether this block should use a projection - shortcut (versus the default identity shortcut). This is usually `True` - for the first block of a block group, which may change the number of - filters and the resolution. - se_ratio: A `float` or None. Ratio of the Squeeze-and-Excitation layer. - resnetd_shortcut: A `bool` if True, apply the resnetd style modification - to the shortcut connection. Not implemented in residual blocks. - stochastic_depth_drop_rate: A `float` or None. if not None, drop rate for - the stochastic depth layer. - kernel_initializer: A `str` of kernel_initializer for convolutional - layers. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for - Conv2D. Default to None. - bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2d. - Default to None. - activation: A `str` name of the activation function. 
- use_explicit_padding: Use 'VALID' padding for convolutions, but prepad - inputs so that the output dimensions are the same as if 'SAME' padding - were used. - use_sync_bn: A `bool`. If True, use synchronized batch normalization. - norm_momentum: A `float` of normalization momentum for the moving average. - norm_epsilon: A `float` added to variance to avoid dividing by zero. - bn_trainable: A `bool` that indicates whether batch norm layers should be - trainable. Default to True. - **kwargs: Additional keyword arguments to be passed. - """ - super(ResidualBlock, self).__init__(**kwargs) - - self._filters = filters - self._strides = strides - self._use_projection = use_projection - self._se_ratio = se_ratio - self._resnetd_shortcut = resnetd_shortcut - self._use_explicit_padding = use_explicit_padding - self._use_sync_bn = use_sync_bn - self._activation = activation - self._stochastic_depth_drop_rate = stochastic_depth_drop_rate - self._kernel_initializer = kernel_initializer - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - - if use_sync_bn: - self._norm = tf.keras.layers.experimental.SyncBatchNormalization - else: - self._norm = tf.keras.layers.BatchNormalization - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - self._activation_fn = tf_utils.get_activation(activation) - self._bn_trainable = bn_trainable - - def build(self, input_shape): - if self._use_projection: - self._shortcut = tf.keras.layers.Conv2D( - filters=self._filters, - kernel_size=1, - strides=self._strides, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm0 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon, - trainable=self._bn_trainable) - - conv1_padding = 
'same' - # explicit padding here is added for centernet - if self._use_explicit_padding: - self._pad = tf.keras.layers.ZeroPadding2D(padding=(1, 1)) - conv1_padding = 'valid' - - self._conv1 = tf.keras.layers.Conv2D( - filters=self._filters, - kernel_size=3, - strides=self._strides, - padding=conv1_padding, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm1 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon, - trainable=self._bn_trainable) - - self._conv2 = tf.keras.layers.Conv2D( - filters=self._filters, - kernel_size=3, - strides=1, - padding='same', - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm2 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon, - trainable=self._bn_trainable) - - if self._se_ratio and self._se_ratio > 0 and self._se_ratio <= 1: - self._squeeze_excitation = nn_layers.SqueezeExcitation( - in_filters=self._filters, - out_filters=self._filters, - se_ratio=self._se_ratio, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - else: - self._squeeze_excitation = None - - if self._stochastic_depth_drop_rate: - self._stochastic_depth = nn_layers.StochasticDepth( - self._stochastic_depth_drop_rate) - else: - self._stochastic_depth = None - - super(ResidualBlock, self).build(input_shape) - - def get_config(self): - config = { - 'filters': self._filters, - 'strides': self._strides, - 'use_projection': self._use_projection, - 'se_ratio': self._se_ratio, - 'resnetd_shortcut': self._resnetd_shortcut, - 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, - 'kernel_initializer': self._kernel_initializer, - 'kernel_regularizer': 
self._kernel_regularizer, - 'bias_regularizer': self._bias_regularizer, - 'activation': self._activation, - 'use_explicit_padding': self._use_explicit_padding, - 'use_sync_bn': self._use_sync_bn, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'bn_trainable': self._bn_trainable - } - base_config = super(ResidualBlock, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, inputs, training=None): - shortcut = inputs - if self._use_projection: - shortcut = self._shortcut(shortcut) - shortcut = self._norm0(shortcut) - - if self._use_explicit_padding: - inputs = self._pad(inputs) - x = self._conv1(inputs) - x = self._norm1(x) - x = self._activation_fn(x) - - x = self._conv2(x) - x = self._norm2(x) - - if self._squeeze_excitation: - x = self._squeeze_excitation(x) - - if self._stochastic_depth: - x = self._stochastic_depth(x, training=training) - - return self._activation_fn(x + shortcut) - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class BottleneckBlock(tf.keras.layers.Layer): - """A standard bottleneck block.""" - - def __init__(self, - filters, - strides, - dilation_rate=1, - use_projection=False, - se_ratio=None, - resnetd_shortcut=False, - stochastic_depth_drop_rate=None, - kernel_initializer='VarianceScaling', - kernel_regularizer=None, - bias_regularizer=None, - activation='relu', - use_sync_bn=False, - norm_momentum=0.99, - norm_epsilon=0.001, - bn_trainable=True, - **kwargs): - """Initializes a standard bottleneck block with BN after convolutions. - - Args: - filters: An `int` number of filters for the first two convolutions. Note - that the third and final convolution will use 4 times as many filters. - strides: An `int` block stride. If greater than 1, this block will - ultimately downsample the input. - dilation_rate: An `int` dilation_rate of convolutions. Default to 1. 
- use_projection: A `bool` for whether this block should use a projection - shortcut (versus the default identity shortcut). This is usually `True` - for the first block of a block group, which may change the number of - filters and the resolution. - se_ratio: A `float` or None. Ratio of the Squeeze-and-Excitation layer. - resnetd_shortcut: A `bool`. If True, apply the resnetd style modification - to the shortcut connection. - stochastic_depth_drop_rate: A `float` or None. If not None, drop rate for - the stochastic depth layer. - kernel_initializer: A `str` of kernel_initializer for convolutional - layers. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for - Conv2D. Default to None. - bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2d. - Default to None. - activation: A `str` name of the activation function. - use_sync_bn: A `bool`. If True, use synchronized batch normalization. - norm_momentum: A `float` of normalization momentum for the moving average. - norm_epsilon: A `float` added to variance to avoid dividing by zero. - bn_trainable: A `bool` that indicates whether batch norm layers should be - trainable. Default to True. - **kwargs: Additional keyword arguments to be passed. 
- """ - super(BottleneckBlock, self).__init__(**kwargs) - - self._filters = filters - self._strides = strides - self._dilation_rate = dilation_rate - self._use_projection = use_projection - self._se_ratio = se_ratio - self._resnetd_shortcut = resnetd_shortcut - self._use_sync_bn = use_sync_bn - self._activation = activation - self._stochastic_depth_drop_rate = stochastic_depth_drop_rate - self._kernel_initializer = kernel_initializer - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - if use_sync_bn: - self._norm = tf.keras.layers.experimental.SyncBatchNormalization - else: - self._norm = tf.keras.layers.BatchNormalization - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - self._bn_trainable = bn_trainable - - def build(self, input_shape): - if self._use_projection: - if self._resnetd_shortcut: - self._shortcut0 = tf.keras.layers.AveragePooling2D( - pool_size=2, strides=self._strides, padding='same') - self._shortcut1 = tf.keras.layers.Conv2D( - filters=self._filters * 4, - kernel_size=1, - strides=1, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - else: - self._shortcut = tf.keras.layers.Conv2D( - filters=self._filters * 4, - kernel_size=1, - strides=self._strides, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - - self._norm0 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon, - trainable=self._bn_trainable) - - self._conv1 = tf.keras.layers.Conv2D( - filters=self._filters, - kernel_size=1, - strides=1, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - 
bias_regularizer=self._bias_regularizer) - self._norm1 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon, - trainable=self._bn_trainable) - self._activation1 = tf_utils.get_activation( - self._activation, use_keras_layer=True) - - self._conv2 = tf.keras.layers.Conv2D( - filters=self._filters, - kernel_size=3, - strides=self._strides, - dilation_rate=self._dilation_rate, - padding='same', - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm2 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon, - trainable=self._bn_trainable) - self._activation2 = tf_utils.get_activation( - self._activation, use_keras_layer=True) - - self._conv3 = tf.keras.layers.Conv2D( - filters=self._filters * 4, - kernel_size=1, - strides=1, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm3 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon, - trainable=self._bn_trainable) - self._activation3 = tf_utils.get_activation( - self._activation, use_keras_layer=True) - - if self._se_ratio and self._se_ratio > 0 and self._se_ratio <= 1: - self._squeeze_excitation = nn_layers.SqueezeExcitation( - in_filters=self._filters * 4, - out_filters=self._filters * 4, - se_ratio=self._se_ratio, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - else: - self._squeeze_excitation = None - - if self._stochastic_depth_drop_rate: - self._stochastic_depth = nn_layers.StochasticDepth( - self._stochastic_depth_drop_rate) - else: - self._stochastic_depth = None - self._add = tf.keras.layers.Add() - - super(BottleneckBlock, self).build(input_shape) - - def get_config(self): 
- config = { - 'filters': self._filters, - 'strides': self._strides, - 'dilation_rate': self._dilation_rate, - 'use_projection': self._use_projection, - 'se_ratio': self._se_ratio, - 'resnetd_shortcut': self._resnetd_shortcut, - 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, - 'kernel_initializer': self._kernel_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'bias_regularizer': self._bias_regularizer, - 'activation': self._activation, - 'use_sync_bn': self._use_sync_bn, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'bn_trainable': self._bn_trainable - } - base_config = super(BottleneckBlock, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, inputs, training=None): - shortcut = inputs - if self._use_projection: - if self._resnetd_shortcut: - shortcut = self._shortcut0(shortcut) - shortcut = self._shortcut1(shortcut) - else: - shortcut = self._shortcut(shortcut) - shortcut = self._norm0(shortcut) - - x = self._conv1(inputs) - x = self._norm1(x) - x = self._activation1(x) - - x = self._conv2(x) - x = self._norm2(x) - x = self._activation2(x) - - x = self._conv3(x) - x = self._norm3(x) - - if self._squeeze_excitation: - x = self._squeeze_excitation(x) - - if self._stochastic_depth: - x = self._stochastic_depth(x, training=training) - - x = self._add([x, shortcut]) - return self._activation3(x) - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class InvertedBottleneckBlock(tf.keras.layers.Layer): - """An inverted bottleneck block.""" - - def __init__(self, - in_filters, - out_filters, - expand_ratio, - strides, - kernel_size=3, - se_ratio=None, - stochastic_depth_drop_rate=None, - kernel_initializer='VarianceScaling', - kernel_regularizer=None, - bias_regularizer=None, - activation='relu', - se_inner_activation='relu', - se_gating_activation='sigmoid', - se_round_down_protect=True, - expand_se_in_filters=False, - 
depthwise_activation=None, - use_sync_bn=False, - dilation_rate=1, - divisible_by=1, - regularize_depthwise=False, - use_depthwise=True, - use_residual=True, - norm_momentum=0.99, - norm_epsilon=0.001, - output_intermediate_endpoints=False, - **kwargs): - """Initializes an inverted bottleneck block with BN after convolutions. - - Args: - in_filters: An `int` number of filters of the input tensor. - out_filters: An `int` number of filters of the output tensor. - expand_ratio: An `int` of expand_ratio for an inverted bottleneck block. - strides: An `int` block stride. If greater than 1, this block will - ultimately downsample the input. - kernel_size: An `int` kernel_size of the depthwise conv layer. - se_ratio: A `float` or None. If not None, se ratio for the squeeze and - excitation layer. - stochastic_depth_drop_rate: A `float` or None. if not None, drop rate for - the stochastic depth layer. - kernel_initializer: A `str` of kernel_initializer for convolutional - layers. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for - Conv2D. Default to None. - bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2d. - Default to None. - activation: A `str` name of the activation function. - se_inner_activation: A `str` name of squeeze-excitation inner activation. - se_gating_activation: A `str` name of squeeze-excitation gating - activation. - se_round_down_protect: A `bool` of whether round down more than 10% - will be allowed in SE layer. - expand_se_in_filters: A `bool` of whether or not to expand in_filter in - squeeze and excitation layer. - depthwise_activation: A `str` name of the activation function for - depthwise only. - use_sync_bn: A `bool`. If True, use synchronized batch normalization. - dilation_rate: An `int` that specifies the dilation rate to use for. - divisible_by: An `int` that ensures all inner dimensions are divisible by - this number. 
- dilated convolution: An `int` to specify the same value for all spatial - dimensions. - regularize_depthwise: A `bool` of whether or not apply regularization on - depthwise. - use_depthwise: A `bool` of whether to uses fused convolutions instead of - depthwise. - use_residual: A `bool` of whether to include residual connection between - input and output. - norm_momentum: A `float` of normalization momentum for the moving average. - norm_epsilon: A `float` added to variance to avoid dividing by zero. - output_intermediate_endpoints: A `bool` of whether or not output the - intermediate endpoints. - **kwargs: Additional keyword arguments to be passed. - """ - super(InvertedBottleneckBlock, self).__init__(**kwargs) - - self._in_filters = in_filters - self._out_filters = out_filters - self._expand_ratio = expand_ratio - self._strides = strides - self._kernel_size = kernel_size - self._se_ratio = se_ratio - self._divisible_by = divisible_by - self._stochastic_depth_drop_rate = stochastic_depth_drop_rate - self._dilation_rate = dilation_rate - self._use_sync_bn = use_sync_bn - self._regularize_depthwise = regularize_depthwise - self._use_depthwise = use_depthwise - self._use_residual = use_residual - self._activation = activation - self._se_inner_activation = se_inner_activation - self._se_gating_activation = se_gating_activation - self._depthwise_activation = depthwise_activation - self._se_round_down_protect = se_round_down_protect - self._kernel_initializer = kernel_initializer - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - self._expand_se_in_filters = expand_se_in_filters - self._output_intermediate_endpoints = output_intermediate_endpoints - - if use_sync_bn: - self._norm = tf.keras.layers.experimental.SyncBatchNormalization - else: - self._norm = tf.keras.layers.BatchNormalization - if tf.keras.backend.image_data_format() == 'channels_last': - 
self._bn_axis = -1 - else: - self._bn_axis = 1 - if not depthwise_activation: - self._depthwise_activation = activation - if regularize_depthwise: - self._depthsize_regularizer = kernel_regularizer - else: - self._depthsize_regularizer = None - - def build(self, input_shape): - expand_filters = self._in_filters - if self._expand_ratio > 1: - # First 1x1 conv for channel expansion. - expand_filters = nn_layers.make_divisible( - self._in_filters * self._expand_ratio, self._divisible_by) - - expand_kernel = 1 if self._use_depthwise else self._kernel_size - expand_stride = 1 if self._use_depthwise else self._strides - - self._conv0 = tf.keras.layers.Conv2D( - filters=expand_filters, - kernel_size=expand_kernel, - strides=expand_stride, - padding='same', - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm0 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - self._activation_layer = tf_utils.get_activation( - self._activation, use_keras_layer=True) - - if self._use_depthwise: - # Depthwise conv. - self._conv1 = tf.keras.layers.DepthwiseConv2D( - kernel_size=(self._kernel_size, self._kernel_size), - strides=self._strides, - padding='same', - depth_multiplier=1, - dilation_rate=self._dilation_rate, - use_bias=False, - depthwise_initializer=self._kernel_initializer, - depthwise_regularizer=self._depthsize_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm1 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - self._depthwise_activation_layer = tf_utils.get_activation( - self._depthwise_activation, use_keras_layer=True) - - # Squeeze and excitation. 
- if self._se_ratio and self._se_ratio > 0 and self._se_ratio <= 1: - logging.info('Use Squeeze and excitation.') - in_filters = self._in_filters - if self._expand_se_in_filters: - in_filters = expand_filters - self._squeeze_excitation = nn_layers.SqueezeExcitation( - in_filters=in_filters, - out_filters=expand_filters, - se_ratio=self._se_ratio, - divisible_by=self._divisible_by, - round_down_protect=self._se_round_down_protect, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer, - activation=self._se_inner_activation, - gating_activation=self._se_gating_activation) - else: - self._squeeze_excitation = None - - # Last 1x1 conv. - self._conv2 = tf.keras.layers.Conv2D( - filters=self._out_filters, - kernel_size=1, - strides=1, - padding='same', - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm2 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - - if self._stochastic_depth_drop_rate: - self._stochastic_depth = nn_layers.StochasticDepth( - self._stochastic_depth_drop_rate) - else: - self._stochastic_depth = None - self._add = tf.keras.layers.Add() - - super(InvertedBottleneckBlock, self).build(input_shape) - - def get_config(self): - config = { - 'in_filters': self._in_filters, - 'out_filters': self._out_filters, - 'expand_ratio': self._expand_ratio, - 'strides': self._strides, - 'kernel_size': self._kernel_size, - 'se_ratio': self._se_ratio, - 'divisible_by': self._divisible_by, - 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, - 'kernel_initializer': self._kernel_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'bias_regularizer': self._bias_regularizer, - 'activation': self._activation, - 'se_inner_activation': self._se_inner_activation, - 'se_gating_activation': 
self._se_gating_activation, - 'se_round_down_protect': self._se_round_down_protect, - 'expand_se_in_filters': self._expand_se_in_filters, - 'depthwise_activation': self._depthwise_activation, - 'dilation_rate': self._dilation_rate, - 'use_sync_bn': self._use_sync_bn, - 'regularize_depthwise': self._regularize_depthwise, - 'use_depthwise': self._use_depthwise, - 'use_residual': self._use_residual, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon - } - base_config = super(InvertedBottleneckBlock, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, inputs, training=None): - endpoints = {} - shortcut = inputs - if self._expand_ratio > 1: - x = self._conv0(inputs) - x = self._norm0(x) - x = self._activation_layer(x) - else: - x = inputs - - if self._use_depthwise: - x = self._conv1(x) - x = self._norm1(x) - x = self._depthwise_activation_layer(x) - if self._output_intermediate_endpoints: - endpoints['depthwise'] = x - - if self._squeeze_excitation: - x = self._squeeze_excitation(x) - - x = self._conv2(x) - x = self._norm2(x) - - if (self._use_residual and self._in_filters == self._out_filters and - self._strides == 1): - if self._stochastic_depth: - x = self._stochastic_depth(x, training=training) - x = self._add([x, shortcut]) - - if self._output_intermediate_endpoints: - return x, endpoints - return x - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class ResidualInner(tf.keras.layers.Layer): - """Creates a single inner block of a residual. - - This corresponds to `F`/`G` functions in the RevNet paper: - Aidan N. Gomez, Mengye Ren, Raquel Urtasun, Roger B. Grosse. - The Reversible Residual Network: Backpropagation Without Storing Activations. 
- (https://arxiv.org/pdf/1707.04585.pdf) - """ - - def __init__( - self, - filters: int, - strides: int, - kernel_initializer: Union[str, Callable[ - ..., tf.keras.initializers.Initializer]] = 'VarianceScaling', - kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - activation: Union[str, Callable[..., tf.Tensor]] = 'relu', - use_sync_bn: bool = False, - norm_momentum: float = 0.99, - norm_epsilon: float = 0.001, - batch_norm_first: bool = True, - **kwargs): - """Initializes a ResidualInner. - - Args: - filters: An `int` of output filter size. - strides: An `int` of stride size for convolution for the residual block. - kernel_initializer: A `str` or `tf.keras.initializers.Initializer` - instance for convolutional layers. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` for Conv2D. - activation: A `str` or `callable` instance of the activation function. - use_sync_bn: A `bool`. If True, use synchronized batch normalization. - norm_momentum: A `float` of normalization momentum for the moving average. - norm_epsilon: A `float` added to variance to avoid dividing by zero. - batch_norm_first: A `bool` of whether to apply activation and batch norm - before conv. - **kwargs: Additional keyword arguments to be passed. 
- """ - super(ResidualInner, self).__init__(**kwargs) - - self.strides = strides - self.filters = filters - self._kernel_initializer = tf.keras.initializers.get(kernel_initializer) - self._kernel_regularizer = kernel_regularizer - self._activation = tf.keras.activations.get(activation) - self._use_sync_bn = use_sync_bn - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - self._batch_norm_first = batch_norm_first - - if use_sync_bn: - self._norm = tf.keras.layers.experimental.SyncBatchNormalization - else: - self._norm = tf.keras.layers.BatchNormalization - - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - self._activation_fn = tf_utils.get_activation(activation) - - def build(self, input_shape: tf.TensorShape): - if self._batch_norm_first: - self._batch_norm_0 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - - self._conv2d_1 = tf.keras.layers.Conv2D( - filters=self.filters, - kernel_size=3, - strides=self.strides, - use_bias=False, - padding='same', - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer) - - self._batch_norm_1 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - - self._conv2d_2 = tf.keras.layers.Conv2D( - filters=self.filters, - kernel_size=3, - strides=1, - use_bias=False, - padding='same', - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer) - - super(ResidualInner, self).build(input_shape) - - def get_config(self) -> Dict[str, Any]: - config = { - 'filters': self.filters, - 'strides': self.strides, - 'kernel_initializer': self._kernel_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'activation': self._activation, - 'use_sync_bn': self._use_sync_bn, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'batch_norm_first': self._batch_norm_first, 
- } - base_config = super(ResidualInner, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, - inputs: tf.Tensor, - training: Optional[bool] = None) -> tf.Tensor: - x = inputs - if self._batch_norm_first: - x = self._batch_norm_0(x, training=training) - x = self._activation_fn(x) - x = self._conv2d_1(x) - - x = self._batch_norm_1(x, training=training) - x = self._activation_fn(x) - x = self._conv2d_2(x) - return x - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class BottleneckResidualInner(tf.keras.layers.Layer): - """Creates a single inner block of a bottleneck. - - This corresponds to `F`/`G` functions in the RevNet paper: - Aidan N. Gomez, Mengye Ren, Raquel Urtasun, Roger B. Grosse. - The Reversible Residual Network: Backpropagation Without Storing Activations. - (https://arxiv.org/pdf/1707.04585.pdf) - """ - - def __init__( - self, - filters: int, - strides: int, - kernel_initializer: Union[str, Callable[ - ..., tf.keras.initializers.Initializer]] = 'VarianceScaling', - kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - activation: Union[str, Callable[..., tf.Tensor]] = 'relu', - use_sync_bn: bool = False, - norm_momentum: float = 0.99, - norm_epsilon: float = 0.001, - batch_norm_first: bool = True, - **kwargs): - """Initializes a BottleneckResidualInner. - - Args: - filters: An `int` number of filters for first 2 convolutions. Last Last, - and thus the number of output channels from the bottlneck block is - `4*filters` - strides: An `int` of stride size for convolution for the residual block. - kernel_initializer: A `str` or `tf.keras.initializers.Initializer` - instance for convolutional layers. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` for Conv2D. - activation: A `str` or `callable` instance of the activation function. - use_sync_bn: A `bool`. If True, use synchronized batch normalization. 
- norm_momentum: A `float` of normalization momentum for the moving average. - norm_epsilon: A `float` added to variance to avoid dividing by zero. - batch_norm_first: A `bool` of whether to apply activation and batch norm - before conv. - **kwargs: Additional keyword arguments to be passed. - """ - super(BottleneckResidualInner, self).__init__(**kwargs) - - self.strides = strides - self.filters = filters - self._kernel_initializer = tf.keras.initializers.get(kernel_initializer) - self._kernel_regularizer = kernel_regularizer - self._activation = tf.keras.activations.get(activation) - self._use_sync_bn = use_sync_bn - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - self._batch_norm_first = batch_norm_first - - if use_sync_bn: - self._norm = tf.keras.layers.experimental.SyncBatchNormalization - else: - self._norm = tf.keras.layers.BatchNormalization - - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - self._activation_fn = tf_utils.get_activation(activation) - - def build(self, input_shape: tf.TensorShape): - if self._batch_norm_first: - self._batch_norm_0 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - self._conv2d_1 = tf.keras.layers.Conv2D( - filters=self.filters, - kernel_size=1, - strides=self.strides, - use_bias=False, - padding='same', - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer) - self._batch_norm_1 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - self._conv2d_2 = tf.keras.layers.Conv2D( - filters=self.filters, - kernel_size=3, - strides=1, - use_bias=False, - padding='same', - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer) - self._batch_norm_2 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - self._conv2d_3 = 
tf.keras.layers.Conv2D( - filters=self.filters * 4, - kernel_size=1, - strides=1, - use_bias=False, - padding='same', - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer) - - super(BottleneckResidualInner, self).build(input_shape) - - def get_config(self) -> Dict[str, Any]: - config = { - 'filters': self.filters, - 'strides': self.strides, - 'kernel_initializer': self._kernel_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'activation': self._activation, - 'use_sync_bn': self._use_sync_bn, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'batch_norm_first': self._batch_norm_first, - } - base_config = super(BottleneckResidualInner, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, - inputs: tf.Tensor, - training: Optional[bool] = None) -> tf.Tensor: - x = inputs - if self._batch_norm_first: - x = self._batch_norm_0(x, training=training) - x = self._activation_fn(x) - x = self._conv2d_1(x) - - x = self._batch_norm_1(x, training=training) - x = self._activation_fn(x) - x = self._conv2d_2(x) - - x = self._batch_norm_2(x, training=training) - x = self._activation_fn(x) - x = self._conv2d_3(x) - - return x - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class ReversibleLayer(tf.keras.layers.Layer): - """Creates a reversible layer. - - Computes y1 = x1 + f(x2), y2 = x2 + g(y1), where f and g can be arbitrary - layers that are stateless, which in this case are `ResidualInner` layers. - """ - - def __init__(self, - f: tf.keras.layers.Layer, - g: tf.keras.layers.Layer, - manual_grads: bool = True, - **kwargs): - """Initializes a ReversibleLayer. - - Args: - f: A `tf.keras.layers.Layer` instance of `f` inner block referred to in - paper. Each reversible layer consists of two inner functions. For - example, in RevNet the reversible residual consists of two f/g inner - (bottleneck) residual functions. 
Where the input to the reversible layer - is x, the input gets partitioned in the channel dimension and the - forward pass follows (eq8): x = [x1; x2], z1 = x1 + f(x2), y2 = x2 + - g(z1), y1 = stop_gradient(z1). - g: A `tf.keras.layers.Layer` instance of `g` inner block referred to in - paper. Detailed explanation same as above as `f` arg. - manual_grads: A `bool` [Testing Only] of whether to manually take - gradients as in Algorithm 1 or defer to autograd. - **kwargs: Additional keyword arguments to be passed. - """ - super(ReversibleLayer, self).__init__(**kwargs) - - self._f = f - self._g = g - self._manual_grads = manual_grads - - if tf.keras.backend.image_data_format() == 'channels_last': - self._axis = -1 - else: - self._axis = 1 - - def get_config(self) -> Dict[str, Any]: - config = { - 'f': self._f, - 'g': self._g, - 'manual_grads': self._manual_grads, - } - base_config = super(ReversibleLayer, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def _ckpt_non_trainable_vars(self): - self._f_non_trainable_vars = [ - v.read_value() for v in self._f.non_trainable_variables - ] - self._g_non_trainable_vars = [ - v.read_value() for v in self._g.non_trainable_variables - ] - - def _load_ckpt_non_trainable_vars(self): - for v, v_chkpt in zip(self._f.non_trainable_variables, - self._f_non_trainable_vars): - v.assign(v_chkpt) - for v, v_chkpt in zip(self._g.non_trainable_variables, - self._g_non_trainable_vars): - v.assign(v_chkpt) - - def call(self, - inputs: tf.Tensor, - training: Optional[bool] = None) -> tf.Tensor: - - @tf.custom_gradient - def reversible( - x: tf.Tensor - ) -> Tuple[tf.Tensor, Callable[[Any], Tuple[List[tf.Tensor], - List[tf.Tensor]]]]: - """Implements Algorithm 1 in the RevNet paper. - - Aidan N. Gomez, Mengye Ren, Raquel Urtasun, Roger B. Grosse. - The Reversible Residual Network: Backpropagation Without Storing - Activations. - (https://arxiv.org/pdf/1707.04585.pdf) - - Args: - x: An input `tf.Tensor. 
- - Returns: - y: The output [y1; y2] in Algorithm 1. - grad_fn: A callable function that computes the gradients. - """ - with tf.GradientTape() as fwdtape: - fwdtape.watch(x) - x1, x2 = tf.split(x, num_or_size_splits=2, axis=self._axis) - f_x2 = self._f(x2, training=training) - x1_down = _maybe_downsample(x1, f_x2.shape[self._axis], self._f.strides, - self._axis) - z1 = f_x2 + x1_down - g_z1 = self._g(z1, training=training) - x2_down = _maybe_downsample(x2, g_z1.shape[self._axis], self._f.strides, - self._axis) - y2 = x2_down + g_z1 - - # Equation 8: https://arxiv.org/pdf/1707.04585.pdf - # Decouple y1 and z1 so that their derivatives are different. - y1 = tf.identity(z1) - y = tf.concat([y1, y2], axis=self._axis) - - irreversible = ((self._f.strides != 1 or self._g.strides != 1) or - (y.shape[self._axis] != inputs.shape[self._axis])) - - # Checkpointing moving mean/variance for batch normalization layers - # as they shouldn't be updated during the custom gradient pass of f/g. - self._ckpt_non_trainable_vars() - - def grad_fn( - dy: tf.Tensor, - variables: Optional[List[tf.Variable]] = None, - ) -> Tuple[List[tf.Tensor], List[tf.Tensor]]: - """Given dy calculate (dy/dx)|_{x_{input}} using f/g.""" - if irreversible or not self._manual_grads: - grads_combined = fwdtape.gradient( - y, [x] + variables, output_gradients=dy) - dx = grads_combined[0] - grad_vars = grads_combined[1:] - else: - y1_nograd = tf.stop_gradient(y1) - y2_nograd = tf.stop_gradient(y2) - dy1, dy2 = tf.split(dy, num_or_size_splits=2, axis=self._axis) - - # Index mapping from self.f/g.trainable_variables to grad_fn - # input `variables` kwarg so that we can reorder dwf + dwg - # variable gradient list to match `variables` order. 
- f_var_refs = [v.ref() for v in self._f.trainable_variables] - g_var_refs = [v.ref() for v in self._g.trainable_variables] - fg_var_refs = f_var_refs + g_var_refs - self_to_var_index = [fg_var_refs.index(v.ref()) for v in variables] - - # Algorithm 1 in paper (line # documented in-line) - z1 = y1_nograd # line 2 - with tf.GradientTape() as gtape: - gtape.watch(z1) - g_z1 = self._g(z1, training=training) - x2 = y2_nograd - g_z1 # line 3 - - with tf.GradientTape() as ftape: - ftape.watch(x2) - f_x2 = self._f(x2, training=training) - x1 = z1 - f_x2 # pylint: disable=unused-variable # line 4 - - # Compute gradients - g_grads_combined = gtape.gradient( - g_z1, [z1] + self._g.trainable_variables, output_gradients=dy2) - dz1 = dy1 + g_grads_combined[0] # line 5 - dwg = g_grads_combined[1:] # line 9 - - f_grads_combined = ftape.gradient( - f_x2, [x2] + self._f.trainable_variables, output_gradients=dz1) - dx2 = dy2 + f_grads_combined[0] # line 6 - dwf = f_grads_combined[1:] # line 8 - dx1 = dz1 # line 7 - - # Pack the input and variable gradients. - dx = tf.concat([dx1, dx2], axis=self._axis) - grad_vars = dwf + dwg - # Reorder gradients (trainable_variables to variables kwarg order) - grad_vars = [grad_vars[i] for i in self_to_var_index] - - # Restore batch normalization moving mean/variance for correctness. 
- self._load_ckpt_non_trainable_vars() - - return dx, grad_vars # grad_fn end - - return y, grad_fn # reversible end - - activations = reversible(inputs) - return activations - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class DepthwiseSeparableConvBlock(tf.keras.layers.Layer): - """Creates an depthwise separable convolution block with batch normalization.""" - - def __init__( - self, - filters: int, - kernel_size: int = 3, - strides: int = 1, - regularize_depthwise=False, - activation: Text = 'relu6', - kernel_initializer: Text = 'VarianceScaling', - kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - dilation_rate: int = 1, - use_sync_bn: bool = False, - norm_momentum: float = 0.99, - norm_epsilon: float = 0.001, - **kwargs): - """Initializes a convolution block with batch normalization. - - Args: - filters: An `int` number of filters for the first two convolutions. Note - that the third and final convolution will use 4 times as many filters. - kernel_size: An `int` that specifies the height and width of the 2D - convolution window. - strides: An `int` of block stride. If greater than 1, this block will - ultimately downsample the input. - regularize_depthwise: A `bool`. If Ture, apply regularization on - depthwise. - activation: A `str` name of the activation function. - kernel_initializer: A `str` of kernel_initializer for convolutional - layers. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for - Conv2D. Default to None. - dilation_rate: An `int` or tuple/list of 2 `int`, specifying the dilation - rate to use for dilated convolution. Can be a single integer to specify - the same value for all spatial dimensions. - use_sync_bn: A `bool`. If True, use synchronized batch normalization. - norm_momentum: A `float` of normalization momentum for the moving average. - norm_epsilon: A `float` added to variance to avoid dividing by zero. - **kwargs: Additional keyword arguments to be passed. 
- """ - super(DepthwiseSeparableConvBlock, self).__init__(**kwargs) - self._filters = filters - self._kernel_size = kernel_size - self._strides = strides - self._activation = activation - self._regularize_depthwise = regularize_depthwise - self._kernel_initializer = kernel_initializer - self._kernel_regularizer = kernel_regularizer - self._dilation_rate = dilation_rate - self._use_sync_bn = use_sync_bn - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - - if use_sync_bn: - self._norm = tf.keras.layers.experimental.SyncBatchNormalization - else: - self._norm = tf.keras.layers.BatchNormalization - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - self._activation_fn = tf_utils.get_activation(activation) - if regularize_depthwise: - self._depthsize_regularizer = kernel_regularizer - else: - self._depthsize_regularizer = None - - def get_config(self): - config = { - 'filters': self._filters, - 'strides': self._strides, - 'regularize_depthwise': self._regularize_depthwise, - 'kernel_initializer': self._kernel_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'activation': self._activation, - 'use_sync_bn': self._use_sync_bn, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon - } - base_config = super(DepthwiseSeparableConvBlock, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def build(self, input_shape): - - self._dwconv0 = tf.keras.layers.DepthwiseConv2D( - kernel_size=self._kernel_size, - strides=self._strides, - padding='same', - depth_multiplier=1, - dilation_rate=self._dilation_rate, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._depthsize_regularizer, - use_bias=False) - self._norm0 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - - self._conv1 = tf.keras.layers.Conv2D( - filters=self._filters, - kernel_size=1, - strides=1, 
- padding='same', - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer) - self._norm1 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - - super(DepthwiseSeparableConvBlock, self).build(input_shape) - - def call(self, inputs, training=None): - x = self._dwconv0(inputs) - x = self._norm0(x) - x = self._activation_fn(x) - - x = self._conv1(x) - x = self._norm1(x) - return self._activation_fn(x) - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class TuckerConvBlock(tf.keras.layers.Layer): - """An Tucker block (generalized bottleneck).""" - - def __init__(self, - in_filters, - out_filters, - input_compression_ratio, - output_compression_ratio, - strides, - kernel_size=3, - stochastic_depth_drop_rate=None, - kernel_initializer='VarianceScaling', - kernel_regularizer=None, - bias_regularizer=None, - activation='relu', - use_sync_bn=False, - divisible_by=1, - use_residual=True, - norm_momentum=0.99, - norm_epsilon=0.001, - **kwargs): - """Initializes an inverted bottleneck block with BN after convolutions. - - Args: - in_filters: An `int` number of filters of the input tensor. - out_filters: An `int` number of filters of the output tensor. - input_compression_ratio: An `float` of compression ratio for - input filters. - output_compression_ratio: An `float` of compression ratio for - output filters. - strides: An `int` block stride. If greater than 1, this block will - ultimately downsample the input. - kernel_size: An `int` kernel_size of the depthwise conv layer. - stochastic_depth_drop_rate: A `float` or None. if not None, drop rate for - the stochastic depth layer. - kernel_initializer: A `str` of kernel_initializer for convolutional - layers. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for - Conv2D. Default to None. - bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2d. - Default to None. 
- activation: A `str` name of the activation function. - use_sync_bn: A `bool`. If True, use synchronized batch normalization. - divisible_by: An `int` that ensures all inner dimensions are divisible by - this number. - use_residual: A `bool` of whether to include residual connection between - input and output. - norm_momentum: A `float` of normalization momentum for the moving average. - norm_epsilon: A `float` added to variance to avoid dividing by zero. - **kwargs: Additional keyword arguments to be passed. - """ - super(TuckerConvBlock, self).__init__(**kwargs) - - self._in_filters = in_filters - self._out_filters = out_filters - self._input_compression_ratio = input_compression_ratio - self._output_compression_ratio = output_compression_ratio - self._strides = strides - self._kernel_size = kernel_size - self._divisible_by = divisible_by - self._stochastic_depth_drop_rate = stochastic_depth_drop_rate - self._use_sync_bn = use_sync_bn - self._use_residual = use_residual - self._activation = activation - self._kernel_initializer = kernel_initializer - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - - if use_sync_bn: - self._norm = tf.keras.layers.experimental.SyncBatchNormalization - else: - self._norm = tf.keras.layers.BatchNormalization - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - - def build(self, input_shape): - input_compressed_filters = nn_layers.make_divisible( - value=self._in_filters * self._input_compression_ratio, - divisor=self._divisible_by, - round_down_protect=False) - - self._conv0 = tf.keras.layers.Conv2D( - filters=input_compressed_filters, - kernel_size=1, - strides=1, - padding='same', - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm0 = 
self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - self._activation_layer0 = tf_utils.get_activation( - self._activation, use_keras_layer=True) - - output_compressed_filters = nn_layers.make_divisible( - value=self._out_filters * self._output_compression_ratio, - divisor=self._divisible_by, - round_down_protect=False) - - self._conv1 = tf.keras.layers.Conv2D( - filters=output_compressed_filters, - kernel_size=self._kernel_size, - strides=self._strides, - padding='same', - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm1 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - self._activation_layer1 = tf_utils.get_activation( - self._activation, use_keras_layer=True) - - # Last 1x1 conv. - self._conv2 = tf.keras.layers.Conv2D( - filters=self._out_filters, - kernel_size=1, - strides=1, - padding='same', - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm2 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - - if self._stochastic_depth_drop_rate: - self._stochastic_depth = nn_layers.StochasticDepth( - self._stochastic_depth_drop_rate) - else: - self._stochastic_depth = None - self._add = tf.keras.layers.Add() - - super(TuckerConvBlock, self).build(input_shape) - - def get_config(self): - config = { - 'in_filters': self._in_filters, - 'out_filters': self._out_filters, - 'input_compression_ratio': self._input_compression_ratio, - 'output_compression_ratio': self._output_compression_ratio, - 'strides': self._strides, - 'kernel_size': self._kernel_size, - 'divisible_by': self._divisible_by, - 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, - 'kernel_initializer': self._kernel_initializer, - 
'kernel_regularizer': self._kernel_regularizer, - 'bias_regularizer': self._bias_regularizer, - 'activation': self._activation, - 'use_sync_bn': self._use_sync_bn, - 'use_residual': self._use_residual, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon - } - base_config = super(TuckerConvBlock, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, inputs, training=None): - shortcut = inputs - - x = self._conv0(inputs) - x = self._norm0(x) - x = self._activation_layer0(x) - - x = self._conv1(x) - x = self._norm1(x) - x = self._activation_layer1(x) - - x = self._conv2(x) - x = self._norm2(x) - - if (self._use_residual and - self._in_filters == self._out_filters and - self._strides == 1): - if self._stochastic_depth: - x = self._stochastic_depth(x, training=training) - x = self._add([x, shortcut]) - - return x diff --git a/official/vision/beta/modeling/layers/nn_blocks_test.py b/official/vision/beta/modeling/layers/nn_blocks_test.py deleted file mode 100644 index 0467b102f8f4344f57efb18acb59120f63f18eea..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/layers/nn_blocks_test.py +++ /dev/null @@ -1,341 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Tests for nn_blocks.""" - -from typing import Any, Iterable, Tuple -# Import libraries -from absl.testing import parameterized -import tensorflow as tf - -from tensorflow.python.distribute import combinations -from tensorflow.python.distribute import strategy_combinations -from official.vision.beta.modeling.layers import nn_blocks - - -def distribution_strategy_combinations() -> Iterable[Tuple[Any, ...]]: - """Returns the combinations of end-to-end tests to run.""" - return combinations.combine( - distribution=[ - strategy_combinations.default_strategy, - strategy_combinations.cloud_tpu_strategy, - strategy_combinations.one_device_strategy_gpu, - ],) - - -class NNBlocksTest(parameterized.TestCase, tf.test.TestCase): - - @parameterized.parameters( - (nn_blocks.ResidualBlock, 1, False, 0.0, None), - (nn_blocks.ResidualBlock, 2, True, 0.2, 0.25), - ) - def test_residual_block_creation(self, block_fn, strides, use_projection, - stochastic_depth_drop_rate, se_ratio): - input_size = 128 - filter_size = 256 - inputs = tf.keras.Input( - shape=(input_size, input_size, filter_size), batch_size=1) - block = block_fn( - filter_size, - strides, - use_projection=use_projection, - se_ratio=se_ratio, - stochastic_depth_drop_rate=stochastic_depth_drop_rate, - ) - - features = block(inputs) - - self.assertAllEqual( - [1, input_size // strides, input_size // strides, filter_size], - features.shape.as_list()) - - @parameterized.parameters( - (nn_blocks.BottleneckBlock, 1, False, 0.0, None), - (nn_blocks.BottleneckBlock, 2, True, 0.2, 0.25), - ) - def test_bottleneck_block_creation(self, block_fn, strides, use_projection, - stochastic_depth_drop_rate, se_ratio): - input_size = 128 - filter_size = 256 - inputs = tf.keras.Input( - shape=(input_size, input_size, filter_size * 4), batch_size=1) - block = block_fn( - filter_size, - strides, - use_projection=use_projection, - se_ratio=se_ratio, - stochastic_depth_drop_rate=stochastic_depth_drop_rate) - - features = 
block(inputs) - - self.assertAllEqual( - [1, input_size // strides, input_size // strides, filter_size * 4], - features.shape.as_list()) - - @parameterized.parameters( - (nn_blocks.InvertedBottleneckBlock, 1, 1, None, None), - (nn_blocks.InvertedBottleneckBlock, 6, 1, None, None), - (nn_blocks.InvertedBottleneckBlock, 1, 2, None, None), - (nn_blocks.InvertedBottleneckBlock, 1, 1, 0.2, None), - (nn_blocks.InvertedBottleneckBlock, 1, 1, None, 0.2), - ) - def test_invertedbottleneck_block_creation(self, block_fn, expand_ratio, - strides, se_ratio, - stochastic_depth_drop_rate): - input_size = 128 - in_filters = 24 - out_filters = 40 - inputs = tf.keras.Input( - shape=(input_size, input_size, in_filters), batch_size=1) - block = block_fn( - in_filters=in_filters, - out_filters=out_filters, - expand_ratio=expand_ratio, - strides=strides, - se_ratio=se_ratio, - stochastic_depth_drop_rate=stochastic_depth_drop_rate) - - features = block(inputs) - - self.assertAllEqual( - [1, input_size // strides, input_size // strides, out_filters], - features.shape.as_list()) - - @parameterized.parameters( - (nn_blocks.TuckerConvBlock, 1, 0.25, 0.25), - (nn_blocks.TuckerConvBlock, 2, 0.25, 0.25), - ) - def test_tucker_conv_block( - self, block_fn, strides, - input_compression_ratio, output_compression_ratio): - input_size = 128 - in_filters = 24 - out_filters = 24 - inputs = tf.keras.Input( - shape=(input_size, input_size, in_filters), batch_size=1) - block = block_fn( - in_filters=in_filters, - out_filters=out_filters, - input_compression_ratio=input_compression_ratio, - output_compression_ratio=output_compression_ratio, - strides=strides) - - features = block(inputs) - - self.assertAllEqual( - [1, input_size // strides, input_size // strides, out_filters], - features.shape.as_list()) - - -class ResidualInnerTest(parameterized.TestCase, tf.test.TestCase): - - @combinations.generate(distribution_strategy_combinations()) - def test_shape(self, distribution): - bsz, h, w, c = 8, 32, 32, 
32 - filters = 64 - strides = 2 - - input_tensor = tf.random.uniform(shape=[bsz, h, w, c]) - with distribution.scope(): - test_layer = nn_blocks.ResidualInner(filters, strides) - - output = test_layer(input_tensor) - expected_output_shape = [bsz, h // strides, w // strides, filters] - self.assertEqual(expected_output_shape, output.shape.as_list()) - - -class BottleneckResidualInnerTest(parameterized.TestCase, tf.test.TestCase): - - @combinations.generate(distribution_strategy_combinations()) - def test_shape(self, distribution): - bsz, h, w, c = 8, 32, 32, 32 - filters = 64 - strides = 2 - - input_tensor = tf.random.uniform(shape=[bsz, h, w, c]) - with distribution.scope(): - test_layer = nn_blocks.BottleneckResidualInner(filters, strides) - - output = test_layer(input_tensor) - expected_output_shape = [bsz, h // strides, w // strides, filters * 4] - self.assertEqual(expected_output_shape, output.shape.as_list()) - - -class DepthwiseSeparableConvBlockTest(parameterized.TestCase, tf.test.TestCase): - - @combinations.generate(distribution_strategy_combinations()) - def test_shape(self, distribution): - batch_size, height, width, num_channels = 8, 32, 32, 32 - num_filters = 64 - strides = 2 - - input_tensor = tf.random.normal( - shape=[batch_size, height, width, num_channels]) - with distribution.scope(): - block = nn_blocks.DepthwiseSeparableConvBlock( - num_filters, strides=strides) - config_dict = block.get_config() - recreate_block = nn_blocks.DepthwiseSeparableConvBlock(**config_dict) - - output_tensor = block(input_tensor) - expected_output_shape = [ - batch_size, height // strides, width // strides, num_filters - ] - self.assertEqual(output_tensor.shape.as_list(), expected_output_shape) - - output_tensor = recreate_block(input_tensor) - self.assertEqual(output_tensor.shape.as_list(), expected_output_shape) - - -class ReversibleLayerTest(parameterized.TestCase, tf.test.TestCase): - - @combinations.generate(distribution_strategy_combinations()) - def 
test_downsampling_non_reversible_step(self, distribution): - bsz, h, w, c = 8, 32, 32, 32 - filters = 64 - strides = 2 - - input_tensor = tf.random.uniform(shape=[bsz, h, w, c]) - with distribution.scope(): - f = nn_blocks.ResidualInner( - filters=filters // 2, strides=strides, batch_norm_first=True) - g = nn_blocks.ResidualInner( - filters=filters // 2, strides=1, batch_norm_first=True) - test_layer = nn_blocks.ReversibleLayer(f, g) - test_layer.build(input_tensor.shape) - optimizer = tf.keras.optimizers.SGD(learning_rate=0.01) - - @tf.function - def step_fn(): - with tf.GradientTape() as tape: - output = test_layer(input_tensor, training=True) - grads = tape.gradient(output, test_layer.trainable_variables) - # Test applying gradients with optimizer works - optimizer.apply_gradients(zip(grads, test_layer.trainable_variables)) - - return output - - replica_output = distribution.run(step_fn) - outputs = distribution.experimental_local_results(replica_output) - - # Assert forward pass shape - expected_output_shape = [bsz, h // strides, w // strides, filters] - for output in outputs: - self.assertEqual(expected_output_shape, output.shape.as_list()) - - @combinations.generate(distribution_strategy_combinations()) - def test_reversible_step(self, distribution): - # Reversible layers satisfy: (a) strides = 1 (b) in_filter = out_filter - bsz, h, w, c = 8, 32, 32, 32 - filters = c - strides = 1 - - input_tensor = tf.random.uniform(shape=[bsz, h, w, c]) - with distribution.scope(): - f = nn_blocks.ResidualInner( - filters=filters // 2, strides=strides, batch_norm_first=False) - g = nn_blocks.ResidualInner( - filters=filters // 2, strides=1, batch_norm_first=False) - test_layer = nn_blocks.ReversibleLayer(f, g) - test_layer(input_tensor, training=False) # init weights - optimizer = tf.keras.optimizers.SGD(learning_rate=0.01) - - @tf.function - def step_fn(): - with tf.GradientTape() as tape: - output = test_layer(input_tensor, training=True) - grads = tape.gradient(output, 
test_layer.trainable_variables) - # Test applying gradients with optimizer works - optimizer.apply_gradients(zip(grads, test_layer.trainable_variables)) - - return output - - @tf.function - def fwd(): - test_layer(input_tensor) - - distribution.run(fwd) # Initialize variables - prev_variables = tf.identity_n(test_layer.trainable_variables) - replica_output = distribution.run(step_fn) - outputs = distribution.experimental_local_results(replica_output) - - # Assert variables values have changed values - for v0, v1 in zip(prev_variables, test_layer.trainable_variables): - self.assertNotAllEqual(v0, v1) - - # Assert forward pass shape - expected_output_shape = [bsz, h // strides, w // strides, filters] - for output in outputs: - self.assertEqual(expected_output_shape, output.shape.as_list()) - - @combinations.generate(distribution_strategy_combinations()) - def test_manual_gradients_correctness(self, distribution): - bsz, h, w, c = 8, 32, 32, 32 - filters = c - strides = 1 - - input_tensor = tf.random.uniform(shape=[bsz, h, w, c * 4]) # bottleneck - with distribution.scope(): - f_manual = nn_blocks.BottleneckResidualInner( - filters=filters // 2, strides=strides, batch_norm_first=False) - g_manual = nn_blocks.BottleneckResidualInner( - filters=filters // 2, strides=1, batch_norm_first=False) - manual_grad_layer = nn_blocks.ReversibleLayer(f_manual, g_manual) - manual_grad_layer(input_tensor, training=False) # init weights - - f_auto = nn_blocks.BottleneckResidualInner( - filters=filters // 2, strides=strides, batch_norm_first=False) - g_auto = nn_blocks.BottleneckResidualInner( - filters=filters // 2, strides=1, batch_norm_first=False) - auto_grad_layer = nn_blocks.ReversibleLayer( - f_auto, g_auto, manual_grads=False) - auto_grad_layer(input_tensor) # init weights - # Clone all weights (tf.keras.layers.Layer has no .clone()) - auto_grad_layer._f.set_weights(manual_grad_layer._f.get_weights()) - auto_grad_layer._g.set_weights(manual_grad_layer._g.get_weights()) - - 
@tf.function - def manual_fn(): - with tf.GradientTape() as tape: - output = manual_grad_layer(input_tensor, training=True) - grads = tape.gradient(output, manual_grad_layer.trainable_variables) - return grads - - @tf.function - def auto_fn(): - with tf.GradientTape() as tape: - output = auto_grad_layer(input_tensor, training=True) - grads = tape.gradient(output, auto_grad_layer.trainable_variables) - return grads - - manual_grads = distribution.run(manual_fn) - auto_grads = distribution.run(auto_fn) - - # Assert gradients calculated manually are close to that from autograd - for manual_grad, auto_grad in zip(manual_grads, auto_grads): - self.assertAllClose( - distribution.experimental_local_results(manual_grad), - distribution.experimental_local_results(auto_grad), - atol=5e-3, - rtol=5e-3) - - # Verify that BN moving mean and variance is correct. - for manual_var, auto_var in zip(manual_grad_layer.non_trainable_variables, - auto_grad_layer.non_trainable_variables): - self.assertAllClose(manual_var, auto_var) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/modeling/layers/nn_layers.py b/official/vision/beta/modeling/layers/nn_layers.py deleted file mode 100644 index 756c0e0cbe2a867feb39d559ea0c9ab18b2d243c..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/layers/nn_layers.py +++ /dev/null @@ -1,1277 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Contains common building blocks for neural networks.""" -from typing import Any, Callable, Dict, List, Mapping, Optional, Tuple, Union - -from absl import logging -import tensorflow as tf -import tensorflow_addons as tfa - -from official.modeling import tf_utils -from official.vision.beta.ops import spatial_transform_ops - - -# Type annotations. -States = Dict[str, tf.Tensor] -Activation = Union[str, Callable] - - -def make_divisible(value: float, - divisor: int, - min_value: Optional[float] = None, - round_down_protect: bool = True, - ) -> int: - """This is to ensure that all layers have channels that are divisible by 8. - - Args: - value: A `float` of original value. - divisor: An `int` of the divisor that need to be checked upon. - min_value: A `float` of minimum value threshold. - round_down_protect: A `bool` indicating whether round down more than 10% - will be allowed. - - Returns: - The adjusted value in `int` that is divisible against divisor. - """ - if min_value is None: - min_value = divisor - new_value = max(min_value, int(value + divisor / 2) // divisor * divisor) - # Make sure that round down does not go down by more than 10%. 
- if round_down_protect and new_value < 0.9 * value: - new_value += divisor - return int(new_value) - - -def round_filters(filters: int, - multiplier: float, - divisor: int = 8, - min_depth: Optional[int] = None, - round_down_protect: bool = True, - skip: bool = False) -> int: - """Rounds number of filters based on width multiplier.""" - orig_f = filters - if skip or not multiplier: - return filters - - new_filters = make_divisible(value=filters * multiplier, - divisor=divisor, - min_value=min_depth, - round_down_protect=round_down_protect) - - logging.info('round_filter input=%s output=%s', orig_f, new_filters) - return int(new_filters) - - -def get_padding_for_kernel_size(kernel_size): - """Compute padding size given kernel size.""" - if kernel_size == 7: - return (3, 3) - elif kernel_size == 3: - return (1, 1) - else: - raise ValueError('Padding for kernel size {} not known.'.format( - kernel_size)) - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class SqueezeExcitation(tf.keras.layers.Layer): - """Creates a squeeze and excitation layer.""" - - def __init__(self, - in_filters, - out_filters, - se_ratio, - divisible_by=1, - use_3d_input=False, - kernel_initializer='VarianceScaling', - kernel_regularizer=None, - bias_regularizer=None, - activation='relu', - gating_activation='sigmoid', - round_down_protect=True, - **kwargs): - """Initializes a squeeze and excitation layer. - - Args: - in_filters: An `int` number of filters of the input tensor. - out_filters: An `int` number of filters of the output tensor. - se_ratio: A `float` or None. If not None, se ratio for the squeeze and - excitation layer. - divisible_by: An `int` that ensures all inner dimensions are divisible by - this number. - use_3d_input: A `bool` of whether input is 2D or 3D image. - kernel_initializer: A `str` of kernel_initializer for convolutional - layers. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for - Conv2D. Default to None. 
- bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2d. - Default to None. - activation: A `str` name of the activation function. - gating_activation: A `str` name of the activation function for final - gating function. - round_down_protect: A `bool` of whether round down more than 10% will be - allowed. - **kwargs: Additional keyword arguments to be passed. - """ - super(SqueezeExcitation, self).__init__(**kwargs) - - self._in_filters = in_filters - self._out_filters = out_filters - self._se_ratio = se_ratio - self._divisible_by = divisible_by - self._round_down_protect = round_down_protect - self._use_3d_input = use_3d_input - self._activation = activation - self._gating_activation = gating_activation - self._kernel_initializer = kernel_initializer - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - if tf.keras.backend.image_data_format() == 'channels_last': - if not use_3d_input: - self._spatial_axis = [1, 2] - else: - self._spatial_axis = [1, 2, 3] - else: - if not use_3d_input: - self._spatial_axis = [2, 3] - else: - self._spatial_axis = [2, 3, 4] - self._activation_fn = tf_utils.get_activation(activation) - self._gating_activation_fn = tf_utils.get_activation(gating_activation) - - def build(self, input_shape): - num_reduced_filters = make_divisible( - max(1, int(self._in_filters * self._se_ratio)), - divisor=self._divisible_by, - round_down_protect=self._round_down_protect) - - self._se_reduce = tf.keras.layers.Conv2D( - filters=num_reduced_filters, - kernel_size=1, - strides=1, - padding='same', - use_bias=True, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - - self._se_expand = tf.keras.layers.Conv2D( - filters=self._out_filters, - kernel_size=1, - strides=1, - padding='same', - use_bias=True, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - 
bias_regularizer=self._bias_regularizer) - - super(SqueezeExcitation, self).build(input_shape) - - def get_config(self): - config = { - 'in_filters': self._in_filters, - 'out_filters': self._out_filters, - 'se_ratio': self._se_ratio, - 'divisible_by': self._divisible_by, - 'use_3d_input': self._use_3d_input, - 'kernel_initializer': self._kernel_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'bias_regularizer': self._bias_regularizer, - 'activation': self._activation, - 'gating_activation': self._gating_activation, - 'round_down_protect': self._round_down_protect, - } - base_config = super(SqueezeExcitation, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, inputs): - x = tf.reduce_mean(inputs, self._spatial_axis, keepdims=True) - x = self._activation_fn(self._se_reduce(x)) - x = self._gating_activation_fn(self._se_expand(x)) - return x * inputs - - -def get_stochastic_depth_rate(init_rate, i, n): - """Get drop connect rate for the ith block. - - Args: - init_rate: A `float` of initial drop rate. - i: An `int` of order of the current block. - n: An `int` total number of blocks. - - Returns: - Drop rate of the ith block. - """ - if init_rate is not None: - if init_rate < 0 or init_rate > 1: - raise ValueError('Initial drop rate must be within 0 and 1.') - rate = init_rate * float(i) / n - else: - rate = None - return rate - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class StochasticDepth(tf.keras.layers.Layer): - """Creates a stochastic depth layer.""" - - def __init__(self, stochastic_depth_drop_rate, **kwargs): - """Initializes a stochastic depth layer. - - Args: - stochastic_depth_drop_rate: A `float` of drop rate. - **kwargs: Additional keyword arguments to be passed. - - Returns: - A output `tf.Tensor` of which should have the same shape as input. 
- """ - super(StochasticDepth, self).__init__(**kwargs) - self._drop_rate = stochastic_depth_drop_rate - - def get_config(self): - config = {'drop_rate': self._drop_rate} - base_config = super(StochasticDepth, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, inputs, training=None): - if training is None: - training = tf.keras.backend.learning_phase() - if not training or self._drop_rate is None or self._drop_rate == 0: - return inputs - - keep_prob = 1.0 - self._drop_rate - batch_size = tf.shape(inputs)[0] - random_tensor = keep_prob - random_tensor += tf.random.uniform( - [batch_size] + [1] * (inputs.shape.rank - 1), dtype=inputs.dtype) - binary_tensor = tf.floor(random_tensor) - output = tf.math.divide(inputs, keep_prob) * binary_tensor - return output - - -@tf.keras.utils.register_keras_serializable(package='Vision') -def pyramid_feature_fusion(inputs, target_level): - """Fuses all feature maps in the feature pyramid at the target level. - - Args: - inputs: A dictionary containing the feature pyramid. The size of the input - tensor needs to be fixed. - target_level: An `int` of the target feature level for feature fusion. - - Returns: - A `float` `tf.Tensor` of shape [batch_size, feature_height, feature_width, - feature_channel]. - """ - # Convert keys to int. - pyramid_feats = {int(k): v for k, v in inputs.items()} - min_level = min(pyramid_feats.keys()) - max_level = max(pyramid_feats.keys()) - resampled_feats = [] - - for l in range(min_level, max_level + 1): - if l == target_level: - resampled_feats.append(pyramid_feats[l]) - else: - feat = pyramid_feats[l] - target_size = list(feat.shape[1:3]) - target_size[0] *= 2**(l - target_level) - target_size[1] *= 2**(l - target_level) - # Casts feat to float32 so the resize op can be run on TPU. 
- feat = tf.cast(feat, tf.float32) - feat = tf.image.resize( - feat, size=target_size, method=tf.image.ResizeMethod.BILINEAR) - # Casts it back to be compatible with the rest opetations. - feat = tf.cast(feat, pyramid_feats[l].dtype) - resampled_feats.append(feat) - - return tf.math.add_n(resampled_feats) - - -class PanopticFPNFusion(tf.keras.Model): - """Creates a Panoptic FPN feature Fusion layer. - - This implements feature fusion for semantic segmentation head from the paper: - Alexander Kirillov, Ross Girshick, Kaiming He and Piotr Dollar. - Panoptic Feature Pyramid Networks. - (https://arxiv.org/pdf/1901.02446.pdf) - """ - - def __init__( - self, - min_level: int = 2, - max_level: int = 5, - target_level: int = 2, - num_filters: int = 128, - num_fpn_filters: int = 256, - activation: str = 'relu', - kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - **kwargs): - - """Initializes panoptic FPN feature fusion layer. - - Args: - min_level: An `int` of minimum level to use in feature fusion. - max_level: An `int` of maximum level to use in feature fusion. - target_level: An `int` of the target feature level for feature fusion. - num_filters: An `int` number of filters in conv2d layers. - num_fpn_filters: An `int` number of filters in the FPN outputs - activation: A `str` name of the activation function. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for - Conv2D. Default is None. - bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. - **kwargs: Additional keyword arguments to be passed. - Returns: - A `float` `tf.Tensor` of shape [batch_size, feature_height, feature_width, - feature_channel]. 
- """ - if target_level > max_level: - raise ValueError('target_level should be less than max_level') - - self._config_dict = { - 'min_level': min_level, - 'max_level': max_level, - 'target_level': target_level, - 'num_filters': num_filters, - 'num_fpn_filters': num_fpn_filters, - 'activation': activation, - 'kernel_regularizer': kernel_regularizer, - 'bias_regularizer': bias_regularizer, - } - norm = tfa.layers.GroupNormalization - conv2d = tf.keras.layers.Conv2D - activation_fn = tf_utils.get_activation(activation) - if tf.keras.backend.image_data_format() == 'channels_last': - norm_axis = -1 - else: - norm_axis = 1 - inputs = self._build_inputs(num_fpn_filters, min_level, max_level) - - upscaled_features = [] - for level in range(min_level, max_level + 1): - num_conv_layers = max(1, level - target_level) - x = inputs[str(level)] - for i in range(num_conv_layers): - x = conv2d( - filters=num_filters, - kernel_size=3, - padding='same', - kernel_initializer=tf.keras.initializers.VarianceScaling(), - kernel_regularizer=kernel_regularizer, - bias_regularizer=bias_regularizer)(x) - x = norm(groups=32, axis=norm_axis)(x) - x = activation_fn(x) - if level != target_level: - x = spatial_transform_ops.nearest_upsampling(x, scale=2) - upscaled_features.append(x) - - fused_features = tf.math.add_n(upscaled_features) - self._output_specs = {str(target_level): fused_features.get_shape()} - - super(PanopticFPNFusion, self).__init__( - inputs=inputs, outputs=fused_features, **kwargs) - - def _build_inputs(self, num_filters: int, - min_level: int, max_level: int): - inputs = {} - for level in range(min_level, max_level + 1): - inputs[str(level)] = tf.keras.Input(shape=[None, None, num_filters]) - return inputs - - def get_config(self) -> Mapping[str, Any]: - return self._config_dict - - @classmethod - def from_config(cls, config, custom_objects=None): - return cls(**config) - - @property - def output_specs(self) -> Mapping[str, tf.TensorShape]: - """A dict of {level: 
TensorShape} pairs for the model output.""" - return self._output_specs - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class Scale(tf.keras.layers.Layer): - """Scales the input by a trainable scalar weight. - - This is useful for applying ReZero to layers, which improves convergence - speed. This implements the paper: - ReZero is All You Need: Fast Convergence at Large Depth. - (https://arxiv.org/pdf/2003.04887.pdf). - """ - - def __init__( - self, - initializer: tf.keras.initializers.Initializer = 'ones', - regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - **kwargs): - """Initializes a scale layer. - - Args: - initializer: A `str` of initializer for the scalar weight. - regularizer: A `tf.keras.regularizers.Regularizer` for the scalar weight. - **kwargs: Additional keyword arguments to be passed to this layer. - - Returns: - An `tf.Tensor` of which should have the same shape as input. - """ - super(Scale, self).__init__(**kwargs) - - self._initializer = initializer - self._regularizer = regularizer - - self._scale = self.add_weight( - name='scale', - shape=[], - dtype=self.dtype, - initializer=self._initializer, - regularizer=self._regularizer, - trainable=True) - - def get_config(self): - """Returns a dictionary containing the config used for initialization.""" - config = { - 'initializer': self._initializer, - 'regularizer': self._regularizer, - } - base_config = super(Scale, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, inputs): - """Calls the layer with the given inputs.""" - scale = tf.cast(self._scale, inputs.dtype) - return scale * inputs - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class TemporalSoftmaxPool(tf.keras.layers.Layer): - """Creates a network layer corresponding to temporal softmax pooling. - - This is useful for multi-class logits (used in e.g., Charades). Modified from - AssembleNet Charades evaluation from: - - Michael S. 
Ryoo, AJ Piergiovanni, Mingxing Tan, Anelia Angelova. - AssembleNet: Searching for Multi-Stream Neural Connectivity in Video - Architectures. - (https://arxiv.org/pdf/1905.13209.pdf). - """ - - def call(self, inputs): - """Calls the layer with the given inputs.""" - assert inputs.shape.rank in (3, 4, 5) - frames = tf.shape(inputs)[1] - pre_logits = inputs / tf.sqrt(tf.cast(frames, inputs.dtype)) - activations = tf.nn.softmax(pre_logits, axis=1) - outputs = inputs * activations - return outputs - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class PositionalEncoding(tf.keras.layers.Layer): - """Creates a network layer that adds a sinusoidal positional encoding. - - Positional encoding is incremented across frames, and is added to the input. - The positional encoding is first weighted at 0 so that the network can choose - to ignore it. This implements: - - Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, - Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. - Attention Is All You Need. - (https://arxiv.org/pdf/1706.03762.pdf). - """ - - def __init__(self, - initializer: tf.keras.initializers.Initializer = 'zeros', - cache_encoding: bool = False, - state_prefix: Optional[str] = None, - **kwargs): - """Initializes positional encoding. - - Args: - initializer: A `str` of initializer for weighting the positional encoding. - cache_encoding: A `bool`. If True, cache the positional encoding tensor - after calling build. Otherwise, rebuild the tensor for every call. - Setting this to False can be useful when we want to input a variable - number of frames, so the positional encoding tensor can change shape. - state_prefix: a prefix string to identify states. - **kwargs: Additional keyword arguments to be passed to this layer. - - Returns: - A `tf.Tensor` of which should have the same shape as input. 
- """ - super(PositionalEncoding, self).__init__(**kwargs) - self._initializer = initializer - self._cache_encoding = cache_encoding - self._pos_encoding = None - self._rezero = Scale(initializer=initializer, name='rezero') - state_prefix = state_prefix if state_prefix is not None else '' - self._state_prefix = state_prefix - self._frame_count_name = f'{state_prefix}_pos_enc_frame_count' - - def get_config(self): - """Returns a dictionary containing the config used for initialization.""" - config = { - 'initializer': self._initializer, - 'cache_encoding': self._cache_encoding, - 'state_prefix': self._state_prefix, - } - base_config = super(PositionalEncoding, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def _positional_encoding(self, - num_positions: Union[int, tf.Tensor], - hidden_size: Union[int, tf.Tensor], - start_position: Union[int, tf.Tensor] = 0, - dtype: str = 'float32') -> tf.Tensor: - """Creates a sequence of sinusoidal positional encoding vectors. - - Args: - num_positions: the total number of positions (frames). - hidden_size: the number of channels used for the hidden vectors. - start_position: the start position. - dtype: the dtype of the output tensor. - - Returns: - The positional encoding tensor with shape [num_positions, hidden_size]. - """ - if isinstance(start_position, tf.Tensor) and start_position.shape.rank == 1: - start_position = start_position[0] - - # Calling `tf.range` with `dtype=tf.bfloat16` results in an error, - # so we cast afterward. - positions = tf.range(start_position, start_position + num_positions) - positions = tf.cast(positions, dtype)[:, tf.newaxis] - idx = tf.range(hidden_size)[tf.newaxis, :] - - power = tf.cast(2 * (idx // 2), dtype) - power /= tf.cast(hidden_size, dtype) - angles = 1. 
/ tf.math.pow(10_000., power) - radians = positions * angles - - sin = tf.math.sin(radians[:, 0::2]) - cos = tf.math.cos(radians[:, 1::2]) - pos_encoding = tf.concat([sin, cos], axis=-1) - - return pos_encoding - - def _get_pos_encoding(self, - input_shape: tf.Tensor, - frame_count: int = 0) -> tf.Tensor: - """Calculates the positional encoding from the input shape. - - Args: - input_shape: the shape of the input. - frame_count: a count of frames that indicates the index of the first - frame. - - Returns: - The positional encoding tensor with shape [num_positions, hidden_size]. - - """ - frames = input_shape[1] - channels = input_shape[-1] - pos_encoding = self._positional_encoding( - frames, channels, start_position=frame_count, dtype=self.dtype) - pos_encoding = tf.reshape(pos_encoding, [1, frames, 1, 1, channels]) - return pos_encoding - - def build(self, input_shape): - """Builds the layer with the given input shape. - - Args: - input_shape: The input shape. - - Raises: - ValueError: If using 'channels_first' data format. - """ - if tf.keras.backend.image_data_format() == 'channels_first': - raise ValueError('"channels_first" mode is unsupported.') - - if self._cache_encoding: - self._pos_encoding = self._get_pos_encoding(input_shape) - - super(PositionalEncoding, self).build(input_shape) - - def call( - self, - inputs: tf.Tensor, - states: Optional[States] = None, - output_states: bool = True, - ) -> Union[tf.Tensor, Tuple[tf.Tensor, States]]: - """Calls the layer with the given inputs. - - Args: - inputs: An input `tf.Tensor`. - states: A `dict` of states such that, if any of the keys match for this - layer, will overwrite the contents of the buffer(s). Expected keys - include `state_prefix + '_pos_enc_frame_count'`. - output_states: A `bool`. If True, returns the output tensor and output - states. Returns just the output tensor otherwise. - - Returns: - An output `tf.Tensor` (and optionally the states if `output_states=True`). 
- - Raises: - ValueError: If using 'channels_first' data format. - """ - states = dict(states) if states is not None else {} - - # Keep a count of frames encountered across input iterations in - # num_frames to be able to accurately update the positional encoding. - num_frames = tf.shape(inputs)[1] - frame_count = tf.cast(states.get(self._frame_count_name, [0]), tf.int32) - states[self._frame_count_name] = frame_count + num_frames - - if self._cache_encoding: - pos_encoding = self._pos_encoding - else: - pos_encoding = self._get_pos_encoding( - tf.shape(inputs), frame_count=frame_count) - pos_encoding = tf.cast(pos_encoding, inputs.dtype) - pos_encoding = self._rezero(pos_encoding) - outputs = inputs + pos_encoding - - return (outputs, states) if output_states else outputs - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class GlobalAveragePool3D(tf.keras.layers.Layer): - """Creates a global average pooling layer with causal mode. - - Implements causal mode, which runs a cumulative sum (with `tf.cumsum`) across - frames in the time dimension, allowing the use of a stream buffer. Sums any - valid input state with the current input to allow state to accumulate over - several iterations. - """ - - def __init__(self, - keepdims: bool = False, - causal: bool = False, - state_prefix: Optional[str] = None, - **kwargs): - """Initializes a global average pool layer. - - Args: - keepdims: A `bool`. If True, keep the averaged dimensions. - causal: A `bool` of whether to run in causal mode with a cumulative sum - across frames. - state_prefix: a prefix string to identify states. - **kwargs: Additional keyword arguments to be passed to this layer. - - Returns: - An output `tf.Tensor`. 
- """ - super(GlobalAveragePool3D, self).__init__(**kwargs) - - self._keepdims = keepdims - self._causal = causal - state_prefix = state_prefix if state_prefix is not None else '' - self._state_prefix = state_prefix - - self._state_name = f'{state_prefix}_pool_buffer' - self._frame_count_name = f'{state_prefix}_pool_frame_count' - - def get_config(self): - """Returns a dictionary containing the config used for initialization.""" - config = { - 'keepdims': self._keepdims, - 'causal': self._causal, - 'state_prefix': self._state_prefix, - } - base_config = super(GlobalAveragePool3D, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, - inputs: tf.Tensor, - states: Optional[States] = None, - output_states: bool = True - ) -> Union[tf.Tensor, Tuple[tf.Tensor, States]]: - """Calls the layer with the given inputs. - - Args: - inputs: An input `tf.Tensor`. - states: A `dict` of states such that, if any of the keys match for this - layer, will overwrite the contents of the buffer(s). - Expected keys include `state_prefix + '__pool_buffer'` and - `state_prefix + '__pool_frame_count'`. - output_states: A `bool`. If True, returns the output tensor and output - states. Returns just the output tensor otherwise. - - Returns: - An output `tf.Tensor` (and optionally the states if `output_states=True`). - If `causal=True`, the output tensor will have shape - `[batch_size, num_frames, 1, 1, channels]` if `keepdims=True`. We keep - the frame dimension in this case to simulate a cumulative global average - as if we are inputting one frame at a time. If `causal=False`, the output - is equivalent to `tf.keras.layers.GlobalAveragePooling3D` with shape - `[batch_size, 1, 1, 1, channels]` if `keepdims=True` (plus the optional - buffer stored in `states`). - - Raises: - ValueError: If using 'channels_first' data format. 
- """ - states = dict(states) if states is not None else {} - - if tf.keras.backend.image_data_format() == 'channels_first': - raise ValueError('"channels_first" mode is unsupported.') - - # Shape: [batch_size, 1, 1, 1, channels] - buffer = states.get(self._state_name, None) - if buffer is None: - buffer = tf.zeros_like(inputs[:, :1, :1, :1], dtype=inputs.dtype) - states[self._state_name] = buffer - - # Keep a count of frames encountered across input iterations in - # num_frames to be able to accurately take a cumulative average across - # all frames when running in streaming mode - num_frames = tf.shape(inputs)[1] - frame_count = states.get(self._frame_count_name, tf.constant([0])) - frame_count = tf.cast(frame_count, tf.int32) - states[self._frame_count_name] = frame_count + num_frames - - if self._causal: - # Take a mean of spatial dimensions to make computation more efficient. - x = tf.reduce_mean(inputs, axis=[2, 3], keepdims=True) - x = tf.cumsum(x, axis=1) - x = x + buffer - - # The last frame will be the value of the next state - # Shape: [batch_size, 1, 1, 1, channels] - states[self._state_name] = x[:, -1:] - - # In causal mode, the divisor increments by 1 for every frame to - # calculate cumulative averages instead of one global average - mean_divisors = tf.range(num_frames) + frame_count + 1 - mean_divisors = tf.reshape(mean_divisors, [1, num_frames, 1, 1, 1]) - mean_divisors = tf.cast(mean_divisors, x.dtype) - - # Shape: [batch_size, num_frames, 1, 1, channels] - x = x / mean_divisors - else: - # In non-causal mode, we (optionally) sum across frames to take a - # cumulative average across input iterations rather than individual - # frames. If no buffer state is passed, this essentially becomes - # regular global average pooling. 
- # Shape: [batch_size, 1, 1, 1, channels] - x = tf.reduce_sum(inputs, axis=(1, 2, 3), keepdims=True) - x = x / tf.cast(tf.shape(inputs)[2] * tf.shape(inputs)[3], x.dtype) - x = x + buffer - - # Shape: [batch_size, 1, 1, 1, channels] - states[self._state_name] = x - - x = x / tf.cast(frame_count + num_frames, x.dtype) - - if not self._keepdims: - x = tf.squeeze(x, axis=(1, 2, 3)) - - return (x, states) if output_states else x - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class SpatialAveragePool3D(tf.keras.layers.Layer): - """Creates a global average pooling layer pooling across spatial dimentions.""" - - def __init__(self, keepdims: bool = False, **kwargs): - """Initializes a global average pool layer. - - Args: - keepdims: A `bool`. If True, keep the averaged dimensions. - **kwargs: Additional keyword arguments to be passed to this layer. - - Returns: - An output `tf.Tensor`. - """ - super(SpatialAveragePool3D, self).__init__(**kwargs) - self._keepdims = keepdims - - def get_config(self): - """Returns a dictionary containing the config used for initialization.""" - config = { - 'keepdims': self._keepdims, - } - base_config = super(SpatialAveragePool3D, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def build(self, input_shape): - """Builds the layer with the given input shape.""" - if tf.keras.backend.image_data_format() == 'channels_first': - raise ValueError('"channels_first" mode is unsupported.') - - super(SpatialAveragePool3D, self).build(input_shape) - - def call(self, inputs): - """Calls the layer with the given inputs.""" - if inputs.shape.rank != 5: - raise ValueError( - 'Input should have rank {}, got {}'.format(5, inputs.shape.rank)) - - return tf.reduce_mean(inputs, axis=(2, 3), keepdims=self._keepdims) - - -class CausalConvMixin: - """Mixin class to implement CausalConv for `tf.keras.layers.Conv` layers.""" - - @property - def use_buffered_input(self) -> bool: - return 
self._use_buffered_input - - @use_buffered_input.setter - def use_buffered_input(self, variable: bool): - self._use_buffered_input = variable - - def _compute_buffered_causal_padding(self, - inputs: tf.Tensor, - use_buffered_input: bool = False, - time_axis: int = 1, - ) -> List[List[int]]: - """Calculates padding for 'causal' option for conv layers. - - Args: - inputs: An optional input `tf.Tensor` to be padded. - use_buffered_input: A `bool`. If True, use 'valid' padding along the time - dimension. This should be set when applying the stream buffer. - time_axis: An `int` of the axis of the time dimension. - - Returns: - A list of paddings for `tf.pad`. - """ - input_shape = tf.shape(inputs)[1:-1] - - if tf.keras.backend.image_data_format() == 'channels_first': - raise ValueError('"channels_first" mode is unsupported.') - - kernel_size_effective = [ - (self.kernel_size[i] + - (self.kernel_size[i] - 1) * (self.dilation_rate[i] - 1)) - for i in range(self.rank) - ] - pad_total = [kernel_size_effective[0] - 1] - for i in range(1, self.rank): - overlap = (input_shape[i] - 1) % self.strides[i] + 1 - pad_total.append(tf.maximum(kernel_size_effective[i] - overlap, 0)) - pad_beg = [pad_total[i] // 2 for i in range(self.rank)] - pad_end = [pad_total[i] - pad_beg[i] for i in range(self.rank)] - padding = [[pad_beg[i], pad_end[i]] for i in range(self.rank)] - padding = [[0, 0]] + padding + [[0, 0]] - - if use_buffered_input: - padding[time_axis] = [0, 0] - else: - padding[time_axis] = [padding[time_axis][0] + padding[time_axis][1], 0] - return padding - - def _causal_validate_init(self): - """Validates the Conv layer initial configuration.""" - # Overriding this method is meant to circumvent unnecessary errors when - # using causal padding. - if (self.filters is not None - and self.filters % self.groups != 0): - raise ValueError( - 'The number of filters must be evenly divisible by the number of ' - 'groups. 
Received: groups={}, filters={}'.format( - self.groups, self.filters)) - - if not all(self.kernel_size): - raise ValueError('The argument `kernel_size` cannot contain 0(s). ' - 'Received: %s' % (self.kernel_size,)) - - def _buffered_spatial_output_shape(self, spatial_output_shape: List[int]): - """Computes the spatial output shape from the input shape.""" - # When buffer padding, use 'valid' padding across time. The output shape - # across time should be the input shape minus any padding, assuming - # the stride across time is 1. - if self._use_buffered_input and spatial_output_shape[0] is not None: - padding = self._compute_buffered_causal_padding( - tf.zeros([1] + spatial_output_shape + [1]), use_buffered_input=False) - spatial_output_shape[0] -= sum(padding[1]) - return spatial_output_shape - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class Conv2D(tf.keras.layers.Conv2D, CausalConvMixin): - """Conv2D layer supporting CausalConv. - - Supports `padding='causal'` option (like in `tf.keras.layers.Conv1D`), - which applies causal padding to the temporal dimension, and same padding in - the spatial dimensions. - """ - - def __init__(self, *args, use_buffered_input=False, **kwargs): - """Initializes conv2d. - - Args: - *args: Arguments to be passed. - use_buffered_input: A `bool`. If True, the input is expected to be padded - beforehand. In effect, calling this layer will use 'valid' padding on - the temporal dimension to simulate 'causal' padding. - **kwargs: Additional keyword arguments to be passed. - - Returns: - An output `tf.Tensor` of the Conv2D operation. 
- """ - super(Conv2D, self).__init__(*args, **kwargs) - self._use_buffered_input = use_buffered_input - - def get_config(self): - """Returns a dictionary containing the config used for initialization.""" - config = { - 'use_buffered_input': self._use_buffered_input, - } - base_config = super(Conv2D, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def _compute_causal_padding(self, inputs): - """Computes causal padding dimensions for the given inputs.""" - return self._compute_buffered_causal_padding( - inputs, use_buffered_input=self._use_buffered_input) - - def _validate_init(self): - """Validates the Conv layer initial configuration.""" - self._causal_validate_init() - - def _spatial_output_shape(self, spatial_input_shape: List[int]): - """Computes the spatial output shape from the input shape.""" - shape = super(Conv2D, self)._spatial_output_shape(spatial_input_shape) - return self._buffered_spatial_output_shape(shape) - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class DepthwiseConv2D(tf.keras.layers.DepthwiseConv2D, CausalConvMixin): - """DepthwiseConv2D layer supporting CausalConv. - - Supports `padding='causal'` option (like in `tf.keras.layers.Conv1D`), - which applies causal padding to the temporal dimension, and same padding in - the spatial dimensions. - """ - - def __init__(self, *args, use_buffered_input=False, **kwargs): - """Initializes depthwise conv2d. - - Args: - *args: Arguments to be passed. - use_buffered_input: A `bool`. If True, the input is expected to be padded - beforehand. In effect, calling this layer will use 'valid' padding on - the temporal dimension to simulate 'causal' padding. - **kwargs: Additional keyword arguments to be passed. - - Returns: - An output `tf.Tensor` of the DepthwiseConv2D operation. 
- """ - super(DepthwiseConv2D, self).__init__(*args, **kwargs) - self._use_buffered_input = use_buffered_input - - # Causal padding is unsupported by default for DepthwiseConv2D, - # so we resort to valid padding internally. However, we handle - # causal padding as a special case with `self._is_causal`, which is - # defined by the super class. - if self.padding == 'causal': - self.padding = 'valid' - - def get_config(self): - """Returns a dictionary containing the config used for initialization.""" - config = { - 'use_buffered_input': self._use_buffered_input, - } - base_config = super(DepthwiseConv2D, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, inputs): - """Calls the layer with the given inputs.""" - if self._is_causal: - inputs = tf.pad(inputs, self._compute_causal_padding(inputs)) - return super(DepthwiseConv2D, self).call(inputs) - - def _compute_causal_padding(self, inputs): - """Computes causal padding dimensions for the given inputs.""" - return self._compute_buffered_causal_padding( - inputs, use_buffered_input=self._use_buffered_input) - - def _validate_init(self): - """Validates the Conv layer initial configuration.""" - self._causal_validate_init() - - def _spatial_output_shape(self, spatial_input_shape: List[int]): - """Computes the spatial output shape from the input shape.""" - shape = super(DepthwiseConv2D, self)._spatial_output_shape( - spatial_input_shape) - return self._buffered_spatial_output_shape(shape) - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class Conv3D(tf.keras.layers.Conv3D, CausalConvMixin): - """Conv3D layer supporting CausalConv. - - Supports `padding='causal'` option (like in `tf.keras.layers.Conv1D`), - which applies causal padding to the temporal dimension, and same padding in - the spatial dimensions. - """ - - def __init__(self, *args, use_buffered_input=False, **kwargs): - """Initializes conv3d. - - Args: - *args: Arguments to be passed. 
- use_buffered_input: A `bool`. If True, the input is expected to be padded - beforehand. In effect, calling this layer will use 'valid' padding on - the temporal dimension to simulate 'causal' padding. - **kwargs: Additional keyword arguments to be passed. - - Returns: - An output `tf.Tensor` of the Conv3D operation. - """ - super(Conv3D, self).__init__(*args, **kwargs) - self._use_buffered_input = use_buffered_input - - def get_config(self): - """Returns a dictionary containing the config used for initialization.""" - config = { - 'use_buffered_input': self._use_buffered_input, - } - base_config = super(Conv3D, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, inputs): - """Call the layer with the given inputs.""" - # Note: tf.nn.conv3d with depthwise kernels on CPU is currently only - # supported when compiling with TF graph (XLA) using tf.function, so it - # is compiled by default here (b/186463870). - conv_fn = tf.function(super(Conv3D, self).call, jit_compile=True) - return conv_fn(inputs) - - def _compute_causal_padding(self, inputs): - """Computes causal padding dimensions for the given inputs.""" - return self._compute_buffered_causal_padding( - inputs, use_buffered_input=self._use_buffered_input) - - def _validate_init(self): - """Validates the Conv layer initial configuration.""" - self._causal_validate_init() - - def _spatial_output_shape(self, spatial_input_shape: List[int]): - """Computes the spatial output shape from the input shape.""" - shape = super(Conv3D, self)._spatial_output_shape(spatial_input_shape) - return self._buffered_spatial_output_shape(shape) - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class SpatialPyramidPooling(tf.keras.layers.Layer): - """Implements the Atrous Spatial Pyramid Pooling. 
- - References: - [Rethinking Atrous Convolution for Semantic Image Segmentation]( - https://arxiv.org/pdf/1706.05587.pdf) - [Encoder-Decoder with Atrous Separable Convolution for Semantic Image - Segmentation](https://arxiv.org/pdf/1802.02611.pdf) - """ - - def __init__( - self, - output_channels: int, - dilation_rates: List[int], - pool_kernel_size: Optional[List[int]] = None, - use_sync_bn: bool = False, - batchnorm_momentum: float = 0.99, - batchnorm_epsilon: float = 0.001, - activation: str = 'relu', - dropout: float = 0.5, - kernel_initializer: str = 'GlorotUniform', - kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, - interpolation: str = 'bilinear', - use_depthwise_convolution: bool = False, - **kwargs): - """Initializes `SpatialPyramidPooling`. - - Args: - output_channels: Number of channels produced by SpatialPyramidPooling. - dilation_rates: A list of integers for parallel dilated conv. - pool_kernel_size: A list of integers or None. If None, global average - pooling is applied, otherwise an average pooling of pool_kernel_size is - applied. - use_sync_bn: A bool, whether or not to use sync batch normalization. - batchnorm_momentum: A float for the momentum in BatchNorm. Defaults to - 0.99. - batchnorm_epsilon: A float for the epsilon value in BatchNorm. Defaults to - 0.001. - activation: A `str` for type of activation to be used. Defaults to 'relu'. - dropout: A float for the dropout rate before output. Defaults to 0.5. - kernel_initializer: Kernel initializer for conv layers. Defaults to - `glorot_uniform`. - kernel_regularizer: Kernel regularizer for conv layers. Defaults to None. - interpolation: The interpolation method for upsampling. Defaults to - `bilinear`. - use_depthwise_convolution: Allows spatial pooling to be separable - depthwise convolusions. [Encoder-Decoder with Atrous Separable - Convolution for Semantic Image Segmentation]( - https://arxiv.org/pdf/1802.02611.pdf) - **kwargs: Other keyword arguments for the layer. 
- """ - super().__init__(**kwargs) - - self._output_channels = output_channels - self._dilation_rates = dilation_rates - self._use_sync_bn = use_sync_bn - self._batchnorm_momentum = batchnorm_momentum - self._batchnorm_epsilon = batchnorm_epsilon - self._activation = activation - self._dropout = dropout - self._kernel_initializer = kernel_initializer - self._kernel_regularizer = kernel_regularizer - self._interpolation = interpolation - self._pool_kernel_size = pool_kernel_size - self._use_depthwise_convolution = use_depthwise_convolution - self._activation_fn = tf_utils.get_activation(activation) - if self._use_sync_bn: - self._bn_op = tf.keras.layers.experimental.SyncBatchNormalization - else: - self._bn_op = tf.keras.layers.BatchNormalization - - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - - def build(self, input_shape): - height = input_shape[1] - width = input_shape[2] - channels = input_shape[3] - - self.aspp_layers = [] - - conv1 = tf.keras.layers.Conv2D( - filters=self._output_channels, - kernel_size=(1, 1), - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - use_bias=False) - norm1 = self._bn_op( - axis=self._bn_axis, - momentum=self._batchnorm_momentum, - epsilon=self._batchnorm_epsilon) - - self.aspp_layers.append([conv1, norm1]) - - for dilation_rate in self._dilation_rates: - leading_layers = [] - kernel_size = (3, 3) - if self._use_depthwise_convolution: - leading_layers += [ - tf.keras.layers.DepthwiseConv2D( - depth_multiplier=1, - kernel_size=kernel_size, - padding='same', - depthwise_regularizer=self._kernel_regularizer, - depthwise_initializer=self._kernel_initializer, - dilation_rate=dilation_rate, - use_bias=False) - ] - kernel_size = (1, 1) - conv_dilation = leading_layers + [ - tf.keras.layers.Conv2D( - filters=self._output_channels, - kernel_size=kernel_size, - padding='same', - kernel_regularizer=self._kernel_regularizer, - 
kernel_initializer=self._kernel_initializer, - dilation_rate=dilation_rate, - use_bias=False) - ] - norm_dilation = self._bn_op( - axis=self._bn_axis, - momentum=self._batchnorm_momentum, - epsilon=self._batchnorm_epsilon) - - self.aspp_layers.append(conv_dilation + [norm_dilation]) - - if self._pool_kernel_size is None: - pooling = [ - tf.keras.layers.GlobalAveragePooling2D(), - tf.keras.layers.Reshape((1, 1, channels)) - ] - else: - pooling = [tf.keras.layers.AveragePooling2D(self._pool_kernel_size)] - - conv2 = tf.keras.layers.Conv2D( - filters=self._output_channels, - kernel_size=(1, 1), - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - use_bias=False) - norm2 = self._bn_op( - axis=self._bn_axis, - momentum=self._batchnorm_momentum, - epsilon=self._batchnorm_epsilon) - - self.aspp_layers.append(pooling + [conv2, norm2]) - - self._resizing_layer = tf.keras.layers.Resizing( - height, width, interpolation=self._interpolation, dtype=tf.float32) - - self._projection = [ - tf.keras.layers.Conv2D( - filters=self._output_channels, - kernel_size=(1, 1), - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - use_bias=False), - self._bn_op( - axis=self._bn_axis, - momentum=self._batchnorm_momentum, - epsilon=self._batchnorm_epsilon) - ] - self._dropout_layer = tf.keras.layers.Dropout(rate=self._dropout) - self._concat_layer = tf.keras.layers.Concatenate(axis=-1) - - def call(self, - inputs: tf.Tensor, - training: Optional[bool] = None) -> tf.Tensor: - if training is None: - training = tf.keras.backend.learning_phase() - result = [] - for i, layers in enumerate(self.aspp_layers): - x = inputs - for layer in layers: - # Apply layers sequentially. - x = layer(x, training=training) - x = self._activation_fn(x) - - # Apply resize layer to the end of the last set of layers. 
- if i == len(self.aspp_layers) - 1: - x = self._resizing_layer(x) - - result.append(tf.cast(x, inputs.dtype)) - x = self._concat_layer(result) - for layer in self._projection: - x = layer(x, training=training) - x = self._activation_fn(x) - return self._dropout_layer(x) - - def get_config(self): - config = { - 'output_channels': self._output_channels, - 'dilation_rates': self._dilation_rates, - 'pool_kernel_size': self._pool_kernel_size, - 'use_sync_bn': self._use_sync_bn, - 'batchnorm_momentum': self._batchnorm_momentum, - 'batchnorm_epsilon': self._batchnorm_epsilon, - 'activation': self._activation, - 'dropout': self._dropout, - 'kernel_initializer': self._kernel_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'interpolation': self._interpolation, - } - base_config = super().get_config() - return dict(list(base_config.items()) + list(config.items())) diff --git a/official/vision/beta/modeling/layers/nn_layers_test.py b/official/vision/beta/modeling/layers/nn_layers_test.py deleted file mode 100644 index 6cc484ce56ad858fbe6db6b1ce4eb8b6f703b805..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/layers/nn_layers_test.py +++ /dev/null @@ -1,419 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Tests for nn_layers.""" - -# Import libraries -from absl.testing import parameterized -import tensorflow as tf - -from official.vision.beta.modeling.layers import nn_layers - - -class NNLayersTest(parameterized.TestCase, tf.test.TestCase): - - def test_scale(self): - scale = nn_layers.Scale(initializer=tf.keras.initializers.constant(10.)) - output = scale(3.) - self.assertAllEqual(output, 30.) - - def test_temporal_softmax_pool(self): - inputs = tf.range(4, dtype=tf.float32) + 1. - inputs = tf.reshape(inputs, [1, 4, 1, 1, 1]) - layer = nn_layers.TemporalSoftmaxPool() - output = layer(inputs) - self.assertAllClose( - output, - [[[[[0.10153633]]], - [[[0.33481020]]], - [[[0.82801306]]], - [[[1.82021690]]]]]) - - def test_positional_encoding(self): - pos_encoding = nn_layers.PositionalEncoding( - initializer='ones', cache_encoding=False) - pos_encoding_cached = nn_layers.PositionalEncoding( - initializer='ones', cache_encoding=True) - - inputs = tf.ones([1, 4, 1, 1, 3]) - outputs, _ = pos_encoding(inputs) - outputs_cached, _ = pos_encoding_cached(inputs) - - expected = tf.constant( - [[[[[1.0000000, 1.0000000, 2.0000000]]], - [[[1.8414710, 1.0021545, 1.5403023]]], - [[[1.9092975, 1.0043088, 0.5838531]]], - [[[1.1411200, 1.0064633, 0.0100075]]]]]) - - self.assertEqual(outputs.shape, expected.shape) - self.assertAllClose(outputs, expected) - - self.assertEqual(outputs.shape, outputs_cached.shape) - self.assertAllClose(outputs, outputs_cached) - - inputs = tf.ones([1, 5, 1, 1, 3]) - _ = pos_encoding(inputs) - - def test_positional_encoding_bfloat16(self): - pos_encoding = nn_layers.PositionalEncoding(initializer='ones') - - inputs = tf.ones([1, 4, 1, 1, 3], dtype=tf.bfloat16) - outputs, _ = pos_encoding(inputs) - - expected = tf.constant( - [[[[[1.0000000, 1.0000000, 2.0000000]]], - [[[1.8414710, 1.0021545, 1.5403023]]], - [[[1.9092975, 1.0043088, 0.5838531]]], - [[[1.1411200, 1.0064633, 0.0100075]]]]]) - - self.assertEqual(outputs.shape, 
expected.shape) - self.assertAllClose(outputs, expected) - - def test_global_average_pool_basic(self): - pool = nn_layers.GlobalAveragePool3D(keepdims=True) - - inputs = tf.ones([1, 2, 3, 4, 1]) - outputs = pool(inputs, output_states=False) - - expected = tf.ones([1, 1, 1, 1, 1]) - - self.assertEqual(outputs.shape, expected.shape) - self.assertAllEqual(outputs, expected) - - def test_positional_encoding_stream(self): - pos_encoding = nn_layers.PositionalEncoding( - initializer='ones', cache_encoding=False) - - inputs = tf.range(4, dtype=tf.float32) + 1. - inputs = tf.reshape(inputs, [1, 4, 1, 1, 1]) - inputs = tf.tile(inputs, [1, 1, 1, 1, 3]) - expected, _ = pos_encoding(inputs) - - for num_splits in [1, 2, 4]: - frames = tf.split(inputs, num_splits, axis=1) - states = {} - predicted = [] - for frame in frames: - output, states = pos_encoding(frame, states=states) - predicted.append(output) - predicted = tf.concat(predicted, axis=1) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected) - self.assertAllClose(predicted, [[[[[1.0000000, 1.0000000, 2.0000000]]], - [[[2.8414710, 2.0021544, 2.5403023]]], - [[[3.9092975, 3.0043090, 2.5838532]]], - [[[4.1411200, 4.0064630, 3.0100074]]]]]) - - def test_global_average_pool_keras(self): - pool = nn_layers.GlobalAveragePool3D(keepdims=False) - keras_pool = tf.keras.layers.GlobalAveragePooling3D() - - inputs = 10 * tf.random.normal([1, 2, 3, 4, 1]) - - outputs = pool(inputs, output_states=False) - keras_output = keras_pool(inputs) - - self.assertAllEqual(outputs.shape, keras_output.shape) - self.assertAllClose(outputs, keras_output) - - def test_stream_global_average_pool(self): - gap = nn_layers.GlobalAveragePool3D(keepdims=True, causal=False) - - inputs = tf.range(4, dtype=tf.float32) + 1. 
- inputs = tf.reshape(inputs, [1, 4, 1, 1, 1]) - inputs = tf.tile(inputs, [1, 1, 2, 2, 3]) - expected, _ = gap(inputs) - - for num_splits in [1, 2, 4]: - frames = tf.split(inputs, num_splits, axis=1) - states = {} - predicted = None - for frame in frames: - predicted, states = gap(frame, states=states) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected) - self.assertAllClose( - predicted, - [[[[[2.5, 2.5, 2.5]]]]]) - - def test_causal_stream_global_average_pool(self): - gap = nn_layers.GlobalAveragePool3D(keepdims=True, causal=True) - - inputs = tf.range(4, dtype=tf.float32) + 1. - inputs = tf.reshape(inputs, [1, 4, 1, 1, 1]) - inputs = tf.tile(inputs, [1, 1, 2, 2, 3]) - expected, _ = gap(inputs) - - for num_splits in [1, 2, 4]: - frames = tf.split(inputs, num_splits, axis=1) - states = {} - predicted = [] - for frame in frames: - x, states = gap(frame, states=states) - predicted.append(x) - predicted = tf.concat(predicted, axis=1) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected) - self.assertAllClose( - predicted, - [[[[[1.0, 1.0, 1.0]]], - [[[1.5, 1.5, 1.5]]], - [[[2.0, 2.0, 2.0]]], - [[[2.5, 2.5, 2.5]]]]]) - - def test_spatial_average_pool(self): - pool = nn_layers.SpatialAveragePool3D(keepdims=True) - - inputs = tf.range(64, dtype=tf.float32) + 1. 
- inputs = tf.reshape(inputs, [1, 4, 4, 4, 1]) - - output = pool(inputs) - - self.assertEqual(output.shape, [1, 4, 1, 1, 1]) - self.assertAllClose( - output, - [[[[[8.50]]], - [[[24.5]]], - [[[40.5]]], - [[[56.5]]]]]) - - def test_conv2d_causal(self): - conv2d = nn_layers.Conv2D( - filters=3, - kernel_size=(3, 3), - strides=(1, 2), - padding='causal', - use_buffered_input=True, - kernel_initializer='ones', - use_bias=False, - ) - - inputs = tf.ones([1, 4, 2, 3]) - - paddings = [[0, 0], [2, 0], [0, 0], [0, 0]] - padded_inputs = tf.pad(inputs, paddings) - predicted = conv2d(padded_inputs) - - expected = tf.constant( - [[[[6.0, 6.0, 6.0]], - [[12., 12., 12.]], - [[18., 18., 18.]], - [[18., 18., 18.]]]]) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected) - - conv2d.use_buffered_input = False - predicted = conv2d(inputs) - - self.assertFalse(conv2d.use_buffered_input) - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected) - - def test_depthwise_conv2d_causal(self): - conv2d = nn_layers.DepthwiseConv2D( - kernel_size=(3, 3), - strides=(1, 1), - padding='causal', - use_buffered_input=True, - depthwise_initializer='ones', - use_bias=False, - ) - - inputs = tf.ones([1, 2, 2, 3]) - - paddings = [[0, 0], [2, 0], [0, 0], [0, 0]] - padded_inputs = tf.pad(inputs, paddings) - predicted = conv2d(padded_inputs) - - expected = tf.constant( - [[[[2., 2., 2.], - [2., 2., 2.]], - [[4., 4., 4.], - [4., 4., 4.]]]]) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected) - - conv2d.use_buffered_input = False - predicted = conv2d(inputs) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected) - - def test_conv3d_causal(self): - conv3d = nn_layers.Conv3D( - filters=3, - kernel_size=(3, 3, 3), - strides=(1, 2, 2), - padding='causal', - use_buffered_input=True, - kernel_initializer='ones', - use_bias=False, - ) - - inputs = 
tf.ones([1, 2, 4, 4, 3]) - - paddings = [[0, 0], [2, 0], [0, 0], [0, 0], [0, 0]] - padded_inputs = tf.pad(inputs, paddings) - predicted = conv3d(padded_inputs) - - expected = tf.constant( - [[[[[27., 27., 27.], - [18., 18., 18.]], - [[18., 18., 18.], - [12., 12., 12.]]], - [[[54., 54., 54.], - [36., 36., 36.]], - [[36., 36., 36.], - [24., 24., 24.]]]]]) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected) - - conv3d.use_buffered_input = False - predicted = conv3d(inputs) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected) - - def test_depthwise_conv3d_causal(self): - conv3d = nn_layers.Conv3D( - filters=3, - kernel_size=(3, 3, 3), - strides=(1, 2, 2), - padding='causal', - use_buffered_input=True, - kernel_initializer='ones', - use_bias=False, - groups=3, - ) - - inputs = tf.ones([1, 2, 4, 4, 3]) - - paddings = [[0, 0], [2, 0], [0, 0], [0, 0], [0, 0]] - padded_inputs = tf.pad(inputs, paddings) - predicted = conv3d(padded_inputs) - - expected = tf.constant( - [[[[[9.0, 9.0, 9.0], - [6.0, 6.0, 6.0]], - [[6.0, 6.0, 6.0], - [4.0, 4.0, 4.0]]], - [[[18.0, 18.0, 18.0], - [12., 12., 12.]], - [[12., 12., 12.], - [8., 8., 8.]]]]]) - - output_shape = conv3d._spatial_output_shape([4, 4, 4]) - self.assertAllClose(output_shape, [2, 2, 2]) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected) - - conv3d.use_buffered_input = False - predicted = conv3d(inputs) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected) - - def test_conv3d_causal_padding_2d(self): - """Test to ensure causal padding works like standard padding.""" - conv3d = nn_layers.Conv3D( - filters=1, - kernel_size=(1, 3, 3), - strides=(1, 2, 2), - padding='causal', - use_buffered_input=False, - kernel_initializer='ones', - use_bias=False, - ) - - keras_conv3d = tf.keras.layers.Conv3D( - filters=1, - kernel_size=(1, 3, 3), - strides=(1, 2, 2), - 
padding='same', - kernel_initializer='ones', - use_bias=False, - ) - - inputs = tf.ones([1, 1, 4, 4, 1]) - - predicted = conv3d(inputs) - expected = keras_conv3d(inputs) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected) - - self.assertAllClose(predicted, - [[[[[9.], - [6.]], - [[6.], - [4.]]]]]) - - def test_conv3d_causal_padding_1d(self): - """Test to ensure causal padding works like standard padding.""" - conv3d = nn_layers.Conv3D( - filters=1, - kernel_size=(3, 1, 1), - strides=(2, 1, 1), - padding='causal', - use_buffered_input=False, - kernel_initializer='ones', - use_bias=False, - ) - - keras_conv1d = tf.keras.layers.Conv1D( - filters=1, - kernel_size=3, - strides=2, - padding='causal', - kernel_initializer='ones', - use_bias=False, - ) - - inputs = tf.ones([1, 4, 1, 1, 1]) - - predicted = conv3d(inputs) - expected = keras_conv1d(tf.squeeze(inputs, axis=[2, 3])) - expected = tf.reshape(expected, [1, 2, 1, 1, 1]) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected) - - self.assertAllClose(predicted, - [[[[[1.]]], - [[[3.]]]]]) - - @parameterized.parameters( - (None, []), - (None, [6, 12, 18]), - ([32, 32], [6, 12, 18]), - ) - def test_aspp(self, pool_kernel_size, dilation_rates): - inputs = tf.keras.Input(shape=(64, 64, 128), dtype=tf.float32) - layer = nn_layers.SpatialPyramidPooling( - output_channels=256, - dilation_rates=dilation_rates, - pool_kernel_size=pool_kernel_size) - output = layer(inputs) - self.assertAllEqual([None, 64, 64, 256], output.shape) - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/modeling/maskrcnn_model.py b/official/vision/beta/modeling/maskrcnn_model.py deleted file mode 100644 index 722a50b40a30320df2b5a0b3212b9123898d4fc0..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/maskrcnn_model.py +++ /dev/null @@ -1,429 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""R-CNN(-RS) models.""" - -from typing import Any, List, Mapping, Optional, Tuple, Union - -import tensorflow as tf - -from official.vision.beta.ops import anchor -from official.vision.beta.ops import box_ops - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class MaskRCNNModel(tf.keras.Model): - """The Mask R-CNN(-RS) and Cascade RCNN-RS models.""" - - def __init__(self, - backbone: tf.keras.Model, - decoder: tf.keras.Model, - rpn_head: tf.keras.layers.Layer, - detection_head: Union[tf.keras.layers.Layer, - List[tf.keras.layers.Layer]], - roi_generator: tf.keras.layers.Layer, - roi_sampler: Union[tf.keras.layers.Layer, - List[tf.keras.layers.Layer]], - roi_aligner: tf.keras.layers.Layer, - detection_generator: tf.keras.layers.Layer, - mask_head: Optional[tf.keras.layers.Layer] = None, - mask_sampler: Optional[tf.keras.layers.Layer] = None, - mask_roi_aligner: Optional[tf.keras.layers.Layer] = None, - class_agnostic_bbox_pred: bool = False, - cascade_class_ensemble: bool = False, - min_level: Optional[int] = None, - max_level: Optional[int] = None, - num_scales: Optional[int] = None, - aspect_ratios: Optional[List[float]] = None, - anchor_size: Optional[float] = None, - **kwargs): - """Initializes the R-CNN(-RS) model. - - Args: - backbone: `tf.keras.Model`, the backbone network. - decoder: `tf.keras.Model`, the decoder network. - rpn_head: the RPN head. 
- detection_head: the detection head or a list of heads. - roi_generator: the ROI generator. - roi_sampler: a single ROI sampler or a list of ROI samplers for cascade - detection heads. - roi_aligner: the ROI aligner. - detection_generator: the detection generator. - mask_head: the mask head. - mask_sampler: the mask sampler. - mask_roi_aligner: the ROI alginer for mask prediction. - class_agnostic_bbox_pred: if True, perform class agnostic bounding box - prediction. Needs to be `True` for Cascade RCNN models. - cascade_class_ensemble: if True, ensemble classification scores over all - detection heads. - min_level: Minimum level in output feature maps. - max_level: Maximum level in output feature maps. - num_scales: A number representing intermediate scales added on each level. - For instances, num_scales=2 adds one additional intermediate anchor - scales [2^0, 2^0.5] on each level. - aspect_ratios: A list representing the aspect raito anchors added on each - level. The number indicates the ratio of width to height. For instances, - aspect_ratios=[1.0, 2.0, 0.5] adds three anchors on each scale level. - anchor_size: A number representing the scale of size of the base anchor to - the feature stride 2^level. - **kwargs: keyword arguments to be passed. 
- """ - super(MaskRCNNModel, self).__init__(**kwargs) - self._config_dict = { - 'backbone': backbone, - 'decoder': decoder, - 'rpn_head': rpn_head, - 'detection_head': detection_head, - 'roi_generator': roi_generator, - 'roi_sampler': roi_sampler, - 'roi_aligner': roi_aligner, - 'detection_generator': detection_generator, - 'mask_head': mask_head, - 'mask_sampler': mask_sampler, - 'mask_roi_aligner': mask_roi_aligner, - 'class_agnostic_bbox_pred': class_agnostic_bbox_pred, - 'cascade_class_ensemble': cascade_class_ensemble, - 'min_level': min_level, - 'max_level': max_level, - 'num_scales': num_scales, - 'aspect_ratios': aspect_ratios, - 'anchor_size': anchor_size, - } - self.backbone = backbone - self.decoder = decoder - self.rpn_head = rpn_head - if not isinstance(detection_head, (list, tuple)): - self.detection_head = [detection_head] - else: - self.detection_head = detection_head - self.roi_generator = roi_generator - if not isinstance(roi_sampler, (list, tuple)): - self.roi_sampler = [roi_sampler] - else: - self.roi_sampler = roi_sampler - if len(self.roi_sampler) > 1 and not class_agnostic_bbox_pred: - raise ValueError( - '`class_agnostic_bbox_pred` needs to be True if multiple detection heads are specified.' - ) - self.roi_aligner = roi_aligner - self.detection_generator = detection_generator - self._include_mask = mask_head is not None - self.mask_head = mask_head - if self._include_mask and mask_sampler is None: - raise ValueError('`mask_sampler` is not provided in Mask R-CNN.') - self.mask_sampler = mask_sampler - if self._include_mask and mask_roi_aligner is None: - raise ValueError('`mask_roi_aligner` is not provided in Mask R-CNN.') - self.mask_roi_aligner = mask_roi_aligner - # Weights for the regression losses for each FRCNN layer. - # TODO(xianzhi): Make the weights configurable. 
- self._cascade_layer_to_weights = [ - [10.0, 10.0, 5.0, 5.0], - [20.0, 20.0, 10.0, 10.0], - [30.0, 30.0, 15.0, 15.0], - ] - - def call(self, - images: tf.Tensor, - image_shape: tf.Tensor, - anchor_boxes: Optional[Mapping[str, tf.Tensor]] = None, - gt_boxes: Optional[tf.Tensor] = None, - gt_classes: Optional[tf.Tensor] = None, - gt_masks: Optional[tf.Tensor] = None, - training: Optional[bool] = None) -> Mapping[str, tf.Tensor]: - - model_outputs, intermediate_outputs = self._call_box_outputs( - images=images, image_shape=image_shape, anchor_boxes=anchor_boxes, - gt_boxes=gt_boxes, gt_classes=gt_classes, training=training) - if not self._include_mask: - return model_outputs - - model_mask_outputs = self._call_mask_outputs( - model_box_outputs=model_outputs, - features=model_outputs['decoder_features'], - current_rois=intermediate_outputs['current_rois'], - matched_gt_indices=intermediate_outputs['matched_gt_indices'], - matched_gt_boxes=intermediate_outputs['matched_gt_boxes'], - matched_gt_classes=intermediate_outputs['matched_gt_classes'], - gt_masks=gt_masks, - training=training) - model_outputs.update(model_mask_outputs) - return model_outputs - - def _get_backbone_and_decoder_features(self, images): - - backbone_features = self.backbone(images) - if self.decoder: - features = self.decoder(backbone_features) - else: - features = backbone_features - return backbone_features, features - - def _call_box_outputs( - self, images: tf.Tensor, - image_shape: tf.Tensor, - anchor_boxes: Optional[Mapping[str, tf.Tensor]] = None, - gt_boxes: Optional[tf.Tensor] = None, - gt_classes: Optional[tf.Tensor] = None, - training: Optional[bool] = None) -> Tuple[ - Mapping[str, tf.Tensor], Mapping[str, tf.Tensor]]: - """Implementation of the Faster-RCNN logic for boxes.""" - model_outputs = {} - - # Feature extraction. - (backbone_features, - decoder_features) = self._get_backbone_and_decoder_features(images) - - # Region proposal network. 
- rpn_scores, rpn_boxes = self.rpn_head(decoder_features) - - model_outputs.update({ - 'backbone_features': backbone_features, - 'decoder_features': decoder_features, - 'rpn_boxes': rpn_boxes, - 'rpn_scores': rpn_scores - }) - - # Generate anchor boxes for this batch if not provided. - if anchor_boxes is None: - _, image_height, image_width, _ = images.get_shape().as_list() - anchor_boxes = anchor.Anchor( - min_level=self._config_dict['min_level'], - max_level=self._config_dict['max_level'], - num_scales=self._config_dict['num_scales'], - aspect_ratios=self._config_dict['aspect_ratios'], - anchor_size=self._config_dict['anchor_size'], - image_size=(image_height, image_width)).multilevel_boxes - for l in anchor_boxes: - anchor_boxes[l] = tf.tile( - tf.expand_dims(anchor_boxes[l], axis=0), - [tf.shape(images)[0], 1, 1, 1]) - - # Generate RoIs. - current_rois, _ = self.roi_generator(rpn_boxes, rpn_scores, anchor_boxes, - image_shape, training) - - next_rois = current_rois - all_class_outputs = [] - for cascade_num in range(len(self.roi_sampler)): - # In cascade RCNN we want the higher layers to have different regression - # weights as the predicted deltas become smaller and smaller. - regression_weights = self._cascade_layer_to_weights[cascade_num] - current_rois = next_rois - - (class_outputs, box_outputs, model_outputs, matched_gt_boxes, - matched_gt_classes, matched_gt_indices, - current_rois) = self._run_frcnn_head( - features=decoder_features, - rois=current_rois, - gt_boxes=gt_boxes, - gt_classes=gt_classes, - training=training, - model_outputs=model_outputs, - cascade_num=cascade_num, - regression_weights=regression_weights) - all_class_outputs.append(class_outputs) - - # Generate ROIs for the next cascade head if there is any. 
- if cascade_num < len(self.roi_sampler) - 1: - next_rois = box_ops.decode_boxes( - tf.cast(box_outputs, tf.float32), - current_rois, - weights=regression_weights) - next_rois = box_ops.clip_boxes(next_rois, - tf.expand_dims(image_shape, axis=1)) - - if not training: - if self._config_dict['cascade_class_ensemble']: - class_outputs = tf.add_n(all_class_outputs) / len(all_class_outputs) - - detections = self.detection_generator( - box_outputs, - class_outputs, - current_rois, - image_shape, - regression_weights, - bbox_per_class=(not self._config_dict['class_agnostic_bbox_pred'])) - model_outputs.update({ - 'cls_outputs': class_outputs, - 'box_outputs': box_outputs, - }) - if self.detection_generator.get_config()['apply_nms']: - model_outputs.update({ - 'detection_boxes': detections['detection_boxes'], - 'detection_scores': detections['detection_scores'], - 'detection_classes': detections['detection_classes'], - 'num_detections': detections['num_detections'] - }) - else: - model_outputs.update({ - 'decoded_boxes': detections['decoded_boxes'], - 'decoded_box_scores': detections['decoded_box_scores'] - }) - - intermediate_outputs = { - 'matched_gt_boxes': matched_gt_boxes, - 'matched_gt_indices': matched_gt_indices, - 'matched_gt_classes': matched_gt_classes, - 'current_rois': current_rois, - } - return (model_outputs, intermediate_outputs) - - def _call_mask_outputs( - self, - model_box_outputs: Mapping[str, tf.Tensor], - features: tf.Tensor, - current_rois: tf.Tensor, - matched_gt_indices: tf.Tensor, - matched_gt_boxes: tf.Tensor, - matched_gt_classes: tf.Tensor, - gt_masks: tf.Tensor, - training: Optional[bool] = None) -> Mapping[str, tf.Tensor]: - """Implementation of Mask-RCNN mask prediction logic.""" - - model_outputs = dict(model_box_outputs) - if training: - current_rois, roi_classes, roi_masks = self.mask_sampler( - current_rois, matched_gt_boxes, matched_gt_classes, - matched_gt_indices, gt_masks) - roi_masks = tf.stop_gradient(roi_masks) - - 
    model_outputs.update({ - 'mask_class_targets': roi_classes, - 'mask_targets': roi_masks, - }) - else: - current_rois = model_outputs['detection_boxes'] - roi_classes = model_outputs['detection_classes'] - - mask_logits, mask_probs = self._features_to_mask_outputs( - features, current_rois, roi_classes) - - if training: - model_outputs.update({ - 'mask_outputs': mask_logits, - }) - else: - model_outputs.update({ - 'detection_masks': mask_probs, - }) - return model_outputs - - def _run_frcnn_head(self, features, rois, gt_boxes, gt_classes, training, - model_outputs, cascade_num, regression_weights): - """Runs the frcnn head that does both class and box prediction. - - Args: - features: `list` of features from the feature extractor. - rois: `list` of current rois that will be used to predict bbox refinement - and classes from. - gt_boxes: a tensor with a shape of [batch_size, MAX_NUM_INSTANCES, 4]. - This tensor might have paddings with a negative value. - gt_classes: [batch_size, MAX_INSTANCES] representing the groundtruth box - classes. It is padded with -1s to indicate the invalid classes. - training: `bool`, if model is training or being evaluated. - model_outputs: `dict`, used for storing outputs used for eval and losses. - cascade_num: `int`, the current frcnn layer in the cascade. - regression_weights: `list`, weights used for l1 loss in bounding box - regression. - - Returns: - class_outputs: Class predictions for rois. - box_outputs: Box predictions for rois. These are formatted for the - regression loss and need to be converted before being used as rois - in the next stage. - model_outputs: Updated dict with predictions used for losses and eval. - matched_gt_boxes: If `is_training` is true, then these give the gt box - location of its positive match. - matched_gt_classes: If `is_training` is true, then these give the gt class - of the predicted box.
- matched_gt_indices: If `is_training` is true, then gives the index of - the positive box match. Used for mask prediction. - rois: The sampled rois used for this layer. - """ - # Only used during training. - matched_gt_boxes, matched_gt_classes, matched_gt_indices = (None, None, - None) - if training and gt_boxes is not None: - rois = tf.stop_gradient(rois) - - current_roi_sampler = self.roi_sampler[cascade_num] - rois, matched_gt_boxes, matched_gt_classes, matched_gt_indices = ( - current_roi_sampler(rois, gt_boxes, gt_classes)) - # Create bounding box training targets. - box_targets = box_ops.encode_boxes( - matched_gt_boxes, rois, weights=regression_weights) - # If the target is background, the box target is set to all 0s. - box_targets = tf.where( - tf.tile( - tf.expand_dims(tf.equal(matched_gt_classes, 0), axis=-1), - [1, 1, 4]), tf.zeros_like(box_targets), box_targets) - model_outputs.update({ - 'class_targets_{}'.format(cascade_num) - if cascade_num else 'class_targets': - matched_gt_classes, - 'box_targets_{}'.format(cascade_num) - if cascade_num else 'box_targets': - box_targets, - }) - - # Get roi features. - roi_features = self.roi_aligner(features, rois) - - # Run frcnn head to get class and bbox predictions. - current_detection_head = self.detection_head[cascade_num] - class_outputs, box_outputs = current_detection_head(roi_features) - - model_outputs.update({ - 'class_outputs_{}'.format(cascade_num) - if cascade_num else 'class_outputs': - class_outputs, - 'box_outputs_{}'.format(cascade_num) if cascade_num else 'box_outputs': - box_outputs, - }) - return (class_outputs, box_outputs, model_outputs, matched_gt_boxes, - matched_gt_classes, matched_gt_indices, rois) - - def _features_to_mask_outputs(self, features, rois, roi_classes): - # Mask RoI align. - mask_roi_features = self.mask_roi_aligner(features, rois) - - # Mask head. 
- raw_masks = self.mask_head([mask_roi_features, roi_classes]) - - return raw_masks, tf.nn.sigmoid(raw_masks) - - @property - def checkpoint_items( - self) -> Mapping[str, Union[tf.keras.Model, tf.keras.layers.Layer]]: - """Returns a dictionary of items to be additionally checkpointed.""" - items = dict( - backbone=self.backbone, - rpn_head=self.rpn_head, - detection_head=self.detection_head) - if self.decoder is not None: - items.update(decoder=self.decoder) - if self._include_mask: - items.update(mask_head=self.mask_head) - - return items - - def get_config(self) -> Mapping[str, Any]: - return self._config_dict - - @classmethod - def from_config(cls, config): - return cls(**config) diff --git a/official/vision/beta/modeling/maskrcnn_model_test.py b/official/vision/beta/modeling/maskrcnn_model_test.py deleted file mode 100644 index 7c42bc5dbb9e1a8d8ee9fecc83d6b765a60746a1..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/maskrcnn_model_test.py +++ /dev/null @@ -1,398 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Tests for maskrcnn_model.py.""" - -import os -# Import libraries -from absl.testing import parameterized -import numpy as np -import tensorflow as tf - -from tensorflow.python.distribute import combinations -from tensorflow.python.distribute import strategy_combinations -from official.vision.beta.modeling import maskrcnn_model -from official.vision.beta.modeling.backbones import resnet -from official.vision.beta.modeling.decoders import fpn -from official.vision.beta.modeling.heads import dense_prediction_heads -from official.vision.beta.modeling.heads import instance_heads -from official.vision.beta.modeling.layers import detection_generator -from official.vision.beta.modeling.layers import mask_sampler -from official.vision.beta.modeling.layers import roi_aligner -from official.vision.beta.modeling.layers import roi_generator -from official.vision.beta.modeling.layers import roi_sampler -from official.vision.beta.ops import anchor - - -class MaskRCNNModelTest(parameterized.TestCase, tf.test.TestCase): - - @combinations.generate( - combinations.combine( - include_mask=[True, False], - use_separable_conv=[True, False], - build_anchor_boxes=[True, False], - is_training=[True, False])) - def test_build_model(self, include_mask, use_separable_conv, - build_anchor_boxes, is_training): - num_classes = 3 - min_level = 3 - max_level = 7 - num_scales = 3 - aspect_ratios = [1.0] - anchor_size = 3 - resnet_model_id = 50 - num_anchors_per_location = num_scales * len(aspect_ratios) - image_size = 384 - images = np.random.rand(2, image_size, image_size, 3) - image_shape = np.array([[image_size, image_size], [image_size, image_size]]) - - if build_anchor_boxes: - anchor_boxes = anchor.Anchor( - min_level=min_level, - max_level=max_level, - num_scales=num_scales, - aspect_ratios=aspect_ratios, - anchor_size=3, - image_size=(image_size, image_size)).multilevel_boxes - for l in anchor_boxes: - anchor_boxes[l] = tf.tile( - tf.expand_dims(anchor_boxes[l], 
axis=0), [2, 1, 1, 1]) - else: - anchor_boxes = None - - backbone = resnet.ResNet(model_id=resnet_model_id) - decoder = fpn.FPN( - input_specs=backbone.output_specs, - min_level=min_level, - max_level=max_level, - use_separable_conv=use_separable_conv) - rpn_head = dense_prediction_heads.RPNHead( - min_level=min_level, - max_level=max_level, - num_anchors_per_location=num_anchors_per_location, - num_convs=1) - detection_head = instance_heads.DetectionHead(num_classes=num_classes) - roi_generator_obj = roi_generator.MultilevelROIGenerator() - roi_sampler_obj = roi_sampler.ROISampler() - roi_aligner_obj = roi_aligner.MultilevelROIAligner() - detection_generator_obj = detection_generator.DetectionGenerator() - if include_mask: - mask_head = instance_heads.MaskHead( - num_classes=num_classes, upsample_factor=2) - mask_sampler_obj = mask_sampler.MaskSampler( - mask_target_size=28, num_sampled_masks=1) - mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner(crop_size=14) - else: - mask_head = None - mask_sampler_obj = None - mask_roi_aligner_obj = None - model = maskrcnn_model.MaskRCNNModel( - backbone, - decoder, - rpn_head, - detection_head, - roi_generator_obj, - roi_sampler_obj, - roi_aligner_obj, - detection_generator_obj, - mask_head, - mask_sampler_obj, - mask_roi_aligner_obj, - min_level=min_level, - max_level=max_level, - num_scales=num_scales, - aspect_ratios=aspect_ratios, - anchor_size=anchor_size) - - gt_boxes = np.array( - [[[10, 10, 15, 15], [2.5, 2.5, 7.5, 7.5], [-1, -1, -1, -1]], - [[100, 100, 150, 150], [-1, -1, -1, -1], [-1, -1, -1, -1]]], - dtype=np.float32) - gt_classes = np.array([[2, 1, -1], [1, -1, -1]], dtype=np.int32) - if include_mask: - gt_masks = np.ones((2, 3, 100, 100)) - else: - gt_masks = None - - # Results will be checked in test_forward. 
- _ = model( - images, - image_shape, - anchor_boxes, - gt_boxes, - gt_classes, - gt_masks, - training=is_training) - - @combinations.generate( - combinations.combine( - strategy=[ - strategy_combinations.cloud_tpu_strategy, - strategy_combinations.one_device_strategy_gpu, - ], - include_mask=[True, False], - build_anchor_boxes=[True, False], - use_cascade_heads=[True, False], - training=[True, False], - )) - def test_forward(self, strategy, include_mask, build_anchor_boxes, training, - use_cascade_heads): - num_classes = 3 - min_level = 3 - max_level = 4 - num_scales = 3 - aspect_ratios = [1.0] - anchor_size = 3 - if use_cascade_heads: - cascade_iou_thresholds = [0.6] - class_agnostic_bbox_pred = True - cascade_class_ensemble = True - else: - cascade_iou_thresholds = None - class_agnostic_bbox_pred = False - cascade_class_ensemble = False - - image_size = (256, 256) - images = np.random.rand(2, image_size[0], image_size[1], 3) - image_shape = np.array([[224, 100], [100, 224]]) - with strategy.scope(): - if build_anchor_boxes: - anchor_boxes = anchor.Anchor( - min_level=min_level, - max_level=max_level, - num_scales=num_scales, - aspect_ratios=aspect_ratios, - anchor_size=anchor_size, - image_size=image_size).multilevel_boxes - else: - anchor_boxes = None - num_anchors_per_location = len(aspect_ratios) * num_scales - - input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, 3]) - backbone = resnet.ResNet(model_id=50, input_specs=input_specs) - decoder = fpn.FPN( - min_level=min_level, - max_level=max_level, - input_specs=backbone.output_specs) - rpn_head = dense_prediction_heads.RPNHead( - min_level=min_level, - max_level=max_level, - num_anchors_per_location=num_anchors_per_location) - detection_head = instance_heads.DetectionHead( - num_classes=num_classes, - class_agnostic_bbox_pred=class_agnostic_bbox_pred) - roi_generator_obj = roi_generator.MultilevelROIGenerator() - - roi_sampler_cascade = [] - roi_sampler_obj = roi_sampler.ROISampler() - 
roi_sampler_cascade.append(roi_sampler_obj) - if cascade_iou_thresholds: - for iou in cascade_iou_thresholds: - roi_sampler_obj = roi_sampler.ROISampler( - mix_gt_boxes=False, - foreground_iou_threshold=iou, - background_iou_high_threshold=iou, - background_iou_low_threshold=0.0, - skip_subsampling=True) - roi_sampler_cascade.append(roi_sampler_obj) - roi_aligner_obj = roi_aligner.MultilevelROIAligner() - detection_generator_obj = detection_generator.DetectionGenerator() - if include_mask: - mask_head = instance_heads.MaskHead( - num_classes=num_classes, upsample_factor=2) - mask_sampler_obj = mask_sampler.MaskSampler( - mask_target_size=28, num_sampled_masks=1) - mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner(crop_size=14) - else: - mask_head = None - mask_sampler_obj = None - mask_roi_aligner_obj = None - model = maskrcnn_model.MaskRCNNModel( - backbone, - decoder, - rpn_head, - detection_head, - roi_generator_obj, - roi_sampler_obj, - roi_aligner_obj, - detection_generator_obj, - mask_head, - mask_sampler_obj, - mask_roi_aligner_obj, - class_agnostic_bbox_pred=class_agnostic_bbox_pred, - cascade_class_ensemble=cascade_class_ensemble, - min_level=min_level, - max_level=max_level, - num_scales=num_scales, - aspect_ratios=aspect_ratios, - anchor_size=anchor_size) - - gt_boxes = np.array( - [[[10, 10, 15, 15], [2.5, 2.5, 7.5, 7.5], [-1, -1, -1, -1]], - [[100, 100, 150, 150], [-1, -1, -1, -1], [-1, -1, -1, -1]]], - dtype=np.float32) - gt_classes = np.array([[2, 1, -1], [1, -1, -1]], dtype=np.int32) - if include_mask: - gt_masks = np.ones((2, 3, 100, 100)) - else: - gt_masks = None - - results = model( - images, - image_shape, - anchor_boxes, - gt_boxes, - gt_classes, - gt_masks, - training=training) - - self.assertIn('rpn_boxes', results) - self.assertIn('rpn_scores', results) - if training: - self.assertIn('class_targets', results) - self.assertIn('box_targets', results) - self.assertIn('class_outputs', results) - self.assertIn('box_outputs', results) - if 
include_mask: - self.assertIn('mask_outputs', results) - else: - self.assertIn('detection_boxes', results) - self.assertIn('detection_scores', results) - self.assertIn('detection_classes', results) - self.assertIn('num_detections', results) - if include_mask: - self.assertIn('detection_masks', results) - - @parameterized.parameters( - (False,), - (True,), - ) - def test_serialize_deserialize(self, include_mask): - input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, 3]) - backbone = resnet.ResNet(model_id=50, input_specs=input_specs) - decoder = fpn.FPN( - min_level=3, max_level=7, input_specs=backbone.output_specs) - rpn_head = dense_prediction_heads.RPNHead( - min_level=3, max_level=7, num_anchors_per_location=3) - detection_head = instance_heads.DetectionHead(num_classes=2) - roi_generator_obj = roi_generator.MultilevelROIGenerator() - roi_sampler_obj = roi_sampler.ROISampler() - roi_aligner_obj = roi_aligner.MultilevelROIAligner() - detection_generator_obj = detection_generator.DetectionGenerator() - if include_mask: - mask_head = instance_heads.MaskHead(num_classes=2, upsample_factor=2) - mask_sampler_obj = mask_sampler.MaskSampler( - mask_target_size=28, num_sampled_masks=1) - mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner(crop_size=14) - else: - mask_head = None - mask_sampler_obj = None - mask_roi_aligner_obj = None - model = maskrcnn_model.MaskRCNNModel( - backbone, - decoder, - rpn_head, - detection_head, - roi_generator_obj, - roi_sampler_obj, - roi_aligner_obj, - detection_generator_obj, - mask_head, - mask_sampler_obj, - mask_roi_aligner_obj, - min_level=3, - max_level=7, - num_scales=3, - aspect_ratios=[1.0], - anchor_size=3) - - config = model.get_config() - new_model = maskrcnn_model.MaskRCNNModel.from_config(config) - - # Validate that the config can be forced to JSON. - _ = new_model.to_json() - - # If the serialization was successful, the new config should match the old. 
- self.assertAllEqual(model.get_config(), new_model.get_config()) - - @parameterized.parameters( - (False,), - (True,), - ) - def test_checkpoint(self, include_mask): - input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, 3]) - backbone = resnet.ResNet(model_id=50, input_specs=input_specs) - decoder = fpn.FPN( - min_level=3, max_level=7, input_specs=backbone.output_specs) - rpn_head = dense_prediction_heads.RPNHead( - min_level=3, max_level=7, num_anchors_per_location=3) - detection_head = instance_heads.DetectionHead(num_classes=2) - roi_generator_obj = roi_generator.MultilevelROIGenerator() - roi_sampler_obj = roi_sampler.ROISampler() - roi_aligner_obj = roi_aligner.MultilevelROIAligner() - detection_generator_obj = detection_generator.DetectionGenerator() - if include_mask: - mask_head = instance_heads.MaskHead(num_classes=2, upsample_factor=2) - mask_sampler_obj = mask_sampler.MaskSampler( - mask_target_size=28, num_sampled_masks=1) - mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner(crop_size=14) - else: - mask_head = None - mask_sampler_obj = None - mask_roi_aligner_obj = None - model = maskrcnn_model.MaskRCNNModel( - backbone, - decoder, - rpn_head, - detection_head, - roi_generator_obj, - roi_sampler_obj, - roi_aligner_obj, - detection_generator_obj, - mask_head, - mask_sampler_obj, - mask_roi_aligner_obj, - min_level=3, - max_level=7, - num_scales=3, - aspect_ratios=[1.0], - anchor_size=3) - expect_checkpoint_items = dict( - backbone=backbone, - decoder=decoder, - rpn_head=rpn_head, - detection_head=[detection_head]) - if include_mask: - expect_checkpoint_items['mask_head'] = mask_head - self.assertAllEqual(expect_checkpoint_items, model.checkpoint_items) - - # Test save and load checkpoints. 
- ckpt = tf.train.Checkpoint(model=model, **model.checkpoint_items) - save_dir = self.create_tempdir().full_path - ckpt.save(os.path.join(save_dir, 'ckpt')) - - partial_ckpt = tf.train.Checkpoint(backbone=backbone) - partial_ckpt.read(tf.train.latest_checkpoint( - save_dir)).expect_partial().assert_existing_objects_matched() - - if include_mask: - partial_ckpt_mask = tf.train.Checkpoint( - backbone=backbone, mask_head=mask_head) - partial_ckpt_mask.restore(tf.train.latest_checkpoint( - save_dir)).expect_partial().assert_existing_objects_matched() - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/modeling/retinanet_model.py b/official/vision/beta/modeling/retinanet_model.py deleted file mode 100644 index 5d6f823906cfdcd07c5e918e3fc27f1021d7a17f..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/retinanet_model.py +++ /dev/null @@ -1,216 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""RetinaNet.""" -from typing import Any, Mapping, List, Optional, Union - -# Import libraries -import tensorflow as tf - -from official.vision.beta.ops import anchor - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class RetinaNetModel(tf.keras.Model): - """The RetinaNet model class.""" - - def __init__(self, - backbone: tf.keras.Model, - decoder: tf.keras.Model, - head: tf.keras.layers.Layer, - detection_generator: tf.keras.layers.Layer, - min_level: Optional[int] = None, - max_level: Optional[int] = None, - num_scales: Optional[int] = None, - aspect_ratios: Optional[List[float]] = None, - anchor_size: Optional[float] = None, - **kwargs): - """RetinaNet initialization function. - - Args: - backbone: `tf.keras.Model` a backbone network. - decoder: `tf.keras.Model` a decoder network. - head: `RetinaNetHead`, the RetinaNet head. - detection_generator: the detection generator. - min_level: Minimum level in output feature maps. - max_level: Maximum level in output feature maps. - num_scales: A number representing intermediate scales added - on each level. For instance, num_scales=2 adds one additional - intermediate anchor scale, i.e. scales [2^0, 2^0.5] on each level. - aspect_ratios: A list representing the aspect ratio - anchors added on each level. The number indicates the ratio of width to - height. For instance, aspect_ratios=[1.0, 2.0, 0.5] adds three anchors - on each scale level. - anchor_size: A number representing the size of the base - anchor relative to the feature stride 2^level. - **kwargs: keyword arguments to be passed.
- """ - super(RetinaNetModel, self).__init__(**kwargs) - self._config_dict = { - 'backbone': backbone, - 'decoder': decoder, - 'head': head, - 'detection_generator': detection_generator, - 'min_level': min_level, - 'max_level': max_level, - 'num_scales': num_scales, - 'aspect_ratios': aspect_ratios, - 'anchor_size': anchor_size, - } - self._backbone = backbone - self._decoder = decoder - self._head = head - self._detection_generator = detection_generator - - def call(self, - images: tf.Tensor, - image_shape: Optional[tf.Tensor] = None, - anchor_boxes: Optional[Mapping[str, tf.Tensor]] = None, - output_intermediate_features: bool = False, - training: bool = None) -> Mapping[str, tf.Tensor]: - """Forward pass of the RetinaNet model. - - Args: - images: `Tensor`, the input batched images, whose shape is - [batch, height, width, 3]. - image_shape: `Tensor`, the actual shape of the input images, whose shape - is [batch, 2] where the last dimension is [height, width]. Note that - this is the actual image shape excluding paddings. For example, images - in the batch may be resized into different shapes before padding to the - fixed size. - anchor_boxes: a dict of tensors which includes multilevel anchors. - - key: `str`, the level of the multilevel predictions. - - values: `Tensor`, the anchor coordinates of a particular feature - level, whose shape is [height_l, width_l, num_anchors_per_location]. - output_intermediate_features: `bool` indicating whether to return the - intermediate feature maps generated by backbone and decoder. - training: `bool`, indicating whether it is in training mode. - - Returns: - scores: a dict of tensors which includes scores of the predictions. - - key: `str`, the level of the multilevel predictions. - - values: `Tensor`, the box scores predicted from a particular feature - level, whose shape is - [batch, height_l, width_l, num_classes * num_anchors_per_location]. - boxes: a dict of tensors which includes coordinates of the predictions. 
- - key: `str`, the level of the multilevel predictions. - - values: `Tensor`, the box coordinates predicted from a particular - feature level, whose shape is - [batch, height_l, width_l, 4 * num_anchors_per_location]. - attributes: a dict of (attribute_name, attribute_predictions). Each - attribute prediction is a dict that includes: - - key: `str`, the level of the multilevel predictions. - - values: `Tensor`, the attribute predictions from a particular - feature level, whose shape is - [batch, height_l, width_l, att_size * num_anchors_per_location]. - """ - outputs = {} - # Feature extraction. - features = self.backbone(images) - if output_intermediate_features: - outputs.update( - {'backbone_{}'.format(k): v for k, v in features.items()}) - if self.decoder: - features = self.decoder(features) - if output_intermediate_features: - outputs.update( - {'decoder_{}'.format(k): v for k, v in features.items()}) - - # Dense prediction. `raw_attributes` can be empty. - raw_scores, raw_boxes, raw_attributes = self.head(features) - - if training: - outputs.update({ - 'cls_outputs': raw_scores, - 'box_outputs': raw_boxes, - }) - if raw_attributes: - outputs.update({'attribute_outputs': raw_attributes}) - return outputs - else: - # Generate anchor boxes for this batch if not provided. - if anchor_boxes is None: - _, image_height, image_width, _ = images.get_shape().as_list() - anchor_boxes = anchor.Anchor( - min_level=self._config_dict['min_level'], - max_level=self._config_dict['max_level'], - num_scales=self._config_dict['num_scales'], - aspect_ratios=self._config_dict['aspect_ratios'], - anchor_size=self._config_dict['anchor_size'], - image_size=(image_height, image_width)).multilevel_boxes - for l in anchor_boxes: - anchor_boxes[l] = tf.tile( - tf.expand_dims(anchor_boxes[l], axis=0), - [tf.shape(images)[0], 1, 1, 1]) - - # Post-processing. 
- final_results = self.detection_generator(raw_boxes, raw_scores, - anchor_boxes, image_shape, - raw_attributes) - outputs.update({ - 'cls_outputs': raw_scores, - 'box_outputs': raw_boxes, - }) - if self.detection_generator.get_config()['apply_nms']: - outputs.update({ - 'detection_boxes': final_results['detection_boxes'], - 'detection_scores': final_results['detection_scores'], - 'detection_classes': final_results['detection_classes'], - 'num_detections': final_results['num_detections'] - }) - else: - outputs.update({ - 'decoded_boxes': final_results['decoded_boxes'], - 'decoded_box_scores': final_results['decoded_box_scores'] - }) - - if raw_attributes: - outputs.update({ - 'attribute_outputs': raw_attributes, - 'detection_attributes': final_results['detection_attributes'], - }) - return outputs - - @property - def checkpoint_items( - self) -> Mapping[str, Union[tf.keras.Model, tf.keras.layers.Layer]]: - """Returns a dictionary of items to be additionally checkpointed.""" - items = dict(backbone=self.backbone, head=self.head) - if self.decoder is not None: - items.update(decoder=self.decoder) - - return items - - @property - def backbone(self) -> tf.keras.Model: - return self._backbone - - @property - def decoder(self) -> tf.keras.Model: - return self._decoder - - @property - def head(self) -> tf.keras.layers.Layer: - return self._head - - @property - def detection_generator(self) -> tf.keras.layers.Layer: - return self._detection_generator - - def get_config(self) -> Mapping[str, Any]: - return self._config_dict - - @classmethod - def from_config(cls, config): - return cls(**config) diff --git a/official/vision/beta/modeling/segmentation_model.py b/official/vision/beta/modeling/segmentation_model.py deleted file mode 100644 index 6ac8192965b4bbb001be2921ad98cd54fcaea290..0000000000000000000000000000000000000000 --- a/official/vision/beta/modeling/segmentation_model.py +++ /dev/null @@ -1,94 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Build segmentation models.""" -from typing import Any, Mapping, Union, Optional, Dict - -# Import libraries -import tensorflow as tf - -layers = tf.keras.layers - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class SegmentationModel(tf.keras.Model): - """A Segmentation class model. - - Input images are passed through backbone first. Decoder network is then - applied, and finally, segmentation head is applied on the output of the - decoder network. Layers such as ASPP should be part of decoder. Any feature - fusion is done as part of the segmentation head (i.e. deeplabv3+ feature - fusion is not part of the decoder, instead it is part of the segmentation - head). This way, different feature fusion techniques can be combined with - different backbones, and decoders. - """ - - def __init__(self, backbone: tf.keras.Model, decoder: tf.keras.Model, - head: tf.keras.layers.Layer, - mask_scoring_head: Optional[tf.keras.layers.Layer] = None, - **kwargs): - """Segmentation initialization function. - - Args: - backbone: a backbone network. - decoder: a decoder network. E.g. FPN. - head: segmentation head. - mask_scoring_head: mask scoring head. - **kwargs: keyword arguments to be passed. 
-    """
-    super(SegmentationModel, self).__init__(**kwargs)
-    self._config_dict = {
-        'backbone': backbone,
-        'decoder': decoder,
-        'head': head,
-        'mask_scoring_head': mask_scoring_head,
-    }
-    self.backbone = backbone
-    self.decoder = decoder
-    self.head = head
-    self.mask_scoring_head = mask_scoring_head
-
-  def call(self, inputs: tf.Tensor, training: bool = None
-           ) -> Dict[str, tf.Tensor]:
-    backbone_features = self.backbone(inputs)
-
-    if self.decoder:
-      decoder_features = self.decoder(backbone_features)
-    else:
-      decoder_features = backbone_features
-
-    logits = self.head((backbone_features, decoder_features))
-    outputs = {'logits': logits}
-    if self.mask_scoring_head:
-      mask_scores = self.mask_scoring_head(logits)
-      outputs.update({'mask_scores': mask_scores})
-    return outputs
-
-  @property
-  def checkpoint_items(
-      self) -> Mapping[str, Union[tf.keras.Model, tf.keras.layers.Layer]]:
-    """Returns a dictionary of items to be additionally checkpointed."""
-    items = dict(backbone=self.backbone, head=self.head)
-    if self.decoder is not None:
-      items.update(decoder=self.decoder)
-    if self.mask_scoring_head is not None:
-      items.update(mask_scoring_head=self.mask_scoring_head)
-    return items
-
-  def get_config(self) -> Mapping[str, Any]:
-    return self._config_dict
-
-  @classmethod
-  def from_config(cls, config, custom_objects=None):
-    return cls(**config)
diff --git a/official/vision/beta/ops/__init__.py b/official/vision/beta/ops/__init__.py
deleted file mode 100644
index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000
--- a/official/vision/beta/ops/__init__.py
+++ /dev/null
@@ -1,14 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
diff --git a/official/vision/beta/ops/anchor.py b/official/vision/beta/ops/anchor.py
deleted file mode 100644
index 7d24bd85b5df2b1c51adfdceae46a50c0e4dbf10..0000000000000000000000000000000000000000
--- a/official/vision/beta/ops/anchor.py
+++ /dev/null
@@ -1,373 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Anchor box and labeler definition."""
-
-import collections
-
-# Import libraries
-
-import tensorflow as tf
-
-from official.vision.beta.ops import anchor_generator
-from official.vision.beta.ops import box_matcher
-from official.vision.beta.ops import iou_similarity
-from official.vision.beta.ops import target_gather
-from official.vision.utils.object_detection import balanced_positive_negative_sampler
-from official.vision.utils.object_detection import box_list
-from official.vision.utils.object_detection import faster_rcnn_box_coder
-
-
-class Anchor(object):
-  """Anchor class for anchor-based object detectors."""
-
-  def __init__(self,
-               min_level,
-               max_level,
-               num_scales,
-               aspect_ratios,
-               anchor_size,
-               image_size):
-    """Constructs multiscale anchors.
-
-    Args:
-      min_level: integer number of minimum level of the output feature pyramid.
-      max_level: integer number of maximum level of the output feature pyramid.
-      num_scales: integer number representing intermediate scales added
-        on each level. For instances, num_scales=2 adds one additional
-        intermediate anchor scales [2^0, 2^0.5] on each level.
-      aspect_ratios: list of float numbers representing the aspect raito anchors
-        added on each level. The number indicates the ratio of width to height.
-        For instances, aspect_ratios=[1.0, 2.0, 0.5] adds three anchors on each
-        scale level.
-      anchor_size: float number representing the scale of size of the base
-        anchor to the feature stride 2^level.
-      image_size: a list of integer numbers or Tensors representing
-        [height, width] of the input image size.The image_size should be divided
-        by the largest feature stride 2^max_level.
-    """
-    self.min_level = min_level
-    self.max_level = max_level
-    self.num_scales = num_scales
-    self.aspect_ratios = aspect_ratios
-    self.anchor_size = anchor_size
-    self.image_size = image_size
-    self.boxes = self._generate_boxes()
-
-  def _generate_boxes(self):
-    """Generates multiscale anchor boxes.
-
-    Returns:
-      a Tensor of shape [N, 4], representing anchor boxes of all levels
-      concatenated together.
-    """
-    boxes_all = []
-    for level in range(self.min_level, self.max_level + 1):
-      boxes_l = []
-      for scale in range(self.num_scales):
-        for aspect_ratio in self.aspect_ratios:
-          stride = 2 ** level
-          intermidate_scale = 2 ** (scale / float(self.num_scales))
-          base_anchor_size = self.anchor_size * stride * intermidate_scale
-          aspect_x = aspect_ratio ** 0.5
-          aspect_y = aspect_ratio ** -0.5
-          half_anchor_size_x = base_anchor_size * aspect_x / 2.0
-          half_anchor_size_y = base_anchor_size * aspect_y / 2.0
-          x = tf.range(stride / 2, self.image_size[1], stride)
-          y = tf.range(stride / 2, self.image_size[0], stride)
-          xv, yv = tf.meshgrid(x, y)
-          xv = tf.cast(tf.reshape(xv, [-1]), dtype=tf.float32)
-          yv = tf.cast(tf.reshape(yv, [-1]), dtype=tf.float32)
-          # Tensor shape Nx4.
-          boxes = tf.stack([yv - half_anchor_size_y, xv - half_anchor_size_x,
-                            yv + half_anchor_size_y, xv + half_anchor_size_x],
-                           axis=1)
-          boxes_l.append(boxes)
-      # Concat anchors on the same level to tensor shape NxAx4.
-      boxes_l = tf.stack(boxes_l, axis=1)
-      boxes_l = tf.reshape(boxes_l, [-1, 4])
-      boxes_all.append(boxes_l)
-    return tf.concat(boxes_all, axis=0)
-
-  def unpack_labels(self, labels):
-    """Unpacks an array of labels into multiscales labels."""
-    unpacked_labels = collections.OrderedDict()
-    count = 0
-    for level in range(self.min_level, self.max_level + 1):
-      feat_size_y = tf.cast(self.image_size[0] / 2 ** level, tf.int32)
-      feat_size_x = tf.cast(self.image_size[1] / 2 ** level, tf.int32)
-      steps = feat_size_y * feat_size_x * self.anchors_per_location
-      unpacked_labels[str(level)] = tf.reshape(
-          labels[count:count + steps], [feat_size_y, feat_size_x, -1])
-      count += steps
-    return unpacked_labels
-
-  @property
-  def anchors_per_location(self):
-    return self.num_scales * len(self.aspect_ratios)
-
-  @property
-  def multilevel_boxes(self):
-    return self.unpack_labels(self.boxes)
-
-
-class AnchorLabeler(object):
-  """Labeler for dense object detector."""
-
-  def __init__(self,
-               match_threshold=0.5,
-               unmatched_threshold=0.5):
-    """Constructs anchor labeler to assign labels to anchors.
-
-    Args:
-      match_threshold: a float number between 0 and 1 representing the
-        lower-bound threshold to assign positive labels for anchors. An anchor
-        with a score over the threshold is labeled positive.
-      unmatched_threshold: a float number between 0 and 1 representing the
-        upper-bound threshold to assign negative labels for anchors. An anchor
-        with a score below the threshold is labeled negative.
-    """
-    self.similarity_calc = iou_similarity.IouSimilarity()
-    self.target_gather = target_gather.TargetGather()
-    self.matcher = box_matcher.BoxMatcher(
-        thresholds=[unmatched_threshold, match_threshold],
-        indicators=[-1, -2, 1],
-        force_match_for_each_col=True)
-    self.box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder()
-
-  def label_anchors(self,
-                    anchor_boxes,
-                    gt_boxes,
-                    gt_labels,
-                    gt_attributes=None):
-    """Labels anchors with ground truth inputs.
-
-    Args:
-      anchor_boxes: A float tensor with shape [N, 4] representing anchor boxes.
-        For each row, it stores [y0, x0, y1, x1] for four corners of a box.
-      gt_boxes: A float tensor with shape [N, 4] representing groundtruth boxes.
-        For each row, it stores [y0, x0, y1, x1] for four corners of a box.
-      gt_labels: A integer tensor with shape [N, 1] representing groundtruth
-        classes.
-      gt_attributes: If not None, a dict of (name, gt_attribute) pairs.
-        `gt_attribute` is a float tensor with shape [N, attribute_size]
-        representing groundtruth attributes.
-    Returns:
-      cls_targets_dict: ordered dictionary with keys
-        [min_level, min_level+1, ..., max_level]. The values are tensor with
-        shape [height_l, width_l, num_anchors_per_location]. The height_l and
-        width_l represent the dimension of class logits at l-th level.
-      box_targets_dict: ordered dictionary with keys
-        [min_level, min_level+1, ..., max_level]. The values are tensor with
-        shape [height_l, width_l, num_anchors_per_location * 4]. The height_l
-        and width_l represent the dimension of bounding box regression output at
-        l-th level.
-      attribute_targets_dict: a dict with (name, attribute_targets) pairs. Each
-        `attribute_targets` represents an ordered dictionary with keys
-        [min_level, min_level+1, ..., max_level]. The values are tensor with
-        shape [height_l, width_l, num_anchors_per_location * attribute_size].
-        The height_l and width_l represent the dimension of attribute prediction
-        output at l-th level.
-      cls_weights: A flattened Tensor with shape [batch_size, num_anchors], that
-        serves as masking / sample weight for classification loss. Its value
-        is 1.0 for positive and negative matched anchors, and 0.0 for ignored
-        anchors.
-      box_weights: A flattened Tensor with shape [batch_size, num_anchors], that
-        serves as masking / sample weight for regression loss. Its value is
-        1.0 for positive matched anchors, and 0.0 for negative and ignored
-        anchors.
-    """
-    flattened_anchor_boxes = []
-    for anchors in anchor_boxes.values():
-      flattened_anchor_boxes.append(tf.reshape(anchors, [-1, 4]))
-    flattened_anchor_boxes = tf.concat(flattened_anchor_boxes, axis=0)
-    similarity_matrix = self.similarity_calc(flattened_anchor_boxes, gt_boxes)
-    match_indices, match_indicators = self.matcher(similarity_matrix)
-
-    mask = tf.less_equal(match_indicators, 0)
-    cls_mask = tf.expand_dims(mask, -1)
-    cls_targets = self.target_gather(gt_labels, match_indices, cls_mask, -1)
-    box_mask = tf.tile(cls_mask, [1, 4])
-    box_targets = self.target_gather(gt_boxes, match_indices, box_mask)
-    att_targets = {}
-    if gt_attributes:
-      for k, v in gt_attributes.items():
-        att_size = v.get_shape().as_list()[-1]
-        att_mask = tf.tile(cls_mask, [1, att_size])
-        att_targets[k] = self.target_gather(v, match_indices, att_mask, 0.0)
-
-    weights = tf.squeeze(tf.ones_like(gt_labels, dtype=tf.float32), -1)
-    box_weights = self.target_gather(weights, match_indices, mask)
-    ignore_mask = tf.equal(match_indicators, -2)
-    cls_weights = self.target_gather(weights, match_indices, ignore_mask)
-    box_targets_list = box_list.BoxList(box_targets)
-    anchor_box_list = box_list.BoxList(flattened_anchor_boxes)
-    box_targets = self.box_coder.encode(box_targets_list, anchor_box_list)
-
-    # Unpacks labels into multi-level representations.
-    cls_targets_dict = unpack_targets(cls_targets, anchor_boxes)
-    box_targets_dict = unpack_targets(box_targets, anchor_boxes)
-    attribute_targets_dict = {}
-    for k, v in att_targets.items():
-      attribute_targets_dict[k] = unpack_targets(v, anchor_boxes)
-
-    return cls_targets_dict, box_targets_dict, attribute_targets_dict, cls_weights, box_weights
-
-
-class RpnAnchorLabeler(AnchorLabeler):
-  """Labeler for Region Proposal Network."""
-
-  def __init__(self,
-               match_threshold=0.7,
-               unmatched_threshold=0.3,
-               rpn_batch_size_per_im=256,
-               rpn_fg_fraction=0.5):
-    AnchorLabeler.__init__(self, match_threshold=match_threshold,
-                           unmatched_threshold=unmatched_threshold)
-    self._rpn_batch_size_per_im = rpn_batch_size_per_im
-    self._rpn_fg_fraction = rpn_fg_fraction
-
-  def _get_rpn_samples(self, match_results):
-    """Computes anchor labels.
-
-    This function performs subsampling for foreground (fg) and background (bg)
-    anchors.
-    Args:
-      match_results: A integer tensor with shape [N] representing the
-        matching results of anchors. (1) match_results[i]>=0,
-        meaning that column i is matched with row match_results[i].
-        (2) match_results[i]=-1, meaning that column i is not matched.
-        (3) match_results[i]=-2, meaning that column i is ignored.
-    Returns:
-      score_targets: a integer tensor with the a shape of [N].
-        (1) score_targets[i]=1, the anchor is a positive sample.
-        (2) score_targets[i]=0, negative. (3) score_targets[i]=-1, the anchor is
-        don't care (ignore).
-    """
-    sampler = (
-        balanced_positive_negative_sampler.BalancedPositiveNegativeSampler(
-            positive_fraction=self._rpn_fg_fraction, is_static=False))
-    # indicator includes both positive and negative labels.
-    # labels includes only positives labels.
-    # positives = indicator & labels.
-    # negatives = indicator & !labels.
-    # ignore = !indicator.
-    indicator = tf.greater(match_results, -2)
-    labels = tf.greater(match_results, -1)
-
-    samples = sampler.subsample(
-        indicator, self._rpn_batch_size_per_im, labels)
-    positive_labels = tf.where(
-        tf.logical_and(samples, labels),
-        tf.constant(2, dtype=tf.int32, shape=match_results.shape),
-        tf.constant(0, dtype=tf.int32, shape=match_results.shape))
-    negative_labels = tf.where(
-        tf.logical_and(samples, tf.logical_not(labels)),
-        tf.constant(1, dtype=tf.int32, shape=match_results.shape),
-        tf.constant(0, dtype=tf.int32, shape=match_results.shape))
-    ignore_labels = tf.fill(match_results.shape, -1)
-
-    return (ignore_labels + positive_labels + negative_labels,
-            positive_labels, negative_labels)
-
-  def label_anchors(self, anchor_boxes, gt_boxes, gt_labels):
-    """Labels anchors with ground truth inputs.
-
-    Args:
-      anchor_boxes: A float tensor with shape [N, 4] representing anchor boxes.
-        For each row, it stores [y0, x0, y1, x1] for four corners of a box.
-      gt_boxes: A float tensor with shape [N, 4] representing groundtruth boxes.
-        For each row, it stores [y0, x0, y1, x1] for four corners of a box.
-      gt_labels: A integer tensor with shape [N, 1] representing groundtruth
-        classes.
-    Returns:
-      score_targets_dict: ordered dictionary with keys
-        [min_level, min_level+1, ..., max_level]. The values are tensor with
-        shape [height_l, width_l, num_anchors]. The height_l and width_l
-        represent the dimension of class logits at l-th level.
-      box_targets_dict: ordered dictionary with keys
-        [min_level, min_level+1, ..., max_level]. The values are tensor with
-        shape [height_l, width_l, num_anchors * 4]. The height_l and
-        width_l represent the dimension of bounding box regression output at
-        l-th level.
-    """
-    flattened_anchor_boxes = []
-    for anchors in anchor_boxes.values():
-      flattened_anchor_boxes.append(tf.reshape(anchors, [-1, 4]))
-    flattened_anchor_boxes = tf.concat(flattened_anchor_boxes, axis=0)
-    similarity_matrix = self.similarity_calc(flattened_anchor_boxes, gt_boxes)
-    match_indices, match_indicators = self.matcher(similarity_matrix)
-    box_mask = tf.tile(tf.expand_dims(tf.less_equal(match_indicators, 0), -1),
-                       [1, 4])
-    box_targets = self.target_gather(gt_boxes, match_indices, box_mask)
-    box_targets_list = box_list.BoxList(box_targets)
-    anchor_box_list = box_list.BoxList(flattened_anchor_boxes)
-    box_targets = self.box_coder.encode(box_targets_list, anchor_box_list)
-
-    # Zero out the unmatched and ignored regression targets.
-    num_matches = match_indices.shape.as_list()[0] or tf.shape(match_indices)[0]
-    unmatched_ignored_box_targets = tf.zeros([num_matches, 4], dtype=tf.float32)
-    matched_anchors_mask = tf.greater_equal(match_indicators, 0)
-    # To broadcast matched_anchors_mask to the same shape as
-    # matched_reg_targets.
-    matched_anchors_mask = tf.tile(
-        tf.expand_dims(matched_anchors_mask, 1),
-        [1, tf.shape(box_targets)[1]])
-    box_targets = tf.where(matched_anchors_mask, box_targets,
-                           unmatched_ignored_box_targets)
-
-    # score_targets contains the subsampled positive and negative anchors.
-    score_targets, _, _ = self._get_rpn_samples(match_indicators)
-
-    # Unpacks labels.
-    score_targets_dict = unpack_targets(score_targets, anchor_boxes)
-    box_targets_dict = unpack_targets(box_targets, anchor_boxes)
-
-    return score_targets_dict, box_targets_dict
-
-
-def build_anchor_generator(min_level, max_level, num_scales, aspect_ratios,
-                           anchor_size):
-  """Build anchor generator from levels."""
-  anchor_sizes = collections.OrderedDict()
-  strides = collections.OrderedDict()
-  scales = []
-  for scale in range(num_scales):
-    scales.append(2**(scale / float(num_scales)))
-  for level in range(min_level, max_level + 1):
-    stride = 2**level
-    strides[str(level)] = stride
-    anchor_sizes[str(level)] = anchor_size * stride
-  anchor_gen = anchor_generator.AnchorGenerator(
-      anchor_sizes=anchor_sizes,
-      scales=scales,
-      aspect_ratios=aspect_ratios,
-      strides=strides)
-  return anchor_gen
-
-
-def unpack_targets(targets, anchor_boxes_dict):
-  """Unpacks an array of labels into multiscales labels."""
-  unpacked_targets = collections.OrderedDict()
-  count = 0
-  for level, anchor_boxes in anchor_boxes_dict.items():
-    feat_size_shape = anchor_boxes.shape.as_list()
-    feat_size_y = feat_size_shape[0]
-    feat_size_x = feat_size_shape[1]
-    anchors_per_location = int(feat_size_shape[2] / 4)
-    steps = feat_size_y * feat_size_x * anchors_per_location
-    unpacked_targets[level] = tf.reshape(targets[count:count + steps],
-                                         [feat_size_y, feat_size_x, -1])
-    count += steps
-  return unpacked_targets
diff --git a/official/vision/beta/ops/augment.py b/official/vision/beta/ops/augment.py
deleted file mode 100644
index 2ec7519f5d50889e43f40279fc82160d85c341b3..0000000000000000000000000000000000000000
--- a/official/vision/beta/ops/augment.py
+++ /dev/null
@@ -1,2286 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Augmentation policies for enhanced image/video preprocessing.
-
-AutoAugment Reference:
-  - AutoAugment Reference: https://arxiv.org/abs/1805.09501
-  - AutoAugment for Object Detection Reference: https://arxiv.org/abs/1906.11172
-RandAugment Reference: https://arxiv.org/abs/1909.13719
-RandomErasing Reference: https://arxiv.org/abs/1708.04896
-MixupAndCutmix:
-  - Mixup: https://arxiv.org/abs/1710.09412
-  - Cutmix: https://arxiv.org/abs/1905.04899
-
-RandomErasing, Mixup and Cutmix are inspired by
-https://github.com/rwightman/pytorch-image-models
-
-"""
-import inspect
-import math
-from typing import Any, List, Iterable, Optional, Text, Tuple
-
-from keras.layers.preprocessing import image_preprocessing as image_ops
-import numpy as np
-import tensorflow as tf
-
-
-# This signifies the max integer that the controller RNN could predict for the
-# augmentation scheme.
-_MAX_LEVEL = 10.
-
-
-def to_4d(image: tf.Tensor) -> tf.Tensor:
-  """Converts an input Tensor to 4 dimensions.
-
-  4D image => [N, H, W, C] or [N, C, H, W]
-  3D image => [1, H, W, C] or [1, C, H, W]
-  2D image => [1, H, W, 1]
-
-  Args:
-    image: The 2/3/4D input tensor.
-
-  Returns:
-    A 4D image tensor.
-
-  Raises:
-    `TypeError` if `image` is not a 2/3/4D tensor.
-
-  """
-  shape = tf.shape(image)
-  original_rank = tf.rank(image)
-  left_pad = tf.cast(tf.less_equal(original_rank, 3), dtype=tf.int32)
-  right_pad = tf.cast(tf.equal(original_rank, 2), dtype=tf.int32)
-  new_shape = tf.concat(
-      [
-          tf.ones(shape=left_pad, dtype=tf.int32),
-          shape,
-          tf.ones(shape=right_pad, dtype=tf.int32),
-      ],
-      axis=0,
-  )
-  return tf.reshape(image, new_shape)
-
-
-def from_4d(image: tf.Tensor, ndims: tf.Tensor) -> tf.Tensor:
-  """Converts a 4D image back to `ndims` rank."""
-  shape = tf.shape(image)
-  begin = tf.cast(tf.less_equal(ndims, 3), dtype=tf.int32)
-  end = 4 - tf.cast(tf.equal(ndims, 2), dtype=tf.int32)
-  new_shape = shape[begin:end]
-  return tf.reshape(image, new_shape)
-
-
-def _convert_translation_to_transform(translations: tf.Tensor) -> tf.Tensor:
-  """Converts translations to a projective transform.
-
-  The translation matrix looks like this:
-    [[1 0 -dx]
-     [0 1 -dy]
-     [0 0 1]]
-
-  Args:
-    translations: The 2-element list representing [dx, dy], or a matrix of
-      2-element lists representing [dx dy] to translate for each image. The
-      shape must be static.
-
-  Returns:
-    The transformation matrix of shape (num_images, 8).
-
-  Raises:
-    `TypeError` if
-      - the shape of `translations` is not known or
-      - the shape of `translations` is not rank 1 or 2.
-
-  """
-  translations = tf.convert_to_tensor(translations, dtype=tf.float32)
-  if translations.get_shape().ndims is None:
-    raise TypeError('translations rank must be statically known')
-  elif len(translations.get_shape()) == 1:
-    translations = translations[None]
-  elif len(translations.get_shape()) != 2:
-    raise TypeError('translations should have rank 1 or 2.')
-  num_translations = tf.shape(translations)[0]
-
-  return tf.concat(
-      values=[
-          tf.ones((num_translations, 1), tf.dtypes.float32),
-          tf.zeros((num_translations, 1), tf.dtypes.float32),
-          -translations[:, 0, None],
-          tf.zeros((num_translations, 1), tf.dtypes.float32),
-          tf.ones((num_translations, 1), tf.dtypes.float32),
-          -translations[:, 1, None],
-          tf.zeros((num_translations, 2), tf.dtypes.float32),
-      ],
-      axis=1,
-  )
-
-
-def _convert_angles_to_transform(angles: tf.Tensor, image_width: tf.Tensor,
-                                 image_height: tf.Tensor) -> tf.Tensor:
-  """Converts an angle or angles to a projective transform.
-
-  Args:
-    angles: A scalar to rotate all images, or a vector to rotate a batch of
-      images. This must be a scalar.
-    image_width: The width of the image(s) to be transformed.
-    image_height: The height of the image(s) to be transformed.
-
-  Returns:
-    A tensor of shape (num_images, 8).
-
-  Raises:
-    `TypeError` if `angles` is not rank 0 or 1.
-
-  """
-  angles = tf.convert_to_tensor(angles, dtype=tf.float32)
-  if len(angles.get_shape()) == 0:  # pylint:disable=g-explicit-length-test
-    angles = angles[None]
-  elif len(angles.get_shape()) != 1:
-    raise TypeError('Angles should have a rank 0 or 1.')
-  x_offset = ((image_width - 1) -
-              (tf.math.cos(angles) * (image_width - 1) - tf.math.sin(angles) *
-               (image_height - 1))) / 2.0
-  y_offset = ((image_height - 1) -
-              (tf.math.sin(angles) * (image_width - 1) + tf.math.cos(angles) *
-               (image_height - 1))) / 2.0
-  num_angles = tf.shape(angles)[0]
-  return tf.concat(
-      values=[
-          tf.math.cos(angles)[:, None],
-          -tf.math.sin(angles)[:, None],
-          x_offset[:, None],
-          tf.math.sin(angles)[:, None],
-          tf.math.cos(angles)[:, None],
-          y_offset[:, None],
-          tf.zeros((num_angles, 2), tf.dtypes.float32),
-      ],
-      axis=1,
-  )
-
-
-def transform(image: tf.Tensor, transforms) -> tf.Tensor:
-  """Prepares input data for `image_ops.transform`."""
-  original_ndims = tf.rank(image)
-  transforms = tf.convert_to_tensor(transforms, dtype=tf.float32)
-  if transforms.shape.rank == 1:
-    transforms = transforms[None]
-  image = to_4d(image)
-  image = image_ops.transform(
-      images=image, transforms=transforms, interpolation='nearest')
-  return from_4d(image, original_ndims)
-
-
-def translate(image: tf.Tensor, translations) -> tf.Tensor:
-  """Translates image(s) by provided vectors.
-
-  Args:
-    image: An image Tensor of type uint8.
-    translations: A vector or matrix representing [dx dy].
-
-  Returns:
-    The translated version of the image.
-
-  """
-  transforms = _convert_translation_to_transform(translations)
-  return transform(image, transforms=transforms)
-
-
-def rotate(image: tf.Tensor, degrees: float) -> tf.Tensor:
-  """Rotates the image by degrees either clockwise or counterclockwise.
-
-  Args:
-    image: An image Tensor of type uint8.
-    degrees: Float, a scalar angle in degrees to rotate all images by. If
-      degrees is positive the image will be rotated clockwise otherwise it will
-      be rotated counterclockwise.
-
-  Returns:
-    The rotated version of image.
-
-  """
-  # Convert from degrees to radians.
-  degrees_to_radians = math.pi / 180.0
-  radians = tf.cast(degrees * degrees_to_radians, tf.float32)
-
-  original_ndims = tf.rank(image)
-  image = to_4d(image)
-
-  image_height = tf.cast(tf.shape(image)[1], tf.float32)
-  image_width = tf.cast(tf.shape(image)[2], tf.float32)
-  transforms = _convert_angles_to_transform(
-      angles=radians, image_width=image_width, image_height=image_height)
-  # In practice, we should randomize the rotation degrees by flipping
-  # it negatively half the time, but that's done on 'degrees' outside
-  # of the function.
-  image = transform(image, transforms=transforms)
-  return from_4d(image, original_ndims)
-
-
-def blend(image1: tf.Tensor, image2: tf.Tensor, factor: float) -> tf.Tensor:
-  """Blend image1 and image2 using 'factor'.
-
-  Factor can be above 0.0. A value of 0.0 means only image1 is used.
-  A value of 1.0 means only image2 is used. A value between 0.0 and
-  1.0 means we linearly interpolate the pixel values between the two
-  images. A value greater than 1.0 "extrapolates" the difference
-  between the two pixel values, and we clip the results to values
-  between 0 and 255.
-
-  Args:
-    image1: An image Tensor of type uint8.
-    image2: An image Tensor of type uint8.
-    factor: A floating point value above 0.0.
-
-  Returns:
-    A blended image Tensor of type uint8.
-  """
-  if factor == 0.0:
-    return tf.convert_to_tensor(image1)
-  if factor == 1.0:
-    return tf.convert_to_tensor(image2)
-
-  image1 = tf.cast(image1, tf.float32)
-  image2 = tf.cast(image2, tf.float32)
-
-  difference = image2 - image1
-  scaled = factor * difference
-
-  # Do addition in float.
-  temp = tf.cast(image1, tf.float32) + scaled
-
-  # Interpolate
-  if factor > 0.0 and factor < 1.0:
-    # Interpolation means we always stay within 0 and 255.
-    return tf.cast(temp, tf.uint8)
-
-  # Extrapolate:
-  #
-  # We need to clip and then cast.
-  return tf.cast(tf.clip_by_value(temp, 0.0, 255.0), tf.uint8)
-
-
-def cutout(image: tf.Tensor, pad_size: int, replace: int = 0) -> tf.Tensor:
-  """Apply cutout (https://arxiv.org/abs/1708.04552) to image.
-
-  This operation applies a (2*pad_size x 2*pad_size) mask of zeros to
-  a random location within `image`. The pixel values filled in will be of the
-  value `replace`. The location where the mask will be applied is randomly
-  chosen uniformly over the whole image.
-
-  Args:
-    image: An image Tensor of type uint8.
-    pad_size: Specifies how big the zero mask that will be generated is that is
-      applied to the image. The mask will be of size (2*pad_size x 2*pad_size).
-    replace: What pixel value to fill in the image in the area that has the
-      cutout mask applied to it.
-
-  Returns:
-    An image Tensor that is of type uint8.
-  """
-  if image.shape.rank not in [3, 4]:
-    raise ValueError('Bad image rank: {}'.format(image.shape.rank))
-
-  if image.shape.rank == 4:
-    return cutout_video(image, replace=replace)
-
-  image_height = tf.shape(image)[0]
-  image_width = tf.shape(image)[1]
-
-  # Sample the center location in the image where the zero mask will be applied.
-  cutout_center_height = tf.random.uniform(
-      shape=[], minval=0, maxval=image_height, dtype=tf.int32)
-
-  cutout_center_width = tf.random.uniform(
-      shape=[], minval=0, maxval=image_width, dtype=tf.int32)
-
-  image = _fill_rectangle(image, cutout_center_width, cutout_center_height,
-                          pad_size, pad_size, replace)
-
-  return image
-
-
-def _fill_rectangle(image,
-                    center_width,
-                    center_height,
-                    half_width,
-                    half_height,
-                    replace=None):
-  """Fill blank area."""
-  image_height = tf.shape(image)[0]
-  image_width = tf.shape(image)[1]
-
-  lower_pad = tf.maximum(0, center_height - half_height)
-  upper_pad = tf.maximum(0, image_height - center_height - half_height)
-  left_pad = tf.maximum(0, center_width - half_width)
-  right_pad = tf.maximum(0, image_width - center_width - half_width)
-
-  cutout_shape = [
-      image_height - (lower_pad + upper_pad),
-      image_width - (left_pad + right_pad)
-  ]
-  padding_dims = [[lower_pad, upper_pad], [left_pad, right_pad]]
-  mask = tf.pad(
-      tf.zeros(cutout_shape, dtype=image.dtype),
-      padding_dims,
-      constant_values=1)
-  mask = tf.expand_dims(mask, -1)
-  mask = tf.tile(mask, [1, 1, 3])
-
-  if replace is None:
-    fill = tf.random.normal(tf.shape(image), dtype=image.dtype)
-  elif isinstance(replace, tf.Tensor):
-    fill = replace
-  else:
-    fill = tf.ones_like(image, dtype=image.dtype) * replace
-  image = tf.where(tf.equal(mask, 0), fill, image)
-
-  return image
-
-
-def cutout_video(image: tf.Tensor, replace: int = 0) -> tf.Tensor:
-  """Apply cutout (https://arxiv.org/abs/1708.04552) to a video.
-
-  This operation applies a random size 3D mask of zeros to a random location
-  within `image`. The mask is padded The pixel values filled in will be of the
-  value `replace`. The location where the mask will be applied is randomly
-  chosen uniformly over the whole image. The size of the mask is randomly
-  sampled uniformly from [0.25*height, 0.5*height], [0.25*width, 0.5*width],
-  and [1, 0.25*depth], which represent the height, width, and number of frames
-  of the input video tensor respectively.
-
-  Args:
-    image: A video Tensor of type uint8.
-    replace: What pixel value to fill in the image in the area that has the
-      cutout mask applied to it.
-
-  Returns:
-    An video Tensor that is of type uint8.
-  """
-  image_depth = tf.shape(image)[0]
-  image_height = tf.shape(image)[1]
-  image_width = tf.shape(image)[2]
-
-  # Sample the center location in the image where the zero mask will be applied.
-  cutout_center_height = tf.random.uniform(
-      shape=[], minval=0, maxval=image_height, dtype=tf.int32)
-
-  cutout_center_width = tf.random.uniform(
-      shape=[], minval=0, maxval=image_width, dtype=tf.int32)
-
-  cutout_center_depth = tf.random.uniform(
-      shape=[], minval=0, maxval=image_depth, dtype=tf.int32)
-
-  pad_size_height = tf.random.uniform(
-      shape=[],
-      minval=tf.maximum(1, tf.cast(image_height / 4, tf.int32)),
-      maxval=tf.maximum(2, tf.cast(image_height / 2, tf.int32)),
-      dtype=tf.int32)
-  pad_size_width = tf.random.uniform(
-      shape=[],
-      minval=tf.maximum(1, tf.cast(image_width / 4, tf.int32)),
-      maxval=tf.maximum(2, tf.cast(image_width / 2, tf.int32)),
-      dtype=tf.int32)
-  pad_size_depth = tf.random.uniform(
-      shape=[],
-      minval=1,
-      maxval=tf.maximum(2, tf.cast(image_depth / 4, tf.int32)),
-      dtype=tf.int32)
-
-  lower_pad = tf.maximum(0, cutout_center_height - pad_size_height)
-  upper_pad = tf.maximum(
-      0, image_height - cutout_center_height - pad_size_height)
-  left_pad = tf.maximum(0, cutout_center_width - pad_size_width)
-  right_pad = tf.maximum(0, image_width - cutout_center_width - pad_size_width)
-  back_pad = tf.maximum(0, cutout_center_depth - pad_size_depth)
-  forward_pad = tf.maximum(
-      0, image_depth - cutout_center_depth - pad_size_depth)
-
-  cutout_shape = [
-      image_depth - (back_pad + forward_pad),
-      image_height - (lower_pad + upper_pad),
-      image_width - (left_pad + right_pad),
-  ]
-  padding_dims = [[back_pad, forward_pad],
-                  [lower_pad, upper_pad],
-                  [left_pad, right_pad]]
-  mask = tf.pad(
-      tf.zeros(cutout_shape, dtype=image.dtype),
-      padding_dims,
-      constant_values=1)
-  mask = tf.expand_dims(mask, -1)
-  mask = tf.tile(mask, [1, 1, 1, 3])
-  image = tf.where(
-      tf.equal(mask, 0),
-      tf.ones_like(image, dtype=image.dtype) * replace, image)
-  return image
-
-
-def solarize(image: tf.Tensor, threshold: int = 128) -> tf.Tensor:
-  """Solarize the input image(s)."""
-  # For each pixel in the image, select the pixel
-  # if the value is less than the threshold.
-  # Otherwise, subtract 255 from the pixel.
-  return tf.where(image < threshold, image, 255 - image)
-
-
-def solarize_add(image: tf.Tensor,
-                 addition: int = 0,
-                 threshold: int = 128) -> tf.Tensor:
-  """Additive solarize the input image(s)."""
-  # For each pixel in the image less than threshold
-  # we add 'addition' amount to it and then clip the
-  # pixel value to be between 0 and 255. The value
-  # of 'addition' is between -128 and 128.
-  added_image = tf.cast(image, tf.int64) + addition
-  added_image = tf.cast(tf.clip_by_value(added_image, 0, 255), tf.uint8)
-  return tf.where(image < threshold, added_image, image)
-
-
-def color(image: tf.Tensor, factor: float) -> tf.Tensor:
-  """Equivalent of PIL Color."""
-  degenerate = tf.image.grayscale_to_rgb(tf.image.rgb_to_grayscale(image))
-  return blend(degenerate, image, factor)
-
-
-def contrast(image: tf.Tensor, factor: float) -> tf.Tensor:
-  """Equivalent of PIL Contrast."""
-  degenerate = tf.image.rgb_to_grayscale(image)
-  # Cast before calling tf.histogram.
-  degenerate = tf.cast(degenerate, tf.int32)
-
-  # Compute the grayscale histogram, then compute the mean pixel value,
-  # and create a constant image size of that value. Use that as the
-  # blending degenerate target of the original image.
- hist = tf.histogram_fixed_width(degenerate, [0, 255], nbins=256) - mean = tf.reduce_sum(tf.cast(hist, tf.float32)) / 256.0 - degenerate = tf.ones_like(degenerate, dtype=tf.float32) * mean - degenerate = tf.clip_by_value(degenerate, 0.0, 255.0) - degenerate = tf.image.grayscale_to_rgb(tf.cast(degenerate, tf.uint8)) - return blend(degenerate, image, factor) - - -def brightness(image: tf.Tensor, factor: float) -> tf.Tensor: - """Equivalent of PIL Brightness.""" - degenerate = tf.zeros_like(image) - return blend(degenerate, image, factor) - - -def posterize(image: tf.Tensor, bits: int) -> tf.Tensor: - """Equivalent of PIL Posterize.""" - shift = 8 - bits - return tf.bitwise.left_shift(tf.bitwise.right_shift(image, shift), shift) - - -def wrapped_rotate(image: tf.Tensor, degrees: float, replace: int) -> tf.Tensor: - """Applies rotation with wrap/unwrap.""" - image = rotate(wrap(image), degrees=degrees) - return unwrap(image, replace) - - -def translate_x(image: tf.Tensor, pixels: int, replace: int) -> tf.Tensor: - """Equivalent of PIL Translate in X dimension.""" - image = translate(wrap(image), [-pixels, 0]) - return unwrap(image, replace) - - -def translate_y(image: tf.Tensor, pixels: int, replace: int) -> tf.Tensor: - """Equivalent of PIL Translate in Y dimension.""" - image = translate(wrap(image), [0, -pixels]) - return unwrap(image, replace) - - -def shear_x(image: tf.Tensor, level: float, replace: int) -> tf.Tensor: - """Equivalent of PIL Shearing in X dimension.""" - # Shear parallel to x axis is a projective transform - # with a matrix form of: - # [1 level - # 0 1]. - image = transform( - image=wrap(image), transforms=[1., level, 0., 0., 1., 0., 0., 0.]) - return unwrap(image, replace) - - -def shear_y(image: tf.Tensor, level: float, replace: int) -> tf.Tensor: - """Equivalent of PIL Shearing in Y dimension.""" - # Shear parallel to y axis is a projective transform - # with a matrix form of: - # [1 0 - # level 1]. 
- image = transform( - image=wrap(image), transforms=[1., 0., 0., level, 1., 0., 0., 0.]) - return unwrap(image, replace) - - -def autocontrast(image: tf.Tensor) -> tf.Tensor: - """Implements Autocontrast function from PIL using TF ops. - - Args: - image: A 3D uint8 tensor. - - Returns: - The image after it has had autocontrast applied to it and will be of type - uint8. - """ - - def scale_channel(image: tf.Tensor) -> tf.Tensor: - """Scale the 2D image using the autocontrast rule.""" - # A possibly cheaper version can be done using cumsum/unique_with_counts - # over the histogram values, rather than iterating over the entire image - # to compute mins and maxes. - lo = tf.cast(tf.reduce_min(image), tf.float32) - hi = tf.cast(tf.reduce_max(image), tf.float32) - - # Scale the image, making the lowest value 0 and the highest value 255. - def scale_values(im): - scale = 255.0 / (hi - lo) - offset = -lo * scale - im = tf.cast(im, tf.float32) * scale + offset - im = tf.clip_by_value(im, 0.0, 255.0) - return tf.cast(im, tf.uint8) - - result = tf.cond(hi > lo, lambda: scale_values(image), lambda: image) - return result - - # Assumes RGB for now. Scales each channel independently - # and then stacks the result. - s1 = scale_channel(image[..., 0]) - s2 = scale_channel(image[..., 1]) - s3 = scale_channel(image[..., 2]) - image = tf.stack([s1, s2, s3], -1) - - return image - - -def sharpness(image: tf.Tensor, factor: float) -> tf.Tensor: - """Implements Sharpness function from PIL using TF ops.""" - orig_image = image - image = tf.cast(image, tf.float32) - # Add a batch dimension for the conv operation. - image = tf.expand_dims(image, 0) - # SMOOTH PIL Kernel. - if orig_image.shape.rank == 3: - kernel = tf.constant([[1, 1, 1], [1, 5, 1], [1, 1, 1]], - dtype=tf.float32, - shape=[3, 3, 1, 1]) / 13. - # Tile across channel dimension.
- kernel = tf.tile(kernel, [1, 1, 3, 1]) - strides = [1, 1, 1, 1] - degenerate = tf.nn.depthwise_conv2d( - image, kernel, strides, padding='VALID', dilations=[1, 1]) - elif orig_image.shape.rank == 4: - kernel = tf.constant([[1, 1, 1], [1, 5, 1], [1, 1, 1]], - dtype=tf.float32, - shape=[1, 3, 3, 1, 1]) / 13. - strides = [1, 1, 1, 1, 1] - # Run the kernel across each channel. - channels = tf.split(image, 3, axis=-1) - degenerates = [ - tf.nn.conv3d(channel, kernel, strides, padding='VALID', - dilations=[1, 1, 1, 1, 1]) - for channel in channels - ] - degenerate = tf.concat(degenerates, -1) - else: - raise ValueError('Bad image rank: {}'.format(orig_image.shape.rank)) - degenerate = tf.clip_by_value(degenerate, 0.0, 255.0) - degenerate = tf.squeeze(tf.cast(degenerate, tf.uint8), [0]) - - # For the borders of the resulting image, fill in the values of the - # original image. - mask = tf.ones_like(degenerate) - paddings = [[0, 0]] * (orig_image.shape.rank - 3) - padded_mask = tf.pad(mask, paddings + [[1, 1], [1, 1], [0, 0]]) - padded_degenerate = tf.pad(degenerate, paddings + [[1, 1], [1, 1], [0, 0]]) - result = tf.where(tf.equal(padded_mask, 1), padded_degenerate, orig_image) - - # Blend the final result. - return blend(result, orig_image, factor) - - -def equalize(image: tf.Tensor) -> tf.Tensor: - """Implements Equalize function from PIL using TF ops.""" - - def scale_channel(im, c): - """Scale the data in the channel to implement equalize.""" - im = tf.cast(im[..., c], tf.int32) - # Compute the histogram of the image channel. - histo = tf.histogram_fixed_width(im, [0, 255], nbins=256) - - # For the purposes of computing the step, keep only the nonzero bins. - nonzero = tf.where(tf.not_equal(histo, 0)) - nonzero_histo = tf.reshape(tf.gather(histo, nonzero), [-1]) - step = (tf.reduce_sum(nonzero_histo) - nonzero_histo[-1]) // 255 - - def build_lut(histo, step): - # Compute the cumulative sum, shifting by step // 2 - # and then normalization by step.
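The step and lookup-table (LUT) computation in `scale_channel` can be mirrored in plain Python to show what it does to a single channel. This is a minimal sketch, assuming the channel is a non-empty list of ints in [0, 255]; `equalize_channel` is a hypothetical helper name, not part of this module.

```python
from itertools import accumulate
from typing import List


def equalize_channel(pixels: List[int]) -> List[int]:
    """Histogram-equalize one uint8 channel (plain-Python sketch).

    Mirrors the step/LUT logic of `scale_channel` inside `equalize`.
    Assumes `pixels` is non-empty and every value is in [0, 255].
    """
    # 256-bin histogram of the channel.
    histo = [0] * 256
    for p in pixels:
        histo[p] += 1

    # Step: total count minus the count of the last non-empty bin, over 255.
    nonzero = [h for h in histo if h != 0]
    step = (sum(nonzero) - nonzero[-1]) // 255
    if step == 0:
        # Too few pixels outside the top bin to equalize; leave unchanged.
        return list(pixels)

    # Cumulative sum, offset by step // 2 for rounding, normalized by step.
    lut = [(c + step // 2) // step for c in accumulate(histo)]
    # Shift right by one bin (prepend 0) and clip to the uint8 range.
    lut = [0] + lut[:-1]
    lut = [min(max(v, 0), 255) for v in lut]
    return [lut[p] for p in pixels]
```

For a channel with only two equally common values, the lower value maps to 0 and the upper one to 255, which is the stretching effect equalization is meant to produce.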
- lut = (tf.cumsum(histo) + (step // 2)) // step - # Shift lut, prepending with 0. - lut = tf.concat([[0], lut[:-1]], 0) - # Clip the counts to be in range. This is done - # in the C code for image.point. - return tf.clip_by_value(lut, 0, 255) - - # If step is zero, return the original image. Otherwise, build - # lut from the full histogram and step and then index from it. - result = tf.cond( - tf.equal(step, 0), lambda: im, - lambda: tf.gather(build_lut(histo, step), im)) - - return tf.cast(result, tf.uint8) - - # Assumes RGB for now. Scales each channel independently - # and then stacks the result. - s1 = scale_channel(image, 0) - s2 = scale_channel(image, 1) - s3 = scale_channel(image, 2) - image = tf.stack([s1, s2, s3], -1) - return image - - -def invert(image: tf.Tensor) -> tf.Tensor: - """Inverts the image pixels.""" - image = tf.convert_to_tensor(image) - return 255 - image - - -def wrap(image: tf.Tensor) -> tf.Tensor: - """Returns 'image' with an extra channel set to all 1s.""" - shape = tf.shape(image) - extended_channel = tf.expand_dims(tf.ones(shape[:-1], image.dtype), -1) - extended = tf.concat([image, extended_channel], axis=-1) - return extended - - -def unwrap(image: tf.Tensor, replace: int) -> tf.Tensor: - """Unwraps an image produced by wrap. - - At every spatial position where the last channel is 0, the other three - channels are filled with `replace` (for example, 128 for gray). Operations - like translate and shear on a wrapped Tensor will leave 0s in empty - locations. Some transformations look at the intensity of values to do - preprocessing, and we want these empty pixels to assume the 'average' - value, rather than pure black. - - - Args: - image: A 3D Image Tensor with 4 channels. - replace: A one or three value 1D tensor to fill empty pixels. - - Returns: - image: A 3D image Tensor with 3 channels. - """ - image_shape = tf.shape(image) - # Flatten the spatial dimensions.
- flattened_image = tf.reshape(image, [-1, image_shape[-1]]) - - # Find all pixels where the last channel is zero. - alpha_channel = tf.expand_dims(flattened_image[..., 3], axis=-1) - - replace = tf.concat([replace, tf.ones([1], image.dtype)], 0) - - # Where they are zero, fill them in with 'replace'. - flattened_image = tf.where( - tf.equal(alpha_channel, 0), - tf.ones_like(flattened_image, dtype=image.dtype) * replace, - flattened_image) - - image = tf.reshape(flattened_image, image_shape) - image = tf.slice( - image, - [0] * image.shape.rank, - tf.concat([image_shape[:-1], [3]], -1)) - return image - - -def _scale_bbox_only_op_probability(prob): - """Reduce the probability of the bbox-only operation. - - Probability is reduced so that we do not distort the content of too many - bounding boxes that are close to each other. The value of 3.0 is a - hyperparameter chosen when designing the AutoAugment algorithm and was - found empirically to work well. - - Args: - prob: Float that is the probability of applying the bbox-only operation. - - Returns: - Reduced probability. - """ - return prob / 3.0 - - -def _apply_bbox_augmentation(image, bbox, augmentation_func, *args): - """Applies augmentation_func to the subsection of image indicated by bbox. - - Args: - image: 3D uint8 Tensor. - bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x) - of type float that represents the normalized coordinates between 0 and 1. - augmentation_func: Augmentation function that will be applied to the - subsection of image. - *args: Additional parameters that will be passed into augmentation_func - when it is called. - - Returns: - A modified version of image, where the bbox location in the image will - have augmentation_func applied to it.
- """ - image_height = tf.cast(tf.shape(image)[0], tf.float32) - image_width = tf.cast(tf.shape(image)[1], tf.float32) - min_y = tf.cast(image_height * bbox[0], tf.int32) - min_x = tf.cast(image_width * bbox[1], tf.int32) - max_y = tf.cast(image_height * bbox[2], tf.int32) - max_x = tf.cast(image_width * bbox[3], tf.int32) - image_height = tf.cast(image_height, tf.int32) - image_width = tf.cast(image_width, tf.int32) - - # Clip to be sure the max values do not fall out of range. - max_y = tf.minimum(max_y, image_height - 1) - max_x = tf.minimum(max_x, image_width - 1) - - # Get the sub-tensor that is the image within the bounding box region. - bbox_content = image[min_y:max_y + 1, min_x:max_x + 1, :] - - # Apply the augmentation function to the bbox portion of the image. - augmented_bbox_content = augmentation_func(bbox_content, *args) - - # Pad the augmented_bbox_content and the mask to match the shape of the - # original image. - augmented_bbox_content = tf.pad(augmented_bbox_content, - [[min_y, (image_height - 1) - max_y], - [min_x, (image_width - 1) - max_x], - [0, 0]]) - - # Create a mask that will be used to zero out a part of the original image. - mask_tensor = tf.zeros_like(bbox_content) - - mask_tensor = tf.pad(mask_tensor, - [[min_y, (image_height - 1) - max_y], - [min_x, (image_width - 1) - max_x], - [0, 0]], - constant_values=1) - # Replace the old bbox content with the new augmented content. - image = image * mask_tensor + augmented_bbox_content - return image - - -def _concat_bbox(bbox, bboxes): - """Helper function that concatenates bbox to bboxes along the first dimension.""" - - # Note if all elements in bboxes are -1 (_INVALID_BOX), then this means - # we discard bboxes and start the bboxes Tensor with the current bbox.
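The placeholder-row trick used by `_concat_bbox` can be sketched with plain lists. This is an illustration under stated assumptions: `INVALID_BOX` and `concat_bbox` below are hypothetical stand-ins for `_INVALID_BOX` and `_concat_bbox`, and valid boxes are assumed to have non-negative normalized coordinates, so only the placeholder row can sum to -4.0.

```python
from typing import List

# Hypothetical plain-Python stand-in for the `_INVALID_BOX` placeholder.
INVALID_BOX = [[-1.0, -1.0, -1.0, -1.0]]


def concat_bbox(bbox: List[float],
                bboxes: List[List[float]]) -> List[List[float]]:
    """Appends `bbox` to `bboxes`, dropping the placeholder row on first use."""
    # Valid normalized boxes have non-negative coordinates, so a total of
    # exactly -4.0 can only come from the single placeholder row.
    total = sum(sum(row) for row in bboxes)
    if total == -4.0:
        return [list(bbox)]
    return bboxes + [list(bbox)]
```

Starting the accumulator with a dummy row keeps it a fixed-width 2D value from the first iteration, which is what lets the TF version use it as a `tf.while_loop` variable with shape invariant `[None, 4]`.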
- bboxes_sum_check = tf.reduce_sum(bboxes) - bbox = tf.expand_dims(bbox, 0) - # This check will be true when it is an _INVALID_BOX. - bboxes = tf.cond(tf.equal(bboxes_sum_check, -4.0), - lambda: bbox, - lambda: tf.concat([bboxes, bbox], 0)) - return bboxes - - -def _apply_bbox_augmentation_wrapper(image, bbox, new_bboxes, prob, - augmentation_func, func_changes_bbox, - *args): - """Applies _apply_bbox_augmentation with probability prob. - - Args: - image: 3D uint8 Tensor. - bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x) - of type float that represents the normalized coordinates between 0 and 1. - new_bboxes: 2D Tensor that is a list of the bboxes in the image after they - have been altered by augmentation_func. These will only be changed when - func_changes_bbox is set to true. Each bbox has 4 elements - (min_y, min_x, max_y, max_x) of type float that are the normalized - bbox coordinates between 0 and 1. - prob: Float that is the probability of applying _apply_bbox_augmentation. - augmentation_func: Augmentation function that will be applied to the - subsection of image. - func_changes_bbox: Boolean. Whether augmentation_func returns bbox in - addition to image. - *args: Additional parameters that will be passed into augmentation_func - when it is called. - - Returns: - A tuple. The first element is a modified version of image, where the bbox - location in the image will have augmentation_func applied to it if it is - chosen to be called with probability `prob`. The second element is - new_bboxes with the (possibly altered) bbox appended as a new row of 4 - coordinates.
- """ - should_apply_op = tf.cast( - tf.floor(tf.random.uniform([], dtype=tf.float32) + prob), tf.bool) - if func_changes_bbox: - augmented_image, bbox = tf.cond( - should_apply_op, - lambda: augmentation_func(image, bbox, *args), - lambda: (image, bbox)) - else: - augmented_image = tf.cond( - should_apply_op, - lambda: _apply_bbox_augmentation(image, bbox, augmentation_func, *args), - lambda: image) - new_bboxes = _concat_bbox(bbox, new_bboxes) - return augmented_image, new_bboxes - - -def _apply_multi_bbox_augmentation_wrapper(image, bboxes, prob, aug_func, - func_changes_bbox, *args): - """Checks to be sure num bboxes > 0 before calling inner function.""" - num_bboxes = tf.shape(bboxes)[0] - image, bboxes = tf.cond( - tf.equal(num_bboxes, 0), - lambda: (image, bboxes), - # pylint:disable=g-long-lambda - lambda: _apply_multi_bbox_augmentation( - image, bboxes, prob, aug_func, func_changes_bbox, *args)) - # pylint:enable=g-long-lambda - return image, bboxes - - -# Represents an invalid bounding box that is used for checking for padding -# lists of bounding box coordinates for a few augmentation operations -_INVALID_BOX = [[-1.0, -1.0, -1.0, -1.0]] - - -def _apply_multi_bbox_augmentation(image, bboxes, prob, aug_func, - func_changes_bbox, *args): - """Applies aug_func to the image for each bbox in bboxes. - - Args: - image: 3D uint8 Tensor. - bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox - has 4 elements (min_y, min_x, max_y, max_x) of type float. - prob: Float that is the probability of applying aug_func to a specific - bounding box within the image. - aug_func: Augmentation function that will be applied to the - subsections of image indicated by the bbox values in bboxes. - func_changes_bbox: Boolean. Does augmentation_func return bbox in addition - to image. - *args: Additional parameters that will be passed into augmentation_func - when it is called. 
- - Returns: - A modified version of image, where each bbox location in the image will - have augmentation_func applied to it if it is chosen to be called with - probability prob independently across all bboxes. The final bboxes are - also returned: the original ones if func_changes_bbox is False, otherwise - the newly altered ones. - - Raises: - ValueError: If applied to video. - """ - if image.shape.rank == 4: - raise ValueError('Image rank 4 is not supported') - - # Will keep track of the new altered bboxes after aug_func is repeatedly - # applied. The -1 values are a dummy value and this first Tensor will be - # removed upon appending the first real bbox. - new_bboxes = tf.constant(_INVALID_BOX) - - # If the bboxes are empty, then just give it _INVALID_BOX. The result - # will be thrown away. - bboxes = tf.cond(tf.equal(tf.size(bboxes), 0), - lambda: tf.constant(_INVALID_BOX), - lambda: bboxes) - - bboxes = tf.ensure_shape(bboxes, (None, 4)) - - # pylint:disable=g-long-lambda - wrapped_aug_func = ( - lambda _image, bbox, _new_bboxes: _apply_bbox_augmentation_wrapper( - _image, bbox, _new_bboxes, prob, aug_func, func_changes_bbox, *args)) - # pylint:enable=g-long-lambda - - # Set up the while_loop. - num_bboxes = tf.shape(bboxes)[0] # We loop until we go over all bboxes. - idx = tf.constant(0) # Counter for the while loop. - - # Loop condition: stop once all bboxes have been visited. - # _images_and_bboxes contains (_image, _new_bboxes). - cond = lambda _idx, _images_and_bboxes: tf.less(_idx, num_bboxes) - - # Shuffle the bboxes so that the augmentation order is not deterministic if - # we are not changing the bboxes with aug_func. - if not func_changes_bbox: - loop_bboxes = tf.random.shuffle(bboxes) - else: - loop_bboxes = bboxes - - # Main function of while_loop where we repeatedly apply augmentation on the - # bboxes in the image.
- # pylint:disable=g-long-lambda - body = lambda _idx, _images_and_bboxes: [ - _idx + 1, wrapped_aug_func(_images_and_bboxes[0], - loop_bboxes[_idx], - _images_and_bboxes[1])] - # pylint:enable=g-long-lambda - - _, (image, new_bboxes) = tf.while_loop( - cond, body, [idx, (image, new_bboxes)], - shape_invariants=[idx.get_shape(), - (image.get_shape(), tf.TensorShape([None, 4]))]) - - # Either return the altered bboxes or the original ones depending on - # whether aug_func altered them. - if func_changes_bbox: - final_bboxes = new_bboxes - else: - final_bboxes = bboxes - return image, final_bboxes - - -def _clip_bbox(min_y, min_x, max_y, max_x): - """Clip bounding box coordinates to the range [0, 1]. - - Args: - min_y: Normalized bbox coordinate of type float between 0 and 1. - min_x: Normalized bbox coordinate of type float between 0 and 1. - max_y: Normalized bbox coordinate of type float between 0 and 1. - max_x: Normalized bbox coordinate of type float between 0 and 1. - - Returns: - Clipped coordinate values between 0 and 1. - """ - min_y = tf.clip_by_value(min_y, 0.0, 1.0) - min_x = tf.clip_by_value(min_x, 0.0, 1.0) - max_y = tf.clip_by_value(max_y, 0.0, 1.0) - max_x = tf.clip_by_value(max_x, 0.0, 1.0) - return min_y, min_x, max_y, max_x - - -def _check_bbox_area(min_y, min_x, max_y, max_x, delta=0.05): - """Adjusts bbox coordinates to make sure the area is > 0. - - Args: - min_y: Normalized bbox coordinate of type float between 0 and 1. - min_x: Normalized bbox coordinate of type float between 0 and 1. - max_y: Normalized bbox coordinate of type float between 0 and 1. - max_x: Normalized bbox coordinate of type float between 0 and 1. - delta: Float, this is used to create a gap of size 2 * delta between - bbox min/max coordinates that are the same on the boundary. - This prevents the bbox from having an area of zero. - - Returns: - Tuple of new bbox coordinates between 0 and 1 that will now have a - guaranteed area > 0.
- """ - height = max_y - min_y - width = max_x - min_x - def _adjust_bbox_boundaries(min_coord, max_coord): - # Make sure max is at least delta and min is at most 1 - delta. - max_coord = tf.maximum(max_coord, 0.0 + delta) - min_coord = tf.minimum(min_coord, 1.0 - delta) - return min_coord, max_coord - min_y, max_y = tf.cond(tf.equal(height, 0.0), - lambda: _adjust_bbox_boundaries(min_y, max_y), - lambda: (min_y, max_y)) - min_x, max_x = tf.cond(tf.equal(width, 0.0), - lambda: _adjust_bbox_boundaries(min_x, max_x), - lambda: (min_x, max_x)) - return min_y, min_x, max_y, max_x - - -def _rotate_bbox(bbox, image_height, image_width, degrees): - """Rotates the bbox coordinates by degrees. - - Args: - bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x) - of type float that represents the normalized coordinates between 0 and 1. - image_height: Int, height of the image. - image_width: Int, width of the image. - degrees: Float, a scalar angle in degrees to rotate all images by. If - degrees is positive the image will be rotated clockwise otherwise it will - be rotated counterclockwise. - - Returns: - A tensor of the same shape as bbox, but now with the rotated coordinates. - """ - image_height, image_width = ( - tf.cast(image_height, tf.float32), tf.cast(image_width, tf.float32)) - - # Convert from degrees to radians. - degrees_to_radians = math.pi / 180.0 - radians = degrees * degrees_to_radians - - # Translate the bbox to the center of the image and turn the normalized 0-1 - # coordinates to absolute pixel locations. - # Y coordinates are made negative as the y axis of images goes down with - # increasing pixel values, so we negate to make sure x axis and y axis points - # are in the traditionally positive direction.
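To see the corner-rotation math of `_rotate_bbox` in isolation, it can be reimplemented in plain Python. This is a hedged approximation, not the module's code: `rotate_bbox` is a hypothetical name, Python floats stand in for float32 tensors, and the `_check_bbox_area` fix-up is omitted.

```python
import math
from typing import List


def rotate_bbox(bbox: List[float], image_height: int, image_width: int,
                degrees: float) -> List[float]:
    """Plain-Python sketch of the corner rotation in `_rotate_bbox`.

    `bbox` is (min_y, min_x, max_y, max_x) in normalized [0, 1] coordinates.
    The zero-area adjustment done by `_check_bbox_area` is omitted here.
    """
    radians = math.radians(degrees)
    h, w = float(image_height), float(image_width)
    # Center the box on the image and convert to integer pixels; negate y so
    # that "up" is positive, as in a conventional Cartesian frame.
    min_y = -int(h * (bbox[0] - 0.5))
    min_x = int(w * (bbox[1] - 0.5))
    max_y = -int(h * (bbox[2] - 0.5))
    max_x = int(w * (bbox[3] - 0.5))
    corners = [(min_y, min_x), (min_y, max_x), (max_y, min_x), (max_y, max_x)]
    # Apply [[cos, sin], [-sin, cos]] to each (y, x) corner; int() truncates
    # toward zero like tf.cast(..., tf.int32).
    cos, sin = math.cos(radians), math.sin(radians)
    ys = [int(cos * y + sin * x) for y, x in corners]
    xs = [int(-sin * y + cos * x) for y, x in corners]
    # Axis-aligned envelope of the rotated corners, y negation undone,
    # converted back to normalized coordinates.
    new = [-(max(ys) / h - 0.5), min(xs) / w + 0.5,
           -(min(ys) / h - 0.5), max(xs) / w + 0.5]
    # Clip to [0, 1] as `_clip_bbox` does.
    return [min(max(v, 0.0), 1.0) for v in new]
```

Because only the four corners are rotated and then re-boxed, the result is the axis-aligned envelope of the rotated box; for non-right angles it is larger than the original object extent.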
- min_y = -tf.cast(image_height * (bbox[0] - 0.5), tf.int32) - min_x = tf.cast(image_width * (bbox[1] - 0.5), tf.int32) - max_y = -tf.cast(image_height * (bbox[2] - 0.5), tf.int32) - max_x = tf.cast(image_width * (bbox[3] - 0.5), tf.int32) - coordinates = tf.stack( - [[min_y, min_x], [min_y, max_x], [max_y, min_x], [max_y, max_x]]) - coordinates = tf.cast(coordinates, tf.float32) - # Rotate the coordinates according to the rotation matrix clockwise if - # radians is positive, else counterclockwise. - rotation_matrix = tf.stack( - [[tf.cos(radians), tf.sin(radians)], - [-tf.sin(radians), tf.cos(radians)]]) - new_coords = tf.cast( - tf.matmul(rotation_matrix, tf.transpose(coordinates)), tf.int32) - # Find min/max values and convert them back to normalized 0-1 floats. - min_y = -( - tf.cast(tf.reduce_max(new_coords[0, :]), tf.float32) / image_height - 0.5) - min_x = tf.cast(tf.reduce_min(new_coords[1, :]), - tf.float32) / image_width + 0.5 - max_y = -( - tf.cast(tf.reduce_min(new_coords[0, :]), tf.float32) / image_height - 0.5) - max_x = tf.cast(tf.reduce_max(new_coords[1, :]), - tf.float32) / image_width + 0.5 - - # Clip the bboxes to be sure they fall between [0, 1]. - min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x) - min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x) - return tf.stack([min_y, min_x, max_y, max_x]) - - -def rotate_with_bboxes(image, bboxes, degrees, replace): - """Equivalent of PIL Rotate that rotates the image and bbox. - - Args: - image: 3D uint8 Tensor. - bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox - has 4 elements (min_y, min_x, max_y, max_x) of type float. - degrees: Float, a scalar angle in degrees to rotate all images by. If - degrees is positive the image will be rotated clockwise otherwise it will - be rotated counterclockwise. - replace: A one or three value 1D tensor to fill empty pixels.
- - Returns: - A tuple containing a 3D uint8 Tensor that will be the result of rotating - image by degrees. The second element of the tuple is bboxes, where now - the coordinates will be shifted to reflect the rotated image. - - Raises: - ValueError: If applied to video. - """ - if image.shape.rank == 4: - raise ValueError('Image rank 4 is not supported') - - # Rotate the image. - image = wrapped_rotate(image, degrees, replace) - - # Convert bbox coordinates to pixel values. - image_height = tf.shape(image)[0] - image_width = tf.shape(image)[1] - # pylint:disable=g-long-lambda - wrapped_rotate_bbox = lambda bbox: _rotate_bbox( - bbox, image_height, image_width, degrees) - # pylint:enable=g-long-lambda - bboxes = tf.map_fn(wrapped_rotate_bbox, bboxes) - return image, bboxes - - -def _shear_bbox(bbox, image_height, image_width, level, shear_horizontal): - """Shifts the bbox according to how the image was sheared. - - Args: - bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x) - of type float that represents the normalized coordinates between 0 and 1. - image_height: Int, height of the image. - image_width: Int, width of the image. - level: Float. How much to shear the image. - shear_horizontal: If true then shear in X dimension else shear in - the Y dimension. - - Returns: - A tensor of the same shape as bbox, but now with the shifted coordinates. - """ - image_height, image_width = ( - tf.cast(image_height, tf.float32), tf.cast(image_width, tf.float32)) - - # Change bbox coordinates to be pixels. - min_y = tf.cast(image_height * bbox[0], tf.int32) - min_x = tf.cast(image_width * bbox[1], tf.int32) - max_y = tf.cast(image_height * bbox[2], tf.int32) - max_x = tf.cast(image_width * bbox[3], tf.int32) - coordinates = tf.stack( - [[min_y, min_x], [min_y, max_x], [max_y, min_x], [max_y, max_x]]) - coordinates = tf.cast(coordinates, tf.float32) - - # Shear the coordinates according to the translation matrix.
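The shear applied to the bbox corners in `_shear_bbox` can be sketched in plain Python as well. This is a minimal sketch under assumptions: `shear_bbox` is a hypothetical name, Python floats replace float32 tensors, and only `_clip_bbox`-style clipping is applied (the `_check_bbox_area` step is left out).

```python
from typing import List


def shear_bbox(bbox: List[float], image_height: int, image_width: int,
               level: float, shear_horizontal: bool) -> List[float]:
    """Plain-Python sketch of `_shear_bbox` (zero-area fix-up omitted)."""
    h, w = float(image_height), float(image_width)
    # Normalized coordinates -> integer pixel coordinates (no centering here,
    # unlike the rotation case).
    min_y, min_x = int(h * bbox[0]), int(w * bbox[1])
    max_y, max_x = int(h * bbox[2]), int(w * bbox[3])
    corners = [(min_y, min_x), (min_y, max_x), (max_y, min_x), (max_y, max_x)]
    if shear_horizontal:
        # Matrix [[1, 0], [-level, 1]]: x' = x - level * y, y unchanged.
        new = [(y, int(x - level * y)) for y, x in corners]
    else:
        # Matrix [[1, -level], [0, 1]]: y' = y - level * x, x unchanged.
        new = [(int(y - level * x), x) for y, x in corners]
    ys = [y for y, _ in new]
    xs = [x for _, x in new]
    out = [min(ys) / h, min(xs) / w, max(ys) / h, max(xs) / w]
    # Clip to [0, 1] as `_clip_bbox` does.
    return [min(max(v, 0.0), 1.0) for v in out]
```

The sign convention (`-level`) matches the matrices in `_shear_bbox`, which moves the box opposite to the `level` entry used in the image-side `shear_x`/`shear_y` transforms.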
- if shear_horizontal: - translation_matrix = tf.stack( - [[1, 0], [-level, 1]]) - else: - translation_matrix = tf.stack( - [[1, -level], [0, 1]]) - translation_matrix = tf.cast(translation_matrix, tf.float32) - new_coords = tf.cast( - tf.matmul(translation_matrix, tf.transpose(coordinates)), tf.int32) - - # Find min/max values and convert them back to floats. - min_y = tf.cast(tf.reduce_min(new_coords[0, :]), tf.float32) / image_height - min_x = tf.cast(tf.reduce_min(new_coords[1, :]), tf.float32) / image_width - max_y = tf.cast(tf.reduce_max(new_coords[0, :]), tf.float32) / image_height - max_x = tf.cast(tf.reduce_max(new_coords[1, :]), tf.float32) / image_width - - # Clip the bboxes to be sure they fall between [0, 1]. - min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x) - min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x) - return tf.stack([min_y, min_x, max_y, max_x]) - - -def shear_with_bboxes(image, bboxes, level, replace, shear_horizontal): - """Applies Shear Transformation to the image and shifts the bboxes. - - Args: - image: 3D uint8 Tensor. - bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox - has 4 elements (min_y, min_x, max_y, max_x) of type float with values - between [0, 1]. - level: Float. How much to shear the image. This value will be between - -0.3 and 0.3. - replace: A one or three value 1D tensor to fill empty pixels. - shear_horizontal: Boolean. If true then shear in X dimension else shear in - the Y dimension. - - Returns: - A tuple containing a 3D uint8 Tensor that will be the result of shearing - image by level. The second element of the tuple is bboxes, where now - the coordinates will be shifted to reflect the sheared image. - - Raises: - ValueError: If applied to video.
- """ - if image.shape.rank == 4: - raise ValueError('Image rank 4 is not supported') - - if shear_horizontal: - image = shear_x(image, level, replace) - else: - image = shear_y(image, level, replace) - - # Convert bbox coordinates to pixel values. - image_height = tf.shape(image)[0] - image_width = tf.shape(image)[1] - # pylint:disable=g-long-lambda - wrapped_shear_bbox = lambda bbox: _shear_bbox( - bbox, image_height, image_width, level, shear_horizontal) - # pylint:enable=g-long-lambda - bboxes = tf.map_fn(wrapped_shear_bbox, bboxes) - return image, bboxes - - -def _shift_bbox(bbox, image_height, image_width, pixels, shift_horizontal): - """Shifts the bbox coordinates by pixels. - - Args: - bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x) - of type float that represents the normalized coordinates between 0 and 1. - image_height: Int, height of the image. - image_width: Int, width of the image. - pixels: An int. How many pixels to shift the bbox. - shift_horizontal: Boolean. If true then shift in X dimension else shift in - Y dimension. - - Returns: - A tensor of the same shape as bbox, but now with the shifted coordinates. - """ - pixels = tf.cast(pixels, tf.int32) - # Convert bbox to integer pixel locations. - min_y = tf.cast(tf.cast(image_height, tf.float32) * bbox[0], tf.int32) - min_x = tf.cast(tf.cast(image_width, tf.float32) * bbox[1], tf.int32) - max_y = tf.cast(tf.cast(image_height, tf.float32) * bbox[2], tf.int32) - max_x = tf.cast(tf.cast(image_width, tf.float32) * bbox[3], tf.int32) - - if shift_horizontal: - min_x = tf.maximum(0, min_x - pixels) - max_x = tf.minimum(image_width, max_x - pixels) - else: - min_y = tf.maximum(0, min_y - pixels) - max_y = tf.minimum(image_height, max_y - pixels) - - # Convert bbox back to floats. 
- min_y = tf.cast(min_y, tf.float32) / tf.cast(image_height, tf.float32) - min_x = tf.cast(min_x, tf.float32) / tf.cast(image_width, tf.float32) - max_y = tf.cast(max_y, tf.float32) / tf.cast(image_height, tf.float32) - max_x = tf.cast(max_x, tf.float32) / tf.cast(image_width, tf.float32) - - # Clip the bboxes to be sure they fall between [0, 1]. - min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x) - min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x) - return tf.stack([min_y, min_x, max_y, max_x]) - - -def translate_bbox(image, bboxes, pixels, replace, shift_horizontal): - """Equivalent of PIL Translate in X/Y dimension that shifts image and bbox. - - Args: - image: 3D uint8 Tensor. - bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox - has 4 elements (min_y, min_x, max_y, max_x) of type float with values - between [0, 1]. - pixels: An int. How many pixels to shift the image and bboxes. - replace: A one or three value 1D tensor to fill empty pixels. - shift_horizontal: Boolean. If true then shift in X dimension else shift in - Y dimension. - - Returns: - A tuple containing a 3D uint8 Tensor that will be the result of translating - image by pixels. The second element of the tuple is bboxes, where now - the coordinates will be shifted to reflect the shifted image. - - Raises: - ValueError: If applied to video. - """ - if image.shape.rank == 4: - raise ValueError('Image rank 4 is not supported') - - if shift_horizontal: - image = translate_x(image, pixels, replace) - else: - image = translate_y(image, pixels, replace) - - # Convert bbox coordinates to pixel values.
- image_height = tf.shape(image)[0] - image_width = tf.shape(image)[1] - # pylint:disable=g-long-lambda - wrapped_shift_bbox = lambda bbox: _shift_bbox( - bbox, image_height, image_width, pixels, shift_horizontal) - # pylint:enable=g-long-lambda - bboxes = tf.map_fn(wrapped_shift_bbox, bboxes) - return image, bboxes - - -def translate_y_only_bboxes( - image: tf.Tensor, bboxes: tf.Tensor, prob: float, pixels: int, replace): - """Apply translate_y to each bbox in the image with probability prob.""" - if bboxes.shape.rank == 4: - raise ValueError('translate_y_only_bboxes does not support rank 4 boxes') - - func_changes_bbox = False - prob = _scale_bbox_only_op_probability(prob) - return _apply_multi_bbox_augmentation_wrapper( - image, bboxes, prob, translate_y, func_changes_bbox, pixels, replace) - - -def _randomly_negate_tensor(tensor): - """With 50% prob turn the tensor negative.""" - should_flip = tf.cast(tf.floor(tf.random.uniform([]) + 0.5), tf.bool) - final_tensor = tf.cond(should_flip, lambda: tensor, lambda: -tensor) - return final_tensor - - -def _rotate_level_to_arg(level: float): - level = (level / _MAX_LEVEL) * 30. - level = _randomly_negate_tensor(level) - return (level,) - - -def _shrink_level_to_arg(level: float): - """Converts level to ratio by which we shrink the image content.""" - if level == 0: - return (1.0,) # if level is zero, do not shrink the image - # Maximum shrinking ratio is 2.9. - level = 2. / (_MAX_LEVEL / level) + 0.9 - return (level,) - - -def _enhance_level_to_arg(level: float): - return ((level / _MAX_LEVEL) * 1.8 + 0.1,) - - -def _shear_level_to_arg(level: float): - level = (level / _MAX_LEVEL) * 0.3 - # Flip level to negative with 50% chance. - level = _randomly_negate_tensor(level) - return (level,) - - -def _translate_level_to_arg(level: float, translate_const: float): - level = (level / _MAX_LEVEL) * float(translate_const) - # Flip level to negative with 50% chance. 
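The probability gate used throughout this module (`tf.floor(tf.random.uniform([]) + prob)` cast to bool) and the level-to-argument scaling can be sketched in plain Python. This assumes `_MAX_LEVEL` is 10.0, because the constant is defined outside this excerpt; `bernoulli` and `translate_level_to_arg` are hypothetical helpers, not the module's functions.

```python
import math
import random
from typing import Tuple

# Assumption: stands in for `_MAX_LEVEL`, which this excerpt does not show.
MAX_LEVEL = 10.0


def bernoulli(prob: float, rng: random.Random) -> bool:
    """Returns True with probability `prob` via the floor(U + prob) trick."""
    # U ~ Uniform[0, 1), so floor(U + prob) is nonzero exactly when
    # U >= 1 - prob, an event of probability `prob` for prob in [0, 1].
    return bool(math.floor(rng.random() + prob))


def translate_level_to_arg(level: float, translate_const: float,
                           rng: random.Random) -> Tuple[float]:
    """Scales `level` into a pixel shift, negated with 50% probability."""
    magnitude = (level / MAX_LEVEL) * float(translate_const)
    # Mirrors `_randomly_negate_tensor`: keep the sign when the coin is True.
    return (magnitude if bernoulli(0.5, rng) else -magnitude,)
```

The floor trick needs only one uniform sample and a cast, which is why it is convenient inside traced TF code where a Python `if` on a random value is not available.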
- level = _randomly_negate_tensor(level) - return (level,) - - -def _mult_to_arg(level: float, multiplier: float = 1.): - return (int((level / _MAX_LEVEL) * multiplier),) - - -def _apply_func_with_prob(func: Any, image: tf.Tensor, - bboxes: Optional[tf.Tensor], args: Any, prob: float): - """Apply `func` to image w/ `args` as input with probability `prob`.""" - assert isinstance(args, tuple) - assert inspect.getfullargspec(func)[0][1] == 'bboxes' - - # Apply the function with probability `prob`. - should_apply_op = tf.cast( - tf.floor(tf.random.uniform([], dtype=tf.float32) + prob), tf.bool) - augmented_image, augmented_bboxes = tf.cond( - should_apply_op, - lambda: func(image, bboxes, *args), - lambda: (image, bboxes)) - return augmented_image, augmented_bboxes - - -def select_and_apply_random_policy(policies: Any, - image: tf.Tensor, - bboxes: Optional[tf.Tensor] = None): - """Select a random policy from `policies` and apply it to `image`.""" - policy_to_select = tf.random.uniform([], maxval=len(policies), dtype=tf.int32) - # Note that using tf.case instead of tf.conds would result in significantly - # larger graphs and would even break export for some larger policies. 
- for (i, policy) in enumerate(policies): - image, bboxes = tf.cond( - tf.equal(i, policy_to_select), - lambda selected_policy=policy: selected_policy(image, bboxes), - lambda: (image, bboxes)) - return image, bboxes - - -NAME_TO_FUNC = { - 'AutoContrast': autocontrast, - 'Equalize': equalize, - 'Invert': invert, - 'Rotate': wrapped_rotate, - 'Posterize': posterize, - 'Solarize': solarize, - 'SolarizeAdd': solarize_add, - 'Color': color, - 'Contrast': contrast, - 'Brightness': brightness, - 'Sharpness': sharpness, - 'ShearX': shear_x, - 'ShearY': shear_y, - 'TranslateX': translate_x, - 'TranslateY': translate_y, - 'Cutout': cutout, - 'Rotate_BBox': rotate_with_bboxes, - # pylint:disable=g-long-lambda - 'ShearX_BBox': lambda image, bboxes, level, replace: shear_with_bboxes( - image, bboxes, level, replace, shear_horizontal=True), - 'ShearY_BBox': lambda image, bboxes, level, replace: shear_with_bboxes( - image, bboxes, level, replace, shear_horizontal=False), - 'TranslateX_BBox': lambda image, bboxes, pixels, replace: translate_bbox( - image, bboxes, pixels, replace, shift_horizontal=True), - 'TranslateY_BBox': lambda image, bboxes, pixels, replace: translate_bbox( - image, bboxes, pixels, replace, shift_horizontal=False), - # pylint:enable=g-long-lambda - 'TranslateY_Only_BBoxes': translate_y_only_bboxes, -} - -# Functions that require a `bboxes` parameter. 
-REQUIRE_BOXES_FUNCS = frozenset({ - 'Rotate_BBox', - 'ShearX_BBox', - 'ShearY_BBox', - 'TranslateX_BBox', - 'TranslateY_BBox', - 'TranslateY_Only_BBoxes', -}) - -# Functions that have a 'prob' parameter -PROB_FUNCS = frozenset({ - 'TranslateY_Only_BBoxes', -}) - -# Functions that have a 'replace' parameter -REPLACE_FUNCS = frozenset({ - 'Rotate', - 'TranslateX', - 'ShearX', - 'ShearY', - 'TranslateY', - 'Cutout', - 'Rotate_BBox', - 'ShearX_BBox', - 'ShearY_BBox', - 'TranslateX_BBox', - 'TranslateY_BBox', - 'TranslateY_Only_BBoxes', -}) - - -def level_to_arg(cutout_const: float, translate_const: float): - """Creates a dict mapping image operation names to their arguments.""" - - no_arg = lambda level: () - posterize_arg = lambda level: _mult_to_arg(level, 4) - solarize_arg = lambda level: _mult_to_arg(level, 256) - solarize_add_arg = lambda level: _mult_to_arg(level, 110) - cutout_arg = lambda level: _mult_to_arg(level, cutout_const) - translate_arg = lambda level: _translate_level_to_arg(level, translate_const) - translate_bbox_arg = lambda level: _translate_level_to_arg(level, 120) - - args = { - 'AutoContrast': no_arg, - 'Equalize': no_arg, - 'Invert': no_arg, - 'Rotate': _rotate_level_to_arg, - 'Posterize': posterize_arg, - 'Solarize': solarize_arg, - 'SolarizeAdd': solarize_add_arg, - 'Color': _enhance_level_to_arg, - 'Contrast': _enhance_level_to_arg, - 'Brightness': _enhance_level_to_arg, - 'Sharpness': _enhance_level_to_arg, - 'ShearX': _shear_level_to_arg, - 'ShearY': _shear_level_to_arg, - 'Cutout': cutout_arg, - 'TranslateX': translate_arg, - 'TranslateY': translate_arg, - 'Rotate_BBox': _rotate_level_to_arg, - 'ShearX_BBox': _shear_level_to_arg, - 'ShearY_BBox': _shear_level_to_arg, - # pylint:disable=g-long-lambda - 'TranslateX_BBox': lambda level: _translate_level_to_arg( - level, translate_const), - 'TranslateY_BBox': lambda level: _translate_level_to_arg( - level, translate_const), - # pylint:enable=g-long-lambda - 'TranslateY_Only_BBoxes': 
translate_bbox_arg, - } - return args - - -def bbox_wrapper(func): - """Adds a bboxes function argument to func and returns unchanged bboxes.""" - def wrapper(images, bboxes, *args, **kwargs): - return (func(images, *args, **kwargs), bboxes) - return wrapper - - -def _parse_policy_info(name: Text, - prob: float, - level: float, - replace_value: List[int], - cutout_const: float, - translate_const: float, - level_std: float = 0.) -> Tuple[Any, float, Any]: - """Return the function that corresponds to `name` and update `level` param.""" - func = NAME_TO_FUNC[name] - - if level_std > 0: - level += tf.random.normal([], dtype=tf.float32) - level = tf.clip_by_value(level, 0., _MAX_LEVEL) - - args = level_to_arg(cutout_const, translate_const)[name](level) - - if name in PROB_FUNCS: - # Add in the prob arg if it is required for the function that is called. - args = tuple([prob] + list(args)) - - if name in REPLACE_FUNCS: - # Add in replace arg if it is required for the function that is called. - args = tuple(list(args) + [replace_value]) - - # Add bboxes as the second positional argument for the function if it does - # not already exist. - if 'bboxes' not in inspect.getfullargspec(func)[0]: - func = bbox_wrapper(func) - - return func, prob, args - - -class ImageAugment(object): - """Image augmentation class for applying image distortions.""" - - def distort( - self, - image: tf.Tensor - ) -> tf.Tensor: - """Given an image tensor, returns a distorted image with the same shape. - - Args: - image: `Tensor` of shape [height, width, 3] or - [num_frames, height, width, 3] representing an image or image sequence. - - Returns: - The augmented version of `image`. - """ - raise NotImplementedError() - - def distort_with_boxes( - self, - image: tf.Tensor, - bboxes: tf.Tensor - ) -> Tuple[tf.Tensor, tf.Tensor]: - """Distorts the image and bounding boxes. - - Args: - image: `Tensor` of shape [height, width, 3] or - [num_frames, height, width, 3] representing an image or image sequence. 
- bboxes: `Tensor` of shape [num_boxes, 4] or [num_frames, num_boxes, 4] - representing bounding boxes for an image or image sequence. - - Returns: - The augmented version of `image` and `bboxes`. - """ - raise NotImplementedError - - -class AutoAugment(ImageAugment): - """Applies the AutoAugment policy to images. - - AutoAugment is from the paper: https://arxiv.org/abs/1805.09501. - """ - - def __init__(self, - augmentation_name: Text = 'v0', - policies: Optional[Iterable[Iterable[Tuple[Text, float, - float]]]] = None, - cutout_const: float = 100, - translate_const: float = 250): - """Applies the AutoAugment policy to images. - - Args: - augmentation_name: The name of the AutoAugment policy to use. The - available options are `v0`, `test`, `reduced_cifar10`, `svhn` and - `reduced_imagenet`. `v0` is the policy used for all - of the results in the paper and was found to achieve the best results on - the COCO dataset. `v1`, `v2` and `v3` are additional good policies found - on the COCO dataset that have slight variation in what operations were - used during the search procedure along with how many operations are - applied in parallel to a single image (2 vs 3). Make sure to set - `policies` to `None` (the default) if you want to set options using - `augmentation_name`. - policies: list of lists of tuples in the form `(func, prob, level)`, - `func` is a string name of the augmentation function, `prob` is the - probability of applying the `func` operation, `level` (or magnitude) is - the input argument for `func`. For example: - ``` - [[('Equalize', 0.9, 3), ('Color', 0.7, 8)], - [('Invert', 0.6, 5), ('Rotate', 0.2, 9), ('ShearX', 0.1, 2)], ...] - ``` - The outer-most list must be 3-d. The number of operations in a - sub-policy can vary from one sub-policy to another. - If you provide `policies` as input, any option set with - `augmentation_name` will get overriden as they are mutually exclusive. - cutout_const: multiplier for applying cutout. 
- translate_const: multiplier for applying translation. - - Raises: - ValueError if `augmentation_name` is unsupported. - """ - super(AutoAugment, self).__init__() - - self.augmentation_name = augmentation_name - self.cutout_const = float(cutout_const) - self.translate_const = float(translate_const) - self.available_policies = { - 'detection_v0': self.detection_policy_v0(), - 'v0': self.policy_v0(), - 'test': self.policy_test(), - 'simple': self.policy_simple(), - 'reduced_cifar10': self.policy_reduced_cifar10(), - 'svhn': self.policy_svhn(), - 'reduced_imagenet': self.policy_reduced_imagenet(), - } - - if not policies: - if augmentation_name not in self.available_policies: - raise ValueError( - 'Invalid augmentation_name: {}'.format(augmentation_name)) - - self.policies = self.available_policies[augmentation_name] - - else: - self._check_policy_shape(policies) - self.policies = policies - - def _check_policy_shape(self, policies): - """Checks dimension and shape of the custom policy. - - Args: - policies: List of list of tuples in the form `(func, prob, level)`. Must - have shape of `(:, :, 3)`. - - Raises: - ValueError if the shape of `policies` is unexpected. - """ - in_shape = np.array(policies).shape - if len(in_shape) != 3 or in_shape[-1:] != (3,): - raise ValueError('Wrong shape detected for custom policy. Expected ' - '(:, :, 3) but got {}.'.format(in_shape)) - - def _make_tf_policies(self): - """Prepares the TF functions for augmentations based on the policies.""" - replace_value = [128] * 3 - - # func is the string name of the augmentation function, prob is the - # probability of applying the operation and level is the parameter - # associated with the tf op. - - # tf_policies are functions that take in an image and return an augmented - # image. - tf_policies = [] - for policy in self.policies: - tf_policy = [] - assert_ranges = [] - # Link string name to the correct python function and make sure the - # correct argument is passed into that function. 
- for policy_info in policy: - _, prob, level = policy_info - assert_ranges.append(tf.Assert(tf.less_equal(prob, 1.), [prob])) - assert_ranges.append( - tf.Assert(tf.less_equal(level, int(_MAX_LEVEL)), [level])) - - policy_info = list(policy_info) + [ - replace_value, self.cutout_const, self.translate_const - ] - tf_policy.append(_parse_policy_info(*policy_info)) - # Now build the tf policy that will apply the augmentation procedue - # on image. - def make_final_policy(tf_policy_): - - def final_policy(image_, bboxes_): - for func, prob, args in tf_policy_: - image_, bboxes_ = _apply_func_with_prob(func, image_, bboxes_, args, - prob) - return image_, bboxes_ - - return final_policy - - with tf.control_dependencies(assert_ranges): - tf_policies.append(make_final_policy(tf_policy)) - - return tf_policies - - def distort(self, image: tf.Tensor) -> tf.Tensor: - """See base class.""" - input_image_type = image.dtype - if input_image_type != tf.uint8: - image = tf.clip_by_value(image, 0.0, 255.0) - image = tf.cast(image, dtype=tf.uint8) - - tf_policies = self._make_tf_policies() - image, _ = select_and_apply_random_policy(tf_policies, image, bboxes=None) - return image - - def distort_with_boxes(self, image: tf.Tensor, - bboxes: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]: - """See base class.""" - input_image_type = image.dtype - if input_image_type != tf.uint8: - image = tf.clip_by_value(image, 0.0, 255.0) - image = tf.cast(image, dtype=tf.uint8) - - tf_policies = self._make_tf_policies() - image, bboxes = select_and_apply_random_policy(tf_policies, image, bboxes) - return image, bboxes - - @staticmethod - def detection_policy_v0(): - """Autoaugment policy that was used in AutoAugment Paper for Detection. - - https://arxiv.org/pdf/1906.11172 - - Each tuple is an augmentation operation of the form - (operation, probability, magnitude). Each element in policy is a - sub-policy that will be applied sequentially on the image. - - Returns: - the policy. 
- """ - policy = [ - [('TranslateX_BBox', 0.6, 4), ('Equalize', 0.8, 10)], - [('TranslateY_Only_BBoxes', 0.2, 2), ('Cutout', 0.8, 8)], - [('Sharpness', 0.0, 8), ('ShearX_BBox', 0.4, 0)], - [('ShearY_BBox', 1.0, 2), ('TranslateY_Only_BBoxes', 0.6, 6)], - [('Rotate_BBox', 0.6, 10), ('Color', 1.0, 6)], - ] - return policy - - @staticmethod - def policy_v0(): - """Autoaugment policy that was used in AutoAugment Paper. - - Each tuple is an augmentation operation of the form - (operation, probability, magnitude). Each element in policy is a - sub-policy that will be applied sequentially on the image. - - Returns: - the policy. - """ - - policy = [ - [('Equalize', 0.8, 1), ('ShearY', 0.8, 4)], - [('Color', 0.4, 9), ('Equalize', 0.6, 3)], - [('Color', 0.4, 1), ('Rotate', 0.6, 8)], - [('Solarize', 0.8, 3), ('Equalize', 0.4, 7)], - [('Solarize', 0.4, 2), ('Solarize', 0.6, 2)], - [('Color', 0.2, 0), ('Equalize', 0.8, 8)], - [('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)], - [('ShearX', 0.2, 9), ('Rotate', 0.6, 8)], - [('Color', 0.6, 1), ('Equalize', 1.0, 2)], - [('Invert', 0.4, 9), ('Rotate', 0.6, 0)], - [('Equalize', 1.0, 9), ('ShearY', 0.6, 3)], - [('Color', 0.4, 7), ('Equalize', 0.6, 0)], - [('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)], - [('Solarize', 0.6, 8), ('Color', 0.6, 9)], - [('Solarize', 0.2, 4), ('Rotate', 0.8, 9)], - [('Rotate', 1.0, 7), ('TranslateY', 0.8, 9)], - [('ShearX', 0.0, 0), ('Solarize', 0.8, 4)], - [('ShearY', 0.8, 0), ('Color', 0.6, 4)], - [('Color', 1.0, 0), ('Rotate', 0.6, 2)], - [('Equalize', 0.8, 4), ('Equalize', 0.0, 8)], - [('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)], - [('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)], - [('Posterize', 0.8, 2), ('Solarize', 0.6, 10)], - [('Solarize', 0.6, 8), ('Equalize', 0.6, 1)], - [('Color', 0.8, 6), ('Rotate', 0.4, 5)], - ] - return policy - - @staticmethod - def policy_reduced_cifar10(): - """Autoaugment policy for reduced CIFAR-10 dataset. 
- - Result is from the AutoAugment paper: https://arxiv.org/abs/1805.09501. - - Each tuple is an augmentation operation of the form - (operation, probability, magnitude). Each element in policy is a - sub-policy that will be applied sequentially on the image. - - Returns: - the policy. - """ - policy = [ - [('Invert', 0.1, 7), ('Contrast', 0.2, 6)], - [('Rotate', 0.7, 2), ('TranslateX', 0.3, 9)], - [('Sharpness', 0.8, 1), ('Sharpness', 0.9, 3)], - [('ShearY', 0.5, 8), ('TranslateY', 0.7, 9)], - [('AutoContrast', 0.5, 8), ('Equalize', 0.9, 2)], - [('ShearY', 0.2, 7), ('Posterize', 0.3, 7)], - [('Color', 0.4, 3), ('Brightness', 0.6, 7)], - [('Sharpness', 0.3, 9), ('Brightness', 0.7, 9)], - [('Equalize', 0.6, 5), ('Equalize', 0.5, 1)], - [('Contrast', 0.6, 7), ('Sharpness', 0.6, 5)], - [('Color', 0.7, 7), ('TranslateX', 0.5, 8)], - [('Equalize', 0.3, 7), ('AutoContrast', 0.4, 8)], - [('TranslateY', 0.4, 3), ('Sharpness', 0.2, 6)], - [('Brightness', 0.9, 6), ('Color', 0.2, 8)], - [('Solarize', 0.5, 2), ('Invert', 0.0, 3)], - [('Equalize', 0.2, 0), ('AutoContrast', 0.6, 0)], - [('Equalize', 0.2, 8), ('Equalize', 0.6, 4)], - [('Color', 0.9, 9), ('Equalize', 0.6, 6)], - [('AutoContrast', 0.8, 4), ('Solarize', 0.2, 8)], - [('Brightness', 0.1, 3), ('Color', 0.7, 0)], - [('Solarize', 0.4, 5), ('AutoContrast', 0.9, 3)], - [('TranslateY', 0.9, 9), ('TranslateY', 0.7, 9)], - [('AutoContrast', 0.9, 2), ('Solarize', 0.8, 3)], - [('Equalize', 0.8, 8), ('Invert', 0.1, 3)], - [('TranslateY', 0.7, 9), ('AutoContrast', 0.9, 1)], - ] - return policy - - @staticmethod - def policy_svhn(): - """Autoaugment policy for SVHN dataset. - - Result is from the AutoAugment paper: https://arxiv.org/abs/1805.09501. - - Each tuple is an augmentation operation of the form - (operation, probability, magnitude). Each element in policy is a - sub-policy that will be applied sequentially on the image. - - Returns: - the policy. 
- """ - policy = [ - [('ShearX', 0.9, 4), ('Invert', 0.2, 3)], - [('ShearY', 0.9, 8), ('Invert', 0.7, 5)], - [('Equalize', 0.6, 5), ('Solarize', 0.6, 6)], - [('Invert', 0.9, 3), ('Equalize', 0.6, 3)], - [('Equalize', 0.6, 1), ('Rotate', 0.9, 3)], - [('ShearX', 0.9, 4), ('AutoContrast', 0.8, 3)], - [('ShearY', 0.9, 8), ('Invert', 0.4, 5)], - [('ShearY', 0.9, 5), ('Solarize', 0.2, 6)], - [('Invert', 0.9, 6), ('AutoContrast', 0.8, 1)], - [('Equalize', 0.6, 3), ('Rotate', 0.9, 3)], - [('ShearX', 0.9, 4), ('Solarize', 0.3, 3)], - [('ShearY', 0.8, 8), ('Invert', 0.7, 4)], - [('Equalize', 0.9, 5), ('TranslateY', 0.6, 6)], - [('Invert', 0.9, 4), ('Equalize', 0.6, 7)], - [('Contrast', 0.3, 3), ('Rotate', 0.8, 4)], - [('Invert', 0.8, 5), ('TranslateY', 0.0, 2)], - [('ShearY', 0.7, 6), ('Solarize', 0.4, 8)], - [('Invert', 0.6, 4), ('Rotate', 0.8, 4)], - [('ShearY', 0.3, 7), ('TranslateX', 0.9, 3)], - [('ShearX', 0.1, 6), ('Invert', 0.6, 5)], - [('Solarize', 0.7, 2), ('TranslateY', 0.6, 7)], - [('ShearY', 0.8, 4), ('Invert', 0.8, 8)], - [('ShearX', 0.7, 9), ('TranslateY', 0.8, 3)], - [('ShearY', 0.8, 5), ('AutoContrast', 0.7, 3)], - [('ShearX', 0.7, 2), ('Invert', 0.1, 5)], - ] - return policy - - @staticmethod - def policy_reduced_imagenet(): - """Autoaugment policy for reduced ImageNet dataset. - - Result is from the AutoAugment paper: https://arxiv.org/abs/1805.09501. - - Each tuple is an augmentation operation of the form - (operation, probability, magnitude). Each element in policy is a - sub-policy that will be applied sequentially on the image. - - Returns: - the policy. 
- """ - policy = [ - [('Posterize', 0.4, 8), ('Rotate', 0.6, 9)], - [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)], - [('Equalize', 0.8, 8), ('Equalize', 0.6, 3)], - [('Posterize', 0.6, 7), ('Posterize', 0.6, 6)], - [('Equalize', 0.4, 7), ('Solarize', 0.2, 4)], - [('Equalize', 0.4, 4), ('Rotate', 0.8, 8)], - [('Solarize', 0.6, 3), ('Equalize', 0.6, 7)], - [('Posterize', 0.8, 5), ('Equalize', 1.0, 2)], - [('Rotate', 0.2, 3), ('Solarize', 0.6, 8)], - [('Equalize', 0.6, 8), ('Posterize', 0.4, 6)], - [('Rotate', 0.8, 8), ('Color', 0.4, 0)], - [('Rotate', 0.4, 9), ('Equalize', 0.6, 2)], - [('Equalize', 0.0, 7), ('Equalize', 0.8, 8)], - [('Invert', 0.6, 4), ('Equalize', 1.0, 8)], - [('Color', 0.6, 4), ('Contrast', 1.0, 8)], - [('Rotate', 0.8, 8), ('Color', 1.0, 2)], - [('Color', 0.8, 8), ('Solarize', 0.8, 7)], - [('Sharpness', 0.4, 7), ('Invert', 0.6, 8)], - [('ShearX', 0.6, 5), ('Equalize', 1.0, 9)], - [('Color', 0.4, 0), ('Equalize', 0.6, 3)], - [('Equalize', 0.4, 7), ('Solarize', 0.2, 4)], - [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)], - [('Invert', 0.6, 4), ('Equalize', 1.0, 8)], - [('Color', 0.6, 4), ('Contrast', 1.0, 8)], - [('Equalize', 0.8, 8), ('Equalize', 0.6, 3)] - ] - return policy - - @staticmethod - def policy_simple(): - """Same as `policy_v0`, except with custom ops removed.""" - - policy = [ - [('Color', 0.4, 9), ('Equalize', 0.6, 3)], - [('Solarize', 0.8, 3), ('Equalize', 0.4, 7)], - [('Solarize', 0.4, 2), ('Solarize', 0.6, 2)], - [('Color', 0.2, 0), ('Equalize', 0.8, 8)], - [('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)], - [('Color', 0.6, 1), ('Equalize', 1.0, 2)], - [('Color', 0.4, 7), ('Equalize', 0.6, 0)], - [('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)], - [('Solarize', 0.6, 8), ('Color', 0.6, 9)], - [('Equalize', 0.8, 4), ('Equalize', 0.0, 8)], - [('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)], - [('Posterize', 0.8, 2), ('Solarize', 0.6, 10)], - [('Solarize', 0.6, 8), ('Equalize', 0.6, 1)], - ] - return policy - - @staticmethod - def 
policy_test(): - """Autoaugment test policy for debugging.""" - policy = [ - [('TranslateX', 1.0, 4), ('Equalize', 1.0, 10)], - ] - return policy - - -def _maybe_identity(x: Optional[tf.Tensor]) -> Optional[tf.Tensor]: - return tf.identity(x) if x is not None else None - - -class RandAugment(ImageAugment): - """Applies the RandAugment policy to images. - - RandAugment is from the paper https://arxiv.org/abs/1909.13719, - """ - - def __init__(self, - num_layers: int = 2, - magnitude: float = 10., - cutout_const: float = 40., - translate_const: float = 100., - magnitude_std: float = 0.0, - prob_to_apply: Optional[float] = None, - exclude_ops: Optional[List[str]] = None): - """Applies the RandAugment policy to images. - - Args: - num_layers: Integer, the number of augmentation transformations to apply - sequentially to an image. Represented as (N) in the paper. Usually best - values will be in the range [1, 3]. - magnitude: Integer, shared magnitude across all augmentation operations. - Represented as (M) in the paper. Usually best values are in the range - [5, 10]. - cutout_const: multiplier for applying cutout. - translate_const: multiplier for applying translation. - magnitude_std: randomness of the severity as proposed by the authors of - the timm library. - prob_to_apply: The probability to apply the selected augmentation at each - layer. - exclude_ops: exclude selected operations. 
- """ - super(RandAugment, self).__init__() - - self.num_layers = num_layers - self.magnitude = float(magnitude) - self.cutout_const = float(cutout_const) - self.translate_const = float(translate_const) - self.prob_to_apply = ( - float(prob_to_apply) if prob_to_apply is not None else None) - self.available_ops = [ - 'AutoContrast', 'Equalize', 'Invert', 'Rotate', 'Posterize', 'Solarize', - 'Color', 'Contrast', 'Brightness', 'Sharpness', 'ShearX', 'ShearY', - 'TranslateX', 'TranslateY', 'Cutout', 'SolarizeAdd' - ] - self.magnitude_std = magnitude_std - if exclude_ops: - self.available_ops = [ - op for op in self.available_ops if op not in exclude_ops - ] - - def _distort_common( - self, - image: tf.Tensor, - bboxes: Optional[tf.Tensor] = None - ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: - """Distorts the image and optionally bounding boxes.""" - input_image_type = image.dtype - - if input_image_type != tf.uint8: - image = tf.clip_by_value(image, 0.0, 255.0) - image = tf.cast(image, dtype=tf.uint8) - - replace_value = [128] * 3 - min_prob, max_prob = 0.2, 0.8 - - aug_image = image - aug_bboxes = bboxes - - for _ in range(self.num_layers): - op_to_select = tf.random.uniform([], - maxval=len(self.available_ops) + 1, - dtype=tf.int32) - - branch_fns = [] - for (i, op_name) in enumerate(self.available_ops): - prob = tf.random.uniform([], - minval=min_prob, - maxval=max_prob, - dtype=tf.float32) - func, _, args = _parse_policy_info(op_name, prob, self.magnitude, - replace_value, self.cutout_const, - self.translate_const, - self.magnitude_std) - branch_fns.append(( - i, - # pylint:disable=g-long-lambda - lambda selected_func=func, selected_args=args: selected_func( - image, bboxes, *selected_args))) - # pylint:enable=g-long-lambda - - aug_image, aug_bboxes = tf.switch_case( - branch_index=op_to_select, - branch_fns=branch_fns, - default=lambda: (tf.identity(image), _maybe_identity(bboxes))) - - if self.prob_to_apply is not None: - aug_image, aug_bboxes = tf.cond( - 
tf.random.uniform(shape=[], dtype=tf.float32) < self.prob_to_apply, - lambda: (tf.identity(aug_image), _maybe_identity(aug_bboxes)), - lambda: (tf.identity(image), _maybe_identity(bboxes))) - image = aug_image - bboxes = aug_bboxes - - image = tf.cast(image, dtype=input_image_type) - return image, bboxes - - def distort(self, image: tf.Tensor) -> tf.Tensor: - """See base class.""" - image, _ = self._distort_common(image) - return image - - def distort_with_boxes(self, image: tf.Tensor, - bboxes: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]: - """See base class.""" - image, bboxes = self._distort_common(image, bboxes) - return image, bboxes - - -class RandomErasing(ImageAugment): - """Applies RandomErasing to a single image. - - Reference: https://arxiv.org/abs/1708.04896 - - Implementaion is inspired by https://github.com/rwightman/pytorch-image-models - """ - - def __init__(self, - probability: float = 0.25, - min_area: float = 0.02, - max_area: float = 1 / 3, - min_aspect: float = 0.3, - max_aspect=None, - min_count=1, - max_count=1, - trials=10): - """Applies RandomErasing to a single image. - - Args: - probability (float, optional): Probability of augmenting the image. - Defaults to 0.25. - min_area (float, optional): Minimum area of the random erasing rectangle. - Defaults to 0.02. - max_area (float, optional): Maximum area of the random erasing rectangle. - Defaults to 1/3. - min_aspect (float, optional): Minimum aspect rate of the random erasing - rectangle. Defaults to 0.3. - max_aspect ([type], optional): Maximum aspect rate of the random erasing - rectangle. Defaults to None. - min_count (int, optional): Minimum number of erased rectangles. Defaults - to 1. - max_count (int, optional): Maximum number of erased rectangles. Defaults - to 1. - trials (int, optional): Maximum number of trials to randomly sample a - rectangle that fulfills constraint. Defaults to 10. 
- """ - self._probability = probability - self._min_area = float(min_area) - self._max_area = float(max_area) - self._min_log_aspect = math.log(min_aspect) - self._max_log_aspect = math.log(max_aspect or 1 / min_aspect) - self._min_count = min_count - self._max_count = max_count - self._trials = trials - - def distort(self, image: tf.Tensor) -> tf.Tensor: - """Applies RandomErasing to single `image`. - - Args: - image (tf.Tensor): Of shape [height, width, 3] representing an image. - - Returns: - tf.Tensor: The augmented version of `image`. - """ - uniform_random = tf.random.uniform(shape=[], minval=0., maxval=1.0) - mirror_cond = tf.less(uniform_random, self._probability) - image = tf.cond(mirror_cond, lambda: self._erase(image), lambda: image) - return image - - @tf.function - def _erase(self, image: tf.Tensor) -> tf.Tensor: - """Erase an area.""" - if self._min_count == self._max_count: - count = self._min_count - else: - count = tf.random.uniform( - shape=[], - minval=int(self._min_count), - maxval=int(self._max_count - self._min_count + 1), - dtype=tf.int32) - - image_height = tf.shape(image)[0] - image_width = tf.shape(image)[1] - area = tf.cast(image_width * image_height, tf.float32) - - for _ in range(count): - # Work around since break is not supported in tf.function - is_trial_successfull = False - for _ in range(self._trials): - if not is_trial_successfull: - erase_area = tf.random.uniform( - shape=[], - minval=area * self._min_area, - maxval=area * self._max_area) - aspect_ratio = tf.math.exp( - tf.random.uniform( - shape=[], - minval=self._min_log_aspect, - maxval=self._max_log_aspect)) - - half_height = tf.cast( - tf.math.round(tf.math.sqrt(erase_area * aspect_ratio) / 2), - dtype=tf.int32) - half_width = tf.cast( - tf.math.round(tf.math.sqrt(erase_area / aspect_ratio) / 2), - dtype=tf.int32) - - if 2 * half_height < image_height and 2 * half_width < image_width: - center_height = tf.random.uniform( - shape=[], - minval=0, - maxval=int(image_height - 2 
* half_height), - dtype=tf.int32) - center_width = tf.random.uniform( - shape=[], - minval=0, - maxval=int(image_width - 2 * half_width), - dtype=tf.int32) - - image = _fill_rectangle( - image, - center_width, - center_height, - half_width, - half_height, - replace=None) - - is_trial_successfull = True - - return image - - -class MixupAndCutmix: - """Applies Mixup and/or Cutmix to a batch of images. - - - Mixup: https://arxiv.org/abs/1710.09412 - - Cutmix: https://arxiv.org/abs/1905.04899 - - Implementaion is inspired by https://github.com/rwightman/pytorch-image-models - """ - - def __init__(self, - mixup_alpha: float = .8, - cutmix_alpha: float = 1., - prob: float = 1.0, - switch_prob: float = 0.5, - label_smoothing: float = 0.1, - num_classes: int = 1001): - """Applies Mixup and/or Cutmix to a batch of images. - - Args: - mixup_alpha (float, optional): For drawing a random lambda (`lam`) from a - beta distribution (for each image). If zero Mixup is deactivated. - Defaults to .8. - cutmix_alpha (float, optional): For drawing a random lambda (`lam`) from a - beta distribution (for each image). If zero Cutmix is deactivated. - Defaults to 1.. - prob (float, optional): Of augmenting the batch. Defaults to 1.0. - switch_prob (float, optional): Probability of applying Cutmix for the - batch. Defaults to 0.5. - label_smoothing (float, optional): Constant for label smoothing. Defaults - to 0.1. - num_classes (int, optional): Number of classes. Defaults to 1001. 
- """ - self.mixup_alpha = mixup_alpha - self.cutmix_alpha = cutmix_alpha - self.mix_prob = prob - self.switch_prob = switch_prob - self.label_smoothing = label_smoothing - self.num_classes = num_classes - self.mode = 'batch' - self.mixup_enabled = True - - if self.mixup_alpha and not self.cutmix_alpha: - self.switch_prob = -1 - elif not self.mixup_alpha and self.cutmix_alpha: - self.switch_prob = 1 - - def __call__(self, images: tf.Tensor, - labels: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]: - return self.distort(images, labels) - - def distort(self, images: tf.Tensor, - labels: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]: - """Applies Mixup and/or Cutmix to batch of images and transforms labels. - - Args: - images (tf.Tensor): Of shape [batch_size,height, width, 3] representing a - batch of image. - labels (tf.Tensor): Of shape [batch_size, ] representing the class id for - each image of the batch. - - Returns: - Tuple[tf.Tensor, tf.Tensor]: The augmented version of `image` and - `labels`. 
- """ - augment_cond = tf.less( - tf.random.uniform(shape=[], minval=0., maxval=1.0), self.mix_prob) - # pylint: disable=g-long-lambda - augment_a = lambda: self._update_labels(*tf.cond( - tf.less( - tf.random.uniform(shape=[], minval=0., maxval=1.0), self.switch_prob - ), lambda: self._cutmix(images, labels), lambda: self._mixup( - images, labels))) - augment_b = lambda: (images, self._smooth_labels(labels)) - # pylint: enable=g-long-lambda - - return tf.cond(augment_cond, augment_a, augment_b) - - @staticmethod - def _sample_from_beta(alpha, beta, shape): - sample_alpha = tf.random.gamma(shape, 1., beta=alpha) - sample_beta = tf.random.gamma(shape, 1., beta=beta) - return sample_alpha / (sample_alpha + sample_beta) - - def _cutmix(self, images: tf.Tensor, - labels: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor]: - """Apply cutmix.""" - lam = MixupAndCutmix._sample_from_beta(self.cutmix_alpha, self.cutmix_alpha, - labels.shape) - - ratio = tf.math.sqrt(1 - lam) - - batch_size = tf.shape(images)[0] - image_height, image_width = tf.shape(images)[1], tf.shape(images)[2] - - cut_height = tf.cast( - ratio * tf.cast(image_height, dtype=tf.float32), dtype=tf.int32) - cut_width = tf.cast( - ratio * tf.cast(image_height, dtype=tf.float32), dtype=tf.int32) - - random_center_height = tf.random.uniform( - shape=[batch_size], minval=0, maxval=image_height, dtype=tf.int32) - random_center_width = tf.random.uniform( - shape=[batch_size], minval=0, maxval=image_width, dtype=tf.int32) - - bbox_area = cut_height * cut_width - lam = 1. 
- bbox_area / (image_height * image_width) - lam = tf.cast(lam, dtype=tf.float32) - - images = tf.map_fn( - lambda x: _fill_rectangle(*x), - (images, random_center_width, random_center_height, cut_width // 2, - cut_height // 2, tf.reverse(images, [0])), - dtype=(tf.float32, tf.int32, tf.int32, tf.int32, tf.int32, tf.float32), - fn_output_signature=tf.TensorSpec(images.shape[1:], dtype=tf.float32)) - - return images, labels, lam - - def _mixup(self, images: tf.Tensor, - labels: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor]: - lam = MixupAndCutmix._sample_from_beta(self.mixup_alpha, self.mixup_alpha, - labels.shape) - lam = tf.reshape(lam, [-1, 1, 1, 1]) - images = lam * images + (1. - lam) * tf.reverse(images, [0]) - - return images, labels, tf.squeeze(lam) - - def _smooth_labels(self, labels: tf.Tensor) -> tf.Tensor: - off_value = self.label_smoothing / self.num_classes - on_value = 1. - self.label_smoothing + off_value - - smooth_labels = tf.one_hot( - labels, self.num_classes, on_value=on_value, off_value=off_value) - return smooth_labels - - def _update_labels(self, images: tf.Tensor, labels: tf.Tensor, - lam: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]: - labels_1 = self._smooth_labels(labels) - labels_2 = tf.reverse(labels_1, [0]) - - lam = tf.reshape(lam, [-1, 1]) - labels = lam * labels_1 + (1. - lam) * labels_2 - - return images, labels diff --git a/official/vision/beta/ops/augment_test.py b/official/vision/beta/ops/augment_test.py deleted file mode 100644 index 45d248464781217df16e8dc060eb61cc0a61e736..0000000000000000000000000000000000000000 --- a/official/vision/beta/ops/augment_test.py +++ /dev/null @@ -1,418 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Tests for autoaugment.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import random -from absl.testing import parameterized - -import tensorflow as tf - -from official.vision.beta.ops import augment - - -def get_dtype_test_cases(): - return [ - ('uint8', tf.uint8), - ('int32', tf.int32), - ('float16', tf.float16), - ('float32', tf.float32), - ] - - -@parameterized.named_parameters(get_dtype_test_cases()) -class TransformsTest(parameterized.TestCase, tf.test.TestCase): - """Basic tests for fundamental transformations.""" - - def test_to_from_4d(self, dtype): - for shape in [(10, 10), (10, 10, 10), (10, 10, 10, 10)]: - original_ndims = len(shape) - image = tf.zeros(shape, dtype=dtype) - image_4d = augment.to_4d(image) - self.assertEqual(4, tf.rank(image_4d)) - self.assertAllEqual(image, augment.from_4d(image_4d, original_ndims)) - - def test_transform(self, dtype): - image = tf.constant([[1, 2], [3, 4]], dtype=dtype) - self.assertAllEqual( - augment.transform(image, transforms=[1] * 8), [[4, 4], [4, 4]]) - - def test_translate(self, dtype): - image = tf.constant( - [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]], dtype=dtype) - translations = [-1, -1] - translated = augment.translate(image=image, translations=translations) - expected = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 0, 1, 1], [1, 0, 1, 1]] - self.assertAllEqual(translated, expected) - - def test_translate_shapes(self, dtype): - translation = [0, 0] - for shape in [(3, 3), (5, 5), (224, 224, 3)]: - image = tf.zeros(shape, 
dtype=dtype) - self.assertAllEqual(image, augment.translate(image, translation)) - - def test_translate_invalid_translation(self, dtype): - image = tf.zeros((1, 1), dtype=dtype) - invalid_translation = [[[1, 1]]] - with self.assertRaisesRegex(TypeError, 'rank 1 or 2'): - _ = augment.translate(image, invalid_translation) - - def test_rotate(self, dtype): - image = tf.reshape(tf.cast(tf.range(9), dtype), (3, 3)) - rotation = 90. - transformed = augment.rotate(image=image, degrees=rotation) - expected = [[2, 5, 8], [1, 4, 7], [0, 3, 6]] - self.assertAllEqual(transformed, expected) - - def test_rotate_shapes(self, dtype): - degrees = 0. - for shape in [(3, 3), (5, 5), (224, 224, 3)]: - image = tf.zeros(shape, dtype=dtype) - self.assertAllEqual(image, augment.rotate(image, degrees)) - - -class AutoaugmentTest(tf.test.TestCase, parameterized.TestCase): - - AVAILABLE_POLICIES = [ - 'v0', - 'test', - 'simple', - 'reduced_cifar10', - 'svhn', - 'reduced_imagenet', - 'detection_v0', - ] - - def test_autoaugment(self): - """Smoke test to be sure there are no syntax errors.""" - image = tf.zeros((224, 224, 3), dtype=tf.uint8) - - for policy in self.AVAILABLE_POLICIES: - augmenter = augment.AutoAugment(augmentation_name=policy) - aug_image = augmenter.distort(image) - - self.assertEqual((224, 224, 3), aug_image.shape) - - def test_autoaugment_with_bboxes(self): - """Smoke test to be sure there are no syntax errors with bboxes.""" - image = tf.zeros((224, 224, 3), dtype=tf.uint8) - bboxes = tf.ones((2, 4), dtype=tf.float32) - - for policy in self.AVAILABLE_POLICIES: - augmenter = augment.AutoAugment(augmentation_name=policy) - aug_image, aug_bboxes = augmenter.distort_with_boxes(image, bboxes) - - self.assertEqual((224, 224, 3), aug_image.shape) - self.assertEqual((2, 4), aug_bboxes.shape) - - def test_randaug(self): - """Smoke test to be sure there are no syntax errors.""" - image = tf.zeros((224, 224, 3), dtype=tf.uint8) - - augmenter = augment.RandAugment() - aug_image = 
augmenter.distort(image) - - self.assertEqual((224, 224, 3), aug_image.shape) - - def test_randaug_with_bboxes(self): - """Smoke test to be sure there are no syntax errors with bboxes.""" - image = tf.zeros((224, 224, 3), dtype=tf.uint8) - bboxes = tf.ones((2, 4), dtype=tf.float32) - - augmenter = augment.RandAugment() - aug_image, aug_bboxes = augmenter.distort_with_boxes(image, bboxes) - - self.assertEqual((224, 224, 3), aug_image.shape) - self.assertEqual((2, 4), aug_bboxes.shape) - - def test_all_policy_ops(self): - """Smoke test to be sure all augmentation functions can execute.""" - - prob = 1 - magnitude = 10 - replace_value = [128] * 3 - cutout_const = 100 - translate_const = 250 - - image = tf.ones((224, 224, 3), dtype=tf.uint8) - bboxes = None - - for op_name in augment.NAME_TO_FUNC.keys() - augment.REQUIRE_BOXES_FUNCS: - func, _, args = augment._parse_policy_info(op_name, prob, magnitude, - replace_value, cutout_const, - translate_const) - image, bboxes = func(image, bboxes, *args) - - self.assertEqual((224, 224, 3), image.shape) - self.assertIsNone(bboxes) - - def test_all_policy_ops_with_bboxes(self): - """Smoke test to be sure all augmentation functions can execute.""" - - prob = 1 - magnitude = 10 - replace_value = [128] * 3 - cutout_const = 100 - translate_const = 250 - - image = tf.ones((224, 224, 3), dtype=tf.uint8) - bboxes = tf.ones((2, 4), dtype=tf.float32) - - for op_name in augment.NAME_TO_FUNC: - func, _, args = augment._parse_policy_info(op_name, prob, magnitude, - replace_value, cutout_const, - translate_const) - image, bboxes = func(image, bboxes, *args) - - self.assertEqual((224, 224, 3), image.shape) - self.assertEqual((2, 4), bboxes.shape) - - def test_autoaugment_video(self): - """Smoke test with video to be sure there are no syntax errors.""" - image = tf.zeros((2, 224, 224, 3), dtype=tf.uint8) - - for policy in self.AVAILABLE_POLICIES: - augmenter = augment.AutoAugment(augmentation_name=policy) - aug_image = augmenter.distort(image) 
- - self.assertEqual((2, 224, 224, 3), aug_image.shape) - - def test_autoaugment_video_with_boxes(self): - """Smoke test with video to be sure there are no syntax errors.""" - image = tf.zeros((2, 224, 224, 3), dtype=tf.uint8) - bboxes = tf.ones((2, 2, 4), dtype=tf.float32) - - for policy in self.AVAILABLE_POLICIES: - augmenter = augment.AutoAugment(augmentation_name=policy) - aug_image, aug_bboxes = augmenter.distort_with_boxes(image, bboxes) - - self.assertEqual((2, 224, 224, 3), aug_image.shape) - self.assertEqual((2, 2, 4), aug_bboxes.shape) - - def test_randaug_video(self): - """Smoke test with video to be sure there are no syntax errors.""" - image = tf.zeros((2, 224, 224, 3), dtype=tf.uint8) - - augmenter = augment.RandAugment() - aug_image = augmenter.distort(image) - - self.assertEqual((2, 224, 224, 3), aug_image.shape) - - def test_all_policy_ops_video(self): - """Smoke test to be sure all video augmentation functions can execute.""" - - prob = 1 - magnitude = 10 - replace_value = [128] * 3 - cutout_const = 100 - translate_const = 250 - - image = tf.ones((2, 224, 224, 3), dtype=tf.uint8) - bboxes = None - - for op_name in augment.NAME_TO_FUNC.keys() - augment.REQUIRE_BOXES_FUNCS: - func, _, args = augment._parse_policy_info(op_name, prob, magnitude, - replace_value, cutout_const, - translate_const) - image, bboxes = func(image, bboxes, *args) - - self.assertEqual((2, 224, 224, 3), image.shape) - self.assertIsNone(bboxes) - - def test_all_policy_ops_video_with_bboxes(self): - """Smoke test to be sure all video augmentation functions can execute.""" - - prob = 1 - magnitude = 10 - replace_value = [128] * 3 - cutout_const = 100 - translate_const = 250 - - image = tf.ones((2, 224, 224, 3), dtype=tf.uint8) - bboxes = tf.ones((2, 2, 4), dtype=tf.float32) - - for op_name in augment.NAME_TO_FUNC: - func, _, args = augment._parse_policy_info(op_name, prob, magnitude, - replace_value, cutout_const, - translate_const) - if op_name in { - 'Rotate_BBox', - 
'ShearX_BBox', - 'ShearY_BBox', - 'TranslateX_BBox', - 'TranslateY_BBox', - 'TranslateY_Only_BBoxes', - }: - with self.assertRaises(ValueError): - func(image, bboxes, *args) - else: - image, bboxes = func(image, bboxes, *args) - - self.assertEqual((2, 224, 224, 3), image.shape) - self.assertEqual((2, 2, 4), bboxes.shape) - - def _generate_test_policy(self): - """Generate a test policy at random.""" - op_list = list(augment.NAME_TO_FUNC.keys()) - size = 6 - prob = [round(random.uniform(0., 1.), 1) for _ in range(size)] - mag = [round(random.uniform(0, 10)) for _ in range(size)] - policy = [] - for i in range(0, size, 2): - policy.append([(op_list[i], prob[i], mag[i]), - (op_list[i + 1], prob[i + 1], mag[i + 1])]) - return policy - - def test_custom_policy(self): - """Test autoaugment with a custom policy.""" - image = tf.zeros((224, 224, 3), dtype=tf.uint8) - augmenter = augment.AutoAugment(policies=self._generate_test_policy()) - aug_image = augmenter.distort(image) - - self.assertEqual((224, 224, 3), aug_image.shape) - - @parameterized.named_parameters( - {'testcase_name': '_OutOfRangeProb', - 'sub_policy': ('Equalize', 1.1, 3), 'value': '1.1'}, - {'testcase_name': '_OutOfRangeMag', - 'sub_policy': ('Equalize', 0.9, 11), 'value': '11'}, - ) - def test_invalid_custom_sub_policy(self, sub_policy, value): - """Test autoaugment with out-of-range values in the custom policy.""" - image = tf.zeros((224, 224, 3), dtype=tf.uint8) - policy = self._generate_test_policy() - policy[0][0] = sub_policy - augmenter = augment.AutoAugment(policies=policy) - - with self.assertRaisesRegex( - tf.errors.InvalidArgumentError, - r'Expected \'tf.Tensor\(False, shape=\(\), dtype=bool\)\' to be true. 
' - r'Summarized data: ({})'.format(value)): - augmenter.distort(image) - - def test_invalid_custom_policy_ndim(self): - """Test autoaugment with wrong dimension in the custom policy.""" - policy = [[('Equalize', 0.8, 1), ('Shear', 0.8, 4)], - [('TranslateY', 0.6, 3), ('Rotate', 0.9, 3)]] - policy = [[policy]] - - with self.assertRaisesRegex( - ValueError, - r'Expected \(:, :, 3\) but got \(1, 1, 2, 2, 3\).'): - augment.AutoAugment(policies=policy) - - def test_invalid_custom_policy_shape(self): - """Test autoaugment with wrong shape in the custom policy.""" - policy = [[('Equalize', 0.8, 1, 1), ('Shear', 0.8, 4, 1)], - [('TranslateY', 0.6, 3, 1), ('Rotate', 0.9, 3, 1)]] - - with self.assertRaisesRegex( - ValueError, - r'Expected \(:, :, 3\) but got \(2, 2, 4\)'): - augment.AutoAugment(policies=policy) - - def test_invalid_custom_policy_key(self): - """Test autoaugment with invalid key in the custom policy.""" - image = tf.zeros((224, 224, 3), dtype=tf.uint8) - policy = [[('AAAAA', 0.8, 1), ('Shear', 0.8, 4)], - [('TranslateY', 0.6, 3), ('Rotate', 0.9, 3)]] - augmenter = augment.AutoAugment(policies=policy) - - with self.assertRaisesRegex(KeyError, '\'AAAAA\''): - augmenter.distort(image) - - -class RandomErasingTest(tf.test.TestCase, parameterized.TestCase): - - def test_random_erase_replaces_some_pixels(self): - image = tf.zeros((224, 224, 3), dtype=tf.float32) - augmenter = augment.RandomErasing(probability=1., max_count=10) - - aug_image = augmenter.distort(image) - - self.assertEqual((224, 224, 3), aug_image.shape) - self.assertNotEqual(0, tf.reduce_max(aug_image)) - - -class MixupAndCutmixTest(tf.test.TestCase, parameterized.TestCase): - - def test_mixup_and_cutmix_smoothes_labels(self): - batch_size = 12 - num_classes = 1000 - label_smoothing = 0.1 - - images = tf.random.normal((batch_size, 224, 224, 3), dtype=tf.float32) - labels = tf.range(batch_size) - augmenter = augment.MixupAndCutmix( - num_classes=num_classes, label_smoothing=label_smoothing) - - 
aug_images, aug_labels = augmenter.distort(images, labels) - - self.assertEqual(images.shape, aug_images.shape) - self.assertEqual(images.dtype, aug_images.dtype) - self.assertEqual([batch_size, num_classes], aug_labels.shape) - self.assertAllLessEqual(aug_labels, 1. - label_smoothing + - 2. / num_classes) # With tolerance - self.assertAllGreaterEqual(aug_labels, label_smoothing / num_classes - - 1e4) # With tolerance - - def test_mixup_changes_image(self): - batch_size = 12 - num_classes = 1000 - label_smoothing = 0.1 - - images = tf.random.normal((batch_size, 224, 224, 3), dtype=tf.float32) - labels = tf.range(batch_size) - augmenter = augment.MixupAndCutmix( - mixup_alpha=1., cutmix_alpha=0., num_classes=num_classes) - - aug_images, aug_labels = augmenter.distort(images, labels) - - self.assertEqual(images.shape, aug_images.shape) - self.assertEqual(images.dtype, aug_images.dtype) - self.assertEqual([batch_size, num_classes], aug_labels.shape) - self.assertAllLessEqual(aug_labels, 1. - label_smoothing + - 2. / num_classes) # With tolerance - self.assertAllGreaterEqual(aug_labels, label_smoothing / num_classes - - 1e4) # With tolerance - self.assertFalse(tf.math.reduce_all(images == aug_images)) - - def test_cutmix_changes_image(self): - batch_size = 12 - num_classes = 1000 - label_smoothing = 0.1 - - images = tf.random.normal((batch_size, 224, 224, 3), dtype=tf.float32) - labels = tf.range(batch_size) - augmenter = augment.MixupAndCutmix( - mixup_alpha=0., cutmix_alpha=1., num_classes=num_classes) - - aug_images, aug_labels = augmenter.distort(images, labels) - - self.assertEqual(images.shape, aug_images.shape) - self.assertEqual(images.dtype, aug_images.dtype) - self.assertEqual([batch_size, num_classes], aug_labels.shape) - self.assertAllLessEqual(aug_labels, 1. - label_smoothing + - 2. 
/ num_classes) # With tolerance - self.assertAllGreaterEqual(aug_labels, label_smoothing / num_classes - - 1e4) # With tolerance - self.assertFalse(tf.math.reduce_all(images == aug_images)) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/ops/box_matcher.py b/official/vision/beta/ops/box_matcher.py deleted file mode 100644 index d788577d2f9701146252b52bd6ac0b738937143b..0000000000000000000000000000000000000000 --- a/official/vision/beta/ops/box_matcher.py +++ /dev/null @@ -1,191 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - -"""Box matcher implementation.""" - - -import tensorflow as tf - - -class BoxMatcher: - """Matcher based on highest value. - - This class computes matches from a similarity matrix. Each column is matched - to a single row. 
- - To support object detection target assignment this class enables setting both - positive_threshold (upper threshold) and negative_threshold (lower threshold) - defining three categories of similarity which define whether examples are - positive, negative, or ignored, for example: - (1) thresholds=[negative_threshold, positive_threshold], and - indicators=[negative_value, ignore_value, positive_value]: The similarity - metrics below negative_threshold will be assigned negative_value, - the metrics between negative_threshold and positive_threshold will be - assigned ignore_value, and the metrics above positive_threshold will be - assigned positive_value. - (2) thresholds=[negative_threshold, positive_threshold], and - indicators=[ignore_value, negative_value, positive_value]: The similarity - metrics below negative_threshold will be assigned ignore_value, - the metrics between negative_threshold and positive_threshold will be - assigned negative_value, and the metrics above positive_threshold will be - assigned positive_value. - """ - - def __init__(self, thresholds, indicators, force_match_for_each_col=False): - """Construct BoxMatcher. - - Args: - thresholds: A list of thresholds to classify boxes into - different buckets. The list needs to be sorted, and will be prepended - with -Inf and appended with +Inf. - indicators: A list of values to assign for each bucket. len(`indicators`) - must equal len(`thresholds`) + 1. - force_match_for_each_col: If True, ensures that each column is matched to - at least one row (which is not guaranteed otherwise if the - positive_threshold is high). Defaults to False. If True, all force-matched - rows will be assigned to `indicators[-1]`.
- - Raises: - ValueError: If `thresholds` is not sorted, - or len(indicators) != len(thresholds) + 1 - """ - if not all([lo <= hi for (lo, hi) in zip(thresholds[:-1], thresholds[1:])]): - raise ValueError('`threshold` must be sorted, got {}'.format(thresholds)) - self.indicators = indicators - if len(indicators) != len(thresholds) + 1: - raise ValueError('len(`indicators`) must be len(`thresholds`) + 1, got ' - 'indicators {}, thresholds {}'.format( - indicators, thresholds)) - thresholds = thresholds[:] - thresholds.insert(0, -float('inf')) - thresholds.append(float('inf')) - self.thresholds = thresholds - self._force_match_for_each_col = force_match_for_each_col - - def __call__(self, similarity_matrix): - """Tries to match each column of the similarity matrix to a row. - - Args: - similarity_matrix: A float tensor of shape [N, M] representing any - similarity metric. - - Returns: - An integer tensor of shape [N] with corresponding match indices for each - of M columns, for positive match, the match result will be the - corresponding row index, for negative match, the match will be - `negative_value`, for ignored match, the match result will be - `ignore_value`. - """ - squeeze_result = False - if len(similarity_matrix.shape) == 2: - squeeze_result = True - similarity_matrix = tf.expand_dims(similarity_matrix, axis=0) - - static_shape = similarity_matrix.shape.as_list() - num_rows = static_shape[1] or tf.shape(similarity_matrix)[1] - batch_size = static_shape[0] or tf.shape(similarity_matrix)[0] - - def _match_when_rows_are_empty(): - """Performs matching when the rows of similarity matrix are empty. - - When the rows are empty, all detections are false positives. So we return - a tensor of -1's to indicate that the columns do not match any rows. - - Returns: - matches: int32 tensor indicating the row each column matches to.
- """ - with tf.name_scope('empty_gt_boxes'): - matches = tf.zeros([batch_size, num_rows], dtype=tf.int32) - match_labels = -tf.ones([batch_size, num_rows], dtype=tf.int32) - return matches, match_labels - - def _match_when_rows_are_non_empty(): - """Performs matching when the rows of similarity matrix are non empty. - - Returns: - matches: int32 tensor indicating the row each column matches to. - """ - # Matches for each column - with tf.name_scope('non_empty_gt_boxes'): - matches = tf.argmax(similarity_matrix, axis=-1, output_type=tf.int32) - - # Get logical indices of ignored and unmatched columns as tf.int64 - matched_vals = tf.reduce_max(similarity_matrix, axis=-1) - matched_indicators = tf.zeros([batch_size, num_rows], tf.int32) - - match_dtype = matched_vals.dtype - for (ind, low, high) in zip(self.indicators, self.thresholds[:-1], - self.thresholds[1:]): - low_threshold = tf.cast(low, match_dtype) - high_threshold = tf.cast(high, match_dtype) - mask = tf.logical_and( - tf.greater_equal(matched_vals, low_threshold), - tf.less(matched_vals, high_threshold)) - matched_indicators = self._set_values_using_indicator( - matched_indicators, mask, ind) - - if self._force_match_for_each_col: - # [batch_size, M], for each col (groundtruth_box), find the best - # matching row (anchor). 
- force_match_column_ids = tf.argmax( - input=similarity_matrix, axis=1, output_type=tf.int32) - # [batch_size, M, N] - force_match_column_indicators = tf.one_hot( - force_match_column_ids, depth=num_rows) - # [batch_size, N], for each row (anchor), find the largest column - # index for groundtruth box - force_match_row_ids = tf.argmax( - input=force_match_column_indicators, axis=1, output_type=tf.int32) - # [batch_size, N] - force_match_column_mask = tf.cast( - tf.reduce_max(force_match_column_indicators, axis=1), - tf.bool) - # [batch_size, N] - final_matches = tf.where(force_match_column_mask, force_match_row_ids, - matches) - final_matched_indicators = tf.where( - force_match_column_mask, self.indicators[-1] * - tf.ones([batch_size, num_rows], dtype=tf.int32), - matched_indicators) - return final_matches, final_matched_indicators - else: - return matches, matched_indicators - - num_gt_boxes = similarity_matrix.shape.as_list()[-1] or tf.shape( - similarity_matrix)[-1] - result_match, result_matched_indicators = tf.cond( - pred=tf.greater(num_gt_boxes, 0), - true_fn=_match_when_rows_are_non_empty, - false_fn=_match_when_rows_are_empty) - - if squeeze_result: - result_match = tf.squeeze(result_match, axis=0) - result_matched_indicators = tf.squeeze(result_matched_indicators, axis=0) - - return result_match, result_matched_indicators - - def _set_values_using_indicator(self, x, indicator, val): - """Set the indicated fields of x to val. - - Args: - x: tensor. - indicator: boolean with same shape as x. - val: scalar with value to set. - - Returns: - modified tensor. 
- """ - indicator = tf.cast(indicator, x.dtype) - return tf.add(tf.multiply(x, 1 - indicator), val * indicator) diff --git a/official/vision/beta/ops/box_ops.py b/official/vision/beta/ops/box_ops.py deleted file mode 100644 index 3a6f69c247fe63c40503389353143e0574ad2588..0000000000000000000000000000000000000000 --- a/official/vision/beta/ops/box_ops.py +++ /dev/null @@ -1,763 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Box related ops.""" - -# Import libraries -import numpy as np -import tensorflow as tf - - -EPSILON = 1e-8 -BBOX_XFORM_CLIP = np.log(1000. / 16.) - - -def yxyx_to_xywh(boxes): - """Converts boxes from ymin, xmin, ymax, xmax to xmin, ymin, width, height. - - Args: - boxes: a numpy array whose last dimension is 4 representing the coordinates - of boxes in ymin, xmin, ymax, xmax order. - - Returns: - boxes: a numpy array whose shape is the same as `boxes` in new format. - - Raises: - ValueError: If the last dimension of boxes is not 4. 
- """ - if boxes.shape[-1] != 4: - raise ValueError( - 'boxes.shape[-1] is {:d}, but must be 4.'.format(boxes.shape[-1])) - - boxes_ymin = boxes[..., 0] - boxes_xmin = boxes[..., 1] - boxes_width = boxes[..., 3] - boxes[..., 1] - boxes_height = boxes[..., 2] - boxes[..., 0] - new_boxes = np.stack( - [boxes_xmin, boxes_ymin, boxes_width, boxes_height], axis=-1) - - return new_boxes - - -def yxyx_to_cycxhw(boxes): - """Converts box corner coordinates to center plus height and width terms. - - Args: - boxes: a `Tensor` with last dimension of 4, representing the coordinates of - boxes in ymin, xmin, ymax, xmax order. - - Returns: - boxes: a `Tensor` with the same shape as the input boxes, in the format - of cy, cx, height, width. - - Raises: - ValueError: if the last dimension of boxes is not 4. - """ - if boxes.shape[-1] != 4: - raise ValueError('Last dimension of boxes must be 4 but is {:d}'.format( - boxes.shape[-1])) - - boxes_ycenter = (boxes[..., 0] + boxes[..., 2]) / 2 - boxes_xcenter = (boxes[..., 1] + boxes[..., 3]) / 2 - boxes_height = boxes[..., 2] - boxes[..., 0] - boxes_width = boxes[..., 3] - boxes[..., 1] - - new_boxes = tf.stack( - [boxes_ycenter, boxes_xcenter, boxes_height, boxes_width], axis=-1) - return new_boxes - - -def cycxhw_to_yxyx(boxes): - """Converts box center plus height and width terms to corner coordinates. - - Args: - boxes: a numpy array whose last dimension is 4 representing the coordinates - of boxes in cy, cx, height, width order. - - Returns: - boxes: a `Tensor` whose shape is the same as `boxes` in new format. - - Raises: - ValueError: If the last dimension of boxes is not 4.
- """ - if boxes.shape[-1] != 4: - raise ValueError( - 'boxes.shape[-1] is {:d}, but must be 4.'.format(boxes.shape[-1])) - - boxes_ymin = boxes[..., 0] - boxes[..., 2] / 2 - boxes_xmin = boxes[..., 1] - boxes[..., 3] / 2 - boxes_ymax = boxes[..., 0] + boxes[..., 2] / 2 - boxes_xmax = boxes[..., 1] + boxes[..., 3] / 2 - new_boxes = tf.stack([ - boxes_ymin, boxes_xmin, boxes_ymax, boxes_xmax], axis=-1) - return new_boxes - - -def jitter_boxes(boxes, noise_scale=0.025): - """Jitter the box coordinates by some noise distribution. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates of - boxes in ymin, xmin, ymax, xmax order. - noise_scale: a python float which specifies the magnitude of noise. The rule - of thumb is to set this between (0, 0.1]. The default value is found to - mimic the noisy detections best empirically. - - Returns: - jittered_boxes: a tensor whose shape is the same as `boxes` representing - the jittered boxes. - - Raises: - ValueError: If the last dimension of boxes is not 4. - """ - if boxes.shape[-1] != 4: - raise ValueError( - 'boxes.shape[-1] is {:d}, but must be 4.'.format(boxes.shape[-1])) - - with tf.name_scope('jitter_boxes'): - bbox_jitters = tf.random.normal(tf.shape(boxes), stddev=noise_scale) - ymin = boxes[..., 0:1] - xmin = boxes[..., 1:2] - ymax = boxes[..., 2:3] - xmax = boxes[..., 3:4] - width = xmax - xmin - height = ymax - ymin - new_center_x = (xmin + xmax) / 2.0 + bbox_jitters[..., 0:1] * width - new_center_y = (ymin + ymax) / 2.0 + bbox_jitters[..., 1:2] * height - new_width = width * tf.math.exp(bbox_jitters[..., 2:3]) - new_height = height * tf.math.exp(bbox_jitters[..., 3:4]) - jittered_boxes = tf.concat( - [new_center_y - new_height * 0.5, new_center_x - new_width * 0.5, - new_center_y + new_height * 0.5, new_center_x + new_width * 0.5], - axis=-1) - - return jittered_boxes - - -def normalize_boxes(boxes, image_shape): - """Converts boxes to the normalized coordinates. 
- - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates - of boxes in ymin, xmin, ymax, xmax order. - image_shape: a list of two integers, a two-element vector or a tensor such - that all but the last dimensions are `broadcastable` to `boxes`. The last - dimension is 2, which represents [height, width]. - - Returns: - normalized_boxes: a tensor whose shape is the same as `boxes` representing - the normalized boxes. - - Raises: - ValueError: If the last dimension of boxes is not 4. - """ - if boxes.shape[-1] != 4: - raise ValueError( - 'boxes.shape[-1] is {:d}, but must be 4.'.format(boxes.shape[-1])) - - with tf.name_scope('normalize_boxes'): - if isinstance(image_shape, list) or isinstance(image_shape, tuple): - height, width = image_shape - else: - image_shape = tf.cast(image_shape, dtype=boxes.dtype) - height = image_shape[..., 0:1] - width = image_shape[..., 1:2] - - ymin = boxes[..., 0:1] / height - xmin = boxes[..., 1:2] / width - ymax = boxes[..., 2:3] / height - xmax = boxes[..., 3:4] / width - - normalized_boxes = tf.concat([ymin, xmin, ymax, xmax], axis=-1) - return normalized_boxes - - -def denormalize_boxes(boxes, image_shape): - """Converts boxes normalized by [height, width] to pixel coordinates. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates - of boxes in ymin, xmin, ymax, xmax order. - image_shape: a list of two integers, a two-element vector or a tensor such - that all but the last dimensions are `broadcastable` to `boxes`. The last - dimension is 2, which represents [height, width]. - - Returns: - denormalized_boxes: a tensor whose shape is the same as `boxes` representing - the denormalized boxes. - - Raises: - ValueError: If the last dimension of boxes is not 4. 
- """ - with tf.name_scope('denormalize_boxes'): - if isinstance(image_shape, list) or isinstance(image_shape, tuple): - height, width = image_shape - else: - image_shape = tf.cast(image_shape, dtype=boxes.dtype) - height, width = tf.split(image_shape, 2, axis=-1) - - ymin, xmin, ymax, xmax = tf.split(boxes, 4, axis=-1) - ymin = ymin * height - xmin = xmin * width - ymax = ymax * height - xmax = xmax * width - - denormalized_boxes = tf.concat([ymin, xmin, ymax, xmax], axis=-1) - return denormalized_boxes - - -def clip_boxes(boxes, image_shape): - """Clips boxes to image boundaries. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates - of boxes in ymin, xmin, ymax, xmax order. - image_shape: a list of two integers, a two-element vector or a tensor such - that all but the last dimensions are `broadcastable` to `boxes`. The last - dimension is 2, which represents [height, width]. - - Returns: - clipped_boxes: a tensor whose shape is the same as `boxes` representing the - clipped boxes. - - Raises: - ValueError: If the last dimension of boxes is not 4. - """ - if boxes.shape[-1] != 4: - raise ValueError( - 'boxes.shape[-1] is {:d}, but must be 4.'.format(boxes.shape[-1])) - - with tf.name_scope('clip_boxes'): - if isinstance(image_shape, list) or isinstance(image_shape, tuple): - height, width = image_shape - max_length = [height, width, height, width] - else: - image_shape = tf.cast(image_shape, dtype=boxes.dtype) - height, width = tf.unstack(image_shape, axis=-1) - max_length = tf.stack([height, width, height, width], axis=-1) - - clipped_boxes = tf.math.maximum(tf.math.minimum(boxes, max_length), 0.0) - return clipped_boxes - - -def compute_outer_boxes(boxes, image_shape, scale=1.0): - """Computes an outer box that encloses an object with a margin. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates of - boxes in ymin, xmin, ymax, xmax order.
- image_shape: a list of two integers, a two-element vector or a tensor such - that all but the last dimensions are `broadcastable` to `boxes`. The last - dimension is 2, which represents [height, width]. - scale: a float number specifying the scale of output outer boxes to input - `boxes`. - - Returns: - outer_boxes: a tensor whose shape is the same as `boxes` representing the - outer boxes. - """ - if scale < 1.0: - raise ValueError( - 'scale is {}, but outer box scale must be greater than 1.0.'.format( - scale)) - centers_y = (boxes[..., 0] + boxes[..., 2]) / 2.0 - centers_x = (boxes[..., 1] + boxes[..., 3]) / 2.0 - box_height = (boxes[..., 2] - boxes[..., 0]) * scale - box_width = (boxes[..., 3] - boxes[..., 1]) * scale - outer_boxes = tf.stack( - [centers_y - box_height / 2.0, centers_x - box_width / 2.0, - centers_y + box_height / 2.0, centers_x + box_width / 2.0], - axis=1) - outer_boxes = clip_boxes(outer_boxes, image_shape) - return outer_boxes - - -def encode_boxes(boxes, anchors, weights=None): - """Encode boxes to targets. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates - of boxes in ymin, xmin, ymax, xmax order. - anchors: a tensor whose shape is the same as, or `broadcastable` to `boxes`, - representing the coordinates of anchors in ymin, xmin, ymax, xmax order. - weights: None or a list of four float numbers used to scale coordinates. - - Returns: - encoded_boxes: a tensor whose shape is the same as `boxes` representing the - encoded box targets. - - Raises: - ValueError: If the last dimension of boxes is not 4. 
- """ - if boxes.shape[-1] != 4: - raise ValueError( - 'boxes.shape[-1] is {:d}, but must be 4.'.format(boxes.shape[-1])) - - with tf.name_scope('encode_boxes'): - boxes = tf.cast(boxes, dtype=anchors.dtype) - ymin = boxes[..., 0:1] - xmin = boxes[..., 1:2] - ymax = boxes[..., 2:3] - xmax = boxes[..., 3:4] - box_h = ymax - ymin - box_w = xmax - xmin - box_yc = ymin + 0.5 * box_h - box_xc = xmin + 0.5 * box_w - - anchor_ymin = anchors[..., 0:1] - anchor_xmin = anchors[..., 1:2] - anchor_ymax = anchors[..., 2:3] - anchor_xmax = anchors[..., 3:4] - anchor_h = anchor_ymax - anchor_ymin - anchor_w = anchor_xmax - anchor_xmin - anchor_yc = anchor_ymin + 0.5 * anchor_h - anchor_xc = anchor_xmin + 0.5 * anchor_w - - encoded_dy = (box_yc - anchor_yc) / anchor_h - encoded_dx = (box_xc - anchor_xc) / anchor_w - encoded_dh = tf.math.log(box_h / anchor_h) - encoded_dw = tf.math.log(box_w / anchor_w) - if weights: - encoded_dy *= weights[0] - encoded_dx *= weights[1] - encoded_dh *= weights[2] - encoded_dw *= weights[3] - - encoded_boxes = tf.concat( - [encoded_dy, encoded_dx, encoded_dh, encoded_dw], axis=-1) - return encoded_boxes - - -def decode_boxes(encoded_boxes, anchors, weights=None): - """Decode boxes. - - Args: - encoded_boxes: a tensor whose last dimension is 4 representing the - encoded box offsets in dy, dx, dh, dw order. - anchors: a tensor whose shape is the same as, or `broadcastable` to `boxes`, - representing the coordinates of anchors in ymin, xmin, ymax, xmax order. - weights: None or a list of four float numbers used to scale coordinates. - - Returns: - decoded_boxes: a tensor whose shape is the same as `encoded_boxes` - representing the decoded boxes in ymin, xmin, ymax, xmax order. - """ - if encoded_boxes.shape[-1] != 4: - raise ValueError( - 'encoded_boxes.shape[-1] is {:d}, but must be 4.'
- .format(encoded_boxes.shape[-1])) - - with tf.name_scope('decode_boxes'): - encoded_boxes = tf.cast(encoded_boxes, dtype=anchors.dtype) - dy = encoded_boxes[..., 0:1] - dx = encoded_boxes[..., 1:2] - dh = encoded_boxes[..., 2:3] - dw = encoded_boxes[..., 3:4] - if weights: - dy /= weights[0] - dx /= weights[1] - dh /= weights[2] - dw /= weights[3] - dh = tf.math.minimum(dh, BBOX_XFORM_CLIP) - dw = tf.math.minimum(dw, BBOX_XFORM_CLIP) - - anchor_ymin = anchors[..., 0:1] - anchor_xmin = anchors[..., 1:2] - anchor_ymax = anchors[..., 2:3] - anchor_xmax = anchors[..., 3:4] - anchor_h = anchor_ymax - anchor_ymin - anchor_w = anchor_xmax - anchor_xmin - anchor_yc = anchor_ymin + 0.5 * anchor_h - anchor_xc = anchor_xmin + 0.5 * anchor_w - - decoded_boxes_yc = dy * anchor_h + anchor_yc - decoded_boxes_xc = dx * anchor_w + anchor_xc - decoded_boxes_h = tf.math.exp(dh) * anchor_h - decoded_boxes_w = tf.math.exp(dw) * anchor_w - - decoded_boxes_ymin = decoded_boxes_yc - 0.5 * decoded_boxes_h - decoded_boxes_xmin = decoded_boxes_xc - 0.5 * decoded_boxes_w - decoded_boxes_ymax = decoded_boxes_ymin + decoded_boxes_h - decoded_boxes_xmax = decoded_boxes_xmin + decoded_boxes_w - - decoded_boxes = tf.concat( - [decoded_boxes_ymin, decoded_boxes_xmin, - decoded_boxes_ymax, decoded_boxes_xmax], - axis=-1) - return decoded_boxes - - -def filter_boxes(boxes, scores, image_shape, min_size_threshold): - """Filter and remove boxes that are too small or fall outside the image. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates of - boxes in ymin, xmin, ymax, xmax order. - scores: a tensor whose shape is the same as tf.shape(boxes)[:-1] - representing the original scores of the boxes. - image_shape: a tensor whose shape is the same as, or `broadcastable` to - `boxes` except the last dimension, which is 2, representing [height, - width] of the scaled image. - min_size_threshold: a float representing the minimal box size in each side - (w.r.t. 
the scaled image). Boxes whose sides are smaller than it will be - filtered out. - - Returns: - filtered_boxes: a tensor whose shape is the same as `boxes` but with - the position of the filtered boxes are filled with 0. - filtered_scores: a tensor whose shape is the same as 'scores' but with - the positinon of the filtered boxes filled with 0. - """ - if boxes.shape[-1] != 4: - raise ValueError( - 'boxes.shape[1] is {:d}, but must be 4.'.format(boxes.shape[-1])) - - with tf.name_scope('filter_boxes'): - if isinstance(image_shape, list) or isinstance(image_shape, tuple): - height, width = image_shape - else: - image_shape = tf.cast(image_shape, dtype=boxes.dtype) - height = image_shape[..., 0] - width = image_shape[..., 1] - - ymin = boxes[..., 0] - xmin = boxes[..., 1] - ymax = boxes[..., 2] - xmax = boxes[..., 3] - - h = ymax - ymin - w = xmax - xmin - yc = ymin + 0.5 * h - xc = xmin + 0.5 * w - - min_size = tf.cast( - tf.math.maximum(min_size_threshold, 0.0), dtype=boxes.dtype) - - filtered_size_mask = tf.math.logical_and( - tf.math.greater(h, min_size), tf.math.greater(w, min_size)) - filtered_center_mask = tf.logical_and( - tf.math.logical_and(tf.math.greater(yc, 0.0), tf.math.less(yc, height)), - tf.math.logical_and(tf.math.greater(xc, 0.0), tf.math.less(xc, width))) - filtered_mask = tf.math.logical_and( - filtered_size_mask, filtered_center_mask) - - filtered_scores = tf.where(filtered_mask, scores, tf.zeros_like(scores)) - filtered_boxes = tf.cast( - tf.expand_dims(filtered_mask, axis=-1), dtype=boxes.dtype) * boxes - return filtered_boxes, filtered_scores - - -def filter_boxes_by_scores(boxes, scores, min_score_threshold): - """Filter and remove boxes whose scores are smaller than the threshold. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates of - boxes in ymin, xmin, ymax, xmax order. - scores: a tensor whose shape is the same as tf.shape(boxes)[:-1] - representing the original scores of the boxes. 
- min_score_threshold: a float representing the minimal box score threshold. - Boxes whose score are smaller than it will be filtered out. - - Returns: - filtered_boxes: a tensor whose shape is the same as `boxes` but with - the position of the filtered boxes are filled with -1. - filtered_scores: a tensor whose shape is the same as 'scores' but with - the - """ - if boxes.shape[-1] != 4: - raise ValueError('boxes.shape[1] is {:d}, but must be 4.'.format( - boxes.shape[-1])) - - with tf.name_scope('filter_boxes_by_scores'): - filtered_mask = tf.math.greater(scores, min_score_threshold) - filtered_scores = tf.where(filtered_mask, scores, -tf.ones_like(scores)) - filtered_boxes = tf.cast( - tf.expand_dims(filtered_mask, axis=-1), dtype=boxes.dtype) * boxes - - return filtered_boxes, filtered_scores - - -def gather_instances(selected_indices, instances, *aux_instances): - """Gather instances by indices. - - Args: - selected_indices: a Tensor of shape [batch, K] which indicates the selected - indices in instance dimension (2nd dimension). - instances: a Tensor of shape [batch, N, ...] where the 2nd dimension is - the instance dimension to be selected from. - *aux_instances: the additional Tensors whose shapes are in [batch, N, ...] - which are the tensors to be selected from using the `selected_indices`. - - Returns: - selected_instances: the tensor of shape [batch, K, ...] which corresponds to - the selected instances of the `instances` tensor. - selected_aux_instances: the additional tensors of shape [batch, K, ...] - which corresponds to the selected instances of the `aus_instances` - tensors. 
- """ - batch_size = instances.shape[0] - if batch_size == 1: - selected_instances = tf.squeeze( - tf.gather(instances, selected_indices, axis=1), axis=1) - if aux_instances: - selected_aux_instances = [ - tf.squeeze( - tf.gather(a, selected_indices, axis=1), axis=1) - for a in aux_instances - ] - return tuple([selected_instances] + selected_aux_instances) - else: - return selected_instances - else: - indices_shape = tf.shape(selected_indices) - batch_indices = ( - tf.expand_dims(tf.range(indices_shape[0]), axis=-1) * - tf.ones([1, indices_shape[-1]], dtype=tf.int32)) - gather_nd_indices = tf.stack( - [batch_indices, selected_indices], axis=-1) - selected_instances = tf.gather_nd(instances, gather_nd_indices) - if aux_instances: - selected_aux_instances = [ - tf.gather_nd(a, gather_nd_indices) for a in aux_instances - ] - return tuple([selected_instances] + selected_aux_instances) - else: - return selected_instances - - -def top_k_boxes(boxes, scores, k): - """Sort and select top k boxes according to the scores. - - Args: - boxes: a tensor of shape [batch_size, N, 4] representing the coordinate of - the boxes. N is the number of boxes per image. - scores: a tensor of shsape [batch_size, N] representing the socre of the - boxes. - k: an integer or a tensor indicating the top k number. - - Returns: - selected_boxes: a tensor of shape [batch_size, k, 4] representing the - selected top k box coordinates. - selected_scores: a tensor of shape [batch_size, k] representing the selected - top k box scores. - """ - with tf.name_scope('top_k_boxes'): - selected_scores, top_k_indices = tf.nn.top_k(scores, k=k, sorted=True) - selected_boxes = gather_instances(top_k_indices, boxes) - return selected_boxes, selected_scores - - -def get_non_empty_box_indices(boxes): - """Get indices for non-empty boxes.""" - # Selects indices if box height or width is 0. 
- height = boxes[:, 2] - boxes[:, 0] - width = boxes[:, 3] - boxes[:, 1] - indices = tf.where(tf.logical_and(tf.greater(height, 0), - tf.greater(width, 0))) - return indices[:, 0] - - -def bbox_overlap(boxes, gt_boxes): - """Calculates the overlap between proposal and ground truth boxes. - - Some `boxes` or `gt_boxes` may have been padded. The returned `iou` tensor - for these boxes will be -1. - - Args: - boxes: a tensor with a shape of [batch_size, N, 4]. N is the number of - proposals before groundtruth assignment (e.g., rpn_post_nms_topn). The - last dimension is the pixel coordinates in [ymin, xmin, ymax, xmax] form. - gt_boxes: a tensor with a shape of [batch_size, MAX_NUM_INSTANCES, 4]. This - tensor might have paddings with a negative value. - - Returns: - iou: a tensor with as a shape of [batch_size, N, MAX_NUM_INSTANCES]. - """ - with tf.name_scope('bbox_overlap'): - bb_y_min, bb_x_min, bb_y_max, bb_x_max = tf.split( - value=boxes, num_or_size_splits=4, axis=2) - gt_y_min, gt_x_min, gt_y_max, gt_x_max = tf.split( - value=gt_boxes, num_or_size_splits=4, axis=2) - - # Calculates the intersection area. - i_xmin = tf.math.maximum(bb_x_min, tf.transpose(gt_x_min, [0, 2, 1])) - i_xmax = tf.math.minimum(bb_x_max, tf.transpose(gt_x_max, [0, 2, 1])) - i_ymin = tf.math.maximum(bb_y_min, tf.transpose(gt_y_min, [0, 2, 1])) - i_ymax = tf.math.minimum(bb_y_max, tf.transpose(gt_y_max, [0, 2, 1])) - i_area = ( - tf.math.maximum((i_xmax - i_xmin), 0) * - tf.math.maximum((i_ymax - i_ymin), 0)) - - # Calculates the union area. - bb_area = (bb_y_max - bb_y_min) * (bb_x_max - bb_x_min) - gt_area = (gt_y_max - gt_y_min) * (gt_x_max - gt_x_min) - # Adds a small epsilon to avoid divide-by-zero. - u_area = bb_area + tf.transpose(gt_area, [0, 2, 1]) - i_area + 1e-8 - - # Calculates IoU. - iou = i_area / u_area - - # Fills -1 for IoU entries between the padded ground truth boxes. 
- gt_invalid_mask = tf.less( - tf.reduce_max(gt_boxes, axis=-1, keepdims=True), 0.0) - padding_mask = tf.logical_or( - tf.zeros_like(bb_x_min, dtype=tf.bool), - tf.transpose(gt_invalid_mask, [0, 2, 1])) - iou = tf.where(padding_mask, -tf.ones_like(iou), iou) - - # Fills -1 for for invalid (-1) boxes. - boxes_invalid_mask = tf.less( - tf.reduce_max(boxes, axis=-1, keepdims=True), 0.0) - iou = tf.where(boxes_invalid_mask, -tf.ones_like(iou), iou) - - return iou - - -def bbox_generalized_overlap(boxes, gt_boxes): - """Calculates the GIOU between proposal and ground truth boxes. - - The generalized intersection of union is an adjustment of the traditional IOU - metric which provides continuous updates even for predictions with no overlap. - This metric is defined in https://giou.stanford.edu/GIoU.pdf. Note, some - `gt_boxes` may have been padded. The returned `giou` tensor for these boxes - will be -1. - - Args: - boxes: a `Tensor` with a shape of [batch_size, N, 4]. N is the number of - proposals before groundtruth assignment (e.g., rpn_post_nms_topn). The - last dimension is the pixel coordinates in [ymin, xmin, ymax, xmax] form. - gt_boxes: a `Tensor` with a shape of [batch_size, max_num_instances, 4]. - This tensor may have paddings with a negative value and will also be in - the [ymin, xmin, ymax, xmax] format. - - Returns: - giou: a `Tensor` with as a shape of [batch_size, N, max_num_instances]. - """ - with tf.name_scope('bbox_generalized_overlap'): - assert boxes.shape.as_list( - )[-1] == 4, 'Boxes must be defined by 4 coordinates.' - assert gt_boxes.shape.as_list( - )[-1] == 4, 'Groundtruth boxes must be defined by 4 coordinates.' - - bb_y_min, bb_x_min, bb_y_max, bb_x_max = tf.split( - value=boxes, num_or_size_splits=4, axis=2) - gt_y_min, gt_x_min, gt_y_max, gt_x_max = tf.split( - value=gt_boxes, num_or_size_splits=4, axis=2) - - # Calculates the hull area for each pair of boxes, with one from - # boxes and the other from gt_boxes. 
- # Outputs for coordinates are of shape [batch_size, N, max_num_instances] - h_xmin = tf.minimum(bb_x_min, tf.transpose(gt_x_min, [0, 2, 1])) - h_xmax = tf.maximum(bb_x_max, tf.transpose(gt_x_max, [0, 2, 1])) - h_ymin = tf.minimum(bb_y_min, tf.transpose(gt_y_min, [0, 2, 1])) - h_ymax = tf.maximum(bb_y_max, tf.transpose(gt_y_max, [0, 2, 1])) - h_area = tf.maximum((h_xmax - h_xmin), 0) * tf.maximum((h_ymax - h_ymin), 0) - # Add a small epsilon to avoid divide-by-zero. - h_area = h_area + 1e-8 - - # Calculates the intersection area. - i_xmin = tf.maximum(bb_x_min, tf.transpose(gt_x_min, [0, 2, 1])) - i_xmax = tf.minimum(bb_x_max, tf.transpose(gt_x_max, [0, 2, 1])) - i_ymin = tf.maximum(bb_y_min, tf.transpose(gt_y_min, [0, 2, 1])) - i_ymax = tf.minimum(bb_y_max, tf.transpose(gt_y_max, [0, 2, 1])) - i_area = tf.maximum((i_xmax - i_xmin), 0) * tf.maximum((i_ymax - i_ymin), 0) - - # Calculates the union area. - bb_area = (bb_y_max - bb_y_min) * (bb_x_max - bb_x_min) - gt_area = (gt_y_max - gt_y_min) * (gt_x_max - gt_x_min) - - # Adds a small epsilon to avoid divide-by-zero. - u_area = bb_area + tf.transpose(gt_area, [0, 2, 1]) - i_area + 1e-8 - - # Calculates IoU. - iou = i_area / u_area - # Calculates GIoU. - giou = iou - (h_area - u_area) / h_area - - # Fills -1 for GIoU entries between the padded ground truth boxes. - gt_invalid_mask = tf.less( - tf.reduce_max(gt_boxes, axis=-1, keepdims=True), 0.0) - padding_mask = tf.broadcast_to( - tf.transpose(gt_invalid_mask, [0, 2, 1]), tf.shape(giou)) - giou = tf.where(padding_mask, -tf.ones_like(giou), giou) - return giou - - -def box_matching(boxes, gt_boxes, gt_classes): - """Match boxes to groundtruth boxes. - - Given the proposal boxes and the groundtruth boxes and classes, perform the - groundtruth matching by taking the argmax of the IoU between boxes and - groundtruth boxes. - - Args: - boxes: a tensor of shape of [batch_size, N, 4] representing the box - coordiantes to be matched to groundtruth boxes. 
- gt_boxes: a tensor of shape of [batch_size, MAX_INSTANCES, 4] representing - the groundtruth box coordinates. It is padded with -1s to indicate the - invalid boxes. - gt_classes: [batch_size, MAX_INSTANCES] representing the groundtruth box - classes. It is padded with -1s to indicate the invalid classes. - - Returns: - matched_gt_boxes: a tensor of shape of [batch_size, N, 4], representing - the matched groundtruth box coordinates for each input box. If the box - does not overlap with any groundtruth boxes, the matched boxes of it - will be set to all 0s. - matched_gt_classes: a tensor of shape of [batch_size, N], representing - the matched groundtruth classes for each input box. If the box does not - overlap with any groundtruth boxes, the matched box classes of it will - be set to 0, which corresponds to the background class. - matched_gt_indices: a tensor of shape of [batch_size, N], representing - the indices of the matched groundtruth boxes in the original gt_boxes - tensor. If the box does not overlap with any groundtruth boxes, the - index of the matched groundtruth will be set to -1. - matched_iou: a tensor of shape of [batch_size, N], representing the IoU - between the box and its matched groundtruth box. The matched IoU is the - maximum IoU of the box and all the groundtruth boxes. - iou: a tensor of shape of [batch_size, N, K], representing the IoU matrix - between boxes and the groundtruth boxes. The IoU between a box and the - invalid groundtruth boxes whose coordinates are [-1, -1, -1, -1] is -1. - """ - # Compute IoU between boxes and gt_boxes. 
- # iou <- [batch_size, N, K] - iou = bbox_overlap(boxes, gt_boxes) - - # max_iou <- [batch_size, N] - # 0.0 -> no match to gt, or -1.0 match to no gt - matched_iou = tf.reduce_max(iou, axis=-1) - - # background_box_mask <- bool, [batch_size, N] - background_box_mask = tf.less_equal(matched_iou, 0.0) - - argmax_iou_indices = tf.argmax(iou, axis=-1, output_type=tf.int32) - - matched_gt_boxes, matched_gt_classes = gather_instances( - argmax_iou_indices, gt_boxes, gt_classes) - matched_gt_boxes = tf.where( - tf.tile(tf.expand_dims(background_box_mask, axis=-1), [1, 1, 4]), - tf.zeros_like(matched_gt_boxes, dtype=matched_gt_boxes.dtype), - matched_gt_boxes) - matched_gt_classes = tf.where( - background_box_mask, - tf.zeros_like(matched_gt_classes), - matched_gt_classes) - - matched_gt_indices = tf.where( - background_box_mask, - -tf.ones_like(argmax_iou_indices), - argmax_iou_indices) - - return (matched_gt_boxes, matched_gt_classes, matched_gt_indices, - matched_iou, iou) diff --git a/official/vision/beta/ops/mask_ops.py b/official/vision/beta/ops/mask_ops.py deleted file mode 100644 index 6109bfdb568d815875a3c5b2cdf58bab4b8ede4d..0000000000000000000000000000000000000000 --- a/official/vision/beta/ops/mask_ops.py +++ /dev/null @@ -1,190 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Utility functions for segmentations.""" - -import math -# Import libraries -import cv2 -import numpy as np - - -def paste_instance_masks(masks, - detected_boxes, - image_height, - image_width): - """Paste instance masks to generate the image segmentation results. - - Args: - masks: a numpy array of shape [N, mask_height, mask_width] representing the - instance masks w.r.t. the `detected_boxes`. - detected_boxes: a numpy array of shape [N, 4] representing the reference - bounding boxes. - image_height: an integer representing the height of the image. - image_width: an integer representing the width of the image. - - Returns: - segms: a numpy array of shape [N, image_height, image_width] representing - the instance masks *pasted* on the image canvas. - """ - - def expand_boxes(boxes, scale): - """Expands an array of boxes by a given scale.""" - # Reference: https://github.com/facebookresearch/Detectron/blob/master/detectron/utils/boxes.py#L227 # pylint: disable=line-too-long - # The `boxes` in the reference implementation is in [x1, y1, x2, y2] form, - # whereas `boxes` here is in [x1, y1, w, h] form - w_half = boxes[:, 2] * .5 - h_half = boxes[:, 3] * .5 - x_c = boxes[:, 0] + w_half - y_c = boxes[:, 1] + h_half - - w_half *= scale - h_half *= scale - - boxes_exp = np.zeros(boxes.shape) - boxes_exp[:, 0] = x_c - w_half - boxes_exp[:, 2] = x_c + w_half - boxes_exp[:, 1] = y_c - h_half - boxes_exp[:, 3] = y_c + h_half - - return boxes_exp - - # Reference: https://github.com/facebookresearch/Detectron/blob/master/detectron/core/test.py#L812 # pylint: disable=line-too-long - # To work around an issue with cv2.resize (it seems to automatically pad - # with repeated border values), we manually zero-pad the masks by 1 pixel - # prior to resizing back to the original image resolution. This prevents - # "top hat" artifacts. We therefore need to expand the reference boxes by an - # appropriate factor. 
- _, mask_height, mask_width = masks.shape - scale = max((mask_width + 2.0) / mask_width, - (mask_height + 2.0) / mask_height) - - ref_boxes = expand_boxes(detected_boxes, scale) - ref_boxes = ref_boxes.astype(np.int32) - padded_mask = np.zeros((mask_height + 2, mask_width + 2), dtype=np.float32) - segms = [] - for mask_ind, mask in enumerate(masks): - im_mask = np.zeros((image_height, image_width), dtype=np.uint8) - # Process mask inside bounding boxes. - padded_mask[1:-1, 1:-1] = mask[:, :] - - ref_box = ref_boxes[mask_ind, :] - w = ref_box[2] - ref_box[0] + 1 - h = ref_box[3] - ref_box[1] + 1 - w = np.maximum(w, 1) - h = np.maximum(h, 1) - - mask = cv2.resize(padded_mask, (w, h)) - mask = np.array(mask > 0.5, dtype=np.uint8) - - x_0 = min(max(ref_box[0], 0), image_width) - x_1 = min(max(ref_box[2] + 1, 0), image_width) - y_0 = min(max(ref_box[1], 0), image_height) - y_1 = min(max(ref_box[3] + 1, 0), image_height) - - im_mask[y_0:y_1, x_0:x_1] = mask[ - (y_0 - ref_box[1]):(y_1 - ref_box[1]), - (x_0 - ref_box[0]):(x_1 - ref_box[0]) - ] - segms.append(im_mask) - - segms = np.array(segms) - assert masks.shape[0] == segms.shape[0] - return segms - - -def paste_instance_masks_v2(masks, - detected_boxes, - image_height, - image_width): - """Paste instance masks to generate the image segmentation (v2). - - Args: - masks: a numpy array of shape [N, mask_height, mask_width] representing the - instance masks w.r.t. the `detected_boxes`. - detected_boxes: a numpy array of shape [N, 4] representing the reference - bounding boxes. - image_height: an integer representing the height of the image. - image_width: an integer representing the width of the image. - - Returns: - segms: a numpy array of shape [N, image_height, image_width] representing - the instance masks *pasted* on the image canvas. 
- """ - _, mask_height, mask_width = masks.shape - - segms = [] - for i, mask in enumerate(masks): - box = detected_boxes[i, :] - xmin = box[0] - ymin = box[1] - xmax = xmin + box[2] - ymax = ymin + box[3] - - # Sample points of the cropped mask w.r.t. the image grid. - # Note that these coordinates may fall beyond the image. - # Pixel clipping will happen after warping. - xmin_int = int(math.floor(xmin)) - xmax_int = int(math.ceil(xmax)) - ymin_int = int(math.floor(ymin)) - ymax_int = int(math.ceil(ymax)) - - alpha = box[2] / (1.0 * mask_width) - beta = box[3] / (1.0 * mask_height) - # pylint: disable=invalid-name - # Transformation from mask pixel indices to image coordinate. - M_mask_to_image = np.array( - [[alpha, 0, xmin], - [0, beta, ymin], - [0, 0, 1]], - dtype=np.float32) - # Transformation from image to cropped mask coordinate. - M_image_to_crop = np.array( - [[1, 0, -xmin_int], - [0, 1, -ymin_int], - [0, 0, 1]], - dtype=np.float32) - M = np.dot(M_image_to_crop, M_mask_to_image) - # Compensate the half pixel offset that OpenCV has in the - # warpPerspective implementation: the top-left pixel is sampled - # at (0,0), but we want it to be at (0.5, 0.5). 
- M = np.dot( - np.dot( - np.array([[1, 0, -0.5], - [0, 1, -0.5], - [0, 0, 1]], np.float32), - M), - np.array([[1, 0, 0.5], - [0, 1, 0.5], - [0, 0, 1]], np.float32)) - # pylint: enable=invalid-name - cropped_mask = cv2.warpPerspective( - mask.astype(np.float32), M, - (xmax_int - xmin_int, ymax_int - ymin_int)) - cropped_mask = np.array(cropped_mask > 0.5, dtype=np.uint8) - - img_mask = np.zeros((image_height, image_width)) - x0 = max(min(xmin_int, image_width), 0) - x1 = max(min(xmax_int, image_width), 0) - y0 = max(min(ymin_int, image_height), 0) - y1 = max(min(ymax_int, image_height), 0) - img_mask[y0:y1, x0:x1] = cropped_mask[ - (y0 - ymin_int):(y1 - ymin_int), - (x0 - xmin_int):(x1 - xmin_int)] - - segms.append(img_mask) - - segms = np.array(segms) - return segms - diff --git a/official/vision/beta/ops/nms.py b/official/vision/beta/ops/nms.py deleted file mode 100644 index 945e7896d3b2ea0d3ee37dbd20125bc15125bd50..0000000000000000000000000000000000000000 --- a/official/vision/beta/ops/nms.py +++ /dev/null @@ -1,202 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Tensorflow implementation of non max suppression.""" - -# Import libraries -import tensorflow as tf - -from official.vision.beta.ops import box_ops - - -NMS_TILE_SIZE = 512 - - -def _self_suppression(iou, _, iou_sum): - batch_size = tf.shape(iou)[0] - can_suppress_others = tf.cast( - tf.reshape(tf.reduce_max(iou, 1) <= 0.5, [batch_size, -1, 1]), iou.dtype) - iou_suppressed = tf.reshape( - tf.cast(tf.reduce_max(can_suppress_others * iou, 1) <= 0.5, iou.dtype), - [batch_size, -1, 1]) * iou - iou_sum_new = tf.reduce_sum(iou_suppressed, [1, 2]) - return [ - iou_suppressed, - tf.reduce_any(iou_sum - iou_sum_new > 0.5), iou_sum_new - ] - - -def _cross_suppression(boxes, box_slice, iou_threshold, inner_idx): - batch_size = tf.shape(boxes)[0] - new_slice = tf.slice(boxes, [0, inner_idx * NMS_TILE_SIZE, 0], - [batch_size, NMS_TILE_SIZE, 4]) - iou = box_ops.bbox_overlap(new_slice, box_slice) - ret_slice = tf.expand_dims( - tf.cast(tf.reduce_all(iou < iou_threshold, [1]), box_slice.dtype), - 2) * box_slice - return boxes, ret_slice, iou_threshold, inner_idx + 1 - - -def _suppression_loop_body(boxes, iou_threshold, output_size, idx): - """Process boxes in the range [idx*NMS_TILE_SIZE, (idx+1)*NMS_TILE_SIZE). - - Args: - boxes: a tensor with a shape of [batch_size, anchors, 4]. - iou_threshold: a float representing the threshold for deciding whether boxes - overlap too much with respect to IOU. - output_size: an int32 tensor of size [batch_size]. Representing the number - of selected boxes for each batch. - idx: an integer scalar representing induction variable. - - Returns: - boxes: updated boxes. - iou_threshold: pass down iou_threshold to the next iteration. - output_size: the updated output_size. - idx: the updated induction variable. - """ - num_tiles = tf.shape(boxes)[1] // NMS_TILE_SIZE - batch_size = tf.shape(boxes)[0] - - # Iterates over tiles that can possibly suppress the current tile. 
- box_slice = tf.slice(boxes, [0, idx * NMS_TILE_SIZE, 0], - [batch_size, NMS_TILE_SIZE, 4]) - _, box_slice, _, _ = tf.while_loop( - lambda _boxes, _box_slice, _threshold, inner_idx: inner_idx < idx, - _cross_suppression, [boxes, box_slice, iou_threshold, - tf.constant(0)]) - - # Iterates over the current tile to compute self-suppression. - iou = box_ops.bbox_overlap(box_slice, box_slice) - mask = tf.expand_dims( - tf.reshape(tf.range(NMS_TILE_SIZE), [1, -1]) > tf.reshape( - tf.range(NMS_TILE_SIZE), [-1, 1]), 0) - iou *= tf.cast(tf.logical_and(mask, iou >= iou_threshold), iou.dtype) - suppressed_iou, _, _ = tf.while_loop( - lambda _iou, loop_condition, _iou_sum: loop_condition, _self_suppression, - [iou, tf.constant(True), - tf.reduce_sum(iou, [1, 2])]) - suppressed_box = tf.reduce_sum(suppressed_iou, 1) > 0 - box_slice *= tf.expand_dims(1.0 - tf.cast(suppressed_box, box_slice.dtype), 2) - - # Uses box_slice to update the input boxes. - mask = tf.reshape( - tf.cast(tf.equal(tf.range(num_tiles), idx), boxes.dtype), [1, -1, 1, 1]) - boxes = tf.tile(tf.expand_dims( - box_slice, [1]), [1, num_tiles, 1, 1]) * mask + tf.reshape( - boxes, [batch_size, num_tiles, NMS_TILE_SIZE, 4]) * (1 - mask) - boxes = tf.reshape(boxes, [batch_size, -1, 4]) - - # Updates output_size. - output_size += tf.reduce_sum( - tf.cast(tf.reduce_any(box_slice > 0, [2]), tf.int32), [1]) - return boxes, iou_threshold, output_size, idx + 1 - - -def sorted_non_max_suppression_padded(scores, - boxes, - max_output_size, - iou_threshold): - """A wrapper that handles non-maximum suppression. - - Assumption: - * The boxes are sorted by scores unless the box is a dot (all coordinates - are zero). - * Boxes with higher scores can be used to suppress boxes with lower scores. 
- - The overal design of the algorithm is to handle boxes tile-by-tile: - - boxes = boxes.pad_to_multiply_of(tile_size) - num_tiles = len(boxes) // tile_size - output_boxes = [] - for i in range(num_tiles): - box_tile = boxes[i*tile_size : (i+1)*tile_size] - for j in range(i - 1): - suppressing_tile = boxes[j*tile_size : (j+1)*tile_size] - iou = bbox_overlap(box_tile, suppressing_tile) - # if the box is suppressed in iou, clear it to a dot - box_tile *= _update_boxes(iou) - # Iteratively handle the diagnal tile. - iou = _box_overlap(box_tile, box_tile) - iou_changed = True - while iou_changed: - # boxes that are not suppressed by anything else - suppressing_boxes = _get_suppressing_boxes(iou) - # boxes that are suppressed by suppressing_boxes - suppressed_boxes = _get_suppressed_boxes(iou, suppressing_boxes) - # clear iou to 0 for boxes that are suppressed, as they cannot be used - # to suppress other boxes any more - new_iou = _clear_iou(iou, suppressed_boxes) - iou_changed = (new_iou != iou) - iou = new_iou - # remaining boxes that can still suppress others, are selected boxes. - output_boxes.append(_get_suppressing_boxes(iou)) - if len(output_boxes) >= max_output_size: - break - - Args: - scores: a tensor with a shape of [batch_size, anchors]. - boxes: a tensor with a shape of [batch_size, anchors, 4]. - max_output_size: a scalar integer `Tensor` representing the maximum number - of boxes to be selected by non max suppression. - iou_threshold: a float representing the threshold for deciding whether boxes - overlap too much with respect to IOU. - - Returns: - nms_scores: a tensor with a shape of [batch_size, anchors]. It has same - dtype as input scores. - nms_proposals: a tensor with a shape of [batch_size, anchors, 4]. It has - same dtype as input boxes. 
- """ - batch_size = tf.shape(boxes)[0] - num_boxes = tf.shape(boxes)[1] - pad = tf.cast( - tf.math.ceil(tf.cast(num_boxes, tf.float32) / NMS_TILE_SIZE), - tf.int32) * NMS_TILE_SIZE - num_boxes - boxes = tf.pad(tf.cast(boxes, tf.float32), [[0, 0], [0, pad], [0, 0]]) - scores = tf.pad( - tf.cast(scores, tf.float32), [[0, 0], [0, pad]], constant_values=-1) - num_boxes += pad - - def _loop_cond(unused_boxes, unused_threshold, output_size, idx): - return tf.logical_and( - tf.reduce_min(output_size) < max_output_size, - idx < num_boxes // NMS_TILE_SIZE) - - selected_boxes, _, output_size, _ = tf.while_loop( - _loop_cond, _suppression_loop_body, [ - boxes, iou_threshold, - tf.zeros([batch_size], tf.int32), - tf.constant(0) - ]) - idx = num_boxes - tf.cast( - tf.nn.top_k( - tf.cast(tf.reduce_any(selected_boxes > 0, [2]), tf.int32) * - tf.expand_dims(tf.range(num_boxes, 0, -1), 0), max_output_size)[0], - tf.int32) - idx = tf.minimum(idx, num_boxes - 1) - idx = tf.reshape( - idx + tf.reshape(tf.range(batch_size) * num_boxes, [-1, 1]), [-1]) - boxes = tf.reshape( - tf.gather(tf.reshape(boxes, [-1, 4]), idx), - [batch_size, max_output_size, 4]) - boxes = boxes * tf.cast( - tf.reshape(tf.range(max_output_size), [1, -1, 1]) < tf.reshape( - output_size, [-1, 1, 1]), boxes.dtype) - scores = tf.reshape( - tf.gather(tf.reshape(scores, [-1, 1]), idx), - [batch_size, max_output_size]) - scores = scores * tf.cast( - tf.reshape(tf.range(max_output_size), [1, -1]) < tf.reshape( - output_size, [-1, 1]), scores.dtype) - return scores, boxes diff --git a/official/vision/beta/ops/preprocess_ops.py b/official/vision/beta/ops/preprocess_ops.py deleted file mode 100644 index 348fa0a79de0efe298681f0d73d136701298d672..0000000000000000000000000000000000000000 --- a/official/vision/beta/ops/preprocess_ops.py +++ /dev/null @@ -1,839 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Preprocessing ops.""" - -import math -from typing import Optional -from six.moves import range -import tensorflow as tf - -from official.vision.beta.ops import augment -from official.vision.beta.ops import box_ops - -CENTER_CROP_FRACTION = 0.875 - - -def clip_or_pad_to_fixed_size(input_tensor, size, constant_values=0): - """Pads data to a fixed length at the first dimension. - - Args: - input_tensor: `Tensor` with any dimension. - size: `int` number for the first dimension of output Tensor. - constant_values: `int` value assigned to the paddings. - - Returns: - `Tensor` with the first dimension padded to `size`. - """ - input_shape = input_tensor.get_shape().as_list() - padding_shape = [] - - # Computes the padding length on the first dimension, clip input tensor if it - # is longer than `size`. - input_length = tf.shape(input_tensor)[0] - input_length = tf.clip_by_value(input_length, 0, size) - input_tensor = input_tensor[:input_length] - - padding_length = tf.maximum(0, size - input_length) - padding_shape.append(padding_length) - - # Copies shapes of the rest of input shape dimensions. - for i in range(1, len(input_shape)): - padding_shape.append(tf.shape(input_tensor)[i]) - - # Pads input tensor to the fixed first dimension. 
- paddings = tf.cast(constant_values * tf.ones(padding_shape), - input_tensor.dtype) - padded_tensor = tf.concat([input_tensor, paddings], axis=0) - output_shape = input_shape - output_shape[0] = size - padded_tensor.set_shape(output_shape) - return padded_tensor - - -def normalize_image(image, - offset=(0.485, 0.456, 0.406), - scale=(0.229, 0.224, 0.225)): - """Normalizes the image to zero mean and unit variance.""" - with tf.name_scope('normalize_image'): - image = tf.image.convert_image_dtype(image, dtype=tf.float32) - offset = tf.constant(offset) - offset = tf.expand_dims(offset, axis=0) - offset = tf.expand_dims(offset, axis=0) - image -= offset - - scale = tf.constant(scale) - scale = tf.expand_dims(scale, axis=0) - scale = tf.expand_dims(scale, axis=0) - image /= scale - return image - - -def compute_padded_size(desired_size, stride): - """Compute the padded size given the desired size and the stride. - - The padded size will be the smallest rectangle, such that each dimension is - the smallest multiple of the stride which is larger than the desired - dimension. For example, if desired_size = (100, 200) and stride = 32, - the output padded_size = (128, 224). - - Args: - desired_size: a `Tensor` or `int` list/tuple of two elements representing - [height, width] of the target output image size. - stride: an integer, the stride of the backbone network. - - Returns: - padded_size: a `Tensor` or `int` list/tuple of two elements representing - [height, width] of the padded output image size. 
-  """
-  if isinstance(desired_size, list) or isinstance(desired_size, tuple):
-    padded_size = [int(math.ceil(d * 1.0 / stride) * stride)
-                   for d in desired_size]
-  else:
-    padded_size = tf.cast(
-        tf.math.ceil(
-            tf.cast(desired_size, dtype=tf.float32) / stride) * stride,
-        tf.int32)
-  return padded_size
-
-
-def resize_and_crop_image(image,
-                          desired_size,
-                          padded_size,
-                          aug_scale_min=1.0,
-                          aug_scale_max=1.0,
-                          seed=1,
-                          method=tf.image.ResizeMethod.BILINEAR):
-  """Resizes the input image to output size (RetinaNet style).
-
-  Resize and pad images given the desired output size of the image and
-  stride size.
-
-  Here are the preprocessing steps.
-  1. For a given image, keep its aspect ratio and rescale the image to make it
-     the largest rectangle to be bounded by the rectangle specified by the
-     `desired_size`.
-  2. Pad the rescaled image to the padded_size.
-
-  Args:
-    image: a `Tensor` of shape [height, width, 3] representing an image.
-    desired_size: a `Tensor` or `int` list/tuple of two elements representing
-      [height, width] of the desired actual output image size.
-    padded_size: a `Tensor` or `int` list/tuple of two elements representing
-      [height, width] of the padded output image size. Padding will be applied
-      after scaling the image to the desired_size.
-    aug_scale_min: a `float` with range between [0, 1.0] representing minimum
-      random scale applied to desired_size for training scale jittering.
-    aug_scale_max: a `float` with range between [1.0, inf] representing maximum
-      random scale applied to desired_size for training scale jittering.
-    seed: seed for random scale jittering.
-    method: function to resize input image to scaled image.
-
-  Returns:
-    output_image: `Tensor` of shape [height, width, 3] where [height, width]
-      equals to `output_size`.
-    image_info: a 2D `Tensor` that encodes the information of the image and the
-      applied preprocessing. It is in the format of
-      [[original_height, original_width], [desired_height, desired_width],
-       [y_scale, x_scale], [y_offset, x_offset]], where [desired_height,
-      desired_width] is the actual scaled image size, and [y_scale, x_scale] is
-      the scaling factor, which is the ratio of
-      scaled dimension / original dimension.
-  """
-  with tf.name_scope('resize_and_crop_image'):
-    image_size = tf.cast(tf.shape(image)[0:2], tf.float32)
-
-    random_jittering = (aug_scale_min != 1.0 or aug_scale_max != 1.0)
-
-    if random_jittering:
-      random_scale = tf.random.uniform(
-          [], aug_scale_min, aug_scale_max, seed=seed)
-      scaled_size = tf.round(random_scale * desired_size)
-    else:
-      scaled_size = desired_size
-
-    scale = tf.minimum(
-        scaled_size[0] / image_size[0], scaled_size[1] / image_size[1])
-    scaled_size = tf.round(image_size * scale)
-
-    # Computes 2D image_scale.
-    image_scale = scaled_size / image_size
-
-    # Selects non-zero random offset (x, y) if scaled image is larger than
-    # desired_size.
-    if random_jittering:
-      max_offset = scaled_size - desired_size
-      max_offset = tf.where(
-          tf.less(max_offset, 0), tf.zeros_like(max_offset), max_offset)
-      offset = max_offset * tf.random.uniform([2,], 0, 1, seed=seed)
-      offset = tf.cast(offset, tf.int32)
-    else:
-      offset = tf.zeros((2,), tf.int32)
-
-    scaled_image = tf.image.resize(
-        image, tf.cast(scaled_size, tf.int32), method=method)
-
-    if random_jittering:
-      scaled_image = scaled_image[
-          offset[0]:offset[0] + desired_size[0],
-          offset[1]:offset[1] + desired_size[1], :]
-
-    output_image = tf.image.pad_to_bounding_box(
-        scaled_image, 0, 0, padded_size[0], padded_size[1])
-
-    image_info = tf.stack([
-        image_size,
-        tf.constant(desired_size, dtype=tf.float32),
-        image_scale,
-        tf.cast(offset, tf.float32)])
-    return output_image, image_info
-
-
-def resize_and_crop_image_v2(image,
-                             short_side,
-                             long_side,
-                             padded_size,
-                             aug_scale_min=1.0,
-                             aug_scale_max=1.0,
-                             seed=1,
-                             method=tf.image.ResizeMethod.BILINEAR):
-  """Resizes the input image to output size (Faster R-CNN style).
-
-  Resize and pad images given the specified short / long side length and the
-  stride size.
-
-  Here are the preprocessing steps.
-  1. For a given image, keep its aspect ratio and first try to rescale the short
-     side of the original image to `short_side`.
-  2. If the scaled image after 1 has a long side that exceeds `long_side`, keep
-     the aspect ratio and rescal the long side of the image to `long_side`.
-  2. Pad the rescaled image to the padded_size.
-
-  Args:
-    image: a `Tensor` of shape [height, width, 3] representing an image.
-    short_side: a scalar `Tensor` or `int` representing the desired short side
-      to be rescaled to.
-    long_side: a scalar `Tensor` or `int` representing the desired long side to
-      be rescaled to.
-    padded_size: a `Tensor` or `int` list/tuple of two elements representing
-      [height, width] of the padded output image size. Padding will be applied
-      after scaling the image to the desired_size.
-    aug_scale_min: a `float` with range between [0, 1.0] representing minimum
-      random scale applied to desired_size for training scale jittering.
-    aug_scale_max: a `float` with range between [1.0, inf] representing maximum
-      random scale applied to desired_size for training scale jittering.
-    seed: seed for random scale jittering.
-    method: function to resize input image to scaled image.
-
-  Returns:
-    output_image: `Tensor` of shape [height, width, 3] where [height, width]
-      equals to `output_size`.
-    image_info: a 2D `Tensor` that encodes the information of the image and the
-      applied preprocessing. It is in the format of
-      [[original_height, original_width], [desired_height, desired_width],
-       [y_scale, x_scale], [y_offset, x_offset]], where [desired_height,
-      desired_width] is the actual scaled image size, and [y_scale, x_scale] is
-      the scaling factor, which is the ratio of
-      scaled dimension / original dimension.
-  """
-  with tf.name_scope('resize_and_crop_image_v2'):
-    image_size = tf.cast(tf.shape(image)[0:2], tf.float32)
-
-    scale_using_short_side = (
-        short_side / tf.math.minimum(image_size[0], image_size[1]))
-    scale_using_long_side = (
-        long_side / tf.math.maximum(image_size[0], image_size[1]))
-
-    scaled_size = tf.math.round(image_size * scale_using_short_side)
-    scaled_size = tf.where(
-        tf.math.greater(
-            tf.math.maximum(scaled_size[0], scaled_size[1]), long_side),
-        tf.math.round(image_size * scale_using_long_side),
-        scaled_size)
-    desired_size = scaled_size
-
-    random_jittering = (aug_scale_min != 1.0 or aug_scale_max != 1.0)
-
-    if random_jittering:
-      random_scale = tf.random.uniform(
-          [], aug_scale_min, aug_scale_max, seed=seed)
-      scaled_size = tf.math.round(random_scale * scaled_size)
-
-    # Computes 2D image_scale.
-    image_scale = scaled_size / image_size
-
-    # Selects non-zero random offset (x, y) if scaled image is larger than
-    # desired_size.
-    if random_jittering:
-      max_offset = scaled_size - desired_size
-      max_offset = tf.where(
-          tf.math.less(max_offset, 0), tf.zeros_like(max_offset), max_offset)
-      offset = max_offset * tf.random.uniform([2,], 0, 1, seed=seed)
-      offset = tf.cast(offset, tf.int32)
-    else:
-      offset = tf.zeros((2,), tf.int32)
-
-    scaled_image = tf.image.resize(
-        image, tf.cast(scaled_size, tf.int32), method=method)
-
-    if random_jittering:
-      scaled_image = scaled_image[
-          offset[0]:offset[0] + desired_size[0],
-          offset[1]:offset[1] + desired_size[1], :]
-
-    output_image = tf.image.pad_to_bounding_box(
-        scaled_image, 0, 0, padded_size[0], padded_size[1])
-
-    image_info = tf.stack([
-        image_size,
-        tf.cast(desired_size, dtype=tf.float32),
-        image_scale,
-        tf.cast(offset, tf.float32)])
-    return output_image, image_info
-
-
-def center_crop_image(image):
-  """Center crop a square shape slice from the input image.
-
-  It crops a square shape slice from the image. The side of the actual crop
-  is 224 / 256 = 0.875 of the short side of the original image. References:
-  [1] Very Deep Convolutional Networks for Large-Scale Image Recognition
-      https://arxiv.org/abs/1409.1556
-  [2] Deep Residual Learning for Image Recognition
-      https://arxiv.org/abs/1512.03385
-
-  Args:
-    image: a Tensor of shape [height, width, 3] representing the input image.
-
-  Returns:
-    cropped_image: a Tensor representing the center cropped image.
-  """
-  with tf.name_scope('center_crop_image'):
-    image_size = tf.cast(tf.shape(image)[:2], dtype=tf.float32)
-    crop_size = (
-        CENTER_CROP_FRACTION * tf.math.minimum(image_size[0], image_size[1]))
-    crop_offset = tf.cast((image_size - crop_size) / 2.0, dtype=tf.int32)
-    crop_size = tf.cast(crop_size, dtype=tf.int32)
-    cropped_image = image[
-        crop_offset[0]:crop_offset[0] + crop_size,
-        crop_offset[1]:crop_offset[1] + crop_size, :]
-    return cropped_image
-
-
-def center_crop_image_v2(image_bytes, image_shape):
-  """Center crop a square shape slice from the input image.
-
-  It crops a square shape slice from the image. The side of the actual crop
-  is 224 / 256 = 0.875 of the short side of the original image. References:
-  [1] Very Deep Convolutional Networks for Large-Scale Image Recognition
-      https://arxiv.org/abs/1409.1556
-  [2] Deep Residual Learning for Image Recognition
-      https://arxiv.org/abs/1512.03385
-
-  This is a faster version of `center_crop_image` which takes the original
-  image bytes and image size as the inputs, and partially decode the JPEG
-  bytes according to the center crop.
-
-  Args:
-    image_bytes: a Tensor of type string representing the raw image bytes.
-    image_shape: a Tensor specifying the shape of the raw image.
-
-  Returns:
-    cropped_image: a Tensor representing the center cropped image.
-  """
-  with tf.name_scope('center_image_crop_v2'):
-    image_shape = tf.cast(image_shape, tf.float32)
-    crop_size = (
-        CENTER_CROP_FRACTION * tf.math.minimum(image_shape[0], image_shape[1]))
-    crop_offset = tf.cast((image_shape - crop_size) / 2.0, dtype=tf.int32)
-    crop_size = tf.cast(crop_size, dtype=tf.int32)
-    crop_window = tf.stack(
-        [crop_offset[0], crop_offset[1], crop_size, crop_size])
-    cropped_image = tf.image.decode_and_crop_jpeg(
-        image_bytes, crop_window, channels=3)
-    return cropped_image
-
-
-def random_crop_image(image,
-                      aspect_ratio_range=(3. / 4., 4. / 3.),
-                      area_range=(0.08, 1.0),
-                      max_attempts=10,
-                      seed=1):
-  """Randomly crop an arbitrary shaped slice from the input image.
-
-  Args:
-    image: a Tensor of shape [height, width, 3] representing the input image.
-    aspect_ratio_range: a list of floats. The cropped area of the image must
-      have an aspect ratio = width / height within this range.
-    area_range: a list of floats. The cropped reas of the image must contain
-      a fraction of the input image within this range.
-    max_attempts: the number of attempts at generating a cropped region of the
-      image of the specified constraints. After max_attempts failures, return
-      the entire image.
-    seed: the seed of the random generator.
-
-  Returns:
-    cropped_image: a Tensor representing the random cropped image. Can be the
-      original image if max_attempts is exhausted.
-  """
-  with tf.name_scope('random_crop_image'):
-    crop_offset, crop_size, _ = tf.image.sample_distorted_bounding_box(
-        tf.shape(image),
-        tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4]),
-        seed=seed,
-        min_object_covered=area_range[0],
-        aspect_ratio_range=aspect_ratio_range,
-        area_range=area_range,
-        max_attempts=max_attempts)
-    cropped_image = tf.slice(image, crop_offset, crop_size)
-    return cropped_image
-
-
-def random_crop_image_v2(image_bytes,
-                         image_shape,
-                         aspect_ratio_range=(3. / 4., 4. / 3.),
-                         area_range=(0.08, 1.0),
-                         max_attempts=10,
-                         seed=1):
-  """Randomly crop an arbitrary shaped slice from the input image.
-
-  This is a faster version of `random_crop_image` which takes the original
-  image bytes and image size as the inputs, and partially decode the JPEG
-  bytes according to the generated crop.
-
-  Args:
-    image_bytes: a Tensor of type string representing the raw image bytes.
-    image_shape: a Tensor specifying the shape of the raw image.
-    aspect_ratio_range: a list of floats. The cropped area of the image must
-      have an aspect ratio = width / height within this range.
-    area_range: a list of floats. The cropped reas of the image must contain
-      a fraction of the input image within this range.
-    max_attempts: the number of attempts at generating a cropped region of the
-      image of the specified constraints. After max_attempts failures, return
-      the entire image.
-    seed: the seed of the random generator.
-
-  Returns:
-    cropped_image: a Tensor representing the random cropped image. Can be the
-      original image if max_attempts is exhausted.
-  """
-  with tf.name_scope('random_crop_image_v2'):
-    crop_offset, crop_size, _ = tf.image.sample_distorted_bounding_box(
-        image_shape,
-        tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4]),
-        seed=seed,
-        min_object_covered=area_range[0],
-        aspect_ratio_range=aspect_ratio_range,
-        area_range=area_range,
-        max_attempts=max_attempts)
-    offset_y, offset_x, _ = tf.unstack(crop_offset)
-    crop_height, crop_width, _ = tf.unstack(crop_size)
-    crop_window = tf.stack([offset_y, offset_x, crop_height, crop_width])
-    cropped_image = tf.image.decode_and_crop_jpeg(
-        image_bytes, crop_window, channels=3)
-    return cropped_image
-
-
-def resize_and_crop_boxes(boxes,
-                          image_scale,
-                          output_size,
-                          offset):
-  """Resizes boxes to output size with scale and offset.
-
-  Args:
-    boxes: `Tensor` of shape [N, 4] representing ground truth boxes.
-    image_scale: 2D float `Tensor` representing scale factors that apply to
-      [height, width] of input image.
-    output_size: 2D `Tensor` or `int` representing [height, width] of target
-      output image size.
-    offset: 2D `Tensor` representing top-left corner [y0, x0] to crop scaled
-      boxes.
-
-  Returns:
-    boxes: `Tensor` of shape [N, 4] representing the scaled boxes.
-  """
-  with tf.name_scope('resize_and_crop_boxes'):
-    # Adjusts box coordinates based on image_scale and offset.
-    boxes *= tf.tile(tf.expand_dims(image_scale, axis=0), [1, 2])
-    boxes -= tf.tile(tf.expand_dims(offset, axis=0), [1, 2])
-    # Clips the boxes.
-    boxes = box_ops.clip_boxes(boxes, output_size)
-    return boxes
-
-
-def resize_and_crop_masks(masks,
-                          image_scale,
-                          output_size,
-                          offset):
-  """Resizes boxes to output size with scale and offset.
-
-  Args:
-    masks: `Tensor` of shape [N, H, W, 1] representing ground truth masks.
-    image_scale: 2D float `Tensor` representing scale factors that apply to
-      [height, width] of input image.
-    output_size: 2D `Tensor` or `int` representing [height, width] of target
-      output image size.
-    offset: 2D `Tensor` representing top-left corner [y0, x0] to crop scaled
-      boxes.
-
-  Returns:
-    masks: `Tensor` of shape [N, H, W, 1] representing the scaled masks.
-  """
-  with tf.name_scope('resize_and_crop_masks'):
-    mask_size = tf.cast(tf.shape(masks)[1:3], tf.float32)
-    # Pad masks to avoid empty mask annotations.
-    masks = tf.concat(
-        [tf.zeros([1, mask_size[0], mask_size[1], 1]), masks], axis=0)
-
-    scaled_size = tf.cast(image_scale * mask_size, tf.int32)
-    scaled_masks = tf.image.resize(
-        masks, scaled_size, method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
-    offset = tf.cast(offset, tf.int32)
-    scaled_masks = scaled_masks[
-        :,
-        offset[0]:offset[0] + output_size[0],
-        offset[1]:offset[1] + output_size[1],
-        :]
-
-    output_masks = tf.image.pad_to_bounding_box(
-        scaled_masks, 0, 0, output_size[0], output_size[1])
-    # Remove padding.
-    output_masks = output_masks[1::]
-    return output_masks
-
-
-def horizontal_flip_image(image):
-  """Flips image horizontally."""
-  return tf.image.flip_left_right(image)
-
-
-def horizontal_flip_boxes(normalized_boxes):
-  """Flips normalized boxes horizontally."""
-  ymin, xmin, ymax, xmax = tf.split(
-      value=normalized_boxes, num_or_size_splits=4, axis=1)
-  flipped_xmin = tf.subtract(1.0, xmax)
-  flipped_xmax = tf.subtract(1.0, xmin)
-  flipped_boxes = tf.concat([ymin, flipped_xmin, ymax, flipped_xmax], 1)
-  return flipped_boxes
-
-
-def horizontal_flip_masks(masks):
-  """Flips masks horizontally."""
-  return masks[:, :, ::-1]
-
-
-def random_horizontal_flip(image, normalized_boxes=None, masks=None, seed=1):
-  """Randomly flips input image and bounding boxes."""
-  with tf.name_scope('random_horizontal_flip'):
-    do_flip = tf.greater(tf.random.uniform([], seed=seed), 0.5)
-
-    image = tf.cond(
-        do_flip,
-        lambda: horizontal_flip_image(image),
-        lambda: image)
-
-    if normalized_boxes is not None:
-      normalized_boxes = tf.cond(
-          do_flip,
-          lambda: horizontal_flip_boxes(normalized_boxes),
-          lambda: normalized_boxes)
-
-    if masks is not None:
-      masks = tf.cond(
-          do_flip,
-          lambda: horizontal_flip_masks(masks),
-          lambda: masks)
-
-    return image, normalized_boxes, masks
-
-
-def color_jitter(image: tf.Tensor,
-                 brightness: Optional[float] = 0.,
-                 contrast: Optional[float] = 0.,
-                 saturation: Optional[float] = 0.,
-                 seed: Optional[int] = None) -> tf.Tensor:
-  """Applies color jitter to an image, similarly to torchvision`s ColorJitter.
-
-  Args:
-    image (tf.Tensor): Of shape [height, width, 3] and type uint8.
-    brightness (float, optional): Magnitude for brightness jitter. Defaults to
-      0.
-    contrast (float, optional): Magnitude for contrast jitter. Defaults to 0.
-    saturation (float, optional): Magnitude for saturation jitter. Defaults to
-      0.
-    seed (int, optional): Random seed. Defaults to None.
-
-  Returns:
-    tf.Tensor: The augmented `image` of type uint8.
-  """
-  image = tf.cast(image, dtype=tf.uint8)
-  image = random_brightness(image, brightness, seed=seed)
-  image = random_contrast(image, contrast, seed=seed)
-  image = random_saturation(image, saturation, seed=seed)
-  return image
-
-
-def random_brightness(image: tf.Tensor,
-                      brightness: float = 0.,
-                      seed: Optional[int] = None) -> tf.Tensor:
-  """Jitters brightness of an image.
-
-  Args:
-    image (tf.Tensor): Of shape [height, width, 3] and type uint8.
-    brightness (float, optional): Magnitude for brightness jitter. Defaults to
-      0.
-    seed (int, optional): Random seed. Defaults to None.
-
-  Returns:
-    tf.Tensor: The augmented `image` of type uint8.
-  """
-  assert brightness >= 0, '`brightness` must be positive'
-  brightness = tf.random.uniform([],
-                                 max(0, 1 - brightness),
-                                 1 + brightness,
-                                 seed=seed,
-                                 dtype=tf.float32)
-  return augment.brightness(image, brightness)
-
-
-def random_contrast(image: tf.Tensor,
-                    contrast: float = 0.,
-                    seed: Optional[int] = None) -> tf.Tensor:
-  """Jitters contrast of an image, similarly to torchvision`s ColorJitter.
-
-  Args:
-    image (tf.Tensor): Of shape [height, width, 3] and type uint8.
-    contrast (float, optional): Magnitude for contrast jitter. Defaults to 0.
-    seed (int, optional): Random seed. Defaults to None.
-
-  Returns:
-    tf.Tensor: The augmented `image` of type uint8.
-  """
-  assert contrast >= 0, '`contrast` must be positive'
-  contrast = tf.random.uniform([],
-                               max(0, 1 - contrast),
-                               1 + contrast,
-                               seed=seed,
-                               dtype=tf.float32)
-  return augment.contrast(image, contrast)
-
-
-def random_saturation(image: tf.Tensor,
-                      saturation: float = 0.,
-                      seed: Optional[int] = None) -> tf.Tensor:
-  """Jitters saturation of an image, similarly to torchvision`s ColorJitter.
-
-  Args:
-    image (tf.Tensor): Of shape [height, width, 3] and type uint8.
-    saturation (float, optional): Magnitude for saturation jitter. Defaults to
-      0.
-    seed (int, optional): Random seed. Defaults to None.
-
-  Returns:
-    tf.Tensor: The augmented `image` of type uint8.
-  """
-  assert saturation >= 0, '`saturation` must be positive'
-  saturation = tf.random.uniform([],
-                                 max(0, 1 - saturation),
-                                 1 + saturation,
-                                 seed=seed,
-                                 dtype=tf.float32)
-  return _saturation(image, saturation)
-
-
-def _saturation(image: tf.Tensor,
-                saturation: Optional[float] = 0.) -> tf.Tensor:
-  return augment.blend(
-      tf.repeat(tf.image.rgb_to_grayscale(image), 3, axis=-1), image,
-      saturation)
-
-
-def random_crop_image_with_boxes_and_labels(img, boxes, labels, min_scale,
-                                            aspect_ratio_range,
-                                            min_overlap_params, max_retry):
-  """Crops a random slice from the input image.
-
-  The function will correspondingly recompute the bounding boxes and filter out
-  outside boxes and their labels.
-
-  References:
-  [1] End-to-End Object Detection with Transformers
-  https://arxiv.org/abs/2005.12872
-
-  The preprocessing steps:
-  1. Sample a minimum IoU overlap.
-  2. For each trial, sample the new image width, height, and top-left corner.
-  3. Compute the IoUs of bounding boxes with the cropped image and retry if
-    the maximum IoU is below the sampled threshold.
-  4. Find boxes whose centers are in the cropped image.
-  5. Compute new bounding boxes in the cropped region and only select those
-    boxes' labels.
-
-  Args:
-    img: a 'Tensor' of shape [height, width, 3] representing the input image.
-    boxes: a 'Tensor' of shape [N, 4] representing the ground-truth bounding
-      boxes with (ymin, xmin, ymax, xmax).
-    labels: a 'Tensor' of shape [N,] representing the class labels of the boxes.
-    min_scale: a 'float' in [0.0, 1.0) indicating the lower bound of the random
-      scale variable.
-    aspect_ratio_range: a list of two 'float' that specifies the lower and upper
-      bound of the random aspect ratio.
-    min_overlap_params: a list of four 'float' representing the min value, max
-      value, step size, and offset for the minimum overlap sample.
-    max_retry: an 'int' representing the number of trials for cropping. If it is
-      exhausted, no cropping will be performed.
-
-  Returns:
-    img: a Tensor representing the random cropped image. Can be the
-      original image if max_retry is exhausted.
-    boxes: a Tensor representing the bounding boxes in the cropped image.
-    labels: a Tensor representing the new bounding boxes' labels.
-  """
-
-  shape = tf.shape(img)
-  original_h = shape[0]
-  original_w = shape[1]
-
-  minval, maxval, step, offset = min_overlap_params
-
-  min_overlap = tf.math.floordiv(
-      tf.random.uniform([], minval=minval, maxval=maxval), step) * step - offset
-
-  min_overlap = tf.clip_by_value(min_overlap, 0.0, 1.1)
-
-  if min_overlap > 1.0:
-    return img, boxes, labels
-
-  aspect_ratio_low = aspect_ratio_range[0]
-  aspect_ratio_high = aspect_ratio_range[1]
-
-  for _ in tf.range(max_retry):
-    scale_h = tf.random.uniform([], min_scale, 1.0)
-    scale_w = tf.random.uniform([], min_scale, 1.0)
-    new_h = tf.cast(
-        scale_h * tf.cast(original_h, dtype=tf.float32), dtype=tf.int32)
-    new_w = tf.cast(
-        scale_w * tf.cast(original_w, dtype=tf.float32), dtype=tf.int32)
-
-    # Aspect ratio has to be in the prespecified range
-    aspect_ratio = new_h / new_w
-    if aspect_ratio_low > aspect_ratio or aspect_ratio > aspect_ratio_high:
-      continue
-
-    left = tf.random.uniform([], 0, original_w - new_w, dtype=tf.int32)
-    right = left + new_w
-    top = tf.random.uniform([], 0, original_h - new_h, dtype=tf.int32)
-    bottom = top + new_h
-
-    normalized_left = tf.cast(
-        left, dtype=tf.float32) / tf.cast(
-            original_w, dtype=tf.float32)
-    normalized_right = tf.cast(
-        right, dtype=tf.float32) / tf.cast(
-            original_w, dtype=tf.float32)
-    normalized_top = tf.cast(
-        top, dtype=tf.float32) / tf.cast(
-            original_h, dtype=tf.float32)
-    normalized_bottom = tf.cast(
-        bottom, dtype=tf.float32) / tf.cast(
-            original_h, dtype=tf.float32)
-
-    cropped_box = tf.expand_dims(
-        tf.stack([
-            normalized_top,
-            normalized_left,
-            normalized_bottom,
-            normalized_right,
-        ]),
-        axis=0)
-    iou = box_ops.bbox_overlap(
-        tf.expand_dims(cropped_box, axis=0),
-        tf.expand_dims(boxes, axis=0))  # (1, 1, n_ground_truth)
-    iou = tf.squeeze(iou, axis=[0, 1])
-
-    # If not a single bounding box has a Jaccard overlap of greater than
-    # the minimum, try again
-    if tf.reduce_max(iou) < min_overlap:
-      continue
-
-    centroids = box_ops.yxyx_to_cycxhw(boxes)
-    mask = tf.math.logical_and(
-        tf.math.logical_and(centroids[:, 0] > normalized_top,
-                            centroids[:, 0] < normalized_bottom),
-        tf.math.logical_and(centroids[:, 1] > normalized_left,
-                            centroids[:, 1] < normalized_right))
-    # If not a single bounding box has its center in the crop, try again.
-    if tf.reduce_sum(tf.cast(mask, dtype=tf.int32)) > 0:
-      indices = tf.squeeze(tf.where(mask), axis=1)
-
-      filtered_boxes = tf.gather(boxes, indices)
-
-      boxes = tf.clip_by_value(
-          (filtered_boxes[..., :] * tf.cast(
-              tf.stack([original_h, original_w, original_h, original_w]),
-              dtype=tf.float32) -
-           tf.cast(tf.stack([top, left, top, left]), dtype=tf.float32)) /
-          tf.cast(tf.stack([new_h, new_w, new_h, new_w]), dtype=tf.float32),
-          0.0, 1.0)
-
-      img = tf.image.crop_to_bounding_box(img, top, left, bottom - top,
-                                          right - left)
-
-      labels = tf.gather(labels, indices)
-      break
-
-  return img, boxes, labels
-
-
-def random_crop(image,
-                boxes,
-                labels,
-                min_scale=0.3,
-                aspect_ratio_range=(0.5, 2.0),
-                min_overlap_params=(0.0, 1.4, 0.2, 0.1),
-                max_retry=50,
-                seed=None):
-  """Randomly crop the image and boxes, filtering labels.
-
-  Args:
-    image: a 'Tensor' of shape [height, width, 3] representing the input image.
-    boxes: a 'Tensor' of shape [N, 4] representing the ground-truth bounding
-      boxes with (ymin, xmin, ymax, xmax).
-    labels: a 'Tensor' of shape [N,] representing the class labels of the boxes.
-    min_scale: a 'float' in [0.0, 1.0) indicating the lower bound of the random
-      scale variable.
-    aspect_ratio_range: a list of two 'float' that specifies the lower and upper
-      bound of the random aspect ratio.
-    min_overlap_params: a list of four 'float' representing the min value, max
-      value, step size, and offset for the minimum overlap sample.
-    max_retry: an 'int' representing the number of trials for cropping. If it is
-      exhausted, no cropping will be performed.
-    seed: the random number seed of int, but could be None.
-
-  Returns:
-    image: a Tensor representing the random cropped image. Can be the
-      original image if max_retry is exhausted.
-    boxes: a Tensor representing the bounding boxes in the cropped image.
-    labels: a Tensor representing the new bounding boxes' labels.
-  """
-  with tf.name_scope('random_crop'):
-    do_crop = tf.greater(tf.random.uniform([], seed=seed), 0.5)
-    if do_crop:
-      return random_crop_image_with_boxes_and_labels(image, boxes, labels,
-                                                     min_scale,
-                                                     aspect_ratio_range,
-                                                     min_overlap_params,
-                                                     max_retry)
-    else:
-      return image, boxes, labels
diff --git a/official/vision/beta/ops/preprocess_ops_3d.py b/official/vision/beta/ops/preprocess_ops_3d.py
deleted file mode 100644
index ad9d03029dc951996792022f410ca943b3d0f314..0000000000000000000000000000000000000000
--- a/official/vision/beta/ops/preprocess_ops_3d.py
+++ /dev/null
@@ -1,355 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Utils for processing video dataset features."""
-
-from typing import Optional, Tuple
-import tensorflow as tf
-
-
-def _sample_or_pad_sequence_indices(sequence: tf.Tensor,
-                                    num_steps: int,
-                                    stride: int,
-                                    offset: tf.Tensor) -> tf.Tensor:
-  """Returns indices to take for sampling or padding sequences to fixed size."""
-  sequence_length = tf.shape(sequence)[0]
-  sel_idx = tf.range(sequence_length)
-
-  # Repeats sequence until num_steps are available in total.
-  max_length = num_steps * stride + offset
-  num_repeats = tf.math.floordiv(
-      max_length + sequence_length - 1, sequence_length)
-  sel_idx = tf.tile(sel_idx, [num_repeats])
-
-  steps = tf.range(offset, offset + num_steps * stride, stride)
-  return tf.gather(sel_idx, steps)
-
-
-def sample_linspace_sequence(sequence: tf.Tensor,
-                             num_windows: int,
-                             num_steps: int,
-                             stride: int) -> tf.Tensor:
-  """Samples `num_windows` segments from sequence with linearly spaced offsets.
-
-  The samples are concatenated in a single `tf.Tensor` in order to have the same
-  format structure per timestep (e.g. a single frame). If `num_steps` * `stride`
-  is bigger than the number of timesteps, the sequence is repeated. This
-  function can be used in evaluation in order to extract enough segments to span
-  the entire sequence.
-
-  Args:
-    sequence: Any tensor where the first dimension is timesteps.
-    num_windows: Number of windows retrieved from the sequence.
-    num_steps: Number of steps (e.g. frames) to take.
-    stride: Distance to sample between timesteps.
-
-  Returns:
-    A single `tf.Tensor` with first dimension `num_windows` * `num_steps`. The
-    tensor contains the concatenated list of `num_windows` tensors which offsets
-    have been linearly spaced from input.
-  """
-  sequence_length = tf.shape(sequence)[0]
-  max_offset = tf.maximum(0, sequence_length - num_steps * stride)
-  offsets = tf.linspace(0.0, tf.cast(max_offset, tf.float32), num_windows)
-  offsets = tf.cast(offsets, tf.int32)
-
-  all_indices = []
-  for i in range(num_windows):
-    all_indices.append(_sample_or_pad_sequence_indices(
-        sequence=sequence,
-        num_steps=num_steps,
-        stride=stride,
-        offset=offsets[i]))
-
-  indices = tf.concat(all_indices, axis=0)
-  indices.set_shape((num_windows * num_steps,))
-  return tf.gather(sequence, indices)
-
-
-def sample_sequence(sequence: tf.Tensor,
-                    num_steps: int,
-                    random: bool,
-                    stride: int,
-                    seed: Optional[int] = None) -> tf.Tensor:
-  """Samples a single segment of size `num_steps` from a given sequence.
-
-  If `random` is not `True`, this function will simply sample the central window
-  of the sequence. Otherwise, a random offset will be chosen in a way that the
-  desired `num_steps` might be extracted from the sequence.
-
-  Args:
-    sequence: Any tensor where the first dimension is timesteps.
-    num_steps: Number of steps (e.g. frames) to take.
-    random: A boolean indicating whether to random sample the single window. If
-      `True`, the offset is randomized. If `False`, the middle frame minus half
-      of `num_steps` is the first frame.
-    stride: Distance to sample between timesteps.
-    seed: A deterministic seed to use when sampling.
-
-  Returns:
-    A single `tf.Tensor` with first dimension `num_steps` with the sampled
-    segment.
-  """
-  sequence_length = tf.shape(sequence)[0]
-
-  if random:
-    sequence_length = tf.cast(sequence_length, tf.float32)
-    frame_stride = tf.cast(stride, tf.float32)
-    max_offset = tf.cond(
-        sequence_length > (num_steps - 1) * frame_stride,
-        lambda: sequence_length - (num_steps - 1) * frame_stride,
-        lambda: sequence_length)
-    offset = tf.random.uniform(
-        (),
-        maxval=tf.cast(max_offset, dtype=tf.int32),
-        dtype=tf.int32,
-        seed=seed)
-  else:
-    offset = (sequence_length - num_steps * stride) // 2
-    offset = tf.maximum(0, offset)
-
-  indices = _sample_or_pad_sequence_indices(
-      sequence=sequence,
-      num_steps=num_steps,
-      stride=stride,
-      offset=offset)
-  indices.set_shape((num_steps,))
-
-  return tf.gather(sequence, indices)
-
-
-def decode_jpeg(image_string: tf.Tensor, channels: int = 0) -> tf.Tensor:
-  """Decodes JPEG raw bytes string into a RGB uint8 Tensor.
-
-  Args:
-    image_string: A `tf.Tensor` of type strings with the raw JPEG bytes where
-      the first dimension is timesteps.
-    channels: Number of channels of the JPEG image. Allowed values are 0, 1 and
-      3. If 0, the number of channels will be calculated at runtime and no
-      static shape is set.
-
-  Returns:
-    A Tensor of shape [T, H, W, C] of type uint8 with the decoded images.
-  """
-  return tf.map_fn(
-      lambda x: tf.image.decode_jpeg(x, channels=channels),
-      image_string, back_prop=False, dtype=tf.uint8)
-
-
-def crop_image(frames: tf.Tensor,
-               target_height: int,
-               target_width: int,
-               random: bool = False,
-               num_crops: int = 1,
-               seed: Optional[int] = None) -> tf.Tensor:
-  """Crops the image sequence of images.
-
-  If requested size is bigger than image size, image is padded with 0. If not
-  random cropping, a central crop is performed if num_crops is 1.
-
-  Args:
-    frames: A Tensor of dimension [timesteps, in_height, in_width, channels].
-    target_height: Target cropped image height.
-    target_width: Target cropped image width.
-    random: A boolean indicating if crop should be randomized.
- num_crops: Number of crops (support 1 for central crop and 3 for 3-crop). - seed: A deterministic seed to use when random cropping. - - Returns: - A Tensor of shape [timesteps, out_height, out_width, channels] of type uint8 - with the cropped images. - """ - if random: - # Random spatial crop. - shape = tf.shape(frames) - # If a static_shape is available (e.g. when using this method from add_image - # method), it will be used to have an output tensor with static shape. - static_shape = frames.shape.as_list() - seq_len = shape[0] if static_shape[0] is None else static_shape[0] - channels = shape[3] if static_shape[3] is None else static_shape[3] - frames = tf.image.random_crop( - frames, (seq_len, target_height, target_width, channels), seed) - else: - if num_crops == 1: - # Central crop or pad. - frames = tf.image.resize_with_crop_or_pad(frames, target_height, - target_width) - - elif num_crops == 3: - # Three-crop evaluation. - shape = tf.shape(frames) - static_shape = frames.shape.as_list() - seq_len = shape[0] if static_shape[0] is None else static_shape[0] - height = shape[1] if static_shape[1] is None else static_shape[1] - width = shape[2] if static_shape[2] is None else static_shape[2] - channels = shape[3] if static_shape[3] is None else static_shape[3] - - size = tf.convert_to_tensor( - (seq_len, target_height, target_width, channels)) - - offset_1 = tf.broadcast_to([0, 0, 0, 0], [4]) - # pylint:disable=g-long-lambda - offset_2 = tf.cond( - tf.greater_equal(height, width), - true_fn=lambda: tf.broadcast_to([ - 0, tf.cast(height, tf.float32) / 2 - target_height // 2, 0, 0 - ], [4]), - false_fn=lambda: tf.broadcast_to([ - 0, 0, tf.cast(width, tf.float32) / 2 - target_width // 2, 0 - ], [4])) - offset_3 = tf.cond( - tf.greater_equal(height, width), - true_fn=lambda: tf.broadcast_to( - [0, tf.cast(height, tf.float32) - target_height, 0, 0], [4]), - false_fn=lambda: tf.broadcast_to( - [0, 0, tf.cast(width, tf.float32) - target_width, 0], [4])) - # 
pylint:disable=g-long-lambda - - crops = [] - for offset in [offset_1, offset_2, offset_3]: - offset = tf.cast(tf.math.round(offset), tf.int32) - crops.append(tf.slice(frames, offset, size)) - frames = tf.concat(crops, axis=0) - - else: - raise NotImplementedError( - f"Only 1-crop and 3-crop are supported. Found {num_crops!r}.") - - return frames - - -def resize_smallest(frames: tf.Tensor, - min_resize: int) -> tf.Tensor: - """Resizes frames so that min(`height`, `width`) is equal to `min_resize`. - - This function will not do anything if the min(`height`, `width`) is already - equal to `min_resize`. This allows to save compute time. - - Args: - frames: A Tensor of dimension [timesteps, input_h, input_w, channels]. - min_resize: Minimum size of the final image dimensions. - - Returns: - A Tensor of shape [timesteps, output_h, output_w, channels] of type - frames.dtype where min(output_h, output_w) = min_resize. - """ - shape = tf.shape(frames) - input_h = shape[1] - input_w = shape[2] - - output_h = tf.maximum(min_resize, (input_h * min_resize) // input_w) - output_w = tf.maximum(min_resize, (input_w * min_resize) // input_h) - - def resize_fn(): - frames_resized = tf.image.resize(frames, (output_h, output_w)) - return tf.cast(frames_resized, frames.dtype) - - should_resize = tf.math.logical_or(tf.not_equal(input_w, output_w), - tf.not_equal(input_h, output_h)) - frames = tf.cond(should_resize, resize_fn, lambda: frames) - - return frames - - -def random_crop_resize(frames: tf.Tensor, - output_h: int, - output_w: int, - num_frames: int, - num_channels: int, - aspect_ratio: Tuple[float, float], - area_range: Tuple[float, float]) -> tf.Tensor: - """First crops clip with jittering and then resizes to (output_h, output_w). - - Args: - frames: A Tensor of dimension [timesteps, input_h, input_w, channels]. - output_h: Resized image height. - output_w: Resized image width. - num_frames: Number of input frames per clip. - num_channels: Number of channels of the clip. 
- aspect_ratio: Float tuple with the aspect range for cropping. - area_range: Float tuple with the area range for cropping. - Returns: - A Tensor of shape [timesteps, output_h, output_w, channels] of type - frames.dtype. - """ - shape = tf.shape(frames) - seq_len, _, _, channels = shape[0], shape[1], shape[2], shape[3] - bbox = tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4]) - factor = output_w / output_h - aspect_ratio = (aspect_ratio[0] * factor, aspect_ratio[1] * factor) - sample_distorted_bbox = tf.image.sample_distorted_bounding_box( - shape[1:], - bounding_boxes=bbox, - min_object_covered=0.1, - aspect_ratio_range=aspect_ratio, - area_range=area_range, - max_attempts=100, - use_image_if_no_bounding_boxes=True) - bbox_begin, bbox_size, _ = sample_distorted_bbox - offset_y, offset_x, _ = tf.unstack(bbox_begin) - target_height, target_width, _ = tf.unstack(bbox_size) - size = tf.convert_to_tensor(( - seq_len, target_height, target_width, channels)) - offset = tf.convert_to_tensor(( - 0, offset_y, offset_x, 0)) - frames = tf.slice(frames, offset, size) - frames = tf.cast( - tf.image.resize(frames, (output_h, output_w)), - frames.dtype) - frames.set_shape((num_frames, output_h, output_w, num_channels)) - return frames - - -def random_flip_left_right( - frames: tf.Tensor, - seed: Optional[int] = None) -> tf.Tensor: - """Flips all the frames with a probability of 50%. - - Args: - frames: A Tensor of shape [timesteps, input_h, input_w, channels]. - seed: A seed to use for the random sampling. - - Returns: - A Tensor of shape [timesteps, output_h, output_w, channels] eventually - flipped left right. 
- """ - is_flipped = tf.random.uniform( - (), minval=0, maxval=2, dtype=tf.int32, seed=seed) - - frames = tf.cond(tf.equal(is_flipped, 1), - true_fn=lambda: tf.image.flip_left_right(frames), - false_fn=lambda: frames) - return frames - - -def normalize_image(frames: tf.Tensor, - zero_centering_image: bool, - dtype: tf.dtypes.DType = tf.float32) -> tf.Tensor: - """Normalizes images. - - Args: - frames: A Tensor of numbers. - zero_centering_image: If True, results are in [-1, 1], if False, results are - in [0, 1]. - dtype: Type of output Tensor. - - Returns: - A Tensor of same shape as the input and of the given type. - """ - frames = tf.cast(frames, dtype) - if zero_centering_image: - return frames * (2.0 / 255.0) - 1.0 - else: - return frames / 255.0 diff --git a/official/vision/beta/ops/spatial_transform_ops.py b/official/vision/beta/ops/spatial_transform_ops.py deleted file mode 100644 index 3095e33ed085d425c50a5c0ce65d66a6544ec961..0000000000000000000000000000000000000000 --- a/official/vision/beta/ops/spatial_transform_ops.py +++ /dev/null @@ -1,544 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Spatial transform ops.""" - -import tensorflow as tf - -_EPSILON = 1e-8 - - -def _feature_bilinear_interpolation(features, kernel_y, kernel_x): - """Feature bilinear interpolation. 
- - The RoIAlign feature f can be computed by bilinear interpolation - of four neighboring feature points f0, f1, f2, and f3. - - f(y, x) = [hy, ly] * [[f00, f01], * [hx, lx]^T - [f10, f11]] - f(y, x) = (hy*hx)f00 + (hy*lx)f01 + (ly*hx)f10 + (lx*ly)f11 - f(y, x) = w00*f00 + w01*f01 + w10*f10 + w11*f11 - kernel_y = [hy, ly] - kernel_x = [hx, lx] - - Args: - features: The features are in shape of [batch_size, num_boxes, output_size * - 2, output_size * 2, num_filters]. - kernel_y: Tensor of size [batch_size, boxes, output_size, 2, 1]. - kernel_x: Tensor of size [batch_size, boxes, output_size, 2, 1]. - - Returns: - A 5-D tensor representing feature crop of shape - [batch_size, num_boxes, output_size, output_size, num_filters]. - - """ - features_shape = tf.shape(features) - batch_size, num_boxes, output_size, num_filters = ( - features_shape[0], features_shape[1], features_shape[2], - features_shape[4]) - - output_size = output_size // 2 - kernel_y = tf.reshape(kernel_y, [batch_size, num_boxes, output_size * 2, 1]) - kernel_x = tf.reshape(kernel_x, [batch_size, num_boxes, 1, output_size * 2]) - # Use implicit broadcast to generate the interpolation kernel. The - # multiplier `4` is for avg pooling. - interpolation_kernel = kernel_y * kernel_x * 4 - - # Interpolate the gathered features with computed interpolation kernels. - features *= tf.cast( - tf.expand_dims(interpolation_kernel, axis=-1), dtype=features.dtype) - features = tf.reshape( - features, - [batch_size * num_boxes, output_size * 2, output_size * 2, num_filters]) - features = tf.nn.avg_pool(features, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID') - features = tf.reshape( - features, [batch_size, num_boxes, output_size, output_size, num_filters]) - return features - - -def _compute_grid_positions(boxes, boundaries, output_size, sample_offset): - """Computes the grid position w.r.t. the corresponding feature map. 
- - Args: - boxes: a 3-D tensor of shape [batch_size, num_boxes, 4] encoding the - information of each box w.r.t. the corresponding feature map. - boxes[:, :, 0:2] are the grid position in (y, x) (float) of the top-left - corner of each box. boxes[:, :, 2:4] are the box sizes in (h, w) (float) - in terms of the number of pixels of the corresponding feature map size. - boundaries: a 3-D tensor of shape [batch_size, num_boxes, 2] representing - the boundary (in (y, x)) of the corresponding feature map for each box. - Any resampled grid points that go beyond the bounary will be clipped. - output_size: a scalar indicating the output crop size. - sample_offset: a float number in [0, 1] indicates the subpixel sample offset - from grid point. - - Returns: - kernel_y: Tensor of size [batch_size, boxes, output_size, 2, 1]. - kernel_x: Tensor of size [batch_size, boxes, output_size, 2, 1]. - box_grid_y0y1: Tensor of size [batch_size, boxes, output_size, 2] - box_grid_x0x1: Tensor of size [batch_size, boxes, output_size, 2] - """ - boxes_shape = tf.shape(boxes) - batch_size, num_boxes = boxes_shape[0], boxes_shape[1] - if batch_size is None: - batch_size = tf.shape(boxes)[0] - box_grid_x = [] - box_grid_y = [] - for i in range(output_size): - box_grid_x.append(boxes[:, :, 1] + - (i + sample_offset) * boxes[:, :, 3] / output_size) - box_grid_y.append(boxes[:, :, 0] + - (i + sample_offset) * boxes[:, :, 2] / output_size) - box_grid_x = tf.stack(box_grid_x, axis=2) - box_grid_y = tf.stack(box_grid_y, axis=2) - - box_grid_y0 = tf.floor(box_grid_y) - box_grid_x0 = tf.floor(box_grid_x) - box_grid_x0 = tf.maximum(tf.cast(0., dtype=box_grid_x0.dtype), box_grid_x0) - box_grid_y0 = tf.maximum(tf.cast(0., dtype=box_grid_y0.dtype), box_grid_y0) - - box_grid_x0 = tf.minimum(box_grid_x0, tf.expand_dims(boundaries[:, :, 1], -1)) - box_grid_x1 = tf.minimum(box_grid_x0 + 1, - tf.expand_dims(boundaries[:, :, 1], -1)) - box_grid_y0 = tf.minimum(box_grid_y0, tf.expand_dims(boundaries[:, :, 0], 
-1)) - box_grid_y1 = tf.minimum(box_grid_y0 + 1, - tf.expand_dims(boundaries[:, :, 0], -1)) - - box_gridx0x1 = tf.stack([box_grid_x0, box_grid_x1], axis=-1) - box_gridy0y1 = tf.stack([box_grid_y0, box_grid_y1], axis=-1) - - # The RoIAlign feature f can be computed by bilinear interpolation of four - # neighboring feature points f0, f1, f2, and f3. - # f(y, x) = [hy, ly] * [[f00, f01], * [hx, lx]^T - # [f10, f11]] - # f(y, x) = (hy*hx)f00 + (hy*lx)f01 + (ly*hx)f10 + (lx*ly)f11 - # f(y, x) = w00*f00 + w01*f01 + w10*f10 + w11*f11 - ly = box_grid_y - box_grid_y0 - lx = box_grid_x - box_grid_x0 - hy = 1.0 - ly - hx = 1.0 - lx - kernel_y = tf.reshape( - tf.stack([hy, ly], axis=3), [batch_size, num_boxes, output_size, 2, 1]) - kernel_x = tf.reshape( - tf.stack([hx, lx], axis=3), [batch_size, num_boxes, output_size, 2, 1]) - return kernel_y, kernel_x, box_gridy0y1, box_gridx0x1 - - -def multilevel_crop_and_resize(features, - boxes, - output_size=7, - sample_offset=0.5): - """Crop and resize on multilevel feature pyramid. - - Generate the (output_size, output_size) set of pixels for each input box - by first locating the box into the correct feature level, and then cropping - and resizing it using the correspoding feature map of that level. - - Args: - features: A dictionary with key as pyramid level and value as features. The - features are in shape of [batch_size, height_l, width_l, num_filters]. - boxes: A 3-D Tensor of shape [batch_size, num_boxes, 4]. Each row represents - a box with [y1, x1, y2, x2] in un-normalized coordinates. - output_size: A scalar to indicate the output crop size. - sample_offset: a float number in [0, 1] indicates the subpixel sample offset - from grid point. - - Returns: - A 5-D tensor representing feature crop of shape - [batch_size, num_boxes, output_size, output_size, num_filters]. 
- """ - - with tf.name_scope('multilevel_crop_and_resize'): - levels = list(features.keys()) - min_level = int(min(levels)) - max_level = int(max(levels)) - features_shape = tf.shape(features[str(min_level)]) - batch_size, max_feature_height, max_feature_width, num_filters = ( - features_shape[0], features_shape[1], features_shape[2], - features_shape[3]) - - num_boxes = tf.shape(boxes)[1] - - # Stack feature pyramid into a features_all of shape - # [batch_size, levels, height, width, num_filters]. - features_all = [] - feature_heights = [] - feature_widths = [] - for level in range(min_level, max_level + 1): - shape = features[str(level)].get_shape().as_list() - feature_heights.append(shape[1]) - feature_widths.append(shape[2]) - # Concat tensor of [batch_size, height_l * width_l, num_filters] for each - # levels. - features_all.append( - tf.reshape(features[str(level)], [batch_size, -1, num_filters])) - features_r2 = tf.reshape(tf.concat(features_all, 1), [-1, num_filters]) - - # Calculate height_l * width_l for each level. - level_dim_sizes = [ - feature_widths[i] * feature_heights[i] - for i in range(len(feature_widths)) - ] - # level_dim_offsets is accumulated sum of level_dim_size. - level_dim_offsets = [0] - for i in range(len(feature_widths) - 1): - level_dim_offsets.append(level_dim_offsets[i] + level_dim_sizes[i]) - batch_dim_size = level_dim_offsets[-1] + level_dim_sizes[-1] - level_dim_offsets = tf.constant(level_dim_offsets, tf.int32) - height_dim_sizes = tf.constant(feature_widths, tf.int32) - - # Assigns boxes to the right level. - box_width = boxes[:, :, 3] - boxes[:, :, 1] - box_height = boxes[:, :, 2] - boxes[:, :, 0] - areas_sqrt = tf.sqrt( - tf.cast(box_height, tf.float32) * tf.cast(box_width, tf.float32)) - - levels = tf.cast( - tf.math.floordiv( - tf.math.log(tf.math.divide_no_nan(areas_sqrt, 224.0)), - tf.math.log(2.0)) + 4.0, - dtype=tf.int32) - # Maps levels between [min_level, max_level]. 
- levels = tf.minimum(max_level, tf.maximum(levels, min_level)) - - # Projects box location and sizes to corresponding feature levels. - scale_to_level = tf.cast( - tf.pow(tf.constant(2.0), tf.cast(levels, tf.float32)), - dtype=boxes.dtype) - boxes /= tf.expand_dims(scale_to_level, axis=2) - box_width /= scale_to_level - box_height /= scale_to_level - boxes = tf.concat([boxes[:, :, 0:2], - tf.expand_dims(box_height, -1), - tf.expand_dims(box_width, -1)], axis=-1) - - # Maps levels to [0, max_level-min_level]. - levels -= min_level - level_strides = tf.pow([[2.0]], tf.cast(levels, tf.float32)) - boundary = tf.cast( - tf.concat([ - tf.expand_dims( - [[tf.cast(max_feature_height, tf.float32)]] / level_strides - 1, - axis=-1), - tf.expand_dims( - [[tf.cast(max_feature_width, tf.float32)]] / level_strides - 1, - axis=-1), - ], - axis=-1), boxes.dtype) - - # Compute grid positions. - kernel_y, kernel_x, box_gridy0y1, box_gridx0x1 = _compute_grid_positions( - boxes, boundary, output_size, sample_offset) - - x_indices = tf.cast( - tf.reshape(box_gridx0x1, [batch_size, num_boxes, output_size * 2]), - dtype=tf.int32) - y_indices = tf.cast( - tf.reshape(box_gridy0y1, [batch_size, num_boxes, output_size * 2]), - dtype=tf.int32) - - batch_size_offset = tf.tile( - tf.reshape( - tf.range(batch_size) * batch_dim_size, [batch_size, 1, 1, 1]), - [1, num_boxes, output_size * 2, output_size * 2]) - # Get level offset for each box. Each box belongs to one level. 
- levels_offset = tf.tile( - tf.reshape( - tf.gather(level_dim_offsets, levels), - [batch_size, num_boxes, 1, 1]), - [1, 1, output_size * 2, output_size * 2]) - y_indices_offset = tf.tile( - tf.reshape( - y_indices * tf.expand_dims(tf.gather(height_dim_sizes, levels), -1), - [batch_size, num_boxes, output_size * 2, 1]), - [1, 1, 1, output_size * 2]) - x_indices_offset = tf.tile( - tf.reshape(x_indices, [batch_size, num_boxes, 1, output_size * 2]), - [1, 1, output_size * 2, 1]) - indices = tf.reshape( - batch_size_offset + levels_offset + y_indices_offset + x_indices_offset, - [-1]) - - # TODO(wangtao): replace tf.gather with tf.gather_nd and try to get similar - # performance. - features_per_box = tf.reshape( - tf.gather(features_r2, indices), - [batch_size, num_boxes, output_size * 2, output_size * 2, num_filters]) - - # Bilinear interpolation. - features_per_box = _feature_bilinear_interpolation( - features_per_box, kernel_y, kernel_x) - return features_per_box - - -def _selective_crop_and_resize(features, - boxes, - box_levels, - boundaries, - output_size=7, - sample_offset=0.5, - use_einsum_gather=False): - """Crop and resize boxes on a set of feature maps. - - Given multiple features maps indexed by different levels, and a set of boxes - where each box is mapped to a certain level, it selectively crops and resizes - boxes from the corresponding feature maps to generate the box features. - - We follow the ROIAlign technique (see https://arxiv.org/pdf/1703.06870.pdf, - figure 3 for reference). Specifically, for each feature map, we select an - (output_size, output_size) set of pixels corresponding to the box location, - and then use bilinear interpolation to select the feature value for each - pixel. - - For performance, we perform the gather and interpolation on all layers as a - single operation. In this op the multi-level features are first stacked and - gathered into [2*output_size, 2*output_size] feature points. 
Then bilinear - interpolation is performed on the gathered feature points to generate - [output_size, output_size] RoIAlign feature map. - - Here is the step-by-step algorithm: - 1. The multi-level features are gathered into a - [batch_size, num_boxes, output_size*2, output_size*2, num_filters] - Tensor. The Tensor contains four neighboring feature points for each - vertex in the output grid. - 2. Compute the interpolation kernel of shape - [batch_size, num_boxes, output_size*2, output_size*2]. The last 2 axis - can be seen as stacking 2x2 interpolation kernels for all vertices in the - output grid. - 3. Element-wise multiply the gathered features and interpolation kernel. - Then apply 2x2 average pooling to reduce spatial dimension to - output_size. - - Args: - features: a 5-D tensor of shape [batch_size, num_levels, max_height, - max_width, num_filters] where cropping and resizing are based. - boxes: a 3-D tensor of shape [batch_size, num_boxes, 4] encoding the - information of each box w.r.t. the corresponding feature map. - boxes[:, :, 0:2] are the grid position in (y, x) (float) of the top-left - corner of each box. boxes[:, :, 2:4] are the box sizes in (h, w) (float) - in terms of the number of pixels of the corresponding feature map size. - box_levels: a 3-D tensor of shape [batch_size, num_boxes, 1] representing - the 0-based corresponding feature level index of each box. - boundaries: a 3-D tensor of shape [batch_size, num_boxes, 2] representing - the boundary (in (y, x)) of the corresponding feature map for each box. - Any resampled grid points that go beyond the bounary will be clipped. - output_size: a scalar indicating the output crop size. - sample_offset: a float number in [0, 1] indicates the subpixel sample offset - from grid point. - use_einsum_gather: use einsum to replace gather or not. Replacing einsum - with gather can improve performance when feature size is not large, einsum - is friendly with model partition as well. 
Gather's performance is better - when feature size is very large and there are multiple box levels. - - Returns: - features_per_box: a 5-D tensor of shape - [batch_size, num_boxes, output_size, output_size, num_filters] - representing the cropped features. - """ - (batch_size, num_levels, max_feature_height, max_feature_width, - num_filters) = features.get_shape().as_list() - if batch_size is None: - batch_size = tf.shape(features)[0] - _, num_boxes, _ = boxes.get_shape().as_list() - - kernel_y, kernel_x, box_gridy0y1, box_gridx0x1 = _compute_grid_positions( - boxes, boundaries, output_size, sample_offset) - x_indices = tf.cast( - tf.reshape(box_gridx0x1, [batch_size, num_boxes, output_size * 2]), - dtype=tf.int32) - y_indices = tf.cast( - tf.reshape(box_gridy0y1, [batch_size, num_boxes, output_size * 2]), - dtype=tf.int32) - - if use_einsum_gather: - # Blinear interpolation is done during the last two gathers: - # f(y, x) = [hy, ly] * [[f00, f01], * [hx, lx]^T - # [f10, f11]] - # [[f00, f01], - # [f10, f11]] = tf.einsum(tf.einsum(features, y_one_hot), x_one_hot) - # where [hy, ly] and [hx, lx] are the bilinear interpolation kernel. 
- y_indices = tf.cast( - tf.reshape(box_gridy0y1, [batch_size, num_boxes, output_size, 2]), - dtype=tf.int32) - x_indices = tf.cast( - tf.reshape(box_gridx0x1, [batch_size, num_boxes, output_size, 2]), - dtype=tf.int32) - - # shape is [batch_size, num_boxes, output_size, 2, height] - grid_y_one_hot = tf.one_hot( - tf.cast(y_indices, tf.int32), max_feature_height, dtype=kernel_y.dtype) - # shape is [batch_size, num_boxes, output_size, 2, width] - grid_x_one_hot = tf.one_hot( - tf.cast(x_indices, tf.int32), max_feature_width, dtype=kernel_x.dtype) - - # shape is [batch_size, num_boxes, output_size, height] - grid_y_weight = tf.reduce_sum( - tf.multiply(grid_y_one_hot, kernel_y), axis=-2) - # shape is [batch_size, num_boxes, output_size, width] - grid_x_weight = tf.reduce_sum( - tf.multiply(grid_x_one_hot, kernel_x), axis=-2) - - # Gather for y_axis. - # shape is [batch_size, num_boxes, output_size, width, features] - features_per_box = tf.einsum('bmhwf,bmoh->bmowf', features, - tf.cast(grid_y_weight, features.dtype)) - # Gather for x_axis. 
- # shape is [batch_size, num_boxes, output_size, output_size, features] - features_per_box = tf.einsum('bmhwf,bmow->bmhof', features_per_box, - tf.cast(grid_x_weight, features.dtype)) - else: - height_dim_offset = max_feature_width - level_dim_offset = max_feature_height * height_dim_offset - batch_dim_offset = num_levels * level_dim_offset - - batch_size_offset = tf.tile( - tf.reshape( - tf.range(batch_size) * batch_dim_offset, [batch_size, 1, 1, 1]), - [1, num_boxes, output_size * 2, output_size * 2]) - box_levels_offset = tf.tile( - tf.reshape(box_levels * level_dim_offset, - [batch_size, num_boxes, 1, 1]), - [1, 1, output_size * 2, output_size * 2]) - y_indices_offset = tf.tile( - tf.reshape(y_indices * height_dim_offset, - [batch_size, num_boxes, output_size * 2, 1]), - [1, 1, 1, output_size * 2]) - x_indices_offset = tf.tile( - tf.reshape(x_indices, [batch_size, num_boxes, 1, output_size * 2]), - [1, 1, output_size * 2, 1]) - - indices = tf.reshape( - batch_size_offset + box_levels_offset + y_indices_offset + - x_indices_offset, [-1]) - - features = tf.reshape(features, [-1, num_filters]) - # TODO(wangtao): replace tf.gather with tf.gather_nd and try to get similar - # performance. - features_per_box = tf.reshape( - tf.gather(features, indices), - [batch_size, num_boxes, output_size * 2, output_size * 2, num_filters]) - features_per_box = _feature_bilinear_interpolation( - features_per_box, kernel_y, kernel_x) - - return features_per_box - - -def crop_mask_in_target_box(masks, - boxes, - target_boxes, - output_size, - sample_offset=0, - use_einsum=True): - """Crop masks in target boxes. - - Args: - masks: A tensor with a shape of [batch_size, num_masks, height, width]. - boxes: a float tensor representing box cooridnates that tightly enclose - masks with a shape of [batch_size, num_masks, 4] in un-normalized - coordinates. A box is represented by [ymin, xmin, ymax, xmax]. 
- target_boxes: a float tensor representing target box cooridnates for masks - with a shape of [batch_size, num_masks, 4] in un-normalized coordinates. A - box is represented by [ymin, xmin, ymax, xmax]. - output_size: A scalar to indicate the output crop size. It currently only - supports to output a square shape outputs. - sample_offset: a float number in [0, 1] indicates the subpixel sample offset - from grid point. - use_einsum: Use einsum to replace gather in selective_crop_and_resize. - - Returns: - A 4-D tensor representing feature crop of shape - [batch_size, num_boxes, output_size, output_size]. - """ - with tf.name_scope('crop_mask_in_target_box'): - # Cast to float32, as the y_transform and other transform variables may - # overflow in float16 - masks = tf.cast(masks, tf.float32) - boxes = tf.cast(boxes, tf.float32) - target_boxes = tf.cast(target_boxes, tf.float32) - - batch_size, num_masks, height, width = masks.get_shape().as_list() - if batch_size is None: - batch_size = tf.shape(masks)[0] - masks = tf.reshape(masks, [batch_size * num_masks, height, width, 1]) - # Pad zeros on the boundary of masks. - masks = tf.image.pad_to_bounding_box(masks, 2, 2, height + 4, width + 4) - masks = tf.reshape(masks, [batch_size, num_masks, height+4, width+4, 1]) - - # Projects target box locations and sizes to corresponding cropped - # mask coordinates. 
- gt_y_min, gt_x_min, gt_y_max, gt_x_max = tf.split( - value=boxes, num_or_size_splits=4, axis=2) - bb_y_min, bb_x_min, bb_y_max, bb_x_max = tf.split( - value=target_boxes, num_or_size_splits=4, axis=2) - y_transform = (bb_y_min - gt_y_min) * height / ( - gt_y_max - gt_y_min + _EPSILON) + 2 - x_transform = (bb_x_min - gt_x_min) * height / ( - gt_x_max - gt_x_min + _EPSILON) + 2 - h_transform = (bb_y_max - bb_y_min) * width / ( - gt_y_max - gt_y_min + _EPSILON) - w_transform = (bb_x_max - bb_x_min) * width / ( - gt_x_max - gt_x_min + _EPSILON) - - boundaries = tf.concat( - [tf.ones_like(y_transform) * ((height + 4) - 1), - tf.ones_like(x_transform) * ((width + 4) - 1)], - axis=-1) - boundaries = tf.cast(boundaries, dtype=y_transform.dtype) - - # Reshape tensors to have the right shape for selective_crop_and_resize. - trasnformed_boxes = tf.concat( - [y_transform, x_transform, h_transform, w_transform], -1) - levels = tf.tile(tf.reshape(tf.range(num_masks), [1, num_masks]), - [batch_size, 1]) - - cropped_masks = _selective_crop_and_resize( - masks, - trasnformed_boxes, - levels, - boundaries, - output_size, - sample_offset=sample_offset, - use_einsum_gather=use_einsum) - cropped_masks = tf.squeeze(cropped_masks, axis=-1) - - return cropped_masks - - -def nearest_upsampling(data, scale, use_keras_layer=False): - """Nearest neighbor upsampling implementation. - - Args: - data: A tensor with a shape of [batch, height_in, width_in, channels]. - scale: An integer multiple to scale resolution of input data. - use_keras_layer: If True, use keras Upsampling2D layer. - - Returns: - data_up: A tensor with a shape of - [batch, height_in*scale, width_in*scale, channels]. Same dtype as input - data. 
- """ - if use_keras_layer: - return tf.keras.layers.UpSampling2D(size=(scale, scale), - interpolation='nearest')(data) - with tf.name_scope('nearest_upsampling'): - bs, _, _, c = data.get_shape().as_list() - shape = tf.shape(input=data) - h = shape[1] - w = shape[2] - bs = -1 if bs is None else bs - # Uses reshape to quickly upsample the input. The nearest pixel is selected - # via tiling. - data = tf.tile( - tf.reshape(data, [bs, h, 1, w, 1, c]), [1, 1, scale, 1, scale, 1]) - return tf.reshape(data, [bs, h * scale, w * scale, c]) diff --git a/official/vision/beta/projects/README.md b/official/vision/beta/projects/README.md deleted file mode 100644 index 9c20f07fc608947e218602b696e03586109bdc8c..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/README.md +++ /dev/null @@ -1,3 +0,0 @@ -Here are a few projects that are built on tf.vision. They are build and maintain -by different parties. They can be used as examples of how to build your own -projects based on tf.vision. diff --git a/official/vision/beta/projects/__init__.py b/official/vision/beta/projects/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- diff --git a/official/vision/beta/projects/centernet/README.md b/official/vision/beta/projects/centernet/README.md deleted file mode 100644 index 6c80d8e371e385d1d9b8caff21416dd7a0ec61bc..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/centernet/README.md +++ /dev/null @@ -1,82 +0,0 @@ -# Centernet - -[![Paper](http://img.shields.io/badge/Paper-arXiv.1904.07850-B3181B?logo=arXiv)](https://arxiv.org/abs/1904.07850) - -Centernet builds upon CornerNet, an anchor-free model for object detection. - -Many other models, such as YOLO and RetinaNet, use anchor boxes. These anchor -boxes are predefined to be close to the aspect ratios and scales of the objects -in the training dataset. Anchor-based models do not predict the bounding boxes -of objects directly. They instead predict the location and size/shape -refinements to a predefined anchor box. The detection generator then computes -the final confidences, positions, and size of the detection. - -CornerNet eliminates the need for anchor boxes. RetinaNet needs thousands of -anchor boxes in order to cover the most common ground truth boxes. This adds -unnecessary complexity to the model which slow down training and create -imbalances in positive and negative anchor boxes. Instead, CornerNet creates -heatmaps for each of the corners and pools them together in order to get the -final detection boxes for the objects. CenterNet removes even more complexity -by using the center instead of the corners, meaning that only one set of -heatmaps (one heatmap for each class) is needed to predict the object. CenterNet -proves that this can be done without a significant difference in accuracy. - - -## Enviroment setup - -The code can be run on multiple GPUs or TPUs with different distribution -strategies. See the TensorFlow distributed training -[guide](https://www.tensorflow.org/guide/distributed_training) for an overview -of `tf.distribute`. - -The code is compatible with TensorFlow 2.5+. 
See requirements.txt for all -prerequisites, and you can also install them using the following command. `pip -install -r ./official/requirements.txt` - -## Training -To train the model on Coco, try the following command: - -``` -python3 -m official.vision.beta.projects.centernet.train \ - --mode=train_and_eval \ - --experiment=centernet_hourglass_coco \ - --model_dir={MODEL_DIR} \ - --config_file={CONFIG_FILE} -``` - -## Configurations - -In the following table, we report the mAP measured on the `coco-val2017` set. - -Backbone | Config name | mAP -:--------------- | :-----------------------------------------------| -------: -Hourglass-104 | `coco-centernet-hourglass-gpu.yaml` | 40.01 -Hourglass-104 | `coco-centernet-hourglass-tpu.yaml` | 40.5 - -**Note:** `float16` (`bfloat16` for TPU) is used in the provided configurations. - - -## Cite - -[Centernet](https://arxiv.org/abs/1904.07850): -``` -@article{Zhou2019ObjectsAP, - title={Objects as Points}, - author={Xingyi Zhou and Dequan Wang and Philipp Kr{\"a}henb{\"u}hl}, - journal={ArXiv}, - year={2019}, - volume={abs/1904.07850} -} -``` - -[CornerNet](https://arxiv.org/abs/1808.01244): -``` -@article{Law2019CornerNetDO, - title={CornerNet: Detecting Objects as Paired Keypoints}, - author={Hei Law and J. Deng}, - journal={International Journal of Computer Vision}, - year={2019}, - volume={128}, - pages={642-656} -} -``` diff --git a/official/vision/beta/projects/centernet/common/registry_imports.py b/official/vision/beta/projects/centernet/common/registry_imports.py deleted file mode 100644 index 068147017d4f8fde2236e0b7d199551001b1c51f..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/centernet/common/registry_imports.py +++ /dev/null @@ -1,22 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""All necessary imports for registration.""" - -# pylint: disable=unused-import -from official.common import registry_imports -from official.vision.beta.projects.centernet.configs import centernet -from official.vision.beta.projects.centernet.modeling import centernet_model -from official.vision.beta.projects.centernet.modeling.backbones import hourglass -from official.vision.beta.projects.centernet.tasks import centernet as centernet_task diff --git a/official/vision/beta/projects/centernet/configs/__init__.py b/official/vision/beta/projects/centernet/configs/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/centernet/configs/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- diff --git a/official/vision/beta/projects/centernet/configs/backbones.py b/official/vision/beta/projects/centernet/configs/backbones.py deleted file mode 100644 index 00cfd7ff105aeb914f9631f089c2cf7c1cdca1ad..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/centernet/configs/backbones.py +++ /dev/null @@ -1,35 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Backbones configurations.""" - -import dataclasses - -from official.modeling import hyperparams -from official.vision.beta.configs import backbones - - -@dataclasses.dataclass -class Hourglass(hyperparams.Config): - """Hourglass config.""" - model_id: int = 52 - input_channel_dims: int = 128 - num_hourglasses: int = 2 - initial_downsample: bool = True - activation: str = 'relu' - - -@dataclasses.dataclass -class Backbone(backbones.Backbone): - hourglass: Hourglass = Hourglass() diff --git a/official/vision/beta/projects/centernet/configs/centernet.py b/official/vision/beta/projects/centernet/configs/centernet.py deleted file mode 100644 index b991668a17f5415eac389ef62baa1613d79a41d5..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/centernet/configs/centernet.py +++ /dev/null @@ -1,225 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""CenterNet configuration definition.""" - -import dataclasses -import os -from typing import List, Optional, Tuple -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.modeling import hyperparams -from official.modeling import optimization -from official.vision.beta.configs import common -from official.vision.beta.projects.centernet.configs import backbones - - -TfExampleDecoderLabelMap = common.TfExampleDecoderLabelMap - - -@dataclasses.dataclass -class TfExampleDecoder(hyperparams.Config): - regenerate_source_id: bool = False - - -@dataclasses.dataclass -class DataDecoder(hyperparams.OneOfConfig): - type: Optional[str] = 'simple_decoder' - simple_decoder: TfExampleDecoder = TfExampleDecoder() - label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap() - - -@dataclasses.dataclass -class Parser(hyperparams.Config): - """Config for parser.""" - bgr_ordering: bool = True - aug_rand_hflip: bool = True - aug_scale_min: float = 1.0 - aug_scale_max: float = 1.0 - aug_rand_saturation: bool = False - aug_rand_brightness: bool = False - aug_rand_hue: bool = False - aug_rand_contrast: bool = False - odapi_augmentation: bool = False - channel_means: Tuple[float, float, float] = dataclasses.field( - default_factory=lambda: (104.01362025, 114.03422265, 119.9165958)) - channel_stds: Tuple[float, float, float] = dataclasses.field( - default_factory=lambda: 
(73.6027665, 69.89082075, 70.9150767)) - - -@dataclasses.dataclass -class DataConfig(cfg.DataConfig): - """Input config for training.""" - input_path: str = '' - global_batch_size: int = 32 - is_training: bool = True - dtype: str = 'float16' - decoder: DataDecoder = DataDecoder() - parser: Parser = Parser() - shuffle_buffer_size: int = 10000 - file_type: str = 'tfrecord' - drop_remainder: bool = True - - -@dataclasses.dataclass -class DetectionLoss(hyperparams.Config): - object_center_weight: float = 1.0 - offset_weight: float = 1.0 - scale_weight: float = 0.1 - - -@dataclasses.dataclass -class Losses(hyperparams.Config): - detection: DetectionLoss = DetectionLoss() - gaussian_iou: float = 0.7 - class_offset: int = 1 - - -@dataclasses.dataclass -class CenterNetHead(hyperparams.Config): - heatmap_bias: float = -2.19 - input_levels: List[str] = dataclasses.field( - default_factory=lambda: ['2_0', '2']) - - -@dataclasses.dataclass -class CenterNetDetectionGenerator(hyperparams.Config): - max_detections: int = 100 - peak_error: float = 1e-6 - peak_extract_kernel_size: int = 3 - class_offset: int = 1 - use_nms: bool = False - nms_pre_thresh: float = 0.1 - nms_thresh: float = 0.4 - use_reduction_sum: bool = True - - -@dataclasses.dataclass -class CenterNetModel(hyperparams.Config): - """Config for centernet model.""" - num_classes: int = 90 - max_num_instances: int = 128 - input_size: List[int] = dataclasses.field(default_factory=list) - backbone: backbones.Backbone = backbones.Backbone( - type='hourglass', hourglass=backbones.Hourglass(model_id=52)) - head: CenterNetHead = CenterNetHead() - # pylint: disable=line-too-long - detection_generator: CenterNetDetectionGenerator = CenterNetDetectionGenerator() - norm_activation: common.NormActivation = common.NormActivation( - norm_momentum=0.1, norm_epsilon=1e-5, use_sync_bn=True) - - -@dataclasses.dataclass -class CenterNetDetection(hyperparams.Config): - # use_center is the only option implemented currently. 
- use_centers: bool = True - - -@dataclasses.dataclass -class CenterNetSubTasks(hyperparams.Config): - detection: CenterNetDetection = CenterNetDetection() - - -@dataclasses.dataclass -class CenterNetTask(cfg.TaskConfig): - """Config for centernet task.""" - model: CenterNetModel = CenterNetModel() - train_data: DataConfig = DataConfig(is_training=True) - validation_data: DataConfig = DataConfig(is_training=False) - subtasks: CenterNetSubTasks = CenterNetSubTasks() - losses: Losses = Losses() - gradient_clip_norm: float = 10.0 - per_category_metrics: bool = False - weight_decay: float = 5e-4 - # Load checkpoints - init_checkpoint: Optional[str] = None - init_checkpoint_modules: str = 'all' - annotation_file: Optional[str] = None - - def get_output_length_dict(self): - task_outputs = {} - if self.subtasks.detection and self.subtasks.detection.use_centers: - task_outputs.update({ - 'ct_heatmaps': self.model.num_classes, - 'ct_offset': 2, - 'ct_size': 2 - }) - else: - raise ValueError('Detection with center point is only available ') - return task_outputs - - -COCO_INPUT_PATH_BASE = 'coco' -COCO_TRAIN_EXAMPLES = 118287 -COCO_VAL_EXAMPLES = 5000 - - -@exp_factory.register_config_factory('centernet_hourglass_coco') -def centernet_hourglass_coco() -> cfg.ExperimentConfig: - """COCO object detection with CenterNet.""" - train_batch_size = 128 - eval_batch_size = 8 - steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size - - config = cfg.ExperimentConfig( - task=CenterNetTask( - annotation_file=os.path.join(COCO_INPUT_PATH_BASE, - 'instances_val2017.json'), - model=CenterNetModel(), - train_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size, - parser=Parser(), - shuffle_buffer_size=2), - validation_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), - is_training=False, - global_batch_size=eval_batch_size, - shuffle_buffer_size=2), - ), - trainer=cfg.TrainerConfig( - 
steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - train_steps=150 * steps_per_epoch, - validation_steps=COCO_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'adam', - 'adam': { - 'epsilon': 1e-7 - } - }, - 'learning_rate': { - 'type': 'cosine', - 'cosine': { - 'initial_learning_rate': 0.001, - 'decay_steps': 150 * steps_per_epoch - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 2000, - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - return config diff --git a/official/vision/beta/projects/centernet/modeling/layers/detection_generator.py b/official/vision/beta/projects/centernet/modeling/layers/detection_generator.py deleted file mode 100644 index abeea0df514092483d149c435b1cf002a0d9f588..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/centernet/modeling/layers/detection_generator.py +++ /dev/null @@ -1,339 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Detection generator for centernet. - -Parses predictions from the CenterNet head into the final bounding boxes, -confidences, and classes. 
This class contains repurposed methods from the -TensorFlow Object Detection API -in: https://github.com/tensorflow/models/blob/master/research/object_detection -/meta_architectures/center_net_meta_arch.py -""" - -from typing import Any, Mapping - -import tensorflow as tf - -from official.vision.beta.ops import box_ops -from official.vision.beta.projects.centernet.ops import loss_ops -from official.vision.beta.projects.centernet.ops import nms_ops - - -class CenterNetDetectionGenerator(tf.keras.layers.Layer): - """CenterNet Detection Generator.""" - - def __init__(self, - input_image_dims: int = 512, - net_down_scale: int = 4, - max_detections: int = 100, - peak_error: float = 1e-6, - peak_extract_kernel_size: int = 3, - class_offset: int = 1, - use_nms: bool = False, - nms_pre_thresh: float = 0.1, - nms_thresh: float = 0.4, - **kwargs): - """Initialize CenterNet Detection Generator. - - Args: - input_image_dims: An `int` that specifies the input image size. - net_down_scale: An `int` that specifies stride of the output. - max_detections: An `int` specifying the maximum number of bounding - boxes generated. This is an upper bound, so the number of generated - boxes may be less than this due to thresholding/non-maximum suppression. - peak_error: A `float` for determining non-valid heatmap locations to mask. - peak_extract_kernel_size: An `int` indicating the kernel size used when - performing max-pool over the heatmaps to detect valid center locations - from its neighbors. From the paper, set this to 3 to detect valid. - locations that have responses greater than its 8-connected neighbors - class_offset: An `int` indicating to add an offset to the class - prediction if the dataset labels have been shifted. - use_nms: A `bool` for whether or not to use non-maximum suppression to - filter the bounding boxes. - nms_pre_thresh: A `float` for pre-nms threshold. - nms_thresh: A `float` for nms threshold. - **kwargs: Additional keyword arguments to be passed. 
- """ - super(CenterNetDetectionGenerator, self).__init__(**kwargs) - - # Object center selection parameters - self._max_detections = max_detections - self._peak_error = peak_error - self._peak_extract_kernel_size = peak_extract_kernel_size - - # Used for adjusting class prediction - self._class_offset = class_offset - - # Box normalization parameters - self._net_down_scale = net_down_scale - self._input_image_dims = input_image_dims - - self._use_nms = use_nms - self._nms_pre_thresh = nms_pre_thresh - self._nms_thresh = nms_thresh - - def process_heatmap(self, - feature_map: tf.Tensor, - kernel_size: int) -> tf.Tensor: - """Processes the heatmap into peaks for box selection. - - Given a heatmap, this function first masks out nearby heatmap locations of - the same class using max-pooling such that, ideally, only one center for the - object remains. Then, center locations are masked according to their scores - in comparison to a threshold. NOTE: Repurposed from Google OD API. - - Args: - feature_map: A Tensor with shape [batch_size, height, width, num_classes] - which is the center heatmap predictions. - kernel_size: An integer value for max-pool kernel size. - - Returns: - A Tensor with the same shape as the input but with non-valid center - prediction locations masked out. - """ - - feature_map = tf.math.sigmoid(feature_map) - if not kernel_size or kernel_size == 1: - feature_map_peaks = feature_map - else: - feature_map_max_pool = tf.nn.max_pool( - feature_map, - ksize=kernel_size, - strides=1, - padding='SAME') - - feature_map_peak_mask = tf.math.abs( - feature_map - feature_map_max_pool) < self._peak_error - - # Zero out everything that is not a peak. 
- feature_map_peaks = ( - feature_map * tf.cast(feature_map_peak_mask, feature_map.dtype)) - - return feature_map_peaks - - def get_top_k_peaks(self, - feature_map_peaks: tf.Tensor, - batch_size: int, - width: int, - num_classes: int, - k: int = 100): - """Gets the scores and indices of the top-k peaks from the feature map. - - This function flattens the feature map in order to retrieve the top-k - peaks, then computes the x, y, and class indices for those scores. - NOTE: Repurposed from Google OD API. - - Args: - feature_map_peaks: A `Tensor` with shape [batch_size, height, - width, num_classes] which is the processed center heatmap peaks. - batch_size: An `int` that indicates the batch size of the input. - width: An `int` that indicates the width (and also height) of the input. - num_classes: An `int` for the number of possible classes. This is also - the channel depth of the input. - k: `int`` that controls how many peaks to select. - - Returns: - top_scores: A Tensor with shape [batch_size, k] containing the top-k - scores. - y_indices: A Tensor with shape [batch_size, k] containing the top-k - y-indices corresponding to top_scores. - x_indices: A Tensor with shape [batch_size, k] containing the top-k - x-indices corresponding to top_scores. - channel_indices: A Tensor with shape [batch_size, k] containing the top-k - channel indices corresponding to top_scores. - """ - # Flatten the entire prediction per batch - feature_map_peaks_flat = tf.reshape(feature_map_peaks, [batch_size, -1]) - - # top_scores and top_indices have shape [batch_size, k] - top_scores, top_indices = tf.math.top_k(feature_map_peaks_flat, k=k) - - # Get x, y and channel indices corresponding to the top indices in the flat - # array. 
- y_indices, x_indices, channel_indices = ( - loss_ops.get_row_col_channel_indices_from_flattened_indices( - top_indices, width, num_classes)) - - return top_scores, y_indices, x_indices, channel_indices - - def get_boxes(self, - y_indices: tf.Tensor, - x_indices: tf.Tensor, - channel_indices: tf.Tensor, - height_width_predictions: tf.Tensor, - offset_predictions: tf.Tensor, - num_boxes: int): - """Organizes prediction information into the final bounding boxes. - - NOTE: Repurposed from Google OD API. - - Args: - y_indices: A Tensor with shape [batch_size, k] containing the top-k - y-indices corresponding to top_scores. - x_indices: A Tensor with shape [batch_size, k] containing the top-k - x-indices corresponding to top_scores. - channel_indices: A Tensor with shape [batch_size, k] containing the top-k - channel indices corresponding to top_scores. - height_width_predictions: A Tensor with shape [batch_size, height, - width, 2] containing the object size predictions. - offset_predictions: A Tensor with shape [batch_size, height, width, 2] - containing the object local offset predictions. - num_boxes: `int`, the number of boxes. - - Returns: - boxes: A Tensor with shape [batch_size, num_boxes, 4] that contains the - bounding box coordinates in [y_min, x_min, y_max, x_max] format. - detection_classes: A Tensor with shape [batch_size, num_boxes] that - gives the class prediction for each box. - num_detections: Number of non-zero confidence detections made. - """ - # TF Lite does not support tf.gather with batch_dims > 0, so we need to use - # tf_gather_nd instead and here we prepare the indices for that. 
- - # shapes of heatmap output - shape = tf.shape(height_width_predictions) - batch_size, height, width = shape[0], shape[1], shape[2] - - # combined indices dtype=int32 - combined_indices = tf.stack([ - loss_ops.multi_range(batch_size, value_repetitions=num_boxes), - tf.reshape(y_indices, [-1]), - tf.reshape(x_indices, [-1]) - ], axis=1) - - new_height_width = tf.gather_nd(height_width_predictions, combined_indices) - new_height_width = tf.reshape(new_height_width, [batch_size, num_boxes, 2]) - height_width = tf.maximum(new_height_width, 0.0) - - # height and widths dtype=float32 - heights = height_width[..., 0] - widths = height_width[..., 1] - - # Get the offsets of center points - new_offsets = tf.gather_nd(offset_predictions, combined_indices) - offsets = tf.reshape(new_offsets, [batch_size, num_boxes, 2]) - - # offsets are dtype=float32 - y_offsets = offsets[..., 0] - x_offsets = offsets[..., 1] - - y_indices = tf.cast(y_indices, dtype=heights.dtype) - x_indices = tf.cast(x_indices, dtype=widths.dtype) - - detection_classes = channel_indices + self._class_offset - ymin = y_indices + y_offsets - heights / 2.0 - xmin = x_indices + x_offsets - widths / 2.0 - ymax = y_indices + y_offsets + heights / 2.0 - xmax = x_indices + x_offsets + widths / 2.0 - - ymin = tf.clip_by_value(ymin, 0., tf.cast(height, ymin.dtype)) - xmin = tf.clip_by_value(xmin, 0., tf.cast(width, xmin.dtype)) - ymax = tf.clip_by_value(ymax, 0., tf.cast(height, ymax.dtype)) - xmax = tf.clip_by_value(xmax, 0., tf.cast(width, xmax.dtype)) - boxes = tf.stack([ymin, xmin, ymax, xmax], axis=2) - - return boxes, detection_classes - - def convert_strided_predictions_to_normalized_boxes(self, boxes: tf.Tensor): - boxes = boxes * tf.cast(self._net_down_scale, boxes.dtype) - boxes = boxes / tf.cast(self._input_image_dims, boxes.dtype) - boxes = tf.clip_by_value(boxes, 0.0, 1.0) - return boxes - - def __call__(self, inputs): - # Get heatmaps from decoded outputs via final hourglass stack output - 
all_ct_heatmaps = inputs['ct_heatmaps'] - all_ct_sizes = inputs['ct_size'] - all_ct_offsets = inputs['ct_offset'] - - ct_heatmaps = all_ct_heatmaps[-1] - ct_sizes = all_ct_sizes[-1] - ct_offsets = all_ct_offsets[-1] - - shape = tf.shape(ct_heatmaps) - - _, width = shape[1], shape[2] - batch_size, num_channels = shape[0], shape[3] - - # Process heatmaps using 3x3 max pool and applying sigmoid - peaks = self.process_heatmap( - feature_map=ct_heatmaps, - kernel_size=self._peak_extract_kernel_size) - - # Get top scores along with their x, y, and class - # Each has size [batch_size, k] - scores, y_indices, x_indices, channel_indices = self.get_top_k_peaks( - feature_map_peaks=peaks, - batch_size=batch_size, - width=width, - num_classes=num_channels, - k=self._max_detections) - - # Parse the score and indices into bounding boxes - boxes, classes = self.get_boxes( - y_indices=y_indices, - x_indices=x_indices, - channel_indices=channel_indices, - height_width_predictions=ct_sizes, - offset_predictions=ct_offsets, - num_boxes=self._max_detections) - - # Normalize bounding boxes - boxes = self.convert_strided_predictions_to_normalized_boxes(boxes) - - # Apply nms - if self._use_nms: - boxes = tf.expand_dims(boxes, axis=-2) - multi_class_scores = tf.gather_nd( - peaks, tf.stack([y_indices, x_indices], -1), batch_dims=1) - - boxes, _, scores = nms_ops.nms( - boxes=boxes, - classes=multi_class_scores, - confidence=scores, - k=self._max_detections, - limit_pre_thresh=True, - pre_nms_thresh=0.1, - nms_thresh=0.4) - - num_det = tf.reduce_sum(tf.cast(scores > 0, dtype=tf.int32), axis=1) - boxes = box_ops.denormalize_boxes( - boxes, [self._input_image_dims, self._input_image_dims]) - - return { - 'boxes': boxes, - 'classes': classes, - 'confidence': scores, - 'num_detections': num_det - } - - def get_config(self) -> Mapping[str, Any]: - config = { - 'max_detections': self._max_detections, - 'peak_error': self._peak_error, - 'peak_extract_kernel_size': self._peak_extract_kernel_size, 
- 'class_offset': self._class_offset, - 'net_down_scale': self._net_down_scale, - 'input_image_dims': self._input_image_dims, - 'use_nms': self._use_nms, - 'nms_pre_thresh': self._nms_pre_thresh, - 'nms_thresh': self._nms_thresh - } - - base_config = super(CenterNetDetectionGenerator, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - @classmethod - def from_config(cls, config): - return cls(**config) diff --git a/official/vision/beta/projects/centernet/ops/__init__.py b/official/vision/beta/projects/centernet/ops/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/centernet/ops/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/projects/centernet/ops/preprocess_ops.py b/official/vision/beta/projects/centernet/ops/preprocess_ops.py deleted file mode 100644 index 985b26cd83b4536ba557ab73c4c43bd0632461d0..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/centernet/ops/preprocess_ops.py +++ /dev/null @@ -1,496 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Preprocessing ops imported from OD API.""" - -import functools - -import tensorflow as tf - -from official.vision.beta.projects.centernet.ops import box_list -from official.vision.beta.projects.centernet.ops import box_list_ops - - -def _get_or_create_preprocess_rand_vars(generator_func, - function_id, - preprocess_vars_cache, - key=''): - """Returns a tensor stored in preprocess_vars_cache or using generator_func. - - If the tensor was previously generated and appears in the PreprocessorCache, - the previously generated tensor will be returned. Otherwise, a new tensor - is generated using generator_func and stored in the cache. - - Args: - generator_func: A 0-argument function that generates a tensor. - function_id: identifier for the preprocessing function used. - preprocess_vars_cache: PreprocessorCache object that records previously - performed augmentations. Updated in-place. If this - function is called multiple times with the same - non-null cache, it will perform deterministically. - key: identifier for the variable stored. - - Returns: - The generated tensor. - """ - if preprocess_vars_cache is not None: - var = preprocess_vars_cache.get(function_id, key) - if var is None: - var = generator_func() - preprocess_vars_cache.update(function_id, key, var) - else: - var = generator_func() - return var - - -def _random_integer(minval, maxval, seed): - """Returns a random 0-D tensor between minval and maxval. - - Args: - minval: minimum value of the random tensor. - maxval: maximum value of the random tensor. - seed: random seed. 
- - Returns: - A random 0-D tensor between minval and maxval. - """ - return tf.random.uniform( - [], minval=minval, maxval=maxval, dtype=tf.int32, seed=seed) - - -def _get_crop_border(border, size): - """Get the border of cropping.""" - - border = tf.cast(border, tf.float32) - size = tf.cast(size, tf.float32) - - i = tf.math.ceil(tf.math.log(2.0 * border / size) / tf.math.log(2.0)) - divisor = tf.pow(2.0, i) - divisor = tf.clip_by_value(divisor, 1, border) - divisor = tf.cast(divisor, tf.int32) - - return tf.cast(border, tf.int32) // divisor - - -def random_square_crop_by_scale(image, - boxes, - labels, - max_border=128, - scale_min=0.6, - scale_max=1.3, - num_scales=8, - seed=None, - preprocess_vars_cache=None): - """Randomly crop a square in proportion to scale and image size. - - Extract a square sized crop from an image whose side length is sampled by - randomly scaling the maximum spatial dimension of the image. If part of - the crop falls outside the image, it is filled with zeros. - The augmentation is borrowed from [1] - [1]: https://arxiv.org/abs/1904.07850 - - Args: - image: rank 3 float32 tensor containing 1 image -> - [height, width, channels]. - boxes: rank 2 float32 tensor containing the bounding boxes -> [N, 4]. - Boxes are in normalized form meaning their coordinates vary - between [0, 1]. Each row is in the form of [ymin, xmin, ymax, xmax]. - Boxes on the crop boundary are clipped to the boundary and boxes - falling outside the crop are ignored. - labels: rank 1 int32 tensor containing the object classes. - max_border: The maximum size of the border. The border defines distance in - pixels to the image boundaries that will not be considered as a center of - a crop. To make sure that the border does not go over the center of the - image, we chose the border value by computing the minimum k, such that - (max_border / (2**k)) < image_dimension/2. - scale_min: float, the minimum value for scale. - scale_max: float, the maximum value for scale. 
- num_scales: int, the number of discrete scale values to sample between - [scale_min, scale_max] - seed: random seed. - preprocess_vars_cache: PreprocessorCache object that records previously - performed augmentations. Updated in-place. If this - function is called multiple times with the same - non-null cache, it will perform deterministically. - - - Returns: - image: image which is the same rank as input image. - boxes: boxes which is the same rank as input boxes. - Boxes are in normalized form. - labels: new labels. - - """ - - img_shape = tf.shape(image) - height, width = img_shape[0], img_shape[1] - scales = tf.linspace(scale_min, scale_max, num_scales) - - scale = _get_or_create_preprocess_rand_vars( - lambda: scales[_random_integer(0, num_scales, seed)], - 'square_crop_scale', - preprocess_vars_cache, 'scale') - - image_size = scale * tf.cast(tf.maximum(height, width), tf.float32) - image_size = tf.cast(image_size, tf.int32) - h_border = _get_crop_border(max_border, height) - w_border = _get_crop_border(max_border, width) - - def y_function(): - y = _random_integer(h_border, - tf.cast(height, tf.int32) - h_border + 1, - seed) - return y - - def x_function(): - x = _random_integer(w_border, - tf.cast(width, tf.int32) - w_border + 1, - seed) - return x - - y_center = _get_or_create_preprocess_rand_vars( - y_function, - 'square_crop_scale', - preprocess_vars_cache, 'y_center') - - x_center = _get_or_create_preprocess_rand_vars( - x_function, - 'square_crop_scale', - preprocess_vars_cache, 'x_center') - - half_size = tf.cast(image_size / 2, tf.int32) - crop_ymin, crop_ymax = y_center - half_size, y_center + half_size - crop_xmin, crop_xmax = x_center - half_size, x_center + half_size - - ymin = tf.maximum(crop_ymin, 0) - xmin = tf.maximum(crop_xmin, 0) - ymax = tf.minimum(crop_ymax, height - 1) - xmax = tf.minimum(crop_xmax, width - 1) - - cropped_image = image[ymin:ymax, xmin:xmax] - offset_y = tf.maximum(0, ymin - crop_ymin) - offset_x = tf.maximum(0, xmin - 
crop_xmin) - - oy_i = offset_y - ox_i = offset_x - - output_image = tf.image.pad_to_bounding_box( - cropped_image, offset_height=oy_i, offset_width=ox_i, - target_height=image_size, target_width=image_size) - - if ymin == 0: - # We might be padding the image. - box_ymin = -offset_y - else: - box_ymin = crop_ymin - - if xmin == 0: - # We might be padding the image. - box_xmin = -offset_x - else: - box_xmin = crop_xmin - - box_ymax = box_ymin + image_size - box_xmax = box_xmin + image_size - - image_box = [box_ymin / height, box_xmin / width, - box_ymax / height, box_xmax / width] - boxlist = box_list.BoxList(boxes) - boxlist = box_list_ops.change_coordinate_frame(boxlist, image_box) - boxlist, indices = box_list_ops.prune_completely_outside_window( - boxlist, [0.0, 0.0, 1.0, 1.0]) - boxlist = box_list_ops.clip_to_window(boxlist, [0.0, 0.0, 1.0, 1.0], - filter_nonoverlapping=False) - - return_values = [output_image, - boxlist.get(), - tf.gather(labels, indices)] - - return return_values - - -def resize_to_range(image, - masks=None, - min_dimension=None, - max_dimension=None, - method=tf.image.ResizeMethod.BILINEAR, - pad_to_max_dimension=False, - per_channel_pad_value=(0, 0, 0)): - """Resizes an image so its dimensions are within the provided value. - - The output size can be described by two cases: - 1. If the image can be rescaled so its minimum dimension is equal to the - provided value without the other dimension exceeding max_dimension, - then do so. - 2. Otherwise, resize so the largest dimension is equal to max_dimension. - - Args: - image: A 3D tensor of shape [height, width, channels] - masks: (optional) rank 3 float32 tensor with shape - [num_instances, height, width] containing instance masks. - min_dimension: (optional) (scalar) desired size of the smaller image - dimension. - max_dimension: (optional) (scalar) maximum allowed size - of the larger image dimension. - method: (optional) interpolation method used in resizing. Defaults to - BILINEAR. 
- pad_to_max_dimension: Whether to resize the image and pad it with zeros - so the resulting image is of the spatial size - [max_dimension, max_dimension]. If masks are included they are padded - similarly. - per_channel_pad_value: A tuple of per-channel scalar value to use for - padding. By default pads zeros. - - Returns: - Note that the position of the resized_image_shape changes based on whether - masks are present. - resized_image: A 3D tensor of shape [new_height, new_width, channels], - where the image has been resized (with bilinear interpolation) so that - min(new_height, new_width) == min_dimension or - max(new_height, new_width) == max_dimension. - resized_masks: If masks is not None, also outputs masks. A 3D tensor of - shape [num_instances, new_height, new_width]. - resized_image_shape: A 1D tensor of shape [3] containing shape of the - resized image. - - Raises: - ValueError: if the image is not a 3D tensor. - """ - if len(image.get_shape()) != 3: - raise ValueError('Image should be 3D tensor') - - def _resize_landscape_image(image): - # resize a landscape image - return tf.image.resize( - image, tf.stack([min_dimension, max_dimension]), method=method, - preserve_aspect_ratio=True) - - def _resize_portrait_image(image): - # resize a portrait image - return tf.image.resize( - image, tf.stack([max_dimension, min_dimension]), method=method, - preserve_aspect_ratio=True) - - with tf.name_scope('ResizeToRange'): - if image.get_shape().is_fully_defined(): - if image.get_shape()[0] < image.get_shape()[1]: - new_image = _resize_landscape_image(image) - else: - new_image = _resize_portrait_image(image) - new_size = tf.constant(new_image.get_shape().as_list()) - else: - new_image = tf.cond( - tf.less(tf.shape(image)[0], tf.shape(image)[1]), - lambda: _resize_landscape_image(image), - lambda: _resize_portrait_image(image)) - new_size = tf.shape(new_image) - - if pad_to_max_dimension: - channels = tf.unstack(new_image, axis=2) - if len(channels) != 
len(per_channel_pad_value): - raise ValueError('Number of channels must be equal to the length of ' - 'per-channel pad value.') - new_image = tf.stack( - [ - tf.pad( # pylint: disable=g-complex-comprehension - channels[i], [[0, max_dimension - new_size[0]], - [0, max_dimension - new_size[1]]], - constant_values=per_channel_pad_value[i]) - for i in range(len(channels)) - ], - axis=2) - new_image.set_shape([max_dimension, max_dimension, len(channels)]) - - result = [new_image, new_size] - if masks is not None: - new_masks = tf.expand_dims(masks, 3) - new_masks = tf.image.resize( - new_masks, - new_size[:-1], - method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) - if pad_to_max_dimension: - new_masks = tf.image.pad_to_bounding_box( - new_masks, 0, 0, max_dimension, max_dimension) - new_masks = tf.squeeze(new_masks, 3) - result.append(new_masks) - - return result - - -def _augment_only_rgb_channels(image, augment_function): - """Augments only the RGB slice of an image with additional channels.""" - rgb_slice = image[:, :, :3] - augmented_rgb_slice = augment_function(rgb_slice) - image = tf.concat([augmented_rgb_slice, image[:, :, 3:]], -1) - return image - - -def random_adjust_brightness(image, - max_delta=0.2, - seed=None, - preprocess_vars_cache=None): - """Randomly adjusts brightness. - - Makes sure the output image is still between 0 and 255. - - Args: - image: rank 3 float32 tensor contains 1 image -> [height, width, channels] - with pixel values varying between [0, 255]. - max_delta: how much to change the brightness. A value between [0, 1). - seed: random seed. - preprocess_vars_cache: PreprocessorCache object that records previously - performed augmentations. Updated in-place. If this - function is called multiple times with the same - non-null cache, it will perform deterministically. - - Returns: - image: image which is the same shape as input image. 
- """ - with tf.name_scope('RandomAdjustBrightness'): - generator_func = functools.partial(tf.random.uniform, [], - -max_delta, max_delta, seed=seed) - delta = _get_or_create_preprocess_rand_vars( - generator_func, - 'adjust_brightness', - preprocess_vars_cache) - - def _adjust_brightness(image): - image = tf.image.adjust_brightness(image / 255, delta) * 255 - image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=255.0) - return image - - image = _augment_only_rgb_channels(image, _adjust_brightness) - return image - - -def random_adjust_contrast(image, - min_delta=0.8, - max_delta=1.25, - seed=None, - preprocess_vars_cache=None): - """Randomly adjusts contrast. - - Makes sure the output image is still between 0 and 255. - - Args: - image: rank 3 float32 tensor contains 1 image -> [height, width, channels] - with pixel values varying between [0, 255]. - min_delta: see max_delta. - max_delta: how much to change the contrast. Contrast will change with a - value between min_delta and max_delta. This value will be - multiplied to the current contrast of the image. - seed: random seed. - preprocess_vars_cache: PreprocessorCache object that records previously - performed augmentations. Updated in-place. If this - function is called multiple times with the same - non-null cache, it will perform deterministically. - - Returns: - image: image which is the same shape as input image. 
- """ - with tf.name_scope('RandomAdjustContrast'): - generator_func = functools.partial(tf.random.uniform, [], - min_delta, max_delta, seed=seed) - contrast_factor = _get_or_create_preprocess_rand_vars( - generator_func, - 'adjust_contrast', - preprocess_vars_cache) - - def _adjust_contrast(image): - image = tf.image.adjust_contrast(image / 255, contrast_factor) * 255 - image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=255.0) - return image - - image = _augment_only_rgb_channels(image, _adjust_contrast) - return image - - -def random_adjust_hue(image, - max_delta=0.02, - seed=None, - preprocess_vars_cache=None): - """Randomly adjusts hue. - - Makes sure the output image is still between 0 and 255. - - Args: - image: rank 3 float32 tensor contains 1 image -> [height, width, channels] - with pixel values varying between [0, 255]. - max_delta: change hue randomly with a value between 0 and max_delta. - seed: random seed. - preprocess_vars_cache: PreprocessorCache object that records previously - performed augmentations. Updated in-place. If this - function is called multiple times with the same - non-null cache, it will perform deterministically. - - Returns: - image: image which is the same shape as input image. - """ - with tf.name_scope('RandomAdjustHue'): - generator_func = functools.partial(tf.random.uniform, [], - -max_delta, max_delta, seed=seed) - delta = _get_or_create_preprocess_rand_vars( - generator_func, - 'adjust_hue', - preprocess_vars_cache) - - def _adjust_hue(image): - image = tf.image.adjust_hue(image / 255, delta) * 255 - image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=255.0) - return image - - image = _augment_only_rgb_channels(image, _adjust_hue) - return image - - -def random_adjust_saturation(image, - min_delta=0.8, - max_delta=1.25, - seed=None, - preprocess_vars_cache=None): - """Randomly adjusts saturation. - - Makes sure the output image is still between 0 and 255. 
- - Args: - image: rank 3 float32 tensor contains 1 image -> [height, width, channels] - with pixel values varying between [0, 255]. - min_delta: see max_delta. - max_delta: how much to change the saturation. Saturation will change with a - value between min_delta and max_delta. This value will be - multiplied to the current saturation of the image. - seed: random seed. - preprocess_vars_cache: PreprocessorCache object that records previously - performed augmentations. Updated in-place. If this - function is called multiple times with the same - non-null cache, it will perform deterministically. - - Returns: - image: image which is the same shape as input image. - """ - with tf.name_scope('RandomAdjustSaturation'): - generator_func = functools.partial(tf.random.uniform, [], - min_delta, max_delta, seed=seed) - saturation_factor = _get_or_create_preprocess_rand_vars( - generator_func, - 'adjust_saturation', - preprocess_vars_cache) - - def _adjust_saturation(image): - image = tf.image.adjust_saturation(image / 255, saturation_factor) * 255 - image = tf.clip_by_value(image, clip_value_min=0.0, clip_value_max=255.0) - return image - - image = _augment_only_rgb_channels(image, _adjust_saturation) - return image diff --git a/official/vision/beta/projects/centernet/tasks/centernet.py b/official/vision/beta/projects/centernet/tasks/centernet.py deleted file mode 100644 index e02c863bf31926ea5787fc497a8994d42454b713..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/centernet/tasks/centernet.py +++ /dev/null @@ -1,425 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Centernet task definition.""" - -from typing import Any, List, Optional, Tuple - -from absl import logging -import tensorflow as tf - -from official.core import base_task -from official.core import input_reader -from official.core import task_factory -from official.vision.beta.dataloaders import tf_example_decoder -from official.vision.beta.dataloaders import tfds_factory -from official.vision.beta.dataloaders import tf_example_label_map_decoder -from official.vision.beta.evaluation import coco_evaluator -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.projects.centernet.configs import centernet as exp_cfg -from official.vision.beta.projects.centernet.dataloaders import centernet_input -from official.vision.beta.projects.centernet.losses import centernet_losses -from official.vision.beta.projects.centernet.modeling import centernet_model -from official.vision.beta.projects.centernet.modeling.heads import centernet_head -from official.vision.beta.projects.centernet.modeling.layers import detection_generator -from official.vision.beta.projects.centernet.ops import loss_ops -from official.vision.beta.projects.centernet.ops import target_assigner - - -@task_factory.register_task_cls(exp_cfg.CenterNetTask) -class CenterNetTask(base_task.Task): - """Task definition for centernet.""" - - def build_inputs(self, - params: exp_cfg.DataConfig, - input_context: Optional[tf.distribute.InputContext] = None): - """Build input dataset.""" - if params.tfds_name: - decoder = tfds_factory.get_detection_decoder(params.tfds_name) - 
else: - decoder_cfg = params.decoder.get() - if params.decoder.type == 'simple_decoder': - decoder = tf_example_decoder.TfExampleDecoder( - regenerate_source_id=decoder_cfg.regenerate_source_id) - elif params.decoder.type == 'label_map_decoder': - decoder = tf_example_label_map_decoder.TfExampleDecoderLabelMap( - label_map=decoder_cfg.label_map, - regenerate_source_id=decoder_cfg.regenerate_source_id) - else: - raise ValueError('Unknown decoder type: {}!'.format( - params.decoder.type)) - - parser = centernet_input.CenterNetParser( - output_height=self.task_config.model.input_size[0], - output_width=self.task_config.model.input_size[1], - max_num_instances=self.task_config.model.max_num_instances, - bgr_ordering=params.parser.bgr_ordering, - channel_means=params.parser.channel_means, - channel_stds=params.parser.channel_stds, - aug_rand_hflip=params.parser.aug_rand_hflip, - aug_scale_min=params.parser.aug_scale_min, - aug_scale_max=params.parser.aug_scale_max, - aug_rand_hue=params.parser.aug_rand_hue, - aug_rand_brightness=params.parser.aug_rand_brightness, - aug_rand_contrast=params.parser.aug_rand_contrast, - aug_rand_saturation=params.parser.aug_rand_saturation, - odapi_augmentation=params.parser.odapi_augmentation, - dtype=params.dtype) - - reader = input_reader.InputReader( - params, - dataset_fn=tf.data.TFRecordDataset, - decoder_fn=decoder.decode, - parser_fn=parser.parse_fn(params.is_training)) - - dataset = reader.read(input_context=input_context) - - return dataset - - def build_model(self): - """get an instance of CenterNet.""" - model_config = self.task_config.model - input_specs = tf.keras.layers.InputSpec( - shape=[None] + model_config.input_size) - - l2_weight_decay = self.task_config.weight_decay - # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. 
- # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) - # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) - l2_regularizer = (tf.keras.regularizers.l2( - l2_weight_decay / 2.0) if l2_weight_decay else None) - - backbone = factory.build_backbone( - input_specs=input_specs, - backbone_config=model_config.backbone, - norm_activation_config=model_config.norm_activation, - l2_regularizer=l2_regularizer) - - task_outputs = self.task_config.get_output_length_dict() - head_config = model_config.head - head = centernet_head.CenterNetHead( - input_specs=backbone.output_specs, - task_outputs=task_outputs, - input_levels=head_config.input_levels, - heatmap_bias=head_config.heatmap_bias) - - # output_specs is a dict - backbone_output_spec = backbone.output_specs[head_config.input_levels[-1]] - if len(backbone_output_spec) == 4: - bb_output_height = backbone_output_spec[1] - elif len(backbone_output_spec) == 3: - bb_output_height = backbone_output_spec[0] - else: - raise ValueError - self._net_down_scale = int(model_config.input_size[0] / bb_output_height) - dg_config = model_config.detection_generator - detect_generator_obj = detection_generator.CenterNetDetectionGenerator( - max_detections=dg_config.max_detections, - peak_error=dg_config.peak_error, - peak_extract_kernel_size=dg_config.peak_extract_kernel_size, - class_offset=dg_config.class_offset, - net_down_scale=self._net_down_scale, - input_image_dims=model_config.input_size[0], - use_nms=dg_config.use_nms, - nms_pre_thresh=dg_config.nms_pre_thresh, - nms_thresh=dg_config.nms_thresh) - - model = centernet_model.CenterNetModel( - backbone=backbone, - head=head, - detection_generator=detect_generator_obj) - - return model - - def initialize(self, model: tf.keras.Model): - """Loading pretrained checkpoint.""" - if not self.task_config.init_checkpoint: - return - - ckpt_dir_or_file = self.task_config.init_checkpoint - - # Restoring checkpoint. 
- if tf.io.gfile.isdir(ckpt_dir_or_file): - ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) - - if self.task_config.init_checkpoint_modules == 'all': - ckpt = tf.train.Checkpoint(**model.checkpoint_items) - status = ckpt.restore(ckpt_dir_or_file) - status.assert_consumed() - elif self.task_config.init_checkpoint_modules == 'backbone': - ckpt = tf.train.Checkpoint(backbone=model.backbone) - status = ckpt.restore(ckpt_dir_or_file) - status.expect_partial().assert_existing_objects_matched() - else: - raise ValueError( - "Only 'all' or 'backbone' can be used to initialize the model.") - - logging.info('Finished loading pretrained checkpoint from %s', - ckpt_dir_or_file) - - def build_losses(self, - outputs, - labels, - aux_losses=None): - """Build losses.""" - input_size = self.task_config.model.input_size[0:2] - output_size = outputs['ct_heatmaps'][0].get_shape().as_list()[1:3] - - gt_label = tf.map_fn( - # pylint: disable=g-long-lambda - fn=lambda x: target_assigner.assign_centernet_targets( - labels=x, - input_size=input_size, - output_size=output_size, - num_classes=self.task_config.model.num_classes, - max_num_instances=self.task_config.model.max_num_instances, - gaussian_iou=self.task_config.losses.gaussian_iou, - class_offset=self.task_config.losses.class_offset), - elems=labels, - fn_output_signature={ - 'ct_heatmaps': tf.TensorSpec( - shape=[output_size[0], output_size[1], - self.task_config.model.num_classes], - dtype=tf.float32), - 'ct_offset': tf.TensorSpec( - shape=[self.task_config.model.max_num_instances, 2], - dtype=tf.float32), - 'size': tf.TensorSpec( - shape=[self.task_config.model.max_num_instances, 2], - dtype=tf.float32), - 'box_mask': tf.TensorSpec( - shape=[self.task_config.model.max_num_instances], - dtype=tf.int32), - 'box_indices': tf.TensorSpec( - shape=[self.task_config.model.max_num_instances, 2], - dtype=tf.int32), - } - ) - - losses = {} - - # Create loss functions - object_center_loss_fn = 
centernet_losses.PenaltyReducedLogisticFocalLoss() - localization_loss_fn = centernet_losses.L1LocalizationLoss() - - # Set up box indices so that they have a batch element as well - box_indices = loss_ops.add_batch_to_indices(gt_label['box_indices']) - - box_mask = tf.cast(gt_label['box_mask'], dtype=tf.float32) - num_boxes = tf.cast( - loss_ops.get_num_instances_from_weights(gt_label['box_mask']), - dtype=tf.float32) - - # Calculate center heatmap loss - output_unpad_image_shapes = tf.math.ceil( - tf.cast(labels['unpad_image_shapes'], - tf.float32) / self._net_down_scale) - valid_anchor_weights = loss_ops.get_valid_anchor_weights_in_flattened_image( - output_unpad_image_shapes, output_size[0], output_size[1]) - valid_anchor_weights = tf.expand_dims(valid_anchor_weights, 2) - - pred_ct_heatmap_list = outputs['ct_heatmaps'] - true_flattened_ct_heatmap = loss_ops.flatten_spatial_dimensions( - gt_label['ct_heatmaps']) - true_flattened_ct_heatmap = tf.cast(true_flattened_ct_heatmap, tf.float32) - - total_center_loss = 0.0 - for ct_heatmap in pred_ct_heatmap_list: - pred_flattened_ct_heatmap = loss_ops.flatten_spatial_dimensions( - ct_heatmap) - pred_flattened_ct_heatmap = tf.cast(pred_flattened_ct_heatmap, tf.float32) - total_center_loss += object_center_loss_fn( - target_tensor=true_flattened_ct_heatmap, - prediction_tensor=pred_flattened_ct_heatmap, - weights=valid_anchor_weights) - - center_loss = tf.reduce_sum(total_center_loss) / float( - len(pred_ct_heatmap_list) * num_boxes) - losses['ct_loss'] = center_loss - - # Calculate scale loss - pred_scale_list = outputs['ct_size'] - true_scale = tf.cast(gt_label['size'], tf.float32) - - total_scale_loss = 0.0 - for scale_map in pred_scale_list: - pred_scale = loss_ops.get_batch_predictions_from_indices(scale_map, - box_indices) - pred_scale = tf.cast(pred_scale, tf.float32) - # Only apply loss for boxes that appear in the ground truth - total_scale_loss += tf.reduce_sum( - localization_loss_fn(target_tensor=true_scale, 
- prediction_tensor=pred_scale), - axis=-1) * box_mask - - scale_loss = tf.reduce_sum(total_scale_loss) / float( - len(pred_scale_list) * num_boxes) - losses['scale_loss'] = scale_loss - - # Calculate offset loss - pred_offset_list = outputs['ct_offset'] - true_offset = tf.cast(gt_label['ct_offset'], tf.float32) - - total_offset_loss = 0.0 - for offset_map in pred_offset_list: - pred_offset = loss_ops.get_batch_predictions_from_indices(offset_map, - box_indices) - pred_offset = tf.cast(pred_offset, tf.float32) - # Only apply loss for boxes that appear in the ground truth - total_offset_loss += tf.reduce_sum( - localization_loss_fn(target_tensor=true_offset, - prediction_tensor=pred_offset), - axis=-1) * box_mask - - offset_loss = tf.reduce_sum(total_offset_loss) / float( - len(pred_offset_list) * num_boxes) - losses['ct_offset_loss'] = offset_loss - - # Aggregate and finalize loss - loss_weights = self.task_config.losses.detection - total_loss = (loss_weights.object_center_weight * center_loss + - loss_weights.scale_weight * scale_loss + - loss_weights.offset_weight * offset_loss) - - if aux_losses: - total_loss += tf.add_n(aux_losses) - - losses['total_loss'] = total_loss - return losses - - def build_metrics(self, training=True): - metrics = [] - metric_names = ['total_loss', 'ct_loss', 'scale_loss', 'ct_offset_loss'] - for name in metric_names: - metrics.append(tf.keras.metrics.Mean(name, dtype=tf.float32)) - - if not training: - if (self.task_config.validation_data.tfds_name - and self.task_config.annotation_file): - raise ValueError( - "Can't evaluate using annotation file when TFDS is used.") - self.coco_metric = coco_evaluator.COCOEvaluator( - annotation_file=self.task_config.annotation_file, - include_mask=False, - per_category_metrics=self.task_config.per_category_metrics) - - return metrics - - def train_step(self, - inputs: Tuple[Any, Any], - model: tf.keras.Model, - optimizer: tf.keras.optimizers.Optimizer, - metrics: Optional[List[Any]] = None): - 
"""Does forward and backward. - - Args: - inputs: a dictionary of input tensors. - model: the model, forward pass definition. - optimizer: the optimizer for this training step. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. - """ - features, labels = inputs - - num_replicas = tf.distribute.get_strategy().num_replicas_in_sync - with tf.GradientTape() as tape: - outputs = model(features, training=True) - # Casting output layer as float32 is necessary when mixed_precision is - # mixed_float16 or mixed_bfloat16 to ensure output is casted as float32. - outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) - - losses = self.build_losses(outputs['raw_output'], labels) - - scaled_loss = losses['total_loss'] / num_replicas - # For mixed_precision policy, when LossScaleOptimizer is used, loss is - # scaled for numerical stability. - if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - scaled_loss = optimizer.get_scaled_loss(scaled_loss) - - # compute the gradient - tvars = model.trainable_variables - gradients = tape.gradient(scaled_loss, tvars) - - # get unscaled loss if the scaled loss was used - if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - gradients = optimizer.get_unscaled_gradients(gradients) - - if self.task_config.gradient_clip_norm > 0.0: - gradients, _ = tf.clip_by_global_norm(gradients, - self.task_config.gradient_clip_norm) - - optimizer.apply_gradients(list(zip(gradients, tvars))) - - logs = {self.loss: losses['total_loss']} - - if metrics: - for m in metrics: - m.update_state(losses[m.name]) - logs.update({m.name: m.result()}) - - return logs - - def validation_step(self, - inputs: Tuple[Any, Any], - model: tf.keras.Model, - metrics: Optional[List[Any]] = None): - """Validation step. - - Args: - inputs: a dictionary of input tensors. - model: the keras.Model. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. 
- """ - features, labels = inputs - - outputs = model(features, training=False) - outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) - - losses = self.build_losses(outputs['raw_output'], labels) - - logs = {self.loss: losses['total_loss']} - - coco_model_outputs = { - 'detection_boxes': outputs['boxes'], - 'detection_scores': outputs['confidence'], - 'detection_classes': outputs['classes'], - 'num_detections': outputs['num_detections'], - 'source_id': labels['groundtruths']['source_id'], - 'image_info': labels['image_info'] - } - - logs.update({self.coco_metric.name: (labels['groundtruths'], - coco_model_outputs)}) - - if metrics: - for m in metrics: - m.update_state(losses[m.name]) - logs.update({m.name: m.result()}) - return logs - - def aggregate_logs(self, state=None, step_outputs=None): - if state is None: - self.coco_metric.reset_states() - state = self.coco_metric - self.coco_metric.update_state(step_outputs[self.coco_metric.name][0], - step_outputs[self.coco_metric.name][1]) - return state - - def reduce_aggregated_logs(self, aggregated_logs, global_step=None): - return self.coco_metric.result() diff --git a/official/vision/beta/projects/centernet/train.py b/official/vision/beta/projects/centernet/train.py deleted file mode 100644 index 82a0fa64b51533f856c326fc2e3f90e6a48ad770..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/centernet/train.py +++ /dev/null @@ -1,67 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""TensorFlow Model Garden Vision Centernet trainer.""" -from absl import app -from absl import flags -import gin - -from official.common import distribute_utils -from official.common import flags as tfm_flags -from official.core import task_factory -from official.core import train_lib -from official.core import train_utils -from official.modeling import performance -from official.vision.beta.projects.centernet.common import registry_imports # pylint: disable=unused-import - -FLAGS = flags.FLAGS - - -def main(_): - gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) - params = train_utils.parse_configuration(FLAGS) - - model_dir = FLAGS.model_dir - if 'train' in FLAGS.mode: - # Pure eval modes do not output yaml files. Otherwise continuous eval job - # may race against the train job for writing the same file. - train_utils.serialize_config(params, model_dir) - - # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16' - # can have significant impact on model speeds by utilizing float16 in case of - # GPUs, and bfloat16 in the case of TPUs. 
loss_scale takes effect only when - # dtype is float16 - if params.runtime.mixed_precision_dtype: - performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) - distribution_strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=params.runtime.distribution_strategy, - all_reduce_alg=params.runtime.all_reduce_alg, - num_gpus=params.runtime.num_gpus, - tpu_address=params.runtime.tpu) - with distribution_strategy.scope(): - task = task_factory.get_task(params.task, logging_dir=model_dir) - - train_lib.run_experiment( - distribution_strategy=distribution_strategy, - task=task, - mode=FLAGS.mode, - params=params, - model_dir=model_dir) - - train_utils.save_gin_config(FLAGS.mode, model_dir) - - -if __name__ == '__main__': - tfm_flags.define_flags() - app.run(main) diff --git a/official/vision/beta/projects/centernet/utils/checkpoints/__init__.py b/official/vision/beta/projects/centernet/utils/checkpoints/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/centernet/utils/checkpoints/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- diff --git a/official/vision/beta/projects/deepmac_maskrcnn/README.md b/official/vision/beta/projects/deepmac_maskrcnn/README.md deleted file mode 100644 index dffe66021a392a0ac71905d643a313270c92e2c9..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/README.md +++ /dev/null @@ -1,122 +0,0 @@ -# Mask R-CNN with deep mask heads - -This project brings insights from the DeepMAC model into the Mask-RCNN -architecture. Please see the paper -[The surprising impact of mask-head architecture on novel class segmentation](https://arxiv.org/abs/2104.00613) -for more details. - -## Code structure - -* This folder contains forks of a few Mask R-CNN files and repurposes them to - support deep mask heads. -* To see the benefits of using deep mask heads, it is important to train the - mask head with only groundtruth boxes. This is configured via the - `task.model.use_gt_boxes_for_masks` flag. -* Architecture of the mask head can be changed via the config value - `task.model.mask_head.convnet_variant`. Supported values are `"default"`, - `"hourglass20"`, `"hourglass52"`, and `"hourglass100"`. -* The flag `task.model.mask_head.class_agnostic` trains the model in class - agnostic mode and `task.allowed_mask_class_ids` controls which classes are - allowed to have masks during training. -* Majority of experiments and ablations from the paper are perfomed with the - [DeepMAC model](../../../../../research/object_detection/g3doc/deepmac.md) - in the Object Detection API code base. - -## Prerequisites - -### Prepare dataset - -Use [create_coco_tf_record.py](../../data/create_coco_tf_record.py) to create -the COCO dataset. The data needs to be store in a -[Google cloud storage bucket](https://cloud.google.com/storage/docs/creating-buckets) -so that it can be accessed by the TPU. - -### Start a TPU v3-32 instance - -See [TPU Quickstart](https://cloud.google.com/tpu/docs/quickstart) for -instructions. 
An example command would look like: - -```shell -ctpu up --name --zone --tpu-size=v3-32 --tf-version nightly -``` - -This model requires TF version `>= 2.5`. Currently, that is only available via a -`nightly` build on Cloud. - -### Install requirements - -SSH into the TPU host with `gcloud compute ssh ` and execute the -following. - -```shell -$ git clone https://github.com/tensorflow/models.git -$ cd models -$ pip3 install -r official/requirements.txt -``` - -## Training Models - -The configurations can be found in the `configs/experiments` directory. You can -launch a training job by executing. - -```shell -$ export CONFIG=./official/vision/beta/projects/deepmac_maskrcnn/configs/experiments/deep_mask_head_rcnn_voc_r50.yaml -$ export MODEL_DIR="gs://" -$ export ANNOTAION_FILE="gs://" -$ export TRAIN_DATA="gs://" -$ export EVAL_DATA="gs://" -# Overrides to access data. These can also be changed in the config file. -$ export OVERRIDES="task.validation_data.input_path=${EVAL_DATA},\ -task.train_data.input_path=${TRAIN_DATA},\ -task.annotation_file=${ANNOTAION_FILE},\ -runtime.distribution_strategy=tpu" - -$ python3 -m official.vision.beta.projects.deepmac_maskrcnn.train \ - --logtostderr \ - --mode=train_and_eval \ - --experiment=deep_mask_head_rcnn_resnetfpn_coco \ - --model_dir=$MODEL_DIR \ - --config_file=$CONFIG \ - --params_override=$OVERRIDES\ - --tpu= -``` - -`CONFIG_FILE` can be any file in the `configs/experiments` directory. -When using SpineNet models, please specify -`--experiment=deep_mask_head_rcnn_spinenet_coco` - -**Note:** The default eval batch size of 32 discards some samples during -validation. For accurate vaidation statistics, launch a dedicated eval job on -TPU `v3-8` and set batch size to 8. - -## Configurations - -In the following table, we report the Mask mAP of our models on the non-VOC -classes when only training with masks for the VOC calsses. Performance is -measured on the `coco-val2017` set. 
- -Backbone | Mask head | Config name | Mask mAP -:------------| :----------- | :-----------------------------------------------| -------: -ResNet-50 | Default | `deep_mask_head_rcnn_voc_r50.yaml` | 25.9 -ResNet-50 | Hourglass-52 | `deep_mask_head_rcnn_voc_r50_hg52.yaml` | 33.1 -ResNet-101 | Hourglass-52 | `deep_mask_head_rcnn_voc_r101_hg52.yaml` | 34.4 -SpienNet-143 | Hourglass-52 | `deep_mask_head_rcnn_voc_spinenet143_hg52.yaml` | 38.7 - -## See also - -* [DeepMAC model](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/deepmac.md) - in the Object Detection API code base. -* Project website - [git.io/deepmac](https://git.io/deepmac) - -## Citation - -``` -@misc{birodkar2021surprising, - title={The surprising impact of mask-head architecture on novel class segmentation}, - author={Vighnesh Birodkar and Zhichao Lu and Siyang Li and Vivek Rathod and Jonathan Huang}, - year={2021}, - eprint={2104.00613}, - archivePrefix={arXiv}, - primaryClass={cs.CV} -} -``` diff --git a/official/vision/beta/projects/deepmac_maskrcnn/__init__.py b/official/vision/beta/projects/deepmac_maskrcnn/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- diff --git a/official/vision/beta/projects/deepmac_maskrcnn/common/__init__.py b/official/vision/beta/projects/deepmac_maskrcnn/common/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/common/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/projects/deepmac_maskrcnn/common/registry_imports.py b/official/vision/beta/projects/deepmac_maskrcnn/common/registry_imports.py deleted file mode 100644 index 0732d1a0be9d5728dc01f907db493c8ac1a3bd73..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/common/registry_imports.py +++ /dev/null @@ -1,18 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Imports to configure Mask R-CNN with deep mask heads.""" - -# pylint: disable=unused-import -from official.vision.beta.projects.deepmac_maskrcnn.tasks import deep_mask_head_rcnn diff --git a/official/vision/beta/projects/deepmac_maskrcnn/configs/__init__.py b/official/vision/beta/projects/deepmac_maskrcnn/configs/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/configs/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/projects/deepmac_maskrcnn/configs/deep_mask_head_rcnn.py b/official/vision/beta/projects/deepmac_maskrcnn/configs/deep_mask_head_rcnn.py deleted file mode 100644 index ef81566ed77eccc61f71b07670c0292724445429..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/configs/deep_mask_head_rcnn.py +++ /dev/null @@ -1,197 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Configuration for Mask R-CNN with deep mask heads.""" - -import os -from typing import Optional - -import dataclasses - -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.modeling import optimization -from official.vision.beta.configs import backbones -from official.vision.beta.configs import common -from official.vision.beta.configs import decoders -from official.vision.beta.configs import maskrcnn as maskrcnn_config -from official.vision.beta.configs import retinanet as retinanet_config - - -@dataclasses.dataclass -class DeepMaskHead(maskrcnn_config.MaskHead): - convnet_variant: str = 'default' - - -@dataclasses.dataclass -class DeepMaskHeadRCNN(maskrcnn_config.MaskRCNN): - mask_head: Optional[DeepMaskHead] = DeepMaskHead() - use_gt_boxes_for_masks: bool = False - - -@dataclasses.dataclass -class DeepMaskHeadRCNNTask(maskrcnn_config.MaskRCNNTask): - """Configuration for the deep mask head R-CNN task.""" - model: DeepMaskHeadRCNN = DeepMaskHeadRCNN() - - -@exp_factory.register_config_factory('deep_mask_head_rcnn_resnetfpn_coco') -def deep_mask_head_rcnn_resnetfpn_coco() -> cfg.ExperimentConfig: - """COCO object detection with Mask R-CNN with deep mask heads.""" - global_batch_size = 64 - steps_per_epoch = int(retinanet_config.COCO_TRAIN_EXAMPLES / - global_batch_size) - coco_val_samples = 5000 - - config = cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), - task=DeepMaskHeadRCNNTask( - 
init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080', - init_checkpoint_modules='backbone', - annotation_file=os.path.join(maskrcnn_config.COCO_INPUT_PATH_BASE, - 'instances_val2017.json'), - model=DeepMaskHeadRCNN( - num_classes=91, input_size=[1024, 1024, 3], include_mask=True), # pytype: disable=wrong-keyword-args - losses=maskrcnn_config.Losses(l2_weight_decay=0.00004), - train_data=maskrcnn_config.DataConfig( - input_path=os.path.join(maskrcnn_config.COCO_INPUT_PATH_BASE, - 'train*'), - is_training=True, - global_batch_size=global_batch_size, - parser=maskrcnn_config.Parser( - aug_rand_hflip=True, aug_scale_min=0.8, aug_scale_max=1.25)), - validation_data=maskrcnn_config.DataConfig( - input_path=os.path.join(maskrcnn_config.COCO_INPUT_PATH_BASE, - 'val*'), - is_training=False, - global_batch_size=8)), # pytype: disable=wrong-keyword-args - trainer=cfg.TrainerConfig( - train_steps=22500, - validation_steps=coco_val_samples // 8, - validation_interval=steps_per_epoch, - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'stepwise', - 'stepwise': { - 'boundaries': [15000, 20000], - 'values': [0.12, 0.012, 0.0012], - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 500, - 'warmup_learning_rate': 0.0067 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - return config - - -@exp_factory.register_config_factory('deep_mask_head_rcnn_spinenet_coco') -def deep_mask_head_rcnn_spinenet_coco() -> cfg.ExperimentConfig: - """COCO object detection with Mask R-CNN with SpineNet backbone.""" - steps_per_epoch = 463 - coco_val_samples = 5000 - train_batch_size = 256 - eval_batch_size = 8 - - config = cfg.ExperimentConfig( - 
runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), - task=DeepMaskHeadRCNNTask( - annotation_file=os.path.join(maskrcnn_config.COCO_INPUT_PATH_BASE, - 'instances_val2017.json'), # pytype: disable=wrong-keyword-args - model=DeepMaskHeadRCNN( - backbone=backbones.Backbone( - type='spinenet', - spinenet=backbones.SpineNet( - model_id='49', - min_level=3, - max_level=7, - )), - decoder=decoders.Decoder( - type='identity', identity=decoders.Identity()), - anchor=maskrcnn_config.Anchor(anchor_size=3), - norm_activation=common.NormActivation(use_sync_bn=True), - num_classes=91, - input_size=[640, 640, 3], - min_level=3, - max_level=7, - include_mask=True), # pytype: disable=wrong-keyword-args - losses=maskrcnn_config.Losses(l2_weight_decay=0.00004), - train_data=maskrcnn_config.DataConfig( - input_path=os.path.join(maskrcnn_config.COCO_INPUT_PATH_BASE, - 'train*'), - is_training=True, - global_batch_size=train_batch_size, - parser=maskrcnn_config.Parser( - aug_rand_hflip=True, aug_scale_min=0.5, aug_scale_max=2.0)), - validation_data=maskrcnn_config.DataConfig( - input_path=os.path.join(maskrcnn_config.COCO_INPUT_PATH_BASE, - 'val*'), - is_training=False, - global_batch_size=eval_batch_size, - drop_remainder=False)), # pytype: disable=wrong-keyword-args - trainer=cfg.TrainerConfig( - train_steps=steps_per_epoch * 350, - validation_steps=coco_val_samples // eval_batch_size, - validation_interval=steps_per_epoch, - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'stepwise', - 'stepwise': { - 'boundaries': [ - steps_per_epoch * 320, steps_per_epoch * 340 - ], - 'values': [0.32, 0.032, 0.0032], - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 2000, - 'warmup_learning_rate': 0.0067 - } - } - })), - restrictions=[ - 
'task.train_data.is_training != None', - 'task.validation_data.is_training != None', - 'task.model.min_level == task.model.backbone.spinenet.min_level', - 'task.model.max_level == task.model.backbone.spinenet.max_level', - ]) - return config diff --git a/official/vision/beta/projects/deepmac_maskrcnn/modeling/__init__.py b/official/vision/beta/projects/deepmac_maskrcnn/modeling/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/modeling/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/projects/deepmac_maskrcnn/modeling/heads/__init__.py b/official/vision/beta/projects/deepmac_maskrcnn/modeling/heads/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/modeling/heads/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/projects/deepmac_maskrcnn/modeling/heads/instance_heads.py b/official/vision/beta/projects/deepmac_maskrcnn/modeling/heads/instance_heads.py deleted file mode 100644 index 6e6ef08885aeab669a9ce41468223491a32f9d55..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/modeling/heads/instance_heads.py +++ /dev/null @@ -1,311 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Instance prediction heads.""" - -# Import libraries - -from absl import logging -import tensorflow as tf - -from official.modeling import tf_utils -from official.vision.beta.projects.deepmac_maskrcnn.modeling.heads import hourglass_network - - -class DeepMaskHead(tf.keras.layers.Layer): - """Creates a mask head.""" - - def __init__(self, - num_classes, - upsample_factor=2, - num_convs=4, - num_filters=256, - use_separable_conv=False, - activation='relu', - use_sync_bn=False, - norm_momentum=0.99, - norm_epsilon=0.001, - kernel_regularizer=None, - bias_regularizer=None, - class_agnostic=False, - convnet_variant='default', - **kwargs): - """Initializes a mask head. - - Args: - num_classes: An `int` of the number of classes. - upsample_factor: An `int` that indicates the upsample factor to generate - the final predicted masks. It should be >= 1. - num_convs: An `int` number that represents the number of the intermediate - convolution layers before the mask prediction layers. - num_filters: An `int` number that represents the number of filters of the - intermediate convolution layers. - use_separable_conv: A `bool` that indicates whether separable - convolution layers are used. - activation: A `str` that indicates which activation is used, e.g. 'relu', - 'swish', etc. - use_sync_bn: A `bool` that indicates whether to use synchronized batch - normalization across different replicas. - norm_momentum: A `float` of normalization momentum for the moving average. - norm_epsilon: A `float` added to variance to avoid dividing by zero. - kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for - Conv2D. Default is None. - bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. - class_agnostic: A `bool`. If set, we use a single channel mask head that - is shared between all classes. - convnet_variant: A `str` denoting the architecture of network used in the - head.
Supported options are 'default', 'hourglass20', 'hourglass52' - and 'hourglass100'. - **kwargs: Additional keyword arguments to be passed. - """ - super(DeepMaskHead, self).__init__(**kwargs) - self._config_dict = { - 'num_classes': num_classes, - 'upsample_factor': upsample_factor, - 'num_convs': num_convs, - 'num_filters': num_filters, - 'use_separable_conv': use_separable_conv, - 'activation': activation, - 'use_sync_bn': use_sync_bn, - 'norm_momentum': norm_momentum, - 'norm_epsilon': norm_epsilon, - 'kernel_regularizer': kernel_regularizer, - 'bias_regularizer': bias_regularizer, - 'class_agnostic': class_agnostic, - 'convnet_variant': convnet_variant, - } - - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - self._activation = tf_utils.get_activation(activation) - - def _get_conv_op_and_kwargs(self): - conv_op = (tf.keras.layers.SeparableConv2D - if self._config_dict['use_separable_conv'] - else tf.keras.layers.Conv2D) - conv_kwargs = { - 'filters': self._config_dict['num_filters'], - 'kernel_size': 3, - 'padding': 'same', - } - if self._config_dict['use_separable_conv']: - conv_kwargs.update({ - 'depthwise_initializer': tf.keras.initializers.VarianceScaling( - scale=2, mode='fan_out', distribution='untruncated_normal'), - 'pointwise_initializer': tf.keras.initializers.VarianceScaling( - scale=2, mode='fan_out', distribution='untruncated_normal'), - 'bias_initializer': tf.zeros_initializer(), - 'depthwise_regularizer': self._config_dict['kernel_regularizer'], - 'pointwise_regularizer': self._config_dict['kernel_regularizer'], - 'bias_regularizer': self._config_dict['bias_regularizer'], - }) - else: - conv_kwargs.update({ - 'kernel_initializer': tf.keras.initializers.VarianceScaling( - scale=2, mode='fan_out', distribution='untruncated_normal'), - 'bias_initializer': tf.zeros_initializer(), - 'kernel_regularizer': self._config_dict['kernel_regularizer'], - 'bias_regularizer': 
self._config_dict['bias_regularizer'], - }) - - return conv_op, conv_kwargs - - def _get_bn_op_and_kwargs(self): - - bn_op = (tf.keras.layers.experimental.SyncBatchNormalization - if self._config_dict['use_sync_bn'] - else tf.keras.layers.BatchNormalization) - bn_kwargs = { - 'axis': self._bn_axis, - 'momentum': self._config_dict['norm_momentum'], - 'epsilon': self._config_dict['norm_epsilon'], - } - - return bn_op, bn_kwargs - - def build(self, input_shape): - """Creates the variables of the head.""" - - conv_op, conv_kwargs = self._get_conv_op_and_kwargs() - - self._build_convnet_variant() - - self._deconv = tf.keras.layers.Conv2DTranspose( - filters=self._config_dict['num_filters'], - kernel_size=self._config_dict['upsample_factor'], - strides=self._config_dict['upsample_factor'], - padding='valid', - kernel_initializer=tf.keras.initializers.VarianceScaling( - scale=2, mode='fan_out', distribution='untruncated_normal'), - bias_initializer=tf.zeros_initializer(), - kernel_regularizer=self._config_dict['kernel_regularizer'], - bias_regularizer=self._config_dict['bias_regularizer'], - name='mask-upsampling') - - bn_op, bn_kwargs = self._get_bn_op_and_kwargs() - self._deconv_bn = bn_op(name='mask-deconv-bn', **bn_kwargs) - - if self._config_dict['class_agnostic']: - num_filters = 1 - else: - num_filters = self._config_dict['num_classes'] - - conv_kwargs = { - 'filters': num_filters, - 'kernel_size': 1, - 'padding': 'valid', - } - if self._config_dict['use_separable_conv']: - conv_kwargs.update({ - 'depthwise_initializer': tf.keras.initializers.VarianceScaling( - scale=2, mode='fan_out', distribution='untruncated_normal'), - 'pointwise_initializer': tf.keras.initializers.VarianceScaling( - scale=2, mode='fan_out', distribution='untruncated_normal'), - 'bias_initializer': tf.zeros_initializer(), - 'depthwise_regularizer': self._config_dict['kernel_regularizer'], - 'pointwise_regularizer': self._config_dict['kernel_regularizer'], - 'bias_regularizer': 
self._config_dict['bias_regularizer'], - }) - else: - conv_kwargs.update({ - 'kernel_initializer': tf.keras.initializers.VarianceScaling( - scale=2, mode='fan_out', distribution='untruncated_normal'), - 'bias_initializer': tf.zeros_initializer(), - 'kernel_regularizer': self._config_dict['kernel_regularizer'], - 'bias_regularizer': self._config_dict['bias_regularizer'], - }) - self._mask_regressor = conv_op(name='mask-logits', **conv_kwargs) - - super(DeepMaskHead, self).build(input_shape) - - def call(self, inputs, training=None): - """Forward pass of mask branch for the Mask-RCNN model. - - Args: - inputs: A `list` of two tensors where - inputs[0]: A `tf.Tensor` of shape [batch_size, num_instances, - roi_height, roi_width, roi_channels], representing the ROI features. - inputs[1]: A `tf.Tensor` of shape [batch_size, num_instances], - representing the classes of the ROIs. - training: A `bool` indicating whether it is in `training` mode. - - Returns: - mask_outputs: A `tf.Tensor` of shape - [batch_size, num_instances, roi_height * upsample_factor, - roi_width * upsample_factor], representing the mask predictions. 
- """ - roi_features, roi_classes = inputs - features_shape = tf.shape(roi_features) - batch_size, num_rois, height, width, filters = ( - features_shape[0], features_shape[1], features_shape[2], - features_shape[3], features_shape[4]) - if batch_size is None: - batch_size = tf.shape(roi_features)[0] - - x = tf.reshape(roi_features, [-1, height, width, filters]) - - x = self._call_convnet_variant(x) - - x = self._deconv(x) - x = self._deconv_bn(x) - x = self._activation(x) - - logits = self._mask_regressor(x) - - mask_height = height * self._config_dict['upsample_factor'] - mask_width = width * self._config_dict['upsample_factor'] - - if self._config_dict['class_agnostic']: - logits = tf.reshape(logits, [-1, num_rois, mask_height, mask_width, 1]) - else: - logits = tf.reshape( - logits, - [-1, num_rois, mask_height, mask_width, - self._config_dict['num_classes']]) - - batch_indices = tf.tile( - tf.expand_dims(tf.range(batch_size), axis=1), [1, num_rois]) - mask_indices = tf.tile( - tf.expand_dims(tf.range(num_rois), axis=0), [batch_size, 1]) - - if self._config_dict['class_agnostic']: - class_gather_indices = tf.zeros_like(roi_classes, dtype=tf.int32) - else: - class_gather_indices = tf.cast(roi_classes, dtype=tf.int32) - - gather_indices = tf.stack( - [batch_indices, mask_indices, class_gather_indices], - axis=2) - mask_outputs = tf.gather_nd( - tf.transpose(logits, [0, 1, 4, 2, 3]), gather_indices) - return mask_outputs - - def _build_convnet_variant(self): - - variant = self._config_dict['convnet_variant'] - if variant == 'default': - conv_op, conv_kwargs = self._get_conv_op_and_kwargs() - bn_op, bn_kwargs = self._get_bn_op_and_kwargs() - self._convs = [] - self._conv_norms = [] - for i in range(self._config_dict['num_convs']): - conv_name = 'mask-conv_{}'.format(i) - self._convs.append(conv_op(name=conv_name, **conv_kwargs)) - bn_name = 'mask-conv-bn_{}'.format(i) - self._conv_norms.append(bn_op(name=bn_name, **bn_kwargs)) - - elif variant == 'hourglass20': - 
logging.info('Using hourglass 20 network.') - self._hourglass = hourglass_network.hourglass_20( - self._config_dict['num_filters'], initial_downsample=False) - - elif variant == 'hourglass52': - logging.info('Using hourglass 52 network.') - self._hourglass = hourglass_network.hourglass_52( - self._config_dict['num_filters'], initial_downsample=False) - - elif variant == 'hourglass100': - logging.info('Using hourglass 100 network.') - self._hourglass = hourglass_network.hourglass_100( - self._config_dict['num_filters'], initial_downsample=False) - - else: - raise ValueError('Unknown ConvNet variant - {}'.format(variant)) - - def _call_convnet_variant(self, x): - - variant = self._config_dict['convnet_variant'] - if variant == 'default': - for conv, bn in zip(self._convs, self._conv_norms): - x = conv(x) - x = bn(x) - x = self._activation(x) - return x - elif variant == 'hourglass20': - return self._hourglass(x)[-1] - elif variant == 'hourglass52': - return self._hourglass(x)[-1] - elif variant == 'hourglass100': - return self._hourglass(x)[-1] - else: - raise ValueError('Unknown ConvNet variant - {}'.format(variant)) - - def get_config(self): - return self._config_dict - - @classmethod - def from_config(cls, config): - return cls(**config) diff --git a/official/vision/beta/projects/deepmac_maskrcnn/modeling/heads/instance_heads_test.py b/official/vision/beta/projects/deepmac_maskrcnn/modeling/heads/instance_heads_test.py deleted file mode 100644 index 95947238f966cecc61c19ed997ddd282beca4914..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/modeling/heads/instance_heads_test.py +++ /dev/null @@ -1,99 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Tests for instance_heads.py.""" - -# Import libraries -from absl.testing import parameterized -import numpy as np -import tensorflow as tf - -from official.vision.beta.projects.deepmac_maskrcnn.modeling.heads import instance_heads as deep_instance_heads - - -class MaskHeadTest(parameterized.TestCase, tf.test.TestCase): - - @parameterized.parameters( - (1, 1, False), - (1, 2, False), - (2, 1, False), - (2, 2, False), - ) - def test_forward(self, upsample_factor, num_convs, use_sync_bn): - mask_head = deep_instance_heads.DeepMaskHead( - num_classes=3, - upsample_factor=upsample_factor, - num_convs=num_convs, - num_filters=16, - use_separable_conv=False, - activation='relu', - use_sync_bn=use_sync_bn, - norm_momentum=0.99, - norm_epsilon=0.001, - kernel_regularizer=None, - bias_regularizer=None, - ) - roi_features = np.random.rand(2, 10, 14, 14, 16) - roi_classes = np.zeros((2, 10)) - masks = mask_head([roi_features, roi_classes]) - self.assertAllEqual( - masks.numpy().shape, - [2, 10, 14 * upsample_factor, 14 * upsample_factor]) - - def test_serialize_deserialize(self): - mask_head = deep_instance_heads.DeepMaskHead( - num_classes=3, - upsample_factor=2, - num_convs=1, - num_filters=256, - use_separable_conv=False, - activation='relu', - use_sync_bn=False, - norm_momentum=0.99, - norm_epsilon=0.001, - kernel_regularizer=None, - bias_regularizer=None, - ) - config = mask_head.get_config() - new_mask_head = deep_instance_heads.DeepMaskHead.from_config(config) - self.assertAllEqual( - mask_head.get_config(), new_mask_head.get_config()) 
- - def test_forward_class_agnostic(self): - mask_head = deep_instance_heads.DeepMaskHead( - num_classes=3, - class_agnostic=True - ) - roi_features = np.random.rand(2, 10, 14, 14, 16) - roi_classes = np.zeros((2, 10)) - masks = mask_head([roi_features, roi_classes]) - self.assertAllEqual(masks.numpy().shape, [2, 10, 28, 28]) - - def test_instance_head_hourglass(self): - mask_head = deep_instance_heads.DeepMaskHead( - num_classes=3, - class_agnostic=True, - convnet_variant='hourglass20', - num_filters=32, - upsample_factor=2 - ) - roi_features = np.random.rand(2, 10, 16, 16, 16) - roi_classes = np.zeros((2, 10)) - masks = mask_head([roi_features, roi_classes]) - self.assertAllEqual(masks.numpy().shape, [2, 10, 32, 32]) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/projects/deepmac_maskrcnn/modeling/maskrcnn_model.py b/official/vision/beta/projects/deepmac_maskrcnn/modeling/maskrcnn_model.py deleted file mode 100644 index 97263c4bdb47a5eeed29b6d701c149649dc27a7b..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/modeling/maskrcnn_model.py +++ /dev/null @@ -1,221 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Mask R-CNN model.""" - -from typing import List, Mapping, Optional, Union - -# Import libraries - -from absl import logging -import tensorflow as tf - -from official.vision.beta.modeling import maskrcnn_model - - -def resize_as(source, size): - - source = tf.transpose(source, (0, 2, 3, 1)) - source = tf.image.resize(source, (size, size)) - return tf.transpose(source, (0, 3, 1, 2)) - - -class DeepMaskRCNNModel(maskrcnn_model.MaskRCNNModel): - """The Mask R-CNN model.""" - - def __init__(self, - backbone: tf.keras.Model, - decoder: tf.keras.Model, - rpn_head: tf.keras.layers.Layer, - detection_head: Union[tf.keras.layers.Layer, - List[tf.keras.layers.Layer]], - roi_generator: tf.keras.layers.Layer, - roi_sampler: Union[tf.keras.layers.Layer, - List[tf.keras.layers.Layer]], - roi_aligner: tf.keras.layers.Layer, - detection_generator: tf.keras.layers.Layer, - mask_head: Optional[tf.keras.layers.Layer] = None, - mask_sampler: Optional[tf.keras.layers.Layer] = None, - mask_roi_aligner: Optional[tf.keras.layers.Layer] = None, - class_agnostic_bbox_pred: bool = False, - cascade_class_ensemble: bool = False, - min_level: Optional[int] = None, - max_level: Optional[int] = None, - num_scales: Optional[int] = None, - aspect_ratios: Optional[List[float]] = None, - anchor_size: Optional[float] = None, - use_gt_boxes_for_masks=False, - **kwargs): - """Initializes the Mask R-CNN model. - - Args: - backbone: `tf.keras.Model`, the backbone network. - decoder: `tf.keras.Model`, the decoder network. - rpn_head: the RPN head. - detection_head: the detection head or a list of heads. - roi_generator: the ROI generator. - roi_sampler: a single ROI sampler or a list of ROI samplers for cascade - detection heads. - roi_aligner: the ROI aligner. - detection_generator: the detection generator. - mask_head: the mask head. - mask_sampler: the mask sampler. - mask_roi_aligner: the ROI aligner for mask prediction.
- class_agnostic_bbox_pred: if True, perform class agnostic bounding box - prediction. Needs to be `True` for Cascade RCNN models. - cascade_class_ensemble: if True, ensemble classification scores over all - detection heads. - min_level: Minimum level in output feature maps. - max_level: Maximum level in output feature maps. - num_scales: A number representing intermediate scales added on each level. - For instance, num_scales=2 adds one additional intermediate anchor - scales [2^0, 2^0.5] on each level. - aspect_ratios: A list representing the aspect ratio anchors added on each - level. The number indicates the ratio of width to height. For instance, - aspect_ratios=[1.0, 2.0, 0.5] adds three anchors on each scale level. - anchor_size: A number representing the scale of size of the base anchor to - the feature stride 2^level. - use_gt_boxes_for_masks: bool, if set, crop using groundtruth boxes instead - of proposals for training the mask head. - **kwargs: keyword arguments to be passed. - """ - super(DeepMaskRCNNModel, self).__init__( - backbone=backbone, - decoder=decoder, - rpn_head=rpn_head, - detection_head=detection_head, - roi_generator=roi_generator, - roi_sampler=roi_sampler, - roi_aligner=roi_aligner, - detection_generator=detection_generator, - mask_head=mask_head, - mask_sampler=mask_sampler, - mask_roi_aligner=mask_roi_aligner, - class_agnostic_bbox_pred=class_agnostic_bbox_pred, - cascade_class_ensemble=cascade_class_ensemble, - min_level=min_level, - max_level=max_level, - num_scales=num_scales, - aspect_ratios=aspect_ratios, - anchor_size=anchor_size, - **kwargs) - - self._config_dict['use_gt_boxes_for_masks'] = use_gt_boxes_for_masks - - def call(self, - images: tf.Tensor, - image_shape: tf.Tensor, - anchor_boxes: Optional[Mapping[str, tf.Tensor]] = None, - gt_boxes: Optional[tf.Tensor] = None, - gt_classes: Optional[tf.Tensor] = None, - gt_masks: Optional[tf.Tensor] = None, - training: Optional[bool] = None) -> Mapping[str, tf.Tensor]: -
model_outputs, intermediate_outputs = self._call_box_outputs( - images=images, image_shape=image_shape, anchor_boxes=anchor_boxes, - gt_boxes=gt_boxes, gt_classes=gt_classes, training=training) - if not self._include_mask: - return model_outputs - - model_mask_outputs = self._call_mask_outputs( - model_box_outputs=model_outputs, - features=model_outputs['decoder_features'], - current_rois=intermediate_outputs['current_rois'], - matched_gt_indices=intermediate_outputs['matched_gt_indices'], - matched_gt_boxes=intermediate_outputs['matched_gt_boxes'], - matched_gt_classes=intermediate_outputs['matched_gt_classes'], - gt_masks=gt_masks, - gt_classes=gt_classes, - gt_boxes=gt_boxes, - training=training) - model_outputs.update(model_mask_outputs) - return model_outputs - - def call_images_and_boxes(self, images, boxes): - """Predict masks given an image and bounding boxes.""" - - _, decoder_features = self._get_backbone_and_decoder_features(images) - boxes_shape = tf.shape(boxes) - batch_size, num_boxes = boxes_shape[0], boxes_shape[1] - classes = tf.zeros((batch_size, num_boxes), dtype=tf.int32) - - _, mask_probs = self._features_to_mask_outputs( - decoder_features, boxes, classes) - return { - 'detection_masks': mask_probs - } - - def _call_mask_outputs( - self, - model_box_outputs: Mapping[str, tf.Tensor], - features: tf.Tensor, - current_rois: tf.Tensor, - matched_gt_indices: tf.Tensor, - matched_gt_boxes: tf.Tensor, - matched_gt_classes: tf.Tensor, - gt_masks: tf.Tensor, - gt_classes: tf.Tensor, - gt_boxes: tf.Tensor, - training: Optional[bool] = None) -> Mapping[str, tf.Tensor]: - - model_outputs = dict(model_box_outputs) - if training: - if self._config_dict['use_gt_boxes_for_masks']: - mask_size = ( - self.mask_roi_aligner._config_dict['crop_size'] * # pylint:disable=protected-access - self.mask_head._config_dict['upsample_factor'] # pylint:disable=protected-access - ) - gt_masks = resize_as(source=gt_masks, size=mask_size) - - logging.info('Using GT class and 
mask targets.') - model_outputs.update({ - 'mask_class_targets': gt_classes, - 'mask_targets': gt_masks, - }) - else: - rois, roi_classes, roi_masks = self.mask_sampler( - current_rois, matched_gt_boxes, matched_gt_classes, - matched_gt_indices, gt_masks) - roi_masks = tf.stop_gradient(roi_masks) - model_outputs.update({ - 'mask_class_targets': roi_classes, - 'mask_targets': roi_masks, - }) - - else: - rois = model_outputs['detection_boxes'] - roi_classes = model_outputs['detection_classes'] - - # Mask RoI align. - if training and self._config_dict['use_gt_boxes_for_masks']: - logging.info('Using GT mask roi features.') - roi_aligner_boxes = gt_boxes - mask_head_classes = gt_classes - - else: - roi_aligner_boxes = rois - mask_head_classes = roi_classes - - mask_logits, mask_probs = self._features_to_mask_outputs( - features, roi_aligner_boxes, mask_head_classes) - - if training: - model_outputs.update({ - 'mask_outputs': mask_logits, - }) - else: - model_outputs.update({ - 'detection_masks': mask_probs, - }) - return model_outputs diff --git a/official/vision/beta/projects/deepmac_maskrcnn/modeling/maskrcnn_model_test.py b/official/vision/beta/projects/deepmac_maskrcnn/modeling/maskrcnn_model_test.py deleted file mode 100644 index 9003d793ef35c0a397e0777ab9f4cfc58bef8fb1..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/modeling/maskrcnn_model_test.py +++ /dev/null @@ -1,154 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Tests for maskrcnn_model.py.""" - -# Import libraries - -from absl.testing import parameterized -import numpy as np -import tensorflow as tf - -from official.vision.beta.modeling.backbones import resnet -from official.vision.beta.modeling.decoders import fpn -from official.vision.beta.modeling.heads import dense_prediction_heads -from official.vision.beta.modeling.heads import instance_heads -from official.vision.beta.modeling.layers import detection_generator -from official.vision.beta.modeling.layers import mask_sampler -from official.vision.beta.modeling.layers import roi_aligner -from official.vision.beta.modeling.layers import roi_generator -from official.vision.beta.modeling.layers import roi_sampler -from official.vision.beta.ops import anchor -from official.vision.beta.projects.deepmac_maskrcnn.modeling import maskrcnn_model -from official.vision.beta.projects.deepmac_maskrcnn.modeling.heads import instance_heads as deep_instance_heads - - -def construct_model_and_anchors(image_size, use_gt_boxes_for_masks): - num_classes = 3 - min_level = 3 - max_level = 4 - num_scales = 3 - aspect_ratios = [1.0] - - anchor_boxes = anchor.Anchor( - min_level=min_level, - max_level=max_level, - num_scales=num_scales, - aspect_ratios=aspect_ratios, - anchor_size=3, - image_size=image_size).multilevel_boxes - num_anchors_per_location = len(aspect_ratios) * num_scales - - input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, 3]) - backbone = resnet.ResNet(model_id=50, input_specs=input_specs) - decoder = fpn.FPN( - min_level=min_level, - max_level=max_level, - input_specs=backbone.output_specs) - rpn_head = dense_prediction_heads.RPNHead( - min_level=min_level, - max_level=max_level, - num_anchors_per_location=num_anchors_per_location) - detection_head = instance_heads.DetectionHead( - num_classes=num_classes) - roi_generator_obj = 
roi_generator.MultilevelROIGenerator() - roi_sampler_obj = roi_sampler.ROISampler() - roi_aligner_obj = roi_aligner.MultilevelROIAligner() - detection_generator_obj = detection_generator.DetectionGenerator() - mask_head = deep_instance_heads.DeepMaskHead( - num_classes=num_classes, upsample_factor=2) - mask_sampler_obj = mask_sampler.MaskSampler( - mask_target_size=28, num_sampled_masks=1) - mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner(crop_size=14) - - model = maskrcnn_model.DeepMaskRCNNModel( - backbone, - decoder, - rpn_head, - detection_head, - roi_generator_obj, - roi_sampler_obj, - roi_aligner_obj, - detection_generator_obj, - mask_head, - mask_sampler_obj, - mask_roi_aligner_obj, - use_gt_boxes_for_masks=use_gt_boxes_for_masks) - - return model, anchor_boxes - - -class MaskRCNNModelTest(parameterized.TestCase, tf.test.TestCase): - - @parameterized.parameters( - (False, False,), - (False, True,), - (True, False,), - (True, True,), - ) - def test_forward(self, use_gt_boxes_for_masks, training): - image_size = (256, 256) - images = np.random.rand(2, image_size[0], image_size[1], 3) - image_shape = np.array([[224, 100], [100, 224]]) - model, anchor_boxes = construct_model_and_anchors( - image_size, use_gt_boxes_for_masks) - - gt_boxes = tf.zeros((2, 16, 4), dtype=tf.float32) - gt_masks = tf.zeros((2, 16, 32, 32)) - gt_classes = tf.zeros((2, 16), dtype=tf.int32) - results = model(images.astype(np.uint8), - image_shape, - anchor_boxes, - gt_boxes, - gt_classes, - gt_masks, - training=training) - - self.assertIn('rpn_boxes', results) - self.assertIn('rpn_scores', results) - if training: - self.assertIn('class_targets', results) - self.assertIn('box_targets', results) - self.assertIn('class_outputs', results) - self.assertIn('box_outputs', results) - self.assertIn('mask_outputs', results) - self.assertEqual(results['mask_targets'].shape, - results['mask_outputs'].shape) - else: - self.assertIn('detection_boxes', results) - 
self.assertIn('detection_scores', results) - self.assertIn('detection_classes', results) - self.assertIn('num_detections', results) - self.assertIn('detection_masks', results) - - @parameterized.parameters( - [(1, 5), (1, 10), (1, 15), (2, 5), (2, 10), (2, 15)] - ) - def test_image_and_boxes(self, batch_size, num_boxes): - image_size = (640, 640) - images = np.random.rand(1, image_size[0], image_size[1], 3).astype( - np.float32) - model, _ = construct_model_and_anchors( - image_size, use_gt_boxes_for_masks=True) - - boxes = np.zeros((1, num_boxes, 4), dtype=np.float32) - boxes[:, :, [2, 3]] = 1.0 - boxes = tf.constant(boxes) - results = model.call_images_and_boxes(images, boxes) - self.assertIn('detection_masks', results) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/projects/deepmac_maskrcnn/serving/__init__.py b/official/vision/beta/projects/deepmac_maskrcnn/serving/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/serving/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- diff --git a/official/vision/beta/projects/deepmac_maskrcnn/serving/detection.py b/official/vision/beta/projects/deepmac_maskrcnn/serving/detection.py deleted file mode 100644 index 74fc1cd047a99bd67eed6584ec158f3dd0ea32e3..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/serving/detection.py +++ /dev/null @@ -1,139 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Detection input and model functions for serving/inference.""" - -from typing import Dict, Mapping, Text -import tensorflow as tf - -from official.vision.beta.ops import box_ops -from official.vision.beta.projects.deepmac_maskrcnn.configs import deep_mask_head_rcnn as cfg -from official.vision.beta.projects.deepmac_maskrcnn.modeling import maskrcnn_model -from official.vision.beta.projects.deepmac_maskrcnn.tasks import deep_mask_head_rcnn -from official.vision.beta.serving import detection - - -def reverse_input_box_transformation(boxes, image_info): - """Reverse the Mask R-CNN model's input boxes tranformation. - - Args: - boxes: A [batch_size, num_boxes, 4] float tensor of boxes in normalized - coordinates. - image_info: a 2D `Tensor` that encodes the information of the image and the - applied preprocessing. 
It is in the format of - [[original_height, original_width], [desired_height, desired_width], - [y_scale, x_scale], [y_offset, x_offset]], where [desired_height, - desired_width] is the actual scaled image size, and [y_scale, x_scale] is - the scaling factor, which is the ratio of - scaled dimension / original dimension. - - Returns: - boxes: Same shape as input `boxes` but in the absolute coordinate space of - the preprocessed image. - """ - # Reversing sequence from Detection_module.serve when - # output_normalized_coordinates=true - scale = image_info[:, 2:3, :] - scale = tf.tile(scale, [1, 1, 2]) - boxes = boxes * scale - height_width = image_info[:, 0:1, :] - return box_ops.denormalize_boxes(boxes, height_width) - - -class DetectionModule(detection.DetectionModule): - """Detection Module.""" - - def _build_model(self): - - if self._batch_size is None: - ValueError("batch_size can't be None for detection models") - if self.params.task.model.detection_generator.nms_version != 'batched': - ValueError('Only batched_nms is supported.') - input_specs = tf.keras.layers.InputSpec(shape=[self._batch_size] + - self._input_image_size + [3]) - - if isinstance(self.params.task.model, cfg.DeepMaskHeadRCNN): - model = deep_mask_head_rcnn.build_maskrcnn( - input_specs=input_specs, model_config=self.params.task.model) - else: - raise ValueError('Detection module not implemented for {} model.'.format( - type(self.params.task.model))) - - return model - - @tf.function - def inference_for_tflite_image_and_boxes( - self, images: tf.Tensor, boxes: tf.Tensor) -> Mapping[str, tf.Tensor]: - """A tf-function for serve_image_and_boxes. - - Args: - images: A [batch_size, height, width, channels] float tensor. - boxes: A [batch_size, num_boxes, 4] float tensor containing boxes - normalized to the input image. - - Returns: - result: A dict containing: - 'detection_masks': A [batch_size, num_boxes, mask_height, mask_width] - float tensor containing per-pixel mask probabilities. 
- """ - - if not isinstance(self.model, maskrcnn_model.DeepMaskRCNNModel): - raise ValueError( - ('Can only use image and boxes input for DeepMaskRCNNModel, ' - 'Found {}'.format(type(self.model)))) - - return self.serve_image_and_boxes(images, boxes) - - def serve_image_and_boxes(self, images: tf.Tensor, boxes: tf.Tensor): - """Function used to export a model that consumes and image and boxes. - - The model predicts the class-agnostic masks at the given box locations. - - Args: - images: A [batch_size, height, width, channels] float tensor. - boxes: A [batch_size, num_boxes, 4] float tensor containing boxes - normalized to the input image. - - Returns: - result: A dict containing: - 'detection_masks': A [batch_size, num_boxes, mask_height, mask_width] - float tensor containing per-pixel mask probabilities. - """ - images, _, image_info = self.preprocess(images) - boxes = reverse_input_box_transformation(boxes, image_info) - result = self.model.call_images_and_boxes(images, boxes) - return result - - def get_inference_signatures(self, function_keys: Dict[Text, Text]): - signatures = {} - - if 'image_and_boxes_tensor' in function_keys: - def_name = function_keys['image_and_boxes_tensor'] - image_signature = tf.TensorSpec( - shape=[self._batch_size] + [None] * len(self._input_image_size) + - [self._num_channels], - dtype=tf.uint8) - boxes_signature = tf.TensorSpec(shape=[self._batch_size, None, 4], - dtype=tf.float32) - tf_function = self.inference_for_tflite_image_and_boxes - signatures[def_name] = tf_function.get_concrete_function( - image_signature, boxes_signature) - - function_keys.pop('image_and_boxes_tensor', None) - parent_signatures = super(DetectionModule, self).get_inference_signatures( - function_keys) - signatures.update(parent_signatures) - - return signatures diff --git a/official/vision/beta/projects/deepmac_maskrcnn/serving/detection_test.py b/official/vision/beta/projects/deepmac_maskrcnn/serving/detection_test.py deleted file mode 100644 index 
a15b44f27903641fe6a162f609911bec2824dbfb..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/serving/detection_test.py +++ /dev/null @@ -1,165 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Test for image detection export lib.""" - -import io -import os - -from absl.testing import parameterized -import numpy as np -from PIL import Image -import tensorflow as tf - -from official.core import exp_factory -from official.vision.beta.projects.deepmac_maskrcnn.serving import detection - - -class DetectionExportTest(tf.test.TestCase, parameterized.TestCase): - - def _get_detection_module(self, experiment_name, image_size=(640, 640)): - params = exp_factory.get_exp_config(experiment_name) - params.task.model.backbone.resnet.model_id = 18 - params.task.model.detection_generator.use_batched_nms = True - detection_module = detection.DetectionModule( - params, batch_size=1, input_image_size=list(image_size)) - return detection_module - - def _export_from_module(self, module, input_type, save_directory): - signatures = module.get_inference_signatures( - {input_type: 'serving_default'}) - tf.saved_model.save(module, save_directory, signatures=signatures) - - def _get_dummy_input(self, input_type, batch_size, image_size): - """Get dummy input for the given input type.""" - h, w = image_size - - if input_type == 'image_tensor': - return 
tf.zeros((batch_size, h, w, 3), dtype=np.uint8) - elif input_type == 'image_bytes': - image = Image.fromarray(np.zeros((h, w, 3), dtype=np.uint8)) - byte_io = io.BytesIO() - image.save(byte_io, 'PNG') - return [byte_io.getvalue() for b in range(batch_size)] - elif input_type == 'tf_example': - image_tensor = tf.zeros((h, w, 3), dtype=tf.uint8) - encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).numpy() - example = tf.train.Example( - features=tf.train.Features( - feature={ - 'image/encoded': - tf.train.Feature( - bytes_list=tf.train.BytesList(value=[encoded_jpeg])), - })).SerializeToString() - return [example for b in range(batch_size)] - - @parameterized.parameters( - ('image_tensor', 'deep_mask_head_rcnn_resnetfpn_coco', [640, 640]), - ('image_bytes', 'deep_mask_head_rcnn_resnetfpn_coco', [640, 384]), - ('tf_example', 'deep_mask_head_rcnn_resnetfpn_coco', [640, 640]), - ) - def test_export(self, input_type, experiment_name, image_size): - self.skipTest('a') - tmp_dir = self.get_temp_dir() - module = self._get_detection_module(experiment_name, image_size) - - self._export_from_module(module, input_type, tmp_dir) - - self.assertTrue(os.path.exists(os.path.join(tmp_dir, 'saved_model.pb'))) - self.assertTrue( - os.path.exists(os.path.join(tmp_dir, 'variables', 'variables.index'))) - self.assertTrue( - os.path.exists( - os.path.join(tmp_dir, 'variables', - 'variables.data-00000-of-00001'))) - - imported = tf.saved_model.load(tmp_dir) - detection_fn = imported.signatures['serving_default'] - - images = self._get_dummy_input( - input_type, batch_size=1, image_size=image_size) - - processed_images, anchor_boxes, image_info = module._build_inputs( - tf.zeros((224, 224, 3), dtype=tf.uint8)) - image_shape = image_info[1, :] - image_shape = tf.expand_dims(image_shape, 0) - processed_images = tf.expand_dims(processed_images, 0) - for l, l_boxes in anchor_boxes.items(): - anchor_boxes[l] = tf.expand_dims(l_boxes, 0) - - expected_outputs = module.model( - 
images=processed_images, - image_shape=image_shape, - anchor_boxes=anchor_boxes, - training=False) - outputs = detection_fn(tf.constant(images)) - - self.assertAllClose(outputs['num_detections'].numpy(), - expected_outputs['num_detections'].numpy()) - - @parameterized.parameters( - ('deep_mask_head_rcnn_resnetfpn_coco', [640, 640], 1), - ('deep_mask_head_rcnn_resnetfpn_coco', [640, 640], 5), - ('deep_mask_head_rcnn_spinenet_coco', [640, 384], 3), - ('deep_mask_head_rcnn_spinenet_coco', [640, 384], 9), - ) - def test_export_image_and_boxes(self, experiment_name, image_size, num_boxes): - tmp_dir = self.get_temp_dir() - module = self._get_detection_module(experiment_name) - - self._export_from_module(module, 'image_and_boxes_tensor', tmp_dir) - - self.assertTrue(os.path.exists(os.path.join(tmp_dir, 'saved_model.pb'))) - self.assertTrue( - os.path.exists(os.path.join(tmp_dir, 'variables', 'variables.index'))) - self.assertTrue( - os.path.exists( - os.path.join(tmp_dir, 'variables', - 'variables.data-00000-of-00001'))) - - imported = tf.saved_model.load(tmp_dir) - detection_fn = imported.signatures['serving_default'] - - images = self._get_dummy_input( - 'image_tensor', batch_size=1, image_size=image_size) - - processed_images, anchor_boxes, image_info = module._build_inputs( - tf.zeros(image_size + [3], dtype=tf.uint8)) - - image_shape = image_info[1, :] - image_shape = image_shape[tf.newaxis] - processed_images = processed_images[tf.newaxis] - image_info = image_info[tf.newaxis] - - for l, l_boxes in anchor_boxes.items(): - anchor_boxes[l] = tf.expand_dims(l_boxes, 0) - - boxes = np.zeros((1, num_boxes, 4), dtype=np.float32) - boxes[:, :, [2, 3]] = 1.0 - boxes = tf.constant(boxes) - - denormalized_boxes = detection.reverse_input_box_transformation( - boxes, image_info) - expected_outputs = module.model.call_images_and_boxes( - images=processed_images, boxes=denormalized_boxes) - outputs = detection_fn(images=tf.constant(images), boxes=boxes) - - 
self.assertAllClose(outputs['detection_masks'].numpy(), - expected_outputs['detection_masks'].numpy(), - rtol=1e-3, atol=1e-3) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/projects/deepmac_maskrcnn/serving/export_saved_model.py b/official/vision/beta/projects/deepmac_maskrcnn/serving/export_saved_model.py deleted file mode 100644 index da497eff7afaff20e8d9f7e344feff49e4bef4c6..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/serving/export_saved_model.py +++ /dev/null @@ -1,106 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -r"""Deepmac model export binary for serving/inference. 
- -To export a trained checkpoint in saved_model format (shell script): - -CHECKPOINT_PATH = XX -EXPORT_DIR_PATH = XX -CONFIG_FILE_PATH = XX -export_saved_model --export_dir=${EXPORT_DIR_PATH}/ \ - --checkpoint_path=${CHECKPOINT_PATH} \ - --config_file=${CONFIG_FILE_PATH} \ - --batch_size=2 \ - --input_image_size=224,224 -To serve (python): -export_dir_path = XX -input_type = XX -input_images = XX -imported = tf.saved_model.load(export_dir_path) -model_fn = imported.signatures['serving_default'] -output = model_fn(input_images) -""" - -from absl import app -from absl import flags - -from official.core import exp_factory -from official.modeling import hyperparams -from official.vision.beta.projects.deepmac_maskrcnn.serving import detection -from official.vision.beta.projects.deepmac_maskrcnn.tasks import deep_mask_head_rcnn # pylint: disable=unused-import -from official.vision.beta.serving import export_saved_model_lib - -FLAGS = flags.FLAGS - -flags.DEFINE_string('experiment', 'deep_mask_head_rcnn_resnetfpn_coco', - 'experiment type, e.g. retinanet_resnetfpn_coco') -flags.DEFINE_string('export_dir', None, 'The export directory.') -flags.DEFINE_string('checkpoint_path', None, 'Checkpoint path.') -flags.DEFINE_multi_string( - 'config_file', - default=None, - help='YAML/JSON files which specifies overrides. The override order ' - 'follows the order of args. Note that each file ' - 'can be used as an override template to override the default parameters ' - 'specified in Python. 
If the same parameter is specified in both ' - '`--config_file` and `--params_override`, `config_file` will be used ' - 'first, followed by params_override.') -flags.DEFINE_string( - 'params_override', '', - 'The JSON/YAML file or string which specifies the parameter to be overriden' - ' on top of `config_file` template.') -flags.DEFINE_integer('batch_size', None, 'The batch size.') -flags.DEFINE_string('input_type', 'image_tensor', - ('One of `image_tensor`, `image_bytes`, `tf_example` ' - 'or `image_and_boxes_tensor`.')) -flags.DEFINE_string( - 'input_image_size', '224,224', - 'The comma-separated string of two integers representing the height,width ' - 'of the input to the model.') - - -def main(_): - - params = exp_factory.get_exp_config(FLAGS.experiment) - for config_file in FLAGS.config_file or []: - params = hyperparams.override_params_dict( - params, config_file, is_strict=True) - if FLAGS.params_override: - params = hyperparams.override_params_dict( - params, FLAGS.params_override, is_strict=True) - - params.validate() - params.lock() - - export_module = detection.DetectionModule( - params=params, - batch_size=FLAGS.batch_size, - input_image_size=[int(x) for x in FLAGS.input_image_size.split(',')], - num_channels=3) - - export_saved_model_lib.export_inference_graph( - input_type=FLAGS.input_type, - batch_size=FLAGS.batch_size, - input_image_size=[int(x) for x in FLAGS.input_image_size.split(',')], - params=params, - checkpoint_path=FLAGS.checkpoint_path, - export_dir=FLAGS.export_dir, - export_module=export_module, - export_checkpoint_subdir='checkpoint', - export_saved_model_subdir='saved_model') - - -if __name__ == '__main__': - app.run(main) diff --git a/official/vision/beta/projects/deepmac_maskrcnn/tasks/__init__.py b/official/vision/beta/projects/deepmac_maskrcnn/tasks/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- 
a/official/vision/beta/projects/deepmac_maskrcnn/tasks/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/projects/deepmac_maskrcnn/tasks/deep_mask_head_rcnn.py b/official/vision/beta/projects/deepmac_maskrcnn/tasks/deep_mask_head_rcnn.py deleted file mode 100644 index 60eb0b5651b9c10544e72feb55551c2efe1657ee..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/tasks/deep_mask_head_rcnn.py +++ /dev/null @@ -1,190 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Mask R-CNN variant with support for deep mask heads.""" - -import tensorflow as tf - -from official.core import task_factory -from official.vision.beta.modeling import backbones -from official.vision.beta.modeling.decoders import factory as decoder_factory -from official.vision.beta.modeling.heads import dense_prediction_heads -from official.vision.beta.modeling.heads import instance_heads -from official.vision.beta.modeling.layers import detection_generator -from official.vision.beta.modeling.layers import mask_sampler -from official.vision.beta.modeling.layers import roi_aligner -from official.vision.beta.modeling.layers import roi_generator -from official.vision.beta.modeling.layers import roi_sampler -from official.vision.beta.projects.deepmac_maskrcnn.configs import deep_mask_head_rcnn as deep_mask_head_rcnn_config -from official.vision.beta.projects.deepmac_maskrcnn.modeling import maskrcnn_model as deep_maskrcnn_model -from official.vision.beta.projects.deepmac_maskrcnn.modeling.heads import instance_heads as deep_instance_heads -from official.vision.beta.tasks import maskrcnn - - -# Taken from modeling/factory.py -def build_maskrcnn(input_specs: tf.keras.layers.InputSpec, - model_config: deep_mask_head_rcnn_config.DeepMaskHeadRCNN, - l2_regularizer: tf.keras.regularizers.Regularizer = None): # pytype: disable=annotation-type-mismatch # typed-keras - """Builds Mask R-CNN model.""" - norm_activation_config = model_config.norm_activation - backbone = backbones.factory.build_backbone( - input_specs=input_specs, - backbone_config=model_config.backbone, - norm_activation_config=norm_activation_config, - l2_regularizer=l2_regularizer) - - decoder = decoder_factory.build_decoder( - input_specs=backbone.output_specs, - model_config=model_config, - l2_regularizer=l2_regularizer) - - rpn_head_config = model_config.rpn_head - roi_generator_config = model_config.roi_generator - roi_sampler_config = model_config.roi_sampler - roi_aligner_config = 
model_config.roi_aligner - detection_head_config = model_config.detection_head - generator_config = model_config.detection_generator - num_anchors_per_location = ( - len(model_config.anchor.aspect_ratios) * model_config.anchor.num_scales) - - rpn_head = dense_prediction_heads.RPNHead( - min_level=model_config.min_level, - max_level=model_config.max_level, - num_anchors_per_location=num_anchors_per_location, - num_convs=rpn_head_config.num_convs, - num_filters=rpn_head_config.num_filters, - use_separable_conv=rpn_head_config.use_separable_conv, - activation=norm_activation_config.activation, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon, - kernel_regularizer=l2_regularizer) - - detection_head = instance_heads.DetectionHead( - num_classes=model_config.num_classes, - num_convs=detection_head_config.num_convs, - num_filters=detection_head_config.num_filters, - use_separable_conv=detection_head_config.use_separable_conv, - num_fcs=detection_head_config.num_fcs, - fc_dims=detection_head_config.fc_dims, - activation=norm_activation_config.activation, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon, - kernel_regularizer=l2_regularizer) - - roi_generator_obj = roi_generator.MultilevelROIGenerator( - pre_nms_top_k=roi_generator_config.pre_nms_top_k, - pre_nms_score_threshold=roi_generator_config.pre_nms_score_threshold, - pre_nms_min_size_threshold=( - roi_generator_config.pre_nms_min_size_threshold), - nms_iou_threshold=roi_generator_config.nms_iou_threshold, - num_proposals=roi_generator_config.num_proposals, - test_pre_nms_top_k=roi_generator_config.test_pre_nms_top_k, - test_pre_nms_score_threshold=( - roi_generator_config.test_pre_nms_score_threshold), - test_pre_nms_min_size_threshold=( - roi_generator_config.test_pre_nms_min_size_threshold), - 
test_nms_iou_threshold=roi_generator_config.test_nms_iou_threshold, - test_num_proposals=roi_generator_config.test_num_proposals, - use_batched_nms=roi_generator_config.use_batched_nms) - - roi_sampler_obj = roi_sampler.ROISampler( - mix_gt_boxes=roi_sampler_config.mix_gt_boxes, - num_sampled_rois=roi_sampler_config.num_sampled_rois, - foreground_fraction=roi_sampler_config.foreground_fraction, - foreground_iou_threshold=roi_sampler_config.foreground_iou_threshold, - background_iou_high_threshold=( - roi_sampler_config.background_iou_high_threshold), - background_iou_low_threshold=( - roi_sampler_config.background_iou_low_threshold)) - - roi_aligner_obj = roi_aligner.MultilevelROIAligner( - crop_size=roi_aligner_config.crop_size, - sample_offset=roi_aligner_config.sample_offset) - - detection_generator_obj = detection_generator.DetectionGenerator( - apply_nms=True, - pre_nms_top_k=generator_config.pre_nms_top_k, - pre_nms_score_threshold=generator_config.pre_nms_score_threshold, - nms_iou_threshold=generator_config.nms_iou_threshold, - max_num_detections=generator_config.max_num_detections, - nms_version=generator_config.nms_version) - - if model_config.include_mask: - mask_head = deep_instance_heads.DeepMaskHead( - num_classes=model_config.num_classes, - upsample_factor=model_config.mask_head.upsample_factor, - num_convs=model_config.mask_head.num_convs, - num_filters=model_config.mask_head.num_filters, - use_separable_conv=model_config.mask_head.use_separable_conv, - activation=model_config.norm_activation.activation, - norm_momentum=model_config.norm_activation.norm_momentum, - norm_epsilon=model_config.norm_activation.norm_epsilon, - kernel_regularizer=l2_regularizer, - class_agnostic=model_config.mask_head.class_agnostic, - convnet_variant=model_config.mask_head.convnet_variant) - - mask_sampler_obj = mask_sampler.MaskSampler( - mask_target_size=( - model_config.mask_roi_aligner.crop_size * - model_config.mask_head.upsample_factor), - 
num_sampled_masks=model_config.mask_sampler.num_sampled_masks) - - mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner( - crop_size=model_config.mask_roi_aligner.crop_size, - sample_offset=model_config.mask_roi_aligner.sample_offset) - else: - mask_head = None - mask_sampler_obj = None - mask_roi_aligner_obj = None - - model = deep_maskrcnn_model.DeepMaskRCNNModel( - backbone=backbone, - decoder=decoder, - rpn_head=rpn_head, - detection_head=detection_head, - roi_generator=roi_generator_obj, - roi_sampler=roi_sampler_obj, - roi_aligner=roi_aligner_obj, - detection_generator=detection_generator_obj, - mask_head=mask_head, - mask_sampler=mask_sampler_obj, - mask_roi_aligner=mask_roi_aligner_obj, - use_gt_boxes_for_masks=model_config.use_gt_boxes_for_masks) - return model - - -@task_factory.register_task_cls(deep_mask_head_rcnn_config.DeepMaskHeadRCNNTask) -class DeepMaskHeadRCNNTask(maskrcnn.MaskRCNNTask): - """Mask R-CNN with support for deep mask heads.""" - - def build_model(self): - """Build Mask R-CNN model.""" - - input_specs = tf.keras.layers.InputSpec( - shape=[None] + self.task_config.model.input_size) - - l2_weight_decay = self.task_config.losses.l2_weight_decay - # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. 
- # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) - # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) - l2_regularizer = (tf.keras.regularizers.l2( - l2_weight_decay / 2.0) if l2_weight_decay else None) - - model = build_maskrcnn( - input_specs=input_specs, - model_config=self.task_config.model, - l2_regularizer=l2_regularizer) - return model diff --git a/official/vision/beta/projects/deepmac_maskrcnn/train.py b/official/vision/beta/projects/deepmac_maskrcnn/train.py deleted file mode 100644 index 8e773615a3e99b20c64a1f61c74cc3ee866257a6..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/deepmac_maskrcnn/train.py +++ /dev/null @@ -1,72 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""TensorFlow Model Garden Vision training driver.""" - -from absl import app -from absl import flags -from absl import logging - -import gin - -from official.common import distribute_utils -from official.common import flags as tfm_flags -from official.core import task_factory -from official.core import train_lib -from official.core import train_utils -from official.modeling import performance -# pylint: disable=unused-import -from official.vision.beta.projects.deepmac_maskrcnn.common import registry_imports -# pylint: enable=unused-import - -FLAGS = flags.FLAGS - - -def main(_): - gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) - params = train_utils.parse_configuration(FLAGS) - model_dir = FLAGS.model_dir - if 'train' in FLAGS.mode: - # Pure eval modes do not output yaml files. Otherwise continuous eval job - # may race against the train job for writing the same file. - train_utils.serialize_config(params, model_dir) - - # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16' - # can have significant impact on model speeds by utilizing float16 in case of - # GPUs, and bfloat16 in the case of TPUs. 
loss_scale takes effect only when - # dtype is float16. - if params.runtime.mixed_precision_dtype: - performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) - distribution_strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=params.runtime.distribution_strategy, - all_reduce_alg=params.runtime.all_reduce_alg, - num_gpus=params.runtime.num_gpus, - tpu_address=params.runtime.tpu) - with distribution_strategy.scope(): - task = task_factory.get_task(params.task, logging_dir=model_dir) - logging.info('Training with task %s', task) - - train_lib.run_experiment( - distribution_strategy=distribution_strategy, - task=task, - mode=FLAGS.mode, - params=params, - model_dir=model_dir) - - train_utils.save_gin_config(FLAGS.mode, model_dir) - -if __name__ == '__main__': - tfm_flags.define_flags() - app.run(main) diff --git a/official/vision/beta/projects/example/README.md b/official/vision/beta/projects/example/README.md deleted file mode 100644 index b2e7b3dd6819dc16644367d54b005cd4d6878d07..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/example/README.md +++ /dev/null @@ -1,214 +0,0 @@ -# TF Vision Example Project - -This is a minimal example project to demonstrate how to use TF Model Garden's -building blocks to implement a new vision project from scratch. - -Below we use classification as an example. We will walk you through the process -of creating a new project that leverages existing components, such as tasks, data -loaders, models, etc. You will get a better understanding of these components by -going through the process. You can also refer to the docstrings of the corresponding -components for more information. - -## Create Model - -In -[example_model.py](example_model.py), -we show how to create a new model. The `ExampleModel` is a subclass of -`tf.keras.Model` that defines the necessary parameters.
Here, you need to have -`input_specs` to specify the input shape and dimensions, and build layers within the -constructor: - -```python -class ExampleModel(tf.keras.Model): - def __init__( - self, - num_classes: int, - input_specs: tf.keras.layers.InputSpec = tf.keras.layers.InputSpec( - shape=[None, None, None, 3]), - **kwargs): - # Build layers. -``` - -Given the `ExampleModel`, you can define a function that takes a model config as -input and returns an `ExampleModel` instance, similar to -[build_example_model](example_model.py#L80). -As a simple example, we define a single model. However, you can split the model -implementation into individual components, such as backbones, decoders, and heads, as -we do -[here](https://github.com/tensorflow/models/blob/master/official/vision/beta/modeling). -Then, in the `build_example_model` function, you can connect these components -to obtain your full model. - -## Create Dataloader - -A dataloader reads, decodes and parses the input data. We have created various -[dataloaders](https://github.com/tensorflow/models/blob/master/official/vision/beta/dataloaders) -to handle standard input formats for classification, detection and segmentation. -If you have non-standard or complex data, you may want to create your own -dataloader. It contains a `Decoder` and a `Parser`. - -- The - [Decoder](example_input.py#L33) - decodes a TF Example record and returns a dictionary of decoded tensors: - - ```python - class Decoder(decoder.Decoder): - """A tf.Example decoder for classification task.""" - def __init__(self): - """Initializes the decoder. - - The constructor defines the mapping between the field name and the value - from an input tf.Example. For example, we define two fields for image bytes - and labels. There is no limit on the number of fields to decode.
- """ - self._keys_to_features = { - 'image/encoded': - tf.io.FixedLenFeature((), tf.string, default_value=''), - 'image/class/label': - tf.io.FixedLenFeature((), tf.int64, default_value=-1) - } - ``` - -- The - [Parser](example_input.py#L68) - parses the decoded tensors and performs pre-processing on the input data, - such as image decoding, augmentation and resizing. It should have - `_parse_train_data` and `_parse_eval_data` functions, in which the processed - images and labels are returned. - -## Create Config - -Next, you will define configs for your project. All configs are defined as -`dataclass` objects, and can have default parameter values. - -First, you will define your -[`ExampleDataConfig`](example_config.py#L27). -It inherits from `config_definitions.DataConfig`, which already defines a few -common fields, like `input_path`, `file_type`, `global_batch_size`, etc. You can -add more fields in your own config as needed. - -You can then define your model config -[`ExampleModel`](example_config.py#L39) -that inherits from `hyperparams.Config`. Expose your own model parameters here. - -You can then define your `Loss` and `Evaluation` configs. - -Next, you will put all the above configs into an -[`ExampleTask`](example_config.py#L56) -config. Here you list the configs for your data, model, loss, evaluation, -etc. - -Finally, you can define a -[`tf_vision_example_experiment`](example_config.py#L66), -which creates a template for your experiments and fills it with default parameters. -These default parameter values can be overridden by a YAML file, like -[example_config_tpu.yaml](example_config_tpu.yaml). -Also, make sure you give your experiment template a unique name with the -decorator: - -```python -@exp_factory.register_config_factory('tf_vision_example_experiment') -def tf_vision_example_experiment() -> cfg.ExperimentConfig: - """Definition of a full example experiment.""" - # Create and return experiment template.
-``` - -## Create Task - -A task is a class that encapsulates the logic of loading data, building models, -performing one-step training and validation, etc. It connects all components -together and is called by the base -[Trainer](https://github.com/tensorflow/models/blob/master/official/core/base_trainer.py). - -You can create your own task by inheriting from the base -[Task](https://github.com/tensorflow/models/blob/master/official/core/base_task.py), -or from one of the -[tasks](https://github.com/tensorflow/models/blob/master/official/vision/beta/tasks/) -we have already defined, if most of the operations can be reused. An `ExampleTask` -inheriting from -[ImageClassificationTask](https://github.com/tensorflow/models/blob/master/official/vision/beta/tasks/image_classification.py#L32) -can be found -[here](example_task.py). -The important components of the task are described below. - -- `build_model`: you can instantiate a model you have defined above. It is - also good practice to run a forward pass with a dummy input to ensure layers - within the model are properly initialized. - -- `build_inputs`: here you can instantiate a Decoder object and a Parser - object. They are used to create an `InputReader` that will generate a - `tf.data.Dataset` object. - -- `build_losses`: it takes ground-truth labels and model outputs as input, and - computes the loss. It will be called in `train_step` and `validation_step`. - You can also define different losses for training and validation, for - example, `build_train_losses` and `build_validation_losses`. Just make sure - they are called by the corresponding functions properly. - -- `build_metrics`: here you can define your own metrics. It should return a - list of `tf.keras.metrics.Metric` objects. You can create your own metric - class by subclassing `tf.keras.metrics.Metric`. - -- `train_step` and `validation_step`: they perform one-step training and - validation.
They take one batch of training/validation data, run the forward - pass, gather losses and update metrics. They assume the data format is - consistent with that of the `Parser` output. `train_step` also contains the - backward pass to update model weights. - -## Import registry - -To use your custom dataloaders, models, tasks, etc., you will need to register -them properly. The recommended way is to have a single file with all relevant -files imported, for example, -[registry_imports.py](registry_imports.py). -You can see in this file we import all our custom components: - -```python -# pylint: disable=unused-import -from official.common import registry_imports -from official.vision.beta.projects.example import example_config -from official.vision.beta.projects.example import example_input -from official.vision.beta.projects.example import example_model -from official.vision.beta.projects.example import example_task -``` - -## Training - -You can create your own trainer by branching from our core -[trainer](https://github.com/tensorflow/models/blob/master/official/vision/beta/train.py). -Just make sure you import the registry like this: - -```python -from official.vision.beta.projects.example import registry_imports # pylint: disable=unused-import -``` - -You can run training locally for testing purposes: - -```bash -# Assume you are under official/vision/beta/projects. -python3 example/train.py \ - --experiment=tf_vision_example_experiment \ - --config_file=${PWD}/example/example_config_local.yaml \ - --mode=train \ - --model_dir=/tmp/tfvision_test/ -``` - -It can also run on Google Cloud using Cloud TPU. -[Here](https://cloud.google.com/tpu/docs/how-to) are the instructions for using -Cloud TPU, and here is a more detailed -[tutorial](https://cloud.google.com/tpu/docs/tutorials/resnet-rs-2.x) on -training a ResNet-RS model.
Follow the instructions to set up Cloud TPU, then -launch training with: - -```bash -EXP_TYPE=tf_vision_example_experiment # This should match the registered name of your experiment template. -EXP_NAME=exp_001 # You can give any name to the experiment. -TPU_NAME=experiment01 -# Now launch the experiment. -python3 example/train.py \ - --experiment=$EXP_TYPE \ - --mode=train \ - --tpu=$TPU_NAME \ - --model_dir=/tmp/tfvision_test/ \ - --config_file=third_party/tensorflow_models/official/vision/beta/projects/example/example_config_tpu.yaml -``` diff --git a/official/vision/beta/projects/example/registry_imports.py b/official/vision/beta/projects/example/registry_imports.py deleted file mode 100644 index 1f44a877cc98533bac0e2a278cfc907a58e8f396..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/example/registry_imports.py +++ /dev/null @@ -1,27 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""All necessary imports for registration. - -Custom models, tasks, configs, etc. need to be imported into the registry so they can be -picked up by the trainer. They can be included in this file so you do not need -to handle each file separately.
-""" - -# pylint: disable=unused-import -from official.common import registry_imports -from official.vision.beta.projects.example import example_config -from official.vision.beta.projects.example import example_input -from official.vision.beta.projects.example import example_model -from official.vision.beta.projects.example import example_task diff --git a/official/vision/beta/projects/example/train.py b/official/vision/beta/projects/example/train.py deleted file mode 100644 index 57b177517ae2debbf89429d750d005fcc7f2fd00..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/example/train.py +++ /dev/null @@ -1,30 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""TensorFlow Model Garden Vision trainer. - -All custom components are registered by importing registry_imports. Here we use the default -trainer, so we call train.main directly. If you need to customize the trainer, -branch from `official/vision/beta/train.py` and make changes.
-""" -from absl import app - -from official.common import flags as tfm_flags -from official.vision.beta import train -from official.vision.beta.projects.example import registry_imports # pylint: disable=unused-import - - -if __name__ == '__main__': - tfm_flags.define_flags() - app.run(train.main) diff --git a/official/vision/beta/projects/movinet/README.md b/official/vision/beta/projects/movinet/README.md deleted file mode 100644 index fb924e2bbad41a9ef071cd69b4d82d4381a08c2d..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/movinet/README.md +++ /dev/null @@ -1,419 +0,0 @@ -# Mobile Video Networks (MoViNets) - -[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tensorflow/models/blob/master/official/vision/beta/projects/movinet/movinet_tutorial.ipynb) -[![TensorFlow Hub](https://img.shields.io/badge/TF%20Hub-Models-FF6F00?logo=tensorflow)](https://tfhub.dev/google/collections/movinet) -[![Paper](http://img.shields.io/badge/Paper-arXiv.2103.11511-B3181B?logo=arXiv)](https://arxiv.org/abs/2103.11511) - -This repository is the official implementation of -[MoViNets: Mobile Video Networks for Efficient Video -Recognition](https://arxiv.org/abs/2103.11511). - -**[UPDATE 2021-07-12] Mobile Models Available via [TF Lite](#tf-lite-streaming-models)** - -

- -

- ## Description - -Mobile Video Networks (MoViNets) are efficient video classification models -runnable on mobile devices. MoViNets demonstrate state-of-the-art accuracy and -efficiency on several large-scale video action recognition datasets. - -On [Kinetics 600](https://deepmind.com/research/open-source/kinetics), -MoViNet-A6 achieves 84.8% top-1 accuracy, outperforming recent -Vision Transformer models like [ViViT](https://arxiv.org/abs/2103.15691) (83.0%) -and [VATT](https://arxiv.org/abs/2104.11178) (83.6%) without any additional -training data, while using 10x fewer FLOPs. And streaming MoViNet-A0 achieves -72% accuracy while using 3x fewer FLOPs than MobileNetV3-large (68%). - -There is a large performance gap between accurate models and -efficient models for video action recognition. On the one hand, 2D MobileNet -CNNs are fast and can operate on streaming video in real time, but are prone to -be noisy and inaccurate. On the other hand, 3D CNNs are accurate, but are -memory and computation intensive and cannot operate on streaming video. - -MoViNets bridge this gap, producing: - -- State-of-the-art efficiency and accuracy across the model family (MoViNet-A0 -to A6). -- Streaming models with 3D causal convolutions that substantially reduce memory -usage. -- Temporal ensembles of models to boost efficiency even further. - -MoViNets also improve computational efficiency by outputting high-quality -predictions frame by frame, as opposed to the traditional multi-clip evaluation -approach that performs redundant computation and limits temporal scope. -

- -

- -

- -

- -## History - -- **2021-07-12** Add TF Lite support and replace 3D stream models with -mobile-friendly (2+1)D stream. -- **2021-05-30** Add streaming MoViNet checkpoints and examples. -- **2021-05-11** Initial Commit. - -## Authors and Maintainers - -* Dan Kondratyuk ([@hyperparticle](https://github.com/hyperparticle)) -* Liangzhe Yuan ([@yuanliangzhe](https://github.com/yuanliangzhe)) -* Yeqing Li ([@yeqingli](https://github.com/yeqingli)) - -## Table of Contents - -- [Requirements](#requirements) -- [Results and Pretrained Weights](#results-and-pretrained-weights) - - [Kinetics 600](#kinetics-600) -- [Prediction Examples](#prediction-examples) -- [TF Lite Example](#tf-lite-example) -- [Training and Evaluation](#training-and-evaluation) -- [References](#references) -- [License](#license) -- [Citation](#citation) - -## Requirements - -[![TensorFlow 2.4](https://img.shields.io/badge/TensorFlow-2.1-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.1.0) -[![Python 3.6](https://img.shields.io/badge/Python-3.6-3776AB?logo=python)](https://www.python.org/downloads/release/python-360/) - -To install requirements: - -```shell -pip install -r requirements.txt -``` - -## Results and Pretrained Weights - -[![TensorFlow Hub](https://img.shields.io/badge/TF%20Hub-Models-FF6F00?logo=tensorflow)](https://tfhub.dev/google/collections/movinet) -[![TensorBoard](https://img.shields.io/badge/TensorBoard-dev-FF6F00?logo=tensorflow)](https://tensorboard.dev/experiment/Q07RQUlVRWOY4yDw3SnSkA/) - -### Kinetics 600 - -

- -

- -[tensorboard.dev summary](https://tensorboard.dev/experiment/Q07RQUlVRWOY4yDw3SnSkA/) -of training runs across all models. - -The table below summarizes the performance of each model on -[Kinetics 600](https://deepmind.com/research/open-source/kinetics) -and provides links to download pretrained models. All models are evaluated on -single clips with the same resolution as training. - -Note: MoViNet-A6 can be constructed as an ensemble of MoViNet-A4 and -MoViNet-A5. - -#### Base Models - -Base models implement standard 3D convolutions without stream buffers. Base -models are not recommended for fast inference on CPU or mobile due to -limited support for -[`tf.nn.conv3d`](https://www.tensorflow.org/api_docs/python/tf/nn/conv3d). -Instead, see the [streaming models section](#streaming-models). - -| Model Name | Top-1 Accuracy | Top-5 Accuracy | Input Shape | GFLOPs\* | Checkpoint | TF Hub SavedModel | -|------------|----------------|----------------|-------------|----------|------------|-------------------| -| MoViNet-A0-Base | 72.28 | 90.92 | 50 x 172 x 172 | 2.7 | [checkpoint (12 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a0/base/kinetics-600/classification/) | -| MoViNet-A1-Base | 76.69 | 93.40 | 50 x 172 x 172 | 6.0 | [checkpoint (18 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a1_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a1/base/kinetics-600/classification/) | -| MoViNet-A2-Base | 78.62 | 94.17 | 50 x 224 x 224 | 10 | [checkpoint (20 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a2_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a2/base/kinetics-600/classification/) | -| MoViNet-A3-Base | 81.79 | 95.67 | 120 x 256 x 256 | 57 | [checkpoint (29 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a3_base.tar.gz) | 
[tfhub](https://tfhub.dev/tensorflow/movinet/a3/base/kinetics-600/classification/) | -| MoViNet-A4-Base | 83.48 | 96.16 | 80 x 290 x 290 | 110 | [checkpoint (44 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a4_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a4/base/kinetics-600/classification/) | -| MoViNet-A5-Base | 84.27 | 96.39 | 120 x 320 x 320 | 280 | [checkpoint (72 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a5_base.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a5/base/kinetics-600/classification/) | - -\*GFLOPs per video on Kinetics 600. - -#### Streaming Models - -Streaming models implement causal (2+1)D convolutions with stream buffers. -Streaming models use (2+1)D convolution instead of 3D to utilize optimized -[`tf.nn.conv2d`](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d) -operations, which offer fast inference on CPU. Streaming models can be run on -individual frames or on larger video clips like base models. - -Note: A3, A4, and A5 models use a positional encoding in the squeeze-excitation -blocks, while A0, A1, and A2 do not. For the smaller models, accuracy is -unaffected without positional encoding, while for the larger models accuracy is -significantly worse without positional encoding. 
- -| Model Name | Top-1 Accuracy | Top-5 Accuracy | Input Shape\* | GFLOPs\*\* | Checkpoint | TF Hub SavedModel | -|------------|----------------|----------------|---------------|------------|------------|-------------------| -| MoViNet-A0-Stream | 72.05 | 90.63 | 50 x 172 x 172 | 2.7 | [checkpoint (12 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a0/stream/kinetics-600/classification/) | -| MoViNet-A1-Stream | 76.45 | 93.25 | 50 x 172 x 172 | 6.0 | [checkpoint (18 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a1_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a1/stream/kinetics-600/classification/) | -| MoViNet-A2-Stream | 78.40 | 94.05 | 50 x 224 x 224 | 10 | [checkpoint (20 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a2_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a2/stream/kinetics-600/classification/) | -| MoViNet-A3-Stream | 80.09 | 94.84 | 120 x 256 x 256 | 57 | [checkpoint (29 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a3_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a3/stream/kinetics-600/classification/) | -| MoViNet-A4-Stream | 81.49 | 95.66 | 80 x 290 x 290 | 110 | [checkpoint (44 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a4_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a4/stream/kinetics-600/classification/) | -| MoViNet-A5-Stream | 82.37 | 95.79 | 120 x 320 x 320 | 280 | [checkpoint (72 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a5_stream.tar.gz) | [tfhub](https://tfhub.dev/tensorflow/movinet/a5/stream/kinetics-600/classification/) | - -\*In streaming mode, the number of frames corresponds to the total accumulated -duration of the 10-second clip. - -\*\*GFLOPs per video on Kinetics 600.
- -Note: current streaming model checkpoints have been updated with a slightly -different architecture. To download the old checkpoints, insert `_legacy` before -`.tar.gz` in the URL, e.g., `movinet_a0_stream_legacy.tar.gz`. - -##### TF Lite Streaming Models - -For convenience, we provide converted TF Lite models for inference on mobile -devices. See the [TF Lite Example](#tf-lite-example) to export and run your own -models. - -For reference, MoViNet-A0-Stream runs with a similar latency to -[MobileNetV3-Large](https://tfhub.dev/google/imagenet/mobilenet_v3_large_100_224/classification/) -with +5% accuracy -on Kinetics 600. - -| Model Name | Input Shape | Pixel 4 Latency\* | x86 Latency\* | TF Lite Binary | -|------------|-------------|-------------------|---------------|----------------| -| MoViNet-A0-Stream | 1 x 1 x 172 x 172 | 22 ms | 16 ms | [TF Lite (13 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a0_stream.tflite) | -| MoViNet-A1-Stream | 1 x 1 x 172 x 172 | 42 ms | 33 ms | [TF Lite (45 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a1_stream.tflite) | -| MoViNet-A2-Stream | 1 x 1 x 224 x 224 | 200 ms | 66 ms | [TF Lite (53 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a2_stream.tflite) | -| MoViNet-A3-Stream | 1 x 1 x 256 x 256 | - | 120 ms | [TF Lite (73 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a3_stream.tflite) | -| MoViNet-A4-Stream | 1 x 1 x 290 x 290 | - | 300 ms | [TF Lite (101 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a4_stream.tflite) | -| MoViNet-A5-Stream | 1 x 1 x 320 x 320 | - | 450 ms | [TF Lite (153 MB)](https://storage.googleapis.com/tf_model_garden/vision/movinet/movinet_a5_stream.tflite) | - -\*Single-frame latency measured with unaltered float32 operations on a -single CPU core. Observed latency may differ depending on hardware -configuration.
Measured on a stock Pixel 4 (Android 11) and x86 Intel Xeon -W-2135 CPU. - -## Prediction Examples - -Please check out our [Colab Notebook](https://colab.research.google.com/github/tensorflow/models/blob/master/official/vision/beta/projects/movinet/movinet_tutorial.ipynb) -to get started with MoViNets. - -This section provides examples of how to run prediction. - -For **base models**, run the following: - -```python -import tensorflow as tf - -from official.vision.beta.projects.movinet.modeling import movinet -from official.vision.beta.projects.movinet.modeling import movinet_model - -# Create backbone and model. -backbone = movinet.Movinet( - model_id='a0', - causal=False, - use_external_states=False, -) -# Base models do not need stream states, so the classifier returns logits only. -model = movinet_model.MovinetClassifier( - backbone, num_classes=600, output_states=False) - -# Create your example input here. -# Refer to the paper for recommended input shapes. -inputs = tf.ones([1, 8, 172, 172, 3]) - -# [Optional] Build the model and load a pretrained checkpoint. -model.build(inputs.shape) - -checkpoint_dir = '/path/to/checkpoint' -checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir) -checkpoint = tf.train.Checkpoint(model=model) -status = checkpoint.restore(checkpoint_path) -status.assert_existing_objects_matched() - -# Run the model prediction. -output = model(inputs) -prediction = tf.argmax(output, -1) -``` - -For **streaming models**, run the following: - -```python -import tensorflow as tf - -from official.vision.beta.projects.movinet.modeling import movinet -from official.vision.beta.projects.movinet.modeling import movinet_model - -model_id = 'a0' -use_positional_encoding = model_id in {'a3', 'a4', 'a5'} - -# Create backbone and model.
-backbone = movinet.Movinet( - model_id=model_id, - causal=True, - conv_type='2plus1d', - se_type='2plus3d', - activation='hard_swish', - gating_activation='hard_sigmoid', - use_positional_encoding=use_positional_encoding, - use_external_states=True, -) - -model = movinet_model.MovinetClassifier( - backbone, - num_classes=600, - output_states=True) - -# Create your example input here. -# Refer to the paper for recommended input shapes. -inputs = tf.ones([1, 8, 172, 172, 3]) - -# [Optional] Build the model and load a pretrained checkpoint. -model.build(inputs.shape) - -checkpoint_dir = '/path/to/checkpoint' -checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir) -checkpoint = tf.train.Checkpoint(model=model) -status = checkpoint.restore(checkpoint_path) -status.assert_existing_objects_matched() - -# Split the video into individual frames. -# Note: we can also split into larger clips as well (e.g., 8-frame clips). -# Running on larger clips will slightly reduce latency overhead, but -# will consume more memory. -frames = tf.split(inputs, inputs.shape[1], axis=1) - -# Initialize the dict of states. All state tensors are initially zeros. -init_states = model.init_states(tf.shape(inputs)) - -# Run the model prediction by looping over each frame. -states = init_states -predictions = [] -for frame in frames: - output, states = model({**states, 'image': frame}) - predictions.append(output) - -# The video classification will simply be the last output of the model. -final_prediction = tf.argmax(predictions[-1], -1) - -# Alternatively, we can run the network on the entire input video. -# The output should be effectively the same -# (but it may differ a small amount due to floating point errors). 
-non_streaming_output, _ = model({**init_states, 'image': inputs}) -non_streaming_prediction = tf.argmax(non_streaming_output, -1) -``` - -## TF Lite Example - -This section outlines how to export a model to run on mobile -devices with [TF Lite](https://www.tensorflow.org/lite). - -[Optional] Streaming models are typically trained with -`conv_type = 3d_2plus1d` for better training throughput. To achieve -better inference performance on CPU, convert the `3d_2plus1d` -checkpoint to make it compatible with the `2plus1d` graph -by running `tools/convert_3d_2plus1d.py`. - -First, convert to [TF SavedModel](https://www.tensorflow.org/guide/saved_model) -by running `export_saved_model.py`. For example, for `MoViNet-A0-Stream`, run: - -```shell -python3 export_saved_model.py \ - --model_id=a0 \ - --causal=True \ - --conv_type=2plus1d \ - --se_type=2plus3d \ - --activation=hard_swish \ - --gating_activation=hard_sigmoid \ - --use_positional_encoding=False \ - --num_classes=600 \ - --batch_size=1 \ - --num_frames=1 \ - --image_size=172 \ - --bundle_input_init_states_fn=False \ - --checkpoint_path=/path/to/checkpoint \ - --export_path=/tmp/movinet_a0_stream -``` - -Then the SavedModel can be converted to TF Lite using the [`TFLiteConverter`](https://www.tensorflow.org/lite/convert): - -```python -import tensorflow as tf - -saved_model_dir = '/tmp/movinet_a0_stream' -converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir) -tflite_model = converter.convert() - -with open('/tmp/movinet_a0_stream.tflite', 'wb') as f: - f.write(tflite_model) -``` - -To run with TF Lite using [tf.lite.Interpreter](https://www.tensorflow.org/lite/guide/inference#load_and_run_a_model_in_python) -with the Python API: - -```python -import tensorflow as tf - -# Create the interpreter and signature runner -interpreter = tf.lite.Interpreter('/tmp/movinet_a0_stream.tflite') -runner = interpreter.get_signature_runner() - -# Extract state names and create the initial (zero)
states -def state_name(name: str) -> str: - return name[len('serving_default_'):-len(':0')] - -init_states = { - state_name(x['name']): tf.zeros(x['shape'], dtype=x['dtype']) - for x in interpreter.get_input_details() -} -del init_states['image'] - -# Insert your video clip here -video = tf.ones([1, 8, 172, 172, 3]) -clips = tf.split(video, video.shape[1], axis=1) - -# To run on a video, pass in one frame at a time -states = init_states -for clip in clips: - # Input shape: [1, 1, 172, 172, 3] - outputs = runner(**states, image=clip) - logits = outputs.pop('logits') - states = outputs -``` - -Follow the [official guide](https://www.tensorflow.org/lite/guide) to run a -model with TF Lite on your mobile device. - -## Training and Evaluation - -Run this command line for continuous training and evaluation. - -```shell -MODE=train_and_eval # Can also be 'train' if using a separate evaluator job -CONFIG_FILE=official/vision/beta/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml -python3 official/vision/beta/projects/movinet/train.py \ - --experiment=movinet_kinetics600 \ - --mode=${MODE} \ - --model_dir=/tmp/movinet_a0_base/ \ - --config_file=${CONFIG_FILE} -``` - -Run this command line for evaluation. - -```shell -MODE=eval # Can also be 'eval_continuous' for use during training -CONFIG_FILE=official/vision/beta/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml -python3 official/vision/beta/projects/movinet/train.py \ - --experiment=movinet_kinetics600 \ - --mode=${MODE} \ - --model_dir=/tmp/movinet_a0_base/ \ - --config_file=${CONFIG_FILE} -``` - -## License - -[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) - -This project is licensed under the terms of the **Apache License 2.0**. - -## Citation - -If you want to cite this code in your research paper, please use the following -information. 
- -``` -@article{kondratyuk2021movinets, - title={MoViNets: Mobile Video Networks for Efficient Video Recognition}, - author={Dan Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Matthew Brown, and Boqing Gong}, - journal={arXiv preprint arXiv:2103.11511}, - year={2021} -} -``` diff --git a/official/vision/beta/projects/movinet/__init__.py b/official/vision/beta/projects/movinet/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/movinet/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/projects/movinet/configs/__init__.py b/official/vision/beta/projects/movinet/configs/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/movinet/configs/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/projects/movinet/configs/movinet.py b/official/vision/beta/projects/movinet/configs/movinet.py deleted file mode 100644 index 22516d5d8ffae4e7e6dde2e7deafbed5282a96d0..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/movinet/configs/movinet.py +++ /dev/null @@ -1,147 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Definitions for MoViNet structures. - -Reference: "MoViNets: Mobile Video Networks for Efficient Video Recognition" -https://arxiv.org/pdf/2103.11511.pdf - -MoViNets are efficient video classification networks that are part of a model -family, ranging from the smallest model, MoViNet-A0, to the largest model, -MoViNet-A6. Each model has various width, depth, input resolution, and input -frame-rate associated with them. See the main paper for more details. 
-""" - -import dataclasses - -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.modeling import hyperparams -from official.vision.beta.configs import backbones_3d -from official.vision.beta.configs import common -from official.vision.beta.configs import video_classification - - -@dataclasses.dataclass -class Movinet(hyperparams.Config): - """Backbone config for Base MoViNet.""" - model_id: str = 'a0' - causal: bool = False - use_positional_encoding: bool = False - # Choose from ['3d', '2plus1d', '3d_2plus1d'] - # 3d: default 3D convolution - # 2plus1d: (2+1)D convolution with Conv2D (2D reshaping) - # 3d_2plus1d: (2+1)D convolution with Conv3D (no 2D reshaping) - conv_type: str = '3d' - # Choose from ['3d', '2d', '2plus3d'] - # 3d: default 3D global average pooling. - # 2d: 2D global average pooling. - # 2plus3d: concatenation of 2D and 3D global average pooling. - se_type: str = '3d' - activation: str = 'swish' - gating_activation: str = 'sigmoid' - stochastic_depth_drop_rate: float = 0.2 - use_external_states: bool = False - - -@dataclasses.dataclass -class MovinetA0(Movinet): - """Backbone config for MoViNet-A0. - - Represents the smallest base MoViNet searched by NAS. - - Reference: https://arxiv.org/pdf/2103.11511.pdf - """ - model_id: str = 'a0' - - -@dataclasses.dataclass -class MovinetA1(Movinet): - """Backbone config for MoViNet-A1.""" - model_id: str = 'a1' - - -@dataclasses.dataclass -class MovinetA2(Movinet): - """Backbone config for MoViNet-A2.""" - model_id: str = 'a2' - - -@dataclasses.dataclass -class MovinetA3(Movinet): - """Backbone config for MoViNet-A3.""" - model_id: str = 'a3' - - -@dataclasses.dataclass -class MovinetA4(Movinet): - """Backbone config for MoViNet-A4.""" - model_id: str = 'a4' - - -@dataclasses.dataclass -class MovinetA5(Movinet): - """Backbone config for MoViNet-A5. - - Represents the largest base MoViNet searched by NAS. 
- """ - model_id: str = 'a5' - - -@dataclasses.dataclass -class MovinetT0(Movinet): - """Backbone config for MoViNet-T0. - - MoViNet-T0 is a smaller version of MoViNet-A0 for even faster processing. - """ - model_id: str = 't0' - - -@dataclasses.dataclass -class Backbone3D(backbones_3d.Backbone3D): - """Configuration for backbones. - - Attributes: - type: 'str', type of backbone to be used, one of the fields below. - movinet: movinet backbone config. - """ - type: str = 'movinet' - movinet: Movinet = Movinet() - - -@dataclasses.dataclass -class MovinetModel(video_classification.VideoClassificationModel): - """The MoViNet model config.""" - model_type: str = 'movinet' - backbone: Backbone3D = Backbone3D() - norm_activation: common.NormActivation = common.NormActivation( - activation=None, # legacy flag, not used. - norm_momentum=0.99, - norm_epsilon=1e-3, - use_sync_bn=True) - activation: str = 'swish' - output_states: bool = False - - -@exp_factory.register_config_factory('movinet_kinetics600') -def movinet_kinetics600() -> cfg.ExperimentConfig: - """Video classification on Videonet with MoViNet backbone.""" - exp = video_classification.video_classification_kinetics600() - exp.task.train_data.dtype = 'bfloat16' - exp.task.validation_data.dtype = 'bfloat16' - - model = MovinetModel() - exp.task.model = model - - return exp diff --git a/official/vision/beta/projects/movinet/configs/movinet_test.py b/official/vision/beta/projects/movinet/configs/movinet_test.py deleted file mode 100644 index b93b77a1108f32ef0e1821e269c8203bcf470cf5..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/movinet/configs/movinet_test.py +++ /dev/null @@ -1,42 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Tests for movinet video classification.""" - -from absl.testing import parameterized -import tensorflow as tf - -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.vision.beta.configs import video_classification as exp_cfg -from official.vision.beta.projects.movinet.configs import movinet - - -class MovinetConfigTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.parameters( - ('movinet_kinetics600',),) - def test_video_classification_configs(self, config_name): - config = exp_factory.get_exp_config(config_name) - self.assertIsInstance(config, cfg.ExperimentConfig) - self.assertIsInstance(config.task, exp_cfg.VideoClassificationTask) - self.assertIsInstance(config.task.model, movinet.MovinetModel) - self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig) - config.task.train_data.is_training = None - with self.assertRaises(KeyError): - config.validate() - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/projects/movinet/export_saved_model.py b/official/vision/beta/projects/movinet/export_saved_model.py deleted file mode 100644 index 3b0cf67f3612c37bb4e13b13d126b8ad4911e97c..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/movinet/export_saved_model.py +++ /dev/null @@ -1,204 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -r"""Exports models to tf.saved_model. - -Export example: - -```shell -python3 export_saved_model.py \ - --export_path=/tmp/movinet/ \ - --model_id=a0 \ - --causal=True \ - --conv_type="3d" \ - --num_classes=600 \ - --use_positional_encoding=False \ - --checkpoint_path="" -``` - -Export for TF Lite example: - -```shell -python3 export_saved_model.py \ - --model_id=a0 \ - --causal=True \ - --conv_type=2plus1d \ - --se_type=2plus3d \ - --activation=hard_swish \ - --gating_activation=hard_sigmoid \ - --use_positional_encoding=False \ - --num_classes=600 \ - --batch_size=1 \ - --num_frames=1 \ # Use a single frame for streaming mode - --image_size=172 \ # Input resolution for the model - --bundle_input_init_states_fn=False \ - --checkpoint_path=/path/to/checkpoint \ - --export_path=/tmp/movinet_a0_stream -``` - -To use an exported saved_model, refer to export_saved_model_test.py. -""" - -from absl import app -from absl import flags -import tensorflow as tf - -from official.vision.beta.projects.movinet.modeling import movinet -from official.vision.beta.projects.movinet.modeling import movinet_model - -flags.DEFINE_string( - 'export_path', '/tmp/movinet/', - 'Export path to save the saved_model file.') -flags.DEFINE_string( - 'model_id', 'a0', 'MoViNet model name.') -flags.DEFINE_bool( - 'causal', False, 'Run the model in causal mode.') -flags.DEFINE_string( - 'conv_type', '3d', - '3d, 2plus1d, or 3d_2plus1d. 3d configures the network ' - 'to use the default 3D convolution. 
2plus1d uses (2+1)D convolution ' - 'with Conv2D operations and 2D reshaping (e.g., a 5x3x3 kernel becomes ' - '3x3 followed by 5x1 conv). 3d_2plus1d uses (2+1)D convolution with ' - 'Conv3D and no 2D reshaping (e.g., a 5x3x3 kernel becomes 1x3x3 ' - 'followed by 5x1x1 conv).') -flags.DEFINE_string( - 'se_type', '3d', - '3d, 2d, or 2plus3d. 3d uses the default 3D spatiotemporal global average ' - 'pooling for squeeze excitation. 2d uses 2D spatial global average pooling ' - 'on each frame. 2plus3d concatenates both 3D and 2D global average ' - 'pooling.') -flags.DEFINE_string( - 'activation', 'swish', - 'The main activation to use across layers.') -flags.DEFINE_string( - 'gating_activation', 'sigmoid', - 'The gating activation to use in squeeze-excitation layers.') -flags.DEFINE_bool( - 'use_positional_encoding', False, - 'Whether to use positional encoding (only applied when causal=True).') -flags.DEFINE_integer( - 'num_classes', 600, 'The number of classes for prediction.') -flags.DEFINE_integer( - 'batch_size', None, - 'The batch size of the input. Set to None for dynamic input.') -flags.DEFINE_integer( - 'num_frames', None, - 'The number of frames of the input. Set to None for dynamic input.') -flags.DEFINE_integer( - 'image_size', None, - 'The resolution of the input. Set to None for dynamic input.') -flags.DEFINE_bool( - 'bundle_input_init_states_fn', True, - 'Add init_states as a function signature to the saved model. ' - 'This is not necessary if the input shape is static (e.g., for TF Lite).') -flags.DEFINE_string( - 'checkpoint_path', '', - 'Checkpoint path to load. Leave blank for default initialization.') - -FLAGS = flags.FLAGS - - -def main(_) -> None: - input_specs = tf.keras.layers.InputSpec(shape=[ - FLAGS.batch_size, - FLAGS.num_frames, - FLAGS.image_size, - FLAGS.image_size, - 3, - ]) - - # Use dimensions of 1 except the channels to export faster, - # since we only really need the last dimension to build and get the output - # states.
These dimensions can be set to `None` once the model is built. - input_shape = [1 if s is None else s for s in input_specs.shape] - - activation = FLAGS.activation - if activation == 'swish': - # Override swish activation implementation to remove custom gradients - activation = 'simple_swish' - - backbone = movinet.Movinet( - model_id=FLAGS.model_id, - causal=FLAGS.causal, - use_positional_encoding=FLAGS.use_positional_encoding, - conv_type=FLAGS.conv_type, - se_type=FLAGS.se_type, - input_specs=input_specs, - activation=activation, - gating_activation=FLAGS.gating_activation, - use_sync_bn=False, - use_external_states=FLAGS.causal) - model = movinet_model.MovinetClassifier( - backbone, - num_classes=FLAGS.num_classes, - output_states=FLAGS.causal, - input_specs=dict(image=input_specs), - # TODO(dankondratyuk): currently set to swish, but will need to - # re-train to use other activations. - activation='simple_swish') - model.build(input_shape) - - # Compile model to generate some internal Keras variables. - model.compile() - - if FLAGS.checkpoint_path: - checkpoint = tf.train.Checkpoint(model=model) - status = checkpoint.restore(FLAGS.checkpoint_path) - status.assert_existing_objects_matched() - - if FLAGS.causal: - # Call the model once to get the output states. Call again with `states` - # input to ensure that the inputs with the `states` argument is built - # with the full output state shapes. 
- input_image = tf.ones(input_shape) - _, states = model({**model.init_states(input_shape), 'image': input_image}) - _ = model({**states, 'image': input_image}) - - # Create a function to explicitly set the names of the outputs - def predict(inputs): - outputs, states = model(inputs) - return {**states, 'logits': outputs} - - specs = { - name: tf.TensorSpec(spec.shape, name=name, dtype=spec.dtype) - for name, spec in model.initial_state_specs( - input_specs.shape).items() - } - specs['image'] = tf.TensorSpec( - input_specs.shape, dtype=model.dtype, name='image') - - predict_fn = tf.function(predict, jit_compile=True) - predict_fn = predict_fn.get_concrete_function(specs) - - init_states_fn = tf.function(model.init_states, jit_compile=True) - init_states_fn = init_states_fn.get_concrete_function( - tf.TensorSpec([5], dtype=tf.int32)) - - if FLAGS.bundle_input_init_states_fn: - signatures = {'call': predict_fn, 'init_states': init_states_fn} - else: - signatures = predict_fn - - tf.keras.models.save_model( - model, FLAGS.export_path, signatures=signatures) - else: - _ = model(tf.ones(input_shape)) - tf.keras.models.save_model(model, FLAGS.export_path) - - print(' ----- Done. Saved Model is saved at {}'.format(FLAGS.export_path)) - - -if __name__ == '__main__': - app.run(main) diff --git a/official/vision/beta/projects/movinet/modeling/__init__.py b/official/vision/beta/projects/movinet/modeling/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/movinet/modeling/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/projects/movinet/modeling/movinet.py b/official/vision/beta/projects/movinet/modeling/movinet.py deleted file mode 100644 index 5639f6a68c6f5cbaa7da67ca631a144e198d9316..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/movinet/modeling/movinet.py +++ /dev/null @@ -1,733 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Contains definitions of Mobile Video Networks. - -Reference: https://arxiv.org/pdf/2103.11511.pdf -""" -import dataclasses -import math -from typing import Dict, Mapping, Optional, Sequence, Tuple, Union - -import tensorflow as tf - -from official.modeling import hyperparams -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.projects.movinet.modeling import movinet_layers - -# Defines a set of kernel sizes and stride sizes to simplify and shorten -# architecture definitions for configs below. 
-KernelSize = Tuple[int, int, int] - -# K(ab) represents a 3D kernel of size (a, b, b) -K13: KernelSize = (1, 3, 3) -K15: KernelSize = (1, 5, 5) -K33: KernelSize = (3, 3, 3) -K53: KernelSize = (5, 3, 3) - -# S(ab) represents a 3D stride of size (a, b, b) -S11: KernelSize = (1, 1, 1) -S12: KernelSize = (1, 2, 2) -S22: KernelSize = (2, 2, 2) -S21: KernelSize = (2, 1, 1) - -# Type for a state container (map) -TensorMap = Mapping[str, tf.Tensor] - - -@dataclasses.dataclass -class BlockSpec: - """Configuration of a block.""" - pass - - -@dataclasses.dataclass -class StemSpec(BlockSpec): - """Configuration of a Movinet stem.""" - filters: int = 0 - kernel_size: KernelSize = (0, 0, 0) - strides: KernelSize = (0, 0, 0) - - -@dataclasses.dataclass -class MovinetBlockSpec(BlockSpec): - """Configuration of a Movinet block.""" - base_filters: int = 0 - expand_filters: Sequence[int] = () - kernel_sizes: Sequence[KernelSize] = () - strides: Sequence[KernelSize] = () - - -@dataclasses.dataclass -class HeadSpec(BlockSpec): - """Configuration of a Movinet head.""" - project_filters: int = 0 - head_filters: int = 0 - - -# Block specs specify the architecture of each model -BLOCK_SPECS = { - 'a0': ( - StemSpec(filters=8, kernel_size=K13, strides=S12), - MovinetBlockSpec( - base_filters=8, - expand_filters=(24,), - kernel_sizes=(K15,), - strides=(S12,)), - MovinetBlockSpec( - base_filters=32, - expand_filters=(80, 80, 80), - kernel_sizes=(K33, K33, K33), - strides=(S12, S11, S11)), - MovinetBlockSpec( - base_filters=56, - expand_filters=(184, 112, 184), - kernel_sizes=(K53, K33, K33), - strides=(S12, S11, S11)), - MovinetBlockSpec( - base_filters=56, - expand_filters=(184, 184, 184, 184), - kernel_sizes=(K53, K33, K33, K33), - strides=(S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=104, - expand_filters=(384, 280, 280, 344), - kernel_sizes=(K53, K15, K15, K15), - strides=(S12, S11, S11, S11)), - HeadSpec(project_filters=480, head_filters=2048), - ), - 'a1': ( -
StemSpec(filters=16, kernel_size=K13, strides=S12), - MovinetBlockSpec( - base_filters=16, - expand_filters=(40, 40), - kernel_sizes=(K15, K33), - strides=(S12, S11)), - MovinetBlockSpec( - base_filters=40, - expand_filters=(96, 120, 96, 96), - kernel_sizes=(K33, K33, K33, K33), - strides=(S12, S11, S11, S11)), - MovinetBlockSpec( - base_filters=64, - expand_filters=(216, 128, 216, 168, 216), - kernel_sizes=(K53, K33, K33, K33, K33), - strides=(S12, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=64, - expand_filters=(216, 216, 216, 128, 128, 216), - kernel_sizes=(K53, K33, K33, K33, K15, K33), - strides=(S11, S11, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=136, - expand_filters=(456, 360, 360, 360, 456, 456, 544), - kernel_sizes=(K53, K15, K15, K15, K15, K33, K13), - strides=(S12, S11, S11, S11, S11, S11, S11)), - HeadSpec(project_filters=600, head_filters=2048), - ), - 'a2': ( - StemSpec(filters=16, kernel_size=K13, strides=S12), - MovinetBlockSpec( - base_filters=16, - expand_filters=(40, 40, 64), - kernel_sizes=(K15, K33, K33), - strides=(S12, S11, S11)), - MovinetBlockSpec( - base_filters=40, - expand_filters=(96, 120, 96, 96, 120), - kernel_sizes=(K33, K33, K33, K33, K33), - strides=(S12, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=72, - expand_filters=(240, 160, 240, 192, 240), - kernel_sizes=(K53, K33, K33, K33, K33), - strides=(S12, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=72, - expand_filters=(240, 240, 240, 240, 144, 240), - kernel_sizes=(K53, K33, K33, K33, K15, K33), - strides=(S11, S11, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=144, - expand_filters=(480, 384, 384, 480, 480, 480, 576), - kernel_sizes=(K53, K15, K15, K15, K15, K33, K13), - strides=(S12, S11, S11, S11, S11, S11, S11)), - HeadSpec(project_filters=640, head_filters=2048), - ), - 'a3': ( - StemSpec(filters=16, kernel_size=K13, strides=S12), - MovinetBlockSpec( - base_filters=16, - expand_filters=(40, 40, 64, 40), - 
kernel_sizes=(K15, K33, K33, K33), - strides=(S12, S11, S11, S11)), - MovinetBlockSpec( - base_filters=48, - expand_filters=(112, 144, 112, 112, 144, 144), - kernel_sizes=(K33, K33, K33, K15, K33, K33), - strides=(S12, S11, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=80, - expand_filters=(240, 152, 240, 192, 240), - kernel_sizes=(K53, K33, K33, K33, K33), - strides=(S12, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=88, - expand_filters=(264, 264, 264, 264, 160, 264, 264, 264), - kernel_sizes=(K53, K33, K33, K33, K15, K33, K33, K33), - strides=(S11, S11, S11, S11, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=168, - expand_filters=(560, 448, 448, 560, 560, 560, 448, 448, 560, 672), - kernel_sizes=(K53, K15, K15, K15, K15, K33, K15, K15, K33, K13), - strides=(S12, S11, S11, S11, S11, S11, S11, S11, S11, S11)), - HeadSpec(project_filters=744, head_filters=2048), - ), - 'a4': ( - StemSpec(filters=24, kernel_size=K13, strides=S12), - MovinetBlockSpec( - base_filters=24, - expand_filters=(64, 64, 96, 64, 96, 64), - kernel_sizes=(K15, K33, K33, K33, K33, K33), - strides=(S12, S11, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=56, - expand_filters=(168, 168, 136, 136, 168, 168, 168, 136, 136), - kernel_sizes=(K33, K33, K33, K33, K33, K33, K33, K15, K33), - strides=(S12, S11, S11, S11, S11, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=96, - expand_filters=(320, 160, 320, 192, 320, 160, 320, 256, 320), - kernel_sizes=(K53, K33, K33, K33, K33, K33, K33, K33, K33), - strides=(S12, S11, S11, S11, S11, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=96, - expand_filters=(320, 320, 320, 320, 192, 320, 320, 192, 320, 320), - kernel_sizes=(K53, K33, K33, K33, K15, K33, K33, K33, K33, K33), - strides=(S11, S11, S11, S11, S11, S11, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=192, - expand_filters=(640, 512, 512, 640, 640, 640, 512, 512, 640, 768, - 640, 640, 768), - kernel_sizes=(K53, K15, K15, K15, K15, 
K33, K15, K15, K15, K15, K15, - K33, K33), - strides=(S12, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11, - S11)), - HeadSpec(project_filters=856, head_filters=2048), - ), - 'a5': ( - StemSpec(filters=24, kernel_size=K13, strides=S12), - MovinetBlockSpec( - base_filters=24, - expand_filters=(64, 64, 96, 64, 96, 64), - kernel_sizes=(K15, K15, K33, K33, K33, K33), - strides=(S12, S11, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=64, - expand_filters=(192, 152, 152, 152, 192, 192, 192, 152, 152, 192, - 192), - kernel_sizes=(K53, K33, K33, K33, K33, K33, K33, K33, K33, K33, - K33), - strides=(S12, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=112, - expand_filters=(376, 224, 376, 376, 296, 376, 224, 376, 376, 296, - 376, 376, 376), - kernel_sizes=(K53, K33, K33, K33, K33, K33, K33, K33, K33, K33, K33, - K33, K33), - strides=(S12, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11, - S11)), - MovinetBlockSpec( - base_filters=120, - expand_filters=(376, 376, 376, 376, 224, 376, 376, 224, 376, 376, - 376), - kernel_sizes=(K53, K33, K33, K33, K15, K33, K33, K33, K33, K33, - K33), - strides=(S11, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=224, - expand_filters=(744, 744, 600, 600, 744, 744, 744, 896, 600, 600, - 896, 744, 744, 896, 600, 600, 744, 744), - kernel_sizes=(K53, K33, K15, K15, K15, K15, K33, K15, K15, K15, K15, - K15, K33, K15, K15, K15, K15, K33), - strides=(S12, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11, S11, - S11, S11, S11, S11, S11, S11)), - HeadSpec(project_filters=992, head_filters=2048), - ), - 't0': ( - StemSpec(filters=8, kernel_size=K13, strides=S12), - MovinetBlockSpec( - base_filters=8, - expand_filters=(16,), - kernel_sizes=(K15,), - strides=(S12,)), - MovinetBlockSpec( - base_filters=32, - expand_filters=(72, 72), - kernel_sizes=(K33, K15), - strides=(S12, S11)), - MovinetBlockSpec( - base_filters=56, - expand_filters=(112, 112, 112), 
- kernel_sizes=(K53, K15, K33), - strides=(S12, S11, S11)), - MovinetBlockSpec( - base_filters=56, - expand_filters=(184, 184, 184, 184), - kernel_sizes=(K53, K15, K33, K33), - strides=(S11, S11, S11, S11)), - MovinetBlockSpec( - base_filters=104, - expand_filters=(344, 344, 344, 344), - kernel_sizes=(K53, K15, K15, K33), - strides=(S12, S11, S11, S11)), - HeadSpec(project_filters=240, head_filters=1024), - ), -} - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class Movinet(tf.keras.Model): - """Class to build Movinet family model. - - Reference: https://arxiv.org/pdf/2103.11511.pdf - """ - - def __init__(self, - model_id: str = 'a0', - causal: bool = False, - use_positional_encoding: bool = False, - conv_type: str = '3d', - se_type: str = '3d', - input_specs: Optional[tf.keras.layers.InputSpec] = None, - activation: str = 'swish', - gating_activation: str = 'sigmoid', - use_sync_bn: bool = True, - norm_momentum: float = 0.99, - norm_epsilon: float = 0.001, - kernel_initializer: str = 'HeNormal', - kernel_regularizer: Optional[str] = None, - bias_regularizer: Optional[str] = None, - stochastic_depth_drop_rate: float = 0., - use_external_states: bool = False, - output_states: bool = True, - **kwargs): - """MoViNet initialization function. - - Args: - model_id: name of MoViNet backbone model. - causal: use causal mode, with CausalConv and CausalSE operations. - use_positional_encoding: if True, adds a positional encoding before - temporal convolutions and the cumulative global average pooling - layers. - conv_type: '3d', '2plus1d', or '3d_2plus1d'. '3d' configures the network - to use the default 3D convolution. '2plus1d' uses (2+1)D convolution - with Conv2D operations and 2D reshaping (e.g., a 5x3x3 kernel becomes - 3x3 followed by 5x1 conv). '3d_2plus1d' uses (2+1)D convolution with - Conv3D and no 2D reshaping (e.g., a 5x3x3 kernel becomes 1x3x3 followed - by 5x1x1 conv). - se_type: '3d', '2d', or '2plus3d'. 
'3d' uses the default 3D - spatiotemporal global average pooling for squeeze excitation. '2d' - uses 2D spatial global average pooling on each frame. '2plus3d' - concatenates both 3D and 2D global average pooling. - input_specs: the model input spec to use. - activation: name of the main activation function. - gating_activation: gating activation to use in squeeze excitation layers. - use_sync_bn: if True, use synchronized batch normalization. - norm_momentum: normalization momentum for the moving average. - norm_epsilon: small float added to variance to avoid dividing by - zero. - kernel_initializer: kernel_initializer for convolutional layers. - kernel_regularizer: tf.keras.regularizers.Regularizer object for Conv2D. - Defaults to None. - bias_regularizer: tf.keras.regularizers.Regularizer object for Conv2d. - Defaults to None. - stochastic_depth_drop_rate: the base rate for stochastic depth. - use_external_states: if True, expects states to be passed as additional - input. - output_states: if True, output intermediate states that can be used to run - the model in streaming mode. Inputting the output states of the - previous input clip with the current input clip will utilize a stream - buffer for streaming video. - **kwargs: keyword arguments to be passed. 
- """ - block_specs = BLOCK_SPECS[model_id] - if input_specs is None: - input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, None, 3]) - - if conv_type not in ('3d', '2plus1d', '3d_2plus1d'): - raise ValueError('Unknown conv type: {}'.format(conv_type)) - if se_type not in ('3d', '2d', '2plus3d'): - raise ValueError('Unknown squeeze excitation type: {}'.format(se_type)) - - self._model_id = model_id - self._block_specs = block_specs - self._causal = causal - self._use_positional_encoding = use_positional_encoding - self._conv_type = conv_type - self._se_type = se_type - self._input_specs = input_specs - self._use_sync_bn = use_sync_bn - self._activation = activation - self._gating_activation = gating_activation - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - if use_sync_bn: - self._norm = tf.keras.layers.experimental.SyncBatchNormalization - else: - self._norm = tf.keras.layers.BatchNormalization - self._kernel_initializer = kernel_initializer - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - self._stochastic_depth_drop_rate = stochastic_depth_drop_rate - self._use_external_states = use_external_states - self._output_states = output_states - - if self._use_external_states and not self._causal: - raise ValueError('External states should be used with causal mode.') - if not isinstance(block_specs[0], StemSpec): - raise ValueError( - 'Expected first spec to be StemSpec, got {}'.format(block_specs[0])) - if not isinstance(block_specs[-1], HeadSpec): - raise ValueError( - 'Expected final spec to be HeadSpec, got {}'.format(block_specs[-1])) - self._head_filters = block_specs[-1].head_filters - - state_specs = None - if use_external_states: - self._set_dtype_policy(input_specs.dtype) - state_specs = self.initial_state_specs(input_specs.shape) - - inputs, outputs = self._build_network(input_specs, state_specs=state_specs) - - super(Movinet, self).__init__(inputs=inputs, outputs=outputs, 
**kwargs) - - self._state_specs = state_specs - - def _build_network( - self, - input_specs: tf.keras.layers.InputSpec, - state_specs: Optional[Mapping[str, tf.keras.layers.InputSpec]] = None, - ) -> Tuple[TensorMap, Union[TensorMap, Tuple[TensorMap, TensorMap]]]: - """Builds the model network. - - Args: - input_specs: the model input spec to use. - state_specs: a dict mapping a state name to the corresponding state spec. - State names should match with the `state` input/output dict. - - Returns: - Inputs and outputs as a tuple. Inputs are expected to be a dict with - base input and states. Outputs are expected to be a dict of endpoints - and (optional) output states. - """ - state_specs = state_specs if state_specs is not None else {} - - image_input = tf.keras.Input(shape=input_specs.shape[1:], name='inputs') - - states = { - name: tf.keras.Input(shape=spec.shape[1:], dtype=spec.dtype, name=name) - for name, spec in state_specs.items() - } - - inputs = {**states, 'image': image_input} - endpoints = {} - - x = image_input - - num_layers = sum( - len(block.expand_filters) - for block in self._block_specs - if isinstance(block, MovinetBlockSpec)) - stochastic_depth_idx = 1 - for block_idx, block in enumerate(self._block_specs): - if isinstance(block, StemSpec): - layer_obj = movinet_layers.Stem( - block.filters, - block.kernel_size, - block.strides, - conv_type=self._conv_type, - causal=self._causal, - activation=self._activation, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - batch_norm_layer=self._norm, - batch_norm_momentum=self._norm_momentum, - batch_norm_epsilon=self._norm_epsilon, - state_prefix='state_stem', - name='stem') - x, states = layer_obj(x, states=states) - endpoints['stem'] = x - elif isinstance(block, MovinetBlockSpec): - if not (len(block.expand_filters) == len(block.kernel_sizes) == - len(block.strides)): - raise ValueError( - 'Lengths of block parameters differ: {}, {}, {}'.format( - 
len(block.expand_filters), - len(block.kernel_sizes), - len(block.strides))) - params = list(zip(block.expand_filters, - block.kernel_sizes, - block.strides)) - for layer_idx, layer in enumerate(params): - stochastic_depth_drop_rate = ( - self._stochastic_depth_drop_rate * stochastic_depth_idx / - num_layers) - expand_filters, kernel_size, strides = layer - name = f'block{block_idx-1}_layer{layer_idx}' - layer_obj = movinet_layers.MovinetBlock( - block.base_filters, - expand_filters, - kernel_size=kernel_size, - strides=strides, - causal=self._causal, - activation=self._activation, - gating_activation=self._gating_activation, - stochastic_depth_drop_rate=stochastic_depth_drop_rate, - conv_type=self._conv_type, - se_type=self._se_type, - use_positional_encoding= - self._use_positional_encoding and self._causal, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - batch_norm_layer=self._norm, - batch_norm_momentum=self._norm_momentum, - batch_norm_epsilon=self._norm_epsilon, - state_prefix=f'state_{name}', - name=name) - x, states = layer_obj(x, states=states) - - endpoints[name] = x - stochastic_depth_idx += 1 - elif isinstance(block, HeadSpec): - layer_obj = movinet_layers.Head( - project_filters=block.project_filters, - conv_type=self._conv_type, - activation=self._activation, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - batch_norm_layer=self._norm, - batch_norm_momentum=self._norm_momentum, - batch_norm_epsilon=self._norm_epsilon, - state_prefix='state_head', - name='head') - x, states = layer_obj(x, states=states) - endpoints['head'] = x - else: - raise ValueError('Unknown block type {}'.format(block)) - - outputs = (endpoints, states) if self._output_states else endpoints - - return inputs, outputs - - def _get_initial_state_shapes( - self, - block_specs: Sequence[BlockSpec], - input_shape: Union[Sequence[int], tf.Tensor], - use_positional_encoding: bool = False) -> 
Dict[str, Sequence[int]]: - """Generates names and shapes for all input states. - - Args: - block_specs: sequence of specs used for creating a model. - input_shape: the expected 5D shape of the image input. - use_positional_encoding: whether the model will use positional encoding. - - Returns: - A dict mapping state names to state shapes. - """ - def divide_resolution(shape, num_downsamples): - """Downsamples the dimension to calculate strided convolution shape.""" - if shape is None: - return None - if isinstance(shape, tf.Tensor): - # Avoid using div and ceil to support tf lite - shape = tf.cast(shape, tf.float32) - resolution_divisor = 2 ** num_downsamples - resolution_multiplier = 0.5 ** num_downsamples - shape = ((shape + resolution_divisor - 1) * resolution_multiplier) - return tf.cast(shape, tf.int32) - else: - resolution_divisor = 2 ** num_downsamples - return math.ceil(shape / resolution_divisor) - - states = {} - num_downsamples = 0 - - for block_idx, block in enumerate(block_specs): - if isinstance(block, StemSpec): - if block.kernel_size[0] > 1: - states['state_stem_stream_buffer'] = ( - input_shape[0], - input_shape[1], - divide_resolution(input_shape[2], num_downsamples), - divide_resolution(input_shape[3], num_downsamples), - block.filters, - ) - num_downsamples += 1 - elif isinstance(block, MovinetBlockSpec): - block_idx -= 1 - params = list(zip( - block.expand_filters, - block.kernel_sizes, - block.strides)) - for layer_idx, layer in enumerate(params): - expand_filters, kernel_size, strides = layer - - # If we use a 2D kernel, we apply spatial downsampling - # before the buffer. 
- if (tuple(strides[1:3]) != (1, 1) and - self._conv_type in ['2plus1d', '3d_2plus1d']): - num_downsamples += 1 - - prefix = f'state_block{block_idx}_layer{layer_idx}' - - if kernel_size[0] > 1: - states[f'{prefix}_stream_buffer'] = ( - input_shape[0], - kernel_size[0] - 1, - divide_resolution(input_shape[2], num_downsamples), - divide_resolution(input_shape[3], num_downsamples), - expand_filters, - ) - - states[f'{prefix}_pool_buffer'] = ( - input_shape[0], 1, 1, 1, expand_filters, - ) - states[f'{prefix}_pool_frame_count'] = (1,) - - if use_positional_encoding: - name = f'{prefix}_pos_enc_frame_count' - states[name] = (1,) - - if strides[1] != strides[2]: - raise ValueError('Strides must match in the spatial dimensions, ' - 'got {}'.format(strides)) - - # If we use a 3D kernel, we apply spatial downsampling - # after the buffer. - if (tuple(strides[1:3]) != (1, 1) and - self._conv_type not in ['2plus1d', '3d_2plus1d']): - num_downsamples += 1 - elif isinstance(block, HeadSpec): - states['state_head_pool_buffer'] = ( - input_shape[0], 1, 1, 1, block.project_filters, - ) - states['state_head_pool_frame_count'] = (1,) - - return states - - def _get_state_dtype(self, name: str) -> str: - """Returns the dtype associated with a state.""" - if 'frame_count' in name: - return 'int32' - return self.dtype - - def initial_state_specs( - self, input_shape: Sequence[int]) -> Dict[str, tf.keras.layers.InputSpec]: - """Creates a mapping of state name to InputSpec from the input shape.""" - state_shapes = self._get_initial_state_shapes( - self._block_specs, - input_shape, - use_positional_encoding=self._use_positional_encoding) - - return { - name: tf.keras.layers.InputSpec( - shape=shape, dtype=self._get_state_dtype(name)) - for name, shape in state_shapes.items() - } - - def init_states(self, input_shape: Sequence[int]) -> Dict[str, tf.Tensor]: - """Returns initial states for the first call in steaming mode.""" - state_shapes = self._get_initial_state_shapes( - 
self._block_specs, - input_shape, - use_positional_encoding=self._use_positional_encoding) - - states = { - name: tf.zeros(shape, dtype=self._get_state_dtype(name)) - for name, shape in state_shapes.items() - } - return states - - @property - def use_external_states(self) -> bool: - """Whether this model is expecting input states as additional input.""" - return self._use_external_states - - @property - def head_filters(self): - """The number of filters expected to be in the head classifer layer.""" - return self._head_filters - - @property - def conv_type(self): - """The expected convolution type (see __init__ for more details).""" - return self._conv_type - - def get_config(self): - config_dict = { - 'model_id': self._model_id, - 'causal': self._causal, - 'use_positional_encoding': self._use_positional_encoding, - 'conv_type': self._conv_type, - 'activation': self._activation, - 'use_sync_bn': self._use_sync_bn, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'kernel_initializer': self._kernel_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'bias_regularizer': self._bias_regularizer, - 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, - 'use_external_states': self._use_external_states, - 'output_states': self._output_states, - } - return config_dict - - @classmethod - def from_config(cls, config, custom_objects=None): - return cls(**config) - - -@factory.register_backbone_builder('movinet') -def build_movinet( - input_specs: tf.keras.layers.InputSpec, - backbone_config: hyperparams.Config, - norm_activation_config: hyperparams.Config, - l2_regularizer: tf.keras.regularizers.Regularizer = None) -> tf.keras.Model: # pytype: disable=annotation-type-mismatch # typed-keras - """Builds MoViNet backbone from a config.""" - backbone_type = backbone_config.type - backbone_cfg = backbone_config.get() - if backbone_type != 'movinet': - raise ValueError(f'Inconsistent backbone type {backbone_type}') - if 
norm_activation_config.activation is not None: - raise ValueError( - 'norm_activation is not used in MoViNets, but specified: %s' % - norm_activation_config.activation) - - return Movinet( - model_id=backbone_cfg.model_id, - causal=backbone_cfg.causal, - use_positional_encoding=backbone_cfg.use_positional_encoding, - conv_type=backbone_cfg.conv_type, - se_type=backbone_cfg.se_type, - input_specs=input_specs, - activation=backbone_cfg.activation, - gating_activation=backbone_cfg.gating_activation, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon, - kernel_regularizer=l2_regularizer, - stochastic_depth_drop_rate=backbone_cfg.stochastic_depth_drop_rate, - use_external_states=backbone_cfg.use_external_states) diff --git a/official/vision/beta/projects/movinet/modeling/movinet_test.py b/official/vision/beta/projects/movinet/modeling/movinet_test.py deleted file mode 100644 index e6fa5e76f761baa46918822ceab4f442acb311c3..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/movinet/modeling/movinet_test.py +++ /dev/null @@ -1,183 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Tests for movinet.py.""" - -from absl.testing import parameterized -import tensorflow as tf - -from official.vision.beta.projects.movinet.modeling import movinet - - -class MoViNetTest(parameterized.TestCase, tf.test.TestCase): - - def test_network_creation(self): - """Test creation of MoViNet family models.""" - tf.keras.backend.set_image_data_format('channels_last') - - network = movinet.Movinet( - model_id='a0', - causal=True, - ) - inputs = tf.keras.Input(shape=(8, 128, 128, 3), batch_size=1) - endpoints, states = network(inputs) - - self.assertAllEqual(endpoints['stem'].shape, [1, 8, 64, 64, 8]) - self.assertAllEqual(endpoints['block0_layer0'].shape, [1, 8, 32, 32, 8]) - self.assertAllEqual(endpoints['block1_layer0'].shape, [1, 8, 16, 16, 32]) - self.assertAllEqual(endpoints['block2_layer0'].shape, [1, 8, 8, 8, 56]) - self.assertAllEqual(endpoints['block3_layer0'].shape, [1, 8, 8, 8, 56]) - self.assertAllEqual(endpoints['block4_layer0'].shape, [1, 8, 4, 4, 104]) - self.assertAllEqual(endpoints['head'].shape, [1, 1, 1, 1, 480]) - - self.assertNotEmpty(states) - - def test_network_with_states(self): - """Test creation of MoViNet family models with states.""" - tf.keras.backend.set_image_data_format('channels_last') - - backbone = movinet.Movinet( - model_id='a0', - causal=True, - use_external_states=True, - ) - inputs = tf.ones([1, 8, 128, 128, 3]) - - init_states = backbone.init_states(tf.shape(inputs)) - endpoints, new_states = backbone({**init_states, 'image': inputs}) - - self.assertAllEqual(endpoints['stem'].shape, [1, 8, 64, 64, 8]) - self.assertAllEqual(endpoints['block0_layer0'].shape, [1, 8, 32, 32, 8]) - self.assertAllEqual(endpoints['block1_layer0'].shape, [1, 8, 16, 16, 32]) - self.assertAllEqual(endpoints['block2_layer0'].shape, [1, 8, 8, 8, 56]) - self.assertAllEqual(endpoints['block3_layer0'].shape, [1, 8, 8, 8, 56]) - self.assertAllEqual(endpoints['block4_layer0'].shape, [1, 8, 4, 4, 104]) - 
self.assertAllEqual(endpoints['head'].shape, [1, 1, 1, 1, 480]) - - self.assertNotEmpty(init_states) - self.assertNotEmpty(new_states) - - def test_movinet_stream(self): - """Test if the backbone can be run in streaming mode.""" - tf.keras.backend.set_image_data_format('channels_last') - - backbone = movinet.Movinet( - model_id='a0', - causal=True, - use_external_states=True, - ) - inputs = tf.ones([1, 5, 128, 128, 3]) - - init_states = backbone.init_states(tf.shape(inputs)) - expected_endpoints, _ = backbone({**init_states, 'image': inputs}) - - frames = tf.split(inputs, inputs.shape[1], axis=1) - - states = init_states - for frame in frames: - output, states = backbone({**states, 'image': frame}) - predicted_endpoints = output - - predicted = predicted_endpoints['head'] - - # The expected final output is simply the mean across frames - expected = expected_endpoints['head'] - expected = tf.reduce_mean(expected, 1, keepdims=True) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected, 1e-5, 1e-5) - - def test_movinet_2plus1d_stream(self): - tf.keras.backend.set_image_data_format('channels_last') - - backbone = movinet.Movinet( - model_id='a0', - causal=True, - conv_type='2plus1d', - use_external_states=True, - ) - inputs = tf.ones([1, 5, 128, 128, 3]) - - init_states = backbone.init_states(tf.shape(inputs)) - expected_endpoints, _ = backbone({**init_states, 'image': inputs}) - - frames = tf.split(inputs, inputs.shape[1], axis=1) - - states = init_states - for frame in frames: - output, states = backbone({**states, 'image': frame}) - predicted_endpoints = output - - predicted = predicted_endpoints['head'] - - # The expected final output is simply the mean across frames - expected = expected_endpoints['head'] - expected = tf.reduce_mean(expected, 1, keepdims=True) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected, 1e-5, 1e-5) - - def test_movinet_3d_2plus1d_stream(self): - 
tf.keras.backend.set_image_data_format('channels_last') - - backbone = movinet.Movinet( - model_id='a0', - causal=True, - conv_type='3d_2plus1d', - use_external_states=True, - ) - inputs = tf.ones([1, 5, 128, 128, 3]) - - init_states = backbone.init_states(tf.shape(inputs)) - expected_endpoints, _ = backbone({**init_states, 'image': inputs}) - - frames = tf.split(inputs, inputs.shape[1], axis=1) - - states = init_states - for frame in frames: - output, states = backbone({**states, 'image': frame}) - predicted_endpoints = output - - predicted = predicted_endpoints['head'] - - # The expected final output is simply the mean across frames - expected = expected_endpoints['head'] - expected = tf.reduce_mean(expected, 1, keepdims=True) - - self.assertEqual(predicted.shape, expected.shape) - self.assertAllClose(predicted, expected, 1e-5, 1e-5) - - def test_serialize_deserialize(self): - # Create a network object that sets all of its config options. - kwargs = dict( - model_id='a0', - causal=True, - use_positional_encoding=True, - use_external_states=True, - ) - network = movinet.Movinet(**kwargs) - - # Create another network object from the first object's config. - new_network = movinet.Movinet.from_config(network.get_config()) - - # Validate that the config can be forced to JSON. - _ = new_network.to_json() - - # If the serialization was successful, the new config should match the old. 
- self.assertAllEqual(network.get_config(), new_network.get_config()) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/projects/movinet/movinet_tutorial.ipynb b/official/vision/beta/projects/movinet/movinet_tutorial.ipynb deleted file mode 100644 index 319489d7a7895009397b3b9e03aa693f41ece9fd..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/movinet/movinet_tutorial.ipynb +++ /dev/null @@ -1,597 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "3E96e1UKQ8uR" - }, - "source": [ - "# MoViNet Tutorial\n", - "\n", - "This notebook provides basic example code to create, build, and run [MoViNets (Mobile Video Networks)](https://arxiv.org/pdf/2103.11511.pdf). Models use TF Keras and support inference in TF 1 and TF 2. Pretrained models are provided by [TensorFlow Hub](https://tfhub.dev/google/collections/movinet/), trained on [Kinetics 600](https://deepmind.com/research/open-source/kinetics) for video action classification." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "8_oLnvJy7kz5" - }, - "source": [ - "## Setup\n", - "\n", - "It is recommended to run the models using GPUs or TPUs.\n", - "\n", - "To select a GPU/TPU in Colab, select `Runtime \u003e Change runtime type \u003e Hardware accelerator` dropdown in the top menu.\n", - "\n", - "### Install the TensorFlow Model Garden pip package\n", - "\n", - "- tf-models-official is the stable Model Garden package. Note that it may not include the latest changes in the tensorflow_models github repo.\n", - "- To include latest changes, you may install tf-models-nightly, which is the nightly Model Garden package created daily automatically.\n", - "pip will install all models and dependencies automatically.\n", - "\n", - "Install the [mediapy](https://github.com/google/mediapy) package for visualizing images/videos." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "s3khsunT7kWa" - }, - "outputs": [], - "source": [ - "!pip install -q tf-models-nightly tfds-nightly\n", - "\n", - "!command -v ffmpeg \u003e/dev/null || (apt update \u0026\u0026 apt install -y ffmpeg)\n", - "!pip install -q mediapy" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "dI_1csl6Q-gH" - }, - "outputs": [], - "source": [ - "import os\n", - "from six.moves import urllib\n", - "\n", - "import matplotlib.pyplot as plt\n", - "import mediapy as media\n", - "import numpy as np\n", - "from PIL import Image\n", - "import tensorflow as tf\n", - "import tensorflow_datasets as tfds\n", - "import tensorflow_hub as hub\n", - "\n", - "from official.vision.beta.configs import video_classification\n", - "from official.vision.beta.projects.movinet.configs import movinet as movinet_configs\n", - "from official.vision.beta.projects.movinet.modeling import movinet\n", - "from official.vision.beta.projects.movinet.modeling import movinet_layers\n", - "from official.vision.beta.projects.movinet.modeling import movinet_model" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6g0tuFvf71S9" - }, - "source": [ - "## Example Usage with TensorFlow Hub\n", - "\n", - "Load MoViNet-A2-Base from TensorFlow Hub, as part of the [MoViNet collection](https://tfhub.dev/google/collections/movinet/).\n", - "\n", - "The following code will:\n", - "\n", - "- Load a MoViNet KerasLayer from [tfhub.dev](https://tfhub.dev).\n", - "- Wrap the layer in a [Keras Model](https://www.tensorflow.org/api_docs/python/tf/keras/Model).\n", - "- Load an example image, and reshape it to a single frame video.\n", - "- Classify the video" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "nTUdhlRJzl2o" - }, - "outputs": [], - "source": [ - "movinet_a2_hub_url = 
'https://tfhub.dev/tensorflow/movinet/a2/base/kinetics-600/classification/1'\n", - "\n", - "inputs = tf.keras.layers.Input(\n", - " shape=[None, None, None, 3],\n", - " dtype=tf.float32)\n", - "\n", - "encoder = hub.KerasLayer(movinet_a2_hub_url, trainable=True)\n", - "\n", - "# Important: To use tf.nn.conv3d on CPU, we must compile with tf.function.\n", - "encoder.call = tf.function(encoder.call, experimental_compile=True)\n", - "\n", - "# [batch_size, 600]\n", - "outputs = encoder(dict(image=inputs))\n", - "\n", - "model = tf.keras.Model(inputs, outputs)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "7kU1_pL10l0B" - }, - "source": [ - "To provide a simple example video for classification, we can load a static image and reshape it to produce a video with a single frame." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Iy0rKRrT723_" - }, - "outputs": [], - "source": [ - "image_url = 'https://upload.wikimedia.org/wikipedia/commons/8/84/Ski_Famille_-_Family_Ski_Holidays.jpg'\n", - "image_height = 224\n", - "image_width = 224\n", - "\n", - "with urllib.request.urlopen(image_url) as f:\n", - " image = Image.open(f).resize((image_height, image_width))\n", - "video = tf.reshape(np.array(image), [1, 1, image_height, image_width, 3])\n", - "video = tf.cast(video, tf.float32) / 255.\n", - "\n", - "image" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Yf6EefHuWfxC" - }, - "source": [ - "Run the model and output the predicted label. Expected output should be skiing (labels 464-467). E.g., 465 = \"skiing crosscountry\".\n", - "\n", - "See [here](https://gist.github.com/willprice/f19da185c9c5f32847134b87c1960769#file-kinetics_600_labels-csv) for a full list of all labels." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "OOpEKuqH8sH7" - }, - "outputs": [], - "source": [ - "output = model(video)\n", - "output_label_index = tf.argmax(output, -1)[0].numpy()\n", - "\n", - "print(output_label_index)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "_s-7bEoa3f8g" - }, - "source": [ - "## Example Usage with the TensorFlow Model Garden\n", - "\n", - "Fine-tune MoViNet-A0-Base on [UCF-101](https://www.crcv.ucf.edu/research/data-sets/ucf101/).\n", - "\n", - "The following code will:\n", - "\n", - "- Load the UCF-101 dataset with [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/ucf101).\n", - "- Create a [`tf.data.Dataset`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) pipeline for training and evaluation.\n", - "- Display some example videos from the dataset.\n", - "- Build a MoViNet model and load pretrained weights.\n", - "- Fine-tune the final classifier layers on UCF-101." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "o7unW4WVr580" - }, - "source": [ - "### Load the UCF-101 Dataset with TensorFlow Datasets\n", - "\n", - "Calling `download_and_prepare()` will automatically download the dataset. After downloading, this cell will output information about the dataset." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "FxM1vNYp_YAM" - }, - "outputs": [], - "source": [ - "dataset_name = 'ucf101'\n", - "\n", - "builder = tfds.builder(dataset_name)\n", - "\n", - "config = tfds.download.DownloadConfig(verify_ssl=False)\n", - "builder.download_and_prepare(download_config=config)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "executionInfo": { - "elapsed": 2957, - "status": "ok", - "timestamp": 1619748263684, - "user": { - "displayName": "", - "photoUrl": "", - "userId": "" - }, - "user_tz": 360 - }, - "id": "boQHbcfDhXpJ", - "outputId": "eabc3307-d6bf-4f29-cc5a-c8dc6360701b" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Number of classes: 101\n", - "Number of examples for train: 9537\n", - "Number of examples for test: 3783\n", - "\n" - ] - }, - { - "data": { - "text/plain": [ - "tfds.core.DatasetInfo(\n", - " name='ucf101',\n", - " full_name='ucf101/ucf101_1_256/2.0.0',\n", - " description=\"\"\"\n", - " A 101-label video classification dataset.\n", - " \"\"\",\n", - " config_description=\"\"\"\n", - " 256x256 UCF with the first action recognition split.\n", - " \"\"\",\n", - " homepage='https://www.crcv.ucf.edu/data-sets/ucf101/',\n", - " data_path='/readahead/128M/placer/prod/home/tensorflow-datasets-cns-storage-owner/datasets/ucf101/ucf101_1_256/2.0.0',\n", - " download_size=6.48 GiB,\n", - " dataset_size=Unknown size,\n", - " features=FeaturesDict({\n", - " 'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=101),\n", - " 'video': Video(Image(shape=(256, 256, 3), dtype=tf.uint8)),\n", - " }),\n", - " supervised_keys=None,\n", - " splits={\n", - " 'test': \u003cSplitInfo num_examples=3783, num_shards=32\u003e,\n", - " 'train': \u003cSplitInfo num_examples=9537, num_shards=64\u003e,\n", - " },\n", - " citation=\"\"\"@article{DBLP:journals/corr/abs-1212-0402,\n", - " author = {Khurram Soomro and\n", - " Amir Roshan 
Zamir and\n", - " Mubarak Shah},\n", - " title = {{UCF101:} {A} Dataset of 101 Human Actions Classes From Videos in\n", - " The Wild},\n", - " journal = {CoRR},\n", - " volume = {abs/1212.0402},\n", - " year = {2012},\n", - " url = {http://arxiv.org/abs/1212.0402},\n", - " archivePrefix = {arXiv},\n", - " eprint = {1212.0402},\n", - " timestamp = {Mon, 13 Aug 2018 16:47:45 +0200},\n", - " biburl = {https://dblp.org/rec/bib/journals/corr/abs-1212-0402},\n", - " bibsource = {dblp computer science bibliography, https://dblp.org}\n", - " }\"\"\",\n", - ")" - ] - }, - "execution_count": 0, - "metadata": { - "tags": [] - }, - "output_type": "execute_result" - } - ], - "source": [ - "num_classes = builder.info.features['label'].num_classes\n", - "num_examples = {\n", - " name: split.num_examples\n", - " for name, split in builder.info.splits.items()\n", - "}\n", - "\n", - "print('Number of classes:', num_classes)\n", - "print('Number of examples for train:', num_examples['train'])\n", - "print('Number of examples for test:', num_examples['test'])\n", - "print()\n", - "\n", - "builder.info" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "BsJJgnBBqDKZ" - }, - "source": [ - "Build the training and evaluation datasets." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "9cO_BCu9le3r" - }, - "outputs": [], - "source": [ - "batch_size = 8\n", - "num_frames = 8\n", - "frame_stride = 10\n", - "resolution = 172\n", - "\n", - "def format_features(features):\n", - " video = features['video']\n", - " video = video[:, ::frame_stride]\n", - " video = video[:, :num_frames]\n", - "\n", - " video = tf.reshape(video, [-1, video.shape[2], video.shape[3], 3])\n", - " video = tf.image.resize(video, (resolution, resolution))\n", - " video = tf.reshape(video, [-1, num_frames, resolution, resolution, 3])\n", - " video = tf.cast(video, tf.float32) / 255.\n", - "\n", - " label = tf.one_hot(features['label'], num_classes)\n", - " return (video, label)\n", - "\n", - "train_dataset = builder.as_dataset(\n", - " split='train',\n", - " batch_size=batch_size,\n", - " shuffle_files=True)\n", - "train_dataset = train_dataset.map(\n", - " format_features,\n", - " num_parallel_calls=tf.data.AUTOTUNE)\n", - "train_dataset = train_dataset.repeat()\n", - "train_dataset = train_dataset.prefetch(2)\n", - "\n", - "test_dataset = builder.as_dataset(\n", - " split='test',\n", - " batch_size=batch_size)\n", - "test_dataset = test_dataset.map(\n", - " format_features,\n", - " num_parallel_calls=tf.data.AUTOTUNE,\n", - " deterministic=True)\n", - "test_dataset = test_dataset.prefetch(2)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rToX7_Ymgh57" - }, - "source": [ - "Display some example videos from the dataset." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "KG8Z7rUj06of" - }, - "outputs": [], - "source": [ - "videos, labels = next(iter(train_dataset))\n", - "media.show_videos(videos.numpy(), codec='gif', fps=5)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "R3RHeuHdsd_3" - }, - "source": [ - "### Build MoViNet-A0-Base and Load Pretrained Weights" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "JXVQOP9Rqk0I" - }, - "source": [ - "Here we create a MoViNet model using the open source code provided in [tensorflow/models](https://github.com/tensorflow/models) and load the pretrained weights. Here we freeze the all layers except the final classifier head to speed up fine-tuning." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "JpfxpeGSsbzJ" - }, - "outputs": [], - "source": [ - "model_id = 'a0'\n", - "\n", - "tf.keras.backend.clear_session()\n", - "\n", - "backbone = movinet.Movinet(\n", - " model_id=model_id)\n", - "model = movinet_model.MovinetClassifier(\n", - " backbone=backbone,\n", - " num_classes=600)\n", - "model.build([batch_size, num_frames, resolution, resolution, 3])\n", - "\n", - "# Load pretrained weights from TF Hub\n", - "movinet_hub_url = f'https://tfhub.dev/tensorflow/movinet/{model_id}/base/kinetics-600/classification/1'\n", - "movinet_hub_model = hub.KerasLayer(movinet_hub_url, trainable=True)\n", - "pretrained_weights = {w.name: w for w in movinet_hub_model.weights}\n", - "model_weights = {w.name: w for w in model.weights}\n", - "for name in pretrained_weights:\n", - " model_weights[name].assign(pretrained_weights[name])\n", - "\n", - "# Wrap the backbone with a new classifier to create a new classifier head\n", - "# with num_classes outputs\n", - "model = movinet_model.MovinetClassifier(\n", - " backbone=backbone,\n", - " num_classes=num_classes)\n", - "model.build([batch_size, num_frames, resolution, resolution, 3])\n", - "\n", - "# Freeze 
all layers except for the final classifier head\n", - "for layer in model.layers[:-1]:\n", - " layer.trainable = False\n", - "model.layers[-1].trainable = True" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "ucntdu2xqgXB" - }, - "source": [ - "Configure fine-tuning with training/evaluation steps, loss object, metrics, learning rate, optimizer, and callbacks.\n", - "\n", - "Here we use 3 epochs. Training for more epochs should improve accuracy." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "WUYTw48BouTu" - }, - "outputs": [], - "source": [ - "num_epochs = 3\n", - "\n", - "train_steps = num_examples['train'] // batch_size\n", - "total_train_steps = train_steps * num_epochs\n", - "test_steps = num_examples['test'] // batch_size\n", - "\n", - "loss_obj = tf.keras.losses.CategoricalCrossentropy(\n", - " from_logits=True,\n", - " label_smoothing=0.1)\n", - "\n", - "metrics = [\n", - " tf.keras.metrics.TopKCategoricalAccuracy(\n", - " k=1, name='top_1', dtype=tf.float32),\n", - " tf.keras.metrics.TopKCategoricalAccuracy(\n", - " k=5, name='top_5', dtype=tf.float32),\n", - "]\n", - "\n", - "initial_learning_rate = 0.01\n", - "learning_rate = tf.keras.optimizers.schedules.CosineDecay(\n", - " initial_learning_rate, decay_steps=total_train_steps,\n", - ")\n", - "optimizer = tf.keras.optimizers.RMSprop(\n", - " learning_rate, rho=0.9, momentum=0.9, epsilon=1.0, clipnorm=1.0)\n", - "\n", - "model.compile(loss=loss_obj, optimizer=optimizer, metrics=metrics)\n", - "\n", - "callbacks = [\n", - " tf.keras.callbacks.TensorBoard(),\n", - "]" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0IyAOOlcpHna" - }, - "source": [ - "Run the fine-tuning with Keras compile/fit. After fine-tuning the model, we should be able to achieve \u003e70% accuracy on the test set." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "executionInfo": { - "elapsed": 982253, - "status": "ok", - "timestamp": 1619750139919, - "user": { - "displayName": "", - "photoUrl": "", - "userId": "" - }, - "user_tz": 360 - }, - "id": "Zecc_K3lga8I", - "outputId": "e4c5c61e-aa08-47db-c04c-42dea3efb545" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Epoch 1/3\n", - "1192/1192 [==============================] - 348s 286ms/step - loss: 3.4914 - top_1: 0.3639 - top_5: 0.6294 - val_loss: 2.5153 - val_top_1: 0.5975 - val_top_5: 0.8565\n", - "Epoch 2/3\n", - "1192/1192 [==============================] - 286s 240ms/step - loss: 2.1397 - top_1: 0.6794 - top_5: 0.9231 - val_loss: 2.0695 - val_top_1: 0.6838 - val_top_5: 0.9070\n", - "Epoch 3/3\n", - "1192/1192 [==============================] - 348s 292ms/step - loss: 1.8925 - top_1: 0.7660 - top_5: 0.9454 - val_loss: 1.9848 - val_top_1: 0.7116 - val_top_5: 0.9227\n" - ] - } - ], - "source": [ - "results = model.fit(\n", - " train_dataset,\n", - " validation_data=test_dataset,\n", - " epochs=num_epochs,\n", - " steps_per_epoch=train_steps,\n", - " validation_steps=test_steps,\n", - " callbacks=callbacks,\n", - " validation_freq=1,\n", - " verbose=1)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "XuH8XflmpU9d" - }, - "source": [ - "We can also view the training and evaluation progress in TensorBoard." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "9fZhzhRJRd2J" - }, - "outputs": [], - "source": [ - "%reload_ext tensorboard\n", - "%tensorboard --logdir logs --port 0" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "last_runtime": { - "build_target": "//learning/deepmind/public/tools/ml_python:ml_notebook", - "kind": "private" - }, - "name": "movinet_tutorial.ipynb", - "provenance": [ - { - "file_id": "11msGCxFjxwioBOBJavP9alfTclUQCJf-", - "timestamp": 1617043059980 - } - ] - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - }, - "language_info": { - "name": "python" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} diff --git a/official/vision/beta/projects/movinet/train.py b/official/vision/beta/projects/movinet/train.py deleted file mode 100644 index 0e2c0aec6eed52014be14f342f79efa9af648f19..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/movinet/train.py +++ /dev/null @@ -1,95 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -r"""Training driver. 
- -To train: - -CONFIG_FILE=official/vision/beta/projects/movinet/configs/yaml/movinet_a0_k600_8x8.yaml -python3 official/vision/beta/projects/movinet/train.py \ - --experiment=movinet_kinetics600 \ - --mode=train \ - --model_dir=/tmp/movinet/ \ - --config_file=${CONFIG_FILE} \ - --params_override="" \ - --gin_file="" \ - --gin_params="" \ - --tpu="" \ - --tf_data_service="" -""" - -from absl import app -from absl import flags -import gin - -# pylint: disable=unused-import -from official.common import registry_imports -# pylint: enable=unused-import -from official.common import distribute_utils -from official.common import flags as tfm_flags -from official.core import task_factory -from official.core import train_lib -from official.core import train_utils -from official.modeling import performance -# Import movinet libraries to register the backbone and model into tf.vision -# model garden factory. -# pylint: disable=unused-import -# the followings are the necessary imports. -from official.vision.beta.projects.movinet.modeling import movinet -from official.vision.beta.projects.movinet.modeling import movinet_model -# pylint: enable=unused-import - -FLAGS = flags.FLAGS - - -def main(_): - gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) - params = train_utils.parse_configuration(FLAGS) - model_dir = FLAGS.model_dir - if 'train' in FLAGS.mode: - # Pure eval modes do not output yaml files. Otherwise continuous eval job - # may race against the train job for writing the same file. - train_utils.serialize_config(params, model_dir) - - if 'train_and_eval' in FLAGS.mode: - assert (params.task.train_data.feature_shape == - params.task.validation_data.feature_shape), ( - f'train {params.task.train_data.feature_shape} != validate ' - f'{params.task.validation_data.feature_shape}') - - # Sets mixed_precision policy. 
Using 'mixed_float16' or 'mixed_bfloat16' - # can have significant impact on model speeds by utilizing float16 in case of - # GPUs, and bfloat16 in the case of TPUs. loss_scale takes effect only when - # dtype is float16 - if params.runtime.mixed_precision_dtype: - performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) - distribution_strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=params.runtime.distribution_strategy, - all_reduce_alg=params.runtime.all_reduce_alg, - num_gpus=params.runtime.num_gpus, - tpu_address=params.runtime.tpu) - with distribution_strategy.scope(): - task = task_factory.get_task(params.task, logging_dir=model_dir) - - train_lib.run_experiment( - distribution_strategy=distribution_strategy, - task=task, - mode=FLAGS.mode, - params=params, - model_dir=model_dir) - -if __name__ == '__main__': - tfm_flags.define_flags() - app.run(main) diff --git a/official/vision/beta/projects/panoptic_maskrcnn/README.md b/official/vision/beta/projects/panoptic_maskrcnn/README.md deleted file mode 100644 index f1cfefdf878f6df40224b507771cbe92c7f55435..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/README.md +++ /dev/null @@ -1,97 +0,0 @@ -# Panoptic Segmentation - -## Description - -Panoptic Segmentation combines the two distinct vision tasks - semantic -segmentation and instance segmentation. These tasks are unified such that, each -pixel in the image is assigned the label of the class it belongs to, and also -the instance identifier of the object it is a part of. - -## Environment setup -The code can be run on multiple GPUs or TPUs with different distribution -strategies. See the TensorFlow distributed training -[guide](https://www.tensorflow.org/guide/distributed_training) for an overview -of `tf.distribute`. - -The code is compatible with TensorFlow 2.6+. See requirements.txt for all -prerequisites. 
- -```bash -$ git clone https://github.com/tensorflow/models.git -$ cd models -$ pip3 install -r official/requirements.txt -$ export PYTHONPATH=$(pwd) -``` - -## Preparing Dataset -```bash -$ ./official/vision/beta/data/process_coco_panoptic.sh -``` - -## Launch Training -```bash -$ export MODEL_DIR="gs://" -$ export TPU_NAME="" -$ export ANNOTATION_FILE="gs://" -$ export TRAIN_DATA="gs://" -$ export EVAL_DATA="gs://" -$ export OVERRIDES="task.validation_data.input_path=${EVAL_DATA},\ -task.train_data.input_path=${TRAIN_DATA},\ -task.annotation_file=${ANNOTATION_FILE},\ -runtime.distribution_strategy=tpu" - - -$ python3 train.py \ - --experiment panoptic_fpn_coco \ - --config_file configs/experiments/r50fpn_1x_coco.yaml \ - --mode train \ - --model_dir $MODEL_DIR \ - --tpu $TPU_NAME \ - --params_override=$OVERRIDES -``` - -## Launch Evaluation -```bash -$ export MODEL_DIR="gs://" -$ export NUM_GPUS="" -$ export PRECISION="" -$ export ANNOTATION_FILE="gs://" -$ export TRAIN_DATA="gs://" -$ export EVAL_DATA="gs://" -$ export OVERRIDES="task.validation_data.input_path=${EVAL_DATA}, \ -task.train_data.input_path=${TRAIN_DATA}, \ -task.annotation_file=${ANNOTATION_FILE}, \ -runtime.distribution_strategy=mirrored, \ -runtime.mixed_precision_dtype=$PRECISION, \ -runtime.num_gpus=$NUM_GPUS" - - -$ python3 train.py \ - --experiment panoptic_fpn_coco \ - --config_file configs/experiments/r50fpn_1x_coco.yaml \ - --mode eval \ - --model_dir $MODEL_DIR \ - --params_override=$OVERRIDES -``` -**Note**: The [PanopticSegmentationGenerator](https://github.com/tensorflow/models/blob/ac7f9e7f2d0508913947242bad3e23ef7cae5a43/official/vision/beta/projects/panoptic_maskrcnn/modeling/layers/panoptic_segmentation_generator.py#L22) layer uses dynamic shapes and hence generating panoptic masks is not supported on Cloud TPUs. Running evaluation on Cloud TPUs is not supported for the same reason. However, training is supported on both Cloud TPUs and GPUs. 
-## Pretrained Models -### Panoptic FPN -Backbone | Schedule | Experiment name | Box mAP | Mask mAP | Overall PQ | Things PQ | Stuff PQ | Checkpoints -:------------| :----------- | :---------------------------| ------- | ---------- | ---------- | --------- | -------- | ------------: -ResNet-50 | 1x | `panoptic_fpn_coco` | 38.19 | 34.25 | 39.14 | 45.42 | 29.65 | [ckpt](gs://tf_model_garden/vision/panoptic/panoptic_fpn/panoptic_fpn_1x) -ResNet-50 | 3x | `panoptic_fpn_coco` | 40.64 | 36.29 | 40.91 | 47.68 | 30.69 | [ckpt](gs://tf_model_garden/vision/panoptic/panoptic_fpn/panoptic_fpn_3x) - -**Note**: Here 1x schedule refers to ~12 epochs - -___ -## Citation -``` -@misc{kirillov2019panoptic, - title={Panoptic Feature Pyramid Networks}, - author={Alexander Kirillov and Ross Girshick and Kaiming He and Piotr Dollár}, - year={2019}, - eprint={1901.02446}, - archivePrefix={arXiv}, - primaryClass={cs.CV} -} -``` diff --git a/official/vision/beta/projects/panoptic_maskrcnn/__init__.py b/official/vision/beta/projects/panoptic_maskrcnn/__init__.py deleted file mode 100644 index 3a1fd335a2479174a1195ee001bbde1705c2c727..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/__init__.py +++ /dev/null @@ -1,27 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. diff --git a/official/vision/beta/projects/panoptic_maskrcnn/configs/__init__.py b/official/vision/beta/projects/panoptic_maskrcnn/configs/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/configs/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/projects/panoptic_maskrcnn/configs/panoptic_maskrcnn.py b/official/vision/beta/projects/panoptic_maskrcnn/configs/panoptic_maskrcnn.py deleted file mode 100644 index c17a82595fa2e6d807b388a3a74fb3b4fefa962e..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/configs/panoptic_maskrcnn.py +++ /dev/null @@ -1,255 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. 
All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Panoptic Mask R-CNN configuration definition.""" - -import dataclasses -import os -from typing import List, Optional - -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.modeling import hyperparams -from official.modeling import optimization -from official.vision.beta.configs import common -from official.vision.beta.configs import maskrcnn -from official.vision.beta.configs import semantic_segmentation - - -SEGMENTATION_MODEL = semantic_segmentation.SemanticSegmentationModel -SEGMENTATION_HEAD = semantic_segmentation.SegmentationHead - -_COCO_INPUT_PATH_BASE = 'coco/tfrecords' -_COCO_TRAIN_EXAMPLES = 118287 -_COCO_VAL_EXAMPLES = 5000 - -# pytype: disable=wrong-keyword-args - - -@dataclasses.dataclass -class Parser(maskrcnn.Parser): - """Panoptic Mask R-CNN parser config.""" - # If segmentation_resize_eval_groundtruth is set to False, original image - # sizes are used for eval. In that case, - # segmentation_groundtruth_padded_size has to be specified too to allow for - # batching the variable input sizes of images. - segmentation_resize_eval_groundtruth: bool = True - segmentation_groundtruth_padded_size: List[int] = dataclasses.field( - default_factory=list) - segmentation_ignore_label: int = 255 - panoptic_ignore_label: int = 0 - # Setting this to true will enable parsing category_mask and instance_mask. 
- include_panoptic_masks: bool = True - - -@dataclasses.dataclass -class TfExampleDecoder(common.TfExampleDecoder): - """A simple TF Example decoder config.""" - # Setting this to true will enable decoding category_mask and instance_mask. - include_panoptic_masks: bool = True - - -@dataclasses.dataclass -class DataDecoder(common.DataDecoder): - """Data decoder config.""" - simple_decoder: TfExampleDecoder = TfExampleDecoder() - - -@dataclasses.dataclass -class DataConfig(maskrcnn.DataConfig): - """Input config for training.""" - decoder: DataDecoder = DataDecoder() - parser: Parser = Parser() - - -@dataclasses.dataclass -class PanopticSegmentationGenerator(hyperparams.Config): - """Panoptic segmentation generator config.""" - output_size: List[int] = dataclasses.field( - default_factory=list) - mask_binarize_threshold: float = 0.5 - score_threshold: float = 0.5 - things_overlap_threshold: float = 0.5 - stuff_area_threshold: float = 4096.0 - things_class_label: int = 1 - void_class_label: int = 0 - void_instance_id: int = 0 - rescale_predictions: bool = False - - -@dataclasses.dataclass -class PanopticMaskRCNN(maskrcnn.MaskRCNN): - """Panoptic Mask R-CNN model config.""" - segmentation_model: semantic_segmentation.SemanticSegmentationModel = ( - SEGMENTATION_MODEL(num_classes=2)) - include_mask = True - shared_backbone: bool = True - shared_decoder: bool = True - stuff_classes_offset: int = 0 - generate_panoptic_masks: bool = True - panoptic_segmentation_generator: PanopticSegmentationGenerator = PanopticSegmentationGenerator() # pylint:disable=line-too-long - - -@dataclasses.dataclass -class Losses(maskrcnn.Losses): - """Panoptic Mask R-CNN loss config.""" - semantic_segmentation_label_smoothing: float = 0.0 - semantic_segmentation_ignore_label: int = 255 - semantic_segmentation_class_weights: List[float] = dataclasses.field( - default_factory=list) - semantic_segmentation_use_groundtruth_dimension: bool = True - semantic_segmentation_top_k_percent_pixels: float = 
1.0 - instance_segmentation_weight: float = 1.0 - semantic_segmentation_weight: float = 0.5 - - -@dataclasses.dataclass -class PanopticQualityEvaluator(hyperparams.Config): - """Panoptic Quality Evaluator config.""" - num_categories: int = 2 - ignored_label: int = 0 - max_instances_per_category: int = 256 - offset: int = 256 * 256 * 256 - is_thing: List[float] = dataclasses.field( - default_factory=list) - rescale_predictions: bool = False - report_per_class_metrics: bool = False - - -@dataclasses.dataclass -class PanopticMaskRCNNTask(maskrcnn.MaskRCNNTask): - """Panoptic Mask R-CNN task config.""" - model: PanopticMaskRCNN = PanopticMaskRCNN() - train_data: DataConfig = DataConfig(is_training=True) - validation_data: DataConfig = DataConfig(is_training=False, - drop_remainder=False) - segmentation_evaluation: semantic_segmentation.Evaluation = semantic_segmentation.Evaluation() # pylint: disable=line-too-long - losses: Losses = Losses() - init_checkpoint: Optional[str] = None - segmentation_init_checkpoint: Optional[str] = None - - # 'init_checkpoint_modules' controls the modules that need to be initialized - # from checkpoint paths given by 'init_checkpoint' and/or - # 'segmentation_init_checkpoint. 
Supports modules: - # 'backbone': Initialize MaskRCNN backbone - # 'segmentation_backbone': Initialize segmentation backbone - # 'segmentation_decoder': Initialize segmentation decoder - # 'all': Initialize all modules - init_checkpoint_modules: Optional[List[str]] = dataclasses.field( - default_factory=list) - panoptic_quality_evaluator: PanopticQualityEvaluator = PanopticQualityEvaluator() # pylint: disable=line-too-long - - -@exp_factory.register_config_factory('panoptic_fpn_coco') -def panoptic_fpn_coco() -> cfg.ExperimentConfig: - """COCO panoptic segmentation with Panoptic Mask R-CNN.""" - train_batch_size = 64 - eval_batch_size = 8 - steps_per_epoch = _COCO_TRAIN_EXAMPLES // train_batch_size - validation_steps = _COCO_VAL_EXAMPLES // eval_batch_size - - # coco panoptic dataset has category ids ranging from [0-200] inclusive. - # 0 is not used and represents the background class - # ids 1-91 represent thing categories (91) - # ids 92-200 represent stuff categories (109) - # for the segmentation task, we continue using id=0 for the background - # and map all thing categories to id=1, the remaining 109 stuff categories - # are shifted by an offset=90 given by num_thing classes - 1. 
This shifting - # will make all the stuff categories begin from id=2 and end at id=110 - num_panoptic_categories = 201 - num_thing_categories = 91 - num_semantic_segmentation_classes = 111 - - is_thing = [False] - for idx in range(1, num_panoptic_categories): - is_thing.append(True if idx <= num_thing_categories else False) - - config = cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig( - mixed_precision_dtype='float32', enable_xla=True), - task=PanopticMaskRCNNTask( - init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080', # pylint: disable=line-too-long - init_checkpoint_modules=['backbone'], - model=PanopticMaskRCNN( - num_classes=91, input_size=[1024, 1024, 3], - panoptic_segmentation_generator=PanopticSegmentationGenerator( - output_size=[640, 640], rescale_predictions=True), - stuff_classes_offset=90, - segmentation_model=SEGMENTATION_MODEL( - num_classes=num_semantic_segmentation_classes, - head=SEGMENTATION_HEAD( - level=2, - num_convs=0, - num_filters=128, - decoder_min_level=2, - decoder_max_level=6, - feature_fusion='panoptic_fpn_fusion'))), - losses=Losses(l2_weight_decay=0.00004), - train_data=DataConfig( - input_path=os.path.join(_COCO_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size, - parser=Parser( - aug_rand_hflip=True, aug_scale_min=0.8, aug_scale_max=1.25)), - validation_data=DataConfig( - input_path=os.path.join(_COCO_INPUT_PATH_BASE, 'val*'), - is_training=False, - global_batch_size=eval_batch_size, - parser=Parser( - segmentation_resize_eval_groundtruth=False, - segmentation_groundtruth_padded_size=[640, 640]), - drop_remainder=False), - annotation_file=os.path.join(_COCO_INPUT_PATH_BASE, - 'instances_val2017.json'), - segmentation_evaluation=semantic_segmentation.Evaluation( - report_per_class_iou=False, report_train_mean_iou=False), - panoptic_quality_evaluator=PanopticQualityEvaluator( - num_categories=num_panoptic_categories, - ignored_label=0, - is_thing=is_thing, - 
rescale_predictions=True)), - trainer=cfg.TrainerConfig( - train_steps=22500, - validation_steps=validation_steps, - validation_interval=steps_per_epoch, - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9 - } - }, - 'learning_rate': { - 'type': 'stepwise', - 'stepwise': { - 'boundaries': [15000, 20000], - 'values': [0.12, 0.012, 0.0012], - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 500, - 'warmup_learning_rate': 0.0067 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - return config diff --git a/official/vision/beta/projects/panoptic_maskrcnn/configs/panoptic_maskrcnn_test.py b/official/vision/beta/projects/panoptic_maskrcnn/configs/panoptic_maskrcnn_test.py deleted file mode 100644 index 6cd0ae2c2fa9f45e2c326369ca014044eddc556e..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/configs/panoptic_maskrcnn_test.py +++ /dev/null @@ -1,43 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Tests for panoptic maskrcnn config.""" -# pylint: disable=unused-import -from absl.testing import parameterized -import tensorflow as tf - -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.vision.beta.projects.panoptic_maskrcnn.configs import panoptic_maskrcnn as exp_cfg - - -class PanopticMaskRCNNConfigTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.parameters( - ('panoptic_fpn_coco',), - ) - def test_panoptic_maskrcnn_configs(self, config_name): - config = exp_factory.get_exp_config(config_name) - self.assertIsInstance(config, cfg.ExperimentConfig) - self.assertIsInstance(config.task, exp_cfg.PanopticMaskRCNNTask) - self.assertIsInstance(config.task.model, exp_cfg.PanopticMaskRCNN) - self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig) - config.validate() - config.task.train_data.is_training = None - with self.assertRaisesRegex(KeyError, 'Found inconsistncy between key'): - config.validate() - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/projects/panoptic_maskrcnn/modeling/factory.py b/official/vision/beta/projects/panoptic_maskrcnn/modeling/factory.py deleted file mode 100644 index e02227fb3e2f277260e60d054aa1ee65a7dda6a5..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/modeling/factory.py +++ /dev/null @@ -1,143 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Factory method to build panoptic segmentation model.""" - -import tensorflow as tf - -from official.vision.beta.modeling import backbones -from official.vision.beta.modeling import factory as models_factory -from official.vision.beta.modeling.decoders import factory as decoder_factory -from official.vision.beta.modeling.heads import segmentation_heads -from official.vision.beta.projects.panoptic_maskrcnn.configs import panoptic_maskrcnn as panoptic_maskrcnn_cfg -from official.vision.beta.projects.panoptic_maskrcnn.modeling import panoptic_maskrcnn_model -from official.vision.beta.projects.panoptic_maskrcnn.modeling.layers import panoptic_segmentation_generator - - -def build_panoptic_maskrcnn( - input_specs: tf.keras.layers.InputSpec, - model_config: panoptic_maskrcnn_cfg.PanopticMaskRCNN, - l2_regularizer: tf.keras.regularizers.Regularizer = None) -> tf.keras.Model: # pytype: disable=annotation-type-mismatch # typed-keras - """Builds Panoptic Mask R-CNN model. - - This factory function builds the mask rcnn first, builds the non-shared - semantic segmentation layers, and finally combines the two models to form - the panoptic segmentation model. - - Args: - input_specs: `tf.keras.layers.InputSpec` specs of the input tensor. - model_config: Config instance for the panoptic maskrcnn model. - l2_regularizer: Optional `tf.keras.regularizers.Regularizer`, if specified, - the model is built with the provided regularization layer. - Returns: - tf.keras.Model for the panoptic segmentation model. - """ - norm_activation_config = model_config.norm_activation - segmentation_config = model_config.segmentation_model - - # Builds the maskrcnn model. - maskrcnn_model = models_factory.build_maskrcnn( - input_specs=input_specs, - model_config=model_config, - l2_regularizer=l2_regularizer) - - # Builds the semantic segmentation branch. 
- if not model_config.shared_backbone: - segmentation_backbone = backbones.factory.build_backbone( - input_specs=input_specs, - backbone_config=segmentation_config.backbone, - norm_activation_config=norm_activation_config, - l2_regularizer=l2_regularizer) - segmentation_decoder_input_specs = segmentation_backbone.output_specs - else: - segmentation_backbone = None - segmentation_decoder_input_specs = maskrcnn_model.backbone.output_specs - - if not model_config.shared_decoder: - segmentation_decoder = decoder_factory.build_decoder( - input_specs=segmentation_decoder_input_specs, - model_config=segmentation_config, - l2_regularizer=l2_regularizer) - decoder_config = segmentation_decoder.get_config() - else: - segmentation_decoder = None - decoder_config = maskrcnn_model.decoder.get_config() - - segmentation_head_config = segmentation_config.head - detection_head_config = model_config.detection_head - postprocessing_config = model_config.panoptic_segmentation_generator - - segmentation_head = segmentation_heads.SegmentationHead( - num_classes=segmentation_config.num_classes, - level=segmentation_head_config.level, - num_convs=segmentation_head_config.num_convs, - prediction_kernel_size=segmentation_head_config.prediction_kernel_size, - num_filters=segmentation_head_config.num_filters, - upsample_factor=segmentation_head_config.upsample_factor, - feature_fusion=segmentation_head_config.feature_fusion, - decoder_min_level=segmentation_head_config.decoder_min_level, - decoder_max_level=segmentation_head_config.decoder_max_level, - low_level=segmentation_head_config.low_level, - low_level_num_filters=segmentation_head_config.low_level_num_filters, - activation=norm_activation_config.activation, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon, - num_decoder_filters=decoder_config['num_filters'], - kernel_regularizer=l2_regularizer) - - if 
model_config.generate_panoptic_masks: - max_num_detections = model_config.detection_generator.max_num_detections - mask_binarize_threshold = postprocessing_config.mask_binarize_threshold - panoptic_segmentation_generator_obj = panoptic_segmentation_generator.PanopticSegmentationGenerator( - output_size=postprocessing_config.output_size, - max_num_detections=max_num_detections, - stuff_classes_offset=model_config.stuff_classes_offset, - mask_binarize_threshold=mask_binarize_threshold, - score_threshold=postprocessing_config.score_threshold, - things_overlap_threshold=postprocessing_config.things_overlap_threshold, - things_class_label=postprocessing_config.things_class_label, - stuff_area_threshold=postprocessing_config.stuff_area_threshold, - void_class_label=postprocessing_config.void_class_label, - void_instance_id=postprocessing_config.void_instance_id, - rescale_predictions=postprocessing_config.rescale_predictions) - else: - panoptic_segmentation_generator_obj = None - - # Combines maskrcnn, and segmentation models to build panoptic segmentation - # model. 
- model = panoptic_maskrcnn_model.PanopticMaskRCNNModel( - backbone=maskrcnn_model.backbone, - decoder=maskrcnn_model.decoder, - rpn_head=maskrcnn_model.rpn_head, - detection_head=maskrcnn_model.detection_head, - roi_generator=maskrcnn_model.roi_generator, - roi_sampler=maskrcnn_model.roi_sampler, - roi_aligner=maskrcnn_model.roi_aligner, - detection_generator=maskrcnn_model.detection_generator, - panoptic_segmentation_generator=panoptic_segmentation_generator_obj, - mask_head=maskrcnn_model.mask_head, - mask_sampler=maskrcnn_model.mask_sampler, - mask_roi_aligner=maskrcnn_model.mask_roi_aligner, - segmentation_backbone=segmentation_backbone, - segmentation_decoder=segmentation_decoder, - segmentation_head=segmentation_head, - class_agnostic_bbox_pred=detection_head_config.class_agnostic_bbox_pred, - cascade_class_ensemble=detection_head_config.cascade_class_ensemble, - min_level=model_config.min_level, - max_level=model_config.max_level, - num_scales=model_config.anchor.num_scales, - aspect_ratios=model_config.anchor.aspect_ratios, - anchor_size=model_config.anchor.anchor_size) - return model diff --git a/official/vision/beta/projects/panoptic_maskrcnn/modeling/factory_test.py b/official/vision/beta/projects/panoptic_maskrcnn/modeling/factory_test.py deleted file mode 100644 index ba64f8083a60c80ef2c68e0a933b912917f0f157..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/modeling/factory_test.py +++ /dev/null @@ -1,65 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Tests for factory.py.""" - -from absl.testing import parameterized -import numpy as np -import tensorflow as tf -from official.vision.beta.configs import backbones -from official.vision.beta.configs import decoders -from official.vision.beta.configs import semantic_segmentation -from official.vision.beta.projects.panoptic_maskrcnn.configs import panoptic_maskrcnn as panoptic_maskrcnn_cfg -from official.vision.beta.projects.panoptic_maskrcnn.modeling import factory - - -class PanopticMaskRCNNBuilderTest(parameterized.TestCase, tf.test.TestCase): - - @parameterized.parameters( - ('resnet', (640, 640), 'dilated_resnet', 'fpn', 'panoptic_fpn_fusion'), - ('resnet', (640, 640), 'dilated_resnet', 'aspp', 'deeplabv3plus'), - ('resnet', (640, 640), None, 'fpn', 'panoptic_fpn_fusion'), - ('resnet', (640, 640), None, 'aspp', 'deeplabv3plus'), - ('resnet', (640, 640), None, None, 'panoptic_fpn_fusion'), - ('resnet', (None, None), 'dilated_resnet', 'fpn', 'panoptic_fpn_fusion'), - ('resnet', (None, None), 'dilated_resnet', 'aspp', 'deeplabv3plus'), - ('resnet', (None, None), None, 'fpn', 'panoptic_fpn_fusion'), - ('resnet', (None, None), None, 'aspp', 'deeplabv3plus'), - ('resnet', (None, None), None, None, 'deeplabv3plus')) - def test_builder(self, backbone_type, input_size, segmentation_backbone_type, - segmentation_decoder_type, fusion_type): - num_classes = 2 - input_specs = tf.keras.layers.InputSpec( - shape=[None, input_size[0], input_size[1], 3]) - segmentation_output_stride = 16 - level = int(np.math.log2(segmentation_output_stride)) - segmentation_model = 
semantic_segmentation.SemanticSegmentationModel( - num_classes=2, - backbone=backbones.Backbone(type=segmentation_backbone_type), - decoder=decoders.Decoder(type=segmentation_decoder_type), - head=semantic_segmentation.SegmentationHead(level=level)) - model_config = panoptic_maskrcnn_cfg.PanopticMaskRCNN( - num_classes=num_classes, - segmentation_model=segmentation_model, - backbone=backbones.Backbone(type=backbone_type), - shared_backbone=segmentation_backbone_type is None, - shared_decoder=segmentation_decoder_type is None) - l2_regularizer = tf.keras.regularizers.l2(5e-5) - _ = factory.build_panoptic_maskrcnn( - input_specs=input_specs, - model_config=model_config, - l2_regularizer=l2_regularizer) - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/projects/panoptic_maskrcnn/modeling/layers/panoptic_segmentation_generator.py b/official/vision/beta/projects/panoptic_maskrcnn/modeling/layers/panoptic_segmentation_generator.py deleted file mode 100644 index 498304e95c7fce545d7fc13aae297b9727147327..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/modeling/layers/panoptic_segmentation_generator.py +++ /dev/null @@ -1,321 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Contains definition for postprocessing layer to genrate panoptic segmentations.""" - -from typing import List, Optional - -import tensorflow as tf - -from official.vision.beta.projects.panoptic_maskrcnn.modeling.layers import paste_masks - - -class PanopticSegmentationGenerator(tf.keras.layers.Layer): - """Panoptic segmentation generator layer.""" - - def __init__( - self, - output_size: List[int], - max_num_detections: int, - stuff_classes_offset: int, - mask_binarize_threshold: float = 0.5, - score_threshold: float = 0.5, - things_overlap_threshold: float = 0.5, - stuff_area_threshold: float = 4096, - things_class_label: int = 1, - void_class_label: int = 0, - void_instance_id: int = -1, - rescale_predictions: bool = False, - **kwargs): - """Generates panoptic segmentation masks. - - Args: - output_size: A `List` of integers that represent the height and width of - the output mask. - max_num_detections: `int` for maximum number of detections. - stuff_classes_offset: An `int` that is added to the output of the - semantic segmentation mask to make sure that the stuff class ids do not - ovelap with the thing class ids of the MaskRCNN outputs. - mask_binarize_threshold: A `float` - score_threshold: A `float` representing the threshold for deciding - when to remove objects based on score. - things_overlap_threshold: A `float` representing a threshold for deciding - to ignore a thing if overlap is above the threshold. - stuff_area_threshold: A `float` representing a threshold for deciding to - to ignore a stuff class if area is below certain threshold. - things_class_label: An `int` that represents a single merged category of - all thing classes in the semantic segmentation output. - void_class_label: An `int` that is used to represent empty or unlabelled - regions of the mask - void_instance_id: An `int` that is used to denote regions that are not - assigned to any thing class. That is, void_instance_id are assigned to - both stuff regions and empty regions. 
- rescale_predictions: `bool`, whether to scale back prediction to original - image sizes. If True, image_info is used to rescale predictions. - **kwargs: additional kewargs arguments. - """ - self._output_size = output_size - self._max_num_detections = max_num_detections - self._stuff_classes_offset = stuff_classes_offset - self._mask_binarize_threshold = mask_binarize_threshold - self._score_threshold = score_threshold - self._things_overlap_threshold = things_overlap_threshold - self._stuff_area_threshold = stuff_area_threshold - self._things_class_label = things_class_label - self._void_class_label = void_class_label - self._void_instance_id = void_instance_id - self._rescale_predictions = rescale_predictions - - self._config_dict = { - 'output_size': output_size, - 'max_num_detections': max_num_detections, - 'stuff_classes_offset': stuff_classes_offset, - 'mask_binarize_threshold': mask_binarize_threshold, - 'score_threshold': score_threshold, - 'things_class_label': things_class_label, - 'void_class_label': void_class_label, - 'void_instance_id': void_instance_id, - 'rescale_predictions': rescale_predictions - } - super(PanopticSegmentationGenerator, self).__init__(**kwargs) - - def build(self, input_shape): - grid_sampler = paste_masks.BilinearGridSampler(align_corners=False) - self._paste_masks_fn = paste_masks.PasteMasks( - output_size=self._output_size, grid_sampler=grid_sampler) - - def _generate_panoptic_masks(self, boxes, scores, classes, detections_masks, - segmentation_mask): - """Generates panoptic masks for a single image. - - This function implements the following steps to merge instance and semantic - segmentation masks described in https://arxiv.org/pdf/1901.02446.pdf - Steps: - 1. resolving overlaps between different instances based on their - confidence scores - 2. resolving overlaps between instance and semantic segmentation - outputs in favor of instances - 3. removing any stuff regions labeled other or under a given area - threshold. 
- Args: - boxes: A `tf.Tensor` of shape [num_rois, 4], representing the bounding - boxes for detected objects. - scores: A `tf.Tensor` of shape [num_rois], representing the - confidence scores for each object. - classes: A `tf.Tensor` of shape [num_rois], representing the class - for each object. - detections_masks: A `tf.Tensor` of shape - [num_rois, mask_height, mask_width, 1], representing the cropped mask - for each object. - segmentation_mask: A `tf.Tensor` of shape [height, width], representing - the semantic segmentation output. - Returns: - Dict with the following keys: - - category_mask: A `tf.Tensor` for category masks. - - instance_mask: A `tf.Tensor for instance masks. - """ - - # Offset stuff class predictions - segmentation_mask = tf.where( - tf.logical_or( - tf.equal(segmentation_mask, self._things_class_label), - tf.equal(segmentation_mask, self._void_class_label)), - segmentation_mask, - segmentation_mask + self._stuff_classes_offset - ) - # sort instances by their scores - sorted_indices = tf.argsort(scores, direction='DESCENDING') - - mask_shape = self._output_size + [1] - category_mask = tf.ones(mask_shape, - dtype=tf.float32) * self._void_class_label - instance_mask = tf.ones( - mask_shape, dtype=tf.float32) * self._void_instance_id - - # filter instances with low confidence - sorted_scores = tf.sort(scores, direction='DESCENDING') - - valid_indices = tf.where(sorted_scores > self._score_threshold) - - # if no instance has sufficient confidence score, skip merging - # instance segmentation masks - if tf.shape(valid_indices)[0] > 0: - loop_end_idx = valid_indices[-1, 0] + 1 - loop_end_idx = tf.minimum( - tf.cast(loop_end_idx, dtype=tf.int32), - self._max_num_detections) - pasted_masks = self._paste_masks_fn(( - detections_masks[:loop_end_idx], - boxes[:loop_end_idx])) - - # add things segmentation to panoptic masks - for i in range(loop_end_idx): - # we process instances in decending order, which will make sure - # the overlaps are resolved 
based on confidence score - instance_idx = sorted_indices[i] - - pasted_mask = pasted_masks[instance_idx] - - class_id = tf.cast(classes[instance_idx], dtype=tf.float32) - - # convert sigmoid scores to binary values - binary_mask = tf.greater( - pasted_mask, self._mask_binarize_threshold) - - # filter empty instance masks - if not tf.reduce_sum(tf.cast(binary_mask, tf.float32)) > 0: - continue - - overlap = tf.logical_and( - binary_mask, - tf.not_equal(category_mask, self._void_class_label)) - binary_mask_area = tf.reduce_sum( - tf.cast(binary_mask, dtype=tf.float32)) - overlap_area = tf.reduce_sum( - tf.cast(overlap, dtype=tf.float32)) - - # skip instance that have a big enough overlap with instances with - # higer scores - if overlap_area / binary_mask_area > self._things_overlap_threshold: - continue - - # fill empty regions in category_mask represented by - # void_class_label with class_id of the instance. - category_mask = tf.where( - tf.logical_and( - binary_mask, tf.equal(category_mask, self._void_class_label)), - tf.ones_like(category_mask) * class_id, category_mask) - - # fill empty regions in the instance_mask represented by - # void_instance_id with the id of the instance, starting from 1 - instance_mask = tf.where( - tf.logical_and( - binary_mask, - tf.equal(instance_mask, self._void_instance_id)), - tf.ones_like(instance_mask) * - tf.cast(instance_idx + 1, tf.float32), instance_mask) - - stuff_class_ids = tf.unique(tf.reshape(segmentation_mask, [-1])).y - for stuff_class_id in stuff_class_ids: - if stuff_class_id == self._things_class_label: - continue - - stuff_mask = tf.logical_and( - tf.equal(segmentation_mask, stuff_class_id), - tf.equal(category_mask, self._void_class_label)) - - stuff_mask_area = tf.reduce_sum( - tf.cast(stuff_mask, dtype=tf.float32)) - - if stuff_mask_area < self._stuff_area_threshold: - continue - - category_mask = tf.where( - stuff_mask, - tf.ones_like(category_mask) * stuff_class_id, - category_mask) - - results = { - 
'category_mask': category_mask[:, :, 0], - 'instance_mask': instance_mask[:, :, 0] - } - return results - - def _resize_and_pad_masks(self, mask, image_info): - """Resizes masks to match the original image shape and pads to`output_size`. - - Args: - mask: a padded mask tensor. - image_info: a tensor that holds information about original and - preprocessed images. - Returns: - resized and padded masks: tf.Tensor. - """ - rescale_size = tf.cast( - tf.math.ceil(image_info[1, :] / image_info[2, :]), tf.int32) - image_shape = tf.cast(image_info[0, :], tf.int32) - offsets = tf.cast(image_info[3, :], tf.int32) - - mask = tf.image.resize( - mask, - rescale_size, - method='bilinear') - mask = tf.image.crop_to_bounding_box( - mask, - offsets[0], offsets[1], - image_shape[0], - image_shape[1]) - mask = tf.image.pad_to_bounding_box( - mask, 0, 0, self._output_size[0], self._output_size[1]) - return mask - - def call(self, inputs: tf.Tensor, image_info: Optional[tf.Tensor] = None): - detections = inputs - - batched_scores = detections['detection_scores'] - batched_classes = detections['detection_classes'] - batched_detections_masks = tf.expand_dims( - detections['detection_masks'], axis=-1) - batched_boxes = detections['detection_boxes'] - batched_segmentation_masks = tf.cast( - detections['segmentation_outputs'], dtype=tf.float32) - - if self._rescale_predictions: - scale = tf.tile( - tf.cast(image_info[:, 2:3, :], dtype=batched_boxes.dtype), - multiples=[1, 1, 2]) - batched_boxes /= scale - - batched_segmentation_masks = tf.map_fn( - fn=lambda x: self._resize_and_pad_masks(x[0], x[1]), - elems=( - batched_segmentation_masks, - image_info), - fn_output_signature=tf.float32, - parallel_iterations=32) - else: - batched_segmentation_masks = tf.image.resize( - batched_segmentation_masks, - size=self._output_size, - method='bilinear') - - batched_segmentation_masks = tf.expand_dims(tf.cast( - tf.argmax(batched_segmentation_masks, axis=-1), - dtype=tf.float32), axis=-1) - - 
panoptic_masks = tf.map_fn( - fn=lambda x: self._generate_panoptic_masks( # pylint:disable=g-long-lambda - x[0], x[1], x[2], x[3], x[4]), - elems=( - batched_boxes, - batched_scores, - batched_classes, - batched_detections_masks, - batched_segmentation_masks), - fn_output_signature={ - 'category_mask': tf.float32, - 'instance_mask': tf.float32 - }, parallel_iterations=32) - - for k, v in panoptic_masks.items(): - panoptic_masks[k] = tf.cast(v, dtype=tf.int32) - - return panoptic_masks - - def get_config(self): - return self._config_dict - - @classmethod - def from_config(cls, config): - return cls(**config) diff --git a/official/vision/beta/projects/panoptic_maskrcnn/modeling/layers/panoptic_segmentation_generator_test.py b/official/vision/beta/projects/panoptic_maskrcnn/modeling/layers/panoptic_segmentation_generator_test.py deleted file mode 100644 index 8005a8350e748151ca9cd9a6ab2e34449dba1524..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/modeling/layers/panoptic_segmentation_generator_test.py +++ /dev/null @@ -1,145 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Tests for panoptic_segmentation_generator.py.""" - -from absl.testing import parameterized -import numpy as np -import tensorflow as tf -from tensorflow.python.distribute import combinations -from tensorflow.python.distribute import strategy_combinations - -from official.vision.beta.projects.panoptic_maskrcnn.modeling.layers import panoptic_segmentation_generator - -PANOPTIC_SEGMENTATION_GENERATOR = panoptic_segmentation_generator.PanopticSegmentationGenerator - - -class PanopticSegmentationGeneratorTest( - parameterized.TestCase, tf.test.TestCase): - - def test_serialize_deserialize(self): - config = { - 'output_size': [640, 640], - 'max_num_detections': 100, - 'stuff_classes_offset': 90, - 'mask_binarize_threshold': 0.5, - 'score_threshold': 0.005, - 'things_class_label': 1, - 'void_class_label': 0, - 'void_instance_id': -1, - 'rescale_predictions': False, - } - generator = PANOPTIC_SEGMENTATION_GENERATOR(**config) - - expected_config = dict(config) - self.assertEqual(generator.get_config(), expected_config) - - new_generator = PANOPTIC_SEGMENTATION_GENERATOR.from_config( - generator.get_config()) - - self.assertAllEqual(generator.get_config(), new_generator.get_config()) - - @combinations.generate( - combinations.combine( - strategy=[ - strategy_combinations.default_strategy, - strategy_combinations.one_device_strategy_gpu, - ])) - def test_outputs(self, strategy): - - # 0 represents the void class label - thing_class_ids = [0, 1, 2, 3, 4] - stuff_class_ids = [0, 5, 6, 7, 8, 9, 10] - all_class_ids = set(thing_class_ids + stuff_class_ids) - - num_thing_classes = len(thing_class_ids) - num_stuff_classes = len(stuff_class_ids) - num_classes_for_segmentation = num_stuff_classes + 1 - - # all thing classes are mapped to class_id=1, stuff class ids are offset - # such that the stuff class_ids start from 2, this means the semantic - # segmentation head will have ground truths with class_ids belonging to - # [0, 1, 2, 3, 4, 5, 6, 7] - - config = { - 'output_size': 
[640, 640], - 'max_num_detections': 100, - 'stuff_classes_offset': 3, - 'mask_binarize_threshold': 0.5, - 'score_threshold': 0.005, - 'things_class_label': 1, - 'void_class_label': 0, - 'void_instance_id': -1, - 'rescale_predictions': False, - } - generator = PANOPTIC_SEGMENTATION_GENERATOR(**config) - - crop_height = 112 - crop_width = 112 - - boxes = tf.constant([[ - [167, 398, 342, 619], - [192, 171, 363, 449], - [211, 1, 382, 74] - ]]) - - num_detections = boxes.get_shape().as_list()[1] - scores = tf.random.uniform([1, num_detections], 0, 1) - classes = tf.random.uniform( - [1, num_detections], - 1, num_thing_classes, dtype=tf.int32) - masks = tf.random.normal( - [1, num_detections, crop_height, crop_width]) - - segmentation_mask = tf.random.uniform( - [1, *config['output_size']], - 0, num_classes_for_segmentation, dtype=tf.int32) - segmentation_mask_one_hot = tf.one_hot( - segmentation_mask, depth=num_stuff_classes + 1) - - inputs = { - 'detection_boxes': boxes, - 'detection_scores': scores, - 'detection_classes': classes, - 'detection_masks': masks, - 'num_detections': tf.constant([num_detections]), - 'segmentation_outputs': segmentation_mask_one_hot - } - - def _run(inputs): - return generator(inputs=inputs) - - @tf.function - def _distributed_run(inputs): - outputs = strategy.run(_run, args=((inputs,))) - return strategy.gather(outputs, axis=0) - - outputs = _distributed_run(inputs) - - self.assertIn('category_mask', outputs) - self.assertIn('instance_mask', outputs) - - self.assertAllEqual( - outputs['category_mask'][0].get_shape().as_list(), - config['output_size']) - - self.assertAllEqual( - outputs['instance_mask'][0].get_shape().as_list(), - config['output_size']) - - for category_id in np.unique(outputs['category_mask']): - self.assertIn(category_id, all_class_ids) - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/projects/panoptic_maskrcnn/modeling/panoptic_maskrcnn_model_test.py 
b/official/vision/beta/projects/panoptic_maskrcnn/modeling/panoptic_maskrcnn_model_test.py deleted file mode 100644 index 63889ba50a7bc487dfd00b5d18e690bb072c325a..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/modeling/panoptic_maskrcnn_model_test.py +++ /dev/null @@ -1,553 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Tests for panoptic_maskrcnn_model.py.""" - -import os -from absl.testing import parameterized -import tensorflow as tf - -from tensorflow.python.distribute import combinations -from tensorflow.python.distribute import strategy_combinations -from official.vision.beta.modeling.backbones import resnet -from official.vision.beta.modeling.decoders import aspp -from official.vision.beta.modeling.decoders import fpn -from official.vision.beta.modeling.heads import dense_prediction_heads -from official.vision.beta.modeling.heads import instance_heads -from official.vision.beta.modeling.heads import segmentation_heads -from official.vision.beta.modeling.layers import detection_generator -from official.vision.beta.modeling.layers import mask_sampler -from official.vision.beta.modeling.layers import roi_aligner -from official.vision.beta.modeling.layers import roi_generator -from official.vision.beta.modeling.layers import roi_sampler -from official.vision.beta.ops import anchor -from 
official.vision.beta.projects.panoptic_maskrcnn.modeling import panoptic_maskrcnn_model -from official.vision.beta.projects.panoptic_maskrcnn.modeling.layers import panoptic_segmentation_generator - - -class PanopticMaskRCNNModelTest(parameterized.TestCase, tf.test.TestCase): - - @combinations.generate( - combinations.combine( - use_separable_conv=[True, False], - build_anchor_boxes=[True, False], - shared_backbone=[True, False], - shared_decoder=[True, False], - is_training=[True,])) - def test_build_model(self, - use_separable_conv, - build_anchor_boxes, - shared_backbone, - shared_decoder, - is_training=True): - num_classes = 3 - min_level = 2 - max_level = 6 - num_scales = 3 - aspect_ratios = [1.0] - anchor_size = 3 - resnet_model_id = 50 - segmentation_resnet_model_id = 50 - aspp_dilation_rates = [6, 12, 18] - aspp_decoder_level = 2 - fpn_decoder_level = 2 - num_anchors_per_location = num_scales * len(aspect_ratios) - image_size = 128 - images = tf.random.normal([2, image_size, image_size, 3]) - image_info = tf.convert_to_tensor( - [[[image_size, image_size], [image_size, image_size], [1, 1], [0, 0]], - [[image_size, image_size], [image_size, image_size], [1, 1], [0, 0]]]) - shared_decoder = shared_decoder and shared_backbone - if build_anchor_boxes or not is_training: - anchor_boxes = anchor.Anchor( - min_level=min_level, - max_level=max_level, - num_scales=num_scales, - aspect_ratios=aspect_ratios, - anchor_size=3, - image_size=(image_size, image_size)).multilevel_boxes - for l in anchor_boxes: - anchor_boxes[l] = tf.tile( - tf.expand_dims(anchor_boxes[l], axis=0), [2, 1, 1, 1]) - else: - anchor_boxes = None - - backbone = resnet.ResNet(model_id=resnet_model_id) - decoder = fpn.FPN( - input_specs=backbone.output_specs, - min_level=min_level, - max_level=max_level, - use_separable_conv=use_separable_conv) - rpn_head = dense_prediction_heads.RPNHead( - min_level=min_level, - max_level=max_level, - num_anchors_per_location=num_anchors_per_location, - 
num_convs=1) - detection_head = instance_heads.DetectionHead(num_classes=num_classes) - roi_generator_obj = roi_generator.MultilevelROIGenerator() - roi_sampler_obj = roi_sampler.ROISampler() - roi_aligner_obj = roi_aligner.MultilevelROIAligner() - detection_generator_obj = detection_generator.DetectionGenerator() - panoptic_segmentation_generator_obj = panoptic_segmentation_generator.PanopticSegmentationGenerator( - output_size=[image_size, image_size], - max_num_detections=100, - stuff_classes_offset=90) - mask_head = instance_heads.MaskHead( - num_classes=num_classes, upsample_factor=2) - mask_sampler_obj = mask_sampler.MaskSampler( - mask_target_size=28, num_sampled_masks=1) - mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner(crop_size=14) - - if shared_backbone: - segmentation_backbone = None - else: - segmentation_backbone = resnet.ResNet( - model_id=segmentation_resnet_model_id) - if not shared_decoder: - feature_fusion = 'deeplabv3plus' - level = aspp_decoder_level - segmentation_decoder = aspp.ASPP( - level=level, dilation_rates=aspp_dilation_rates) - else: - feature_fusion = 'panoptic_fpn_fusion' - level = fpn_decoder_level - segmentation_decoder = None - segmentation_head = segmentation_heads.SegmentationHead( - num_classes=2, # stuff and common class for things, - level=level, - feature_fusion=feature_fusion, - decoder_min_level=min_level, - decoder_max_level=max_level, - num_convs=2) - - model = panoptic_maskrcnn_model.PanopticMaskRCNNModel( - backbone, - decoder, - rpn_head, - detection_head, - roi_generator_obj, - roi_sampler_obj, - roi_aligner_obj, - detection_generator_obj, - panoptic_segmentation_generator_obj, - mask_head, - mask_sampler_obj, - mask_roi_aligner_obj, - segmentation_backbone=segmentation_backbone, - segmentation_decoder=segmentation_decoder, - segmentation_head=segmentation_head, - min_level=min_level, - max_level=max_level, - num_scales=num_scales, - aspect_ratios=aspect_ratios, - anchor_size=anchor_size) - - gt_boxes = 
tf.convert_to_tensor( - [[[10, 10, 15, 15], [2.5, 2.5, 7.5, 7.5], [-1, -1, -1, -1]], - [[100, 100, 150, 150], [-1, -1, -1, -1], [-1, -1, -1, -1]]], - dtype=tf.float32) - gt_classes = tf.convert_to_tensor([[2, 1, -1], [1, -1, -1]], dtype=tf.int32) - gt_masks = tf.ones((2, 3, 100, 100)) - - # Results will be checked in test_forward. - _ = model( - images, - image_info, - anchor_boxes, - gt_boxes, - gt_classes, - gt_masks, - training=is_training) - - @combinations.generate( - combinations.combine( - strategy=[ - strategy_combinations.one_device_strategy, - strategy_combinations.one_device_strategy_gpu, - ], - shared_backbone=[True, False], - shared_decoder=[True, False], - training=[True, False], - generate_panoptic_masks=[True, False])) - def test_forward(self, strategy, training, - shared_backbone, shared_decoder, - generate_panoptic_masks): - num_classes = 3 - min_level = 2 - max_level = 6 - num_scales = 3 - aspect_ratios = [1.0] - anchor_size = 3 - segmentation_resnet_model_id = 101 - aspp_dilation_rates = [6, 12, 18] - aspp_decoder_level = 2 - fpn_decoder_level = 2 - - class_agnostic_bbox_pred = False - cascade_class_ensemble = False - - image_size = (256, 256) - images = tf.random.normal([2, image_size[0], image_size[1], 3]) - image_info = tf.convert_to_tensor( - [[[224, 100], [224, 100], [1, 1], [0, 0]], - [[224, 100], [224, 100], [1, 1], [0, 0]]]) - shared_decoder = shared_decoder and shared_backbone - with strategy.scope(): - - anchor_boxes = anchor.Anchor( - min_level=min_level, - max_level=max_level, - num_scales=num_scales, - aspect_ratios=aspect_ratios, - anchor_size=anchor_size, - image_size=image_size).multilevel_boxes - - num_anchors_per_location = len(aspect_ratios) * num_scales - - input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, 3]) - backbone = resnet.ResNet(model_id=50, input_specs=input_specs) - decoder = fpn.FPN( - min_level=min_level, - max_level=max_level, - input_specs=backbone.output_specs) - rpn_head = 
dense_prediction_heads.RPNHead( - min_level=min_level, - max_level=max_level, - num_anchors_per_location=num_anchors_per_location) - detection_head = instance_heads.DetectionHead( - num_classes=num_classes, - class_agnostic_bbox_pred=class_agnostic_bbox_pred) - roi_generator_obj = roi_generator.MultilevelROIGenerator() - - roi_sampler_cascade = [] - roi_sampler_obj = roi_sampler.ROISampler() - roi_sampler_cascade.append(roi_sampler_obj) - roi_aligner_obj = roi_aligner.MultilevelROIAligner() - detection_generator_obj = detection_generator.DetectionGenerator() - - if generate_panoptic_masks: - panoptic_segmentation_generator_obj = panoptic_segmentation_generator.PanopticSegmentationGenerator( - output_size=list(image_size), - max_num_detections=100, - stuff_classes_offset=90) - else: - panoptic_segmentation_generator_obj = None - - mask_head = instance_heads.MaskHead( - num_classes=num_classes, upsample_factor=2) - mask_sampler_obj = mask_sampler.MaskSampler( - mask_target_size=28, num_sampled_masks=1) - mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner(crop_size=14) - - if shared_backbone: - segmentation_backbone = None - else: - segmentation_backbone = resnet.ResNet( - model_id=segmentation_resnet_model_id) - if not shared_decoder: - feature_fusion = 'deeplabv3plus' - level = aspp_decoder_level - segmentation_decoder = aspp.ASPP( - level=level, dilation_rates=aspp_dilation_rates) - else: - feature_fusion = 'panoptic_fpn_fusion' - level = fpn_decoder_level - segmentation_decoder = None - segmentation_head = segmentation_heads.SegmentationHead( - num_classes=2, # stuff and common class for things, - level=level, - feature_fusion=feature_fusion, - decoder_min_level=min_level, - decoder_max_level=max_level, - num_convs=2) - - model = panoptic_maskrcnn_model.PanopticMaskRCNNModel( - backbone, - decoder, - rpn_head, - detection_head, - roi_generator_obj, - roi_sampler_obj, - roi_aligner_obj, - detection_generator_obj, - panoptic_segmentation_generator_obj, - 
mask_head, - mask_sampler_obj, - mask_roi_aligner_obj, - segmentation_backbone=segmentation_backbone, - segmentation_decoder=segmentation_decoder, - segmentation_head=segmentation_head, - class_agnostic_bbox_pred=class_agnostic_bbox_pred, - cascade_class_ensemble=cascade_class_ensemble, - min_level=min_level, - max_level=max_level, - num_scales=num_scales, - aspect_ratios=aspect_ratios, - anchor_size=anchor_size) - - gt_boxes = tf.convert_to_tensor( - [[[10, 10, 15, 15], [2.5, 2.5, 7.5, 7.5], [-1, -1, -1, -1]], - [[100, 100, 150, 150], [-1, -1, -1, -1], [-1, -1, -1, -1]]], - dtype=tf.float32) - gt_classes = tf.convert_to_tensor( - [[2, 1, -1], [1, -1, -1]], dtype=tf.int32) - gt_masks = tf.ones((2, 3, 100, 100)) - - results = model( - images, - image_info, - anchor_boxes, - gt_boxes, - gt_classes, - gt_masks, - training=training) - - self.assertIn('rpn_boxes', results) - self.assertIn('rpn_scores', results) - if training: - self.assertIn('class_targets', results) - self.assertIn('box_targets', results) - self.assertIn('class_outputs', results) - self.assertIn('box_outputs', results) - self.assertIn('mask_outputs', results) - else: - self.assertIn('detection_boxes', results) - self.assertIn('detection_scores', results) - self.assertIn('detection_classes', results) - self.assertIn('num_detections', results) - self.assertIn('detection_masks', results) - self.assertIn('segmentation_outputs', results) - - self.assertAllEqual( - [2, image_size[0] // (2**level), image_size[1] // (2**level), 2], - results['segmentation_outputs'].numpy().shape) - - if generate_panoptic_masks: - self.assertIn('panoptic_outputs', results) - self.assertIn('category_mask', results['panoptic_outputs']) - self.assertIn('instance_mask', results['panoptic_outputs']) - self.assertAllEqual( - [2, image_size[0], image_size[1]], - results['panoptic_outputs']['category_mask'].numpy().shape) - self.assertAllEqual( - [2, image_size[0], image_size[1]], - 
results['panoptic_outputs']['instance_mask'].numpy().shape) - else: - self.assertNotIn('panoptic_outputs', results) - - @combinations.generate( - combinations.combine( - shared_backbone=[True, False], shared_decoder=[True, False])) - def test_serialize_deserialize(self, shared_backbone, shared_decoder): - input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, 3]) - backbone = resnet.ResNet(model_id=50, input_specs=input_specs) - decoder = fpn.FPN( - min_level=3, max_level=7, input_specs=backbone.output_specs) - rpn_head = dense_prediction_heads.RPNHead( - min_level=3, max_level=7, num_anchors_per_location=3) - detection_head = instance_heads.DetectionHead(num_classes=2) - roi_generator_obj = roi_generator.MultilevelROIGenerator() - roi_sampler_obj = roi_sampler.ROISampler() - roi_aligner_obj = roi_aligner.MultilevelROIAligner() - detection_generator_obj = detection_generator.DetectionGenerator() - panoptic_segmentation_generator_obj = panoptic_segmentation_generator.PanopticSegmentationGenerator( - output_size=[None, None], - max_num_detections=100, - stuff_classes_offset=90) - segmentation_resnet_model_id = 101 - aspp_dilation_rates = [6, 12, 18] - min_level = 2 - max_level = 6 - aspp_decoder_level = 2 - fpn_decoder_level = 2 - shared_decoder = shared_decoder and shared_backbone - mask_head = instance_heads.MaskHead(num_classes=2, upsample_factor=2) - mask_sampler_obj = mask_sampler.MaskSampler( - mask_target_size=28, num_sampled_masks=1) - mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner(crop_size=14) - - if shared_backbone: - segmentation_backbone = None - else: - segmentation_backbone = resnet.ResNet( - model_id=segmentation_resnet_model_id) - if not shared_decoder: - feature_fusion = 'deeplabv3plus' - level = aspp_decoder_level - segmentation_decoder = aspp.ASPP( - level=level, dilation_rates=aspp_dilation_rates) - else: - feature_fusion = 'panoptic_fpn_fusion' - level = fpn_decoder_level - segmentation_decoder = None - segmentation_head = 
segmentation_heads.SegmentationHead( - num_classes=2, # stuff and common class for things, - level=level, - feature_fusion=feature_fusion, - decoder_min_level=min_level, - decoder_max_level=max_level, - num_convs=2) - - model = panoptic_maskrcnn_model.PanopticMaskRCNNModel( - backbone, - decoder, - rpn_head, - detection_head, - roi_generator_obj, - roi_sampler_obj, - roi_aligner_obj, - detection_generator_obj, - panoptic_segmentation_generator_obj, - mask_head, - mask_sampler_obj, - mask_roi_aligner_obj, - segmentation_backbone=segmentation_backbone, - segmentation_decoder=segmentation_decoder, - segmentation_head=segmentation_head, - min_level=min_level, - max_level=max_level, - num_scales=3, - aspect_ratios=[1.0], - anchor_size=3) - - config = model.get_config() - new_model = panoptic_maskrcnn_model.PanopticMaskRCNNModel.from_config( - config) - - # Validate that the config can be forced to JSON. - _ = new_model.to_json() - - # If the serialization was successful, the new config should match the old. 
- self.assertAllEqual(model.get_config(), new_model.get_config()) - - @combinations.generate( - combinations.combine( - shared_backbone=[True, False], shared_decoder=[True, False])) - def test_checkpoint(self, shared_backbone, shared_decoder): - input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, 3]) - backbone = resnet.ResNet(model_id=50, input_specs=input_specs) - decoder = fpn.FPN( - min_level=3, max_level=7, input_specs=backbone.output_specs) - rpn_head = dense_prediction_heads.RPNHead( - min_level=3, max_level=7, num_anchors_per_location=3) - detection_head = instance_heads.DetectionHead(num_classes=2) - roi_generator_obj = roi_generator.MultilevelROIGenerator() - roi_sampler_obj = roi_sampler.ROISampler() - roi_aligner_obj = roi_aligner.MultilevelROIAligner() - detection_generator_obj = detection_generator.DetectionGenerator() - panoptic_segmentation_generator_obj = panoptic_segmentation_generator.PanopticSegmentationGenerator( - output_size=[None, None], - max_num_detections=100, - stuff_classes_offset=90) - segmentation_resnet_model_id = 101 - aspp_dilation_rates = [6, 12, 18] - min_level = 2 - max_level = 6 - aspp_decoder_level = 2 - fpn_decoder_level = 2 - shared_decoder = shared_decoder and shared_backbone - mask_head = instance_heads.MaskHead(num_classes=2, upsample_factor=2) - mask_sampler_obj = mask_sampler.MaskSampler( - mask_target_size=28, num_sampled_masks=1) - mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner(crop_size=14) - - if shared_backbone: - segmentation_backbone = None - else: - segmentation_backbone = resnet.ResNet( - model_id=segmentation_resnet_model_id) - if not shared_decoder: - feature_fusion = 'deeplabv3plus' - level = aspp_decoder_level - segmentation_decoder = aspp.ASPP( - level=level, dilation_rates=aspp_dilation_rates) - else: - feature_fusion = 'panoptic_fpn_fusion' - level = fpn_decoder_level - segmentation_decoder = None - segmentation_head = segmentation_heads.SegmentationHead( - num_classes=2, # stuff 
and common class for things, - level=level, - feature_fusion=feature_fusion, - decoder_min_level=min_level, - decoder_max_level=max_level, - num_convs=2) - - model = panoptic_maskrcnn_model.PanopticMaskRCNNModel( - backbone, - decoder, - rpn_head, - detection_head, - roi_generator_obj, - roi_sampler_obj, - roi_aligner_obj, - detection_generator_obj, - panoptic_segmentation_generator_obj, - mask_head, - mask_sampler_obj, - mask_roi_aligner_obj, - segmentation_backbone=segmentation_backbone, - segmentation_decoder=segmentation_decoder, - segmentation_head=segmentation_head, - min_level=max_level, - max_level=max_level, - num_scales=3, - aspect_ratios=[1.0], - anchor_size=3) - expect_checkpoint_items = dict( - backbone=backbone, - decoder=decoder, - rpn_head=rpn_head, - detection_head=[detection_head]) - expect_checkpoint_items['mask_head'] = mask_head - if not shared_backbone: - expect_checkpoint_items['segmentation_backbone'] = segmentation_backbone - if not shared_decoder: - expect_checkpoint_items['segmentation_decoder'] = segmentation_decoder - expect_checkpoint_items['segmentation_head'] = segmentation_head - self.assertAllEqual(expect_checkpoint_items, model.checkpoint_items) - - # Test save and load checkpoints. 
- ckpt = tf.train.Checkpoint(model=model, **model.checkpoint_items) - save_dir = self.create_tempdir().full_path - ckpt.save(os.path.join(save_dir, 'ckpt')) - - partial_ckpt = tf.train.Checkpoint(backbone=backbone) - partial_ckpt.read(tf.train.latest_checkpoint( - save_dir)).expect_partial().assert_existing_objects_matched() - - partial_ckpt_mask = tf.train.Checkpoint( - backbone=backbone, mask_head=mask_head) - partial_ckpt_mask.restore(tf.train.latest_checkpoint( - save_dir)).expect_partial().assert_existing_objects_matched() - - if not shared_backbone: - partial_ckpt_segmentation = tf.train.Checkpoint( - segmentation_backbone=segmentation_backbone, - segmentation_decoder=segmentation_decoder, - segmentation_head=segmentation_head) - elif not shared_decoder: - partial_ckpt_segmentation = tf.train.Checkpoint( - segmentation_decoder=segmentation_decoder, - segmentation_head=segmentation_head) - else: - partial_ckpt_segmentation = tf.train.Checkpoint( - segmentation_head=segmentation_head) - - partial_ckpt_segmentation.restore(tf.train.latest_checkpoint( - save_dir)).expect_partial().assert_existing_objects_matched() - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/projects/panoptic_maskrcnn/serving/export_saved_model.py b/official/vision/beta/projects/panoptic_maskrcnn/serving/export_saved_model.py deleted file mode 100644 index 11d675971da8d23e47d7685e9266de784a1dd45f..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/serving/export_saved_model.py +++ /dev/null @@ -1,115 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -r"""Panoptic MaskRCNN model export binary for serving/inference. - -To export a trained checkpoint in saved_model format (shell script): - -CHECKPOINT_PATH = XX -EXPORT_DIR_PATH = XX -CONFIG_FILE_PATH = XX -export_saved_model --export_dir=${EXPORT_DIR_PATH}/ \ - --checkpoint_path=${CHECKPOINT_PATH} \ - --config_file=${CONFIG_FILE_PATH} \ - --batch_size=2 \ - --input_image_size=224,224 -To serve (python): -export_dir_path = XX -input_type = XX -input_images = XX -imported = tf.saved_model.load(export_dir_path) -model_fn = imported.signatures['serving_default'] -output = model_fn(input_images) -""" - -from absl import app -from absl import flags -import tensorflow as tf - -from official.core import exp_factory -from official.modeling import hyperparams -from official.vision.beta.projects.panoptic_maskrcnn.configs import panoptic_maskrcnn as cfg # pylint: disable=unused-import -from official.vision.beta.projects.panoptic_maskrcnn.modeling import factory -from official.vision.beta.projects.panoptic_maskrcnn.serving import panoptic_segmentation -from official.vision.beta.projects.panoptic_maskrcnn.tasks import panoptic_maskrcnn as task # pylint: disable=unused-import -from official.vision.beta.serving import export_saved_model_lib - -FLAGS = flags.FLAGS - -flags.DEFINE_string('experiment', 'panoptic_fpn_coco', - 'experiment type, e.g. 
panoptic_fpn_coco') -flags.DEFINE_string('export_dir', None, 'The export directory.') -flags.DEFINE_string('checkpoint_path', None, 'Checkpoint path.') -flags.DEFINE_multi_string( - 'config_file', - default=None, - help='YAML/JSON files which specifies overrides. The override order ' - 'follows the order of args. Note that each file ' - 'can be used as an override template to override the default parameters ' - 'specified in Python. If the same parameter is specified in both ' - '`--config_file` and `--params_override`, `config_file` will be used ' - 'first, followed by params_override.') -flags.DEFINE_string( - 'params_override', '', - 'The JSON/YAML file or string which specifies the parameter to be overriden' - ' on top of `config_file` template.') -flags.DEFINE_integer('batch_size', None, 'The batch size.') -flags.DEFINE_string('input_type', 'image_tensor', - 'One of `image_tensor`, `image_bytes`, `tf_example`.') -flags.DEFINE_string( - 'input_image_size', '224,224', - 'The comma-separated string of two integers representing the height,width ' - 'of the input to the model.') - - -def main(_): - - params = exp_factory.get_exp_config(FLAGS.experiment) - for config_file in FLAGS.config_file or []: - params = hyperparams.override_params_dict( - params, config_file, is_strict=True) - if FLAGS.params_override: - params = hyperparams.override_params_dict( - params, FLAGS.params_override, is_strict=True) - - params.validate() - params.lock() - - input_image_size = [int(x) for x in FLAGS.input_image_size.split(',')] - input_specs = tf.keras.layers.InputSpec( - shape=[FLAGS.batch_size, *input_image_size, 3]) - model = factory.build_panoptic_maskrcnn( - input_specs=input_specs, model_config=params.task.model) - - export_module = panoptic_segmentation.PanopticSegmentationModule( - params=params, - model=model, - batch_size=FLAGS.batch_size, - input_image_size=[int(x) for x in FLAGS.input_image_size.split(',')], - num_channels=3) - - 
export_saved_model_lib.export_inference_graph( - input_type=FLAGS.input_type, - batch_size=FLAGS.batch_size, - input_image_size=input_image_size, - params=params, - checkpoint_path=FLAGS.checkpoint_path, - export_dir=FLAGS.export_dir, - export_module=export_module, - export_checkpoint_subdir='checkpoint', - export_saved_model_subdir='saved_model') - - -if __name__ == '__main__': - app.run(main) diff --git a/official/vision/beta/projects/panoptic_maskrcnn/serving/panoptic_segmentation.py b/official/vision/beta/projects/panoptic_maskrcnn/serving/panoptic_segmentation.py deleted file mode 100644 index 2f001307f2f4195c5a1fb222612605a5f14ed770..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/serving/panoptic_segmentation.py +++ /dev/null @@ -1,126 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Panoptic Segmentation input and model functions for serving/inference.""" - -from typing import List - -import tensorflow as tf - -from official.core import config_definitions as cfg -from official.vision.beta.projects.panoptic_maskrcnn.modeling import panoptic_maskrcnn_model -from official.vision.beta.serving import detection - - -class PanopticSegmentationModule(detection.DetectionModule): - """Panoptic Segmentation Module.""" - - def __init__(self, - params: cfg.ExperimentConfig, - *, - model: tf.keras.Model, - batch_size: int, - input_image_size: List[int], - num_channels: int = 3): - """Initializes panoptic segmentation module for export.""" - - if batch_size is None: - raise ValueError('batch_size cannot be None for panoptic segmentation ' - 'model.') - if not isinstance(model, panoptic_maskrcnn_model.PanopticMaskRCNNModel): - raise ValueError('PanopticSegmentationModule module not implemented for ' - '{} model.'.format(type(model))) - - super(PanopticSegmentationModule, self).__init__( - params=params, - model=model, - batch_size=batch_size, - input_image_size=input_image_size, - num_channels=num_channels) - - def serve(self, images: tf.Tensor): - """Cast image to float and run inference. - - Args: - images: uint8 Tensor of shape [batch_size, None, None, 3] - Returns: - Tensor holding detection output logits. - """ - model_params = self.params.task.model - with tf.device('cpu:0'): - images = tf.cast(images, dtype=tf.float32) - - # Tensor Specs for map_fn outputs (images, anchor_boxes, and image_info). 
- images_spec = tf.TensorSpec(shape=self._input_image_size + [3], - dtype=tf.float32) - - num_anchors = model_params.anchor.num_scales * len( - model_params.anchor.aspect_ratios) * 4 - anchor_shapes = [] - for level in range(model_params.min_level, model_params.max_level + 1): - anchor_level_spec = tf.TensorSpec( - shape=[ - self._input_image_size[0] // 2**level, - self._input_image_size[1] // 2**level, num_anchors - ], - dtype=tf.float32) - anchor_shapes.append((str(level), anchor_level_spec)) - - image_info_spec = tf.TensorSpec(shape=[4, 2], dtype=tf.float32) - - images, anchor_boxes, image_info = tf.nest.map_structure( - tf.identity, - tf.map_fn( - self._build_inputs, - elems=images, - fn_output_signature=(images_spec, dict(anchor_shapes), - image_info_spec), - parallel_iterations=32)) - - # To overcome keras.Model extra limitation to save a model with layers that - # have multiple inputs, we use `model.call` here to trigger the forward - # path. Note that, this disables some keras magics happens in `__call__`. 
- detections = self.model.call( - images=images, - image_info=image_info, - anchor_boxes=anchor_boxes, - training=False) - - if model_params.detection_generator.apply_nms: - final_outputs = { - 'detection_boxes': detections['detection_boxes'], - 'detection_scores': detections['detection_scores'], - 'detection_classes': detections['detection_classes'], - 'num_detections': detections['num_detections'] - } - else: - final_outputs = { - 'decoded_boxes': detections['decoded_boxes'], - 'decoded_box_scores': detections['decoded_box_scores'] - } - - final_outputs.update({ - 'detection_masks': detections['detection_masks'], - 'segmentation_outputs': detections['segmentation_outputs'], - 'image_info': image_info - }) - if model_params.generate_panoptic_masks: - final_outputs.update({ - 'panoptic_category_mask': - detections['panoptic_outputs']['category_mask'], - 'panoptic_instance_mask': - detections['panoptic_outputs']['instance_mask'], - }) - - return final_outputs diff --git a/official/vision/beta/projects/panoptic_maskrcnn/serving/panoptic_segmentation_test.py b/official/vision/beta/projects/panoptic_maskrcnn/serving/panoptic_segmentation_test.py deleted file mode 100644 index 6f6e2748d8e6c6429b6ad1717a2d717b67cccbff..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/serving/panoptic_segmentation_test.py +++ /dev/null @@ -1,126 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Test for panoptic image segmentation export lib.""" -import io -import os - -from absl.testing import parameterized -import numpy as np -from PIL import Image -import tensorflow as tf - -from official.core import exp_factory -from official.vision.beta.projects.panoptic_maskrcnn.configs import panoptic_maskrcnn as cfg # pylint: disable=unused-import -from official.vision.beta.projects.panoptic_maskrcnn.modeling import factory -from official.vision.beta.projects.panoptic_maskrcnn.serving import panoptic_segmentation -from official.vision.beta.projects.panoptic_maskrcnn.tasks import panoptic_maskrcnn as task # pylint: disable=unused-import - - -class PanopticSegmentationExportTest(tf.test.TestCase, parameterized.TestCase): - - def _get_panoptic_segmentation_module(self, experiment_name): - params = exp_factory.get_exp_config(experiment_name) - params.task.model.backbone.resnet.model_id = 18 - params.task.model.detection_generator.nms_version = 'batched' - input_specs = tf.keras.layers.InputSpec(shape=[1, 128, 128, 3]) - model = factory.build_panoptic_maskrcnn( - input_specs=input_specs, model_config=params.task.model) - panoptic_segmentation_module = panoptic_segmentation.PanopticSegmentationModule( - params, model=model, batch_size=1, input_image_size=[128, 128]) - return panoptic_segmentation_module - - def _export_from_module(self, module, input_type, save_directory): - signatures = module.get_inference_signatures( - {input_type: 'serving_default'}) - tf.saved_model.save(module, save_directory, signatures=signatures) - - def _get_dummy_input(self, input_type, batch_size, image_size): - """Get dummy input for the given input type.""" - h, w = image_size - - if input_type == 'image_tensor': - return tf.zeros((batch_size, h, w, 3), dtype=np.uint8) - elif input_type == 'image_bytes': - image = Image.fromarray(np.zeros((h, w, 3), dtype=np.uint8)) - byte_io = 
io.BytesIO() - image.save(byte_io, 'PNG') - return [byte_io.getvalue() for b in range(batch_size)] - elif input_type == 'tf_example': - image_tensor = tf.zeros((h, w, 3), dtype=tf.uint8) - encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).numpy() - example = tf.train.Example( - features=tf.train.Features( - feature={ - 'image/encoded': - tf.train.Feature( - bytes_list=tf.train.BytesList(value=[encoded_jpeg])), - })).SerializeToString() - return [example for b in range(batch_size)] - - @parameterized.parameters( - ('image_tensor', 'panoptic_fpn_coco'), - ('image_bytes', 'panoptic_fpn_coco'), - ('tf_example', 'panoptic_fpn_coco'), - ) - def test_export(self, input_type, experiment_name): - tmp_dir = self.get_temp_dir() - module = self._get_panoptic_segmentation_module(experiment_name) - - self._export_from_module(module, input_type, tmp_dir) - - self.assertTrue(os.path.exists(os.path.join(tmp_dir, 'saved_model.pb'))) - self.assertTrue( - os.path.exists(os.path.join(tmp_dir, 'variables', 'variables.index'))) - self.assertTrue( - os.path.exists( - os.path.join(tmp_dir, 'variables', - 'variables.data-00000-of-00001'))) - - imported = tf.saved_model.load(tmp_dir) - detection_fn = imported.signatures['serving_default'] - - images = self._get_dummy_input( - input_type, batch_size=1, image_size=[128, 128]) - - processed_images, anchor_boxes, image_info = module._build_inputs( - tf.zeros((128, 128, 3), dtype=tf.uint8)) - image_info = tf.expand_dims(image_info, 0) - processed_images = tf.expand_dims(processed_images, 0) - for l, l_boxes in anchor_boxes.items(): - anchor_boxes[l] = tf.expand_dims(l_boxes, 0) - - expected_outputs = module.model( - images=processed_images, - image_info=image_info, - anchor_boxes=anchor_boxes, - training=False) - outputs = detection_fn(tf.constant(images)) - - self.assertAllClose(outputs['num_detections'].numpy(), - expected_outputs['num_detections'].numpy()) - - def test_build_model_fail_with_none_batch_size(self): - params = 
exp_factory.get_exp_config('panoptic_fpn_coco') - input_specs = tf.keras.layers.InputSpec(shape=[1, 128, 128, 3]) - model = factory.build_panoptic_maskrcnn( - input_specs=input_specs, model_config=params.task.model) - with self.assertRaisesRegex( - ValueError, - 'batch_size cannot be None for panoptic segmentation model.'): - _ = panoptic_segmentation.PanopticSegmentationModule( - params, model=model, batch_size=None, input_image_size=[128, 128]) - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/projects/panoptic_maskrcnn/tasks/__init__.py b/official/vision/beta/projects/panoptic_maskrcnn/tasks/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/tasks/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/projects/panoptic_maskrcnn/tasks/panoptic_maskrcnn.py b/official/vision/beta/projects/panoptic_maskrcnn/tasks/panoptic_maskrcnn.py deleted file mode 100644 index 5fb133cb56ca5579072cf64c7ea7a6de488c5b7d..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/tasks/panoptic_maskrcnn.py +++ /dev/null @@ -1,455 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Panoptic MaskRCNN task definition.""" -from typing import Any, List, Mapping, Optional, Tuple, Dict -from absl import logging -import tensorflow as tf - -from official.common import dataset_fn -from official.core import task_factory -from official.vision.beta.dataloaders import input_reader_factory -from official.vision.beta.evaluation import panoptic_quality_evaluator -from official.vision.beta.evaluation import segmentation_metrics -from official.vision.beta.losses import segmentation_losses -from official.vision.beta.projects.panoptic_maskrcnn.configs import panoptic_maskrcnn as exp_cfg -from official.vision.beta.projects.panoptic_maskrcnn.dataloaders import panoptic_maskrcnn_input -from official.vision.beta.projects.panoptic_maskrcnn.modeling import factory -from official.vision.beta.tasks import maskrcnn - - -@task_factory.register_task_cls(exp_cfg.PanopticMaskRCNNTask) -class PanopticMaskRCNNTask(maskrcnn.MaskRCNNTask): - - """A single-replica view of training procedure. - - Panoptic Mask R-CNN task provides artifacts for training/evalution procedures, - including loading/iterating over Datasets, initializing the model, calculating - the loss, post-processing, and customized metrics with reduction. 
- """ - - def build_model(self) -> tf.keras.Model: - """Build Panoptic Mask R-CNN model.""" - - input_specs = tf.keras.layers.InputSpec( - shape=[None] + self.task_config.model.input_size) - - l2_weight_decay = self.task_config.losses.l2_weight_decay - # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. - # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) - # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) - l2_regularizer = (tf.keras.regularizers.l2( - l2_weight_decay / 2.0) if l2_weight_decay else None) - - model = factory.build_panoptic_maskrcnn( - input_specs=input_specs, - model_config=self.task_config.model, - l2_regularizer=l2_regularizer) - return model - - def initialize(self, model: tf.keras.Model) -> None: - """Loading pretrained checkpoint.""" - - if not self.task_config.init_checkpoint_modules: - return - - def _get_checkpoint_path(checkpoint_dir_or_file): - checkpoint_path = checkpoint_dir_or_file - if tf.io.gfile.isdir(checkpoint_dir_or_file): - checkpoint_path = tf.train.latest_checkpoint( - checkpoint_dir_or_file) - return checkpoint_path - - for init_module in self.task_config.init_checkpoint_modules: - # Restoring checkpoint. 
- if init_module == 'all': - checkpoint_path = _get_checkpoint_path( - self.task_config.init_checkpoint) - ckpt = tf.train.Checkpoint(**model.checkpoint_items) - status = ckpt.read(checkpoint_path) - status.expect_partial().assert_existing_objects_matched() - - elif init_module == 'backbone': - checkpoint_path = _get_checkpoint_path( - self.task_config.init_checkpoint) - ckpt = tf.train.Checkpoint(backbone=model.backbone) - status = ckpt.read(checkpoint_path) - status.expect_partial().assert_existing_objects_matched() - - elif init_module == 'segmentation_backbone': - checkpoint_path = _get_checkpoint_path( - self.task_config.segmentation_init_checkpoint) - ckpt = tf.train.Checkpoint( - segmentation_backbone=model.segmentation_backbone) - status = ckpt.read(checkpoint_path) - status.expect_partial().assert_existing_objects_matched() - - elif init_module == 'segmentation_decoder': - checkpoint_path = _get_checkpoint_path( - self.task_config.segmentation_init_checkpoint) - ckpt = tf.train.Checkpoint( - segmentation_decoder=model.segmentation_decoder) - status = ckpt.read(checkpoint_path) - status.expect_partial().assert_existing_objects_matched() - - else: - raise ValueError( - "Only 'all', 'backbone', 'segmentation_backbone' and/or " - "segmentation_backbone' can be used to initialize the model, but " - "got {}".format(init_module)) - logging.info('Finished loading pretrained checkpoint from %s for %s', - checkpoint_path, init_module) - - def build_inputs( - self, - params: exp_cfg.DataConfig, - input_context: Optional[tf.distribute.InputContext] = None - ) -> tf.data.Dataset: - """Build input dataset.""" - decoder_cfg = params.decoder.get() - if params.decoder.type == 'simple_decoder': - decoder = panoptic_maskrcnn_input.TfExampleDecoder( - regenerate_source_id=decoder_cfg.regenerate_source_id, - mask_binarize_threshold=decoder_cfg.mask_binarize_threshold, - include_panoptic_masks=decoder_cfg.include_panoptic_masks) - else: - raise ValueError('Unknown decoder type: 
{}!'.format(params.decoder.type)) - - parser = panoptic_maskrcnn_input.Parser( - output_size=self.task_config.model.input_size[:2], - min_level=self.task_config.model.min_level, - max_level=self.task_config.model.max_level, - num_scales=self.task_config.model.anchor.num_scales, - aspect_ratios=self.task_config.model.anchor.aspect_ratios, - anchor_size=self.task_config.model.anchor.anchor_size, - dtype=params.dtype, - rpn_match_threshold=params.parser.rpn_match_threshold, - rpn_unmatched_threshold=params.parser.rpn_unmatched_threshold, - rpn_batch_size_per_im=params.parser.rpn_batch_size_per_im, - rpn_fg_fraction=params.parser.rpn_fg_fraction, - aug_rand_hflip=params.parser.aug_rand_hflip, - aug_scale_min=params.parser.aug_scale_min, - aug_scale_max=params.parser.aug_scale_max, - skip_crowd_during_training=params.parser.skip_crowd_during_training, - max_num_instances=params.parser.max_num_instances, - mask_crop_size=params.parser.mask_crop_size, - segmentation_resize_eval_groundtruth=params.parser - .segmentation_resize_eval_groundtruth, - segmentation_groundtruth_padded_size=params.parser - .segmentation_groundtruth_padded_size, - segmentation_ignore_label=params.parser.segmentation_ignore_label, - panoptic_ignore_label=params.parser.panoptic_ignore_label, - include_panoptic_masks=params.parser.include_panoptic_masks) - - reader = input_reader_factory.input_reader_generator( - params, - dataset_fn=dataset_fn.pick_dataset_fn(params.file_type), - decoder_fn=decoder.decode, - parser_fn=parser.parse_fn(params.is_training)) - dataset = reader.read(input_context=input_context) - - return dataset - - def build_losses(self, - outputs: Mapping[str, Any], - labels: Mapping[str, Any], - aux_losses: Optional[Any] = None) -> Dict[str, tf.Tensor]: - """Build Panoptic Mask R-CNN losses.""" - params = self.task_config.losses - - use_groundtruth_dimension = params.semantic_segmentation_use_groundtruth_dimension - - segmentation_loss_fn = segmentation_losses.SegmentationLoss( - 
label_smoothing=params.semantic_segmentation_label_smoothing, - class_weights=params.semantic_segmentation_class_weights, - ignore_label=params.semantic_segmentation_ignore_label, - use_groundtruth_dimension=use_groundtruth_dimension, - top_k_percent_pixels=params.semantic_segmentation_top_k_percent_pixels) - - instance_segmentation_weight = params.instance_segmentation_weight - semantic_segmentation_weight = params.semantic_segmentation_weight - - losses = super(PanopticMaskRCNNTask, self).build_losses( - outputs=outputs, - labels=labels, - aux_losses=None) - maskrcnn_loss = losses['model_loss'] - segmentation_loss = segmentation_loss_fn( - outputs['segmentation_outputs'], - labels['gt_segmentation_mask']) - - model_loss = ( - instance_segmentation_weight * maskrcnn_loss + - semantic_segmentation_weight * segmentation_loss) - - total_loss = model_loss - if aux_losses: - reg_loss = tf.reduce_sum(aux_losses) - total_loss = model_loss + reg_loss - - losses.update({ - 'total_loss': total_loss, - 'maskrcnn_loss': maskrcnn_loss, - 'segmentation_loss': segmentation_loss, - 'model_loss': model_loss, - }) - return losses - - def build_metrics(self, training: bool = True) -> List[ - tf.keras.metrics.Metric]: - """Build detection metrics.""" - metrics = [] - num_segmentation_classes = self.task_config.model.segmentation_model.num_classes - if training: - metric_names = [ - 'total_loss', - 'rpn_score_loss', - 'rpn_box_loss', - 'frcnn_cls_loss', - 'frcnn_box_loss', - 'mask_loss', - 'maskrcnn_loss', - 'segmentation_loss', - 'model_loss' - ] - for name in metric_names: - metrics.append(tf.keras.metrics.Mean(name, dtype=tf.float32)) - - if self.task_config.segmentation_evaluation.report_train_mean_iou: - self.segmentation_train_mean_iou = segmentation_metrics.MeanIoU( - name='train_mean_iou', - num_classes=num_segmentation_classes, - rescale_predictions=False, - dtype=tf.float32) - - else: - self._build_coco_metrics() - - rescale_predictions = (not 
self.task_config.validation_data.parser - .segmentation_resize_eval_groundtruth) - - self.segmentation_perclass_iou_metric = segmentation_metrics.PerClassIoU( - name='per_class_iou', - num_classes=num_segmentation_classes, - rescale_predictions=rescale_predictions, - dtype=tf.float32) - - if isinstance(tf.distribute.get_strategy(), tf.distribute.TPUStrategy): - self._process_iou_metric_on_cpu = True - else: - self._process_iou_metric_on_cpu = False - - if self.task_config.model.generate_panoptic_masks: - if not self.task_config.validation_data.parser.include_panoptic_masks: - raise ValueError('`include_panoptic_masks` should be set to True when' - ' computing panoptic quality.') - pq_config = self.task_config.panoptic_quality_evaluator - self.panoptic_quality_metric = panoptic_quality_evaluator.PanopticQualityEvaluator( - num_categories=pq_config.num_categories, - ignored_label=pq_config.ignored_label, - max_instances_per_category=pq_config.max_instances_per_category, - offset=pq_config.offset, - is_thing=pq_config.is_thing, - rescale_predictions=pq_config.rescale_predictions) - - return metrics - - def train_step(self, - inputs: Tuple[Any, Any], - model: tf.keras.Model, - optimizer: tf.keras.optimizers.Optimizer, - metrics: Optional[List[Any]] = None) -> Dict[str, Any]: - """Does forward and backward. - - Args: - inputs: a dictionary of input tensors. - model: the model, forward pass definition. - optimizer: the optimizer for this training step. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. 
- """ - images, labels = inputs - num_replicas = tf.distribute.get_strategy().num_replicas_in_sync - - with tf.GradientTape() as tape: - outputs = model( - images, - image_info=labels['image_info'], - anchor_boxes=labels['anchor_boxes'], - gt_boxes=labels['gt_boxes'], - gt_classes=labels['gt_classes'], - gt_masks=(labels['gt_masks'] if self.task_config.model.include_mask - else None), - training=True) - outputs = tf.nest.map_structure( - lambda x: tf.cast(x, tf.float32), outputs) - - # Computes per-replica loss. - losses = self.build_losses( - outputs=outputs, labels=labels, aux_losses=model.losses) - scaled_loss = losses['total_loss'] / num_replicas - - # For mixed_precision policy, when LossScaleOptimizer is used, loss is - # scaled for numerical stability. - if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - scaled_loss = optimizer.get_scaled_loss(scaled_loss) - - tvars = model.trainable_variables - grads = tape.gradient(scaled_loss, tvars) - # Scales back gradient when LossScaleOptimizer is used. - if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - grads = optimizer.get_unscaled_gradients(grads) - optimizer.apply_gradients(list(zip(grads, tvars))) - - logs = {self.loss: losses['total_loss']} - - if metrics: - for m in metrics: - m.update_state(losses[m.name]) - - if self.task_config.segmentation_evaluation.report_train_mean_iou: - segmentation_labels = { - 'masks': labels['gt_segmentation_mask'], - 'valid_masks': labels['gt_segmentation_valid_mask'], - 'image_info': labels['image_info'] - } - self.process_metrics( - metrics=[self.segmentation_train_mean_iou], - labels=segmentation_labels, - model_outputs=outputs['segmentation_outputs']) - logs.update({ - self.segmentation_train_mean_iou.name: - self.segmentation_train_mean_iou.result() - }) - - return logs - - def validation_step(self, - inputs: Tuple[Any, Any], - model: tf.keras.Model, - metrics: Optional[List[Any]] = None) -> Dict[str, Any]: - """Validatation step. 
- - Args: - inputs: a dictionary of input tensors. - model: the keras.Model. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. - """ - images, labels = inputs - - outputs = model( - images, - anchor_boxes=labels['anchor_boxes'], - image_info=labels['image_info'], - training=False) - - logs = {self.loss: 0} - coco_model_outputs = { - 'detection_masks': outputs['detection_masks'], - 'detection_boxes': outputs['detection_boxes'], - 'detection_scores': outputs['detection_scores'], - 'detection_classes': outputs['detection_classes'], - 'num_detections': outputs['num_detections'], - 'source_id': labels['groundtruths']['source_id'], - 'image_info': labels['image_info'] - } - segmentation_labels = { - 'masks': labels['groundtruths']['gt_segmentation_mask'], - 'valid_masks': labels['groundtruths']['gt_segmentation_valid_mask'], - 'image_info': labels['image_info'] - } - - logs.update( - {self.coco_metric.name: (labels['groundtruths'], coco_model_outputs)}) - if self._process_iou_metric_on_cpu: - logs.update({ - self.segmentation_perclass_iou_metric.name: - (segmentation_labels, outputs['segmentation_outputs']) - }) - else: - self.segmentation_perclass_iou_metric.update_state( - segmentation_labels, - outputs['segmentation_outputs']) - - if self.task_config.model.generate_panoptic_masks: - pq_metric_labels = { - 'category_mask': - labels['groundtruths']['gt_panoptic_category_mask'], - 'instance_mask': - labels['groundtruths']['gt_panoptic_instance_mask'], - 'image_info': labels['image_info'] - } - logs.update({ - self.panoptic_quality_metric.name: - (pq_metric_labels, outputs['panoptic_outputs'])}) - return logs - - def aggregate_logs(self, state=None, step_outputs=None): - if state is None: - self.coco_metric.reset_states() - self.segmentation_perclass_iou_metric.reset_states() - state = [self.coco_metric, self.segmentation_perclass_iou_metric] - if self.task_config.model.generate_panoptic_masks: - state += [self.panoptic_quality_metric] 
- - self.coco_metric.update_state( - step_outputs[self.coco_metric.name][0], - step_outputs[self.coco_metric.name][1]) - - if self._process_iou_metric_on_cpu: - self.segmentation_perclass_iou_metric.update_state( - step_outputs[self.segmentation_perclass_iou_metric.name][0], - step_outputs[self.segmentation_perclass_iou_metric.name][1]) - - if self.task_config.model.generate_panoptic_masks: - self.panoptic_quality_metric.update_state( - step_outputs[self.panoptic_quality_metric.name][0], - step_outputs[self.panoptic_quality_metric.name][1]) - - return state - - def reduce_aggregated_logs(self, aggregated_logs, global_step=None): - result = {} - result = super( - PanopticMaskRCNNTask, self).reduce_aggregated_logs( - aggregated_logs=aggregated_logs, - global_step=global_step) - - ious = self.segmentation_perclass_iou_metric.result() - if self.task_config.segmentation_evaluation.report_per_class_iou: - for i, value in enumerate(ious.numpy()): - result.update({'segmentation_iou/class_{}'.format(i): value}) - # Computes mean IoU - result.update({'segmentation_mean_iou': tf.reduce_mean(ious).numpy()}) - - if self.task_config.model.generate_panoptic_masks: - report_per_class_metrics = self.task_config.panoptic_quality_evaluator.report_per_class_metrics - panoptic_quality_results = self.panoptic_quality_metric.result() - for k, value in panoptic_quality_results.items(): - if k.endswith('per_class'): - if report_per_class_metrics: - for i, per_class_value in enumerate(value): - metric_key = 'panoptic_quality/{}/class_{}'.format(k, i) - result[metric_key] = per_class_value - else: - continue - else: - result['panoptic_quality/{}'.format(k)] = value - - return result diff --git a/official/vision/beta/projects/panoptic_maskrcnn/tasks/panoptic_maskrcnn_test.py b/official/vision/beta/projects/panoptic_maskrcnn/tasks/panoptic_maskrcnn_test.py deleted file mode 100644 index 031bc9247d2839ae92f5b743e3fede57bc01ee62..0000000000000000000000000000000000000000 --- 
a/official/vision/beta/projects/panoptic_maskrcnn/tasks/panoptic_maskrcnn_test.py +++ /dev/null @@ -1,69 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Tests for panoptic_maskrcnn.py.""" -import os - -from absl.testing import parameterized -import tensorflow as tf - -from official.vision.beta.configs import decoders as decoder_cfg -from official.vision.beta.configs import semantic_segmentation as segmentation_cfg -from official.vision.beta.projects.panoptic_maskrcnn.configs import panoptic_maskrcnn as cfg -from official.vision.beta.projects.panoptic_maskrcnn.tasks import panoptic_maskrcnn - - -class PanopticMaskRCNNTaskTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.parameters( - (['all'],), - (['backbone'],), - (['segmentation_backbone'],), - (['segmentation_decoder'],), - (['backbone', 'segmentation_backbone'],), - (['segmentation_backbone', 'segmentation_decoder'],)) - def test_model_initializing(self, init_checkpoint_modules): - - shared_backbone = ('segmentation_backbone' not in init_checkpoint_modules) - shared_decoder = ('segmentation_decoder' not in init_checkpoint_modules and - shared_backbone) - - task_config = cfg.PanopticMaskRCNNTask( - model=cfg.PanopticMaskRCNN( - num_classes=2, - input_size=[640, 640, 3], - segmentation_model=segmentation_cfg.SemanticSegmentationModel( - decoder=decoder_cfg.Decoder(type='fpn')), - shared_backbone=shared_backbone, - 
shared_decoder=shared_decoder)) - - task = panoptic_maskrcnn.PanopticMaskRCNNTask(task_config) - model = task.build_model() - - ckpt = tf.train.Checkpoint(**model.checkpoint_items) - ckpt_save_dir = self.create_tempdir().full_path - ckpt.save(os.path.join(ckpt_save_dir, 'ckpt')) - - if (init_checkpoint_modules == ['all'] or - 'backbone' in init_checkpoint_modules): - task._task_config.init_checkpoint = ckpt_save_dir - if ('segmentation_backbone' in init_checkpoint_modules or - 'segmentation_decoder' in init_checkpoint_modules): - task._task_config.segmentation_init_checkpoint = ckpt_save_dir - - task._task_config.init_checkpoint_modules = init_checkpoint_modules - task.initialize(model) - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/projects/panoptic_maskrcnn/train.py b/official/vision/beta/projects/panoptic_maskrcnn/train.py deleted file mode 100644 index 4270f915fc66f2b7072470728b61d277d78de606..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/panoptic_maskrcnn/train.py +++ /dev/null @@ -1,27 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Panoptic MaskRCNN trainer.""" - -from absl import app - -from official.common import flags as tfm_flags -from official.vision.beta import train -from official.vision.beta.projects.panoptic_maskrcnn.configs import panoptic_maskrcnn as cfg # pylint: disable=unused-import -from official.vision.beta.projects.panoptic_maskrcnn.tasks import panoptic_maskrcnn as task # pylint: disable=unused-import - - -if __name__ == '__main__': - tfm_flags.define_flags() - app.run(train.main) diff --git a/official/vision/beta/projects/simclr/README.md b/official/vision/beta/projects/simclr/README.md deleted file mode 100644 index 91b4375bd60ff89b8677acad89390592f658f6c0..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/simclr/README.md +++ /dev/null @@ -1,78 +0,0 @@ -# Simple Framework for Contrastive Learning - -[![Paper](http://img.shields.io/badge/Paper-arXiv.2002.05709-B3181B?logo=arXiv)](https://arxiv.org/abs/2002.05709) -[![Paper](http://img.shields.io/badge/Paper-arXiv.2006.10029-B3181B?logo=arXiv)](https://arxiv.org/abs/2006.10029) - -
- SimCLR Illustration -
-
- An illustration of SimCLR (from our blog here). -
- - ## Environment setup - - The code can be run on multiple GPUs or TPUs with different distribution - strategies. See the TensorFlow distributed training - [guide](https://www.tensorflow.org/guide/distributed_training) for an overview - of `tf.distribute`. - - The code is compatible with TensorFlow 2.4+. See requirements.txt for all - prerequisites, which you can install with the following command: `pip - install -r ./official/requirements.txt` - - ## Pretraining - To pretrain the model on ImageNet, run the following command: - - ``` - python3 -m official.vision.beta.projects.simclr.train \ - --mode=train_and_eval \ - --experiment=simclr_pretraining \ - --model_dir={MODEL_DIR} \ - --config_file={CONFIG_FILE} - ``` - - An example of the config file can be found [here](./configs/experiments/imagenet_simclr_pretrain_gpu.yaml). - - - ## Semi-supervised learning and fine-tuning the whole network - - You can access the 1% and 10% ImageNet subsets used for semi-supervised learning via - [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/imagenet2012_subset). - You can also find the image IDs of these subsets in `imagenet_subsets/`. - - To fine-tune the whole network, run the following command: - - ``` - python3 -m official.vision.beta.projects.simclr.train \ - --mode=train_and_eval \ - --experiment=simclr_finetuning \ - --model_dir={MODEL_DIR} \ - --config_file={CONFIG_FILE} - ``` - - An example of the config file can be found [here](./configs/experiments/imagenet_simclr_finetune_gpu.yaml).
- -## Cite - -[SimCLR paper](https://arxiv.org/abs/2002.05709): - -``` -@article{chen2020simple, - title={A Simple Framework for Contrastive Learning of Visual Representations}, - author={Chen, Ting and Kornblith, Simon and Norouzi, Mohammad and Hinton, Geoffrey}, - journal={arXiv preprint arXiv:2002.05709}, - year={2020} -} -``` - -[SimCLRv2 paper](https://arxiv.org/abs/2006.10029): - -``` -@article{chen2020big, - title={Big Self-Supervised Models are Strong Semi-Supervised Learners}, - author={Chen, Ting and Kornblith, Simon and Swersky, Kevin and Norouzi, Mohammad and Hinton, Geoffrey}, - journal={arXiv preprint arXiv:2006.10029}, - year={2020} -} -``` diff --git a/official/vision/beta/projects/simclr/common/registry_imports.py b/official/vision/beta/projects/simclr/common/registry_imports.py deleted file mode 100644 index d605e21bb3564a0674f306d1e367bbcaf0ae66bd..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/simclr/common/registry_imports.py +++ /dev/null @@ -1,22 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""All necessary imports for registration.""" - -# pylint: disable=unused-import -from official.common import registry_imports -from official.vision.beta.projects.simclr.configs import simclr -from official.vision.beta.projects.simclr.losses import contrastive_losses -from official.vision.beta.projects.simclr.modeling import simclr_model -from official.vision.beta.projects.simclr.tasks import simclr as simclr_task diff --git a/official/vision/beta/projects/simclr/configs/simclr.py b/official/vision/beta/projects/simclr/configs/simclr.py deleted file mode 100644 index 2d03839c61477083d695e690f39056c7d04b8405..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/simclr/configs/simclr.py +++ /dev/null @@ -1,318 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""SimCLR configurations.""" -import dataclasses -import os -from typing import List, Optional - -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.modeling import hyperparams -from official.modeling import optimization -from official.vision.beta.configs import backbones -from official.vision.beta.configs import common -from official.vision.beta.projects.simclr.modeling import simclr_model - - -@dataclasses.dataclass -class Decoder(hyperparams.Config): - decode_label: bool = True - - -@dataclasses.dataclass -class Parser(hyperparams.Config): - """Parser config.""" - aug_rand_crop: bool = True - aug_rand_hflip: bool = True - aug_color_distort: bool = True - aug_color_jitter_strength: float = 1.0 - aug_color_jitter_impl: str = 'simclrv2' # 'simclrv1' or 'simclrv2' - aug_rand_blur: bool = True - parse_label: bool = True - test_crop: bool = True - mode: str = simclr_model.PRETRAIN - - -@dataclasses.dataclass -class DataConfig(cfg.DataConfig): - """Training data config.""" - input_path: str = '' - global_batch_size: int = 0 - is_training: bool = True - dtype: str = 'float32' - shuffle_buffer_size: int = 10000 - cycle_length: int = 10 - # simclr specific configs - parser: Parser = Parser() - decoder: Decoder = Decoder() - # Useful when doing a sanity check that we absolutely use no labels while - # pretrain by setting labels to zeros (default = False, keep original labels) - input_set_label_to_zero: bool = False - - -@dataclasses.dataclass -class ProjectionHead(hyperparams.Config): - proj_output_dim: int = 128 - num_proj_layers: int = 3 - ft_proj_idx: int = 1 # layer of the projection head to use for fine-tuning. 
- - -@dataclasses.dataclass -class SupervisedHead(hyperparams.Config): - num_classes: int = 1001 - zero_init: bool = False - - -@dataclasses.dataclass -class ContrastiveLoss(hyperparams.Config): - projection_norm: bool = True - temperature: float = 0.1 - l2_weight_decay: float = 0.0 - - -@dataclasses.dataclass -class ClassificationLosses(hyperparams.Config): - label_smoothing: float = 0.0 - one_hot: bool = True - l2_weight_decay: float = 0.0 - - -@dataclasses.dataclass -class Evaluation(hyperparams.Config): - top_k: int = 5 - one_hot: bool = True - - -@dataclasses.dataclass -class SimCLRModel(hyperparams.Config): - """SimCLR model config.""" - input_size: List[int] = dataclasses.field(default_factory=list) - backbone: backbones.Backbone = backbones.Backbone( - type='resnet', resnet=backbones.ResNet()) - projection_head: ProjectionHead = ProjectionHead( - proj_output_dim=128, num_proj_layers=3, ft_proj_idx=1) - supervised_head: SupervisedHead = SupervisedHead(num_classes=1001) - norm_activation: common.NormActivation = common.NormActivation( - norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False) - mode: str = simclr_model.PRETRAIN - backbone_trainable: bool = True - - -@dataclasses.dataclass -class SimCLRPretrainTask(cfg.TaskConfig): - """SimCLR pretraining task config.""" - model: SimCLRModel = SimCLRModel(mode=simclr_model.PRETRAIN) - train_data: DataConfig = DataConfig( - parser=Parser(mode=simclr_model.PRETRAIN), is_training=True) - validation_data: DataConfig = DataConfig( - parser=Parser(mode=simclr_model.PRETRAIN), is_training=False) - loss: ContrastiveLoss = ContrastiveLoss() - evaluation: Evaluation = Evaluation() - init_checkpoint: Optional[str] = None - # all or backbone - init_checkpoint_modules: str = 'all' - - -@dataclasses.dataclass -class SimCLRFinetuneTask(cfg.TaskConfig): - """SimCLR fine tune task config.""" - model: SimCLRModel = SimCLRModel( - mode=simclr_model.FINETUNE, - supervised_head=SupervisedHead(num_classes=1001, zero_init=True)) - 
train_data: DataConfig = DataConfig( - parser=Parser(mode=simclr_model.FINETUNE), is_training=True) - validation_data: DataConfig = DataConfig( - parser=Parser(mode=simclr_model.FINETUNE), is_training=False) - loss: ClassificationLosses = ClassificationLosses() - evaluation: Evaluation = Evaluation() - init_checkpoint: Optional[str] = None - # all, backbone_projection or backbone - init_checkpoint_modules: str = 'backbone_projection' - - -@exp_factory.register_config_factory('simclr_pretraining') -def simclr_pretraining() -> cfg.ExperimentConfig: - """Image classification general.""" - return cfg.ExperimentConfig( - task=SimCLRPretrainTask(), - trainer=cfg.TrainerConfig(), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - -@exp_factory.register_config_factory('simclr_finetuning') -def simclr_finetuning() -> cfg.ExperimentConfig: - """Image classification general.""" - return cfg.ExperimentConfig( - task=SimCLRFinetuneTask(), - trainer=cfg.TrainerConfig(), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - -IMAGENET_TRAIN_EXAMPLES = 1281167 -IMAGENET_VAL_EXAMPLES = 50000 -IMAGENET_INPUT_PATH_BASE = 'imagenet-2012-tfrecord' - - -@exp_factory.register_config_factory('simclr_pretraining_imagenet') -def simclr_pretraining_imagenet() -> cfg.ExperimentConfig: - """Image classification general.""" - train_batch_size = 4096 - eval_batch_size = 4096 - steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size - return cfg.ExperimentConfig( - task=SimCLRPretrainTask( - model=SimCLRModel( - mode=simclr_model.PRETRAIN, - backbone_trainable=True, - input_size=[224, 224, 3], - backbone=backbones.Backbone( - type='resnet', resnet=backbones.ResNet(model_id=50)), - projection_head=ProjectionHead( - proj_output_dim=128, num_proj_layers=3, ft_proj_idx=1), - supervised_head=SupervisedHead(num_classes=1001), - norm_activation=common.NormActivation( - 
norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=True)), - loss=ContrastiveLoss(), - evaluation=Evaluation(), - train_data=DataConfig( - parser=Parser(mode=simclr_model.PRETRAIN), - decoder=Decoder(decode_label=True), - input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size), - validation_data=DataConfig( - parser=Parser(mode=simclr_model.PRETRAIN), - decoder=Decoder(decode_label=True), - input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'), - is_training=False, - global_batch_size=eval_batch_size), - ), - trainer=cfg.TrainerConfig( - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - train_steps=500 * steps_per_epoch, - validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'lars', - 'lars': { - 'momentum': - 0.9, - 'weight_decay_rate': - 0.000001, - 'exclude_from_weight_decay': [ - 'batch_normalization', 'bias' - ] - } - }, - 'learning_rate': { - 'type': 'cosine', - 'cosine': { - # 0.2 * BatchSize / 256 - 'initial_learning_rate': 0.2 * train_batch_size / 256, - # train_steps - warmup_steps - 'decay_steps': 475 * steps_per_epoch - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - # 5% of total epochs - 'warmup_steps': 25 * steps_per_epoch - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - -@exp_factory.register_config_factory('simclr_finetuning_imagenet') -def simclr_finetuning_imagenet() -> cfg.ExperimentConfig: - """Image classification general.""" - train_batch_size = 1024 - eval_batch_size = 1024 - steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size - pretrain_model_base = '' - return cfg.ExperimentConfig( - task=SimCLRFinetuneTask( - model=SimCLRModel( - mode=simclr_model.FINETUNE, - backbone_trainable=True, - 
input_size=[224, 224, 3], - backbone=backbones.Backbone( - type='resnet', resnet=backbones.ResNet(model_id=50)), - projection_head=ProjectionHead( - proj_output_dim=128, num_proj_layers=3, ft_proj_idx=1), - supervised_head=SupervisedHead(num_classes=1001, zero_init=True), - norm_activation=common.NormActivation( - norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False)), - loss=ClassificationLosses(), - evaluation=Evaluation(), - train_data=DataConfig( - parser=Parser(mode=simclr_model.FINETUNE), - input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size), - validation_data=DataConfig( - parser=Parser(mode=simclr_model.FINETUNE), - input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'), - is_training=False, - global_batch_size=eval_batch_size), - init_checkpoint=pretrain_model_base, - # all, backbone_projection or backbone - init_checkpoint_modules='backbone_projection'), - trainer=cfg.TrainerConfig( - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - train_steps=60 * steps_per_epoch, - validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'lars', - 'lars': { - 'momentum': - 0.9, - 'weight_decay_rate': - 0.0, - 'exclude_from_weight_decay': [ - 'batch_normalization', 'bias' - ] - } - }, - 'learning_rate': { - 'type': 'cosine', - 'cosine': { - # 0.01 × BatchSize / 512 - 'initial_learning_rate': 0.01 * train_batch_size / 512, - 'decay_steps': 60 * steps_per_epoch - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) diff --git a/official/vision/beta/projects/simclr/dataloaders/preprocess_ops.py b/official/vision/beta/projects/simclr/dataloaders/preprocess_ops.py deleted file mode 100644 index 
93a2b1f35c6cf43de9cbb653979734c774eb9554..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/simclr/dataloaders/preprocess_ops.py +++ /dev/null @@ -1,349 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Preprocessing ops.""" -import functools -import tensorflow as tf - -CROP_PROPORTION = 0.875 # Standard for ImageNet. - - -def random_apply(func, p, x): - """Randomly apply function func to x with probability p.""" - return tf.cond( - tf.less( - tf.random.uniform([], minval=0, maxval=1, dtype=tf.float32), - tf.cast(p, tf.float32)), lambda: func(x), lambda: x) - - -def random_brightness(image, max_delta, impl='simclrv2'): - """A multiplicative vs additive change of brightness.""" - if impl == 'simclrv2': - factor = tf.random.uniform([], tf.maximum(1.0 - max_delta, 0), - 1.0 + max_delta) - image = image * factor - elif impl == 'simclrv1': - image = tf.image.random_brightness(image, max_delta=max_delta) - else: - raise ValueError('Unknown impl {} for random brightness.'.format(impl)) - return image - - -def to_grayscale(image, keep_channels=True): - image = tf.image.rgb_to_grayscale(image) - if keep_channels: - image = tf.tile(image, [1, 1, 3]) - return image - - -def color_jitter_nonrand(image, - brightness=0, - contrast=0, - saturation=0, - hue=0, - impl='simclrv2'): - """Distorts the color of the image (jittering order is fixed). 
- - Args: - image: The input image tensor. - brightness: A float, specifying the brightness for color jitter. - contrast: A float, specifying the contrast for color jitter. - saturation: A float, specifying the saturation for color jitter. - hue: A float, specifying the hue for color jitter. - impl: 'simclrv1' or 'simclrv2'. Whether to use simclrv1 or simclrv2's - version of random brightness. - - Returns: - The distorted image tensor. - """ - with tf.name_scope('distort_color'): - def apply_transform(i, x, brightness, contrast, saturation, hue): - """Apply the i-th transformation.""" - if brightness != 0 and i == 0: - x = random_brightness(x, max_delta=brightness, impl=impl) - elif contrast != 0 and i == 1: - x = tf.image.random_contrast( - x, lower=1 - contrast, upper=1 + contrast) - elif saturation != 0 and i == 2: - x = tf.image.random_saturation( - x, lower=1 - saturation, upper=1 + saturation) - elif hue != 0: - x = tf.image.random_hue(x, max_delta=hue) - return x - - for i in range(4): - image = apply_transform(i, image, brightness, contrast, saturation, hue) - image = tf.clip_by_value(image, 0., 1.) - return image - - -def color_jitter_rand(image, - brightness=0, - contrast=0, - saturation=0, - hue=0, - impl='simclrv2'): - """Distorts the color of the image (jittering order is random). - - Args: - image: The input image tensor. - brightness: A float, specifying the brightness for color jitter. - contrast: A float, specifying the contrast for color jitter. - saturation: A float, specifying the saturation for color jitter. - hue: A float, specifying the hue for color jitter. - impl: 'simclrv1' or 'simclrv2'. Whether to use simclrv1 or simclrv2's - version of random brightness. - - Returns: - The distorted image tensor. 
- """ - with tf.name_scope('distort_color'): - def apply_transform(i, x): - """Apply the i-th transformation.""" - - def brightness_foo(): - if brightness == 0: - return x - else: - return random_brightness(x, max_delta=brightness, impl=impl) - - def contrast_foo(): - if contrast == 0: - return x - else: - return tf.image.random_contrast(x, lower=1 - contrast, - upper=1 + contrast) - - def saturation_foo(): - if saturation == 0: - return x - else: - return tf.image.random_saturation( - x, lower=1 - saturation, upper=1 + saturation) - - def hue_foo(): - if hue == 0: - return x - else: - return tf.image.random_hue(x, max_delta=hue) - - x = tf.cond(tf.less(i, 2), - lambda: tf.cond(tf.less(i, 1), brightness_foo, contrast_foo), - lambda: tf.cond(tf.less(i, 3), saturation_foo, hue_foo)) - return x - - perm = tf.random.shuffle(tf.range(4)) - for i in range(4): - image = apply_transform(perm[i], image) - image = tf.clip_by_value(image, 0., 1.) - return image - - -def color_jitter(image, strength, random_order=True, impl='simclrv2'): - """Distorts the color of the image. - - Args: - image: The input image tensor. - strength: the floating number for the strength of the color augmentation. - random_order: A bool, specifying whether to randomize the jittering order. - impl: 'simclrv1' or 'simclrv2'. Whether to use simclrv1 or simclrv2's - version of random brightness. - - Returns: - The distorted image tensor. 
- """ - brightness = 0.8 * strength - contrast = 0.8 * strength - saturation = 0.8 * strength - hue = 0.2 * strength - if random_order: - return color_jitter_rand( - image, brightness, contrast, saturation, hue, impl=impl) - else: - return color_jitter_nonrand( - image, brightness, contrast, saturation, hue, impl=impl) - - - def random_color_jitter(image, - p=1.0, - color_jitter_strength=1.0, - impl='simclrv2'): - """Perform random color jitter.""" - def _transform(image): - color_jitter_t = functools.partial( - color_jitter, strength=color_jitter_strength, impl=impl) - image = random_apply(color_jitter_t, p=0.8, x=image) - return random_apply(to_grayscale, p=0.2, x=image) - - return random_apply(_transform, p=p, x=image) - - - def gaussian_blur(image, kernel_size, sigma, padding='SAME'): - """Blurs the given image with separable convolution. - - - Args: - image: Tensor of shape [height, width, channels] and dtype float to blur. - kernel_size: Integer Tensor for the size of the blur kernel. This should - be an odd number. If it is an even number, the actual kernel size will be - size + 1. - sigma: Sigma value for the Gaussian operator. - padding: Padding to use for the convolution. Typically 'SAME' or 'VALID'. - - Returns: - A Tensor representing the blurred image. - """ - radius = tf.cast(kernel_size / 2, dtype=tf.int32) - kernel_size = radius * 2 + 1 - x = tf.cast(tf.range(-radius, radius + 1), dtype=tf.float32) - blur_filter = tf.exp(-tf.pow(x, 2.0) / - (2.0 * tf.pow(tf.cast(sigma, dtype=tf.float32), 2.0))) - blur_filter /= tf.reduce_sum(blur_filter) - # One vertical and one horizontal filter.
- blur_v = tf.reshape(blur_filter, [kernel_size, 1, 1, 1]) - blur_h = tf.reshape(blur_filter, [1, kernel_size, 1, 1]) - num_channels = tf.shape(image)[-1] - blur_h = tf.tile(blur_h, [1, 1, num_channels, 1]) - blur_v = tf.tile(blur_v, [1, 1, num_channels, 1]) - expand_batch_dim = image.shape.ndims == 3 - if expand_batch_dim: - # Tensorflow requires batched input to convolutions, which we can fake with - # an extra dimension. - image = tf.expand_dims(image, axis=0) - blurred = tf.nn.depthwise_conv2d( - image, blur_h, strides=[1, 1, 1, 1], padding=padding) - blurred = tf.nn.depthwise_conv2d( - blurred, blur_v, strides=[1, 1, 1, 1], padding=padding) - if expand_batch_dim: - blurred = tf.squeeze(blurred, axis=0) - return blurred - - -def random_blur(image, height, width, p=0.5): - """Randomly blur an image. - - Args: - image: `Tensor` representing an image of arbitrary size. - height: Height of output image. - width: Width of output image. - p: probability of applying this transformation. - - Returns: - A preprocessed image `Tensor`. - """ - del width - - def _transform(image): - sigma = tf.random.uniform([], 0.1, 2.0, dtype=tf.float32) - return gaussian_blur( - image, kernel_size=height // 10, sigma=sigma, padding='SAME') - - return random_apply(_transform, p=p, x=image) - - -def distorted_bounding_box_crop(image, - bbox, - min_object_covered=0.1, - aspect_ratio_range=(0.75, 1.33), - area_range=(0.05, 1.0), - max_attempts=100, - scope=None): - """Generates cropped_image using one of the bboxes randomly distorted. - - See `tf.image.sample_distorted_bounding_box` for more documentation. - - Args: - image: `Tensor` of image data. - bbox: `Tensor` of bounding boxes arranged `[1, num_boxes, coords]` - where each coordinate is [0, 1) and the coordinates are arranged - as `[ymin, xmin, ymax, xmax]`. If num_boxes is 0 then use the whole - image. - min_object_covered: An optional `float`. Defaults to `0.1`. 
The cropped - area of the image must contain at least this fraction of any bounding - box supplied. - aspect_ratio_range: An optional list of `float`s. The cropped area of the - image must have an aspect ratio = width / height within this range. - area_range: An optional list of `float`s. The cropped area of the image - must contain a fraction of the supplied image within this range. - max_attempts: An optional `int`. Number of attempts at generating a cropped - region of the image with the specified constraints. After `max_attempts` - failures, return the entire image. - scope: Optional `str` for name scope. - Returns: - The cropped image `Tensor`. - """ - with tf.name_scope(scope or 'distorted_bounding_box_crop'): - shape = tf.shape(image) - sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( - shape, - bounding_boxes=bbox, - min_object_covered=min_object_covered, - aspect_ratio_range=aspect_ratio_range, - area_range=area_range, - max_attempts=max_attempts, - use_image_if_no_bounding_boxes=True) - bbox_begin, bbox_size, _ = sample_distorted_bounding_box - - # Crop the image to the specified bounding box. - offset_y, offset_x, _ = tf.unstack(bbox_begin) - target_height, target_width, _ = tf.unstack(bbox_size) - image = tf.image.crop_to_bounding_box( - image, offset_y, offset_x, target_height, target_width) - - return image - - - def crop_and_resize(image, height, width): - """Make a random crop and resize it to height `height` and width `width`. - - Args: - image: Tensor representing the image. - height: Desired image height. - width: Desired image width. - - Returns: - A `height` x `width` x channels Tensor holding a random crop of `image`. - """ - bbox = tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4]) - aspect_ratio = width / height - image = distorted_bounding_box_crop( - image, - bbox, - min_object_covered=0.1, - aspect_ratio_range=(3. / 4 * aspect_ratio, 4. / 3.
* aspect_ratio), - area_range=(0.08, 1.0), - max_attempts=100, - scope=None) - return tf.image.resize([image], [height, width], - method=tf.image.ResizeMethod.BICUBIC)[0] - - -def random_crop_with_resize(image, height, width, p=1.0): - """Randomly crop and resize an image. - - Args: - image: `Tensor` representing an image of arbitrary size. - height: Height of output image. - width: Width of output image. - p: Probability of applying this transformation. - - Returns: - A preprocessed image `Tensor`. - """ - - def _transform(image): # pylint: disable=missing-docstring - image = crop_and_resize(image, height, width) - return image - - return random_apply(_transform, p=p, x=image) diff --git a/official/vision/beta/projects/simclr/modeling/layers/nn_blocks.py b/official/vision/beta/projects/simclr/modeling/layers/nn_blocks.py deleted file mode 100644 index 5264eb8b1c3b7515bc7dbdfc00c7e467f2004d23..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/simclr/modeling/layers/nn_blocks.py +++ /dev/null @@ -1,133 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Contains common building blocks for simclr neural networks.""" -from typing import Text, Optional - -import tensorflow as tf - -from official.modeling import tf_utils - -regularizers = tf.keras.regularizers - - -class DenseBN(tf.keras.layers.Layer): - """Modified Dense layer to help build simclr system. 
- - The layer is a standards combination of Dense, BatchNorm and Activation. - """ - - def __init__( - self, - output_dim: int, - use_bias: bool = True, - use_normalization: bool = False, - use_sync_bn: bool = False, - norm_momentum: float = 0.99, - norm_epsilon: float = 0.001, - activation: Optional[Text] = 'relu', - kernel_initializer: Text = 'VarianceScaling', - kernel_regularizer: Optional[regularizers.Regularizer] = None, - bias_regularizer: Optional[regularizers.Regularizer] = None, - name='linear_layer', - **kwargs): - """Customized Dense layer. - - Args: - output_dim: `int` size of output dimension. - use_bias: if True, use biase in the dense layer. - use_normalization: if True, use batch normalization. - use_sync_bn: if True, use synchronized batch normalization. - norm_momentum: `float` normalization momentum for the moving average. - norm_epsilon: `float` small float added to variance to avoid dividing by - zero. - activation: `str` name of the activation function. - kernel_initializer: kernel_initializer for convolutional layers. - kernel_regularizer: tf.keras.regularizers.Regularizer object for Conv2D. - Default to None. - bias_regularizer: tf.keras.regularizers.Regularizer object for Conv2d. - Default to None. - name: `str`, name of the layer. - **kwargs: keyword arguments to be passed. - """ - # Note: use_bias is ignored for the dense layer when use_bn=True. - # However, it is still used for batch norm. 
- super(DenseBN, self).__init__(**kwargs) - self._output_dim = output_dim - self._use_bias = use_bias - self._use_normalization = use_normalization - self._use_sync_bn = use_sync_bn - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - self._activation = activation - self._kernel_initializer = kernel_initializer - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - self._name = name - - if use_sync_bn: - self._norm = tf.keras.layers.experimental.SyncBatchNormalization - else: - self._norm = tf.keras.layers.BatchNormalization - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - if activation: - self._activation_fn = tf_utils.get_activation(activation) - else: - self._activation_fn = None - - def get_config(self): - config = { - 'output_dim': self._output_dim, - 'use_bias': self._use_bias, - 'activation': self._activation, - 'use_sync_bn': self._use_sync_bn, - 'use_normalization': self._use_normalization, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'kernel_initializer': self._kernel_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'bias_regularizer': self._bias_regularizer, - } - base_config = super(DenseBN, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def build(self, input_shape): - self._dense0 = tf.keras.layers.Dense( - self._output_dim, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer, - use_bias=self._use_bias and not self._use_normalization) - - if self._use_normalization: - self._norm0 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon, - center=self._use_bias, - scale=True) - - super(DenseBN, self).build(input_shape) - - def call(self, inputs, training=None): - assert inputs.shape.ndims == 2, inputs.shape - x = 
self._dense0(inputs) - if self._use_normalization: - x = self._norm0(x) - if self._activation: - x = self._activation_fn(x) - return x diff --git a/official/vision/beta/projects/simclr/modeling/layers/nn_blocks_test.py b/official/vision/beta/projects/simclr/modeling/layers/nn_blocks_test.py deleted file mode 100644 index f63f1037c19aa281a33d6e00571ec7438c1af8c6..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/simclr/modeling/layers/nn_blocks_test.py +++ /dev/null @@ -1,58 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from absl.testing import parameterized - -import tensorflow as tf - -from official.vision.beta.projects.simclr.modeling.layers import nn_blocks - - -class DenseBNTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.parameters( - (64, True, True), - (64, True, False), - (64, False, True), - ) - def test_pass_through(self, output_dim, use_bias, use_normalization): - test_layer = nn_blocks.DenseBN( - output_dim=output_dim, - use_bias=use_bias, - use_normalization=use_normalization - ) - - x = tf.keras.Input(shape=(64,)) - out_x = test_layer(x) - - self.assertAllEqual(out_x.shape.as_list(), [None, output_dim]) - - # kernel of the dense layer - train_var_len = 1 - if use_normalization: - if use_bias: - # batch norm introduce two trainable variables - train_var_len += 2 - else: - # center is set to False if not use bias - train_var_len += 1 - else: - if use_bias: - # bias of dense layer - train_var_len += 1 - self.assertLen(test_layer.trainable_variables, train_var_len) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/projects/simclr/tasks/simclr.py b/official/vision/beta/projects/simclr/tasks/simclr.py deleted file mode 100644 index 101fa39749aa5caf74c91f735733175cc40f791e..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/simclr/tasks/simclr.py +++ /dev/null @@ -1,632 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Image SimCLR task definition. - -SimCLR training two different modes: -- pretrain -- fine-tuning - -For the above two different modes, the following components are different in -the task definition: -- training data format -- training loss -- projection_head and/or supervised_head -""" -from typing import Dict, Optional - -from absl import logging -import tensorflow as tf - -from official.core import base_task -from official.core import config_definitions -from official.core import input_reader -from official.core import task_factory -from official.modeling import optimization -from official.modeling import performance -from official.modeling import tf_utils -from official.vision.beta.modeling import backbones -from official.vision.beta.projects.simclr.configs import simclr as exp_cfg -from official.vision.beta.projects.simclr.dataloaders import simclr_input -from official.vision.beta.projects.simclr.heads import simclr_head -from official.vision.beta.projects.simclr.losses import contrastive_losses -from official.vision.beta.projects.simclr.modeling import simclr_model - -OptimizationConfig = optimization.OptimizationConfig -RuntimeConfig = config_definitions.RuntimeConfig - - -@task_factory.register_task_cls(exp_cfg.SimCLRPretrainTask) -class SimCLRPretrainTask(base_task.Task): - """A task for image classification.""" - - def create_optimizer(self, - optimizer_config: OptimizationConfig, - runtime_config: Optional[RuntimeConfig] = None): - """Creates an TF optimizer from configurations. - - Args: - optimizer_config: the parameters of the Optimization settings. - runtime_config: the parameters of the runtime. - - Returns: - A tf.optimizers.Optimizer object. - """ - if (optimizer_config.optimizer.type == 'lars' and - self.task_config.loss.l2_weight_decay > 0.0): - raise ValueError('The l2_weight_decay cannot be used together with lars ' - 'optimizer. 
Please set it to 0.') - - opt_factory = optimization.OptimizerFactory(optimizer_config) - optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) - # Configuring optimizer when loss_scale is set in runtime config. This helps - # avoiding overflow/underflow for float16 computations. - if runtime_config and runtime_config.loss_scale: - optimizer = performance.configure_optimizer( - optimizer, - use_float16=runtime_config.mixed_precision_dtype == 'float16', - loss_scale=runtime_config.loss_scale) - - return optimizer - - def build_model(self): - model_config = self.task_config.model - input_specs = tf.keras.layers.InputSpec(shape=[None] + - model_config.input_size) - - l2_weight_decay = self.task_config.loss.l2_weight_decay - # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. - # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) - # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) - l2_regularizer = ( - tf.keras.regularizers.l2(l2_weight_decay / - 2.0) if l2_weight_decay else None) - - # Build backbone - backbone = backbones.factory.build_backbone( - input_specs=input_specs, - backbone_config=model_config.backbone, - norm_activation_config=model_config.norm_activation, - l2_regularizer=l2_regularizer) - - # Build projection head - norm_activation_config = model_config.norm_activation - projection_head_config = model_config.projection_head - projection_head = simclr_head.ProjectionHead( - proj_output_dim=projection_head_config.proj_output_dim, - num_proj_layers=projection_head_config.num_proj_layers, - ft_proj_idx=projection_head_config.ft_proj_idx, - kernel_regularizer=l2_regularizer, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon) - - # Build supervised head - supervised_head_config = model_config.supervised_head - if supervised_head_config: - if supervised_head_config.zero_init: - 
s_kernel_initializer = 'zeros' - else: - s_kernel_initializer = 'random_uniform' - supervised_head = simclr_head.ClassificationHead( - num_classes=supervised_head_config.num_classes, - kernel_initializer=s_kernel_initializer, - kernel_regularizer=l2_regularizer) - else: - supervised_head = None - - model = simclr_model.SimCLRModel( - input_specs=input_specs, - backbone=backbone, - projection_head=projection_head, - supervised_head=supervised_head, - mode=model_config.mode, - backbone_trainable=model_config.backbone_trainable) - - logging.info(model.get_config()) - - return model - - def initialize(self, model: tf.keras.Model): - """Loading pretrained checkpoint.""" - if not self.task_config.init_checkpoint: - return - - ckpt_dir_or_file = self.task_config.init_checkpoint - if tf.io.gfile.isdir(ckpt_dir_or_file): - ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) - - # Restoring checkpoint. - if self.task_config.init_checkpoint_modules == 'all': - ckpt = tf.train.Checkpoint(**model.checkpoint_items) - status = ckpt.read(ckpt_dir_or_file) - status.expect_partial().assert_existing_objects_matched() - elif self.task_config.init_checkpoint_modules == 'backbone': - ckpt = tf.train.Checkpoint(backbone=model.backbone) - status = ckpt.read(ckpt_dir_or_file) - status.expect_partial().assert_existing_objects_matched() - else: - assert "Only 'all' or 'backbone' can be used to initialize the model." 
- - logging.info('Finished loading pretrained checkpoint from %s', - ckpt_dir_or_file) - - def build_inputs(self, params, input_context=None): - input_size = self.task_config.model.input_size - - if params.tfds_name: - decoder = simclr_input.TFDSDecoder(params.decoder.decode_label) - else: - decoder = simclr_input.Decoder(params.decoder.decode_label) - - parser = simclr_input.Parser( - output_size=input_size[:2], - aug_rand_crop=params.parser.aug_rand_crop, - aug_rand_hflip=params.parser.aug_rand_hflip, - aug_color_distort=params.parser.aug_color_distort, - aug_color_jitter_strength=params.parser.aug_color_jitter_strength, - aug_color_jitter_impl=params.parser.aug_color_jitter_impl, - aug_rand_blur=params.parser.aug_rand_blur, - parse_label=params.parser.parse_label, - test_crop=params.parser.test_crop, - mode=params.parser.mode, - dtype=params.dtype) - - reader = input_reader.InputReader( - params, - dataset_fn=tf.data.TFRecordDataset, - decoder_fn=decoder.decode, - parser_fn=parser.parse_fn(params.is_training)) - - dataset = reader.read(input_context=input_context) - - return dataset - - def build_losses(self, - labels, - model_outputs, - aux_losses=None) -> Dict[str, tf.Tensor]: - # Compute contrastive relative loss - con_losses_obj = contrastive_losses.ContrastiveLoss( - projection_norm=self.task_config.loss.projection_norm, - temperature=self.task_config.loss.temperature) - # The projection outputs from model has the size of - # (2 * bsz, project_dim) - projection_outputs = model_outputs[simclr_model.PROJECTION_OUTPUT_KEY] - projection1, projection2 = tf.split(projection_outputs, 2, 0) - contrast_loss, (contrast_logits, contrast_labels) = con_losses_obj( - projection1=projection1, projection2=projection2) - - contrast_accuracy = tf.equal( - tf.argmax(contrast_labels, axis=1), tf.argmax(contrast_logits, axis=1)) - contrast_accuracy = tf.reduce_mean(tf.cast(contrast_accuracy, tf.float32)) - - contrast_prob = tf.nn.softmax(contrast_logits) - contrast_entropy = 
-tf.reduce_mean( - tf.reduce_sum(contrast_prob * tf.math.log(contrast_prob + 1e-8), -1)) - - model_loss = contrast_loss - - losses = { - 'contrast_loss': contrast_loss, - 'contrast_accuracy': contrast_accuracy, - 'contrast_entropy': contrast_entropy - } - - if self.task_config.model.supervised_head is not None: - outputs = model_outputs[simclr_model.SUPERVISED_OUTPUT_KEY] - labels = tf.concat([labels, labels], 0) - - if self.task_config.evaluation.one_hot: - sup_loss = tf.keras.losses.CategoricalCrossentropy( - from_logits=True, reduction=tf.keras.losses.Reduction.NONE)(labels, - outputs) - else: - sup_loss = tf.keras.losses.SparseCategoricalCrossentropy( - from_logits=True, reduction=tf.keras.losses.Reduction.NONE)(labels, - outputs) - sup_loss = tf.reduce_mean(sup_loss) - - label_acc = tf.equal( - tf.argmax(labels, axis=1), tf.argmax(outputs, axis=1)) - label_acc = tf.reduce_mean(tf.cast(label_acc, tf.float32)) - - model_loss = contrast_loss + sup_loss - - losses.update({ - 'accuracy': label_acc, - 'supervised_loss': sup_loss, - }) - - total_loss = model_loss - if aux_losses: - reg_loss = tf.reduce_sum(aux_losses) - total_loss = model_loss + reg_loss - - losses['total_loss'] = total_loss - - return losses - - def build_metrics(self, training=True): - - if training: - metrics = [] - metric_names = [ - 'total_loss', 'contrast_loss', 'contrast_accuracy', 'contrast_entropy' - ] - if self.task_config.model.supervised_head: - metric_names.extend(['supervised_loss', 'accuracy']) - for name in metric_names: - metrics.append(tf.keras.metrics.Mean(name, dtype=tf.float32)) - else: - k = self.task_config.evaluation.top_k - if self.task_config.evaluation.one_hot: - metrics = [ - tf.keras.metrics.CategoricalAccuracy(name='accuracy'), - tf.keras.metrics.TopKCategoricalAccuracy( - k=k, name='top_{}_accuracy'.format(k)) - ] - else: - metrics = [ - tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), - tf.keras.metrics.SparseTopKCategoricalAccuracy( - k=k, 
name='top_{}_accuracy'.format(k)) - ] - return metrics - - def train_step(self, inputs, model, optimizer, metrics=None): - features, labels = inputs - - # To do a sanity check that we absolutely use no labels when pretraining, we - # can set the labels here to zero. - if self.task_config.train_data.input_set_label_to_zero: - labels *= 0 - - if (self.task_config.model.supervised_head is not None and - self.task_config.evaluation.one_hot): - num_classes = self.task_config.model.supervised_head.num_classes - labels = tf.one_hot(labels, num_classes) - - num_replicas = tf.distribute.get_strategy().num_replicas_in_sync - with tf.GradientTape() as tape: - outputs = model(features, training=True) - # Casting output layer as float32 is necessary when mixed_precision is - # mixed_float16 or mixed_bfloat16 to ensure output is casted as float32. - outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) - - # Computes per-replica loss. - losses = self.build_losses( - model_outputs=outputs, labels=labels, aux_losses=model.losses) - - scaled_loss = losses['total_loss'] / num_replicas - # For mixed_precision policy, when LossScaleOptimizer is used, loss is - # scaled for numerical stability. - if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - scaled_loss = optimizer.get_scaled_loss(scaled_loss) - - tvars = model.trainable_variables - logging.info('Trainable variables:') - for var in tvars: - logging.info(var.name) - grads = tape.gradient(scaled_loss, tvars) - # Scales back gradient when LossScaleOptimizer is used. 
- if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - grads = optimizer.get_unscaled_gradients(grads) - optimizer.apply_gradients(list(zip(grads, tvars))) - - logs = {self.loss: losses['total_loss']} - - for m in metrics: - m.update_state(losses[m.name]) - logs.update({m.name: m.result()}) - - return logs - - def validation_step(self, inputs, model, metrics=None): - if self.task_config.model.supervised_head is None: - assert 'Skipping eval during pretraining without supervised head.' - - features, labels = inputs - if self.task_config.evaluation.one_hot: - num_classes = self.task_config.model.supervised_head.num_classes - labels = tf.one_hot(labels, num_classes) - - outputs = model( - features, training=False)[simclr_model.SUPERVISED_OUTPUT_KEY] - outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) - - logs = {self.loss: 0} - - if metrics: - self.process_metrics(metrics, labels, outputs) - logs.update({m.name: m.result() for m in metrics}) - elif model.compiled_metrics: - self.process_compiled_metrics(model.compiled_metrics, labels, outputs) - logs.update({m.name: m.result() for m in model.metrics}) - - return logs - - -@task_factory.register_task_cls(exp_cfg.SimCLRFinetuneTask) -class SimCLRFinetuneTask(base_task.Task): - """A task for image classification.""" - - def create_optimizer(self, - optimizer_config: OptimizationConfig, - runtime_config: Optional[RuntimeConfig] = None): - """Creates an TF optimizer from configurations. - - Args: - optimizer_config: the parameters of the Optimization settings. - runtime_config: the parameters of the runtime. - - Returns: - A tf.optimizers.Optimizer object. - """ - if (optimizer_config.optimizer.type == 'lars' and - self.task_config.loss.l2_weight_decay > 0.0): - raise ValueError('The l2_weight_decay cannot be used together with lars ' - 'optimizer. 
Please set it to 0.') - - opt_factory = optimization.OptimizerFactory(optimizer_config) - optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) - # Configuring optimizer when loss_scale is set in runtime config. This helps - # avoiding overflow/underflow for float16 computations. - if runtime_config and runtime_config.loss_scale: - optimizer = performance.configure_optimizer( - optimizer, - use_float16=runtime_config.mixed_precision_dtype == 'float16', - loss_scale=runtime_config.loss_scale) - - return optimizer - - def build_model(self): - model_config = self.task_config.model - input_specs = tf.keras.layers.InputSpec(shape=[None] + - model_config.input_size) - - l2_weight_decay = self.task_config.loss.l2_weight_decay - # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. - # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) - # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) - l2_regularizer = ( - tf.keras.regularizers.l2(l2_weight_decay / - 2.0) if l2_weight_decay else None) - - backbone = backbones.factory.build_backbone( - input_specs=input_specs, - backbone_config=model_config.backbone, - norm_activation_config=model_config.norm_activation, - l2_regularizer=l2_regularizer) - - norm_activation_config = model_config.norm_activation - projection_head_config = model_config.projection_head - projection_head = simclr_head.ProjectionHead( - proj_output_dim=projection_head_config.proj_output_dim, - num_proj_layers=projection_head_config.num_proj_layers, - ft_proj_idx=projection_head_config.ft_proj_idx, - kernel_regularizer=l2_regularizer, - use_sync_bn=norm_activation_config.use_sync_bn, - norm_momentum=norm_activation_config.norm_momentum, - norm_epsilon=norm_activation_config.norm_epsilon) - - supervised_head_config = model_config.supervised_head - if supervised_head_config.zero_init: - s_kernel_initializer = 'zeros' - else: - s_kernel_initializer = 'random_uniform' - supervised_head = 
simclr_head.ClassificationHead( - num_classes=supervised_head_config.num_classes, - kernel_initializer=s_kernel_initializer, - kernel_regularizer=l2_regularizer) - - model = simclr_model.SimCLRModel( - input_specs=input_specs, - backbone=backbone, - projection_head=projection_head, - supervised_head=supervised_head, - mode=model_config.mode, - backbone_trainable=model_config.backbone_trainable) - - logging.info(model.get_config()) - - return model - - def initialize(self, model: tf.keras.Model): - """Loading pretrained checkpoint.""" - if not self.task_config.init_checkpoint: - return - - ckpt_dir_or_file = self.task_config.init_checkpoint - if tf.io.gfile.isdir(ckpt_dir_or_file): - ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) - - # Restoring checkpoint. - if self.task_config.init_checkpoint_modules == 'all': - ckpt = tf.train.Checkpoint(**model.checkpoint_items) - status = ckpt.read(ckpt_dir_or_file) - status.expect_partial().assert_existing_objects_matched() - elif self.task_config.init_checkpoint_modules == 'backbone_projection': - ckpt = tf.train.Checkpoint( - backbone=model.backbone, projection_head=model.projection_head) - status = ckpt.read(ckpt_dir_or_file) - status.expect_partial().assert_existing_objects_matched() - elif self.task_config.init_checkpoint_modules == 'backbone': - ckpt = tf.train.Checkpoint(backbone=model.backbone) - status = ckpt.read(ckpt_dir_or_file) - status.expect_partial().assert_existing_objects_matched() - else: - assert "Only 'all' or 'backbone' can be used to initialize the model." 
- - # If the checkpoint is from pretraining, reset the following parameters - model.backbone_trainable = self.task_config.model.backbone_trainable - logging.info('Finished loading pretrained checkpoint from %s', - ckpt_dir_or_file) - - def build_inputs(self, params, input_context=None): - input_size = self.task_config.model.input_size - - if params.tfds_name: - decoder = simclr_input.TFDSDecoder(params.decoder.decode_label) - else: - decoder = simclr_input.Decoder(params.decoder.decode_label) - parser = simclr_input.Parser( - output_size=input_size[:2], - parse_label=params.parser.parse_label, - test_crop=params.parser.test_crop, - mode=params.parser.mode, - dtype=params.dtype) - - reader = input_reader.InputReader( - params, - dataset_fn=tf.data.TFRecordDataset, - decoder_fn=decoder.decode, - parser_fn=parser.parse_fn(params.is_training)) - - dataset = reader.read(input_context=input_context) - - return dataset - - def build_losses(self, labels, model_outputs, aux_losses=None): - """Sparse categorical cross entropy loss. - - Args: - labels: labels. - model_outputs: Output logits of the classifier. - aux_losses: auxiliarly loss tensors, i.e. `losses` in keras.Model. - - Returns: - The total loss tensor. 
- """ - losses_config = self.task_config.loss - if losses_config.one_hot: - total_loss = tf.keras.losses.categorical_crossentropy( - labels, - model_outputs, - from_logits=True, - label_smoothing=losses_config.label_smoothing) - else: - total_loss = tf.keras.losses.sparse_categorical_crossentropy( - labels, model_outputs, from_logits=True) - - total_loss = tf_utils.safe_mean(total_loss) - if aux_losses: - total_loss += tf.add_n(aux_losses) - - return total_loss - - def build_metrics(self, training=True): - """Gets streaming metrics for training/validation.""" - k = self.task_config.evaluation.top_k - if self.task_config.evaluation.one_hot: - metrics = [ - tf.keras.metrics.CategoricalAccuracy(name='accuracy'), - tf.keras.metrics.TopKCategoricalAccuracy( - k=k, name='top_{}_accuracy'.format(k)) - ] - else: - metrics = [ - tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), - tf.keras.metrics.SparseTopKCategoricalAccuracy( - k=k, name='top_{}_accuracy'.format(k)) - ] - return metrics - - def train_step(self, inputs, model, optimizer, metrics=None): - """Does forward and backward. - - Args: - inputs: a dictionary of input tensors. - model: the model, forward pass definition. - optimizer: the optimizer for this training step. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. - """ - features, labels = inputs - if self.task_config.loss.one_hot: - num_classes = self.task_config.model.supervised_head.num_classes - labels = tf.one_hot(labels, num_classes) - - num_replicas = tf.distribute.get_strategy().num_replicas_in_sync - with tf.GradientTape() as tape: - outputs = model( - features, training=True)[simclr_model.SUPERVISED_OUTPUT_KEY] - # Casting output layer as float32 is necessary when mixed_precision is - # mixed_float16 or mixed_bfloat16 to ensure output is casted as float32. - outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) - - # Computes per-replica loss. 
- loss = self.build_losses( - model_outputs=outputs, labels=labels, aux_losses=model.losses) - # Scales loss as the default gradients allreduce performs sum inside the - # optimizer. - scaled_loss = loss / num_replicas - - # For mixed_precision policy, when LossScaleOptimizer is used, loss is - # scaled for numerical stability. - if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - scaled_loss = optimizer.get_scaled_loss(scaled_loss) - - tvars = model.trainable_variables - logging.info('Trainable variables:') - for var in tvars: - logging.info(var.name) - grads = tape.gradient(scaled_loss, tvars) - # Scales back gradient before apply_gradients when LossScaleOptimizer is - # used. - if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - grads = optimizer.get_unscaled_gradients(grads) - optimizer.apply_gradients(list(zip(grads, tvars))) - - logs = {self.loss: loss} - if metrics: - self.process_metrics(metrics, labels, outputs) - logs.update({m.name: m.result() for m in metrics}) - elif model.compiled_metrics: - self.process_compiled_metrics(model.compiled_metrics, labels, outputs) - logs.update({m.name: m.result() for m in model.metrics}) - return logs - - def validation_step(self, inputs, model, metrics=None): - """Validatation step. - - Args: - inputs: a dictionary of input tensors. - model: the keras.Model. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. 
- """ - features, labels = inputs - if self.task_config.loss.one_hot: - num_classes = self.task_config.model.supervised_head.num_classes - labels = tf.one_hot(labels, num_classes) - - outputs = self.inference_step(features, - model)[simclr_model.SUPERVISED_OUTPUT_KEY] - outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) - loss = self.build_losses( - model_outputs=outputs, labels=labels, aux_losses=model.losses) - - logs = {self.loss: loss} - if metrics: - self.process_metrics(metrics, labels, outputs) - logs.update({m.name: m.result() for m in metrics}) - elif model.compiled_metrics: - self.process_compiled_metrics(model.compiled_metrics, labels, outputs) - logs.update({m.name: m.result() for m in model.metrics}) - return logs diff --git a/official/vision/beta/projects/simclr/train.py b/official/vision/beta/projects/simclr/train.py deleted file mode 100644 index 6f636c657c22a22611b878539fb9c069ab39ce0b..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/simclr/train.py +++ /dev/null @@ -1,66 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""TensorFlow Model Garden Vision SimCLR trainer.""" -from absl import app -from absl import flags -import gin - -from official.common import distribute_utils -from official.common import flags as tfm_flags -from official.core import task_factory -from official.core import train_lib -from official.core import train_utils -from official.modeling import performance -from official.vision.beta.projects.simclr.common import registry_imports # pylint: disable=unused-import - -FLAGS = flags.FLAGS - - -def main(_): - gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) - print(FLAGS.experiment) - params = train_utils.parse_configuration(FLAGS) - - model_dir = FLAGS.model_dir - if 'train' in FLAGS.mode: - # Pure eval modes do not output yaml files. Otherwise continuous eval job - # may race against the train job for writing the same file. - train_utils.serialize_config(params, model_dir) - - # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16' - # can have significant impact on model speeds by utilizing float16 in case of - # GPUs, and bfloat16 in the case of TPUs. 
loss_scale takes effect only when - # dtype is float16 - if params.runtime.mixed_precision_dtype: - performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) - distribution_strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=params.runtime.distribution_strategy, - all_reduce_alg=params.runtime.all_reduce_alg, - num_gpus=params.runtime.num_gpus, - tpu_address=params.runtime.tpu) - with distribution_strategy.scope(): - task = task_factory.get_task(params.task, logging_dir=model_dir) - - train_lib.run_experiment( - distribution_strategy=distribution_strategy, - task=task, - mode=FLAGS.mode, - params=params, - model_dir=model_dir) - - -if __name__ == '__main__': - tfm_flags.define_flags() - app.run(main) diff --git a/official/vision/beta/projects/video_ssl/README.md b/official/vision/beta/projects/video_ssl/README.md deleted file mode 100644 index 92626164955c6e28b594b8239b9461b8f9012f73..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/video_ssl/README.md +++ /dev/null @@ -1,62 +0,0 @@ -# Spatiotemporal Contrastive Video Representation Learning - -[![Paper](http://img.shields.io/badge/Paper-arXiv.2008.03800-B3181B?logo=arXiv)](https://arxiv.org/abs/2008.03800) - -This repository is the official TF2 implementation of [Spatiotemporal Contrastive Video Representation Learning](https://arxiv.org/abs/2008.03800). - -

- -

- -## Description - -We present a self-supervised Contrastive Video Representation Learning (CVRL) -method to learn spatiotemporal visual representations from unlabeled videos. Our -representations are learned using a contrastive loss, where two augmented clips -from the same short video are pulled together in the embedding space, while -clips from different videos are pushed away. CVRL significantly closes the gap -between unsupervised and supervised video representation learning. - -We release the code and pre-trained models. - -More pre-trained model checkpoints and a detailed instruction about the code -will be updated. - - -## Experimental Results - -### Kinetics-600 top-1 linear classification accuracy - -

- -

- - -## Pre-trained Model Checkpoints - -We provide model checkpoints pre-trained on unlabeled RGB videos from -Kinetics-400 and Kinetics-600. All models are trained scratch with random -initialization. - -We also provide a baseline model checkpoint of "ImageNet inflated" we used in -the paper. The model has the same architecture as 3D-ResNet-50 (R3D-50), with -model weights inflated from a 2D ResNet-50 pre-trained on ImageNet. - -| Model | Parameters | Dataset | Epochs | K400 Linear Eval. | K600 Linear Eval. | Checkpoint | -| :--------------: | :----: | :--: | :--: |:-----------: | :----------: | :----------: | -| R3D-50 (1x) | 31.7M | ImageNet | - | 53.5% | 54.7% | [ckpt (127 MB)](https://storage.googleapis.com/tf_model_garden/vision/cvrl/imagenet.tar.gz) | -| R3D-50 (1x) | 31.7M | Kinetics-400 | 200 | 63.8% | - | [ckpt (127 MB)](https://storage.googleapis.com/tf_model_garden/vision/cvrl/r3d_1x_k400_200ep.tar.gz) | -| R3D-50 (1x) | 31.7M | Kinetics-400 | 800 | 66.1% | - | [ckpt (127 MB)](https://storage.googleapis.com/tf_model_garden/vision/cvrl/r3d_1x_k400_800ep.tar.gz) | -| R3D-50 (1x) | 31.7M | Kinetics-600 | 800 | 68.5% | 70.4% | [ckpt (127 MB)](https://storage.googleapis.com/tf_model_garden/vision/cvrl/r3d_1x_k600_800ep.tar.gz) | - - -## Citation - -``` -@inproceedings{qian2021spatiotemporal, - title={Spatiotemporal contrastive video representation learning}, - author={Qian, Rui and Meng, Tianjian and Gong, Boqing and Yang, Ming-Hsuan and Wang, Huisheng and Belongie, Serge and Cui, Yin}, - booktitle={CVPR}, - year={2021} -} -``` diff --git a/official/vision/beta/projects/video_ssl/configs/__init__.py b/official/vision/beta/projects/video_ssl/configs/__init__.py deleted file mode 100644 index d96b0c3bcb1cfb178c3ef1e982b9a9e8a834b268..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/video_ssl/configs/__init__.py +++ /dev/null @@ -1,18 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Configs package definition.""" - -from official.vision.beta.projects.video_ssl.configs import video_ssl diff --git a/official/vision/beta/projects/video_ssl/losses/losses.py b/official/vision/beta/projects/video_ssl/losses/losses.py deleted file mode 100644 index 6801816ee405a0b8878ab186b35e861cc5bf8445..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/video_ssl/losses/losses.py +++ /dev/null @@ -1,136 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Define losses.""" - -# Import libraries -import tensorflow as tf -from tensorflow.compiler.tf2xla.python import xla - - -def contrastive_loss(hidden, - num_replicas, - normalize_hidden, - temperature, - model, - weight_decay): - """Computes contrastive loss. - - Args: - hidden: embedding of video clips after projection head. 
- num_replicas: number of distributed replicas. - normalize_hidden: whether or not to l2 normalize the hidden vector. - temperature: temperature in the InfoNCE contrastive loss. - model: keras model for calculating weight decay. - weight_decay: weight decay parameter. - - Returns: - A loss scalar. - The logits for contrastive prediction task. - The labels for contrastive prediction task. - """ - large_num = 1e9 - - hidden1, hidden2 = tf.split(hidden, num_or_size_splits=2, axis=0) - if normalize_hidden: - hidden1 = tf.math.l2_normalize(hidden1, -1) - hidden2 = tf.math.l2_normalize(hidden2, -1) - batch_size = tf.shape(hidden1)[0] - - if num_replicas == 1: - # This is the local version - hidden1_large = hidden1 - hidden2_large = hidden2 - labels = tf.one_hot(tf.range(batch_size), batch_size * 2) - masks = tf.one_hot(tf.range(batch_size), batch_size) - - else: - # This is the cross-tpu version. - hidden1_large = tpu_cross_replica_concat(hidden1, num_replicas) - hidden2_large = tpu_cross_replica_concat(hidden2, num_replicas) - enlarged_batch_size = tf.shape(hidden1_large)[0] - replica_id = tf.cast(tf.cast(xla.replica_id(), tf.uint32), tf.int32) - labels_idx = tf.range(batch_size) + replica_id * batch_size - labels = tf.one_hot(labels_idx, enlarged_batch_size * 2) - masks = tf.one_hot(labels_idx, enlarged_batch_size) - - logits_aa = tf.matmul(hidden1, hidden1_large, transpose_b=True) / temperature - logits_aa = logits_aa - tf.cast(masks, logits_aa.dtype) * large_num - logits_bb = tf.matmul(hidden2, hidden2_large, transpose_b=True) / temperature - logits_bb = logits_bb - tf.cast(masks, logits_bb.dtype) * large_num - logits_ab = tf.matmul(hidden1, hidden2_large, transpose_b=True) / temperature - logits_ba = tf.matmul(hidden2, hidden1_large, transpose_b=True) / temperature - - loss_a = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits( - labels, tf.concat([logits_ab, logits_aa], 1))) - loss_b = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits( - labels, 
tf.concat([logits_ba, logits_bb], 1))) - loss = loss_a + loss_b - - l2_loss = weight_decay * tf.add_n([ - tf.nn.l2_loss(v) - for v in model.trainable_variables - if 'kernel' in v.name - ]) - - total_loss = loss + tf.cast(l2_loss, loss.dtype) - - contrast_prob = tf.nn.softmax(logits_ab) - contrast_entropy = - tf.reduce_mean( - tf.reduce_sum(contrast_prob * tf.math.log(contrast_prob + 1e-8), -1)) - - contrast_acc = tf.equal(tf.argmax(labels, 1), tf.argmax(logits_ab, axis=1)) - contrast_acc = tf.reduce_mean(tf.cast(contrast_acc, tf.float32)) - - return { - 'total_loss': total_loss, - 'contrastive_loss': loss, - 'reg_loss': l2_loss, - 'contrast_acc': contrast_acc, - 'contrast_entropy': contrast_entropy, - } - - -def tpu_cross_replica_concat(tensor, num_replicas): - """Reduce a concatenation of the `tensor` across TPU cores. - - Args: - tensor: tensor to concatenate. - num_replicas: number of TPU device replicas. - - Returns: - Tensor of the same rank as `tensor` with first dimension `num_replicas` - times larger. - """ - with tf.name_scope('tpu_cross_replica_concat'): - # This creates a tensor that is like the input tensor but has an added - # replica dimension as the outermost dimension. On each replica it will - # contain the local values and zeros for all other values that need to be - # fetched from other replicas. - ext_tensor = tf.scatter_nd( - indices=[[xla.replica_id()]], - updates=[tensor], - shape=[num_replicas] + tensor.shape.as_list()) - - # As every value is only present on one replica and 0 in all others, adding - # them all together will result in the full tensor on all replicas. - replica_context = tf.distribute.get_replica_context() - ext_tensor = replica_context.all_reduce(tf.distribute.ReduceOp.SUM, - ext_tensor) - - # Flatten the replica dimension. - # The first dimension size will be: tensor.shape[0] * num_replicas - # Using [-1] trick to support also scalar input. 
- return tf.reshape(ext_tensor, [-1] + ext_tensor.shape.as_list()[2:]) diff --git a/official/vision/beta/projects/video_ssl/tasks/__init__.py b/official/vision/beta/projects/video_ssl/tasks/__init__.py deleted file mode 100644 index d4b14ce6872c2c75fe071080cb3ccb02538739eb..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/video_ssl/tasks/__init__.py +++ /dev/null @@ -1,18 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Tasks package definition.""" - -from official.vision.beta.projects.video_ssl.tasks import linear_eval -from official.vision.beta.projects.video_ssl.tasks import pretrain diff --git a/official/vision/beta/projects/video_ssl/train.py b/official/vision/beta/projects/video_ssl/train.py deleted file mode 100644 index 9a6482d624bb0eb7ff85fc3d0b037659ebc282a6..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/video_ssl/train.py +++ /dev/null @@ -1,78 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Training driver.""" - -from absl import app -from absl import flags -import gin - -# pylint: disable=unused-import -from official.common import registry_imports -from official.common import distribute_utils -from official.common import flags as tfm_flags -from official.core import task_factory -from official.core import train_lib -from official.core import train_utils -from official.modeling import performance -from official.vision.beta.projects.video_ssl.modeling import video_ssl_model -from official.vision.beta.projects.video_ssl.tasks import linear_eval -from official.vision.beta.projects.video_ssl.tasks import pretrain -# pylint: disable=unused-import - -FLAGS = flags.FLAGS - - -def main(_): - gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) - params = train_utils.parse_configuration(FLAGS) - model_dir = FLAGS.model_dir - if 'train' in FLAGS.mode: - # Pure eval modes do not output yaml files. Otherwise continuous eval job - # may race against the train job for writing the same file. - train_utils.serialize_config(params, model_dir) - - if 'train_and_eval' in FLAGS.mode: - assert (params.task.train_data.feature_shape == - params.task.validation_data.feature_shape), ( - f'train {params.task.train_data.feature_shape} != validate ' - f'{params.task.validation_data.feature_shape}') - - # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16' - # can have significant impact on model speeds by utilizing float16 in case of - # GPUs, and bfloat16 in the case of TPUs. 
loss_scale takes effect only when - # dtype is float16 - if params.runtime.mixed_precision_dtype: - performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) - distribution_strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=params.runtime.distribution_strategy, - all_reduce_alg=params.runtime.all_reduce_alg, - num_gpus=params.runtime.num_gpus, - tpu_address=params.runtime.tpu) - with distribution_strategy.scope(): - task = task_factory.get_task(params.task, logging_dir=model_dir) - - train_lib.run_experiment( - distribution_strategy=distribution_strategy, - task=task, - mode=FLAGS.mode, - params=params, - model_dir=model_dir) - - train_utils.save_gin_config(FLAGS.mode, model_dir) - -if __name__ == '__main__': - tfm_flags.define_flags() - app.run(main) diff --git a/official/vision/beta/projects/vit/configs/__init__.py b/official/vision/beta/projects/vit/configs/__init__.py deleted file mode 100644 index b1f629bd7bd73fc0f9e8de6a3afe61b96cafde86..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/vit/configs/__init__.py +++ /dev/null @@ -1,18 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Configs package definition.""" - -from official.vision.beta.projects.vit.configs import image_classification diff --git a/official/vision/beta/projects/vit/configs/backbones.py b/official/vision/beta/projects/vit/configs/backbones.py deleted file mode 100644 index 93ee4b1fa389c169e1bc6a3868104462ef6f3cf5..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/vit/configs/backbones.py +++ /dev/null @@ -1,58 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Backbones configurations.""" -from typing import Optional - -import dataclasses - -from official.modeling import hyperparams - - -@dataclasses.dataclass -class Transformer(hyperparams.Config): - """Transformer config.""" - mlp_dim: int = 1 - num_heads: int = 1 - num_layers: int = 1 - attention_dropout_rate: float = 0.0 - dropout_rate: float = 0.1 - - -@dataclasses.dataclass -class VisionTransformer(hyperparams.Config): - """VisionTransformer config.""" - model_name: str = 'vit-b16' - # pylint: disable=line-too-long - classifier: str = 'token' # 'token' or 'gap'. If set to 'token', an extra classification token is added to sequence. 
- # pylint: enable=line-too-long - representation_size: int = 0 - hidden_size: int = 1 - patch_size: int = 16 - transformer: Transformer = Transformer() - init_stochastic_depth_rate: float = 0.0 - original_init: bool = True - - -@dataclasses.dataclass -class Backbone(hyperparams.OneOfConfig): - """Configuration for backbones. - - Attributes: - type: 'str', type of backbone be used, one the of fields below. - vit: vit backbone config. - """ - type: Optional[str] = None - vit: VisionTransformer = VisionTransformer() diff --git a/official/vision/beta/projects/vit/configs/image_classification.py b/official/vision/beta/projects/vit/configs/image_classification.py deleted file mode 100644 index 25fd3db4b8eacc58aaed1d749c8aaaf917bce999..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/vit/configs/image_classification.py +++ /dev/null @@ -1,281 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Image classification configuration definition.""" -import os -from typing import List, Optional - -import dataclasses - -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.core import task_factory -from official.modeling import hyperparams -from official.modeling import optimization -from official.vision.beta.configs import common -from official.vision.beta.configs import image_classification as img_cls_cfg -from official.vision.beta.projects.vit.configs import backbones -from official.vision.beta.tasks import image_classification - -DataConfig = img_cls_cfg.DataConfig - - -@dataclasses.dataclass -class ImageClassificationModel(hyperparams.Config): - """The model config.""" - num_classes: int = 0 - input_size: List[int] = dataclasses.field(default_factory=list) - backbone: backbones.Backbone = backbones.Backbone( - type='vit', vit=backbones.VisionTransformer()) - dropout_rate: float = 0.0 - norm_activation: common.NormActivation = common.NormActivation( - use_sync_bn=False) - # Adds a BatchNormalization layer pre-GlobalAveragePooling in classification - add_head_batch_norm: bool = False - kernel_initializer: str = 'random_uniform' - - -@dataclasses.dataclass -class Losses(hyperparams.Config): - loss_weight: float = 1.0 - one_hot: bool = True - label_smoothing: float = 0.0 - l2_weight_decay: float = 0.0 - soft_labels: bool = False - - -@dataclasses.dataclass -class Evaluation(hyperparams.Config): - top_k: int = 5 - - -@dataclasses.dataclass -class ImageClassificationTask(cfg.TaskConfig): - """The task config. 
Same as the classification task for convnets.""" - model: ImageClassificationModel = ImageClassificationModel() - train_data: DataConfig = DataConfig(is_training=True) - validation_data: DataConfig = DataConfig(is_training=False) - losses: Losses = Losses() - evaluation: Evaluation = Evaluation() - init_checkpoint: Optional[str] = None - init_checkpoint_modules: str = 'all' # all or backbone - - -IMAGENET_TRAIN_EXAMPLES = 1281167 -IMAGENET_VAL_EXAMPLES = 50000 -IMAGENET_INPUT_PATH_BASE = 'imagenet-2012-tfrecord' - -# TODO(b/177942984): integrate the experiments to TF-vision. -task_factory.register_task_cls(ImageClassificationTask)( - image_classification.ImageClassificationTask) - - -@exp_factory.register_config_factory('deit_imagenet_pretrain') -def image_classification_imagenet_deit_pretrain() -> cfg.ExperimentConfig: - """Image classification on imagenet with vision transformer.""" - train_batch_size = 4096 # originally was 1024 but 4096 better for tpu v3-32 - eval_batch_size = 4096 # originally was 1024 but 4096 better for tpu v3-32 - num_classes = 1001 - label_smoothing = 0.1 - steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size - config = cfg.ExperimentConfig( - task=ImageClassificationTask( - model=ImageClassificationModel( - num_classes=num_classes, - input_size=[224, 224, 3], - kernel_initializer='zeros', - backbone=backbones.Backbone( - type='vit', - vit=backbones.VisionTransformer( - model_name='vit-b16', - representation_size=768, - init_stochastic_depth_rate=0.1, - original_init=False, - transformer=backbones.Transformer( - dropout_rate=0.0, attention_dropout_rate=0.0)))), - losses=Losses( - l2_weight_decay=0.0, - label_smoothing=label_smoothing, - one_hot=False, - soft_labels=True), - train_data=DataConfig( - input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size, - aug_type=common.Augmentation( - type='randaug', - randaug=common.RandAugment( - magnitude=9, 
exclude_ops=['Cutout'])), - mixup_and_cutmix=common.MixupAndCutmix( - label_smoothing=label_smoothing)), - validation_data=DataConfig( - input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'), - is_training=False, - global_batch_size=eval_batch_size)), - trainer=cfg.TrainerConfig( - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - train_steps=300 * steps_per_epoch, - validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'adamw', - 'adamw': { - 'weight_decay_rate': 0.05, - 'include_in_weight_decay': r'.*(kernel|weight):0$', - 'gradient_clip_norm': 0.0 - } - }, - 'learning_rate': { - 'type': 'cosine', - 'cosine': { - 'initial_learning_rate': 0.0005 * train_batch_size / 512, - 'decay_steps': 300 * steps_per_epoch, - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 5 * steps_per_epoch, - 'warmup_learning_rate': 0 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - return config - - -@exp_factory.register_config_factory('vit_imagenet_pretrain') -def image_classification_imagenet_vit_pretrain() -> cfg.ExperimentConfig: - """Image classification on imagenet with vision transformer.""" - train_batch_size = 4096 - eval_batch_size = 4096 - steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size - config = cfg.ExperimentConfig( - task=ImageClassificationTask( - model=ImageClassificationModel( - num_classes=1001, - input_size=[224, 224, 3], - kernel_initializer='zeros', - backbone=backbones.Backbone( - type='vit', - vit=backbones.VisionTransformer( - model_name='vit-b16', representation_size=768))), - losses=Losses(l2_weight_decay=0.0), - train_data=DataConfig( - input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size), - 
validation_data=DataConfig( - input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'), - is_training=False, - global_batch_size=eval_batch_size)), - trainer=cfg.TrainerConfig( - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - train_steps=300 * steps_per_epoch, - validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'adamw', - 'adamw': { - 'weight_decay_rate': 0.3, - 'include_in_weight_decay': r'.*(kernel|weight):0$', - 'gradient_clip_norm': 0.0 - } - }, - 'learning_rate': { - 'type': 'cosine', - 'cosine': { - 'initial_learning_rate': 0.003 * train_batch_size / 4096, - 'decay_steps': 300 * steps_per_epoch, - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 10000, - 'warmup_learning_rate': 0 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - return config - - -@exp_factory.register_config_factory('vit_imagenet_finetune') -def image_classification_imagenet_vit_finetune() -> cfg.ExperimentConfig: - """Image classification on imagenet with vision transformer.""" - train_batch_size = 512 - eval_batch_size = 512 - steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size - config = cfg.ExperimentConfig( - task=ImageClassificationTask( - model=ImageClassificationModel( - num_classes=1001, - input_size=[384, 384, 3], - backbone=backbones.Backbone( - type='vit', - vit=backbones.VisionTransformer(model_name='vit-b16'))), - losses=Losses(l2_weight_decay=0.0), - train_data=DataConfig( - input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size), - validation_data=DataConfig( - input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'), - is_training=False, - global_batch_size=eval_batch_size)), - trainer=cfg.TrainerConfig( - 
steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - train_steps=20000, - validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size, - validation_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'optimizer': { - 'type': 'sgd', - 'sgd': { - 'momentum': 0.9, - 'global_clipnorm': 1.0, - } - }, - 'learning_rate': { - 'type': 'cosine', - 'cosine': { - 'initial_learning_rate': 0.003, - 'decay_steps': 20000, - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - return config diff --git a/official/vision/beta/projects/vit/modeling/nn_blocks.py b/official/vision/beta/projects/vit/modeling/nn_blocks.py deleted file mode 100644 index 3c222290b8d4ca1411763e1a872899a153a1436b..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/vit/modeling/nn_blocks.py +++ /dev/null @@ -1,106 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Keras-based TransformerEncoder block layer.""" -import tensorflow as tf - -from official.nlp import modeling -from official.vision.beta.modeling.layers.nn_layers import StochasticDepth - - -class TransformerEncoderBlock(modeling.layers.TransformerEncoderBlock): - """TransformerEncoderBlock layer with stochastic depth.""" - - def __init__(self, *args, stochastic_depth_drop_rate=0.0, **kwargs): - """Initializes TransformerEncoderBlock.""" - super().__init__(*args, **kwargs) - self._stochastic_depth_drop_rate = stochastic_depth_drop_rate - - def build(self, input_shape): - if self._stochastic_depth_drop_rate: - self._stochastic_depth = StochasticDepth(self._stochastic_depth_drop_rate) - else: - self._stochastic_depth = lambda x, *args, **kwargs: tf.identity(x) - - super().build(input_shape) - - def get_config(self): - config = {"stochastic_depth_drop_rate": self._stochastic_depth_drop_rate} - base_config = super().get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, inputs, training=None): - """Transformer self-attention encoder block call.""" - if isinstance(inputs, (list, tuple)): - if len(inputs) == 2: - input_tensor, attention_mask = inputs - key_value = None - elif len(inputs) == 3: - input_tensor, key_value, attention_mask = inputs - else: - raise ValueError("Unexpected inputs to %s with length at %d" % - (self.__class__, len(inputs))) - else: - input_tensor, key_value, attention_mask = (inputs, None, None) - - if self._output_range: - if self._norm_first: - source_tensor = input_tensor[:, 0:self._output_range, :] - input_tensor = self._attention_layer_norm(input_tensor) - if key_value is not None: - key_value = self._attention_layer_norm(key_value) - target_tensor = input_tensor[:, 0:self._output_range, :] - if attention_mask is not None: - attention_mask = attention_mask[:, 0:self._output_range, :] - else: - if self._norm_first: - source_tensor = input_tensor - input_tensor = 
self._attention_layer_norm(input_tensor) - if key_value is not None: - key_value = self._attention_layer_norm(key_value) - target_tensor = input_tensor - - if key_value is None: - key_value = input_tensor - attention_output = self._attention_layer( - query=target_tensor, value=key_value, attention_mask=attention_mask) - attention_output = self._attention_dropout(attention_output) - - if self._norm_first: - attention_output = source_tensor + self._stochastic_depth( - attention_output, training=training) - else: - attention_output = self._attention_layer_norm( - target_tensor + - self._stochastic_depth(attention_output, training=training)) - - if self._norm_first: - source_attention_output = attention_output - attention_output = self._output_layer_norm(attention_output) - inner_output = self._intermediate_dense(attention_output) - inner_output = self._intermediate_activation_layer(inner_output) - inner_output = self._inner_dropout_layer(inner_output) - layer_output = self._output_dense(inner_output) - layer_output = self._output_dropout(layer_output) - - if self._norm_first: - return source_attention_output + self._stochastic_depth( - layer_output, training=training) - - # During mixed precision training, layer norm output is always fp32 for now. - # Casts fp32 for the subsequent add. - layer_output = tf.cast(layer_output, tf.float32) - return self._output_layer_norm( - layer_output + - self._stochastic_depth(attention_output, training=training)) diff --git a/official/vision/beta/projects/vit/modeling/vit.py b/official/vision/beta/projects/vit/modeling/vit.py deleted file mode 100644 index ae1e8ea948c6df161ca3beb7eeaf7402198bec38..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/vit/modeling/vit.py +++ /dev/null @@ -1,278 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""VisionTransformer models.""" -import tensorflow as tf - -from official.modeling import activations -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.modeling.layers import nn_layers -from official.vision.beta.projects.vit.modeling import nn_blocks - -layers = tf.keras.layers - -VIT_SPECS = { - 'vit-ti16': - dict( - hidden_size=192, - patch_size=16, - transformer=dict(mlp_dim=768, num_heads=3, num_layers=12), - ), - 'vit-s16': - dict( - hidden_size=384, - patch_size=16, - transformer=dict(mlp_dim=1536, num_heads=6, num_layers=12), - ), - 'vit-b16': - dict( - hidden_size=768, - patch_size=16, - transformer=dict(mlp_dim=3072, num_heads=12, num_layers=12), - ), - 'vit-b32': - dict( - hidden_size=768, - patch_size=32, - transformer=dict(mlp_dim=3072, num_heads=12, num_layers=12), - ), - 'vit-l16': - dict( - hidden_size=1024, - patch_size=16, - transformer=dict(mlp_dim=4096, num_heads=16, num_layers=24), - ), - 'vit-l32': - dict( - hidden_size=1024, - patch_size=32, - transformer=dict(mlp_dim=4096, num_heads=16, num_layers=24), - ), - 'vit-h14': - dict( - hidden_size=1280, - patch_size=14, - transformer=dict(mlp_dim=5120, num_heads=16, num_layers=32), - ), - 'vit-g14': - dict( - hidden_size=1664, - patch_size=14, - transformer=dict(mlp_dim=8192, num_heads=16, num_layers=48), - ), -} - - -class AddPositionEmbs(tf.keras.layers.Layer): - """Adds (optionally learned) positional embeddings to the inputs.""" - - def __init__(self, posemb_init=None, **kwargs): - super().__init__(**kwargs) - self.posemb_init 
= posemb_init - - def build(self, inputs_shape): - pos_emb_shape = (1, inputs_shape[1], inputs_shape[2]) - self.pos_embedding = self.add_weight( - 'pos_embedding', pos_emb_shape, initializer=self.posemb_init) - - def call(self, inputs, inputs_positions=None): - # inputs.shape is (batch_size, seq_len, emb_dim). - pos_embedding = tf.cast(self.pos_embedding, inputs.dtype) - - return inputs + pos_embedding - - -class TokenLayer(tf.keras.layers.Layer): - """A simple layer to wrap token parameters.""" - - def build(self, inputs_shape): - self.cls = self.add_weight( - 'cls', (1, 1, inputs_shape[-1]), initializer='zeros') - - def call(self, inputs): - cls = tf.cast(self.cls, inputs.dtype) - cls = cls + tf.zeros_like(inputs[:, 0:1]) # A hacky way to tile. - x = tf.concat([cls, inputs], axis=1) - return x - - -class Encoder(tf.keras.layers.Layer): - """Transformer Encoder.""" - - def __init__(self, - num_layers, - mlp_dim, - num_heads, - dropout_rate=0.1, - attention_dropout_rate=0.1, - kernel_regularizer=None, - inputs_positions=None, - init_stochastic_depth_rate=0.0, - kernel_initializer='glorot_uniform', - **kwargs): - super().__init__(**kwargs) - self._num_layers = num_layers - self._mlp_dim = mlp_dim - self._num_heads = num_heads - self._dropout_rate = dropout_rate - self._attention_dropout_rate = attention_dropout_rate - self._kernel_regularizer = kernel_regularizer - self._inputs_positions = inputs_positions - self._init_stochastic_depth_rate = init_stochastic_depth_rate - self._kernel_initializer = kernel_initializer - - def build(self, input_shape): - self._pos_embed = AddPositionEmbs( - posemb_init=tf.keras.initializers.RandomNormal(stddev=0.02), - name='posembed_input') - self._dropout = layers.Dropout(rate=self._dropout_rate) - - self._encoder_layers = [] - # Set layer norm epsilons to 1e-6 to be consistent with JAX implementation. 
- # https://flax.readthedocs.io/en/latest/_autosummary/flax.nn.LayerNorm.html - for i in range(self._num_layers): - encoder_layer = nn_blocks.TransformerEncoderBlock( - inner_activation=activations.gelu, - num_attention_heads=self._num_heads, - inner_dim=self._mlp_dim, - output_dropout=self._dropout_rate, - attention_dropout=self._attention_dropout_rate, - kernel_regularizer=self._kernel_regularizer, - kernel_initializer=self._kernel_initializer, - norm_first=True, - stochastic_depth_drop_rate=nn_layers.get_stochastic_depth_rate( - self._init_stochastic_depth_rate, i + 1, self._num_layers), - norm_epsilon=1e-6) - self._encoder_layers.append(encoder_layer) - self._norm = layers.LayerNormalization(epsilon=1e-6) - super().build(input_shape) - - def call(self, inputs, training=None): - x = self._pos_embed(inputs, inputs_positions=self._inputs_positions) - x = self._dropout(x, training=training) - - for encoder_layer in self._encoder_layers: - x = encoder_layer(x, training=training) - x = self._norm(x) - return x - - -class VisionTransformer(tf.keras.Model): - """Class to build VisionTransformer family model.""" - - def __init__(self, - mlp_dim=3072, - num_heads=12, - num_layers=12, - attention_dropout_rate=0.0, - dropout_rate=0.1, - init_stochastic_depth_rate=0.0, - input_specs=layers.InputSpec(shape=[None, None, None, 3]), - patch_size=16, - hidden_size=768, - representation_size=0, - classifier='token', - kernel_regularizer=None, - original_init=True): - """VisionTransformer initialization function.""" - inputs = tf.keras.Input(shape=input_specs.shape[1:]) - - x = layers.Conv2D( - filters=hidden_size, - kernel_size=patch_size, - strides=patch_size, - padding='valid', - kernel_regularizer=kernel_regularizer, - kernel_initializer='lecun_normal' if original_init else 'he_uniform')( - inputs) - if tf.keras.backend.image_data_format() == 'channels_last': - rows_axis, cols_axis = (1, 2) - else: - rows_axis, cols_axis = (2, 3) - # The reshape below assumes the data_format 
is 'channels_last,' so - # transpose to that. Once the data is flattened by the reshape, the - # data_format is irrelevant, so no need to update - # tf.keras.backend.image_data_format. - x = tf.transpose(x, perm=[0, 2, 3, 1]) - seq_len = (input_specs.shape[rows_axis] // patch_size) * ( - input_specs.shape[cols_axis] // patch_size) - x = tf.reshape(x, [-1, seq_len, hidden_size]) - - # If we want to add a class token, add it here. - if classifier == 'token': - x = TokenLayer(name='cls')(x) - - x = Encoder( - num_layers=num_layers, - mlp_dim=mlp_dim, - num_heads=num_heads, - dropout_rate=dropout_rate, - attention_dropout_rate=attention_dropout_rate, - kernel_regularizer=kernel_regularizer, - kernel_initializer='glorot_uniform' if original_init else dict( - class_name='TruncatedNormal', config=dict(stddev=.02)), - init_stochastic_depth_rate=init_stochastic_depth_rate)( - x) - - if classifier == 'token': - x = x[:, 0] - elif classifier == 'gap': - x = tf.reduce_mean(x, axis=1) - - if representation_size: - x = tf.keras.layers.Dense( - representation_size, - kernel_regularizer=kernel_regularizer, - name='pre_logits', - kernel_initializer='lecun_normal' if original_init else 'he_uniform')( - x) - x = tf.nn.tanh(x) - else: - x = tf.identity(x, name='pre_logits') - endpoints = { - 'pre_logits': - tf.reshape(x, [-1, 1, 1, representation_size or hidden_size]) - } - - super(VisionTransformer, self).__init__(inputs=inputs, outputs=endpoints) - - -@factory.register_backbone_builder('vit') -def build_vit(input_specs, - backbone_config, - norm_activation_config, - l2_regularizer=None): - """Build ViT model.""" - del norm_activation_config - backbone_type = backbone_config.type - backbone_cfg = backbone_config.get() - assert backbone_type == 'vit', (f'Inconsistent backbone type ' - f'{backbone_type}') - backbone_cfg.override(VIT_SPECS[backbone_cfg.model_name]) - - return VisionTransformer( - mlp_dim=backbone_cfg.transformer.mlp_dim, - num_heads=backbone_cfg.transformer.num_heads, - 
num_layers=backbone_cfg.transformer.num_layers, - attention_dropout_rate=backbone_cfg.transformer.attention_dropout_rate, - dropout_rate=backbone_cfg.transformer.dropout_rate, - init_stochastic_depth_rate=backbone_cfg.init_stochastic_depth_rate, - input_specs=input_specs, - patch_size=backbone_cfg.patch_size, - hidden_size=backbone_cfg.hidden_size, - representation_size=backbone_cfg.representation_size, - classifier=backbone_cfg.classifier, - kernel_regularizer=l2_regularizer, - original_init=backbone_cfg.original_init) diff --git a/official/vision/beta/projects/vit/modeling/vit_test.py b/official/vision/beta/projects/vit/modeling/vit_test.py deleted file mode 100644 index 7a9b2ac4d1482f3c030183c2de511f920917be87..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/vit/modeling/vit_test.py +++ /dev/null @@ -1,43 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Tests for VIT.""" - -from absl.testing import parameterized -import tensorflow as tf - -from official.vision.beta.projects.vit.modeling import vit - - -class VisionTransformerTest(parameterized.TestCase, tf.test.TestCase): - - @parameterized.parameters( - (224, 85798656), - (256, 85844736), - ) - def test_network_creation(self, input_size, params_count): - """Test creation of VisionTransformer family models.""" - tf.keras.backend.set_image_data_format('channels_last') - input_specs = tf.keras.layers.InputSpec( - shape=[2, input_size, input_size, 3]) - network = vit.VisionTransformer(input_specs=input_specs) - - inputs = tf.keras.Input(shape=(input_size, input_size, 3), batch_size=1) - _ = network(inputs) - self.assertEqual(network.count_params(), params_count) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/projects/vit/train.py b/official/vision/beta/projects/vit/train.py deleted file mode 100644 index 46a6a1b5e8c433b3ffaec115e2a2c27b84b38320..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/vit/train.py +++ /dev/null @@ -1,28 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""TensorFlow Model Garden Vision training driver, including ViT configs..""" - -from absl import app - -from official.common import flags as tfm_flags -from official.vision.beta import train -from official.vision.beta.projects.vit import configs # pylint: disable=unused-import -from official.vision.beta.projects.vit.modeling import vit # pylint: disable=unused-import - - -if __name__ == '__main__': - tfm_flags.define_flags() - app.run(train.main) diff --git a/official/vision/beta/projects/yolo/README.md b/official/vision/beta/projects/yolo/README.md deleted file mode 100644 index a39fdf0d2884297f91d34f64708e850a7df5a3db..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/README.md +++ /dev/null @@ -1,88 +0,0 @@ -DISCLAIMER: this YOLO implementation is still under development. No support will -be provided during the development phase. - -# YOLO Object Detectors, You Only Look Once - -[![Paper](http://img.shields.io/badge/Paper-arXiv.1804.02767-B3181B?logo=arXiv)](https://arxiv.org/abs/1804.02767) -[![Paper](http://img.shields.io/badge/Paper-arXiv.2004.10934-B3181B?logo=arXiv)](https://arxiv.org/abs/2004.10934) - -This repository is the unofficial implementation of the following papers. -However, we spent painstaking hours ensuring that every aspect that we -constructed was the exact same as the original paper and the original -repository. - -* YOLOv3: An Incremental Improvement: [YOLOv3: An Incremental Improvement](https://arxiv.org/abs/1804.02767) - -* YOLOv4: Optimal Speed and Accuracy of Object Detection: [YOLOv4: Optimal Speed and Accuracy of Object Detection](https://arxiv.org/abs/2004.10934) - -## Description - -YOLO v1 the original implementation was released in 2015 providing a -ground breaking algorithm that would quickly process images and locate objects -in a single pass through the detector. 
The original implementation used a -backbone derived from state of the art object classifiers of the time, like -[GoogLeNet](https://arxiv.org/abs/1409.4842) and -[VGG](https://arxiv.org/abs/1409.1556). More attention was given to the novel -YOLO Detection head that allowed for Object Detection with a single pass of an -image. Though limited, the network could predict up to 90 bounding boxes per -image, and was tested for about 80 classes per box. Also, the model can only -make predictions at one scale. These attributes caused YOLO v1 to be more -limited and less versatile, so as the year passed, the Developers continued to -update and develop this model. - -YOLO v3 and v4 serve as the most up to date and capable versions of the YOLO -network group. This model uses a custom backbone called Darknet53 that uses -knowledge gained from the ResNet paper to improve its predictions. The new -backbone also allows for objects to be detected at multiple scales. As for the -new detection head, the model now predicts the bounding boxes using a set of -anchor box priors (Anchor Boxes) as suggestions. Multiscale predictions in -combination with Anchor boxes allow for the network to make up to 1000 object -predictions on a single image. Finally, the new loss function forces the network -to make better predictions by using Intersection Over Union (IOU) to inform the -model's confidence rather than relying on the mean squared error for the entire -output. 
- - -## Authors - -* Vishnu Samardh Banna ([@GitHub vishnubanna](https://github.com/vishnubanna)) -* Anirudh Vegesana ([@GitHub anivegesana](https://github.com/anivegesana)) -* Akhil Chinnakotla ([@GitHub The-Indian-Chinna](https://github.com/The-Indian-Chinna)) -* Tristan Yan ([@GitHub Tyan3001](https://github.com/Tyan3001)) -* Naveen Vivek ([@GitHub naveen-vivek](https://github.com/naveen-vivek)) - -## Table of Contents - -* [Our Goal](#our-goal) -* [Models in the library](#models-in-the-library) -* [References](#references) - - -## Our Goal - -Our goal with this model conversion is to provide implementation of the Backbone -and YOLO Head. We have built the model in such a way that the YOLO head could be -connected to a new, more powerful backbone if a person chose to. - -## Models in the library - -| Object Detectors | Classifiers | -| :--------------: | :--------------: | -| Yolo-v3 | Darknet53 | -| Yolo-v3 tiny | CSPDarknet53 | -| Yolo-v3 spp | -| Yolo-v4 | -| Yolo-v4 tiny | -| Yolo-v4 csp | -| Yolo-v4 large | - -## Models Zoo - - -## Requirements -[![TensorFlow 2.6](https://img.shields.io/badge/TensorFlow-2.6-FF6F00?logo=tensorflow)](https://github.com/tensorflow/tensorflow/releases/tag/v2.6.0) -[![Python 3.8](https://img.shields.io/badge/Python-3.8-3776AB)](https://www.python.org/downloads/release/python-380/) - - -DISCLAIMER: this YOLO implementation is still under development. No support -will be provided during the development phase. diff --git a/official/vision/beta/projects/yolo/common/registry_imports.py b/official/vision/beta/projects/yolo/common/registry_imports.py deleted file mode 100644 index e40d39856a703e2f17a0f144ccd880eb3999647b..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/common/registry_imports.py +++ /dev/null @@ -1,36 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""All necessary imports for registration.""" - -# pylint: disable=unused-import -# pylint: disable=g-bad-import-order -from official.common import registry_imports - -# import configs -from official.vision.beta.projects.yolo.configs import darknet_classification -from official.vision.beta.projects.yolo.configs import yolo as yolo_config - -# import modeling components -from official.vision.beta.projects.yolo.modeling.backbones import darknet -from official.vision.beta.projects.yolo.modeling.decoders import yolo_decoder - -# import tasks -from official.vision.beta.projects.yolo.tasks import image_classification -from official.vision.beta.projects.yolo.tasks import yolo as yolo_task - -# import optimization packages -from official.vision.beta.projects.yolo.optimization import optimizer_factory -from official.vision.beta.projects.yolo.optimization.configs import optimizer_config -from official.vision.beta.projects.yolo.optimization.configs import optimization_config diff --git a/official/vision/beta/projects/yolo/configs/backbones.py b/official/vision/beta/projects/yolo/configs/backbones.py deleted file mode 100644 index 071af5bdef7db8830f96e96c4f905f9c4772fc16..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/configs/backbones.py +++ /dev/null @@ -1,36 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Backbones configurations.""" -import dataclasses -from official.modeling import hyperparams -from official.vision.beta.configs import backbones - - -@dataclasses.dataclass -class Darknet(hyperparams.Config): - """DarkNet config.""" - model_id: str = 'cspdarknet53' - width_scale: float = 1.0 - depth_scale: float = 1.0 - dilate: bool = False - min_level: int = 3 - max_level: int = 5 - use_separable_conv: bool = False - use_reorg_input: bool = False - - -@dataclasses.dataclass -class Backbone(backbones.Backbone): - darknet: Darknet = Darknet() diff --git a/official/vision/beta/projects/yolo/configs/decoders.py b/official/vision/beta/projects/yolo/configs/decoders.py deleted file mode 100755 index 7a4f4a6d997db9e6936d70f42eb8cbdde9c94adf..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/configs/decoders.py +++ /dev/null @@ -1,48 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Decoders configurations.""" -import dataclasses -from typing import Optional -from official.modeling import hyperparams -from official.vision.beta.configs import decoders - - -@dataclasses.dataclass -class YoloDecoder(hyperparams.Config): - """Builds Yolo decoder. - - If the name is specified, or version is specified we ignore input parameters - and use version and name defaults. - """ - version: Optional[str] = None - type: Optional[str] = None - use_fpn: Optional[bool] = None - use_spatial_attention: bool = False - use_separable_conv: bool = False - csp_stack: Optional[bool] = None - fpn_depth: Optional[int] = None - max_fpn_depth: Optional[int] = None - max_csp_stack: Optional[int] = None - fpn_filter_scale: Optional[int] = None - path_process_len: Optional[int] = None - max_level_process_len: Optional[int] = None - embed_spp: Optional[bool] = None - activation: Optional[str] = 'same' - - -@dataclasses.dataclass -class Decoder(decoders.Decoder): - type: Optional[str] = 'yolo_decoder' - yolo_decoder: YoloDecoder = YoloDecoder() diff --git a/official/vision/beta/projects/yolo/configs/yolo.py b/official/vision/beta/projects/yolo/configs/yolo.py deleted file mode 100755 index bb529f41b4ab1c9971deeb643f2d90de09060948..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/configs/yolo.py +++ /dev/null @@ -1,510 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""YOLO configuration definition.""" -import dataclasses -import os -from typing import Any, List, Optional, Union - -import numpy as np - -from official.core import config_definitions as cfg -from official.core import exp_factory -from official.modeling import hyperparams -from official.vision.beta.configs import common -from official.vision.beta.projects.yolo import optimization -from official.vision.beta.projects.yolo.configs import backbones -from official.vision.beta.projects.yolo.configs import decoders - - -# pytype: disable=annotation-type-mismatch - -MIN_LEVEL = 1 -MAX_LEVEL = 7 -GLOBAL_SEED = 1000 - - -def _build_dict(min_level, max_level, value): - vals = {str(key): value for key in range(min_level, max_level + 1)} - vals['all'] = None - return lambda: vals - - -def _build_path_scales(min_level, max_level): - return lambda: {str(key): 2**key for key in range(min_level, max_level + 1)} - - -@dataclasses.dataclass -class FPNConfig(hyperparams.Config): - """FPN config.""" - all: Optional[Any] = None - - def get(self): - """Allow for a key for each level or a single key for all the levels.""" - values = self.as_dict() - if 'all' in values and values['all'] is not None: - for key in values: - if key != 'all': - values[key] = values['all'] - return values - - -# pylint: disable=missing-class-docstring -@dataclasses.dataclass -class TfExampleDecoder(hyperparams.Config): - regenerate_source_id: bool = False - coco91_to_80: bool = True - - -@dataclasses.dataclass -class TfExampleDecoderLabelMap(hyperparams.Config): - regenerate_source_id: bool = False - label_map: str = '' - - -@dataclasses.dataclass -class DataDecoder(hyperparams.OneOfConfig): - type: Optional[str] = 'simple_decoder' - simple_decoder: TfExampleDecoder = TfExampleDecoder() - label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap() - - -@dataclasses.dataclass -class 
Mosaic(hyperparams.Config): - mosaic_frequency: float = 0.0 - mixup_frequency: float = 0.0 - mosaic_center: float = 0.2 - mosaic_crop_mode: Optional[str] = None - aug_scale_min: float = 1.0 - aug_scale_max: float = 1.0 - jitter: float = 0.0 - - -@dataclasses.dataclass -class Parser(hyperparams.Config): - max_num_instances: int = 200 - letter_box: Optional[bool] = True - random_flip: bool = True - random_pad: float = False - jitter: float = 0.0 - aug_scale_min: float = 1.0 - aug_scale_max: float = 1.0 - aug_rand_saturation: float = 0.0 - aug_rand_brightness: float = 0.0 - aug_rand_hue: float = 0.0 - aug_rand_angle: float = 0.0 - aug_rand_translate: float = 0.0 - aug_rand_perspective: float = 0.0 - use_tie_breaker: bool = True - best_match_only: bool = False - anchor_thresh: float = -0.01 - area_thresh: float = 0.1 - mosaic: Mosaic = Mosaic() - - -@dataclasses.dataclass -class DataConfig(cfg.DataConfig): - """Input config for training.""" - global_batch_size: int = 64 - input_path: str = '' - tfds_name: str = '' - tfds_split: str = '' - global_batch_size: int = 1 - is_training: bool = True - dtype: str = 'float16' - decoder: DataDecoder = DataDecoder() - parser: Parser = Parser() - shuffle_buffer_size: int = 10000 - tfds_download: bool = True - cache: bool = False - drop_remainder: bool = True - - -@dataclasses.dataclass -class YoloHead(hyperparams.Config): - """Parameterization for the YOLO Head.""" - smart_bias: bool = True - - -@dataclasses.dataclass -class YoloDetectionGenerator(hyperparams.Config): - box_type: FPNConfig = dataclasses.field( - default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 'original')) - scale_xy: FPNConfig = dataclasses.field( - default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 1.0)) - path_scales: FPNConfig = dataclasses.field( - default_factory=_build_path_scales(MIN_LEVEL, MAX_LEVEL)) - nms_type: str = 'greedy' - iou_thresh: float = 0.001 - nms_thresh: float = 0.6 - max_boxes: int = 200 - pre_nms_points: int = 5000 - - 
-@dataclasses.dataclass -class YoloLoss(hyperparams.Config): - ignore_thresh: FPNConfig = dataclasses.field( - default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 0.0)) - truth_thresh: FPNConfig = dataclasses.field( - default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 1.0)) - box_loss_type: FPNConfig = dataclasses.field( - default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 'ciou')) - iou_normalizer: FPNConfig = dataclasses.field( - default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 1.0)) - cls_normalizer: FPNConfig = dataclasses.field( - default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 1.0)) - object_normalizer: FPNConfig = dataclasses.field( - default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 1.0)) - max_delta: FPNConfig = dataclasses.field( - default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, np.inf)) - objectness_smooth: FPNConfig = dataclasses.field( - default_factory=_build_dict(MIN_LEVEL, MAX_LEVEL, 0.0)) - label_smoothing: float = 0.0 - use_scaled_loss: bool = True - update_on_repeat: bool = True - - -@dataclasses.dataclass -class Box(hyperparams.Config): - box: List[int] = dataclasses.field(default=list) - - -@dataclasses.dataclass -class AnchorBoxes(hyperparams.Config): - boxes: Optional[List[Box]] = None - level_limits: Optional[List[int]] = None - anchors_per_scale: int = 3 - - def get(self, min_level, max_level): - """Distribute them in order to each level. - - Args: - min_level: `int` the lowest output level. - max_level: `int` the heighest output level. - Returns: - anchors_per_level: A `Dict[List[int]]` of the anchor boxes for each level. - self.level_limits: A `List[int]` of the box size limits to link to each - level under anchor free conditions. 
- """ - if self.level_limits is None: - boxes = [box.box for box in self.boxes] - else: - boxes = [[1.0, 1.0]] * ((max_level - min_level) + 1) - self.anchors_per_scale = 1 - - anchors_per_level = dict() - start = 0 - for i in range(min_level, max_level + 1): - anchors_per_level[str(i)] = boxes[start:start + self.anchors_per_scale] - start += self.anchors_per_scale - return anchors_per_level, self.level_limits - - -@dataclasses.dataclass -class Yolo(hyperparams.Config): - input_size: Optional[List[int]] = dataclasses.field( - default_factory=lambda: [512, 512, 3]) - backbone: backbones.Backbone = backbones.Backbone( - type='darknet', darknet=backbones.Darknet(model_id='cspdarknet53')) - decoder: decoders.Decoder = decoders.Decoder( - type='yolo_decoder', - yolo_decoder=decoders.YoloDecoder(version='v4', type='regular')) - head: YoloHead = YoloHead() - detection_generator: YoloDetectionGenerator = YoloDetectionGenerator() - loss: YoloLoss = YoloLoss() - norm_activation: common.NormActivation = common.NormActivation( - activation='mish', - use_sync_bn=True, - norm_momentum=0.99, - norm_epsilon=0.001) - num_classes: int = 80 - anchor_boxes: AnchorBoxes = AnchorBoxes() - darknet_based_model: bool = False - - -@dataclasses.dataclass -class YoloTask(cfg.TaskConfig): - per_category_metrics: bool = False - smart_bias_lr: float = 0.0 - model: Yolo = Yolo() - train_data: DataConfig = DataConfig(is_training=True) - validation_data: DataConfig = DataConfig(is_training=False) - weight_decay: float = 0.0 - annotation_file: Optional[str] = None - init_checkpoint: Optional[str] = None - init_checkpoint_modules: Union[ - str, List[str]] = 'all' # all, backbone, and/or decoder - gradient_clip_norm: float = 0.0 - seed = GLOBAL_SEED - - -COCO_INPUT_PATH_BASE = 'coco' -COCO_TRAIN_EXAMPLES = 118287 -COCO_VAL_EXAMPLES = 5000 - - -@exp_factory.register_config_factory('yolo') -def yolo() -> cfg.ExperimentConfig: - """Yolo general config.""" - return cfg.ExperimentConfig( - task=YoloTask(), 
- restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - -@exp_factory.register_config_factory('yolo_darknet') -def yolo_darknet() -> cfg.ExperimentConfig: - """COCO object detection with YOLOv3 and v4.""" - train_batch_size = 256 - eval_batch_size = 8 - train_epochs = 300 - steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size - validation_interval = 5 - - max_num_instances = 200 - config = cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), - task=YoloTask( - smart_bias_lr=0.1, - init_checkpoint='', - init_checkpoint_modules='backbone', - annotation_file=None, - weight_decay=0.0, - model=Yolo( - darknet_based_model=True, - norm_activation=common.NormActivation(use_sync_bn=True), - head=YoloHead(smart_bias=True), - loss=YoloLoss(use_scaled_loss=False, update_on_repeat=True), - anchor_boxes=AnchorBoxes( - anchors_per_scale=3, - boxes=[ - Box(box=[12, 16]), - Box(box=[19, 36]), - Box(box=[40, 28]), - Box(box=[36, 75]), - Box(box=[76, 55]), - Box(box=[72, 146]), - Box(box=[142, 110]), - Box(box=[192, 243]), - Box(box=[459, 401]) - ])), - train_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size, - dtype='float32', - parser=Parser( - letter_box=False, - aug_rand_saturation=1.5, - aug_rand_brightness=1.5, - aug_rand_hue=0.1, - use_tie_breaker=True, - best_match_only=False, - anchor_thresh=0.4, - area_thresh=0.1, - max_num_instances=max_num_instances, - mosaic=Mosaic( - mosaic_frequency=0.75, - mixup_frequency=0.0, - mosaic_crop_mode='crop', - mosaic_center=0.2))), - validation_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), - is_training=False, - global_batch_size=eval_batch_size, - drop_remainder=True, - dtype='float32', - parser=Parser( - letter_box=False, - use_tie_breaker=True, - best_match_only=False, - anchor_thresh=0.4, - area_thresh=0.1, - 
max_num_instances=max_num_instances, - ))), - trainer=cfg.TrainerConfig( - train_steps=train_epochs * steps_per_epoch, - validation_steps=COCO_VAL_EXAMPLES // eval_batch_size, - validation_interval=validation_interval * steps_per_epoch, - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'ema': { - 'average_decay': 0.9998, - 'trainable_weights_only': False, - 'dynamic_decay': True, - }, - 'optimizer': { - 'type': 'sgd_torch', - 'sgd_torch': { - 'momentum': 0.949, - 'momentum_start': 0.949, - 'nesterov': True, - 'warmup_steps': 1000, - 'weight_decay': 0.0005, - } - }, - 'learning_rate': { - 'type': 'stepwise', - 'stepwise': { - 'boundaries': [ - 240 * steps_per_epoch - ], - 'values': [ - 0.00131 * train_batch_size / 64.0, - 0.000131 * train_batch_size / 64.0, - ] - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 1000, - 'warmup_learning_rate': 0 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - return config - - -@exp_factory.register_config_factory('scaled_yolo') -def scaled_yolo() -> cfg.ExperimentConfig: - """COCO object detection with YOLOv4-csp and v4.""" - train_batch_size = 256 - eval_batch_size = 8 - train_epochs = 300 - warmup_epochs = 3 - - validation_interval = 5 - steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size - - max_num_instances = 300 - - config = cfg.ExperimentConfig( - runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), - task=YoloTask( - smart_bias_lr=0.1, - init_checkpoint_modules='', - annotation_file=None, - weight_decay=0.0, - model=Yolo( - darknet_based_model=False, - norm_activation=common.NormActivation( - activation='mish', - use_sync_bn=True, - norm_epsilon=0.001, - norm_momentum=0.97), - head=YoloHead(smart_bias=True), - loss=YoloLoss(use_scaled_loss=True), - anchor_boxes=AnchorBoxes( - anchors_per_scale=3, - 
boxes=[ - Box(box=[12, 16]), - Box(box=[19, 36]), - Box(box=[40, 28]), - Box(box=[36, 75]), - Box(box=[76, 55]), - Box(box=[72, 146]), - Box(box=[142, 110]), - Box(box=[192, 243]), - Box(box=[459, 401]) - ])), - train_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), - is_training=True, - global_batch_size=train_batch_size, - dtype='float32', - parser=Parser( - aug_rand_saturation=0.7, - aug_rand_brightness=0.4, - aug_rand_hue=0.015, - letter_box=True, - use_tie_breaker=True, - best_match_only=True, - anchor_thresh=4.0, - random_pad=False, - area_thresh=0.1, - max_num_instances=max_num_instances, - mosaic=Mosaic( - mosaic_crop_mode='scale', - mosaic_frequency=1.0, - mixup_frequency=0.0, - ))), - validation_data=DataConfig( - input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), - is_training=False, - global_batch_size=eval_batch_size, - drop_remainder=True, - dtype='float32', - parser=Parser( - letter_box=True, - use_tie_breaker=True, - best_match_only=True, - anchor_thresh=4.0, - area_thresh=0.1, - max_num_instances=max_num_instances, - ))), - trainer=cfg.TrainerConfig( - train_steps=train_epochs * steps_per_epoch, - validation_steps=COCO_VAL_EXAMPLES // eval_batch_size, - validation_interval=validation_interval * steps_per_epoch, - steps_per_loop=steps_per_epoch, - summary_interval=steps_per_epoch, - checkpoint_interval=5 * steps_per_epoch, - optimizer_config=optimization.OptimizationConfig({ - 'ema': { - 'average_decay': 0.9999, - 'trainable_weights_only': False, - 'dynamic_decay': True, - }, - 'optimizer': { - 'type': 'sgd_torch', - 'sgd_torch': { - 'momentum': 0.937, - 'momentum_start': 0.8, - 'nesterov': True, - 'warmup_steps': steps_per_epoch * warmup_epochs, - 'weight_decay': 0.0005, - } - }, - 'learning_rate': { - 'type': 'cosine', - 'cosine': { - 'initial_learning_rate': 0.01, - 'alpha': 0.2, - 'decay_steps': train_epochs * steps_per_epoch, - } - }, - 'warmup': { - 'type': 'linear', - 'linear': { - 'warmup_steps': 
steps_per_epoch * warmup_epochs, - 'warmup_learning_rate': 0 - } - } - })), - restrictions=[ - 'task.train_data.is_training != None', - 'task.validation_data.is_training != None' - ]) - - return config diff --git a/official/vision/beta/projects/yolo/dataloaders/__init__.py b/official/vision/beta/projects/yolo/dataloaders/__init__.py deleted file mode 100644 index a25710c222e3327cb20e000db5df5c5651c4a2cc..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/dataloaders/__init__.py +++ /dev/null @@ -1,15 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - diff --git a/official/vision/beta/projects/yolo/dataloaders/classification_input.py b/official/vision/beta/projects/yolo/dataloaders/classification_input.py deleted file mode 100755 index 57d7ec2382ab2660d86c6346ca0d8af0ea2c23a3..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/dataloaders/classification_input.py +++ /dev/null @@ -1,92 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Classification decoder and parser.""" -import tensorflow as tf -from official.vision.beta.dataloaders import classification_input -from official.vision.beta.ops import preprocess_ops - - -class Parser(classification_input.Parser): - """Parser to parse an image and its annotations into a dictionary of tensors.""" - - def _parse_train_image(self, decoded_tensors): - """Parses image data for training.""" - image_bytes = decoded_tensors[self._image_field_key] - - if self._decode_jpeg_only: - image_shape = tf.image.extract_jpeg_shape(image_bytes) - - # Crops image. - cropped_image = preprocess_ops.random_crop_image_v2( - image_bytes, image_shape) - image = tf.cond( - tf.reduce_all(tf.equal(tf.shape(cropped_image), image_shape)), - lambda: preprocess_ops.center_crop_image_v2(image_bytes, image_shape), - lambda: cropped_image) - else: - # Decodes image. - image = tf.io.decode_image(image_bytes, channels=3) - image.set_shape([None, None, 3]) - - # Crops image. - cropped_image = preprocess_ops.random_crop_image(image) - - image = tf.cond( - tf.reduce_all(tf.equal(tf.shape(cropped_image), tf.shape(image))), - lambda: preprocess_ops.center_crop_image(image), - lambda: cropped_image) - - if self._aug_rand_hflip: - image = tf.image.random_flip_left_right(image) - - # Resizes image. - image = tf.image.resize( - image, self._output_size, method=tf.image.ResizeMethod.BILINEAR) - image.set_shape([self._output_size[0], self._output_size[1], 3]) - - # Apply autoaug or randaug. 
- if self._augmenter is not None: - image = self._augmenter.distort(image) - - # Convert image to self._dtype. - image = tf.image.convert_image_dtype(image, self._dtype) - image = image / 255.0 - return image - - def _parse_eval_image(self, decoded_tensors): - """Parses image data for evaluation.""" - image_bytes = decoded_tensors[self._image_field_key] - - if self._decode_jpeg_only: - image_shape = tf.image.extract_jpeg_shape(image_bytes) - - # Center crops. - image = preprocess_ops.center_crop_image_v2(image_bytes, image_shape) - else: - # Decodes image. - image = tf.io.decode_image(image_bytes, channels=3) - image.set_shape([None, None, 3]) - - # Center crops. - image = preprocess_ops.center_crop_image(image) - - image = tf.image.resize( - image, self._output_size, method=tf.image.ResizeMethod.BILINEAR) - image.set_shape([self._output_size[0], self._output_size[1], 3]) - - # Convert image to self._dtype. - image = tf.image.convert_image_dtype(image, self._dtype) - image = image / 255.0 - return image diff --git a/official/vision/beta/projects/yolo/dataloaders/tf_example_decoder.py b/official/vision/beta/projects/yolo/dataloaders/tf_example_decoder.py deleted file mode 100644 index 032a20a52e3cb3494b420a845d076a6299879fd2..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/dataloaders/tf_example_decoder.py +++ /dev/null @@ -1,119 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Tensorflow Example proto decoder for object detection. - -A decoder to decode string tensors containing serialized tensorflow.Example -protos for object detection. -""" -import tensorflow as tf - -from official.vision.beta.dataloaders import tf_example_decoder - - -def _coco91_to_80(classif, box, areas, iscrowds): - """Function used to reduce COCO 91 to COCO 80 (2017 to 2014 format).""" - # Vector where index i coralates to the class at index[i]. - class_ids = [ - 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, - 23, 24, 25, 27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, - 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, - 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85, - 86, 87, 88, 89, 90 - ] - new_classes = tf.expand_dims(tf.convert_to_tensor(class_ids), axis=0) - - # Resahpe the classes to in order to build a class mask. - classes = tf.expand_dims(classif, axis=-1) - - # One hot the classificiations to match the 80 class format. - ind = classes == tf.cast(new_classes, classes.dtype) - - # Select the max values. - selected_class = tf.reshape( - tf.math.argmax(tf.cast(ind, tf.float32), axis=-1), [-1]) - ind = tf.where(tf.reduce_any(ind, axis=-1)) - - # Gather the valuable instances. - classif = tf.gather_nd(selected_class, ind) - box = tf.gather_nd(box, ind) - areas = tf.gather_nd(areas, ind) - iscrowds = tf.gather_nd(iscrowds, ind) - - # Restate the number of viable detections, ideally it should be the same. 
- num_detections = tf.shape(classif)[0] - return classif, box, areas, iscrowds, num_detections - - -class TfExampleDecoder(tf_example_decoder.TfExampleDecoder): - """Tensorflow Example proto decoder.""" - - def __init__(self, - coco91_to_80=None, - include_mask=False, - regenerate_source_id=False, - mask_binarize_threshold=None): - """Initialize the example decoder. - - Args: - coco91_to_80: `bool` indicating whether to convert coco from its 91 class - format to the 80 class format. - include_mask: `bool` indicating if the decoder should also decode instance - masks for instance segmentation. - regenerate_source_id: `bool` indicating if the source id needs to be - recreated for each image sample. - mask_binarize_threshold: `float` for binarizing mask values. - """ - if coco91_to_80 and include_mask: - raise ValueError('If masks are included you cannot convert coco from the' - '91 class format to the 80 class format.') - - self._coco91_to_80 = coco91_to_80 - super().__init__( - include_mask=include_mask, - regenerate_source_id=regenerate_source_id, - mask_binarize_threshold=mask_binarize_threshold) - - def decode(self, serialized_example): - """Decode the serialized example. - - Args: - serialized_example: a single serialized tf.Example string. - - Returns: - decoded_tensors: a dictionary of tensors with the following fields: - - source_id: a string scalar tensor. - - image: a uint8 tensor of shape [None, None, 3]. - - height: an integer scalar tensor. - - width: an integer scalar tensor. - - groundtruth_classes: a int64 tensor of shape [None]. - - groundtruth_is_crowd: a bool tensor of shape [None]. - - groundtruth_area: a float32 tensor of shape [None]. - - groundtruth_boxes: a float32 tensor of shape [None, 4]. - - groundtruth_instance_masks: a float32 tensor of shape - [None, None, None]. - - groundtruth_instance_masks_png: a string tensor of shape [None]. 
- """ - decoded_tensors = super().decode(serialized_example) - - if self._coco91_to_80: - (decoded_tensors['groundtruth_classes'], - decoded_tensors['groundtruth_boxes'], - decoded_tensors['groundtruth_area'], - decoded_tensors['groundtruth_is_crowd'], - _) = _coco91_to_80(decoded_tensors['groundtruth_classes'], - decoded_tensors['groundtruth_boxes'], - decoded_tensors['groundtruth_area'], - decoded_tensors['groundtruth_is_crowd']) - return decoded_tensors diff --git a/official/vision/beta/projects/yolo/losses/__init__.py b/official/vision/beta/projects/yolo/losses/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/losses/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/projects/yolo/modeling/decoders/__init__.py b/official/vision/beta/projects/yolo/modeling/decoders/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/modeling/decoders/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/projects/yolo/modeling/factory.py b/official/vision/beta/projects/yolo/modeling/factory.py deleted file mode 100644 index a841131062ad130df8d075e7e561a96c485548e0..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/modeling/factory.py +++ /dev/null @@ -1,95 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Contains common factory functions yolo neural networks.""" - -from absl import logging -from official.vision.beta.modeling.backbones import factory as backbone_factory -from official.vision.beta.modeling.decoders import factory as decoder_factory - -from official.vision.beta.projects.yolo.configs import yolo -from official.vision.beta.projects.yolo.modeling import yolo_model -from official.vision.beta.projects.yolo.modeling.heads import yolo_head -from official.vision.beta.projects.yolo.modeling.layers import detection_generator - - -def build_yolo_detection_generator(model_config: yolo.Yolo, anchor_boxes): - """Builds yolo detection generator.""" - model = detection_generator.YoloLayer( - classes=model_config.num_classes, - anchors=anchor_boxes, - iou_thresh=model_config.detection_generator.iou_thresh, - nms_thresh=model_config.detection_generator.nms_thresh, - max_boxes=model_config.detection_generator.max_boxes, - pre_nms_points=model_config.detection_generator.pre_nms_points, - nms_type=model_config.detection_generator.nms_type, - box_type=model_config.detection_generator.box_type.get(), - path_scale=model_config.detection_generator.path_scales.get(), - scale_xy=model_config.detection_generator.scale_xy.get(), - label_smoothing=model_config.loss.label_smoothing, - use_scaled_loss=model_config.loss.use_scaled_loss, - update_on_repeat=model_config.loss.update_on_repeat, - truth_thresh=model_config.loss.truth_thresh.get(), - loss_type=model_config.loss.box_loss_type.get(), - max_delta=model_config.loss.max_delta.get(), - iou_normalizer=model_config.loss.iou_normalizer.get(), - cls_normalizer=model_config.loss.cls_normalizer.get(), - object_normalizer=model_config.loss.object_normalizer.get(), - ignore_thresh=model_config.loss.ignore_thresh.get(), - objectness_smooth=model_config.loss.objectness_smooth.get()) - return model - - -def build_yolo_head(input_specs, model_config: yolo.Yolo, l2_regularization): - """Builds yolo head.""" - min_level = min(map(int, 
input_specs.keys())) - max_level = max(map(int, input_specs.keys())) - head = yolo_head.YoloHead( - min_level=min_level, - max_level=max_level, - classes=model_config.num_classes, - boxes_per_level=model_config.anchor_boxes.anchors_per_scale, - norm_momentum=model_config.norm_activation.norm_momentum, - norm_epsilon=model_config.norm_activation.norm_epsilon, - kernel_regularizer=l2_regularization, - smart_bias=model_config.head.smart_bias) - return head - - -def build_yolo(input_specs, model_config, l2_regularization): - """Builds yolo model.""" - backbone = model_config.backbone.get() - anchor_dict, _ = model_config.anchor_boxes.get( - backbone.min_level, backbone.max_level) - backbone = backbone_factory.build_backbone(input_specs, model_config.backbone, - model_config.norm_activation, - l2_regularization) - decoder = decoder_factory.build_decoder(backbone.output_specs, model_config, - l2_regularization) - - head = build_yolo_head(decoder.output_specs, model_config, l2_regularization) - detection_generator_obj = build_yolo_detection_generator(model_config, - anchor_dict) - - model = yolo_model.Yolo( - backbone=backbone, - decoder=decoder, - head=head, - detection_generator=detection_generator_obj) - model.build(input_specs.shape) - - model.summary(print_fn=logging.info) - - losses = detection_generator_obj.get_losses() - return model, losses diff --git a/official/vision/beta/projects/yolo/modeling/heads/__init__.py b/official/vision/beta/projects/yolo/modeling/heads/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/modeling/heads/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/projects/yolo/modeling/layers/detection_generator.py b/official/vision/beta/projects/yolo/modeling/layers/detection_generator.py deleted file mode 100644 index 13732e4751bd645393425d7d0ec4c6ff30a6d2ab..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/modeling/layers/detection_generator.py +++ /dev/null @@ -1,307 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Contains common building blocks for yolo layer (detection layer).""" -import tensorflow as tf - -from official.vision.beta.modeling.layers import detection_generator -from official.vision.beta.projects.yolo.losses import yolo_loss -from official.vision.beta.projects.yolo.ops import box_ops -from official.vision.beta.projects.yolo.ops import loss_utils - - -class YoloLayer(tf.keras.Model): - """Yolo layer (detection generator).""" - - def __init__(self, - anchors, - classes, - iou_thresh=0.0, - ignore_thresh=0.7, - truth_thresh=1.0, - nms_thresh=0.6, - max_delta=10.0, - loss_type='ciou', - iou_normalizer=1.0, - cls_normalizer=1.0, - object_normalizer=1.0, - use_scaled_loss=False, - update_on_repeat=False, - pre_nms_points=5000, - label_smoothing=0.0, - max_boxes=200, - box_type='original', - path_scale=None, - scale_xy=None, - nms_type='greedy', - objectness_smooth=False, - **kwargs): - """Parameters for the loss functions used at each detection head output. - - Args: - anchors: `List[List[int]]` for the anchor boxes that are used in the - model. - classes: `int` for the number of classes. - iou_thresh: `float` to use many anchors per object if IoU(Obj, Anchor) > - iou_thresh. - ignore_thresh: `float` for the IOU value over which the loss is not - propagated, and a detection is assumed to have been made. - truth_thresh: `float` for the IOU value over which the loss is propagated - despite a detection being made'. - nms_thresh: `float` for the minimum IOU value for an overlap. - max_delta: gradient clipping to apply to the box loss. - loss_type: `str` for the typeof iou loss to use with in {ciou, diou, - giou, iou}. - iou_normalizer: `float` for how much to scale the loss on the IOU or the - boxes. - cls_normalizer: `float` for how much to scale the loss on the classes. - object_normalizer: `float` for how much to scale loss on the detection - map. - use_scaled_loss: `bool` for whether to use the scaled loss - or the traditional loss. 
- update_on_repeat: `bool` indicating how you would like to handle repeated - indexes in a given [j, i] index. Setting this to True will give more - consistent MAP, setting it to falls will improve recall by 1-2% but will - sacrifice some MAP. - pre_nms_points: `int` number of top candidate detections per class before - NMS. - label_smoothing: `float` for how much to smooth the loss on the classes. - max_boxes: `int` for the maximum number of boxes retained over all - classes. - box_type: `str`, there are 3 different box types that will affect training - differently {original, scaled and anchor_free}. The original method - decodes the boxes by applying an exponential to the model width and - height maps, then scaling the maps by the anchor boxes. This method is - used in Yolo-v4, Yolo-v3, and all its counterparts. The Scale method - squares the width and height and scales both by a fixed factor of 4. - This method is used in the Scale Yolo models, as well as Yolov4-CSP. - Finally, anchor_free is like the original method but will not apply an - activation function to the boxes, this is used for some of the newer - anchor free versions of YOLO. - path_scale: `dict` for the size of the input tensors. Defaults to - precalulated values from the `mask`. - scale_xy: dictionary `float` values inidcating how far each pixel can see - outside of its containment of 1.0. a value of 1.2 indicates there is a - 20% extended radius around each pixel that this specific pixel can - predict values for a center at. the center can range from 0 - value/2 - to 1 + value/2, this value is set in the yolo filter, and resused here. - there should be one value for scale_xy for each level from min_level to - max_level. - nms_type: `str` for which non max suppression to use. - objectness_smooth: `float` for how much to smooth the loss on the - detection map. - **kwargs: Addtional keyword arguments. 
- """ - super().__init__(**kwargs) - self._anchors = anchors - self._thresh = iou_thresh - self._ignore_thresh = ignore_thresh - self._truth_thresh = truth_thresh - self._iou_normalizer = iou_normalizer - self._cls_normalizer = cls_normalizer - self._object_normalizer = object_normalizer - self._objectness_smooth = objectness_smooth - self._nms_thresh = nms_thresh - self._max_boxes = max_boxes - self._max_delta = max_delta - self._classes = classes - self._loss_type = loss_type - - self._use_scaled_loss = use_scaled_loss - self._update_on_repeat = update_on_repeat - - self._pre_nms_points = pre_nms_points - self._label_smoothing = label_smoothing - - self._keys = list(anchors.keys()) - self._len_keys = len(self._keys) - self._box_type = box_type - self._path_scale = path_scale or {key: 2**int(key) for key in self._keys} - - self._nms_type = nms_type - self._scale_xy = scale_xy or {key: 1.0 for key, _ in anchors.items()} - - self._generator = {} - self._len_mask = {} - for key in self._keys: - anchors = self._anchors[key] - self._generator[key] = loss_utils.GridGenerator( - anchors, scale_anchors=self._path_scale[key]) - self._len_mask[key] = len(anchors) - return - - def parse_prediction_path(self, key, inputs): - shape_ = tf.shape(inputs) - shape = inputs.get_shape().as_list() - batchsize, height, width = shape_[0], shape[1], shape[2] - - if height is None or width is None: - height, width = shape_[1], shape_[2] - - generator = self._generator[key] - len_mask = self._len_mask[key] - scale_xy = self._scale_xy[key] - - # reshape the yolo output to (batchsize, - # width, - # height, - # number_anchors, - # remaining_points) - data = tf.reshape(inputs, [-1, height, width, len_mask, self._classes + 5]) - - # use the grid generator to get the formatted anchor boxes and grid points - # in shape [1, height, width, 2] - centers, anchors = generator(height, width, batchsize, dtype=data.dtype) - - # split the yolo detections into boxes, object score map, classes - boxes, 
obns_scores, class_scores = tf.split( - data, [4, 1, self._classes], axis=-1) - - # determine the number of classes - classes = class_scores.get_shape().as_list()[-1] - - # configurable to use the new coordinates in scaled Yolo v4 or not - _, _, boxes = loss_utils.get_predicted_box( - tf.cast(height, data.dtype), - tf.cast(width, data.dtype), - boxes, - anchors, - centers, - scale_xy, - stride=self._path_scale[key], - darknet=False, - box_type=self._box_type[key]) - - # convert boxes from yolo(x, y, w. h) to tensorflow(ymin, xmin, ymax, xmax) - boxes = box_ops.xcycwh_to_yxyx(boxes) - - # activate and detection map - obns_scores = tf.math.sigmoid(obns_scores) - - # convert detection map to class detection probabailities - class_scores = tf.math.sigmoid(class_scores) * obns_scores - - # platten predictions to [batchsize, N, -1] for non max supression - fill = height * width * len_mask - boxes = tf.reshape(boxes, [-1, fill, 4]) - class_scores = tf.reshape(class_scores, [-1, fill, classes]) - obns_scores = tf.reshape(obns_scores, [-1, fill]) - return obns_scores, boxes, class_scores - - def call(self, inputs): - boxes = [] - class_scores = [] - object_scores = [] - levels = list(inputs.keys()) - min_level = int(min(levels)) - max_level = int(max(levels)) - - # aggregare boxes over each scale - for i in range(min_level, max_level + 1): - key = str(i) - object_scores_, boxes_, class_scores_ = self.parse_prediction_path( - key, inputs[key]) - boxes.append(boxes_) - class_scores.append(class_scores_) - object_scores.append(object_scores_) - - # colate all predicitons - boxes = tf.concat(boxes, axis=1) - object_scores = tf.concat(object_scores, axis=1) - class_scores = tf.concat(class_scores, axis=1) - - # get masks to threshold all the predicitons - object_mask = tf.cast(object_scores > self._thresh, object_scores.dtype) - class_mask = tf.cast(class_scores > self._thresh, class_scores.dtype) - - # apply thresholds mask to all the predicitons - object_scores *= object_mask 
- class_scores *= (tf.expand_dims(object_mask, axis=-1) * class_mask) - - # apply nms - if self._nms_type == 'greedy': - # greedy NMS - boxes = tf.cast(boxes, dtype=tf.float32) - class_scores = tf.cast(class_scores, dtype=tf.float32) - boxes, object_scores_, class_scores, num_detections = ( - tf.image.combined_non_max_suppression( - tf.expand_dims(boxes, axis=-2), - class_scores, - self._pre_nms_points, - self._max_boxes, - iou_threshold=self._nms_thresh, - score_threshold=self._thresh)) - # cast the boxes and predicitons abck to original datatype - boxes = tf.cast(boxes, object_scores.dtype) - class_scores = tf.cast(class_scores, object_scores.dtype) - object_scores = tf.cast(object_scores_, object_scores.dtype) - else: - # TPU NMS - boxes = tf.cast(boxes, dtype=tf.float32) - class_scores = tf.cast(class_scores, dtype=tf.float32) - (boxes, confidence, classes, - num_detections) = detection_generator._generate_detections_v2( # pylint:disable=protected-access - tf.expand_dims(boxes, axis=-2), - class_scores, - pre_nms_top_k=self._pre_nms_points, - max_num_detections=self._max_boxes, - nms_iou_threshold=self._nms_thresh, - pre_nms_score_threshold=self._thresh) - boxes = tf.cast(boxes, object_scores.dtype) - class_scores = tf.cast(classes, object_scores.dtype) - object_scores = tf.cast(confidence, object_scores.dtype) - - # format and return - return { - 'bbox': boxes, - 'classes': class_scores, - 'confidence': object_scores, - 'num_detections': num_detections, - } - - def get_losses(self): - """Generates a dictionary of losses to apply to each path. - - Done in the detection generator because all parameters are the same - across both loss and detection generator. 
- - Returns: - Dict[str, tf.Tensor] of losses - """ - loss = yolo_loss.YoloLoss( - keys=self._keys, - classes=self._classes, - anchors=self._anchors, - path_strides=self._path_scale, - truth_thresholds=self._truth_thresh, - ignore_thresholds=self._ignore_thresh, - loss_types=self._loss_type, - iou_normalizers=self._iou_normalizer, - cls_normalizers=self._cls_normalizer, - object_normalizers=self._object_normalizer, - objectness_smooths=self._objectness_smooth, - box_types=self._box_type, - max_deltas=self._max_delta, - scale_xys=self._scale_xy, - use_scaled_loss=self._use_scaled_loss, - update_on_repeat=self._update_on_repeat, - label_smoothing=self._label_smoothing) - return loss - - def get_config(self): - return { - 'anchors': [list(a) for a in self._anchors], - 'thresh': self._thresh, - 'max_boxes': self._max_boxes, - } diff --git a/official/vision/beta/projects/yolo/modeling/layers/detection_generator_test.py b/official/vision/beta/projects/yolo/modeling/layers/detection_generator_test.py deleted file mode 100644 index ebe70060427e9e0be0d293cf1dc1b8f5e0dbcd7b..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/modeling/layers/detection_generator_test.py +++ /dev/null @@ -1,61 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Tests for yolo detection generator.""" -from absl.testing import parameterized -import tensorflow as tf - -from official.vision.beta.projects.yolo.modeling.layers import detection_generator as dg - - -class YoloDecoderTest(parameterized.TestCase, tf.test.TestCase): - - @parameterized.parameters( - (True), - (False), - ) - def test_network_creation(self, nms): - """Test creation of ResNet family models.""" - tf.keras.backend.set_image_data_format('channels_last') - input_shape = { - '3': [1, 52, 52, 255], - '4': [1, 26, 26, 255], - '5': [1, 13, 13, 255] - } - classes = 80 - anchors = { - '3': [[12.0, 19.0], [31.0, 46.0], [96.0, 54.0]], - '4': [[46.0, 114.0], [133.0, 127.0], [79.0, 225.0]], - '5': [[301.0, 150.0], [172.0, 286.0], [348.0, 340.0]] - } - - box_type = {key: 'scaled' for key in anchors.keys()} - - layer = dg.YoloLayer(anchors, classes, box_type=box_type, max_boxes=10) - - inputs = {} - for key in input_shape: - inputs[key] = tf.ones(input_shape[key], dtype=tf.float32) - - endpoints = layer(inputs) - - boxes = endpoints['bbox'] - classes = endpoints['classes'] - - self.assertAllEqual(boxes.shape.as_list(), [1, 10, 4]) - self.assertAllEqual(classes.shape.as_list(), [1, 10]) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/projects/yolo/modeling/layers/nn_blocks.py b/official/vision/beta/projects/yolo/modeling/layers/nn_blocks.py deleted file mode 100644 index a3879c6f9e2e568a642a765a30beb14e0171451b..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/modeling/layers/nn_blocks.py +++ /dev/null @@ -1,1718 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Contains common building blocks for yolo neural networks.""" -from typing import Callable, List, Tuple - -import tensorflow as tf - -from official.modeling import tf_utils -from official.vision.beta.ops import spatial_transform_ops - - -class Identity(tf.keras.layers.Layer): - - def call(self, inputs): - return inputs - - -class ConvBN(tf.keras.layers.Layer): - """ConvBN block. - - Modified Convolution layer to match that of the Darknet Library. - The Layer is a standards combination of Conv BatchNorm Activation, - however, the use of bias in the conv is determined by the use of batch - normalization. - Cross Stage Partial networks (CSPNets) were proposed in: - [1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, - Ping-Yang Chen, Jun-Wei Hsieh - CSPNet: A New Backbone that can Enhance Learning Capability of CNN. - arXiv:1911.11929 - """ - - def __init__(self, - filters=1, - kernel_size=(1, 1), - strides=(1, 1), - padding='same', - dilation_rate=(1, 1), - kernel_initializer='VarianceScaling', - bias_initializer='zeros', - bias_regularizer=None, - kernel_regularizer=None, - use_separable_conv=False, - use_bn=True, - use_sync_bn=False, - norm_momentum=0.99, - norm_epsilon=0.001, - activation='leaky', - leaky_alpha=0.1, - **kwargs): - """ConvBN initializer. - - Args: - filters: integer for output depth, or the number of features to learn. - kernel_size: integer or tuple for the shape of the weight matrix or kernel - to learn. - strides: integer of tuple how much to move the kernel after each kernel - use. 
- padding: string 'valid' or 'same', if same, then pad the image, else do - not. - dilation_rate: tuple to indicate how much to modulate kernel weights and - how many pixels in a feature map to skip. - kernel_initializer: string to indicate which function to use to initialize - weights. - bias_initializer: string to indicate which function to use to initialize - bias. - bias_regularizer: string to indicate which function to use to regularizer - bias. - kernel_regularizer: string to indicate which function to use to - regularizer weights. - use_separable_conv: `bool` wether to use separable convs. - use_bn: boolean for whether to use batch normalization. - use_sync_bn: boolean for whether sync batch normalization statistics - of all batch norm layers to the models global statistics - (across all input batches). - norm_momentum: float for moment to use for batch normalization. - norm_epsilon: float for batch normalization epsilon. - activation: string or None for activation function to use in layer, - if None activation is replaced by linear. - leaky_alpha: float to use as alpha if activation function is leaky. - **kwargs: Keyword Arguments. 
- """ - - # convolution params - self._filters = filters - self._kernel_size = kernel_size - self._strides = strides - self._padding = padding - self._dilation_rate = dilation_rate - - if kernel_initializer == 'VarianceScaling': - # to match pytorch initialization method - self._kernel_initializer = tf.keras.initializers.VarianceScaling( - scale=1 / 3, mode='fan_in', distribution='uniform') - else: - self._kernel_initializer = kernel_initializer - - self._bias_initializer = bias_initializer - self._kernel_regularizer = kernel_regularizer - - self._bias_regularizer = bias_regularizer - - # batch normalization params - self._use_bn = use_bn - self._use_separable_conv = use_separable_conv - self._use_sync_bn = use_sync_bn - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - - ksize = self._kernel_size - if not isinstance(ksize, List) and not isinstance(ksize, Tuple): - ksize = [ksize] - if use_separable_conv and not all([a == 1 for a in ksize]): - self._conv_base = tf.keras.layers.SeparableConv2D - else: - self._conv_base = tf.keras.layers.Conv2D - - if use_sync_bn: - self._bn_base = tf.keras.layers.experimental.SyncBatchNormalization - else: - self._bn_base = tf.keras.layers.BatchNormalization - - if tf.keras.backend.image_data_format() == 'channels_last': - # format: (batch_size, height, width, channels) - self._bn_axis = -1 - else: - # format: (batch_size, channels, width, height) - self._bn_axis = 1 - - # activation params - self._activation = activation - self._leaky_alpha = leaky_alpha - self._fuse = False - - super().__init__(**kwargs) - - def build(self, input_shape): - use_bias = not self._use_bn - - self.conv = self._conv_base( - filters=self._filters, - kernel_size=self._kernel_size, - strides=self._strides, - padding=self._padding, - dilation_rate=self._dilation_rate, - use_bias=use_bias, - kernel_initializer=self._kernel_initializer, - bias_initializer=self._bias_initializer, - kernel_regularizer=self._kernel_regularizer, - 
bias_regularizer=self._bias_regularizer) - - if self._use_bn: - self.bn = self._bn_base( - momentum=self._norm_momentum, - epsilon=self._norm_epsilon, - axis=self._bn_axis) - else: - self.bn = None - - if self._activation == 'leaky': - self._activation_fn = tf.keras.layers.LeakyReLU(alpha=self._leaky_alpha) - elif self._activation == 'mish': - self._activation_fn = lambda x: x * tf.math.tanh(tf.math.softplus(x)) - else: - self._activation_fn = tf_utils.get_activation(self._activation) - - def call(self, x): - x = self.conv(x) - if self._use_bn and not self._fuse: - x = self.bn(x) - x = self._activation_fn(x) - return x - - def fuse(self): - if self.bn is not None and not self._use_separable_conv: - # Fuse convolution and batchnorm, gives me +2 to 3 FPS 2ms latency. - # layers: https://tehnokv.com/posts/fusing-batchnorm-and-conv/ - if self._fuse: - return - - self._fuse = True - conv_weights = self.conv.get_weights()[0] - gamma, beta, moving_mean, moving_variance = self.bn.get_weights() - - self.conv.use_bias = True - infilters = conv_weights.shape[-2] - self.conv.build([None, None, None, infilters]) - - base = tf.sqrt(self._norm_epsilon + moving_variance) - w_conv_base = tf.transpose(conv_weights, perm=(3, 2, 0, 1)) - w_conv = tf.reshape(w_conv_base, [conv_weights.shape[-1], -1]) - - w_bn = tf.linalg.diag(gamma / base) - w_conv = tf.reshape(tf.matmul(w_bn, w_conv), w_conv_base.get_shape()) - w_conv = tf.transpose(w_conv, perm=(2, 3, 1, 0)) - - b_bn = beta - gamma * moving_mean / base - - self.conv.set_weights([w_conv, b_bn]) - del self.bn - - self.trainable = False - self.conv.trainable = False - self.bn = None - return - - def get_config(self): - # used to store/share parameters to reconstruct the model - layer_config = { - 'filters': self._filters, - 'kernel_size': self._kernel_size, - 'strides': self._strides, - 'padding': self._padding, - 'dilation_rate': self._dilation_rate, - 'kernel_initializer': self._kernel_initializer, - 'bias_initializer': 
self._bias_initializer, - 'bias_regularizer': self._bias_regularizer, - 'kernel_regularizer': self._kernel_regularizer, - 'use_bn': self._use_bn, - 'use_sync_bn': self._use_sync_bn, - 'use_separable_conv': self._use_separable_conv, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'activation': self._activation, - 'leaky_alpha': self._leaky_alpha - } - layer_config.update(super().get_config()) - return layer_config - - -class DarkResidual(tf.keras.layers.Layer): - """Darknet block with Residual connection for Yolo v3 Backbone.""" - - def __init__(self, - filters=1, - filter_scale=2, - dilation_rate=1, - kernel_initializer='VarianceScaling', - bias_initializer='zeros', - kernel_regularizer=None, - bias_regularizer=None, - use_bn=True, - use_sync_bn=False, - use_separable_conv=False, - norm_momentum=0.99, - norm_epsilon=0.001, - activation='leaky', - leaky_alpha=0.1, - sc_activation='linear', - downsample=False, - **kwargs): - """Dark Residual initializer. - - Args: - filters: integer for output depth, or the number of features to learn. - filter_scale: `int` for filter scale. - dilation_rate: tuple to indicate how much to modulate kernel weights and - how many pixels in a feature map to skip. - kernel_initializer: string to indicate which function to use to initialize - weights. - bias_initializer: string to indicate which function to use to initialize - bias. - kernel_regularizer: string to indicate which function to use to - regularizer weights. - bias_regularizer: string to indicate which function to use to regularizer - bias. - use_bn: boolean for whether to use batch normalization. - use_sync_bn: boolean for whether sync batch normalization statistics. - of all batch norm layers to the models global statistics - (across all input batches). - use_separable_conv: `bool` wether to use separable convs. - norm_momentum: float for moment to use for batch normalization. - norm_epsilon: float for batch normalization epsilon. 
- activation: string or None for activation function to use in layer, - if None activation is replaced by linear. - leaky_alpha: float to use as alpha if activation function is leaky. - sc_activation: string for activation function to use in layer. - downsample: boolean for if image input is larger than layer output, set - downsample to True so the dimensions are forced to match. - **kwargs: Keyword Arguments. - """ - - # downsample - self._downsample = downsample - - # ConvBN params - self._filters = filters - self._filter_scale = filter_scale - self._kernel_initializer = kernel_initializer - self._bias_initializer = bias_initializer - self._bias_regularizer = bias_regularizer - self._use_bn = use_bn - self._use_sync_bn = use_sync_bn - self._use_separable_conv = use_separable_conv - self._kernel_regularizer = kernel_regularizer - - # normal params - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - self._dilation_rate = dilation_rate if isinstance(dilation_rate, - int) else dilation_rate[0] - - # activation params - self._conv_activation = activation - self._leaky_alpha = leaky_alpha - self._sc_activation = sc_activation - - super().__init__(**kwargs) - - def build(self, input_shape): - dark_conv_args = { - 'kernel_initializer': self._kernel_initializer, - 'bias_initializer': self._bias_initializer, - 'bias_regularizer': self._bias_regularizer, - 'use_bn': self._use_bn, - 'use_sync_bn': self._use_sync_bn, - 'use_separable_conv': self._use_separable_conv, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'activation': self._conv_activation, - 'kernel_regularizer': self._kernel_regularizer, - 'leaky_alpha': self._leaky_alpha - } - if self._downsample: - if self._dilation_rate > 1: - dilation_rate = 1 - if self._dilation_rate // 2 > 0: - dilation_rate = self._dilation_rate // 2 - down_stride = 1 - else: - dilation_rate = 1 - down_stride = 2 - - self._dconv = ConvBN( - filters=self._filters, - kernel_size=(3, 3), - 
strides=down_stride, - dilation_rate=dilation_rate, - padding='same', - **dark_conv_args) - - self._conv1 = ConvBN( - filters=self._filters // self._filter_scale, - kernel_size=(1, 1), - strides=(1, 1), - padding='same', - **dark_conv_args) - - self._conv2 = ConvBN( - filters=self._filters, - kernel_size=(3, 3), - strides=(1, 1), - dilation_rate=self._dilation_rate, - padding='same', - **dark_conv_args) - - self._shortcut = tf.keras.layers.Add() - if self._sc_activation == 'leaky': - self._activation_fn = tf.keras.layers.LeakyReLU(alpha=self._leaky_alpha) - elif self._sc_activation == 'mish': - self._activation_fn = lambda x: x * tf.math.tanh(tf.math.softplus(x)) - else: - self._activation_fn = tf_utils.get_activation(self._sc_activation) - super().build(input_shape) - - def call(self, inputs, training=None): - if self._downsample: - inputs = self._dconv(inputs) - x = self._conv1(inputs) - x = self._conv2(x) - x = self._shortcut([x, inputs]) - return self._activation_fn(x) - - def get_config(self): - # used to store/share parameters to reconstruct the model - layer_config = { - 'filters': self._filters, - 'kernel_initializer': self._kernel_initializer, - 'bias_initializer': self._bias_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'dilation_rate': self._dilation_rate, - 'use_bn': self._use_bn, - 'use_sync_bn': self._use_sync_bn, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'activation': self._conv_activation, - 'leaky_alpha': self._leaky_alpha, - 'sc_activation': self._sc_activation, - 'downsample': self._downsample, - } - layer_config.update(super().get_config()) - return layer_config - - -class CSPTiny(tf.keras.layers.Layer): - """CSP Tiny layer. - - A Small size convolution block proposed in the CSPNet. The layer uses - shortcuts, routing(concatnation), and feature grouping in order to improve - gradient variablity and allow for high efficency, low power residual learning - for small networtf.keras. 
- Cross Stage Partial networks (CSPNets) were proposed in: - [1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, - Ping-Yang Chen, Jun-Wei Hsieh - CSPNet: A New Backbone that can Enhance Learning Capability of CNN. - arXiv:1911.11929 - """ - - def __init__(self, - filters=1, - kernel_initializer='VarianceScaling', - bias_initializer='zeros', - bias_regularizer=None, - kernel_regularizer=None, - use_bn=True, - dilation_rate=1, - use_sync_bn=False, - use_separable_conv=False, - group_id=1, - groups=2, - norm_momentum=0.99, - norm_epsilon=0.001, - activation='leaky', - downsample=True, - leaky_alpha=0.1, - **kwargs): - """Initializer for CSPTiny block. - - Args: - filters: integer for output depth, or the number of features to learn. - kernel_initializer: string to indicate which function to use to initialize - weights. - bias_initializer: string to indicate which function to use to initialize - bias. - bias_regularizer: string to indicate which function to use to regularizer - bias. - kernel_regularizer: string to indicate which function to use to - regularizer weights. - use_bn: boolean for whether to use batch normalization. - dilation_rate: `int`, dilation rate for conv layers. - use_sync_bn: boolean for whether sync batch normalization statistics - of all batch norm layers to the models global statistics - (across all input batches). - use_separable_conv: `bool` wether to use separable convs. - group_id: integer for which group of features to pass through the csp - tiny stack. - groups: integer for how many splits there should be in the convolution - feature stack output. - norm_momentum: float for moment to use for batch normalization. - norm_epsilon: float for batch normalization epsilon. - activation: string or None for activation function to use in layer, - if None activation is replaced by linear. - downsample: boolean for if image input is larger than layer output, set - downsample to True so the dimensions are forced to match. 
- leaky_alpha: float to use as alpha if activation function is leaky. - **kwargs: Keyword Arguments. - """ - - # ConvBN params - self._filters = filters - self._kernel_initializer = kernel_initializer - self._bias_initializer = bias_initializer - self._bias_regularizer = bias_regularizer - self._use_bn = use_bn - self._dilation_rate = dilation_rate - self._use_sync_bn = use_sync_bn - self._use_separable_conv = use_separable_conv - self._kernel_regularizer = kernel_regularizer - self._groups = groups - self._group_id = group_id - self._downsample = downsample - - # normal params - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - - # activation params - self._conv_activation = activation - self._leaky_alpha = leaky_alpha - - super().__init__(**kwargs) - - def build(self, input_shape): - dark_conv_args = { - 'kernel_initializer': self._kernel_initializer, - 'bias_initializer': self._bias_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'bias_regularizer': self._bias_regularizer, - 'use_bn': self._use_bn, - 'use_sync_bn': self._use_sync_bn, - 'use_separable_conv': self._use_separable_conv, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'activation': self._conv_activation, - 'leaky_alpha': self._leaky_alpha - } - self._convlayer1 = ConvBN( - filters=self._filters, - kernel_size=(3, 3), - strides=(1, 1), - padding='same', - **dark_conv_args) - - self._convlayer2 = ConvBN( - filters=self._filters // 2, - kernel_size=(3, 3), - strides=(1, 1), - padding='same', - **dark_conv_args) - - self._convlayer3 = ConvBN( - filters=self._filters // 2, - kernel_size=(3, 3), - strides=(1, 1), - padding='same', - **dark_conv_args) - - self._convlayer4 = ConvBN( - filters=self._filters, - kernel_size=(1, 1), - strides=(1, 1), - padding='same', - **dark_conv_args) - - if self._downsample: - self._maxpool = tf.keras.layers.MaxPool2D( - pool_size=2, strides=2, padding='same', data_format=None) - - 
super().build(input_shape) - - def call(self, inputs, training=None): - x1 = self._convlayer1(inputs) - x1_group = tf.split(x1, self._groups, axis=-1)[self._group_id] - x2 = self._convlayer2(x1_group) # grouping - x3 = self._convlayer3(x2) - x4 = tf.concat([x3, x2], axis=-1) # csp partial using grouping - x5 = self._convlayer4(x4) - x = tf.concat([x1, x5], axis=-1) # csp connect - if self._downsample: - x = self._maxpool(x) - return x, x5 - - -class CSPRoute(tf.keras.layers.Layer): - """CSPRoute block. - - Down sampling layer to take the place of down sampleing done in Residual - networks. This is the first of 2 layers needed to convert any Residual Network - model to a CSPNet. At the start of a new level change, this CSPRoute layer - creates a learned identity that will act as a cross stage connection, - that is used to inform the inputs to the next stage. It is called cross stage - partial because the number of filters required in every intermitent Residual - layer is reduced by half. The sister layer will take the partial generated by - this layer and concatnate it with the output of the final residual layer in - the stack to create a fully feature level output. This concatnation merges the - partial blocks of 2 levels as input to the next allowing the gradients of each - level to be more unique, and reducing the number of parameters required by - each level by 50% while keeping accuracy consistent. - - Cross Stage Partial networks (CSPNets) were proposed in: - [1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, - Ping-Yang Chen, Jun-Wei Hsieh - CSPNet: A New Backbone that can Enhance Learning Capability of CNN. 
- arXiv:1911.11929 - """ - - def __init__(self, - filters, - filter_scale=2, - activation='mish', - kernel_initializer='VarianceScaling', - bias_initializer='zeros', - bias_regularizer=None, - kernel_regularizer=None, - dilation_rate=1, - use_bn=True, - use_sync_bn=False, - use_separable_conv=False, - norm_momentum=0.99, - norm_epsilon=0.001, - downsample=True, - leaky_alpha=0.1, - **kwargs): - """CSPRoute layer initializer. - - Args: - filters: integer for output depth, or the number of features to learn - filter_scale: integer dictating (filters//2) or the number of filters in - the partial feature stack. - activation: string for activation function to use in layer. - kernel_initializer: string to indicate which function to use to - initialize weights. - bias_initializer: string to indicate which function to use to initialize - bias. - bias_regularizer: string to indicate which function to use to regularizer - bias. - kernel_regularizer: string to indicate which function to use to - regularizer weights. - dilation_rate: dilation rate for conv layers. - use_bn: boolean for whether to use batch normalization. - use_sync_bn: boolean for whether sync batch normalization statistics - of all batch norm layers to the models global statistics - (across all input batches). - use_separable_conv: `bool` wether to use separable convs. - norm_momentum: float for moment to use for batch normalization. - norm_epsilon: float for batch normalization epsilon. - downsample: down_sample the input. - leaky_alpha: `float`, for leaky alpha value. - **kwargs: Keyword Arguments. 
- """ - - super().__init__(**kwargs) - # layer params - self._filters = filters - self._filter_scale = filter_scale - self._activation = activation - - # convoultion params - self._kernel_initializer = kernel_initializer - self._bias_initializer = bias_initializer - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - self._dilation_rate = dilation_rate - self._use_bn = use_bn - self._use_sync_bn = use_sync_bn - self._use_separable_conv = use_separable_conv - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - self._downsample = downsample - self._leaky_alpha = leaky_alpha - - def build(self, input_shape): - dark_conv_args = { - 'kernel_initializer': self._kernel_initializer, - 'bias_initializer': self._bias_initializer, - 'bias_regularizer': self._bias_regularizer, - 'use_bn': self._use_bn, - 'use_sync_bn': self._use_sync_bn, - 'use_separable_conv': self._use_separable_conv, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'activation': self._activation, - 'kernel_regularizer': self._kernel_regularizer, - 'leaky_alpha': self._leaky_alpha, - } - if self._downsample: - if self._dilation_rate > 1: - dilation_rate = 1 - if self._dilation_rate // 2 > 0: - dilation_rate = self._dilation_rate // 2 - down_stride = 1 - else: - dilation_rate = 1 - down_stride = 2 - - self._conv1 = ConvBN( - filters=self._filters, - kernel_size=(3, 3), - strides=down_stride, - dilation_rate=dilation_rate, - **dark_conv_args) - - self._conv2 = ConvBN( - filters=self._filters // self._filter_scale, - kernel_size=(1, 1), - strides=(1, 1), - **dark_conv_args) - - self._conv3 = ConvBN( - filters=self._filters // self._filter_scale, - kernel_size=(1, 1), - strides=(1, 1), - **dark_conv_args) - - def call(self, inputs, training=None): - if self._downsample: - inputs = self._conv1(inputs) - y = self._conv2(inputs) - x = self._conv3(inputs) - return (x, y) - - -class CSPConnect(tf.keras.layers.Layer): - 
"""CSPConnect block. - - Sister Layer to the CSPRoute layer. Merges the partial feature stacks - generated by the CSPDownsampling layer, and the finaly output of the - residual stack. Suggested in the CSPNet paper. - Cross Stage Partial networks (CSPNets) were proposed in: - [1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, - Ping-Yang Chen, Jun-Wei Hsieh - CSPNet: A New Backbone that can Enhance Learning Capability of CNN. - arXiv:1911.11929 - """ - - def __init__(self, - filters, - filter_scale=2, - drop_final=False, - drop_first=False, - activation='mish', - kernel_size=(1, 1), - kernel_initializer='VarianceScaling', - bias_initializer='zeros', - bias_regularizer=None, - kernel_regularizer=None, - dilation_rate=1, - use_bn=True, - use_sync_bn=False, - use_separable_conv=False, - norm_momentum=0.99, - norm_epsilon=0.001, - leaky_alpha=0.1, - **kwargs): - """Initializer for CSPConnect block. - - Args: - filters: integer for output depth, or the number of features to learn. - filter_scale: integer dictating (filters//2) or the number of filters in - the partial feature stack. - drop_final: `bool`, whether to drop final conv layer. - drop_first: `bool`, whether to drop first conv layer. - activation: string for activation function to use in layer. - kernel_size: `Tuple`, kernel size for conv layers. - kernel_initializer: string to indicate which function to use to initialize - weights. - bias_initializer: string to indicate which function to use to initialize - bias. - bias_regularizer: string to indicate which function to use to regularizer - bias. - kernel_regularizer: string to indicate which function to use to - regularizer weights. - dilation_rate: `int`, dilation rate for conv layers. - use_bn: boolean for whether to use batch normalization. - use_sync_bn: boolean for whether sync batch normalization statistics - of all batch norm layers to the models global - statistics (across all input batches). 
- use_separable_conv: `bool` wether to use separable convs. - norm_momentum: float for moment to use for batch normalization. - norm_epsilon: float for batch normalization epsilon. - leaky_alpha: `float`, for leaky alpha value. - **kwargs: Keyword Arguments. - """ - - super().__init__(**kwargs) - # layer params - self._filters = filters - self._filter_scale = filter_scale - self._activation = activation - - # convoultion params - self._kernel_size = kernel_size - self._kernel_initializer = kernel_initializer - self._bias_initializer = bias_initializer - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - self._use_bn = use_bn - self._use_sync_bn = use_sync_bn - self._use_separable_conv = use_separable_conv - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - self._drop_final = drop_final - self._drop_first = drop_first - self._leaky_alpha = leaky_alpha - - def build(self, input_shape): - dark_conv_args = { - 'kernel_initializer': self._kernel_initializer, - 'bias_initializer': self._bias_initializer, - 'bias_regularizer': self._bias_regularizer, - 'use_bn': self._use_bn, - 'use_sync_bn': self._use_sync_bn, - 'use_separable_conv': self._use_separable_conv, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'activation': self._activation, - 'kernel_regularizer': self._kernel_regularizer, - 'leaky_alpha': self._leaky_alpha, - } - if not self._drop_first: - self._conv1 = ConvBN( - filters=self._filters // self._filter_scale, - kernel_size=self._kernel_size, - strides=(1, 1), - **dark_conv_args) - self._concat = tf.keras.layers.Concatenate(axis=-1) - - if not self._drop_final: - self._conv2 = ConvBN( - filters=self._filters, - kernel_size=(1, 1), - strides=(1, 1), - **dark_conv_args) - - def call(self, inputs, training=None): - x_prev, x_csp = inputs - if not self._drop_first: - x_prev = self._conv1(x_prev) - x = self._concat([x_prev, x_csp]) - - # skipped if drop final is true - 
if not self._drop_final: - x = self._conv2(x) - return x - - -class CSPStack(tf.keras.layers.Layer): - """CSP Stack layer. - - CSP full stack, combines the route and the connect in case you dont want to - jsut quickly wrap an existing callable or list of layers to - make it a cross stage partial. Added for ease of use. you should be able - to wrap any layer stack with a CSP independent of wether it belongs - to the Darknet family. if filter_scale = 2, then the blocks in the stack - passed into the the CSP stack should also have filters = filters/filter_scale - Cross Stage Partial networks (CSPNets) were proposed in: - - [1] Chien-Yao Wang, Hong-Yuan Mark Liao, I-Hau Yeh, Yueh-Hua Wu, - Ping-Yang Chen, Jun-Wei Hsieh - CSPNet: A New Backbone that can Enhance Learning Capability of CNN. - arXiv:1911.11929 - """ - - def __init__(self, - filters, - model_to_wrap=None, - filter_scale=2, - activation='mish', - kernel_initializer='VarianceScaling', - bias_initializer='zeros', - bias_regularizer=None, - kernel_regularizer=None, - downsample=True, - use_bn=True, - use_sync_bn=False, - use_separable_conv=False, - norm_momentum=0.99, - norm_epsilon=0.001, - **kwargs): - """CSPStack layer initializer. - - Args: - filters: filter size for conv layers. - model_to_wrap: callable Model or a list of callable objects that will - process the output of CSPRoute, and be input into CSPConnect. list will - be called sequentially. - filter_scale: integer dictating (filters//2) or the number of filters in - the partial feature stack. - activation: string for activation function to use in layer. - kernel_initializer: string to indicate which function to use to initialize - weights. - bias_initializer: string to indicate which function to use to initialize - bias. - bias_regularizer: string to indicate which function to use to regularizer - bias. - kernel_regularizer: string to indicate which function to use to - regularizer weights. - downsample: down_sample the input. 
- use_bn: boolean for whether to use batch normalization. - use_sync_bn: boolean for whether sync batch normalization statistics of - all batch norm layers to the models global statistics (across all input - batches). - use_separable_conv: `bool` wether to use separable convs. - norm_momentum: float for moment to use for batch normalization. - norm_epsilon: float for batch normalization epsilon. - **kwargs: Keyword Arguments. - - Raises: - TypeError: model_to_wrap is not a layer or a list of layers - """ - - super().__init__(**kwargs) - # layer params - self._filters = filters - self._filter_scale = filter_scale - self._activation = activation - self._downsample = downsample - - # convoultion params - self._kernel_initializer = kernel_initializer - self._bias_initializer = bias_initializer - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - self._use_bn = use_bn - self._use_sync_bn = use_sync_bn - self._use_separable_conv = use_separable_conv - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - - if model_to_wrap is None: - self._model_to_wrap = [] - elif isinstance(model_to_wrap, Callable): - self._model_to_wrap = [model_to_wrap] - elif isinstance(model_to_wrap, List): - self._model_to_wrap = model_to_wrap - else: - raise TypeError( - 'the input to the CSPStack must be a list of layers that we can' + - 'iterate through, or \n a callable') - - def build(self, input_shape): - dark_conv_args = { - 'filters': self._filters, - 'filter_scale': self._filter_scale, - 'activation': self._activation, - 'kernel_initializer': self._kernel_initializer, - 'bias_initializer': self._bias_initializer, - 'bias_regularizer': self._bias_regularizer, - 'use_bn': self._use_bn, - 'use_sync_bn': self._use_sync_bn, - 'use_separable_conv': self._use_separable_conv, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'kernel_regularizer': self._kernel_regularizer, - } - self._route = 
CSPRoute(downsample=self._downsample, **dark_conv_args) - self._connect = CSPConnect(**dark_conv_args) - - def call(self, inputs, training=None): - x, x_route = self._route(inputs) - for layer in self._model_to_wrap: - x = layer(x) - x = self._connect([x, x_route]) - return x - - -class PathAggregationBlock(tf.keras.layers.Layer): - """Path Aggregation block.""" - - def __init__(self, - filters=1, - drop_final=True, - kernel_initializer='VarianceScaling', - bias_initializer='zeros', - bias_regularizer=None, - kernel_regularizer=None, - use_bn=True, - use_sync_bn=False, - use_separable_conv=False, - inverted=False, - norm_momentum=0.99, - norm_epsilon=0.001, - activation='leaky', - leaky_alpha=0.1, - downsample=False, - upsample=False, - upsample_size=2, - **kwargs): - """Initializer for path aggregation block. - - Args: - filters: integer for output depth, or the number of features to learn. - drop_final: do not create the last convolution block. - kernel_initializer: string to indicate which function to use to initialize - weights. - bias_initializer: string to indicate which function to use to initialize - bias. - bias_regularizer: string to indicate which function to use to regularizer - bias. - kernel_regularizer: string to indicate which function to use to - regularizer weights. - use_bn: boolean for whether to use batch normalization. - use_sync_bn: boolean for whether sync batch normalization statistics - of all batch norm layers to the models global statistics - (across all input batches). - use_separable_conv: `bool` wether to use separable convs. - inverted: boolean for inverting the order of the convolutions. - norm_momentum: float for moment to use for batch normalization. - norm_epsilon: float for batch normalization epsilon. - activation: string or None for activation function to use in layer, - if None activation is replaced by linear. - leaky_alpha: float to use as alpha if activation function is leaky. 
- downsample: `bool` for whehter to downwample and merge. - upsample: `bool` for whehter to upsample and merge. - upsample_size: `int` how much to upsample in order to match shapes. - **kwargs: Keyword Arguments. - """ - - # Darkconv params - self._filters = filters - self._kernel_initializer = kernel_initializer - self._bias_initializer = bias_initializer - self._bias_regularizer = bias_regularizer - self._kernel_regularizer = kernel_regularizer - self._use_bn = use_bn - self._use_sync_bn = use_sync_bn - self._use_separable_conv = use_separable_conv - - # Normal params - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - - # Activation params - self._conv_activation = activation - self._leaky_alpha = leaky_alpha - self._downsample = downsample - self._upsample = upsample - self._upsample_size = upsample_size - self._drop_final = drop_final - - # Block params - self._inverted = inverted - - super().__init__(**kwargs) - - def _build_regular(self, input_shape, kwargs): - if self._downsample: - self._conv = ConvBN( - filters=self._filters, - kernel_size=(3, 3), - strides=(2, 2), - padding='same', - **kwargs) - else: - self._conv = ConvBN( - filters=self._filters, - kernel_size=(1, 1), - strides=(1, 1), - padding='same', - **kwargs) - - if not self._drop_final: - self._conv_concat = ConvBN( - filters=self._filters, - kernel_size=(1, 1), - strides=(1, 1), - padding='same', - **kwargs) - - def _build_reversed(self, input_shape, kwargs): - if self._downsample: - self._conv_prev = ConvBN( - filters=self._filters, - kernel_size=(3, 3), - strides=(2, 2), - padding='same', - **kwargs) - else: - self._conv_prev = ConvBN( - filters=self._filters, - kernel_size=(1, 1), - strides=(1, 1), - padding='same', - **kwargs) - - self._conv_route = ConvBN( - filters=self._filters, - kernel_size=(1, 1), - strides=(1, 1), - padding='same', - **kwargs) - - if not self._drop_final: - self._conv_sync = ConvBN( - filters=self._filters, - kernel_size=(1, 1), - strides=(1, 
1), - padding='same', - **kwargs) - - def build(self, input_shape): - dark_conv_args = { - 'kernel_initializer': self._kernel_initializer, - 'bias_initializer': self._bias_initializer, - 'bias_regularizer': self._bias_regularizer, - 'use_bn': self._use_bn, - 'use_sync_bn': self._use_sync_bn, - 'use_separable_conv': self._use_separable_conv, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 'activation': self._conv_activation, - 'kernel_regularizer': self._kernel_regularizer, - 'leaky_alpha': self._leaky_alpha, - } - - if self._inverted: - self._build_reversed(input_shape, dark_conv_args) - else: - self._build_regular(input_shape, dark_conv_args) - - self._concat = tf.keras.layers.Concatenate() - super().build(input_shape) - - def _call_regular(self, inputs, training=None): - input_to_convolve, input_to_concat = inputs - x_prev = self._conv(input_to_convolve) - if self._upsample: - x_prev = spatial_transform_ops.nearest_upsampling(x_prev, - self._upsample_size) - x = self._concat([x_prev, input_to_concat]) - - # used in csp conversion - if not self._drop_final: - x = self._conv_concat(x) - return x_prev, x - - def _call_reversed(self, inputs, training=None): - x_route, x_prev = inputs - x_prev = self._conv_prev(x_prev) - if self._upsample: - x_prev = spatial_transform_ops.nearest_upsampling(x_prev, - self._upsample_size) - x_route = self._conv_route(x_route) - x = self._concat([x_route, x_prev]) - if not self._drop_final: - x = self._conv_sync(x) - return x_prev, x - - def call(self, inputs, training=None): - # done this way to prevent confusion in AutoGraph - if self._inverted: - return self._call_reversed(inputs, training=training) - else: - return self._call_regular(inputs, training=training) - - - class SPP(tf.keras.layers.Layer): - """Spatial Pyramid Pooling. - - A non-aggregated SPP layer that uses max pooling.
- """ - - def __init__(self, sizes, **kwargs): - self._sizes = list(reversed(sizes)) - if not sizes: - raise ValueError('At least one maxpool should be specified in SPP block') - super().__init__(**kwargs) - - def build(self, input_shape): - maxpools = [] - for size in self._sizes: - maxpools.append( - tf.keras.layers.MaxPool2D( - pool_size=(size, size), - strides=(1, 1), - padding='same', - data_format=None)) - self._maxpools = maxpools - super().build(input_shape) - - def call(self, inputs, training=None): - outputs = [] - for maxpool in self._maxpools: - outputs.append(maxpool(inputs)) - outputs.append(inputs) - concat_output = tf.keras.layers.concatenate(outputs) - return concat_output - - def get_config(self): - # undo the reversal done in __init__ so from_config restores the order - layer_config = {'sizes': list(reversed(self._sizes))} - layer_config.update(super().get_config()) - return layer_config - - - class SAM(tf.keras.layers.Layer): - """Spatial Attention Model. - - [1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon - CBAM: Convolutional Block Attention Module.
arXiv:1807.06521 - - implementation of the Spatial Attention Model (SAM) - """ - - def __init__(self, - use_pooling=False, - filter_match=False, - filters=1, - kernel_size=(1, 1), - strides=(1, 1), - padding='same', - dilation_rate=(1, 1), - kernel_initializer='VarianceScaling', - bias_initializer='zeros', - bias_regularizer=None, - kernel_regularizer=None, - use_bn=True, - use_sync_bn=True, - use_separable_conv=False, - norm_momentum=0.99, - norm_epsilon=0.001, - activation='sigmoid', - output_activation=None, - leaky_alpha=0.1, - **kwargs): - - # use_pooling - self._use_pooling = use_pooling - self._filters = filters - self._output_activation = output_activation - self._leaky_alpha = leaky_alpha - - self.dark_conv_args = { - 'kernel_size': kernel_size, - 'strides': strides, - 'padding': padding, - 'dilation_rate': dilation_rate, - 'kernel_initializer': kernel_initializer, - 'bias_initializer': bias_initializer, - 'bias_regularizer': bias_regularizer, - 'use_bn': use_bn, - 'use_sync_bn': use_sync_bn, - 'use_separable_conv': use_separable_conv, - 'norm_momentum': norm_momentum, - 'norm_epsilon': norm_epsilon, - 'activation': activation, - 'kernel_regularizer': kernel_regularizer, - 'leaky_alpha': leaky_alpha - } - - super().__init__(**kwargs) - - def build(self, input_shape): - if self._filters == -1: - self._filters = input_shape[-1] - self._conv = ConvBN(filters=self._filters, **self.dark_conv_args) - if self._output_activation == 'leaky': - self._activation_fn = tf.keras.layers.LeakyReLU(alpha=self._leaky_alpha) - elif self._output_activation == 'mish': - self._activation_fn = lambda x: x * tf.math.tanh(tf.math.softplus(x)) - else: - self._activation_fn = tf_utils.get_activation(self._output_activation) - - def call(self, inputs, training=None): - if self._use_pooling: - depth_max = tf.reduce_max(inputs, axis=-1, keepdims=True) - depth_avg = tf.reduce_mean(inputs, axis=-1, keepdims=True) - input_maps = tf.concat([depth_avg, depth_max], axis=-1) - else: - 
input_maps = inputs - - attention_mask = self._conv(input_maps) - return self._activation_fn(inputs * attention_mask) - - -class CAM(tf.keras.layers.Layer): - """Channel Attention Model. - - [1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon - CBAM: Convolutional Block Attention Module. arXiv:1807.06521 - - Implementation of the Channel Attention Model (CAM) - """ - - def __init__(self, - reduction_ratio=1.0, - kernel_initializer='VarianceScaling', - bias_initializer='zeros', - bias_regularizer=None, - kernel_regularizer=None, - use_bn=False, - use_sync_bn=False, - use_bias=False, - norm_momentum=0.99, - norm_epsilon=0.001, - mlp_activation='linear', - activation='sigmoid', - leaky_alpha=0.1, - **kwargs): - - self._reduction_ratio = reduction_ratio - - # use_pooling - if use_sync_bn: - self._bn = tf.keras.layers.experimental.SyncBatchNormalization - else: - self._bn = tf.keras.layers.BatchNormalization - - if not use_bn: - self._bn = Identity - self._bn_args = {} - else: - self._bn_args = { - 'momentum': norm_momentum, - 'epsilon': norm_epsilon, - } - - self._mlp_args = { - 'use_bias': use_bias, - 'kernel_initializer': kernel_initializer, - 'bias_initializer': bias_initializer, - 'bias_regularizer': bias_regularizer, - 'activation': mlp_activation, - 'kernel_regularizer': kernel_regularizer, - } - - self._leaky_alpha = leaky_alpha - self._activation = activation - - super().__init__(**kwargs) - - def build(self, input_shape): - self._filters = input_shape[-1] - - self._mlp = tf.keras.Sequential([ - tf.keras.layers.Dense(self._filters, **self._mlp_args), - self._bn(**self._bn_args), - tf.keras.layers.Dense( - int(self._filters * self._reduction_ratio), **self._mlp_args), - self._bn(**self._bn_args), - tf.keras.layers.Dense(self._filters, **self._mlp_args), - self._bn(**self._bn_args), - ]) - - if self._activation == 'leaky': - self._activation_fn = tf.keras.layers.LeakyReLU(alpha=self._leaky_alpha) - elif self._activation == 'mish': - self._activation_fn = 
lambda x: x * tf.math.tanh(tf.math.softplus(x)) - else: - self._activation_fn = tf_utils.get_activation(self._activation) - - def call(self, inputs, training=None): - depth_max = self._mlp(tf.reduce_max(inputs, axis=(1, 2))) - depth_avg = self._mlp(tf.reduce_mean(inputs, axis=(1, 2))) - channel_mask = self._activation_fn(depth_avg + depth_max) - - channel_mask = tf.expand_dims(channel_mask, axis=1) - attention_mask = tf.expand_dims(channel_mask, axis=1) - - return inputs * attention_mask - - -class CBAM(tf.keras.layers.Layer): - """Convolutional Block Attention Module. - - [1] Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon - CBAM: Convolutional Block Attention Module. arXiv:1807.06521 - - implementation of the Convolution Block Attention Module (CBAM) - """ - - def __init__(self, - use_pooling=False, - filters=1, - reduction_ratio=1.0, - kernel_size=(1, 1), - strides=(1, 1), - padding='same', - dilation_rate=(1, 1), - kernel_initializer='VarianceScaling', - bias_initializer='zeros', - bias_regularizer=None, - kernel_regularizer=None, - use_bn=True, - use_sync_bn=False, - use_separable_conv=False, - norm_momentum=0.99, - norm_epsilon=0.001, - mlp_activation=None, - activation='sigmoid', - leaky_alpha=0.1, - **kwargs): - - # use_pooling - - self._sam_args = { - 'use_pooling': use_pooling, - 'filters': filters, - 'kernel_size': kernel_size, - 'strides': strides, - 'padding': padding, - 'dilation_rate': dilation_rate, - 'use_separable_conv': use_separable_conv, - } - - self._cam_args = { - 'reduction_ratio': reduction_ratio, - 'mlp_activation': mlp_activation - } - - self._common_args = { - 'kernel_initializer': kernel_initializer, - 'bias_initializer': bias_initializer, - 'bias_regularizer': bias_regularizer, - 'use_bn': use_bn, - 'use_sync_bn': use_sync_bn, - 'norm_momentum': norm_momentum, - 'norm_epsilon': norm_epsilon, - 'activation': activation, - 'kernel_regularizer': kernel_regularizer, - 'leaky_alpha': leaky_alpha - } - - 
- self._cam_args.update(self._common_args) - self._sam_args.update(self._common_args) - super().__init__(**kwargs) - - def build(self, input_shape): - self._cam = CAM(**self._cam_args) - self._sam = SAM(**self._sam_args) - - def call(self, inputs, training=None): - return self._sam(self._cam(inputs)) - - - class DarkRouteProcess(tf.keras.layers.Layer): - """Dark Route Process block. - - Processes Darknet outputs and connects the backbone to the head more generally. - Abstracts the repetition of DarkConv objects that is common in YOLO. - - It is used like the following: - - x = ConvBN(1024, (3, 3), (1, 1))(x) - proc = DarkRouteProcess(filters = 1024, - repetitions = 3, - insert_spp = False)(x) - """ - - def __init__(self, - filters=2, - repetitions=2, - insert_spp=False, - insert_sam=False, - insert_cbam=False, - csp_stack=0, - csp_scale=2, - kernel_initializer='VarianceScaling', - bias_initializer='zeros', - bias_regularizer=None, - kernel_regularizer=None, - use_sync_bn=False, - use_separable_conv=False, - norm_momentum=0.99, - norm_epsilon=0.001, - block_invert=False, - activation='leaky', - leaky_alpha=0.1, - spp_keys=None, - **kwargs): - """DarkRouteProcess initializer. - - Args: - filters: the number of filters to be used in all subsequent layers; - filters should be the depth of the tensor input into this layer, - as no downsampling can be done within this layer object. - repetitions: number of times to repeat the processing nodes. - for tiny: 1 repetition, no spp allowed. - for spp: insert_spp = True, and allow for 6 repetitions. - for regular: insert_spp = False, and allow for 6 repetitions. - insert_spp: bool if true add the spatial pyramid pooling layer. - insert_sam: bool if true add spatial attention module to path. - insert_cbam: bool if true add convolutional block attention - module to path. - csp_stack: int for the number of sequential layers, counting from 0, - that you would like to convert into a Cross Stage - Partial (CSP) type.
- csp_scale: int for how much to downscale the number of filters - only for the csp layers in the csp section of the processing - path. A value of 2 indicates that each layer that is in the CSP - stack will have filters = filters/2. - kernel_initializer: method to use to initialize kernel weights. - bias_initializer: method to use to initialize the bias of the conv - layers. - bias_regularizer: string to indicate which function to use to regularize - bias. - kernel_regularizer: string to indicate which function to use to - regularize weights. - use_sync_bn: bool if true use the sync batch normalization. - use_separable_conv: `bool` whether to use separable convs. - norm_momentum: batch norm parameter; see the TensorFlow documentation. - norm_epsilon: batch norm parameter; see the TensorFlow documentation. - block_invert: bool used for switching between the even and odd - repetitions of layers. Usually the repetition is based on a - 3x3 conv with filters, followed by a 1x1 with filters/2 with - an even number of repetitions to ensure each 3x3 gets a 1x1 - squeeze. Block invert swaps the 3x3/1 1x1/2 to a 1x1/2 3x3/1 - ordering, typically used when the model requires an odd number - of repetitions. All other parameters maintain their effects. - activation: activation function to use in processing. - leaky_alpha: if the leaky activation function is used, the alpha to use in - processing the relu input. - spp_keys: List[int] of the sampling levels to be applied by - the Spatial Pyramid Pooling Layer. By default it is - [5, 9, 13], indicating a 5x5 pooling followed by 9x9 - followed by 13x13, then followed by the standard concatenation - and convolution. - **kwargs: Keyword Arguments.
- """ - - super().__init__(**kwargs) - # darkconv params - self._filters = filters - self._use_sync_bn = use_sync_bn - self._use_separable_conv = use_separable_conv - self._kernel_initializer = kernel_initializer - self._bias_initializer = bias_initializer - self._bias_regularizer = bias_regularizer - self._kernel_regularizer = kernel_regularizer - - # normal params - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - - # activation params - self._activation = activation - self._leaky_alpha = leaky_alpha - - repetitions += (2 * int(insert_spp)) - if repetitions == 1: - block_invert = True - - self._repetitions = repetitions - self.layer_list, self.outputs = self._get_base_layers() - - if csp_stack > 0: - self._csp_scale = csp_scale - csp_stack += (2 * int(insert_spp)) - self._csp_filters = lambda x: x // csp_scale - self._convert_csp(self.layer_list, self.outputs, csp_stack) - block_invert = False - - self._csp_stack = csp_stack - - if block_invert: - self._conv1_filters = lambda x: x - self._conv2_filters = lambda x: x // 2 - self._conv1_kernel = (3, 3) - self._conv2_kernel = (1, 1) - else: - self._conv1_filters = lambda x: x // 2 - self._conv2_filters = lambda x: x - self._conv1_kernel = (1, 1) - self._conv2_kernel = (3, 3) - - # inserting SPP always adds to the total number of layers, never replaces one - if insert_spp: - self._spp_keys = spp_keys if spp_keys is not None else [5, 9, 13] - self.layer_list = self._insert_spp(self.layer_list) - - if repetitions > 1: - self.outputs[-2] = True - - if insert_sam: - self.layer_list = self._insert_sam(self.layer_list, self.outputs) - self._repetitions += 1 - self.outputs[-1] = True - - def _get_base_layers(self): - layer_list = [] - outputs = [] - for i in range(self._repetitions): - layers = ['conv1'] * ((i + 1) % 2) + ['conv2'] * (i % 2) - layer_list.extend(layers) - outputs = [False] + outputs - return layer_list, outputs - - def _insert_spp(self, layer_list): - if len(layer_list) <= 3: -
layer_list[1] = 'spp' - else: - layer_list[3] = 'spp' - return layer_list - - def _convert_csp(self, layer_list, outputs, csp_stack_size): - layer_list[0] = 'csp_route' - layer_list.insert(csp_stack_size - 1, 'csp_connect') - outputs.insert(csp_stack_size - 1, False) - return layer_list, outputs - - def _insert_sam(self, layer_list, outputs): - if len(layer_list) >= 2 and layer_list[-2] != 'spp': - layer_list.insert(-2, 'sam') - outputs.insert(-1, True) - else: - layer_list.insert(-1, 'sam') - outputs.insert(-1, False) - return layer_list - - def _conv1(self, filters, kwargs, csp=False): - if csp: - filters_ = self._csp_filters - else: - filters_ = self._conv1_filters - - x1 = ConvBN( - filters=filters_(filters), - kernel_size=self._conv1_kernel, - strides=(1, 1), - padding='same', - use_bn=True, - **kwargs) - return x1 - - def _conv2(self, filters, kwargs, csp=False): - if csp: - filters_ = self._csp_filters - else: - filters_ = self._conv2_filters - - x1 = ConvBN( - filters=filters_(filters), - kernel_size=self._conv2_kernel, - strides=(1, 1), - padding='same', - use_bn=True, - **kwargs) - return x1 - - def _csp_route(self, filters, kwargs): - x1 = CSPRoute( - filters=filters, - filter_scale=self._csp_scale, - downsample=False, - **kwargs) - return x1 - - def _csp_connect(self, filters, kwargs): - x1 = CSPConnect(filters=filters, drop_final=True, drop_first=True, **kwargs) - return x1 - - def _spp(self, filters, kwargs): - x1 = SPP(self._spp_keys) - return x1 - - def _sam(self, filters, kwargs): - x1 = SAM(filters=-1, use_pooling=False, use_bn=True, **kwargs) - return x1 - - def build(self, input_shape): - dark_conv_args = { - 'activation': self._activation, - 'kernel_initializer': self._kernel_initializer, - 'bias_initializer': self._bias_initializer, - 'bias_regularizer': self._bias_regularizer, - 'use_sync_bn': self._use_sync_bn, - 'use_separable_conv': self._use_separable_conv, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon, - 
- 'kernel_regularizer': self._kernel_regularizer, - 'leaky_alpha': self._leaky_alpha, - } - - csp = False - self.layers = [] - for layer in self.layer_list: - if layer == 'csp_route': - self.layers.append(self._csp_route(self._filters, dark_conv_args)) - csp = True - elif layer == 'csp_connect': - self.layers.append(self._csp_connect(self._filters, dark_conv_args)) - csp = False - elif layer == 'conv1': - self.layers.append(self._conv1(self._filters, dark_conv_args, csp=csp)) - elif layer == 'conv2': - self.layers.append(self._conv2(self._filters, dark_conv_args, csp=csp)) - elif layer == 'spp': - self.layers.append(self._spp(self._filters, dark_conv_args)) - elif layer == 'sam': - self.layers.append(self._sam(-1, dark_conv_args)) - - self._lim = len(self.layers) - super().build(input_shape) - - def _call_regular(self, inputs, training=None): - # check efficiency - x = inputs - x_prev = x - output_prev = True - - for (layer, output) in zip(self.layers, self.outputs): - if output_prev: - x_prev = x - x = layer(x) - output_prev = output - return x_prev, x - - def _call_csp(self, inputs, training=None): - # check efficiency - x = inputs - x_prev = x - output_prev = True - x_route = None - - for i, (layer, output) in enumerate(zip(self.layers, self.outputs)): - if output_prev: - x_prev = x - if i == 0: - x, x_route = layer(x) - elif i == self._csp_stack - 1: - x = layer([x, x_route]) - else: - x = layer(x) - output_prev = output - return x_prev, x - - def call(self, inputs, training=None): - if self._csp_stack > 0: - return self._call_csp(inputs, training=training) - else: - return self._call_regular(inputs, training=training) - - - class Reorg(tf.keras.layers.Layer): - """Splits a high resolution image into 4 lower resolution images. - - Used in YOLOR to process very high resolution inputs efficiently. - For example, an input image of [1280, 1280, 3] will become [640, 640, 12]; - the images are sampled so that every input pixel is kept, moved into - the channel dimension.
- """ - - def call(self, x, training=None): - return tf.concat([ - x[..., ::2, ::2, :], x[..., 1::2, ::2, :], x[..., ::2, 1::2, :], - x[..., 1::2, 1::2, :] - ], - axis=-1) diff --git a/official/vision/beta/projects/yolo/modeling/layers/nn_blocks_test.py b/official/vision/beta/projects/yolo/modeling/layers/nn_blocks_test.py deleted file mode 100644 index b43beefba60f8343972e9be749e5b0fc0837e912..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/modeling/layers/nn_blocks_test.py +++ /dev/null @@ -1,306 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -from absl.testing import parameterized -import numpy as np -import tensorflow as tf - -from official.vision.beta.projects.yolo.modeling.layers import nn_blocks - - -class CSPConnectTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.named_parameters(('same', 224, 224, 64, 1), - ('downsample', 224, 224, 64, 2)) - def test_pass_through(self, width, height, filters, mod): - x = tf.keras.Input(shape=(width, height, filters)) - test_layer = nn_blocks.CSPRoute(filters=filters, filter_scale=mod) - test_layer2 = nn_blocks.CSPConnect(filters=filters, filter_scale=mod) - outx, px = test_layer(x) - outx = test_layer2([outx, px]) - print(outx) - print(outx.shape.as_list()) - self.assertAllEqual( - outx.shape.as_list(), - [None, np.ceil(width // 2), - np.ceil(height // 2), (filters)]) - - @parameterized.named_parameters(('same', 224, 224, 64, 1), - ('downsample', 224, 224, 128, 2)) - def test_gradient_pass_though(self, filters, width, height, mod): - loss = tf.keras.losses.MeanSquaredError() - optimizer = tf.keras.optimizers.SGD() - test_layer = nn_blocks.CSPRoute(filters, filter_scale=mod) - path_layer = nn_blocks.CSPConnect(filters, filter_scale=mod) - - init = tf.random_normal_initializer() - x = tf.Variable( - initial_value=init(shape=(1, width, height, filters), dtype=tf.float32)) - y = tf.Variable( - initial_value=init( - shape=(1, int(np.ceil(width // 2)), int(np.ceil(height // 2)), - filters), - dtype=tf.float32)) - - with tf.GradientTape() as tape: - x_hat, x_prev = test_layer(x) - x_hat = path_layer([x_hat, x_prev]) - grad_loss = loss(x_hat, y) - grad = tape.gradient(grad_loss, test_layer.trainable_variables) - optimizer.apply_gradients(zip(grad, test_layer.trainable_variables)) - - self.assertNotIn(None, grad) - - -class CSPRouteTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.named_parameters(('same', 224, 224, 64, 1), - ('downsample', 224, 224, 64, 2)) - def test_pass_through(self, width, height, filters, 
mod): - x = tf.keras.Input(shape=(width, height, filters)) - test_layer = nn_blocks.CSPRoute(filters=filters, filter_scale=mod) - outx, _ = test_layer(x) - print(outx) - print(outx.shape.as_list()) - self.assertAllEqual( - outx.shape.as_list(), - [None, np.ceil(width // 2), - np.ceil(height // 2), (filters / mod)]) - - @parameterized.named_parameters(('same', 224, 224, 64, 1), - ('downsample', 224, 224, 128, 2)) - def test_gradient_pass_though(self, filters, width, height, mod): - loss = tf.keras.losses.MeanSquaredError() - optimizer = tf.keras.optimizers.SGD() - test_layer = nn_blocks.CSPRoute(filters, filter_scale=mod) - path_layer = nn_blocks.CSPConnect(filters, filter_scale=mod) - - init = tf.random_normal_initializer() - x = tf.Variable( - initial_value=init(shape=(1, width, height, filters), dtype=tf.float32)) - y = tf.Variable( - initial_value=init( - shape=(1, int(np.ceil(width // 2)), int(np.ceil(height // 2)), - filters), - dtype=tf.float32)) - - with tf.GradientTape() as tape: - x_hat, x_prev = test_layer(x) - x_hat = path_layer([x_hat, x_prev]) - grad_loss = loss(x_hat, y) - grad = tape.gradient(grad_loss, test_layer.trainable_variables) - optimizer.apply_gradients(zip(grad, test_layer.trainable_variables)) - - self.assertNotIn(None, grad) - - -class ConvBNTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.named_parameters( - ('valid', (3, 3), 'valid', (1, 1)), ('same', (3, 3), 'same', (1, 1)), - ('downsample', (3, 3), 'same', (2, 2)), ('test', (1, 1), 'valid', (1, 1))) - def test_pass_through(self, kernel_size, padding, strides): - if padding == 'same': - pad_const = 1 - else: - pad_const = 0 - x = tf.keras.Input(shape=(224, 224, 3)) - test_layer = nn_blocks.ConvBN( - filters=64, - kernel_size=kernel_size, - padding=padding, - strides=strides, - trainable=False) - outx = test_layer(x) - print(outx.shape.as_list()) - test = [ - None, - int((224 - kernel_size[0] + (2 * pad_const)) / strides[0] + 1), - int((224 - kernel_size[1] + (2 * 
pad_const)) / strides[1] + 1), 64 - ] - print(test) - self.assertAllEqual(outx.shape.as_list(), test) - - @parameterized.named_parameters(('filters', 3)) - def test_gradient_pass_though(self, filters): - loss = tf.keras.losses.MeanSquaredError() - optimizer = tf.keras.optimizers.SGD() - with tf.device('/CPU:0'): - test_layer = nn_blocks.ConvBN(filters, kernel_size=(3, 3), padding='same') - - init = tf.random_normal_initializer() - x = tf.Variable( - initial_value=init(shape=(1, 224, 224, 3), dtype=tf.float32)) - y = tf.Variable( - initial_value=init(shape=(1, 224, 224, filters), dtype=tf.float32)) - - with tf.GradientTape() as tape: - x_hat = test_layer(x) - grad_loss = loss(x_hat, y) - grad = tape.gradient(grad_loss, test_layer.trainable_variables) - optimizer.apply_gradients(zip(grad, test_layer.trainable_variables)) - self.assertNotIn(None, grad) - - -class DarkResidualTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.named_parameters(('same', 224, 224, 64, False), - ('downsample', 223, 223, 32, True), - ('oddball', 223, 223, 32, False)) - def test_pass_through(self, width, height, filters, downsample): - mod = 1 - if downsample: - mod = 2 - x = tf.keras.Input(shape=(width, height, filters)) - test_layer = nn_blocks.DarkResidual(filters=filters, downsample=downsample) - outx = test_layer(x) - print(outx) - print(outx.shape.as_list()) - self.assertAllEqual( - outx.shape.as_list(), - [None, np.ceil(width / mod), - np.ceil(height / mod), filters]) - - @parameterized.named_parameters(('same', 64, 224, 224, False), - ('downsample', 32, 223, 223, True), - ('oddball', 32, 223, 223, False)) - def test_gradient_pass_though(self, filters, width, height, downsample): - loss = tf.keras.losses.MeanSquaredError() - optimizer = tf.keras.optimizers.SGD() - test_layer = nn_blocks.DarkResidual(filters, downsample=downsample) - - if downsample: - mod = 2 - else: - mod = 1 - - init = tf.random_normal_initializer() - x = tf.Variable( - initial_value=init(shape=(1, 
width, height, filters), dtype=tf.float32)) - y = tf.Variable( - initial_value=init( - shape=(1, int(np.ceil(width / mod)), int(np.ceil(height / mod)), - filters), - dtype=tf.float32)) - - with tf.GradientTape() as tape: - x_hat = test_layer(x) - grad_loss = loss(x_hat, y) - grad = tape.gradient(grad_loss, test_layer.trainable_variables) - optimizer.apply_gradients(zip(grad, test_layer.trainable_variables)) - - self.assertNotIn(None, grad) - - -class DarkSppTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.named_parameters(('RouteProcessSpp', 224, 224, 3, [5, 9, 13]), - ('test1', 300, 300, 10, [2, 3, 4, 5]), - ('test2', 256, 256, 5, [10])) - def test_pass_through(self, width, height, channels, sizes): - x = tf.keras.Input(shape=(width, height, channels)) - test_layer = nn_blocks.SPP(sizes=sizes) - outx = test_layer(x) - self.assertAllEqual(outx.shape.as_list(), - [None, width, height, channels * (len(sizes) + 1)]) - return - - @parameterized.named_parameters(('RouteProcessSpp', 224, 224, 3, [5, 9, 13]), - ('test1', 300, 300, 10, [2, 3, 4, 5]), - ('test2', 256, 256, 5, [10])) - def test_gradient_pass_though(self, width, height, channels, sizes): - loss = tf.keras.losses.MeanSquaredError() - optimizer = tf.keras.optimizers.SGD() - test_layer = nn_blocks.SPP(sizes=sizes) - - init = tf.random_normal_initializer() - x = tf.Variable( - initial_value=init( - shape=(1, width, height, channels), dtype=tf.float32)) - y = tf.Variable( - initial_value=init( - shape=(1, width, height, channels * (len(sizes) + 1)), - dtype=tf.float32)) - - with tf.GradientTape() as tape: - x_hat = test_layer(x) - grad_loss = loss(x_hat, y) - grad = tape.gradient(grad_loss, test_layer.trainable_variables) - optimizer.apply_gradients(zip(grad, test_layer.trainable_variables)) - - self.assertNotIn(None, grad) - return - - -class DarkRouteProcessTest(tf.test.TestCase, parameterized.TestCase): - - @parameterized.named_parameters( - ('test1', 224, 224, 64, 7, False), ('test2', 223, 
223, 32, 3, False), - ('tiny', 223, 223, 16, 1, False), ('spp', 224, 224, 64, 7, False)) - def test_pass_through(self, width, height, filters, repetitions, spp): - x = tf.keras.Input(shape=(width, height, filters)) - test_layer = nn_blocks.DarkRouteProcess( - filters=filters, repetitions=repetitions, insert_spp=spp) - outx = test_layer(x) - self.assertLen(outx, 2, msg='len(outx) != 2') - if repetitions == 1: - filter_y1 = filters - else: - filter_y1 = filters // 2 - self.assertAllEqual( - outx[1].shape.as_list(), [None, width, height, filter_y1]) - self.assertAllEqual( - filters % 2, - 0, - msg='Output of a DarkRouteProcess layer has an odd number of filters') - self.assertAllEqual(outx[0].shape.as_list(), [None, width, height, filters]) - - @parameterized.named_parameters( - ('test1', 224, 224, 64, 7, False), ('test2', 223, 223, 32, 3, False), - ('tiny', 223, 223, 16, 1, False), ('spp', 224, 224, 64, 7, False)) - def test_gradient_pass_though(self, width, height, filters, repetitions, spp): - loss = tf.keras.losses.MeanSquaredError() - optimizer = tf.keras.optimizers.SGD() - test_layer = nn_blocks.DarkRouteProcess( - filters=filters, repetitions=repetitions, insert_spp=spp) - - if repetitions == 1: - filter_y1 = filters - else: - filter_y1 = filters // 2 - - init = tf.random_normal_initializer() - x = tf.Variable( - initial_value=init(shape=(1, width, height, filters), dtype=tf.float32)) - y_0 = tf.Variable( - initial_value=init(shape=(1, width, height, filters), dtype=tf.float32)) - y_1 = tf.Variable( - initial_value=init( - shape=(1, width, height, filter_y1), dtype=tf.float32)) - - with tf.GradientTape() as tape: - x_hat_0, x_hat_1 = test_layer(x) - grad_loss_0 = loss(x_hat_0, y_0) - grad_loss_1 = loss(x_hat_1, y_1) - grad = tape.gradient([grad_loss_0, grad_loss_1], - test_layer.trainable_variables) - optimizer.apply_gradients(zip(grad, test_layer.trainable_variables)) - - self.assertNotIn(None, grad) - return - - -if __name__ == '__main__': - tf.test.main() 
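The order of layers that `DarkRouteProcess` builds is spread across the index bookkeeping in `_get_base_layers`, `_convert_csp`, and `_insert_spp`. The plain-Python sketch below re-derives only the layer-name sequence, assuming it mirrors the deleted code above; the helper name `layer_plan` is hypothetical, and the `outputs` flags, `insert_sam`, and the Keras layers themselves are deliberately left out.

```python
from typing import List


def layer_plan(repetitions: int, insert_spp: bool = False,
               csp_stack: int = 0) -> List[str]:
  """Hypothetical helper: layer names a DarkRouteProcess would build.

  A sketch mirroring _get_base_layers, _convert_csp and _insert_spp;
  it does not construct any Keras layers.
  """
  # Enabling SPP adds two extra processing nodes to the requested count.
  repetitions += 2 * int(insert_spp)

  # Alternate 'conv1' (even index) and 'conv2' (odd index).
  layers = ['conv1' if i % 2 == 0 else 'conv2' for i in range(repetitions)]

  if csp_stack > 0:
    csp_stack += 2 * int(insert_spp)
    # The first node becomes the CSP route, and a CSP connect node is
    # inserted at position csp_stack - 1.
    layers[0] = 'csp_route'
    layers.insert(csp_stack - 1, 'csp_connect')

  if insert_spp:
    # SPP overwrites one node (index 1 for short stacks, else index 3);
    # the two extra repetitions added above make room for it.
    layers[1 if len(layers) <= 3 else 3] = 'spp'
  return layers


if __name__ == '__main__':
  print(layer_plan(3))
  print(layer_plan(3, insert_spp=True))
```

With `block_invert` left at its default and more than one repetition, `conv1` is a 1x1 conv with `filters // 2` and `conv2` is a 3x3 conv with `filters`. Under that reading, `layer_plan(3, insert_spp=True)` returning `['conv1', 'conv2', 'conv1', 'spp', 'conv1']` corresponds to 1x1, 3x3, 1x1, SPP, 1x1. A single repetition forces `block_invert`, so the tiny variant is one 3x3 conv, which is consistent with the `tiny` case in the deleted test.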
diff --git a/official/vision/beta/projects/yolo/ops/__init__.py b/official/vision/beta/projects/yolo/ops/__init__.py deleted file mode 100644 index a25710c222e3327cb20e000db5df5c5651c4a2cc..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/ops/__init__.py +++ /dev/null @@ -1,15 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - diff --git a/official/vision/beta/projects/yolo/ops/anchor.py b/official/vision/beta/projects/yolo/ops/anchor.py deleted file mode 100644 index dfe675984a7f77d9c45117d41f21a6b96ab60568..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/ops/anchor.py +++ /dev/null @@ -1,481 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- - """Yolo Anchor labeler.""" - import numpy as np - import tensorflow as tf - - from official.vision.beta.projects.yolo.ops import box_ops - from official.vision.beta.projects.yolo.ops import loss_utils - from official.vision.beta.projects.yolo.ops import preprocessing_ops - - INF = 10000000 - - - def get_best_anchor(y_true, - anchors, - stride, - width=1, - height=1, - iou_thresh=0.25, - best_match_only=False, - use_tie_breaker=True): - """Get the correct anchor that is associated with each box using IOU. - - Args: - y_true: tf.Tensor[] for the list of bounding boxes in the yolo format. - anchors: list or tensor for the anchor boxes to be used in prediction found - via k-means. - stride: `int` stride for the anchors. - width: int for the image width. - height: int for the image height. - iou_thresh: `float` the minimum iou threshold to use for selecting boxes for - each level. - best_match_only: `bool` if the box only has one match and it is less than - the iou threshold, when set to True, this match will be dropped as no - anchors can be linked to it. - use_tie_breaker: `bool` if there are many anchors for a given box, then - attempt to use all of them; if False, only the first matching box will be - used.
-  Returns:
-    tf.Tensor: y_true with the anchor associated with each ground truth box
-      known
-  """
-  with tf.name_scope('get_best_anchor'):
-    width = tf.cast(width, dtype=tf.float32)
-    height = tf.cast(height, dtype=tf.float32)
-    scaler = tf.convert_to_tensor([width, height])
-
-    # scale to levels houts width and height
-    true_wh = tf.cast(y_true[..., 2:4], dtype=tf.float32) * scaler
-
-    # scale down from large anchor to small anchor type
-    anchors = tf.cast(anchors, dtype=tf.float32) / stride
-
-    k = tf.shape(anchors)[0]
-
-    anchors = tf.concat([tf.zeros_like(anchors), anchors], axis=-1)
-    truth_comp = tf.concat([tf.zeros_like(true_wh), true_wh], axis=-1)
-
-    if iou_thresh >= 1.0:
-      anchors = tf.expand_dims(anchors, axis=-2)
-      truth_comp = tf.expand_dims(truth_comp, axis=-3)
-
-      aspect = truth_comp[..., 2:4] / anchors[..., 2:4]
-      aspect = tf.where(tf.math.is_nan(aspect), tf.zeros_like(aspect), aspect)
-      aspect = tf.maximum(aspect, 1 / aspect)
-      aspect = tf.where(tf.math.is_nan(aspect), tf.zeros_like(aspect), aspect)
-      aspect = tf.reduce_max(aspect, axis=-1)
-
-      values, indexes = tf.math.top_k(
-          tf.transpose(-aspect, perm=[1, 0]),
-          k=tf.cast(k, dtype=tf.int32),
-          sorted=True)
-      values = -values
-      ind_mask = tf.cast(values < iou_thresh, dtype=indexes.dtype)
-    else:
-      truth_comp = box_ops.xcycwh_to_yxyx(truth_comp)
-      anchors = box_ops.xcycwh_to_yxyx(anchors)
-      iou_raw = box_ops.aggregated_comparitive_iou(
-          truth_comp,
-          anchors,
-          iou_type=3,
-      )
-      values, indexes = tf.math.top_k(
-          iou_raw, k=tf.cast(k, dtype=tf.int32), sorted=True)
-      ind_mask = tf.cast(values >= iou_thresh, dtype=indexes.dtype)
-
-    # pad the indexs such that all values less than the thresh are -1
-    # add one, multiply the mask to zeros all the bad locations
-    # subtract 1 makeing all the bad locations 0.
-    if best_match_only:
-      iou_index = ((indexes[..., 0:] + 1) * ind_mask[..., 0:]) - 1
-    elif use_tie_breaker:
-      iou_index = tf.concat([
-          tf.expand_dims(indexes[..., 0], axis=-1),
-          ((indexes[..., 1:] + 1) * ind_mask[..., 1:]) - 1
-      ],
-                            axis=-1)
-    else:
-      iou_index = tf.concat([
-          tf.expand_dims(indexes[..., 0], axis=-1),
-          tf.zeros_like(indexes[..., 1:]) - 1
-      ],
-                            axis=-1)
-
-  return tf.cast(iou_index, dtype=tf.float32), tf.cast(values, dtype=tf.float32)
-
-
-class YoloAnchorLabeler:
-  """Anchor labeler for the Yolo Models."""
-
-  def __init__(self,
-               anchors=None,
-               anchor_free_level_limits=None,
-               level_strides=None,
-               center_radius=None,
-               max_num_instances=200,
-               match_threshold=0.25,
-               best_matches_only=False,
-               use_tie_breaker=True,
-               darknet=False,
-               dtype='float32'):
-    """Initialization for anchor labler.
-
-    Args:
-      anchors: `Dict[List[Union[int, float]]]` values for each anchor box.
-      anchor_free_level_limits: `List` the box sizes that will be allowed at
-        each FPN level as is done in the FCOS and YOLOX paper for anchor free
-        box assignment.
-      level_strides: `Dict[int]` for how much the model scales down the images
-        at the each level.
-      center_radius: `Dict[float]` for radius around each box center to search
-        for extra centers in each level.
-      max_num_instances: `int` for the number of boxes to compute loss on.
-      match_threshold: `float` indicating the threshold over which an anchor
-        will be considered for prediction, at zero, all the anchors will be used
-        and at 1.0 only the best will be used. for anchor thresholds larger than
-        1.0 we stop using the IOU for anchor comparison and resort directly to
-        comparing the width and height, this is used for the scaled models.
-      best_matches_only: `boolean` indicating how boxes are selected for
-        optimization.
-      use_tie_breaker: `boolean` indicating whether to use the anchor threshold
-        value.
-      darknet: `boolean` indicating which data pipeline to use. Setting to True
-        swaps the pipeline to output images realtive to Yolov4 and older.
-      dtype: `str` indicating the output datatype of the datapipeline selecting
-        from {"float32", "float16", "bfloat16"}.
-    """
-    self.anchors = anchors
-    self.masks = self._get_mask()
-    self.anchor_free_level_limits = self._get_level_limits(
-        anchor_free_level_limits)
-
-    if darknet and self.anchor_free_level_limits is None:
-      center_radius = None
-
-    self.keys = self.anchors.keys()
-    if self.anchor_free_level_limits is not None:
-      maxim = 2000
-      match_threshold = -0.01
-      self.num_instances = {key: maxim for key in self.keys}
-    elif not darknet:
-      self.num_instances = {
-          key: (6 - i) * max_num_instances for i, key in enumerate(self.keys)
-      }
-    else:
-      self.num_instances = {key: max_num_instances for key in self.keys}
-
-    self.center_radius = center_radius
-    self.level_strides = level_strides
-    self.match_threshold = match_threshold
-    self.best_matches_only = best_matches_only
-    self.use_tie_breaker = use_tie_breaker
-    self.dtype = dtype
-
-  def _get_mask(self):
-    """For each level get indexs of each anchor for box search across levels."""
-    masks = {}
-    start = 0
-
-    minimum = int(min(self.anchors.keys()))
-    maximum = int(max(self.anchors.keys()))
-    for i in range(minimum, maximum + 1):
-      per_scale = len(self.anchors[str(i)])
-      masks[str(i)] = list(range(start, per_scale + start))
-      start += per_scale
-    return masks
-
-  def _get_level_limits(self, level_limits):
-    """For each level receptive feild range for anchor free box placement."""
-    if level_limits is not None:
-      level_limits_dict = {}
-      level_limits = [0.0] + level_limits + [np.inf]
-
-      for i, key in enumerate(self.anchors.keys()):
-        level_limits_dict[key] = level_limits[i:i + 2]
-    else:
-      level_limits_dict = None
-    return level_limits_dict
-
-  def _tie_breaking_search(self, anchors, mask, boxes, classes):
-    """After search, link each anchor ind to the correct map in ground truth."""
-    mask = tf.cast(tf.reshape(mask, [1, 1, 1, -1]), anchors.dtype)
-    anchors = tf.expand_dims(anchors, axis=-1)
-    viable = tf.where(tf.squeeze(anchors == mask, axis=0))
-
-    gather_id, _, anchor_id = tf.split(viable, 3, axis=-1)
-
-    boxes = tf.gather_nd(boxes, gather_id)
-    classes = tf.gather_nd(classes, gather_id)
-
-    classes = tf.expand_dims(classes, axis=-1)
-    classes = tf.cast(classes, boxes.dtype)
-    anchor_id = tf.cast(anchor_id, boxes.dtype)
-    return boxes, classes, anchor_id
-
-  def _get_anchor_id(self,
-                     key,
-                     boxes,
-                     classes,
-                     width,
-                     height,
-                     stride,
-                     iou_index=None):
-    """Find the object anchor assignments in an anchor based paradigm."""
-
-    # find the best anchor
-    anchors = self.anchors[key]
-    num_anchors = len(anchors)
-    if self.best_matches_only:
-      # get the best anchor for each box
-      iou_index, _ = get_best_anchor(
-          boxes,
-          anchors,
-          stride,
-          width=width,
-          height=height,
-          best_match_only=True,
-          iou_thresh=self.match_threshold)
-      mask = range(num_anchors)
-    else:
-      # search is done across FPN levels, get the mask of anchor indexes
-      # corralated to this level.
-      mask = self.masks[key]
-
-    # search for the correct box to use
-    (boxes, classes,
-     anchors) = self._tie_breaking_search(iou_index, mask, boxes, classes)
-    return boxes, classes, anchors, num_anchors
-
-  def _get_centers(self, boxes, classes, anchors, width, height, scale_xy):
-    """Find the object center assignments in an anchor based paradigm."""
-    offset = tf.cast(0.5 * (scale_xy - 1), boxes.dtype)
-
-    grid_xy, _ = tf.split(boxes, 2, axis=-1)
-    wh_scale = tf.cast(tf.convert_to_tensor([width, height]), boxes.dtype)
-
-    grid_xy = grid_xy * wh_scale
-    centers = tf.math.floor(grid_xy)
-
-    if offset != 0.0:
-      clamp = lambda x, ma: tf.maximum(  # pylint:disable=g-long-lambda
-          tf.minimum(x, tf.cast(ma, x.dtype)), tf.zeros_like(x))
-
-      grid_xy_index = grid_xy - centers
-      positive_shift = ((grid_xy_index < offset) & (grid_xy > 1.))
-      negative_shift = ((grid_xy_index > (1 - offset)) & (grid_xy <
-                                                          (wh_scale - 1.)))
-
-      zero, _ = tf.split(tf.ones_like(positive_shift), 2, axis=-1)
-      shift_mask = tf.concat([zero, positive_shift, negative_shift], axis=-1)
-      offset = tf.cast([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]],
-                       offset.dtype) * offset
-
-      num_shifts = tf.shape(shift_mask)
-      num_shifts = num_shifts[-1]
-      boxes = tf.tile(tf.expand_dims(boxes, axis=-2), [1, num_shifts, 1])
-      classes = tf.tile(tf.expand_dims(classes, axis=-2), [1, num_shifts, 1])
-      anchors = tf.tile(tf.expand_dims(anchors, axis=-2), [1, num_shifts, 1])
-
-      shift_mask = tf.cast(shift_mask, boxes.dtype)
-      shift_ind = shift_mask * tf.range(0, num_shifts, dtype=boxes.dtype)
-      shift_ind = shift_ind - (1 - shift_mask)
-      shift_ind = tf.expand_dims(shift_ind, axis=-1)
-
-      boxes_and_centers = tf.concat([boxes, classes, anchors, shift_ind],
-                                    axis=-1)
-      boxes_and_centers = tf.reshape(boxes_and_centers, [-1, 7])
-      _, center_ids = tf.split(boxes_and_centers, [6, 1], axis=-1)
-
-      select = tf.where(center_ids >= 0)
-      select, _ = tf.split(select, 2, axis=-1)
-
-      boxes_and_centers = tf.gather_nd(boxes_and_centers, select)
-
-      center_ids = tf.gather_nd(center_ids, select)
-      center_ids = tf.cast(center_ids, tf.int32)
-      shifts = tf.gather_nd(offset, center_ids)
-
-      boxes, classes, anchors, _ = tf.split(
-          boxes_and_centers, [4, 1, 1, 1], axis=-1)
-      grid_xy, _ = tf.split(boxes, 2, axis=-1)
-      centers = tf.math.floor(grid_xy * wh_scale - shifts)
-      centers = clamp(centers, wh_scale - 1)
-
-    x, y = tf.split(centers, 2, axis=-1)
-    centers = tf.cast(tf.concat([y, x, anchors], axis=-1), tf.int32)
-    return boxes, classes, centers
-
-  def _get_anchor_free(self, key, boxes, classes, height, width, stride,
-                       center_radius):
-    """Find the box assignements in an anchor free paradigm."""
-    level_limits = self.anchor_free_level_limits[key]
-    gen = loss_utils.GridGenerator(anchors=[[1, 1]], scale_anchors=stride)
-    grid_points = gen(width, height, 1, boxes.dtype)[0]
-    grid_points = tf.squeeze(grid_points, axis=0)
-    box_list = boxes
-    class_list = classes
-
-    grid_points = (grid_points + 0.5) * stride
-    x_centers, y_centers = grid_points[..., 0], grid_points[..., 1]
-    boxes *= (tf.convert_to_tensor([width, height, width, height]) * stride)
-
-    tlbr_boxes = box_ops.xcycwh_to_yxyx(boxes)
-
-    boxes = tf.reshape(boxes, [1, 1, -1, 4])
-    tlbr_boxes = tf.reshape(tlbr_boxes, [1, 1, -1, 4])
-    if self.use_tie_breaker:
-      area = tf.reduce_prod(boxes[..., 2:], axis=-1)
-
-    # check if the box is in the receptive feild of the this fpn level
-    b_t = y_centers - tlbr_boxes[..., 0]
-    b_l = x_centers - tlbr_boxes[..., 1]
-    b_b = tlbr_boxes[..., 2] - y_centers
-    b_r = tlbr_boxes[..., 3] - x_centers
-    box_delta = tf.stack([b_t, b_l, b_b, b_r], axis=-1)
-    if level_limits is not None:
-      max_reg_targets_per_im = tf.reduce_max(box_delta, axis=-1)
-      gt_min = max_reg_targets_per_im >= level_limits[0]
-      gt_max = max_reg_targets_per_im <= level_limits[1]
-      is_in_boxes = tf.logical_and(gt_min, gt_max)
-    else:
-      is_in_boxes = tf.reduce_min(box_delta, axis=-1) > 0.0
-    is_in_boxes_all = tf.reduce_any(is_in_boxes, axis=(0, 1), keepdims=True)
-
-    # check if the center is in the receptive feild of the this fpn level
-    c_t = y_centers - (boxes[..., 1] - center_radius * stride)
-    c_l = x_centers - (boxes[..., 0] - center_radius * stride)
-    c_b = (boxes[..., 1] + center_radius * stride) - y_centers
-    c_r = (boxes[..., 0] + center_radius * stride) - x_centers
-    centers_delta = tf.stack([c_t, c_l, c_b, c_r], axis=-1)
-    is_in_centers = tf.reduce_min(centers_delta, axis=-1) > 0.0
-    is_in_centers_all = tf.reduce_any(is_in_centers, axis=(0, 1), keepdims=True)
-
-    # colate all masks to get the final locations
-    is_in_index = tf.logical_or(is_in_boxes_all, is_in_centers_all)
-    is_in_boxes_and_center = tf.logical_and(is_in_boxes, is_in_centers)
-    is_in_boxes_and_center = tf.logical_and(is_in_index, is_in_boxes_and_center)
-
-    if self.use_tie_breaker:
-      boxes_all = tf.cast(is_in_boxes_and_center, area.dtype)
-      boxes_all = ((boxes_all * area) + ((1 - boxes_all) * INF))
-      boxes_min = tf.reduce_min(boxes_all, axis=-1, keepdims=True)
-      boxes_min = tf.where(boxes_min == INF, -1.0, boxes_min)
-      is_in_boxes_and_center = boxes_all == boxes_min
-
-    # construct the index update grid
-    reps = tf.reduce_sum(tf.cast(is_in_boxes_and_center, tf.int16), axis=-1)
-    indexes = tf.cast(tf.where(is_in_boxes_and_center), tf.int32)
-    y, x, t = tf.split(indexes, 3, axis=-1)
-
-    boxes = tf.gather_nd(box_list, t)
-    classes = tf.cast(tf.gather_nd(class_list, t), boxes.dtype)
-    reps = tf.gather_nd(reps, tf.concat([y, x], axis=-1))
-    reps = tf.cast(tf.expand_dims(reps, axis=-1), boxes.dtype)
-    classes = tf.cast(tf.expand_dims(classes, axis=-1), boxes.dtype)
-    conf = tf.ones_like(classes)
-
-    # return the samples and the indexes
-    samples = tf.concat([boxes, conf, classes], axis=-1)
-    indexes = tf.concat([y, x, tf.zeros_like(t)], axis=-1)
-    return indexes, samples
-
-  def build_label_per_path(self,
-                           key,
-                           boxes,
-                           classes,
-                           width,
-                           height,
-                           iou_index=None):
-    """Builds the labels for one path."""
-    stride = self.level_strides[key]
-    scale_xy = self.center_radius[key] if self.center_radius is not None else 1
-
-    width = tf.cast(width // stride, boxes.dtype)
-    height = tf.cast(height // stride, boxes.dtype)
-
-    if self.anchor_free_level_limits is None:
-      (boxes, classes, anchors, num_anchors) = self._get_anchor_id(
-          key, boxes, classes, width, height, stride, iou_index=iou_index)
-      boxes, classes, centers = self._get_centers(boxes, classes, anchors,
-                                                  width, height, scale_xy)
-      ind_mask = tf.ones_like(classes)
-      updates = tf.concat([boxes, ind_mask, classes], axis=-1)
-    else:
-      num_anchors = 1
-      (centers, updates) = self._get_anchor_free(key, boxes, classes, height,
-                                                 width, stride, scale_xy)
-      boxes, ind_mask, classes = tf.split(updates, [4, 1, 1], axis=-1)
-
-    width = tf.cast(width, tf.int32)
-    height = tf.cast(height, tf.int32)
-    full = tf.zeros([height, width, num_anchors, 1], dtype=classes.dtype)
-    full = tf.tensor_scatter_nd_add(full, centers, ind_mask)
-
-    num_instances = int(self.num_instances[key])
-    centers = preprocessing_ops.pad_max_instances(
-        centers, num_instances, pad_value=0, pad_axis=0)
-    updates = preprocessing_ops.pad_max_instances(
-        updates, num_instances, pad_value=0, pad_axis=0)
-
-    updates = tf.cast(updates, self.dtype)
-    full = tf.cast(full, self.dtype)
-    return centers, updates, full
-
-  def __call__(self, boxes, classes, width, height):
-    """Builds the labels for a single image, not functional in batch mode.
-
-    Args:
-      boxes: `Tensor` of shape [None, 4] indicating the object locations in an
-        image.
-      classes: `Tensor` of shape [None] indicating the each objects classes.
-      width: `int` for the images width.
-      height: `int` for the images height.
-
-    Returns:
-      centers: `Tensor` of shape [None, 3] of indexes in the final grid where
-        boxes are located.
-      updates: `Tensor` of shape [None, 8] the value to place in the final grid.
-      full: `Tensor` of [width/stride, height/stride, num_anchors, 1] holding
-        a mask of where boxes are locates for confidence losses.
-    """
-    indexes = {}
-    updates = {}
-    true_grids = {}
-    iou_index = None
-
-    boxes = box_ops.yxyx_to_xcycwh(boxes)
-    if not self.best_matches_only and self.anchor_free_level_limits is None:
-      # stitch and search boxes across fpn levels
-      anchorsvec = []
-      for stitch in self.anchors:
-        anchorsvec.extend(self.anchors[stitch])
-
-      stride = tf.cast([width, height], boxes.dtype)
-      # get the best anchor for each box
-      iou_index, _ = get_best_anchor(
-          boxes,
-          anchorsvec,
-          stride,
-          width=1.0,
-          height=1.0,
-          best_match_only=False,
-          use_tie_breaker=self.use_tie_breaker,
-          iou_thresh=self.match_threshold)
-
-    for key in self.keys:
-      indexes[key], updates[key], true_grids[key] = self.build_label_per_path(
-          key, boxes, classes, width, height, iou_index=iou_index)
-    return indexes, updates, true_grids
diff --git a/official/vision/beta/projects/yolo/ops/box_ops.py b/official/vision/beta/projects/yolo/ops/box_ops.py
deleted file mode 100644
index 6d15f5d315702f44b6313427f531e2fece45de29..0000000000000000000000000000000000000000
--- a/official/vision/beta/projects/yolo/ops/box_ops.py
+++ /dev/null
@@ -1,322 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Yolo box ops."""
-import math
-import tensorflow as tf
-from official.vision.beta.projects.yolo.ops import math_ops
-
-
-def yxyx_to_xcycwh(box: tf.Tensor):
-  """Converts boxes from yxyx to x_center, y_center, width, height.
-
-  Args:
-    box: any `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes in ymin, xmin, ymax, xmax.
-
-  Returns:
-    box: a `Tensor` whose shape is the same as `box` in new format.
-  """
-  with tf.name_scope('yxyx_to_xcycwh'):
-    ymin, xmin, ymax, xmax = tf.split(box, 4, axis=-1)
-    x_center = (xmax + xmin) / 2
-    y_center = (ymax + ymin) / 2
-    width = xmax - xmin
-    height = ymax - ymin
-    box = tf.concat([x_center, y_center, width, height], axis=-1)
-  return box
-
-
-def xcycwh_to_yxyx(box: tf.Tensor):
-  """Converts boxes from x_center, y_center, width, height to yxyx format.
-
-  Args:
-    box: any `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes in x_center, y_center, width, height.
-
-  Returns:
-    box: a `Tensor` whose shape is the same as `box` in new format.
-  """
-  with tf.name_scope('xcycwh_to_yxyx'):
-    xy, wh = tf.split(box, 2, axis=-1)
-    xy_min = xy - wh / 2
-    xy_max = xy + wh / 2
-    x_min, y_min = tf.split(xy_min, 2, axis=-1)
-    x_max, y_max = tf.split(xy_max, 2, axis=-1)
-    box = tf.concat([y_min, x_min, y_max, x_max], axis=-1)
-  return box
-
-
-def intersect_and_union(box1, box2, yxyx=False):
-  """Calculates the intersection and union between box1 and box2.
-
-  Args:
-    box1: any `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes.
-    box2: any `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes.
-    yxyx: a `bool` indicating whether the input box is of the format x_center
-      y_center, width, height or y_min, x_min, y_max, x_max.
-
-  Returns:
-    intersection: a `Tensor` who represents the intersection.
-    union: a `Tensor` who represents the union.
-  """
-  if not yxyx:
-    box1_area = tf.reduce_prod(tf.split(box1, 2, axis=-1)[-1], axis=-1)
-    box2_area = tf.reduce_prod(tf.split(box2, 2, axis=-1)[-1], axis=-1)
-    box1 = xcycwh_to_yxyx(box1)
-    box2 = xcycwh_to_yxyx(box2)
-
-  b1mi, b1ma = tf.split(box1, 2, axis=-1)
-  b2mi, b2ma = tf.split(box2, 2, axis=-1)
-  intersect_mins = tf.math.maximum(b1mi, b2mi)
-  intersect_maxes = tf.math.minimum(b1ma, b2ma)
-  intersect_wh = tf.math.maximum(intersect_maxes - intersect_mins, 0.0)
-  intersection = tf.reduce_prod(intersect_wh, axis=-1)
-
-  if yxyx:
-    box1_area = tf.reduce_prod(b1ma - b1mi, axis=-1)
-    box2_area = tf.reduce_prod(b2ma - b2mi, axis=-1)
-  union = box1_area + box2_area - intersection
-  return intersection, union
-
-
-def smallest_encompassing_box(box1, box2, yxyx=False, clip=False):
-  """Calculates the smallest box that encompasses box1 and box2.
-
-  Args:
-    box1: any `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes.
-    box2: any `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes.
-    yxyx: a `bool` indicating whether the input box is of the format x_center
-      y_center, width, height or y_min, x_min, y_max, x_max.
-    clip: a `bool`, whether or not to clip boxes.
-
-  Returns:
-    box_c: a `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes, the return format is y_min, x_min, y_max, x_max if yxyx is set to
-      to True. In other words it will match the input format.
-  """
-  if not yxyx:
-    box1 = xcycwh_to_yxyx(box1)
-    box2 = xcycwh_to_yxyx(box2)
-
-  b1mi, b1ma = tf.split(box1, 2, axis=-1)
-  b2mi, b2ma = tf.split(box2, 2, axis=-1)
-
-  bcmi = tf.math.minimum(b1mi, b2mi)
-  bcma = tf.math.maximum(b1ma, b2ma)
-  box_c = tf.concat([bcmi, bcma], axis=-1)
-
-  if not yxyx:
-    box_c = yxyx_to_xcycwh(box_c)
-
-  if clip:
-    bca = tf.reduce_prod(bcma - bcmi, keepdims=True, axis=-1)
-    box_c = tf.where(bca <= 0.0, tf.zeros_like(box_c), box_c)
-  return bcmi, bcma, box_c
-
-
-def compute_iou(box1, box2, yxyx=False):
-  """Calculates the intersection over union between box1 and box2.
-
-  Args:
-    box1: any `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes.
-    box2: any `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes.
-    yxyx: a `bool` indicating whether the input box is of the format x_center
-      y_center, width, height or y_min, x_min, y_max, x_max.
-
-  Returns:
-    iou: a `Tensor` who represents the intersection over union.
-  """
-  with tf.name_scope('iou'):
-    intersection, union = intersect_and_union(box1, box2, yxyx=yxyx)
-    iou = math_ops.divide_no_nan(intersection, union)
-  return iou
-
-
-def compute_giou(box1, box2, yxyx=False):
-  """Calculates the General intersection over union between box1 and box2.
-
-  Args:
-    box1: any `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes.
-    box2: any `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes.
-    yxyx: a `bool` indicating whether the input box is of the format x_center
-      y_center, width, height or y_min, x_min, y_max, x_max.
-
-  Returns:
-    giou: a `Tensor` who represents the General intersection over union.
-  """
-  with tf.name_scope('giou'):
-    if not yxyx:
-      yxyx1 = xcycwh_to_yxyx(box1)
-      yxyx2 = xcycwh_to_yxyx(box2)
-    else:
-      yxyx1, yxyx2 = box1, box2
-
-    cmi, cma, _ = smallest_encompassing_box(yxyx1, yxyx2, yxyx=True)
-    intersection, union = intersect_and_union(yxyx1, yxyx2, yxyx=True)
-    iou = math_ops.divide_no_nan(intersection, union)
-
-    bcwh = cma - cmi
-    c = tf.math.reduce_prod(bcwh, axis=-1)
-
-    regularization = math_ops.divide_no_nan((c - union), c)
-    giou = iou - regularization
-  return iou, giou
-
-
-def compute_diou(box1, box2, beta=1.0, yxyx=False):
-  """Calculates the distance intersection over union between box1 and box2.
-
-  Args:
-    box1: any `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes.
-    box2: any `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes.
-    beta: a `float` indicating the amount to scale the distance iou
-      regularization term.
-    yxyx: a `bool` indicating whether the input box is of the format x_center
-      y_center, width, height or y_min, x_min, y_max, x_max.
-
-  Returns:
-    diou: a `Tensor` who represents the distance intersection over union.
-  """
-  with tf.name_scope('diou'):
-    # compute center distance
-    if not yxyx:
-      xycc1, xycc2 = box1, box2
-      yxyx1 = xcycwh_to_yxyx(box1)
-      yxyx2 = xcycwh_to_yxyx(box2)
-    else:
-      yxyx1, yxyx2 = box1, box2
-      xycc1 = yxyx_to_xcycwh(box1)
-      xycc2 = yxyx_to_xcycwh(box2)
-
-    cmi, cma, _ = smallest_encompassing_box(yxyx1, yxyx2, yxyx=True)
-    intersection, union = intersect_and_union(yxyx1, yxyx2, yxyx=True)
-    iou = math_ops.divide_no_nan(intersection, union)
-
-    b1xy, _ = tf.split(xycc1, 2, axis=-1)
-    b2xy, _ = tf.split(xycc2, 2, axis=-1)
-    bcwh = cma - cmi
-
-    center_dist = tf.reduce_sum((b1xy - b2xy)**2, axis=-1)
-    c_diag = tf.reduce_sum(bcwh**2, axis=-1)
-
-    regularization = math_ops.divide_no_nan(center_dist, c_diag)
-    diou = iou - regularization**beta
-  return iou, diou
-
-
-def compute_ciou(box1, box2, yxyx=False, darknet=False):
-  """Calculates the complete intersection over union between box1 and box2.
-
-  Args:
-    box1: any `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes.
-    box2: any `Tensor` whose last dimension is 4 representing the coordinates of
-      boxes.
-    yxyx: a `bool` indicating whether the input box is of the format x_center
-      y_center, width, height or y_min, x_min, y_max, x_max.
-    darknet: a `bool` indicating whether the calling function is the YOLO
-      darknet loss.
-
-  Returns:
-    ciou: a `Tensor` who represents the complete intersection over union.
-  """
-  with tf.name_scope('ciou'):
-    if not yxyx:
-      xycc1, xycc2 = box1, box2
-      yxyx1 = xcycwh_to_yxyx(box1)
-      yxyx2 = xcycwh_to_yxyx(box2)
-    else:
-      yxyx1, yxyx2 = box1, box2
-      xycc1 = yxyx_to_xcycwh(box1)
-      xycc2 = yxyx_to_xcycwh(box2)
-
-    # Build the smallest encomapssing box.
-    cmi, cma, _ = smallest_encompassing_box(yxyx1, yxyx2, yxyx=True)
-    intersection, union = intersect_and_union(yxyx1, yxyx2, yxyx=True)
-    iou = math_ops.divide_no_nan(intersection, union)
-
-    b1xy, b1w, b1h = tf.split(xycc1, [2, 1, 1], axis=-1)
-    b2xy, b2w, b2h = tf.split(xycc2, [2, 1, 1], axis=-1)
-    bchw = cma - cmi
-
-    # Center regularization
-    center_dist = tf.reduce_sum((b1xy - b2xy)**2, axis=-1)
-    c_diag = tf.reduce_sum(bchw**2, axis=-1)
-    regularization = math_ops.divide_no_nan(center_dist, c_diag)
-
-    # Computer aspect ratio consistency
-    terma = math_ops.divide_no_nan(b1w, b1h)  # gt
-    termb = math_ops.divide_no_nan(b2w, b2h)  # pred
-    arcterm = tf.squeeze(
-        tf.math.pow(tf.math.atan(termb) - tf.math.atan(terma), 2), axis=-1)
-    v = (4 / math.pi**2) * arcterm
-
-    # Compute the aspect ratio weight, should be treated as a constant
-    a = tf.stop_gradient(math_ops.divide_no_nan(v, 1 - iou + v))
-
-    if darknet:
-      grad_scale = tf.stop_gradient(tf.square(b2w) + tf.square(b2h))
-      v *= tf.squeeze(grad_scale, axis=-1)
-
-    ciou = iou - regularization - (v * a)
-  return iou, ciou
-
-
-def aggregated_comparitive_iou(boxes1, boxes2=None, iou_type=0, beta=0.6):
-  """Calculates the IOU between two set of boxes.
-
-  Similar to bbox_overlap but far more versitile.
-
-  Args:
-    boxes1: a `Tensor` of shape [batch size, N, 4] representing the coordinates
-      of boxes.
-    boxes2: a `Tensor` of shape [batch size, N, 4] representing the coordinates
-      of boxes.
-    iou_type: `integer` representing the iou version to use, 0 is distance iou,
-      1 is the general iou, 2 is the complete iou, any other number uses the
-      standard iou.
-    beta: `float` for the scaling quantity to apply to distance iou
-      regularization.
-
-  Returns:
-    iou: a `Tensor` who represents the intersection over union in of the
-      expected/input type.
-  """
-  boxes1 = tf.expand_dims(boxes1, axis=-2)
-
-  if boxes2 is not None:
-    boxes2 = tf.expand_dims(boxes2, axis=-3)
-  else:
-    boxes2 = tf.transpose(boxes1, perm=(0, 2, 1, 3))
-
-  if iou_type == 0 or iou_type == 'diou':  # diou
-    _, iou = compute_diou(boxes1, boxes2, beta=beta, yxyx=True)
-  elif iou_type == 1 or iou_type == 'giou':  # giou
-    _, iou = compute_giou(boxes1, boxes2, yxyx=True)
-  elif iou_type == 2 or iou_type == 'ciou':  # ciou
-    _, iou = compute_ciou(boxes1, boxes2, yxyx=True)
-  else:
-    iou = compute_iou(boxes1, boxes2, yxyx=True)
-  return iou
diff --git a/official/vision/beta/projects/yolo/ops/loss_utils.py b/official/vision/beta/projects/yolo/ops/loss_utils.py
deleted file mode 100755
index 5536290199bb00696f05fd4a03f562b4d8ab2b60..0000000000000000000000000000000000000000
--- a/official/vision/beta/projects/yolo/ops/loss_utils.py
+++ /dev/null
@@ -1,629 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Yolo loss utility functions."""
-
-import numpy as np
-import tensorflow as tf
-
-from official.vision.beta.projects.yolo.ops import box_ops
-from official.vision.beta.projects.yolo.ops import math_ops
-
-
-@tf.custom_gradient
-def sigmoid_bce(y, x_prime, label_smoothing):
-  """Applies the Sigmoid Cross Entropy Loss.
-
-  Implements the same derivative as that found in the Darknet C library.
-  The derivative of this method is not the same as the standard binary cross
-  entropy with logits function.
-
-  The BCE with logits function equation is as follows:
-    x = 1 / (1 + exp(-x_prime))
-    bce = -ylog(x) - (1 - y)log(1 - x)
-
-  The standard BCE with logits function derivative is as follows:
-    dloss = -y/x + (1-y)/(1-x)
-    dsigmoid = x * (1 - x)
-    dx = dloss * dsigmoid
-
-  This derivative can be reduced simply to:
-    dx = (-y + x)
-
-  This simplification is used by the darknet library in order to improve
-  training stability. The gradient is almost the same
-  as tf.keras.losses.binary_crossentropy but varies slightly and
-  yields different performance.
-
-  Args:
-    y: `Tensor` holding ground truth data.
-    x_prime: `Tensor` holding the predictions prior to application of the
-      sigmoid operation.
-    label_smoothing: float value between 0.0 and 1.0 indicating the amount of
-      smoothing to apply to the data.
-
-  Returns:
-    bce: Tensor of the be applied loss values.
-    delta: callable function indicating the custom gradient for this operation.
-  """
-
-  eps = 1e-9
-  x = tf.math.sigmoid(x_prime)
-  y = tf.stop_gradient(y * (1 - label_smoothing) + 0.5 * label_smoothing)
-  bce = -y * tf.math.log(x + eps) - (1 - y) * tf.math.log(1 - x + eps)
-
-  def delta(dpass):
-    x = tf.math.sigmoid(x_prime)
-    dx = (-y + x) * dpass
-    dy = tf.zeros_like(y)
-    return dy, dx, 0.0
-
-  return bce, delta
-
-
-def apply_mask(mask, x, value=0):
-  """This function is used for gradient masking.
-
-  The YOLO loss function makes extensive use of dynamically shaped tensors.
-  To allow this use case on the TPU while preserving the gradient correctly
-  for back propagation we use this masking function to use a tf.where operation
-  to hard set masked location to have a gradient and a value of zero.
-
-  Args:
-    mask: A `Tensor` with the same shape as x used to select values of
-      importance.
-    x: A `Tensor` with the same shape as mask that will be getting masked.
-    value: `float` constant additive value.
-
-  Returns:
-    x: A masked `Tensor` with the same shape as x.
-  """
-  mask = tf.cast(mask, tf.bool)
-  masked = tf.where(mask, x, tf.zeros_like(x) + value)
-  return masked
-
-
-def build_grid(indexes, truths, preds, ind_mask, update=False, grid=None):
-  """This function is used to broadcast elements into the output shape.
-
-  This function is used to broadcasts a list of truths into the correct index
-  in the output shape. This is used for the ground truth map construction in
-  the scaled loss and the classification map in the darknet loss.
-
-  Args:
-    indexes: A `Tensor` for the indexes
-    truths: A `Tensor` for the ground truth.
-    preds: A `Tensor` for the predictions.
-    ind_mask: A `Tensor` for the index masks.
-    update: A `bool` for updating the grid.
-    grid: A `Tensor` for the grid.
-
-  Returns:
-    grid: A `Tensor` representing the augmented grid.
-  """
-  # this function is used to broadcast all the indexes to the correct
-  # into the correct ground truth mask, used for iou detection map
-  # in the scaled loss and the classification mask in the darknet loss
-  num_flatten = tf.shape(preds)[-1]
-
-  # is there a way to verify that we are not on the CPU?
-  ind_mask = tf.cast(ind_mask, indexes.dtype)
-
-  # find all the batch indexes using the cumulated sum of a ones tensor
-  # cumsum(ones) - 1 yeild the zero indexed batches
-  bhep = tf.reduce_max(tf.ones_like(indexes), axis=-1, keepdims=True)
-  bhep = tf.math.cumsum(bhep, axis=0) - 1
-
-  # concatnate the batch sizes to the indexes
-  indexes = tf.concat([bhep, indexes], axis=-1)
-  indexes = apply_mask(tf.cast(ind_mask, indexes.dtype), indexes)
-  indexes = (indexes + (ind_mask - 1))
-
-  # mask truths
-  truths = apply_mask(tf.cast(ind_mask, truths.dtype), truths)
-  truths = (truths + (tf.cast(ind_mask, truths.dtype) - 1))
-
-  # reshape the indexes into the correct shape for the loss,
-  # just flatten all indexes but the last
-  indexes = tf.reshape(indexes, [-1, 4])
-
-  # also flatten the ground truth value on all axis but the last
-  truths = tf.reshape(truths, [-1, num_flatten])
-
-  # build a zero grid in the samve shape as the predicitons
-  if grid is None:
-    grid = tf.zeros_like(preds)
-  # remove invalid values from the truths that may have
-  # come up from computation, invalid = nan and inf
-  truths = math_ops.rm_nan_inf(truths)
-
-  # scatter update the zero grid
-  if update:
-    grid = tf.tensor_scatter_nd_update(grid, indexes, truths)
-  else:
-    grid = tf.tensor_scatter_nd_max(grid, indexes, truths)
-
-  # stop gradient and return to avoid TPU errors and save compute
-  # resources
-  return grid
-
-
-class GridGenerator:
-  """Grid generator that generates anchor grids for box decoding."""
-
-  def __init__(self, anchors, scale_anchors=None):
-    """Initialize Grid Generator.
-
-    Args:
-      anchors: A `List[List[int]]` for the anchor boxes that are used in the
-        model at all levels.
-      scale_anchors: An `int` for how much to scale this level to get the
-        original input shape.
- """ - self.dtype = tf.keras.backend.floatx() - self._scale_anchors = scale_anchors - self._anchors = tf.convert_to_tensor(anchors) - return - - def _build_grid_points(self, lwidth, lheight, anchors, dtype): - """Generate a grid of fixed grid edges for box center decoding.""" - with tf.name_scope('center_grid'): - y = tf.range(0, lheight) - x = tf.range(0, lwidth) - num = tf.shape(anchors)[0] - x_left = tf.tile( - tf.transpose(tf.expand_dims(y, axis=-1), perm=[1, 0]), [lwidth, 1]) - y_left = tf.tile(tf.expand_dims(x, axis=-1), [1, lheight]) - x_y = tf.stack([x_left, y_left], axis=-1) - x_y = tf.cast(x_y, dtype=dtype) - x_y = tf.expand_dims( - tf.tile(tf.expand_dims(x_y, axis=-2), [1, 1, num, 1]), axis=0) - return x_y - - def _build_anchor_grid(self, anchors, dtype): - """Get the transformed anchor boxes for each dimention.""" - with tf.name_scope('anchor_grid'): - num = tf.shape(anchors)[0] - anchors = tf.cast(anchors, dtype=dtype) - anchors = tf.reshape(anchors, [1, 1, 1, num, 2]) - return anchors - - def _extend_batch(self, grid, batch_size): - return tf.tile(grid, [batch_size, 1, 1, 1, 1]) - - def __call__(self, width, height, batch_size, dtype=None): - if dtype is None: - self.dtype = tf.keras.backend.floatx() - else: - self.dtype = dtype - grid_points = self._build_grid_points(width, height, self._anchors, - self.dtype) - anchor_grid = self._build_anchor_grid( - tf.cast(self._anchors, self.dtype) / - tf.cast(self._scale_anchors, self.dtype), self.dtype) - - grid_points = self._extend_batch(grid_points, batch_size) - anchor_grid = self._extend_batch(anchor_grid, batch_size) - return grid_points, anchor_grid - - -TILE_SIZE = 50 - - -class PairWiseSearch: - """Apply a pairwise search between the ground truth and the labels. - - The goal is to indicate the locations where the predictions overlap with - ground truth for dynamic ground truth associations. 
- """ - - def __init__(self, - iou_type='iou', - any_match=True, - min_conf=0.0, - track_boxes=False, - track_classes=False): - """Initialization of Pair Wise Search. - - Args: - iou_type: An `str` for the iou type to use. - any_match: A `bool` for any match(no class match). - min_conf: An `int` for minimum confidence threshold. - track_boxes: A `bool` dynamic box assignment. - track_classes: A `bool` dynamic class assignment. - """ - self.iou_type = iou_type - self._any = any_match - self._min_conf = min_conf - self._track_boxes = track_boxes - self._track_classes = track_classes - return - - def box_iou(self, true_box, pred_box): - # based on the type of loss, compute the iou loss for a box - # compute_ indicated the type of iou to use - if self.iou_type == 'giou': - _, iou = box_ops.compute_giou(true_box, pred_box) - elif self.iou_type == 'ciou': - _, iou = box_ops.compute_ciou(true_box, pred_box) - else: - iou = box_ops.compute_iou(true_box, pred_box) - return iou - - def _search_body(self, pred_box, pred_class, boxes, classes, running_boxes, - running_classes, max_iou, idx): - """Main search fn.""" - - # capture the batch size to be used, and gather a slice of - # boxes from the ground truth. 
currently TILE_SIZE = 50, to - # save memory - batch_size = tf.shape(boxes)[0] - box_slice = tf.slice(boxes, [0, idx * TILE_SIZE, 0], - [batch_size, TILE_SIZE, 4]) - - # match the dimentions of the slice to the model predictions - # shape: [batch_size, 1, 1, num, TILE_SIZE, 4] - box_slice = tf.expand_dims(box_slice, axis=1) - box_slice = tf.expand_dims(box_slice, axis=1) - box_slice = tf.expand_dims(box_slice, axis=1) - - box_grid = tf.expand_dims(pred_box, axis=-2) - - # capture the classes - class_slice = tf.slice(classes, [0, idx * TILE_SIZE], - [batch_size, TILE_SIZE]) - class_slice = tf.expand_dims(class_slice, axis=1) - class_slice = tf.expand_dims(class_slice, axis=1) - class_slice = tf.expand_dims(class_slice, axis=1) - - iou = self.box_iou(box_slice, box_grid) - - if self._min_conf > 0.0: - if not self._any: - class_grid = tf.expand_dims(pred_class, axis=-2) - class_mask = tf.one_hot( - tf.cast(class_slice, tf.int32), - depth=tf.shape(pred_class)[-1], - dtype=pred_class.dtype) - class_mask = tf.reduce_any(tf.equal(class_mask, class_grid), axis=-1) - else: - class_mask = tf.reduce_max(pred_class, axis=-1, keepdims=True) - class_mask = tf.cast(class_mask, iou.dtype) - iou *= class_mask - - max_iou_ = tf.concat([max_iou, iou], axis=-1) - max_iou = tf.reduce_max(max_iou_, axis=-1, keepdims=True) - ind = tf.expand_dims(tf.argmax(max_iou_, axis=-1), axis=-1) - - if self._track_boxes: - running_boxes = tf.expand_dims(running_boxes, axis=-2) - box_slice = tf.zeros_like(running_boxes) + box_slice - box_slice = tf.concat([running_boxes, box_slice], axis=-2) - running_boxes = tf.gather_nd(box_slice, ind, batch_dims=4) - - if self._track_classes: - running_classes = tf.expand_dims(running_classes, axis=-1) - class_slice = tf.zeros_like(running_classes) + class_slice - class_slice = tf.concat([running_classes, class_slice], axis=-1) - running_classes = tf.gather_nd(class_slice, ind, batch_dims=4) - - return (pred_box, pred_class, boxes, classes, running_boxes, - 
running_classes, max_iou, idx + 1) - - def __call__(self, - pred_boxes, - pred_classes, - boxes, - classes, - clip_thresh=0.0): - num_boxes = tf.shape(boxes)[-2] - num_tiles = (num_boxes // TILE_SIZE) - 1 - - if self._min_conf > 0.0: - pred_classes = tf.cast(pred_classes > self._min_conf, pred_classes.dtype) - - def _loop_cond(unused_pred_box, unused_pred_class, boxes, unused_classes, - unused_running_boxes, unused_running_classes, unused_max_iou, - idx): - - # check that the slice has boxes that all zeros - batch_size = tf.shape(boxes)[0] - box_slice = tf.slice(boxes, [0, idx * TILE_SIZE, 0], - [batch_size, TILE_SIZE, 4]) - - return tf.logical_and(idx < num_tiles, - tf.math.greater(tf.reduce_sum(box_slice), 0)) - - running_boxes = tf.zeros_like(pred_boxes) - running_classes = tf.zeros_like(tf.reduce_sum(running_boxes, axis=-1)) - max_iou = tf.zeros_like(tf.reduce_sum(running_boxes, axis=-1)) - max_iou = tf.expand_dims(max_iou, axis=-1) - - (pred_boxes, pred_classes, boxes, classes, running_boxes, running_classes, - max_iou, _) = tf.while_loop(_loop_cond, self._search_body, [ - pred_boxes, pred_classes, boxes, classes, running_boxes, - running_classes, max_iou, - tf.constant(0) - ]) - - mask = tf.cast(max_iou > clip_thresh, running_boxes.dtype) - running_boxes *= mask - running_classes *= tf.squeeze(mask, axis=-1) - max_iou *= mask - max_iou = tf.squeeze(max_iou, axis=-1) - mask = tf.squeeze(mask, axis=-1) - - return (tf.stop_gradient(running_boxes), tf.stop_gradient(running_classes), - tf.stop_gradient(max_iou), tf.stop_gradient(mask)) - - -def average_iou(iou): - """Computes the average intersection over union without counting locations. - - where the iou is zero. - - Args: - iou: A `Tensor` representing the iou values. - - Returns: - tf.stop_gradient(avg_iou): A `Tensor` representing average - intersection over union. 
- """ - iou_sum = tf.reduce_sum(iou, axis=tf.range(1, tf.shape(tf.shape(iou))[0])) - counts = tf.cast( - tf.math.count_nonzero(iou, axis=tf.range(1, - tf.shape(tf.shape(iou))[0])), - iou.dtype) - avg_iou = tf.reduce_mean(math_ops.divide_no_nan(iou_sum, counts)) - return tf.stop_gradient(avg_iou) - - -def _scale_boxes(encoded_boxes, width, height, anchor_grid, grid_points, - scale_xy): - """Decodes models boxes applying and exponential to width and height maps.""" - # split the boxes - pred_xy = encoded_boxes[..., 0:2] - pred_wh = encoded_boxes[..., 2:4] - - # build a scaling tensor to get the offset of th ebox relative to the image - scaler = tf.convert_to_tensor([height, width, height, width]) - scale_xy = tf.cast(scale_xy, encoded_boxes.dtype) - - # apply the sigmoid - pred_xy = tf.math.sigmoid(pred_xy) - - # scale the centers and find the offset of each box relative to - # their center pixel - pred_xy = pred_xy * scale_xy - 0.5 * (scale_xy - 1) - - # scale the offsets and add them to the grid points or a tensor that is - # the realtive location of each pixel - box_xy = grid_points + pred_xy - - # scale the width and height of the predictions and corlate them - # to anchor boxes - box_wh = tf.math.exp(pred_wh) * anchor_grid - - # build the final predicted box - scaled_box = tf.concat([box_xy, box_wh], axis=-1) - pred_box = scaled_box / scaler - - # shift scaled boxes - scaled_box = tf.concat([pred_xy, box_wh], axis=-1) - return (scaler, scaled_box, pred_box) - - -@tf.custom_gradient -def _darknet_boxes(encoded_boxes, width, height, anchor_grid, grid_points, - max_delta, scale_xy): - """Wrapper for _scale_boxes to implement a custom gradient.""" - (scaler, scaled_box, pred_box) = _scale_boxes(encoded_boxes, width, height, - anchor_grid, grid_points, - scale_xy) - - def delta(unused_dy_scaler, dy_scaled, dy): - dy_xy, dy_wh = tf.split(dy, 2, axis=-1) - dy_xy_, dy_wh_ = tf.split(dy_scaled, 2, axis=-1) - - # add all the gradients that may have been applied to the - # 
boxes and those that have been applied to the width and height - dy_wh += dy_wh_ - dy_xy += dy_xy_ - - # propagate the exponential applied to the width and height in - # order to ensure the gradient propagated is of the correct - # magnitude - pred_wh = encoded_boxes[..., 2:4] - dy_wh *= tf.math.exp(pred_wh) - - dbox = tf.concat([dy_xy, dy_wh], axis=-1) - - # apply the gradient clipping to xy and wh - dbox = math_ops.rm_nan_inf(dbox) - delta = tf.cast(max_delta, dbox.dtype) - dbox = tf.clip_by_value(dbox, -delta, delta) - return dbox, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 - - return (scaler, scaled_box, pred_box), delta - - -def _new_coord_scale_boxes(encoded_boxes, width, height, anchor_grid, - grid_points, scale_xy): - """Decodes models boxes by squaring and scaling the width and height maps.""" - # split the boxes - pred_xy = encoded_boxes[..., 0:2] - pred_wh = encoded_boxes[..., 2:4] - - # build a scaling tensor to get the offset of th ebox relative to the image - scaler = tf.convert_to_tensor([height, width, height, width]) - scale_xy = tf.cast(scale_xy, pred_xy.dtype) - - # apply the sigmoid - pred_xy = tf.math.sigmoid(pred_xy) - pred_wh = tf.math.sigmoid(pred_wh) - - # scale the xy offset predictions according to the config - pred_xy = pred_xy * scale_xy - 0.5 * (scale_xy - 1) - - # find the true offset from the grid points and the scaler - # where the grid points are the relative offset of each pixel with - # in the image - box_xy = grid_points + pred_xy - - # decode the widht and height of the boxes and correlate them - # to the anchor boxes - box_wh = (2 * pred_wh)**2 * anchor_grid - - # build the final boxes - scaled_box = tf.concat([box_xy, box_wh], axis=-1) - pred_box = scaled_box / scaler - - # shift scaled boxes - scaled_box = tf.concat([pred_xy, box_wh], axis=-1) - return (scaler, scaled_box, pred_box) - - -@tf.custom_gradient -def _darknet_new_coord_boxes(encoded_boxes, width, height, anchor_grid, - grid_points, max_delta, scale_xy): - """Wrapper for 
_new_coord_scale_boxes to implement a custom gradient.""" - (scaler, scaled_box, - pred_box) = _new_coord_scale_boxes(encoded_boxes, width, height, anchor_grid, - grid_points, scale_xy) - - def delta(unused_dy_scaler, dy_scaled, dy): - dy_xy, dy_wh = tf.split(dy, 2, axis=-1) - dy_xy_, dy_wh_ = tf.split(dy_scaled, 2, axis=-1) - - # add all the gradients that may have been applied to the - # boxes and those that have been applied to the width and height - dy_wh += dy_wh_ - dy_xy += dy_xy_ - - dbox = tf.concat([dy_xy, dy_wh], axis=-1) - - # apply the gradient clipping to xy and wh - dbox = math_ops.rm_nan_inf(dbox) - delta = tf.cast(max_delta, dbox.dtype) - dbox = tf.clip_by_value(dbox, -delta, delta) - return dbox, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 - - return (scaler, scaled_box, pred_box), delta - - -def _anchor_free_scale_boxes(encoded_boxes, - width, - height, - stride, - grid_points, - darknet=False): - """Decode models boxes using FPN stride under anchor free conditions.""" - del darknet - # split the boxes - pred_xy = encoded_boxes[..., 0:2] - pred_wh = encoded_boxes[..., 2:4] - - # build a scaling tensor to get the offset of th ebox relative to the image - scaler = tf.convert_to_tensor([height, width, height, width]) - - # scale the offsets and add them to the grid points or a tensor that is - # the realtive location of each pixel - box_xy = (grid_points + pred_xy) - - # scale the width and height of the predictions and corlate them - # to anchor boxes - box_wh = tf.math.exp(pred_wh) - - # build the final predicted box - scaled_box = tf.concat([box_xy, box_wh], axis=-1) - - # properly scaling boxes gradeints - scaled_box = scaled_box * tf.cast(stride, scaled_box.dtype) - pred_box = scaled_box / tf.cast(scaler * stride, scaled_box.dtype) - return (scaler, scaled_box, pred_box) - - -def get_predicted_box(width, - height, - encoded_boxes, - anchor_grid, - grid_points, - scale_xy, - stride, - darknet=False, - box_type='original', - max_delta=np.inf): - """Decodes the 
predicted boxes from the model format to a usable format. - - This function decodes the model outputs into the [x, y, w, h] format for - use in the loss function as well as for use within the detection generator. - - Args: - width: A `float` scalar indicating the width of the prediction layer. - height: A `float` scalar indicating the height of the prediction layer - encoded_boxes: A `Tensor` of shape [..., height, width, 4] holding encoded - boxes. - anchor_grid: A `Tensor` of shape [..., 1, 1, 2] holding the anchor boxes - organized for box decoding, box width and height. - grid_points: A `Tensor` of shape [..., height, width, 2] holding the anchor - boxes for decoding the box centers. - scale_xy: A `float` scaler used to indicate the range for each center - outside of its given [..., i, j, 4] index, where i and j are indexing - pixels along the width and height of the predicted output map. - stride: An `int` defining the amount of down stride realtive to the input - image. - darknet: A `bool` used to select between custom gradient and default - autograd. - box_type: An `str` indicating the type of box encoding that is being used. - max_delta: A `float` scaler used for gradient clipping in back propagation. - - Returns: - scaler: A `Tensor` of shape [4] returned to allow the scaling of the ground - truth boxes to be of the same magnitude as the decoded predicted boxes. - scaled_box: A `Tensor` of shape [..., height, width, 4] with the predicted - boxes. - pred_box: A `Tensor` of shape [..., height, width, 4] with the predicted - boxes divided by the scaler parameter used to put all boxes in the [0, 1] - range. 
- """ - if box_type == 'anchor_free': - (scaler, scaled_box, pred_box) = _anchor_free_scale_boxes( - encoded_boxes, width, height, stride, grid_points, darknet=darknet) - elif darknet: - - # pylint:disable=unbalanced-tuple-unpacking - # if we are using the darknet loss we shoud nto propagate the - # decoding of the box - if box_type == 'scaled': - (scaler, scaled_box, - pred_box) = _darknet_new_coord_boxes(encoded_boxes, width, height, - anchor_grid, grid_points, max_delta, - scale_xy) - else: - (scaler, scaled_box, - pred_box) = _darknet_boxes(encoded_boxes, width, height, anchor_grid, - grid_points, max_delta, scale_xy) - else: - # if we are using the scaled loss we should propagate the decoding of - # the boxes - if box_type == 'scaled': - (scaler, scaled_box, - pred_box) = _new_coord_scale_boxes(encoded_boxes, width, height, - anchor_grid, grid_points, scale_xy) - else: - (scaler, scaled_box, pred_box) = _scale_boxes(encoded_boxes, width, - height, anchor_grid, - grid_points, scale_xy) - - return (scaler, scaled_box, pred_box) diff --git a/official/vision/beta/projects/yolo/optimization/__init__.py b/official/vision/beta/projects/yolo/optimization/__init__.py deleted file mode 100755 index 6ff51c806488ec0b9672d9d98a3ba8af164da66b..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/optimization/__init__.py +++ /dev/null @@ -1,22 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Optimization package definition.""" - -# pylint: disable=wildcard-import -from official.modeling.optimization.configs.learning_rate_config import * -from official.modeling.optimization.ema_optimizer import ExponentialMovingAverage -from official.vision.beta.projects.yolo.optimization.configs.optimization_config import * -from official.vision.beta.projects.yolo.optimization.configs.optimizer_config import * -from official.vision.beta.projects.yolo.optimization.optimizer_factory import OptimizerFactory as YoloOptimizerFactory diff --git a/official/vision/beta/projects/yolo/optimization/configs/__init__.py b/official/vision/beta/projects/yolo/optimization/configs/__init__.py deleted file mode 100755 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/optimization/configs/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- diff --git a/official/vision/beta/projects/yolo/optimization/optimizer_factory.py b/official/vision/beta/projects/yolo/optimization/optimizer_factory.py deleted file mode 100755 index b2126d16bc24fd91c1faf2624bd414bd317e8e04..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/optimization/optimizer_factory.py +++ /dev/null @@ -1,99 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Optimizer factory class.""" - -import gin - -from official.modeling.optimization import ema_optimizer -from official.modeling.optimization import optimizer_factory -from official.vision.beta.projects.yolo.optimization import sgd_torch - -optimizer_factory.OPTIMIZERS_CLS.update({ - 'sgd_torch': sgd_torch.SGDTorch, -}) - -OPTIMIZERS_CLS = optimizer_factory.OPTIMIZERS_CLS -LR_CLS = optimizer_factory.LR_CLS -WARMUP_CLS = optimizer_factory.WARMUP_CLS - - -class OptimizerFactory(optimizer_factory.OptimizerFactory): - """Optimizer factory class. - - This class builds learning rate and optimizer based on an optimization config. - To use this class, you need to do the following: - (1) Define optimization config, this includes optimizer, and learning rate - schedule. - (2) Initialize the class using the optimization config. - (3) Build learning rate. - (4) Build optimizer. 
- - This is a typical example for using this class: - params = { - 'optimizer': { - 'type': 'sgd', - 'sgd': {'momentum': 0.9} - }, - 'learning_rate': { - 'type': 'stepwise', - 'stepwise': {'boundaries': [10000, 20000], - 'values': [0.1, 0.01, 0.001]} - }, - 'warmup': { - 'type': 'linear', - 'linear': {'warmup_steps': 500, 'warmup_learning_rate': 0.01} - } - } - opt_config = OptimizationConfig(params) - opt_factory = OptimizerFactory(opt_config) - lr = opt_factory.build_learning_rate() - optimizer = opt_factory.build_optimizer(lr) - """ - - def get_bias_lr_schedule(self, bias_lr): - """Build learning rate. - - Builds learning rate from config. Learning rate schedule is built according - to the learning rate config. If learning rate type is consant, - lr_config.learning_rate is returned. - - Args: - bias_lr: learning rate config. - - Returns: - tf.keras.optimizers.schedules.LearningRateSchedule instance. If - learning rate type is consant, lr_config.learning_rate is returned. - """ - if self._lr_type == 'constant': - lr = self._lr_config.learning_rate - else: - lr = LR_CLS[self._lr_type](**self._lr_config.as_dict()) - - if self._warmup_config: - if self._warmup_type != 'linear': - raise ValueError('Smart Bias is only supported currently with a' - 'linear warm up.') - warm_up_cfg = self._warmup_config.as_dict() - warm_up_cfg['warmup_learning_rate'] = bias_lr - lr = WARMUP_CLS['linear'](lr, **warm_up_cfg) - return lr - - @gin.configurable - def add_ema(self, optimizer): - """Add EMA to the optimizer independently of the build optimizer method.""" - if self._use_ema: - optimizer = ema_optimizer.ExponentialMovingAverage( - optimizer, **self._ema_config.as_dict()) - return optimizer diff --git a/official/vision/beta/projects/yolo/tasks/image_classification.py b/official/vision/beta/projects/yolo/tasks/image_classification.py deleted file mode 100644 index 4edef631fce8744cafff901734c340fcc82cfe66..0000000000000000000000000000000000000000 --- 
a/official/vision/beta/projects/yolo/tasks/image_classification.py +++ /dev/null @@ -1,65 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Image classification task definition.""" -from official.common import dataset_fn -from official.core import task_factory -from official.vision.beta.dataloaders import classification_input as classification_input_base -from official.vision.beta.dataloaders import input_reader_factory -from official.vision.beta.dataloaders import tfds_factory -from official.vision.beta.projects.yolo.configs import darknet_classification as exp_cfg -from official.vision.beta.projects.yolo.dataloaders import classification_input -from official.vision.beta.tasks import image_classification - - -@task_factory.register_task_cls(exp_cfg.ImageClassificationTask) -class ImageClassificationTask(image_classification.ImageClassificationTask): - """A task for image classification.""" - - def build_inputs(self, params, input_context=None): - """Builds classification input.""" - - num_classes = self.task_config.model.num_classes - input_size = self.task_config.model.input_size - image_field_key = self.task_config.train_data.image_field_key - label_field_key = self.task_config.train_data.label_field_key - is_multilabel = self.task_config.train_data.is_multilabel - - if params.tfds_name: - decoder = tfds_factory.get_classification_decoder(params.tfds_name) - else: - decoder = 
classification_input_base.Decoder( - image_field_key=image_field_key, - label_field_key=label_field_key, - is_multilabel=is_multilabel) - - parser = classification_input.Parser( - output_size=input_size[:2], - num_classes=num_classes, - image_field_key=image_field_key, - label_field_key=label_field_key, - decode_jpeg_only=params.decode_jpeg_only, - aug_rand_hflip=params.aug_rand_hflip, - aug_type=params.aug_type, - is_multilabel=is_multilabel, - dtype=params.dtype) - - reader = input_reader_factory.input_reader_generator( - params, - dataset_fn=dataset_fn.pick_dataset_fn(params.file_type), - decoder_fn=decoder.decode, - parser_fn=parser.parse_fn(params.is_training)) - - dataset = reader.read(input_context=input_context) - return dataset diff --git a/official/vision/beta/projects/yolo/tasks/yolo.py b/official/vision/beta/projects/yolo/tasks/yolo.py deleted file mode 100755 index 3539f17f170a8395575017b5d9fb46ab49cadd19..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/tasks/yolo.py +++ /dev/null @@ -1,407 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Contains classes used to train Yolo.""" - -import collections -from typing import Optional - -from absl import logging -import tensorflow as tf - -from official.core import base_task -from official.core import config_definitions -from official.core import input_reader -from official.core import task_factory -from official.modeling import performance -from official.vision.beta.dataloaders import tfds_factory -from official.vision.beta.dataloaders import tf_example_label_map_decoder -from official.vision.beta.evaluation import coco_evaluator -from official.vision.beta.ops import box_ops -from official.vision.beta.projects.yolo import optimization -from official.vision.beta.projects.yolo.configs import yolo as exp_cfg -from official.vision.beta.projects.yolo.dataloaders import tf_example_decoder -from official.vision.beta.projects.yolo.dataloaders import yolo_input -from official.vision.beta.projects.yolo.modeling import factory -from official.vision.beta.projects.yolo.ops import mosaic -from official.vision.beta.projects.yolo.ops import preprocessing_ops -from official.vision.beta.projects.yolo.tasks import task_utils - -OptimizationConfig = optimization.OptimizationConfig -RuntimeConfig = config_definitions.RuntimeConfig - - -@task_factory.register_task_cls(exp_cfg.YoloTask) -class YoloTask(base_task.Task): - """A single-replica view of training procedure. - - YOLO task provides artifacts for training/evalution procedures, including - loading/iterating over Datasets, initializing the model, calculating the loss, - post-processing, and customized metrics with reduction. 
- """ - - def __init__(self, params, logging_dir: Optional[str] = None): - super().__init__(params, logging_dir) - self.coco_metric = None - self._loss_fn = None - self._model = None - self._coco_91_to_80 = False - self._metrics = [] - - # globally set the random seed - preprocessing_ops.set_random_seeds(seed=params.seed) - return - - def build_model(self): - """Build an instance of Yolo.""" - - model_base_cfg = self.task_config.model - l2_weight_decay = self.task_config.weight_decay / 2.0 - - input_size = model_base_cfg.input_size.copy() - input_specs = tf.keras.layers.InputSpec(shape=[None] + input_size) - l2_regularizer = ( - tf.keras.regularizers.l2(l2_weight_decay) if l2_weight_decay else None) - model, losses = factory.build_yolo( - input_specs, model_base_cfg, l2_regularizer) - - # save for later usage within the task. - self._loss_fn = losses - self._model = model - return model - - def _get_data_decoder(self, params): - """Get a decoder object to decode the dataset.""" - if params.tfds_name: - decoder = tfds_factory.get_detection_decoder(params.tfds_name) - else: - decoder_cfg = params.decoder.get() - if params.decoder.type == 'simple_decoder': - self._coco_91_to_80 = decoder_cfg.coco91_to_80 - decoder = tf_example_decoder.TfExampleDecoder( - coco91_to_80=decoder_cfg.coco91_to_80, - regenerate_source_id=decoder_cfg.regenerate_source_id) - elif params.decoder.type == 'label_map_decoder': - decoder = tf_example_label_map_decoder.TfExampleDecoderLabelMap( - label_map=decoder_cfg.label_map, - regenerate_source_id=decoder_cfg.regenerate_source_id) - else: - raise ValueError('Unknown decoder type: {}!'.format( - params.decoder.type)) - return decoder - - def build_inputs(self, params, input_context=None): - """Build input dataset.""" - model = self.task_config.model - - # get anchor boxes dict based on models min and max level - backbone = model.backbone.get() - anchor_dict, level_limits = model.anchor_boxes.get(backbone.min_level, - backbone.max_level) - - 
params.seed = self.task_config.seed - # set shared patamters between mosaic and yolo_input - base_config = dict( - letter_box=params.parser.letter_box, - aug_rand_translate=params.parser.aug_rand_translate, - aug_rand_angle=params.parser.aug_rand_angle, - aug_rand_perspective=params.parser.aug_rand_perspective, - area_thresh=params.parser.area_thresh, - random_flip=params.parser.random_flip, - seed=params.seed, - ) - - # get the decoder - decoder = self._get_data_decoder(params) - - # init Mosaic - sample_fn = mosaic.Mosaic( - output_size=model.input_size, - mosaic_frequency=params.parser.mosaic.mosaic_frequency, - mixup_frequency=params.parser.mosaic.mixup_frequency, - jitter=params.parser.mosaic.jitter, - mosaic_center=params.parser.mosaic.mosaic_center, - mosaic_crop_mode=params.parser.mosaic.mosaic_crop_mode, - aug_scale_min=params.parser.mosaic.aug_scale_min, - aug_scale_max=params.parser.mosaic.aug_scale_max, - **base_config) - - # init Parser - parser = yolo_input.Parser( - output_size=model.input_size, - anchors=anchor_dict, - use_tie_breaker=params.parser.use_tie_breaker, - jitter=params.parser.jitter, - aug_scale_min=params.parser.aug_scale_min, - aug_scale_max=params.parser.aug_scale_max, - aug_rand_hue=params.parser.aug_rand_hue, - aug_rand_saturation=params.parser.aug_rand_saturation, - aug_rand_brightness=params.parser.aug_rand_brightness, - max_num_instances=params.parser.max_num_instances, - scale_xy=model.detection_generator.scale_xy.get(), - expanded_strides=model.detection_generator.path_scales.get(), - darknet=model.darknet_based_model, - best_match_only=params.parser.best_match_only, - anchor_t=params.parser.anchor_thresh, - random_pad=params.parser.random_pad, - level_limits=level_limits, - dtype=params.dtype, - **base_config) - - # init the dataset reader - reader = input_reader.InputReader( - params, - dataset_fn=tf.data.TFRecordDataset, - decoder_fn=decoder.decode, - sample_fn=sample_fn.mosaic_fn(is_training=params.is_training), - 
parser_fn=parser.parse_fn(params.is_training)) - dataset = reader.read(input_context=input_context) - return dataset - - def build_metrics(self, training=True): - """Build detection metrics.""" - metrics = [] - - backbone = self.task_config.model.backbone.get() - metric_names = collections.defaultdict(list) - for key in range(backbone.min_level, backbone.max_level + 1): - key = str(key) - metric_names[key].append('loss') - metric_names[key].append('avg_iou') - metric_names[key].append('avg_obj') - - metric_names['net'].append('box') - metric_names['net'].append('class') - metric_names['net'].append('conf') - - for _, key in enumerate(metric_names.keys()): - metrics.append(task_utils.ListMetrics(metric_names[key], name=key)) - - self._metrics = metrics - if not training: - annotation_file = self.task_config.annotation_file - if self._coco_91_to_80: - annotation_file = None - self.coco_metric = coco_evaluator.COCOEvaluator( - annotation_file=annotation_file, - include_mask=False, - need_rescale_bboxes=False, - per_category_metrics=self._task_config.per_category_metrics) - - return metrics - - def build_losses(self, outputs, labels, aux_losses=None): - """Build YOLO losses.""" - return self._loss_fn(labels, outputs) - - def train_step(self, inputs, model, optimizer, metrics=None): - """Train Step. - - Forward step and backwards propagate the model. - - Args: - inputs: a dictionary of input tensors. - model: the model, forward pass definition. - optimizer: the optimizer for this training step. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. 
- """ - image, label = inputs - - with tf.GradientTape(persistent=False) as tape: - # Compute a prediction - y_pred = model(image, training=True) - - # Cast to float32 for gradient computation - y_pred = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), y_pred) - - # Get the total loss - (scaled_loss, metric_loss, - loss_metrics) = self.build_losses(y_pred['raw_output'], label) - - # Scale the loss for numerical stability - if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - scaled_loss = optimizer.get_scaled_loss(scaled_loss) - - # Compute the gradient - train_vars = model.trainable_variables - gradients = tape.gradient(scaled_loss, train_vars) - - # Unscale the gradients if we are using the loss scale optimizer on fp16 - if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - gradients = optimizer.get_unscaled_gradients(gradients) - - # Apply gradients to the model - optimizer.apply_gradients(zip(gradients, train_vars)) - logs = {self.loss: metric_loss} - - # Compute all metrics - if metrics: - for m in metrics: - m.update_state(loss_metrics[m.name]) - logs.update({m.name: m.result()}) - return logs - - def _reorg_boxes(self, boxes, info, num_detections): - """Scale and clean boxes prior to evaluation.""" - mask = tf.sequence_mask(num_detections, maxlen=tf.shape(boxes)[1]) - mask = tf.cast(tf.expand_dims(mask, axis=-1), boxes.dtype) - - # Denormalize the boxes by the shape of the image - inshape = tf.expand_dims(info[:, 1, :], axis=1) - ogshape = tf.expand_dims(info[:, 0, :], axis=1) - scale = tf.expand_dims(info[:, 2, :], axis=1) - offset = tf.expand_dims(info[:, 3, :], axis=1) - - boxes = box_ops.denormalize_boxes(boxes, inshape) - boxes = box_ops.clip_boxes(boxes, inshape) - boxes += tf.tile(offset, [1, 1, 2]) - boxes /= tf.tile(scale, [1, 1, 2]) - boxes = box_ops.clip_boxes(boxes, ogshape) - - # Set boxes beyond num_detections to -1 - boxes *= mask - boxes += (mask - 1) - return boxes - - def validation_step(self, 
inputs, model,
metrics=None): - """Validation step. - - Args: - inputs: a dictionary of input tensors. - model: the keras.Model. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. - """ - image, label = inputs - - # Step the model once - y_pred = model(image, training=False) - y_pred = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), y_pred) - (_, metric_loss, loss_metrics) = self.build_losses(y_pred['raw_output'], - label) - logs = {self.loss: metric_loss} - - # Reorganize and rescale the boxes - info = label['groundtruths']['image_info'] - boxes = self._reorg_boxes(y_pred['bbox'], info, y_pred['num_detections']) - - # Build the input for the COCO evaluation metric - coco_model_outputs = { - 'detection_boxes': boxes, - 'detection_scores': y_pred['confidence'], - 'detection_classes': y_pred['classes'], - 'num_detections': y_pred['num_detections'], - 'source_id': label['groundtruths']['source_id'], - 'image_info': label['groundtruths']['image_info'] - } - - # Compute all metrics - if metrics: - logs.update( - {self.coco_metric.name: (label['groundtruths'], coco_model_outputs)}) - for m in metrics: - m.update_state(loss_metrics[m.name]) - logs.update({m.name: m.result()}) - return logs - - def aggregate_logs(self, state=None, step_outputs=None): - """Get Metric Results.""" - if not state: - self.coco_metric.reset_states() - state = self.coco_metric - self.coco_metric.update_state(step_outputs[self.coco_metric.name][0], - step_outputs[self.coco_metric.name][1]) - return state - - def reduce_aggregated_logs(self, aggregated_logs, global_step=None): - """Reduce logs and remove unneeded items.
Update with COCO results.""" - res = self.coco_metric.result() - return res - - def initialize(self, model: tf.keras.Model): - """Loads pretrained checkpoint.""" - - if not self.task_config.init_checkpoint: - logging.info('Training from Scratch.') - return - - ckpt_dir_or_file = self.task_config.init_checkpoint - if tf.io.gfile.isdir(ckpt_dir_or_file): - ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) - - # Restoring checkpoint. - if self.task_config.init_checkpoint_modules == 'all': - ckpt = tf.train.Checkpoint(**model.checkpoint_items) - status = ckpt.read(ckpt_dir_or_file) - status.expect_partial().assert_existing_objects_matched() - else: - ckpt_items = {} - if 'backbone' in self.task_config.init_checkpoint_modules: - ckpt_items.update(backbone=model.backbone) - if 'decoder' in self.task_config.init_checkpoint_modules: - ckpt_items.update(decoder=model.decoder) - - ckpt = tf.train.Checkpoint(**ckpt_items) - status = ckpt.read(ckpt_dir_or_file) - status.expect_partial().assert_existing_objects_matched() - - logging.info('Finished loading pretrained checkpoint from %s', - ckpt_dir_or_file) - - def create_optimizer(self, - optimizer_config: OptimizationConfig, - runtime_config: Optional[RuntimeConfig] = None): - """Creates a TF optimizer from configurations. - - Args: - optimizer_config: the parameters of the Optimization settings. - runtime_config: the parameters of the runtime. - - Returns: - A tf.optimizers.Optimizer object.
- """ - opt_factory = optimization.YoloOptimizerFactory(optimizer_config) - # pylint: disable=protected-access - ema = opt_factory._use_ema - opt_factory._use_ema = False - - opt_type = opt_factory._optimizer_type - if opt_type == 'sgd_torch': - optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) - optimizer.set_bias_lr( - opt_factory.get_bias_lr_schedule(self._task_config.smart_bias_lr)) - optimizer.search_and_set_variable_groups(self._model.trainable_variables) - else: - optimizer = opt_factory.build_optimizer(opt_factory.build_learning_rate()) - opt_factory._use_ema = ema - - if ema: - logging.info('EMA is enabled.') - optimizer = opt_factory.add_ema(optimizer) - - # pylint: enable=protected-access - - if runtime_config and runtime_config.loss_scale: - use_float16 = runtime_config.mixed_precision_dtype == 'float16' - optimizer = performance.configure_optimizer( - optimizer, - use_float16=use_float16, - loss_scale=runtime_config.loss_scale) - - return optimizer diff --git a/official/vision/beta/projects/yolo/train.py b/official/vision/beta/projects/yolo/train.py deleted file mode 100644 index 78ee1ac32ae6df2ae3a82fb30bd2d4ef94c7ba91..0000000000000000000000000000000000000000 --- a/official/vision/beta/projects/yolo/train.py +++ /dev/null @@ -1,29 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""TensorFlow Model Garden Vision training driver.""" - -from absl import app -from absl import flags - -from official.common import flags as tfm_flags -from official.vision.beta import train -from official.vision.beta.projects.yolo.common import registry_imports # pylint: disable=unused-import - -FLAGS = flags.FLAGS - - -if __name__ == '__main__': - tfm_flags.define_flags() - app.run(train.main) diff --git a/official/vision/beta/serving/__init__.py b/official/vision/beta/serving/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/beta/serving/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/beta/serving/detection.py b/official/vision/beta/serving/detection.py deleted file mode 100644 index 749e6a3196cc1f10a16ecb6d1ca88ef301d37b53..0000000000000000000000000000000000000000 --- a/official/vision/beta/serving/detection.py +++ /dev/null @@ -1,206 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Detection input and model functions for serving/inference.""" - -from typing import Mapping, Text -import tensorflow as tf - -from official.vision.beta import configs -from official.vision.beta.modeling import factory -from official.vision.beta.ops import anchor -from official.vision.beta.ops import box_ops -from official.vision.beta.ops import preprocess_ops -from official.vision.beta.serving import export_base - - -MEAN_RGB = (0.485 * 255, 0.456 * 255, 0.406 * 255) -STDDEV_RGB = (0.229 * 255, 0.224 * 255, 0.225 * 255) - - -class DetectionModule(export_base.ExportModule): - """Detection Module.""" - - def _build_model(self): - - if self._batch_size is None: - raise ValueError('batch_size cannot be None for detection models.') - input_specs = tf.keras.layers.InputSpec(shape=[self._batch_size] + - self._input_image_size + [3]) - - if isinstance(self.params.task.model, configs.maskrcnn.MaskRCNN): - model = factory.build_maskrcnn( - input_specs=input_specs, model_config=self.params.task.model) - elif isinstance(self.params.task.model, configs.retinanet.RetinaNet): - model = factory.build_retinanet( - input_specs=input_specs, model_config=self.params.task.model) - else: - raise ValueError('Detection module not implemented for {} model.'.format( - type(self.params.task.model))) - - return model - - def _build_anchor_boxes(self): - """Builds and returns anchor boxes.""" - model_params = self.params.task.model - input_anchor = anchor.build_anchor_generator( - min_level=model_params.min_level, - max_level=model_params.max_level, - 
num_scales=model_params.anchor.num_scales, - aspect_ratios=model_params.anchor.aspect_ratios, - anchor_size=model_params.anchor.anchor_size) - return input_anchor( - image_size=(self._input_image_size[0], self._input_image_size[1])) - - def _build_inputs(self, image): - """Builds detection model inputs for serving.""" - model_params = self.params.task.model - # Normalizes image with mean and std pixel values. - image = preprocess_ops.normalize_image(image, - offset=MEAN_RGB, - scale=STDDEV_RGB) - - image, image_info = preprocess_ops.resize_and_crop_image( - image, - self._input_image_size, - padded_size=preprocess_ops.compute_padded_size( - self._input_image_size, 2**model_params.max_level), - aug_scale_min=1.0, - aug_scale_max=1.0) - anchor_boxes = self._build_anchor_boxes() - - return image, anchor_boxes, image_info - - def preprocess(self, images: tf.Tensor) -> ( - tf.Tensor, Mapping[Text, tf.Tensor], tf.Tensor): - """Preprocess inputs to be suitable for the model. - - Args: - images: The images tensor. - Returns: - images: The images tensor cast to float. - anchor_boxes: Dict mapping anchor levels to anchor boxes. - image_info: Tensor containing the details of the image resizing. - - """ - model_params = self.params.task.model - with tf.device('cpu:0'): - images = tf.cast(images, dtype=tf.float32) - - # Tensor Specs for map_fn outputs (images, anchor_boxes, and image_info). 
- images_spec = tf.TensorSpec(shape=self._input_image_size + [3], - dtype=tf.float32) - - num_anchors = model_params.anchor.num_scales * len( - model_params.anchor.aspect_ratios) * 4 - anchor_shapes = [] - for level in range(model_params.min_level, model_params.max_level + 1): - anchor_level_spec = tf.TensorSpec( - shape=[ - self._input_image_size[0] // 2**level, - self._input_image_size[1] // 2**level, num_anchors - ], - dtype=tf.float32) - anchor_shapes.append((str(level), anchor_level_spec)) - - image_info_spec = tf.TensorSpec(shape=[4, 2], dtype=tf.float32) - - images, anchor_boxes, image_info = tf.nest.map_structure( - tf.identity, - tf.map_fn( - self._build_inputs, - elems=images, - fn_output_signature=(images_spec, dict(anchor_shapes), - image_info_spec), - parallel_iterations=32)) - - return images, anchor_boxes, image_info - - def serve(self, images: tf.Tensor): - """Cast image to float and run inference. - - Args: - images: uint8 Tensor of shape [batch_size, None, None, 3] - Returns: - Tensor holding detection output logits. - """ - - # Skip image preprocessing when input_type is tflite so it is compatible - # with TFLite quantization. - if self._input_type != 'tflite': - images, anchor_boxes, image_info = self.preprocess(images) - else: - with tf.device('cpu:0'): - anchor_boxes = self._build_anchor_boxes() - # image_info is a 3D tensor of shape [batch_size, 4, 2]. It is in the - # format of [[original_height, original_width], - # [desired_height, desired_width], [y_scale, x_scale], - # [y_offset, x_offset]]. When input_type is tflite, input image is - # supposed to be preprocessed already. - image_info = tf.convert_to_tensor([[ - self._input_image_size, self._input_image_size, [1.0, 1.0], [0, 0] - ]], - dtype=tf.float32) - input_image_shape = image_info[:, 1, :] - - # To overcome keras.Model extra limitation to save a model with layers that - # have multiple inputs, we use `model.call` here to trigger the forward - # path. 
Note that this disables some Keras magic that happens in `__call__`. - detections = self.model.call( - images=images, - image_shape=input_image_shape, - anchor_boxes=anchor_boxes, - training=False) - - if self.params.task.model.detection_generator.apply_nms: - # For RetinaNet model, apply export_config. - # TODO(huizhongc): Add export_config to fasterrcnn and maskrcnn as needed. - if isinstance(self.params.task.model, configs.retinanet.RetinaNet): - export_config = self.params.task.export_config - # Normalize detection box coordinates to [0, 1]. - if export_config.output_normalized_coordinates: - detection_boxes = ( - detections['detection_boxes'] / - tf.tile(image_info[:, 2:3, :], [1, 1, 2])) - detections['detection_boxes'] = box_ops.normalize_boxes( - detection_boxes, image_info[:, 0:1, :]) - - # Cast num_detections and detection_classes to float. This allows the - # model inference to work on chain (go/chain) as chain requires floating - # point outputs. - if export_config.cast_num_detections_to_float: - detections['num_detections'] = tf.cast( - detections['num_detections'], dtype=tf.float32) - if export_config.cast_detection_classes_to_float: - detections['detection_classes'] = tf.cast( - detections['detection_classes'], dtype=tf.float32) - - final_outputs = { - 'detection_boxes': detections['detection_boxes'], - 'detection_scores': detections['detection_scores'], - 'detection_classes': detections['detection_classes'], - 'num_detections': detections['num_detections'] - } - else: - final_outputs = { - 'decoded_boxes': detections['decoded_boxes'], - 'decoded_box_scores': detections['decoded_box_scores'] - } - - if 'detection_masks' in detections.keys(): - final_outputs['detection_masks'] = detections['detection_masks'] - - final_outputs.update({'image_info': image_info}) - return final_outputs diff --git a/official/vision/beta/serving/detection_test.py b/official/vision/beta/serving/detection_test.py deleted file mode 100644 index
f2958b08b50d45626a658a9d39036727e5ca630e..0000000000000000000000000000000000000000 --- a/official/vision/beta/serving/detection_test.py +++ /dev/null @@ -1,145 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Test for image detection export lib.""" - -import io -import os - -from absl.testing import parameterized -import numpy as np -from PIL import Image -import tensorflow as tf - -from official.common import registry_imports # pylint: disable=unused-import -from official.core import exp_factory -from official.vision.beta.serving import detection - - -class DetectionExportTest(tf.test.TestCase, parameterized.TestCase): - - def _get_detection_module(self, experiment_name, input_type): - params = exp_factory.get_exp_config(experiment_name) - params.task.model.backbone.resnet.model_id = 18 - params.task.model.detection_generator.nms_version = 'batched' - detection_module = detection.DetectionModule( - params, - batch_size=1, - input_image_size=[640, 640], - input_type=input_type) - return detection_module - - def _export_from_module(self, module, input_type, save_directory): - signatures = module.get_inference_signatures( - {input_type: 'serving_default'}) - tf.saved_model.save(module, save_directory, signatures=signatures) - - def _get_dummy_input(self, input_type, batch_size, image_size): - """Get dummy input for the given input type.""" - h, w = image_size - - if input_type 
== 'image_tensor': - return tf.zeros((batch_size, h, w, 3), dtype=np.uint8) - elif input_type == 'image_bytes': - image = Image.fromarray(np.zeros((h, w, 3), dtype=np.uint8)) - byte_io = io.BytesIO() - image.save(byte_io, 'PNG') - return [byte_io.getvalue() for b in range(batch_size)] - elif input_type == 'tf_example': - image_tensor = tf.zeros((h, w, 3), dtype=tf.uint8) - encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).numpy() - example = tf.train.Example( - features=tf.train.Features( - feature={ - 'image/encoded': - tf.train.Feature( - bytes_list=tf.train.BytesList(value=[encoded_jpeg])), - })).SerializeToString() - return [example for b in range(batch_size)] - elif input_type == 'tflite': - return tf.zeros((batch_size, h, w, 3), dtype=np.float32) - - @parameterized.parameters( - ('image_tensor', 'fasterrcnn_resnetfpn_coco', [384, 384]), - ('image_bytes', 'fasterrcnn_resnetfpn_coco', [640, 640]), - ('tf_example', 'fasterrcnn_resnetfpn_coco', [640, 640]), - ('tflite', 'fasterrcnn_resnetfpn_coco', [640, 640]), - ('image_tensor', 'maskrcnn_resnetfpn_coco', [640, 640]), - ('image_bytes', 'maskrcnn_resnetfpn_coco', [640, 384]), - ('tf_example', 'maskrcnn_resnetfpn_coco', [640, 640]), - ('tflite', 'maskrcnn_resnetfpn_coco', [640, 640]), - ('image_tensor', 'retinanet_resnetfpn_coco', [640, 640]), - ('image_bytes', 'retinanet_resnetfpn_coco', [640, 640]), - ('tf_example', 'retinanet_resnetfpn_coco', [384, 640]), - ('tflite', 'retinanet_resnetfpn_coco', [640, 640]), - ('image_tensor', 'retinanet_resnetfpn_coco', [384, 384]), - ('image_bytes', 'retinanet_spinenet_coco', [640, 640]), - ('tf_example', 'retinanet_spinenet_coco', [640, 384]), - ('tflite', 'retinanet_spinenet_coco', [640, 640]), - ) - def test_export(self, input_type, experiment_name, image_size): - tmp_dir = self.get_temp_dir() - module = self._get_detection_module(experiment_name, input_type) - - self._export_from_module(module, input_type, tmp_dir) - - 
self.assertTrue(os.path.exists(os.path.join(tmp_dir, 'saved_model.pb'))) - self.assertTrue( - os.path.exists(os.path.join(tmp_dir, 'variables', 'variables.index'))) - self.assertTrue( - os.path.exists( - os.path.join(tmp_dir, 'variables', - 'variables.data-00000-of-00001'))) - - imported = tf.saved_model.load(tmp_dir) - detection_fn = imported.signatures['serving_default'] - - images = self._get_dummy_input( - input_type, batch_size=1, image_size=image_size) - - if input_type == 'tflite': - processed_images = tf.zeros(image_size + [3], dtype=tf.float32) - anchor_boxes = module._build_anchor_boxes() - image_info = tf.convert_to_tensor( - [image_size, image_size, [1.0, 1.0], [0, 0]], dtype=tf.float32) - else: - processed_images, anchor_boxes, image_info = module._build_inputs( - tf.zeros((224, 224, 3), dtype=tf.uint8)) - image_shape = image_info[1, :] - image_shape = tf.expand_dims(image_shape, 0) - processed_images = tf.expand_dims(processed_images, 0) - for l, l_boxes in anchor_boxes.items(): - anchor_boxes[l] = tf.expand_dims(l_boxes, 0) - - expected_outputs = module.model( - images=processed_images, - image_shape=image_shape, - anchor_boxes=anchor_boxes, - training=False) - outputs = detection_fn(tf.constant(images)) - - self.assertAllClose(outputs['num_detections'].numpy(), - expected_outputs['num_detections'].numpy()) - - def test_build_model_fail_with_none_batch_size(self): - params = exp_factory.get_exp_config('retinanet_resnetfpn_coco') - with self.assertRaisesRegex( - ValueError, 'batch_size cannot be None for detection models.'): - detection.DetectionModule( - params, batch_size=None, input_image_size=[640, 640]) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/serving/export_module_factory.py b/official/vision/beta/serving/export_module_factory.py deleted file mode 100644 index b2a8ee63e563dff617a8581085eeef6b6e1ffcb1..0000000000000000000000000000000000000000 --- a/official/vision/beta/serving/export_module_factory.py +++ 
/dev/null @@ -1,89 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Factory for vision export modules.""" - -from typing import List, Optional - -import tensorflow as tf - -from official.core import config_definitions as cfg -from official.vision.beta import configs -from official.vision.beta.dataloaders import classification_input -from official.vision.beta.modeling import factory -from official.vision.beta.serving import export_base_v2 as export_base -from official.vision.beta.serving import export_utils - - -def create_classification_export_module(params: cfg.ExperimentConfig, - input_type: str, - batch_size: int, - input_image_size: List[int], - num_channels: int = 3): - """Creates classification export module.""" - input_signature = export_utils.get_image_input_signatures( - input_type, batch_size, input_image_size, num_channels) - input_specs = tf.keras.layers.InputSpec( - shape=[batch_size] + input_image_size + [num_channels]) - - model = factory.build_classification_model( - input_specs=input_specs, - model_config=params.task.model, - l2_regularizer=None) - - def preprocess_fn(inputs): - image_tensor = export_utils.parse_image(inputs, input_type, - input_image_size, num_channels) - # If input_type is `tflite`, do not apply image preprocessing.
- if input_type == 'tflite': - return image_tensor - - def preprocess_image_fn(inputs): - return classification_input.Parser.inference_fn( - inputs, input_image_size, num_channels) - - images = tf.map_fn( - preprocess_image_fn, elems=image_tensor, - fn_output_signature=tf.TensorSpec( - shape=input_image_size + [num_channels], - dtype=tf.float32)) - - return images - - def postprocess_fn(logits): - probs = tf.nn.softmax(logits) - return {'logits': logits, 'probs': probs} - - export_module = export_base.ExportModule(params, - model=model, - input_signature=input_signature, - preprocessor=preprocess_fn, - postprocessor=postprocess_fn) - return export_module - - -def get_export_module(params: cfg.ExperimentConfig, - input_type: str, - batch_size: Optional[int], - input_image_size: List[int], - num_channels: int = 3) -> export_base.ExportModule: - """Factory for export modules.""" - if isinstance(params.task, - configs.image_classification.ImageClassificationTask): - export_module = create_classification_export_module( - params, input_type, batch_size, input_image_size, num_channels) - else: - raise ValueError('Export module not implemented for {} task.'.format( - type(params.task))) - return export_module diff --git a/official/vision/beta/serving/export_saved_model.py b/official/vision/beta/serving/export_saved_model.py deleted file mode 100644 index 39fa5585a6af0a3254917053c09130288b3196a2..0000000000000000000000000000000000000000 --- a/official/vision/beta/serving/export_saved_model.py +++ /dev/null @@ -1,107 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -r"""Vision models export binary for serving/inference. - -To export a trained checkpoint in saved_model format (shell script): - -EXPERIMENT_TYPE=XX -CHECKPOINT_PATH=XX -EXPORT_DIR_PATH=XX -export_saved_model --experiment=${EXPERIMENT_TYPE} \ - --export_dir=${EXPORT_DIR_PATH}/ \ - --checkpoint_path=${CHECKPOINT_PATH} \ - --batch_size=2 \ - --input_image_size=224,224 - -To serve (python): - -export_dir_path = XX -input_type = XX -input_images = XX -imported = tf.saved_model.load(export_dir_path) -model_fn = imported.signatures['serving_default'] -output = model_fn(input_images) -""" - -from absl import app -from absl import flags - -from official.common import registry_imports # pylint: disable=unused-import -from official.core import exp_factory -from official.modeling import hyperparams -from official.vision.beta.serving import export_saved_model_lib - -FLAGS = flags.FLAGS - - -flags.DEFINE_string( - 'experiment', None, 'experiment type, e.g. retinanet_resnetfpn_coco') -flags.DEFINE_string('export_dir', None, 'The export directory.') -flags.DEFINE_string('checkpoint_path', None, 'Checkpoint path.') -flags.DEFINE_multi_string( - 'config_file', - default=None, - help='YAML/JSON files which specify overrides. The override order ' - 'follows the order of args. Note that each file ' - 'can be used as an override template to override the default parameters ' - 'specified in Python.
If the same parameter is specified in both ' - '`--config_file` and `--params_override`, `config_file` will be used ' - 'first, followed by params_override.') -flags.DEFINE_string( - 'params_override', '', - 'The JSON/YAML file or string which specifies the parameters to be overridden' - ' on top of `config_file` template.') -flags.DEFINE_integer( - 'batch_size', None, 'The batch size.') -flags.DEFINE_string( - 'input_type', 'image_tensor', - 'One of `image_tensor`, `image_bytes`, `tf_example` and `tflite`.') -flags.DEFINE_string( - 'input_image_size', '224,224', - 'The comma-separated string of two integers representing the height,width ' - 'of the input to the model.') -flags.DEFINE_string('export_checkpoint_subdir', 'checkpoint', - 'The subdirectory for checkpoints.') -flags.DEFINE_string('export_saved_model_subdir', 'saved_model', - 'The subdirectory for saved model.') - - -def main(_): - - params = exp_factory.get_exp_config(FLAGS.experiment) - for config_file in FLAGS.config_file or []: - params = hyperparams.override_params_dict( - params, config_file, is_strict=True) - if FLAGS.params_override: - params = hyperparams.override_params_dict( - params, FLAGS.params_override, is_strict=True) - - params.validate() - params.lock() - - export_saved_model_lib.export_inference_graph( - input_type=FLAGS.input_type, - batch_size=FLAGS.batch_size, - input_image_size=[int(x) for x in FLAGS.input_image_size.split(',')], - params=params, - checkpoint_path=FLAGS.checkpoint_path, - export_dir=FLAGS.export_dir, - export_checkpoint_subdir=FLAGS.export_checkpoint_subdir, - export_saved_model_subdir=FLAGS.export_saved_model_subdir) - - -if __name__ == '__main__': - app.run(main) diff --git a/official/vision/beta/serving/export_tfhub.py b/official/vision/beta/serving/export_tfhub.py deleted file mode 100644 index 8d8af0899034065f9750ea1ab51e23dea40bb915..0000000000000000000000000000000000000000 --- a/official/vision/beta/serving/export_tfhub.py +++ /dev/null @@ -1,105 +0,0 @@ -#
Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""A script to export the image classification model as a TF-Hub SavedModel.""" - -# Import libraries -from absl import app -from absl import flags - -import tensorflow as tf - -from official.common import registry_imports # pylint: disable=unused-import -from official.core import exp_factory -from official.modeling import hyperparams -from official.vision.beta.modeling import factory - - -FLAGS = flags.FLAGS - -flags.DEFINE_string( - 'experiment', None, 'experiment type, e.g. resnet_imagenet') -flags.DEFINE_string( - 'checkpoint_path', None, 'Checkpoint path.') -flags.DEFINE_string( - 'export_path', None, 'The export directory.') -flags.DEFINE_multi_string( - 'config_file', - None, - 'YAML/JSON files which specify overrides. The override order ' - 'follows the order of args. Note that each file ' - 'can be used as an override template to override the default parameters ' - 'specified in Python.
If the same parameter is specified in both ' - '`--config_file` and `--params_override`, `config_file` will be used ' - 'first, followed by params_override.') -flags.DEFINE_string( - 'params_override', '', - 'The JSON/YAML file or string which specifies the parameters to be overridden' - ' on top of `config_file` template.') -flags.DEFINE_integer( - 'batch_size', None, 'The batch size.') -flags.DEFINE_string( - 'input_image_size', - '224,224', - 'The comma-separated string of two integers representing the height,width ' - 'of the input to the model.') -flags.DEFINE_boolean( - 'skip_logits_layer', - False, - 'Whether to skip the prediction layer and only output the feature vector.') - - -def export_model_to_tfhub(params, - batch_size, - input_image_size, - skip_logits_layer, - checkpoint_path, - export_path): - """Export an image classification model to TF-Hub.""" - input_specs = tf.keras.layers.InputSpec(shape=[batch_size] + - input_image_size + [3]) - - model = factory.build_classification_model( - input_specs=input_specs, - model_config=params.task.model, - l2_regularizer=None, - skip_logits_layer=skip_logits_layer) - checkpoint = tf.train.Checkpoint(model=model) - checkpoint.restore(checkpoint_path).assert_existing_objects_matched() - model.save(export_path, include_optimizer=False, save_format='tf') - - -def main(_): - params = exp_factory.get_exp_config(FLAGS.experiment) - for config_file in FLAGS.config_file or []: - params = hyperparams.override_params_dict( - params, config_file, is_strict=True) - if FLAGS.params_override: - params = hyperparams.override_params_dict( - params, FLAGS.params_override, is_strict=True) - params.validate() - params.lock() - - export_model_to_tfhub( - params=params, - batch_size=FLAGS.batch_size, - input_image_size=[int(x) for x in FLAGS.input_image_size.split(',')], - skip_logits_layer=FLAGS.skip_logits_layer, - checkpoint_path=FLAGS.checkpoint_path, - export_path=FLAGS.export_path) - - -if __name__ == '__main__': - app.run(main)
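The `--config_file` and `--params_override` help strings in these export scripts describe a layering order. Config files are applied in the order given, and `params_override` is applied last, so it wins on conflicts. The following is a minimal pure-Python sketch of that precedence over nested dicts.

A few assumptions apply to the sketch:

- `merge_overrides` and `resolve_params` are hypothetical helpers written for illustration. They are not the `hyperparams.override_params_dict` API.
- The YAML/JSON overrides are assumed to be already parsed into dicts.
- The `strict` flag only loosely mirrors `is_strict=True`: it rejects keys that are absent from the defaults.

```python
import copy
from typing import Any, Dict, List, Optional

Params = Dict[str, Any]


def merge_overrides(base: Params, override: Params, strict: bool = True) -> Params:
  """Returns a copy of `base` with `override` merged in recursively.

  Hypothetical helper (not the Model Garden API). With strict=True, a key that
  does not already exist in `base` raises KeyError, loosely mirroring
  `is_strict=True`.
  """
  result = copy.deepcopy(base)
  for key, value in override.items():
    if strict and key not in result:
      raise KeyError(f'Unknown override key: {key!r}')
    if isinstance(value, dict) and isinstance(result.get(key), dict):
      # Descend into nested sections so sibling keys are preserved.
      result[key] = merge_overrides(result[key], value, strict)
    else:
      # Leaf values replace whatever was there.
      result[key] = copy.deepcopy(value)
  return result


def resolve_params(defaults: Params,
                   config_files: Optional[List[Params]] = None,
                   params_override: Optional[Params] = None,
                   strict: bool = True) -> Params:
  """Applies each config file in order, then `params_override` last."""
  params = copy.deepcopy(defaults)
  for config in config_files or []:
    params = merge_overrides(params, config, strict)
  if params_override:
    params = merge_overrides(params, params_override, strict)
  return params


if __name__ == '__main__':
  defaults = {'task': {'model': {'input_size': [224, 224, 3]}, 'batch_size': 32}}
  yaml_templates = [{'task': {'batch_size': 64}}, {'task': {'batch_size': 128}}]
  final = resolve_params(defaults, yaml_templates, {'task': {'batch_size': 8}})
  # batch_size comes from params_override; input_size keeps its default.
  print(final)
```

Applying `params_override` last lets a one-off command-line tweak take precedence over a shared YAML template without editing the template file.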
diff --git a/official/vision/beta/serving/export_tflite.py b/official/vision/beta/serving/export_tflite.py deleted file mode 100644 index 9e75841f017a9ff6175d2a90251fb79fca50c760..0000000000000000000000000000000000000000 --- a/official/vision/beta/serving/export_tflite.py +++ /dev/null @@ -1,108 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -r"""Binary to convert a saved model to tflite model. - -It requires a SavedModel exported using export_saved_model.py with batch size 1 -and input type `tflite`, and using the same config file used for exporting saved -model. It includes optional post-training quantization. When using integer -quantization, calibration steps need to be provided to calibrate model input. 
- -To convert a SavedModel to a TFLite model: - -EXPERIMENT_TYPE = XX -TFLITE_PATH = XX -SAVED_MOODEL_DIR = XX -CONFIG_FILE = XX -export_tflite --experiment=${EXPERIMENT_TYPE} \ - --saved_model_dir=${SAVED_MOODEL_DIR} \ - --tflite_path=${TFLITE_PATH} \ - --config_file=${CONFIG_FILE} \ - --quant_type=fp16 \ - --calibration_steps=500 -""" -from absl import app -from absl import flags -from absl import logging - -import tensorflow as tf -from official.common import registry_imports # pylint: disable=unused-import -from official.core import exp_factory -from official.modeling import hyperparams -from official.vision.beta.serving import export_tflite_lib - -FLAGS = flags.FLAGS - -flags.DEFINE_string( - 'experiment', - None, - 'experiment type, e.g. retinanet_resnetfpn_coco', - required=True) -flags.DEFINE_multi_string( - 'config_file', - default='', - help='YAML/JSON files which specifies overrides. The override order ' - 'follows the order of args. Note that each file ' - 'can be used as an override template to override the default parameters ' - 'specified in Python. If the same parameter is specified in both ' - '`--config_file` and `--params_override`, `config_file` will be used ' - 'first, followed by params_override.') -flags.DEFINE_string( - 'params_override', '', - 'The JSON/YAML file or string which specifies the parameter to be overriden' - ' on top of `config_file` template.') -flags.DEFINE_string( - 'saved_model_dir', None, 'The directory to the saved model.', required=True) -flags.DEFINE_string( - 'tflite_path', None, 'The path to the output tflite model.', required=True) -flags.DEFINE_string( - 'quant_type', - default=None, - help='Post training quantization type. Support `int8`, `int8_full`, ' - '`fp16`, and `default`. 
See ' - 'https://www.tensorflow.org/lite/performance/post_training_quantization ' - 'for more details.') -flags.DEFINE_integer('calibration_steps', 500, - 'The number of calibration steps for integer model.') - - -def main(_) -> None: - params = exp_factory.get_exp_config(FLAGS.experiment) - if FLAGS.config_file is not None: - for config_file in FLAGS.config_file: - params = hyperparams.override_params_dict( - params, config_file, is_strict=True) - if FLAGS.params_override: - params = hyperparams.override_params_dict( - params, FLAGS.params_override, is_strict=True) - - params.validate() - params.lock() - - logging.info('Converting SavedModel from %s to TFLite model...', - FLAGS.saved_model_dir) - tflite_model = export_tflite_lib.convert_tflite_model( - saved_model_dir=FLAGS.saved_model_dir, - quant_type=FLAGS.quant_type, - params=params, - calibration_steps=FLAGS.calibration_steps) - - with tf.io.gfile.GFile(FLAGS.tflite_path, 'wb') as fw: - fw.write(tflite_model) - - logging.info('TFLite model converted and saved to %s.', FLAGS.tflite_path) - - -if __name__ == '__main__': - app.run(main) diff --git a/official/vision/beta/serving/export_tflite_lib.py b/official/vision/beta/serving/export_tflite_lib.py deleted file mode 100644 index 1ea6baf99cfe256edcb08993a2835b7bd8b6b0e9..0000000000000000000000000000000000000000 --- a/official/vision/beta/serving/export_tflite_lib.py +++ /dev/null @@ -1,122 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Library to facilitate TFLite model conversion.""" -import functools -from typing import Iterator, List, Optional - -from absl import logging -import tensorflow as tf - -from official.core import config_definitions as cfg -from official.vision.beta import configs -from official.vision.beta import tasks - - -def create_representative_dataset( - params: cfg.ExperimentConfig) -> tf.data.Dataset: - """Creates a tf.data.Dataset to load images for representative dataset. - - Args: - params: An ExperimentConfig. - - Returns: - A tf.data.Dataset instance. - - Raises: - ValueError: If task is not supported. - """ - if isinstance(params.task, - configs.image_classification.ImageClassificationTask): - - task = tasks.image_classification.ImageClassificationTask(params.task) - elif isinstance(params.task, configs.retinanet.RetinaNetTask): - task = tasks.retinanet.RetinaNetTask(params.task) - elif isinstance(params.task, configs.maskrcnn.MaskRCNNTask): - task = tasks.maskrcnn.MaskRCNNTask(params.task) - elif isinstance(params.task, - configs.semantic_segmentation.SemanticSegmentationTask): - task = tasks.semantic_segmentation.SemanticSegmentationTask(params.task) - else: - raise ValueError('Task {} not supported.'.format(type(params.task))) - # Ensure batch size is 1 for TFLite model. - params.task.train_data.global_batch_size = 1 - params.task.train_data.dtype = 'float32' - logging.info('Task config: %s', params.task.as_dict()) - return task.build_inputs(params=params.task.train_data) - - -def representative_dataset( - params: cfg.ExperimentConfig, - calibration_steps: int = 2000) -> Iterator[List[tf.Tensor]]: - """"Creates representative dataset for input calibration. - - Args: - params: An ExperimentConfig. - calibration_steps: The steps to do calibration. - - Yields: - An input image tensor. 
- """ - dataset = create_representative_dataset(params=params) - for image, _ in dataset.take(calibration_steps): - # Skip images that do not have 3 channels. - if image.shape[-1] != 3: - continue - yield [image] - - -def convert_tflite_model(saved_model_dir: str, - quant_type: Optional[str] = None, - params: Optional[cfg.ExperimentConfig] = None, - calibration_steps: Optional[int] = 2000) -> bytes: - """Converts and returns a TFLite model. - - Args: - saved_model_dir: The directory to the SavedModel. - quant_type: The post training quantization (PTQ) method. It can be one of - `default` (dynamic range), `fp16` (float16), `int8` (integer wih float - fallback), `int8_full` (integer only) and None (no quantization). - params: An optional ExperimentConfig to load and preprocess input images to - do calibration for integer quantization. - calibration_steps: The steps to do calibration. - - Returns: - A converted TFLite model with optional PTQ. - - Raises: - ValueError: If `representative_dataset_path` is not present if integer - quantization is requested. 
- """ - converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir) - if quant_type: - if quant_type.startswith('int8'): - converter.optimizations = [tf.lite.Optimize.DEFAULT] - converter.representative_dataset = functools.partial( - representative_dataset, - params=params, - calibration_steps=calibration_steps) - if quant_type == 'int8_full': - converter.target_spec.supported_ops = [ - tf.lite.OpsSet.TFLITE_BUILTINS_INT8 - ] - converter.inference_input_type = tf.uint8 # or tf.int8 - converter.inference_output_type = tf.uint8 # or tf.int8 - elif quant_type == 'fp16': - converter.optimizations = [tf.lite.Optimize.DEFAULT] - converter.target_spec.supported_types = [tf.float16] - elif quant_type == 'default': - converter.optimizations = [tf.lite.Optimize.DEFAULT] - - return converter.convert() diff --git a/official/vision/beta/serving/export_tflite_lib_test.py b/official/vision/beta/serving/export_tflite_lib_test.py deleted file mode 100644 index f12b4c0c1b8eeb55a1851a0abd6d3e2a73b2969b..0000000000000000000000000000000000000000 --- a/official/vision/beta/serving/export_tflite_lib_test.py +++ /dev/null @@ -1,152 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Tests for export_tflite_lib.""" -import os - -from absl.testing import parameterized -import tensorflow as tf - -from tensorflow.python.distribute import combinations -from official.common import registry_imports # pylint: disable=unused-import -from official.core import exp_factory -from official.vision.beta.dataloaders import tfexample_utils -from official.vision.beta.serving import detection as detection_serving -from official.vision.beta.serving import export_tflite_lib -from official.vision.beta.serving import image_classification as image_classification_serving -from official.vision.beta.serving import semantic_segmentation as semantic_segmentation_serving - - -class ExportTfliteLibTest(tf.test.TestCase, parameterized.TestCase): - - def _create_test_tfrecord(self, tfrecord_file, example, num_samples): - examples = [example] * num_samples - tfexample_utils.dump_to_tfrecord( - record_file=tfrecord_file, tf_examples=examples) - - def _export_from_module(self, module, input_type, saved_model_dir): - signatures = module.get_inference_signatures( - {input_type: 'serving_default'}) - tf.saved_model.save(module, saved_model_dir, signatures=signatures) - - @combinations.generate( - combinations.combine( - experiment=['mobilenet_imagenet'], - quant_type=[None, 'default', 'fp16', 'int8', 'int8_full'], - input_image_size=[[224, 224]])) - def test_export_tflite_image_classification(self, experiment, quant_type, - input_image_size): - test_tfrecord_file = os.path.join(self.get_temp_dir(), 'cls_test.tfrecord') - example = tf.train.Example.FromString( - tfexample_utils.create_classification_example( - image_height=input_image_size[0], image_width=input_image_size[1])) - self._create_test_tfrecord( - tfrecord_file=test_tfrecord_file, example=example, num_samples=10) - params = exp_factory.get_exp_config(experiment) - params.task.validation_data.input_path = test_tfrecord_file - params.task.train_data.input_path = test_tfrecord_file - temp_dir = self.get_temp_dir() - 
module = image_classification_serving.ClassificationModule( - params=params, - batch_size=1, - input_image_size=input_image_size, - input_type='tflite') - self._export_from_module( - module=module, - input_type='tflite', - saved_model_dir=os.path.join(temp_dir, 'saved_model')) - - tflite_model = export_tflite_lib.convert_tflite_model( - saved_model_dir=os.path.join(temp_dir, 'saved_model'), - quant_type=quant_type, - params=params, - calibration_steps=5) - - self.assertIsInstance(tflite_model, bytes) - - @combinations.generate( - combinations.combine( - experiment=['retinanet_mobile_coco'], - quant_type=[None, 'default', 'fp16'], - input_image_size=[[384, 384]])) - def test_export_tflite_detection(self, experiment, quant_type, - input_image_size): - test_tfrecord_file = os.path.join(self.get_temp_dir(), 'det_test.tfrecord') - example = tfexample_utils.create_detection_test_example( - image_height=input_image_size[0], - image_width=input_image_size[1], - image_channel=3, - num_instances=10) - self._create_test_tfrecord( - tfrecord_file=test_tfrecord_file, example=example, num_samples=10) - params = exp_factory.get_exp_config(experiment) - params.task.validation_data.input_path = test_tfrecord_file - params.task.train_data.input_path = test_tfrecord_file - temp_dir = self.get_temp_dir() - module = detection_serving.DetectionModule( - params=params, - batch_size=1, - input_image_size=input_image_size, - input_type='tflite') - self._export_from_module( - module=module, - input_type='tflite', - saved_model_dir=os.path.join(temp_dir, 'saved_model')) - - tflite_model = export_tflite_lib.convert_tflite_model( - saved_model_dir=os.path.join(temp_dir, 'saved_model'), - quant_type=quant_type, - params=params, - calibration_steps=5) - - self.assertIsInstance(tflite_model, bytes) - - @combinations.generate( - combinations.combine( - experiment=['mnv2_deeplabv3_pascal'], - quant_type=[None, 'default', 'fp16', 'int8', 'int8_full'], - input_image_size=[[512, 512]])) - def 
test_export_tflite_semantic_segmentation(self, experiment, quant_type, - input_image_size): - test_tfrecord_file = os.path.join(self.get_temp_dir(), 'seg_test.tfrecord') - example = tfexample_utils.create_segmentation_test_example( - image_height=input_image_size[0], - image_width=input_image_size[1], - image_channel=3) - self._create_test_tfrecord( - tfrecord_file=test_tfrecord_file, example=example, num_samples=10) - params = exp_factory.get_exp_config(experiment) - params.task.validation_data.input_path = test_tfrecord_file - params.task.train_data.input_path = test_tfrecord_file - temp_dir = self.get_temp_dir() - module = semantic_segmentation_serving.SegmentationModule( - params=params, - batch_size=1, - input_image_size=input_image_size, - input_type='tflite') - self._export_from_module( - module=module, - input_type='tflite', - saved_model_dir=os.path.join(temp_dir, 'saved_model')) - - tflite_model = export_tflite_lib.convert_tflite_model( - saved_model_dir=os.path.join(temp_dir, 'saved_model'), - quant_type=quant_type, - params=params, - calibration_steps=5) - - self.assertIsInstance(tflite_model, bytes) - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/serving/image_classification.py b/official/vision/beta/serving/image_classification.py deleted file mode 100644 index 614d129eb25e2a29a9c2128608f1634985dddb69..0000000000000000000000000000000000000000 --- a/official/vision/beta/serving/image_classification.py +++ /dev/null @@ -1,84 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Image classification input and model functions for serving/inference.""" - -import tensorflow as tf - -from official.vision.beta.modeling import factory -from official.vision.beta.ops import preprocess_ops -from official.vision.beta.serving import export_base - - -MEAN_RGB = (0.485 * 255, 0.456 * 255, 0.406 * 255) -STDDEV_RGB = (0.229 * 255, 0.224 * 255, 0.225 * 255) - - -class ClassificationModule(export_base.ExportModule): - """classification Module.""" - - def _build_model(self): - input_specs = tf.keras.layers.InputSpec( - shape=[self._batch_size] + self._input_image_size + [3]) - - return factory.build_classification_model( - input_specs=input_specs, - model_config=self.params.task.model, - l2_regularizer=None) - - def _build_inputs(self, image): - """Builds classification model inputs for serving.""" - # Center crops and resizes image. - image = preprocess_ops.center_crop_image(image) - - image = tf.image.resize( - image, self._input_image_size, method=tf.image.ResizeMethod.BILINEAR) - - image = tf.reshape( - image, [self._input_image_size[0], self._input_image_size[1], 3]) - - # Normalizes image with mean and std pixel values. - image = preprocess_ops.normalize_image(image, - offset=MEAN_RGB, - scale=STDDEV_RGB) - return image - - def serve(self, images): - """Cast image to float and run inference. - - Args: - images: uint8 Tensor of shape [batch_size, None, None, 3] - Returns: - Tensor holding classification output logits. 
- """ - # Skip image preprocessing when input_type is tflite so it is compatible - # with TFLite quantization. - if self._input_type != 'tflite': - with tf.device('cpu:0'): - images = tf.cast(images, dtype=tf.float32) - - images = tf.nest.map_structure( - tf.identity, - tf.map_fn( - self._build_inputs, - elems=images, - fn_output_signature=tf.TensorSpec( - shape=self._input_image_size + [3], dtype=tf.float32), - parallel_iterations=32)) - - logits = self.inference_step(images) - probs = tf.nn.softmax(logits) - - return {'logits': logits, 'probs': probs} diff --git a/official/vision/beta/serving/image_classification_test.py b/official/vision/beta/serving/image_classification_test.py deleted file mode 100644 index 056d24b8702a2f0d3850f534b80774cce5b5c33b..0000000000000000000000000000000000000000 --- a/official/vision/beta/serving/image_classification_test.py +++ /dev/null @@ -1,121 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Test for image classification export lib.""" - -import io -import os - -from absl.testing import parameterized -import numpy as np -from PIL import Image -import tensorflow as tf - -from official.common import registry_imports # pylint: disable=unused-import -from official.core import exp_factory -from official.vision.beta.serving import image_classification - - -class ImageClassificationExportTest(tf.test.TestCase, parameterized.TestCase): - - def _get_classification_module(self, input_type): - params = exp_factory.get_exp_config('resnet_imagenet') - params.task.model.backbone.resnet.model_id = 18 - classification_module = image_classification.ClassificationModule( - params, - batch_size=1, - input_image_size=[224, 224], - input_type=input_type) - return classification_module - - def _export_from_module(self, module, input_type, save_directory): - signatures = module.get_inference_signatures( - {input_type: 'serving_default'}) - tf.saved_model.save(module, - save_directory, - signatures=signatures) - - def _get_dummy_input(self, input_type): - """Get dummy input for the given input type.""" - - if input_type == 'image_tensor': - return tf.zeros((1, 224, 224, 3), dtype=np.uint8) - elif input_type == 'image_bytes': - image = Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8)) - byte_io = io.BytesIO() - image.save(byte_io, 'PNG') - return [byte_io.getvalue()] - elif input_type == 'tf_example': - image_tensor = tf.zeros((224, 224, 3), dtype=tf.uint8) - encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).numpy() - example = tf.train.Example( - features=tf.train.Features( - feature={ - 'image/encoded': - tf.train.Feature( - bytes_list=tf.train.BytesList(value=[encoded_jpeg])), - })).SerializeToString() - return [example] - elif input_type == 'tflite': - return tf.zeros((1, 224, 224, 3), dtype=np.float32) - - @parameterized.parameters( - {'input_type': 'image_tensor'}, - {'input_type': 'image_bytes'}, - {'input_type': 
'tf_example'}, - {'input_type': 'tflite'}, - ) - def test_export(self, input_type='image_tensor'): - tmp_dir = self.get_temp_dir() - module = self._get_classification_module(input_type) - # Test that the model restores any attrs that are trackable objects - # (eg: tables, resource variables, keras models/layers, tf.hub modules). - module.model.test_trackable = tf.keras.layers.InputLayer(input_shape=(4,)) - - self._export_from_module(module, input_type, tmp_dir) - - self.assertTrue(os.path.exists(os.path.join(tmp_dir, 'saved_model.pb'))) - self.assertTrue(os.path.exists( - os.path.join(tmp_dir, 'variables', 'variables.index'))) - self.assertTrue(os.path.exists( - os.path.join(tmp_dir, 'variables', 'variables.data-00000-of-00001'))) - - imported = tf.saved_model.load(tmp_dir) - classification_fn = imported.signatures['serving_default'] - - images = self._get_dummy_input(input_type) - if input_type != 'tflite': - processed_images = tf.nest.map_structure( - tf.stop_gradient, - tf.map_fn( - module._build_inputs, - elems=tf.zeros((1, 224, 224, 3), dtype=tf.uint8), - fn_output_signature=tf.TensorSpec( - shape=[224, 224, 3], dtype=tf.float32))) - else: - processed_images = images - expected_logits = module.model(processed_images, training=False) - expected_prob = tf.nn.softmax(expected_logits) - out = classification_fn(tf.constant(images)) - - # The imported model should contain any trackable attrs that the original - # model had. 
- self.assertTrue(hasattr(imported.model, 'test_trackable')) - self.assertAllClose(out['logits'].numpy(), expected_logits.numpy()) - self.assertAllClose(out['probs'].numpy(), expected_prob.numpy()) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/serving/semantic_segmentation.py b/official/vision/beta/serving/semantic_segmentation.py deleted file mode 100644 index e73d2ff61fea8f986be67e6f46f3c228dcd7ac08..0000000000000000000000000000000000000000 --- a/official/vision/beta/serving/semantic_segmentation.py +++ /dev/null @@ -1,84 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Semantic segmentation input and model functions for serving/inference.""" - -import tensorflow as tf - -from official.vision.beta.modeling import factory -from official.vision.beta.ops import preprocess_ops -from official.vision.beta.serving import export_base - - -MEAN_RGB = (0.485 * 255, 0.456 * 255, 0.406 * 255) -STDDEV_RGB = (0.229 * 255, 0.224 * 255, 0.225 * 255) - - -class SegmentationModule(export_base.ExportModule): - """Segmentation Module.""" - - def _build_model(self): - input_specs = tf.keras.layers.InputSpec( - shape=[self._batch_size] + self._input_image_size + [3]) - - return factory.build_segmentation_model( - input_specs=input_specs, - model_config=self.params.task.model, - l2_regularizer=None) - - def _build_inputs(self, image): - """Builds classification model inputs for serving.""" - - # Normalizes image with mean and std pixel values. - image = preprocess_ops.normalize_image(image, - offset=MEAN_RGB, - scale=STDDEV_RGB) - - image, _ = preprocess_ops.resize_and_crop_image( - image, - self._input_image_size, - padded_size=self._input_image_size, - aug_scale_min=1.0, - aug_scale_max=1.0) - return image - - def serve(self, images): - """Cast image to float and run inference. - - Args: - images: uint8 Tensor of shape [batch_size, None, None, 3] - Returns: - Tensor holding classification output logits. - """ - # Skip image preprocessing when input_type is tflite so it is compatible - # with TFLite quantization. 
- if self._input_type != 'tflite': - with tf.device('cpu:0'): - images = tf.cast(images, dtype=tf.float32) - - images = tf.nest.map_structure( - tf.identity, - tf.map_fn( - self._build_inputs, - elems=images, - fn_output_signature=tf.TensorSpec( - shape=self._input_image_size + [3], dtype=tf.float32), - parallel_iterations=32)) - - outputs = self.inference_step(images) - outputs['logits'] = tf.image.resize( - outputs['logits'], self._input_image_size, method='bilinear') - - return outputs diff --git a/official/vision/beta/serving/semantic_segmentation_test.py b/official/vision/beta/serving/semantic_segmentation_test.py deleted file mode 100644 index 798a27e4ed542070847b0edbd0cd6d07eeae5493..0000000000000000000000000000000000000000 --- a/official/vision/beta/serving/semantic_segmentation_test.py +++ /dev/null @@ -1,113 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Test for semantic segmentation export lib.""" - -import io -import os - -from absl.testing import parameterized -import numpy as np -from PIL import Image -import tensorflow as tf - -from official.common import registry_imports # pylint: disable=unused-import -from official.core import exp_factory -from official.vision.beta.serving import semantic_segmentation - - -class SemanticSegmentationExportTest(tf.test.TestCase, parameterized.TestCase): - - def _get_segmentation_module(self, input_type): - params = exp_factory.get_exp_config('mnv2_deeplabv3_pascal') - segmentation_module = semantic_segmentation.SegmentationModule( - params, - batch_size=1, - input_image_size=[112, 112], - input_type=input_type) - return segmentation_module - - def _export_from_module(self, module, input_type, save_directory): - signatures = module.get_inference_signatures( - {input_type: 'serving_default'}) - tf.saved_model.save(module, save_directory, signatures=signatures) - - def _get_dummy_input(self, input_type): - """Get dummy input for the given input type.""" - - if input_type == 'image_tensor': - return tf.zeros((1, 112, 112, 3), dtype=np.uint8) - elif input_type == 'image_bytes': - image = Image.fromarray(np.zeros((112, 112, 3), dtype=np.uint8)) - byte_io = io.BytesIO() - image.save(byte_io, 'PNG') - return [byte_io.getvalue()] - elif input_type == 'tf_example': - image_tensor = tf.zeros((112, 112, 3), dtype=tf.uint8) - encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).numpy() - example = tf.train.Example( - features=tf.train.Features( - feature={ - 'image/encoded': - tf.train.Feature( - bytes_list=tf.train.BytesList(value=[encoded_jpeg])), - })).SerializeToString() - return [example] - elif input_type == 'tflite': - return tf.zeros((1, 112, 112, 3), dtype=np.float32) - - @parameterized.parameters( - {'input_type': 'image_tensor'}, - {'input_type': 'image_bytes'}, - {'input_type': 'tf_example'}, - {'input_type': 'tflite'}, - ) - def 
test_export(self, input_type='image_tensor'): - tmp_dir = self.get_temp_dir() - module = self._get_segmentation_module(input_type) - - self._export_from_module(module, input_type, tmp_dir) - - self.assertTrue(os.path.exists(os.path.join(tmp_dir, 'saved_model.pb'))) - self.assertTrue( - os.path.exists(os.path.join(tmp_dir, 'variables', 'variables.index'))) - self.assertTrue( - os.path.exists( - os.path.join(tmp_dir, 'variables', - 'variables.data-00000-of-00001'))) - - imported = tf.saved_model.load(tmp_dir) - segmentation_fn = imported.signatures['serving_default'] - - images = self._get_dummy_input(input_type) - if input_type != 'tflite': - processed_images = tf.nest.map_structure( - tf.stop_gradient, - tf.map_fn( - module._build_inputs, - elems=tf.zeros((1, 112, 112, 3), dtype=tf.uint8), - fn_output_signature=tf.TensorSpec( - shape=[112, 112, 3], dtype=tf.float32))) - else: - processed_images = images - expected_output = tf.image.resize( - module.model(processed_images, training=False)['logits'], [112, 112], - method='bilinear') - out = segmentation_fn(tf.constant(images)) - self.assertAllClose(out['logits'].numpy(), expected_output.numpy()) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/beta/serving/video_classification.py b/official/vision/beta/serving/video_classification.py deleted file mode 100644 index 2760cacb89b4ec8322d7dcf11471e02f4eb1069f..0000000000000000000000000000000000000000 --- a/official/vision/beta/serving/video_classification.py +++ /dev/null @@ -1,191 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Video classification input and model functions for serving/inference.""" -from typing import Mapping, Dict, Text - -import tensorflow as tf - -from official.vision.beta.dataloaders import video_input -from official.vision.beta.serving import export_base -from official.vision.beta.tasks import video_classification - -MEAN_RGB = (0.485 * 255, 0.456 * 255, 0.406 * 255) -STDDEV_RGB = (0.229 * 255, 0.224 * 255, 0.225 * 255) - - -class VideoClassificationModule(export_base.ExportModule): - """Video classification Module.""" - - def _build_model(self): - input_params = self.params.task.train_data - self._num_frames = input_params.feature_shape[0] - self._stride = input_params.temporal_stride - self._min_resize = input_params.min_image_size - self._crop_size = input_params.feature_shape[1] - - self._output_audio = input_params.output_audio - task = video_classification.VideoClassificationTask(self.params.task) - return task.build_model() - - def _decode_tf_example(self, encoded_inputs: tf.Tensor): - sequence_description = { - # Each image is a string encoding JPEG. 
-        video_input.IMAGE_KEY:
-            tf.io.FixedLenSequenceFeature((), tf.string),
-    }
-    if self._output_audio:
-      sequence_description[self._params.task.validation_data.audio_feature] = (
-          tf.io.VarLenFeature(dtype=tf.float32))
-    _, decoded_tensors = tf.io.parse_single_sequence_example(
-        encoded_inputs, {}, sequence_description)
-    for key, value in decoded_tensors.items():
-      if isinstance(value, tf.SparseTensor):
-        decoded_tensors[key] = tf.sparse.to_dense(value)
-    return decoded_tensors
-
-  def _preprocess_image(self, image):
-    image = video_input.process_image(
-        image=image,
-        is_training=False,
-        num_frames=self._num_frames,
-        stride=self._stride,
-        num_test_clips=1,
-        min_resize=self._min_resize,
-        crop_size=self._crop_size,
-        num_crops=1)
-    image = tf.cast(image, tf.float32)  # Use config.
-    features = {'image': image}
-    return features
-
-  def _preprocess_audio(self, audio):
-    features = {}
-    audio = tf.cast(audio, dtype=tf.float32)  # Use config.
-    audio = video_input.preprocess_ops_3d.sample_sequence(
-        audio, 20, random=False, stride=1)
-    audio = tf.ensure_shape(
-        audio, self._params.task.validation_data.audio_feature_shape)
-    features['audio'] = audio
-    return features
-
-  @tf.function
-  def inference_from_tf_example(
-      self, encoded_inputs: tf.Tensor) -> Mapping[str, tf.Tensor]:
-    with tf.device('cpu:0'):
-      if self._output_audio:
-        inputs = tf.map_fn(
-            self._decode_tf_example, (encoded_inputs),
-            fn_output_signature={
-                video_input.IMAGE_KEY: tf.string,
-                self._params.task.validation_data.audio_feature: tf.float32
-            })
-        return self.serve(inputs['image'], inputs['audio'])
-      else:
-        inputs = tf.map_fn(
-            self._decode_tf_example, (encoded_inputs),
-            fn_output_signature={
-                video_input.IMAGE_KEY: tf.string,
-            })
-        return self.serve(inputs[video_input.IMAGE_KEY], tf.zeros([1, 1]))
-
-  @tf.function
-  def inference_from_image_tensors(
-      self, input_frames: tf.Tensor) -> Mapping[str, tf.Tensor]:
-    return self.serve(input_frames, tf.zeros([1, 1]))
-
-  @tf.function
-  def inference_from_image_audio_tensors(
-      self, input_frames: tf.Tensor,
-      input_audio: tf.Tensor) -> Mapping[str, tf.Tensor]:
-    return self.serve(input_frames, input_audio)
-
-  @tf.function
-  def inference_from_image_bytes(self, inputs: tf.Tensor):
-    raise NotImplementedError(
-        'Video classification do not support image bytes input.')
-
-  def serve(self, input_frames: tf.Tensor, input_audio: tf.Tensor):
-    """Cast image to float and run inference.
-
-    Args:
-      input_frames: uint8 Tensor of shape [batch_size, None, None, 3]
-      input_audio: float32
-
-    Returns:
-      Tensor holding classification output logits.
-    """
-    with tf.device('cpu:0'):
-      inputs = tf.map_fn(
-          self._preprocess_image, (input_frames),
-          fn_output_signature={
-              'image': tf.float32,
-          })
-      if self._output_audio:
-        inputs.update(
-            tf.map_fn(
-                self._preprocess_audio, (input_audio),
-                fn_output_signature={'audio': tf.float32}))
-    logits = self.inference_step(inputs)
-    if self.params.task.train_data.is_multilabel:
-      probs = tf.math.sigmoid(logits)
-    else:
-      probs = tf.nn.softmax(logits)
-    return {'logits': logits, 'probs': probs}
-
-  def get_inference_signatures(self, function_keys: Dict[Text, Text]):
-    """Gets defined function signatures.
-
-    Args:
-      function_keys: A dictionary with keys as the function to create signature
-        for and values as the signature keys when returns.
-
-    Returns:
-      A dictionary with key as signature key and value as concrete functions
-        that can be used for tf.saved_model.save.
-    """
-    signatures = {}
-    for key, def_name in function_keys.items():
-      if key == 'image_tensor':
-        input_signature = tf.TensorSpec(
-            shape=[self._batch_size] + self._input_image_size + [3],
-            dtype=tf.uint8,
-            name='INPUT_FRAMES')
-        signatures[
-            def_name] = self.inference_from_image_tensors.get_concrete_function(
-                input_signature)
-      elif key == 'frames_audio':
-        input_signature = [
-            tf.TensorSpec(
-                shape=[self._batch_size] + self._input_image_size + [3],
-                dtype=tf.uint8,
-                name='INPUT_FRAMES'),
-            tf.TensorSpec(
-                shape=[self._batch_size] +
-                self.params.task.train_data.audio_feature_shape,
-                dtype=tf.float32,
-                name='INPUT_AUDIO')
-        ]
-        signatures[
-            def_name] = self.inference_from_image_audio_tensors.get_concrete_function(
-                input_signature)
-      elif key == 'serve_examples' or key == 'tf_example':
-        input_signature = tf.TensorSpec(
-            shape=[self._batch_size], dtype=tf.string)
-        signatures[
-            def_name] = self.inference_from_tf_example.get_concrete_function(
-                input_signature)
-      else:
-        raise ValueError('Unrecognized `input_type`')
-    return signatures
diff --git a/official/vision/beta/serving/video_classification_test.py b/official/vision/beta/serving/video_classification_test.py
deleted file mode 100644
index cba83650d7c586d61f2cbb99be7e8f1f93e000f0..0000000000000000000000000000000000000000
--- a/official/vision/beta/serving/video_classification_test.py
+++ /dev/null
@@ -1,114 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-
-# import io
-import os
-import random
-
-from absl.testing import parameterized
-import numpy as np
-import tensorflow as tf
-
-from official.common import registry_imports  # pylint: disable=unused-import
-from official.core import exp_factory
-from official.vision.beta.dataloaders import tfexample_utils
-from official.vision.beta.serving import video_classification
-
-
-class VideoClassificationTest(tf.test.TestCase, parameterized.TestCase):
-
-  def _get_classification_module(self):
-    params = exp_factory.get_exp_config('video_classification_ucf101')
-    params.task.train_data.feature_shape = (8, 64, 64, 3)
-    params.task.validation_data.feature_shape = (8, 64, 64, 3)
-    params.task.model.backbone.resnet_3d.model_id = 50
-    classification_module = video_classification.VideoClassificationModule(
-        params, batch_size=1, input_image_size=[8, 64, 64])
-    return classification_module
-
-  def _export_from_module(self, module, input_type, save_directory):
-    signatures = module.get_inference_signatures(
-        {input_type: 'serving_default'})
-    tf.saved_model.save(module, save_directory, signatures=signatures)
-
-  def _get_dummy_input(self, input_type, module=None):
-    """Get dummy input for the given input type."""
-
-    if input_type == 'image_tensor':
-      images = np.random.randint(
-          low=0, high=255, size=(1, 8, 64, 64, 3), dtype=np.uint8)
-      # images = np.zeros((1, 8, 64, 64, 3), dtype=np.uint8)
-      return images, images
-    elif input_type == 'tf_example':
-      example = tfexample_utils.make_video_test_example(
-          image_shape=(64, 64, 3),
-          audio_shape=(20, 128),
-          label=random.randint(0, 100)).SerializeToString()
-      images = tf.nest.map_structure(
-          tf.stop_gradient,
-          tf.map_fn(
-              module._decode_tf_example,
-              elems=tf.constant([example]),
-              fn_output_signature={
-                  video_classification.video_input.IMAGE_KEY: tf.string,
-              }))
-      images = images[video_classification.video_input.IMAGE_KEY]
-      return [example], images
-    else:
-      raise ValueError(f'{input_type}')
-
-  @parameterized.parameters(
-      {'input_type': 'image_tensor'},
-      {'input_type': 'tf_example'},
-  )
-  def test_export(self, input_type):
-    tmp_dir = self.get_temp_dir()
-    module = self._get_classification_module()
-
-    self._export_from_module(module, input_type, tmp_dir)
-
-    self.assertTrue(os.path.exists(os.path.join(tmp_dir, 'saved_model.pb')))
-    self.assertTrue(
-        os.path.exists(os.path.join(tmp_dir, 'variables', 'variables.index')))
-    self.assertTrue(
-        os.path.exists(
-            os.path.join(tmp_dir, 'variables',
-                         'variables.data-00000-of-00001')))
-
-    imported = tf.saved_model.load(tmp_dir)
-    classification_fn = imported.signatures['serving_default']
-
-    images, images_tensor = self._get_dummy_input(input_type, module)
-    processed_images = tf.nest.map_structure(
-        tf.stop_gradient,
-        tf.map_fn(
-            module._preprocess_image,
-            elems=images_tensor,
-            fn_output_signature={
-                'image': tf.float32,
-            }))
-    expected_logits = module.model(processed_images, training=False)
-    expected_prob = tf.nn.softmax(expected_logits)
-    out = classification_fn(tf.constant(images))
-
-    # The imported model should contain any trackable attrs that the original
-    # model had.
-    self.assertAllClose(out['logits'].numpy(), expected_logits.numpy())
-    self.assertAllClose(out['probs'].numpy(), expected_prob.numpy())
-
-
-if __name__ == '__main__':
-  tf.test.main()
diff --git a/official/vision/beta/tasks/__init__.py b/official/vision/beta/tasks/__init__.py
deleted file mode 100644
index 8410d0d5b44fad9fa2627a24773ebe02c5df19cb..0000000000000000000000000000000000000000
--- a/official/vision/beta/tasks/__init__.py
+++ /dev/null
@@ -1,22 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Tasks package definition."""
-
-from official.vision.beta.tasks import image_classification
-from official.vision.beta.tasks import maskrcnn
-from official.vision.beta.tasks import retinanet
-from official.vision.beta.tasks import semantic_segmentation
-from official.vision.beta.tasks import video_classification
diff --git a/official/vision/beta/tasks/image_classification.py b/official/vision/beta/tasks/image_classification.py
deleted file mode 100644
index 6892b3a375b7d6e679d07272e730f1f2196dc7f6..0000000000000000000000000000000000000000
--- a/official/vision/beta/tasks/image_classification.py
+++ /dev/null
@@ -1,312 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Image classification task definition."""
-from typing import Any, Optional, List, Tuple
-from absl import logging
-import tensorflow as tf
-
-from official.common import dataset_fn
-from official.core import base_task
-from official.core import task_factory
-from official.modeling import tf_utils
-from official.vision.beta.configs import image_classification as exp_cfg
-from official.vision.beta.dataloaders import classification_input
-from official.vision.beta.dataloaders import input_reader_factory
-from official.vision.beta.dataloaders import tfds_factory
-from official.vision.beta.modeling import factory
-from official.vision.beta.ops import augment
-
-
-@task_factory.register_task_cls(exp_cfg.ImageClassificationTask)
-class ImageClassificationTask(base_task.Task):
-  """A task for image classification."""
-
-  def build_model(self):
-    """Builds classification model."""
-    input_specs = tf.keras.layers.InputSpec(
-        shape=[None] + self.task_config.model.input_size)
-
-    l2_weight_decay = self.task_config.losses.l2_weight_decay
-    # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss.
-    # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2)
-    # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss)
-    l2_regularizer = (tf.keras.regularizers.l2(
-        l2_weight_decay / 2.0) if l2_weight_decay else None)
-
-    model = factory.build_classification_model(
-        input_specs=input_specs,
-        model_config=self.task_config.model,
-        l2_regularizer=l2_regularizer)
-    return model
-
-  def initialize(self, model: tf.keras.Model):
-    """Loads pretrained checkpoint."""
-    if not self.task_config.init_checkpoint:
-      return
-
-    ckpt_dir_or_file = self.task_config.init_checkpoint
-    if tf.io.gfile.isdir(ckpt_dir_or_file):
-      ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file)
-
-    # Restoring checkpoint.
-    if self.task_config.init_checkpoint_modules == 'all':
-      ckpt = tf.train.Checkpoint(model=model)
-      status = ckpt.read(ckpt_dir_or_file)
-      status.expect_partial().assert_existing_objects_matched()
-    elif self.task_config.init_checkpoint_modules == 'backbone':
-      ckpt = tf.train.Checkpoint(backbone=model.backbone)
-      status = ckpt.read(ckpt_dir_or_file)
-      status.expect_partial().assert_existing_objects_matched()
-    else:
-      raise ValueError(
-          "Only 'all' or 'backbone' can be used to initialize the model.")
-
-    logging.info('Finished loading pretrained checkpoint from %s',
-                 ckpt_dir_or_file)
-
-  def build_inputs(
-      self,
-      params: exp_cfg.DataConfig,
-      input_context: Optional[tf.distribute.InputContext] = None
-  ) -> tf.data.Dataset:
-    """Builds classification input."""
-
-    num_classes = self.task_config.model.num_classes
-    input_size = self.task_config.model.input_size
-    image_field_key = self.task_config.train_data.image_field_key
-    label_field_key = self.task_config.train_data.label_field_key
-    is_multilabel = self.task_config.train_data.is_multilabel
-
-    if params.tfds_name:
-      decoder = tfds_factory.get_classification_decoder(params.tfds_name)
-    else:
-      decoder = classification_input.Decoder(
-          image_field_key=image_field_key, label_field_key=label_field_key,
-          is_multilabel=is_multilabel)
-
-    parser = classification_input.Parser(
-        output_size=input_size[:2],
-        num_classes=num_classes,
-        image_field_key=image_field_key,
-        label_field_key=label_field_key,
-        decode_jpeg_only=params.decode_jpeg_only,
-        aug_rand_hflip=params.aug_rand_hflip,
-        aug_type=params.aug_type,
-        color_jitter=params.color_jitter,
-        random_erasing=params.random_erasing,
-        is_multilabel=is_multilabel,
-        dtype=params.dtype)
-
-    postprocess_fn = None
-    if params.mixup_and_cutmix:
-      postprocess_fn = augment.MixupAndCutmix(
-          mixup_alpha=params.mixup_and_cutmix.mixup_alpha,
-          cutmix_alpha=params.mixup_and_cutmix.cutmix_alpha,
-          prob=params.mixup_and_cutmix.prob,
-          label_smoothing=params.mixup_and_cutmix.label_smoothing,
-          num_classes=num_classes)
-
-    reader = input_reader_factory.input_reader_generator(
-        params,
-        dataset_fn=dataset_fn.pick_dataset_fn(params.file_type),
-        decoder_fn=decoder.decode,
-        parser_fn=parser.parse_fn(params.is_training),
-        postprocess_fn=postprocess_fn)
-
-    dataset = reader.read(input_context=input_context)
-
-    return dataset
-
-  def build_losses(self,
-                   labels: tf.Tensor,
-                   model_outputs: tf.Tensor,
-                   aux_losses: Optional[Any] = None) -> tf.Tensor:
-    """Builds sparse categorical cross entropy loss.
-
-    Args:
-      labels: Input groundtruth labels.
-      model_outputs: Output logits of the classifier.
-      aux_losses: The auxiliarly loss tensors, i.e. `losses` in tf.keras.Model.
-
-    Returns:
-      The total loss tensor.
-    """
-    losses_config = self.task_config.losses
-    is_multilabel = self.task_config.train_data.is_multilabel
-
-    if not is_multilabel:
-      if losses_config.one_hot:
-        total_loss = tf.keras.losses.categorical_crossentropy(
-            labels,
-            model_outputs,
-            from_logits=True,
-            label_smoothing=losses_config.label_smoothing)
-      elif losses_config.soft_labels:
-        total_loss = tf.nn.softmax_cross_entropy_with_logits(
-            labels, model_outputs)
-      else:
-        total_loss = tf.keras.losses.sparse_categorical_crossentropy(
-            labels, model_outputs, from_logits=True)
-    else:
-      # Multi-label weighted binary cross entropy loss.
-      total_loss = tf.nn.sigmoid_cross_entropy_with_logits(
-          labels=labels, logits=model_outputs)
-      total_loss = tf.reduce_sum(total_loss, axis=-1)
-
-    total_loss = tf_utils.safe_mean(total_loss)
-    if aux_losses:
-      total_loss += tf.add_n(aux_losses)
-
-    total_loss = losses_config.loss_weight * total_loss
-    return total_loss
-
-  def build_metrics(self,
-                    training: bool = True) -> List[tf.keras.metrics.Metric]:
-    """Gets streaming metrics for training/validation."""
-    is_multilabel = self.task_config.train_data.is_multilabel
-    if not is_multilabel:
-      k = self.task_config.evaluation.top_k
-      if (self.task_config.losses.one_hot or
-          self.task_config.losses.soft_labels):
-        metrics = [
-            tf.keras.metrics.CategoricalAccuracy(name='accuracy'),
-            tf.keras.metrics.TopKCategoricalAccuracy(
-                k=k, name='top_{}_accuracy'.format(k))]
-      else:
-        metrics = [
-            tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'),
-            tf.keras.metrics.SparseTopKCategoricalAccuracy(
-                k=k, name='top_{}_accuracy'.format(k))]
-    else:
-      metrics = []
-      # These metrics destablize the training if included in training. The jobs
-      # fail due to OOM.
-      # TODO(arashwan): Investigate adding following metric to train.
-      if not training:
-        metrics = [
-            tf.keras.metrics.AUC(
-                name='globalPR-AUC',
-                curve='PR',
-                multi_label=False,
-                from_logits=True),
-            tf.keras.metrics.AUC(
-                name='meanPR-AUC',
-                curve='PR',
-                multi_label=True,
-                num_labels=self.task_config.model.num_classes,
-                from_logits=True),
-        ]
-    return metrics
-
-  def train_step(self,
-                 inputs: Tuple[Any, Any],
-                 model: tf.keras.Model,
-                 optimizer: tf.keras.optimizers.Optimizer,
-                 metrics: Optional[List[Any]] = None):
-    """Does forward and backward.
-
-    Args:
-      inputs: A tuple of of input tensors of (features, labels).
-      model: A tf.keras.Model instance.
-      optimizer: The optimizer for this training step.
-      metrics: A nested structure of metrics objects.
-
-    Returns:
-      A dictionary of logs.
-    """
-    features, labels = inputs
-    is_multilabel = self.task_config.train_data.is_multilabel
-    if self.task_config.losses.one_hot and not is_multilabel:
-      labels = tf.one_hot(labels, self.task_config.model.num_classes)
-
-    num_replicas = tf.distribute.get_strategy().num_replicas_in_sync
-    with tf.GradientTape() as tape:
-      outputs = model(features, training=True)
-      # Casting output layer as float32 is necessary when mixed_precision is
-      # mixed_float16 or mixed_bfloat16 to ensure output is casted as float32.
-      outputs = tf.nest.map_structure(
-          lambda x: tf.cast(x, tf.float32), outputs)
-
-      # Computes per-replica loss.
-      loss = self.build_losses(
-          model_outputs=outputs,
-          labels=labels,
-          aux_losses=model.losses)
-      # Scales loss as the default gradients allreduce performs sum inside the
-      # optimizer.
-      scaled_loss = loss / num_replicas
-
-      # For mixed_precision policy, when LossScaleOptimizer is used, loss is
-      # scaled for numerical stability.
-      if isinstance(
-          optimizer, tf.keras.mixed_precision.LossScaleOptimizer):
-        scaled_loss = optimizer.get_scaled_loss(scaled_loss)
-
-    tvars = model.trainable_variables
-    grads = tape.gradient(scaled_loss, tvars)
-    # Scales back gradient before apply_gradients when LossScaleOptimizer is
-    # used.
-    if isinstance(
-        optimizer, tf.keras.mixed_precision.LossScaleOptimizer):
-      grads = optimizer.get_unscaled_gradients(grads)
-    optimizer.apply_gradients(list(zip(grads, tvars)))
-
-    logs = {self.loss: loss}
-    if metrics:
-      self.process_metrics(metrics, labels, outputs)
-    elif model.compiled_metrics:
-      self.process_compiled_metrics(model.compiled_metrics, labels, outputs)
-      logs.update({m.name: m.result() for m in model.metrics})
-    return logs
-
-  def validation_step(self,
-                      inputs: Tuple[Any, Any],
-                      model: tf.keras.Model,
-                      metrics: Optional[List[Any]] = None):
-    """Runs validatation step.
-
-    Args:
-      inputs: A tuple of of input tensors of (features, labels).
-      model: A tf.keras.Model instance.
-      metrics: A nested structure of metrics objects.
-
-    Returns:
-      A dictionary of logs.
-    """
-    features, labels = inputs
-    one_hot = self.task_config.losses.one_hot
-    soft_labels = self.task_config.losses.soft_labels
-    is_multilabel = self.task_config.train_data.is_multilabel
-    if (one_hot or soft_labels) and not is_multilabel:
-      labels = tf.one_hot(labels, self.task_config.model.num_classes)
-
-    outputs = self.inference_step(features, model)
-    outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs)
-    loss = self.build_losses(
-        model_outputs=outputs,
-        labels=labels,
-        aux_losses=model.losses)
-
-    logs = {self.loss: loss}
-    if metrics:
-      self.process_metrics(metrics, labels, outputs)
-    elif model.compiled_metrics:
-      self.process_compiled_metrics(model.compiled_metrics, labels, outputs)
-      logs.update({m.name: m.result() for m in model.metrics})
-    return logs
-
-  def inference_step(self, inputs: tf.Tensor, model: tf.keras.Model):
-    """Performs the forward step."""
-    return model(inputs, training=False)
diff --git a/official/vision/beta/tasks/maskrcnn.py b/official/vision/beta/tasks/maskrcnn.py
deleted file mode 100644
index abb92a3e017493197f8b9f45342e9647cea4527e..0000000000000000000000000000000000000000
--- a/official/vision/beta/tasks/maskrcnn.py
+++ /dev/null
@@ -1,455 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""MaskRCNN task definition."""
-import os
-from typing import Any, Optional, List, Tuple, Mapping
-
-from absl import logging
-import tensorflow as tf
-from official.common import dataset_fn
-from official.core import base_task
-from official.core import task_factory
-from official.vision.beta.configs import maskrcnn as exp_cfg
-from official.vision.beta.dataloaders import input_reader_factory
-from official.vision.beta.dataloaders import maskrcnn_input
-from official.vision.beta.dataloaders import tf_example_decoder
-from official.vision.beta.dataloaders import tf_example_label_map_decoder
-from official.vision.beta.evaluation import coco_evaluator
-from official.vision.beta.evaluation import coco_utils
-from official.vision.beta.losses import maskrcnn_losses
-from official.vision.beta.modeling import factory
-
-
-def zero_out_disallowed_class_ids(batch_class_ids: tf.Tensor,
-                                  allowed_class_ids: List[int]):
-  """Zero out IDs of classes not in allowed_class_ids.
-
-  Args:
-    batch_class_ids: A [batch_size, num_instances] int tensor of input
-      class IDs.
-    allowed_class_ids: A python list of class IDs which we want to allow.
-
-  Returns:
-    filtered_class_ids: A [batch_size, num_instances] int tensor with any
-      class ID not in allowed_class_ids set to 0.
-  """
-
-  allowed_class_ids = tf.constant(allowed_class_ids,
-                                  dtype=batch_class_ids.dtype)
-
-  match_ids = (batch_class_ids[:, :, tf.newaxis] ==
-               allowed_class_ids[tf.newaxis, tf.newaxis, :])
-
-  match_ids = tf.reduce_any(match_ids, axis=2)
-  return tf.where(match_ids, batch_class_ids, tf.zeros_like(batch_class_ids))
-
-
-@task_factory.register_task_cls(exp_cfg.MaskRCNNTask)
-class MaskRCNNTask(base_task.Task):
-  """A single-replica view of training procedure.
-
-  Mask R-CNN task provides artifacts for training/evalution procedures,
-  including loading/iterating over Datasets, initializing the model, calculating
-  the loss, post-processing, and customized metrics with reduction.
-  """
-
-  def build_model(self):
-    """Build Mask R-CNN model."""
-
-    input_specs = tf.keras.layers.InputSpec(
-        shape=[None] + self.task_config.model.input_size)
-
-    l2_weight_decay = self.task_config.losses.l2_weight_decay
-    # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss.
-    # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2)
-    # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss)
-    l2_regularizer = (tf.keras.regularizers.l2(
-        l2_weight_decay / 2.0) if l2_weight_decay else None)
-
-    model = factory.build_maskrcnn(
-        input_specs=input_specs,
-        model_config=self.task_config.model,
-        l2_regularizer=l2_regularizer)
-    return model
-
-  def initialize(self, model: tf.keras.Model):
-    """Loading pretrained checkpoint."""
-    if not self.task_config.init_checkpoint:
-      return
-
-    ckpt_dir_or_file = self.task_config.init_checkpoint
-    if tf.io.gfile.isdir(ckpt_dir_or_file):
-      ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file)
-
-    # Restoring checkpoint.
-    if self.task_config.init_checkpoint_modules == 'all':
-      ckpt = tf.train.Checkpoint(**model.checkpoint_items)
-      status = ckpt.read(ckpt_dir_or_file)
-      status.expect_partial().assert_existing_objects_matched()
-    else:
-      ckpt_items = {}
-      if 'backbone' in self.task_config.init_checkpoint_modules:
-        ckpt_items.update(backbone=model.backbone)
-      if 'decoder' in self.task_config.init_checkpoint_modules:
-        ckpt_items.update(decoder=model.decoder)
-
-      ckpt = tf.train.Checkpoint(**ckpt_items)
-      status = ckpt.read(ckpt_dir_or_file)
-      status.expect_partial().assert_existing_objects_matched()
-
-    logging.info('Finished loading pretrained checkpoint from %s',
-                 ckpt_dir_or_file)
-
-  def build_inputs(self,
-                   params: exp_cfg.DataConfig,
-                   input_context: Optional[tf.distribute.InputContext] = None):
-    """Build input dataset."""
-    decoder_cfg = params.decoder.get()
-    if params.decoder.type == 'simple_decoder':
-      decoder = tf_example_decoder.TfExampleDecoder(
-          include_mask=self._task_config.model.include_mask,
-          regenerate_source_id=decoder_cfg.regenerate_source_id,
-          mask_binarize_threshold=decoder_cfg.mask_binarize_threshold)
-    elif params.decoder.type == 'label_map_decoder':
-      decoder = tf_example_label_map_decoder.TfExampleDecoderLabelMap(
-          label_map=decoder_cfg.label_map,
-          include_mask=self._task_config.model.include_mask,
-          regenerate_source_id=decoder_cfg.regenerate_source_id,
-          mask_binarize_threshold=decoder_cfg.mask_binarize_threshold)
-    else:
-      raise ValueError('Unknown decoder type: {}!'.format(params.decoder.type))
-
-    parser = maskrcnn_input.Parser(
-        output_size=self.task_config.model.input_size[:2],
-        min_level=self.task_config.model.min_level,
-        max_level=self.task_config.model.max_level,
-        num_scales=self.task_config.model.anchor.num_scales,
-        aspect_ratios=self.task_config.model.anchor.aspect_ratios,
-        anchor_size=self.task_config.model.anchor.anchor_size,
-        dtype=params.dtype,
-        rpn_match_threshold=params.parser.rpn_match_threshold,
-        rpn_unmatched_threshold=params.parser.rpn_unmatched_threshold,
-        rpn_batch_size_per_im=params.parser.rpn_batch_size_per_im,
-        rpn_fg_fraction=params.parser.rpn_fg_fraction,
-        aug_rand_hflip=params.parser.aug_rand_hflip,
-        aug_scale_min=params.parser.aug_scale_min,
-        aug_scale_max=params.parser.aug_scale_max,
-        skip_crowd_during_training=params.parser.skip_crowd_during_training,
-        max_num_instances=params.parser.max_num_instances,
-        include_mask=self._task_config.model.include_mask,
-        mask_crop_size=params.parser.mask_crop_size)
-
-    reader = input_reader_factory.input_reader_generator(
-        params,
-        dataset_fn=dataset_fn.pick_dataset_fn(params.file_type),
-        decoder_fn=decoder.decode,
-        parser_fn=parser.parse_fn(params.is_training))
-    dataset = reader.read(input_context=input_context)
-
-    return dataset
-
-  def build_losses(self,
-                   outputs: Mapping[str, Any],
-                   labels: Mapping[str, Any],
-                   aux_losses: Optional[Any] = None):
-    """Build Mask R-CNN losses."""
-    params = self.task_config
-    cascade_ious = params.model.roi_sampler.cascade_iou_thresholds
-
-    rpn_score_loss_fn = maskrcnn_losses.RpnScoreLoss(
-        tf.shape(outputs['box_outputs'])[1])
-    rpn_box_loss_fn = maskrcnn_losses.RpnBoxLoss(
-        params.losses.rpn_huber_loss_delta)
-    rpn_score_loss = tf.reduce_mean(
-        rpn_score_loss_fn(
-            outputs['rpn_scores'], labels['rpn_score_targets']))
-    rpn_box_loss = tf.reduce_mean(
-        rpn_box_loss_fn(
-            outputs['rpn_boxes'], labels['rpn_box_targets']))
-
-    frcnn_cls_loss_fn = maskrcnn_losses.FastrcnnClassLoss()
-    frcnn_box_loss_fn = maskrcnn_losses.FastrcnnBoxLoss(
-        params.losses.frcnn_huber_loss_delta,
-        params.model.detection_head.class_agnostic_bbox_pred)
-
-    # Final cls/box losses are computed as an average of all detection heads.
-    frcnn_cls_loss = 0.0
-    frcnn_box_loss = 0.0
-    num_det_heads = 1 if cascade_ious is None else 1 + len(cascade_ious)
-    for cas_num in range(num_det_heads):
-      frcnn_cls_loss_i = tf.reduce_mean(
-          frcnn_cls_loss_fn(
-              outputs['class_outputs_{}'
-                      .format(cas_num) if cas_num else 'class_outputs'],
-              outputs['class_targets_{}'
-                      .format(cas_num) if cas_num else 'class_targets']))
-      frcnn_box_loss_i = tf.reduce_mean(
-          frcnn_box_loss_fn(
-              outputs['box_outputs_{}'.format(cas_num
-                                              ) if cas_num else 'box_outputs'],
-              outputs['class_targets_{}'
-                      .format(cas_num) if cas_num else 'class_targets'],
-              outputs['box_targets_{}'.format(cas_num
-                                              ) if cas_num else 'box_targets']))
-      frcnn_cls_loss += frcnn_cls_loss_i
-      frcnn_box_loss += frcnn_box_loss_i
-    frcnn_cls_loss /= num_det_heads
-    frcnn_box_loss /= num_det_heads
-
-    if params.model.include_mask:
-      mask_loss_fn = maskrcnn_losses.MaskrcnnLoss()
-      mask_class_targets = outputs['mask_class_targets']
-      if self._task_config.allowed_mask_class_ids is not None:
-        # Classes with ID=0 are ignored by mask_loss_fn in loss computation.
-        mask_class_targets = zero_out_disallowed_class_ids(
-            mask_class_targets, self._task_config.allowed_mask_class_ids)
-
-      mask_loss = tf.reduce_mean(
-          mask_loss_fn(
-              outputs['mask_outputs'],
-              outputs['mask_targets'],
-              mask_class_targets))
-    else:
-      mask_loss = 0.0
-
-    model_loss = (
-        params.losses.rpn_score_weight * rpn_score_loss +
-        params.losses.rpn_box_weight * rpn_box_loss +
-        params.losses.frcnn_class_weight * frcnn_cls_loss +
-        params.losses.frcnn_box_weight * frcnn_box_loss +
-        params.losses.mask_weight * mask_loss)
-
-    total_loss = model_loss
-    if aux_losses:
-      reg_loss = tf.reduce_sum(aux_losses)
-      total_loss = model_loss + reg_loss
-
-    total_loss = params.losses.loss_weight * total_loss
-    losses = {
-        'total_loss': total_loss,
-        'rpn_score_loss': rpn_score_loss,
-        'rpn_box_loss': rpn_box_loss,
-        'frcnn_cls_loss': frcnn_cls_loss,
-        'frcnn_box_loss': frcnn_box_loss,
-        'mask_loss': mask_loss,
-        'model_loss': model_loss,
-    }
-    return losses
-
-  def _build_coco_metrics(self):
-    """Build COCO metrics evaluator."""
-    if (not self._task_config.model.include_mask
-       ) or self._task_config.annotation_file:
-      self.coco_metric = coco_evaluator.COCOEvaluator(
-          annotation_file=self._task_config.annotation_file,
-          include_mask=self._task_config.model.include_mask,
-          per_category_metrics=self._task_config.per_category_metrics)
-    else:
-      # Builds COCO-style annotation file if include_mask is True, and
-      # annotation_file isn't provided.
-      annotation_path = os.path.join(self._logging_dir, 'annotation.json')
-      if tf.io.gfile.exists(annotation_path):
-        logging.info(
-            'annotation.json file exists, skipping creating the annotation'
-            ' file.')
-      else:
-        if self._task_config.validation_data.num_examples <= 0:
-          logging.info('validation_data.num_examples needs to be > 0')
-        if not self._task_config.validation_data.input_path:
-          logging.info('Can not create annotation file for tfds.')
-        logging.info(
-            'Creating coco-style annotation file: %s', annotation_path)
-        coco_utils.scan_and_generator_annotation_file(
-            self._task_config.validation_data.input_path,
-            self._task_config.validation_data.file_type,
-            self._task_config.validation_data.num_examples,
-            self.task_config.model.include_mask, annotation_path,
-            regenerate_source_id=self._task_config.validation_data.decoder
-            .simple_decoder.regenerate_source_id)
-      self.coco_metric = coco_evaluator.COCOEvaluator(
-          annotation_file=annotation_path,
-          include_mask=self._task_config.model.include_mask,
-          per_category_metrics=self._task_config.per_category_metrics)
-
-  def build_metrics(self, training: bool = True):
-    """Build detection metrics."""
-    metrics = []
-    if training:
-      metric_names = [
-          'total_loss',
-          'rpn_score_loss',
-          'rpn_box_loss',
-          'frcnn_cls_loss',
-          'frcnn_box_loss',
-          'mask_loss',
-          'model_loss'
-      ]
-      for name in metric_names:
-        metrics.append(tf.keras.metrics.Mean(name, dtype=tf.float32))
-
-    else:
-      if self._task_config.use_coco_metrics:
-        self._build_coco_metrics()
-      if self._task_config.use_wod_metrics:
-        # To use Waymo open dataset metrics, please install one of the pip
-        # package `waymo-open-dataset-tf-*` from
-        # https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md#use-pre-compiled-pippip3-packages-for-linux
-        # Note that the package is built with specific tensorflow version and
-        # will produce error if it does not match the tf version that is
-        # currently used.
-        try:
-          from official.vision.beta.evaluation import wod_detection_evaluator  # pylint: disable=g-import-not-at-top
-        except ModuleNotFoundError:
-          logging.error('waymo-open-dataset should be installed to enable Waymo'
-                        ' evaluator.')
-          raise
-        self.wod_metric = wod_detection_evaluator.WOD2dDetectionEvaluator()
-
-    return metrics
-
-  def train_step(self,
-                 inputs: Tuple[Any, Any],
-                 model: tf.keras.Model,
-                 optimizer: tf.keras.optimizers.Optimizer,
-                 metrics: Optional[List[Any]] = None):
-    """Does forward and backward.
-
-    Args:
-      inputs: a dictionary of input tensors.
-      model: the model, forward pass definition.
-      optimizer: the optimizer for this training step.
-      metrics: a nested structure of metrics objects.
-
-    Returns:
-      A dictionary of logs.
-    """
-    images, labels = inputs
-    num_replicas = tf.distribute.get_strategy().num_replicas_in_sync
-    with tf.GradientTape() as tape:
-      outputs = model(
-          images,
-          image_shape=labels['image_info'][:, 1, :],
-          anchor_boxes=labels['anchor_boxes'],
-          gt_boxes=labels['gt_boxes'],
-          gt_classes=labels['gt_classes'],
-          gt_masks=(labels['gt_masks'] if self.task_config.model.include_mask
-                    else None),
-          training=True)
-      outputs = tf.nest.map_structure(
-          lambda x: tf.cast(x, tf.float32), outputs)
-
-      # Computes per-replica loss.
-      losses = self.build_losses(
-          outputs=outputs, labels=labels, aux_losses=model.losses)
-      scaled_loss = losses['total_loss'] / num_replicas
-
-      # For mixed_precision policy, when LossScaleOptimizer is used, loss is
-      # scaled for numerical stability.
-      if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer):
-        scaled_loss = optimizer.get_scaled_loss(scaled_loss)
-
-    tvars = model.trainable_variables
-    grads = tape.gradient(scaled_loss, tvars)
-    # Scales back gradient when LossScaleOptimizer is used.
- if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - grads = optimizer.get_unscaled_gradients(grads) - optimizer.apply_gradients(list(zip(grads, tvars))) - - logs = {self.loss: losses['total_loss']} - - if metrics: - for m in metrics: - m.update_state(losses[m.name]) - - return logs - - def validation_step(self, - inputs: Tuple[Any, Any], - model: tf.keras.Model, - metrics: Optional[List[Any]] = None): - """Validatation step. - - Args: - inputs: a dictionary of input tensors. - model: the keras.Model. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. - """ - images, labels = inputs - - outputs = model( - images, - anchor_boxes=labels['anchor_boxes'], - image_shape=labels['image_info'][:, 1, :], - training=False) - - logs = {self.loss: 0} - if self._task_config.use_coco_metrics: - coco_model_outputs = { - 'detection_boxes': outputs['detection_boxes'], - 'detection_scores': outputs['detection_scores'], - 'detection_classes': outputs['detection_classes'], - 'num_detections': outputs['num_detections'], - 'source_id': labels['groundtruths']['source_id'], - 'image_info': labels['image_info'] - } - if self.task_config.model.include_mask: - coco_model_outputs.update({ - 'detection_masks': outputs['detection_masks'], - }) - logs.update( - {self.coco_metric.name: (labels['groundtruths'], coco_model_outputs)}) - - if self.task_config.use_wod_metrics: - wod_model_outputs = { - 'detection_boxes': outputs['detection_boxes'], - 'detection_scores': outputs['detection_scores'], - 'detection_classes': outputs['detection_classes'], - 'num_detections': outputs['num_detections'], - 'source_id': labels['groundtruths']['source_id'], - 'image_info': labels['image_info'] - } - logs.update( - {self.wod_metric.name: (labels['groundtruths'], wod_model_outputs)}) - return logs - - def aggregate_logs(self, state=None, step_outputs=None): - if self._task_config.use_coco_metrics: - if state is None: - self.coco_metric.reset_states() - 
self.coco_metric.update_state( - step_outputs[self.coco_metric.name][0], - step_outputs[self.coco_metric.name][1]) - if self._task_config.use_wod_metrics: - if state is None: - self.wod_metric.reset_states() - self.wod_metric.update_state( - step_outputs[self.wod_metric.name][0], - step_outputs[self.wod_metric.name][1]) - if state is None: - # Create an arbitrary state to indicate it's not the first step in the - # following calls to this function. - state = True - return state - - def reduce_aggregated_logs(self, aggregated_logs, global_step=None): - logs = {} - if self._task_config.use_coco_metrics: - logs.update(self.coco_metric.result()) - if self._task_config.use_wod_metrics: - logs.update(self.wod_metric.result()) - return logs diff --git a/official/vision/beta/tasks/retinanet.py b/official/vision/beta/tasks/retinanet.py deleted file mode 100644 index b463e8d94cbce319d9e57ae4e1c09e4dbe0d5fd2..0000000000000000000000000000000000000000 --- a/official/vision/beta/tasks/retinanet.py +++ /dev/null @@ -1,358 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""RetinaNet task definition.""" -from typing import Any, List, Mapping, Optional, Tuple - -from absl import logging -import tensorflow as tf - -from official.common import dataset_fn -from official.core import base_task -from official.core import task_factory -from official.vision.beta.configs import retinanet as exp_cfg -from official.vision.beta.dataloaders import input_reader_factory -from official.vision.beta.dataloaders import retinanet_input -from official.vision.beta.dataloaders import tf_example_decoder -from official.vision.beta.dataloaders import tfds_factory -from official.vision.beta.dataloaders import tf_example_label_map_decoder -from official.vision.beta.evaluation import coco_evaluator -from official.vision.beta.losses import focal_loss -from official.vision.beta.losses import loss_utils -from official.vision.beta.modeling import factory - - -@task_factory.register_task_cls(exp_cfg.RetinaNetTask) -class RetinaNetTask(base_task.Task): - """A single-replica view of training procedure. - - RetinaNet task provides artifacts for training/evalution procedures, including - loading/iterating over Datasets, initializing the model, calculating the loss, - post-processing, and customized metrics with reduction. - """ - - def build_model(self): - """Build RetinaNet model.""" - - input_specs = tf.keras.layers.InputSpec( - shape=[None] + self.task_config.model.input_size) - - l2_weight_decay = self.task_config.losses.l2_weight_decay - # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. 
- # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) - # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) - l2_regularizer = (tf.keras.regularizers.l2( - l2_weight_decay / 2.0) if l2_weight_decay else None) - - model = factory.build_retinanet( - input_specs=input_specs, - model_config=self.task_config.model, - l2_regularizer=l2_regularizer) - return model - - def initialize(self, model: tf.keras.Model): - """Loading pretrained checkpoint.""" - if not self.task_config.init_checkpoint: - return - - ckpt_dir_or_file = self.task_config.init_checkpoint - if tf.io.gfile.isdir(ckpt_dir_or_file): - ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) - - # Restoring checkpoint. - if self.task_config.init_checkpoint_modules == 'all': - ckpt = tf.train.Checkpoint(**model.checkpoint_items) - status = ckpt.read(ckpt_dir_or_file) - status.expect_partial().assert_existing_objects_matched() - else: - ckpt_items = {} - if 'backbone' in self.task_config.init_checkpoint_modules: - ckpt_items.update(backbone=model.backbone) - if 'decoder' in self.task_config.init_checkpoint_modules: - ckpt_items.update(decoder=model.decoder) - - ckpt = tf.train.Checkpoint(**ckpt_items) - status = ckpt.read(ckpt_dir_or_file) - status.expect_partial().assert_existing_objects_matched() - - logging.info('Finished loading pretrained checkpoint from %s', - ckpt_dir_or_file) - - def build_inputs(self, - params: exp_cfg.DataConfig, - input_context: Optional[tf.distribute.InputContext] = None): - """Build input dataset.""" - - if params.tfds_name: - decoder = tfds_factory.get_detection_decoder(params.tfds_name) - else: - decoder_cfg = params.decoder.get() - if params.decoder.type == 'simple_decoder': - decoder = tf_example_decoder.TfExampleDecoder( - regenerate_source_id=decoder_cfg.regenerate_source_id) - elif params.decoder.type == 'label_map_decoder': - decoder = tf_example_label_map_decoder.TfExampleDecoderLabelMap( - label_map=decoder_cfg.label_map, - 
regenerate_source_id=decoder_cfg.regenerate_source_id) - else: - raise ValueError('Unknown decoder type: {}!'.format( - params.decoder.type)) - - parser = retinanet_input.Parser( - output_size=self.task_config.model.input_size[:2], - min_level=self.task_config.model.min_level, - max_level=self.task_config.model.max_level, - num_scales=self.task_config.model.anchor.num_scales, - aspect_ratios=self.task_config.model.anchor.aspect_ratios, - anchor_size=self.task_config.model.anchor.anchor_size, - dtype=params.dtype, - match_threshold=params.parser.match_threshold, - unmatched_threshold=params.parser.unmatched_threshold, - aug_type=params.parser.aug_type, - aug_rand_hflip=params.parser.aug_rand_hflip, - aug_scale_min=params.parser.aug_scale_min, - aug_scale_max=params.parser.aug_scale_max, - skip_crowd_during_training=params.parser.skip_crowd_during_training, - max_num_instances=params.parser.max_num_instances) - - reader = input_reader_factory.input_reader_generator( - params, - dataset_fn=dataset_fn.pick_dataset_fn(params.file_type), - decoder_fn=decoder.decode, - parser_fn=parser.parse_fn(params.is_training)) - dataset = reader.read(input_context=input_context) - - return dataset - - def build_attribute_loss(self, - attribute_heads: List[exp_cfg.AttributeHead], - outputs: Mapping[str, Any], - labels: Mapping[str, Any], - box_sample_weight: tf.Tensor) -> float: - """Computes attribute loss. - - Args: - attribute_heads: a list of attribute head configs. - outputs: RetinaNet model outputs. - labels: RetinaNet labels. - box_sample_weight: normalized bounding box sample weights. - - Returns: - Attribute loss of all attribute heads. 
- """ - attribute_loss = 0.0 - for head in attribute_heads: - if head.name not in labels['attribute_targets']: - raise ValueError(f'Attribute {head.name} not found in label targets.') - if head.name not in outputs['attribute_outputs']: - raise ValueError(f'Attribute {head.name} not found in model outputs.') - - y_true_att = loss_utils.multi_level_flatten( - labels['attribute_targets'][head.name], last_dim=head.size) - y_pred_att = loss_utils.multi_level_flatten( - outputs['attribute_outputs'][head.name], last_dim=head.size) - if head.type == 'regression': - att_loss_fn = tf.keras.losses.Huber( - 1.0, reduction=tf.keras.losses.Reduction.SUM) - att_loss = att_loss_fn( - y_true=y_true_att, - y_pred=y_pred_att, - sample_weight=box_sample_weight) - else: - raise ValueError(f'Attribute type {head.type} not supported.') - attribute_loss += att_loss - - return attribute_loss - - def build_losses(self, - outputs: Mapping[str, Any], - labels: Mapping[str, Any], - aux_losses: Optional[Any] = None): - """Build RetinaNet losses.""" - params = self.task_config - attribute_heads = self.task_config.model.head.attribute_heads - - cls_loss_fn = focal_loss.FocalLoss( - alpha=params.losses.focal_loss_alpha, - gamma=params.losses.focal_loss_gamma, - reduction=tf.keras.losses.Reduction.SUM) - box_loss_fn = tf.keras.losses.Huber( - params.losses.huber_loss_delta, reduction=tf.keras.losses.Reduction.SUM) - - # Sums all positives in a batch for normalization and avoids zero - # num_positives_sum, which would lead to inf loss during training - cls_sample_weight = labels['cls_weights'] - box_sample_weight = labels['box_weights'] - num_positives = tf.reduce_sum(box_sample_weight) + 1.0 - cls_sample_weight = cls_sample_weight / num_positives - box_sample_weight = box_sample_weight / num_positives - y_true_cls = loss_utils.multi_level_flatten( - labels['cls_targets'], last_dim=None) - y_true_cls = tf.one_hot(y_true_cls, params.model.num_classes) - y_pred_cls = loss_utils.multi_level_flatten( - 
outputs['cls_outputs'], last_dim=params.model.num_classes) - y_true_box = loss_utils.multi_level_flatten( - labels['box_targets'], last_dim=4) - y_pred_box = loss_utils.multi_level_flatten( - outputs['box_outputs'], last_dim=4) - - cls_loss = cls_loss_fn( - y_true=y_true_cls, y_pred=y_pred_cls, sample_weight=cls_sample_weight) - box_loss = box_loss_fn( - y_true=y_true_box, y_pred=y_pred_box, sample_weight=box_sample_weight) - - model_loss = cls_loss + params.losses.box_loss_weight * box_loss - - if attribute_heads: - model_loss += self.build_attribute_loss(attribute_heads, outputs, labels, - box_sample_weight) - - total_loss = model_loss - if aux_losses: - reg_loss = tf.reduce_sum(aux_losses) - total_loss = model_loss + reg_loss - - total_loss = params.losses.loss_weight * total_loss - - return total_loss, cls_loss, box_loss, model_loss - - def build_metrics(self, training: bool = True): - """Build detection metrics.""" - metrics = [] - metric_names = ['total_loss', 'cls_loss', 'box_loss', 'model_loss'] - for name in metric_names: - metrics.append(tf.keras.metrics.Mean(name, dtype=tf.float32)) - - if not training: - if self.task_config.validation_data.tfds_name and self.task_config.annotation_file: - raise ValueError( - "Can't evaluate using annotation file when TFDS is used.") - self.coco_metric = coco_evaluator.COCOEvaluator( - annotation_file=self.task_config.annotation_file, - include_mask=False, - per_category_metrics=self.task_config.per_category_metrics) - - return metrics - - def train_step(self, - inputs: Tuple[Any, Any], - model: tf.keras.Model, - optimizer: tf.keras.optimizers.Optimizer, - metrics: Optional[List[Any]] = None): - """Does forward and backward. - - Args: - inputs: a dictionary of input tensors. - model: the model, forward pass definition. - optimizer: the optimizer for this training step. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. 
- """ - features, labels = inputs - num_replicas = tf.distribute.get_strategy().num_replicas_in_sync - with tf.GradientTape() as tape: - outputs = model(features, training=True) - outputs = tf.nest.map_structure( - lambda x: tf.cast(x, tf.float32), outputs) - - # Computes per-replica loss. - loss, cls_loss, box_loss, model_loss = self.build_losses( - outputs=outputs, labels=labels, aux_losses=model.losses) - scaled_loss = loss / num_replicas - - # For mixed_precision policy, when LossScaleOptimizer is used, loss is - # scaled for numerical stability. - if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - scaled_loss = optimizer.get_scaled_loss(scaled_loss) - - tvars = model.trainable_variables - grads = tape.gradient(scaled_loss, tvars) - # Scales back gradient when LossScaleOptimizer is used. - if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - grads = optimizer.get_unscaled_gradients(grads) - optimizer.apply_gradients(list(zip(grads, tvars))) - - logs = {self.loss: loss} - - all_losses = { - 'total_loss': loss, - 'cls_loss': cls_loss, - 'box_loss': box_loss, - 'model_loss': model_loss, - } - if metrics: - for m in metrics: - m.update_state(all_losses[m.name]) - logs.update({m.name: m.result()}) - - return logs - - def validation_step(self, - inputs: Tuple[Any, Any], - model: tf.keras.Model, - metrics: Optional[List[Any]] = None): - """Validatation step. - - Args: - inputs: a dictionary of input tensors. - model: the keras.Model. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. 
- """ - features, labels = inputs - - outputs = model(features, anchor_boxes=labels['anchor_boxes'], - image_shape=labels['image_info'][:, 1, :], - training=False) - loss, cls_loss, box_loss, model_loss = self.build_losses( - outputs=outputs, labels=labels, aux_losses=model.losses) - logs = {self.loss: loss} - - all_losses = { - 'total_loss': loss, - 'cls_loss': cls_loss, - 'box_loss': box_loss, - 'model_loss': model_loss, - } - - coco_model_outputs = { - 'detection_boxes': outputs['detection_boxes'], - 'detection_scores': outputs['detection_scores'], - 'detection_classes': outputs['detection_classes'], - 'num_detections': outputs['num_detections'], - 'source_id': labels['groundtruths']['source_id'], - 'image_info': labels['image_info'] - } - logs.update({self.coco_metric.name: (labels['groundtruths'], - coco_model_outputs)}) - if metrics: - for m in metrics: - m.update_state(all_losses[m.name]) - logs.update({m.name: m.result()}) - return logs - - def aggregate_logs(self, state=None, step_outputs=None): - if state is None: - self.coco_metric.reset_states() - state = self.coco_metric - self.coco_metric.update_state(step_outputs[self.coco_metric.name][0], - step_outputs[self.coco_metric.name][1]) - return state - - def reduce_aggregated_logs(self, aggregated_logs, global_step=None): - return self.coco_metric.result() diff --git a/official/vision/beta/tasks/semantic_segmentation.py b/official/vision/beta/tasks/semantic_segmentation.py deleted file mode 100644 index ab1028e463cb97d2b715e2cf848a24c482b6523d..0000000000000000000000000000000000000000 --- a/official/vision/beta/tasks/semantic_segmentation.py +++ /dev/null @@ -1,337 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Image segmentation task definition.""" -from typing import Any, Optional, List, Tuple, Mapping, Union - -from absl import logging -import tensorflow as tf -from official.common import dataset_fn -from official.core import base_task -from official.core import task_factory -from official.vision.beta.configs import semantic_segmentation as exp_cfg -from official.vision.beta.dataloaders import input_reader_factory -from official.vision.beta.dataloaders import segmentation_input -from official.vision.beta.dataloaders import tfds_factory -from official.vision.beta.evaluation import segmentation_metrics -from official.vision.beta.losses import segmentation_losses -from official.vision.beta.modeling import factory - - -@task_factory.register_task_cls(exp_cfg.SemanticSegmentationTask) -class SemanticSegmentationTask(base_task.Task): - """A task for semantic segmentation.""" - - def build_model(self): - """Builds segmentation model.""" - input_specs = tf.keras.layers.InputSpec( - shape=[None] + self.task_config.model.input_size) - - l2_weight_decay = self.task_config.losses.l2_weight_decay - # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. 
- # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) - # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) - l2_regularizer = (tf.keras.regularizers.l2( - l2_weight_decay / 2.0) if l2_weight_decay else None) - - model = factory.build_segmentation_model( - input_specs=input_specs, - model_config=self.task_config.model, - l2_regularizer=l2_regularizer) - return model - - def initialize(self, model: tf.keras.Model): - """Loads pretrained checkpoint.""" - if not self.task_config.init_checkpoint: - return - - ckpt_dir_or_file = self.task_config.init_checkpoint - if tf.io.gfile.isdir(ckpt_dir_or_file): - ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) - - # Restoring checkpoint. - if 'all' in self.task_config.init_checkpoint_modules: - ckpt = tf.train.Checkpoint(**model.checkpoint_items) - status = ckpt.read(ckpt_dir_or_file) - status.expect_partial().assert_existing_objects_matched() - else: - ckpt_items = {} - if 'backbone' in self.task_config.init_checkpoint_modules: - ckpt_items.update(backbone=model.backbone) - if 'decoder' in self.task_config.init_checkpoint_modules: - ckpt_items.update(decoder=model.decoder) - - ckpt = tf.train.Checkpoint(**ckpt_items) - status = ckpt.read(ckpt_dir_or_file) - status.expect_partial().assert_existing_objects_matched() - - logging.info('Finished loading pretrained checkpoint from %s', - ckpt_dir_or_file) - - def build_inputs(self, - params: exp_cfg.DataConfig, - input_context: Optional[tf.distribute.InputContext] = None): - """Builds classification input.""" - - ignore_label = self.task_config.losses.ignore_label - - if params.tfds_name: - decoder = tfds_factory.get_segmentation_decoder(params.tfds_name) - else: - decoder = segmentation_input.Decoder() - - parser = segmentation_input.Parser( - output_size=params.output_size, - crop_size=params.crop_size, - ignore_label=ignore_label, - resize_eval_groundtruth=params.resize_eval_groundtruth, - 
groundtruth_padded_size=params.groundtruth_padded_size, - aug_scale_min=params.aug_scale_min, - aug_scale_max=params.aug_scale_max, - aug_rand_hflip=params.aug_rand_hflip, - preserve_aspect_ratio=params.preserve_aspect_ratio, - dtype=params.dtype) - - reader = input_reader_factory.input_reader_generator( - params, - dataset_fn=dataset_fn.pick_dataset_fn(params.file_type), - decoder_fn=decoder.decode, - parser_fn=parser.parse_fn(params.is_training)) - - dataset = reader.read(input_context=input_context) - - return dataset - - def build_losses(self, - labels: Mapping[str, tf.Tensor], - model_outputs: Union[Mapping[str, tf.Tensor], tf.Tensor], - aux_losses: Optional[Any] = None): - """Segmentation loss. - - Args: - labels: labels. - model_outputs: Output logits of the classifier. - aux_losses: auxiliarly loss tensors, i.e. `losses` in keras.Model. - - Returns: - The total loss tensor. - """ - loss_params = self._task_config.losses - segmentation_loss_fn = segmentation_losses.SegmentationLoss( - loss_params.label_smoothing, - loss_params.class_weights, - loss_params.ignore_label, - use_groundtruth_dimension=loss_params.use_groundtruth_dimension, - top_k_percent_pixels=loss_params.top_k_percent_pixels) - - total_loss = segmentation_loss_fn(model_outputs['logits'], labels['masks']) - - if 'mask_scores' in model_outputs: - mask_scoring_loss_fn = segmentation_losses.MaskScoringLoss( - loss_params.ignore_label) - total_loss += mask_scoring_loss_fn( - model_outputs['mask_scores'], - model_outputs['logits'], - labels['masks']) - - if aux_losses: - total_loss += tf.add_n(aux_losses) - - total_loss = loss_params.loss_weight * total_loss - - return total_loss - - def process_metrics(self, metrics, labels, model_outputs, **kwargs): - """Process and update metrics. - - Called when using custom training loop API. - - Args: - metrics: a nested structure of metrics objects. The return of function - self.build_metrics. - labels: a tensor or a nested structure of tensors. 
- model_outputs: a tensor or a nested structure of tensors. For example, - output of the keras model built by self.build_model. - **kwargs: other args. - """ - for metric in metrics: - if 'mask_scores_mse' is metric.name: - actual_mask_scores = segmentation_losses.get_actual_mask_scores( - model_outputs['logits'], labels['masks'], - self.task_config.losses.ignore_label) - metric.update_state(actual_mask_scores, model_outputs['mask_scores']) - else: - metric.update_state(labels, model_outputs['logits']) - - def build_metrics(self, training: bool = True): - """Gets streaming metrics for training/validation.""" - metrics = [] - if training and self.task_config.evaluation.report_train_mean_iou: - metrics.append(segmentation_metrics.MeanIoU( - name='mean_iou', - num_classes=self.task_config.model.num_classes, - rescale_predictions=False, - dtype=tf.float32)) - if self.task_config.model.get('mask_scoring_head'): - metrics.append( - tf.keras.metrics.MeanSquaredError(name='mask_scores_mse')) - else: - self.iou_metric = segmentation_metrics.PerClassIoU( - name='per_class_iou', - num_classes=self.task_config.model.num_classes, - rescale_predictions=not self.task_config.validation_data - .resize_eval_groundtruth, - dtype=tf.float32) - if self.task_config.validation_data.resize_eval_groundtruth and self.task_config.model.get('mask_scoring_head'): # pylint: disable=line-too-long - # Masks scores metric can only be computed if labels are scaled to match - # preticted mask scores. - metrics.append( - tf.keras.metrics.MeanSquaredError(name='mask_scores_mse')) - - # Update state on CPU if TPUStrategy due to dynamic resizing. - self._process_iou_metric_on_cpu = isinstance( - tf.distribute.get_strategy(), tf.distribute.TPUStrategy) - - return metrics - - def train_step(self, - inputs: Tuple[Any, Any], - model: tf.keras.Model, - optimizer: tf.keras.optimizers.Optimizer, - metrics: Optional[List[Any]] = None): - """Does forward and backward. 
- - Args: - inputs: a dictionary of input tensors. - model: the model, forward pass definition. - optimizer: the optimizer for this training step. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. - """ - features, labels = inputs - - input_partition_dims = self.task_config.train_input_partition_dims - if input_partition_dims: - strategy = tf.distribute.get_strategy() - features = strategy.experimental_split_to_logical_devices( - features, input_partition_dims) - - num_replicas = tf.distribute.get_strategy().num_replicas_in_sync - with tf.GradientTape() as tape: - outputs = model(features, training=True) - if isinstance(outputs, tf.Tensor): - outputs = {'logits': outputs} - # Casting output layer as float32 is necessary when mixed_precision is - # mixed_float16 or mixed_bfloat16 to ensure output is casted as float32. - outputs = tf.nest.map_structure( - lambda x: tf.cast(x, tf.float32), outputs) - - # Computes per-replica loss. - loss = self.build_losses( - model_outputs=outputs, labels=labels, aux_losses=model.losses) - # Scales loss as the default gradients allreduce performs sum inside the - # optimizer. - scaled_loss = loss / num_replicas - - # For mixed_precision policy, when LossScaleOptimizer is used, loss is - # scaled for numerical stability. - if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - scaled_loss = optimizer.get_scaled_loss(scaled_loss) - - tvars = model.trainable_variables - grads = tape.gradient(scaled_loss, tvars) - # Scales back gradient before apply_gradients when LossScaleOptimizer is - # used. 
- if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - grads = optimizer.get_unscaled_gradients(grads) - optimizer.apply_gradients(list(zip(grads, tvars))) - - logs = {self.loss: loss} - if metrics: - self.process_metrics(metrics, labels, outputs) - logs.update({m.name: m.result() for m in metrics}) - - return logs - - def validation_step(self, - inputs: Tuple[Any, Any], - model: tf.keras.Model, - metrics: Optional[List[Any]] = None): - """Validatation step. - - Args: - inputs: a dictionary of input tensors. - model: the keras.Model. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. - """ - features, labels = inputs - - input_partition_dims = self.task_config.eval_input_partition_dims - if input_partition_dims: - strategy = tf.distribute.get_strategy() - features = strategy.experimental_split_to_logical_devices( - features, input_partition_dims) - - outputs = self.inference_step(features, model) - if isinstance(outputs, tf.Tensor): - outputs = {'logits': outputs} - outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) - - if self.task_config.validation_data.resize_eval_groundtruth: - loss = self.build_losses(model_outputs=outputs, labels=labels, - aux_losses=model.losses) - else: - loss = 0 - - logs = {self.loss: loss} - - if self._process_iou_metric_on_cpu: - logs.update({self.iou_metric.name: (labels, outputs['logits'])}) - else: - self.iou_metric.update_state(labels, outputs['logits']) - - if metrics: - self.process_metrics(metrics, labels, outputs) - logs.update({m.name: m.result() for m in metrics}) - - return logs - - def inference_step(self, inputs: tf.Tensor, model: tf.keras.Model): - """Performs the forward step.""" - return model(inputs, training=False) - - def aggregate_logs(self, state=None, step_outputs=None): - if state is None: - self.iou_metric.reset_states() - state = self.iou_metric - if self._process_iou_metric_on_cpu: - 
self.iou_metric.update_state(step_outputs[self.iou_metric.name][0], - step_outputs[self.iou_metric.name][1]) - return state - - def reduce_aggregated_logs(self, aggregated_logs, global_step=None): - result = {} - ious = self.iou_metric.result() - # TODO(arashwan): support loading class name from a label map file. - if self.task_config.evaluation.report_per_class_iou: - for i, value in enumerate(ious.numpy()): - result.update({'iou/{}'.format(i): value}) - # Computes mean IoU - result.update({'mean_iou': tf.reduce_mean(ious).numpy()}) - return result diff --git a/official/vision/beta/tasks/video_classification.py b/official/vision/beta/tasks/video_classification.py deleted file mode 100644 index 8cafba94ece7369374b18144430d8ccdcba9b5b7..0000000000000000000000000000000000000000 --- a/official/vision/beta/tasks/video_classification.py +++ /dev/null @@ -1,353 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Video classification task definition.""" -from typing import Any, Optional, List, Tuple - -from absl import logging -import tensorflow as tf -from official.core import base_task -from official.core import task_factory -from official.modeling import tf_utils -from official.vision.beta.configs import video_classification as exp_cfg -from official.vision.beta.dataloaders import input_reader_factory -from official.vision.beta.dataloaders import video_input -from official.vision.beta.modeling import factory_3d - - -@task_factory.register_task_cls(exp_cfg.VideoClassificationTask) -class VideoClassificationTask(base_task.Task): - """A task for video classification.""" - - def _get_num_classes(self): - """Gets the number of classes.""" - return self.task_config.train_data.num_classes - - def _get_feature_shape(self): - """Get the common feature shape for train and eval.""" - return [ - d1 if d1 == d2 else None - for d1, d2 in zip(self.task_config.train_data.feature_shape, - self.task_config.validation_data.feature_shape) - ] - - def _get_num_test_views(self): - """Gets number of views for test.""" - num_test_clips = self.task_config.validation_data.num_test_clips - num_test_crops = self.task_config.validation_data.num_test_crops - num_test_views = num_test_clips * num_test_crops - return num_test_views - - def _is_multilabel(self): - """If the label is multi-labels.""" - return self.task_config.train_data.is_multilabel - - def build_model(self): - """Builds video classification model.""" - common_input_shape = self._get_feature_shape() - input_specs = tf.keras.layers.InputSpec(shape=[None] + common_input_shape) - logging.info('Build model input %r', common_input_shape) - - l2_weight_decay = self.task_config.losses.l2_weight_decay - # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. 
- # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) - # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) - l2_regularizer = (tf.keras.regularizers.l2( - l2_weight_decay / 2.0) if l2_weight_decay else None) - - model = factory_3d.build_model( - self.task_config.model.model_type, - input_specs=input_specs, - model_config=self.task_config.model, - num_classes=self._get_num_classes(), - l2_regularizer=l2_regularizer) - return model - - def initialize(self, model: tf.keras.Model): - """Loads pretrained checkpoint.""" - if not self.task_config.init_checkpoint: - return - - ckpt_dir_or_file = self.task_config.init_checkpoint - if tf.io.gfile.isdir(ckpt_dir_or_file): - ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) - - # Restoring checkpoint. - if self.task_config.init_checkpoint_modules == 'all': - ckpt = tf.train.Checkpoint(**model.checkpoint_items) - status = ckpt.read(ckpt_dir_or_file) - status.expect_partial().assert_existing_objects_matched() - elif self.task_config.init_checkpoint_modules == 'backbone': - ckpt = tf.train.Checkpoint(backbone=model.backbone) - status = ckpt.read(ckpt_dir_or_file) - status.expect_partial().assert_existing_objects_matched() - else: - raise ValueError( - "Only 'all' or 'backbone' can be used to initialize the model.") - - logging.info('Finished loading pretrained checkpoint from %s', - ckpt_dir_or_file) - - def _get_dataset_fn(self, params): - if params.file_type == 'tfrecord': - return tf.data.TFRecordDataset - else: - raise ValueError('Unknown input file type {!r}'.format(params.file_type)) - - def _get_decoder_fn(self, params): - if params.tfds_name: - decoder = video_input.VideoTfdsDecoder( - image_key=params.image_field_key, label_key=params.label_field_key) - else: - decoder = video_input.Decoder( - image_key=params.image_field_key, label_key=params.label_field_key) - if self.task_config.train_data.output_audio: - assert self.task_config.train_data.audio_feature, 'audio feature is 
empty' - decoder.add_feature(self.task_config.train_data.audio_feature, - tf.io.VarLenFeature(dtype=tf.float32)) - return decoder.decode - - def build_inputs(self, - params: exp_cfg.DataConfig, - input_context: Optional[tf.distribute.InputContext] = None): - """Builds classification input.""" - - parser = video_input.Parser( - input_params=params, - image_key=params.image_field_key, - label_key=params.label_field_key) - postprocess_fn = video_input.PostBatchProcessor(params) - - reader = input_reader_factory.input_reader_generator( - params, - dataset_fn=self._get_dataset_fn(params), - decoder_fn=self._get_decoder_fn(params), - parser_fn=parser.parse_fn(params.is_training), - postprocess_fn=postprocess_fn) - - dataset = reader.read(input_context=input_context) - - return dataset - - def build_losses(self, - labels: Any, - model_outputs: Any, - aux_losses: Optional[Any] = None): - """Sparse categorical cross entropy loss. - - Args: - labels: labels. - model_outputs: Output logits of the classifier. - aux_losses: auxiliarly loss tensors, i.e. `losses` in keras.Model. - - Returns: - The total loss tensor. 
- """ - all_losses = {} - losses_config = self.task_config.losses - total_loss = None - if self._is_multilabel(): - entropy = -tf.reduce_mean( - tf.reduce_sum(model_outputs * tf.math.log(model_outputs + 1e-8), -1)) - total_loss = tf.keras.losses.binary_crossentropy( - labels, model_outputs, from_logits=False) - all_losses.update({ - 'class_loss': total_loss, - 'entropy': entropy, - }) - else: - if losses_config.one_hot: - total_loss = tf.keras.losses.categorical_crossentropy( - labels, - model_outputs, - from_logits=False, - label_smoothing=losses_config.label_smoothing) - else: - total_loss = tf.keras.losses.sparse_categorical_crossentropy( - labels, model_outputs, from_logits=False) - - total_loss = tf_utils.safe_mean(total_loss) - all_losses.update({ - 'class_loss': total_loss, - }) - if aux_losses: - all_losses.update({ - 'reg_loss': aux_losses, - }) - total_loss += tf.add_n(aux_losses) - all_losses[self.loss] = total_loss - - return all_losses - - def build_metrics(self, training: bool = True): - """Gets streaming metrics for training/validation.""" - if self.task_config.losses.one_hot: - metrics = [ - tf.keras.metrics.CategoricalAccuracy(name='accuracy'), - tf.keras.metrics.TopKCategoricalAccuracy(k=1, name='top_1_accuracy'), - tf.keras.metrics.TopKCategoricalAccuracy(k=5, name='top_5_accuracy') - ] - if self._is_multilabel(): - metrics.append( - tf.keras.metrics.AUC( - curve='ROC', multi_label=self._is_multilabel(), name='ROC-AUC')) - metrics.append( - tf.keras.metrics.RecallAtPrecision( - 0.95, name='RecallAtPrecision95')) - metrics.append( - tf.keras.metrics.AUC( - curve='PR', multi_label=self._is_multilabel(), name='PR-AUC')) - if self.task_config.metrics.use_per_class_recall: - for i in range(self._get_num_classes()): - metrics.append( - tf.keras.metrics.Recall(class_id=i, name=f'recall-{i}')) - else: - metrics = [ - tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), - tf.keras.metrics.SparseTopKCategoricalAccuracy( - k=1, 
name='top_1_accuracy'), - tf.keras.metrics.SparseTopKCategoricalAccuracy( - k=5, name='top_5_accuracy') - ] - return metrics - - def process_metrics(self, metrics: List[Any], labels: Any, - model_outputs: Any): - """Process and update metrics. - - Called when using custom training loop API. - - Args: - metrics: a nested structure of metrics objects. The return of function - self.build_metrics. - labels: a tensor or a nested structure of tensors. - model_outputs: a tensor or a nested structure of tensors. For example, - output of the keras model built by self.build_model. - """ - for metric in metrics: - metric.update_state(labels, model_outputs) - - def train_step(self, - inputs: Tuple[Any, Any], - model: tf.keras.Model, - optimizer: tf.keras.optimizers.Optimizer, - metrics: Optional[List[Any]] = None): - """Does forward and backward. - - Args: - inputs: a dictionary of input tensors. - model: the model, forward pass definition. - optimizer: the optimizer for this training step. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. - """ - features, labels = inputs - input_partition_dims = self.task_config.train_input_partition_dims - if input_partition_dims: - strategy = tf.distribute.get_strategy() - features['image'] = strategy.experimental_split_to_logical_devices( - features['image'], input_partition_dims) - - num_replicas = tf.distribute.get_strategy().num_replicas_in_sync - with tf.GradientTape() as tape: - outputs = model(features, training=True) - # Casting output layer as float32 is necessary when mixed_precision is - # mixed_float16 or mixed_bfloat16 to ensure output is casted as float32. - outputs = tf.nest.map_structure( - lambda x: tf.cast(x, tf.float32), outputs) - - # Computes per-replica loss. 
- if self._is_multilabel(): - outputs = tf.math.sigmoid(outputs) - else: - outputs = tf.math.softmax(outputs) - all_losses = self.build_losses( - model_outputs=outputs, labels=labels, aux_losses=model.losses) - loss = all_losses[self.loss] - # Scales loss as the default gradients allreduce performs sum inside the - # optimizer. - scaled_loss = loss / num_replicas - - # For mixed_precision policy, when LossScaleOptimizer is used, loss is - # scaled for numerical stability. - if isinstance( - optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - scaled_loss = optimizer.get_scaled_loss(scaled_loss) - - tvars = model.trainable_variables - grads = tape.gradient(scaled_loss, tvars) - # Scales back gradient before apply_gradients when LossScaleOptimizer is - # used. - if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): - grads = optimizer.get_unscaled_gradients(grads) - optimizer.apply_gradients(list(zip(grads, tvars))) - - logs = all_losses - if metrics: - self.process_metrics(metrics, labels, outputs) - logs.update({m.name: m.result() for m in metrics}) - elif model.compiled_metrics: - self.process_compiled_metrics(model.compiled_metrics, labels, outputs) - logs.update({m.name: m.result() for m in model.metrics}) - return logs - - def validation_step(self, - inputs: Tuple[Any, Any], - model: tf.keras.Model, - metrics: Optional[List[Any]] = None): - """Validatation step. - - Args: - inputs: a dictionary of input tensors. - model: the keras.Model. - metrics: a nested structure of metrics objects. - - Returns: - A dictionary of logs. 
- """ - features, labels = inputs - input_partition_dims = self.task_config.eval_input_partition_dims - if input_partition_dims: - strategy = tf.distribute.get_strategy() - features['image'] = strategy.experimental_split_to_logical_devices( - features['image'], input_partition_dims) - - outputs = self.inference_step(features, model) - outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) - logs = self.build_losses(model_outputs=outputs, labels=labels, - aux_losses=model.losses) - - if metrics: - self.process_metrics(metrics, labels, outputs) - logs.update({m.name: m.result() for m in metrics}) - elif model.compiled_metrics: - self.process_compiled_metrics(model.compiled_metrics, labels, outputs) - logs.update({m.name: m.result() for m in model.metrics}) - return logs - - def inference_step(self, features: tf.Tensor, model: tf.keras.Model): - """Performs the forward step.""" - outputs = model(features, training=False) - if self._is_multilabel(): - outputs = tf.math.sigmoid(outputs) - else: - outputs = tf.math.softmax(outputs) - num_test_views = self._get_num_test_views() - if num_test_views > 1: - # Averaging output probabilities across multiples views. - outputs = tf.reshape(outputs, [-1, num_test_views, outputs.shape[-1]]) - outputs = tf.reduce_mean(outputs, axis=1) - return outputs diff --git a/official/vision/beta/train.py b/official/vision/beta/train.py deleted file mode 100644 index c3debad44b846ec9052c0fd2a7e7b46aec35587f..0000000000000000000000000000000000000000 --- a/official/vision/beta/train.py +++ /dev/null @@ -1,70 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""TensorFlow Model Garden Vision training driver.""" - -from absl import app -from absl import flags -import gin - -# pylint: disable=unused-import -from official.common import registry_imports -# pylint: enable=unused-import -from official.common import distribute_utils -from official.common import flags as tfm_flags -from official.core import task_factory -from official.core import train_lib -from official.core import train_utils -from official.modeling import performance - -FLAGS = flags.FLAGS - - -def main(_): - gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) - params = train_utils.parse_configuration(FLAGS) - model_dir = FLAGS.model_dir - if 'train' in FLAGS.mode: - # Pure eval modes do not output yaml files. Otherwise continuous eval job - # may race against the train job for writing the same file. - train_utils.serialize_config(params, model_dir) - - # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16' - # can have significant impact on model speeds by utilizing float16 in case of - # GPUs, and bfloat16 in the case of TPUs. 
loss_scale takes effect only when - # dtype is float16 - if params.runtime.mixed_precision_dtype: - performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) - distribution_strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=params.runtime.distribution_strategy, - all_reduce_alg=params.runtime.all_reduce_alg, - num_gpus=params.runtime.num_gpus, - tpu_address=params.runtime.tpu) - with distribution_strategy.scope(): - task = task_factory.get_task(params.task, logging_dir=model_dir) - - train_lib.run_experiment( - distribution_strategy=distribution_strategy, - task=task, - mode=FLAGS.mode, - params=params, - model_dir=model_dir) - - train_utils.save_gin_config(FLAGS.mode, model_dir) - -if __name__ == '__main__': - tfm_flags.define_flags() - flags.mark_flags_as_required(['experiment', 'mode', 'model_dir']) - app.run(main) diff --git a/official/vision/configs/__init__.py b/official/vision/configs/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..7ba5793215cc202e07d4221d9b273fba61cb9c03 --- /dev/null +++ b/official/vision/configs/__init__.py @@ -0,0 +1,24 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Configs package definition.""" + +from official.vision.configs import backbones +from official.vision.configs import backbones_3d +from official.vision.configs import common +from official.vision.configs import image_classification +from official.vision.configs import maskrcnn +from official.vision.configs import retinanet +from official.vision.configs import semantic_segmentation +from official.vision.configs import video_classification diff --git a/official/vision/configs/backbones.py b/official/vision/configs/backbones.py new file mode 100644 index 0000000000000000000000000000000000000000..c9ae9ffcd5f401af689b00bcdb41ff066f9f3ea5 --- /dev/null +++ b/official/vision/configs/backbones.py @@ -0,0 +1,158 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Backbones configurations.""" +import dataclasses +from typing import List, Optional, Tuple + +from official.modeling import hyperparams + + +@dataclasses.dataclass +class Transformer(hyperparams.Config): + """Transformer config.""" + mlp_dim: int = 1 + num_heads: int = 1 + num_layers: int = 1 + attention_dropout_rate: float = 0.0 + dropout_rate: float = 0.1 + + +@dataclasses.dataclass +class VisionTransformer(hyperparams.Config): + """VisionTransformer config.""" + model_name: str = 'vit-b16' + # pylint: disable=line-too-long + pooler: str = 'token' # 'token', 'gap' or 'none'. If set to 'token', an extra classification token is added to sequence. 
+ # pylint: enable=line-too-long + representation_size: int = 0 + hidden_size: int = 1 + patch_size: int = 16 + transformer: Transformer = Transformer() + init_stochastic_depth_rate: float = 0.0 + original_init: bool = True + pos_embed_shape: Optional[Tuple[int, int]] = None + + +@dataclasses.dataclass +class ResNet(hyperparams.Config): + """ResNet config.""" + model_id: int = 50 + depth_multiplier: float = 1.0 + stem_type: str = 'v0' + se_ratio: float = 0.0 + stochastic_depth_drop_rate: float = 0.0 + scale_stem: bool = True + resnetd_shortcut: bool = False + replace_stem_max_pool: bool = False + bn_trainable: bool = True + + +@dataclasses.dataclass +class DilatedResNet(hyperparams.Config): + """DilatedResNet config.""" + model_id: int = 50 + output_stride: int = 16 + multigrid: Optional[List[int]] = None + stem_type: str = 'v0' + last_stage_repeats: int = 1 + se_ratio: float = 0.0 + stochastic_depth_drop_rate: float = 0.0 + resnetd_shortcut: bool = False + replace_stem_max_pool: bool = False + + +@dataclasses.dataclass +class EfficientNet(hyperparams.Config): + """EfficientNet config.""" + model_id: str = 'b0' + se_ratio: float = 0.0 + stochastic_depth_drop_rate: float = 0.0 + + +@dataclasses.dataclass +class MobileNet(hyperparams.Config): + """Mobilenet config.""" + model_id: str = 'MobileNetV2' + filter_size_scale: float = 1.0 + stochastic_depth_drop_rate: float = 0.0 + output_stride: Optional[int] = None + output_intermediate_endpoints: bool = False + + +@dataclasses.dataclass +class SpineNet(hyperparams.Config): + """SpineNet config.""" + model_id: str = '49' + stochastic_depth_drop_rate: float = 0.0 + min_level: int = 3 + max_level: int = 7 + + +@dataclasses.dataclass +class SpineNetMobile(hyperparams.Config): + """SpineNet config.""" + model_id: str = '49' + stochastic_depth_drop_rate: float = 0.0 + se_ratio: float = 0.2 + expand_ratio: int = 6 + min_level: int = 3 + max_level: int = 7 + # If use_keras_upsampling_2d is True, model uses UpSampling2D keras 
layer + # instead of optimized custom TF op. This makes the model more Keras-style. We + # set this flag to True when applying QAT from the Model Optimization Toolkit, + # which requires the model to use Keras layers. + use_keras_upsampling_2d: bool = False + + +@dataclasses.dataclass +class RevNet(hyperparams.Config): + """RevNet config.""" + # Specifies the depth of RevNet. + model_id: int = 56 + + +@dataclasses.dataclass +class MobileDet(hyperparams.Config): + """Mobiledet config.""" + model_id: str = 'MobileDetCPU' + filter_size_scale: float = 1.0 + + +@dataclasses.dataclass +class Backbone(hyperparams.OneOfConfig): + """Configuration for backbones. + + Attributes: + type: 'str', type of backbone be used, one of the fields below. + resnet: resnet backbone config. + dilated_resnet: dilated resnet backbone for semantic segmentation config. + revnet: revnet backbone config. + efficientnet: efficientnet backbone config. + spinenet: spinenet backbone config. + spinenet_mobile: mobile spinenet backbone config. + mobilenet: mobilenet backbone config. + mobiledet: mobiledet backbone config. + vit: vision transformer backbone config.
+ """ + type: Optional[str] = None + resnet: ResNet = ResNet() + dilated_resnet: DilatedResNet = DilatedResNet() + revnet: RevNet = RevNet() + efficientnet: EfficientNet = EfficientNet() + spinenet: SpineNet = SpineNet() + spinenet_mobile: SpineNetMobile = SpineNetMobile() + mobilenet: MobileNet = MobileNet() + mobiledet: MobileDet = MobileDet() + vit: VisionTransformer = VisionTransformer() diff --git a/official/vision/beta/configs/backbones_3d.py b/official/vision/configs/backbones_3d.py similarity index 97% rename from official/vision/beta/configs/backbones_3d.py rename to official/vision/configs/backbones_3d.py index d23df73ed60d2714e86ba68b94c88f1f4295e10b..436a3b1be667e04742f7cf106c3430e5a84bfa10 100644 --- a/official/vision/beta/configs/backbones_3d.py +++ b/official/vision/configs/backbones_3d.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """3D Backbones configurations.""" from typing import Optional, Tuple diff --git a/official/vision/configs/common.py b/official/vision/configs/common.py new file mode 100644 index 0000000000000000000000000000000000000000..6731715bc508e8e781725bae2cc34bc739b1ab53 --- /dev/null +++ b/official/vision/configs/common.py @@ -0,0 +1,147 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Common configurations.""" + +import dataclasses +from typing import List, Optional + +# Import libraries + +from official.core import config_definitions as cfg +from official.modeling import hyperparams + + +@dataclasses.dataclass +class TfExampleDecoder(hyperparams.Config): + """A simple TF Example decoder config.""" + regenerate_source_id: bool = False + mask_binarize_threshold: Optional[float] = None + + +@dataclasses.dataclass +class TfExampleDecoderLabelMap(hyperparams.Config): + """TF Example decoder with label map config.""" + regenerate_source_id: bool = False + mask_binarize_threshold: Optional[float] = None + label_map: str = '' + + +@dataclasses.dataclass +class DataDecoder(hyperparams.OneOfConfig): + """Data decoder config. + + Attributes: + type: 'str', type of data decoder be used, one of the fields below. + simple_decoder: simple TF Example decoder config. + label_map_decoder: TF Example decoder with label map config. 
+ """ + type: Optional[str] = 'simple_decoder' + simple_decoder: TfExampleDecoder = TfExampleDecoder() + label_map_decoder: TfExampleDecoderLabelMap = TfExampleDecoderLabelMap() + + +@dataclasses.dataclass +class RandAugment(hyperparams.Config): + """Configuration for RandAugment.""" + num_layers: int = 2 + magnitude: float = 10 + cutout_const: float = 40 + translate_const: float = 10 + magnitude_std: float = 0.0 + prob_to_apply: Optional[float] = None + exclude_ops: List[str] = dataclasses.field(default_factory=list) + + +@dataclasses.dataclass +class AutoAugment(hyperparams.Config): + """Configuration for AutoAugment.""" + augmentation_name: str = 'v0' + cutout_const: float = 100 + translate_const: float = 250 + + +@dataclasses.dataclass +class RandomErasing(hyperparams.Config): + """Configuration for RandomErasing.""" + probability: float = 0.25 + min_area: float = 0.02 + max_area: float = 1 / 3 + min_aspect: float = 0.3 + max_aspect = None + min_count = 1 + max_count = 1 + trials = 10 + + +@dataclasses.dataclass +class MixupAndCutmix(hyperparams.Config): + """Configuration for MixupAndCutmix.""" + mixup_alpha: float = .8 + cutmix_alpha: float = 1. + prob: float = 1.0 + switch_prob: float = 0.5 + label_smoothing: float = 0.1 + + +@dataclasses.dataclass +class Augmentation(hyperparams.OneOfConfig): + """Configuration for input data augmentation. + + Attributes: + type: 'str', type of augmentation be used, one of the fields below. + randaug: RandAugment config. + autoaug: AutoAugment config. 
+ """ + type: Optional[str] = None + randaug: RandAugment = RandAugment() + autoaug: AutoAugment = AutoAugment() + + +@dataclasses.dataclass +class NormActivation(hyperparams.Config): + activation: str = 'relu' + use_sync_bn: bool = True + norm_momentum: float = 0.99 + norm_epsilon: float = 0.001 + + +@dataclasses.dataclass +class PseudoLabelDataConfig(cfg.DataConfig): + """Pseudo Label input config for training.""" + input_path: str = '' + data_ratio: float = 1.0 # Per-batch ratio of pseudo-labeled to labeled data. + is_training: bool = True + dtype: str = 'float32' + shuffle_buffer_size: int = 10000 + cycle_length: int = 10 + aug_rand_hflip: bool = True + aug_type: Optional[ + Augmentation] = None # Choose from AutoAugment and RandAugment. + file_type: str = 'tfrecord' + + # Keep for backward compatibility. + aug_policy: Optional[str] = None # None, 'autoaug', or 'randaug'. + randaug_magnitude: Optional[int] = 10 + + +@dataclasses.dataclass +class TFLitePostProcessingConfig(hyperparams.Config): + max_detections: int = 200 + max_classes_per_detection: int = 5 + # Regular NMS runs in a multi-class fashion and is slow. Setting it to False + # uses class-agnostic NMS, which is faster. + use_regular_nms: bool = False + nms_score_threshold: float = 0.1 + nms_iou_threshold: float = 0.5 diff --git a/official/vision/configs/decoders.py b/official/vision/configs/decoders.py new file mode 100644 index 0000000000000000000000000000000000000000..8b243c10177a0bf96d11e953c35d0c363d6cdbfa --- /dev/null +++ b/official/vision/configs/decoders.py @@ -0,0 +1,72 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Decoders configurations.""" +import dataclasses +from typing import List, Optional + +# Import libraries + +from official.modeling import hyperparams + + +@dataclasses.dataclass +class Identity(hyperparams.Config): + """Identity config.""" + pass + + +@dataclasses.dataclass +class FPN(hyperparams.Config): + """FPN config.""" + num_filters: int = 256 + fusion_type: str = 'sum' + use_separable_conv: bool = False + use_keras_layer: bool = False + + +@dataclasses.dataclass +class NASFPN(hyperparams.Config): + """NASFPN config.""" + num_filters: int = 256 + num_repeats: int = 5 + use_separable_conv: bool = False + + +@dataclasses.dataclass +class ASPP(hyperparams.Config): + """ASPP config.""" + level: int = 4 + dilation_rates: List[int] = dataclasses.field(default_factory=list) + dropout_rate: float = 0.0 + num_filters: int = 256 + use_depthwise_convolution: bool = False + pool_kernel_size: Optional[List[int]] = None # Use global average pooling. + spp_layer_version: str = 'v1' + output_tensor: bool = False + + +@dataclasses.dataclass +class Decoder(hyperparams.OneOfConfig): + """Configuration for decoders. + + Attributes: + type: 'str', type of decoder be used, one of the fields below. + fpn: fpn config. 
+ """ + type: Optional[str] = None + fpn: FPN = FPN() + nasfpn: NASFPN = NASFPN() + identity: Identity = Identity() + aspp: ASPP = ASPP() diff --git a/official/vision/configs/experiments/image_classification/imagenet_mobilenetv1_tpu.yaml b/official/vision/configs/experiments/image_classification/imagenet_mobilenetv1_tpu.yaml new file mode 100644 index 0000000000000000000000000000000000000000..258c46ae71f7791f3f5617d5da5b9c402b01b89c --- /dev/null +++ b/official/vision/configs/experiments/image_classification/imagenet_mobilenetv1_tpu.yaml @@ -0,0 +1,48 @@ +# MobileNetV1_1.0 ImageNet classification. 72.33% top-1 and 90.65% top-5 accuracy. +runtime: + distribution_strategy: 'tpu' + mixed_precision_dtype: 'bfloat16' +task: + model: + num_classes: 1001 + input_size: [224, 224, 3] + backbone: + type: 'mobilenet' + mobilenet: + model_id: 'MobileNetV1' + filter_size_scale: 1.0 + dropout_rate: 0.2 + losses: + l2_weight_decay: 0.00001 + one_hot: true + label_smoothing: 0.1 + train_data: + input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*' + is_training: true + global_batch_size: 4096 + dtype: 'bfloat16' + validation_data: + input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*' + is_training: false + global_batch_size: 4096 + dtype: 'bfloat16' + drop_remainder: false +trainer: + train_steps: 156000 # 500 epochs + validation_steps: 13 + validation_interval: 312 + steps_per_loop: 312 # NUM_EXAMPLES (1281167) // global_batch_size + summary_interval: 312 + checkpoint_interval: 312 + optimizer_config: + learning_rate: + type: 'exponential' + exponential: + initial_learning_rate: 0.256 # 0.008 * batch_size / 128 + decay_steps: 780 # 2.5 * steps_per_epoch + decay_rate: 0.94 + staircase: true + warmup: + type: 'linear' + linear: + warmup_steps: 1560 diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_mobilenetv2_gpu.yaml b/official/vision/configs/experiments/image_classification/imagenet_mobilenetv2_gpu.yaml 
similarity index 88% rename from official/vision/beta/configs/experiments/image_classification/imagenet_mobilenetv2_gpu.yaml rename to official/vision/configs/experiments/image_classification/imagenet_mobilenetv2_gpu.yaml index ff1a0719e6f179c98674ba7723bd1695aaa90241..35ea33df68a552ee5451a4457c9abbc9c19f6491 100644 --- a/official/vision/beta/configs/experiments/image_classification/imagenet_mobilenetv2_gpu.yaml +++ b/official/vision/configs/experiments/image_classification/imagenet_mobilenetv2_gpu.yaml @@ -18,12 +18,12 @@ task: one_hot: true label_smoothing: 0.1 train_data: - input_path: 'imagenet-2012-tfrecord/train*' + input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*' is_training: true global_batch_size: 1024 # 128 * 8 dtype: 'float16' validation_data: - input_path: 'imagenet-2012-tfrecord/valid*' + input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*' is_training: false global_batch_size: 1024 # 128 * 8 dtype: 'float16' diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_mobilenetv2_tpu.yaml b/official/vision/configs/experiments/image_classification/imagenet_mobilenetv2_tpu.yaml similarity index 88% rename from official/vision/beta/configs/experiments/image_classification/imagenet_mobilenetv2_tpu.yaml rename to official/vision/configs/experiments/image_classification/imagenet_mobilenetv2_tpu.yaml index b5df9d6e74a44617b91625042f5821fe8c967d9f..a5ec4420b0792750e85054490f5c9a99b4843724 100644 --- a/official/vision/beta/configs/experiments/image_classification/imagenet_mobilenetv2_tpu.yaml +++ b/official/vision/configs/experiments/image_classification/imagenet_mobilenetv2_tpu.yaml @@ -17,12 +17,12 @@ task: one_hot: true label_smoothing: 0.1 train_data: - input_path: 'imagenet-2012-tfrecord/train*' + input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*' is_training: true global_batch_size: 4096 dtype: 'bfloat16' validation_data: - input_path: 'imagenet-2012-tfrecord/valid*' 
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/configs/experiments/image_classification/imagenet_mobilenetv3.5_avg_tpu.yaml b/official/vision/configs/experiments/image_classification/imagenet_mobilenetv3.5_avg_tpu.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..d35ea55f1573ba0586eb9d894c03aeeadec34ab0
--- /dev/null
+++ b/official/vision/configs/experiments/image_classification/imagenet_mobilenetv3.5_avg_tpu.yaml
@@ -0,0 +1,60 @@
+# MobileNetV3.5 AVG ImageNet classification. 75.27% top-1 and 92.43% top-5 accuracy.
+runtime:
+  distribution_strategy: 'tpu'
+  mixed_precision_dtype: 'bfloat16'
+task:
+  model:
+    num_classes: 1001
+    input_size: [224, 224, 3]
+    backbone:
+      mobilenet:
+        filter_size_scale: 1.0
+        model_id: MobileNetMultiAVG
+        stochastic_depth_drop_rate: 0.0
+      type: mobilenet
+    dropout_rate: 0.2
+    norm_activation:
+      activation: relu
+      norm_epsilon: 0.001
+      norm_momentum: 0.997
+      use_sync_bn: false
+  losses:
+    l2_weight_decay: 0.00001
+    one_hot: true
+    label_smoothing: 0.1
+  train_data:
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
+    is_training: true
+    global_batch_size: 4096
+    dtype: 'bfloat16'
+  validation_data:
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
+    is_training: false
+    global_batch_size: 4096
+    dtype: 'bfloat16'
+    drop_remainder: false
+trainer:
+  train_steps: 156000 # 500 epochs
+  validation_steps: 13
+  validation_interval: 312
+  steps_per_loop: 312 # NUM_EXAMPLES (1281167) // global_batch_size
+  summary_interval: 312
+  checkpoint_interval: 312
+  optimizer_config:
+    learning_rate:
+      type: 'exponential'
+      exponential:
+        initial_learning_rate: 0.256 # 0.008 * batch_size / 128
+        decay_steps: 780 # 2.5 * steps_per_epoch
+        decay_rate: 0.96
+        staircase: true
+    optimizer:
+      type: rmsprop
+      rmsprop:
+        epsilon: 0.002
+        momentum: 0.9
+        rho: 0.9
+    warmup:
+      type: 'linear'
+      linear:
+        warmup_steps: 1560
diff --git a/official/vision/configs/experiments/image_classification/imagenet_mobilenetv3.5_avgseg_tpu.yaml b/official/vision/configs/experiments/image_classification/imagenet_mobilenetv3.5_avgseg_tpu.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..15fd099a1776c06181513e856c3d2f9092979841
--- /dev/null
+++ b/official/vision/configs/experiments/image_classification/imagenet_mobilenetv3.5_avgseg_tpu.yaml
@@ -0,0 +1,60 @@
+# MobileNetV3.5 AVG Seg ImageNet classification. 73.35% top-1 and 91.49% top-5 accuracy.
+runtime:
+  distribution_strategy: 'tpu'
+  mixed_precision_dtype: 'bfloat16'
+task:
+  model:
+    num_classes: 1001
+    input_size: [224, 224, 3]
+    backbone:
+      mobilenet:
+        filter_size_scale: 1.0
+        model_id: MobileNetMultiAVGSeg
+        stochastic_depth_drop_rate: 0.0
+      type: mobilenet
+    dropout_rate: 0.2
+    norm_activation:
+      activation: relu
+      norm_epsilon: 0.001
+      norm_momentum: 0.997
+      use_sync_bn: false
+  losses:
+    l2_weight_decay: 0.00001
+    one_hot: true
+    label_smoothing: 0.1
+  train_data:
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
+    is_training: true
+    global_batch_size: 4096
+    dtype: 'bfloat16'
+  validation_data:
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
+    is_training: false
+    global_batch_size: 4096
+    dtype: 'bfloat16'
+    drop_remainder: false
+trainer:
+  train_steps: 156000 # 500 epochs
+  validation_steps: 13
+  validation_interval: 312
+  steps_per_loop: 312 # NUM_EXAMPLES (1281167) // global_batch_size
+  summary_interval: 312
+  checkpoint_interval: 312
+  optimizer_config:
+    learning_rate:
+      type: 'exponential'
+      exponential:
+        initial_learning_rate: 0.256 # 0.008 * batch_size / 128
+        decay_steps: 780 # 2.5 * steps_per_epoch
+        decay_rate: 0.96
+        staircase: true
+    optimizer:
+      type: rmsprop
+      rmsprop:
+        epsilon: 0.002
+        momentum: 0.9
+        rho: 0.9
+    warmup:
+      type: 'linear'
+      linear:
+        warmup_steps: 1560
diff --git a/official/vision/configs/experiments/image_classification/imagenet_mobilenetv3large_tpu.yaml b/official/vision/configs/experiments/image_classification/imagenet_mobilenetv3large_tpu.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..5c088720fe58cfd8d604a02611af13cf65ac6c9a
--- /dev/null
+++ b/official/vision/configs/experiments/image_classification/imagenet_mobilenetv3large_tpu.yaml
@@ -0,0 +1,70 @@
+# MobileNetV3-large_1.0 ImageNet classification: ~75.7% top-1.
+runtime:
+  distribution_strategy: 'tpu'
+  mixed_precision_dtype: 'bfloat16'
+task:
+  model:
+    num_classes: 1001
+    input_size: [224, 224, 3]
+    backbone:
+      type: 'mobilenet'
+      mobilenet:
+        model_id: 'MobileNetV3Large'
+        filter_size_scale: 1.0
+    kernel_initializer: 'random_uniform'
+    norm_activation:
+      norm_epsilon: 0.001
+      norm_momentum: 0.997
+    dropout_rate: 0.2
+  losses:
+    l2_weight_decay: 0.0
+    one_hot: true
+    label_smoothing: 0.1
+  train_data:
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
+    is_training: true
+    global_batch_size: 4096
+    dtype: 'bfloat16'
+    aug_type:
+      autoaug:
+        augmentation_name: 'v0'
+        cutout_const: 100
+        translate_const: 250
+      type: 'autoaug'
+  validation_data:
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
+    is_training: false
+    global_batch_size: 4096
+    dtype: 'bfloat16'
+    drop_remainder: false
+trainer:
+  train_steps: 218000 # 700 epochs
+  validation_steps: 13
+  validation_interval: 312
+  steps_per_loop: 312 # NUM_EXAMPLES (1281167) // global_batch_size
+  summary_interval: 312
+  checkpoint_interval: 312
+  optimizer_config:
+    learning_rate:
+      cosine:
+        alpha: 0.0
+        decay_steps: 218000
+        initial_learning_rate: 0.004
+        name: CosineDecay
+        offset: 0
+      type: 'cosine'
+    optimizer:
+      adamw:
+        amsgrad: false
+        beta_1: 0.9
+        beta_2: 0.999
+        clipnorm: null
+        clipvalue: null
+        epsilon: 1.0e-07
+        exclude_from_weight_decay: ['batch_normalization']
+        global_clipnorm: null
+        gradient_clip_norm: 0.0
+        include_in_weight_decay: null
+        name: 'AdamWeightDecay'
+        weight_decay_rate: 0.1
+      type: 'adamw'
diff --git a/official/vision/configs/experiments/image_classification/imagenet_mobilenetv3small_tpu.yaml b/official/vision/configs/experiments/image_classification/imagenet_mobilenetv3small_tpu.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..a9959c336898c5acf5ddc351aea167cfd05ef047
--- /dev/null
+++ b/official/vision/configs/experiments/image_classification/imagenet_mobilenetv3small_tpu.yaml
@@ -0,0 +1,63 @@
+# MobileNetV3Small ImageNet classification. 67.5% top-1 and 87.7% top-5 accuracy.
+runtime:
+  distribution_strategy: 'tpu'
+  mixed_precision_dtype: 'bfloat16'
+task:
+  model:
+    num_classes: 1001
+    input_size: [224, 224, 3]
+    backbone:
+      type: 'mobilenet'
+      mobilenet:
+        model_id: 'MobileNetV3Small'
+        filter_size_scale: 1.0
+    norm_activation:
+      activation: 'relu'
+      norm_momentum: 0.997
+      norm_epsilon: 0.001
+      use_sync_bn: false
+    dropout_rate: 0.2
+  losses:
+    l2_weight_decay: 0.00001
+    one_hot: true
+    label_smoothing: 0.1
+  train_data:
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
+    is_training: true
+    global_batch_size: 4096
+    dtype: 'bfloat16'
+  validation_data:
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
+    is_training: false
+    global_batch_size: 4096
+    dtype: 'bfloat16'
+    drop_remainder: false
+trainer:
+  train_steps: 312000 # 1000 epochs
+  validation_steps: 13
+  validation_interval: 312
+  steps_per_loop: 312 # NUM_EXAMPLES (1281167) // global_batch_size
+  summary_interval: 312
+  checkpoint_interval: 312
+  optimizer_config:
+    optimizer:
+      type: 'rmsprop'
+      rmsprop:
+        rho: 0.9
+        momentum: 0.9
+        epsilon: 0.002
+    learning_rate:
+      type: 'exponential'
+      exponential:
+        initial_learning_rate: 0.426 # 0.02 * (batch_size / 192)
+        decay_steps: 936 # 3 * steps_per_epoch
+        decay_rate: 0.99
+        staircase: true
+    ema:
+      average_decay: 0.9999
+      trainable_weights_only: false
+    warmup:
+      type: 'linear'
+      linear:
+        warmup_steps: 1560
+        warmup_learning_rate: 0.0
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnet101_deeplab_tpu.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnet101_deeplab_tpu.yaml
similarity index 88%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnet101_deeplab_tpu.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnet101_deeplab_tpu.yaml
index 5d7d295963752ff64ee39e6d159eb99831da3306..cb86a8e7f11f9377d299bf5e8fc8922f3bd80f36 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnet101_deeplab_tpu.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnet101_deeplab_tpu.yaml
@@ -23,13 +23,13 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
     aug_policy: 'randaug'
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnet101_tpu.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnet101_tpu.yaml
similarity index 87%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnet101_tpu.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnet101_tpu.yaml
index 2600f58faa595d2100bc0325c7c4fdc83c9517c1..c9fea870939a1ab43f4c93e7bd5c98898b026a0e 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnet101_tpu.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnet101_tpu.yaml
@@ -17,12 +17,12 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnet152_tpu.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnet152_tpu.yaml
similarity index 87%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnet152_tpu.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnet152_tpu.yaml
index 1c81953e2f6c06eb1d67a619d3106c7195da4e20..5093ce1100b5d774ffa688fba80a2d7794bce9cf 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnet152_tpu.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnet152_tpu.yaml
@@ -17,12 +17,12 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_deeplab_tpu.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnet50_deeplab_tpu.yaml
similarity index 86%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_deeplab_tpu.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnet50_deeplab_tpu.yaml
index 11bdafbc35d4c6f63625b5990de0167a78a7e6b0..5054d1a7eafa7e7cda84e66438babbef9781b711 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_deeplab_tpu.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnet50_deeplab_tpu.yaml
@@ -17,12 +17,12 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/configs/experiments/image_classification/imagenet_resnet50_gpu.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnet50_gpu.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..f31fde1ddd218c8a73307835fd9c1709fc080ef6
--- /dev/null
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnet50_gpu.yaml
@@ -0,0 +1,52 @@
+runtime:
+  distribution_strategy: 'mirrored'
+  mixed_precision_dtype: 'float16'
+  loss_scale: 'dynamic'
+task:
+  model:
+    num_classes: 1001
+    input_size: [224, 224, 3]
+    backbone:
+      type: 'resnet'
+      resnet:
+        model_id: 50
+  losses:
+    l2_weight_decay: 0.0001
+    one_hot: true
+    label_smoothing: 0.1
+  train_data:
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
+    is_training: true
+    global_batch_size: 2048
+    dtype: 'float16'
+    # Autotuning the prefetch buffer size causes OOMs, so set it to a reasonable
+    # static value: 32. See b/218880025.
+    prefetch_buffer_size: 32
+  validation_data:
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
+    is_training: false
+    global_batch_size: 2048
+    dtype: 'float16'
+    drop_remainder: false
+    prefetch_buffer_size: 32
+trainer:
+  train_steps: 56160
+  validation_steps: 25
+  validation_interval: 625
+  steps_per_loop: 625
+  summary_interval: 625
+  checkpoint_interval: 625
+  optimizer_config:
+    optimizer:
+      type: 'sgd'
+      sgd:
+        momentum: 0.9
+    learning_rate:
+      type: 'stepwise'
+      stepwise:
+        boundaries: [18750, 37500, 50000]
+        values: [0.8, 0.08, 0.008, 0.0008]
+    warmup:
+      type: 'linear'
+      linear:
+        warmup_steps: 3125
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_tfds_tpu.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnet50_tfds_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_tfds_tpu.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnet50_tfds_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml
similarity index 86%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml
index 358cafb6df9f083d7cfe8f89040dad2360393d13..8421b3411d5b1b9c53578eb45767ff1db57d5b0a 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnet50_tpu.yaml
@@ -14,12 +14,12 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs101_i160.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnetrs101_i160.yaml
similarity index 89%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs101_i160.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnetrs101_i160.yaml
index 7c9e7b80a02386093981e584440c8b29ce395b83..915f94fc3af3619cd93d36105cd78744854451fc 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs101_i160.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnetrs101_i160.yaml
@@ -25,7 +25,7 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
@@ -34,7 +34,7 @@ task:
       randaug:
         magnitude: 15
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs101_i192.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnetrs101_i192.yaml
similarity index 89%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs101_i192.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnetrs101_i192.yaml
index 576c48625055d0e61d948d732cc16d4e5020c131..10af95d6f3feb7e334ee44ce65a1049ceab40f79 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs101_i192.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnetrs101_i192.yaml
@@ -25,7 +25,7 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
@@ -34,7 +34,7 @@ task:
       randaug:
         magnitude: 15
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i192.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i192.yaml
similarity index 89%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i192.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i192.yaml
index b1c8edc463f8ce3150f4414f90d88df28abd77e9..7427de95261f1226a617c4be451371d2e3aa3d36 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i192.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i192.yaml
@@ -25,7 +25,7 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
@@ -34,7 +34,7 @@ task:
       randaug:
         magnitude: 15
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i224.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i224.yaml
similarity index 89%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i224.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i224.yaml
index 2ec14bae5ab2f20867498ff7ba46af855d61f73c..a31c09bd872a4a88efe022c52a06d04fd31e1636 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i224.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i224.yaml
@@ -25,7 +25,7 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
@@ -34,7 +34,7 @@ task:
       randaug:
         magnitude: 15
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i256.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i256.yaml
similarity index 89%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i256.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i256.yaml
index 91b53d6217f7579b6764ddaf58795f0ee14f58dc..12990520e50695b7d1f5592b8b5baa3d03abf86a 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs152_i256.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnetrs152_i256.yaml
@@ -25,7 +25,7 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
@@ -34,7 +34,7 @@ task:
       randaug:
         magnitude: 15
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs200_i256.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnetrs200_i256.yaml
similarity index 89%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs200_i256.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnetrs200_i256.yaml
index 9d76c010170070426745607aa2690799136c0665..d8f7e841e14556bd4a1eb8d6fe0188afccab9149 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs200_i256.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnetrs200_i256.yaml
@@ -25,7 +25,7 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
@@ -34,7 +34,7 @@ task:
       randaug:
         magnitude: 15
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs270_i256.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnetrs270_i256.yaml
similarity index 89%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs270_i256.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnetrs270_i256.yaml
index b7c6a644e2cbb784c18b008d26a2986eacbf98e6..25735b23cfac033d887672bc0bc0220859171a14 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs270_i256.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnetrs270_i256.yaml
@@ -25,7 +25,7 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
@@ -34,7 +34,7 @@ task:
       randaug:
         magnitude: 15
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml
similarity index 89%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml
index 3b2d3fe261c21bb789096677e57f2a49d2b75d57..fd8f9f57812d78f9bcefb4d65e78e1974c23d3b8 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnetrs350_i256.yaml
@@ -25,7 +25,7 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
@@ -34,7 +34,7 @@ task:
       randaug:
         magnitude: 15
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs350_i320.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnetrs350_i320.yaml
similarity index 89%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs350_i320.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnetrs350_i320.yaml
index 36cdba7bb43cd0d41b46e5c87eead2a657afe651..a83420d663a751b7d3609944efcface9b84792b0 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs350_i320.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnetrs350_i320.yaml
@@ -25,7 +25,7 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
@@ -34,7 +34,7 @@ task:
       randaug:
         magnitude: 15
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs420_i320.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnetrs420_i320.yaml
similarity index 89%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs420_i320.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnetrs420_i320.yaml
index 9b02b7e006a8239fe86b3d036fbfb484dc0b0995..fea2190c864ebc56626f71f89d4c774ca158f056 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs420_i320.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnetrs420_i320.yaml
@@ -24,7 +24,7 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
@@ -33,7 +33,7 @@ task:
       randaug:
         magnitude: 15
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs50_i160.yaml b/official/vision/configs/experiments/image_classification/imagenet_resnetrs50_i160.yaml
similarity index 89%
rename from official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs50_i160.yaml
rename to official/vision/configs/experiments/image_classification/imagenet_resnetrs50_i160.yaml
index a57f41f3908c51fb2a4249e80afe8ef8fff48c88..f7f7b2560b7907a767967523084b193aedd4d2a7 100644
--- a/official/vision/beta/configs/experiments/image_classification/imagenet_resnetrs50_i160.yaml
+++ b/official/vision/configs/experiments/image_classification/imagenet_resnetrs50_i160.yaml
@@ -25,7 +25,7 @@ task:
     one_hot: true
     label_smoothing: 0.1
   train_data:
-    input_path: 'imagenet-2012-tfrecord/train*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*'
     is_training: true
     global_batch_size: 4096
     dtype: 'bfloat16'
@@ -34,7 +34,7 @@ task:
       randaug:
         magnitude: 10
   validation_data:
-    input_path: 'imagenet-2012-tfrecord/valid*'
+    input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*'
     is_training: false
     global_batch_size: 4096
     dtype: 'bfloat16'
diff --git a/official/vision/configs/experiments/maskrcnn/coco_mobilenetv2_mrcnn_tpu.yaml b/official/vision/configs/experiments/maskrcnn/coco_mobilenetv2_mrcnn_tpu.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..6380eafb8fcf633ae265b2155cde5192d7fdf82d
--- /dev/null
+++ b/official/vision/configs/experiments/maskrcnn/coco_mobilenetv2_mrcnn_tpu.yaml
@@ -0,0 +1,20 @@
+# Expect to reach: box mAP: 33.3%, mask mAP: 29.4% on COCO
+runtime:
+  distribution_strategy: 'tpu'
+  mixed_precision_dtype: 'bfloat16'
+task:
+  init_checkpoint: gs://**/mobilenetv2_gpu/22984194/ckpt-625500
+  init_checkpoint_modules: 'backbone'
+  train_data:
+    parser:
+      aug_rand_hflip: true
+      aug_scale_min: 0.1
+      aug_scale_max: 2.0
+  losses:
+    l2_weight_decay: 0.00004
+  model:
+    anchor:
+      anchor_size: 3.0
+      num_scales: 3
+    detection_generator:
+      pre_nms_top_k: 1000
diff --git a/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet143_cascadercnn_tpu.yaml b/official/vision/configs/experiments/maskrcnn/coco_spinenet143_cascadercnn_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/maskrcnn/coco_spinenet143_cascadercnn_tpu.yaml
rename to official/vision/configs/experiments/maskrcnn/coco_spinenet143_cascadercnn_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet143_mrcnn_tpu.yaml b/official/vision/configs/experiments/maskrcnn/coco_spinenet143_mrcnn_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/maskrcnn/coco_spinenet143_mrcnn_tpu.yaml
rename to official/vision/configs/experiments/maskrcnn/coco_spinenet143_mrcnn_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_cascadercnn_tpu.yaml b/official/vision/configs/experiments/maskrcnn/coco_spinenet49_cascadercnn_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_cascadercnn_tpu.yaml
rename to official/vision/configs/experiments/maskrcnn/coco_spinenet49_cascadercnn_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_mrcnn_tpu.yaml b/official/vision/configs/experiments/maskrcnn/coco_spinenet49_mrcnn_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/maskrcnn/coco_spinenet49_mrcnn_tpu.yaml
rename to official/vision/configs/experiments/maskrcnn/coco_spinenet49_mrcnn_tpu.yaml
diff --git a/official/vision/configs/experiments/maskrcnn/coco_spinenet96_cascadercnn_tpu.yaml b/official/vision/configs/experiments/maskrcnn/coco_spinenet96_cascadercnn_tpu.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..b8dd2fb9d72cbb6a91000a6ba63b26e97f9d7dee
--- /dev/null
+++ b/official/vision/configs/experiments/maskrcnn/coco_spinenet96_cascadercnn_tpu.yaml
@@ -0,0 +1,58 @@
+# --experiment_type=cascadercnn_spinenet_coco
+# Expect to reach: box mAP: 51.9%, mask mAP: 45.0% on COCO
+runtime:
+  distribution_strategy: 'tpu'
+  mixed_precision_dtype: 'bfloat16'
+task:
+  init_checkpoint: null
+  train_data:
+    global_batch_size: 256
+    parser:
+      aug_rand_hflip: true
+      aug_scale_min: 0.1
+      aug_scale_max: 2.5
+  losses:
+    l2_weight_decay: 0.00004
+  model:
+    anchor:
+      anchor_size: 4.0
+      num_scales: 3
+    min_level: 3
+    max_level: 7
+    input_size: [1024, 1024, 3]
+    backbone:
+      spinenet:
+        stochastic_depth_drop_rate: 0.2
+        model_id: '96'
+      type: 'spinenet'
+    decoder:
+      type: 'identity'
+    detection_head:
+      cascade_class_ensemble: true
+      class_agnostic_bbox_pred: true
+    rpn_head:
+      num_convs: 2
+      num_filters: 256
+    roi_sampler:
+      cascade_iou_thresholds: [0.7]
+      foreground_iou_threshold: 0.6
+    norm_activation:
+      norm_epsilon: 0.001
+      norm_momentum: 0.99
+      use_sync_bn: true
+      activation: 'swish'
+    detection_generator:
+      pre_nms_top_k: 1000
+trainer:
+  train_steps: 231000
+  optimizer_config:
+    learning_rate:
+      type: 'stepwise'
+      stepwise:
+        boundaries: [219450, 226380]
+        values: [0.32, 0.032, 0.0032]
+    warmup:
+      type: 'linear'
+      linear:
+        warmup_steps: 2000
+        warmup_learning_rate: 0.0067
diff --git a/official/vision/beta/configs/experiments/maskrcnn/coco_spinenet96_mrcnn_tpu.yaml b/official/vision/configs/experiments/maskrcnn/coco_spinenet96_mrcnn_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/maskrcnn/coco_spinenet96_mrcnn_tpu.yaml
rename to official/vision/configs/experiments/maskrcnn/coco_spinenet96_mrcnn_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/maskrcnn/r50fpn_640_coco_scratch_tpu4x4.yaml b/official/vision/configs/experiments/maskrcnn/r50fpn_640_coco_scratch_tpu4x4.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/maskrcnn/r50fpn_640_coco_scratch_tpu4x4.yaml
rename to official/vision/configs/experiments/maskrcnn/r50fpn_640_coco_scratch_tpu4x4.yaml
diff --git a/official/vision/beta/configs/experiments/retinanet/coco_mobiledetcpu_tpu.yaml b/official/vision/configs/experiments/retinanet/coco_mobiledetcpu_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/retinanet/coco_mobiledetcpu_tpu.yaml
rename to official/vision/configs/experiments/retinanet/coco_mobiledetcpu_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/retinanet/coco_mobilenetv2_tpu.yaml b/official/vision/configs/experiments/retinanet/coco_mobilenetv2_tpu.yaml
similarity index 92%
rename from official/vision/beta/configs/experiments/retinanet/coco_mobilenetv2_tpu.yaml
rename to official/vision/configs/experiments/retinanet/coco_mobilenetv2_tpu.yaml
index 9e27bfe8c7201e2490760bbd2d2eeabc5092952a..60a97bc0588bac064059f2674113f93dccba21ab 100644
--- a/official/vision/beta/configs/experiments/retinanet/coco_mobilenetv2_tpu.yaml
+++ b/official/vision/configs/experiments/retinanet/coco_mobilenetv2_tpu.yaml
@@ -21,6 +21,7 @@ task:
       fpn:
         num_filters: 128
         use_separable_conv: true
+        use_keras_layer: true
     head:
       num_convs: 4
       num_filters: 128
@@ -43,8 +44,9 @@ task:
       aug_scale_min: 0.5
   validation_data:
     dtype: 'bfloat16'
-    global_batch_size: 8
+    global_batch_size: 256
     is_training: false
+    drop_remainder: false
 trainer:
   optimizer_config:
     learning_rate:
@@ -59,4 +61,4 @@ trainer:
   steps_per_loop: 462
   train_steps: 277200
   validation_interval: 462
-  validation_steps: 625
+  validation_steps: 20
diff --git a/official/vision/beta/configs/experiments/retinanet/coco_spinenet143_tpu.yaml b/official/vision/configs/experiments/retinanet/coco_spinenet143_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/retinanet/coco_spinenet143_tpu.yaml
rename to official/vision/configs/experiments/retinanet/coco_spinenet143_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/retinanet/coco_spinenet190_tpu.yaml b/official/vision/configs/experiments/retinanet/coco_spinenet190_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/retinanet/coco_spinenet190_tpu.yaml
rename to official/vision/configs/experiments/retinanet/coco_spinenet190_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/retinanet/coco_spinenet49_mobile_tpu.yaml b/official/vision/configs/experiments/retinanet/coco_spinenet49_mobile_tpu.yaml
similarity index 91%
rename from official/vision/beta/configs/experiments/retinanet/coco_spinenet49_mobile_tpu.yaml
rename to official/vision/configs/experiments/retinanet/coco_spinenet49_mobile_tpu.yaml
index e1e14b321f0d118eec89aed08578ee88bdae7ba2..27f613f95189c937b5c62e170b73119c5dde9b8f 100644
--- a/official/vision/beta/configs/experiments/retinanet/coco_spinenet49_mobile_tpu.yaml
+++ b/official/vision/configs/experiments/retinanet/coco_spinenet49_mobile_tpu.yaml
@@ -1,4 +1,5 @@
 # --experiment_type=retinanet_mobile_coco
+# COCO mAP: 27.6
 runtime:
   distribution_strategy: 'tpu'
   mixed_precision_dtype: 'bfloat16'
@@ -26,7 +27,7 @@ task:
     max_level: 7
     min_level: 3
     norm_activation:
-      activation: 'swish'
+      activation: 'hard_swish'
       norm_epsilon: 0.001
       norm_momentum: 0.99
       use_sync_bn: true
@@ -40,8 +41,9 @@ task:
       aug_scale_min: 0.5
   validation_data:
     dtype: 'bfloat16'
-    global_batch_size: 8
+    global_batch_size: 256
     is_training: false
+    drop_remainder: false
 trainer:
   checkpoint_interval: 462
   optimizer_config:
@@ -57,4 +59,4 @@ trainer:
   steps_per_loop: 462
   train_steps: 277200
   validation_interval: 462
-  validation_steps: 625
+  validation_steps: 20
diff --git a/official/vision/beta/configs/experiments/retinanet/coco_spinenet49_tpu.yaml b/official/vision/configs/experiments/retinanet/coco_spinenet49_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/retinanet/coco_spinenet49_tpu.yaml
rename to official/vision/configs/experiments/retinanet/coco_spinenet49_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/retinanet/coco_spinenet49s_mobile_tpu.yaml b/official/vision/configs/experiments/retinanet/coco_spinenet49s_mobile_tpu.yaml
similarity index 91%
rename from official/vision/beta/configs/experiments/retinanet/coco_spinenet49s_mobile_tpu.yaml
rename to official/vision/configs/experiments/retinanet/coco_spinenet49s_mobile_tpu.yaml
index 9f854ccf44c53d385b81ce69fa69fc94d7d92bd5..4e82339475f44280ee9f5a7d99b184ac5a043795 100644
--- a/official/vision/beta/configs/experiments/retinanet/coco_spinenet49s_mobile_tpu.yaml
+++ b/official/vision/configs/experiments/retinanet/coco_spinenet49s_mobile_tpu.yaml
@@ -1,4 +1,5 @@
 # --experiment_type=retinanet_mobile_coco
+# COCO mAP: 23.5
 runtime:
   distribution_strategy: 'tpu'
   mixed_precision_dtype: 'bfloat16'
@@ -26,7 +27,7 @@ task:
     max_level: 7
     min_level: 3
     norm_activation:
-      activation: 'swish'
+      activation: 'hard_swish'
       norm_epsilon: 0.001
       norm_momentum: 0.99
       use_sync_bn: true
@@ -40,8 +41,9 @@ task:
       aug_scale_min: 0.5
   validation_data:
     dtype: 'bfloat16'
-    global_batch_size: 8
+    global_batch_size: 256
     is_training: false
+    drop_remainder: false
 trainer:
   checkpoint_interval: 462
   optimizer_config:
@@ -57,4 +59,4 @@ trainer:
   steps_per_loop: 462
   train_steps: 277200
   validation_interval: 462
-  validation_steps: 625
+  validation_steps: 20
diff --git a/official/vision/beta/configs/experiments/retinanet/coco_spinenet49xs_mobile_tpu.yaml b/official/vision/configs/experiments/retinanet/coco_spinenet49xs_mobile_tpu.yaml
similarity index 91%
rename from official/vision/beta/configs/experiments/retinanet/coco_spinenet49xs_mobile_tpu.yaml
rename to official/vision/configs/experiments/retinanet/coco_spinenet49xs_mobile_tpu.yaml
index 926bd62097ae885374bfceb40089de1646d6fe3d..be895c3feefeadc66e026e60fb50347437649bb4 100644
--- a/official/vision/beta/configs/experiments/retinanet/coco_spinenet49xs_mobile_tpu.yaml
+++ b/official/vision/configs/experiments/retinanet/coco_spinenet49xs_mobile_tpu.yaml
@@ -1,4 +1,5 @@
 # --experiment_type=retinanet_mobile_coco
+# COCO mAP: 16.5
 runtime:
   distribution_strategy: 'tpu'
   mixed_precision_dtype: 'bfloat16'
@@ -26,7 +27,7 @@ task:
     max_level: 7
     min_level: 3
     norm_activation:
-      activation: 'swish'
+      activation: 'hard_swish'
       norm_epsilon: 0.001
       norm_momentum: 0.99
       use_sync_bn: true
@@ -40,8 +41,9 @@ task:
       aug_scale_min: 0.5
   validation_data:
     dtype: 'bfloat16'
-    global_batch_size: 8
+    global_batch_size: 256
     is_training: false
+    drop_remainder: false
 trainer:
   checkpoint_interval: 462
   optimizer_config:
@@ -57,4 +59,4 @@ trainer:
   steps_per_loop: 462
   train_steps: 277200
   validation_interval: 462
-  validation_steps: 625
+  validation_steps: 20
diff --git a/official/vision/beta/configs/experiments/retinanet/coco_spinenet96_tpu.yaml b/official/vision/configs/experiments/retinanet/coco_spinenet96_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/retinanet/coco_spinenet96_tpu.yaml
rename to official/vision/configs/experiments/retinanet/coco_spinenet96_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/retinanet/resnet50fpn_coco_tfds_tpu.yaml b/official/vision/configs/experiments/retinanet/resnet50fpn_coco_tfds_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/retinanet/resnet50fpn_coco_tfds_tpu.yaml
rename to official/vision/configs/experiments/retinanet/resnet50fpn_coco_tfds_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/retinanet/resnet50fpn_coco_tpu4x4_benchmark.yaml
b/official/vision/configs/experiments/retinanet/resnet50fpn_coco_tpu4x4_benchmark.yaml similarity index 100% rename from official/vision/beta/configs/experiments/retinanet/resnet50fpn_coco_tpu4x4_benchmark.yaml rename to official/vision/configs/experiments/retinanet/resnet50fpn_coco_tpu4x4_benchmark.yaml diff --git a/official/vision/beta/configs/experiments/semantic_segmentation/deeplabv3plus_resnet101_cityscapes_tfds_tpu.yaml b/official/vision/configs/experiments/semantic_segmentation/deeplabv3plus_resnet101_cityscapes_tfds_tpu.yaml similarity index 100% rename from official/vision/beta/configs/experiments/semantic_segmentation/deeplabv3plus_resnet101_cityscapes_tfds_tpu.yaml rename to official/vision/configs/experiments/semantic_segmentation/deeplabv3plus_resnet101_cityscapes_tfds_tpu.yaml diff --git a/official/vision/beta/configs/experiments/video_classification/k400_3d-resnet50_tpu.yaml b/official/vision/configs/experiments/video_classification/k400_3d-resnet50_tpu.yaml similarity index 100% rename from official/vision/beta/configs/experiments/video_classification/k400_3d-resnet50_tpu.yaml rename to official/vision/configs/experiments/video_classification/k400_3d-resnet50_tpu.yaml diff --git a/official/vision/beta/configs/experiments/video_classification/k400_resnet3drs_50_tpu.yaml b/official/vision/configs/experiments/video_classification/k400_resnet3drs_50_tpu.yaml similarity index 98% rename from official/vision/beta/configs/experiments/video_classification/k400_resnet3drs_50_tpu.yaml rename to official/vision/configs/experiments/video_classification/k400_resnet3drs_50_tpu.yaml index 83875d1273a6f96df05a10fa46e101bf32498c73..baff531b6d465309efc454123581c2af6d178f1e 100644 --- a/official/vision/beta/configs/experiments/video_classification/k400_resnet3drs_50_tpu.yaml +++ b/official/vision/configs/experiments/video_classification/k400_resnet3drs_50_tpu.yaml @@ -42,7 +42,6 @@ task: is_training: true min_image_size: 256 name: kinetics400 - num_channels: 3 
     num_classes: 400
     num_examples: 215570
     num_test_clips: 1
@@ -67,7 +66,6 @@ task:
     is_training: false
     min_image_size: 256
     name: kinetics400
-    num_channels: 3
     num_classes: 400
     num_examples: 17706
     num_test_clips: 10
diff --git a/official/vision/beta/configs/experiments/video_classification/k400_slowonly16x4_tpu.yaml b/official/vision/configs/experiments/video_classification/k400_slowonly16x4_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/video_classification/k400_slowonly16x4_tpu.yaml
rename to official/vision/configs/experiments/video_classification/k400_slowonly16x4_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/video_classification/k400_slowonly8x8_tpu.yaml b/official/vision/configs/experiments/video_classification/k400_slowonly8x8_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/video_classification/k400_slowonly8x8_tpu.yaml
rename to official/vision/configs/experiments/video_classification/k400_slowonly8x8_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/video_classification/k600_3d-resnet50_tpu.yaml b/official/vision/configs/experiments/video_classification/k600_3d-resnet50_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/video_classification/k600_3d-resnet50_tpu.yaml
rename to official/vision/configs/experiments/video_classification/k600_3d-resnet50_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/video_classification/k600_3d-resnet50g_tpu.yaml b/official/vision/configs/experiments/video_classification/k600_3d-resnet50g_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/video_classification/k600_3d-resnet50g_tpu.yaml
rename to official/vision/configs/experiments/video_classification/k600_3d-resnet50g_tpu.yaml
diff --git a/official/vision/beta/configs/experiments/video_classification/k600_slowonly8x8_tpu.yaml b/official/vision/configs/experiments/video_classification/k600_slowonly8x8_tpu.yaml
similarity index 100%
rename from official/vision/beta/configs/experiments/video_classification/k600_slowonly8x8_tpu.yaml
rename to official/vision/configs/experiments/video_classification/k600_slowonly8x8_tpu.yaml
diff --git a/official/vision/configs/image_classification.py b/official/vision/configs/image_classification.py
new file mode 100644
index 0000000000000000000000000000000000000000..b9e81654594a8c4c38c366e30e5c72ea36043ada
--- /dev/null
+++ b/official/vision/configs/image_classification.py
@@ -0,0 +1,602 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Image classification configuration definition."""
+import dataclasses
+import os
+from typing import List, Optional, Tuple
+
+from official.core import config_definitions as cfg
+from official.core import exp_factory
+from official.modeling import hyperparams
+from official.modeling import optimization
+from official.vision.configs import common
+from official.vision.configs import backbones
+
+
+@dataclasses.dataclass
+class DataConfig(cfg.DataConfig):
+  """Input config for training."""
+  input_path: str = ''
+  global_batch_size: int = 0
+  is_training: bool = True
+  dtype: str = 'float32'
+  shuffle_buffer_size: int = 10000
+  cycle_length: int = 10
+  is_multilabel: bool = False
+  aug_rand_hflip: bool = True
+  aug_crop: Optional[bool] = True
+  crop_area_range: Optional[Tuple[float, float]] = (0.08, 1.0)
+  aug_type: Optional[
+      common.Augmentation] = None  # Choose from AutoAugment and RandAugment.
+  color_jitter: float = 0.
+  random_erasing: Optional[common.RandomErasing] = None
+  file_type: str = 'tfrecord'
+  image_field_key: str = 'image/encoded'
+  label_field_key: str = 'image/class/label'
+  decode_jpeg_only: bool = True
+  mixup_and_cutmix: Optional[common.MixupAndCutmix] = None
+  decoder: Optional[common.DataDecoder] = common.DataDecoder()
+
+  # Keep for backward compatibility.
+  aug_policy: Optional[str] = None  # None, 'autoaug', or 'randaug'.
+  randaug_magnitude: Optional[int] = 10
+
+
+@dataclasses.dataclass
+class ImageClassificationModel(hyperparams.Config):
+  """The model config."""
+  num_classes: int = 0
+  input_size: List[int] = dataclasses.field(default_factory=list)
+  backbone: backbones.Backbone = backbones.Backbone(
+      type='resnet', resnet=backbones.ResNet())
+  dropout_rate: float = 0.0
+  norm_activation: common.NormActivation = common.NormActivation(
+      use_sync_bn=False)
+  # Adds a BatchNormalization layer pre-GlobalAveragePooling in classification
+  add_head_batch_norm: bool = False
+  kernel_initializer: str = 'random_uniform'
+  # Whether to output softmax results instead of logits.
+  output_softmax: bool = False
+
+
+@dataclasses.dataclass
+class Losses(hyperparams.Config):
+  loss_weight: float = 1.0
+  one_hot: bool = True
+  label_smoothing: float = 0.0
+  l2_weight_decay: float = 0.0
+  soft_labels: bool = False
+
+
+@dataclasses.dataclass
+class Evaluation(hyperparams.Config):
+  top_k: int = 5
+  precision_and_recall_thresholds: Optional[List[float]] = None
+  report_per_class_precision_and_recall: bool = False
+
+
+@dataclasses.dataclass
+class ImageClassificationTask(cfg.TaskConfig):
+  """The task config."""
+  model: ImageClassificationModel = ImageClassificationModel()
+  train_data: DataConfig = DataConfig(is_training=True)
+  validation_data: DataConfig = DataConfig(is_training=False)
+  losses: Losses = Losses()
+  evaluation: Evaluation = Evaluation()
+  init_checkpoint: Optional[str] = None
+  init_checkpoint_modules: str = 'all'  # all or backbone
+  model_output_keys: Optional[List[int]] = dataclasses.field(
+      default_factory=list)
+  freeze_backbone: bool = False
+
+
+@exp_factory.register_config_factory('image_classification')
+def image_classification() -> cfg.ExperimentConfig:
+  """Image classification general."""
+  return cfg.ExperimentConfig(
+      task=ImageClassificationTask(),
+      trainer=cfg.TrainerConfig(),
+      restrictions=[
+          'task.train_data.is_training != None',
+          'task.validation_data.is_training != None'
+      ])
+
+
+IMAGENET_TRAIN_EXAMPLES = 1281167
+IMAGENET_VAL_EXAMPLES = 50000
+IMAGENET_INPUT_PATH_BASE = 'imagenet-2012-tfrecord'
+
+
+@exp_factory.register_config_factory('resnet_imagenet')
+def image_classification_imagenet() -> cfg.ExperimentConfig:
+  """Image classification on imagenet with resnet."""
+  train_batch_size = 4096
+  eval_batch_size = 4096
+  steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
+  config = cfg.ExperimentConfig(
+      runtime=cfg.RuntimeConfig(enable_xla=True),
+      task=ImageClassificationTask(
+          model=ImageClassificationModel(
+              num_classes=1001,
+              input_size=[224, 224, 3],
+              backbone=backbones.Backbone(
+                  type='resnet', resnet=backbones.ResNet(model_id=50)),
+              norm_activation=common.NormActivation(
+                  norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False)),
+          losses=Losses(l2_weight_decay=1e-4),
+          train_data=DataConfig(
+              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
+              is_training=True,
+              global_batch_size=train_batch_size),
+          validation_data=DataConfig(
+              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
+              is_training=False,
+              global_batch_size=eval_batch_size)),
+      trainer=cfg.TrainerConfig(
+          steps_per_loop=steps_per_epoch,
+          summary_interval=steps_per_epoch,
+          checkpoint_interval=steps_per_epoch,
+          train_steps=90 * steps_per_epoch,
+          validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
+          validation_interval=steps_per_epoch,
+          optimizer_config=optimization.OptimizationConfig({
+              'optimizer': {
+                  'type': 'sgd',
+                  'sgd': {
+                      'momentum': 0.9
+                  }
+              },
+              'learning_rate': {
+                  'type': 'stepwise',
+                  'stepwise': {
+                      'boundaries': [
+                          30 * steps_per_epoch, 60 * steps_per_epoch,
+                          80 * steps_per_epoch
+                      ],
+                      'values': [
+                          0.1 * train_batch_size / 256,
+                          0.01 * train_batch_size / 256,
+                          0.001 * train_batch_size / 256,
+                          0.0001 * train_batch_size / 256,
+                      ]
+                  }
+              },
+              'warmup': {
+                  'type': 'linear',
+                  'linear': {
+                      'warmup_steps': 5 * steps_per_epoch,
+                      'warmup_learning_rate': 0
+                  }
+              }
+          })),
+      restrictions=[
+          'task.train_data.is_training != None',
+          'task.validation_data.is_training != None'
+      ])
+
+  return config
+
+
+@exp_factory.register_config_factory('resnet_rs_imagenet')
+def image_classification_imagenet_resnetrs() -> cfg.ExperimentConfig:
+  """Image classification on imagenet with resnet-rs."""
+  train_batch_size = 4096
+  eval_batch_size = 4096
+  steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
+  config = cfg.ExperimentConfig(
+      task=ImageClassificationTask(
+          model=ImageClassificationModel(
+              num_classes=1001,
+              input_size=[160, 160, 3],
+              backbone=backbones.Backbone(
+                  type='resnet',
+                  resnet=backbones.ResNet(
+                      model_id=50,
+                      stem_type='v1',
+                      resnetd_shortcut=True,
+                      replace_stem_max_pool=True,
+                      se_ratio=0.25,
+                      stochastic_depth_drop_rate=0.0)),
+              dropout_rate=0.25,
+              norm_activation=common.NormActivation(
+                  norm_momentum=0.0,
+                  norm_epsilon=1e-5,
+                  use_sync_bn=False,
+                  activation='swish')),
+          losses=Losses(l2_weight_decay=4e-5, label_smoothing=0.1),
+          train_data=DataConfig(
+              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
+              is_training=True,
+              global_batch_size=train_batch_size,
+              aug_type=common.Augmentation(
+                  type='randaug', randaug=common.RandAugment(magnitude=10))),
+          validation_data=DataConfig(
+              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
+              is_training=False,
+              global_batch_size=eval_batch_size)),
+      trainer=cfg.TrainerConfig(
+          steps_per_loop=steps_per_epoch,
+          summary_interval=steps_per_epoch,
+          checkpoint_interval=steps_per_epoch,
+          train_steps=350 * steps_per_epoch,
+          validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
+          validation_interval=steps_per_epoch,
+          optimizer_config=optimization.OptimizationConfig({
+              'optimizer': {
+                  'type': 'sgd',
+                  'sgd': {
+                      'momentum': 0.9
+                  }
+              },
+              'ema': {
+                  'average_decay': 0.9999,
+                  'trainable_weights_only': False,
+              },
+              'learning_rate': {
+                  'type': 'cosine',
+                  'cosine': {
+                      'initial_learning_rate': 1.6,
+                      'decay_steps': 350 * steps_per_epoch
+                  }
+              },
+              'warmup': {
+                  'type': 'linear',
+                  'linear': {
+                      'warmup_steps': 5 * steps_per_epoch,
+                      'warmup_learning_rate': 0
+                  }
+              }
+          })),
+      restrictions=[
+          'task.train_data.is_training != None',
+          'task.validation_data.is_training != None'
+      ])
+  return config
+
+
+@exp_factory.register_config_factory('revnet_imagenet')
+def image_classification_imagenet_revnet() -> cfg.ExperimentConfig:
+  """Returns a revnet config for image classification on imagenet."""
+  train_batch_size = 4096
+  eval_batch_size = 4096
+  steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
+
+  config = cfg.ExperimentConfig(
+      task=ImageClassificationTask(
+          model=ImageClassificationModel(
+              num_classes=1001,
+              input_size=[224, 224, 3],
+              backbone=backbones.Backbone(
+                  type='revnet', revnet=backbones.RevNet(model_id=56)),
+              norm_activation=common.NormActivation(
+                  norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False),
+              add_head_batch_norm=True),
+          losses=Losses(l2_weight_decay=1e-4),
+          train_data=DataConfig(
+              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
+              is_training=True,
+              global_batch_size=train_batch_size),
+          validation_data=DataConfig(
+              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
+              is_training=False,
+              global_batch_size=eval_batch_size)),
+      trainer=cfg.TrainerConfig(
+          steps_per_loop=steps_per_epoch,
+          summary_interval=steps_per_epoch,
+          checkpoint_interval=steps_per_epoch,
+          train_steps=90 * steps_per_epoch,
+          validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
+          validation_interval=steps_per_epoch,
+          optimizer_config=optimization.OptimizationConfig({
+              'optimizer': {
+                  'type': 'sgd',
+                  'sgd': {
+                      'momentum': 0.9
+                  }
+              },
+              'learning_rate': {
+                  'type': 'stepwise',
+                  'stepwise': {
+                      'boundaries': [
+                          30 * steps_per_epoch, 60 * steps_per_epoch,
+                          80 * steps_per_epoch
+                      ],
+                      'values': [0.8, 0.08, 0.008, 0.0008]
+                  }
+              },
+              'warmup': {
+                  'type': 'linear',
+                  'linear': {
+                      'warmup_steps': 5 * steps_per_epoch,
+                      'warmup_learning_rate': 0
+                  }
+              }
+          })),
+      restrictions=[
+          'task.train_data.is_training != None',
+          'task.validation_data.is_training != None'
+      ])
+
+  return config
+
+
+@exp_factory.register_config_factory('mobilenet_imagenet')
+def image_classification_imagenet_mobilenet() -> cfg.ExperimentConfig:
+  """Image classification on imagenet with mobilenet."""
+  train_batch_size = 4096
+  eval_batch_size = 4096
+  steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
+  config = cfg.ExperimentConfig(
+      task=ImageClassificationTask(
+          model=ImageClassificationModel(
+              num_classes=1001,
+              dropout_rate=0.2,
+              input_size=[224, 224, 3],
+              backbone=backbones.Backbone(
+                  type='mobilenet',
+                  mobilenet=backbones.MobileNet(
+                      model_id='MobileNetV2', filter_size_scale=1.0)),
+              norm_activation=common.NormActivation(
+                  norm_momentum=0.997, norm_epsilon=1e-3, use_sync_bn=False)),
+          losses=Losses(l2_weight_decay=1e-5, label_smoothing=0.1),
+          train_data=DataConfig(
+              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
+              is_training=True,
+              global_batch_size=train_batch_size),
+          validation_data=DataConfig(
+              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
+              is_training=False,
+              global_batch_size=eval_batch_size)),
+      trainer=cfg.TrainerConfig(
+          steps_per_loop=steps_per_epoch,
+          summary_interval=steps_per_epoch,
+          checkpoint_interval=steps_per_epoch,
+          train_steps=500 * steps_per_epoch,
+          validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
+          validation_interval=steps_per_epoch,
+          optimizer_config=optimization.OptimizationConfig({
+              'optimizer': {
+                  'type': 'rmsprop',
+                  'rmsprop': {
+                      'rho': 0.9,
+                      'momentum': 0.9,
+                      'epsilon': 0.002,
+                  }
+              },
+              'learning_rate': {
+                  'type': 'exponential',
+                  'exponential': {
+                      'initial_learning_rate':
+                          0.008 * (train_batch_size // 128),
+                      'decay_steps':
+                          int(2.5 * steps_per_epoch),
+                      'decay_rate':
+                          0.98,
+                      'staircase':
+                          True
+                  }
+              },
+              'warmup': {
+                  'type': 'linear',
+                  'linear': {
+                      'warmup_steps': 5 * steps_per_epoch,
+                      'warmup_learning_rate': 0
+                  }
+              },
+          })),
+      restrictions=[
+          'task.train_data.is_training != None',
+          'task.validation_data.is_training != None'
+      ])
+
+  return config
+
+
+@exp_factory.register_config_factory('deit_imagenet_pretrain')
+def image_classification_imagenet_deit_pretrain() -> cfg.ExperimentConfig:
+  """Image classification on imagenet with vision transformer."""
+  train_batch_size = 4096  # originally was 1024 but 4096 better for tpu v3-32
+  eval_batch_size = 4096  # originally was 1024 but 4096 better for tpu v3-32
+  label_smoothing = 0.1
+  steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
+  config = cfg.ExperimentConfig(
+      task=ImageClassificationTask(
+          model=ImageClassificationModel(
+              num_classes=1001,
+              input_size=[224, 224, 3],
+              kernel_initializer='zeros',
+              backbone=backbones.Backbone(
+                  type='vit',
+                  vit=backbones.VisionTransformer(
+                      model_name='vit-b16',
+                      representation_size=768,
+                      init_stochastic_depth_rate=0.1,
+                      original_init=False,
+                      transformer=backbones.Transformer(
+                          dropout_rate=0.0, attention_dropout_rate=0.0)))),
+          losses=Losses(
+              l2_weight_decay=0.0,
+              label_smoothing=label_smoothing,
+              one_hot=False,
+              soft_labels=True),
+          train_data=DataConfig(
+              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
+              is_training=True,
+              global_batch_size=train_batch_size,
+              aug_type=common.Augmentation(
+                  type='randaug',
+                  randaug=common.RandAugment(
+                      magnitude=9, exclude_ops=['Cutout'])),
+              mixup_and_cutmix=common.MixupAndCutmix(
+                  label_smoothing=label_smoothing)),
+          validation_data=DataConfig(
+              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
+              is_training=False,
+              global_batch_size=eval_batch_size)),
+      trainer=cfg.TrainerConfig(
+          steps_per_loop=steps_per_epoch,
+          summary_interval=steps_per_epoch,
+          checkpoint_interval=steps_per_epoch,
+          train_steps=300 * steps_per_epoch,
+          validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
+          validation_interval=steps_per_epoch,
+          optimizer_config=optimization.OptimizationConfig({
+              'optimizer': {
+                  'type': 'adamw',
+                  'adamw': {
+                      'weight_decay_rate': 0.05,
+                      'include_in_weight_decay': r'.*(kernel|weight):0$',
+                      'gradient_clip_norm': 0.0
+                  }
+              },
+              'learning_rate': {
+                  'type': 'cosine',
+                  'cosine': {
+                      'initial_learning_rate': 0.0005 * train_batch_size / 512,
+                      'decay_steps': 300 * steps_per_epoch,
+                  }
+              },
+              'warmup': {
+                  'type': 'linear',
+                  'linear': {
+                      'warmup_steps': 5 * steps_per_epoch,
+                      'warmup_learning_rate': 0
+                  }
+              }
+          })),
+      restrictions=[
+          'task.train_data.is_training != None',
+          'task.validation_data.is_training != None'
+      ])
+
+  return config
+
+
+@exp_factory.register_config_factory('vit_imagenet_pretrain')
+def image_classification_imagenet_vit_pretrain() -> cfg.ExperimentConfig:
+  """Image classification on imagenet with vision transformer."""
+  train_batch_size = 4096
+  eval_batch_size = 4096
+  steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
+  config = cfg.ExperimentConfig(
+      task=ImageClassificationTask(
+          model=ImageClassificationModel(
+              num_classes=1001,
+              input_size=[224, 224, 3],
+              kernel_initializer='zeros',
+              backbone=backbones.Backbone(
+                  type='vit',
+                  vit=backbones.VisionTransformer(
+                      model_name='vit-b16', representation_size=768))),
+          losses=Losses(l2_weight_decay=0.0),
+          train_data=DataConfig(
+              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
+              is_training=True,
+              global_batch_size=train_batch_size),
+          validation_data=DataConfig(
+              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
+              is_training=False,
+              global_batch_size=eval_batch_size)),
+      trainer=cfg.TrainerConfig(
+          steps_per_loop=steps_per_epoch,
+          summary_interval=steps_per_epoch,
+          checkpoint_interval=steps_per_epoch,
+          train_steps=300 * steps_per_epoch,
+          validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
+          validation_interval=steps_per_epoch,
+          optimizer_config=optimization.OptimizationConfig({
+              'optimizer': {
+                  'type': 'adamw',
+                  'adamw': {
+                      'weight_decay_rate': 0.3,
+                      'include_in_weight_decay': r'.*(kernel|weight):0$',
+                      'gradient_clip_norm': 0.0
+                  }
+              },
+              'learning_rate': {
+                  'type': 'cosine',
+                  'cosine': {
+                      'initial_learning_rate': 0.003 * train_batch_size / 4096,
+                      'decay_steps': 300 * steps_per_epoch,
+                  }
+              },
+              'warmup': {
+                  'type': 'linear',
+                  'linear': {
+                      'warmup_steps': 10000,
+                      'warmup_learning_rate': 0
+                  }
+              }
+          })),
+      restrictions=[
+          'task.train_data.is_training != None',
+          'task.validation_data.is_training != None'
+      ])
+
+  return config
+
+
+@exp_factory.register_config_factory('vit_imagenet_finetune')
+def image_classification_imagenet_vit_finetune() -> cfg.ExperimentConfig:
+  """Image classification on imagenet with vision transformer."""
+  train_batch_size = 512
+  eval_batch_size = 512
+  steps_per_epoch = IMAGENET_TRAIN_EXAMPLES // train_batch_size
+  config = cfg.ExperimentConfig(
+      task=ImageClassificationTask(
+          model=ImageClassificationModel(
+              num_classes=1001,
+              input_size=[384, 384, 3],
+              backbone=backbones.Backbone(
+                  type='vit',
+                  vit=backbones.VisionTransformer(model_name='vit-b16'))),
+          losses=Losses(l2_weight_decay=0.0),
+          train_data=DataConfig(
+              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'train*'),
+              is_training=True,
+              global_batch_size=train_batch_size),
+          validation_data=DataConfig(
+              input_path=os.path.join(IMAGENET_INPUT_PATH_BASE, 'valid*'),
+              is_training=False,
+              global_batch_size=eval_batch_size)),
+      trainer=cfg.TrainerConfig(
+          steps_per_loop=steps_per_epoch,
+          summary_interval=steps_per_epoch,
+          checkpoint_interval=steps_per_epoch,
+          train_steps=20000,
+          validation_steps=IMAGENET_VAL_EXAMPLES // eval_batch_size,
+          validation_interval=steps_per_epoch,
+          optimizer_config=optimization.OptimizationConfig({
+              'optimizer': {
+                  'type': 'sgd',
+                  'sgd': {
+                      'momentum': 0.9,
+                      'global_clipnorm': 1.0,
+                  }
+              },
+              'learning_rate': {
+                  'type': 'cosine',
+                  'cosine': {
+                      'initial_learning_rate': 0.003,
+                      'decay_steps': 20000,
+                  }
+              }
+          })),
+      restrictions=[
+          'task.train_data.is_training != None',
+          'task.validation_data.is_training != None'
+      ])
+
+  return config
diff --git a/official/vision/configs/image_classification_test.py b/official/vision/configs/image_classification_test.py
new file mode 100644
index 0000000000000000000000000000000000000000..8d3cd4d1ab3fbc0c87cb67d846f4afec19e72883
--- /dev/null
+++ b/official/vision/configs/image_classification_test.py
@@ -0,0 +1,51 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Tests for image_classification."""
+# pylint: disable=unused-import
+from absl.testing import parameterized
+import tensorflow as tf
+
+from official import vision
+from official.core import config_definitions as cfg
+from official.core import exp_factory
+from official.vision.configs import image_classification as exp_cfg
+
+
+class ImageClassificationConfigTest(tf.test.TestCase, parameterized.TestCase):
+
+  @parameterized.parameters(
+      ('resnet_imagenet',),
+      ('resnet_rs_imagenet',),
+      ('revnet_imagenet',),
+      ('mobilenet_imagenet',),
+      ('deit_imagenet_pretrain',),
+      ('vit_imagenet_pretrain',),
+      ('vit_imagenet_finetune',),
+  )
+  def test_image_classification_configs(self, config_name):
+    config = exp_factory.get_exp_config(config_name)
+    self.assertIsInstance(config, cfg.ExperimentConfig)
+    self.assertIsInstance(config.task, exp_cfg.ImageClassificationTask)
+    self.assertIsInstance(config.task.model,
+                          exp_cfg.ImageClassificationModel)
+    self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig)
+    config.validate()
+    config.task.train_data.is_training = None
+    with self.assertRaises(KeyError):
+      config.validate()
+
+
+if __name__ == '__main__':
+  tf.test.main()
diff --git a/official/vision/configs/maskrcnn.py b/official/vision/configs/maskrcnn.py
new file mode 100644
index 0000000000000000000000000000000000000000..62abc3e9c9f6648938ad4bef60e13e51e29122d9
--- /dev/null
+++ b/official/vision/configs/maskrcnn.py
@@ -0,0 +1,616 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""R-CNN(-RS) configuration definition."""
+
+import dataclasses
+import os
+from typing import List, Optional, Union
+
+from official.core import config_definitions as cfg
+from official.core import exp_factory
+from official.modeling import hyperparams
+from official.modeling import optimization
+from official.vision.configs import common
+from official.vision.configs import decoders
+from official.vision.configs import backbones
+
+
+# pylint: disable=missing-class-docstring
+@dataclasses.dataclass
+class Parser(hyperparams.Config):
+  num_channels: int = 3
+  match_threshold: float = 0.5
+  unmatched_threshold: float = 0.5
+  aug_rand_hflip: bool = False
+  aug_scale_min: float = 1.0
+  aug_scale_max: float = 1.0
+  aug_type: Optional[
+      common.Augmentation] = None  # Choose from AutoAugment and RandAugment.
+  skip_crowd_during_training: bool = True
+  max_num_instances: int = 100
+  rpn_match_threshold: float = 0.7
+  rpn_unmatched_threshold: float = 0.3
+  rpn_batch_size_per_im: int = 256
+  rpn_fg_fraction: float = 0.5
+  mask_crop_size: int = 112
+
+
+@dataclasses.dataclass
+class DataConfig(cfg.DataConfig):
+  """Input config for training."""
+  input_path: str = ''
+  global_batch_size: int = 0
+  is_training: bool = False
+  dtype: str = 'bfloat16'
+  decoder: common.DataDecoder = common.DataDecoder()
+  parser: Parser = Parser()
+  shuffle_buffer_size: int = 10000
+  file_type: str = 'tfrecord'
+  drop_remainder: bool = True
+  # Number of examples in the data set, it's used to create the annotation file.
+  num_examples: int = -1
+
+
+@dataclasses.dataclass
+class Anchor(hyperparams.Config):
+  num_scales: int = 1
+  aspect_ratios: List[float] = dataclasses.field(
+      default_factory=lambda: [0.5, 1.0, 2.0])
+  anchor_size: float = 8.0
+
+
+@dataclasses.dataclass
+class RPNHead(hyperparams.Config):
+  num_convs: int = 1
+  num_filters: int = 256
+  use_separable_conv: bool = False
+
+
+@dataclasses.dataclass
+class DetectionHead(hyperparams.Config):
+  num_convs: int = 4
+  num_filters: int = 256
+  use_separable_conv: bool = False
+  num_fcs: int = 1
+  fc_dims: int = 1024
+  class_agnostic_bbox_pred: bool = False  # Has to be True for Cascade RCNN.
+  # If additional IoUs are passed in 'cascade_iou_thresholds'
+  # then ensemble the class probabilities from all heads.
+  cascade_class_ensemble: bool = False
+
+
+@dataclasses.dataclass
+class ROIGenerator(hyperparams.Config):
+  pre_nms_top_k: int = 2000
+  pre_nms_score_threshold: float = 0.0
+  pre_nms_min_size_threshold: float = 0.0
+  nms_iou_threshold: float = 0.7
+  num_proposals: int = 1000
+  test_pre_nms_top_k: int = 1000
+  test_pre_nms_score_threshold: float = 0.0
+  test_pre_nms_min_size_threshold: float = 0.0
+  test_nms_iou_threshold: float = 0.7
+  test_num_proposals: int = 1000
+  use_batched_nms: bool = False
+
+
+@dataclasses.dataclass
+class ROISampler(hyperparams.Config):
+  mix_gt_boxes: bool = True
+  num_sampled_rois: int = 512
+  foreground_fraction: float = 0.25
+  foreground_iou_threshold: float = 0.5
+  background_iou_high_threshold: float = 0.5
+  background_iou_low_threshold: float = 0.0
+  # IoU thresholds for additional FRCNN heads in Cascade mode.
+  # `foreground_iou_threshold` is the first threshold.
+ cascade_iou_thresholds: Optional[List[float]] = None + + +@dataclasses.dataclass +class ROIAligner(hyperparams.Config): + crop_size: int = 7 + sample_offset: float = 0.5 + + +@dataclasses.dataclass +class DetectionGenerator(hyperparams.Config): + apply_nms: bool = True + pre_nms_top_k: int = 5000 + pre_nms_score_threshold: float = 0.05 + nms_iou_threshold: float = 0.5 + max_num_detections: int = 100 + nms_version: str = 'v2' # `v2`, `v1`, `batched` + use_cpu_nms: bool = False + soft_nms_sigma: Optional[float] = None # Only works when nms_version='v1'. + + +@dataclasses.dataclass +class MaskHead(hyperparams.Config): + upsample_factor: int = 2 + num_convs: int = 4 + num_filters: int = 256 + use_separable_conv: bool = False + class_agnostic: bool = False + + +@dataclasses.dataclass +class MaskSampler(hyperparams.Config): + num_sampled_masks: int = 128 + + +@dataclasses.dataclass +class MaskROIAligner(hyperparams.Config): + crop_size: int = 14 + sample_offset: float = 0.5 + + +@dataclasses.dataclass +class MaskRCNN(hyperparams.Config): + num_classes: int = 0 + input_size: List[int] = dataclasses.field(default_factory=list) + min_level: int = 2 + max_level: int = 6 + anchor: Anchor = Anchor() + include_mask: bool = True + backbone: backbones.Backbone = backbones.Backbone( + type='resnet', resnet=backbones.ResNet()) + decoder: decoders.Decoder = decoders.Decoder( + type='fpn', fpn=decoders.FPN()) + rpn_head: RPNHead = RPNHead() + detection_head: DetectionHead = DetectionHead() + roi_generator: ROIGenerator = ROIGenerator() + roi_sampler: ROISampler = ROISampler() + roi_aligner: ROIAligner = ROIAligner() + detection_generator: DetectionGenerator = DetectionGenerator() + mask_head: Optional[MaskHead] = MaskHead() + mask_sampler: Optional[MaskSampler] = MaskSampler() + mask_roi_aligner: Optional[MaskROIAligner] = MaskROIAligner() + norm_activation: common.NormActivation = common.NormActivation( + norm_momentum=0.997, + norm_epsilon=0.0001, + use_sync_bn=True) + + 
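The `Anchor` fields (`num_scales`, `aspect_ratios`, `anchor_size`) and the model's `min_level`/`max_level` together determine the anchor templates at each feature level. The sketch below is a minimal illustration of how these numbers may combine. It assumes three things: the base size at level `l` is `anchor_size * 2**l`, each intermediate scale `s` multiplies that base by `2**(s / num_scales)`, and an aspect ratio is width divided by height, with area preserved. The helper `anchor_shapes` is hypothetical and is not part of the Model Garden API.

```python
import math
from typing import Dict, List, Tuple


def anchor_shapes(min_level: int, max_level: int, num_scales: int,
                  aspect_ratios: List[float],
                  anchor_size: float) -> Dict[int, List[Tuple[float, float]]]:
  """Returns the (width, height) of every anchor template, keyed by level.

  Hypothetical helper that illustrates an assumed convention. It is not the
  Model Garden implementation.
  """
  shapes = {}
  for level in range(min_level, max_level + 1):
    stride = 2 ** level  # Feature-map stride at this pyramid level.
    level_shapes = []
    for scale in range(num_scales):
      # Intermediate octave scales: 2**0, 2**(1/n), ..., 2**((n-1)/n).
      base = anchor_size * stride * 2 ** (scale / num_scales)
      for ratio in aspect_ratios:
        # Assumed: ratio = width / height, with the area kept at base**2.
        width = base * math.sqrt(ratio)
        height = base / math.sqrt(ratio)
        level_shapes.append((width, height))
    shapes[level] = level_shapes
  return shapes


# Mask R-CNN defaults above: anchor_size=8.0, num_scales=1, levels 2 to 6.
mask_rcnn = anchor_shapes(2, 6, 1, [0.5, 1.0, 2.0], 8.0)
print(mask_rcnn[2][1])  # Square anchor at level 2: (32.0, 32.0)
print(mask_rcnn[6][1])  # Square anchor at level 6: (512.0, 512.0)
```

Under the same assumptions, the RetinaNet defaults (`anchor_size=4.0`, `num_scales=3`) would produce square anchors of 32, about 40.3, and about 50.8 pixels at level 3. This is why that config can cover a similar size range with a smaller `anchor_size` but more scales per octave.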
+@dataclasses.dataclass +class Losses(hyperparams.Config): + loss_weight: float = 1.0 + rpn_huber_loss_delta: float = 1. / 9. + frcnn_huber_loss_delta: float = 1. + l2_weight_decay: float = 0.0 + rpn_score_weight: float = 1.0 + rpn_box_weight: float = 1.0 + frcnn_class_weight: float = 1.0 + frcnn_box_weight: float = 1.0 + mask_weight: float = 1.0 + + +@dataclasses.dataclass +class MaskRCNNTask(cfg.TaskConfig): + model: MaskRCNN = MaskRCNN() + train_data: DataConfig = DataConfig(is_training=True) + validation_data: DataConfig = DataConfig(is_training=False, + drop_remainder=False) + losses: Losses = Losses() + init_checkpoint: Optional[str] = None + init_checkpoint_modules: Union[ + str, List[str]] = 'all' # all, backbone, and/or decoder + annotation_file: Optional[str] = None + per_category_metrics: bool = False + # If set, we only use masks for the specified class IDs. + allowed_mask_class_ids: Optional[List[int]] = None + # If set, the COCO metrics will be computed. + use_coco_metrics: bool = True + # If set, the Waymo Open Dataset evaluator would be used. + use_wod_metrics: bool = False + + # If set, freezes the backbone during training. + # TODO(crisnv) Add paper link when available. 
+ freeze_backbone: bool = False + + +COCO_INPUT_PATH_BASE = 'coco' + + +@exp_factory.register_config_factory('fasterrcnn_resnetfpn_coco') +def fasterrcnn_resnetfpn_coco() -> cfg.ExperimentConfig: + """COCO object detection with Faster R-CNN.""" + steps_per_epoch = 500 + coco_val_samples = 5000 + train_batch_size = 64 + eval_batch_size = 8 + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=MaskRCNNTask( + init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080', + init_checkpoint_modules='backbone', + annotation_file=os.path.join(COCO_INPUT_PATH_BASE, + 'instances_val2017.json'), + model=MaskRCNN( + num_classes=91, + input_size=[1024, 1024, 3], + include_mask=False, + mask_head=None, + mask_sampler=None, + mask_roi_aligner=None), + losses=Losses(l2_weight_decay=0.00004), + train_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + parser=Parser( + aug_rand_hflip=True, aug_scale_min=0.8, aug_scale_max=1.25)), + validation_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + drop_remainder=False)), + trainer=cfg.TrainerConfig( + train_steps=22500, + validation_steps=coco_val_samples // eval_batch_size, + validation_interval=steps_per_epoch, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [15000, 20000], + 'values': [0.12, 0.012, 0.0012], + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 500, + 'warmup_learning_rate': 0.0067 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + 
return config + + +@exp_factory.register_config_factory('maskrcnn_resnetfpn_coco') +def maskrcnn_resnetfpn_coco() -> cfg.ExperimentConfig: + """COCO object detection with Mask R-CNN.""" + steps_per_epoch = 500 + coco_val_samples = 5000 + train_batch_size = 64 + eval_batch_size = 8 + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig( + mixed_precision_dtype='bfloat16', enable_xla=True), + task=MaskRCNNTask( + init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080', + init_checkpoint_modules='backbone', + annotation_file=os.path.join(COCO_INPUT_PATH_BASE, + 'instances_val2017.json'), + model=MaskRCNN( + num_classes=91, input_size=[1024, 1024, 3], include_mask=True), + losses=Losses(l2_weight_decay=0.00004), + train_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + parser=Parser( + aug_rand_hflip=True, aug_scale_min=0.8, aug_scale_max=1.25)), + validation_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + drop_remainder=False)), + trainer=cfg.TrainerConfig( + train_steps=22500, + validation_steps=coco_val_samples // eval_batch_size, + validation_interval=steps_per_epoch, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [15000, 20000], + 'values': [0.12, 0.012, 0.0012], + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 500, + 'warmup_learning_rate': 0.0067 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + return config + + +@exp_factory.register_config_factory('maskrcnn_spinenet_coco') +def 
maskrcnn_spinenet_coco() -> cfg.ExperimentConfig: + """COCO object detection with Mask R-CNN with SpineNet backbone.""" + steps_per_epoch = 463 + coco_val_samples = 5000 + train_batch_size = 256 + eval_batch_size = 8 + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=MaskRCNNTask( + annotation_file=os.path.join(COCO_INPUT_PATH_BASE, + 'instances_val2017.json'), + model=MaskRCNN( + backbone=backbones.Backbone( + type='spinenet', + spinenet=backbones.SpineNet( + model_id='49', + min_level=3, + max_level=7, + )), + decoder=decoders.Decoder( + type='identity', identity=decoders.Identity()), + anchor=Anchor(anchor_size=3), + norm_activation=common.NormActivation(use_sync_bn=True), + num_classes=91, + input_size=[640, 640, 3], + min_level=3, + max_level=7, + include_mask=True), + losses=Losses(l2_weight_decay=0.00004), + train_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + parser=Parser( + aug_rand_hflip=True, aug_scale_min=0.5, aug_scale_max=2.0)), + validation_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + drop_remainder=False)), + trainer=cfg.TrainerConfig( + train_steps=steps_per_epoch * 350, + validation_steps=coco_val_samples // eval_batch_size, + validation_interval=steps_per_epoch, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [ + steps_per_epoch * 320, steps_per_epoch * 340 + ], + 'values': [0.32, 0.032, 0.0032], + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 2000, + 'warmup_learning_rate': 0.0067 + } + } + })), + restrictions=[ + 
'task.train_data.is_training != None', + 'task.validation_data.is_training != None', + 'task.model.min_level == task.model.backbone.spinenet.min_level', + 'task.model.max_level == task.model.backbone.spinenet.max_level', + ]) + return config + + +@exp_factory.register_config_factory('cascadercnn_spinenet_coco') +def cascadercnn_spinenet_coco() -> cfg.ExperimentConfig: + """COCO object detection with Cascade RCNN-RS with SpineNet backbone.""" + steps_per_epoch = 463 + coco_val_samples = 5000 + train_batch_size = 256 + eval_batch_size = 8 + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=MaskRCNNTask( + annotation_file=os.path.join(COCO_INPUT_PATH_BASE, + 'instances_val2017.json'), + model=MaskRCNN( + backbone=backbones.Backbone( + type='spinenet', + spinenet=backbones.SpineNet( + model_id='49', + min_level=3, + max_level=7, + )), + decoder=decoders.Decoder( + type='identity', identity=decoders.Identity()), + roi_sampler=ROISampler(cascade_iou_thresholds=[0.6, 0.7]), + detection_head=DetectionHead( + class_agnostic_bbox_pred=True, cascade_class_ensemble=True), + anchor=Anchor(anchor_size=3), + norm_activation=common.NormActivation( + use_sync_bn=True, activation='swish'), + num_classes=91, + input_size=[640, 640, 3], + min_level=3, + max_level=7, + include_mask=True), + losses=Losses(l2_weight_decay=0.00004), + train_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + parser=Parser( + aug_rand_hflip=True, aug_scale_min=0.1, aug_scale_max=2.5)), + validation_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + drop_remainder=False)), + trainer=cfg.TrainerConfig( + train_steps=steps_per_epoch * 500, + validation_steps=coco_val_samples // eval_batch_size, + validation_interval=steps_per_epoch, + steps_per_loop=steps_per_epoch, + 
summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [ + steps_per_epoch * 475, steps_per_epoch * 490 + ], + 'values': [0.32, 0.032, 0.0032], + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 2000, + 'warmup_learning_rate': 0.0067 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None', + 'task.model.min_level == task.model.backbone.spinenet.min_level', + 'task.model.max_level == task.model.backbone.spinenet.max_level', + ]) + return config + + +@exp_factory.register_config_factory('maskrcnn_mobilenet_coco') +def maskrcnn_mobilenet_coco() -> cfg.ExperimentConfig: + """COCO object detection with Mask R-CNN with MobileNet backbone.""" + steps_per_epoch = 232 + coco_val_samples = 5000 + train_batch_size = 512 + eval_batch_size = 512 + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=MaskRCNNTask( + annotation_file=os.path.join(COCO_INPUT_PATH_BASE, + 'instances_val2017.json'), + model=MaskRCNN( + backbone=backbones.Backbone( + type='mobilenet', + mobilenet=backbones.MobileNet(model_id='MobileNetV2')), + decoder=decoders.Decoder( + type='fpn', + fpn=decoders.FPN(num_filters=128, use_separable_conv=True)), + rpn_head=RPNHead(use_separable_conv=True, + num_filters=128), # 1/2 of original channels. + detection_head=DetectionHead( + use_separable_conv=True, num_filters=128, + fc_dims=512), # 1/2 of original channels. + mask_head=MaskHead(use_separable_conv=True, + num_filters=128), # 1/2 of original channels. 
+ anchor=Anchor(anchor_size=3), + norm_activation=common.NormActivation( + activation='relu6', + norm_momentum=0.99, + norm_epsilon=0.001, + use_sync_bn=True), + num_classes=91, + input_size=[512, 512, 3], + min_level=3, + max_level=6, + include_mask=True), + losses=Losses(l2_weight_decay=0.00004), + train_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + parser=Parser( + aug_rand_hflip=True, aug_scale_min=0.5, aug_scale_max=2.0)), + validation_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + drop_remainder=False)), + trainer=cfg.TrainerConfig( + train_steps=steps_per_epoch * 350, + validation_steps=coco_val_samples // eval_batch_size, + validation_interval=steps_per_epoch, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [ + steps_per_epoch * 320, steps_per_epoch * 340 + ], + 'values': [0.32, 0.032, 0.0032], + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 2000, + 'warmup_learning_rate': 0.0067 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None', + ]) + return config diff --git a/official/vision/beta/configs/maskrcnn_test.py b/official/vision/configs/maskrcnn_test.py similarity index 86% rename from official/vision/beta/configs/maskrcnn_test.py rename to official/vision/configs/maskrcnn_test.py index 28eab0164fbc4ac51554fa6adb3b25ca44f93727..c5f818d4373759d67513d509b4e5372fbd70c875 100644 --- a/official/vision/beta/configs/maskrcnn_test.py +++ b/official/vision/configs/maskrcnn_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. 
All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,10 +17,10 @@ from absl.testing import parameterized import tensorflow as tf +from official import vision from official.core import config_definitions as cfg from official.core import exp_factory -from official.vision import beta -from official.vision.beta.configs import maskrcnn as exp_cfg +from official.vision.configs import maskrcnn as exp_cfg class MaskRCNNConfigTest(tf.test.TestCase, parameterized.TestCase): @@ -39,7 +39,7 @@ class MaskRCNNConfigTest(tf.test.TestCase, parameterized.TestCase): self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig) config.validate() config.task.train_data.is_training = None - with self.assertRaisesRegex(KeyError, 'Found inconsistncy between key'): + with self.assertRaisesRegex(KeyError, 'Found inconsistency between key'): config.validate() diff --git a/official/vision/configs/retinanet.py b/official/vision/configs/retinanet.py new file mode 100644 index 0000000000000000000000000000000000000000..ccd8a4c797c5f542260e25bfb8670bd2491c1f9a --- /dev/null +++ b/official/vision/configs/retinanet.py @@ -0,0 +1,435 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""RetinaNet configuration definition.""" + +import dataclasses +import os +from typing import List, Optional, Union + +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import hyperparams +from official.modeling import optimization +from official.vision.configs import common +from official.vision.configs import decoders +from official.vision.configs import backbones + + +# pylint: disable=missing-class-docstring +# Keep for backward compatibility. +@dataclasses.dataclass +class TfExampleDecoder(common.TfExampleDecoder): + """A simple TF Example decoder config.""" + + +# Keep for backward compatibility. +@dataclasses.dataclass +class TfExampleDecoderLabelMap(common.TfExampleDecoderLabelMap): + """TF Example decoder with label map config.""" + + +# Keep for backward compatibility. +@dataclasses.dataclass +class DataDecoder(common.DataDecoder): + """Data decoder config.""" + + +@dataclasses.dataclass +class Parser(hyperparams.Config): + num_channels: int = 3 + match_threshold: float = 0.5 + unmatched_threshold: float = 0.5 + aug_rand_hflip: bool = False + aug_scale_min: float = 1.0 + aug_scale_max: float = 1.0 + skip_crowd_during_training: bool = True + max_num_instances: int = 100 + # Can choose AutoAugment and RandAugment. + aug_type: Optional[common.Augmentation] = None + + # Keep for backward compatibility. Not used. 
+ aug_policy: Optional[str] = None + + +@dataclasses.dataclass +class DataConfig(cfg.DataConfig): + """Input config for training.""" + input_path: str = '' + global_batch_size: int = 0 + is_training: bool = False + dtype: str = 'bfloat16' + decoder: common.DataDecoder = common.DataDecoder() + parser: Parser = Parser() + shuffle_buffer_size: int = 10000 + file_type: str = 'tfrecord' + + +@dataclasses.dataclass +class Anchor(hyperparams.Config): + num_scales: int = 3 + aspect_ratios: List[float] = dataclasses.field( + default_factory=lambda: [0.5, 1.0, 2.0]) + anchor_size: float = 4.0 + + +@dataclasses.dataclass +class Losses(hyperparams.Config): + loss_weight: float = 1.0 + focal_loss_alpha: float = 0.25 + focal_loss_gamma: float = 1.5 + huber_loss_delta: float = 0.1 + box_loss_weight: int = 50 + l2_weight_decay: float = 0.0 + + +@dataclasses.dataclass +class AttributeHead(hyperparams.Config): + name: str = '' + type: str = 'regression' + size: int = 1 + + +@dataclasses.dataclass +class RetinaNetHead(hyperparams.Config): + num_convs: int = 4 + num_filters: int = 256 + use_separable_conv: bool = False + attribute_heads: List[AttributeHead] = dataclasses.field(default_factory=list) + share_classification_heads: bool = False + + +@dataclasses.dataclass +class DetectionGenerator(hyperparams.Config): + apply_nms: bool = True + pre_nms_top_k: int = 5000 + pre_nms_score_threshold: float = 0.05 + nms_iou_threshold: float = 0.5 + max_num_detections: int = 100 + nms_version: str = 'v2' # `v2`, `v1`, `batched`, or `tflite`. + use_cpu_nms: bool = False + soft_nms_sigma: Optional[float] = None # Only works when nms_version='v1'. + + # When nms_version = `tflite`, values from tflite_post_processing need to be + # specified. They are compatible with the input arguments used by TFLite + # custom NMS op and override above parameters. 
+ tflite_post_processing: common.TFLitePostProcessingConfig = common.TFLitePostProcessingConfig( + ) + + +@dataclasses.dataclass +class RetinaNet(hyperparams.Config): + num_classes: int = 0 + input_size: List[int] = dataclasses.field(default_factory=list) + min_level: int = 3 + max_level: int = 7 + anchor: Anchor = Anchor() + backbone: backbones.Backbone = backbones.Backbone( + type='resnet', resnet=backbones.ResNet()) + decoder: decoders.Decoder = decoders.Decoder( + type='fpn', fpn=decoders.FPN()) + head: RetinaNetHead = RetinaNetHead() + detection_generator: DetectionGenerator = DetectionGenerator() + norm_activation: common.NormActivation = common.NormActivation() + + +@dataclasses.dataclass +class ExportConfig(hyperparams.Config): + output_normalized_coordinates: bool = False + cast_num_detections_to_float: bool = False + cast_detection_classes_to_float: bool = False + + +@dataclasses.dataclass +class RetinaNetTask(cfg.TaskConfig): + model: RetinaNet = RetinaNet() + train_data: DataConfig = DataConfig(is_training=True) + validation_data: DataConfig = DataConfig(is_training=False) + losses: Losses = Losses() + init_checkpoint: Optional[str] = None + init_checkpoint_modules: Union[ + str, List[str]] = 'all' # all, backbone, and/or decoder + annotation_file: Optional[str] = None + per_category_metrics: bool = False + export_config: ExportConfig = ExportConfig() + # If set, the COCO metrics will be computed. + use_coco_metrics: bool = True + # If set, the Waymo Open Dataset evaluator would be used. + use_wod_metrics: bool = False + + # If set, freezes the backbone during training. + # TODO(crisnv) Add paper link when available. 
+ freeze_backbone: bool = False + + +@exp_factory.register_config_factory('retinanet') +def retinanet() -> cfg.ExperimentConfig: + """RetinaNet general config.""" + return cfg.ExperimentConfig( + task=RetinaNetTask(), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + +COCO_INPUT_PATH_BASE = 'coco' +COCO_TRAIN_EXAMPLES = 118287 +COCO_VAL_EXAMPLES = 5000 + + +@exp_factory.register_config_factory('retinanet_resnetfpn_coco') +def retinanet_resnetfpn_coco() -> cfg.ExperimentConfig: + """COCO object detection with RetinaNet.""" + train_batch_size = 256 + eval_batch_size = 8 + steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=RetinaNetTask( + init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/resnet50_imagenet/ckpt-28080', + init_checkpoint_modules='backbone', + annotation_file=os.path.join(COCO_INPUT_PATH_BASE, + 'instances_val2017.json'), + model=RetinaNet( + num_classes=91, + input_size=[640, 640, 3], + norm_activation=common.NormActivation(use_sync_bn=False), + min_level=3, + max_level=7), + losses=Losses(l2_weight_decay=1e-4), + train_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + parser=Parser( + aug_rand_hflip=True, aug_scale_min=0.8, aug_scale_max=1.2)), + validation_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size)), + trainer=cfg.TrainerConfig( + train_steps=72 * steps_per_epoch, + validation_steps=COCO_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 
'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [ + 57 * steps_per_epoch, 67 * steps_per_epoch + ], + 'values': [ + 0.32 * train_batch_size / 256.0, + 0.032 * train_batch_size / 256.0, + 0.0032 * train_batch_size / 256.0 + ], + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 500, + 'warmup_learning_rate': 0.0067 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + return config + + +@exp_factory.register_config_factory('retinanet_spinenet_coco') +def retinanet_spinenet_coco() -> cfg.ExperimentConfig: + """COCO object detection with RetinaNet using SpineNet backbone.""" + train_batch_size = 256 + eval_batch_size = 8 + steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size + input_size = 640 + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='float32'), + task=RetinaNetTask( + annotation_file=os.path.join(COCO_INPUT_PATH_BASE, + 'instances_val2017.json'), + model=RetinaNet( + backbone=backbones.Backbone( + type='spinenet', + spinenet=backbones.SpineNet( + model_id='49', + stochastic_depth_drop_rate=0.2, + min_level=3, + max_level=7)), + decoder=decoders.Decoder( + type='identity', identity=decoders.Identity()), + anchor=Anchor(anchor_size=3), + norm_activation=common.NormActivation( + use_sync_bn=True, activation='swish'), + num_classes=91, + input_size=[input_size, input_size, 3], + min_level=3, + max_level=7), + losses=Losses(l2_weight_decay=4e-5), + train_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + parser=Parser( + aug_rand_hflip=True, aug_scale_min=0.1, aug_scale_max=2.0)), + validation_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size)), + trainer=cfg.TrainerConfig( + train_steps=500 * steps_per_epoch, + 
validation_steps=COCO_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [ + 475 * steps_per_epoch, 490 * steps_per_epoch + ], + 'values': [ + 0.32 * train_batch_size / 256.0, + 0.032 * train_batch_size / 256.0, + 0.0032 * train_batch_size / 256.0 + ], + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 2000, + 'warmup_learning_rate': 0.0067 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None', + 'task.model.min_level == task.model.backbone.spinenet.min_level', + 'task.model.max_level == task.model.backbone.spinenet.max_level', + ]) + + return config + + +@exp_factory.register_config_factory('retinanet_mobile_coco') +def retinanet_spinenet_mobile_coco() -> cfg.ExperimentConfig: + """COCO object detection with mobile RetinaNet.""" + train_batch_size = 256 + eval_batch_size = 8 + steps_per_epoch = COCO_TRAIN_EXAMPLES // train_batch_size + input_size = 384 + + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='float32'), + task=RetinaNetTask( + annotation_file=os.path.join(COCO_INPUT_PATH_BASE, + 'instances_val2017.json'), + model=RetinaNet( + backbone=backbones.Backbone( + type='spinenet_mobile', + spinenet_mobile=backbones.SpineNetMobile( + model_id='49', + stochastic_depth_drop_rate=0.2, + min_level=3, + max_level=7, + use_keras_upsampling_2d=False)), + decoder=decoders.Decoder( + type='identity', identity=decoders.Identity()), + head=RetinaNetHead(num_filters=48, use_separable_conv=True), + anchor=Anchor(anchor_size=3), + norm_activation=common.NormActivation( + use_sync_bn=True, activation='swish'), + num_classes=91, + 
input_size=[input_size, input_size, 3], + min_level=3, + max_level=7), + losses=Losses(l2_weight_decay=3e-5), + train_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'train*'), + is_training=True, + global_batch_size=train_batch_size, + parser=Parser( + aug_rand_hflip=True, aug_scale_min=0.1, aug_scale_max=2.0)), + validation_data=DataConfig( + input_path=os.path.join(COCO_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size)), + trainer=cfg.TrainerConfig( + train_steps=600 * steps_per_epoch, + validation_steps=COCO_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'stepwise', + 'stepwise': { + 'boundaries': [ + 575 * steps_per_epoch, 590 * steps_per_epoch + ], + 'values': [ + 0.32 * train_batch_size / 256.0, + 0.032 * train_batch_size / 256.0, + 0.0032 * train_batch_size / 256.0 + ], + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 2000, + 'warmup_learning_rate': 0.0067 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None', + ]) + + return config diff --git a/official/vision/configs/retinanet_test.py b/official/vision/configs/retinanet_test.py new file mode 100644 index 0000000000000000000000000000000000000000..3194c5f2fcd6b4dc497edef31ea6eb29f2f86877 --- /dev/null +++ b/official/vision/configs/retinanet_test.py @@ -0,0 +1,46 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for retinanet.""" +# pylint: disable=unused-import +from absl.testing import parameterized +import tensorflow as tf + +from official import vision +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.vision.configs import retinanet as exp_cfg + + +class RetinaNetConfigTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.parameters( + ('retinanet_resnetfpn_coco',), + ('retinanet_spinenet_coco',), + ('retinanet_mobile_coco',), + ) + def test_retinanet_configs(self, config_name): + config = exp_factory.get_exp_config(config_name) + self.assertIsInstance(config, cfg.ExperimentConfig) + self.assertIsInstance(config.task, exp_cfg.RetinaNetTask) + self.assertIsInstance(config.task.model, exp_cfg.RetinaNet) + self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig) + config.validate() + config.task.train_data.is_training = None + with self.assertRaisesRegex(KeyError, 'Found inconsistency between key'): + config.validate() + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/configs/semantic_segmentation.py b/official/vision/configs/semantic_segmentation.py new file mode 100644 index 0000000000000000000000000000000000000000..6c8c23b20d62dccfb02fbcaaae56262410e7c5a3 --- /dev/null +++ b/official/vision/configs/semantic_segmentation.py @@ -0,0 +1,730 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Semantic segmentation configuration definition.""" +import dataclasses +import os +from typing import List, Optional, Union + +import numpy as np +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import hyperparams +from official.modeling import optimization +from official.vision.configs import common +from official.vision.configs import decoders +from official.vision.configs import backbones + + +@dataclasses.dataclass +class DataConfig(cfg.DataConfig): + """Input config for training.""" + output_size: List[int] = dataclasses.field(default_factory=list) + # If crop_size is specified, image will be resized first to + # output_size, then crop of size crop_size will be cropped. + crop_size: List[int] = dataclasses.field(default_factory=list) + input_path: str = '' + global_batch_size: int = 0 + is_training: bool = True + dtype: str = 'float32' + shuffle_buffer_size: int = 1000 + cycle_length: int = 10 + # If resize_eval_groundtruth is set to False, original image sizes are used + # for eval. In that case, groundtruth_padded_size has to be specified too to + # allow for batching the variable input sizes of images. 
+ resize_eval_groundtruth: bool = True + groundtruth_padded_size: List[int] = dataclasses.field(default_factory=list) + aug_scale_min: float = 1.0 + aug_scale_max: float = 1.0 + aug_rand_hflip: bool = True + preserve_aspect_ratio: bool = True + aug_policy: Optional[str] = None + drop_remainder: bool = True + file_type: str = 'tfrecord' + decoder: Optional[common.DataDecoder] = common.DataDecoder() + + +@dataclasses.dataclass +class SegmentationHead(hyperparams.Config): + """Segmentation head config.""" + level: int = 3 + num_convs: int = 2 + num_filters: int = 256 + use_depthwise_convolution: bool = False + prediction_kernel_size: int = 1 + upsample_factor: int = 1 + feature_fusion: Optional[ + str] = None # None, deeplabv3plus, panoptic_fpn_fusion or pyramid_fusion + # deeplabv3plus feature fusion params + low_level: Union[int, str] = 2 + low_level_num_filters: int = 48 + # panoptic_fpn_fusion params + decoder_min_level: Optional[Union[int, str]] = None + decoder_max_level: Optional[Union[int, str]] = None + + +@dataclasses.dataclass +class MaskScoringHead(hyperparams.Config): + """Mask Scoring head config.""" + num_convs: int = 4 + num_filters: int = 128 + fc_input_size: List[int] = dataclasses.field(default_factory=list) + num_fcs: int = 2 + fc_dims: int = 1024 + + +@dataclasses.dataclass +class SemanticSegmentationModel(hyperparams.Config): + """Semantic segmentation model config.""" + num_classes: int = 0 + input_size: List[int] = dataclasses.field(default_factory=list) + min_level: int = 3 + max_level: int = 6 + head: SegmentationHead = SegmentationHead() + backbone: backbones.Backbone = backbones.Backbone( + type='resnet', resnet=backbones.ResNet()) + decoder: decoders.Decoder = decoders.Decoder(type='identity') + mask_scoring_head: Optional[MaskScoringHead] = None + norm_activation: common.NormActivation = common.NormActivation() + + +@dataclasses.dataclass +class Losses(hyperparams.Config): + loss_weight: float = 1.0 + label_smoothing: float = 0.0 + 
 ignore_label: int = 255 + gt_is_matting_map: bool = False + class_weights: List[float] = dataclasses.field(default_factory=list) + l2_weight_decay: float = 0.0 + use_groundtruth_dimension: bool = True + top_k_percent_pixels: float = 1.0 + + +@dataclasses.dataclass +class Evaluation(hyperparams.Config): + report_per_class_iou: bool = True + report_train_mean_iou: bool = True # Turning this off can speed up training. + + +@dataclasses.dataclass +class ExportConfig(hyperparams.Config): + # Whether to rescale the predicted mask to the original image size. + rescale_output: bool = False + + +@dataclasses.dataclass +class SemanticSegmentationTask(cfg.TaskConfig): + """The task config.""" + model: SemanticSegmentationModel = SemanticSegmentationModel() + train_data: DataConfig = DataConfig(is_training=True) + validation_data: DataConfig = DataConfig(is_training=False) + losses: Losses = Losses() + evaluation: Evaluation = Evaluation() + train_input_partition_dims: List[int] = dataclasses.field( + default_factory=list) + eval_input_partition_dims: List[int] = dataclasses.field(default_factory=list) + init_checkpoint: Optional[str] = None + init_checkpoint_modules: Union[ + str, List[str]] = 'all' # all, backbone, and/or decoder + export_config: ExportConfig = ExportConfig() + + +@exp_factory.register_config_factory('semantic_segmentation') +def semantic_segmentation() -> cfg.ExperimentConfig: + """Semantic segmentation general.""" + return cfg.ExperimentConfig( + task=SemanticSegmentationTask(), + trainer=cfg.TrainerConfig(), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + +# PASCAL VOC 2012 Dataset +PASCAL_TRAIN_EXAMPLES = 10582 +PASCAL_VAL_EXAMPLES = 1449 +PASCAL_INPUT_PATH_BASE = 'gs://**/pascal_voc_seg' + + +@exp_factory.register_config_factory('seg_deeplabv3_pascal') +def seg_deeplabv3_pascal() -> cfg.ExperimentConfig: + """Image segmentation on pascal voc with resnet deeplabv3.""" + 
train_batch_size =
16 + eval_batch_size = 8 + steps_per_epoch = PASCAL_TRAIN_EXAMPLES // train_batch_size + output_stride = 16 + aspp_dilation_rates = [12, 24, 36] # [6, 12, 18] if output_stride = 16 + multigrid = [1, 2, 4] + stem_type = 'v1' + level = int(np.math.log2(output_stride)) + config = cfg.ExperimentConfig( + task=SemanticSegmentationTask( + model=SemanticSegmentationModel( + num_classes=21, + input_size=[None, None, 3], + backbone=backbones.Backbone( + type='dilated_resnet', + dilated_resnet=backbones.DilatedResNet( + model_id=101, + output_stride=output_stride, + multigrid=multigrid, + stem_type=stem_type)), + decoder=decoders.Decoder( + type='aspp', + aspp=decoders.ASPP( + level=level, dilation_rates=aspp_dilation_rates)), + head=SegmentationHead(level=level, num_convs=0), + norm_activation=common.NormActivation( + activation='swish', + norm_momentum=0.9997, + norm_epsilon=1e-3, + use_sync_bn=True)), + losses=Losses(l2_weight_decay=1e-4), + train_data=DataConfig( + input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'train_aug*'), + # TODO(arashwan): test changing size to 513 to match deeplab. 
+ output_size=[512, 512], + is_training=True, + global_batch_size=train_batch_size, + aug_scale_min=0.5, + aug_scale_max=2.0), + validation_data=DataConfig( + input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'val*'), + output_size=[512, 512], + is_training=False, + global_batch_size=eval_batch_size, + resize_eval_groundtruth=False, + groundtruth_padded_size=[512, 512], + drop_remainder=False), + # resnet101 + init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/deeplab/deeplab_resnet101_imagenet/ckpt-62400', + init_checkpoint_modules='backbone'), + trainer=cfg.TrainerConfig( + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + train_steps=45 * steps_per_epoch, + validation_steps=PASCAL_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'polynomial', + 'polynomial': { + 'initial_learning_rate': 0.007, + 'decay_steps': 45 * steps_per_epoch, + 'end_learning_rate': 0.0, + 'power': 0.9 + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 5 * steps_per_epoch, + 'warmup_learning_rate': 0 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + return config + + +@exp_factory.register_config_factory('seg_deeplabv3plus_pascal') +def seg_deeplabv3plus_pascal() -> cfg.ExperimentConfig: + """Image segmentation on pascal voc with resnet deeplabv3+.""" + train_batch_size = 16 + eval_batch_size = 8 + steps_per_epoch = PASCAL_TRAIN_EXAMPLES // train_batch_size + output_stride = 16 + aspp_dilation_rates = [6, 12, 18] + multigrid = [1, 2, 4] + stem_type = 'v1' + level = int(np.math.log2(output_stride)) + config = cfg.ExperimentConfig( + task=SemanticSegmentationTask( + model=SemanticSegmentationModel( + num_classes=21, + input_size=[None, None, 3], + 
backbone=backbones.Backbone( + type='dilated_resnet', + dilated_resnet=backbones.DilatedResNet( + model_id=101, + output_stride=output_stride, + stem_type=stem_type, + multigrid=multigrid)), + decoder=decoders.Decoder( + type='aspp', + aspp=decoders.ASPP( + level=level, dilation_rates=aspp_dilation_rates)), + head=SegmentationHead( + level=level, + num_convs=2, + feature_fusion='deeplabv3plus', + low_level=2, + low_level_num_filters=48), + norm_activation=common.NormActivation( + activation='swish', + norm_momentum=0.9997, + norm_epsilon=1e-3, + use_sync_bn=True)), + losses=Losses(l2_weight_decay=1e-4), + train_data=DataConfig( + input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'train_aug*'), + output_size=[512, 512], + is_training=True, + global_batch_size=train_batch_size, + aug_scale_min=0.5, + aug_scale_max=2.0), + validation_data=DataConfig( + input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'val*'), + output_size=[512, 512], + is_training=False, + global_batch_size=eval_batch_size, + resize_eval_groundtruth=False, + groundtruth_padded_size=[512, 512], + drop_remainder=False), + # resnet101 + init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/deeplab/deeplab_resnet101_imagenet/ckpt-62400', + init_checkpoint_modules='backbone'), + trainer=cfg.TrainerConfig( + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + train_steps=45 * steps_per_epoch, + validation_steps=PASCAL_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'polynomial', + 'polynomial': { + 'initial_learning_rate': 0.007, + 'decay_steps': 45 * steps_per_epoch, + 'end_learning_rate': 0.0, + 'power': 0.9 + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 5 * steps_per_epoch, + 'warmup_learning_rate': 0 + } + } + })), + restrictions=[ + 
'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + return config + + +@exp_factory.register_config_factory('seg_resnetfpn_pascal') +def seg_resnetfpn_pascal() -> cfg.ExperimentConfig: + """Image segmentation on pascal voc with resnet-fpn.""" + train_batch_size = 256 + eval_batch_size = 32 + steps_per_epoch = PASCAL_TRAIN_EXAMPLES // train_batch_size + config = cfg.ExperimentConfig( + task=SemanticSegmentationTask( + model=SemanticSegmentationModel( + num_classes=21, + input_size=[512, 512, 3], + min_level=3, + max_level=7, + backbone=backbones.Backbone( + type='resnet', resnet=backbones.ResNet(model_id=50)), + decoder=decoders.Decoder(type='fpn', fpn=decoders.FPN()), + head=SegmentationHead(level=3, num_convs=3), + norm_activation=common.NormActivation( + activation='swish', use_sync_bn=True)), + losses=Losses(l2_weight_decay=1e-4), + train_data=DataConfig( + input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'train_aug*'), + is_training=True, + global_batch_size=train_batch_size, + aug_scale_min=0.2, + aug_scale_max=1.5), + validation_data=DataConfig( + input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'val*'), + is_training=False, + global_batch_size=eval_batch_size, + resize_eval_groundtruth=False, + groundtruth_padded_size=[512, 512], + drop_remainder=False), + ), + trainer=cfg.TrainerConfig( + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + train_steps=450 * steps_per_epoch, + validation_steps=PASCAL_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'polynomial', + 'polynomial': { + 'initial_learning_rate': 0.007, + 'decay_steps': 450 * steps_per_epoch, + 'end_learning_rate': 0.0, + 'power': 0.9 + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 5 * steps_per_epoch, 
+ 'warmup_learning_rate': 0 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + return config + + +@exp_factory.register_config_factory('mnv2_deeplabv3_pascal') +def mnv2_deeplabv3_pascal() -> cfg.ExperimentConfig: + """Image segmentation on pascal with mobilenetv2 deeplabv3.""" + train_batch_size = 16 + eval_batch_size = 16 + steps_per_epoch = PASCAL_TRAIN_EXAMPLES // train_batch_size + output_stride = 16 + aspp_dilation_rates = [] + level = int(np.math.log2(output_stride)) + pool_kernel_size = [] + + config = cfg.ExperimentConfig( + task=SemanticSegmentationTask( + model=SemanticSegmentationModel( + num_classes=21, + input_size=[None, None, 3], + backbone=backbones.Backbone( + type='mobilenet', + mobilenet=backbones.MobileNet( + model_id='MobileNetV2', output_stride=output_stride)), + decoder=decoders.Decoder( + type='aspp', + aspp=decoders.ASPP( + level=level, + dilation_rates=aspp_dilation_rates, + pool_kernel_size=pool_kernel_size)), + head=SegmentationHead(level=level, num_convs=0), + norm_activation=common.NormActivation( + activation='relu', + norm_momentum=0.99, + norm_epsilon=1e-3, + use_sync_bn=True)), + losses=Losses(l2_weight_decay=4e-5), + train_data=DataConfig( + input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'train_aug*'), + output_size=[512, 512], + is_training=True, + global_batch_size=train_batch_size, + aug_scale_min=0.5, + aug_scale_max=2.0), + validation_data=DataConfig( + input_path=os.path.join(PASCAL_INPUT_PATH_BASE, 'val*'), + output_size=[512, 512], + is_training=False, + global_batch_size=eval_batch_size, + resize_eval_groundtruth=False, + groundtruth_padded_size=[512, 512], + drop_remainder=False), + # mobilenetv2 + init_checkpoint='gs://tf_model_garden/cloud/vision-2.0/deeplab/deeplabv3_mobilenetv2_coco/best_ckpt-63', + init_checkpoint_modules=['backbone', 'decoder']), + trainer=cfg.TrainerConfig( + steps_per_loop=steps_per_epoch, + 
summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + train_steps=30000, + validation_steps=PASCAL_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + best_checkpoint_eval_metric='mean_iou', + best_checkpoint_export_subdir='best_ckpt', + best_checkpoint_metric_comp='higher', + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'polynomial', + 'polynomial': { + 'initial_learning_rate': 0.007 * train_batch_size / 16, + 'decay_steps': 30000, + 'end_learning_rate': 0.0, + 'power': 0.9 + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 5 * steps_per_epoch, + 'warmup_learning_rate': 0 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + return config + + +# Cityscapes Dataset (Download and process the dataset yourself) +CITYSCAPES_TRAIN_EXAMPLES = 2975 +CITYSCAPES_VAL_EXAMPLES = 500 +CITYSCAPES_INPUT_PATH_BASE = 'cityscapes' + + +@exp_factory.register_config_factory('seg_deeplabv3plus_cityscapes') +def seg_deeplabv3plus_cityscapes() -> cfg.ExperimentConfig: + """Image segmentation on cityscapes with resnet deeplabv3+.""" + train_batch_size = 16 + eval_batch_size = 16 + steps_per_epoch = CITYSCAPES_TRAIN_EXAMPLES // train_batch_size + output_stride = 16 + aspp_dilation_rates = [6, 12, 18] + multigrid = [1, 2, 4] + stem_type = 'v1' + level = int(np.math.log2(output_stride)) + config = cfg.ExperimentConfig( + task=SemanticSegmentationTask( + model=SemanticSegmentationModel( + # Cityscapes uses only 19 semantic classes for train/evaluation. + # The void (background) class is ignored in train and evaluation. 
+ num_classes=19, + input_size=[None, None, 3], + backbone=backbones.Backbone( + type='dilated_resnet', + dilated_resnet=backbones.DilatedResNet( + model_id=101, + output_stride=output_stride, + stem_type=stem_type, + multigrid=multigrid)), + decoder=decoders.Decoder( + type='aspp', + aspp=decoders.ASPP( + level=level, + dilation_rates=aspp_dilation_rates, + pool_kernel_size=[512, 1024])), + head=SegmentationHead( + level=level, + num_convs=2, + feature_fusion='deeplabv3plus', + low_level=2, + low_level_num_filters=48), + norm_activation=common.NormActivation( + activation='swish', + norm_momentum=0.99, + norm_epsilon=1e-3, + use_sync_bn=True)), + losses=Losses(l2_weight_decay=1e-4), + train_data=DataConfig( + input_path=os.path.join(CITYSCAPES_INPUT_PATH_BASE, + 'train_fine**'), + crop_size=[512, 1024], + output_size=[1024, 2048], + is_training=True, + global_batch_size=train_batch_size, + aug_scale_min=0.5, + aug_scale_max=2.0), + validation_data=DataConfig( + input_path=os.path.join(CITYSCAPES_INPUT_PATH_BASE, 'val_fine*'), + output_size=[1024, 2048], + is_training=False, + global_batch_size=eval_batch_size, + resize_eval_groundtruth=True, + drop_remainder=False), + # resnet101 + init_checkpoint='gs://cloud-tpu-checkpoints/vision-2.0/deeplab/deeplab_resnet101_imagenet/ckpt-62400', + init_checkpoint_modules='backbone'), + trainer=cfg.TrainerConfig( + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + train_steps=500 * steps_per_epoch, + validation_steps=CITYSCAPES_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'polynomial', + 'polynomial': { + 'initial_learning_rate': 0.01, + 'decay_steps': 500 * steps_per_epoch, + 'end_learning_rate': 0.0, + 'power': 0.9 + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 
5 * steps_per_epoch, + 'warmup_learning_rate': 0 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + return config + + +@exp_factory.register_config_factory('mnv2_deeplabv3_cityscapes') +def mnv2_deeplabv3_cityscapes() -> cfg.ExperimentConfig: + """Image segmentation on cityscapes with mobilenetv2 deeplabv3.""" + train_batch_size = 16 + eval_batch_size = 16 + steps_per_epoch = CITYSCAPES_TRAIN_EXAMPLES // train_batch_size + output_stride = 16 + aspp_dilation_rates = [] + pool_kernel_size = [512, 1024] + + level = int(np.math.log2(output_stride)) + config = cfg.ExperimentConfig( + task=SemanticSegmentationTask( + model=SemanticSegmentationModel( + # Cityscapes uses only 19 semantic classes for train/evaluation. + # The void (background) class is ignored in train and evaluation. + num_classes=19, + input_size=[None, None, 3], + backbone=backbones.Backbone( + type='mobilenet', + mobilenet=backbones.MobileNet( + model_id='MobileNetV2', output_stride=output_stride)), + decoder=decoders.Decoder( + type='aspp', + aspp=decoders.ASPP( + level=level, + dilation_rates=aspp_dilation_rates, + pool_kernel_size=pool_kernel_size)), + head=SegmentationHead(level=level, num_convs=0), + norm_activation=common.NormActivation( + activation='relu', + norm_momentum=0.99, + norm_epsilon=1e-3, + use_sync_bn=True)), + losses=Losses(l2_weight_decay=4e-5), + train_data=DataConfig( + input_path=os.path.join(CITYSCAPES_INPUT_PATH_BASE, + 'train_fine**'), + crop_size=[512, 1024], + output_size=[1024, 2048], + is_training=True, + global_batch_size=train_batch_size, + aug_scale_min=0.5, + aug_scale_max=2.0), + validation_data=DataConfig( + input_path=os.path.join(CITYSCAPES_INPUT_PATH_BASE, 'val_fine*'), + output_size=[1024, 2048], + is_training=False, + global_batch_size=eval_batch_size, + resize_eval_groundtruth=True, + drop_remainder=False), + # Coco pre-trained mobilenetv2 checkpoint + 
init_checkpoint='gs://tf_model_garden/cloud/vision-2.0/deeplab/deeplabv3_mobilenetv2_coco/best_ckpt-63', + init_checkpoint_modules='backbone'), + trainer=cfg.TrainerConfig( + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + train_steps=100000, + validation_steps=CITYSCAPES_VAL_EXAMPLES // eval_batch_size, + validation_interval=steps_per_epoch, + best_checkpoint_eval_metric='mean_iou', + best_checkpoint_export_subdir='best_ckpt', + best_checkpoint_metric_comp='higher', + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9 + } + }, + 'learning_rate': { + 'type': 'polynomial', + 'polynomial': { + 'initial_learning_rate': 0.01, + 'decay_steps': 100000, + 'end_learning_rate': 0.0, + 'power': 0.9 + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': 5 * steps_per_epoch, + 'warmup_learning_rate': 0 + } + } + })), + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None' + ]) + + return config + + +@exp_factory.register_config_factory('mnv2_deeplabv3plus_cityscapes') +def mnv2_deeplabv3plus_cityscapes() -> cfg.ExperimentConfig: + """Image segmentation on cityscapes with mobilenetv2 deeplabv3plus.""" + config = mnv2_deeplabv3_cityscapes() + config.task.model.head = SegmentationHead( + level=4, + num_convs=2, + feature_fusion='deeplabv3plus', + use_depthwise_convolution=True, + low_level='2/depthwise', + low_level_num_filters=48) + config.task.model.backbone.mobilenet.output_intermediate_endpoints = True + return config diff --git a/official/vision/configs/semantic_segmentation_test.py b/official/vision/configs/semantic_segmentation_test.py new file mode 100644 index 0000000000000000000000000000000000000000..fb25dbdc8990aa9289a51fb3d97643d5d152d3f8 --- /dev/null +++ b/official/vision/configs/semantic_segmentation_test.py @@ -0,0 +1,45 @@ +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for semantic_segmentation.""" + +# pylint: disable=unused-import +from absl.testing import parameterized +import tensorflow as tf + +from official import vision +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.vision.configs import semantic_segmentation as exp_cfg + + +class ImageSegmentationConfigTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.parameters(('seg_deeplabv3_pascal',), + ('seg_deeplabv3plus_pascal',)) + def test_semantic_segmentation_configs(self, config_name): + config = exp_factory.get_exp_config(config_name) + self.assertIsInstance(config, cfg.ExperimentConfig) + self.assertIsInstance(config.task, exp_cfg.SemanticSegmentationTask) + self.assertIsInstance(config.task.model, + exp_cfg.SemanticSegmentationModel) + self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig) + config.validate() + config.task.train_data.is_training = None + with self.assertRaises(KeyError): + config.validate() + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/configs/video_classification.py b/official/vision/configs/video_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..3d803e6ef95d739bada71efa601003f87cc32e2b --- /dev/null +++ b/official/vision/configs/video_classification.py @@ -0,0 +1,375 @@ +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Video classification configuration definition.""" +import dataclasses +from typing import Optional, Tuple, Union +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.modeling import hyperparams +from official.modeling import optimization +from official.vision.configs import backbones_3d +from official.vision.configs import common + + +@dataclasses.dataclass +class DataConfig(cfg.DataConfig): + """The base configuration for building datasets.""" + name: Optional[str] = None + file_type: Optional[str] = 'tfrecord' + compressed_input: bool = False + split: str = 'train' + variant_name: Optional[str] = None + feature_shape: Tuple[int, ...] = (64, 224, 224, 3) + temporal_stride: int = 1 + random_stride_range: int = 0 + num_test_clips: int = 1 + num_test_crops: int = 1 + num_classes: int = -1 + num_examples: int = -1 + global_batch_size: int = 128 + data_format: str = 'channels_last' + dtype: str = 'float32' + label_dtype: str = 'int32' + one_hot: bool = True + shuffle_buffer_size: int = 64 + cache: bool = False + input_path: Union[str, cfg.base_config.Config] = '' + is_training: bool = True + cycle_length: int = 10 + drop_remainder: bool = True + min_image_size: int = 256 + zero_centering_image: bool = False + is_multilabel: bool = False + output_audio: bool = False + audio_feature: str = '' + audio_feature_shape: Tuple[int, ...] 
= (-1,) + aug_min_aspect_ratio: float = 0.5 + aug_max_aspect_ratio: float = 2.0 + aug_min_area_ratio: float = 0.49 + aug_max_area_ratio: float = 1.0 + aug_type: Optional[ + common.Augmentation] = None # AutoAugment and RandAugment. + mixup_and_cutmix: Optional[common.MixupAndCutmix] = None + image_field_key: str = 'image/encoded' + label_field_key: str = 'clip/label/index' + + +def kinetics400(is_training): + """Generates Kinetics 400 dataset configs.""" + return DataConfig( + name='kinetics400', + num_classes=400, + is_training=is_training, + split='train' if is_training else 'valid', + drop_remainder=is_training, + num_examples=215570 if is_training else 17706, + feature_shape=(64, 224, 224, 3) if is_training else (250, 224, 224, 3)) + + +def kinetics600(is_training): + """Generates Kinetics 600 dataset configs.""" + return DataConfig( + name='kinetics600', + num_classes=600, + is_training=is_training, + split='train' if is_training else 'valid', + drop_remainder=is_training, + num_examples=366016 if is_training else 27780, + feature_shape=(64, 224, 224, 3) if is_training else (250, 224, 224, 3)) + + +def kinetics700(is_training): + """Generates Kinetics 700 dataset configs.""" + return DataConfig( + name='kinetics700', + num_classes=700, + is_training=is_training, + split='train' if is_training else 'valid', + drop_remainder=is_training, + num_examples=522883 if is_training else 33441, + feature_shape=(64, 224, 224, 3) if is_training else (250, 224, 224, 3)) + + +def kinetics700_2020(is_training): + """Generates Kinetics 700 (2020) dataset configs.""" + return DataConfig( + name='kinetics700', + num_classes=700, + is_training=is_training, + split='train' if is_training else 'valid', + drop_remainder=is_training, + num_examples=535982 if is_training else 33640, + feature_shape=(64, 224, 224, 3) if is_training else (250, 224, 224, 3)) + + +@dataclasses.dataclass +class VideoClassificationModel(hyperparams.Config): + """The model config.""" + model_type: str =
'video_classification' + backbone: backbones_3d.Backbone3D = backbones_3d.Backbone3D( + type='resnet_3d', resnet_3d=backbones_3d.ResNet3D50()) + norm_activation: common.NormActivation = common.NormActivation( + use_sync_bn=False) + dropout_rate: float = 0.2 + aggregate_endpoints: bool = False + require_endpoints: Optional[Tuple[str, ...]] = None + + +@dataclasses.dataclass +class Losses(hyperparams.Config): + one_hot: bool = True + label_smoothing: float = 0.0 + l2_weight_decay: float = 0.0 + + +@dataclasses.dataclass +class Metrics(hyperparams.Config): + use_per_class_recall: bool = False + + +@dataclasses.dataclass +class VideoClassificationTask(cfg.TaskConfig): + """The task config.""" + model: VideoClassificationModel = VideoClassificationModel() + train_data: DataConfig = DataConfig(is_training=True, drop_remainder=True) + validation_data: DataConfig = DataConfig( + is_training=False, drop_remainder=False) + losses: Losses = Losses() + metrics: Metrics = Metrics() + init_checkpoint: Optional[str] = None + init_checkpoint_modules: str = 'all' # all or backbone + freeze_backbone: bool = False + # Spatial Partitioning fields. 
+ train_input_partition_dims: Optional[Tuple[int, ...]] = None + eval_input_partition_dims: Optional[Tuple[int, ...]] = None + + +def add_trainer(experiment: cfg.ExperimentConfig, + train_batch_size: int, + eval_batch_size: int, + learning_rate: float = 1.6, + train_epochs: int = 44, + warmup_epochs: int = 5): + """Adds and configures a trainer for the experiment config.""" + if experiment.task.train_data.num_examples <= 0: + raise ValueError('Wrong train dataset size {!r}'.format( + experiment.task.train_data)) + if experiment.task.validation_data.num_examples <= 0: + raise ValueError('Wrong validation dataset size {!r}'.format( + experiment.task.validation_data)) + experiment.task.train_data.global_batch_size = train_batch_size + experiment.task.validation_data.global_batch_size = eval_batch_size + steps_per_epoch = experiment.task.train_data.num_examples // train_batch_size + experiment.trainer = cfg.TrainerConfig( + steps_per_loop=steps_per_epoch, + summary_interval=steps_per_epoch, + checkpoint_interval=steps_per_epoch, + train_steps=train_epochs * steps_per_epoch, + validation_steps=experiment.task.validation_data.num_examples // + eval_batch_size, + validation_interval=steps_per_epoch, + optimizer_config=optimization.OptimizationConfig({ + 'optimizer': { + 'type': 'sgd', + 'sgd': { + 'momentum': 0.9, + 'nesterov': True, + } + }, + 'learning_rate': { + 'type': 'cosine', + 'cosine': { + 'initial_learning_rate': learning_rate, + 'decay_steps': train_epochs * steps_per_epoch, + } + }, + 'warmup': { + 'type': 'linear', + 'linear': { + 'warmup_steps': warmup_epochs * steps_per_epoch, + 'warmup_learning_rate': 0 + } + } + })) + return experiment + + +@exp_factory.register_config_factory('video_classification') +def video_classification() -> cfg.ExperimentConfig: + """Video classification general.""" + return cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=VideoClassificationTask(), + trainer=cfg.TrainerConfig(),
 restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None', + 'task.train_data.num_classes == task.validation_data.num_classes', + ]) + + +@exp_factory.register_config_factory('video_classification_ucf101') +def video_classification_ucf101() -> cfg.ExperimentConfig: + """Video classification on UCF-101 with resnet.""" + train_dataset = DataConfig( + name='ucf101', + num_classes=101, + is_training=True, + split='train', + drop_remainder=True, + num_examples=9537, + temporal_stride=2, + feature_shape=(32, 224, 224, 3)) + train_dataset.tfds_name = 'ucf101' + train_dataset.tfds_split = 'train' + validation_dataset = DataConfig( + name='ucf101', + num_classes=101, + is_training=True, + split='test', + drop_remainder=False, + num_examples=3783, + temporal_stride=2, + feature_shape=(32, 224, 224, 3)) + validation_dataset.tfds_name = 'ucf101' + validation_dataset.tfds_split = 'test' + task = VideoClassificationTask( + model=VideoClassificationModel( + backbone=backbones_3d.Backbone3D( + type='resnet_3d', resnet_3d=backbones_3d.ResNet3D50()), + norm_activation=common.NormActivation( + norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False)), + losses=Losses(l2_weight_decay=1e-4), + train_data=train_dataset, + validation_data=validation_dataset) + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=task, + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None', + 'task.train_data.num_classes == task.validation_data.num_classes', + ]) + add_trainer( + config, + train_batch_size=64, + eval_batch_size=16, + learning_rate=0.8, + train_epochs=100) + return config + + +@exp_factory.register_config_factory('video_classification_kinetics400') +def video_classification_kinetics400() -> cfg.ExperimentConfig: + """Video classification on Kinetics 400 with resnet.""" + train_dataset = kinetics400(is_training=True) + validation_dataset =
 kinetics400(is_training=False) + task = VideoClassificationTask( + model=VideoClassificationModel( + backbone=backbones_3d.Backbone3D( + type='resnet_3d', resnet_3d=backbones_3d.ResNet3D50()), + norm_activation=common.NormActivation( + norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False)), + losses=Losses(l2_weight_decay=1e-4), + train_data=train_dataset, + validation_data=validation_dataset) + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=task, + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None', + 'task.train_data.num_classes == task.validation_data.num_classes', + ]) + add_trainer(config, train_batch_size=1024, eval_batch_size=64) + return config + + +@exp_factory.register_config_factory('video_classification_kinetics600') +def video_classification_kinetics600() -> cfg.ExperimentConfig: + """Video classification on Kinetics 600 with resnet.""" + train_dataset = kinetics600(is_training=True) + validation_dataset = kinetics600(is_training=False) + task = VideoClassificationTask( + model=VideoClassificationModel( + backbone=backbones_3d.Backbone3D( + type='resnet_3d', resnet_3d=backbones_3d.ResNet3D50()), + norm_activation=common.NormActivation( + norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False)), + losses=Losses(l2_weight_decay=1e-4), + train_data=train_dataset, + validation_data=validation_dataset) + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=task, + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None', + 'task.train_data.num_classes == task.validation_data.num_classes', + ]) + add_trainer(config, train_batch_size=1024, eval_batch_size=64) + return config + + +@exp_factory.register_config_factory('video_classification_kinetics700') +def video_classification_kinetics700() -> cfg.ExperimentConfig: + """Video classification on Kinetics 700 with
resnet.""" + train_dataset = kinetics700(is_training=True) + validation_dataset = kinetics700(is_training=False) + task = VideoClassificationTask( + model=VideoClassificationModel( + backbone=backbones_3d.Backbone3D( + type='resnet_3d', resnet_3d=backbones_3d.ResNet3D50()), + norm_activation=common.NormActivation( + norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False)), + losses=Losses(l2_weight_decay=1e-4), + train_data=train_dataset, + validation_data=validation_dataset) + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=task, + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None', + 'task.train_data.num_classes == task.validation_data.num_classes', + ]) + add_trainer(config, train_batch_size=1024, eval_batch_size=64) + return config + + +@exp_factory.register_config_factory('video_classification_kinetics700_2020') +def video_classification_kinetics700_2020() -> cfg.ExperimentConfig: + """Video classification on Kinetics 700 2020 with resnet.""" + train_dataset = kinetics700_2020(is_training=True) + validation_dataset = kinetics700_2020(is_training=False) + task = VideoClassificationTask( + model=VideoClassificationModel( + backbone=backbones_3d.Backbone3D( + type='resnet_3d', resnet_3d=backbones_3d.ResNet3D50()), + norm_activation=common.NormActivation( + norm_momentum=0.9, norm_epsilon=1e-5, use_sync_bn=False)), + losses=Losses(l2_weight_decay=1e-4), + train_data=train_dataset, + validation_data=validation_dataset) + config = cfg.ExperimentConfig( + runtime=cfg.RuntimeConfig(mixed_precision_dtype='bfloat16'), + task=task, + restrictions=[ + 'task.train_data.is_training != None', + 'task.validation_data.is_training != None', + 'task.train_data.num_classes == task.validation_data.num_classes', + ]) + add_trainer(config, train_batch_size=1024, eval_batch_size=64) + return config diff --git a/official/vision/configs/video_classification_test.py
b/official/vision/configs/video_classification_test.py new file mode 100644 index 0000000000000000000000000000000000000000..6c8053e14e37d38feb5c44947e4c6ff2d212988c --- /dev/null +++ b/official/vision/configs/video_classification_test.py @@ -0,0 +1,44 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for video_classification.""" + +# pylint: disable=unused-import +from absl.testing import parameterized +import tensorflow as tf + +from official import vision +from official.core import config_definitions as cfg +from official.core import exp_factory +from official.vision.configs import video_classification as exp_cfg + + +class VideoClassificationConfigTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.parameters(('video_classification',), + ('video_classification_kinetics600',)) + def test_video_classification_configs(self, config_name): + config = exp_factory.get_exp_config(config_name) + self.assertIsInstance(config, cfg.ExperimentConfig) + self.assertIsInstance(config.task, exp_cfg.VideoClassificationTask) + self.assertIsInstance(config.task.model, exp_cfg.VideoClassificationModel) + self.assertIsInstance(config.task.train_data, exp_cfg.DataConfig) + config.validate() + config.task.train_data.is_training = None + with self.assertRaises(KeyError): + config.validate() + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/data/__init__.py 
b/official/vision/data/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/vision/data/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/vision/beta/data/create_coco_tf_record.py b/official/vision/data/create_coco_tf_record.py similarity index 97% rename from official/vision/beta/data/create_coco_tf_record.py rename to official/vision/data/create_coco_tf_record.py index 9cf30d2214588755aa89bbcf9000a989cdffe75d..ffa579696c65dd08f2c51752c86b2fd42660fb0d 100644 --- a/official/vision/beta/data/create_coco_tf_record.py +++ b/official/vision/data/create_coco_tf_record.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -40,7 +40,7 @@ from pycocotools import mask import tensorflow as tf import multiprocessing as mp -from official.vision.beta.data import tfrecord_lib +from official.vision.data import tfrecord_lib flags.DEFINE_boolean( @@ -68,6 +68,11 @@ flags.DEFINE_boolean( 'default: False.') flags.DEFINE_string('output_file_prefix', '/tmp/train', 'Path to output file') flags.DEFINE_integer('num_shards', 32, 'Number of shards for output file.') +_NUM_PROCESSES = flags.DEFINE_integer( + 'num_processes', None, + ('Number of parallel processes to use. ' + 'If set to 0, disables multi-processing.')) + FLAGS = flags.FLAGS @@ -107,7 +112,7 @@ def generate_coco_panoptics_masks(segments_info, mask_path, represent "stuff" and "things" classes respectively. Returns: - A dict with with keys: [u'semantic_segmentation_mask', u'category_mask', + A dict with keys: [u'semantic_segmentation_mask', u'category_mask', u'instance_mask']. The dict contains 'category_mask' and 'instance_mask' only if `include_panoptic_eval_masks` is set to True. 
""" @@ -221,7 +226,7 @@ def bbox_annotations_to_feature_dict( 'image/object/is_crowd': tfrecord_lib.convert_to_feature(data['is_crowd']), 'image/object/area': - tfrecord_lib.convert_to_feature(data['area']), + tfrecord_lib.convert_to_feature(data['area'], 'float_list') } if include_masks: feature_dict['image/object/mask'] = ( @@ -518,7 +523,8 @@ def _create_tf_record_from_coco_annotations(images_info_file, include_masks=include_masks) num_skipped = tfrecord_lib.write_tf_record_dataset( - output_path, coco_annotations_iter, create_tf_example, num_shards) + output_path, coco_annotations_iter, create_tf_example, num_shards, + multiple_processes=_NUM_PROCESSES.value) logging.info('Finished writing, skipped %d annotations.', num_skipped) diff --git a/official/vision/data/fake_feature_generator.py b/official/vision/data/fake_feature_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..440e87529c1cf5c4e2110985728272f0b4a7f599 --- /dev/null +++ b/official/vision/data/fake_feature_generator.py @@ -0,0 +1,125 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Generates fake features for testing and validation.""" + +import collections +from typing import Optional, Tuple, Union + +import numpy as np + +_RGB_CHANNELS = 3 + + +def generate_image_np(height: int, + width: int, + num_channels: int = _RGB_CHANNELS) -> np.ndarray: + """Returns a fake numpy image matrix array.""" + return np.reshape( + np.mod(np.arange(height * width * num_channels), 255).astype(np.uint8), + newshape=(height, width, num_channels)) + + +def generate_normalized_boxes_np(num_boxes: int) -> np.ndarray: + """Returns a fake numpy normalized boxes array.""" + xmins = np.reshape(np.arange(num_boxes) / (2 * num_boxes), (num_boxes, 1)) + ymins = np.reshape(np.arange(num_boxes) / (2 * num_boxes), (num_boxes, 1)) + xmaxs = xmins + .5 + ymaxs = ymins + .5 + return np.concatenate((ymins, xmins, ymaxs, xmaxs), axis=-1) + + +def generate_boxes_np(height: int, width: int, num_boxes: int) -> np.ndarray: + """Returns a fake numpy absolute boxes array.""" + normalized_boxes = generate_normalized_boxes_np(num_boxes) + normalized_boxes[:, 1::2] *= height + normalized_boxes[:, 0::2] *= width + return normalized_boxes + + +def generate_classes_np(num_classes: int, + size: Optional[int] = None) -> Union[int, np.ndarray]: + """Returns a fake class or a fake numpy classes array.""" + if size is None: + return num_classes - 1 + + return np.arange(size) % num_classes + + +def generate_confidences_np( + size: Optional[int] = None) -> Union[float, np.ndarray]: + """Returns a fake confidence score or a fake numpy confidence score array.""" + if size is None: + return 0.5 + + return np.arange(size) / size + + +def generate_instance_masks_np(height: int, + width: int, + boxes_np: np.ndarray, + normalized: bool = True) -> np.ndarray: + """Returns a fake numpy instance mask matrices array.""" + num_boxes = len(boxes_np) + instance_masks_np = np.zeros((num_boxes, height, width, 1)) + if normalized: + boxes_np[:, 1::2] *= height + boxes_np[:, ::2] *= width + xmins = boxes_np[:,
0].astype(int) + ymins = boxes_np[:, 1].astype(int) + box_widths = boxes_np[:, 2].astype(int) - xmins + box_heights = boxes_np[:, 3].astype(int) - ymins + + for i, (x, y, w, h) in enumerate(zip(xmins, ymins, box_widths, box_heights)): + instance_masks_np[i, y:y + h, x:x + w, :] = np.reshape( + np.mod(np.arange(h * w), 2).astype(np.uint8), newshape=(h, w, 1)) + return instance_masks_np + + +def generate_semantic_mask_np(height: int, width: int, + num_classes: int) -> np.ndarray: + """Returns a fake numpy semantic mask array.""" + return generate_image_np(height, width, num_channels=1) % num_classes + + +def generate_panoptic_masks_np( + semantic_mask: np.ndarray, instance_masks: np.ndarray, + instance_classes: np.ndarray, + stuff_classes_offset: int) -> Tuple[np.ndarray, np.ndarray]: + """Returns fake numpy panoptic category and instance mask arrays.""" + panoptic_category_mask = np.zeros_like(semantic_mask) + panoptic_instance_mask = np.zeros_like(semantic_mask) + instance_ids = collections.defaultdict(int) + for instance_mask, instance_class in zip(instance_masks, instance_classes): + if instance_class == 0: + continue + instance_ids[instance_class] += 1 + # If a foreground pixel is labelled previously, replace the old category + # class and instance ID with the new one. + foreground_indices = np.where(np.equal(instance_mask, 1)) + # Note that instance classes start from index 1. + panoptic_category_mask[foreground_indices] = instance_class + 1 + panoptic_instance_mask[foreground_indices] = instance_ids[instance_class] + + # If any pixels remain unlabelled (labelled as background), then the + # semantic label is used for them (if one exists).
+ # Note that in panoptic FPN, the panoptic labels are expected in this order, + # 0 (background), 1 ..., N (stuffs), N + 1, ..., N + M - 2 (things) + # N classes for stuff classes, without background class, and M classes for + # thing classes, with 0 representing the background class and 1 representing + # all stuff classes. + background_indices = np.where(np.equal(panoptic_category_mask, 0)) + panoptic_category_mask[background_indices] = ( + semantic_mask[background_indices] + stuff_classes_offset) + return panoptic_category_mask, panoptic_instance_mask diff --git a/official/vision/data/image_utils.py b/official/vision/data/image_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..d4b31b1f34087040bee25bb7bec6d362dd97ccc4 --- /dev/null +++ b/official/vision/data/image_utils.py @@ -0,0 +1,113 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Image-related utilities that are useful for preparing datasets.""" + +import dataclasses +import imghdr +import io +from typing import Optional, Tuple + +import numpy as np +from PIL import Image + + +@dataclasses.dataclass +class ImageFormat: + """Supported image formats. + + For model development, this library should support the same image formats as + `tf.io.decode_image`[1].
+ + [1]: https://www.tensorflow.org/api_docs/python/tf/io/decode_image + """ + bmp: str = 'BMP' + png: str = 'PNG' + jpeg: str = 'JPEG' + raw: str = 'RAW' + + +def validate_image_format(format_str: str) -> str: + """Validates `format_str` and returns canonical format. + + This function accepts an image format string in any case and returns the + upper-case string as the canonical format. + + Args: + format_str: Image format string. + + Returns: + Canonical image format string. + + Raises: + ValueError: If the canonical format is not listed in `ImageFormat`. + """ + canonical_format = format_str.upper() + if canonical_format in dataclasses.asdict(ImageFormat()).values(): + return canonical_format + raise ValueError(f'Image format is invalid: {format_str}') + + +def encode_image(image_np: np.ndarray, image_format: str) -> bytes: + """Encodes `image_np` specified by `image_format`. + + Args: + image_np: Numpy image array. + image_format: Image format string (one of the values in `ImageFormat`). + + Returns: + Encoded image string. + """ + if image_format == 'RAW': + return image_np.tobytes() + + if len(image_np.shape) > 2 and image_np.shape[2] == 1: + image_pil = Image.fromarray(np.squeeze(image_np), 'L') + else: + image_pil = Image.fromarray(image_np) + with io.BytesIO() as output: + image_pil.save(output, format=validate_image_format(image_format)) + return output.getvalue() + + +def decode_image(image_bytes: bytes, + image_format: Optional[str] = None, + image_dtype: str = 'uint8') -> np.ndarray: + """Decodes image_bytes into numpy array.""" + if image_format == 'RAW': + return np.frombuffer(image_bytes, dtype=image_dtype) + image_pil = Image.open(io.BytesIO(image_bytes)) + image_np = np.array(image_pil) + if len(image_np.shape) < 3: + image_np = image_np[..., np.newaxis] + return image_np + + +def decode_image_metadata(image_bytes: bytes) -> Tuple[int, int, int, str]: + """Decodes image metadata from encoded image string.
+ + Note that if the image is encoded in RAW format, the metadata cannot be + inferred from the image bytes. + + Args: + image_bytes: Encoded image string. + + Returns: + A tuple of height, width, number of channels, and encoding format. + """ + image_np = decode_image(image_bytes) + # https://pillow.readthedocs.io/en/stable/reference/Image.html#image-attributes + height, width, num_channels = image_np.shape + image_format = imghdr.what(file=None, h=image_bytes) + return height, width, num_channels, validate_image_format(image_format) diff --git a/official/vision/data/image_utils_test.py b/official/vision/data/image_utils_test.py new file mode 100644 index 0000000000000000000000000000000000000000..91eb3b45687118facc5c016556a205ee296f2cbf --- /dev/null +++ b/official/vision/data/image_utils_test.py @@ -0,0 +1,87 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for image_utils.""" +import imghdr +from unittest import mock +from absl.testing import parameterized +import tensorflow as tf + +from official.vision.data import fake_feature_generator +from official.vision.data import image_utils + + +class ImageUtilsTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.named_parameters( + ('RGB_PNG', 128, 64, 3, 'PNG'), ('RGB_JPEG', 2, 1, 3, 'JPEG'), + ('GREY_BMP', 32, 32, 1, 'BMP'), ('GREY_PNG', 128, 128, 1, 'png')) + def test_encode_image_then_decode_image(self, height, width, num_channels, + image_format): + image_np = fake_feature_generator.generate_image_np(height, width, + num_channels) + image_str = image_utils.encode_image(image_np, image_format) + actual_image_np = image_utils.decode_image(image_str) + + # JPEG encoding does not keep the pixel value. + if image_format != 'JPEG': + self.assertAllClose(actual_image_np, image_np) + self.assertEqual(actual_image_np.shape, image_np.shape) + + @parameterized.named_parameters( + ('RGB_RAW', 128, 64, 3, tf.bfloat16.as_numpy_dtype), + ('GREY_RAW', 32, 32, 1, tf.uint8.as_numpy_dtype)) + def test_encode_raw_image_then_decode_raw_image(self, height, width, + num_channels, image_dtype): + image_np = fake_feature_generator.generate_image_np(height, width, + num_channels) + image_np = image_np.astype(image_dtype) + image_str = image_utils.encode_image(image_np, 'RAW') + actual_image_np = image_utils.decode_image(image_str, 'RAW', image_dtype) + actual_image_np = actual_image_np.reshape([height, width, num_channels]) + + self.assertAllClose(actual_image_np, image_np) + self.assertEqual(actual_image_np.shape, image_np.shape) + + @parameterized.named_parameters( + ('RGB_PNG', 128, 64, 3, 'PNG'), ('RGB_JPEG', 64, 128, 3, 'JPEG'), + ('GREY_BMP', 32, 32, 1, 'BMP'), ('GREY_PNG', 128, 128, 1, 'png')) + def test_encode_image_then_decode_image_metadata(self, height, width, + num_channels, image_format): + image_np = fake_feature_generator.generate_image_np(height, 
width, + num_channels) + image_str = image_utils.encode_image(image_np, image_format) + (actual_height, actual_width, actual_num_channels, actual_format) = ( + image_utils.decode_image_metadata(image_str)) + + self.assertEqual(actual_height, height) + self.assertEqual(actual_width, width) + self.assertEqual(actual_num_channels, num_channels) + self.assertEqual(actual_format, image_format.upper()) + + def test_encode_image_raise_error_with_invalid_image_format(self): + with self.assertRaisesRegex(ValueError, 'Image format is invalid: foo'): + image_np = fake_feature_generator.generate_image_np(2, 2, 1) + image_utils.encode_image(image_np, 'foo') + + @mock.patch.object(imghdr, 'what', return_value='foo', autospec=True) + def test_decode_image_raise_error_with_invalid_image_format(self, _): + image_np = fake_feature_generator.generate_image_np(1, 1, 3) + image_str = image_utils.encode_image(image_np, 'PNG') + with self.assertRaisesRegex(ValueError, 'Image format is invalid: foo'): + image_utils.decode_image_metadata(image_str) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/beta/data/process_coco_few_shot.sh b/official/vision/data/process_coco_few_shot.sh similarity index 100% rename from official/vision/beta/data/process_coco_few_shot.sh rename to official/vision/data/process_coco_few_shot.sh diff --git a/official/vision/beta/data/process_coco_few_shot_json_files.py b/official/vision/data/process_coco_few_shot_json_files.py similarity index 98% rename from official/vision/beta/data/process_coco_few_shot_json_files.py rename to official/vision/data/process_coco_few_shot_json_files.py index 7f02c27019fcfb3462fdbadc777ea7b85695bcbc..7a918c5117d3d792bb3d11e974e6d1da921caf54 100644 --- a/official/vision/beta/data/process_coco_few_shot_json_files.py +++ b/official/vision/data/process_coco_few_shot_json_files.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/beta/data/process_coco_panoptic.sh b/official/vision/data/process_coco_panoptic.sh similarity index 100% rename from official/vision/beta/data/process_coco_panoptic.sh rename to official/vision/data/process_coco_panoptic.sh diff --git a/official/vision/data/tf_example_builder.py b/official/vision/data/tf_example_builder.py new file mode 100644 index 0000000000000000000000000000000000000000..69605614f00eb69262aa2b34ed7dda537d3ec8d2 --- /dev/null +++ b/official/vision/data/tf_example_builder.py @@ -0,0 +1,492 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Builder class for preparing tf.train.Example in vision tasks.""" + +# https://www.python.org/dev/peps/pep-0563/#enabling-the-future-behavior-in-python-3-7 +from __future__ import annotations + +import hashlib +from typing import Optional, Sequence, Union +import numpy as np + +from official.core import tf_example_builder +from official.vision.data import image_utils +from official.vision.data import tf_example_feature_key + +BytesValueType = Union[bytes, Sequence[bytes], str, Sequence[str]] + +_to_array = lambda v: [v] if not isinstance(v, (list, np.ndarray)) else v +_to_bytes = lambda v: v.encode() if isinstance(v, str) else v +_to_bytes_array = lambda v: list(map(_to_bytes, _to_array(v))) + + +class TfExampleBuilder(tf_example_builder.TfExampleBuilder): + """Builder class for preparing tf.train.Example in vision tasks. + + Read API doc at https://www.tensorflow.org/api_docs/python/tf/train/Example. + """ + + def add_image_matrix_feature( + self, + image_matrix: np.ndarray, + image_format: str = 'PNG', + image_source_id: Optional[bytes] = None, + feature_prefix: Optional[str] = None, + label: Optional[Union[int, Sequence[int]]] = None) -> 'TfExampleBuilder': + """Encodes and adds image features to the example. + + See `tf_example_feature_key.EncodedImageFeatureKey` for list of feature keys + that will be added to the example. + + Example usages: + >>> example_builder = TfExampleBuilder() + * For adding RGB image feature with PNG encoding: + >>> example_builder.add_image_matrix_feature(image_matrix) + * For adding RGB image feature with a pre-generated source ID:
+ >>> example_builder.add_image_matrix_feature( + image_matrix, image_source_id=image_source_id) + * For adding single-channel depth image feature with JPEG encoding: + >>> example_builder.add_image_matrix_feature( + image_matrix, image_format=ImageFormat.JPEG, + feature_prefix='depth') + + Args: + image_matrix: Numpy image matrix with shape (height, width, channels) + image_format: Image format string, defaults to 'PNG'. + image_source_id: Unique string ID to identify the image. Hashed image will + be used if the field is not provided. + feature_prefix: Feature prefix for image features. + label: the label or a list of labels for the image. + + Returns: + The builder object for subsequent method calls. + """ + encoded_image = image_utils.encode_image(image_matrix, image_format) + height, width, num_channels = image_matrix.shape + + return self.add_encoded_image_feature(encoded_image, image_format, height, + width, num_channels, image_source_id, + feature_prefix, label) + + def add_encoded_image_feature( + self, + encoded_image: bytes, + image_format: Optional[str] = None, + height: Optional[int] = None, + width: Optional[int] = None, + num_channels: Optional[int] = None, + image_source_id: Optional[bytes] = None, + feature_prefix: Optional[str] = None, + label: Optional[Union[int, Sequence[int]]] = None) -> 'TfExampleBuilder': + """Adds encoded image features to the example. + + See `tf_example_feature_key.EncodedImageFeatureKey` for list of feature keys + that will be added to the example. + + Image format, height, width, and channels are inferred from the encoded + image bytes if any of them is not provided. Hashed image will be used if + pre-generated source ID is not provided. 
+ + Example usages: + >>> example_builder = TfExampleBuilder() + * For adding RGB image feature: + >>> example_builder.add_encoded_image_feature(image_bytes) + * For adding RGB image feature with pre-generated source ID: + >>> example_builder.add_encoded_image_feature( + image_bytes, image_source_id=image_source_id) + * For adding single-channel depth image feature: + >>> example_builder.add_encoded_image_feature( + image_bytes, feature_prefix='depth') + + Args: + encoded_image: Encoded image string. + image_format: Image format string. + height: Number of rows. + width: Number of columns. + num_channels: Number of channels. + image_source_id: Unique string ID to identify the image. + feature_prefix: Feature prefix for image features. + label: the label or a list of labels for the image. + + Returns: + The builder object for subsequent method calls. + """ + if image_format == 'RAW': + if not (height and width and num_channels): + raise ValueError('For raw image feature, height, width and ' + 'num_channels fields are required.') + if not all((height, width, num_channels, image_format)): + (height, width, num_channels, image_format) = ( + image_utils.decode_image_metadata(encoded_image)) + else: + image_format = image_utils.validate_image_format(image_format) + + feature_key = tf_example_feature_key.EncodedImageFeatureKey(feature_prefix) + + # If source ID is not provided, we use hashed encoded image as the source + # ID. Note that we only keep 24 bits to be consistent with the Model Garden + # requirement, which will transform the source ID into float32. 
+ if not image_source_id: + hashed_image = int(hashlib.blake2s(encoded_image).hexdigest(), 16) + image_source_id = _to_bytes(str(hash(hashed_image) % ((1 << 24) + 1))) + + if label is not None: + self.add_ints_feature(feature_key.label, label) + + return ( + self.add_bytes_feature(feature_key.encoded, encoded_image) + .add_bytes_feature(feature_key.format, image_format) + .add_ints_feature(feature_key.height, [height]) + .add_ints_feature(feature_key.width, [width]) + .add_ints_feature(feature_key.num_channels, num_channels) + .add_bytes_feature(feature_key.source_id, image_source_id)) + + def add_boxes_feature( + self, + xmins: Sequence[float], + xmaxs: Sequence[float], + ymins: Sequence[float], + ymaxs: Sequence[float], + labels: Sequence[int], + confidences: Optional[Sequence[float]] = None, + normalized: bool = True, + feature_prefix: Optional[str] = None) -> 'TfExampleBuilder': + """Adds box and label features to the example. + + Four features will be generated for xmin, ymin, xmax, and ymax. One feature + will be generated for label. Different feature keys will be used for + normalized boxes and pixel-value boxes, depending on the value of + `normalized`. + + Example usages: + >>> example_builder = TfExampleBuilder() + >>> example_builder.add_boxes_feature(xmins, xmaxs, ymins, ymaxs, labels) + + Args: + xmins: A list of minimum X coordinates. + xmaxs: A list of maximum X coordinates. + ymins: A list of minimum Y coordinates. + ymaxs: A list of maximum Y coordinates. + labels: The labels of added boxes. + confidences: The confidences of added boxes. + normalized: Indicate if the coordinates of boxes are normalized. + feature_prefix: Feature prefix for added box features. + + Returns: + The builder object for subsequent method calls. 
+ """ + if normalized: + feature_key = tf_example_feature_key.BoxFeatureKey(feature_prefix) + else: + feature_key = tf_example_feature_key.BoxPixelFeatureKey(feature_prefix) + + self.add_floats_feature(feature_key.xmin, xmins) + self.add_floats_feature(feature_key.xmax, xmaxs) + self.add_floats_feature(feature_key.ymin, ymins) + self.add_floats_feature(feature_key.ymax, ymaxs) + self.add_ints_feature(feature_key.label, labels) + if confidences is not None: + self.add_floats_feature(feature_key.confidence, confidences) + return self + + def _compute_mask_areas( + self, instance_mask_matrices: np.ndarray) -> Sequence[float]: + return np.sum( + instance_mask_matrices, axis=(1, 2, 3), + dtype=np.float64).flatten().tolist() + + def add_instance_mask_matrices_feature( + self, + instance_mask_matrices: np.ndarray, + feature_prefix: Optional[str] = None) -> 'TfExampleBuilder': + """Encodes and adds instance mask features to the example. + + See `tf_example_feature_key.EncodedInstanceMaskFeatureKey` for list of + feature keys that will be added to the example. Please note that all masks + will be encoded as PNG images. + + Example usages: + >>> example_builder = TfExampleBuilder() + >>> example_builder.add_instance_mask_matrices_feature( + instance_mask_matrices) + + TODO(b/223653024): Provide a way to generate visualization mask from + feature mask. + + Args: + instance_mask_matrices: Numpy instance mask matrices with shape + (num_instance, height, width, 1) or (num_instance, height, width). + feature_prefix: Feature prefix for instance mask features. + + Returns: + The builder object for subsequent method calls.
+ """ + if len(instance_mask_matrices.shape) == 3: + instance_mask_matrices = instance_mask_matrices[..., np.newaxis] + + mask_areas = self._compute_mask_areas(instance_mask_matrices) + encoded_instance_masks = list( + map(lambda x: image_utils.encode_image(x, 'PNG'), + instance_mask_matrices)) + + return self.add_encoded_instance_masks_feature(encoded_instance_masks, + mask_areas, feature_prefix) + + def add_encoded_instance_masks_feature( + self, + encoded_instance_masks: Sequence[bytes], + mask_areas: Optional[Sequence[float]] = None, + feature_prefix: Optional[str] = None) -> 'TfExampleBuilder': + """Adds encoded instance mask features to the example. + + See `tf_example_feature_key.EncodedInstanceMaskFeatureKey` for list of + feature keys that will be added to the example. + + Mask areas are inferred from the encoded instance masks if not provided. + + Example usages: + >>> example_builder = TfExampleBuilder() + >>> example_builder.add_encoded_instance_masks_feature( + instance_mask_bytes) + + TODO(b/223653024): Provide a way to generate visualization mask from + feature mask. + + Args: + encoded_instance_masks: A list of encoded instance mask strings. Note that + the encoding is not changed in this function and it always assumes the + image is in "PNG" format. + mask_areas: Area of each instance mask. + feature_prefix: Feature prefix for instance mask features. + + Returns: + The builder object for subsequent method calls.
+ """ + encoded_instance_masks = _to_bytes_array(encoded_instance_masks) + + if mask_areas is None: + instance_mask_matrices = np.array( + list(map(image_utils.decode_image, encoded_instance_masks))) + mask_areas = self._compute_mask_areas(instance_mask_matrices) + + feature_key = tf_example_feature_key.EncodedInstanceMaskFeatureKey( + feature_prefix) + return ( + self.add_bytes_feature(feature_key.mask, encoded_instance_masks) + .add_floats_feature(feature_key.area, mask_areas)) + + def add_semantic_mask_matrix_feature( + self, + mask_matrix: np.ndarray, + mask_format: str = 'PNG', + visualization_mask_matrix: Optional[np.ndarray] = None, + visualization_mask_format: str = 'PNG', + feature_prefix: Optional[str] = None) -> 'TfExampleBuilder': + """Encodes and adds semantic mask features to the example. + + See `tf_example_feature_key.EncodedSemanticMaskFeatureKey` for list of + feature keys that will be added to the example. + + Example usages: + >>> example_builder = TfExampleBuilder() + * For adding semantic mask feature: + >>> example_builder.add_semantic_mask_matrix_feature( + semantic_mask_matrix) + * For adding semantic mask feature and visualization mask feature: + >>> example_builder.add_semantic_mask_matrix_feature( + semantic_mask_matrix, + visualization_mask_matrix=visualization_mask_matrix) + * For adding predicted semantic mask feature with visualization mask: + >>> example_builder.add_semantic_mask_matrix_feature( + predicted_mask_matrix, + visualization_mask_matrix=predicted_visualization_mask_matrix, + feature_prefix='predicted') + + TODO(b/223653024): Provide a way to generate visualization mask from + feature mask. + + Args: + mask_matrix: Numpy semantic mask matrix with shape (height, width, 1) or + (height, width). + mask_format: Mask format string, defaults to 'PNG'. + visualization_mask_matrix: Numpy visualization mask matrix for semantic + mask with shape (height, width, 3).
+ visualization_mask_format: Visualization mask format string, defaults to + 'PNG'. + feature_prefix: Feature prefix for semantic mask features. + + Returns: + The builder object for subsequent method calls. + """ + if len(mask_matrix.shape) == 2: + mask_matrix = mask_matrix[..., np.newaxis] + encoded_mask = image_utils.encode_image(mask_matrix, mask_format) + + encoded_visualization_mask = None + if visualization_mask_matrix is not None: + encoded_visualization_mask = image_utils.encode_image( + visualization_mask_matrix, visualization_mask_format) + + return self.add_encoded_semantic_mask_feature(encoded_mask, mask_format, + encoded_visualization_mask, + visualization_mask_format, + feature_prefix) + + def add_encoded_semantic_mask_feature( + self, encoded_mask: bytes, + mask_format: str = 'PNG', + encoded_visualization_mask: Optional[bytes] = None, + visualization_mask_format: str = 'PNG', + feature_prefix: Optional[str] = None) -> 'TfExampleBuilder': + """Adds encoded semantic mask features to the example. + + See `tf_example_feature_key.EncodedSemanticMaskFeatureKey` for list of + feature keys that will be added to the example. + + Example usages: + >>> example_builder = TfExampleBuilder() + * For adding semantic mask feature: + >>> example_builder.add_encoded_semantic_mask_feature(semantic_mask_bytes) + * For adding semantic mask feature and visualization mask feature: + >>> example_builder.add_encoded_semantic_mask_feature( + semantic_mask_bytes, + encoded_visualization_mask=visualization_mask_bytes) + * For adding predicted semantic mask feature with visualization mask: + >>> example_builder.add_encoded_semantic_mask_feature( + predicted_mask_bytes, + encoded_visualization_mask=predicted_visualization_mask_bytes, + feature_prefix='predicted') + + TODO(b/223653024): Provide a way to generate visualization mask from + feature mask. + + Args: + encoded_mask: Encoded semantic mask string. + mask_format: Semantic mask format string, defaults to 'PNG'. 
+ encoded_visualization_mask: Encoded visualization mask string. + visualization_mask_format: Visualization mask format string, defaults to + 'PNG'. + feature_prefix: Feature prefix for semantic mask features. + + Returns: + The builder object for subsequent method calls. + """ + feature_key = tf_example_feature_key.EncodedSemanticMaskFeatureKey( + feature_prefix) + example_builder = ( + self.add_bytes_feature(feature_key.mask, encoded_mask) + .add_bytes_feature(feature_key.mask_format, mask_format)) + if encoded_visualization_mask is not None: + example_builder = ( + example_builder.add_bytes_feature( + feature_key.visualization_mask, encoded_visualization_mask) + .add_bytes_feature( + feature_key.visualization_mask_format, visualization_mask_format)) + return example_builder + + def add_panoptic_mask_matrix_feature( + self, + panoptic_category_mask_matrix: np.ndarray, + panoptic_instance_mask_matrix: np.ndarray, + panoptic_category_mask_format: str = 'PNG', + panoptic_instance_mask_format: str = 'PNG', + feature_prefix: Optional[str] = None) -> 'TfExampleBuilder': + """Encodes and adds panoptic mask features to the example. + + See `tf_example_feature_key.EncodedPanopticMaskFeatureKey` for list of + feature keys that will be added to the example. + + Example usages: + >>> example_builder = TfExampleBuilder() + >>> example_builder.add_panoptic_mask_matrix_feature( + panoptic_category_mask_matrix, panoptic_instance_mask_matrix) + + TODO(b/223653024): Provide a way to generate visualization mask from + feature mask. + + Args: + panoptic_category_mask_matrix: Numpy panoptic category mask matrix with + shape (height, width, 1) or (height, width). + panoptic_instance_mask_matrix: Numpy panoptic instance mask matrix with + shape (height, width, 1) or (height, width). + panoptic_category_mask_format: Panoptic category mask format string, + defaults to 'PNG'. + panoptic_instance_mask_format: Panoptic instance mask format string, + defaults to 'PNG'. 
+ feature_prefix: Feature prefix for panoptic mask features. + + Returns: + The builder object for subsequent method calls. + """ + if len(panoptic_category_mask_matrix.shape) == 2: + panoptic_category_mask_matrix = ( + panoptic_category_mask_matrix[..., np.newaxis]) + if len(panoptic_instance_mask_matrix.shape) == 2: + panoptic_instance_mask_matrix = ( + panoptic_instance_mask_matrix[..., np.newaxis]) + encoded_panoptic_category_mask = image_utils.encode_image( + panoptic_category_mask_matrix, panoptic_category_mask_format) + encoded_panoptic_instance_mask = image_utils.encode_image( + panoptic_instance_mask_matrix, panoptic_instance_mask_format) + + return self.add_encoded_panoptic_mask_feature( + encoded_panoptic_category_mask, encoded_panoptic_instance_mask, + panoptic_category_mask_format, panoptic_instance_mask_format, + feature_prefix) + + def add_encoded_panoptic_mask_feature( + self, + encoded_panoptic_category_mask: bytes, + encoded_panoptic_instance_mask: bytes, + panoptic_category_mask_format: str = 'PNG', + panoptic_instance_mask_format: str = 'PNG', + feature_prefix: Optional[str] = None) -> 'TfExampleBuilder': + """Adds encoded panoptic mask features to the example. + + See `tf_example_feature_key.EncodedPanopticMaskFeatureKey` for list of + feature keys that will be added to the example. + + Example usages: + >>> example_builder = TfExampleBuilder() + >>> example_builder.add_encoded_panoptic_mask_feature( + encoded_panoptic_category_mask, encoded_panoptic_instance_mask) + + TODO(b/223653024): Provide a way to generate visualization mask from + feature mask. + + Args: + encoded_panoptic_category_mask: Encoded panoptic category mask string. + encoded_panoptic_instance_mask: Encoded panoptic instance mask string. + panoptic_category_mask_format: Panoptic category mask format string, + defaults to 'PNG'. + panoptic_instance_mask_format: Panoptic instance mask format string, + defaults to 'PNG'. 
+ feature_prefix: Feature prefix for panoptic mask features. + + Returns: + The builder object for subsequent method calls. + """ + feature_key = tf_example_feature_key.EncodedPanopticMaskFeatureKey( + feature_prefix) + return ( + self.add_bytes_feature( + feature_key.category_mask, encoded_panoptic_category_mask) + .add_bytes_feature( + feature_key.category_mask_format, panoptic_category_mask_format) + .add_bytes_feature( + feature_key.instance_mask, encoded_panoptic_instance_mask) + .add_bytes_feature( + feature_key.instance_mask_format, panoptic_instance_mask_format)) + diff --git a/official/vision/data/tf_example_builder_test.py b/official/vision/data/tf_example_builder_test.py new file mode 100644 index 0000000000000000000000000000000000000000..b5b450b7a71af68d5983878869f88af9efa31ca7 --- /dev/null +++ b/official/vision/data/tf_example_builder_test.py @@ -0,0 +1,646 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for tf_example_builder.""" + +from absl.testing import parameterized +import tensorflow as tf +from official.vision.data import fake_feature_generator +from official.vision.data import image_utils +from official.vision.data import tf_example_builder + + +class TfExampleBuilderTest(tf.test.TestCase, parameterized.TestCase): + + @parameterized.named_parameters( + ('RGB_PNG', 128, 64, 3, 'PNG', '15125990', 3), + ('RGB_RAW', 128, 128, 3, 'RAW', '5607919', 0), + ('RGB_JPEG', 64, 128, 3, 'JPEG', '3107796', [2, 5])) + def test_add_image_matrix_feature_success(self, height, width, num_channels, + image_format, hashed_image, label): + # Prepare test data. + image_np = fake_feature_generator.generate_image_np(height, width, + num_channels) + expected_image_bytes = image_utils.encode_image(image_np, image_format) + hashed_image = bytes(hashed_image, 'ascii') + + # Run code logic. + example_builder = tf_example_builder.TfExampleBuilder() + example_builder.add_image_matrix_feature(image_np, image_format, + label=label) + example = example_builder.example + + # Verify outputs. + # Prefer to use string literal for feature keys to directly display the + # structure of the expected tf.train.Example. 
+ if isinstance(label, int): + expected_labels = [label] + else: + expected_labels = label + self.assertProtoEquals( + tf.train.Example( + features=tf.train.Features( + feature={ + 'image/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[expected_image_bytes])), + 'image/format': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[bytes(image_format, 'ascii')])), + 'image/height': + tf.train.Feature( + int64_list=tf.train.Int64List(value=[height])), + 'image/width': + tf.train.Feature( + int64_list=tf.train.Int64List(value=[width])), + 'image/channels': + tf.train.Feature( + int64_list=tf.train.Int64List( + value=[num_channels])), + 'image/source_id': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[hashed_image])), + 'image/class/label': + tf.train.Feature( + int64_list=tf.train.Int64List( + value=expected_labels)), + })), example) + + def test_add_image_matrix_feature_with_feature_prefix_success(self): + height = 64 + width = 64 + num_channels = 1 + image_format = 'PNG' + feature_prefix = 'depth' + label = 8 + image_np = fake_feature_generator.generate_image_np(height, width, + num_channels) + expected_image_bytes = image_utils.encode_image(image_np, image_format) + hashed_image = bytes('11981843', 'ascii') + + example_builder = tf_example_builder.TfExampleBuilder() + example_builder.add_image_matrix_feature( + image_np, image_format, feature_prefix=feature_prefix, label=label) + example = example_builder.example + + self.assertProtoEquals( + tf.train.Example( + features=tf.train.Features( + feature={ + 'depth/image/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[expected_image_bytes])), + 'depth/image/format': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[bytes(image_format, 'ascii')])), + 'depth/image/height': + tf.train.Feature( + int64_list=tf.train.Int64List(value=[height])), + 'depth/image/width': + tf.train.Feature( + int64_list=tf.train.Int64List(value=[width])), + 
'depth/image/channels': + tf.train.Feature( + int64_list=tf.train.Int64List( + value=[num_channels])), + 'depth/image/source_id': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[hashed_image])), + 'depth/image/class/label': + tf.train.Feature( + int64_list=tf.train.Int64List(value=[label])) + })), example) + + def test_add_encoded_raw_image_feature_success(self): + height = 128 + width = 128 + num_channels = 3 + image_format = 'RAW' + image_bytes = tf.bfloat16.as_numpy_dtype + image_np = fake_feature_generator.generate_image_np(height, width, + num_channels) + image_np = image_np.astype(image_bytes) + expected_image_bytes = image_utils.encode_image(image_np, image_format) + hashed_image = bytes('3572575', 'ascii') + + example_builder = tf_example_builder.TfExampleBuilder() + example_builder.add_encoded_image_feature(expected_image_bytes, 'RAW', + height, width, num_channels) + example = example_builder.example + + self.assertProtoEquals( + tf.train.Example( + features=tf.train.Features( + feature={ + 'image/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[expected_image_bytes])), + 'image/format': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[bytes(image_format, 'ascii')])), + 'image/height': + tf.train.Feature( + int64_list=tf.train.Int64List(value=[height])), + 'image/width': + tf.train.Feature( + int64_list=tf.train.Int64List(value=[width])), + 'image/channels': + tf.train.Feature( + int64_list=tf.train.Int64List( + value=[num_channels])), + 'image/source_id': + tf.train.Feature( + bytes_list=tf.train.BytesList(value=[hashed_image])) + })), example) + + def test_add_encoded_raw_image_feature_valueerror(self): + image_format = 'RAW' + image_bytes = tf.bfloat16.as_numpy_dtype + image_np = fake_feature_generator.generate_image_np(1, 1, 1) + image_np = image_np.astype(image_bytes) + expected_image_bytes = image_utils.encode_image(image_np, image_format) + + example_builder = tf_example_builder.TfExampleBuilder() + 
with self.assertRaises(ValueError): + example_builder.add_encoded_image_feature(expected_image_bytes, + image_format) + + @parameterized.parameters( + (True, True, True, True, True, True), + (False, False, False, False, False, False), + (True, False, False, False, False, False), + (False, True, False, False, False, False), + (False, False, True, False, False, False), + (False, False, False, True, False, False), + (False, False, False, False, True, False), + (False, False, False, False, False, True), + ) + def test_add_encoded_image_feature_success(self, miss_image_format, + miss_height, miss_width, + miss_num_channels, + miss_image_source_id, + miss_label): + height = 64 + width = 64 + num_channels = 3 + image_format = 'PNG' + image_np = fake_feature_generator.generate_image_np(height, width, + num_channels) + image_bytes = image_utils.encode_image(image_np, image_format) + hashed_image = bytes('2968688', 'ascii') + label = 5 + + image_format = None if miss_image_format else image_format + height = None if miss_height else height + width = None if miss_width else width + num_channels = None if miss_num_channels else num_channels + image_source_id = None if miss_image_source_id else hashed_image + label = None if miss_label else label + + example_builder = tf_example_builder.TfExampleBuilder() + example_builder.add_encoded_image_feature( + image_bytes, + image_format=image_format, + height=height, + width=width, + num_channels=num_channels, + image_source_id=image_source_id, + label=label) + example = example_builder.example + + expected_features = { + 'image/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList(value=[image_bytes])), + 'image/format': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[bytes('PNG', 'ascii')])), + 'image/height': + tf.train.Feature( + int64_list=tf.train.Int64List(value=[64])), + 'image/width': + tf.train.Feature( + int64_list=tf.train.Int64List(value=[64])), + 'image/channels': + tf.train.Feature( + 
int64_list=tf.train.Int64List(value=[3])), + 'image/source_id': + tf.train.Feature( + bytes_list=tf.train.BytesList(value=[hashed_image]))} + if not miss_label: + expected_features.update({ + 'image/class/label': + tf.train.Feature( + int64_list=tf.train.Int64List(value=[label]))}) + self.assertProtoEquals( + tf.train.Example(features=tf.train.Features(feature=expected_features)), + example) + + @parameterized.named_parameters(('no_box', 0), ('10_boxes', 10)) + def test_add_normalized_boxes_feature(self, num_boxes): + normalized_boxes_np = fake_feature_generator.generate_normalized_boxes_np( + num_boxes) + ymins, xmins, ymaxs, xmaxs = normalized_boxes_np.T.tolist() + labels = fake_feature_generator.generate_classes_np( + 2, size=num_boxes).tolist() + + example_builder = tf_example_builder.TfExampleBuilder() + example_builder.add_boxes_feature( + xmins, xmaxs, ymins, ymaxs, labels=labels, normalized=True) + example = example_builder.example + + self.assertProtoEquals( + tf.train.Example( + features=tf.train.Features( + feature={ + 'image/object/bbox/xmin': + tf.train.Feature( + float_list=tf.train.FloatList(value=xmins)), + 'image/object/bbox/ymin': + tf.train.Feature( + float_list=tf.train.FloatList(value=ymins)), + 'image/object/bbox/xmax': + tf.train.Feature( + float_list=tf.train.FloatList(value=xmaxs)), + 'image/object/bbox/ymax': + tf.train.Feature( + float_list=tf.train.FloatList(value=ymaxs)), + 'image/object/class/label': + tf.train.Feature( + int64_list=tf.train.Int64List(value=labels)), + })), example) + + @parameterized.named_parameters(('no_box', 0), ('10_boxes', 10)) + def test_add_box_pixels_feature(self, num_boxes): + height, width = 10, 10 + boxes_np = fake_feature_generator.generate_boxes_np(height, width, + num_boxes) + ymins, xmins, ymaxs, xmaxs = boxes_np.T.tolist() + labels = fake_feature_generator.generate_classes_np( + 2, size=num_boxes).tolist() + + example_builder = tf_example_builder.TfExampleBuilder() + example_builder.add_boxes_feature( 
+ xmins, xmaxs, ymins, ymaxs, labels=labels, normalized=False) + example = example_builder.example + + self.assertProtoEquals( + tf.train.Example( + features=tf.train.Features( + feature={ + 'image/object/bbox/xmin_pixels': + tf.train.Feature( + float_list=tf.train.FloatList(value=xmins)), + 'image/object/bbox/ymin_pixels': + tf.train.Feature( + float_list=tf.train.FloatList(value=ymins)), + 'image/object/bbox/xmax_pixels': + tf.train.Feature( + float_list=tf.train.FloatList(value=xmaxs)), + 'image/object/bbox/ymax_pixels': + tf.train.Feature( + float_list=tf.train.FloatList(value=ymaxs)), + 'image/object/class/label': + tf.train.Feature( + int64_list=tf.train.Int64List(value=labels)), + })), example) + + @parameterized.named_parameters(('no_box', 0), ('10_boxes', 10)) + def test_add_normalized_boxes_feature_with_confidence_and_prefix( + self, num_boxes): + normalized_boxes_np = fake_feature_generator.generate_normalized_boxes_np( + num_boxes) + ymins, xmins, ymaxs, xmaxs = normalized_boxes_np.T.tolist() + labels = fake_feature_generator.generate_classes_np( + 2, size=num_boxes).tolist() + confidences = fake_feature_generator.generate_confidences_np( + size=num_boxes).tolist() + feature_prefix = 'predicted' + + example_builder = tf_example_builder.TfExampleBuilder() + example_builder.add_boxes_feature( + xmins, + xmaxs, + ymins, + ymaxs, + labels=labels, + confidences=confidences, + normalized=True, + feature_prefix=feature_prefix) + example = example_builder.example + + self.assertProtoEquals( + tf.train.Example( + features=tf.train.Features( + feature={ + 'predicted/image/object/bbox/xmin': + tf.train.Feature( + float_list=tf.train.FloatList(value=xmins)), + 'predicted/image/object/bbox/ymin': + tf.train.Feature( + float_list=tf.train.FloatList(value=ymins)), + 'predicted/image/object/bbox/xmax': + tf.train.Feature( + float_list=tf.train.FloatList(value=xmaxs)), + 'predicted/image/object/bbox/ymax': + tf.train.Feature( + 
float_list=tf.train.FloatList(value=ymaxs)), + 'predicted/image/object/class/label': + tf.train.Feature( + int64_list=tf.train.Int64List(value=labels)), + 'predicted/image/object/bbox/confidence': + tf.train.Feature( + float_list=tf.train.FloatList(value=confidences)), + })), example) + + @parameterized.named_parameters(('no_mask', 128, 64, 0), + ('10_masks', 64, 128, 10)) + def test_add_instance_mask_matrices_feature_success(self, height, width, + num_masks): + # Prepare test data. + instance_masks_np = fake_feature_generator.generate_instance_masks_np( + height, + width, + boxes_np=fake_feature_generator.generate_boxes_np( + height, width, num_masks), + normalized=False) + expected_instance_masks_bytes = list( + map(lambda x: image_utils.encode_image(x, 'PNG'), instance_masks_np)) + + # Run code logic. + example_builder = tf_example_builder.TfExampleBuilder() + example_builder.add_instance_mask_matrices_feature(instance_masks_np) + example = example_builder.example + + # Verify outputs. + # Prefer to use string literal for feature keys to directly display the + # structure of the expected tf.train.Example. + self.assertProtoEquals( + tf.train.Example( + features=tf.train.Features( + feature={ + 'image/object/mask': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=expected_instance_masks_bytes)), + 'image/object/area': + # The box area is 4x smaller than the image, and the + # mask area is 2x smaller than the box. 
+ tf.train.Feature( + float_list=tf.train.FloatList( + value=[height * width / 8] * num_masks)) + })), example) + + @parameterized.named_parameters(('with_mask_areas', True), + ('without_mask_areas', False)) + def test_add_encoded_instance_masks_feature_success(self, has_mask_areas): + height = 64 + width = 64 + image_format = 'PNG' + mask_np = fake_feature_generator.generate_semantic_mask_np(height, width, 2) + mask_bytes = image_utils.encode_image(mask_np, image_format) + + test_masks = [mask_bytes for _ in range(2)] + mask_areas = [2040., 2040.] if has_mask_areas else None + + example_builder = tf_example_builder.TfExampleBuilder() + example_builder.add_encoded_instance_masks_feature( + test_masks, mask_areas=mask_areas) + example = example_builder.example + + self.assertProtoEquals( + tf.train.Example( + features=tf.train.Features( + feature={ + 'image/object/mask': + tf.train.Feature( + bytes_list=tf.train.BytesList(value=test_masks)), + 'image/object/area': + tf.train.Feature( + float_list=tf.train.FloatList( + value=[2040., 2040.])), + })), example) + + @parameterized.named_parameters( + ('with_visualization_mask', 128, 64, True), + ('without_visualization_mask', 64, 128, False)) + def test_add_semantic_mask_matrices_feature_success(self, height, width, + has_visualization_mask): + # Prepare test data. 
+ semantic_mask_np = fake_feature_generator.generate_semantic_mask_np( + height, width, 2) + image_format = 'PNG' + expected_feature_dict = { + 'image/segmentation/class/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList(value=[ + image_utils.encode_image(semantic_mask_np, image_format) + ])), + 'image/segmentation/class/format': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[bytes(image_format, 'ascii')])), + } + visualization_mask_np = None + if has_visualization_mask: + visualization_mask_np = fake_feature_generator.generate_image_np( + height, width) + expected_feature_dict.update({ + 'image/segmentation/class/visualization/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList(value=[ + image_utils.encode_image(visualization_mask_np, + image_format) + ])), + 'image/segmentation/class/visualization/format': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[bytes(image_format, 'ascii')])), + }) + + # Run code logic. + example_builder = tf_example_builder.TfExampleBuilder() + example_builder.add_semantic_mask_matrix_feature(semantic_mask_np, + image_format, + visualization_mask_np, + image_format) + example = example_builder.example + + self.assertProtoEquals( + tf.train.Example( + features=tf.train.Features(feature=expected_feature_dict)), example) + + @parameterized.named_parameters(('with_visualization_mask', True), + ('without_visualization_mask', False)) + def test_add_encoded_semantic_mask_feature_success(self, + has_visualization_mask): + height, width = 64, 64 + semantic_mask_np = fake_feature_generator.generate_semantic_mask_np( + height, width, 2) + image_format = 'PNG' + encoded_semantic_mask = image_utils.encode_image(semantic_mask_np, + image_format) + expected_feature_dict = { + 'image/segmentation/class/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList(value=[encoded_semantic_mask])), + 'image/segmentation/class/format': + tf.train.Feature( + bytes_list=tf.train.BytesList( + 
value=[bytes(image_format, 'ascii')])), + } + encoded_visualization_mask = None + if has_visualization_mask: + visualization_mask_np = fake_feature_generator.generate_image_np( + height, width) + encoded_visualization_mask = image_utils.encode_image( + visualization_mask_np, image_format) + expected_feature_dict.update({ + 'image/segmentation/class/visualization/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[encoded_visualization_mask])), + 'image/segmentation/class/visualization/format': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[bytes(image_format, 'ascii')])), + }) + + example_builder = tf_example_builder.TfExampleBuilder() + example_builder.add_encoded_semantic_mask_feature( + encoded_semantic_mask, image_format, encoded_visualization_mask, + image_format) + example = example_builder.example + + self.assertProtoEquals( + tf.train.Example( + features=tf.train.Features(feature=expected_feature_dict)), example) + + def test_add_panoptic_mask_matrices_feature_success(self): + # Prepare test data. + height, width, num_instances = 64, 64, 10 + num_thing_classes, num_semantic_segmentation_classes = 3, 6 + image_format = 'PNG' + + normalized_boxes_np = fake_feature_generator.generate_normalized_boxes_np( + num_instances) + instance_masks_np = fake_feature_generator.generate_instance_masks_np( + height, width, normalized_boxes_np) + instance_classes_np = fake_feature_generator.generate_classes_np( + num_thing_classes, num_instances) + semantic_mask_np = fake_feature_generator.generate_semantic_mask_np( + height, width, num_semantic_segmentation_classes) + panoptic_category_mask_np, panoptic_instance_mask_np = ( + fake_feature_generator.generate_panoptic_masks_np( + semantic_mask_np, instance_masks_np, instance_classes_np, + num_thing_classes - 1)) + + # Run code logic. 
+ example_builder = tf_example_builder.TfExampleBuilder() + example_builder.add_panoptic_mask_matrix_feature(panoptic_category_mask_np, + panoptic_instance_mask_np, + image_format, + image_format) + example = example_builder.example + + self.assertProtoEquals( + tf.train.Example( + features=tf.train.Features( + feature={ + 'image/panoptic/category/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList(value=[ + image_utils.encode_image( + panoptic_category_mask_np, image_format) + ])), + 'image/panoptic/category/format': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[bytes(image_format, 'ascii')])), + 'image/panoptic/instance/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList(value=[ + image_utils.encode_image( + panoptic_instance_mask_np, image_format) + ])), + 'image/panoptic/instance/format': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[bytes(image_format, 'ascii')])), + })), example) + + def test_add_encoded_panoptic_mask_feature_success(self): + # Prepare test data. 
+ height, width, num_instances = 64, 64, 10 + num_thing_classes, num_semantic_segmentation_classes = 3, 6 + image_format = 'PNG' + + normalized_boxes_np = fake_feature_generator.generate_normalized_boxes_np( + num_instances) + instance_masks_np = fake_feature_generator.generate_instance_masks_np( + height, width, normalized_boxes_np) + instance_classes_np = fake_feature_generator.generate_classes_np( + num_thing_classes, num_instances) + semantic_mask_np = fake_feature_generator.generate_semantic_mask_np( + height, width, num_semantic_segmentation_classes) + panoptic_category_mask_np, panoptic_instance_mask_np = ( + fake_feature_generator.generate_panoptic_masks_np( + semantic_mask_np, instance_masks_np, instance_classes_np, + num_thing_classes - 1)) + + encoded_panoptic_category_mask = image_utils.encode_image( + panoptic_category_mask_np, image_format) + encoded_panoptic_instance_mask = image_utils.encode_image( + panoptic_instance_mask_np, image_format) + + example_builder = tf_example_builder.TfExampleBuilder() + example_builder.add_encoded_panoptic_mask_feature( + encoded_panoptic_category_mask, encoded_panoptic_instance_mask, + image_format, image_format) + example = example_builder.example + + self.assertProtoEquals( + tf.train.Example( + features=tf.train.Features( + feature={ + 'image/panoptic/category/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[encoded_panoptic_category_mask])), + 'image/panoptic/category/format': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[bytes(image_format, 'ascii')])), + 'image/panoptic/instance/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[encoded_panoptic_instance_mask])), + 'image/panoptic/instance/format': + tf.train.Feature( + bytes_list=tf.train.BytesList( + value=[bytes(image_format, 'ascii')])), + })), example) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/data/tf_example_feature_key.py 
b/official/vision/data/tf_example_feature_key.py new file mode 100644 index 0000000000000000000000000000000000000000..cd23aae5c814b29b5bbd45ce587c9e9bca216098 --- /dev/null +++ b/official/vision/data/tf_example_feature_key.py @@ -0,0 +1,174 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Data classes for tf.Example proto feature keys in vision tasks. + +Feature keys are grouped by feature types. Key names follow conventions in +go/tf-example. +""" +import dataclasses +import functools + +from official.core import tf_example_feature_key + +# Disable init function to use the one defined in base class. +dataclass = functools.partial(dataclasses.dataclass(init=False)) + + +@dataclass +class EncodedImageFeatureKey(tf_example_feature_key.TfExampleFeatureKeyBase): + """Feature keys for a single encoded image. + + The image matrix is expected to be in the shape of (height, width, + num_channels). + + Attributes: + encoded: encoded image bytes. + format: format string, e.g. 'PNG'. + height: number of rows. + width: number of columns. + num_channels: number of channels. + source_id: Unique string ID to identify the image. + label: the label or a list of labels for the image. 
+ """ + encoded: str = 'image/encoded' + format: str = 'image/format' + height: str = 'image/height' + width: str = 'image/width' + num_channels: str = 'image/channels' + source_id: str = 'image/source_id' + label: str = 'image/class/label' + + + @dataclass +class BoxFeatureKey(tf_example_feature_key.TfExampleFeatureKeyBase): + """Feature keys for normalized boxes representing objects in an image. + + Each box is defined by ((ymin, xmin), (ymax, xmax)). + + The origin point of an image matrix is top left. + + Note: The coordinate values are normalized to [0, 1], a convention adopted by + most model implementations. + + Attributes: + xmin: The x coordinate (column) of top-left corner. + xmax: The x coordinate (column) of bottom-right corner. + ymin: The y coordinate (row) of top-left corner. + ymax: The y coordinate (row) of bottom-right corner. + label: The class id. + confidence: The confidence score of the box; it can be a prior score (for + training) or a predicted score (for prediction). + """ + xmin: str = 'image/object/bbox/xmin' + xmax: str = 'image/object/bbox/xmax' + ymin: str = 'image/object/bbox/ymin' + ymax: str = 'image/object/bbox/ymax' + label: str = 'image/object/class/label' + confidence: str = 'image/object/bbox/confidence' + + + @dataclass +class BoxPixelFeatureKey(tf_example_feature_key.TfExampleFeatureKeyBase): + """Feature keys for boxes in pixel values representing objects in an image. + + Each box is defined by ((ymin, xmin), (ymax, xmax)). + + Note: The coordinate values are in the scale of the context image. The image + size is usually stored in `EncodedImageFeatureKey`. + + Attributes: + xmin: The x coordinate (column) of top-left corner. + xmax: The x coordinate (column) of bottom-right corner. + ymin: The y coordinate (row) of top-left corner. + ymax: The y coordinate (row) of bottom-right corner. + label: The class id. + confidence: The confidence score of the box; it can be a prior score (for + training) or a predicted score (for prediction).
+ """ + xmin: str = 'image/object/bbox/xmin_pixels' + xmax: str = 'image/object/bbox/xmax_pixels' + ymin: str = 'image/object/bbox/ymin_pixels' + ymax: str = 'image/object/bbox/ymax_pixels' + label: str = 'image/object/class/label' + confidence: str = 'image/object/bbox/confidence' + + +@dataclass +class EncodedInstanceMaskFeatureKey( + tf_example_feature_key.TfExampleFeatureKeyBase): + """Feature keys for a single encoded instance mask. + + The instance mask matrices are expected to be in the shape of (num_instances, + height, width, 1) or (num_instances, height, width). The height and width + correspond to the image height and width. For each instance mask, the pixel + value is either 0, representing the background, or 1, representing the object. + + TODO(b/223653024): Add keys for visualization mask as well. + + Attributes: + mask: Encoded instance mask bytes. + area: Total number of pixels that are marked as objects. + """ + mask: str = 'image/object/mask' + area: str = 'image/object/area' + + +@dataclass +class EncodedSemanticMaskFeatureKey( + tf_example_feature_key.TfExampleFeatureKeyBase): + """Feature keys for an encoded semantic mask and its associated images. + + The semantic mask matrix is expected to be in the shape of (height, width, 1) + or (height, width). The visualization mask matrix is expected to be in the + shape of (height, width, 3). The height and width correspond to the image + height and width. Each pixel in the semantic mask represents a class. + + Attributes: + mask: Encoded semantic mask bytes. + mask_format: Format string for semantic mask, e.g. 'PNG'. + visualization_mask: Encoded visualization mask bytes. + visualization_mask_format: Format string for visualization mask, e.g. + 'PNG'.
+ """ + mask: str = 'image/segmentation/class/encoded' + mask_format: str = 'image/segmentation/class/format' + visualization_mask: str = 'image/segmentation/class/visualization/encoded' + visualization_mask_format: str = 'image/segmentation/class/visualization/format' + + +@dataclass +class EncodedPanopticMaskFeatureKey( + tf_example_feature_key.TfExampleFeatureKeyBase): + """Feature keys for encoded panoptic category and instance masks. + + Both panoptic mask matrices are expected to be in the shape of (height, width, + 1) or (height, width). The height and width correspond to the image height and + width. For category mask, each pixel represents a class ID, and for instance + mask, each pixel represents an instance ID. + + TODO(b/223653024): Add keys for visualization mask as well. + + Attributes: + category_mask: Encoded panoptic category mask bytes. + category_mask_format: Format string for panoptic category mask, e.g. + 'PNG'. + instance_mask: Encoded panoptic instance mask bytes. + instance_mask_format: Format string for panoptic instance mask, e.g. + 'PNG'. + """ + category_mask: str = 'image/panoptic/category/encoded' + category_mask_format: str = 'image/panoptic/category/format' + instance_mask: str = 'image/panoptic/instance/encoded' + instance_mask_format: str = 'image/panoptic/instance/format' diff --git a/official/vision/beta/data/tfrecord_lib.py b/official/vision/data/tfrecord_lib.py similarity index 90% rename from official/vision/beta/data/tfrecord_lib.py rename to official/vision/data/tfrecord_lib.py index df3645f99aa839f34ec3a23540a83fcf281e3a18..96c7c5dc33ba810427a4315917f551588d6eaf15 100644 --- a/official/vision/beta/data/tfrecord_lib.py +++ b/official/vision/data/tfrecord_lib.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -26,6 +26,9 @@ import tensorflow as tf import multiprocessing as mp +LOG_EVERY = 100 + + def convert_to_feature(value, value_type=None): """Converts the given python object to a tf.train.Feature. @@ -114,7 +117,7 @@ def encode_mask_as_png(mask): def write_tf_record_dataset(output_path, annotation_iterator, process_func, num_shards, - use_multiprocessing=True, unpack_arguments=True): + multiple_processes=None, unpack_arguments=True): """Iterates over annotations, processes them and writes into TFRecords. Args: @@ -125,7 +128,10 @@ def write_tf_record_dataset(output_path, annotation_iterator, annotation_iterator as arguments and returns a tuple of (tf.train.Example, int). The integer indicates the number of annotations that were skipped. num_shards: int, the number of shards to write for the dataset. - use_multiprocessing: + multiple_processes: integer, the number of multiple parallel processes to + use. If None, uses multi-processing with number of processes equal to + `os.cpu_count()`, which is Python's default behavior. If set to 0, + multi-processing is disabled. Whether or not to use multiple processes to write TF Records. 
unpack_arguments: Whether to unpack the tuples from annotation_iterator as individual @@ -143,8 +149,9 @@ def write_tf_record_dataset(output_path, annotation_iterator, total_num_annotations_skipped = 0 - if use_multiprocessing: - pool = mp.Pool() + if multiple_processes is None or multiple_processes > 0: + pool = mp.Pool( + processes=multiple_processes) if unpack_arguments: tf_example_iterator = pool.starmap(process_func, annotation_iterator) else: @@ -157,13 +164,13 @@ def write_tf_record_dataset(output_path, annotation_iterator, for idx, (tf_example, num_annotations_skipped) in enumerate( tf_example_iterator): - if idx % 100 == 0: + if idx % LOG_EVERY == 0: logging.info('On image %d', idx) total_num_annotations_skipped += num_annotations_skipped writers[idx % num_shards].write(tf_example.SerializeToString()) - if use_multiprocessing: + if multiple_processes is None or multiple_processes > 0: pool.close() pool.join() diff --git a/official/vision/beta/data/tfrecord_lib_test.py b/official/vision/data/tfrecord_lib_test.py similarity index 93% rename from official/vision/beta/data/tfrecord_lib_test.py rename to official/vision/data/tfrecord_lib_test.py index b348d6243db70c13985d119faf066b935a269433..6825ff8cd4ab8895130bb435cc409777d55bc4cf 100644 --- a/official/vision/beta/data/tfrecord_lib_test.py +++ b/official/vision/data/tfrecord_lib_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -20,7 +20,7 @@ from absl import flags from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.data import tfrecord_lib +from official.vision.data import tfrecord_lib FLAGS = flags.FLAGS @@ -47,7 +47,7 @@ class TfrecordLibTest(parameterized.TestCase): path = os.path.join(FLAGS.test_tmpdir, 'train') tfrecord_lib.write_tf_record_dataset( - path, data, process_sample, 3, use_multiprocessing=False) + path, data, process_sample, 3, multiple_processes=0) tfrecord_files = tf.io.gfile.glob(path + '*') self.assertLen(tfrecord_files, 3) diff --git a/official/vision/dataloaders/__init__.py b/official/vision/dataloaders/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/vision/dataloaders/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/vision/dataloaders/classification_input.py b/official/vision/dataloaders/classification_input.py new file mode 100644 index 0000000000000000000000000000000000000000..e72c2c147c58d83329d49e47b718dda2024d8269 --- /dev/null +++ b/official/vision/dataloaders/classification_input.py @@ -0,0 +1,286 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Classification decoder and parser.""" +from typing import Any, Dict, List, Optional, Tuple +# Import libraries +import tensorflow as tf + +from official.vision.configs import common +from official.vision.dataloaders import decoder +from official.vision.dataloaders import parser +from official.vision.ops import augment +from official.vision.ops import preprocess_ops + +DEFAULT_IMAGE_FIELD_KEY = 'image/encoded' +DEFAULT_LABEL_FIELD_KEY = 'image/class/label' + + +class Decoder(decoder.Decoder): + """A tf.Example decoder for classification task.""" + + def __init__(self, + image_field_key: str = DEFAULT_IMAGE_FIELD_KEY, + label_field_key: str = DEFAULT_LABEL_FIELD_KEY, + is_multilabel: bool = False, + keys_to_features: Optional[Dict[str, Any]] = None): + if not keys_to_features: + keys_to_features = { + image_field_key: + tf.io.FixedLenFeature((), tf.string, default_value=''), + } + if is_multilabel: + keys_to_features.update( + {label_field_key: tf.io.VarLenFeature(dtype=tf.int64)}) + else: + keys_to_features.update({ + label_field_key: + tf.io.FixedLenFeature((), tf.int64, default_value=-1) + }) + self._keys_to_features = keys_to_features + + def decode(self, serialized_example): + return tf.io.parse_single_example(serialized_example, + self._keys_to_features) + + +class Parser(parser.Parser): + """Parser to parse an image and its annotations into a dictionary of tensors.""" + + def __init__(self, + output_size: List[int], + num_classes: float, + image_field_key: str = DEFAULT_IMAGE_FIELD_KEY, + label_field_key: str = DEFAULT_LABEL_FIELD_KEY, + 
decode_jpeg_only: bool = True, + aug_rand_hflip: bool = True, + aug_crop: Optional[bool] = True, + aug_type: Optional[common.Augmentation] = None, + color_jitter: float = 0., + random_erasing: Optional[common.RandomErasing] = None, + is_multilabel: bool = False, + dtype: str = 'float32', + crop_area_range: Optional[Tuple[float, float]] = (0.08, 1.0)): + """Initializes parameters for parsing annotations in the dataset. + + Args: + output_size: `Tensor` or `list` for [height, width] of output image. The + output_size should be divisible by the largest feature stride 2^max_level. + num_classes: `float`, number of classes. + image_field_key: `str`, the key name of the encoded image in tf.Example. + label_field_key: `str`, the key name of the label in tf.Example. + decode_jpeg_only: `bool`, if True, only JPEG format is decoded, which is + faster than decoding other types. Default is True. + aug_rand_hflip: `bool`, if True, augment training with random horizontal + flip. + aug_crop: `bool`, if True, perform random cropping during training and + center crop during validation. + aug_type: An optional Augmentation object to choose from AutoAugment and + RandAugment. + color_jitter: Magnitude of color jitter. If > 0, the value is used to + generate a random scale factor for brightness, contrast and saturation. + See `preprocess_ops.color_jitter` for more details. + random_erasing: if not None, augment input image by random erasing. See + `augment.RandomErasing` for more details. + is_multilabel: A `bool`, whether or not each example has multiple labels. + dtype: `str`, cast the output image to dtype. It can be 'float32', 'float16', + or 'bfloat16'. + crop_area_range: An optional `tuple` of (min_area, max_area) for the image + random crop function to constrain the crop operation. Each cropped area + must contain a fraction of the input image area within this + range. The default area range is (0.08, 1.0).
+ """ + self._output_size = output_size + self._aug_rand_hflip = aug_rand_hflip + self._aug_crop = aug_crop + self._num_classes = num_classes + self._image_field_key = image_field_key + if dtype == 'float32': + self._dtype = tf.float32 + elif dtype == 'float16': + self._dtype = tf.float16 + elif dtype == 'bfloat16': + self._dtype = tf.bfloat16 + else: + raise ValueError('dtype {!r} is not supported!'.format(dtype)) + if aug_type: + if aug_type.type == 'autoaug': + self._augmenter = augment.AutoAugment( + augmentation_name=aug_type.autoaug.augmentation_name, + cutout_const=aug_type.autoaug.cutout_const, + translate_const=aug_type.autoaug.translate_const) + elif aug_type.type == 'randaug': + self._augmenter = augment.RandAugment( + num_layers=aug_type.randaug.num_layers, + magnitude=aug_type.randaug.magnitude, + cutout_const=aug_type.randaug.cutout_const, + translate_const=aug_type.randaug.translate_const, + prob_to_apply=aug_type.randaug.prob_to_apply, + exclude_ops=aug_type.randaug.exclude_ops) + else: + raise ValueError('Augmentation policy {} not supported.'.format( + aug_type.type)) + else: + self._augmenter = None + self._label_field_key = label_field_key + self._color_jitter = color_jitter + if random_erasing: + self._random_erasing = augment.RandomErasing( + probability=random_erasing.probability, + min_area=random_erasing.min_area, + max_area=random_erasing.max_area, + min_aspect=random_erasing.min_aspect, + max_aspect=random_erasing.max_aspect, + min_count=random_erasing.min_count, + max_count=random_erasing.max_count, + trials=random_erasing.trials) + else: + self._random_erasing = None + self._is_multilabel = is_multilabel + self._decode_jpeg_only = decode_jpeg_only + self._crop_area_range = crop_area_range + + def _parse_train_data(self, decoded_tensors): + """Parses data for training.""" + image = self._parse_train_image(decoded_tensors) + label = tf.cast(decoded_tensors[self._label_field_key], dtype=tf.int32) + if self._is_multilabel: + if 
isinstance(label, tf.sparse.SparseTensor): + label = tf.sparse.to_dense(label) + label = tf.reduce_sum(tf.one_hot(label, self._num_classes), axis=0) + return image, label + + def _parse_eval_data(self, decoded_tensors): + """Parses data for evaluation.""" + image = self._parse_eval_image(decoded_tensors) + label = tf.cast(decoded_tensors[self._label_field_key], dtype=tf.int32) + if self._is_multilabel: + if isinstance(label, tf.sparse.SparseTensor): + label = tf.sparse.to_dense(label) + label = tf.reduce_sum(tf.one_hot(label, self._num_classes), axis=0) + return image, label + + def _parse_train_image(self, decoded_tensors): + """Parses image data for training.""" + image_bytes = decoded_tensors[self._image_field_key] + + if self._decode_jpeg_only and self._aug_crop: + image_shape = tf.image.extract_jpeg_shape(image_bytes) + + # Crops image. + cropped_image = preprocess_ops.random_crop_image_v2( + image_bytes, image_shape, area_range=self._crop_area_range) + image = tf.cond( + tf.reduce_all(tf.equal(tf.shape(cropped_image), image_shape)), + lambda: preprocess_ops.center_crop_image_v2(image_bytes, image_shape), + lambda: cropped_image) + else: + # Decodes image. + image = tf.io.decode_image(image_bytes, channels=3) + image.set_shape([None, None, 3]) + + # Crops image. + if self._aug_crop: + cropped_image = preprocess_ops.random_crop_image( + image, area_range=self._crop_area_range) + + image = tf.cond( + tf.reduce_all(tf.equal(tf.shape(cropped_image), tf.shape(image))), + lambda: preprocess_ops.center_crop_image(image), + lambda: cropped_image) + + if self._aug_rand_hflip: + image = tf.image.random_flip_left_right(image) + + # Color jitter. + if self._color_jitter > 0: + image = preprocess_ops.color_jitter(image, self._color_jitter, + self._color_jitter, + self._color_jitter) + + # Resizes image. 
+ image = tf.image.resize( + image, self._output_size, method=tf.image.ResizeMethod.BILINEAR) + image.set_shape([self._output_size[0], self._output_size[1], 3]) + + # Apply autoaug or randaug. + if self._augmenter is not None: + image = self._augmenter.distort(image) + + # Normalizes image with mean and std pixel values. + image = preprocess_ops.normalize_image( + image, offset=preprocess_ops.MEAN_RGB, scale=preprocess_ops.STDDEV_RGB) + + # Random erasing after the image has been normalized + if self._random_erasing is not None: + image = self._random_erasing.distort(image) + + # Convert image to self._dtype. + image = tf.image.convert_image_dtype(image, self._dtype) + + return image + + def _parse_eval_image(self, decoded_tensors): + """Parses image data for evaluation.""" + image_bytes = decoded_tensors[self._image_field_key] + + if self._decode_jpeg_only and self._aug_crop: + image_shape = tf.image.extract_jpeg_shape(image_bytes) + + # Center crops. + image = preprocess_ops.center_crop_image_v2(image_bytes, image_shape) + else: + # Decodes image. + image = tf.io.decode_image(image_bytes, channels=3) + image.set_shape([None, None, 3]) + + # Center crops. + if self._aug_crop: + image = preprocess_ops.center_crop_image(image) + + image = tf.image.resize( + image, self._output_size, method=tf.image.ResizeMethod.BILINEAR) + image.set_shape([self._output_size[0], self._output_size[1], 3]) + + # Normalizes image with mean and std pixel values. + image = preprocess_ops.normalize_image( + image, offset=preprocess_ops.MEAN_RGB, scale=preprocess_ops.STDDEV_RGB) + + # Convert image to self._dtype. 
+ image = tf.image.convert_image_dtype(image, self._dtype) + + return image + + def parse_train_image(self, decoded_tensors: Dict[str, + tf.Tensor]) -> tf.Tensor: + """Public interface for parsing image data for training.""" + return self._parse_train_image(decoded_tensors) + + @classmethod + def inference_fn(cls, + image: tf.Tensor, + input_image_size: List[int], + num_channels: int = 3) -> tf.Tensor: + """Builds image model inputs for serving.""" + + image = tf.cast(image, dtype=tf.float32) + image = preprocess_ops.center_crop_image(image) + image = tf.image.resize( + image, input_image_size, method=tf.image.ResizeMethod.BILINEAR) + + # Normalizes image with mean and std pixel values. + image = preprocess_ops.normalize_image( + image, offset=preprocess_ops.MEAN_RGB, scale=preprocess_ops.STDDEV_RGB) + image.set_shape(input_image_size + [num_channels]) + return image diff --git a/official/vision/beta/dataloaders/decoder.py b/official/vision/dataloaders/decoder.py similarity index 93% rename from official/vision/beta/dataloaders/decoder.py rename to official/vision/dataloaders/decoder.py index a5f691b95152ebebb46e53cd258459164c99fa26..821083f0f0993ae67518b9fcf45c34f73e552fac 100644 --- a/official/vision/beta/dataloaders/decoder.py +++ b/official/vision/dataloaders/decoder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/dataloaders/input_reader.py b/official/vision/dataloaders/input_reader.py new file mode 100644 index 0000000000000000000000000000000000000000..fba7dc2772fda09b482ad76f73437604541e379b --- /dev/null +++ b/official/vision/dataloaders/input_reader.py @@ -0,0 +1,178 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Dataset reader for vision model garden.""" + +from typing import Any, Callable, Optional, Tuple + +import tensorflow as tf + +from official.core import config_definitions as cfg +from official.core import input_reader + + +def calculate_batch_sizes(total_batch_size: int, + pseudo_label_ratio: float) -> Tuple[int, int]: + """Calculates labeled and pseudo-labeled dataset batch sizes. + + Returns (labeled_batch_size, pseudo_labeled_batch_size) given a + total batch size and pseudo-label data ratio. + + Args: + total_batch_size: The total batch size for all data. + pseudo_label_ratio: A non-negative float ratio of pseudo-labeled + to labeled data in a batch. + + Returns: + (labeled_batch_size, pseudo_labeled_batch_size) as ints. + + Raises: + ValueError: If total_batch_size is negative. + ValueError: If pseudo_label_ratio is negative. 
+ """ + if total_batch_size < 0: + raise ValueError('Invalid total_batch_size: {}'.format(total_batch_size)) + if pseudo_label_ratio < 0.0: + raise ValueError( + 'Invalid pseudo_label_ratio: {}'.format(pseudo_label_ratio)) + + ratio_factor = pseudo_label_ratio / (1.0 + pseudo_label_ratio) + pseudo_labeled_batch_size = int(round(total_batch_size * ratio_factor)) + labeled_batch_size = total_batch_size - pseudo_labeled_batch_size + return labeled_batch_size, pseudo_labeled_batch_size + + +class CombinationDatasetInputReader(input_reader.InputReader): + """Combination dataset input reader.""" + + def __init__(self, + params: cfg.DataConfig, + dataset_fn=tf.data.TFRecordDataset, + pseudo_label_dataset_fn=tf.data.TFRecordDataset, + decoder_fn: Optional[Callable[..., Any]] = None, + sample_fn: Optional[Callable[..., Any]] = None, + parser_fn: Optional[Callable[..., Any]] = None, + transform_and_batch_fn: Optional[Callable[ + [tf.data.Dataset, Optional[tf.distribute.InputContext]], + tf.data.Dataset]] = None, + postprocess_fn: Optional[Callable[..., Any]] = None): + """Initializes a CombinationDatasetInputReader instance. + + This class mixes a labeled dataset and a pseudo-labeled dataset. The params + must contain "pseudo_label_data.input_path" to specify the + pseudo-label dataset files and "pseudo_label_data.data_ratio" + to specify a per-batch mixing ratio of pseudo-label examples to + labeled dataset examples. + + Args: + params: A config_definitions.DataConfig object. + dataset_fn: A `tf.data.Dataset` that consumes the input files. For + example, it can be `tf.data.TFRecordDataset`. + pseudo_label_dataset_fn: A `tf.data.Dataset` that consumes the input + files. For example, it can be `tf.data.TFRecordDataset`. + decoder_fn: An optional `callable` that takes the serialized data string + and decodes it into the raw tensor dictionary. 
+ sample_fn: An optional `callable` that takes a `tf.data.Dataset` object as + input and outputs the transformed dataset.
It performs sampling on the + decoded raw tensors dict before the parser_fn. + parser_fn: An optional `callable` that takes the decoded raw tensors dict + and parses them into a dictionary of tensors that can be consumed by the + model. It will be executed after decoder_fn. + transform_and_batch_fn: An optional `callable` that takes a + `tf.data.Dataset` object and an optional `tf.distribute.InputContext` as + input, and returns a `tf.data.Dataset` object. It will be executed after + `parser_fn` to transform and batch the dataset; if None, after + `parser_fn` is executed, the dataset will be batched to the per-replica + batch size. + postprocess_fn: An optional `callable` that processes batched tensors. It + will be executed after batching. + + Raises: + ValueError: If drop_remainder is False. + """ + super().__init__(params=params, + dataset_fn=dataset_fn, + decoder_fn=decoder_fn, + sample_fn=sample_fn, + parser_fn=parser_fn, + transform_and_batch_fn=transform_and_batch_fn, + postprocess_fn=postprocess_fn) + + self._pseudo_label_file_pattern = params.pseudo_label_data.input_path + self._pseudo_label_dataset_fn = pseudo_label_dataset_fn + self._pseudo_label_data_ratio = params.pseudo_label_data.data_ratio + self._pseudo_label_matched_files = input_reader.match_files( + self._pseudo_label_file_pattern) + if not self._drop_remainder: + raise ValueError( + 'Must use drop_remainder=True with CombinationDatasetInputReader') + + def read( + self, + input_context: Optional[tf.distribute.InputContext] = None + ) -> tf.data.Dataset: + """Generates a tf.data.Dataset object.""" + + labeled_batch_size, pl_batch_size = calculate_batch_sizes( + self._global_batch_size, self._pseudo_label_data_ratio) + + if not labeled_batch_size and pl_batch_size: + raise ValueError( + 'Invalid batch_size: {} and pseudo_label_data_ratio: {}, ' + 'resulting in a 0 batch size for one of the datasets.'.format( + self._global_batch_size, self._pseudo_label_data_ratio)) + + def
_read_decode_and_parse_dataset(matched_files, dataset_fn, batch_size, + input_context, tfds_builder): + dataset = self._read_data_source(matched_files, dataset_fn, input_context, + tfds_builder) + return self._decode_and_parse_dataset(dataset, batch_size, input_context) + + labeled_dataset = _read_decode_and_parse_dataset( + matched_files=self._matched_files, + dataset_fn=self._dataset_fn, + batch_size=labeled_batch_size, + input_context=input_context, + tfds_builder=self._tfds_builder) + + pseudo_labeled_dataset = _read_decode_and_parse_dataset( + matched_files=self._pseudo_label_matched_files, + dataset_fn=self._pseudo_label_dataset_fn, + batch_size=pl_batch_size, + input_context=input_context, + tfds_builder=False) + + def concat_fn(d1, d2): + return tf.nest.map_structure( + lambda x1, x2: tf.concat([x1, x2], axis=0), d1, d2) + + dataset_concat = tf.data.Dataset.zip( + (labeled_dataset, pseudo_labeled_dataset)) + dataset_concat = dataset_concat.map( + concat_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE) + + def maybe_map_fn(dataset, fn): + return dataset if fn is None else dataset.map( + fn, num_parallel_calls=tf.data.experimental.AUTOTUNE) + + dataset_concat = maybe_map_fn(dataset_concat, self._postprocess_fn) + dataset_concat = self._maybe_apply_data_service(dataset_concat, + input_context) + + if self._deterministic is not None: + options = tf.data.Options() + options.experimental_deterministic = self._deterministic + dataset_concat = dataset_concat.with_options(options) + + return dataset_concat.prefetch(tf.data.experimental.AUTOTUNE) diff --git a/official/vision/beta/dataloaders/input_reader_factory.py b/official/vision/dataloaders/input_reader_factory.py similarity index 90% rename from official/vision/beta/dataloaders/input_reader_factory.py rename to official/vision/dataloaders/input_reader_factory.py index ffe8ae778cc4ff2dc9d01b4e0d03dcaf7622c1d2..6af7c1b14912549e61ec42bba67b54b1f1de113f 100644 --- 
a/official/vision/beta/dataloaders/input_reader_factory.py +++ b/official/vision/dataloaders/input_reader_factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,14 +12,13 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Factory for getting TF-Vision input readers.""" from official.common import dataset_fn as dataset_fn_util from official.core import config_definitions as cfg from official.core import input_reader as core_input_reader -from official.vision.beta.dataloaders import input_reader as vision_input_reader +from official.vision.dataloaders import input_reader as vision_input_reader def input_reader_generator(params: cfg.DataConfig, diff --git a/official/vision/beta/dataloaders/maskrcnn_input.py b/official/vision/dataloaders/maskrcnn_input.py similarity index 89% rename from official/vision/beta/dataloaders/maskrcnn_input.py rename to official/vision/dataloaders/maskrcnn_input.py index cdaa9e9c8dc6f99d9e76dec448ab2e101701cab1..6163180eb5dab60a356f099dcd94a9cc7943018f 100644 --- a/official/vision/beta/dataloaders/maskrcnn_input.py +++ b/official/vision/dataloaders/maskrcnn_input.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -14,15 +14,18 @@ """Data parser and processing for Mask R-CNN.""" -# Import libraries +from typing import Optional +# Import libraries import tensorflow as tf -from official.vision.beta.dataloaders import parser -from official.vision.beta.dataloaders import utils -from official.vision.beta.ops import anchor -from official.vision.beta.ops import box_ops -from official.vision.beta.ops import preprocess_ops +from official.vision.configs import common +from official.vision.dataloaders import parser +from official.vision.dataloaders import utils +from official.vision.ops import anchor +from official.vision.ops import augment +from official.vision.ops import box_ops +from official.vision.ops import preprocess_ops class Parser(parser.Parser): @@ -42,6 +45,7 @@ class Parser(parser.Parser): aug_rand_hflip=False, aug_scale_min=1.0, aug_scale_max=1.0, + aug_type: Optional[common.Augmentation] = None, skip_crowd_during_training=True, max_num_instances=100, include_mask=False, @@ -73,6 +77,9 @@ class Parser(parser.Parser): data augmentation during training. aug_scale_max: `float`, the maximum scale applied to `output_size` for data augmentation during training. + aug_type: An optional Augmentation object with params for AutoAugment. + The AutoAug policy should not use rotation/translation/shear. + Only in-place augmentations can be used. skip_crowd_during_training: `bool`, if True, skip annotations labeled with `is_crowd` equals to 1. 
max_num_instances: `int` number of maximum number of instances in an @@ -104,6 +111,26 @@ class Parser(parser.Parser): self._aug_scale_min = aug_scale_min self._aug_scale_max = aug_scale_max + if aug_type and aug_type.type: + if aug_type.type == 'autoaug': + self._augmenter = augment.AutoAugment( + augmentation_name=aug_type.autoaug.augmentation_name, + cutout_const=aug_type.autoaug.cutout_const, + translate_const=aug_type.autoaug.translate_const) + elif aug_type.type == 'randaug': + self._augmenter = augment.RandAugment( + num_layers=aug_type.randaug.num_layers, + magnitude=aug_type.randaug.magnitude, + cutout_const=aug_type.randaug.cutout_const, + translate_const=aug_type.randaug.translate_const, + prob_to_apply=aug_type.randaug.prob_to_apply, + exclude_ops=aug_type.randaug.exclude_ops) + else: + raise ValueError('Augmentation policy {} not supported.'.format( + aug_type.type)) + else: + self._augmenter = None + # Mask. self._include_mask = include_mask self._mask_crop_size = mask_crop_size @@ -167,6 +194,9 @@ class Parser(parser.Parser): # Gets original image and its size. image = data['image'] + if self._augmenter is not None: + image = self._augmenter.distort(image) + image_shape = tf.shape(image)[0:2] # Normalizes image with mean and std pixel values. diff --git a/official/vision/beta/dataloaders/parser.py b/official/vision/dataloaders/parser.py similarity index 97% rename from official/vision/beta/dataloaders/parser.py rename to official/vision/dataloaders/parser.py index 104b52b8c1b5c6187d1e2a52458a275082825bd0..2a415cb018a0dc2b16763abd78e9b530fd766301 100644 --- a/official/vision/beta/dataloaders/parser.py +++ b/official/vision/dataloaders/parser.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
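The batch split computed by `calculate_batch_sizes` in the new `official/vision/dataloaders/input_reader.py` is easiest to follow with concrete numbers. The sketch below copies that function's arithmetic into a standalone script (no TensorFlow or tf-models install needed) so the labeled/pseudo-labeled split can be checked by hand; it is an illustration only and not part of the patch.

```python
from typing import Tuple


def calculate_batch_sizes(total_batch_size: int,
                          pseudo_label_ratio: float) -> Tuple[int, int]:
  """Returns (labeled_batch_size, pseudo_labeled_batch_size)."""
  if total_batch_size < 0:
    raise ValueError('Invalid total_batch_size: {}'.format(total_batch_size))
  if pseudo_label_ratio < 0.0:
    raise ValueError(
        'Invalid pseudo_label_ratio: {}'.format(pseudo_label_ratio))

  # A ratio r of pseudo-labeled to labeled examples gives the pseudo-labeled
  # data a share of r / (1 + r) of the batch, e.g. r = 3 -> 3/4.
  ratio_factor = pseudo_label_ratio / (1.0 + pseudo_label_ratio)
  # Only the pseudo-labeled size is rounded; the labeled size takes the rest,
  # so the two sizes always sum to total_batch_size.
  pseudo_labeled_batch_size = int(round(total_batch_size * ratio_factor))
  labeled_batch_size = total_batch_size - pseudo_labeled_batch_size
  return labeled_batch_size, pseudo_labeled_batch_size


if __name__ == '__main__':
  print(calculate_batch_sizes(64, 1.0))  # (32, 32): equal mix
  print(calculate_batch_sizes(64, 3.0))  # (16, 48): 3 pseudo per labeled
  print(calculate_batch_sizes(10, 0.5))  # (7, 3): 3.33 rounds to 3
  # Python's round() rounds halves to even, so 0.5 becomes 0 here.
  print(calculate_batch_sizes(1, 1.0))   # (1, 0)
```

Because only the pseudo-labeled share is rounded, the two sizes always add up to the global batch size; with very small global batches, one side can round down to zero, as in the last case.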
diff --git a/official/vision/beta/dataloaders/retinanet_input.py b/official/vision/dataloaders/retinanet_input.py similarity index 93% rename from official/vision/beta/dataloaders/retinanet_input.py rename to official/vision/dataloaders/retinanet_input.py index 846a0137593ee80d57812c569affa20e6cf65253..c3c4f75284ef72365bfe7b09c6dffa15cc194c9d 100644 --- a/official/vision/beta/dataloaders/retinanet_input.py +++ b/official/vision/dataloaders/retinanet_input.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -22,12 +22,12 @@ into (image, labels) tuple for RetinaNet. from absl import logging import tensorflow as tf -from official.vision.beta.dataloaders import parser -from official.vision.beta.dataloaders import utils -from official.vision.beta.ops import anchor -from official.vision.beta.ops import augment -from official.vision.beta.ops import box_ops -from official.vision.beta.ops import preprocess_ops +from official.vision.dataloaders import parser +from official.vision.dataloaders import utils +from official.vision.ops import anchor +from official.vision.ops import augment +from official.vision.ops import box_ops +from official.vision.ops import preprocess_ops class Parser(parser.Parser): @@ -75,7 +75,7 @@ class Parser(parser.Parser): upper-bound threshold to assign negative labels for anchors. An anchor with a score below the threshold is labeled negative. aug_type: An optional Augmentation object to choose from AutoAugment and - RandAugment. The latter is not supported, and will raise ValueError. + RandAugment. aug_rand_hflip: `bool`, if True, augment training with random horizontal flip. 
aug_scale_min: `float`, the minimum scale applied to `output_size` for @@ -122,8 +122,16 @@ class Parser(parser.Parser): augmentation_name=aug_type.autoaug.augmentation_name, cutout_const=aug_type.autoaug.cutout_const, translate_const=aug_type.autoaug.translate_const) + elif aug_type.type == 'randaug': + logging.info('Using RandAugment.') + self._augmenter = augment.RandAugment.build_for_detection( + num_layers=aug_type.randaug.num_layers, + magnitude=aug_type.randaug.magnitude, + cutout_const=aug_type.randaug.cutout_const, + translate_const=aug_type.randaug.translate_const, + prob_to_apply=aug_type.randaug.prob_to_apply, + exclude_ops=aug_type.randaug.exclude_ops) else: - # TODO(b/205346436) Support RandAugment. raise ValueError(f'Augmentation policy {aug_type.type} not supported.') # Deprecated. Data Augmentation with AutoAugment. @@ -162,7 +170,6 @@ class Parser(parser.Parser): # Apply autoaug or randaug. if self._augmenter is not None: image, boxes = self._augmenter.distort_with_boxes(image, boxes) - image_shape = tf.shape(input=image)[0:2] # Normalizes image with mean and std pixel values. diff --git a/official/vision/beta/dataloaders/segmentation_input.py b/official/vision/dataloaders/segmentation_input.py similarity index 75% rename from official/vision/beta/dataloaders/segmentation_input.py rename to official/vision/dataloaders/segmentation_input.py index 440f555ad670134e568422bbcb50951e1a2b7ab4..ca6820b23cf40f61bafec1019754560b824ab11a 100644 --- a/official/vision/beta/dataloaders/segmentation_input.py +++ b/official/vision/dataloaders/segmentation_input.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -15,9 +15,10 @@ """Data parser and processing for segmentation datasets.""" import tensorflow as tf -from official.vision.beta.dataloaders import decoder -from official.vision.beta.dataloaders import parser -from official.vision.beta.ops import preprocess_ops +from official.vision.dataloaders import decoder +from official.vision.dataloaders import parser +from official.vision.dataloaders import utils +from official.vision.ops import preprocess_ops class Decoder(decoder.Decoder): @@ -25,26 +26,29 @@ class Decoder(decoder.Decoder): def __init__(self): self._keys_to_features = { - 'image/encoded': tf.io.FixedLenFeature((), tf.string, default_value=''), - 'image/height': tf.io.FixedLenFeature((), tf.int64, default_value=0), - 'image/width': tf.io.FixedLenFeature((), tf.int64, default_value=0), + 'image/encoded': + tf.io.FixedLenFeature((), tf.string, default_value=''), + 'image/height': + tf.io.FixedLenFeature((), tf.int64, default_value=0), + 'image/width': + tf.io.FixedLenFeature((), tf.int64, default_value=0), 'image/segmentation/class/encoded': tf.io.FixedLenFeature((), tf.string, default_value='') } def decode(self, serialized_example): - return tf.io.parse_single_example( - serialized_example, self._keys_to_features) + return tf.io.parse_single_example(serialized_example, + self._keys_to_features) class Parser(parser.Parser): - """Parser to parse an image and its annotations into a dictionary of tensors. - """ + """Parser to parse an image and its annotations into a dictionary of tensors.""" def __init__(self, output_size, crop_size=None, resize_eval_groundtruth=True, + gt_is_matting_map=False, groundtruth_padded_size=None, ignore_label=255, aug_rand_hflip=False, @@ -63,13 +67,16 @@ class Parser(parser.Parser): original image sizes. resize_eval_groundtruth: `bool`, if True, eval groundtruth masks are resized to output_size. + gt_is_matting_map: `bool`, if True, the expected mask is in the range + between 0 and 255. 
The parser will normalize the value of the mask into + the range between 0 and 1. groundtruth_padded_size: `Tensor` or `list` for [height, width]. When resize_eval_groundtruth is set to False, the groundtruth masks are padded to this size. ignore_label: `int` the pixel with ignore label will not used for training and evaluation. - aug_rand_hflip: `bool`, if True, augment training with random - horizontal flip. + aug_rand_hflip: `bool`, if True, augment training with random horizontal + flip. preserve_aspect_ratio: `bool`, if True, the aspect ratio is preserved, otherwise, the image is resized to output_size. aug_scale_min: `float`, the minimum scale applied to `output_size` for @@ -84,6 +91,7 @@ class Parser(parser.Parser): if (not resize_eval_groundtruth) and (groundtruth_padded_size is None): raise ValueError('groundtruth_padded_size ([height, width]) needs to be' 'specified when resize_eval_groundtruth is False.') + self._gt_is_matting_map = gt_is_matting_map self._groundtruth_padded_size = groundtruth_padded_size self._ignore_label = ignore_label self._preserve_aspect_ratio = preserve_aspect_ratio @@ -99,8 +107,8 @@ class Parser(parser.Parser): def _prepare_image_and_label(self, data): """Prepare normalized image and label.""" image = tf.io.decode_image(data['image/encoded'], channels=3) - label = tf.io.decode_image(data['image/segmentation/class/encoded'], - channels=1) + label = tf.io.decode_image( + data['image/segmentation/class/encoded'], channels=1) height = data['image/height'] width = data['image/width'] image = tf.reshape(image, (height, width, 3)) @@ -122,6 +130,16 @@ class Parser(parser.Parser): """Parses data for training and evaluation.""" image, label = self._prepare_image_and_label(data) + # Normalize the label into the range of 0 and 1 for matting groundtruth. + # Note that the input groundtruth labels must be 0 to 255, and do not + # contain ignore_label. For gt_is_matting_map case, ignore_label is only + # used for padding the labels. 
+ if self._gt_is_matting_map: + scale = tf.constant(255.0, dtype=tf.float32) + scale = tf.expand_dims(scale, axis=0) + scale = tf.expand_dims(scale, axis=0) + label = tf.cast(label, tf.float32) / scale + if self._crop_size: label = tf.reshape(label, [data['image/height'], data['image/width'], 1]) @@ -132,8 +150,7 @@ class Parser(parser.Parser): label = tf.image.resize(label, self._output_size, method='nearest') image_mask = tf.concat([image, label], axis=2) - image_mask_crop = tf.image.random_crop(image_mask, - self._crop_size + [4]) + image_mask_crop = tf.image.random_crop(image_mask, self._crop_size + [4]) image = image_mask_crop[:, :, :-1] label = tf.reshape(image_mask_crop[:, :, -1], [1] + self._crop_size) @@ -159,13 +176,14 @@ class Parser(parser.Parser): # The label is first offset by +1 and then padded with 0. label += 1 label = tf.expand_dims(label, axis=3) - label = preprocess_ops.resize_and_crop_masks( - label, image_scale, train_image_size, offset) + label = preprocess_ops.resize_and_crop_masks(label, image_scale, + train_image_size, offset) label -= 1 - label = tf.where(tf.equal(label, -1), - self._ignore_label * tf.ones_like(label), label) + label = tf.where( + tf.equal(label, -1), self._ignore_label * tf.ones_like(label), label) label = tf.squeeze(label, axis=0) valid_mask = tf.not_equal(label, self._ignore_label) + labels = { 'masks': label, 'valid_masks': valid_mask, @@ -180,6 +198,12 @@ class Parser(parser.Parser): def _parse_eval_data(self, data): """Parses data for training and evaluation.""" image, label = self._prepare_image_and_label(data) + + # Binarize mask if groundtruth is a matting map + if self._gt_is_matting_map: + label = tf.divide(tf.cast(label, dtype=tf.float32), 255.0) + label = utils.binarize_matting_map(label) + # The label is first offset by +1 and then padded with 0. 
label += 1 label = tf.expand_dims(label, axis=3) @@ -196,13 +220,13 @@ class Parser(parser.Parser): label = preprocess_ops.resize_and_crop_masks(label, image_scale, self._output_size, offset) else: - label = tf.image.pad_to_bounding_box( - label, 0, 0, self._groundtruth_padded_size[0], - self._groundtruth_padded_size[1]) + label = tf.image.pad_to_bounding_box(label, 0, 0, + self._groundtruth_padded_size[0], + self._groundtruth_padded_size[1]) label -= 1 - label = tf.where(tf.equal(label, -1), - self._ignore_label * tf.ones_like(label), label) + label = tf.where( + tf.equal(label, -1), self._ignore_label * tf.ones_like(label), label) label = tf.squeeze(label, axis=0) valid_mask = tf.not_equal(label, self._ignore_label) diff --git a/official/vision/dataloaders/tf_example_decoder.py b/official/vision/dataloaders/tf_example_decoder.py new file mode 100644 index 0000000000000000000000000000000000000000..e7082cf937d118410dabffb833456f10916f6861 --- /dev/null +++ b/official/vision/dataloaders/tf_example_decoder.py @@ -0,0 +1,188 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tensorflow Example proto decoder for object detection. + +A decoder to decode string tensors containing serialized tensorflow.Example +protos for object detection. 
+""" +import tensorflow as tf + +from official.vision.dataloaders import decoder + + +def _generate_source_id(image_bytes): + # Hashing using 22 bits since float32 has only 23 mantissa bits. + return tf.strings.as_string( + tf.strings.to_hash_bucket_fast(image_bytes, 2 ** 22 - 1)) + + +class TfExampleDecoder(decoder.Decoder): + """Tensorflow Example proto decoder.""" + + def __init__(self, + include_mask=False, + regenerate_source_id=False, + mask_binarize_threshold=None): + self._include_mask = include_mask + self._regenerate_source_id = regenerate_source_id + self._keys_to_features = { + 'image/encoded': tf.io.FixedLenFeature((), tf.string), + 'image/height': tf.io.FixedLenFeature((), tf.int64, -1), + 'image/width': tf.io.FixedLenFeature((), tf.int64, -1), + 'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32), + 'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32), + 'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32), + 'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32), + 'image/object/class/label': tf.io.VarLenFeature(tf.int64), + 'image/object/area': tf.io.VarLenFeature(tf.float32), + 'image/object/is_crowd': tf.io.VarLenFeature(tf.int64), + } + self._mask_binarize_threshold = mask_binarize_threshold + if include_mask: + self._keys_to_features.update({ + 'image/object/mask': tf.io.VarLenFeature(tf.string), + }) + if not regenerate_source_id: + self._keys_to_features.update({ + 'image/source_id': tf.io.FixedLenFeature((), tf.string), + }) + + def _decode_image(self, parsed_tensors): + """Decodes the image and sets its static shape.""" + image = tf.io.decode_image(parsed_tensors['image/encoded'], channels=3) + image.set_shape([None, None, 3]) + return image + + def _decode_boxes(self, parsed_tensors): + """Concatenates box coordinates in the format of [ymin, xmin, ymax, xmax].""" + xmin = parsed_tensors['image/object/bbox/xmin'] + xmax = parsed_tensors['image/object/bbox/xmax'] + ymin = parsed_tensors['image/object/bbox/ymin'] + ymax =
parsed_tensors['image/object/bbox/ymax'] + return tf.stack([ymin, xmin, ymax, xmax], axis=-1) + + def _decode_classes(self, parsed_tensors): + return parsed_tensors['image/object/class/label'] + + def _decode_areas(self, parsed_tensors): + xmin = parsed_tensors['image/object/bbox/xmin'] + xmax = parsed_tensors['image/object/bbox/xmax'] + ymin = parsed_tensors['image/object/bbox/ymin'] + ymax = parsed_tensors['image/object/bbox/ymax'] + height = tf.cast(parsed_tensors['image/height'], dtype=tf.float32) + width = tf.cast(parsed_tensors['image/width'], dtype=tf.float32) + return tf.cond( + tf.greater(tf.shape(parsed_tensors['image/object/area'])[0], 0), + lambda: parsed_tensors['image/object/area'], + lambda: (xmax - xmin) * (ymax - ymin) * height * width) + + def _decode_masks(self, parsed_tensors): + """Decodes a set of PNG masks into tf.float32 tensors.""" + + def _decode_png_mask(png_bytes): + mask = tf.squeeze( + tf.io.decode_png(png_bytes, channels=1, dtype=tf.uint8), axis=-1) + mask = tf.cast(mask, dtype=tf.float32) + mask.set_shape([None, None]) + return mask + + height = parsed_tensors['image/height'] + width = parsed_tensors['image/width'] + masks = parsed_tensors['image/object/mask'] + return tf.cond( + pred=tf.greater(tf.size(input=masks), 0), + true_fn=lambda: tf.map_fn(_decode_png_mask, masks, dtype=tf.float32), + false_fn=lambda: tf.zeros([0, height, width], dtype=tf.float32)) + + def decode(self, serialized_example): + """Decodes the serialized example. + + Args: + serialized_example: a single serialized tf.Example string. + + Returns: + decoded_tensors: a dictionary of tensors with the following fields: + - source_id: a string scalar tensor. + - image: a uint8 tensor of shape [None, None, 3]. + - height: an integer scalar tensor. + - width: an integer scalar tensor. + - groundtruth_classes: an int64 tensor of shape [None]. + - groundtruth_is_crowd: a bool tensor of shape [None]. 
+ - groundtruth_area: a float32 tensor of shape [None].
+ - groundtruth_boxes: a float32 tensor of shape [None, 4]. + - groundtruth_instance_masks: a float32 tensor of shape + [None, None, None]. + - groundtruth_instance_masks_png: a string tensor of shape [None]. + """ + parsed_tensors = tf.io.parse_single_example( + serialized=serialized_example, features=self._keys_to_features) + for k in parsed_tensors: + if isinstance(parsed_tensors[k], tf.SparseTensor): + if parsed_tensors[k].dtype == tf.string: + parsed_tensors[k] = tf.sparse.to_dense( + parsed_tensors[k], default_value='') + else: + parsed_tensors[k] = tf.sparse.to_dense( + parsed_tensors[k], default_value=0) + + if self._regenerate_source_id: + source_id = _generate_source_id(parsed_tensors['image/encoded']) + else: + source_id = tf.cond( + tf.greater(tf.strings.length(parsed_tensors['image/source_id']), 0), + lambda: parsed_tensors['image/source_id'], + lambda: _generate_source_id(parsed_tensors['image/encoded'])) + image = self._decode_image(parsed_tensors) + boxes = self._decode_boxes(parsed_tensors) + classes = self._decode_classes(parsed_tensors) + areas = self._decode_areas(parsed_tensors) + + decode_image_shape = tf.logical_or( + tf.equal(parsed_tensors['image/height'], -1), + tf.equal(parsed_tensors['image/width'], -1)) + image_shape = tf.cast(tf.shape(image), dtype=tf.int64) + + parsed_tensors['image/height'] = tf.where(decode_image_shape, + image_shape[0], + parsed_tensors['image/height']) + parsed_tensors['image/width'] = tf.where(decode_image_shape, image_shape[1], + parsed_tensors['image/width']) + + is_crowds = tf.cond( + tf.greater(tf.shape(parsed_tensors['image/object/is_crowd'])[0], 0), + lambda: tf.cast(parsed_tensors['image/object/is_crowd'], dtype=tf.bool), + lambda: tf.zeros_like(classes, dtype=tf.bool)) + if self._include_mask: + masks = self._decode_masks(parsed_tensors) + + if self._mask_binarize_threshold is not None: + masks = tf.cast(masks > self._mask_binarize_threshold, tf.float32) + + decoded_tensors = { + 'source_id': source_id, + 
'image': image, + 'height': parsed_tensors['image/height'], + 'width': parsed_tensors['image/width'], + 'groundtruth_classes': classes, + 'groundtruth_is_crowd': is_crowds, + 'groundtruth_area': areas, + 'groundtruth_boxes': boxes, + } + if self._include_mask: + decoded_tensors.update({ + 'groundtruth_instance_masks': masks, + 'groundtruth_instance_masks_png': parsed_tensors['image/object/mask'], + }) + return decoded_tensors diff --git a/official/vision/beta/dataloaders/tf_example_decoder_test.py b/official/vision/dataloaders/tf_example_decoder_test.py similarity index 92% rename from official/vision/beta/dataloaders/tf_example_decoder_test.py rename to official/vision/dataloaders/tf_example_decoder_test.py index 4a8ec93ca76f048c13474f0d0226704bab7093a4..8b66e752a97b4475f5d6cde275f46522527a12f6 100644 --- a/official/vision/beta/dataloaders/tf_example_decoder_test.py +++ b/official/vision/dataloaders/tf_example_decoder_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -19,25 +19,28 @@ from absl.testing import parameterized import numpy as np import tensorflow as tf -from official.vision.beta.dataloaders import tf_example_decoder -from official.vision.beta.dataloaders import tfexample_utils +from official.vision.dataloaders import tf_example_decoder +from official.vision.dataloaders import tfexample_utils class TfExampleDecoderTest(tf.test.TestCase, parameterized.TestCase): @parameterized.parameters( - (100, 100, 0, True), - (100, 100, 1, True), - (100, 100, 2, True), - (100, 100, 0, False), - (100, 100, 1, False), - (100, 100, 2, False), + (100, 100, 0, True, True), + (100, 100, 1, True, True), + (100, 100, 2, True, True), + (100, 100, 0, False, True), + (100, 100, 1, False, True), + (100, 100, 2, False, True), + (100, 100, 0, True, False), + (100, 100, 1, True, False), + (100, 100, 2, True, False), + (100, 100, 0, False, False), + (100, 100, 1, False, False), + (100, 100, 2, False, False), ) - def test_result_shape(self, - image_height, - image_width, - num_instances, - regenerate_source_id): + def test_result_shape(self, image_height, image_width, num_instances, + regenerate_source_id, fill_image_size): decoder = tf_example_decoder.TfExampleDecoder( include_mask=True, regenerate_source_id=regenerate_source_id) @@ -45,7 +48,9 @@ class TfExampleDecoderTest(tf.test.TestCase, parameterized.TestCase): image_height=image_height, image_width=image_width, image_channel=3, - num_instances=num_instances).SerializeToString() + num_instances=num_instances, + fill_image_size=fill_image_size, + ).SerializeToString() decoded_tensors = decoder.decode( tf.convert_to_tensor(value=serialized_example)) diff --git a/official/vision/beta/dataloaders/tf_example_label_map_decoder.py b/official/vision/dataloaders/tf_example_label_map_decoder.py similarity index 95% rename from official/vision/beta/dataloaders/tf_example_label_map_decoder.py rename to official/vision/dataloaders/tf_example_label_map_decoder.py index 
14ebd2f831fd906e9d324f9b344750a2629384b5..a2f04477b1915883eaceb32a08dfa07095ca6d43 100644 --- a/official/vision/beta/dataloaders/tf_example_label_map_decoder.py +++ b/official/vision/dataloaders/tf_example_label_map_decoder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -21,7 +21,7 @@ import csv # Import libraries import tensorflow as tf -from official.vision.beta.dataloaders import tf_example_decoder +from official.vision.dataloaders import tf_example_decoder class TfExampleDecoderLabelMap(tf_example_decoder.TfExampleDecoder): diff --git a/official/vision/beta/dataloaders/tf_example_label_map_decoder_test.py b/official/vision/dataloaders/tf_example_label_map_decoder_test.py similarity index 97% rename from official/vision/beta/dataloaders/tf_example_label_map_decoder_test.py rename to official/vision/dataloaders/tf_example_label_map_decoder_test.py index 425ad9f77d37eff2914666a5f3acabef2b16d9f8..3ff9a8b3c14c70ea2289e9cf6783a02f93db5b72 100644 --- a/official/vision/beta/dataloaders/tf_example_label_map_decoder_test.py +++ b/official/vision/dataloaders/tf_example_label_map_decoder_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -20,8 +20,8 @@ from absl.testing import parameterized import numpy as np import tensorflow as tf -from official.vision.beta.dataloaders import tf_example_label_map_decoder -from official.vision.beta.dataloaders import tfexample_utils +from official.vision.dataloaders import tf_example_label_map_decoder +from official.vision.dataloaders import tfexample_utils LABEL_MAP_CSV_CONTENT = '0,class_0\n1,class_1\n2,class_2' diff --git a/official/vision/beta/dataloaders/tfds_classification_decoders.py b/official/vision/dataloaders/tfds_classification_decoders.py similarity index 90% rename from official/vision/beta/dataloaders/tfds_classification_decoders.py rename to official/vision/dataloaders/tfds_classification_decoders.py index 36f6e28f734a41509944caa1dbfd09911f1a3acc..a27cdf5f3ce29d7766f75c7565e069acc680586a 100644 --- a/official/vision/beta/dataloaders/tfds_classification_decoders.py +++ b/official/vision/dataloaders/tfds_classification_decoders.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -15,7 +15,7 @@ """TFDS Classification decoders.""" import tensorflow as tf -from official.vision.beta.dataloaders import decoder +from official.vision.dataloaders import decoder class ClassificationDecorder(decoder.Decoder): diff --git a/official/vision/beta/dataloaders/tfds_detection_decoders.py b/official/vision/dataloaders/tfds_detection_decoders.py similarity index 94% rename from official/vision/beta/dataloaders/tfds_detection_decoders.py rename to official/vision/dataloaders/tfds_detection_decoders.py index fef7d2f24ef42d2f6f33c29ea35e516a71bbd345..4c270128a1215731c6135abd434d2778eda3442b 100644 --- a/official/vision/beta/dataloaders/tfds_detection_decoders.py +++ b/official/vision/dataloaders/tfds_detection_decoders.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -15,7 +15,7 @@ """TFDS detection decoders.""" import tensorflow as tf -from official.vision.beta.dataloaders import decoder +from official.vision.dataloaders import decoder class MSCOCODecoder(decoder.Decoder): diff --git a/official/vision/beta/dataloaders/tfds_factory.py b/official/vision/dataloaders/tfds_factory.py similarity index 86% rename from official/vision/beta/dataloaders/tfds_factory.py rename to official/vision/dataloaders/tfds_factory.py index 67190df8f6e621bd0052aa767bc698643572a4bb..8f4c877043c8e994cf20ef9113e451cb234e66b4 100644 --- a/official/vision/beta/dataloaders/tfds_factory.py +++ b/official/vision/dataloaders/tfds_factory.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -13,10 +13,10 @@ # limitations under the License. """TFDS factory functions.""" -from official.vision.beta.dataloaders import decoder as base_decoder -from official.vision.beta.dataloaders import tfds_detection_decoders -from official.vision.beta.dataloaders import tfds_segmentation_decoders -from official.vision.beta.dataloaders import tfds_classification_decoders +from official.vision.dataloaders import decoder as base_decoder +from official.vision.dataloaders import tfds_detection_decoders +from official.vision.dataloaders import tfds_segmentation_decoders +from official.vision.dataloaders import tfds_classification_decoders def get_classification_decoder(tfds_name: str) -> base_decoder.Decoder: diff --git a/official/vision/beta/dataloaders/tfds_factory_test.py b/official/vision/dataloaders/tfds_factory_test.py similarity index 95% rename from official/vision/beta/dataloaders/tfds_factory_test.py rename to official/vision/dataloaders/tfds_factory_test.py index 5c22f46c03b04bf5334fe8670054bc3be81ae88b..c9397a9724da05a8302237aec6d84102f97290b3 100644 --- a/official/vision/beta/dataloaders/tfds_factory_test.py +++ b/official/vision/dataloaders/tfds_factory_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -17,8 +17,8 @@ from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.dataloaders import decoder as base_decoder -from official.vision.beta.dataloaders import tfds_factory +from official.vision.dataloaders import decoder as base_decoder +from official.vision.dataloaders import tfds_factory class TFDSFactoryTest(tf.test.TestCase, parameterized.TestCase): diff --git a/official/vision/beta/dataloaders/tfds_segmentation_decoders.py b/official/vision/dataloaders/tfds_segmentation_decoders.py similarity index 95% rename from official/vision/beta/dataloaders/tfds_segmentation_decoders.py rename to official/vision/dataloaders/tfds_segmentation_decoders.py index 4b6985fcdbda28282821e3952ca3661bbaf096b4..9613d0f00ff0ae69631df5f038e1576d0706cc55 100644 --- a/official/vision/beta/dataloaders/tfds_segmentation_decoders.py +++ b/official/vision/dataloaders/tfds_segmentation_decoders.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -15,7 +15,7 @@ """TFDS Semantic Segmentation decoders.""" import tensorflow as tf -from official.vision.beta.dataloaders import decoder +from official.vision.dataloaders import decoder class CityScapesDecorder(decoder.Decoder): diff --git a/official/vision/dataloaders/tfexample_utils.py b/official/vision/dataloaders/tfexample_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..11c2baee3ba2f1c6a3e565976d5d941f885c761d --- /dev/null +++ b/official/vision/dataloaders/tfexample_utils.py @@ -0,0 +1,310 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Utility functions to create tf.Example and tf.SequenceExample for test. + +Example: video classification end-to-end test +i.e. from reading input file to train and eval. + +```python +class FooTrainTest(tf.test.TestCase): + + def setUp(self): + super(FooTrainTest, self).setUp() + + # Write the fake tf.train.SequenceExample to file for test. + data_dir = os.path.join(self.get_temp_dir(), 'data') + tf.io.gfile.makedirs(data_dir) + self._data_path = os.path.join(data_dir, 'data.tfrecord') + examples = [ + tfexample_utils.make_video_test_example( + image_shape=(36, 36, 3), + audio_shape=(20, 128), + label=random.randint(0, 100)) for _ in range(2) + ] + tfexample_utils.dump_to_tfrecord(self._data_path, tf_examples=examples) + + def test_foo(self): + dataset = tf.data.TFRecordDataset(self._data_path) + ...
+ +``` + +""" +from typing import Sequence, Union + +import numpy as np +import tensorflow as tf + +from official.core import file_writers +from official.vision.data import fake_feature_generator +from official.vision.data import image_utils +from official.vision.data import tf_example_builder + +IMAGE_KEY = 'image/encoded' +CLASSIFICATION_LABEL_KEY = 'image/class/label' +DISTILLATION_LABEL_KEY = 'image/class/soft_labels' +LABEL_KEY = 'clip/label/index' +AUDIO_KEY = 'features/audio' +DUMP_SOURCE_ID = b'7435790' + + +def encode_image(image_array: np.ndarray, fmt: str) -> bytes: + return image_utils.encode_image(image_array, fmt) + + +def make_image_bytes(shape: Sequence[int], fmt: str = 'JPEG') -> bytes: + """Generates image and return bytes in specified format.""" + image = fake_feature_generator.generate_image_np(*shape) + return encode_image(image, fmt=fmt) + + +def put_int64_to_context(seq_example: tf.train.SequenceExample, + label: int = 0, + key: str = LABEL_KEY): + """Puts int64 to SequenceExample context with key.""" + seq_example.context.feature[key].int64_list.value[:] = [label] + + +def put_bytes_list_to_feature(seq_example: tf.train.SequenceExample, + raw_image_bytes: bytes, + key: str = IMAGE_KEY, + repeat_num: int = 2): + """Puts bytes list to SequenceExample context with key.""" + for _ in range(repeat_num): + seq_example.feature_lists.feature_list.get_or_create( + key).feature.add().bytes_list.value[:] = [raw_image_bytes] + + +def put_float_list_to_feature(seq_example: tf.train.SequenceExample, + value: Sequence[Sequence[float]], key: str): + """Puts float list to SequenceExample context with key.""" + for s in value: + seq_example.feature_lists.feature_list.get_or_create( + key).feature.add().float_list.value[:] = s + + +def make_video_test_example(image_shape: Sequence[int] = (263, 320, 3), + audio_shape: Sequence[int] = (10, 256), + label: int = 42): + """Generates data for testing video models (inc. 
RGB, audio, & label).""" + raw_image_bytes = make_image_bytes(shape=image_shape) + random_audio = np.random.normal(size=audio_shape).tolist() + + seq_example = tf.train.SequenceExample() + put_int64_to_context(seq_example, label=label, key=LABEL_KEY) + put_bytes_list_to_feature( + seq_example, raw_image_bytes, key=IMAGE_KEY, repeat_num=4) + + put_float_list_to_feature(seq_example, value=random_audio, key=AUDIO_KEY) + return seq_example + + +def dump_to_tfrecord(record_file: str, + tf_examples: Sequence[Union[tf.train.Example, + tf.train.SequenceExample]]): + """Writes examples to a TFRecord file at `record_file`. + + Note that the input examples are expected to be unserialized. + + Args: + record_file: The name of the output file. + tf_examples: A list of examples to be stored. + """ + file_writers.write_small_dataset(tf_examples, record_file, 'tfrecord') + + +def create_classification_example( + image_height: int, + image_width: int, + image_format: str = 'JPEG', + is_multilabel: bool = False, + output_serialized_example: bool = True) -> tf.train.Example: + """Creates image and labels for image classification input pipeline. + + Args: + image_height: The height of test image. + image_width: The width of test image. + image_format: The format of test image. + is_multilabel: A boolean flag represents whether the test image can have + multiple labels. + output_serialized_example: A boolean flag represents whether to return a + serialized example. + + Returns: + A tf.train.Example for testing.
+ """ + image = fake_feature_generator.generate_image_np(image_height, image_width) + labels = fake_feature_generator.generate_classes_np(2, + int(is_multilabel) + + 1).tolist() + builder = tf_example_builder.TfExampleBuilder() + example = builder.add_image_matrix_feature(image, + image_format).add_ints_feature( + CLASSIFICATION_LABEL_KEY, + labels).example + if output_serialized_example: + return example.SerializeToString() + return example + + +def create_distillation_example( + image_height: int, + image_width: int, + num_labels: int, + image_format: str = 'JPEG', + output_serialized_example: bool = True) -> tf.train.Example: + """Creates image and labels for image classification with distillation. + + Args: + image_height: The height of test image. + image_width: The width of test image. + num_labels: The number of labels used in test image. + image_format: The format of test image. + output_serialized_example: A boolean flag represents whether to return a + serialized example. + + Returns: + A tf.train.Example for testing. + """ + image = fake_feature_generator.generate_image_np(image_height, image_width) + labels = fake_feature_generator.generate_classes_np(2, 1).tolist() + soft_labels = (fake_feature_generator.generate_classes_np(1, num_labels) + + 0.6).tolist() + builder = tf_example_builder.TfExampleBuilder() + example = builder.add_image_matrix_feature(image, + image_format).add_ints_feature( + CLASSIFICATION_LABEL_KEY, + labels).add_floats_feature( + DISTILLATION_LABEL_KEY, + soft_labels).example + if output_serialized_example: + return example.SerializeToString() + return example + + +def create_3d_image_test_example( + image_height: int, + image_width: int, + image_volume: int, + image_channel: int, + output_serialized_example: bool = False) -> tf.train.Example: + """Creates 3D image and label. + + Args: + image_height: The height of test 3D image. + image_width: The width of test 3D image. + image_volume: The volume of test 3D image. 
+ image_channel: The channel of test 3D image. + output_serialized_example: A boolean flag represents whether to return a + serialized example. + + Returns: + A tf.train.Example for testing. + """ + image = fake_feature_generator.generate_image_np(image_height, image_width, + image_channel) + images = image[:, :, np.newaxis, :] + images = np.tile(images, [1, 1, image_volume, 1]).astype(np.float32) + + shape = [image_height, image_width, image_volume, image_channel] + labels = fake_feature_generator.generate_classes_np( + 2, np.prod(shape)).reshape(shape).astype(np.float32) + + builder = tf_example_builder.TfExampleBuilder() + example = builder.add_bytes_feature(IMAGE_KEY, + images.tobytes()).add_bytes_feature( + CLASSIFICATION_LABEL_KEY, + labels.tobytes()).example + if output_serialized_example: + return example.SerializeToString() + return example + + +def create_detection_test_example( + image_height: int, + image_width: int, + image_channel: int, + num_instances: int, + fill_image_size: bool = True, + output_serialized_example: bool = False) -> tf.train.Example: + """Creates and returns a test example containing box and mask annotations. + + Args: + image_height: The height of test image. + image_width: The width of test image. + image_channel: The channel of test image. + num_instances: The number of object instances per image. + fill_image_size: If image height and width will be added to the example. + output_serialized_example: A boolean flag represents whether to return a + serialized example. + + Returns: + A tf.train.Example for testing. 
+ """ + image = fake_feature_generator.generate_image_np(image_height, image_width, + image_channel) + boxes = fake_feature_generator.generate_normalized_boxes_np(num_instances) + ymins, xmins, ymaxs, xmaxs = boxes.T.tolist() + is_crowds = [0] * num_instances + labels = fake_feature_generator.generate_classes_np( + 2, size=num_instances).tolist() + labels_text = [b'class_1'] * num_instances + masks = fake_feature_generator.generate_instance_masks_np( + image_height, image_width, boxes) + + builder = tf_example_builder.TfExampleBuilder() + + example = builder.add_image_matrix_feature(image).add_boxes_feature( + xmins, xmaxs, ymins, ymaxs, + labels).add_instance_mask_matrices_feature(masks).add_ints_feature( + 'image/object/is_crowd', + is_crowds).add_bytes_feature('image/object/class/text', + labels_text).example + if not fill_image_size: + del example.features.feature['image/height'] + del example.features.feature['image/width'] + + if output_serialized_example: + return example.SerializeToString() + return example + + +def create_segmentation_test_example( + image_height: int, + image_width: int, + image_channel: int, + output_serialized_example: bool = False) -> tf.train.Example: + """Creates and returns a test example containing mask annotations. + + Args: + image_height: The height of test image. + image_width: The width of test image. + image_channel: The channel of test image. + output_serialized_example: A boolean flag represents whether to return a + serialized example. + + Returns: + A tf.train.Example for testing. 
+ """ + image = fake_feature_generator.generate_image_np(image_height, image_width, + image_channel) + mask = fake_feature_generator.generate_semantic_mask_np( + image_height, image_width, 3) + builder = tf_example_builder.TfExampleBuilder() + example = builder.add_image_matrix_feature( + image).add_semantic_mask_matrix_feature(mask).example + if output_serialized_example: + return example.SerializeToString() + return example diff --git a/official/vision/dataloaders/utils.py b/official/vision/dataloaders/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..bbe0d499694ce461ca32d2f9226979efa863798b --- /dev/null +++ b/official/vision/dataloaders/utils.py @@ -0,0 +1,86 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Data loader utils.""" +from typing import Dict + +# Import libraries +import tensorflow as tf + +from official.vision.ops import preprocess_ops + + +def process_source_id(source_id: tf.Tensor) -> tf.Tensor: + """Processes source_id to the right format. + + Args: + source_id: A `tf.Tensor` that contains the source ID. It can be empty. + + Returns: + A formatted source ID. 
+ """ + if source_id.dtype == tf.string: + source_id = tf.strings.to_number(source_id, tf.int64) + with tf.control_dependencies([source_id]): + source_id = tf.cond( + pred=tf.equal(tf.size(input=source_id), 0), + true_fn=lambda: tf.cast(tf.constant(-1), tf.int64), + false_fn=lambda: tf.identity(source_id)) + return source_id + + +def pad_groundtruths_to_fixed_size(groundtruths: Dict[str, tf.Tensor], + size: int) -> Dict[str, tf.Tensor]: + """Pads the first dimension of groundtruths labels to the fixed size. + + Args: + groundtruths: A dictionary of {`str`: `tf.Tensor`} that contains groundtruth + annotations of `boxes`, `is_crowds`, `areas` and `classes`. + size: An `int` that specifies the expected size of the first dimension of + padded tensors. + + Returns: + A dictionary of the same keys as input and padded tensors as values. + + """ + groundtruths['boxes'] = preprocess_ops.clip_or_pad_to_fixed_size( + groundtruths['boxes'], size, -1) + groundtruths['is_crowds'] = preprocess_ops.clip_or_pad_to_fixed_size( + groundtruths['is_crowds'], size, 0) + groundtruths['areas'] = preprocess_ops.clip_or_pad_to_fixed_size( + groundtruths['areas'], size, -1) + groundtruths['classes'] = preprocess_ops.clip_or_pad_to_fixed_size( + groundtruths['classes'], size, -1) + if 'attributes' in groundtruths: + for k, v in groundtruths['attributes'].items(): + groundtruths['attributes'][k] = preprocess_ops.clip_or_pad_to_fixed_size( + v, size, -1) + return groundtruths + + +def binarize_matting_map(matting_map: tf.Tensor, + threshold: float = 0.5) -> tf.Tensor: + """Binarizes a matting map. + + If the matting_map value is above a threshold, set it as 1 otherwise 0. The + binarization is done for every element in the matting_map. + + Args: + matting_map: The groundtruth in the matting map format. + threshold: The threshold used to binarize the matting map. + + Returns: + The binarized labels (0 for BG, 1 for FG) as tf.float32. 
+ """ + return tf.cast(tf.greater(matting_map, threshold), tf.float32) diff --git a/official/vision/beta/dataloaders/utils_test.py b/official/vision/dataloaders/utils_test.py similarity index 95% rename from official/vision/beta/dataloaders/utils_test.py rename to official/vision/dataloaders/utils_test.py index 7c728bbd2d2c0b908197dd2e1545363976ab37e4..8622b9b414cb5eb2295d64bb17ff12e38849dd3a 100644 --- a/official/vision/beta/dataloaders/utils_test.py +++ b/official/vision/dataloaders/utils_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -19,7 +19,7 @@ from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.dataloaders import utils +from official.vision.dataloaders import utils class UtilsTest(tf.test.TestCase, parameterized.TestCase): diff --git a/official/vision/beta/dataloaders/video_input.py b/official/vision/dataloaders/video_input.py similarity index 82% rename from official/vision/beta/dataloaders/video_input.py rename to official/vision/dataloaders/video_input.py index de5669ec2a63ea40fda6dbd39a9fac81537c5bcf..e0dc1467c2d205965776987989a37946e6124e41 100644 --- a/official/vision/beta/dataloaders/video_input.py +++ b/official/vision/dataloaders/video_input.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Parser for video and label datasets.""" from typing import Dict, Optional, Tuple, Union @@ -20,11 +19,11 @@ from typing import Dict, Optional, Tuple, Union from absl import logging import tensorflow as tf -from official.vision.beta.configs import video_classification as exp_cfg -from official.vision.beta.dataloaders import decoder -from official.vision.beta.dataloaders import parser -from official.vision.beta.ops import augment -from official.vision.beta.ops import preprocess_ops_3d +from official.vision.configs import video_classification as exp_cfg +from official.vision.dataloaders import decoder +from official.vision.dataloaders import parser +from official.vision.ops import augment +from official.vision.ops import preprocess_ops_3d IMAGE_KEY = 'image/encoded' LABEL_KEY = 'clip/label/index' @@ -37,7 +36,8 @@ def process_image(image: tf.Tensor, random_stride_range: int = 0, num_test_clips: int = 1, min_resize: int = 256, - crop_size: int = 224, + crop_size: Union[int, Tuple[int, int]] = 224, + num_channels: int = 3, num_crops: int = 1, zero_centering_image: bool = False, min_aspect_ratio: float = 0.5, @@ -65,8 +65,10 @@ def process_image(image: tf.Tensor, If 1, then a single clip in the middle of the video is sampled. The clips are aggreagated in the batch dimension. min_resize: Frames are resized so that min(height, width) is min_resize. - crop_size: Final size of the frame after cropping the resized frames. Both - height and width are the same. + crop_size: Final size of the frame after cropping the resized frames. + Optionally, specify a tuple of (crop_height, crop_width) if + crop_height != crop_width. + num_channels: Number of channels of the clip. num_crops: Number of crops to perform on the resized frames. zero_centering_image: If True, frames are normalized to values in [-1, 1]. If False, values in [0, 1]. @@ -79,7 +81,7 @@ def process_image(image: tf.Tensor, Returns: Processed frames. 
Tensor of shape - [num_frames * num_test_clips, crop_size, crop_size, 3]. + [num_frames * num_test_clips, crop_height, crop_width, num_channels]. """ # Validate parameters. if is_training and num_test_clips != 1: @@ -91,6 +93,10 @@ def process_image(image: tf.Tensor, raise ValueError('Random stride range should be >= 0, got {}'.format( random_stride_range)) + if isinstance(crop_size, int): + crop_size = (crop_size, crop_size) + crop_height, crop_width = crop_size + # Temporal sampler. if is_training: if random_stride_range > 0: @@ -114,12 +120,12 @@ def process_image(image: tf.Tensor, # Decode JPEG string to tf.uint8. if image.dtype == tf.string: - image = preprocess_ops_3d.decode_jpeg(image, 3) + image = preprocess_ops_3d.decode_jpeg(image, num_channels) if is_training: # Standard image data augmentation: random resized crop and random flip. image = preprocess_ops_3d.random_crop_resize( - image, crop_size, crop_size, num_frames, 3, + image, crop_height, crop_width, num_frames, num_channels, (min_aspect_ratio, max_aspect_ratio), (min_area_ratio, max_area_ratio)) image = preprocess_ops_3d.random_flip_left_right(image, seed) @@ -130,7 +136,7 @@ def process_image(image: tf.Tensor, # Resize images (resize happens only if necessary to save compute). image = preprocess_ops_3d.resize_smallest(image, min_resize) # Crop of the frames. - image = preprocess_ops_3d.crop_image(image, crop_size, crop_size, False, + image = preprocess_ops_3d.crop_image(image, crop_height, crop_width, False, num_crops) # Cast the frames in float32, normalizing according to zero_centering_image. @@ -174,15 +180,16 @@ def postprocess_image(image: tf.Tensor, def process_label(label: tf.Tensor, one_hot_label: bool = True, - num_classes: Optional[int] = None) -> tf.Tensor: + num_classes: Optional[int] = None, + label_dtype: tf.DType = tf.int32) -> tf.Tensor: """Processes label Tensor.""" # Validate parameters. 
if one_hot_label and not num_classes: raise ValueError( '`num_classes` should be given when requesting one hot label.') - # Cast to tf.int32. - label = tf.cast(label, dtype=tf.int32) + # Cast to label_dtype (default = tf.int32). + label = tf.cast(label, dtype=label_dtype) if one_hot_label: # Replace label index by one hot representation. @@ -270,13 +277,19 @@ class Parser(parser.Parser): self._random_stride_range = input_params.random_stride_range self._num_test_clips = input_params.num_test_clips self._min_resize = input_params.min_image_size - self._crop_size = input_params.feature_shape[1] + crop_height = input_params.feature_shape[1] + crop_width = input_params.feature_shape[2] + self._crop_size = crop_height if crop_height == crop_width else ( + crop_height, crop_width) + self._num_channels = input_params.feature_shape[3] self._num_crops = input_params.num_test_crops + self._zero_centering_image = input_params.zero_centering_image self._one_hot_label = input_params.one_hot self._num_classes = input_params.num_classes self._image_key = image_key self._label_key = label_key self._dtype = tf.dtypes.as_dtype(input_params.dtype) + self._label_dtype = tf.dtypes.as_dtype(input_params.label_dtype) self._output_audio = input_params.output_audio self._min_aspect_ratio = input_params.aug_min_aspect_ratio self._max_aspect_ratio = input_params.aug_max_aspect_ratio @@ -286,18 +299,28 @@ class Parser(parser.Parser): self._audio_feature = input_params.audio_feature self._audio_shape = input_params.audio_feature_shape - self._augmenter = None - if input_params.aug_type is not None: - aug_type = input_params.aug_type - if aug_type == 'autoaug': + aug_type = input_params.aug_type + if aug_type is not None: + if aug_type.type == 'autoaug': logging.info('Using AutoAugment.') - self._augmenter = augment.AutoAugment() - elif aug_type == 'randaug': + self._augmenter = augment.AutoAugment( + augmentation_name=aug_type.autoaug.augmentation_name, + 
cutout_const=aug_type.autoaug.cutout_const, + translate_const=aug_type.autoaug.translate_const) + elif aug_type.type == 'randaug': logging.info('Using RandAugment.') - self._augmenter = augment.RandAugment() + self._augmenter = augment.RandAugment( + num_layers=aug_type.randaug.num_layers, + magnitude=aug_type.randaug.magnitude, + cutout_const=aug_type.randaug.cutout_const, + translate_const=aug_type.randaug.translate_const, + prob_to_apply=aug_type.randaug.prob_to_apply, + exclude_ops=aug_type.randaug.exclude_ops) else: - raise ValueError('Augmentation policy {} is not supported.'.format( - aug_type)) + raise ValueError( + 'Augmentation policy {} not supported.'.format(aug_type.type)) + else: + self._augmenter = None def _parse_train_data( self, decoded_tensors: Dict[str, tf.Tensor] @@ -314,17 +337,20 @@ class Parser(parser.Parser): num_test_clips=self._num_test_clips, min_resize=self._min_resize, crop_size=self._crop_size, + num_channels=self._num_channels, min_aspect_ratio=self._min_aspect_ratio, max_aspect_ratio=self._max_aspect_ratio, min_area_ratio=self._min_area_ratio, max_area_ratio=self._max_area_ratio, - augmenter=self._augmenter) + augmenter=self._augmenter, + zero_centering_image=self._zero_centering_image) image = tf.cast(image, dtype=self._dtype) features = {'image': image} label = decoded_tensors[self._label_key] - label = process_label(label, self._one_hot_label, self._num_classes) + label = process_label(label, self._one_hot_label, self._num_classes, + self._label_dtype) if self._output_audio: audio = decoded_tensors[self._audio_feature] @@ -350,18 +376,21 @@ class Parser(parser.Parser): num_test_clips=self._num_test_clips, min_resize=self._min_resize, crop_size=self._crop_size, - num_crops=self._num_crops) + num_channels=self._num_channels, + num_crops=self._num_crops, + zero_centering_image=self._zero_centering_image) image = tf.cast(image, dtype=self._dtype) features = {'image': image} label = decoded_tensors[self._label_key] - label = 
process_label(label, self._one_hot_label, self._num_classes) + label = process_label(label, self._one_hot_label, self._num_classes, + self._label_dtype) if self._output_audio: audio = decoded_tensors[self._audio_feature] audio = tf.cast(audio, dtype=self._dtype) audio = preprocess_ops_3d.sample_sequence( - audio, 20, random=False, stride=1) + audio, self._audio_shape[0], random=False, stride=1) audio = tf.ensure_shape(audio, self._audio_shape) features['audio'] = audio diff --git a/official/vision/beta/dataloaders/video_input_test.py b/official/vision/dataloaders/video_input_test.py similarity index 85% rename from official/vision/beta/dataloaders/video_input_test.py rename to official/vision/dataloaders/video_input_test.py index 4ba495ddfcd2d1e842b7d6c6e464605eb1cb70b5..93039b98e5f817334975d5da48028f4ecd858a7d 100644 --- a/official/vision/beta/dataloaders/video_input_test.py +++ b/official/vision/dataloaders/video_input_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 import io @@ -22,8 +21,9 @@ from PIL import Image import tensorflow as tf import tensorflow_datasets as tfds -from official.vision.beta.configs import video_classification as exp_cfg -from official.vision.beta.dataloaders import video_input +from official.vision.configs import common +from official.vision.configs import video_classification as exp_cfg +from official.vision.dataloaders import video_input AUDIO_KEY = 'features/audio' @@ -174,7 +174,8 @@ class VideoAndLabelParserTest(tf.test.TestCase): params.min_image_size = 224 params.temporal_stride = 2 - params.aug_type = 'autoaug' + params.aug_type = common.Augmentation( + type='autoaug', autoaug=common.AutoAugment()) decoder = video_input.Decoder() parser = video_input.Parser(params).parse_fn(params.is_training) @@ -190,6 +191,28 @@ class VideoAndLabelParserTest(tf.test.TestCase): self.assertAllEqual(image.shape, (2, 224, 224, 3)) self.assertAllEqual(label.shape, (600,)) + def test_video_input_image_shape_label_type(self): + params = exp_cfg.kinetics600(is_training=True) + params.feature_shape = (2, 168, 224, 1) + params.min_image_size = 168 + params.label_dtype = 'float32' + params.one_hot = False + + decoder = video_input.Decoder() + parser = video_input.Parser(params).parse_fn(params.is_training) + + seq_example, label = fake_seq_example() + + input_tensor = tf.constant(seq_example.SerializeToString()) + decoded_tensors = decoder.decode(input_tensor) + output_tensor = parser(decoded_tensors) + image_features, label = output_tensor + image = image_features['image'] + + self.assertAllEqual(image.shape, (2, 168, 224, 1)) + self.assertAllEqual(label.shape, (1,)) + self.assertDTypeEqual(label, tf.float32) + if __name__ == '__main__': tf.test.main() diff --git a/official/vision/detection/README.md b/official/vision/detection/README.md deleted file mode 100644 index 2633f86d5dc4feed71ba170b92e9ffb66021652d..0000000000000000000000000000000000000000 --- a/official/vision/detection/README.md +++ 
/dev/null @@ -1,429 +0,0 @@ -# Object Detection Models on TensorFlow 2 - -**WARNING**: This repository will be deprecated and replaced by the solid -implementations inside vision/beta/. - -## Prerequsite -To get started, download the code from TensorFlow models GitHub repository or -use the pre-installed Google Cloud VM. - -```bash -git clone https://github.com/tensorflow/models.git -``` - -Next, make sure to use TensorFlow 2.1+ on Google Cloud. Also here are -a few package you need to install to get started: - -```bash -sudo apt-get install -y python-tk && \ -pip3 install -r ~/models/official/requirements.txt -``` - -## Train RetinaNet on TPU - -### Train a vanilla ResNet-50 based RetinaNet. - -```bash -TPU_NAME="" -MODEL_DIR="" -RESNET_CHECKPOINT="" -TRAIN_FILE_PATTERN="" -EVAL_FILE_PATTERN="" -VAL_JSON_FILE="" -python3 ~/models/official/vision/detection/main.py \ - --strategy_type=tpu \ - --tpu="${TPU_NAME?}" \ - --model_dir="${MODEL_DIR?}" \ - --mode=train \ - --params_override="{ type: retinanet, train: { checkpoint: { path: ${RESNET_CHECKPOINT?}, prefix: resnet50/ }, train_file_pattern: ${TRAIN_FILE_PATTERN?} }, eval: { val_json_file: ${VAL_JSON_FILE?}, eval_file_pattern: ${EVAL_FILE_PATTERN?} } }" -``` - -The pre-trained ResNet-50 checkpoint can be downloaded [here](https://storage.cloud.google.com/cloud-tpu-checkpoints/model-garden-vision/detection/resnet50-2018-02-07.tar.gz). - -Note: The ResNet implementation under -[detection/](https://github.com/tensorflow/models/tree/master/official/vision/detection) -is currently different from the one under -[classification/](https://github.com/tensorflow/models/tree/master/official/vision/image_classification), -so the checkpoints are not compatible. -We will unify the implementation soon. - - -### Train a SpineNet-49 based RetinaNet. 
- -```bash -TPU_NAME="" -MODEL_DIR="" -TRAIN_FILE_PATTERN="" -EVAL_FILE_PATTERN="" -VAL_JSON_FILE="" -python3 ~/models/official/vision/detection/main.py \ - --strategy_type=tpu \ - --tpu="${TPU_NAME?}" \ - --model_dir="${MODEL_DIR?}" \ - --mode=train \ - --params_override="{ type: retinanet, architecture: {backbone: spinenet, multilevel_features: identity}, spinenet: {model_id: 49}, train_file_pattern: ${TRAIN_FILE_PATTERN?} }, eval: { val_json_file: ${VAL_JSON_FILE?}, eval_file_pattern: ${EVAL_FILE_PATTERN?} } }" -``` - - -### Train a custom RetinaNet using the config file. - -First, create a YAML config file, e.g. *my_retinanet.yaml*. This file specifies -the parameters to be overridden, which should at least include the following -fields. - -```YAML -# my_retinanet.yaml -type: 'retinanet' -train: - train_file_pattern: -eval: - eval_file_pattern: - val_json_file: -``` - -Once the YAML config file is created, you can launch the training using the -following command. - -```bash -TPU_NAME="" -MODEL_DIR="" -python3 ~/models/official/vision/detection/main.py \ - --strategy_type=tpu \ - --tpu="${TPU_NAME?}" \ - --model_dir="${MODEL_DIR?}" \ - --mode=train \ - --config_file="my_retinanet.yaml" -``` - -## Train RetinaNet on GPU - -Training on GPU is similar to that on TPU. The major change is the strategy -type (use "[mirrored](https://www.tensorflow.org/api_docs/python/tf/distribute/MirroredStrategy)" for multiple GPU and -"[one_device](https://www.tensorflow.org/api_docs/python/tf/distribute/OneDeviceStrategy)" for single GPU). 
- -Multi-GPUs example (assuming there are 8GPU connected to the host): - -```bash -MODEL_DIR="" -python3 ~/models/official/vision/detection/main.py \ - --strategy_type=mirrored \ - --num_gpus=8 \ - --model_dir="${MODEL_DIR?}" \ - --mode=train \ - --config_file="my_retinanet.yaml" -``` - -```bash -MODEL_DIR="" -python3 ~/models/official/vision/detection/main.py \ - --strategy_type=one_device \ - --num_gpus=1 \ - --model_dir="${MODEL_DIR?}" \ - --mode=train \ - --config_file="my_retinanet.yaml" -``` - -An example with inline configuration (YAML or JSON format): - -``` -python3 ~/models/official/vision/detection/main.py \ - --model_dir= \ - --strategy_type=one_device \ - --num_gpus=1 \ - --mode=train \ - --params_override="eval: - eval_file_pattern: - batch_size: 8 - val_json_file: -predict: - predict_batch_size: 8 -architecture: - use_bfloat16: False -train: - total_steps: 1 - batch_size: 8 - train_file_pattern: -use_tpu: False -" -``` - ---- - -## Train Mask R-CNN on TPU - -### Train a vanilla ResNet-50 based Mask R-CNN. - -```bash -TPU_NAME="" -MODEL_DIR="" -RESNET_CHECKPOINT="" -TRAIN_FILE_PATTERN="" -EVAL_FILE_PATTERN="" -VAL_JSON_FILE="" -python3 ~/models/official/vision/detection/main.py \ - --strategy_type=tpu \ - --tpu=${TPU_NAME} \ - --model_dir=${MODEL_DIR} \ - --mode=train \ - --model=mask_rcnn \ - --params_override="{train: { checkpoint: { path: ${RESNET_CHECKPOINT}, prefix: resnet50/ }, train_file_pattern: ${TRAIN_FILE_PATTERN} }, eval: { val_json_file: ${VAL_JSON_FILE}, eval_file_pattern: ${EVAL_FILE_PATTERN} } }" -``` - -The pre-trained ResNet-50 checkpoint can be downloaded [here](https://storage.cloud.google.com/cloud-tpu-checkpoints/model-garden-vision/detection/resnet50-2018-02-07.tar.gz). 
- -Note: The ResNet implementation under -[detection/](https://github.com/tensorflow/models/tree/master/official/vision/detection) -is currently different from the one under -[classification/](https://github.com/tensorflow/models/tree/master/official/vision/image_classification), -so the checkpoints are not compatible. -We will unify the implementation soon. - - -### Train a SpineNet-49 based Mask R-CNN. - -```bash -TPU_NAME="" -MODEL_DIR="" -TRAIN_FILE_PATTERN="" -EVAL_FILE_PATTERN="" -VAL_JSON_FILE="" -python3 ~/models/official/vision/detection/main.py \ - --strategy_type=tpu \ - --tpu="${TPU_NAME?}" \ - --model_dir="${MODEL_DIR?}" \ - --mode=train \ - --model=mask_rcnn \ - --params_override="{architecture: {backbone: spinenet, multilevel_features: identity}, spinenet: {model_id: 49}, train_file_pattern: ${TRAIN_FILE_PATTERN?} }, eval: { val_json_file: ${VAL_JSON_FILE?}, eval_file_pattern: ${EVAL_FILE_PATTERN?} } }" -``` - - -### Train a custom Mask R-CNN using the config file. - -First, create a YAML config file, e.g. *my_maskrcnn.yaml*. -This file specifies the parameters to be overridden, -which should at least include the following fields. - -```YAML -# my_maskrcnn.yaml -train: - train_file_pattern: -eval: - eval_file_pattern: - val_json_file: -``` - -Once the YAML config file is created, you can launch the training using the -following command. - -```bash -TPU_NAME="" -MODEL_DIR="" -python3 ~/models/official/vision/detection/main.py \ - --strategy_type=tpu \ - --tpu=${TPU_NAME} \ - --model_dir=${MODEL_DIR} \ - --mode=train \ - --model=mask_rcnn \ - --config_file="my_maskrcnn.yaml" -``` - -## Train Mask R-CNN on GPU - -Training on GPU is similar to that on TPU. The major change is the strategy type -(use -"[mirrored](https://www.tensorflow.org/api_docs/python/tf/distribute/MirroredStrategy)" -for multiple GPU and -"[one_device](https://www.tensorflow.org/api_docs/python/tf/distribute/OneDeviceStrategy)" -for single GPU). 
- -Multi-GPUs example (assuming there are 8GPU connected to the host): - -```bash -MODEL_DIR="" -python3 ~/models/official/vision/detection/main.py \ - --strategy_type=mirrored \ - --num_gpus=8 \ - --model_dir=${MODEL_DIR} \ - --mode=train \ - --model=mask_rcnn \ - --config_file="my_maskrcnn.yaml" -``` - -```bash -MODEL_DIR="" -python3 ~/models/official/vision/detection/main.py \ - --strategy_type=one_device \ - --num_gpus=1 \ - --model_dir=${MODEL_DIR} \ - --mode=train \ - --model=mask_rcnn \ - --config_file="my_maskrcnn.yaml" -``` - -An example with inline configuration (YAML or JSON format): - -``` -python3 ~/models/official/vision/detection/main.py \ - --model_dir= \ - --strategy_type=one_device \ - --num_gpus=1 \ - --mode=train \ - --model=mask_rcnn \ - --params_override="eval: - eval_file_pattern: - batch_size: 8 - val_json_file: -predict: - predict_batch_size: 8 -architecture: - use_bfloat16: False -train: - total_steps: 1000 - batch_size: 8 - train_file_pattern: -use_tpu: False -" -``` - -## Train ShapeMask on TPU - -### Train a ResNet-50 based ShapeMask. - -```bash -TPU_NAME="" -MODEL_DIR="" -RESNET_CHECKPOINT="" -TRAIN_FILE_PATTERN="" -EVAL_FILE_PATTERN="" -VAL_JSON_FILE="" -SHAPE_PRIOR_PATH="" -python3 ~/models/official/vision/detection/main.py \ - --strategy_type=tpu \ - --tpu=${TPU_NAME} \ - --model_dir=${MODEL_DIR} \ - --mode=train \ - --model=shapemask \ - --params_override="{train: { checkpoint: { path: ${RESNET_CHECKPOINT}, prefix: resnet50/ }, train_file_pattern: ${TRAIN_FILE_PATTERN} }, eval: { val_json_file: ${VAL_JSON_FILE}, eval_file_pattern: ${EVAL_FILE_PATTERN} } shapemask_head: {use_category_for_mask: true, shape_prior_path: ${SHAPE_PRIOR_PATH}} }" -``` - -The pre-trained ResNet-50 checkpoint can be downloaded [here](https://storage.cloud.google.com/cloud-tpu-checkpoints/model-garden-vision/detection/resnet50-2018-02-07.tar.gz). 
- -The shape priors can be downloaded [here] -(https://storage.googleapis.com/cloud-tpu-checkpoints/shapemask/kmeans_class_priors_91x20x32x32.npy) - - -### Train a custom ShapeMask using the config file. - -First, create a YAML config file, e.g. *my_shapemask.yaml*. -This file specifies the parameters to be overridden: - -```YAML -# my_shapemask.yaml -train: - train_file_pattern: - total_steps: - batch_size: -eval: - eval_file_pattern: - val_json_file: - batch_size: -shapemask_head: - shape_prior_path: -``` - -Once the YAML config file is created, you can launch the training using the -following command. - -```bash -TPU_NAME="" -MODEL_DIR="" -python3 ~/models/official/vision/detection/main.py \ - --strategy_type=tpu \ - --tpu=${TPU_NAME} \ - --model_dir=${MODEL_DIR} \ - --mode=train \ - --model=shapemask \ - --config_file="my_shapemask.yaml" -``` - -## Train ShapeMask on GPU - -Training on GPU is similar to that on TPU. The major change is the strategy type -(use -"[mirrored](https://www.tensorflow.org/api_docs/python/tf/distribute/MirroredStrategy)" -for multiple GPU and -"[one_device](https://www.tensorflow.org/api_docs/python/tf/distribute/OneDeviceStrategy)" -for single GPU). 
- -Multi-GPUs example (assuming there are 8GPU connected to the host): - -```bash -MODEL_DIR="" -python3 ~/models/official/vision/detection/main.py \ - --strategy_type=mirrored \ - --num_gpus=8 \ - --model_dir=${MODEL_DIR} \ - --mode=train \ - --model=shapemask \ - --config_file="my_shapemask.yaml" -``` - -A single GPU example - -```bash -MODEL_DIR="" -python3 ~/models/official/vision/detection/main.py \ - --strategy_type=one_device \ - --num_gpus=1 \ - --model_dir=${MODEL_DIR} \ - --mode=train \ - --model=shapemask \ - --config_file="my_shapemask.yaml" -``` - - -An example with inline configuration (YAML or JSON format): - -``` -python3 ~/models/official/vision/detection/main.py \ - --model_dir= \ - --strategy_type=one_device \ - --num_gpus=1 \ - --mode=train \ - --model=shapemask \ - --params_override="eval: - eval_file_pattern: - batch_size: 8 - val_json_file: -train: - total_steps: 1000 - batch_size: 8 - train_file_pattern: -use_tpu: False -" -``` - - -### Run the evaluation (after training) - -``` -python3 /usr/share/models/official/vision/detection/main.py \ - --strategy_type=tpu \ - --tpu=${TPU_NAME} \ - --model_dir=${MODEL_DIR} \ - --mode=eval \ - --model=shapemask \ - --params_override="{eval: { val_json_file: ${VAL_JSON_FILE}, eval_file_pattern: ${EVAL_FILE_PATTERN}, eval_samples: 5000 } }" -``` - -`MODEL_DIR` needs to point to the trained path of ShapeMask model. -Change `strategy_type=mirrored` and `num_gpus=1` to run on a GPU. - -Note: The JSON groundtruth file is useful for [COCO dataset](http://cocodataset.org/#home) and can be -downloaded from the [COCO website](http://cocodataset.org/#download). For custom dataset, it is unncessary because the groundtruth can be included in the TFRecord files. - -## References - -1. [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002). - Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. IEEE - International Conference on Computer Vision (ICCV), 2017. 
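The `video_input.py` changes earlier in this diff let `crop_size` be either an `int` or a `(crop_height, crop_width)` tuple. `Parser` now derives it from `feature_shape`, using index 1 for height, index 2 for width, and index 3 for the channel count. The plain-Python sketch below reproduces that normalization outside TensorFlow so it can be checked in isolation. The helper names `normalize_crop_size` and `parser_crop_size` are hypothetical and are not part of the library.

```python
from typing import Sequence, Tuple, Union

CropSize = Union[int, Tuple[int, int]]


def normalize_crop_size(crop_size: CropSize) -> Tuple[int, int]:
  """Mirrors `process_image`: an int means a square crop (height == width)."""
  if isinstance(crop_size, int):
    crop_size = (crop_size, crop_size)
  crop_height, crop_width = crop_size
  return crop_height, crop_width


def parser_crop_size(feature_shape: Sequence[int]) -> Tuple[CropSize, int]:
  """Mirrors `Parser.__init__`; feature_shape is (frames, height, width, channels)."""
  crop_height = feature_shape[1]
  crop_width = feature_shape[2]
  # Square crops keep the scalar form; non-square crops become a tuple.
  crop_size = (crop_height if crop_height == crop_width
               else (crop_height, crop_width))
  num_channels = feature_shape[3]
  return crop_size, num_channels


# Shape used by the new test case: 2 frames of 168x224 single-channel images.
crop_size, num_channels = parser_crop_size((2, 168, 224, 1))
print(crop_size, num_channels)         # (168, 224) 1
print(normalize_crop_size(crop_size))  # (168, 224)
# A square Kinetics-style shape still yields a scalar crop size.
print(parser_crop_size((32, 224, 224, 3)))  # (224, 3)
```

After normalization, the processed clip has shape `[num_frames * num_test_clips, crop_height, crop_width, num_channels]`, as the updated `process_image` docstring states.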
diff --git a/official/vision/detection/__init__.py b/official/vision/detection/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/detection/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/detection/configs/__init__.py b/official/vision/detection/configs/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/detection/configs/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- diff --git a/official/vision/detection/configs/base_config.py b/official/vision/detection/configs/base_config.py deleted file mode 100644 index 32b8bcc1be551c249cafeab6706ae3bc58cc2d08..0000000000000000000000000000000000000000 --- a/official/vision/detection/configs/base_config.py +++ /dev/null @@ -1,140 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Base config template.""" - - -BACKBONES = [ - 'resnet', - 'spinenet', -] - -MULTILEVEL_FEATURES = [ - 'fpn', - 'identity', -] - -# pylint: disable=line-too-long -# For ResNet, this freezes the variables of the first conv1 and conv2_x -# layers [1], which leads to higher training speed and slightly better testing -# accuracy. The intuition is that the low-level architecture (e.g., ResNet-50) -# is able to capture low-level features such as edges; therefore, it does not -# need to be fine-tuned for the detection task. -# Note that we need to trailing `/` to avoid the incorrect match. 
-# [1]: https://github.com/facebookresearch/Detectron/blob/master/detectron/core/config.py#L198 -RESNET_FROZEN_VAR_PREFIX = r'(resnet\d+)\/(conv2d(|_([1-9]|10))|batch_normalization(|_([1-9]|10)))\/' -REGULARIZATION_VAR_REGEX = r'.*(kernel|weight):0$' - -BASE_CFG = { - 'model_dir': '', - 'use_tpu': True, - 'strategy_type': 'tpu', - 'isolate_session_state': False, - 'train': { - 'iterations_per_loop': 100, - 'batch_size': 64, - 'total_steps': 22500, - 'num_cores_per_replica': None, - 'input_partition_dims': None, - 'optimizer': { - 'type': 'momentum', - 'momentum': 0.9, - 'nesterov': True, # `False` is better for TPU v3-128. - }, - 'learning_rate': { - 'type': 'step', - 'warmup_learning_rate': 0.0067, - 'warmup_steps': 500, - 'init_learning_rate': 0.08, - 'learning_rate_levels': [0.008, 0.0008], - 'learning_rate_steps': [15000, 20000], - }, - 'checkpoint': { - 'path': '', - 'prefix': '', - }, - # One can use 'RESNET_FROZEN_VAR_PREFIX' to speed up ResNet training - # when loading from the checkpoint. - 'frozen_variable_prefix': '', - 'train_file_pattern': '', - 'train_dataset_type': 'tfrecord', - # TODO(b/142174042): Support transpose_input option. - 'transpose_input': False, - 'regularization_variable_regex': REGULARIZATION_VAR_REGEX, - 'l2_weight_decay': 0.0001, - 'gradient_clip_norm': 0.0, - 'input_sharding': False, - }, - 'eval': { - 'input_sharding': True, - 'batch_size': 8, - 'eval_samples': 5000, - 'min_eval_interval': 180, - 'eval_timeout': None, - 'num_steps_per_eval': 1000, - 'type': 'box', - 'use_json_file': True, - 'val_json_file': '', - 'eval_file_pattern': '', - 'eval_dataset_type': 'tfrecord', - # When visualizing images, set evaluation batch size to 40 to avoid - # potential OOM. 
- 'num_images_to_visualize': 0, - }, - 'predict': { - 'batch_size': 8, - }, - 'architecture': { - 'backbone': 'resnet', - 'min_level': 3, - 'max_level': 7, - 'multilevel_features': 'fpn', - 'use_bfloat16': True, - # Note that `num_classes` is the total number of classes including - # one background classes whose index is 0. - 'num_classes': 91, - }, - 'anchor': { - 'num_scales': 3, - 'aspect_ratios': [1.0, 2.0, 0.5], - 'anchor_size': 4.0, - }, - 'norm_activation': { - 'activation': 'relu', - 'batch_norm_momentum': 0.997, - 'batch_norm_epsilon': 1e-4, - 'batch_norm_trainable': True, - 'use_sync_bn': False, - }, - 'resnet': { - 'resnet_depth': 50, - }, - 'spinenet': { - 'model_id': '49', - }, - 'fpn': { - 'fpn_feat_dims': 256, - 'use_separable_conv': False, - 'use_batch_norm': True, - }, - 'postprocess': { - 'use_batched_nms': False, - 'max_total_size': 100, - 'nms_iou_threshold': 0.5, - 'score_threshold': 0.05, - 'pre_nms_num_boxes': 5000, - }, - 'enable_summary': False, -} -# pylint: enable=line-too-long diff --git a/official/vision/detection/configs/factory.py b/official/vision/detection/configs/factory.py deleted file mode 100644 index 58530518b7f6cbbb33e244c8f746fead7822add4..0000000000000000000000000000000000000000 --- a/official/vision/detection/configs/factory.py +++ /dev/null @@ -1,41 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Factory to provide model configs.""" - -from official.modeling.hyperparams import params_dict -from official.vision.detection.configs import maskrcnn_config -from official.vision.detection.configs import olnmask_config -from official.vision.detection.configs import retinanet_config -from official.vision.detection.configs import shapemask_config - - -def config_generator(model): - """Model function generator.""" - if model == 'retinanet': - default_config = retinanet_config.RETINANET_CFG - restrictions = retinanet_config.RETINANET_RESTRICTIONS - elif model == 'mask_rcnn': - default_config = maskrcnn_config.MASKRCNN_CFG - restrictions = maskrcnn_config.MASKRCNN_RESTRICTIONS - elif model == 'olnmask': - default_config = olnmask_config.OLNMASK_CFG - restrictions = olnmask_config.OLNMASK_RESTRICTIONS - elif model == 'shapemask': - default_config = shapemask_config.SHAPEMASK_CFG - restrictions = shapemask_config.SHAPEMASK_RESTRICTIONS - else: - raise ValueError('Model %s is not supported.' % model) - - return params_dict.ParamsDict(default_config, restrictions) diff --git a/official/vision/detection/configs/maskrcnn_config.py b/official/vision/detection/configs/maskrcnn_config.py deleted file mode 100644 index e421fb4e7174b09c576423efe0cdbd5622d82304..0000000000000000000000000000000000000000 --- a/official/vision/detection/configs/maskrcnn_config.py +++ /dev/null @@ -1,115 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Config template to train Mask R-CNN.""" - -from official.modeling.hyperparams import params_dict -from official.vision.detection.configs import base_config - - -# pylint: disable=line-too-long -MASKRCNN_CFG = params_dict.ParamsDict(base_config.BASE_CFG) -MASKRCNN_CFG.override({ - 'type': 'mask_rcnn', - 'eval': { - 'type': 'box_and_mask', - 'num_images_to_visualize': 0, - }, - 'architecture': { - 'parser': 'maskrcnn_parser', - 'min_level': 2, - 'max_level': 6, - 'include_mask': True, - 'mask_target_size': 28, - }, - 'maskrcnn_parser': { - 'output_size': [1024, 1024], - 'num_channels': 3, - 'rpn_match_threshold': 0.7, - 'rpn_unmatched_threshold': 0.3, - 'rpn_batch_size_per_im': 256, - 'rpn_fg_fraction': 0.5, - 'aug_rand_hflip': True, - 'aug_scale_min': 1.0, - 'aug_scale_max': 1.0, - 'skip_crowd_during_training': True, - 'max_num_instances': 100, - 'mask_crop_size': 112, - }, - 'anchor': { - 'num_scales': 1, - 'anchor_size': 8, - }, - 'rpn_head': { - 'num_convs': 2, - 'num_filters': 256, - 'use_separable_conv': False, - 'use_batch_norm': False, - }, - 'frcnn_head': { - 'num_convs': 0, - 'num_filters': 256, - 'use_separable_conv': False, - 'num_fcs': 2, - 'fc_dims': 1024, - 'use_batch_norm': False, - }, - 'mrcnn_head': { - 'num_convs': 4, - 'num_filters': 256, - 'use_separable_conv': False, - 'use_batch_norm': False, - }, - 'rpn_score_loss': { - 'rpn_batch_size_per_im': 256, - }, - 'rpn_box_loss': { - 'huber_loss_delta': 1.0 / 9.0, - }, - 'frcnn_box_loss': { - 'huber_loss_delta': 1.0, - }, - 'roi_proposal': { - 'rpn_pre_nms_top_k': 2000, - 'rpn_post_nms_top_k': 1000, - 'rpn_nms_threshold': 0.7, - 'rpn_score_threshold': 0.0, - 'rpn_min_size_threshold': 0.0, - 'test_rpn_pre_nms_top_k': 1000, - 'test_rpn_post_nms_top_k': 1000, - 'test_rpn_nms_threshold': 0.7, - 'test_rpn_score_threshold': 0.0, - 'test_rpn_min_size_threshold': 0.0, - 'use_batched_nms': False, - }, 
- 'roi_sampling': { - 'num_samples_per_image': 512, - 'fg_fraction': 0.25, - 'fg_iou_thresh': 0.5, - 'bg_iou_thresh_hi': 0.5, - 'bg_iou_thresh_lo': 0.0, - 'mix_gt_boxes': True, - }, - 'mask_sampling': { - 'num_mask_samples_per_image': 128, # Typically = `num_samples_per_image` * `fg_fraction`. - }, - 'postprocess': { - 'pre_nms_num_boxes': 1000, - }, -}, is_strict=False) - - -MASKRCNN_RESTRICTIONS = [ -] -# pylint: enable=line-too-long diff --git a/official/vision/detection/configs/olnmask_config.py b/official/vision/detection/configs/olnmask_config.py deleted file mode 100644 index 2888cc2ad87a6b6fbf2c0364337ace8d9ac5a30a..0000000000000000000000000000000000000000 --- a/official/vision/detection/configs/olnmask_config.py +++ /dev/null @@ -1,143 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Config template to train Object Localization Network (OLN).""" - -from official.modeling.hyperparams import params_dict -from official.vision.detection.configs import base_config - - -# pylint: disable=line-too-long -OLNMASK_CFG = params_dict.ParamsDict(base_config.BASE_CFG) -OLNMASK_CFG.override({ - 'type': 'olnmask', - 'eval': { - 'type': 'oln_xclass_box', - 'use_category': False, - 'seen_class': 'voc', - 'num_images_to_visualize': 0, - }, - 'architecture': { - 'parser': 'olnmask_parser', - 'min_level': 2, - 'max_level': 6, - 'include_rpn_class': False, - 'include_frcnn_class': False, - 'include_frcnn_box': True, - 'include_mask': False, - 'mask_target_size': 28, - 'num_classes': 2, - }, - 'olnmask_parser': { - 'output_size': [640, 640], - 'num_channels': 3, - 'rpn_match_threshold': 0.7, - 'rpn_unmatched_threshold': 0.3, - 'rpn_batch_size_per_im': 256, - 'rpn_fg_fraction': 0.5, - 'aug_rand_hflip': True, - 'aug_scale_min': 0.5, - 'aug_scale_max': 2.0, - 'skip_crowd_during_training': True, - 'max_num_instances': 100, - 'mask_crop_size': 112, - # centerness targets. - 'has_centerness': True, - 'rpn_center_match_iou_threshold': 0.3, - 'rpn_center_unmatched_iou_threshold': 0.1, - 'rpn_num_center_samples_per_im': 256, - # class manipulation. 
- 'class_agnostic': True, - 'train_class': 'voc', - }, - 'anchor': { - 'num_scales': 1, - 'aspect_ratios': [1.0], - 'anchor_size': 8, - }, - 'rpn_head': { - 'num_convs': 2, - 'num_filters': 256, - 'use_separable_conv': False, - 'use_batch_norm': False, - # RPN-Centerness learning { - 'has_centerness': True, # } - }, - 'frcnn_head': { - 'num_convs': 0, - 'num_filters': 256, - 'use_separable_conv': False, - 'num_fcs': 2, - 'fc_dims': 1024, - 'use_batch_norm': False, - 'has_scoring': True, - }, - 'mrcnn_head': { - 'num_convs': 4, - 'num_filters': 256, - 'use_separable_conv': False, - 'use_batch_norm': False, - 'has_scoring': False, - }, - 'rpn_score_loss': { - 'rpn_batch_size_per_im': 256, - }, - 'rpn_box_loss': { - 'huber_loss_delta': 1.0 / 9.0, - }, - 'frcnn_box_loss': { - 'huber_loss_delta': 1.0, - }, - 'frcnn_box_score_loss': { - 'ignore_threshold': 0.3, - }, - 'roi_proposal': { - 'rpn_pre_nms_top_k': 2000, - 'rpn_post_nms_top_k': 2000, - 'rpn_nms_threshold': 0.7, - 'rpn_score_threshold': 0.0, - 'rpn_min_size_threshold': 0.0, - 'test_rpn_pre_nms_top_k': 2000, - 'test_rpn_post_nms_top_k': 2000, - 'test_rpn_nms_threshold': 0.7, - 'test_rpn_score_threshold': 0.0, - 'test_rpn_min_size_threshold': 0.0, - 'use_batched_nms': False, - }, - 'roi_sampling': { - 'num_samples_per_image': 512, - 'fg_fraction': 0.25, - 'fg_iou_thresh': 0.5, - 'bg_iou_thresh_hi': 0.5, - 'bg_iou_thresh_lo': 0.0, - 'mix_gt_boxes': True, - }, - 'mask_sampling': { - 'num_mask_samples_per_image': 128, # Typically = `num_samples_per_image` * `fg_fraction`. 
- }, - 'postprocess': { - 'use_batched_nms': False, - 'max_total_size': 100, - 'nms_iou_threshold': 0.5, - 'score_threshold': 0.00, - 'pre_nms_num_boxes': 2000, - }, -}, is_strict=False) - - -OLNMASK_RESTRICTIONS = [ - # 'anchor.aspect_ratios == [1.0]', - # 'anchor.scales == 1', -] -# pylint: enable=line-too-long diff --git a/official/vision/detection/configs/retinanet_config.py b/official/vision/detection/configs/retinanet_config.py deleted file mode 100644 index fb55a8a3bbedd1a0f54719c2385087edf2733853..0000000000000000000000000000000000000000 --- a/official/vision/detection/configs/retinanet_config.py +++ /dev/null @@ -1,58 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Config template to train Retinanet.""" - -from official.modeling.hyperparams import params_dict -from official.vision.detection.configs import base_config - - -# pylint: disable=line-too-long -RETINANET_CFG = params_dict.ParamsDict(base_config.BASE_CFG) -RETINANET_CFG.override({ - 'type': 'retinanet', - 'architecture': { - 'parser': 'retinanet_parser', - }, - 'retinanet_parser': { - 'output_size': [640, 640], - 'num_channels': 3, - 'match_threshold': 0.5, - 'unmatched_threshold': 0.5, - 'aug_rand_hflip': True, - 'aug_scale_min': 1.0, - 'aug_scale_max': 1.0, - 'use_autoaugment': False, - 'autoaugment_policy_name': 'v0', - 'skip_crowd_during_training': True, - 'max_num_instances': 100, - }, - 'retinanet_head': { - 'num_convs': 4, - 'num_filters': 256, - 'use_separable_conv': False, - }, - 'retinanet_loss': { - 'focal_loss_alpha': 0.25, - 'focal_loss_gamma': 1.5, - 'huber_loss_delta': 0.1, - 'box_loss_weight': 50, - }, - 'enable_summary': True, -}, is_strict=False) - -RETINANET_RESTRICTIONS = [ -] - -# pylint: enable=line-too-long diff --git a/official/vision/detection/configs/shapemask_config.py b/official/vision/detection/configs/shapemask_config.py deleted file mode 100644 index aef823275c17f12b91b9c30f3cb3ca5b45b4e2cf..0000000000000000000000000000000000000000 --- a/official/vision/detection/configs/shapemask_config.py +++ /dev/null @@ -1,97 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Config to train shapemask on COCO.""" - -from official.modeling.hyperparams import params_dict -from official.vision.detection.configs import base_config - -SHAPEMASK_RESNET_FROZEN_VAR_PREFIX = r'(conv2d(|_([1-9]|10))|batch_normalization(|_([1-9]|10)))\/' - -SHAPEMASK_CFG = params_dict.ParamsDict(base_config.BASE_CFG) -SHAPEMASK_CFG.override({ - 'type': 'shapemask', - 'architecture': { - 'parser': 'shapemask_parser', - 'backbone': 'resnet', - 'multilevel_features': 'fpn', - 'outer_box_scale': 1.25, - }, - 'train': { - 'total_steps': 45000, - 'learning_rate': { - 'learning_rate_steps': [30000, 40000], - }, - 'frozen_variable_prefix': SHAPEMASK_RESNET_FROZEN_VAR_PREFIX, - 'regularization_variable_regex': None, - }, - 'eval': { - 'type': 'shapemask_box_and_mask', - 'mask_eval_class': 'all', # 'all', 'voc', or 'nonvoc'. - }, - 'shapemask_parser': { - 'output_size': [640, 640], - 'num_channels': 3, - 'match_threshold': 0.5, - 'unmatched_threshold': 0.5, - 'aug_rand_hflip': True, - 'aug_scale_min': 0.8, - 'aug_scale_max': 1.2, - 'skip_crowd_during_training': True, - 'max_num_instances': 100, - # Shapemask specific parameters - 'mask_train_class': 'all', # 'all', 'voc', or 'nonvoc'. 
- 'use_category': True, - 'outer_box_scale': 1.25, - 'num_sampled_masks': 8, - 'mask_crop_size': 32, - 'mask_min_level': 3, - 'mask_max_level': 5, - 'box_jitter_scale': 0.025, - 'upsample_factor': 4, - }, - 'retinanet_head': { - 'num_convs': 4, - 'num_filters': 256, - 'use_separable_conv': False, - 'use_batch_norm': True, - }, - 'shapemask_head': { - 'num_downsample_channels': 128, - 'mask_crop_size': 32, - 'use_category_for_mask': True, - 'num_convs': 4, - 'upsample_factor': 4, - 'shape_prior_path': '', - }, - 'retinanet_loss': { - 'focal_loss_alpha': 0.4, - 'focal_loss_gamma': 1.5, - 'huber_loss_delta': 0.15, - 'box_loss_weight': 50, - }, - 'shapemask_loss': { - 'shape_prior_loss_weight': 0.1, - 'coarse_mask_loss_weight': 1.0, - 'fine_mask_loss_weight': 1.0, - }, -}, is_strict=False) - -SHAPEMASK_RESTRICTIONS = [ - 'shapemask_head.mask_crop_size == shapemask_parser.mask_crop_size', - 'shapemask_head.upsample_factor == shapemask_parser.upsample_factor', - 'shapemask_parser.outer_box_scale == architecture.outer_box_scale', -] - -# pylint: enable=line-too-long diff --git a/official/vision/detection/dataloader/__init__.py b/official/vision/detection/dataloader/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/detection/dataloader/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/detection/dataloader/anchor.py b/official/vision/detection/dataloader/anchor.py deleted file mode 100644 index c4d76d1b606698eaad596b8af8a926e7c1116be6..0000000000000000000000000000000000000000 --- a/official/vision/detection/dataloader/anchor.py +++ /dev/null @@ -1,458 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Anchor box and labeler definition.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import collections - -import tensorflow as tf -from official.vision.beta.ops import iou_similarity -from official.vision.detection.utils import box_utils -from official.vision.utils.object_detection import argmax_matcher -from official.vision.utils.object_detection import balanced_positive_negative_sampler -from official.vision.utils.object_detection import box_list -from official.vision.utils.object_detection import faster_rcnn_box_coder -from official.vision.utils.object_detection import target_assigner - - -class Anchor(object): - """Anchor class for anchor-based object detectors.""" - - def __init__(self, min_level, max_level, num_scales, aspect_ratios, - anchor_size, image_size): - """Constructs multiscale anchors. 
- - Args: - min_level: integer number of minimum level of the output feature pyramid. - max_level: integer number of maximum level of the output feature pyramid. - num_scales: integer number representing intermediate scales added on each - level. For instances, num_scales=2 adds one additional intermediate - anchor scales [2^0, 2^0.5] on each level. - aspect_ratios: list of float numbers representing the aspect ratio anchors - added on each level. The number indicates the ratio of width to height. - For instances, aspect_ratios=[1.0, 2.0, 0.5] adds three anchors on each - scale level. - anchor_size: float number representing the scale of size of the base - anchor to the feature stride 2^level. - image_size: a list of integer numbers or Tensors representing [height, - width] of the input image size.The image_size should be divisible by the - largest feature stride 2^max_level. - """ - self.min_level = min_level - self.max_level = max_level - self.num_scales = num_scales - self.aspect_ratios = aspect_ratios - self.anchor_size = anchor_size - self.image_size = image_size - self.boxes = self._generate_boxes() - - def _generate_boxes(self): - """Generates multiscale anchor boxes. - - Returns: - a Tensor of shape [N, 4], represneting anchor boxes of all levels - concatenated together. 
- """ - boxes_all = [] - for level in range(self.min_level, self.max_level + 1): - boxes_l = [] - for scale in range(self.num_scales): - for aspect_ratio in self.aspect_ratios: - stride = 2**level - intermediate_scale = 2**(scale / float(self.num_scales)) - base_anchor_size = self.anchor_size * stride * intermediate_scale - aspect_x = aspect_ratio**0.5 - aspect_y = aspect_ratio**-0.5 - half_anchor_size_x = base_anchor_size * aspect_x / 2.0 - half_anchor_size_y = base_anchor_size * aspect_y / 2.0 - x = tf.range(stride / 2, self.image_size[1], stride) - y = tf.range(stride / 2, self.image_size[0], stride) - xv, yv = tf.meshgrid(x, y) - xv = tf.cast(tf.reshape(xv, [-1]), dtype=tf.float32) - yv = tf.cast(tf.reshape(yv, [-1]), dtype=tf.float32) - # Tensor shape Nx4. - boxes = tf.stack([ - yv - half_anchor_size_y, xv - half_anchor_size_x, - yv + half_anchor_size_y, xv + half_anchor_size_x - ], - axis=1) - boxes_l.append(boxes) - # Concat anchors on the same level to tensor shape NxAx4. - boxes_l = tf.stack(boxes_l, axis=1) - boxes_l = tf.reshape(boxes_l, [-1, 4]) - boxes_all.append(boxes_l) - return tf.concat(boxes_all, axis=0) - - def unpack_labels(self, labels): - """Unpacks an array of labels into multiscales labels.""" - unpacked_labels = collections.OrderedDict() - count = 0 - for level in range(self.min_level, self.max_level + 1): - feat_size_y = tf.cast(self.image_size[0] / 2**level, tf.int32) - feat_size_x = tf.cast(self.image_size[1] / 2**level, tf.int32) - steps = feat_size_y * feat_size_x * self.anchors_per_location - unpacked_labels[level] = tf.reshape(labels[count:count + steps], - [feat_size_y, feat_size_x, -1]) - count += steps - return unpacked_labels - - @property - def anchors_per_location(self): - return self.num_scales * len(self.aspect_ratios) - - @property - def multilevel_boxes(self): - return self.unpack_labels(self.boxes) - - -class AnchorLabeler(object): - """Labeler for dense object detector.""" - - def __init__(self, anchor, 
match_threshold=0.5, unmatched_threshold=0.5): - """Constructs anchor labeler to assign labels to anchors. - - Args: - anchor: an instance of class Anchors. - match_threshold: a float number between 0 and 1 representing the - lower-bound threshold to assign positive labels for anchors. An anchor - with a score over the threshold is labeled positive. - unmatched_threshold: a float number between 0 and 1 representing the - upper-bound threshold to assign negative labels for anchors. An anchor - with a score below the threshold is labeled negative. - """ - similarity_calc = iou_similarity.IouSimilarity() - matcher = argmax_matcher.ArgMaxMatcher( - match_threshold, - unmatched_threshold=unmatched_threshold, - negatives_lower_than_unmatched=True, - force_match_for_each_row=True) - box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder() - - self._target_assigner = target_assigner.TargetAssigner( - similarity_calc, matcher, box_coder) - self._anchor = anchor - self._match_threshold = match_threshold - self._unmatched_threshold = unmatched_threshold - - def label_anchors(self, gt_boxes, gt_labels): - """Labels anchors with ground truth inputs. - - Args: - gt_boxes: A float tensor with shape [N, 4] representing groundtruth boxes. - For each row, it stores [y0, x0, y1, x1] for four corners of a box. - gt_labels: A integer tensor with shape [N, 1] representing groundtruth - classes. - - Returns: - cls_targets_dict: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, num_anchors_per_location]. The height_l and - width_l represent the dimension of class logits at l-th level. - box_targets_dict: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, num_anchors_per_location * 4]. The height_l - and width_l represent the dimension of bounding box regression output at - l-th level. 
- num_positives: scalar tensor storing number of positives in an image. - """ - gt_box_list = box_list.BoxList(gt_boxes) - anchor_box_list = box_list.BoxList(self._anchor.boxes) - - # The cls_weights, box_weights are not used. - cls_targets, _, box_targets, _, matches = self._target_assigner.assign( - anchor_box_list, gt_box_list, gt_labels) - - # Labels definition in matches.match_results: - # (1) match_results[i]>=0, meaning that column i is matched with row - # match_results[i]. - # (2) match_results[i]=-1, meaning that column i is not matched. - # (3) match_results[i]=-2, meaning that column i is ignored. - match_results = tf.expand_dims(matches.match_results, axis=1) - cls_targets = tf.cast(cls_targets, tf.int32) - cls_targets = tf.where( - tf.equal(match_results, -1), -tf.ones_like(cls_targets), cls_targets) - cls_targets = tf.where( - tf.equal(match_results, -2), -2 * tf.ones_like(cls_targets), - cls_targets) - - # Unpacks labels into multi-level representations. - cls_targets_dict = self._anchor.unpack_labels(cls_targets) - box_targets_dict = self._anchor.unpack_labels(box_targets) - num_positives = tf.reduce_sum( - input_tensor=tf.cast(tf.greater(matches.match_results, -1), tf.float32)) - - return cls_targets_dict, box_targets_dict, num_positives - - -class RpnAnchorLabeler(AnchorLabeler): - """Labeler for Region Proposal Network.""" - - def __init__(self, - anchor, - match_threshold=0.7, - unmatched_threshold=0.3, - rpn_batch_size_per_im=256, - rpn_fg_fraction=0.5): - AnchorLabeler.__init__( - self, anchor, match_threshold=0.7, unmatched_threshold=0.3) - self._rpn_batch_size_per_im = rpn_batch_size_per_im - self._rpn_fg_fraction = rpn_fg_fraction - - def _get_rpn_samples(self, match_results): - """Computes anchor labels. - - This function performs subsampling for foreground (fg) and background (bg) - anchors. - Args: - match_results: A integer tensor with shape [N] representing the matching - results of anchors. 
(1) match_results[i]>=0, meaning that column i is - matched with row match_results[i]. (2) match_results[i]=-1, meaning that - column i is not matched. (3) match_results[i]=-2, meaning that column i - is ignored. - - Returns: - score_targets: a integer tensor with the a shape of [N]. - (1) score_targets[i]=1, the anchor is a positive sample. - (2) score_targets[i]=0, negative. (3) score_targets[i]=-1, the anchor is - don't care (ignore). - """ - sampler = ( - balanced_positive_negative_sampler.BalancedPositiveNegativeSampler( - positive_fraction=self._rpn_fg_fraction, is_static=False)) - # indicator includes both positive and negative labels. - # labels includes only positives labels. - # positives = indicator & labels. - # negatives = indicator & !labels. - # ignore = !indicator. - indicator = tf.greater(match_results, -2) - labels = tf.greater(match_results, -1) - - samples = sampler.subsample(indicator, self._rpn_batch_size_per_im, labels) - positive_labels = tf.where( - tf.logical_and(samples, labels), - tf.constant(2, dtype=tf.int32, shape=match_results.shape), - tf.constant(0, dtype=tf.int32, shape=match_results.shape)) - negative_labels = tf.where( - tf.logical_and(samples, tf.logical_not(labels)), - tf.constant(1, dtype=tf.int32, shape=match_results.shape), - tf.constant(0, dtype=tf.int32, shape=match_results.shape)) - ignore_labels = tf.fill(match_results.shape, -1) - - return (ignore_labels + positive_labels + negative_labels, positive_labels, - negative_labels) - - def label_anchors(self, gt_boxes, gt_labels): - """Labels anchors with ground truth inputs. - - Args: - gt_boxes: A float tensor with shape [N, 4] representing groundtruth boxes. - For each row, it stores [y0, x0, y1, x1] for four corners of a box. - gt_labels: A integer tensor with shape [N, 1] representing groundtruth - classes. - - Returns: - score_targets_dict: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. 
The values are tensor with - shape [height_l, width_l, num_anchors]. The height_l and width_l - represent the dimension of class logits at l-th level. - box_targets_dict: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, num_anchors * 4]. The height_l and - width_l represent the dimension of bounding box regression output at - l-th level. - """ - gt_box_list = box_list.BoxList(gt_boxes) - anchor_box_list = box_list.BoxList(self._anchor.boxes) - - # cls_targets, cls_weights, box_weights are not used. - _, _, box_targets, _, matches = self._target_assigner.assign( - anchor_box_list, gt_box_list, gt_labels) - - # score_targets contains the subsampled positive and negative anchors. - score_targets, _, _ = self._get_rpn_samples(matches.match_results) - - # Unpacks labels. - score_targets_dict = self._anchor.unpack_labels(score_targets) - box_targets_dict = self._anchor.unpack_labels(box_targets) - - return score_targets_dict, box_targets_dict - - -class OlnAnchorLabeler(RpnAnchorLabeler): - """Labeler for Region Proposal Network.""" - - def __init__(self, - anchor, - match_threshold=0.7, - unmatched_threshold=0.3, - rpn_batch_size_per_im=256, - rpn_fg_fraction=0.5, - has_centerness=False, - center_match_iou_threshold=0.3, - center_unmatched_iou_threshold=0.1, - num_center_samples_per_im=256): - """Constructs rpn anchor labeler to assign labels and centerness to anchors. - - Args: - anchor: an instance of class Anchors. - match_threshold: a float number between 0 and 1 representing the - lower-bound threshold to assign positive labels for anchors. An anchor - with a score over the threshold is labeled positive. - unmatched_threshold: a float number between 0 and 1 representing the - upper-bound threshold to assign negative labels for anchors. An anchor - with a score below the threshold is labeled negative. - rpn_batch_size_per_im: number of anchors that are sampled per image. 
- rpn_fg_fraction: the fraction of sampled anchors that are foreground. - has_centerness: whether to include centerness target creation. An anchor - is paired with one centerness score. - center_match_iou_threshold: a float number between 0 and 1 representing - the lower-bound threshold to sample foreground anchors for centerness - regression. An anchor with a score over the threshold is sampled as - a foreground sample for centerness regression. We sample mostly from the - foreground region (255 out of 256 samples). That is, we sample 255 vs 1 - (foreground vs background) anchor points to learn centerness regression. - center_unmatched_iou_threshold: a float number between 0 and 1 - representing the upper-bound threshold to sample background anchors for - centerness regression. An anchor with a score below the threshold is - sampled as a background sample for centerness regression. We sample very - sparsely from the background region (1 out of 256 samples). That is, we - sample 255 vs 1 (foreground vs background) anchor points to learn - centerness regression. - num_center_samples_per_im: number of anchor points per image that are - sampled as centerness targets.
- """ - super(OlnAnchorLabeler, self).__init__( - anchor, match_threshold=match_threshold, - unmatched_threshold=unmatched_threshold, - rpn_batch_size_per_im=rpn_batch_size_per_im, - rpn_fg_fraction=rpn_fg_fraction) - similarity_calc = iou_similarity.IouSimilarity() - matcher = argmax_matcher.ArgMaxMatcher( - match_threshold, - unmatched_threshold=unmatched_threshold, - negatives_lower_than_unmatched=True, - force_match_for_each_row=True) - box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder() - if has_centerness: - center_matcher = argmax_matcher.ArgMaxMatcher( - center_match_iou_threshold, - unmatched_threshold=center_match_iou_threshold, - negatives_lower_than_unmatched=True, - force_match_for_each_row=True,) - else: - center_matcher = None - - self._target_assigner = target_assigner.OlnTargetAssigner( - similarity_calc, matcher, box_coder, - center_matcher=center_matcher) - self._num_center_samples_per_im = num_center_samples_per_im - self._center_unmatched_iou_threshold = center_unmatched_iou_threshold - self._rpn_batch_size_per_im = rpn_batch_size_per_im - self._rpn_fg_fraction = rpn_fg_fraction - - def label_anchors_lrtb(self, gt_boxes, gt_labels): - """Labels anchors with ground truth inputs. - - Args: - gt_boxes: A float tensor with shape [N, 4] representing groundtruth boxes. - For each row, it stores [y0, x0, y1, x1] for four corners of a box. - gt_labels: A integer tensor with shape [N, 1] representing groundtruth - classes. - - Returns: - score_targets_dict: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, num_anchors]. The height_l and width_l - represent the dimension of class logits at l-th level. - box_targets_dict: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, num_anchors * 4]. The height_l and - width_l represent the dimension of bounding box regression output at - l-th level. 
- lrtb_targets_dict: Same structure as box_targets_dict, except the regression - targets are converted from xyhw to lrtb format. Ordered dictionary with - keys [min_level, min_level+1, ..., max_level]. The values are tensor - with shape [height_l, width_l, num_anchors * 4]. The height_l and - width_l represent the dimension of bounding box regression output at - l-th level. - center_targets_dict: Same structure as score_targets_dict, except the - scores are centerness values ranging from 0 to 1. Ordered dictionary - with keys [min_level, min_level+1, ..., max_level]. The values are - tensor with shape [height_l, width_l, num_anchors]. The height_l and - width_l represent the dimension of class logits at l-th level. - """ - gt_box_list = box_list.BoxList(gt_boxes) - anchor_box_list = box_list.BoxList(self._anchor.boxes) - - # cls_targets, cls_weights, box_weights are not used. - (_, _, box_targets, _, matches, - matched_gt_box_list, matched_anchors_mask, - center_matched_gt_box_list, center_matched_anchors_mask, - matched_ious) = self._target_assigner.assign( - anchor_box_list, gt_box_list, gt_labels) - # Box lrtb_targets. - lrtb_targets, _ = box_utils.encode_boxes_lrtb( - matched_gt_box_list.data['boxes'], - anchor_box_list.data['boxes'], - weights=[1.0, 1.0, 1.0, 1.0]) - lrtb_sanity = tf.logical_and( - tf.greater(tf.reduce_min(lrtb_targets, -1), 0.), - matched_anchors_mask) - # To broadcast lrtb_sanity to the same shape as lrtb_targets. - lrtb_sanity = tf.tile(tf.expand_dims(lrtb_sanity, 1), - [1, tf.shape(lrtb_targets)[1]]) - lrtb_targets = tf.where(lrtb_sanity, - lrtb_targets, - tf.zeros_like(lrtb_targets)) - # RPN anchor-gtbox iou values. - iou_targets = tf.where(tf.greater(matched_ious, 0.0), - matched_ious, - tf.zeros_like(matched_ious)) - # Centerness_targets. - _, center_targets = box_utils.encode_boxes_lrtb( - center_matched_gt_box_list.data['boxes'], - anchor_box_list.data['boxes'], - weights=[1.0, 1.0, 1.0, 1.0]) - # Positive-negative centerness sampler.
- num_center_samples_per_im = self._num_center_samples_per_im - center_pos_neg_sampler = ( - balanced_positive_negative_sampler.BalancedPositiveNegativeSampler( - positive_fraction=(1.- 1./num_center_samples_per_im), - is_static=False)) - center_pos_neg_indicator = tf.logical_or( - center_matched_anchors_mask, - tf.less(iou_targets, self._center_unmatched_iou_threshold)) - center_pos_labels = center_matched_anchors_mask - center_samples = center_pos_neg_sampler.subsample( - center_pos_neg_indicator, num_center_samples_per_im, center_pos_labels) - is_valid = center_samples - center_targets = tf.where(is_valid, - center_targets, - (-1) * tf.ones_like(center_targets)) - - # score_targets contains the subsampled positive and negative anchors. - score_targets, _, _ = self._get_rpn_samples(matches.match_results) - - # Unpacks labels. - score_targets_dict = self._anchor.unpack_labels(score_targets) - box_targets_dict = self._anchor.unpack_labels(box_targets) - lrtb_targets_dict = self._anchor.unpack_labels(lrtb_targets) - center_targets_dict = self._anchor.unpack_labels(center_targets) - - return (score_targets_dict, box_targets_dict, - lrtb_targets_dict, center_targets_dict) diff --git a/official/vision/detection/dataloader/factory.py b/official/vision/detection/dataloader/factory.py deleted file mode 100644 index 3d1e8574a497747463dadd792efc2ccadba7e3ed..0000000000000000000000000000000000000000 --- a/official/vision/detection/dataloader/factory.py +++ /dev/null @@ -1,136 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Model architecture factory.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from official.vision.detection.dataloader import maskrcnn_parser -from official.vision.detection.dataloader import olnmask_parser -from official.vision.detection.dataloader import retinanet_parser -from official.vision.detection.dataloader import shapemask_parser - - -def parser_generator(params, mode): - """Generator function for various dataset parser.""" - if params.architecture.parser == 'retinanet_parser': - anchor_params = params.anchor - parser_params = params.retinanet_parser - parser_fn = retinanet_parser.Parser( - output_size=parser_params.output_size, - min_level=params.architecture.min_level, - max_level=params.architecture.max_level, - num_scales=anchor_params.num_scales, - aspect_ratios=anchor_params.aspect_ratios, - anchor_size=anchor_params.anchor_size, - match_threshold=parser_params.match_threshold, - unmatched_threshold=parser_params.unmatched_threshold, - aug_rand_hflip=parser_params.aug_rand_hflip, - aug_scale_min=parser_params.aug_scale_min, - aug_scale_max=parser_params.aug_scale_max, - use_autoaugment=parser_params.use_autoaugment, - autoaugment_policy_name=parser_params.autoaugment_policy_name, - skip_crowd_during_training=parser_params.skip_crowd_during_training, - max_num_instances=parser_params.max_num_instances, - use_bfloat16=params.architecture.use_bfloat16, - mode=mode) - elif params.architecture.parser == 'maskrcnn_parser': - anchor_params = params.anchor - parser_params = 
params.maskrcnn_parser - parser_fn = maskrcnn_parser.Parser( - output_size=parser_params.output_size, - min_level=params.architecture.min_level, - max_level=params.architecture.max_level, - num_scales=anchor_params.num_scales, - aspect_ratios=anchor_params.aspect_ratios, - anchor_size=anchor_params.anchor_size, - rpn_match_threshold=parser_params.rpn_match_threshold, - rpn_unmatched_threshold=parser_params.rpn_unmatched_threshold, - rpn_batch_size_per_im=parser_params.rpn_batch_size_per_im, - rpn_fg_fraction=parser_params.rpn_fg_fraction, - aug_rand_hflip=parser_params.aug_rand_hflip, - aug_scale_min=parser_params.aug_scale_min, - aug_scale_max=parser_params.aug_scale_max, - skip_crowd_during_training=parser_params.skip_crowd_during_training, - max_num_instances=parser_params.max_num_instances, - include_mask=params.architecture.include_mask, - mask_crop_size=parser_params.mask_crop_size, - use_bfloat16=params.architecture.use_bfloat16, - mode=mode) - elif params.architecture.parser == 'olnmask_parser': - anchor_params = params.anchor - parser_params = params.olnmask_parser - parser_fn = olnmask_parser.Parser( - output_size=parser_params.output_size, - min_level=params.architecture.min_level, - max_level=params.architecture.max_level, - num_scales=anchor_params.num_scales, - aspect_ratios=anchor_params.aspect_ratios, - anchor_size=anchor_params.anchor_size, - rpn_match_threshold=parser_params.rpn_match_threshold, - rpn_unmatched_threshold=parser_params.rpn_unmatched_threshold, - rpn_batch_size_per_im=parser_params.rpn_batch_size_per_im, - rpn_fg_fraction=parser_params.rpn_fg_fraction, - aug_rand_hflip=parser_params.aug_rand_hflip, - aug_scale_min=parser_params.aug_scale_min, - aug_scale_max=parser_params.aug_scale_max, - skip_crowd_during_training=parser_params.skip_crowd_during_training, - max_num_instances=parser_params.max_num_instances, - include_mask=params.architecture.include_mask, - mask_crop_size=parser_params.mask_crop_size, - 
use_bfloat16=params.architecture.use_bfloat16, - mode=mode, - has_centerness=parser_params.has_centerness, - rpn_center_match_iou_threshold=( - parser_params.rpn_center_match_iou_threshold), - rpn_center_unmatched_iou_threshold=( - parser_params.rpn_center_unmatched_iou_threshold), - rpn_num_center_samples_per_im=( - parser_params.rpn_num_center_samples_per_im), - class_agnostic=parser_params.class_agnostic, - train_class=parser_params.train_class,) - elif params.architecture.parser == 'shapemask_parser': - anchor_params = params.anchor - parser_params = params.shapemask_parser - parser_fn = shapemask_parser.Parser( - output_size=parser_params.output_size, - min_level=params.architecture.min_level, - max_level=params.architecture.max_level, - num_scales=anchor_params.num_scales, - aspect_ratios=anchor_params.aspect_ratios, - anchor_size=anchor_params.anchor_size, - use_category=parser_params.use_category, - outer_box_scale=parser_params.outer_box_scale, - box_jitter_scale=parser_params.box_jitter_scale, - num_sampled_masks=parser_params.num_sampled_masks, - mask_crop_size=parser_params.mask_crop_size, - mask_min_level=parser_params.mask_min_level, - mask_max_level=parser_params.mask_max_level, - upsample_factor=parser_params.upsample_factor, - match_threshold=parser_params.match_threshold, - unmatched_threshold=parser_params.unmatched_threshold, - aug_rand_hflip=parser_params.aug_rand_hflip, - aug_scale_min=parser_params.aug_scale_min, - aug_scale_max=parser_params.aug_scale_max, - skip_crowd_during_training=parser_params.skip_crowd_during_training, - max_num_instances=parser_params.max_num_instances, - use_bfloat16=params.architecture.use_bfloat16, - mask_train_class=parser_params.mask_train_class, - mode=mode) - else: - raise ValueError('Parser %s is not supported.' 
% params.architecture.parser) - - return parser_fn diff --git a/official/vision/detection/dataloader/input_reader.py b/official/vision/detection/dataloader/input_reader.py deleted file mode 100644 index 99e834109365963442e987bbf58f4c24bc5c9774..0000000000000000000000000000000000000000 --- a/official/vision/detection/dataloader/input_reader.py +++ /dev/null @@ -1,107 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Data loader and input processing.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - -from typing import Text, Optional -from official.modeling.hyperparams import params_dict -from official.vision.detection.dataloader import factory -from official.vision.detection.dataloader import mode_keys as ModeKeys - - -class InputFn(object): - """Input function that creates dataset from files.""" - - def __init__(self, - file_pattern: Text, - params: params_dict.ParamsDict, - mode: Text, - batch_size: int, - num_examples: Optional[int] = -1): - """Initialize. - - Args: - file_pattern: the file pattern for the data example (TFRecords). - params: the parameter object for constructing example parser and model. - mode: ModeKeys.TRAIN or ModeKeys.EVAL. - batch_size: the data batch size. - num_examples: If positive, only takes this number of examples and raises - tf.errors.OutOfRangeError after that.
If non-positive, it will be - ignored. - """ - assert file_pattern is not None - assert mode is not None - assert batch_size is not None - self._file_pattern = file_pattern - self._mode = mode - self._is_training = (mode == ModeKeys.TRAIN) - self._batch_size = batch_size - self._num_examples = num_examples - self._parser_fn = factory.parser_generator(params, mode) - self._dataset_fn = tf.data.TFRecordDataset - - self._input_sharding = (not self._is_training) - try: - if self._is_training: - self._input_sharding = params.train.input_sharding - else: - self._input_sharding = params.eval.input_sharding - except AttributeError: - pass - - def __call__(self, ctx=None, batch_size: int = None): - """Provides tf.data.Dataset object. - - Args: - ctx: context object. - batch_size: expected batch size input data. - - Returns: - tf.data.Dataset object. - """ - if not batch_size: - batch_size = self._batch_size - assert batch_size is not None - dataset = tf.data.Dataset.list_files( - self._file_pattern, shuffle=self._is_training) - - if self._input_sharding and ctx and ctx.num_input_pipelines > 1: - dataset = dataset.shard(ctx.num_input_pipelines, ctx.input_pipeline_id) - dataset = dataset.cache() - - if self._is_training: - dataset = dataset.repeat() - - dataset = dataset.interleave( - map_func=self._dataset_fn, - cycle_length=32, - num_parallel_calls=tf.data.experimental.AUTOTUNE) - - if self._is_training: - dataset = dataset.shuffle(1000) - if self._num_examples > 0: - dataset = dataset.take(self._num_examples) - - # Parses the fetched records to input tensors for model function. 
- dataset = dataset.map( - self._parser_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE) - dataset = dataset.batch(batch_size, drop_remainder=True) - dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE) - return dataset diff --git a/official/vision/detection/dataloader/maskrcnn_parser.py b/official/vision/detection/dataloader/maskrcnn_parser.py deleted file mode 100644 index 7df1d547cbab068f68fc750927c945eefddec48e..0000000000000000000000000000000000000000 --- a/official/vision/detection/dataloader/maskrcnn_parser.py +++ /dev/null @@ -1,385 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
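The Mask R-CNN parser deleted next assigns RPN targets through `RpnAnchorLabeler` from `anchor.py` above. The sketch below reproduces its two steps in NumPy so the encoding can be checked without TensorFlow: balanced foreground/background subsampling of `match_results`, then packing into score targets with the `-1 + 2 * positive + 1 * negative` sum. The function name `rpn_score_targets` is hypothetical, and the random permutation is an assumed stand-in for `BalancedPositiveNegativeSampler` in non-static mode (at most `fg_fraction * batch_size` positives, negatives filling the rest); the real sampler's selection order may differ.

```python
import numpy as np


def rpn_score_targets(match_results, batch_size=256, fg_fraction=0.5, seed=0):
    """Hypothetical NumPy sketch of RpnAnchorLabeler._get_rpn_samples.

    match_results[i] >= 0: anchor i matched a groundtruth box (positive).
    match_results[i] == -1: anchor i is unmatched (negative).
    match_results[i] == -2: anchor i is ignored.
    Returns int32 targets: 1 (positive), 0 (negative), -1 (ignore).
    """
    match_results = np.asarray(match_results)
    rng = np.random.default_rng(seed)
    indicator = match_results > -2  # sampling candidates (pos + neg)
    labels = match_results > -1     # positives only

    pos_idx = np.flatnonzero(indicator & labels)
    neg_idx = np.flatnonzero(indicator & ~labels)

    # Assumed non-static balancing: at most fg_fraction * batch_size
    # positives, and negatives fill the remainder of the batch.
    num_pos = min(int(fg_fraction * batch_size), pos_idx.size)
    num_neg = min(batch_size - num_pos, neg_idx.size)
    sampled = np.zeros(match_results.shape, dtype=bool)
    sampled[rng.permutation(pos_idx)[:num_pos]] = True
    sampled[rng.permutation(neg_idx)[:num_neg]] = True

    positive = np.where(sampled & labels, 2, 0)
    negative = np.where(sampled & ~labels, 1, 0)
    ignore = np.full(match_results.shape, -1)
    # Same packing as the TF code: -1 + 2 = 1 for sampled positives,
    # -1 + 1 = 0 for sampled negatives, -1 for anything not sampled.
    return (ignore + positive + negative).astype(np.int32)


match_results = np.array([0, 1, -1, -1, -1, -2, 2, -1])
print(rpn_score_targets(match_results).tolist())  # [1, 1, 0, 0, 0, -1, 1, 0]
```

With the default batch of 256, every candidate fits, so the output is fully determined; with a smaller `batch_size`, anchors that lose the draw fall back to -1, which is why the TF code starts from `tf.fill(..., -1)` rather than from zeros.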
- -"""Data parser and processing for Mask R-CNN.""" - -import tensorflow as tf - -from official.vision.detection.dataloader import anchor -from official.vision.detection.dataloader import mode_keys as ModeKeys -from official.vision.detection.dataloader import tf_example_decoder -from official.vision.detection.utils import box_utils -from official.vision.detection.utils import dataloader_utils -from official.vision.detection.utils import input_utils - - -class Parser(object): - """Parser to parse an image and its annotations into a dictionary of tensors.""" - - def __init__(self, - output_size, - min_level, - max_level, - num_scales, - aspect_ratios, - anchor_size, - rpn_match_threshold=0.7, - rpn_unmatched_threshold=0.3, - rpn_batch_size_per_im=256, - rpn_fg_fraction=0.5, - aug_rand_hflip=False, - aug_scale_min=1.0, - aug_scale_max=1.0, - skip_crowd_during_training=True, - max_num_instances=100, - include_mask=False, - mask_crop_size=112, - use_bfloat16=True, - mode=None): - """Initializes parameters for parsing annotations in the dataset. - - Args: - output_size: `Tensor` or `list` for [height, width] of output image. The - output_size should be divisible by the largest feature stride 2^max_level. - min_level: `int` number of minimum level of the output feature pyramid. - max_level: `int` number of maximum level of the output feature pyramid. - num_scales: `int` number representing intermediate scales added - on each level. For instance, num_scales=2 adds one additional - intermediate anchor scale, giving scales [2^0, 2^0.5] on each level. - aspect_ratios: `list` of float numbers representing the aspect ratio - anchors added on each level. The number indicates the ratio of width to - height. For instance, aspect_ratios=[1.0, 2.0, 0.5] adds three anchors - on each scale level. - anchor_size: `float` number representing the size of the base - anchor relative to the feature stride 2^level.
- rpn_match_threshold: `float`, the minimum IoU for an anchor to be labeled positive by the RPN anchor labeler. - rpn_unmatched_threshold: `float`, the IoU below which an anchor is labeled negative; anchors between the two thresholds are ignored. - rpn_batch_size_per_im: `int`, the number of anchors sampled per image for the RPN. - rpn_fg_fraction: `float`, the fraction of sampled anchors that are foreground. - aug_rand_hflip: `bool`, if True, augment training with random - horizontal flip. - aug_scale_min: `float`, the minimum scale applied to `output_size` for - data augmentation during training. - aug_scale_max: `float`, the maximum scale applied to `output_size` for - data augmentation during training. - skip_crowd_during_training: `bool`, if True, skip annotations labeled with - `is_crowd` equal to 1. - max_num_instances: `int`, the maximum number of instances in an - image. The groundtruth data will be padded to `max_num_instances`. - include_mask: a bool to indicate whether to parse mask groundtruth. - mask_crop_size: the size to which groundtruth masks are cropped. - use_bfloat16: `bool`, if True, cast output image to tf.bfloat16. - mode: a ModeKeys. Specifies if this is training, evaluation, prediction - or prediction with groundtruths in the outputs. - """ - self._mode = mode - self._max_num_instances = max_num_instances - self._skip_crowd_during_training = skip_crowd_during_training - self._is_training = (mode == ModeKeys.TRAIN) - - self._example_decoder = tf_example_decoder.TfExampleDecoder( - include_mask=include_mask) - - # Anchor. - self._output_size = output_size - self._min_level = min_level - self._max_level = max_level - self._num_scales = num_scales - self._aspect_ratios = aspect_ratios - self._anchor_size = anchor_size - - # Target assigning. - self._rpn_match_threshold = rpn_match_threshold - self._rpn_unmatched_threshold = rpn_unmatched_threshold - self._rpn_batch_size_per_im = rpn_batch_size_per_im - self._rpn_fg_fraction = rpn_fg_fraction - - # Data augmentation. - self._aug_rand_hflip = aug_rand_hflip - self._aug_scale_min = aug_scale_min - self._aug_scale_max = aug_scale_max - - # Mask. - self._include_mask = include_mask - self._mask_crop_size = mask_crop_size - - # Device.
- self._use_bfloat16 = use_bfloat16 - - # Data is parsed depending on the model ModeKey. - if mode == ModeKeys.TRAIN: - self._parse_fn = self._parse_train_data - elif mode == ModeKeys.EVAL: - self._parse_fn = self._parse_eval_data - elif mode == ModeKeys.PREDICT or mode == ModeKeys.PREDICT_WITH_GT: - self._parse_fn = self._parse_predict_data - else: - raise ValueError('mode is not defined.') - - def __call__(self, value): - """Parses data to an image and associated training labels. - - Args: - value: a string tensor holding a serialized tf.Example proto. - - Returns: - image, labels: if mode == ModeKeys.TRAIN. See _parse_train_data. - {'images': image, 'labels': labels}: if mode == ModeKeys.PREDICT - or ModeKeys.PREDICT_WITH_GT. - """ - with tf.name_scope('parser'): - data = self._example_decoder.decode(value) - return self._parse_fn(data) - - def _parse_train_data(self, data): - """Parses data for training. - - Args: - data: the decoded tensor dictionary from TfExampleDecoder. - - Returns: - image: image tensor that is preprocessed to have normalized value and - dimension [output_size[0], output_size[1], 3] - labels: a dictionary of tensors used for training. The following describes - {key: value} pairs in the dictionary. - image_info: a 2D `Tensor` that encodes the information of the image and - the applied preprocessing. It is in the format of - [[original_height, original_width], [scaled_height, scaled_width], [y_scale, x_scale], [y_offset, x_offset]]. - anchor_boxes: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, 4] representing anchor boxes at each level. - rpn_score_targets: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, anchors_per_location]. The height_l and - width_l represent the dimension of class logits at l-th level. - rpn_box_targets: ordered dictionary with keys - [min_level, min_level+1, ..., max_level].
The values are tensor with - shape [height_l, width_l, anchors_per_location * 4]. The height_l and - width_l represent the dimension of bounding box regression output at - l-th level. - gt_boxes: Groundtruth bounding box annotations. The box is represented - in [y1, x1, y2, x2] format. The coordinates are w.r.t. the scaled - image that is fed to the network. The tensor is padded with -1 to - the fixed dimension [self._max_num_instances, 4]. - gt_classes: Groundtruth class annotations. The tensor is padded - with -1 to the fixed dimension [self._max_num_instances]. - gt_masks: groundtruth masks cropped by the bounding box and - resized to a fixed size determined by mask_crop_size. - """ - classes = data['groundtruth_classes'] - boxes = data['groundtruth_boxes'] - if self._include_mask: - masks = data['groundtruth_instance_masks'] - - is_crowds = data['groundtruth_is_crowd'] - # Skips annotations with `is_crowd` = True. - if self._skip_crowd_during_training and self._is_training: - num_groundtruths = tf.shape(classes)[0] - with tf.control_dependencies([num_groundtruths, is_crowds]): - indices = tf.cond( - tf.greater(tf.size(is_crowds), 0), - lambda: tf.where(tf.logical_not(is_crowds))[:, 0], - lambda: tf.cast(tf.range(num_groundtruths), tf.int64)) - classes = tf.gather(classes, indices) - boxes = tf.gather(boxes, indices) - if self._include_mask: - masks = tf.gather(masks, indices) - - # Gets original image and its size. - image = data['image'] - image_shape = tf.shape(image)[0:2] - - # Normalizes image with mean and std pixel values. - image = input_utils.normalize_image(image) - - # Flips image randomly during training. - if self._aug_rand_hflip: - if self._include_mask: - image, boxes, masks = input_utils.random_horizontal_flip( - image, boxes, masks) - else: - image, boxes = input_utils.random_horizontal_flip( - image, boxes) - - # Converts boxes from normalized coordinates to pixel coordinates. - # Now the coordinates of boxes are w.r.t. the original image.
- boxes = box_utils.denormalize_boxes(boxes, image_shape) - - # Resizes and crops image. - image, image_info = input_utils.resize_and_crop_image( - image, - self._output_size, - padded_size=input_utils.compute_padded_size( - self._output_size, 2 ** self._max_level), - aug_scale_min=self._aug_scale_min, - aug_scale_max=self._aug_scale_max) - image_height, image_width, _ = image.get_shape().as_list() - - # Resizes and crops boxes. - # Now the coordinates of boxes are w.r.t the scaled image. - image_scale = image_info[2, :] - offset = image_info[3, :] - boxes = input_utils.resize_and_crop_boxes( - boxes, image_scale, image_info[1, :], offset) - - # Filters out ground truth boxes that are all zeros. - indices = box_utils.get_non_empty_box_indices(boxes) - boxes = tf.gather(boxes, indices) - classes = tf.gather(classes, indices) - if self._include_mask: - masks = tf.gather(masks, indices) - # Transfer boxes to the original image space and do normalization. - cropped_boxes = boxes + tf.tile(tf.expand_dims(offset, axis=0), [1, 2]) - cropped_boxes /= tf.tile(tf.expand_dims(image_scale, axis=0), [1, 2]) - cropped_boxes = box_utils.normalize_boxes(cropped_boxes, image_shape) - num_masks = tf.shape(masks)[0] - masks = tf.image.crop_and_resize( - tf.expand_dims(masks, axis=-1), - cropped_boxes, - box_indices=tf.range(num_masks, dtype=tf.int32), - crop_size=[self._mask_crop_size, self._mask_crop_size], - method='bilinear') - masks = tf.squeeze(masks, axis=-1) - - # Assigns anchor targets. - # Note that after the target assignment, box targets are absolute pixel - # offsets w.r.t. the scaled image. 
- input_anchor = anchor.Anchor( - self._min_level, - self._max_level, - self._num_scales, - self._aspect_ratios, - self._anchor_size, - (image_height, image_width)) - anchor_labeler = anchor.RpnAnchorLabeler( - input_anchor, - self._rpn_match_threshold, - self._rpn_unmatched_threshold, - self._rpn_batch_size_per_im, - self._rpn_fg_fraction) - rpn_score_targets, rpn_box_targets = anchor_labeler.label_anchors( - boxes, tf.cast(tf.expand_dims(classes, axis=-1), dtype=tf.float32)) - - # If bfloat16 is used, casts input image to tf.bfloat16. - if self._use_bfloat16: - image = tf.cast(image, dtype=tf.bfloat16) - - inputs = { - 'image': image, - 'image_info': image_info, - } - # Packs labels for model_fn outputs. - labels = { - 'anchor_boxes': input_anchor.multilevel_boxes, - 'image_info': image_info, - 'rpn_score_targets': rpn_score_targets, - 'rpn_box_targets': rpn_box_targets, - } - inputs['gt_boxes'] = input_utils.pad_to_fixed_size(boxes, - self._max_num_instances, - -1) - inputs['gt_classes'] = input_utils.pad_to_fixed_size( - classes, self._max_num_instances, -1) - if self._include_mask: - inputs['gt_masks'] = input_utils.pad_to_fixed_size( - masks, self._max_num_instances, -1) - - return inputs, labels - - def _parse_eval_data(self, data): - """Parses data for evaluation.""" - raise NotImplementedError('Not implemented!') - - def _parse_predict_data(self, data): - """Parses data for prediction. - - Args: - data: the decoded tensor dictionary from TfExampleDecoder. - - Returns: - A dictionary of {'images': image, 'labels': labels} where - image: image tensor that is preprocessed to have normalized value and - dimension [output_size[0], output_size[1], 3] - labels: a dictionary of tensors used for training. The following - describes {key: value} pairs in the dictionary. - source_ids: Source image id. Default value -1 if the source id is - empty in the groundtruth annotation.
- image_info: a 2D `Tensor` that encodes the information of the image - and the applied preprocessing. It is in the format of - [[original_height, original_width], [scaled_height, scaled_width], [y_scale, x_scale], [y_offset, x_offset]]. - anchor_boxes: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, 4] representing anchor boxes at each - level. - """ - # Gets original image and its size. - image = data['image'] - image_shape = tf.shape(image)[0:2] - - # Normalizes image with mean and std pixel values. - image = input_utils.normalize_image(image) - - # Resizes and crops image. - image, image_info = input_utils.resize_and_crop_image( - image, - self._output_size, - padded_size=input_utils.compute_padded_size( - self._output_size, 2 ** self._max_level), - aug_scale_min=1.0, - aug_scale_max=1.0) - image_height, image_width, _ = image.get_shape().as_list() - - # If bfloat16 is used, casts input image to tf.bfloat16. - if self._use_bfloat16: - image = tf.cast(image, dtype=tf.bfloat16) - - # Compute Anchor boxes. - input_anchor = anchor.Anchor( - self._min_level, - self._max_level, - self._num_scales, - self._aspect_ratios, - self._anchor_size, - (image_height, image_width)) - - labels = { - 'image_info': image_info, - } - - if self._mode == ModeKeys.PREDICT_WITH_GT: - # Converts boxes from normalized coordinates to pixel coordinates.
- boxes = box_utils.denormalize_boxes( - data['groundtruth_boxes'], image_shape) - groundtruths = { - 'source_id': data['source_id'], - 'height': data['height'], - 'width': data['width'], - 'num_detections': tf.shape(data['groundtruth_classes']), - 'boxes': boxes, - 'classes': data['groundtruth_classes'], - 'areas': data['groundtruth_area'], - 'is_crowds': tf.cast(data['groundtruth_is_crowd'], tf.int32), - } - groundtruths['source_id'] = dataloader_utils.process_source_id( - groundtruths['source_id']) - groundtruths = dataloader_utils.pad_groundtruths_to_fixed_size( - groundtruths, self._max_num_instances) - # TODO(yeqing): Remove the `groundtrtuh` layer key (no longer needed). - labels['groundtruths'] = groundtruths - inputs = { - 'image': image, - 'image_info': image_info, - } - - return inputs, labels diff --git a/official/vision/detection/dataloader/mode_keys.py b/official/vision/detection/dataloader/mode_keys.py deleted file mode 100644 index d6fdd9008bd4491ebec171d25c14d517ca3647c6..0000000000000000000000000000000000000000 --- a/official/vision/detection/dataloader/mode_keys.py +++ /dev/null @@ -1,33 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Standard names for input dataloader modes. - -The following standard keys are defined: - -* `TRAIN`: training mode. -* `EVAL`: evaluation mode. -* `PREDICT`: prediction mode. 
-* `PREDICT_WITH_GT`: prediction mode with groundtruths in returned variables. -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - - -TRAIN = 'train' -EVAL = 'eval' -PREDICT = 'predict' -PREDICT_WITH_GT = 'predict_with_gt' diff --git a/official/vision/detection/dataloader/olnmask_parser.py b/official/vision/detection/dataloader/olnmask_parser.py deleted file mode 100644 index 5f05f7387c7eca91ba4b42ed87bbd5a3a3926a73..0000000000000000000000000000000000000000 --- a/official/vision/detection/dataloader/olnmask_parser.py +++ /dev/null @@ -1,327 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Data parser and processing for Mask R-CNN.""" - -import tensorflow as tf - -from official.vision.detection.dataloader import anchor -from official.vision.detection.dataloader.maskrcnn_parser import Parser as MaskrcnnParser -from official.vision.detection.utils import box_utils -from official.vision.detection.utils import class_utils -from official.vision.detection.utils import input_utils - - -class Parser(MaskrcnnParser): - """Parser to parse an image and its annotations into a dictionary of tensors.""" - - def __init__(self, - output_size, - min_level, - max_level, - num_scales, - aspect_ratios, - anchor_size, - rpn_match_threshold=0.7, - rpn_unmatched_threshold=0.3, - rpn_batch_size_per_im=256, - rpn_fg_fraction=0.5, - aug_rand_hflip=False, - aug_scale_min=1.0, - aug_scale_max=1.0, - skip_crowd_during_training=True, - max_num_instances=100, - include_mask=False, - mask_crop_size=112, - use_bfloat16=True, - mode=None, - # for centerness learning. - has_centerness=False, - rpn_center_match_iou_threshold=0.3, - rpn_center_unmatched_iou_threshold=0.1, - rpn_num_center_samples_per_im=256, - # for class manipulation. - class_agnostic=False, - train_class='all', - ): - """Initializes parameters for parsing annotations in the dataset. - - Args: - output_size: `Tensor` or `list` for [height, width] of output image. The - output_size should be divided by the largest feature stride 2^max_level. - min_level: `int` number of minimum level of the output feature pyramid. - max_level: `int` number of maximum level of the output feature pyramid. - num_scales: `int` number representing intermediate scales added - on each level. For instances, num_scales=2 adds one additional - intermediate anchor scales [2^0, 2^0.5] on each level. - aspect_ratios: `list` of float numbers representing the aspect raito - anchors added on each level. The number indicates the ratio of width to - height. For instances, aspect_ratios=[1.0, 2.0, 0.5] adds three anchors - on each scale level. 
- anchor_size: `float` number representing the scale of size of the base - anchor to the feature stride 2^level. - rpn_match_threshold: - rpn_unmatched_threshold: - rpn_batch_size_per_im: - rpn_fg_fraction: - aug_rand_hflip: `bool`, if True, augment training with random - horizontal flip. - aug_scale_min: `float`, the minimum scale applied to `output_size` for - data augmentation during training. - aug_scale_max: `float`, the maximum scale applied to `output_size` for - data augmentation during training. - skip_crowd_during_training: `bool`, if True, skip annotations labeled with - `is_crowd` equals to 1. - max_num_instances: `int` number of maximum number of instances in an - image. The groundtruth data will be padded to `max_num_instances`. - include_mask: a bool to indicate whether parse mask groundtruth. - mask_crop_size: the size which groundtruth mask is cropped to. - use_bfloat16: `bool`, if True, cast output image to tf.bfloat16. - mode: a ModeKeys. Specifies if this is training, evaluation, prediction - or prediction with groundtruths in the outputs. - has_centerness: whether to create centerness targets - rpn_center_match_iou_threshold: iou threshold for valid centerness samples - ,set to 0.3 by default. - rpn_center_unmatched_iou_threshold: iou threshold for invalid centerness - samples, set to 0.1 by default. - rpn_num_center_samples_per_im: number of centerness samples per image, - 256 by default. - class_agnostic: whether to merge class ids into one foreground(=1) class, - False by default. - train_class: 'all' or 'voc' or 'nonvoc', 'all' by default. 
- """ - super(Parser, self).__init__( - output_size=output_size, - min_level=min_level, - max_level=max_level, - num_scales=num_scales, - aspect_ratios=aspect_ratios, - anchor_size=anchor_size, - rpn_match_threshold=rpn_match_threshold, - rpn_unmatched_threshold=rpn_unmatched_threshold, - rpn_batch_size_per_im=rpn_batch_size_per_im, - rpn_fg_fraction=rpn_fg_fraction, - aug_rand_hflip=aug_rand_hflip, - aug_scale_min=aug_scale_min, - aug_scale_max=aug_scale_max, - skip_crowd_during_training=skip_crowd_during_training, - max_num_instances=max_num_instances, - include_mask=include_mask, - mask_crop_size=mask_crop_size, - use_bfloat16=use_bfloat16, - mode=mode,) - - # Centerness target assigning. - self._has_centerness = has_centerness - self._rpn_center_match_iou_threshold = rpn_center_match_iou_threshold - self._rpn_center_unmatched_iou_threshold = ( - rpn_center_unmatched_iou_threshold) - self._rpn_num_center_samples_per_im = rpn_num_center_samples_per_im - - # Class manipulation. - self._class_agnostic = class_agnostic - self._train_class = train_class - - def _parse_train_data(self, data): - """Parses data for training. - - Args: - data: the decoded tensor dictionary from TfExampleDecoder. - - Returns: - image: image tensor that is preproessed to have normalized value and - dimension [output_size[0], output_size[1], 3] - labels: a dictionary of tensors used for training. The following describes - {key: value} pairs in the dictionary. - image_info: a 2D `Tensor` that encodes the information of the image and - the applied preprocessing. It is in the format of - [[original_height, original_width], [scaled_height, scaled_width], - anchor_boxes: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, 4] representing anchor boxes at each level. - rpn_score_targets: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. 
The values are tensor with - shape [height_l, width_l, anchors_per_location]. The height_l and - width_l represent the dimension of class logits at l-th level. - rpn_box_targets: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, anchors_per_location * 4]. The height_l and - width_l represent the dimension of bounding box regression output at - l-th level. - gt_boxes: Groundtruth bounding box annotations. The box is represented - in [y1, x1, y2, x2] format. The coordinates are w.r.t the scaled - image that is fed to the network. The tennsor is padded with -1 to - the fixed dimension [self._max_num_instances, 4]. - gt_classes: Groundtruth classes annotations. The tennsor is padded - with -1 to the fixed dimension [self._max_num_instances]. - gt_masks: groundtrugh masks cropped by the bounding box and - resized to a fixed size determined by mask_crop_size. - """ - classes = data['groundtruth_classes'] - boxes = data['groundtruth_boxes'] - if self._include_mask: - masks = data['groundtruth_instance_masks'] - - is_crowds = data['groundtruth_is_crowd'] - # Skips annotations with `is_crowd` = True. - if self._skip_crowd_during_training and self._is_training: - num_groundtruths = tf.shape(classes)[0] - with tf.control_dependencies([num_groundtruths, is_crowds]): - indices = tf.cond( - tf.greater(tf.size(is_crowds), 0), - lambda: tf.where(tf.logical_not(is_crowds))[:, 0], - lambda: tf.cast(tf.range(num_groundtruths), tf.int64)) - classes = tf.gather(classes, indices) - boxes = tf.gather(boxes, indices) - if self._include_mask: - masks = tf.gather(masks, indices) - - # Gets original image and its size. - image = data['image'] - image_shape = tf.shape(image)[0:2] - - # Normalizes image with mean and std pixel values. - image = input_utils.normalize_image(image) - - # Flips image randomly during training. 
- if self._aug_rand_hflip: - if self._include_mask: - image, boxes, masks = input_utils.random_horizontal_flip( - image, boxes, masks) - else: - image, boxes = input_utils.random_horizontal_flip( - image, boxes) - - # Converts boxes from normalized coordinates to pixel coordinates. - # Now the coordinates of boxes are w.r.t. the original image. - boxes = box_utils.denormalize_boxes(boxes, image_shape) - - # Resizes and crops image. - image, image_info = input_utils.resize_and_crop_image( - image, - self._output_size, - padded_size=input_utils.compute_padded_size( - self._output_size, 2 ** self._max_level), - aug_scale_min=self._aug_scale_min, - aug_scale_max=self._aug_scale_max) - image_height, image_width, _ = image.get_shape().as_list() - - # Resizes and crops boxes. - # Now the coordinates of boxes are w.r.t the scaled image. - image_scale = image_info[2, :] - offset = image_info[3, :] - boxes = input_utils.resize_and_crop_boxes( - boxes, image_scale, image_info[1, :], offset) - - # Filters out ground truth boxes that are all zeros. - indices = box_utils.get_non_empty_box_indices(boxes) - boxes = tf.gather(boxes, indices) - classes = tf.gather(classes, indices) - if self._include_mask: - masks = tf.gather(masks, indices) - # Transfer boxes to the original image space and do normalization. - cropped_boxes = boxes + tf.tile(tf.expand_dims(offset, axis=0), [1, 2]) - cropped_boxes /= tf.tile(tf.expand_dims(image_scale, axis=0), [1, 2]) - cropped_boxes = box_utils.normalize_boxes(cropped_boxes, image_shape) - num_masks = tf.shape(masks)[0] - masks = tf.image.crop_and_resize( - tf.expand_dims(masks, axis=-1), - cropped_boxes, - box_indices=tf.range(num_masks, dtype=tf.int32), - crop_size=[self._mask_crop_size, self._mask_crop_size], - method='bilinear') - masks = tf.squeeze(masks, axis=-1) - - # Class manipulation. - # Filter out novel split classes from training. 
- if self._train_class != 'all': - valid_classes = tf.cast( - class_utils.coco_split_class_ids(self._train_class), - dtype=classes.dtype) - match = tf.reduce_any(tf.equal( - tf.expand_dims(valid_classes, 1), - tf.expand_dims(classes, 0)), 0) - # kill novel split classes and boxes. - boxes = tf.gather(boxes, tf.where(match)[:, 0]) - classes = tf.gather(classes, tf.where(match)[:, 0]) - if self._include_mask: - masks = tf.gather(masks, tf.where(match)[:, 0]) - - # Assigns anchor targets. - # Note that after the target assignment, box targets are absolute pixel - # offsets w.r.t. the scaled image. - input_anchor = anchor.Anchor( - self._min_level, - self._max_level, - self._num_scales, - self._aspect_ratios, - self._anchor_size, - (image_height, image_width)) - anchor_labeler = anchor.OlnAnchorLabeler( - input_anchor, - self._rpn_match_threshold, - self._rpn_unmatched_threshold, - self._rpn_batch_size_per_im, - self._rpn_fg_fraction, - # for centerness target. - self._has_centerness, - self._rpn_center_match_iou_threshold, - self._rpn_center_unmatched_iou_threshold, - self._rpn_num_center_samples_per_im,) - - if self._has_centerness: - rpn_score_targets, _, rpn_lrtb_targets, rpn_center_targets = ( - anchor_labeler.label_anchors_lrtb( - gt_boxes=boxes, - gt_labels=tf.cast( - tf.expand_dims(classes, axis=-1), dtype=tf.float32))) - else: - rpn_score_targets, rpn_box_targets = anchor_labeler.label_anchors( - boxes, tf.cast(tf.expand_dims(classes, axis=-1), dtype=tf.float32)) - # For base rpn, dummy placeholder for centerness target. - rpn_center_targets = rpn_score_targets.copy() - - # If bfloat16 is used, casts input image to tf.bfloat16. - if self._use_bfloat16: - image = tf.cast(image, dtype=tf.bfloat16) - - inputs = { - 'image': image, - 'image_info': image_info, - } - # Packs labels for model_fn outputs. 
- labels = { - 'anchor_boxes': input_anchor.multilevel_boxes, - 'image_info': image_info, - 'rpn_score_targets': rpn_score_targets, - 'rpn_box_targets': (rpn_lrtb_targets if self._has_centerness - else rpn_box_targets), - 'rpn_center_targets': rpn_center_targets, - } - # If class_agnostic, convert to binary classes. - if self._class_agnostic: - classes = tf.where(tf.greater(classes, 0), - tf.ones_like(classes), - tf.zeros_like(classes)) - - inputs['gt_boxes'] = input_utils.pad_to_fixed_size(boxes, - self._max_num_instances, - -1) - inputs['gt_classes'] = input_utils.pad_to_fixed_size( - classes, self._max_num_instances, -1) - if self._include_mask: - inputs['gt_masks'] = input_utils.pad_to_fixed_size( - masks, self._max_num_instances, -1) - - return inputs, labels diff --git a/official/vision/detection/dataloader/retinanet_parser.py b/official/vision/detection/dataloader/retinanet_parser.py deleted file mode 100644 index 8e9c3397ed304ec505e2030ddd5eb825273d2ff0..0000000000000000000000000000000000000000 --- a/official/vision/detection/dataloader/retinanet_parser.py +++ /dev/null @@ -1,425 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Data parser and processing. - -Parse image and ground truths in a dataset to training targets and package them -into (image, labels) tuple for RetinaNet. - -T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar -Focal Loss for Dense Object Detection. 
arXiv:1708.02002 -""" - -import tensorflow as tf - -from official.vision.detection.dataloader import anchor -from official.vision.detection.dataloader import mode_keys as ModeKeys -from official.vision.detection.dataloader import tf_example_decoder -from official.vision.detection.utils import box_utils -from official.vision.detection.utils import input_utils - - -def process_source_id(source_id): - """Processes source_id to the right format.""" - if source_id.dtype == tf.string: - source_id = tf.cast(tf.strings.to_number(source_id), tf.int32) - with tf.control_dependencies([source_id]): - source_id = tf.cond( - pred=tf.equal(tf.size(input=source_id), 0), - true_fn=lambda: tf.cast(tf.constant(-1), tf.int32), - false_fn=lambda: tf.identity(source_id)) - return source_id - - -def pad_groundtruths_to_fixed_size(gt, n): - """Pads the first dimension of groundtruths labels to the fixed size.""" - gt['boxes'] = input_utils.pad_to_fixed_size(gt['boxes'], n, -1) - gt['is_crowds'] = input_utils.pad_to_fixed_size(gt['is_crowds'], n, 0) - gt['areas'] = input_utils.pad_to_fixed_size(gt['areas'], n, -1) - gt['classes'] = input_utils.pad_to_fixed_size(gt['classes'], n, -1) - return gt - - -class Parser(object): - """Parser to parse an image and its annotations into a dictionary of tensors.""" - - def __init__(self, - output_size, - min_level, - max_level, - num_scales, - aspect_ratios, - anchor_size, - match_threshold=0.5, - unmatched_threshold=0.5, - aug_rand_hflip=False, - aug_scale_min=1.0, - aug_scale_max=1.0, - use_autoaugment=False, - autoaugment_policy_name='v0', - skip_crowd_during_training=True, - max_num_instances=100, - use_bfloat16=True, - mode=None): - """Initializes parameters for parsing annotations in the dataset. - - Args: - output_size: `Tensor` or `list` for [height, width] of output image. The - output_size should be divided by the largest feature stride 2^max_level. - min_level: `int` number of minimum level of the output feature pyramid. 
- max_level: `int` number of maximum level of the output feature pyramid. - num_scales: `int` number representing intermediate scales added on each - level. For instances, num_scales=2 adds one additional intermediate - anchor scales [2^0, 2^0.5] on each level. - aspect_ratios: `list` of float numbers representing the aspect raito - anchors added on each level. The number indicates the ratio of width to - height. For instances, aspect_ratios=[1.0, 2.0, 0.5] adds three anchors - on each scale level. - anchor_size: `float` number representing the scale of size of the base - anchor to the feature stride 2^level. - match_threshold: `float` number between 0 and 1 representing the - lower-bound threshold to assign positive labels for anchors. An anchor - with a score over the threshold is labeled positive. - unmatched_threshold: `float` number between 0 and 1 representing the - upper-bound threshold to assign negative labels for anchors. An anchor - with a score below the threshold is labeled negative. - aug_rand_hflip: `bool`, if True, augment training with random horizontal - flip. - aug_scale_min: `float`, the minimum scale applied to `output_size` for - data augmentation during training. - aug_scale_max: `float`, the maximum scale applied to `output_size` for - data augmentation during training. - use_autoaugment: `bool`, if True, use the AutoAugment augmentation policy - during training. - autoaugment_policy_name: `string` that specifies the name of the - AutoAugment policy that will be used during training. - skip_crowd_during_training: `bool`, if True, skip annotations labeled with - `is_crowd` equals to 1. - max_num_instances: `int` number of maximum number of instances in an - image. The groundtruth data will be padded to `max_num_instances`. - use_bfloat16: `bool`, if True, cast output image to tf.bfloat16. - mode: a ModeKeys. Specifies if this is training, evaluation, prediction or - prediction with groundtruths in the outputs. 
- """ - self._mode = mode - self._max_num_instances = max_num_instances - self._skip_crowd_during_training = skip_crowd_during_training - self._is_training = (mode == ModeKeys.TRAIN) - - self._example_decoder = tf_example_decoder.TfExampleDecoder( - include_mask=False) - - # Anchor. - self._output_size = output_size - self._min_level = min_level - self._max_level = max_level - self._num_scales = num_scales - self._aspect_ratios = aspect_ratios - self._anchor_size = anchor_size - self._match_threshold = match_threshold - self._unmatched_threshold = unmatched_threshold - - # Data augmentation. - self._aug_rand_hflip = aug_rand_hflip - self._aug_scale_min = aug_scale_min - self._aug_scale_max = aug_scale_max - - # Data Augmentation with AutoAugment. - self._use_autoaugment = use_autoaugment - self._autoaugment_policy_name = autoaugment_policy_name - - # Device. - self._use_bfloat16 = use_bfloat16 - - # Data is parsed depending on the model Modekey. - if mode == ModeKeys.TRAIN: - self._parse_fn = self._parse_train_data - elif mode == ModeKeys.EVAL: - self._parse_fn = self._parse_eval_data - elif mode == ModeKeys.PREDICT or mode == ModeKeys.PREDICT_WITH_GT: - self._parse_fn = self._parse_predict_data - else: - raise ValueError('mode is not defined.') - - def __call__(self, value): - """Parses data to an image and associated training labels. - - Args: - value: a string tensor holding a serialized tf.Example proto. - - Returns: - image: image tensor that is preproessed to have normalized value and - dimension [output_size[0], output_size[1], 3] - labels: - cls_targets: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, anchors_per_location]. The height_l and - width_l represent the dimension of class logits at l-th level. - box_targets: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, anchors_per_location * 4]. 
The height_l and - width_l represent the dimension of bounding box regression output at - l-th level. - num_positives: number of positive anchors in the image. - anchor_boxes: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, 4] representing anchor boxes at each level. - image_info: a 2D `Tensor` that encodes the information of the image and - the applied preprocessing. It is in the format of - [[original_height, original_width], [scaled_height, scaled_width], - [y_scale, x_scale], [y_offset, x_offset]]. - groundtruths: - source_id: source image id. Default value -1 if the source id is empty - in the groundtruth annotation. - boxes: groundtruth bounding box annotations. The box is represented in - [y1, x1, y2, x2] format. The tennsor is padded with -1 to the fixed - dimension [self._max_num_instances, 4]. - classes: groundtruth classes annotations. The tennsor is padded with - -1 to the fixed dimension [self._max_num_instances]. - areas: groundtruth areas annotations. The tennsor is padded with -1 - to the fixed dimension [self._max_num_instances]. - is_crowds: groundtruth annotations to indicate if an annotation - represents a group of instances by value {0, 1}. The tennsor is - padded with 0 to the fixed dimension [self._max_num_instances]. - """ - with tf.name_scope('parser'): - data = self._example_decoder.decode(value) - return self._parse_fn(data) - - def _parse_train_data(self, data): - """Parses data for training and evaluation.""" - classes = data['groundtruth_classes'] - boxes = data['groundtruth_boxes'] - is_crowds = data['groundtruth_is_crowd'] - # Skips annotations with `is_crowd` = True. 
- if self._skip_crowd_during_training and self._is_training: - num_groundtrtuhs = tf.shape(input=classes)[0] - with tf.control_dependencies([num_groundtrtuhs, is_crowds]): - indices = tf.cond( - pred=tf.greater(tf.size(input=is_crowds), 0), - true_fn=lambda: tf.where(tf.logical_not(is_crowds))[:, 0], - false_fn=lambda: tf.cast(tf.range(num_groundtrtuhs), tf.int64)) - classes = tf.gather(classes, indices) - boxes = tf.gather(boxes, indices) - - # Gets original image and its size. - image = data['image'] - - image_shape = tf.shape(input=image)[0:2] - - # Normalizes image with mean and std pixel values. - image = input_utils.normalize_image(image) - - # Flips image randomly during training. - if self._aug_rand_hflip: - image, boxes = input_utils.random_horizontal_flip(image, boxes) - - # Converts boxes from normalized coordinates to pixel coordinates. - boxes = box_utils.denormalize_boxes(boxes, image_shape) - - # Resizes and crops image. - image, image_info = input_utils.resize_and_crop_image( - image, - self._output_size, - padded_size=input_utils.compute_padded_size(self._output_size, - 2**self._max_level), - aug_scale_min=self._aug_scale_min, - aug_scale_max=self._aug_scale_max) - image_height, image_width, _ = image.get_shape().as_list() - - # Resizes and crops boxes. - image_scale = image_info[2, :] - offset = image_info[3, :] - boxes = input_utils.resize_and_crop_boxes(boxes, image_scale, - image_info[1, :], offset) - # Filters out ground truth boxes that are all zeros. - indices = box_utils.get_non_empty_box_indices(boxes) - boxes = tf.gather(boxes, indices) - classes = tf.gather(classes, indices) - - # Assigns anchors. 
- input_anchor = anchor.Anchor(self._min_level, self._max_level, - self._num_scales, self._aspect_ratios, - self._anchor_size, (image_height, image_width)) - anchor_labeler = anchor.AnchorLabeler(input_anchor, self._match_threshold, - self._unmatched_threshold) - (cls_targets, box_targets, num_positives) = anchor_labeler.label_anchors( - boxes, tf.cast(tf.expand_dims(classes, axis=1), tf.float32)) - - # If bfloat16 is used, casts input image to tf.bfloat16. - if self._use_bfloat16: - image = tf.cast(image, dtype=tf.bfloat16) - - # Packs labels for model_fn outputs. - labels = { - 'cls_targets': cls_targets, - 'box_targets': box_targets, - 'anchor_boxes': input_anchor.multilevel_boxes, - 'num_positives': num_positives, - 'image_info': image_info, - } - return image, labels - - def _parse_eval_data(self, data): - """Parses data for training and evaluation.""" - groundtruths = {} - classes = data['groundtruth_classes'] - boxes = data['groundtruth_boxes'] - - # Gets original image and its size. - image = data['image'] - image_shape = tf.shape(input=image)[0:2] - - # Normalizes image with mean and std pixel values. - image = input_utils.normalize_image(image) - - # Converts boxes from normalized coordinates to pixel coordinates. - boxes = box_utils.denormalize_boxes(boxes, image_shape) - - # Resizes and crops image. - image, image_info = input_utils.resize_and_crop_image( - image, - self._output_size, - padded_size=input_utils.compute_padded_size(self._output_size, - 2**self._max_level), - aug_scale_min=1.0, - aug_scale_max=1.0) - image_height, image_width, _ = image.get_shape().as_list() - - # Resizes and crops boxes. - image_scale = image_info[2, :] - offset = image_info[3, :] - boxes = input_utils.resize_and_crop_boxes(boxes, image_scale, - image_info[1, :], offset) - # Filters out ground truth boxes that are all zeros. 
- indices = box_utils.get_non_empty_box_indices(boxes) - boxes = tf.gather(boxes, indices) - classes = tf.gather(classes, indices) - - # Assigns anchors. - input_anchor = anchor.Anchor(self._min_level, self._max_level, - self._num_scales, self._aspect_ratios, - self._anchor_size, (image_height, image_width)) - anchor_labeler = anchor.AnchorLabeler(input_anchor, self._match_threshold, - self._unmatched_threshold) - (cls_targets, box_targets, num_positives) = anchor_labeler.label_anchors( - boxes, tf.cast(tf.expand_dims(classes, axis=1), tf.float32)) - - # If bfloat16 is used, casts input image to tf.bfloat16. - if self._use_bfloat16: - image = tf.cast(image, dtype=tf.bfloat16) - - # Sets up groundtruth data for evaluation. - groundtruths = { - 'source_id': - data['source_id'], - 'num_groundtrtuhs': - tf.shape(data['groundtruth_classes']), - 'image_info': - image_info, - 'boxes': - box_utils.denormalize_boxes(data['groundtruth_boxes'], image_shape), - 'classes': - data['groundtruth_classes'], - 'areas': - data['groundtruth_area'], - 'is_crowds': - tf.cast(data['groundtruth_is_crowd'], tf.int32), - } - groundtruths['source_id'] = process_source_id(groundtruths['source_id']) - groundtruths = pad_groundtruths_to_fixed_size(groundtruths, - self._max_num_instances) - - # Packs labels for model_fn outputs. - labels = { - 'cls_targets': cls_targets, - 'box_targets': box_targets, - 'anchor_boxes': input_anchor.multilevel_boxes, - 'num_positives': num_positives, - 'image_info': image_info, - 'groundtruths': groundtruths, - } - return image, labels - - def _parse_predict_data(self, data): - """Parses data for prediction.""" - # Gets original image and its size. - image = data['image'] - image_shape = tf.shape(input=image)[0:2] - - # Normalizes image with mean and std pixel values. - image = input_utils.normalize_image(image) - - # Resizes and crops image. 
- image, image_info = input_utils.resize_and_crop_image( - image, - self._output_size, - padded_size=input_utils.compute_padded_size(self._output_size, - 2**self._max_level), - aug_scale_min=1.0, - aug_scale_max=1.0) - image_height, image_width, _ = image.get_shape().as_list() - - # If bfloat16 is used, casts input image to tf.bfloat16. - if self._use_bfloat16: - image = tf.cast(image, dtype=tf.bfloat16) - - # Compute Anchor boxes. - input_anchor = anchor.Anchor(self._min_level, self._max_level, - self._num_scales, self._aspect_ratios, - self._anchor_size, (image_height, image_width)) - - labels = { - 'anchor_boxes': input_anchor.multilevel_boxes, - 'image_info': image_info, - } - # If mode is PREDICT_WITH_GT, returns groundtruths and training targets - # in labels. - if self._mode == ModeKeys.PREDICT_WITH_GT: - # Converts boxes from normalized coordinates to pixel coordinates. - boxes = box_utils.denormalize_boxes(data['groundtruth_boxes'], - image_shape) - groundtruths = { - 'source_id': data['source_id'], - 'num_detections': tf.shape(data['groundtruth_classes']), - 'boxes': boxes, - 'classes': data['groundtruth_classes'], - 'areas': data['groundtruth_area'], - 'is_crowds': tf.cast(data['groundtruth_is_crowd'], tf.int32), - } - groundtruths['source_id'] = process_source_id(groundtruths['source_id']) - groundtruths = pad_groundtruths_to_fixed_size(groundtruths, - self._max_num_instances) - labels['groundtruths'] = groundtruths - - # Computes training objective for evaluation loss. - classes = data['groundtruth_classes'] - - image_scale = image_info[2, :] - offset = image_info[3, :] - boxes = input_utils.resize_and_crop_boxes(boxes, image_scale, - image_info[1, :], offset) - # Filters out ground truth boxes that are all zeros. - indices = box_utils.get_non_empty_box_indices(boxes) - boxes = tf.gather(boxes, indices) - - # Assigns anchors. 
- anchor_labeler = anchor.AnchorLabeler(input_anchor, self._match_threshold, - self._unmatched_threshold) - (cls_targets, box_targets, num_positives) = anchor_labeler.label_anchors( - boxes, tf.cast(tf.expand_dims(classes, axis=1), tf.float32)) - labels['cls_targets'] = cls_targets - labels['box_targets'] = box_targets - labels['num_positives'] = num_positives - return image, labels diff --git a/official/vision/detection/dataloader/shapemask_parser.py b/official/vision/detection/dataloader/shapemask_parser.py deleted file mode 100644 index c0e79e071692adcb8f1328563ccb9bd40734df80..0000000000000000000000000000000000000000 --- a/official/vision/detection/dataloader/shapemask_parser.py +++ /dev/null @@ -1,521 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Data parser and processing. - -Parse image and ground truths in a dataset to training targets and package them -into (image, labels) tuple for ShapeMask. - -Weicheng Kuo, Anelia Angelova, Jitendra Malik, Tsung-Yi Lin -ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors. -arXiv:1904.03239. 
-""" -import tensorflow as tf - -from official.vision.detection.dataloader import anchor -from official.vision.detection.dataloader import mode_keys as ModeKeys -from official.vision.detection.dataloader import tf_example_decoder -from official.vision.detection.utils import box_utils -from official.vision.detection.utils import class_utils -from official.vision.detection.utils import dataloader_utils -from official.vision.detection.utils import input_utils - - -def pad_to_size(input_tensor, size): - """Pads data with zeros to a given length at the first dimension if needed. - - Args: - input_tensor: `Tensor` with any dimension. - size: `int` number for the first dimension of output Tensor. - - Returns: - `Tensor` with the first dimension padded to `size` if the first diemsion - is less than `size`, otherwise no padding. - """ - input_shape = tf.shape(input_tensor) - padding_shape = [] - - # Computes the padding length on the first dimension. - padding_length = tf.maximum(0, size - tf.shape(input_tensor)[0]) - assert_length = tf.Assert( - tf.greater_equal(padding_length, 0), [padding_length]) - with tf.control_dependencies([assert_length]): - padding_shape.append(padding_length) - - # Copies shapes of the rest of input shape dimensions. - for i in range(1, len(input_shape)): - padding_shape.append(tf.shape(input=input_tensor)[i]) - - # Pads input tensor to the fixed first dimension. 
- paddings = tf.cast(tf.zeros(padding_shape), input_tensor.dtype) - padded_tensor = tf.concat([input_tensor, paddings], axis=0) - return padded_tensor - - -class Parser(object): - """ShapeMask Parser to parse an image and its annotations into a dictionary of tensors.""" - - def __init__(self, - output_size, - min_level, - max_level, - num_scales, - aspect_ratios, - anchor_size, - use_category=True, - outer_box_scale=1.0, - box_jitter_scale=0.025, - num_sampled_masks=8, - mask_crop_size=32, - mask_min_level=3, - mask_max_level=5, - upsample_factor=4, - match_threshold=0.5, - unmatched_threshold=0.5, - aug_rand_hflip=False, - aug_scale_min=1.0, - aug_scale_max=1.0, - skip_crowd_during_training=True, - max_num_instances=100, - use_bfloat16=True, - mask_train_class='all', - mode=None): - """Initializes parameters for parsing annotations in the dataset. - - Args: - output_size: `Tensor` or `list` for [height, width] of output image. The - output_size should be divided by the largest feature stride 2^max_level. - min_level: `int` number of minimum level of the output feature pyramid. - max_level: `int` number of maximum level of the output feature pyramid. - num_scales: `int` number representing intermediate scales added - on each level. For instances, num_scales=2 adds one additional - intermediate anchor scales [2^0, 2^0.5] on each level. - aspect_ratios: `list` of float numbers representing the aspect raito - anchors added on each level. The number indicates the ratio of width to - height. For instances, aspect_ratios=[1.0, 2.0, 0.5] adds three anchors - on each scale level. - anchor_size: `float` number representing the scale of size of the base - anchor to the feature stride 2^level. - use_category: if `False`, treat all object in all classes in one - foreground category. - outer_box_scale: `float` number in a range of [1.0, inf) representing - the scale from object box to outer box. The mask branch predicts - instance mask enclosed in outer box. 
- box_jitter_scale: `float` number representing the noise magnitude to - jitter the training groundtruth boxes for mask branch. - num_sampled_masks: `int` number of sampled masks for training. - mask_crop_size: `list` for [height, width] of output training masks. - mask_min_level: `int` number indicating the minimum feature level to - obtain instance features. - mask_max_level: `int` number indicating the maximum feature level to - obtain instance features. - upsample_factor: `int` factor of upsampling the fine mask predictions. - match_threshold: `float` number between 0 and 1 representing the - lower-bound threshold to assign positive labels for anchors. An anchor - with a score over the threshold is labeled positive. - unmatched_threshold: `float` number between 0 and 1 representing the - upper-bound threshold to assign negative labels for anchors. An anchor - with a score below the threshold is labeled negative. - aug_rand_hflip: `bool`, if True, augment training with random - horizontal flip. - aug_scale_min: `float`, the minimum scale applied to `output_size` for - data augmentation during training. - aug_scale_max: `float`, the maximum scale applied to `output_size` for - data augmentation during training. - skip_crowd_during_training: `bool`, if True, skip annotations labeled with - `is_crowd` equals to 1. - max_num_instances: `int` number of maximum number of instances in an - image. The groundtruth data will be padded to `max_num_instances`. - use_bfloat16: `bool`, if True, cast output image to tf.bfloat16. - mask_train_class: a string of experiment mode: `all`, `voc` or `nonvoc`. - mode: a ModeKeys. Specifies if this is training, evaluation, prediction - or prediction with groundtruths in the outputs. 
- """ - self._mode = mode - self._mask_train_class = mask_train_class - self._max_num_instances = max_num_instances - self._skip_crowd_during_training = skip_crowd_during_training - self._is_training = (mode == ModeKeys.TRAIN) - - self._example_decoder = tf_example_decoder.TfExampleDecoder( - include_mask=True) - - # Anchor. - self._output_size = output_size - self._min_level = min_level - self._max_level = max_level - self._num_scales = num_scales - self._aspect_ratios = aspect_ratios - self._anchor_size = anchor_size - self._match_threshold = match_threshold - self._unmatched_threshold = unmatched_threshold - - # Data augmentation. - self._aug_rand_hflip = aug_rand_hflip - self._aug_scale_min = aug_scale_min - self._aug_scale_max = aug_scale_max - - # Device. - self._use_bfloat16 = use_bfloat16 - - # ShapeMask specific. - # Control of which category to use. - self._use_category = use_category - self._num_sampled_masks = num_sampled_masks - self._mask_crop_size = mask_crop_size - self._mask_min_level = mask_min_level - self._mask_max_level = mask_max_level - self._outer_box_scale = outer_box_scale - self._box_jitter_scale = box_jitter_scale - self._up_sample_factor = upsample_factor - - # Data is parsed depending on the model Modekey. - if mode == ModeKeys.TRAIN: - self._parse_fn = self._parse_train_data - elif mode == ModeKeys.EVAL: - self._parse_fn = self._parse_eval_data - elif mode == ModeKeys.PREDICT or mode == ModeKeys.PREDICT_WITH_GT: - self._parse_fn = self._parse_predict_data - else: - raise ValueError('mode is not defined.') - - def __call__(self, value): - """Parses data to an image and associated training labels. - - Args: - value: a string tensor holding a serialized tf.Example proto. - - Returns: - inputs: - image: image tensor that is preproessed to have normalized value and - dimension [output_size[0], output_size[1], 3] - mask_boxes: sampled boxes that tightly enclose the training masks. The - box is represented in [y1, x1, y2, x2] format. 
The tensor is sampled - to the fixed dimension [self._num_sampled_masks, 4]. - mask_outer_boxes: loose box that enclose sampled tight box. The - box is represented in [y1, x1, y2, x2] format. The tensor is sampled - to the fixed dimension [self._num_sampled_masks, 4]. - mask_classes: the class ids of sampled training masks. The tensor has - shape [self._num_sampled_masks]. - labels: - cls_targets: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, anchors_per_location]. The height_l and - width_l represent the dimension of class logits at l-th level. - box_targets: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, anchors_per_location * 4]. The height_l and - width_l represent the dimension of bounding box regression output at - l-th level. - num_positives: number of positive anchors in the image. - anchor_boxes: ordered dictionary with keys - [min_level, min_level+1, ..., max_level]. The values are tensor with - shape [height_l, width_l, 4] representing anchor boxes at each level. - image_scale: 2D float `Tensor` representing scale factors that apply - to [height, width] of input image. - mask_targets: training binary mask targets. The tensor has shape - [self._num_sampled_masks, self._mask_crop_size, self._mask_crop_size]. - mask_is_valid: the binary tensor to indicate if the sampled masks are - valide. The sampled masks are invalid when no mask annotations are - included in the image. The tensor has shape [1]. - groundtruths: - source_id: source image id. Default value -1 if the source id is empty - in the groundtruth annotation. - boxes: groundtruth bounding box annotations. The box is represented in - [y1, x1, y2, x2] format. The tensor is padded with -1 to the fixed - dimension [self._max_num_instances, 4]. - classes: groundtruth classes annotations. 
The tensor is padded with - -1 to the fixed dimension [self._max_num_instances]. - areas: groundtruth areas annotations. The tensor is padded with -1 - to the fixed dimension [self._max_num_instances]. - is_crowds: groundtruth annotations to indicate if an annotation - represents a group of instances by value {0, 1}. The tensor is - padded with 0 to the fixed dimension [self._max_num_instances]. - """ - with tf.name_scope('parser'): - data = self._example_decoder.decode(value) - return self._parse_fn(data) - - def _parse_train_data(self, data): - """Parse data for ShapeMask training.""" - classes = data['groundtruth_classes'] - boxes = data['groundtruth_boxes'] - masks = data['groundtruth_instance_masks'] - is_crowds = data['groundtruth_is_crowd'] - # Skips annotations with `is_crowd` = True. - if self._skip_crowd_during_training and self._is_training: - num_groundtrtuhs = tf.shape(classes)[0] - with tf.control_dependencies([num_groundtrtuhs, is_crowds]): - indices = tf.cond( - tf.greater(tf.size(is_crowds), 0), - lambda: tf.where(tf.logical_not(is_crowds))[:, 0], - lambda: tf.cast(tf.range(num_groundtrtuhs), tf.int64)) - classes = tf.gather(classes, indices) - boxes = tf.gather(boxes, indices) - masks = tf.gather(masks, indices) - - # Gets original image and its size. - image = data['image'] - image_shape = tf.shape(image)[0:2] - - # If not using category, makes all categories with id = 0. - if not self._use_category: - classes = tf.cast(tf.greater(classes, 0), dtype=tf.float32) - - # Normalizes image with mean and std pixel values. - image = input_utils.normalize_image(image) - - # Flips image randomly during training. - if self._aug_rand_hflip: - image, boxes, masks = input_utils.random_horizontal_flip( - image, boxes, masks) - - # Converts boxes from normalized coordinates to pixel coordinates. - boxes = box_utils.denormalize_boxes(boxes, image_shape) - - # Resizes and crops image. 
- image, image_info = input_utils.resize_and_crop_image( - image, - self._output_size, - self._output_size, - aug_scale_min=self._aug_scale_min, - aug_scale_max=self._aug_scale_max) - image_scale = image_info[2, :] - offset = image_info[3, :] - - # Resizes and crops boxes and masks. - boxes = input_utils.resize_and_crop_boxes( - boxes, image_scale, image_info[1, :], offset) - - # Filters out ground truth boxes that are all zeros. - indices = box_utils.get_non_empty_box_indices(boxes) - boxes = tf.gather(boxes, indices) - classes = tf.gather(classes, indices) - masks = tf.gather(masks, indices) - - # Assigns anchors. - input_anchor = anchor.Anchor( - self._min_level, self._max_level, self._num_scales, - self._aspect_ratios, self._anchor_size, self._output_size) - anchor_labeler = anchor.AnchorLabeler( - input_anchor, self._match_threshold, self._unmatched_threshold) - (cls_targets, - box_targets, - num_positives) = anchor_labeler.label_anchors( - boxes, - tf.cast(tf.expand_dims(classes, axis=1), tf.float32)) - - # Sample groundtruth masks/boxes/classes for mask branch. - num_masks = tf.shape(masks)[0] - mask_shape = tf.shape(masks)[1:3] - - # Pad sampled boxes/masks/classes to a constant batch size. - padded_boxes = pad_to_size(boxes, self._num_sampled_masks) - padded_classes = pad_to_size(classes, self._num_sampled_masks) - padded_masks = pad_to_size(masks, self._num_sampled_masks) - - # Randomly sample groundtruth masks for mask branch training. For the image - # without groundtruth masks, it will sample the dummy padded tensors. 
- rand_indices = tf.random.shuffle( - tf.range(tf.maximum(num_masks, self._num_sampled_masks))) - rand_indices = tf.math.mod(rand_indices, tf.maximum(num_masks, 1)) - rand_indices = rand_indices[0:self._num_sampled_masks] - rand_indices = tf.reshape(rand_indices, [self._num_sampled_masks]) - - sampled_boxes = tf.gather(padded_boxes, rand_indices) - sampled_classes = tf.gather(padded_classes, rand_indices) - sampled_masks = tf.gather(padded_masks, rand_indices) - # Jitter the sampled boxes to mimic the noisy detections. - sampled_boxes = box_utils.jitter_boxes( - sampled_boxes, noise_scale=self._box_jitter_scale) - sampled_boxes = box_utils.clip_boxes(sampled_boxes, self._output_size) - # Compute mask targets in feature crop. A feature crop fully contains a - # sampled box. - mask_outer_boxes = box_utils.compute_outer_boxes( - sampled_boxes, tf.shape(image)[0:2], scale=self._outer_box_scale) - mask_outer_boxes = box_utils.clip_boxes(mask_outer_boxes, self._output_size) - # Compensate the offset of mask_outer_boxes to map it back to original image - # scale. - mask_outer_boxes_ori = mask_outer_boxes - mask_outer_boxes_ori += tf.tile(tf.expand_dims(offset, axis=0), [1, 2]) - mask_outer_boxes_ori /= tf.tile(tf.expand_dims(image_scale, axis=0), [1, 2]) - norm_mask_outer_boxes_ori = box_utils.normalize_boxes( - mask_outer_boxes_ori, mask_shape) - - # Set sampled_masks shape to [batch_size, height, width, 1]. 
- sampled_masks = tf.cast(tf.expand_dims(sampled_masks, axis=-1), tf.float32) - mask_targets = tf.image.crop_and_resize( - sampled_masks, - norm_mask_outer_boxes_ori, - box_indices=tf.range(self._num_sampled_masks), - crop_size=[self._mask_crop_size, self._mask_crop_size], - method='bilinear', - extrapolation_value=0, - name='train_mask_targets') - mask_targets = tf.where(tf.greater_equal(mask_targets, 0.5), - tf.ones_like(mask_targets), - tf.zeros_like(mask_targets)) - mask_targets = tf.squeeze(mask_targets, axis=-1) - if self._up_sample_factor > 1: - fine_mask_targets = tf.image.crop_and_resize( - sampled_masks, - norm_mask_outer_boxes_ori, - box_indices=tf.range(self._num_sampled_masks), - crop_size=[ - self._mask_crop_size * self._up_sample_factor, - self._mask_crop_size * self._up_sample_factor - ], - method='bilinear', - extrapolation_value=0, - name='train_mask_targets') - fine_mask_targets = tf.where( - tf.greater_equal(fine_mask_targets, 0.5), - tf.ones_like(fine_mask_targets), tf.zeros_like(fine_mask_targets)) - fine_mask_targets = tf.squeeze(fine_mask_targets, axis=-1) - else: - fine_mask_targets = mask_targets - - # If bfloat16 is used, casts input image to tf.bfloat16. - if self._use_bfloat16: - image = tf.cast(image, dtype=tf.bfloat16) - - valid_image = tf.cast(tf.not_equal(num_masks, 0), tf.int32) - if self._mask_train_class == 'all': - mask_is_valid = valid_image * tf.ones_like(sampled_classes, tf.int32) - else: - # Get the intersection of sampled classes with training splits. - mask_valid_classes = tf.cast( - tf.expand_dims( - class_utils.coco_split_class_ids(self._mask_train_class), 1), - sampled_classes.dtype) - match = tf.reduce_any( - tf.equal(tf.expand_dims(sampled_classes, 0), mask_valid_classes), 0) - mask_is_valid = valid_image * tf.cast(match, tf.int32) - - # Packs labels for model_fn outputs. 
- labels = { - 'cls_targets': cls_targets, - 'box_targets': box_targets, - 'anchor_boxes': input_anchor.multilevel_boxes, - 'num_positives': num_positives, - 'image_info': image_info, - # For ShapeMask. - 'mask_targets': mask_targets, - 'fine_mask_targets': fine_mask_targets, - 'mask_is_valid': mask_is_valid, - } - - inputs = { - 'image': image, - 'image_info': image_info, - 'mask_boxes': sampled_boxes, - 'mask_outer_boxes': mask_outer_boxes, - 'mask_classes': sampled_classes, - } - return inputs, labels - - def _parse_predict_data(self, data): - """Parse data for ShapeMask training.""" - classes = data['groundtruth_classes'] - boxes = data['groundtruth_boxes'] - masks = data['groundtruth_instance_masks'] - - # Gets original image and its size. - image = data['image'] - image_shape = tf.shape(image)[0:2] - - # If not using category, makes all categories with id = 0. - if not self._use_category: - classes = tf.cast(tf.greater(classes, 0), dtype=tf.float32) - - # Normalizes image with mean and std pixel values. - image = input_utils.normalize_image(image) - - # Converts boxes from normalized coordinates to pixel coordinates. - boxes = box_utils.denormalize_boxes(boxes, image_shape) - - # Resizes and crops image. - image, image_info = input_utils.resize_and_crop_image( - image, - self._output_size, - self._output_size, - aug_scale_min=1.0, - aug_scale_max=1.0) - image_scale = image_info[2, :] - offset = image_info[3, :] - - # Resizes and crops boxes and masks. - boxes = input_utils.resize_and_crop_boxes( - boxes, image_scale, image_info[1, :], offset) - masks = input_utils.resize_and_crop_masks( - tf.expand_dims(masks, axis=-1), image_scale, self._output_size, offset) - - # Filters out ground truth boxes that are all zeros. - indices = box_utils.get_non_empty_box_indices(boxes) - boxes = tf.gather(boxes, indices) - classes = tf.gather(classes, indices) - - # Assigns anchors. 
- input_anchor = anchor.Anchor( - self._min_level, self._max_level, self._num_scales, - self._aspect_ratios, self._anchor_size, self._output_size) - anchor_labeler = anchor.AnchorLabeler( - input_anchor, self._match_threshold, self._unmatched_threshold) - - # If bfloat16 is used, casts input image to tf.bfloat16. - if self._use_bfloat16: - image = tf.cast(image, dtype=tf.bfloat16) - - labels = { - 'anchor_boxes': input_anchor.multilevel_boxes, - 'image_info': image_info, - } - if self._mode == ModeKeys.PREDICT_WITH_GT: - # Converts boxes from normalized coordinates to pixel coordinates. - groundtruths = { - 'source_id': data['source_id'], - 'height': data['height'], - 'width': data['width'], - 'num_detections': tf.shape(data['groundtruth_classes']), - 'boxes': box_utils.denormalize_boxes( - data['groundtruth_boxes'], image_shape), - 'classes': data['groundtruth_classes'], - # 'masks': tf.squeeze(masks, axis=-1), - 'areas': data['groundtruth_area'], - 'is_crowds': tf.cast(data['groundtruth_is_crowd'], tf.int32), - } - groundtruths['source_id'] = dataloader_utils.process_source_id( - groundtruths['source_id']) - groundtruths = dataloader_utils.pad_groundtruths_to_fixed_size( - groundtruths, self._max_num_instances) - # Computes training labels. - (cls_targets, - box_targets, - num_positives) = anchor_labeler.label_anchors( - boxes, - tf.cast(tf.expand_dims(classes, axis=1), tf.float32)) - # Packs labels for model_fn outputs. 
- labels.update({ - 'cls_targets': cls_targets, - 'box_targets': box_targets, - 'num_positives': num_positives, - 'groundtruths': groundtruths, - }) - - inputs = { - 'image': image, - 'image_info': image_info, - } - - return inputs, labels diff --git a/official/vision/detection/dataloader/tf_example_decoder.py b/official/vision/detection/dataloader/tf_example_decoder.py deleted file mode 100644 index e6472a36b9a31a8e8a98cecf10a6abf8ccb03985..0000000000000000000000000000000000000000 --- a/official/vision/detection/dataloader/tf_example_decoder.py +++ /dev/null @@ -1,156 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - -"""Tensorflow Example proto decoder for object detection. - -A decoder to decode string tensors containing serialized tensorflow.Example -protos for object detection. 
-""" -import tensorflow as tf - - -class TfExampleDecoder(object): - """Tensorflow Example proto decoder.""" - - def __init__(self, include_mask=False): - self._include_mask = include_mask - self._keys_to_features = { - 'image/encoded': - tf.io.FixedLenFeature((), tf.string), - 'image/source_id': - tf.io.FixedLenFeature((), tf.string), - 'image/height': - tf.io.FixedLenFeature((), tf.int64), - 'image/width': - tf.io.FixedLenFeature((), tf.int64), - 'image/object/bbox/xmin': - tf.io.VarLenFeature(tf.float32), - 'image/object/bbox/xmax': - tf.io.VarLenFeature(tf.float32), - 'image/object/bbox/ymin': - tf.io.VarLenFeature(tf.float32), - 'image/object/bbox/ymax': - tf.io.VarLenFeature(tf.float32), - 'image/object/class/label': - tf.io.VarLenFeature(tf.int64), - 'image/object/area': - tf.io.VarLenFeature(tf.float32), - 'image/object/is_crowd': - tf.io.VarLenFeature(tf.int64), - } - if include_mask: - self._keys_to_features.update({ - 'image/object/mask': - tf.io.VarLenFeature(tf.string), - }) - - def _decode_image(self, parsed_tensors): - """Decodes the image and set its static shape.""" - image = tf.io.decode_image(parsed_tensors['image/encoded'], channels=3) - image.set_shape([None, None, 3]) - return image - - def _decode_boxes(self, parsed_tensors): - """Concat box coordinates in the format of [ymin, xmin, ymax, xmax].""" - xmin = parsed_tensors['image/object/bbox/xmin'] - xmax = parsed_tensors['image/object/bbox/xmax'] - ymin = parsed_tensors['image/object/bbox/ymin'] - ymax = parsed_tensors['image/object/bbox/ymax'] - return tf.stack([ymin, xmin, ymax, xmax], axis=-1) - - def _decode_masks(self, parsed_tensors): - """Decode a set of PNG masks to the tf.float32 tensors.""" - def _decode_png_mask(png_bytes): - mask = tf.squeeze( - tf.io.decode_png(png_bytes, channels=1, dtype=tf.uint8), axis=-1) - mask = tf.cast(mask, dtype=tf.float32) - mask.set_shape([None, None]) - return mask - - height = parsed_tensors['image/height'] - width = parsed_tensors['image/width'] - 
masks = parsed_tensors['image/object/mask'] - return tf.cond( - pred=tf.greater(tf.size(input=masks), 0), - true_fn=lambda: tf.map_fn(_decode_png_mask, masks, dtype=tf.float32), - false_fn=lambda: tf.zeros([0, height, width], dtype=tf.float32)) - - def _decode_areas(self, parsed_tensors): - xmin = parsed_tensors['image/object/bbox/xmin'] - xmax = parsed_tensors['image/object/bbox/xmax'] - ymin = parsed_tensors['image/object/bbox/ymin'] - ymax = parsed_tensors['image/object/bbox/ymax'] - return tf.cond( - tf.greater(tf.shape(parsed_tensors['image/object/area'])[0], 0), - lambda: parsed_tensors['image/object/area'], - lambda: (xmax - xmin) * (ymax - ymin)) - - def decode(self, serialized_example): - """Decode the serialized example. - - Args: - serialized_example: a single serialized tf.Example string. - - Returns: - decoded_tensors: a dictionary of tensors with the following fields: - - image: a uint8 tensor of shape [None, None, 3]. - - source_id: a string scalar tensor. - - height: an integer scalar tensor. - - width: an integer scalar tensor. - - groundtruth_classes: a int64 tensor of shape [None]. - - groundtruth_is_crowd: a bool tensor of shape [None]. - - groundtruth_area: a float32 tensor of shape [None]. - - groundtruth_boxes: a float32 tensor of shape [None, 4]. - - groundtruth_instance_masks: a float32 tensor of shape - [None, None, None]. - - groundtruth_instance_masks_png: a string tensor of shape [None]. 
- """ - parsed_tensors = tf.io.parse_single_example( - serialized=serialized_example, features=self._keys_to_features) - for k in parsed_tensors: - if isinstance(parsed_tensors[k], tf.SparseTensor): - if parsed_tensors[k].dtype == tf.string: - parsed_tensors[k] = tf.sparse.to_dense( - parsed_tensors[k], default_value='') - else: - parsed_tensors[k] = tf.sparse.to_dense( - parsed_tensors[k], default_value=0) - - image = self._decode_image(parsed_tensors) - boxes = self._decode_boxes(parsed_tensors) - areas = self._decode_areas(parsed_tensors) - is_crowds = tf.cond( - tf.greater(tf.shape(parsed_tensors['image/object/is_crowd'])[0], 0), - lambda: tf.cast(parsed_tensors['image/object/is_crowd'], dtype=tf.bool), - lambda: tf.zeros_like(parsed_tensors['image/object/class/label'], dtype=tf.bool)) # pylint: disable=line-too-long - if self._include_mask: - masks = self._decode_masks(parsed_tensors) - - decoded_tensors = { - 'image': image, - 'source_id': parsed_tensors['image/source_id'], - 'height': parsed_tensors['image/height'], - 'width': parsed_tensors['image/width'], - 'groundtruth_classes': parsed_tensors['image/object/class/label'], - 'groundtruth_is_crowd': is_crowds, - 'groundtruth_area': areas, - 'groundtruth_boxes': boxes, - } - if self._include_mask: - decoded_tensors.update({ - 'groundtruth_instance_masks': masks, - 'groundtruth_instance_masks_png': parsed_tensors['image/object/mask'], - }) - return decoded_tensors diff --git a/official/vision/detection/evaluation/__init__.py b/official/vision/detection/evaluation/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/detection/evaluation/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/detection/evaluation/coco_evaluator.py b/official/vision/detection/evaluation/coco_evaluator.py deleted file mode 100644 index 108290bb7bef6633c4be579b8ca8c929b34213cb..0000000000000000000000000000000000000000 --- a/official/vision/detection/evaluation/coco_evaluator.py +++ /dev/null @@ -1,615 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""The COCO-style evaluator. - -The following snippet demonstrates the use of interfaces: - - evaluator = COCOEvaluator(...) - for _ in range(num_evals): - for _ in range(num_batches_per_eval): - predictions, groundtruth = predictor.predict(...) # pop a batch. - evaluator.update(predictions, groundtruths) # aggregate internal stats. - evaluator.evaluate() # finish one full eval. 
- -See also: https://github.com/cocodataset/cocoapi/ -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import atexit -import copy -import tempfile - -from absl import logging -import numpy as np -from pycocotools import cocoeval -import six -import tensorflow as tf - -from official.vision.detection.evaluation import coco_utils -from official.vision.detection.utils import class_utils - - -class MetricWrapper(object): - # This is only a wrapper for COCO metric and works on for numpy array. So it - # doesn't inherit from tf.keras.layers.Layer or tf.keras.metrics.Metric. - - def __init__(self, evaluator): - self._evaluator = evaluator - - def update_state(self, y_true, y_pred): - labels = tf.nest.map_structure(lambda x: x.numpy(), y_true) - outputs = tf.nest.map_structure(lambda x: x.numpy(), y_pred) - groundtruths = {} - predictions = {} - for key, val in outputs.items(): - if isinstance(val, tuple): - val = np.concatenate(val) - predictions[key] = val - for key, val in labels.items(): - if isinstance(val, tuple): - val = np.concatenate(val) - groundtruths[key] = val - self._evaluator.update(predictions, groundtruths) - - def result(self): - return self._evaluator.evaluate() - - def reset_states(self): - return self._evaluator.reset() - - -class COCOEvaluator(object): - """COCO evaluation metric class.""" - - def __init__(self, annotation_file, include_mask, need_rescale_bboxes=True): - """Constructs COCO evaluation class. - - The class provides the interface to metrics_fn in TPUEstimator. The - _update_op() takes detections from each image and push them to - self.detections. The _evaluate() loads a JSON file in COCO annotation format - as the groundtruths and runs COCO evaluation. - - Args: - annotation_file: a JSON file that stores annotations of the eval dataset. - If `annotation_file` is None, groundtruth annotations will be loaded - from the dataloader. 
- include_mask: a boolean to indicate whether or not to include the mask - eval. - need_rescale_bboxes: If true bboxes in `predictions` will be rescaled back - to absolute values (`image_info` is needed in this case). - """ - if annotation_file: - if annotation_file.startswith('gs://'): - _, local_val_json = tempfile.mkstemp(suffix='.json') - tf.io.gfile.remove(local_val_json) - - tf.io.gfile.copy(annotation_file, local_val_json) - atexit.register(tf.io.gfile.remove, local_val_json) - else: - local_val_json = annotation_file - self._coco_gt = coco_utils.COCOWrapper( - eval_type=('mask' if include_mask else 'box'), - annotation_file=local_val_json) - self._annotation_file = annotation_file - self._include_mask = include_mask - self._metric_names = [ - 'AP', 'AP50', 'AP75', 'APs', 'APm', 'APl', 'ARmax1', 'ARmax10', - 'ARmax100', 'ARs', 'ARm', 'ARl' - ] - self._required_prediction_fields = [ - 'source_id', 'num_detections', 'detection_classes', 'detection_scores', - 'detection_boxes' - ] - self._need_rescale_bboxes = need_rescale_bboxes - if self._need_rescale_bboxes: - self._required_prediction_fields.append('image_info') - self._required_groundtruth_fields = [ - 'source_id', 'height', 'width', 'classes', 'boxes' - ] - if self._include_mask: - mask_metric_names = ['mask_' + x for x in self._metric_names] - self._metric_names.extend(mask_metric_names) - self._required_prediction_fields.extend(['detection_masks']) - self._required_groundtruth_fields.extend(['masks']) - - self.reset() - - def reset(self): - """Resets internal states for a fresh run.""" - self._predictions = {} - if not self._annotation_file: - self._groundtruths = {} - - def evaluate(self): - """Evaluates with detections from all images with COCO API. - - Returns: - coco_metric: float numpy array with shape [24] representing the - coco-style evaluation metrics (box and mask). 
- """ - if not self._annotation_file: - logging.info('Thre is no annotation_file in COCOEvaluator.') - gt_dataset = coco_utils.convert_groundtruths_to_coco_dataset( - self._groundtruths) - coco_gt = coco_utils.COCOWrapper( - eval_type=('mask' if self._include_mask else 'box'), - gt_dataset=gt_dataset) - else: - logging.info('Using annotation file: %s', self._annotation_file) - coco_gt = self._coco_gt - coco_predictions = coco_utils.convert_predictions_to_coco_annotations( - self._predictions) - coco_dt = coco_gt.loadRes(predictions=coco_predictions) - image_ids = [ann['image_id'] for ann in coco_predictions] - - coco_eval = cocoeval.COCOeval(coco_gt, coco_dt, iouType='bbox') - coco_eval.params.imgIds = image_ids - coco_eval.evaluate() - coco_eval.accumulate() - coco_eval.summarize() - coco_metrics = coco_eval.stats - - if self._include_mask: - mcoco_eval = cocoeval.COCOeval(coco_gt, coco_dt, iouType='segm') - mcoco_eval.params.imgIds = image_ids - mcoco_eval.evaluate() - mcoco_eval.accumulate() - mcoco_eval.summarize() - mask_coco_metrics = mcoco_eval.stats - - if self._include_mask: - metrics = np.hstack((coco_metrics, mask_coco_metrics)) - else: - metrics = coco_metrics - - # Cleans up the internal variables in order for a fresh eval next time. 
- self.reset() - - metrics_dict = {} - for i, name in enumerate(self._metric_names): - metrics_dict[name] = metrics[i].astype(np.float32) - return metrics_dict - - def _process_predictions(self, predictions): - image_scale = np.tile(predictions['image_info'][:, 2:3, :], (1, 1, 2)) - predictions['detection_boxes'] = ( - predictions['detection_boxes'].astype(np.float32)) - predictions['detection_boxes'] /= image_scale - if 'detection_outer_boxes' in predictions: - predictions['detection_outer_boxes'] = ( - predictions['detection_outer_boxes'].astype(np.float32)) - predictions['detection_outer_boxes'] /= image_scale - - def update(self, predictions, groundtruths=None): - """Update and aggregate detection results and groundtruth data. - - Args: - predictions: a dictionary of numpy arrays including the fields below. See - different parsers under `../dataloader` for more details. - Required fields: - - source_id: a numpy array of int or string of shape [batch_size]. - - image_info [if `need_rescale_bboxes` is True]: a numpy array of - float of shape [batch_size, 4, 2]. - - num_detections: a numpy array of int of shape [batch_size]. - - detection_boxes: a numpy array of float of shape [batch_size, K, 4]. - - detection_classes: a numpy array of int of shape [batch_size, K]. - - detection_scores: a numpy array of float of shape [batch_size, K]. - Optional fields: - - detection_masks: a numpy array of float of shape [batch_size, K, - mask_height, mask_width]. - groundtruths: a dictionary of numpy arrays including the fields below. See - also different parsers under `../dataloader` for more details. - Required fields: - - source_id: a numpy array of int or string of shape [batch_size]. - - height: a numpy array of int of shape [batch_size]. - - width: a numpy array of int of shape [batch_size]. - - num_detections: a numpy array of int of shape [batch_size]. - - boxes: a numpy array of float of shape [batch_size, K, 4]. 
- - classes: a numpy array of int of shape [batch_size, K]. - Optional fields: - - is_crowds: a numpy array of int of shape [batch_size, K]. If the - field is absent, it is assumed that this instance is not crowd. - - areas: a numy array of float of shape [batch_size, K]. If the field - is absent, the area is calculated using either boxes or masks - depending on which one is available. - - masks: a numpy array of float of shape [batch_size, K, mask_height, - mask_width], - - Raises: - ValueError: if the required prediction or groundtruth fields are not - present in the incoming `predictions` or `groundtruths`. - """ - for k in self._required_prediction_fields: - if k not in predictions: - raise ValueError( - 'Missing the required key `{}` in predictions!'.format(k)) - if self._need_rescale_bboxes: - self._process_predictions(predictions) - for k, v in six.iteritems(predictions): - if k not in self._predictions: - self._predictions[k] = [v] - else: - self._predictions[k].append(v) - - if not self._annotation_file: - assert groundtruths - for k in self._required_groundtruth_fields: - if k not in groundtruths: - raise ValueError( - 'Missing the required key `{}` in groundtruths!'.format(k)) - for k, v in six.iteritems(groundtruths): - if k not in self._groundtruths: - self._groundtruths[k] = [v] - else: - self._groundtruths[k].append(v) - - -class OlnXclassEvaluator(COCOEvaluator): - """COCO evaluation metric class.""" - - def __init__(self, annotation_file, include_mask, need_rescale_bboxes=True, - use_category=True, seen_class='all'): - """Constructs COCO evaluation class. - - The class provides the interface to metrics_fn in TPUEstimator. The - _update_op() takes detections from each image and push them to - self.detections. The _evaluate() loads a JSON file in COCO annotation format - as the groundtruths and runs COCO evaluation. - - Args: - annotation_file: a JSON file that stores annotations of the eval dataset. 
- If `annotation_file` is None, groundtruth annotations will be loaded - from the dataloader. - include_mask: a boolean to indicate whether or not to include the mask - eval. - need_rescale_bboxes: If true bboxes in `predictions` will be rescaled back - to absolute values (`image_info` is needed in this case). - use_category: if `False`, treat all object in all classes in one - foreground category. - seen_class: 'all' or 'voc' or 'nonvoc' - """ - super(OlnXclassEvaluator, self).__init__( - annotation_file=annotation_file, - include_mask=include_mask, - need_rescale_bboxes=need_rescale_bboxes) - self._use_category = use_category - self._seen_class = seen_class - self._seen_class_ids = class_utils.coco_split_class_ids(seen_class) - self._metric_names = [ - 'AP', 'AP50', 'AP75', - 'APs', 'APm', 'APl', - 'ARmax10', 'ARmax20', 'ARmax50', 'ARmax100', 'ARmax200', - 'ARmax10s', 'ARmax10m', 'ARmax10l' - ] - if self._seen_class != 'all': - self._metric_names.extend([ - 'AP_seen', 'AP50_seen', 'AP75_seen', - 'APs_seen', 'APm_seen', 'APl_seen', - 'ARmax10_seen', 'ARmax20_seen', 'ARmax50_seen', - 'ARmax100_seen', 'ARmax200_seen', - 'ARmax10s_seen', 'ARmax10m_seen', 'ARmax10l_seen', - - 'AP_novel', 'AP50_novel', 'AP75_novel', - 'APs_novel', 'APm_novel', 'APl_novel', - 'ARmax10_novel', 'ARmax20_novel', 'ARmax50_novel', - 'ARmax100_novel', 'ARmax200_novel', - 'ARmax10s_novel', 'ARmax10m_novel', 'ARmax10l_novel', - ]) - if self._include_mask: - mask_metric_names = ['mask_' + x for x in self._metric_names] - self._metric_names.extend(mask_metric_names) - self._required_prediction_fields.extend(['detection_masks']) - self._required_groundtruth_fields.extend(['masks']) - - self.reset() - - def evaluate(self): - """Evaluates with detections from all images with COCO API. - - Returns: - coco_metric: float numpy array with shape [24] representing the - coco-style evaluation metrics (box and mask). 
- """ - if not self._annotation_file: - logging.info('Thre is no annotation_file in COCOEvaluator.') - gt_dataset = coco_utils.convert_groundtruths_to_coco_dataset( - self._groundtruths) - coco_gt = coco_utils.COCOWrapper( - eval_type=('mask' if self._include_mask else 'box'), - gt_dataset=gt_dataset) - else: - logging.info('Using annotation file: %s', self._annotation_file) - coco_gt = self._coco_gt - - coco_predictions = coco_utils.convert_predictions_to_coco_annotations( - self._predictions) - coco_dt = coco_gt.loadRes(predictions=coco_predictions) - image_ids = [ann['image_id'] for ann in coco_predictions] - # Class manipulation: 'all' split samples -> ignored_split = 0. - for idx, ann in enumerate(coco_gt.dataset['annotations']): - coco_gt.dataset['annotations'][idx]['ignored_split'] = 0 - coco_eval = cocoeval.OlnCOCOevalXclassWrapper( - coco_gt, coco_dt, iou_type='bbox') - coco_eval.params.maxDets = [10, 20, 50, 100, 200] - coco_eval.params.imgIds = image_ids - coco_eval.params.useCats = 0 if not self._use_category else 1 - coco_eval.evaluate() - coco_eval.accumulate() - coco_eval.summarize() - coco_metrics = coco_eval.stats - - if self._include_mask: - mcoco_eval = cocoeval.OlnCOCOevalXclassWrapper( - coco_gt, coco_dt, iou_type='segm') - mcoco_eval.params.maxDets = [10, 20, 50, 100, 200] - mcoco_eval.params.imgIds = image_ids - mcoco_eval.params.useCats = 0 if not self._use_category else 1 - mcoco_eval.evaluate() - mcoco_eval.accumulate() - mcoco_eval.summarize() - mask_coco_metrics = mcoco_eval.stats - - if self._include_mask: - metrics = np.hstack((coco_metrics, mask_coco_metrics)) - else: - metrics = coco_metrics - - if self._seen_class != 'all': - # for seen class eval, samples of novel_class are ignored. 
- coco_gt_seen = copy.deepcopy(coco_gt) - for idx, ann in enumerate(coco_gt.dataset['annotations']): - if ann['category_id'] in self._seen_class_ids: - coco_gt_seen.dataset['annotations'][idx]['ignored_split'] = 0 - else: - coco_gt_seen.dataset['annotations'][idx]['ignored_split'] = 1 - coco_eval_seen = cocoeval.OlnCOCOevalXclassWrapper( - coco_gt_seen, coco_dt, iou_type='bbox') - coco_eval_seen.params.maxDets = [10, 20, 50, 100, 200] - coco_eval_seen.params.imgIds = image_ids - coco_eval_seen.params.useCats = 0 if not self._use_category else 1 - coco_eval_seen.evaluate() - coco_eval_seen.accumulate() - coco_eval_seen.summarize() - coco_metrics_seen = coco_eval_seen.stats - if self._include_mask: - mcoco_eval_seen = cocoeval.OlnCOCOevalXclassWrapper( - coco_gt_seen, coco_dt, iou_type='segm') - mcoco_eval_seen.params.maxDets = [10, 20, 50, 100, 200] - mcoco_eval_seen.params.imgIds = image_ids - mcoco_eval_seen.params.useCats = 0 if not self._use_category else 1 - mcoco_eval_seen.evaluate() - mcoco_eval_seen.accumulate() - mcoco_eval_seen.summarize() - mask_coco_metrics_seen = mcoco_eval_seen.stats - - # for novel class eval, samples of seen_class are ignored. 
- coco_gt_novel = copy.deepcopy(coco_gt) - for idx, ann in enumerate(coco_gt.dataset['annotations']): - if ann['category_id'] in self._seen_class_ids: - coco_gt_novel.dataset['annotations'][idx]['ignored_split'] = 1 - else: - coco_gt_novel.dataset['annotations'][idx]['ignored_split'] = 0 - coco_eval_novel = cocoeval.OlnCOCOevalXclassWrapper( - coco_gt_novel, coco_dt, iou_type='bbox') - coco_eval_novel.params.maxDets = [10, 20, 50, 100, 200] - coco_eval_novel.params.imgIds = image_ids - coco_eval_novel.params.useCats = 0 if not self._use_category else 1 - coco_eval_novel.evaluate() - coco_eval_novel.accumulate() - coco_eval_novel.summarize() - coco_metrics_novel = coco_eval_novel.stats - if self._include_mask: - mcoco_eval_novel = cocoeval.OlnCOCOevalXclassWrapper( - coco_gt_novel, coco_dt, iou_type='segm') - mcoco_eval_novel.params.maxDets = [10, 20, 50, 100, 200] - mcoco_eval_novel.params.imgIds = image_ids - mcoco_eval_novel.params.useCats = 0 if not self._use_category else 1 - mcoco_eval_novel.evaluate() - mcoco_eval_novel.accumulate() - mcoco_eval_novel.summarize() - mask_coco_metrics_novel = mcoco_eval_novel.stats - - # Combine all splits. - if self._include_mask: - metrics = np.hstack(( - coco_metrics, coco_metrics_seen, coco_metrics_novel, - mask_coco_metrics, mask_coco_metrics_seen, mask_coco_metrics_novel)) - else: - metrics = np.hstack(( - coco_metrics, coco_metrics_seen, coco_metrics_novel)) - - # Cleans up the internal variables in order for a fresh eval next time. - self.reset() - - metrics_dict = {} - for i, name in enumerate(self._metric_names): - metrics_dict[name] = metrics[i].astype(np.float32) - return metrics_dict - - -class OlnXdataEvaluator(OlnXclassEvaluator): - """COCO evaluation metric class.""" - - def __init__(self, annotation_file, include_mask, need_rescale_bboxes=True, - use_category=True, seen_class='all'): - """Constructs COCO evaluation class. - - The class provides the interface to metrics_fn in TPUEstimator. 
The - _update_op() takes detections from each image and push them to - self.detections. The _evaluate() loads a JSON file in COCO annotation format - as the groundtruths and runs COCO evaluation. - - Args: - annotation_file: a JSON file that stores annotations of the eval dataset. - If `annotation_file` is None, groundtruth annotations will be loaded - from the dataloader. - include_mask: a boolean to indicate whether or not to include the mask - eval. - need_rescale_bboxes: If true bboxes in `predictions` will be rescaled back - to absolute values (`image_info` is needed in this case). - use_category: if `False`, treat all object in all classes in one - foreground category. - seen_class: 'all' or 'voc' or 'nonvoc' - """ - super(OlnXdataEvaluator, self).__init__( - annotation_file=annotation_file, - include_mask=include_mask, - need_rescale_bboxes=need_rescale_bboxes, - use_category=False, - seen_class='all') - - def evaluate(self): - """Evaluates with detections from all images with COCO API. - - Returns: - coco_metric: float numpy array with shape [24] representing the - coco-style evaluation metrics (box and mask). - """ - if not self._annotation_file: - logging.info('Thre is no annotation_file in COCOEvaluator.') - gt_dataset = coco_utils.convert_groundtruths_to_coco_dataset( - self._groundtruths) - coco_gt = coco_utils.COCOWrapper( - eval_type=('mask' if self._include_mask else 'box'), - gt_dataset=gt_dataset) - else: - logging.info('Using annotation file: %s', self._annotation_file) - coco_gt = self._coco_gt - coco_predictions = coco_utils.convert_predictions_to_coco_annotations( - self._predictions) - coco_dt = coco_gt.loadRes(predictions=coco_predictions) - image_ids = [ann['image_id'] for ann in coco_predictions] - # Class manipulation: 'all' split samples -> ignored_split = 0. 
- for idx, _ in enumerate(coco_gt.dataset['annotations']): - coco_gt.dataset['annotations'][idx]['ignored_split'] = 0 - coco_eval = cocoeval.OlnCOCOevalWrapper(coco_gt, coco_dt, iou_type='bbox') - coco_eval.params.maxDets = [10, 20, 50, 100, 200] - coco_eval.params.imgIds = image_ids - coco_eval.params.useCats = 0 if not self._use_category else 1 - coco_eval.evaluate() - coco_eval.accumulate() - coco_eval.summarize() - coco_metrics = coco_eval.stats - - if self._include_mask: - mcoco_eval = cocoeval.OlnCOCOevalWrapper(coco_gt, coco_dt, - iou_type='segm') - mcoco_eval.params.maxDets = [10, 20, 50, 100, 200] - mcoco_eval.params.imgIds = image_ids - mcoco_eval.params.useCats = 0 if not self._use_category else 1 - mcoco_eval.evaluate() - mcoco_eval.accumulate() - mcoco_eval.summarize() - mask_coco_metrics = mcoco_eval.stats - - if self._include_mask: - metrics = np.hstack((coco_metrics, mask_coco_metrics)) - else: - metrics = coco_metrics - - # Cleans up the internal variables in order for a fresh eval next time. - self.reset() - - metrics_dict = {} - for i, name in enumerate(self._metric_names): - metrics_dict[name] = metrics[i].astype(np.float32) - return metrics_dict - - -class ShapeMaskCOCOEvaluator(COCOEvaluator): - """COCO evaluation metric class for ShapeMask.""" - - def __init__(self, mask_eval_class, **kwargs): - """Constructs COCO evaluation class. - - The class provides the interface to metrics_fn in TPUEstimator. The - _update_op() takes detections from each image and push them to - self.detections. The _evaluate() loads a JSON file in COCO annotation format - as the groundtruths and runs COCO evaluation. - - Args: - mask_eval_class: the set of classes for mask evaluation. - **kwargs: other keyword arguments passed to the parent class initializer. 
- """ - super(ShapeMaskCOCOEvaluator, self).__init__(**kwargs) - self._mask_eval_class = mask_eval_class - self._eval_categories = class_utils.coco_split_class_ids(mask_eval_class) - if mask_eval_class != 'all': - self._metric_names = [ - x.replace('mask', 'novel_mask') for x in self._metric_names - ] - - def evaluate(self): - """Evaluates with detections from all images with COCO API. - - Returns: - coco_metric: float numpy array with shape [24] representing the - coco-style evaluation metrics (box and mask). - """ - if not self._annotation_file: - gt_dataset = coco_utils.convert_groundtruths_to_coco_dataset( - self._groundtruths) - coco_gt = coco_utils.COCOWrapper( - eval_type=('mask' if self._include_mask else 'box'), - gt_dataset=gt_dataset) - else: - coco_gt = self._coco_gt - coco_predictions = coco_utils.convert_predictions_to_coco_annotations( - self._predictions) - coco_dt = coco_gt.loadRes(predictions=coco_predictions) - image_ids = [ann['image_id'] for ann in coco_predictions] - - coco_eval = cocoeval.COCOeval(coco_gt, coco_dt, iouType='bbox') - coco_eval.params.imgIds = image_ids - coco_eval.evaluate() - coco_eval.accumulate() - coco_eval.summarize() - coco_metrics = coco_eval.stats - - if self._include_mask: - mcoco_eval = cocoeval.COCOeval(coco_gt, coco_dt, iouType='segm') - mcoco_eval.params.imgIds = image_ids - mcoco_eval.evaluate() - mcoco_eval.accumulate() - mcoco_eval.summarize() - if self._mask_eval_class == 'all': - metrics = np.hstack((coco_metrics, mcoco_eval.stats)) - else: - mask_coco_metrics = mcoco_eval.category_stats - val_catg_idx = np.isin(mcoco_eval.params.catIds, self._eval_categories) - # Gather the valid evaluation of the eval categories. 
- if np.any(val_catg_idx): - mean_val_metrics = [] - for mid in range(len(self._metric_names) // 2): - mean_val_metrics.append( - np.nanmean(mask_coco_metrics[mid][val_catg_idx])) - - mean_val_metrics = np.array(mean_val_metrics) - else: - mean_val_metrics = np.zeros(len(self._metric_names) // 2) - metrics = np.hstack((coco_metrics, mean_val_metrics)) - else: - metrics = coco_metrics - - # Cleans up the internal variables in order for a fresh eval next time. - self.reset() - - metrics_dict = {} - for i, name in enumerate(self._metric_names): - metrics_dict[name] = metrics[i].astype(np.float32) - return metrics_dict diff --git a/official/vision/detection/evaluation/coco_utils.py b/official/vision/detection/evaluation/coco_utils.py deleted file mode 100644 index 0f289d354bacfc97e48c2b7d5af9fb6f72feae77..0000000000000000000000000000000000000000 --- a/official/vision/detection/evaluation/coco_utils.py +++ /dev/null @@ -1,374 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Util functions related to pycocotools and COCO eval.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import copy -import json - -from absl import logging -import numpy as np -from PIL import Image -from pycocotools import coco -from pycocotools import mask as mask_api -import six -import tensorflow as tf - -from official.vision.detection.dataloader import tf_example_decoder -from official.vision.detection.utils import box_utils -from official.vision.detection.utils import mask_utils - - -class COCOWrapper(coco.COCO): - """COCO wrapper class. - - This class wraps COCO API object, which provides the following additional - functionalities: - 1. Support string type image id. - 2. Support loading the groundtruth dataset using the external annotation - dictionary. - 3. Support loading the prediction results using the external annotation - dictionary. - """ - - def __init__(self, eval_type='box', annotation_file=None, gt_dataset=None): - """Instantiates a COCO-style API object. - - Args: - eval_type: either 'box' or 'mask'. - annotation_file: a JSON file that stores annotations of the eval dataset. - This is required if `gt_dataset` is not provided. - gt_dataset: the groundtruth eval datatset in COCO API format. - """ - if ((annotation_file and gt_dataset) or - ((not annotation_file) and (not gt_dataset))): - raise ValueError('One and only one of `annotation_file` and `gt_dataset` ' - 'needs to be specified.') - - if eval_type not in ['box', 'mask']: - raise ValueError('The `eval_type` can only be either `box` or `mask`.') - - coco.COCO.__init__(self, annotation_file=annotation_file) - self._eval_type = eval_type - if gt_dataset: - self.dataset = gt_dataset - self.createIndex() - - def loadRes(self, predictions): - """Loads result file and return a result api object. - - Args: - predictions: a list of dictionary each representing an annotation in COCO - format. 
The required fields are `image_id`, `category_id`, `score`, - `bbox`, `segmentation`. - - Returns: - res: result COCO api object. - - Raises: - ValueError: if the set of image id from predctions is not the subset of - the set of image id of the groundtruth dataset. - """ - res = coco.COCO() - res.dataset['images'] = copy.deepcopy(self.dataset['images']) - res.dataset['categories'] = copy.deepcopy(self.dataset['categories']) - - image_ids = [ann['image_id'] for ann in predictions] - if set(image_ids) != (set(image_ids) & set(self.getImgIds())): - raise ValueError('Results do not correspond to the current dataset!') - for ann in predictions: - x1, x2, y1, y2 = [ann['bbox'][0], ann['bbox'][0] + ann['bbox'][2], - ann['bbox'][1], ann['bbox'][1] + ann['bbox'][3]] - if self._eval_type == 'box': - ann['area'] = ann['bbox'][2] * ann['bbox'][3] - ann['segmentation'] = [ - [x1, y1, x1, y2, x2, y2, x2, y1]] - elif self._eval_type == 'mask': - ann['area'] = mask_api.area(ann['segmentation']) - - res.dataset['annotations'] = copy.deepcopy(predictions) - res.createIndex() - return res - - -def convert_predictions_to_coco_annotations(predictions): - """Converts a batch of predictions to annotations in COCO format. - - Args: - predictions: a dictionary of lists of numpy arrays including the following - fields. K below denotes the maximum number of instances per image. - Required fields: - - source_id: a list of numpy arrays of int or string of shape - [batch_size]. - - num_detections: a list of numpy arrays of int of shape [batch_size]. - - detection_boxes: a list of numpy arrays of float of shape - [batch_size, K, 4], where coordinates are in the original image - space (not the scaled image space). - - detection_classes: a list of numpy arrays of int of shape - [batch_size, K]. - - detection_scores: a list of numpy arrays of float of shape - [batch_size, K]. - Optional fields: - - detection_masks: a list of numpy arrays of float of shape - [batch_size, K, mask_height, mask_width]. 
- - Returns: - coco_predictions: prediction in COCO annotation format. - """ - coco_predictions = [] - num_batches = len(predictions['source_id']) - batch_size = predictions['source_id'][0].shape[0] - max_num_detections = predictions['detection_classes'][0].shape[1] - use_outer_box = 'detection_outer_boxes' in predictions - for i in range(num_batches): - predictions['detection_boxes'][i] = box_utils.yxyx_to_xywh( - predictions['detection_boxes'][i]) - if use_outer_box: - predictions['detection_outer_boxes'][i] = box_utils.yxyx_to_xywh( - predictions['detection_outer_boxes'][i]) - mask_boxes = predictions['detection_outer_boxes'] - else: - mask_boxes = predictions['detection_boxes'] - - for j in range(batch_size): - if 'detection_masks' in predictions: - image_masks = mask_utils.paste_instance_masks( - predictions['detection_masks'][i][j], - mask_boxes[i][j], - int(predictions['image_info'][i][j, 0, 0]), - int(predictions['image_info'][i][j, 0, 1])) - binary_masks = (image_masks > 0.0).astype(np.uint8) - encoded_masks = [ - mask_api.encode(np.asfortranarray(binary_mask)) - for binary_mask in list(binary_masks)] - for k in range(max_num_detections): - ann = {} - ann['image_id'] = predictions['source_id'][i][j] - ann['category_id'] = predictions['detection_classes'][i][j, k] - ann['bbox'] = predictions['detection_boxes'][i][j, k] - ann['score'] = predictions['detection_scores'][i][j, k] - if 'detection_masks' in predictions: - ann['segmentation'] = encoded_masks[k] - coco_predictions.append(ann) - - for i, ann in enumerate(coco_predictions): - ann['id'] = i + 1 - - return coco_predictions - - -def convert_groundtruths_to_coco_dataset(groundtruths, label_map=None): - """Converts groundtruths to the dataset in COCO format. - - Args: - groundtruths: a dictionary of numpy arrays including the fields below. - Note that each element in the list represent the number for a single - example without batch dimension. 
K below denotes the actual number of - instances for each image. - Required fields: - - source_id: a list of numpy arrays of int or string of shape - [batch_size]. - - height: a list of numpy arrays of int of shape [batch_size]. - - width: a list of numpy arrays of int of shape [batch_size]. - - num_detections: a list of numpy arrays of int of shape [batch_size]. - - boxes: a list of numpy arrays of float of shape [batch_size, K, 4], - where coordinates are in the original image space (not the - normalized coordinates). - - classes: a list of numpy arrays of int of shape [batch_size, K]. - Optional fields: - - is_crowds: a list of numpy arrays of int of shape [batch_size, K]. If - th field is absent, it is assumed that this instance is not crowd. - - areas: a list of numy arrays of float of shape [batch_size, K]. If the - field is absent, the area is calculated using either boxes or - masks depending on which one is available. - - masks: a list of numpy arrays of string of shape [batch_size, K], - label_map: (optional) a dictionary that defines items from the category id - to the category name. If `None`, collect the category mappping from the - `groundtruths`. - - Returns: - coco_groundtruths: the groundtruth dataset in COCO format. 
- """ - source_ids = np.concatenate(groundtruths['source_id'], axis=0) - heights = np.concatenate(groundtruths['height'], axis=0) - widths = np.concatenate(groundtruths['width'], axis=0) - gt_images = [{'id': int(i), 'height': int(h), 'width': int(w)} for i, h, w - in zip(source_ids, heights, widths)] - - gt_annotations = [] - num_batches = len(groundtruths['source_id']) - batch_size = groundtruths['source_id'][0].shape[0] - for i in range(num_batches): - for j in range(batch_size): - num_instances = groundtruths['num_detections'][i][j] - for k in range(num_instances): - ann = {} - ann['image_id'] = int(groundtruths['source_id'][i][j]) - if 'is_crowds' in groundtruths: - ann['iscrowd'] = int(groundtruths['is_crowds'][i][j, k]) - else: - ann['iscrowd'] = 0 - ann['category_id'] = int(groundtruths['classes'][i][j, k]) - boxes = groundtruths['boxes'][i] - ann['bbox'] = [ - float(boxes[j, k, 1]), - float(boxes[j, k, 0]), - float(boxes[j, k, 3] - boxes[j, k, 1]), - float(boxes[j, k, 2] - boxes[j, k, 0])] - if 'areas' in groundtruths: - ann['area'] = float(groundtruths['areas'][i][j, k]) - else: - ann['area'] = float( - (boxes[j, k, 3] - boxes[j, k, 1]) * - (boxes[j, k, 2] - boxes[j, k, 0])) - if 'masks' in groundtruths: - mask = Image.open(six.BytesIO(groundtruths['masks'][i][j, k])) - width, height = mask.size - np_mask = ( - np.array(mask.getdata()).reshape(height, width).astype(np.uint8)) - np_mask[np_mask > 0] = 255 - encoded_mask = mask_api.encode(np.asfortranarray(np_mask)) - ann['segmentation'] = encoded_mask - if 'areas' not in groundtruths: - ann['area'] = mask_api.area(encoded_mask) - gt_annotations.append(ann) - - for i, ann in enumerate(gt_annotations): - ann['id'] = i + 1 - - if label_map: - gt_categories = [{'id': i, 'name': label_map[i]} for i in label_map] - else: - category_ids = [gt['category_id'] for gt in gt_annotations] - gt_categories = [{'id': i} for i in set(category_ids)] - - gt_dataset = { - 'images': gt_images, - 'categories': gt_categories, - 
'annotations': copy.deepcopy(gt_annotations), - } - return gt_dataset - - -class COCOGroundtruthGenerator(object): - """Generates the groundtruth annotations from a single example.""" - - def __init__(self, file_pattern, num_examples, include_mask): - self._file_pattern = file_pattern - self._num_examples = num_examples - self._include_mask = include_mask - self._dataset_fn = tf.data.TFRecordDataset - - def _parse_single_example(self, example): - """Parses a single serialized tf.Example proto. - - Args: - example: a serialized tf.Example proto string. - - Returns: - A dictionary of groundtruth with the following fields: - source_id: a scalar tensor of int64 representing the image source_id. - height: a scalar tensor of int64 representing the image height. - width: a scalar tensor of int64 representing the image width. - boxes: a float tensor of shape [K, 4], representing the groundtruth - boxes in absolute coordinates with respect to the original image size. - classes: a int64 tensor of shape [K], representing the class labels of - each instances. - is_crowds: a bool tensor of shape [K], indicating whether the instance - is crowd. - areas: a float tensor of shape [K], indicating the area of each - instance. - masks: a string tensor of shape [K], containing the bytes of the png - mask of each instance. 
- """ - decoder = tf_example_decoder.TfExampleDecoder( - include_mask=self._include_mask) - decoded_tensors = decoder.decode(example) - - image = decoded_tensors['image'] - image_size = tf.shape(image)[0:2] - boxes = box_utils.denormalize_boxes( - decoded_tensors['groundtruth_boxes'], image_size) - groundtruths = { - 'source_id': tf.string_to_number( - decoded_tensors['source_id'], out_type=tf.int64), - 'height': decoded_tensors['height'], - 'width': decoded_tensors['width'], - 'num_detections': tf.shape(decoded_tensors['groundtruth_classes'])[0], - 'boxes': boxes, - 'classes': decoded_tensors['groundtruth_classes'], - 'is_crowds': decoded_tensors['groundtruth_is_crowd'], - 'areas': decoded_tensors['groundtruth_area'], - } - if self._include_mask: - groundtruths.update({ - 'masks': decoded_tensors['groundtruth_instance_masks_png'], - }) - return groundtruths - - def _build_pipeline(self): - """Builds data pipeline to generate groundtruth annotations.""" - dataset = tf.data.Dataset.list_files(self._file_pattern, shuffle=False) - dataset = dataset.apply( - tf.data.experimental.parallel_interleave( - lambda filename: self._dataset_fn(filename).prefetch(1), - cycle_length=32, - sloppy=False)) - dataset = dataset.map(self._parse_single_example, num_parallel_calls=64) - dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE) - dataset = dataset.batch(1, drop_remainder=False) - return dataset - - def __call__(self): - with tf.Graph().as_default(): - dataset = self._build_pipeline() - groundtruth = dataset.make_one_shot_iterator().get_next() - - with tf.Session() as sess: - for _ in range(self._num_examples): - groundtruth_result = sess.run(groundtruth) - yield groundtruth_result - - -def scan_and_generator_annotation_file(file_pattern, - num_samples, - include_mask, - annotation_file): - """Scans and generate the COCO-style annotation JSON file given a dataset.""" - groundtruth_generator = COCOGroundtruthGenerator( - file_pattern, num_samples, include_mask) - 
generate_annotation_file(groundtruth_generator, annotation_file) - - -def generate_annotation_file(groundtruth_generator, - annotation_file): - """Generates COCO-style annotation JSON file given a groundtruth generator.""" - groundtruths = {} - logging.info('Loading groundtruth annotations from dataset to memory...') - for groundtruth in groundtruth_generator(): - for k, v in six.iteritems(groundtruth): - if k not in groundtruths: - groundtruths[k] = [v] - else: - groundtruths[k].append(v) - gt_dataset = convert_groundtruths_to_coco_dataset(groundtruths) - - logging.info('Saving groundtruth annotations to the JSON file...') - with tf.io.gfile.GFile(annotation_file, 'w') as f: - f.write(json.dumps(gt_dataset)) - logging.info('Done saving the JSON file...') diff --git a/official/vision/detection/evaluation/factory.py b/official/vision/detection/evaluation/factory.py deleted file mode 100644 index fcc543bfd00b72c08540088f74d89e410569d020..0000000000000000000000000000000000000000 --- a/official/vision/detection/evaluation/factory.py +++ /dev/null @@ -1,52 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Evaluator factory.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from official.vision.detection.evaluation import coco_evaluator - - -def evaluator_generator(params): - """Generator function for various evaluators.""" - if params.type == 'box': - evaluator = coco_evaluator.COCOEvaluator( - annotation_file=params.val_json_file, include_mask=False) - elif params.type == 'box_and_mask': - evaluator = coco_evaluator.COCOEvaluator( - annotation_file=params.val_json_file, include_mask=True) - elif params.type == 'oln_xclass_box': - evaluator = coco_evaluator.OlnXclassEvaluator( - annotation_file=params.val_json_file, include_mask=False, - use_category=False, seen_class=params.seen_class,) - elif params.type == 'oln_xclass_box_and_mask': - evaluator = coco_evaluator.OlnXclassEvaluator( - annotation_file=params.val_json_file, include_mask=True, - use_category=False, seen_class=params.seen_class,) - elif params.type == 'oln_xdata_box': - evaluator = coco_evaluator.OlnXdataEvaluator( - annotation_file=params.val_json_file, include_mask=False, - use_category=False, seen_class='all',) - elif params.type == 'shapemask_box_and_mask': - evaluator = coco_evaluator.ShapeMaskCOCOEvaluator( - mask_eval_class=params.mask_eval_class, - annotation_file=params.val_json_file, include_mask=True) - - else: - raise ValueError('Evaluator %s is not supported.' % params.type) - - return coco_evaluator.MetricWrapper(evaluator) diff --git a/official/vision/detection/executor/__init__.py b/official/vision/detection/executor/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/detection/executor/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
diff --git a/official/vision/detection/executor/detection_executor.py b/official/vision/detection/executor/detection_executor.py
deleted file mode 100644
index 7a86d13385821729b3dbe2cf1650d6bb72ffc717..0000000000000000000000000000000000000000
--- a/official/vision/detection/executor/detection_executor.py
+++ /dev/null
@@ -1,159 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""An executor class for running model on TensorFlow 2.0."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-from absl import logging
-
-import tensorflow as tf
-from official.vision.detection.executor import distributed_executor as executor
-from official.vision.utils.object_detection import visualization_utils
-
-
-class DetectionDistributedExecutor(executor.DistributedExecutor):
-  """Detection specific customer training loop executor.
-
-  Subclasses the DistributedExecutor and adds support for numpy based metrics.
-  """
-
-  def __init__(self,
-               predict_post_process_fn=None,
-               trainable_variables_filter=None,
-               **kwargs):
-    super(DetectionDistributedExecutor, self).__init__(**kwargs)
-    if predict_post_process_fn:
-      assert callable(predict_post_process_fn)
-    if trainable_variables_filter:
-      assert callable(trainable_variables_filter)
-    self._predict_post_process_fn = predict_post_process_fn
-    self._trainable_variables_filter = trainable_variables_filter
-    self.eval_steps = tf.Variable(
-        0,
-        trainable=False,
-        dtype=tf.int32,
-        synchronization=tf.VariableSynchronization.ON_READ,
-        aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA,
-        shape=[])
-
-  def _create_replicated_step(self,
-                              strategy,
-                              model,
-                              loss_fn,
-                              optimizer,
-                              metric=None):
-    trainable_variables = model.trainable_variables
-    if self._trainable_variables_filter:
-      trainable_variables = self._trainable_variables_filter(
-          trainable_variables)
-    logging.info('Filter trainable variables from %d to %d',
-                 len(model.trainable_variables), len(trainable_variables))
-    update_state_fn = lambda labels, outputs: None
-    if isinstance(metric, tf.keras.metrics.Metric):
-      update_state_fn = metric.update_state
-    else:
-      logging.error('Detection: train metric is not an instance of '
-                    'tf.keras.metrics.Metric.')
-
-    def _replicated_step(inputs):
-      """Replicated training step."""
-      inputs, labels = inputs
-
-      with tf.GradientTape() as tape:
-        outputs = model(inputs, training=True)
-        all_losses = loss_fn(labels, outputs)
-        losses = {}
-        for k, v in all_losses.items():
-          losses[k] = tf.reduce_mean(v)
-        per_replica_loss = losses['total_loss'] / strategy.num_replicas_in_sync
-        update_state_fn(labels, outputs)
-
-      grads = tape.gradient(per_replica_loss, trainable_variables)
-      clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=1.0)
-      optimizer.apply_gradients(zip(clipped_grads, trainable_variables))
-      return losses
-
-    return _replicated_step
-
-  def _create_test_step(self, strategy, model, metric):
-    """Creates a distributed test step."""
-
-    @tf.function
-    def test_step(iterator, eval_steps):
-      """Calculates evaluation metrics on distributed devices."""
-
-      def _test_step_fn(inputs, eval_steps):
-        """Replicated accuracy calculation."""
-        inputs, labels = inputs
-        model_outputs = model(inputs, training=False)
-        if self._predict_post_process_fn:
-          labels, prediction_outputs = self._predict_post_process_fn(
-              labels, model_outputs)
-          num_remaining_visualizations = (
-              self._params.eval.num_images_to_visualize - eval_steps)
-          # If there are remaining number of visualizations that needs to be
-          # done, add next batch outputs for visualization.
-          #
-          # TODO(hongjunchoi): Once dynamic slicing is supported on TPU, only
-          # write correct slice of outputs to summary file.
-          if num_remaining_visualizations > 0:
-            visualization_utils.visualize_images_with_bounding_boxes(
-                inputs, prediction_outputs['detection_boxes'],
-                self.global_train_step, self.eval_summary_writer)
-
-        return labels, prediction_outputs
-
-      labels, outputs = strategy.run(
-          _test_step_fn, args=(
-              next(iterator),
-              eval_steps,
-          ))
-      outputs = tf.nest.map_structure(strategy.experimental_local_results,
-                                      outputs)
-      labels = tf.nest.map_structure(strategy.experimental_local_results,
-                                     labels)
-
-      eval_steps.assign_add(self._params.eval.batch_size)
-      return labels, outputs
-
-    return test_step
-
-  def _run_evaluation(self, test_step, current_training_step, metric,
-                      test_iterator):
-    """Runs validation steps and aggregate metrics."""
-    self.eval_steps.assign(0)
-    if not test_iterator or not metric:
-      logging.warning(
-          'Both test_iterator (%s) and metrics (%s) must not be None.',
-          test_iterator, metric)
-      return None
-    logging.info('Running evaluation after step: %s.', current_training_step)
-    while True:
-      try:
-        labels, outputs = test_step(test_iterator, self.eval_steps)
-        if metric:
-          metric.update_state(labels, outputs)
-      except (StopIteration, tf.errors.OutOfRangeError):
-        break
-
-    metric_result = metric.result()
-    if isinstance(metric, tf.keras.metrics.Metric):
-      metric_result = tf.nest.map_structure(lambda x: x.numpy().astype(float),
-                                            metric_result)
-    logging.info('Step: [%d] Validation metric = %s', current_training_step,
-                 metric_result)
-    return metric_result
diff --git a/official/vision/detection/executor/distributed_executor.py b/official/vision/detection/executor/distributed_executor.py
deleted file mode 100644
index 145222aebea9cfe32962ba0d524e1df220811243..0000000000000000000000000000000000000000
--- a/official/vision/detection/executor/distributed_executor.py
+++ /dev/null
@@ -1,805 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Custom training loop for running TensorFlow 2.0 models."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-from absl import flags
-from absl import logging
-
-import numpy as np
-import tensorflow as tf
-
-# pylint: disable=unused-import,g-import-not-at-top,redefined-outer-name,reimported
-from typing import Optional, Dict, List, Text, Callable, Union, Iterator, Any
-from official.modeling.hyperparams import params_dict
-from official.utils import hyperparams_flags
-from official.common import distribute_utils
-from official.utils.misc import keras_utils
-
-FLAGS = flags.FLAGS
-
-strategy_flags_dict = hyperparams_flags.strategy_flags_dict
-hparam_flags_dict = hyperparams_flags.hparam_flags_dict
-
-
-def _save_checkpoint(checkpoint, model_dir, checkpoint_prefix):
-  """Saves model to model_dir with provided checkpoint prefix."""
-
-  checkpoint_path = os.path.join(model_dir, checkpoint_prefix)
-  saved_path = checkpoint.save(checkpoint_path)
-  logging.info('Saving model as TF checkpoint: %s', saved_path)
-
-
-def _steps_to_run(current_step, total_steps, steps_per_loop):
-  """Calculates steps to run on device."""
-  if steps_per_loop <= 0:
-    raise ValueError('steps_per_loop should be positive integer.')
-  return min(total_steps - current_step, steps_per_loop)
-
-
-def _no_metric():
-  return None
-
-
-def metrics_as_dict(metric):
-  """Puts input metric(s) into a list.
-
-  Args:
-    metric: metric(s) to be put into the list. `metric` could be an object, a
-      list, or a dict of tf.keras.metrics.Metric or has the `required_method`.
-
-  Returns:
-    A dictionary of valid metrics.
-  """
-  if isinstance(metric, tf.keras.metrics.Metric):
-    metrics = {metric.name: metric}
-  elif isinstance(metric, list):
-    metrics = {m.name: m for m in metric}
-  elif isinstance(metric, dict):
-    metrics = metric
-  elif not metric:
-    return {}
-  else:
-    metrics = {'metric': metric}
-  return metrics
-
-
-def metric_results(metric):
-  """Collects results from the given metric(s)."""
-  metrics = metrics_as_dict(metric)
-  metric_result = {
-      name: m.result().numpy().astype(float) for name, m in metrics.items()
-  }
-  return metric_result
-
-
-def reset_states(metric):
-  """Resets states of the given metric(s)."""
-  metrics = metrics_as_dict(metric)
-  for m in metrics.values():
-    m.reset_states()
-
-
-class SummaryWriter(object):
-  """Simple SummaryWriter for writing dictionary of metrics.
-
-  Attributes:
-    writer: The tf.SummaryWriter.
-  """
-
-  def __init__(self, model_dir: Text, name: Text):
-    """Inits SummaryWriter with paths.
-
-    Args:
-      model_dir: the model folder path.
-      name: the summary subfolder name.
-    """
-    self.writer = tf.summary.create_file_writer(os.path.join(model_dir, name))
-
-  def __call__(self, metrics: Union[Dict[Text, float], float], step: int):
-    """Write metrics to summary with the given writer.
-
-    Args:
-      metrics: a dictionary of metrics values. Prefer dictionary.
-      step: integer. The training step.
-    """
-    if not isinstance(metrics, dict):
-      # Support scalar metric without name.
-      logging.warning('Warning: summary writer prefer metrics as dictionary.')
-      metrics = {'metric': metrics}
-
-    with self.writer.as_default():
-      for k, v in metrics.items():
-        tf.summary.scalar(k, v, step=step)
-      self.writer.flush()
-
-
-class DistributedExecutor(object):
-  """Interface to train and eval models with tf.distribute.Strategy."""
-
-  def __init__(self, strategy, params, model_fn, loss_fn, is_multi_host=False):
-    """Constructor.
-
-    Args:
-      strategy: an instance of tf.distribute.Strategy.
-      params: Model configuration needed to run distribution strategy.
-      model_fn: Keras model function. Signature:
-        (params: ParamsDict) -> tf.keras.models.Model.
-      loss_fn: loss function. Signature:
-        (y_true: Tensor, y_pred: Tensor) -> Tensor
-      is_multi_host: Set to True when using multi hosts for training, like multi
-        worker GPU or TPU pod (slice). Otherwise, False.
-    """
-
-    self._params = params
-    self._model_fn = model_fn
-    self._loss_fn = loss_fn
-    self._strategy = strategy
-    self._checkpoint_name = 'ctl_step_{step}.ckpt'
-    self._is_multi_host = is_multi_host
-    self.train_summary_writer = None
-    self.eval_summary_writer = None
-    self.global_train_step = None
-
-  @property
-  def checkpoint_name(self):
-    """Returns default checkpoint name."""
-    return self._checkpoint_name
-
-  @checkpoint_name.setter
-  def checkpoint_name(self, name):
-    """Sets default summary writer for the current thread."""
-    self._checkpoint_name = name
-
-  def loss_fn(self):
-    return self._loss_fn()
-
-  def model_fn(self, params):
-    return self._model_fn(params)
-
-  def _save_config(self, model_dir):
-    """Save parameters to config files if model_dir is defined."""
-
-    logging.info('Save config to model_dir %s.', model_dir)
-    if model_dir:
-      if not tf.io.gfile.exists(model_dir):
-        tf.io.gfile.makedirs(model_dir)
-      self._params.lock()
-      params_dict.save_params_dict_to_yaml(self._params,
-                                           model_dir + '/params.yaml')
-    else:
-      logging.warning('model_dir is empty, so skip the save config.')
-
-  def _get_input_iterator(
-      self, input_fn: Callable[..., tf.data.Dataset],
-      strategy: tf.distribute.Strategy) -> Optional[Iterator[Any]]:
-    """Returns distributed dataset iterator.
-
-    Args:
-      input_fn: (params: dict) -> tf.data.Dataset.
-      strategy: an instance of tf.distribute.Strategy.
-
-    Returns:
-      An iterator that yields input tensors.
-    """
-
-    if input_fn is None:
-      return None
-    # When training with multiple TPU workers, datasets needs to be cloned
-    # across workers. Since Dataset instance cannot be cloned in eager mode,
-    # we instead pass callable that returns a dataset.
-    if self._is_multi_host:
-      return iter(strategy.distribute_datasets_from_function(input_fn))
-    else:
-      input_data = input_fn()
-      return iter(strategy.experimental_distribute_dataset(input_data))
-
-  def _create_replicated_step(self,
-                              strategy,
-                              model,
-                              loss_fn,
-                              optimizer,
-                              metric=None):
-    """Creates a single training step.
-
-    Args:
-      strategy: an instance of tf.distribute.Strategy.
-      model: (Tensor, bool) -> Tensor. model function.
-      loss_fn: (y_true: Tensor, y_pred: Tensor) -> Tensor.
-      optimizer: tf.keras.optimizers.Optimizer.
-      metric: tf.keras.metrics.Metric subclass.
-
-    Returns:
-      The training step callable.
-    """
-    metrics = metrics_as_dict(metric)
-
-    def _replicated_step(inputs):
-      """Replicated training step."""
-      inputs, labels = inputs
-
-      with tf.GradientTape() as tape:
-        outputs = model(inputs, training=True)
-        prediction_loss = loss_fn(labels, outputs)
-        loss = tf.reduce_mean(prediction_loss)
-        loss = loss / strategy.num_replicas_in_sync
-        for m in metrics.values():
-          m.update_state(labels, outputs)
-
-      grads = tape.gradient(loss, model.trainable_variables)
-      optimizer.apply_gradients(zip(grads, model.trainable_variables))
-      return loss
-
-    return _replicated_step
-
-  def _create_train_step(self,
-                         strategy,
-                         model,
-                         loss_fn,
-                         optimizer,
-                         metric=None):
-    """Creates a distributed training step.
-
-    Args:
-      strategy: an instance of tf.distribute.Strategy.
-      model: (Tensor, bool) -> Tensor. model function.
-      loss_fn: (y_true: Tensor, y_pred: Tensor) -> Tensor.
-      optimizer: tf.keras.optimizers.Optimizer.
-      metric: tf.keras.metrics.Metric subclass.
-
-    Returns:
-      The training step callable.
-    """
-    replicated_step = self._create_replicated_step(strategy, model, loss_fn,
-                                                   optimizer, metric)
-
-    @tf.function
-    def train_step(iterator, num_steps):
-      """Performs a distributed training step.
-
-      Args:
-        iterator: an iterator that yields input tensors.
-        num_steps: the number of steps in the loop.
-
-      Returns:
-        The loss tensor.
-      """
-      if not isinstance(num_steps, tf.Tensor):
-        raise ValueError('steps should be an Tensor. Python object may cause '
-                         'retracing.')
-
-      per_replica_losses = strategy.run(replicated_step, args=(next(iterator),))
-      for _ in tf.range(num_steps - 1):
-        per_replica_losses = strategy.run(
-            replicated_step, args=(next(iterator),))
-
-      # For reporting, we returns the mean of losses.
-      losses = tf.nest.map_structure(
-          lambda x: strategy.reduce(tf.distribute.ReduceOp.MEAN, x, axis=None),
-          per_replica_losses)
-      return losses
-
-    return train_step
-
-  def _create_test_step(self, strategy, model, metric):
-    """Creates a distributed test step."""
-    metrics = metrics_as_dict(metric)
-
-    @tf.function
-    def test_step(iterator):
-      """Calculates evaluation metrics on distributed devices."""
-      if not metric:
-        logging.info('Skip test_step because metric is None (%s)', metric)
-        return None, None
-
-      def _test_step_fn(inputs):
-        """Replicated accuracy calculation."""
-        inputs, labels = inputs
-        model_outputs = model(inputs, training=False)
-        for m in metrics.values():
-          m.update_state(labels, model_outputs)
-        return labels, model_outputs
-
-      return strategy.run(_test_step_fn, args=(next(iterator),))
-
-    return test_step
-
-  def train(
-      self,
-      train_input_fn: Callable[[params_dict.ParamsDict], tf.data.Dataset],
-      eval_input_fn: Optional[Callable[[params_dict.ParamsDict],
-                                       tf.data.Dataset]] = None,
-      model_dir: Optional[Text] = None,
-      total_steps: int = 1,
-      iterations_per_loop: int = 1,
-      train_metric_fn: Optional[Callable[[], Any]] = None,
-      eval_metric_fn: Optional[Callable[[], Any]] = None,
-      summary_writer_fn: Callable[[Text, Text], SummaryWriter] = SummaryWriter,
-      init_checkpoint: Optional[Callable[[tf.keras.Model], Any]] = None,
-      custom_callbacks: Optional[List[tf.keras.callbacks.Callback]] = None,
-      continuous_eval: bool = False,
-      save_config: bool = True):
-    """Runs distributed training.
-
-    Args:
-      train_input_fn: (params: dict) -> tf.data.Dataset training data input
-        function.
-      eval_input_fn: (Optional) same type as train_input_fn. If not None, will
-        trigger evaluating metric on eval data. If None, will not run the eval
-        step.
-      model_dir: the folder path for model checkpoints.
-      total_steps: total training steps.
-      iterations_per_loop: train steps per loop. After each loop, this job will
-        update metrics like loss and save checkpoint.
-      train_metric_fn: metric_fn for evaluation in train_step.
-      eval_metric_fn: metric_fn for evaluation in test_step.
-      summary_writer_fn: function to create summary writer.
-      init_checkpoint: function to load checkpoint.
-      custom_callbacks: A list of Keras Callbacks objects to run during
-        training. More specifically, `on_batch_begin()`, `on_batch_end()`,
-        methods are invoked during training.
-      continuous_eval: If `True`, will continously run evaluation on every
-        available checkpoints. If `False`, will do the evaluation once after the
-        final step.
-      save_config: bool. Whether to save params to model_dir.
-
-    Returns:
-      The training loss and eval metrics.
-    """
-    assert train_input_fn is not None
-    if train_metric_fn and not callable(train_metric_fn):
-      raise ValueError('if `train_metric_fn` is specified, '
-                       'train_metric_fn must be a callable.')
-    if eval_metric_fn and not callable(eval_metric_fn):
-      raise ValueError('if `eval_metric_fn` is specified, '
-                       'eval_metric_fn must be a callable.')
-    train_metric_fn = train_metric_fn or _no_metric
-    eval_metric_fn = eval_metric_fn or _no_metric
-
-    if custom_callbacks and iterations_per_loop != 1:
-      logging.warning(
-          'It is sematically wrong to run callbacks when '
-          'iterations_per_loop is not one (%s)', iterations_per_loop)
-
-    custom_callbacks = custom_callbacks or []
-
-    def _run_callbacks_on_batch_begin(batch):
-      """Runs custom callbacks at the start of every step."""
-      if not custom_callbacks:
-        return
-      for callback in custom_callbacks:
-        if callback:
-          callback.on_batch_begin(batch)
-
-    def _run_callbacks_on_batch_end(batch):
-      """Runs custom callbacks at the end of every step."""
-      if not custom_callbacks:
-        return
-      for callback in custom_callbacks:
-        if callback:
-          callback.on_batch_end(batch)
-
-    if save_config:
-      self._save_config(model_dir)
-
-    if FLAGS.save_checkpoint_freq:
-      save_freq = FLAGS.save_checkpoint_freq
-    else:
-      save_freq = iterations_per_loop
-
-    params = self._params
-    strategy = self._strategy
-    # To reduce unnecessary send/receive input pipeline operation, we place
-    # input pipeline ops in worker task.
-    train_iterator = self._get_input_iterator(train_input_fn, strategy)
-    train_loss = None
-    train_metric_result = None
-    eval_metric_result = None
-    tf.keras.backend.set_learning_phase(1)
-    with strategy.scope():
-      # To correctly place the model weights on accelerators,
-      # model and optimizer should be created in scope.
-      model = self.model_fn(params.as_dict())
-      if not hasattr(model, 'optimizer'):
-        raise ValueError('User should set optimizer attribute to model '
-                         'inside `model_fn`.')
-      optimizer = model.optimizer
-
-      # Training loop starts here.
-      checkpoint = tf.train.Checkpoint(model=model, optimizer=optimizer)
-      latest_checkpoint_file = tf.train.latest_checkpoint(model_dir)
-      initial_step = 0
-      if latest_checkpoint_file:
-        logging.info(
-            'Checkpoint file %s found and restoring from '
-            'checkpoint', latest_checkpoint_file)
-        checkpoint.restore(latest_checkpoint_file)
-        initial_step = optimizer.iterations.numpy()
-        logging.info('Loading from checkpoint file completed. Init step %d',
-                     initial_step)
-      elif init_checkpoint:
-        logging.info('Restoring from init checkpoint function')
-        init_checkpoint(model)
-        logging.info('Loading from init checkpoint file completed')
-
-      current_step = optimizer.iterations.numpy()
-      checkpoint_name = self.checkpoint_name
-
-      eval_metric = eval_metric_fn()
-      train_metric = train_metric_fn()
-      train_summary_writer = summary_writer_fn(model_dir, 'eval_train')
-      self.train_summary_writer = train_summary_writer.writer
-
-      test_summary_writer = summary_writer_fn(model_dir, 'eval_test')
-      self.eval_summary_writer = test_summary_writer.writer
-
-      # Use training summary writer in TimeHistory if it's in use
-      for cb in custom_callbacks:
-        if isinstance(cb, keras_utils.TimeHistory):
-          cb.summary_writer = self.train_summary_writer
-
-      # Continue training loop.
-      train_step = self._create_train_step(
-          strategy=strategy,
-          model=model,
-          loss_fn=self.loss_fn(),
-          optimizer=optimizer,
-          metric=train_metric)
-      test_step = None
-      if eval_input_fn and eval_metric:
-        self.global_train_step = model.optimizer.iterations
-        test_step = self._create_test_step(strategy, model, metric=eval_metric)
-
-    # Step-0 operations
-    if current_step == 0 and not latest_checkpoint_file:
-      _save_checkpoint(checkpoint, model_dir,
-                       checkpoint_name.format(step=current_step))
-    if test_step:
-      eval_iterator = self._get_input_iterator(eval_input_fn, strategy)
-      eval_metric_result = self._run_evaluation(test_step, current_step,
-                                                eval_metric, eval_iterator)
-      logging.info('Step: %s evalation metric = %s.', current_step,
-                   eval_metric_result)
-      test_summary_writer(metrics=eval_metric_result, step=optimizer.iterations)
-      reset_states(eval_metric)
-
-    logging.info('Training started')
-    last_save_checkpoint_step = current_step
-    while current_step < total_steps:
-
-      num_steps = _steps_to_run(current_step, total_steps, iterations_per_loop)
-      _run_callbacks_on_batch_begin(current_step)
-      train_loss = train_step(train_iterator,
-                              tf.convert_to_tensor(num_steps, dtype=tf.int32))
-      current_step += num_steps
-
-      train_loss = tf.nest.map_structure(lambda x: x.numpy().astype(float),
-                                         train_loss)
-
-      _run_callbacks_on_batch_end(current_step - 1)
-      if not isinstance(train_loss, dict):
-        train_loss = {'total_loss': train_loss}
-      if np.isnan(train_loss['total_loss']):
-        raise ValueError('total loss is NaN.')
-
-      if train_metric:
-        train_metric_result = metric_results(train_metric)
-        train_metric_result.update(train_loss)
-      else:
-        train_metric_result = train_loss
-      if callable(optimizer.lr):
-        train_metric_result.update(
-            {'learning_rate': optimizer.lr(current_step).numpy()})
-      else:
-        train_metric_result.update({'learning_rate': optimizer.lr.numpy()})
-      logging.info('Train Step: %d/%d / loss = %s / training metric = %s',
-                   current_step, total_steps, train_loss, train_metric_result)
-
-      train_summary_writer(
-          metrics=train_metric_result, step=optimizer.iterations)
-
-      # Saves model checkpoints and run validation steps at every
-      # iterations_per_loop steps.
-      # To avoid repeated model saving, we do not save after the last
-      # step of training.
-      if save_freq > 0 and current_step < total_steps and (
-          current_step - last_save_checkpoint_step) >= save_freq:
-        _save_checkpoint(checkpoint, model_dir,
-                         checkpoint_name.format(step=current_step))
-        last_save_checkpoint_step = current_step
-
-      if continuous_eval and current_step < total_steps and test_step:
-        eval_iterator = self._get_input_iterator(eval_input_fn, strategy)
-        eval_metric_result = self._run_evaluation(test_step, current_step,
-                                                  eval_metric, eval_iterator)
-        logging.info('Step: %s evalation metric = %s.', current_step,
-                     eval_metric_result)
-        test_summary_writer(
-            metrics=eval_metric_result, step=optimizer.iterations)
-
-      # Re-initialize evaluation metric, except the last step.
-      if eval_metric and current_step < total_steps:
-        reset_states(eval_metric)
-      if train_metric and current_step < total_steps:
-        reset_states(train_metric)
-
-    # Reaches the end of training and saves the last checkpoint.
-    if last_save_checkpoint_step < total_steps:
-      _save_checkpoint(checkpoint, model_dir,
-                       checkpoint_name.format(step=current_step))
-
-    if test_step:
-      logging.info('Running final evaluation after training is complete.')
-      eval_iterator = self._get_input_iterator(eval_input_fn, strategy)
-      eval_metric_result = self._run_evaluation(test_step, current_step,
-                                                eval_metric, eval_iterator)
-      logging.info('Final evaluation metric = %s.', eval_metric_result)
-      test_summary_writer(metrics=eval_metric_result, step=optimizer.iterations)
-
-    self.train_summary_writer.close()
-    self.eval_summary_writer.close()
-
-    return train_metric_result, eval_metric_result
-
-  def _run_evaluation(self, test_step, current_training_step, metric,
-                      test_iterator):
-    """Runs validation steps and aggregate metrics."""
-    if not test_iterator or not metric:
-      logging.warning(
-          'Both test_iterator (%s) and metrics (%s) must not be None.',
-          test_iterator, metric)
-      return None
-    logging.info('Running evaluation after step: %s.', current_training_step)
-    eval_step = 0
-    while True:
-      try:
-        with tf.experimental.async_scope():
-          test_step(test_iterator)
-          eval_step += 1
-      except (StopIteration, tf.errors.OutOfRangeError):
-        tf.experimental.async_clear_error()
-        break
-
-    metric_result = metric_results(metric)
-    logging.info('Total eval steps: [%d]', eval_step)
-    logging.info('At training step: [%r] Validation metric = %r',
-                 current_training_step, metric_result)
-    return metric_result
-
-  def evaluate_from_model_dir(
-      self,
-      model_dir: Text,
-      eval_input_fn: Callable[[params_dict.ParamsDict], tf.data.Dataset],
-      eval_metric_fn: Callable[[], Any],
-      total_steps: int = -1,
-      eval_timeout: Optional[int] = None,
-      min_eval_interval: int = 180,
-      summary_writer_fn: Callable[[Text, Text], SummaryWriter] = SummaryWriter):
-    """Runs distributed evaluation on model folder.
-
-    Args:
-      model_dir: the folder for storing model checkpoints.
-      eval_input_fn: (Optional) same type as train_input_fn. If not None, will
-        trigger evaluting metric on eval data. If None, will not run eval step.
-      eval_metric_fn: metric_fn for evaluation in test_step.
-      total_steps: total training steps. If the current step reaches the
-        total_steps, the evaluation loop will stop.
-      eval_timeout: The maximum number of seconds to wait between checkpoints.
-        If left as None, then the process will wait indefinitely. Used by
-        tf.train.checkpoints_iterator.
-      min_eval_interval: The minimum number of seconds between yielding
-        checkpoints. Used by tf.train.checkpoints_iterator.
-      summary_writer_fn: function to create summary writer.
-
-    Returns:
-      Eval metrics dictionary of the last checkpoint.
-    """
-
-    if not model_dir:
-      raise ValueError('model_dir must be set.')
-
-    def terminate_eval():
-      tf.logging.info('Terminating eval after %d seconds of no checkpoints' %
-                      eval_timeout)
-      return True
-
-    summary_writer = summary_writer_fn(model_dir, 'eval')
-    self.eval_summary_writer = summary_writer.writer
-
-    # Read checkpoints from the given model directory
-    # until `eval_timeout` seconds elapses.
-    for checkpoint_path in tf.train.checkpoints_iterator(
-        model_dir,
-        min_interval_secs=min_eval_interval,
-        timeout=eval_timeout,
-        timeout_fn=terminate_eval):
-      eval_metric_result, current_step = self.evaluate_checkpoint(
-          checkpoint_path=checkpoint_path,
-          eval_input_fn=eval_input_fn,
-          eval_metric_fn=eval_metric_fn,
-          summary_writer=summary_writer)
-      if total_steps > 0 and current_step >= total_steps:
-        logging.info('Evaluation finished after training step %d', current_step)
-        break
-    return eval_metric_result
-
-  def evaluate_checkpoint(self,
-                          checkpoint_path: Text,
-                          eval_input_fn: Callable[[params_dict.ParamsDict],
-                                                  tf.data.Dataset],
-                          eval_metric_fn: Callable[[], Any],
-                          summary_writer: Optional[SummaryWriter] = None):
-    """Runs distributed evaluation on the one checkpoint.
-
-    Args:
-      checkpoint_path: the checkpoint to evaluate.
-      eval_input_fn: (Optional) same type as train_input_fn. If not None, will
-        trigger evaluting metric on eval data. If None, will not run eval step.
-      eval_metric_fn: metric_fn for evaluation in test_step.
-      summary_writer: function to create summary writer.
-
-    Returns:
-      Eval metrics dictionary of the last checkpoint.
-    """
-    if not callable(eval_metric_fn):
-      raise ValueError('if `eval_metric_fn` is specified, '
-                       'eval_metric_fn must be a callable.')
-
-    old_phase = tf.keras.backend.learning_phase()
-    tf.keras.backend.set_learning_phase(0)
-    params = self._params
-    strategy = self._strategy
-    # To reduce unnecessary send/receive input pipeline operation, we place
-    # input pipeline ops in worker task.
-    with strategy.scope():
-
-      # To correctly place the model weights on accelerators,
-      # model and optimizer should be created in scope.
-      model = self.model_fn(params.as_dict())
-      checkpoint = tf.train.Checkpoint(model=model)
-
-      eval_metric = eval_metric_fn()
-      assert eval_metric, 'eval_metric does not exist'
-      test_step = self._create_test_step(strategy, model, metric=eval_metric)
-
-      logging.info('Starting to evaluate.')
-      if not checkpoint_path:
-        raise ValueError('checkpoint path is empty')
-      reader = tf.compat.v1.train.NewCheckpointReader(checkpoint_path)
-      current_step = reader.get_tensor(
-          'optimizer/iter/.ATTRIBUTES/VARIABLE_VALUE')
-      logging.info('Checkpoint file %s found and restoring from '
-                   'checkpoint', checkpoint_path)
-      status = checkpoint.restore(checkpoint_path)
-      status.expect_partial().assert_existing_objects_matched()
-
-      self.global_train_step = model.optimizer.iterations
-      eval_iterator = self._get_input_iterator(eval_input_fn, strategy)
-      eval_metric_result = self._run_evaluation(test_step, current_step,
-                                                eval_metric, eval_iterator)
-      logging.info('Step: %s evalation metric = %s.', current_step,
-                   eval_metric_result)
-      summary_writer(metrics=eval_metric_result, step=current_step)
-      reset_states(eval_metric)
-
-    tf.keras.backend.set_learning_phase(old_phase)
-    return eval_metric_result, current_step
-
-  def predict(self):
-    return NotImplementedError('Unimplmented function.')
-
-
-class ExecutorBuilder(object):
-  """Builder of DistributedExecutor.
-
-  Example 1: Builds an executor with supported Strategy.
-    builder = ExecutorBuilder(
-        strategy_type='tpu',
-        strategy_config={'tpu': '/bns/xxx'})
-    dist_executor = builder.build_executor(
-        params=params,
-        model_fn=my_model_fn,
-        loss_fn=my_loss_fn,
-        metric_fn=my_metric_fn)
-
-  Example 2: Builds an executor with customized Strategy.
-    builder = ExecutorBuilder()
-    builder.strategy =
-    dist_executor = builder.build_executor(
-        params=params,
-        model_fn=my_model_fn,
-        loss_fn=my_loss_fn,
-        metric_fn=my_metric_fn)
-
-  Example 3: Builds a customized executor with customized Strategy.
-    class MyDistributedExecutor(DistributedExecutor):
-      # implementation ...
-
-    builder = ExecutorBuilder()
-    builder.strategy =
-    dist_executor = builder.build_executor(
-        class_ctor=MyDistributedExecutor,
-        params=params,
-        model_fn=my_model_fn,
-        loss_fn=my_loss_fn,
-        metric_fn=my_metric_fn)
-  """
-
-  def __init__(self, strategy_type=None, strategy_config=None):
-    _ = distribute_utils.configure_cluster(strategy_config.worker_hosts,
-                                           strategy_config.task_index)
-    """Constructor.
-
-    Args:
-      strategy_type: string. One of 'tpu', 'mirrored', 'multi_worker_mirrored'.
-        If None, the user is responsible to set the strategy before calling
-        build_executor(...).
-      strategy_config: necessary config for constructing the proper Strategy.
-        Check strategy_flags_dict() for examples of the structure.
- """ - self._strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=strategy_type, - num_gpus=strategy_config.num_gpus, - all_reduce_alg=strategy_config.all_reduce_alg, - num_packs=strategy_config.num_packs, - tpu_address=strategy_config.tpu) - - @property - def strategy(self): - """Returns default checkpoint name.""" - return self._strategy - - @strategy.setter - def strategy(self, new_strategy): - """Sets default summary writer for the current thread.""" - self._strategy = new_strategy - - def build_executor(self, - class_ctor=DistributedExecutor, - params=None, - model_fn=None, - loss_fn=None, - **kwargs): - """Creates an executor according to strategy type. - - See doc string of the DistributedExecutor.__init__ for more information of - the - input arguments. - - Args: - class_ctor: A constructor of executor (default: DistributedExecutor). - params: ParamsDict, all the model parameters and runtime parameters. - model_fn: Keras model function. - loss_fn: loss function. - **kwargs: other arguments to the executor constructor. - - Returns: - An instance of DistributedExecutor or its subclass. - """ - if self._strategy is None: - raise ValueError('`strategy` should not be None. You need to specify ' - '`strategy_type` in the builder contructor or directly ' - 'set the `strategy` property of the builder.') - return class_ctor( - strategy=self._strategy, - params=params, - model_fn=model_fn, - loss_fn=loss_fn, - **kwargs) diff --git a/official/vision/detection/main.py b/official/vision/detection/main.py deleted file mode 100644 index 6bfdd2906ca67b95bc4086d542066929f0539c85..0000000000000000000000000000000000000000 --- a/official/vision/detection/main.py +++ /dev/null @@ -1,263 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Main function to train various object detection models.""" - -import functools -import pprint - -from absl import app -from absl import flags -from absl import logging -import tensorflow as tf - -from official.common import distribute_utils -from official.modeling.hyperparams import params_dict -from official.utils import hyperparams_flags -from official.utils.flags import core as flags_core -from official.utils.misc import keras_utils -from official.vision.detection.configs import factory as config_factory -from official.vision.detection.dataloader import input_reader -from official.vision.detection.dataloader import mode_keys as ModeKeys -from official.vision.detection.executor import distributed_executor as executor -from official.vision.detection.executor.detection_executor import DetectionDistributedExecutor -from official.vision.detection.modeling import factory as model_factory - -hyperparams_flags.initialize_common_flags() -flags_core.define_log_steps() - -flags.DEFINE_bool('enable_xla', default=False, help='Enable XLA for GPU') - -flags.DEFINE_string( - 'mode', - default='train', - help='Mode to run: `train`, `eval` or `eval_once`.') - -flags.DEFINE_string( - 'model', default='retinanet', - help='Model to run: `retinanet`, `mask_rcnn` or `shapemask`.') - -flags.DEFINE_string('training_file_pattern', None, - 'Location of the train data.') - -flags.DEFINE_string('eval_file_pattern', None, 'Location of ther eval data') - -flags.DEFINE_string( - 'checkpoint_path', None, - 'The checkpoint path to eval. 
Only used in eval_once mode.') - -FLAGS = flags.FLAGS - - -def run_executor(params, - mode, - checkpoint_path=None, - train_input_fn=None, - eval_input_fn=None, - callbacks=None, - prebuilt_strategy=None): - """Runs the object detection model on distribution strategy defined by the user.""" - - if params.architecture.use_bfloat16: - tf.compat.v2.keras.mixed_precision.set_global_policy('mixed_bfloat16') - - model_builder = model_factory.model_generator(params) - - if prebuilt_strategy is not None: - strategy = prebuilt_strategy - else: - strategy_config = params.strategy_config - distribute_utils.configure_cluster(strategy_config.worker_hosts, - strategy_config.task_index) - strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=params.strategy_type, - num_gpus=strategy_config.num_gpus, - all_reduce_alg=strategy_config.all_reduce_alg, - num_packs=strategy_config.num_packs, - tpu_address=strategy_config.tpu) - - num_workers = int(strategy.num_replicas_in_sync + 7) // 8 - is_multi_host = (int(num_workers) >= 2) - - if mode == 'train': - - def _model_fn(params): - return model_builder.build_model(params, mode=ModeKeys.TRAIN) - - logging.info( - 'Train num_replicas_in_sync %d num_workers %d is_multi_host %s', - strategy.num_replicas_in_sync, num_workers, is_multi_host) - - dist_executor = DetectionDistributedExecutor( - strategy=strategy, - params=params, - model_fn=_model_fn, - loss_fn=model_builder.build_loss_fn, - is_multi_host=is_multi_host, - predict_post_process_fn=model_builder.post_processing, - trainable_variables_filter=model_builder - .make_filter_trainable_variables_fn()) - - if is_multi_host: - train_input_fn = functools.partial( - train_input_fn, - batch_size=params.train.batch_size // strategy.num_replicas_in_sync) - - return dist_executor.train( - train_input_fn=train_input_fn, - model_dir=params.model_dir, - iterations_per_loop=params.train.iterations_per_loop, - total_steps=params.train.total_steps, - 
init_checkpoint=model_builder.make_restore_checkpoint_fn(), - custom_callbacks=callbacks, - save_config=True) - elif mode == 'eval' or mode == 'eval_once': - - def _model_fn(params): - return model_builder.build_model(params, mode=ModeKeys.PREDICT_WITH_GT) - - logging.info('Eval num_replicas_in_sync %d num_workers %d is_multi_host %s', - strategy.num_replicas_in_sync, num_workers, is_multi_host) - - if is_multi_host: - eval_input_fn = functools.partial( - eval_input_fn, - batch_size=params.eval.batch_size // strategy.num_replicas_in_sync) - - dist_executor = DetectionDistributedExecutor( - strategy=strategy, - params=params, - model_fn=_model_fn, - loss_fn=model_builder.build_loss_fn, - is_multi_host=is_multi_host, - predict_post_process_fn=model_builder.post_processing, - trainable_variables_filter=model_builder - .make_filter_trainable_variables_fn()) - - if mode == 'eval': - results = dist_executor.evaluate_from_model_dir( - model_dir=params.model_dir, - eval_input_fn=eval_input_fn, - eval_metric_fn=model_builder.eval_metrics, - eval_timeout=params.eval.eval_timeout, - min_eval_interval=params.eval.min_eval_interval, - total_steps=params.train.total_steps) - else: - # Run evaluation once for a single checkpoint. - if not checkpoint_path: - raise ValueError('checkpoint_path cannot be empty.') - if tf.io.gfile.isdir(checkpoint_path): - checkpoint_path = tf.train.latest_checkpoint(checkpoint_path) - summary_writer = executor.SummaryWriter(params.model_dir, 'eval') - results, _ = dist_executor.evaluate_checkpoint( - checkpoint_path=checkpoint_path, - eval_input_fn=eval_input_fn, - eval_metric_fn=model_builder.eval_metrics, - summary_writer=summary_writer) - for k, v in results.items(): - logging.info('Final eval metric %s: %f', k, v) - return results - else: - raise ValueError('Mode not found: %s.' 
% mode) - - -def run(callbacks=None): - keras_utils.set_session_config(enable_xla=FLAGS.enable_xla) - - params = config_factory.config_generator(FLAGS.model) - - params = params_dict.override_params_dict( - params, FLAGS.config_file, is_strict=True) - - params = params_dict.override_params_dict( - params, FLAGS.params_override, is_strict=True) - params.override( - { - 'strategy_type': FLAGS.strategy_type, - 'model_dir': FLAGS.model_dir, - 'strategy_config': executor.strategy_flags_dict(), - }, - is_strict=False) - - # Make sure use_tpu and strategy_type are in sync. - params.use_tpu = (params.strategy_type == 'tpu') - - if not params.use_tpu: - params.override({ - 'architecture': { - 'use_bfloat16': False, - }, - 'norm_activation': { - 'use_sync_bn': False, - }, - }, is_strict=True) - - params.validate() - params.lock() - pp = pprint.PrettyPrinter() - params_str = pp.pformat(params.as_dict()) - logging.info('Model Parameters: %s', params_str) - - train_input_fn = None - eval_input_fn = None - training_file_pattern = FLAGS.training_file_pattern or params.train.train_file_pattern - eval_file_pattern = FLAGS.eval_file_pattern or params.eval.eval_file_pattern - if not training_file_pattern and not eval_file_pattern: - raise ValueError('Must provide at least one of training_file_pattern and ' - 'eval_file_pattern.') - - if training_file_pattern: - # Use global batch size for single host. 
- train_input_fn = input_reader.InputFn( - file_pattern=training_file_pattern, - params=params, - mode=input_reader.ModeKeys.TRAIN, - batch_size=params.train.batch_size) - - if eval_file_pattern: - eval_input_fn = input_reader.InputFn( - file_pattern=eval_file_pattern, - params=params, - mode=input_reader.ModeKeys.PREDICT_WITH_GT, - batch_size=params.eval.batch_size, - num_examples=params.eval.eval_samples) - - if callbacks is None: - callbacks = [] - - if FLAGS.log_steps: - callbacks.append( - keras_utils.TimeHistory( - batch_size=params.train.batch_size, - log_steps=FLAGS.log_steps, - )) - - return run_executor( - params, - FLAGS.mode, - checkpoint_path=FLAGS.checkpoint_path, - train_input_fn=train_input_fn, - eval_input_fn=eval_input_fn, - callbacks=callbacks) - - -def main(argv): - del argv # Unused. - - run() - - -if __name__ == '__main__': - tf.config.set_soft_device_placement(True) - app.run(main) diff --git a/official/vision/detection/modeling/__init__.py b/official/vision/detection/modeling/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- diff --git a/official/vision/detection/modeling/architecture/__init__.py b/official/vision/detection/modeling/architecture/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/architecture/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/detection/modeling/architecture/factory.py b/official/vision/detection/modeling/architecture/factory.py deleted file mode 100644 index f39949d26ffd0f1ba3ac195b4c059744b6c99579..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/architecture/factory.py +++ /dev/null @@ -1,217 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Model architecture factory.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from official.vision.detection.modeling.architecture import fpn -from official.vision.detection.modeling.architecture import heads -from official.vision.detection.modeling.architecture import identity -from official.vision.detection.modeling.architecture import nn_ops -from official.vision.detection.modeling.architecture import resnet -from official.vision.detection.modeling.architecture import spinenet - - -def norm_activation_generator(params): - return nn_ops.norm_activation_builder( - momentum=params.batch_norm_momentum, - epsilon=params.batch_norm_epsilon, - trainable=params.batch_norm_trainable, - activation=params.activation) - - -def backbone_generator(params): - """Generator function for various backbone models.""" - if params.architecture.backbone == 'resnet': - resnet_params = params.resnet - backbone_fn = resnet.Resnet( - resnet_depth=resnet_params.resnet_depth, - activation=params.norm_activation.activation, - norm_activation=norm_activation_generator( - params.norm_activation)) - elif params.architecture.backbone == 'spinenet': - spinenet_params = params.spinenet - backbone_fn = spinenet.SpineNetBuilder(model_id=spinenet_params.model_id) - else: - raise ValueError('Backbone model `{}` is not supported.' 
- .format(params.architecture.backbone)) - - return backbone_fn - - -def multilevel_features_generator(params): - """Generator function for various FPN models.""" - if params.architecture.multilevel_features == 'fpn': - fpn_params = params.fpn - fpn_fn = fpn.Fpn( - min_level=params.architecture.min_level, - max_level=params.architecture.max_level, - fpn_feat_dims=fpn_params.fpn_feat_dims, - use_separable_conv=fpn_params.use_separable_conv, - activation=params.norm_activation.activation, - use_batch_norm=fpn_params.use_batch_norm, - norm_activation=norm_activation_generator( - params.norm_activation)) - elif params.architecture.multilevel_features == 'identity': - fpn_fn = identity.Identity() - else: - raise ValueError('The multi-level feature model `{}` is not supported.' - .format(params.architecture.multilevel_features)) - return fpn_fn - - -def retinanet_head_generator(params): - """Generator function for RetinaNet head architecture.""" - head_params = params.retinanet_head - anchors_per_location = params.anchor.num_scales * len( - params.anchor.aspect_ratios) - return heads.RetinanetHead( - params.architecture.min_level, - params.architecture.max_level, - params.architecture.num_classes, - anchors_per_location, - head_params.num_convs, - head_params.num_filters, - head_params.use_separable_conv, - norm_activation=norm_activation_generator(params.norm_activation)) - - -def rpn_head_generator(params): - """Generator function for RPN head architecture.""" - head_params = params.rpn_head - anchors_per_location = params.anchor.num_scales * len( - params.anchor.aspect_ratios) - return heads.RpnHead( - params.architecture.min_level, - params.architecture.max_level, - anchors_per_location, - head_params.num_convs, - head_params.num_filters, - head_params.use_separable_conv, - params.norm_activation.activation, - head_params.use_batch_norm, - norm_activation=norm_activation_generator(params.norm_activation)) - - -def oln_rpn_head_generator(params): - """Generator 
function for OLN-proposal (OLN-RPN) head architecture.""" - head_params = params.rpn_head - anchors_per_location = params.anchor.num_scales * len( - params.anchor.aspect_ratios) - return heads.OlnRpnHead( - params.architecture.min_level, - params.architecture.max_level, - anchors_per_location, - head_params.num_convs, - head_params.num_filters, - head_params.use_separable_conv, - params.norm_activation.activation, - head_params.use_batch_norm, - norm_activation=norm_activation_generator(params.norm_activation)) - - -def fast_rcnn_head_generator(params): - """Generator function for Fast R-CNN head architecture.""" - head_params = params.frcnn_head - return heads.FastrcnnHead( - params.architecture.num_classes, - head_params.num_convs, - head_params.num_filters, - head_params.use_separable_conv, - head_params.num_fcs, - head_params.fc_dims, - params.norm_activation.activation, - head_params.use_batch_norm, - norm_activation=norm_activation_generator(params.norm_activation)) - - -def oln_box_score_head_generator(params): - """Generator function for Scoring Fast R-CNN head architecture.""" - head_params = params.frcnn_head - return heads.OlnBoxScoreHead( - params.architecture.num_classes, - head_params.num_convs, - head_params.num_filters, - head_params.use_separable_conv, - head_params.num_fcs, - head_params.fc_dims, - params.norm_activation.activation, - head_params.use_batch_norm, - norm_activation=norm_activation_generator(params.norm_activation)) - - -def mask_rcnn_head_generator(params): - """Generator function for Mask R-CNN head architecture.""" - head_params = params.mrcnn_head - return heads.MaskrcnnHead( - params.architecture.num_classes, - params.architecture.mask_target_size, - head_params.num_convs, - head_params.num_filters, - head_params.use_separable_conv, - params.norm_activation.activation, - head_params.use_batch_norm, - norm_activation=norm_activation_generator(params.norm_activation)) - - -def oln_mask_score_head_generator(params): - """Generator 
function for Scoring Mask R-CNN head architecture.""" - head_params = params.mrcnn_head - return heads.OlnMaskScoreHead( - params.architecture.num_classes, - params.architecture.mask_target_size, - head_params.num_convs, - head_params.num_filters, - head_params.use_separable_conv, - params.norm_activation.activation, - head_params.use_batch_norm, - norm_activation=norm_activation_generator(params.norm_activation)) - - -def shapeprior_head_generator(params): - """Generator function for shape prior head architecture.""" - head_params = params.shapemask_head - return heads.ShapemaskPriorHead( - params.architecture.num_classes, - head_params.num_downsample_channels, - head_params.mask_crop_size, - head_params.use_category_for_mask, - head_params.shape_prior_path) - - -def coarsemask_head_generator(params): - """Generator function for ShapeMask coarse mask head architecture.""" - head_params = params.shapemask_head - return heads.ShapemaskCoarsemaskHead( - params.architecture.num_classes, - head_params.num_downsample_channels, - head_params.mask_crop_size, - head_params.use_category_for_mask, - head_params.num_convs, - norm_activation=norm_activation_generator(params.norm_activation)) - - -def finemask_head_generator(params): - """Generator function for Shapemask fine mask head architecture.""" - head_params = params.shapemask_head - return heads.ShapemaskFinemaskHead( - params.architecture.num_classes, - head_params.num_downsample_channels, - head_params.mask_crop_size, - head_params.use_category_for_mask, - head_params.num_convs, - head_params.upsample_factor) diff --git a/official/vision/detection/modeling/architecture/fpn.py b/official/vision/detection/modeling/architecture/fpn.py deleted file mode 100644 index 3cfb56dbdec6c6b09e4cc7f6bbd70b054f6cbc10..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/architecture/fpn.py +++ /dev/null @@ -1,150 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Feature Pyramid Networks. - -Feature Pyramid Networks were proposed in: -[1] Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, - , and Serge Belongie - Feature Pyramid Networks for Object Detection. CVPR 2017. -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import functools - -import tensorflow as tf - -from official.vision.detection.modeling.architecture import nn_ops -from official.vision.detection.ops import spatial_transform_ops - - -class Fpn(object): - """Feature pyramid networks.""" - - def __init__(self, - min_level=3, - max_level=7, - fpn_feat_dims=256, - use_separable_conv=False, - activation='relu', - use_batch_norm=True, - norm_activation=nn_ops.norm_activation_builder( - activation='relu')): - """FPN initialization function. - - Args: - min_level: `int` minimum level in FPN output feature maps. - max_level: `int` maximum level in FPN output feature maps. - fpn_feat_dims: `int` number of filters in FPN layers. - use_separable_conv: `bool`, if True use separable convolution for - convolution in FPN layers. - use_batch_norm: 'bool', indicating whether batchnorm layers are added. - norm_activation: an operation that includes a normalization layer - followed by an optional activation layer. 
- """ - self._min_level = min_level - self._max_level = max_level - self._fpn_feat_dims = fpn_feat_dims - if use_separable_conv: - self._conv2d_op = functools.partial( - tf.keras.layers.SeparableConv2D, depth_multiplier=1) - else: - self._conv2d_op = tf.keras.layers.Conv2D - if activation == 'relu': - self._activation_op = tf.nn.relu - elif activation == 'swish': - self._activation_op = tf.nn.swish - else: - raise ValueError('Unsupported activation `{}`.'.format(activation)) - self._use_batch_norm = use_batch_norm - self._norm_activation = norm_activation - - self._norm_activations = {} - self._lateral_conv2d_op = {} - self._post_hoc_conv2d_op = {} - self._coarse_conv2d_op = {} - for level in range(self._min_level, self._max_level + 1): - if self._use_batch_norm: - self._norm_activations[level] = norm_activation( - use_activation=False, name='p%d-bn' % level) - self._lateral_conv2d_op[level] = self._conv2d_op( - filters=self._fpn_feat_dims, - kernel_size=(1, 1), - padding='same', - name='l%d' % level) - self._post_hoc_conv2d_op[level] = self._conv2d_op( - filters=self._fpn_feat_dims, - strides=(1, 1), - kernel_size=(3, 3), - padding='same', - name='post_hoc_d%d' % level) - self._coarse_conv2d_op[level] = self._conv2d_op( - filters=self._fpn_feat_dims, - strides=(2, 2), - kernel_size=(3, 3), - padding='same', - name='p%d' % level) - - def __call__(self, multilevel_features, is_training=None): - """Returns the FPN features for a given multilevel features. - - Args: - multilevel_features: a `dict` containing `int` keys for continuous feature - levels, e.g., [2, 3, 4, 5]. The values are corresponding features with - shape [batch_size, height_l, width_l, num_filters]. - is_training: `bool` if True, the model is in training mode. - - Returns: - a `dict` containing `int` keys for continuous feature levels - [min_level, min_level + 1, ..., max_level]. The values are corresponding - FPN features with shape [batch_size, height_l, width_l, fpn_feat_dims]. 
- """ - input_levels = list(multilevel_features.keys()) - if min(input_levels) > self._min_level: - raise ValueError( - 'The minimum backbone level %d should be '%(min(input_levels)) + - 'less or equal to FPN minimum level %d.:'%(self._min_level)) - backbone_max_level = min(max(input_levels), self._max_level) - with tf.name_scope('fpn'): - # Adds lateral connections. - feats_lateral = {} - for level in range(self._min_level, backbone_max_level + 1): - feats_lateral[level] = self._lateral_conv2d_op[level]( - multilevel_features[level]) - - # Adds top-down path. - feats = {backbone_max_level: feats_lateral[backbone_max_level]} - for level in range(backbone_max_level - 1, self._min_level - 1, -1): - feats[level] = spatial_transform_ops.nearest_upsampling( - feats[level + 1], 2) + feats_lateral[level] - - # Adds post-hoc 3x3 convolution kernel. - for level in range(self._min_level, backbone_max_level + 1): - feats[level] = self._post_hoc_conv2d_op[level](feats[level]) - - # Adds coarser FPN levels introduced for RetinaNet. - for level in range(backbone_max_level + 1, self._max_level + 1): - feats_in = feats[level - 1] - if level > backbone_max_level + 1: - feats_in = self._activation_op(feats_in) - feats[level] = self._coarse_conv2d_op[level](feats_in) - if self._use_batch_norm: - # Adds batch_norm layer. - for level in range(self._min_level, self._max_level + 1): - feats[level] = self._norm_activations[level]( - feats[level], is_training=is_training) - return feats diff --git a/official/vision/detection/modeling/architecture/heads.py b/official/vision/detection/modeling/architecture/heads.py deleted file mode 100644 index 8eb89892d67bd33541c6586cb035cdffbdc31ad8..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/architecture/heads.py +++ /dev/null @@ -1,1279 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Classes to build various prediction heads in all supported models.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import functools - -import numpy as np -import tensorflow as tf - -from official.vision.detection.modeling.architecture import nn_ops -from official.vision.detection.ops import spatial_transform_ops - - -class RpnHead(tf.keras.layers.Layer): - """Region Proposal Network head.""" - - def __init__( - self, - min_level, - max_level, - anchors_per_location, - num_convs=2, - num_filters=256, - use_separable_conv=False, - activation='relu', - use_batch_norm=True, - norm_activation=nn_ops.norm_activation_builder(activation='relu')): - """Initialize params to build Region Proposal Network head. - - Args: - min_level: `int` number of minimum feature level. - max_level: `int` number of maximum feature level. - anchors_per_location: `int` number of number of anchors per pixel - location. - num_convs: `int` number that represents the number of the intermediate - conv layers before the prediction. - num_filters: `int` number that represents the number of filters of the - intermediate conv layers. - use_separable_conv: `bool`, indicating whether the separable conv layers - is used. - activation: activation function. Support 'relu' and 'swish'. - use_batch_norm: 'bool', indicating whether batchnorm layers are added. 
- norm_activation: an operation that includes a normalization layer followed - by an optional activation layer. - """ - super().__init__(autocast=False) - - self._min_level = min_level - self._max_level = max_level - self._anchors_per_location = anchors_per_location - if activation == 'relu': - self._activation_op = tf.nn.relu - elif activation == 'swish': - self._activation_op = tf.nn.swish - else: - raise ValueError('Unsupported activation `{}`.'.format(activation)) - self._use_batch_norm = use_batch_norm - - if use_separable_conv: - self._conv2d_op = functools.partial( - tf.keras.layers.SeparableConv2D, - depth_multiplier=1, - bias_initializer=tf.zeros_initializer()) - else: - self._conv2d_op = functools.partial( - tf.keras.layers.Conv2D, - kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01), - bias_initializer=tf.zeros_initializer()) - - self._rpn_conv = self._conv2d_op( - num_filters, - kernel_size=(3, 3), - strides=(1, 1), - activation=(None if self._use_batch_norm else self._activation_op), - padding='same', - name='rpn') - self._rpn_class_conv = self._conv2d_op( - anchors_per_location, - kernel_size=(1, 1), - strides=(1, 1), - padding='valid', - name='rpn-class') - self._rpn_box_conv = self._conv2d_op( - 4 * anchors_per_location, - kernel_size=(1, 1), - strides=(1, 1), - padding='valid', - name='rpn-box') - - self._norm_activations = {} - if self._use_batch_norm: - for level in range(self._min_level, self._max_level + 1): - self._norm_activations[level] = norm_activation(name='rpn-l%d-bn' % - level) - - def _shared_rpn_heads(self, features, anchors_per_location, level, - is_training): - """Shared RPN heads.""" - features = self._rpn_conv(features) - if self._use_batch_norm: - # The batch normalization layers are not shared between levels. 
- features = self._norm_activations[level]( - features, is_training=is_training) - # Proposal classification scores - scores = self._rpn_class_conv(features) - # Proposal bbox regression deltas - bboxes = self._rpn_box_conv(features) - - return scores, bboxes - - def call(self, features, is_training=None): - - scores_outputs = {} - box_outputs = {} - - with tf.name_scope('rpn_head'): - for level in range(self._min_level, self._max_level + 1): - scores_output, box_output = self._shared_rpn_heads( - features[level], self._anchors_per_location, level, is_training) - scores_outputs[level] = scores_output - box_outputs[level] = box_output - return scores_outputs, box_outputs - - -class OlnRpnHead(tf.keras.layers.Layer): - """Region Proposal Network for Object Localization Network (OLN).""" - - def __init__( - self, - min_level, - max_level, - anchors_per_location, - num_convs=2, - num_filters=256, - use_separable_conv=False, - activation='relu', - use_batch_norm=True, - norm_activation=nn_ops.norm_activation_builder(activation='relu')): - """Initialize params to build Region Proposal Network head. - - Args: - min_level: `int` number of minimum feature level. - max_level: `int` number of maximum feature level. - anchors_per_location: `int` number of number of anchors per pixel - location. - num_convs: `int` number that represents the number of the intermediate - conv layers before the prediction. - num_filters: `int` number that represents the number of filters of the - intermediate conv layers. - use_separable_conv: `bool`, indicating whether the separable conv layers - is used. - activation: activation function. Support 'relu' and 'swish'. - use_batch_norm: 'bool', indicating whether batchnorm layers are added. - norm_activation: an operation that includes a normalization layer followed - by an optional activation layer. 
- """ - self._min_level = min_level - self._max_level = max_level - self._anchors_per_location = anchors_per_location - if activation == 'relu': - self._activation_op = tf.nn.relu - elif activation == 'swish': - self._activation_op = tf.nn.swish - else: - raise ValueError('Unsupported activation `{}`.'.format(activation)) - self._use_batch_norm = use_batch_norm - - if use_separable_conv: - self._conv2d_op = functools.partial( - tf.keras.layers.SeparableConv2D, - depth_multiplier=1, - bias_initializer=tf.zeros_initializer()) - else: - self._conv2d_op = functools.partial( - tf.keras.layers.Conv2D, - kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01), - bias_initializer=tf.zeros_initializer()) - - self._rpn_conv = self._conv2d_op( - num_filters, - kernel_size=(3, 3), - strides=(1, 1), - activation=(None if self._use_batch_norm else self._activation_op), - padding='same', - name='rpn') - self._rpn_class_conv = self._conv2d_op( - anchors_per_location, - kernel_size=(1, 1), - strides=(1, 1), - padding='valid', - name='rpn-class') - self._rpn_box_conv = self._conv2d_op( - 4 * anchors_per_location, - kernel_size=(1, 1), - strides=(1, 1), - padding='valid', - name='rpn-box-lrtb') - self._rpn_center_conv = self._conv2d_op( - anchors_per_location, - kernel_size=(1, 1), - strides=(1, 1), - padding='valid', - name='rpn-centerness') - - self._norm_activations = {} - if self._use_batch_norm: - for level in range(self._min_level, self._max_level + 1): - self._norm_activations[level] = norm_activation(name='rpn-l%d-bn' % - level) - - def _shared_rpn_heads(self, features, anchors_per_location, level, - is_training): - """Shared RPN heads.""" - features = self._rpn_conv(features) - if self._use_batch_norm: - # The batch normalization layers are not shared between levels. 
- features = self._norm_activations[level]( - features, is_training=is_training) - # Feature L2 normalization for training stability - features = tf.math.l2_normalize( - features, - axis=-1, - name='rpn-norm',) - # Proposal classification scores - scores = self._rpn_class_conv(features) - # Proposal bbox regression deltas - bboxes = self._rpn_box_conv(features) - # Proposal centerness scores - centers = self._rpn_center_conv(features) - - return scores, bboxes, centers - - def __call__(self, features, is_training=None): - - scores_outputs = {} - box_outputs = {} - center_outputs = {} - - with tf.name_scope('rpn_head'): - for level in range(self._min_level, self._max_level + 1): - scores_output, box_output, center_output = self._shared_rpn_heads( - features[level], self._anchors_per_location, level, is_training) - scores_outputs[level] = scores_output - box_outputs[level] = box_output - center_outputs[level] = center_output - return scores_outputs, box_outputs, center_outputs - - -class FastrcnnHead(tf.keras.layers.Layer): - """Fast R-CNN box head.""" - - def __init__( - self, - num_classes, - num_convs=0, - num_filters=256, - use_separable_conv=False, - num_fcs=2, - fc_dims=1024, - activation='relu', - use_batch_norm=True, - norm_activation=nn_ops.norm_activation_builder(activation='relu')): - """Initialize params to build Fast R-CNN box head. - - Args: - num_classes: a integer for the number of classes. - num_convs: `int` number that represents the number of the intermediate - conv layers before the FC layers. - num_filters: `int` number that represents the number of filters of the - intermediate conv layers. - use_separable_conv: `bool`, indicating whether the separable conv layers - is used. - num_fcs: `int` number that represents the number of FC layers before the - predictions. - fc_dims: `int` number that represents the number of dimension of the FC - layers. - activation: activation function. Support 'relu' and 'swish'. 
- use_batch_norm: 'bool', indicating whether batchnorm layers are added. - norm_activation: an operation that includes a normalization layer followed - by an optional activation layer. - """ - super(FastrcnnHead, self).__init__(autocast=False) - - self._num_classes = num_classes - - self._num_convs = num_convs - self._num_filters = num_filters - if use_separable_conv: - self._conv2d_op = functools.partial( - tf.keras.layers.SeparableConv2D, - depth_multiplier=1, - bias_initializer=tf.zeros_initializer()) - else: - self._conv2d_op = functools.partial( - tf.keras.layers.Conv2D, - kernel_initializer=tf.keras.initializers.VarianceScaling( - scale=2, mode='fan_out', distribution='untruncated_normal'), - bias_initializer=tf.zeros_initializer()) - - self._num_fcs = num_fcs - self._fc_dims = fc_dims - if activation == 'relu': - self._activation_op = tf.nn.relu - elif activation == 'swish': - self._activation_op = tf.nn.swish - else: - raise ValueError('Unsupported activation `{}`.'.format(activation)) - self._use_batch_norm = use_batch_norm - self._norm_activation = norm_activation - - self._conv_ops = [] - self._conv_bn_ops = [] - for i in range(self._num_convs): - self._conv_ops.append( - self._conv2d_op( - self._num_filters, - kernel_size=(3, 3), - strides=(1, 1), - padding='same', - dilation_rate=(1, 1), - activation=(None - if self._use_batch_norm else self._activation_op), - name='conv_{}'.format(i))) - if self._use_batch_norm: - self._conv_bn_ops.append(self._norm_activation()) - - self._fc_ops = [] - self._fc_bn_ops = [] - for i in range(self._num_fcs): - self._fc_ops.append( - tf.keras.layers.Dense( - units=self._fc_dims, - activation=(None - if self._use_batch_norm else self._activation_op), - name='fc{}'.format(i))) - if self._use_batch_norm: - self._fc_bn_ops.append(self._norm_activation(fused=False)) - - self._class_predict = tf.keras.layers.Dense( - self._num_classes, - kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01), - 
bias_initializer=tf.zeros_initializer(), - name='class-predict') - self._box_predict = tf.keras.layers.Dense( - self._num_classes * 4, - kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.001), - bias_initializer=tf.zeros_initializer(), - name='box-predict') - - def call(self, roi_features, is_training=None): - """Box and class branches for the Mask-RCNN model. - - Args: - roi_features: A ROI feature tensor of shape [batch_size, num_rois, - height_l, width_l, num_filters]. - is_training: `boolean`, if True if model is in training mode. - - Returns: - class_outputs: a tensor with a shape of - [batch_size, num_rois, num_classes], representing the class predictions. - box_outputs: a tensor with a shape of - [batch_size, num_rois, num_classes * 4], representing the box - predictions. - """ - - with tf.name_scope( - 'fast_rcnn_head'): - # reshape inputs beofre FC. - _, num_rois, height, width, filters = roi_features.get_shape().as_list() - - net = tf.reshape(roi_features, [-1, height, width, filters]) - for i in range(self._num_convs): - net = self._conv_ops[i](net) - if self._use_batch_norm: - net = self._conv_bn_ops[i](net, is_training=is_training) - - filters = self._num_filters if self._num_convs > 0 else filters - net = tf.reshape(net, [-1, num_rois, height * width * filters]) - - for i in range(self._num_fcs): - net = self._fc_ops[i](net) - if self._use_batch_norm: - net = self._fc_bn_ops[i](net, is_training=is_training) - - class_outputs = self._class_predict(net) - box_outputs = self._box_predict(net) - return class_outputs, box_outputs - - -class OlnBoxScoreHead(tf.keras.layers.Layer): - """Box head of Object Localization Network (OLN).""" - - def __init__( - self, - num_classes, - num_convs=0, - num_filters=256, - use_separable_conv=False, - num_fcs=2, - fc_dims=1024, - activation='relu', - use_batch_norm=True, - norm_activation=nn_ops.norm_activation_builder(activation='relu')): - """Initialize params to build OLN box head. 
- - Args: - num_classes: a integer for the number of classes. - num_convs: `int` number that represents the number of the intermediate - conv layers before the FC layers. - num_filters: `int` number that represents the number of filters of the - intermediate conv layers. - use_separable_conv: `bool`, indicating whether the separable conv layers - is used. - num_fcs: `int` number that represents the number of FC layers before the - predictions. - fc_dims: `int` number that represents the number of dimension of the FC - layers. - activation: activation function. Support 'relu' and 'swish'. - use_batch_norm: 'bool', indicating whether batchnorm layers are added. - norm_activation: an operation that includes a normalization layer followed - by an optional activation layer. - """ - self._num_classes = num_classes - - self._num_convs = num_convs - self._num_filters = num_filters - if use_separable_conv: - self._conv2d_op = functools.partial( - tf.keras.layers.SeparableConv2D, - depth_multiplier=1, - bias_initializer=tf.zeros_initializer()) - else: - self._conv2d_op = functools.partial( - tf.keras.layers.Conv2D, - kernel_initializer=tf.keras.initializers.VarianceScaling( - scale=2, mode='fan_out', distribution='untruncated_normal'), - bias_initializer=tf.zeros_initializer()) - - self._num_fcs = num_fcs - self._fc_dims = fc_dims - if activation == 'relu': - self._activation_op = tf.nn.relu - elif activation == 'swish': - self._activation_op = tf.nn.swish - else: - raise ValueError('Unsupported activation `{}`.'.format(activation)) - self._use_batch_norm = use_batch_norm - self._norm_activation = norm_activation - - self._conv_ops = [] - self._conv_bn_ops = [] - for i in range(self._num_convs): - self._conv_ops.append( - self._conv2d_op( - self._num_filters, - kernel_size=(3, 3), - strides=(1, 1), - padding='same', - dilation_rate=(1, 1), - activation=(None - if self._use_batch_norm else self._activation_op), - name='conv_{}'.format(i))) - if self._use_batch_norm: - 
self._conv_bn_ops.append(self._norm_activation()) - - self._fc_ops = [] - self._fc_bn_ops = [] - for i in range(self._num_fcs): - self._fc_ops.append( - tf.keras.layers.Dense( - units=self._fc_dims, - activation=(None - if self._use_batch_norm else self._activation_op), - name='fc{}'.format(i))) - if self._use_batch_norm: - self._fc_bn_ops.append(self._norm_activation(fused=False)) - - self._class_predict = tf.keras.layers.Dense( - self._num_classes, - kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01), - bias_initializer=tf.zeros_initializer(), - name='class-predict') - self._box_predict = tf.keras.layers.Dense( - self._num_classes * 4, - kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.001), - bias_initializer=tf.zeros_initializer(), - name='box-predict') - self._score_predict = tf.keras.layers.Dense( - 1, - kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01), - bias_initializer=tf.zeros_initializer(), - name='score-predict') - - def __call__(self, roi_features, is_training=None): - """Box and class branches for the Mask-RCNN model. - - Args: - roi_features: A ROI feature tensor of shape [batch_size, num_rois, - height_l, width_l, num_filters]. - is_training: `boolean`, if True if model is in training mode. - - Returns: - class_outputs: a tensor with a shape of - [batch_size, num_rois, num_classes], representing the class predictions. - box_outputs: a tensor with a shape of - [batch_size, num_rois, num_classes * 4], representing the box - predictions. - """ - - with tf.name_scope('fast_rcnn_head'): - # reshape inputs beofre FC. 
- _, num_rois, height, width, filters = roi_features.get_shape().as_list() - - net = tf.reshape(roi_features, [-1, height, width, filters]) - for i in range(self._num_convs): - net = self._conv_ops[i](net) - if self._use_batch_norm: - net = self._conv_bn_ops[i](net, is_training=is_training) - - filters = self._num_filters if self._num_convs > 0 else filters - net = tf.reshape(net, [-1, num_rois, height * width * filters]) - - for i in range(self._num_fcs): - net = self._fc_ops[i](net) - if self._use_batch_norm: - net = self._fc_bn_ops[i](net, is_training=is_training) - - class_outputs = self._class_predict(net) - box_outputs = self._box_predict(net) - score_outputs = self._score_predict(net) - return class_outputs, box_outputs, score_outputs - - -class MaskrcnnHead(tf.keras.layers.Layer): - """Mask R-CNN head.""" - - def __init__( - self, - num_classes, - mask_target_size, - num_convs=4, - num_filters=256, - use_separable_conv=False, - activation='relu', - use_batch_norm=True, - norm_activation=nn_ops.norm_activation_builder(activation='relu')): - """Initialize params to build Fast R-CNN head. - - Args: - num_classes: a integer for the number of classes. - mask_target_size: a integer that is the resolution of masks. - num_convs: `int` number that represents the number of the intermediate - conv layers before the prediction. - num_filters: `int` number that represents the number of filters of the - intermediate conv layers. - use_separable_conv: `bool`, indicating whether the separable conv layers - is used. - activation: activation function. Support 'relu' and 'swish'. - use_batch_norm: 'bool', indicating whether batchnorm layers are added. - norm_activation: an operation that includes a normalization layer followed - by an optional activation layer. 
- """ - super(MaskrcnnHead, self).__init__(autocast=False) - self._num_classes = num_classes - self._mask_target_size = mask_target_size - - self._num_convs = num_convs - self._num_filters = num_filters - if use_separable_conv: - self._conv2d_op = functools.partial( - tf.keras.layers.SeparableConv2D, - depth_multiplier=1, - bias_initializer=tf.zeros_initializer()) - else: - self._conv2d_op = functools.partial( - tf.keras.layers.Conv2D, - kernel_initializer=tf.keras.initializers.VarianceScaling( - scale=2, mode='fan_out', distribution='untruncated_normal'), - bias_initializer=tf.zeros_initializer()) - if activation == 'relu': - self._activation_op = tf.nn.relu - elif activation == 'swish': - self._activation_op = tf.nn.swish - else: - raise ValueError('Unsupported activation `{}`.'.format(activation)) - self._use_batch_norm = use_batch_norm - self._norm_activation = norm_activation - self._conv2d_ops = [] - for i in range(self._num_convs): - self._conv2d_ops.append( - self._conv2d_op( - self._num_filters, - kernel_size=(3, 3), - strides=(1, 1), - padding='same', - dilation_rate=(1, 1), - activation=(None - if self._use_batch_norm else self._activation_op), - name='mask-conv-l%d' % i)) - self._mask_conv_transpose = tf.keras.layers.Conv2DTranspose( - self._num_filters, - kernel_size=(2, 2), - strides=(2, 2), - padding='valid', - activation=(None if self._use_batch_norm else self._activation_op), - kernel_initializer=tf.keras.initializers.VarianceScaling( - scale=2, mode='fan_out', distribution='untruncated_normal'), - bias_initializer=tf.zeros_initializer(), - name='conv5-mask') - - with tf.name_scope('mask_head'): - self._mask_conv2d_op = self._conv2d_op( - self._num_classes, - kernel_size=(1, 1), - strides=(1, 1), - padding='valid', - name='mask_fcn_logits') - - def call(self, roi_features, class_indices, is_training=None): - """Mask branch for the Mask-RCNN model. 
- - Args: - roi_features: A ROI feature tensor of shape [batch_size, num_rois, - height_l, width_l, num_filters]. - class_indices: a Tensor of shape [batch_size, num_rois], indicating which - class the ROI is. - is_training: `boolean`, if True if model is in training mode. - - Returns: - mask_outputs: a tensor with a shape of - [batch_size, num_masks, mask_height, mask_width, num_classes], - representing the mask predictions. - fg_gather_indices: a tensor with a shape of [batch_size, num_masks, 2], - representing the fg mask targets. - Raises: - ValueError: If boxes is not a rank-3 tensor or the last dimension of - boxes is not 4. - """ - - with tf.name_scope('mask_head'): - _, num_rois, height, width, filters = roi_features.get_shape().as_list() - net = tf.reshape(roi_features, [-1, height, width, filters]) - - for i in range(self._num_convs): - net = self._conv2d_ops[i](net) - if self._use_batch_norm: - net = self._norm_activation()(net, is_training=is_training) - - net = self._mask_conv_transpose(net) - if self._use_batch_norm: - net = self._norm_activation()(net, is_training=is_training) - - mask_outputs = self._mask_conv2d_op(net) - mask_outputs = tf.reshape(mask_outputs, [ - -1, num_rois, self._mask_target_size, self._mask_target_size, - self._num_classes - ]) - - with tf.name_scope('masks_post_processing'): - # TODO(pengchong): Figure out the way not to use the static inferred - # batch size. - batch_size, num_masks = class_indices.get_shape().as_list() - mask_outputs = tf.transpose(a=mask_outputs, perm=[0, 1, 4, 2, 3]) - # Constructs indices for gather. 
- batch_indices = tf.tile( - tf.expand_dims(tf.range(batch_size), axis=1), [1, num_masks]) - mask_indices = tf.tile( - tf.expand_dims(tf.range(num_masks), axis=0), [batch_size, 1]) - gather_indices = tf.stack( - [batch_indices, mask_indices, class_indices], axis=2) - mask_outputs = tf.gather_nd(mask_outputs, gather_indices) - return mask_outputs - - -class RetinanetHead(object): - """RetinaNet head.""" - - def __init__( - self, - min_level, - max_level, - num_classes, - anchors_per_location, - num_convs=4, - num_filters=256, - use_separable_conv=False, - norm_activation=nn_ops.norm_activation_builder(activation='relu')): - """Initialize params to build RetinaNet head. - - Args: - min_level: `int` number of minimum feature level. - max_level: `int` number of maximum feature level. - num_classes: `int` number of classification categories. - anchors_per_location: `int` number of anchors per pixel location. - num_convs: `int` number of stacked convolution before the last prediction - layer. - num_filters: `int` number of filters used in the head architecture. - use_separable_conv: `bool` to indicate whether to use separable - convoluation. - norm_activation: an operation that includes a normalization layer followed - by an optional activation layer. 
- """ - self._min_level = min_level - self._max_level = max_level - - self._num_classes = num_classes - self._anchors_per_location = anchors_per_location - - self._num_convs = num_convs - self._num_filters = num_filters - self._use_separable_conv = use_separable_conv - with tf.name_scope('class_net') as scope_name: - self._class_name_scope = tf.name_scope(scope_name) - with tf.name_scope('box_net') as scope_name: - self._box_name_scope = tf.name_scope(scope_name) - self._build_class_net_layers(norm_activation) - self._build_box_net_layers(norm_activation) - - def _class_net_batch_norm_name(self, i, level): - return 'class-%d-%d' % (i, level) - - def _box_net_batch_norm_name(self, i, level): - return 'box-%d-%d' % (i, level) - - def _build_class_net_layers(self, norm_activation): - """Build re-usable layers for class prediction network.""" - if self._use_separable_conv: - self._class_predict = tf.keras.layers.SeparableConv2D( - self._num_classes * self._anchors_per_location, - kernel_size=(3, 3), - bias_initializer=tf.constant_initializer(-np.log((1 - 0.01) / 0.01)), - padding='same', - name='class-predict') - else: - self._class_predict = tf.keras.layers.Conv2D( - self._num_classes * self._anchors_per_location, - kernel_size=(3, 3), - bias_initializer=tf.constant_initializer(-np.log((1 - 0.01) / 0.01)), - kernel_initializer=tf.keras.initializers.RandomNormal(stddev=1e-5), - padding='same', - name='class-predict') - self._class_conv = [] - self._class_norm_activation = {} - for i in range(self._num_convs): - if self._use_separable_conv: - self._class_conv.append( - tf.keras.layers.SeparableConv2D( - self._num_filters, - kernel_size=(3, 3), - bias_initializer=tf.zeros_initializer(), - activation=None, - padding='same', - name='class-' + str(i))) - else: - self._class_conv.append( - tf.keras.layers.Conv2D( - self._num_filters, - kernel_size=(3, 3), - bias_initializer=tf.zeros_initializer(), - kernel_initializer=tf.keras.initializers.RandomNormal( - stddev=0.01), - 
activation=None, - padding='same', - name='class-' + str(i))) - for level in range(self._min_level, self._max_level + 1): - name = self._class_net_batch_norm_name(i, level) - self._class_norm_activation[name] = norm_activation(name=name) - - def _build_box_net_layers(self, norm_activation): - """Build re-usable layers for box prediction network.""" - if self._use_separable_conv: - self._box_predict = tf.keras.layers.SeparableConv2D( - 4 * self._anchors_per_location, - kernel_size=(3, 3), - bias_initializer=tf.zeros_initializer(), - padding='same', - name='box-predict') - else: - self._box_predict = tf.keras.layers.Conv2D( - 4 * self._anchors_per_location, - kernel_size=(3, 3), - bias_initializer=tf.zeros_initializer(), - kernel_initializer=tf.keras.initializers.RandomNormal(stddev=1e-5), - padding='same', - name='box-predict') - self._box_conv = [] - self._box_norm_activation = {} - for i in range(self._num_convs): - if self._use_separable_conv: - self._box_conv.append( - tf.keras.layers.SeparableConv2D( - self._num_filters, - kernel_size=(3, 3), - activation=None, - bias_initializer=tf.zeros_initializer(), - padding='same', - name='box-' + str(i))) - else: - self._box_conv.append( - tf.keras.layers.Conv2D( - self._num_filters, - kernel_size=(3, 3), - activation=None, - bias_initializer=tf.zeros_initializer(), - kernel_initializer=tf.keras.initializers.RandomNormal( - stddev=0.01), - padding='same', - name='box-' + str(i))) - for level in range(self._min_level, self._max_level + 1): - name = self._box_net_batch_norm_name(i, level) - self._box_norm_activation[name] = norm_activation(name=name) - - def __call__(self, fpn_features, is_training=None): - """Returns outputs of RetinaNet head.""" - class_outputs = {} - box_outputs = {} - with tf.name_scope('retinanet_head'): - for level in range(self._min_level, self._max_level + 1): - features = fpn_features[level] - - class_outputs[level] = self.class_net( - features, level, is_training=is_training) - box_outputs[level] 
= self.box_net( - features, level, is_training=is_training) - return class_outputs, box_outputs - - def class_net(self, features, level, is_training): - """Class prediction network for RetinaNet.""" - with self._class_name_scope: - for i in range(self._num_convs): - features = self._class_conv[i](features) - # The convolution layers in the class net are shared among all levels, - # but each level has its batch normlization to capture the statistical - # difference among different levels. - name = self._class_net_batch_norm_name(i, level) - features = self._class_norm_activation[name]( - features, is_training=is_training) - - classes = self._class_predict(features) - return classes - - def box_net(self, features, level, is_training=None): - """Box regression network for RetinaNet.""" - with self._box_name_scope: - for i in range(self._num_convs): - features = self._box_conv[i](features) - # The convolution layers in the box net are shared among all levels, but - # each level has its batch normlization to capture the statistical - # difference among different levels. - name = self._box_net_batch_norm_name(i, level) - features = self._box_norm_activation[name]( - features, is_training=is_training) - - boxes = self._box_predict(features) - return boxes - - -# TODO(yeqing): Refactor this class when it is ready for var_scope reuse. -class ShapemaskPriorHead(object): - """ShapeMask Prior head.""" - - def __init__(self, num_classes, num_downsample_channels, mask_crop_size, - use_category_for_mask, shape_prior_path): - """Initialize params to build RetinaNet head. - - Args: - num_classes: Number of output classes. - num_downsample_channels: number of channels in mask branch. - mask_crop_size: feature crop size. - use_category_for_mask: use class information in mask branch. - shape_prior_path: the path to load shape priors. 
- """ - self._mask_num_classes = num_classes if use_category_for_mask else 1 - self._num_downsample_channels = num_downsample_channels - self._mask_crop_size = mask_crop_size - self._shape_prior_path = shape_prior_path - self._use_category_for_mask = use_category_for_mask - - self._shape_prior_fc = tf.keras.layers.Dense( - self._num_downsample_channels, name='shape-prior-fc') - - def __call__(self, fpn_features, boxes, outer_boxes, classes, is_training): - """Generate the detection priors from the box detections and FPN features. - - This corresponds to the Fig. 4 of the ShapeMask paper at - https://arxiv.org/pdf/1904.03239.pdf - - Args: - fpn_features: a dictionary of FPN features. - boxes: a float tensor of shape [batch_size, num_instances, 4] representing - the tight gt boxes from dataloader/detection. - outer_boxes: a float tensor of shape [batch_size, num_instances, 4] - representing the loose gt boxes from dataloader/detection. - classes: a int Tensor of shape [batch_size, num_instances] of instance - classes. - is_training: training mode or not. - - Returns: - instance_features: a float Tensor of shape [batch_size * num_instances, - mask_crop_size, mask_crop_size, num_downsample_channels]. This is the - instance feature crop. - detection_priors: A float Tensor of shape [batch_size * num_instances, - mask_size, mask_size, 1]. - """ - with tf.name_scope('prior_mask'): - batch_size, num_instances, _ = boxes.get_shape().as_list() - outer_boxes = tf.cast(outer_boxes, tf.float32) - boxes = tf.cast(boxes, tf.float32) - instance_features = spatial_transform_ops.multilevel_crop_and_resize( - fpn_features, outer_boxes, output_size=self._mask_crop_size) - instance_features = self._shape_prior_fc(instance_features) - - shape_priors = self._get_priors() - - # Get uniform priors for each outer box. 
- uniform_priors = tf.ones([ - batch_size, num_instances, self._mask_crop_size, self._mask_crop_size - ]) - uniform_priors = spatial_transform_ops.crop_mask_in_target_box( - uniform_priors, boxes, outer_boxes, self._mask_crop_size) - - # Classify shape priors using uniform priors + instance features. - prior_distribution = self._classify_shape_priors( - tf.cast(instance_features, tf.float32), uniform_priors, classes) - - instance_priors = tf.gather(shape_priors, classes) - instance_priors *= tf.expand_dims( - tf.expand_dims(tf.cast(prior_distribution, tf.float32), axis=-1), - axis=-1) - instance_priors = tf.reduce_sum(instance_priors, axis=2) - detection_priors = spatial_transform_ops.crop_mask_in_target_box( - instance_priors, boxes, outer_boxes, self._mask_crop_size) - - return instance_features, detection_priors - - def _get_priors(self): - """Load shape priors from file.""" - # loads class specific or agnostic shape priors - if self._shape_prior_path: - # Priors are loaded into shape [mask_num_classes, num_clusters, 32, 32]. - priors = np.load(tf.io.gfile.GFile(self._shape_prior_path, 'rb')) - priors = tf.convert_to_tensor(priors, dtype=tf.float32) - self._num_clusters = priors.get_shape().as_list()[1] - else: - # If prior path does not exist, do not use priors, i.e., pirors equal to - # uniform empty 32x32 patch. - self._num_clusters = 1 - priors = tf.zeros([ - self._mask_num_classes, self._num_clusters, self._mask_crop_size, - self._mask_crop_size - ]) - return priors - - def _classify_shape_priors(self, features, uniform_priors, classes): - """Classify the uniform prior by predicting the shape modes. - - Classify the object crop features into K modes of the clusters for each - category. - - Args: - features: A float Tensor of shape [batch_size, num_instances, mask_size, - mask_size, num_channels]. - uniform_priors: A float Tensor of shape [batch_size, num_instances, - mask_size, mask_size] representing the uniform detection priors. 
- classes: A int Tensor of shape [batch_size, num_instances] of detection - class ids. - - Returns: - prior_distribution: A float Tensor of shape - [batch_size, num_instances, num_clusters] representing the classifier - output probability over all possible shapes. - """ - - batch_size, num_instances, _, _, _ = features.get_shape().as_list() - features *= tf.expand_dims(uniform_priors, axis=-1) - # Reduce spatial dimension of features. The features have shape - # [batch_size, num_instances, num_channels]. - features = tf.reduce_mean(features, axis=(2, 3)) - logits = tf.keras.layers.Dense( - self._mask_num_classes * self._num_clusters, - kernel_initializer=tf.random_normal_initializer(stddev=0.01), - name='classify-shape-prior-fc')(features) - logits = tf.reshape( - logits, - [batch_size, num_instances, self._mask_num_classes, self._num_clusters]) - if self._use_category_for_mask: - logits = tf.gather(logits, tf.expand_dims(classes, axis=-1), batch_dims=2) - logits = tf.squeeze(logits, axis=2) - else: - logits = logits[:, :, 0, :] - - distribution = tf.nn.softmax(logits, name='shape_prior_weights') - return distribution - - -class ShapemaskCoarsemaskHead(object): - """ShapemaskCoarsemaskHead head.""" - - def __init__(self, - num_classes, - num_downsample_channels, - mask_crop_size, - use_category_for_mask, - num_convs, - norm_activation=nn_ops.norm_activation_builder()): - """Initialize params to build ShapeMask coarse and fine prediction head. - - Args: - num_classes: `int` number of mask classification categories. - num_downsample_channels: `int` number of filters at mask head. - mask_crop_size: feature crop size. - use_category_for_mask: use class information in mask branch. - num_convs: `int` number of stacked convolution before the last prediction - layer. - norm_activation: an operation that includes a normalization layer followed - by an optional activation layer. 
- """ - self._mask_num_classes = num_classes if use_category_for_mask else 1 - self._use_category_for_mask = use_category_for_mask - self._num_downsample_channels = num_downsample_channels - self._mask_crop_size = mask_crop_size - self._num_convs = num_convs - self._norm_activation = norm_activation - - self._coarse_mask_fc = tf.keras.layers.Dense( - self._num_downsample_channels, name='coarse-mask-fc') - - self._class_conv = [] - self._class_norm_activation = [] - - for i in range(self._num_convs): - self._class_conv.append( - tf.keras.layers.Conv2D( - self._num_downsample_channels, - kernel_size=(3, 3), - bias_initializer=tf.zeros_initializer(), - kernel_initializer=tf.keras.initializers.RandomNormal( - stddev=0.01), - padding='same', - name='coarse-mask-class-%d' % i)) - - self._class_norm_activation.append( - norm_activation(name='coarse-mask-class-%d-bn' % i)) - - self._class_predict = tf.keras.layers.Conv2D( - self._mask_num_classes, - kernel_size=(1, 1), - # Focal loss bias initialization to have foreground 0.01 probability. - bias_initializer=tf.constant_initializer(-np.log((1 - 0.01) / 0.01)), - kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01), - padding='same', - name='coarse-mask-class-predict') - - def __call__(self, features, detection_priors, classes, is_training): - """Generate instance masks from FPN features and detection priors. - - This corresponds to the Fig. 5-6 of the ShapeMask paper at - https://arxiv.org/pdf/1904.03239.pdf - - Args: - features: a float Tensor of shape [batch_size, num_instances, - mask_crop_size, mask_crop_size, num_downsample_channels]. This is the - instance feature crop. - detection_priors: a float Tensor of shape [batch_size, num_instances, - mask_crop_size, mask_crop_size, 1]. This is the detection prior for the - instance. - classes: a int Tensor of shape [batch_size, num_instances] of instance - classes. - is_training: a bool indicating whether in training mode. 
- - Returns: - mask_outputs: instance mask prediction as a float Tensor of shape - [batch_size, num_instances, mask_size, mask_size]. - """ - with tf.name_scope('coarse_mask'): - # Transform detection priors to have the same dimension as features. - detection_priors = tf.expand_dims(detection_priors, axis=-1) - detection_priors = self._coarse_mask_fc(detection_priors) - - features += detection_priors - mask_logits = self.decoder_net(features, is_training) - # Gather the logits with right input class. - if self._use_category_for_mask: - mask_logits = tf.transpose(mask_logits, [0, 1, 4, 2, 3]) - mask_logits = tf.gather( - mask_logits, tf.expand_dims(classes, -1), batch_dims=2) - mask_logits = tf.squeeze(mask_logits, axis=2) - else: - mask_logits = mask_logits[..., 0] - - return mask_logits - - def decoder_net(self, features, is_training=False): - """Coarse mask decoder network architecture. - - Args: - features: A tensor of size [batch, height_in, width_in, channels_in]. - is_training: Whether batch_norm layers are in training mode. 
- - Returns: - images: A feature tensor of size [batch, output_size, output_size, - num_channels] - """ - (batch_size, num_instances, height, width, - num_channels) = features.get_shape().as_list() - features = tf.reshape( - features, [batch_size * num_instances, height, width, num_channels]) - for i in range(self._num_convs): - features = self._class_conv[i](features) - features = self._class_norm_activation[i]( - features, is_training=is_training) - - mask_logits = self._class_predict(features) - mask_logits = tf.reshape( - mask_logits, - [batch_size, num_instances, height, width, self._mask_num_classes]) - return mask_logits - - -class ShapemaskFinemaskHead(object): - """ShapemaskFinemaskHead head.""" - - def __init__(self, - num_classes, - num_downsample_channels, - mask_crop_size, - use_category_for_mask, - num_convs, - upsample_factor, - norm_activation=nn_ops.norm_activation_builder()): - """Initialize params to build ShapeMask coarse and fine prediction head. - - Args: - num_classes: `int` number of mask classification categories. - num_downsample_channels: `int` number of filters at mask head. - mask_crop_size: feature crop size. - use_category_for_mask: use class information in mask branch. - num_convs: `int` number of stacked convolution before the last prediction - layer. - upsample_factor: `int` number of fine mask upsampling factor. - norm_activation: an operation that includes a batch normalization layer - followed by a relu layer(optional). 
- """ - self._use_category_for_mask = use_category_for_mask - self._mask_num_classes = num_classes if use_category_for_mask else 1 - self._num_downsample_channels = num_downsample_channels - self._mask_crop_size = mask_crop_size - self._num_convs = num_convs - self.up_sample_factor = upsample_factor - - self._fine_mask_fc = tf.keras.layers.Dense( - self._num_downsample_channels, name='fine-mask-fc') - - self._upsample_conv = tf.keras.layers.Conv2DTranspose( - self._num_downsample_channels, - (self.up_sample_factor, self.up_sample_factor), - (self.up_sample_factor, self.up_sample_factor), - name='fine-mask-conv2d-tran') - - self._fine_class_conv = [] - self._fine_class_bn = [] - for i in range(self._num_convs): - self._fine_class_conv.append( - tf.keras.layers.Conv2D( - self._num_downsample_channels, - kernel_size=(3, 3), - bias_initializer=tf.zeros_initializer(), - kernel_initializer=tf.keras.initializers.RandomNormal( - stddev=0.01), - activation=None, - padding='same', - name='fine-mask-class-%d' % i)) - self._fine_class_bn.append( - norm_activation(name='fine-mask-class-%d-bn' % i)) - - self._class_predict_conv = tf.keras.layers.Conv2D( - self._mask_num_classes, - kernel_size=(1, 1), - # Focal loss bias initialization to have foreground 0.01 probability. - bias_initializer=tf.constant_initializer(-np.log((1 - 0.01) / 0.01)), - kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01), - padding='same', - name='fine-mask-class-predict') - - def __call__(self, features, mask_logits, classes, is_training): - """Generate instance masks from FPN features and detection priors. - - This corresponds to the Fig. 5-6 of the ShapeMask paper at - https://arxiv.org/pdf/1904.03239.pdf - - Args: - features: a float Tensor of shape [batch_size, num_instances, - mask_crop_size, mask_crop_size, num_downsample_channels]. This is the - instance feature crop. 
- mask_logits: a float Tensor of shape [batch_size, num_instances, - mask_crop_size, mask_crop_size] indicating predicted mask logits. - classes: an int Tensor of shape [batch_size, num_instances] of instance - classes. - is_training: a bool indicating whether the model is in training mode. - - Returns: - mask_outputs: instance mask prediction as a float Tensor of shape - [batch_size, num_instances, mask_size, mask_size]. - """ - # Extract the foreground mean features - # with tf.variable_scope('fine_mask', reuse=tf.AUTO_REUSE): - with tf.name_scope('fine_mask'): - mask_probs = tf.nn.sigmoid(mask_logits) - # Compute instance embedding for hard average. - binary_mask = tf.cast(tf.greater(mask_probs, 0.5), features.dtype) - instance_embedding = tf.reduce_sum( - features * tf.expand_dims(binary_mask, axis=-1), axis=(2, 3)) - instance_embedding /= tf.expand_dims( - tf.reduce_sum(binary_mask, axis=(2, 3)) + 1e-20, axis=-1) - # Take the difference between crop features and mean instance features. - features -= tf.expand_dims( - tf.expand_dims(instance_embedding, axis=2), axis=2) - - features += self._fine_mask_fc(tf.expand_dims(mask_probs, axis=-1)) - - # Decoder to generate upsampled segmentation mask. - mask_logits = self.decoder_net(features, is_training) - if self._use_category_for_mask: - mask_logits = tf.transpose(mask_logits, [0, 1, 4, 2, 3]) - mask_logits = tf.gather( - mask_logits, tf.expand_dims(classes, -1), batch_dims=2) - mask_logits = tf.squeeze(mask_logits, axis=2) - else: - mask_logits = mask_logits[..., 0] - - return mask_logits - - def decoder_net(self, features, is_training=False): - """Fine mask decoder network architecture. - - Args: - features: A tensor of size [batch, height_in, width_in, channels_in]. - is_training: Whether batch_norm layers are in training mode. - - Returns: - images: A feature tensor of size [batch, output_size, output_size, - num_channels], where output size is self.up_sample_factor times - that of input.
- """ - (batch_size, num_instances, height, width, - num_channels) = features.get_shape().as_list() - features = tf.reshape( - features, [batch_size * num_instances, height, width, num_channels]) - for i in range(self._num_convs): - features = self._fine_class_conv[i](features) - features = self._fine_class_bn[i](features, is_training=is_training) - - if self.up_sample_factor > 1: - features = self._upsample_conv(features) - - # Predict per-class instance masks. - mask_logits = self._class_predict_conv(features) - - mask_logits = tf.reshape(mask_logits, [ - batch_size, num_instances, height * self.up_sample_factor, - width * self.up_sample_factor, self._mask_num_classes - ]) - return mask_logits diff --git a/official/vision/detection/modeling/architecture/identity.py b/official/vision/detection/modeling/architecture/identity.py deleted file mode 100644 index 778297f8919f8a90875c69ce1f11ef5dfd9fc95f..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/architecture/identity.py +++ /dev/null @@ -1,28 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Identity Fn that forwards the input features.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - - -class Identity(object): - """Identity function that forwards the input features.""" - - def __call__(self, features, is_training=False): - """Only forwards the input features.""" - return features - diff --git a/official/vision/detection/modeling/architecture/nn_blocks.py b/official/vision/detection/modeling/architecture/nn_blocks.py deleted file mode 100644 index 69a0d28261997eddbd9826d7681edbe95940e9c9..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/architecture/nn_blocks.py +++ /dev/null @@ -1,316 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Contains common building blocks for neural networks.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - -from official.modeling import tf_utils - - -class ResidualBlock(tf.keras.layers.Layer): - """A residual block.""" - - def __init__(self, - filters, - strides, - use_projection=False, - kernel_initializer='VarianceScaling', - kernel_regularizer=None, - bias_regularizer=None, - activation='relu', - use_sync_bn=False, - norm_momentum=0.99, - norm_epsilon=0.001, - **kwargs): - """A residual block with BN after convolutions. 
- - Args: - filters: `int` number of filters for both convolutions in the block. The - output of the block also has `filters` channels. - strides: `int` block stride. If greater than 1, this block will ultimately - downsample the input. - use_projection: `bool` for whether this block should use a projection - shortcut (versus the default identity shortcut). This is usually `True` - for the first block of a block group, which may change the number of - filters and the resolution. - kernel_initializer: kernel_initializer for convolutional layers. - kernel_regularizer: tf.keras.regularizers.Regularizer object for Conv2D. - Default to None. - bias_regularizer: tf.keras.regularizers.Regularizer object for Conv2D. - Default to None. - activation: `str` name of the activation function. - use_sync_bn: if True, use synchronized batch normalization. - norm_momentum: `float` normalization momentum for the moving average. - norm_epsilon: `float` small float added to variance to avoid dividing by - zero. - **kwargs: keyword arguments passed to the base `tf.keras.layers.Layer`.
- """ - super(ResidualBlock, self).__init__(**kwargs) - - self._filters = filters - self._strides = strides - self._use_projection = use_projection - self._use_sync_bn = use_sync_bn - self._activation = activation - self._kernel_initializer = kernel_initializer - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - - if use_sync_bn: - self._norm = tf.keras.layers.experimental.SyncBatchNormalization - else: - self._norm = tf.keras.layers.BatchNormalization - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - self._activation_fn = tf_utils.get_activation(activation) - - def build(self, input_shape): - if self._use_projection: - self._shortcut = tf.keras.layers.Conv2D( - filters=self._filters, - kernel_size=1, - strides=self._strides, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm0 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - - self._conv1 = tf.keras.layers.Conv2D( - filters=self._filters, - kernel_size=3, - strides=self._strides, - padding='same', - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm1 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - - self._conv2 = tf.keras.layers.Conv2D( - filters=self._filters, - kernel_size=3, - strides=1, - padding='same', - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm2 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - - super(ResidualBlock, 
self).build(input_shape) - - def get_config(self): - config = { - 'filters': self._filters, - 'strides': self._strides, - 'use_projection': self._use_projection, - 'kernel_initializer': self._kernel_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'bias_regularizer': self._bias_regularizer, - 'activation': self._activation, - 'use_sync_bn': self._use_sync_bn, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon - } - - base_config = super(ResidualBlock, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, inputs): - shortcut = inputs - if self._use_projection: - shortcut = self._shortcut(shortcut) - shortcut = self._norm0(shortcut) - - x = self._conv1(inputs) - x = self._norm1(x) - x = self._activation_fn(x) - - x = self._conv2(x) - x = self._norm2(x) - - return self._activation_fn(x + shortcut) - - -class BottleneckBlock(tf.keras.layers.Layer): - """A standard bottleneck block.""" - - def __init__(self, - filters, - strides, - use_projection=False, - kernel_initializer='VarianceScaling', - kernel_regularizer=None, - bias_regularizer=None, - activation='relu', - use_sync_bn=False, - norm_momentum=0.99, - norm_epsilon=0.001, - **kwargs): - """A standard bottleneck block with BN after convolutions. - - Args: - filters: `int` number of filters for the first two convolutions. Note that - the third and final convolution will use 4 times as many filters. - strides: `int` block stride. If greater than 1, this block will ultimately - downsample the input. - use_projection: `bool` for whether this block should use a projection - shortcut (versus the default identity shortcut). This is usually `True` - for the first block of a block group, which may change the number of - filters and the resolution. - kernel_initializer: kernel_initializer for convolutional layers. - kernel_regularizer: tf.keras.regularizers.Regularizer object for Conv2D. - Default to None. 
- bias_regularizer: tf.keras.regularizers.Regularizer object for Conv2D. - Default to None. - activation: `str` name of the activation function. - use_sync_bn: if True, use synchronized batch normalization. - norm_momentum: `float` normalization momentum for the moving average. - norm_epsilon: `float` small float added to variance to avoid dividing by - zero. - **kwargs: keyword arguments passed to the base `tf.keras.layers.Layer`. - """ - super(BottleneckBlock, self).__init__(**kwargs) - - self._filters = filters - self._strides = strides - self._use_projection = use_projection - self._use_sync_bn = use_sync_bn - self._activation = activation - self._kernel_initializer = kernel_initializer - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - if use_sync_bn: - self._norm = tf.keras.layers.experimental.SyncBatchNormalization - else: - self._norm = tf.keras.layers.BatchNormalization - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - self._activation_fn = tf_utils.get_activation(activation) - - def build(self, input_shape): - if self._use_projection: - self._shortcut = tf.keras.layers.Conv2D( - filters=self._filters * 4, - kernel_size=1, - strides=self._strides, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm0 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - - self._conv1 = tf.keras.layers.Conv2D( - filters=self._filters, - kernel_size=1, - strides=1, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm1 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - - 
self._conv2 = tf.keras.layers.Conv2D( -
filters=self._filters, - kernel_size=3, - strides=self._strides, - padding='same', - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm2 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - - self._conv3 = tf.keras.layers.Conv2D( - filters=self._filters * 4, - kernel_size=1, - strides=1, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer) - self._norm3 = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon) - - super(BottleneckBlock, self).build(input_shape) - - def get_config(self): - config = { - 'filters': self._filters, - 'strides': self._strides, - 'use_projection': self._use_projection, - 'kernel_initializer': self._kernel_initializer, - 'kernel_regularizer': self._kernel_regularizer, - 'bias_regularizer': self._bias_regularizer, - 'activation': self._activation, - 'use_sync_bn': self._use_sync_bn, - 'norm_momentum': self._norm_momentum, - 'norm_epsilon': self._norm_epsilon - } - - base_config = super(BottleneckBlock, self).get_config() - return dict(list(base_config.items()) + list(config.items())) - - def call(self, inputs): - shortcut = inputs - if self._use_projection: - shortcut = self._shortcut(shortcut) - shortcut = self._norm0(shortcut) - - x = self._conv1(inputs) - x = self._norm1(x) - x = self._activation_fn(x) - - x = self._conv2(x) - x = self._norm2(x) - x = self._activation_fn(x) - - x = self._conv3(x) - x = self._norm3(x) - - return self._activation_fn(x + shortcut) diff --git a/official/vision/detection/modeling/architecture/nn_ops.py b/official/vision/detection/modeling/architecture/nn_ops.py deleted file mode 100644 index 76a33d98d0037361e1607d95ed275043fa41d364..0000000000000000000000000000000000000000 --- 
a/official/vision/detection/modeling/architecture/nn_ops.py +++ /dev/null @@ -1,109 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Neural network operations commonly shared by the architectures.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import functools - -import tensorflow as tf - - -class NormActivation(tf.keras.layers.Layer): - """Combined Normalization and Activation layers.""" - - def __init__(self, - momentum=0.997, - epsilon=1e-4, - trainable=True, - init_zero=False, - use_activation=True, - activation='relu', - fused=True, - name=None): - """A class to construct a batch normalization layer followed by an optional activation. - - Args: - momentum: momentum for the moving average. - epsilon: small float added to variance to avoid dividing by zero. - trainable: `bool`, if True also add variables to the graph collection - GraphKeys.TRAINABLE_VARIABLES. If False, freeze batch normalization - layer. - init_zero: `bool` if True, initializes scale parameter of batch - normalization with 0. If False, initialize it with 1. - fused: `bool` fused option in batch normalization. - use_activation: `bool`, whether to add the optional activation layer after - the batch normalization layer. - activation: `str`, the type of the activation layer. Currently supports - `relu` and `swish`. - name: `str` name for the operation.
- """ - super(NormActivation, self).__init__(trainable=trainable) - if init_zero: - gamma_initializer = tf.keras.initializers.Zeros() - else: - gamma_initializer = tf.keras.initializers.Ones() - self._normalization_op = tf.keras.layers.BatchNormalization( - momentum=momentum, - epsilon=epsilon, - center=True, - scale=True, - trainable=trainable, - fused=fused, - gamma_initializer=gamma_initializer, - name=name) - self._use_activation = use_activation - if activation == 'relu': - self._activation_op = tf.nn.relu - elif activation == 'swish': - self._activation_op = tf.nn.swish - else: - raise ValueError('Unsupported activation `{}`.'.format(activation)) - - def __call__(self, inputs, is_training=None): - """Builds the normalization layer followed by an optional activation layer. - - Args: - inputs: `Tensor` of shape `[batch, channels, ...]`. - is_training: `bool`, True if the model is in training mode. - - Returns: - A normalized `Tensor` with the same `data_format`. - """ - # We will need to keep training=None by default, so that it can be inherited - # from keras.Model.training - if is_training and self.trainable: - is_training = True - inputs = self._normalization_op(inputs, training=is_training) - - if self._use_activation: - inputs = self._activation_op(inputs) - return inputs - - -def norm_activation_builder(momentum=0.997, - epsilon=1e-4, - trainable=True, - activation='relu', - **kwargs): - return functools.partial( - NormActivation, - momentum=momentum, - epsilon=epsilon, - trainable=trainable, - activation=activation, - **kwargs) diff --git a/official/vision/detection/modeling/architecture/resnet.py b/official/vision/detection/modeling/architecture/resnet.py deleted file mode 100644 index 6f76e880ed701e17795454c252d97d9a876d6d16..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/architecture/resnet.py +++ /dev/null @@ -1,351 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Contains definitions for the post-activation form of Residual Networks. - -Residual networks (ResNets) were proposed in: -[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun - Deep Residual Learning for Image Recognition. arXiv:1512.03385 -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf -from official.vision.detection.modeling.architecture import nn_ops - - -# TODO(b/140112644): Refactor the code with Keras style, i.e. build and call. -class Resnet(object): - """Class to build ResNet family models.""" - - def __init__( - self, - resnet_depth, - activation='relu', - norm_activation=nn_ops.norm_activation_builder(activation='relu'), - data_format='channels_last'): - """ResNet initialization function. - - Args: - resnet_depth: `int` depth of ResNet backbone model. - norm_activation: an operation that includes a normalization layer followed - by an optional activation layer. - data_format: `str` either "channels_first" for `[batch, channels, height, - width]` or "channels_last" for `[batch, height, width, channels]`.
- """ - self._resnet_depth = resnet_depth - if activation == 'relu': - self._activation_op = tf.nn.relu - elif activation == 'swish': - self._activation_op = tf.nn.swish - else: - raise ValueError('Unsupported activation `{}`.'.format(activation)) - self._norm_activation = norm_activation - self._data_format = data_format - - model_params = { - 10: { - 'block': self.residual_block, - 'layers': [1, 1, 1, 1] - }, - 18: { - 'block': self.residual_block, - 'layers': [2, 2, 2, 2] - }, - 34: { - 'block': self.residual_block, - 'layers': [3, 4, 6, 3] - }, - 50: { - 'block': self.bottleneck_block, - 'layers': [3, 4, 6, 3] - }, - 101: { - 'block': self.bottleneck_block, - 'layers': [3, 4, 23, 3] - }, - 152: { - 'block': self.bottleneck_block, - 'layers': [3, 8, 36, 3] - }, - 200: { - 'block': self.bottleneck_block, - 'layers': [3, 24, 36, 3] - } - } - - if resnet_depth not in model_params: - valid_resnet_depths = ', '.join( - [str(depth) for depth in sorted(model_params.keys())]) - raise ValueError( - 'The resnet_depth should be in [%s]. Not a valid resnet_depth: %s' % - (valid_resnet_depths, self._resnet_depth)) - params = model_params[resnet_depth] - self._resnet_fn = self.resnet_v1_generator(params['block'], - params['layers']) - - def __call__(self, inputs, is_training=None): - """Returns the multi-level ResNet backbone features for the given inputs. - - Args: - inputs: a `Tensor` with shape [batch_size, height, width, 3] representing - a batch of images. - is_training: `bool` if True, the model is in training mode. - - Returns: - a `dict` containing `int` keys for continuous feature levels [2, 3, 4, 5]. - The values are the corresponding feature maps in the ResNet hierarchy, with - shape [batch_size, height_l, width_l, num_filters]. - """ - with tf.name_scope('resnet%s' % self._resnet_depth): - return self._resnet_fn(inputs, is_training) - - def fixed_padding(self, inputs, kernel_size): - """Pads the input along the spatial dimensions independently of input size.
- - Args: - inputs: `Tensor` of size `[batch, channels, height, width]` or `[batch, - height, width, channels]` depending on `data_format`. - kernel_size: `int` kernel size to be used for `conv2d` or `max_pool2d` - operations. Should be a positive integer. - - Returns: - A padded `Tensor` of the same `data_format` with size either intact - (if `kernel_size == 1`) or padded (if `kernel_size > 1`). - """ - pad_total = kernel_size - 1 - pad_beg = pad_total // 2 - pad_end = pad_total - pad_beg - if self._data_format == 'channels_first': - padded_inputs = tf.pad( - tensor=inputs, - paddings=[[0, 0], [0, 0], [pad_beg, pad_end], [pad_beg, pad_end]]) - else: - padded_inputs = tf.pad( - tensor=inputs, - paddings=[[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]]) - - return padded_inputs - - def conv2d_fixed_padding(self, inputs, filters, kernel_size, strides): - """Strided 2-D convolution with explicit padding. - - The padding is consistent and is based only on `kernel_size`, not on the - dimensions of `inputs` (as opposed to using `tf.layers.conv2d` alone). - - Args: - inputs: `Tensor` of size `[batch, channels, height_in, width_in]`. - filters: `int` number of filters in the convolution. - kernel_size: `int` size of the kernel to be used in the convolution. - strides: `int` strides of the convolution. - - Returns: - A `Tensor` of shape `[batch, filters, height_out, width_out]`. - """ - if strides > 1: - inputs = self.fixed_padding(inputs, kernel_size) - - return tf.keras.layers.Conv2D( - filters=filters, - kernel_size=kernel_size, - strides=strides, - padding=('SAME' if strides == 1 else 'VALID'), - use_bias=False, - kernel_initializer=tf.initializers.VarianceScaling(), - data_format=self._data_format)( - inputs=inputs) - - def residual_block(self, - inputs, - filters, - strides, - use_projection=False, - is_training=None): - """Standard building block for residual networks with BN after convolutions.
- - Args: - inputs: `Tensor` of size `[batch, channels, height, width]`. - filters: `int` number of filters for both convolutions in the block. The - output of the block also has `filters` channels. - strides: `int` block stride. If greater than 1, this block will ultimately - downsample the input. - use_projection: `bool` for whether this block should use a projection - shortcut (versus the default identity shortcut). This is usually `True` - for the first block of a block group, which may change the number of - filters and the resolution. - is_training: `bool` if True, the model is in training mode. - - Returns: - The output `Tensor` of the block. - """ - shortcut = inputs - if use_projection: - # Projection shortcut in first layer to match filters and strides - shortcut = self.conv2d_fixed_padding( - inputs=inputs, filters=filters, kernel_size=1, strides=strides) - shortcut = self._norm_activation(use_activation=False)( - shortcut, is_training=is_training) - - inputs = self.conv2d_fixed_padding( - inputs=inputs, filters=filters, kernel_size=3, strides=strides) - inputs = self._norm_activation()(inputs, is_training=is_training) - - inputs = self.conv2d_fixed_padding( - inputs=inputs, filters=filters, kernel_size=3, strides=1) - inputs = self._norm_activation( - use_activation=False, init_zero=True)( - inputs, is_training=is_training) - - return self._activation_op(inputs + shortcut) - - def bottleneck_block(self, - inputs, - filters, - strides, - use_projection=False, - is_training=None): - """Bottleneck block variant for residual networks with BN after convolutions. - - Args: - inputs: `Tensor` of size `[batch, channels, height, width]`. - filters: `int` number of filters for the first two convolutions. Note that - the third and final convolution will use 4 times as many filters. - strides: `int` block stride. If greater than 1, this block will ultimately - downsample the input.
- use_projection: `bool` for whether this block should use a projection - shortcut (versus the default identity shortcut). This is usually `True` - for the first block of a block group, which may change the number of - filters and the resolution. - is_training: `bool` if True, the model is in training mode. - - Returns: - The output `Tensor` of the block. - """ - shortcut = inputs - if use_projection: - # Projection shortcut only in first block within a group. Bottleneck - # blocks end with 4 times the number of filters. - filters_out = 4 * filters - shortcut = self.conv2d_fixed_padding( - inputs=inputs, filters=filters_out, kernel_size=1, strides=strides) - shortcut = self._norm_activation(use_activation=False)( - shortcut, is_training=is_training) - - inputs = self.conv2d_fixed_padding( - inputs=inputs, filters=filters, kernel_size=1, strides=1) - inputs = self._norm_activation()(inputs, is_training=is_training) - - inputs = self.conv2d_fixed_padding( - inputs=inputs, filters=filters, kernel_size=3, strides=strides) - inputs = self._norm_activation()(inputs, is_training=is_training) - - inputs = self.conv2d_fixed_padding( - inputs=inputs, filters=4 * filters, kernel_size=1, strides=1) - inputs = self._norm_activation( - use_activation=False, init_zero=True)( - inputs, is_training=is_training) - - return self._activation_op(inputs + shortcut) - - def block_group(self, inputs, filters, block_fn, blocks, strides, name, - is_training): - """Creates one group of blocks for the ResNet model. - - Args: - inputs: `Tensor` of size `[batch, channels, height, width]`. - filters: `int` number of filters for the first convolution of the layer. - block_fn: `function` for the block to use within the model. - blocks: `int` number of blocks contained in the layer. - strides: `int` stride to use for the first convolution of the layer. If - greater than 1, this layer will downsample the input. - name: `str` name for the Tensor output of the block layer.
- is_training: `bool` if True, the model is in training mode. - - Returns: - The output `Tensor` of the block layer. - """ - # Only the first block per block_group uses projection shortcut and strides. - inputs = block_fn( - inputs, filters, strides, use_projection=True, is_training=is_training) - - for _ in range(1, blocks): - inputs = block_fn(inputs, filters, 1, is_training=is_training) - - return tf.identity(inputs, name) - - def resnet_v1_generator(self, block_fn, layers): - """Generator for ResNet v1 models. - - Args: - block_fn: `function` for the block to use within the model. Either - `residual_block` or `bottleneck_block`. - layers: list of 4 `int`s denoting the number of blocks to include in each - of the 4 block groups. Each group consists of blocks that take inputs of - the same resolution. - - Returns: - Model `function` that takes in `inputs` and `is_training` and returns the - output `Tensor` of the ResNet model. - """ - - def model(inputs, is_training=None): - """Creation of the model graph.""" - inputs = self.conv2d_fixed_padding( - inputs=inputs, filters=64, kernel_size=7, strides=2) - inputs = tf.identity(inputs, 'initial_conv') - inputs = self._norm_activation()(inputs, is_training=is_training) - - inputs = tf.keras.layers.MaxPool2D( - pool_size=3, strides=2, padding='SAME', - data_format=self._data_format)( - inputs) - inputs = tf.identity(inputs, 'initial_max_pool') - - c2 = self.block_group( - inputs=inputs, - filters=64, - block_fn=block_fn, - blocks=layers[0], - strides=1, - name='block_group1', - is_training=is_training) - c3 = self.block_group( - inputs=c2, - filters=128, - block_fn=block_fn, - blocks=layers[1], - strides=2, - name='block_group2', - is_training=is_training) - c4 = self.block_group( - inputs=c3, - filters=256, - block_fn=block_fn, - blocks=layers[2], - strides=2, - name='block_group3', - is_training=is_training) - c5 = self.block_group( - inputs=c4, - filters=512, - block_fn=block_fn, - blocks=layers[3], - strides=2, - 
name='block_group4', - is_training=is_training) - return {2: c2, 3: c3, 4: c4, 5: c5} - - return model diff --git a/official/vision/detection/modeling/architecture/spinenet.py b/official/vision/detection/modeling/architecture/spinenet.py deleted file mode 100644 index 87019a46f6d0c4e12751e10e923ddc31bc19e03c..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/architecture/spinenet.py +++ /dev/null @@ -1,504 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -# ============================================================================== -"""Implementation of SpineNet model. - -X. Du, T-Y. Lin, P. Jin, G. Ghiasi, M. Tan, Y. Cui, Q. V. Le, X. Song -SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization -https://arxiv.org/abs/1912.05027 -""" -import math - -from absl import logging -import tensorflow as tf - -from official.modeling import tf_utils -from official.vision.detection.modeling.architecture import nn_blocks - -layers = tf.keras.layers - -FILTER_SIZE_MAP = { - 1: 32, - 2: 64, - 3: 128, - 4: 256, - 5: 256, - 6: 256, - 7: 256, -} - -# The fixed SpineNet architecture discovered by NAS. -# Each element represents a specification of a building block: -# (block_level, block_fn, (input_offset0, input_offset1), is_output). 
-SPINENET_BLOCK_SPECS = [ - (2, 'bottleneck', (0, 1), False), - (4, 'residual', (0, 1), False), - (3, 'bottleneck', (2, 3), False), - (4, 'bottleneck', (2, 4), False), - (6, 'residual', (3, 5), False), - (4, 'bottleneck', (3, 5), False), - (5, 'residual', (6, 7), False), - (7, 'residual', (6, 8), False), - (5, 'bottleneck', (8, 9), False), - (5, 'bottleneck', (8, 10), False), - (4, 'bottleneck', (5, 10), True), - (3, 'bottleneck', (4, 10), True), - (5, 'bottleneck', (7, 12), True), - (7, 'bottleneck', (5, 14), True), - (6, 'bottleneck', (12, 14), True), -] - -SCALING_MAP = { - '49S': { - 'endpoints_num_filters': 128, - 'filter_size_scale': 0.65, - 'resample_alpha': 0.5, - 'block_repeats': 1, - }, - '49': { - 'endpoints_num_filters': 256, - 'filter_size_scale': 1.0, - 'resample_alpha': 0.5, - 'block_repeats': 1, - }, - '96': { - 'endpoints_num_filters': 256, - 'filter_size_scale': 1.0, - 'resample_alpha': 0.5, - 'block_repeats': 2, - }, - '143': { - 'endpoints_num_filters': 256, - 'filter_size_scale': 1.0, - 'resample_alpha': 1.0, - 'block_repeats': 3, - }, - '190': { - 'endpoints_num_filters': 512, - 'filter_size_scale': 1.3, - 'resample_alpha': 1.0, - 'block_repeats': 4, - }, -} - - -class BlockSpec(object): - """A container class that specifies the block configuration for SpineNet.""" - - def __init__(self, level, block_fn, input_offsets, is_output): - self.level = level - self.block_fn = block_fn - self.input_offsets = input_offsets - self.is_output = is_output - - -def build_block_specs(block_specs=None): - """Builds the list of BlockSpec objects for SpineNet.""" - if not block_specs: - block_specs = SPINENET_BLOCK_SPECS - logging.info('Building SpineNet block specs: %s', block_specs) - return [BlockSpec(*b) for b in block_specs] - - -class SpineNet(tf.keras.Model): - """Class to build SpineNet models.""" - - def __init__(self, - input_specs=tf.keras.layers.InputSpec(shape=[None, 640, 640, 3]), - min_level=3, - max_level=7, - block_specs=build_block_specs(), - 
endpoints_num_filters=256, - resample_alpha=0.5, - block_repeats=1, - filter_size_scale=1.0, - kernel_initializer='VarianceScaling', - kernel_regularizer=None, - bias_regularizer=None, - activation='relu', - use_sync_bn=False, - norm_momentum=0.99, - norm_epsilon=0.001, - **kwargs): - """SpineNet model.""" - self._min_level = min_level - self._max_level = max_level - self._block_specs = block_specs - self._endpoints_num_filters = endpoints_num_filters - self._resample_alpha = resample_alpha - self._block_repeats = block_repeats - self._filter_size_scale = filter_size_scale - self._kernel_initializer = kernel_initializer - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - self._use_sync_bn = use_sync_bn - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - if activation == 'relu': - self._activation = tf.nn.relu - elif activation == 'swish': - self._activation = tf.nn.swish - else: - raise ValueError('Activation {} not implemented.'.format(activation)) - self._init_block_fn = 'bottleneck' - self._num_init_blocks = 2 - - if use_sync_bn: - self._norm = layers.experimental.SyncBatchNormalization - else: - self._norm = layers.BatchNormalization - - if tf.keras.backend.image_data_format() == 'channels_last': - self._bn_axis = -1 - else: - self._bn_axis = 1 - - # Build SpineNet. 
- inputs = tf.keras.Input(shape=input_specs.shape[1:]) - - net = self._build_stem(inputs=inputs) - net = self._build_scale_permuted_network( - net=net, input_width=input_specs.shape[1]) - net = self._build_endpoints(net=net) - - super(SpineNet, self).__init__(inputs=inputs, outputs=net) - - def _block_group(self, - inputs, - filters, - strides, - block_fn_cand, - block_repeats=1, - name='block_group'): - """Creates one group of blocks for the SpineNet model.""" - block_fn_candidates = { - 'bottleneck': nn_blocks.BottleneckBlock, - 'residual': nn_blocks.ResidualBlock, - } - block_fn = block_fn_candidates[block_fn_cand] - _, _, _, num_filters = inputs.get_shape().as_list() - - if block_fn_cand == 'bottleneck': - use_projection = not (num_filters == (filters * 4) and strides == 1) - else: - use_projection = not (num_filters == filters and strides == 1) - - x = block_fn( - filters=filters, - strides=strides, - use_projection=use_projection, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer, - activation=self._activation, - use_sync_bn=self._use_sync_bn, - norm_momentum=self._norm_momentum, - norm_epsilon=self._norm_epsilon)( - inputs) - for _ in range(1, block_repeats): - x = block_fn( - filters=filters, - strides=1, - use_projection=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer, - activation=self._activation, - use_sync_bn=self._use_sync_bn, - norm_momentum=self._norm_momentum, - norm_epsilon=self._norm_epsilon)( - x) - return tf.identity(x, name=name) - - def _build_stem(self, inputs): - """Build SpineNet stem.""" - x = layers.Conv2D( - filters=64, - kernel_size=7, - strides=2, - use_bias=False, - padding='same', - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - inputs) - x = self._norm( - 
axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon)( - x) - x = tf_utils.get_activation(self._activation)(x) - x = layers.MaxPool2D(pool_size=3, strides=2, padding='same')(x) - - net = [] - # Build the initial level 2 blocks. - for i in range(self._num_init_blocks): - x = self._block_group( - inputs=x, - filters=int(FILTER_SIZE_MAP[2] * self._filter_size_scale), - strides=1, - block_fn_cand=self._init_block_fn, - block_repeats=self._block_repeats, - name='stem_block_{}'.format(i + 1)) - net.append(x) - return net - - def _build_scale_permuted_network(self, - net, - input_width, - weighted_fusion=False): - """Build scale-permuted network.""" - net_sizes = [int(math.ceil(input_width / 2**2))] * len(net) - net_block_fns = [self._init_block_fn] * len(net) - num_outgoing_connections = [0] * len(net) - - endpoints = {} - for i, block_spec in enumerate(self._block_specs): - # Find out specs for the target block. - target_width = int(math.ceil(input_width / 2**block_spec.level)) - target_num_filters = int(FILTER_SIZE_MAP[block_spec.level] * - self._filter_size_scale) - target_block_fn = block_spec.block_fn - - # Resample then merge input0 and input1. - parents = [] - input0 = block_spec.input_offsets[0] - input1 = block_spec.input_offsets[1] - - x0 = self._resample_with_alpha( - inputs=net[input0], - input_width=net_sizes[input0], - input_block_fn=net_block_fns[input0], - target_width=target_width, - target_num_filters=target_num_filters, - target_block_fn=target_block_fn, - alpha=self._resample_alpha) - parents.append(x0) - num_outgoing_connections[input0] += 1 - - x1 = self._resample_with_alpha( - inputs=net[input1], - input_width=net_sizes[input1], - input_block_fn=net_block_fns[input1], - target_width=target_width, - target_num_filters=target_num_filters, - target_block_fn=target_block_fn, - alpha=self._resample_alpha) - parents.append(x1) - num_outgoing_connections[input1] += 1 - - # Merge 0 outdegree blocks to the output block. 
- if block_spec.is_output: - for j, (j_feat, - j_connections) in enumerate(zip(net, num_outgoing_connections)): - if j_connections == 0 and (j_feat.shape[2] == target_width and - j_feat.shape[3] == x0.shape[3]): - parents.append(j_feat) - num_outgoing_connections[j] += 1 - - # pylint: disable=g-direct-tensorflow-import - if weighted_fusion: - dtype = parents[0].dtype - parent_weights = [ - tf.nn.relu(tf.cast(tf.Variable(1.0, name='block{}_fusion{}'.format( - i, j)), dtype=dtype)) for j in range(len(parents))] - weights_sum = tf.add_n(parent_weights) - parents = [ - parents[i] * parent_weights[i] / (weights_sum + 0.0001) - for i in range(len(parents)) - ] - - # Fuse all parent nodes then build a new block. - x = tf_utils.get_activation(self._activation)(tf.add_n(parents)) - x = self._block_group( - inputs=x, - filters=target_num_filters, - strides=1, - block_fn_cand=target_block_fn, - block_repeats=self._block_repeats, - name='scale_permuted_block_{}'.format(i + 1)) - - net.append(x) - net_sizes.append(target_width) - net_block_fns.append(target_block_fn) - num_outgoing_connections.append(0) - - # Save output feats. 
- if block_spec.is_output: - if block_spec.level in endpoints: - raise ValueError('Duplicate feats found for output level {}.'.format( - block_spec.level)) - if (block_spec.level < self._min_level or - block_spec.level > self._max_level): - raise ValueError('Output level is out of range [{}, {}]'.format( - self._min_level, self._max_level)) - endpoints[block_spec.level] = x - - return endpoints - - def _build_endpoints(self, net): - """Match filter size for endpoints before sharing conv layers.""" - endpoints = {} - for level in range(self._min_level, self._max_level + 1): - x = layers.Conv2D( - filters=self._endpoints_num_filters, - kernel_size=1, - strides=1, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - net[level]) - x = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon)( - x) - x = tf_utils.get_activation(self._activation)(x) - endpoints[level] = x - return endpoints - - def _resample_with_alpha(self, - inputs, - input_width, - input_block_fn, - target_width, - target_num_filters, - target_block_fn, - alpha=0.5): - """Match resolution and feature dimension.""" - _, _, _, input_num_filters = inputs.get_shape().as_list() - if input_block_fn == 'bottleneck': - input_num_filters /= 4 - new_num_filters = int(input_num_filters * alpha) - - x = layers.Conv2D( - filters=new_num_filters, - kernel_size=1, - strides=1, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - inputs) - x = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon)( - x) - x = tf_utils.get_activation(self._activation)(x) - - # Spatial resampling. 
- if input_width > target_width: - x = layers.Conv2D( - filters=new_num_filters, - kernel_size=3, - strides=2, - padding='SAME', - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - x) - x = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon)( - x) - x = tf_utils.get_activation(self._activation)(x) - input_width /= 2 - while input_width > target_width: - x = layers.MaxPool2D(pool_size=3, strides=2, padding='SAME')(x) - input_width /= 2 - elif input_width < target_width: - scale = target_width // input_width - x = layers.UpSampling2D(size=(scale, scale))(x) - - # Last 1x1 conv to match filter size. - if target_block_fn == 'bottleneck': - target_num_filters *= 4 - x = layers.Conv2D( - filters=target_num_filters, - kernel_size=1, - strides=1, - use_bias=False, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer)( - x) - x = self._norm( - axis=self._bn_axis, - momentum=self._norm_momentum, - epsilon=self._norm_epsilon)( - x) - - return x - - -class SpineNetBuilder(object): - """SpineNet builder.""" - - def __init__(self, - model_id, - input_specs=tf.keras.layers.InputSpec(shape=[None, 640, 640, 3]), - min_level=3, - max_level=7, - block_specs=build_block_specs(), - kernel_initializer='VarianceScaling', - kernel_regularizer=None, - bias_regularizer=None, - activation='relu', - use_sync_bn=False, - norm_momentum=0.99, - norm_epsilon=0.001): - if model_id not in SCALING_MAP: - raise ValueError( - 'SpineNet {} is not a valid architecture.'.format(model_id)) - scaling_params = SCALING_MAP[model_id] - self._input_specs = input_specs - self._min_level = min_level - self._max_level = max_level - self._block_specs = block_specs - self._endpoints_num_filters = scaling_params['endpoints_num_filters'] - self._resample_alpha = 
scaling_params['resample_alpha'] - self._block_repeats = scaling_params['block_repeats'] - self._filter_size_scale = scaling_params['filter_size_scale'] - self._kernel_initializer = kernel_initializer - self._kernel_regularizer = kernel_regularizer - self._bias_regularizer = bias_regularizer - self._activation = activation - self._use_sync_bn = use_sync_bn - self._norm_momentum = norm_momentum - self._norm_epsilon = norm_epsilon - - def __call__(self, inputs, is_training=None): - model = SpineNet( - input_specs=self._input_specs, - min_level=self._min_level, - max_level=self._max_level, - block_specs=self._block_specs, - endpoints_num_filters=self._endpoints_num_filters, - resample_alpha=self._resample_alpha, - block_repeats=self._block_repeats, - filter_size_scale=self._filter_size_scale, - kernel_initializer=self._kernel_initializer, - kernel_regularizer=self._kernel_regularizer, - bias_regularizer=self._bias_regularizer, - activation=self._activation, - use_sync_bn=self._use_sync_bn, - norm_momentum=self._norm_momentum, - norm_epsilon=self._norm_epsilon) - return model(inputs) diff --git a/official/vision/detection/modeling/base_model.py b/official/vision/detection/modeling/base_model.py deleted file mode 100644 index 0558e1db5530f3b85dd8ce9acf5d4c3b23146bc6..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/base_model.py +++ /dev/null @@ -1,136 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Base Model definition.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import abc -import functools -import re - -import tensorflow as tf -from official.vision.detection.modeling import checkpoint_utils -from official.vision.detection.modeling import learning_rates -from official.vision.detection.modeling import optimizers - - -def _make_filter_trainable_variables_fn(frozen_variable_prefix): - """Creates a function for filtering trainable varialbes.""" - - def _filter_trainable_variables(variables): - """Filters trainable varialbes. - - Args: - variables: a list of tf.Variable to be filtered. - - Returns: - filtered_variables: a list of tf.Variable filtered out the frozen ones. - """ - # frozen_variable_prefix: a regex string specifing the prefix pattern of - # the frozen variables' names. - filtered_variables = [ - v for v in variables if not frozen_variable_prefix or - not re.match(frozen_variable_prefix, v.name) - ] - return filtered_variables - - return _filter_trainable_variables - - -class Model(object): - """Base class for model function.""" - - __metaclass__ = abc.ABCMeta - - def __init__(self, params): - self._use_bfloat16 = params.architecture.use_bfloat16 - - if params.architecture.use_bfloat16: - tf.compat.v2.keras.mixed_precision.set_global_policy('mixed_bfloat16') - - # Optimization. - self._optimizer_fn = optimizers.OptimizerFactory(params.train.optimizer) - self._learning_rate = learning_rates.learning_rate_generator( - params.train.total_steps, params.train.learning_rate) - - self._frozen_variable_prefix = params.train.frozen_variable_prefix - self._regularization_var_regex = params.train.regularization_variable_regex - self._l2_weight_decay = params.train.l2_weight_decay - - # Checkpoint restoration. 
- self._checkpoint = params.train.checkpoint.as_dict() - - # Summary. - self._enable_summary = params.enable_summary - self._model_dir = params.model_dir - - @abc.abstractmethod - def build_outputs(self, inputs, mode): - """Build the graph of the forward path.""" - pass - - @abc.abstractmethod - def build_model(self, params, mode): - """Build the model object.""" - pass - - @abc.abstractmethod - def build_loss_fn(self): - """Build the model object.""" - pass - - def post_processing(self, labels, outputs): - """Post-processing function.""" - return labels, outputs - - def model_outputs(self, inputs, mode): - """Build the model outputs.""" - return self.build_outputs(inputs, mode) - - def build_optimizer(self): - """Returns train_op to optimize total loss.""" - # Sets up the optimizer. - return self._optimizer_fn(self._learning_rate) - - def make_filter_trainable_variables_fn(self): - """Creates a function for filtering trainable varialbes.""" - return _make_filter_trainable_variables_fn(self._frozen_variable_prefix) - - def weight_decay_loss(self, trainable_variables): - reg_variables = [ - v for v in trainable_variables - if self._regularization_var_regex is None or - re.match(self._regularization_var_regex, v.name) - ] - - return self._l2_weight_decay * tf.add_n( - [tf.nn.l2_loss(v) for v in reg_variables]) - - def make_restore_checkpoint_fn(self): - """Returns scaffold function to restore parameters from v1 checkpoint.""" - if 'skip_checkpoint_variables' in self._checkpoint: - skip_regex = self._checkpoint['skip_checkpoint_variables'] - else: - skip_regex = None - return checkpoint_utils.make_restore_checkpoint_fn( - self._checkpoint['path'], - prefix=self._checkpoint['prefix'], - skip_regex=skip_regex) - - def eval_metrics(self): - """Returns tuple of metric function and its inputs for evaluation.""" - raise NotImplementedError('Unimplemented eval_metrics') diff --git a/official/vision/detection/modeling/checkpoint_utils.py 
b/official/vision/detection/modeling/checkpoint_utils.py deleted file mode 100644 index fc0c09b7fcfee32b84db139b60880c018503d84a..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/checkpoint_utils.py +++ /dev/null @@ -1,137 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Util functions for loading checkpoints. - -Especially for loading Tensorflow 1.x -checkpoint to Tensorflow 2.x (keras) model. -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import re - -from absl import logging - -import tensorflow as tf - - -def _build_assignment_map(keras_model, - prefix='', - skip_variables_regex=None, - var_to_shape_map=None): - """Compute an assignment mapping for loading older checkpoints into a Keras - - model. Variable names are remapped from the original TPUEstimator model to - the new Keras name. - - Args: - keras_model: tf.keras.Model object to provide variables to assign. - prefix: prefix in the variable name to be remove for alignment with names in - the checkpoint. - skip_variables_regex: regular expression to math the names of variables that - do not need to be assign. - var_to_shape_map: variable name to shape mapping from the checkpoint. - - Returns: - The variable assignment map. 
- """ - assignment_map = {} - - checkpoint_names = [] - if var_to_shape_map: - checkpoint_names = list( - filter( - lambda x: not x.endswith('Momentum') and not x.endswith( - 'global_step'), var_to_shape_map.keys())) - - logging.info('Number of variables in the checkpoint %d', - len(checkpoint_names)) - - for var in keras_model.variables: - var_name = var.name - - if skip_variables_regex and re.match(skip_variables_regex, var_name): - continue - # Trim the index of the variable. - if ':' in var_name: - var_name = var_name[:var_name.rindex(':')] - if var_name.startswith(prefix): - var_name = var_name[len(prefix):] - - if not var_to_shape_map: - assignment_map[var_name] = var - continue - - # Match name with variables in the checkpoint. - match_names = list(filter(lambda x: x.endswith(var_name), checkpoint_names)) - try: - if match_names: - assert len(match_names) == 1, 'more then on matches for {}: {}'.format( - var_name, match_names) - checkpoint_names.remove(match_names[0]) - assignment_map[match_names[0]] = var - else: - logging.info('Error not found var name: %s', var_name) - except Exception as e: - logging.info('Error removing the match_name: %s', match_names) - logging.info('Exception: %s', e) - raise - logging.info('Found matching variable in checkpoint: %d', len(assignment_map)) - return assignment_map - - -def _get_checkpoint_map(checkpoint_path): - reader = tf.train.load_checkpoint(checkpoint_path) - return reader.get_variable_to_shape_map() - - -def make_restore_checkpoint_fn(checkpoint_path, prefix='', skip_regex=None): - """Returns scaffold function to restore parameters from v1 checkpoint. - - Args: - checkpoint_path: path of the checkpoint folder or file. - Example 1: '/path/to/model_dir/' - Example 2: '/path/to/model.ckpt-22500' - prefix: prefix in the variable name to be remove for alignment with names in - the checkpoint. - skip_regex: regular expression to math the names of variables that do not - need to be assign. 
- - Returns: - Callable[tf.kears.Model] -> void. Fn to load v1 checkpoint to keras model. - """ - - def _restore_checkpoint_fn(keras_model): - """Loads pretrained model through scaffold function.""" - if not checkpoint_path: - logging.info('checkpoint_path is empty') - return - var_prefix = prefix - if prefix and not prefix.endswith('/'): - var_prefix += '/' - var_to_shape_map = _get_checkpoint_map(checkpoint_path) - assert var_to_shape_map, 'var_to_shape_map should not be empty' - vars_to_load = _build_assignment_map( - keras_model, - prefix=var_prefix, - skip_variables_regex=skip_regex, - var_to_shape_map=var_to_shape_map) - if not vars_to_load: - raise ValueError('Variables to load is empty.') - tf.compat.v1.train.init_from_checkpoint(checkpoint_path, vars_to_load) - - return _restore_checkpoint_fn diff --git a/official/vision/detection/modeling/factory.py b/official/vision/detection/modeling/factory.py deleted file mode 100644 index c1393bcce047bce63cfae6cd9c963b783816d355..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/factory.py +++ /dev/null @@ -1,37 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Factory to build detection model.""" - - -from official.vision.detection.modeling import maskrcnn_model -from official.vision.detection.modeling import olnmask_model -from official.vision.detection.modeling import retinanet_model -from official.vision.detection.modeling import shapemask_model - - -def model_generator(params): - """Model function generator.""" - if params.type == 'retinanet': - model_fn = retinanet_model.RetinanetModel(params) - elif params.type == 'mask_rcnn': - model_fn = maskrcnn_model.MaskrcnnModel(params) - elif params.type == 'olnmask': - model_fn = olnmask_model.OlnMaskModel(params) - elif params.type == 'shapemask': - model_fn = shapemask_model.ShapeMaskModel(params) - else: - raise ValueError('Model %s is not supported.'% params.type) - - return model_fn diff --git a/official/vision/detection/modeling/learning_rates.py b/official/vision/detection/modeling/learning_rates.py deleted file mode 100644 index 7c1cc147942af63064ae174baeeb0d5ead3a5d3e..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/learning_rates.py +++ /dev/null @@ -1,100 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Learning rate schedule.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import functools - -import numpy as np -import tensorflow as tf -from official.modeling.hyperparams import params_dict - - -class StepLearningRateWithLinearWarmup( - tf.keras.optimizers.schedules.LearningRateSchedule): - """Class to generate learning rate tensor.""" - - def __init__(self, total_steps, params): - """Creates the step learning rate tensor with linear warmup.""" - super(StepLearningRateWithLinearWarmup, self).__init__() - self._total_steps = total_steps - assert isinstance(params, (dict, params_dict.ParamsDict)) - if isinstance(params, dict): - params = params_dict.ParamsDict(params) - self._params = params - - def __call__(self, global_step): - warmup_lr = self._params.warmup_learning_rate - warmup_steps = self._params.warmup_steps - init_lr = self._params.init_learning_rate - lr_levels = self._params.learning_rate_levels - lr_steps = self._params.learning_rate_steps - linear_warmup = ( - warmup_lr + tf.cast(global_step, dtype=tf.float32) / warmup_steps * - (init_lr - warmup_lr)) - learning_rate = tf.where(global_step < warmup_steps, linear_warmup, init_lr) - - for next_learning_rate, start_step in zip(lr_levels, lr_steps): - learning_rate = tf.where(global_step >= start_step, next_learning_rate, - learning_rate) - return learning_rate - - def get_config(self): - return {'_params': self._params.as_dict()} - - -class CosineLearningRateWithLinearWarmup( - tf.keras.optimizers.schedules.LearningRateSchedule): - """Class to generate learning rate tensor.""" - - def __init__(self, total_steps, params): - """Creates the consine learning rate tensor with linear warmup.""" - super(CosineLearningRateWithLinearWarmup, self).__init__() - self._total_steps = total_steps - assert isinstance(params, (dict, params_dict.ParamsDict)) - if isinstance(params, dict): - params = params_dict.ParamsDict(params) - self._params = 
params - - def __call__(self, global_step): - global_step = tf.cast(global_step, dtype=tf.float32) - warmup_lr = self._params.warmup_learning_rate - warmup_steps = self._params.warmup_steps - init_lr = self._params.init_learning_rate - total_steps = self._total_steps - linear_warmup = ( - warmup_lr + global_step / warmup_steps * (init_lr - warmup_lr)) - cosine_learning_rate = ( - init_lr * (tf.cos(np.pi * (global_step - warmup_steps) / - (total_steps - warmup_steps)) + 1.0) / 2.0) - learning_rate = tf.where(global_step < warmup_steps, linear_warmup, - cosine_learning_rate) - return learning_rate - - def get_config(self): - return {'_params': self._params.as_dict()} - - -def learning_rate_generator(total_steps, params): - """The learning rate function generator.""" - if params.type == 'step': - return StepLearningRateWithLinearWarmup(total_steps, params) - elif params.type == 'cosine': - return CosineLearningRateWithLinearWarmup(total_steps, params) - else: - raise ValueError('Unsupported learning rate type: {}.'.format(params.type)) diff --git a/official/vision/detection/modeling/losses.py b/official/vision/detection/modeling/losses.py deleted file mode 100644 index 02e2632ae60c9da49f58c1239964d2f1104b52f8..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/losses.py +++ /dev/null @@ -1,725 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Losses used for detection models.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from absl import logging -import tensorflow as tf - - -def focal_loss(logits, targets, alpha, gamma, normalizer): - """Compute the focal loss between `logits` and the golden `target` values. - - Focal loss = -(1-pt)^gamma * log(pt) - where pt is the probability of being classified to the true class. - - Args: - logits: A float32 tensor of size - [batch, height_in, width_in, num_predictions]. - targets: A float32 tensor of size - [batch, height_in, width_in, num_predictions]. - alpha: A float32 scalar multiplying alpha to the loss from positive examples - and (1-alpha) to the loss from negative examples. - gamma: A float32 scalar modulating loss from hard and easy examples. - normalizer: A float32 scalar normalizes the total loss from all examples. - - Returns: - loss: A float32 Tensor of size [batch, height_in, width_in, num_predictions] - representing normalized loss on the prediction map. - """ - with tf.name_scope('focal_loss'): - positive_label_mask = tf.math.equal(targets, 1.0) - cross_entropy = ( - tf.nn.sigmoid_cross_entropy_with_logits(labels=targets, logits=logits)) - # Below are comments/derivations for computing modulator. - # For brevity, let x = logits, z = targets, r = gamma, and p_t = sigmod(x) - # for positive samples and 1 - sigmoid(x) for negative examples. - # - # The modulator, defined as (1 - P_t)^r, is a critical part in focal loss - # computation. For r > 0, it puts more weights on hard examples, and less - # weights on easier ones. However if it is directly computed as (1 - P_t)^r, - # its back-propagation is not stable when r < 1. The implementation here - # resolves the issue. 
- # - # For positive samples (labels being 1), - # (1 - p_t)^r - # = (1 - sigmoid(x))^r - # = (1 - (1 / (1 + exp(-x))))^r - # = (exp(-x) / (1 + exp(-x)))^r - # = exp(log((exp(-x) / (1 + exp(-x)))^r)) - # = exp(r * log(exp(-x)) - r * log(1 + exp(-x))) - # = exp(- r * x - r * log(1 + exp(-x))) - # - # For negative samples (labels being 0), - # (1 - p_t)^r - # = (sigmoid(x))^r - # = (1 / (1 + exp(-x)))^r - # = exp(log((1 / (1 + exp(-x)))^r)) - # = exp(-r * log(1 + exp(-x))) - # - # Therefore one unified form for positive (z = 1) and negative (z = 0) - # samples is: - # (1 - p_t)^r = exp(-r * z * x - r * log(1 + exp(-x))). - neg_logits = -1.0 * logits - modulator = tf.math.exp(gamma * targets * neg_logits - - gamma * tf.math.log1p(tf.math.exp(neg_logits))) - loss = modulator * cross_entropy - weighted_loss = tf.where(positive_label_mask, alpha * loss, - (1.0 - alpha) * loss) - weighted_loss /= normalizer - return weighted_loss - - -class RpnScoreLoss(object): - """Region Proposal Network score loss function.""" - - def __init__(self, params): - self._rpn_batch_size_per_im = params.rpn_batch_size_per_im - self._binary_crossentropy = tf.keras.losses.BinaryCrossentropy( - reduction=tf.keras.losses.Reduction.SUM, from_logits=True) - - def __call__(self, score_outputs, labels): - """Computes total RPN detection loss. - - Computes total RPN detection loss including box and score from all levels. - - Args: - score_outputs: an OrderDict with keys representing levels and values - representing scores in [batch_size, height, width, num_anchors]. - labels: the dictionary that returned from dataloader that includes - groundturth targets. - - Returns: - rpn_score_loss: a scalar tensor representing total score loss. 
- """ - with tf.name_scope('rpn_loss'): - levels = sorted(score_outputs.keys()) - - score_losses = [] - for level in levels: - score_losses.append( - self._rpn_score_loss( - score_outputs[level], - labels[level], - normalizer=tf.cast( - tf.shape(score_outputs[level])[0] * - self._rpn_batch_size_per_im, dtype=tf.float32))) - - # Sums per level losses to total loss. - return tf.math.add_n(score_losses) - - def _rpn_score_loss(self, score_outputs, score_targets, normalizer=1.0): - """Computes score loss.""" - # score_targets has three values: - # (1) score_targets[i]=1, the anchor is a positive sample. - # (2) score_targets[i]=0, negative. - # (3) score_targets[i]=-1, the anchor is ignored (don't care). - with tf.name_scope('rpn_score_loss'): - mask = tf.math.logical_or(tf.math.equal(score_targets, 1), - tf.math.equal(score_targets, 0)) - - score_targets = tf.math.maximum(score_targets, - tf.zeros_like(score_targets)) - - score_targets = tf.expand_dims(score_targets, axis=-1) - score_outputs = tf.expand_dims(score_outputs, axis=-1) - score_loss = self._binary_crossentropy( - score_targets, score_outputs, sample_weight=mask) - - score_loss /= normalizer - return score_loss - - -class RpnBoxLoss(object): - """Region Proposal Network box regression loss function.""" - - def __init__(self, params): - logging.info('RpnBoxLoss huber_loss_delta %s', params.huber_loss_delta) - # The delta is typically around the mean value of regression target. - # For instance, the regression targets of 512x512 input with 6 anchors on - # P2-P6 pyramid are about [0.1, 0.1, 0.2, 0.2]. - self._huber_loss = tf.keras.losses.Huber( - delta=params.huber_loss_delta, reduction=tf.keras.losses.Reduction.SUM) - - def __call__(self, box_outputs, labels): - """Computes total RPN box regression loss. - - Computes the box regression loss at each level and sums them.
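The masking in `_rpn_score_loss` (targets of 1 and 0 contribute, -1 is ignored and clamped to 0 before the cross entropy) can be sketched on a flat list of anchors. This is a plain-Python illustration under the assumption that one pyramid level has been flattened to 1-D lists; `rpn_score_loss_level` and `bce_with_logits` are hypothetical names. The normalizer mirrors `__call__`: batch size times `rpn_batch_size_per_im`.

```python
import math


def bce_with_logits(target: float, logit: float) -> float:
    """Numerically stable binary cross entropy on a logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def rpn_score_loss_level(logits, targets, batch_size, rpn_batch_size_per_im):
    """Masked RPN score loss for one pyramid level, on flattened anchors.

    targets follow the encoding in _rpn_score_loss:
    1 = positive anchor, 0 = negative anchor, -1 = ignored anchor.
    """
    total = 0.0
    for logit, target in zip(logits, targets):
        weight = 1.0 if target in (0, 1) else 0.0  # ignored anchors get weight 0
        clamped = float(max(target, 0))            # -1 -> 0 so the BCE is defined
        total += weight * bce_with_logits(clamped, logit)
    # SUM reduction, then divide by the number of sampled anchors in the batch.
    return total / (batch_size * rpn_batch_size_per_im)
```

Because ignored anchors carry zero weight, their logits have no effect on the loss, which is what the `sample_weight=mask` argument achieves in the TensorFlow version.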
- - Args: - box_outputs: an OrderedDict with keys representing levels and values - representing box regression outputs in - [batch_size, height, width, num_anchors * 4]. - labels: the dictionary returned from the dataloader that includes - ground-truth targets. - - Returns: - rpn_box_loss: a scalar tensor representing total box regression loss. - """ - with tf.name_scope('rpn_loss'): - levels = sorted(box_outputs.keys()) - - box_losses = [] - for level in levels: - box_losses.append(self._rpn_box_loss(box_outputs[level], labels[level])) - - # Sum per level losses to total loss. - return tf.add_n(box_losses) - - def _rpn_box_loss(self, box_outputs, box_targets, normalizer=1.0): - """Computes box regression loss.""" - with tf.name_scope('rpn_box_loss'): - mask = tf.cast(tf.not_equal(box_targets, 0.0), dtype=tf.float32) - box_targets = tf.expand_dims(box_targets, axis=-1) - box_outputs = tf.expand_dims(box_outputs, axis=-1) - box_loss = self._huber_loss(box_targets, box_outputs, sample_weight=mask) - # The loss is normalized by the sum of non-zero weights and an additional - # normalizer provided by the function caller; 0.01 is added to avoid - # division by zero. - box_loss /= normalizer * (tf.reduce_sum(mask) + 0.01) - return box_loss - - -class OlnRpnCenterLoss(object): - """Object Localization Network RPN centerness regression loss function.""" - - def __init__(self): - self._l1_loss = tf.keras.losses.MeanAbsoluteError( - reduction=tf.keras.losses.Reduction.SUM) - - def __call__(self, center_outputs, labels): - """Computes total RPN centerness regression loss. - - Computes total RPN centerness score regression loss from all levels. - - Args: - center_outputs: an OrderedDict with keys representing levels and values - representing anchor centerness regression outputs in - [batch_size, height, width, num_anchors * 4]. - labels: the dictionary returned from the dataloader that includes - ground-truth targets.
- - Returns: - rpn_center_loss: a scalar tensor representing total centerness regression - loss. - """ - with tf.name_scope('rpn_loss'): - # Normalizer. - levels = sorted(center_outputs.keys()) - num_valid = 0 - # 00, neg=0, ign=-1. - mask_ = tf.cast(tf.logical_and( - tf.greater(center_targets[level][..., 0], 0.0), - tf.greater(tf.reduce_min(labels[level], -1), 0.0)), tf.float32) - normalizer += tf.reduce_sum(mask_) - normalizer += 1e-8 - # iou_loss over multi levels. - iou_losses = [] - for level in levels: - iou_losses.append( - self._rpn_iou_loss( - box_outputs[level], labels[level], - center_weight=center_targets[level][..., 0], - normalizer=normalizer)) - # Sum per level losses to total loss. - return tf.add_n(iou_losses) - - def _rpn_iou_loss(self, box_outputs, box_targets, - center_weight=None, normalizer=1.0): - """Computes box regression loss.""" - # for instances, the regression targets of 512x512 input with 6 anchors on - # P2-P6 pyramid is about [0.1, 0.1, 0.2, 0.2]. - with tf.name_scope('rpn_iou_loss'): - mask = tf.logical_and( - tf.greater(center_weight, 0.0), - tf.greater(tf.reduce_min(box_targets, -1), 0.0)) - - pred_left = box_outputs[..., 0] - pred_right = box_outputs[..., 1] - pred_top = box_outputs[..., 2] - pred_bottom = box_outputs[..., 3] - - gt_left = box_targets[..., 0] - gt_right = box_targets[..., 1] - gt_top = box_targets[..., 2] - gt_bottom = box_targets[..., 3] - - inter_width = (tf.minimum(pred_left, gt_left) + - tf.minimum(pred_right, gt_right)) - inter_height = (tf.minimum(pred_top, gt_top) + - tf.minimum(pred_bottom, gt_bottom)) - inter_area = inter_width * inter_height - union_area = ((pred_left + pred_right) * (pred_top + pred_bottom) + - (gt_left + gt_right) * (gt_top + gt_bottom) - - inter_area) - iou = inter_area / (union_area + 1e-8) - mask_ = tf.cast(mask, tf.float32) - iou = tf.clip_by_value(iou, clip_value_min=1e-8, clip_value_max=1.0) - neg_log_iou = -tf.math.log(iou) - iou_loss = tf.reduce_sum(neg_log_iou * mask_) - 
iou_loss /= normalizer - return iou_loss - - -class FastrcnnClassLoss(object): - """Fast R-CNN classification loss function.""" - - def __init__(self): - self._categorical_crossentropy = tf.keras.losses.CategoricalCrossentropy( - reduction=tf.keras.losses.Reduction.SUM, from_logits=True) - - def __call__(self, class_outputs, class_targets): - """Computes the class loss (Fast-RCNN branch) of Mask-RCNN. - - This function implements the classification loss of the Fast-RCNN. - - The classification loss is softmax on all RoIs. - Reference: https://github.com/facebookresearch/Detectron/blob/master/detectron/modeling/fast_rcnn_heads.py # pylint: disable=line-too-long - - Args: - class_outputs: a float tensor representing the class prediction for each box - with a shape of [batch_size, num_boxes, num_classes]. - class_targets: a float tensor representing the class label for each box - with a shape of [batch_size, num_boxes]. - - Returns: - a scalar tensor representing total class loss. - """ - with tf.name_scope('fast_rcnn_loss'): - batch_size, num_boxes, num_classes = class_outputs.get_shape().as_list() - class_targets = tf.cast(class_targets, dtype=tf.int32) - class_targets_one_hot = tf.one_hot(class_targets, num_classes) - return self._fast_rcnn_class_loss(class_outputs, class_targets_one_hot, - normalizer=batch_size * num_boxes / 2.0) - - def _fast_rcnn_class_loss(self, class_outputs, class_targets_one_hot, - normalizer): - """Computes classification loss.""" - with tf.name_scope('fast_rcnn_class_loss'): - class_loss = self._categorical_crossentropy(class_targets_one_hot, - class_outputs) - - class_loss /= normalizer - return class_loss - - -class FastrcnnBoxLoss(object): - """Fast R-CNN box regression loss function.""" - - def __init__(self, params): - logging.info('FastrcnnBoxLoss huber_loss_delta %s', params.huber_loss_delta) - # The delta is typically around the mean value of regression target. 
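`_rpn_iou_loss` computes IoU directly from (left, right, top, bottom) distances measured from the anchor point, which is exact when that point lies inside both boxes; the mask `tf.reduce_min(box_targets, -1) > 0` keeps only such anchors. The plain-Python sketch below reproduces the same arithmetic for one anchor. The function names are hypothetical, and the `1e-8` constants are copied from the code above.

```python
import math


def ltrb_iou(pred, target, eps=1e-8):
    """IoU of two boxes given as (left, right, top, bottom) distances from a shared point."""
    p_l, p_r, p_t, p_b = pred
    g_l, g_r, g_t, g_b = target
    # Both boxes contain the point, so the overlap along each axis is the
    # sum of the smaller distances on either side.
    inter_w = min(p_l, g_l) + min(p_r, g_r)
    inter_h = min(p_t, g_t) + min(p_b, g_b)
    inter = inter_w * inter_h
    union = (p_l + p_r) * (p_t + p_b) + (g_l + g_r) * (g_t + g_b) - inter
    return inter / (union + eps)


def neg_log_iou_loss(pred, target):
    """-log(IoU) with IoU clipped to [1e-8, 1], mirroring _rpn_iou_loss."""
    iou = min(max(ltrb_iou(pred, target), 1e-8), 1.0)
    return -math.log(iou)
```

For example, a prediction of (1, 1, 1, 1) against a target of (2, 2, 2, 2) has an intersection of 4 and a union of 16, so the IoU is 0.25 and the loss is log 4.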
- # For instance, the regression targets of 512x512 input with 6 anchors on - # P2-P6 pyramid are about [0.1, 0.1, 0.2, 0.2]. - self._huber_loss = tf.keras.losses.Huber( - delta=params.huber_loss_delta, reduction=tf.keras.losses.Reduction.SUM) - - def __call__(self, box_outputs, class_targets, box_targets): - """Computes the box loss (Fast-RCNN branch) of Mask-RCNN. - - This function implements the box regression loss of the Fast-RCNN. As - `box_outputs` contains `num_classes` boxes for each RoI, the reference model - expands `box_targets` to match the shape of `box_outputs` and selects only - the target with which the RoI has the maximum overlap. (Reference: https://github.com/facebookresearch/Detectron/blob/master/detectron/roi_data/fast_rcnn.py) # pylint: disable=line-too-long - Instead, this function selects the `box_outputs` by the `class_targets` so - that it doesn't expand `box_targets`. - - The box loss is smooth L1-loss on only positive samples of RoIs. - Reference: https://github.com/facebookresearch/Detectron/blob/master/detectron/modeling/fast_rcnn_heads.py # pylint: disable=line-too-long - - Args: - box_outputs: a float tensor representing the box prediction for each box - with a shape of [batch_size, num_boxes, num_classes * 4]. - class_targets: a float tensor representing the class label for each box - with a shape of [batch_size, num_boxes]. - box_targets: a float tensor representing the box label for each box - with a shape of [batch_size, num_boxes, 4]. - - Returns: - box_loss: a scalar tensor representing total box regression loss. - """ - with tf.name_scope('fast_rcnn_loss'): - class_targets = tf.cast(class_targets, dtype=tf.int32) - - # Selects the box from `box_outputs` based on `class_targets`, with which - # the box has the maximum overlap.
- (batch_size, num_rois, - num_class_specific_boxes) = box_outputs.get_shape().as_list() - num_classes = num_class_specific_boxes // 4 - box_outputs = tf.reshape(box_outputs, - [batch_size, num_rois, num_classes, 4]) - - box_indices = tf.reshape( - class_targets + tf.tile( - tf.expand_dims( - tf.range(batch_size) * num_rois * num_classes, 1), - [1, num_rois]) + tf.tile( - tf.expand_dims(tf.range(num_rois) * num_classes, 0), - [batch_size, 1]), [-1]) - - box_outputs = tf.matmul( - tf.one_hot( - box_indices, - batch_size * num_rois * num_classes, - dtype=box_outputs.dtype), tf.reshape(box_outputs, [-1, 4])) - box_outputs = tf.reshape(box_outputs, [batch_size, -1, 4]) - - return self._fast_rcnn_box_loss(box_outputs, box_targets, class_targets) - - def _fast_rcnn_box_loss(self, box_outputs, box_targets, class_targets, - normalizer=1.0): - """Computes box regression loss.""" - with tf.name_scope('fast_rcnn_box_loss'): - mask = tf.tile(tf.expand_dims(tf.greater(class_targets, 0), axis=2), - [1, 1, 4]) - mask = tf.cast(mask, dtype=tf.float32) - box_targets = tf.expand_dims(box_targets, axis=-1) - box_outputs = tf.expand_dims(box_outputs, axis=-1) - box_loss = self._huber_loss(box_targets, box_outputs, sample_weight=mask) - # The loss is normalized by the number of ones in mask and by an - # additional normalizer provided by the user; 0.01 is added to avoid - # division by 0. - box_loss /= normalizer * (tf.reduce_sum(mask) + 0.01) - return box_loss - - -class OlnBoxScoreLoss(object): - """Object Localization Network Box-Iou scoring function.""" - - def __init__(self, params): - self._ignore_threshold = params.ignore_threshold - self._l1_loss = tf.keras.losses.MeanAbsoluteError( - reduction=tf.keras.losses.Reduction.SUM) - - def __call__(self, score_outputs, score_targets): - """Computes the box-IoU score loss (Fast-RCNN branch) of OLN. - - This function implements an L1 loss between the sigmoid of the predicted - box scores and the target scores. - - RoIs whose target is not above `ignore_threshold` are masked out, and the - loss is normalized by the number of remaining RoIs. - Reference: https://github.com/facebookresearch/Detectron/blob/master/detectron/modeling/fast_rcnn_heads.py # pylint: disable=line-too-long - - Args: - score_outputs: a float tensor representing the score logit for each box - with a shape of [batch_size, num_boxes, 1]. - score_targets: a float tensor representing the target score (box IoU) for - each box with a shape of [batch_size, num_boxes]. - - Returns: - a scalar tensor representing total score loss. - """ - with tf.name_scope('fast_rcnn_loss'): - score_outputs = tf.squeeze(score_outputs, -1) - - mask = tf.greater(score_targets, self._ignore_threshold) - num_valid = tf.reduce_sum(tf.cast(mask, tf.float32)) - score_targets = tf.maximum(score_targets, tf.zeros_like(score_targets)) - score_outputs = tf.sigmoid(score_outputs) - score_targets = tf.expand_dims(score_targets, -1) - score_outputs = tf.expand_dims(score_outputs, -1) - mask = tf.cast(mask, dtype=tf.float32) - score_loss = self._l1_loss(score_targets, score_outputs, - sample_weight=mask) - score_loss /= (num_valid + 1e-10) - return score_loss - - -class MaskrcnnLoss(object): - """Mask R-CNN instance segmentation mask loss function.""" - - def __init__(self): - self._binary_crossentropy = tf.keras.losses.BinaryCrossentropy( - reduction=tf.keras.losses.Reduction.SUM, from_logits=True) - - def __call__(self, mask_outputs, mask_targets, select_class_targets): - """Computes the mask loss of Mask-RCNN. - - This function implements the mask loss of Mask-RCNN. As `mask_outputs` - contains `num_classes` masks for each RoI, the reference model expands - `mask_targets` to match the shape of `mask_outputs` and selects only the - target with which the RoI has the maximum overlap. (Reference: https://github.com/facebookresearch/Detectron/blob/master/detectron/roi_data/mask_rcnn.py) # pylint: disable=line-too-long - Instead, this implementation selects the `mask_outputs` by the `class_targets` - so that it doesn't expand `mask_targets`.
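`FastrcnnBoxLoss.__call__` selects each RoI's class-specific box by building a flat row index `class + b * num_rois * num_classes + r * num_classes` and multiplying a one-hot matrix by the `[-1, 4]` reshape of `box_outputs`. The plain-Python sketch below performs the same selection on nested lists, including the one-hot product, so you can check that it is equivalent to direct indexing. `flat_index` and `select_class_boxes` are hypothetical names used only for illustration.

```python
def flat_index(b, r, c, num_rois, num_classes):
    """Row index of box (b, r, class c) after reshaping [B, R, C, 4] to [-1, 4]."""
    return b * num_rois * num_classes + r * num_classes + c


def select_class_boxes(box_outputs, class_targets):
    """Pick the 4 box values predicted for each RoI's target class.

    box_outputs[b][r] is a flat list of num_classes * 4 values and
    class_targets[b][r] is an int class id. Returns [b][r] -> 4 values.
    """
    batch_size, num_rois = len(box_outputs), len(box_outputs[0])
    num_classes = len(box_outputs[0][0]) // 4
    # Equivalent of tf.reshape(box_outputs, [-1, 4]).
    rows = [box_outputs[b][r][4 * c:4 * c + 4]
            for b in range(batch_size)
            for r in range(num_rois)
            for c in range(num_classes)]
    selected = []
    for b in range(batch_size):
        per_image = []
        for r in range(num_rois):
            idx = flat_index(b, r, class_targets[b][r], num_rois, num_classes)
            # one_hot(idx) @ rows: every row is multiplied by 0 except rows[idx].
            one_hot = [1.0 if i == idx else 0.0 for i in range(len(rows))]
            per_image.append(
                [sum(w * row[k] for w, row in zip(one_hot, rows)) for k in range(4)])
        selected.append(per_image)
    return selected
```

The one-hot product avoids an explicit gather. Mathematically it is the same as `rows[idx]`, at the cost of a dense `[B * R, B * R * C]` matrix.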
Note that the selection logic is - done in the post-processing of mask_rcnn_fn in mask_rcnn_architecture.py. - - Args: - mask_outputs: a float tensor representing the prediction for each mask, - with a shape of - [batch_size, num_masks, mask_height, mask_width]. - mask_targets: a float tensor representing the binary mask of ground truth - labels for each mask with a shape of - [batch_size, num_masks, mask_height, mask_width]. - select_class_targets: a tensor with a shape of [batch_size, num_masks], - representing the foreground mask targets. - - Returns: - mask_loss: a float tensor representing total mask loss. - """ - with tf.name_scope('mask_rcnn_loss'): - (batch_size, num_masks, mask_height, - mask_width) = mask_outputs.get_shape().as_list() - - weights = tf.tile( - tf.reshape(tf.greater(select_class_targets, 0), - [batch_size, num_masks, 1, 1]), - [1, 1, mask_height, mask_width]) - weights = tf.cast(weights, dtype=tf.float32) - - mask_targets = tf.expand_dims(mask_targets, axis=-1) - mask_outputs = tf.expand_dims(mask_outputs, axis=-1) - mask_loss = self._binary_crossentropy(mask_targets, mask_outputs, - sample_weight=weights) - - # The loss is normalized by the number of 1's in weights, and - # 0.01 is added to avoid division by zero. - return mask_loss / (tf.reduce_sum(weights) + 0.01) - - -class RetinanetClassLoss(object): - """RetinaNet class loss.""" - - def __init__(self, params, num_classes): - self._num_classes = num_classes - self._focal_loss_alpha = params.focal_loss_alpha - self._focal_loss_gamma = params.focal_loss_gamma - - def __call__(self, cls_outputs, labels, num_positives): - """Computes total RetinaNet class loss. - - Computes the class loss at each level and sums them. - - Args: - cls_outputs: an OrderedDict with keys representing levels and values - representing logits in [batch_size, height, width, - num_anchors * num_classes]. - labels: the dictionary returned from the dataloader that includes - class ground-truth targets.
- num_positives: number of positive examples in the minibatch. - - Returns: - a float tensor representing total class loss. - """ - # Sums all positives in a batch for normalization and avoids zero - # num_positives_sum, which would lead to inf loss during training. - num_positives_sum = tf.reduce_sum(input_tensor=num_positives) + 1.0 - - cls_losses = [] - for level in cls_outputs.keys(): - cls_losses.append(self.class_loss( - cls_outputs[level], labels[level], num_positives_sum)) - # Sums per level losses to total loss. - return tf.add_n(cls_losses) - - def class_loss(self, cls_outputs, cls_targets, num_positives, - ignore_label=-2): - """Computes RetinaNet classification loss.""" - # One-hot encoding for classification labels. - cls_targets_one_hot = tf.one_hot(cls_targets, self._num_classes) - bs, height, width, _, _ = cls_targets_one_hot.get_shape().as_list() - cls_targets_one_hot = tf.reshape(cls_targets_one_hot, - [bs, height, width, -1]) - loss = focal_loss(tf.cast(cls_outputs, dtype=tf.float32), - tf.cast(cls_targets_one_hot, dtype=tf.float32), - self._focal_loss_alpha, - self._focal_loss_gamma, - num_positives) - - ignore_loss = tf.where( - tf.equal(cls_targets, ignore_label), - tf.zeros_like(cls_targets, dtype=tf.float32), - tf.ones_like(cls_targets, dtype=tf.float32), - ) - ignore_loss = tf.expand_dims(ignore_loss, -1) - ignore_loss = tf.tile(ignore_loss, [1, 1, 1, 1, self._num_classes]) - ignore_loss = tf.reshape(ignore_loss, tf.shape(input=loss)) - return tf.reduce_sum(input_tensor=ignore_loss * loss) - - -class RetinanetBoxLoss(object): - """RetinaNet box loss.""" - - def __init__(self, params): - self._huber_loss = tf.keras.losses.Huber( - delta=params.huber_loss_delta, reduction=tf.keras.losses.Reduction.SUM) - - def __call__(self, box_outputs, labels, num_positives): - """Computes total RetinaNet box regression loss. - - Computes the box regression loss at each level and sums them.
- - Args: - box_outputs: an OrderedDict with keys representing levels and values - representing box regression outputs in [batch_size, height, width, - num_anchors * 4]. - labels: the dictionary returned from the dataloader that includes - box ground-truth targets. - num_positives: number of positive examples in the minibatch. - - Returns: - a float tensor representing total box regression loss. - """ - # Sums all positives in a batch for normalization and avoids zero - # num_positives_sum, which would lead to inf loss during training. - num_positives_sum = tf.reduce_sum(input_tensor=num_positives) + 1.0 - - box_losses = [] - for level in box_outputs.keys(): - box_targets_l = labels[level] - box_losses.append( - self.box_loss(box_outputs[level], box_targets_l, num_positives_sum)) - # Sums per level losses to total loss. - return tf.add_n(box_losses) - - def box_loss(self, box_outputs, box_targets, num_positives): - """Computes RetinaNet box regression loss.""" - # The delta is typically around the mean value of regression target. - # For instance, the regression targets of 512x512 input with 6 anchors on - # P3-P7 pyramid are about [0.1, 0.1, 0.2, 0.2]. - normalizer = num_positives * 4.0 - mask = tf.cast(tf.not_equal(box_targets, 0.0), dtype=tf.float32) - box_targets = tf.expand_dims(box_targets, axis=-1) - box_outputs = tf.expand_dims(box_outputs, axis=-1) - box_loss = self._huber_loss(box_targets, box_outputs, sample_weight=mask) - box_loss /= normalizer - return box_loss - - -class ShapemaskMseLoss(object): - """ShapeMask mask Mean Squared Error loss function wrapper.""" - - def __call__(self, probs, labels, valid_mask): - """Compute instance segmentation loss. - - Args: - probs: A Tensor of shape [batch_size * num_points, height, width, - num_classes]. The logits are not necessarily between 0 and 1.
- labels: A float32/float16 Tensor of shape [batch_size, num_instances, - mask_size, mask_size], where mask_size = - mask_crop_size * gt_upsample_scale for fine mask, or mask_crop_size - for coarse masks and shape priors. - valid_mask: a binary mask indicating valid training masks. - - Returns: - loss: a float tensor representing total mask MSE loss. - """ - with tf.name_scope('shapemask_prior_loss'): - batch_size, num_instances = valid_mask.get_shape().as_list()[:2] - diff = (tf.cast(labels, dtype=tf.float32) - - tf.cast(probs, dtype=tf.float32)) - diff *= tf.cast( - tf.reshape(valid_mask, [batch_size, num_instances, 1, 1]), - tf.float32) - # Adding 0.001 in the denominator to avoid division by zero. - loss = tf.nn.l2_loss(diff) / (tf.reduce_sum(labels) + 0.001) - return loss - - -class ShapemaskLoss(object): - """ShapeMask mask loss function wrapper.""" - - def __init__(self): - self._binary_crossentropy = tf.keras.losses.BinaryCrossentropy( - reduction=tf.keras.losses.Reduction.SUM, from_logits=True) - - def __call__(self, logits, labels, valid_mask): - """ShapeMask mask cross entropy loss function wrapper. - - Args: - logits: A Tensor of shape [batch_size * num_instances, height, width, - num_classes]. The logits are not necessarily between 0 and 1. - labels: A float16/float32 Tensor of shape [batch_size, num_instances, - mask_size, mask_size], where mask_size = - mask_crop_size * gt_upsample_scale for fine mask, or mask_crop_size - for coarse masks and shape priors. - valid_mask: a binary mask of shape [batch_size, num_instances] - indicating valid training masks. - Returns: - loss: a float tensor representing total mask classification loss.
- """ - with tf.name_scope('shapemask_loss'): - batch_size, num_instances = valid_mask.get_shape().as_list()[:2] - labels = tf.cast(labels, tf.float32) - logits = tf.cast(logits, tf.float32) - loss = self._binary_crossentropy(labels, logits) - loss *= tf.cast(tf.reshape( - valid_mask, [batch_size, num_instances, 1, 1]), loss.dtype) - # Adding 0.001 in the denominator to avoid division by zero. - loss = tf.reduce_sum(loss) / (tf.reduce_sum(labels) + 0.001) - return loss diff --git a/official/vision/detection/modeling/maskrcnn_model.py b/official/vision/detection/modeling/maskrcnn_model.py deleted file mode 100644 index e9e6bb2697d7f78d4d01c9dceb8e0997376aecab..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/maskrcnn_model.py +++ /dev/null @@ -1,338 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Model definition for the Mask R-CNN Model.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - -from official.vision.detection.dataloader import anchor -from official.vision.detection.dataloader import mode_keys -from official.vision.detection.evaluation import factory as eval_factory -from official.vision.detection.modeling import base_model -from official.vision.detection.modeling import losses -from official.vision.detection.modeling.architecture import factory -from official.vision.detection.ops import postprocess_ops -from official.vision.detection.ops import roi_ops -from official.vision.detection.ops import spatial_transform_ops -from official.vision.detection.ops import target_ops -from official.vision.detection.utils import box_utils - - -class MaskrcnnModel(base_model.Model): - """Mask R-CNN model function.""" - - def __init__(self, params): - super(MaskrcnnModel, self).__init__(params) - - # For eval metrics. - self._params = params - self._keras_model = None - - self._include_mask = params.architecture.include_mask - - # Architecture generators. - self._backbone_fn = factory.backbone_generator(params) - self._fpn_fn = factory.multilevel_features_generator(params) - self._rpn_head_fn = factory.rpn_head_generator(params) - self._generate_rois_fn = roi_ops.ROIGenerator(params.roi_proposal) - self._sample_rois_fn = target_ops.ROISampler(params.roi_sampling) - self._sample_masks_fn = target_ops.MaskSampler( - params.architecture.mask_target_size, - params.mask_sampling.num_mask_samples_per_image) - - self._frcnn_head_fn = factory.fast_rcnn_head_generator(params) - if self._include_mask: - self._mrcnn_head_fn = factory.mask_rcnn_head_generator(params) - - # Loss function.
- self._rpn_score_loss_fn = losses.RpnScoreLoss(params.rpn_score_loss) - self._rpn_box_loss_fn = losses.RpnBoxLoss(params.rpn_box_loss) - self._frcnn_class_loss_fn = losses.FastrcnnClassLoss() - self._frcnn_box_loss_fn = losses.FastrcnnBoxLoss(params.frcnn_box_loss) - if self._include_mask: - self._mask_loss_fn = losses.MaskrcnnLoss() - - self._generate_detections_fn = postprocess_ops.GenericDetectionGenerator( - params.postprocess) - - self._transpose_input = params.train.transpose_input - assert not self._transpose_input, 'Transpose input is not supported.' - - def build_outputs(self, inputs, mode): - is_training = mode == mode_keys.TRAIN - model_outputs = {} - - image = inputs['image'] - _, image_height, image_width, _ = image.get_shape().as_list() - backbone_features = self._backbone_fn(image, is_training) - fpn_features = self._fpn_fn(backbone_features, is_training) - - rpn_score_outputs, rpn_box_outputs = self._rpn_head_fn( - fpn_features, is_training) - model_outputs.update({ - 'rpn_score_outputs': - tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), - rpn_score_outputs), - 'rpn_box_outputs': - tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), - rpn_box_outputs), - }) - input_anchor = anchor.Anchor(self._params.architecture.min_level, - self._params.architecture.max_level, - self._params.anchor.num_scales, - self._params.anchor.aspect_ratios, - self._params.anchor.anchor_size, - (image_height, image_width)) - rpn_rois, _ = self._generate_rois_fn(rpn_box_outputs, rpn_score_outputs, - input_anchor.multilevel_boxes, - inputs['image_info'][:, 1, :], - is_training) - if is_training: - rpn_rois = tf.stop_gradient(rpn_rois) - - # Sample proposals. - rpn_rois, matched_gt_boxes, matched_gt_classes, matched_gt_indices = ( - self._sample_rois_fn(rpn_rois, inputs['gt_boxes'], - inputs['gt_classes'])) - - # Create bounding box training targets.
- box_targets = box_utils.encode_boxes( - matched_gt_boxes, rpn_rois, weights=[10.0, 10.0, 5.0, 5.0]) - # If the target is background, the box target is set to all 0s. - box_targets = tf.where( - tf.tile( - tf.expand_dims(tf.equal(matched_gt_classes, 0), axis=-1), - [1, 1, 4]), tf.zeros_like(box_targets), box_targets) - model_outputs.update({ - 'class_targets': matched_gt_classes, - 'box_targets': box_targets, - }) - - roi_features = spatial_transform_ops.multilevel_crop_and_resize( - fpn_features, rpn_rois, output_size=7) - - class_outputs, box_outputs = self._frcnn_head_fn(roi_features, is_training) - - model_outputs.update({ - 'class_outputs': - tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), - class_outputs), - 'box_outputs': - tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), - box_outputs), - }) - - # Add this output to train to make the checkpoint loadable in predict mode. - # If we skip it in train mode, the heads will be out-of-order and checkpoint - # loading will fail. 
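`build_outputs` encodes the matched ground-truth boxes against the sampled RoIs with `weights=[10.0, 10.0, 5.0, 5.0]` and zeroes the targets of background RoIs. The body of `box_utils.encode_boxes` is not part of this diff. The plain-Python sketch below therefore *assumes* the standard Faster R-CNN parameterization: boxes are `[ymin, xmin, ymax, xmax]`, center offsets are scaled by the RoI size, log size ratios are used, and each output is multiplied by the corresponding weight. `encode_box` and `box_training_target` are hypothetical names.

```python
import math


def encode_box(box, roi, weights=(10.0, 10.0, 5.0, 5.0)):
    """Encode a [ymin, xmin, ymax, xmax] box as weighted (dy, dx, dh, dw) w.r.t. a RoI.

    Assumes the standard Faster R-CNN box parameterization.
    """
    ymin, xmin, ymax, xmax = box
    r_ymin, r_xmin, r_ymax, r_xmax = roi
    h, w = ymax - ymin, xmax - xmin
    r_h, r_w = r_ymax - r_ymin, r_xmax - r_xmin
    yc, xc = ymin + 0.5 * h, xmin + 0.5 * w
    r_yc, r_xc = r_ymin + 0.5 * r_h, r_xmin + 0.5 * r_w
    return [
        weights[0] * (yc - r_yc) / r_h,
        weights[1] * (xc - r_xc) / r_w,
        weights[2] * math.log(h / r_h),
        weights[3] * math.log(w / r_w),
    ]


def box_training_target(gt_box, roi, gt_class):
    """Background RoIs (class 0) get an all-zero target, as in build_outputs."""
    if gt_class == 0:
        return [0.0, 0.0, 0.0, 0.0]
    return encode_box(gt_box, roi)
```

Under this assumption, the larger weights on the center offsets scale them up relative to the size terms. The all-zero background target matters downstream: `FastrcnnBoxLoss` only counts RoIs whose class target is greater than 0.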
- boxes, scores, classes, valid_detections = self._generate_detections_fn( - box_outputs, class_outputs, rpn_rois, inputs['image_info'][:, 1:2, :]) - model_outputs.update({ - 'num_detections': valid_detections, - 'detection_boxes': boxes, - 'detection_classes': classes, - 'detection_scores': scores, - }) - - if not self._include_mask: - return model_outputs - - if is_training: - rpn_rois, classes, mask_targets = self._sample_masks_fn( - rpn_rois, matched_gt_boxes, matched_gt_classes, matched_gt_indices, - inputs['gt_masks']) - mask_targets = tf.stop_gradient(mask_targets) - - classes = tf.cast(classes, dtype=tf.int32) - - model_outputs.update({ - 'mask_targets': mask_targets, - 'sampled_class_targets': classes, - }) - else: - rpn_rois = boxes - classes = tf.cast(classes, dtype=tf.int32) - - mask_roi_features = spatial_transform_ops.multilevel_crop_and_resize( - fpn_features, rpn_rois, output_size=14) - - mask_outputs = self._mrcnn_head_fn(mask_roi_features, classes, is_training) - - if is_training: - model_outputs.update({ - 'mask_outputs': - tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), - mask_outputs), - }) - else: - model_outputs.update({'detection_masks': tf.nn.sigmoid(mask_outputs)}) - - return model_outputs - - def build_loss_fn(self): - if self._keras_model is None: - raise ValueError('build_loss_fn() must be called after build_model().') - - filter_fn = self.make_filter_trainable_variables_fn() - trainable_variables = filter_fn(self._keras_model.trainable_variables) - - def _total_loss_fn(labels, outputs): - rpn_score_loss = self._rpn_score_loss_fn(outputs['rpn_score_outputs'], - labels['rpn_score_targets']) - rpn_box_loss = self._rpn_box_loss_fn(outputs['rpn_box_outputs'], - labels['rpn_box_targets']) - - frcnn_class_loss = self._frcnn_class_loss_fn(outputs['class_outputs'], - outputs['class_targets']) - frcnn_box_loss = self._frcnn_box_loss_fn(outputs['box_outputs'], - outputs['class_targets'], - outputs['box_targets']) - - if 
self._include_mask: - mask_loss = self._mask_loss_fn(outputs['mask_outputs'], - outputs['mask_targets'], - outputs['sampled_class_targets']) - else: - mask_loss = 0.0 - - model_loss = ( - rpn_score_loss + rpn_box_loss + frcnn_class_loss + frcnn_box_loss + - mask_loss) - - l2_regularization_loss = self.weight_decay_loss(trainable_variables) - total_loss = model_loss + l2_regularization_loss - return { - 'total_loss': total_loss, - 'loss': total_loss, - 'fast_rcnn_class_loss': frcnn_class_loss, - 'fast_rcnn_box_loss': frcnn_box_loss, - 'mask_loss': mask_loss, - 'model_loss': model_loss, - 'l2_regularization_loss': l2_regularization_loss, - 'rpn_score_loss': rpn_score_loss, - 'rpn_box_loss': rpn_box_loss, - } - - return _total_loss_fn - - def build_input_layers(self, params, mode): - is_training = mode == mode_keys.TRAIN - input_shape = ( - params.maskrcnn_parser.output_size + - [params.maskrcnn_parser.num_channels]) - if is_training: - batch_size = params.train.batch_size - input_layer = { - 'image': - tf.keras.layers.Input( - shape=input_shape, - batch_size=batch_size, - name='image', - dtype=tf.bfloat16 if self._use_bfloat16 else tf.float32), - 'image_info': - tf.keras.layers.Input( - shape=[4, 2], - batch_size=batch_size, - name='image_info', - ), - 'gt_boxes': - tf.keras.layers.Input( - shape=[params.maskrcnn_parser.max_num_instances, 4], - batch_size=batch_size, - name='gt_boxes'), - 'gt_classes': - tf.keras.layers.Input( - shape=[params.maskrcnn_parser.max_num_instances], - batch_size=batch_size, - name='gt_classes', - dtype=tf.int64), - } - if self._include_mask: - input_layer['gt_masks'] = tf.keras.layers.Input( - shape=[ - params.maskrcnn_parser.max_num_instances, - params.maskrcnn_parser.mask_crop_size, - params.maskrcnn_parser.mask_crop_size - ], - batch_size=batch_size, - name='gt_masks') - else: - batch_size = params.eval.batch_size - input_layer = { - 'image': - tf.keras.layers.Input( - shape=input_shape, - batch_size=batch_size, - name='image', - 
dtype=tf.bfloat16 if self._use_bfloat16 else tf.float32), - 'image_info': - tf.keras.layers.Input( - shape=[4, 2], - batch_size=batch_size, - name='image_info', - ), - } - return input_layer - - def build_model(self, params, mode): - if self._keras_model is None: - input_layers = self.build_input_layers(self._params, mode) - outputs = self.model_outputs(input_layers, mode) - - model = tf.keras.models.Model( - inputs=input_layers, outputs=outputs, name='maskrcnn') - assert model is not None, 'Failed to build tf.keras.Model.' - model.optimizer = self.build_optimizer() - self._keras_model = model - - return self._keras_model - - def post_processing(self, labels, outputs): - required_output_fields = ['class_outputs', 'box_outputs'] - for field in required_output_fields: - if field not in outputs: - raise ValueError('"%s" is missing in outputs, required %s, found %s' % - (field, required_output_fields, outputs.keys())) - predictions = { - 'image_info': labels['image_info'], - 'num_detections': outputs['num_detections'], - 'detection_boxes': outputs['detection_boxes'], - 'detection_classes': outputs['detection_classes'], - 'detection_scores': outputs['detection_scores'], - } - if self._include_mask: - predictions.update({ - 'detection_masks': outputs['detection_masks'], - }) - - if 'groundtruths' in labels: - predictions['source_id'] = labels['groundtruths']['source_id'] - predictions['gt_source_id'] = labels['groundtruths']['source_id'] - predictions['gt_height'] = labels['groundtruths']['height'] - predictions['gt_width'] = labels['groundtruths']['width'] - predictions['gt_image_info'] = labels['image_info'] - predictions['gt_num_detections'] = ( - labels['groundtruths']['num_detections']) - predictions['gt_boxes'] = labels['groundtruths']['boxes'] - predictions['gt_classes'] = labels['groundtruths']['classes'] - predictions['gt_areas'] = labels['groundtruths']['areas'] - predictions['gt_is_crowds'] = labels['groundtruths']['is_crowds'] - return labels, predictions - -
def eval_metrics(self): - return eval_factory.evaluator_generator(self._params.eval) diff --git a/official/vision/detection/modeling/olnmask_model.py b/official/vision/detection/modeling/olnmask_model.py deleted file mode 100644 index 60d59c1bd12bd25f8c2ed0e30bc32c7dad4cbdcf..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/olnmask_model.py +++ /dev/null @@ -1,432 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Model defination for the Object Localization Network (OLN) Model.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - -from official.vision.detection.dataloader import anchor -from official.vision.detection.dataloader import mode_keys -from official.vision.detection.modeling import losses -from official.vision.detection.modeling.architecture import factory -from official.vision.detection.modeling.maskrcnn_model import MaskrcnnModel -from official.vision.detection.ops import postprocess_ops -from official.vision.detection.ops import roi_ops -from official.vision.detection.ops import spatial_transform_ops -from official.vision.detection.ops import target_ops -from official.vision.detection.utils import box_utils - - -class OlnMaskModel(MaskrcnnModel): - """OLN-Mask model function.""" - - def __init__(self, params): - super(OlnMaskModel, self).__init__(params) - - self._params = params - - # Different heads and layers. - self._include_rpn_class = params.architecture.include_rpn_class - self._include_mask = params.architecture.include_mask - self._include_frcnn_class = params.architecture.include_frcnn_class - self._include_frcnn_box = params.architecture.include_frcnn_box - self._include_centerness = params.rpn_head.has_centerness - self._include_box_score = (params.frcnn_head.has_scoring and - params.architecture.include_frcnn_box) - self._include_mask_score = (params.mrcnn_head.has_scoring and - params.architecture.include_mask) - - # Architecture generators. 
- self._backbone_fn = factory.backbone_generator(params) - self._fpn_fn = factory.multilevel_features_generator(params) - self._rpn_head_fn = factory.rpn_head_generator(params) - if self._include_centerness: - self._rpn_head_fn = factory.oln_rpn_head_generator(params) - else: - self._rpn_head_fn = factory.rpn_head_generator(params) - self._generate_rois_fn = roi_ops.OlnROIGenerator(params.roi_proposal) - self._sample_rois_fn = target_ops.ROIScoreSampler(params.roi_sampling) - self._sample_masks_fn = target_ops.MaskSampler( - params.architecture.mask_target_size, - params.mask_sampling.num_mask_samples_per_image) - - if self._include_box_score: - self._frcnn_head_fn = factory.oln_box_score_head_generator(params) - else: - self._frcnn_head_fn = factory.fast_rcnn_head_generator(params) - - if self._include_mask: - if self._include_mask_score: - self._mrcnn_head_fn = factory.oln_mask_score_head_generator(params) - else: - self._mrcnn_head_fn = factory.mask_rcnn_head_generator(params) - - # Loss function. - self._rpn_score_loss_fn = losses.RpnScoreLoss(params.rpn_score_loss) - self._rpn_box_loss_fn = losses.RpnBoxLoss(params.rpn_box_loss) - if self._include_centerness: - self._rpn_iou_loss_fn = losses.OlnRpnIoULoss() - self._rpn_center_loss_fn = losses.OlnRpnCenterLoss() - self._frcnn_class_loss_fn = losses.FastrcnnClassLoss() - self._frcnn_box_loss_fn = losses.FastrcnnBoxLoss(params.frcnn_box_loss) - if self._include_box_score: - self._frcnn_box_score_loss_fn = losses.OlnBoxScoreLoss( - params.frcnn_box_score_loss) - if self._include_mask: - self._mask_loss_fn = losses.MaskrcnnLoss() - - self._generate_detections_fn = postprocess_ops.OlnDetectionGenerator( - params.postprocess) - - self._transpose_input = params.train.transpose_input - assert not self._transpose_input, 'Transpose input is not supportted.' 
- - def build_outputs(self, inputs, mode): - is_training = mode == mode_keys.TRAIN - model_outputs = {} - - image = inputs['image'] - _, image_height, image_width, _ = image.get_shape().as_list() - backbone_features = self._backbone_fn(image, is_training) - fpn_features = self._fpn_fn(backbone_features, is_training) - - # rpn_centerness. - if self._include_centerness: - rpn_score_outputs, rpn_box_outputs, rpn_center_outputs = ( - self._rpn_head_fn(fpn_features, is_training)) - model_outputs.update({ - 'rpn_center_outputs': - tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), - rpn_center_outputs), - }) - object_scores = rpn_center_outputs - else: - rpn_score_outputs, rpn_box_outputs = self._rpn_head_fn( - fpn_features, is_training) - object_scores = None - model_outputs.update({ - 'rpn_score_outputs': - tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), - rpn_score_outputs), - 'rpn_box_outputs': - tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), - rpn_box_outputs), - }) - input_anchor = anchor.Anchor(self._params.architecture.min_level, - self._params.architecture.max_level, - self._params.anchor.num_scales, - self._params.anchor.aspect_ratios, - self._params.anchor.anchor_size, - (image_height, image_width)) - rpn_rois, rpn_roi_scores = self._generate_rois_fn( - rpn_box_outputs, - rpn_score_outputs, - input_anchor.multilevel_boxes, - inputs['image_info'][:, 1, :], - is_training, - is_box_lrtb=self._include_centerness, - object_scores=object_scores, - ) - if (not self._include_frcnn_class and - not self._include_frcnn_box and - not self._include_mask): - # if not is_training: - # For direct RPN detection, - # use dummy box_outputs = (dy,dx,dh,dw = 0,0,0,0) - box_outputs = tf.zeros_like(rpn_rois) - box_outputs = tf.concat([box_outputs, box_outputs], -1) - boxes, scores, classes, valid_detections = self._generate_detections_fn( - box_outputs, rpn_roi_scores, rpn_rois, - inputs['image_info'][:, 1:2, :], - is_single_fg_score=True, # if no_background, 
no softmax is applied. - keep_nms=True) - model_outputs.update({ - 'num_detections': valid_detections, - 'detection_boxes': boxes, - 'detection_classes': classes, - 'detection_scores': scores, - }) - return model_outputs - - # ---- OLN-Proposal finishes here. ---- - - if is_training: - rpn_rois = tf.stop_gradient(rpn_rois) - rpn_roi_scores = tf.stop_gradient(rpn_roi_scores) - - # Sample proposals. - (rpn_rois, rpn_roi_scores, matched_gt_boxes, matched_gt_classes, - matched_gt_indices) = ( - self._sample_rois_fn(rpn_rois, rpn_roi_scores, inputs['gt_boxes'], - inputs['gt_classes'])) - # Create bounding box training targets. - box_targets = box_utils.encode_boxes( - matched_gt_boxes, rpn_rois, weights=[10.0, 10.0, 5.0, 5.0]) - # If the target is background, the box target is set to all 0s. - box_targets = tf.where( - tf.tile( - tf.expand_dims(tf.equal(matched_gt_classes, 0), axis=-1), - [1, 1, 4]), tf.zeros_like(box_targets), box_targets) - model_outputs.update({ - 'class_targets': matched_gt_classes, - 'box_targets': box_targets, - }) - # Create Box-IoU targets. 
{ - box_ious = box_utils.bbox_overlap( - rpn_rois, inputs['gt_boxes']) - matched_box_ious = tf.reduce_max(box_ious, 2) - model_outputs.update({ - 'box_iou_targets': matched_box_ious,}) # } - - roi_features = spatial_transform_ops.multilevel_crop_and_resize( - fpn_features, rpn_rois, output_size=7) - - if not self._include_box_score: - class_outputs, box_outputs = self._frcnn_head_fn( - roi_features, is_training) - else: - class_outputs, box_outputs, score_outputs = self._frcnn_head_fn( - roi_features, is_training) - model_outputs.update({ - 'box_score_outputs': - tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), - score_outputs),}) - model_outputs.update({ - 'class_outputs': - tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), - class_outputs), - 'box_outputs': - tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), - box_outputs), - }) - - # Add this output to train to make the checkpoint loadable in predict mode. - # If we skip it in train mode, the heads will be out-of-order and checkpoint - # loading will fail. - if not self._include_frcnn_box: - box_outputs = tf.zeros_like(box_outputs) # dummy zeros. - - if self._include_box_score: - score_outputs = tf.cast(tf.squeeze(score_outputs, -1), - rpn_roi_scores.dtype) - - # box-score = (rpn-centerness * box-iou)^(1/2) - # TR: rpn_roi_scores: b,1000, score_outputs: b,512 - # TS: rpn_roi_scores: b,1000, score_outputs: b,1000 - box_scores = tf.pow( - rpn_roi_scores * tf.sigmoid(score_outputs), 1/2.) 
- - if not self._include_frcnn_class: - boxes, scores, classes, valid_detections = self._generate_detections_fn( - box_outputs, - box_scores, - rpn_rois, - inputs['image_info'][:, 1:2, :], - is_single_fg_score=True, - keep_nms=True,) - else: - boxes, scores, classes, valid_detections = self._generate_detections_fn( - box_outputs, class_outputs, rpn_rois, - inputs['image_info'][:, 1:2, :], - keep_nms=True,) - model_outputs.update({ - 'num_detections': valid_detections, - 'detection_boxes': boxes, - 'detection_classes': classes, - 'detection_scores': scores, - }) - - # ---- OLN-Box finishes here. ---- - - if not self._include_mask: - return model_outputs - - if is_training: - rpn_rois, classes, mask_targets = self._sample_masks_fn( - rpn_rois, matched_gt_boxes, matched_gt_classes, matched_gt_indices, - inputs['gt_masks']) - mask_targets = tf.stop_gradient(mask_targets) - - classes = tf.cast(classes, dtype=tf.int32) - - model_outputs.update({ - 'mask_targets': mask_targets, - 'sampled_class_targets': classes, - }) - else: - rpn_rois = boxes - classes = tf.cast(classes, dtype=tf.int32) - - mask_roi_features = spatial_transform_ops.multilevel_crop_and_resize( - fpn_features, rpn_rois, output_size=14) - - mask_outputs = self._mrcnn_head_fn(mask_roi_features, classes, is_training) - - if is_training: - model_outputs.update({ - 'mask_outputs': - tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), - mask_outputs), - }) - else: - model_outputs.update({'detection_masks': tf.nn.sigmoid(mask_outputs)}) - - return model_outputs - - def build_loss_fn(self): - if self._keras_model is None: - raise ValueError('build_loss_fn() must be called after build_model().') - - filter_fn = self.make_filter_trainable_variables_fn() - trainable_variables = filter_fn(self._keras_model.trainable_variables) - - def _total_loss_fn(labels, outputs): - if self._include_rpn_class: - rpn_score_loss = self._rpn_score_loss_fn(outputs['rpn_score_outputs'], - labels['rpn_score_targets']) - else: - 
rpn_score_loss = 0.0 - if self._include_centerness: - rpn_center_loss = self._rpn_center_loss_fn( - outputs['rpn_center_outputs'], labels['rpn_center_targets']) - rpn_box_loss = self._rpn_iou_loss_fn( - outputs['rpn_box_outputs'], labels['rpn_box_targets'], - labels['rpn_center_targets']) - else: - rpn_center_loss = 0.0 - rpn_box_loss = self._rpn_box_loss_fn( - outputs['rpn_box_outputs'], labels['rpn_box_targets']) - - if self._include_frcnn_class: - frcnn_class_loss = self._frcnn_class_loss_fn( - outputs['class_outputs'], outputs['class_targets']) - else: - frcnn_class_loss = 0.0 - if self._include_frcnn_box: - frcnn_box_loss = self._frcnn_box_loss_fn( - outputs['box_outputs'], outputs['class_targets'], - outputs['box_targets']) - else: - frcnn_box_loss = 0.0 - if self._include_box_score: - box_score_loss = self._frcnn_box_score_loss_fn( - outputs['box_score_outputs'], outputs['box_iou_targets']) - else: - box_score_loss = 0.0 - - if self._include_mask: - mask_loss = self._mask_loss_fn(outputs['mask_outputs'], - outputs['mask_targets'], - outputs['sampled_class_targets']) - else: - mask_loss = 0.0 - - model_loss = ( - rpn_score_loss + rpn_box_loss + rpn_center_loss + - frcnn_class_loss + frcnn_box_loss + box_score_loss + - mask_loss) - - l2_regularization_loss = self.weight_decay_loss(trainable_variables) - total_loss = model_loss + l2_regularization_loss - return { - 'total_loss': total_loss, - 'loss': total_loss, - 'fast_rcnn_class_loss': frcnn_class_loss, - 'fast_rcnn_box_loss': frcnn_box_loss, - 'fast_rcnn_box_score_loss': box_score_loss, - 'mask_loss': mask_loss, - 'model_loss': model_loss, - 'l2_regularization_loss': l2_regularization_loss, - 'rpn_score_loss': rpn_score_loss, - 'rpn_box_loss': rpn_box_loss, - 'rpn_center_loss': rpn_center_loss, - } - - return _total_loss_fn - - def build_input_layers(self, params, mode): - is_training = mode == mode_keys.TRAIN - input_shape = ( - params.olnmask_parser.output_size + - [params.olnmask_parser.num_channels]) - 
if is_training: - batch_size = params.train.batch_size - input_layer = { - 'image': - tf.keras.layers.Input( - shape=input_shape, - batch_size=batch_size, - name='image', - dtype=tf.bfloat16 if self._use_bfloat16 else tf.float32), - 'image_info': - tf.keras.layers.Input( - shape=[4, 2], - batch_size=batch_size, - name='image_info', - ), - 'gt_boxes': - tf.keras.layers.Input( - shape=[params.olnmask_parser.max_num_instances, 4], - batch_size=batch_size, - name='gt_boxes'), - 'gt_classes': - tf.keras.layers.Input( - shape=[params.olnmask_parser.max_num_instances], - batch_size=batch_size, - name='gt_classes', - dtype=tf.int64), - } - if self._include_mask: - input_layer['gt_masks'] = tf.keras.layers.Input( - shape=[ - params.olnmask_parser.max_num_instances, - params.olnmask_parser.mask_crop_size, - params.olnmask_parser.mask_crop_size - ], - batch_size=batch_size, - name='gt_masks') - else: - batch_size = params.eval.batch_size - input_layer = { - 'image': - tf.keras.layers.Input( - shape=input_shape, - batch_size=batch_size, - name='image', - dtype=tf.bfloat16 if self._use_bfloat16 else tf.float32), - 'image_info': - tf.keras.layers.Input( - shape=[4, 2], - batch_size=batch_size, - name='image_info', - ), - } - return input_layer - - def build_model(self, params, mode): - if self._keras_model is None: - input_layers = self.build_input_layers(self._params, mode) - outputs = self.model_outputs(input_layers, mode) - - model = tf.keras.models.Model( - inputs=input_layers, outputs=outputs, name='olnmask') - assert model is not None, 'Fail to build tf.keras.Model.' 
- model.optimizer = self.build_optimizer() - self._keras_model = model - - return self._keras_model diff --git a/official/vision/detection/modeling/optimizers.py b/official/vision/detection/modeling/optimizers.py deleted file mode 100644 index 8b098c9f6456f77e720af387ec3a31ddb4ff2947..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/optimizers.py +++ /dev/null @@ -1,50 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Optimizers.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import functools - -import numpy as np -import tensorflow as tf - - -class OptimizerFactory(object): - """Class to generate optimizer function.""" - - def __init__(self, params): - """Creates optimized based on the specified flags.""" - if params.type == 'momentum': - self._optimizer = functools.partial( - tf.keras.optimizers.SGD, - momentum=params.momentum, - nesterov=params.nesterov) - elif params.type == 'adam': - self._optimizer = tf.keras.optimizers.Adam - elif params.type == 'adadelta': - self._optimizer = tf.keras.optimizers.Adadelta - elif params.type == 'adagrad': - self._optimizer = tf.keras.optimizers.Adagrad - elif params.type == 'rmsprop': - self._optimizer = functools.partial( - tf.keras.optimizers.RMSprop, momentum=params.momentum) - else: - raise ValueError('Unsupported optimizer type `{}`.'.format(params.type)) - - def __call__(self, learning_rate): - return self._optimizer(learning_rate=learning_rate) diff --git a/official/vision/detection/modeling/retinanet_model.py b/official/vision/detection/modeling/retinanet_model.py deleted file mode 100644 index 7a0a307c27ceb035b4f4c752ccbb1cd4ead4da29..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/retinanet_model.py +++ /dev/null @@ -1,165 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Model defination for the RetinaNet Model.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - -from official.vision.detection.dataloader import mode_keys -from official.vision.detection.evaluation import factory as eval_factory -from official.vision.detection.modeling import base_model -from official.vision.detection.modeling import losses -from official.vision.detection.modeling.architecture import factory -from official.vision.detection.ops import postprocess_ops - - -class RetinanetModel(base_model.Model): - """RetinaNet model function.""" - - def __init__(self, params): - super(RetinanetModel, self).__init__(params) - - # For eval metrics. - self._params = params - - # Architecture generators. - self._backbone_fn = factory.backbone_generator(params) - self._fpn_fn = factory.multilevel_features_generator(params) - self._head_fn = factory.retinanet_head_generator(params) - - # Loss function. - self._cls_loss_fn = losses.RetinanetClassLoss( - params.retinanet_loss, params.architecture.num_classes) - self._box_loss_fn = losses.RetinanetBoxLoss(params.retinanet_loss) - self._box_loss_weight = params.retinanet_loss.box_loss_weight - self._keras_model = None - - # Predict function. - self._generate_detections_fn = postprocess_ops.MultilevelDetectionGenerator( - params.architecture.min_level, params.architecture.max_level, - params.postprocess) - - self._transpose_input = params.train.transpose_input - assert not self._transpose_input, 'Transpose input is not supported.' - # Input layer. 
- self._input_layer = tf.keras.layers.Input( - shape=(None, None, params.retinanet_parser.num_channels), - name='', - dtype=tf.bfloat16 if self._use_bfloat16 else tf.float32) - - def build_outputs(self, inputs, mode): - # If the input image is transposed (from NHWC to HWCN), we need to revert it - # back to the original shape before it's used in the computation. - if self._transpose_input: - inputs = tf.transpose(inputs, [3, 0, 1, 2]) - - backbone_features = self._backbone_fn( - inputs, is_training=(mode == mode_keys.TRAIN)) - fpn_features = self._fpn_fn( - backbone_features, is_training=(mode == mode_keys.TRAIN)) - cls_outputs, box_outputs = self._head_fn( - fpn_features, is_training=(mode == mode_keys.TRAIN)) - - if self._use_bfloat16: - levels = cls_outputs.keys() - for level in levels: - cls_outputs[level] = tf.cast(cls_outputs[level], tf.float32) - box_outputs[level] = tf.cast(box_outputs[level], tf.float32) - - model_outputs = { - 'cls_outputs': cls_outputs, - 'box_outputs': box_outputs, - } - return model_outputs - - def build_loss_fn(self): - if self._keras_model is None: - raise ValueError('build_loss_fn() must be called after build_model().') - - filter_fn = self.make_filter_trainable_variables_fn() - trainable_variables = filter_fn(self._keras_model.trainable_variables) - - def _total_loss_fn(labels, outputs): - cls_loss = self._cls_loss_fn(outputs['cls_outputs'], - labels['cls_targets'], - labels['num_positives']) - box_loss = self._box_loss_fn(outputs['box_outputs'], - labels['box_targets'], - labels['num_positives']) - model_loss = cls_loss + self._box_loss_weight * box_loss - l2_regularization_loss = self.weight_decay_loss(trainable_variables) - total_loss = model_loss + l2_regularization_loss - return { - 'total_loss': total_loss, - 'cls_loss': cls_loss, - 'box_loss': box_loss, - 'model_loss': model_loss, - 'l2_regularization_loss': l2_regularization_loss, - } - - return _total_loss_fn - - def build_model(self, params, mode=None): - if 
self._keras_model is None: - outputs = self.model_outputs(self._input_layer, mode) - - model = tf.keras.models.Model( - inputs=self._input_layer, outputs=outputs, name='retinanet') - assert model is not None, 'Fail to build tf.keras.Model.' - model.optimizer = self.build_optimizer() - self._keras_model = model - - return self._keras_model - - def post_processing(self, labels, outputs): - # TODO(yeqing): Moves the output related part into build_outputs. - required_output_fields = ['cls_outputs', 'box_outputs'] - for field in required_output_fields: - if field not in outputs: - raise ValueError('"%s" is missing in outputs, requried %s found %s', - field, required_output_fields, outputs.keys()) - required_label_fields = ['image_info', 'groundtruths'] - for field in required_label_fields: - if field not in labels: - raise ValueError('"%s" is missing in outputs, requried %s found %s', - field, required_label_fields, labels.keys()) - boxes, scores, classes, valid_detections = self._generate_detections_fn( - outputs['box_outputs'], outputs['cls_outputs'], labels['anchor_boxes'], - labels['image_info'][:, 1:2, :]) - # Discards the old output tensors to save memory. The `cls_outputs` and - # `box_outputs` are pretty big and could potentiall lead to memory issue. 
- outputs = { - 'source_id': labels['groundtruths']['source_id'], - 'image_info': labels['image_info'], - 'num_detections': valid_detections, - 'detection_boxes': boxes, - 'detection_classes': classes, - 'detection_scores': scores, - } - - if 'groundtruths' in labels: - labels['source_id'] = labels['groundtruths']['source_id'] - labels['boxes'] = labels['groundtruths']['boxes'] - labels['classes'] = labels['groundtruths']['classes'] - labels['areas'] = labels['groundtruths']['areas'] - labels['is_crowds'] = labels['groundtruths']['is_crowds'] - - return labels, outputs - - def eval_metrics(self): - return eval_factory.evaluator_generator(self._params.eval) diff --git a/official/vision/detection/modeling/shapemask_model.py b/official/vision/detection/modeling/shapemask_model.py deleted file mode 100644 index d197ec2fa38c167f616c0e60c5951bfb12ff94fb..0000000000000000000000000000000000000000 --- a/official/vision/detection/modeling/shapemask_model.py +++ /dev/null @@ -1,304 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Model definition for the ShapeMask Model.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - -from official.vision.detection.dataloader import anchor -from official.vision.detection.dataloader import mode_keys -from official.vision.detection.evaluation import factory as eval_factory -from official.vision.detection.modeling import base_model -from official.vision.detection.modeling import losses -from official.vision.detection.modeling.architecture import factory -from official.vision.detection.ops import postprocess_ops -from official.vision.detection.utils import box_utils - - -class ShapeMaskModel(base_model.Model): - """ShapeMask model function.""" - - def __init__(self, params): - super(ShapeMaskModel, self).__init__(params) - - self._params = params - self._keras_model = None - - # Architecture generators. - self._backbone_fn = factory.backbone_generator(params) - self._fpn_fn = factory.multilevel_features_generator(params) - self._retinanet_head_fn = factory.retinanet_head_generator(params) - self._shape_prior_head_fn = factory.shapeprior_head_generator(params) - self._coarse_mask_fn = factory.coarsemask_head_generator(params) - self._fine_mask_fn = factory.finemask_head_generator(params) - - # Loss functions. - self._cls_loss_fn = losses.RetinanetClassLoss( - params.retinanet_loss, params.architecture.num_classes) - self._box_loss_fn = losses.RetinanetBoxLoss(params.retinanet_loss) - self._box_loss_weight = params.retinanet_loss.box_loss_weight - - # Mask loss function. - self._shapemask_prior_loss_fn = losses.ShapemaskMseLoss() - self._shapemask_loss_fn = losses.ShapemaskLoss() - self._shape_prior_loss_weight = ( - params.shapemask_loss.shape_prior_loss_weight) - self._coarse_mask_loss_weight = ( - params.shapemask_loss.coarse_mask_loss_weight) - self._fine_mask_loss_weight = (params.shapemask_loss.fine_mask_loss_weight) - - # Predict function. 
- self._generate_detections_fn = postprocess_ops.MultilevelDetectionGenerator( - params.architecture.min_level, params.architecture.max_level, - params.postprocess) - - def build_outputs(self, inputs, mode): - is_training = mode == mode_keys.TRAIN - images = inputs['image'] - - if 'anchor_boxes' in inputs: - anchor_boxes = inputs['anchor_boxes'] - else: - anchor_boxes = anchor.Anchor( - self._params.architecture.min_level, - self._params.architecture.max_level, self._params.anchor.num_scales, - self._params.anchor.aspect_ratios, self._params.anchor.anchor_size, - images.get_shape().as_list()[1:3]).multilevel_boxes - - batch_size = tf.shape(images)[0] - for level in anchor_boxes: - anchor_boxes[level] = tf.tile( - tf.expand_dims(anchor_boxes[level], 0), [batch_size, 1, 1, 1]) - - backbone_features = self._backbone_fn(images, is_training=is_training) - fpn_features = self._fpn_fn(backbone_features, is_training=is_training) - cls_outputs, box_outputs = self._retinanet_head_fn( - fpn_features, is_training=is_training) - - valid_boxes, valid_scores, valid_classes, valid_detections = ( - self._generate_detections_fn(box_outputs, cls_outputs, anchor_boxes, - inputs['image_info'][:, 1:2, :])) - - image_size = images.get_shape().as_list()[1:3] - valid_outer_boxes = box_utils.compute_outer_boxes( - tf.reshape(valid_boxes, [-1, 4]), - image_size, - scale=self._params.shapemask_parser.outer_box_scale) - valid_outer_boxes = tf.reshape(valid_outer_boxes, tf.shape(valid_boxes)) - - # Wrapping if else code paths into a layer to make the checkpoint loadable - # in prediction mode. 
- class SampledBoxesLayer(tf.keras.layers.Layer): - """ShapeMask model function.""" - - def call(self, inputs, val_boxes, val_classes, val_outer_boxes, training): - if training: - boxes = inputs['mask_boxes'] - outer_boxes = inputs['mask_outer_boxes'] - classes = inputs['mask_classes'] - else: - boxes = val_boxes - classes = val_classes - outer_boxes = val_outer_boxes - return boxes, classes, outer_boxes - - boxes, classes, outer_boxes = SampledBoxesLayer()( - inputs, - valid_boxes, - valid_classes, - valid_outer_boxes, - training=is_training) - - instance_features, prior_masks = self._shape_prior_head_fn( - fpn_features, boxes, outer_boxes, classes, is_training) - coarse_mask_logits = self._coarse_mask_fn(instance_features, prior_masks, - classes, is_training) - fine_mask_logits = self._fine_mask_fn(instance_features, coarse_mask_logits, - classes, is_training) - - model_outputs = { - 'cls_outputs': cls_outputs, - 'box_outputs': box_outputs, - 'fine_mask_logits': fine_mask_logits, - 'coarse_mask_logits': coarse_mask_logits, - 'prior_masks': prior_masks, - } - - if not is_training: - model_outputs.update({ - 'num_detections': valid_detections, - 'detection_boxes': valid_boxes, - 'detection_outer_boxes': valid_outer_boxes, - 'detection_masks': fine_mask_logits, - 'detection_classes': valid_classes, - 'detection_scores': valid_scores, - }) - - return model_outputs - - def build_loss_fn(self): - if self._keras_model is None: - raise ValueError('build_loss_fn() must be called after build_model().') - - filter_fn = self.make_filter_trainable_variables_fn() - trainable_variables = filter_fn(self._keras_model.trainable_variables) - - def _total_loss_fn(labels, outputs): - cls_loss = self._cls_loss_fn(outputs['cls_outputs'], - labels['cls_targets'], - labels['num_positives']) - box_loss = self._box_loss_fn(outputs['box_outputs'], - labels['box_targets'], - labels['num_positives']) - - # Adds Shapemask model losses. 
- shape_prior_loss = self._shapemask_prior_loss_fn(outputs['prior_masks'], - labels['mask_targets'], - labels['mask_is_valid']) - coarse_mask_loss = self._shapemask_loss_fn(outputs['coarse_mask_logits'], - labels['mask_targets'], - labels['mask_is_valid']) - fine_mask_loss = self._shapemask_loss_fn(outputs['fine_mask_logits'], - labels['fine_mask_targets'], - labels['mask_is_valid']) - - model_loss = ( - cls_loss + self._box_loss_weight * box_loss + - shape_prior_loss * self._shape_prior_loss_weight + - coarse_mask_loss * self._coarse_mask_loss_weight + - fine_mask_loss * self._fine_mask_loss_weight) - - l2_regularization_loss = self.weight_decay_loss(trainable_variables) - total_loss = model_loss + l2_regularization_loss - - shapemask_losses = { - 'total_loss': total_loss, - 'loss': total_loss, - 'retinanet_cls_loss': cls_loss, - 'l2_regularization_loss': l2_regularization_loss, - 'retinanet_box_loss': box_loss, - 'shapemask_prior_loss': shape_prior_loss, - 'shapemask_coarse_mask_loss': coarse_mask_loss, - 'shapemask_fine_mask_loss': fine_mask_loss, - 'model_loss': model_loss, - } - return shapemask_losses - - return _total_loss_fn - - def build_input_layers(self, params, mode): - is_training = mode == mode_keys.TRAIN - input_shape = ( - params.shapemask_parser.output_size + - [params.shapemask_parser.num_channels]) - if is_training: - batch_size = params.train.batch_size - input_layer = { - 'image': - tf.keras.layers.Input( - shape=input_shape, - batch_size=batch_size, - name='image', - dtype=tf.bfloat16 if self._use_bfloat16 else tf.float32), - 'image_info': - tf.keras.layers.Input( - shape=[4, 2], batch_size=batch_size, name='image_info'), - 'mask_classes': - tf.keras.layers.Input( - shape=[params.shapemask_parser.num_sampled_masks], - batch_size=batch_size, - name='mask_classes', - dtype=tf.int64), - 'mask_outer_boxes': - tf.keras.layers.Input( - shape=[params.shapemask_parser.num_sampled_masks, 4], - batch_size=batch_size, - name='mask_outer_boxes', - 
dtype=tf.float32), - 'mask_boxes': - tf.keras.layers.Input( - shape=[params.shapemask_parser.num_sampled_masks, 4], - batch_size=batch_size, - name='mask_boxes', - dtype=tf.float32), - } - else: - batch_size = params.eval.batch_size - input_layer = { - 'image': - tf.keras.layers.Input( - shape=input_shape, - batch_size=batch_size, - name='image', - dtype=tf.bfloat16 if self._use_bfloat16 else tf.float32), - 'image_info': - tf.keras.layers.Input( - shape=[4, 2], batch_size=batch_size, name='image_info'), - } - return input_layer - - def build_model(self, params, mode): - if self._keras_model is None: - input_layers = self.build_input_layers(self._params, mode) - outputs = self.model_outputs(input_layers, mode) - - model = tf.keras.models.Model( - inputs=input_layers, outputs=outputs, name='shapemask') - assert model is not None, 'Fail to build tf.keras.Model.' - model.optimizer = self.build_optimizer() - self._keras_model = model - - return self._keras_model - - def post_processing(self, labels, outputs): - required_output_fields = [ - 'num_detections', 'detection_boxes', 'detection_classes', - 'detection_masks', 'detection_scores' - ] - - for field in required_output_fields: - if field not in outputs: - raise ValueError( - '"{}" is missing in outputs, requried {} found {}'.format( - field, required_output_fields, outputs.keys())) - - required_label_fields = ['image_info'] - for field in required_label_fields: - if field not in labels: - raise ValueError( - '"{}" is missing in labels, requried {} found {}'.format( - field, required_label_fields, labels.keys())) - - predictions = { - 'image_info': labels['image_info'], - 'num_detections': outputs['num_detections'], - 'detection_boxes': outputs['detection_boxes'], - 'detection_outer_boxes': outputs['detection_outer_boxes'], - 'detection_classes': outputs['detection_classes'], - 'detection_scores': outputs['detection_scores'], - 'detection_masks': outputs['detection_masks'], - } - - if 'groundtruths' in labels: - 
predictions['source_id'] = labels['groundtruths']['source_id'] - labels = labels['groundtruths'] - - return labels, predictions - - def eval_metrics(self): - return eval_factory.evaluator_generator(self._params.eval) diff --git a/official/vision/detection/ops/__init__.py b/official/vision/detection/ops/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/detection/ops/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/detection/ops/nms.py b/official/vision/detection/ops/nms.py deleted file mode 100644 index a81ff1e8fcd44ddd35dcf1a3bf7a9dad1831c76f..0000000000000000000000000000000000000000 --- a/official/vision/detection/ops/nms.py +++ /dev/null @@ -1,201 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Tensorflow implementation of non max suppression.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - -from official.vision.detection.utils import box_utils - -NMS_TILE_SIZE = 512 - - -def _self_suppression(iou, _, iou_sum): - batch_size = tf.shape(iou)[0] - can_suppress_others = tf.cast( - tf.reshape(tf.reduce_max(iou, 1) <= 0.5, [batch_size, -1, 1]), iou.dtype) - iou_suppressed = tf.reshape( - tf.cast(tf.reduce_max(can_suppress_others * iou, 1) <= 0.5, iou.dtype), - [batch_size, -1, 1]) * iou - iou_sum_new = tf.reduce_sum(iou_suppressed, [1, 2]) - return [ - iou_suppressed, - tf.reduce_any(iou_sum - iou_sum_new > 0.5), iou_sum_new - ] - - -def _cross_suppression(boxes, box_slice, iou_threshold, inner_idx): - batch_size = tf.shape(boxes)[0] - new_slice = tf.slice(boxes, [0, inner_idx * NMS_TILE_SIZE, 0], - [batch_size, NMS_TILE_SIZE, 4]) - iou = box_utils.bbox_overlap(new_slice, box_slice) - ret_slice = tf.expand_dims( - tf.cast(tf.reduce_all(iou < iou_threshold, [1]), box_slice.dtype), - 2) * box_slice - return boxes, ret_slice, iou_threshold, inner_idx + 1 - - -def _suppression_loop_body(boxes, iou_threshold, output_size, idx): - """Process boxes in the range [idx*NMS_TILE_SIZE, (idx+1)*NMS_TILE_SIZE). - - Args: - boxes: a tensor with a shape of [batch_size, anchors, 4]. - iou_threshold: a float representing the threshold for deciding whether boxes - overlap too much with respect to IOU. - output_size: an int32 tensor of size [batch_size]. Representing the number - of selected boxes for each batch. - idx: an integer scalar representing induction variable. - - Returns: - boxes: updated boxes. - iou_threshold: pass down iou_threshold to the next iteration. - output_size: the updated output_size. - idx: the updated induction variable. 
- """ - num_tiles = tf.shape(boxes)[1] // NMS_TILE_SIZE - batch_size = tf.shape(boxes)[0] - - # Iterates over tiles that can possibly suppress the current tile. - box_slice = tf.slice(boxes, [0, idx * NMS_TILE_SIZE, 0], - [batch_size, NMS_TILE_SIZE, 4]) - _, box_slice, _, _ = tf.while_loop( - lambda _boxes, _box_slice, _threshold, inner_idx: inner_idx < idx, - _cross_suppression, [boxes, box_slice, iou_threshold, - tf.constant(0)]) - - # Iterates over the current tile to compute self-suppression. - iou = box_utils.bbox_overlap(box_slice, box_slice) - mask = tf.expand_dims( - tf.reshape(tf.range(NMS_TILE_SIZE), [1, -1]) > tf.reshape( - tf.range(NMS_TILE_SIZE), [-1, 1]), 0) - iou *= tf.cast(tf.logical_and(mask, iou >= iou_threshold), iou.dtype) - suppressed_iou, _, _ = tf.while_loop( - lambda _iou, loop_condition, _iou_sum: loop_condition, _self_suppression, - [iou, tf.constant(True), - tf.reduce_sum(iou, [1, 2])]) - suppressed_box = tf.reduce_sum(suppressed_iou, 1) > 0 - box_slice *= tf.expand_dims(1.0 - tf.cast(suppressed_box, box_slice.dtype), 2) - - # Uses box_slice to update the input boxes. - mask = tf.reshape( - tf.cast(tf.equal(tf.range(num_tiles), idx), boxes.dtype), [1, -1, 1, 1]) - boxes = tf.tile(tf.expand_dims( - box_slice, [1]), [1, num_tiles, 1, 1]) * mask + tf.reshape( - boxes, [batch_size, num_tiles, NMS_TILE_SIZE, 4]) * (1 - mask) - boxes = tf.reshape(boxes, [batch_size, -1, 4]) - - # Updates output_size. - output_size += tf.reduce_sum( - tf.cast(tf.reduce_any(box_slice > 0, [2]), tf.int32), [1]) - return boxes, iou_threshold, output_size, idx + 1 - - -def sorted_non_max_suppression_padded(scores, boxes, max_output_size, - iou_threshold): - """A wrapper that handles non-maximum suppression. - - Assumption: - * The boxes are sorted by scores unless the box is a dot (all coordinates - are zero). - * Boxes with higher scores can be used to suppress boxes with lower scores. 
- - The overal design of the algorithm is to handle boxes tile-by-tile: - - boxes = boxes.pad_to_multiply_of(tile_size) - num_tiles = len(boxes) // tile_size - output_boxes = [] - for i in range(num_tiles): - box_tile = boxes[i*tile_size : (i+1)*tile_size] - for j in range(i - 1): - suppressing_tile = boxes[j*tile_size : (j+1)*tile_size] - iou = bbox_overlap(box_tile, suppressing_tile) - # if the box is suppressed in iou, clear it to a dot - box_tile *= _update_boxes(iou) - # Iteratively handle the diagnal tile. - iou = _box_overlap(box_tile, box_tile) - iou_changed = True - while iou_changed: - # boxes that are not suppressed by anything else - suppressing_boxes = _get_suppressing_boxes(iou) - # boxes that are suppressed by suppressing_boxes - suppressed_boxes = _get_suppressed_boxes(iou, suppressing_boxes) - # clear iou to 0 for boxes that are suppressed, as they cannot be used - # to suppress other boxes any more - new_iou = _clear_iou(iou, suppressed_boxes) - iou_changed = (new_iou != iou) - iou = new_iou - # remaining boxes that can still suppress others, are selected boxes. - output_boxes.append(_get_suppressing_boxes(iou)) - if len(output_boxes) >= max_output_size: - break - - Args: - scores: a tensor with a shape of [batch_size, anchors]. - boxes: a tensor with a shape of [batch_size, anchors, 4]. - max_output_size: a scalar integer `Tensor` representing the maximum number - of boxes to be selected by non max suppression. - iou_threshold: a float representing the threshold for deciding whether boxes - overlap too much with respect to IOU. - - Returns: - nms_scores: a tensor with a shape of [batch_size, anchors]. It has same - dtype as input scores. - nms_proposals: a tensor with a shape of [batch_size, anchors, 4]. It has - same dtype as input boxes. 
- """ - batch_size = tf.shape(boxes)[0] - num_boxes = tf.shape(boxes)[1] - pad = tf.cast( - tf.math.ceil(tf.cast(num_boxes, tf.float32) / NMS_TILE_SIZE), - tf.int32) * NMS_TILE_SIZE - num_boxes - boxes = tf.pad(tf.cast(boxes, tf.float32), [[0, 0], [0, pad], [0, 0]]) - scores = tf.pad( - tf.cast(scores, tf.float32), [[0, 0], [0, pad]], constant_values=-1) - num_boxes += pad - - def _loop_cond(unused_boxes, unused_threshold, output_size, idx): - return tf.logical_and( - tf.reduce_min(output_size) < max_output_size, - idx < num_boxes // NMS_TILE_SIZE) - - selected_boxes, _, output_size, _ = tf.while_loop( - _loop_cond, _suppression_loop_body, - [boxes, iou_threshold, - tf.zeros([batch_size], tf.int32), - tf.constant(0)]) - idx = num_boxes - tf.cast( - tf.nn.top_k( - tf.cast(tf.reduce_any(selected_boxes > 0, [2]), tf.int32) * - tf.expand_dims(tf.range(num_boxes, 0, -1), 0), max_output_size)[0], - tf.int32) - idx = tf.minimum(idx, num_boxes - 1) - idx = tf.reshape(idx + tf.reshape(tf.range(batch_size) * num_boxes, [-1, 1]), - [-1]) - boxes = tf.reshape( - tf.gather(tf.reshape(boxes, [-1, 4]), idx), - [batch_size, max_output_size, 4]) - boxes = boxes * tf.cast( - tf.reshape(tf.range(max_output_size), [1, -1, 1]) < tf.reshape( - output_size, [-1, 1, 1]), boxes.dtype) - scores = tf.reshape( - tf.gather(tf.reshape(scores, [-1, 1]), idx), - [batch_size, max_output_size]) - scores = scores * tf.cast( - tf.reshape(tf.range(max_output_size), [1, -1]) < tf.reshape( - output_size, [-1, 1]), scores.dtype) - return scores, boxes diff --git a/official/vision/detection/ops/postprocess_ops.py b/official/vision/detection/ops/postprocess_ops.py deleted file mode 100644 index ba0f3c40664381c4fa76e4d617721edccbaab7d7..0000000000000000000000000000000000000000 --- a/official/vision/detection/ops/postprocess_ops.py +++ /dev/null @@ -1,497 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Post-processing model outputs to generate detection.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import functools - -import tensorflow as tf - -from official.vision.detection.ops import nms -from official.vision.detection.utils import box_utils - - -def generate_detections_factory(params): - """Factory to select function to generate detection.""" - if params.use_batched_nms: - func = functools.partial( - _generate_detections_batched, - max_total_size=params.max_total_size, - nms_iou_threshold=params.nms_iou_threshold, - score_threshold=params.score_threshold) - else: - func = functools.partial( - _generate_detections, - max_total_size=params.max_total_size, - nms_iou_threshold=params.nms_iou_threshold, - score_threshold=params.score_threshold, - pre_nms_num_boxes=params.pre_nms_num_boxes) - return func - - -def _select_top_k_scores(scores_in, pre_nms_num_detections): - """Select top_k scores and indices for each class. - - Args: - scores_in: a Tensor with shape [batch_size, N, num_classes], which stacks - class logit outputs on all feature levels. The N is the number of total - anchors on all levels. The num_classes is the number of classes predicted - by the model. - pre_nms_num_detections: Number of candidates before NMS. - - Returns: - scores and indices: Tensors with shape [batch_size, pre_nms_num_detections, - num_classes]. 
- """ - batch_size, num_anchors, num_class = scores_in.get_shape().as_list() - scores_trans = tf.transpose(scores_in, perm=[0, 2, 1]) - scores_trans = tf.reshape(scores_trans, [-1, num_anchors]) - - top_k_scores, top_k_indices = tf.nn.top_k( - scores_trans, k=pre_nms_num_detections, sorted=True) - - top_k_scores = tf.reshape(top_k_scores, - [batch_size, num_class, pre_nms_num_detections]) - top_k_indices = tf.reshape(top_k_indices, - [batch_size, num_class, pre_nms_num_detections]) - - return tf.transpose(top_k_scores, - [0, 2, 1]), tf.transpose(top_k_indices, [0, 2, 1]) - - -def _generate_detections(boxes, - scores, - max_total_size=100, - nms_iou_threshold=0.3, - score_threshold=0.05, - pre_nms_num_boxes=5000): - """Generate the final detections given the model outputs. - - This uses classes unrolling with while loop based NMS, could be parralled - at batch dimension. - - Args: - boxes: a tensor with shape [batch_size, N, num_classes, 4] or [batch_size, - N, 1, 4], which box predictions on all feature levels. The N is the number - of total anchors on all levels. - scores: a tensor with shape [batch_size, N, num_classes], which stacks class - probability on all feature levels. The N is the number of total anchors on - all levels. The num_classes is the number of classes predicted by the - model. Note that the class_outputs here is the raw score. - max_total_size: a scalar representing maximum number of boxes retained over - all classes. - nms_iou_threshold: a float representing the threshold for deciding whether - boxes overlap too much with respect to IOU. - score_threshold: a float representing the threshold for deciding when to - remove boxes based on score. - pre_nms_num_boxes: an int number of top candidate detections per class - before NMS. - - Returns: - nms_boxes: `float` Tensor of shape [batch_size, max_total_size, 4] - representing top detected boxes in [y1, x1, y2, x2]. 
- nms_scores: `float` Tensor of shape [batch_size, max_total_size] - representing sorted confidence scores for detected boxes. The values are - between [0, 1]. - nms_classes: `int` Tensor of shape [batch_size, max_total_size] representing - classes for detected boxes. - valid_detections: `int` Tensor of shape [batch_size] only the top - `valid_detections` boxes are valid detections. - """ - with tf.name_scope('generate_detections'): - nmsed_boxes = [] - nmsed_classes = [] - nmsed_scores = [] - valid_detections = [] - batch_size, _, num_classes_for_box, _ = boxes.get_shape().as_list() - _, total_anchors, num_classes = scores.get_shape().as_list() - # Selects top pre_nms_num scores and indices before NMS. - scores, indices = _select_top_k_scores( - scores, min(total_anchors, pre_nms_num_boxes)) - for i in range(num_classes): - boxes_i = boxes[:, :, min(num_classes_for_box - 1, i), :] - scores_i = scores[:, :, i] - # Obtains pre_nms_num_boxes before running NMS. - boxes_i = tf.gather(boxes_i, indices[:, :, i], batch_dims=1, axis=1) - - # Filter out scores. 
- boxes_i, scores_i = box_utils.filter_boxes_by_scores( - boxes_i, scores_i, min_score_threshold=score_threshold) - - (nmsed_scores_i, nmsed_boxes_i) = nms.sorted_non_max_suppression_padded( - tf.cast(scores_i, tf.float32), - tf.cast(boxes_i, tf.float32), - max_total_size, - iou_threshold=nms_iou_threshold) - nmsed_classes_i = tf.fill([batch_size, max_total_size], i) - nmsed_boxes.append(nmsed_boxes_i) - nmsed_scores.append(nmsed_scores_i) - nmsed_classes.append(nmsed_classes_i) - nmsed_boxes = tf.concat(nmsed_boxes, axis=1) - nmsed_scores = tf.concat(nmsed_scores, axis=1) - nmsed_classes = tf.concat(nmsed_classes, axis=1) - nmsed_scores, indices = tf.nn.top_k( - nmsed_scores, k=max_total_size, sorted=True) - nmsed_boxes = tf.gather(nmsed_boxes, indices, batch_dims=1, axis=1) - nmsed_classes = tf.gather(nmsed_classes, indices, batch_dims=1) - valid_detections = tf.reduce_sum( - input_tensor=tf.cast(tf.greater(nmsed_scores, -1), tf.int32), axis=1) - return nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections - - -def _generate_detections_per_image(boxes, - scores, - max_total_size=100, - nms_iou_threshold=0.3, - score_threshold=0.05, - pre_nms_num_boxes=5000): - """Generate the final detections per image given the model outputs. - - Args: - boxes: a tensor with shape [N, num_classes, 4] or [N, 1, 4], which box - predictions on all feature levels. The N is the number of total anchors on - all levels. - scores: a tensor with shape [N, num_classes], which stacks class probability - on all feature levels. The N is the number of total anchors on all levels. - The num_classes is the number of classes predicted by the model. Note that - the class_outputs here is the raw score. - max_total_size: a scalar representing maximum number of boxes retained over - all classes. - nms_iou_threshold: a float representing the threshold for deciding whether - boxes overlap too much with respect to IOU. 
- score_threshold: a float representing the threshold for deciding when to - remove boxes based on score. - pre_nms_num_boxes: an int number of top candidate detections per class - before NMS. - - Returns: - nms_boxes: `float` Tensor of shape [max_total_size, 4] representing top - detected boxes in [y1, x1, y2, x2]. - nms_scores: `float` Tensor of shape [max_total_size] representing sorted - confidence scores for detected boxes. The values are between [0, 1]. - nms_classes: `int` Tensor of shape [max_total_size] representing classes for - detected boxes. - valid_detections: `int` Tensor of shape [1] only the top `valid_detections` - boxes are valid detections. - """ - nmsed_boxes = [] - nmsed_scores = [] - nmsed_classes = [] - num_classes_for_box = boxes.get_shape().as_list()[1] - num_classes = scores.get_shape().as_list()[1] - for i in range(num_classes): - boxes_i = boxes[:, min(num_classes_for_box - 1, i)] - scores_i = scores[:, i] - - # Obtains pre_nms_num_boxes before running NMS. - scores_i, indices = tf.nn.top_k( - scores_i, k=tf.minimum(tf.shape(input=scores_i)[-1], pre_nms_num_boxes)) - boxes_i = tf.gather(boxes_i, indices) - - (nmsed_indices_i, nmsed_num_valid_i) = tf.image.non_max_suppression_padded( - tf.cast(boxes_i, tf.float32), - tf.cast(scores_i, tf.float32), - max_total_size, - iou_threshold=nms_iou_threshold, - score_threshold=score_threshold, - pad_to_max_output_size=True, - name='nms_detections_' + str(i)) - nmsed_boxes_i = tf.gather(boxes_i, nmsed_indices_i) - nmsed_scores_i = tf.gather(scores_i, nmsed_indices_i) - # Sets scores of invalid boxes to -1. - nmsed_scores_i = tf.where( - tf.less(tf.range(max_total_size), [nmsed_num_valid_i]), nmsed_scores_i, - -tf.ones_like(nmsed_scores_i)) - nmsed_classes_i = tf.fill([max_total_size], i) - nmsed_boxes.append(nmsed_boxes_i) - nmsed_scores.append(nmsed_scores_i) - nmsed_classes.append(nmsed_classes_i) - - # Concats results from all classes and sort them. 
- nmsed_boxes = tf.concat(nmsed_boxes, axis=0) - nmsed_scores = tf.concat(nmsed_scores, axis=0) - nmsed_classes = tf.concat(nmsed_classes, axis=0) - nmsed_scores, indices = tf.nn.top_k( - nmsed_scores, k=max_total_size, sorted=True) - nmsed_boxes = tf.gather(nmsed_boxes, indices) - nmsed_classes = tf.gather(nmsed_classes, indices) - valid_detections = tf.reduce_sum( - input_tensor=tf.cast(tf.greater(nmsed_scores, -1), tf.int32)) - return nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections - - -def _generate_detections_batched(boxes, scores, max_total_size, - nms_iou_threshold, score_threshold): - """Generates detected boxes with scores and classes for one-stage detector. - - The function takes output of multi-level ConvNets and anchor boxes and - generates detected boxes. Note that this used batched nms, which is not - supported on TPU currently. - - Args: - boxes: a tensor with shape [batch_size, N, num_classes, 4] or [batch_size, - N, 1, 4], which box predictions on all feature levels. The N is the number - of total anchors on all levels. - scores: a tensor with shape [batch_size, N, num_classes], which stacks class - probability on all feature levels. The N is the number of total anchors on - all levels. The num_classes is the number of classes predicted by the - model. Note that the class_outputs here is the raw score. - max_total_size: a scalar representing maximum number of boxes retained over - all classes. - nms_iou_threshold: a float representing the threshold for deciding whether - boxes overlap too much with respect to IOU. - score_threshold: a float representing the threshold for deciding when to - remove boxes based on score. - - Returns: - nms_boxes: `float` Tensor of shape [batch_size, max_total_size, 4] - representing top detected boxes in [y1, x1, y2, x2]. - nms_scores: `float` Tensor of shape [batch_size, max_total_size] - representing sorted confidence scores for detected boxes. The values are - between [0, 1]. 
- nms_classes: `int` Tensor of shape [batch_size, max_total_size] representing - classes for detected boxes. - valid_detections: `int` Tensor of shape [batch_size] only the top - `valid_detections` boxes are valid detections. - """ - with tf.name_scope('generate_detections'): - # TODO(tsungyi): Removes normalization/denomalization once the - # tf.image.combined_non_max_suppression is coordinate system agnostic. - # Normalizes maximum box cooridinates to 1. - normalizer = tf.reduce_max(boxes) - boxes /= normalizer - (nmsed_boxes, nmsed_scores, nmsed_classes, - valid_detections) = tf.image.combined_non_max_suppression( - boxes, - scores, - max_output_size_per_class=max_total_size, - max_total_size=max_total_size, - iou_threshold=nms_iou_threshold, - score_threshold=score_threshold, - pad_per_class=False, - ) - # De-normalizes box cooridinates. - nmsed_boxes *= normalizer - nmsed_classes = tf.cast(nmsed_classes, tf.int32) - return nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections - - -class MultilevelDetectionGenerator(tf.keras.layers.Layer): - """Generates detected boxes with scores and classes for one-stage detector.""" - - def __init__(self, min_level, max_level, params): - self._min_level = min_level - self._max_level = max_level - self._generate_detections = generate_detections_factory(params) - super(MultilevelDetectionGenerator, self).__init__(autocast=False) - - def call(self, box_outputs, class_outputs, anchor_boxes, image_shape): - # Collects outputs from all levels into a list. - boxes = [] - scores = [] - for i in range(self._min_level, self._max_level + 1): - box_outputs_i_shape = tf.shape(box_outputs[i]) - batch_size = box_outputs_i_shape[0] - num_anchors_per_locations = box_outputs_i_shape[-1] // 4 - num_classes = tf.shape(class_outputs[i])[-1] // num_anchors_per_locations - - # Applies score transformation and remove the implicit background class. 
- scores_i = tf.sigmoid( - tf.reshape(class_outputs[i], [batch_size, -1, num_classes])) - scores_i = tf.slice(scores_i, [0, 0, 1], [-1, -1, -1]) - - # Box decoding. - # The anchor boxes are shared for all data in a batch. - # One stage detector only supports class agnostic box regression. - anchor_boxes_i = tf.reshape(anchor_boxes[i], [batch_size, -1, 4]) - box_outputs_i = tf.reshape(box_outputs[i], [batch_size, -1, 4]) - boxes_i = box_utils.decode_boxes(box_outputs_i, anchor_boxes_i) - - # Box clipping. - boxes_i = box_utils.clip_boxes(boxes_i, image_shape) - - boxes.append(boxes_i) - scores.append(scores_i) - boxes = tf.concat(boxes, axis=1) - scores = tf.concat(scores, axis=1) - - nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections = ( - self._generate_detections(tf.expand_dims(boxes, axis=2), scores)) - - # Adds 1 to offset the background class which has index 0. - nmsed_classes += 1 - return nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections - - -class GenericDetectionGenerator(tf.keras.layers.Layer): - """Generates the final detected boxes with scores and classes.""" - - def __init__(self, params): - super(GenericDetectionGenerator, self).__init__(autocast=False) - self._generate_detections = generate_detections_factory(params) - - def call(self, box_outputs, class_outputs, anchor_boxes, image_shape): - """Generate final detections. - - Args: - box_outputs: a tensor of shape of [batch_size, K, num_classes * 4] - representing the class-specific box coordinates relative to anchors. - class_outputs: a tensor of shape of [batch_size, K, num_classes] - representing the class logits before applying score activiation. - anchor_boxes: a tensor of shape of [batch_size, K, 4] representing the - corresponding anchor boxes w.r.t `box_outputs`. - image_shape: a tensor of shape of [batch_size, 2] storing the image height - and width w.r.t. the scaled image, i.e. the same image space as - `box_outputs` and `anchor_boxes`. 
- - Returns: - nms_boxes: `float` Tensor of shape [batch_size, max_total_size, 4] - representing top detected boxes in [y1, x1, y2, x2]. - nms_scores: `float` Tensor of shape [batch_size, max_total_size] - representing sorted confidence scores for detected boxes. The values are - between [0, 1]. - nms_classes: `int` Tensor of shape [batch_size, max_total_size] - representing classes for detected boxes. - valid_detections: `int` Tensor of shape [batch_size] only the top - `valid_detections` boxes are valid detections. - """ - class_outputs = tf.nn.softmax(class_outputs, axis=-1) - - # Removes the background class. - class_outputs_shape = tf.shape(class_outputs) - batch_size = class_outputs_shape[0] - num_locations = class_outputs_shape[1] - num_classes = class_outputs_shape[-1] - num_detections = num_locations * (num_classes - 1) - - class_outputs = tf.slice(class_outputs, [0, 0, 1], [-1, -1, -1]) - box_outputs = tf.reshape( - box_outputs, - tf.stack([batch_size, num_locations, num_classes, 4], axis=-1)) - box_outputs = tf.slice(box_outputs, [0, 0, 1, 0], [-1, -1, -1, -1]) - anchor_boxes = tf.tile( - tf.expand_dims(anchor_boxes, axis=2), [1, 1, num_classes - 1, 1]) - box_outputs = tf.reshape(box_outputs, - tf.stack([batch_size, num_detections, 4], axis=-1)) - anchor_boxes = tf.reshape( - anchor_boxes, tf.stack([batch_size, num_detections, 4], axis=-1)) - - # Box decoding. - decoded_boxes = box_utils.decode_boxes( - box_outputs, anchor_boxes, weights=[10.0, 10.0, 5.0, 5.0]) - - # Box clipping - decoded_boxes = box_utils.clip_boxes(decoded_boxes, image_shape) - - decoded_boxes = tf.reshape( - decoded_boxes, - tf.stack([batch_size, num_locations, num_classes - 1, 4], axis=-1)) - - nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections = ( - self._generate_detections(decoded_boxes, class_outputs)) - - # Adds 1 to offset the background class which has index 0. 
- nmsed_classes += 1 - - return nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections - - -class OlnDetectionGenerator(GenericDetectionGenerator): - """Generates the final detected boxes with scores and classes.""" - - def __call__(self, box_outputs, class_outputs, anchor_boxes, image_shape, - is_single_fg_score=False, keep_nms=True): - """Generate final detections for Object Localization Network (OLN). - - Args: - box_outputs: a tensor of shape of [batch_size, K, num_classes * 4] - representing the class-specific box coordinates relative to anchors. - class_outputs: a tensor of shape of [batch_size, K, num_classes] - representing the class logits before applying score activiation. - anchor_boxes: a tensor of shape of [batch_size, K, 4] representing the - corresponding anchor boxes w.r.t `box_outputs`. - image_shape: a tensor of shape of [batch_size, 2] storing the image height - and width w.r.t. the scaled image, i.e. the same image space as - `box_outputs` and `anchor_boxes`. - is_single_fg_score: a Bool indicator of whether class_outputs includes the - background scores concatenated or not. By default, class_outputs is a - concatenation of both scores for the foreground and background. That is, - scores_without_bg=False. - keep_nms: a Bool indicator of whether to perform NMS or not. - - Returns: - nms_boxes: `float` Tensor of shape [batch_size, max_total_size, 4] - representing top detected boxes in [y1, x1, y2, x2]. - nms_scores: `float` Tensor of shape [batch_size, max_total_size] - representing sorted confidence scores for detected boxes. The values are - between [0, 1]. - nms_classes: `int` Tensor of shape [batch_size, max_total_size] - representing classes for detected boxes. - valid_detections: `int` Tensor of shape [batch_size] only the top - `valid_detections` boxes are valid detections. - """ - if is_single_fg_score: - # Concatenates dummy background scores. 
- dummy_bg_scores = tf.zeros_like(class_outputs) - class_outputs = tf.stack([dummy_bg_scores, class_outputs], -1) - else: - class_outputs = tf.nn.softmax(class_outputs, axis=-1) - - # Removes the background class. - class_outputs_shape = tf.shape(class_outputs) - batch_size = class_outputs_shape[0] - num_locations = class_outputs_shape[1] - num_classes = class_outputs_shape[-1] - num_detections = num_locations * (num_classes - 1) - - class_outputs = tf.slice(class_outputs, [0, 0, 1], [-1, -1, -1]) - box_outputs = tf.reshape( - box_outputs, - tf.stack([batch_size, num_locations, num_classes, 4], axis=-1)) - box_outputs = tf.slice(box_outputs, [0, 0, 1, 0], [-1, -1, -1, -1]) - anchor_boxes = tf.tile( - tf.expand_dims(anchor_boxes, axis=2), [1, 1, num_classes - 1, 1]) - box_outputs = tf.reshape(box_outputs, - tf.stack([batch_size, num_detections, 4], axis=-1)) - anchor_boxes = tf.reshape( - anchor_boxes, tf.stack([batch_size, num_detections, 4], axis=-1)) - - # Box decoding. For RPN outputs, box_outputs are all zeros. - decoded_boxes = box_utils.decode_boxes( - box_outputs, anchor_boxes, weights=[10.0, 10.0, 5.0, 5.0]) - - # Box clipping - decoded_boxes = box_utils.clip_boxes(decoded_boxes, image_shape) - - decoded_boxes = tf.reshape( - decoded_boxes, - tf.stack([batch_size, num_locations, num_classes - 1, 4], axis=-1)) - - if keep_nms: - nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections = ( - self._generate_detections(decoded_boxes, class_outputs)) - # Adds 1 to offset the background class which has index 0. 
- nmsed_classes += 1 - else: - nmsed_boxes = decoded_boxes[:, :, 0, :] - nmsed_scores = class_outputs[:, :, 0] - nmsed_classes = tf.cast(tf.ones_like(nmsed_scores), tf.int32) - valid_detections = tf.cast( - tf.reduce_sum(tf.ones_like(nmsed_scores), axis=-1), tf.int32) - - return nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections diff --git a/official/vision/detection/ops/roi_ops.py b/official/vision/detection/ops/roi_ops.py deleted file mode 100644 index a198d0ee204996e695f0e58bacada4bf3154ac98..0000000000000000000000000000000000000000 --- a/official/vision/detection/ops/roi_ops.py +++ /dev/null @@ -1,468 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""ROI-related ops.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - -from official.vision.detection.ops import nms -from official.vision.detection.utils import box_utils - - -def multilevel_propose_rois(rpn_boxes, - rpn_scores, - anchor_boxes, - image_shape, - rpn_pre_nms_top_k=2000, - rpn_post_nms_top_k=1000, - rpn_nms_threshold=0.7, - rpn_score_threshold=0.0, - rpn_min_size_threshold=0.0, - decode_boxes=True, - clip_boxes=True, - use_batched_nms=False, - apply_sigmoid_to_score=True): - """Proposes RoIs given a group of candidates from different FPN levels. - - The following describes the steps: - 1. For each individual level: - a. 
Apply sigmoid transform if specified. - b. Decode boxes if specified. - c. Clip boxes if specified. - d. Filter small boxes and those fall outside image if specified. - e. Apply pre-NMS filtering including pre-NMS top k and score thresholding. - f. Apply NMS. - 2. Aggregate post-NMS boxes from each level. - 3. Apply an overall top k to generate the final selected RoIs. - - Args: - rpn_boxes: a dict with keys representing FPN levels and values representing - box tenors of shape [batch_size, feature_h, feature_w, num_anchors * 4]. - rpn_scores: a dict with keys representing FPN levels and values representing - logit tensors of shape [batch_size, feature_h, feature_w, num_anchors]. - anchor_boxes: a dict with keys representing FPN levels and values - representing anchor box tensors of shape [batch_size, feature_h, - feature_w, num_anchors * 4]. - image_shape: a tensor of shape [batch_size, 2] where the last dimension are - [height, width] of the scaled image. - rpn_pre_nms_top_k: an integer of top scoring RPN proposals *per level* to - keep before applying NMS. Default: 2000. - rpn_post_nms_top_k: an integer of top scoring RPN proposals *in total* to - keep after applying NMS. Default: 1000. - rpn_nms_threshold: a float between 0 and 1 representing the IoU threshold - used for NMS. If 0.0, no NMS is applied. Default: 0.7. - rpn_score_threshold: a float between 0 and 1 representing the minimal box - score to keep before applying NMS. This is often used as a pre-filtering - step for better performance. If 0, no filtering is applied. Default: 0. - rpn_min_size_threshold: a float representing the minimal box size in each - side (w.r.t. the scaled image) to keep before applying NMS. This is often - used as a pre-filtering step for better performance. If 0, no filtering is - applied. Default: 0. - decode_boxes: a boolean indicating whether `rpn_boxes` needs to be decoded - using `anchor_boxes`. If False, use `rpn_boxes` directly and ignore - `anchor_boxes`. Default: True. 
- clip_boxes: a boolean indicating whether boxes are first clipped to the - scaled image size before appliying NMS. If False, no clipping is applied - and `image_shape` is ignored. Default: True. - use_batched_nms: a boolean indicating whether NMS is applied in batch using - `tf.image.combined_non_max_suppression`. Currently only available in - CPU/GPU. Default: False. - apply_sigmoid_to_score: a boolean indicating whether apply sigmoid to - `rpn_scores` before applying NMS. Default: True. - - Returns: - selected_rois: a tensor of shape [batch_size, rpn_post_nms_top_k, 4], - representing the box coordinates of the selected proposals w.r.t. the - scaled image. - selected_roi_scores: a tensor of shape [batch_size, rpn_post_nms_top_k, 1], - representing the scores of the selected proposals. - """ - with tf.name_scope('multilevel_propose_rois'): - rois = [] - roi_scores = [] - image_shape = tf.expand_dims(image_shape, axis=1) - for level in sorted(rpn_scores.keys()): - with tf.name_scope('level_%d' % level): - _, feature_h, feature_w, num_anchors_per_location = ( - rpn_scores[level].get_shape().as_list()) - - num_boxes = feature_h * feature_w * num_anchors_per_location - this_level_scores = tf.reshape(rpn_scores[level], [-1, num_boxes]) - this_level_boxes = tf.reshape(rpn_boxes[level], [-1, num_boxes, 4]) - this_level_anchors = tf.cast( - tf.reshape(anchor_boxes[level], [-1, num_boxes, 4]), - dtype=this_level_scores.dtype) - - if apply_sigmoid_to_score: - this_level_scores = tf.sigmoid(this_level_scores) - - if decode_boxes: - this_level_boxes = box_utils.decode_boxes(this_level_boxes, - this_level_anchors) - if clip_boxes: - this_level_boxes = box_utils.clip_boxes(this_level_boxes, image_shape) - - if rpn_min_size_threshold > 0.0: - this_level_boxes, this_level_scores = box_utils.filter_boxes( - this_level_boxes, this_level_scores, image_shape, - rpn_min_size_threshold) - - this_level_pre_nms_top_k = min(num_boxes, rpn_pre_nms_top_k) - this_level_post_nms_top_k = 
min(num_boxes, rpn_post_nms_top_k) - if rpn_nms_threshold > 0.0: - if use_batched_nms: - this_level_rois, this_level_roi_scores, _, _ = ( - tf.image.combined_non_max_suppression( - tf.expand_dims(this_level_boxes, axis=2), - tf.expand_dims(this_level_scores, axis=-1), - max_output_size_per_class=this_level_pre_nms_top_k, - max_total_size=this_level_post_nms_top_k, - iou_threshold=rpn_nms_threshold, - score_threshold=rpn_score_threshold, - pad_per_class=False, - clip_boxes=False)) - else: - if rpn_score_threshold > 0.0: - this_level_boxes, this_level_scores = ( - box_utils.filter_boxes_by_scores(this_level_boxes, - this_level_scores, - rpn_score_threshold)) - this_level_boxes, this_level_scores = box_utils.top_k_boxes( - this_level_boxes, this_level_scores, k=this_level_pre_nms_top_k) - this_level_roi_scores, this_level_rois = ( - nms.sorted_non_max_suppression_padded( - this_level_scores, - this_level_boxes, - max_output_size=this_level_post_nms_top_k, - iou_threshold=rpn_nms_threshold)) - else: - this_level_rois, this_level_roi_scores = box_utils.top_k_boxes( - this_level_rois, this_level_scores, k=this_level_post_nms_top_k) - - rois.append(this_level_rois) - roi_scores.append(this_level_roi_scores) - - all_rois = tf.concat(rois, axis=1) - all_roi_scores = tf.concat(roi_scores, axis=1) - - with tf.name_scope('top_k_rois'): - _, num_valid_rois = all_roi_scores.get_shape().as_list() - overall_top_k = min(num_valid_rois, rpn_post_nms_top_k) - - selected_rois, selected_roi_scores = box_utils.top_k_boxes( - all_rois, all_roi_scores, k=overall_top_k) - - return selected_rois, selected_roi_scores - - -class ROIGenerator(tf.keras.layers.Layer): - """Proposes RoIs for the second stage processing.""" - - def __init__(self, params): - self._rpn_pre_nms_top_k = params.rpn_pre_nms_top_k - self._rpn_post_nms_top_k = params.rpn_post_nms_top_k - self._rpn_nms_threshold = params.rpn_nms_threshold - self._rpn_score_threshold = params.rpn_score_threshold - 
self._rpn_min_size_threshold = params.rpn_min_size_threshold - self._test_rpn_pre_nms_top_k = params.test_rpn_pre_nms_top_k - self._test_rpn_post_nms_top_k = params.test_rpn_post_nms_top_k - self._test_rpn_nms_threshold = params.test_rpn_nms_threshold - self._test_rpn_score_threshold = params.test_rpn_score_threshold - self._test_rpn_min_size_threshold = params.test_rpn_min_size_threshold - self._use_batched_nms = params.use_batched_nms - super(ROIGenerator, self).__init__(autocast=False) - - def call(self, boxes, scores, anchor_boxes, image_shape, is_training): - """Generates RoI proposals. - - Args: - boxes: a dict with keys representing FPN levels and values representing - box tenors of shape [batch_size, feature_h, feature_w, num_anchors * 4]. - scores: a dict with keys representing FPN levels and values representing - logit tensors of shape [batch_size, feature_h, feature_w, num_anchors]. - anchor_boxes: a dict with keys representing FPN levels and values - representing anchor box tensors of shape [batch_size, feature_h, - feature_w, num_anchors * 4]. - image_shape: a tensor of shape [batch_size, 2] where the last dimension - are [height, width] of the scaled image. - is_training: a bool indicating whether it is in training or inference - mode. - - Returns: - proposed_rois: a tensor of shape [batch_size, rpn_post_nms_top_k, 4], - representing the box coordinates of the proposed RoIs w.r.t. the - scaled image. - proposed_roi_scores: a tensor of shape - [batch_size, rpn_post_nms_top_k, 1], representing the scores of the - proposed RoIs. 
- - """ - proposed_rois, proposed_roi_scores = multilevel_propose_rois( - boxes, - scores, - anchor_boxes, - image_shape, - rpn_pre_nms_top_k=(self._rpn_pre_nms_top_k - if is_training else self._test_rpn_pre_nms_top_k), - rpn_post_nms_top_k=(self._rpn_post_nms_top_k - if is_training else self._test_rpn_post_nms_top_k), - rpn_nms_threshold=(self._rpn_nms_threshold - if is_training else self._test_rpn_nms_threshold), - rpn_score_threshold=(self._rpn_score_threshold if is_training else - self._test_rpn_score_threshold), - rpn_min_size_threshold=(self._rpn_min_size_threshold if is_training else - self._test_rpn_min_size_threshold), - decode_boxes=True, - clip_boxes=True, - use_batched_nms=self._use_batched_nms, - apply_sigmoid_to_score=True) - return proposed_rois, proposed_roi_scores - - -class OlnROIGenerator(ROIGenerator): - """Proposes RoIs for the second stage processing.""" - - def __call__(self, boxes, scores, anchor_boxes, image_shape, is_training, - is_box_lrtb=False, object_scores=None): - """Generates RoI proposals. - - Args: - boxes: a dict with keys representing FPN levels and values representing - box tenors of shape [batch_size, feature_h, feature_w, num_anchors * 4]. - scores: a dict with keys representing FPN levels and values representing - logit tensors of shape [batch_size, feature_h, feature_w, num_anchors]. - anchor_boxes: a dict with keys representing FPN levels and values - representing anchor box tensors of shape [batch_size, feature_h, - feature_w, num_anchors * 4]. - image_shape: a tensor of shape [batch_size, 2] where the last dimension - are [height, width] of the scaled image. - is_training: a bool indicating whether it is in training or inference - mode. - is_box_lrtb: a bool indicating whether boxes are in lrtb (=left,right,top, - bottom) format. - object_scores: another objectness score (e.g., centerness). In OLN, we use - object_scores=centerness as a replacement of the scores at each level. 
- A dict with keys representing FPN levels and values representing logit - tensors of shape [batch_size, feature_h, feature_w, num_anchors]. - - Returns: - proposed_rois: a tensor of shape [batch_size, rpn_post_nms_top_k, 4], - representing the box coordinates of the proposed RoIs w.r.t. the - scaled image. - proposed_roi_scores: a tensor of shape - [batch_size, rpn_post_nms_top_k, 1], representing the scores of the - proposed RoIs. - - """ - proposed_rois, proposed_roi_scores = self.oln_multilevel_propose_rois( - boxes, - scores, - anchor_boxes, - image_shape, - rpn_pre_nms_top_k=(self._rpn_pre_nms_top_k - if is_training else self._test_rpn_pre_nms_top_k), - rpn_post_nms_top_k=(self._rpn_post_nms_top_k - if is_training else self._test_rpn_post_nms_top_k), - rpn_nms_threshold=(self._rpn_nms_threshold - if is_training else self._test_rpn_nms_threshold), - rpn_score_threshold=(self._rpn_score_threshold if is_training else - self._test_rpn_score_threshold), - rpn_min_size_threshold=(self._rpn_min_size_threshold if is_training else - self._test_rpn_min_size_threshold), - decode_boxes=True, - clip_boxes=True, - use_batched_nms=self._use_batched_nms, - apply_sigmoid_to_score=True, - is_box_lrtb=is_box_lrtb, - rpn_object_scores=object_scores,) - return proposed_rois, proposed_roi_scores - - def oln_multilevel_propose_rois(self, - rpn_boxes, - rpn_scores, - anchor_boxes, - image_shape, - rpn_pre_nms_top_k=2000, - rpn_post_nms_top_k=1000, - rpn_nms_threshold=0.7, - rpn_score_threshold=0.0, - rpn_min_size_threshold=0.0, - decode_boxes=True, - clip_boxes=True, - use_batched_nms=False, - apply_sigmoid_to_score=True, - is_box_lrtb=False, - rpn_object_scores=None,): - """Proposes RoIs given a group of candidates from different FPN levels. - - The following describes the steps: - 1. For each individual level: - a. Adjust scores for each level if specified by rpn_object_scores. - b. Apply sigmoid transform if specified. - c. 
Decode boxes (either of xyhw or left-right-top-bottom format) if - specified. - d. Clip boxes if specified. - e. Filter small boxes and those fall outside image if specified. - f. Apply pre-NMS filtering including pre-NMS top k and score - thresholding. - g. Apply NMS. - 2. Aggregate post-NMS boxes from each level. - 3. Apply an overall top k to generate the final selected RoIs. - - Args: - rpn_boxes: a dict with keys representing FPN levels and values - representing box tenors of shape [batch_size, feature_h, feature_w, - num_anchors * 4]. - rpn_scores: a dict with keys representing FPN levels and values - representing logit tensors of shape [batch_size, feature_h, feature_w, - num_anchors]. - anchor_boxes: a dict with keys representing FPN levels and values - representing anchor box tensors of shape [batch_size, feature_h, - feature_w, num_anchors * 4]. - image_shape: a tensor of shape [batch_size, 2] where the last dimension - are [height, width] of the scaled image. - rpn_pre_nms_top_k: an integer of top scoring RPN proposals *per level* to - keep before applying NMS. Default: 2000. - rpn_post_nms_top_k: an integer of top scoring RPN proposals *in total* to - keep after applying NMS. Default: 1000. - rpn_nms_threshold: a float between 0 and 1 representing the IoU threshold - used for NMS. If 0.0, no NMS is applied. Default: 0.7. - rpn_score_threshold: a float between 0 and 1 representing the minimal box - score to keep before applying NMS. This is often used as a pre-filtering - step for better performance. If 0, no filtering is applied. Default: 0. - rpn_min_size_threshold: a float representing the minimal box size in each - side (w.r.t. the scaled image) to keep before applying NMS. This is - often used as a pre-filtering step for better performance. If 0, no - filtering is applied. Default: 0. - decode_boxes: a boolean indicating whether `rpn_boxes` needs to be decoded - using `anchor_boxes`. If False, use `rpn_boxes` directly and ignore - `anchor_boxes`. 
Default: True. - clip_boxes: a boolean indicating whether boxes are first clipped to the - scaled image size before appliying NMS. If False, no clipping is applied - and `image_shape` is ignored. Default: True. - use_batched_nms: a boolean indicating whether NMS is applied in batch - using `tf.image.combined_non_max_suppression`. Currently only available - in CPU/GPU. Default: False. - apply_sigmoid_to_score: a boolean indicating whether apply sigmoid to - `rpn_scores` before applying NMS. Default: True. - is_box_lrtb: a bool indicating whether boxes are in lrtb (=left,right,top, - bottom) format. - rpn_object_scores: a predicted objectness score (e.g., centerness). In - OLN, we use object_scores=centerness as a replacement of the scores at - each level. A dict with keys representing FPN levels and values - representing logit tensors of shape [batch_size, feature_h, feature_w, - num_anchors]. - - Returns: - selected_rois: a tensor of shape [batch_size, rpn_post_nms_top_k, 4], - representing the box coordinates of the selected proposals w.r.t. the - scaled image. - selected_roi_scores: a tensor of shape [batch_size, rpn_post_nms_top_k, - 1],representing the scores of the selected proposals. 
- """ - with tf.name_scope('multilevel_propose_rois'): - rois = [] - roi_scores = [] - image_shape = tf.expand_dims(image_shape, axis=1) - for level in sorted(rpn_scores.keys()): - with tf.name_scope('level_%d' % level): - _, feature_h, feature_w, num_anchors_per_location = ( - rpn_scores[level].get_shape().as_list()) - - num_boxes = feature_h * feature_w * num_anchors_per_location - this_level_scores = tf.reshape(rpn_scores[level], [-1, num_boxes]) - this_level_boxes = tf.reshape(rpn_boxes[level], [-1, num_boxes, 4]) - this_level_anchors = tf.cast( - tf.reshape(anchor_boxes[level], [-1, num_boxes, 4]), - dtype=this_level_scores.dtype) - - if rpn_object_scores: - this_level_object_scores = rpn_object_scores[level] - this_level_object_scores = tf.reshape(this_level_object_scores, - [-1, num_boxes]) - this_level_object_scores = tf.cast(this_level_object_scores, - this_level_scores.dtype) - this_level_scores = this_level_object_scores - - if apply_sigmoid_to_score: - this_level_scores = tf.sigmoid(this_level_scores) - - if decode_boxes: - if is_box_lrtb: # Box in left-right-top-bottom format. - this_level_boxes = box_utils.decode_boxes_lrtb( - this_level_boxes, this_level_anchors) - else: # Box in standard x-y-h-w format. 
- this_level_boxes = box_utils.decode_boxes( - this_level_boxes, this_level_anchors) - - if clip_boxes: - this_level_boxes = box_utils.clip_boxes( - this_level_boxes, image_shape) - - if rpn_min_size_threshold > 0.0: - this_level_boxes, this_level_scores = box_utils.filter_boxes( - this_level_boxes, this_level_scores, image_shape, - rpn_min_size_threshold) - - this_level_pre_nms_top_k = min(num_boxes, rpn_pre_nms_top_k) - this_level_post_nms_top_k = min(num_boxes, rpn_post_nms_top_k) - if rpn_nms_threshold > 0.0: - if use_batched_nms: - this_level_rois, this_level_roi_scores, _, _ = ( - tf.image.combined_non_max_suppression( - tf.expand_dims(this_level_boxes, axis=2), - tf.expand_dims(this_level_scores, axis=-1), - max_output_size_per_class=this_level_pre_nms_top_k, - max_total_size=this_level_post_nms_top_k, - iou_threshold=rpn_nms_threshold, - score_threshold=rpn_score_threshold, - pad_per_class=False, - clip_boxes=False)) - else: - if rpn_score_threshold > 0.0: - this_level_boxes, this_level_scores = ( - box_utils.filter_boxes_by_scores(this_level_boxes, - this_level_scores, - rpn_score_threshold)) - this_level_boxes, this_level_scores = box_utils.top_k_boxes( - this_level_boxes, this_level_scores, - k=this_level_pre_nms_top_k) - this_level_roi_scores, this_level_rois = ( - nms.sorted_non_max_suppression_padded( - this_level_scores, - this_level_boxes, - max_output_size=this_level_post_nms_top_k, - iou_threshold=rpn_nms_threshold)) - else: - this_level_rois, this_level_roi_scores = box_utils.top_k_boxes( - this_level_rois, this_level_scores, k=this_level_post_nms_top_k) - - rois.append(this_level_rois) - roi_scores.append(this_level_roi_scores) - - all_rois = tf.concat(rois, axis=1) - all_roi_scores = tf.concat(roi_scores, axis=1) - - with tf.name_scope('top_k_rois'): - _, num_valid_rois = all_roi_scores.get_shape().as_list() - overall_top_k = min(num_valid_rois, rpn_post_nms_top_k) - - selected_rois, selected_roi_scores = box_utils.top_k_boxes( - all_rois, 
all_roi_scores, k=overall_top_k) - - return selected_rois, selected_roi_scores diff --git a/official/vision/detection/ops/spatial_transform_ops.py b/official/vision/detection/ops/spatial_transform_ops.py deleted file mode 100644 index 4b7d7ecde48ca8dd1eeb4f7356a1642583b1754d..0000000000000000000000000000000000000000 --- a/official/vision/detection/ops/spatial_transform_ops.py +++ /dev/null @@ -1,603 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Functions to performa spatial transformation for Tensor.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - -_EPSILON = 1e-8 - - -def nearest_upsampling(data, scale): - """Nearest neighbor upsampling implementation. - - Args: - data: A tensor with a shape of [batch, height_in, width_in, channels]. - scale: An integer multiple to scale resolution of input data. - - Returns: - data_up: A tensor with a shape of - [batch, height_in*scale, width_in*scale, channels]. Same dtype as input - data. - """ - with tf.name_scope('nearest_upsampling'): - bs, _, _, c = data.get_shape().as_list() - shape = tf.shape(input=data) - h = shape[1] - w = shape[2] - bs = -1 if bs is None else bs - # Uses reshape to quickly upsample the input. The nearest pixel is selected - # implicitly via broadcasting. 
- data = tf.reshape(data, [bs, h, 1, w, 1, c]) * tf.ones( - [1, 1, scale, 1, scale, 1], dtype=data.dtype) - return tf.reshape(data, [bs, h * scale, w * scale, c]) - - -def feature_bilinear_interpolation(features, kernel_y, kernel_x): - """Feature bilinear interpolation. - - The RoIAlign feature f can be computed by bilinear interpolation - of four neighboring feature points f0, f1, f2, and f3. - - f(y, x) = [hy, ly] * [[f00, f01], * [hx, lx]^T - [f10, f11]] - f(y, x) = (hy*hx)f00 + (hy*lx)f01 + (ly*hx)f10 + (lx*ly)f11 - f(y, x) = w00*f00 + w01*f01 + w10*f10 + w11*f11 - kernel_y = [hy, ly] - kernel_x = [hx, lx] - - Args: - features: The features are in shape of [batch_size, num_boxes, output_size * - 2, output_size * 2, num_filters]. - kernel_y: Tensor of size [batch_size, boxes, output_size, 2, 1]. - kernel_x: Tensor of size [batch_size, boxes, output_size, 2, 1]. - - Returns: - A 5-D tensor representing feature crop of shape - [batch_size, num_boxes, output_size, output_size, num_filters]. - - """ - (batch_size, num_boxes, output_size, _, - num_filters) = features.get_shape().as_list() - output_size = output_size // 2 - kernel_y = tf.reshape(kernel_y, [batch_size, num_boxes, output_size * 2, 1]) - kernel_x = tf.reshape(kernel_x, [batch_size, num_boxes, 1, output_size * 2]) - # Use implicit broadcast to generate the interpolation kernel. The - # multiplier `4` is for avg pooling. - interpolation_kernel = kernel_y * kernel_x * 4 - - # Interpolate the gathered features with computed interpolation kernels. 
- features *= tf.cast( - tf.expand_dims(interpolation_kernel, axis=-1), dtype=features.dtype) - features = tf.reshape( - features, - [batch_size * num_boxes, output_size * 2, output_size * 2, num_filters]) - features = tf.nn.avg_pool(features, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID') - features = tf.reshape( - features, [batch_size, num_boxes, output_size, output_size, num_filters]) - return features - - -def compute_grid_positions(boxes, boundaries, output_size, sample_offset): - """Compute the grid position w.r.t. - - the corresponding feature map. - - Args: - boxes: a 3-D tensor of shape [batch_size, num_boxes, 4] encoding the - information of each box w.r.t. the corresponding feature map. - boxes[:, :, 0:2] are the grid position in (y, x) (float) of the top-left - corner of each box. boxes[:, :, 2:4] are the box sizes in (h, w) (float) - in terms of the number of pixels of the corresponding feature map size. - boundaries: a 3-D tensor of shape [batch_size, num_boxes, 2] representing - the boundary (in (y, x)) of the corresponding feature map for each box. - Any resampled grid points that go beyond the bounary will be clipped. - output_size: a scalar indicating the output crop size. - sample_offset: a float number in [0, 1] indicates the subpixel sample offset - from grid point. - - Returns: - kernel_y: Tensor of size [batch_size, boxes, output_size, 2, 1]. - kernel_x: Tensor of size [batch_size, boxes, output_size, 2, 1]. 
- box_grid_y0y1: Tensor of size [batch_size, boxes, output_size, 2] - box_grid_x0x1: Tensor of size [batch_size, boxes, output_size, 2] - """ - batch_size, num_boxes, _ = boxes.get_shape().as_list() - box_grid_x = [] - box_grid_y = [] - for i in range(output_size): - box_grid_x.append(boxes[:, :, 1] + - (i + sample_offset) * boxes[:, :, 3] / output_size) - box_grid_y.append(boxes[:, :, 0] + - (i + sample_offset) * boxes[:, :, 2] / output_size) - box_grid_x = tf.stack(box_grid_x, axis=2) - box_grid_y = tf.stack(box_grid_y, axis=2) - - box_grid_y0 = tf.floor(box_grid_y) - box_grid_x0 = tf.floor(box_grid_x) - box_grid_x0 = tf.maximum(0., box_grid_x0) - box_grid_y0 = tf.maximum(0., box_grid_y0) - - box_grid_x0 = tf.minimum(box_grid_x0, tf.expand_dims(boundaries[:, :, 1], -1)) - box_grid_x1 = tf.minimum(box_grid_x0 + 1, - tf.expand_dims(boundaries[:, :, 1], -1)) - box_grid_y0 = tf.minimum(box_grid_y0, tf.expand_dims(boundaries[:, :, 0], -1)) - box_grid_y1 = tf.minimum(box_grid_y0 + 1, - tf.expand_dims(boundaries[:, :, 0], -1)) - - box_gridx0x1 = tf.stack([box_grid_x0, box_grid_x1], axis=-1) - box_gridy0y1 = tf.stack([box_grid_y0, box_grid_y1], axis=-1) - - # The RoIAlign feature f can be computed by bilinear interpolation of four - # neighboring feature points f0, f1, f2, and f3. 
- # f(y, x) = [hy, ly] * [[f00, f01], * [hx, lx]^T - # [f10, f11]] - # f(y, x) = (hy*hx)f00 + (hy*lx)f01 + (ly*hx)f10 + (lx*ly)f11 - # f(y, x) = w00*f00 + w01*f01 + w10*f10 + w11*f11 - ly = box_grid_y - box_grid_y0 - lx = box_grid_x - box_grid_x0 - hy = 1.0 - ly - hx = 1.0 - lx - kernel_y = tf.reshape( - tf.stack([hy, ly], axis=3), [batch_size, num_boxes, output_size, 2, 1]) - kernel_x = tf.reshape( - tf.stack([hx, lx], axis=3), [batch_size, num_boxes, output_size, 2, 1]) - return kernel_y, kernel_x, box_gridy0y1, box_gridx0x1 - - -def get_grid_one_hot(box_gridy0y1, box_gridx0x1, feature_height, feature_width): - """Get grid_one_hot from indices and feature_size.""" - (batch_size, num_boxes, output_size, _) = box_gridx0x1.get_shape().as_list() - y_indices = tf.cast( - tf.reshape(box_gridy0y1, [batch_size, num_boxes, output_size, 2]), - dtype=tf.int32) - x_indices = tf.cast( - tf.reshape(box_gridx0x1, [batch_size, num_boxes, output_size, 2]), - dtype=tf.int32) - - # shape is [batch_size, num_boxes, output_size, 2, height] - grid_y_one_hot = tf.one_hot(tf.cast(y_indices, tf.int32), feature_height) - # shape is [batch_size, num_boxes, output_size, 2, width] - grid_x_one_hot = tf.one_hot(tf.cast(x_indices, tf.int32), feature_width) - - return grid_y_one_hot, grid_x_one_hot - - -def selective_crop_and_resize(features, - boxes, - box_levels, - boundaries, - output_size=7, - sample_offset=0.5, - use_einsum_gather=False): - """Crop and resize boxes on a set of feature maps. - - Given multiple features maps indexed by different levels, and a set of boxes - where each box is mapped to a certain level, it selectively crops and resizes - boxes from the corresponding feature maps to generate the box features. - - We follow the ROIAlign technique (see https://arxiv.org/pdf/1703.06870.pdf, - figure 3 for reference). 
Specifically, for each feature map, we select an - (output_size, output_size) set of pixels corresponding to the box location, - and then use bilinear interpolation to select the feature value for each - pixel. - - For performance, we perform the gather and interpolation on all layers as a - single operation. In this op the multi-level features are first stacked and - gathered into [2*output_size, 2*output_size] feature points. Then bilinear - interpolation is performed on the gathered feature points to generate - [output_size, output_size] RoIAlign feature map. - - Here is the step-by-step algorithm: - 1. The multi-level features are gathered into a - [batch_size, num_boxes, output_size*2, output_size*2, num_filters] - Tensor. The Tensor contains four neighboring feature points for each - vertice in the output grid. - 2. Compute the interpolation kernel of shape - [batch_size, num_boxes, output_size*2, output_size*2]. The last 2 axis - can be seen as stacking 2x2 interpolation kernels for all vertices in the - output grid. - 3. Element-wise multiply the gathered features and interpolation kernel. - Then apply 2x2 average pooling to reduce spatial dimension to - output_size. - - Args: - features: a 5-D tensor of shape [batch_size, num_levels, max_height, - max_width, num_filters] where cropping and resizing are based. - boxes: a 3-D tensor of shape [batch_size, num_boxes, 4] encoding the - information of each box w.r.t. the corresponding feature map. - boxes[:, :, 0:2] are the grid position in (y, x) (float) of the top-left - corner of each box. boxes[:, :, 2:4] are the box sizes in (h, w) (float) - in terms of the number of pixels of the corresponding feature map size. - box_levels: a 3-D tensor of shape [batch_size, num_boxes, 1] representing - the 0-based corresponding feature level index of each box. - boundaries: a 3-D tensor of shape [batch_size, num_boxes, 2] representing - the boundary (in (y, x)) of the corresponding feature map for each box. 
- Any resampled grid points that go beyond the bounary will be clipped. - output_size: a scalar indicating the output crop size. - sample_offset: a float number in [0, 1] indicates the subpixel sample offset - from grid point. - use_einsum_gather: use einsum to replace gather or not. Replacing einsum - with gather can improve performance when feature size is not large, einsum - is friendly with model partition as well. Gather's performance is better - when feature size is very large and there are multiple box levels. - - Returns: - features_per_box: a 5-D tensor of shape - [batch_size, num_boxes, output_size, output_size, num_filters] - representing the cropped features. - """ - (batch_size, num_levels, max_feature_height, max_feature_width, - num_filters) = features.get_shape().as_list() - _, num_boxes, _ = boxes.get_shape().as_list() - - kernel_y, kernel_x, box_gridy0y1, box_gridx0x1 = compute_grid_positions( - boxes, boundaries, output_size, sample_offset) - x_indices = tf.cast( - tf.reshape(box_gridx0x1, [batch_size, num_boxes, output_size * 2]), - dtype=tf.int32) - y_indices = tf.cast( - tf.reshape(box_gridy0y1, [batch_size, num_boxes, output_size * 2]), - dtype=tf.int32) - - if use_einsum_gather: - # Blinear interpolation is done during the last two gathers: - # f(y, x) = [hy, ly] * [[f00, f01], * [hx, lx]^T - # [f10, f11]] - # [[f00, f01], - # [f10, f11]] = tf.einsum(tf.einsum(features, y_one_hot), x_one_hot) - # where [hy, ly] and [hx, lx] are the bilinear interpolation kernel. - - # shape is [batch_size, boxes, output_size, 2, 1] - grid_y_one_hot, grid_x_one_hot = get_grid_one_hot(box_gridy0y1, - box_gridx0x1, - max_feature_height, - max_feature_width) - - # shape is [batch_size, num_boxes, output_size, height] - grid_y_weight = tf.reduce_sum( - tf.multiply(grid_y_one_hot, kernel_y), axis=-2) - # shape is [batch_size, num_boxes, output_size, width] - grid_x_weight = tf.reduce_sum( - tf.multiply(grid_x_one_hot, kernel_x), axis=-2) - - # Gather for y_axis. 
- # shape is [batch_size, num_boxes, output_size, width, features] - features_per_box = tf.einsum('bmhwf,bmoh->bmowf', features, - tf.cast(grid_y_weight, features.dtype)) - # Gather for x_axis. - # shape is [batch_size, num_boxes, output_size, output_size, features] - features_per_box = tf.einsum('bmhwf,bmow->bmhof', features_per_box, - tf.cast(grid_x_weight, features.dtype)) - else: - height_dim_offset = max_feature_width - level_dim_offset = max_feature_height * height_dim_offset - batch_dim_offset = num_levels * level_dim_offset - - batch_size_offset = tf.tile( - tf.reshape( - tf.range(batch_size) * batch_dim_offset, [batch_size, 1, 1, 1]), - [1, num_boxes, output_size * 2, output_size * 2]) - box_levels_offset = tf.tile( - tf.reshape(box_levels * level_dim_offset, - [batch_size, num_boxes, 1, 1]), - [1, 1, output_size * 2, output_size * 2]) - y_indices_offset = tf.tile( - tf.reshape(y_indices * height_dim_offset, - [batch_size, num_boxes, output_size * 2, 1]), - [1, 1, 1, output_size * 2]) - x_indices_offset = tf.tile( - tf.reshape(x_indices, [batch_size, num_boxes, 1, output_size * 2]), - [1, 1, output_size * 2, 1]) - - indices = tf.reshape( - batch_size_offset + box_levels_offset + y_indices_offset + - x_indices_offset, [-1]) - - features = tf.reshape(features, [-1, num_filters]) - # TODO(wangtao): replace tf.gather with tf.gather_nd and try to get similar - # performance. - features_per_box = tf.reshape( - tf.gather(features, indices), - [batch_size, num_boxes, output_size * 2, output_size * 2, num_filters]) - features_per_box = feature_bilinear_interpolation(features_per_box, - kernel_y, kernel_x) - - return features_per_box - - -def multilevel_crop_and_resize(features, boxes, output_size=7): - """Crop and resize on multilevel feature pyramid. 
- - Generate the (output_size, output_size) set of pixels for each input box - by first locating the box into the correct feature level, and then cropping - and resizing it using the correspoding feature map of that level. - - Args: - features: A dictionary with key as pyramid level and value as features. The - features are in shape of [batch_size, height_l, width_l, num_filters]. - boxes: A 3-D Tensor of shape [batch_size, num_boxes, 4]. Each row represents - a box with [y1, x1, y2, x2] in un-normalized coordinates. - output_size: A scalar to indicate the output crop size. - - Returns: - A 5-D tensor representing feature crop of shape - [batch_size, num_boxes, output_size, output_size, num_filters]. - """ - - with tf.name_scope('multilevel_crop_and_resize'): - levels = list(features.keys()) - min_level = min(levels) - max_level = max(levels) - batch_size, max_feature_height, max_feature_width, num_filters = ( - features[min_level].get_shape().as_list()) - _, num_boxes, _ = boxes.get_shape().as_list() - - # Stack feature pyramid into a features_all of shape - # [batch_size, levels, height, width, num_filters]. - features_all = [] - feature_heights = [] - feature_widths = [] - for level in range(min_level, max_level + 1): - shape = features[level].get_shape().as_list() - feature_heights.append(shape[1]) - feature_widths.append(shape[2]) - # Concat tensor of [batch_size, height_l * width_l, num_filters] for each - # levels. - features_all.append( - tf.reshape(features[level], [batch_size, -1, num_filters])) - features_r2 = tf.reshape(tf.concat(features_all, 1), [-1, num_filters]) - - # Calculate height_l * width_l for each level. - level_dim_sizes = [ - feature_widths[i] * feature_heights[i] - for i in range(len(feature_widths)) - ] - # level_dim_offsets is accumulated sum of level_dim_size. 
- level_dim_offsets = [0] - for i in range(len(feature_widths) - 1): - level_dim_offsets.append(level_dim_offsets[i] + level_dim_sizes[i]) - batch_dim_size = level_dim_offsets[-1] + level_dim_sizes[-1] - level_dim_offsets = tf.constant(level_dim_offsets, tf.int32) - height_dim_sizes = tf.constant(feature_widths, tf.int32) - - # Assigns boxes to the right level. - box_width = boxes[:, :, 3] - boxes[:, :, 1] - box_height = boxes[:, :, 2] - boxes[:, :, 0] - areas_sqrt = tf.sqrt(box_height * box_width) - levels = tf.cast( - tf.math.floordiv( - tf.math.log(tf.divide(areas_sqrt, 224.0)), tf.math.log(2.0)) + 4.0, - dtype=tf.int32) - # Maps levels between [min_level, max_level]. - levels = tf.minimum(max_level, tf.maximum(levels, min_level)) - - # Projects box location and sizes to corresponding feature levels. - scale_to_level = tf.cast( - tf.pow(tf.constant(2.0), tf.cast(levels, tf.float32)), - dtype=boxes.dtype) - boxes /= tf.expand_dims(scale_to_level, axis=2) - box_width /= scale_to_level - box_height /= scale_to_level - boxes = tf.concat([ - boxes[:, :, 0:2], - tf.expand_dims(box_height, -1), - tf.expand_dims(box_width, -1) - ], - axis=-1) - - # Maps levels to [0, max_level-min_level]. - levels -= min_level - level_strides = tf.pow([[2.0]], tf.cast(levels, tf.float32)) - boundary = tf.cast( - tf.concat([ - tf.expand_dims( - [[tf.cast(max_feature_height, tf.float32)]] / level_strides - 1, - axis=-1), - tf.expand_dims( - [[tf.cast(max_feature_width, tf.float32)]] / level_strides - 1, - axis=-1), - ], - axis=-1), boxes.dtype) - - # Compute grid positions. 
- kernel_y, kernel_x, box_gridy0y1, box_gridx0x1 = compute_grid_positions( - boxes, boundary, output_size, sample_offset=0.5) - - x_indices = tf.cast( - tf.reshape(box_gridx0x1, [batch_size, num_boxes, output_size * 2]), - dtype=tf.int32) - y_indices = tf.cast( - tf.reshape(box_gridy0y1, [batch_size, num_boxes, output_size * 2]), - dtype=tf.int32) - - batch_size_offset = tf.tile( - tf.reshape( - tf.range(batch_size) * batch_dim_size, [batch_size, 1, 1, 1]), - [1, num_boxes, output_size * 2, output_size * 2]) - # Get level offset for each box. Each box belongs to one level. - levels_offset = tf.tile( - tf.reshape( - tf.gather(level_dim_offsets, levels), - [batch_size, num_boxes, 1, 1]), - [1, 1, output_size * 2, output_size * 2]) - y_indices_offset = tf.tile( - tf.reshape( - y_indices * tf.expand_dims(tf.gather(height_dim_sizes, levels), -1), - [batch_size, num_boxes, output_size * 2, 1]), - [1, 1, 1, output_size * 2]) - x_indices_offset = tf.tile( - tf.reshape(x_indices, [batch_size, num_boxes, 1, output_size * 2]), - [1, 1, output_size * 2, 1]) - indices = tf.reshape( - batch_size_offset + levels_offset + y_indices_offset + x_indices_offset, - [-1]) - - # TODO(wangtao): replace tf.gather with tf.gather_nd and try to get similar - # performance. - features_per_box = tf.reshape( - tf.gather(features_r2, indices), - [batch_size, num_boxes, output_size * 2, output_size * 2, num_filters]) - - # Bilinear interpolation. - features_per_box = feature_bilinear_interpolation(features_per_box, - kernel_y, kernel_x) - return features_per_box - - -def single_level_feature_crop(features, level_boxes, detection_prior_levels, - min_mask_level, mask_crop_size): - """Crop the FPN features at the appropriate levels for each detection. - - - Args: - features: a float tensor of shape [batch_size, num_levels, max_feature_size, - max_feature_size, num_downsample_channels]. - level_boxes: a float Tensor of the level boxes to crop from. [batch_size, - num_instances, 4]. 
- detection_prior_levels: an int Tensor of instance assigned level of shape - [batch_size, num_instances]. - min_mask_level: minimum FPN level to crop mask feature from. - mask_crop_size: an int of mask crop size. - - Returns: - crop_features: a float Tensor of shape [batch_size * num_instances, - mask_crop_size, mask_crop_size, num_downsample_channels]. This is the - instance feature crop. - """ - (batch_size, num_levels, max_feature_size, _, - num_downsample_channels) = features.get_shape().as_list() - _, num_of_instances, _ = level_boxes.get_shape().as_list() - level_boxes = tf.cast(level_boxes, tf.int32) - assert num_of_instances == detection_prior_levels.get_shape().as_list()[1] - - x_start_indices = level_boxes[:, :, 1] - y_start_indices = level_boxes[:, :, 0] - # generate the full indices (not just the starting index) - x_idx_list = [] - y_idx_list = [] - for i in range(mask_crop_size): - x_idx_list.append(x_start_indices + i) - y_idx_list.append(y_start_indices + i) - - x_indices = tf.stack(x_idx_list, axis=2) - y_indices = tf.stack(y_idx_list, axis=2) - levels = detection_prior_levels - min_mask_level - height_dim_size = max_feature_size - level_dim_size = max_feature_size * height_dim_size - batch_dim_size = num_levels * level_dim_size - # TODO(weicheng) change this to gather_nd for better readability. 
- indices = tf.reshape( - tf.tile( - tf.reshape( - tf.range(batch_size) * batch_dim_size, [batch_size, 1, 1, 1]), - [1, num_of_instances, mask_crop_size, mask_crop_size]) + tf.tile( - tf.reshape(levels * level_dim_size, - [batch_size, num_of_instances, 1, 1]), - [1, 1, mask_crop_size, mask_crop_size]) + tf.tile( - tf.reshape(y_indices * height_dim_size, - [batch_size, num_of_instances, mask_crop_size, 1]), - [1, 1, 1, mask_crop_size]) + - tf.tile( - tf.reshape(x_indices, - [batch_size, num_of_instances, 1, mask_crop_size]), - [1, 1, mask_crop_size, 1]), [-1]) - - features_r2 = tf.reshape(features, [-1, num_downsample_channels]) - crop_features = tf.reshape( - tf.gather(features_r2, indices), [ - batch_size * num_of_instances, mask_crop_size, mask_crop_size, - num_downsample_channels - ]) - - return crop_features - - -def crop_mask_in_target_box(masks, - boxes, - target_boxes, - output_size, - sample_offset=0, - use_einsum=True): - """Crop masks in target boxes. - - Args: - masks: A tensor with a shape of [batch_size, num_masks, height, width]. - boxes: a float tensor representing box cooridnates that tightly enclose - masks with a shape of [batch_size, num_masks, 4] in un-normalized - coordinates. A box is represented by [ymin, xmin, ymax, xmax]. - target_boxes: a float tensor representing target box cooridnates for masks - with a shape of [batch_size, num_masks, 4] in un-normalized coordinates. A - box is represented by [ymin, xmin, ymax, xmax]. - output_size: A scalar to indicate the output crop size. It currently only - supports to output a square shape outputs. - sample_offset: a float number in [0, 1] indicates the subpixel sample offset - from grid point. - use_einsum: Use einsum to replace gather in selective_crop_and_resize. - - Returns: - A 4-D tensor representing feature crop of shape - [batch_size, num_boxes, output_size, output_size]. 
- """ - with tf.name_scope('crop_mask_in_target_box'): - batch_size, num_masks, height, width = masks.get_shape().as_list() - masks = tf.reshape(masks, [batch_size * num_masks, height, width, 1]) - # Pad zeros on the boundary of masks. - masks = tf.image.pad_to_bounding_box(masks, 2, 2, height + 4, width + 4) - masks = tf.reshape(masks, [batch_size, num_masks, height + 4, width + 4, 1]) - - # Projects target box locations and sizes to corresponding cropped - # mask coordinates. - gt_y_min, gt_x_min, gt_y_max, gt_x_max = tf.split( - value=boxes, num_or_size_splits=4, axis=2) - bb_y_min, bb_x_min, bb_y_max, bb_x_max = tf.split( - value=target_boxes, num_or_size_splits=4, axis=2) - y_transform = (bb_y_min - gt_y_min) * height / (gt_y_max - gt_y_min + - _EPSILON) + 2 - x_transform = (bb_x_min - gt_x_min) * height / (gt_x_max - gt_x_min + - _EPSILON) + 2 - h_transform = (bb_y_max - bb_y_min) * width / ( - gt_y_max - gt_y_min + _EPSILON) - w_transform = (bb_x_max - bb_x_min) * width / ( - gt_x_max - gt_x_min + _EPSILON) - - boundaries = tf.concat([ - tf.cast( - tf.ones_like(y_transform) * ((height + 4) - 1), dtype=tf.float32), - tf.cast( - tf.ones_like(x_transform) * ((width + 4) - 1), dtype=tf.float32) - ], - axis=-1) - - # Reshape tensors to have the right shape for selective_crop_and_resize. 
- trasnformed_boxes = tf.concat( - [y_transform, x_transform, h_transform, w_transform], -1) - levels = tf.tile( - tf.reshape(tf.range(num_masks), [1, num_masks]), [batch_size, 1]) - - cropped_masks = selective_crop_and_resize( - masks, - trasnformed_boxes, - levels, - boundaries, - output_size, - sample_offset=sample_offset, - use_einsum_gather=use_einsum) - cropped_masks = tf.squeeze(cropped_masks, axis=-1) - - return cropped_masks diff --git a/official/vision/detection/ops/target_ops.py b/official/vision/detection/ops/target_ops.py deleted file mode 100644 index 8dca9f9cb4fd4807435135af35e56c5fe3148bc7..0000000000000000000000000000000000000000 --- a/official/vision/detection/ops/target_ops.py +++ /dev/null @@ -1,571 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Target and sampling related ops.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - -from official.vision.detection.ops import spatial_transform_ops -from official.vision.detection.utils import box_utils -from official.vision.utils.object_detection import balanced_positive_negative_sampler - - -def box_matching(boxes, gt_boxes, gt_classes): - """Match boxes to groundtruth boxes. 
- - Given the proposal boxes and the groundtruth boxes and classes, perform the - groundtruth matching by taking the argmax of the IoU between boxes and - groundtruth boxes. - - Args: - boxes: a tensor of shape of [batch_size, N, 4] representing the box - coordiantes to be matched to groundtruth boxes. - gt_boxes: a tensor of shape of [batch_size, MAX_INSTANCES, 4] representing - the groundtruth box coordinates. It is padded with -1s to indicate the - invalid boxes. - gt_classes: [batch_size, MAX_INSTANCES] representing the groundtruth box - classes. It is padded with -1s to indicate the invalid classes. - - Returns: - matched_gt_boxes: a tensor of shape of [batch_size, N, 4], representing - the matched groundtruth box coordinates for each input box. If the box - does not overlap with any groundtruth boxes, the matched boxes of it - will be set to all 0s. - matched_gt_classes: a tensor of shape of [batch_size, N], representing - the matched groundtruth classes for each input box. If the box does not - overlap with any groundtruth boxes, the matched box classes of it will - be set to 0, which corresponds to the background class. - matched_gt_indices: a tensor of shape of [batch_size, N], representing - the indices of the matched groundtruth boxes in the original gt_boxes - tensor. If the box does not overlap with any groundtruth boxes, the - index of the matched groundtruth will be set to -1. - matched_iou: a tensor of shape of [batch_size, N], representing the IoU - between the box and its matched groundtruth box. The matched IoU is the - maximum IoU of the box and all the groundtruth boxes. - iou: a tensor of shape of [batch_size, N, K], representing the IoU matrix - between boxes and the groundtruth boxes. The IoU between a box and the - invalid groundtruth boxes whose coordinates are [-1, -1, -1, -1] is -1. - """ - # Compute IoU between boxes and gt_boxes. 
- # iou <- [batch_size, N, K] - iou = box_utils.bbox_overlap(boxes, gt_boxes) - - # max_iou <- [batch_size, N] - # 0.0 -> no match to gt, or -1.0 match to no gt - matched_iou = tf.reduce_max(iou, axis=-1) - - # background_box_mask <- bool, [batch_size, N] - background_box_mask = tf.less_equal(matched_iou, 0.0) - - argmax_iou_indices = tf.argmax(iou, axis=-1, output_type=tf.int32) - - argmax_iou_indices_shape = tf.shape(argmax_iou_indices) - batch_indices = ( - tf.expand_dims(tf.range(argmax_iou_indices_shape[0]), axis=-1) * - tf.ones([1, argmax_iou_indices_shape[-1]], dtype=tf.int32)) - gather_nd_indices = tf.stack([batch_indices, argmax_iou_indices], axis=-1) - - matched_gt_boxes = tf.gather_nd(gt_boxes, gather_nd_indices) - matched_gt_boxes = tf.where( - tf.tile(tf.expand_dims(background_box_mask, axis=-1), [1, 1, 4]), - tf.zeros_like(matched_gt_boxes, dtype=matched_gt_boxes.dtype), - matched_gt_boxes) - - matched_gt_classes = tf.gather_nd(gt_classes, gather_nd_indices) - matched_gt_classes = tf.where(background_box_mask, - tf.zeros_like(matched_gt_classes), - matched_gt_classes) - - matched_gt_indices = tf.where(background_box_mask, - -tf.ones_like(argmax_iou_indices), - argmax_iou_indices) - - return (matched_gt_boxes, matched_gt_classes, matched_gt_indices, matched_iou, - iou) - - -def assign_and_sample_proposals(proposed_boxes, - gt_boxes, - gt_classes, - num_samples_per_image=512, - mix_gt_boxes=True, - fg_fraction=0.25, - fg_iou_thresh=0.5, - bg_iou_thresh_hi=0.5, - bg_iou_thresh_lo=0.0): - """Assigns the proposals with groundtruth classes and performs subsmpling. - - Given `proposed_boxes`, `gt_boxes`, and `gt_classes`, the function uses the - following algorithm to generate the final `num_samples_per_image` RoIs. - 1. Calculates the IoU between each proposal box and each gt_boxes. - 2. Assigns each proposed box with a groundtruth class and box by choosing - the largest IoU overlap. - 3. 
Samples `num_samples_per_image` boxes from all proposed boxes, and - returns box_targets, class_targets, and RoIs. - - Args: - proposed_boxes: a tensor of shape of [batch_size, N, 4]. N is the number of - proposals before groundtruth assignment. The last dimension is the box - coordinates w.r.t. the scaled images in [ymin, xmin, ymax, xmax] format. - gt_boxes: a tensor of shape of [batch_size, MAX_NUM_INSTANCES, 4]. The - coordinates of gt_boxes are in the pixel coordinates of the scaled image. - This tensor might have padding of values -1 indicating the invalid box - coordinates. - gt_classes: a tensor with a shape of [batch_size, MAX_NUM_INSTANCES]. This - tensor might have paddings with values of -1 indicating the invalid - classes. - num_samples_per_image: a integer represents RoI minibatch size per image. - mix_gt_boxes: a bool indicating whether to mix the groundtruth boxes before - sampling proposals. - fg_fraction: a float represents the target fraction of RoI minibatch that is - labeled foreground (i.e., class > 0). - fg_iou_thresh: a float represents the IoU overlap threshold for an RoI to be - considered foreground (if >= fg_iou_thresh). - bg_iou_thresh_hi: a float represents the IoU overlap threshold for an RoI to - be considered background (class = 0 if overlap in [LO, HI)). - bg_iou_thresh_lo: a float represents the IoU overlap threshold for an RoI to - be considered background (class = 0 if overlap in [LO, HI)). - - Returns: - sampled_rois: a tensor of shape of [batch_size, K, 4], representing the - coordinates of the sampled RoIs, where K is the number of the sampled - RoIs, i.e. K = num_samples_per_image. - sampled_gt_boxes: a tensor of shape of [batch_size, K, 4], storing the - box coordinates of the matched groundtruth boxes of the samples RoIs. - sampled_gt_classes: a tensor of shape of [batch_size, K], storing the - classes of the matched groundtruth boxes of the sampled RoIs. 
- sampled_gt_indices: a tensor of shape of [batch_size, K], storing the - indices of the sampled groudntruth boxes in the original `gt_boxes` - tensor, i.e. gt_boxes[sampled_gt_indices[:, i]] = sampled_gt_boxes[:, i]. - """ - - with tf.name_scope('sample_proposals'): - if mix_gt_boxes: - boxes = tf.concat([proposed_boxes, gt_boxes], axis=1) - else: - boxes = proposed_boxes - - (matched_gt_boxes, matched_gt_classes, matched_gt_indices, matched_iou, - _) = box_matching(boxes, gt_boxes, gt_classes) - - positive_match = tf.greater(matched_iou, fg_iou_thresh) - negative_match = tf.logical_and( - tf.greater_equal(matched_iou, bg_iou_thresh_lo), - tf.less(matched_iou, bg_iou_thresh_hi)) - ignored_match = tf.less(matched_iou, 0.0) - - # re-assign negatively matched boxes to the background class. - matched_gt_classes = tf.where(negative_match, - tf.zeros_like(matched_gt_classes), - matched_gt_classes) - matched_gt_indices = tf.where(negative_match, - tf.zeros_like(matched_gt_indices), - matched_gt_indices) - - sample_candidates = tf.logical_and( - tf.logical_or(positive_match, negative_match), - tf.logical_not(ignored_match)) - - sampler = ( - balanced_positive_negative_sampler.BalancedPositiveNegativeSampler( - positive_fraction=fg_fraction, is_static=True)) - - batch_size, _ = sample_candidates.get_shape().as_list() - sampled_indicators = [] - for i in range(batch_size): - sampled_indicator = sampler.subsample(sample_candidates[i], - num_samples_per_image, - positive_match[i]) - sampled_indicators.append(sampled_indicator) - sampled_indicators = tf.stack(sampled_indicators) - _, sampled_indices = tf.nn.top_k( - tf.cast(sampled_indicators, dtype=tf.int32), - k=num_samples_per_image, - sorted=True) - - sampled_indices_shape = tf.shape(sampled_indices) - batch_indices = ( - tf.expand_dims(tf.range(sampled_indices_shape[0]), axis=-1) * - tf.ones([1, sampled_indices_shape[-1]], dtype=tf.int32)) - gather_nd_indices = tf.stack([batch_indices, sampled_indices], axis=-1) - - 
sampled_rois = tf.gather_nd(boxes, gather_nd_indices) - sampled_gt_boxes = tf.gather_nd(matched_gt_boxes, gather_nd_indices) - sampled_gt_classes = tf.gather_nd(matched_gt_classes, gather_nd_indices) - sampled_gt_indices = tf.gather_nd(matched_gt_indices, gather_nd_indices) - - return (sampled_rois, sampled_gt_boxes, sampled_gt_classes, - sampled_gt_indices) - - -def sample_and_crop_foreground_masks(candidate_rois, - candidate_gt_boxes, - candidate_gt_classes, - candidate_gt_indices, - gt_masks, - num_mask_samples_per_image=128, - mask_target_size=28): - """Samples and creates cropped foreground masks for training. - - Args: - candidate_rois: a tensor of shape of [batch_size, N, 4], where N is the - number of candidate RoIs to be considered for mask sampling. It includes - both positive and negative RoIs. The `num_mask_samples_per_image` positive - RoIs will be sampled to create mask training targets. - candidate_gt_boxes: a tensor of shape of [batch_size, N, 4], storing the - corresponding groundtruth boxes to the `candidate_rois`. - candidate_gt_classes: a tensor of shape of [batch_size, N], storing the - corresponding groundtruth classes to the `candidate_rois`. 0 in the tensor - corresponds to the background class, i.e. negative RoIs. - candidate_gt_indices: a tensor of shape [batch_size, N], storing the - corresponding groundtruth instance indices to the `candidate_gt_boxes`, - i.e. gt_boxes[candidate_gt_indices[:, i]] = candidate_gt_boxes[:, i] and - gt_boxes which is of shape [batch_size, MAX_INSTANCES, 4], M >= N, is - the superset of candidate_gt_boxes. - gt_masks: a tensor of [batch_size, MAX_INSTANCES, mask_height, mask_width] - containing all the groundtruth masks which sample masks are drawn from. - num_mask_samples_per_image: an integer which specifies the number of masks - to sample. - mask_target_size: an integer which specifies the final cropped mask size - after sampling. The output masks are resized w.r.t the sampled RoIs. 
- - Returns: - foreground_rois: a tensor of shape of [batch_size, K, 4] storing the RoI - that corresponds to the sampled foreground masks, where - K = num_mask_samples_per_image. - foreground_classes: a tensor of shape of [batch_size, K] storing the classes - corresponding to the sampled foreground masks. - cropoped_foreground_masks: a tensor of shape of - [batch_size, K, mask_target_size, mask_target_size] storing the cropped - foreground masks used for training. - """ - with tf.name_scope('sample_and_crop_foreground_masks'): - _, fg_instance_indices = tf.nn.top_k( - tf.cast(tf.greater(candidate_gt_classes, 0), dtype=tf.int32), - k=num_mask_samples_per_image) - - fg_instance_indices_shape = tf.shape(fg_instance_indices) - batch_indices = ( - tf.expand_dims(tf.range(fg_instance_indices_shape[0]), axis=-1) * - tf.ones([1, fg_instance_indices_shape[-1]], dtype=tf.int32)) - - gather_nd_instance_indices = tf.stack([batch_indices, fg_instance_indices], - axis=-1) - foreground_rois = tf.gather_nd(candidate_rois, gather_nd_instance_indices) - foreground_boxes = tf.gather_nd(candidate_gt_boxes, - gather_nd_instance_indices) - foreground_classes = tf.gather_nd(candidate_gt_classes, - gather_nd_instance_indices) - foreground_gt_indices = tf.gather_nd(candidate_gt_indices, - gather_nd_instance_indices) - - foreground_gt_indices_shape = tf.shape(foreground_gt_indices) - batch_indices = ( - tf.expand_dims(tf.range(foreground_gt_indices_shape[0]), axis=-1) * - tf.ones([1, foreground_gt_indices_shape[-1]], dtype=tf.int32)) - gather_nd_gt_indices = tf.stack([batch_indices, foreground_gt_indices], - axis=-1) - foreground_masks = tf.gather_nd(gt_masks, gather_nd_gt_indices) - - cropped_foreground_masks = spatial_transform_ops.crop_mask_in_target_box( - foreground_masks, - foreground_boxes, - foreground_rois, - mask_target_size, - sample_offset=0.5) - - return foreground_rois, foreground_classes, cropped_foreground_masks - - -class ROISampler(tf.keras.layers.Layer): - """Samples 
RoIs and creates training targets.""" - - def __init__(self, params): - self._num_samples_per_image = params.num_samples_per_image - self._fg_fraction = params.fg_fraction - self._fg_iou_thresh = params.fg_iou_thresh - self._bg_iou_thresh_hi = params.bg_iou_thresh_hi - self._bg_iou_thresh_lo = params.bg_iou_thresh_lo - self._mix_gt_boxes = params.mix_gt_boxes - super(ROISampler, self).__init__(autocast=False) - - def call(self, rois, gt_boxes, gt_classes): - """Sample and assign RoIs for training. - - Args: - rois: a tensor of shape of [batch_size, N, 4]. N is the number of - proposals before groundtruth assignment. The last dimension is the box - coordinates w.r.t. the scaled images in [ymin, xmin, ymax, xmax] format. - gt_boxes: a tensor of shape of [batch_size, MAX_NUM_INSTANCES, 4]. The - coordinates of gt_boxes are in the pixel coordinates of the scaled - image. This tensor might have padding of values -1 indicating the - invalid box coordinates. - gt_classes: a tensor with a shape of [batch_size, MAX_NUM_INSTANCES]. This - tensor might have paddings with values of -1 indicating the invalid - classes. - - Returns: - sampled_rois: a tensor of shape of [batch_size, K, 4], representing the - coordinates of the sampled RoIs, where K is the number of the sampled - RoIs, i.e. K = num_samples_per_image. - sampled_gt_boxes: a tensor of shape of [batch_size, K, 4], storing the - box coordinates of the matched groundtruth boxes of the samples RoIs. - sampled_gt_classes: a tensor of shape of [batch_size, K], storing the - classes of the matched groundtruth boxes of the sampled RoIs. 
- """ - sampled_rois, sampled_gt_boxes, sampled_gt_classes, sampled_gt_indices = ( - assign_and_sample_proposals( - rois, - gt_boxes, - gt_classes, - num_samples_per_image=self._num_samples_per_image, - mix_gt_boxes=self._mix_gt_boxes, - fg_fraction=self._fg_fraction, - fg_iou_thresh=self._fg_iou_thresh, - bg_iou_thresh_hi=self._bg_iou_thresh_hi, - bg_iou_thresh_lo=self._bg_iou_thresh_lo)) - return (sampled_rois, sampled_gt_boxes, sampled_gt_classes, - sampled_gt_indices) - - -class ROIScoreSampler(ROISampler): - """Samples RoIs, RoI-scores and creates training targets.""" - - def __call__(self, rois, roi_scores, gt_boxes, gt_classes): - """Sample and assign RoIs for training. - - Args: - rois: a tensor of shape of [batch_size, N, 4]. N is the number of - proposals before groundtruth assignment. The last dimension is the box - coordinates w.r.t. the scaled images in [ymin, xmin, ymax, xmax] format. - roi_scores: - gt_boxes: a tensor of shape of [batch_size, MAX_NUM_INSTANCES, 4]. The - coordinates of gt_boxes are in the pixel coordinates of the scaled - image. This tensor might have padding of values -1 indicating the - invalid box coordinates. - gt_classes: a tensor with a shape of [batch_size, MAX_NUM_INSTANCES]. This - tensor might have paddings with values of -1 indicating the invalid - classes. - - Returns: - sampled_rois: a tensor of shape of [batch_size, K, 4], representing the - coordinates of the sampled RoIs, where K is the number of the sampled - RoIs, i.e. K = num_samples_per_image. - sampled_roi_scores: - sampled_gt_boxes: a tensor of shape of [batch_size, K, 4], storing the - box coordinates of the matched groundtruth boxes of the samples RoIs. - sampled_gt_classes: a tensor of shape of [batch_size, K], storing the - classes of the matched groundtruth boxes of the sampled RoIs. 
- """ - (sampled_rois, sampled_roi_scores, sampled_gt_boxes, sampled_gt_classes, - sampled_gt_indices) = ( - self.assign_and_sample_proposals_and_scores( - rois, - roi_scores, - gt_boxes, - gt_classes, - num_samples_per_image=self._num_samples_per_image, - mix_gt_boxes=self._mix_gt_boxes, - fg_fraction=self._fg_fraction, - fg_iou_thresh=self._fg_iou_thresh, - bg_iou_thresh_hi=self._bg_iou_thresh_hi, - bg_iou_thresh_lo=self._bg_iou_thresh_lo)) - return (sampled_rois, sampled_roi_scores, sampled_gt_boxes, - sampled_gt_classes, sampled_gt_indices) - - def assign_and_sample_proposals_and_scores(self, - proposed_boxes, - proposed_scores, - gt_boxes, - gt_classes, - num_samples_per_image=512, - mix_gt_boxes=True, - fg_fraction=0.25, - fg_iou_thresh=0.5, - bg_iou_thresh_hi=0.5, - bg_iou_thresh_lo=0.0): - """Assigns the proposals with groundtruth classes and performs subsmpling. - - Given `proposed_boxes`, `gt_boxes`, and `gt_classes`, the function uses the - following algorithm to generate the final `num_samples_per_image` RoIs. - 1. Calculates the IoU between each proposal box and each gt_boxes. - 2. Assigns each proposed box with a groundtruth class and box by choosing - the largest IoU overlap. - 3. Samples `num_samples_per_image` boxes from all proposed boxes, and - returns box_targets, class_targets, and RoIs. - - Args: - proposed_boxes: a tensor of shape of [batch_size, N, 4]. N is the number - of proposals before groundtruth assignment. The last dimension is the - box coordinates w.r.t. the scaled images in [ymin, xmin, ymax, xmax] - format. - proposed_scores: a tensor of shape of [batch_size, N]. N is the number of - proposals before groundtruth assignment. It is the rpn scores for all - proposed boxes which can be either their classification or centerness - scores. - gt_boxes: a tensor of shape of [batch_size, MAX_NUM_INSTANCES, 4]. The - coordinates of gt_boxes are in the pixel coordinates of the scaled - image. 
This tensor might have padding of values -1 indicating the - invalid box coordinates. - gt_classes: a tensor with a shape of [batch_size, MAX_NUM_INSTANCES]. This - tensor might have paddings with values of -1 indicating the invalid - classes. - num_samples_per_image: a integer represents RoI minibatch size per image. - mix_gt_boxes: a bool indicating whether to mix the groundtruth boxes - before sampling proposals. - fg_fraction: a float represents the target fraction of RoI minibatch that - is labeled foreground (i.e., class > 0). - fg_iou_thresh: a float represents the IoU overlap threshold for an RoI to - be considered foreground (if >= fg_iou_thresh). - bg_iou_thresh_hi: a float represents the IoU overlap threshold for an RoI - to be considered background (class = 0 if overlap in [LO, HI)). - bg_iou_thresh_lo: a float represents the IoU overlap threshold for an RoI - to be considered background (class = 0 if overlap in [LO, HI)). - - Returns: - sampled_rois: a tensor of shape of [batch_size, K, 4], representing the - coordinates of the sampled RoIs, where K is the number of the sampled - RoIs, i.e. K = num_samples_per_image. - sampled_scores: a tensor of shape of [batch_size, K], representing the - confidence score of the sampled RoIs, where K is the number of the - sampled RoIs, i.e. K = num_samples_per_image. - sampled_gt_boxes: a tensor of shape of [batch_size, K, 4], storing the - box coordinates of the matched groundtruth boxes of the samples RoIs. - sampled_gt_classes: a tensor of shape of [batch_size, K], storing the - classes of the matched groundtruth boxes of the sampled RoIs. - sampled_gt_indices: a tensor of shape of [batch_size, K], storing the - indices of the sampled groudntruth boxes in the original `gt_boxes` - tensor, i.e. gt_boxes[sampled_gt_indices[:, i]] = - sampled_gt_boxes[:, i]. 
- """ - - with tf.name_scope('sample_proposals_and_scores'): - if mix_gt_boxes: - boxes = tf.concat([proposed_boxes, gt_boxes], axis=1) - gt_scores = tf.ones_like(gt_boxes[:, :, 0]) - scores = tf.concat([proposed_scores, gt_scores], axis=1) - else: - boxes = proposed_boxes - scores = proposed_scores - - (matched_gt_boxes, matched_gt_classes, matched_gt_indices, matched_iou, - _) = box_matching(boxes, gt_boxes, gt_classes) - - positive_match = tf.greater(matched_iou, fg_iou_thresh) - negative_match = tf.logical_and( - tf.greater_equal(matched_iou, bg_iou_thresh_lo), - tf.less(matched_iou, bg_iou_thresh_hi)) - ignored_match = tf.less(matched_iou, 0.0) - - # re-assign negatively matched boxes to the background class. - matched_gt_classes = tf.where(negative_match, - tf.zeros_like(matched_gt_classes), - matched_gt_classes) - matched_gt_indices = tf.where(negative_match, - tf.zeros_like(matched_gt_indices), - matched_gt_indices) - - sample_candidates = tf.logical_and( - tf.logical_or(positive_match, negative_match), - tf.logical_not(ignored_match)) - - sampler = ( - balanced_positive_negative_sampler.BalancedPositiveNegativeSampler( - positive_fraction=fg_fraction, is_static=True)) - - batch_size, _ = sample_candidates.get_shape().as_list() - sampled_indicators = [] - for i in range(batch_size): - sampled_indicator = sampler.subsample(sample_candidates[i], - num_samples_per_image, - positive_match[i]) - sampled_indicators.append(sampled_indicator) - sampled_indicators = tf.stack(sampled_indicators) - _, sampled_indices = tf.nn.top_k( - tf.cast(sampled_indicators, dtype=tf.int32), - k=num_samples_per_image, - sorted=True) - - sampled_indices_shape = tf.shape(sampled_indices) - batch_indices = ( - tf.expand_dims(tf.range(sampled_indices_shape[0]), axis=-1) * - tf.ones([1, sampled_indices_shape[-1]], dtype=tf.int32)) - gather_nd_indices = tf.stack([batch_indices, sampled_indices], axis=-1) - - sampled_rois = tf.gather_nd(boxes, gather_nd_indices) - sampled_roi_scores = 
tf.gather_nd(scores, gather_nd_indices) - sampled_gt_boxes = tf.gather_nd(matched_gt_boxes, gather_nd_indices) - sampled_gt_classes = tf.gather_nd(matched_gt_classes, gather_nd_indices) - sampled_gt_indices = tf.gather_nd(matched_gt_indices, gather_nd_indices) - - return (sampled_rois, sampled_roi_scores, sampled_gt_boxes, - sampled_gt_classes, sampled_gt_indices) - - -class MaskSampler(tf.keras.layers.Layer): - """Samples and creates mask training targets.""" - - def __init__(self, mask_target_size, num_mask_samples_per_image): - self._mask_target_size = mask_target_size - self._num_mask_samples_per_image = num_mask_samples_per_image - super(MaskSampler, self).__init__(autocast=False) - - def call(self, - candidate_rois, - candidate_gt_boxes, - candidate_gt_classes, - candidate_gt_indices, - gt_masks): - """Sample and create mask targets for training. - - Args: - candidate_rois: a tensor of shape of [batch_size, N, 4], where N is the - number of candidate RoIs to be considered for mask sampling. It includes - both positive and negative RoIs. The `num_mask_samples_per_image` - positive RoIs will be sampled to create mask training targets. - candidate_gt_boxes: a tensor of shape of [batch_size, N, 4], storing the - corresponding groundtruth boxes to the `candidate_rois`. - candidate_gt_classes: a tensor of shape of [batch_size, N], storing the - corresponding groundtruth classes to the `candidate_rois`. 0 in the - tensor corresponds to the background class, i.e. negative RoIs. - candidate_gt_indices: a tensor of shape [batch_size, N], storing the - corresponding groundtruth instance indices to the `candidate_gt_boxes`, - i.e. gt_boxes[candidate_gt_indices[:, i]] = candidate_gt_boxes[:, i], - where gt_boxes which is of shape [batch_size, MAX_INSTANCES, 4], M >= - N, is the superset of candidate_gt_boxes. - gt_masks: a tensor of [batch_size, MAX_INSTANCES, mask_height, mask_width] - containing all the groundtruth masks which sample masks are drawn from. 
- after sampling. The output masks are resized w.r.t the sampled RoIs. - - Returns: - foreground_rois: a tensor of shape of [batch_size, K, 4] storing the RoI - that corresponds to the sampled foreground masks, where - K = num_mask_samples_per_image. - foreground_classes: a tensor of shape of [batch_size, K] storing the - classes corresponding to the sampled foreground masks. - cropoped_foreground_masks: a tensor of shape of - [batch_size, K, mask_target_size, mask_target_size] storing the - cropped foreground masks used for training. - """ - foreground_rois, foreground_classes, cropped_foreground_masks = ( - sample_and_crop_foreground_masks(candidate_rois, candidate_gt_boxes, - candidate_gt_classes, - candidate_gt_indices, gt_masks, - self._num_mask_samples_per_image, - self._mask_target_size)) - return foreground_rois, foreground_classes, cropped_foreground_masks diff --git a/official/vision/detection/utils/__init__.py b/official/vision/detection/utils/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/detection/utils/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- diff --git a/official/vision/detection/utils/box_utils.py b/official/vision/detection/utils/box_utils.py deleted file mode 100644 index bc95fa8e3602d49f922fb135531e95078942b7c1..0000000000000000000000000000000000000000 --- a/official/vision/detection/utils/box_utils.py +++ /dev/null @@ -1,700 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Utility functions for bounding box processing.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import tensorflow as tf - -EPSILON = 1e-8 -BBOX_XFORM_CLIP = np.log(1000. / 16.) - - -def visualize_images_with_bounding_boxes(images, box_outputs, step, - summary_writer): - """Records subset of evaluation images with bounding boxes.""" - image_shape = tf.shape(images[0]) - image_height = tf.cast(image_shape[0], tf.float32) - image_width = tf.cast(image_shape[1], tf.float32) - normalized_boxes = normalize_boxes(box_outputs, [image_height, image_width]) - - bounding_box_color = tf.constant([[1.0, 1.0, 0.0, 1.0]]) - image_summary = tf.image.draw_bounding_boxes(images, normalized_boxes, - bounding_box_color) - with summary_writer.as_default(): - tf.summary.image('bounding_box_summary', image_summary, step=step) - summary_writer.flush() - - -def yxyx_to_xywh(boxes): - """Converts boxes from ymin, xmin, ymax, xmax to xmin, ymin, width, height. 
- - Args: - boxes: a numpy array whose last dimension is 4 representing the coordinates - of boxes in ymin, xmin, ymax, xmax order. - - Returns: - boxes: a numpy array whose shape is the same as `boxes` in new format. - - Raises: - ValueError: If the last dimension of boxes is not 4. - """ - if boxes.shape[-1] != 4: - raise ValueError('boxes.shape[-1] is {:d}, but must be 4.'.format( - boxes.shape[-1])) - - boxes_ymin = boxes[..., 0] - boxes_xmin = boxes[..., 1] - boxes_width = boxes[..., 3] - boxes[..., 1] - boxes_height = boxes[..., 2] - boxes[..., 0] - new_boxes = np.stack([boxes_xmin, boxes_ymin, boxes_width, boxes_height], - axis=-1) - - return new_boxes - - -def jitter_boxes(boxes, noise_scale=0.025): - """Jitter the box coordinates by some noise distribution. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates of - boxes in ymin, xmin, ymax, xmax order. - noise_scale: a python float which specifies the magnitude of noise. The rule - of thumb is to set this between (0, 0.1]. The default value is found to - mimic the noisy detections best empirically. - - Returns: - jittered_boxes: a tensor whose shape is the same as `boxes` representing - the jittered boxes. - - Raises: - ValueError: If the last dimension of boxes is not 4. 
- """ - if boxes.shape[-1] != 4: - raise ValueError('boxes.shape[-1] is {:d}, but must be 4.'.format( - boxes.shape[-1])) - - with tf.name_scope('jitter_boxes'): - bbox_jitters = tf.random.normal(boxes.get_shape(), stddev=noise_scale) - ymin = boxes[..., 0:1] - xmin = boxes[..., 1:2] - ymax = boxes[..., 2:3] - xmax = boxes[..., 3:4] - width = xmax - xmin - height = ymax - ymin - new_center_x = (xmin + xmax) / 2.0 + bbox_jitters[..., 0:1] * width - new_center_y = (ymin + ymax) / 2.0 + bbox_jitters[..., 1:2] * height - new_width = width * tf.math.exp(bbox_jitters[..., 2:3]) - new_height = height * tf.math.exp(bbox_jitters[..., 3:4]) - jittered_boxes = tf.concat([ - new_center_y - new_height * 0.5, new_center_x - new_width * 0.5, - new_center_y + new_height * 0.5, new_center_x + new_width * 0.5 - ], - axis=-1) - - return jittered_boxes - - -def normalize_boxes(boxes, image_shape): - """Converts boxes to the normalized coordinates. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates of - boxes in ymin, xmin, ymax, xmax order. - image_shape: a list of two integers, a two-element vector or a tensor such - that all but the last dimensions are `broadcastable` to `boxes`. The last - dimension is 2, which represents [height, width]. - - Returns: - normalized_boxes: a tensor whose shape is the same as `boxes` representing - the normalized boxes. - - Raises: - ValueError: If the last dimension of boxes is not 4. 
- """ - if boxes.shape[-1] != 4: - raise ValueError('boxes.shape[-1] is {:d}, but must be 4.'.format( - boxes.shape[-1])) - - with tf.name_scope('normalize_boxes'): - if isinstance(image_shape, list) or isinstance(image_shape, tuple): - height, width = image_shape - else: - image_shape = tf.cast(image_shape, dtype=boxes.dtype) - height = image_shape[..., 0:1] - width = image_shape[..., 1:2] - - ymin = boxes[..., 0:1] / height - xmin = boxes[..., 1:2] / width - ymax = boxes[..., 2:3] / height - xmax = boxes[..., 3:4] / width - - normalized_boxes = tf.concat([ymin, xmin, ymax, xmax], axis=-1) - return normalized_boxes - - -def denormalize_boxes(boxes, image_shape): - """Converts boxes normalized by [height, width] to pixel coordinates. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates of - boxes in ymin, xmin, ymax, xmax order. - image_shape: a list of two integers, a two-element vector or a tensor such - that all but the last dimensions are `broadcastable` to `boxes`. The last - dimension is 2, which represents [height, width]. - - Returns: - denormalized_boxes: a tensor whose shape is the same as `boxes` representing - the denormalized boxes. - - Raises: - ValueError: If the last dimension of boxes is not 4. - """ - with tf.name_scope('denormalize_boxes'): - if isinstance(image_shape, list) or isinstance(image_shape, tuple): - height, width = image_shape - else: - image_shape = tf.cast(image_shape, dtype=boxes.dtype) - height, width = tf.split(image_shape, 2, axis=-1) - - ymin, xmin, ymax, xmax = tf.split(boxes, 4, axis=-1) - ymin = ymin * height - xmin = xmin * width - ymax = ymax * height - xmax = xmax * width - - denormalized_boxes = tf.concat([ymin, xmin, ymax, xmax], axis=-1) - return denormalized_boxes - - -def clip_boxes(boxes, image_shape): - """Clips boxes to image boundaries. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates of - boxes in ymin, xmin, ymax, xmax order. 
- image_shape: a list of two integers, a two-element vector or a tensor such - that all but the last dimensions are `broadcastable` to `boxes`. The last - dimension is 2, which represents [height, width]. - - Returns: - clipped_boxes: a tensor whose shape is the same as `boxes` representing the - clipped boxes. - - Raises: - ValueError: If the last dimension of boxes is not 4. - """ - if boxes.shape[-1] != 4: - raise ValueError('boxes.shape[-1] is {:d}, but must be 4.'.format( - boxes.shape[-1])) - - with tf.name_scope('clip_boxes'): - if isinstance(image_shape, list) or isinstance(image_shape, tuple): - height, width = image_shape - max_length = [height - 1.0, width - 1.0, height - 1.0, width - 1.0] - else: - image_shape = tf.cast(image_shape, dtype=boxes.dtype) - height, width = tf.unstack(image_shape, axis=-1) - max_length = tf.stack( - [height - 1.0, width - 1.0, height - 1.0, width - 1.0], axis=-1) - - clipped_boxes = tf.math.maximum(tf.math.minimum(boxes, max_length), 0.0) - return clipped_boxes - - -def compute_outer_boxes(boxes, image_shape, scale=1.0): - """Computes outer boxes that enclose objects with a margin. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates of - boxes in ymin, xmin, ymax, xmax order. - image_shape: a list of two integers, a two-element vector or a tensor such - that all but the last dimensions are `broadcastable` to `boxes`. The last - dimension is 2, which represents [height, width]. - scale: a float number (>= 1.0) specifying the scale of the output outer - boxes relative to the input `boxes`. - - Returns: - outer_boxes: a tensor whose shape is the same as `boxes` representing the - outer boxes. - - Raises: - ValueError: If `scale` is less than 1.0.
- """ - if scale < 1.0: - raise ValueError( - 'scale is {}, but outer box scale must be at least 1.0.'.format( - scale)) - centers_y = (boxes[..., 0] + boxes[..., 2]) / 2.0 - centers_x = (boxes[..., 1] + boxes[..., 3]) / 2.0 - box_height = (boxes[..., 2] - boxes[..., 0]) * scale - box_width = (boxes[..., 3] - boxes[..., 1]) * scale - # Stack on the last axis so batched inputs keep their [..., 4] layout. - outer_boxes = tf.stack([ - centers_y - box_height / 2.0, centers_x - box_width / 2.0, - centers_y + box_height / 2.0, centers_x + box_width / 2.0 - ], - axis=-1) - outer_boxes = clip_boxes(outer_boxes, image_shape) - return outer_boxes - - -def encode_boxes(boxes, anchors, weights=None): - """Encodes boxes to targets. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates of - boxes in ymin, xmin, ymax, xmax order. - anchors: a tensor whose shape is the same as, or `broadcastable` to `boxes`, - representing the coordinates of anchors in ymin, xmin, ymax, xmax order. - weights: None or a list of four float numbers used to scale coordinates. - - Returns: - encoded_boxes: a tensor whose shape is the same as `boxes` representing the - encoded box targets. - - Raises: - ValueError: If the last dimension of boxes is not 4.
- """ - if boxes.shape[-1] != 4: - raise ValueError('boxes.shape[-1] is {:d}, but must be 4.'.format( - boxes.shape[-1])) - - with tf.name_scope('encode_boxes'): - boxes = tf.cast(boxes, dtype=anchors.dtype) - ymin = boxes[..., 0:1] - xmin = boxes[..., 1:2] - ymax = boxes[..., 2:3] - xmax = boxes[..., 3:4] - box_h = ymax - ymin + 1.0 - box_w = xmax - xmin + 1.0 - box_yc = ymin + 0.5 * box_h - box_xc = xmin + 0.5 * box_w - - anchor_ymin = anchors[..., 0:1] - anchor_xmin = anchors[..., 1:2] - anchor_ymax = anchors[..., 2:3] - anchor_xmax = anchors[..., 3:4] - anchor_h = anchor_ymax - anchor_ymin + 1.0 - anchor_w = anchor_xmax - anchor_xmin + 1.0 - anchor_yc = anchor_ymin + 0.5 * anchor_h - anchor_xc = anchor_xmin + 0.5 * anchor_w - - encoded_dy = (box_yc - anchor_yc) / anchor_h - encoded_dx = (box_xc - anchor_xc) / anchor_w - encoded_dh = tf.math.log(box_h / anchor_h) - encoded_dw = tf.math.log(box_w / anchor_w) - if weights: - encoded_dy *= weights[0] - encoded_dx *= weights[1] - encoded_dh *= weights[2] - encoded_dw *= weights[3] - - encoded_boxes = tf.concat([encoded_dy, encoded_dx, encoded_dh, encoded_dw], - axis=-1) - return encoded_boxes - - -def decode_boxes(encoded_boxes, anchors, weights=None): - """Decodes box targets back to box coordinates. - - Args: - encoded_boxes: a tensor whose last dimension is 4 representing the - encoded box targets in dy, dx, dh, dw order. - anchors: a tensor whose shape is the same as, or `broadcastable` to - `encoded_boxes`, representing the coordinates of anchors in ymin, xmin, - ymax, xmax order. - weights: None or a list of four float numbers used to scale coordinates. - - Returns: - decoded_boxes: a tensor whose shape is the same as `encoded_boxes` - representing the decoded boxes in ymin, xmin, ymax, xmax order.
- """ - if encoded_boxes.shape[-1] != 4: - raise ValueError('encoded_boxes.shape[-1] is {:d}, but must be 4.'.format( - encoded_boxes.shape[-1])) - - with tf.name_scope('decode_boxes'): - encoded_boxes = tf.cast(encoded_boxes, dtype=anchors.dtype) - dy = encoded_boxes[..., 0:1] - dx = encoded_boxes[..., 1:2] - dh = encoded_boxes[..., 2:3] - dw = encoded_boxes[..., 3:4] - if weights: - dy /= weights[0] - dx /= weights[1] - dh /= weights[2] - dw /= weights[3] - dh = tf.math.minimum(dh, BBOX_XFORM_CLIP) - dw = tf.math.minimum(dw, BBOX_XFORM_CLIP) - - anchor_ymin = anchors[..., 0:1] - anchor_xmin = anchors[..., 1:2] - anchor_ymax = anchors[..., 2:3] - anchor_xmax = anchors[..., 3:4] - anchor_h = anchor_ymax - anchor_ymin + 1.0 - anchor_w = anchor_xmax - anchor_xmin + 1.0 - anchor_yc = anchor_ymin + 0.5 * anchor_h - anchor_xc = anchor_xmin + 0.5 * anchor_w - - decoded_boxes_yc = dy * anchor_h + anchor_yc - decoded_boxes_xc = dx * anchor_w + anchor_xc - decoded_boxes_h = tf.math.exp(dh) * anchor_h - decoded_boxes_w = tf.math.exp(dw) * anchor_w - - decoded_boxes_ymin = decoded_boxes_yc - 0.5 * decoded_boxes_h - decoded_boxes_xmin = decoded_boxes_xc - 0.5 * decoded_boxes_w - decoded_boxes_ymax = decoded_boxes_ymin + decoded_boxes_h - 1.0 - decoded_boxes_xmax = decoded_boxes_xmin + decoded_boxes_w - 1.0 - - decoded_boxes = tf.concat([ - decoded_boxes_ymin, decoded_boxes_xmin, decoded_boxes_ymax, - decoded_boxes_xmax - ], - axis=-1) - return decoded_boxes - - -def encode_boxes_lrtb(boxes, anchors, weights=None): - """Encode boxes to targets on lrtb (=left,right,top,bottom) format. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates - of boxes in ymin, xmin, ymax, xmax order. - anchors: a tensor whose shape is the same as, or `broadcastable` to `boxes`, - representing the coordinates of anchors in ymin, xmin, ymax, xmax order. - weights: None or a list of four float numbers used to scale coordinates. 
- - Returns: - encoded_boxes_lrtb: a tensor whose shape is the same as `boxes` representing - the encoded box targets. The box targets encode the left, right, top, - bottom distances from an anchor location to the four borders of the - matched groundtruth bounding box. - center_targets: centerness targets defined by the left, right, top, and - bottom distance targets. The centerness is defined as the deviation of the - anchor location from the groundtruth object center. Formally, centerness = - sqrt(min(left, right)/max(left, right)*min(top, bottom)/max(top, bottom)). - - Raises: - ValueError: If the last dimension of boxes is not 4. - """ - if boxes.shape[-1] != 4: - raise ValueError( - 'boxes.shape[-1] is {:d}, but must be 4.'.format(boxes.shape[-1])) - - with tf.name_scope('encode_boxes_lrtb'): - boxes = tf.cast(boxes, dtype=anchors.dtype) - ymin = boxes[..., 0:1] - xmin = boxes[..., 1:2] - ymax = boxes[..., 2:3] - xmax = boxes[..., 3:4] - # box_h = ymax - ymin + 1.0 - # box_w = xmax - xmin + 1.0 - box_h = ymax - ymin - box_w = xmax - xmin - - anchor_ymin = anchors[..., 0:1] - anchor_xmin = anchors[..., 1:2] - anchor_ymax = anchors[..., 2:3] - anchor_xmax = anchors[..., 3:4] - # anchor_h = anchor_ymax - anchor_ymin + 1.0 - # anchor_w = anchor_xmax - anchor_xmin + 1.0 - anchor_h = anchor_ymax - anchor_ymin - anchor_w = anchor_xmax - anchor_xmin - anchor_yc = anchor_ymin + 0.5 * anchor_h - anchor_xc = anchor_xmin + 0.5 * anchor_w - - box_h += EPSILON - box_w += EPSILON - anchor_h += EPSILON - anchor_w += EPSILON - - left = (anchor_xc - xmin) / anchor_w - right = (xmax - anchor_xc) / anchor_w - top = (anchor_yc - ymin) / anchor_h - bottom = (ymax - anchor_yc) / anchor_h - - # Create centerness target. { - lrtb_targets = tf.concat([left, right, top, bottom], axis=-1) - valid_match = tf.greater(tf.reduce_min(lrtb_targets, -1), 0.0) - - # Centerness score. 
- left_right = tf.concat([left, right], axis=-1) - - left_right = tf.where(tf.stack([valid_match, valid_match], -1), - left_right, tf.zeros_like(left_right)) - top_bottom = tf.concat([top, bottom], axis=-1) - top_bottom = tf.where(tf.stack([valid_match, valid_match], -1), - top_bottom, tf.zeros_like(top_bottom)) - center_targets = tf.sqrt( - (tf.reduce_min(left_right, -1) / - (tf.reduce_max(left_right, -1) + EPSILON)) * - (tf.reduce_min(top_bottom, -1) / - (tf.reduce_max(top_bottom, -1) + EPSILON))) - center_targets = tf.where(valid_match, - center_targets, - tf.zeros_like(center_targets)) - if weights: - left *= weights[0] - right *= weights[1] - top *= weights[2] - bottom *= weights[3] - - encoded_boxes_lrtb = tf.concat( - [left, right, top, bottom], - axis=-1) - - return encoded_boxes_lrtb, center_targets - - -def decode_boxes_lrtb(encoded_boxes_lrtb, anchors, weights=None): - """Decodes lrtb (left, right, top, bottom) targets back to boxes. - - Args: - encoded_boxes_lrtb: a tensor whose last dimension is 4 representing the - coordinates of encoded boxes in left, right, top, bottom order. - anchors: a tensor whose shape is the same as, or `broadcastable` to - `encoded_boxes_lrtb`, representing the coordinates of anchors in ymin, - xmin, ymax, xmax order. - weights: None or a list of four float numbers used to scale coordinates. - - Returns: - decoded_boxes_lrtb: a tensor whose shape is the same as - `encoded_boxes_lrtb` representing the decoded boxes in ymin, xmin, ymax, - xmax order. The boxes are recovered from the left, right, top, and bottom - distances between each anchor center and the four borders of the matched - groundtruth bounding box. - """ - if encoded_boxes_lrtb.shape[-1] != 4: - raise ValueError( - 'encoded_boxes_lrtb.shape[-1] is {:d}, but must be 4.'
- .format(encoded_boxes_lrtb.shape[-1])) - - with tf.name_scope('decode_boxes_lrtb'): - encoded_boxes_lrtb = tf.cast(encoded_boxes_lrtb, dtype=anchors.dtype) - left = encoded_boxes_lrtb[..., 0:1] - right = encoded_boxes_lrtb[..., 1:2] - top = encoded_boxes_lrtb[..., 2:3] - bottom = encoded_boxes_lrtb[..., 3:4] - if weights: - left /= weights[0] - right /= weights[1] - top /= weights[2] - bottom /= weights[3] - - anchor_ymin = anchors[..., 0:1] - anchor_xmin = anchors[..., 1:2] - anchor_ymax = anchors[..., 2:3] - anchor_xmax = anchors[..., 3:4] - - anchor_h = anchor_ymax - anchor_ymin - anchor_w = anchor_xmax - anchor_xmin - anchor_yc = anchor_ymin + 0.5 * anchor_h - anchor_xc = anchor_xmin + 0.5 * anchor_w - anchor_h += EPSILON - anchor_w += EPSILON - - decoded_boxes_ymin = anchor_yc - top * anchor_h - decoded_boxes_xmin = anchor_xc - left * anchor_w - decoded_boxes_ymax = anchor_yc + bottom * anchor_h - decoded_boxes_xmax = anchor_xc + right * anchor_w - - decoded_boxes_lrtb = tf.concat( - [decoded_boxes_ymin, decoded_boxes_xmin, - decoded_boxes_ymax, decoded_boxes_xmax], - axis=-1) - return decoded_boxes_lrtb - - -def filter_boxes(boxes, scores, image_shape, min_size_threshold): - """Filter and remove boxes that are too small or fall outside the image. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates of - boxes in ymin, xmin, ymax, xmax order. - scores: a tensor whose shape is the same as tf.shape(boxes)[:-1] - representing the original scores of the boxes. - image_shape: a tensor whose shape is the same as, or `broadcastable` to - `boxes` except the last dimension, which is 2, representing [height, - width] of the scaled image. - min_size_threshold: a float representing the minimal box size in each side - (w.r.t. the scaled image). Boxes whose sides are smaller than it will be - filtered out. 
- - Returns: - filtered_boxes: a tensor whose shape is the same as `boxes` but with - the positions of the filtered boxes filled with 0. - filtered_scores: a tensor whose shape is the same as `scores` but with - the positions of the filtered boxes filled with 0. - """ - if boxes.shape[-1] != 4: - raise ValueError('boxes.shape[-1] is {:d}, but must be 4.'.format( - boxes.shape[-1])) - - with tf.name_scope('filter_boxes'): - if isinstance(image_shape, list) or isinstance(image_shape, tuple): - height, width = image_shape - else: - image_shape = tf.cast(image_shape, dtype=boxes.dtype) - height = image_shape[..., 0] - width = image_shape[..., 1] - - ymin = boxes[..., 0] - xmin = boxes[..., 1] - ymax = boxes[..., 2] - xmax = boxes[..., 3] - - h = ymax - ymin + 1.0 - w = xmax - xmin + 1.0 - yc = ymin + 0.5 * h - xc = xmin + 0.5 * w - - min_size = tf.cast( - tf.math.maximum(min_size_threshold, 1.0), dtype=boxes.dtype) - - filtered_size_mask = tf.math.logical_and( - tf.math.greater(h, min_size), tf.math.greater(w, min_size)) - filtered_center_mask = tf.logical_and( - tf.math.logical_and(tf.math.greater(yc, 0.0), tf.math.less(yc, height)), - tf.math.logical_and(tf.math.greater(xc, 0.0), tf.math.less(xc, width))) - filtered_mask = tf.math.logical_and(filtered_size_mask, - filtered_center_mask) - - filtered_scores = tf.where(filtered_mask, scores, tf.zeros_like(scores)) - filtered_boxes = tf.cast( - tf.expand_dims(filtered_mask, axis=-1), dtype=boxes.dtype) * boxes - - return filtered_boxes, filtered_scores - - -def filter_boxes_by_scores(boxes, scores, min_score_threshold): - """Filter and remove boxes whose scores are not greater than the threshold. - - Args: - boxes: a tensor whose last dimension is 4 representing the coordinates of - boxes in ymin, xmin, ymax, xmax order. - scores: a tensor whose shape is the same as tf.shape(boxes)[:-1] - representing the original scores of the boxes. - min_score_threshold: a float representing the minimal box score threshold.
- Boxes whose scores are not greater than it will be filtered out. - - Returns: - filtered_boxes: a tensor whose shape is the same as `boxes` but with - the positions of the filtered boxes filled with 0. - filtered_scores: a tensor whose shape is the same as `scores` but with - the positions of the filtered boxes filled with -1. - """ - if boxes.shape[-1] != 4: - raise ValueError('boxes.shape[-1] is {:d}, but must be 4.'.format( - boxes.shape[-1])) - - with tf.name_scope('filter_boxes_by_scores'): - filtered_mask = tf.math.greater(scores, min_score_threshold) - filtered_scores = tf.where(filtered_mask, scores, -tf.ones_like(scores)) - filtered_boxes = tf.cast( - tf.expand_dims(filtered_mask, axis=-1), dtype=boxes.dtype) * boxes - - return filtered_boxes, filtered_scores - - -def top_k_boxes(boxes, scores, k): - """Sort and select top k boxes according to the scores. - - Args: - boxes: a tensor of shape [batch_size, N, 4] representing the coordinates of - the boxes. N is the number of boxes per image. - scores: a tensor of shape [batch_size, N] representing the scores of the - boxes. - k: an integer or a tensor indicating the top k number. - - Returns: - selected_boxes: a tensor of shape [batch_size, k, 4] representing the - selected top k box coordinates. - selected_scores: a tensor of shape [batch_size, k] representing the selected - top k box scores.
- """ - with tf.name_scope('top_k_boxes'): - selected_scores, top_k_indices = tf.nn.top_k(scores, k=k, sorted=True) - - batch_size, _ = scores.get_shape().as_list() - if batch_size == 1: - selected_boxes = tf.squeeze( - tf.gather(boxes, top_k_indices, axis=1), axis=1) - else: - top_k_indices_shape = tf.shape(top_k_indices) - batch_indices = ( - tf.expand_dims(tf.range(top_k_indices_shape[0]), axis=-1) * - tf.ones([1, top_k_indices_shape[-1]], dtype=tf.int32)) - gather_nd_indices = tf.stack([batch_indices, top_k_indices], axis=-1) - selected_boxes = tf.gather_nd(boxes, gather_nd_indices) - - return selected_boxes, selected_scores - - -def bbox_overlap(boxes, gt_boxes): - """Calculates the overlap between proposal and ground truth boxes. - - Some `gt_boxes` may have been padded. The returned `iou` tensor for these - boxes will be -1. - - Args: - boxes: a tensor with a shape of [batch_size, N, 4]. N is the number of - proposals before groundtruth assignment (e.g., rpn_post_nms_topn). The - last dimension is the pixel coordinates in [ymin, xmin, ymax, xmax] form. - gt_boxes: a tensor with a shape of [batch_size, MAX_NUM_INSTANCES, 4]. This - tensor might have paddings with a negative value. - - Returns: - iou: a tensor with a shape of [batch_size, N, MAX_NUM_INSTANCES]. - """ - with tf.name_scope('bbox_overlap'): - bb_y_min, bb_x_min, bb_y_max, bb_x_max = tf.split( - value=boxes, num_or_size_splits=4, axis=2) - gt_y_min, gt_x_min, gt_y_max, gt_x_max = tf.split( - value=gt_boxes, num_or_size_splits=4, axis=2) - - # Calculates the intersection area. - i_xmin = tf.math.maximum(bb_x_min, tf.transpose(gt_x_min, [0, 2, 1])) - i_xmax = tf.math.minimum(bb_x_max, tf.transpose(gt_x_max, [0, 2, 1])) - i_ymin = tf.math.maximum(bb_y_min, tf.transpose(gt_y_min, [0, 2, 1])) - i_ymax = tf.math.minimum(bb_y_max, tf.transpose(gt_y_max, [0, 2, 1])) - i_area = tf.math.maximum((i_xmax - i_xmin), 0) * tf.math.maximum( - (i_ymax - i_ymin), 0) - - # Calculates the union area.
- bb_area = (bb_y_max - bb_y_min) * (bb_x_max - bb_x_min) - gt_area = (gt_y_max - gt_y_min) * (gt_x_max - gt_x_min) - # Adds a small epsilon to avoid divide-by-zero. - u_area = bb_area + tf.transpose(gt_area, [0, 2, 1]) - i_area + 1e-8 - - # Calculates IoU. - iou = i_area / u_area - - # Fills -1 for IoU entries involving padded ground truth boxes. - gt_invalid_mask = tf.less( - tf.reduce_max(gt_boxes, axis=-1, keepdims=True), 0.0) - padding_mask = tf.logical_or( - tf.zeros_like(bb_x_min, dtype=tf.bool), - tf.transpose(gt_invalid_mask, [0, 2, 1])) - iou = tf.where(padding_mask, -tf.ones_like(iou), iou) - - return iou - - -def get_non_empty_box_indices(boxes): - """Get indices for non-empty boxes.""" - # Selects indices of boxes whose height and width are both greater than 0. - height = boxes[:, 2] - boxes[:, 0] - width = boxes[:, 3] - boxes[:, 1] - indices = tf.where( - tf.logical_and(tf.greater(height, 0), tf.greater(width, 0))) - return indices[:, 0] diff --git a/official/vision/detection/utils/class_utils.py b/official/vision/detection/utils/class_utils.py deleted file mode 100644 index cbf806f11070736c17de79dd63240e9a626808d9..0000000000000000000000000000000000000000 --- a/official/vision/detection/utils/class_utils.py +++ /dev/null @@ -1,44 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License.
- -"""Utility functions for handling dataset object categories.""" - - -def coco_split_class_ids(split_name): - """Return the COCO class IDs for the given split name. - - Args: - split_name: The name of the dataset split: 'all', 'voc', or 'nonvoc'. - - Returns: - class_ids: a python list of integers. An empty list is returned for 'all'. - - Raises: - ValueError: If `split_name` is not one of the supported splits. - """ - if split_name == 'all': - return [] - - elif split_name == 'voc': - return [ - 1, 2, 3, 4, 5, 6, 7, 9, 16, 17, 18, 19, 20, 21, 44, 62, 63, 64, 67, 72 - ] - - elif split_name == 'nonvoc': - return [ - 8, 10, 11, 13, 14, 15, 22, 23, 24, 25, 27, 28, 31, 32, 33, 34, 35, 36, - 37, 38, 39, 40, 41, 42, 43, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, - 57, 58, 59, 60, 61, 65, 70, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, - 85, 86, 87, 88, 89, 90 - ] - - else: - raise ValueError('Invalid split name {}!!!'.format(split_name)) diff --git a/official/vision/detection/utils/dataloader_utils.py b/official/vision/detection/utils/dataloader_utils.py deleted file mode 100644 index 9569d7713c2177d233bcdc21934edb6ffbe0fefd..0000000000000000000000000000000000000000 --- a/official/vision/detection/utils/dataloader_utils.py +++ /dev/null @@ -1,40 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License.
- -"""Utility functions for dataloader.""" - -import tensorflow as tf - -from official.vision.detection.utils import input_utils - - -def process_source_id(source_id): - """Processes source_id to the right format.""" - if source_id.dtype == tf.string: - source_id = tf.cast(tf.strings.to_number(source_id), tf.int64) - with tf.control_dependencies([source_id]): - source_id = tf.cond( - pred=tf.equal(tf.size(input=source_id), 0), - true_fn=lambda: tf.cast(tf.constant(-1), tf.int64), - false_fn=lambda: tf.identity(source_id)) - return source_id - - -def pad_groundtruths_to_fixed_size(gt, n): - """Pads the first dimension of groundtruths labels to the fixed size.""" - gt['boxes'] = input_utils.pad_to_fixed_size(gt['boxes'], n, -1) - gt['is_crowds'] = input_utils.pad_to_fixed_size(gt['is_crowds'], n, 0) - gt['areas'] = input_utils.pad_to_fixed_size(gt['areas'], n, -1) - gt['classes'] = input_utils.pad_to_fixed_size(gt['classes'], n, -1) - return gt diff --git a/official/vision/detection/utils/input_utils.py b/official/vision/detection/utils/input_utils.py deleted file mode 100644 index 62d8489c306b9031e523d5bef2638875a9c66cc0..0000000000000000000000000000000000000000 --- a/official/vision/detection/utils/input_utils.py +++ /dev/null @@ -1,359 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Utility functions for input processing.""" - -import math - -import tensorflow as tf - -from official.vision.detection.utils import box_utils -from official.vision.utils.object_detection import preprocessor - - -def pad_to_fixed_size(input_tensor, size, constant_values=0): - """Pads data to a fixed length at the first dimension. - - Args: - input_tensor: `Tensor` with any dimension. - size: `int` number for the first dimension of output Tensor. - constant_values: `int` value assigned to the paddings. - - Returns: - `Tensor` with the first dimension padded to `size`. - """ - input_shape = input_tensor.get_shape().as_list() - padding_shape = [] - - # Computes the padding length on the first dimension. - padding_length = tf.maximum(0, size - tf.shape(input_tensor)[0]) - assert_length = tf.Assert( - tf.greater_equal(padding_length, 0), [padding_length]) - with tf.control_dependencies([assert_length]): - padding_shape.append(padding_length) - - # Copies shapes of the rest of input shape dimensions. - for i in range(1, len(input_shape)): - padding_shape.append(tf.shape(input=input_tensor)[i]) - - # Pads input tensor to the fixed first dimension. 
- paddings = tf.cast(constant_values * tf.ones(padding_shape), - input_tensor.dtype) - padded_tensor = tf.concat([input_tensor, paddings], axis=0) - output_shape = input_shape - output_shape[0] = size - padded_tensor.set_shape(output_shape) - return padded_tensor - - -def normalize_image(image, - offset=(0.485, 0.456, 0.406), - scale=(0.229, 0.224, 0.225)): - """Normalizes the image to zero mean and unit variance.""" - image = tf.image.convert_image_dtype(image, dtype=tf.float32) - offset = tf.constant(offset) - offset = tf.expand_dims(offset, axis=0) - offset = tf.expand_dims(offset, axis=0) - image -= offset - - scale = tf.constant(scale) - scale = tf.expand_dims(scale, axis=0) - scale = tf.expand_dims(scale, axis=0) - image /= scale - return image - - -def compute_padded_size(desired_size, stride): - """Compute the padded size given the desired size and the stride. - - The padded size will be the smallest rectangle, such that each dimension is - the smallest multiple of the stride which is greater than or equal to the - desired dimension. For example, if desired_size = (100, 200) and stride = 32, - the output padded_size = (128, 224). - - Args: - desired_size: a `Tensor` or `int` list/tuple of two elements representing - [height, width] of the target output image size. - stride: an integer, the stride of the backbone network. - - Returns: - padded_size: a `Tensor` or `int` list/tuple of two elements representing - [height, width] of the padded output image size. - """ - if isinstance(desired_size, list) or isinstance(desired_size, tuple): - padded_size = [ - int(math.ceil(d * 1.0 / stride) * stride) for d in desired_size - ] - else: - padded_size = tf.cast( - tf.math.ceil(tf.cast(desired_size, dtype=tf.float32) / stride) * stride, - tf.int32) - return padded_size - - -def resize_and_crop_image(image, - desired_size, - padded_size, - aug_scale_min=1.0, - aug_scale_max=1.0, - seed=1, - method=tf.image.ResizeMethod.BILINEAR): - """Resizes the input image to output size.
- - Resize and pad images given the desired output size of the image and - stride size. - - Here are the preprocessing steps. - 1. For a given image, keep its aspect ratio and rescale the image to make it - the largest rectangle to be bounded by the rectangle specified by the - `desired_size`. - 2. Pad the rescaled image to the padded_size. - - Args: - image: a `Tensor` of shape [height, width, 3] representing an image. - desired_size: a `Tensor` or `int` list/tuple of two elements representing - [height, width] of the desired actual output image size. - padded_size: a `Tensor` or `int` list/tuple of two elements representing - [height, width] of the padded output image size. Padding will be applied - after scaling the image to the desired_size. - aug_scale_min: a `float` with range between [0, 1.0] representing minimum - random scale applied to desired_size for training scale jittering. - aug_scale_max: a `float` with range between [1.0, inf] representing maximum - random scale applied to desired_size for training scale jittering. - seed: seed for random scale jittering. - method: the `tf.image.ResizeMethod` used to resize the input image. - - Returns: - output_image: `Tensor` of shape [height, width, 3] where [height, width] - equals `padded_size`. - image_info: a 2D `Tensor` that encodes the information of the image and the - applied preprocessing. It is in the format of - [[original_height, original_width], [desired_height, desired_width], - [y_scale, x_scale], [y_offset, x_offset]], where [desired_height, - desired_width] is the actual scaled image size, and [y_scale, x_scale] is - the scaling factor, which is the ratio of - scaled dimension / original dimension.
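Step 1 above picks one scale so that the image fits inside `desired_size` without changing its aspect ratio. The minimal sketch below covers only the case without jittering and uses Python floats instead of tensors. `aspect_preserving_size` is a hypothetical name. Python's `round` and `tf.round` both round half to even, so the integer sizes should agree.

```python
def aspect_preserving_size(image_hw, desired_hw):
    """Largest (h, w) with the input's aspect ratio that fits inside `desired_hw`."""
    h, w = image_hw
    # The binding constraint is whichever side would overflow first.
    scale = min(desired_hw[0] / h, desired_hw[1] / w)
    scaled = (round(h * scale), round(w * scale))
    # Per-axis scale actually applied after rounding (the [y_scale, x_scale] row).
    image_scale = (scaled[0] / h, scaled[1] / w)
    return scaled, image_scale


scaled, image_scale = aspect_preserving_size((480, 640), (512, 512))
print(scaled)  # (384, 512)
```

In the function itself, the resized 384x512 image is then padded at the bottom and right to `padded_size` by `tf.image.pad_to_bounding_box` with a (0, 0) offset.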
- """ - with tf.name_scope('resize_and_crop_image'): - image_size = tf.cast(tf.shape(input=image)[0:2], tf.float32) - - random_jittering = (aug_scale_min != 1.0 or aug_scale_max != 1.0) - - if random_jittering: - random_scale = tf.random.uniform([], - aug_scale_min, - aug_scale_max, - seed=seed) - scaled_size = tf.round(random_scale * desired_size) - else: - scaled_size = desired_size - - scale = tf.minimum(scaled_size[0] / image_size[0], - scaled_size[1] / image_size[1]) - scaled_size = tf.round(image_size * scale) - - # Computes 2D image_scale. - image_scale = scaled_size / image_size - - # Selects non-zero random offset (x, y) if scaled image is larger than - # desired_size. - if random_jittering: - max_offset = scaled_size - desired_size - max_offset = tf.where( - tf.less(max_offset, 0), tf.zeros_like(max_offset), max_offset) - offset = max_offset * tf.random.uniform([ - 2, - ], 0, 1, seed=seed) - offset = tf.cast(offset, tf.int32) - else: - offset = tf.zeros((2,), tf.int32) - - scaled_image = tf.image.resize( - image, tf.cast(scaled_size, tf.int32), method=method) - - if random_jittering: - scaled_image = scaled_image[offset[0]:offset[0] + desired_size[0], - offset[1]:offset[1] + desired_size[1], :] - - output_image = tf.image.pad_to_bounding_box(scaled_image, 0, 0, - padded_size[0], padded_size[1]) - - image_info = tf.stack([ - image_size, - tf.cast(desired_size, dtype=tf.float32), image_scale, - tf.cast(offset, tf.float32) - ]) - return output_image, image_info - - -def resize_and_crop_image_v2(image, - short_side, - long_side, - padded_size, - aug_scale_min=1.0, - aug_scale_max=1.0, - seed=1, - method=tf.image.ResizeMethod.BILINEAR): - """Resizes the input image to output size (Faster R-CNN style). - - Resize and pad images given the specified short / long side length and the - stride size. - - Here are the preprocessing steps. - 1. For a given image, keep its aspect ratio and first try to rescale the short - side of the original image to `short_side`. - 2. 
If the scaled image after step 1 has a long side that exceeds `long_side`, keep - the aspect ratio and rescale the long side of the image to `long_side`. - 3. Pad the rescaled image to the padded_size. - - Args: - image: a `Tensor` of shape [height, width, 3] representing an image. - short_side: a scalar `Tensor` or `int` representing the desired short side - to be rescaled to. - long_side: a scalar `Tensor` or `int` representing the desired long side to - be rescaled to. - padded_size: a `Tensor` or `int` list/tuple of two elements representing - [height, width] of the padded output image size. Padding will be applied - after scaling the image to the desired_size. - aug_scale_min: a `float` with range between [0, 1.0] representing minimum - random scale applied to desired_size for training scale jittering. - aug_scale_max: a `float` with range between [1.0, inf] representing maximum - random scale applied to desired_size for training scale jittering. - seed: seed for random scale jittering. - method: the `tf.image.ResizeMethod` used to resize the input image. - - Returns: - output_image: `Tensor` of shape [height, width, 3] where [height, width] - equals `padded_size`. - image_info: a 2D `Tensor` that encodes the information of the image and the - applied preprocessing. It is in the format of - [[original_height, original_width], [desired_height, desired_width], - [y_scale, x_scale], [y_offset, x_offset]], where [desired_height, - desired_width] is the actual scaled image size, and [y_scale, x_scale] is - the scaling factor, which is the ratio of - scaled dimension / original dimension.
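The Faster R-CNN style rule first scales the short side to `short_side`. If that pushes the long side past `long_side`, it rescales from the original image so that the long side equals `long_side`. The sketch below uses plain Python without jittering. `short_long_side_size` is a hypothetical name, and the numbers are only illustrative.

```python
def short_long_side_size(image_hw, short_side, long_side):
    """Target (h, w): short side -> short_side, unless the long side would exceed long_side."""
    h, w = image_hw
    scale = short_side / min(h, w)
    scaled = (round(h * scale), round(w * scale))
    if max(scaled) > long_side:
        # Recompute from the original size so the long side equals long_side.
        scale = long_side / max(h, w)
        scaled = (round(h * scale), round(w * scale))
    return scaled


print(short_long_side_size((480, 640), 800, 1333))   # (800, 1067)
print(short_long_side_size((400, 1000), 800, 1333))  # (533, 1333)
```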
- """ - with tf.name_scope('resize_and_crop_image_v2'): - image_size = tf.cast(tf.shape(image)[0:2], tf.float32) - - scale_using_short_side = ( - short_side / tf.math.minimum(image_size[0], image_size[1])) - scale_using_long_side = ( - long_side / tf.math.maximum(image_size[0], image_size[1])) - - scaled_size = tf.math.round(image_size * scale_using_short_side) - scaled_size = tf.where( - tf.math.greater( - tf.math.maximum(scaled_size[0], scaled_size[1]), long_side), - tf.math.round(image_size * scale_using_long_side), scaled_size) - desired_size = scaled_size - - random_jittering = (aug_scale_min != 1.0 or aug_scale_max != 1.0) - - if random_jittering: - random_scale = tf.random.uniform([], - aug_scale_min, - aug_scale_max, - seed=seed) - scaled_size = tf.math.round(random_scale * scaled_size) - - # Computes 2D image_scale. - image_scale = scaled_size / image_size - - # Selects non-zero random offset (x, y) if scaled image is larger than - # desired_size. - if random_jittering: - max_offset = scaled_size - desired_size - max_offset = tf.where( - tf.math.less(max_offset, 0), tf.zeros_like(max_offset), max_offset) - offset = max_offset * tf.random.uniform([ - 2, - ], 0, 1, seed=seed) - offset = tf.cast(offset, tf.int32) - else: - offset = tf.zeros((2,), tf.int32) - - scaled_image = tf.image.resize( - image, tf.cast(scaled_size, tf.int32), method=method) - - if random_jittering: - scaled_image = scaled_image[offset[0]:offset[0] + desired_size[0], - offset[1]:offset[1] + desired_size[1], :] - - output_image = tf.image.pad_to_bounding_box(scaled_image, 0, 0, - padded_size[0], padded_size[1]) - - image_info = tf.stack([ - image_size, - tf.cast(desired_size, dtype=tf.float32), image_scale, - tf.cast(offset, tf.float32) - ]) - return output_image, image_info - - -def resize_and_crop_boxes(boxes, image_scale, output_size, offset): - """Resizes boxes to output size with scale and offset. - - Args: - boxes: `Tensor` of shape [N, 4] representing ground truth boxes. 
- image_scale: 2D float `Tensor` representing scale factors that apply to - [height, width] of input image. - output_size: 2D `Tensor` or `int` representing [height, width] of target - output image size. - offset: 2D `Tensor` representing top-left corner [y0, x0] to crop scaled - boxes. - - Returns: - boxes: `Tensor` of shape [N, 4] representing the scaled boxes. - """ - # Adjusts box coordinates based on image_scale and offset. - boxes *= tf.tile(tf.expand_dims(image_scale, axis=0), [1, 2]) - boxes -= tf.tile(tf.expand_dims(offset, axis=0), [1, 2]) - # Clips the boxes. - boxes = box_utils.clip_boxes(boxes, output_size) - return boxes - - -def resize_and_crop_masks(masks, image_scale, output_size, offset): - """Resizes masks to output size with scale and offset. - - Args: - masks: `Tensor` of shape [N, H, W, 1] representing ground truth masks. - image_scale: 2D float `Tensor` representing scale factors that apply to - [height, width] of input image. - output_size: 2D `Tensor` or `int` representing [height, width] of target - output image size. - offset: 2D `Tensor` representing top-left corner [y0, x0] to crop scaled - masks. - - Returns: - masks: `Tensor` of shape [N, H, W, 1] representing the scaled masks.
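`resize_and_crop_boxes` tiles the 2-element `image_scale` and `offset` to match the `[ymin, xmin, ymax, xmax]` layout. It then scales, shifts, and clips each box. The NumPy sketch below does the same. `rescale_and_clip_boxes` is hypothetical, and the clipping range `[0, height] x [0, width]` is an assumption. `box_utils.clip_boxes` is not shown here and may use a different boundary convention.

```python
import numpy as np


def rescale_and_clip_boxes(boxes, image_scale, output_size, offset):
    """NumPy sketch: scale [ymin, xmin, ymax, xmax] boxes, shift by offset, then clip."""
    boxes = boxes * np.tile(image_scale, 2)  # [sy, sx, sy, sx]
    boxes = boxes - np.tile(offset, 2)       # [oy, ox, oy, ox]
    h, w = output_size
    # Assumed clipping convention: y in [0, h], x in [0, w].
    return np.clip(boxes, 0.0, np.array([h, w, h, w], dtype=boxes.dtype))


boxes = np.array([[10.0, 20.0, 110.0, 220.0]])
out = rescale_and_clip_boxes(boxes, (0.5, 0.5), (64, 64), (0, 0))
print(out)  # values 5, 10, 55, 64: xmax is clipped from 110 to the width 64
```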
- """ - mask_size = tf.shape(input=masks)[1:3] - scaled_size = tf.cast(image_scale * tf.cast(mask_size, image_scale.dtype), - tf.int32) - scaled_masks = tf.image.resize( - masks, scaled_size, method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) - offset = tf.cast(offset, tf.int32) - scaled_masks = scaled_masks[:, offset[0]:offset[0] + output_size[0], - offset[1]:offset[1] + output_size[1], :] - - output_masks = tf.image.pad_to_bounding_box(scaled_masks, 0, 0, - output_size[0], output_size[1]) - return output_masks - - -def random_horizontal_flip(image, boxes=None, masks=None): - """Randomly flips input image and bounding boxes.""" - return preprocessor.random_horizontal_flip(image, boxes, masks) diff --git a/official/vision/detection/utils/mask_utils.py b/official/vision/detection/utils/mask_utils.py deleted file mode 100644 index 926c829b81b35b11ca53a5a3d351d0ebca36205e..0000000000000000000000000000000000000000 --- a/official/vision/detection/utils/mask_utils.py +++ /dev/null @@ -1,171 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Utility functions for segmentations.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import math - -import numpy as np -import cv2 - - -def paste_instance_masks(masks, detected_boxes, image_height, image_width): - """Paste instance masks to generate the image segmentation results. 
- - Args: - masks: a numpy array of shape [N, mask_height, mask_width] representing the - instance masks w.r.t. the `detected_boxes`. - detected_boxes: a numpy array of shape [N, 4] representing the reference - bounding boxes. - image_height: an integer representing the height of the image. - image_width: an integer representing the width of the image. - - Returns: - segms: a numpy array of shape [N, image_height, image_width] representing - the instance masks *pasted* on the image canvas. - """ - - def expand_boxes(boxes, scale): - """Expands an array of boxes by a given scale.""" - # Reference: https://github.com/facebookresearch/Detectron/blob/master/detectron/utils/boxes.py#L227 # pylint: disable=line-too-long - # The `boxes` in the reference implementation is in [x1, y1, x2, y2] form, - # whereas `boxes` here is in [x1, y1, w, h] form - w_half = boxes[:, 2] * .5 - h_half = boxes[:, 3] * .5 - x_c = boxes[:, 0] + w_half - y_c = boxes[:, 1] + h_half - - w_half *= scale - h_half *= scale - - boxes_exp = np.zeros(boxes.shape) - boxes_exp[:, 0] = x_c - w_half - boxes_exp[:, 2] = x_c + w_half - boxes_exp[:, 1] = y_c - h_half - boxes_exp[:, 3] = y_c + h_half - - return boxes_exp - - # Reference: https://github.com/facebookresearch/Detectron/blob/master/detectron/core/test.py#L812 # pylint: disable=line-too-long - # To work around an issue with cv2.resize (it seems to automatically pad - # with repeated border values), we manually zero-pad the masks by 1 pixel - # prior to resizing back to the original image resolution. This prevents - # "top hat" artifacts. We therefore need to expand the reference boxes by an - # appropriate factor. 
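The comment above explains the padding. Each mask gets one zero pixel on every side, so the reference box must grow by the same relative amount, `(mask_size + 2) / mask_size`, about its center. The NumPy sketch below restates `expand_boxes` for `[x, y, w, h]` input and returns `[x1, y1, x2, y2]`. `expand_xywh_boxes` is a hypothetical name, and the box values are illustrative.

```python
import numpy as np


def expand_xywh_boxes(boxes, scale):
    """Scales [x, y, w, h] boxes about their centers; returns [x1, y1, x2, y2]."""
    x_c = boxes[:, 0] + boxes[:, 2] * 0.5
    y_c = boxes[:, 1] + boxes[:, 3] * 0.5
    half_w = boxes[:, 2] * 0.5 * scale
    half_h = boxes[:, 3] * 0.5 * scale
    return np.stack([x_c - half_w, y_c - half_h, x_c + half_w, y_c + half_h], axis=1)


mask_h, mask_w = 28, 28
# One padding pixel on each side adds 2 pixels per dimension.
scale = max((mask_w + 2.0) / mask_w, (mask_h + 2.0) / mask_h)
expanded = expand_xywh_boxes(np.array([[10.0, 20.0, 28.0, 56.0]]), scale)
print(expanded)  # approximately [[9, 18, 39, 78]]
```

A 28-pixel-wide box becomes 30 pixels wide. The padded 30-pixel mask therefore still maps one mask column to one image column.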
- _, mask_height, mask_width = masks.shape - scale = max((mask_width + 2.0) / mask_width, - (mask_height + 2.0) / mask_height) - - ref_boxes = expand_boxes(detected_boxes, scale) - ref_boxes = ref_boxes.astype(np.int32) - padded_mask = np.zeros((mask_height + 2, mask_width + 2), dtype=np.float32) - segms = [] - for mask_ind, mask in enumerate(masks): - im_mask = np.zeros((image_height, image_width), dtype=np.uint8) - # Process mask inside bounding boxes. - padded_mask[1:-1, 1:-1] = mask[:, :] - - ref_box = ref_boxes[mask_ind, :] - w = ref_box[2] - ref_box[0] + 1 - h = ref_box[3] - ref_box[1] + 1 - w = np.maximum(w, 1) - h = np.maximum(h, 1) - - mask = cv2.resize(padded_mask, (w, h)) - mask = np.array(mask > 0.5, dtype=np.uint8) - - x_0 = min(max(ref_box[0], 0), image_width) - x_1 = min(max(ref_box[2] + 1, 0), image_width) - y_0 = min(max(ref_box[1], 0), image_height) - y_1 = min(max(ref_box[3] + 1, 0), image_height) - - im_mask[y_0:y_1, x_0:x_1] = mask[(y_0 - ref_box[1]):(y_1 - ref_box[1]), - (x_0 - ref_box[0]):(x_1 - ref_box[0])] - segms.append(im_mask) - - segms = np.array(segms) - assert masks.shape[0] == segms.shape[0] - return segms - - -def paste_instance_masks_v2(masks, detected_boxes, image_height, image_width): - """Paste instance masks to generate the image segmentation (v2). - - Args: - masks: a numpy array of shape [N, mask_height, mask_width] representing the - instance masks w.r.t. the `detected_boxes`. - detected_boxes: a numpy array of shape [N, 4] representing the reference - bounding boxes. - image_height: an integer representing the height of the image. - image_width: an integer representing the width of the image. - - Returns: - segms: a numpy array of shape [N, image_height, image_width] representing - the instance masks *pasted* on the image canvas. 
- """ - _, mask_height, mask_width = masks.shape - - segms = [] - for i, mask in enumerate(masks): - box = detected_boxes[i, :] - xmin = box[0] - ymin = box[1] - xmax = xmin + box[2] - ymax = ymin + box[3] - - # Sample points of the cropped mask w.r.t. the image grid. - # Note that these coordinates may fall beyond the image. - # Pixel clipping will happen after warping. - xmin_int = int(math.floor(xmin)) - xmax_int = int(math.ceil(xmax)) - ymin_int = int(math.floor(ymin)) - ymax_int = int(math.ceil(ymax)) - - alpha = box[2] / (1.0 * mask_width) - beta = box[3] / (1.0 * mask_height) - # pylint: disable=invalid-name - # Transformation from mask pixel indices to image coordinate. - M_mask_to_image = np.array([[alpha, 0, xmin], [0, beta, ymin], [0, 0, 1]], - dtype=np.float32) - # Transformation from image to cropped mask coordinate. - M_image_to_crop = np.array( - [[1, 0, -xmin_int], [0, 1, -ymin_int], [0, 0, 1]], dtype=np.float32) - M = np.dot(M_image_to_crop, M_mask_to_image) - # Compensate the half pixel offset that OpenCV has in the - # warpPerspective implementation: the top-left pixel is sampled - # at (0,0), but we want it to be at (0.5, 0.5). 
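The half-pixel compensation conjugates the affine map with two shifts. The first turns a pixel index into its continuous center coordinate. The map is then applied. The last shift converts back to an index. The sketch below rebuilds the 3x3 matrix with NumPy only, without calling OpenCV. `mask_to_crop_transform` is a hypothetical name, and the box and mask sizes are illustrative.

```python
import numpy as np


def mask_to_crop_transform(box_xywh, mask_hw):
    """3x3 affine map from mask pixel indices to crop pixel indices."""
    xmin, ymin, box_w, box_h = box_xywh
    mask_h, mask_w = mask_hw
    alpha, beta = box_w / mask_w, box_h / mask_h  # image pixels per mask pixel
    mask_to_image = np.array([[alpha, 0, xmin], [0, beta, ymin], [0, 0, 1]])
    image_to_crop = np.array(
        [[1, 0, -np.floor(xmin)], [0, 1, -np.floor(ymin)], [0, 0, 1]])
    m = image_to_crop @ mask_to_image
    # Index -> pixel-center coordinate, apply m, then pixel-center -> index.
    index_to_center = np.array([[1, 0, 0.5], [0, 1, 0.5], [0, 0, 1]])
    center_to_index = np.array([[1, 0, -0.5], [0, 1, -0.5], [0, 0, 1]])
    return center_to_index @ m @ index_to_center


m = mask_to_crop_transform((10.0, 20.0, 28.0, 56.0), (28, 28))
print(m @ np.array([0.0, 0.0, 1.0]))  # mask pixel (x=0, y=0) lands at crop (0.0, 0.5)
```

In this example each mask row covers two image rows. Mask row 0 spans crop rows 0 and 1, so its center lies at index 0.5 and not at 0.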
- M = np.dot( - np.dot( - np.array([[1, 0, -0.5], [0, 1, -0.5], [0, 0, 1]], np.float32), M), - np.array([[1, 0, 0.5], [0, 1, 0.5], [0, 0, 1]], np.float32)) - # pylint: enable=invalid-name - cropped_mask = cv2.warpPerspective( - mask.astype(np.float32), M, (xmax_int - xmin_int, ymax_int - ymin_int)) - cropped_mask = np.array(cropped_mask > 0.5, dtype=np.uint8) - - img_mask = np.zeros((image_height, image_width)) - x0 = max(min(xmin_int, image_width), 0) - x1 = max(min(xmax_int, image_width), 0) - y0 = max(min(ymin_int, image_height), 0) - y1 = max(min(ymax_int, image_height), 0) - img_mask[y0:y1, x0:x1] = cropped_mask[(y0 - ymin_int):(y1 - ymin_int), - (x0 - xmin_int):(x1 - xmin_int)] - - segms.append(img_mask) - - segms = np.array(segms) - return segms diff --git a/official/vision/evaluation/__init__.py b/official/vision/evaluation/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/vision/evaluation/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ diff --git a/official/vision/evaluation/coco_evaluator.py b/official/vision/evaluation/coco_evaluator.py new file mode 100644 index 0000000000000000000000000000000000000000..00789abc768c9cbf9c223e5e4880cf03605b1810 --- /dev/null +++ b/official/vision/evaluation/coco_evaluator.py @@ -0,0 +1,336 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""The COCO-style evaluator. + +The following snippet demonstrates the use of the interfaces: + + evaluator = COCOEvaluator(...) + for _ in range(num_evals): + for _ in range(num_batches_per_eval): + predictions, groundtruths = predictor.predict(...) # pop a batch. + evaluator.update_state(groundtruths, predictions) + evaluator.result() # finish one full eval and reset states. + +See also: https://github.com/cocodataset/cocoapi/ +""" + +import atexit +import tempfile +# Import libraries +from absl import logging +import numpy as np +from pycocotools import cocoeval +import six +import tensorflow as tf + +from official.vision.evaluation import coco_utils + + +class COCOEvaluator(object): + """COCO evaluation metric class.""" + + def __init__(self, + annotation_file, + include_mask, + need_rescale_bboxes=True, + per_category_metrics=False): + """Constructs COCO evaluation class. + + The class provides the interface to COCO metrics. The + update_state() method takes detections from each image and stores them in + self._predictions.
The evaluate() method loads a JSON file in COCO annotation format + as the groundtruths and runs COCO evaluation. + + Args: + annotation_file: a JSON file that stores annotations of the eval dataset. + If `annotation_file` is None, groundtruth annotations will be loaded + from the dataloader. + include_mask: a boolean to indicate whether or not to include the mask + eval. + need_rescale_bboxes: If True, bboxes in `predictions` will be rescaled back + to absolute values (`image_info` is needed in this case). + per_category_metrics: Whether to return per-category metrics. + """ + if annotation_file: + if annotation_file.startswith('gs://'): + _, local_val_json = tempfile.mkstemp(suffix='.json') + tf.io.gfile.remove(local_val_json) + + tf.io.gfile.copy(annotation_file, local_val_json) + atexit.register(tf.io.gfile.remove, local_val_json) + else: + local_val_json = annotation_file + self._coco_gt = coco_utils.COCOWrapper( + eval_type=('mask' if include_mask else 'box'), + annotation_file=local_val_json) + self._annotation_file = annotation_file + self._include_mask = include_mask + self._per_category_metrics = per_category_metrics + self._metric_names = [ + 'AP', 'AP50', 'AP75', 'APs', 'APm', 'APl', 'ARmax1', 'ARmax10', + 'ARmax100', 'ARs', 'ARm', 'ARl' + ] + self._required_prediction_fields = [ + 'source_id', 'num_detections', 'detection_classes', 'detection_scores', + 'detection_boxes' + ] + self._need_rescale_bboxes = need_rescale_bboxes + if self._need_rescale_bboxes: + self._required_prediction_fields.append('image_info') + self._required_groundtruth_fields = [ + 'source_id', 'height', 'width', 'classes', 'boxes' + ] + if self._include_mask: + mask_metric_names = ['mask_' + x for x in self._metric_names] + self._metric_names.extend(mask_metric_names) + self._required_prediction_fields.extend(['detection_masks']) + self._required_groundtruth_fields.extend(['masks']) + + self.reset_states() + + @property + def name(self): + return 'coco_metric' + + def reset_states(self):
+ """Resets internal states for a fresh run.""" + self._predictions = {} + if not self._annotation_file: + self._groundtruths = {} + + def result(self): + """Evaluates detection results and resets the states.""" + metric_dict = self.evaluate() + # Cleans up the internal variables in order for a fresh eval next time. + self.reset_states() + return metric_dict + + def evaluate(self): + """Evaluates with detections from all images with COCO API. + + Returns: + metrics_dict: a dict mapping metric names to float32 values for the + coco-style box metrics (and mask metrics when `include_mask` is True). + """ + if not self._annotation_file: + logging.info('There is no annotation_file in COCOEvaluator.') + gt_dataset = coco_utils.convert_groundtruths_to_coco_dataset( + self._groundtruths) + coco_gt = coco_utils.COCOWrapper( + eval_type=('mask' if self._include_mask else 'box'), + gt_dataset=gt_dataset) + else: + logging.info('Using annotation file: %s', self._annotation_file) + coco_gt = self._coco_gt + coco_predictions = coco_utils.convert_predictions_to_coco_annotations( + self._predictions) + coco_dt = coco_gt.loadRes(predictions=coco_predictions) + image_ids = [ann['image_id'] for ann in coco_predictions] + + coco_eval = cocoeval.COCOeval(coco_gt, coco_dt, iouType='bbox') + coco_eval.params.imgIds = image_ids + coco_eval.evaluate() + coco_eval.accumulate() + coco_eval.summarize() + coco_metrics = coco_eval.stats + + if self._include_mask: + mcoco_eval = cocoeval.COCOeval(coco_gt, coco_dt, iouType='segm') + mcoco_eval.params.imgIds = image_ids + mcoco_eval.evaluate() + mcoco_eval.accumulate() + mcoco_eval.summarize() + mask_coco_metrics = mcoco_eval.stats + + if self._include_mask: + metrics = np.hstack((coco_metrics, mask_coco_metrics)) + else: + metrics = coco_metrics + + metrics_dict = {} + for i, name in enumerate(self._metric_names): + metrics_dict[name] = metrics[i].astype(np.float32) + + # Adds metrics per category.
+ if self._per_category_metrics: + metrics_dict.update(self._retrieve_per_category_metrics(coco_eval)) + + if self._include_mask: + metrics_dict.update(self._retrieve_per_category_metrics( + mcoco_eval, prefix='mask')) + + return metrics_dict + + def _retrieve_per_category_metrics(self, coco_eval, prefix=''): + """Retrieves per-category metrics and returns them in a dict. + + Args: + coco_eval: a cocoeval.COCOeval object containing evaluation data. + prefix: str, a string used to prefix metric names. + + Returns: + metrics_dict: A dictionary with per-category metrics. + """ + + metrics_dict = {} + if prefix: + prefix = prefix + ' ' + + if hasattr(coco_eval, 'category_stats'): + for category_index, category_id in enumerate(coco_eval.params.catIds): + if self._annotation_file: + coco_category = self._coco_gt.cats[category_id] + # if 'name' is available use it, otherwise use `id` + category_display_name = coco_category.get('name', category_id) + else: + category_display_name = category_id + + metrics_dict[prefix + 'Precision mAP ByCategory/{}'.format( + category_display_name + )] = coco_eval.category_stats[0][category_index].astype(np.float32) + metrics_dict[prefix + 'Precision mAP ByCategory@50IoU/{}'.format( + category_display_name + )] = coco_eval.category_stats[1][category_index].astype(np.float32) + metrics_dict[prefix + 'Precision mAP ByCategory@75IoU/{}'.format( + category_display_name + )] = coco_eval.category_stats[2][category_index].astype(np.float32) + metrics_dict[prefix + 'Precision mAP ByCategory (small) /{}'.format( + category_display_name + )] = coco_eval.category_stats[3][category_index].astype(np.float32) + metrics_dict[prefix + 'Precision mAP ByCategory (medium) /{}'.format( + category_display_name + )] = coco_eval.category_stats[4][category_index].astype(np.float32) + metrics_dict[prefix + 'Precision mAP ByCategory (large) /{}'.format( + category_display_name + )] = coco_eval.category_stats[5][category_index].astype(np.float32) +
metrics_dict[prefix + 'Recall AR@1 ByCategory/{}'.format( + category_display_name + )] = coco_eval.category_stats[6][category_index].astype(np.float32) + metrics_dict[prefix + 'Recall AR@10 ByCategory/{}'.format( + category_display_name + )] = coco_eval.category_stats[7][category_index].astype(np.float32) + metrics_dict[prefix + 'Recall AR@100 ByCategory/{}'.format( + category_display_name + )] = coco_eval.category_stats[8][category_index].astype(np.float32) + metrics_dict[prefix + 'Recall AR (small) ByCategory/{}'.format( + category_display_name + )] = coco_eval.category_stats[9][category_index].astype(np.float32) + metrics_dict[prefix + 'Recall AR (medium) ByCategory/{}'.format( + category_display_name + )] = coco_eval.category_stats[10][category_index].astype(np.float32) + metrics_dict[prefix + 'Recall AR (large) ByCategory/{}'.format( + category_display_name + )] = coco_eval.category_stats[11][category_index].astype(np.float32) + + return metrics_dict + + def _process_predictions(self, predictions): + image_scale = np.tile(predictions['image_info'][:, 2:3, :], (1, 1, 2)) + predictions['detection_boxes'] = ( + predictions['detection_boxes'].astype(np.float32)) + predictions['detection_boxes'] /= image_scale + if 'detection_outer_boxes' in predictions: + predictions['detection_outer_boxes'] = ( + predictions['detection_outer_boxes'].astype(np.float32)) + predictions['detection_outer_boxes'] /= image_scale + + def _convert_to_numpy(self, groundtruths, predictions): + """Converts tensors to numpy arrays.""" + if groundtruths: + labels = tf.nest.map_structure(lambda x: x.numpy(), groundtruths) + numpy_groundtruths = {} + for key, val in labels.items(): + if isinstance(val, tuple): + val = np.concatenate(val) + numpy_groundtruths[key] = val + else: + numpy_groundtruths = groundtruths + + if predictions: + outputs = tf.nest.map_structure(lambda x: x.numpy(), predictions) + numpy_predictions = {} + for key, val in outputs.items(): + if isinstance(val, tuple): + val =
np.concatenate(val) + numpy_predictions[key] = val + else: + numpy_predictions = predictions + + return numpy_groundtruths, numpy_predictions + + def update_state(self, groundtruths, predictions): + """Updates and aggregates detection results and groundtruth data. + + Args: + groundtruths: a dictionary of Tensors including the fields below. + See also different parsers under `../dataloader` for more details. + Required fields: + - source_id: a numpy array of int or string of shape [batch_size]. + - height: a numpy array of int of shape [batch_size]. + - width: a numpy array of int of shape [batch_size]. + - num_detections: a numpy array of int of shape [batch_size]. + - boxes: a numpy array of float of shape [batch_size, K, 4]. + - classes: a numpy array of int of shape [batch_size, K]. + Optional fields: + - is_crowds: a numpy array of int of shape [batch_size, K]. If the + field is absent, it is assumed that this instance is not a crowd. + - areas: a numpy array of float of shape [batch_size, K]. If the + field is absent, the area is calculated using either boxes or + masks depending on which one is available. + - masks: a numpy array of float of shape + [batch_size, K, mask_height, mask_width]. + predictions: a dictionary of tensors including the fields below. + See different parsers under `../dataloader` for more details. + Required fields: + - source_id: a numpy array of int or string of shape [batch_size]. + - image_info [if `need_rescale_bboxes` is True]: a numpy array of + float of shape [batch_size, 4, 2]. + - num_detections: a numpy array of + int of shape [batch_size]. + - detection_boxes: a numpy array of float of shape [batch_size, K, 4]. + - detection_classes: a numpy array of int of shape [batch_size, K]. + - detection_scores: a numpy array of float of shape [batch_size, K]. + Optional fields: + - detection_masks: a numpy array of float of shape + [batch_size, K, mask_height, mask_width].
+ Raises: + ValueError: if the required prediction or groundtruth fields are not + present in the incoming `predictions` or `groundtruths`. + """ + groundtruths, predictions = self._convert_to_numpy(groundtruths, + predictions) + for k in self._required_prediction_fields: + if k not in predictions: + raise ValueError( + 'Missing the required key `{}` in predictions!'.format(k)) + if self._need_rescale_bboxes: + self._process_predictions(predictions) + for k, v in six.iteritems(predictions): + if k not in self._predictions: + self._predictions[k] = [v] + else: + self._predictions[k].append(v) + + if not self._annotation_file: + assert groundtruths + for k in self._required_groundtruth_fields: + if k not in groundtruths: + raise ValueError( + 'Missing the required key `{}` in groundtruths!'.format(k)) + for k, v in six.iteritems(groundtruths): + if k not in self._groundtruths: + self._groundtruths[k] = [v] + else: + self._groundtruths[k].append(v) diff --git a/official/vision/evaluation/coco_utils.py b/official/vision/evaluation/coco_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..13f35735877e55841696a65760250acd6ccb6bbd --- /dev/null +++ b/official/vision/evaluation/coco_utils.py @@ -0,0 +1,400 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
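`update_state` checks the required keys. It then appends each batch's value to a per-key list, and `result()` evaluates and resets. The plain-Python sketch below illustrates only that bookkeeping pattern, using `collections.defaultdict` in place of the explicit `if k not in ...` branch. `DictListAccumulator` is a hypothetical helper and not part of the evaluator.

```python
from collections import defaultdict


class DictListAccumulator:
    """Hypothetical helper mirroring the update_state()/result() bookkeeping."""

    def __init__(self, required_keys):
        self._required_keys = list(required_keys)
        self.reset_states()

    def reset_states(self):
        self._store = defaultdict(list)

    def update_state(self, batch):
        # Validate first so that a bad batch leaves the accumulated state untouched.
        for k in self._required_keys:
            if k not in batch:
                raise ValueError(f"Missing the required key `{k}`!")
        for k, v in batch.items():
            self._store[k].append(v)  # one list entry per batch

    def result(self):
        out = dict(self._store)
        self.reset_states()  # like result(): hand back the data, then reset
        return out


acc = DictListAccumulator(["source_id", "detection_scores"])
acc.update_state({"source_id": [1, 2], "detection_scores": [0.9, 0.4]})
acc.update_state({"source_id": [3], "detection_scores": [0.7]})
print(acc.result())  # {'source_id': [[1, 2], [3]], 'detection_scores': [[0.9, 0.4], [0.7]]}
```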
+ +"""Util functions related to pycocotools and COCO eval.""" + +import copy +import json + +# Import libraries + +from absl import logging +import numpy as np +from PIL import Image +from pycocotools import coco +from pycocotools import mask as mask_api +import six +import tensorflow as tf + +from official.common import dataset_fn +from official.vision.dataloaders import tf_example_decoder +from official.vision.ops import box_ops +from official.vision.ops import mask_ops + + +class COCOWrapper(coco.COCO): + """COCO wrapper class. + + This class wraps the COCO API object and provides the following additional + functionalities: + 1. Support string type image id. + 2. Support loading the groundtruth dataset using the external annotation + dictionary. + 3. Support loading the prediction results using the external annotation + dictionary. + """ + + def __init__(self, eval_type='box', annotation_file=None, gt_dataset=None): + """Instantiates a COCO-style API object. + + Args: + eval_type: either 'box' or 'mask'. + annotation_file: a JSON file that stores annotations of the eval dataset. + This is required if `gt_dataset` is not provided. + gt_dataset: the groundtruth eval dataset in COCO API format. + """ + if ((annotation_file and gt_dataset) or + ((not annotation_file) and (not gt_dataset))): + raise ValueError('One and only one of `annotation_file` and `gt_dataset` ' + 'needs to be specified.') + + if eval_type not in ['box', 'mask']: + raise ValueError('The `eval_type` can only be either `box` or `mask`.') + + coco.COCO.__init__(self, annotation_file=annotation_file) + self._eval_type = eval_type + if gt_dataset: + self.dataset = gt_dataset + self.createIndex() + + def loadRes(self, predictions): + """Loads result file and returns a result API object. + + Args: + predictions: a list of dictionaries, each representing an annotation in COCO + format. The required fields are `image_id`, `category_id`, `score`, + `bbox`, `segmentation`.
+ + Returns: + res: result COCO API object. + + Raises: + ValueError: if the set of image ids from predictions is not a subset of + the set of image ids of the groundtruth dataset. + """ + res = coco.COCO() + res.dataset['images'] = copy.deepcopy(self.dataset['images']) + res.dataset['categories'] = copy.deepcopy(self.dataset['categories']) + + image_ids = [ann['image_id'] for ann in predictions] + if set(image_ids) != (set(image_ids) & set(self.getImgIds())): + raise ValueError('Results do not correspond to the current dataset!') + for ann in predictions: + x1, x2, y1, y2 = [ann['bbox'][0], ann['bbox'][0] + ann['bbox'][2], + ann['bbox'][1], ann['bbox'][1] + ann['bbox'][3]] + if self._eval_type == 'box': + ann['area'] = ann['bbox'][2] * ann['bbox'][3] + ann['segmentation'] = [ + [x1, y1, x1, y2, x2, y2, x2, y1]] + elif self._eval_type == 'mask': + ann['area'] = mask_api.area(ann['segmentation']) + + res.dataset['annotations'] = copy.deepcopy(predictions) + res.createIndex() + return res + + +def convert_predictions_to_coco_annotations(predictions): + """Converts a batch of predictions to annotations in COCO format. + + Args: + predictions: a dictionary of lists of numpy arrays including the following + fields. K below denotes the maximum number of instances per image. + Required fields: + - source_id: a list of numpy arrays of int or string of shape + [batch_size]. + - num_detections: a list of numpy arrays of int of shape [batch_size]. + - detection_boxes: a list of numpy arrays of float of shape + [batch_size, K, 4], where coordinates are in the original image + space (not the scaled image space). + - detection_classes: a list of numpy arrays of int of shape + [batch_size, K]. + - detection_scores: a list of numpy arrays of float of shape + [batch_size, K]. + Optional fields: + - detection_masks: a list of numpy arrays of float of shape + [batch_size, K, mask_height, mask_width]. + + Returns: + coco_predictions: predictions in COCO annotation format.
+ """ + coco_predictions = [] + num_batches = len(predictions['source_id']) + max_num_detections = predictions['detection_classes'][0].shape[1] + use_outer_box = 'detection_outer_boxes' in predictions + for i in range(num_batches): + predictions['detection_boxes'][i] = box_ops.yxyx_to_xywh( + predictions['detection_boxes'][i]) + if use_outer_box: + predictions['detection_outer_boxes'][i] = box_ops.yxyx_to_xywh( + predictions['detection_outer_boxes'][i]) + mask_boxes = predictions['detection_outer_boxes'] + else: + mask_boxes = predictions['detection_boxes'] + + batch_size = predictions['source_id'][i].shape[0] + for j in range(batch_size): + if 'detection_masks' in predictions: + image_masks = mask_ops.paste_instance_masks( + predictions['detection_masks'][i][j], + mask_boxes[i][j], + int(predictions['image_info'][i][j, 0, 0]), + int(predictions['image_info'][i][j, 0, 1])) + binary_masks = (image_masks > 0.0).astype(np.uint8) + encoded_masks = [ + mask_api.encode(np.asfortranarray(binary_mask)) + for binary_mask in list(binary_masks)] + for k in range(max_num_detections): + ann = {} + ann['image_id'] = predictions['source_id'][i][j] + ann['category_id'] = predictions['detection_classes'][i][j, k] + ann['bbox'] = predictions['detection_boxes'][i][j, k] + ann['score'] = predictions['detection_scores'][i][j, k] + if 'detection_masks' in predictions: + ann['segmentation'] = encoded_masks[k] + coco_predictions.append(ann) + + for i, ann in enumerate(coco_predictions): + ann['id'] = i + 1 + + return coco_predictions + + +def convert_groundtruths_to_coco_dataset(groundtruths, label_map=None): + """Converts groundtruths to the dataset in COCO format. + + Args: + groundtruths: a dictionary of lists of numpy arrays including the fields below. + Note that each element in the list represents the number for a single + example without batch dimension. K below denotes the actual number of + instances for each image.
+ Required fields: + - source_id: a list of numpy arrays of int or string of shape + [batch_size]. + - height: a list of numpy arrays of int of shape [batch_size]. + - width: a list of numpy arrays of int of shape [batch_size]. + - num_detections: a list of numpy arrays of int of shape [batch_size]. + - boxes: a list of numpy arrays of float of shape [batch_size, K, 4], + where coordinates are in the original image space (not the + normalized coordinates). + - classes: a list of numpy arrays of int of shape [batch_size, K]. + Optional fields: + - is_crowds: a list of numpy arrays of int of shape [batch_size, K]. If + the field is absent, it is assumed that the instance is not a crowd. + - areas: a list of numpy arrays of float of shape [batch_size, K]. If the + field is absent, the area is calculated using either boxes or + masks depending on which one is available. + - masks: a list of numpy arrays of string (PNG bytes) of shape [batch_size, K]. + label_map: (optional) a dictionary that maps each category id + to its category name. If `None`, the category ids are collected from the + `groundtruths`. + + Returns: + coco_groundtruths: the groundtruth dataset in COCO format. + """ + source_ids = np.concatenate(groundtruths['source_id'], axis=0) + heights = np.concatenate(groundtruths['height'], axis=0) + widths = np.concatenate(groundtruths['width'], axis=0) + gt_images = [{'id': int(i), 'height': int(h), 'width': int(w)} for i, h, w + in zip(source_ids, heights, widths)] + + gt_annotations = [] + num_batches = len(groundtruths['source_id']) + for i in range(num_batches): + logging.info( + 'convert_groundtruths_to_coco_dataset: Processing annotation %d', i) + max_num_instances = groundtruths['classes'][i].shape[1] + batch_size = groundtruths['source_id'][i].shape[0] + for j in range(batch_size): + num_instances = groundtruths['num_detections'][i][j] + if num_instances > max_num_instances: + logging.warning( + 'num_groundtruths is larger than max_num_instances, %d vs.
%d', + num_instances, max_num_instances) + num_instances = max_num_instances + for k in range(int(num_instances)): + ann = {} + ann['image_id'] = int(groundtruths['source_id'][i][j]) + if 'is_crowds' in groundtruths: + ann['iscrowd'] = int(groundtruths['is_crowds'][i][j, k]) + else: + ann['iscrowd'] = 0 + ann['category_id'] = int(groundtruths['classes'][i][j, k]) + boxes = groundtruths['boxes'][i] + ann['bbox'] = [ + float(boxes[j, k, 1]), + float(boxes[j, k, 0]), + float(boxes[j, k, 3] - boxes[j, k, 1]), + float(boxes[j, k, 2] - boxes[j, k, 0])] + if 'areas' in groundtruths: + ann['area'] = float(groundtruths['areas'][i][j, k]) + else: + ann['area'] = float( + (boxes[j, k, 3] - boxes[j, k, 1]) * + (boxes[j, k, 2] - boxes[j, k, 0])) + if 'masks' in groundtruths: + if isinstance(groundtruths['masks'][i][j, k], tf.Tensor): + mask = Image.open( + six.BytesIO(groundtruths['masks'][i][j, k].numpy())) + width, height = mask.size + np_mask = ( + np.array(mask.getdata()).reshape(height, + width).astype(np.uint8)) + else: + mask = Image.open( + six.BytesIO(groundtruths['masks'][i][j, k])) + width, height = mask.size + np_mask = ( + np.array(mask.getdata()).reshape(height, + width).astype(np.uint8)) + np_mask[np_mask > 0] = 255 + encoded_mask = mask_api.encode(np.asfortranarray(np_mask)) + ann['segmentation'] = encoded_mask + # Ensure the content of `counts` is a JSON-serializable string.
+ if 'counts' in ann['segmentation']: + ann['segmentation']['counts'] = six.ensure_str( + ann['segmentation']['counts']) + if 'areas' not in groundtruths: + ann['area'] = mask_api.area(encoded_mask) + gt_annotations.append(ann) + + for i, ann in enumerate(gt_annotations): + ann['id'] = i + 1 + + if label_map: + gt_categories = [{'id': i, 'name': label_map[i]} for i in label_map] + else: + category_ids = [gt['category_id'] for gt in gt_annotations] + gt_categories = [{'id': i} for i in set(category_ids)] + + gt_dataset = { + 'images': gt_images, + 'categories': gt_categories, + 'annotations': copy.deepcopy(gt_annotations), + } + return gt_dataset + + +class COCOGroundtruthGenerator: + """Generates the groundtruth annotations from a single example.""" + + def __init__(self, file_pattern, file_type, num_examples, include_mask, + regenerate_source_id=False): + self._file_pattern = file_pattern + self._num_examples = num_examples + self._include_mask = include_mask + self._dataset_fn = dataset_fn.pick_dataset_fn(file_type) + self._regenerate_source_id = regenerate_source_id + + def _parse_single_example(self, example): + """Parses a single serialized tf.Example proto. + + Args: + example: a serialized tf.Example proto string. + + Returns: + A dictionary of groundtruth with the following fields: + source_id: a scalar tensor of int64 representing the image source_id. + height: a scalar tensor of int64 representing the image height. + width: a scalar tensor of int64 representing the image width. + boxes: a float tensor of shape [K, 4], representing the groundtruth + boxes in absolute coordinates with respect to the original image size. + classes: an int64 tensor of shape [K], representing the class labels of + each instance. + is_crowds: a bool tensor of shape [K], indicating whether the instance + is a crowd. + areas: a float tensor of shape [K], indicating the area of each + instance.
+ masks: a string tensor of shape [K], containing the bytes of the PNG + mask of each instance. + """ + decoder = tf_example_decoder.TfExampleDecoder( + include_mask=self._include_mask, + regenerate_source_id=self._regenerate_source_id) + decoded_tensors = decoder.decode(example) + + image = decoded_tensors['image'] + image_size = tf.shape(image)[0:2] + boxes = box_ops.denormalize_boxes( + decoded_tensors['groundtruth_boxes'], image_size) + + source_id = decoded_tensors['source_id'] + if source_id.dtype is tf.string: + source_id = tf.strings.to_number(source_id, out_type=tf.int64) + + groundtruths = { + 'source_id': source_id, + 'height': decoded_tensors['height'], + 'width': decoded_tensors['width'], + 'num_detections': tf.shape(decoded_tensors['groundtruth_classes'])[0], + 'boxes': boxes, + 'classes': decoded_tensors['groundtruth_classes'], + 'is_crowds': decoded_tensors['groundtruth_is_crowd'], + 'areas': decoded_tensors['groundtruth_area'], + } + if self._include_mask: + groundtruths.update({ + 'masks': decoded_tensors['groundtruth_instance_masks_png'], + }) + return groundtruths + + def _build_pipeline(self): + """Builds a data pipeline to generate groundtruth annotations.""" + dataset = tf.data.Dataset.list_files(self._file_pattern, shuffle=False) + dataset = dataset.interleave( + map_func=lambda filename: self._dataset_fn(filename).prefetch(1), + cycle_length=None, + num_parallel_calls=tf.data.experimental.AUTOTUNE) + + dataset = dataset.take(self._num_examples) + dataset = dataset.map(self._parse_single_example, + num_parallel_calls=tf.data.experimental.AUTOTUNE) + dataset = dataset.batch(1, drop_remainder=False) + dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE) + return dataset + + def __call__(self): + return self._build_pipeline() + + +def scan_and_generator_annotation_file(file_pattern: str, + file_type: str, + num_samples: int, + include_mask: bool, + annotation_file: str, + regenerate_source_id: bool = False): + """Scans and generates the
COCO-style annotation JSON file given a dataset.""" + groundtruth_generator = COCOGroundtruthGenerator( + file_pattern, file_type, num_samples, include_mask, regenerate_source_id) + generate_annotation_file(groundtruth_generator, annotation_file) + + +def generate_annotation_file(groundtruth_generator, + annotation_file): + """Generates COCO-style annotation JSON file given a groundtruth generator.""" + groundtruths = {} + logging.info('Loading groundtruth annotations from dataset to memory...') + for i, groundtruth in enumerate(groundtruth_generator()): + logging.info('generate_annotation_file: Processing annotation %d', i) + for k, v in six.iteritems(groundtruth): + if k not in groundtruths: + groundtruths[k] = [v] + else: + groundtruths[k].append(v) + gt_dataset = convert_groundtruths_to_coco_dataset(groundtruths) + + logging.info('Saving groundtruth annotations to the JSON file...') + with tf.io.gfile.GFile(annotation_file, 'w') as f: + f.write(json.dumps(gt_dataset)) + logging.info('Done saving the JSON file...') diff --git a/official/vision/beta/evaluation/coco_utils_test.py b/official/vision/evaluation/coco_utils_test.py similarity index 89% rename from official/vision/beta/evaluation/coco_utils_test.py rename to official/vision/evaluation/coco_utils_test.py index 6134031fae016c65824a1c54352cfb9014d69895..0c8d2c91d54bee2d1dff2f4d7635d69da113b8a5 100644 --- a/official/vision/beta/evaluation/coco_utils_test.py +++ b/official/vision/evaluation/coco_utils_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -18,8 +18,8 @@ import os import tensorflow as tf -from official.vision.beta.dataloaders import tfexample_utils -from official.vision.beta.evaluation import coco_utils +from official.vision.dataloaders import tfexample_utils +from official.vision.evaluation import coco_utils class CocoUtilsTest(tf.test.TestCase): diff --git a/official/vision/evaluation/iou.py b/official/vision/evaluation/iou.py new file mode 100644 index 0000000000000000000000000000000000000000..e662e77d6bad952b652fce69cc7bdc34a97193b8 --- /dev/null +++ b/official/vision/evaluation/iou.py @@ -0,0 +1,58 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""IOU Metrics used for semantic segmentation models.""" + +import tensorflow as tf + + +class PerClassIoU(tf.keras.metrics.MeanIoU): + """Computes the per-class Intersection-Over-Union metric. + + This metric computes the IOU for each semantic class. + IOU is defined as follows: + IOU = true_positive / (true_positive + false_positive + false_negative). + The predictions are accumulated in a confusion matrix, weighted by + `sample_weight` and the metric is then calculated from it. + + If `sample_weight` is `None`, weights default to 1. + Use `sample_weight` of 0 to mask values. 
+ + Example: + + >>> # cm = [[1, 1], + >>> # [1, 1]] + >>> # sum_row = [2, 2], sum_col = [2, 2], true_positives = [1, 1] + >>> # iou = true_positives / (sum_row + sum_col - true_positives) + >>> # result = [1 / (2 + 2 - 1), 1 / (2 + 2 - 1)] = [0.33, 0.33] + >>> m = PerClassIoU(num_classes=2) + >>> m.update_state([0, 0, 1, 1], [0, 1, 0, 1]) + >>> m.result().numpy() + [0.33333334, 0.33333334] + """ + + def result(self): + """Compute IoU for each class via the confusion matrix.""" + sum_over_row = tf.cast( + tf.reduce_sum(self.total_cm, axis=0), dtype=self._dtype) + sum_over_col = tf.cast( + tf.reduce_sum(self.total_cm, axis=1), dtype=self._dtype) + true_positives = tf.cast( + tf.linalg.tensor_diag_part(self.total_cm), dtype=self._dtype) + + # sum_over_row + sum_over_col = + # 2 * true_positives + false_positives + false_negatives. + denominator = sum_over_row + sum_over_col - true_positives + + return tf.math.divide_no_nan(true_positives, denominator) diff --git a/official/vision/beta/evaluation/iou_test.py b/official/vision/evaluation/iou_test.py similarity index 97% rename from official/vision/beta/evaluation/iou_test.py rename to official/vision/evaluation/iou_test.py index 64def868552020a777e381791a72b22282b3f8e5..370039426d89600596de0aeb34aff55d112740d5 100644 --- a/official/vision/beta/evaluation/iou_test.py +++ b/official/vision/evaluation/iou_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
@@ -16,7 +16,7 @@ import tensorflow as tf -from official.vision.beta.evaluation import iou +from official.vision.evaluation import iou class MeanIoUTest(tf.test.TestCase): diff --git a/official/vision/beta/evaluation/panoptic_quality.py b/official/vision/evaluation/panoptic_quality.py similarity index 99% rename from official/vision/beta/evaluation/panoptic_quality.py rename to official/vision/evaluation/panoptic_quality.py index aba0a2330c4dc13d90cc6c613c8f8dd886cbdcee..581ab36fff0185b79be38f2c11c72686d5a321d2 100644 --- a/official/vision/beta/evaluation/panoptic_quality.py +++ b/official/vision/evaluation/panoptic_quality.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -277,12 +277,12 @@ class PanopticQuality: np.sum(in_category_set.astype(np.int32)), }) else: - results[category_set_name] = { + results.update({ f'{category_set_name}_pq': 0., f'{category_set_name}_sq': 0., f'{category_set_name}_rq': 0., f'{category_set_name}_num_categories': 0 - } + }) return results diff --git a/official/vision/beta/evaluation/panoptic_quality_evaluator.py b/official/vision/evaluation/panoptic_quality_evaluator.py similarity index 91% rename from official/vision/beta/evaluation/panoptic_quality_evaluator.py rename to official/vision/evaluation/panoptic_quality_evaluator.py index 6425c6883514458223414be8cc90e10976b08634..6bcb02f0bc804016cdc6b82dc70391f2d7f5eea3 100644 --- a/official/vision/beta/evaluation/panoptic_quality_evaluator.py +++ b/official/vision/evaluation/panoptic_quality_evaluator.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -29,7 +29,7 @@ See also: https://github.com/cocodataset/cocoapi/ import numpy as np import tensorflow as tf -from official.vision.beta.evaluation import panoptic_quality +from official.vision.evaluation import panoptic_quality def _crop_padding(mask, image_info): @@ -181,4 +181,14 @@ class PanopticQualityEvaluator: self._pq_metric_module.compare_and_accumulate( groundtruths_, predictions_) else: - self._pq_metric_module.compare_and_accumulate(groundtruths, predictions) + for idx in range(len(groundtruths['category_mask'])): + groundtruths_ = { + 'category_mask': groundtruths['category_mask'][idx], + 'instance_mask': groundtruths['instance_mask'][idx] + } + predictions_ = { + 'category_mask': predictions['category_mask'][idx], + 'instance_mask': predictions['instance_mask'][idx] + } + self._pq_metric_module.compare_and_accumulate(groundtruths_, + predictions_) diff --git a/official/vision/beta/evaluation/panoptic_quality_evaluator_test.py b/official/vision/evaluation/panoptic_quality_evaluator_test.py similarity index 96% rename from official/vision/beta/evaluation/panoptic_quality_evaluator_test.py rename to official/vision/evaluation/panoptic_quality_evaluator_test.py index 9490da65fdd8ff17dc955ec938f3c334418c7480..b9d1454d01daa661da67638f9703e516bf458dc2 100644 --- a/official/vision/beta/evaluation/panoptic_quality_evaluator_test.py +++ b/official/vision/evaluation/panoptic_quality_evaluator_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -17,7 +17,7 @@ import numpy as np import tensorflow as tf -from official.vision.beta.evaluation import panoptic_quality_evaluator +from official.vision.evaluation import panoptic_quality_evaluator class PanopticQualityEvaluatorTest(tf.test.TestCase): diff --git a/official/vision/beta/evaluation/panoptic_quality_test.py b/official/vision/evaluation/panoptic_quality_test.py similarity index 98% rename from official/vision/beta/evaluation/panoptic_quality_test.py rename to official/vision/evaluation/panoptic_quality_test.py index 078ec5f1a57eb41cde98d7f34b6bc8f0dd6d41cb..95ad64e722ca05b239e569ae2bd184385cf95bde 100644 --- a/official/vision/beta/evaluation/panoptic_quality_test.py +++ b/official/vision/evaluation/panoptic_quality_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -22,7 +22,7 @@ https://github.com/tensorflow/models/blob/master/research/deeplab/evaluation/pan from absl.testing import absltest import numpy as np -from official.vision.beta.evaluation import panoptic_quality +from official.vision.evaluation import panoptic_quality class PanopticQualityTest(absltest.TestCase): diff --git a/official/vision/evaluation/segmentation_metrics.py b/official/vision/evaluation/segmentation_metrics.py new file mode 100644 index 0000000000000000000000000000000000000000..cfdaf45c8bcf82c8104973ca6966ae229752d9aa --- /dev/null +++ b/official/vision/evaluation/segmentation_metrics.py @@ -0,0 +1,178 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Metrics for segmentation.""" + +import tensorflow as tf + +from official.vision.ops import box_ops +from official.vision.ops import spatial_transform_ops + + +class MeanIoU(tf.keras.metrics.MeanIoU): + """Mean IoU metric for semantic segmentation. + + This class utilizes tf.keras.metrics.MeanIoU to perform batched mean IoU when + both input images and groundtruth masks are resized to the same size + (rescale_predictions=False). It also computes mean IoU at the original groundtruth + sizes, in which case each prediction is rescaled back to the original image + size. + """ + + def __init__(self, + num_classes, + rescale_predictions=False, + name=None, + dtype=None): + """Constructs the segmentation mean IoU metric. + + Args: + num_classes: `int`, number of classes. + rescale_predictions: `bool`, whether to scale back predictions to original + image sizes. If True, y_true['image_info'] is used to rescale + predictions. + name: `str`, name of the metric instance. + dtype: data type of the metric result. + """ + self._rescale_predictions = rescale_predictions + super().__init__(num_classes=num_classes, name=name, dtype=dtype) + + def update_state(self, y_true, y_pred): + """Updates metric state. + + Args: + y_true: `dict`, dictionary with the following keys and values. + - masks: [batch, height, width, 1], groundtruth masks. + - valid_masks: [batch, height, width, 1], valid elements in the mask. + - image_info: [batch, 4, 2], a tensor that holds information about + original and preprocessed images.
Each entry is in the format of + [[original_height, original_width], [input_height, input_width], + [y_scale, x_scale], [y_offset, x_offset]], where [input_height, + input_width] is the actual scaled image size, and [y_scale, x_scale] + is the scaling factor, which is the ratio of scaled dimension / + original dimension. + y_pred: Tensor [batch, height_p, width_p, num_classes], predicted masks. + """ + predictions = y_pred + masks = y_true['masks'] + valid_masks = y_true['valid_masks'] + images_info = y_true['image_info'] + + if isinstance(predictions, tuple) or isinstance(predictions, list): + predictions = tf.concat(predictions, axis=0) + masks = tf.concat(masks, axis=0) + valid_masks = tf.concat(valid_masks, axis=0) + images_info = tf.concat(images_info, axis=0) + + # Ignored mask elements are set to zero for the argmax op. + masks = tf.where(valid_masks, masks, tf.zeros_like(masks)) + masks_size = tf.shape(masks)[1:3] + + if self._rescale_predictions: + # Scale back predictions to original image shapes and pad to mask size. + # Note: instead of cropping the masks to image shape (dynamic), here we + # pad the rescaled predictions to mask size (fixed), and update the + # valid_masks to mask out the pixels outside the original image shape. + predictions, image_shape_masks = _rescale_and_pad_predictions( + predictions, images_info, output_size=masks_size) + # Only the area within the original image shape is valid.
+ # (batch_size, height, width, 1) + valid_masks = tf.cast(valid_masks, tf.bool) & tf.expand_dims( + image_shape_masks, axis=-1) + else: + predictions = tf.image.resize( + predictions, masks_size, method=tf.image.ResizeMethod.BILINEAR) + + predictions = tf.argmax(predictions, axis=3) + flatten_predictions = tf.reshape(predictions, shape=[-1]) + flatten_masks = tf.reshape(masks, shape=[-1]) + flatten_valid_masks = tf.reshape(valid_masks, shape=[-1]) + + super().update_state( + y_true=flatten_masks, + y_pred=flatten_predictions, + sample_weight=tf.cast(flatten_valid_masks, tf.float32)) + + +class PerClassIoU(MeanIoU): + """Per-class IoU metric for semantic segmentation.""" + + def result(self): + """Compute IoU for each class via the confusion matrix.""" + sum_over_row = tf.cast( + tf.reduce_sum(self.total_cm, axis=0), dtype=self._dtype) + sum_over_col = tf.cast( + tf.reduce_sum(self.total_cm, axis=1), dtype=self._dtype) + true_positives = tf.cast( + tf.linalg.tensor_diag_part(self.total_cm), dtype=self._dtype) + + # sum_over_row + sum_over_col = + # 2 * true_positives + false_positives + false_negatives. + denominator = sum_over_row + sum_over_col - true_positives + + return tf.math.divide_no_nan(true_positives, denominator) + + +def _rescale_and_pad_predictions(predictions, images_info, output_size): + """Scales back predictions to original image shapes and pads to output size. + + Args: + predictions: A tensor in shape [batch, height, width, num_classes] which + stores the model predictions. + images_info: A tensor in shape [batch, 4, 2] that holds information about + original and preprocessed images. Each entry is in the format of + [[original_height, original_width], [input_height, input_width], [y_scale, + x_scale], [y_offset, x_offset]], where [input_height, input_width] is + the actual scaled image size, and [y_scale, x_scale] is the scaling + factor, which is the ratio of scaled dimension / original dimension.
+ output_size: A list/tuple/tensor that stores the size of the padded output in + [output_height, output_width]. + + Returns: + predictions: A tensor in shape [batch, output_height, output_width, + num_classes] which stores the rescaled and padded predictions. + image_shape_masks: A bool tensor in shape [batch, output_height, + output_width] where the pixels inside the original image shape are true, + otherwise false. + """ + # (batch_size, 2) + image_shape = tf.cast(images_info[:, 0, :], tf.int32) + desired_size = tf.cast(images_info[:, 1, :], tf.float32) + image_scale = tf.cast(images_info[:, 2, :], tf.float32) + offset = tf.cast(images_info[:, 3, :], tf.int32) + rescale_size = tf.cast(tf.math.ceil(desired_size / image_scale), tf.int32) + + # Rescale the predictions, then crop to the original image shape and + # finally pad zeros to match the mask size. + predictions = ( + spatial_transform_ops.bilinear_resize_with_crop_and_pad( + predictions, + rescale_size, + crop_offset=offset, + crop_size=image_shape, + output_size=output_size)) + + # (batch_size, 2) + y0_x0 = tf.broadcast_to( + tf.constant([[0, 0]], dtype=image_shape.dtype), tf.shape(image_shape)) + # (batch_size, 4) + image_shape_bbox = tf.concat([y0_x0, image_shape], axis=1) + # (batch_size, height, width) + image_shape_masks = box_ops.bbox2mask( + bbox=image_shape_bbox, + image_height=output_size[0], + image_width=output_size[1], + dtype=tf.bool) + + return predictions, image_shape_masks diff --git a/official/vision/evaluation/segmentation_metrics_test.py b/official/vision/evaluation/segmentation_metrics_test.py new file mode 100644 index 0000000000000000000000000000000000000000..ac91cdad475060798532f8e722f3e95d32fa5d82 --- /dev/null +++ b/official/vision/evaluation/segmentation_metrics_test.py @@ -0,0 +1,73 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for segmentation_metrics.""" + +from absl.testing import parameterized +import tensorflow as tf + +from official.vision.evaluation import segmentation_metrics + + +class SegmentationMetricsTest(parameterized.TestCase, tf.test.TestCase): + + def _create_test_data(self): + y_pred_cls0 = tf.constant([[1, 1, 0], [1, 1, 0], [0, 0, 0]], + dtype=tf.uint16)[tf.newaxis, :, :, tf.newaxis] + y_pred_cls1 = tf.constant([[0, 0, 0], [0, 0, 1], [0, 0, 1]], + dtype=tf.uint16)[tf.newaxis, :, :, tf.newaxis] + y_pred = tf.concat((y_pred_cls0, y_pred_cls1), axis=-1) + + y_true = { + 'masks': + tf.constant( + [[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], + [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1]], + dtype=tf.uint16)[tf.newaxis, :, :, tf.newaxis], + 'valid_masks': + tf.ones([1, 6, 6, 1], dtype=tf.bool), + 'image_info': + tf.constant([[[6, 6], [3, 3], [0.5, 0.5], [0, 0]]], + dtype=tf.float32) + } + return y_pred, y_true + + @parameterized.parameters(True, False) + def test_mean_iou_metric(self, rescale_predictions): + tf.config.experimental_run_functions_eagerly(True) + mean_iou_metric = segmentation_metrics.MeanIoU( + num_classes=2, rescale_predictions=rescale_predictions) + y_pred, y_true = self._create_test_data() + # Disable autograph for correct coverage statistics. 
+ update_fn = tf.autograph.experimental.do_not_convert( + mean_iou_metric.update_state) + update_fn(y_true=y_true, y_pred=y_pred) + miou = mean_iou_metric.result() + self.assertAlmostEqual(miou.numpy(), 0.762, places=3) + + @parameterized.parameters(True, False) + def test_per_class_mean_iou_metric(self, rescale_predictions): + per_class_iou_metric = segmentation_metrics.PerClassIoU( + num_classes=2, rescale_predictions=rescale_predictions) + y_pred, y_true = self._create_test_data() + # Disable autograph for correct coverage statistics. + update_fn = tf.autograph.experimental.do_not_convert( + per_class_iou_metric.update_state) + update_fn(y_true=y_true, y_pred=y_pred) + per_class_miou = per_class_iou_metric.result() + self.assertAllClose(per_class_miou.numpy(), [0.857, 0.667], atol=1e-3) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/beta/evaluation/wod_detection_evaluator.py b/official/vision/evaluation/wod_detection_evaluator.py similarity index 97% rename from official/vision/beta/evaluation/wod_detection_evaluator.py rename to official/vision/evaluation/wod_detection_evaluator.py index f4dd7024e01a31b790c0d42025e090eda30e7d1d..61e51ea70d5b38e1ff00136701837b02a036a17c 100644 --- a/official/vision/beta/evaluation/wod_detection_evaluator.py +++ b/official/vision/evaluation/wod_detection_evaluator.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -17,7 +17,7 @@ import pprint from absl import logging import tensorflow as tf -from official.vision.beta.ops import box_ops +from official.vision.ops import box_ops from waymo_open_dataset import label_pb2 from waymo_open_dataset.metrics.python import wod_detection_evaluator from waymo_open_dataset.protos import breakdown_pb2 @@ -148,7 +148,7 @@ class WOD2dDetectionEvaluator(wod_detection_evaluator.WODDetectionEvaluator): def evaluate(self): """Compute the final metrics.""" - ap, _, _, _, _ = super().evaluate() + ap, _, _, _, _, _, _ = super().evaluate() metric_dict = {} for i, name in enumerate(self._breakdown_names): # Skip sign metrics in 2d detection task. diff --git a/official/vision/examples/starter/README.md b/official/vision/examples/starter/README.md new file mode 100644 index 0000000000000000000000000000000000000000..b375843cf16e164bb57d367642fa23197a2ccca8 --- /dev/null +++ b/official/vision/examples/starter/README.md @@ -0,0 +1,214 @@ +# TF Vision Example Project + +This is a minimal example project to demonstrate how to use TF Model Garden's +building blocks to implement a new vision project from scratch. + +Below we use classification as an example. We will walk you through the process +of creating a new project that leverages existing components, such as tasks, data +loaders, models, etc. You will get a better understanding of these components by +going through the process. You can also refer to the docstrings of the corresponding +components for more information. + +## Create Model + +In +[example_model.py](example_model.py), +we show how to create a new model. The `ExampleModel` is a subclass of +`tf.keras.Model` that defines the necessary parameters.
Here, you need to have +`input_specs` to specify the input shape and dimensions, and build layers within +the constructor: + +```python +class ExampleModel(tf.keras.Model): +  def __init__( +      self, +      num_classes: int, +      input_specs: tf.keras.layers.InputSpec = tf.keras.layers.InputSpec( +          shape=[None, None, None, 3]), +      **kwargs): +    # Build layers. +``` + +Given the `ExampleModel`, you can define a function that takes a model config as +input and returns an `ExampleModel` instance, similar to +[build_example_model](example_model.py#L80). +As a simple example, we define a single model. However, you can split the model +implementation into individual components, such as backbones, decoders, and heads, +as we do +[here](https://github.com/tensorflow/models/blob/master/official/vision/modeling). +Then, in the `build_example_model` function, you can hook up these components +together to obtain your full model. + +## Create Dataloader + +A dataloader reads, decodes, and parses the input data. We have created various +[dataloaders](https://github.com/tensorflow/models/blob/master/official/vision/dataloaders) +to handle standard input formats for classification, detection, and segmentation. +If you have non-standard or complex data, you may want to create your own +dataloader. It contains a `Decoder` and a `Parser`. + +- The +  [Decoder](example_input.py#L33) +  decodes a TF Example record and returns a dictionary of decoded tensors: + +  ```python +  class Decoder(decoder.Decoder): +    """A tf.Example decoder for classification task.""" +    def __init__(self): +      """Initializes the decoder. + +      The constructor defines the mapping between the field name and the value +      from an input tf.Example. For example, we define two fields for image bytes +      and labels. There is no limit on the number of fields to decode.
+      """ +      self._keys_to_features = { +          'image/encoded': +              tf.io.FixedLenFeature((), tf.string, default_value=''), +          'image/class/label': +              tf.io.FixedLenFeature((), tf.int64, default_value=-1) +      } +  ``` + +- The +  [Parser](example_input.py#L68) +  parses the decoded tensors and performs pre-processing on the input data, +  such as image decoding, augmentation, and resizing. It should have +  `_parse_train_data` and `_parse_eval_data` functions, which return the processed +  images and labels. + +## Create Config + +Next, you will define configs for your project. All configs are defined as +`dataclass` objects, and can have default parameter values. + +First, you will define your +[`ExampleDataConfig`](example_config.py#L27). +It inherits from `config_definitions.DataConfig`, which already defines a few +common fields, like `input_path`, `file_type`, `global_batch_size`, etc. You can +add more fields in your own config as needed. + +You can then define your model config +[`ExampleModel`](example_config.py#L39) +that inherits from `hyperparams.Config`. Expose your own model parameters here. + +You can then define your `Loss` and `Evaluation` configs. + +Next, you will put all the above configs into an +[`ExampleTask`](example_config.py#L56) +config. Here you list the configs for your data, model, loss, and evaluation, +etc. + +Finally, you can define a +[`tf_vision_example_experiment`](example_config.py#L66), +which creates a template for your experiments and fills it with default parameters. +These default parameter values can be overridden by a YAML file, like +[example_config_tpu.yaml](example_config_tpu.yaml). +Also, make sure you give your experiment template a unique name with the +decorator: + +```python +@exp_factory.register_config_factory('tf_vision_example_experiment') +def tf_vision_example_experiment() -> cfg.ExperimentConfig: +  """Definition of a full example experiment.""" +  # Create and return experiment template.
+``` + +## Create Task + +A task is a class that encapsulates the logic of loading data, building models, +performing one-step training and validation, etc. It connects all components +together and is called by the base +[Trainer](https://github.com/tensorflow/models/blob/master/official/core/base_trainer.py). + +You can create your own task by inheriting from the base +[Task](https://github.com/tensorflow/models/blob/master/official/core/base_task.py), +or from one of the +[tasks](https://github.com/tensorflow/models/blob/master/official/vision/tasks/) +we have already defined, if most of the operations can be reused. An `ExampleTask` +inheriting from the base `Task` (for comparison, see +[ImageClassificationTask](https://github.com/tensorflow/models/blob/master/official/vision/tasks/image_classification.py#L32)) +can be found +[here](example_task.py). +The important components of the task are described below. + +- `build_model`: you can instantiate a model you have defined above. It is +  also good practice to run a forward pass with a dummy input to ensure layers +  within the model are properly initialized. + +- `build_inputs`: here you can instantiate a Decoder object and a Parser +  object. They are used to create an `InputReader` that will generate a +  `tf.data.Dataset` object. + +- `build_losses`: it takes ground-truth labels and model outputs as input, and +  computes the loss. It will be called in `train_step` and `validation_step`. +  You can also define different losses for training and validation, for +  example, `build_train_losses` and `build_validation_losses`. Just make sure +  they are called properly by the corresponding functions. + +- `build_metrics`: here you can define your own metrics. It should return a +  list of `tf.keras.metrics.Metric` objects. You can create your own metric +  class by subclassing `tf.keras.metrics.Metric`. + +- `train_step` and `validation_step`: they perform one-step training and +  validation.
They take one batch of training/validation data, run a forward +  pass, gather losses, and update metrics. They assume the data format is +  consistent with the `Parser` output. `train_step` also performs the +  backward pass to update model weights. + +## Import registry + +To use your custom dataloaders, models, tasks, etc., you will need to register +them properly. The recommended way is to have a single file with all relevant +files imported, for example, +[registry_imports.py](registry_imports.py). +You can see that in this file we import all our custom components: + +```python +# pylint: disable=unused-import +from official.common import registry_imports +from official.vision.examples.starter import example_config +from official.vision.examples.starter import example_input +from official.vision.examples.starter import example_model +from official.vision.examples.starter import example_task +``` + +## Training + +You can create your own trainer by branching from our core +[trainer](https://github.com/tensorflow/models/blob/master/official/vision/train.py). +Just make sure you import the registry like this: + +```python +from official.vision.examples.starter import registry_imports  # pylint: disable=unused-import +``` + +You can run training locally for testing purposes: + +```bash +# Assume you are under official/vision/examples. +python3 starter/train.py \ +  --experiment=tf_vision_example_experiment \ +  --config_file=${PWD}/starter/example_config_local.yaml \ +  --mode=train \ +  --model_dir=/tmp/tfvision_test/ +``` + +It can also run on Google Cloud using Cloud TPU. +[Here](https://cloud.google.com/tpu/docs/how-to) are the instructions for using +Cloud TPU, and here is a more detailed +[tutorial](https://cloud.google.com/tpu/docs/tutorials/resnet-rs-2.x) on +training a ResNet-RS model.
Follow the instructions to set up Cloud TPU, then +launch training with: + +```bash +EXP_TYPE=tf_vision_example_experiment  # This should match the registered name of your experiment template. +EXP_NAME=exp_001  # You can give any name to the experiment. +TPU_NAME=experiment01 +# Now launch the experiment. +python3 starter/train.py \ +  --experiment=$EXP_TYPE \ +  --mode=train \ +  --tpu=$TPU_NAME \ +  --model_dir=/tmp/tfvision_test/ \ +  --config_file=third_party/tensorflow_models/official/vision/examples/starter/example_config_tpu.yaml +``` diff --git a/official/vision/beta/projects/example/example_config.py b/official/vision/examples/starter/example_config.py similarity index 98% rename from official/vision/beta/projects/example/example_config.py rename to official/vision/examples/starter/example_config.py index 8750acf8e30254b4afd479163ed0d932adb70ac4..5fed8307eddd83f7b9baad72a57e86f714b1f4a4 100644 --- a/official/vision/beta/projects/example/example_config.py +++ b/official/vision/examples/starter/example_config.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -13,9 +13,8 @@ # limitations under the License.
"""Example experiment configuration definition.""" -from typing import List - import dataclasses +from typing import List from official.core import config_definitions as cfg from official.core import exp_factory diff --git a/official/vision/beta/projects/example/example_config_local.yaml b/official/vision/examples/starter/example_config_local.yaml similarity index 80% rename from official/vision/beta/projects/example/example_config_local.yaml rename to official/vision/examples/starter/example_config_local.yaml index bbf04ee72f91e89170099e5dbdf0f698ee271490..193a524cfcf21e1458bc0c9d1a4367e3f80ab9b1 100644 --- a/official/vision/beta/projects/example/example_config_local.yaml +++ b/official/vision/examples/starter/example_config_local.yaml @@ -3,12 +3,12 @@ task: num_classes: 1001 input_size: [128, 128, 3] train_data: - input_path: 'imagenet-2012-tfrecord/train*' + input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*' is_training: true global_batch_size: 64 dtype: 'bfloat16' validation_data: - input_path: 'imagenet-2012-tfrecord/valid*' + input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*' is_training: false global_batch_size: 64 dtype: 'bfloat16' diff --git a/official/vision/beta/projects/example/example_config_tpu.yaml b/official/vision/examples/starter/example_config_tpu.yaml similarity index 82% rename from official/vision/beta/projects/example/example_config_tpu.yaml rename to official/vision/examples/starter/example_config_tpu.yaml index 22d6c5185eba6596a8d07d41f85aa4cf4c166576..5073bd1f6a98d373b08d82ef21145440d56f0807 100644 --- a/official/vision/beta/projects/example/example_config_tpu.yaml +++ b/official/vision/examples/starter/example_config_tpu.yaml @@ -6,12 +6,12 @@ task: num_classes: 1001 input_size: [128, 128, 3] train_data: - input_path: 'imagenet-2012-tfrecord/train*' + input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/train*' is_training: true global_batch_size: 4096 dtype: 'bfloat16' 
validation_data: - input_path: 'imagenet-2012-tfrecord/valid*' + input_path: 'gs://mlcompass-data/imagenet/imagenet-2012-tfrecord/valid*' is_training: false global_batch_size: 4096 dtype: 'bfloat16' diff --git a/official/vision/beta/projects/example/example_input.py b/official/vision/examples/starter/example_input.py similarity index 92% rename from official/vision/beta/projects/example/example_input.py rename to official/vision/examples/starter/example_input.py index a3437752eadb6dea9299989dc00685a6a98393c6..84a2d8446f1cd8d1798403ae47cdd7acc3a665cc 100644 --- a/official/vision/beta/projects/example/example_input.py +++ b/official/vision/examples/starter/example_input.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -22,12 +22,9 @@ from typing import Mapping, List, Tuple # Import libraries import tensorflow as tf -from official.vision.beta.dataloaders import decoder -from official.vision.beta.dataloaders import parser -from official.vision.beta.ops import preprocess_ops - -MEAN_RGB = (0.485 * 255, 0.456 * 255, 0.406 * 255) -STDDEV_RGB = (0.229 * 255, 0.224 * 255, 0.225 * 255) +from official.vision.dataloaders import decoder +from official.vision.dataloaders import parser +from official.vision.ops import preprocess_ops class Decoder(decoder.Decoder): @@ -102,7 +99,7 @@ class Parser(parser.Parser): # Normalizes image with mean and std pixel values. 
image = preprocess_ops.normalize_image( - image, offset=MEAN_RGB, scale=STDDEV_RGB) + image, offset=preprocess_ops.MEAN_RGB, scale=preprocess_ops.STDDEV_RGB) image = tf.image.convert_image_dtype(image, self._dtype) return image, label diff --git a/official/vision/beta/projects/example/example_model.py b/official/vision/examples/starter/example_model.py similarity index 92% rename from official/vision/beta/projects/example/example_model.py rename to official/vision/examples/starter/example_model.py index 48417498fb98ab1b67b9f2318e0e9eea0345eda5..92b562d2c775872ca25d61cab86a155020eb3e70 100644 --- a/official/vision/beta/projects/example/example_model.py +++ b/official/vision/examples/starter/example_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,13 +16,13 @@ This is only a dummy example to showcase how a model is composed. It is usually not needed to implement a modedl from scratch. Most SoTA models can be found and -directly used from `official/vision/beta/modeling` directory. +directly used from `official/vision/modeling` directory. """ from typing import Any, Mapping # Import libraries import tensorflow as tf -from official.vision.beta.projects.example import example_config as example_cfg +from official.vision.examples.starter import example_config as example_cfg class ExampleModel(tf.keras.Model): @@ -84,7 +84,7 @@ def build_example_model(input_specs: tf.keras.layers.InputSpec, This function is the main entry point to build a model. Commonly, it build a model by building a backbone, decoder and head. An example of building a classification model is at - third_party/tensorflow_models/official/vision/beta/modeling/backbones/resnet.py. + third_party/tensorflow_models/official/vision/modeling/backbones/resnet.py. 
However, it is not mandatory for all models to have these three pieces exactly. Depending on the task, model can be as simple as the example model here or more complex, such as multi-head architecture. diff --git a/official/vision/beta/projects/example/example_task.py b/official/vision/examples/starter/example_task.py similarity index 93% rename from official/vision/beta/projects/example/example_task.py rename to official/vision/examples/starter/example_task.py index 412a401fd36dbb82d172094453165562a61cb487..0aafa073d33e13dbd6a86da13b8dbafa92890ee1 100644 --- a/official/vision/beta/projects/example/example_task.py +++ b/official/vision/examples/starter/example_task.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -21,10 +21,10 @@ from official.common import dataset_fn from official.core import base_task from official.core import task_factory from official.modeling import tf_utils -from official.vision.beta.dataloaders import input_reader_factory -from official.vision.beta.projects.example import example_config as exp_cfg -from official.vision.beta.projects.example import example_input -from official.vision.beta.projects.example import example_model +from official.vision.dataloaders import input_reader_factory +from official.vision.examples.starter import example_config as exp_cfg +from official.vision.examples.starter import example_input +from official.vision.examples.starter import example_model @task_factory.register_task_cls(exp_cfg.ExampleTask) @@ -138,7 +138,7 @@ class ExampleTask(base_task.Task): between output from Parser and input used here. Args: - inputs: A tuple of of input tensors of (features, labels). + inputs: A tuple of input tensors of (features, labels). model: A tf.keras.Model instance. 
optimizer: The optimizer for this training step. metrics: A nested structure of metrics objects. @@ -186,7 +186,7 @@ class ExampleTask(base_task.Task): """Runs validatation step. Args: - inputs: A tuple of of input tensors of (features, labels). + inputs: A tuple of input tensors of (features, labels). model: A tf.keras.Model instance. metrics: A nested structure of metrics objects. diff --git a/official/vision/examples/starter/registry_imports.py b/official/vision/examples/starter/registry_imports.py new file mode 100644 index 0000000000000000000000000000000000000000..feda2277a035607a9ad3a9a89a20f921198697ed --- /dev/null +++ b/official/vision/examples/starter/registry_imports.py @@ -0,0 +1,27 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +#     http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""All necessary imports for registration. + +Custom models, tasks, configs, etc. need to be imported into the registry so they +can be picked up by the trainer. They can all be imported in this file so you do +not need to handle each file separately.
+""" + +# pylint: disable=unused-import +from official.common import registry_imports +from official.vision.examples.starter import example_config +from official.vision.examples.starter import example_input +from official.vision.examples.starter import example_model +from official.vision.examples.starter import example_task diff --git a/official/vision/examples/starter/train.py b/official/vision/examples/starter/train.py new file mode 100644 index 0000000000000000000000000000000000000000..bf9f4c52a9576e5478a9014852cd49a3025fb3c5 --- /dev/null +++ b/official/vision/examples/starter/train.py @@ -0,0 +1,30 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +#     http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""TensorFlow Model Garden Vision trainer. + +All custom registries are imported from registry_imports. Here we use the default +trainer, so we directly call train.main. If you need to customize the trainer, +branch from `official/vision/train.py` and make changes.
+""" +from absl import app + +from official.common import flags as tfm_flags +from official.vision import train +from official.vision.examples.starter import registry_imports # pylint: disable=unused-import + + +if __name__ == '__main__': + tfm_flags.define_flags() + app.run(train.main) diff --git a/official/vision/image_classification/README.md b/official/vision/image_classification/README.md deleted file mode 100644 index 8e2edbf91888ec916231f66fea53f4887352a6c5..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/README.md +++ /dev/null @@ -1,185 +0,0 @@ -# Image Classification - -**Warning:** the features in the `image_classification/` folder have been fully -intergrated into vision/beta. Please use the [new code base](../beta/README.md). - -This folder contains TF 2.0 model examples for image classification: - -* [MNIST](#mnist) -* [Classifier Trainer](#classifier-trainer), a framework that uses the Keras -compile/fit methods for image classification models, including: - * ResNet - * EfficientNet[^1] - -[^1]: Currently a work in progress. We cannot match "AutoAugment (AA)" in [the original version](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet). -For more information about other types of models, please refer to this -[README file](../../README.md). - -## Before you begin -Please make sure that you have the latest version of TensorFlow -installed and -[add the models folder to your Python path](/official/#running-the-models). - -### ImageNet preparation - -#### Using TFDS -`classifier_trainer.py` supports ImageNet with -[TensorFlow Datasets (TFDS)](https://www.tensorflow.org/datasets/overview). 
- -Please see the following [example snippet](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/scripts/download_and_prepare.py) -for more information on how to use TFDS to download and prepare datasets, and -specifically the [TFDS ImageNet readme](https://github.com/tensorflow/datasets/blob/master/docs/catalog/imagenet2012.md) -for manual download instructions. - -#### Legacy TFRecords -Download the ImageNet dataset and convert it to TFRecord format. -The following [script](https://github.com/tensorflow/tpu/blob/master/tools/datasets/imagenet_to_gcs.py) -and [README](https://github.com/tensorflow/tpu/tree/master/tools/datasets#imagenet_to_gcspy) -provide a few options. - -Note that the legacy ResNet runners, e.g. [resnet/resnet_ctl_imagenet_main.py](resnet/resnet_ctl_imagenet_main.py) -require TFRecords whereas `classifier_trainer.py` can use both by setting the -builder to 'records' or 'tfds' in the configurations. - -### Running on Cloud TPUs - -Note: These models will **not** work with TPUs on Colab. - -You can train image classification models on Cloud TPUs using -[tf.distribute.TPUStrategy](https://www.tensorflow.org/api_docs/python/tf.distribute.TPUStrategy?version=nightly). -If you are not familiar with Cloud TPUs, it is strongly recommended that you go -through the -[quickstart](https://cloud.google.com/tpu/docs/quickstart) to learn how to -create a TPU and GCE VM. - -### Running on multiple GPU hosts - -You can also train these models on multiple hosts, each with GPUs, using -[tf.distribute.experimental.MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy). - -The easiest way to run multi-host benchmarks is to set the -[`TF_CONFIG`](https://www.tensorflow.org/guide/distributed_training#TF_CONFIG) -appropriately at each host. 
e.g., to run using `MultiWorkerMirroredStrategy` on -2 hosts, the `cluster` in `TF_CONFIG` should have 2 `host:port` entries, and -host `i` should have the `task` in `TF_CONFIG` set to `{"type": "worker", -"index": i}`. `MultiWorkerMirroredStrategy` will automatically use all the -available GPUs at each host. - -## MNIST - -To download the data and run the MNIST sample model locally for the first time, -run one of the following command: - -```bash -python3 mnist_main.py \ - --model_dir=$MODEL_DIR \ - --data_dir=$DATA_DIR \ - --train_epochs=10 \ - --distribution_strategy=one_device \ - --num_gpus=$NUM_GPUS \ - --download -``` - -To train the model on a Cloud TPU, run the following command: - -```bash -python3 mnist_main.py \ - --tpu=$TPU_NAME \ - --model_dir=$MODEL_DIR \ - --data_dir=$DATA_DIR \ - --train_epochs=10 \ - --distribution_strategy=tpu \ - --download -``` - -Note: the `--download` flag is only required the first time you run the model. - - -## Classifier Trainer -The classifier trainer is a unified framework for running image classification -models using Keras's compile/fit methods. Experiments should be provided in the -form of YAML files, some examples are included within the configs/examples -folder. Please see [configs/examples](./configs/examples) for more example -configurations. - -The provided configuration files use a per replica batch size and is scaled -by the number of devices. For instance, if `batch size` = 64, then for 1 GPU -the global batch size would be 64 * 1 = 64. For 8 GPUs, the global batch size -would be 64 * 8 = 512. Similarly, for a v3-8 TPU, the global batch size would -be 64 * 8 = 512, and for a v3-32, the global batch size is 64 * 32 = 2048. 
- -### ResNet50 - -#### On GPU: -```bash -python3 classifier_trainer.py \ - --mode=train_and_eval \ - --model_type=resnet \ - --dataset=imagenet \ - --model_dir=$MODEL_DIR \ - --data_dir=$DATA_DIR \ - --config_file=configs/examples/resnet/imagenet/gpu.yaml \ - --params_override='runtime.num_gpus=$NUM_GPUS' -``` - -To train on multiple hosts, each with GPUs attached using -[MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy) -please update `runtime` section in gpu.yaml -(or override using `--params_override`) with: - -```YAML -# gpu.yaml -runtime: - distribution_strategy: 'multi_worker_mirrored' - worker_hosts: '$HOST1:port,$HOST2:port' - num_gpus: $NUM_GPUS - task_index: 0 -``` -By having `task_index: 0` on the first host and `task_index: 1` on the second -and so on. `$HOST1` and `$HOST2` are the IP addresses of the hosts, and `port` -can be chosen any free port on the hosts. Only the first host will write -TensorBoard Summaries and save checkpoints. 
- -#### On TPU: -```bash -python3 classifier_trainer.py \ - --mode=train_and_eval \ - --model_type=resnet \ - --dataset=imagenet \ - --tpu=$TPU_NAME \ - --model_dir=$MODEL_DIR \ - --data_dir=$DATA_DIR \ - --config_file=configs/examples/resnet/imagenet/tpu.yaml -``` - -### EfficientNet -**Note: EfficientNet development is a work in progress.** -#### On GPU: -```bash -python3 classifier_trainer.py \ - --mode=train_and_eval \ - --model_type=efficientnet \ - --dataset=imagenet \ - --model_dir=$MODEL_DIR \ - --data_dir=$DATA_DIR \ - --config_file=configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml \ - --params_override='runtime.num_gpus=$NUM_GPUS' -``` - - -#### On TPU: -```bash -python3 classifier_trainer.py \ - --mode=train_and_eval \ - --model_type=efficientnet \ - --dataset=imagenet \ - --tpu=$TPU_NAME \ - --model_dir=$MODEL_DIR \ - --data_dir=$DATA_DIR \ - --config_file=configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml -``` - -Note that the number of GPU devices can be overridden in the command line using -`--params_overrides`. The TPU does not need this override as the device is fixed -by providing the TPU address or name with the `--tpu` flag. - diff --git a/official/vision/image_classification/__init__.py b/official/vision/image_classification/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - diff --git a/official/vision/image_classification/augment.py b/official/vision/image_classification/augment.py deleted file mode 100644 index f322d31dac6ecc1e282566134720d42261a9b7fc..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/augment.py +++ /dev/null @@ -1,985 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""AutoAugment and RandAugment policies for enhanced image preprocessing. - -AutoAugment Reference: https://arxiv.org/abs/1805.09501 -RandAugment Reference: https://arxiv.org/abs/1909.13719 -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import math -from typing import Any, Dict, List, Optional, Text, Tuple - -from keras.layers.preprocessing import image_preprocessing as image_ops -import tensorflow as tf - - -# This signifies the max integer that the controller RNN could predict for the -# augmentation scheme. -_MAX_LEVEL = 10. 
- - -def to_4d(image: tf.Tensor) -> tf.Tensor: - """Converts an input Tensor to 4 dimensions. - - 4D image => [N, H, W, C] or [N, C, H, W] - 3D image => [1, H, W, C] or [1, C, H, W] - 2D image => [1, H, W, 1] - - Args: - image: The 2/3/4D input tensor. - - Returns: - A 4D image tensor. - - Raises: - `TypeError` if `image` is not a 2/3/4D tensor. - - """ - shape = tf.shape(image) - original_rank = tf.rank(image) - left_pad = tf.cast(tf.less_equal(original_rank, 3), dtype=tf.int32) - right_pad = tf.cast(tf.equal(original_rank, 2), dtype=tf.int32) - new_shape = tf.concat( - [ - tf.ones(shape=left_pad, dtype=tf.int32), - shape, - tf.ones(shape=right_pad, dtype=tf.int32), - ], - axis=0, - ) - return tf.reshape(image, new_shape) - - -def from_4d(image: tf.Tensor, ndims: tf.Tensor) -> tf.Tensor: - """Converts a 4D image back to `ndims` rank.""" - shape = tf.shape(image) - begin = tf.cast(tf.less_equal(ndims, 3), dtype=tf.int32) - end = 4 - tf.cast(tf.equal(ndims, 2), dtype=tf.int32) - new_shape = shape[begin:end] - return tf.reshape(image, new_shape) - - -def _convert_translation_to_transform(translations: tf.Tensor) -> tf.Tensor: - """Converts translations to a projective transform. - - The translation matrix looks like this: - [[1 0 -dx] - [0 1 -dy] - [0 0 1]] - - Args: - translations: The 2-element list representing [dx, dy], or a matrix of - 2-element lists representing [dx dy] to translate for each image. The - shape must be static. - - Returns: - The transformation matrix of shape (num_images, 8). - - Raises: - `TypeError` if - - the shape of `translations` is not known or - - the shape of `translations` is not rank 1 or 2. 
- - """ - translations = tf.convert_to_tensor(translations, dtype=tf.float32) - if translations.get_shape().ndims is None: - raise TypeError('translations rank must be statically known') - elif len(translations.get_shape()) == 1: - translations = translations[None] - elif len(translations.get_shape()) != 2: - raise TypeError('translations should have rank 1 or 2.') - num_translations = tf.shape(translations)[0] - - return tf.concat( - values=[ - tf.ones((num_translations, 1), tf.dtypes.float32), - tf.zeros((num_translations, 1), tf.dtypes.float32), - -translations[:, 0, None], - tf.zeros((num_translations, 1), tf.dtypes.float32), - tf.ones((num_translations, 1), tf.dtypes.float32), - -translations[:, 1, None], - tf.zeros((num_translations, 2), tf.dtypes.float32), - ], - axis=1, - ) - - -def _convert_angles_to_transform(angles: tf.Tensor, image_width: tf.Tensor, - image_height: tf.Tensor) -> tf.Tensor: - """Converts an angle or angles to a projective transform. - - Args: - angles: A scalar to rotate all images, or a vector to rotate a batch of - images. This must be a scalar. - image_width: The width of the image(s) to be transformed. - image_height: The height of the image(s) to be transformed. - - Returns: - A tensor of shape (num_images, 8). - - Raises: - `TypeError` if `angles` is not rank 0 or 1. 
- - """ - angles = tf.convert_to_tensor(angles, dtype=tf.float32) - if len(angles.get_shape()) == 0: # pylint:disable=g-explicit-length-test - angles = angles[None] - elif len(angles.get_shape()) != 1: - raise TypeError('Angles should have a rank 0 or 1.') - x_offset = ((image_width - 1) - - (tf.math.cos(angles) * (image_width - 1) - tf.math.sin(angles) * - (image_height - 1))) / 2.0 - y_offset = ((image_height - 1) - - (tf.math.sin(angles) * (image_width - 1) + tf.math.cos(angles) * - (image_height - 1))) / 2.0 - num_angles = tf.shape(angles)[0] - return tf.concat( - values=[ - tf.math.cos(angles)[:, None], - -tf.math.sin(angles)[:, None], - x_offset[:, None], - tf.math.sin(angles)[:, None], - tf.math.cos(angles)[:, None], - y_offset[:, None], - tf.zeros((num_angles, 2), tf.dtypes.float32), - ], - axis=1, - ) - - -def transform(image: tf.Tensor, transforms) -> tf.Tensor: - """Prepares input data for `image_ops.transform`.""" - original_ndims = tf.rank(image) - transforms = tf.convert_to_tensor(transforms, dtype=tf.float32) - if transforms.shape.rank == 1: - transforms = transforms[None] - image = to_4d(image) - image = image_ops.transform( - images=image, transforms=transforms, interpolation='nearest') - return from_4d(image, original_ndims) - - -def translate(image: tf.Tensor, translations) -> tf.Tensor: - """Translates image(s) by provided vectors. - - Args: - image: An image Tensor of type uint8. - translations: A vector or matrix representing [dx dy]. - - Returns: - The translated version of the image. - - """ - transforms = _convert_translation_to_transform(translations) - return transform(image, transforms=transforms) - - -def rotate(image: tf.Tensor, degrees: float) -> tf.Tensor: - """Rotates the image by degrees either clockwise or counterclockwise. - - Args: - image: An image Tensor of type uint8. - degrees: Float, a scalar angle in degrees to rotate all images by. 
If - degrees is positive the image will be rotated clockwise otherwise it will - be rotated counterclockwise. - - Returns: - The rotated version of image. - - """ - # Convert from degrees to radians. - degrees_to_radians = math.pi / 180.0 - radians = tf.cast(degrees * degrees_to_radians, tf.float32) - - original_ndims = tf.rank(image) - image = to_4d(image) - - image_height = tf.cast(tf.shape(image)[1], tf.float32) - image_width = tf.cast(tf.shape(image)[2], tf.float32) - transforms = _convert_angles_to_transform( - angles=radians, image_width=image_width, image_height=image_height) - # In practice, we should randomize the rotation degrees by flipping - # it negatively half the time, but that's done on 'degrees' outside - # of the function. - image = transform(image, transforms=transforms) - return from_4d(image, original_ndims) - - -def blend(image1: tf.Tensor, image2: tf.Tensor, factor: float) -> tf.Tensor: - """Blend image1 and image2 using 'factor'. - - Factor can be above 0.0. A value of 0.0 means only image1 is used. - A value of 1.0 means only image2 is used. A value between 0.0 and - 1.0 means we linearly interpolate the pixel values between the two - images. A value greater than 1.0 "extrapolates" the difference - between the two pixel values, and we clip the results to values - between 0 and 255. - - Args: - image1: An image Tensor of type uint8. - image2: An image Tensor of type uint8. - factor: A floating point value above 0.0. - - Returns: - A blended image Tensor of type uint8. - """ - if factor == 0.0: - return tf.convert_to_tensor(image1) - if factor == 1.0: - return tf.convert_to_tensor(image2) - - image1 = tf.cast(image1, tf.float32) - image2 = tf.cast(image2, tf.float32) - - difference = image2 - image1 - scaled = factor * difference - - # Do addition in float. - temp = tf.cast(image1, tf.float32) + scaled - - # Interpolate - if factor > 0.0 and factor < 1.0: - # Interpolation means we always stay within 0 and 255. 
- return tf.cast(temp, tf.uint8) - - # Extrapolate: - # - # We need to clip and then cast. - return tf.cast(tf.clip_by_value(temp, 0.0, 255.0), tf.uint8) - - -def cutout(image: tf.Tensor, pad_size: int, replace: int = 0) -> tf.Tensor: - """Apply cutout (https://arxiv.org/abs/1708.04552) to image. - - This operation applies a (2*pad_size x 2*pad_size) mask of zeros to - a random location within `img`. The pixel values filled in will be of the - value `replace`. The located where the mask will be applied is randomly - chosen uniformly over the whole image. - - Args: - image: An image Tensor of type uint8. - pad_size: Specifies how big the zero mask that will be generated is that is - applied to the image. The mask will be of size (2*pad_size x 2*pad_size). - replace: What pixel value to fill in the image in the area that has the - cutout mask applied to it. - - Returns: - An image Tensor that is of type uint8. - """ - image_height = tf.shape(image)[0] - image_width = tf.shape(image)[1] - - # Sample the center location in the image where the zero mask will be applied. 
- cutout_center_height = tf.random.uniform( - shape=[], minval=0, maxval=image_height, dtype=tf.int32) - - cutout_center_width = tf.random.uniform( - shape=[], minval=0, maxval=image_width, dtype=tf.int32) - - lower_pad = tf.maximum(0, cutout_center_height - pad_size) - upper_pad = tf.maximum(0, image_height - cutout_center_height - pad_size) - left_pad = tf.maximum(0, cutout_center_width - pad_size) - right_pad = tf.maximum(0, image_width - cutout_center_width - pad_size) - - cutout_shape = [ - image_height - (lower_pad + upper_pad), - image_width - (left_pad + right_pad) - ] - padding_dims = [[lower_pad, upper_pad], [left_pad, right_pad]] - mask = tf.pad( - tf.zeros(cutout_shape, dtype=image.dtype), - padding_dims, - constant_values=1) - mask = tf.expand_dims(mask, -1) - mask = tf.tile(mask, [1, 1, 3]) - image = tf.where( - tf.equal(mask, 0), - tf.ones_like(image, dtype=image.dtype) * replace, image) - return image - - -def solarize(image: tf.Tensor, threshold: int = 128) -> tf.Tensor: - # For each pixel in the image, select the pixel - # if the value is less than the threshold. - # Otherwise, subtract 255 from the pixel. - return tf.where(image < threshold, image, 255 - image) - - -def solarize_add(image: tf.Tensor, - addition: int = 0, - threshold: int = 128) -> tf.Tensor: - # For each pixel in the image less than threshold - # we add 'addition' amount to it and then clip the - # pixel value to be between 0 and 255. The value - # of 'addition' is between -128 and 128. 
- added_image = tf.cast(image, tf.int64) + addition - added_image = tf.cast(tf.clip_by_value(added_image, 0, 255), tf.uint8) - return tf.where(image < threshold, added_image, image) - - -def color(image: tf.Tensor, factor: float) -> tf.Tensor: - """Equivalent of PIL Color.""" - degenerate = tf.image.grayscale_to_rgb(tf.image.rgb_to_grayscale(image)) - return blend(degenerate, image, factor) - - -def contrast(image: tf.Tensor, factor: float) -> tf.Tensor: - """Equivalent of PIL Contrast.""" - degenerate = tf.image.rgb_to_grayscale(image) - # Cast before calling tf.histogram. - degenerate = tf.cast(degenerate, tf.int32) - - # Compute the grayscale histogram, then compute the mean pixel value, - # and create a constant image size of that value. Use that as the - # blending degenerate target of the original image. - hist = tf.histogram_fixed_width(degenerate, [0, 255], nbins=256) - mean = tf.reduce_sum(tf.cast(hist, tf.float32)) / 256.0 - degenerate = tf.ones_like(degenerate, dtype=tf.float32) * mean - degenerate = tf.clip_by_value(degenerate, 0.0, 255.0) - degenerate = tf.image.grayscale_to_rgb(tf.cast(degenerate, tf.uint8)) - return blend(degenerate, image, factor) - - -def brightness(image: tf.Tensor, factor: float) -> tf.Tensor: - """Equivalent of PIL Brightness.""" - degenerate = tf.zeros_like(image) - return blend(degenerate, image, factor) - - -def posterize(image: tf.Tensor, bits: int) -> tf.Tensor: - """Equivalent of PIL Posterize.""" - shift = 8 - bits - return tf.bitwise.left_shift(tf.bitwise.right_shift(image, shift), shift) - - -def wrapped_rotate(image: tf.Tensor, degrees: float, replace: int) -> tf.Tensor: - """Applies rotation with wrap/unwrap.""" - image = rotate(wrap(image), degrees=degrees) - return unwrap(image, replace) - - -def translate_x(image: tf.Tensor, pixels: int, replace: int) -> tf.Tensor: - """Equivalent of PIL Translate in X dimension.""" - image = translate(wrap(image), [-pixels, 0]) - return unwrap(image, replace) - - -def 
translate_y(image: tf.Tensor, pixels: int, replace: int) -> tf.Tensor: - """Equivalent of PIL Translate in Y dimension.""" - image = translate(wrap(image), [0, -pixels]) - return unwrap(image, replace) - - -def shear_x(image: tf.Tensor, level: float, replace: int) -> tf.Tensor: - """Equivalent of PIL Shearing in X dimension.""" - # Shear parallel to x axis is a projective transform - # with a matrix form of: - # [1 level - # 0 1]. - image = transform( - image=wrap(image), transforms=[1., level, 0., 0., 1., 0., 0., 0.]) - return unwrap(image, replace) - - -def shear_y(image: tf.Tensor, level: float, replace: int) -> tf.Tensor: - """Equivalent of PIL Shearing in Y dimension.""" - # Shear parallel to y axis is a projective transform - # with a matrix form of: - # [1 0 - # level 1]. - image = transform( - image=wrap(image), transforms=[1., 0., 0., level, 1., 0., 0., 0.]) - return unwrap(image, replace) - - -def autocontrast(image: tf.Tensor) -> tf.Tensor: - """Implements Autocontrast function from PIL using TF ops. - - Args: - image: A 3D uint8 tensor. - - Returns: - The image after it has had autocontrast applied to it and will be of type - uint8. - """ - - def scale_channel(image: tf.Tensor) -> tf.Tensor: - """Scale the 2D image using the autocontrast rule.""" - # A possibly cheaper version can be done using cumsum/unique_with_counts - # over the histogram values, rather than iterating over the entire image. - # to compute mins and maxes. - lo = tf.cast(tf.reduce_min(image), tf.float32) - hi = tf.cast(tf.reduce_max(image), tf.float32) - - # Scale the image, making the lowest value 0 and the highest value 255. - def scale_values(im): - scale = 255.0 / (hi - lo) - offset = -lo * scale - im = tf.cast(im, tf.float32) * scale + offset - im = tf.clip_by_value(im, 0.0, 255.0) - return tf.cast(im, tf.uint8) - - result = tf.cond(hi > lo, lambda: scale_values(image), lambda: image) - return result - - # Assumes RGB for now. 
Scales each channel independently - # and then stacks the result. - s1 = scale_channel(image[:, :, 0]) - s2 = scale_channel(image[:, :, 1]) - s3 = scale_channel(image[:, :, 2]) - image = tf.stack([s1, s2, s3], 2) - return image - - -def sharpness(image: tf.Tensor, factor: float) -> tf.Tensor: - """Implements Sharpness function from PIL using TF ops.""" - orig_image = image - image = tf.cast(image, tf.float32) - # Make image 4D for conv operation. - image = tf.expand_dims(image, 0) - # SMOOTH PIL Kernel. - kernel = tf.constant([[1, 1, 1], [1, 5, 1], [1, 1, 1]], - dtype=tf.float32, - shape=[3, 3, 1, 1]) / 13. - # Tile across channel dimension. - kernel = tf.tile(kernel, [1, 1, 3, 1]) - strides = [1, 1, 1, 1] - degenerate = tf.nn.depthwise_conv2d( - image, kernel, strides, padding='VALID', dilations=[1, 1]) - degenerate = tf.clip_by_value(degenerate, 0.0, 255.0) - degenerate = tf.squeeze(tf.cast(degenerate, tf.uint8), [0]) - - # For the borders of the resulting image, fill in the values of the - # original image. - mask = tf.ones_like(degenerate) - padded_mask = tf.pad(mask, [[1, 1], [1, 1], [0, 0]]) - padded_degenerate = tf.pad(degenerate, [[1, 1], [1, 1], [0, 0]]) - result = tf.where(tf.equal(padded_mask, 1), padded_degenerate, orig_image) - - # Blend the final result. - return blend(result, orig_image, factor) - - -def equalize(image: tf.Tensor) -> tf.Tensor: - """Implements Equalize function from PIL using TF ops.""" - - def scale_channel(im, c): - """Scale the data in the channel to implement equalize.""" - im = tf.cast(im[:, :, c], tf.int32) - # Compute the histogram of the image channel. - histo = tf.histogram_fixed_width(im, [0, 255], nbins=256) - - # For the purposes of computing the step, filter out the nonzeros. 
- nonzero = tf.where(tf.not_equal(histo, 0)) - nonzero_histo = tf.reshape(tf.gather(histo, nonzero), [-1]) - step = (tf.reduce_sum(nonzero_histo) - nonzero_histo[-1]) // 255 - - def build_lut(histo, step): - # Compute the cumulative sum, shifting by step // 2 - # and then normalization by step. - lut = (tf.cumsum(histo) + (step // 2)) // step - # Shift lut, prepending with 0. - lut = tf.concat([[0], lut[:-1]], 0) - # Clip the counts to be in range. This is done - # in the C code for image.point. - return tf.clip_by_value(lut, 0, 255) - - # If step is zero, return the original image. Otherwise, build - # lut from the full histogram and step and then index from it. - result = tf.cond( - tf.equal(step, 0), lambda: im, - lambda: tf.gather(build_lut(histo, step), im)) - - return tf.cast(result, tf.uint8) - - # Assumes RGB for now. Scales each channel independently - # and then stacks the result. - s1 = scale_channel(image, 0) - s2 = scale_channel(image, 1) - s3 = scale_channel(image, 2) - image = tf.stack([s1, s2, s3], 2) - return image - - -def invert(image: tf.Tensor) -> tf.Tensor: - """Inverts the image pixels.""" - image = tf.convert_to_tensor(image) - return 255 - image - - -def wrap(image: tf.Tensor) -> tf.Tensor: - """Returns 'image' with an extra channel set to all 1s.""" - shape = tf.shape(image) - extended_channel = tf.ones([shape[0], shape[1], 1], image.dtype) - extended = tf.concat([image, extended_channel], axis=2) - return extended - - -def unwrap(image: tf.Tensor, replace: int) -> tf.Tensor: - """Unwraps an image produced by wrap. - - Where there is a 0 in the last channel for every spatial position, - the rest of the three channels in that spatial dimension are grayed - (set to 128). Operations like translate and shear on a wrapped - Tensor will leave 0s in empty locations. Some transformations look - at the intensity of values to do preprocessing, and we want these - empty pixels to assume the 'average' value, rather than pure black. 
- - - Args: - image: A 3D Image Tensor with 4 channels. - replace: A one or three value 1D tensor to fill empty pixels. - - Returns: - image: A 3D image Tensor with 3 channels. - """ - image_shape = tf.shape(image) - # Flatten the spatial dimensions. - flattened_image = tf.reshape(image, [-1, image_shape[2]]) - - # Find all pixels where the last channel is zero. - alpha_channel = tf.expand_dims(flattened_image[:, 3], axis=-1) - - replace = tf.concat([replace, tf.ones([1], image.dtype)], 0) - - # Where they are zero, fill them in with 'replace'. - flattened_image = tf.where( - tf.equal(alpha_channel, 0), - tf.ones_like(flattened_image, dtype=image.dtype) * replace, - flattened_image) - - image = tf.reshape(flattened_image, image_shape) - image = tf.slice(image, [0, 0, 0], [image_shape[0], image_shape[1], 3]) - return image - - -def _randomly_negate_tensor(tensor): - """With 50% prob turn the tensor negative.""" - should_flip = tf.cast(tf.floor(tf.random.uniform([]) + 0.5), tf.bool) - final_tensor = tf.cond(should_flip, lambda: tensor, lambda: -tensor) - return final_tensor - - -def _rotate_level_to_arg(level: float): - level = (level / _MAX_LEVEL) * 30. - level = _randomly_negate_tensor(level) - return (level,) - - -def _shrink_level_to_arg(level: float): - """Converts level to ratio by which we shrink the image content.""" - if level == 0: - return (1.0,) # if level is zero, do not shrink the image - # Maximum shrinking ratio is 2.9. - level = 2. / (_MAX_LEVEL / level) + 0.9 - return (level,) - - -def _enhance_level_to_arg(level: float): - return ((level / _MAX_LEVEL) * 1.8 + 0.1,) - - -def _shear_level_to_arg(level: float): - level = (level / _MAX_LEVEL) * 0.3 - # Flip level to negative with 50% chance. - level = _randomly_negate_tensor(level) - return (level,) - - -def _translate_level_to_arg(level: float, translate_const: float): - level = (level / _MAX_LEVEL) * float(translate_const) - # Flip level to negative with 50% chance. 
- level = _randomly_negate_tensor(level) - return (level,) - - -def _mult_to_arg(level: float, multiplier: float = 1.): - return (int((level / _MAX_LEVEL) * multiplier),) - - -def _apply_func_with_prob(func: Any, image: tf.Tensor, args: Any, prob: float): - """Apply `func` to image w/ `args` as input with probability `prob`.""" - assert isinstance(args, tuple) - - # Apply the function with probability `prob`. - should_apply_op = tf.cast( - tf.floor(tf.random.uniform([], dtype=tf.float32) + prob), tf.bool) - augmented_image = tf.cond(should_apply_op, lambda: func(image, *args), - lambda: image) - return augmented_image - - -def select_and_apply_random_policy(policies: Any, image: tf.Tensor): - """Select a random policy from `policies` and apply it to `image`.""" - policy_to_select = tf.random.uniform([], maxval=len(policies), dtype=tf.int32) - # Note that using tf.case instead of tf.conds would result in significantly - # larger graphs and would even break export for some larger policies. 
- for (i, policy) in enumerate(policies): - image = tf.cond( - tf.equal(i, policy_to_select), - lambda selected_policy=policy: selected_policy(image), - lambda: image) - return image - - -NAME_TO_FUNC = { - 'AutoContrast': autocontrast, - 'Equalize': equalize, - 'Invert': invert, - 'Rotate': wrapped_rotate, - 'Posterize': posterize, - 'Solarize': solarize, - 'SolarizeAdd': solarize_add, - 'Color': color, - 'Contrast': contrast, - 'Brightness': brightness, - 'Sharpness': sharpness, - 'ShearX': shear_x, - 'ShearY': shear_y, - 'TranslateX': translate_x, - 'TranslateY': translate_y, - 'Cutout': cutout, -} - -# Functions that have a 'replace' parameter -REPLACE_FUNCS = frozenset({ - 'Rotate', - 'TranslateX', - 'ShearX', - 'ShearY', - 'TranslateY', - 'Cutout', -}) - - -def level_to_arg(cutout_const: float, translate_const: float): - """Creates a dict mapping image operation names to their arguments.""" - - no_arg = lambda level: () - posterize_arg = lambda level: _mult_to_arg(level, 4) - solarize_arg = lambda level: _mult_to_arg(level, 256) - solarize_add_arg = lambda level: _mult_to_arg(level, 110) - cutout_arg = lambda level: _mult_to_arg(level, cutout_const) - translate_arg = lambda level: _translate_level_to_arg(level, translate_const) - - args = { - 'AutoContrast': no_arg, - 'Equalize': no_arg, - 'Invert': no_arg, - 'Rotate': _rotate_level_to_arg, - 'Posterize': posterize_arg, - 'Solarize': solarize_arg, - 'SolarizeAdd': solarize_add_arg, - 'Color': _enhance_level_to_arg, - 'Contrast': _enhance_level_to_arg, - 'Brightness': _enhance_level_to_arg, - 'Sharpness': _enhance_level_to_arg, - 'ShearX': _shear_level_to_arg, - 'ShearY': _shear_level_to_arg, - 'Cutout': cutout_arg, - 'TranslateX': translate_arg, - 'TranslateY': translate_arg, - } - return args - - -def _parse_policy_info(name: Text, prob: float, level: float, - replace_value: List[int], cutout_const: float, - translate_const: float) -> Tuple[Any, float, Any]: - """Return the function that corresponds to 
`name` and update `level` param.""" - func = NAME_TO_FUNC[name] - args = level_to_arg(cutout_const, translate_const)[name](level) - - if name in REPLACE_FUNCS: - # Add in replace arg if it is required for the function that is called. - args = tuple(list(args) + [replace_value]) - - return func, prob, args - - -class ImageAugment(object): - """Image augmentation class for applying image distortions.""" - - def distort(self, image: tf.Tensor) -> tf.Tensor: - """Given an image tensor, returns a distorted image with the same shape. - - Args: - image: `Tensor` of shape [height, width, 3] representing an image. - - Returns: - The augmented version of `image`. - """ - raise NotImplementedError() - - -class AutoAugment(ImageAugment): - """Applies the AutoAugment policy to images. - - AutoAugment is from the paper: https://arxiv.org/abs/1805.09501. - """ - - def __init__(self, - augmentation_name: Text = 'v0', - policies: Optional[Dict[Text, Any]] = None, - cutout_const: float = 100, - translate_const: float = 250): - """Applies the AutoAugment policy to images. - - Args: - augmentation_name: The name of the AutoAugment policy to use. The - available options are `v0` and `test`. `v0` is the policy used for all - of the results in the paper and was found to achieve the best results on - the COCO dataset. `v1`, `v2` and `v3` are additional good policies found - on the COCO dataset that have slight variation in what operations were - used during the search procedure along with how many operations are - applied in parallel to a single image (2 vs 3). - policies: list of lists of tuples in the form `(func, prob, level)`, - `func` is a string name of the augmentation function, `prob` is the - probability of applying the `func` operation, `level` is the input - argument for `func`. - cutout_const: multiplier for applying cutout. - translate_const: multiplier for applying translation. 
- """ - super(AutoAugment, self).__init__() - - if policies is None: - self.available_policies = { - 'v0': self.policy_v0(), - 'test': self.policy_test(), - 'simple': self.policy_simple(), - } - - if augmentation_name not in self.available_policies: - raise ValueError( - 'Invalid augmentation_name: {}'.format(augmentation_name)) - - self.augmentation_name = augmentation_name - self.policies = self.available_policies[augmentation_name] - self.cutout_const = float(cutout_const) - self.translate_const = float(translate_const) - - def distort(self, image: tf.Tensor) -> tf.Tensor: - """Applies the AutoAugment policy to `image`. - - AutoAugment is from the paper: https://arxiv.org/abs/1805.09501. - - Args: - image: `Tensor` of shape [height, width, 3] representing an image. - - Returns: - A version of image that now has data augmentation applied to it based on - the `policies` pass into the function. - """ - input_image_type = image.dtype - - if input_image_type != tf.uint8: - image = tf.clip_by_value(image, 0.0, 255.0) - image = tf.cast(image, dtype=tf.uint8) - - replace_value = [128] * 3 - - # func is the string name of the augmentation function, prob is the - # probability of applying the operation and level is the parameter - # associated with the tf op. - - # tf_policies are functions that take in an image and return an augmented - # image. - tf_policies = [] - for policy in self.policies: - tf_policy = [] - # Link string name to the correct python function and make sure the - # correct argument is passed into that function. - for policy_info in policy: - policy_info = list(policy_info) + [ - replace_value, self.cutout_const, self.translate_const - ] - tf_policy.append(_parse_policy_info(*policy_info)) - # Now build the tf policy that will apply the augmentation procedue - # on image. 
- def make_final_policy(tf_policy_): - - def final_policy(image_): - for func, prob, args in tf_policy_: - image_ = _apply_func_with_prob(func, image_, args, prob) - return image_ - - return final_policy - - tf_policies.append(make_final_policy(tf_policy)) - - image = select_and_apply_random_policy(tf_policies, image) - image = tf.cast(image, dtype=input_image_type) - return image - - @staticmethod - def policy_v0(): - """Autoaugment policy that was used in AutoAugment Paper. - - Each tuple is an augmentation operation of the form - (operation, probability, magnitude). Each element in policy is a - sub-policy that will be applied sequentially on the image. - - Returns: - the policy. - """ - - # TODO(dankondratyuk): tensorflow_addons defines custom ops, which - # for some reason are not included when building/linking - # This results in the error, "Op type not registered - # 'Addons>ImageProjectiveTransformV2' in binary" when running on borg TPUs - policy = [ - [('Equalize', 0.8, 1), ('ShearY', 0.8, 4)], - [('Color', 0.4, 9), ('Equalize', 0.6, 3)], - [('Color', 0.4, 1), ('Rotate', 0.6, 8)], - [('Solarize', 0.8, 3), ('Equalize', 0.4, 7)], - [('Solarize', 0.4, 2), ('Solarize', 0.6, 2)], - [('Color', 0.2, 0), ('Equalize', 0.8, 8)], - [('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)], - [('ShearX', 0.2, 9), ('Rotate', 0.6, 8)], - [('Color', 0.6, 1), ('Equalize', 1.0, 2)], - [('Invert', 0.4, 9), ('Rotate', 0.6, 0)], - [('Equalize', 1.0, 9), ('ShearY', 0.6, 3)], - [('Color', 0.4, 7), ('Equalize', 0.6, 0)], - [('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)], - [('Solarize', 0.6, 8), ('Color', 0.6, 9)], - [('Solarize', 0.2, 4), ('Rotate', 0.8, 9)], - [('Rotate', 1.0, 7), ('TranslateY', 0.8, 9)], - [('ShearX', 0.0, 0), ('Solarize', 0.8, 4)], - [('ShearY', 0.8, 0), ('Color', 0.6, 4)], - [('Color', 1.0, 0), ('Rotate', 0.6, 2)], - [('Equalize', 0.8, 4), ('Equalize', 0.0, 8)], - [('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)], - [('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)], - 
[('Posterize', 0.8, 2), ('Solarize', 0.6, 10)], - [('Solarize', 0.6, 8), ('Equalize', 0.6, 1)], - [('Color', 0.8, 6), ('Rotate', 0.4, 5)], - ] - return policy - - @staticmethod - def policy_simple(): - """Same as `policy_v0`, except with custom ops removed.""" - - policy = [ - [('Color', 0.4, 9), ('Equalize', 0.6, 3)], - [('Solarize', 0.8, 3), ('Equalize', 0.4, 7)], - [('Solarize', 0.4, 2), ('Solarize', 0.6, 2)], - [('Color', 0.2, 0), ('Equalize', 0.8, 8)], - [('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)], - [('Color', 0.6, 1), ('Equalize', 1.0, 2)], - [('Color', 0.4, 7), ('Equalize', 0.6, 0)], - [('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)], - [('Solarize', 0.6, 8), ('Color', 0.6, 9)], - [('Equalize', 0.8, 4), ('Equalize', 0.0, 8)], - [('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)], - [('Posterize', 0.8, 2), ('Solarize', 0.6, 10)], - [('Solarize', 0.6, 8), ('Equalize', 0.6, 1)], - ] - return policy - - @staticmethod - def policy_test(): - """Autoaugment test policy for debugging.""" - policy = [ - [('TranslateX', 1.0, 4), ('Equalize', 1.0, 10)], - ] - return policy - - -class RandAugment(ImageAugment): - """Applies the RandAugment policy to images. - - RandAugment is from the paper https://arxiv.org/abs/1909.13719, - """ - - def __init__(self, - num_layers: int = 2, - magnitude: float = 10., - cutout_const: float = 40., - translate_const: float = 100.): - """Applies the RandAugment policy to images. - - Args: - num_layers: Integer, the number of augmentation transformations to apply - sequentially to an image. Represented as (N) in the paper. Usually best - values will be in the range [1, 3]. - magnitude: Integer, shared magnitude across all augmentation operations. - Represented as (M) in the paper. Usually best values are in the range - [5, 10]. - cutout_const: multiplier for applying cutout. - translate_const: multiplier for applying translation. 
- """ - super(RandAugment, self).__init__() - - self.num_layers = num_layers - self.magnitude = float(magnitude) - self.cutout_const = float(cutout_const) - self.translate_const = float(translate_const) - self.available_ops = [ - 'AutoContrast', 'Equalize', 'Invert', 'Rotate', 'Posterize', 'Solarize', - 'Color', 'Contrast', 'Brightness', 'Sharpness', 'ShearX', 'ShearY', - 'TranslateX', 'TranslateY', 'Cutout', 'SolarizeAdd' - ] - - def distort(self, image: tf.Tensor) -> tf.Tensor: - """Applies the RandAugment policy to `image`. - - Args: - image: `Tensor` of shape [height, width, 3] representing an image. - - Returns: - The augmented version of `image`. - """ - input_image_type = image.dtype - - if input_image_type != tf.uint8: - image = tf.clip_by_value(image, 0.0, 255.0) - image = tf.cast(image, dtype=tf.uint8) - - replace_value = [128] * 3 - min_prob, max_prob = 0.2, 0.8 - - for _ in range(self.num_layers): - op_to_select = tf.random.uniform([], - maxval=len(self.available_ops) + 1, - dtype=tf.int32) - - branch_fns = [] - for (i, op_name) in enumerate(self.available_ops): - prob = tf.random.uniform([], - minval=min_prob, - maxval=max_prob, - dtype=tf.float32) - func, _, args = _parse_policy_info(op_name, prob, self.magnitude, - replace_value, self.cutout_const, - self.translate_const) - branch_fns.append(( - i, - # pylint:disable=g-long-lambda - lambda selected_func=func, selected_args=args: selected_func( - image, *selected_args))) - # pylint:enable=g-long-lambda - - image = tf.switch_case( - branch_index=op_to_select, - branch_fns=branch_fns, - default=lambda: tf.identity(image)) - - image = tf.cast(image, dtype=input_image_type) - return image diff --git a/official/vision/image_classification/augment_test.py b/official/vision/image_classification/augment_test.py deleted file mode 100644 index 6279352204c46ae24d1971c48160ff7c6b0acc79..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/augment_test.py +++ /dev/null @@ -1,129 +0,0 
@@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Tests for autoaugment.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from absl.testing import parameterized - -import tensorflow as tf - -from official.vision.image_classification import augment - - -def get_dtype_test_cases(): - return [ - ('uint8', tf.uint8), - ('int32', tf.int32), - ('float16', tf.float16), - ('float32', tf.float32), - ] - - -@parameterized.named_parameters(get_dtype_test_cases()) -class TransformsTest(parameterized.TestCase, tf.test.TestCase): - """Basic tests for fundamental transformations.""" - - def test_to_from_4d(self, dtype): - for shape in [(10, 10), (10, 10, 10), (10, 10, 10, 10)]: - original_ndims = len(shape) - image = tf.zeros(shape, dtype=dtype) - image_4d = augment.to_4d(image) - self.assertEqual(4, tf.rank(image_4d)) - self.assertAllEqual(image, augment.from_4d(image_4d, original_ndims)) - - def test_transform(self, dtype): - image = tf.constant([[1, 2], [3, 4]], dtype=dtype) - self.assertAllEqual( - augment.transform(image, transforms=[1] * 8), [[4, 4], [4, 4]]) - - def test_translate(self, dtype): - image = tf.constant( - [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]], dtype=dtype) - translations = [-1, -1] - translated = augment.translate(image=image, translations=translations) - expected = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 0, 1, 
1], [1, 0, 1, 1]] - self.assertAllEqual(translated, expected) - - def test_translate_shapes(self, dtype): - translation = [0, 0] - for shape in [(3, 3), (5, 5), (224, 224, 3)]: - image = tf.zeros(shape, dtype=dtype) - self.assertAllEqual(image, augment.translate(image, translation)) - - def test_translate_invalid_translation(self, dtype): - image = tf.zeros((1, 1), dtype=dtype) - invalid_translation = [[[1, 1]]] - with self.assertRaisesRegex(TypeError, 'rank 1 or 2'): - _ = augment.translate(image, invalid_translation) - - def test_rotate(self, dtype): - image = tf.reshape(tf.cast(tf.range(9), dtype), (3, 3)) - rotation = 90. - transformed = augment.rotate(image=image, degrees=rotation) - expected = [[2, 5, 8], [1, 4, 7], [0, 3, 6]] - self.assertAllEqual(transformed, expected) - - def test_rotate_shapes(self, dtype): - degrees = 0. - for shape in [(3, 3), (5, 5), (224, 224, 3)]: - image = tf.zeros(shape, dtype=dtype) - self.assertAllEqual(image, augment.rotate(image, degrees)) - - -class AutoaugmentTest(tf.test.TestCase): - - def test_autoaugment(self): - """Smoke test to be sure there are no syntax errors.""" - image = tf.zeros((224, 224, 3), dtype=tf.uint8) - - augmenter = augment.AutoAugment() - aug_image = augmenter.distort(image) - - self.assertEqual((224, 224, 3), aug_image.shape) - - def test_randaug(self): - """Smoke test to be sure there are no syntax errors.""" - image = tf.zeros((224, 224, 3), dtype=tf.uint8) - - augmenter = augment.RandAugment() - aug_image = augmenter.distort(image) - - self.assertEqual((224, 224, 3), aug_image.shape) - - def test_all_policy_ops(self): - """Smoke test to be sure all augmentation functions can execute.""" - - prob = 1 - magnitude = 10 - replace_value = [128] * 3 - cutout_const = 100 - translate_const = 250 - - image = tf.ones((224, 224, 3), dtype=tf.uint8) - - for op_name in augment.NAME_TO_FUNC: - func, _, args = augment._parse_policy_info(op_name, prob, magnitude, - replace_value, cutout_const, - translate_const) - 
image = func(image, *args) - - self.assertEqual((224, 224, 3), image.shape) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/image_classification/callbacks.py b/official/vision/image_classification/callbacks.py deleted file mode 100644 index a4934ed88f7db280d1ffd9ad57346f68a5395d5e..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/callbacks.py +++ /dev/null @@ -1,256 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Common modules for callbacks.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -from typing import Any, List, MutableMapping, Optional, Text - -from absl import logging -import tensorflow as tf - -from official.modeling import optimization -from official.utils.misc import keras_utils - - -def get_callbacks( - model_checkpoint: bool = True, - include_tensorboard: bool = True, - time_history: bool = True, - track_lr: bool = True, - write_model_weights: bool = True, - apply_moving_average: bool = False, - initial_step: int = 0, - batch_size: int = 0, - log_steps: int = 0, - model_dir: Optional[str] = None, - backup_and_restore: bool = False) -> List[tf.keras.callbacks.Callback]: - """Get all callbacks.""" - model_dir = model_dir or '' - callbacks = [] - if model_checkpoint: - ckpt_full_path = os.path.join(model_dir, 'model.ckpt-{epoch:04d}') - callbacks.append( - tf.keras.callbacks.ModelCheckpoint( - ckpt_full_path, save_weights_only=True, verbose=1)) - if backup_and_restore: - backup_dir = os.path.join(model_dir, 'tmp') - callbacks.append( - tf.keras.callbacks.experimental.BackupAndRestore(backup_dir)) - if include_tensorboard: - callbacks.append( - CustomTensorBoard( - log_dir=model_dir, - track_lr=track_lr, - initial_step=initial_step, - write_images=write_model_weights, - profile_batch=0)) - if time_history: - callbacks.append( - keras_utils.TimeHistory( - batch_size, - log_steps, - logdir=model_dir if include_tensorboard else None)) - if apply_moving_average: - # Save moving average model to a different file so that - # we can resume training from a checkpoint - ckpt_full_path = os.path.join(model_dir, 'average', - 'model.ckpt-{epoch:04d}') - callbacks.append( - AverageModelCheckpoint( - update_weights=False, - filepath=ckpt_full_path, - save_weights_only=True, - verbose=1)) - callbacks.append(MovingAverageCallback()) - return callbacks - - -def 
get_scalar_from_tensor(t: tf.Tensor) -> int: - """Utility function to convert a Tensor to a scalar.""" - t = tf.keras.backend.get_value(t) - if callable(t): - return t() - else: - return t - - -class CustomTensorBoard(tf.keras.callbacks.TensorBoard): - """A customized TensorBoard callback that tracks additional datapoints. - - Metrics tracked: - - Global learning rate - - Attributes: - log_dir: the path of the directory where to save the log files to be parsed - by TensorBoard. - track_lr: `bool`, whether or not to track the global learning rate. - initial_step: the initial step, used for preemption recovery. - **kwargs: Additional arguments for backwards compatibility. Possible key is - `period`. - """ - - # TODO(b/146499062): track params, flops, log lr, l2 loss, - # classification loss - - def __init__(self, - log_dir: str, - track_lr: bool = False, - initial_step: int = 0, - **kwargs): - super(CustomTensorBoard, self).__init__(log_dir=log_dir, **kwargs) - self.step = initial_step - self._track_lr = track_lr - - def on_batch_begin(self, - epoch: int, - logs: Optional[MutableMapping[str, Any]] = None) -> None: - self.step += 1 - if logs is None: - logs = {} - logs.update(self._calculate_metrics()) - super(CustomTensorBoard, self).on_batch_begin(epoch, logs) - - def on_epoch_begin(self, - epoch: int, - logs: Optional[MutableMapping[str, Any]] = None) -> None: - if logs is None: - logs = {} - metrics = self._calculate_metrics() - logs.update(metrics) - for k, v in metrics.items(): - logging.info('Current %s: %f', k, v) - super(CustomTensorBoard, self).on_epoch_begin(epoch, logs) - - def on_epoch_end(self, - epoch: int, - logs: Optional[MutableMapping[str, Any]] = None) -> None: - if logs is None: - logs = {} - metrics = self._calculate_metrics() - logs.update(metrics) - super(CustomTensorBoard, self).on_epoch_end(epoch, logs) - - def _calculate_metrics(self) -> MutableMapping[str, Any]: - logs = {} - # TODO(b/149030439): disable LR reporting. 
- # if self._track_lr: - # logs['learning_rate'] = self._calculate_lr() - return logs - - def _calculate_lr(self) -> int: - """Calculates the learning rate given the current step.""" - return get_scalar_from_tensor( - self._get_base_optimizer()._decayed_lr(var_dtype=tf.float32)) # pylint:disable=protected-access - - def _get_base_optimizer(self) -> tf.keras.optimizers.Optimizer: - """Get the base optimizer used by the current model.""" - - optimizer = self.model.optimizer - - # The optimizer might be wrapped by another class, so unwrap it - while hasattr(optimizer, '_optimizer'): - optimizer = optimizer._optimizer # pylint:disable=protected-access - - return optimizer - - -class MovingAverageCallback(tf.keras.callbacks.Callback): - """A Callback to be used with a `ExponentialMovingAverage` optimizer. - - Applies moving average weights to the model during validation time to test - and predict on the averaged weights rather than the current model weights. - Once training is complete, the model weights will be overwritten with the - averaged weights (by default). - - Attributes: - overwrite_weights_on_train_end: Whether to overwrite the current model - weights with the averaged weights from the moving average optimizer. - **kwargs: Any additional callback arguments. 
- """ - - def __init__(self, overwrite_weights_on_train_end: bool = False, **kwargs): - super(MovingAverageCallback, self).__init__(**kwargs) - self.overwrite_weights_on_train_end = overwrite_weights_on_train_end - - def set_model(self, model: tf.keras.Model): - super(MovingAverageCallback, self).set_model(model) - assert isinstance(self.model.optimizer, - optimization.ExponentialMovingAverage) - self.model.optimizer.shadow_copy(self.model) - - def on_test_begin(self, logs: Optional[MutableMapping[Text, Any]] = None): - self.model.optimizer.swap_weights() - - def on_test_end(self, logs: Optional[MutableMapping[Text, Any]] = None): - self.model.optimizer.swap_weights() - - def on_train_end(self, logs: Optional[MutableMapping[Text, Any]] = None): - if self.overwrite_weights_on_train_end: - self.model.optimizer.assign_average_vars(self.model.variables) - - -class AverageModelCheckpoint(tf.keras.callbacks.ModelCheckpoint): - """Saves and, optionally, assigns the averaged weights. - - Taken from tfa.callbacks.AverageModelCheckpoint. - - Attributes: - update_weights: If True, assign the moving average weights to the model, and - save them. If False, keep the old non-averaged weights, but the saved - model uses the average weights. See `tf.keras.callbacks.ModelCheckpoint` - for the other args. 
- """ - - def __init__(self, - update_weights: bool, - filepath: str, - monitor: str = 'val_loss', - verbose: int = 0, - save_best_only: bool = False, - save_weights_only: bool = False, - mode: str = 'auto', - save_freq: str = 'epoch', - **kwargs): - self.update_weights = update_weights - super().__init__(filepath, monitor, verbose, save_best_only, - save_weights_only, mode, save_freq, **kwargs) - - def set_model(self, model): - if not isinstance(model.optimizer, optimization.ExponentialMovingAverage): - raise TypeError('AverageModelCheckpoint is only used when training' - 'with MovingAverage') - return super().set_model(model) - - def _save_model(self, epoch, logs): - assert isinstance(self.model.optimizer, - optimization.ExponentialMovingAverage) - - if self.update_weights: - self.model.optimizer.assign_average_vars(self.model.variables) - return super()._save_model(epoch, logs) # pytype: disable=attribute-error # typed-keras - else: - # Note: `model.get_weights()` gives us the weights (non-ref) - # whereas `model.variables` returns references to the variables. - non_avg_weights = self.model.get_weights() - self.model.optimizer.assign_average_vars(self.model.variables) - # result is currently None, since `super._save_model` doesn't - # return anything, but this may change in the future. - result = super()._save_model(epoch, logs) # pytype: disable=attribute-error # typed-keras - self.model.set_weights(non_avg_weights) - return result diff --git a/official/vision/image_classification/classifier_trainer.py b/official/vision/image_classification/classifier_trainer.py deleted file mode 100644 index ab6fbaea960e7d894d69e213e95c313d7fe9893c..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/classifier_trainer.py +++ /dev/null @@ -1,456 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Runs an Image Classification model.""" - -import os -import pprint -from typing import Any, Tuple, Text, Optional, Mapping - -from absl import app -from absl import flags -from absl import logging -import tensorflow as tf -from official.common import distribute_utils -from official.modeling import hyperparams -from official.modeling import performance -from official.utils import hyperparams_flags -from official.utils.misc import keras_utils -from official.vision.image_classification import callbacks as custom_callbacks -from official.vision.image_classification import dataset_factory -from official.vision.image_classification import optimizer_factory -from official.vision.image_classification.configs import base_configs -from official.vision.image_classification.configs import configs -from official.vision.image_classification.efficientnet import efficientnet_model -from official.vision.image_classification.resnet import common -from official.vision.image_classification.resnet import resnet_model - - -def get_models() -> Mapping[str, tf.keras.Model]: - """Returns the mapping from model type name to Keras model.""" - return { - 'efficientnet': efficientnet_model.EfficientNet.from_name, - 'resnet': resnet_model.resnet50, - } - - -def get_dtype_map() -> Mapping[str, tf.dtypes.DType]: - """Returns the mapping from dtype string representations to TF dtypes.""" - return { - 'float32': tf.float32, - 
'bfloat16': tf.bfloat16, - 'float16': tf.float16, - 'fp32': tf.float32, - 'bf16': tf.bfloat16, - } - - -def _get_metrics(one_hot: bool) -> Mapping[Text, Any]: - """Get a dict of available metrics to track.""" - if one_hot: - return { - # (name, metric_fn) - 'acc': - tf.keras.metrics.CategoricalAccuracy(name='accuracy'), - 'accuracy': - tf.keras.metrics.CategoricalAccuracy(name='accuracy'), - 'top_1': - tf.keras.metrics.CategoricalAccuracy(name='accuracy'), - 'top_5': - tf.keras.metrics.TopKCategoricalAccuracy( - k=5, name='top_5_accuracy'), - } - else: - return { - # (name, metric_fn) - 'acc': - tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), - 'accuracy': - tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), - 'top_1': - tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), - 'top_5': - tf.keras.metrics.SparseTopKCategoricalAccuracy( - k=5, name='top_5_accuracy'), - } - - -def get_image_size_from_model( - params: base_configs.ExperimentConfig) -> Optional[int]: - """If the given model has a preferred image size, return it.""" - if params.model_name == 'efficientnet': - efficientnet_name = params.model.model_params.model_name - if efficientnet_name in efficientnet_model.MODEL_CONFIGS: - return efficientnet_model.MODEL_CONFIGS[efficientnet_name].resolution - return None - - -def _get_dataset_builders(params: base_configs.ExperimentConfig, - strategy: tf.distribute.Strategy, - one_hot: bool) -> Tuple[Any, Any]: - """Create and return train and validation dataset builders.""" - if one_hot: - logging.warning('label_smoothing > 0, so datasets will be one hot encoded.') - else: - logging.warning('label_smoothing not applied, so datasets will not be one ' - 'hot encoded.') - - num_devices = strategy.num_replicas_in_sync if strategy else 1 - - image_size = get_image_size_from_model(params) - - dataset_configs = [params.train_dataset, params.validation_dataset] - builders = [] - - for config in dataset_configs: - if config is not None and 
config.has_data: - builder = dataset_factory.DatasetBuilder( - config, - image_size=image_size or config.image_size, - num_devices=num_devices, - one_hot=one_hot) - else: - builder = None - builders.append(builder) - - return builders - - -def get_loss_scale(params: base_configs.ExperimentConfig, - fp16_default: float = 128.) -> float: - """Returns the loss scale for initializations.""" - loss_scale = params.runtime.loss_scale - if loss_scale == 'dynamic': - return loss_scale - elif loss_scale is not None: - return float(loss_scale) - elif (params.train_dataset.dtype == 'float32' or - params.train_dataset.dtype == 'bfloat16'): - return 1. - else: - assert params.train_dataset.dtype == 'float16' - return fp16_default - - -def _get_params_from_flags(flags_obj: flags.FlagValues): - """Get ParamsDict from flags.""" - model = flags_obj.model_type.lower() - dataset = flags_obj.dataset.lower() - params = configs.get_config(model=model, dataset=dataset) - - flags_overrides = { - 'model_dir': flags_obj.model_dir, - 'mode': flags_obj.mode, - 'model': { - 'name': model, - }, - 'runtime': { - 'run_eagerly': flags_obj.run_eagerly, - 'tpu': flags_obj.tpu, - }, - 'train_dataset': { - 'data_dir': flags_obj.data_dir, - }, - 'validation_dataset': { - 'data_dir': flags_obj.data_dir, - }, - 'train': { - 'time_history': { - 'log_steps': flags_obj.log_steps, - }, - }, - } - - overriding_configs = (flags_obj.config_file, flags_obj.params_override, - flags_overrides) - - pp = pprint.PrettyPrinter() - - logging.info('Base params: %s', pp.pformat(params.as_dict())) - - for param in overriding_configs: - logging.info('Overriding params: %s', param) - params = hyperparams.override_params_dict(params, param, is_strict=True) - - params.validate() - params.lock() - - logging.info('Final model parameters: %s', pp.pformat(params.as_dict())) - return params - - -def resume_from_checkpoint(model: tf.keras.Model, model_dir: str, - train_steps: int) -> int: - """Resumes from the latest checkpoint, if 
possible. - - Loads the model weights and optimizer settings from a checkpoint. - This function should be used in case of preemption recovery. - - Args: - model: The model whose weights should be restored. - model_dir: The directory where model weights were saved. - train_steps: The number of steps to train. - - Returns: - The epoch of the latest checkpoint, or 0 if not restoring. - - """ - logging.info('Load from checkpoint is enabled.') - latest_checkpoint = tf.train.latest_checkpoint(model_dir) - logging.info('latest_checkpoint: %s', latest_checkpoint) - if not latest_checkpoint: - logging.info('No checkpoint detected.') - return 0 - - logging.info('Checkpoint file %s found and restoring from ' - 'checkpoint', latest_checkpoint) - model.load_weights(latest_checkpoint) - initial_epoch = model.optimizer.iterations // train_steps - logging.info('Completed loading from checkpoint.') - logging.info('Resuming from epoch %d', initial_epoch) - return int(initial_epoch) - - -def initialize(params: base_configs.ExperimentConfig, - dataset_builder: dataset_factory.DatasetBuilder): - """Initializes backend related initializations.""" - keras_utils.set_session_config(enable_xla=params.runtime.enable_xla) - performance.set_mixed_precision_policy(dataset_builder.dtype) - if tf.config.list_physical_devices('GPU'): - data_format = 'channels_first' - else: - data_format = 'channels_last' - tf.keras.backend.set_image_data_format(data_format) - if params.runtime.run_eagerly: - # Enable eager execution to allow step-by-step debugging - tf.config.experimental_run_functions_eagerly(True) - if tf.config.list_physical_devices('GPU'): - if params.runtime.gpu_thread_mode: - keras_utils.set_gpu_thread_mode_and_count( - per_gpu_thread_count=params.runtime.per_gpu_thread_count, - gpu_thread_mode=params.runtime.gpu_thread_mode, - num_gpus=params.runtime.num_gpus, - datasets_num_private_threads=params.runtime - .dataset_num_private_threads) # pylint:disable=line-too-long - if 
params.runtime.batchnorm_spatial_persistent: - os.environ['TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT'] = '1' - - -def define_classifier_flags(): - """Defines common flags for image classification.""" - hyperparams_flags.initialize_common_flags() - flags.DEFINE_string( - 'data_dir', default=None, help='The location of the input data.') - flags.DEFINE_string( - 'mode', - default=None, - help='Mode to run: `train`, `eval`, `train_and_eval` or `export`.') - flags.DEFINE_bool( - 'run_eagerly', - default=None, - help='Use eager execution and disable autograph for debugging.') - flags.DEFINE_string( - 'model_type', - default=None, - help='The type of the model, e.g. EfficientNet, etc.') - flags.DEFINE_string( - 'dataset', - default=None, - help='The name of the dataset, e.g. ImageNet, etc.') - flags.DEFINE_integer( - 'log_steps', - default=100, - help='The interval of steps between logging of batch level stats.') - - -def serialize_config(params: base_configs.ExperimentConfig, model_dir: str): - """Serializes and saves the experiment config.""" - params_save_path = os.path.join(model_dir, 'params.yaml') - logging.info('Saving experiment configuration to %s', params_save_path) - tf.io.gfile.makedirs(model_dir) - hyperparams.save_params_dict_to_yaml(params, params_save_path) - - -def train_and_eval( - params: base_configs.ExperimentConfig, - strategy_override: tf.distribute.Strategy) -> Mapping[str, Any]: - """Runs the train and eval path using compile/fit.""" - logging.info('Running train and eval.') - - distribute_utils.configure_cluster(params.runtime.worker_hosts, - params.runtime.task_index) - - # Note: for TPUs, strategy and scope should be created before the dataset - strategy = strategy_override or distribute_utils.get_distribution_strategy( - distribution_strategy=params.runtime.distribution_strategy, - all_reduce_alg=params.runtime.all_reduce_alg, - num_gpus=params.runtime.num_gpus, - tpu_address=params.runtime.tpu) - - strategy_scope = 
distribute_utils.get_strategy_scope(strategy) - - logging.info('Detected %d devices.', - strategy.num_replicas_in_sync if strategy else 1) - - label_smoothing = params.model.loss.label_smoothing - one_hot = label_smoothing and label_smoothing > 0 - - builders = _get_dataset_builders(params, strategy, one_hot) - datasets = [ - builder.build(strategy) if builder else None for builder in builders - ] - - # Unpack datasets and builders based on train/val/test splits - train_builder, validation_builder = builders # pylint: disable=unbalanced-tuple-unpacking - train_dataset, validation_dataset = datasets - - train_epochs = params.train.epochs - train_steps = params.train.steps or train_builder.num_steps - validation_steps = params.evaluation.steps or validation_builder.num_steps - - initialize(params, train_builder) - - logging.info('Global batch size: %d', train_builder.global_batch_size) - - with strategy_scope: - model_params = params.model.model_params.as_dict() - model = get_models()[params.model.name](**model_params) - learning_rate = optimizer_factory.build_learning_rate( - params=params.model.learning_rate, - batch_size=train_builder.global_batch_size, - train_epochs=train_epochs, - train_steps=train_steps) - optimizer = optimizer_factory.build_optimizer( - optimizer_name=params.model.optimizer.name, - base_learning_rate=learning_rate, - params=params.model.optimizer.as_dict(), - model=model) - optimizer = performance.configure_optimizer( - optimizer, - use_float16=train_builder.dtype == 'float16', - loss_scale=get_loss_scale(params)) - - metrics_map = _get_metrics(one_hot) - metrics = [metrics_map[metric] for metric in params.train.metrics] - steps_per_loop = train_steps if params.train.set_epoch_loop else 1 - - if one_hot: - loss_obj = tf.keras.losses.CategoricalCrossentropy( - label_smoothing=params.model.loss.label_smoothing) - else: - loss_obj = tf.keras.losses.SparseCategoricalCrossentropy() - model.compile( - optimizer=optimizer, - loss=loss_obj, - 
metrics=metrics, - steps_per_execution=steps_per_loop) - - initial_epoch = 0 - if params.train.resume_checkpoint: - initial_epoch = resume_from_checkpoint( - model=model, model_dir=params.model_dir, train_steps=train_steps) - - callbacks = custom_callbacks.get_callbacks( - model_checkpoint=params.train.callbacks.enable_checkpoint_and_export, - include_tensorboard=params.train.callbacks.enable_tensorboard, - time_history=params.train.callbacks.enable_time_history, - track_lr=params.train.tensorboard.track_lr, - write_model_weights=params.train.tensorboard.write_model_weights, - initial_step=initial_epoch * train_steps, - batch_size=train_builder.global_batch_size, - log_steps=params.train.time_history.log_steps, - model_dir=params.model_dir, - backup_and_restore=params.train.callbacks.enable_backup_and_restore) - - serialize_config(params=params, model_dir=params.model_dir) - - if params.evaluation.skip_eval: - validation_kwargs = {} - else: - validation_kwargs = { - 'validation_data': validation_dataset, - 'validation_steps': validation_steps, - 'validation_freq': params.evaluation.epochs_between_evals, - } - - history = model.fit( - train_dataset, - epochs=train_epochs, - steps_per_epoch=train_steps, - initial_epoch=initial_epoch, - callbacks=callbacks, - verbose=2, - **validation_kwargs) - - validation_output = None - if not params.evaluation.skip_eval: - validation_output = model.evaluate( - validation_dataset, steps=validation_steps, verbose=2) - - # TODO(dankondratyuk): eval and save final test accuracy - stats = common.build_stats(history, validation_output, callbacks) - return stats - - -def export(params: base_configs.ExperimentConfig): - """Runs the model export functionality.""" - logging.info('Exporting model.') - model_params = params.model.model_params.as_dict() - model = get_models()[params.model.name](**model_params) - checkpoint = params.export.checkpoint - if checkpoint is None: - logging.info('No export checkpoint was provided. 
Using the latest ' - 'checkpoint from model_dir.') - checkpoint = tf.train.latest_checkpoint(params.model_dir) - - model.load_weights(checkpoint) - model.save(params.export.destination) - - -def run(flags_obj: flags.FlagValues, - strategy_override: tf.distribute.Strategy = None) -> Mapping[str, Any]: - """Runs Image Classification model using native Keras APIs. - - Args: - flags_obj: An object containing parsed flag values. - strategy_override: A `tf.distribute.Strategy` object to use for model. - - Returns: - Dictionary of training/eval stats - """ - params = _get_params_from_flags(flags_obj) - if params.mode == 'train_and_eval': - return train_and_eval(params, strategy_override) - elif params.mode == 'export_only': - export(params) - else: - raise ValueError('{} is not a valid mode.'.format(params.mode)) - - -def main(_): - stats = run(flags.FLAGS) - if stats: - logging.info('Run stats:\n%s', stats) - - -if __name__ == '__main__': - logging.set_verbosity(logging.INFO) - define_classifier_flags() - flags.mark_flag_as_required('data_dir') - flags.mark_flag_as_required('mode') - flags.mark_flag_as_required('model_type') - flags.mark_flag_as_required('dataset') - - app.run(main) diff --git a/official/vision/image_classification/classifier_trainer_test.py b/official/vision/image_classification/classifier_trainer_test.py deleted file mode 100644 index 06227c154427db3057269f9e9250a179a52264c9..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/classifier_trainer_test.py +++ /dev/null @@ -1,240 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Unit tests for the classifier trainer models.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import functools -import json - -import os -import sys - -from typing import Any, Callable, Iterable, Mapping, MutableMapping, Optional, Tuple - -from absl import flags -from absl.testing import flagsaver -from absl.testing import parameterized -import tensorflow as tf - -from tensorflow.python.distribute import combinations -from tensorflow.python.distribute import strategy_combinations -from official.utils.flags import core as flags_core -from official.vision.image_classification import classifier_trainer - - -classifier_trainer.define_classifier_flags() - - -def distribution_strategy_combinations() -> Iterable[Tuple[Any, ...]]: - """Returns the combinations of end-to-end tests to run.""" - return combinations.combine( - distribution=[ - strategy_combinations.default_strategy, - strategy_combinations.cloud_tpu_strategy, - strategy_combinations.one_device_strategy_gpu, - strategy_combinations.mirrored_strategy_with_two_gpus, - ], - model=[ - 'efficientnet', - 'resnet', - ], - dataset=[ - 'imagenet', - ], - ) - - -def get_params_override(params_override: Mapping[str, Any]) -> str: - """Converts params_override dict to string command.""" - return '--params_override=' + json.dumps(params_override) - - -def basic_params_override(dtype: str = 'float32') -> MutableMapping[str, Any]: - """Returns a basic parameter configuration for testing.""" - return { - 'train_dataset': { 
- 'builder': 'synthetic', - 'use_per_replica_batch_size': True, - 'batch_size': 1, - 'image_size': 224, - 'dtype': dtype, - }, - 'validation_dataset': { - 'builder': 'synthetic', - 'batch_size': 1, - 'use_per_replica_batch_size': True, - 'image_size': 224, - 'dtype': dtype, - }, - 'train': { - 'steps': 1, - 'epochs': 1, - 'callbacks': { - 'enable_checkpoint_and_export': True, - 'enable_tensorboard': False, - }, - }, - 'evaluation': { - 'steps': 1, - }, - } - - -@flagsaver.flagsaver -def run_end_to_end(main: Callable[[Any], None], - extra_flags: Optional[Iterable[str]] = None, - model_dir: Optional[str] = None): - """Runs the classifier trainer end-to-end.""" - extra_flags = [] if extra_flags is None else extra_flags - args = [sys.argv[0], '--model_dir', model_dir] + extra_flags - flags_core.parse_flags(argv=args) - main(flags.FLAGS) - - -class ClassifierTest(tf.test.TestCase, parameterized.TestCase): - """Unit tests for Keras models.""" - _tempdir = None - - @classmethod - def setUpClass(cls): # pylint: disable=invalid-name - super(ClassifierTest, cls).setUpClass() - - def tearDown(self): - super(ClassifierTest, self).tearDown() - tf.io.gfile.rmtree(self.get_temp_dir()) - - @combinations.generate(distribution_strategy_combinations()) - def test_end_to_end_train_and_eval(self, distribution, model, dataset): - """Test train_and_eval and export for Keras classifier models.""" - # Some parameters are not defined as flags (e.g. cannot run - # classifier_train.py --batch_size=...) by design, so use - # "--params_override=..." 
instead
-    model_dir = self.create_tempdir().full_path
-    base_flags = [
-        '--data_dir=not_used',
-        '--model_type=' + model,
-        '--dataset=' + dataset,
-    ]
-    train_and_eval_flags = base_flags + [
-        get_params_override(basic_params_override()),
-        '--mode=train_and_eval',
-    ]
-
-    run = functools.partial(
-        classifier_trainer.run, strategy_override=distribution)
-    run_end_to_end(
-        main=run, extra_flags=train_and_eval_flags, model_dir=model_dir)
-
-  @combinations.generate(
-      combinations.combine(
-          distribution=[
-              strategy_combinations.one_device_strategy_gpu,
-          ],
-          model=[
-              'efficientnet',
-              'resnet',
-          ],
-          dataset='imagenet',
-          dtype='float16',
-      ))
-  def test_gpu_train(self, distribution, model, dataset, dtype):
-    """Test train_and_eval and export for Keras classifier models."""
-    # Some parameters are not defined as flags (e.g. cannot run
-    # classifier_train.py --batch_size=...) by design, so use
-    # "--params_override=..." instead
-    model_dir = self.create_tempdir().full_path
-    base_flags = [
-        '--data_dir=not_used',
-        '--model_type=' + model,
-        '--dataset=' + dataset,
-    ]
-    train_and_eval_flags = base_flags + [
-        get_params_override(basic_params_override(dtype)),
-        '--mode=train_and_eval',
-    ]
-
-    export_params = basic_params_override()
-    export_path = os.path.join(model_dir, 'export')
-    export_params['export'] = {}
-    export_params['export']['destination'] = export_path
-    export_flags = base_flags + [
-        '--mode=export_only',
-        get_params_override(export_params)
-    ]
-
-    run = functools.partial(
-        classifier_trainer.run, strategy_override=distribution)
-    run_end_to_end(
-        main=run, extra_flags=train_and_eval_flags, model_dir=model_dir)
-    run_end_to_end(main=run, extra_flags=export_flags, model_dir=model_dir)
-    self.assertTrue(os.path.exists(export_path))
-
-  @combinations.generate(
-      combinations.combine(
-          distribution=[
-              strategy_combinations.cloud_tpu_strategy,
-          ],
-          model=[
-              'efficientnet',
-              'resnet',
-          ],
-          dataset='imagenet',
-          dtype='bfloat16',
-      ))
-  def test_tpu_train(self, distribution, model, dataset, dtype):
-    """Test train_and_eval and export for Keras classifier models."""
-    # Some parameters are not defined as flags (e.g. cannot run
-    # classifier_train.py --batch_size=...) by design, so use
-    # "--params_override=..." instead
-    model_dir = self.create_tempdir().full_path
-    base_flags = [
-        '--data_dir=not_used',
-        '--model_type=' + model,
-        '--dataset=' + dataset,
-    ]
-    train_and_eval_flags = base_flags + [
-        get_params_override(basic_params_override(dtype)),
-        '--mode=train_and_eval',
-    ]
-
-    run = functools.partial(
-        classifier_trainer.run, strategy_override=distribution)
-    run_end_to_end(
-        main=run, extra_flags=train_and_eval_flags, model_dir=model_dir)
-
-  @combinations.generate(distribution_strategy_combinations())
-  def test_end_to_end_invalid_mode(self, distribution, model, dataset):
-    """Test the Keras EfficientNet model with `strategy`."""
-    model_dir = self.create_tempdir().full_path
-    extra_flags = [
-        '--data_dir=not_used',
-        '--mode=invalid_mode',
-        '--model_type=' + model,
-        '--dataset=' + dataset,
-        get_params_override(basic_params_override()),
-    ]
-
-    run = functools.partial(
-        classifier_trainer.run, strategy_override=distribution)
-    with self.assertRaises(ValueError):
-      run_end_to_end(main=run, extra_flags=extra_flags, model_dir=model_dir)
-
-
-if __name__ == '__main__':
-  tf.test.main()
diff --git a/official/vision/image_classification/classifier_trainer_util_test.py b/official/vision/image_classification/classifier_trainer_util_test.py
deleted file mode 100644
index d3624c286fdc716e4a09df56fbb8157fa35602aa..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/classifier_trainer_util_test.py
+++ /dev/null
@@ -1,166 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Unit tests for the classifier trainer models."""
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import copy
-import os
-
-from absl.testing import parameterized
-import tensorflow as tf
-
-from official.vision.image_classification import classifier_trainer
-from official.vision.image_classification import dataset_factory
-from official.vision.image_classification import test_utils
-from official.vision.image_classification.configs import base_configs
-
-
-def get_trivial_model(num_classes: int) -> tf.keras.Model:
-  """Creates and compiles trivial model for ImageNet dataset."""
-  model = test_utils.trivial_model(num_classes=num_classes)
-  lr = 0.01
-  optimizer = tf.keras.optimizers.SGD(learning_rate=lr)
-  loss_obj = tf.keras.losses.SparseCategoricalCrossentropy()
-  model.compile(optimizer=optimizer, loss=loss_obj, run_eagerly=True)
-  return model
-
-
-def get_trivial_data() -> tf.data.Dataset:
-  """Gets trivial data in the ImageNet size."""
-
-  def generate_data(_) -> tf.data.Dataset:
-    image = tf.zeros(shape=(224, 224, 3), dtype=tf.float32)
-    label = tf.zeros([1], dtype=tf.int32)
-    return image, label
-
-  dataset = tf.data.Dataset.range(1)
-  dataset = dataset.repeat()
-  dataset = dataset.map(
-      generate_data, num_parallel_calls=tf.data.experimental.AUTOTUNE)
-  dataset = dataset.prefetch(buffer_size=1).batch(1)
-  return dataset
-
-
-class UtilTests(parameterized.TestCase, tf.test.TestCase):
-  """Tests for individual utility functions within classifier_trainer.py."""
-
-  @parameterized.named_parameters(
-      ('efficientnet-b0', 'efficientnet', 'efficientnet-b0', 224),
-      ('efficientnet-b1', 'efficientnet', 'efficientnet-b1', 240),
-      ('efficientnet-b2', 'efficientnet', 'efficientnet-b2', 260),
-      ('efficientnet-b3', 'efficientnet', 'efficientnet-b3', 300),
-      ('efficientnet-b4', 'efficientnet', 'efficientnet-b4', 380),
-      ('efficientnet-b5', 'efficientnet', 'efficientnet-b5', 456),
-      ('efficientnet-b6', 'efficientnet', 'efficientnet-b6', 528),
-      ('efficientnet-b7', 'efficientnet', 'efficientnet-b7', 600),
-      ('resnet', 'resnet', '', None),
-  )
-  def test_get_model_size(self, model, model_name, expected):
-    config = base_configs.ExperimentConfig(
-        model_name=model,
-        model=base_configs.ModelConfig(
-            model_params={
-                'model_name': model_name,
-            },))
-    size = classifier_trainer.get_image_size_from_model(config)
-    self.assertEqual(size, expected)
-
-  @parameterized.named_parameters(
-      ('dynamic', 'dynamic', None, 'dynamic'),
-      ('scalar', 128., None, 128.),
-      ('float32', None, 'float32', 1),
-      ('float16', None, 'float16', 128),
-  )
-  def test_get_loss_scale(self, loss_scale, dtype, expected):
-    config = base_configs.ExperimentConfig(
-        runtime=base_configs.RuntimeConfig(loss_scale=loss_scale),
-        train_dataset=dataset_factory.DatasetConfig(dtype=dtype))
-    ls = classifier_trainer.get_loss_scale(config, fp16_default=128)
-    self.assertEqual(ls, expected)
-
-  @parameterized.named_parameters(('float16', 'float16'),
-                                  ('bfloat16', 'bfloat16'))
-  def test_initialize(self, dtype):
-    config = base_configs.ExperimentConfig(
-        runtime=base_configs.RuntimeConfig(
-            run_eagerly=False,
-            enable_xla=False,
-            per_gpu_thread_count=1,
-            gpu_thread_mode='gpu_private',
-            num_gpus=1,
-            dataset_num_private_threads=1,
-        ),
-        train_dataset=dataset_factory.DatasetConfig(dtype=dtype),
-        model=base_configs.ModelConfig(),
-    )
-
-    class EmptyClass:
-      pass
-
-    fake_ds_builder = EmptyClass()
-    fake_ds_builder.dtype = dtype
-    fake_ds_builder.config = EmptyClass()
-    classifier_trainer.initialize(config, fake_ds_builder)
-
-  def test_resume_from_checkpoint(self):
-    """Tests functionality for resuming from checkpoint."""
-    # Set the keras policy
-    tf.keras.mixed_precision.set_global_policy('mixed_bfloat16')
-
-    # Get the model, datasets, and compile it.
-    model = get_trivial_model(10)
-
-    # Create the checkpoint
-    model_dir = self.create_tempdir().full_path
-    train_epochs = 1
-    train_steps = 10
-    ds = get_trivial_data()
-    callbacks = [
-        tf.keras.callbacks.ModelCheckpoint(
-            os.path.join(model_dir, 'model.ckpt-{epoch:04d}'),
-            save_weights_only=True)
-    ]
-    model.fit(
-        ds,
-        callbacks=callbacks,
-        epochs=train_epochs,
-        steps_per_epoch=train_steps)
-
-    # Test load from checkpoint
-    clean_model = get_trivial_model(10)
-    weights_before_load = copy.deepcopy(clean_model.get_weights())
-    initial_epoch = classifier_trainer.resume_from_checkpoint(
-        model=clean_model, model_dir=model_dir, train_steps=train_steps)
-    self.assertEqual(initial_epoch, 1)
-    self.assertNotAllClose(weights_before_load, clean_model.get_weights())
-
-    tf.io.gfile.rmtree(model_dir)
-
-  def test_serialize_config(self):
-    """Tests functionality for serializing data."""
-    config = base_configs.ExperimentConfig()
-    model_dir = self.create_tempdir().full_path
-    classifier_trainer.serialize_config(params=config, model_dir=model_dir)
-    saved_params_path = os.path.join(model_dir, 'params.yaml')
-    self.assertTrue(os.path.exists(saved_params_path))
-    tf.io.gfile.rmtree(model_dir)
-
-
-if __name__ == '__main__':
-  tf.test.main()
diff --git a/official/vision/image_classification/configs/__init__.py b/official/vision/image_classification/configs/__init__.py
deleted file mode 100644
index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/configs/__init__.py
+++ /dev/null
@@ -1,14 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
diff --git a/official/vision/image_classification/configs/base_configs.py b/official/vision/image_classification/configs/base_configs.py
deleted file mode 100644
index 760b3dce03fc017c912eb499e30ff1418b5ec090..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/configs/base_configs.py
+++ /dev/null
@@ -1,257 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Definitions for high level configuration groups.."""
-
-import dataclasses
-from typing import Any, List, Mapping, Optional
-from official.core import config_definitions
-from official.modeling import hyperparams
-
-RuntimeConfig = config_definitions.RuntimeConfig
-
-
-@dataclasses.dataclass
-class TensorBoardConfig(hyperparams.Config):
-  """Configuration for TensorBoard.
-
-  Attributes:
-    track_lr: Whether or not to track the learning rate in TensorBoard. Defaults
-      to True.
-    write_model_weights: Whether or not to write the model weights as images in
-      TensorBoard. Defaults to False.
-  """
-  track_lr: bool = True
-  write_model_weights: bool = False
-
-
-@dataclasses.dataclass
-class CallbacksConfig(hyperparams.Config):
-  """Configuration for Callbacks.
-
-  Attributes:
-    enable_checkpoint_and_export: Whether or not to enable checkpoints as a
-      Callback. Defaults to True.
-    enable_backup_and_restore: Whether or not to add BackupAndRestore
-      callback. Defaults to True.
-    enable_tensorboard: Whether or not to enable TensorBoard as a Callback.
-      Defaults to True.
-    enable_time_history: Whether or not to enable TimeHistory Callbacks.
-      Defaults to True.
-  """
-  enable_checkpoint_and_export: bool = True
-  enable_backup_and_restore: bool = False
-  enable_tensorboard: bool = True
-  enable_time_history: bool = True
-
-
-@dataclasses.dataclass
-class ExportConfig(hyperparams.Config):
-  """Configuration for exports.
-
-  Attributes:
-    checkpoint: the path to the checkpoint to export.
-    destination: the path to where the checkpoint should be exported.
-  """
-  checkpoint: str = None
-  destination: str = None
-
-
-@dataclasses.dataclass
-class MetricsConfig(hyperparams.Config):
-  """Configuration for Metrics.
-
-  Attributes:
-    accuracy: Whether or not to track accuracy as a Callback. Defaults to None.
-    top_5: Whether or not to track top_5_accuracy as a Callback. Defaults to
-      None.
-  """
-  accuracy: bool = None
-  top_5: bool = None
-
-
-@dataclasses.dataclass
-class TimeHistoryConfig(hyperparams.Config):
-  """Configuration for the TimeHistory callback.
-
-  Attributes:
-    log_steps: Interval of steps between logging of batch level stats.
-  """
-  log_steps: int = None
-
-
-@dataclasses.dataclass
-class TrainConfig(hyperparams.Config):
-  """Configuration for training.
-
-  Attributes:
-    resume_checkpoint: Whether or not to enable load checkpoint loading.
-      Defaults to None.
-    epochs: The number of training epochs to run. Defaults to None.
-    steps: The number of steps to run per epoch. If None, then this will be
-      inferred based on the number of images and batch size. Defaults to None.
-    callbacks: An instance of CallbacksConfig.
-    metrics: An instance of MetricsConfig.
-    tensorboard: An instance of TensorBoardConfig.
-    set_epoch_loop: Whether or not to set `steps_per_execution` to
-      equal the number of training steps in `model.compile`. This reduces the
-      number of callbacks run per epoch which significantly improves end-to-end
-      TPU training time.
-  """
-  resume_checkpoint: bool = None
-  epochs: int = None
-  steps: int = None
-  callbacks: CallbacksConfig = CallbacksConfig()
-  metrics: MetricsConfig = None
-  tensorboard: TensorBoardConfig = TensorBoardConfig()
-  time_history: TimeHistoryConfig = TimeHistoryConfig()
-  set_epoch_loop: bool = False
-
-
-@dataclasses.dataclass
-class EvalConfig(hyperparams.Config):
-  """Configuration for evaluation.
-
-  Attributes:
-    epochs_between_evals: The number of train epochs to run between evaluations.
-      Defaults to None.
-    steps: The number of eval steps to run during evaluation. If None, this will
-      be inferred based on the number of images and batch size. Defaults to
-      None.
-    skip_eval: Whether or not to skip evaluation.
-  """
-  epochs_between_evals: int = None
-  steps: int = None
-  skip_eval: bool = False
-
-
-@dataclasses.dataclass
-class LossConfig(hyperparams.Config):
-  """Configuration for Loss.
-
-  Attributes:
-    name: The name of the loss. Defaults to None.
-    label_smoothing: Whether or not to apply label smoothing to the loss. This
-      only applies to 'categorical_cross_entropy'.
-  """
-  name: str = None
-  label_smoothing: float = None
-
-
-@dataclasses.dataclass
-class OptimizerConfig(hyperparams.Config):
-  """Configuration for Optimizers.
-
-  Attributes:
-    name: The name of the optimizer. Defaults to None.
-    decay: Decay or rho, discounting factor for gradient. Defaults to None.
-    epsilon: Small value used to avoid 0 denominator. Defaults to None.
-    momentum: Plain momentum constant. Defaults to None.
-    nesterov: Whether or not to apply Nesterov momentum. Defaults to None.
-    moving_average_decay: The amount of decay to apply. If 0 or None, then
-      exponential moving average is not used. Defaults to None.
-    lookahead: Whether or not to apply the lookahead optimizer. Defaults to
-      None.
-    beta_1: The exponential decay rate for the 1st moment estimates. Used in the
-      Adam optimizers. Defaults to None.
-    beta_2: The exponential decay rate for the 2nd moment estimates. Used in the
-      Adam optimizers. Defaults to None.
-    epsilon: Small value used to avoid 0 denominator. Defaults to 1e-7.
-  """
-  name: str = None
-  decay: float = None
-  epsilon: float = None
-  momentum: float = None
-  nesterov: bool = None
-  moving_average_decay: Optional[float] = None
-  lookahead: Optional[bool] = None
-  beta_1: float = None
-  beta_2: float = None
-  epsilon: float = None
-
-
-@dataclasses.dataclass
-class LearningRateConfig(hyperparams.Config):
-  """Configuration for learning rates.
-
-  Attributes:
-    name: The name of the learning rate. Defaults to None.
-    initial_lr: The initial learning rate. Defaults to None.
-    decay_epochs: The number of decay epochs. Defaults to None.
-    decay_rate: The rate of decay. Defaults to None.
-    warmup_epochs: The number of warmup epochs. Defaults to None.
-    batch_lr_multiplier: The multiplier to apply to the base learning rate, if
-      necessary. Defaults to None.
-    examples_per_epoch: the number of examples in a single epoch. Defaults to
-      None.
-    boundaries: boundaries used in piecewise constant decay with warmup.
-    multipliers: multipliers used in piecewise constant decay with warmup.
-    scale_by_batch_size: Scale the learning rate by a fraction of the batch
-      size. Set to 0 for no scaling (default).
-    staircase: Apply exponential decay at discrete values instead of continuous.
-  """
-  name: str = None
-  initial_lr: float = None
-  decay_epochs: float = None
-  decay_rate: float = None
-  warmup_epochs: int = None
-  examples_per_epoch: int = None
-  boundaries: List[int] = None
-  multipliers: List[float] = None
-  scale_by_batch_size: float = 0.
-  staircase: bool = None
-
-
-@dataclasses.dataclass
-class ModelConfig(hyperparams.Config):
-  """Configuration for Models.
-
-  Attributes:
-    name: The name of the model. Defaults to None.
-    model_params: The parameters used to create the model. Defaults to None.
-    num_classes: The number of classes in the model. Defaults to None.
-    loss: A `LossConfig` instance. Defaults to None.
-    optimizer: An `OptimizerConfig` instance. Defaults to None.
-  """
-  name: str = None
-  model_params: hyperparams.Config = None
-  num_classes: int = None
-  loss: LossConfig = None
-  optimizer: OptimizerConfig = None
-
-
-@dataclasses.dataclass
-class ExperimentConfig(hyperparams.Config):
-  """Base configuration for an image classification experiment.
-
-  Attributes:
-    model_dir: The directory to use when running an experiment.
-    mode: e.g. 'train_and_eval', 'export'
-    runtime: A `RuntimeConfig` instance.
-    train: A `TrainConfig` instance.
-    evaluation: An `EvalConfig` instance.
-    model: A `ModelConfig` instance.
-    export: An `ExportConfig` instance.
-  """
-  model_dir: str = None
-  model_name: str = None
-  mode: str = None
-  runtime: RuntimeConfig = None
-  train_dataset: Any = None
-  validation_dataset: Any = None
-  train: TrainConfig = None
-  evaluation: EvalConfig = None
-  model: ModelConfig = None
-  export: ExportConfig = None
diff --git a/official/vision/image_classification/configs/configs.py b/official/vision/image_classification/configs/configs.py
deleted file mode 100644
index 127af58c476f7ae849ca43e5765379b77897aea8..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/configs/configs.py
+++ /dev/null
@@ -1,113 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# Lint as: python3
-"""Configuration utils for image classification experiments."""
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import dataclasses
-
-from official.vision.image_classification import dataset_factory
-from official.vision.image_classification.configs import base_configs
-from official.vision.image_classification.efficientnet import efficientnet_config
-from official.vision.image_classification.resnet import resnet_config
-
-
-@dataclasses.dataclass
-class EfficientNetImageNetConfig(base_configs.ExperimentConfig):
-  """Base configuration to train efficientnet-b0 on ImageNet.
-
-  Attributes:
-    export: An `ExportConfig` instance
-    runtime: A `RuntimeConfig` instance.
-    dataset: A `DatasetConfig` instance.
-    train: A `TrainConfig` instance.
-    evaluation: An `EvalConfig` instance.
-    model: A `ModelConfig` instance.
-  """
-  export: base_configs.ExportConfig = base_configs.ExportConfig()
-  runtime: base_configs.RuntimeConfig = base_configs.RuntimeConfig()
-  train_dataset: dataset_factory.DatasetConfig = \
-      dataset_factory.ImageNetConfig(split='train')
-  validation_dataset: dataset_factory.DatasetConfig = \
-      dataset_factory.ImageNetConfig(split='validation')
-  train: base_configs.TrainConfig = base_configs.TrainConfig(
-      resume_checkpoint=True,
-      epochs=500,
-      steps=None,
-      callbacks=base_configs.CallbacksConfig(
-          enable_checkpoint_and_export=True, enable_tensorboard=True),
-      metrics=['accuracy', 'top_5'],
-      time_history=base_configs.TimeHistoryConfig(log_steps=100),
-      tensorboard=base_configs.TensorBoardConfig(
-          track_lr=True, write_model_weights=False),
-      set_epoch_loop=False)
-  evaluation: base_configs.EvalConfig = base_configs.EvalConfig(
-      epochs_between_evals=1, steps=None)
-  model: base_configs.ModelConfig = \
-      efficientnet_config.EfficientNetModelConfig()
-
-
-@dataclasses.dataclass
-class ResNetImagenetConfig(base_configs.ExperimentConfig):
-  """Base configuration to train resnet-50 on ImageNet."""
-  export: base_configs.ExportConfig = base_configs.ExportConfig()
-  runtime: base_configs.RuntimeConfig = base_configs.RuntimeConfig()
-  train_dataset: dataset_factory.DatasetConfig = \
-      dataset_factory.ImageNetConfig(split='train',
-                                     one_hot=False,
-                                     mean_subtract=True,
-                                     standardize=True)
-  validation_dataset: dataset_factory.DatasetConfig = \
-      dataset_factory.ImageNetConfig(split='validation',
-                                     one_hot=False,
-                                     mean_subtract=True,
-                                     standardize=True)
-  train: base_configs.TrainConfig = base_configs.TrainConfig(
-      resume_checkpoint=True,
-      epochs=90,
-      steps=None,
-      callbacks=base_configs.CallbacksConfig(
-          enable_checkpoint_and_export=True, enable_tensorboard=True),
-      metrics=['accuracy', 'top_5'],
-      time_history=base_configs.TimeHistoryConfig(log_steps=100),
-      tensorboard=base_configs.TensorBoardConfig(
-          track_lr=True, write_model_weights=False),
-      set_epoch_loop=False)
-  evaluation: base_configs.EvalConfig = base_configs.EvalConfig(
-      epochs_between_evals=1, steps=None)
-  model: base_configs.ModelConfig = resnet_config.ResNetModelConfig()
-
-
-def get_config(model: str, dataset: str) -> base_configs.ExperimentConfig:
-  """Given model and dataset names, return the ExperimentConfig."""
-  dataset_model_config_map = {
-      'imagenet': {
-          'efficientnet': EfficientNetImageNetConfig(),
-          'resnet': ResNetImagenetConfig(),
-      }
-  }
-  try:
-    return dataset_model_config_map[dataset][model]
-  except KeyError:
-    if dataset not in dataset_model_config_map:
-      raise KeyError('Invalid dataset received. Received: {}. Supported '
-                     'datasets include: {}'.format(
-                         dataset, ', '.join(dataset_model_config_map.keys())))
-    raise KeyError('Invalid model received. Received: {}. Supported models for'
-                   '{} include: {}'.format(
-                       model, dataset,
-                       ', '.join(dataset_model_config_map[dataset].keys())))
diff --git a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml b/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml
deleted file mode 100644
index 6f40ffb1e3020a231832a120d9938bf77e9cc74b..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml
+++ /dev/null
@@ -1,52 +0,0 @@
-# Training configuration for EfficientNet-b0 trained on ImageNet on GPUs.
-# Takes ~32 minutes per epoch for 8 V100s.
-# Reaches ~76.1% within 350 epochs.
-# Note: This configuration uses a scaled per-replica batch size based on the number of devices.
-runtime:
-  distribution_strategy: 'mirrored'
-  num_gpus: 1
-train_dataset:
-  name: 'imagenet2012'
-  data_dir: null
-  builder: 'records'
-  split: 'train'
-  num_classes: 1000
-  num_examples: 1281167
-  batch_size: 32
-  use_per_replica_batch_size: True
-  dtype: 'float32'
-  augmenter:
-    name: 'autoaugment'
-validation_dataset:
-  name: 'imagenet2012'
-  data_dir: null
-  builder: 'records'
-  split: 'validation'
-  num_classes: 1000
-  num_examples: 50000
-  batch_size: 32
-  use_per_replica_batch_size: True
-  dtype: 'float32'
-model:
-  model_params:
-    model_name: 'efficientnet-b0'
-    overrides:
-      num_classes: 1000
-      batch_norm: 'default'
-      dtype: 'float32'
-      activation: 'swish'
-  optimizer:
-    name: 'rmsprop'
-    momentum: 0.9
-    decay: 0.9
-    moving_average_decay: 0.0
-    lookahead: false
-  learning_rate:
-    name: 'exponential'
-  loss:
-    label_smoothing: 0.1
-train:
-  resume_checkpoint: True
-  epochs: 500
-evaluation:
-  epochs_between_evals: 1
diff --git a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml b/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml
deleted file mode 100644
index c5be7e9ba32fc7e8f3999df8e7446405dd2d4173..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml
+++ /dev/null
@@ -1,52 +0,0 @@
-# Training configuration for EfficientNet-b0 trained on ImageNet on TPUs.
-# Takes ~2 minutes, 50 seconds per epoch for v3-32.
-# Reaches ~76.1% within 350 epochs.
-# Note: This configuration uses a scaled per-replica batch size based on the number of devices.
-runtime:
-  distribution_strategy: 'tpu'
-train_dataset:
-  name: 'imagenet2012'
-  data_dir: null
-  builder: 'records'
-  split: 'train'
-  num_classes: 1000
-  num_examples: 1281167
-  batch_size: 128
-  use_per_replica_batch_size: True
-  dtype: 'bfloat16'
-  augmenter:
-    name: 'autoaugment'
-validation_dataset:
-  name: 'imagenet2012'
-  data_dir: null
-  builder: 'records'
-  split: 'validation'
-  num_classes: 1000
-  num_examples: 50000
-  batch_size: 128
-  use_per_replica_batch_size: True
-  dtype: 'bfloat16'
-model:
-  model_params:
-    model_name: 'efficientnet-b0'
-    overrides:
-      num_classes: 1000
-      batch_norm: 'tpu'
-      dtype: 'bfloat16'
-      activation: 'swish'
-  optimizer:
-    name: 'rmsprop'
-    momentum: 0.9
-    decay: 0.9
-    moving_average_decay: 0.0
-    lookahead: false
-  learning_rate:
-    name: 'exponential'
-  loss:
-    label_smoothing: 0.1
-train:
-  resume_checkpoint: True
-  epochs: 500
-  set_epoch_loop: True
-evaluation:
-  epochs_between_evals: 1
diff --git a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-gpu.yaml b/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-gpu.yaml
deleted file mode 100644
index 2f3dce01a46c64c4d92e97091628daeadaceb21d..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-gpu.yaml
+++ /dev/null
@@ -1,47 +0,0 @@
-# Note: This configuration uses a scaled per-replica batch size based on the number of devices.
-runtime:
-  distribution_strategy: 'mirrored'
-  num_gpus: 1
-train_dataset:
-  name: 'imagenet2012'
-  data_dir: null
-  builder: 'records'
-  split: 'train'
-  num_classes: 1000
-  num_examples: 1281167
-  batch_size: 32
-  use_per_replica_batch_size: True
-  dtype: 'float32'
-validation_dataset:
-  name: 'imagenet2012'
-  data_dir: null
-  builder: 'records'
-  split: 'validation'
-  num_classes: 1000
-  num_examples: 50000
-  batch_size: 32
-  use_per_replica_batch_size: True
-  dtype: 'float32'
-model:
-  model_params:
-    model_name: 'efficientnet-b1'
-    overrides:
-      num_classes: 1000
-      batch_norm: 'default'
-      dtype: 'float32'
-      activation: 'swish'
-  optimizer:
-    name: 'rmsprop'
-    momentum: 0.9
-    decay: 0.9
-    moving_average_decay: 0.0
-    lookahead: false
-  learning_rate:
-    name: 'exponential'
-  loss:
-    label_smoothing: 0.1
-train:
-  resume_checkpoint: True
-  epochs: 500
-evaluation:
-  epochs_between_evals: 1
diff --git a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-tpu.yaml b/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-tpu.yaml
deleted file mode 100644
index 0bb6a9fe6f0b417f92686178d4bc79a44c5a4aa7..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/configs/examples/efficientnet/imagenet/efficientnet-b1-tpu.yaml
+++ /dev/null
@@ -1,51 +0,0 @@
-# Training configuration for EfficientNet-b1 trained on ImageNet on TPUs.
-# Takes ~3 minutes, 15 seconds per epoch for v3-32.
-# Note: This configuration uses a scaled per-replica batch size based on the number of devices.
-runtime:
-  distribution_strategy: 'tpu'
-train_dataset:
-  name: 'imagenet2012'
-  data_dir: null
-  builder: 'records'
-  split: 'train'
-  num_classes: 1000
-  num_examples: 1281167
-  batch_size: 128
-  use_per_replica_batch_size: True
-  dtype: 'bfloat16'
-  augmenter:
-    name: 'autoaugment'
-validation_dataset:
-  name: 'imagenet2012'
-  data_dir: null
-  builder: 'records'
-  split: 'validation'
-  num_classes: 1000
-  num_examples: 50000
-  batch_size: 128
-  use_per_replica_batch_size: True
-  dtype: 'bfloat16'
-model:
-  model_params:
-    model_name: 'efficientnet-b1'
-    overrides:
-      num_classes: 1000
-      batch_norm: 'tpu'
-      dtype: 'bfloat16'
-      activation: 'swish'
-  optimizer:
-    name: 'rmsprop'
-    momentum: 0.9
-    decay: 0.9
-    moving_average_decay: 0.0
-    lookahead: false
-  learning_rate:
-    name: 'exponential'
-  loss:
-    label_smoothing: 0.1
-train:
-  resume_checkpoint: True
-  epochs: 500
-  set_epoch_loop: True
-evaluation:
-  epochs_between_evals: 1
diff --git a/official/vision/image_classification/configs/examples/resnet/imagenet/gpu.yaml b/official/vision/image_classification/configs/examples/resnet/imagenet/gpu.yaml
deleted file mode 100644
index 2037d6b5d1c39b9ff898eaf49ec7a68e3987356b..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/configs/examples/resnet/imagenet/gpu.yaml
+++ /dev/null
@@ -1,49 +0,0 @@
-# Training configuration for ResNet trained on ImageNet on GPUs.
-# Reaches > 76.1% within 90 epochs.
-# Note: This configuration uses a scaled per-replica batch size based on the number of devices.
-runtime:
-  distribution_strategy: 'mirrored'
-  num_gpus: 1
-  batchnorm_spatial_persistent: True
-train_dataset:
-  name: 'imagenet2012'
-  data_dir: null
-  builder: 'tfds'
-  split: 'train'
-  image_size: 224
-  num_classes: 1000
-  num_examples: 1281167
-  batch_size: 256
-  use_per_replica_batch_size: True
-  dtype: 'float16'
-  mean_subtract: True
-  standardize: True
-validation_dataset:
-  name: 'imagenet2012'
-  data_dir: null
-  builder: 'tfds'
-  split: 'validation'
-  image_size: 224
-  num_classes: 1000
-  num_examples: 50000
-  batch_size: 256
-  use_per_replica_batch_size: True
-  dtype: 'float16'
-  mean_subtract: True
-  standardize: True
-model:
-  name: 'resnet'
-  model_params:
-    rescale_inputs: False
-  optimizer:
-    name: 'momentum'
-    momentum: 0.9
-    decay: 0.9
-    epsilon: 0.001
-  loss:
-    label_smoothing: 0.1
-train:
-  resume_checkpoint: True
-  epochs: 90
-evaluation:
-  epochs_between_evals: 1
diff --git a/official/vision/image_classification/configs/examples/resnet/imagenet/tpu.yaml b/official/vision/image_classification/configs/examples/resnet/imagenet/tpu.yaml
deleted file mode 100644
index 0a3030333bb42ce59e67cfbe12a12be877ab19d0..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/configs/examples/resnet/imagenet/tpu.yaml
+++ /dev/null
@@ -1,55 +0,0 @@
-# Training configuration for ResNet trained on ImageNet on TPUs.
-# Takes ~4 minutes, 30 seconds seconds per epoch for a v3-32.
-# Reaches > 76.1% within 90 epochs.
-# Note: This configuration uses a scaled per-replica batch size based on the number of devices.
-runtime:
-  distribution_strategy: 'tpu'
-train_dataset:
-  name: 'imagenet2012'
-  data_dir: null
-  builder: 'tfds'
-  split: 'train'
-  one_hot: False
-  image_size: 224
-  num_classes: 1000
-  num_examples: 1281167
-  batch_size: 128
-  use_per_replica_batch_size: True
-  mean_subtract: False
-  standardize: False
-  dtype: 'bfloat16'
-validation_dataset:
-  name: 'imagenet2012'
-  data_dir: null
-  builder: 'tfds'
-  split: 'validation'
-  one_hot: False
-  image_size: 224
-  num_classes: 1000
-  num_examples: 50000
-  batch_size: 128
-  use_per_replica_batch_size: True
-  mean_subtract: False
-  standardize: False
-  dtype: 'bfloat16'
-model:
-  name: 'resnet'
-  model_params:
-    rescale_inputs: True
-  optimizer:
-    name: 'momentum'
-    momentum: 0.9
-    decay: 0.9
-    epsilon: 0.001
-    moving_average_decay: 0.
-    lookahead: False
-  loss:
-    label_smoothing: 0.1
-train:
-  callbacks:
-    enable_checkpoint_and_export: True
-  resume_checkpoint: True
-  epochs: 90
-  set_epoch_loop: True
-evaluation:
-  epochs_between_evals: 1
diff --git a/official/vision/image_classification/dataset_factory.py b/official/vision/image_classification/dataset_factory.py
deleted file mode 100644
index a0458ecccf9a74eb57480f8d127c0eb736591ff5..0000000000000000000000000000000000000000
--- a/official/vision/image_classification/dataset_factory.py
+++ /dev/null
@@ -1,537 +0,0 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
- -# Lint as: python3 -"""Dataset utilities for vision tasks using TFDS and tf.data.Dataset.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -from typing import Any, List, Optional, Tuple, Mapping, Union - -from absl import logging -from dataclasses import dataclass -import tensorflow as tf -import tensorflow_datasets as tfds - -from official.modeling.hyperparams import base_config -from official.vision.image_classification import augment -from official.vision.image_classification import preprocessing - -AUGMENTERS = { - 'autoaugment': augment.AutoAugment, - 'randaugment': augment.RandAugment, -} - - -@dataclass -class AugmentConfig(base_config.Config): - """Configuration for image augmenters. - - Attributes: - name: The name of the image augmentation to use. Possible options are None - (default), 'autoaugment', or 'randaugment'. - params: Any parameters used to initialize the augmenter. - """ - name: Optional[str] = None - params: Optional[Mapping[str, Any]] = None - - def build(self) -> augment.ImageAugment: - """Build the augmenter using this config.""" - params = self.params or {} - augmenter = AUGMENTERS.get(self.name, None) - return augmenter(**params) if augmenter is not None else None - - -@dataclass -class DatasetConfig(base_config.Config): - """The base configuration for building datasets. - - Attributes: - name: The name of the Dataset. Usually should correspond to a TFDS dataset. - data_dir: The path where the dataset files are stored, if available. - filenames: Optional list of strings representing the TFRecord names. - builder: The builder type used to load the dataset. Value should be one of - 'tfds' (load using TFDS), 'records' (load from TFRecords), or 'synthetic' - (generate dummy synthetic data without reading from files). - split: The split of the dataset. Usually 'train', 'validation', or 'test'. - image_size: The size of the image in the dataset.
This assumes that `width` - == `height`. Set to 'infer' to infer the image size from TFDS info. This - requires `name` to be a registered dataset in TFDS. - num_classes: The number of classes given by the dataset. Set to 'infer' to - infer the number of classes from TFDS info. This requires `name` to be a - registered dataset in TFDS. - num_channels: The number of channels given by the dataset. Set to 'infer' to - infer the number of channels from TFDS info. This requires `name` to be a - registered dataset in TFDS. - num_examples: The number of examples given by the dataset. Set to 'infer' to - infer the number of examples from TFDS info. This requires `name` to be a - registered dataset in TFDS. - batch_size: The base batch size for the dataset. - use_per_replica_batch_size: Whether to scale the batch size based on - available resources. If set to `True`, the dataset builder will return - batch_size multiplied by `num_devices`, the number of device replicas - (e.g., the number of GPUs or TPU cores). This setting should be `True` if - the strategy argument is passed to `build()` and `num_devices > 1`. - num_devices: The number of replica devices to use. This should be set by - `strategy.num_replicas_in_sync` when using a distribution strategy. - dtype: The desired dtype of the dataset. This will be set during - preprocessing. - one_hot: Whether to apply one-hot encoding. Set to `True` to be able to use - label smoothing. - augmenter: The augmenter config to use. No augmentation is used by default. - download: Whether to download data using TFDS. - shuffle_buffer_size: The buffer size used for shuffling training data. - file_shuffle_buffer_size: The buffer size used for shuffling raw training - files. - skip_decoding: Whether to skip image decoding when loading from TFDS. - cache: Whether to cache the dataset examples. Can be used to avoid re-reading - from disk on the second epoch. Requires significant memory overhead.
- tf_data_service: The URI of a tf.data service to offload preprocessing onto - during training. The URI should be in the format "protocol://address", - e.g. "grpc://tf-data-service:5050". - mean_subtract: Whether or not to apply mean subtraction to the dataset. - standardize: Whether or not to apply standardization to the dataset. - """ - name: Optional[str] = None - data_dir: Optional[str] = None - filenames: Optional[List[str]] = None - builder: str = 'tfds' - split: str = 'train' - image_size: Union[int, str] = 'infer' - num_classes: Union[int, str] = 'infer' - num_channels: Union[int, str] = 'infer' - num_examples: Union[int, str] = 'infer' - batch_size: int = 128 - use_per_replica_batch_size: bool = True - num_devices: int = 1 - dtype: str = 'float32' - one_hot: bool = True - augmenter: AugmentConfig = AugmentConfig() - download: bool = False - shuffle_buffer_size: int = 10000 - file_shuffle_buffer_size: int = 1024 - skip_decoding: bool = True - cache: bool = False - tf_data_service: Optional[str] = None - mean_subtract: bool = False - standardize: bool = False - - @property - def has_data(self): - """Whether this dataset has any data associated with it.""" - return self.name or self.data_dir or self.filenames - - -@dataclass -class ImageNetConfig(DatasetConfig): - """The base ImageNet dataset config.""" - name: str = 'imagenet2012' - # Note: for large datasets like ImageNet, using records is faster than tfds - builder: str = 'records' - image_size: int = 224 - num_channels: int = 3 - num_examples: int = 1281167 - num_classes: int = 1000 - batch_size: int = 128 - - -@dataclass -class Cifar10Config(DatasetConfig): - """The base CIFAR-10 dataset config.""" - name: str = 'cifar10' - image_size: int = 224 - batch_size: int = 128 - download: bool = True - cache: bool = True - - -class DatasetBuilder: - """An object for building datasets. - - Allows building various pipelines fetching examples, preprocessing, etc.
- Maintains additional state information calculated from the dataset, e.g., - training set split, batch size, and number of steps (batches). - """ - - def __init__(self, config: DatasetConfig, **overrides: Any): - """Initialize the builder from the config.""" - self.config = config.replace(**overrides) - self.builder_info = None - - if self.config.augmenter is not None: - logging.info('Using augmentation: %s', self.config.augmenter.name) - self.augmenter = self.config.augmenter.build() - else: - self.augmenter = None - - @property - def is_training(self) -> bool: - """Whether this is the training set.""" - return self.config.split == 'train' - - @property - def batch_size(self) -> int: - """The batch size, multiplied by the number of replicas (if configured).""" - if self.config.use_per_replica_batch_size: - return self.config.batch_size * self.config.num_devices - else: - return self.config.batch_size - - @property - def global_batch_size(self): - """The global batch size across all replicas.""" - return self.batch_size - - @property - def local_batch_size(self): - """The base unscaled batch size.""" - if self.config.use_per_replica_batch_size: - return self.config.batch_size - else: - return self.config.batch_size // self.config.num_devices - - @property - def num_steps(self) -> int: - """The number of steps (batches) to exhaust this dataset.""" - # Always divide by the global batch size to get the correct # of steps - return self.num_examples // self.global_batch_size - - @property - def dtype(self) -> tf.dtypes.DType: - """Converts the config's dtype string to a tf dtype. - - Returns: - The `tf.dtypes.DType` corresponding to the config's dtype string. - - Raises: - ValueError: if the config's dtype is not supported.
- - """ - dtype_map = { - 'float32': tf.float32, - 'bfloat16': tf.bfloat16, - 'float16': tf.float16, - 'fp32': tf.float32, - 'bf16': tf.bfloat16, - } - try: - return dtype_map[self.config.dtype] - except KeyError: - raise ValueError('Invalid DType provided. Supported types: {}'.format( - dtype_map.keys())) - - @property - def image_size(self) -> int: - """The size of each image (can be inferred from the dataset).""" - - if self.config.image_size == 'infer': - return self.info.features['image'].shape[0] - else: - return int(self.config.image_size) - - @property - def num_channels(self) -> int: - """The number of image channels (can be inferred from the dataset).""" - if self.config.num_channels == 'infer': - return self.info.features['image'].shape[-1] - else: - return int(self.config.num_channels) - - @property - def num_examples(self) -> int: - """The number of examples (can be inferred from the dataset).""" - if self.config.num_examples == 'infer': - return self.info.splits[self.config.split].num_examples - else: - return int(self.config.num_examples) - - @property - def num_classes(self) -> int: - """The number of classes (can be inferred from the dataset).""" - if self.config.num_classes == 'infer': - return self.info.features['label'].num_classes - else: - return int(self.config.num_classes) - - @property - def info(self) -> tfds.core.DatasetInfo: - """The TFDS dataset info, if available.""" - try: - if self.builder_info is None: - self.builder_info = tfds.builder(self.config.name).info - except ConnectionError as e: - logging.error('Failed to use TFDS to load info. Please set dataset info ' - '(image_size, num_channels, num_examples, num_classes) in ' - 'the dataset config.') - raise e - return self.builder_info - - def build( - self, - strategy: Optional[tf.distribute.Strategy] = None) -> tf.data.Dataset: - """Construct a dataset end-to-end and return it using an optional strategy.
- - Args: - strategy: A strategy that, if passed, will distribute the dataset - according to that strategy. If passed and `num_devices > 1`, - `use_per_replica_batch_size` must be set to `True`. - - Returns: - A TensorFlow dataset outputting batched images and labels. - """ - if strategy: - if strategy.num_replicas_in_sync != self.config.num_devices: - logging.warning( - 'Passed a strategy with %d devices, but expected ' - '%d devices.', strategy.num_replicas_in_sync, - self.config.num_devices) - dataset = strategy.distribute_datasets_from_function(self._build) - else: - dataset = self._build() - - return dataset - - def _build( - self, - input_context: Optional[tf.distribute.InputContext] = None - ) -> tf.data.Dataset: - """Construct a dataset end-to-end and return it. - - Args: - input_context: An optional context provided by `tf.distribute` for - cross-replica training. - - Returns: - A TensorFlow dataset outputting batched images and labels. - """ - builders = { - 'tfds': self.load_tfds, - 'records': self.load_records, - 'synthetic': self.load_synthetic, - } - - builder = builders.get(self.config.builder, None) - - if builder is None: - raise ValueError('Unknown builder type {}'.format(self.config.builder)) - - self.input_context = input_context - dataset = builder() - dataset = self.pipeline(dataset) - - return dataset - - def load_tfds(self) -> tf.data.Dataset: - """Return a dataset loading files from TFDS.""" - - logging.info('Using TFDS to load data.') - - builder = tfds.builder(self.config.name, data_dir=self.config.data_dir) - - if self.config.download: - builder.download_and_prepare() - - decoders = {} - - if self.config.skip_decoding: - decoders['image'] = tfds.decode.SkipDecoding() - - read_config = tfds.ReadConfig( - interleave_cycle_length=10, - interleave_block_length=1, - input_context=self.input_context) - - dataset = builder.as_dataset( - split=self.config.split, - as_supervised=True, - shuffle_files=True, - decoders=decoders, -
read_config=read_config) - - return dataset - - def load_records(self) -> tf.data.Dataset: - """Return a dataset loading files with TFRecords.""" - logging.info('Using TFRecords to load data.') - if self.config.filenames is None: - if self.config.data_dir is None: - raise ValueError('Dataset must specify a path for the data files.') - - file_pattern = os.path.join(self.config.data_dir, - '{}*'.format(self.config.split)) - dataset = tf.data.Dataset.list_files(file_pattern, shuffle=False) - else: - dataset = tf.data.Dataset.from_tensor_slices(self.config.filenames) - - return dataset - - def load_synthetic(self) -> tf.data.Dataset: - """Return a dataset generating dummy synthetic data.""" - logging.info('Generating a synthetic dataset.') - - def generate_data(_): - image = tf.zeros([self.image_size, self.image_size, self.num_channels], - dtype=self.dtype) - label = tf.zeros([1], dtype=tf.int32) - return image, label - - dataset = tf.data.Dataset.range(1) - dataset = dataset.repeat() - dataset = dataset.map( - generate_data, num_parallel_calls=tf.data.experimental.AUTOTUNE) - return dataset - - def pipeline(self, dataset: tf.data.Dataset) -> tf.data.Dataset: - """Build a pipeline fetching, shuffling, and preprocessing the dataset. - - Args: - dataset: A `tf.data.Dataset` that loads raw files. - - Returns: - A TensorFlow dataset outputting batched images and labels. - """ - if (self.config.builder != 'tfds' and self.input_context and - self.input_context.num_input_pipelines > 1): - dataset = dataset.shard(self.input_context.num_input_pipelines, - self.input_context.input_pipeline_id) - logging.info( - 'Sharding the dataset: input_pipeline_id=%d ' - 'num_input_pipelines=%d', self.input_context.input_pipeline_id, - self.input_context.num_input_pipelines) - - if self.is_training and self.config.builder == 'records': - # Shuffle the input files.
- dataset = dataset.shuffle(buffer_size=self.config.file_shuffle_buffer_size) - - if self.is_training and not self.config.cache: - dataset = dataset.repeat() - - if self.config.builder == 'records': - # Read the data from disk in parallel - dataset = dataset.interleave( - tf.data.TFRecordDataset, - cycle_length=10, - block_length=1, - num_parallel_calls=tf.data.experimental.AUTOTUNE) - - if self.config.cache: - dataset = dataset.cache() - - if self.is_training: - dataset = dataset.shuffle(self.config.shuffle_buffer_size) - dataset = dataset.repeat() - - # Parse, pre-process, and batch the data in parallel - if self.config.builder == 'records': - preprocess = self.parse_record - else: - preprocess = self.preprocess - dataset = dataset.map( - preprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE) - - if self.input_context and self.config.num_devices > 1: - if not self.config.use_per_replica_batch_size: - raise ValueError( - 'The builder does not support a global batch size with more than ' - 'one replica. Got {} replicas. Please set a ' - '`per_replica_batch_size` and enable ' - '`use_per_replica_batch_size=True`.'.format( - self.config.num_devices)) - - # The batch size of the dataset will be multiplied by the number of - # replicas automatically when strategy.distribute_datasets_from_function - # is called, so we use local batch size here.
- dataset = dataset.batch( - self.local_batch_size, drop_remainder=self.is_training) - else: - dataset = dataset.batch( - self.global_batch_size, drop_remainder=self.is_training) - - # Prefetch overlaps in-feed with training - dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE) - - if self.config.tf_data_service: - if not hasattr(tf.data.experimental, 'service'): - raise ValueError('The tf_data_service flag requires TensorFlow version ' - '>= 2.3.0, but the version is {}'.format( - tf.__version__)) - dataset = dataset.apply( - tf.data.experimental.service.distribute( - processing_mode='parallel_epochs', - service=self.config.tf_data_service, - job_name='resnet_train')) - dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE) - - return dataset - - def parse_record(self, record: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]: - """Parse an ImageNet record from a serialized string Tensor.""" - keys_to_features = { - 'image/encoded': tf.io.FixedLenFeature((), tf.string, ''), - 'image/format': tf.io.FixedLenFeature((), tf.string, 'jpeg'), - 'image/class/label': tf.io.FixedLenFeature([], tf.int64, -1), - 'image/class/text': tf.io.FixedLenFeature([], tf.string, ''), - 'image/object/bbox/xmin': tf.io.VarLenFeature(dtype=tf.float32), - 'image/object/bbox/ymin': tf.io.VarLenFeature(dtype=tf.float32), - 'image/object/bbox/xmax': tf.io.VarLenFeature(dtype=tf.float32), - 'image/object/bbox/ymax': tf.io.VarLenFeature(dtype=tf.float32), - 'image/object/class/label': tf.io.VarLenFeature(dtype=tf.int64), - } - - parsed = tf.io.parse_single_example(record, keys_to_features) - - label = tf.reshape(parsed['image/class/label'], shape=[1]) - - # Subtract one so that labels are in [0, 1000) - label -= 1 - - image_bytes = tf.reshape(parsed['image/encoded'], shape=[]) - image, label = self.preprocess(image_bytes, label) - - return image, label - - def preprocess(self, image: tf.Tensor, - label: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]: - """Apply image preprocessing and
augmentation to the image and label.""" - if self.is_training: - image = preprocessing.preprocess_for_train( - image, - image_size=self.image_size, - mean_subtract=self.config.mean_subtract, - standardize=self.config.standardize, - dtype=self.dtype, - augmenter=self.augmenter) - else: - image = preprocessing.preprocess_for_eval( - image, - image_size=self.image_size, - num_channels=self.num_channels, - mean_subtract=self.config.mean_subtract, - standardize=self.config.standardize, - dtype=self.dtype) - - label = tf.cast(label, tf.int32) - if self.config.one_hot: - label = tf.one_hot(label, self.num_classes) - label = tf.reshape(label, [self.num_classes]) - - return image, label - - @classmethod - def from_params(cls, *args, **kwargs): - """Construct a dataset builder from a default config and any overrides.""" - config = DatasetConfig.from_args(*args, **kwargs) - return cls(config) diff --git a/official/vision/image_classification/efficientnet/__init__.py b/official/vision/image_classification/efficientnet/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/efficientnet/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- diff --git a/official/vision/image_classification/efficientnet/common_modules.py b/official/vision/image_classification/efficientnet/common_modules.py deleted file mode 100644 index 9c3d11c8676773be4f7fc27187d0852fdd58aaf4..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/efficientnet/common_modules.py +++ /dev/null @@ -1,118 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Common modeling utilities.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import numpy as np -import tensorflow as tf -import tensorflow.compat.v1 as tf1 -from typing import Text, Optional - -from tensorflow.python.tpu import tpu_function - - -@tf.keras.utils.register_keras_serializable(package='Vision') -class TpuBatchNormalization(tf.keras.layers.BatchNormalization): - """Cross replica batch normalization.""" - - def __init__(self, fused: Optional[bool] = False, **kwargs): - if fused in (True, None): - raise ValueError('TpuBatchNormalization does not support fused=True.') - super(TpuBatchNormalization, self).__init__(fused=fused, **kwargs) - - def _cross_replica_average(self, t: tf.Tensor, num_shards_per_group: int): - """Calculates the average value of input tensor across TPU replicas.""" - num_shards = tpu_function.get_tpu_context().number_of_shards - group_assignment = None - if num_shards_per_group > 1: - if 
num_shards % num_shards_per_group != 0: - raise ValueError( - 'num_shards: %d mod shards_per_group: %d, should be 0' % - (num_shards, num_shards_per_group)) - num_groups = num_shards // num_shards_per_group - group_assignment = [[ - x for x in range(num_shards) if x // num_shards_per_group == y - ] for y in range(num_groups)] - return tf1.tpu.cross_replica_sum(t, group_assignment) / tf.cast( - num_shards_per_group, t.dtype) - - def _moments(self, inputs: tf.Tensor, reduction_axes: int, keep_dims: int): - """Compute the mean and variance: it overrides the original _moments.""" - shard_mean, shard_variance = super(TpuBatchNormalization, self)._moments( - inputs, reduction_axes, keep_dims=keep_dims) - - num_shards = tpu_function.get_tpu_context().number_of_shards or 1 - if num_shards <= 8: # Skip cross_replica for 2x2 or smaller slices. - num_shards_per_group = 1 - else: - num_shards_per_group = max(8, num_shards // 8) - if num_shards_per_group > 1: - # Compute variance using: Var[X]= E[X^2] - E[X]^2. - shard_square_of_mean = tf.math.square(shard_mean) - shard_mean_of_square = shard_variance + shard_square_of_mean - group_mean = self._cross_replica_average(shard_mean, num_shards_per_group) - group_mean_of_square = self._cross_replica_average( - shard_mean_of_square, num_shards_per_group) - group_variance = group_mean_of_square - tf.math.square(group_mean) - return (group_mean, group_variance) - else: - return (shard_mean, shard_variance) - - -def get_batch_norm(batch_norm_type: Text) -> tf.keras.layers.BatchNormalization: - """A helper to create a batch normalization getter. - - Args: - batch_norm_type: The type of batch normalization layer implementation. `tpu` - will use `TpuBatchNormalization`. - - Returns: - An instance of `tf.keras.layers.BatchNormalization`. 
- """ - if batch_norm_type == 'tpu': - return TpuBatchNormalization - - return tf.keras.layers.BatchNormalization # pytype: disable=bad-return-type # typed-keras - - -def count_params(model, trainable_only=True): - """Returns the count of all model parameters, or just trainable ones.""" - if not trainable_only: - return model.count_params() - else: - return int( - np.sum([ - tf.keras.backend.count_params(p) for p in model.trainable_weights - ])) - - -def load_weights(model: tf.keras.Model, - model_weights_path: Text, - weights_format: Text = 'saved_model'): - """Load model weights from the given file path. - - Args: - model: the model to load weights into - model_weights_path: the path of the model weights - weights_format: the model weights format. One of 'saved_model', 'h5', or - 'checkpoint'. - """ - if weights_format == 'saved_model': - loaded_model = tf.keras.models.load_model(model_weights_path) - model.set_weights(loaded_model.get_weights()) - else: - model.load_weights(model_weights_path) diff --git a/official/vision/image_classification/efficientnet/efficientnet_config.py b/official/vision/image_classification/efficientnet/efficientnet_config.py deleted file mode 100644 index 47cfd740221d3581db585e90bc6df0711c289019..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/efficientnet/efficientnet_config.py +++ /dev/null @@ -1,78 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Configuration definitions for EfficientNet losses, learning rates, and optimizers.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from typing import Any, Mapping - -import dataclasses - -from official.modeling.hyperparams import base_config -from official.vision.image_classification.configs import base_configs - - -@dataclasses.dataclass -class EfficientNetModelConfig(base_configs.ModelConfig): - """Configuration for the EfficientNet model. - - This configuration will default to settings used for training efficientnet-b0 - on a v3-8 TPU on ImageNet. - - Attributes: - name: The name of the model. Defaults to 'EfficientNet'. - num_classes: The number of classes in the model. - model_params: A dictionary that represents the parameters of the - EfficientNet model. These will be passed in to the "from_name" function. - loss: The configuration for loss. Defaults to a categorical cross entropy - implementation. - optimizer: The configuration for optimizations. Defaults to an RMSProp - configuration. - learning_rate: The configuration for learning rate. Defaults to an - exponential configuration. 
- """ - name: str = 'EfficientNet' - num_classes: int = 1000 - model_params: base_config.Config = dataclasses.field( - default_factory=lambda: { - 'model_name': 'efficientnet-b0', - 'model_weights_path': '', - 'weights_format': 'saved_model', - 'overrides': { - 'batch_norm': 'default', - 'rescale_input': True, - 'num_classes': 1000, - 'activation': 'swish', - 'dtype': 'float32', - } - }) - loss: base_configs.LossConfig = base_configs.LossConfig( - name='categorical_crossentropy', label_smoothing=0.1) - optimizer: base_configs.OptimizerConfig = base_configs.OptimizerConfig( - name='rmsprop', - decay=0.9, - epsilon=0.001, - momentum=0.9, - moving_average_decay=None) - learning_rate: base_configs.LearningRateConfig = base_configs.LearningRateConfig( # pylint: disable=line-too-long - name='exponential', - initial_lr=0.008, - decay_epochs=2.4, - decay_rate=0.97, - warmup_epochs=5, - scale_by_batch_size=1. / 128., - staircase=True) diff --git a/official/vision/image_classification/efficientnet/efficientnet_model.py b/official/vision/image_classification/efficientnet/efficientnet_model.py deleted file mode 100644 index ad385715cd866209a0d3958a6742cbde73f16091..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/efficientnet/efficientnet_model.py +++ /dev/null @@ -1,499 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Contains definitions for EfficientNet model. - -[1] Mingxing Tan, Quoc V. Le - EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. - ICML'19, https://arxiv.org/abs/1905.11946 -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import math -import os -from typing import Any, Dict, Optional, Text, Tuple - -from absl import logging -from dataclasses import dataclass -import tensorflow as tf - -from official.modeling import tf_utils -from official.modeling.hyperparams import base_config -from official.vision.image_classification import preprocessing -from official.vision.image_classification.efficientnet import common_modules - - -@dataclass -class BlockConfig(base_config.Config): - """Config for a single MB Conv Block.""" - input_filters: int = 0 - output_filters: int = 0 - kernel_size: int = 3 - num_repeat: int = 1 - expand_ratio: int = 1 - strides: Tuple[int, int] = (1, 1) - se_ratio: Optional[float] = None - id_skip: bool = True - fused_conv: bool = False - conv_type: str = 'depthwise' - - -@dataclass -class ModelConfig(base_config.Config): - """Default Config for Efficientnet-B0.""" - width_coefficient: float = 1.0 - depth_coefficient: float = 1.0 - resolution: int = 224 - dropout_rate: float = 0.2 - blocks: Tuple[BlockConfig, ...] 
= ( - # (input_filters, output_filters, kernel_size, num_repeat, - # expand_ratio, strides, se_ratio) - # pylint: disable=bad-whitespace - BlockConfig.from_args(32, 16, 3, 1, 1, (1, 1), 0.25), - BlockConfig.from_args(16, 24, 3, 2, 6, (2, 2), 0.25), - BlockConfig.from_args(24, 40, 5, 2, 6, (2, 2), 0.25), - BlockConfig.from_args(40, 80, 3, 3, 6, (2, 2), 0.25), - BlockConfig.from_args(80, 112, 5, 3, 6, (1, 1), 0.25), - BlockConfig.from_args(112, 192, 5, 4, 6, (2, 2), 0.25), - BlockConfig.from_args(192, 320, 3, 1, 6, (1, 1), 0.25), - # pylint: enable=bad-whitespace - ) - stem_base_filters: int = 32 - top_base_filters: int = 1280 - activation: str = 'simple_swish' - batch_norm: str = 'default' - bn_momentum: float = 0.99 - bn_epsilon: float = 1e-3 - # While the original implementation used a weight decay of 1e-5, - # tf.nn.l2_loss divides it by 2, so we halve this to compensate in Keras - weight_decay: float = 5e-6 - drop_connect_rate: float = 0.2 - depth_divisor: int = 8 - min_depth: Optional[int] = None - use_se: bool = True - input_channels: int = 3 - num_classes: int = 1000 - model_name: str = 'efficientnet' - rescale_input: bool = True - data_format: str = 'channels_last' - dtype: str = 'float32' - - -MODEL_CONFIGS = { - # (width, depth, resolution, dropout) - 'efficientnet-b0': ModelConfig.from_args(1.0, 1.0, 224, 0.2), - 'efficientnet-b1': ModelConfig.from_args(1.0, 1.1, 240, 0.2), - 'efficientnet-b2': ModelConfig.from_args(1.1, 1.2, 260, 0.3), - 'efficientnet-b3': ModelConfig.from_args(1.2, 1.4, 300, 0.3), - 'efficientnet-b4': ModelConfig.from_args(1.4, 1.8, 380, 0.4), - 'efficientnet-b5': ModelConfig.from_args(1.6, 2.2, 456, 0.4), - 'efficientnet-b6': ModelConfig.from_args(1.8, 2.6, 528, 0.5), - 'efficientnet-b7': ModelConfig.from_args(2.0, 3.1, 600, 0.5), - 'efficientnet-b8': ModelConfig.from_args(2.2, 3.6, 672, 0.5), - 'efficientnet-l2': ModelConfig.from_args(4.3, 5.3, 800, 0.5), -} - -CONV_KERNEL_INITIALIZER = { - 'class_name': 'VarianceScaling', - 'config': 
{ - 'scale': 2.0, - 'mode': 'fan_out', - # Note: this is a truncated normal distribution - 'distribution': 'normal' - } -} - -DENSE_KERNEL_INITIALIZER = { - 'class_name': 'VarianceScaling', - 'config': { - 'scale': 1 / 3.0, - 'mode': 'fan_out', - 'distribution': 'uniform' - } -} - - -def round_filters(filters: int, config: ModelConfig) -> int: - """Round number of filters based on width coefficient.""" - width_coefficient = config.width_coefficient - min_depth = config.min_depth - divisor = config.depth_divisor - orig_filters = filters - - if not width_coefficient: - return filters - - filters *= width_coefficient - min_depth = min_depth or divisor - new_filters = max(min_depth, int(filters + divisor / 2) // divisor * divisor) - # Make sure that round down does not go down by more than 10%. - if new_filters < 0.9 * filters: - new_filters += divisor - logging.info('round_filter input=%s output=%s', orig_filters, new_filters) - return int(new_filters) - - -def round_repeats(repeats: int, depth_coefficient: float) -> int: - """Round number of repeats based on depth coefficient.""" - return int(math.ceil(depth_coefficient * repeats)) - - -def conv2d_block(inputs: tf.Tensor, - conv_filters: Optional[int], - config: ModelConfig, - kernel_size: Any = (1, 1), - strides: Any = (1, 1), - use_batch_norm: bool = True, - use_bias: bool = False, - activation: Optional[Any] = None, - depthwise: bool = False, - name: Optional[Text] = None): - """A conv2d followed by batch norm and an activation.""" - batch_norm = common_modules.get_batch_norm(config.batch_norm) - bn_momentum = config.bn_momentum - bn_epsilon = config.bn_epsilon - data_format = tf.keras.backend.image_data_format() - weight_decay = config.weight_decay - - name = name or '' - - # Collect args based on what kind of conv2d block is desired - init_kwargs = { - 'kernel_size': kernel_size, - 'strides': strides, - 'use_bias': use_bias, - 'padding': 'same', - 'name': name + '_conv2d', - 'kernel_regularizer': 
tf.keras.regularizers.l2(weight_decay), - 'bias_regularizer': tf.keras.regularizers.l2(weight_decay), - } - - if depthwise: - conv2d = tf.keras.layers.DepthwiseConv2D - init_kwargs.update({'depthwise_initializer': CONV_KERNEL_INITIALIZER}) - else: - conv2d = tf.keras.layers.Conv2D - init_kwargs.update({ - 'filters': conv_filters, - 'kernel_initializer': CONV_KERNEL_INITIALIZER - }) - - x = conv2d(**init_kwargs)(inputs) - - if use_batch_norm: - bn_axis = 1 if data_format == 'channels_first' else -1 - x = batch_norm( - axis=bn_axis, - momentum=bn_momentum, - epsilon=bn_epsilon, - name=name + '_bn')( - x) - - if activation is not None: - x = tf.keras.layers.Activation(activation, name=name + '_activation')(x) - return x - - -def mb_conv_block(inputs: tf.Tensor, - block: BlockConfig, - config: ModelConfig, - prefix: Optional[Text] = None): - """Mobile Inverted Residual Bottleneck. - - Args: - inputs: the Keras input to the block - block: BlockConfig, arguments to create a Block - config: ModelConfig, a set of model parameters - prefix: prefix for naming all layers - - Returns: - the output of the block - """ - use_se = config.use_se - activation = tf_utils.get_activation(config.activation) - drop_connect_rate = config.drop_connect_rate - data_format = tf.keras.backend.image_data_format() - use_depthwise = block.conv_type != 'no_depthwise' - prefix = prefix or '' - - filters = block.input_filters * block.expand_ratio - - x = inputs - - if block.fused_conv: - # If we use fused mbconv, skip expansion and use regular conv. 
- x = conv2d_block( - x, - filters, - config, - kernel_size=block.kernel_size, - strides=block.strides, - activation=activation, - name=prefix + 'fused') - else: - if block.expand_ratio != 1: - # Expansion phase - kernel_size = (1, 1) if use_depthwise else (3, 3) - x = conv2d_block( - x, - filters, - config, - kernel_size=kernel_size, - activation=activation, - name=prefix + 'expand') - - # Depthwise Convolution - if use_depthwise: - x = conv2d_block( - x, - conv_filters=None, - config=config, - kernel_size=block.kernel_size, - strides=block.strides, - activation=activation, - depthwise=True, - name=prefix + 'depthwise') - - # Squeeze and Excitation phase - if use_se: - assert block.se_ratio is not None - assert 0 < block.se_ratio <= 1 - num_reduced_filters = max(1, int(block.input_filters * block.se_ratio)) - - if data_format == 'channels_first': - se_shape = (filters, 1, 1) - else: - se_shape = (1, 1, filters) - - se = tf.keras.layers.GlobalAveragePooling2D(name=prefix + 'se_squeeze')(x) - se = tf.keras.layers.Reshape(se_shape, name=prefix + 'se_reshape')(se) - - se = conv2d_block( - se, - num_reduced_filters, - config, - use_bias=True, - use_batch_norm=False, - activation=activation, - name=prefix + 'se_reduce') - se = conv2d_block( - se, - filters, - config, - use_bias=True, - use_batch_norm=False, - activation='sigmoid', - name=prefix + 'se_expand') - x = tf.keras.layers.multiply([x, se], name=prefix + 'se_excite') - - # Output phase - x = conv2d_block( - x, block.output_filters, config, activation=None, name=prefix + 'project') - - # Add identity so that quantization-aware training can insert quantization - # ops correctly. 
- x = tf.keras.layers.Activation( - tf_utils.get_activation('identity'), name=prefix + 'id')( - x) - - if (block.id_skip and all(s == 1 for s in block.strides) and - block.input_filters == block.output_filters): - if drop_connect_rate and drop_connect_rate > 0: - # Apply dropconnect - # The only difference between dropout and dropconnect in TF is scaling by - # drop_connect_rate during training. See: - # https://github.com/keras-team/keras/pull/9898#issuecomment-380577612 - x = tf.keras.layers.Dropout( - drop_connect_rate, noise_shape=(None, 1, 1, 1), name=prefix + 'drop')( - x) - - x = tf.keras.layers.add([x, inputs], name=prefix + 'add') - - return x - - -def efficientnet(image_input: tf.keras.layers.Input, config: ModelConfig): # pytype: disable=invalid-annotation # typed-keras - """Creates an EfficientNet graph given the model parameters. - - This function is wrapped by the `EfficientNet` class to make a tf.keras.Model. - - Args: - image_input: the input batch of images - config: the model config - - Returns: - the output of efficientnet - """ - depth_coefficient = config.depth_coefficient - blocks = config.blocks - stem_base_filters = config.stem_base_filters - top_base_filters = config.top_base_filters - activation = tf_utils.get_activation(config.activation) - dropout_rate = config.dropout_rate - drop_connect_rate = config.drop_connect_rate - num_classes = config.num_classes - input_channels = config.input_channels - rescale_input = config.rescale_input - data_format = tf.keras.backend.image_data_format() - dtype = config.dtype - weight_decay = config.weight_decay - - x = image_input - if data_format == 'channels_first': - # Happens on GPU/TPU if available. 
- x = tf.keras.layers.Permute((3, 1, 2))(x) - if rescale_input: - x = preprocessing.normalize_images( - x, num_channels=input_channels, dtype=dtype, data_format=data_format) - - # Build stem - x = conv2d_block( - x, - round_filters(stem_base_filters, config), - config, - kernel_size=[3, 3], - strides=[2, 2], - activation=activation, - name='stem') - - # Build blocks - num_blocks_total = sum( - round_repeats(block.num_repeat, depth_coefficient) for block in blocks) - block_num = 0 - - for stack_idx, block in enumerate(blocks): - assert block.num_repeat > 0 - # Update block input and output filters based on depth multiplier - block = block.replace( - input_filters=round_filters(block.input_filters, config), - output_filters=round_filters(block.output_filters, config), - num_repeat=round_repeats(block.num_repeat, depth_coefficient)) - - # The first block needs to take care of stride and filter size increase - drop_rate = drop_connect_rate * float(block_num) / num_blocks_total - config = config.replace(drop_connect_rate=drop_rate) - block_prefix = 'stack_{}/block_0/'.format(stack_idx) - x = mb_conv_block(x, block, config, block_prefix) - block_num += 1 - if block.num_repeat > 1: - block = block.replace(input_filters=block.output_filters, strides=[1, 1]) - - for block_idx in range(block.num_repeat - 1): - drop_rate = drop_connect_rate * float(block_num) / num_blocks_total - config = config.replace(drop_connect_rate=drop_rate) - block_prefix = 'stack_{}/block_{}/'.format(stack_idx, block_idx + 1) - x = mb_conv_block(x, block, config, prefix=block_prefix) - block_num += 1 - - # Build top - x = conv2d_block( - x, - round_filters(top_base_filters, config), - config, - activation=activation, - name='top') - - # Build classifier - x = tf.keras.layers.GlobalAveragePooling2D(name='top_pool')(x) - if dropout_rate and dropout_rate > 0: - x = tf.keras.layers.Dropout(dropout_rate, name='top_dropout')(x) - x = tf.keras.layers.Dense( - num_classes, - 
kernel_initializer=DENSE_KERNEL_INITIALIZER, - kernel_regularizer=tf.keras.regularizers.l2(weight_decay), - bias_regularizer=tf.keras.regularizers.l2(weight_decay), - name='logits')( - x) - x = tf.keras.layers.Activation('softmax', name='probs')(x) - - return x - - -class EfficientNet(tf.keras.Model): - """Wrapper class for an EfficientNet Keras model. - - Contains helper methods to build, manage, and save metadata about the model. - """ - - def __init__(self, - config: Optional[ModelConfig] = None, - overrides: Optional[Dict[Text, Any]] = None): - """Create an EfficientNet model. - - Args: - config: (optional) the main model parameters to create the model - overrides: (optional) a dict containing keys that can override config - """ - overrides = overrides or {} - config = config or ModelConfig() - - self.config = config.replace(**overrides) - - input_channels = self.config.input_channels - model_name = self.config.model_name - input_shape = (None, None, input_channels) # Should handle any size image - image_input = tf.keras.layers.Input(shape=input_shape) - - output = efficientnet(image_input, self.config) - - # Cast to float32 in case we have a different model dtype - output = tf.cast(output, tf.float32) - - logging.info('Building model %s with params %s', model_name, self.config) - - super(EfficientNet, self).__init__( - inputs=image_input, outputs=output, name=model_name) - - @classmethod - def from_name(cls, - model_name: Text, - model_weights_path: Optional[Text] = None, - weights_format: Text = 'saved_model', - overrides: Optional[Dict[Text, Any]] = None): - """Construct an EfficientNet model from a predefined model name. - - E.g., `EfficientNet.from_name('efficientnet-b0')`. - - Args: - model_name: the predefined model name - model_weights_path: the path to the weights (h5 file or saved model dir) - weights_format: the model weights format. One of 'saved_model', 'h5', or - 'checkpoint'. 
- overrides: (optional) a dict containing keys that can override config - - Returns: - A constructed EfficientNet instance. - """ - model_configs = dict(MODEL_CONFIGS) - overrides = dict(overrides) if overrides else {} - - # One can define their own custom models if necessary - model_configs.update(overrides.pop('model_config', {})) - - if model_name not in model_configs: - raise ValueError('Unknown model name {}'.format(model_name)) - - config = model_configs[model_name] - - model = cls(config=config, overrides=overrides) - - if model_weights_path: - common_modules.load_weights( - model, model_weights_path, weights_format=weights_format) - - return model diff --git a/official/vision/image_classification/efficientnet/tfhub_export.py b/official/vision/image_classification/efficientnet/tfhub_export.py deleted file mode 100644 index d3518a1304c8c761cfaabdcc96dead70dd9b0097..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/efficientnet/tfhub_export.py +++ /dev/null @@ -1,67 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""A script to export TF-Hub SavedModel.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os - -from absl import app -from absl import flags - -import tensorflow as tf - -from official.vision.image_classification.efficientnet import efficientnet_model - -FLAGS = flags.FLAGS - -flags.DEFINE_string("model_name", None, "EfficientNet model name.") -flags.DEFINE_string("model_path", None, "File path to TF model checkpoint.") -flags.DEFINE_string("export_path", None, - "TF-Hub SavedModel destination path to export.") - - -def export_tfhub(model_path, hub_destination, model_name): - """Restores a tf.keras.Model and saves for TF-Hub.""" - model_configs = dict(efficientnet_model.MODEL_CONFIGS) - config = model_configs[model_name] - - image_input = tf.keras.layers.Input( - shape=(None, None, 3), name="image_input", dtype=tf.float32) - x = image_input * 255.0 - ouputs = efficientnet_model.efficientnet(x, config) - hub_model = tf.keras.Model(image_input, ouputs) - ckpt = tf.train.Checkpoint(model=hub_model) - ckpt.restore(model_path).assert_existing_objects_matched() - hub_model.save( - os.path.join(hub_destination, "classification"), include_optimizer=False) - - feature_vector_output = hub_model.get_layer(name="top_pool").get_output_at(0) - hub_model2 = tf.keras.Model(image_input, feature_vector_output) - hub_model2.save( - os.path.join(hub_destination, "feature-vector"), include_optimizer=False) - - -def main(argv): - if len(argv) > 1: - raise app.UsageError("Too many command-line arguments.") - - export_tfhub(FLAGS.model_path, FLAGS.export_path, FLAGS.model_name) - - -if __name__ == "__main__": - app.run(main) diff --git a/official/vision/image_classification/learning_rate.py b/official/vision/image_classification/learning_rate.py deleted file mode 100644 index 72f7e95187521eeebefa1e698ca5382f10642e88..0000000000000000000000000000000000000000 --- 
a/official/vision/image_classification/learning_rate.py +++ /dev/null @@ -1,117 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# Lint as: python3 -"""Learning rate utilities for vision tasks.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from typing import Any, Mapping, Optional - -import numpy as np -import tensorflow as tf - -BASE_LEARNING_RATE = 0.1 - - -class WarmupDecaySchedule(tf.keras.optimizers.schedules.LearningRateSchedule): - """A wrapper for LearningRateSchedule that includes warmup steps.""" - - def __init__(self, - lr_schedule: tf.keras.optimizers.schedules.LearningRateSchedule, - warmup_steps: int, - warmup_lr: Optional[float] = None): - """Add warmup decay to a learning rate schedule. - - Args: - lr_schedule: base learning rate scheduler - warmup_steps: number of warmup steps - warmup_lr: an optional field for the final warmup learning rate. This - should be provided if the base `lr_schedule` does not contain this - field. 
- """ - super(WarmupDecaySchedule, self).__init__() - self._lr_schedule = lr_schedule - self._warmup_steps = warmup_steps - self._warmup_lr = warmup_lr - - def __call__(self, step: int): - lr = self._lr_schedule(step) - if self._warmup_steps: - if self._warmup_lr is not None: - initial_learning_rate = tf.convert_to_tensor( - self._warmup_lr, name="initial_learning_rate") - else: - initial_learning_rate = tf.convert_to_tensor( - self._lr_schedule.initial_learning_rate, - name="initial_learning_rate") - dtype = initial_learning_rate.dtype - global_step_recomp = tf.cast(step, dtype) - warmup_steps = tf.cast(self._warmup_steps, dtype) - warmup_lr = initial_learning_rate * global_step_recomp / warmup_steps - lr = tf.cond(global_step_recomp < warmup_steps, lambda: warmup_lr, - lambda: lr) - return lr - - def get_config(self) -> Mapping[str, Any]: - config = self._lr_schedule.get_config() - config.update({ - "warmup_steps": self._warmup_steps, - "warmup_lr": self._warmup_lr, - }) - return config - - -class CosineDecayWithWarmup(tf.keras.optimizers.schedules.LearningRateSchedule): - """Class to generate learning rate tensor.""" - - def __init__(self, batch_size: int, total_steps: int, warmup_steps: int): - """Creates the consine learning rate tensor with linear warmup. - - Args: - batch_size: The training batch size used in the experiment. - total_steps: Total training steps. - warmup_steps: Steps for the warm up period. 
- """ - super(CosineDecayWithWarmup, self).__init__() - base_lr_batch_size = 256 - self._total_steps = total_steps - self._init_learning_rate = BASE_LEARNING_RATE * batch_size / base_lr_batch_size - self._warmup_steps = warmup_steps - - def __call__(self, global_step: int): - global_step = tf.cast(global_step, dtype=tf.float32) - warmup_steps = self._warmup_steps - init_lr = self._init_learning_rate - total_steps = self._total_steps - - linear_warmup = global_step / warmup_steps * init_lr - - cosine_learning_rate = init_lr * (tf.cos(np.pi * - (global_step - warmup_steps) / - (total_steps - warmup_steps)) + - 1.0) / 2.0 - - learning_rate = tf.where(global_step < warmup_steps, linear_warmup, - cosine_learning_rate) - return learning_rate - - def get_config(self): - return { - "total_steps": self._total_steps, - "warmup_learning_rate": self._warmup_learning_rate, - "warmup_steps": self._warmup_steps, - "init_learning_rate": self._init_learning_rate, - } diff --git a/official/vision/image_classification/learning_rate_test.py b/official/vision/image_classification/learning_rate_test.py deleted file mode 100644 index 6c33ed24b8e46b8ecb58005a1f528e62a66f0005..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/learning_rate_test.py +++ /dev/null @@ -1,60 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Tests for learning_rate.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - -from official.vision.image_classification import learning_rate - - -class LearningRateTests(tf.test.TestCase): - - def test_warmup_decay(self): - """Basic computational test for warmup decay.""" - initial_lr = 0.01 - decay_steps = 100 - decay_rate = 0.01 - warmup_steps = 10 - - base_lr = tf.keras.optimizers.schedules.ExponentialDecay( - initial_learning_rate=initial_lr, - decay_steps=decay_steps, - decay_rate=decay_rate) - lr = learning_rate.WarmupDecaySchedule( - lr_schedule=base_lr, warmup_steps=warmup_steps) - - for step in range(warmup_steps - 1): - config = lr.get_config() - self.assertEqual(config['warmup_steps'], warmup_steps) - self.assertAllClose( - self.evaluate(lr(step)), step / warmup_steps * initial_lr) - - def test_cosine_decay_with_warmup(self): - """Basic computational test for cosine decay with warmup.""" - expected_lrs = [0.0, 0.1, 0.05, 0.0] - - lr = learning_rate.CosineDecayWithWarmup( - batch_size=256, total_steps=3, warmup_steps=1) - - for step in [0, 1, 2, 3]: - self.assertAllClose(lr(step), expected_lrs[step]) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/image_classification/mnist_main.py b/official/vision/image_classification/mnist_main.py deleted file mode 100644 index 3eba80b06a9215cb5dc4d3b13facb2f2a4f3058c..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/mnist_main.py +++ /dev/null @@ -1,176 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Runs a simple model on the MNIST dataset.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os - -# Import libraries -from absl import app -from absl import flags -from absl import logging -import tensorflow as tf -import tensorflow_datasets as tfds -from official.common import distribute_utils -from official.utils.flags import core as flags_core -from official.utils.misc import model_helpers -from official.vision.image_classification.resnet import common - -FLAGS = flags.FLAGS - - -def build_model(): - """Constructs the ML model used to predict handwritten digits.""" - - image = tf.keras.layers.Input(shape=(28, 28, 1)) - - y = tf.keras.layers.Conv2D(filters=32, - kernel_size=5, - padding='same', - activation='relu')(image) - y = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), - strides=(2, 2), - padding='same')(y) - y = tf.keras.layers.Conv2D(filters=32, - kernel_size=5, - padding='same', - activation='relu')(y) - y = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), - strides=(2, 2), - padding='same')(y) - y = tf.keras.layers.Flatten()(y) - y = tf.keras.layers.Dense(1024, activation='relu')(y) - y = tf.keras.layers.Dropout(0.4)(y) - - probs = tf.keras.layers.Dense(10, activation='softmax')(y) - - model = tf.keras.models.Model(image, probs, name='mnist') - - return model - - -@tfds.decode.make_decoder(output_dtype=tf.float32) -def decode_image(example, feature): - """Convert image to float32 and normalize from [0, 255] to [0.0, 1.0].""" - return 
tf.cast(feature.decode_example(example), dtype=tf.float32) / 255 - - -def run(flags_obj, datasets_override=None, strategy_override=None): - """Run MNIST model training and eval loop using native Keras APIs. - - Args: - flags_obj: An object containing parsed flag values. - datasets_override: A pair of `tf.data.Dataset` objects to train the model, - representing the train and test sets. - strategy_override: A `tf.distribute.Strategy` object to use for model. - - Returns: - Dictionary of training and eval stats. - """ - # Start TF profiler server. - tf.profiler.experimental.server.start(flags_obj.profiler_port) - - strategy = strategy_override or distribute_utils.get_distribution_strategy( - distribution_strategy=flags_obj.distribution_strategy, - num_gpus=flags_obj.num_gpus, - tpu_address=flags_obj.tpu) - - strategy_scope = distribute_utils.get_strategy_scope(strategy) - - mnist = tfds.builder('mnist', data_dir=flags_obj.data_dir) - if flags_obj.download: - mnist.download_and_prepare() - - mnist_train, mnist_test = datasets_override or mnist.as_dataset( - split=['train', 'test'], - decoders={'image': decode_image()}, # pylint: disable=no-value-for-parameter - as_supervised=True) - train_input_dataset = mnist_train.cache().repeat().shuffle( - buffer_size=50000).batch(flags_obj.batch_size) - eval_input_dataset = mnist_test.cache().repeat().batch(flags_obj.batch_size) - - with strategy_scope: - lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay( - 0.05, decay_steps=100000, decay_rate=0.96) - optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule) - - model = build_model() - model.compile( - optimizer=optimizer, - loss='sparse_categorical_crossentropy', - metrics=['sparse_categorical_accuracy']) - - num_train_examples = mnist.info.splits['train'].num_examples - train_steps = num_train_examples // flags_obj.batch_size - train_epochs = flags_obj.train_epochs - - ckpt_full_path = os.path.join(flags_obj.model_dir, 'model.ckpt-{epoch:04d}') - callbacks = [ - 
tf.keras.callbacks.ModelCheckpoint( - ckpt_full_path, save_weights_only=True), - tf.keras.callbacks.TensorBoard(log_dir=flags_obj.model_dir), - ] - - num_eval_examples = mnist.info.splits['test'].num_examples - num_eval_steps = num_eval_examples // flags_obj.batch_size - - history = model.fit( - train_input_dataset, - epochs=train_epochs, - steps_per_epoch=train_steps, - callbacks=callbacks, - validation_steps=num_eval_steps, - validation_data=eval_input_dataset, - validation_freq=flags_obj.epochs_between_evals) - - export_path = os.path.join(flags_obj.model_dir, 'saved_model') - model.save(export_path, include_optimizer=False) - - eval_output = model.evaluate( - eval_input_dataset, steps=num_eval_steps, verbose=2) - - stats = common.build_stats(history, eval_output, callbacks) - return stats - - -def define_mnist_flags(): - """Define command line flags for MNIST model.""" - flags_core.define_base( - clean=True, - num_gpu=True, - train_epochs=True, - epochs_between_evals=True, - distribution_strategy=True) - flags_core.define_device() - flags_core.define_distribution() - flags.DEFINE_bool('download', True, - 'Whether to download data to `--data_dir`.') - flags.DEFINE_integer('profiler_port', 9012, - 'Port to start profiler server on.') - FLAGS.set_default('batch_size', 1024) - - -def main(_): - model_helpers.apply_clean(FLAGS) - stats = run(flags.FLAGS) - logging.info('Run stats:\n%s', stats) - - -if __name__ == '__main__': - logging.set_verbosity(logging.INFO) - define_mnist_flags() - app.run(main) diff --git a/official/vision/image_classification/mnist_test.py b/official/vision/image_classification/mnist_test.py deleted file mode 100644 index c94396a444294b37259ba849bd8ea2f6f76997d0..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/mnist_test.py +++ /dev/null @@ -1,89 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
-# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Test the Keras MNIST model on GPU.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import functools - -from absl.testing import parameterized -import tensorflow as tf - -from tensorflow.python.distribute import combinations -from tensorflow.python.distribute import strategy_combinations -from official.utils.testing import integration -from official.vision.image_classification import mnist_main - - -mnist_main.define_mnist_flags() - - -def eager_strategy_combinations(): - return combinations.combine( - distribution=[ - strategy_combinations.default_strategy, - strategy_combinations.cloud_tpu_strategy, - strategy_combinations.one_device_strategy_gpu, - ],) - - -class KerasMnistTest(tf.test.TestCase, parameterized.TestCase): - """Unit tests for sample Keras MNIST model.""" - _tempdir = None - - @classmethod - def setUpClass(cls): # pylint: disable=invalid-name - super(KerasMnistTest, cls).setUpClass() - - def tearDown(self): - super(KerasMnistTest, self).tearDown() - tf.io.gfile.rmtree(self.get_temp_dir()) - - @combinations.generate(eager_strategy_combinations()) - def test_end_to_end(self, distribution): - """Test Keras MNIST model with `strategy`.""" - - extra_flags = [ - "-train_epochs", - "1", - # Let TFDS find the metadata folder automatically - "--data_dir=" - ] - - dummy_data = ( - tf.ones(shape=(10, 28, 28, 1), dtype=tf.int32), - 
tf.range(10), - ) - datasets = ( - tf.data.Dataset.from_tensor_slices(dummy_data), - tf.data.Dataset.from_tensor_slices(dummy_data), - ) - - run = functools.partial( - mnist_main.run, - datasets_override=datasets, - strategy_override=distribution) - - integration.run_synthetic( - main=run, - synth=False, - tmp_root=self.create_tempdir().full_path, - extra_flags=extra_flags) - - -if __name__ == "__main__": - tf.test.main() diff --git a/official/vision/image_classification/optimizer_factory.py b/official/vision/image_classification/optimizer_factory.py deleted file mode 100644 index 48a4512ee96438cec1367d6493f63a230b01eeb1..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/optimizer_factory.py +++ /dev/null @@ -1,181 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Optimizer factory for vision tasks.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from typing import Any, Dict, Optional, Text - -from absl import logging -import tensorflow as tf -import tensorflow_addons as tfa - -from official.modeling import optimization -from official.vision.image_classification import learning_rate -from official.vision.image_classification.configs import base_configs - -# pylint: disable=protected-access - - -def build_optimizer( - optimizer_name: Text, - base_learning_rate: tf.keras.optimizers.schedules.LearningRateSchedule, - params: Dict[Text, Any], - model: Optional[tf.keras.Model] = None): - """Build the optimizer based on name. - - Args: - optimizer_name: String representation of the optimizer name. Examples: sgd, - momentum, rmsprop. - base_learning_rate: `tf.keras.optimizers.schedules.LearningRateSchedule` - base learning rate. - params: String -> Any dictionary representing the optimizer params. This - should contain optimizer specific parameters such as `base_learning_rate`, - `decay`, etc. - model: The `tf.keras.Model`. This is used for the shadow copy if using - `ExponentialMovingAverage`. - - Returns: - A tf.keras.Optimizer. - - Raises: - ValueError if the provided optimizer_name is not supported. 
- - """ - optimizer_name = optimizer_name.lower() - logging.info('Building %s optimizer with params %s', optimizer_name, params) - - if optimizer_name == 'sgd': - logging.info('Using SGD optimizer') - nesterov = params.get('nesterov', False) - optimizer = tf.keras.optimizers.SGD( - learning_rate=base_learning_rate, nesterov=nesterov) - elif optimizer_name == 'momentum': - logging.info('Using momentum optimizer') - nesterov = params.get('nesterov', False) - optimizer = tf.keras.optimizers.SGD( - learning_rate=base_learning_rate, - momentum=params['momentum'], - nesterov=nesterov) - elif optimizer_name == 'rmsprop': - logging.info('Using RMSProp') - rho = params.get('decay', None) or params.get('rho', 0.9) - momentum = params.get('momentum', 0.9) - epsilon = params.get('epsilon', 1e-07) - optimizer = tf.keras.optimizers.RMSprop( - learning_rate=base_learning_rate, - rho=rho, - momentum=momentum, - epsilon=epsilon) - elif optimizer_name == 'adam': - logging.info('Using Adam') - beta_1 = params.get('beta_1', 0.9) - beta_2 = params.get('beta_2', 0.999) - epsilon = params.get('epsilon', 1e-07) - optimizer = tf.keras.optimizers.Adam( - learning_rate=base_learning_rate, - beta_1=beta_1, - beta_2=beta_2, - epsilon=epsilon) - elif optimizer_name == 'adamw': - logging.info('Using AdamW') - weight_decay = params.get('weight_decay', 0.01) - beta_1 = params.get('beta_1', 0.9) - beta_2 = params.get('beta_2', 0.999) - epsilon = params.get('epsilon', 1e-07) - optimizer = tfa.optimizers.AdamW( - weight_decay=weight_decay, - learning_rate=base_learning_rate, - beta_1=beta_1, - beta_2=beta_2, - epsilon=epsilon) - else: - raise ValueError('Unknown optimizer %s' % optimizer_name) - - if params.get('lookahead', None): - logging.info('Using lookahead optimizer.') - optimizer = tfa.optimizers.Lookahead(optimizer) - - # Moving average should be applied last, as it's applied at test time - moving_average_decay = params.get('moving_average_decay', 0.) 
- if moving_average_decay is not None and moving_average_decay > 0.: - if model is None: - raise ValueError( - '`model` must be provided if using `ExponentialMovingAverage`.') - logging.info('Including moving average decay.') - optimizer = optimization.ExponentialMovingAverage( - optimizer=optimizer, average_decay=moving_average_decay) - optimizer.shadow_copy(model) - return optimizer - - -def build_learning_rate(params: base_configs.LearningRateConfig, - batch_size: Optional[int] = None, - train_epochs: Optional[int] = None, - train_steps: Optional[int] = None): - """Build the learning rate given the provided configuration.""" - decay_type = params.name - base_lr = params.initial_lr - decay_rate = params.decay_rate - if params.decay_epochs is not None: - decay_steps = params.decay_epochs * train_steps - else: - decay_steps = 0 - if params.warmup_epochs is not None: - warmup_steps = params.warmup_epochs * train_steps - else: - warmup_steps = 0 - - lr_multiplier = params.scale_by_batch_size - - if lr_multiplier and lr_multiplier > 0: - # Scale the learning rate based on the batch size and a multiplier - base_lr *= lr_multiplier * batch_size - logging.info( - 'Scaling the learning rate based on the batch size ' - 'multiplier. New base_lr: %f', base_lr) - - if decay_type == 'exponential': - logging.info( - 'Using exponential learning rate with: ' - 'initial_learning_rate: %f, decay_steps: %d, ' - 'decay_rate: %f', base_lr, decay_steps, decay_rate) - lr = tf.keras.optimizers.schedules.ExponentialDecay( - initial_learning_rate=base_lr, - decay_steps=decay_steps, - decay_rate=decay_rate, - staircase=params.staircase) - elif decay_type == 'stepwise': - steps_per_epoch = params.examples_per_epoch // batch_size - boundaries = [boundary * steps_per_epoch for boundary in params.boundaries] - multipliers = [batch_size * multiplier for multiplier in params.multipliers] - logging.info( - 'Using stepwise learning rate. 
Parameters: ' - 'boundaries: %s, values: %s', boundaries, multipliers) - lr = tf.keras.optimizers.schedules.PiecewiseConstantDecay( - boundaries=boundaries, values=multipliers) - elif decay_type == 'cosine_with_warmup': - lr = learning_rate.CosineDecayWithWarmup( - batch_size=batch_size, - total_steps=train_epochs * train_steps, - warmup_steps=warmup_steps) - if warmup_steps > 0: - if decay_type not in ['cosine_with_warmup']: - logging.info('Applying %d warmup steps to the learning rate', - warmup_steps) - lr = learning_rate.WarmupDecaySchedule( - lr, warmup_steps, warmup_lr=base_lr) - return lr diff --git a/official/vision/image_classification/optimizer_factory_test.py b/official/vision/image_classification/optimizer_factory_test.py deleted file mode 100644 index 41d71a328d6fc0d27709978ae75994f8985a166d..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/optimizer_factory_test.py +++ /dev/null @@ -1,118 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Tests for optimizer_factory.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -from absl.testing import parameterized - -import tensorflow as tf -from official.vision.image_classification import optimizer_factory -from official.vision.image_classification.configs import base_configs - - -class OptimizerFactoryTest(tf.test.TestCase, parameterized.TestCase): - - def build_toy_model(self) -> tf.keras.Model: - """Creates a toy `tf.Keras.Model`.""" - model = tf.keras.Sequential() - model.add(tf.keras.layers.Dense(1, input_shape=(1,))) - return model - - @parameterized.named_parameters( - ('sgd', 'sgd', 0., False), ('momentum', 'momentum', 0., False), - ('rmsprop', 'rmsprop', 0., False), ('adam', 'adam', 0., False), - ('adamw', 'adamw', 0., False), - ('momentum_lookahead', 'momentum', 0., True), - ('sgd_ema', 'sgd', 0.999, False), - ('momentum_ema', 'momentum', 0.999, False), - ('rmsprop_ema', 'rmsprop', 0.999, False)) - def test_optimizer(self, optimizer_name, moving_average_decay, lookahead): - """Smoke test to be sure no syntax errors.""" - model = self.build_toy_model() - params = { - 'learning_rate': 0.001, - 'rho': 0.09, - 'momentum': 0., - 'epsilon': 1e-07, - 'moving_average_decay': moving_average_decay, - 'lookahead': lookahead, - } - optimizer = optimizer_factory.build_optimizer( - optimizer_name=optimizer_name, - base_learning_rate=params['learning_rate'], - params=params, - model=model) - self.assertTrue(issubclass(type(optimizer), tf.keras.optimizers.Optimizer)) - - def test_unknown_optimizer(self): - with self.assertRaises(ValueError): - optimizer_factory.build_optimizer( - optimizer_name='this_optimizer_does_not_exist', - base_learning_rate=None, - params=None) - - def test_learning_rate_without_decay_or_warmups(self): - params = base_configs.LearningRateConfig( - name='exponential', - initial_lr=0.01, - decay_rate=0.01, - decay_epochs=None, - warmup_epochs=None, - 
scale_by_batch_size=0.01, - examples_per_epoch=1, - boundaries=[0], - multipliers=[0, 1]) - batch_size = 1 - train_steps = 1 - - lr = optimizer_factory.build_learning_rate( - params=params, batch_size=batch_size, train_steps=train_steps) - self.assertTrue( - issubclass( - type(lr), tf.keras.optimizers.schedules.LearningRateSchedule)) - - @parameterized.named_parameters(('exponential', 'exponential'), - ('cosine_with_warmup', 'cosine_with_warmup')) - def test_learning_rate_with_decay_and_warmup(self, lr_decay_type): - """Basic smoke test for syntax.""" - params = base_configs.LearningRateConfig( - name=lr_decay_type, - initial_lr=0.01, - decay_rate=0.01, - decay_epochs=1, - warmup_epochs=1, - scale_by_batch_size=0.01, - examples_per_epoch=1, - boundaries=[0], - multipliers=[0, 1]) - batch_size = 1 - train_epochs = 1 - train_steps = 1 - - lr = optimizer_factory.build_learning_rate( - params=params, - batch_size=batch_size, - train_epochs=train_epochs, - train_steps=train_steps) - self.assertTrue( - issubclass( - type(lr), tf.keras.optimizers.schedules.LearningRateSchedule)) - - -if __name__ == '__main__': - tf.test.main() diff --git a/official/vision/image_classification/preprocessing.py b/official/vision/image_classification/preprocessing.py deleted file mode 100644 index bd7e2e1d19faab1a4257f81bc59a5845d75b1823..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/preprocessing.py +++ /dev/null @@ -1,390 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -"""Preprocessing functions for images.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf -from typing import List, Optional, Text, Tuple - -from official.vision.image_classification import augment - - -# Calculated from the ImageNet training set -MEAN_RGB = (0.485 * 255, 0.456 * 255, 0.406 * 255) -STDDEV_RGB = (0.229 * 255, 0.224 * 255, 0.225 * 255) - -IMAGE_SIZE = 224 -CROP_PADDING = 32 - - -def mean_image_subtraction( - image_bytes: tf.Tensor, - means: Tuple[float, ...], - num_channels: int = 3, - dtype: tf.dtypes.DType = tf.float32, -) -> tf.Tensor: - """Subtracts the given means from each image channel. - - For example: - means = [123.68, 116.779, 103.939] - image_bytes = mean_image_subtraction(image_bytes, means) - - Note that the rank of `image` must be known. - - Args: - image_bytes: a tensor of size [height, width, C]. - means: a C-vector of values to subtract from each channel. - num_channels: number of color channels in the image that will be distorted. - dtype: the dtype to convert the images to. Set to `None` to skip conversion. - - Returns: - the centered image. - - Raises: - ValueError: If the rank of `image` is unknown, if `image` has a rank other - than three or if the number of channels in `image` doesn't match the - number of values in `means`. - """ - if image_bytes.get_shape().ndims != 3: - raise ValueError('Input must be of size [height, width, C>0]') - - if len(means) != num_channels: - raise ValueError('len(means) must match the number of channels') - - # We have a 1-D tensor of means; convert to 3-D. - # Note(b/130245863): we explicitly call `broadcast` instead of simply - # expanding dimensions for better performance. 
- means = tf.broadcast_to(means, tf.shape(image_bytes)) - if dtype is not None: - means = tf.cast(means, dtype=dtype) - - return image_bytes - means - - -def standardize_image( - image_bytes: tf.Tensor, - stddev: Tuple[float, ...], - num_channels: int = 3, - dtype: tf.dtypes.DType = tf.float32, -) -> tf.Tensor: - """Divides the given stddev from each image channel. - - For example: - stddev = [123.68, 116.779, 103.939] - image_bytes = standardize_image(image_bytes, stddev) - - Note that the rank of `image` must be known. - - Args: - image_bytes: a tensor of size [height, width, C]. - stddev: a C-vector of values to divide from each channel. - num_channels: number of color channels in the image that will be distorted. - dtype: the dtype to convert the images to. Set to `None` to skip conversion. - - Returns: - the centered image. - - Raises: - ValueError: If the rank of `image` is unknown, if `image` has a rank other - than three or if the number of channels in `image` doesn't match the - number of values in `stddev`. - """ - if image_bytes.get_shape().ndims != 3: - raise ValueError('Input must be of size [height, width, C>0]') - - if len(stddev) != num_channels: - raise ValueError('len(stddev) must match the number of channels') - - # We have a 1-D tensor of stddev; convert to 3-D. - # Note(b/130245863): we explicitly call `broadcast` instead of simply - # expanding dimensions for better performance. - stddev = tf.broadcast_to(stddev, tf.shape(image_bytes)) - if dtype is not None: - stddev = tf.cast(stddev, dtype=dtype) - - return image_bytes / stddev - - -def normalize_images(features: tf.Tensor, - mean_rgb: Tuple[float, ...] = MEAN_RGB, - stddev_rgb: Tuple[float, ...] = STDDEV_RGB, - num_channels: int = 3, - dtype: tf.dtypes.DType = tf.float32, - data_format: Text = 'channels_last') -> tf.Tensor: - """Normalizes the input image channels with the given mean and stddev. - - Args: - features: `Tensor` representing decoded images in float format. 
- mean_rgb: the mean of the channels to subtract. - stddev_rgb: the stddev of the channels to divide. - num_channels: the number of channels in the input image tensor. - dtype: the dtype to convert the images to. Set to `None` to skip conversion. - data_format: the format of the input image tensor - ['channels_first', 'channels_last']. - - Returns: - A normalized image `Tensor`. - """ - # TODO(allencwang) - figure out how to use mean_image_subtraction and - # standardize_image on batches of images and replace the following. - if data_format == 'channels_first': - stats_shape = [num_channels, 1, 1] - else: - stats_shape = [1, 1, num_channels] - - if dtype is not None: - features = tf.image.convert_image_dtype(features, dtype=dtype) - - if mean_rgb is not None: - mean_rgb = tf.constant(mean_rgb, - shape=stats_shape, - dtype=features.dtype) - mean_rgb = tf.broadcast_to(mean_rgb, tf.shape(features)) - features = features - mean_rgb - - if stddev_rgb is not None: - stddev_rgb = tf.constant(stddev_rgb, - shape=stats_shape, - dtype=features.dtype) - stddev_rgb = tf.broadcast_to(stddev_rgb, tf.shape(features)) - features = features / stddev_rgb - - return features - - -def decode_and_center_crop(image_bytes: tf.Tensor, - image_size: int = IMAGE_SIZE, - crop_padding: int = CROP_PADDING) -> tf.Tensor: - """Crops to center of image with padding then scales image_size. - - Args: - image_bytes: `Tensor` representing an image binary of arbitrary size. - image_size: image height/width dimension. - crop_padding: the padding size to use when centering the crop. - - Returns: - A decoded and cropped image `Tensor`. 
- """ - decoded = image_bytes.dtype != tf.string - shape = (tf.shape(image_bytes) if decoded - else tf.image.extract_jpeg_shape(image_bytes)) - image_height = shape[0] - image_width = shape[1] - - padded_center_crop_size = tf.cast( - ((image_size / (image_size + crop_padding)) * - tf.cast(tf.minimum(image_height, image_width), tf.float32)), - tf.int32) - - offset_height = ((image_height - padded_center_crop_size) + 1) // 2 - offset_width = ((image_width - padded_center_crop_size) + 1) // 2 - crop_window = tf.stack([offset_height, offset_width, - padded_center_crop_size, padded_center_crop_size]) - if decoded: - image = tf.image.crop_to_bounding_box( - image_bytes, - offset_height=offset_height, - offset_width=offset_width, - target_height=padded_center_crop_size, - target_width=padded_center_crop_size) - else: - image = tf.image.decode_and_crop_jpeg(image_bytes, crop_window, channels=3) - - image = resize_image(image_bytes=image, - height=image_size, - width=image_size) - - return image - - -def decode_crop_and_flip(image_bytes: tf.Tensor) -> tf.Tensor: - """Crops an image to a random part of the image, then randomly flips. - - Args: - image_bytes: `Tensor` representing an image binary of arbitrary size. - - Returns: - A decoded and cropped image `Tensor`. - - """ - decoded = image_bytes.dtype != tf.string - bbox = tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4]) - shape = (tf.shape(image_bytes) if decoded - else tf.image.extract_jpeg_shape(image_bytes)) - sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( - shape, - bounding_boxes=bbox, - min_object_covered=0.1, - aspect_ratio_range=[0.75, 1.33], - area_range=[0.05, 1.0], - max_attempts=100, - use_image_if_no_bounding_boxes=True) - bbox_begin, bbox_size, _ = sample_distorted_bounding_box - - # Reassemble the bounding box in the format the crop op requires. 
- offset_height, offset_width, _ = tf.unstack(bbox_begin) - target_height, target_width, _ = tf.unstack(bbox_size) - crop_window = tf.stack([offset_height, offset_width, - target_height, target_width]) - if decoded: - cropped = tf.image.crop_to_bounding_box( - image_bytes, - offset_height=offset_height, - offset_width=offset_width, - target_height=target_height, - target_width=target_width) - else: - cropped = tf.image.decode_and_crop_jpeg(image_bytes, - crop_window, - channels=3) - - # Flip to add a little more random distortion in. - cropped = tf.image.random_flip_left_right(cropped) - return cropped - - -def resize_image(image_bytes: tf.Tensor, - height: int = IMAGE_SIZE, - width: int = IMAGE_SIZE) -> tf.Tensor: - """Resizes an image to a given height and width. - - Args: - image_bytes: `Tensor` representing an image binary of arbitrary size. - height: image height dimension. - width: image width dimension. - - Returns: - A tensor containing the resized image. - - """ - return tf.compat.v1.image.resize( - image_bytes, [height, width], method=tf.image.ResizeMethod.BILINEAR, - align_corners=False) - - -def preprocess_for_eval( - image_bytes: tf.Tensor, - image_size: int = IMAGE_SIZE, - num_channels: int = 3, - mean_subtract: bool = False, - standardize: bool = False, - dtype: tf.dtypes.DType = tf.float32 -) -> tf.Tensor: - """Preprocesses the given image for evaluation. - - Args: - image_bytes: `Tensor` representing an image binary of arbitrary size. - image_size: image height/width dimension. - num_channels: number of image input channels. - mean_subtract: whether or not to apply mean subtraction. - standardize: whether or not to apply standardization. - dtype: the dtype to convert the images to. Set to `None` to skip conversion. - - Returns: - A preprocessed and normalized image `Tensor`. 
- """ - images = decode_and_center_crop(image_bytes, image_size) - images = tf.reshape(images, [image_size, image_size, num_channels]) - - if mean_subtract: - images = mean_image_subtraction(image_bytes=images, means=MEAN_RGB) - if standardize: - images = standardize_image(image_bytes=images, stddev=STDDEV_RGB) - if dtype is not None: - images = tf.image.convert_image_dtype(images, dtype=dtype) - - return images - - -def load_eval_image(filename: Text, image_size: int = IMAGE_SIZE) -> tf.Tensor: - """Reads an image from the filesystem and applies image preprocessing. - - Args: - filename: a filename path of an image. - image_size: image height/width dimension. - - Returns: - A preprocessed and normalized image `Tensor`. - """ - image_bytes = tf.io.read_file(filename) - image = preprocess_for_eval(image_bytes, image_size) - - return image - - -def build_eval_dataset(filenames: List[Text], - labels: Optional[List[int]] = None, - image_size: int = IMAGE_SIZE, - batch_size: int = 1) -> tf.Tensor: - """Builds a tf.data.Dataset from a list of filenames and labels. - - Args: - filenames: a list of filename paths of images. - labels: a list of labels corresponding to each image. - image_size: image height/width dimension. - batch_size: the batch size used by the dataset - - Returns: - A preprocessed and normalized image `Tensor`. - """ - if labels is None: - labels = [0] * len(filenames) - - filenames = tf.constant(filenames) - labels = tf.constant(labels) - dataset = tf.data.Dataset.from_tensor_slices((filenames, labels)) - - dataset = dataset.map( - lambda filename, label: (load_eval_image(filename, image_size), label)) - dataset = dataset.batch(batch_size) - - return dataset - - -def preprocess_for_train(image_bytes: tf.Tensor, - image_size: int = IMAGE_SIZE, - augmenter: Optional[augment.ImageAugment] = None, - mean_subtract: bool = False, - standardize: bool = False, - dtype: tf.dtypes.DType = tf.float32) -> tf.Tensor: - """Preprocesses the given image for training. 
- - Args: - image_bytes: `Tensor` representing an image binary of - arbitrary size of dtype tf.uint8. - image_size: image height/width dimension. - augmenter: the image augmenter to apply. - mean_subtract: whether or not to apply mean subtraction. - standardize: whether or not to apply standardization. - dtype: the dtype to convert the images to. Set to `None` to skip conversion. - - Returns: - A preprocessed and normalized image `Tensor`. - """ - images = decode_crop_and_flip(image_bytes=image_bytes) - images = resize_image(images, height=image_size, width=image_size) - if augmenter is not None: - images = augmenter.distort(images) - if mean_subtract: - images = mean_image_subtraction(image_bytes=images, means=MEAN_RGB) - if standardize: - images = standardize_image(image_bytes=images, stddev=STDDEV_RGB) - if dtype is not None: - images = tf.image.convert_image_dtype(images, dtype) - - return images diff --git a/official/vision/image_classification/resnet/README.md b/official/vision/image_classification/resnet/README.md deleted file mode 100644 index 5064523fbdcd4222c2159bdc1c09b7156800bf54..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/README.md +++ /dev/null @@ -1,125 +0,0 @@ -This folder contains a -[custom training loop (CTL)](#resnet-custom-training-loop) implementation for -ResNet50. - -## Before you begin -Please refer to the [README](../README.md) in the parent directory for -information on setup and preparing the data. - -## ResNet (custom training loop) - -Similar to the [estimator implementation](../../../r1/resnet), the Keras -implementation has code for the ImageNet dataset. The ImageNet -version uses a ResNet50 model implemented in -[`resnet_model.py`](./resnet_model.py). 
- - -### Pretrained Models - -* [ResNet50 Checkpoints](https://storage.googleapis.com/cloud-tpu-checkpoints/resnet/resnet50.tar.gz) - -* ResNet50 TFHub: [feature vector](https://tfhub.dev/tensorflow/resnet_50/feature_vector/1) -and [classification](https://tfhub.dev/tensorflow/resnet_50/classification/1) - -Again, if you did not download the data to the default directory, specify the -location with the `--data_dir` flag: - -```bash -python3 resnet_ctl_imagenet_main.py --data_dir=/path/to/imagenet -``` - -There are more flag options you can specify. Here are some examples: - -- `--use_synthetic_data`: when set to true, synthetic data, rather than real -data, are used; -- `--batch_size`: the batch size used for the model; -- `--model_dir`: the directory to save the model checkpoint; -- `--train_epochs`: number of epochs to run for training the model; -- `--train_steps`: number of steps to run for training the model. We now only -support a number that is smaller than the number of batches in an epoch. -- `--skip_eval`: when set to true, evaluation as well as validation during -training is skipped. - -For example, this is a typical command line to run with ImageNet data with -batch size 128 per GPU: - -```bash -python3 resnet_ctl_imagenet_main.py \ - --model_dir=/tmp/model_dir/something \ - --num_gpus=2 \ - --batch_size=128 \ - --train_epochs=90 \ - --train_steps=10 \ - --use_synthetic_data=false -``` - -See [`common.py`](common.py) for the full list of options. - -### Using multiple GPUs - -You can train these models on multiple GPUs using the `tf.distribute.Strategy` API. -You can read more about it in this -[guide](https://www.tensorflow.org/guide/distribute_strategy). - -In this example, we have made it easier to use it with just a command line flag -`--num_gpus`. By default this flag is 1 if TensorFlow is compiled with CUDA, -and 0 otherwise. - -- --num_gpus=0: Uses tf.distribute.OneDeviceStrategy with CPU as the device.
-- --num_gpus=1: Uses tf.distribute.OneDeviceStrategy with GPU as the device. -- --num_gpus=2+: Uses tf.distribute.MirroredStrategy to run synchronous -distributed training across the GPUs. - -If you wish to run without `tf.distribute.Strategy`, you can do so by setting -`--distribution_strategy=off`. - -### Running on multiple GPU hosts - -You can also train these models on multiple hosts, each with GPUs, using -`tf.distribute.Strategy`. - -The easiest way to run multi-host benchmarks is to set the -[`TF_CONFIG`](https://www.tensorflow.org/guide/distributed_training#TF_CONFIG) -appropriately at each host. For example, to run using `MultiWorkerMirroredStrategy` on -2 hosts, the `cluster` in `TF_CONFIG` should have 2 `host:port` entries, and -host `i` should have the `task` in `TF_CONFIG` set to `{"type": "worker", -"index": i}`. `MultiWorkerMirroredStrategy` will automatically use all the -available GPUs at each host. - -### Running on Cloud TPUs - -Note: This model will **not** work with TPUs on Colab. - -You can train the ResNet CTL model on Cloud TPUs using -`tf.distribute.TPUStrategy`. If you are not familiar with Cloud TPUs, it is -strongly recommended that you go through the -[quickstart](https://cloud.google.com/tpu/docs/quickstart) to learn how to -create a TPU and a GCE VM. - -To run the ResNet model on a TPU, you must set `--distribution_strategy=tpu` and -`--tpu=$TPU_NAME`, where `$TPU_NAME` is the name of your TPU in the Cloud Console.
-From a GCE VM, you can run the following command to train ResNet for one epoch -on a v2-8 or v3-8 TPU by setting `TRAIN_EPOCHS` to 1: - -```bash -python3 resnet_ctl_imagenet_main.py \ - --tpu=$TPU_NAME \ - --model_dir=$MODEL_DIR \ - --data_dir=$DATA_DIR \ - --batch_size=1024 \ - --steps_per_loop=500 \ - --train_epochs=$TRAIN_EPOCHS \ - --use_synthetic_data=false \ - --dtype=fp32 \ - --enable_eager=true \ - --enable_tensorboard=true \ - --distribution_strategy=tpu \ - --log_steps=50 \ - --single_l2_loss_op=true \ - --use_tf_function=true -``` - -To train the ResNet to convergence, run it for 90 epochs by setting -`TRAIN_EPOCHS` to 90. - -Note: `$MODEL_DIR` and `$DATA_DIR` must be GCS paths. diff --git a/official/vision/image_classification/resnet/__init__.py b/official/vision/image_classification/resnet/__init__.py deleted file mode 100644 index e419af524b5f349fe04abfa820c3cb51b777d422..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/__init__.py +++ /dev/null @@ -1,14 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- diff --git a/official/vision/image_classification/resnet/__pycache__/__init__.cpython-37.pyc b/official/vision/image_classification/resnet/__pycache__/__init__.cpython-37.pyc deleted file mode 100644 index e98e8562f00ac9d642e299e9e0873756c1fba2ca..0000000000000000000000000000000000000000 Binary files a/official/vision/image_classification/resnet/__pycache__/__init__.cpython-37.pyc and /dev/null differ diff --git a/official/vision/image_classification/resnet/__pycache__/common.cpython-37.pyc b/official/vision/image_classification/resnet/__pycache__/common.cpython-37.pyc deleted file mode 100644 index 7d37d16c6289e4c783821515499522484b44a0f8..0000000000000000000000000000000000000000 Binary files a/official/vision/image_classification/resnet/__pycache__/common.cpython-37.pyc and /dev/null differ diff --git a/official/vision/image_classification/resnet/__pycache__/imagenet_preprocessing.cpython-37.pyc b/official/vision/image_classification/resnet/__pycache__/imagenet_preprocessing.cpython-37.pyc deleted file mode 100644 index 3d2c5222fd00bb6b7ea1f5d957870ac91b5f9220..0000000000000000000000000000000000000000 Binary files a/official/vision/image_classification/resnet/__pycache__/imagenet_preprocessing.cpython-37.pyc and /dev/null differ diff --git a/official/vision/image_classification/resnet/__pycache__/resnet_model.cpython-37.pyc b/official/vision/image_classification/resnet/__pycache__/resnet_model.cpython-37.pyc deleted file mode 100644 index 7eb5c30278027c37a65b7fc1ba8ec08616618ea4..0000000000000000000000000000000000000000 Binary files a/official/vision/image_classification/resnet/__pycache__/resnet_model.cpython-37.pyc and /dev/null differ diff --git a/official/vision/image_classification/resnet/__pycache__/resnet_runnable.cpython-37.pyc b/official/vision/image_classification/resnet/__pycache__/resnet_runnable.cpython-37.pyc deleted file mode 100644 index f6db67c6a67e0e47f20e2ec9ca3396f44a40e47c..0000000000000000000000000000000000000000 Binary files 
a/official/vision/image_classification/resnet/__pycache__/resnet_runnable.cpython-37.pyc and /dev/null differ diff --git a/official/vision/image_classification/resnet/common.py b/official/vision/image_classification/resnet/common.py deleted file mode 100644 index a034ba7dd0be5b2b2536727137497c84519001a5..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/common.py +++ /dev/null @@ -1,418 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Common util functions and classes used by both keras cifar and imagenet.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os - -from absl import flags -import tensorflow as tf - -import tensorflow_model_optimization as tfmot -from official.utils.flags import core as flags_core -from official.utils.misc import keras_utils - -FLAGS = flags.FLAGS -BASE_LEARNING_RATE = 0.1 # This matches Jing's version. 
-TRAIN_TOP_1 = 'training_accuracy_top_1' -LR_SCHEDULE = [ # (multiplier, epoch to start) tuples - (1.0, 5), (0.1, 30), (0.01, 60), (0.001, 80) -] - - -class PiecewiseConstantDecayWithWarmup( - tf.keras.optimizers.schedules.LearningRateSchedule): - """Piecewise constant decay with warmup schedule.""" - - def __init__(self, - batch_size, - epoch_size, - warmup_epochs, - boundaries, - multipliers, - compute_lr_on_cpu=True, - name=None): - super(PiecewiseConstantDecayWithWarmup, self).__init__() - if len(boundaries) != len(multipliers) - 1: - raise ValueError('The length of boundaries must be 1 less than the ' - 'length of multipliers') - - base_lr_batch_size = 256 - steps_per_epoch = epoch_size // batch_size - - self.rescaled_lr = BASE_LEARNING_RATE * batch_size / base_lr_batch_size - self.step_boundaries = [float(steps_per_epoch) * x for x in boundaries] - self.lr_values = [self.rescaled_lr * m for m in multipliers] - self.warmup_steps = warmup_epochs * steps_per_epoch - self.compute_lr_on_cpu = compute_lr_on_cpu - self.name = name - - self.learning_rate_ops_cache = {} - - def __call__(self, step): - if tf.executing_eagerly(): - return self._get_learning_rate(step) - - # In an eager function or graph, the current implementation of optimizer - # repeatedly call and thus create ops for the learning rate schedule. To - # avoid this, we cache the ops if not executing eagerly. 
- graph = tf.compat.v1.get_default_graph() - if graph not in self.learning_rate_ops_cache: - if self.compute_lr_on_cpu: - with tf.device('/device:CPU:0'): - self.learning_rate_ops_cache[graph] = self._get_learning_rate(step) - else: - self.learning_rate_ops_cache[graph] = self._get_learning_rate(step) - return self.learning_rate_ops_cache[graph] - - def _get_learning_rate(self, step): - """Compute learning rate at given step.""" - with tf.name_scope('PiecewiseConstantDecayWithWarmup'): - - def warmup_lr(step): - return self.rescaled_lr * ( - tf.cast(step, tf.float32) / tf.cast(self.warmup_steps, tf.float32)) - - def piecewise_lr(step): - return tf.compat.v1.train.piecewise_constant(step, self.step_boundaries, - self.lr_values) - - return tf.cond(step < self.warmup_steps, lambda: warmup_lr(step), - lambda: piecewise_lr(step)) - - def get_config(self): - return { - 'rescaled_lr': self.rescaled_lr, - 'step_boundaries': self.step_boundaries, - 'lr_values': self.lr_values, - 'warmup_steps': self.warmup_steps, - 'compute_lr_on_cpu': self.compute_lr_on_cpu, - 'name': self.name - } - - -def get_optimizer(learning_rate=0.1): - """Returns optimizer to use.""" - # The learning_rate is overwritten at the beginning of each step by callback. 
- return tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9) - - -def get_callbacks(pruning_method=None, - enable_checkpoint_and_export=False, - model_dir=None): - """Returns common callbacks.""" - time_callback = keras_utils.TimeHistory( - FLAGS.batch_size, - FLAGS.log_steps, - logdir=FLAGS.model_dir if FLAGS.enable_tensorboard else None) - callbacks = [time_callback] - - if FLAGS.enable_tensorboard: - tensorboard_callback = tf.keras.callbacks.TensorBoard( - log_dir=FLAGS.model_dir, profile_batch=FLAGS.profile_steps) - callbacks.append(tensorboard_callback) - - is_pruning_enabled = pruning_method is not None - if is_pruning_enabled: - callbacks.append(tfmot.sparsity.keras.UpdatePruningStep()) - if model_dir is not None: - callbacks.append( - tfmot.sparsity.keras.PruningSummaries( - log_dir=model_dir, profile_batch=0)) - - if enable_checkpoint_and_export: - if model_dir is not None: - ckpt_full_path = os.path.join(model_dir, 'model.ckpt-{epoch:04d}') - callbacks.append( - tf.keras.callbacks.ModelCheckpoint( - ckpt_full_path, save_weights_only=True)) - return callbacks - - -def build_stats(history, eval_output, callbacks): - """Normalizes and returns dictionary of stats. - - Args: - history: Results of the training step. Supports both categorical_accuracy - and sparse_categorical_accuracy. - eval_output: Output of the eval step. Assumes first value is eval_loss and - second value is accuracy_top_1. - callbacks: a list of callbacks which might include a time history callback - used during keras.fit. - - Returns: - Dictionary of normalized results. - """ - stats = {} - if eval_output: - stats['accuracy_top_1'] = float(eval_output[1]) - stats['eval_loss'] = float(eval_output[0]) - if history and history.history: - train_hist = history.history - # Gets final loss from training. - stats['loss'] = float(train_hist['loss'][-1]) - # Gets top_1 training accuracy. 
- if 'categorical_accuracy' in train_hist: - stats[TRAIN_TOP_1] = float(train_hist['categorical_accuracy'][-1]) - elif 'sparse_categorical_accuracy' in train_hist: - stats[TRAIN_TOP_1] = float(train_hist['sparse_categorical_accuracy'][-1]) - elif 'accuracy' in train_hist: - stats[TRAIN_TOP_1] = float(train_hist['accuracy'][-1]) - - if not callbacks: - return stats - - # Look for the time history callback which was used during keras.fit - for callback in callbacks: - if isinstance(callback, keras_utils.TimeHistory): - timestamp_log = callback.timestamp_log - stats['step_timestamp_log'] = timestamp_log - stats['train_finish_time'] = callback.train_finish_time - if callback.epoch_runtime_log: - stats['avg_exp_per_second'] = callback.average_examples_per_second - - return stats - - -def define_keras_flags(model=False, - optimizer=False, - pretrained_filepath=False): - """Define flags for Keras models.""" - flags_core.define_base( - clean=True, - num_gpu=True, - run_eagerly=True, - train_epochs=True, - epochs_between_evals=True, - distribution_strategy=True) - flags_core.define_performance( - num_parallel_calls=False, - synthetic_data=True, - dtype=True, - all_reduce_alg=True, - num_packs=True, - tf_gpu_thread_mode=True, - datasets_num_private_threads=True, - loss_scale=True, - fp16_implementation=True, - tf_data_experimental_slack=True, - enable_xla=True, - training_dataset_cache=True) - flags_core.define_image() - flags_core.define_benchmark() - flags_core.define_distribution() - flags.adopt_module_key_flags(flags_core) - - flags.DEFINE_boolean(name='enable_eager', default=False, help='Enable eager?') - flags.DEFINE_boolean(name='skip_eval', default=False, help='Skip evaluation?') - # TODO(b/135607288): Remove this flag once we understand the root cause of - # slowdown when setting the learning phase in Keras backend. 
- flags.DEFINE_boolean( - name='set_learning_phase_to_train', - default=True, - help='If skip eval, also set Keras learning phase to 1 (training).') - flags.DEFINE_boolean( - name='explicit_gpu_placement', - default=False, - help='If not using distribution strategy, explicitly set device scope ' - 'for the Keras training loop.') - flags.DEFINE_boolean( - name='use_trivial_model', - default=False, - help='Whether to use a trivial Keras model.') - flags.DEFINE_boolean( - name='report_accuracy_metrics', - default=True, - help='Report metrics during training and evaluation.') - flags.DEFINE_boolean( - name='use_tensor_lr', - default=True, - help='Use learning rate tensor instead of a callback.') - flags.DEFINE_boolean( - name='enable_tensorboard', - default=False, - help='Whether to enable Tensorboard callback.') - flags.DEFINE_string( - name='profile_steps', - default=None, - help='Save profiling data to model dir at given range of global steps. The ' - 'value must be a comma separated pair of positive integers, specifying ' - 'the first and last step to profile. For example, "--profile_steps=2,4" ' - 'triggers the profiler to process 3 steps, starting from the 2nd step. ' - 'Note that profiler has a non-trivial performance overhead, and the ' - 'output file can be gigantic if profiling many steps.') - flags.DEFINE_integer( - name='train_steps', - default=None, - help='The number of steps to run for training. If it is larger than ' - '# batches per epoch, then use # batches per epoch. This flag will be ' - 'ignored if train_epochs is set to be larger than 1. 
') - flags.DEFINE_boolean( - name='batchnorm_spatial_persistent', - default=True, - help='Enable the spacial persistent mode for CuDNN batch norm kernel.') - flags.DEFINE_boolean( - name='enable_get_next_as_optional', - default=False, - help='Enable get_next_as_optional behavior in DistributedIterator.') - flags.DEFINE_boolean( - name='enable_checkpoint_and_export', - default=False, - help='Whether to enable a checkpoint callback and export the savedmodel.') - flags.DEFINE_string(name='tpu', default='', help='TPU address to connect to.') - flags.DEFINE_integer( - name='steps_per_loop', - default=None, - help='Number of steps per training loop. Only training step happens ' - 'inside the loop. Callbacks will not be called inside. Will be capped at ' - 'steps per epoch.') - flags.DEFINE_boolean( - name='use_tf_while_loop', - default=True, - help='Whether to build a tf.while_loop inside the training loop on the ' - 'host. Setting it to True is critical to have peak performance on ' - 'TPU.') - - if model: - flags.DEFINE_string('model', 'resnet50_v1.5', - 'Name of model preset. (mobilenet, resnet50_v1.5)') - if optimizer: - flags.DEFINE_string( - 'optimizer', 'resnet50_default', 'Name of optimizer preset. ' - '(mobilenet_default, resnet50_default)') - # TODO(kimjaehong): Replace as general hyper-params not only for mobilenet. - flags.DEFINE_float( - 'initial_learning_rate_per_sample', 0.00007, - 'Initial value of learning rate per sample for ' - 'mobilenet_default.') - flags.DEFINE_float('lr_decay_factor', 0.94, - 'Learning rate decay factor for mobilenet_default.') - flags.DEFINE_float('num_epochs_per_decay', 2.5, - 'Number of epochs per decay for mobilenet_default.') - if pretrained_filepath: - flags.DEFINE_string('pretrained_filepath', '', 'Pretrained file path.') - - -def get_synth_data(height, width, num_channels, num_classes, dtype): - """Creates a set of synthetic random data. - - Args: - height: Integer height that will be used to create a fake image tensor. 
- width: Integer width that will be used to create a fake image tensor. - num_channels: Integer depth that will be used to create a fake image tensor. - num_classes: Number of classes that should be represented in the fake labels - tensor - dtype: Data type for features/images. - - Returns: - A tuple of tensors representing the inputs and labels. - - """ - # Synthetic input should be within [0, 255]. - inputs = tf.random.truncated_normal([height, width, num_channels], - dtype=dtype, - mean=127, - stddev=60, - name='synthetic_inputs') - labels = tf.random.uniform([1], - minval=0, - maxval=num_classes - 1, - dtype=tf.int32, - name='synthetic_labels') - return inputs, labels - - -def define_pruning_flags(): - """Define flags for pruning methods.""" - flags.DEFINE_string( - 'pruning_method', None, 'Pruning method.' - 'None (no pruning) or polynomial_decay.') - flags.DEFINE_float('pruning_initial_sparsity', 0.0, - 'Initial sparsity for pruning.') - flags.DEFINE_float('pruning_final_sparsity', 0.5, - 'Final sparsity for pruning.') - flags.DEFINE_integer('pruning_begin_step', 0, 'Begin step for pruning.') - flags.DEFINE_integer('pruning_end_step', 100000, 'End step for pruning.') - flags.DEFINE_integer('pruning_frequency', 100, 'Frequency for pruning.') - - -def define_clustering_flags(): - """Define flags for clustering methods.""" - flags.DEFINE_string('clustering_method', None, - 'None (no clustering) or selective_clustering ' - '(cluster last three Conv2D layers of the model).') - - -def get_synth_input_fn(height, - width, - num_channels, - num_classes, - dtype=tf.float32, - drop_remainder=True): - """Returns an input function that returns a dataset with random data. - - This input_fn returns a data set that iterates over a set of random data and - bypasses all preprocessing, e.g. jpeg decode and copy. The host to device - copy is still included. This used to find the upper throughput bound when - tuning the full input pipeline. 
- - Args: - height: Integer height that will be used to create a fake image tensor. - width: Integer width that will be used to create a fake image tensor. - num_channels: Integer depth that will be used to create a fake image tensor. - num_classes: Number of classes that should be represented in the fake labels - tensor - dtype: Data type for features/images. - drop_remainder: A boolean indicates whether to drop the remainder of the - batches. If True, the batch dimension will be static. - - Returns: - An input_fn that can be used in place of a real one to return a dataset - that can be used for iteration. - """ - - # pylint: disable=unused-argument - def input_fn(is_training, data_dir, batch_size, *args, **kwargs): - """Returns dataset filled with random data.""" - inputs, labels = get_synth_data( - height=height, - width=width, - num_channels=num_channels, - num_classes=num_classes, - dtype=dtype) - # Cast to float32 for Keras model. - labels = tf.cast(labels, dtype=tf.float32) - data = tf.data.Dataset.from_tensors((inputs, labels)).repeat() - - # `drop_remainder` will make dataset produce outputs with known shapes. - data = data.batch(batch_size, drop_remainder=drop_remainder) - data = data.prefetch(buffer_size=tf.data.experimental.AUTOTUNE) - return data - - return input_fn - - -def set_cudnn_batchnorm_mode(): - """Set CuDNN batchnorm mode for better performance. - - Note: Spatial Persistent mode may lead to accuracy losses for certain - models. 
- """ - if FLAGS.batchnorm_spatial_persistent: - os.environ['TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT'] = '1' - else: - os.environ.pop('TF_USE_CUDNN_BATCHNORM_SPATIAL_PERSISTENT', None) diff --git a/official/vision/image_classification/resnet/imagenet_preprocessing.py b/official/vision/image_classification/resnet/imagenet_preprocessing.py deleted file mode 100644 index 86ba3ed98084987ea5d63edf8fd5f515d58fba93..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/imagenet_preprocessing.py +++ /dev/null @@ -1,574 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Provides utilities to preprocess images. - -Training images are sampled using the provided bounding boxes, and subsequently -cropped to the sampled bounding box. Images are additionally flipped randomly, -then resized to the target output size (without aspect-ratio preservation). - -Images used during evaluation are resized (with aspect-ratio preservation) and -centrally cropped. - -All images undergo mean color subtraction. - -Note that these steps are colloquially referred to as "ResNet preprocessing," -and they differ from "VGG preprocessing," which does not use bounding boxes -and instead does an aspect-preserving resize followed by random crop during -training. (These both differ from "Inception preprocessing," which introduces -color distortion steps.) 
- -""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os - -from absl import logging -import tensorflow as tf - -DEFAULT_IMAGE_SIZE = 224 -NUM_CHANNELS = 3 -NUM_CLASSES = 1001 - -NUM_IMAGES = { - 'train': 1281167, - 'validation': 50000, -} - -_NUM_TRAIN_FILES = 1024 -_SHUFFLE_BUFFER = 10000 - -_R_MEAN = 123.68 -_G_MEAN = 116.78 -_B_MEAN = 103.94 -CHANNEL_MEANS = [_R_MEAN, _G_MEAN, _B_MEAN] - -# The lower bound for the smallest side of the image for aspect-preserving -# resizing. For example, if an image is 500 x 1000, it will be resized to -# _RESIZE_MIN x (_RESIZE_MIN * 2). -_RESIZE_MIN = 256 - - -def process_record_dataset(dataset, - is_training, - batch_size, - shuffle_buffer, - parse_record_fn, - dtype=tf.float32, - datasets_num_private_threads=None, - drop_remainder=False, - tf_data_experimental_slack=False): - """Given a Dataset with raw records, return an iterator over the records. - - Args: - dataset: A Dataset representing raw records - is_training: A boolean denoting whether the input is for training. - batch_size: The number of samples per batch. - shuffle_buffer: The buffer size to use when shuffling records. A larger - value results in better randomness, but smaller values reduce startup time - and use less memory. - parse_record_fn: A function that takes a raw record and returns the - corresponding (image, label) pair. - dtype: Data type to use for images/features. - datasets_num_private_threads: Number of threads for a private threadpool - created for all datasets computation. - drop_remainder: A boolean indicates whether to drop the remainder of the - batches. If True, the batch dimension will be static. - tf_data_experimental_slack: Whether to enable tf.data's `experimental_slack` - option. - - Returns: - Dataset of (image, label) pairs ready for iteration. - """ - # Defines a specific size thread pool for tf.data operations. 
- if datasets_num_private_threads: - options = tf.data.Options() - options.experimental_threading.private_threadpool_size = ( - datasets_num_private_threads) - dataset = dataset.with_options(options) - logging.info('datasets_num_private_threads: %s', - datasets_num_private_threads) - - if is_training: - # Shuffles records before repeating to respect epoch boundaries. - dataset = dataset.shuffle(buffer_size=shuffle_buffer) - # Repeats the dataset for the number of epochs to train. - dataset = dataset.repeat() - - # Parses the raw records into images and labels. - dataset = dataset.map( - lambda value: parse_record_fn(value, is_training, dtype), - num_parallel_calls=tf.data.experimental.AUTOTUNE) - dataset = dataset.batch(batch_size, drop_remainder=drop_remainder) - - # Operations between the final prefetch and the get_next call to the iterator - # will happen synchronously during run time. We prefetch here again to - # background all of the above processing work and keep it out of the - # critical training path. Setting buffer_size to tf.data.experimental.AUTOTUNE - # allows DistributionStrategies to adjust how many batches to fetch based - # on how many devices are present. - dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE) - - options = tf.data.Options() - options.experimental_slack = tf_data_experimental_slack - dataset = dataset.with_options(options) - - return dataset - - -def get_filenames(is_training, data_dir): - """Return filenames for dataset.""" - if is_training: - return [ - os.path.join(data_dir, 'train-%05d-of-01024' % i) - for i in range(_NUM_TRAIN_FILES) - ] - else: - return [ - os.path.join(data_dir, 'validation-%05d-of-00128' % i) - for i in range(128) - ] - - -def parse_example_proto(example_serialized): - """Parses an Example proto containing a training example of an image. - - The output of the build_image_data.py image preprocessing script is a dataset - containing serialized Example protocol buffers. 
Each Example proto contains - the following fields (values are included as examples): - - image/height: 462 - image/width: 581 - image/colorspace: 'RGB' - image/channels: 3 - image/class/label: 615 - image/class/synset: 'n03623198' - image/class/text: 'knee pad' - image/object/bbox/xmin: 0.1 - image/object/bbox/xmax: 0.9 - image/object/bbox/ymin: 0.2 - image/object/bbox/ymax: 0.6 - image/object/bbox/label: 615 - image/format: 'JPEG' - image/filename: 'ILSVRC2012_val_00041207.JPEG' - image/encoded: - - Args: - example_serialized: scalar Tensor tf.string containing a serialized Example - protocol buffer. - - Returns: - image_buffer: Tensor tf.string containing the contents of a JPEG file. - label: Tensor tf.int32 containing the label. - bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] - where each coordinate is [0, 1) and the coordinates are arranged as - [ymin, xmin, ymax, xmax]. - """ - # Dense features in Example proto. - feature_map = { - 'image/encoded': - tf.io.FixedLenFeature([], dtype=tf.string, default_value=''), - 'image/class/label': - tf.io.FixedLenFeature([], dtype=tf.int64, default_value=-1), - 'image/class/text': - tf.io.FixedLenFeature([], dtype=tf.string, default_value=''), - } - sparse_float32 = tf.io.VarLenFeature(dtype=tf.float32) - # Sparse features in Example proto. 
- feature_map.update({ - k: sparse_float32 for k in [ - 'image/object/bbox/xmin', 'image/object/bbox/ymin', - 'image/object/bbox/xmax', 'image/object/bbox/ymax' - ] - }) - - features = tf.io.parse_single_example( - serialized=example_serialized, features=feature_map) - label = tf.cast(features['image/class/label'], dtype=tf.int32) - - xmin = tf.expand_dims(features['image/object/bbox/xmin'].values, 0) - ymin = tf.expand_dims(features['image/object/bbox/ymin'].values, 0) - xmax = tf.expand_dims(features['image/object/bbox/xmax'].values, 0) - ymax = tf.expand_dims(features['image/object/bbox/ymax'].values, 0) - - # Note that we impose an ordering of (y, x) just to make life difficult. - bbox = tf.concat([ymin, xmin, ymax, xmax], 0) - - # Force the variable number of bounding boxes into the shape - # [1, num_boxes, coords]. - bbox = tf.expand_dims(bbox, 0) - bbox = tf.transpose(a=bbox, perm=[0, 2, 1]) - - return features['image/encoded'], label, bbox - - -def parse_record(raw_record, is_training, dtype): - """Parses a record containing a training example of an image. - - The input record is parsed into a label and image, and the image is passed - through preprocessing steps (cropping, flipping, and so on). - - Args: - raw_record: scalar Tensor tf.string containing a serialized Example protocol - buffer. - is_training: A boolean denoting whether the input is for training. - dtype: data type to use for images/features. - - Returns: - Tuple with processed image tensor in a channel-last format and - one-hot-encoded label tensor. - """ - image_buffer, label, bbox = parse_example_proto(raw_record) - - image = preprocess_image( - image_buffer=image_buffer, - bbox=bbox, - output_height=DEFAULT_IMAGE_SIZE, - output_width=DEFAULT_IMAGE_SIZE, - num_channels=NUM_CHANNELS, - is_training=is_training) - image = tf.cast(image, dtype) - - # Subtract one so that labels are in [0, 1000), and cast to float32 for - # Keras model. 
- label = tf.cast( - tf.cast(tf.reshape(label, shape=[1]), dtype=tf.int32) - 1, - dtype=tf.float32) - return image, label - - -def get_parse_record_fn(use_keras_image_data_format=False): - """Get a function for parsing the records, accounting for image format. - - This is useful by handling different types of Keras models. For instance, - the current resnet_model.resnet50 input format is always channel-last, - whereas the keras_applications mobilenet input format depends on - tf.keras.backend.image_data_format(). We should set - use_keras_image_data_format=False for the former and True for the latter. - - Args: - use_keras_image_data_format: A boolean denoting whether data format is keras - backend image data format. If False, the image format is channel-last. If - True, the image format matches tf.keras.backend.image_data_format(). - - Returns: - Function to use for parsing the records. - """ - - def parse_record_fn(raw_record, is_training, dtype): - image, label = parse_record(raw_record, is_training, dtype) - if use_keras_image_data_format: - if tf.keras.backend.image_data_format() == 'channels_first': - image = tf.transpose(image, perm=[2, 0, 1]) - return image, label - - return parse_record_fn - - -def input_fn(is_training, - data_dir, - batch_size, - dtype=tf.float32, - datasets_num_private_threads=None, - parse_record_fn=parse_record, - input_context=None, - drop_remainder=False, - tf_data_experimental_slack=False, - training_dataset_cache=False, - filenames=None): - """Input function which provides batches for train or eval. - - Args: - is_training: A boolean denoting whether the input is for training. - data_dir: The directory containing the input data. - batch_size: The number of samples per batch. - dtype: Data type to use for images/features - datasets_num_private_threads: Number of private threads for tf.data. - parse_record_fn: Function to use for parsing the records. 
- input_context: A `tf.distribute.InputContext` object passed in by - `tf.distribute.Strategy`. - drop_remainder: A boolean indicates whether to drop the remainder of the - batches. If True, the batch dimension will be static. - tf_data_experimental_slack: Whether to enable tf.data's `experimental_slack` - option. - training_dataset_cache: Whether to cache the training dataset on workers. - Typically used to improve training performance when training data is in - remote storage and can fit into worker memory. - filenames: Optional field for providing the file names of the TFRecords. - - Returns: - A dataset that can be used for iteration. - """ - if filenames is None: - filenames = get_filenames(is_training, data_dir) - dataset = tf.data.Dataset.from_tensor_slices(filenames) - - if input_context: - logging.info( - 'Sharding the dataset: input_pipeline_id=%d num_input_pipelines=%d', - input_context.input_pipeline_id, input_context.num_input_pipelines) - dataset = dataset.shard(input_context.num_input_pipelines, - input_context.input_pipeline_id) - - if is_training: - # Shuffle the input files - dataset = dataset.shuffle(buffer_size=_NUM_TRAIN_FILES) - - # Convert to individual records. - # cycle_length = 10 means that up to 10 files will be read and deserialized in - # parallel. You may want to increase this number if you have a large number of - # CPU cores. - dataset = dataset.interleave( - tf.data.TFRecordDataset, - cycle_length=10, - num_parallel_calls=tf.data.experimental.AUTOTUNE) - - if is_training and training_dataset_cache: - # Improve training performance when training data is in remote storage and - # can fit into worker memory. 
- dataset = dataset.cache() - - return process_record_dataset( - dataset=dataset, - is_training=is_training, - batch_size=batch_size, - shuffle_buffer=_SHUFFLE_BUFFER, - parse_record_fn=parse_record_fn, - dtype=dtype, - datasets_num_private_threads=datasets_num_private_threads, - drop_remainder=drop_remainder, - tf_data_experimental_slack=tf_data_experimental_slack, - ) - - -def _decode_crop_and_flip(image_buffer, bbox, num_channels): - """Crops the given image to a random part of the image, and randomly flips. - - We use the fused decode_and_crop op, which performs better than the two ops - used separately in series, but note that this requires that the image be - passed in as an un-decoded string Tensor. - - Args: - image_buffer: scalar string Tensor representing the raw JPEG image buffer. - bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] - where each coordinate is [0, 1) and the coordinates are arranged as [ymin, - xmin, ymax, xmax]. - num_channels: Integer depth of the image buffer for decoding. - - Returns: - 3-D tensor with cropped image. - - """ - # A large fraction of image datasets contain a human-annotated bounding box - # delineating the region of the image containing the object of interest. We - # choose to create a new bounding box for the object which is a randomly - # distorted version of the human-annotated bounding box that obeys an - # allowed range of aspect ratios, sizes and overlap with the human-annotated - # bounding box. If no box is supplied, then we assume the bounding box is - # the entire image. - sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( - tf.image.extract_jpeg_shape(image_buffer), - bounding_boxes=bbox, - min_object_covered=0.1, - aspect_ratio_range=[0.75, 1.33], - area_range=[0.05, 1.0], - max_attempts=100, - use_image_if_no_bounding_boxes=True) - bbox_begin, bbox_size, _ = sample_distorted_bounding_box - - # Reassemble the bounding box in the format the crop op requires. 
- offset_y, offset_x, _ = tf.unstack(bbox_begin) - target_height, target_width, _ = tf.unstack(bbox_size) - crop_window = tf.stack([offset_y, offset_x, target_height, target_width]) - - # Use the fused decode and crop op here, which is faster than each in series. - cropped = tf.image.decode_and_crop_jpeg( - image_buffer, crop_window, channels=num_channels) - - # Flip to add a little more random distortion in. - cropped = tf.image.random_flip_left_right(cropped) - return cropped - - -def _central_crop(image, crop_height, crop_width): - """Performs central crops of the given image list. - - Args: - image: a 3-D image tensor - crop_height: the height of the image following the crop. - crop_width: the width of the image following the crop. - - Returns: - 3-D tensor with cropped image. - """ - shape = tf.shape(input=image) - height, width = shape[0], shape[1] - - amount_to_be_cropped_h = (height - crop_height) - crop_top = amount_to_be_cropped_h // 2 - amount_to_be_cropped_w = (width - crop_width) - crop_left = amount_to_be_cropped_w // 2 - return tf.slice(image, [crop_top, crop_left, 0], - [crop_height, crop_width, -1]) - - -def _mean_image_subtraction(image, means, num_channels): - """Subtracts the given means from each image channel. - - For example: - means = [123.68, 116.779, 103.939] - image = _mean_image_subtraction(image, means) - - Note that the rank of `image` must be known. - - Args: - image: a tensor of size [height, width, C]. - means: a C-vector of values to subtract from each channel. - num_channels: number of color channels in the image that will be distorted. - - Returns: - the centered image. - - Raises: - ValueError: If the rank of `image` is unknown, if `image` has a rank other - than three or if the number of channels in `image` doesn't match the - number of values in `means`. 
- """ - if image.get_shape().ndims != 3: - raise ValueError('Input must be of size [height, width, C>0]') - - if len(means) != num_channels: - raise ValueError('len(means) must match the number of channels') - - # We have a 1-D tensor of means; convert to 3-D. - # Note(b/130245863): we explicitly call `broadcast` instead of simply - # expanding dimensions for better performance. - means = tf.broadcast_to(means, tf.shape(image)) - - return image - means - - -def _smallest_size_at_least(height, width, resize_min): - """Computes new shape with the smallest side equal to `smallest_side`. - - Computes new shape with the smallest side equal to `smallest_side` while - preserving the original aspect ratio. - - Args: - height: an int32 scalar tensor indicating the current height. - width: an int32 scalar tensor indicating the current width. - resize_min: A python integer or scalar `Tensor` indicating the size of the - smallest side after resize. - - Returns: - new_height: an int32 scalar tensor indicating the new height. - new_width: an int32 scalar tensor indicating the new width. - """ - resize_min = tf.cast(resize_min, tf.float32) - - # Convert to floats to make subsequent calculations go smoothly. - height, width = tf.cast(height, tf.float32), tf.cast(width, tf.float32) - - smaller_dim = tf.minimum(height, width) - scale_ratio = resize_min / smaller_dim - - # Convert back to ints to make heights and widths that TF ops will accept. - new_height = tf.cast(height * scale_ratio, tf.int32) - new_width = tf.cast(width * scale_ratio, tf.int32) - - return new_height, new_width - - -def _aspect_preserving_resize(image, resize_min): - """Resize images preserving the original aspect ratio. - - Args: - image: A 3-D image `Tensor`. - resize_min: A python integer or scalar `Tensor` indicating the size of the - smallest side after resize. - - Returns: - resized_image: A 3-D tensor containing the resized image. 
- """ - shape = tf.shape(input=image) - height, width = shape[0], shape[1] - - new_height, new_width = _smallest_size_at_least(height, width, resize_min) - - return _resize_image(image, new_height, new_width) - - -def _resize_image(image, height, width): - """Simple wrapper around tf.resize_images. - - This is primarily to make sure we use the same `ResizeMethod` and other - details each time. - - Args: - image: A 3-D image `Tensor`. - height: The target height for the resized image. - width: The target width for the resized image. - - Returns: - resized_image: A 3-D tensor containing the resized image. The first two - dimensions have the shape [height, width]. - """ - return tf.compat.v1.image.resize( - image, [height, width], - method=tf.image.ResizeMethod.BILINEAR, - align_corners=False) - - -def preprocess_image(image_buffer, - bbox, - output_height, - output_width, - num_channels, - is_training=False): - """Preprocesses the given image. - - Preprocessing includes decoding, cropping, and resizing for both training - and eval images. Training preprocessing, however, introduces some random - distortion of the image to improve accuracy. - - Args: - image_buffer: scalar string Tensor representing the raw JPEG image buffer. - bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] - where each coordinate is [0, 1) and the coordinates are arranged as [ymin, - xmin, ymax, xmax]. - output_height: The height of the image after preprocessing. - output_width: The width of the image after preprocessing. - num_channels: Integer depth of the image buffer for decoding. - is_training: `True` if we're preprocessing the image for training and - `False` otherwise. - - Returns: - A preprocessed image. - """ - if is_training: - # For training, we want to randomize some of the distortions. 
- image = _decode_crop_and_flip(image_buffer, bbox, num_channels) - image = _resize_image(image, output_height, output_width) - else: - # For validation, we want to decode, resize, then just crop the middle. - image = tf.image.decode_jpeg(image_buffer, channels=num_channels) - image = _aspect_preserving_resize(image, _RESIZE_MIN) - image = _central_crop(image, output_height, output_width) - - image.set_shape([output_height, output_width, num_channels]) - - return _mean_image_subtraction(image, CHANNEL_MEANS, num_channels) diff --git a/official/vision/image_classification/resnet/resnet_config.py b/official/vision/image_classification/resnet/resnet_config.py deleted file mode 100644 index e39db3955f9fe9c312ea307c8ac3196d45447cf3..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/resnet_config.py +++ /dev/null @@ -1,55 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# Lint as: python3 -"""Configuration definitions for ResNet losses, learning rates, and optimizers.""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import dataclasses - -from official.modeling.hyperparams import base_config -from official.vision.image_classification.configs import base_configs - - -@dataclasses.dataclass -class ResNetModelConfig(base_configs.ModelConfig): - """Configuration for the ResNet model.""" - name: str = 'ResNet' - num_classes: int = 1000 - model_params: base_config.Config = dataclasses.field( - default_factory=lambda: { - 'num_classes': 1000, - 'batch_size': None, - 'use_l2_regularizer': True, - 'rescale_inputs': False, - }) - loss: base_configs.LossConfig = base_configs.LossConfig( - name='sparse_categorical_crossentropy') - optimizer: base_configs.OptimizerConfig = base_configs.OptimizerConfig( - name='momentum', - decay=0.9, - epsilon=0.001, - momentum=0.9, - moving_average_decay=None) - learning_rate: base_configs.LearningRateConfig = ( - base_configs.LearningRateConfig( - name='stepwise', - initial_lr=0.1, - examples_per_epoch=1281167, - boundaries=[30, 60, 80], - warmup_epochs=5, - scale_by_batch_size=1. / 256., - multipliers=[0.1 / 256, 0.01 / 256, 0.001 / 256, 0.0001 / 256])) diff --git a/official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py b/official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py deleted file mode 100644 index dbd4bb721e7da4c87b440270a0f4136b0c8fa4a1..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/resnet_ctl_imagenet_main.py +++ /dev/null @@ -1,197 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Runs a ResNet model on the ImageNet dataset using custom training loops.""" - -import math -import os - -# Import libraries -from absl import app -from absl import flags -from absl import logging -import orbit -import json -import tensorflow as tf -from official.common import distribute_utils -from official.modeling import performance -from official.utils.flags import core as flags_core -from official.utils.misc import keras_utils -from official.utils.misc import model_helpers -from official.vision.image_classification.resnet import common -from official.vision.image_classification.resnet import imagenet_preprocessing -from official.vision.image_classification.resnet import resnet_runnable - -flags.DEFINE_boolean(name='use_tf_function', default=True, - help='Wrap the train and test step inside a ' - 'tf.function.') -flags.DEFINE_boolean(name='single_l2_loss_op', default=False, - help='Calculate L2_loss on concatenated weights, ' - 'instead of using Keras per-layer L2 loss.') - - -def build_stats(runnable, time_callback): - """Normalizes and returns dictionary of stats. - - Args: - runnable: The module containing all the training and evaluation metrics. - time_callback: Time tracking callback instance. - - Returns: - Dictionary of normalized results. 
- """ - stats = {} - - if not runnable.flags_obj.skip_eval: - stats['eval_loss'] = runnable.test_loss.result().numpy() - stats['eval_acc'] = runnable.test_accuracy.result().numpy() - - stats['train_loss'] = runnable.train_loss.result().numpy() - stats['train_acc'] = runnable.train_accuracy.result().numpy() - - if time_callback: - timestamp_log = time_callback.timestamp_log - stats['step_timestamp_log'] = timestamp_log - stats['train_finish_time'] = time_callback.train_finish_time - if time_callback.epoch_runtime_log: - stats['avg_exp_per_second'] = time_callback.average_examples_per_second - - return stats - - -def get_num_train_iterations(flags_obj): - """Returns the number of training steps, train and test epochs.""" - train_steps = ( - imagenet_preprocessing.NUM_IMAGES['train'] // flags_obj.batch_size) - train_epochs = flags_obj.train_epochs - - if flags_obj.train_steps: - train_steps = min(flags_obj.train_steps, train_steps) - train_epochs = 1 - - eval_steps = math.ceil(1.0 * imagenet_preprocessing.NUM_IMAGES['validation'] / - flags_obj.batch_size) - - return train_steps, train_epochs, eval_steps - - -def run(flags_obj): - """Run ResNet ImageNet training and eval loop using custom training loops. - - Args: - flags_obj: An object containing parsed flag values. - - Raises: - ValueError: If fp16 is passed as it is not currently supported. - - Returns: - Dictionary of training and eval stats. 
- """ - - keras_utils.set_session_config() - performance.set_mixed_precision_policy(flags_core.get_tf_dtype(flags_obj)) - - if tf.config.list_physical_devices('GPU'): - if flags_obj.tf_gpu_thread_mode: - keras_utils.set_gpu_thread_mode_and_count( - per_gpu_thread_count=flags_obj.per_gpu_thread_count, - gpu_thread_mode=flags_obj.tf_gpu_thread_mode, - num_gpus=flags_obj.num_gpus, - datasets_num_private_threads=flags_obj.datasets_num_private_threads) - common.set_cudnn_batchnorm_mode() - - data_format = flags_obj.data_format - if data_format is None: - data_format = ('channels_first' if tf.config.list_physical_devices('GPU') - else 'channels_last') - tf.keras.backend.set_image_data_format(data_format) - - strategy = distribute_utils.get_distribution_strategy( - distribution_strategy=flags_obj.distribution_strategy, - num_gpus=flags_obj.num_gpus, - all_reduce_alg=flags_obj.all_reduce_alg, - num_packs=flags_obj.num_packs, - tpu_address=flags_obj.tpu) - - per_epoch_steps, train_epochs, eval_steps = get_num_train_iterations( - flags_obj) - if flags_obj.steps_per_loop is None: - steps_per_loop = per_epoch_steps - elif flags_obj.steps_per_loop > per_epoch_steps: - steps_per_loop = per_epoch_steps - logging.warn('Setting steps_per_loop to %d to respect epoch boundary.', - steps_per_loop) - else: - steps_per_loop = flags_obj.steps_per_loop - - logging.info( - 'Training %d epochs, each epoch has %d steps, ' - 'total steps: %d; Eval %d steps', train_epochs, per_epoch_steps, - train_epochs * per_epoch_steps, eval_steps) - - time_callback = keras_utils.TimeHistory( - flags_obj.batch_size, - flags_obj.log_steps, - logdir=flags_obj.model_dir if flags_obj.enable_tensorboard else None) - with distribute_utils.get_strategy_scope(strategy): - runnable = resnet_runnable.ResnetRunnable(flags_obj, time_callback, - per_epoch_steps) - - eval_interval = flags_obj.epochs_between_evals * per_epoch_steps - checkpoint_interval = ( - steps_per_loop * 5 if flags_obj.enable_checkpoint_and_export 
else None) - summary_interval = steps_per_loop if flags_obj.enable_tensorboard else None - - checkpoint_manager = tf.train.CheckpointManager( - runnable.checkpoint, - directory=flags_obj.model_dir, - max_to_keep=10, - step_counter=runnable.global_step, - checkpoint_interval=checkpoint_interval) - - resnet_controller = orbit.Controller( - strategy=strategy, - trainer=runnable, - evaluator=runnable if not flags_obj.skip_eval else None, - global_step=runnable.global_step, - steps_per_loop=steps_per_loop, - checkpoint_manager=checkpoint_manager, - summary_interval=summary_interval, - summary_dir=flags_obj.model_dir, - eval_summary_dir=os.path.join(flags_obj.model_dir, 'eval')) - - time_callback.on_train_begin() - if not flags_obj.skip_eval: - resnet_controller.train_and_evaluate( - train_steps=per_epoch_steps * train_epochs, - eval_steps=eval_steps, - eval_interval=eval_interval) - else: - resnet_controller.train(steps=per_epoch_steps * train_epochs) - time_callback.on_train_end() - - stats = build_stats(runnable, time_callback) - return stats - - -def main(_): - model_helpers.apply_clean(flags.FLAGS) - stats = run(flags.FLAGS) - logging.info('Run stats:\n%s', stats) - - -if __name__ == '__main__': - logging.set_verbosity(logging.INFO) - common.define_keras_flags() - app.run(main) diff --git a/official/vision/image_classification/resnet/resnet_model.py b/official/vision/image_classification/resnet/resnet_model.py deleted file mode 100644 index 597b85739e965a157aff995d14891f76698678d4..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/resnet_model.py +++ /dev/null @@ -1,325 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""ResNet50 model for Keras. - -Adapted from tf.keras.applications.resnet50.ResNet50(). -This is ResNet model version 1.5. - -Related papers/blogs: -- https://arxiv.org/abs/1512.03385 -- https://arxiv.org/pdf/1603.05027v2.pdf -- http://torch.ch/blog/2016/02/04/resnets.html - -""" -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf -from official.vision.image_classification.resnet import imagenet_preprocessing - -layers = tf.keras.layers - - -def _gen_l2_regularizer(use_l2_regularizer=True, l2_weight_decay=1e-4): - return tf.keras.regularizers.L2( - l2_weight_decay) if use_l2_regularizer else None - - -def identity_block(input_tensor, - kernel_size, - filters, - stage, - block, - use_l2_regularizer=True, - batch_norm_decay=0.9, - batch_norm_epsilon=1e-5): - """The identity block is the block that has no conv layer at shortcut. - - Args: - input_tensor: input tensor - kernel_size: default 3, the kernel size of middle conv layer at main path - filters: list of integers, the filters of 3 conv layer at main path - stage: integer, current stage label, used for generating layer names - block: 'a','b'..., current block label, used for generating layer names - use_l2_regularizer: whether to use L2 regularizer on Conv layer. - batch_norm_decay: Moment of batch norm layers. - batch_norm_epsilon: Epsilon of batch borm layers. - - Returns: - Output tensor for the block. 
- """ - filters1, filters2, filters3 = filters - if tf.keras.backend.image_data_format() == 'channels_last': - bn_axis = 3 - else: - bn_axis = 1 - conv_name_base = 'res' + str(stage) + block + '_branch' - bn_name_base = 'bn' + str(stage) + block + '_branch' - - x = layers.Conv2D( - filters1, (1, 1), - use_bias=False, - kernel_initializer='he_normal', - kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), - name=conv_name_base + '2a')( - input_tensor) - x = layers.BatchNormalization( - axis=bn_axis, - momentum=batch_norm_decay, - epsilon=batch_norm_epsilon, - name=bn_name_base + '2a')( - x) - x = layers.Activation('relu')(x) - - x = layers.Conv2D( - filters2, - kernel_size, - padding='same', - use_bias=False, - kernel_initializer='he_normal', - kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), - name=conv_name_base + '2b')( - x) - x = layers.BatchNormalization( - axis=bn_axis, - momentum=batch_norm_decay, - epsilon=batch_norm_epsilon, - name=bn_name_base + '2b')( - x) - x = layers.Activation('relu')(x) - - x = layers.Conv2D( - filters3, (1, 1), - use_bias=False, - kernel_initializer='he_normal', - kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), - name=conv_name_base + '2c')( - x) - x = layers.BatchNormalization( - axis=bn_axis, - momentum=batch_norm_decay, - epsilon=batch_norm_epsilon, - name=bn_name_base + '2c')( - x) - - x = layers.add([x, input_tensor]) - x = layers.Activation('relu')(x) - return x - - -def conv_block(input_tensor, - kernel_size, - filters, - stage, - block, - strides=(2, 2), - use_l2_regularizer=True, - batch_norm_decay=0.9, - batch_norm_epsilon=1e-5): - """A block that has a conv layer at shortcut. 
- - Note that from stage 3, - the second conv layer at main path is with strides=(2, 2) - And the shortcut should have strides=(2, 2) as well - - Args: - input_tensor: input tensor - kernel_size: default 3, the kernel size of middle conv layer at main path - filters: list of integers, the filters of 3 conv layer at main path - stage: integer, current stage label, used for generating layer names - block: 'a','b'..., current block label, used for generating layer names - strides: Strides for the second conv layer in the block. - use_l2_regularizer: whether to use L2 regularizer on Conv layer. - batch_norm_decay: Moment of batch norm layers. - batch_norm_epsilon: Epsilon of batch borm layers. - - Returns: - Output tensor for the block. - """ - filters1, filters2, filters3 = filters - if tf.keras.backend.image_data_format() == 'channels_last': - bn_axis = 3 - else: - bn_axis = 1 - conv_name_base = 'res' + str(stage) + block + '_branch' - bn_name_base = 'bn' + str(stage) + block + '_branch' - - x = layers.Conv2D( - filters1, (1, 1), - use_bias=False, - kernel_initializer='he_normal', - kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), - name=conv_name_base + '2a')( - input_tensor) - x = layers.BatchNormalization( - axis=bn_axis, - momentum=batch_norm_decay, - epsilon=batch_norm_epsilon, - name=bn_name_base + '2a')( - x) - x = layers.Activation('relu')(x) - - x = layers.Conv2D( - filters2, - kernel_size, - strides=strides, - padding='same', - use_bias=False, - kernel_initializer='he_normal', - kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), - name=conv_name_base + '2b')( - x) - x = layers.BatchNormalization( - axis=bn_axis, - momentum=batch_norm_decay, - epsilon=batch_norm_epsilon, - name=bn_name_base + '2b')( - x) - x = layers.Activation('relu')(x) - - x = layers.Conv2D( - filters3, (1, 1), - use_bias=False, - kernel_initializer='he_normal', - kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), - name=conv_name_base + '2c')( - x) - x = 
layers.BatchNormalization( - axis=bn_axis, - momentum=batch_norm_decay, - epsilon=batch_norm_epsilon, - name=bn_name_base + '2c')( - x) - - shortcut = layers.Conv2D( - filters3, (1, 1), - strides=strides, - use_bias=False, - kernel_initializer='he_normal', - kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), - name=conv_name_base + '1')( - input_tensor) - shortcut = layers.BatchNormalization( - axis=bn_axis, - momentum=batch_norm_decay, - epsilon=batch_norm_epsilon, - name=bn_name_base + '1')( - shortcut) - - x = layers.add([x, shortcut]) - x = layers.Activation('relu')(x) - return x - - -def resnet50(num_classes, - batch_size=None, - use_l2_regularizer=True, - rescale_inputs=False, - batch_norm_decay=0.9, - batch_norm_epsilon=1e-5): - """Instantiates the ResNet50 architecture. - - Args: - num_classes: `int` number of classes for image classification. - batch_size: Size of the batches for each step. - use_l2_regularizer: whether to use L2 regularizer on Conv/Dense layer. - rescale_inputs: whether to rescale inputs from 0 to 1. - batch_norm_decay: Moment of batch norm layers. - batch_norm_epsilon: Epsilon of batch borm layers. - - Returns: - A Keras model instance. - """ - input_shape = (224, 224, 3) - img_input = layers.Input(shape=input_shape, batch_size=batch_size) - if rescale_inputs: - # Hub image modules expect inputs in the range [0, 1]. This rescales these - # inputs to the range expected by the trained model. 
- x = layers.Lambda( - lambda x: x * 255.0 - tf.keras.backend.constant( # pylint: disable=g-long-lambda - imagenet_preprocessing.CHANNEL_MEANS, - shape=[1, 1, 3], - dtype=x.dtype), - name='rescale')( - img_input) - else: - x = img_input - - if tf.keras.backend.image_data_format() == 'channels_first': - x = layers.Permute((3, 1, 2))(x) - bn_axis = 1 - else: # channels_last - bn_axis = 3 - - block_config = dict( - use_l2_regularizer=use_l2_regularizer, - batch_norm_decay=batch_norm_decay, - batch_norm_epsilon=batch_norm_epsilon) - x = layers.ZeroPadding2D(padding=(3, 3), name='conv1_pad')(x) - x = layers.Conv2D( - 64, (7, 7), - strides=(2, 2), - padding='valid', - use_bias=False, - kernel_initializer='he_normal', - kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), - name='conv1')( - x) - x = layers.BatchNormalization( - axis=bn_axis, - momentum=batch_norm_decay, - epsilon=batch_norm_epsilon, - name='bn_conv1')( - x) - x = layers.Activation('relu')(x) - x = layers.MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x) - - x = conv_block( - x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), **block_config) - x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', **block_config) - x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', **block_config) - - x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', **block_config) - x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', **block_config) - x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', **block_config) - x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', **block_config) - - x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', **block_config) - x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b', **block_config) - x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c', **block_config) - x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d', **block_config) - x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e', 
**block_config) - x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f', **block_config) - - x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', **block_config) - x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', **block_config) - x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', **block_config) - - x = layers.GlobalAveragePooling2D()(x) - x = layers.Dense( - num_classes, - kernel_initializer=tf.initializers.random_normal(stddev=0.01), - kernel_regularizer=_gen_l2_regularizer(use_l2_regularizer), - bias_regularizer=_gen_l2_regularizer(use_l2_regularizer), - name='fc1000')( - x) - - # A softmax that is followed by the model loss must be done cannot be done - # in float16 due to numeric issues. So we pass dtype=float32. - x = layers.Activation('softmax', dtype='float32')(x) - - # Create model. - return tf.keras.Model(img_input, x, name='resnet50') diff --git a/official/vision/image_classification/resnet/resnet_runnable.py b/official/vision/image_classification/resnet/resnet_runnable.py deleted file mode 100644 index dfd8d250d03154e42a07ce44180839b831d977ce..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/resnet/resnet_runnable.py +++ /dev/null @@ -1,221 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Runs a ResNet model on the ImageNet dataset using custom training loops.""" - -import orbit -import tensorflow as tf -from official.modeling import grad_utils -from official.modeling import performance -from official.utils.flags import core as flags_core -from official.vision.image_classification.resnet import common -from official.vision.image_classification.resnet import imagenet_preprocessing -from official.vision.image_classification.resnet import resnet_model - - -class ResnetRunnable(orbit.StandardTrainer, orbit.StandardEvaluator): - """Implements the training and evaluation APIs for Resnet model.""" - - def __init__(self, flags_obj, time_callback, epoch_steps): - self.strategy = tf.distribute.get_strategy() - self.flags_obj = flags_obj - self.dtype = flags_core.get_tf_dtype(flags_obj) - self.time_callback = time_callback - - # Input pipeline related - batch_size = flags_obj.batch_size - if batch_size % self.strategy.num_replicas_in_sync != 0: - raise ValueError( - 'Batch size must be divisible by number of replicas : {}'.format( - self.strategy.num_replicas_in_sync)) - - # As auto rebatching is not supported in - # `distribute_datasets_from_function()` API, which is - # required when cloning dataset to multiple workers in eager mode, - # we use per-replica batch size. 
- self.batch_size = int(batch_size / self.strategy.num_replicas_in_sync) - - if self.flags_obj.use_synthetic_data: - self.input_fn = common.get_synth_input_fn( - height=imagenet_preprocessing.DEFAULT_IMAGE_SIZE, - width=imagenet_preprocessing.DEFAULT_IMAGE_SIZE, - num_channels=imagenet_preprocessing.NUM_CHANNELS, - num_classes=imagenet_preprocessing.NUM_CLASSES, - dtype=self.dtype, - drop_remainder=True) - else: - self.input_fn = imagenet_preprocessing.input_fn - - self.model = resnet_model.resnet50( - num_classes=imagenet_preprocessing.NUM_CLASSES, - use_l2_regularizer=not flags_obj.single_l2_loss_op) - - lr_schedule = common.PiecewiseConstantDecayWithWarmup( - batch_size=flags_obj.batch_size, - epoch_size=imagenet_preprocessing.NUM_IMAGES['train'], - warmup_epochs=common.LR_SCHEDULE[0][1], - boundaries=list(p[1] for p in common.LR_SCHEDULE[1:]), - multipliers=list(p[0] for p in common.LR_SCHEDULE), - compute_lr_on_cpu=True) - self.optimizer = common.get_optimizer(lr_schedule) - # Make sure iterations variable is created inside scope. 
- self.global_step = self.optimizer.iterations - -<<<<<<< HEAD - use_graph_rewrite = flags_obj.fp16_implementation == 'graph_rewrite' - if use_graph_rewrite and not flags_obj.use_tf_function: - raise ValueError('--fp16_implementation=graph_rewrite requires ' - '--use_tf_function to be true') - self.optimizer = performance.configure_optimizer( - self.optimizer, - use_float16=self.dtype == tf.float16, - use_graph_rewrite=use_graph_rewrite, -======= - self.optimizer = performance.configure_optimizer( - self.optimizer, - use_float16=self.dtype == tf.float16, ->>>>>>> v2.8.0 - loss_scale=flags_core.get_loss_scale(flags_obj, default_for_fp16=128)) - - self.train_loss = tf.keras.metrics.Mean('train_loss', dtype=tf.float32) - self.train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy( - 'train_accuracy', dtype=tf.float32) - self.test_loss = tf.keras.metrics.Mean('test_loss', dtype=tf.float32) - self.test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy( - 'test_accuracy', dtype=tf.float32) - - self.checkpoint = tf.train.Checkpoint( - model=self.model, optimizer=self.optimizer) - - # Handling epochs. 
- self.epoch_steps = epoch_steps - self.epoch_helper = orbit.utils.EpochHelper(epoch_steps, self.global_step) - train_dataset = orbit.utils.make_distributed_dataset( - self.strategy, - self.input_fn, - is_training=True, - data_dir=self.flags_obj.data_dir, - batch_size=self.batch_size, - parse_record_fn=imagenet_preprocessing.parse_record, - datasets_num_private_threads=self.flags_obj - .datasets_num_private_threads, - dtype=self.dtype, - drop_remainder=True) - orbit.StandardTrainer.__init__( - self, - train_dataset, - options=orbit.StandardTrainerOptions( - use_tf_while_loop=flags_obj.use_tf_while_loop, - use_tf_function=flags_obj.use_tf_function)) - if not flags_obj.skip_eval: - eval_dataset = orbit.utils.make_distributed_dataset( - self.strategy, - self.input_fn, - is_training=False, - data_dir=self.flags_obj.data_dir, - batch_size=self.batch_size, - parse_record_fn=imagenet_preprocessing.parse_record, - dtype=self.dtype) - orbit.StandardEvaluator.__init__( - self, - eval_dataset, - options=orbit.StandardEvaluatorOptions( - use_tf_function=flags_obj.use_tf_function)) - - def train_loop_begin(self): - """See base class.""" - # Reset all metrics - self.train_loss.reset_states() - self.train_accuracy.reset_states() - - self._epoch_begin() - self.time_callback.on_batch_begin(self.epoch_helper.batch_index) - - def train_step(self, iterator): - """See base class.""" - - def step_fn(inputs): - """Function to run on the device.""" - images, labels = inputs - with tf.GradientTape() as tape: - logits = self.model(images, training=True) - - prediction_loss = tf.keras.losses.sparse_categorical_crossentropy( - labels, logits) - loss = tf.reduce_sum(prediction_loss) * (1.0 / - self.flags_obj.batch_size) - num_replicas = self.strategy.num_replicas_in_sync - l2_weight_decay = 1e-4 - if self.flags_obj.single_l2_loss_op: - l2_loss = l2_weight_decay * 2 * tf.add_n([ - tf.nn.l2_loss(v) - for v in self.model.trainable_variables - if 'bn' not in v.name - ]) - - loss += (l2_loss / 
num_replicas) - else: - loss += (tf.reduce_sum(self.model.losses) / num_replicas) - - grad_utils.minimize_using_explicit_allreduce( - tape, self.optimizer, loss, self.model.trainable_variables) - self.train_loss.update_state(loss) - self.train_accuracy.update_state(labels, logits) - if self.flags_obj.enable_xla: - step_fn = tf.function(step_fn, jit_compile=True) - self.strategy.run(step_fn, args=(next(iterator),)) - - def train_loop_end(self): - """See base class.""" - metrics = { - 'train_loss': self.train_loss.result(), - 'train_accuracy': self.train_accuracy.result(), - } - self.time_callback.on_batch_end(self.epoch_helper.batch_index - 1) - self._epoch_end() - return metrics - - def eval_begin(self): - """See base class.""" - self.test_loss.reset_states() - self.test_accuracy.reset_states() - - def eval_step(self, iterator): - """See base class.""" - - def step_fn(inputs): - """Function to run on the device.""" - images, labels = inputs - logits = self.model(images, training=False) - loss = tf.keras.losses.sparse_categorical_crossentropy(labels, logits) - loss = tf.reduce_sum(loss) * (1.0 / self.flags_obj.batch_size) - self.test_loss.update_state(loss) - self.test_accuracy.update_state(labels, logits) - - self.strategy.run(step_fn, args=(next(iterator),)) - - def eval_end(self): - """See base class.""" - return { - 'test_loss': self.test_loss.result(), - 'test_accuracy': self.test_accuracy.result() - } - - def _epoch_begin(self): - if self.epoch_helper.epoch_begin(): - self.time_callback.on_epoch_begin(self.epoch_helper.current_epoch) - - def _epoch_end(self): - if self.epoch_helper.epoch_end(): - self.time_callback.on_epoch_end(self.epoch_helper.current_epoch) diff --git a/official/vision/image_classification/resnet/tfhub_export.py b/official/vision/image_classification/resnet/tfhub_export.py deleted file mode 100644 index 2b19f70bc7ae0c019d4d969cdedb28fdc5898b79..0000000000000000000000000000000000000000 --- 
a/official/vision/image_classification/resnet/tfhub_export.py +++ /dev/null @@ -1,66 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""A script to export TF-Hub SavedModel.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os - -# Import libraries -from absl import app -from absl import flags - -import tensorflow as tf - -from official.vision.image_classification.resnet import imagenet_preprocessing -from official.vision.image_classification.resnet import resnet_model - -FLAGS = flags.FLAGS - -flags.DEFINE_string("model_path", None, - "File path to TF model checkpoint or H5 file.") -flags.DEFINE_string("export_path", None, - "TF-Hub SavedModel destination path to export.") - - -def export_tfhub(model_path, hub_destination): - """Restores a tf.keras.Model and saves for TF-Hub.""" - model = resnet_model.resnet50( - num_classes=imagenet_preprocessing.NUM_CLASSES, rescale_inputs=True) - model.load_weights(model_path) - model.save( - os.path.join(hub_destination, "classification"), include_optimizer=False) - - # Extracts a sub-model to use pooling feature vector as model output. - image_input = model.get_layer(index=0).get_output_at(0) - feature_vector_output = model.get_layer(name="reduce_mean").get_output_at(0) - hub_model = tf.keras.Model(image_input, feature_vector_output) - - # Exports a SavedModel. 
- hub_model.save( - os.path.join(hub_destination, "feature-vector"), include_optimizer=False) - - -def main(argv): - if len(argv) > 1: - raise app.UsageError("Too many command-line arguments.") - - export_tfhub(FLAGS.model_path, FLAGS.export_path) - - -if __name__ == "__main__": - app.run(main) diff --git a/official/vision/image_classification/test_utils.py b/official/vision/image_classification/test_utils.py deleted file mode 100644 index 8d7180c9d4e10c3241c4d6dd31d2cd013439df7a..0000000000000000000000000000000000000000 --- a/official/vision/image_classification/test_utils.py +++ /dev/null @@ -1,37 +0,0 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -"""Test utilities for image classification tasks.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import tensorflow as tf - - -def trivial_model(num_classes): - """Trivial model for ImageNet dataset.""" - - input_shape = (224, 224, 3) - img_input = tf.keras.layers.Input(shape=input_shape) - - x = tf.keras.layers.Lambda( - lambda x: tf.keras.backend.reshape(x, [-1, 224 * 224 * 3]), - name='reshape')(img_input) - x = tf.keras.layers.Dense(1, name='fc1')(x) - x = tf.keras.layers.Dense(num_classes, name='fc1000')(x) - x = tf.keras.layers.Activation('softmax', dtype='float32')(x) - - return tf.keras.models.Model(img_input, x, name='trivial') diff --git a/official/vision/losses/__init__.py b/official/vision/losses/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/vision/losses/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ diff --git a/official/vision/beta/losses/focal_loss.py b/official/vision/losses/focal_loss.py similarity index 98% rename from official/vision/beta/losses/focal_loss.py rename to official/vision/losses/focal_loss.py index 7241d9ae261e6644bb973e85587a3f6de535f603..4a4ce70b35829d8c1d93e6cabefe3072d21f3ab3 100644 --- a/official/vision/beta/losses/focal_loss.py +++ b/official/vision/losses/focal_loss.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/losses/loss_utils.py b/official/vision/losses/loss_utils.py new file mode 100644 index 0000000000000000000000000000000000000000..1c59d0c89d5a105afe054abe64a87aa44f7e5c47 --- /dev/null +++ b/official/vision/losses/loss_utils.py @@ -0,0 +1,42 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Loss utilities for detection models.""" + +import tensorflow as tf + + +def multi_level_flatten(multi_level_inputs, last_dim=None): + """Flattens a multi-level input. + + Args: + multi_level_inputs: Ordered Dict with level to [batch, d1, ..., dm]. + last_dim: An optional int. If `None`, the output is [batch_size, None]; + otherwise it is [batch_size, None, last_dim]. Defaults to `None`.
+ + Returns: + Concatenated output [batch_size, None], or [batch_size, None, dm] + """ + flattened_inputs = [] + batch_size = None + for level in multi_level_inputs.keys(): + single_input = multi_level_inputs[level] + if batch_size is None: + batch_size = single_input.shape[0] or tf.shape(single_input)[0] + if last_dim is not None: + flattened_input = tf.reshape(single_input, [batch_size, -1, last_dim]) + else: + flattened_input = tf.reshape(single_input, [batch_size, -1]) + flattened_inputs.append(flattened_input) + return tf.concat(flattened_inputs, axis=1) diff --git a/official/vision/beta/losses/maskrcnn_losses.py b/official/vision/losses/maskrcnn_losses.py similarity index 99% rename from official/vision/beta/losses/maskrcnn_losses.py rename to official/vision/losses/maskrcnn_losses.py index 48fd01819261b7fc8d34111046e6b1fae606edd0..99e0ac95bc25140ccce471f01660eb427d5c2e5a 100644 --- a/official/vision/beta/losses/maskrcnn_losses.py +++ b/official/vision/losses/maskrcnn_losses.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/beta/losses/retinanet_losses.py b/official/vision/losses/retinanet_losses.py similarity index 99% rename from official/vision/beta/losses/retinanet_losses.py rename to official/vision/losses/retinanet_losses.py index 8baf2525e215bf95c8b202f8bafff0e0d7c67bc2..91aaecf082d4d5d390a2445c5e3290767401c578 100644 --- a/official/vision/beta/losses/retinanet_losses.py +++ b/official/vision/losses/retinanet_losses.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/vision/losses/segmentation_losses.py b/official/vision/losses/segmentation_losses.py new file mode 100644 index 0000000000000000000000000000000000000000..c6e95efaa9d8af38f84322a4f1ff870d2da4a3f0 --- /dev/null +++ b/official/vision/losses/segmentation_losses.py @@ -0,0 +1,245 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Losses used for segmentation models.""" + +import tensorflow as tf + +from official.modeling import tf_utils +from official.vision.dataloaders import utils + +EPSILON = 1e-5 + + +class SegmentationLoss: + """Semantic segmentation loss.""" + + def __init__(self, + label_smoothing, + class_weights, + ignore_label, + use_groundtruth_dimension, + top_k_percent_pixels=1.0, + gt_is_matting_map=False + ): + """Initializes `SegmentationLoss`. + + Args: + label_smoothing: A float, if > 0., smooth out one-hot probability by + spreading the amount of probability to all other label classes. + class_weights: A float list containing the weight of each class. + ignore_label: An integer specifying the ignore label. + + use_groundtruth_dimension: A boolean, whether to resize the output to + match the dimension of the ground truth. + top_k_percent_pixels: A float, the value lies in [0.0, 1.0]. When its + value < 1., only compute the loss for the top k percent pixels. This is + useful for hard pixel mining. + gt_is_matting_map: Whether the groundtruth mask is a matting map.
Note + that the matting map is only supported for 2 class segmentation. + """ + self._label_smoothing = label_smoothing + self._class_weights = class_weights + self._ignore_label = ignore_label + self._use_groundtruth_dimension = use_groundtruth_dimension + self._top_k_percent_pixels = top_k_percent_pixels + self._gt_is_matting_map = gt_is_matting_map + + def __call__(self, logits, labels, **kwargs): + """Computes `SegmentationLoss`. + + Args: + logits: A float tensor in shape (batch_size, height, width, num_classes) + which is the output of the network. + labels: A tensor in shape (batch_size, height, width, 1), which is the + label mask of the ground truth. + **kwargs: additional keyword arguments. + + Returns: + A 0-D float which stores the overall loss of the batch. + """ + _, height, width, _ = logits.get_shape().as_list() + + if self._use_groundtruth_dimension: + # TODO(arashwan): Test using align corners to match deeplab alignment. + logits = tf.image.resize( + logits, tf.shape(labels)[1:3], method=tf.image.ResizeMethod.BILINEAR) + else: + labels = tf.image.resize( + labels, (height, width), + method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) + + # Do not need to cast into int32 if it is a matting map + if not self._gt_is_matting_map: + labels = tf.cast(labels, tf.int32) + + valid_mask = tf.not_equal(labels, self._ignore_label) + + cross_entropy_loss = self.compute_pixelwise_loss(labels, logits, valid_mask, + **kwargs) + + if self._top_k_percent_pixels < 1.0: + return self.aggregate_loss_top_k(cross_entropy_loss) + else: + return self.aggregate_loss(cross_entropy_loss, valid_mask) + + def compute_pixelwise_loss(self, labels, logits, valid_mask, **kwargs): + """Computes the loss for each pixel. + + Args: + labels: An int32 tensor in shape (batch_size, height, width, 1), which is + the label mask of the ground truth. + logits: A float tensor in shape (batch_size, height, width, num_classes) + which is the output of the network. 
+ valid_mask: A bool tensor in shape (batch_size, height, width, 1) which + masks out ignored pixels. + **kwargs: additional keyword arguments. + + Returns: + A float tensor in shape (batch_size, height, width) which stores the loss + value for each pixel. + """ + num_classes = logits.get_shape().as_list()[-1] + + # Assign pixel with ignore label to class 0 (background). The loss on the + # pixel will later be masked out. + labels = tf.where(valid_mask, labels, tf.zeros_like(labels)) + + cross_entropy_loss = tf.nn.softmax_cross_entropy_with_logits( + labels=self.get_labels_with_prob(labels, logits, **kwargs), + logits=logits) + + if not self._class_weights: + class_weights = [1] * num_classes + else: + class_weights = self._class_weights + + if num_classes != len(class_weights): + raise ValueError( + 'Length of class_weights should be {}'.format(num_classes)) + + valid_mask = tf.squeeze(tf.cast(valid_mask, tf.float32), axis=-1) + + # If groundtruth is matting map, binarize the value to create the weight + # mask + if self._gt_is_matting_map: + labels = tf.cast(utils.binarize_matting_map(labels), tf.int32) + + weight_mask = tf.einsum( + '...y,y->...', + tf.one_hot(tf.squeeze(labels, axis=-1), num_classes, dtype=tf.float32), + tf.constant(class_weights, tf.float32)) + return cross_entropy_loss * valid_mask * weight_mask + + def get_labels_with_prob(self, labels, logits, **unused_kwargs): + """Gets a tensor representing the probability of each class for each pixel. + + This method can be overridden in subclasses for customizing loss function. + + Args: + labels: If groundtruth mask is not matting map, an int32 tensor which is + the label map of the groundtruth. If groundtruth mask is matting map, + a float32 tensor. The shape is always (batch_size, height, width, 1). + logits: A float tensor in shape (batch_size, height, width, num_classes) + which is the output of the network. + **unused_kwargs: Unused keyword arguments.
+ + Returns: + A float tensor in shape (batch_size, height, width, num_classes). + """ + num_classes = logits.get_shape().as_list()[-1] + + if self._gt_is_matting_map: + train_labels = tf.concat([1 - labels, labels], axis=-1) + else: + labels = tf.squeeze(labels, axis=-1) + train_labels = tf.one_hot(labels, num_classes) + return train_labels * ( + 1 - self._label_smoothing) + self._label_smoothing / num_classes + + def aggregate_loss(self, pixelwise_loss, valid_mask): + """Aggregate the pixelwise loss. + + Args: + pixelwise_loss: A float tensor in shape (batch_size, height, width) which + stores the loss of each pixel. + valid_mask: A bool tensor in shape (batch_size, height, width, 1) which + masks out ignored pixels. + + Returns: + A 0-D float which stores the overall loss of the batch. + """ + normalizer = tf.reduce_sum(tf.cast(valid_mask, tf.float32)) + EPSILON + return tf.reduce_sum(pixelwise_loss) / normalizer + + def aggregate_loss_top_k(self, pixelwise_loss): + """Aggregate the top-k greatest pixelwise loss. + + Args: + pixelwise_loss: A float tensor in shape (batch_size, height, width) which + stores the loss of each pixel. + + Returns: + A 0-D float which stores the overall loss of the batch. 
+ """ + pixelwise_loss = tf.reshape(pixelwise_loss, shape=[-1]) + top_k_pixels = tf.cast( + self._top_k_percent_pixels * + tf.cast(tf.size(pixelwise_loss), tf.float32), tf.int32) + top_k_losses, _ = tf.math.top_k(pixelwise_loss, k=top_k_pixels, sorted=True) + normalizer = tf.reduce_sum( + tf.cast(tf.not_equal(top_k_losses, 0.0), tf.float32)) + EPSILON + return tf.reduce_sum(top_k_losses) / normalizer + + +def get_actual_mask_scores(logits, labels, ignore_label): + """Gets actual mask scores.""" + _, height, width, num_classes = logits.get_shape().as_list() + batch_size = tf.shape(logits)[0] + logits = tf.stop_gradient(logits) + labels = tf.image.resize( + labels, (height, width), method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) + predicted_labels = tf.argmax(logits, -1, output_type=tf.int32) + flat_predictions = tf.reshape(predicted_labels, [batch_size, -1]) + flat_labels = tf.cast(tf.reshape(labels, [batch_size, -1]), tf.int32) + + one_hot_predictions = tf.one_hot( + flat_predictions, num_classes, on_value=True, off_value=False) + one_hot_labels = tf.one_hot( + flat_labels, num_classes, on_value=True, off_value=False) + keep_mask = tf.not_equal(flat_labels, ignore_label) + keep_mask = tf.expand_dims(keep_mask, 2) + + overlap = tf.logical_and(one_hot_predictions, one_hot_labels) + overlap = tf.logical_and(overlap, keep_mask) + overlap = tf.reduce_sum(tf.cast(overlap, tf.float32), axis=1) + union = tf.logical_or(one_hot_predictions, one_hot_labels) + union = tf.logical_and(union, keep_mask) + union = tf.reduce_sum(tf.cast(union, tf.float32), axis=1) + actual_scores = tf.divide(overlap, tf.maximum(union, EPSILON)) + return actual_scores + + +class MaskScoringLoss: + """Mask Scoring loss.""" + + def __init__(self, ignore_label): + self._ignore_label = ignore_label + self._mse_loss = tf.keras.losses.MeanSquaredError( + reduction=tf.keras.losses.Reduction.NONE) + + def __call__(self, predicted_scores, logits, labels): + actual_scores = get_actual_mask_scores(logits, 
labels, self._ignore_label) + loss = tf_utils.safe_mean(self._mse_loss(actual_scores, predicted_scores)) + return loss diff --git a/official/vision/modeling/__init__.py b/official/vision/modeling/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..c3b855601effede11c20dd73253528f98c4e1855 --- /dev/null +++ b/official/vision/modeling/__init__.py @@ -0,0 +1,21 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Modeling package definition.""" + +from official.vision.modeling import backbones +from official.vision.modeling import decoders +from official.vision.modeling import heads +from official.vision.modeling import layers +from official.vision.modeling import models diff --git a/official/vision/modeling/backbones/__init__.py b/official/vision/modeling/backbones/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..1edc1d45928ec6e79ad587dfcb23d4c6b828dc2d --- /dev/null +++ b/official/vision/modeling/backbones/__init__.py @@ -0,0 +1,26 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Backbones package definition.""" + +from official.vision.modeling.backbones.efficientnet import EfficientNet +from official.vision.modeling.backbones.mobiledet import MobileDet +from official.vision.modeling.backbones.mobilenet import MobileNet +from official.vision.modeling.backbones.resnet import ResNet +from official.vision.modeling.backbones.resnet_3d import ResNet3D +from official.vision.modeling.backbones.resnet_deeplab import DilatedResNet +from official.vision.modeling.backbones.revnet import RevNet +from official.vision.modeling.backbones.spinenet import SpineNet +from official.vision.modeling.backbones.spinenet_mobile import SpineNetMobile +from official.vision.modeling.backbones.vit import VisionTransformer diff --git a/official/vision/beta/modeling/backbones/efficientnet.py b/official/vision/modeling/backbones/efficientnet.py similarity index 98% rename from official/vision/beta/modeling/backbones/efficientnet.py rename to official/vision/modeling/backbones/efficientnet.py index 068b749ce35d37c085e5c9d91de2a0acc17fde2e..ee5680adc0b93081d541e371c27509a50baff487 100644 --- a/official/vision/beta/modeling/backbones/efficientnet.py +++ b/official/vision/modeling/backbones/efficientnet.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -23,9 +23,9 @@ import tensorflow as tf from official.modeling import hyperparams from official.modeling import tf_utils -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.modeling.layers import nn_blocks -from official.vision.beta.modeling.layers import nn_layers +from official.vision.modeling.backbones import factory +from official.vision.modeling.layers import nn_blocks +from official.vision.modeling.layers import nn_layers layers = tf.keras.layers diff --git a/official/vision/beta/modeling/backbones/efficientnet_test.py b/official/vision/modeling/backbones/efficientnet_test.py similarity index 96% rename from official/vision/beta/modeling/backbones/efficientnet_test.py rename to official/vision/modeling/backbones/efficientnet_test.py index 00e35001e743fcacf64bf30dc85cf75a969934b7..95cea19d73847dcd72c9e868b8cb8ae7e62a4fc6 100644 --- a/official/vision/beta/modeling/backbones/efficientnet_test.py +++ b/official/vision/modeling/backbones/efficientnet_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,14 +12,13 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Tests for EfficientNet.""" # Import libraries from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.modeling.backbones import efficientnet +from official.vision.modeling.backbones import efficientnet class EfficientNetTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/modeling/backbones/factory.py b/official/vision/modeling/backbones/factory.py new file mode 100644 index 0000000000000000000000000000000000000000..8421250aee7d9651e83b9fb4381ead9d0b19f810 --- /dev/null +++ b/official/vision/modeling/backbones/factory.py @@ -0,0 +1,112 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Backbone registers and factory method. + +A new backbone model can be registered in the following two steps: + +1. Import the factory and register the builder in the backbone file. +2. Import the backbone class in backbones/__init__.py. + +``` +# my_backbone.py + +from modeling.backbones import factory + +class MyBackbone(): + ... + +@factory.register_backbone_builder('my_backbone') +def build_my_backbone(): + return MyBackbone() + +# backbones/__init__.py adds import +from modeling.backbones.my_backbone import MyBackbone +``` + +If the MyBackbone class should only be used by specific binaries, do not +import the backbone module in backbones/__init__.py; instead, import it +where it is used.
+ + +""" +from typing import Sequence, Union + +# Import libraries + +import tensorflow as tf + +from official.core import registry +from official.modeling import hyperparams + + +_REGISTERED_BACKBONE_CLS = {} + + +def register_backbone_builder(key: str): + """Decorates a builder of backbone class. + + The builder should be a Callable (a class or a function). + This decorator supports registration of backbone builder as follows: + + ``` + class MyBackbone(tf.keras.Model): + pass + + @register_backbone_builder('mybackbone') + def builder(input_specs, config, l2_reg): + return MyBackbone(...) + + # Builds a MyBackbone object. + my_backbone = build_backbone(input_specs, config, l2_reg) + ``` + + Args: + key: A `str` of key to look up the builder. + + Returns: + A callable to be used as a class decorator that registers the decorated class + for creation from an instance of task_config_cls. + """ + return registry.register(_REGISTERED_BACKBONE_CLS, key) + + +def build_backbone(input_specs: Union[tf.keras.layers.InputSpec, + Sequence[tf.keras.layers.InputSpec]], + backbone_config: hyperparams.Config, + norm_activation_config: hyperparams.Config, + l2_regularizer: tf.keras.regularizers.Regularizer = None, + **kwargs) -> tf.keras.Model: # pytype: disable=annotation-type-mismatch # typed-keras + """Builds backbone from a config. + + Args: + input_specs: A (sequence of) `tf.keras.layers.InputSpec` of input. + backbone_config: A `OneOfConfig` of backbone config. + norm_activation_config: A config for normalization/activation layer. + l2_regularizer: A `tf.keras.regularizers.Regularizer` object. Defaults to + None. + **kwargs: Additional keyword args to be passed to backbone builder. + + Returns: + A `tf.keras.Model` instance of the backbone.
+ """ + backbone_builder = registry.lookup(_REGISTERED_BACKBONE_CLS, + backbone_config.type) + + return backbone_builder( + input_specs=input_specs, + backbone_config=backbone_config, + norm_activation_config=norm_activation_config, + l2_regularizer=l2_regularizer, + **kwargs) diff --git a/official/vision/modeling/backbones/factory_test.py b/official/vision/modeling/backbones/factory_test.py new file mode 100644 index 0000000000000000000000000000000000000000..552b79cca659d490a65a8c661fec8bb58e4b6683 --- /dev/null +++ b/official/vision/modeling/backbones/factory_test.py @@ -0,0 +1,227 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for factory functions.""" +# Import libraries +from absl.testing import parameterized +import tensorflow as tf + +from tensorflow.python.distribute import combinations +from official.vision.configs import backbones as backbones_cfg +from official.vision.configs import backbones_3d as backbones_3d_cfg +from official.vision.configs import common as common_cfg +from official.vision.modeling import backbones +from official.vision.modeling.backbones import factory + + +class FactoryTest(tf.test.TestCase, parameterized.TestCase): + + @combinations.generate( + combinations.combine(model_id=[18, 34, 50, 101, 152],)) + def test_resnet_creation(self, model_id): + """Test creation of ResNet models.""" + + network = backbones.ResNet( + model_id=model_id, se_ratio=0.0, norm_momentum=0.99, norm_epsilon=1e-5) + + backbone_config = backbones_cfg.Backbone( + type='resnet', + resnet=backbones_cfg.ResNet(model_id=model_id, se_ratio=0.0)) + norm_activation_config = common_cfg.NormActivation( + norm_momentum=0.99, norm_epsilon=1e-5, use_sync_bn=False) + + factory_network = factory.build_backbone( + input_specs=tf.keras.layers.InputSpec(shape=[None, None, None, 3]), + backbone_config=backbone_config, + norm_activation_config=norm_activation_config) + + network_config = network.get_config() + factory_network_config = factory_network.get_config() + + self.assertEqual(network_config, factory_network_config) + + @combinations.generate( + combinations.combine( + model_id=['b0', 'b1', 'b2', 'b3', 'b4', 'b5', 'b6', 'b7'], + se_ratio=[0.0, 0.25], + )) + def test_efficientnet_creation(self, model_id, se_ratio): + """Test creation of EfficientNet models.""" + + network = backbones.EfficientNet( + model_id=model_id, + se_ratio=se_ratio, + norm_momentum=0.99, + norm_epsilon=1e-5) + + backbone_config = backbones_cfg.Backbone( + type='efficientnet', + efficientnet=backbones_cfg.EfficientNet( + model_id=model_id, se_ratio=se_ratio)) + norm_activation_config = common_cfg.NormActivation( + 
norm_momentum=0.99, norm_epsilon=1e-5, use_sync_bn=False) + + factory_network = factory.build_backbone( + input_specs=tf.keras.layers.InputSpec(shape=[None, None, None, 3]), + backbone_config=backbone_config, + norm_activation_config=norm_activation_config) + + network_config = network.get_config() + factory_network_config = factory_network.get_config() + + self.assertEqual(network_config, factory_network_config) + + @combinations.generate( + combinations.combine( + model_id=['MobileNetV1', 'MobileNetV2', + 'MobileNetV3Large', 'MobileNetV3Small', + 'MobileNetV3EdgeTPU'], + filter_size_scale=[1.0, 0.75], + )) + def test_mobilenet_creation(self, model_id, filter_size_scale): + """Test creation of Mobilenet models.""" + + network = backbones.MobileNet( + model_id=model_id, + filter_size_scale=filter_size_scale, + norm_momentum=0.99, + norm_epsilon=1e-5) + + backbone_config = backbones_cfg.Backbone( + type='mobilenet', + mobilenet=backbones_cfg.MobileNet( + model_id=model_id, filter_size_scale=filter_size_scale)) + norm_activation_config = common_cfg.NormActivation( + norm_momentum=0.99, norm_epsilon=1e-5, use_sync_bn=False) + + factory_network = factory.build_backbone( + input_specs=tf.keras.layers.InputSpec(shape=[None, None, None, 3]), + backbone_config=backbone_config, + norm_activation_config=norm_activation_config) + + network_config = network.get_config() + factory_network_config = factory_network.get_config() + + self.assertEqual(network_config, factory_network_config) + + @combinations.generate(combinations.combine(model_id=['49'],)) + def test_spinenet_creation(self, model_id): + """Test creation of SpineNet models.""" + input_size = 128 + min_level = 3 + max_level = 7 + + input_specs = tf.keras.layers.InputSpec( + shape=[None, input_size, input_size, 3]) + network = backbones.SpineNet( + input_specs=input_specs, + min_level=min_level, + max_level=max_level, + norm_momentum=0.99, + norm_epsilon=1e-5) + + backbone_config = backbones_cfg.Backbone( + 
type='spinenet', + spinenet=backbones_cfg.SpineNet(model_id=model_id)) + norm_activation_config = common_cfg.NormActivation( + norm_momentum=0.99, norm_epsilon=1e-5, use_sync_bn=False) + + factory_network = factory.build_backbone( + input_specs=tf.keras.layers.InputSpec( + shape=[None, input_size, input_size, 3]), + backbone_config=backbone_config, + norm_activation_config=norm_activation_config) + + network_config = network.get_config() + factory_network_config = factory_network.get_config() + + self.assertEqual(network_config, factory_network_config) + + @combinations.generate( + combinations.combine(model_id=[38, 56, 104],)) + def test_revnet_creation(self, model_id): + """Test creation of RevNet models.""" + network = backbones.RevNet( + model_id=model_id, norm_momentum=0.99, norm_epsilon=1e-5) + + backbone_config = backbones_cfg.Backbone( + type='revnet', + revnet=backbones_cfg.RevNet(model_id=model_id)) + norm_activation_config = common_cfg.NormActivation( + norm_momentum=0.99, norm_epsilon=1e-5, use_sync_bn=False) + + factory_network = factory.build_backbone( + input_specs=tf.keras.layers.InputSpec(shape=[None, None, None, 3]), + backbone_config=backbone_config, + norm_activation_config=norm_activation_config) + + network_config = network.get_config() + factory_network_config = factory_network.get_config() + + self.assertEqual(network_config, factory_network_config) + + @combinations.generate(combinations.combine(model_type=['resnet_3d'],)) + def test_resnet_3d_creation(self, model_type): + """Test creation of ResNet 3D models.""" + backbone_cfg = backbones_3d_cfg.Backbone3D(type=model_type).get() + temporal_strides = [] + temporal_kernel_sizes = [] + for block_spec in backbone_cfg.block_specs: + temporal_strides.append(block_spec.temporal_strides) + temporal_kernel_sizes.append(block_spec.temporal_kernel_sizes) + + _ = backbones.ResNet3D( + model_id=backbone_cfg.model_id, + temporal_strides=temporal_strides, + temporal_kernel_sizes=temporal_kernel_sizes, + 
norm_momentum=0.99, + norm_epsilon=1e-5) + + @combinations.generate( + combinations.combine( + model_id=[ + 'MobileDetCPU', + 'MobileDetDSP', + 'MobileDetEdgeTPU', + 'MobileDetGPU'], + filter_size_scale=[1.0, 0.75], + )) + def test_mobiledet_creation(self, model_id, filter_size_scale): + """Test creation of Mobiledet models.""" + + network = backbones.MobileDet( + model_id=model_id, + filter_size_scale=filter_size_scale, + norm_momentum=0.99, + norm_epsilon=1e-5) + + backbone_config = backbones_cfg.Backbone( + type='mobiledet', + mobiledet=backbones_cfg.MobileDet( + model_id=model_id, filter_size_scale=filter_size_scale)) + norm_activation_config = common_cfg.NormActivation( + norm_momentum=0.99, norm_epsilon=1e-5, use_sync_bn=False) + + factory_network = factory.build_backbone( + input_specs=tf.keras.layers.InputSpec(shape=[None, None, None, 3]), + backbone_config=backbone_config, + norm_activation_config=norm_activation_config) + + network_config = network.get_config() + factory_network_config = factory_network.get_config() + + self.assertEqual(network_config, factory_network_config) + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/beta/modeling/backbones/mobiledet.py b/official/vision/modeling/backbones/mobiledet.py similarity index 98% rename from official/vision/beta/modeling/backbones/mobiledet.py rename to official/vision/modeling/backbones/mobiledet.py index 0e4db75702ec45571b8bead8079a9587ea3a6995..58037b22b6e7a21be868a5c85156e98219f4aa3e 100644 --- a/official/vision/beta/modeling/backbones/mobiledet.py +++ b/official/vision/modeling/backbones/mobiledet.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -20,10 +20,10 @@ from typing import Any, Dict, Optional, Tuple, List import tensorflow as tf from official.modeling import hyperparams -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.modeling.backbones import mobilenet -from official.vision.beta.modeling.layers import nn_blocks -from official.vision.beta.modeling.layers import nn_layers +from official.vision.modeling.backbones import factory +from official.vision.modeling.backbones import mobilenet +from official.vision.modeling.layers import nn_blocks +from official.vision.modeling.layers import nn_layers layers = tf.keras.layers diff --git a/official/vision/beta/modeling/backbones/mobiledet_test.py b/official/vision/modeling/backbones/mobiledet_test.py similarity index 96% rename from official/vision/beta/modeling/backbones/mobiledet_test.py rename to official/vision/modeling/backbones/mobiledet_test.py index 0d0126c2beb643a66c201332fab8214ec6d1d15f..24d0b850ed4030cd93068e01dceeeb705065d31e 100644 --- a/official/vision/beta/modeling/backbones/mobiledet_test.py +++ b/official/vision/modeling/backbones/mobiledet_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -19,7 +19,7 @@ import itertools from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.modeling.backbones import mobiledet +from official.vision.modeling.backbones import mobiledet class MobileDetTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/beta/modeling/backbones/mobilenet.py b/official/vision/modeling/backbones/mobilenet.py similarity index 99% rename from official/vision/beta/modeling/backbones/mobilenet.py rename to official/vision/modeling/backbones/mobilenet.py index 42d8466419864a485c2c36f632db8af241172673..a0854dce64f52ffb099981cfb8b11b4e5c058f98 100644 --- a/official/vision/beta/modeling/backbones/mobilenet.py +++ b/official/vision/modeling/backbones/mobilenet.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -21,9 +21,9 @@ from typing import Optional, Dict, Any, Tuple import tensorflow as tf from official.modeling import hyperparams from official.modeling import tf_utils -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.modeling.layers import nn_blocks -from official.vision.beta.modeling.layers import nn_layers +from official.vision.modeling.backbones import factory +from official.vision.modeling.layers import nn_blocks +from official.vision.modeling.layers import nn_layers layers = tf.keras.layers @@ -861,8 +861,7 @@ class MobileNet(tf.keras.Model): net = block(net) elif block_def.block_fn == 'gpooling': - net = layers.GlobalAveragePooling2D()(net) - net = layers.Reshape((1, 1, net.shape[1]))(net) + net = layers.GlobalAveragePooling2D(keepdims=True)(net) else: raise ValueError('Unknown block type {} for layer {}'.format( diff --git a/official/vision/beta/modeling/backbones/mobilenet_test.py b/official/vision/modeling/backbones/mobilenet_test.py similarity index 98% rename from official/vision/beta/modeling/backbones/mobilenet_test.py rename to official/vision/modeling/backbones/mobilenet_test.py index 7266383ffd0df4ff97062587713d3b73a57edcd9..71281d30bd7270d983f929219046aabae6169573 100644 --- a/official/vision/beta/modeling/backbones/mobilenet_test.py +++ b/official/vision/modeling/backbones/mobilenet_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Tests for MobileNet.""" import itertools @@ -23,7 +22,7 @@ import math from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.modeling.backbones import mobilenet +from official.vision.modeling.backbones import mobilenet class MobileNetTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/modeling/backbones/resnet.py b/official/vision/modeling/backbones/resnet.py new file mode 100644 index 0000000000000000000000000000000000000000..0f8653a5eb02b98965478e23d2b74359a01f251f --- /dev/null +++ b/official/vision/modeling/backbones/resnet.py @@ -0,0 +1,432 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains definitions of ResNet and ResNet-RS models.""" + +from typing import Callable, Optional + +# Import libraries +import tensorflow as tf + +from official.modeling import hyperparams +from official.modeling import tf_utils +from official.vision.modeling.backbones import factory +from official.vision.modeling.layers import nn_blocks +from official.vision.modeling.layers import nn_layers + +layers = tf.keras.layers + +# Specifications for different ResNet variants. +# Each entry specifies block configurations of the particular ResNet variant. 
+# Each element in the block configuration is in the following format: +# (block_fn, num_filters, block_repeats) +RESNET_SPECS = { + 10: [ + ('residual', 64, 1), + ('residual', 128, 1), + ('residual', 256, 1), + ('residual', 512, 1), + ], + 18: [ + ('residual', 64, 2), + ('residual', 128, 2), + ('residual', 256, 2), + ('residual', 512, 2), + ], + 34: [ + ('residual', 64, 3), + ('residual', 128, 4), + ('residual', 256, 6), + ('residual', 512, 3), + ], + 50: [ + ('bottleneck', 64, 3), + ('bottleneck', 128, 4), + ('bottleneck', 256, 6), + ('bottleneck', 512, 3), + ], + 101: [ + ('bottleneck', 64, 3), + ('bottleneck', 128, 4), + ('bottleneck', 256, 23), + ('bottleneck', 512, 3), + ], + 152: [ + ('bottleneck', 64, 3), + ('bottleneck', 128, 8), + ('bottleneck', 256, 36), + ('bottleneck', 512, 3), + ], + 200: [ + ('bottleneck', 64, 3), + ('bottleneck', 128, 24), + ('bottleneck', 256, 36), + ('bottleneck', 512, 3), + ], + 270: [ + ('bottleneck', 64, 4), + ('bottleneck', 128, 29), + ('bottleneck', 256, 53), + ('bottleneck', 512, 4), + ], + 350: [ + ('bottleneck', 64, 4), + ('bottleneck', 128, 36), + ('bottleneck', 256, 72), + ('bottleneck', 512, 4), + ], + 420: [ + ('bottleneck', 64, 4), + ('bottleneck', 128, 44), + ('bottleneck', 256, 87), + ('bottleneck', 512, 4), + ], +} + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class ResNet(tf.keras.Model): + """Creates ResNet and ResNet-RS family models. + + This implements the Deep Residual Network from: + Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. + Deep Residual Learning for Image Recognition. + (https://arxiv.org/pdf/1512.03385) and + Irwan Bello, William Fedus, Xianzhi Du, Ekin D. Cubuk, Aravind Srinivas, + Tsung-Yi Lin, Jonathon Shlens, Barret Zoph. + Revisiting ResNets: Improved Training and Scaling Strategies. + (https://arxiv.org/abs/2103.07579). 
+ """ + + def __init__( + self, + model_id: int, + input_specs: tf.keras.layers.InputSpec = layers.InputSpec( + shape=[None, None, None, 3]), + depth_multiplier: float = 1.0, + stem_type: str = 'v0', + resnetd_shortcut: bool = False, + replace_stem_max_pool: bool = False, + se_ratio: Optional[float] = None, + init_stochastic_depth_rate: float = 0.0, + scale_stem: bool = True, + activation: str = 'relu', + use_sync_bn: bool = False, + norm_momentum: float = 0.99, + norm_epsilon: float = 0.001, + kernel_initializer: str = 'VarianceScaling', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + bn_trainable: bool = True, + **kwargs): + """Initializes a ResNet model. + + Args: + model_id: An `int` of the depth of ResNet backbone model. + input_specs: A `tf.keras.layers.InputSpec` of the input tensor. + depth_multiplier: A `float` of the depth multiplier to uniformly scale up + all layers in channel size. This argument is also referred to as + `width_multiplier` in (https://arxiv.org/abs/2103.07579). + stem_type: A `str` of stem type of ResNet. Default to `v0`. If set to + `v1`, use ResNet-D type stem (https://arxiv.org/abs/1812.01187). + resnetd_shortcut: A `bool` of whether to use ResNet-D shortcut in + downsampling blocks. + replace_stem_max_pool: A `bool` of whether to replace the max pool in stem + with a stride-2 conv. + se_ratio: A `float` or None. Ratio of the Squeeze-and-Excitation layer. + init_stochastic_depth_rate: A `float` of initial stochastic depth rate. + scale_stem: A `bool` of whether to scale stem layers. + activation: A `str` name of the activation function. + use_sync_bn: If True, use synchronized batch normalization. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A small `float` added to variance to avoid dividing by zero. + kernel_initializer: A str for kernel initializer of convolutional layers.
+ kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default to None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + Default to None. + bn_trainable: A `bool` that indicates whether batch norm layers should be + trainable. Default to True. + **kwargs: Additional keyword arguments to be passed. + """ + self._model_id = model_id + self._input_specs = input_specs + self._depth_multiplier = depth_multiplier + self._stem_type = stem_type + self._resnetd_shortcut = resnetd_shortcut + self._replace_stem_max_pool = replace_stem_max_pool + self._se_ratio = se_ratio + self._init_stochastic_depth_rate = init_stochastic_depth_rate + self._scale_stem = scale_stem + self._use_sync_bn = use_sync_bn + self._activation = activation + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + if use_sync_bn: + self._norm = layers.experimental.SyncBatchNormalization + else: + self._norm = layers.BatchNormalization + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + self._bn_trainable = bn_trainable + + if tf.keras.backend.image_data_format() == 'channels_last': + bn_axis = -1 + else: + bn_axis = 1 + + # Build ResNet. 
+ inputs = tf.keras.Input(shape=input_specs.shape[1:]) + + stem_depth_multiplier = self._depth_multiplier if scale_stem else 1.0 + if stem_type == 'v0': + x = layers.Conv2D( + filters=int(64 * stem_depth_multiplier), + kernel_size=7, + strides=2, + use_bias=False, + padding='same', + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer)( + inputs) + x = self._norm( + axis=bn_axis, + momentum=norm_momentum, + epsilon=norm_epsilon, + trainable=bn_trainable)( + x) + x = tf_utils.get_activation(activation, use_keras_layer=True)(x) + elif stem_type == 'v1': + x = layers.Conv2D( + filters=int(32 * stem_depth_multiplier), + kernel_size=3, + strides=2, + use_bias=False, + padding='same', + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer)( + inputs) + x = self._norm( + axis=bn_axis, + momentum=norm_momentum, + epsilon=norm_epsilon, + trainable=bn_trainable)( + x) + x = tf_utils.get_activation(activation, use_keras_layer=True)(x) + x = layers.Conv2D( + filters=int(32 * stem_depth_multiplier), + kernel_size=3, + strides=1, + use_bias=False, + padding='same', + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer)( + x) + x = self._norm( + axis=bn_axis, + momentum=norm_momentum, + epsilon=norm_epsilon, + trainable=bn_trainable)( + x) + x = tf_utils.get_activation(activation, use_keras_layer=True)(x) + x = layers.Conv2D( + filters=int(64 * stem_depth_multiplier), + kernel_size=3, + strides=1, + use_bias=False, + padding='same', + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer)( + x) + x = self._norm( + axis=bn_axis, + momentum=norm_momentum, + epsilon=norm_epsilon, + trainable=bn_trainable)( + x) + x = tf_utils.get_activation(activation, 
use_keras_layer=True)(x) + else: + raise ValueError('Stem type {} not supported.'.format(stem_type)) + + if replace_stem_max_pool: + x = layers.Conv2D( + filters=int(64 * self._depth_multiplier), + kernel_size=3, + strides=2, + use_bias=False, + padding='same', + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer)( + x) + x = self._norm( + axis=bn_axis, + momentum=norm_momentum, + epsilon=norm_epsilon, + trainable=bn_trainable)( + x) + x = tf_utils.get_activation(activation, use_keras_layer=True)(x) + else: + x = layers.MaxPool2D(pool_size=3, strides=2, padding='same')(x) + + endpoints = {} + for i, spec in enumerate(RESNET_SPECS[model_id]): + if spec[0] == 'residual': + block_fn = nn_blocks.ResidualBlock + elif spec[0] == 'bottleneck': + block_fn = nn_blocks.BottleneckBlock + else: + raise ValueError('Block fn `{}` is not supported.'.format(spec[0])) + x = self._block_group( + inputs=x, + filters=int(spec[1] * self._depth_multiplier), + strides=(1 if i == 0 else 2), + block_fn=block_fn, + block_repeats=spec[2], + stochastic_depth_drop_rate=nn_layers.get_stochastic_depth_rate( + self._init_stochastic_depth_rate, i + 2, 5), + name='block_group_l{}'.format(i + 2)) + endpoints[str(i + 2)] = x + + self._output_specs = {l: endpoints[l].get_shape() for l in endpoints} + + super(ResNet, self).__init__(inputs=inputs, outputs=endpoints, **kwargs) + + def _block_group(self, + inputs: tf.Tensor, + filters: int, + strides: int, + block_fn: Callable[..., tf.keras.layers.Layer], + block_repeats: int = 1, + stochastic_depth_drop_rate: float = 0.0, + name: str = 'block_group'): + """Creates one group of blocks for the ResNet model. + + Args: + inputs: A `tf.Tensor` of size `[batch, channels, height, width]`. + filters: An `int` number of filters for the first convolution of the + layer. + strides: An `int` stride to use for the first convolution of the layer. 
+ If greater than 1, this layer will downsample the input. + block_fn: The type of block group. Either `nn_blocks.ResidualBlock` or + `nn_blocks.BottleneckBlock`. + block_repeats: An `int` number of blocks contained in the layer. + stochastic_depth_drop_rate: A `float` of drop rate of the current block + group. + name: A `str` name for the block. + + Returns: + The output `tf.Tensor` of the block layer. + """ + x = block_fn( + filters=filters, + strides=strides, + use_projection=True, + stochastic_depth_drop_rate=stochastic_depth_drop_rate, + se_ratio=self._se_ratio, + resnetd_shortcut=self._resnetd_shortcut, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=self._activation, + use_sync_bn=self._use_sync_bn, + norm_momentum=self._norm_momentum, + norm_epsilon=self._norm_epsilon, + bn_trainable=self._bn_trainable)( + inputs) + + for _ in range(1, block_repeats): + x = block_fn( + filters=filters, + strides=1, + use_projection=False, + stochastic_depth_drop_rate=stochastic_depth_drop_rate, + se_ratio=self._se_ratio, + resnetd_shortcut=self._resnetd_shortcut, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=self._activation, + use_sync_bn=self._use_sync_bn, + norm_momentum=self._norm_momentum, + norm_epsilon=self._norm_epsilon, + bn_trainable=self._bn_trainable)( + x) + + return tf.keras.layers.Activation('linear', name=name)(x) + + def get_config(self): + config_dict = { + 'model_id': self._model_id, + 'depth_multiplier': self._depth_multiplier, + 'stem_type': self._stem_type, + 'resnetd_shortcut': self._resnetd_shortcut, + 'replace_stem_max_pool': self._replace_stem_max_pool, + 'activation': self._activation, + 'se_ratio': self._se_ratio, + 'init_stochastic_depth_rate': self._init_stochastic_depth_rate, + 'scale_stem': self._scale_stem, + 'use_sync_bn': 
self._use_sync_bn, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'bn_trainable': self._bn_trainable + } + return config_dict + + @classmethod + def from_config(cls, config, custom_objects=None): + return cls(**config) + + @property + def output_specs(self): + """A dict of {level: TensorShape} pairs for the model output.""" + return self._output_specs + + +@factory.register_backbone_builder('resnet') +def build_resnet( + input_specs: tf.keras.layers.InputSpec, + backbone_config: hyperparams.Config, + norm_activation_config: hyperparams.Config, + l2_regularizer: tf.keras.regularizers.Regularizer = None) -> tf.keras.Model: # pytype: disable=annotation-type-mismatch # typed-keras + """Builds ResNet backbone from a config.""" + backbone_type = backbone_config.type + backbone_cfg = backbone_config.get() + assert backbone_type == 'resnet', (f'Inconsistent backbone type ' + f'{backbone_type}') + + return ResNet( + model_id=backbone_cfg.model_id, + input_specs=input_specs, + depth_multiplier=backbone_cfg.depth_multiplier, + stem_type=backbone_cfg.stem_type, + resnetd_shortcut=backbone_cfg.resnetd_shortcut, + replace_stem_max_pool=backbone_cfg.replace_stem_max_pool, + se_ratio=backbone_cfg.se_ratio, + init_stochastic_depth_rate=backbone_cfg.stochastic_depth_drop_rate, + scale_stem=backbone_cfg.scale_stem, + activation=norm_activation_config.activation, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon, + kernel_regularizer=l2_regularizer, + bn_trainable=backbone_cfg.bn_trainable) diff --git a/official/vision/beta/modeling/backbones/resnet_3d.py b/official/vision/modeling/backbones/resnet_3d.py similarity index 88% rename from official/vision/beta/modeling/backbones/resnet_3d.py rename to 
official/vision/modeling/backbones/resnet_3d.py index f1876df24bd6bc2bf008d10c5ad0ef7072e3daaf..6fffb901a4466812260e3537159b8846ade2b42d 100644 --- a/official/vision/beta/modeling/backbones/resnet_3d.py +++ b/official/vision/modeling/backbones/resnet_3d.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -20,9 +20,9 @@ import tensorflow as tf from official.modeling import hyperparams from official.modeling import tf_utils -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.modeling.layers import nn_blocks_3d -from official.vision.beta.modeling.layers import nn_layers +from official.vision.modeling.backbones import factory +from official.vision.modeling.layers import nn_blocks_3d +from official.vision.modeling.layers import nn_layers layers = tf.keras.layers @@ -153,19 +153,76 @@ class ResNet3D(tf.keras.Model): self._kernel_regularizer = kernel_regularizer self._bias_regularizer = bias_regularizer if tf.keras.backend.image_data_format() == 'channels_last': - bn_axis = -1 + self._bn_axis = -1 else: - bn_axis = 1 + self._bn_axis = 1 # Build ResNet3D backbone. inputs = tf.keras.Input(shape=input_specs.shape[1:]) + endpoints = self._build_model(inputs) + self._output_specs = {l: endpoints[l].get_shape() for l in endpoints} + + super(ResNet3D, self).__init__(inputs=inputs, outputs=endpoints, **kwargs) + + def _build_model(self, inputs): + """Builds model architecture. + + Args: + inputs: the keras input spec. + + Returns: + endpoints: A dictionary of backbone endpoint features. + """ + # Build stem. 
+ x = self._build_stem(inputs, stem_type=self._stem_type) + + temporal_kernel_size = 1 if self._stem_pool_temporal_stride == 1 else 3 + x = layers.MaxPool3D( + pool_size=[temporal_kernel_size, 3, 3], + strides=[self._stem_pool_temporal_stride, 2, 2], + padding='same')(x) + + # Build intermediate blocks and endpoints. + resnet_specs = RESNET_SPECS[self._model_id] + if len(self._temporal_strides) != len(resnet_specs) or len( + self._temporal_kernel_sizes) != len(resnet_specs): + raise ValueError( + 'Number of blocks in temporal specs should equal to resnet_specs.') + + endpoints = {} + for i, resnet_spec in enumerate(resnet_specs): + if resnet_spec[0] == 'bottleneck3d': + block_fn = nn_blocks_3d.BottleneckBlock3D + else: + raise ValueError('Block fn `{}` is not supported.'.format( + resnet_spec[0])) + + use_self_gating = ( + self._use_self_gating[i] if self._use_self_gating else False) + x = self._block_group( + inputs=x, + filters=resnet_spec[1], + temporal_kernel_sizes=self._temporal_kernel_sizes[i], + temporal_strides=self._temporal_strides[i], + spatial_strides=(1 if i == 0 else 2), + block_fn=block_fn, + block_repeats=resnet_spec[2], + stochastic_depth_drop_rate=nn_layers.get_stochastic_depth_rate( + self._init_stochastic_depth_rate, i + 2, 5), + use_self_gating=use_self_gating, + name='block_group_l{}'.format(i + 2)) + endpoints[str(i + 2)] = x + + return endpoints + def _build_stem(self, inputs, stem_type): + """Builds stem layer.""" # Build stem. 
if stem_type == 'v0': x = layers.Conv3D( filters=64, - kernel_size=[stem_conv_temporal_kernel_size, 7, 7], - strides=[stem_conv_temporal_stride, 2, 2], + kernel_size=[self._stem_conv_temporal_kernel_size, 7, 7], + strides=[self._stem_conv_temporal_stride, 2, 2], use_bias=False, padding='same', kernel_initializer=self._kernel_initializer, @@ -173,14 +230,15 @@ class ResNet3D(tf.keras.Model): bias_regularizer=self._bias_regularizer)( inputs) x = self._norm( - axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)( - x) - x = tf_utils.get_activation(activation)(x) + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon)(x) + x = tf_utils.get_activation(self._activation)(x) elif stem_type == 'v1': x = layers.Conv3D( filters=32, - kernel_size=[stem_conv_temporal_kernel_size, 3, 3], - strides=[stem_conv_temporal_stride, 2, 2], + kernel_size=[self._stem_conv_temporal_kernel_size, 3, 3], + strides=[self._stem_conv_temporal_stride, 2, 2], use_bias=False, padding='same', kernel_initializer=self._kernel_initializer, @@ -188,9 +246,10 @@ class ResNet3D(tf.keras.Model): bias_regularizer=self._bias_regularizer)( inputs) x = self._norm( - axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)( - x) - x = tf_utils.get_activation(activation)(x) + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon)(x) + x = tf_utils.get_activation(self._activation)(x) x = layers.Conv3D( filters=32, kernel_size=[1, 3, 3], @@ -202,9 +261,10 @@ class ResNet3D(tf.keras.Model): bias_regularizer=self._bias_regularizer)( x) x = self._norm( - axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)( - x) - x = tf_utils.get_activation(activation)(x) + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon)(x) + x = tf_utils.get_activation(self._activation)(x) x = layers.Conv3D( filters=64, kernel_size=[1, 3, 3], @@ -216,51 +276,14 @@ class ResNet3D(tf.keras.Model): bias_regularizer=self._bias_regularizer)( x) x = 
self._norm( - axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)( - x) - x = tf_utils.get_activation(activation)(x) + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon)(x) + x = tf_utils.get_activation(self._activation)(x) else: raise ValueError(f'Stem type {stem_type} not supported.') - temporal_kernel_size = 1 if stem_pool_temporal_stride == 1 else 3 - x = layers.MaxPool3D( - pool_size=[temporal_kernel_size, 3, 3], - strides=[stem_pool_temporal_stride, 2, 2], - padding='same')( - x) - - # Build intermediate blocks and endpoints. - resnet_specs = RESNET_SPECS[model_id] - if len(temporal_strides) != len(resnet_specs) or len( - temporal_kernel_sizes) != len(resnet_specs): - raise ValueError( - 'Number of blocks in temporal specs should equal to resnet_specs.') - - endpoints = {} - for i, resnet_spec in enumerate(resnet_specs): - if resnet_spec[0] == 'bottleneck3d': - block_fn = nn_blocks_3d.BottleneckBlock3D - else: - raise ValueError('Block fn `{}` is not supported.'.format( - resnet_spec[0])) - - x = self._block_group( - inputs=x, - filters=resnet_spec[1], - temporal_kernel_sizes=temporal_kernel_sizes[i], - temporal_strides=temporal_strides[i], - spatial_strides=(1 if i == 0 else 2), - block_fn=block_fn, - block_repeats=resnet_spec[2], - stochastic_depth_drop_rate=nn_layers.get_stochastic_depth_rate( - self._init_stochastic_depth_rate, i + 2, 5), - use_self_gating=use_self_gating[i] if use_self_gating else False, - name='block_group_l{}'.format(i + 2)) - endpoints[str(i + 2)] = x - - self._output_specs = {l: endpoints[l].get_shape() for l in endpoints} - - super(ResNet3D, self).__init__(inputs=inputs, outputs=endpoints, **kwargs) + return x def _block_group(self, inputs: tf.Tensor, diff --git a/official/vision/beta/modeling/backbones/resnet_3d_test.py b/official/vision/modeling/backbones/resnet_3d_test.py similarity index 96% rename from official/vision/beta/modeling/backbones/resnet_3d_test.py rename to 
official/vision/modeling/backbones/resnet_3d_test.py index ea40c8f4fddb45bb586ce2b733f32fb73ee63e55..8ee8a98389274978eebff9de9573a1352c0efb60 100644 --- a/official/vision/beta/modeling/backbones/resnet_3d_test.py +++ b/official/vision/modeling/backbones/resnet_3d_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,14 +12,13 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for resnet.""" # Import libraries from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.modeling.backbones import resnet_3d +from official.vision.modeling.backbones import resnet_3d class ResNet3DTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/beta/modeling/backbones/resnet_deeplab.py b/official/vision/modeling/backbones/resnet_deeplab.py similarity index 88% rename from official/vision/beta/modeling/backbones/resnet_deeplab.py rename to official/vision/modeling/backbones/resnet_deeplab.py index 611689fd353ec68cde0b3c996817bad95e1043a1..5e8ba9001a2387702d8e5c409c26f817ac8af866 100644 --- a/official/vision/beta/modeling/backbones/resnet_deeplab.py +++ b/official/vision/modeling/backbones/resnet_deeplab.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -20,9 +20,9 @@ import numpy as np import tensorflow as tf from official.modeling import hyperparams from official.modeling import tf_utils -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.modeling.layers import nn_blocks -from official.vision.beta.modeling.layers import nn_layers +from official.vision.modeling.backbones import factory +from official.vision.modeling.layers import nn_blocks +from official.vision.modeling.layers import nn_layers layers = tf.keras.layers @@ -43,6 +43,18 @@ RESNET_SPECS = { ('bottleneck', 256, 23), ('bottleneck', 512, 3), ], + 152: [ + ('bottleneck', 64, 3), + ('bottleneck', 128, 8), + ('bottleneck', 256, 36), + ('bottleneck', 512, 3), + ], + 200: [ + ('bottleneck', 64, 3), + ('bottleneck', 128, 24), + ('bottleneck', 256, 36), + ('bottleneck', 512, 3), + ], } @@ -63,6 +75,8 @@ class DilatedResNet(tf.keras.Model): input_specs: tf.keras.layers.InputSpec = layers.InputSpec( shape=[None, None, None, 3]), stem_type: str = 'v0', + resnetd_shortcut: bool = False, + replace_stem_max_pool: bool = False, se_ratio: Optional[float] = None, init_stochastic_depth_rate: float = 0.0, multigrid: Optional[Tuple[int]] = None, @@ -84,6 +98,10 @@ class DilatedResNet(tf.keras.Model): input_specs: A `tf.keras.layers.InputSpec` of the input tensor. stem_type: A `str` of stem type. Can be `v0` or `v1`. `v1` replaces 7x7 conv by 3 3x3 convs. + resnetd_shortcut: A `bool` of whether to use ResNet-D shortcut in + downsampling blocks. + replace_stem_max_pool: A `bool` of whether to replace the max pool in stem + with a stride-2 conv. se_ratio: A `float` or None. Ratio of the Squeeze-and-Excitation layer. init_stochastic_depth_rate: A `float` of initial stochastic depth rate.
multigrid: A tuple of the same length as the number of blocks in the last @@ -116,6 +134,8 @@ class DilatedResNet(tf.keras.Model): self._kernel_regularizer = kernel_regularizer self._bias_regularizer = bias_regularizer self._stem_type = stem_type + self._resnetd_shortcut = resnetd_shortcut + self._replace_stem_max_pool = replace_stem_max_pool self._se_ratio = se_ratio self._init_stochastic_depth_rate = init_stochastic_depth_rate @@ -188,7 +208,23 @@ class DilatedResNet(tf.keras.Model): else: raise ValueError('Stem type {} not supported.'.format(stem_type)) - x = layers.MaxPool2D(pool_size=3, strides=2, padding='same')(x) + if replace_stem_max_pool: + x = layers.Conv2D( + filters=64, + kernel_size=3, + strides=2, + use_bias=False, + padding='same', + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer)( + x) + x = self._norm( + axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)( + x) + x = tf_utils.get_activation(activation, use_keras_layer=True)(x) + else: + x = layers.MaxPool2D(pool_size=3, strides=2, padding='same')(x) normal_resnet_stage = int(np.math.log2(self._output_stride)) - 2 @@ -284,6 +320,7 @@ class DilatedResNet(tf.keras.Model): use_projection=True, stochastic_depth_drop_rate=stochastic_depth_drop_rate, se_ratio=self._se_ratio, + resnetd_shortcut=self._resnetd_shortcut, kernel_initializer=self._kernel_initializer, kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer, @@ -299,6 +336,7 @@ class DilatedResNet(tf.keras.Model): dilation_rate=dilation_rate * multigrid[i], use_projection=False, stochastic_depth_drop_rate=stochastic_depth_drop_rate, + resnetd_shortcut=self._resnetd_shortcut, se_ratio=self._se_ratio, kernel_initializer=self._kernel_initializer, kernel_regularizer=self._kernel_regularizer, @@ -316,6 +354,8 @@ class DilatedResNet(tf.keras.Model): 'model_id': self._model_id, 'output_stride': self._output_stride, 'stem_type': 
self._stem_type, + 'resnetd_shortcut': self._resnetd_shortcut, + 'replace_stem_max_pool': self._replace_stem_max_pool, 'se_ratio': self._se_ratio, 'init_stochastic_depth_rate': self._init_stochastic_depth_rate, 'activation': self._activation, @@ -355,6 +395,8 @@ def build_dilated_resnet( output_stride=backbone_cfg.output_stride, input_specs=input_specs, stem_type=backbone_cfg.stem_type, + resnetd_shortcut=backbone_cfg.resnetd_shortcut, + replace_stem_max_pool=backbone_cfg.replace_stem_max_pool, se_ratio=backbone_cfg.se_ratio, init_stochastic_depth_rate=backbone_cfg.stochastic_depth_drop_rate, multigrid=backbone_cfg.multigrid, diff --git a/official/vision/beta/modeling/backbones/resnet_deeplab_test.py b/official/vision/modeling/backbones/resnet_deeplab_test.py similarity index 85% rename from official/vision/beta/modeling/backbones/resnet_deeplab_test.py rename to official/vision/modeling/backbones/resnet_deeplab_test.py index 53169a1fe06935b9be8aea203a68e5e17aa15451..23f2f62e411323a312c5eab2035e5be9db0a2e53 100644 --- a/official/vision/beta/modeling/backbones/resnet_deeplab_test.py +++ b/official/vision/modeling/backbones/resnet_deeplab_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Tests for resnet_deeplab models.""" # Import libraries @@ -22,7 +21,7 @@ import tensorflow as tf from tensorflow.python.distribute import combinations from tensorflow.python.distribute import strategy_combinations -from official.vision.beta.modeling.backbones import resnet_deeplab +from official.vision.modeling.backbones import resnet_deeplab class ResNetTest(parameterized.TestCase, tf.test.TestCase): @@ -30,8 +29,12 @@ class ResNetTest(parameterized.TestCase, tf.test.TestCase): @parameterized.parameters( (128, 50, 4, 8), (128, 101, 4, 8), + (128, 152, 4, 8), + (128, 200, 4, 8), (128, 50, 4, 16), (128, 101, 4, 16), + (128, 152, 4, 16), + (128, 200, 4, 16), ) def test_network_creation(self, input_size, model_id, endpoint_filter_scale, output_stride): @@ -49,13 +52,17 @@ class ResNetTest(parameterized.TestCase, tf.test.TestCase): ], endpoints[str(int(np.math.log2(output_stride)))].shape.as_list()) @parameterized.parameters( - ('v0', None, 0.0), - ('v1', None, 0.0), - ('v1', 0.25, 0.0), - ('v1', 0.25, 0.2), + ('v0', None, 0.0, False, False), + ('v1', None, 0.0, False, False), + ('v1', 0.25, 0.0, False, False), + ('v1', 0.25, 0.2, False, False), + ('v1', 0.25, 0.0, True, False), + ('v1', 0.25, 0.2, False, True), + ('v1', None, 0.2, True, True), ) def test_network_features(self, stem_type, se_ratio, - init_stochastic_depth_rate): + init_stochastic_depth_rate, resnetd_shortcut, + replace_stem_max_pool): """Test additional features of ResNet models.""" input_size = 128 model_id = 50 @@ -68,6 +75,8 @@ class ResNetTest(parameterized.TestCase, tf.test.TestCase): model_id=model_id, output_stride=output_stride, stem_type=stem_type, + resnetd_shortcut=resnetd_shortcut, + replace_stem_max_pool=replace_stem_max_pool, se_ratio=se_ratio, init_stochastic_depth_rate=init_stochastic_depth_rate) inputs = tf.keras.Input(shape=(input_size, input_size, 3), batch_size=1) @@ -117,6 +126,8 @@ class ResNetTest(parameterized.TestCase, tf.test.TestCase): stem_type='v0', 
se_ratio=0.25, init_stochastic_depth_rate=0.2, + resnetd_shortcut=False, + replace_stem_max_pool=False, use_sync_bn=False, activation='relu', norm_momentum=0.99, diff --git a/official/vision/beta/modeling/backbones/resnet_test.py b/official/vision/modeling/backbones/resnet_test.py similarity index 97% rename from official/vision/beta/modeling/backbones/resnet_test.py rename to official/vision/modeling/backbones/resnet_test.py index aa6d5b6790f0c9c7cff659083695efa6b7c45994..f4af789e670f0db8b87b14538304d2aab2a33ade 100644 --- a/official/vision/beta/modeling/backbones/resnet_test.py +++ b/official/vision/modeling/backbones/resnet_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for resnet.""" # Import libraries @@ -22,7 +21,7 @@ import tensorflow as tf from tensorflow.python.distribute import combinations from tensorflow.python.distribute import strategy_combinations -from official.vision.beta.modeling.backbones import resnet +from official.vision.modeling.backbones import resnet class ResNetTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/beta/modeling/backbones/revnet.py b/official/vision/modeling/backbones/revnet.py similarity index 97% rename from official/vision/beta/modeling/backbones/revnet.py rename to official/vision/modeling/backbones/revnet.py index 550ed5a34e447d162c845a5ca51cd8cbf783fca3..aecaded31658a727f7353474df6e77ad389649ec 100644 --- a/official/vision/beta/modeling/backbones/revnet.py +++ b/official/vision/modeling/backbones/revnet.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Contains definitions of RevNet.""" from typing import Any, Callable, Dict, Optional @@ -20,8 +19,8 @@ from typing import Any, Callable, Dict, Optional import tensorflow as tf from official.modeling import hyperparams from official.modeling import tf_utils -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.modeling.layers import nn_blocks +from official.vision.modeling.backbones import factory +from official.vision.modeling.layers import nn_blocks # Specifications for different RevNet variants. @@ -208,7 +207,7 @@ class RevNet(tf.keras.Model): @property def output_specs(self) -> Dict[int, tf.TensorShape]: """A dict of {level: TensorShape} pairs for the model output.""" - return self._output_specs + return self._output_specs # pytype: disable=bad-return-type # trace-all-classes @factory.register_backbone_builder('revnet') diff --git a/official/vision/beta/modeling/backbones/revnet_test.py b/official/vision/modeling/backbones/revnet_test.py similarity index 95% rename from official/vision/beta/modeling/backbones/revnet_test.py rename to official/vision/modeling/backbones/revnet_test.py index dd797f0ffc61c95c97c901c7e52a6848e4225f38..d3aad349e486ec59e3175641b741ea36a9adfd1b 100644 --- a/official/vision/beta/modeling/backbones/revnet_test.py +++ b/official/vision/modeling/backbones/revnet_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -12,14 +12,13 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for RevNet.""" # Import libraries from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.modeling.backbones import revnet +from official.vision.modeling.backbones import revnet class RevNetTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/modeling/backbones/spinenet.py b/official/vision/modeling/backbones/spinenet.py new file mode 100644 index 0000000000000000000000000000000000000000..6f641029edd03567a48b350df24d73f9ce787596 --- /dev/null +++ b/official/vision/modeling/backbones/spinenet.py @@ -0,0 +1,576 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Contains definitions of SpineNet Networks.""" + +import math +from typing import Any, List, Optional, Tuple + +# Import libraries + +from absl import logging +import tensorflow as tf + +from official.modeling import hyperparams +from official.modeling import tf_utils +from official.vision.modeling.backbones import factory +from official.vision.modeling.layers import nn_blocks +from official.vision.modeling.layers import nn_layers +from official.vision.ops import spatial_transform_ops + +layers = tf.keras.layers + +FILTER_SIZE_MAP = { + 1: 32, + 2: 64, + 3: 128, + 4: 256, + 5: 256, + 6: 256, + 7: 256, +} + +# The fixed SpineNet architecture discovered by NAS. +# Each element represents a specification of a building block: +# (block_level, block_fn, (input_offset0, input_offset1), is_output). +SPINENET_BLOCK_SPECS = [ + (2, 'bottleneck', (0, 1), False), + (4, 'residual', (0, 1), False), + (3, 'bottleneck', (2, 3), False), + (4, 'bottleneck', (2, 4), False), + (6, 'residual', (3, 5), False), + (4, 'bottleneck', (3, 5), False), + (5, 'residual', (6, 7), False), + (7, 'residual', (6, 8), False), + (5, 'bottleneck', (8, 9), False), + (5, 'bottleneck', (8, 10), False), + (4, 'bottleneck', (5, 10), True), + (3, 'bottleneck', (4, 10), True), + (5, 'bottleneck', (7, 12), True), + (7, 'bottleneck', (5, 14), True), + (6, 'bottleneck', (12, 14), True), + (2, 'bottleneck', (2, 13), True), +] + +SCALING_MAP = { + '49S': { + 'endpoints_num_filters': 128, + 'filter_size_scale': 0.65, + 'resample_alpha': 0.5, + 'block_repeats': 1, + }, + '49': { + 'endpoints_num_filters': 256, + 'filter_size_scale': 1.0, + 'resample_alpha': 0.5, + 'block_repeats': 1, + }, + '96': { + 'endpoints_num_filters': 256, + 'filter_size_scale': 1.0, + 'resample_alpha': 0.5, + 'block_repeats': 2, + }, + '143': { + 'endpoints_num_filters': 256, + 'filter_size_scale': 1.0, + 'resample_alpha': 1.0, + 'block_repeats': 3, + }, + # SpineNet-143 with 1.3x filter_size_scale. 
+ '143L': { + 'endpoints_num_filters': 256, + 'filter_size_scale': 1.3, + 'resample_alpha': 1.0, + 'block_repeats': 3, + }, + '190': { + 'endpoints_num_filters': 512, + 'filter_size_scale': 1.3, + 'resample_alpha': 1.0, + 'block_repeats': 4, + }, +} + + +class BlockSpec(object): + """A container class that specifies the block configuration for SpineNet.""" + + def __init__(self, level: int, block_fn: str, input_offsets: Tuple[int, int], + is_output: bool): + self.level = level + self.block_fn = block_fn + self.input_offsets = input_offsets + self.is_output = is_output + + +def build_block_specs( + block_specs: Optional[List[Tuple[Any, ...]]] = None) -> List[BlockSpec]: + """Builds the list of BlockSpec objects for SpineNet.""" + if not block_specs: + block_specs = SPINENET_BLOCK_SPECS + logging.info('Building SpineNet block specs: %s', block_specs) + return [BlockSpec(*b) for b in block_specs] + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class SpineNet(tf.keras.Model): + """Creates a SpineNet family model. + + This implements: + Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, + Yin Cui, Quoc V. Le, Xiaodan Song. + SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization. 
+ (https://arxiv.org/abs/1912.05027) + """ + + def __init__( + self, + input_specs: tf.keras.layers.InputSpec = tf.keras.layers.InputSpec( + shape=[None, None, None, 3]), + min_level: int = 3, + max_level: int = 7, + block_specs: List[BlockSpec] = build_block_specs(), + endpoints_num_filters: int = 256, + resample_alpha: float = 0.5, + block_repeats: int = 1, + filter_size_scale: float = 1.0, + init_stochastic_depth_rate: float = 0.0, + kernel_initializer: str = 'VarianceScaling', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + activation: str = 'relu', + use_sync_bn: bool = False, + norm_momentum: float = 0.99, + norm_epsilon: float = 0.001, + **kwargs): + """Initializes a SpineNet model. + + Args: + input_specs: A `tf.keras.layers.InputSpec` of the input tensor. + min_level: An `int` of min level for output multiscale features. + max_level: An `int` of max level for output multiscale features. + block_specs: A list of block specifications for the SpineNet model + discovered by NAS. + endpoints_num_filters: An `int` of feature dimension for the output + endpoints. + resample_alpha: A `float` of resampling factor in cross-scale connections. + block_repeats: An `int` of number of blocks contained in the layer. + filter_size_scale: A `float` of multiplier for the filters (number of + channels) for all convolution ops. The value must be greater than zero. + Typical usage will be to set this value in (0, 1) to reduce the number + of parameters or computation cost of the model. + init_stochastic_depth_rate: A `float` of initial stochastic depth rate. + kernel_initializer: A str for kernel initializer of convolutional layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Defaults to None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + Defaults to None. + activation: A `str` name of the activation function.
+ use_sync_bn: If True, use synchronized batch normalization. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A small `float` added to variance to avoid dividing by zero. + **kwargs: Additional keyword arguments to be passed. + """ + self._input_specs = input_specs + self._min_level = min_level + self._max_level = max_level + self._block_specs = block_specs + self._endpoints_num_filters = endpoints_num_filters + self._resample_alpha = resample_alpha + self._block_repeats = block_repeats + self._filter_size_scale = filter_size_scale + self._init_stochastic_depth_rate = init_stochastic_depth_rate + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + self._activation = activation + self._use_sync_bn = use_sync_bn + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._init_block_fn = 'bottleneck' + self._num_init_blocks = 2 + + self._set_activation_fn(activation) + + if use_sync_bn: + self._norm = layers.experimental.SyncBatchNormalization + else: + self._norm = layers.BatchNormalization + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + + # Build SpineNet. 
+ inputs = tf.keras.Input(shape=input_specs.shape[1:]) + + net = self._build_stem(inputs=inputs) + input_width = input_specs.shape[2] + if input_width is None: + max_stride = max(map(lambda b: b.level, block_specs)) + input_width = 2 ** max_stride + net = self._build_scale_permuted_network(net=net, input_width=input_width) + endpoints = self._build_endpoints(net=net) + + self._output_specs = {l: endpoints[l].get_shape() for l in endpoints} + super(SpineNet, self).__init__(inputs=inputs, outputs=endpoints) + + def _set_activation_fn(self, activation): + if activation == 'relu': + self._activation_fn = tf.nn.relu + elif activation == 'swish': + self._activation_fn = tf.nn.swish + else: + raise ValueError('Activation {} not implemented.'.format(activation)) + + def _block_group(self, + inputs: tf.Tensor, + filters: int, + strides: int, + block_fn_cand: str, + block_repeats: int = 1, + stochastic_depth_drop_rate: Optional[float] = None, + name: str = 'block_group'): + """Creates one group of blocks for the SpineNet model.""" + block_fn_candidates = { + 'bottleneck': nn_blocks.BottleneckBlock, + 'residual': nn_blocks.ResidualBlock, + } + block_fn = block_fn_candidates[block_fn_cand] + _, _, _, num_filters = inputs.get_shape().as_list() + + if block_fn_cand == 'bottleneck': + use_projection = not (num_filters == (filters * 4) and strides == 1) + else: + use_projection = not (num_filters == filters and strides == 1) + + x = block_fn( + filters=filters, + strides=strides, + use_projection=use_projection, + stochastic_depth_drop_rate=stochastic_depth_drop_rate, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=self._activation, + use_sync_bn=self._use_sync_bn, + norm_momentum=self._norm_momentum, + norm_epsilon=self._norm_epsilon)( + inputs) + for _ in range(1, block_repeats): + x = block_fn( + filters=filters, + strides=1, + use_projection=False, + 
stochastic_depth_drop_rate=stochastic_depth_drop_rate, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=self._activation, + use_sync_bn=self._use_sync_bn, + norm_momentum=self._norm_momentum, + norm_epsilon=self._norm_epsilon)( + x) + return tf.identity(x, name=name) + + def _build_stem(self, inputs): + """Builds SpineNet stem.""" + x = layers.Conv2D( + filters=64, + kernel_size=7, + strides=2, + use_bias=False, + padding='same', + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer)( + inputs) + x = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon)( + x) + x = tf_utils.get_activation(self._activation_fn)(x) + x = layers.MaxPool2D(pool_size=3, strides=2, padding='same')(x) + + net = [] + # Build the initial level 2 blocks. + for i in range(self._num_init_blocks): + x = self._block_group( + inputs=x, + filters=int(FILTER_SIZE_MAP[2] * self._filter_size_scale), + strides=1, + block_fn_cand=self._init_block_fn, + block_repeats=self._block_repeats, + name='stem_block_{}'.format(i + 1)) + net.append(x) + return net + + def _build_scale_permuted_network(self, + net, + input_width, + weighted_fusion=False): + """Builds scale-permuted network.""" + net_sizes = [int(math.ceil(input_width / 2**2))] * len(net) + net_block_fns = [self._init_block_fn] * len(net) + num_outgoing_connections = [0] * len(net) + + endpoints = {} + for i, block_spec in enumerate(self._block_specs): + # Find out specs for the target block. + target_width = int(math.ceil(input_width / 2**block_spec.level)) + target_num_filters = int(FILTER_SIZE_MAP[block_spec.level] * + self._filter_size_scale) + target_block_fn = block_spec.block_fn + + # Resample then merge input0 and input1. 
+ parents = [] + input0 = block_spec.input_offsets[0] + input1 = block_spec.input_offsets[1] + + x0 = self._resample_with_alpha( + inputs=net[input0], + input_width=net_sizes[input0], + input_block_fn=net_block_fns[input0], + target_width=target_width, + target_num_filters=target_num_filters, + target_block_fn=target_block_fn, + alpha=self._resample_alpha) + parents.append(x0) + num_outgoing_connections[input0] += 1 + + x1 = self._resample_with_alpha( + inputs=net[input1], + input_width=net_sizes[input1], + input_block_fn=net_block_fns[input1], + target_width=target_width, + target_num_filters=target_num_filters, + target_block_fn=target_block_fn, + alpha=self._resample_alpha) + parents.append(x1) + num_outgoing_connections[input1] += 1 + + # Merge 0 outdegree blocks to the output block. + if block_spec.is_output: + for j, (j_feat, + j_connections) in enumerate(zip(net, num_outgoing_connections)): + if j_connections == 0 and (j_feat.shape[2] == target_width and + j_feat.shape[3] == x0.shape[3]): + parents.append(j_feat) + num_outgoing_connections[j] += 1 + + # pylint: disable=g-direct-tensorflow-import + if weighted_fusion: + dtype = parents[0].dtype + parent_weights = [ + tf.nn.relu(tf.cast(tf.Variable(1.0, name='block{}_fusion{}'.format( + i, j)), dtype=dtype)) for j in range(len(parents))] + weights_sum = tf.add_n(parent_weights) + parents = [ + parents[i] * parent_weights[i] / (weights_sum + 0.0001) + for i in range(len(parents)) + ] + + # Fuse all parent nodes then build a new block. 
+ x = tf_utils.get_activation(self._activation_fn)(tf.add_n(parents)) + x = self._block_group( + inputs=x, + filters=target_num_filters, + strides=1, + block_fn_cand=target_block_fn, + block_repeats=self._block_repeats, + stochastic_depth_drop_rate=nn_layers.get_stochastic_depth_rate( + self._init_stochastic_depth_rate, i + 1, len(self._block_specs)), + name='scale_permuted_block_{}'.format(i + 1)) + + net.append(x) + net_sizes.append(target_width) + net_block_fns.append(target_block_fn) + num_outgoing_connections.append(0) + + # Save output feats. + if block_spec.is_output: + if block_spec.level in endpoints: + raise ValueError('Duplicate feats found for output level {}.'.format( + block_spec.level)) + if (block_spec.level < self._min_level or + block_spec.level > self._max_level): + logging.warning( + 'SpineNet output level %s out of range [min_level, max_level] = ' + '[%s, %s] will not be used for further processing.', + block_spec.level, self._min_level, self._max_level) + endpoints[str(block_spec.level)] = x + + return endpoints + + def _build_endpoints(self, net): + """Matches filter size for endpoints before sharing conv layers.""" + endpoints = {} + for level in range(self._min_level, self._max_level + 1): + x = layers.Conv2D( + filters=self._endpoints_num_filters, + kernel_size=1, + strides=1, + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer)( + net[str(level)]) + x = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon)( + x) + x = tf_utils.get_activation(self._activation_fn)(x) + endpoints[str(level)] = x + return endpoints + + def _resample_with_alpha(self, + inputs, + input_width, + input_block_fn, + target_width, + target_num_filters, + target_block_fn, + alpha=0.5): + """Matches resolution and feature dimension.""" + _, _, _, input_num_filters = inputs.get_shape().as_list() + if input_block_fn == 
'bottleneck': + input_num_filters /= 4 + new_num_filters = int(input_num_filters * alpha) + + x = layers.Conv2D( + filters=new_num_filters, + kernel_size=1, + strides=1, + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer)( + inputs) + x = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon)( + x) + x = tf_utils.get_activation(self._activation_fn)(x) + + # Spatial resampling. + if input_width > target_width: + x = layers.Conv2D( + filters=new_num_filters, + kernel_size=3, + strides=2, + padding='SAME', + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer)( + x) + x = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon)( + x) + x = tf_utils.get_activation(self._activation_fn)(x) + input_width /= 2 + while input_width > target_width: + x = layers.MaxPool2D(pool_size=3, strides=2, padding='SAME')(x) + input_width /= 2 + elif input_width < target_width: + scale = target_width // input_width + x = spatial_transform_ops.nearest_upsampling(x, scale=scale) + + # Last 1x1 conv to match filter size. 
+ if target_block_fn == 'bottleneck': + target_num_filters *= 4 + x = layers.Conv2D( + filters=target_num_filters, + kernel_size=1, + strides=1, + use_bias=False, + kernel_initializer=self._kernel_initializer, + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer)( + x) + x = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon)( + x) + return x + + def get_config(self): + config_dict = { + 'min_level': self._min_level, + 'max_level': self._max_level, + 'endpoints_num_filters': self._endpoints_num_filters, + 'resample_alpha': self._resample_alpha, + 'block_repeats': self._block_repeats, + 'filter_size_scale': self._filter_size_scale, + 'init_stochastic_depth_rate': self._init_stochastic_depth_rate, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'activation': self._activation, + 'use_sync_bn': self._use_sync_bn, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon + } + return config_dict + + @classmethod + def from_config(cls, config, custom_objects=None): + return cls(**config) + + @property + def output_specs(self): + """A dict of {level: TensorShape} pairs for the model output.""" + return self._output_specs + + +@factory.register_backbone_builder('spinenet') +def build_spinenet( + input_specs: tf.keras.layers.InputSpec, + backbone_config: hyperparams.Config, + norm_activation_config: hyperparams.Config, + l2_regularizer: tf.keras.regularizers.Regularizer = None) -> tf.keras.Model: + """Builds SpineNet backbone from a config.""" + backbone_type = backbone_config.type + backbone_cfg = backbone_config.get() + assert backbone_type == 'spinenet', (f'Inconsistent backbone type ' + f'{backbone_type}') + + model_id = str(backbone_cfg.model_id) + if model_id not in SCALING_MAP: + raise ValueError( + 'SpineNet-{} is not a valid architecture.'.format(model_id)) + 
scaling_params = SCALING_MAP[model_id] + + return SpineNet( + input_specs=input_specs, + min_level=backbone_cfg.min_level, + max_level=backbone_cfg.max_level, + endpoints_num_filters=scaling_params['endpoints_num_filters'], + resample_alpha=scaling_params['resample_alpha'], + block_repeats=scaling_params['block_repeats'], + filter_size_scale=scaling_params['filter_size_scale'], + init_stochastic_depth_rate=backbone_cfg.stochastic_depth_drop_rate, + kernel_regularizer=l2_regularizer, + activation=norm_activation_config.activation, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon) diff --git a/official/vision/beta/modeling/backbones/spinenet_mobile.py b/official/vision/modeling/backbones/spinenet_mobile.py similarity index 97% rename from official/vision/beta/modeling/backbones/spinenet_mobile.py rename to official/vision/modeling/backbones/spinenet_mobile.py index 1edf5c7d92544b922a5b291e69b3db3735fcee77..89f82abd64644541f78874b9a5e489fdfb0b15c9 100644 --- a/official/vision/beta/modeling/backbones/spinenet_mobile.py +++ b/official/vision/modeling/backbones/spinenet_mobile.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 # Copyright 2020 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); @@ -38,10 +37,10 @@ import tensorflow as tf from official.modeling import hyperparams from official.modeling import tf_utils -from official.vision.beta.modeling.backbones import factory -from official.vision.beta.modeling.layers import nn_blocks -from official.vision.beta.modeling.layers import nn_layers -from official.vision.beta.ops import spatial_transform_ops +from official.vision.modeling.backbones import factory +from official.vision.modeling.layers import nn_blocks +from official.vision.modeling.layers import nn_layers +from official.vision.ops import spatial_transform_ops layers = tf.keras.layers @@ -244,6 +243,7 @@ class SpineNetMobile(tf.keras.Model): in_filters=in_filters, out_filters=out_filters, strides=strides, + se_gating_activation='hard_sigmoid', se_ratio=se_ratio, expand_ratio=expand_ratio, stochastic_depth_drop_rate=stochastic_depth_drop_rate, @@ -365,15 +365,21 @@ class SpineNetMobile(tf.keras.Model): parent_weights = [ tf.nn.relu(tf.cast(tf.Variable(1.0, name='block{}_fusion{}'.format( i, j)), dtype=dtype)) for j in range(len(parents))] - weights_sum = layers.Add()(parent_weights) + weights_sum = parent_weights[0] + for adder in parent_weights[1:]: + weights_sum = layers.Add()([weights_sum, adder]) + parents = [ parents[i] * parent_weights[i] / (weights_sum + 0.0001) for i in range(len(parents)) ] # Fuse all parent nodes then build a new block. 
+ x = parents[0] + for adder in parents[1:]: + x = layers.Add()([x, adder]) x = tf_utils.get_activation( - self._activation, use_keras_layer=True)(layers.Add()(parents)) + self._activation, use_keras_layer=True)(x) x = self._block_group( inputs=x, in_filters=target_num_filters, diff --git a/official/vision/beta/modeling/backbones/spinenet_mobile_test.py b/official/vision/modeling/backbones/spinenet_mobile_test.py similarity index 96% rename from official/vision/beta/modeling/backbones/spinenet_mobile_test.py rename to official/vision/modeling/backbones/spinenet_mobile_test.py index 56fab19bcc391c5d3460159832fde3ab23ca35e8..cc0604154372334f72ada548570550ad4c28b5aa 100644 --- a/official/vision/beta/modeling/backbones/spinenet_mobile_test.py +++ b/official/vision/modeling/backbones/spinenet_mobile_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 # Copyright 2020 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); @@ -32,7 +31,7 @@ from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.modeling.backbones import spinenet_mobile +from official.vision.modeling.backbones import spinenet_mobile class SpineNetMobileTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/beta/modeling/backbones/spinenet_test.py b/official/vision/modeling/backbones/spinenet_test.py similarity index 88% rename from official/vision/beta/modeling/backbones/spinenet_test.py rename to official/vision/modeling/backbones/spinenet_test.py index e3a036a6e38b683c459e170803821b33fd62d2fd..e6fdf9c358a22e259e4764337c2a30475ea1eca8 100644 --- a/official/vision/beta/modeling/backbones/spinenet_test.py +++ b/official/vision/modeling/backbones/spinenet_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,13 +12,12 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for SpineNet.""" # Import libraries from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.modeling.backbones import spinenet +from official.vision.modeling.backbones import spinenet class SpineNetTest(parameterized.TestCase, tf.test.TestCase): @@ -123,6 +122,18 @@ class SpineNetTest(parameterized.TestCase, tf.test.TestCase): # If the serialization was successful, the new config should match the old. 
self.assertAllEqual(network.get_config(), new_network.get_config()) + + @parameterized.parameters( + ('relu', tf.nn.relu), + ('swish', tf.nn.swish) + ) + def test_activation(self, activation, activation_fn): + model = spinenet.SpineNet(activation=activation) + self.assertEqual(model._activation_fn, activation_fn) + + def test_invalid_activation_raises_valueerror(self): + with self.assertRaises(ValueError): + spinenet.SpineNet(activation='invalid_activation_name') + if __name__ == '__main__': tf.test.main() diff --git a/official/vision/modeling/backbones/vit.py b/official/vision/modeling/backbones/vit.py new file mode 100644 index 0000000000000000000000000000000000000000..1138e46fe12966c52f39a2f83d5c3b3a5634809c --- /dev/null +++ b/official/vision/modeling/backbones/vit.py @@ -0,0 +1,322 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+ +"""VisionTransformer models.""" + +from typing import Optional, Tuple + +from absl import logging +import tensorflow as tf + +from official.modeling import activations +from official.vision.modeling.backbones import factory +from official.vision.modeling.backbones.vit_specs import VIT_SPECS +from official.vision.modeling.layers import nn_blocks +from official.vision.modeling.layers import nn_layers + +layers = tf.keras.layers + + +class AddPositionEmbs(tf.keras.layers.Layer): + """Adds (optionally learned) positional embeddings to the inputs.""" + + def __init__(self, + posemb_init: Optional[tf.keras.initializers.Initializer] = None, + posemb_origin_shape: Optional[Tuple[int, int]] = None, + posemb_target_shape: Optional[Tuple[int, int]] = None, + **kwargs): + """Constructs Positional Embedding module. + + The logic of this module is: the learnable positional embeddings length will + be determined by the inputs_shape or posemb_origin_shape (if provided) + during the construction. If the posemb_target_shape is provided and is + different from the positional embeddings length, the embeddings will be + interpolated during the forward call. + + Args: + posemb_init: The positional embedding initializer. + posemb_origin_shape: The intended positional embedding shape. + posemb_target_shape: The potential target shape positional embedding may + be interpolated to. + **kwargs: other args.
+ """ + super().__init__(**kwargs) + self.posemb_init = posemb_init + self.posemb_origin_shape = posemb_origin_shape + self.posemb_target_shape = posemb_target_shape + + def build(self, inputs_shape): + if self.posemb_origin_shape is not None: + pos_emb_length = self.posemb_origin_shape[0] * self.posemb_origin_shape[1] + else: + pos_emb_length = inputs_shape[1] + pos_emb_shape = (1, pos_emb_length, inputs_shape[2]) + self.pos_embedding = self.add_weight( + 'pos_embedding', pos_emb_shape, initializer=self.posemb_init) + + def _interpolate(self, pos_embedding: tf.Tensor, from_shape: Tuple[int, int], + to_shape: Tuple[int, int]) -> tf.Tensor: + """Interpolates the positional embeddings.""" + # from_shape and to_shape are (rows, cols) tuples, so format them with %s. + logging.info('Interpolating positional embedding from shape: %s to %s', + from_shape, to_shape) + grid_emb = tf.reshape(pos_embedding, [1] + list(from_shape) + [-1]) + # NOTE: Using BILINEAR interpolation by default. + grid_emb = tf.image.resize(grid_emb, to_shape) + return tf.reshape(grid_emb, [1, to_shape[0] * to_shape[1], -1]) + + def call(self, inputs, inputs_positions=None): + del inputs_positions + pos_embedding = self.pos_embedding + # inputs.shape is (batch_size, seq_len, emb_dim). + if inputs.shape[1] != pos_embedding.shape[1]: + pos_embedding = self._interpolate( + pos_embedding, + from_shape=self.posemb_origin_shape, + to_shape=self.posemb_target_shape) + pos_embedding = tf.cast(pos_embedding, inputs.dtype) + + return inputs + pos_embedding + + +class TokenLayer(tf.keras.layers.Layer): + """A simple layer to wrap token parameters.""" + + def build(self, inputs_shape): + self.cls = self.add_weight( + 'cls', (1, 1, inputs_shape[-1]), initializer='zeros') + + def call(self, inputs): + cls = tf.cast(self.cls, inputs.dtype) + cls = cls + tf.zeros_like(inputs[:, 0:1]) # A hacky way to tile.
+ x = tf.concat([cls, inputs], axis=1) + return x + + +class Encoder(tf.keras.layers.Layer): + """Transformer Encoder.""" + + def __init__(self, + num_layers, + mlp_dim, + num_heads, + dropout_rate=0.1, + attention_dropout_rate=0.1, + kernel_regularizer=None, + inputs_positions=None, + init_stochastic_depth_rate=0.0, + kernel_initializer='glorot_uniform', + add_pos_embed=True, + pos_embed_origin_shape=None, + pos_embed_target_shape=None, + **kwargs): + super().__init__(**kwargs) + self._num_layers = num_layers + self._mlp_dim = mlp_dim + self._num_heads = num_heads + self._dropout_rate = dropout_rate + self._attention_dropout_rate = attention_dropout_rate + self._kernel_regularizer = kernel_regularizer + self._inputs_positions = inputs_positions + self._init_stochastic_depth_rate = init_stochastic_depth_rate + self._kernel_initializer = kernel_initializer + self._add_pos_embed = add_pos_embed + self._pos_embed_origin_shape = pos_embed_origin_shape + self._pos_embed_target_shape = pos_embed_target_shape + + def build(self, input_shape): + if self._add_pos_embed: + self._pos_embed = AddPositionEmbs( + posemb_init=tf.keras.initializers.RandomNormal(stddev=0.02), + posemb_origin_shape=self._pos_embed_origin_shape, + posemb_target_shape=self._pos_embed_target_shape, + name='posembed_input') + self._dropout = layers.Dropout(rate=self._dropout_rate) + + self._encoder_layers = [] + # Set layer norm epsilons to 1e-6 to be consistent with JAX implementation. 
+ # https://flax.readthedocs.io/en/latest/_autosummary/flax.deprecated.nn.LayerNorm.html + for i in range(self._num_layers): + encoder_layer = nn_blocks.TransformerEncoderBlock( + inner_activation=activations.gelu, + num_attention_heads=self._num_heads, + inner_dim=self._mlp_dim, + output_dropout=self._dropout_rate, + attention_dropout=self._attention_dropout_rate, + kernel_regularizer=self._kernel_regularizer, + kernel_initializer=self._kernel_initializer, + norm_first=True, + stochastic_depth_drop_rate=nn_layers.get_stochastic_depth_rate( + self._init_stochastic_depth_rate, i + 1, self._num_layers), + norm_epsilon=1e-6) + self._encoder_layers.append(encoder_layer) + self._norm = layers.LayerNormalization(epsilon=1e-6) + super().build(input_shape) + + def call(self, inputs, training=None): + x = inputs + if self._add_pos_embed: + x = self._pos_embed(x, inputs_positions=self._inputs_positions) + x = self._dropout(x, training=training) + + for encoder_layer in self._encoder_layers: + x = encoder_layer(x, training=training) + x = self._norm(x) + return x + + def get_config(self): + config = super().get_config() + updates = { + 'num_layers': self._num_layers, + 'mlp_dim': self._mlp_dim, + 'num_heads': self._num_heads, + 'dropout_rate': self._dropout_rate, + 'attention_dropout_rate': self._attention_dropout_rate, + 'kernel_regularizer': self._kernel_regularizer, + 'inputs_positions': self._inputs_positions, + 'init_stochastic_depth_rate': self._init_stochastic_depth_rate, + 'kernel_initializer': self._kernel_initializer, + 'add_pos_embed': self._add_pos_embed, + 'pos_embed_origin_shape': self._pos_embed_origin_shape, + 'pos_embed_target_shape': self._pos_embed_target_shape, + } + config.update(updates) + return config + + +class VisionTransformer(tf.keras.Model): + """Class to build VisionTransformer family model.""" + + def __init__(self, + mlp_dim=3072, + num_heads=12, + num_layers=12, + attention_dropout_rate=0.0, + dropout_rate=0.1, + 
init_stochastic_depth_rate=0.0, + input_specs=layers.InputSpec(shape=[None, None, None, 3]), + patch_size=16, + hidden_size=768, + representation_size=0, + pooler='token', + kernel_regularizer=None, + original_init: bool = True, + pos_embed_shape: Optional[Tuple[int, int]] = None): + """VisionTransformer initialization function.""" + self._mlp_dim = mlp_dim + self._num_heads = num_heads + self._num_layers = num_layers + self._hidden_size = hidden_size + self._patch_size = patch_size + + inputs = tf.keras.Input(shape=input_specs.shape[1:]) + + x = layers.Conv2D( + filters=hidden_size, + kernel_size=patch_size, + strides=patch_size, + padding='valid', + kernel_regularizer=kernel_regularizer, + kernel_initializer='lecun_normal' if original_init else 'he_uniform')( + inputs) + if tf.keras.backend.image_data_format() == 'channels_last': + rows_axis, cols_axis = (1, 2) + else: + rows_axis, cols_axis = (2, 3) + # The reshape below assumes the data_format is 'channels_last,' so + # transpose to that. Once the data is flattened by the reshape, the + # data_format is irrelevant, so no need to update + # tf.keras.backend.image_data_format. + x = tf.transpose(x, perm=[0, 2, 3, 1]) + + pos_embed_target_shape = (x.shape[rows_axis], x.shape[cols_axis]) + seq_len = (input_specs.shape[rows_axis] // patch_size) * ( + input_specs.shape[cols_axis] // patch_size) + x = tf.reshape(x, [-1, seq_len, hidden_size]) + + # If we want to add a class token, add it here. 
+ if pooler == 'token': + x = TokenLayer(name='cls')(x) + + x = Encoder( + num_layers=num_layers, + mlp_dim=mlp_dim, + num_heads=num_heads, + dropout_rate=dropout_rate, + attention_dropout_rate=attention_dropout_rate, + kernel_regularizer=kernel_regularizer, + kernel_initializer='glorot_uniform' if original_init else dict( + class_name='TruncatedNormal', config=dict(stddev=.02)), + init_stochastic_depth_rate=init_stochastic_depth_rate, + pos_embed_origin_shape=pos_embed_shape, + pos_embed_target_shape=pos_embed_target_shape)( + x) + + if pooler == 'token': + x = x[:, 0] + elif pooler == 'gap': + x = tf.reduce_mean(x, axis=1) + elif pooler == 'none': + x = tf.identity(x, name='encoded_tokens') + else: + raise ValueError(f'unrecognized pooler type: {pooler}') + + if representation_size: + x = tf.keras.layers.Dense( + representation_size, + kernel_regularizer=kernel_regularizer, + name='pre_logits', + kernel_initializer='lecun_normal' if original_init else 'he_uniform')( + x) + x = tf.nn.tanh(x) + else: + x = tf.identity(x, name='pre_logits') + + if pooler == 'none': + endpoints = {'encoded_tokens': x} + else: + endpoints = { + 'pre_logits': + tf.reshape(x, [-1, 1, 1, representation_size or hidden_size]) + } + super(VisionTransformer, self).__init__(inputs=inputs, outputs=endpoints) + + +@factory.register_backbone_builder('vit') +def build_vit(input_specs, + backbone_config, + norm_activation_config, + l2_regularizer=None): + """Build ViT model.""" + del norm_activation_config + backbone_type = backbone_config.type + backbone_cfg = backbone_config.get() + assert backbone_type == 'vit', (f'Inconsistent backbone type ' + f'{backbone_type}') + backbone_cfg.override(VIT_SPECS[backbone_cfg.model_name]) + + return VisionTransformer( + mlp_dim=backbone_cfg.transformer.mlp_dim, + num_heads=backbone_cfg.transformer.num_heads, + num_layers=backbone_cfg.transformer.num_layers, + attention_dropout_rate=backbone_cfg.transformer.attention_dropout_rate, + 
dropout_rate=backbone_cfg.transformer.dropout_rate, + init_stochastic_depth_rate=backbone_cfg.init_stochastic_depth_rate, + input_specs=input_specs, + patch_size=backbone_cfg.patch_size, + hidden_size=backbone_cfg.hidden_size, + representation_size=backbone_cfg.representation_size, + pooler=backbone_cfg.pooler, + kernel_regularizer=l2_regularizer, + original_init=backbone_cfg.original_init, + pos_embed_shape=backbone_cfg.pos_embed_shape) diff --git a/official/vision/modeling/backbones/vit_specs.py b/official/vision/modeling/backbones/vit_specs.py new file mode 100644 index 0000000000000000000000000000000000000000..060bc2d09be50b4dd10b2891a518a6e948d435b1 --- /dev/null +++ b/official/vision/modeling/backbones/vit_specs.py @@ -0,0 +1,68 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""VisionTransformer backbone specs.""" +import immutabledict + + +VIT_SPECS = immutabledict.immutabledict({ + 'vit-ti16': + dict( + hidden_size=192, + patch_size=16, + transformer=dict(mlp_dim=768, num_heads=3, num_layers=12), + ), + 'vit-s16': + dict( + hidden_size=384, + patch_size=16, + transformer=dict(mlp_dim=1536, num_heads=6, num_layers=12), + ), + 'vit-b16': + dict( + hidden_size=768, + patch_size=16, + transformer=dict(mlp_dim=3072, num_heads=12, num_layers=12), + ), + 'vit-b32': + dict( + hidden_size=768, + patch_size=32, + transformer=dict(mlp_dim=3072, num_heads=12, num_layers=12), + ), + 'vit-l16': + dict( + hidden_size=1024, + patch_size=16, + transformer=dict(mlp_dim=4096, num_heads=16, num_layers=24), + ), + 'vit-l32': + dict( + hidden_size=1024, + patch_size=32, + transformer=dict(mlp_dim=4096, num_heads=16, num_layers=24), + ), + 'vit-h14': + dict( + hidden_size=1280, + patch_size=14, + transformer=dict(mlp_dim=5120, num_heads=16, num_layers=32), + ), + 'vit-g14': + dict( + hidden_size=1664, + patch_size=14, + transformer=dict(mlp_dim=8192, num_heads=16, num_layers=48), + ), +}) diff --git a/official/vision/modeling/backbones/vit_test.py b/official/vision/modeling/backbones/vit_test.py new file mode 100644 index 0000000000000000000000000000000000000000..507c3226379888082cef29bbb278d4c7dcb0e9e7 --- /dev/null +++ b/official/vision/modeling/backbones/vit_test.py @@ -0,0 +1,73 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for VIT.""" + +from absl.testing import parameterized +import tensorflow as tf + +from official.vision.modeling.backbones import vit + + +class VisionTransformerTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (224, 85798656), + (256, 85844736), + ) + def test_network_creation(self, input_size, params_count): + """Test creation of VisionTransformer family models.""" + tf.keras.backend.set_image_data_format('channels_last') + input_specs = tf.keras.layers.InputSpec( + shape=[2, input_size, input_size, 3]) + network = vit.VisionTransformer(input_specs=input_specs) + + inputs = tf.keras.Input(shape=(input_size, input_size, 3), batch_size=1) + _ = network(inputs) + self.assertEqual(network.count_params(), params_count) + + def test_network_none_pooler(self): + tf.keras.backend.set_image_data_format('channels_last') + input_size = 256 + input_specs = tf.keras.layers.InputSpec( + shape=[2, input_size, input_size, 3]) + network = vit.VisionTransformer( + input_specs=input_specs, + patch_size=16, + pooler='none', + representation_size=128, + pos_embed_shape=(14, 14)) # (224 // 16) + + inputs = tf.keras.Input(shape=(input_size, input_size, 3), batch_size=1) + output = network(inputs)['encoded_tokens'] + self.assertEqual(output.shape, [1, 256, 128]) + + def test_posembedding_interpolation(self): + tf.keras.backend.set_image_data_format('channels_last') + input_size = 256 + input_specs = tf.keras.layers.InputSpec( + shape=[2, input_size, input_size, 3]) + network = vit.VisionTransformer( + input_specs=input_specs, + patch_size=16, + pooler='gap', + pos_embed_shape=(14, 14)) # (224 // 16) + + inputs = tf.keras.Input(shape=(input_size, input_size, 3), batch_size=1) + output = network(inputs)['pre_logits'] + self.assertEqual(output.shape, [1, 1, 1, 768]) + + +if __name__ == '__main__': + tf.test.main() diff --git 
a/official/vision/beta/modeling/classification_model.py b/official/vision/modeling/classification_model.py similarity index 98% rename from official/vision/beta/modeling/classification_model.py rename to official/vision/modeling/classification_model.py index cde7ebcca596804db1c781ebb24b7bea405cf342..ae85e4a164a3ffed1fcf868a326d36a1e790bb27 100644 --- a/official/vision/beta/modeling/classification_model.py +++ b/official/vision/modeling/classification_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/beta/modeling/classification_model_test.py b/official/vision/modeling/classification_model_test.py similarity index 80% rename from official/vision/beta/modeling/classification_model_test.py rename to official/vision/modeling/classification_model_test.py index 5af56e794f616c9f7afe45d2f37c8658df34c141..9647f3e12c62d15a01df121a5ea359489df57fb2 100644 --- a/official/vision/beta/modeling/classification_model_test.py +++ b/official/vision/modeling/classification_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Tests for classification network.""" # Import libraries @@ -22,26 +21,55 @@ import tensorflow as tf from tensorflow.python.distribute import combinations from tensorflow.python.distribute import strategy_combinations -from official.vision.beta.modeling import backbones -from official.vision.beta.modeling import classification_model +from official.vision.modeling import backbones +from official.vision.modeling import classification_model class ClassificationNetworkTest(parameterized.TestCase, tf.test.TestCase): + @parameterized.parameters( + (192 * 4, 3, 12, 192, 5524416), + (384 * 4, 6, 12, 384, 21665664), + ) + def test_vision_transformer_creation(self, mlp_dim, num_heads, num_layers, + hidden_size, num_params): + """Test for creation of a Vision Transformer classifier.""" + inputs = np.random.rand(2, 224, 224, 3) + + tf.keras.backend.set_image_data_format('channels_last') + + backbone = backbones.VisionTransformer( + mlp_dim=mlp_dim, + num_heads=num_heads, + num_layers=num_layers, + hidden_size=hidden_size, + input_specs=tf.keras.layers.InputSpec(shape=[None, 224, 224, 3]), + ) + self.assertEqual(backbone.count_params(), num_params) + + num_classes = 1000 + model = classification_model.ClassificationModel( + backbone=backbone, + num_classes=num_classes, + dropout_rate=0.2, + ) + + logits = model(inputs) + self.assertAllEqual([2, num_classes], logits.numpy().shape) + @parameterized.parameters( (128, 50, 'relu'), (128, 50, 'relu'), (128, 50, 'swish'), ) - def test_resnet_network_creation( - self, input_size, resnet_model_id, activation): + def test_resnet_network_creation(self, input_size, resnet_model_id, + activation): """Test for creation of a ResNet-50 classifier.""" inputs = np.random.rand(2, input_size, input_size, 3) tf.keras.backend.set_image_data_format('channels_last') - backbone = backbones.ResNet( - model_id=resnet_model_id, activation=activation) + backbone = backbones.ResNet(model_id=resnet_model_id, activation=activation) 
self.assertEqual(backbone.count_params(), 23561152) num_classes = 1000 diff --git a/official/vision/modeling/decoders/__init__.py b/official/vision/modeling/decoders/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..7efea1543a4631f823a7b2d1dc837ed7859ba41b --- /dev/null +++ b/official/vision/modeling/decoders/__init__.py @@ -0,0 +1,19 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Decoders package definition.""" + +from official.vision.modeling.decoders.aspp import ASPP +from official.vision.modeling.decoders.fpn import FPN +from official.vision.modeling.decoders.nasfpn import NASFPN diff --git a/official/vision/beta/modeling/decoders/aspp.py b/official/vision/modeling/decoders/aspp.py similarity index 96% rename from official/vision/beta/modeling/decoders/aspp.py rename to official/vision/modeling/decoders/aspp.py index 4a908184331c102e09e67aedd29eb6b7bcb38166..946a5750dae00d4a7e068a9114693ee8c24399f4 100644 --- a/official/vision/beta/modeling/decoders/aspp.py +++ b/official/vision/modeling/decoders/aspp.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -20,9 +20,9 @@ from typing import Any, List, Mapping, Optional, Union import tensorflow as tf from official.modeling import hyperparams -from official.vision.beta.modeling.decoders import factory -from official.vision.beta.modeling.layers import deeplab -from official.vision.beta.modeling.layers import nn_layers +from official.vision.modeling.decoders import factory +from official.vision.modeling.layers import deeplab +from official.vision.modeling.layers import nn_layers TensorMapUnion = Union[tf.Tensor, Mapping[str, tf.Tensor]] @@ -103,7 +103,7 @@ class ASPP(tf.keras.layers.Layer): if self._config_dict['pool_kernel_size']: pool_kernel_size = [ int(p_size // 2**self._config_dict['level']) - for p_size in self._config_dict['pool_kernel_size'] + for p_size in self._config_dict['pool_kernel_size'] # pytype: disable=attribute-error # trace-all-classes ] self.aspp = self._aspp_layer( diff --git a/official/vision/beta/modeling/decoders/aspp_test.py b/official/vision/modeling/decoders/aspp_test.py similarity index 93% rename from official/vision/beta/modeling/decoders/aspp_test.py rename to official/vision/modeling/decoders/aspp_test.py index c04b2feb474fc5b1b8eebfcf1b1a5ac0da960564..11398ea8acdc0d9842f709c69b032d636120c96f 100644 --- a/official/vision/beta/modeling/decoders/aspp_test.py +++ b/official/vision/modeling/decoders/aspp_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,15 +12,14 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Tests for aspp.""" # Import libraries from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.modeling.backbones import resnet -from official.vision.beta.modeling.decoders import aspp +from official.vision.modeling.backbones import resnet +from official.vision.modeling.decoders import aspp class ASPPTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/modeling/decoders/factory.py b/official/vision/modeling/decoders/factory.py new file mode 100644 index 0000000000000000000000000000000000000000..d1f732b9b68c63a8624c0d0599d33a0cece98f28 --- /dev/null +++ b/official/vision/modeling/decoders/factory.py @@ -0,0 +1,135 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Decoder registry and factory method. + +One can register a new decoder model in the following two steps: + +1. Import the factory and register the builder in the decoder file. +2. Import the decoder class in __init__.py. + +``` +# my_decoder.py + +from modeling.decoders import factory + +class MyDecoder(): + ...
+ +@factory.register_decoder_builder('my_decoder') +def build_my_decoder(): + return MyDecoder() + +# decoders/__init__.py adds import +from modeling.decoders.my_decoder import MyDecoder +``` + +If the MyDecoder class should only be used by specific binaries, +do not import the decoder module in decoders/__init__.py; instead, import it +in the place that uses it. +""" +from typing import Any, Callable, Mapping, Optional, Union + +# Import libraries + +import tensorflow as tf + +from official.core import registry +from official.modeling import hyperparams + +_REGISTERED_DECODER_CLS = {} + + +def register_decoder_builder(key: str) -> Callable[..., Any]: + """Decorates a builder of a decoder class. + + The builder should be a Callable (a class or a function). + This decorator supports registration of decoder builders as follows: + + ``` + class MyDecoder(tf.keras.Model): + pass + + @register_decoder_builder('mydecoder') + def builder(input_specs, config, l2_reg): + return MyDecoder(...) + + # Builds a MyDecoder object. + my_decoder = build_decoder(input_specs, config, l2_reg) + ``` + + Args: + key: A `str` key used to look up the builder. + + Returns: + A callable to use as a class decorator that registers the decorated class + for creation from an instance of task_config_cls. + """ + return registry.register(_REGISTERED_DECODER_CLS, key) + + +@register_decoder_builder('identity') +def build_identity( + input_specs: Optional[Mapping[str, tf.TensorShape]] = None, + model_config: Optional[hyperparams.Config] = None, + l2_regularizer: Optional[tf.keras.regularizers.Regularizer] = None) -> None: + """Builds identity decoder from a config. + + None of the input arguments are used by the identity decoder; they are kept + here to keep the interface consistent. + + Args: + input_specs: A `dict` of input specifications. A dictionary consists of + {level: TensorShape} from a backbone. + model_config: A `OneOfConfig` of model config.
+ l2_regularizer: A `tf.keras.regularizers.Regularizer` object. Defaults to + None. + + Returns: + None, because the identity decoder is a no-op. + """ + del input_specs, model_config, l2_regularizer # Unused by identity decoder. + + +def build_decoder( + input_specs: Mapping[str, tf.TensorShape], + model_config: hyperparams.Config, + l2_regularizer: tf.keras.regularizers.Regularizer = None, + **kwargs) -> Union[None, tf.keras.Model, tf.keras.layers.Layer]: # pytype: disable=annotation-type-mismatch # typed-keras + """Builds decoder from a config. + + A decoder can be a keras.Model, a keras.layers.Layer, or None. If it is not + None, the decoder will take features from the backbone as input and generate + decoded feature maps. If it is None, such as an identity decoder, the decoder + is skipped and features from the backbone are regarded as model output. + + Args: + input_specs: A `dict` of input specifications. A dictionary consists of + {level: TensorShape} from a backbone. + model_config: A `OneOfConfig` of model config. + l2_regularizer: A `tf.keras.regularizers.Regularizer` object. Defaults to + None. + **kwargs: Additional keyword args to be passed to decoder builder. + + Returns: + An instance of the decoder, or None for an identity decoder. + """ + decoder_builder = registry.lookup(_REGISTERED_DECODER_CLS, + model_config.decoder.type) + + return decoder_builder( + input_specs=input_specs, + model_config=model_config, + l2_regularizer=l2_regularizer, + **kwargs) diff --git a/official/vision/modeling/decoders/factory_test.py b/official/vision/modeling/decoders/factory_test.py new file mode 100644 index 0000000000000000000000000000000000000000..16c8253bfe7f415efed44448db8ee71e34186f99 --- /dev/null +++ b/official/vision/modeling/decoders/factory_test.py @@ -0,0 +1,159 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for decoder factory functions.""" + +from absl.testing import parameterized +import tensorflow as tf + +from tensorflow.python.distribute import combinations +from official.vision import configs +from official.vision.configs import decoders as decoders_cfg +from official.vision.modeling import decoders +from official.vision.modeling.decoders import factory + + +class FactoryTest(tf.test.TestCase, parameterized.TestCase): + + @combinations.generate( + combinations.combine( + num_filters=[128, 256], use_separable_conv=[True, False])) + def test_fpn_decoder_creation(self, num_filters, use_separable_conv): + """Test creation of FPN decoder.""" + min_level = 3 + max_level = 7 + input_specs = {} + for level in range(min_level, max_level): + input_specs[str(level)] = tf.TensorShape( + [1, 128 // (2**level), 128 // (2**level), 3]) + + network = decoders.FPN( + input_specs=input_specs, + num_filters=num_filters, + use_separable_conv=use_separable_conv, + use_sync_bn=True) + + model_config = configs.retinanet.RetinaNet() + model_config.min_level = min_level + model_config.max_level = max_level + model_config.num_classes = 10 + model_config.input_size = [None, None, 3] + model_config.decoder = decoders_cfg.Decoder( + type='fpn', + fpn=decoders_cfg.FPN( + num_filters=num_filters, use_separable_conv=use_separable_conv)) + + factory_network = factory.build_decoder( + input_specs=input_specs, model_config=model_config) + + network_config = network.get_config() + factory_network_config = factory_network.get_config() + + self.assertEqual(network_config, 
factory_network_config) + + @combinations.generate( + combinations.combine( + num_filters=[128, 256], + num_repeats=[3, 5], + use_separable_conv=[True, False])) + def test_nasfpn_decoder_creation(self, num_filters, num_repeats, + use_separable_conv): + """Test creation of NASFPN decoder.""" + min_level = 3 + max_level = 7 + input_specs = {} + for level in range(min_level, max_level): + input_specs[str(level)] = tf.TensorShape( + [1, 128 // (2**level), 128 // (2**level), 3]) + + network = decoders.NASFPN( + input_specs=input_specs, + num_filters=num_filters, + num_repeats=num_repeats, + use_separable_conv=use_separable_conv, + use_sync_bn=True) + + model_config = configs.retinanet.RetinaNet() + model_config.min_level = min_level + model_config.max_level = max_level + model_config.num_classes = 10 + model_config.input_size = [None, None, 3] + model_config.decoder = decoders_cfg.Decoder( + type='nasfpn', + nasfpn=decoders_cfg.NASFPN( + num_filters=num_filters, + num_repeats=num_repeats, + use_separable_conv=use_separable_conv)) + + factory_network = factory.build_decoder( + input_specs=input_specs, model_config=model_config) + + network_config = network.get_config() + factory_network_config = factory_network.get_config() + + self.assertEqual(network_config, factory_network_config) + + @combinations.generate( + combinations.combine( + level=[3, 4], + dilation_rates=[[6, 12, 18], [6, 12]], + num_filters=[128, 256])) + def test_aspp_decoder_creation(self, level, dilation_rates, num_filters): + """Test creation of ASPP decoder.""" + input_specs = {'1': tf.TensorShape([1, 128, 128, 3])} + + network = decoders.ASPP( + level=level, + dilation_rates=dilation_rates, + num_filters=num_filters, + use_sync_bn=True) + + model_config = configs.semantic_segmentation.SemanticSegmentationModel() + model_config.num_classes = 10 + model_config.input_size = [None, None, 3] + model_config.decoder = decoders_cfg.Decoder( + type='aspp', + aspp=decoders_cfg.ASPP( + level=level, 
+ dilation_rates=dilation_rates, + num_filters=num_filters)) + + factory_network = factory.build_decoder( + input_specs=input_specs, model_config=model_config) + + network_config = network.get_config() + factory_network_config = factory_network.get_config() + # Due to calling `super().get_config()` in the aspp layer, everything but + # the names of the two layer instances is the same, so we force the names to + # be equal to avoid a false alarm. + factory_network_config['name'] = network_config['name'] + + self.assertEqual(network_config, factory_network_config) + + def test_identity_decoder_creation(self): + """Test creation of identity decoder.""" + model_config = configs.retinanet.RetinaNet() + model_config.num_classes = 2 + model_config.input_size = [None, None, 3] + + model_config.decoder = decoders_cfg.Decoder( + type='identity', identity=decoders_cfg.Identity()) + + factory_network = factory.build_decoder( + input_specs=None, model_config=model_config) + + self.assertIsNone(factory_network) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/modeling/decoders/fpn.py b/official/vision/modeling/decoders/fpn.py new file mode 100644 index 0000000000000000000000000000000000000000..4f3a77aaaff8daf82a3ac3c2f55e22ee275b761d --- /dev/null +++ b/official/vision/modeling/decoders/fpn.py @@ -0,0 +1,256 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+ +"""Contains the definitions of Feature Pyramid Networks (FPN).""" +from typing import Any, Mapping, Optional + +# Import libraries +from absl import logging +import tensorflow as tf + +from official.modeling import hyperparams +from official.modeling import tf_utils +from official.vision.modeling.decoders import factory +from official.vision.ops import spatial_transform_ops + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class FPN(tf.keras.Model): + """Creates a Feature Pyramid Network (FPN). + + This implements the paper: + Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, and + Serge Belongie. + Feature Pyramid Networks for Object Detection. + (https://arxiv.org/pdf/1612.03144) + """ + + def __init__( + self, + input_specs: Mapping[str, tf.TensorShape], + min_level: int = 3, + max_level: int = 7, + num_filters: int = 256, + fusion_type: str = 'sum', + use_separable_conv: bool = False, + use_keras_layer: bool = False, + activation: str = 'relu', + use_sync_bn: bool = False, + norm_momentum: float = 0.99, + norm_epsilon: float = 0.001, + kernel_initializer: str = 'VarianceScaling', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + **kwargs): + """Initializes a Feature Pyramid Network (FPN). + + Args: + input_specs: A `dict` of input specifications. A dictionary consists of + {level: TensorShape} from a backbone. + min_level: An `int` of minimum level in FPN output feature maps. + max_level: An `int` of maximum level in FPN output feature maps. + num_filters: An `int` number of filters in FPN layers. + fusion_type: A `str` of `sum` or `concat`. Whether to perform sum or + concat for feature fusion. + use_separable_conv: A `bool`. If True, use separable convolution for + convolution in FPN layers. + use_keras_layer: A `bool`. If True, use Keras layers wherever possible. 
+ activation: A `str` name of the activation function.
+ use_sync_bn: A `bool`. If True, use synchronized batch normalization. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + kernel_initializer: A `str` name of kernel_initializer for convolutional + layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default is None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + **kwargs: Additional keyword arguments to be passed. + """ + self._config_dict = { + 'input_specs': input_specs, + 'min_level': min_level, + 'max_level': max_level, + 'num_filters': num_filters, + 'fusion_type': fusion_type, + 'use_separable_conv': use_separable_conv, + 'use_keras_layer': use_keras_layer, + 'activation': activation, + 'use_sync_bn': use_sync_bn, + 'norm_momentum': norm_momentum, + 'norm_epsilon': norm_epsilon, + 'kernel_initializer': kernel_initializer, + 'kernel_regularizer': kernel_regularizer, + 'bias_regularizer': bias_regularizer, + } + if use_separable_conv: + conv2d = tf.keras.layers.SeparableConv2D + else: + conv2d = tf.keras.layers.Conv2D + if use_sync_bn: + norm = tf.keras.layers.experimental.SyncBatchNormalization + else: + norm = tf.keras.layers.BatchNormalization + activation_fn = tf_utils.get_activation(activation, use_keras_layer=True) + + # Build input feature pyramid. + if tf.keras.backend.image_data_format() == 'channels_last': + bn_axis = -1 + else: + bn_axis = 1 + + # Get input feature pyramid from backbone. + logging.info('FPN input_specs: %s', input_specs) + inputs = self._build_input_pyramid(input_specs, min_level) + backbone_max_level = min(int(max(inputs.keys())), max_level) + + # Build lateral connections. 
+ feats_lateral = {} + for level in range(min_level, backbone_max_level + 1): + feats_lateral[str(level)] = conv2d( + filters=num_filters, + kernel_size=1, + padding='same', + kernel_initializer=kernel_initializer, + kernel_regularizer=kernel_regularizer, + bias_regularizer=bias_regularizer)( + inputs[str(level)]) + + # Build top-down path. + feats = {str(backbone_max_level): feats_lateral[str(backbone_max_level)]} + for level in range(backbone_max_level - 1, min_level - 1, -1): + feat_a = spatial_transform_ops.nearest_upsampling( + feats[str(level + 1)], 2, use_keras_layer=use_keras_layer) + feat_b = feats_lateral[str(level)] + + if fusion_type == 'sum': + if use_keras_layer: + feats[str(level)] = tf.keras.layers.Add()([feat_a, feat_b]) + else: + feats[str(level)] = feat_a + feat_b + elif fusion_type == 'concat': + if use_keras_layer: + feats[str(level)] = tf.keras.layers.Concatenate(axis=-1)( + [feat_a, feat_b]) + else: + feats[str(level)] = tf.concat([feat_a, feat_b], axis=-1) + else: + raise ValueError('Fusion type {} not supported.'.format(fusion_type)) + + # TODO(xianzhi): consider to remove bias in conv2d. + # Build post-hoc 3x3 convolution kernel. + for level in range(min_level, backbone_max_level + 1): + feats[str(level)] = conv2d( + filters=num_filters, + strides=1, + kernel_size=3, + padding='same', + kernel_initializer=kernel_initializer, + kernel_regularizer=kernel_regularizer, + bias_regularizer=bias_regularizer)( + feats[str(level)]) + + # TODO(xianzhi): consider to remove bias in conv2d. + # Build coarser FPN levels introduced for RetinaNet. 
+ for level in range(backbone_max_level + 1, max_level + 1): + feats_in = feats[str(level - 1)] + if level > backbone_max_level + 1: + feats_in = activation_fn(feats_in) + feats[str(level)] = conv2d( + filters=num_filters, + strides=2, + kernel_size=3, + padding='same', + kernel_initializer=kernel_initializer, + kernel_regularizer=kernel_regularizer, + bias_regularizer=bias_regularizer)( + feats_in) + + # Apply batch norm layers. + for level in range(min_level, max_level + 1): + feats[str(level)] = norm( + axis=bn_axis, momentum=norm_momentum, epsilon=norm_epsilon)( + feats[str(level)]) + + self._output_specs = { + str(level): feats[str(level)].get_shape() + for level in range(min_level, max_level + 1) + } + + super(FPN, self).__init__(inputs=inputs, outputs=feats, **kwargs) + + def _build_input_pyramid(self, input_specs: Mapping[str, tf.TensorShape], + min_level: int): + assert isinstance(input_specs, dict) + if min(input_specs.keys()) > str(min_level): + raise ValueError( + 'Backbone min level should be less than or equal to FPN min level') + + inputs = {} + for level, spec in input_specs.items(): + inputs[level] = tf.keras.Input(shape=spec[1:]) + return inputs + + def get_config(self) -> Mapping[str, Any]: + return self._config_dict + + @classmethod + def from_config(cls, config, custom_objects=None): + return cls(**config) + + @property + def output_specs(self) -> Mapping[str, tf.TensorShape]: + """A dict of {level: TensorShape} pairs for the model output.""" + return self._output_specs + + +@factory.register_decoder_builder('fpn') +def build_fpn_decoder( + input_specs: Mapping[str, tf.TensorShape], + model_config: hyperparams.Config, + l2_regularizer: Optional[tf.keras.regularizers.Regularizer] = None +) -> tf.keras.Model: + """Builds FPN decoder from a config. + + Args: + input_specs: A `dict` of input specifications. A dictionary consisting of + {level: TensorShape} from a backbone. 
+ model_config: A `OneOfConfig` model config.
+ l2_regularizer: A `tf.keras.regularizers.Regularizer` instance. Defaults to + None. + + Returns: + A `tf.keras.Model` instance of the FPN decoder. + + Raises: + ValueError: If the model_config.decoder.type is not `fpn`. + """ + decoder_type = model_config.decoder.type + decoder_cfg = model_config.decoder.get() + if decoder_type != 'fpn': + raise ValueError(f'Inconsistent decoder type {decoder_type}. ' + 'Must be `fpn`.') + norm_activation_config = model_config.norm_activation + return FPN( + input_specs=input_specs, + min_level=model_config.min_level, + max_level=model_config.max_level, + num_filters=decoder_cfg.num_filters, + fusion_type=decoder_cfg.fusion_type, + use_separable_conv=decoder_cfg.use_separable_conv, + use_keras_layer=decoder_cfg.use_keras_layer, + activation=norm_activation_config.activation, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon, + kernel_regularizer=l2_regularizer) diff --git a/official/vision/beta/modeling/decoders/fpn_test.py b/official/vision/modeling/decoders/fpn_test.py similarity index 79% rename from official/vision/beta/modeling/decoders/fpn_test.py rename to official/vision/modeling/decoders/fpn_test.py index 1aef30011abac19295d4bc3e7afa39ca68be6ae6..483efd21d148eacb31581f1fadfd37c1ad567936 100644 --- a/official/vision/beta/modeling/decoders/fpn_test.py +++ b/official/vision/modeling/decoders/fpn_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,26 +12,27 @@ # See the License for the specific language governing permissions and # limitations under the License.
-# Lint as: python3 """Tests for FPN.""" # Import libraries from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.modeling.backbones import mobilenet -from official.vision.beta.modeling.backbones import resnet -from official.vision.beta.modeling.decoders import fpn +from official.vision.modeling.backbones import mobilenet +from official.vision.modeling.backbones import resnet +from official.vision.modeling.decoders import fpn class FPNTest(parameterized.TestCase, tf.test.TestCase): @parameterized.parameters( - (256, 3, 7, False, 'sum'), - (256, 3, 7, True, 'concat'), + (256, 3, 7, False, False, 'sum'), + (256, 3, 7, False, True, 'sum'), + (256, 3, 7, True, False, 'concat'), + (256, 3, 7, True, True, 'concat'), ) def test_network_creation(self, input_size, min_level, max_level, - use_separable_conv, fusion_type): + use_separable_conv, use_keras_layer, fusion_type): """Test creation of FPN.""" tf.keras.backend.set_image_data_format('channels_last') @@ -43,7 +44,8 @@ class FPNTest(parameterized.TestCase, tf.test.TestCase): min_level=min_level, max_level=max_level, fusion_type=fusion_type, - use_separable_conv=use_separable_conv) + use_separable_conv=use_separable_conv, + use_keras_layer=use_keras_layer) endpoints = backbone(inputs) feats = network(endpoints) @@ -55,11 +57,14 @@ class FPNTest(parameterized.TestCase, tf.test.TestCase): feats[str(level)].shape.as_list()) @parameterized.parameters( - (256, 3, 7, False), - (256, 3, 7, True), + (256, 3, 7, False, False), + (256, 3, 7, False, True), + (256, 3, 7, True, False), + (256, 3, 7, True, True), ) def test_network_creation_with_mobilenet(self, input_size, min_level, - max_level, use_separable_conv): + max_level, use_separable_conv, + use_keras_layer): """Test creation of FPN with mobilenet backbone.""" tf.keras.backend.set_image_data_format('channels_last') @@ -70,7 +75,8 @@ class FPNTest(parameterized.TestCase, tf.test.TestCase): input_specs=backbone.output_specs, 
min_level=min_level, max_level=max_level, - use_separable_conv=use_separable_conv) + use_separable_conv=use_separable_conv, + use_keras_layer=use_keras_layer) endpoints = backbone(inputs) feats = network(endpoints) @@ -90,6 +96,7 @@ class FPNTest(parameterized.TestCase, tf.test.TestCase): num_filters=256, fusion_type='sum', use_separable_conv=False, + use_keras_layer=False, use_sync_bn=False, activation='relu', norm_momentum=0.99, diff --git a/official/vision/beta/modeling/decoders/nasfpn.py b/official/vision/modeling/decoders/nasfpn.py similarity index 98% rename from official/vision/beta/modeling/decoders/nasfpn.py rename to official/vision/modeling/decoders/nasfpn.py index b09e074c8f42638b2015dba334f67ebde25afef2..ff8ca0aeb481ef9641edcd449fb3da0004095334 100644 --- a/official/vision/beta/modeling/decoders/nasfpn.py +++ b/official/vision/modeling/decoders/nasfpn.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -23,8 +23,8 @@ import tensorflow as tf from official.modeling import hyperparams from official.modeling import tf_utils -from official.vision.beta.modeling.decoders import factory -from official.vision.beta.ops import spatial_transform_ops +from official.vision.modeling.decoders import factory +from official.vision.ops import spatial_transform_ops # The fixed NAS-FPN architecture discovered by NAS. 
@@ -135,25 +135,6 @@ class NASFPN(tf.keras.Model): self._conv_op = (tf.keras.layers.SeparableConv2D if self._config_dict['use_separable_conv'] else tf.keras.layers.Conv2D) - if self._config_dict['use_separable_conv']: - self._conv_kwargs = { - 'depthwise_initializer': tf.keras.initializers.VarianceScaling( - scale=2, mode='fan_out', distribution='untruncated_normal'), - 'pointwise_initializer': tf.keras.initializers.VarianceScaling( - scale=2, mode='fan_out', distribution='untruncated_normal'), - 'bias_initializer': tf.zeros_initializer(), - 'depthwise_regularizer': self._config_dict['kernel_regularizer'], - 'pointwise_regularizer': self._config_dict['kernel_regularizer'], - 'bias_regularizer': self._config_dict['bias_regularizer'], - } - else: - self._conv_kwargs = { - 'kernel_initializer': tf.keras.initializers.VarianceScaling( - scale=2, mode='fan_out', distribution='untruncated_normal'), - 'bias_initializer': tf.zeros_initializer(), - 'kernel_regularizer': self._config_dict['kernel_regularizer'], - 'bias_regularizer': self._config_dict['bias_regularizer'], - } self._norm_op = (tf.keras.layers.experimental.SyncBatchNormalization if self._config_dict['use_sync_bn'] else tf.keras.layers.BatchNormalization) @@ -240,6 +221,28 @@ class NASFPN(tf.keras.Model): else: return x + @property + def _conv_kwargs(self): + if self._config_dict['use_separable_conv']: + return { + 'depthwise_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'pointwise_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'bias_initializer': tf.zeros_initializer(), + 'depthwise_regularizer': self._config_dict['kernel_regularizer'], + 'pointwise_regularizer': self._config_dict['kernel_regularizer'], + 'bias_regularizer': self._config_dict['bias_regularizer'], + } + else: + return { + 'kernel_initializer': tf.keras.initializers.VarianceScaling( + scale=2, 
mode='fan_out', distribution='untruncated_normal'), + 'bias_initializer': tf.zeros_initializer(), + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + 'bias_regularizer': self._config_dict['bias_regularizer'], + } + def _global_attention(self, feat0, feat1): m = tf.math.reduce_max(feat0, axis=[1, 2], keepdims=True) m = tf.math.sigmoid(m) diff --git a/official/vision/beta/modeling/decoders/nasfpn_test.py b/official/vision/modeling/decoders/nasfpn_test.py similarity index 89% rename from official/vision/beta/modeling/decoders/nasfpn_test.py rename to official/vision/modeling/decoders/nasfpn_test.py index c8101281591fcd401b796c6a20e72c7e9c4eaf3e..75c07195188f3cbea13a9107572c6f0acba8658e 100644 --- a/official/vision/beta/modeling/decoders/nasfpn_test.py +++ b/official/vision/modeling/decoders/nasfpn_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,15 +12,14 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for NAS-FPN.""" # Import libraries from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.modeling.backbones import resnet -from official.vision.beta.modeling.decoders import nasfpn +from official.vision.modeling.backbones import resnet +from official.vision.modeling.decoders import nasfpn class NASFPNTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/modeling/factory.py b/official/vision/modeling/factory.py new file mode 100644 index 0000000000000000000000000000000000000000..90fae06874486627a005b574eafa0c1ee8e448e8 --- /dev/null +++ b/official/vision/modeling/factory.py @@ -0,0 +1,388 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Factory methods to build models.""" + +from typing import Optional + +import tensorflow as tf + +from official.vision.configs import image_classification as classification_cfg +from official.vision.configs import maskrcnn as maskrcnn_cfg +from official.vision.configs import retinanet as retinanet_cfg +from official.vision.configs import semantic_segmentation as segmentation_cfg +from official.vision.modeling import backbones +from official.vision.modeling import classification_model +from official.vision.modeling import decoders +from official.vision.modeling import maskrcnn_model +from official.vision.modeling import retinanet_model +from official.vision.modeling import segmentation_model +from official.vision.modeling.heads import dense_prediction_heads +from official.vision.modeling.heads import instance_heads +from official.vision.modeling.heads import segmentation_heads +from official.vision.modeling.layers import detection_generator +from official.vision.modeling.layers import mask_sampler +from official.vision.modeling.layers import roi_aligner +from official.vision.modeling.layers import roi_generator +from official.vision.modeling.layers import roi_sampler + + +def build_classification_model( + input_specs: tf.keras.layers.InputSpec, + model_config: classification_cfg.ImageClassificationModel, + l2_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + skip_logits_layer: bool = False, + 
backbone: Optional[tf.keras.Model] = None) -> tf.keras.Model: + """Builds the classification model.""" + norm_activation_config = model_config.norm_activation + if not backbone: + backbone = backbones.factory.build_backbone( + input_specs=input_specs, + backbone_config=model_config.backbone, + norm_activation_config=norm_activation_config, + l2_regularizer=l2_regularizer) + + model = classification_model.ClassificationModel( + backbone=backbone, + num_classes=model_config.num_classes, + input_specs=input_specs, + dropout_rate=model_config.dropout_rate, + kernel_initializer=model_config.kernel_initializer, + kernel_regularizer=l2_regularizer, + add_head_batch_norm=model_config.add_head_batch_norm, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon, + skip_logits_layer=skip_logits_layer) + return model + + +def build_maskrcnn(input_specs: tf.keras.layers.InputSpec, + model_config: maskrcnn_cfg.MaskRCNN, + l2_regularizer: Optional[ + tf.keras.regularizers.Regularizer] = None, + backbone: Optional[tf.keras.Model] = None, + decoder: Optional[tf.keras.Model] = None) -> tf.keras.Model: + """Builds Mask R-CNN model.""" + norm_activation_config = model_config.norm_activation + if not backbone: + backbone = backbones.factory.build_backbone( + input_specs=input_specs, + backbone_config=model_config.backbone, + norm_activation_config=norm_activation_config, + l2_regularizer=l2_regularizer) + backbone_features = backbone(tf.keras.Input(input_specs.shape[1:])) + + if not decoder: + decoder = decoders.factory.build_decoder( + input_specs=backbone.output_specs, + model_config=model_config, + l2_regularizer=l2_regularizer) + + rpn_head_config = model_config.rpn_head + roi_generator_config = model_config.roi_generator + roi_sampler_config = model_config.roi_sampler + roi_aligner_config = model_config.roi_aligner + detection_head_config = model_config.detection_head + 
generator_config = model_config.detection_generator + num_anchors_per_location = ( + len(model_config.anchor.aspect_ratios) * model_config.anchor.num_scales) + + rpn_head = dense_prediction_heads.RPNHead( + min_level=model_config.min_level, + max_level=model_config.max_level, + num_anchors_per_location=num_anchors_per_location, + num_convs=rpn_head_config.num_convs, + num_filters=rpn_head_config.num_filters, + use_separable_conv=rpn_head_config.use_separable_conv, + activation=norm_activation_config.activation, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon, + kernel_regularizer=l2_regularizer) + + detection_head = instance_heads.DetectionHead( + num_classes=model_config.num_classes, + num_convs=detection_head_config.num_convs, + num_filters=detection_head_config.num_filters, + use_separable_conv=detection_head_config.use_separable_conv, + num_fcs=detection_head_config.num_fcs, + fc_dims=detection_head_config.fc_dims, + class_agnostic_bbox_pred=detection_head_config.class_agnostic_bbox_pred, + activation=norm_activation_config.activation, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon, + kernel_regularizer=l2_regularizer, + name='detection_head') + + if decoder: + decoder_features = decoder(backbone_features) + rpn_head(decoder_features) + + if roi_sampler_config.cascade_iou_thresholds: + detection_head_cascade = [detection_head] + for cascade_num in range(len(roi_sampler_config.cascade_iou_thresholds)): + detection_head = instance_heads.DetectionHead( + num_classes=model_config.num_classes, + num_convs=detection_head_config.num_convs, + num_filters=detection_head_config.num_filters, + use_separable_conv=detection_head_config.use_separable_conv, + num_fcs=detection_head_config.num_fcs, + fc_dims=detection_head_config.fc_dims, + 
+ class_agnostic_bbox_pred=detection_head_config + .class_agnostic_bbox_pred, + activation=norm_activation_config.activation, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon, + kernel_regularizer=l2_regularizer, + name='detection_head_{}'.format(cascade_num + 1)) + + detection_head_cascade.append(detection_head) + detection_head = detection_head_cascade + + roi_generator_obj = roi_generator.MultilevelROIGenerator( + pre_nms_top_k=roi_generator_config.pre_nms_top_k, + pre_nms_score_threshold=roi_generator_config.pre_nms_score_threshold, + pre_nms_min_size_threshold=( + roi_generator_config.pre_nms_min_size_threshold), + nms_iou_threshold=roi_generator_config.nms_iou_threshold, + num_proposals=roi_generator_config.num_proposals, + test_pre_nms_top_k=roi_generator_config.test_pre_nms_top_k, + test_pre_nms_score_threshold=( + roi_generator_config.test_pre_nms_score_threshold), + test_pre_nms_min_size_threshold=( + roi_generator_config.test_pre_nms_min_size_threshold), + test_nms_iou_threshold=roi_generator_config.test_nms_iou_threshold, + test_num_proposals=roi_generator_config.test_num_proposals, + use_batched_nms=roi_generator_config.use_batched_nms) + + roi_sampler_cascade = [] + roi_sampler_obj = roi_sampler.ROISampler( + mix_gt_boxes=roi_sampler_config.mix_gt_boxes, + num_sampled_rois=roi_sampler_config.num_sampled_rois, + foreground_fraction=roi_sampler_config.foreground_fraction, + foreground_iou_threshold=roi_sampler_config.foreground_iou_threshold, + background_iou_high_threshold=( + roi_sampler_config.background_iou_high_threshold), + background_iou_low_threshold=( + roi_sampler_config.background_iou_low_threshold)) + roi_sampler_cascade.append(roi_sampler_obj) + # Initialize additional roi samplers for cascade heads.
+ if roi_sampler_config.cascade_iou_thresholds: + for iou in roi_sampler_config.cascade_iou_thresholds: + roi_sampler_obj = roi_sampler.ROISampler( + mix_gt_boxes=False, + num_sampled_rois=roi_sampler_config.num_sampled_rois, + foreground_iou_threshold=iou, + background_iou_high_threshold=iou, + background_iou_low_threshold=0.0, + skip_subsampling=True) + roi_sampler_cascade.append(roi_sampler_obj) + + roi_aligner_obj = roi_aligner.MultilevelROIAligner( + crop_size=roi_aligner_config.crop_size, + sample_offset=roi_aligner_config.sample_offset) + + detection_generator_obj = detection_generator.DetectionGenerator( + apply_nms=generator_config.apply_nms, + pre_nms_top_k=generator_config.pre_nms_top_k, + pre_nms_score_threshold=generator_config.pre_nms_score_threshold, + nms_iou_threshold=generator_config.nms_iou_threshold, + max_num_detections=generator_config.max_num_detections, + nms_version=generator_config.nms_version, + use_cpu_nms=generator_config.use_cpu_nms, + soft_nms_sigma=generator_config.soft_nms_sigma) + + if model_config.include_mask: + mask_head = instance_heads.MaskHead( + num_classes=model_config.num_classes, + upsample_factor=model_config.mask_head.upsample_factor, + num_convs=model_config.mask_head.num_convs, + num_filters=model_config.mask_head.num_filters, + use_separable_conv=model_config.mask_head.use_separable_conv, + activation=model_config.norm_activation.activation, + norm_momentum=model_config.norm_activation.norm_momentum, + norm_epsilon=model_config.norm_activation.norm_epsilon, + kernel_regularizer=l2_regularizer, + class_agnostic=model_config.mask_head.class_agnostic) + + mask_sampler_obj = mask_sampler.MaskSampler( + mask_target_size=( + model_config.mask_roi_aligner.crop_size * + model_config.mask_head.upsample_factor), + num_sampled_masks=model_config.mask_sampler.num_sampled_masks) + + mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner( + crop_size=model_config.mask_roi_aligner.crop_size, + 
sample_offset=model_config.mask_roi_aligner.sample_offset) + else: + mask_head = None + mask_sampler_obj = None + mask_roi_aligner_obj = None + + model = maskrcnn_model.MaskRCNNModel( + backbone=backbone, + decoder=decoder, + rpn_head=rpn_head, + detection_head=detection_head, + roi_generator=roi_generator_obj, + roi_sampler=roi_sampler_cascade, + roi_aligner=roi_aligner_obj, + detection_generator=detection_generator_obj, + mask_head=mask_head, + mask_sampler=mask_sampler_obj, + mask_roi_aligner=mask_roi_aligner_obj, + class_agnostic_bbox_pred=detection_head_config.class_agnostic_bbox_pred, + cascade_class_ensemble=detection_head_config.cascade_class_ensemble, + min_level=model_config.min_level, + max_level=model_config.max_level, + num_scales=model_config.anchor.num_scales, + aspect_ratios=model_config.anchor.aspect_ratios, + anchor_size=model_config.anchor.anchor_size) + return model + + +def build_retinanet( + input_specs: tf.keras.layers.InputSpec, + model_config: retinanet_cfg.RetinaNet, + l2_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + backbone: Optional[tf.keras.Model] = None, + decoder: Optional[tf.keras.Model] = None +) -> tf.keras.Model: + """Builds RetinaNet model.""" + norm_activation_config = model_config.norm_activation + if not backbone: + backbone = backbones.factory.build_backbone( + input_specs=input_specs, + backbone_config=model_config.backbone, + norm_activation_config=norm_activation_config, + l2_regularizer=l2_regularizer) + backbone_features = backbone(tf.keras.Input(input_specs.shape[1:])) + + if not decoder: + decoder = decoders.factory.build_decoder( + input_specs=backbone.output_specs, + model_config=model_config, + l2_regularizer=l2_regularizer) + + head_config = model_config.head + generator_config = model_config.detection_generator + num_anchors_per_location = ( + len(model_config.anchor.aspect_ratios) * model_config.anchor.num_scales) + + head = dense_prediction_heads.RetinaNetHead( + 
min_level=model_config.min_level, + max_level=model_config.max_level, + num_classes=model_config.num_classes, + num_anchors_per_location=num_anchors_per_location, + num_convs=head_config.num_convs, + num_filters=head_config.num_filters, + attribute_heads=[ + cfg.as_dict() for cfg in (head_config.attribute_heads or []) + ], + share_classification_heads=head_config.share_classification_heads, + use_separable_conv=head_config.use_separable_conv, + activation=norm_activation_config.activation, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon, + kernel_regularizer=l2_regularizer) + + # Builds decoder and head so that their trainable weights are initialized + if decoder: + decoder_features = decoder(backbone_features) + _ = head(decoder_features) + + detection_generator_obj = detection_generator.MultilevelDetectionGenerator( + apply_nms=generator_config.apply_nms, + pre_nms_top_k=generator_config.pre_nms_top_k, + pre_nms_score_threshold=generator_config.pre_nms_score_threshold, + nms_iou_threshold=generator_config.nms_iou_threshold, + max_num_detections=generator_config.max_num_detections, + nms_version=generator_config.nms_version, + use_cpu_nms=generator_config.use_cpu_nms, + soft_nms_sigma=generator_config.soft_nms_sigma, + tflite_post_processing_config=generator_config.tflite_post_processing + .as_dict()) + + model = retinanet_model.RetinaNetModel( + backbone, + decoder, + head, + detection_generator_obj, + min_level=model_config.min_level, + max_level=model_config.max_level, + num_scales=model_config.anchor.num_scales, + aspect_ratios=model_config.anchor.aspect_ratios, + anchor_size=model_config.anchor.anchor_size) + return model + + +def build_segmentation_model( + input_specs: tf.keras.layers.InputSpec, + model_config: segmentation_cfg.SemanticSegmentationModel, + l2_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + backbone: 
Optional[tf.keras.Model] = None, + decoder: Optional[tf.keras.Model] = None +) -> tf.keras.Model: + """Builds Segmentation model.""" + norm_activation_config = model_config.norm_activation + if not backbone: + backbone = backbones.factory.build_backbone( + input_specs=input_specs, + backbone_config=model_config.backbone, + norm_activation_config=norm_activation_config, + l2_regularizer=l2_regularizer) + + if not decoder: + decoder = decoders.factory.build_decoder( + input_specs=backbone.output_specs, + model_config=model_config, + l2_regularizer=l2_regularizer) + + head_config = model_config.head + + head = segmentation_heads.SegmentationHead( + num_classes=model_config.num_classes, + level=head_config.level, + num_convs=head_config.num_convs, + prediction_kernel_size=head_config.prediction_kernel_size, + num_filters=head_config.num_filters, + use_depthwise_convolution=head_config.use_depthwise_convolution, + upsample_factor=head_config.upsample_factor, + feature_fusion=head_config.feature_fusion, + low_level=head_config.low_level, + low_level_num_filters=head_config.low_level_num_filters, + activation=norm_activation_config.activation, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon, + kernel_regularizer=l2_regularizer) + + mask_scoring_head = None + if model_config.mask_scoring_head: + mask_scoring_head = segmentation_heads.MaskScoring( + num_classes=model_config.num_classes, + **model_config.mask_scoring_head.as_dict(), + activation=norm_activation_config.activation, + use_sync_bn=norm_activation_config.use_sync_bn, + norm_momentum=norm_activation_config.norm_momentum, + norm_epsilon=norm_activation_config.norm_epsilon, + kernel_regularizer=l2_regularizer) + + model = segmentation_model.SegmentationModel( + backbone, decoder, head, mask_scoring_head=mask_scoring_head) + return model diff --git a/official/vision/beta/modeling/factory_3d.py 
b/official/vision/modeling/factory_3d.py similarity index 92% rename from official/vision/beta/modeling/factory_3d.py rename to official/vision/modeling/factory_3d.py index 01d8052eae1955ced16c3fe4b2421e27879fc27b..f9e254a3dcc3dd005a3ac264e20e8df01bf4f17a 100644 --- a/official/vision/beta/modeling/factory_3d.py +++ b/official/vision/modeling/factory_3d.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -18,9 +18,9 @@ import tensorflow as tf from official.core import registry -from official.vision.beta.configs import video_classification as video_classification_cfg -from official.vision.beta.modeling import video_classification_model -from official.vision.beta.modeling import backbones +from official.vision.configs import video_classification as video_classification_cfg +from official.vision.modeling import video_classification_model +from official.vision.modeling import backbones _REGISTERED_MODEL_CLS = {} diff --git a/official/vision/modeling/factory_test.py b/official/vision/modeling/factory_test.py new file mode 100644 index 0000000000000000000000000000000000000000..c7b1395542e702160ad7786ce1978422f60a6c09 --- /dev/null +++ b/official/vision/modeling/factory_test.py @@ -0,0 +1,131 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for factory.py.""" + +# Import libraries +from absl.testing import parameterized +import tensorflow as tf + +from official.vision.configs import backbones +from official.vision.configs import backbones_3d +from official.vision.configs import image_classification as classification_cfg +from official.vision.configs import maskrcnn as maskrcnn_cfg +from official.vision.configs import retinanet as retinanet_cfg +from official.vision.configs import video_classification as video_classification_cfg +from official.vision.modeling import factory +from official.vision.modeling import factory_3d + + +class ClassificationModelBuilderTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + ('resnet', (224, 224), 5e-5), + ('resnet', (224, 224), None), + ('resnet', (None, None), 5e-5), + ('resnet', (None, None), None), + ) + def test_builder(self, backbone_type, input_size, weight_decay): + num_classes = 2 + input_specs = tf.keras.layers.InputSpec( + shape=[None, input_size[0], input_size[1], 3]) + model_config = classification_cfg.ImageClassificationModel( + num_classes=num_classes, + backbone=backbones.Backbone(type=backbone_type)) + l2_regularizer = ( + tf.keras.regularizers.l2(weight_decay) if weight_decay else None) + _ = factory.build_classification_model( + input_specs=input_specs, + model_config=model_config, + l2_regularizer=l2_regularizer) + + +class MaskRCNNBuilderTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + ('resnet', (640, 640)), + ('resnet', (None, None)), + ) + def test_builder(self, backbone_type, input_size): + num_classes = 2 + input_specs = tf.keras.layers.InputSpec( + shape=[None, input_size[0], input_size[1], 3]) + model_config = maskrcnn_cfg.MaskRCNN( + num_classes=num_classes, + backbone=backbones.Backbone(type=backbone_type)) + l2_regularizer = tf.keras.regularizers.l2(5e-5) + _ = 
factory.build_maskrcnn( + input_specs=input_specs, + model_config=model_config, + l2_regularizer=l2_regularizer) + + +class RetinaNetBuilderTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + ('resnet', (640, 640), False), + ('resnet', (None, None), True), + ) + def test_builder(self, backbone_type, input_size, has_att_heads): + num_classes = 2 + input_specs = tf.keras.layers.InputSpec( + shape=[None, input_size[0], input_size[1], 3]) + if has_att_heads: + attribute_heads_config = [ + retinanet_cfg.AttributeHead(name='att1'), + retinanet_cfg.AttributeHead( + name='att2', type='classification', size=2), + ] + else: + attribute_heads_config = None + model_config = retinanet_cfg.RetinaNet( + num_classes=num_classes, + backbone=backbones.Backbone(type=backbone_type), + head=retinanet_cfg.RetinaNetHead( + attribute_heads=attribute_heads_config)) + l2_regularizer = tf.keras.regularizers.l2(5e-5) + _ = factory.build_retinanet( + input_specs=input_specs, + model_config=model_config, + l2_regularizer=l2_regularizer) + if has_att_heads: + self.assertEqual(model_config.head.attribute_heads[0].as_dict(), + dict(name='att1', type='regression', size=1)) + self.assertEqual(model_config.head.attribute_heads[1].as_dict(), + dict(name='att2', type='classification', size=2)) + + +class VideoClassificationModelBuilderTest(parameterized.TestCase, + tf.test.TestCase): + + @parameterized.parameters( + ('resnet_3d', (8, 224, 224), 5e-5), + ('resnet_3d', (None, None, None), 5e-5), + ) + def test_builder(self, backbone_type, input_size, weight_decay): + input_specs = tf.keras.layers.InputSpec( + shape=[None, input_size[0], input_size[1], input_size[2], 3]) + model_config = video_classification_cfg.VideoClassificationModel( + backbone=backbones_3d.Backbone3D(type=backbone_type)) + l2_regularizer = ( + tf.keras.regularizers.l2(weight_decay) if weight_decay else None) + _ = factory_3d.build_video_classification_model( + input_specs=input_specs, + 
model_config=model_config, + num_classes=2, + l2_regularizer=l2_regularizer) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/modeling/heads/__init__.py b/official/vision/modeling/heads/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..1b5746dae9766775dfb51210a3fc676f8d630a6d --- /dev/null +++ b/official/vision/modeling/heads/__init__.py @@ -0,0 +1,21 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Heads package definition.""" + +from official.vision.modeling.heads.dense_prediction_heads import RetinaNetHead +from official.vision.modeling.heads.dense_prediction_heads import RPNHead +from official.vision.modeling.heads.instance_heads import DetectionHead +from official.vision.modeling.heads.instance_heads import MaskHead +from official.vision.modeling.heads.segmentation_heads import SegmentationHead diff --git a/official/vision/modeling/heads/dense_prediction_heads.py b/official/vision/modeling/heads/dense_prediction_heads.py new file mode 100644 index 0000000000000000000000000000000000000000..4057a434d99df146ed989171ed7cb6b3e58f0af4 --- /dev/null +++ b/official/vision/modeling/heads/dense_prediction_heads.py @@ -0,0 +1,540 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +#     http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains definitions of dense prediction heads.""" + +from typing import Any, Dict, List, Mapping, Optional, Union + +# Import libraries + +import numpy as np +import tensorflow as tf + +from official.modeling import tf_utils + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class RetinaNetHead(tf.keras.layers.Layer): +  """Creates a RetinaNet head.""" + +  def __init__( +      self, +      min_level: int, +      max_level: int, +      num_classes: int, +      num_anchors_per_location: int, +      num_convs: int = 4, +      num_filters: int = 256, +      attribute_heads: Optional[List[Dict[str, Any]]] = None, +      share_classification_heads: bool = False, +      use_separable_conv: bool = False, +      activation: str = 'relu', +      use_sync_bn: bool = False, +      norm_momentum: float = 0.99, +      norm_epsilon: float = 0.001, +      kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, +      bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, +      num_params_per_anchor: int = 4, +      **kwargs): +    """Initializes a RetinaNet head. + +    Args: +      min_level: An `int` number of minimum feature level. +      max_level: An `int` number of maximum feature level. +      num_classes: An `int` number of classes to predict. +      num_anchors_per_location: An `int` number of anchors per pixel +        location. +      num_convs: An `int` number that represents the number of the intermediate +        conv layers before the prediction. 
+      num_filters: An `int` number that represents the number of filters of the +        intermediate conv layers.
+      attribute_heads: If not None, a list that contains a dict for each +        additional attribute head. Each dict consists of 3 key-value pairs: +        `name`, `type` ('regression' or 'classification'), and `size` (number +        of predicted values for each instance). +      share_classification_heads: A `bool` that indicates whether to share +        weights among the main and attribute classification heads. +      use_separable_conv: A `bool` that indicates whether separable +        convolution layers are used. +      activation: A `str` that indicates which activation is used, e.g. 'relu', +        'swish', etc. +      use_sync_bn: A `bool` that indicates whether to use synchronized batch +        normalization across different replicas. +      norm_momentum: A `float` of normalization momentum for the moving average. +      norm_epsilon: A `float` added to variance to avoid dividing by zero. +      kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for +        Conv2D. Default is None. +      bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. +      num_params_per_anchor: Number of parameters required to specify an anchor +        box. For example, `num_params_per_anchor` would be 4 for axis-aligned +        anchor boxes specified by their y-centers, x-centers, heights, and +        widths. +      **kwargs: Additional keyword arguments to be passed.
+ """ + super(RetinaNetHead, self).__init__(**kwargs) + self._config_dict = { + 'min_level': min_level, + 'max_level': max_level, + 'num_classes': num_classes, + 'num_anchors_per_location': num_anchors_per_location, + 'num_convs': num_convs, + 'num_filters': num_filters, + 'attribute_heads': attribute_heads, + 'share_classification_heads': share_classification_heads, + 'use_separable_conv': use_separable_conv, + 'activation': activation, + 'use_sync_bn': use_sync_bn, + 'norm_momentum': norm_momentum, + 'norm_epsilon': norm_epsilon, + 'kernel_regularizer': kernel_regularizer, + 'bias_regularizer': bias_regularizer, + 'num_params_per_anchor': num_params_per_anchor, + } + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + self._activation = tf_utils.get_activation(activation) + + def build(self, input_shape: Union[tf.TensorShape, List[tf.TensorShape]]): + """Creates the variables of the head.""" + conv_op = (tf.keras.layers.SeparableConv2D + if self._config_dict['use_separable_conv'] + else tf.keras.layers.Conv2D) + conv_kwargs = { + 'filters': self._config_dict['num_filters'], + 'kernel_size': 3, + 'padding': 'same', + 'bias_initializer': tf.zeros_initializer(), + 'bias_regularizer': self._config_dict['bias_regularizer'], + } + if not self._config_dict['use_separable_conv']: + conv_kwargs.update({ + 'kernel_initializer': tf.keras.initializers.RandomNormal( + stddev=0.01), + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + }) + bn_op = (tf.keras.layers.experimental.SyncBatchNormalization + if self._config_dict['use_sync_bn'] + else tf.keras.layers.BatchNormalization) + bn_kwargs = { + 'axis': self._bn_axis, + 'momentum': self._config_dict['norm_momentum'], + 'epsilon': self._config_dict['norm_epsilon'], + } + + # Class net. 
+ self._cls_convs = [] + self._cls_norms = [] + for level in range( + self._config_dict['min_level'], self._config_dict['max_level'] + 1): + this_level_cls_norms = [] + for i in range(self._config_dict['num_convs']): + if level == self._config_dict['min_level']: + cls_conv_name = 'classnet-conv_{}'.format(i) + if 'kernel_initializer' in conv_kwargs: + conv_kwargs['kernel_initializer'] = tf_utils.clone_initializer( + conv_kwargs['kernel_initializer']) + self._cls_convs.append(conv_op(name=cls_conv_name, **conv_kwargs)) + cls_norm_name = 'classnet-conv-norm_{}_{}'.format(level, i) + this_level_cls_norms.append(bn_op(name=cls_norm_name, **bn_kwargs)) + self._cls_norms.append(this_level_cls_norms) + + classifier_kwargs = { + 'filters': ( + self._config_dict['num_classes'] * + self._config_dict['num_anchors_per_location']), + 'kernel_size': 3, + 'padding': 'same', + 'bias_initializer': tf.constant_initializer(-np.log((1 - 0.01) / 0.01)), + 'bias_regularizer': self._config_dict['bias_regularizer'], + } + if not self._config_dict['use_separable_conv']: + classifier_kwargs.update({ + 'kernel_initializer': tf.keras.initializers.RandomNormal(stddev=1e-5), + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + }) + self._classifier = conv_op(name='scores', **classifier_kwargs) + + # Box net. 
+ self._box_convs = [] + self._box_norms = [] + for level in range( + self._config_dict['min_level'], self._config_dict['max_level'] + 1): + this_level_box_norms = [] + for i in range(self._config_dict['num_convs']): + if level == self._config_dict['min_level']: + box_conv_name = 'boxnet-conv_{}'.format(i) + if 'kernel_initializer' in conv_kwargs: + conv_kwargs['kernel_initializer'] = tf_utils.clone_initializer( + conv_kwargs['kernel_initializer']) + self._box_convs.append(conv_op(name=box_conv_name, **conv_kwargs)) + box_norm_name = 'boxnet-conv-norm_{}_{}'.format(level, i) + this_level_box_norms.append(bn_op(name=box_norm_name, **bn_kwargs)) + self._box_norms.append(this_level_box_norms) + + box_regressor_kwargs = { + 'filters': (self._config_dict['num_params_per_anchor'] * + self._config_dict['num_anchors_per_location']), + 'kernel_size': 3, + 'padding': 'same', + 'bias_initializer': tf.zeros_initializer(), + 'bias_regularizer': self._config_dict['bias_regularizer'], + } + if not self._config_dict['use_separable_conv']: + box_regressor_kwargs.update({ + 'kernel_initializer': tf.keras.initializers.RandomNormal( + stddev=1e-5), + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + }) + self._box_regressor = conv_op(name='boxes', **box_regressor_kwargs) + + # Attribute learning nets. + if self._config_dict['attribute_heads']: + self._att_predictors = {} + self._att_convs = {} + self._att_norms = {} + + for att_config in self._config_dict['attribute_heads']: + att_name = att_config['name'] + att_type = att_config['type'] + att_size = att_config['size'] + att_convs_i = [] + att_norms_i = [] + + # Build conv and norm layers. 
+ for level in range(self._config_dict['min_level'], + self._config_dict['max_level'] + 1): + this_level_att_norms = [] + for i in range(self._config_dict['num_convs']): + if level == self._config_dict['min_level']: + att_conv_name = '{}-conv_{}'.format(att_name, i) + if 'kernel_initializer' in conv_kwargs: + conv_kwargs['kernel_initializer'] = tf_utils.clone_initializer( + conv_kwargs['kernel_initializer']) + att_convs_i.append(conv_op(name=att_conv_name, **conv_kwargs)) + att_norm_name = '{}-conv-norm_{}_{}'.format(att_name, level, i) + this_level_att_norms.append(bn_op(name=att_norm_name, **bn_kwargs)) + att_norms_i.append(this_level_att_norms) + self._att_convs[att_name] = att_convs_i + self._att_norms[att_name] = att_norms_i + + # Build the final prediction layer. + att_predictor_kwargs = { + 'filters': + (att_size * self._config_dict['num_anchors_per_location']), + 'kernel_size': 3, + 'padding': 'same', + 'bias_initializer': tf.zeros_initializer(), + 'bias_regularizer': self._config_dict['bias_regularizer'], + } + if att_type == 'regression': + att_predictor_kwargs.update( + {'bias_initializer': tf.zeros_initializer()}) + elif att_type == 'classification': + att_predictor_kwargs.update({ + 'bias_initializer': + tf.constant_initializer(-np.log((1 - 0.01) / 0.01)) + }) + else: + raise ValueError( + 'Attribute head type {} not supported.'.format(att_type)) + + if not self._config_dict['use_separable_conv']: + att_predictor_kwargs.update({ + 'kernel_initializer': + tf.keras.initializers.RandomNormal(stddev=1e-5), + 'kernel_regularizer': + self._config_dict['kernel_regularizer'], + }) + + self._att_predictors[att_name] = conv_op( + name='{}_attributes'.format(att_name), **att_predictor_kwargs) + + super(RetinaNetHead, self).build(input_shape) + + def call(self, features: Mapping[str, tf.Tensor]): + """Forward pass of the RetinaNet head. + + Args: + features: A `dict` of `tf.Tensor` where + - key: A `str` of the level of the multilevel features. 
+        - values: A `tf.Tensor`, the feature map tensors, whose shape is +          [batch, height_l, width_l, channels]. + +    Returns: +      scores: A `dict` of `tf.Tensor` which includes scores of the predictions. +        - key: A `str` of the level of the multilevel predictions. +        - values: A `tf.Tensor` of the box scores predicted from a particular +          feature level, whose shape is +          [batch, height_l, width_l, num_classes * num_anchors_per_location]. +      boxes: A `dict` of `tf.Tensor` which includes coordinates of the +        predictions. +        - key: A `str` of the level of the multilevel predictions. +        - values: A `tf.Tensor` of the box coordinates predicted from a +          particular feature level, whose shape is +          [batch, height_l, width_l, +           num_params_per_anchor * num_anchors_per_location]. +      attributes: A `dict` of (attribute_name, attribute_prediction). Each +        `attribute_prediction` is a dict of: +        - key: `str`, the level of the multilevel predictions. +        - values: `Tensor`, the attribute predictions from a particular +          feature level, whose shape is +          [batch, height_l, width_l, +          attribute_size * num_anchors_per_location]. +        Can be an empty dictionary if no attribute learning is required. +    """ +    scores = {} +    boxes = {} +    if self._config_dict['attribute_heads']: +      attributes = { +          att_config['name']: {} +          for att_config in self._config_dict['attribute_heads'] +      } +    else: +      attributes = {} + +    for i, level in enumerate( +        range(self._config_dict['min_level'], +              self._config_dict['max_level'] + 1)): +      this_level_features = features[str(level)] + +      # class net. +      x = this_level_features +      for conv, norm in zip(self._cls_convs, self._cls_norms[i]): +        x = conv(x) +        x = norm(x) +        x = self._activation(x) +      classnet_x = x +      scores[str(level)] = self._classifier(classnet_x) + +      # box net. 
+      x = this_level_features +      for conv, norm in zip(self._box_convs, self._box_norms[i]): +        x = conv(x) +        x = norm(x) +        x = self._activation(x) +      boxes[str(level)] = self._box_regressor(x) + +      # attribute nets.
+      if self._config_dict['attribute_heads']: +        for att_config in self._config_dict['attribute_heads']: +          att_name = att_config['name'] +          att_type = att_config['type'] +          if self._config_dict[ +              'share_classification_heads'] and att_type == 'classification': +            attributes[att_name][str(level)] = self._att_predictors[att_name]( +                classnet_x) +          else: +            x = this_level_features +            for conv, norm in zip(self._att_convs[att_name], +                                  self._att_norms[att_name][i]): +              x = conv(x) +              x = norm(x) +              x = self._activation(x) +            attributes[att_name][str(level)] = self._att_predictors[att_name](x) + +    return scores, boxes, attributes + +  def get_config(self): +    return self._config_dict + +  @classmethod +  def from_config(cls, config): +    return cls(**config) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class RPNHead(tf.keras.layers.Layer): +  """Creates a Region Proposal Network (RPN) head.""" + +  def __init__( +      self, +      min_level: int, +      max_level: int, +      num_anchors_per_location: int, +      num_convs: int = 1, +      num_filters: int = 256, +      use_separable_conv: bool = False, +      activation: str = 'relu', +      use_sync_bn: bool = False, +      norm_momentum: float = 0.99, +      norm_epsilon: float = 0.001, +      kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, +      bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, +      **kwargs): +    """Initializes a Region Proposal Network head. + +    Args: +      min_level: An `int` number of minimum feature level. +      max_level: An `int` number of maximum feature level. +      num_anchors_per_location: An `int` number of anchors per pixel +        location. +      num_convs: An `int` number that represents the number of the intermediate +        convolution layers before the prediction. 
+      num_filters: An `int` number that represents the number of filters of the +        intermediate convolution layers. +      use_separable_conv: A `bool` that indicates whether separable +        convolution layers are used.
+ activation: A `str` that indicates which activation is used, e.g. 'relu', + 'swish', etc. + use_sync_bn: A `bool` that indicates whether to use synchronized batch + normalization across different replicas. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default is None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + **kwargs: Additional keyword arguments to be passed. + """ + super(RPNHead, self).__init__(**kwargs) + self._config_dict = { + 'min_level': min_level, + 'max_level': max_level, + 'num_anchors_per_location': num_anchors_per_location, + 'num_convs': num_convs, + 'num_filters': num_filters, + 'use_separable_conv': use_separable_conv, + 'activation': activation, + 'use_sync_bn': use_sync_bn, + 'norm_momentum': norm_momentum, + 'norm_epsilon': norm_epsilon, + 'kernel_regularizer': kernel_regularizer, + 'bias_regularizer': bias_regularizer, + } + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + self._activation = tf_utils.get_activation(activation) + + def build(self, input_shape): + """Creates the variables of the head.""" + conv_op = (tf.keras.layers.SeparableConv2D + if self._config_dict['use_separable_conv'] + else tf.keras.layers.Conv2D) + conv_kwargs = { + 'filters': self._config_dict['num_filters'], + 'kernel_size': 3, + 'padding': 'same', + 'bias_initializer': tf.zeros_initializer(), + 'bias_regularizer': self._config_dict['bias_regularizer'], + } + if not self._config_dict['use_separable_conv']: + conv_kwargs.update({ + 'kernel_initializer': tf.keras.initializers.RandomNormal( + stddev=0.01), + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + }) + bn_op = (tf.keras.layers.experimental.SyncBatchNormalization + if self._config_dict['use_sync_bn'] + else 
tf.keras.layers.BatchNormalization) + bn_kwargs = { + 'axis': self._bn_axis, + 'momentum': self._config_dict['norm_momentum'], + 'epsilon': self._config_dict['norm_epsilon'], + } + + self._convs = [] + self._norms = [] + for level in range( + self._config_dict['min_level'], self._config_dict['max_level'] + 1): + this_level_norms = [] + for i in range(self._config_dict['num_convs']): + if level == self._config_dict['min_level']: + conv_name = 'rpn-conv_{}'.format(i) + if 'kernel_initializer' in conv_kwargs: + conv_kwargs['kernel_initializer'] = tf_utils.clone_initializer( + conv_kwargs['kernel_initializer']) + self._convs.append(conv_op(name=conv_name, **conv_kwargs)) + norm_name = 'rpn-conv-norm_{}_{}'.format(level, i) + this_level_norms.append(bn_op(name=norm_name, **bn_kwargs)) + self._norms.append(this_level_norms) + + classifier_kwargs = { + 'filters': self._config_dict['num_anchors_per_location'], + 'kernel_size': 1, + 'padding': 'valid', + 'bias_initializer': tf.zeros_initializer(), + 'bias_regularizer': self._config_dict['bias_regularizer'], + } + if not self._config_dict['use_separable_conv']: + classifier_kwargs.update({ + 'kernel_initializer': tf.keras.initializers.RandomNormal( + stddev=1e-5), + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + }) + self._classifier = conv_op(name='rpn-scores', **classifier_kwargs) + + box_regressor_kwargs = { + 'filters': 4 * self._config_dict['num_anchors_per_location'], + 'kernel_size': 1, + 'padding': 'valid', + 'bias_initializer': tf.zeros_initializer(), + 'bias_regularizer': self._config_dict['bias_regularizer'], + } + if not self._config_dict['use_separable_conv']: + box_regressor_kwargs.update({ + 'kernel_initializer': tf.keras.initializers.RandomNormal( + stddev=1e-5), + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + }) + self._box_regressor = conv_op(name='rpn-boxes', **box_regressor_kwargs) + + super(RPNHead, self).build(input_shape) + + def call(self, features: Mapping[str, 
tf.Tensor]): +    """Forward pass of the RPN head. + +    Args: +      features: A `dict` of `tf.Tensor` where +        - key: A `str` of the level of the multilevel features. +        - values: A `tf.Tensor`, the feature map tensors, whose shape is [batch, +          height_l, width_l, channels]. + +    Returns: +      scores: A `dict` of `tf.Tensor` which includes scores of the predictions. +        - key: A `str` of the level of the multilevel predictions. +        - values: A `tf.Tensor` of the objectness scores predicted from a +          particular feature level, whose shape is +          [batch, height_l, width_l, num_anchors_per_location]. +      boxes: A `dict` of `tf.Tensor` which includes coordinates of the +        predictions. +        - key: A `str` of the level of the multilevel predictions. +        - values: A `tf.Tensor` of the box coordinates predicted from a +          particular feature level, whose shape is +          [batch, height_l, width_l, 4 * num_anchors_per_location]. +    """ +    scores = {} +    boxes = {} +    for i, level in enumerate( +        range(self._config_dict['min_level'], +              self._config_dict['max_level'] + 1)): +      x = features[str(level)] +      for conv, norm in zip(self._convs, self._norms[i]): +        x = conv(x) +        x = norm(x) +        x = self._activation(x) +      scores[str(level)] = self._classifier(x) +      boxes[str(level)] = self._box_regressor(x) +    return scores, boxes + +  def get_config(self): +    return self._config_dict + +  @classmethod +  def from_config(cls, config): +    return cls(**config) diff --git a/official/vision/modeling/heads/dense_prediction_heads_test.py b/official/vision/modeling/heads/dense_prediction_heads_test.py new file mode 100644 index 0000000000000000000000000000000000000000..8aacce4b30c3d280bd3c03ddb9bad5ac922590ac --- /dev/null +++ b/official/vision/modeling/heads/dense_prediction_heads_test.py @@ -0,0 +1,149 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for dense_prediction_heads.py.""" + +# Import libraries +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from official.vision.modeling.heads import dense_prediction_heads + + +class RetinaNetHeadTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (False, False, False, None, False), + (False, True, False, None, False), + (True, False, True, 'regression', False), + (True, True, True, 'classification', True), + ) + def test_forward(self, use_separable_conv, use_sync_bn, has_att_heads, + att_type, share_classification_heads): + if has_att_heads: + attribute_heads = [dict(name='depth', type=att_type, size=1)] + else: + attribute_heads = None + + retinanet_head = dense_prediction_heads.RetinaNetHead( + min_level=3, + max_level=4, + num_classes=3, + num_anchors_per_location=3, + num_convs=2, + num_filters=256, + attribute_heads=attribute_heads, + share_classification_heads=share_classification_heads, + use_separable_conv=use_separable_conv, + activation='relu', + use_sync_bn=use_sync_bn, + norm_momentum=0.99, + norm_epsilon=0.001, + kernel_regularizer=None, + bias_regularizer=None, + ) + features = { + '3': np.random.rand(2, 128, 128, 16), + '4': np.random.rand(2, 64, 64, 16), + } + scores, boxes, attributes = retinanet_head(features) + self.assertAllEqual(scores['3'].numpy().shape, [2, 128, 128, 9]) + self.assertAllEqual(scores['4'].numpy().shape, [2, 64, 64, 9]) + self.assertAllEqual(boxes['3'].numpy().shape, [2, 128, 128, 12]) + self.assertAllEqual(boxes['4'].numpy().shape, [2, 64, 
64, 12]) + if has_att_heads: + for att in attributes.values(): + self.assertAllEqual(att['3'].numpy().shape, [2, 128, 128, 3]) + self.assertAllEqual(att['4'].numpy().shape, [2, 64, 64, 3]) + + def test_serialize_deserialize(self): + retinanet_head = dense_prediction_heads.RetinaNetHead( + min_level=3, + max_level=7, + num_classes=3, + num_anchors_per_location=9, + num_convs=2, + num_filters=16, + attribute_heads=None, + use_separable_conv=False, + activation='relu', + use_sync_bn=False, + norm_momentum=0.99, + norm_epsilon=0.001, + kernel_regularizer=None, + bias_regularizer=None, + ) + config = retinanet_head.get_config() + new_retinanet_head = ( + dense_prediction_heads.RetinaNetHead.from_config(config)) + self.assertAllEqual( + retinanet_head.get_config(), new_retinanet_head.get_config()) + + +class RpnHeadTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (False, False), + (False, True), + (True, False), + (True, True), + ) + def test_forward(self, use_separable_conv, use_sync_bn): + rpn_head = dense_prediction_heads.RPNHead( + min_level=3, + max_level=4, + num_anchors_per_location=3, + num_convs=2, + num_filters=256, + use_separable_conv=use_separable_conv, + activation='relu', + use_sync_bn=use_sync_bn, + norm_momentum=0.99, + norm_epsilon=0.001, + kernel_regularizer=None, + bias_regularizer=None, + ) + features = { + '3': np.random.rand(2, 128, 128, 16), + '4': np.random.rand(2, 64, 64, 16), + } + scores, boxes = rpn_head(features) + self.assertAllEqual(scores['3'].numpy().shape, [2, 128, 128, 3]) + self.assertAllEqual(scores['4'].numpy().shape, [2, 64, 64, 3]) + self.assertAllEqual(boxes['3'].numpy().shape, [2, 128, 128, 12]) + self.assertAllEqual(boxes['4'].numpy().shape, [2, 64, 64, 12]) + + def test_serialize_deserialize(self): + rpn_head = dense_prediction_heads.RPNHead( + min_level=3, + max_level=7, + num_anchors_per_location=9, + num_convs=2, + num_filters=16, + use_separable_conv=False, + activation='relu', + 
use_sync_bn=False, + norm_momentum=0.99, + norm_epsilon=0.001, + kernel_regularizer=None, + bias_regularizer=None, + ) + config = rpn_head.get_config() + new_rpn_head = dense_prediction_heads.RPNHead.from_config(config) + self.assertAllEqual(rpn_head.get_config(), new_rpn_head.get_config()) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/modeling/heads/instance_heads.py b/official/vision/modeling/heads/instance_heads.py new file mode 100644 index 0000000000000000000000000000000000000000..1d6e2097bb53d1f59d7f3e2a6efb12d96d808868 --- /dev/null +++ b/official/vision/modeling/heads/instance_heads.py @@ -0,0 +1,452 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Contains definitions of instance prediction heads.""" + +from typing import List, Union, Optional +# Import libraries +import tensorflow as tf + +from official.modeling import tf_utils + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class DetectionHead(tf.keras.layers.Layer): +  """Creates a detection head.""" + +  def __init__( +      self, +      num_classes: int, +      num_convs: int = 0, +      num_filters: int = 256, +      use_separable_conv: bool = False, +      num_fcs: int = 2, +      fc_dims: int = 1024, +      class_agnostic_bbox_pred: bool = False, +      activation: str = 'relu', +      use_sync_bn: bool = False, +      norm_momentum: float = 0.99, +      norm_epsilon: float = 0.001, +      kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, +      bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, +      **kwargs): +    """Initializes a detection head. + +    Args: +      num_classes: An `int` for the number of classes. +      num_convs: An `int` number that represents the number of the intermediate +        convolution layers before the FC layers. +      num_filters: An `int` number that represents the number of filters of the +        intermediate convolution layers. +      use_separable_conv: A `bool` that indicates whether separable +        convolution layers are used. +      num_fcs: An `int` number that represents the number of FC layers before +        the predictions. +      fc_dims: An `int` number that represents the number of dimensions of the +        FC layers. +      class_agnostic_bbox_pred: A `bool` that indicates whether to predict a +        single class-agnostic box instead of one box per class. +      activation: A `str` that indicates which activation is used, e.g. 'relu', +        'swish', etc. +      use_sync_bn: A `bool` that indicates whether to use synchronized batch +        normalization across different replicas. 
+      norm_momentum: A `float` of normalization momentum for the moving average. +      norm_epsilon: A `float` added to variance to avoid dividing by zero. +      kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for +        Conv2D. Default is None.
+ bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + **kwargs: Additional keyword arguments to be passed. + """ + super(DetectionHead, self).__init__(**kwargs) + self._config_dict = { + 'num_classes': num_classes, + 'num_convs': num_convs, + 'num_filters': num_filters, + 'use_separable_conv': use_separable_conv, + 'num_fcs': num_fcs, + 'fc_dims': fc_dims, + 'class_agnostic_bbox_pred': class_agnostic_bbox_pred, + 'activation': activation, + 'use_sync_bn': use_sync_bn, + 'norm_momentum': norm_momentum, + 'norm_epsilon': norm_epsilon, + 'kernel_regularizer': kernel_regularizer, + 'bias_regularizer': bias_regularizer, + } + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + self._activation = tf_utils.get_activation(activation) + + def build(self, input_shape: Union[tf.TensorShape, List[tf.TensorShape]]): + """Creates the variables of the head.""" + conv_op = (tf.keras.layers.SeparableConv2D + if self._config_dict['use_separable_conv'] + else tf.keras.layers.Conv2D) + conv_kwargs = { + 'filters': self._config_dict['num_filters'], + 'kernel_size': 3, + 'padding': 'same', + } + if self._config_dict['use_separable_conv']: + conv_kwargs.update({ + 'depthwise_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'pointwise_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'bias_initializer': tf.zeros_initializer(), + 'depthwise_regularizer': self._config_dict['kernel_regularizer'], + 'pointwise_regularizer': self._config_dict['kernel_regularizer'], + 'bias_regularizer': self._config_dict['bias_regularizer'], + }) + else: + conv_kwargs.update({ + 'kernel_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'bias_initializer': tf.zeros_initializer(), + 'kernel_regularizer': 
self._config_dict['kernel_regularizer'], + 'bias_regularizer': self._config_dict['bias_regularizer'], + }) + bn_op = (tf.keras.layers.experimental.SyncBatchNormalization + if self._config_dict['use_sync_bn'] + else tf.keras.layers.BatchNormalization) + bn_kwargs = { + 'axis': self._bn_axis, + 'momentum': self._config_dict['norm_momentum'], + 'epsilon': self._config_dict['norm_epsilon'], + } + + self._convs = [] + self._conv_norms = [] + for i in range(self._config_dict['num_convs']): + conv_name = 'detection-conv_{}'.format(i) + if 'kernel_initializer' in conv_kwargs: + conv_kwargs['kernel_initializer'] = tf_utils.clone_initializer( + conv_kwargs['kernel_initializer']) + self._convs.append(conv_op(name=conv_name, **conv_kwargs)) + bn_name = 'detection-conv-bn_{}'.format(i) + self._conv_norms.append(bn_op(name=bn_name, **bn_kwargs)) + + self._fcs = [] + self._fc_norms = [] + for i in range(self._config_dict['num_fcs']): + fc_name = 'detection-fc_{}'.format(i) + self._fcs.append( + tf.keras.layers.Dense( + units=self._config_dict['fc_dims'], + kernel_initializer=tf.keras.initializers.VarianceScaling( + scale=1 / 3.0, mode='fan_out', distribution='uniform'), + kernel_regularizer=self._config_dict['kernel_regularizer'], + bias_regularizer=self._config_dict['bias_regularizer'], + name=fc_name)) + bn_name = 'detection-fc-bn_{}'.format(i) + self._fc_norms.append(bn_op(name=bn_name, **bn_kwargs)) + + self._classifier = tf.keras.layers.Dense( + units=self._config_dict['num_classes'], + kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01), + bias_initializer=tf.zeros_initializer(), + kernel_regularizer=self._config_dict['kernel_regularizer'], + bias_regularizer=self._config_dict['bias_regularizer'], + name='detection-scores') + + num_box_outputs = (4 if self._config_dict['class_agnostic_bbox_pred'] else + self._config_dict['num_classes'] * 4) + self._box_regressor = tf.keras.layers.Dense( + units=num_box_outputs, + 
kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.001), + bias_initializer=tf.zeros_initializer(), + kernel_regularizer=self._config_dict['kernel_regularizer'], + bias_regularizer=self._config_dict['bias_regularizer'], + name='detection-boxes') + + super(DetectionHead, self).build(input_shape) + + def call(self, inputs: tf.Tensor, training: bool = None): + """Forward pass of box and class branches for the Mask-RCNN model. + + Args: + inputs: A `tf.Tensor` of the shape [batch_size, num_instances, roi_height, + roi_width, roi_channels], representing the ROI features. + training: a `bool` indicating whether it is in `training` mode. + + Returns: + class_outputs: A `tf.Tensor` of the shape + [batch_size, num_rois, num_classes], representing the class predictions. + box_outputs: A `tf.Tensor` of the shape + [batch_size, num_rois, num_classes * 4], representing the box + predictions. + """ + roi_features = inputs + _, num_rois, height, width, filters = roi_features.get_shape().as_list() + + x = tf.reshape(roi_features, [-1, height, width, filters]) + for conv, bn in zip(self._convs, self._conv_norms): + x = conv(x) + x = bn(x) + x = self._activation(x) + + _, _, _, filters = x.get_shape().as_list() + x = tf.reshape(x, [-1, num_rois, height * width * filters]) + + for fc, bn in zip(self._fcs, self._fc_norms): + x = fc(x) + x = bn(x) + x = self._activation(x) + + classes = self._classifier(x) + boxes = self._box_regressor(x) + return classes, boxes + + def get_config(self): + return self._config_dict + + @classmethod + def from_config(cls, config): + return cls(**config) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class MaskHead(tf.keras.layers.Layer): + """Creates a mask head.""" + + def __init__( + self, + num_classes: int, + upsample_factor: int = 2, + num_convs: int = 4, + num_filters: int = 256, + use_separable_conv: bool = False, + activation: str = 'relu', + use_sync_bn: bool = False, + norm_momentum: float = 0.99, + 
norm_epsilon: float = 0.001, + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + class_agnostic: bool = False, + **kwargs): + """Initializes a mask head. + + Args: + num_classes: An `int` of the number of classes. + upsample_factor: An `int` that indicates the upsample factor to generate + the final predicted masks. It should be >= 1. + num_convs: An `int` number that represents the number of the intermediate + convolution layers before the mask prediction layers. + num_filters: An `int` number that represents the number of filters of the + intermediate convolution layers. + use_separable_conv: A `bool` that indicates whether separable + convolution layers are used. + activation: A `str` that indicates which activation is used, e.g. 'relu', + 'swish', etc. + use_sync_bn: A `bool` that indicates whether to use synchronized batch + normalization across different replicas. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default is None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + class_agnostic: A `bool`. If set, we use a single-channel mask head that + is shared between all classes. + **kwargs: Additional keyword arguments to be passed.
+ """ + super(MaskHead, self).__init__(**kwargs) + self._config_dict = { + 'num_classes': num_classes, + 'upsample_factor': upsample_factor, + 'num_convs': num_convs, + 'num_filters': num_filters, + 'use_separable_conv': use_separable_conv, + 'activation': activation, + 'use_sync_bn': use_sync_bn, + 'norm_momentum': norm_momentum, + 'norm_epsilon': norm_epsilon, + 'kernel_regularizer': kernel_regularizer, + 'bias_regularizer': bias_regularizer, + 'class_agnostic': class_agnostic + } + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + self._activation = tf_utils.get_activation(activation) + + def build(self, input_shape: Union[tf.TensorShape, List[tf.TensorShape]]): + """Creates the variables of the head.""" + conv_op = (tf.keras.layers.SeparableConv2D + if self._config_dict['use_separable_conv'] + else tf.keras.layers.Conv2D) + conv_kwargs = { + 'filters': self._config_dict['num_filters'], + 'kernel_size': 3, + 'padding': 'same', + } + if self._config_dict['use_separable_conv']: + conv_kwargs.update({ + 'depthwise_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'pointwise_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'bias_initializer': tf.zeros_initializer(), + 'depthwise_regularizer': self._config_dict['kernel_regularizer'], + 'pointwise_regularizer': self._config_dict['kernel_regularizer'], + 'bias_regularizer': self._config_dict['bias_regularizer'], + }) + else: + conv_kwargs.update({ + 'kernel_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'bias_initializer': tf.zeros_initializer(), + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + 'bias_regularizer': self._config_dict['bias_regularizer'], + }) + bn_op = (tf.keras.layers.experimental.SyncBatchNormalization + if 
self._config_dict['use_sync_bn'] + else tf.keras.layers.BatchNormalization) + bn_kwargs = { + 'axis': self._bn_axis, + 'momentum': self._config_dict['norm_momentum'], + 'epsilon': self._config_dict['norm_epsilon'], + } + + self._convs = [] + self._conv_norms = [] + for i in range(self._config_dict['num_convs']): + conv_name = 'mask-conv_{}'.format(i) + for initializer_name in ['kernel_initializer', 'depthwise_initializer', + 'pointwise_initializer']: + if initializer_name in conv_kwargs: + conv_kwargs[initializer_name] = tf_utils.clone_initializer( + conv_kwargs[initializer_name]) + self._convs.append(conv_op(name=conv_name, **conv_kwargs)) + bn_name = 'mask-conv-bn_{}'.format(i) + self._conv_norms.append(bn_op(name=bn_name, **bn_kwargs)) + + self._deconv = tf.keras.layers.Conv2DTranspose( + filters=self._config_dict['num_filters'], + kernel_size=self._config_dict['upsample_factor'], + strides=self._config_dict['upsample_factor'], + padding='valid', + kernel_initializer=tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + bias_initializer=tf.zeros_initializer(), + kernel_regularizer=self._config_dict['kernel_regularizer'], + bias_regularizer=self._config_dict['bias_regularizer'], + name='mask-upsampling') + self._deconv_bn = bn_op(name='mask-deconv-bn', **bn_kwargs) + + if self._config_dict['class_agnostic']: + num_filters = 1 + else: + num_filters = self._config_dict['num_classes'] + + conv_kwargs = { + 'filters': num_filters, + 'kernel_size': 1, + 'padding': 'valid', + } + if self._config_dict['use_separable_conv']: + conv_kwargs.update({ + 'depthwise_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'pointwise_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'bias_initializer': tf.zeros_initializer(), + 'depthwise_regularizer': self._config_dict['kernel_regularizer'], + 
'pointwise_regularizer': self._config_dict['kernel_regularizer'], + 'bias_regularizer': self._config_dict['bias_regularizer'], + }) + else: + conv_kwargs.update({ + 'kernel_initializer': tf.keras.initializers.VarianceScaling( + scale=2, mode='fan_out', distribution='untruncated_normal'), + 'bias_initializer': tf.zeros_initializer(), + 'kernel_regularizer': self._config_dict['kernel_regularizer'], + 'bias_regularizer': self._config_dict['bias_regularizer'], + }) + self._mask_regressor = conv_op(name='mask-logits', **conv_kwargs) + + super(MaskHead, self).build(input_shape) + + def call(self, inputs: List[tf.Tensor], training: bool = None): + """Forward pass of mask branch for the Mask-RCNN model. + + Args: + inputs: A `list` of two tensors where + inputs[0]: A `tf.Tensor` of shape [batch_size, num_instances, + roi_height, roi_width, roi_channels], representing the ROI features. + inputs[1]: A `tf.Tensor` of shape [batch_size, num_instances], + representing the classes of the ROIs. + training: A `bool` indicating whether it is in `training` mode. + + Returns: + mask_outputs: A `tf.Tensor` of shape + [batch_size, num_instances, roi_height * upsample_factor, + roi_width * upsample_factor], representing the mask predictions. 
+ """ + roi_features, roi_classes = inputs + batch_size, num_rois, height, width, filters = ( + roi_features.get_shape().as_list()) + if batch_size is None: + batch_size = tf.shape(roi_features)[0] + + x = tf.reshape(roi_features, [-1, height, width, filters]) + for conv, bn in zip(self._convs, self._conv_norms): + x = conv(x) + x = bn(x) + x = self._activation(x) + + x = self._deconv(x) + x = self._deconv_bn(x) + x = self._activation(x) + + logits = self._mask_regressor(x) + + mask_height = height * self._config_dict['upsample_factor'] + mask_width = width * self._config_dict['upsample_factor'] + + if self._config_dict['class_agnostic']: + logits = tf.reshape(logits, [-1, num_rois, mask_height, mask_width, 1]) + else: + logits = tf.reshape( + logits, + [-1, num_rois, mask_height, mask_width, + self._config_dict['num_classes']]) + + batch_indices = tf.tile( + tf.expand_dims(tf.range(batch_size), axis=1), [1, num_rois]) + mask_indices = tf.tile( + tf.expand_dims(tf.range(num_rois), axis=0), [batch_size, 1]) + + if self._config_dict['class_agnostic']: + class_gather_indices = tf.zeros_like(roi_classes, dtype=tf.int32) + else: + class_gather_indices = tf.cast(roi_classes, dtype=tf.int32) + + gather_indices = tf.stack( + [batch_indices, mask_indices, class_gather_indices], + axis=2) + mask_outputs = tf.gather_nd( + tf.transpose(logits, [0, 1, 4, 2, 3]), gather_indices) + return mask_outputs + + def get_config(self): + return self._config_dict + + @classmethod + def from_config(cls, config): + return cls(**config) diff --git a/official/vision/modeling/heads/instance_heads_test.py b/official/vision/modeling/heads/instance_heads_test.py new file mode 100644 index 0000000000000000000000000000000000000000..4be5f15c321ebcd438dc6bcfacfd7d5989181d44 --- /dev/null +++ b/official/vision/modeling/heads/instance_heads_test.py @@ -0,0 +1,134 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for instance_heads.py.""" + +# Import libraries +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from official.vision.modeling.heads import instance_heads + + +class DetectionHeadTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (0, 0, False, False), + (0, 1, False, False), + (1, 0, False, False), + (1, 1, False, False), + ) + def test_forward(self, num_convs, num_fcs, use_separable_conv, use_sync_bn): + detection_head = instance_heads.DetectionHead( + num_classes=3, + num_convs=num_convs, + num_filters=16, + use_separable_conv=use_separable_conv, + num_fcs=num_fcs, + fc_dims=4, + activation='relu', + use_sync_bn=use_sync_bn, + norm_momentum=0.99, + norm_epsilon=0.001, + kernel_regularizer=None, + bias_regularizer=None, + ) + roi_features = np.random.rand(2, 10, 128, 128, 16) + scores, boxes = detection_head(roi_features) + self.assertAllEqual(scores.numpy().shape, [2, 10, 3]) + self.assertAllEqual(boxes.numpy().shape, [2, 10, 12]) + + def test_serialize_deserialize(self): + detection_head = instance_heads.DetectionHead( + num_classes=91, + num_convs=0, + num_filters=256, + use_separable_conv=False, + num_fcs=2, + fc_dims=1024, + activation='relu', + use_sync_bn=False, + norm_momentum=0.99, + norm_epsilon=0.001, + kernel_regularizer=None, + bias_regularizer=None, + ) + config = detection_head.get_config() + new_detection_head = 
instance_heads.DetectionHead.from_config(config) + self.assertAllEqual( + detection_head.get_config(), new_detection_head.get_config()) + + +class MaskHeadTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (1, 1, False), + (1, 2, False), + (2, 1, False), + (2, 2, False), + ) + def test_forward(self, upsample_factor, num_convs, use_sync_bn): + mask_head = instance_heads.MaskHead( + num_classes=3, + upsample_factor=upsample_factor, + num_convs=num_convs, + num_filters=16, + use_separable_conv=False, + activation='relu', + use_sync_bn=use_sync_bn, + norm_momentum=0.99, + norm_epsilon=0.001, + kernel_regularizer=None, + bias_regularizer=None, + ) + roi_features = np.random.rand(2, 10, 14, 14, 16) + roi_classes = np.zeros((2, 10)) + masks = mask_head([roi_features, roi_classes]) + self.assertAllEqual( + masks.numpy().shape, + [2, 10, 14 * upsample_factor, 14 * upsample_factor]) + + def test_serialize_deserialize(self): + mask_head = instance_heads.MaskHead( + num_classes=3, + upsample_factor=2, + num_convs=1, + num_filters=256, + use_separable_conv=False, + activation='relu', + use_sync_bn=False, + norm_momentum=0.99, + norm_epsilon=0.001, + kernel_regularizer=None, + bias_regularizer=None, + ) + config = mask_head.get_config() + new_mask_head = instance_heads.MaskHead.from_config(config) + self.assertAllEqual( + mask_head.get_config(), new_mask_head.get_config()) + + def test_forward_class_agnostic(self): + mask_head = instance_heads.MaskHead( + num_classes=3, + class_agnostic=True + ) + roi_features = np.random.rand(2, 10, 14, 14, 16) + roi_classes = np.zeros((2, 10)) + masks = mask_head([roi_features, roi_classes]) + self.assertAllEqual(masks.numpy().shape, [2, 10, 28, 28]) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/beta/modeling/heads/segmentation_heads.py b/official/vision/modeling/heads/segmentation_heads.py similarity index 91% rename from official/vision/beta/modeling/heads/segmentation_heads.py 
rename to official/vision/modeling/heads/segmentation_heads.py index 89f6d606b9b699278f0ab0b1c13762f806b9da11..a2aebdff28450f014e5d14eb8e9fbe53bcabf215 100644 --- a/official/vision/beta/modeling/heads/segmentation_heads.py +++ b/official/vision/modeling/heads/segmentation_heads.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,8 +17,8 @@ from typing import List, Union, Optional, Mapping, Tuple, Any import tensorflow as tf from official.modeling import tf_utils -from official.vision.beta.modeling.layers import nn_layers -from official.vision.beta.ops import spatial_transform_ops +from official.vision.modeling.layers import nn_layers +from official.vision.ops import spatial_transform_ops class MaskScoring(tf.keras.Model): @@ -118,6 +118,9 @@ class MaskScoring(tf.keras.Model): self._conv_norms = [] for i in range(self._config_dict['num_convs']): conv_name = 'mask-scoring_{}'.format(i) + if 'kernel_initializer' in conv_kwargs: + conv_kwargs['kernel_initializer'] = tf_utils.clone_initializer( + conv_kwargs['kernel_initializer']) self._convs.append(conv_op(name=conv_name, **conv_kwargs)) bn_name = 'mask-scoring-bn_{}'.format(i) self._conv_norms.append(bn_op(name=bn_name, **bn_kwargs)) @@ -233,8 +236,9 @@ class SegmentationHead(tf.keras.layers.Layer): prediction layer. upsample_factor: An `int` number to specify the upsampling factor to generate finer mask. Default 1 means no upsampling is applied. - feature_fusion: One of `deeplabv3plus`, `pyramid_fusion`, - `panoptic_fpn_fusion`, or None. If `deeplabv3plus`, features from + feature_fusion: One of the constants in nn_layers.FeatureFusion, namely + `deeplabv3plus`, `pyramid_fusion`, `panoptic_fpn_fusion`, + `deeplabv3plus_sum_to_merge`, or None. 
If `deeplabv3plus`, features from decoder_features[level] will be fused with low level feature maps from backbone. If `pyramid_fusion`, multiscale features will be resized and fused at the target level. @@ -245,10 +249,12 @@ class SegmentationHead(tf.keras.layers.Layer): feature fusion. It is only used when feature_fusion is set to `panoptic_fpn_fusion`. low_level: An `int` of backbone level to be used for feature fusion. It is - used when feature_fusion is set to `deeplabv3plus`. + used when feature_fusion is set to `deeplabv3plus` or + `deeplabv3plus_sum_to_merge`. low_level_num_filters: An `int` of reduced number of filters for the low level features before fusing it with higher level features. It is only - used when feature_fusion is set to `deeplabv3plus`. + used when feature_fusion is set to `deeplabv3plus` or + `deeplabv3plus_sum_to_merge`. num_decoder_filters: An `int` of number of filters in the decoder outputs. It is only used when feature_fusion is set to `panoptic_fpn_fusion`. activation: A `str` that indicates which activation is used, e.g. 
'relu', @@ -294,15 +300,7 @@ class SegmentationHead(tf.keras.layers.Layer): def build(self, input_shape: Union[tf.TensorShape, List[tf.TensorShape]]): """Creates the variables of the segmentation head.""" use_depthwise_convolution = self._config_dict['use_depthwise_convolution'] - random_initializer = tf.keras.initializers.RandomNormal(stddev=0.01) conv_op = tf.keras.layers.Conv2D - conv_kwargs = { - 'kernel_size': 3 if not use_depthwise_convolution else 1, - 'padding': 'same', - 'use_bias': False, - 'kernel_initializer': random_initializer, - 'kernel_regularizer': self._config_dict['kernel_regularizer'], - } bn_op = (tf.keras.layers.experimental.SyncBatchNormalization if self._config_dict['use_sync_bn'] else tf.keras.layers.BatchNormalization) @@ -312,7 +310,8 @@ class SegmentationHead(tf.keras.layers.Layer): 'epsilon': self._config_dict['norm_epsilon'], } - if self._config_dict['feature_fusion'] == 'deeplabv3plus': + if self._config_dict['feature_fusion'] in {'deeplabv3plus', + 'deeplabv3plus_sum_to_merge'}: # Deeplabv3+ feature fusion layers. 
self._dlv3p_conv = conv_op( kernel_size=1, @@ -348,7 +347,8 @@ class SegmentationHead(tf.keras.layers.Layer): kernel_size=3, padding='same', use_bias=False, - depthwise_initializer=random_initializer, + depthwise_initializer=tf.keras.initializers.RandomNormal( + stddev=0.01), depthwise_regularizer=self._config_dict['kernel_regularizer'], depth_multiplier=1)) norm_name = 'segmentation_head_depthwise_norm_{}'.format(i) @@ -358,7 +358,12 @@ class SegmentationHead(tf.keras.layers.Layer): conv_op( name=conv_name, filters=self._config_dict['num_filters'], - **conv_kwargs)) + kernel_size=3 if not use_depthwise_convolution else 1, + padding='same', + use_bias=False, + kernel_initializer=tf.keras.initializers.RandomNormal( + stddev=0.01), + kernel_regularizer=self._config_dict['kernel_regularizer'])) norm_name = 'segmentation_head_norm_{}'.format(i) self._norms.append(bn_op(name=norm_name, **bn_kwargs)) @@ -398,7 +403,8 @@ class SegmentationHead(tf.keras.layers.Layer): backbone_output = inputs[0] decoder_output = inputs[1] - if self._config_dict['feature_fusion'] == 'deeplabv3plus': + if self._config_dict['feature_fusion'] in {'deeplabv3plus', + 'deeplabv3plus_sum_to_merge'}: # deeplabv3+ feature fusion x = decoder_output[str(self._config_dict['level'])] if isinstance( decoder_output, dict) else decoder_output @@ -410,7 +416,10 @@ class SegmentationHead(tf.keras.layers.Layer): x = tf.image.resize( x, tf.shape(y)[1:3], method=tf.image.ResizeMethod.BILINEAR) x = tf.cast(x, dtype=y.dtype) - x = tf.concat([x, y], axis=self._bn_axis) + if self._config_dict['feature_fusion'] == 'deeplabv3plus': + x = tf.concat([x, y], axis=self._bn_axis) + else: + x = tf.keras.layers.Add()([x, y]) elif self._config_dict['feature_fusion'] == 'pyramid_fusion': if not isinstance(decoder_output, dict): raise ValueError('Only support dictionary decoder_output.') diff --git a/official/vision/beta/modeling/heads/segmentation_heads_test.py b/official/vision/modeling/heads/segmentation_heads_test.py 
similarity index 90% rename from official/vision/beta/modeling/heads/segmentation_heads_test.py rename to official/vision/modeling/heads/segmentation_heads_test.py index 2ec7ded68c17df6c7c6418bd9f6c656e7f0e27ea..21eff5bbaf1a709aa1b251d794e8c8cae901a2b3 100644 --- a/official/vision/beta/modeling/heads/segmentation_heads_test.py +++ b/official/vision/modeling/heads/segmentation_heads_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for segmentation_heads.py.""" # Import libraries @@ -20,7 +19,7 @@ from absl.testing import parameterized import numpy as np import tensorflow as tf -from official.vision.beta.modeling.heads import segmentation_heads +from official.vision.modeling.heads import segmentation_heads class SegmentationHeadTest(parameterized.TestCase, tf.test.TestCase): @@ -31,7 +30,9 @@ class SegmentationHeadTest(parameterized.TestCase, tf.test.TestCase): (2, 'panoptic_fpn_fusion', 2, 5), (2, 'panoptic_fpn_fusion', 2, 6), (3, 'panoptic_fpn_fusion', 3, 5), - (3, 'panoptic_fpn_fusion', 3, 6)) + (3, 'panoptic_fpn_fusion', 3, 6), + (3, 'deeplabv3plus', 3, 6), + (3, 'deeplabv3plus_sum_to_merge', 3, 6)) def test_forward(self, level, feature_fusion, decoder_min_level, decoder_max_level): backbone_features = { @@ -53,6 +54,8 @@ class SegmentationHeadTest(parameterized.TestCase, tf.test.TestCase): head = segmentation_heads.SegmentationHead( num_classes=10, level=level, + low_level=decoder_min_level, + low_level_num_filters=64, feature_fusion=feature_fusion, decoder_min_level=decoder_min_level, decoder_max_level=decoder_max_level, @@ -60,7 +63,7 @@ class 
SegmentationHeadTest(parameterized.TestCase, tf.test.TestCase): logits = head((backbone_features, decoder_features)) - if level in decoder_features: + if str(level) in decoder_features: self.assertAllEqual(logits.numpy().shape, [ 2, decoder_features[str(level)].shape[1], decoder_features[str(level)].shape[2], 10 @@ -91,6 +94,7 @@ class MaskScoringHeadTest(parameterized.TestCase, tf.test.TestCase): num_convs=num_convs, num_filters=num_filters, fc_dims=128, + num_fcs=num_fcs, fc_input_size=fc_input_size) scores = head(features) diff --git a/official/vision/modeling/layers/__init__.py b/official/vision/modeling/layers/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..bdba7c60bb30251b227b7c0f9a0361ae46363073 --- /dev/null +++ b/official/vision/modeling/layers/__init__.py @@ -0,0 +1,43 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Layers package definition.""" + +from official.vision.modeling.layers.box_sampler import BoxSampler +from official.vision.modeling.layers.detection_generator import DetectionGenerator +from official.vision.modeling.layers.detection_generator import MultilevelDetectionGenerator +from official.vision.modeling.layers.mask_sampler import MaskSampler +from official.vision.modeling.layers.nn_blocks import BottleneckBlock +from official.vision.modeling.layers.nn_blocks import BottleneckResidualInner +from official.vision.modeling.layers.nn_blocks import DepthwiseSeparableConvBlock +from official.vision.modeling.layers.nn_blocks import InvertedBottleneckBlock +from official.vision.modeling.layers.nn_blocks import ResidualBlock +from official.vision.modeling.layers.nn_blocks import ResidualInner +from official.vision.modeling.layers.nn_blocks import ReversibleLayer +from official.vision.modeling.layers.nn_blocks_3d import BottleneckBlock3D +from official.vision.modeling.layers.nn_blocks_3d import SelfGating +from official.vision.modeling.layers.nn_layers import CausalConvMixin +from official.vision.modeling.layers.nn_layers import Conv2D +from official.vision.modeling.layers.nn_layers import Conv3D +from official.vision.modeling.layers.nn_layers import DepthwiseConv2D +from official.vision.modeling.layers.nn_layers import GlobalAveragePool3D +from official.vision.modeling.layers.nn_layers import PositionalEncoding +from official.vision.modeling.layers.nn_layers import Scale +from official.vision.modeling.layers.nn_layers import SpatialAveragePool3D +from official.vision.modeling.layers.nn_layers import SqueezeExcitation +from official.vision.modeling.layers.nn_layers import StochasticDepth +from official.vision.modeling.layers.nn_layers import TemporalSoftmaxPool +from official.vision.modeling.layers.roi_aligner import MultilevelROIAligner +from official.vision.modeling.layers.roi_generator import MultilevelROIGenerator +from 
official.vision.modeling.layers.roi_sampler import ROISampler diff --git a/official/vision/beta/modeling/layers/box_sampler.py b/official/vision/modeling/layers/box_sampler.py similarity index 96% rename from official/vision/beta/modeling/layers/box_sampler.py rename to official/vision/modeling/layers/box_sampler.py index 3dfefbc680ea94722c214878d9479a7137b4d060..b04e0d87187cebdeb0a5467520c39ac186cf3498 100644 --- a/official/vision/beta/modeling/layers/box_sampler.py +++ b/official/vision/modeling/layers/box_sampler.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,7 +17,7 @@ # Import libraries import tensorflow as tf -from official.vision.beta.ops import sampling_ops +from official.vision.ops import sampling_ops @tf.keras.utils.register_keras_serializable(package='Vision') diff --git a/official/vision/beta/modeling/layers/deeplab.py b/official/vision/modeling/layers/deeplab.py similarity index 79% rename from official/vision/beta/modeling/layers/deeplab.py rename to official/vision/modeling/layers/deeplab.py index 78468c49e1f7e7744c837919972e362d4e92195a..8a3d38177e5702952a82ab36e0414800d1ad547c 100644 --- a/official/vision/beta/modeling/layers/deeplab.py +++ b/official/vision/modeling/layers/deeplab.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,6 +16,8 @@ import tensorflow as tf +from official.modeling import tf_utils + class SpatialPyramidPooling(tf.keras.layers.Layer): """Implements the Atrous Spatial Pyramid Pooling. 
@@ -85,8 +87,6 @@ class SpatialPyramidPooling(tf.keras.layers.Layer): self.use_depthwise_convolution = use_depthwise_convolution def build(self, input_shape): - height = input_shape[1] - width = input_shape[2] channels = input_shape[3] self.aspp_layers = [] @@ -103,8 +103,10 @@ class SpatialPyramidPooling(tf.keras.layers.Layer): conv_sequential = tf.keras.Sequential([ tf.keras.layers.Conv2D( - filters=self.output_channels, kernel_size=(1, 1), - kernel_initializer=self.kernel_initializer, + filters=self.output_channels, + kernel_size=(1, 1), + kernel_initializer=tf_utils.clone_initializer( + self.kernel_initializer), kernel_regularizer=self.kernel_regularizer, use_bias=False), bn_op( @@ -121,21 +123,29 @@ class SpatialPyramidPooling(tf.keras.layers.Layer): if self.use_depthwise_convolution: leading_layers += [ tf.keras.layers.DepthwiseConv2D( - depth_multiplier=1, kernel_size=kernel_size, - padding='same', depthwise_regularizer=self.kernel_regularizer, - depthwise_initializer=self.kernel_initializer, - dilation_rate=dilation_rate, use_bias=False) + depth_multiplier=1, + kernel_size=kernel_size, + padding='same', + dilation_rate=dilation_rate, + use_bias=False) ] kernel_size = (1, 1) conv_sequential = tf.keras.Sequential(leading_layers + [ tf.keras.layers.Conv2D( - filters=self.output_channels, kernel_size=kernel_size, - padding='same', kernel_regularizer=self.kernel_regularizer, - kernel_initializer=self.kernel_initializer, - dilation_rate=dilation_rate, use_bias=False), - bn_op(axis=bn_axis, momentum=self.batchnorm_momentum, - epsilon=self.batchnorm_epsilon), - tf.keras.layers.Activation(self.activation)]) + filters=self.output_channels, + kernel_size=kernel_size, + padding='same', + kernel_regularizer=self.kernel_regularizer, + kernel_initializer=tf_utils.clone_initializer( + self.kernel_initializer), + dilation_rate=dilation_rate, + use_bias=False), + bn_op( + axis=bn_axis, + momentum=self.batchnorm_momentum, + epsilon=self.batchnorm_epsilon), + 
tf.keras.layers.Activation(self.activation) + ]) self.aspp_layers.append(conv_sequential) if self.pool_kernel_size is None: @@ -151,27 +161,25 @@ class SpatialPyramidPooling(tf.keras.layers.Layer): tf.keras.layers.Conv2D( filters=self.output_channels, kernel_size=(1, 1), - kernel_initializer=self.kernel_initializer, + kernel_initializer=tf_utils.clone_initializer( + self.kernel_initializer), kernel_regularizer=self.kernel_regularizer, use_bias=False), bn_op( axis=bn_axis, momentum=self.batchnorm_momentum, epsilon=self.batchnorm_epsilon), - tf.keras.layers.Activation(self.activation), - tf.keras.layers.experimental.preprocessing.Resizing( - height, - width, - interpolation=self.interpolation, - dtype=tf.float32) + tf.keras.layers.Activation(self.activation) ])) self.aspp_layers.append(pool_sequential) self.projection = tf.keras.Sequential([ tf.keras.layers.Conv2D( - filters=self.output_channels, kernel_size=(1, 1), - kernel_initializer=self.kernel_initializer, + filters=self.output_channels, + kernel_size=(1, 1), + kernel_initializer=tf_utils.clone_initializer( + self.kernel_initializer), kernel_regularizer=self.kernel_regularizer, use_bias=False), bn_op( @@ -179,14 +187,19 @@ class SpatialPyramidPooling(tf.keras.layers.Layer): momentum=self.batchnorm_momentum, epsilon=self.batchnorm_epsilon), tf.keras.layers.Activation(self.activation), - tf.keras.layers.Dropout(rate=self.dropout)]) + tf.keras.layers.Dropout(rate=self.dropout) + ]) def call(self, inputs, training=None): if training is None: training = tf.keras.backend.learning_phase() result = [] - for layer in self.aspp_layers: - result.append(tf.cast(layer(inputs, training=training), inputs.dtype)) + for i, layer in enumerate(self.aspp_layers): + x = layer(inputs, training=training) + # Apply resize layer to the end of the last set of layers. 
+ if i == len(self.aspp_layers) - 1: + x = tf.image.resize(tf.cast(x, tf.float32), tf.shape(inputs)[1:3]) + result.append(tf.cast(x, inputs.dtype)) result = tf.concat(result, axis=-1) result = self.projection(result, training=training) return result diff --git a/official/vision/beta/modeling/layers/deeplab_test.py b/official/vision/modeling/layers/deeplab_test.py similarity index 93% rename from official/vision/beta/modeling/layers/deeplab_test.py rename to official/vision/modeling/layers/deeplab_test.py index 2dbe15a19a8b05449b71f222f3264599ef71ead3..c3b7577f4d387cb3f5f74196a504eec46c6126fa 100644 --- a/official/vision/beta/modeling/layers/deeplab_test.py +++ b/official/vision/modeling/layers/deeplab_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,7 +17,7 @@ import tensorflow as tf from tensorflow.python.keras import keras_parameterized -from official.vision.beta.modeling.layers import deeplab +from official.vision.modeling.layers import deeplab @keras_parameterized.run_all_keras_modes diff --git a/official/vision/modeling/layers/detection_generator.py b/official/vision/modeling/layers/detection_generator.py new file mode 100644 index 0000000000000000000000000000000000000000..aec8bd1d5a715573ac2ce23082a098a12b5a157c --- /dev/null +++ b/official/vision/modeling/layers/detection_generator.py @@ -0,0 +1,1001 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains definitions of generators to generate the final detections.""" +import contextlib +from typing import Any, Dict, List, Optional, Mapping, Sequence +# Import libraries +import tensorflow as tf + +from official.vision.ops import box_ops +from official.vision.ops import nms +from official.vision.ops import preprocess_ops + + +def _generate_detections_v1(boxes: tf.Tensor, + scores: tf.Tensor, + attributes: Optional[Mapping[str, + tf.Tensor]] = None, + pre_nms_top_k: int = 5000, + pre_nms_score_threshold: float = 0.05, + nms_iou_threshold: float = 0.5, + max_num_detections: int = 100, + soft_nms_sigma: Optional[float] = None): + """Generates the final detections given the model outputs. + + The implementation unrolls the batch dimension and processes images one by one. + It requires the batch dimension to be statically known and it is TPU + compatible. + + Args: + boxes: A `tf.Tensor` with shape `[batch_size, N, num_classes, 4]` or + `[batch_size, N, 1, 4]` for box predictions on all feature levels. The + N is the number of total anchors on all levels. + scores: A `tf.Tensor` with shape `[batch_size, N, num_classes]`, which + stacks class probability on all feature levels. The N is the number of + total anchors on all levels. The num_classes is the number of classes + predicted by the model. Note that the class_outputs here is the raw score. + attributes: None or a dict of (attribute_name, attributes) pairs.
Each + attribute is a `tf.Tensor` with shape + `[batch_size, N, num_classes, attribute_size]` or + `[batch_size, N, 1, attribute_size]` for attribute predictions on all + feature levels. The N is the number of total anchors on all levels. Can + be None if no attribute learning is required. + pre_nms_top_k: An `int` number of top candidate detections per class before + NMS. + pre_nms_score_threshold: A `float` representing the threshold for deciding + when to remove boxes based on score. + nms_iou_threshold: A `float` representing the threshold for deciding whether + boxes overlap too much with respect to IOU. + max_num_detections: A scalar representing maximum number of boxes retained + over all classes. + soft_nms_sigma: A `float` representing the sigma parameter for Soft NMS. + When soft_nms_sigma=0.0, we fall back to standard NMS. + If set to None (the default), `tf.image.non_max_suppression_padded` is + called instead. + + Returns: + nms_boxes: A `float` type `tf.Tensor` of shape + `[batch_size, max_num_detections, 4]` representing top detected boxes in + `[y1, x1, y2, x2]`. + nms_scores: A `float` type `tf.Tensor` of shape + `[batch_size, max_num_detections]` representing sorted confidence scores + for detected boxes. The values are between `[0, 1]`. + nms_classes: An `int` type `tf.Tensor` of shape + `[batch_size, max_num_detections]` representing classes for detected + boxes. + valid_detections: An `int` type `tf.Tensor` of shape `[batch_size]`; only the + top `valid_detections` boxes are valid detections. + nms_attributes: None or a dict of (attribute_name, attributes). Each + attribute is a `float` type `tf.Tensor` of shape + `[batch_size, max_num_detections, attribute_size]` representing attribute + predictions for detected boxes. Can be an empty dict if no attribute + learning is required.
+ """ + with tf.name_scope('generate_detections'): + batch_size = scores.get_shape().as_list()[0] + nmsed_boxes = [] + nmsed_classes = [] + nmsed_scores = [] + valid_detections = [] + if attributes: + nmsed_attributes = {att_name: [] for att_name in attributes.keys()} + else: + nmsed_attributes = {} + + for i in range(batch_size): + (nmsed_boxes_i, nmsed_scores_i, nmsed_classes_i, valid_detections_i, + nmsed_att_i) = _generate_detections_per_image( + boxes[i], + scores[i], + attributes={ + att_name: att[i] for att_name, att in attributes.items() + } if attributes else {}, + pre_nms_top_k=pre_nms_top_k, + pre_nms_score_threshold=pre_nms_score_threshold, + nms_iou_threshold=nms_iou_threshold, + max_num_detections=max_num_detections, + soft_nms_sigma=soft_nms_sigma) + nmsed_boxes.append(nmsed_boxes_i) + nmsed_scores.append(nmsed_scores_i) + nmsed_classes.append(nmsed_classes_i) + valid_detections.append(valid_detections_i) + if attributes: + for att_name in attributes.keys(): + nmsed_attributes[att_name].append(nmsed_att_i[att_name]) + + nmsed_boxes = tf.stack(nmsed_boxes, axis=0) + nmsed_scores = tf.stack(nmsed_scores, axis=0) + nmsed_classes = tf.stack(nmsed_classes, axis=0) + valid_detections = tf.stack(valid_detections, axis=0) + if attributes: + for att_name in attributes.keys(): + nmsed_attributes[att_name] = tf.stack(nmsed_attributes[att_name], axis=0) + + return nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections, nmsed_attributes + + +def _generate_detections_per_image( + boxes: tf.Tensor, + scores: tf.Tensor, + attributes: Optional[Mapping[str, tf.Tensor]] = None, + pre_nms_top_k: int = 5000, + pre_nms_score_threshold: float = 0.05, + nms_iou_threshold: float = 0.5, + max_num_detections: int = 100, + soft_nms_sigma: Optional[float] = None): + """Generates the final detections per image given the model outputs. + + Args: + boxes: A `tf.Tensor` with shape `[N, num_classes, 4]` or `[N, 1, 4]`, which + stacks box predictions on all feature levels.
The N is the number of total + anchors on all levels. + scores: A `tf.Tensor` with shape `[N, num_classes]`, which stacks class + probability on all feature levels. The N is the number of total anchors on + all levels. The num_classes is the number of classes predicted by the + model. Note that the class_outputs here is the raw score. + attributes: If not None, a dict of `tf.Tensor`. Each value has shape + `[N, num_classes, attribute_size]` or `[N, 1, attribute_size]` of + attribute predictions on all feature levels. The N is the number of total + anchors on all levels. + pre_nms_top_k: An `int` number of top candidate detections per class before + NMS. + pre_nms_score_threshold: A `float` representing the threshold for deciding + when to remove boxes based on score. + nms_iou_threshold: A `float` representing the threshold for deciding whether + boxes overlap too much with respect to IOU. + max_num_detections: A `scalar` representing maximum number of boxes retained + over all classes. + soft_nms_sigma: A `float` representing the sigma parameter for Soft NMS. + When soft_nms_sigma=0.0, we fall back to standard NMS. + If set to None, `tf.image.non_max_suppression_padded` is called instead. + + Returns: + nms_boxes: A `float` tf.Tensor of shape `[max_num_detections, 4]` + representing top detected boxes in `[y1, x1, y2, x2]`. + nms_scores: A `float` tf.Tensor of shape `[max_num_detections]` representing + sorted confidence scores for detected boxes. The values are between [0, + 1]. + nms_classes: An `int` tf.Tensor of shape `[max_num_detections]` representing + classes for detected boxes. + valid_detections: A scalar `int` tf.Tensor; only the top + `valid_detections` boxes are valid detections. + nms_attributes: None or a dict. Each value is a `float` tf.Tensor of shape + `[max_num_detections, attribute_size]` representing attribute predictions + for detected boxes. Can be an empty dict if `attributes` is None.
+ """ + nmsed_boxes = [] + nmsed_scores = [] + nmsed_classes = [] + num_classes_for_box = boxes.get_shape().as_list()[1] + num_classes = scores.get_shape().as_list()[1] + if attributes: + nmsed_attributes = {att_name: [] for att_name in attributes.keys()} + else: + nmsed_attributes = {} + + for i in range(num_classes): + boxes_i = boxes[:, min(num_classes_for_box - 1, i)] + scores_i = scores[:, i] + # Obtains pre_nms_top_k before running NMS. + scores_i, indices = tf.nn.top_k( + scores_i, k=tf.minimum(tf.shape(scores_i)[-1], pre_nms_top_k)) + boxes_i = tf.gather(boxes_i, indices) + + if soft_nms_sigma is not None: + (nmsed_indices_i, + nmsed_scores_i) = tf.image.non_max_suppression_with_scores( + tf.cast(boxes_i, tf.float32), + tf.cast(scores_i, tf.float32), + max_num_detections, + iou_threshold=nms_iou_threshold, + score_threshold=pre_nms_score_threshold, + soft_nms_sigma=soft_nms_sigma, + name='nms_detections_' + str(i)) + nmsed_boxes_i = tf.gather(boxes_i, nmsed_indices_i) + nmsed_boxes_i = preprocess_ops.clip_or_pad_to_fixed_size( + nmsed_boxes_i, max_num_detections, 0.0) + nmsed_scores_i = preprocess_ops.clip_or_pad_to_fixed_size( + nmsed_scores_i, max_num_detections, -1.0) + else: + (nmsed_indices_i, + nmsed_num_valid_i) = tf.image.non_max_suppression_padded( + tf.cast(boxes_i, tf.float32), + tf.cast(scores_i, tf.float32), + max_num_detections, + iou_threshold=nms_iou_threshold, + score_threshold=pre_nms_score_threshold, + pad_to_max_output_size=True, + name='nms_detections_' + str(i)) + nmsed_boxes_i = tf.gather(boxes_i, nmsed_indices_i) + nmsed_scores_i = tf.gather(scores_i, nmsed_indices_i) + # Sets scores of invalid boxes to -1. 
+ nmsed_scores_i = tf.where( + tf.less(tf.range(max_num_detections), [nmsed_num_valid_i]), + nmsed_scores_i, -tf.ones_like(nmsed_scores_i)) + + nmsed_classes_i = tf.fill([max_num_detections], i) + nmsed_boxes.append(nmsed_boxes_i) + nmsed_scores.append(nmsed_scores_i) + nmsed_classes.append(nmsed_classes_i) + if attributes: + for att_name, att in attributes.items(): + num_classes_for_attr = att.get_shape().as_list()[1] + att_i = att[:, min(num_classes_for_attr - 1, i)] + att_i = tf.gather(att_i, indices) + nmsed_att_i = tf.gather(att_i, nmsed_indices_i) + nmsed_att_i = preprocess_ops.clip_or_pad_to_fixed_size( + nmsed_att_i, max_num_detections, 0.0) + nmsed_attributes[att_name].append(nmsed_att_i) + + # Concats results from all classes and sorts them. + nmsed_boxes = tf.concat(nmsed_boxes, axis=0) + nmsed_scores = tf.concat(nmsed_scores, axis=0) + nmsed_classes = tf.concat(nmsed_classes, axis=0) + nmsed_scores, indices = tf.nn.top_k( + nmsed_scores, k=max_num_detections, sorted=True) + nmsed_boxes = tf.gather(nmsed_boxes, indices) + nmsed_classes = tf.gather(nmsed_classes, indices) + valid_detections = tf.reduce_sum( + tf.cast(tf.greater(nmsed_scores, -1), tf.int32)) + if attributes: + for att_name in attributes.keys(): + nmsed_attributes[att_name] = tf.concat(nmsed_attributes[att_name], axis=0) + nmsed_attributes[att_name] = tf.gather(nmsed_attributes[att_name], + indices) + + return nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections, nmsed_attributes + + +def _select_top_k_scores(scores_in: tf.Tensor, pre_nms_num_detections: int): + """Selects top_k scores and indices for each class. + + Args: + scores_in: A `tf.Tensor` with shape `[batch_size, N, num_classes]`, which + stacks class logit outputs on all feature levels. The N is the number of + total anchors on all levels. The num_classes is the number of classes + predicted by the model. + pre_nms_num_detections: Number of candidates before NMS.
+ + Returns: + scores and indices: Two `tf.Tensor`s, each with shape + `[batch_size, pre_nms_num_detections, num_classes]`. + """ + batch_size, num_anchors, num_class = scores_in.get_shape().as_list() + if batch_size is None: + batch_size = tf.shape(scores_in)[0] + scores_trans = tf.transpose(scores_in, perm=[0, 2, 1]) + scores_trans = tf.reshape(scores_trans, [-1, num_anchors]) + + top_k_scores, top_k_indices = tf.nn.top_k( + scores_trans, k=pre_nms_num_detections, sorted=True) + + top_k_scores = tf.reshape(top_k_scores, + [batch_size, num_class, pre_nms_num_detections]) + top_k_indices = tf.reshape(top_k_indices, + [batch_size, num_class, pre_nms_num_detections]) + + return tf.transpose(top_k_scores, + [0, 2, 1]), tf.transpose(top_k_indices, [0, 2, 1]) + + +def _generate_detections_v2(boxes: tf.Tensor, + scores: tf.Tensor, + pre_nms_top_k: int = 5000, + pre_nms_score_threshold: float = 0.05, + nms_iou_threshold: float = 0.5, + max_num_detections: int = 100): + """Generates the final detections given the model outputs. + + This implementation unrolls the class dimension while using the tf.while_loop + to implement the batched NMS, so that it can be parallelized at the batch + dimension. It should give better performance than the v1 implementation. + It is TPU compatible. + + Args: + boxes: A `tf.Tensor` with shape `[batch_size, N, num_classes, 4]` or + `[batch_size, N, 1, 4]`, which stacks box predictions on all feature levels. The + N is the number of total anchors on all levels. + scores: A `tf.Tensor` with shape `[batch_size, N, num_classes]`, which + stacks class probability on all feature levels. The N is the number of + total anchors on all levels. The num_classes is the number of classes + predicted by the model. Note that the class_outputs here is the raw score. + pre_nms_top_k: An `int` number of top candidate detections per class before + NMS. + pre_nms_score_threshold: A `float` representing the threshold for deciding + when to remove boxes based on score.
+ nms_iou_threshold: A `float` representing the threshold for deciding whether + boxes overlap too much with respect to IOU. + max_num_detections: A `scalar` representing maximum number of boxes retained + over all classes. + + Returns: + nms_boxes: A `float` tf.Tensor of shape [batch_size, max_num_detections, 4] + representing top detected boxes in [y1, x1, y2, x2]. + nms_scores: A `float` tf.Tensor of shape [batch_size, max_num_detections] + representing sorted confidence scores for detected boxes. The values are + between [0, 1]. + nms_classes: An `int` tf.Tensor of shape [batch_size, max_num_detections] + representing classes for detected boxes. + valid_detections: An `int` tf.Tensor of shape [batch_size] only the top + `valid_detections` boxes are valid detections. + """ + with tf.name_scope('generate_detections'): + nmsed_boxes = [] + nmsed_classes = [] + nmsed_scores = [] + valid_detections = [] + batch_size, _, num_classes_for_box, _ = boxes.get_shape().as_list() + if batch_size is None: + batch_size = tf.shape(boxes)[0] + _, total_anchors, num_classes = scores.get_shape().as_list() + # Selects top pre_nms_num scores and indices before NMS. + scores, indices = _select_top_k_scores( + scores, min(total_anchors, pre_nms_top_k)) + for i in range(num_classes): + boxes_i = boxes[:, :, min(num_classes_for_box - 1, i), :] + scores_i = scores[:, :, i] + # Obtains pre_nms_top_k before running NMS. + boxes_i = tf.gather(boxes_i, indices[:, :, i], batch_dims=1, axis=1) + + # Filter out scores. 
+ boxes_i, scores_i = box_ops.filter_boxes_by_scores( + boxes_i, scores_i, min_score_threshold=pre_nms_score_threshold) + + (nmsed_scores_i, nmsed_boxes_i) = nms.sorted_non_max_suppression_padded( + tf.cast(scores_i, tf.float32), + tf.cast(boxes_i, tf.float32), + max_num_detections, + iou_threshold=nms_iou_threshold) + nmsed_classes_i = tf.fill([batch_size, max_num_detections], i) + nmsed_boxes.append(nmsed_boxes_i) + nmsed_scores.append(nmsed_scores_i) + nmsed_classes.append(nmsed_classes_i) + nmsed_boxes = tf.concat(nmsed_boxes, axis=1) + nmsed_scores = tf.concat(nmsed_scores, axis=1) + nmsed_classes = tf.concat(nmsed_classes, axis=1) + nmsed_scores, indices = tf.nn.top_k( + nmsed_scores, k=max_num_detections, sorted=True) + nmsed_boxes = tf.gather(nmsed_boxes, indices, batch_dims=1, axis=1) + nmsed_classes = tf.gather(nmsed_classes, indices, batch_dims=1) + valid_detections = tf.reduce_sum( + input_tensor=tf.cast(tf.greater(nmsed_scores, 0.0), tf.int32), axis=1) + return nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections + + +def _generate_detections_batched(boxes: tf.Tensor, scores: tf.Tensor, + pre_nms_score_threshold: float, + nms_iou_threshold: float, + max_num_detections: int): + """Generates detected boxes with scores and classes for one-stage detector. + + The function takes output of multi-level ConvNets and anchor boxes and + generates detected boxes. Note that this uses batched NMS, which is not + currently supported on TPU. + + Args: + boxes: A `tf.Tensor` with shape `[batch_size, N, num_classes, 4]` or + `[batch_size, N, 1, 4]`, which stacks box predictions on all feature levels. The + N is the number of total anchors on all levels. + scores: A `tf.Tensor` with shape `[batch_size, N, num_classes]`, which + stacks class probability on all feature levels. The N is the number of + total anchors on all levels. The num_classes is the number of classes + predicted by the model. Note that the class_outputs here is the raw score.
+ pre_nms_score_threshold: A `float` representing the threshold for deciding + when to remove boxes based on score. + nms_iou_threshold: A `float` representing the threshold for deciding whether + boxes overlap too much with respect to IOU. + max_num_detections: A `scalar` representing maximum number of boxes retained + over all classes. + + Returns: + nms_boxes: A `float` tf.Tensor of shape [batch_size, max_num_detections, 4] + representing top detected boxes in [y1, x1, y2, x2]. + nms_scores: A `float` tf.Tensor of shape [batch_size, max_num_detections] + representing sorted confidence scores for detected boxes. The values are + between [0, 1]. + nms_classes: An `int` tf.Tensor of shape [batch_size, max_num_detections] + representing classes for detected boxes. + valid_detections: An `int` tf.Tensor of shape [batch_size] only the top + `valid_detections` boxes are valid detections. + """ + with tf.name_scope('generate_detections'): + nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections = ( + tf.image.combined_non_max_suppression( + boxes, + scores, + max_output_size_per_class=max_num_detections, + max_total_size=max_num_detections, + iou_threshold=nms_iou_threshold, + score_threshold=pre_nms_score_threshold, + pad_per_class=False, + clip_boxes=False)) + nmsed_classes = tf.cast(nmsed_classes, tf.int32) + return nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections + + +def _generate_detections_tflite_implements_signature( + config: Dict[str, Any]) -> str: + """Returns `experimental_implements` signature for TFLite's custom NMS op. + + This signature encodes the arguments to correctly initialize TFLite's custom + post-processing op in the MLIR converter. + For details on `experimental_implements` see here: + https://www.tensorflow.org/api_docs/python/tf/function + + Args: + config: A dictionary of configs defining parameters for TFLite NMS op. + + Returns: + An `experimental_implements` signature string. 
+ """ + scale_value = 1.0 + + implements_signature = [ + 'name: "%s"' % 'TFLite_Detection_PostProcess', + 'attr { key: "max_detections" value { i: %d } }' % + config['max_detections'], + 'attr { key: "max_classes_per_detection" value { i: %d } }' % + config['max_classes_per_detection'], + 'attr { key: "use_regular_nms" value { b: %s } }' % + str(config['use_regular_nms']).lower(), + 'attr { key: "nms_score_threshold" value { f: %f } }' % + config['nms_score_threshold'], + 'attr { key: "nms_iou_threshold" value { f: %f } }' % + config['nms_iou_threshold'], + 'attr { key: "y_scale" value { f: %f } }' % scale_value, + 'attr { key: "x_scale" value { f: %f } }' % scale_value, + 'attr { key: "h_scale" value { f: %f } }' % scale_value, + 'attr { key: "w_scale" value { f: %f } }' % scale_value, + 'attr { key: "num_classes" value { i: %d } }' % config['num_classes'] + ] + implements_signature = ' '.join(implements_signature) + return implements_signature + + +def _generate_detections_tflite(raw_boxes: Mapping[str, tf.Tensor], + raw_scores: Mapping[str, tf.Tensor], + anchor_boxes: Mapping[str, tf.Tensor], + config: Dict[str, Any]) -> Sequence[Any]: + """Generate detections for conversion to TFLite. + + Mathematically same as class-agnostic NMS, except that the last portion of + the TF graph constitutes a dummy `tf.function` that contains an annotation + for conversion to TFLite's custom NMS op. Using this custom op allows + features like post-training quantization & accelerator support. + NOTE: This function does NOT return a valid output, and is only meant to + generate a SavedModel for TFLite conversion via MLIR. The generated SavedModel + should not be used for inference. + For TFLite op details, see tensorflow/lite/kernels/detection_postprocess.cc + + Args: + raw_boxes: A dictionary of tensors for raw boxes. Key is level of features + and value is a tensor denoting a level of boxes with shape [1, H, W, 4 * + num_anchors]. 
+ raw_scores: A dictionary of tensors for classes. Key is level of features + and value is a tensor denoting a level of logits with shape [1, H, W, + num_class * num_anchors]. + anchor_boxes: A dictionary of tensors for anchor boxes. Key is level of + features and value is a tensor denoting a level of anchors with shape + [num_anchors, 4]. + config: A dictionary of configs defining parameters for TFLite NMS op. + + Returns: + A (dummy) tuple of (boxes, scores, classes, num_detections). + + Raises: + ValueError: If the last dimension of predicted boxes is not divisible by 4, + or the last dimension of predicted scores is not divisible by the number + of anchors per location. + """ + scores, boxes, anchors = [], [], [] + levels = list(raw_scores.keys()) + min_level = int(min(levels)) + max_level = int(max(levels)) + batch_size = tf.shape(raw_scores[str(min_level)])[0] + + num_anchors_per_locations_times_4 = raw_boxes[str( + min_level)].get_shape().as_list()[-1] + if num_anchors_per_locations_times_4 % 4 != 0: + raise ValueError( + 'The last dimension of predicted boxes should be divisible by 4.') + num_anchors_per_locations = num_anchors_per_locations_times_4 // 4 + if raw_scores[str( + min_level)].get_shape().as_list()[-1] % num_anchors_per_locations != 0: + raise ValueError( + f'The last dimension of predicted scores should be divisible by {num_anchors_per_locations}.'
+ ) + num_classes = raw_scores[str( + min_level)].get_shape().as_list()[-1] // num_anchors_per_locations + config.update({'num_classes': num_classes}) + + for i in range(min_level, max_level + 1): + scores.append( + tf.sigmoid( + tf.reshape(raw_scores[str(i)], [batch_size, -1, num_classes]))) + boxes.append(tf.reshape(raw_boxes[str(i)], [batch_size, -1, 4])) + anchors.append(tf.reshape(anchor_boxes[str(i)], [-1, 4])) + scores = tf.concat(scores, 1) + boxes = tf.concat(boxes, 1) + anchors = tf.concat(anchors, 0) + + ycenter_a = (anchors[..., 0] + anchors[..., 2]) / 2 + xcenter_a = (anchors[..., 1] + anchors[..., 3]) / 2 + ha = anchors[..., 2] - anchors[..., 0] + wa = anchors[..., 3] - anchors[..., 1] + anchors = tf.stack([ycenter_a, xcenter_a, ha, wa], axis=-1) + + # There is no TF equivalent for TFLite's custom post-processing op. + # So we add an 'empty' composite function here, that is legalized to the + # custom op with MLIR. + # For details, see: tensorflow/compiler/mlir/lite/utils/nms_utils.cc + @tf.function( + experimental_implements=_generate_detections_tflite_implements_signature( + config)) + # pylint: disable=g-unused-argument,unused-argument + def dummy_post_processing(input_boxes, input_scores, input_anchors): + boxes = tf.constant(0.0, dtype=tf.float32, name='boxes') + scores = tf.constant(0.0, dtype=tf.float32, name='scores') + classes = tf.constant(0.0, dtype=tf.float32, name='classes') + num_detections = tf.constant(0.0, dtype=tf.float32, name='num_detections') + return boxes, classes, scores, num_detections + + return dummy_post_processing(boxes, scores, anchors)[::-1] + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class DetectionGenerator(tf.keras.layers.Layer): + """Generates the final detected boxes with scores and classes.""" + + def __init__(self, + apply_nms: bool = True, + pre_nms_top_k: int = 5000, + pre_nms_score_threshold: float = 0.05, + nms_iou_threshold: float = 0.5, + max_num_detections: int = 100, + nms_version: 
str = 'v2', + use_cpu_nms: bool = False, + soft_nms_sigma: Optional[float] = None, + **kwargs): + """Initializes a detection generator. + + Args: + apply_nms: A `bool` of whether or not to apply non-maximum suppression. + If False, the decoded boxes and their scores are returned. + pre_nms_top_k: An `int` of the number of top-scoring proposals to be kept + before applying NMS. + pre_nms_score_threshold: A `float` of the score threshold to apply before + applying NMS. Proposals whose scores are below this threshold are + thrown away. + nms_iou_threshold: A `float` in [0, 1], the NMS IoU threshold. + max_num_detections: An `int` of the final number of total detections to + generate. + nms_version: A string, one of `batched`, `v1` or `v2`, specifying the NMS version. + use_cpu_nms: A `bool` of whether or not to force NMS to run on CPU. + soft_nms_sigma: A `float` representing the sigma parameter for Soft NMS. + When soft_nms_sigma=0.0, we fall back to standard NMS. Only used when + `nms_version` is `v1`. + **kwargs: Additional keyword arguments passed to Layer. + """ + self._config_dict = { + 'apply_nms': apply_nms, + 'pre_nms_top_k': pre_nms_top_k, + 'pre_nms_score_threshold': pre_nms_score_threshold, + 'nms_iou_threshold': nms_iou_threshold, + 'max_num_detections': max_num_detections, + 'nms_version': nms_version, + 'use_cpu_nms': use_cpu_nms, + 'soft_nms_sigma': soft_nms_sigma, + } + super(DetectionGenerator, self).__init__(**kwargs) + + def __call__(self, + raw_boxes: tf.Tensor, + raw_scores: tf.Tensor, + anchor_boxes: tf.Tensor, + image_shape: tf.Tensor, + regression_weights: Optional[List[float]] = None, + bbox_per_class: bool = True): + """Generates final detections. + + Args: + raw_boxes: A `tf.Tensor` of shape `[batch_size, K, num_classes * 4]` + representing the class-specific box coordinates relative to anchors. + raw_scores: A `tf.Tensor` of shape `[batch_size, K, num_classes]` + representing the class logits before applying score activation.
+ anchor_boxes: A `tf.Tensor` of shape `[batch_size, K, 4]` representing + the corresponding anchor boxes w.r.t. `raw_boxes`. + image_shape: A `tf.Tensor` of shape `[batch_size, 2]` storing the image + height and width w.r.t. the scaled image, i.e. the same image space as + `raw_boxes` and `anchor_boxes`. + regression_weights: A list of four float numbers to scale coordinates. + bbox_per_class: A `bool`. If True, perform per-class box regression. + + Returns: + If `apply_nms` = True, the return is a dictionary with keys: + `detection_boxes`: A `float` tf.Tensor of shape + [batch, max_num_detections, 4] representing top detected boxes in + [y1, x1, y2, x2]. + `detection_scores`: A `float` `tf.Tensor` of shape + [batch, max_num_detections] representing sorted confidence scores for + detected boxes. The values are between [0, 1]. + `detection_classes`: An `int` tf.Tensor of shape + [batch, max_num_detections] representing classes for detected boxes. + `num_detections`: An `int` tf.Tensor of shape [batch]; only the first + `num_detections` boxes are valid detections. + If `apply_nms` = False, the return is a dictionary with keys: + `decoded_boxes`: A `float` tf.Tensor of shape + [batch, num_raw_boxes, num_classes - 1, 4] (or [batch, num_raw_boxes, 1, 4] + if `bbox_per_class` is False) representing all the decoded boxes. + `decoded_box_scores`: A `float` tf.Tensor of shape + [batch, num_raw_boxes, num_classes - 1] representing scores of all the + decoded boxes. + """ + box_scores = tf.nn.softmax(raw_scores, axis=-1) + + # Removes the background class.
+ box_scores_shape = tf.shape(box_scores) + box_scores_shape_list = box_scores.get_shape().as_list() + batch_size = box_scores_shape[0] + num_locations = box_scores_shape_list[1] + num_classes = box_scores_shape_list[-1] + + box_scores = tf.slice(box_scores, [0, 0, 1], [-1, -1, -1]) + + if bbox_per_class: + num_detections = num_locations * (num_classes - 1) + raw_boxes = tf.reshape(raw_boxes, + [batch_size, num_locations, num_classes, 4]) + raw_boxes = tf.slice(raw_boxes, [0, 0, 1, 0], [-1, -1, -1, -1]) + anchor_boxes = tf.tile( + tf.expand_dims(anchor_boxes, axis=2), [1, 1, num_classes - 1, 1]) + raw_boxes = tf.reshape(raw_boxes, [batch_size, num_detections, 4]) + anchor_boxes = tf.reshape(anchor_boxes, [batch_size, num_detections, 4]) + + # Box decoding. + decoded_boxes = box_ops.decode_boxes( + raw_boxes, anchor_boxes, weights=regression_weights) + + # Box clipping. + decoded_boxes = box_ops.clip_boxes( + decoded_boxes, tf.expand_dims(image_shape, axis=1)) + + if bbox_per_class: + decoded_boxes = tf.reshape( + decoded_boxes, [batch_size, num_locations, num_classes - 1, 4]) + else: + decoded_boxes = tf.expand_dims(decoded_boxes, axis=2) + + if not self._config_dict['apply_nms']: + return { + 'decoded_boxes': decoded_boxes, + 'decoded_box_scores': box_scores, + } + + # Optionally force NMS to run on CPU.
+ if self._config_dict['use_cpu_nms']: + nms_context = tf.device('cpu:0') + else: + nms_context = contextlib.nullcontext() + + with nms_context: + if self._config_dict['nms_version'] == 'batched': + (nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections) = ( + _generate_detections_batched( + decoded_boxes, box_scores, + self._config_dict['pre_nms_score_threshold'], + self._config_dict['nms_iou_threshold'], + self._config_dict['max_num_detections'])) + elif self._config_dict['nms_version'] == 'v1': + (nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections, _) = ( + _generate_detections_v1( + decoded_boxes, + box_scores, + pre_nms_top_k=self._config_dict['pre_nms_top_k'], + pre_nms_score_threshold=self + ._config_dict['pre_nms_score_threshold'], + nms_iou_threshold=self._config_dict['nms_iou_threshold'], + max_num_detections=self._config_dict['max_num_detections'], + soft_nms_sigma=self._config_dict['soft_nms_sigma'])) + elif self._config_dict['nms_version'] == 'v2': + (nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections) = ( + _generate_detections_v2( + decoded_boxes, + box_scores, + pre_nms_top_k=self._config_dict['pre_nms_top_k'], + pre_nms_score_threshold=self + ._config_dict['pre_nms_score_threshold'], + nms_iou_threshold=self._config_dict['nms_iou_threshold'], + max_num_detections=self._config_dict['max_num_detections'])) + else: + raise ValueError('NMS version {} not supported.'.format( + self._config_dict['nms_version'])) + + # Adds 1 to offset the background class which has index 0. 
+ nmsed_classes += 1 + + return { + 'num_detections': valid_detections, + 'detection_boxes': nmsed_boxes, + 'detection_classes': nmsed_classes, + 'detection_scores': nmsed_scores, + } + + def get_config(self): + return self._config_dict + + @classmethod + def from_config(cls, config): + return cls(**config) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class MultilevelDetectionGenerator(tf.keras.layers.Layer): + """Generates detected boxes with scores and classes for one-stage detector.""" + + def __init__(self, + apply_nms: bool = True, + pre_nms_top_k: int = 5000, + pre_nms_score_threshold: float = 0.05, + nms_iou_threshold: float = 0.5, + max_num_detections: int = 100, + nms_version: str = 'v1', + use_cpu_nms: bool = False, + soft_nms_sigma: Optional[float] = None, + tflite_post_processing_config: Optional[Dict[str, Any]] = None, + **kwargs): + """Initializes a multi-level detection generator. + + Args: + apply_nms: A `bool` of whether or not to apply non-maximum suppression. If + False, the decoded boxes and their scores are returned. + pre_nms_top_k: An `int` of the number of top-scoring proposals to be kept + before applying NMS. + pre_nms_score_threshold: A `float` of the score threshold to apply before + applying NMS. Proposals whose scores are below this threshold are thrown + away. + nms_iou_threshold: A `float` in [0, 1], the NMS IoU threshold. + max_num_detections: An `int` of the final number of total detections to + generate. + nms_version: A string, one of `batched`, `v1` or `v2`, specifying the NMS version. + use_cpu_nms: A `bool` of whether or not to force NMS to run on CPU. + soft_nms_sigma: A `float` representing the sigma parameter for Soft NMS. + When soft_nms_sigma=0.0, we fall back to standard NMS. + tflite_post_processing_config: An optional dictionary containing + post-processing parameters used for TFLite custom NMS op. + **kwargs: Additional keyword arguments passed to Layer.
+ """ + self._config_dict = { + 'apply_nms': apply_nms, + 'pre_nms_top_k': pre_nms_top_k, + 'pre_nms_score_threshold': pre_nms_score_threshold, + 'nms_iou_threshold': nms_iou_threshold, + 'max_num_detections': max_num_detections, + 'nms_version': nms_version, + 'use_cpu_nms': use_cpu_nms, + 'soft_nms_sigma': soft_nms_sigma + } + + if tflite_post_processing_config is not None: + self._config_dict.update( + {'tflite_post_processing_config': tflite_post_processing_config}) + super(MultilevelDetectionGenerator, self).__init__(**kwargs) + + def _decode_multilevel_outputs( + self, + raw_boxes: Mapping[str, tf.Tensor], + raw_scores: Mapping[str, tf.Tensor], + anchor_boxes: Mapping[str, tf.Tensor], + image_shape: tf.Tensor, + raw_attributes: Optional[Mapping[str, tf.Tensor]] = None): + """Decodes and concatenates multilevel boxes, scores and attributes.""" + boxes = [] + scores = [] + if raw_attributes: + attributes = {att_name: [] for att_name in raw_attributes.keys()} + else: + attributes = {} + + levels = list(raw_boxes.keys()) + min_level = int(min(levels)) + max_level = int(max(levels)) + for i in range(min_level, max_level + 1): + raw_boxes_i = raw_boxes[str(i)] + raw_scores_i = raw_scores[str(i)] + batch_size = tf.shape(raw_boxes_i)[0] + (_, feature_h_i, feature_w_i, + num_anchors_per_locations_times_4) = raw_boxes_i.get_shape().as_list() + num_locations = feature_h_i * feature_w_i + num_anchors_per_locations = num_anchors_per_locations_times_4 // 4 + num_classes = raw_scores_i.get_shape().as_list( + )[-1] // num_anchors_per_locations + + # Applies the sigmoid and removes the implicit background class. + scores_i = tf.sigmoid( + tf.reshape(raw_scores_i, [ + batch_size, num_locations * num_anchors_per_locations, num_classes + ])) + scores_i = tf.slice(scores_i, [0, 0, 1], [-1, -1, -1]) + + # Box decoding. + # The anchor boxes are shared for all data in a batch. + # One-stage detectors only support class-agnostic box regression.
+ anchor_boxes_i = tf.reshape( + anchor_boxes[str(i)], + [batch_size, num_locations * num_anchors_per_locations, 4]) + raw_boxes_i = tf.reshape( + raw_boxes_i, + [batch_size, num_locations * num_anchors_per_locations, 4]) + boxes_i = box_ops.decode_boxes(raw_boxes_i, anchor_boxes_i) + + # Box clipping. + boxes_i = box_ops.clip_boxes( + boxes_i, tf.expand_dims(image_shape, axis=1)) + + boxes.append(boxes_i) + scores.append(scores_i) + + if raw_attributes: + for att_name, raw_att in raw_attributes.items(): + attribute_size = raw_att[str( + i)].get_shape().as_list()[-1] // num_anchors_per_locations + att_i = tf.reshape(raw_att[str(i)], [ + batch_size, num_locations * num_anchors_per_locations, + attribute_size + ]) + attributes[att_name].append(att_i) + + boxes = tf.concat(boxes, axis=1) + boxes = tf.expand_dims(boxes, axis=2) + scores = tf.concat(scores, axis=1) + + if raw_attributes: + for att_name in raw_attributes.keys(): + attributes[att_name] = tf.concat(attributes[att_name], axis=1) + attributes[att_name] = tf.expand_dims(attributes[att_name], axis=2) + + return boxes, scores, attributes + + def __call__(self, + raw_boxes: Mapping[str, tf.Tensor], + raw_scores: Mapping[str, tf.Tensor], + anchor_boxes: Mapping[str, tf.Tensor], + image_shape: tf.Tensor, + raw_attributes: Optional[Mapping[str, tf.Tensor]] = None): + """Generates final detections. + + Args: + raw_boxes: A `dict` with keys representing FPN levels and values + representing box tensors of shape `[batch, feature_h, feature_w, + num_anchors * 4]`. + raw_scores: A `dict` with keys representing FPN levels and values + representing logit tensors of shape `[batch, feature_h, feature_w, + num_anchors * num_classes]`, where `num_classes` includes the + background class at index 0. + anchor_boxes: A `dict` with keys representing FPN levels and values + representing anchor tensors of shape `[batch_size, K, 4]`, where `K` is + `feature_h * feature_w * num_anchors` for that level, representing the + corresponding anchor boxes w.r.t. `raw_boxes`. 
+ image_shape: A `tf.Tensor` of shape [batch_size, 2] storing the image + height and width w.r.t.
the scaled image, i.e. the same image space as + `raw_boxes` and `anchor_boxes`. + raw_attributes: If not None, a `dict` of (attribute_name, + attribute_prediction) pairs. `attribute_prediction` is a dict that + contains keys representing FPN levels and values representing tensors of + shape `[batch, feature_h, feature_w, num_anchors * attribute_size]`. + + Returns: + If `apply_nms` = True, the return is a dictionary with keys: + `detection_boxes`: A `float` tf.Tensor of shape + [batch, max_num_detections, 4] representing top detected boxes in + [y1, x1, y2, x2]. + `detection_scores`: A `float` tf.Tensor of shape + [batch, max_num_detections] representing sorted confidence scores for + detected boxes. The values are between [0, 1]. + `detection_classes`: An `int` tf.Tensor of shape + [batch, max_num_detections] representing classes for detected boxes. + Class indices start at 1 because index 0 is the background class. + `num_detections`: An `int` tf.Tensor of shape [batch]; only the first + `num_detections` boxes of each example are valid detections. + `detection_attributes`: A dict. Values of the dict are `float` + tf.Tensors of shape [batch, max_num_detections, attribute_size] + representing attribute predictions for detected boxes. The dict is + empty for `batched` and `v2` NMS. + If `apply_nms` = False, the return is a dictionary with keys: + `decoded_boxes`: A `float` tf.Tensor of shape + [batch, num_raw_boxes, 1, 4] representing all the decoded boxes. + `decoded_box_scores`: A `float` tf.Tensor of shape + [batch, num_raw_boxes, num_foreground_classes] representing scores of + all the decoded boxes; the background class is removed. + `decoded_box_attributes`: A dict. Values in the dict are + `float` tf.Tensors of shape [batch, num_raw_boxes, 1, attribute_size] + representing attribute predictions of all the decoded boxes.
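Because the NMS outputs are padded to `max_num_detections`, consumers usually trim each example to its first `num_detections` entries and remember that class indices are 1-based. A minimal sketch of that post-processing, assuming the output tensors have already been converted to nested Python lists (for example via `.numpy().tolist()` on eager tensors); `unpad_detections` and `label_map` are hypothetical names, not part of Model Garden:

```python
from typing import Any, Dict, List, Mapping, Sequence


def unpad_detections(
        outputs: Mapping[str, Sequence[Any]],
        label_map: Mapping[int, str]) -> List[List[Dict[str, Any]]]:
    # `outputs` mirrors the apply_nms=True dict, with one entry per example.
    results = []
    for b, count in enumerate(outputs['num_detections']):
        per_image = []
        for j in range(int(count)):  # Entries at index >= count are padding.
            class_id = int(outputs['detection_classes'][b][j])
            per_image.append({
                'box': list(outputs['detection_boxes'][b][j]),  # y1, x1, y2, x2
                'score': float(outputs['detection_scores'][b][j]),
                'class_id': class_id,  # 1-based: index 0 is the background.
                'label': label_map.get(class_id, 'unknown'),
            })
        results.append(per_image)
    return results
```

The padded layout keeps every output shape fixed at `max_num_detections`, so trimming is left to the consumer.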
+ """ + if self._config_dict['apply_nms'] and self._config_dict[ + 'nms_version'] == 'tflite': + boxes, classes, scores, num_detections = _generate_detections_tflite( + raw_boxes, raw_scores, anchor_boxes, + self.get_config()['tflite_post_processing_config']) + return { + 'num_detections': num_detections, + 'detection_boxes': boxes, + 'detection_classes': classes, + 'detection_scores': scores + } + + boxes, scores, attributes = self._decode_multilevel_outputs( + raw_boxes, raw_scores, anchor_boxes, image_shape, raw_attributes) + + if not self._config_dict['apply_nms']: + return { + 'decoded_boxes': boxes, + 'decoded_box_scores': scores, + 'decoded_box_attributes': attributes, + } + + # Optionally force the NMS to run on CPU. + if self._config_dict['use_cpu_nms']: + nms_context = tf.device('cpu:0') + else: + nms_context = contextlib.nullcontext() + + with nms_context: + if raw_attributes and (self._config_dict['nms_version'] != 'v1'): + raise ValueError( + 'Attribute learning is only supported for NMSv1 but NMS {} is used.' + .format(self._config_dict['nms_version'])) + if self._config_dict['nms_version'] == 'batched': + (nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections) = ( + _generate_detections_batched( + boxes, scores, self._config_dict['pre_nms_score_threshold'], + self._config_dict['nms_iou_threshold'], + self._config_dict['max_num_detections'])) + # Batched NMS does not support attributes; use an empty dict.
+ nmsed_attributes = {} + elif self._config_dict['nms_version'] == 'v1': + (nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections, + nmsed_attributes) = ( + _generate_detections_v1( + boxes, + scores, + attributes=attributes if raw_attributes else None, + pre_nms_top_k=self._config_dict['pre_nms_top_k'], + pre_nms_score_threshold=self + ._config_dict['pre_nms_score_threshold'], + nms_iou_threshold=self._config_dict['nms_iou_threshold'], + max_num_detections=self._config_dict['max_num_detections'], + soft_nms_sigma=self._config_dict['soft_nms_sigma'])) + elif self._config_dict['nms_version'] == 'v2': + (nmsed_boxes, nmsed_scores, nmsed_classes, valid_detections) = ( + _generate_detections_v2( + boxes, + scores, + pre_nms_top_k=self._config_dict['pre_nms_top_k'], + pre_nms_score_threshold=self + ._config_dict['pre_nms_score_threshold'], + nms_iou_threshold=self._config_dict['nms_iou_threshold'], + max_num_detections=self._config_dict['max_num_detections'])) + # NMS v2 does not support attributes; use an empty dict. + nmsed_attributes = {} + else: + raise ValueError('NMS version {} not supported.'.format( + self._config_dict['nms_version'])) + + # Adds 1 to offset the background class which has index 0. + nmsed_classes += 1 + + return { + 'num_detections': valid_detections, + 'detection_boxes': nmsed_boxes, + 'detection_classes': nmsed_classes, + 'detection_scores': nmsed_scores, + 'detection_attributes': nmsed_attributes, + } + + def get_config(self): + return self._config_dict + + @classmethod + def from_config(cls, config): + return cls(**config) diff --git a/official/vision/modeling/layers/detection_generator_test.py b/official/vision/modeling/layers/detection_generator_test.py new file mode 100644 index 0000000000000000000000000000000000000000..44a5007ad7dc5c691e275e7676f1de2c09e6d61c --- /dev/null +++ b/official/vision/modeling/layers/detection_generator_test.py @@ -0,0 +1,282 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for detection_generator.py.""" +# Import libraries + +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from official.vision.modeling.layers import detection_generator +from official.vision.ops import anchor + + +class SelectTopKScoresTest(tf.test.TestCase): + + def testSelectTopKScores(self): + pre_nms_num_boxes = 2 + scores_data = [[[0.2, 0.2], [0.1, 0.9], [0.5, 0.1], [0.3, 0.5]]] + scores_in = tf.constant(scores_data, dtype=tf.float32) + top_k_scores, top_k_indices = detection_generator._select_top_k_scores( + scores_in, pre_nms_num_detections=pre_nms_num_boxes) + expected_top_k_scores = np.array([[[0.5, 0.9], [0.3, 0.5]]], + dtype=np.float32) + + expected_top_k_indices = [[[2, 1], [3, 3]]] + + self.assertAllEqual(top_k_scores.numpy(), expected_top_k_scores) + self.assertAllEqual(top_k_indices.numpy(), expected_top_k_indices) + + +class DetectionGeneratorTest( + parameterized.TestCase, tf.test.TestCase): + + @parameterized.product( + nms_version=['batched', 'v1', 'v2'], + use_cpu_nms=[True, False], + soft_nms_sigma=[None, 0.1]) + def testDetectionsOutputShape(self, nms_version, use_cpu_nms, soft_nms_sigma): + max_num_detections = 10 + num_classes = 4 + pre_nms_top_k = 5000 + pre_nms_score_threshold = 0.01 + batch_size = 1 + kwargs = { + 'apply_nms': True, + 'pre_nms_top_k': pre_nms_top_k, + 'pre_nms_score_threshold': pre_nms_score_threshold, + 'nms_iou_threshold': 0.5, + 
'max_num_detections': max_num_detections, + 'nms_version': nms_version, + 'use_cpu_nms': use_cpu_nms, + 'soft_nms_sigma': soft_nms_sigma, + } + generator = detection_generator.DetectionGenerator(**kwargs) + + cls_outputs_all = ( + np.random.rand(84, num_classes) - 0.5) * 3 # random 84x4 logits. + box_outputs_all = np.random.rand(84, 4 * num_classes) # random 84 boxes. + anchor_boxes_all = np.random.rand(84, 4) # random 84 boxes. + class_outputs = tf.reshape( + tf.convert_to_tensor(cls_outputs_all, dtype=tf.float32), + [1, 84, num_classes]) + box_outputs = tf.reshape( + tf.convert_to_tensor(box_outputs_all, dtype=tf.float32), + [1, 84, 4 * num_classes]) + anchor_boxes = tf.reshape( + tf.convert_to_tensor(anchor_boxes_all, dtype=tf.float32), + [1, 84, 4]) + image_info = tf.constant( + [[[1000, 1000], [100, 100], [0.1, 0.1], [0, 0]]], + dtype=tf.float32) + results = generator( + box_outputs, class_outputs, anchor_boxes, image_info[:, 1, :]) + boxes = results['detection_boxes'] + classes = results['detection_classes'] + scores = results['detection_scores'] + valid_detections = results['num_detections'] + + self.assertEqual(boxes.numpy().shape, (batch_size, max_num_detections, 4)) + self.assertEqual(scores.numpy().shape, (batch_size, max_num_detections,)) + self.assertEqual(classes.numpy().shape, (batch_size, max_num_detections,)) + self.assertEqual(valid_detections.numpy().shape, (batch_size,)) + + def test_serialize_deserialize(self): + kwargs = { + 'apply_nms': True, + 'pre_nms_top_k': 1000, + 'pre_nms_score_threshold': 0.1, + 'nms_iou_threshold': 0.5, + 'max_num_detections': 10, + 'nms_version': 'v2', + 'use_cpu_nms': False, + 'soft_nms_sigma': None, + } + generator = detection_generator.DetectionGenerator(**kwargs) + + expected_config = dict(kwargs) + self.assertEqual(generator.get_config(), expected_config) + + new_generator = ( + detection_generator.DetectionGenerator.from_config( + generator.get_config())) + + self.assertAllEqual(generator.get_config(),
new_generator.get_config()) + + +class MultilevelDetectionGeneratorTest( + parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + ('batched', False, True, None, None), + ('batched', False, False, None, None), + ('v2', False, True, None, None), + ('v2', False, False, None, None), + ('v1', True, True, 0.0, None), + ('v1', True, False, 0.1, None), + ('v1', True, False, None, None), + ('tflite', False, False, None, True), + ('tflite', False, False, None, False), + ) + def testDetectionsOutputShape(self, nms_version, has_att_heads, use_cpu_nms, + soft_nms_sigma, use_regular_nms): + min_level = 4 + max_level = 6 + num_scales = 2 + max_num_detections = 10 + aspect_ratios = [1.0, 2.0] + anchor_scale = 2.0 + output_size = [64, 64] + num_classes = 4 + pre_nms_top_k = 5000 + pre_nms_score_threshold = 0.01 + batch_size = 1 + tflite_post_processing_config = { + 'max_detections': max_num_detections, + 'max_classes_per_detection': 1, + 'use_regular_nms': use_regular_nms, + 'nms_score_threshold': 0.01, + 'nms_iou_threshold': 0.5 + } + kwargs = { + 'apply_nms': True, + 'pre_nms_top_k': pre_nms_top_k, + 'pre_nms_score_threshold': pre_nms_score_threshold, + 'nms_iou_threshold': 0.5, + 'max_num_detections': max_num_detections, + 'nms_version': nms_version, + 'use_cpu_nms': use_cpu_nms, + 'soft_nms_sigma': soft_nms_sigma, + 'tflite_post_processing_config': tflite_post_processing_config + } + + input_anchor = anchor.build_anchor_generator(min_level, max_level, + num_scales, aspect_ratios, + anchor_scale) + anchor_boxes = input_anchor(output_size) + cls_outputs_all = ( + np.random.rand(84, num_classes) - 0.5) * 3 # random 84x4 logits. + box_outputs_all = np.random.rand(84, 4) # random 84 boxes.
+ class_outputs = { + '4': + tf.reshape( + tf.convert_to_tensor(cls_outputs_all[0:64], dtype=tf.float32), + [1, 8, 8, num_classes]), + '5': + tf.reshape( + tf.convert_to_tensor(cls_outputs_all[64:80], dtype=tf.float32), + [1, 4, 4, num_classes]), + '6': + tf.reshape( + tf.convert_to_tensor(cls_outputs_all[80:84], dtype=tf.float32), + [1, 2, 2, num_classes]), + } + box_outputs = { + '4': tf.reshape(tf.convert_to_tensor( + box_outputs_all[0:64], dtype=tf.float32), [1, 8, 8, 4]), + '5': tf.reshape(tf.convert_to_tensor( + box_outputs_all[64:80], dtype=tf.float32), [1, 4, 4, 4]), + '6': tf.reshape(tf.convert_to_tensor( + box_outputs_all[80:84], dtype=tf.float32), [1, 2, 2, 4]), + } + if has_att_heads: + att_outputs_all = np.random.rand(84, 1) # random attributes. + att_outputs = { + 'depth': { + '4': + tf.reshape( + tf.convert_to_tensor( + att_outputs_all[0:64], dtype=tf.float32), + [1, 8, 8, 1]), + '5': + tf.reshape( + tf.convert_to_tensor( + att_outputs_all[64:80], dtype=tf.float32), + [1, 4, 4, 1]), + '6': + tf.reshape( + tf.convert_to_tensor( + att_outputs_all[80:84], dtype=tf.float32), + [1, 2, 2, 1]), + } + } + else: + att_outputs = None + image_info = tf.constant([[[1000, 1000], [100, 100], [0.1, 0.1], [0, 0]]], + dtype=tf.float32) + generator = detection_generator.MultilevelDetectionGenerator(**kwargs) + results = generator(box_outputs, class_outputs, anchor_boxes, + image_info[:, 1, :], att_outputs) + boxes = results['detection_boxes'] + classes = results['detection_classes'] + scores = results['detection_scores'] + valid_detections = results['num_detections'] + + if nms_version == 'tflite': + # When nms_version is `tflite`, all output tensors are empty as the actual + # post-processing happens in the TFLite model. 
+ self.assertEqual(boxes.numpy().shape, ()) + self.assertEqual(scores.numpy().shape, ()) + self.assertEqual(classes.numpy().shape, ()) + self.assertEqual(valid_detections.numpy().shape, ()) + else: + self.assertEqual(boxes.numpy().shape, (batch_size, max_num_detections, 4)) + self.assertEqual(scores.numpy().shape, ( + batch_size, + max_num_detections, + )) + self.assertEqual(classes.numpy().shape, ( + batch_size, + max_num_detections, + )) + self.assertEqual(valid_detections.numpy().shape, (batch_size,)) + if has_att_heads: + for att in results['detection_attributes'].values(): + self.assertEqual(att.numpy().shape, + (batch_size, max_num_detections, 1)) + + def test_serialize_deserialize(self): + tflite_post_processing_config = { + 'max_detections': 100, + 'max_classes_per_detection': 1, + 'use_regular_nms': True, + 'nms_score_threshold': 0.01, + 'nms_iou_threshold': 0.5 + } + kwargs = { + 'apply_nms': True, + 'pre_nms_top_k': 1000, + 'pre_nms_score_threshold': 0.1, + 'nms_iou_threshold': 0.5, + 'max_num_detections': 10, + 'nms_version': 'v2', + 'use_cpu_nms': False, + 'soft_nms_sigma': None, + 'tflite_post_processing_config': tflite_post_processing_config + } + generator = detection_generator.MultilevelDetectionGenerator(**kwargs) + + expected_config = dict(kwargs) + self.assertEqual(generator.get_config(), expected_config) + + new_generator = ( + detection_generator.MultilevelDetectionGenerator.from_config( + generator.get_config())) + + self.assertAllEqual(generator.get_config(), new_generator.get_config()) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/beta/modeling/layers/mask_sampler.py b/official/vision/modeling/layers/mask_sampler.py similarity index 98% rename from official/vision/beta/modeling/layers/mask_sampler.py rename to official/vision/modeling/layers/mask_sampler.py index 73d3caa32749bdd37488eefbcab6982273bbbed3..bf9c322a6d40c5cc9f631afeea9be3e575ccd952 100644 --- a/official/vision/beta/modeling/layers/mask_sampler.py 
+++ b/official/vision/modeling/layers/mask_sampler.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,7 +17,7 @@ # Import libraries import tensorflow as tf -from official.vision.beta.ops import spatial_transform_ops +from official.vision.ops import spatial_transform_ops def _sample_and_crop_foreground_masks(candidate_rois: tf.Tensor, diff --git a/official/vision/modeling/layers/nn_blocks.py b/official/vision/modeling/layers/nn_blocks.py new file mode 100644 index 0000000000000000000000000000000000000000..0b6378e945f64a44fbed405438d18e4764238ab8 --- /dev/null +++ b/official/vision/modeling/layers/nn_blocks.py @@ -0,0 +1,1624 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Contains common building blocks for neural networks.""" + +from typing import Any, Callable, Dict, List, Optional, Tuple, Union, Text + +# Import libraries +from absl import logging +import tensorflow as tf + +from official.modeling import tf_utils +from official.nlp import modeling as nlp_modeling +from official.vision.modeling.layers import nn_layers + + +def _pad_strides(strides: int, axis: int) -> Tuple[int, int, int, int]: + """Converts int to len 4 strides (`tf.nn.avg_pool` uses length 4).""" + if axis == 1: + return (1, 1, strides, strides) + else: + return (1, strides, strides, 1) + + +def _maybe_downsample(x: tf.Tensor, out_filter: int, strides: int, + axis: int) -> tf.Tensor: + """Downsamples feature map and 0-pads tensor if in_filter != out_filter.""" + data_format = 'NCHW' if axis == 1 else 'NHWC' + strides = _pad_strides(strides, axis=axis) + + x = tf.nn.avg_pool(x, strides, strides, 'VALID', data_format=data_format) + + in_filter = x.shape[axis] + if in_filter < out_filter: + # Pads the channel dimension with 0s, half before and half after. + pad_size = [(out_filter - in_filter) // 2, (out_filter - in_filter) // 2] + if axis == 1: + x = tf.pad(x, [[0, 0], pad_size, [0, 0], [0, 0]]) + else: + x = tf.pad(x, [[0, 0], [0, 0], [0, 0], pad_size]) + + return x + 0. + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class ResidualBlock(tf.keras.layers.Layer): + """A residual block.""" + + def __init__(self, + filters, + strides, + use_projection=False, + se_ratio=None, + resnetd_shortcut=False, + stochastic_depth_drop_rate=None, + kernel_initializer='VarianceScaling', + kernel_regularizer=None, + bias_regularizer=None, + activation='relu', + use_explicit_padding: bool = False, + use_sync_bn=False, + norm_momentum=0.99, + norm_epsilon=0.001, + bn_trainable=True, + **kwargs): + """Initializes a residual block with BN after convolutions. + + Args: + filters: An `int` number of filters for both convolutions. 
+ strides: An `int` block stride. If greater than 1, this block will + ultimately downsample the input. + use_projection: A `bool` for whether this block should use a projection + shortcut (versus the default identity shortcut). This is usually `True` + for the first block of a block group, which may change the number of + filters and the resolution. + se_ratio: A `float` or None. Ratio of the Squeeze-and-Excitation layer. + resnetd_shortcut: A `bool`. The resnetd style shortcut modification is + not implemented in residual blocks, so this value is only stored in the + layer config. + stochastic_depth_drop_rate: A `float` or None. If not None, drop rate for + the stochastic depth layer. + kernel_initializer: A `str` of kernel_initializer for convolutional + layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default to None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2d. + Default to None. + activation: A `str` name of the activation function. + use_explicit_padding: A `bool`. If True, zero-pad the input by 1 pixel on + each side and use 'VALID' padding for the first 3x3 convolution, so that + the output dimensions are the same as if 'SAME' padding were used. + use_sync_bn: A `bool`. If True, use synchronized batch normalization. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + bn_trainable: A `bool` that indicates whether batch norm layers should be + trainable. Default to True. + **kwargs: Additional keyword arguments to be passed.
+ """ + super(ResidualBlock, self).__init__(**kwargs) + + self._filters = filters + self._strides = strides + self._use_projection = use_projection + self._se_ratio = se_ratio + self._resnetd_shortcut = resnetd_shortcut + self._use_explicit_padding = use_explicit_padding + self._use_sync_bn = use_sync_bn + self._activation = activation + self._stochastic_depth_drop_rate = stochastic_depth_drop_rate + self._kernel_initializer = kernel_initializer + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + + if use_sync_bn: + self._norm = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._norm = tf.keras.layers.BatchNormalization + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + self._activation_fn = tf_utils.get_activation(activation) + self._bn_trainable = bn_trainable + + def build(self, input_shape): + if self._use_projection: + self._shortcut = tf.keras.layers.Conv2D( + filters=self._filters, + kernel_size=1, + strides=self._strides, + use_bias=False, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + self._norm0 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + + conv1_padding = 'same' + # explicit padding here is added for centernet + if self._use_explicit_padding: + self._pad = tf.keras.layers.ZeroPadding2D(padding=(1, 1)) + conv1_padding = 'valid' + + self._conv1 = tf.keras.layers.Conv2D( + filters=self._filters, + kernel_size=3, + strides=self._strides, + padding=conv1_padding, + use_bias=False, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + self._norm1 = 
self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + + self._conv2 = tf.keras.layers.Conv2D( + filters=self._filters, + kernel_size=3, + strides=1, + padding='same', + use_bias=False, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + self._norm2 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + + if self._se_ratio and self._se_ratio > 0 and self._se_ratio <= 1: + self._squeeze_excitation = nn_layers.SqueezeExcitation( + in_filters=self._filters, + out_filters=self._filters, + se_ratio=self._se_ratio, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + else: + self._squeeze_excitation = None + + if self._stochastic_depth_drop_rate: + self._stochastic_depth = nn_layers.StochasticDepth( + self._stochastic_depth_drop_rate) + else: + self._stochastic_depth = None + + super(ResidualBlock, self).build(input_shape) + + def get_config(self): + config = { + 'filters': self._filters, + 'strides': self._strides, + 'use_projection': self._use_projection, + 'se_ratio': self._se_ratio, + 'resnetd_shortcut': self._resnetd_shortcut, + 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'activation': self._activation, + 'use_explicit_padding': self._use_explicit_padding, + 'use_sync_bn': self._use_sync_bn, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'bn_trainable': self._bn_trainable + } + base_config = super(ResidualBlock, self).get_config() + return dict(list(base_config.items()) + 
list(config.items())) + + def call(self, inputs, training=None): + shortcut = inputs + if self._use_projection: + shortcut = self._shortcut(shortcut) + shortcut = self._norm0(shortcut) + + if self._use_explicit_padding: + inputs = self._pad(inputs) + x = self._conv1(inputs) + x = self._norm1(x) + x = self._activation_fn(x) + + x = self._conv2(x) + x = self._norm2(x) + + if self._squeeze_excitation: + x = self._squeeze_excitation(x) + + if self._stochastic_depth: + x = self._stochastic_depth(x, training=training) + + return self._activation_fn(x + shortcut) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class BottleneckBlock(tf.keras.layers.Layer): + """A standard bottleneck block.""" + + def __init__(self, + filters, + strides, + dilation_rate=1, + use_projection=False, + se_ratio=None, + resnetd_shortcut=False, + stochastic_depth_drop_rate=None, + kernel_initializer='VarianceScaling', + kernel_regularizer=None, + bias_regularizer=None, + activation='relu', + use_sync_bn=False, + norm_momentum=0.99, + norm_epsilon=0.001, + bn_trainable=True, + **kwargs): + """Initializes a standard bottleneck block with BN after convolutions. + + Args: + filters: An `int` number of filters for the first two convolutions. Note + that the third and final convolution will use 4 times as many filters. + strides: An `int` block stride. If greater than 1, this block will + ultimately downsample the input. + dilation_rate: An `int` dilation_rate of convolutions. Default to 1. + use_projection: A `bool` for whether this block should use a projection + shortcut (versus the default identity shortcut). This is usually `True` + for the first block of a block group, which may change the number of + filters and the resolution. + se_ratio: A `float` or None. Ratio of the Squeeze-and-Excitation layer. + resnetd_shortcut: A `bool`. If True, apply the resnetd style modification + to the shortcut connection. + stochastic_depth_drop_rate: A `float` or None. 
If not None, drop rate for + the stochastic depth layer. + kernel_initializer: A `str` of kernel_initializer for convolutional + layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default to None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2d. + Default to None. + activation: A `str` name of the activation function. + use_sync_bn: A `bool`. If True, use synchronized batch normalization. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + bn_trainable: A `bool` that indicates whether batch norm layers should be + trainable. Default to True. + **kwargs: Additional keyword arguments to be passed. + """ + super(BottleneckBlock, self).__init__(**kwargs) + + self._filters = filters + self._strides = strides + self._dilation_rate = dilation_rate + self._use_projection = use_projection + self._se_ratio = se_ratio + self._resnetd_shortcut = resnetd_shortcut + self._use_sync_bn = use_sync_bn + self._activation = activation + self._stochastic_depth_drop_rate = stochastic_depth_drop_rate + self._kernel_initializer = kernel_initializer + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + if use_sync_bn: + self._norm = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._norm = tf.keras.layers.BatchNormalization + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + self._bn_trainable = bn_trainable + + def build(self, input_shape): + if self._use_projection: + if self._resnetd_shortcut: + self._shortcut0 = tf.keras.layers.AveragePooling2D( + pool_size=2, strides=self._strides, padding='same') + self._shortcut1 = tf.keras.layers.Conv2D( + filters=self._filters * 4, + kernel_size=1, + strides=1, + use_bias=False, + 
kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + else: + self._shortcut = tf.keras.layers.Conv2D( + filters=self._filters * 4, + kernel_size=1, + strides=self._strides, + use_bias=False, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + + self._norm0 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + + self._conv1 = tf.keras.layers.Conv2D( + filters=self._filters, + kernel_size=1, + strides=1, + use_bias=False, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + self._norm1 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + self._activation1 = tf_utils.get_activation( + self._activation, use_keras_layer=True) + + self._conv2 = tf.keras.layers.Conv2D( + filters=self._filters, + kernel_size=3, + strides=self._strides, + dilation_rate=self._dilation_rate, + padding='same', + use_bias=False, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + self._norm2 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + self._activation2 = tf_utils.get_activation( + self._activation, use_keras_layer=True) + + self._conv3 = tf.keras.layers.Conv2D( + filters=self._filters * 4, + kernel_size=1, + strides=1, + use_bias=False, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + 
self._norm3 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon, + trainable=self._bn_trainable) + self._activation3 = tf_utils.get_activation( + self._activation, use_keras_layer=True) + + if self._se_ratio and self._se_ratio > 0 and self._se_ratio <= 1: + self._squeeze_excitation = nn_layers.SqueezeExcitation( + in_filters=self._filters * 4, + out_filters=self._filters * 4, + se_ratio=self._se_ratio, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + else: + self._squeeze_excitation = None + + if self._stochastic_depth_drop_rate: + self._stochastic_depth = nn_layers.StochasticDepth( + self._stochastic_depth_drop_rate) + else: + self._stochastic_depth = None + self._add = tf.keras.layers.Add() + + super(BottleneckBlock, self).build(input_shape) + + def get_config(self): + config = { + 'filters': self._filters, + 'strides': self._strides, + 'dilation_rate': self._dilation_rate, + 'use_projection': self._use_projection, + 'se_ratio': self._se_ratio, + 'resnetd_shortcut': self._resnetd_shortcut, + 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'activation': self._activation, + 'use_sync_bn': self._use_sync_bn, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'bn_trainable': self._bn_trainable + } + base_config = super(BottleneckBlock, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs, training=None): + shortcut = inputs + if self._use_projection: + if self._resnetd_shortcut: + shortcut = self._shortcut0(shortcut) + shortcut = self._shortcut1(shortcut) + else: + shortcut = self._shortcut(shortcut) + shortcut = self._norm0(shortcut) + + x = 
self._conv1(inputs) + x = self._norm1(x) + x = self._activation1(x) + + x = self._conv2(x) + x = self._norm2(x) + x = self._activation2(x) + + x = self._conv3(x) + x = self._norm3(x) + + if self._squeeze_excitation: + x = self._squeeze_excitation(x) + + if self._stochastic_depth: + x = self._stochastic_depth(x, training=training) + + x = self._add([x, shortcut]) + return self._activation3(x) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class InvertedBottleneckBlock(tf.keras.layers.Layer): + """An inverted bottleneck block.""" + + def __init__(self, + in_filters, + out_filters, + expand_ratio, + strides, + kernel_size=3, + se_ratio=None, + stochastic_depth_drop_rate=None, + kernel_initializer='VarianceScaling', + kernel_regularizer=None, + bias_regularizer=None, + activation='relu', + se_inner_activation='relu', + se_gating_activation='sigmoid', + se_round_down_protect=True, + expand_se_in_filters=False, + depthwise_activation=None, + use_sync_bn=False, + dilation_rate=1, + divisible_by=1, + regularize_depthwise=False, + use_depthwise=True, + use_residual=True, + norm_momentum=0.99, + norm_epsilon=0.001, + output_intermediate_endpoints=False, + **kwargs): + """Initializes an inverted bottleneck block with BN after convolutions. + + Args: + in_filters: An `int` number of filters of the input tensor. + out_filters: An `int` number of filters of the output tensor. + expand_ratio: An `int` of expand_ratio for an inverted bottleneck block. + strides: An `int` block stride. If greater than 1, this block will + ultimately downsample the input. + kernel_size: An `int` kernel_size of the depthwise conv layer. + se_ratio: A `float` or None. If not None, se ratio for the squeeze and + excitation layer. + stochastic_depth_drop_rate: A `float` or None. If not None, drop rate for + the stochastic depth layer. + kernel_initializer: A `str` of kernel_initializer for convolutional + layers.
+ kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Defaults to None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + Defaults to None. + activation: A `str` name of the activation function. + se_inner_activation: A `str` name of squeeze-excitation inner activation. + se_gating_activation: A `str` name of squeeze-excitation gating + activation. + se_round_down_protect: A `bool` of whether to prevent the SE layer's + filter count from being rounded down by more than 10%. + expand_se_in_filters: A `bool` of whether or not to expand in_filter in + squeeze and excitation layer. + depthwise_activation: A `str` name of the activation function for + depthwise only. + use_sync_bn: A `bool`. If True, use synchronized batch normalization. + dilation_rate: An `int` that specifies the dilation rate to use for + dilated convolution; the same value is used for all spatial dimensions. + divisible_by: An `int` that ensures all inner dimensions are divisible by + this number. + regularize_depthwise: A `bool` of whether or not to apply regularization + on depthwise. + use_depthwise: A `bool` of whether to use depthwise convolutions. If + False, uses fused convolutions instead. + use_residual: A `bool` of whether to include residual connection between + input and output. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + output_intermediate_endpoints: A `bool` of whether or not to output the + intermediate endpoints. + **kwargs: Additional keyword arguments to be passed. 
+ """ + super(InvertedBottleneckBlock, self).__init__(**kwargs) + + self._in_filters = in_filters + self._out_filters = out_filters + self._expand_ratio = expand_ratio + self._strides = strides + self._kernel_size = kernel_size + self._se_ratio = se_ratio + self._divisible_by = divisible_by + self._stochastic_depth_drop_rate = stochastic_depth_drop_rate + self._dilation_rate = dilation_rate + self._use_sync_bn = use_sync_bn + self._regularize_depthwise = regularize_depthwise + self._use_depthwise = use_depthwise + self._use_residual = use_residual + self._activation = activation + self._se_inner_activation = se_inner_activation + self._se_gating_activation = se_gating_activation + self._depthwise_activation = depthwise_activation + self._se_round_down_protect = se_round_down_protect + self._kernel_initializer = kernel_initializer + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + self._expand_se_in_filters = expand_se_in_filters + self._output_intermediate_endpoints = output_intermediate_endpoints + + if use_sync_bn: + self._norm = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._norm = tf.keras.layers.BatchNormalization + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + if not depthwise_activation: + self._depthwise_activation = activation + if regularize_depthwise: + self._depthsize_regularizer = kernel_regularizer + else: + self._depthsize_regularizer = None + + def build(self, input_shape): + expand_filters = self._in_filters + if self._expand_ratio > 1: + # First 1x1 conv for channel expansion. 
+ expand_filters = nn_layers.make_divisible( + self._in_filters * self._expand_ratio, self._divisible_by) + + expand_kernel = 1 if self._use_depthwise else self._kernel_size + expand_stride = 1 if self._use_depthwise else self._strides + + self._conv0 = tf.keras.layers.Conv2D( + filters=expand_filters, + kernel_size=expand_kernel, + strides=expand_stride, + padding='same', + use_bias=False, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + self._norm0 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + self._activation_layer = tf_utils.get_activation( + self._activation, use_keras_layer=True) + + if self._use_depthwise: + # Depthwise conv. + self._conv1 = tf.keras.layers.DepthwiseConv2D( + kernel_size=(self._kernel_size, self._kernel_size), + strides=self._strides, + padding='same', + depth_multiplier=1, + dilation_rate=self._dilation_rate, + use_bias=False, + depthwise_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + depthwise_regularizer=self._depthsize_regularizer, + bias_regularizer=self._bias_regularizer) + self._norm1 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + self._depthwise_activation_layer = tf_utils.get_activation( + self._depthwise_activation, use_keras_layer=True) + + # Squeeze and excitation. 
+ if self._se_ratio and self._se_ratio > 0 and self._se_ratio <= 1: + logging.info('Use Squeeze and excitation.') + in_filters = self._in_filters + if self._expand_se_in_filters: + in_filters = expand_filters + self._squeeze_excitation = nn_layers.SqueezeExcitation( + in_filters=in_filters, + out_filters=expand_filters, + se_ratio=self._se_ratio, + divisible_by=self._divisible_by, + round_down_protect=self._se_round_down_protect, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer, + activation=self._se_inner_activation, + gating_activation=self._se_gating_activation) + else: + self._squeeze_excitation = None + + # Last 1x1 conv. + self._conv2 = tf.keras.layers.Conv2D( + filters=self._out_filters, + kernel_size=1, + strides=1, + padding='same', + use_bias=False, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + self._norm2 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + + if self._stochastic_depth_drop_rate: + self._stochastic_depth = nn_layers.StochasticDepth( + self._stochastic_depth_drop_rate) + else: + self._stochastic_depth = None + self._add = tf.keras.layers.Add() + + super(InvertedBottleneckBlock, self).build(input_shape) + + def get_config(self): + config = { + 'in_filters': self._in_filters, + 'out_filters': self._out_filters, + 'expand_ratio': self._expand_ratio, + 'strides': self._strides, + 'kernel_size': self._kernel_size, + 'se_ratio': self._se_ratio, + 'divisible_by': self._divisible_by, + 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'activation': self._activation, + 'se_inner_activation': 
self._se_inner_activation, + 'se_gating_activation': self._se_gating_activation, + 'se_round_down_protect': self._se_round_down_protect, + 'expand_se_in_filters': self._expand_se_in_filters, + 'depthwise_activation': self._depthwise_activation, + 'dilation_rate': self._dilation_rate, + 'use_sync_bn': self._use_sync_bn, + 'regularize_depthwise': self._regularize_depthwise, + 'use_depthwise': self._use_depthwise, + 'use_residual': self._use_residual, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'output_intermediate_endpoints': self._output_intermediate_endpoints + } + base_config = super(InvertedBottleneckBlock, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs, training=None): + endpoints = {} + shortcut = inputs + if self._expand_ratio > 1: + x = self._conv0(inputs) + x = self._norm0(x) + x = self._activation_layer(x) + else: + x = inputs + + if self._use_depthwise: + x = self._conv1(x) + x = self._norm1(x) + x = self._depthwise_activation_layer(x) + if self._output_intermediate_endpoints: + endpoints['depthwise'] = x + + if self._squeeze_excitation: + x = self._squeeze_excitation(x) + + x = self._conv2(x) + x = self._norm2(x) + + if (self._use_residual and self._in_filters == self._out_filters and + self._strides == 1): + if self._stochastic_depth: + x = self._stochastic_depth(x, training=training) + x = self._add([x, shortcut]) + + if self._output_intermediate_endpoints: + return x, endpoints + return x + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class ResidualInner(tf.keras.layers.Layer): + """Creates a single inner block of a residual. + + This corresponds to `F`/`G` functions in the RevNet paper: + Aidan N. Gomez, Mengye Ren, Raquel Urtasun, Roger B. Grosse. + The Reversible Residual Network: Backpropagation Without Storing Activations. 
+ (https://arxiv.org/pdf/1707.04585.pdf) + """ + + def __init__( + self, + filters: int, + strides: int, + kernel_initializer: Union[str, Callable[ + ..., tf.keras.initializers.Initializer]] = 'VarianceScaling', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + activation: Union[str, Callable[..., tf.Tensor]] = 'relu', + use_sync_bn: bool = False, + norm_momentum: float = 0.99, + norm_epsilon: float = 0.001, + batch_norm_first: bool = True, + **kwargs): + """Initializes a ResidualInner. + + Args: + filters: An `int` of output filter size. + strides: An `int` of stride size for convolution for the residual block. + kernel_initializer: A `str` or `tf.keras.initializers.Initializer` + instance for convolutional layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` for Conv2D. + activation: A `str` or `callable` instance of the activation function. + use_sync_bn: A `bool`. If True, use synchronized batch normalization. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + batch_norm_first: A `bool` of whether to apply activation and batch norm + before conv. + **kwargs: Additional keyword arguments to be passed. 
+ """ + super(ResidualInner, self).__init__(**kwargs) + + self.strides = strides + self.filters = filters + self._kernel_initializer = tf.keras.initializers.get(kernel_initializer) + self._kernel_regularizer = kernel_regularizer + self._activation = tf.keras.activations.get(activation) + self._use_sync_bn = use_sync_bn + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._batch_norm_first = batch_norm_first + + if use_sync_bn: + self._norm = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._norm = tf.keras.layers.BatchNormalization + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + self._activation_fn = tf_utils.get_activation(activation) + + def build(self, input_shape: tf.TensorShape): + if self._batch_norm_first: + self._batch_norm_0 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + + self._conv2d_1 = tf.keras.layers.Conv2D( + filters=self.filters, + kernel_size=3, + strides=self.strides, + use_bias=False, + padding='same', + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer) + + self._batch_norm_1 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + + self._conv2d_2 = tf.keras.layers.Conv2D( + filters=self.filters, + kernel_size=3, + strides=1, + use_bias=False, + padding='same', + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer) + + super(ResidualInner, self).build(input_shape) + + def get_config(self) -> Dict[str, Any]: + config = { + 'filters': self.filters, + 'strides': self.strides, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'activation': self._activation, + 'use_sync_bn': self._use_sync_bn, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': 
self._norm_epsilon, + 'batch_norm_first': self._batch_norm_first, + } + base_config = super(ResidualInner, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, + inputs: tf.Tensor, + training: Optional[bool] = None) -> tf.Tensor: + x = inputs + if self._batch_norm_first: + x = self._batch_norm_0(x, training=training) + x = self._activation_fn(x) + x = self._conv2d_1(x) + + x = self._batch_norm_1(x, training=training) + x = self._activation_fn(x) + x = self._conv2d_2(x) + return x + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class BottleneckResidualInner(tf.keras.layers.Layer): + """Creates a single inner block of a bottleneck. + + This corresponds to `F`/`G` functions in the RevNet paper: + Aidan N. Gomez, Mengye Ren, Raquel Urtasun, Roger B. Grosse. + The Reversible Residual Network: Backpropagation Without Storing Activations. + (https://arxiv.org/pdf/1707.04585.pdf) + """ + + def __init__( + self, + filters: int, + strides: int, + kernel_initializer: Union[str, Callable[ + ..., tf.keras.initializers.Initializer]] = 'VarianceScaling', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + activation: Union[str, Callable[..., tf.Tensor]] = 'relu', + use_sync_bn: bool = False, + norm_momentum: float = 0.99, + norm_epsilon: float = 0.001, + batch_norm_first: bool = True, + **kwargs): + """Initializes a BottleneckResidualInner. + + Args: + filters: An `int` number of filters for the first 2 convolutions. The + last convolution, and thus the number of output channels from the + bottleneck block, uses `4*filters`. + strides: An `int` of stride size for convolution for the residual block. + kernel_initializer: A `str` or `tf.keras.initializers.Initializer` + instance for convolutional layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` for Conv2D. + activation: A `str` or `callable` instance of the activation function. + use_sync_bn: A `bool`.
If True, use synchronized batch normalization. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + batch_norm_first: A `bool` of whether to apply activation and batch norm + before conv. + **kwargs: Additional keyword arguments to be passed. + """ + super(BottleneckResidualInner, self).__init__(**kwargs) + + self.strides = strides + self.filters = filters + self._kernel_initializer = tf.keras.initializers.get(kernel_initializer) + self._kernel_regularizer = kernel_regularizer + self._activation = tf.keras.activations.get(activation) + self._use_sync_bn = use_sync_bn + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._batch_norm_first = batch_norm_first + + if use_sync_bn: + self._norm = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._norm = tf.keras.layers.BatchNormalization + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + self._activation_fn = tf_utils.get_activation(activation) + + def build(self, input_shape: tf.TensorShape): + if self._batch_norm_first: + self._batch_norm_0 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + self._conv2d_1 = tf.keras.layers.Conv2D( + filters=self.filters, + kernel_size=1, + strides=self.strides, + use_bias=False, + padding='same', + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer) + self._batch_norm_1 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + self._conv2d_2 = tf.keras.layers.Conv2D( + filters=self.filters, + kernel_size=3, + strides=1, + use_bias=False, + padding='same', + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer) + self._batch_norm_2 = self._norm( + 
axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + self._conv2d_3 = tf.keras.layers.Conv2D( + filters=self.filters * 4, + kernel_size=1, + strides=1, + use_bias=False, + padding='same', + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer) + + super(BottleneckResidualInner, self).build(input_shape) + + def get_config(self) -> Dict[str, Any]: + config = { + 'filters': self.filters, + 'strides': self.strides, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'activation': self._activation, + 'use_sync_bn': self._use_sync_bn, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon, + 'batch_norm_first': self._batch_norm_first, + } + base_config = super(BottleneckResidualInner, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, + inputs: tf.Tensor, + training: Optional[bool] = None) -> tf.Tensor: + x = inputs + if self._batch_norm_first: + x = self._batch_norm_0(x, training=training) + x = self._activation_fn(x) + x = self._conv2d_1(x) + + x = self._batch_norm_1(x, training=training) + x = self._activation_fn(x) + x = self._conv2d_2(x) + + x = self._batch_norm_2(x, training=training) + x = self._activation_fn(x) + x = self._conv2d_3(x) + + return x + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class ReversibleLayer(tf.keras.layers.Layer): + """Creates a reversible layer. + + Computes y1 = x1 + f(x2), y2 = x2 + g(y1), where f and g can be arbitrary + layers that are stateless, which in this case are `ResidualInner` layers. + """ + + def __init__(self, + f: tf.keras.layers.Layer, + g: tf.keras.layers.Layer, + manual_grads: bool = True, + **kwargs): + """Initializes a ReversibleLayer. + + Args: + f: A `tf.keras.layers.Layer` instance of `f` inner block referred to in + paper. Each reversible layer consists of two inner functions. 
For + example, in RevNet the reversible residual consists of two f/g inner + (bottleneck) residual functions. If the input to the reversible layer + is x, it is partitioned along the channel dimension and the forward pass + follows (eq8): x = [x1; x2], z1 = x1 + f(x2), y2 = x2 + g(z1), + y1 = stop_gradient(z1). + g: A `tf.keras.layers.Layer` instance of `g` inner block referred to in + paper. See the `f` arg above for details. + manual_grads: A `bool` [Testing Only] of whether to manually take + gradients as in Algorithm 1 or defer to autograd. + **kwargs: Additional keyword arguments to be passed. + """ + super(ReversibleLayer, self).__init__(**kwargs) + + self._f = f + self._g = g + self._manual_grads = manual_grads + + if tf.keras.backend.image_data_format() == 'channels_last': + self._axis = -1 + else: + self._axis = 1 + + def get_config(self) -> Dict[str, Any]: + config = { + 'f': self._f, + 'g': self._g, + 'manual_grads': self._manual_grads, + } + base_config = super(ReversibleLayer, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def _ckpt_non_trainable_vars(self): + self._f_non_trainable_vars = [ + v.read_value() for v in self._f.non_trainable_variables + ] + self._g_non_trainable_vars = [ + v.read_value() for v in self._g.non_trainable_variables + ] + + def _load_ckpt_non_trainable_vars(self): + for v, v_chkpt in zip(self._f.non_trainable_variables, + self._f_non_trainable_vars): + v.assign(v_chkpt) + for v, v_chkpt in zip(self._g.non_trainable_variables, + self._g_non_trainable_vars): + v.assign(v_chkpt) + + def call(self, + inputs: tf.Tensor, + training: Optional[bool] = None) -> tf.Tensor: + + @tf.custom_gradient + def reversible( + x: tf.Tensor + ) -> Tuple[tf.Tensor, Callable[[Any], Tuple[List[tf.Tensor], + List[tf.Tensor]]]]: + """Implements Algorithm 1 in the RevNet paper. + + Aidan N. Gomez, Mengye Ren, Raquel Urtasun, Roger B. Grosse.
+ The Reversible Residual Network: Backpropagation Without Storing + Activations. + (https://arxiv.org/pdf/1707.04585.pdf) + + Args: + x: An input `tf.Tensor`. + + Returns: + y: The output [y1; y2] in Algorithm 1. + grad_fn: A callable function that computes the gradients. + """ + with tf.GradientTape() as fwdtape: + fwdtape.watch(x) + x1, x2 = tf.split(x, num_or_size_splits=2, axis=self._axis) + f_x2 = self._f(x2, training=training) + x1_down = _maybe_downsample(x1, f_x2.shape[self._axis], self._f.strides, + self._axis) + z1 = f_x2 + x1_down + g_z1 = self._g(z1, training=training) + x2_down = _maybe_downsample(x2, g_z1.shape[self._axis], self._f.strides, + self._axis) + y2 = x2_down + g_z1 + + # Equation 8: https://arxiv.org/pdf/1707.04585.pdf + # Decouple y1 and z1 so that their derivatives are different. + y1 = tf.identity(z1) + y = tf.concat([y1, y2], axis=self._axis) + + irreversible = ((self._f.strides != 1 or self._g.strides != 1) or + (y.shape[self._axis] != inputs.shape[self._axis])) + + # Checkpointing moving mean/variance for batch normalization layers + # as they shouldn't be updated during the custom gradient pass of f/g. + self._ckpt_non_trainable_vars() + + def grad_fn( + dy: tf.Tensor, + variables: Optional[List[tf.Variable]] = None, + ) -> Tuple[List[tf.Tensor], List[tf.Tensor]]: + """Given dy calculate (dy/dx)|_{x_{input}} using f/g.""" + if irreversible or not self._manual_grads: + grads_combined = fwdtape.gradient( + y, [x] + variables, output_gradients=dy) + dx = grads_combined[0] + grad_vars = grads_combined[1:] + else: + y1_nograd = tf.stop_gradient(y1) + y2_nograd = tf.stop_gradient(y2) + dy1, dy2 = tf.split(dy, num_or_size_splits=2, axis=self._axis) + + # Index mapping from self.f/g.trainable_variables to grad_fn + # input `variables` kwarg so that we can reorder dwf + dwg + # variable gradient list to match `variables` order. 
+ f_var_refs = [v.ref() for v in self._f.trainable_variables] + g_var_refs = [v.ref() for v in self._g.trainable_variables] + fg_var_refs = f_var_refs + g_var_refs + self_to_var_index = [fg_var_refs.index(v.ref()) for v in variables] + + # Algorithm 1 in paper (line # documented in-line) + z1 = y1_nograd # line 2 + with tf.GradientTape() as gtape: + gtape.watch(z1) + g_z1 = self._g(z1, training=training) + x2 = y2_nograd - g_z1 # line 3 + + with tf.GradientTape() as ftape: + ftape.watch(x2) + f_x2 = self._f(x2, training=training) + x1 = z1 - f_x2 # pylint: disable=unused-variable # line 4 + + # Compute gradients + g_grads_combined = gtape.gradient( + g_z1, [z1] + self._g.trainable_variables, output_gradients=dy2) + dz1 = dy1 + g_grads_combined[0] # line 5 + dwg = g_grads_combined[1:] # line 9 + + f_grads_combined = ftape.gradient( + f_x2, [x2] + self._f.trainable_variables, output_gradients=dz1) + dx2 = dy2 + f_grads_combined[0] # line 6 + dwf = f_grads_combined[1:] # line 8 + dx1 = dz1 # line 7 + + # Pack the input and variable gradients. + dx = tf.concat([dx1, dx2], axis=self._axis) + grad_vars = dwf + dwg + # Reorder gradients (trainable_variables to variables kwarg order) + grad_vars = [grad_vars[i] for i in self_to_var_index] + + # Restore batch normalization moving mean/variance for correctness. + self._load_ckpt_non_trainable_vars() + + return dx, grad_vars # grad_fn end + + return y, grad_fn # reversible end + + activations = reversible(inputs) + return activations + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class DepthwiseSeparableConvBlock(tf.keras.layers.Layer): + """Creates a depthwise separable convolution block with batch normalization. 
+ """ + + def __init__( + self, + filters: int, + kernel_size: int = 3, + strides: int = 1, + regularize_depthwise=False, + activation: Text = 'relu6', + kernel_initializer: Text = 'VarianceScaling', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + dilation_rate: int = 1, + use_sync_bn: bool = False, + norm_momentum: float = 0.99, + norm_epsilon: float = 0.001, + **kwargs): + """Initializes a depthwise separable convolution block. + + Args: + filters: An `int` number of filters for the pointwise (1x1) convolution, + i.e. the number of output channels of the block. + kernel_size: An `int` that specifies the height and width of the 2D + convolution window. + strides: An `int` of block stride. If greater than 1, this block will + ultimately downsample the input. + regularize_depthwise: A `bool`. If True, apply `kernel_regularizer` to + the depthwise convolution. + activation: A `str` name of the activation function. + kernel_initializer: A `str` of kernel_initializer for convolutional + layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Defaults to None. + dilation_rate: An `int` or tuple/list of 2 `int`, specifying the dilation + rate to use for dilated convolution. Can be a single integer to specify + the same value for all spatial dimensions. + use_sync_bn: A `bool`. If True, use synchronized batch normalization. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + **kwargs: Additional keyword arguments to be passed.
+ """ + super(DepthwiseSeparableConvBlock, self).__init__(**kwargs) + self._filters = filters + self._kernel_size = kernel_size + self._strides = strides + self._activation = activation + self._regularize_depthwise = regularize_depthwise + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._dilation_rate = dilation_rate + self._use_sync_bn = use_sync_bn + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + + if use_sync_bn: + self._norm = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._norm = tf.keras.layers.BatchNormalization + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + self._activation_fn = tf_utils.get_activation(activation) + if regularize_depthwise: + self._depthsize_regularizer = kernel_regularizer + else: + self._depthsize_regularizer = None + + def get_config(self): + config = { + 'filters': self._filters, + 'kernel_size': self._kernel_size, + 'strides': self._strides, + 'regularize_depthwise': self._regularize_depthwise, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'dilation_rate': self._dilation_rate, + 'activation': self._activation, + 'use_sync_bn': self._use_sync_bn, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon + } + base_config = super(DepthwiseSeparableConvBlock, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def build(self, input_shape): + + # DepthwiseConv2D reads its kernel settings from the depthwise_* + # arguments; kernel_initializer/kernel_regularizer would be ignored. 
+ self._dwconv0 = tf.keras.layers.DepthwiseConv2D( + kernel_size=self._kernel_size, + strides=self._strides, + padding='same', + depth_multiplier=1, + dilation_rate=self._dilation_rate, + depthwise_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + depthwise_regularizer=self._depthsize_regularizer, + use_bias=False) + self._norm0 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + + self._conv1 = tf.keras.layers.Conv2D( + filters=self._filters, 
kernel_size=1, + strides=1, + padding='same', + use_bias=False, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer) + self._norm1 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + + super(DepthwiseSeparableConvBlock, self).build(input_shape) + + def call(self, inputs, training=None): + x = self._dwconv0(inputs) + x = self._norm0(x) + x = self._activation_fn(x) + + x = self._conv1(x) + x = self._norm1(x) + return self._activation_fn(x) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class TuckerConvBlock(tf.keras.layers.Layer): + """A Tucker block (generalized bottleneck).""" + + def __init__(self, + in_filters, + out_filters, + input_compression_ratio, + output_compression_ratio, + strides, + kernel_size=3, + stochastic_depth_drop_rate=None, + kernel_initializer='VarianceScaling', + kernel_regularizer=None, + bias_regularizer=None, + activation='relu', + use_sync_bn=False, + divisible_by=1, + use_residual=True, + norm_momentum=0.99, + norm_epsilon=0.001, + **kwargs): + """Initializes a Tucker block with BN after convolutions. + + Args: + in_filters: An `int` number of filters of the input tensor. + out_filters: An `int` number of filters of the output tensor. + input_compression_ratio: A `float` of compression ratio for input + filters. + output_compression_ratio: A `float` of compression ratio for output + filters. + strides: An `int` block stride. If greater than 1, this block will + ultimately downsample the input. + kernel_size: An `int` kernel size of the middle (spatial) conv layer. + stochastic_depth_drop_rate: A `float` or None. If not None, drop rate for + the stochastic depth layer. + kernel_initializer: A `str` of kernel_initializer for convolutional + layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Defaults to None. 
+ bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + Default to None. + activation: A `str` name of the activation function. + use_sync_bn: A `bool`. If True, use synchronized batch normalization. + divisible_by: An `int` that ensures all inner dimensions are divisible by + this number. + use_residual: A `bool` of whether to include a residual connection between + input and output. + norm_momentum: A `float` of normalization momentum for the moving average. + norm_epsilon: A `float` added to variance to avoid dividing by zero. + **kwargs: Additional keyword arguments to be passed. + """ + super(TuckerConvBlock, self).__init__(**kwargs) + + self._in_filters = in_filters + self._out_filters = out_filters + self._input_compression_ratio = input_compression_ratio + self._output_compression_ratio = output_compression_ratio + self._strides = strides + self._kernel_size = kernel_size + self._divisible_by = divisible_by + self._stochastic_depth_drop_rate = stochastic_depth_drop_rate + self._use_sync_bn = use_sync_bn + self._use_residual = use_residual + self._activation = activation + self._kernel_initializer = kernel_initializer + self._norm_momentum = norm_momentum + self._norm_epsilon = norm_epsilon + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + + if use_sync_bn: + self._norm = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._norm = tf.keras.layers.BatchNormalization + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + + def build(self, input_shape): + input_compressed_filters = nn_layers.make_divisible( + value=self._in_filters * self._input_compression_ratio, + divisor=self._divisible_by, + round_down_protect=False) + + self._conv0 = tf.keras.layers.Conv2D( + filters=input_compressed_filters, + kernel_size=1, + strides=1, + padding='same', + use_bias=False, +
kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + self._norm0 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + self._activation_layer0 = tf_utils.get_activation( + self._activation, use_keras_layer=True) + + output_compressed_filters = nn_layers.make_divisible( + value=self._out_filters * self._output_compression_ratio, + divisor=self._divisible_by, + round_down_protect=False) + + self._conv1 = tf.keras.layers.Conv2D( + filters=output_compressed_filters, + kernel_size=self._kernel_size, + strides=self._strides, + padding='same', + use_bias=False, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + self._norm1 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + self._activation_layer1 = tf_utils.get_activation( + self._activation, use_keras_layer=True) + + # Last 1x1 conv. 
+ self._conv2 = tf.keras.layers.Conv2D( + filters=self._out_filters, + kernel_size=1, + strides=1, + padding='same', + use_bias=False, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + self._norm2 = self._norm( + axis=self._bn_axis, + momentum=self._norm_momentum, + epsilon=self._norm_epsilon) + + if self._stochastic_depth_drop_rate: + self._stochastic_depth = nn_layers.StochasticDepth( + self._stochastic_depth_drop_rate) + else: + self._stochastic_depth = None + self._add = tf.keras.layers.Add() + + super(TuckerConvBlock, self).build(input_shape) + + def get_config(self): + config = { + 'in_filters': self._in_filters, + 'out_filters': self._out_filters, + 'input_compression_ratio': self._input_compression_ratio, + 'output_compression_ratio': self._output_compression_ratio, + 'strides': self._strides, + 'kernel_size': self._kernel_size, + 'divisible_by': self._divisible_by, + 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'activation': self._activation, + 'use_sync_bn': self._use_sync_bn, + 'use_residual': self._use_residual, + 'norm_momentum': self._norm_momentum, + 'norm_epsilon': self._norm_epsilon + } + base_config = super(TuckerConvBlock, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs, training=None): + shortcut = inputs + + x = self._conv0(inputs) + x = self._norm0(x) + x = self._activation_layer0(x) + + x = self._conv1(x) + x = self._norm1(x) + x = self._activation_layer1(x) + + x = self._conv2(x) + x = self._norm2(x) + + if (self._use_residual and self._in_filters == self._out_filters and + self._strides == 1): + if self._stochastic_depth: + x = self._stochastic_depth(x, training=training) + x = self._add([x, shortcut]) 
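The channel arithmetic of `TuckerConvBlock.build` can be checked without TensorFlow. The sketch below re-implements the rounding rule of `make_divisible` from `nn_layers.py` in plain Python, then uses two hypothetical helpers, `tucker_widths` and `uses_residual` (not part of the library), to show the widths of `conv0` and `conv1` for the settings used in `test_tucker_conv_block` (24 input and output filters, compression ratio 0.25) and the condition under which the residual add is applied. It is an illustration under those assumptions, not the library's implementation.

```python
from typing import Optional, Tuple


def make_divisible(value: float,
                   divisor: int,
                   min_value: Optional[float] = None,
                   round_down_protect: bool = True) -> int:
  """Plain-Python copy of the rounding rule in nn_layers.make_divisible."""
  if min_value is None:
    min_value = divisor
  # Round to the nearest multiple of `divisor`, but never below `min_value`.
  new_value = max(min_value, int(value + divisor / 2) // divisor * divisor)
  # Optionally undo a round-down that lost more than 10% of `value`.
  if round_down_protect and new_value < 0.9 * value:
    new_value += divisor
  return int(new_value)


def tucker_widths(in_filters: int, out_filters: int,
                  input_compression_ratio: float,
                  output_compression_ratio: float,
                  divisible_by: int = 1) -> Tuple[int, int, int]:
  """Hypothetical helper: output widths of conv0, conv1 and conv2."""
  # TuckerConvBlock.build calls make_divisible with round_down_protect=False.
  conv0 = make_divisible(in_filters * input_compression_ratio, divisible_by,
                         round_down_protect=False)
  conv1 = make_divisible(out_filters * output_compression_ratio, divisible_by,
                         round_down_protect=False)
  # The last 1x1 conv always restores `out_filters` channels.
  return conv0, conv1, out_filters


def uses_residual(in_filters: int, out_filters: int, strides: int,
                  use_residual: bool = True) -> bool:
  """Hypothetical helper: mirrors the residual condition in call()."""
  return use_residual and in_filters == out_filters and strides == 1


print(tucker_widths(24, 24, 0.25, 0.25))                  # (6, 6, 24)
print(tucker_widths(24, 24, 0.25, 0.25, divisible_by=8))  # (8, 8, 24)
print(make_divisible(20, 16))                             # 32
print(make_divisible(20, 16, round_down_protect=False))   # 16
print(uses_residual(24, 24, 1), uses_residual(24, 24, 2))  # True False
```

The `make_divisible(20, 16)` case shows why `round_down_protect` matters: plain rounding gives 16, which is more than 10% below 20, so the protected version adds one more `divisor` and returns 32.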
+ + return x + + +class TransformerEncoderBlock(nlp_modeling.layers.TransformerEncoderBlock): + """TransformerEncoderBlock layer with stochastic depth.""" + + def __init__(self, + *args, + stochastic_depth_drop_rate=0.0, + return_attention=False, + **kwargs): + """Initializes TransformerEncoderBlock.""" + super().__init__(*args, **kwargs) + self._stochastic_depth_drop_rate = stochastic_depth_drop_rate + self._return_attention = return_attention + + def build(self, input_shape): + if self._stochastic_depth_drop_rate: + self._stochastic_depth = nn_layers.StochasticDepth( + self._stochastic_depth_drop_rate) + else: + self._stochastic_depth = lambda x, *args, **kwargs: tf.identity(x) + + super().build(input_shape) + + def get_config(self): + config = { + 'stochastic_depth_drop_rate': self._stochastic_depth_drop_rate, + 'return_attention': self._return_attention + } + base_config = super().get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs, training=None): + """Transformer self-attention encoder block call.""" + if isinstance(inputs, (list, tuple)): + if len(inputs) == 2: + input_tensor, attention_mask = inputs + key_value = None + elif len(inputs) == 3: + input_tensor, key_value, attention_mask = inputs + else: + raise ValueError('Unexpected inputs to %s with length %d' % + (self.__class__, len(inputs))) + else: + input_tensor, key_value, attention_mask = (inputs, None, None) + + if self._output_range: + if self._norm_first: + source_tensor = input_tensor[:, 0:self._output_range, :] + input_tensor = self._attention_layer_norm(input_tensor) + if key_value is not None: + key_value = self._attention_layer_norm(key_value) + target_tensor = input_tensor[:, 0:self._output_range, :] + if attention_mask is not None: + attention_mask = attention_mask[:, 0:self._output_range, :] + else: + if self._norm_first: + source_tensor = input_tensor + input_tensor = self._attention_layer_norm(input_tensor) + if key_value is not None: + key_value = 
self._attention_layer_norm(key_value)
+ target_tensor = input_tensor + + if key_value is None: + key_value = input_tensor + attention_output, attention_scores = self._attention_layer( + query=target_tensor, + value=key_value, + attention_mask=attention_mask, + return_attention_scores=True) + attention_output = self._attention_dropout(attention_output) + + if self._norm_first: + attention_output = source_tensor + self._stochastic_depth( + attention_output, training=training) + else: + attention_output = self._attention_layer_norm( + target_tensor + + self._stochastic_depth(attention_output, training=training)) + + if self._norm_first: + source_attention_output = attention_output + attention_output = self._output_layer_norm(attention_output) + inner_output = self._intermediate_dense(attention_output) + inner_output = self._intermediate_activation_layer(inner_output) + inner_output = self._inner_dropout_layer(inner_output) + layer_output = self._output_dense(inner_output) + layer_output = self._output_dropout(layer_output) + + if self._norm_first: + if self._return_attention: + return source_attention_output + self._stochastic_depth( + layer_output, training=training), attention_scores + else: + return source_attention_output + self._stochastic_depth( + layer_output, training=training) + + # During mixed precision training, layer norm output is always fp32 for now. + # Casts fp32 for the subsequent add. 
+ layer_output = tf.cast(layer_output, tf.float32) + if self._return_attention: + return self._output_layer_norm(layer_output + self._stochastic_depth( + attention_output, training=training)), attention_scores + else: + return self._output_layer_norm( + layer_output + + self._stochastic_depth(attention_output, training=training)) diff --git a/official/vision/beta/modeling/layers/nn_blocks_3d.py b/official/vision/modeling/layers/nn_blocks_3d.py similarity index 94% rename from official/vision/beta/modeling/layers/nn_blocks_3d.py rename to official/vision/modeling/layers/nn_blocks_3d.py index cf377530af094810757bea43dfaef83264e4b36b..53f6f11681085fdda890bb53698cd370fd044e95 100644 --- a/official/vision/beta/modeling/layers/nn_blocks_3d.py +++ b/official/vision/modeling/layers/nn_blocks_3d.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -17,7 +17,7 @@ import tensorflow as tf from official.modeling import tf_utils -from official.vision.beta.modeling.layers import nn_layers +from official.vision.modeling.layers import nn_layers @tf.keras.utils.register_keras_serializable(package='Vision') @@ -155,7 +155,7 @@ class BottleneckBlock3D(tf.keras.layers.Layer): self._temporal_strides, self._spatial_strides, self._spatial_strides ], use_bias=False, - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer) self._norm0 = self._norm( @@ -169,7 +169,7 @@ class BottleneckBlock3D(tf.keras.layers.Layer): strides=[self._temporal_strides, 1, 1], padding='same', use_bias=False, - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer) self._norm1 = self._norm( @@ -183,7 +183,7 @@ class BottleneckBlock3D(tf.keras.layers.Layer): strides=[1, self._spatial_strides, self._spatial_strides], padding='same', use_bias=False, - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer) self._norm2 = self._norm( @@ -197,7 +197,7 @@ class BottleneckBlock3D(tf.keras.layers.Layer): strides=[1, 1, 1], padding='same', use_bias=False, - kernel_initializer=self._kernel_initializer, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer) self._norm3 = self._norm( @@ -211,7 +211,8 @@ class BottleneckBlock3D(tf.keras.layers.Layer): out_filters=self._filters * 4, se_ratio=self._se_ratio, use_3d_input=True, - kernel_initializer=self._kernel_initializer, + 
kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), kernel_regularizer=self._kernel_regularizer, bias_regularizer=self._bias_regularizer) else: diff --git a/official/vision/beta/modeling/layers/nn_blocks_3d_test.py b/official/vision/modeling/layers/nn_blocks_3d_test.py similarity index 93% rename from official/vision/beta/modeling/layers/nn_blocks_3d_test.py rename to official/vision/modeling/layers/nn_blocks_3d_test.py index 189c0e7cb7df4710b2e465af476ce43952097ca7..9f88d4be716dde430df8d70e2ad6df8115994c28 100644 --- a/official/vision/beta/modeling/layers/nn_blocks_3d_test.py +++ b/official/vision/modeling/layers/nn_blocks_3d_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,14 +12,13 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for resnet.""" # Import libraries from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.modeling.layers import nn_blocks_3d +from official.vision.modeling.layers import nn_blocks_3d class NNBlocksTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/modeling/layers/nn_blocks_test.py b/official/vision/modeling/layers/nn_blocks_test.py new file mode 100644 index 0000000000000000000000000000000000000000..3c5a1dedc775f3c31c3df936d3be1f77d65f46e3 --- /dev/null +++ b/official/vision/modeling/layers/nn_blocks_test.py @@ -0,0 +1,340 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for nn_blocks.""" + +from typing import Any, Iterable, Tuple +# Import libraries +from absl.testing import parameterized +import tensorflow as tf + +from tensorflow.python.distribute import combinations +from tensorflow.python.distribute import strategy_combinations +from official.vision.modeling.layers import nn_blocks + + +def distribution_strategy_combinations() -> Iterable[Tuple[Any, ...]]: + """Returns the combinations of end-to-end tests to run.""" + return combinations.combine( + distribution=[ + strategy_combinations.default_strategy, + strategy_combinations.cloud_tpu_strategy, + strategy_combinations.one_device_strategy_gpu, + ],) + + +class NNBlocksTest(parameterized.TestCase, tf.test.TestCase): + + @parameterized.parameters( + (nn_blocks.ResidualBlock, 1, False, 0.0, None), + (nn_blocks.ResidualBlock, 2, True, 0.2, 0.25), + ) + def test_residual_block_creation(self, block_fn, strides, use_projection, + stochastic_depth_drop_rate, se_ratio): + input_size = 128 + filter_size = 256 + inputs = tf.keras.Input( + shape=(input_size, input_size, filter_size), batch_size=1) + block = block_fn( + filter_size, + strides, + use_projection=use_projection, + se_ratio=se_ratio, + stochastic_depth_drop_rate=stochastic_depth_drop_rate, + ) + + features = block(inputs) + + self.assertAllEqual( + [1, input_size // strides, input_size // strides, filter_size], + features.shape.as_list()) + + @parameterized.parameters( + (nn_blocks.BottleneckBlock, 1, False, 0.0, None), + (nn_blocks.BottleneckBlock, 2, True, 0.2, 0.25), + ) + def 
test_bottleneck_block_creation(self, block_fn, strides, use_projection, + stochastic_depth_drop_rate, se_ratio): + input_size = 128 + filter_size = 256 + inputs = tf.keras.Input( + shape=(input_size, input_size, filter_size * 4), batch_size=1) + block = block_fn( + filter_size, + strides, + use_projection=use_projection, + se_ratio=se_ratio, + stochastic_depth_drop_rate=stochastic_depth_drop_rate) + + features = block(inputs) + + self.assertAllEqual( + [1, input_size // strides, input_size // strides, filter_size * 4], + features.shape.as_list()) + + @parameterized.parameters( + (nn_blocks.InvertedBottleneckBlock, 1, 1, None, None), + (nn_blocks.InvertedBottleneckBlock, 6, 1, None, None), + (nn_blocks.InvertedBottleneckBlock, 1, 2, None, None), + (nn_blocks.InvertedBottleneckBlock, 1, 1, 0.2, None), + (nn_blocks.InvertedBottleneckBlock, 1, 1, None, 0.2), + ) + def test_invertedbottleneck_block_creation(self, block_fn, expand_ratio, + strides, se_ratio, + stochastic_depth_drop_rate): + input_size = 128 + in_filters = 24 + out_filters = 40 + inputs = tf.keras.Input( + shape=(input_size, input_size, in_filters), batch_size=1) + block = block_fn( + in_filters=in_filters, + out_filters=out_filters, + expand_ratio=expand_ratio, + strides=strides, + se_ratio=se_ratio, + stochastic_depth_drop_rate=stochastic_depth_drop_rate) + + features = block(inputs) + + self.assertAllEqual( + [1, input_size // strides, input_size // strides, out_filters], + features.shape.as_list()) + + @parameterized.parameters( + (nn_blocks.TuckerConvBlock, 1, 0.25, 0.25), + (nn_blocks.TuckerConvBlock, 2, 0.25, 0.25), + ) + def test_tucker_conv_block( + self, block_fn, strides, + input_compression_ratio, output_compression_ratio): + input_size = 128 + in_filters = 24 + out_filters = 24 + inputs = tf.keras.Input( + shape=(input_size, input_size, in_filters), batch_size=1) + block = block_fn( + in_filters=in_filters, + out_filters=out_filters, + input_compression_ratio=input_compression_ratio, + 
output_compression_ratio=output_compression_ratio, + strides=strides) + + features = block(inputs) + + self.assertAllEqual( + [1, input_size // strides, input_size // strides, out_filters], + features.shape.as_list()) + + +class ResidualInnerTest(parameterized.TestCase, tf.test.TestCase): + + @combinations.generate(distribution_strategy_combinations()) + def test_shape(self, distribution): + bsz, h, w, c = 8, 32, 32, 32 + filters = 64 + strides = 2 + + input_tensor = tf.random.uniform(shape=[bsz, h, w, c]) + with distribution.scope(): + test_layer = nn_blocks.ResidualInner(filters, strides) + + output = test_layer(input_tensor) + expected_output_shape = [bsz, h // strides, w // strides, filters] + self.assertEqual(expected_output_shape, output.shape.as_list()) + + +class BottleneckResidualInnerTest(parameterized.TestCase, tf.test.TestCase): + + @combinations.generate(distribution_strategy_combinations()) + def test_shape(self, distribution): + bsz, h, w, c = 8, 32, 32, 32 + filters = 64 + strides = 2 + + input_tensor = tf.random.uniform(shape=[bsz, h, w, c]) + with distribution.scope(): + test_layer = nn_blocks.BottleneckResidualInner(filters, strides) + + output = test_layer(input_tensor) + expected_output_shape = [bsz, h // strides, w // strides, filters * 4] + self.assertEqual(expected_output_shape, output.shape.as_list()) + + +class DepthwiseSeparableConvBlockTest(parameterized.TestCase, tf.test.TestCase): + + @combinations.generate(distribution_strategy_combinations()) + def test_shape(self, distribution): + batch_size, height, width, num_channels = 8, 32, 32, 32 + num_filters = 64 + strides = 2 + + input_tensor = tf.random.normal( + shape=[batch_size, height, width, num_channels]) + with distribution.scope(): + block = nn_blocks.DepthwiseSeparableConvBlock( + num_filters, strides=strides) + config_dict = block.get_config() + recreate_block = nn_blocks.DepthwiseSeparableConvBlock(**config_dict) + + output_tensor = block(input_tensor) + expected_output_shape = 
[ + batch_size, height // strides, width // strides, num_filters + ] + self.assertEqual(output_tensor.shape.as_list(), expected_output_shape) + + output_tensor = recreate_block(input_tensor) + self.assertEqual(output_tensor.shape.as_list(), expected_output_shape) + + +class ReversibleLayerTest(parameterized.TestCase, tf.test.TestCase): + + @combinations.generate(distribution_strategy_combinations()) + def test_downsampling_non_reversible_step(self, distribution): + bsz, h, w, c = 8, 32, 32, 32 + filters = 64 + strides = 2 + + input_tensor = tf.random.uniform(shape=[bsz, h, w, c]) + with distribution.scope(): + f = nn_blocks.ResidualInner( + filters=filters // 2, strides=strides, batch_norm_first=True) + g = nn_blocks.ResidualInner( + filters=filters // 2, strides=1, batch_norm_first=True) + test_layer = nn_blocks.ReversibleLayer(f, g) + test_layer.build(input_tensor.shape) + optimizer = tf.keras.optimizers.SGD(learning_rate=0.01) + + @tf.function + def step_fn(): + with tf.GradientTape() as tape: + output = test_layer(input_tensor, training=True) + grads = tape.gradient(output, test_layer.trainable_variables) + # Test applying gradients with optimizer works + optimizer.apply_gradients(zip(grads, test_layer.trainable_variables)) + + return output + + replica_output = distribution.run(step_fn) + outputs = distribution.experimental_local_results(replica_output) + + # Assert forward pass shape + expected_output_shape = [bsz, h // strides, w // strides, filters] + for output in outputs: + self.assertEqual(expected_output_shape, output.shape.as_list()) + + @combinations.generate(distribution_strategy_combinations()) + def test_reversible_step(self, distribution): + # Reversible layers satisfy: (a) strides = 1 (b) in_filter = out_filter + bsz, h, w, c = 8, 32, 32, 32 + filters = c + strides = 1 + + input_tensor = tf.random.uniform(shape=[bsz, h, w, c]) + with distribution.scope(): + f = nn_blocks.ResidualInner( + filters=filters // 2, strides=strides, 
batch_norm_first=False) + g = nn_blocks.ResidualInner( + filters=filters // 2, strides=1, batch_norm_first=False) + test_layer = nn_blocks.ReversibleLayer(f, g) + test_layer(input_tensor, training=False) # init weights + optimizer = tf.keras.optimizers.SGD(learning_rate=0.01) + + @tf.function + def step_fn(): + with tf.GradientTape() as tape: + output = test_layer(input_tensor, training=True) + grads = tape.gradient(output, test_layer.trainable_variables) + # Test applying gradients with optimizer works + optimizer.apply_gradients(zip(grads, test_layer.trainable_variables)) + + return output + + @tf.function + def fwd(): + test_layer(input_tensor) + + distribution.run(fwd) # Initialize variables + prev_variables = tf.identity_n(test_layer.trainable_variables) + replica_output = distribution.run(step_fn) + outputs = distribution.experimental_local_results(replica_output) + + # Assert variables values have changed values + for v0, v1 in zip(prev_variables, test_layer.trainable_variables): + self.assertNotAllEqual(v0, v1) + + # Assert forward pass shape + expected_output_shape = [bsz, h // strides, w // strides, filters] + for output in outputs: + self.assertEqual(expected_output_shape, output.shape.as_list()) + + @combinations.generate(distribution_strategy_combinations()) + def test_manual_gradients_correctness(self, distribution): + bsz, h, w, c = 8, 32, 32, 32 + filters = c + strides = 1 + + input_tensor = tf.random.uniform(shape=[bsz, h, w, c * 4]) # bottleneck + with distribution.scope(): + f_manual = nn_blocks.BottleneckResidualInner( + filters=filters // 2, strides=strides, batch_norm_first=False) + g_manual = nn_blocks.BottleneckResidualInner( + filters=filters // 2, strides=1, batch_norm_first=False) + manual_grad_layer = nn_blocks.ReversibleLayer(f_manual, g_manual) + manual_grad_layer(input_tensor, training=False) # init weights + + f_auto = nn_blocks.BottleneckResidualInner( + filters=filters // 2, strides=strides, batch_norm_first=False) + g_auto = 
nn_blocks.BottleneckResidualInner( + filters=filters // 2, strides=1, batch_norm_first=False) + auto_grad_layer = nn_blocks.ReversibleLayer( + f_auto, g_auto, manual_grads=False) + auto_grad_layer(input_tensor) # init weights + # Clone all weights (tf.keras.layers.Layer has no .clone()) + auto_grad_layer._f.set_weights(manual_grad_layer._f.get_weights()) + auto_grad_layer._g.set_weights(manual_grad_layer._g.get_weights()) + + @tf.function + def manual_fn(): + with tf.GradientTape() as tape: + output = manual_grad_layer(input_tensor, training=True) + grads = tape.gradient(output, manual_grad_layer.trainable_variables) + return grads + + @tf.function + def auto_fn(): + with tf.GradientTape() as tape: + output = auto_grad_layer(input_tensor, training=True) + grads = tape.gradient(output, auto_grad_layer.trainable_variables) + return grads + + manual_grads = distribution.run(manual_fn) + auto_grads = distribution.run(auto_fn) + + # Assert gradients calculated manually are close to those from autograd + for manual_grad, auto_grad in zip(manual_grads, auto_grads): + self.assertAllClose( + distribution.experimental_local_results(manual_grad), + distribution.experimental_local_results(auto_grad), + atol=5e-3, + rtol=5e-3) + + # Verify that BN moving mean and variance are correct. + for manual_var, auto_var in zip(manual_grad_layer.non_trainable_variables, + auto_grad_layer.non_trainable_variables): + self.assertAllClose(manual_var, auto_var) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/modeling/layers/nn_layers.py b/official/vision/modeling/layers/nn_layers.py new file mode 100644 index 0000000000000000000000000000000000000000..36858207fa67d992afe770d22c4e12b60a4937e7 --- /dev/null +++ b/official/vision/modeling/layers/nn_layers.py @@ -0,0 +1,1281 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Contains common building blocks for neural networks.""" +from typing import Any, Callable, Dict, List, Mapping, Optional, Tuple, Union + +from absl import logging +import tensorflow as tf +import tensorflow_addons as tfa + +from official.modeling import tf_utils +from official.vision.ops import spatial_transform_ops + + +# Type annotations. +States = Dict[str, tf.Tensor] +Activation = Union[str, Callable] + + +def make_divisible(value: float, + divisor: int, + min_value: Optional[float] = None, + round_down_protect: bool = True, + ) -> int: + """Rounds `value` to the nearest multiple of `divisor` (e.g. 8 channels). + + Args: + value: A `float` of original value. + divisor: An `int` that the returned value should be a multiple of. + min_value: A `float` lower bound on the result. Defaults to `divisor`. + round_down_protect: A `bool`. If True, add one more `divisor` whenever + rounding would reduce `value` by more than 10%. + + Returns: + The adjusted value as an `int` that is divisible by `divisor`. + """ + if min_value is None: + min_value = divisor + new_value = max(min_value, int(value + divisor / 2) // divisor * divisor) + # Make sure that round down does not go down by more than 10%.
+ if round_down_protect and new_value < 0.9 * value: + new_value += divisor + return int(new_value) + + +def round_filters(filters: int, + multiplier: float, + divisor: int = 8, + min_depth: Optional[int] = None, + round_down_protect: bool = True, + skip: bool = False) -> int: + """Rounds number of filters based on width multiplier.""" + orig_f = filters + if skip or not multiplier: + return filters + + new_filters = make_divisible(value=filters * multiplier, + divisor=divisor, + min_value=min_depth, + round_down_protect=round_down_protect) + + logging.info('round_filter input=%s output=%s', orig_f, new_filters) + return int(new_filters) + + +def get_padding_for_kernel_size(kernel_size): + """Compute padding size given kernel size.""" + if kernel_size == 7: + return (3, 3) + elif kernel_size == 3: + return (1, 1) + else: + raise ValueError('Padding for kernel size {} not known.'.format( + kernel_size)) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class SqueezeExcitation(tf.keras.layers.Layer): + """Creates a squeeze and excitation layer.""" + + def __init__(self, + in_filters, + out_filters, + se_ratio, + divisible_by=1, + use_3d_input=False, + kernel_initializer='VarianceScaling', + kernel_regularizer=None, + bias_regularizer=None, + activation='relu', + gating_activation='sigmoid', + round_down_protect=True, + **kwargs): + """Initializes a squeeze and excitation layer. + + Args: + in_filters: An `int` number of filters of the input tensor. + out_filters: An `int` number of filters of the output tensor. + se_ratio: A `float` or None. If not None, se ratio for the squeeze and + excitation layer. + divisible_by: An `int` that ensures all inner dimensions are divisible by + this number. + use_3d_input: A `bool` of whether input is 2D or 3D image. + kernel_initializer: A `str` of kernel_initializer for convolutional + layers. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default to None. 
+ bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + Default to None. + activation: A `str` name of the activation function. + gating_activation: A `str` name of the activation function for the final + gating function. + round_down_protect: A `bool`. If True, prevents rounding down by more + than 10%. + **kwargs: Additional keyword arguments to be passed. + """ + super(SqueezeExcitation, self).__init__(**kwargs) + + self._in_filters = in_filters + self._out_filters = out_filters + self._se_ratio = se_ratio + self._divisible_by = divisible_by + self._round_down_protect = round_down_protect + self._use_3d_input = use_3d_input + self._activation = activation + self._gating_activation = gating_activation + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._bias_regularizer = bias_regularizer + if tf.keras.backend.image_data_format() == 'channels_last': + if not use_3d_input: + self._spatial_axis = [1, 2] + else: + self._spatial_axis = [1, 2, 3] + else: + if not use_3d_input: + self._spatial_axis = [2, 3] + else: + self._spatial_axis = [2, 3, 4] + self._activation_fn = tf_utils.get_activation(activation) + self._gating_activation_fn = tf_utils.get_activation(gating_activation) + + def build(self, input_shape): + num_reduced_filters = make_divisible( + max(1, int(self._in_filters * self._se_ratio)), + divisor=self._divisible_by, + round_down_protect=self._round_down_protect) + + self._se_reduce = tf.keras.layers.Conv2D( + filters=num_reduced_filters, + kernel_size=1, + strides=1, + padding='same', + use_bias=True, + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + + self._se_expand = tf.keras.layers.Conv2D( + filters=self._out_filters, + kernel_size=1, + strides=1, + padding='same', + use_bias=True, + 
kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), +
kernel_regularizer=self._kernel_regularizer, + bias_regularizer=self._bias_regularizer) + + super(SqueezeExcitation, self).build(input_shape) + + def get_config(self): + config = { + 'in_filters': self._in_filters, + 'out_filters': self._out_filters, + 'se_ratio': self._se_ratio, + 'divisible_by': self._divisible_by, + 'use_3d_input': self._use_3d_input, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'bias_regularizer': self._bias_regularizer, + 'activation': self._activation, + 'gating_activation': self._gating_activation, + 'round_down_protect': self._round_down_protect, + } + base_config = super(SqueezeExcitation, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs): + x = tf.reduce_mean(inputs, self._spatial_axis, keepdims=True) + x = self._activation_fn(self._se_reduce(x)) + x = self._gating_activation_fn(self._se_expand(x)) + return x * inputs + + +def get_stochastic_depth_rate(init_rate, i, n): + """Gets the drop connect rate for the ith block. + + Args: + init_rate: A `float` of initial drop rate. + i: An `int` index of the current block. + n: An `int` total number of blocks. + + Returns: + Drop rate of the ith block. + """ + if init_rate is not None: + if init_rate < 0 or init_rate > 1: + raise ValueError('Initial drop rate must be between 0 and 1.') + rate = init_rate * float(i) / n + else: + rate = None + return rate + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class StochasticDepth(tf.keras.layers.Layer): + """Creates a stochastic depth layer.""" + + def __init__(self, stochastic_depth_drop_rate, **kwargs): + """Initializes a stochastic depth layer. + + Args: + stochastic_depth_drop_rate: A `float` of drop rate. + **kwargs: Additional keyword arguments to be passed. + + Returns: + An output `tf.Tensor` with the same shape as the input.
+ """ + super(StochasticDepth, self).__init__(**kwargs) + self._drop_rate = stochastic_depth_drop_rate + + def get_config(self): + config = {'stochastic_depth_drop_rate': self._drop_rate} + base_config = super(StochasticDepth, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs, training=None): + if training is None: + training = tf.keras.backend.learning_phase() + if not training or self._drop_rate is None or self._drop_rate == 0: + return inputs + + keep_prob = 1.0 - self._drop_rate + batch_size = tf.shape(inputs)[0] + random_tensor = keep_prob + random_tensor += tf.random.uniform( + [batch_size] + [1] * (inputs.shape.rank - 1), dtype=inputs.dtype) + binary_tensor = tf.floor(random_tensor) + output = tf.math.divide(inputs, keep_prob) * binary_tensor + return output + + +@tf.keras.utils.register_keras_serializable(package='Vision') +def pyramid_feature_fusion(inputs, target_level): + """Fuses all feature maps in the feature pyramid at the target level. + + Args: + inputs: A dictionary containing the feature pyramid. The size of the input + tensor needs to be fixed. + target_level: An `int` of the target feature level for feature fusion. + + Returns: + A `float` `tf.Tensor` of shape [batch_size, feature_height, feature_width, + feature_channel]. + """ + # Convert keys to int. + pyramid_feats = {int(k): v for k, v in inputs.items()} + min_level = min(pyramid_feats.keys()) + max_level = max(pyramid_feats.keys()) + resampled_feats = [] + + for l in range(min_level, max_level + 1): + if l == target_level: + resampled_feats.append(pyramid_feats[l]) + else: + feat = pyramid_feats[l] + target_size = list(feat.shape[1:3]) + target_size[0] *= 2**(l - target_level) + target_size[1] *= 2**(l - target_level) + # Casts feat to float32 so the resize op can be run on TPU. 
+ feat = tf.cast(feat, tf.float32) + feat = tf.image.resize( + feat, size=target_size, method=tf.image.ResizeMethod.BILINEAR) + # Casts it back to be compatible with the rest of the operations. + feat = tf.cast(feat, pyramid_feats[l].dtype) + resampled_feats.append(feat) + + return tf.math.add_n(resampled_feats) + + +class PanopticFPNFusion(tf.keras.Model): + """Creates a Panoptic FPN feature fusion layer. + + This implements feature fusion for the semantic segmentation head from the paper: + Alexander Kirillov, Ross Girshick, Kaiming He and Piotr Dollar. + Panoptic Feature Pyramid Networks. + (https://arxiv.org/pdf/1901.02446.pdf) + """ + + def __init__( + self, + min_level: int = 2, + max_level: int = 5, + target_level: int = 2, + num_filters: int = 128, + num_fpn_filters: int = 256, + activation: str = 'relu', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + bias_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + **kwargs): + + """Initializes panoptic FPN feature fusion layer. + + Args: + min_level: An `int` of minimum level to use in feature fusion. + max_level: An `int` of maximum level to use in feature fusion. + target_level: An `int` of the target feature level for feature fusion. + num_filters: An `int` number of filters in conv2d layers. + num_fpn_filters: An `int` number of filters in the FPN outputs. + activation: A `str` name of the activation function. + kernel_regularizer: A `tf.keras.regularizers.Regularizer` object for + Conv2D. Default is None. + bias_regularizer: A `tf.keras.regularizers.Regularizer` object for Conv2D. + **kwargs: Additional keyword arguments to be passed. + Returns: + A `float` `tf.Tensor` of shape [batch_size, feature_height, feature_width, + feature_channel].
+ """ + if target_level > max_level: + raise ValueError('target_level should be less than or equal to max_level') + + self._config_dict = { + 'min_level': min_level, + 'max_level': max_level, + 'target_level': target_level, + 'num_filters': num_filters, + 'num_fpn_filters': num_fpn_filters, + 'activation': activation, + 'kernel_regularizer': kernel_regularizer, + 'bias_regularizer': bias_regularizer, + } + norm = tfa.layers.GroupNormalization + conv2d = tf.keras.layers.Conv2D + activation_fn = tf_utils.get_activation(activation) + if tf.keras.backend.image_data_format() == 'channels_last': + norm_axis = -1 + else: + norm_axis = 1 + inputs = self._build_inputs(num_fpn_filters, min_level, max_level) + + upscaled_features = [] + for level in range(min_level, max_level + 1): + num_conv_layers = max(1, level - target_level) + x = inputs[str(level)] + for i in range(num_conv_layers): + x = conv2d( + filters=num_filters, + kernel_size=3, + padding='same', + kernel_initializer=tf.keras.initializers.VarianceScaling(), + kernel_regularizer=kernel_regularizer, + bias_regularizer=bias_regularizer)(x) + x = norm(groups=32, axis=norm_axis)(x) + x = activation_fn(x) + if level != target_level: + x = spatial_transform_ops.nearest_upsampling(x, scale=2) + upscaled_features.append(x) + + fused_features = tf.math.add_n(upscaled_features) + self._output_specs = {str(target_level): fused_features.get_shape()} + + super(PanopticFPNFusion, self).__init__( + inputs=inputs, outputs=fused_features, **kwargs) + + def _build_inputs(self, num_filters: int, + min_level: int, max_level: int): + inputs = {} + for level in range(min_level, max_level + 1): + inputs[str(level)] = tf.keras.Input(shape=[None, None, num_filters]) + return inputs + + def get_config(self) -> Mapping[str, Any]: + return self._config_dict + + @classmethod + def from_config(cls, config, custom_objects=None): + return cls(**config) + + @property + def output_specs(self) -> Mapping[str, tf.TensorShape]: + """A dict of {level:
+ TensorShape} pairs for the model output.""" + return self._output_specs + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class Scale(tf.keras.layers.Layer): + """Scales the input by a trainable scalar weight. + + This is useful for applying ReZero to layers, which improves convergence + speed. This implements the paper: + ReZero is All You Need: Fast Convergence at Large Depth. + (https://arxiv.org/pdf/2003.04887.pdf). + """ + + def __init__( + self, + initializer: tf.keras.initializers.Initializer = 'ones', + regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + **kwargs): + """Initializes a scale layer. + + Args: + initializer: A `str` of initializer for the scalar weight. + regularizer: A `tf.keras.regularizers.Regularizer` for the scalar weight. + **kwargs: Additional keyword arguments to be passed to this layer. + + Returns: + A `tf.Tensor` that has the same shape as the input. + """ + super(Scale, self).__init__(**kwargs) + + self._initializer = initializer + self._regularizer = regularizer + + self._scale = self.add_weight( + name='scale', + shape=[], + dtype=self.dtype, + initializer=self._initializer, + regularizer=self._regularizer, + trainable=True) + + def get_config(self): + """Returns a dictionary containing the config used for initialization.""" + config = { + 'initializer': self._initializer, + 'regularizer': self._regularizer, + } + base_config = super(Scale, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs): + """Calls the layer with the given inputs.""" + scale = tf.cast(self._scale, inputs.dtype) + return scale * inputs + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class TemporalSoftmaxPool(tf.keras.layers.Layer): + """Creates a network layer corresponding to temporal softmax pooling. + + This is useful for multi-class logits (used in e.g., Charades). Modified from + AssembleNet Charades evaluation from: + + Michael S.
+ Ryoo, AJ Piergiovanni, Mingxing Tan, Anelia Angelova. + AssembleNet: Searching for Multi-Stream Neural Connectivity in Video + Architectures. + (https://arxiv.org/pdf/1905.13209.pdf). + """ + + def call(self, inputs): + """Calls the layer with the given inputs.""" + assert inputs.shape.rank in (3, 4, 5) + frames = tf.shape(inputs)[1] + pre_logits = inputs / tf.sqrt(tf.cast(frames, inputs.dtype)) + activations = tf.nn.softmax(pre_logits, axis=1) + outputs = inputs * activations + return outputs + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class PositionalEncoding(tf.keras.layers.Layer): + """Creates a network layer that adds a sinusoidal positional encoding. + + Positional encoding is incremented across frames, and is added to the input. + The positional encoding is first weighted at 0 so that the network can choose + to ignore it. This implements: + + Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, + Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. + Attention Is All You Need. + (https://arxiv.org/pdf/1706.03762.pdf). + """ + + def __init__(self, + initializer: tf.keras.initializers.Initializer = 'zeros', + cache_encoding: bool = False, + state_prefix: Optional[str] = None, + **kwargs): + """Initializes positional encoding. + + Args: + initializer: A `str` of initializer for weighting the positional encoding. + cache_encoding: A `bool`. If True, cache the positional encoding tensor + after calling build. Otherwise, rebuild the tensor for every call. + Setting this to False can be useful when we want to input a variable + number of frames, so the positional encoding tensor can change shape. + state_prefix: a prefix string to identify states. + **kwargs: Additional keyword arguments to be passed to this layer. + + Returns: + A `tf.Tensor` that has the same shape as the input.
+ """ + super(PositionalEncoding, self).__init__(**kwargs) + self._initializer = initializer + self._cache_encoding = cache_encoding + self._pos_encoding = None + self._rezero = Scale(initializer=initializer, name='rezero') + state_prefix = state_prefix if state_prefix is not None else '' + self._state_prefix = state_prefix + self._frame_count_name = f'{state_prefix}_pos_enc_frame_count' + + def get_config(self): + """Returns a dictionary containing the config used for initialization.""" + config = { + 'initializer': self._initializer, + 'cache_encoding': self._cache_encoding, + 'state_prefix': self._state_prefix, + } + base_config = super(PositionalEncoding, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def _positional_encoding(self, + num_positions: Union[int, tf.Tensor], + hidden_size: Union[int, tf.Tensor], + start_position: Union[int, tf.Tensor] = 0, + dtype: str = 'float32') -> tf.Tensor: + """Creates a sequence of sinusoidal positional encoding vectors. + + Args: + num_positions: the total number of positions (frames). + hidden_size: the number of channels used for the hidden vectors. + start_position: the start position. + dtype: the dtype of the output tensor. + + Returns: + The positional encoding tensor with shape [num_positions, hidden_size]. + """ + if isinstance(start_position, tf.Tensor) and start_position.shape.rank == 1: + start_position = start_position[0] + + # Calling `tf.range` with `dtype=tf.bfloat16` results in an error, + # so we cast afterward. + positions = tf.range(start_position, start_position + num_positions) + positions = tf.cast(positions, dtype)[:, tf.newaxis] + idx = tf.range(hidden_size)[tf.newaxis, :] + + power = tf.cast(2 * (idx // 2), dtype) + power /= tf.cast(hidden_size, dtype) + angles = 1. 
/ tf.math.pow(10_000., power) + radians = positions * angles + + sin = tf.math.sin(radians[:, 0::2]) + cos = tf.math.cos(radians[:, 1::2]) + pos_encoding = tf.concat([sin, cos], axis=-1) + + return pos_encoding + + def _get_pos_encoding(self, + input_shape: tf.Tensor, + frame_count: int = 0) -> tf.Tensor: + """Calculates the positional encoding from the input shape. + + Args: + input_shape: the shape of the input. + frame_count: a count of frames that indicates the index of the first + frame. + + Returns: + The positional encoding tensor with shape [num_positions, hidden_size]. + + """ + frames = input_shape[1] + channels = input_shape[-1] + pos_encoding = self._positional_encoding( + frames, channels, start_position=frame_count, dtype=self.dtype) + pos_encoding = tf.reshape(pos_encoding, [1, frames, 1, 1, channels]) + return pos_encoding + + def build(self, input_shape): + """Builds the layer with the given input shape. + + Args: + input_shape: The input shape. + + Raises: + ValueError: If using 'channels_first' data format. + """ + if tf.keras.backend.image_data_format() == 'channels_first': + raise ValueError('"channels_first" mode is unsupported.') + + if self._cache_encoding: + self._pos_encoding = self._get_pos_encoding(input_shape) + + super(PositionalEncoding, self).build(input_shape) + + def call( + self, + inputs: tf.Tensor, + states: Optional[States] = None, + output_states: bool = True, + ) -> Union[tf.Tensor, Tuple[tf.Tensor, States]]: + """Calls the layer with the given inputs. + + Args: + inputs: An input `tf.Tensor`. + states: A `dict` of states such that, if any of the keys match for this + layer, will overwrite the contents of the buffer(s). Expected keys + include `state_prefix + '_pos_enc_frame_count'`. + output_states: A `bool`. If True, returns the output tensor and output + states. Returns just the output tensor otherwise. + + Returns: + An output `tf.Tensor` (and optionally the states if `output_states=True`). 
+ + Raises: + ValueError: If using 'channels_first' data format. + """ + states = dict(states) if states is not None else {} + + # Keep a count of frames encountered across input iterations in + # num_frames to be able to accurately update the positional encoding. + num_frames = tf.shape(inputs)[1] + frame_count = tf.cast(states.get(self._frame_count_name, [0]), tf.int32) + states[self._frame_count_name] = frame_count + num_frames + + if self._cache_encoding: + pos_encoding = self._pos_encoding + else: + pos_encoding = self._get_pos_encoding( + tf.shape(inputs), frame_count=frame_count) + pos_encoding = tf.cast(pos_encoding, inputs.dtype) + pos_encoding = self._rezero(pos_encoding) + outputs = inputs + pos_encoding + + return (outputs, states) if output_states else outputs + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class GlobalAveragePool3D(tf.keras.layers.Layer): + """Creates a global average pooling layer with causal mode. + + Implements causal mode, which runs a cumulative sum (with `tf.cumsum`) across + frames in the time dimension, allowing the use of a stream buffer. Sums any + valid input state with the current input to allow state to accumulate over + several iterations. + """ + + def __init__(self, + keepdims: bool = False, + causal: bool = False, + state_prefix: Optional[str] = None, + **kwargs): + """Initializes a global average pool layer. + + Args: + keepdims: A `bool`. If True, keep the averaged dimensions. + causal: A `bool` of whether to run in causal mode with a cumulative sum + across frames. + state_prefix: a prefix string to identify states. + **kwargs: Additional keyword arguments to be passed to this layer. + + Returns: + An output `tf.Tensor`. 
+ """ + super(GlobalAveragePool3D, self).__init__(**kwargs) + + self._keepdims = keepdims + self._causal = causal + state_prefix = state_prefix if state_prefix is not None else '' + self._state_prefix = state_prefix + + self._state_name = f'{state_prefix}_pool_buffer' + self._frame_count_name = f'{state_prefix}_pool_frame_count' + + def get_config(self): + """Returns a dictionary containing the config used for initialization.""" + config = { + 'keepdims': self._keepdims, + 'causal': self._causal, + 'state_prefix': self._state_prefix, + } + base_config = super(GlobalAveragePool3D, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, + inputs: tf.Tensor, + states: Optional[States] = None, + output_states: bool = False + ) -> Union[tf.Tensor, Tuple[tf.Tensor, States]]: + """Calls the layer with the given inputs. + + Args: + inputs: An input `tf.Tensor`. + states: A `dict` of states such that, if any of the keys match for this + layer, will overwrite the contents of the buffer(s). + Expected keys include `state_prefix + '_pool_buffer'` and + `state_prefix + '_pool_frame_count'`. + output_states: A `bool`. If True, returns the output tensor and output + states. Returns just the output tensor otherwise. + + Returns: + An output `tf.Tensor` (and optionally the states if `output_states=True`). + If `causal=True`, the output tensor will have shape + `[batch_size, num_frames, 1, 1, channels]` if `keepdims=True`. We keep + the frame dimension in this case to simulate a cumulative global average + as if we are inputting one frame at a time. If `causal=False`, the output + is equivalent to `tf.keras.layers.GlobalAveragePooling3D` with shape + `[batch_size, 1, 1, 1, channels]` if `keepdims=True` (plus the optional + buffer stored in `states`). + + Raises: + ValueError: If using 'channels_first' data format.
+ """ + states = dict(states) if states is not None else {} + + if tf.keras.backend.image_data_format() == 'channels_first': + raise ValueError('"channels_first" mode is unsupported.') + + # Shape: [batch_size, 1, 1, 1, channels] + buffer = states.get(self._state_name, None) + if buffer is None: + buffer = tf.zeros_like(inputs[:, :1, :1, :1], dtype=inputs.dtype) + states[self._state_name] = buffer + + # Keep a count of frames encountered across input iterations in + # num_frames to be able to accurately take a cumulative average across + # all frames when running in streaming mode + num_frames = tf.shape(inputs)[1] + frame_count = states.get(self._frame_count_name, tf.constant([0])) + frame_count = tf.cast(frame_count, tf.int32) + states[self._frame_count_name] = frame_count + num_frames + + if self._causal: + # Take a mean of spatial dimensions to make computation more efficient. + x = tf.reduce_mean(inputs, axis=[2, 3], keepdims=True) + x = tf.cumsum(x, axis=1) + x = x + buffer + + # The last frame will be the value of the next state + # Shape: [batch_size, 1, 1, 1, channels] + states[self._state_name] = x[:, -1:] + + # In causal mode, the divisor increments by 1 for every frame to + # calculate cumulative averages instead of one global average + mean_divisors = tf.range(num_frames) + frame_count + 1 + mean_divisors = tf.reshape(mean_divisors, [1, num_frames, 1, 1, 1]) + mean_divisors = tf.cast(mean_divisors, x.dtype) + + # Shape: [batch_size, num_frames, 1, 1, channels] + x = x / mean_divisors + else: + # In non-causal mode, we (optionally) sum across frames to take a + # cumulative average across input iterations rather than individual + # frames. If no buffer state is passed, this essentially becomes + # regular global average pooling. 
+ # Shape: [batch_size, 1, 1, 1, channels] + x = tf.reduce_sum(inputs, axis=(1, 2, 3), keepdims=True) + x = x / tf.cast(tf.shape(inputs)[2] * tf.shape(inputs)[3], x.dtype) + x = x + buffer + + # Shape: [batch_size, 1, 1, 1, channels] + states[self._state_name] = x + + x = x / tf.cast(frame_count + num_frames, x.dtype) + + if not self._keepdims: + x = tf.squeeze(x, axis=(1, 2, 3)) + + return (x, states) if output_states else x + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class SpatialAveragePool3D(tf.keras.layers.Layer): + """Creates a global average pooling layer pooling across spatial dimensions.""" + + def __init__(self, keepdims: bool = False, **kwargs): + """Initializes a global average pool layer. + + Args: + keepdims: A `bool`. If True, keep the averaged dimensions. + **kwargs: Additional keyword arguments to be passed to this layer. + + Returns: + An output `tf.Tensor`. + """ + super(SpatialAveragePool3D, self).__init__(**kwargs) + self._keepdims = keepdims + + def get_config(self): + """Returns a dictionary containing the config used for initialization.""" + config = { + 'keepdims': self._keepdims, + } + base_config = super(SpatialAveragePool3D, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def build(self, input_shape): + """Builds the layer with the given input shape.""" + if tf.keras.backend.image_data_format() == 'channels_first': + raise ValueError('"channels_first" mode is unsupported.') + + super(SpatialAveragePool3D, self).build(input_shape) + + def call(self, inputs, states=None, output_states: bool = False): + """Calls the layer with the given inputs.""" + if inputs.shape.rank != 5: + raise ValueError( + 'Input should have rank {}, got {}'.format(5, inputs.shape.rank)) + + output = tf.reduce_mean(inputs, axis=(2, 3), keepdims=self._keepdims) + return (output, states) if output_states else output + + +class CausalConvMixin: + """Mixin class to implement CausalConv for
`tf.keras.layers.Conv` layers.""" + + @property + def use_buffered_input(self) -> bool: + return self._use_buffered_input + + @use_buffered_input.setter + def use_buffered_input(self, variable: bool): + self._use_buffered_input = variable + + def _compute_buffered_causal_padding(self, + inputs: tf.Tensor, + use_buffered_input: bool = False, + time_axis: int = 1, + ) -> List[List[int]]: + """Calculates padding for 'causal' option for conv layers. + + Args: + inputs: An optional input `tf.Tensor` to be padded. + use_buffered_input: A `bool`. If True, use 'valid' padding along the time + dimension. This should be set when applying the stream buffer. + time_axis: An `int` of the axis of the time dimension. + + Returns: + A list of paddings for `tf.pad`. + """ + input_shape = tf.shape(inputs)[1:-1] + + if tf.keras.backend.image_data_format() == 'channels_first': + raise ValueError('"channels_first" mode is unsupported.') + + kernel_size_effective = [ + (self.kernel_size[i] + + (self.kernel_size[i] - 1) * (self.dilation_rate[i] - 1)) + for i in range(self.rank) + ] + pad_total = [kernel_size_effective[0] - 1] + for i in range(1, self.rank): + overlap = (input_shape[i] - 1) % self.strides[i] + 1 + pad_total.append(tf.maximum(kernel_size_effective[i] - overlap, 0)) + pad_beg = [pad_total[i] // 2 for i in range(self.rank)] + pad_end = [pad_total[i] - pad_beg[i] for i in range(self.rank)] + padding = [[pad_beg[i], pad_end[i]] for i in range(self.rank)] + padding = [[0, 0]] + padding + [[0, 0]] + + if use_buffered_input: + padding[time_axis] = [0, 0] + else: + padding[time_axis] = [padding[time_axis][0] + padding[time_axis][1], 0] + return padding + + def _causal_validate_init(self): + """Validates the Conv layer initial configuration.""" + # Overriding this method is meant to circumvent unnecessary errors when + # using causal padding. 
+ if (self.filters is not None + and self.filters % self.groups != 0): + raise ValueError( + 'The number of filters must be evenly divisible by the number of ' + 'groups. Received: groups={}, filters={}'.format( + self.groups, self.filters)) + + if not all(self.kernel_size): + raise ValueError('The argument `kernel_size` cannot contain 0(s). ' + 'Received: %s' % (self.kernel_size,)) + + def _buffered_spatial_output_shape(self, spatial_output_shape: List[int]): + """Computes the spatial output shape from the input shape.""" + # When buffer padding, use 'valid' padding across time. The output shape + # across time should be the input shape minus any padding, assuming + # the stride across time is 1. + if self._use_buffered_input and spatial_output_shape[0] is not None: + padding = self._compute_buffered_causal_padding( + tf.zeros([1] + spatial_output_shape + [1]), use_buffered_input=False) + spatial_output_shape[0] -= sum(padding[1]) + return spatial_output_shape + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class Conv2D(tf.keras.layers.Conv2D, CausalConvMixin): + """Conv2D layer supporting CausalConv. + + Supports `padding='causal'` option (like in `tf.keras.layers.Conv1D`), + which applies causal padding to the temporal dimension, and same padding in + the spatial dimensions. + """ + + def __init__(self, *args, use_buffered_input=False, **kwargs): + """Initializes conv2d. + + Args: + *args: Arguments to be passed. + use_buffered_input: A `bool`. If True, the input is expected to be padded + beforehand. In effect, calling this layer will use 'valid' padding on + the temporal dimension to simulate 'causal' padding. + **kwargs: Additional keyword arguments to be passed. + + Returns: + An output `tf.Tensor` of the Conv2D operation. 
+ """ + super(Conv2D, self).__init__(*args, **kwargs) + self._use_buffered_input = use_buffered_input + + def get_config(self): + """Returns a dictionary containing the config used for initialization.""" + config = { + 'use_buffered_input': self._use_buffered_input, + } + base_config = super(Conv2D, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def _compute_causal_padding(self, inputs): + """Computes causal padding dimensions for the given inputs.""" + return self._compute_buffered_causal_padding( + inputs, use_buffered_input=self._use_buffered_input) + + def _validate_init(self): + """Validates the Conv layer initial configuration.""" + self._causal_validate_init() + + def _spatial_output_shape(self, spatial_input_shape: List[int]): + """Computes the spatial output shape from the input shape.""" + shape = super(Conv2D, self)._spatial_output_shape(spatial_input_shape) + return self._buffered_spatial_output_shape(shape) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class DepthwiseConv2D(tf.keras.layers.DepthwiseConv2D, CausalConvMixin): + """DepthwiseConv2D layer supporting CausalConv. + + Supports `padding='causal'` option (like in `tf.keras.layers.Conv1D`), + which applies causal padding to the temporal dimension, and same padding in + the spatial dimensions. + """ + + def __init__(self, *args, use_buffered_input=False, **kwargs): + """Initializes depthwise conv2d. + + Args: + *args: Arguments to be passed. + use_buffered_input: A `bool`. If True, the input is expected to be padded + beforehand. In effect, calling this layer will use 'valid' padding on + the temporal dimension to simulate 'causal' padding. + **kwargs: Additional keyword arguments to be passed. + + Returns: + An output `tf.Tensor` of the DepthwiseConv2D operation. 
+ """ + super(DepthwiseConv2D, self).__init__(*args, **kwargs) + self._use_buffered_input = use_buffered_input + + # Causal padding is unsupported by default for DepthwiseConv2D, + # so we resort to valid padding internally. However, we handle + # causal padding as a special case with `self._is_causal`, which is + # defined by the super class. + if self.padding == 'causal': + self.padding = 'valid' + + def get_config(self): + """Returns a dictionary containing the config used for initialization.""" + config = { + 'use_buffered_input': self._use_buffered_input, + } + base_config = super(DepthwiseConv2D, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs): + """Calls the layer with the given inputs.""" + if self._is_causal: + inputs = tf.pad(inputs, self._compute_causal_padding(inputs)) + return super(DepthwiseConv2D, self).call(inputs) + + def _compute_causal_padding(self, inputs): + """Computes causal padding dimensions for the given inputs.""" + return self._compute_buffered_causal_padding( + inputs, use_buffered_input=self._use_buffered_input) + + def _validate_init(self): + """Validates the Conv layer initial configuration.""" + self._causal_validate_init() + + def _spatial_output_shape(self, spatial_input_shape: List[int]): + """Computes the spatial output shape from the input shape.""" + shape = super(DepthwiseConv2D, self)._spatial_output_shape( + spatial_input_shape) + return self._buffered_spatial_output_shape(shape) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class Conv3D(tf.keras.layers.Conv3D, CausalConvMixin): + """Conv3D layer supporting CausalConv. + + Supports `padding='causal'` option (like in `tf.keras.layers.Conv1D`), + which applies causal padding to the temporal dimension, and same padding in + the spatial dimensions. + """ + + def __init__(self, *args, use_buffered_input=False, **kwargs): + """Initializes conv3d. + + Args: + *args: Arguments to be passed. 
+ use_buffered_input: A `bool`. If True, the input is expected to be padded + beforehand. In effect, calling this layer will use 'valid' padding on + the temporal dimension to simulate 'causal' padding. + **kwargs: Additional keyword arguments to be passed. + + Returns: + An output `tf.Tensor` of the Conv3D operation. + """ + super(Conv3D, self).__init__(*args, **kwargs) + self._use_buffered_input = use_buffered_input + + def get_config(self): + """Returns a dictionary containing the config used for initialization.""" + config = { + 'use_buffered_input': self._use_buffered_input, + } + base_config = super(Conv3D, self).get_config() + return dict(list(base_config.items()) + list(config.items())) + + def call(self, inputs): + """Call the layer with the given inputs.""" + # Note: tf.nn.conv3d with depthwise kernels on CPU is currently only + # supported when compiling with TF graph (XLA) using tf.function, so it + # is compiled by default here (b/186463870). + conv_fn = tf.function(super(Conv3D, self).call, jit_compile=True) + return conv_fn(inputs) + + def _compute_causal_padding(self, inputs): + """Computes causal padding dimensions for the given inputs.""" + return self._compute_buffered_causal_padding( + inputs, use_buffered_input=self._use_buffered_input) + + def _validate_init(self): + """Validates the Conv layer initial configuration.""" + self._causal_validate_init() + + def _spatial_output_shape(self, spatial_input_shape: List[int]): + """Computes the spatial output shape from the input shape.""" + shape = super(Conv3D, self)._spatial_output_shape(spatial_input_shape) + return self._buffered_spatial_output_shape(shape) + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class SpatialPyramidPooling(tf.keras.layers.Layer): + """Implements the Atrous Spatial Pyramid Pooling. 
+ + References: + [Rethinking Atrous Convolution for Semantic Image Segmentation]( + https://arxiv.org/pdf/1706.05587.pdf) + [Encoder-Decoder with Atrous Separable Convolution for Semantic Image + Segmentation](https://arxiv.org/pdf/1802.02611.pdf) + """ + + def __init__( + self, + output_channels: int, + dilation_rates: List[int], + pool_kernel_size: Optional[List[int]] = None, + use_sync_bn: bool = False, + batchnorm_momentum: float = 0.99, + batchnorm_epsilon: float = 0.001, + activation: str = 'relu', + dropout: float = 0.5, + kernel_initializer: str = 'GlorotUniform', + kernel_regularizer: Optional[tf.keras.regularizers.Regularizer] = None, + interpolation: str = 'bilinear', + use_depthwise_convolution: bool = False, + **kwargs): + """Initializes `SpatialPyramidPooling`. + + Args: + output_channels: Number of channels produced by SpatialPyramidPooling. + dilation_rates: A list of integers for parallel dilated convolutions. + pool_kernel_size: A list of integers or None. If None, global average + pooling is applied, otherwise an average pooling of pool_kernel_size is + applied. + use_sync_bn: A bool, whether or not to use sync batch normalization. + batchnorm_momentum: A float for the momentum in BatchNorm. Defaults to + 0.99. + batchnorm_epsilon: A float for the epsilon value in BatchNorm. Defaults to + 0.001. + activation: A `str` for type of activation to be used. Defaults to 'relu'. + dropout: A float for the dropout rate before output. Defaults to 0.5. + kernel_initializer: Kernel initializer for conv layers. Defaults to + `glorot_uniform`. + kernel_regularizer: Kernel regularizer for conv layers. Defaults to None. + interpolation: The interpolation method for upsampling. Defaults to + `bilinear`. + use_depthwise_convolution: Allows spatial pooling to use separable + depthwise convolutions. [Encoder-Decoder with Atrous Separable + Convolution for Semantic Image Segmentation]( + https://arxiv.org/pdf/1802.02611.pdf) + **kwargs: Other keyword arguments for the layer.
+ """ + super().__init__(**kwargs) + + self._output_channels = output_channels + self._dilation_rates = dilation_rates + self._use_sync_bn = use_sync_bn + self._batchnorm_momentum = batchnorm_momentum + self._batchnorm_epsilon = batchnorm_epsilon + self._activation = activation + self._dropout = dropout + self._kernel_initializer = kernel_initializer + self._kernel_regularizer = kernel_regularizer + self._interpolation = interpolation + self._pool_kernel_size = pool_kernel_size + self._use_depthwise_convolution = use_depthwise_convolution + self._activation_fn = tf_utils.get_activation(activation) + if self._use_sync_bn: + self._bn_op = tf.keras.layers.experimental.SyncBatchNormalization + else: + self._bn_op = tf.keras.layers.BatchNormalization + + if tf.keras.backend.image_data_format() == 'channels_last': + self._bn_axis = -1 + else: + self._bn_axis = 1 + + def build(self, input_shape): + height = input_shape[1] + width = input_shape[2] + channels = input_shape[3] + + self.aspp_layers = [] + + conv1 = tf.keras.layers.Conv2D( + filters=self._output_channels, + kernel_size=(1, 1), + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + use_bias=False) + norm1 = self._bn_op( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + + self.aspp_layers.append([conv1, norm1]) + + for dilation_rate in self._dilation_rates: + leading_layers = [] + kernel_size = (3, 3) + if self._use_depthwise_convolution: + leading_layers += [ + tf.keras.layers.DepthwiseConv2D( + depth_multiplier=1, + kernel_size=kernel_size, + padding='same', + depthwise_regularizer=self._kernel_regularizer, + depthwise_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + dilation_rate=dilation_rate, + use_bias=False) + ] + kernel_size = (1, 1) + conv_dilation = leading_layers + [ + tf.keras.layers.Conv2D( + filters=self._output_channels, + kernel_size=kernel_size, + 
padding='same', + kernel_regularizer=self._kernel_regularizer, + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + dilation_rate=dilation_rate, + use_bias=False) + ] + norm_dilation = self._bn_op( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + + self.aspp_layers.append(conv_dilation + [norm_dilation]) + + if self._pool_kernel_size is None: + pooling = [ + tf.keras.layers.GlobalAveragePooling2D(), + tf.keras.layers.Reshape((1, 1, channels)) + ] + else: + pooling = [tf.keras.layers.AveragePooling2D(self._pool_kernel_size)] + + conv2 = tf.keras.layers.Conv2D( + filters=self._output_channels, + kernel_size=(1, 1), + kernel_initializer=tf_utils.clone_initializer(self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + use_bias=False) + norm2 = self._bn_op( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + + self.aspp_layers.append(pooling + [conv2, norm2]) + + self._resizing_layer = tf.keras.layers.Resizing( + height, width, interpolation=self._interpolation, dtype=tf.float32) + + self._projection = [ + tf.keras.layers.Conv2D( + filters=self._output_channels, + kernel_size=(1, 1), + kernel_initializer=tf_utils.clone_initializer( + self._kernel_initializer), + kernel_regularizer=self._kernel_regularizer, + use_bias=False), + self._bn_op( + axis=self._bn_axis, + momentum=self._batchnorm_momentum, + epsilon=self._batchnorm_epsilon) + ] + self._dropout_layer = tf.keras.layers.Dropout(rate=self._dropout) + self._concat_layer = tf.keras.layers.Concatenate(axis=-1) + + def call(self, + inputs: tf.Tensor, + training: Optional[bool] = None) -> tf.Tensor: + if training is None: + training = tf.keras.backend.learning_phase() + result = [] + for i, layers in enumerate(self.aspp_layers): + x = inputs + for layer in layers: + # Apply layers sequentially. 
+ x = layer(x, training=training) + x = self._activation_fn(x) + + # Apply resize layer to the end of the last set of layers. + if i == len(self.aspp_layers) - 1: + x = self._resizing_layer(x) + + result.append(tf.cast(x, inputs.dtype)) + x = self._concat_layer(result) + for layer in self._projection: + x = layer(x, training=training) + x = self._activation_fn(x) + return self._dropout_layer(x) + + def get_config(self): + config = { + 'output_channels': self._output_channels, + 'dilation_rates': self._dilation_rates, + 'pool_kernel_size': self._pool_kernel_size, + 'use_sync_bn': self._use_sync_bn, + 'batchnorm_momentum': self._batchnorm_momentum, + 'batchnorm_epsilon': self._batchnorm_epsilon, + 'activation': self._activation, + 'dropout': self._dropout, + 'kernel_initializer': self._kernel_initializer, + 'kernel_regularizer': self._kernel_regularizer, + 'interpolation': self._interpolation, + 'use_depthwise_convolution': self._use_depthwise_convolution, + } + base_config = super().get_config() + return dict(list(base_config.items()) + list(config.items())) diff --git a/official/vision/modeling/layers/nn_layers_test.py b/official/vision/modeling/layers/nn_layers_test.py new file mode 100644 index 0000000000000000000000000000000000000000..e5675f610ffc9d4a936fb54fccedf63726da1436 --- /dev/null +++ b/official/vision/modeling/layers/nn_layers_test.py @@ -0,0 +1,418 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+ +"""Tests for nn_layers.""" + +# Import libraries +from absl.testing import parameterized +import tensorflow as tf + +from official.vision.modeling.layers import nn_layers + + +class NNLayersTest(parameterized.TestCase, tf.test.TestCase): + + def test_scale(self): + scale = nn_layers.Scale(initializer=tf.keras.initializers.constant(10.)) + output = scale(3.) + self.assertAllEqual(output, 30.) + + def test_temporal_softmax_pool(self): + inputs = tf.range(4, dtype=tf.float32) + 1. + inputs = tf.reshape(inputs, [1, 4, 1, 1, 1]) + layer = nn_layers.TemporalSoftmaxPool() + output = layer(inputs) + self.assertAllClose( + output, + [[[[[0.10153633]]], + [[[0.33481020]]], + [[[0.82801306]]], + [[[1.82021690]]]]]) + + def test_positional_encoding(self): + pos_encoding = nn_layers.PositionalEncoding( + initializer='ones', cache_encoding=False) + pos_encoding_cached = nn_layers.PositionalEncoding( + initializer='ones', cache_encoding=True) + + inputs = tf.ones([1, 4, 1, 1, 3]) + outputs, _ = pos_encoding(inputs) + outputs_cached, _ = pos_encoding_cached(inputs) + + expected = tf.constant( + [[[[[1.0000000, 1.0000000, 2.0000000]]], + [[[1.8414710, 1.0021545, 1.5403023]]], + [[[1.9092975, 1.0043088, 0.5838531]]], + [[[1.1411200, 1.0064633, 0.0100075]]]]]) + + self.assertEqual(outputs.shape, expected.shape) + self.assertAllClose(outputs, expected) + + self.assertEqual(outputs.shape, outputs_cached.shape) + self.assertAllClose(outputs, outputs_cached) + + inputs = tf.ones([1, 5, 1, 1, 3]) + _ = pos_encoding(inputs) + + def test_positional_encoding_bfloat16(self): + pos_encoding = nn_layers.PositionalEncoding(initializer='ones') + + inputs = tf.ones([1, 4, 1, 1, 3], dtype=tf.bfloat16) + outputs, _ = pos_encoding(inputs) + + expected = tf.constant( + [[[[[1.0000000, 1.0000000, 2.0000000]]], + [[[1.8414710, 1.0021545, 1.5403023]]], + [[[1.9092975, 1.0043088, 0.5838531]]], + [[[1.1411200, 1.0064633, 0.0100075]]]]]) + + self.assertEqual(outputs.shape, expected.shape) + 
self.assertAllClose(outputs, expected) + + def test_global_average_pool_basic(self): + pool = nn_layers.GlobalAveragePool3D(keepdims=True) + + inputs = tf.ones([1, 2, 3, 4, 1]) + outputs = pool(inputs, output_states=False) + + expected = tf.ones([1, 1, 1, 1, 1]) + + self.assertEqual(outputs.shape, expected.shape) + self.assertAllEqual(outputs, expected) + + def test_positional_encoding_stream(self): + pos_encoding = nn_layers.PositionalEncoding( + initializer='ones', cache_encoding=False) + + inputs = tf.range(4, dtype=tf.float32) + 1. + inputs = tf.reshape(inputs, [1, 4, 1, 1, 1]) + inputs = tf.tile(inputs, [1, 1, 1, 1, 3]) + expected, _ = pos_encoding(inputs) + + for num_splits in [1, 2, 4]: + frames = tf.split(inputs, num_splits, axis=1) + states = {} + predicted = [] + for frame in frames: + output, states = pos_encoding(frame, states=states) + predicted.append(output) + predicted = tf.concat(predicted, axis=1) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + self.assertAllClose(predicted, [[[[[1.0000000, 1.0000000, 2.0000000]]], + [[[2.8414710, 2.0021544, 2.5403023]]], + [[[3.9092975, 3.0043090, 2.5838532]]], + [[[4.1411200, 4.0064630, 3.0100074]]]]]) + + def test_global_average_pool_keras(self): + pool = nn_layers.GlobalAveragePool3D(keepdims=False) + keras_pool = tf.keras.layers.GlobalAveragePooling3D() + + inputs = 10 * tf.random.normal([1, 2, 3, 4, 1]) + + outputs = pool(inputs, output_states=False) + keras_output = keras_pool(inputs) + + self.assertAllEqual(outputs.shape, keras_output.shape) + self.assertAllClose(outputs, keras_output) + + def test_stream_global_average_pool(self): + gap = nn_layers.GlobalAveragePool3D(keepdims=True, causal=False) + + inputs = tf.range(4, dtype=tf.float32) + 1. 
+ inputs = tf.reshape(inputs, [1, 4, 1, 1, 1]) + inputs = tf.tile(inputs, [1, 1, 2, 2, 3]) + expected, _ = gap(inputs, output_states=True) + + for num_splits in [1, 2, 4]: + frames = tf.split(inputs, num_splits, axis=1) + states = {} + predicted = None + for frame in frames: + predicted, states = gap(frame, states=states, output_states=True) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + self.assertAllClose( + predicted, + [[[[[2.5, 2.5, 2.5]]]]]) + + def test_causal_stream_global_average_pool(self): + gap = nn_layers.GlobalAveragePool3D(keepdims=True, causal=True) + + inputs = tf.range(4, dtype=tf.float32) + 1. + inputs = tf.reshape(inputs, [1, 4, 1, 1, 1]) + inputs = tf.tile(inputs, [1, 1, 2, 2, 3]) + expected, _ = gap(inputs, output_states=True) + + for num_splits in [1, 2, 4]: + frames = tf.split(inputs, num_splits, axis=1) + states = {} + predicted = [] + for frame in frames: + x, states = gap(frame, states=states, output_states=True) + predicted.append(x) + predicted = tf.concat(predicted, axis=1) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + self.assertAllClose( + predicted, + [[[[[1.0, 1.0, 1.0]]], + [[[1.5, 1.5, 1.5]]], + [[[2.0, 2.0, 2.0]]], + [[[2.5, 2.5, 2.5]]]]]) + + def test_spatial_average_pool(self): + pool = nn_layers.SpatialAveragePool3D(keepdims=True) + + inputs = tf.range(64, dtype=tf.float32) + 1. 
+ inputs = tf.reshape(inputs, [1, 4, 4, 4, 1]) + + output = pool(inputs) + + self.assertEqual(output.shape, [1, 4, 1, 1, 1]) + self.assertAllClose( + output, + [[[[[8.50]]], + [[[24.5]]], + [[[40.5]]], + [[[56.5]]]]]) + + def test_conv2d_causal(self): + conv2d = nn_layers.Conv2D( + filters=3, + kernel_size=(3, 3), + strides=(1, 2), + padding='causal', + use_buffered_input=True, + kernel_initializer='ones', + use_bias=False, + ) + + inputs = tf.ones([1, 4, 2, 3]) + + paddings = [[0, 0], [2, 0], [0, 0], [0, 0]] + padded_inputs = tf.pad(inputs, paddings) + predicted = conv2d(padded_inputs) + + expected = tf.constant( + [[[[6.0, 6.0, 6.0]], + [[12., 12., 12.]], + [[18., 18., 18.]], + [[18., 18., 18.]]]]) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + + conv2d.use_buffered_input = False + predicted = conv2d(inputs) + + self.assertFalse(conv2d.use_buffered_input) + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + + def test_depthwise_conv2d_causal(self): + conv2d = nn_layers.DepthwiseConv2D( + kernel_size=(3, 3), + strides=(1, 1), + padding='causal', + use_buffered_input=True, + depthwise_initializer='ones', + use_bias=False, + ) + + inputs = tf.ones([1, 2, 2, 3]) + + paddings = [[0, 0], [2, 0], [0, 0], [0, 0]] + padded_inputs = tf.pad(inputs, paddings) + predicted = conv2d(padded_inputs) + + expected = tf.constant( + [[[[2., 2., 2.], + [2., 2., 2.]], + [[4., 4., 4.], + [4., 4., 4.]]]]) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + + conv2d.use_buffered_input = False + predicted = conv2d(inputs) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + + def test_conv3d_causal(self): + conv3d = nn_layers.Conv3D( + filters=3, + kernel_size=(3, 3, 3), + strides=(1, 2, 2), + padding='causal', + use_buffered_input=True, + kernel_initializer='ones', + use_bias=False, + ) + + inputs = 
tf.ones([1, 2, 4, 4, 3]) + + paddings = [[0, 0], [2, 0], [0, 0], [0, 0], [0, 0]] + padded_inputs = tf.pad(inputs, paddings) + predicted = conv3d(padded_inputs) + + expected = tf.constant( + [[[[[27., 27., 27.], + [18., 18., 18.]], + [[18., 18., 18.], + [12., 12., 12.]]], + [[[54., 54., 54.], + [36., 36., 36.]], + [[36., 36., 36.], + [24., 24., 24.]]]]]) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + + conv3d.use_buffered_input = False + predicted = conv3d(inputs) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + + def test_depthwise_conv3d_causal(self): + conv3d = nn_layers.Conv3D( + filters=3, + kernel_size=(3, 3, 3), + strides=(1, 2, 2), + padding='causal', + use_buffered_input=True, + kernel_initializer='ones', + use_bias=False, + groups=3, + ) + + inputs = tf.ones([1, 2, 4, 4, 3]) + + paddings = [[0, 0], [2, 0], [0, 0], [0, 0], [0, 0]] + padded_inputs = tf.pad(inputs, paddings) + predicted = conv3d(padded_inputs) + + expected = tf.constant( + [[[[[9.0, 9.0, 9.0], + [6.0, 6.0, 6.0]], + [[6.0, 6.0, 6.0], + [4.0, 4.0, 4.0]]], + [[[18.0, 18.0, 18.0], + [12., 12., 12.]], + [[12., 12., 12.], + [8., 8., 8.]]]]]) + + output_shape = conv3d._spatial_output_shape([4, 4, 4]) + self.assertAllClose(output_shape, [2, 2, 2]) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + + conv3d.use_buffered_input = False + predicted = conv3d(inputs) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + + def test_conv3d_causal_padding_2d(self): + """Test to ensure causal padding works like standard padding.""" + conv3d = nn_layers.Conv3D( + filters=1, + kernel_size=(1, 3, 3), + strides=(1, 2, 2), + padding='causal', + use_buffered_input=False, + kernel_initializer='ones', + use_bias=False, + ) + + keras_conv3d = tf.keras.layers.Conv3D( + filters=1, + kernel_size=(1, 3, 3), + strides=(1, 2, 2), + 
padding='same', + kernel_initializer='ones', + use_bias=False, + ) + + inputs = tf.ones([1, 1, 4, 4, 1]) + + predicted = conv3d(inputs) + expected = keras_conv3d(inputs) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + + self.assertAllClose(predicted, + [[[[[9.], + [6.]], + [[6.], + [4.]]]]]) + + def test_conv3d_causal_padding_1d(self): + """Test to ensure causal padding works like standard padding.""" + conv3d = nn_layers.Conv3D( + filters=1, + kernel_size=(3, 1, 1), + strides=(2, 1, 1), + padding='causal', + use_buffered_input=False, + kernel_initializer='ones', + use_bias=False, + ) + + keras_conv1d = tf.keras.layers.Conv1D( + filters=1, + kernel_size=3, + strides=2, + padding='causal', + kernel_initializer='ones', + use_bias=False, + ) + + inputs = tf.ones([1, 4, 1, 1, 1]) + + predicted = conv3d(inputs) + expected = keras_conv1d(tf.squeeze(inputs, axis=[2, 3])) + expected = tf.reshape(expected, [1, 2, 1, 1, 1]) + + self.assertEqual(predicted.shape, expected.shape) + self.assertAllClose(predicted, expected) + + self.assertAllClose(predicted, + [[[[[1.]]], + [[[3.]]]]]) + + @parameterized.parameters( + (None, []), + (None, [6, 12, 18]), + ([32, 32], [6, 12, 18]), + ) + def test_aspp(self, pool_kernel_size, dilation_rates): + inputs = tf.keras.Input(shape=(64, 64, 128), dtype=tf.float32) + layer = nn_layers.SpatialPyramidPooling( + output_channels=256, + dilation_rates=dilation_rates, + pool_kernel_size=pool_kernel_size) + output = layer(inputs) + self.assertAllEqual([None, 64, 64, 256], output.shape) + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/beta/modeling/layers/roi_aligner.py b/official/vision/modeling/layers/roi_aligner.py similarity index 95% rename from official/vision/beta/modeling/layers/roi_aligner.py rename to official/vision/modeling/layers/roi_aligner.py index 6f9f55b604ee45a47f43c00c887021f6c0fe932d..93187a9a850a3d62653bc455a3cd1b9729c29417 100644 --- 
a/official/vision/beta/modeling/layers/roi_aligner.py +++ b/official/vision/modeling/layers/roi_aligner.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,7 +17,7 @@ from typing import Mapping import tensorflow as tf -from official.vision.beta.ops import spatial_transform_ops +from official.vision.ops import spatial_transform_ops @tf.keras.utils.register_keras_serializable(package='Vision') diff --git a/official/vision/beta/modeling/layers/roi_aligner_test.py b/official/vision/modeling/layers/roi_aligner_test.py similarity index 90% rename from official/vision/beta/modeling/layers/roi_aligner_test.py rename to official/vision/modeling/layers/roi_aligner_test.py index ce6b124fed4001b2aee82b7b01372caef1b540be..464f8cf9d50524f848dc267145192b41b6876bfc 100644 --- a/official/vision/beta/modeling/layers/roi_aligner_test.py +++ b/official/vision/modeling/layers/roi_aligner_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -17,7 +17,7 @@ # Import libraries import tensorflow as tf -from official.vision.beta.modeling.layers import roi_aligner +from official.vision.modeling.layers import roi_aligner class MultilevelROIAlignerTest(tf.test.TestCase): diff --git a/official/vision/beta/modeling/layers/roi_generator.py b/official/vision/modeling/layers/roi_generator.py similarity index 98% rename from official/vision/beta/modeling/layers/roi_generator.py rename to official/vision/modeling/layers/roi_generator.py index b569a3ba3b92db936498fc8422c9b15ca914958f..3f00bbb648c7b9ff997912f964139e3d512c8ff9 100644 --- a/official/vision/beta/modeling/layers/roi_generator.py +++ b/official/vision/modeling/layers/roi_generator.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,8 +17,8 @@ from typing import Optional, Mapping # Import libraries import tensorflow as tf -from official.vision.beta.ops import box_ops -from official.vision.beta.ops import nms +from official.vision.ops import box_ops +from official.vision.ops import nms def _multilevel_propose_rois(raw_boxes: Mapping[str, tf.Tensor], diff --git a/official/vision/beta/modeling/layers/roi_sampler.py b/official/vision/modeling/layers/roi_sampler.py similarity index 96% rename from official/vision/beta/modeling/layers/roi_sampler.py rename to official/vision/modeling/layers/roi_sampler.py index 3625d294443bfe392b5bc48e229aceb415291be3..ba35c274c695516f51f0b2a55d90fcac2c541023 100644 --- a/official/vision/beta/modeling/layers/roi_sampler.py +++ b/official/vision/modeling/layers/roi_sampler.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,10 +17,10 @@ import tensorflow as tf -from official.vision.beta.modeling.layers import box_sampler -from official.vision.beta.ops import box_matcher -from official.vision.beta.ops import iou_similarity -from official.vision.beta.ops import target_gather +from official.vision.modeling.layers import box_sampler +from official.vision.ops import box_matcher +from official.vision.ops import iou_similarity +from official.vision.ops import target_gather @tf.keras.utils.register_keras_serializable(package='Vision') diff --git a/official/vision/modeling/maskrcnn_model.py b/official/vision/modeling/maskrcnn_model.py new file mode 100644 index 0000000000000000000000000000000000000000..8dd7636ddd411c4514d4d2f4e653e316e1d4be79 --- /dev/null +++ b/official/vision/modeling/maskrcnn_model.py @@ -0,0 +1,429 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""R-CNN(-RS) models.""" + +from typing import Any, List, Mapping, Optional, Tuple, Union + +import tensorflow as tf + +from official.vision.ops import anchor +from official.vision.ops import box_ops + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class MaskRCNNModel(tf.keras.Model): + """The Mask R-CNN(-RS) and Cascade RCNN-RS models.""" + + def __init__(self, + backbone: tf.keras.Model, + decoder: tf.keras.Model, + rpn_head: tf.keras.layers.Layer, + detection_head: Union[tf.keras.layers.Layer, + List[tf.keras.layers.Layer]], + roi_generator: tf.keras.layers.Layer, + roi_sampler: Union[tf.keras.layers.Layer, + List[tf.keras.layers.Layer]], + roi_aligner: tf.keras.layers.Layer, + detection_generator: tf.keras.layers.Layer, + mask_head: Optional[tf.keras.layers.Layer] = None, + mask_sampler: Optional[tf.keras.layers.Layer] = None, + mask_roi_aligner: Optional[tf.keras.layers.Layer] = None, + class_agnostic_bbox_pred: bool = False, + cascade_class_ensemble: bool = False, + min_level: Optional[int] = None, + max_level: Optional[int] = None, + num_scales: Optional[int] = None, + aspect_ratios: Optional[List[float]] = None, + anchor_size: Optional[float] = None, + **kwargs): + """Initializes the R-CNN(-RS) model. + + Args: + backbone: `tf.keras.Model`, the backbone network. + decoder: `tf.keras.Model`, the decoder network. + rpn_head: the RPN head. + detection_head: the detection head or a list of heads. + roi_generator: the ROI generator. + roi_sampler: a single ROI sampler or a list of ROI samplers for cascade + detection heads. + roi_aligner: the ROI aligner. + detection_generator: the detection generator. + mask_head: the mask head. + mask_sampler: the mask sampler. + mask_roi_aligner: the ROI aligner for mask prediction. + class_agnostic_bbox_pred: if True, perform class agnostic bounding box + prediction. Needs to be `True` for Cascade RCNN models. + cascade_class_ensemble: if True, ensemble classification scores over all + detection heads.
+ min_level: Minimum level in output feature maps. + max_level: Maximum level in output feature maps. + num_scales: A number representing intermediate scales added on each level. + For instance, num_scales=2 adds one additional intermediate anchor + scale, giving scales [2^0, 2^0.5] on each level. + aspect_ratios: A list representing the aspect ratio anchors added on each + level. The number indicates the ratio of width to height. For instance, + aspect_ratios=[1.0, 2.0, 0.5] adds three anchors on each scale level. + anchor_size: A number representing the size of the base anchor as a + multiple of the feature stride 2^level. + **kwargs: keyword arguments to be passed to `tf.keras.Model`. + """ + super(MaskRCNNModel, self).__init__(**kwargs) + self._config_dict = { + 'backbone': backbone, + 'decoder': decoder, + 'rpn_head': rpn_head, + 'detection_head': detection_head, + 'roi_generator': roi_generator, + 'roi_sampler': roi_sampler, + 'roi_aligner': roi_aligner, + 'detection_generator': detection_generator, + 'mask_head': mask_head, + 'mask_sampler': mask_sampler, + 'mask_roi_aligner': mask_roi_aligner, + 'class_agnostic_bbox_pred': class_agnostic_bbox_pred, + 'cascade_class_ensemble': cascade_class_ensemble, + 'min_level': min_level, + 'max_level': max_level, + 'num_scales': num_scales, + 'aspect_ratios': aspect_ratios, + 'anchor_size': anchor_size, + } + self.backbone = backbone + self.decoder = decoder + self.rpn_head = rpn_head + if not isinstance(detection_head, (list, tuple)): + self.detection_head = [detection_head] + else: + self.detection_head = detection_head + self.roi_generator = roi_generator + if not isinstance(roi_sampler, (list, tuple)): + self.roi_sampler = [roi_sampler] + else: + self.roi_sampler = roi_sampler + if len(self.roi_sampler) > 1 and not class_agnostic_bbox_pred: + raise ValueError( + '`class_agnostic_bbox_pred` needs to be True if multiple detection heads are specified.'
+ ) + self.roi_aligner = roi_aligner + self.detection_generator = detection_generator + self._include_mask = mask_head is not None + self.mask_head = mask_head + if self._include_mask and mask_sampler is None: + raise ValueError('`mask_sampler` is not provided in Mask R-CNN.') + self.mask_sampler = mask_sampler + if self._include_mask and mask_roi_aligner is None: + raise ValueError('`mask_roi_aligner` is not provided in Mask R-CNN.') + self.mask_roi_aligner = mask_roi_aligner + # Weights for the regression losses for each FRCNN layer. + # TODO(xianzhi): Make the weights configurable. + self._cascade_layer_to_weights = [ + [10.0, 10.0, 5.0, 5.0], + [20.0, 20.0, 10.0, 10.0], + [30.0, 30.0, 15.0, 15.0], + ] + + def call(self, + images: tf.Tensor, + image_shape: tf.Tensor, + anchor_boxes: Optional[Mapping[str, tf.Tensor]] = None, + gt_boxes: Optional[tf.Tensor] = None, + gt_classes: Optional[tf.Tensor] = None, + gt_masks: Optional[tf.Tensor] = None, + training: Optional[bool] = None) -> Mapping[str, tf.Tensor]: + + model_outputs, intermediate_outputs = self._call_box_outputs( + images=images, image_shape=image_shape, anchor_boxes=anchor_boxes, + gt_boxes=gt_boxes, gt_classes=gt_classes, training=training) + if not self._include_mask: + return model_outputs + + model_mask_outputs = self._call_mask_outputs( + model_box_outputs=model_outputs, + features=model_outputs['decoder_features'], + current_rois=intermediate_outputs['current_rois'], + matched_gt_indices=intermediate_outputs['matched_gt_indices'], + matched_gt_boxes=intermediate_outputs['matched_gt_boxes'], + matched_gt_classes=intermediate_outputs['matched_gt_classes'], + gt_masks=gt_masks, + training=training) + model_outputs.update(model_mask_outputs) # pytype: disable=attribute-error # dynamic-method-lookup + return model_outputs + + def _get_backbone_and_decoder_features(self, images): + + backbone_features = self.backbone(images) + if self.decoder: + features = self.decoder(backbone_features) + else: + 
features = backbone_features + return backbone_features, features + + def _call_box_outputs( + self, images: tf.Tensor, + image_shape: tf.Tensor, + anchor_boxes: Optional[Mapping[str, tf.Tensor]] = None, + gt_boxes: Optional[tf.Tensor] = None, + gt_classes: Optional[tf.Tensor] = None, + training: Optional[bool] = None) -> Tuple[ + Mapping[str, tf.Tensor], Mapping[str, tf.Tensor]]: + """Implementation of the Faster-RCNN logic for boxes.""" + model_outputs = {} + + # Feature extraction. + (backbone_features, + decoder_features) = self._get_backbone_and_decoder_features(images) + + # Region proposal network. + rpn_scores, rpn_boxes = self.rpn_head(decoder_features) + + model_outputs.update({ + 'backbone_features': backbone_features, + 'decoder_features': decoder_features, + 'rpn_boxes': rpn_boxes, + 'rpn_scores': rpn_scores + }) + + # Generate anchor boxes for this batch if not provided. + if anchor_boxes is None: + _, image_height, image_width, _ = images.get_shape().as_list() + anchor_boxes = anchor.Anchor( + min_level=self._config_dict['min_level'], + max_level=self._config_dict['max_level'], + num_scales=self._config_dict['num_scales'], + aspect_ratios=self._config_dict['aspect_ratios'], + anchor_size=self._config_dict['anchor_size'], + image_size=(image_height, image_width)).multilevel_boxes + for l in anchor_boxes: + anchor_boxes[l] = tf.tile( + tf.expand_dims(anchor_boxes[l], axis=0), + [tf.shape(images)[0], 1, 1, 1]) + + # Generate RoIs. + current_rois, _ = self.roi_generator(rpn_boxes, rpn_scores, anchor_boxes, + image_shape, training) + + next_rois = current_rois + all_class_outputs = [] + for cascade_num in range(len(self.roi_sampler)): + # In cascade RCNN we want the higher layers to have different regression + # weights as the predicted deltas become smaller and smaller. 
+ regression_weights = self._cascade_layer_to_weights[cascade_num] + current_rois = next_rois + + (class_outputs, box_outputs, model_outputs, matched_gt_boxes, + matched_gt_classes, matched_gt_indices, + current_rois) = self._run_frcnn_head( + features=decoder_features, + rois=current_rois, + gt_boxes=gt_boxes, + gt_classes=gt_classes, + training=training, + model_outputs=model_outputs, + cascade_num=cascade_num, + regression_weights=regression_weights) + all_class_outputs.append(class_outputs) + + # Generate ROIs for the next cascade head if there is any. + if cascade_num < len(self.roi_sampler) - 1: + next_rois = box_ops.decode_boxes( + tf.cast(box_outputs, tf.float32), + current_rois, + weights=regression_weights) + next_rois = box_ops.clip_boxes(next_rois, + tf.expand_dims(image_shape, axis=1)) + + if not training: + if self._config_dict['cascade_class_ensemble']: + class_outputs = tf.add_n(all_class_outputs) / len(all_class_outputs) + + detections = self.detection_generator( + box_outputs, + class_outputs, + current_rois, + image_shape, + regression_weights, + bbox_per_class=(not self._config_dict['class_agnostic_bbox_pred'])) + model_outputs.update({ + 'cls_outputs': class_outputs, + 'box_outputs': box_outputs, + }) + if self.detection_generator.get_config()['apply_nms']: + model_outputs.update({ + 'detection_boxes': detections['detection_boxes'], + 'detection_scores': detections['detection_scores'], + 'detection_classes': detections['detection_classes'], + 'num_detections': detections['num_detections'] + }) + else: + model_outputs.update({ + 'decoded_boxes': detections['decoded_boxes'], + 'decoded_box_scores': detections['decoded_box_scores'] + }) + + intermediate_outputs = { + 'matched_gt_boxes': matched_gt_boxes, + 'matched_gt_indices': matched_gt_indices, + 'matched_gt_classes': matched_gt_classes, + 'current_rois': current_rois, + } + return (model_outputs, intermediate_outputs) + + def _call_mask_outputs( + self, + model_box_outputs: Mapping[str, 
tf.Tensor], + features: tf.Tensor, + current_rois: tf.Tensor, + matched_gt_indices: tf.Tensor, + matched_gt_boxes: tf.Tensor, + matched_gt_classes: tf.Tensor, + gt_masks: tf.Tensor, + training: Optional[bool] = None) -> Mapping[str, tf.Tensor]: + """Implementation of Mask-RCNN mask prediction logic.""" + + model_outputs = dict(model_box_outputs) + if training: + current_rois, roi_classes, roi_masks = self.mask_sampler( + current_rois, matched_gt_boxes, matched_gt_classes, + matched_gt_indices, gt_masks) + roi_masks = tf.stop_gradient(roi_masks) + + model_outputs.update({ + 'mask_class_targets': roi_classes, + 'mask_targets': roi_masks, + }) + else: + current_rois = model_outputs['detection_boxes'] + roi_classes = model_outputs['detection_classes'] + + mask_logits, mask_probs = self._features_to_mask_outputs( + features, current_rois, roi_classes) + + if training: + model_outputs.update({ + 'mask_outputs': mask_logits, + }) + else: + model_outputs.update({ + 'detection_masks': mask_probs, + }) + return model_outputs + + def _run_frcnn_head(self, features, rois, gt_boxes, gt_classes, training, + model_outputs, cascade_num, regression_weights): + """Runs the frcnn head that does both class and box prediction. + + Args: + features: `list` of features from the feature extractor. + rois: `list` of current rois that will be used to predict bbox refinement + and classes from. + gt_boxes: a tensor with a shape of [batch_size, MAX_NUM_INSTANCES, 4]. + This tensor might have paddings with a negative value. + gt_classes: [batch_size, MAX_INSTANCES] representing the groundtruth box + classes. It is padded with -1s to indicate the invalid classes. + training: `bool`, if model is training or being evaluated. + model_outputs: `dict`, used for storing outputs used for eval and losses. + cascade_num: `int`, the current frcnn layer in the cascade. + regression_weights: `list`, weights used for l1 loss in bounding box + regression. 
+ + Returns: + class_outputs: Class predictions for rois. + box_outputs: Box predictions for rois. These are formatted for the + regression loss and need to be converted before being used as rois + in the next stage. + model_outputs: Updated dict with predictions used for losses and eval. + matched_gt_boxes: If `training` is true, then these give the gt box + location of its positive match. + matched_gt_classes: If `training` is true, then these give the gt class + of the predicted box. + matched_gt_indices: If `training` is true, then gives the index of + the positive box match. Used for mask prediction. + rois: The sampled rois used for this layer. + """ + # Only used during training. + matched_gt_boxes, matched_gt_classes, matched_gt_indices = (None, None, + None) + if training and gt_boxes is not None: + rois = tf.stop_gradient(rois) + + current_roi_sampler = self.roi_sampler[cascade_num] + rois, matched_gt_boxes, matched_gt_classes, matched_gt_indices = ( + current_roi_sampler(rois, gt_boxes, gt_classes)) + # Create bounding box training targets. + box_targets = box_ops.encode_boxes( + matched_gt_boxes, rois, weights=regression_weights) + # If the target is background, the box target is set to all 0s. + box_targets = tf.where( + tf.tile( + tf.expand_dims(tf.equal(matched_gt_classes, 0), axis=-1), + [1, 1, 4]), tf.zeros_like(box_targets), box_targets) + model_outputs.update({ + 'class_targets_{}'.format(cascade_num) + if cascade_num else 'class_targets': + matched_gt_classes, + 'box_targets_{}'.format(cascade_num) + if cascade_num else 'box_targets': + box_targets, + }) + + # Get roi features. + roi_features = self.roi_aligner(features, rois) + + # Run frcnn head to get class and bbox predictions.
+ current_detection_head = self.detection_head[cascade_num] + class_outputs, box_outputs = current_detection_head(roi_features) + + model_outputs.update({ + 'class_outputs_{}'.format(cascade_num) + if cascade_num else 'class_outputs': + class_outputs, + 'box_outputs_{}'.format(cascade_num) if cascade_num else 'box_outputs': + box_outputs, + }) + return (class_outputs, box_outputs, model_outputs, matched_gt_boxes, + matched_gt_classes, matched_gt_indices, rois) + + def _features_to_mask_outputs(self, features, rois, roi_classes): + # Mask RoI align. + mask_roi_features = self.mask_roi_aligner(features, rois) + + # Mask head. + raw_masks = self.mask_head([mask_roi_features, roi_classes]) + + return raw_masks, tf.nn.sigmoid(raw_masks) + + @property + def checkpoint_items( + self) -> Mapping[str, Union[tf.keras.Model, tf.keras.layers.Layer]]: + """Returns a dictionary of items to be additionally checkpointed.""" + items = dict( + backbone=self.backbone, + rpn_head=self.rpn_head, + detection_head=self.detection_head) + if self.decoder is not None: + items.update(decoder=self.decoder) + if self._include_mask: + items.update(mask_head=self.mask_head) + + return items + + def get_config(self) -> Mapping[str, Any]: + return self._config_dict + + @classmethod + def from_config(cls, config): + return cls(**config) diff --git a/official/vision/modeling/maskrcnn_model_test.py b/official/vision/modeling/maskrcnn_model_test.py new file mode 100644 index 0000000000000000000000000000000000000000..96ea3a2efd4483d8a2113c2e7914eb919bcea186 --- /dev/null +++ b/official/vision/modeling/maskrcnn_model_test.py @@ -0,0 +1,397 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for maskrcnn_model.py.""" + +import os +# Import libraries +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from tensorflow.python.distribute import combinations +from tensorflow.python.distribute import strategy_combinations +from official.vision.modeling import maskrcnn_model +from official.vision.modeling.backbones import resnet +from official.vision.modeling.decoders import fpn +from official.vision.modeling.heads import dense_prediction_heads +from official.vision.modeling.heads import instance_heads +from official.vision.modeling.layers import detection_generator +from official.vision.modeling.layers import mask_sampler +from official.vision.modeling.layers import roi_aligner +from official.vision.modeling.layers import roi_generator +from official.vision.modeling.layers import roi_sampler +from official.vision.ops import anchor + + +class MaskRCNNModelTest(parameterized.TestCase, tf.test.TestCase): + + @combinations.generate( + combinations.combine( + include_mask=[True, False], + use_separable_conv=[True, False], + build_anchor_boxes=[True, False], + is_training=[True, False])) + def test_build_model(self, include_mask, use_separable_conv, + build_anchor_boxes, is_training): + num_classes = 3 + min_level = 3 + max_level = 7 + num_scales = 3 + aspect_ratios = [1.0] + anchor_size = 3 + resnet_model_id = 50 + num_anchors_per_location = num_scales * len(aspect_ratios) + image_size = 384 + images = np.random.rand(2, image_size, image_size, 3) + image_shape = np.array([[image_size, image_size], 
[image_size, image_size]]) + + if build_anchor_boxes: + anchor_boxes = anchor.Anchor( + min_level=min_level, + max_level=max_level, + num_scales=num_scales, + aspect_ratios=aspect_ratios, + anchor_size=3, + image_size=(image_size, image_size)).multilevel_boxes + for l in anchor_boxes: + anchor_boxes[l] = tf.tile( + tf.expand_dims(anchor_boxes[l], axis=0), [2, 1, 1, 1]) + else: + anchor_boxes = None + + backbone = resnet.ResNet(model_id=resnet_model_id) + decoder = fpn.FPN( + input_specs=backbone.output_specs, + min_level=min_level, + max_level=max_level, + use_separable_conv=use_separable_conv) + rpn_head = dense_prediction_heads.RPNHead( + min_level=min_level, + max_level=max_level, + num_anchors_per_location=num_anchors_per_location, + num_convs=1) + detection_head = instance_heads.DetectionHead(num_classes=num_classes) + roi_generator_obj = roi_generator.MultilevelROIGenerator() + roi_sampler_obj = roi_sampler.ROISampler() + roi_aligner_obj = roi_aligner.MultilevelROIAligner() + detection_generator_obj = detection_generator.DetectionGenerator() + if include_mask: + mask_head = instance_heads.MaskHead( + num_classes=num_classes, upsample_factor=2) + mask_sampler_obj = mask_sampler.MaskSampler( + mask_target_size=28, num_sampled_masks=1) + mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner(crop_size=14) + else: + mask_head = None + mask_sampler_obj = None + mask_roi_aligner_obj = None + model = maskrcnn_model.MaskRCNNModel( + backbone, + decoder, + rpn_head, + detection_head, + roi_generator_obj, + roi_sampler_obj, + roi_aligner_obj, + detection_generator_obj, + mask_head, + mask_sampler_obj, + mask_roi_aligner_obj, + min_level=min_level, + max_level=max_level, + num_scales=num_scales, + aspect_ratios=aspect_ratios, + anchor_size=anchor_size) + + gt_boxes = np.array( + [[[10, 10, 15, 15], [2.5, 2.5, 7.5, 7.5], [-1, -1, -1, -1]], + [[100, 100, 150, 150], [-1, -1, -1, -1], [-1, -1, -1, -1]]], + dtype=np.float32) + gt_classes = np.array([[2, 1, -1], [1, -1, 
-1]], dtype=np.int32) + if include_mask: + gt_masks = np.ones((2, 3, 100, 100)) + else: + gt_masks = None + + # Results will be checked in test_forward. + _ = model( + images, + image_shape, + anchor_boxes, + gt_boxes, + gt_classes, + gt_masks, + training=is_training) + + @combinations.generate( + combinations.combine( + strategy=[ + strategy_combinations.cloud_tpu_strategy, + strategy_combinations.one_device_strategy_gpu, + ], + include_mask=[True, False], + build_anchor_boxes=[True, False], + use_cascade_heads=[True, False], + training=[True, False], + )) + def test_forward(self, strategy, include_mask, build_anchor_boxes, training, + use_cascade_heads): + num_classes = 3 + min_level = 3 + max_level = 4 + num_scales = 3 + aspect_ratios = [1.0] + anchor_size = 3 + if use_cascade_heads: + cascade_iou_thresholds = [0.6] + class_agnostic_bbox_pred = True + cascade_class_ensemble = True + else: + cascade_iou_thresholds = None + class_agnostic_bbox_pred = False + cascade_class_ensemble = False + + image_size = (256, 256) + images = np.random.rand(2, image_size[0], image_size[1], 3) + image_shape = np.array([[224, 100], [100, 224]]) + with strategy.scope(): + if build_anchor_boxes: + anchor_boxes = anchor.Anchor( + min_level=min_level, + max_level=max_level, + num_scales=num_scales, + aspect_ratios=aspect_ratios, + anchor_size=anchor_size, + image_size=image_size).multilevel_boxes + else: + anchor_boxes = None + num_anchors_per_location = len(aspect_ratios) * num_scales + + input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, 3]) + backbone = resnet.ResNet(model_id=50, input_specs=input_specs) + decoder = fpn.FPN( + min_level=min_level, + max_level=max_level, + input_specs=backbone.output_specs) + rpn_head = dense_prediction_heads.RPNHead( + min_level=min_level, + max_level=max_level, + num_anchors_per_location=num_anchors_per_location) + detection_head = instance_heads.DetectionHead( + num_classes=num_classes, + 
class_agnostic_bbox_pred=class_agnostic_bbox_pred) + roi_generator_obj = roi_generator.MultilevelROIGenerator() + + roi_sampler_cascade = [] + roi_sampler_obj = roi_sampler.ROISampler() + roi_sampler_cascade.append(roi_sampler_obj) + if cascade_iou_thresholds: + for iou in cascade_iou_thresholds: + roi_sampler_obj = roi_sampler.ROISampler( + mix_gt_boxes=False, + foreground_iou_threshold=iou, + background_iou_high_threshold=iou, + background_iou_low_threshold=0.0, + skip_subsampling=True) + roi_sampler_cascade.append(roi_sampler_obj) + roi_aligner_obj = roi_aligner.MultilevelROIAligner() + detection_generator_obj = detection_generator.DetectionGenerator() + if include_mask: + mask_head = instance_heads.MaskHead( + num_classes=num_classes, upsample_factor=2) + mask_sampler_obj = mask_sampler.MaskSampler( + mask_target_size=28, num_sampled_masks=1) + mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner(crop_size=14) + else: + mask_head = None + mask_sampler_obj = None + mask_roi_aligner_obj = None + model = maskrcnn_model.MaskRCNNModel( + backbone, + decoder, + rpn_head, + detection_head, + roi_generator_obj, + roi_sampler_obj, + roi_aligner_obj, + detection_generator_obj, + mask_head, + mask_sampler_obj, + mask_roi_aligner_obj, + class_agnostic_bbox_pred=class_agnostic_bbox_pred, + cascade_class_ensemble=cascade_class_ensemble, + min_level=min_level, + max_level=max_level, + num_scales=num_scales, + aspect_ratios=aspect_ratios, + anchor_size=anchor_size) + + gt_boxes = np.array( + [[[10, 10, 15, 15], [2.5, 2.5, 7.5, 7.5], [-1, -1, -1, -1]], + [[100, 100, 150, 150], [-1, -1, -1, -1], [-1, -1, -1, -1]]], + dtype=np.float32) + gt_classes = np.array([[2, 1, -1], [1, -1, -1]], dtype=np.int32) + if include_mask: + gt_masks = np.ones((2, 3, 100, 100)) + else: + gt_masks = None + + results = model( + images, + image_shape, + anchor_boxes, + gt_boxes, + gt_classes, + gt_masks, + training=training) + + self.assertIn('rpn_boxes', results) + self.assertIn('rpn_scores', 
results) + if training: + self.assertIn('class_targets', results) + self.assertIn('box_targets', results) + self.assertIn('class_outputs', results) + self.assertIn('box_outputs', results) + if include_mask: + self.assertIn('mask_outputs', results) + else: + self.assertIn('detection_boxes', results) + self.assertIn('detection_scores', results) + self.assertIn('detection_classes', results) + self.assertIn('num_detections', results) + if include_mask: + self.assertIn('detection_masks', results) + + @parameterized.parameters( + (False,), + (True,), + ) + def test_serialize_deserialize(self, include_mask): + input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, 3]) + backbone = resnet.ResNet(model_id=50, input_specs=input_specs) + decoder = fpn.FPN( + min_level=3, max_level=7, input_specs=backbone.output_specs) + rpn_head = dense_prediction_heads.RPNHead( + min_level=3, max_level=7, num_anchors_per_location=3) + detection_head = instance_heads.DetectionHead(num_classes=2) + roi_generator_obj = roi_generator.MultilevelROIGenerator() + roi_sampler_obj = roi_sampler.ROISampler() + roi_aligner_obj = roi_aligner.MultilevelROIAligner() + detection_generator_obj = detection_generator.DetectionGenerator() + if include_mask: + mask_head = instance_heads.MaskHead(num_classes=2, upsample_factor=2) + mask_sampler_obj = mask_sampler.MaskSampler( + mask_target_size=28, num_sampled_masks=1) + mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner(crop_size=14) + else: + mask_head = None + mask_sampler_obj = None + mask_roi_aligner_obj = None + model = maskrcnn_model.MaskRCNNModel( + backbone, + decoder, + rpn_head, + detection_head, + roi_generator_obj, + roi_sampler_obj, + roi_aligner_obj, + detection_generator_obj, + mask_head, + mask_sampler_obj, + mask_roi_aligner_obj, + min_level=3, + max_level=7, + num_scales=3, + aspect_ratios=[1.0], + anchor_size=3) + + config = model.get_config() + new_model = maskrcnn_model.MaskRCNNModel.from_config(config) + + # Validate that 
the config can be forced to JSON. + _ = new_model.to_json() + + # If the serialization was successful, the new config should match the old. + self.assertAllEqual(model.get_config(), new_model.get_config()) + + @parameterized.parameters( + (False,), + (True,), + ) + def test_checkpoint(self, include_mask): + input_specs = tf.keras.layers.InputSpec(shape=[None, None, None, 3]) + backbone = resnet.ResNet(model_id=50, input_specs=input_specs) + decoder = fpn.FPN( + min_level=3, max_level=7, input_specs=backbone.output_specs) + rpn_head = dense_prediction_heads.RPNHead( + min_level=3, max_level=7, num_anchors_per_location=3) + detection_head = instance_heads.DetectionHead(num_classes=2) + roi_generator_obj = roi_generator.MultilevelROIGenerator() + roi_sampler_obj = roi_sampler.ROISampler() + roi_aligner_obj = roi_aligner.MultilevelROIAligner() + detection_generator_obj = detection_generator.DetectionGenerator() + if include_mask: + mask_head = instance_heads.MaskHead(num_classes=2, upsample_factor=2) + mask_sampler_obj = mask_sampler.MaskSampler( + mask_target_size=28, num_sampled_masks=1) + mask_roi_aligner_obj = roi_aligner.MultilevelROIAligner(crop_size=14) + else: + mask_head = None + mask_sampler_obj = None + mask_roi_aligner_obj = None + model = maskrcnn_model.MaskRCNNModel( + backbone, + decoder, + rpn_head, + detection_head, + roi_generator_obj, + roi_sampler_obj, + roi_aligner_obj, + detection_generator_obj, + mask_head, + mask_sampler_obj, + mask_roi_aligner_obj, + min_level=3, + max_level=7, + num_scales=3, + aspect_ratios=[1.0], + anchor_size=3) + expect_checkpoint_items = dict( + backbone=backbone, + decoder=decoder, + rpn_head=rpn_head, + detection_head=[detection_head]) + if include_mask: + expect_checkpoint_items['mask_head'] = mask_head + self.assertAllEqual(expect_checkpoint_items, model.checkpoint_items) + + # Test save and load checkpoints. 
+ ckpt = tf.train.Checkpoint(model=model, **model.checkpoint_items) + save_dir = self.create_tempdir().full_path + ckpt.save(os.path.join(save_dir, 'ckpt')) + + partial_ckpt = tf.train.Checkpoint(backbone=backbone) + partial_ckpt.read(tf.train.latest_checkpoint( + save_dir)).expect_partial().assert_existing_objects_matched() + + if include_mask: + partial_ckpt_mask = tf.train.Checkpoint( + backbone=backbone, mask_head=mask_head) + partial_ckpt_mask.restore(tf.train.latest_checkpoint( + save_dir)).expect_partial().assert_existing_objects_matched() + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/modeling/models/__init__.py b/official/vision/modeling/models/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..517b75b41e02fd215ae9ad6e91202f77a5d81bdf --- /dev/null +++ b/official/vision/modeling/models/__init__.py @@ -0,0 +1,21 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Models under Vision package.""" + +from official.vision.modeling.classification_model import ClassificationModel +from official.vision.modeling.maskrcnn_model import MaskRCNNModel +from official.vision.modeling.retinanet_model import RetinaNetModel +from official.vision.modeling.segmentation_model import SegmentationModel +from official.vision.modeling.video_classification_model import VideoClassificationModel diff --git a/official/vision/modeling/retinanet_model.py b/official/vision/modeling/retinanet_model.py new file mode 100644 index 0000000000000000000000000000000000000000..b20dc19d3553e6e0a7d071816204d83229369652 --- /dev/null +++ b/official/vision/modeling/retinanet_model.py @@ -0,0 +1,227 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""RetinaNet.""" +from typing import Any, Mapping, List, Optional, Union, Sequence + +# Import libraries +import tensorflow as tf + +from official.vision.ops import anchor + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class RetinaNetModel(tf.keras.Model): + """The RetinaNet model class.""" + + def __init__(self, + backbone: tf.keras.Model, + decoder: tf.keras.Model, + head: tf.keras.layers.Layer, + detection_generator: tf.keras.layers.Layer, + min_level: Optional[int] = None, + max_level: Optional[int] = None, + num_scales: Optional[int] = None, + aspect_ratios: Optional[List[float]] = None, + anchor_size: Optional[float] = None, + **kwargs): + """RetinaNet initialization function. + + Args: + backbone: `tf.keras.Model` a backbone network. + decoder: `tf.keras.Model` a decoder network. + head: `RetinaNetHead`, the RetinaNet head. + detection_generator: the detection generator. + min_level: Minimum level in output feature maps. + max_level: Maximum level in output feature maps. + num_scales: A number representing intermediate scales added + on each level. For instance, num_scales=2 adds one additional + intermediate anchor scale, giving scales [2^0, 2^0.5] on each level. + aspect_ratios: A list representing the aspect ratio + anchors added on each level. The number indicates the ratio of width to + height. For instance, aspect_ratios=[1.0, 2.0, 0.5] adds three anchors + on each scale level. + anchor_size: A number representing the scale of size of the base + anchor to the feature stride 2^level. + **kwargs: keyword arguments to be passed.
+ """ + super(RetinaNetModel, self).__init__(**kwargs) + self._config_dict = { + 'backbone': backbone, + 'decoder': decoder, + 'head': head, + 'detection_generator': detection_generator, + 'min_level': min_level, + 'max_level': max_level, + 'num_scales': num_scales, + 'aspect_ratios': aspect_ratios, + 'anchor_size': anchor_size, + } + self._backbone = backbone + self._decoder = decoder + self._head = head + self._detection_generator = detection_generator + + def call(self, + images: Union[tf.Tensor, Sequence[tf.Tensor]], + image_shape: Optional[tf.Tensor] = None, + anchor_boxes: Optional[Mapping[str, tf.Tensor]] = None, + output_intermediate_features: bool = False, + training: bool = None) -> Mapping[str, tf.Tensor]: + """Forward pass of the RetinaNet model. + + Args: + images: `Tensor` or a sequence of `Tensor`, the input batched images to + the backbone network, whose shape(s) is [batch, height, width, 3]. If it + is a sequence of `Tensor`, we will assume the anchors are generated + based on the shape of the first image(s). + image_shape: `Tensor`, the actual shape of the input images, whose shape + is [batch, 2] where the last dimension is [height, width]. Note that + this is the actual image shape excluding paddings. For example, images + in the batch may be resized into different shapes before padding to the + fixed size. + anchor_boxes: a dict of tensors which includes multilevel anchors. + - key: `str`, the level of the multilevel predictions. + - values: `Tensor`, the anchor coordinates of a feature level, whose + shape is [batch, height_l, width_l, 4 * num_anchors_per_location]. + output_intermediate_features: `bool` indicating whether to return the + intermediate feature maps generated by backbone and decoder. + training: `bool`, indicating whether it is in training mode. + + Returns: + scores: a dict of tensors which includes scores of the predictions. + - key: `str`, the level of the multilevel predictions.
+ - values: `Tensor`, the box scores predicted from a particular feature + level, whose shape is + [batch, height_l, width_l, num_classes * num_anchors_per_location]. + boxes: a dict of tensors which includes coordinates of the predictions. + - key: `str`, the level of the multilevel predictions. + - values: `Tensor`, the box coordinates predicted from a particular + feature level, whose shape is + [batch, height_l, width_l, 4 * num_anchors_per_location]. + attributes: a dict of (attribute_name, attribute_predictions). Each + attribute prediction is a dict that includes: + - key: `str`, the level of the multilevel predictions. + - values: `Tensor`, the attribute predictions from a particular + feature level, whose shape is + [batch, height_l, width_l, att_size * num_anchors_per_location]. + """ + outputs = {} + # Feature extraction. + features = self.backbone(images) + if output_intermediate_features: + outputs.update( + {'backbone_{}'.format(k): v for k, v in features.items()}) + if self.decoder: + features = self.decoder(features) + if output_intermediate_features: + outputs.update( + {'decoder_{}'.format(k): v for k, v in features.items()}) + + # Dense prediction. `raw_attributes` can be empty. + raw_scores, raw_boxes, raw_attributes = self.head(features) + + if training: + outputs.update({ + 'cls_outputs': raw_scores, + 'box_outputs': raw_boxes, + }) + if raw_attributes: + outputs.update({'attribute_outputs': raw_attributes}) + return outputs + else: + # Generate anchor boxes for this batch if not provided. + if anchor_boxes is None: + if isinstance(images, Sequence): + primary_images = images[0] + elif isinstance(images, tf.Tensor): + primary_images = images + else: + raise ValueError( + 'Input should be a tf.Tensor or a sequence of tf.Tensor, not {}.' 
+ .format(type(images))) + + _, image_height, image_width, _ = primary_images.get_shape().as_list() + anchor_boxes = anchor.Anchor( + min_level=self._config_dict['min_level'], + max_level=self._config_dict['max_level'], + num_scales=self._config_dict['num_scales'], + aspect_ratios=self._config_dict['aspect_ratios'], + anchor_size=self._config_dict['anchor_size'], + image_size=(image_height, image_width)).multilevel_boxes + for l in anchor_boxes: + anchor_boxes[l] = tf.tile( + tf.expand_dims(anchor_boxes[l], axis=0), + [tf.shape(primary_images)[0], 1, 1, 1]) + + # Post-processing. + final_results = self.detection_generator(raw_boxes, raw_scores, + anchor_boxes, image_shape, + raw_attributes) + outputs.update({ + 'cls_outputs': raw_scores, + 'box_outputs': raw_boxes, + }) + if self.detection_generator.get_config()['apply_nms']: + outputs.update({ + 'detection_boxes': final_results['detection_boxes'], + 'detection_scores': final_results['detection_scores'], + 'detection_classes': final_results['detection_classes'], + 'num_detections': final_results['num_detections'] + }) + else: + outputs.update({ + 'decoded_boxes': final_results['decoded_boxes'], + 'decoded_box_scores': final_results['decoded_box_scores'] + }) + + if raw_attributes: + outputs.update({ + 'attribute_outputs': raw_attributes, + 'detection_attributes': final_results['detection_attributes'], + }) + return outputs + + @property + def checkpoint_items( + self) -> Mapping[str, Union[tf.keras.Model, tf.keras.layers.Layer]]: + """Returns a dictionary of items to be additionally checkpointed.""" + items = dict(backbone=self.backbone, head=self.head) + if self.decoder is not None: + items.update(decoder=self.decoder) + + return items + + @property + def backbone(self) -> tf.keras.Model: + return self._backbone + + @property + def decoder(self) -> tf.keras.Model: + return self._decoder + + @property + def head(self) -> tf.keras.layers.Layer: + return self._head + + @property + def detection_generator(self) -> 
tf.keras.layers.Layer: + return self._detection_generator + + def get_config(self) -> Mapping[str, Any]: + return self._config_dict + + @classmethod + def from_config(cls, config): + return cls(**config) diff --git a/official/vision/beta/modeling/retinanet_model_test.py b/official/vision/modeling/retinanet_model_test.py similarity index 96% rename from official/vision/beta/modeling/retinanet_model_test.py rename to official/vision/modeling/retinanet_model_test.py index 17191e5575bb0e27517a5a92f78f9777dac03371..80ee55aff93171f5825d724c9d374c889c69bb61 100644 --- a/official/vision/beta/modeling/retinanet_model_test.py +++ b/official/vision/modeling/retinanet_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-# Lint as: python3 """Tests for RetinaNet models.""" # Import libraries @@ -22,12 +21,12 @@ import tensorflow as tf from tensorflow.python.distribute import combinations from tensorflow.python.distribute import strategy_combinations -from official.vision.beta.modeling import retinanet_model -from official.vision.beta.modeling.backbones import resnet -from official.vision.beta.modeling.decoders import fpn -from official.vision.beta.modeling.heads import dense_prediction_heads -from official.vision.beta.modeling.layers import detection_generator -from official.vision.beta.ops import anchor +from official.vision.modeling import retinanet_model +from official.vision.modeling.backbones import resnet +from official.vision.modeling.decoders import fpn +from official.vision.modeling.heads import dense_prediction_heads +from official.vision.modeling.layers import detection_generator +from official.vision.ops import anchor class RetinaNetTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/modeling/segmentation_model.py b/official/vision/modeling/segmentation_model.py new file mode 100644 index 0000000000000000000000000000000000000000..18cdf59952afdcfb3db495ea8e392510f5238681 --- /dev/null +++ b/official/vision/modeling/segmentation_model.py @@ -0,0 +1,94 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Build segmentation models.""" +from typing import Any, Mapping, Union, Optional, Dict + +# Import libraries +import tensorflow as tf + +layers = tf.keras.layers + + +@tf.keras.utils.register_keras_serializable(package='Vision') +class SegmentationModel(tf.keras.Model): + """A segmentation model class. + + Input images are passed through backbone first. Decoder network is then + applied, and finally, segmentation head is applied on the output of the + decoder network. Layers such as ASPP should be part of decoder. Any feature + fusion is done as part of the segmentation head (i.e. deeplabv3+ feature + fusion is not part of the decoder, instead it is part of the segmentation + head). This way, different feature fusion techniques can be combined with + different backbones and decoders. + """ + + def __init__(self, backbone: tf.keras.Model, decoder: tf.keras.Model, + head: tf.keras.layers.Layer, + mask_scoring_head: Optional[tf.keras.layers.Layer] = None, + **kwargs): + """Segmentation initialization function. + + Args: + backbone: a backbone network. + decoder: a decoder network. E.g. FPN. + head: segmentation head. + mask_scoring_head: mask scoring head. + **kwargs: keyword arguments to be passed.
+ """ + super(SegmentationModel, self).__init__(**kwargs) + self._config_dict = { + 'backbone': backbone, + 'decoder': decoder, + 'head': head, + 'mask_scoring_head': mask_scoring_head, + } + self.backbone = backbone + self.decoder = decoder + self.head = head + self.mask_scoring_head = mask_scoring_head + + def call(self, inputs: tf.Tensor, training: bool = None + ) -> Dict[str, tf.Tensor]: + backbone_features = self.backbone(inputs) + + if self.decoder: + decoder_features = self.decoder(backbone_features) + else: + decoder_features = backbone_features + + logits = self.head((backbone_features, decoder_features)) + outputs = {'logits': logits} + if self.mask_scoring_head: + mask_scores = self.mask_scoring_head(logits) + outputs.update({'mask_scores': mask_scores}) + return outputs + + @property + def checkpoint_items( + self) -> Mapping[str, Union[tf.keras.Model, tf.keras.layers.Layer]]: + """Returns a dictionary of items to be additionally checkpointed.""" + items = dict(backbone=self.backbone, head=self.head) + if self.decoder is not None: + items.update(decoder=self.decoder) + if self.mask_scoring_head is not None: + items.update(mask_scoring_head=self.mask_scoring_head) + return items + + def get_config(self) -> Mapping[str, Any]: + return self._config_dict + + @classmethod + def from_config(cls, config, custom_objects=None): + return cls(**config) diff --git a/official/vision/beta/modeling/segmentation_model_test.py b/official/vision/modeling/segmentation_model_test.py similarity index 88% rename from official/vision/beta/modeling/segmentation_model_test.py rename to official/vision/modeling/segmentation_model_test.py index ec2e1ee985e0846e2d366dfff77e103496dafc90..b1a2f8076bae46b8ba28192c0eadea4942292621 100644 --- a/official/vision/beta/modeling/segmentation_model_test.py +++ b/official/vision/modeling/segmentation_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. 
All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,17 +12,16 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for segmentation network.""" from absl.testing import parameterized import numpy as np import tensorflow as tf -from official.vision.beta.modeling import backbones -from official.vision.beta.modeling import segmentation_model -from official.vision.beta.modeling.decoders import fpn -from official.vision.beta.modeling.heads import segmentation_heads +from official.vision.modeling import backbones +from official.vision.modeling import segmentation_model +from official.vision.modeling.decoders import fpn +from official.vision.modeling.heads import segmentation_heads class SegmentationNetworkTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/beta/modeling/video_classification_model.py b/official/vision/modeling/video_classification_model.py similarity index 98% rename from official/vision/beta/modeling/video_classification_model.py rename to official/vision/modeling/video_classification_model.py index 552c6b10e8e553389f0a8e62b3c0f20fa280f154..8aedd35bcc4959dacc13b57bc0e1c200f5af26b9 100644 --- a/official/vision/beta/modeling/video_classification_model.py +++ b/official/vision/modeling/video_classification_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/official/vision/beta/modeling/video_classification_model_test.py b/official/vision/modeling/video_classification_model_test.py similarity index 93% rename from official/vision/beta/modeling/video_classification_model_test.py rename to official/vision/modeling/video_classification_model_test.py index 7b06cf83bf19529c9d4378865ba323cfa892e708..cd4b4a3559407b28c9ef9ded43070b19b2d6c53f 100644 --- a/official/vision/beta/modeling/video_classification_model_test.py +++ b/official/vision/modeling/video_classification_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Tests for video classification network.""" # Import libraries @@ -20,8 +19,8 @@ from absl.testing import parameterized import numpy as np import tensorflow as tf -from official.vision.beta.modeling import backbones -from official.vision.beta.modeling import video_classification_model +from official.vision.modeling import backbones +from official.vision.modeling import video_classification_model class VideoClassificationNetworkTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/ops/__init__.py b/official/vision/ops/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..310bfb28f0c252bc4a4485325059bff28c5250c2 --- /dev/null +++ b/official/vision/ops/__init__.py @@ -0,0 +1,14 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + diff --git a/official/vision/ops/anchor.py b/official/vision/ops/anchor.py new file mode 100644 index 0000000000000000000000000000000000000000..570da35e4065d3b919d55a760f81e1224fd0e2a2 --- /dev/null +++ b/official/vision/ops/anchor.py @@ -0,0 +1,412 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Anchor box and labeler definition.""" + +import collections +from typing import Dict, Optional, Tuple + +# Import libraries + +import tensorflow as tf + +from official.vision.ops import anchor_generator +from official.vision.ops import box_matcher +from official.vision.ops import iou_similarity +from official.vision.ops import target_gather +from official.vision.utils.object_detection import balanced_positive_negative_sampler +from official.vision.utils.object_detection import box_list +from official.vision.utils.object_detection import faster_rcnn_box_coder + + +class Anchor(object): + """Anchor class for anchor-based object detectors.""" + + def __init__(self, + min_level, + max_level, + num_scales, + aspect_ratios, + anchor_size, + image_size): + """Constructs multiscale anchors. + + Args: + min_level: integer number of minimum level of the output feature pyramid. + max_level: integer number of maximum level of the output feature pyramid. + num_scales: integer number representing intermediate scales added + on each level. For instance, num_scales=2 adds one additional + intermediate anchor scale, giving scales [2^0, 2^0.5] on each level. + aspect_ratios: list of float numbers representing the aspect ratio anchors + added on each level. The number indicates the ratio of width to height. + For instance, aspect_ratios=[1.0, 2.0, 0.5] adds three anchors on each + scale level. + anchor_size: float number representing the scale of size of the base + anchor to the feature stride 2^level. + image_size: a list of integer numbers or Tensors representing + [height, width] of the input image size. The image_size should be + divisible by the largest feature stride 2^max_level.
+ """ + self.min_level = min_level + self.max_level = max_level + self.num_scales = num_scales + self.aspect_ratios = aspect_ratios + self.anchor_size = anchor_size + self.image_size = image_size + self.boxes = self._generate_boxes() + + def _generate_boxes(self) -> tf.Tensor: + """Generates multiscale anchor boxes. + + Returns: + a Tensor of shape [N, 4], representing anchor boxes of all levels + concatenated together. + """ + boxes_all = [] + for level in range(self.min_level, self.max_level + 1): + boxes_l = [] + for scale in range(self.num_scales): + for aspect_ratio in self.aspect_ratios: + stride = 2 ** level + intermediate_scale = 2 ** (scale / float(self.num_scales)) + base_anchor_size = self.anchor_size * stride * intermediate_scale + aspect_x = aspect_ratio ** 0.5 + aspect_y = aspect_ratio ** -0.5 + half_anchor_size_x = base_anchor_size * aspect_x / 2.0 + half_anchor_size_y = base_anchor_size * aspect_y / 2.0 + x = tf.range(stride / 2, self.image_size[1], stride) + y = tf.range(stride / 2, self.image_size[0], stride) + xv, yv = tf.meshgrid(x, y) + xv = tf.cast(tf.reshape(xv, [-1]), dtype=tf.float32) + yv = tf.cast(tf.reshape(yv, [-1]), dtype=tf.float32) + # Tensor shape Nx4. + boxes = tf.stack([yv - half_anchor_size_y, xv - half_anchor_size_x, + yv + half_anchor_size_y, xv + half_anchor_size_x], + axis=1) + boxes_l.append(boxes) + # Concat anchors on the same level to tensor shape NxAx4.
+ boxes_l = tf.stack(boxes_l, axis=1) + boxes_l = tf.reshape(boxes_l, [-1, 4]) + boxes_all.append(boxes_l) + return tf.concat(boxes_all, axis=0) + + def unpack_labels(self, labels: tf.Tensor) -> Dict[str, tf.Tensor]: + """Unpacks an array of labels into multiscales labels.""" + unpacked_labels = collections.OrderedDict() + count = 0 + for level in range(self.min_level, self.max_level + 1): + feat_size_y = tf.cast(self.image_size[0] / 2 ** level, tf.int32) + feat_size_x = tf.cast(self.image_size[1] / 2 ** level, tf.int32) + steps = feat_size_y * feat_size_x * self.anchors_per_location + unpacked_labels[str(level)] = tf.reshape( + labels[count:count + steps], [feat_size_y, feat_size_x, -1]) + count += steps + return unpacked_labels + + @property + def anchors_per_location(self): + return self.num_scales * len(self.aspect_ratios) + + @property + def multilevel_boxes(self): + return self.unpack_labels(self.boxes) + + +class AnchorLabeler(object): + """Labeler for dense object detector.""" + + def __init__(self, + match_threshold=0.5, + unmatched_threshold=0.5): + """Constructs anchor labeler to assign labels to anchors. + + Args: + match_threshold: a float number between 0 and 1 representing the + lower-bound threshold to assign positive labels for anchors. An anchor + with a score over the threshold is labeled positive. + unmatched_threshold: a float number between 0 and 1 representing the + upper-bound threshold to assign negative labels for anchors. An anchor + with a score below the threshold is labeled negative. 
+ """ + self.similarity_calc = iou_similarity.IouSimilarity() + self.target_gather = target_gather.TargetGather() + self.matcher = box_matcher.BoxMatcher( + thresholds=[unmatched_threshold, match_threshold], + indicators=[-1, -2, 1], + force_match_for_each_col=True) + self.box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder() + + def label_anchors( + self, + anchor_boxes: Dict[str, tf.Tensor], + gt_boxes: tf.Tensor, + gt_labels: tf.Tensor, + gt_attributes: Optional[Dict[str, tf.Tensor]] = None, + gt_weights: Optional[tf.Tensor] = None + ) -> Tuple[Dict[str, tf.Tensor], Dict[str, tf.Tensor], Dict[str, Dict[ + str, tf.Tensor]], tf.Tensor, tf.Tensor]: + """Labels anchors with ground truth inputs. + + Args: + anchor_boxes: An ordered dictionary with keys + [min_level, min_level+1, ..., max_level]. The values are tensor with + shape [height_l, width_l, num_anchors_per_location * 4]. The height_l + and width_l represent the dimension of the feature pyramid at l-th + level. For each anchor box, the tensor stores [y0, x0, y1, x1] for the + four corners. + gt_boxes: A float tensor with shape [N, 4] representing groundtruth boxes. + For each row, it stores [y0, x0, y1, x1] for four corners of a box. + gt_labels: An integer tensor with shape [N, 1] representing groundtruth + classes. + gt_attributes: If not None, a dict of (name, gt_attribute) pairs. + `gt_attribute` is a float tensor with shape [N, attribute_size] + representing groundtruth attributes. + gt_weights: If not None, a float tensor with shape [N] representing + groundtruth weights. + + Returns: + cls_targets_dict: An ordered dictionary with keys + [min_level, min_level+1, ..., max_level]. The values are tensor with + shape [height_l, width_l, num_anchors_per_location]. The height_l and + width_l represent the dimension of class logits at l-th level. + box_targets_dict: An ordered dictionary with keys + [min_level, min_level+1, ..., max_level].
The values are tensor with + shape [height_l, width_l, num_anchors_per_location * 4]. The height_l + and width_l represent the dimension of bounding box regression output at + l-th level. + attribute_targets_dict: A dict with (name, attribute_targets) pairs. Each + `attribute_targets` represents an ordered dictionary with keys + [min_level, min_level+1, ..., max_level]. The values are tensor with + shape [height_l, width_l, num_anchors_per_location * attribute_size]. + The height_l and width_l represent the dimension of attribute prediction + output at l-th level. + cls_weights: A flattened Tensor with shape [num_anchors], that serves as + masking / sample weight for classification loss. Its value is 1.0 for + positive and negative matched anchors, and 0.0 for ignored anchors. + box_weights: A flattened Tensor with shape [num_anchors], that serves as + masking / sample weight for regression loss. Its value is 1.0 for + positive matched anchors, and 0.0 for negative and ignored anchors. + """ + flattened_anchor_boxes = [] + for anchors in anchor_boxes.values(): + flattened_anchor_boxes.append(tf.reshape(anchors, [-1, 4])) + flattened_anchor_boxes = tf.concat(flattened_anchor_boxes, axis=0) + similarity_matrix = self.similarity_calc(flattened_anchor_boxes, gt_boxes) + match_indices, match_indicators = self.matcher(similarity_matrix) + + mask = tf.less_equal(match_indicators, 0) + cls_mask = tf.expand_dims(mask, -1) + cls_targets = self.target_gather(gt_labels, match_indices, cls_mask, -1) + box_mask = tf.tile(cls_mask, [1, 4]) + box_targets = self.target_gather(gt_boxes, match_indices, box_mask) + att_targets = {} + if gt_attributes: + for k, v in gt_attributes.items(): + att_size = v.get_shape().as_list()[-1] + att_mask = tf.tile(cls_mask, [1, att_size]) + att_targets[k] = self.target_gather(v, match_indices, att_mask, 0.0) + + weights = tf.squeeze(tf.ones_like(gt_labels, dtype=tf.float32), -1) + if gt_weights is not None: + weights = tf.math.multiply(weights, 
gt_weights) + box_weights = self.target_gather(weights, match_indices, mask) + ignore_mask = tf.equal(match_indicators, -2) + cls_weights = self.target_gather(weights, match_indices, ignore_mask) + box_targets_list = box_list.BoxList(box_targets) + anchor_box_list = box_list.BoxList(flattened_anchor_boxes) + box_targets = self.box_coder.encode(box_targets_list, anchor_box_list) + + # Unpacks labels into multi-level representations. + cls_targets_dict = unpack_targets(cls_targets, anchor_boxes) + box_targets_dict = unpack_targets(box_targets, anchor_boxes) + attribute_targets_dict = {} + for k, v in att_targets.items(): + attribute_targets_dict[k] = unpack_targets(v, anchor_boxes) + + return cls_targets_dict, box_targets_dict, attribute_targets_dict, cls_weights, box_weights + + +class RpnAnchorLabeler(AnchorLabeler): + """Labeler for Region Proposal Network.""" + + def __init__(self, + match_threshold=0.7, + unmatched_threshold=0.3, + rpn_batch_size_per_im=256, + rpn_fg_fraction=0.5): + AnchorLabeler.__init__(self, match_threshold=match_threshold, + unmatched_threshold=unmatched_threshold) + self._rpn_batch_size_per_im = rpn_batch_size_per_im + self._rpn_fg_fraction = rpn_fg_fraction + + def _get_rpn_samples(self, match_results): + """Computes anchor labels. + + This function performs subsampling for foreground (fg) and background (bg) + anchors. + Args: + match_results: An integer tensor with shape [N] representing the + matching results of anchors. (1) match_results[i]>=0, + meaning that column i is matched with row match_results[i]. + (2) match_results[i]=-1, meaning that column i is not matched. + (3) match_results[i]=-2, meaning that column i is ignored. + Returns: + score_targets: an integer tensor with a shape of [N]. + (1) score_targets[i]=1, the anchor is a positive sample. + (2) score_targets[i]=0, negative. (3) score_targets[i]=-1, the anchor is + don't care (ignore).
+ """ + sampler = ( + balanced_positive_negative_sampler.BalancedPositiveNegativeSampler( + positive_fraction=self._rpn_fg_fraction, is_static=False)) + # indicator includes both positive and negative labels. + # labels includes only positive labels. + # positives = indicator & labels. + # negatives = indicator & !labels. + # ignore = !indicator. + indicator = tf.greater(match_results, -2) + labels = tf.greater(match_results, -1) + + samples = sampler.subsample( + indicator, self._rpn_batch_size_per_im, labels) + positive_labels = tf.where( + tf.logical_and(samples, labels), + tf.constant(2, dtype=tf.int32, shape=match_results.shape), + tf.constant(0, dtype=tf.int32, shape=match_results.shape)) + negative_labels = tf.where( + tf.logical_and(samples, tf.logical_not(labels)), + tf.constant(1, dtype=tf.int32, shape=match_results.shape), + tf.constant(0, dtype=tf.int32, shape=match_results.shape)) + ignore_labels = tf.fill(match_results.shape, -1) + + return (ignore_labels + positive_labels + negative_labels, + positive_labels, negative_labels) + + def label_anchors( + self, anchor_boxes: Dict[str, tf.Tensor], gt_boxes: tf.Tensor, + gt_labels: tf.Tensor + ) -> Tuple[Dict[str, tf.Tensor], Dict[str, tf.Tensor]]: + """Labels anchors with ground truth inputs. + + Args: + anchor_boxes: An ordered dictionary with keys + [min_level, min_level+1, ..., max_level]. The values are tensor with + shape [height_l, width_l, num_anchors_per_location * 4]. The height_l + and width_l represent the dimension of the feature pyramid at l-th + level. For each anchor box, the tensor stores [y0, x0, y1, x1] for the + four corners. + gt_boxes: A float tensor with shape [N, 4] representing groundtruth boxes. + For each row, it stores [y0, x0, y1, x1] for four corners of a box. + gt_labels: An integer tensor with shape [N, 1] representing groundtruth + classes. + + Returns: + score_targets_dict: An ordered dictionary with keys + [min_level, min_level+1, ..., max_level].
The values are tensor with + shape [height_l, width_l, num_anchors_per_location]. The height_l and + width_l represent the dimension of class logits at l-th level. + box_targets_dict: An ordered dictionary with keys + [min_level, min_level+1, ..., max_level]. The values are tensor with + shape [height_l, width_l, num_anchors_per_location * 4]. The height_l + and width_l represent the dimension of bounding box regression output at + l-th level. + """ + flattened_anchor_boxes = [] + for anchors in anchor_boxes.values(): + flattened_anchor_boxes.append(tf.reshape(anchors, [-1, 4])) + flattened_anchor_boxes = tf.concat(flattened_anchor_boxes, axis=0) + similarity_matrix = self.similarity_calc(flattened_anchor_boxes, gt_boxes) + match_indices, match_indicators = self.matcher(similarity_matrix) + box_mask = tf.tile(tf.expand_dims(tf.less_equal(match_indicators, 0), -1), + [1, 4]) + box_targets = self.target_gather(gt_boxes, match_indices, box_mask) + box_targets_list = box_list.BoxList(box_targets) + anchor_box_list = box_list.BoxList(flattened_anchor_boxes) + box_targets = self.box_coder.encode(box_targets_list, anchor_box_list) + + # Zero out the unmatched and ignored regression targets. + num_matches = match_indices.shape.as_list()[0] or tf.shape(match_indices)[0] + unmatched_ignored_box_targets = tf.zeros([num_matches, 4], dtype=tf.float32) + matched_anchors_mask = tf.greater_equal(match_indicators, 0) + # To broadcast matched_anchors_mask to the same shape as + # matched_reg_targets. + matched_anchors_mask = tf.tile( + tf.expand_dims(matched_anchors_mask, 1), + [1, tf.shape(box_targets)[1]]) + box_targets = tf.where(matched_anchors_mask, box_targets, + unmatched_ignored_box_targets) + + # score_targets contains the subsampled positive and negative anchors. + score_targets, _, _ = self._get_rpn_samples(match_indicators) + + # Unpacks labels. 
+ score_targets_dict = unpack_targets(score_targets, anchor_boxes) + box_targets_dict = unpack_targets(box_targets, anchor_boxes) + + return score_targets_dict, box_targets_dict + + +def build_anchor_generator(min_level, max_level, num_scales, aspect_ratios, + anchor_size): + """Build anchor generator from levels.""" + anchor_sizes = collections.OrderedDict() + strides = collections.OrderedDict() + scales = [] + for scale in range(num_scales): + scales.append(2**(scale / float(num_scales))) + for level in range(min_level, max_level + 1): + stride = 2**level + strides[str(level)] = stride + anchor_sizes[str(level)] = anchor_size * stride + anchor_gen = anchor_generator.AnchorGenerator( + anchor_sizes=anchor_sizes, + scales=scales, + aspect_ratios=aspect_ratios, + strides=strides) + return anchor_gen + + +def unpack_targets( + targets: tf.Tensor, + anchor_boxes_dict: Dict[str, tf.Tensor]) -> Dict[str, tf.Tensor]: + """Unpacks an array of labels into multiscales labels. + + Args: + targets: A tensor with shape [num_anchors, M] representing the packed + targets with M values stored for each anchor. + anchor_boxes_dict: An ordered dictionary with keys + [min_level, min_level+1, ..., max_level]. The values are tensor with shape + [height_l, width_l, num_anchors_per_location * 4]. The height_l and + width_l represent the dimension of the feature pyramid at l-th level. For + each anchor box, the tensor stores [y0, x0, y1, x1] for the four corners. + + Returns: + unpacked_targets: An ordered dictionary with keys + [min_level, min_level+1, ..., max_level]. The values are tensor with shape + [height_l, width_l, num_anchors_per_location * M]. The height_l and + width_l represent the dimension of the feature pyramid at l-th level. M is + the number of values stored for each anchor. 
+ """ + unpacked_targets = collections.OrderedDict() + count = 0 + for level, anchor_boxes in anchor_boxes_dict.items(): + feat_size_shape = anchor_boxes.shape.as_list() + feat_size_y = feat_size_shape[0] + feat_size_x = feat_size_shape[1] + anchors_per_location = int(feat_size_shape[2] / 4) + steps = feat_size_y * feat_size_x * anchors_per_location + unpacked_targets[level] = tf.reshape(targets[count:count + steps], + [feat_size_y, feat_size_x, -1]) + count += steps + return unpacked_targets diff --git a/official/vision/beta/ops/anchor_generator.py b/official/vision/ops/anchor_generator.py similarity index 98% rename from official/vision/beta/ops/anchor_generator.py rename to official/vision/ops/anchor_generator.py index c15b178da09913cd080bf54ccd2ee261f17751d6..5a231fedcb015678347dc34aa6b07f618442ed2c 100644 --- a/official/vision/beta/ops/anchor_generator.py +++ b/official/vision/ops/anchor_generator.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -47,7 +47,7 @@ class _SingleAnchorGenerator: stride: A single int represents the anchor stride size between center of each anchor. clip_boxes: Boolean to represent whether the anchor coordinates should be - clipped to the image size. Defaults to `True`. + clipped to the image size. Defaults to `False`. 
Input shape: the size of the image, `[H, W, C]` Output shape: the size of anchors, `[(H / stride) * (W / stride), 4]` """ diff --git a/official/vision/beta/ops/anchor_generator_test.py b/official/vision/ops/anchor_generator_test.py similarity index 97% rename from official/vision/beta/ops/anchor_generator_test.py rename to official/vision/ops/anchor_generator_test.py index 2d24e76dc2c425c2549b6bc9e27a0d477c364ef4..95a7b53844181f9cbb749ad9b5b432789049aa29 100644 --- a/official/vision/beta/ops/anchor_generator_test.py +++ b/official/vision/ops/anchor_generator_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,7 +16,7 @@ from absl.testing import parameterized import tensorflow as tf -from official.vision.beta.ops import anchor_generator +from official.vision.ops import anchor_generator class AnchorGeneratorTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/beta/ops/anchor_test.py b/official/vision/ops/anchor_test.py similarity index 98% rename from official/vision/beta/ops/anchor_test.py rename to official/vision/ops/anchor_test.py index ba119ebedc5a95fd2c4ce697ed405a55f2d43dde..6b7af08fe508eedb27556a3083f37fa825903fa0 100644 --- a/official/vision/beta/ops/anchor_test.py +++ b/official/vision/ops/anchor_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -18,7 +18,7 @@ from absl.testing import parameterized import numpy as np import tensorflow as tf -from official.vision.beta.ops import anchor +from official.vision.ops import anchor class AnchorTest(parameterized.TestCase, tf.test.TestCase): diff --git a/official/vision/ops/augment.py b/official/vision/ops/augment.py new file mode 100644 index 0000000000000000000000000000000000000000..9d08474f292451d1c49c62127c459285a7124b29 --- /dev/null +++ b/official/vision/ops/augment.py @@ -0,0 +1,2401 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Augmentation policies for enhanced image/video preprocessing. 
+ +AutoAugment Reference: + - AutoAugment Reference: https://arxiv.org/abs/1805.09501 + - AutoAugment for Object Detection Reference: https://arxiv.org/abs/1906.11172 +RandAugment Reference: https://arxiv.org/abs/1909.13719 +RandomErasing Reference: https://arxiv.org/abs/1708.04896 +MixupAndCutmix: + - Mixup: https://arxiv.org/abs/1710.09412 + - Cutmix: https://arxiv.org/abs/1905.04899 + +RandomErasing, Mixup and Cutmix are inspired by +https://github.com/rwightman/pytorch-image-models + +""" +import inspect +import math +from typing import Any, List, Iterable, Optional, Text, Tuple + +from keras.layers.preprocessing import image_preprocessing as image_ops +import numpy as np +import tensorflow as tf + + +# This signifies the max integer that the controller RNN could predict for the +# augmentation scheme. +_MAX_LEVEL = 10. + + +def to_4d(image: tf.Tensor) -> tf.Tensor: + """Converts an input Tensor to 4 dimensions. + + 4D image => [N, H, W, C] or [N, C, H, W] + 3D image => [1, H, W, C] or [1, C, H, W] + 2D image => [1, H, W, 1] + + Args: + image: The 2/3/4D input tensor. + + Returns: + A 4D image tensor. + + Raises: + `TypeError` if `image` is not a 2/3/4D tensor. 
+ + """ + shape = tf.shape(image) + original_rank = tf.rank(image) + left_pad = tf.cast(tf.less_equal(original_rank, 3), dtype=tf.int32) + right_pad = tf.cast(tf.equal(original_rank, 2), dtype=tf.int32) + new_shape = tf.concat( + [ + tf.ones(shape=left_pad, dtype=tf.int32), + shape, + tf.ones(shape=right_pad, dtype=tf.int32), + ], + axis=0, + ) + return tf.reshape(image, new_shape) + + +def from_4d(image: tf.Tensor, ndims: tf.Tensor) -> tf.Tensor: + """Converts a 4D image back to `ndims` rank.""" + shape = tf.shape(image) + begin = tf.cast(tf.less_equal(ndims, 3), dtype=tf.int32) + end = 4 - tf.cast(tf.equal(ndims, 2), dtype=tf.int32) + new_shape = shape[begin:end] + return tf.reshape(image, new_shape) + + +def _convert_translation_to_transform(translations: tf.Tensor) -> tf.Tensor: + """Converts translations to a projective transform. + + The translation matrix looks like this: + [[1 0 -dx] + [0 1 -dy] + [0 0 1]] + + Args: + translations: The 2-element list representing [dx, dy], or a matrix of + 2-element lists representing [dx dy] to translate for each image. The + shape must be static. + + Returns: + The transformation matrix of shape (num_images, 8). + + Raises: + `TypeError` if + - the shape of `translations` is not known or + - the shape of `translations` is not rank 1 or 2. 
+ + """ + translations = tf.convert_to_tensor(translations, dtype=tf.float32) + if translations.get_shape().ndims is None: + raise TypeError('translations rank must be statically known') + elif len(translations.get_shape()) == 1: + translations = translations[None] + elif len(translations.get_shape()) != 2: + raise TypeError('translations should have rank 1 or 2.') + num_translations = tf.shape(translations)[0] + + return tf.concat( + values=[ + tf.ones((num_translations, 1), tf.dtypes.float32), + tf.zeros((num_translations, 1), tf.dtypes.float32), + -translations[:, 0, None], + tf.zeros((num_translations, 1), tf.dtypes.float32), + tf.ones((num_translations, 1), tf.dtypes.float32), + -translations[:, 1, None], + tf.zeros((num_translations, 2), tf.dtypes.float32), + ], + axis=1, + ) + + +def _convert_angles_to_transform(angles: tf.Tensor, image_width: tf.Tensor, + image_height: tf.Tensor) -> tf.Tensor: + """Converts an angle or angles to a projective transform. + + Args: + angles: A scalar to rotate all images, or a vector to rotate a batch of + images. This must be a scalar. + image_width: The width of the image(s) to be transformed. + image_height: The height of the image(s) to be transformed. + + Returns: + A tensor of shape (num_images, 8). + + Raises: + `TypeError` if `angles` is not rank 0 or 1. 
+ + """ + angles = tf.convert_to_tensor(angles, dtype=tf.float32) + if len(angles.get_shape()) == 0: # pylint:disable=g-explicit-length-test + angles = angles[None] + elif len(angles.get_shape()) != 1: + raise TypeError('Angles should have a rank 0 or 1.') + x_offset = ((image_width - 1) - + (tf.math.cos(angles) * (image_width - 1) - tf.math.sin(angles) * + (image_height - 1))) / 2.0 + y_offset = ((image_height - 1) - + (tf.math.sin(angles) * (image_width - 1) + tf.math.cos(angles) * + (image_height - 1))) / 2.0 + num_angles = tf.shape(angles)[0] + return tf.concat( + values=[ + tf.math.cos(angles)[:, None], + -tf.math.sin(angles)[:, None], + x_offset[:, None], + tf.math.sin(angles)[:, None], + tf.math.cos(angles)[:, None], + y_offset[:, None], + tf.zeros((num_angles, 2), tf.dtypes.float32), + ], + axis=1, + ) + + +def transform(image: tf.Tensor, transforms) -> tf.Tensor: + """Prepares input data for `image_ops.transform`.""" + original_ndims = tf.rank(image) + transforms = tf.convert_to_tensor(transforms, dtype=tf.float32) + if transforms.shape.rank == 1: + transforms = transforms[None] + image = to_4d(image) + image = image_ops.transform( + images=image, transforms=transforms, interpolation='nearest') + return from_4d(image, original_ndims) + + +def translate(image: tf.Tensor, translations) -> tf.Tensor: + """Translates image(s) by provided vectors. + + Args: + image: An image Tensor of type uint8. + translations: A vector or matrix representing [dx dy]. + + Returns: + The translated version of the image. + + """ + transforms = _convert_translation_to_transform(translations) + return transform(image, transforms=transforms) + + +def rotate(image: tf.Tensor, degrees: float) -> tf.Tensor: + """Rotates the image by degrees either clockwise or counterclockwise. + + Args: + image: An image Tensor of type uint8. + degrees: Float, a scalar angle in degrees to rotate all images by. 
If + degrees is positive the image will be rotated clockwise otherwise it will + be rotated counterclockwise. + + Returns: + The rotated version of image. + + """ + # Convert from degrees to radians. + degrees_to_radians = math.pi / 180.0 + radians = tf.cast(degrees * degrees_to_radians, tf.float32) + + original_ndims = tf.rank(image) + image = to_4d(image) + + image_height = tf.cast(tf.shape(image)[1], tf.float32) + image_width = tf.cast(tf.shape(image)[2], tf.float32) + transforms = _convert_angles_to_transform( + angles=radians, image_width=image_width, image_height=image_height) + # In practice, we should randomize the rotation degrees by flipping + # it negatively half the time, but that's done on 'degrees' outside + # of the function. + image = transform(image, transforms=transforms) + return from_4d(image, original_ndims) + + +def blend(image1: tf.Tensor, image2: tf.Tensor, factor: float) -> tf.Tensor: + """Blend image1 and image2 using 'factor'. + + Factor can be above 0.0. A value of 0.0 means only image1 is used. + A value of 1.0 means only image2 is used. A value between 0.0 and + 1.0 means we linearly interpolate the pixel values between the two + images. A value greater than 1.0 "extrapolates" the difference + between the two pixel values, and we clip the results to values + between 0 and 255. + + Args: + image1: An image Tensor of type uint8. + image2: An image Tensor of type uint8. + factor: A floating point value above 0.0. + + Returns: + A blended image Tensor of type uint8. + """ + if factor == 0.0: + return tf.convert_to_tensor(image1) + if factor == 1.0: + return tf.convert_to_tensor(image2) + + image1 = tf.cast(image1, tf.float32) + image2 = tf.cast(image2, tf.float32) + + difference = image2 - image1 + scaled = factor * difference + + # Do addition in float. + temp = tf.cast(image1, tf.float32) + scaled + + # Interpolate + if factor > 0.0 and factor < 1.0: + # Interpolation means we always stay within 0 and 255. 
+ return tf.cast(temp, tf.uint8) + + # Extrapolate: + # + # We need to clip and then cast. + return tf.cast(tf.clip_by_value(temp, 0.0, 255.0), tf.uint8) + + +def cutout(image: tf.Tensor, pad_size: int, replace: int = 0) -> tf.Tensor: + """Apply cutout (https://arxiv.org/abs/1708.04552) to image. + + This operation applies a (2*pad_size x 2*pad_size) mask of zeros to + a random location within `image`. The pixel values filled in will be of the + value `replace`. The location where the mask will be applied is randomly + chosen uniformly over the whole image. + + Args: + image: An image Tensor of type uint8. + pad_size: Specifies the size of the zero mask that is applied to the + image. The mask will be of size (2*pad_size x 2*pad_size). + replace: What pixel value to fill in the image in the area that has the + cutout mask applied to it. + + Returns: + An image Tensor that is of type uint8. + """ + if image.shape.rank not in [3, 4]: + raise ValueError('Bad image rank: {}'.format(image.shape.rank)) + + if image.shape.rank == 4: + return cutout_video(image, replace=replace) + + image_height = tf.shape(image)[0] + image_width = tf.shape(image)[1] + + # Sample the center location in the image where the zero mask will be applied.
+ cutout_center_height = tf.random.uniform( + shape=[], minval=0, maxval=image_height, dtype=tf.int32) + + cutout_center_width = tf.random.uniform( + shape=[], minval=0, maxval=image_width, dtype=tf.int32) + + image = _fill_rectangle(image, cutout_center_width, cutout_center_height, + pad_size, pad_size, replace) + + return image + + +def _fill_rectangle(image, + center_width, + center_height, + half_width, + half_height, + replace=None): + """Fills blank area.""" + image_height = tf.shape(image)[0] + image_width = tf.shape(image)[1] + + lower_pad = tf.maximum(0, center_height - half_height) + upper_pad = tf.maximum(0, image_height - center_height - half_height) + left_pad = tf.maximum(0, center_width - half_width) + right_pad = tf.maximum(0, image_width - center_width - half_width) + + cutout_shape = [ + image_height - (lower_pad + upper_pad), + image_width - (left_pad + right_pad) + ] + padding_dims = [[lower_pad, upper_pad], [left_pad, right_pad]] + mask = tf.pad( + tf.zeros(cutout_shape, dtype=image.dtype), + padding_dims, + constant_values=1) + mask = tf.expand_dims(mask, -1) + mask = tf.tile(mask, [1, 1, 3]) + + if replace is None: + fill = tf.random.normal(tf.shape(image), dtype=image.dtype) + elif isinstance(replace, tf.Tensor): + fill = replace + else: + fill = tf.ones_like(image, dtype=image.dtype) * replace + image = tf.where(tf.equal(mask, 0), fill, image) + + return image + + +def _fill_rectangle_video(image, + center_width, + center_height, + half_width, + half_height, + replace=None): + """Fills blank area for video.""" + image_time = tf.shape(image)[0] + image_height = tf.shape(image)[1] + image_width = tf.shape(image)[2] + + lower_pad = tf.maximum(0, center_height - half_height) + upper_pad = tf.maximum(0, image_height - center_height - half_height) + left_pad = tf.maximum(0, center_width - half_width) + right_pad = tf.maximum(0, image_width - center_width - half_width) + + cutout_shape = [ + image_time, image_height - (lower_pad + upper_pad), + 
image_width - (left_pad + right_pad) + ] + padding_dims = [[0, 0], [lower_pad, upper_pad], [left_pad, right_pad]] + mask = tf.pad( + tf.zeros(cutout_shape, dtype=image.dtype), + padding_dims, + constant_values=1) + mask = tf.expand_dims(mask, -1) + mask = tf.tile(mask, [1, 1, 1, 3]) + + if replace is None: + fill = tf.random.normal(tf.shape(image), dtype=image.dtype) + elif isinstance(replace, tf.Tensor): + fill = replace + else: + fill = tf.ones_like(image, dtype=image.dtype) * replace + image = tf.where(tf.equal(mask, 0), fill, image) + + return image + + +def cutout_video(image: tf.Tensor, replace: int = 0) -> tf.Tensor: + """Apply cutout (https://arxiv.org/abs/1708.04552) to a video. + + This operation applies a random size 3D mask of zeros to a random location + within `image`. The pixel values filled in will be of the + value `replace`. The location where the mask will be applied is randomly + chosen uniformly over the whole video. The half-size of the mask (how far it + extends on each side of its center) is randomly sampled uniformly from + [0.25*height, 0.5*height], [0.25*width, 0.5*width], and [1, 0.25*depth], + which represent the height, width, and number of frames of the input video + tensor respectively. + + Args: + image: A video Tensor of type uint8. + replace: What pixel value to fill in the image in the area that has the + cutout mask applied to it. + + Returns: + A video Tensor that is of type uint8. + """ + image_depth = tf.shape(image)[0] + image_height = tf.shape(image)[1] + image_width = tf.shape(image)[2] + + # Sample the center location in the image where the zero mask will be applied.
+ cutout_center_height = tf.random.uniform( + shape=[], minval=0, maxval=image_height, dtype=tf.int32) + + cutout_center_width = tf.random.uniform( + shape=[], minval=0, maxval=image_width, dtype=tf.int32) + + cutout_center_depth = tf.random.uniform( + shape=[], minval=0, maxval=image_depth, dtype=tf.int32) + + pad_size_height = tf.random.uniform( + shape=[], + minval=tf.maximum(1, tf.cast(image_height / 4, tf.int32)), + maxval=tf.maximum(2, tf.cast(image_height / 2, tf.int32)), + dtype=tf.int32) + pad_size_width = tf.random.uniform( + shape=[], + minval=tf.maximum(1, tf.cast(image_width / 4, tf.int32)), + maxval=tf.maximum(2, tf.cast(image_width / 2, tf.int32)), + dtype=tf.int32) + pad_size_depth = tf.random.uniform( + shape=[], + minval=1, + maxval=tf.maximum(2, tf.cast(image_depth / 4, tf.int32)), + dtype=tf.int32) + + lower_pad = tf.maximum(0, cutout_center_height - pad_size_height) + upper_pad = tf.maximum( + 0, image_height - cutout_center_height - pad_size_height) + left_pad = tf.maximum(0, cutout_center_width - pad_size_width) + right_pad = tf.maximum(0, image_width - cutout_center_width - pad_size_width) + back_pad = tf.maximum(0, cutout_center_depth - pad_size_depth) + forward_pad = tf.maximum( + 0, image_depth - cutout_center_depth - pad_size_depth) + + cutout_shape = [ + image_depth - (back_pad + forward_pad), + image_height - (lower_pad + upper_pad), + image_width - (left_pad + right_pad), + ] + padding_dims = [[back_pad, forward_pad], + [lower_pad, upper_pad], + [left_pad, right_pad]] + mask = tf.pad( + tf.zeros(cutout_shape, dtype=image.dtype), + padding_dims, + constant_values=1) + mask = tf.expand_dims(mask, -1) + mask = tf.tile(mask, [1, 1, 1, 3]) + image = tf.where( + tf.equal(mask, 0), + tf.ones_like(image, dtype=image.dtype) * replace, image) + return image + + +def solarize(image: tf.Tensor, threshold: int = 128) -> tf.Tensor: + """Solarize the input image(s).""" + # For each pixel in the image, select the pixel + # if the value is less than 
the threshold. + # Otherwise, replace the pixel with 255 minus its value. + return tf.where(image < threshold, image, 255 - image) + + +def solarize_add(image: tf.Tensor, + addition: int = 0, + threshold: int = 128) -> tf.Tensor: + """Applies additive solarization to the input image(s).""" + # For each pixel in the image less than threshold + # we add 'addition' amount to it and then clip the + # pixel value to be between 0 and 255. The value + # of 'addition' is between -128 and 128. + added_image = tf.cast(image, tf.int64) + addition + added_image = tf.cast(tf.clip_by_value(added_image, 0, 255), tf.uint8) + return tf.where(image < threshold, added_image, image) + + +def color(image: tf.Tensor, factor: float) -> tf.Tensor: + """Equivalent of PIL Color.""" + degenerate = tf.image.grayscale_to_rgb(tf.image.rgb_to_grayscale(image)) + return blend(degenerate, image, factor) + + +def contrast(image: tf.Tensor, factor: float) -> tf.Tensor: + """Equivalent of PIL Contrast.""" + degenerate = tf.image.rgb_to_grayscale(image) + # Cast before calling tf.histogram_fixed_width. + degenerate = tf.cast(degenerate, tf.int32) + + # Compute the grayscale histogram, then compute the mean pixel value, + # and create a constant image of the same size with that value. Use that as + # the blending degenerate target of the original image.
+ hist = tf.histogram_fixed_width(degenerate, [0, 255], nbins=256) + mean = tf.reduce_sum(tf.cast(hist, tf.float32)) / 256.0 + degenerate = tf.ones_like(degenerate, dtype=tf.float32) * mean + degenerate = tf.clip_by_value(degenerate, 0.0, 255.0) + degenerate = tf.image.grayscale_to_rgb(tf.cast(degenerate, tf.uint8)) + return blend(degenerate, image, factor) + + +def brightness(image: tf.Tensor, factor: float) -> tf.Tensor: + """Equivalent of PIL Brightness.""" + degenerate = tf.zeros_like(image) + return blend(degenerate, image, factor) + + +def posterize(image: tf.Tensor, bits: int) -> tf.Tensor: + """Equivalent of PIL Posterize.""" + shift = 8 - bits + return tf.bitwise.left_shift(tf.bitwise.right_shift(image, shift), shift) + + +def wrapped_rotate(image: tf.Tensor, degrees: float, replace: int) -> tf.Tensor: + """Applies rotation with wrap/unwrap.""" + image = rotate(wrap(image), degrees=degrees) + return unwrap(image, replace) + + +def translate_x(image: tf.Tensor, pixels: int, replace: int) -> tf.Tensor: + """Equivalent of PIL Translate in X dimension.""" + image = translate(wrap(image), [-pixels, 0]) + return unwrap(image, replace) + + +def translate_y(image: tf.Tensor, pixels: int, replace: int) -> tf.Tensor: + """Equivalent of PIL Translate in Y dimension.""" + image = translate(wrap(image), [0, -pixels]) + return unwrap(image, replace) + + +def shear_x(image: tf.Tensor, level: float, replace: int) -> tf.Tensor: + """Equivalent of PIL Shearing in X dimension.""" + # Shear parallel to x axis is a projective transform + # with a matrix form of: + # [1 level + # 0 1]. + image = transform( + image=wrap(image), transforms=[1., level, 0., 0., 1., 0., 0., 0.]) + return unwrap(image, replace) + + +def shear_y(image: tf.Tensor, level: float, replace: int) -> tf.Tensor: + """Equivalent of PIL Shearing in Y dimension.""" + # Shear parallel to y axis is a projective transform + # with a matrix form of: + # [1 0 + # level 1]. 
+ image = transform( + image=wrap(image), transforms=[1., 0., 0., level, 1., 0., 0., 0.]) + return unwrap(image, replace) + + +def autocontrast(image: tf.Tensor) -> tf.Tensor: + """Implements Autocontrast function from PIL using TF ops. + + Args: + image: A 3D uint8 tensor. + + Returns: + The image with autocontrast applied, of type uint8. + """ + + def scale_channel(image: tf.Tensor) -> tf.Tensor: + """Scale the 2D image using the autocontrast rule.""" + # A possibly cheaper version can be done using cumsum/unique_with_counts + # over the histogram values, rather than iterating over the entire image + # to compute mins and maxes. + lo = tf.cast(tf.reduce_min(image), tf.float32) + hi = tf.cast(tf.reduce_max(image), tf.float32) + + # Scale the image, making the lowest value 0 and the highest value 255. + def scale_values(im): + scale = 255.0 / (hi - lo) + offset = -lo * scale + im = tf.cast(im, tf.float32) * scale + offset + im = tf.clip_by_value(im, 0.0, 255.0) + return tf.cast(im, tf.uint8) + + result = tf.cond(hi > lo, lambda: scale_values(image), lambda: image) + return result + + # Assumes RGB for now. Scales each channel independently + # and then stacks the result. + s1 = scale_channel(image[..., 0]) + s2 = scale_channel(image[..., 1]) + s3 = scale_channel(image[..., 2]) + image = tf.stack([s1, s2, s3], -1) + + return image + + +def sharpness(image: tf.Tensor, factor: float) -> tf.Tensor: + """Implements Sharpness function from PIL using TF ops.""" + orig_image = image + image = tf.cast(image, tf.float32) + # Add a batch dimension for the conv operation (4D for images, 5D for videos). + image = tf.expand_dims(image, 0) + # SMOOTH PIL Kernel. + if orig_image.shape.rank == 3: + kernel = tf.constant([[1, 1, 1], [1, 5, 1], [1, 1, 1]], + dtype=tf.float32, + shape=[3, 3, 1, 1]) / 13. + # Tile across channel dimension.
+ kernel = tf.tile(kernel, [1, 1, 3, 1]) + strides = [1, 1, 1, 1] + degenerate = tf.nn.depthwise_conv2d( + image, kernel, strides, padding='VALID', dilations=[1, 1]) + elif orig_image.shape.rank == 4: + kernel = tf.constant([[1, 1, 1], [1, 5, 1], [1, 1, 1]], + dtype=tf.float32, + shape=[1, 3, 3, 1, 1]) / 13. + strides = [1, 1, 1, 1, 1] + # Run the kernel across each channel separately. + channels = tf.split(image, 3, axis=-1) + degenerates = [ + tf.nn.conv3d(channel, kernel, strides, padding='VALID', + dilations=[1, 1, 1, 1, 1]) + for channel in channels + ] + degenerate = tf.concat(degenerates, -1) + else: + raise ValueError('Bad image rank: {}'.format(orig_image.shape.rank)) + degenerate = tf.clip_by_value(degenerate, 0.0, 255.0) + degenerate = tf.squeeze(tf.cast(degenerate, tf.uint8), [0]) + + # For the borders of the resulting image, fill in the values of the + # original image. + mask = tf.ones_like(degenerate) + paddings = [[0, 0]] * (orig_image.shape.rank - 3) + padded_mask = tf.pad(mask, paddings + [[1, 1], [1, 1], [0, 0]]) + padded_degenerate = tf.pad(degenerate, paddings + [[1, 1], [1, 1], [0, 0]]) + result = tf.where(tf.equal(padded_mask, 1), padded_degenerate, orig_image) + + # Blend the final result. + return blend(result, orig_image, factor) + + +def equalize(image: tf.Tensor) -> tf.Tensor: + """Implements Equalize function from PIL using TF ops.""" + + def scale_channel(im, c): + """Scale the data in the channel to implement equalize.""" + im = tf.cast(im[..., c], tf.int32) + # Compute the histogram of the image channel. + histo = tf.histogram_fixed_width(im, [0, 255], nbins=256) + + # For the purposes of computing the step, filter out the zeros. + nonzero = tf.where(tf.not_equal(histo, 0)) + nonzero_histo = tf.reshape(tf.gather(histo, nonzero), [-1]) + step = (tf.reduce_sum(nonzero_histo) - nonzero_histo[-1]) // 255 + + def build_lut(histo, step): + # Compute the cumulative sum, shifting by step // 2 + # and then normalize by step.
+ lut = (tf.cumsum(histo) + (step // 2)) // step + # Shift lut, prepending with 0. + lut = tf.concat([[0], lut[:-1]], 0) + # Clip the counts to be in range. This is done + # in the C code for image.point. + return tf.clip_by_value(lut, 0, 255) + + # If step is zero, return the original image. Otherwise, build + # lut from the full histogram and step and then index from it. + result = tf.cond( + tf.equal(step, 0), lambda: im, + lambda: tf.gather(build_lut(histo, step), im)) + + return tf.cast(result, tf.uint8) + + # Assumes RGB for now. Scales each channel independently + # and then stacks the result. + s1 = scale_channel(image, 0) + s2 = scale_channel(image, 1) + s3 = scale_channel(image, 2) + image = tf.stack([s1, s2, s3], -1) + return image + + +def invert(image: tf.Tensor) -> tf.Tensor: + """Inverts the image pixels.""" + image = tf.convert_to_tensor(image) + return 255 - image + + +def wrap(image: tf.Tensor) -> tf.Tensor: + """Returns 'image' with an extra channel set to all 1s.""" + shape = tf.shape(image) + extended_channel = tf.expand_dims(tf.ones(shape[:-1], image.dtype), -1) + extended = tf.concat([image, extended_channel], axis=-1) + return extended + + +def unwrap(image: tf.Tensor, replace: int) -> tf.Tensor: + """Unwraps an image produced by wrap. + + At every spatial position where the last channel is 0, the other three + channels are filled with `replace` (for example, gray [128, 128, 128]). + Operations like translate and shear on a wrapped + Tensor will leave 0s in empty locations. Some transformations look + at the intensity of values to do preprocessing, and we want these + empty pixels to assume the 'average' value, rather than pure black. + + Args: + image: A 3D image Tensor with 4 channels. + replace: A one or three value 1D tensor to fill empty pixels. + + Returns: + image: A 3D image Tensor with 3 channels. + """ + image_shape = tf.shape(image) + # Flatten the spatial dimensions.
+ flattened_image = tf.reshape(image, [-1, image_shape[-1]]) + + # Find all pixels where the last channel is zero. + alpha_channel = tf.expand_dims(flattened_image[..., 3], axis=-1) + + replace = tf.concat([replace, tf.ones([1], image.dtype)], 0) + + # Where they are zero, fill them in with 'replace'. + flattened_image = tf.where( + tf.equal(alpha_channel, 0), + tf.ones_like(flattened_image, dtype=image.dtype) * replace, + flattened_image) + + image = tf.reshape(flattened_image, image_shape) + image = tf.slice( + image, + [0] * image.shape.rank, + tf.concat([image_shape[:-1], [3]], -1)) + return image + + +def _scale_bbox_only_op_probability(prob): + """Reduce the probability of the bbox-only operation. + + Probability is reduced so that we do not distort the content of too many + bounding boxes that are close to each other. The value of 3.0 is a + hyperparameter chosen when designing the autoaugment algorithm and was + found empirically to work well. + + Args: + prob: Float that is the probability of applying the bbox-only operation. + + Returns: + Reduced probability. + """ + return prob / 3.0 + + +def _apply_bbox_augmentation(image, bbox, augmentation_func, *args): + """Applies augmentation_func to the subsection of image indicated by bbox. + + Args: + image: 3D uint8 Tensor. + bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x) + of type float that represents the normalized coordinates between 0 and 1. + augmentation_func: Augmentation function that will be applied to the + subsection of image. + *args: Additional parameters that will be passed into augmentation_func + when it is called. + + Returns: + A modified version of image, where the bbox location in the image will + have `augmentation_func` applied to it.
+ """ + image_height = tf.cast(tf.shape(image)[0], tf.float32) + image_width = tf.cast(tf.shape(image)[1], tf.float32) + min_y = tf.cast(image_height * bbox[0], tf.int32) + min_x = tf.cast(image_width * bbox[1], tf.int32) + max_y = tf.cast(image_height * bbox[2], tf.int32) + max_x = tf.cast(image_width * bbox[3], tf.int32) + image_height = tf.cast(image_height, tf.int32) + image_width = tf.cast(image_width, tf.int32) + + # Clip to be sure the max values do not fall out of range. + max_y = tf.minimum(max_y, image_height - 1) + max_x = tf.minimum(max_x, image_width - 1) + + # Get the sub-tensor that is the image within the bounding box region. + bbox_content = image[min_y:max_y + 1, min_x:max_x + 1, :] + + # Apply the augmentation function to the bbox portion of the image. + augmented_bbox_content = augmentation_func(bbox_content, *args) + + # Pad the augmented_bbox_content and the mask to match the shape of the + # original image. + augmented_bbox_content = tf.pad(augmented_bbox_content, + [[min_y, (image_height - 1) - max_y], + [min_x, (image_width - 1) - max_x], + [0, 0]]) + + # Create a mask that will be used to zero out a part of the original image. + mask_tensor = tf.zeros_like(bbox_content) + + mask_tensor = tf.pad(mask_tensor, + [[min_y, (image_height - 1) - max_y], + [min_x, (image_width - 1) - max_x], + [0, 0]], + constant_values=1) + # Replace the old bbox content with the new augmented content. + image = image * mask_tensor + augmented_bbox_content + return image + + +def _concat_bbox(bbox, bboxes): + """Helper function that concatenates bbox to bboxes along the first dimension.""" + + # Note: if all elements in bboxes are -1 (_INVALID_BOX), then this means + # we discard bboxes and start the bboxes Tensor with the current bbox.
+ bboxes_sum_check = tf.reduce_sum(bboxes) + bbox = tf.expand_dims(bbox, 0) + # This check will be true when it is an _INVALID_BOX + bboxes = tf.cond(tf.equal(bboxes_sum_check, -4.0), + lambda: bbox, + lambda: tf.concat([bboxes, bbox], 0)) + return bboxes + + +def _apply_bbox_augmentation_wrapper(image, bbox, new_bboxes, prob, + augmentation_func, func_changes_bbox, + *args): + """Applies _apply_bbox_augmentation with probability prob. + + Args: + image: 3D uint8 Tensor. + bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x) + of type float that represents the normalized coordinates between 0 and 1. + new_bboxes: 2D Tensor that is a list of the bboxes in the image after they + have been altered by augmentation_func. These will only be changed when + func_changes_bbox is set to true. Each bbox has 4 elements + (min_y, min_x, max_y, max_x) of type float that are the normalized + bbox coordinates between 0 and 1. + prob: Float that is the probability of applying _apply_bbox_augmentation. + augmentation_func: Augmentation function that will be applied to the + subsection of image. + func_changes_bbox: Boolean. Whether augmentation_func returns bbox in + addition to image. + *args: Additional parameters that will be passed into augmentation_func + when it is called. + + Returns: + A tuple. The first element is a modified version of image, where the bbox + location in the image will have augmentation_func applied to it if it is + chosen to be called with probability `prob`. The second element is a + 2D Tensor of bboxes, each of length 4, that will contain the altered bbox + after applying augmentation_func.
+ """ + should_apply_op = tf.cast( + tf.floor(tf.random.uniform([], dtype=tf.float32) + prob), tf.bool) + if func_changes_bbox: + augmented_image, bbox = tf.cond( + should_apply_op, + lambda: augmentation_func(image, bbox, *args), + lambda: (image, bbox)) + else: + augmented_image = tf.cond( + should_apply_op, + lambda: _apply_bbox_augmentation(image, bbox, augmentation_func, *args), + lambda: image) + new_bboxes = _concat_bbox(bbox, new_bboxes) + return augmented_image, new_bboxes + + +def _apply_multi_bbox_augmentation_wrapper(image, bboxes, prob, aug_func, + func_changes_bbox, *args): + """Checks that num_bboxes > 0 before calling the inner function.""" + num_bboxes = tf.shape(bboxes)[0] + image, bboxes = tf.cond( + tf.equal(num_bboxes, 0), + lambda: (image, bboxes), + # pylint:disable=g-long-lambda + lambda: _apply_multi_bbox_augmentation( + image, bboxes, prob, aug_func, func_changes_bbox, *args)) + # pylint:enable=g-long-lambda + return image, bboxes + + +# Represents an invalid bounding box, used as a placeholder to pad (and detect +# padding in) lists of bounding box coordinates for a few augmentation operations. +_INVALID_BOX = [[-1.0, -1.0, -1.0, -1.0]] + + +def _apply_multi_bbox_augmentation(image, bboxes, prob, aug_func, + func_changes_bbox, *args): + """Applies aug_func to the image for each bbox in bboxes. + + Args: + image: 3D uint8 Tensor. + bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox + has 4 elements (min_y, min_x, max_y, max_x) of type float. + prob: Float that is the probability of applying aug_func to a specific + bounding box within the image. + aug_func: Augmentation function that will be applied to the + subsections of image indicated by the bbox values in bboxes. + func_changes_bbox: Boolean. Whether aug_func returns bbox in addition + to image. + *args: Additional parameters that will be passed into aug_func + when it is called.
+ + Returns: + A modified version of image, where each bbox location in the image will + have aug_func applied to it if it is chosen to be called with + probability prob independently across all bboxes. The final bboxes are + also returned: unchanged if func_changes_bbox is false, otherwise the + newly altered ones. + + Raises: + ValueError: If applied to video. + """ + if image.shape.rank == 4: + raise ValueError('Image rank 4 is not supported') + + # Will keep track of the new altered bboxes after aug_func is repeatedly + # applied. The -1 values are dummy values and this first Tensor will be + # removed upon appending the first real bbox. + new_bboxes = tf.constant(_INVALID_BOX) + + # If the bboxes are empty, then just give it _INVALID_BOX. The result + # will be thrown away. + bboxes = tf.cond(tf.equal(tf.size(bboxes), 0), + lambda: tf.constant(_INVALID_BOX), + lambda: bboxes) + + bboxes = tf.ensure_shape(bboxes, (None, 4)) + + # pylint:disable=g-long-lambda + wrapped_aug_func = ( + lambda _image, bbox, _new_bboxes: _apply_bbox_augmentation_wrapper( + _image, bbox, _new_bboxes, prob, aug_func, func_changes_bbox, *args)) + # pylint:enable=g-long-lambda + + # Set up the while_loop. + num_bboxes = tf.shape(bboxes)[0] # We loop until we go over all bboxes. + idx = tf.constant(0) # Counter for the while loop. + + # Loop condition: stop once all bboxes have been visited. + # images_and_bboxes contains (_image, _new_bboxes). + cond = lambda _idx, _images_and_bboxes: tf.less(_idx, num_bboxes) + + # Shuffle the bboxes so that the augmentation order is not deterministic if + # we are not changing the bboxes with aug_func. + if not func_changes_bbox: + loop_bboxes = tf.random.shuffle(bboxes) + else: + loop_bboxes = bboxes + + # Main function of while_loop where we repeatedly apply augmentation on the + # bboxes in the image.
+ # pylint:disable=g-long-lambda + body = lambda _idx, _images_and_bboxes: [ + _idx + 1, wrapped_aug_func(_images_and_bboxes[0], + loop_bboxes[_idx], + _images_and_bboxes[1])] + # pylint:enable=g-long-lambda + + _, (image, new_bboxes) = tf.while_loop( + cond, body, [idx, (image, new_bboxes)], + shape_invariants=[idx.get_shape(), + (image.get_shape(), tf.TensorShape([None, 4]))]) + + # Either return the altered bboxes or the original ones depending on whether + # we altered them in any way. + if func_changes_bbox: + final_bboxes = new_bboxes + else: + final_bboxes = bboxes + return image, final_bboxes + + +def _clip_bbox(min_y, min_x, max_y, max_x): + """Clips bounding box coordinates to the range [0, 1]. + + Args: + min_y: Normalized bbox coordinate of type float between 0 and 1. + min_x: Normalized bbox coordinate of type float between 0 and 1. + max_y: Normalized bbox coordinate of type float between 0 and 1. + max_x: Normalized bbox coordinate of type float between 0 and 1. + + Returns: + Clipped coordinate values between 0 and 1. + """ + min_y = tf.clip_by_value(min_y, 0.0, 1.0) + min_x = tf.clip_by_value(min_x, 0.0, 1.0) + max_y = tf.clip_by_value(max_y, 0.0, 1.0) + max_x = tf.clip_by_value(max_x, 0.0, 1.0) + return min_y, min_x, max_y, max_x + + +def _check_bbox_area(min_y, min_x, max_y, max_x, delta=0.05): + """Adjusts bbox coordinates to make sure the area is > 0. + + Args: + min_y: Normalized bbox coordinate of type float between 0 and 1. + min_x: Normalized bbox coordinate of type float between 0 and 1. + max_y: Normalized bbox coordinate of type float between 0 and 1. + max_x: Normalized bbox coordinate of type float between 0 and 1. + delta: Float, this is used to create a gap of size delta between + bbox min/max coordinates that are the same on the boundary (0 or 1). + This prevents the bbox from having an area of zero. + + Returns: + Tuple of new bbox coordinates between 0 and 1 that will now have a + guaranteed area > 0.
+ """ + height = max_y - min_y + width = max_x - min_x + def _adjust_bbox_boundaries(min_coord, max_coord): + # Make sure max is at least delta and min is at most 1 - delta. + max_coord = tf.maximum(max_coord, 0.0 + delta) + min_coord = tf.minimum(min_coord, 1.0 - delta) + return min_coord, max_coord + min_y, max_y = tf.cond(tf.equal(height, 0.0), + lambda: _adjust_bbox_boundaries(min_y, max_y), + lambda: (min_y, max_y)) + min_x, max_x = tf.cond(tf.equal(width, 0.0), + lambda: _adjust_bbox_boundaries(min_x, max_x), + lambda: (min_x, max_x)) + return min_y, min_x, max_y, max_x + + +def _rotate_bbox(bbox, image_height, image_width, degrees): + """Rotates the bbox coordinates by degrees. + + Args: + bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x) + of type float that represents the normalized coordinates between 0 and 1. + image_height: Int, height of the image. + image_width: Int, width of the image. + degrees: Float, a scalar angle in degrees to rotate all images by. If + degrees is positive the image will be rotated clockwise otherwise it will + be rotated counterclockwise. + + Returns: + A tensor of the same shape as bbox, but now with the rotated coordinates. + """ + image_height, image_width = ( + tf.cast(image_height, tf.float32), tf.cast(image_width, tf.float32)) + + # Convert from degrees to radians. + degrees_to_radians = math.pi / 180.0 + radians = degrees * degrees_to_radians + + # Translate the bbox to the center of the image and turn the normalized 0-1 + # coordinates to absolute pixel locations. + # Y coordinates are made negative as the y axis of images goes down with + # increasing pixel values, so we negate to make sure x axis and y axis points + # are in the traditionally positive direction.
+ min_y = -tf.cast(image_height * (bbox[0] - 0.5), tf.int32) + min_x = tf.cast(image_width * (bbox[1] - 0.5), tf.int32) + max_y = -tf.cast(image_height * (bbox[2] - 0.5), tf.int32) + max_x = tf.cast(image_width * (bbox[3] - 0.5), tf.int32) + coordinates = tf.stack( + [[min_y, min_x], [min_y, max_x], [max_y, min_x], [max_y, max_x]]) + coordinates = tf.cast(coordinates, tf.float32) + # Rotate the coordinates according to the rotation matrix, clockwise if + # radians is positive, else counterclockwise. + rotation_matrix = tf.stack( + [[tf.cos(radians), tf.sin(radians)], + [-tf.sin(radians), tf.cos(radians)]]) + new_coords = tf.cast( + tf.matmul(rotation_matrix, tf.transpose(coordinates)), tf.int32) + # Find min/max values and convert them back to normalized 0-1 floats. + min_y = -( + tf.cast(tf.reduce_max(new_coords[0, :]), tf.float32) / image_height - 0.5) + min_x = tf.cast(tf.reduce_min(new_coords[1, :]), + tf.float32) / image_width + 0.5 + max_y = -( + tf.cast(tf.reduce_min(new_coords[0, :]), tf.float32) / image_height - 0.5) + max_x = tf.cast(tf.reduce_max(new_coords[1, :]), + tf.float32) / image_width + 0.5 + + # Clip the bboxes to be sure they fall between [0, 1]. + min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x) + min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x) + return tf.stack([min_y, min_x, max_y, max_x]) + + +def rotate_with_bboxes(image, bboxes, degrees, replace): + """Equivalent of PIL Rotate that rotates the image and bbox. + + Args: + image: 3D uint8 Tensor. + bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox + has 4 elements (min_y, min_x, max_y, max_x) of type float. + degrees: Float, a scalar angle in degrees to rotate all images by. If + degrees is positive the image will be rotated clockwise otherwise it will + be rotated counterclockwise. + replace: A one or three value 1D tensor to fill empty pixels.
+ + Returns: + A tuple containing a 3D uint8 Tensor that will be the result of rotating + image by degrees. The second element of the tuple is bboxes, where now + the coordinates will be shifted to reflect the rotated image. + + Raises: + ValueError: If applied to video. + """ + if image.shape.rank == 4: + raise ValueError('Image rank 4 is not supported') + + # Rotate the image. + image = wrapped_rotate(image, degrees, replace) + + # Convert bbox coordinates to pixel values. + image_height = tf.shape(image)[0] + image_width = tf.shape(image)[1] + # pylint:disable=g-long-lambda + wrapped_rotate_bbox = lambda bbox: _rotate_bbox( + bbox, image_height, image_width, degrees) + # pylint:enable=g-long-lambda + bboxes = tf.map_fn(wrapped_rotate_bbox, bboxes) + return image, bboxes + + +def _shear_bbox(bbox, image_height, image_width, level, shear_horizontal): + """Shifts the bbox according to how the image was sheared. + + Args: + bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x) + of type float that represents the normalized coordinates between 0 and 1. + image_height: Int, height of the image. + image_width: Int, width of the image. + level: Float. How much to shear the image. + shear_horizontal: If true then shear in X dimension else shear in + the Y dimension. + + Returns: + A tensor of the same shape as bbox, but now with the shifted coordinates. + """ + image_height, image_width = ( + tf.cast(image_height, tf.float32), tf.cast(image_width, tf.float32)) + + # Change bbox coordinates to be pixels. + min_y = tf.cast(image_height * bbox[0], tf.int32) + min_x = tf.cast(image_width * bbox[1], tf.int32) + max_y = tf.cast(image_height * bbox[2], tf.int32) + max_x = tf.cast(image_width * bbox[3], tf.int32) + coordinates = tf.stack( + [[min_y, min_x], [min_y, max_x], [max_y, min_x], [max_y, max_x]]) + coordinates = tf.cast(coordinates, tf.float32) + + # Shear the coordinates according to the translation matrix.
+ if shear_horizontal: + translation_matrix = tf.stack( + [[1, 0], [-level, 1]]) + else: + translation_matrix = tf.stack( + [[1, -level], [0, 1]]) + translation_matrix = tf.cast(translation_matrix, tf.float32) + new_coords = tf.cast( + tf.matmul(translation_matrix, tf.transpose(coordinates)), tf.int32) + + # Find min/max values and convert them back to floats. + min_y = tf.cast(tf.reduce_min(new_coords[0, :]), tf.float32) / image_height + min_x = tf.cast(tf.reduce_min(new_coords[1, :]), tf.float32) / image_width + max_y = tf.cast(tf.reduce_max(new_coords[0, :]), tf.float32) / image_height + max_x = tf.cast(tf.reduce_max(new_coords[1, :]), tf.float32) / image_width + + # Clip the bboxes to be sure they fall between [0, 1]. + min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x) + min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x) + return tf.stack([min_y, min_x, max_y, max_x]) + + +def shear_with_bboxes(image, bboxes, level, replace, shear_horizontal): + """Applies a shear transformation to the image and shifts the bboxes. + + Args: + image: 3D uint8 Tensor. + bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox + has 4 elements (min_y, min_x, max_y, max_x) of type float with values + in [0, 1]. + level: Float. How much to shear the image. This value will be between + -0.3 and 0.3. + replace: A one or three value 1D tensor to fill empty pixels. + shear_horizontal: Boolean. If true then shear in X dimension else shear in + the Y dimension. + + Returns: + A tuple containing a 3D uint8 Tensor that will be the result of shearing + image by level. The second element of the tuple is bboxes, where now + the coordinates will be shifted to reflect the sheared image. + + Raises: + ValueError: If applied to video.
+ """ + if image.shape.rank == 4: + raise ValueError('Image rank 4 is not supported') + + if shear_horizontal: + image = shear_x(image, level, replace) + else: + image = shear_y(image, level, replace) + + # Convert bbox coordinates to pixel values. + image_height = tf.shape(image)[0] + image_width = tf.shape(image)[1] + # pylint:disable=g-long-lambda + wrapped_shear_bbox = lambda bbox: _shear_bbox( + bbox, image_height, image_width, level, shear_horizontal) + # pylint:enable=g-long-lambda + bboxes = tf.map_fn(wrapped_shear_bbox, bboxes) + return image, bboxes + + +def _shift_bbox(bbox, image_height, image_width, pixels, shift_horizontal): + """Shifts the bbox coordinates by pixels. + + Args: + bbox: 1D Tensor that has 4 elements (min_y, min_x, max_y, max_x) + of type float that represents the normalized coordinates between 0 and 1. + image_height: Int, height of the image. + image_width: Int, width of the image. + pixels: An int. How many pixels to shift the bbox. + shift_horizontal: Boolean. If true then shift in X dimension else shift in + Y dimension. + + Returns: + A tensor of the same shape as bbox, but now with the shifted coordinates. + """ + pixels = tf.cast(pixels, tf.int32) + # Convert bbox to integer pixel locations. + min_y = tf.cast(tf.cast(image_height, tf.float32) * bbox[0], tf.int32) + min_x = tf.cast(tf.cast(image_width, tf.float32) * bbox[1], tf.int32) + max_y = tf.cast(tf.cast(image_height, tf.float32) * bbox[2], tf.int32) + max_x = tf.cast(tf.cast(image_width, tf.float32) * bbox[3], tf.int32) + + if shift_horizontal: + min_x = tf.maximum(0, min_x - pixels) + max_x = tf.minimum(image_width, max_x - pixels) + else: + min_y = tf.maximum(0, min_y - pixels) + max_y = tf.minimum(image_height, max_y - pixels) + + # Convert bbox back to floats. 
+ min_y = tf.cast(min_y, tf.float32) / tf.cast(image_height, tf.float32)
+ min_x = tf.cast(min_x, tf.float32) / tf.cast(image_width, tf.float32)
+ max_y = tf.cast(max_y, tf.float32) / tf.cast(image_height, tf.float32)
+ max_x = tf.cast(max_x, tf.float32) / tf.cast(image_width, tf.float32)
+
+ # Clip the bboxes to be sure they fall between [0, 1].
+ min_y, min_x, max_y, max_x = _clip_bbox(min_y, min_x, max_y, max_x)
+ min_y, min_x, max_y, max_x = _check_bbox_area(min_y, min_x, max_y, max_x)
+ return tf.stack([min_y, min_x, max_y, max_x])
+
+
+def translate_bbox(image, bboxes, pixels, replace, shift_horizontal):
+ """Equivalent of PIL Translate in X/Y dimension that shifts image and bbox.
+
+ Args:
+ image: 3D uint8 Tensor.
+ bboxes: 2D Tensor that is a list of the bboxes in the image. Each bbox
+ has 4 elements (min_y, min_x, max_y, max_x) of type float with values
+ between [0, 1].
+ pixels: An int. How many pixels to shift the image and bboxes.
+ replace: A one or three value 1D tensor to fill empty pixels.
+ shift_horizontal: Boolean. If true then shift in X dimension else shift in
+ Y dimension.
+
+ Returns:
+ A tuple containing a 3D uint8 Tensor that will be the result of translating
+ image by pixels. The second element of the tuple is bboxes, where now
+ the coordinates will be shifted to reflect the shifted image.
+
+ Raises:
+ ValueError: If applied to video.
+ """
+ if image.shape.rank == 4:
+ raise ValueError('Image rank 4 is not supported')
+
+ if shift_horizontal:
+ image = translate_x(image, pixels, replace)
+ else:
+ image = translate_y(image, pixels, replace)
+
+ # Convert bbox coordinates to pixel values.
+ image_height = tf.shape(image)[0] + image_width = tf.shape(image)[1] + # pylint:disable=g-long-lambda + wrapped_shift_bbox = lambda bbox: _shift_bbox( + bbox, image_height, image_width, pixels, shift_horizontal) + # pylint:enable=g-long-lambda + bboxes = tf.map_fn(wrapped_shift_bbox, bboxes) + return image, bboxes + + +def translate_y_only_bboxes( + image: tf.Tensor, bboxes: tf.Tensor, prob: float, pixels: int, replace): + """Apply translate_y to each bbox in the image with probability prob.""" + if bboxes.shape.rank == 4: + raise ValueError('translate_y_only_bboxes does not support rank 4 boxes') + + func_changes_bbox = False + prob = _scale_bbox_only_op_probability(prob) + return _apply_multi_bbox_augmentation_wrapper( + image, bboxes, prob, translate_y, func_changes_bbox, pixels, replace) + + +def _randomly_negate_tensor(tensor): + """With 50% prob turn the tensor negative.""" + should_flip = tf.cast(tf.floor(tf.random.uniform([]) + 0.5), tf.bool) + final_tensor = tf.cond(should_flip, lambda: tensor, lambda: -tensor) + return final_tensor + + +def _rotate_level_to_arg(level: float): + level = (level / _MAX_LEVEL) * 30. + level = _randomly_negate_tensor(level) + return (level,) + + +def _shrink_level_to_arg(level: float): + """Converts level to ratio by which we shrink the image content.""" + if level == 0: + return (1.0,) # if level is zero, do not shrink the image + # Maximum shrinking ratio is 2.9. + level = 2. / (_MAX_LEVEL / level) + 0.9 + return (level,) + + +def _enhance_level_to_arg(level: float): + return ((level / _MAX_LEVEL) * 1.8 + 0.1,) + + +def _shear_level_to_arg(level: float): + level = (level / _MAX_LEVEL) * 0.3 + # Flip level to negative with 50% chance. + level = _randomly_negate_tensor(level) + return (level,) + + +def _translate_level_to_arg(level: float, translate_const: float): + level = (level / _MAX_LEVEL) * float(translate_const) + # Flip level to negative with 50% chance. 
+ level = _randomly_negate_tensor(level) + return (level,) + + +def _mult_to_arg(level: float, multiplier: float = 1.): + return (int((level / _MAX_LEVEL) * multiplier),) + + +def _apply_func_with_prob(func: Any, image: tf.Tensor, + bboxes: Optional[tf.Tensor], args: Any, prob: float): + """Apply `func` to image w/ `args` as input with probability `prob`.""" + assert isinstance(args, tuple) + assert inspect.getfullargspec(func)[0][1] == 'bboxes' + + # Apply the function with probability `prob`. + should_apply_op = tf.cast( + tf.floor(tf.random.uniform([], dtype=tf.float32) + prob), tf.bool) + augmented_image, augmented_bboxes = tf.cond( + should_apply_op, + lambda: func(image, bboxes, *args), + lambda: (image, bboxes)) + return augmented_image, augmented_bboxes + + +def select_and_apply_random_policy(policies: Any, + image: tf.Tensor, + bboxes: Optional[tf.Tensor] = None): + """Select a random policy from `policies` and apply it to `image`.""" + policy_to_select = tf.random.uniform([], maxval=len(policies), dtype=tf.int32) + # Note that using tf.case instead of tf.conds would result in significantly + # larger graphs and would even break export for some larger policies. 
+ for (i, policy) in enumerate(policies): + image, bboxes = tf.cond( + tf.equal(i, policy_to_select), + lambda selected_policy=policy: selected_policy(image, bboxes), + lambda: (image, bboxes)) + return image, bboxes + + +NAME_TO_FUNC = { + 'AutoContrast': autocontrast, + 'Equalize': equalize, + 'Invert': invert, + 'Rotate': wrapped_rotate, + 'Posterize': posterize, + 'Solarize': solarize, + 'SolarizeAdd': solarize_add, + 'Color': color, + 'Contrast': contrast, + 'Brightness': brightness, + 'Sharpness': sharpness, + 'ShearX': shear_x, + 'ShearY': shear_y, + 'TranslateX': translate_x, + 'TranslateY': translate_y, + 'Cutout': cutout, + 'Rotate_BBox': rotate_with_bboxes, + # pylint:disable=g-long-lambda + 'ShearX_BBox': lambda image, bboxes, level, replace: shear_with_bboxes( + image, bboxes, level, replace, shear_horizontal=True), + 'ShearY_BBox': lambda image, bboxes, level, replace: shear_with_bboxes( + image, bboxes, level, replace, shear_horizontal=False), + 'TranslateX_BBox': lambda image, bboxes, pixels, replace: translate_bbox( + image, bboxes, pixels, replace, shift_horizontal=True), + 'TranslateY_BBox': lambda image, bboxes, pixels, replace: translate_bbox( + image, bboxes, pixels, replace, shift_horizontal=False), + # pylint:enable=g-long-lambda + 'TranslateY_Only_BBoxes': translate_y_only_bboxes, +} + +# Functions that require a `bboxes` parameter. 
+REQUIRE_BOXES_FUNCS = frozenset({ + 'Rotate_BBox', + 'ShearX_BBox', + 'ShearY_BBox', + 'TranslateX_BBox', + 'TranslateY_BBox', + 'TranslateY_Only_BBoxes', +}) + +# Functions that have a 'prob' parameter +PROB_FUNCS = frozenset({ + 'TranslateY_Only_BBoxes', +}) + +# Functions that have a 'replace' parameter +REPLACE_FUNCS = frozenset({ + 'Rotate', + 'TranslateX', + 'ShearX', + 'ShearY', + 'TranslateY', + 'Cutout', + 'Rotate_BBox', + 'ShearX_BBox', + 'ShearY_BBox', + 'TranslateX_BBox', + 'TranslateY_BBox', + 'TranslateY_Only_BBoxes', +}) + + +def level_to_arg(cutout_const: float, translate_const: float): + """Creates a dict mapping image operation names to their arguments.""" + + no_arg = lambda level: () + posterize_arg = lambda level: _mult_to_arg(level, 4) + solarize_arg = lambda level: _mult_to_arg(level, 256) + solarize_add_arg = lambda level: _mult_to_arg(level, 110) + cutout_arg = lambda level: _mult_to_arg(level, cutout_const) + translate_arg = lambda level: _translate_level_to_arg(level, translate_const) + translate_bbox_arg = lambda level: _translate_level_to_arg(level, 120) + + args = { + 'AutoContrast': no_arg, + 'Equalize': no_arg, + 'Invert': no_arg, + 'Rotate': _rotate_level_to_arg, + 'Posterize': posterize_arg, + 'Solarize': solarize_arg, + 'SolarizeAdd': solarize_add_arg, + 'Color': _enhance_level_to_arg, + 'Contrast': _enhance_level_to_arg, + 'Brightness': _enhance_level_to_arg, + 'Sharpness': _enhance_level_to_arg, + 'ShearX': _shear_level_to_arg, + 'ShearY': _shear_level_to_arg, + 'Cutout': cutout_arg, + 'TranslateX': translate_arg, + 'TranslateY': translate_arg, + 'Rotate_BBox': _rotate_level_to_arg, + 'ShearX_BBox': _shear_level_to_arg, + 'ShearY_BBox': _shear_level_to_arg, + # pylint:disable=g-long-lambda + 'TranslateX_BBox': lambda level: _translate_level_to_arg( + level, translate_const), + 'TranslateY_BBox': lambda level: _translate_level_to_arg( + level, translate_const), + # pylint:enable=g-long-lambda + 'TranslateY_Only_BBoxes': 
translate_bbox_arg,
+ }
+ return args
+
+
+def bbox_wrapper(func):
+ """Adds a bboxes function argument to func and returns unchanged bboxes."""
+ def wrapper(images, bboxes, *args, **kwargs):
+ return (func(images, *args, **kwargs), bboxes)
+ return wrapper
+
+
+def _parse_policy_info(name: Text,
+ prob: float,
+ level: float,
+ replace_value: List[int],
+ cutout_const: float,
+ translate_const: float,
+ level_std: float = 0.) -> Tuple[Any, float, Any]:
+ """Returns the function that corresponds to `name` and its parsed args."""
+ func = NAME_TO_FUNC[name]
+
+ if level_std > 0:
+ # Jitter the magnitude with Gaussian noise of std `level_std`.
+ level += tf.random.normal([], stddev=level_std, dtype=tf.float32)
+ level = tf.clip_by_value(level, 0., _MAX_LEVEL)
+
+ args = level_to_arg(cutout_const, translate_const)[name](level)
+
+ if name in PROB_FUNCS:
+ # Add in the prob arg if it is required for the function that is called.
+ args = tuple([prob] + list(args))
+
+ if name in REPLACE_FUNCS:
+ # Add in replace arg if it is required for the function that is called.
+ args = tuple(list(args) + [replace_value])
+
+ # Add bboxes as the second positional argument for the function if it does
+ # not already exist.
+ if 'bboxes' not in inspect.getfullargspec(func)[0]:
+ func = bbox_wrapper(func)
+
+ return func, prob, args
+
+
+class ImageAugment(object):
+ """Image augmentation class for applying image distortions."""
+
+ def distort(
+ self,
+ image: tf.Tensor
+ ) -> tf.Tensor:
+ """Given an image tensor, returns a distorted image with the same shape.
+
+ Args:
+ image: `Tensor` of shape [height, width, 3] or
+ [num_frames, height, width, 3] representing an image or image sequence.
+
+ Returns:
+ The augmented version of `image`.
+ """
+ raise NotImplementedError()
+
+ def distort_with_boxes(
+ self,
+ image: tf.Tensor,
+ bboxes: tf.Tensor
+ ) -> Tuple[tf.Tensor, tf.Tensor]:
+ """Distorts the image and bounding boxes.
+
+ Args:
+ image: `Tensor` of shape [height, width, 3] or
+ [num_frames, height, width, 3] representing an image or image sequence.
+ bboxes: `Tensor` of shape [num_boxes, 4] or [num_frames, num_boxes, 4]
+ representing bounding boxes for an image or image sequence.
+
+ Returns:
+ The augmented version of `image` and `bboxes`.
+ """
+ raise NotImplementedError
+
+
+class AutoAugment(ImageAugment):
+ """Applies the AutoAugment policy to images.
+
+ AutoAugment is from the paper: https://arxiv.org/abs/1805.09501.
+ """
+
+ def __init__(self,
+ augmentation_name: Text = 'v0',
+ policies: Optional[Iterable[Iterable[Tuple[Text, float,
+ float]]]] = None,
+ cutout_const: float = 100,
+ translate_const: float = 250):
+ """Applies the AutoAugment policy to images.
+
+ Args:
+ augmentation_name: The name of the AutoAugment policy to use. The
+ available options are `detection_v0`, `v0`, `test`, `simple`,
+ `reduced_cifar10`, `svhn`, `reduced_imagenet`,
+ `panoptic_deeplab_policy` and `vit`. `v0` is the policy used for all
+ of the results in the paper and was found to achieve the best results on
+ the COCO dataset. Make sure to set `policies` to `None` (the default) if
+ you want to set options using `augmentation_name`.
+ policies: list of lists of tuples in the form `(func, prob, level)`,
+ `func` is a string name of the augmentation function, `prob` is the
+ probability of applying the `func` operation, `level` (or magnitude) is
+ the input argument for `func`. For example:
+ ```
+ [[('Equalize', 0.9, 3), ('Color', 0.7, 8)],
+ [('Invert', 0.6, 5), ('Rotate', 0.2, 9)], ...]
+ ```
+ `policies` must be 3-D, with shape `(num_sub_policies, num_ops, 3)`,
+ so every sub-policy must contain the same number of operations (see
+ `_check_policy_shape`).
+ If you provide `policies` as input, any option set with
+ `augmentation_name` is overridden, as the two are mutually exclusive.
+ cutout_const: multiplier for applying cutout.
+ translate_const: multiplier for applying translation. + + Raises: + ValueError if `augmentation_name` is unsupported. + """ + super(AutoAugment, self).__init__() + + self.augmentation_name = augmentation_name + self.cutout_const = float(cutout_const) + self.translate_const = float(translate_const) + self.available_policies = { + 'detection_v0': self.detection_policy_v0(), + 'v0': self.policy_v0(), + 'test': self.policy_test(), + 'simple': self.policy_simple(), + 'reduced_cifar10': self.policy_reduced_cifar10(), + 'svhn': self.policy_svhn(), + 'reduced_imagenet': self.policy_reduced_imagenet(), + 'panoptic_deeplab_policy': self.panoptic_deeplab_policy(), + 'vit': self.vit(), + } + + if not policies: + if augmentation_name not in self.available_policies: + raise ValueError( + 'Invalid augmentation_name: {}'.format(augmentation_name)) + + self.policies = self.available_policies[augmentation_name] + + else: + self._check_policy_shape(policies) + self.policies = policies + + def _check_policy_shape(self, policies): + """Checks dimension and shape of the custom policy. + + Args: + policies: List of list of tuples in the form `(func, prob, level)`. Must + have shape of `(:, :, 3)`. + + Raises: + ValueError if the shape of `policies` is unexpected. + """ + in_shape = np.array(policies).shape + if len(in_shape) != 3 or in_shape[-1:] != (3,): + raise ValueError('Wrong shape detected for custom policy. Expected ' + '(:, :, 3) but got {}.'.format(in_shape)) + + def _make_tf_policies(self): + """Prepares the TF functions for augmentations based on the policies.""" + replace_value = [128] * 3 + + # func is the string name of the augmentation function, prob is the + # probability of applying the operation and level is the parameter + # associated with the tf op. + + # tf_policies are functions that take in an image and return an augmented + # image. 
+ tf_policies = []
+ for policy in self.policies:
+ tf_policy = []
+ assert_ranges = []
+ # Link string name to the correct python function and make sure the
+ # correct argument is passed into that function.
+ for policy_info in policy:
+ _, prob, level = policy_info
+ assert_ranges.append(tf.Assert(tf.less_equal(prob, 1.), [prob]))
+ assert_ranges.append(
+ tf.Assert(tf.less_equal(level, int(_MAX_LEVEL)), [level]))
+
+ policy_info = list(policy_info) + [
+ replace_value, self.cutout_const, self.translate_const
+ ]
+ tf_policy.append(_parse_policy_info(*policy_info))
+ # Now build the tf policy that will apply the augmentation procedure
+ # on image.
+ def make_final_policy(tf_policy_):
+
+ def final_policy(image_, bboxes_):
+ for func, prob, args in tf_policy_:
+ image_, bboxes_ = _apply_func_with_prob(func, image_, bboxes_, args,
+ prob)
+ return image_, bboxes_
+
+ return final_policy
+
+ with tf.control_dependencies(assert_ranges):
+ tf_policies.append(make_final_policy(tf_policy))
+
+ return tf_policies
+
+ def distort(self, image: tf.Tensor) -> tf.Tensor:
+ """See base class."""
+ input_image_type = image.dtype
+ if input_image_type != tf.uint8:
+ image = tf.clip_by_value(image, 0.0, 255.0)
+ image = tf.cast(image, dtype=tf.uint8)
+
+ tf_policies = self._make_tf_policies()
+ image, _ = select_and_apply_random_policy(tf_policies, image, bboxes=None)
+ image = tf.cast(image, dtype=input_image_type)
+ return image
+
+ def distort_with_boxes(self, image: tf.Tensor,
+ bboxes: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
+ """See base class."""
+ input_image_type = image.dtype
+ if input_image_type != tf.uint8:
+ image = tf.clip_by_value(image, 0.0, 255.0)
+ image = tf.cast(image, dtype=tf.uint8)
+
+ tf_policies = self._make_tf_policies()
+ image, bboxes = select_and_apply_random_policy(tf_policies, image, bboxes)
+ # Restore the input dtype, as `distort` does.
+ image = tf.cast(image, dtype=input_image_type)
+ return image, bboxes
+
+ @staticmethod
+ def detection_policy_v0():
+ """Autoaugment policy that was used in AutoAugment Paper for Detection.
+ + https://arxiv.org/pdf/1906.11172 + + Each tuple is an augmentation operation of the form + (operation, probability, magnitude). Each element in policy is a + sub-policy that will be applied sequentially on the image. + + Returns: + the policy. + """ + policy = [ + [('TranslateX_BBox', 0.6, 4), ('Equalize', 0.8, 10)], + [('TranslateY_Only_BBoxes', 0.2, 2), ('Cutout', 0.8, 8)], + [('Sharpness', 0.0, 8), ('ShearX_BBox', 0.4, 0)], + [('ShearY_BBox', 1.0, 2), ('TranslateY_Only_BBoxes', 0.6, 6)], + [('Rotate_BBox', 0.6, 10), ('Color', 1.0, 6)], + ] + return policy + + @staticmethod + def policy_v0(): + """Autoaugment policy that was used in AutoAugment Paper. + + Each tuple is an augmentation operation of the form + (operation, probability, magnitude). Each element in policy is a + sub-policy that will be applied sequentially on the image. + + Returns: + the policy. + """ + + policy = [ + [('Equalize', 0.8, 1), ('ShearY', 0.8, 4)], + [('Color', 0.4, 9), ('Equalize', 0.6, 3)], + [('Color', 0.4, 1), ('Rotate', 0.6, 8)], + [('Solarize', 0.8, 3), ('Equalize', 0.4, 7)], + [('Solarize', 0.4, 2), ('Solarize', 0.6, 2)], + [('Color', 0.2, 0), ('Equalize', 0.8, 8)], + [('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)], + [('ShearX', 0.2, 9), ('Rotate', 0.6, 8)], + [('Color', 0.6, 1), ('Equalize', 1.0, 2)], + [('Invert', 0.4, 9), ('Rotate', 0.6, 0)], + [('Equalize', 1.0, 9), ('ShearY', 0.6, 3)], + [('Color', 0.4, 7), ('Equalize', 0.6, 0)], + [('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)], + [('Solarize', 0.6, 8), ('Color', 0.6, 9)], + [('Solarize', 0.2, 4), ('Rotate', 0.8, 9)], + [('Rotate', 1.0, 7), ('TranslateY', 0.8, 9)], + [('ShearX', 0.0, 0), ('Solarize', 0.8, 4)], + [('ShearY', 0.8, 0), ('Color', 0.6, 4)], + [('Color', 1.0, 0), ('Rotate', 0.6, 2)], + [('Equalize', 0.8, 4), ('Equalize', 0.0, 8)], + [('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)], + [('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)], + [('Posterize', 0.8, 2), ('Solarize', 0.6, 10)], + [('Solarize', 0.6, 8), 
('Equalize', 0.6, 1)], + [('Color', 0.8, 6), ('Rotate', 0.4, 5)], + ] + return policy + + @staticmethod + def policy_reduced_cifar10(): + """Autoaugment policy for reduced CIFAR-10 dataset. + + Result is from the AutoAugment paper: https://arxiv.org/abs/1805.09501. + + Each tuple is an augmentation operation of the form + (operation, probability, magnitude). Each element in policy is a + sub-policy that will be applied sequentially on the image. + + Returns: + the policy. + """ + policy = [ + [('Invert', 0.1, 7), ('Contrast', 0.2, 6)], + [('Rotate', 0.7, 2), ('TranslateX', 0.3, 9)], + [('Sharpness', 0.8, 1), ('Sharpness', 0.9, 3)], + [('ShearY', 0.5, 8), ('TranslateY', 0.7, 9)], + [('AutoContrast', 0.5, 8), ('Equalize', 0.9, 2)], + [('ShearY', 0.2, 7), ('Posterize', 0.3, 7)], + [('Color', 0.4, 3), ('Brightness', 0.6, 7)], + [('Sharpness', 0.3, 9), ('Brightness', 0.7, 9)], + [('Equalize', 0.6, 5), ('Equalize', 0.5, 1)], + [('Contrast', 0.6, 7), ('Sharpness', 0.6, 5)], + [('Color', 0.7, 7), ('TranslateX', 0.5, 8)], + [('Equalize', 0.3, 7), ('AutoContrast', 0.4, 8)], + [('TranslateY', 0.4, 3), ('Sharpness', 0.2, 6)], + [('Brightness', 0.9, 6), ('Color', 0.2, 8)], + [('Solarize', 0.5, 2), ('Invert', 0.0, 3)], + [('Equalize', 0.2, 0), ('AutoContrast', 0.6, 0)], + [('Equalize', 0.2, 8), ('Equalize', 0.6, 4)], + [('Color', 0.9, 9), ('Equalize', 0.6, 6)], + [('AutoContrast', 0.8, 4), ('Solarize', 0.2, 8)], + [('Brightness', 0.1, 3), ('Color', 0.7, 0)], + [('Solarize', 0.4, 5), ('AutoContrast', 0.9, 3)], + [('TranslateY', 0.9, 9), ('TranslateY', 0.7, 9)], + [('AutoContrast', 0.9, 2), ('Solarize', 0.8, 3)], + [('Equalize', 0.8, 8), ('Invert', 0.1, 3)], + [('TranslateY', 0.7, 9), ('AutoContrast', 0.9, 1)], + ] + return policy + + @staticmethod + def policy_svhn(): + """Autoaugment policy for SVHN dataset. + + Result is from the AutoAugment paper: https://arxiv.org/abs/1805.09501. + + Each tuple is an augmentation operation of the form + (operation, probability, magnitude). 
Each element in policy is a + sub-policy that will be applied sequentially on the image. + + Returns: + the policy. + """ + policy = [ + [('ShearX', 0.9, 4), ('Invert', 0.2, 3)], + [('ShearY', 0.9, 8), ('Invert', 0.7, 5)], + [('Equalize', 0.6, 5), ('Solarize', 0.6, 6)], + [('Invert', 0.9, 3), ('Equalize', 0.6, 3)], + [('Equalize', 0.6, 1), ('Rotate', 0.9, 3)], + [('ShearX', 0.9, 4), ('AutoContrast', 0.8, 3)], + [('ShearY', 0.9, 8), ('Invert', 0.4, 5)], + [('ShearY', 0.9, 5), ('Solarize', 0.2, 6)], + [('Invert', 0.9, 6), ('AutoContrast', 0.8, 1)], + [('Equalize', 0.6, 3), ('Rotate', 0.9, 3)], + [('ShearX', 0.9, 4), ('Solarize', 0.3, 3)], + [('ShearY', 0.8, 8), ('Invert', 0.7, 4)], + [('Equalize', 0.9, 5), ('TranslateY', 0.6, 6)], + [('Invert', 0.9, 4), ('Equalize', 0.6, 7)], + [('Contrast', 0.3, 3), ('Rotate', 0.8, 4)], + [('Invert', 0.8, 5), ('TranslateY', 0.0, 2)], + [('ShearY', 0.7, 6), ('Solarize', 0.4, 8)], + [('Invert', 0.6, 4), ('Rotate', 0.8, 4)], + [('ShearY', 0.3, 7), ('TranslateX', 0.9, 3)], + [('ShearX', 0.1, 6), ('Invert', 0.6, 5)], + [('Solarize', 0.7, 2), ('TranslateY', 0.6, 7)], + [('ShearY', 0.8, 4), ('Invert', 0.8, 8)], + [('ShearX', 0.7, 9), ('TranslateY', 0.8, 3)], + [('ShearY', 0.8, 5), ('AutoContrast', 0.7, 3)], + [('ShearX', 0.7, 2), ('Invert', 0.1, 5)], + ] + return policy + + @staticmethod + def policy_reduced_imagenet(): + """Autoaugment policy for reduced ImageNet dataset. + + Result is from the AutoAugment paper: https://arxiv.org/abs/1805.09501. + + Each tuple is an augmentation operation of the form + (operation, probability, magnitude). Each element in policy is a + sub-policy that will be applied sequentially on the image. + + Returns: + the policy. 
+ """ + policy = [ + [('Posterize', 0.4, 8), ('Rotate', 0.6, 9)], + [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)], + [('Equalize', 0.8, 8), ('Equalize', 0.6, 3)], + [('Posterize', 0.6, 7), ('Posterize', 0.6, 6)], + [('Equalize', 0.4, 7), ('Solarize', 0.2, 4)], + [('Equalize', 0.4, 4), ('Rotate', 0.8, 8)], + [('Solarize', 0.6, 3), ('Equalize', 0.6, 7)], + [('Posterize', 0.8, 5), ('Equalize', 1.0, 2)], + [('Rotate', 0.2, 3), ('Solarize', 0.6, 8)], + [('Equalize', 0.6, 8), ('Posterize', 0.4, 6)], + [('Rotate', 0.8, 8), ('Color', 0.4, 0)], + [('Rotate', 0.4, 9), ('Equalize', 0.6, 2)], + [('Equalize', 0.0, 7), ('Equalize', 0.8, 8)], + [('Invert', 0.6, 4), ('Equalize', 1.0, 8)], + [('Color', 0.6, 4), ('Contrast', 1.0, 8)], + [('Rotate', 0.8, 8), ('Color', 1.0, 2)], + [('Color', 0.8, 8), ('Solarize', 0.8, 7)], + [('Sharpness', 0.4, 7), ('Invert', 0.6, 8)], + [('ShearX', 0.6, 5), ('Equalize', 1.0, 9)], + [('Color', 0.4, 0), ('Equalize', 0.6, 3)], + [('Equalize', 0.4, 7), ('Solarize', 0.2, 4)], + [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)], + [('Invert', 0.6, 4), ('Equalize', 1.0, 8)], + [('Color', 0.6, 4), ('Contrast', 1.0, 8)], + [('Equalize', 0.8, 8), ('Equalize', 0.6, 3)] + ] + return policy + + @staticmethod + def policy_simple(): + """Same as `policy_v0`, except with custom ops removed.""" + + policy = [ + [('Color', 0.4, 9), ('Equalize', 0.6, 3)], + [('Solarize', 0.8, 3), ('Equalize', 0.4, 7)], + [('Solarize', 0.4, 2), ('Solarize', 0.6, 2)], + [('Color', 0.2, 0), ('Equalize', 0.8, 8)], + [('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)], + [('Color', 0.6, 1), ('Equalize', 1.0, 2)], + [('Color', 0.4, 7), ('Equalize', 0.6, 0)], + [('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)], + [('Solarize', 0.6, 8), ('Color', 0.6, 9)], + [('Equalize', 0.8, 4), ('Equalize', 0.0, 8)], + [('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)], + [('Posterize', 0.8, 2), ('Solarize', 0.6, 10)], + [('Solarize', 0.6, 8), ('Equalize', 0.6, 1)], + ] + return policy + + @staticmethod + def 
panoptic_deeplab_policy(): + policy = [ + [('Sharpness', 0.4, 1.4), ('Brightness', 0.2, 2.0)], + [('Equalize', 0.0, 1.8), ('Contrast', 0.2, 2.0)], + [('Sharpness', 0.2, 1.8), ('Color', 0.2, 1.8)], + [('Solarize', 0.2, 1.4), ('Equalize', 0.6, 1.8)], + [('Sharpness', 0.2, 0.2), ('Equalize', 0.2, 1.4)]] + return policy + + @staticmethod + def vit(): + """Autoaugment policy for a generic ViT.""" + policy = [ + [('Sharpness', 0.4, 1.4), ('Brightness', 0.2, 2.0), ('Cutout', 0.8, 8)], + [('Equalize', 0.0, 1.8), ('Contrast', 0.2, 2.0), ('Cutout', 0.8, 8)], + [('Sharpness', 0.2, 1.8), ('Color', 0.2, 1.8), ('Cutout', 0.8, 8)], + [('Solarize', 0.2, 1.4), ('Equalize', 0.6, 1.8), ('Cutout', 0.8, 8)], + [('Sharpness', 0.2, 0.2), ('Equalize', 0.2, 1.4), ('Cutout', 0.8, 8)], + [('Sharpness', 0.4, 7), ('Invert', 0.6, 8), ('Cutout', 0.8, 8)], + [('Invert', 0.6, 4), ('Equalize', 1.0, 8), ('Cutout', 0.8, 8)], + [('Posterize', 0.6, 7), ('Posterize', 0.6, 6), ('Cutout', 0.8, 8)], + [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5), ('Cutout', 0.8, 8)], + ] + return policy + + @staticmethod + def policy_test(): + """Autoaugment test policy for debugging.""" + policy = [ + [('TranslateX', 1.0, 4), ('Equalize', 1.0, 10)], + ] + return policy + + +def _maybe_identity(x: Optional[tf.Tensor]) -> Optional[tf.Tensor]: + return tf.identity(x) if x is not None else None + + +class RandAugment(ImageAugment): + """Applies the RandAugment policy to images. + + RandAugment is from the paper https://arxiv.org/abs/1909.13719. + """ + + def __init__(self, + num_layers: int = 2, + magnitude: float = 10., + cutout_const: float = 40., + translate_const: float = 100., + magnitude_std: float = 0.0, + prob_to_apply: Optional[float] = None, + exclude_ops: Optional[List[str]] = None): + """Applies the RandAugment policy to images. + + Args: + num_layers: Integer, the number of augmentation transformations to apply + sequentially to an image. Represented as (N) in the paper. 
Usually best + values will be in the range [1, 3]. + magnitude: Integer, shared magnitude across all augmentation operations. + Represented as (M) in the paper. Usually best values are in the range + [5, 10]. + cutout_const: multiplier for applying cutout. + translate_const: multiplier for applying translation. + magnitude_std: randomness of the severity as proposed by the authors of + the timm library. + prob_to_apply: The probability to apply the selected augmentation at each + layer. + exclude_ops: exclude selected operations. + """ + super(RandAugment, self).__init__() + + self.num_layers = num_layers + self.magnitude = float(magnitude) + self.cutout_const = float(cutout_const) + self.translate_const = float(translate_const) + self.prob_to_apply = ( + float(prob_to_apply) if prob_to_apply is not None else None) + self.available_ops = [ + 'AutoContrast', 'Equalize', 'Invert', 'Rotate', 'Posterize', 'Solarize', + 'Color', 'Contrast', 'Brightness', 'Sharpness', 'ShearX', 'ShearY', + 'TranslateX', 'TranslateY', 'Cutout', 'SolarizeAdd' + ] + self.magnitude_std = magnitude_std + if exclude_ops: + self.available_ops = [ + op for op in self.available_ops if op not in exclude_ops + ] + + @classmethod + def build_for_detection(cls, + num_layers: int = 2, + magnitude: float = 10., + cutout_const: float = 40., + translate_const: float = 100., + magnitude_std: float = 0.0, + prob_to_apply: Optional[float] = None, + exclude_ops: Optional[List[str]] = None): + """Builds a RandAugment that modifies bboxes for geometric transforms.""" + augmenter = cls( + num_layers=num_layers, + magnitude=magnitude, + cutout_const=cutout_const, + translate_const=translate_const, + magnitude_std=magnitude_std, + prob_to_apply=prob_to_apply, + exclude_ops=exclude_ops) + box_aware_ops_by_base_name = { + 'Rotate': 'Rotate_BBox', + 'ShearX': 'ShearX_BBox', + 'ShearY': 'ShearY_BBox', + 'TranslateX': 'TranslateX_BBox', + 'TranslateY': 'TranslateY_BBox', + } + augmenter.available_ops = [ + 
box_aware_ops_by_base_name.get(op_name) or op_name + for op_name in augmenter.available_ops + ] + return augmenter + + def _distort_common( + self, + image: tf.Tensor, + bboxes: Optional[tf.Tensor] = None + ) -> Tuple[tf.Tensor, Optional[tf.Tensor]]: + """Distorts the image and optionally bounding boxes.""" + input_image_type = image.dtype + + if input_image_type != tf.uint8: + image = tf.clip_by_value(image, 0.0, 255.0) + image = tf.cast(image, dtype=tf.uint8) + + replace_value = [128] * 3 + min_prob, max_prob = 0.2, 0.8 + + aug_image = image + aug_bboxes = bboxes + + for _ in range(self.num_layers): + op_to_select = tf.random.uniform([], + maxval=len(self.available_ops) + 1, + dtype=tf.int32) + + branch_fns = [] + for (i, op_name) in enumerate(self.available_ops): + prob = tf.random.uniform([], + minval=min_prob, + maxval=max_prob, + dtype=tf.float32) + func, _, args = _parse_policy_info(op_name, prob, self.magnitude, + replace_value, self.cutout_const, + self.translate_const, + self.magnitude_std) + branch_fns.append(( + i, + # pylint:disable=g-long-lambda + lambda selected_func=func, selected_args=args: selected_func( + image, bboxes, *selected_args))) + # pylint:enable=g-long-lambda + + aug_image, aug_bboxes = tf.switch_case( + branch_index=op_to_select, + branch_fns=branch_fns, + default=lambda: (tf.identity(image), _maybe_identity(bboxes))) # pylint: disable=cell-var-from-loop + + if self.prob_to_apply is not None: + aug_image, aug_bboxes = tf.cond( + tf.random.uniform(shape=[], dtype=tf.float32) < self.prob_to_apply, + lambda: (tf.identity(aug_image), _maybe_identity(aug_bboxes)), + lambda: (tf.identity(image), _maybe_identity(bboxes))) + image = aug_image + bboxes = aug_bboxes + + image = tf.cast(image, dtype=input_image_type) + return image, bboxes + + def distort(self, image: tf.Tensor) -> tf.Tensor: + """See base class.""" + image, _ = self._distort_common(image) + return image + + def distort_with_boxes(self, image: tf.Tensor, + bboxes: tf.Tensor) -> 
Tuple[tf.Tensor, tf.Tensor]:
+ """See base class."""
+ image, bboxes = self._distort_common(image, bboxes)
+ return image, bboxes
+
+
+class RandomErasing(ImageAugment):
+ """Applies RandomErasing to a single image.
+
+ Reference: https://arxiv.org/abs/1708.04896
+
+ Implementation is inspired by
+ https://github.com/rwightman/pytorch-image-models.
+ """
+
+ def __init__(self,
+ probability: float = 0.25,
+ min_area: float = 0.02,
+ max_area: float = 1 / 3,
+ min_aspect: float = 0.3,
+ max_aspect: Optional[float] = None,
+ min_count=1,
+ max_count=1,
+ trials=10):
+ """Applies RandomErasing to a single image.
+
+ Args:
+ probability: Probability of augmenting the image. Defaults to `0.25`.
+ min_area: Minimum area of the random erasing rectangle, as a fraction of
+ the image area. Defaults to `0.02`.
+ max_area: Maximum area of the random erasing rectangle, as a fraction of
+ the image area. Defaults to `1/3`.
+ min_aspect: Minimum aspect ratio of the random erasing rectangle.
+ Defaults to `0.3`.
+ max_aspect: Maximum aspect ratio of the random erasing rectangle.
+ Defaults to `None`, in which case `1 / min_aspect` is used.
+ min_count: Minimum number of erased rectangles. Defaults to `1`.
+ max_count: Maximum number of erased rectangles. Defaults to `1`.
+ trials: Maximum number of trials to randomly sample a rectangle that
+ fulfills the constraints. Defaults to `10`.
+ """
+ self._probability = probability
+ self._min_area = float(min_area)
+ self._max_area = float(max_area)
+ self._min_log_aspect = math.log(min_aspect)
+ self._max_log_aspect = math.log(max_aspect or 1 / min_aspect)
+ self._min_count = min_count
+ self._max_count = max_count
+ self._trials = trials
+
+ def distort(self, image: tf.Tensor) -> tf.Tensor:
+ """Applies RandomErasing to a single `image`.
+
+ Args:
+ image (tf.Tensor): Of shape [height, width, 3] representing an image.
+
+ Returns:
+ tf.Tensor: The augmented version of `image`.
+ """
+ uniform_random = tf.random.uniform(shape=[], minval=0., maxval=1.0)
+ erase_cond = tf.less(uniform_random, self._probability)
+ image = tf.cond(erase_cond, lambda: self._erase(image), lambda: image)
+ return image
+
+ @tf.function
+ def _erase(self, image: tf.Tensor) -> tf.Tensor:
+ """Erases random rectangular areas of `image`."""
+ if self._min_count == self._max_count:
+ count = self._min_count
+ else:
+ # Sample the number of rectangles uniformly from
+ # [min_count, max_count]; `maxval` is exclusive.
+ count = tf.random.uniform(
+ shape=[],
+ minval=int(self._min_count),
+ maxval=int(self._max_count) + 1,
+ dtype=tf.int32)
+
+ image_height = tf.shape(image)[0]
+ image_width = tf.shape(image)[1]
+ area = tf.cast(image_width * image_height, tf.float32)
+
+ for _ in range(count):
+ # Workaround instead of `break` inside tf.function: the flag skips the
+ # remaining trials once a rectangle has been erased.
+ is_trial_successful = False
+ for _ in range(self._trials):
+ if not is_trial_successful:
+ erase_area = tf.random.uniform(
+ shape=[],
+ minval=area * self._min_area,
+ maxval=area * self._max_area)
+ aspect_ratio = tf.math.exp(
+ tf.random.uniform(
+ shape=[],
+ minval=self._min_log_aspect,
+ maxval=self._max_log_aspect))
+
+ half_height = tf.cast(
+ tf.math.round(tf.math.sqrt(erase_area * aspect_ratio) / 2),
+ dtype=tf.int32)
+ half_width = tf.cast(
+ tf.math.round(tf.math.sqrt(erase_area / aspect_ratio) / 2),
+ dtype=tf.int32)
+
+ if 2 * half_height < image_height and 2 * half_width < image_width:
+ center_height = tf.random.uniform(
+ shape=[],
+ minval=0,
+ maxval=int(image_height - 2 * half_height),
+ dtype=tf.int32)
+ center_width = tf.random.uniform(
+ shape=[],
+ minval=0,
+ maxval=int(image_width - 2 * half_width),
+ dtype=tf.int32)
+
+ image = _fill_rectangle(
+ image,
+ center_width,
+ center_height,
+ half_width,
+ half_height,
+ replace=None)
+
+ is_trial_successful = True
+
+ return image
+
+
+class MixupAndCutmix:
+ """Applies Mixup and/or Cutmix to a batch of images.
+ + - Mixup: https://arxiv.org/abs/1710.09412 + - Cutmix: https://arxiv.org/abs/1905.04899 + + Implementation is inspired by https://github.com/rwightman/pytorch-image-models + """ + + def __init__(self, + mixup_alpha: float = .8, + cutmix_alpha: float = 1., + prob: float = 1.0, + switch_prob: float = 0.5, + label_smoothing: float = 0.1, + num_classes: int = 1001): + """Applies Mixup and/or Cutmix to a batch of images. + + Args: + mixup_alpha (float, optional): For drawing a random lambda (`lam`) from a + beta distribution (for each image). If zero, Mixup is deactivated. + Defaults to 0.8. + cutmix_alpha (float, optional): For drawing a random lambda (`lam`) from a + beta distribution (for each image). If zero, Cutmix is deactivated. + Defaults to 1.0. + prob (float, optional): Probability of augmenting the batch. Defaults to 1.0. + switch_prob (float, optional): Probability of applying Cutmix for the + batch. Defaults to 0.5. + label_smoothing (float, optional): Constant for label smoothing. Defaults + to 0.1. + num_classes (int, optional): Number of classes. Defaults to 1001. + """ + self.mixup_alpha = mixup_alpha + self.cutmix_alpha = cutmix_alpha + self.mix_prob = prob + self.switch_prob = switch_prob + self.label_smoothing = label_smoothing + self.num_classes = num_classes + self.mode = 'batch' + self.mixup_enabled = True + + if self.mixup_alpha and not self.cutmix_alpha: + self.switch_prob = -1 + elif not self.mixup_alpha and self.cutmix_alpha: + self.switch_prob = 1 + + def __call__(self, images: tf.Tensor, + labels: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]: + return self.distort(images, labels) + + def distort(self, images: tf.Tensor, + labels: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]: + """Applies Mixup and/or Cutmix to a batch of images and transforms labels. + + Args: + images (tf.Tensor): Of shape [batch_size, height, width, 3] representing a + batch of images, or [batch_size, time, height, width, 3] representing a + batch of videos.
+ labels (tf.Tensor): Of shape [batch_size] representing the class id for + each image of the batch. + + Returns: + Tuple[tf.Tensor, tf.Tensor]: The augmented version of `images` and + `labels`. + """ + labels = tf.reshape(labels, [-1]) + augment_cond = tf.less( + tf.random.uniform(shape=[], minval=0., maxval=1.0), self.mix_prob) + # pylint: disable=g-long-lambda + augment_a = lambda: self._update_labels(*tf.cond( + tf.less( + tf.random.uniform(shape=[], minval=0., maxval=1.0), self.switch_prob + ), lambda: self._cutmix(images, labels), lambda: self._mixup( + images, labels))) + augment_b = lambda: (images, self._smooth_labels(labels)) + # pylint: enable=g-long-lambda + + return tf.cond(augment_cond, augment_a, augment_b) + + @staticmethod + def _sample_from_beta(alpha, beta, shape): + # Beta(alpha, beta) sampled as X / (X + Y), X ~ Gamma(alpha), Y ~ Gamma(beta). + sample_alpha = tf.random.gamma(shape, alpha) + sample_beta = tf.random.gamma(shape, beta) + return sample_alpha / (sample_alpha + sample_beta) + + def _cutmix(self, images: tf.Tensor, + labels: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor]: + """Applies cutmix.""" + lam = MixupAndCutmix._sample_from_beta(self.cutmix_alpha, self.cutmix_alpha, + tf.shape(labels)) + + ratio = tf.math.sqrt(1 - lam) + + batch_size = tf.shape(images)[0] + + if images.shape.rank == 4: + image_height, image_width = tf.shape(images)[1], tf.shape(images)[2] + fill_fn = _fill_rectangle + elif images.shape.rank == 5: + image_height, image_width = tf.shape(images)[2], tf.shape(images)[3] + fill_fn = _fill_rectangle_video + else: + raise ValueError('Bad image rank: {}'.format(images.shape.rank)) + + cut_height = tf.cast( + ratio * tf.cast(image_height, dtype=tf.float32), dtype=tf.int32) + cut_width = tf.cast( + ratio * tf.cast(image_width, dtype=tf.float32), dtype=tf.int32) + + random_center_height = tf.random.uniform( + shape=[batch_size], minval=0, maxval=image_height, dtype=tf.int32) + random_center_width = tf.random.uniform( + shape=[batch_size], minval=0, maxval=image_width,
dtype=tf.int32) + + bbox_area = cut_height * cut_width + lam = 1. - bbox_area / (image_height * image_width) + lam = tf.cast(lam, dtype=tf.float32) + + images = tf.map_fn( + lambda x: fill_fn(*x), + (images, random_center_width, random_center_height, cut_width // 2, + cut_height // 2, tf.reverse(images, [0])), + dtype=( + images.dtype, tf.int32, tf.int32, tf.int32, tf.int32, images.dtype), + fn_output_signature=tf.TensorSpec(images.shape[1:], dtype=images.dtype)) + + return images, labels, lam + + def _mixup(self, images: tf.Tensor, + labels: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor]: + """Applies mixup.""" + lam = MixupAndCutmix._sample_from_beta(self.mixup_alpha, self.mixup_alpha, + tf.shape(labels)) + if images.shape.rank == 4: + lam = tf.reshape(lam, [-1, 1, 1, 1]) + elif images.shape.rank == 5: + lam = tf.reshape(lam, [-1, 1, 1, 1, 1]) + else: + raise ValueError('Bad image rank: {}'.format(images.shape.rank)) + + lam_cast = tf.cast(lam, dtype=images.dtype) + images = lam_cast * images + (1. - lam_cast) * tf.reverse(images, [0]) + + return images, labels, tf.squeeze(lam) + + def _smooth_labels(self, labels: tf.Tensor) -> tf.Tensor: + off_value = self.label_smoothing / self.num_classes + on_value = 1. - self.label_smoothing + off_value + + smooth_labels = tf.one_hot( + labels, self.num_classes, on_value=on_value, off_value=off_value) + return smooth_labels + + def _update_labels(self, images: tf.Tensor, labels: tf.Tensor, + lam: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]: + labels_1 = self._smooth_labels(labels) + labels_2 = tf.reverse(labels_1, [0]) + + lam = tf.reshape(lam, [-1, 1]) + labels = lam * labels_1 + (1. 
- lam) * labels_2 + + return images, labels diff --git a/official/vision/ops/augment_test.py b/official/vision/ops/augment_test.py new file mode 100644 index 0000000000000000000000000000000000000000..c0b4afebb33322b8777b307f3ef700eab96e013c --- /dev/null +++ b/official/vision/ops/augment_test.py @@ -0,0 +1,498 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tests for autoaugment.""" + +from __future__ import absolute_import +from __future__ import division +from __future__ import print_function + +import random +from absl.testing import parameterized + +import tensorflow as tf + +from official.vision.ops import augment + + +def get_dtype_test_cases(): + return [ + ('uint8', tf.uint8), + ('int32', tf.int32), + ('float16', tf.float16), + ('float32', tf.float32), + ] + + +@parameterized.named_parameters(get_dtype_test_cases()) +class TransformsTest(parameterized.TestCase, tf.test.TestCase): + """Basic tests for fundamental transformations.""" + + def test_to_from_4d(self, dtype): + for shape in [(10, 10), (10, 10, 10), (10, 10, 10, 10)]: + original_ndims = len(shape) + image = tf.zeros(shape, dtype=dtype) + image_4d = augment.to_4d(image) + self.assertEqual(4, tf.rank(image_4d)) + self.assertAllEqual(image, augment.from_4d(image_4d, original_ndims)) + + def test_transform(self, dtype): + image = tf.constant([[1, 2], [3, 4]], dtype=dtype) + self.assertAllEqual( + augment.transform(image, transforms=[1] 
* 8), [[4, 4], [4, 4]]) + + def test_translate(self, dtype): + image = tf.constant( + [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]], dtype=dtype) + translations = [-1, -1] + translated = augment.translate(image=image, translations=translations) + expected = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 0, 1, 1], [1, 0, 1, 1]] + self.assertAllEqual(translated, expected) + + def test_translate_shapes(self, dtype): + translation = [0, 0] + for shape in [(3, 3), (5, 5), (224, 224, 3)]: + image = tf.zeros(shape, dtype=dtype) + self.assertAllEqual(image, augment.translate(image, translation)) + + def test_translate_invalid_translation(self, dtype): + image = tf.zeros((1, 1), dtype=dtype) + invalid_translation = [[[1, 1]]] + with self.assertRaisesRegex(TypeError, 'rank 1 or 2'): + _ = augment.translate(image, invalid_translation) + + def test_rotate(self, dtype): + image = tf.reshape(tf.cast(tf.range(9), dtype), (3, 3)) + rotation = 90. + transformed = augment.rotate(image=image, degrees=rotation) + expected = [[2, 5, 8], [1, 4, 7], [0, 3, 6]] + self.assertAllEqual(transformed, expected) + + def test_rotate_shapes(self, dtype): + degrees = 0. 
+ for shape in [(3, 3), (5, 5), (224, 224, 3)]: + image = tf.zeros(shape, dtype=dtype) + self.assertAllEqual(image, augment.rotate(image, degrees)) + + +class AutoaugmentTest(tf.test.TestCase, parameterized.TestCase): + + AVAILABLE_POLICIES = [ + 'v0', + 'test', + 'simple', + 'reduced_cifar10', + 'svhn', + 'reduced_imagenet', + 'detection_v0', + 'vit', + ] + + def test_autoaugment(self): + """Smoke test to be sure there are no syntax errors.""" + image = tf.zeros((224, 224, 3), dtype=tf.uint8) + + for policy in self.AVAILABLE_POLICIES: + augmenter = augment.AutoAugment(augmentation_name=policy) + aug_image = augmenter.distort(image) + + self.assertEqual((224, 224, 3), aug_image.shape) + + def test_autoaugment_with_bboxes(self): + """Smoke test to be sure there are no syntax errors with bboxes.""" + image = tf.zeros((224, 224, 3), dtype=tf.uint8) + bboxes = tf.ones((2, 4), dtype=tf.float32) + + for policy in self.AVAILABLE_POLICIES: + augmenter = augment.AutoAugment(augmentation_name=policy) + aug_image, aug_bboxes = augmenter.distort_with_boxes(image, bboxes) + + self.assertEqual((224, 224, 3), aug_image.shape) + self.assertEqual((2, 4), aug_bboxes.shape) + + def test_randaug(self): + """Smoke test to be sure there are no syntax errors.""" + image = tf.zeros((224, 224, 3), dtype=tf.uint8) + + augmenter = augment.RandAugment() + aug_image = augmenter.distort(image) + + self.assertEqual((224, 224, 3), aug_image.shape) + + def test_randaug_with_bboxes(self): + """Smoke test to be sure there are no syntax errors with bboxes.""" + image = tf.zeros((224, 224, 3), dtype=tf.uint8) + bboxes = tf.ones((2, 4), dtype=tf.float32) + + augmenter = augment.RandAugment() + aug_image, aug_bboxes = augmenter.distort_with_boxes(image, bboxes) + + self.assertEqual((224, 224, 3), aug_image.shape) + self.assertEqual((2, 4), aug_bboxes.shape) + + def test_randaug_build_for_detection(self): + """Smoke test to be sure there are no syntax errors built for detection.""" + image = 
tf.zeros((224, 224, 3), dtype=tf.uint8) + bboxes = tf.ones((2, 4), dtype=tf.float32) + + augmenter = augment.RandAugment.build_for_detection() + self.assertCountEqual(augmenter.available_ops, [ + 'AutoContrast', 'Equalize', 'Invert', 'Posterize', 'Solarize', 'Color', + 'Contrast', 'Brightness', 'Sharpness', 'Cutout', 'SolarizeAdd', + 'Rotate_BBox', 'ShearX_BBox', 'ShearY_BBox', 'TranslateX_BBox', + 'TranslateY_BBox' + ]) + + aug_image, aug_bboxes = augmenter.distort_with_boxes(image, bboxes) + self.assertEqual((224, 224, 3), aug_image.shape) + self.assertEqual((2, 4), aug_bboxes.shape) + + def test_all_policy_ops(self): + """Smoke test to be sure all augmentation functions can execute.""" + + prob = 1 + magnitude = 10 + replace_value = [128] * 3 + cutout_const = 100 + translate_const = 250 + + image = tf.ones((224, 224, 3), dtype=tf.uint8) + bboxes = None + + for op_name in augment.NAME_TO_FUNC.keys() - augment.REQUIRE_BOXES_FUNCS: + func, _, args = augment._parse_policy_info(op_name, prob, magnitude, + replace_value, cutout_const, + translate_const) + image, bboxes = func(image, bboxes, *args) + + self.assertEqual((224, 224, 3), image.shape) + self.assertIsNone(bboxes) + + def test_all_policy_ops_with_bboxes(self): + """Smoke test to be sure all augmentation functions can execute.""" + + prob = 1 + magnitude = 10 + replace_value = [128] * 3 + cutout_const = 100 + translate_const = 250 + + image = tf.ones((224, 224, 3), dtype=tf.uint8) + bboxes = tf.ones((2, 4), dtype=tf.float32) + + for op_name in augment.NAME_TO_FUNC: + func, _, args = augment._parse_policy_info(op_name, prob, magnitude, + replace_value, cutout_const, + translate_const) + image, bboxes = func(image, bboxes, *args) + + self.assertEqual((224, 224, 3), image.shape) + self.assertEqual((2, 4), bboxes.shape) + + def test_autoaugment_video(self): + """Smoke test with video to be sure there are no syntax errors.""" + image = tf.zeros((2, 224, 224, 3), dtype=tf.uint8) + + for policy in 
self.AVAILABLE_POLICIES: + augmenter = augment.AutoAugment(augmentation_name=policy) + aug_image = augmenter.distort(image) + + self.assertEqual((2, 224, 224, 3), aug_image.shape) + + def test_autoaugment_video_with_boxes(self): + """Smoke test with video to be sure there are no syntax errors.""" + image = tf.zeros((2, 224, 224, 3), dtype=tf.uint8) + bboxes = tf.ones((2, 2, 4), dtype=tf.float32) + + for policy in self.AVAILABLE_POLICIES: + augmenter = augment.AutoAugment(augmentation_name=policy) + aug_image, aug_bboxes = augmenter.distort_with_boxes(image, bboxes) + + self.assertEqual((2, 224, 224, 3), aug_image.shape) + self.assertEqual((2, 2, 4), aug_bboxes.shape) + + def test_randaug_video(self): + """Smoke test with video to be sure there are no syntax errors.""" + image = tf.zeros((2, 224, 224, 3), dtype=tf.uint8) + + augmenter = augment.RandAugment() + aug_image = augmenter.distort(image) + + self.assertEqual((2, 224, 224, 3), aug_image.shape) + + def test_all_policy_ops_video(self): + """Smoke test to be sure all video augmentation functions can execute.""" + + prob = 1 + magnitude = 10 + replace_value = [128] * 3 + cutout_const = 100 + translate_const = 250 + + image = tf.ones((2, 224, 224, 3), dtype=tf.uint8) + bboxes = None + + for op_name in augment.NAME_TO_FUNC.keys() - augment.REQUIRE_BOXES_FUNCS: + func, _, args = augment._parse_policy_info(op_name, prob, magnitude, + replace_value, cutout_const, + translate_const) + image, bboxes = func(image, bboxes, *args) + + self.assertEqual((2, 224, 224, 3), image.shape) + self.assertIsNone(bboxes) + + def test_all_policy_ops_video_with_bboxes(self): + """Smoke test to be sure all video augmentation functions can execute.""" + + prob = 1 + magnitude = 10 + replace_value = [128] * 3 + cutout_const = 100 + translate_const = 250 + + image = tf.ones((2, 224, 224, 3), dtype=tf.uint8) + bboxes = tf.ones((2, 2, 4), dtype=tf.float32) + + for op_name in augment.NAME_TO_FUNC: + func, _, args = 
augment._parse_policy_info(op_name, prob, magnitude, + replace_value, cutout_const, + translate_const) + if op_name in { + 'Rotate_BBox', + 'ShearX_BBox', + 'ShearY_BBox', + 'TranslateX_BBox', + 'TranslateY_BBox', + 'TranslateY_Only_BBoxes', + }: + with self.assertRaises(ValueError): + func(image, bboxes, *args) + else: + image, bboxes = func(image, bboxes, *args) + + self.assertEqual((2, 224, 224, 3), image.shape) + self.assertEqual((2, 2, 4), bboxes.shape) + + def _generate_test_policy(self): + """Generate a test policy at random.""" + op_list = list(augment.NAME_TO_FUNC.keys()) + size = 6 + prob = [round(random.uniform(0., 1.), 1) for _ in range(size)] + mag = [round(random.uniform(0, 10)) for _ in range(size)] + policy = [] + for i in range(0, size, 2): + policy.append([(op_list[i], prob[i], mag[i]), + (op_list[i + 1], prob[i + 1], mag[i + 1])]) + return policy + + def test_custom_policy(self): + """Test autoaugment with a custom policy.""" + image = tf.zeros((224, 224, 3), dtype=tf.uint8) + augmenter = augment.AutoAugment(policies=self._generate_test_policy()) + aug_image = augmenter.distort(image) + + self.assertEqual((224, 224, 3), aug_image.shape) + + @parameterized.named_parameters( + {'testcase_name': '_OutOfRangeProb', + 'sub_policy': ('Equalize', 1.1, 3), 'value': '1.1'}, + {'testcase_name': '_OutOfRangeMag', + 'sub_policy': ('Equalize', 0.9, 11), 'value': '11'}, + ) + def test_invalid_custom_sub_policy(self, sub_policy, value): + """Test autoaugment with out-of-range values in the custom policy.""" + image = tf.zeros((224, 224, 3), dtype=tf.uint8) + policy = self._generate_test_policy() + policy[0][0] = sub_policy + augmenter = augment.AutoAugment(policies=policy) + + with self.assertRaisesRegex( + tf.errors.InvalidArgumentError, + r'Expected \'tf.Tensor\(False, shape=\(\), dtype=bool\)\' to be true. 
' + r'Summarized data: ({})'.format(value)): + augmenter.distort(image) + + def test_invalid_custom_policy_ndim(self): + """Test autoaugment with wrong dimension in the custom policy.""" + policy = [[('Equalize', 0.8, 1), ('Shear', 0.8, 4)], + [('TranslateY', 0.6, 3), ('Rotate', 0.9, 3)]] + policy = [[policy]] + + with self.assertRaisesRegex( + ValueError, + r'Expected \(:, :, 3\) but got \(1, 1, 2, 2, 3\).'): + augment.AutoAugment(policies=policy) + + def test_invalid_custom_policy_shape(self): + """Test autoaugment with wrong shape in the custom policy.""" + policy = [[('Equalize', 0.8, 1, 1), ('Shear', 0.8, 4, 1)], + [('TranslateY', 0.6, 3, 1), ('Rotate', 0.9, 3, 1)]] + + with self.assertRaisesRegex( + ValueError, + r'Expected \(:, :, 3\) but got \(2, 2, 4\)'): + augment.AutoAugment(policies=policy) + + def test_invalid_custom_policy_key(self): + """Test autoaugment with invalid key in the custom policy.""" + image = tf.zeros((224, 224, 3), dtype=tf.uint8) + policy = [[('AAAAA', 0.8, 1), ('Shear', 0.8, 4)], + [('TranslateY', 0.6, 3), ('Rotate', 0.9, 3)]] + augmenter = augment.AutoAugment(policies=policy) + + with self.assertRaisesRegex(KeyError, '\'AAAAA\''): + augmenter.distort(image) + + +class RandomErasingTest(tf.test.TestCase, parameterized.TestCase): + + def test_random_erase_replaces_some_pixels(self): + image = tf.zeros((224, 224, 3), dtype=tf.float32) + augmenter = augment.RandomErasing(probability=1., max_count=10) + + aug_image = augmenter.distort(image) + + self.assertEqual((224, 224, 3), aug_image.shape) + self.assertNotEqual(0, tf.reduce_max(aug_image)) + + +class MixupAndCutmixTest(tf.test.TestCase, parameterized.TestCase): + + def test_mixup_and_cutmix_smoothes_labels(self): + batch_size = 12 + num_classes = 1000 + label_smoothing = 0.1 + + images = tf.random.normal((batch_size, 224, 224, 3), dtype=tf.float32) + labels = tf.range(batch_size) + augmenter = augment.MixupAndCutmix( + num_classes=num_classes, label_smoothing=label_smoothing) + + 
aug_images, aug_labels = augmenter.distort(images, labels) + + self.assertEqual(images.shape, aug_images.shape) + self.assertEqual(images.dtype, aug_images.dtype) + self.assertEqual([batch_size, num_classes], aug_labels.shape) + self.assertAllLessEqual(aug_labels, 1. - label_smoothing + + 2. / num_classes) # With tolerance + self.assertAllGreaterEqual(aug_labels, label_smoothing / num_classes - + 1e-4) # With tolerance + + def test_mixup_changes_image(self): + batch_size = 12 + num_classes = 1000 + label_smoothing = 0.1 + + images = tf.random.normal((batch_size, 224, 224, 3), dtype=tf.float32) + labels = tf.range(batch_size) + augmenter = augment.MixupAndCutmix( + mixup_alpha=1., cutmix_alpha=0., num_classes=num_classes) + + aug_images, aug_labels = augmenter.distort(images, labels) + + self.assertEqual(images.shape, aug_images.shape) + self.assertEqual(images.dtype, aug_images.dtype) + self.assertEqual([batch_size, num_classes], aug_labels.shape) + self.assertAllLessEqual(aug_labels, 1. - label_smoothing + + 2. / num_classes) # With tolerance + self.assertAllGreaterEqual(aug_labels, label_smoothing / num_classes - + 1e-4) # With tolerance + self.assertFalse(tf.math.reduce_all(images == aug_images)) + + def test_cutmix_changes_image(self): + batch_size = 12 + num_classes = 1000 + label_smoothing = 0.1 + + images = tf.random.normal((batch_size, 224, 224, 3), dtype=tf.float32) + labels = tf.range(batch_size) + augmenter = augment.MixupAndCutmix( + mixup_alpha=0., cutmix_alpha=1., num_classes=num_classes) + + aug_images, aug_labels = augmenter.distort(images, labels) + + self.assertEqual(images.shape, aug_images.shape) + self.assertEqual(images.dtype, aug_images.dtype) + self.assertEqual([batch_size, num_classes], aug_labels.shape) + self.assertAllLessEqual(aug_labels, 1. - label_smoothing + + 2.
/ num_classes) # With tolerance + self.assertAllGreaterEqual(aug_labels, label_smoothing / num_classes - + 1e-4) # With tolerance + self.assertFalse(tf.math.reduce_all(images == aug_images)) + + def test_mixup_and_cutmix_smoothes_labels_with_videos(self): + batch_size = 12 + num_classes = 1000 + label_smoothing = 0.1 + + images = tf.random.normal((batch_size, 8, 224, 224, 3), dtype=tf.float32) + labels = tf.range(batch_size) + augmenter = augment.MixupAndCutmix( + num_classes=num_classes, label_smoothing=label_smoothing) + + aug_images, aug_labels = augmenter.distort(images, labels) + + self.assertEqual(images.shape, aug_images.shape) + self.assertEqual(images.dtype, aug_images.dtype) + self.assertEqual([batch_size, num_classes], aug_labels.shape) + self.assertAllLessEqual(aug_labels, 1. - label_smoothing + + 2. / num_classes) # With tolerance + self.assertAllGreaterEqual(aug_labels, label_smoothing / num_classes - + 1e-4) # With tolerance + + def test_mixup_changes_video(self): + batch_size = 12 + num_classes = 1000 + label_smoothing = 0.1 + + images = tf.random.normal((batch_size, 8, 224, 224, 3), dtype=tf.float32) + labels = tf.range(batch_size) + augmenter = augment.MixupAndCutmix( + mixup_alpha=1., cutmix_alpha=0., num_classes=num_classes) + + aug_images, aug_labels = augmenter.distort(images, labels) + + self.assertEqual(images.shape, aug_images.shape) + self.assertEqual(images.dtype, aug_images.dtype) + self.assertEqual([batch_size, num_classes], aug_labels.shape) + self.assertAllLessEqual(aug_labels, 1. - label_smoothing + + 2.
/ num_classes) # With tolerance + self.assertAllGreaterEqual(aug_labels, label_smoothing / num_classes - + 1e-4) # With tolerance + self.assertFalse(tf.math.reduce_all(images == aug_images)) + + def test_cutmix_changes_video(self): + batch_size = 12 + num_classes = 1000 + label_smoothing = 0.1 + + images = tf.random.normal((batch_size, 8, 224, 224, 3), dtype=tf.float32) + labels = tf.range(batch_size) + augmenter = augment.MixupAndCutmix( + mixup_alpha=0., cutmix_alpha=1., num_classes=num_classes) + + aug_images, aug_labels = augmenter.distort(images, labels) + + self.assertEqual(images.shape, aug_images.shape) + self.assertEqual(images.dtype, aug_images.dtype) + self.assertEqual([batch_size, num_classes], aug_labels.shape) + self.assertAllLessEqual(aug_labels, 1. - label_smoothing + + 2. / num_classes) # With tolerance + self.assertAllGreaterEqual(aug_labels, label_smoothing / num_classes - + 1e-4) # With tolerance + self.assertFalse(tf.math.reduce_all(images == aug_images)) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/ops/box_matcher.py b/official/vision/ops/box_matcher.py new file mode 100644 index 0000000000000000000000000000000000000000..1661b8100e1f4662b438c896d917eb9bc0179e33 --- /dev/null +++ b/official/vision/ops/box_matcher.py @@ -0,0 +1,202 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+ +"""Box matcher implementation.""" + +from typing import List, Tuple + +import tensorflow as tf + + +class BoxMatcher: + """Matcher based on highest value. + + This class computes matches from a similarity matrix. Each row is matched + to a single column. + + To support object detection target assignment, this class enables setting both + positive_threshold (upper threshold) and negative_threshold (lower threshold), + defining three categories of similarity which define whether examples are + positive, negative, or ignored, for example: + (1) thresholds=[negative_threshold, positive_threshold], and + indicators=[negative_value, ignore_value, positive_value]: The similarity + metrics below negative_threshold will be assigned negative_value, + the metrics between negative_threshold and positive_threshold will be + assigned ignore_value, and the metrics above positive_threshold will be + assigned positive_value. + (2) thresholds=[negative_threshold, positive_threshold], and + indicators=[ignore_value, negative_value, positive_value]: The similarity + metrics below negative_threshold will be assigned ignore_value, + the metrics between negative_threshold and positive_threshold will be + assigned negative_value, and the metrics above positive_threshold will be + assigned positive_value. + """ + + def __init__(self, + thresholds: List[float], + indicators: List[int], + force_match_for_each_col: bool = False): + """Constructs a BoxMatcher. + + Args: + thresholds: A list of thresholds to classify the matches into different + types (e.g. positive or negative or ignored match). The list needs to be + sorted, and will be prepended with -Inf and appended with +Inf. + indicators: A list of values representing match types (e.g. positive or + negative or ignored match). len(`indicators`) must equal + len(`thresholds`) + 1.
+ force_match_for_each_col: If True, ensures that each column is matched to + at least one row (which is not guaranteed otherwise if the + positive_threshold is high). Defaults to False. If True, all force-matched + rows will be assigned `indicators[-1]`. + + Raises: + ValueError: If `thresholds` is not sorted, + or len(indicators) != len(thresholds) + 1. + """ + if not all([lo <= hi for (lo, hi) in zip(thresholds[:-1], thresholds[1:])]): + raise ValueError('`thresholds` must be sorted, got {}'.format(thresholds)) + self.indicators = indicators + if len(indicators) != len(thresholds) + 1: + raise ValueError('len(`indicators`) must be len(`thresholds`) + 1, got ' + 'indicators {}, thresholds {}'.format( + indicators, thresholds)) + thresholds = thresholds[:] + thresholds.insert(0, -float('inf')) + thresholds.append(float('inf')) + self.thresholds = thresholds + self._force_match_for_each_col = force_match_for_each_col + + def __call__(self, + similarity_matrix: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]: + """Tries to match each row of the similarity matrix to a column. + + Args: + similarity_matrix: A float tensor of shape [num_rows, num_cols] or + [batch_size, num_rows, num_cols] representing any similarity metric. + + Returns: + matched_columns: An integer tensor of shape [num_rows] or [batch_size, + num_rows] storing the index of the matched column for each row. + match_indicators: An integer tensor of shape [num_rows] or [batch_size, + num_rows] storing the match type indicator (e.g. positive or negative or + ignored match). + """ + squeeze_result = False + if len(similarity_matrix.shape) == 2: + squeeze_result = True + similarity_matrix = tf.expand_dims(similarity_matrix, axis=0) + + static_shape = similarity_matrix.shape.as_list() + num_rows = static_shape[1] or tf.shape(similarity_matrix)[1] + batch_size = static_shape[0] or tf.shape(similarity_matrix)[0] + + def _match_when_rows_are_empty(): + """Performs matching when the rows of similarity matrix are empty.
+ + When the rows are empty, all detections are false positives. So we return + a tensor of -1's to indicate that the rows do not match to any columns. + + Returns: + matched_columns: An integer tensor of shape [num_rows] or [batch_size, + num_rows] storing the index of the matched column for each row. + match_indicators: An integer tensor of shape [num_rows] or [batch_size, + num_rows] storing the match type indicator (e.g. positive or negative + or ignored match). + """ + with tf.name_scope('empty_gt_boxes'): + matched_columns = tf.zeros([batch_size, num_rows], dtype=tf.int32) + match_indicators = -tf.ones([batch_size, num_rows], dtype=tf.int32) + return matched_columns, match_indicators + + def _match_when_rows_are_non_empty(): + """Performs matching when the rows of similarity matrix are non empty. + + Returns: + matched_columns: An integer tensor of shape [num_rows] or [batch_size, + num_rows] storing the index of the matched column for each row. + match_indicators: An integer tensor of shape [num_rows] or [batch_size, + num_rows] storing the match type indicator (e.g. positive or negative + or ignored match). 
+ """ + with tf.name_scope('non_empty_gt_boxes'): + matched_columns = tf.argmax( + similarity_matrix, axis=-1, output_type=tf.int32) + + # Get logical indices of ignored and unmatched columns as tf.int64 + matched_vals = tf.reduce_max(similarity_matrix, axis=-1) + match_indicators = tf.zeros([batch_size, num_rows], tf.int32) + + match_dtype = matched_vals.dtype + for (ind, low, high) in zip(self.indicators, self.thresholds[:-1], + self.thresholds[1:]): + low_threshold = tf.cast(low, match_dtype) + high_threshold = tf.cast(high, match_dtype) + mask = tf.logical_and( + tf.greater_equal(matched_vals, low_threshold), + tf.less(matched_vals, high_threshold)) + match_indicators = self._set_values_using_indicator( + match_indicators, mask, ind) + + if self._force_match_for_each_col: + # [batch_size, num_cols], for each column (groundtruth_box), find the + # best matching row (anchor). + matching_rows = tf.argmax( + input=similarity_matrix, axis=1, output_type=tf.int32) + # [batch_size, num_cols, num_rows], a transposed 0-1 mapping matrix M, + # where M[j, i] = 1 means column j is matched to row i. + column_to_row_match_mapping = tf.one_hot( + matching_rows, depth=num_rows) + # [batch_size, num_rows], for each row (anchor), find the matched + # column (groundtruth_box). 
+ force_matched_columns = tf.argmax( + input=column_to_row_match_mapping, axis=1, output_type=tf.int32) + # [batch_size, num_rows] + force_matched_column_mask = tf.cast( + tf.reduce_max(column_to_row_match_mapping, axis=1), tf.bool) + # [batch_size, num_rows] + matched_columns = tf.where(force_matched_column_mask, + force_matched_columns, matched_columns) + match_indicators = tf.where( + force_matched_column_mask, self.indicators[-1] * + tf.ones([batch_size, num_rows], dtype=tf.int32), match_indicators) + + return matched_columns, match_indicators + + num_gt_boxes = similarity_matrix.shape.as_list()[-1] or tf.shape( + similarity_matrix)[-1] + matched_columns, match_indicators = tf.cond( + pred=tf.greater(num_gt_boxes, 0), + true_fn=_match_when_rows_are_non_empty, + false_fn=_match_when_rows_are_empty) + + if squeeze_result: + matched_columns = tf.squeeze(matched_columns, axis=0) + match_indicators = tf.squeeze(match_indicators, axis=0) + + return matched_columns, match_indicators + + def _set_values_using_indicator(self, x, indicator, val): + """Set the indicated fields of x to val. + + Args: + x: tensor. + indicator: boolean with same shape as x. + val: scalar with value to set. + + Returns: + modified tensor. + """ + indicator = tf.cast(indicator, x.dtype) + return tf.add(tf.multiply(x, 1 - indicator), val * indicator) diff --git a/official/vision/beta/ops/box_matcher_test.py b/official/vision/ops/box_matcher_test.py similarity index 95% rename from official/vision/beta/ops/box_matcher_test.py rename to official/vision/ops/box_matcher_test.py index 67f8c4ccbc5c12802ff0cf977a5fad15f72460c6..0ea8b11dd717909b2970e39c509387d243da463f 100644 --- a/official/vision/beta/ops/box_matcher_test.py +++ b/official/vision/ops/box_matcher_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,7 +16,7 @@ import tensorflow as tf -from official.vision.beta.ops import box_matcher +from official.vision.ops import box_matcher class BoxMatcherTest(tf.test.TestCase): diff --git a/official/vision/ops/box_ops.py b/official/vision/ops/box_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..05553c66024ae5c5756464ea351b6776a780157c --- /dev/null +++ b/official/vision/ops/box_ops.py @@ -0,0 +1,848 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Box related ops.""" + +# Import libraries +import numpy as np +import tensorflow as tf + + +EPSILON = 1e-8 +BBOX_XFORM_CLIP = np.log(1000. / 16.) + + +def yxyx_to_xywh(boxes): + """Converts boxes from ymin, xmin, ymax, xmax to xmin, ymin, width, height. + + Args: + boxes: a numpy array whose last dimension is 4 representing the coordinates + of boxes in ymin, xmin, ymax, xmax order. + + Returns: + boxes: a numpy array whose shape is the same as `boxes` in new format. + + Raises: + ValueError: If the last dimension of boxes is not 4. 
+ """ + if boxes.shape[-1] != 4: + raise ValueError( + 'boxes.shape[-1] is {:d}, but must be 4.'.format(boxes.shape[-1])) + + boxes_ymin = boxes[..., 0] + boxes_xmin = boxes[..., 1] + boxes_width = boxes[..., 3] - boxes[..., 1] + boxes_height = boxes[..., 2] - boxes[..., 0] + new_boxes = np.stack( + [boxes_xmin, boxes_ymin, boxes_width, boxes_height], axis=-1) + + return new_boxes + + +def yxyx_to_cycxhw(boxes): + """Converts box corner coordinates to center plus height and width terms. + + Args: + boxes: a `Tensor` with last dimension of 4, representing the coordinates of + boxes in ymin, xmin, ymax, xmax order. + + Returns: + boxes: a `Tensor` with the same shape as the input boxes, in the format + of cy, cx, height, width. + + Raises: + ValueError: if the last dimension of boxes is not 4. + """ + if boxes.shape[-1] != 4: + raise ValueError('Last dimension of boxes must be 4 but is {:d}'.format( + boxes.shape[-1])) + + boxes_ycenter = (boxes[..., 0] + boxes[..., 2]) / 2 + boxes_xcenter = (boxes[..., 1] + boxes[..., 3]) / 2 + boxes_height = boxes[..., 2] - boxes[..., 0] + boxes_width = boxes[..., 3] - boxes[..., 1] + + new_boxes = tf.stack( + [boxes_ycenter, boxes_xcenter, boxes_height, boxes_width], axis=-1) + return new_boxes + + +def cycxhw_to_yxyx(boxes): + """Converts box center plus height and width terms to corner coordinates. + + Args: + boxes: a `Tensor` whose last dimension is 4 representing the coordinates + of boxes in cy, cx, height, width order. + + Returns: + boxes: a `Tensor` of the same shape, in ymin, xmin, ymax, xmax order. + + Raises: + ValueError: If the last dimension of boxes is not 4.
+ """ + if boxes.shape[-1] != 4: + raise ValueError( + 'boxes.shape[-1] is {:d}, but must be 4.'.format(boxes.shape[-1])) + + boxes_ymin = boxes[..., 0] - boxes[..., 2] / 2 + boxes_xmin = boxes[..., 1] - boxes[..., 3] / 2 + boxes_ymax = boxes[..., 0] + boxes[..., 2] / 2 + boxes_xmax = boxes[..., 1] + boxes[..., 3] / 2 + new_boxes = tf.stack([ + boxes_ymin, boxes_xmin, boxes_ymax, boxes_xmax], axis=-1) + return new_boxes + + +def jitter_boxes(boxes, noise_scale=0.025): + """Jitter the box coordinates by some noise distribution. + + Args: + boxes: a tensor whose last dimension is 4 representing the coordinates of + boxes in ymin, xmin, ymax, xmax order. + noise_scale: a python float which specifies the magnitude of noise. The rule + of thumb is to set this between (0, 0.1]. The default value is found to + mimic the noisy detections best empirically. + + Returns: + jittered_boxes: a tensor whose shape is the same as `boxes` representing + the jittered boxes. + + Raises: + ValueError: If the last dimension of boxes is not 4. + """ + if boxes.shape[-1] != 4: + raise ValueError( + 'boxes.shape[-1] is {:d}, but must be 4.'.format(boxes.shape[-1])) + + with tf.name_scope('jitter_boxes'): + bbox_jitters = tf.random.normal(tf.shape(boxes), stddev=noise_scale) + ymin = boxes[..., 0:1] + xmin = boxes[..., 1:2] + ymax = boxes[..., 2:3] + xmax = boxes[..., 3:4] + width = xmax - xmin + height = ymax - ymin + new_center_x = (xmin + xmax) / 2.0 + bbox_jitters[..., 0:1] * width + new_center_y = (ymin + ymax) / 2.0 + bbox_jitters[..., 1:2] * height + new_width = width * tf.math.exp(bbox_jitters[..., 2:3]) + new_height = height * tf.math.exp(bbox_jitters[..., 3:4]) + jittered_boxes = tf.concat( + [new_center_y - new_height * 0.5, new_center_x - new_width * 0.5, + new_center_y + new_height * 0.5, new_center_x + new_width * 0.5], + axis=-1) + + return jittered_boxes + + +def normalize_boxes(boxes, image_shape): + """Converts boxes to the normalized coordinates. 
+ + Args: + boxes: a tensor whose last dimension is 4 representing the coordinates + of boxes in ymin, xmin, ymax, xmax order. + image_shape: a list of two integers, a two-element vector or a tensor such + that all but the last dimensions are `broadcastable` to `boxes`. The last + dimension is 2, which represents [height, width]. + + Returns: + normalized_boxes: a tensor whose shape is the same as `boxes` representing + the normalized boxes. + + Raises: + ValueError: If the last dimension of boxes is not 4. + """ + if boxes.shape[-1] != 4: + raise ValueError( + 'boxes.shape[-1] is {:d}, but must be 4.'.format(boxes.shape[-1])) + + with tf.name_scope('normalize_boxes'): + if isinstance(image_shape, list) or isinstance(image_shape, tuple): + height, width = image_shape + else: + image_shape = tf.cast(image_shape, dtype=boxes.dtype) + height = image_shape[..., 0:1] + width = image_shape[..., 1:2] + + ymin = boxes[..., 0:1] / height + xmin = boxes[..., 1:2] / width + ymax = boxes[..., 2:3] / height + xmax = boxes[..., 3:4] / width + + normalized_boxes = tf.concat([ymin, xmin, ymax, xmax], axis=-1) + return normalized_boxes + + +def denormalize_boxes(boxes, image_shape): + """Converts boxes normalized by [height, width] to pixel coordinates. + + Args: + boxes: a tensor whose last dimension is 4 representing the coordinates + of boxes in ymin, xmin, ymax, xmax order. + image_shape: a list of two integers, a two-element vector or a tensor such + that all but the last dimensions are `broadcastable` to `boxes`. The last + dimension is 2, which represents [height, width]. + + Returns: + denormalized_boxes: a tensor whose shape is the same as `boxes` representing + the denormalized boxes. + + Raises: + ValueError: If the last dimension of boxes is not 4. 
+ """ + with tf.name_scope('denormalize_boxes'): + if isinstance(image_shape, list) or isinstance(image_shape, tuple): + height, width = image_shape + else: + image_shape = tf.cast(image_shape, dtype=boxes.dtype) + height, width = tf.split(image_shape, 2, axis=-1) + + ymin, xmin, ymax, xmax = tf.split(boxes, 4, axis=-1) + ymin = ymin * height + xmin = xmin * width + ymax = ymax * height + xmax = xmax * width + + denormalized_boxes = tf.concat([ymin, xmin, ymax, xmax], axis=-1) + return denormalized_boxes + + +def horizontal_flip_boxes(normalized_boxes): + """Flips normalized boxes horizontally. + + Args: + normalized_boxes: the boxes in normalized coordinates. + + Returns: + horizontally flipped boxes. + """ + if normalized_boxes.shape[-1] != 4: + raise ValueError('boxes.shape[-1] is {:d}, but must be 4.'.format( + normalized_boxes.shape[-1])) + + with tf.name_scope('horizontal_flip_boxes'): + ymin, xmin, ymax, xmax = tf.split( + value=normalized_boxes, num_or_size_splits=4, axis=-1) + flipped_xmin = tf.subtract(1.0, xmax) + flipped_xmax = tf.subtract(1.0, xmin) + flipped_boxes = tf.concat([ymin, flipped_xmin, ymax, flipped_xmax], axis=-1) + return flipped_boxes + + +def clip_boxes(boxes, image_shape): + """Clips boxes to image boundaries. + + Args: + boxes: a tensor whose last dimension is 4 representing the coordinates + of boxes in ymin, xmin, ymax, xmax order. + image_shape: a list of two integers, a two-element vector or a tensor such + that all but the last dimensions are `broadcastable` to `boxes`. The last + dimension is 2, which represents [height, width]. + + Returns: + clipped_boxes: a tensor whose shape is the same as `boxes` representing the + clipped boxes. + + Raises: + ValueError: If the last dimension of boxes is not 4.
+ """ + if boxes.shape[-1] != 4: + raise ValueError( + 'boxes.shape[-1] is {:d}, but must be 4.'.format(boxes.shape[-1])) + + with tf.name_scope('clip_boxes'): + if isinstance(image_shape, list) or isinstance(image_shape, tuple): + height, width = image_shape + max_length = [height, width, height, width] + else: + image_shape = tf.cast(image_shape, dtype=boxes.dtype) + height, width = tf.unstack(image_shape, axis=-1) + max_length = tf.stack([height, width, height, width], axis=-1) + + clipped_boxes = tf.math.maximum(tf.math.minimum(boxes, max_length), 0.0) + return clipped_boxes + + +def compute_outer_boxes(boxes, image_shape, scale=1.0): + """Computes outer boxes that enclose the input boxes with a margin. + + Args: + boxes: a tensor whose last dimension is 4 representing the coordinates of + boxes in ymin, xmin, ymax, xmax order. + image_shape: a list of two integers, a two-element vector or a tensor such + that all but the last dimensions are `broadcastable` to `boxes`. The last + dimension is 2, which represents [height, width]. + scale: a float number specifying the scale of output outer boxes to input + `boxes`. + + Returns: + outer_boxes: a tensor whose shape is the same as `boxes` representing the + outer boxes. + """ + if scale < 1.0: + raise ValueError( + 'scale is {}, but outer box scale must be at least 1.0.'.format( + scale)) + centers_y = (boxes[..., 0] + boxes[..., 2]) / 2.0 + centers_x = (boxes[..., 1] + boxes[..., 3]) / 2.0 + box_height = (boxes[..., 2] - boxes[..., 0]) * scale + box_width = (boxes[..., 3] - boxes[..., 1]) * scale + outer_boxes = tf.stack( + [centers_y - box_height / 2.0, centers_x - box_width / 2.0, + centers_y + box_height / 2.0, centers_x + box_width / 2.0], + axis=-1) + outer_boxes = clip_boxes(outer_boxes, image_shape) + return outer_boxes + + +def encode_boxes(boxes, anchors, weights=None): + """Encodes boxes as regression targets relative to anchors.
+ + Args: + boxes: a tensor whose last dimension is 4 representing the coordinates + of boxes in ymin, xmin, ymax, xmax order. + anchors: a tensor whose shape is the same as, or `broadcastable` to `boxes`, + representing the coordinates of anchors in ymin, xmin, ymax, xmax order. + weights: None or a list of four float numbers used to scale coordinates. + + Returns: + encoded_boxes: a tensor whose shape is the same as `boxes` representing the + encoded box targets in dy, dx, dh, dw order. + + Raises: + ValueError: If the last dimension of boxes is not 4. + """ + if boxes.shape[-1] != 4: + raise ValueError( + 'boxes.shape[-1] is {:d}, but must be 4.'.format(boxes.shape[-1])) + + with tf.name_scope('encode_boxes'): + boxes = tf.cast(boxes, dtype=anchors.dtype) + ymin = boxes[..., 0:1] + xmin = boxes[..., 1:2] + ymax = boxes[..., 2:3] + xmax = boxes[..., 3:4] + box_h = ymax - ymin + box_w = xmax - xmin + box_yc = ymin + 0.5 * box_h + box_xc = xmin + 0.5 * box_w + + anchor_ymin = anchors[..., 0:1] + anchor_xmin = anchors[..., 1:2] + anchor_ymax = anchors[..., 2:3] + anchor_xmax = anchors[..., 3:4] + anchor_h = anchor_ymax - anchor_ymin + anchor_w = anchor_xmax - anchor_xmin + anchor_yc = anchor_ymin + 0.5 * anchor_h + anchor_xc = anchor_xmin + 0.5 * anchor_w + + encoded_dy = (box_yc - anchor_yc) / anchor_h + encoded_dx = (box_xc - anchor_xc) / anchor_w + encoded_dh = tf.math.log(box_h / anchor_h) + encoded_dw = tf.math.log(box_w / anchor_w) + if weights: + encoded_dy *= weights[0] + encoded_dx *= weights[1] + encoded_dh *= weights[2] + encoded_dw *= weights[3] + + encoded_boxes = tf.concat( + [encoded_dy, encoded_dx, encoded_dh, encoded_dw], axis=-1) + return encoded_boxes + + +def decode_boxes(encoded_boxes, anchors, weights=None): + """Decodes regression targets relative to anchors back into boxes. + + Args: + encoded_boxes: a tensor whose last dimension is 4 representing the + encoded box targets in dy, dx, dh, dw order.
+ anchors: a tensor broadcastable to `encoded_boxes`, representing the + coordinates of anchors in ymin, xmin, ymax, xmax order. + weights: None or a list of four float numbers used to scale coordinates. + + Returns: + decoded_boxes: a tensor whose shape is the same as `encoded_boxes` + representing the decoded boxes in ymin, xmin, ymax, xmax order. + """ + if encoded_boxes.shape[-1] != 4: + raise ValueError( + 'encoded_boxes.shape[-1] is {:d}, but must be 4.' + .format(encoded_boxes.shape[-1])) + + with tf.name_scope('decode_boxes'): + encoded_boxes = tf.cast(encoded_boxes, dtype=anchors.dtype) + dy = encoded_boxes[..., 0:1] + dx = encoded_boxes[..., 1:2] + dh = encoded_boxes[..., 2:3] + dw = encoded_boxes[..., 3:4] + if weights: + dy /= weights[0] + dx /= weights[1] + dh /= weights[2] + dw /= weights[3] + dh = tf.math.minimum(dh, BBOX_XFORM_CLIP) + dw = tf.math.minimum(dw, BBOX_XFORM_CLIP) + + anchor_ymin = anchors[..., 0:1] + anchor_xmin = anchors[..., 1:2] + anchor_ymax = anchors[..., 2:3] + anchor_xmax = anchors[..., 3:4] + anchor_h = anchor_ymax - anchor_ymin + anchor_w = anchor_xmax - anchor_xmin + anchor_yc = anchor_ymin + 0.5 * anchor_h + anchor_xc = anchor_xmin + 0.5 * anchor_w + + decoded_boxes_yc = dy * anchor_h + anchor_yc + decoded_boxes_xc = dx * anchor_w + anchor_xc + decoded_boxes_h = tf.math.exp(dh) * anchor_h + decoded_boxes_w = tf.math.exp(dw) * anchor_w + + decoded_boxes_ymin = decoded_boxes_yc - 0.5 * decoded_boxes_h + decoded_boxes_xmin = decoded_boxes_xc - 0.5 * decoded_boxes_w + decoded_boxes_ymax = decoded_boxes_ymin + decoded_boxes_h + decoded_boxes_xmax = decoded_boxes_xmin + decoded_boxes_w + + decoded_boxes = tf.concat( + [decoded_boxes_ymin, decoded_boxes_xmin, + decoded_boxes_ymax, decoded_boxes_xmax], + axis=-1) + return decoded_boxes + + +def filter_boxes(boxes, scores, image_shape, min_size_threshold): + """Filter and remove boxes that are too small or fall outside the image.
+ + Args: + boxes: a tensor whose last dimension is 4 representing the coordinates of + boxes in ymin, xmin, ymax, xmax order. + scores: a tensor whose shape is the same as tf.shape(boxes)[:-1] + representing the original scores of the boxes. + image_shape: a tensor whose shape is the same as, or `broadcastable` to + `boxes` except the last dimension, which is 2, representing [height, + width] of the scaled image. + min_size_threshold: a float representing the minimal box size in each side + (w.r.t. the scaled image). Boxes with a side no larger than it will be + filtered out. + + Returns: + filtered_boxes: a tensor whose shape is the same as `boxes` but with + the positions of the filtered boxes filled with 0. + filtered_scores: a tensor whose shape is the same as 'scores' but with + the positions of the filtered boxes filled with 0. + """ + if boxes.shape[-1] != 4: + raise ValueError( + 'boxes.shape[-1] is {:d}, but must be 4.'.format(boxes.shape[-1])) + + with tf.name_scope('filter_boxes'): + if isinstance(image_shape, list) or isinstance(image_shape, tuple): + height, width = image_shape + else: + image_shape = tf.cast(image_shape, dtype=boxes.dtype) + height = image_shape[..., 0] + width = image_shape[..., 1] + + ymin = boxes[..., 0] + xmin = boxes[..., 1] + ymax = boxes[..., 2] + xmax = boxes[..., 3] + + h = ymax - ymin + w = xmax - xmin + yc = ymin + 0.5 * h + xc = xmin + 0.5 * w + + min_size = tf.cast( + tf.math.maximum(min_size_threshold, 0.0), dtype=boxes.dtype) + + filtered_size_mask = tf.math.logical_and( + tf.math.greater(h, min_size), tf.math.greater(w, min_size)) + filtered_center_mask = tf.logical_and( + tf.math.logical_and(tf.math.greater(yc, 0.0), tf.math.less(yc, height)), + tf.math.logical_and(tf.math.greater(xc, 0.0), tf.math.less(xc, width))) + filtered_mask = tf.math.logical_and( + filtered_size_mask, filtered_center_mask) + + filtered_scores = tf.where(filtered_mask, scores, tf.zeros_like(scores)) + filtered_boxes = tf.cast( +
+ tf.expand_dims(filtered_mask, axis=-1), dtype=boxes.dtype) * boxes + return filtered_boxes, filtered_scores + + +def filter_boxes_by_scores(boxes, scores, min_score_threshold): + """Filter and remove boxes whose scores are smaller than the threshold. + + Args: + boxes: a tensor whose last dimension is 4 representing the coordinates of + boxes in ymin, xmin, ymax, xmax order. + scores: a tensor whose shape is the same as tf.shape(boxes)[:-1] + representing the original scores of the boxes. + min_score_threshold: a float representing the minimal box score threshold. + Boxes whose scores are no greater than it will be filtered out. + + Returns: + filtered_boxes: a tensor whose shape is the same as `boxes` but with + the positions of the filtered boxes filled with 0. + filtered_scores: a tensor whose shape is the same as 'scores' but with + the positions of the filtered boxes filled with -1. + """ + if boxes.shape[-1] != 4: + raise ValueError('boxes.shape[-1] is {:d}, but must be 4.'.format( + boxes.shape[-1])) + + with tf.name_scope('filter_boxes_by_scores'): + filtered_mask = tf.math.greater(scores, min_score_threshold) + filtered_scores = tf.where(filtered_mask, scores, -tf.ones_like(scores)) + filtered_boxes = tf.cast( + tf.expand_dims(filtered_mask, axis=-1), dtype=boxes.dtype) * boxes + + return filtered_boxes, filtered_scores + + +def gather_instances(selected_indices, instances, *aux_instances): + """Gather instances by indices. + + Args: + selected_indices: a Tensor of shape [batch, K] which indicates the selected + indices in instance dimension (2nd dimension). + instances: a Tensor of shape [batch, N, ...] where the 2nd dimension is + the instance dimension to be selected from. + *aux_instances: the additional Tensors whose shapes are in [batch, N, ...] + which are the tensors to be selected from using the `selected_indices`. + + Returns: + selected_instances: the tensor of shape [batch, K, ...] which corresponds to + the selected instances of the `instances` tensor.
+ selected_aux_instances: the additional tensors of shape [batch, K, ...] + which correspond to the selected instances of the `aux_instances` + tensors. + """ + batch_size = instances.shape[0] + if batch_size == 1: + selected_instances = tf.squeeze( + tf.gather(instances, selected_indices, axis=1), axis=1) + if aux_instances: + selected_aux_instances = [ + tf.squeeze( + tf.gather(a, selected_indices, axis=1), axis=1) + for a in aux_instances + ] + return tuple([selected_instances] + selected_aux_instances) + else: + return selected_instances + else: + indices_shape = tf.shape(selected_indices) + batch_indices = ( + tf.expand_dims(tf.range(indices_shape[0]), axis=-1) * + tf.ones([1, indices_shape[-1]], dtype=tf.int32)) + gather_nd_indices = tf.stack( + [batch_indices, selected_indices], axis=-1) + selected_instances = tf.gather_nd(instances, gather_nd_indices) + if aux_instances: + selected_aux_instances = [ + tf.gather_nd(a, gather_nd_indices) for a in aux_instances + ] + return tuple([selected_instances] + selected_aux_instances) + else: + return selected_instances + + +def top_k_boxes(boxes, scores, k): + """Sort and select top k boxes according to the scores. + + Args: + boxes: a tensor of shape [batch_size, N, 4] representing the coordinates of + the boxes. N is the number of boxes per image. + scores: a tensor of shape [batch_size, N] representing the scores of the + boxes. + k: an integer or a tensor indicating the top k number. + + Returns: + selected_boxes: a tensor of shape [batch_size, k, 4] representing the + selected top k box coordinates. + selected_scores: a tensor of shape [batch_size, k] representing the selected + top k box scores.
+ """ + with tf.name_scope('top_k_boxes'): + selected_scores, top_k_indices = tf.nn.top_k(scores, k=k, sorted=True) + selected_boxes = gather_instances(top_k_indices, boxes) + return selected_boxes, selected_scores + + +def get_non_empty_box_indices(boxes): + """Get indices for non-empty boxes.""" + # Selects indices of boxes whose height and width are both positive. + height = boxes[:, 2] - boxes[:, 0] + width = boxes[:, 3] - boxes[:, 1] + indices = tf.where(tf.logical_and(tf.greater(height, 0), + tf.greater(width, 0))) + return indices[:, 0] + + +def bbox_overlap(boxes, gt_boxes): + """Calculates the overlap between proposal and ground truth boxes. + + Some `boxes` or `gt_boxes` may have been padded. The returned `iou` tensor + for these boxes will be -1. + + Args: + boxes: a tensor with a shape of [batch_size, N, 4]. N is the number of + proposals before groundtruth assignment (e.g., rpn_post_nms_topn). The + last dimension is the pixel coordinates in [ymin, xmin, ymax, xmax] form. + gt_boxes: a tensor with a shape of [batch_size, MAX_NUM_INSTANCES, 4]. This + tensor might have paddings with a negative value. + + Returns: + iou: a tensor with a shape of [batch_size, N, MAX_NUM_INSTANCES]. + """ + with tf.name_scope('bbox_overlap'): + bb_y_min, bb_x_min, bb_y_max, bb_x_max = tf.split( + value=boxes, num_or_size_splits=4, axis=2) + gt_y_min, gt_x_min, gt_y_max, gt_x_max = tf.split( + value=gt_boxes, num_or_size_splits=4, axis=2) + + # Calculates the intersection area. + i_xmin = tf.math.maximum(bb_x_min, tf.transpose(gt_x_min, [0, 2, 1])) + i_xmax = tf.math.minimum(bb_x_max, tf.transpose(gt_x_max, [0, 2, 1])) + i_ymin = tf.math.maximum(bb_y_min, tf.transpose(gt_y_min, [0, 2, 1])) + i_ymax = tf.math.minimum(bb_y_max, tf.transpose(gt_y_max, [0, 2, 1])) + i_area = ( + tf.math.maximum((i_xmax - i_xmin), 0) * + tf.math.maximum((i_ymax - i_ymin), 0)) + + # Calculates the union area.
+ bb_area = (bb_y_max - bb_y_min) * (bb_x_max - bb_x_min) + gt_area = (gt_y_max - gt_y_min) * (gt_x_max - gt_x_min) + # Adds a small epsilon to avoid divide-by-zero. + u_area = bb_area + tf.transpose(gt_area, [0, 2, 1]) - i_area + 1e-8 + + # Calculates IoU. + iou = i_area / u_area + + # Fills -1 for IoU entries between the padded ground truth boxes. + gt_invalid_mask = tf.less( + tf.reduce_max(gt_boxes, axis=-1, keepdims=True), 0.0) + padding_mask = tf.logical_or( + tf.zeros_like(bb_x_min, dtype=tf.bool), + tf.transpose(gt_invalid_mask, [0, 2, 1])) + iou = tf.where(padding_mask, -tf.ones_like(iou), iou) + + # Fills -1 for invalid (-1) boxes. + boxes_invalid_mask = tf.less( + tf.reduce_max(boxes, axis=-1, keepdims=True), 0.0) + iou = tf.where(boxes_invalid_mask, -tf.ones_like(iou), iou) + + return iou + + +def bbox_generalized_overlap(boxes, gt_boxes): + """Calculates the GIoU between proposal and ground truth boxes. + + The generalized intersection over union (GIoU) adjusts the traditional IoU + metric so that it stays informative even for predictions with no overlap. + This metric is defined in https://giou.stanford.edu/GIoU.pdf. Note that some + `gt_boxes` may have been padded. The returned `giou` tensor for these boxes + will be -1. + + Args: + boxes: a `Tensor` with a shape of [batch_size, N, 4]. N is the number of + proposals before groundtruth assignment (e.g., rpn_post_nms_topn). The + last dimension is the pixel coordinates in [ymin, xmin, ymax, xmax] form. + gt_boxes: a `Tensor` with a shape of [batch_size, max_num_instances, 4]. + This tensor may have paddings with a negative value and will also be in + the [ymin, xmin, ymax, xmax] format. + + Returns: + giou: a `Tensor` with a shape of [batch_size, N, max_num_instances]. + """ + with tf.name_scope('bbox_generalized_overlap'): + assert boxes.shape.as_list( + )[-1] == 4, 'Boxes must be defined by 4 coordinates.'
+ assert gt_boxes.shape.as_list( + )[-1] == 4, 'Groundtruth boxes must be defined by 4 coordinates.' + + bb_y_min, bb_x_min, bb_y_max, bb_x_max = tf.split( + value=boxes, num_or_size_splits=4, axis=2) + gt_y_min, gt_x_min, gt_y_max, gt_x_max = tf.split( + value=gt_boxes, num_or_size_splits=4, axis=2) + + # Calculates the hull area for each pair of boxes, with one from + # boxes and the other from gt_boxes. + # Outputs for coordinates are of shape [batch_size, N, max_num_instances] + h_xmin = tf.minimum(bb_x_min, tf.transpose(gt_x_min, [0, 2, 1])) + h_xmax = tf.maximum(bb_x_max, tf.transpose(gt_x_max, [0, 2, 1])) + h_ymin = tf.minimum(bb_y_min, tf.transpose(gt_y_min, [0, 2, 1])) + h_ymax = tf.maximum(bb_y_max, tf.transpose(gt_y_max, [0, 2, 1])) + h_area = tf.maximum((h_xmax - h_xmin), 0) * tf.maximum((h_ymax - h_ymin), 0) + # Add a small epsilon to avoid divide-by-zero. + h_area = h_area + 1e-8 + + # Calculates the intersection area. + i_xmin = tf.maximum(bb_x_min, tf.transpose(gt_x_min, [0, 2, 1])) + i_xmax = tf.minimum(bb_x_max, tf.transpose(gt_x_max, [0, 2, 1])) + i_ymin = tf.maximum(bb_y_min, tf.transpose(gt_y_min, [0, 2, 1])) + i_ymax = tf.minimum(bb_y_max, tf.transpose(gt_y_max, [0, 2, 1])) + i_area = tf.maximum((i_xmax - i_xmin), 0) * tf.maximum((i_ymax - i_ymin), 0) + + # Calculates the union area. + bb_area = (bb_y_max - bb_y_min) * (bb_x_max - bb_x_min) + gt_area = (gt_y_max - gt_y_min) * (gt_x_max - gt_x_min) + + # Adds a small epsilon to avoid divide-by-zero. + u_area = bb_area + tf.transpose(gt_area, [0, 2, 1]) - i_area + 1e-8 + + # Calculates IoU. + iou = i_area / u_area + # Calculates GIoU. + giou = iou - (h_area - u_area) / h_area + + # Fills -1 for GIoU entries between the padded ground truth boxes. 
+ gt_invalid_mask = tf.less( + tf.reduce_max(gt_boxes, axis=-1, keepdims=True), 0.0) + padding_mask = tf.broadcast_to( + tf.transpose(gt_invalid_mask, [0, 2, 1]), tf.shape(giou)) + giou = tf.where(padding_mask, -tf.ones_like(giou), giou) + return giou + + +def box_matching(boxes, gt_boxes, gt_classes): + """Matches boxes to groundtruth boxes. + + Given the proposal boxes and the groundtruth boxes and classes, performs the + groundtruth matching by taking the argmax of the IoU between boxes and + groundtruth boxes. + + Args: + boxes: a tensor of shape [batch_size, N, 4] representing the box + coordinates to be matched to groundtruth boxes. + gt_boxes: a tensor of shape [batch_size, MAX_INSTANCES, 4] representing + the groundtruth box coordinates. It is padded with -1s to indicate the + invalid boxes. + gt_classes: a tensor of shape [batch_size, MAX_INSTANCES] representing the + groundtruth box classes. It is padded with -1s to indicate invalid classes. + + Returns: + matched_gt_boxes: a tensor of shape [batch_size, N, 4], representing + the matched groundtruth box coordinates for each input box. If the box + does not overlap with any groundtruth boxes, its matched box coordinates + will be set to all 0s. + matched_gt_classes: a tensor of shape [batch_size, N], representing + the matched groundtruth classes for each input box. If the box does not + overlap with any groundtruth boxes, its matched class will + be set to 0, which corresponds to the background class. + matched_gt_indices: a tensor of shape [batch_size, N], representing + the indices of the matched groundtruth boxes in the original gt_boxes + tensor. If the box does not overlap with any groundtruth boxes, the + index of the matched groundtruth will be set to -1. + matched_iou: a tensor of shape [batch_size, N], representing the IoU + between the box and its matched groundtruth box. The matched IoU is the + maximum IoU of the box and all the groundtruth boxes.
+ iou: a tensor of shape of [batch_size, N, K], representing the IoU matrix + between boxes and the groundtruth boxes. The IoU between a box and the + invalid groundtruth boxes whose coordinates are [-1, -1, -1, -1] is -1. + """ + # Compute IoU between boxes and gt_boxes. + # iou <- [batch_size, N, K] + iou = bbox_overlap(boxes, gt_boxes) + + # max_iou <- [batch_size, N] + # 0.0 -> no match to gt, or -1.0 match to no gt + matched_iou = tf.reduce_max(iou, axis=-1) + + # background_box_mask <- bool, [batch_size, N] + background_box_mask = tf.less_equal(matched_iou, 0.0) + + argmax_iou_indices = tf.argmax(iou, axis=-1, output_type=tf.int32) + + matched_gt_boxes, matched_gt_classes = gather_instances( + argmax_iou_indices, gt_boxes, gt_classes) + matched_gt_boxes = tf.where( + tf.tile(tf.expand_dims(background_box_mask, axis=-1), [1, 1, 4]), + tf.zeros_like(matched_gt_boxes, dtype=matched_gt_boxes.dtype), + matched_gt_boxes) + matched_gt_classes = tf.where( + background_box_mask, + tf.zeros_like(matched_gt_classes), + matched_gt_classes) + + matched_gt_indices = tf.where( + background_box_mask, + -tf.ones_like(argmax_iou_indices), + argmax_iou_indices) + + return (matched_gt_boxes, matched_gt_classes, matched_gt_indices, + matched_iou, iou) + + +def bbox2mask(bbox: tf.Tensor, + *, + image_height: int, + image_width: int, + dtype: tf.DType = tf.bool) -> tf.Tensor: + """Converts bounding boxes to bitmasks. + + Args: + bbox: A tensor in shape (..., 4) with arbitrary numbers of batch dimensions, + representing the absolute coordinates (ymin, xmin, ymax, xmax) for each + bounding box. + image_height: an integer representing the height of the image. + image_width: an integer representing the width of the image. + dtype: DType of the output bitmasks. + + Returns: + A tensor in shape (..., height, width) which stores the bitmasks created + from the bounding boxes. 
For example, the call + + bbox2mask(tf.constant([[1, 2, 4, 4]]), image_height=5, image_width=5, + dtype=tf.int32) + + returns an int32 tensor of shape (1, 5, 5) that is 1 at rows 1-3 and + columns 2-3 (ymax and xmax are exclusive) and 0 elsewhere. + """ + bbox_shape = bbox.get_shape().as_list() + if bbox_shape[-1] != 4: + raise ValueError( + 'Expected the last dimension of `bbox` to have size 4, but the shape ' + 'of `bbox` was: %s' % bbox_shape) + + # (..., 1) + ymin = bbox[..., 0:1] + xmin = bbox[..., 1:2] + ymax = bbox[..., 2:3] + xmax = bbox[..., 3:4] + # (..., 1, width) + ymin = tf.expand_dims(tf.repeat(ymin, repeats=image_width, axis=-1), axis=-2) + # (..., height, 1) + xmin = tf.expand_dims(tf.repeat(xmin, repeats=image_height, axis=-1), axis=-1) + # (..., 1, width) + ymax = tf.expand_dims(tf.repeat(ymax, repeats=image_width, axis=-1), axis=-2) + # (..., height, 1) + xmax = tf.expand_dims(tf.repeat(xmax, repeats=image_height, axis=-1), axis=-1) + + # (height, 1) + y_grid = tf.expand_dims(tf.range(image_height, dtype=bbox.dtype), axis=-1) + # (1, width) + x_grid = tf.expand_dims(tf.range(image_width, dtype=bbox.dtype), axis=-2) + + # (..., height, width) + ymin_mask = y_grid >= ymin + xmin_mask = x_grid >= xmin + ymax_mask = y_grid < ymax + xmax_mask = x_grid < xmax + return tf.cast(ymin_mask & xmin_mask & ymax_mask & xmax_mask, dtype) diff --git a/official/vision/beta/ops/iou_similarity.py b/official/vision/ops/iou_similarity.py similarity index 98% rename from official/vision/beta/ops/iou_similarity.py rename to official/vision/ops/iou_similarity.py index cdbb397fc2b8b043baa40c26d934e3d2215756bb..c73a957739da6e20916cc316aa41926e8c3ce06b 100644 --- a/official/vision/beta/ops/iou_similarity.py +++ b/official/vision/ops/iou_similarity.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
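The box math in `encode_boxes`, `decode_boxes`, and `bbox_overlap` can be sanity-checked outside TensorFlow. The sketch below is a minimal pure-Python illustration, assuming single boxes in [ymin, xmin, ymax, xmax] order. The helper names `encode_box`, `decode_box`, and `iou` are hypothetical and are not part of `official.vision.ops.box_ops`. The sketch also leaves out batching and the -1 padding convention of the real ops.

```python
import math

# Hypothetical helpers (not the library API): a pure-Python, single-box
# mirror of the formulas in box_ops.encode_boxes, box_ops.decode_boxes and
# box_ops.bbox_overlap. Boxes are [ymin, xmin, ymax, xmax].

BBOX_XFORM_CLIP = math.log(1000.0 / 16.0)  # same clip value box_ops uses


def encode_box(box, anchor, weights=None):
    """Encodes `box` relative to `anchor` as [dy, dx, dh, dw]."""
    ymin, xmin, ymax, xmax = box
    a_ymin, a_xmin, a_ymax, a_xmax = anchor
    h, w = ymax - ymin, xmax - xmin
    a_h, a_w = a_ymax - a_ymin, a_xmax - a_xmin
    # Center offsets are normalized by anchor size; sizes become log ratios.
    deltas = [
        ((ymin + 0.5 * h) - (a_ymin + 0.5 * a_h)) / a_h,
        ((xmin + 0.5 * w) - (a_xmin + 0.5 * a_w)) / a_w,
        math.log(h / a_h),
        math.log(w / a_w),
    ]
    if weights:
        deltas = [d * s for d, s in zip(deltas, weights)]
    return deltas


def decode_box(deltas, anchor, weights=None):
    """Inverts encode_box and returns [ymin, xmin, ymax, xmax]."""
    if weights:
        deltas = [d / s for d, s in zip(deltas, weights)]
    dy, dx, dh, dw = deltas
    # Clip the log-scale terms so exp() cannot produce huge boxes.
    dh = min(dh, BBOX_XFORM_CLIP)
    dw = min(dw, BBOX_XFORM_CLIP)
    a_ymin, a_xmin, a_ymax, a_xmax = anchor
    a_h, a_w = a_ymax - a_ymin, a_xmax - a_xmin
    yc = dy * a_h + a_ymin + 0.5 * a_h
    xc = dx * a_w + a_xmin + 0.5 * a_w
    h, w = math.exp(dh) * a_h, math.exp(dw) * a_w
    return [yc - 0.5 * h, xc - 0.5 * w, yc + 0.5 * h, xc + 0.5 * w]


def iou(box_a, box_b):
    """IoU of two boxes, with the same 1e-8 union epsilon as bbox_overlap."""
    inter_h = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_w = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_h * inter_w
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-8)


anchor = [0.0, 0.0, 10.0, 10.0]
box = [2.0, 3.0, 12.0, 8.0]
weights = [10.0, 10.0, 5.0, 5.0]
deltas = encode_box(box, anchor, weights)
print(deltas)
print(decode_box(deltas, anchor, weights))
print(iou(anchor, [0.0, 5.0, 10.0, 15.0]))
```

When the same weights are used in both directions, decoding the encoded deltas should recover the original box up to floating-point error. This makes a quick check that a custom weight setting is applied symmetrically.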
diff --git a/official/vision/beta/ops/iou_similarity_test.py b/official/vision/ops/iou_similarity_test.py similarity index 94% rename from official/vision/beta/ops/iou_similarity_test.py rename to official/vision/ops/iou_similarity_test.py index 512ea064a9da56cde7deb3c68e6d737a6ad94a4d..706d281cabfe91a6e18d3c0921aa04aa0910fcc7 100644 --- a/official/vision/beta/ops/iou_similarity_test.py +++ b/official/vision/ops/iou_similarity_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,7 +16,7 @@ import tensorflow as tf -from official.vision.beta.ops import iou_similarity +from official.vision.ops import iou_similarity class BoxMatcherTest(tf.test.TestCase): diff --git a/official/vision/ops/mask_ops.py b/official/vision/ops/mask_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..64fb05f92b547f7ac4ded198a62fda662308806e --- /dev/null +++ b/official/vision/ops/mask_ops.py @@ -0,0 +1,185 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Utility functions for segmentations.""" + +import math +# Import libraries +import cv2 +import numpy as np + + +def paste_instance_masks(masks: np.ndarray, detected_boxes: np.ndarray, + image_height: int, image_width: int) -> np.ndarray: + """Paste instance masks to generate the image segmentation results. + + Args: + masks: a numpy array of shape [N, mask_height, mask_width] representing the + instance masks w.r.t. the `detected_boxes`. + detected_boxes: a numpy array of shape [N, 4] representing the reference + bounding boxes. + image_height: an integer representing the height of the image. + image_width: an integer representing the width of the image. + + Returns: + segms: a numpy array of shape [N, image_height, image_width] representing + the instance masks *pasted* on the image canvas. + """ + + def expand_boxes(boxes: np.ndarray, scale: float) -> np.ndarray: + """Expands an array of boxes by a given scale.""" + # Reference: https://github.com/facebookresearch/Detectron/blob/master/detectron/utils/boxes.py#L227 # pylint: disable=line-too-long + # The `boxes` in the reference implementation is in [x1, y1, x2, y2] form, + # whereas `boxes` here is in [x1, y1, w, h] form + w_half = boxes[:, 2] * 0.5 + h_half = boxes[:, 3] * 0.5 + x_c = boxes[:, 0] + w_half + y_c = boxes[:, 1] + h_half + + w_half *= scale + h_half *= scale + + boxes_exp = np.zeros(boxes.shape) + boxes_exp[:, 0] = x_c - w_half + boxes_exp[:, 2] = x_c + w_half + boxes_exp[:, 1] = y_c - h_half + boxes_exp[:, 3] = y_c + h_half + + return boxes_exp + + # Reference: https://github.com/facebookresearch/Detectron/blob/master/detectron/core/test.py#L812 # pylint: disable=line-too-long + # To work around an issue with cv2.resize (it seems to automatically pad + # with repeated border values), we manually zero-pad the masks by 1 pixel + # prior to resizing back to the original image resolution. This prevents + # "top hat" artifacts. 
We therefore need to expand the reference boxes by an + # appropriate factor. + _, mask_height, mask_width = masks.shape + scale = max((mask_width + 2.0) / mask_width, + (mask_height + 2.0) / mask_height) + + ref_boxes = expand_boxes(detected_boxes, scale) + ref_boxes = ref_boxes.astype(np.int32) + padded_mask = np.zeros((mask_height + 2, mask_width + 2), dtype=np.float32) + segms = [] + for mask_ind, mask in enumerate(masks): + im_mask = np.zeros((image_height, image_width), dtype=np.uint8) + # Process mask inside bounding boxes. + padded_mask[1:-1, 1:-1] = mask[:, :] + + ref_box = ref_boxes[mask_ind, :] + w = ref_box[2] - ref_box[0] + 1 + h = ref_box[3] - ref_box[1] + 1 + w = np.maximum(w, 1) + h = np.maximum(h, 1) + + mask = cv2.resize(padded_mask, (w, h)) + mask = np.array(mask > 0.5, dtype=np.uint8) + + x_0 = min(max(ref_box[0], 0), image_width) + x_1 = min(max(ref_box[2] + 1, 0), image_width) + y_0 = min(max(ref_box[1], 0), image_height) + y_1 = min(max(ref_box[3] + 1, 0), image_height) + + im_mask[y_0:y_1, x_0:x_1] = mask[ + (y_0 - ref_box[1]):(y_1 - ref_box[1]), + (x_0 - ref_box[0]):(x_1 - ref_box[0]) + ] + segms.append(im_mask) + + segms = np.array(segms) + assert masks.shape[0] == segms.shape[0] + return segms + + +def paste_instance_masks_v2(masks: np.ndarray, detected_boxes: np.ndarray, + image_height: int, image_width: int) -> np.ndarray: + """Paste instance masks to generate the image segmentation (v2). + + Args: + masks: a numpy array of shape [N, mask_height, mask_width] representing the + instance masks w.r.t. the `detected_boxes`. + detected_boxes: a numpy array of shape [N, 4] representing the reference + bounding boxes. + image_height: an integer representing the height of the image. + image_width: an integer representing the width of the image. + + Returns: + segms: a numpy array of shape [N, image_height, image_width] representing + the instance masks *pasted* on the image canvas. 
+ """ + _, mask_height, mask_width = masks.shape + + segms = [] + for i, mask in enumerate(masks): + box = detected_boxes[i, :] + xmin = box[0] + ymin = box[1] + xmax = xmin + box[2] + ymax = ymin + box[3] + + # Sample points of the cropped mask w.r.t. the image grid. + # Note that these coordinates may fall beyond the image. + # Pixel clipping will happen after warping. + xmin_int = int(math.floor(xmin)) + xmax_int = int(math.ceil(xmax)) + ymin_int = int(math.floor(ymin)) + ymax_int = int(math.ceil(ymax)) + + alpha = box[2] / (1.0 * mask_width) + beta = box[3] / (1.0 * mask_height) + # pylint: disable=invalid-name + # Transformation from mask pixel indices to image coordinate. + M_mask_to_image = np.array( + [[alpha, 0, xmin], + [0, beta, ymin], + [0, 0, 1]], + dtype=np.float32) + # Transformation from image to cropped mask coordinate. + M_image_to_crop = np.array( + [[1, 0, -xmin_int], + [0, 1, -ymin_int], + [0, 0, 1]], + dtype=np.float32) + M = np.dot(M_image_to_crop, M_mask_to_image) + # Compensate the half pixel offset that OpenCV has in the + # warpPerspective implementation: the top-left pixel is sampled + # at (0,0), but we want it to be at (0.5, 0.5). 
+ M = np.dot( + np.dot( + np.array([[1, 0, -0.5], + [0, 1, -0.5], + [0, 0, 1]], np.float32), + M), + np.array([[1, 0, 0.5], + [0, 1, 0.5], + [0, 0, 1]], np.float32)) + # pylint: enable=invalid-name + cropped_mask = cv2.warpPerspective( + mask.astype(np.float32), M, + (xmax_int - xmin_int, ymax_int - ymin_int)) + cropped_mask = np.array(cropped_mask > 0.5, dtype=np.uint8) + + img_mask = np.zeros((image_height, image_width)) + x0 = max(min(xmin_int, image_width), 0) + x1 = max(min(xmax_int, image_width), 0) + y0 = max(min(ymin_int, image_height), 0) + y1 = max(min(ymax_int, image_height), 0) + img_mask[y0:y1, x0:x1] = cropped_mask[ + (y0 - ymin_int):(y1 - ymin_int), + (x0 - xmin_int):(x1 - xmin_int)] + + segms.append(img_mask) + + segms = np.array(segms) + return segms diff --git a/official/vision/beta/ops/mask_ops_test.py b/official/vision/ops/mask_ops_test.py similarity index 93% rename from official/vision/beta/ops/mask_ops_test.py rename to official/vision/ops/mask_ops_test.py index 09b7663294a7b56b82957f04a9390c1ad4824f5e..443ea1d6a9b60c1f95d40875ac33e2be84bac3db 100644 --- a/official/vision/beta/ops/mask_ops_test.py +++ b/official/vision/ops/mask_ops_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,13 +12,12 @@ # See the License for the specific language governing permissions and # limitations under the License. 
- """Tests for mask_ops.py.""" # Import libraries import numpy as np import tensorflow as tf -from official.vision.beta.ops import mask_ops +from official.vision.ops import mask_ops class MaskUtilsTest(tf.test.TestCase): diff --git a/official/vision/ops/nms.py b/official/vision/ops/nms.py new file mode 100644 index 0000000000000000000000000000000000000000..96287a420a3dbeb7bfdd632dac5cdf5d50b68739 --- /dev/null +++ b/official/vision/ops/nms.py @@ -0,0 +1,202 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""TensorFlow implementation of non-max suppression.""" + +# Import libraries +import tensorflow as tf + +from official.vision.ops import box_ops + + +NMS_TILE_SIZE = 512 + + +def _self_suppression(iou, _, iou_sum): + batch_size = tf.shape(iou)[0] + can_suppress_others = tf.cast( + tf.reshape(tf.reduce_max(iou, 1) <= 0.5, [batch_size, -1, 1]), iou.dtype) + iou_suppressed = tf.reshape( + tf.cast(tf.reduce_max(can_suppress_others * iou, 1) <= 0.5, iou.dtype), + [batch_size, -1, 1]) * iou + iou_sum_new = tf.reduce_sum(iou_suppressed, [1, 2]) + return [ + iou_suppressed, + tf.reduce_any(iou_sum - iou_sum_new > 0.5), iou_sum_new + ] + + +def _cross_suppression(boxes, box_slice, iou_threshold, inner_idx): + batch_size = tf.shape(boxes)[0] + new_slice = tf.slice(boxes, [0, inner_idx * NMS_TILE_SIZE, 0], + [batch_size, NMS_TILE_SIZE, 4]) + iou = box_ops.bbox_overlap(new_slice, box_slice) + ret_slice = tf.expand_dims( + tf.cast(tf.reduce_all(iou < iou_threshold, [1]), box_slice.dtype), + 2) * box_slice + return boxes, ret_slice, iou_threshold, inner_idx + 1 + + +def _suppression_loop_body(boxes, iou_threshold, output_size, idx): + """Process boxes in the range [idx*NMS_TILE_SIZE, (idx+1)*NMS_TILE_SIZE). + + Args: + boxes: a tensor with a shape of [batch_size, anchors, 4]. + iou_threshold: a float representing the threshold for deciding whether boxes + overlap too much with respect to IOU. + output_size: an int32 tensor of size [batch_size] representing the number + of selected boxes for each batch. + idx: an integer scalar representing the induction variable. + + Returns: + boxes: updated boxes. + iou_threshold: pass down iou_threshold to the next iteration. + output_size: the updated output_size. + idx: the updated induction variable. + """ + num_tiles = tf.shape(boxes)[1] // NMS_TILE_SIZE + batch_size = tf.shape(boxes)[0] + + # Iterates over tiles that can possibly suppress the current tile.
+ box_slice = tf.slice(boxes, [0, idx * NMS_TILE_SIZE, 0], + [batch_size, NMS_TILE_SIZE, 4]) + _, box_slice, _, _ = tf.while_loop( + lambda _boxes, _box_slice, _threshold, inner_idx: inner_idx < idx, + _cross_suppression, [boxes, box_slice, iou_threshold, + tf.constant(0)]) + + # Iterates over the current tile to compute self-suppression. + iou = box_ops.bbox_overlap(box_slice, box_slice) + mask = tf.expand_dims( + tf.reshape(tf.range(NMS_TILE_SIZE), [1, -1]) > tf.reshape( + tf.range(NMS_TILE_SIZE), [-1, 1]), 0) + iou *= tf.cast(tf.logical_and(mask, iou >= iou_threshold), iou.dtype) + suppressed_iou, _, _ = tf.while_loop( + lambda _iou, loop_condition, _iou_sum: loop_condition, _self_suppression, + [iou, tf.constant(True), + tf.reduce_sum(iou, [1, 2])]) + suppressed_box = tf.reduce_sum(suppressed_iou, 1) > 0 + box_slice *= tf.expand_dims(1.0 - tf.cast(suppressed_box, box_slice.dtype), 2) + + # Uses box_slice to update the input boxes. + mask = tf.reshape( + tf.cast(tf.equal(tf.range(num_tiles), idx), boxes.dtype), [1, -1, 1, 1]) + boxes = tf.tile(tf.expand_dims( + box_slice, [1]), [1, num_tiles, 1, 1]) * mask + tf.reshape( + boxes, [batch_size, num_tiles, NMS_TILE_SIZE, 4]) * (1 - mask) + boxes = tf.reshape(boxes, [batch_size, -1, 4]) + + # Updates output_size. + output_size += tf.reduce_sum( + tf.cast(tf.reduce_any(box_slice > 0, [2]), tf.int32), [1]) + return boxes, iou_threshold, output_size, idx + 1 + + +def sorted_non_max_suppression_padded(scores, + boxes, + max_output_size, + iou_threshold): + """A wrapper that handles non-maximum suppression. + + Assumption: + * The boxes are sorted by scores unless the box is a dot (all coordinates + are zero). + * Boxes with higher scores can be used to suppress boxes with lower scores. 
+ + The overall design of the algorithm is to handle boxes tile-by-tile: + + boxes = boxes.pad_to_multiple_of(tile_size) + num_tiles = len(boxes) // tile_size + output_boxes = [] + for i in range(num_tiles): + box_tile = boxes[i*tile_size : (i+1)*tile_size] + for j in range(i): + suppressing_tile = boxes[j*tile_size : (j+1)*tile_size] + iou = bbox_overlap(box_tile, suppressing_tile) + # if the box is suppressed in iou, clear it to a dot + box_tile *= _update_boxes(iou) + # Iteratively handle the diagonal tile. + iou = bbox_overlap(box_tile, box_tile) + iou_changed = True + while iou_changed: + # boxes that are not suppressed by anything else + suppressing_boxes = _get_suppressing_boxes(iou) + # boxes that are suppressed by suppressing_boxes + suppressed_boxes = _get_suppressed_boxes(iou, suppressing_boxes) + # clear iou to 0 for boxes that are suppressed, as they cannot be used + # to suppress other boxes any more + new_iou = _clear_iou(iou, suppressed_boxes) + iou_changed = (new_iou != iou) + iou = new_iou + # remaining boxes that can still suppress others are selected boxes. + output_boxes.append(_get_suppressing_boxes(iou)) + if len(output_boxes) >= max_output_size: + break + + Args: + scores: a tensor with a shape of [batch_size, anchors]. + boxes: a tensor with a shape of [batch_size, anchors, 4]. + max_output_size: a scalar integer `Tensor` representing the maximum number + of boxes to be selected by non max suppression. + iou_threshold: a float representing the threshold for deciding whether boxes + overlap too much with respect to IOU. + + Returns: + nms_scores: a float32 tensor with a shape of + [batch_size, max_output_size]. + nms_proposals: a float32 tensor with a shape of + [batch_size, max_output_size, 4].
+ """ + batch_size = tf.shape(boxes)[0] + num_boxes = tf.shape(boxes)[1] + pad = tf.cast( + tf.math.ceil(tf.cast(num_boxes, tf.float32) / NMS_TILE_SIZE), + tf.int32) * NMS_TILE_SIZE - num_boxes + boxes = tf.pad(tf.cast(boxes, tf.float32), [[0, 0], [0, pad], [0, 0]]) + scores = tf.pad( + tf.cast(scores, tf.float32), [[0, 0], [0, pad]], constant_values=-1) + num_boxes += pad + + def _loop_cond(unused_boxes, unused_threshold, output_size, idx): + return tf.logical_and( + tf.reduce_min(output_size) < max_output_size, + idx < num_boxes // NMS_TILE_SIZE) + + selected_boxes, _, output_size, _ = tf.while_loop( + _loop_cond, _suppression_loop_body, [ + boxes, iou_threshold, + tf.zeros([batch_size], tf.int32), + tf.constant(0) + ]) + idx = num_boxes - tf.cast( + tf.nn.top_k( + tf.cast(tf.reduce_any(selected_boxes > 0, [2]), tf.int32) * + tf.expand_dims(tf.range(num_boxes, 0, -1), 0), max_output_size)[0], + tf.int32) + idx = tf.minimum(idx, num_boxes - 1) + idx = tf.reshape( + idx + tf.reshape(tf.range(batch_size) * num_boxes, [-1, 1]), [-1]) + boxes = tf.reshape( + tf.gather(tf.reshape(boxes, [-1, 4]), idx), + [batch_size, max_output_size, 4]) + boxes = boxes * tf.cast( + tf.reshape(tf.range(max_output_size), [1, -1, 1]) < tf.reshape( + output_size, [-1, 1, 1]), boxes.dtype) + scores = tf.reshape( + tf.gather(tf.reshape(scores, [-1, 1]), idx), + [batch_size, max_output_size]) + scores = scores * tf.cast( + tf.reshape(tf.range(max_output_size), [1, -1]) < tf.reshape( + output_size, [-1, 1]), scores.dtype) + return scores, boxes diff --git a/official/vision/ops/preprocess_ops.py b/official/vision/ops/preprocess_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..75572814606b92d5d9fa82ed822b97144ddc49d3 --- /dev/null +++ b/official/vision/ops/preprocess_ops.py @@ -0,0 +1,984 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Preprocessing ops.""" + +import math +from typing import Optional, Tuple, Sequence, Union +from six.moves import range +import tensorflow as tf + +from official.vision.ops import augment +from official.vision.ops import box_ops + +CENTER_CROP_FRACTION = 0.875 + +# Calculated from the ImageNet training set +MEAN_NORM = (0.485, 0.456, 0.406) +STDDEV_NORM = (0.229, 0.224, 0.225) +MEAN_RGB = tuple(255 * i for i in MEAN_NORM) +STDDEV_RGB = tuple(255 * i for i in STDDEV_NORM) + +# Alias for convenience. PLEASE use `box_ops.horizontal_flip_boxes` directly. +horizontal_flip_boxes = box_ops.horizontal_flip_boxes + + +def clip_or_pad_to_fixed_size(input_tensor, size, constant_values=0): + """Pads data to a fixed length at the first dimension. + + Args: + input_tensor: `Tensor` with any dimension. + size: `int` number for the first dimension of output Tensor. + constant_values: `int` value assigned to the paddings. + + Returns: + `Tensor` with the first dimension padded to `size`. + """ + input_shape = input_tensor.get_shape().as_list() + padding_shape = [] + + # Computes the padding length on the first dimension, clip input tensor if it + # is longer than `size`. 
+ input_length = tf.shape(input_tensor)[0] + input_length = tf.clip_by_value(input_length, 0, size) + input_tensor = input_tensor[:input_length] + + padding_length = tf.maximum(0, size - input_length) + padding_shape.append(padding_length) + + # Copies shapes of the rest of input shape dimensions. + for i in range(1, len(input_shape)): + padding_shape.append(tf.shape(input_tensor)[i]) + + # Pads input tensor to the fixed first dimension. + paddings = tf.cast(constant_values * tf.ones(padding_shape), + input_tensor.dtype) + padded_tensor = tf.concat([input_tensor, paddings], axis=0) + output_shape = input_shape + output_shape[0] = size + padded_tensor.set_shape(output_shape) + return padded_tensor + + +def normalize_image(image: tf.Tensor, + offset: Sequence[float] = MEAN_NORM, + scale: Sequence[float] = STDDEV_NORM): + """Normalizes the image to zero mean and unit variance.""" + with tf.name_scope('normalize_image'): + image = tf.image.convert_image_dtype(image, dtype=tf.float32) + return normalize_scaled_float_image(image, offset, scale) + + +def normalize_scaled_float_image(image: tf.Tensor, + offset: Sequence[float] = MEAN_NORM, + scale: Sequence[float] = STDDEV_NORM): + """Normalizes a scaled float image to zero mean and unit variance. + + It assumes the input image is float dtype with values in [0, 1). + + Args: + image: A tf.Tensor in float32 dtype with values in range [0, 1). + offset: A tuple of mean values to be subtracted from the image. + scale: A tuple of normalization factors. + + Returns: + A normalized image tensor. + """ + offset = tf.constant(offset) + offset = tf.expand_dims(offset, axis=0) + offset = tf.expand_dims(offset, axis=0) + image -= offset + + scale = tf.constant(scale) + scale = tf.expand_dims(scale, axis=0) + scale = tf.expand_dims(scale, axis=0) + image /= scale + return image + + +def compute_padded_size(desired_size, stride): + """Compute the padded size given the desired size and the stride. 
+ + The padded size will be the smallest rectangle, such that each dimension is + the smallest multiple of the stride which is greater than or equal to the + desired dimension. For example, if desired_size = (100, 200) and stride = 32, + the output padded_size = (128, 224). + + Args: + desired_size: a `Tensor` or `int` list/tuple of two elements representing + [height, width] of the target output image size. + stride: an integer, the stride of the backbone network. + + Returns: + padded_size: a `Tensor` or `int` list/tuple of two elements representing + [height, width] of the padded output image size. + """ + if isinstance(desired_size, list) or isinstance(desired_size, tuple): + padded_size = [int(math.ceil(d * 1.0 / stride) * stride) + for d in desired_size] + else: + padded_size = tf.cast( + tf.math.ceil( + tf.cast(desired_size, dtype=tf.float32) / stride) * stride, + tf.int32) + return padded_size + + +def resize_and_crop_image(image, + desired_size, + padded_size, + aug_scale_min=1.0, + aug_scale_max=1.0, + seed=1, + method=tf.image.ResizeMethod.BILINEAR): + """Resizes the input image to output size (RetinaNet style). + + Resize and pad images given the desired output size of the image and the + padded size. + + Here are the preprocessing steps. + 1. For a given image, keep its aspect ratio and rescale the image to make it + the largest rectangle to be bounded by the rectangle specified by the + `desired_size`. + 2. Pad the rescaled image to the padded_size. + + Args: + image: a `Tensor` of shape [height, width, 3] representing an image. + desired_size: a `Tensor` or `int` list/tuple of two elements representing + [height, width] of the desired actual output image size. + padded_size: a `Tensor` or `int` list/tuple of two elements representing + [height, width] of the padded output image size. Padding will be applied + after scaling the image to the desired_size.
+ aug_scale_min: a `float` with range between [0, 1.0] representing minimum + random scale applied to desired_size for training scale jittering. + aug_scale_max: a `float` with range between [1.0, inf] representing maximum + random scale applied to desired_size for training scale jittering. + seed: seed for random scale jittering. + method: the `tf.image.ResizeMethod` used to resize the input image. + + Returns: + output_image: `Tensor` of shape [height, width, 3] where [height, width] + equals `padded_size`. + image_info: a 2D `Tensor` that encodes the information of the image and the + applied preprocessing. It is in the format of + [[original_height, original_width], [desired_height, desired_width], + [y_scale, x_scale], [y_offset, x_offset]], where [desired_height, + desired_width] is the actual scaled image size, and [y_scale, x_scale] is + the scaling factor, which is the ratio of + scaled dimension / original dimension. + """ + with tf.name_scope('resize_and_crop_image'): + image_size = tf.cast(tf.shape(image)[0:2], tf.float32) + + random_jittering = (aug_scale_min != 1.0 or aug_scale_max != 1.0) + + if random_jittering: + random_scale = tf.random.uniform( + [], aug_scale_min, aug_scale_max, seed=seed) + scaled_size = tf.round(random_scale * desired_size) + else: + scaled_size = desired_size + + scale = tf.minimum( + scaled_size[0] / image_size[0], scaled_size[1] / image_size[1]) + scaled_size = tf.round(image_size * scale) + + # Computes 2D image_scale. + image_scale = scaled_size / image_size + + # Selects non-zero random offset (y, x) if scaled image is larger than + # desired_size.
+ if random_jittering: + max_offset = scaled_size - desired_size + max_offset = tf.where( + tf.less(max_offset, 0), tf.zeros_like(max_offset), max_offset) + offset = max_offset * tf.random.uniform([2,], 0, 1, seed=seed) + offset = tf.cast(offset, tf.int32) + else: + offset = tf.zeros((2,), tf.int32) + + scaled_image = tf.image.resize( + image, tf.cast(scaled_size, tf.int32), method=method) + + if random_jittering: + scaled_image = scaled_image[ + offset[0]:offset[0] + desired_size[0], + offset[1]:offset[1] + desired_size[1], :] + + output_image = tf.image.pad_to_bounding_box( + scaled_image, 0, 0, padded_size[0], padded_size[1]) + + image_info = tf.stack([ + image_size, + tf.constant(desired_size, dtype=tf.float32), + image_scale, + tf.cast(offset, tf.float32)]) + return output_image, image_info + + +def resize_and_crop_image_v2(image, + short_side, + long_side, + padded_size, + aug_scale_min=1.0, + aug_scale_max=1.0, + seed=1, + method=tf.image.ResizeMethod.BILINEAR): + """Resizes the input image to output size (Faster R-CNN style). + + Resize and pad images given the specified short / long side length and the + padded size. + + Here are the preprocessing steps. + 1. For a given image, keep its aspect ratio and first try to rescale the short + side of the original image to `short_side`. + 2. If the scaled image after step 1 has a long side that exceeds `long_side`, + keep the aspect ratio and rescale the long side of the image to `long_side`. + 3. Pad the rescaled image to the padded_size. + + Args: + image: a `Tensor` of shape [height, width, 3] representing an image. + short_side: a scalar `Tensor` or `int` representing the desired short side + to be rescaled to. + long_side: a scalar `Tensor` or `int` representing the desired long side to + be rescaled to. + padded_size: a `Tensor` or `int` list/tuple of two elements representing + [height, width] of the padded output image size. Padding will be applied + after scaling the image to the desired_size.
+ aug_scale_min: a `float` with range between [0, 1.0] representing minimum + random scale applied to desired_size for training scale jittering. + aug_scale_max: a `float` with range between [1.0, inf] representing maximum + random scale applied to desired_size for training scale jittering. + seed: seed for random scale jittering. + method: the `tf.image.ResizeMethod` used to resize the input image. + + Returns: + output_image: `Tensor` of shape [height, width, 3] where [height, width] + equals `padded_size`. + image_info: a 2D `Tensor` that encodes the information of the image and the + applied preprocessing. It is in the format of + [[original_height, original_width], [desired_height, desired_width], + [y_scale, x_scale], [y_offset, x_offset]], where [desired_height, + desired_width] is the actual scaled image size, and [y_scale, x_scale] is + the scaling factor, which is the ratio of + scaled dimension / original dimension. + """ + with tf.name_scope('resize_and_crop_image_v2'): + image_size = tf.cast(tf.shape(image)[0:2], tf.float32) + + scale_using_short_side = ( + short_side / tf.math.minimum(image_size[0], image_size[1])) + scale_using_long_side = ( + long_side / tf.math.maximum(image_size[0], image_size[1])) + + scaled_size = tf.math.round(image_size * scale_using_short_side) + scaled_size = tf.where( + tf.math.greater( + tf.math.maximum(scaled_size[0], scaled_size[1]), long_side), + tf.math.round(image_size * scale_using_long_side), + scaled_size) + desired_size = scaled_size + + random_jittering = (aug_scale_min != 1.0 or aug_scale_max != 1.0) + + if random_jittering: + random_scale = tf.random.uniform( + [], aug_scale_min, aug_scale_max, seed=seed) + scaled_size = tf.math.round(random_scale * scaled_size) + + # Computes 2D image_scale. + image_scale = scaled_size / image_size + + # Selects non-zero random offset (y, x) if scaled image is larger than + # desired_size.
+ if random_jittering: + max_offset = scaled_size - desired_size + max_offset = tf.where( + tf.math.less(max_offset, 0), tf.zeros_like(max_offset), max_offset) + offset = max_offset * tf.random.uniform([2,], 0, 1, seed=seed) + offset = tf.cast(offset, tf.int32) + else: + offset = tf.zeros((2,), tf.int32) + + scaled_image = tf.image.resize( + image, tf.cast(scaled_size, tf.int32), method=method) + + if random_jittering: + scaled_image = scaled_image[ + offset[0]:offset[0] + desired_size[0], + offset[1]:offset[1] + desired_size[1], :] + + output_image = tf.image.pad_to_bounding_box( + scaled_image, 0, 0, padded_size[0], padded_size[1]) + + image_info = tf.stack([ + image_size, + tf.cast(desired_size, dtype=tf.float32), + image_scale, + tf.cast(offset, tf.float32)]) + return output_image, image_info + + +def resize_image( + image: tf.Tensor, + size: Union[Tuple[int, int], int], + max_size: Optional[int] = None, + method: tf.image.ResizeMethod = tf.image.ResizeMethod.BILINEAR): + """Resize image with size and max_size. + + Args: + image: the image to be resized. + size: if a list or tuple, resize to it; note that `get_size` reverses it, + so it is read as [width, height]. If a scalar, keep the aspect ratio and + resize the short side to this value. + max_size: only used when size is a scalar. If resizing the short side to + `size` would make the long side exceed max_size, the short side is + reduced instead so that the long side does not exceed max_size, keeping + the aspect ratio. + method: the method argument passed to tf.image.resize. + + Returns: + the resized image and image_info to be used for downstream processing. + image_info: a 2D `Tensor` that encodes the information of the image and the + applied preprocessing. It is in the format of + [[original_height, original_width], [resized_height, resized_width], + [y_scale, x_scale], [0, 0]], where [resized_height, resized_width] + is the actual scaled image size, and [y_scale, x_scale] is the + scaling factor, which is the ratio of + scaled dimension / original dimension.
+ """ + + def get_size_with_aspect_ratio(image_size, size, max_size=None): + h = image_size[0] + w = image_size[1] + if max_size is not None: + min_original_size = tf.cast(tf.math.minimum(w, h), dtype=tf.float32) + max_original_size = tf.cast(tf.math.maximum(w, h), dtype=tf.float32) + if max_original_size / min_original_size * size > max_size: + size = tf.cast( + tf.math.floor(max_size * min_original_size / max_original_size), + dtype=tf.int32) + else: + size = tf.cast(size, tf.int32) + + else: + size = tf.cast(size, tf.int32) + if (w <= h and w == size) or (h <= w and h == size): + return tf.stack([h, w]) + + if w < h: + ow = size + oh = tf.cast( + (tf.cast(size, dtype=tf.float32) * tf.cast(h, dtype=tf.float32) / + tf.cast(w, dtype=tf.float32)), + dtype=tf.int32) + else: + oh = size + ow = tf.cast( + (tf.cast(size, dtype=tf.float32) * tf.cast(w, dtype=tf.float32) / + tf.cast(h, dtype=tf.float32)), + dtype=tf.int32) + + return tf.stack([oh, ow]) + + def get_size(image_size, size, max_size=None): + if isinstance(size, (list, tuple)): + return size[::-1] + else: + return get_size_with_aspect_ratio(image_size, size, max_size) + + orignal_size = tf.shape(image)[0:2] + size = get_size(orignal_size, size, max_size) + rescaled_image = tf.image.resize( + image, tf.cast(size, tf.int32), method=method) + image_scale = size / orignal_size + image_info = tf.stack([ + tf.cast(orignal_size, dtype=tf.float32), + tf.cast(size, dtype=tf.float32), + tf.cast(image_scale, tf.float32), + tf.constant([0.0, 0.0], dtype=tf.float32) + ]) + return rescaled_image, image_info + + +def center_crop_image(image): + """Center crop a square shape slice from the input image. + + It crops a square shape slice from the image. The side of the actual crop + is 224 / 256 = 0.875 of the short side of the original image. 
References: + [1] Very Deep Convolutional Networks for Large-Scale Image Recognition + https://arxiv.org/abs/1409.1556 + [2] Deep Residual Learning for Image Recognition + https://arxiv.org/abs/1512.03385 + + Args: + image: a Tensor of shape [height, width, 3] representing the input image. + + Returns: + cropped_image: a Tensor representing the center cropped image. + """ + with tf.name_scope('center_crop_image'): + image_size = tf.cast(tf.shape(image)[:2], dtype=tf.float32) + crop_size = ( + CENTER_CROP_FRACTION * tf.math.minimum(image_size[0], image_size[1])) + crop_offset = tf.cast((image_size - crop_size) / 2.0, dtype=tf.int32) + crop_size = tf.cast(crop_size, dtype=tf.int32) + cropped_image = image[ + crop_offset[0]:crop_offset[0] + crop_size, + crop_offset[1]:crop_offset[1] + crop_size, :] + return cropped_image + + +def center_crop_image_v2(image_bytes, image_shape): + """Center crop a square shape slice from the input image. + + It crops a square shape slice from the image. The side of the actual crop + is 224 / 256 = 0.875 of the short side of the original image. References: + [1] Very Deep Convolutional Networks for Large-Scale Image Recognition + https://arxiv.org/abs/1409.1556 + [2] Deep Residual Learning for Image Recognition + https://arxiv.org/abs/1512.03385 + + This is a faster version of `center_crop_image` which takes the original + image bytes and image size as the inputs, and partially decodes the JPEG + bytes according to the center crop. + + Args: + image_bytes: a Tensor of type string representing the raw image bytes. + image_shape: a Tensor specifying the shape of the raw image. + + Returns: + cropped_image: a Tensor representing the center cropped image.
+ """ + with tf.name_scope('center_image_crop_v2'): + image_shape = tf.cast(image_shape, tf.float32) + crop_size = ( + CENTER_CROP_FRACTION * tf.math.minimum(image_shape[0], image_shape[1])) + crop_offset = tf.cast((image_shape - crop_size) / 2.0, dtype=tf.int32) + crop_size = tf.cast(crop_size, dtype=tf.int32) + crop_window = tf.stack( + [crop_offset[0], crop_offset[1], crop_size, crop_size]) + cropped_image = tf.image.decode_and_crop_jpeg( + image_bytes, crop_window, channels=3) + return cropped_image + + +def random_crop_image(image, + aspect_ratio_range=(3. / 4., 4. / 3.), + area_range=(0.08, 1.0), + max_attempts=10, + seed=1): + """Randomly crop an arbitrary shaped slice from the input image. + + Args: + image: a Tensor of shape [height, width, 3] representing the input image. + aspect_ratio_range: a list of floats. The cropped area of the image must + have an aspect ratio = width / height within this range. + area_range: a list of floats. The cropped area of the image must contain + a fraction of the input image within this range. + max_attempts: the number of attempts at generating a cropped region of the + image of the specified constraints. After max_attempts failures, return + the entire image. + seed: the seed of the random generator. + + Returns: + cropped_image: a Tensor representing the random cropped image. Can be the + original image if max_attempts is exhausted. + """ + with tf.name_scope('random_crop_image'): + crop_offset, crop_size, _ = tf.image.sample_distorted_bounding_box( + tf.shape(image), + tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4]), + seed=seed, + min_object_covered=area_range[0], + aspect_ratio_range=aspect_ratio_range, + area_range=area_range, + max_attempts=max_attempts) + cropped_image = tf.slice(image, crop_offset, crop_size) + return cropped_image + + +def random_crop_image_v2(image_bytes, + image_shape, + aspect_ratio_range=(3. / 4., 4.
/ 3.), + area_range=(0.08, 1.0), + max_attempts=10, + seed=1): + """Randomly crop an arbitrary shaped slice from the input image. + + This is a faster version of `random_crop_image` which takes the original + image bytes and image size as the inputs, and partially decodes the JPEG + bytes according to the generated crop. + + Args: + image_bytes: a Tensor of type string representing the raw image bytes. + image_shape: a Tensor specifying the shape of the raw image. + aspect_ratio_range: a list of floats. The cropped area of the image must + have an aspect ratio = width / height within this range. + area_range: a list of floats. The cropped area of the image must contain + a fraction of the input image within this range. + max_attempts: the number of attempts at generating a cropped region of the + image satisfying the specified constraints. After max_attempts failures, + return the entire image. + seed: the seed of the random generator. + + Returns: + cropped_image: a Tensor representing the randomly cropped image. Can be the + original image if max_attempts is exhausted. + """ + with tf.name_scope('random_crop_image_v2'): + crop_offset, crop_size, _ = tf.image.sample_distorted_bounding_box( + image_shape, + tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4]), + seed=seed, + min_object_covered=area_range[0], + aspect_ratio_range=aspect_ratio_range, + area_range=area_range, + max_attempts=max_attempts) + offset_y, offset_x, _ = tf.unstack(crop_offset) + crop_height, crop_width, _ = tf.unstack(crop_size) + crop_window = tf.stack([offset_y, offset_x, crop_height, crop_width]) + cropped_image = tf.image.decode_and_crop_jpeg( + image_bytes, crop_window, channels=3) + return cropped_image + + +def resize_and_crop_boxes(boxes, + image_scale, + output_size, + offset): + """Resizes boxes to output size with scale and offset. + + Args: + boxes: `Tensor` of shape [N, 4] representing ground truth boxes.
+ image_scale: 2D float `Tensor` representing scale factors that apply to + [height, width] of input image. + output_size: 2D `Tensor` or `int` representing [height, width] of target + output image size. + offset: 2D `Tensor` representing top-left corner [y0, x0] to crop scaled + boxes. + + Returns: + boxes: `Tensor` of shape [N, 4] representing the scaled boxes. + """ + with tf.name_scope('resize_and_crop_boxes'): + # Adjusts box coordinates based on image_scale and offset. + boxes *= tf.tile(tf.expand_dims(image_scale, axis=0), [1, 2]) + boxes -= tf.tile(tf.expand_dims(offset, axis=0), [1, 2]) + # Clips the boxes. + boxes = box_ops.clip_boxes(boxes, output_size) + return boxes + + +def resize_and_crop_masks(masks, image_scale, output_size, offset): + """Resizes masks to output size with scale and offset. + + Args: + masks: `Tensor` of shape [N, H, W, C] representing ground truth masks. + image_scale: 2D float `Tensor` representing scale factors that apply to + [height, width] of input image. + output_size: 2D `Tensor` or `int` representing [height, width] of target + output image size. + offset: 2D `Tensor` representing top-left corner [y0, x0] to crop scaled + masks. + + Returns: + masks: `Tensor` of shape [N, H, W, C] representing the scaled masks. + """ + with tf.name_scope('resize_and_crop_masks'): + mask_size = tf.cast(tf.shape(masks)[1:3], tf.float32) + num_channels = tf.shape(masks)[3] + # Pad masks to avoid empty mask annotations.
+ masks = tf.concat([ + tf.zeros([1, mask_size[0], mask_size[1], num_channels], + dtype=masks.dtype), masks + ], + axis=0) + + scaled_size = tf.cast(image_scale * mask_size, tf.int32) + scaled_masks = tf.image.resize( + masks, scaled_size, method=tf.image.ResizeMethod.NEAREST_NEIGHBOR) + offset = tf.cast(offset, tf.int32) + scaled_masks = scaled_masks[ + :, + offset[0]:offset[0] + output_size[0], + offset[1]:offset[1] + output_size[1], + :] + + output_masks = tf.image.pad_to_bounding_box( + scaled_masks, 0, 0, output_size[0], output_size[1]) + # Remove padding. + output_masks = output_masks[1::] + return output_masks + + +def horizontal_flip_image(image): + """Flips image horizontally.""" + return tf.image.flip_left_right(image) + + +def horizontal_flip_masks(masks): + """Flips masks horizontally.""" + return masks[:, :, ::-1] + + +def random_horizontal_flip(image, normalized_boxes=None, masks=None, seed=1): + """Randomly flips the input image and, if given, its boxes and masks.""" + with tf.name_scope('random_horizontal_flip'): + do_flip = tf.greater(tf.random.uniform([], seed=seed), 0.5) + + image = tf.cond( + do_flip, + lambda: horizontal_flip_image(image), + lambda: image) + + if normalized_boxes is not None: + normalized_boxes = tf.cond( + do_flip, + lambda: horizontal_flip_boxes(normalized_boxes), + lambda: normalized_boxes) + + if masks is not None: + masks = tf.cond( + do_flip, + lambda: horizontal_flip_masks(masks), + lambda: masks) + + return image, normalized_boxes, masks + + +def random_horizontal_flip_with_roi( + image: tf.Tensor, + boxes: Optional[tf.Tensor] = None, + masks: Optional[tf.Tensor] = None, + roi_boxes: Optional[tf.Tensor] = None, + seed: int = 1 +) -> Tuple[tf.Tensor, Optional[tf.Tensor], Optional[tf.Tensor], + Optional[tf.Tensor]]: + """Randomly flips the input image and, if given, its boxes, masks and RoIs. + + Extends preprocess_ops.random_horizontal_flip to also flip roi_boxes used + by ViLD. + + Args: + image: `tf.Tensor`, the image to apply the random flip.
+ boxes: `tf.Tensor` or `None`, boxes corresponding to the image. + masks: `tf.Tensor` or `None`, masks corresponding to the image. + roi_boxes: `tf.Tensor` or `None`, RoIs corresponding to the image. + seed: Seed for TensorFlow's random number generator. + + Returns: + image: `tf.Tensor`, flipped image. + boxes: `tf.Tensor` or `None`, flipped boxes corresponding to the image. + masks: `tf.Tensor` or `None`, flipped masks corresponding to the image. + roi_boxes: `tf.Tensor` or `None`, flipped RoIs corresponding to the image. + """ + with tf.name_scope('random_horizontal_flip'): + do_flip = tf.greater(tf.random.uniform([], seed=seed), 0.5) + + image = tf.cond(do_flip, lambda: horizontal_flip_image(image), + lambda: image) + + if boxes is not None: + boxes = tf.cond(do_flip, lambda: horizontal_flip_boxes(boxes), + lambda: boxes) + + if masks is not None: + masks = tf.cond(do_flip, lambda: horizontal_flip_masks(masks), + lambda: masks) + + if roi_boxes is not None: + roi_boxes = tf.cond(do_flip, lambda: horizontal_flip_boxes(roi_boxes), + lambda: roi_boxes) + + return image, boxes, masks, roi_boxes + + +def color_jitter(image: tf.Tensor, + brightness: Optional[float] = 0., + contrast: Optional[float] = 0., + saturation: Optional[float] = 0., + seed: Optional[int] = None) -> tf.Tensor: + """Applies color jitter to an image, similarly to torchvision's ColorJitter. + + Args: + image (tf.Tensor): Of shape [height, width, 3] and type uint8. + brightness (float, optional): Magnitude for brightness jitter. Defaults to + 0. + contrast (float, optional): Magnitude for contrast jitter. Defaults to 0. + saturation (float, optional): Magnitude for saturation jitter. Defaults to + 0. + seed (int, optional): Random seed. Defaults to None. + + Returns: + tf.Tensor: The augmented `image` of type uint8.
+ """ + image = tf.cast(image, dtype=tf.uint8) + image = random_brightness(image, brightness, seed=seed) + image = random_contrast(image, contrast, seed=seed) + image = random_saturation(image, saturation, seed=seed) + return image + + +def random_brightness(image: tf.Tensor, + brightness: float = 0., + seed: Optional[int] = None) -> tf.Tensor: + """Jitters brightness of an image. + + Args: + image (tf.Tensor): Of shape [height, width, 3] and type uint8. + brightness (float, optional): Magnitude for brightness jitter. Defaults to + 0. + seed (int, optional): Random seed. Defaults to None. + + Returns: + tf.Tensor: The augmented `image` of type uint8. + """ + assert brightness >= 0, '`brightness` must be non-negative' + brightness = tf.random.uniform([], + max(0, 1 - brightness), + 1 + brightness, + seed=seed, + dtype=tf.float32) + return augment.brightness(image, brightness) + + +def random_contrast(image: tf.Tensor, + contrast: float = 0., + seed: Optional[int] = None) -> tf.Tensor: + """Jitters contrast of an image, similarly to torchvision's ColorJitter. + + Args: + image (tf.Tensor): Of shape [height, width, 3] and type uint8. + contrast (float, optional): Magnitude for contrast jitter. Defaults to 0. + seed (int, optional): Random seed. Defaults to None. + + Returns: + tf.Tensor: The augmented `image` of type uint8. + """ + assert contrast >= 0, '`contrast` must be non-negative' + contrast = tf.random.uniform([], + max(0, 1 - contrast), + 1 + contrast, + seed=seed, + dtype=tf.float32) + return augment.contrast(image, contrast) + + +def random_saturation(image: tf.Tensor, + saturation: float = 0., + seed: Optional[int] = None) -> tf.Tensor: + """Jitters saturation of an image, similarly to torchvision's ColorJitter. + + Args: + image (tf.Tensor): Of shape [height, width, 3] and type uint8. + saturation (float, optional): Magnitude for saturation jitter. Defaults to + 0. + seed (int, optional): Random seed. Defaults to None.
+ + Returns: + tf.Tensor: The augmented `image` of type uint8. + """ + assert saturation >= 0, '`saturation` must be non-negative' + saturation = tf.random.uniform([], + max(0, 1 - saturation), + 1 + saturation, + seed=seed, + dtype=tf.float32) + return _saturation(image, saturation) + + +def _saturation(image: tf.Tensor, + saturation: Optional[float] = 0.) -> tf.Tensor: + return augment.blend( + tf.repeat(tf.image.rgb_to_grayscale(image), 3, axis=-1), image, + saturation) + + +def random_crop_image_with_boxes_and_labels(img, boxes, labels, min_scale, + aspect_ratio_range, + min_overlap_params, max_retry): + """Crops a random slice from the input image. + + The function correspondingly recomputes the bounding boxes and filters out + the boxes outside the crop, along with their labels. + + References: + [1] End-to-End Object Detection with Transformers + https://arxiv.org/abs/2005.12872 + + The preprocessing steps: + 1. Sample a minimum IoU overlap. + 2. For each trial, sample the new image width, height, and top-left corner. + 3. Compute the IoUs of bounding boxes with the cropped image and retry if + the maximum IoU is below the sampled threshold. + 4. Find boxes whose centers are in the cropped image. + 5. Compute new bounding boxes in the cropped region and only select those + boxes' labels. + + Args: + img: a 'Tensor' of shape [height, width, 3] representing the input image. + boxes: a 'Tensor' of shape [N, 4] representing the ground-truth bounding + boxes with (ymin, xmin, ymax, xmax). + labels: a 'Tensor' of shape [N,] representing the class labels of the boxes. + min_scale: a 'float' in [0.0, 1.0) indicating the lower bound of the random + scale variable. + aspect_ratio_range: a list of two 'float' that specifies the lower and upper + bound of the random aspect ratio. + min_overlap_params: a list of four 'float' representing the min value, max + value, step size, and offset for the minimum overlap sample. + max_retry: an 'int' representing the number of trials for cropping.
If it is + exhausted, no cropping will be performed. + + Returns: + img: a Tensor representing the random cropped image. Can be the + original image if max_retry is exhausted. + boxes: a Tensor representing the bounding boxes in the cropped image. + labels: a Tensor representing the new bounding boxes' labels. + """ + + shape = tf.shape(img) + original_h = shape[0] + original_w = shape[1] + + minval, maxval, step, offset = min_overlap_params + + min_overlap = tf.math.floordiv( + tf.random.uniform([], minval=minval, maxval=maxval), step) * step - offset + + min_overlap = tf.clip_by_value(min_overlap, 0.0, 1.1) + + if min_overlap > 1.0: + return img, boxes, labels + + aspect_ratio_low = aspect_ratio_range[0] + aspect_ratio_high = aspect_ratio_range[1] + + for _ in tf.range(max_retry): + scale_h = tf.random.uniform([], min_scale, 1.0) + scale_w = tf.random.uniform([], min_scale, 1.0) + new_h = tf.cast( + scale_h * tf.cast(original_h, dtype=tf.float32), dtype=tf.int32) + new_w = tf.cast( + scale_w * tf.cast(original_w, dtype=tf.float32), dtype=tf.int32) + + # Aspect ratio has to be in the prespecified range + aspect_ratio = new_h / new_w + if aspect_ratio_low > aspect_ratio or aspect_ratio > aspect_ratio_high: + continue + + left = tf.random.uniform([], 0, original_w - new_w, dtype=tf.int32) + right = left + new_w + top = tf.random.uniform([], 0, original_h - new_h, dtype=tf.int32) + bottom = top + new_h + + normalized_left = tf.cast( + left, dtype=tf.float32) / tf.cast( + original_w, dtype=tf.float32) + normalized_right = tf.cast( + right, dtype=tf.float32) / tf.cast( + original_w, dtype=tf.float32) + normalized_top = tf.cast( + top, dtype=tf.float32) / tf.cast( + original_h, dtype=tf.float32) + normalized_bottom = tf.cast( + bottom, dtype=tf.float32) / tf.cast( + original_h, dtype=tf.float32) + + cropped_box = tf.expand_dims( + tf.stack([ + normalized_top, + normalized_left, + normalized_bottom, + normalized_right, + ]), + axis=0) + iou = box_ops.bbox_overlap( + 
tf.expand_dims(cropped_box, axis=0), + tf.expand_dims(boxes, axis=0)) # (1, 1, n_ground_truth) + iou = tf.squeeze(iou, axis=[0, 1]) + + # If not a single bounding box has a Jaccard overlap of greater than + # the minimum, try again + if tf.reduce_max(iou) < min_overlap: + continue + + centroids = box_ops.yxyx_to_cycxhw(boxes) + mask = tf.math.logical_and( + tf.math.logical_and(centroids[:, 0] > normalized_top, + centroids[:, 0] < normalized_bottom), + tf.math.logical_and(centroids[:, 1] > normalized_left, + centroids[:, 1] < normalized_right)) + # If not a single bounding box has its center in the crop, try again. + if tf.reduce_sum(tf.cast(mask, dtype=tf.int32)) > 0: + indices = tf.squeeze(tf.where(mask), axis=1) + + filtered_boxes = tf.gather(boxes, indices) + + boxes = tf.clip_by_value( + (filtered_boxes[..., :] * tf.cast( + tf.stack([original_h, original_w, original_h, original_w]), + dtype=tf.float32) - + tf.cast(tf.stack([top, left, top, left]), dtype=tf.float32)) / + tf.cast(tf.stack([new_h, new_w, new_h, new_w]), dtype=tf.float32), + 0.0, 1.0) + + img = tf.image.crop_to_bounding_box(img, top, left, bottom - top, + right - left) + + labels = tf.gather(labels, indices) + break + + return img, boxes, labels + + +def random_crop(image, + boxes, + labels, + min_scale=0.3, + aspect_ratio_range=(0.5, 2.0), + min_overlap_params=(0.0, 1.4, 0.2, 0.1), + max_retry=50, + seed=None): + """Randomly crop the image and boxes, filtering labels. + + Args: + image: a 'Tensor' of shape [height, width, 3] representing the input image. + boxes: a 'Tensor' of shape [N, 4] representing the ground-truth bounding + boxes with (ymin, xmin, ymax, xmax). + labels: a 'Tensor' of shape [N,] representing the class labels of the boxes. + min_scale: a 'float' in [0.0, 1.0) indicating the lower bound of the random + scale variable. + aspect_ratio_range: a list of two 'float' that specifies the lower and upper + bound of the random aspect ratio. 
+ min_overlap_params: a list of four 'float' representing the min value, max + value, step size, and offset for the minimum overlap sample. + max_retry: an 'int' representing the number of trials for cropping. If it is + exhausted, no cropping will be performed. + seed: an optional 'int' random seed; may be None. + + Returns: + image: a Tensor representing the randomly cropped image. Can be the + original image if max_retry is exhausted. + boxes: a Tensor representing the bounding boxes in the cropped image. + labels: a Tensor representing the new bounding boxes' labels. + """ + with tf.name_scope('random_crop'): + do_crop = tf.greater(tf.random.uniform([], seed=seed), 0.5) + if do_crop: + return random_crop_image_with_boxes_and_labels(image, boxes, labels, + min_scale, + aspect_ratio_range, + min_overlap_params, + max_retry) + else: + return image, boxes, labels diff --git a/official/vision/ops/preprocess_ops_3d.py b/official/vision/ops/preprocess_ops_3d.py new file mode 100644 index 0000000000000000000000000000000000000000..45501a3764e21d51dfa96c60b841f28c3c8281db --- /dev/null +++ b/official/vision/ops/preprocess_ops_3d.py @@ -0,0 +1,401 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+ +"""Utils for processing video dataset features.""" + +from typing import Optional, Tuple +import tensorflow as tf + + +def _sample_or_pad_sequence_indices(sequence: tf.Tensor, num_steps: int, + stride: int, + offset: tf.Tensor) -> tf.Tensor: + """Returns indices to take for sampling or padding sequences to fixed size.""" + sequence_length = tf.shape(sequence)[0] + sel_idx = tf.range(sequence_length) + + # Repeats sequence until num_steps are available in total. + max_length = num_steps * stride + offset + num_repeats = tf.math.floordiv(max_length + sequence_length - 1, + sequence_length) + sel_idx = tf.tile(sel_idx, [num_repeats]) + + steps = tf.range(offset, offset + num_steps * stride, stride) + return tf.gather(sel_idx, steps) + + +def sample_linspace_sequence(sequence: tf.Tensor, num_windows: int, + num_steps: int, stride: int) -> tf.Tensor: + """Samples `num_windows` segments from sequence with linearly spaced offsets. + + The samples are concatenated in a single `tf.Tensor` in order to have the same + format structure per timestep (e.g. a single frame). If `num_steps` * `stride` + is bigger than the number of timesteps, the sequence is repeated. This + function can be used in evaluation in order to extract enough segments to span + the entire sequence. + + Args: + sequence: Any tensor where the first dimension is timesteps. + num_windows: Number of windows retrieved from the sequence. + num_steps: Number of steps (e.g. frames) to take. + stride: Distance to sample between timesteps. + + Returns: + A single `tf.Tensor` with first dimension `num_windows` * `num_steps`. The + tensor contains the concatenated list of `num_windows` tensors whose offsets + are linearly spaced over the input.
+ """ + sequence_length = tf.shape(sequence)[0] + max_offset = tf.maximum(0, sequence_length - num_steps * stride) + offsets = tf.linspace(0.0, tf.cast(max_offset, tf.float32), num_windows) + offsets = tf.cast(offsets, tf.int32) + + all_indices = [] + for i in range(num_windows): + all_indices.append( + _sample_or_pad_sequence_indices( + sequence=sequence, + num_steps=num_steps, + stride=stride, + offset=offsets[i])) + + indices = tf.concat(all_indices, axis=0) + indices.set_shape((num_windows * num_steps,)) + return tf.gather(sequence, indices) + + +def sample_sequence(sequence: tf.Tensor, + num_steps: int, + random: bool, + stride: int, + seed: Optional[int] = None) -> tf.Tensor: + """Samples a single segment of size `num_steps` from a given sequence. + + If `random` is not `True`, this function will simply sample the central window + of the sequence. Otherwise, a random offset will be chosen so that the + desired `num_steps` can be extracted from the sequence. + + Args: + sequence: Any tensor where the first dimension is timesteps. + num_steps: Number of steps (e.g. frames) to take. + random: A boolean indicating whether to randomly sample the single window. If + `True`, the offset is randomized. If `False`, the middle frame minus half + of `num_steps` is the first frame. + stride: Distance to sample between timesteps. + seed: A deterministic seed to use when sampling. + + Returns: + A single `tf.Tensor` with first dimension `num_steps` with the sampled + segment.
+ """ + sequence_length = tf.shape(sequence)[0] + + if random: + sequence_length = tf.cast(sequence_length, tf.float32) + frame_stride = tf.cast(stride, tf.float32) + max_offset = tf.cond( + sequence_length > (num_steps - 1) * frame_stride, + lambda: sequence_length - (num_steps - 1) * frame_stride, + lambda: sequence_length) + offset = tf.random.uniform((), + maxval=tf.cast(max_offset, dtype=tf.int32), + dtype=tf.int32, + seed=seed) + else: + offset = (sequence_length - num_steps * stride) // 2 + offset = tf.maximum(0, offset) + + indices = _sample_or_pad_sequence_indices( + sequence=sequence, num_steps=num_steps, stride=stride, offset=offset) + indices.set_shape((num_steps,)) + + return tf.gather(sequence, indices) + + +def sample_segment_sequence(sequence: tf.Tensor, + num_frames: int, + is_training: bool, + seed: Optional[int] = None) -> tf.Tensor: + """Samples a single segment of size `num_frames` from a given sequence. + + This function follows the temporal segment network sampling style + (https://arxiv.org/abs/1608.00859). The video sequence is divided into + `num_frames` non-overlapping segments of equal length. If `is_training` is + `True`, one frame is randomly sampled from each segment; when + `is_training` is `False`, only the center frame of each segment is sampled. + + Args: + sequence: Any tensor where the first dimension is timesteps. + num_frames: Number of frames to take. + is_training: A boolean indicating sampling in training or evaluation mode. + seed: A deterministic seed to use when sampling. + + Returns: + A single `tf.Tensor` with first dimension `num_frames` with the sampled + segment.
+ """ + sequence_length = tf.shape(sequence)[0] + + sequence_length = tf.cast(sequence_length, tf.float32) + segment_length = tf.cast(sequence_length // num_frames, tf.float32) + segment_indices = tf.linspace(0.0, sequence_length, num_frames + 1) + segment_indices = tf.cast(segment_indices, tf.int32) + + if is_training: + segment_length = tf.cast(segment_length, tf.int32) + # pylint:disable=g-long-lambda + segment_offsets = tf.cond( + segment_length == 0, + lambda: tf.zeros(shape=(num_frames,), dtype=tf.int32), + lambda: tf.random.uniform( + shape=(num_frames,), + minval=0, + maxval=segment_length, + dtype=tf.int32, + seed=seed)) + # pylint:enable=g-long-lambda + + else: + # Only sample the central frame during inference so results are deterministic. + segment_offsets = tf.ones( + shape=(num_frames,), dtype=tf.int32) * tf.cast( + segment_length // 2, dtype=tf.int32) + + indices = segment_indices[:-1] + segment_offsets + indices.set_shape((num_frames,)) + + return tf.gather(sequence, indices) + + +def decode_jpeg(image_string: tf.Tensor, channels: int = 0) -> tf.Tensor: + """Decodes raw JPEG byte strings into an RGB uint8 Tensor. + + Args: + image_string: A `tf.Tensor` of type string with the raw JPEG bytes where + the first dimension is timesteps. + channels: Number of channels of the JPEG image. Allowed values are 0, 1 and + 3. If 0, the number of channels will be calculated at runtime and no + static shape is set. + + Returns: + A Tensor of shape [T, H, W, C] of type uint8 with the decoded images. + """ + return tf.map_fn( + lambda x: tf.image.decode_jpeg(x, channels=channels), + image_string, + back_prop=False, + dtype=tf.uint8) + + +def crop_image(frames: tf.Tensor, + target_height: int, + target_width: int, + random: bool = False, + num_crops: int = 1, + seed: Optional[int] = None) -> tf.Tensor: + """Crops a sequence of images. + + If the requested size is larger than the image size, the image is padded with 0.
If not + random cropping, a central crop is performed if num_crops is 1. + + Args: + frames: A Tensor of dimension [timesteps, in_height, in_width, channels]. + target_height: Target cropped image height. + target_width: Target cropped image width. + random: A boolean indicating if crop should be randomized. + num_crops: Number of crops (1 for a central crop, 3 for 3-crop evaluation). + seed: A deterministic seed to use when random cropping. + + Returns: + A Tensor of shape [timesteps, out_height, out_width, channels] of type uint8 + with the cropped images. + """ + if random: + # Random spatial crop. + shape = tf.shape(frames) + # If a static_shape is available (e.g. when using this method from add_image + # method), it will be used to have an output tensor with static shape. + static_shape = frames.shape.as_list() + seq_len = shape[0] if static_shape[0] is None else static_shape[0] + channels = shape[3] if static_shape[3] is None else static_shape[3] + frames = tf.image.random_crop( + frames, (seq_len, target_height, target_width, channels), seed) + else: + if num_crops == 1: + # Central crop or pad. + frames = tf.image.resize_with_crop_or_pad(frames, target_height, + target_width) + + elif num_crops == 3: + # Three-crop evaluation.
+ shape = tf.shape(frames) + static_shape = frames.shape.as_list() + seq_len = shape[0] if static_shape[0] is None else static_shape[0] + height = shape[1] if static_shape[1] is None else static_shape[1] + width = shape[2] if static_shape[2] is None else static_shape[2] + channels = shape[3] if static_shape[3] is None else static_shape[3] + + size = tf.convert_to_tensor( + (seq_len, target_height, target_width, channels)) + + offset_1 = tf.broadcast_to([0, 0, 0, 0], [4]) + # pylint:disable=g-long-lambda + offset_2 = tf.cond( + tf.greater_equal(height, width), + true_fn=lambda: tf.broadcast_to([ + 0, tf.cast(height, tf.float32) / 2 - target_height // 2, 0, 0 + ], [4]), + false_fn=lambda: tf.broadcast_to([ + 0, 0, tf.cast(width, tf.float32) / 2 - target_width // 2, 0 + ], [4])) + offset_3 = tf.cond( + tf.greater_equal(height, width), + true_fn=lambda: tf.broadcast_to( + [0, tf.cast(height, tf.float32) - target_height, 0, 0], [4]), + false_fn=lambda: tf.broadcast_to( + [0, 0, tf.cast(width, tf.float32) - target_width, 0], [4])) + # pylint:enable=g-long-lambda + + crops = [] + for offset in [offset_1, offset_2, offset_3]: + offset = tf.cast(tf.math.round(offset), tf.int32) + crops.append(tf.slice(frames, offset, size)) + frames = tf.concat(crops, axis=0) + + else: + raise NotImplementedError( + f"Only 1-crop and 3-crop are supported. Found {num_crops!r}.") + + return frames + + +def resize_smallest(frames: tf.Tensor, min_resize: int) -> tf.Tensor: + """Resizes frames so that min(`height`, `width`) is equal to `min_resize`. + + This function is a no-op if min(`height`, `width`) is already + equal to `min_resize`, which saves compute time. + + Args: + frames: A Tensor of dimension [timesteps, input_h, input_w, channels]. + min_resize: Minimum size of the final image dimensions. + + Returns: + A Tensor of shape [timesteps, output_h, output_w, channels] of type + frames.dtype where min(output_h, output_w) = min_resize.
+ """ + shape = tf.shape(frames) + input_h = shape[1] + input_w = shape[2] + + output_h = tf.maximum(min_resize, (input_h * min_resize) // input_w) + output_w = tf.maximum(min_resize, (input_w * min_resize) // input_h) + + def resize_fn(): + frames_resized = tf.image.resize(frames, (output_h, output_w)) + return tf.cast(frames_resized, frames.dtype) + + should_resize = tf.math.logical_or( + tf.not_equal(input_w, output_w), tf.not_equal(input_h, output_h)) + frames = tf.cond(should_resize, resize_fn, lambda: frames) + + return frames + + +def random_crop_resize(frames: tf.Tensor, output_h: int, output_w: int, + num_frames: int, num_channels: int, + aspect_ratio: Tuple[float, float], + area_range: Tuple[float, float]) -> tf.Tensor: + """First crops clip with jittering and then resizes to (output_h, output_w). + + Args: + frames: A Tensor of dimension [timesteps, input_h, input_w, channels]. + output_h: Resized image height. + output_w: Resized image width. + num_frames: Number of input frames per clip. + num_channels: Number of channels of the clip. + aspect_ratio: Float tuple with the aspect range for cropping. + area_range: Float tuple with the area range for cropping. + + Returns: + A Tensor of shape [timesteps, output_h, output_w, channels] of type + frames.dtype. 
+ """ + shape = tf.shape(frames) + seq_len, _, _, channels = shape[0], shape[1], shape[2], shape[3] + bbox = tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4]) + factor = output_w / output_h + aspect_ratio = (aspect_ratio[0] * factor, aspect_ratio[1] * factor) + sample_distorted_bbox = tf.image.sample_distorted_bounding_box( + shape[1:], + bounding_boxes=bbox, + min_object_covered=0.1, + aspect_ratio_range=aspect_ratio, + area_range=area_range, + max_attempts=100, + use_image_if_no_bounding_boxes=True) + bbox_begin, bbox_size, _ = sample_distorted_bbox + offset_y, offset_x, _ = tf.unstack(bbox_begin) + target_height, target_width, _ = tf.unstack(bbox_size) + size = tf.convert_to_tensor((seq_len, target_height, target_width, channels)) + offset = tf.convert_to_tensor((0, offset_y, offset_x, 0)) + frames = tf.slice(frames, offset, size) + frames = tf.cast(tf.image.resize(frames, (output_h, output_w)), frames.dtype) + frames.set_shape((num_frames, output_h, output_w, num_channels)) + return frames + + +def random_flip_left_right(frames: tf.Tensor, + seed: Optional[int] = None) -> tf.Tensor: + """Flips all the frames with a probability of 50%. + + Args: + frames: A Tensor of shape [timesteps, input_h, input_w, channels]. + seed: A seed to use for the random sampling. + + Returns: + A Tensor of shape [timesteps, output_h, output_w, channels], possibly + flipped left-right. + """ + is_flipped = tf.random.uniform((), + minval=0, + maxval=2, + dtype=tf.int32, + seed=seed) + + frames = tf.cond( + tf.equal(is_flipped, 1), + true_fn=lambda: tf.image.flip_left_right(frames), + false_fn=lambda: frames) + return frames + + +def normalize_image(frames: tf.Tensor, + zero_centering_image: bool, + dtype: tf.dtypes.DType = tf.float32) -> tf.Tensor: + """Normalizes images. + + Args: + frames: A Tensor of numbers. + zero_centering_image: If True, results are in [-1, 1], if False, results are + in [0, 1]. + dtype: Type of output Tensor.
+ + Returns: + A Tensor of same shape as the input and of the given type. + """ + frames = tf.cast(frames, dtype) + if zero_centering_image: + return frames * (2.0 / 255.0) - 1.0 + else: + return frames / 255.0 diff --git a/official/vision/beta/ops/preprocess_ops_3d_test.py b/official/vision/ops/preprocess_ops_3d_test.py similarity index 92% rename from official/vision/beta/ops/preprocess_ops_3d_test.py rename to official/vision/ops/preprocess_ops_3d_test.py index b2db9334b7d19bb6f6194219757b2f22e9c71fe0..5b15616be9a2e240dcc221174d6e861201b268c9 100644 --- a/official/vision/beta/ops/preprocess_ops_3d_test.py +++ b/official/vision/ops/preprocess_ops_3d_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 import io import itertools @@ -20,7 +19,7 @@ import numpy as np from PIL import Image import tensorflow as tf -from official.vision.beta.ops import preprocess_ops_3d +from official.vision.ops import preprocess_ops_3d class ParserUtilsTest(tf.test.TestCase): @@ -73,6 +72,16 @@ class ParserUtilsTest(tf.test.TestCase): self.assertBetween(offset_3, 0, 99) self.assertAllEqual(sampled_seq_3, range(offset_3, offset_3 + 10)) + def test_sample_segment_sequence(self): + sequence = tf.range(100) + sampled_seq_1 = preprocess_ops_3d.sample_segment_sequence( + sequence, 10, False) + sampled_seq_2 = preprocess_ops_3d.sample_segment_sequence( + sequence, 10, True) + self.assertAllEqual(sampled_seq_1, [5 + i * 10 for i in range(10)]) + for idx, v in enumerate(sampled_seq_2): + self.assertBetween(v - idx * 10, 0, 10) + def test_decode_jpeg(self): # Create a random RGB JPEG image. 
random_image = np.random.randint(0, 256, size=(263, 320, 3), dtype=np.uint8) diff --git a/official/vision/beta/ops/preprocess_ops_test.py b/official/vision/ops/preprocess_ops_test.py similarity index 82% rename from official/vision/beta/ops/preprocess_ops_test.py rename to official/vision/ops/preprocess_ops_test.py index db9ecf887fb431f284848fc74bdba443e2e29216..91ae010988607f8dd7cf63302449db42f594a54a 100644 --- a/official/vision/beta/ops/preprocess_ops_test.py +++ b/official/vision/ops/preprocess_ops_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -21,7 +21,7 @@ import numpy as np from PIL import Image import tensorflow as tf -from official.vision.beta.ops import preprocess_ops +from official.vision.ops import preprocess_ops def _encode_image(image_array, fmt): @@ -225,6 +225,66 @@ class InputUtilsTest(parameterized.TestCase, tf.test.TestCase): np.random.randint(low=0, high=num_boxes, size=(num_boxes,)), tf.int64) _ = preprocess_ops.random_crop(image, boxes, labels) + @parameterized.parameters( + ((640, 640, 3), (1000, 1000), None, (1000, 1000, 3)), + ((1280, 640, 3), 320, None, (640, 320, 3)), + ((640, 1280, 3), 320, None, (320, 640, 3)), + ((640, 640, 3), 320, 100, (100, 100, 3))) + def test_resize_image(self, input_shape, size, max_size, expected_shape): + resized_img, image_info = preprocess_ops.resize_image( + tf.zeros((input_shape)), size, max_size) + self.assertAllEqual(tf.shape(resized_img), expected_shape) + self.assertAllEqual(image_info[0], input_shape[:-1]) + self.assertAllEqual(image_info[1], expected_shape[:-1]) + self.assertAllEqual( + image_info[2], + np.array(expected_shape[:-1]) / np.array(input_shape[:-1])) + self.assertAllEqual(image_info[3], [0, 0]) + + def test_resize_and_crop_masks(self): + # shape: (2, 
1, 4, 3) + masks = tf.constant([[[ + [0, 1, 2], + [3, 4, 5], + [6, 7, 8], + [9, 10, 11], + ]], [[ + [12, 13, 14], + [15, 16, 17], + [18, 19, 20], + [21, 22, 23], + ]]]) + output = preprocess_ops.resize_and_crop_masks( + masks, image_scale=[2.0, 0.5], output_size=[2, 3], offset=[1, 0]) + # shape: (2, 2, 3, 3) + expected_output = tf.constant([ + [ + [ + [3, 4, 5], + [9, 10, 11], + [0, 0, 0], + ], + [ + [0, 0, 0], + [0, 0, 0], + [0, 0, 0], + ], + ], + [ + [ + [15, 16, 17], + [21, 22, 23], + [0, 0, 0], + ], + [ + [0, 0, 0], + [0, 0, 0], + [0, 0, 0], + ], + ], + ]) + self.assertAllEqual(expected_output, output) + if __name__ == '__main__': tf.test.main() diff --git a/official/vision/beta/ops/sampling_ops.py b/official/vision/ops/sampling_ops.py similarity index 99% rename from official/vision/beta/ops/sampling_ops.py rename to official/vision/ops/sampling_ops.py index bd19e3ff727d4febe0a88653015c1f369f95e75d..f86979e13cbaafa274fb87f9eee791caa88b122c 100644 --- a/official/vision/beta/ops/sampling_ops.py +++ b/official/vision/ops/sampling_ops.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/ops/spatial_transform_ops.py b/official/vision/ops/spatial_transform_ops.py new file mode 100644 index 0000000000000000000000000000000000000000..5c5408b7cdb0c97f117c429c1d050998bec95494 --- /dev/null +++ b/official/vision/ops/spatial_transform_ops.py @@ -0,0 +1,852 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Spatial transform ops.""" + +from typing import Dict, Tuple + +import numpy as np +import tensorflow as tf + +from official.vision.ops.box_ops import bbox2mask + +_EPSILON = 1e-8 + + +def _feature_bilinear_interpolation(features: tf.Tensor, kernel_y: tf.Tensor, + kernel_x: tf.Tensor) -> tf.Tensor: + """Feature bilinear interpolation. + + The RoIAlign feature f can be computed by bilinear interpolation + of four neighboring feature points f0, f1, f2, and f3. + + f(y, x) = [hy, ly] * [[f00, f01], * [hx, lx]^T + [f10, f11]] + f(y, x) = (hy*hx)f00 + (hy*lx)f01 + (ly*hx)f10 + (lx*ly)f11 + f(y, x) = w00*f00 + w01*f01 + w10*f10 + w11*f11 + kernel_y = [hy, ly] + kernel_x = [hx, lx] + + Args: + features: The features are in shape of [batch_size, num_boxes, output_size * + 2, output_size * 2, num_filters]. + kernel_y: Tensor of size [batch_size, boxes, output_size, 2, 1]. + kernel_x: Tensor of size [batch_size, boxes, output_size, 2, 1]. + + Returns: + A 5-D tensor representing feature crop of shape + [batch_size, num_boxes, output_size, output_size, num_filters]. + + """ + features_shape = tf.shape(features) + batch_size, num_boxes, output_size, num_filters = ( + features_shape[0], features_shape[1], features_shape[2], + features_shape[4]) + + output_size = output_size // 2 + kernel_y = tf.reshape(kernel_y, [batch_size, num_boxes, output_size * 2, 1]) + kernel_x = tf.reshape(kernel_x, [batch_size, num_boxes, 1, output_size * 2]) + # Use implicit broadcast to generate the interpolation kernel. The + # multiplier `4` is for avg pooling. 
+ interpolation_kernel = kernel_y * kernel_x * 4
+
+ # Interpolate the gathered features with computed interpolation kernels.
+ features *= tf.cast(
+ tf.expand_dims(interpolation_kernel, axis=-1), dtype=features.dtype)
+ features = tf.reshape(
+ features,
+ [batch_size * num_boxes, output_size * 2, output_size * 2, num_filters])
+ features = tf.nn.avg_pool(features, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')
+ features = tf.reshape(
+ features, [batch_size, num_boxes, output_size, output_size, num_filters])
+ return features
+
+
+def _compute_grid_positions(
+ boxes: tf.Tensor, boundaries: tf.Tensor, output_size: int,
+ sample_offset: float) -> Tuple[tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor]:
+ """Computes the grid positions w.r.t. the corresponding feature map.
+
+ Also computes the bilinear interpolation kernels for each grid position.
+
+ Args:
+ boxes: a 3-D tensor of shape [batch_size, num_boxes, 4] encoding the
+ information of each box w.r.t. the corresponding feature map.
+ boxes[:, :, 0:2] are the grid position in (y, x) (float) of the top-left
+ corner of each box. boxes[:, :, 2:4] are the box sizes in (h, w) (float)
+ in terms of the number of pixels of the corresponding feature map size.
+ boundaries: a 3-D tensor of shape [batch_size, num_boxes, 2] representing
+ the boundary (in (y, x)) of the corresponding feature map for each box.
+ Any resampled grid points that go beyond the boundary will be clipped.
+ output_size: a scalar indicating the output crop size.
+ sample_offset: a float number in [0, 1] indicating the subpixel sample
+ offset from the grid point.
+
+ Returns:
+ kernel_y: Tensor of size [batch_size, boxes, output_size, 2, 1].
+ kernel_x: Tensor of size [batch_size, boxes, output_size, 2, 1].
+ box_grid_y0y1: Tensor of size [batch_size, boxes, output_size, 2] + box_grid_x0x1: Tensor of size [batch_size, boxes, output_size, 2] + """ + boxes_shape = tf.shape(boxes) + batch_size, num_boxes = boxes_shape[0], boxes_shape[1] + if batch_size is None: + batch_size = tf.shape(boxes)[0] + box_grid_x = [] + box_grid_y = [] + for i in range(output_size): + box_grid_x.append(boxes[:, :, 1] + + (i + sample_offset) * boxes[:, :, 3] / output_size) + box_grid_y.append(boxes[:, :, 0] + + (i + sample_offset) * boxes[:, :, 2] / output_size) + box_grid_x = tf.stack(box_grid_x, axis=2) + box_grid_y = tf.stack(box_grid_y, axis=2) + + box_grid_y0 = tf.floor(box_grid_y) + box_grid_x0 = tf.floor(box_grid_x) + box_grid_x0 = tf.maximum(tf.cast(0., dtype=box_grid_x0.dtype), box_grid_x0) + box_grid_y0 = tf.maximum(tf.cast(0., dtype=box_grid_y0.dtype), box_grid_y0) + + box_grid_x0 = tf.minimum(box_grid_x0, tf.expand_dims(boundaries[:, :, 1], -1)) + box_grid_x1 = tf.minimum(box_grid_x0 + 1, + tf.expand_dims(boundaries[:, :, 1], -1)) + box_grid_y0 = tf.minimum(box_grid_y0, tf.expand_dims(boundaries[:, :, 0], -1)) + box_grid_y1 = tf.minimum(box_grid_y0 + 1, + tf.expand_dims(boundaries[:, :, 0], -1)) + + box_gridx0x1 = tf.stack([box_grid_x0, box_grid_x1], axis=-1) + box_gridy0y1 = tf.stack([box_grid_y0, box_grid_y1], axis=-1) + + # The RoIAlign feature f can be computed by bilinear interpolation of four + # neighboring feature points f0, f1, f2, and f3. 
+ # f(y, x) = [hy, ly] * [[f00, f01], * [hx, lx]^T
+ # [f10, f11]]
+ # f(y, x) = (hy*hx)f00 + (hy*lx)f01 + (ly*hx)f10 + (lx*ly)f11
+ # f(y, x) = w00*f00 + w01*f01 + w10*f10 + w11*f11
+ ly = box_grid_y - box_grid_y0
+ lx = box_grid_x - box_grid_x0
+ hy = 1.0 - ly
+ hx = 1.0 - lx
+ kernel_y = tf.reshape(
+ tf.stack([hy, ly], axis=3), [batch_size, num_boxes, output_size, 2, 1])
+ kernel_x = tf.reshape(
+ tf.stack([hx, lx], axis=3), [batch_size, num_boxes, output_size, 2, 1])
+ return kernel_y, kernel_x, box_gridy0y1, box_gridx0x1
+
+
+def multilevel_crop_and_resize(features: Dict[str, tf.Tensor],
+ boxes: tf.Tensor,
+ output_size: int = 7,
+ sample_offset: float = 0.5) -> tf.Tensor:
+ """Crop and resize on multilevel feature pyramid.
+
+ Generates the (output_size, output_size) set of pixels for each input box
+ by first assigning the box to the correct feature level, and then cropping
+ and resizing it using the corresponding feature map of that level.
+
+ Args:
+ features: A dictionary with key as pyramid level and value as features. The
+ features are in shape of [batch_size, height_l, width_l, num_filters].
+ boxes: A 3-D Tensor of shape [batch_size, num_boxes, 4]. Each row represents
+ a box with [y1, x1, y2, x2] in un-normalized coordinates.
+ output_size: A scalar to indicate the output crop size.
+ sample_offset: a float number in [0, 1] indicating the subpixel sample
+ offset from the grid point.
+
+ Returns:
+ A 5-D tensor representing feature crop of shape
+ [batch_size, num_boxes, output_size, output_size, num_filters].
+ """ + + with tf.name_scope('multilevel_crop_and_resize'): + levels = list(features.keys()) + min_level = int(min(levels)) + max_level = int(max(levels)) + features_shape = tf.shape(features[str(min_level)]) + batch_size, max_feature_height, max_feature_width, num_filters = ( + features_shape[0], features_shape[1], features_shape[2], + features_shape[3]) + + num_boxes = tf.shape(boxes)[1] + + # Stack feature pyramid into a features_all of shape + # [batch_size, levels, height, width, num_filters]. + features_all = [] + feature_heights = [] + feature_widths = [] + for level in range(min_level, max_level + 1): + shape = features[str(level)].get_shape().as_list() + feature_heights.append(shape[1]) + feature_widths.append(shape[2]) + # Concat tensor of [batch_size, height_l * width_l, num_filters] for each + # levels. + features_all.append( + tf.reshape(features[str(level)], [batch_size, -1, num_filters])) + features_r2 = tf.reshape(tf.concat(features_all, 1), [-1, num_filters]) + + # Calculate height_l * width_l for each level. + level_dim_sizes = [ + feature_widths[i] * feature_heights[i] + for i in range(len(feature_widths)) + ] + # level_dim_offsets is accumulated sum of level_dim_size. + level_dim_offsets = [0] + for i in range(len(feature_widths) - 1): + level_dim_offsets.append(level_dim_offsets[i] + level_dim_sizes[i]) + batch_dim_size = level_dim_offsets[-1] + level_dim_sizes[-1] + level_dim_offsets = tf.constant(level_dim_offsets, tf.int32) + height_dim_sizes = tf.constant(feature_widths, tf.int32) + + # Assigns boxes to the right level. + box_width = boxes[:, :, 3] - boxes[:, :, 1] + box_height = boxes[:, :, 2] - boxes[:, :, 0] + areas_sqrt = tf.sqrt( + tf.cast(box_height, tf.float32) * tf.cast(box_width, tf.float32)) + + levels = tf.cast( + tf.math.floordiv( + tf.math.log(tf.math.divide_no_nan(areas_sqrt, 224.0)), + tf.math.log(2.0)) + 4.0, + dtype=tf.int32) + # Maps levels between [min_level, max_level]. 
+ levels = tf.minimum(max_level, tf.maximum(levels, min_level)) + + # Projects box location and sizes to corresponding feature levels. + scale_to_level = tf.cast( + tf.pow(tf.constant(2.0), tf.cast(levels, tf.float32)), + dtype=boxes.dtype) + boxes /= tf.expand_dims(scale_to_level, axis=2) + box_width /= scale_to_level + box_height /= scale_to_level + boxes = tf.concat([boxes[:, :, 0:2], + tf.expand_dims(box_height, -1), + tf.expand_dims(box_width, -1)], axis=-1) + + # Maps levels to [0, max_level-min_level]. + levels -= min_level + level_strides = tf.pow([[2.0]], tf.cast(levels, tf.float32)) + boundary = tf.cast( + tf.concat([ + tf.expand_dims( + [[tf.cast(max_feature_height, tf.float32)]] / level_strides - 1, + axis=-1), + tf.expand_dims( + [[tf.cast(max_feature_width, tf.float32)]] / level_strides - 1, + axis=-1), + ], + axis=-1), boxes.dtype) + + # Compute grid positions. + kernel_y, kernel_x, box_gridy0y1, box_gridx0x1 = _compute_grid_positions( + boxes, boundary, output_size, sample_offset) + + x_indices = tf.cast( + tf.reshape(box_gridx0x1, [batch_size, num_boxes, output_size * 2]), + dtype=tf.int32) + y_indices = tf.cast( + tf.reshape(box_gridy0y1, [batch_size, num_boxes, output_size * 2]), + dtype=tf.int32) + + batch_size_offset = tf.tile( + tf.reshape( + tf.range(batch_size) * batch_dim_size, [batch_size, 1, 1, 1]), + [1, num_boxes, output_size * 2, output_size * 2]) + # Get level offset for each box. Each box belongs to one level. 
+ levels_offset = tf.tile( + tf.reshape( + tf.gather(level_dim_offsets, levels), + [batch_size, num_boxes, 1, 1]), + [1, 1, output_size * 2, output_size * 2]) + y_indices_offset = tf.tile( + tf.reshape( + y_indices * tf.expand_dims(tf.gather(height_dim_sizes, levels), -1), + [batch_size, num_boxes, output_size * 2, 1]), + [1, 1, 1, output_size * 2]) + x_indices_offset = tf.tile( + tf.reshape(x_indices, [batch_size, num_boxes, 1, output_size * 2]), + [1, 1, output_size * 2, 1]) + indices = tf.reshape( + batch_size_offset + levels_offset + y_indices_offset + x_indices_offset, + [-1]) + + # TODO(wangtao): replace tf.gather with tf.gather_nd and try to get similar + # performance. + features_per_box = tf.reshape( + tf.gather(features_r2, indices), + [batch_size, num_boxes, output_size * 2, output_size * 2, num_filters]) + + # Bilinear interpolation. + features_per_box = _feature_bilinear_interpolation( + features_per_box, kernel_y, kernel_x) + return features_per_box + + +def _selective_crop_and_resize(features: tf.Tensor, + boxes: tf.Tensor, + box_levels: tf.Tensor, + boundaries: tf.Tensor, + output_size: int = 7, + sample_offset: float = 0.5, + use_einsum_gather: bool = False) -> tf.Tensor: + """Crop and resize boxes on a set of feature maps. + + Given multiple features maps indexed by different levels, and a set of boxes + where each box is mapped to a certain level, it selectively crops and resizes + boxes from the corresponding feature maps to generate the box features. + + We follow the ROIAlign technique (see https://arxiv.org/pdf/1703.06870.pdf, + figure 3 for reference). Specifically, for each feature map, we select an + (output_size, output_size) set of pixels corresponding to the box location, + and then use bilinear interpolation to select the feature value for each + pixel. + + For performance, we perform the gather and interpolation on all layers as a + single operation. 
In this op the multi-level features are first stacked and
+ gathered into [2*output_size, 2*output_size] feature points. Then bilinear
+ interpolation is performed on the gathered feature points to generate
+ [output_size, output_size] RoIAlign feature map.
+
+ Here is the step-by-step algorithm:
+ 1. The multi-level features are gathered into a
+ [batch_size, num_boxes, output_size*2, output_size*2, num_filters]
+ Tensor. The Tensor contains four neighboring feature points for each
+ vertex in the output grid.
+ 2. Compute the interpolation kernel of shape
+ [batch_size, num_boxes, output_size*2, output_size*2]. The last 2 axes
+ can be seen as stacking 2x2 interpolation kernels for all vertices in the
+ output grid.
+ 3. Element-wise multiply the gathered features and interpolation kernel.
+ Then apply 2x2 average pooling to reduce spatial dimension to
+ output_size.
+
+ Args:
+ features: a 5-D tensor of shape [batch_size, num_levels, max_height,
+ max_width, num_filters] on which cropping and resizing are based.
+ boxes: a 3-D tensor of shape [batch_size, num_boxes, 4] encoding the
+ information of each box w.r.t. the corresponding feature map.
+ boxes[:, :, 0:2] are the grid position in (y, x) (float) of the top-left
+ corner of each box. boxes[:, :, 2:4] are the box sizes in (h, w) (float)
+ in terms of the number of pixels of the corresponding feature map size.
+ box_levels: a 3-D tensor of shape [batch_size, num_boxes, 1] representing
+ the 0-based corresponding feature level index of each box.
+ boundaries: a 3-D tensor of shape [batch_size, num_boxes, 2] representing
+ the boundary (in (y, x)) of the corresponding feature map for each box.
+ Any resampled grid points that go beyond the boundary will be clipped.
+ output_size: a scalar indicating the output crop size.
+ sample_offset: a float number in [0, 1] indicating the subpixel sample
+ offset from the grid point.
+ use_einsum_gather: use einsum to replace gather or not.
Replacing gather
+ with einsum can improve performance when the feature size is not large;
+ einsum is also friendly to model partitioning. Gather performs better
+ when the feature size is very large and there are multiple box levels.
+
+ Returns:
+ features_per_box: a 5-D tensor of shape
+ [batch_size, num_boxes, output_size, output_size, num_filters]
+ representing the cropped features.
+ """
+ (batch_size, num_levels, max_feature_height, max_feature_width,
+ num_filters) = features.get_shape().as_list()
+ if batch_size is None:
+ batch_size = tf.shape(features)[0]
+ _, num_boxes, _ = boxes.get_shape().as_list()
+
+ kernel_y, kernel_x, box_gridy0y1, box_gridx0x1 = _compute_grid_positions(
+ boxes, boundaries, output_size, sample_offset)
+ x_indices = tf.cast(
+ tf.reshape(box_gridx0x1, [batch_size, num_boxes, output_size * 2]),
+ dtype=tf.int32)
+ y_indices = tf.cast(
+ tf.reshape(box_gridy0y1, [batch_size, num_boxes, output_size * 2]),
+ dtype=tf.int32)
+
+ if use_einsum_gather:
+ # Bilinear interpolation is done during the last two gathers:
+ # f(y, x) = [hy, ly] * [[f00, f01], * [hx, lx]^T
+ # [f10, f11]]
+ # [[f00, f01],
+ # [f10, f11]] = tf.einsum(tf.einsum(features, y_one_hot), x_one_hot)
+ # where [hy, ly] and [hx, lx] are the bilinear interpolation kernels.
+ y_indices = tf.cast( + tf.reshape(box_gridy0y1, [batch_size, num_boxes, output_size, 2]), + dtype=tf.int32) + x_indices = tf.cast( + tf.reshape(box_gridx0x1, [batch_size, num_boxes, output_size, 2]), + dtype=tf.int32) + + # shape is [batch_size, num_boxes, output_size, 2, height] + grid_y_one_hot = tf.one_hot( + tf.cast(y_indices, tf.int32), max_feature_height, dtype=kernel_y.dtype) + # shape is [batch_size, num_boxes, output_size, 2, width] + grid_x_one_hot = tf.one_hot( + tf.cast(x_indices, tf.int32), max_feature_width, dtype=kernel_x.dtype) + + # shape is [batch_size, num_boxes, output_size, height] + grid_y_weight = tf.reduce_sum( + tf.multiply(grid_y_one_hot, kernel_y), axis=-2) + # shape is [batch_size, num_boxes, output_size, width] + grid_x_weight = tf.reduce_sum( + tf.multiply(grid_x_one_hot, kernel_x), axis=-2) + + # Gather for y_axis. + # shape is [batch_size, num_boxes, output_size, width, features] + features_per_box = tf.einsum('bmhwf,bmoh->bmowf', features, + tf.cast(grid_y_weight, features.dtype)) + # Gather for x_axis. 
+ # shape is [batch_size, num_boxes, output_size, output_size, features]
+ features_per_box = tf.einsum('bmhwf,bmow->bmhof', features_per_box,
+ tf.cast(grid_x_weight, features.dtype))
+ else:
+ height_dim_offset = max_feature_width
+ level_dim_offset = max_feature_height * height_dim_offset
+ batch_dim_offset = num_levels * level_dim_offset
+
+ batch_size_offset = tf.tile(
+ tf.reshape(
+ tf.range(batch_size) * batch_dim_offset, [batch_size, 1, 1, 1]),
+ [1, num_boxes, output_size * 2, output_size * 2])
+ box_levels_offset = tf.tile(
+ tf.reshape(box_levels * level_dim_offset,
+ [batch_size, num_boxes, 1, 1]),
+ [1, 1, output_size * 2, output_size * 2])
+ y_indices_offset = tf.tile(
+ tf.reshape(y_indices * height_dim_offset,
+ [batch_size, num_boxes, output_size * 2, 1]),
+ [1, 1, 1, output_size * 2])
+ x_indices_offset = tf.tile(
+ tf.reshape(x_indices, [batch_size, num_boxes, 1, output_size * 2]),
+ [1, 1, output_size * 2, 1])
+
+ indices = tf.reshape(
+ batch_size_offset + box_levels_offset + y_indices_offset +
+ x_indices_offset, [-1])
+
+ features = tf.reshape(features, [-1, num_filters])
+ # TODO(wangtao): replace tf.gather with tf.gather_nd and try to get similar
+ # performance.
+ features_per_box = tf.reshape(
+ tf.gather(features, indices),
+ [batch_size, num_boxes, output_size * 2, output_size * 2, num_filters])
+ features_per_box = _feature_bilinear_interpolation(
+ features_per_box, kernel_y, kernel_x)
+
+ return features_per_box
+
+
+def crop_mask_in_target_box(masks: tf.Tensor,
+ boxes: tf.Tensor,
+ target_boxes: tf.Tensor,
+ output_size: int,
+ sample_offset: float = 0.0,
+ use_einsum: bool = True) -> tf.Tensor:
+ """Crop masks in target boxes.
+
+ Args:
+ masks: A tensor with a shape of [batch_size, num_masks, height, width].
+ boxes: a float tensor representing box coordinates that tightly enclose
+ masks with a shape of [batch_size, num_masks, 4] in un-normalized
+ coordinates. A box is represented by [ymin, xmin, ymax, xmax].
+ target_boxes: a float tensor representing target box coordinates for masks
+ with a shape of [batch_size, num_masks, 4] in un-normalized coordinates. A
+ box is represented by [ymin, xmin, ymax, xmax].
+ output_size: A scalar to indicate the output crop size. Currently only
+ square outputs are supported.
+ sample_offset: a float number in [0, 1] indicating the subpixel sample
+ offset from the grid point.
+ use_einsum: Use einsum to replace gather in selective_crop_and_resize.
+
+ Returns:
+ A 4-D tensor representing feature crop of shape
+ [batch_size, num_boxes, output_size, output_size].
+ """
+ with tf.name_scope('crop_mask_in_target_box'):
+ # Cast to float32, as the y_transform and other transform variables may
+ # overflow in float16.
+ masks = tf.cast(masks, tf.float32)
+ boxes = tf.cast(boxes, tf.float32)
+ target_boxes = tf.cast(target_boxes, tf.float32)
+
+ batch_size, num_masks, height, width = masks.get_shape().as_list()
+ if batch_size is None:
+ batch_size = tf.shape(masks)[0]
+ masks = tf.reshape(masks, [batch_size * num_masks, height, width, 1])
+ # Pad zeros on the boundary of masks.
+ masks = tf.image.pad_to_bounding_box(masks, 2, 2, height + 4, width + 4)
+ masks = tf.reshape(masks, [batch_size, num_masks, height+4, width+4, 1])
+
+ # Projects target box locations and sizes to corresponding cropped
+ # mask coordinates.
+ gt_y_min, gt_x_min, gt_y_max, gt_x_max = tf.split(
+ value=boxes, num_or_size_splits=4, axis=2)
+ bb_y_min, bb_x_min, bb_y_max, bb_x_max = tf.split(
+ value=target_boxes, num_or_size_splits=4, axis=2)
+ y_transform = (bb_y_min - gt_y_min) * height / (
+ gt_y_max - gt_y_min + _EPSILON) + 2
+ x_transform = (bb_x_min - gt_x_min) * width / (
+ gt_x_max - gt_x_min + _EPSILON) + 2
+ h_transform = (bb_y_max - bb_y_min) * height / (
+ gt_y_max - gt_y_min + _EPSILON)
+ w_transform = (bb_x_max - bb_x_min) * width / (
+ gt_x_max - gt_x_min + _EPSILON)
+
+ boundaries = tf.concat(
+ [tf.ones_like(y_transform) * ((height + 4) - 1),
+ tf.ones_like(x_transform) * ((width + 4) - 1)],
+ axis=-1)
+ boundaries = tf.cast(boundaries, dtype=y_transform.dtype)
+
+ # Reshape tensors to have the right shape for selective_crop_and_resize.
+ transformed_boxes = tf.concat(
+ [y_transform, x_transform, h_transform, w_transform], -1)
+ levels = tf.tile(tf.reshape(tf.range(num_masks), [1, num_masks]),
+ [batch_size, 1])
+
+ cropped_masks = _selective_crop_and_resize(
+ masks,
+ transformed_boxes,
+ levels,
+ boundaries,
+ output_size,
+ sample_offset=sample_offset,
+ use_einsum_gather=use_einsum)
+ cropped_masks = tf.squeeze(cropped_masks, axis=-1)
+
+ return cropped_masks
+
+
+def nearest_upsampling(data: tf.Tensor,
+ scale: int,
+ use_keras_layer: bool = False) -> tf.Tensor:
+ """Nearest neighbor upsampling implementation.
+
+ Args:
+ data: A tensor with a shape of [batch, height_in, width_in, channels].
+ scale: An integer multiple to scale resolution of input data.
+ use_keras_layer: If True, use the Keras UpSampling2D layer.
+
+ Returns:
+ data_up: A tensor with a shape of
+ [batch, height_in*scale, width_in*scale, channels]. Same dtype as input
+ data.
+ """
+ if use_keras_layer:
+ return tf.keras.layers.UpSampling2D(size=(scale, scale),
+ interpolation='nearest')(data)
+ with tf.name_scope('nearest_upsampling'):
+ bs, _, _, c = data.get_shape().as_list()
+ shape = tf.shape(input=data)
+ h = shape[1]
+ w = shape[2]
+ bs = -1 if bs is None else bs
+ # Uses reshape to quickly upsample the input. The nearest pixel is selected
+ # via tiling.
+ data = tf.tile(
+ tf.reshape(data, [bs, h, 1, w, 1, c]), [1, 1, scale, 1, scale, 1])
+ return tf.reshape(data, [bs, h * scale, w * scale, c])
+
+
+def _gather_rows_from_matrix(input_matrix: tf.Tensor,
+ row_indices: tf.Tensor) -> tf.Tensor:
+ """Gathers rows from the input matrix (2-D tensor).
+
+ This is equivalent to tf.gather(input_matrix, row_indices), but is computed
+ via sparse matrix multiplication; out-of-range indices yield all-zero rows.
+
+ Args:
+ input_matrix: A 2-D tensor in shape (input_h, input_w) from which to gather
+ values. The shape must be 2-D, since sparse matrix multiplication is
+ currently only supported on 2-D matrices.
+ row_indices: A 1-D int tensor in shape (output_h) that stores the row
+ indices of the input.
+
+ Returns:
+ A tensor in shape (output_h, input_w) which stores the gathered rows.
+ """ + input_matrix_shape = input_matrix.get_shape().as_list() + if len(input_matrix_shape) != 2: + raise ValueError( + 'Expected the input_matrix tensor (input_h, input_w) has rank == 2, ' + 'was: %s' % input_matrix_shape) + row_indices_shape = row_indices.get_shape().as_list() + if len(row_indices_shape) != 1: + raise ValueError( + 'Expected the row_indices tensor (output_h) has rank == 1, was: %s' % + row_indices_shape) + + # (output_h, input_h) + indices_one_hot = tf.one_hot( + row_indices, depth=input_matrix_shape[0], dtype=input_matrix.dtype) + # Matrix multiplication: (output_h, input_h) x (input_h, input_w) + # (output_h, input_w) + return tf.linalg.matmul(indices_one_hot, input_matrix, a_is_sparse=True) + + +def bilinear_resize_to_bbox(images: tf.Tensor, bbox: tf.Tensor, + output_size: tf.Tensor) -> tf.Tensor: + """Bilinear resizes the images to fit into the bounding boxes in the output. + + Args: + images: A tensor in shape (batch_size, input_h, input_w, ...) with arbitrary + numbers of channel dimensions. + bbox: A tensor in shape (batch_size, 4), representing the absolute + coordinates (ymin, xmin, ymax, xmax) for each bounding box. + output_size: The size of the output images in (output_h, output_w). + + Returns: + A tensor in shape (batch_size, output_h, output_w, ...). + """ + images_shape = images.get_shape().as_list() + images_rank = len(images_shape) + if images_rank < 3: + raise ValueError( + 'Expected the input images (batch_size, height, width, ...) 
' + 'has rank >= 3, was: %s' % images_shape) + bbox_shape = bbox.get_shape().as_list() + if bbox_shape[-1] != 4: + raise ValueError( + 'Expected the last dimension of `bbox` has size == 4, but the shape ' + 'of `bbox` was: %s' % bbox_shape) + + rank_range = list(range(images_rank)) + extra_dims = images_shape[3:] + extra_dims_perm = rank_range[3:] + extra_dims_product = 1 + for d in extra_dims: + extra_dims_product *= d + + input_h = tf.cast(tf.shape(images)[1], tf.float32) + input_w = tf.cast(tf.shape(images)[2], tf.float32) + output_h = output_size[0] + output_w = output_size[1] + + bbox = tf.cast(bbox, tf.float32) + # (batch_size, 1) + bbox_ymin = bbox[:, 0:1] + bbox_xmin = bbox[:, 1:2] + bbox_ymax = bbox[:, 2:3] + bbox_xmax = bbox[:, 3:4] + bbox_h = bbox_ymax - bbox_ymin + bbox_w = bbox_xmax - bbox_xmin + scale_h = tf.math.divide_no_nan(input_h, bbox_h) + scale_w = tf.math.divide_no_nan(input_w, bbox_w) + + # Generates the output grids. + # (output_h) + output_y_grid = tf.range(output_h, dtype=bbox_ymin.dtype) + # (output_w) + output_x_grid = tf.range(output_w, dtype=bbox_xmin.dtype) + + # Computes the input source positions (float) which map to the output grids + # (integer). + # Applies half pixel offset here to ensure the output is center-aligned to the + # input. + # TODO(b/245614786): support align_corners=True. + # (batch_size, output_h) + input_y_pos = tf.clip_by_value( + (output_y_grid - bbox_ymin + 0.5) * scale_h - 0.5, 0.0, input_h - 1.0) + # (batch_size, output_w) + input_x_pos = tf.clip_by_value( + (output_x_grid - bbox_xmin + 0.5) * scale_w - 0.5, 0.0, input_w - 1.0) + + # Gets the positions (integer) of the four nearest neighbors of the input + # source position (float). 
+ # (y0, x0): left-top + # (y0, x1): right-top + # (y1, x0): left-bottom + # (y1, x1): right-bottom + # (batch_size, output_h) + input_y0 = tf.cast( + tf.clip_by_value(tf.floor(input_y_pos), 0.0, input_h - 2.0), tf.int32) + input_y1 = input_y0 + 1 + # (batch_size, output_w) + input_x0 = tf.cast( + tf.clip_by_value(tf.floor(input_x_pos), 0.0, input_w - 2.0), tf.int32) + input_x1 = input_x0 + 1 + + # (batch_size, output_h) + output_y_mask = (bbox_ymin <= output_y_grid) & (output_y_grid < bbox_ymax) + # (batch_size, output_w) + output_x_mask = (bbox_xmin <= output_x_grid) & (output_x_grid < bbox_xmax) + + # Masks the output pixels outside the bounding box by setting their input + # neighbors to -1. This makes `tf.one_hot` operation produce all zeros at + # these pixels, so as to accelerate the sparse matrix multiplication in + # `_gather_rows_from_matrix`. + # (batch_size, output_h) + input_y0 = tf.where(output_y_mask, input_y0, -tf.ones_like(input_y0)) + input_y1 = tf.where(output_y_mask, input_y1, -tf.ones_like(input_y1)) + # (batch_size, output_w) + input_x0 = tf.where(output_x_mask, input_x0, -tf.ones_like(input_x0)) + input_x1 = tf.where(output_x_mask, input_x1, -tf.ones_like(input_x1)) + + input_h = tf.cast(input_h, tf.int32) + input_w = tf.cast(input_w, tf.int32) + images = tf.cast(images, tf.float32) + if images_rank > 3: + # Reshapes the images since _gather_rows_from_matrix only takes 2-D tensor. + # (batch_size, input_h, input_w * extra_dims_product) + images = tf.reshape(images, [-1, input_h, input_w * extra_dims_product]) + + # Fetches the rows from the input source images. 
+ # (batch_size, output_h, input_w * extra_dims_product) + val_y0 = tf.map_fn( + lambda x: _gather_rows_from_matrix(x[0], x[1]), + elems=(images, input_y0), + fn_output_signature=tf.float32, + parallel_iterations=32) + val_y1 = tf.map_fn( + lambda x: _gather_rows_from_matrix(x[0], x[1]), + elems=(images, input_y1), + fn_output_signature=tf.float32, + parallel_iterations=32) + + if images_rank > 3: + new_shape = [-1, output_h, input_w] + extra_dims + # (batch_size, output_h, input_w, ...) + val_y0 = tf.reshape(val_y0, new_shape) + val_y1 = tf.reshape(val_y1, new_shape) + + # Transposes the tensors for reusing _gather_rows_from_matrix later. + new_perm = [0, 2, 1] + extra_dims_perm + # (batch_size, input_w, output_h, ...) + val_y0 = tf.transpose(val_y0, new_perm) + val_y1 = tf.transpose(val_y1, new_perm) + + if images_rank > 3: + new_shape = [-1, input_w, output_h * extra_dims_product] + # (batch_size, input_w, output_h * extra_dims_product) + val_y0 = tf.reshape(val_y0, new_shape) + val_y1 = tf.reshape(val_y1, new_shape) + + # Fetches the pixels from the rows using the column indices. + # val_00, val_01, val_10, val_11 store the pixels of the four nearest + # neighbors of the input source position. 
+ # (batch_size, output_w, output_h * extra_dims_product) + val_00 = tf.map_fn( + lambda x: _gather_rows_from_matrix(x[0], x[1]), + elems=(val_y0, input_x0), + fn_output_signature=tf.float32, + parallel_iterations=32) + val_01 = tf.map_fn( + lambda x: _gather_rows_from_matrix(x[0], x[1]), + elems=(val_y0, input_x1), + fn_output_signature=tf.float32, + parallel_iterations=32) + val_10 = tf.map_fn( + lambda x: _gather_rows_from_matrix(x[0], x[1]), + elems=(val_y1, input_x0), + fn_output_signature=tf.float32, + parallel_iterations=32) + val_11 = tf.map_fn( + lambda x: _gather_rows_from_matrix(x[0], x[1]), + elems=(val_y1, input_x1), + fn_output_signature=tf.float32, + parallel_iterations=32) + + if images_rank > 3: + new_shape = [-1, output_w, output_h] + extra_dims + # (batch_size, output_w, output_h, ...) + val_00 = tf.reshape(val_00, new_shape) + val_01 = tf.reshape(val_01, new_shape) + val_10 = tf.reshape(val_10, new_shape) + val_11 = tf.reshape(val_11, new_shape) + + # (..., batch_size, output_h, output_w) + new_perm = extra_dims_perm + [0, 2, 1] + val_00 = tf.transpose(val_00, new_perm) + val_01 = tf.transpose(val_01, new_perm) + val_10 = tf.transpose(val_10, new_perm) + val_11 = tf.transpose(val_11, new_perm) + + # (batch_size, output_height, 1) + input_y_pos = input_y_pos[:, :, tf.newaxis] + input_y0 = tf.cast(input_y0[:, :, tf.newaxis], input_y_pos.dtype) + input_y1 = tf.cast(input_y1[:, :, tf.newaxis], input_y_pos.dtype) + # (batch_size, 1, output_width) + input_x_pos = input_x_pos[:, tf.newaxis, :] + input_x0 = tf.cast(input_x0[:, tf.newaxis, :], input_x_pos.dtype) + input_x1 = tf.cast(input_x1[:, tf.newaxis, :], input_x_pos.dtype) + + # Compute the weights of the four nearest neighbors for interpolation. 
+ # (batch_size, output_height, output_width) + weight_00 = (input_y1 - input_y_pos) * (input_x1 - input_x_pos) + weight_01 = (input_y1 - input_y_pos) * (input_x_pos - input_x0) + weight_10 = (input_y_pos - input_y0) * (input_x1 - input_x_pos) + weight_11 = (input_y_pos - input_y0) * (input_x_pos - input_x0) + + # (..., batch_size, output_height, output_width) + output_images = ( + val_00 * weight_00 + val_01 * weight_01 + val_10 * weight_10 + + val_11 * weight_11) + + # (batch_size, output_height, output_width, ...) + return tf.transpose(output_images, np.roll(rank_range, -len(extra_dims))) + + +def bilinear_resize_with_crop_and_pad(images: tf.Tensor, + rescale_size: tf.Tensor, + crop_offset: tf.Tensor, + crop_size: tf.Tensor, + output_size: tf.Tensor) -> tf.Tensor: + """Bilinearly resizes the images, then crops and finally pads to output size. + + Args: + images: A tensor in shape (batch_size, input_h, input_w, ...) with arbitrary + numbers of channel dimensions. + rescale_size: An int tensor in shape (batch_size, 2), representing the sizes + of the rescaled images. + crop_offset: An int tensor in shape (batch_size, 2), representing the + left-top offset of the crop box. Applying negative offsets means adding + extra margins at the left-top. + crop_size: An int tensor in shape (batch_size, 2), representing the sizes of + the cropped images. + output_size: The size of the output image in (output_h, output_w). + + Returns: + A tensor in shape (batch_size, output_h, output_w, ...). + """ + images_shape = images.get_shape().as_list() + images_rank = len(images_shape) + if images_rank < 3: + raise ValueError( + 'Expected the input images (batch_size, height, width, ...) ' + 'to have rank >= 3, was: %s' % images_shape) + num_extra_dims = images_rank - 3 + + # Rescales the images, applies the offset and pastes to the output canvas.
+ + # (batch_size, 2) + ymin_xmin = -crop_offset + # (batch_size, 2) + ymax_xmax = ymin_xmin + tf.cast(rescale_size, ymin_xmin.dtype) + # (batch_size, 4) + rescale_bbox = tf.concat([ymin_xmin, ymax_xmax], axis=1) + # (batch_size, output_height, output_width, ...) + rescaled_padded_images = bilinear_resize_to_bbox(images, rescale_bbox, + output_size) + + # Masks out the pixels outside of the crop box. + # (batch_size, 2) + y0_x0 = tf.broadcast_to( + tf.constant([[0, 0]], dtype=crop_size.dtype), tf.shape(crop_size)) + # (batch_size, 4) + crop_bbox = tf.concat([y0_x0, crop_size], axis=1) + # (batch_size, output_height, output_width, ...) + crop_bbox_mask = bbox2mask( + crop_bbox, + image_height=output_size[0], + image_width=output_size[1], + dtype=rescaled_padded_images.dtype)[[...] + [tf.newaxis] * num_extra_dims] + # (batch_size, output_height, output_width, ...) + return rescaled_padded_images * crop_bbox_mask diff --git a/official/vision/beta/ops/target_gather.py b/official/vision/ops/target_gather.py similarity index 98% rename from official/vision/beta/ops/target_gather.py rename to official/vision/ops/target_gather.py index e9cbbe4c62d1cd53e621bd4970a19f623a35e357..3c8c3a0a417b2afe2a0b1b87be560f233dfba74d 100644 --- a/official/vision/beta/ops/target_gather.py +++ b/official/vision/ops/target_gather.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
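For readers of the `bilinear_resize_to_bbox` hunk above, here is a minimal pure-Python sketch (illustrative only, not part of this change) of the same four-neighbor weighting. The top-left neighbor index is `floor(pos)` clipped to `[0, size - 2]` so that `index + 1` stays in bounds, and the four weights follow the `weight_00` through `weight_11` formulas. `bilinear_sample` is a hypothetical helper. It assumes a single-channel 2-D grid with at least two rows and columns and sample positions inside the grid. It also omits the bounding-box masking, the `tf.one_hot`-based row gathering, and the batching that the TensorFlow code performs.

```python
import math


def bilinear_sample(image, y, x):
    """Samples a 2-D grid (list of rows) at fractional (y, x) bilinearly.

    Hypothetical helper mirroring the neighbor/weight scheme of
    bilinear_resize_to_bbox for a single channel and a single position.
    """
    h, w = len(image), len(image[0])
    # Top-left neighbor: floor of the position, clipped to [0, size - 2]
    # so the bottom/right neighbor (index + 1) is always a valid index.
    y0 = int(min(max(math.floor(y), 0), h - 2))
    x0 = int(min(max(math.floor(x), 0), w - 2))
    y1, x1 = y0 + 1, x0 + 1
    # Same weight formulas as weight_00, weight_01, weight_10, weight_11.
    w00 = (y1 - y) * (x1 - x)
    w01 = (y1 - y) * (x - x0)
    w10 = (y - y0) * (x1 - x)
    w11 = (y - y0) * (x - x0)
    return (image[y0][x0] * w00 + image[y0][x1] * w01 +
            image[y1][x0] * w10 + image[y1][x1] * w11)


# Sampling the center of a 2x2 grid weights all four neighbors by 0.25.
print(bilinear_sample([[0, 10], [20, 30]], 0.5, 0.5))  # 15.0
```

Because the four weights always sum to 1, a grid whose values are linear in (y, x) is reproduced exactly. This makes a convenient sanity check when comparing against the batched TensorFlow version.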
diff --git a/official/vision/beta/ops/target_gather_test.py b/official/vision/ops/target_gather_test.py similarity index 95% rename from official/vision/beta/ops/target_gather_test.py rename to official/vision/ops/target_gather_test.py index e686271813b6a09acfd00a3c28df59b77ce09279..49d9f8f026ae622005b75c62694135b68c672775 100644 --- a/official/vision/beta/ops/target_gather_test.py +++ b/official/vision/ops/target_gather_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -16,7 +16,7 @@ import tensorflow as tf -from official.vision.beta.ops import target_gather +from official.vision.ops import target_gather class TargetGatherTest(tf.test.TestCase): diff --git a/official/vision/registry_imports.py b/official/vision/registry_imports.py new file mode 100644 index 0000000000000000000000000000000000000000..83dd2f7b7b599d109fb7e5958869802203633fb5 --- /dev/null +++ b/official/vision/registry_imports.py @@ -0,0 +1,18 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""All necessary imports for registration.""" +# pylint: disable=unused-import +from official import vision +from official.utils.testing import mock_task diff --git a/official/vision/serving/__init__.py b/official/vision/serving/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..3b0bcaab7e724c5b3d5aca48d9ea0b34f7d591d1 --- /dev/null +++ b/official/vision/serving/__init__.py @@ -0,0 +1,16 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tools for exporting models.""" +from official.vision.serving import export_saved_model_lib diff --git a/official/vision/serving/detection.py b/official/vision/serving/detection.py new file mode 100644 index 0000000000000000000000000000000000000000..587afc08068e88114d32aa7af8807c951f91853f --- /dev/null +++ b/official/vision/serving/detection.py @@ -0,0 +1,207 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""Detection input and model functions for serving/inference.""" + +from typing import Mapping, Text + +from absl import logging +import tensorflow as tf + +from official.vision import configs +from official.vision.modeling import factory +from official.vision.ops import anchor +from official.vision.ops import box_ops +from official.vision.ops import preprocess_ops +from official.vision.serving import export_base + + +class DetectionModule(export_base.ExportModule): + """Detection Module.""" + + def _build_model(self): + + if self._batch_size is None: + # Only batched NMS is supported with dynamic batch size. + self.params.task.model.detection_generator.nms_version = 'batched' + logging.info( + 'nms_version is set to `batched` because only batched NMS is ' + 'supported with dynamic batch size.') + + input_specs = tf.keras.layers.InputSpec(shape=[self._batch_size] + + self._input_image_size + [3]) + + if isinstance(self.params.task.model, configs.maskrcnn.MaskRCNN): + model = factory.build_maskrcnn( + input_specs=input_specs, model_config=self.params.task.model) + elif isinstance(self.params.task.model, configs.retinanet.RetinaNet): + model = factory.build_retinanet( + input_specs=input_specs, model_config=self.params.task.model) + else: + raise ValueError('Detection module not implemented for {} model.'.format( + type(self.params.task.model))) + + return model + + def _build_anchor_boxes(self): + """Builds and returns anchor boxes.""" + model_params = self.params.task.model + input_anchor = anchor.build_anchor_generator( + min_level=model_params.min_level, + max_level=model_params.max_level, + num_scales=model_params.anchor.num_scales, + aspect_ratios=model_params.anchor.aspect_ratios, + anchor_size=model_params.anchor.anchor_size) + return input_anchor( + image_size=(self._input_image_size[0], self._input_image_size[1])) + + def _build_inputs(self, image): + 
"""Builds detection model inputs for serving.""" + model_params = self.params.task.model + # Normalizes image with mean and std pixel values. + image = preprocess_ops.normalize_image( + image, offset=preprocess_ops.MEAN_RGB, scale=preprocess_ops.STDDEV_RGB) + + image, image_info = preprocess_ops.resize_and_crop_image( + image, + self._input_image_size, + padded_size=preprocess_ops.compute_padded_size( + self._input_image_size, 2**model_params.max_level), + aug_scale_min=1.0, + aug_scale_max=1.0) + anchor_boxes = self._build_anchor_boxes() + + return image, anchor_boxes, image_info + + def preprocess(self, images: tf.Tensor) -> ( + tf.Tensor, Mapping[Text, tf.Tensor], tf.Tensor): + """Preprocess inputs to be suitable for the model. + + Args: + images: The images tensor. + Returns: + images: The images tensor cast to float. + anchor_boxes: Dict mapping anchor levels to anchor boxes. + image_info: Tensor containing the details of the image resizing. + + """ + model_params = self.params.task.model + with tf.device('cpu:0'): + images = tf.cast(images, dtype=tf.float32) + + # Tensor Specs for map_fn outputs (images, anchor_boxes, and image_info). 
+ images_spec = tf.TensorSpec(shape=self._input_image_size + [3], + dtype=tf.float32) + + num_anchors = model_params.anchor.num_scales * len( + model_params.anchor.aspect_ratios) * 4 + anchor_shapes = [] + for level in range(model_params.min_level, model_params.max_level + 1): + anchor_level_spec = tf.TensorSpec( + shape=[ + self._input_image_size[0] // 2**level, + self._input_image_size[1] // 2**level, num_anchors + ], + dtype=tf.float32) + anchor_shapes.append((str(level), anchor_level_spec)) + + image_info_spec = tf.TensorSpec(shape=[4, 2], dtype=tf.float32) + + images, anchor_boxes, image_info = tf.nest.map_structure( + tf.identity, + tf.map_fn( + self._build_inputs, + elems=images, + fn_output_signature=(images_spec, dict(anchor_shapes), + image_info_spec), + parallel_iterations=32)) + + return images, anchor_boxes, image_info + + def serve(self, images: tf.Tensor): + """Cast image to float and run inference. + + Args: + images: uint8 Tensor of shape [batch_size, None, None, 3] + Returns: + A dict of detection output tensors. + """ + + # Skip image preprocessing when input_type is tflite so it is compatible + # with TFLite quantization. + if self._input_type != 'tflite': + images, anchor_boxes, image_info = self.preprocess(images) + else: + with tf.device('cpu:0'): + anchor_boxes = self._build_anchor_boxes() + # image_info is a 3D tensor of shape [batch_size, 4, 2]. It is in the + # format of [[original_height, original_width], + # [desired_height, desired_width], [y_scale, x_scale], + # [y_offset, x_offset]]. When input_type is tflite, input image is + # supposed to be preprocessed already. + image_info = tf.convert_to_tensor([[ + self._input_image_size, self._input_image_size, [1.0, 1.0], [0, 0] + ]], + dtype=tf.float32) + input_image_shape = image_info[:, 1, :] + + # To work around a keras.Model limitation when saving a model whose layers + # have multiple inputs, we use `model.call` here to trigger the forward + # path.
Note that this bypasses some of the Keras magic that happens in `__call__`. + detections = self.model.call( + images=images, + image_shape=input_image_shape, + anchor_boxes=anchor_boxes, + training=False) + + if self.params.task.model.detection_generator.apply_nms: + # For RetinaNet model, apply export_config. + # TODO(huizhongc): Add export_config to fasterrcnn and maskrcnn as needed. + if isinstance(self.params.task.model, configs.retinanet.RetinaNet): + export_config = self.params.task.export_config + # Normalize detection box coordinates to [0, 1]. + if export_config.output_normalized_coordinates: + detection_boxes = ( + detections['detection_boxes'] / + tf.tile(image_info[:, 2:3, :], [1, 1, 2])) + detections['detection_boxes'] = box_ops.normalize_boxes( + detection_boxes, image_info[:, 0:1, :]) + + # Cast num_detections and detection_classes to float. This allows the + # model inference to work on chain (go/chain) as chain requires floating + # point outputs. + if export_config.cast_num_detections_to_float: + detections['num_detections'] = tf.cast( + detections['num_detections'], dtype=tf.float32) + if export_config.cast_detection_classes_to_float: + detections['detection_classes'] = tf.cast( + detections['detection_classes'], dtype=tf.float32) + + final_outputs = { + 'detection_boxes': detections['detection_boxes'], + 'detection_scores': detections['detection_scores'], + 'detection_classes': detections['detection_classes'], + 'num_detections': detections['num_detections'] + } + else: + final_outputs = { + 'decoded_boxes': detections['decoded_boxes'], + 'decoded_box_scores': detections['decoded_box_scores'] + } + + if 'detection_masks' in detections.keys(): + final_outputs['detection_masks'] = detections['detection_masks'] + + final_outputs.update({'image_info': image_info}) + return final_outputs diff --git a/official/vision/serving/detection_test.py b/official/vision/serving/detection_test.py new file mode 100644 index
0000000000000000000000000000000000000000..82ecc834d460ba3825bd0399858a36f3abcc4b92 --- /dev/null +++ b/official/vision/serving/detection_test.py @@ -0,0 +1,132 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Test for image detection export lib.""" + +import io +import os + +from absl.testing import parameterized +import numpy as np +from PIL import Image +import tensorflow as tf + +from official.core import exp_factory +from official.vision import registry_imports # pylint: disable=unused-import +from official.vision.serving import detection + + +class DetectionExportTest(tf.test.TestCase, parameterized.TestCase): + + def _get_detection_module(self, experiment_name, input_type): + params = exp_factory.get_exp_config(experiment_name) + params.task.model.backbone.resnet.model_id = 18 + params.task.model.detection_generator.nms_version = 'batched' + detection_module = detection.DetectionModule( + params, + batch_size=1, + input_image_size=[640, 640], + input_type=input_type) + return detection_module + + def _export_from_module(self, module, input_type, save_directory): + signatures = module.get_inference_signatures( + {input_type: 'serving_default'}) + tf.saved_model.save(module, save_directory, signatures=signatures) + + def _get_dummy_input(self, input_type, batch_size, image_size): + """Get dummy input for the given input type.""" + h, w = image_size + + if input_type == 'image_tensor': + return 
tf.zeros((batch_size, h, w, 3), dtype=np.uint8) + elif input_type == 'image_bytes': + image = Image.fromarray(np.zeros((h, w, 3), dtype=np.uint8)) + byte_io = io.BytesIO() + image.save(byte_io, 'PNG') + return [byte_io.getvalue() for b in range(batch_size)] + elif input_type == 'tf_example': + image_tensor = tf.zeros((h, w, 3), dtype=tf.uint8) + encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).numpy() + example = tf.train.Example( + features=tf.train.Features( + feature={ + 'image/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList(value=[encoded_jpeg])), + })).SerializeToString() + return [example for b in range(batch_size)] + elif input_type == 'tflite': + return tf.zeros((batch_size, h, w, 3), dtype=np.float32) + + @parameterized.parameters( + ('image_tensor', 'fasterrcnn_resnetfpn_coco', [384, 384]), + ('image_bytes', 'fasterrcnn_resnetfpn_coco', [640, 640]), + ('tf_example', 'fasterrcnn_resnetfpn_coco', [640, 640]), + ('tflite', 'fasterrcnn_resnetfpn_coco', [640, 640]), + ('image_tensor', 'maskrcnn_resnetfpn_coco', [640, 640]), + ('image_bytes', 'maskrcnn_resnetfpn_coco', [640, 384]), + ('tf_example', 'maskrcnn_resnetfpn_coco', [640, 640]), + ('tflite', 'maskrcnn_resnetfpn_coco', [640, 640]), + ('image_tensor', 'retinanet_resnetfpn_coco', [640, 640]), + ('image_bytes', 'retinanet_resnetfpn_coco', [640, 640]), + ('tf_example', 'retinanet_resnetfpn_coco', [384, 640]), + ('tflite', 'retinanet_resnetfpn_coco', [640, 640]), + ('image_tensor', 'retinanet_resnetfpn_coco', [384, 384]), + ('image_bytes', 'retinanet_spinenet_coco', [640, 640]), + ('tf_example', 'retinanet_spinenet_coco', [640, 384]), + ('tflite', 'retinanet_spinenet_coco', [640, 640]), + ) + def test_export(self, input_type, experiment_name, image_size): + tmp_dir = self.get_temp_dir() + module = self._get_detection_module(experiment_name, input_type) + + self._export_from_module(module, input_type, tmp_dir) + + self.assertTrue(os.path.exists(os.path.join(tmp_dir, 
'saved_model.pb'))) + self.assertTrue( + os.path.exists(os.path.join(tmp_dir, 'variables', 'variables.index'))) + self.assertTrue( + os.path.exists( + os.path.join(tmp_dir, 'variables', + 'variables.data-00000-of-00001'))) + + imported = tf.saved_model.load(tmp_dir) + detection_fn = imported.signatures['serving_default'] + + images = self._get_dummy_input( + input_type, batch_size=1, image_size=image_size) + + signatures = module.get_inference_signatures( + {input_type: 'serving_default'}) + expected_outputs = signatures['serving_default'](tf.constant(images)) + outputs = detection_fn(tf.constant(images)) + + self.assertAllEqual(outputs['detection_boxes'].numpy(), + expected_outputs['detection_boxes'].numpy()) + self.assertAllEqual(outputs['detection_classes'].numpy(), + expected_outputs['detection_classes'].numpy()) + self.assertAllEqual(outputs['detection_scores'].numpy(), + expected_outputs['detection_scores'].numpy()) + self.assertAllEqual(outputs['num_detections'].numpy(), + expected_outputs['num_detections'].numpy()) + + def test_build_model_fail_with_none_batch_size(self): + params = exp_factory.get_exp_config('retinanet_resnetfpn_coco') + detection.DetectionModule( + params, batch_size=None, input_image_size=[640, 640]) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/beta/serving/export_base.py b/official/vision/serving/export_base.py similarity index 92% rename from official/vision/beta/serving/export_base.py rename to official/vision/serving/export_base.py index efdc61e60a4511cef7dd64a0df7f828a79fcc9c8..a40132a438b6fadac7e1d3b42d431bbe7aedbfbd 100644 --- a/official/vision/beta/serving/export_base.py +++ b/official/vision/serving/export_base.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License. -# Lint as: python3 """Base class for model export.""" import abc @@ -33,7 +32,8 @@ class ExportModule(export_base.ExportModule, metaclass=abc.ABCMeta): input_image_size: List[int], input_type: str = 'image_tensor', num_channels: int = 3, - model: Optional[tf.keras.Model] = None): + model: Optional[tf.keras.Model] = None, + input_name: Optional[str] = None): """Initializes a module for export. Args: @@ -44,12 +44,14 @@ class ExportModule(export_base.ExportModule, metaclass=abc.ABCMeta): input_type: The input signature type. num_channels: The number of the image channels. model: A tf.keras.Model instance to be exported. + input_name: A customized input tensor name. """ self.params = params self._batch_size = batch_size self._input_image_size = input_image_size self._num_channels = num_channels self._input_type = input_type + self._input_name = input_name if model is None: model = self._build_model() # pylint: disable=assignment-from-none super().__init__(params=params, model=model) @@ -164,19 +166,20 @@ class ExportModule(export_base.ExportModule, metaclass=abc.ABCMeta): input_signature = tf.TensorSpec( shape=[self._batch_size] + [None] * len(self._input_image_size) + [self._num_channels], - dtype=tf.uint8) + dtype=tf.uint8, + name=self._input_name) signatures[ def_name] = self.inference_from_image_tensors.get_concrete_function( input_signature) elif key == 'image_bytes': input_signature = tf.TensorSpec( - shape=[self._batch_size], dtype=tf.string) + shape=[self._batch_size], dtype=tf.string, name=self._input_name) signatures[ def_name] = self.inference_from_image_bytes.get_concrete_function( input_signature) elif key == 'serve_examples' or key == 'tf_example': input_signature = tf.TensorSpec( - shape=[self._batch_size], dtype=tf.string) + shape=[self._batch_size], dtype=tf.string, name=self._input_name) signatures[ def_name] = 
self.inference_from_tf_example.get_concrete_function( input_signature) @@ -184,7 +187,8 @@ class ExportModule(export_base.ExportModule, metaclass=abc.ABCMeta): input_signature = tf.TensorSpec( shape=[self._batch_size] + self._input_image_size + [self._num_channels], - dtype=tf.float32) + dtype=tf.float32, + name=self._input_name) signatures[def_name] = self.inference_for_tflite.get_concrete_function( input_signature) else: diff --git a/official/vision/beta/serving/export_base_v2.py b/official/vision/serving/export_base_v2.py similarity index 97% rename from official/vision/beta/serving/export_base_v2.py rename to official/vision/serving/export_base_v2.py index f3f148bb5477a4e22947ee335bfa44ccb4ade047..25469b1bb6ddb4397879f4b00ebc293189d71e50 100644 --- a/official/vision/beta/serving/export_base_v2.py +++ b/official/vision/serving/export_base_v2.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/beta/serving/export_base_v2_test.py b/official/vision/serving/export_base_v2_test.py similarity index 95% rename from official/vision/beta/serving/export_base_v2_test.py rename to official/vision/serving/export_base_v2_test.py index a1bb2a36f5b133af49ccfd9ee1d8915d9e3e3233..16ac8a13cb8207ad059eaf6ec08109bb98c0739b 100644 --- a/official/vision/beta/serving/export_base_v2_test.py +++ b/official/vision/serving/export_base_v2_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -18,7 +18,7 @@ import os import tensorflow as tf from official.core import export_base -from official.vision.beta.serving import export_base_v2 +from official.vision.serving import export_base_v2 class TestModel(tf.keras.Model): diff --git a/official/vision/serving/export_module_factory.py b/official/vision/serving/export_module_factory.py new file mode 100644 index 0000000000000000000000000000000000000000..123821af618db7e8b01020e31fc9da23fe5905b5 --- /dev/null +++ b/official/vision/serving/export_module_factory.py @@ -0,0 +1,89 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Factory for vision export modules.""" + +from typing import List, Optional + +import tensorflow as tf + +from official.core import config_definitions as cfg +from official.vision import configs +from official.vision.dataloaders import classification_input +from official.vision.modeling import factory +from official.vision.serving import export_base_v2 as export_base +from official.vision.serving import export_utils + + +def create_classification_export_module(params: cfg.ExperimentConfig, + input_type: str, + batch_size: int, + input_image_size: List[int], + num_channels: int = 3): + """Creates a classification export module.""" + input_signature = export_utils.get_image_input_signatures( + input_type, batch_size, input_image_size, num_channels) + input_specs = tf.keras.layers.InputSpec( + shape=[batch_size] + input_image_size + [num_channels]) + + model = factory.build_classification_model( + input_specs=input_specs, + model_config=params.task.model, + l2_regularizer=None) + + def preprocess_fn(inputs): + image_tensor = export_utils.parse_image(inputs, input_type, + input_image_size, num_channels) + # If input_type is `tflite`, do not apply image preprocessing.
+ if input_type == 'tflite': + return image_tensor + + def preprocess_image_fn(inputs): + return classification_input.Parser.inference_fn( + inputs, input_image_size, num_channels) + + images = tf.map_fn( + preprocess_image_fn, elems=image_tensor, + fn_output_signature=tf.TensorSpec( + shape=input_image_size + [num_channels], + dtype=tf.float32)) + + return images + + def postprocess_fn(logits): + probs = tf.nn.softmax(logits) + return {'logits': logits, 'probs': probs} + + export_module = export_base.ExportModule(params, + model=model, + input_signature=input_signature, + preprocessor=preprocess_fn, + postprocessor=postprocess_fn) + return export_module + + +def get_export_module(params: cfg.ExperimentConfig, + input_type: str, + batch_size: Optional[int], + input_image_size: List[int], + num_channels: int = 3) -> export_base.ExportModule: + """Factory for export modules.""" + if isinstance(params.task, + configs.image_classification.ImageClassificationTask): + export_module = create_classification_export_module( + params, input_type, batch_size, input_image_size, num_channels) + else: + raise ValueError('Export module not implemented for {} task.'.format( + type(params.task))) + return export_module diff --git a/official/vision/beta/serving/export_module_factory_test.py b/official/vision/serving/export_module_factory_test.py similarity index 94% rename from official/vision/beta/serving/export_module_factory_test.py rename to official/vision/serving/export_module_factory_test.py index 4115611c444ed9a3557fc3c85ca784b19d09dae4..4d96db87a613cea48fe06b5fb2275bf0a7ac1f7d 100644 --- a/official/vision/beta/serving/export_module_factory_test.py +++ b/official/vision/serving/export_module_factory_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -22,11 +22,11 @@ import numpy as np from PIL import Image import tensorflow as tf -from official.common import registry_imports # pylint: disable=unused-import from official.core import exp_factory from official.core import export_base -from official.vision.beta.dataloaders import classification_input -from official.vision.beta.serving import export_module_factory +from official.vision import registry_imports # pylint: disable=unused-import +from official.vision.dataloaders import classification_input +from official.vision.serving import export_module_factory class ImageClassificationExportTest(tf.test.TestCase, parameterized.TestCase): diff --git a/official/vision/serving/export_saved_model.py b/official/vision/serving/export_saved_model.py new file mode 100644 index 0000000000000000000000000000000000000000..c878c7ea1af344b882b4bcd9da3001818b46dfb3 --- /dev/null +++ b/official/vision/serving/export_saved_model.py @@ -0,0 +1,116 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Vision models export binary for serving/inference. 
+ +To export a trained checkpoint in saved_model format (shell script): + +EXPERIMENT_TYPE=XX +CHECKPOINT_PATH=XX +EXPORT_DIR_PATH=XX +export_saved_model --experiment=${EXPERIMENT_TYPE} \ + --export_dir=${EXPORT_DIR_PATH}/ \ + --checkpoint_path=${CHECKPOINT_PATH} \ + --batch_size=2 \ + --input_image_size=224,224 + +To serve (python): + +export_dir_path = XX +input_type = XX +input_images = XX +imported = tf.saved_model.load(export_dir_path) +model_fn = imported.signatures['serving_default'] +output = model_fn(input_images) +""" + +from absl import app +from absl import flags + +from official.core import exp_factory +from official.modeling import hyperparams +from official.vision import registry_imports # pylint: disable=unused-import +from official.vision.serving import export_saved_model_lib + +FLAGS = flags.FLAGS + +_EXPERIMENT = flags.DEFINE_string( + 'experiment', None, 'experiment type, e.g. retinanet_resnetfpn_coco') +_EXPORT_DIR = flags.DEFINE_string('export_dir', None, 'The export directory.') +_CHECKPOINT_PATH = flags.DEFINE_string('checkpoint_path', None, + 'Checkpoint path.') +_CONFIG_FILE = flags.DEFINE_multi_string( + 'config_file', + default=None, + help='YAML/JSON files that specify overrides. The override order ' + 'follows the order of args. Note that each file ' + 'can be used as an override template to override the default parameters ' + 'specified in Python.
If the same parameter is specified in both ' + '`--config_file` and `--params_override`, `config_file` will be used ' + 'first, followed by params_override.') +_PARAMS_OVERRIDE = flags.DEFINE_string( + 'params_override', '', + 'The JSON/YAML file or string which specifies the parameters to be overridden' + ' on top of `config_file` template.') +_BATCH_SIZE = flags.DEFINE_integer('batch_size', None, 'The batch size.') +_IMAGE_TYPE = flags.DEFINE_string( + 'input_type', 'image_tensor', + 'One of `image_tensor`, `image_bytes`, `tf_example` and `tflite`.') +_INPUT_IMAGE_SIZE = flags.DEFINE_string( + 'input_image_size', '224,224', + 'The comma-separated string of two integers representing the height,width ' + 'of the input to the model.') +_EXPORT_CHECKPOINT_SUBDIR = flags.DEFINE_string( + 'export_checkpoint_subdir', 'checkpoint', + 'The subdirectory for checkpoints.') +_EXPORT_SAVED_MODEL_SUBDIR = flags.DEFINE_string( + 'export_saved_model_subdir', 'saved_model', + 'The subdirectory for saved model.') +_LOG_MODEL_FLOPS_AND_PARAMS = flags.DEFINE_bool( + 'log_model_flops_and_params', False, + 'If true, logs model flops and parameters.') +_INPUT_NAME = flags.DEFINE_string( + 'input_name', None, + 'Input tensor name in signature def.
Defaults to None, which ' + 'produces an input tensor named `inputs`.') + + +def main(_): + + params = exp_factory.get_exp_config(_EXPERIMENT.value) + for config_file in _CONFIG_FILE.value or []: + params = hyperparams.override_params_dict( + params, config_file, is_strict=True) + if _PARAMS_OVERRIDE.value: + params = hyperparams.override_params_dict( + params, _PARAMS_OVERRIDE.value, is_strict=True) + + params.validate() + params.lock() + + export_saved_model_lib.export_inference_graph( + input_type=_IMAGE_TYPE.value, + batch_size=_BATCH_SIZE.value, + input_image_size=[int(x) for x in _INPUT_IMAGE_SIZE.value.split(',')], + params=params, + checkpoint_path=_CHECKPOINT_PATH.value, + export_dir=_EXPORT_DIR.value, + export_checkpoint_subdir=_EXPORT_CHECKPOINT_SUBDIR.value, + export_saved_model_subdir=_EXPORT_SAVED_MODEL_SUBDIR.value, + log_model_flops_and_params=_LOG_MODEL_FLOPS_AND_PARAMS.value, + input_name=_INPUT_NAME.value) + + +if __name__ == '__main__': + app.run(main) diff --git a/official/vision/beta/serving/export_saved_model_lib.py b/official/vision/serving/export_saved_model_lib.py similarity index 80% rename from official/vision/beta/serving/export_saved_model_lib.py rename to official/vision/serving/export_saved_model_lib.py index dd8599dd3b30c848c91e9c8322fbf09714c4e1f8..72a4881509cdd23ee9155ec040693b08a5554b38 100644 --- a/official/vision/beta/serving/export_saved_model_lib.py +++ b/official/vision/serving/export_saved_model_lib.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,11 +12,10 @@ # See the License for the specific language governing permissions and # limitations under the License.
-# Lint as: python3 r"""Vision models export utility function for serving/inference.""" import os -from typing import Optional, List +from typing import Optional, List, Union, Text, Dict from absl import logging import tensorflow as tf @@ -24,11 +23,11 @@ import tensorflow as tf from official.core import config_definitions as cfg from official.core import export_base from official.core import train_utils -from official.vision.beta import configs -from official.vision.beta.serving import detection -from official.vision.beta.serving import image_classification -from official.vision.beta.serving import semantic_segmentation -from official.vision.beta.serving import video_classification +from official.vision import configs +from official.vision.serving import detection +from official.vision.serving import image_classification +from official.vision.serving import semantic_segmentation +from official.vision.serving import video_classification def export_inference_graph( @@ -43,7 +42,10 @@ def export_inference_graph( export_checkpoint_subdir: Optional[str] = None, export_saved_model_subdir: Optional[str] = None, save_options: Optional[tf.saved_model.SaveOptions] = None, - log_model_flops_and_params: bool = False): + log_model_flops_and_params: bool = False, + checkpoint: Optional[tf.train.Checkpoint] = None, + input_name: Optional[str] = None, + function_keys: Optional[Union[List[Text], Dict[Text, Text]]] = None,): """Exports inference graph for the model specified in the exp config. Saved model is stored at export_dir/saved_model, checkpoint is saved @@ -67,6 +69,13 @@ def export_inference_graph( save_options: `SaveOptions` for `tf.saved_model.save`. log_model_flops_and_params: If True, writes model FLOPs to model_flops.txt and model parameters to model_params.txt. + checkpoint: An optional tf.train.Checkpoint. If provided, the export module + will use it to read the weights. 
+ input_name: The input tensor name. Defaults to `None`, which produces an + input tensor named `inputs`. + function_keys: A list of string keys to retrieve pre-defined serving + signatures. The signature keys will be set with defaults. If a dictionary + is provided, its values will be used as signature keys. """ if export_checkpoint_subdir: @@ -90,7 +99,8 @@ def export_inference_graph( batch_size=batch_size, input_image_size=input_image_size, input_type=input_type, - num_channels=num_channels) + num_channels=num_channels, + input_name=input_name) elif isinstance(params.task, configs.retinanet.RetinaNetTask) or isinstance( params.task, configs.maskrcnn.MaskRCNNTask): export_module = detection.DetectionModule( @@ -98,7 +108,8 @@ def export_inference_graph( batch_size=batch_size, input_image_size=input_image_size, input_type=input_type, - num_channels=num_channels) + num_channels=num_channels, + input_name=input_name) elif isinstance(params.task, configs.semantic_segmentation.SemanticSegmentationTask): export_module = semantic_segmentation.SegmentationModule( @@ -106,7 +117,8 @@ def export_inference_graph( batch_size=batch_size, input_image_size=input_image_size, input_type=input_type, - num_channels=num_channels) + num_channels=num_channels, + input_name=input_name) elif isinstance(params.task, configs.video_classification.VideoClassificationTask): export_module = video_classification.VideoClassificationModule( @@ -114,15 +126,17 @@ def export_inference_graph( batch_size=batch_size, input_image_size=input_image_size, input_type=input_type, - num_channels=num_channels) + num_channels=num_channels, + input_name=input_name) else: raise ValueError('Export module not implemented for {} task.'.format( type(params.task))) export_base.export( export_module, - function_keys=[input_type], + function_keys=function_keys if function_keys else [input_type], export_savedmodel_dir=output_saved_model_directory, + checkpoint=checkpoint, checkpoint_path=checkpoint_path, 
timestamped=False,
save_options=save_options) diff --git a/official/vision/beta/serving/export_saved_model_lib_test.py b/official/vision/serving/export_saved_model_lib_test.py similarity index 93% rename from official/vision/beta/serving/export_saved_model_lib_test.py rename to official/vision/serving/export_saved_model_lib_test.py index 82e5c8ba2e0810cfb060631e7c6ab1d88a69ee7a..5cbacbd342b1d23be1f0d336b76183ae469db381 100644 --- a/official/vision/beta/serving/export_saved_model_lib_test.py +++ b/official/vision/serving/export_saved_model_lib_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -20,8 +20,8 @@ from unittest import mock import tensorflow as tf from official.core import export_base -from official.vision.beta import configs -from official.vision.beta.serving import export_saved_model_lib +from official.vision import configs +from official.vision.serving import export_saved_model_lib class WriteModelFlopsAndParamsTest(tf.test.TestCase): diff --git a/official/vision/beta/serving/export_saved_model_lib_v2.py b/official/vision/serving/export_saved_model_lib_v2.py similarity index 85% rename from official/vision/beta/serving/export_saved_model_lib_v2.py rename to official/vision/serving/export_saved_model_lib_v2.py index 6260ad6cba6d0334addc7c50cf7dfef11a3ad9cb..8657f3e1d2a87659e5b10d266266fd37c618f675 100644 --- a/official/vision/beta/serving/export_saved_model_lib_v2.py +++ b/official/vision/serving/export_saved_model_lib_v2.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -15,14 +15,14 @@ r"""Vision models export utility function for serving/inference.""" import os -from typing import Optional, List +from typing import Optional, List, Union, Text, Dict import tensorflow as tf from official.core import config_definitions as cfg from official.core import export_base from official.core import train_utils -from official.vision.beta.serving import export_module_factory +from official.vision.serving import export_module_factory def export( @@ -36,6 +36,7 @@ def export( export_module: Optional[export_base.ExportModule] = None, export_checkpoint_subdir: Optional[str] = None, export_saved_model_subdir: Optional[str] = None, + function_keys: Optional[Union[List[Text], Dict[Text, Text]]] = None, save_options: Optional[tf.saved_model.SaveOptions] = None): """Exports the model specified in the exp config. @@ -57,6 +58,9 @@ def export( to store checkpoint. export_saved_model_subdir: Optional subdirectory under export_dir to store saved model. + function_keys: A list of string keys to retrieve pre-defined serving + signatures. The signature keys will be set with defaults. If a dictionary + is provided, its values will be used as signature keys. save_options: `SaveOptions` for `tf.saved_model.save`. """ @@ -81,7 +85,7 @@ def export( export_base.export( export_module, - function_keys=[input_type], + function_keys=function_keys if function_keys else [input_type], export_savedmodel_dir=output_saved_model_directory, checkpoint_path=checkpoint_path, timestamped=False, diff --git a/official/vision/serving/export_tfhub.py b/official/vision/serving/export_tfhub.py new file mode 100644 index 0000000000000000000000000000000000000000..b6e939f25719f33912d523359e02e1df31d8ca4c --- /dev/null +++ b/official/vision/serving/export_tfhub.py @@ -0,0 +1,104 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""A script to export an image classification model as a TF-Hub SavedModel.""" + +# Import libraries +from absl import app +from absl import flags + +import tensorflow as tf + +from official.core import exp_factory +from official.modeling import hyperparams +from official.vision import registry_imports # pylint: disable=unused-import +from official.vision.modeling import factory + + +FLAGS = flags.FLAGS + +flags.DEFINE_string( + 'experiment', None, 'experiment type, e.g. resnet_imagenet') +flags.DEFINE_string( + 'checkpoint_path', None, 'Checkpoint path.') +flags.DEFINE_string( + 'export_path', None, 'The export directory.') +flags.DEFINE_multi_string( + 'config_file', + None, + 'YAML/JSON files which specify overrides. The override order ' + 'follows the order of args. Note that each file ' + 'can be used as an override template to override the default parameters ' + 'specified in Python.
If the same parameter is specified in both ' + '`--config_file` and `--params_override`, `config_file` will be used ' + 'first, followed by params_override.') +flags.DEFINE_string( + 'params_override', '', + 'The JSON/YAML file or string which specifies the parameters to be overridden' + ' on top of `config_file` template.') +flags.DEFINE_integer( + 'batch_size', None, 'The batch size.') +flags.DEFINE_string( + 'input_image_size', + '224,224', + 'The comma-separated string of two integers representing the height,width ' + 'of the input to the model.') +flags.DEFINE_boolean( + 'skip_logits_layer', + False, + 'Whether to skip the prediction layer and only output the feature vector.') + + +def export_model_to_tfhub(params, + batch_size, + input_image_size, + skip_logits_layer, + checkpoint_path, + export_path): + """Export an image classification model to TF-Hub.""" + input_specs = tf.keras.layers.InputSpec(shape=[batch_size] + + input_image_size + [3]) + + model = factory.build_classification_model( + input_specs=input_specs, + model_config=params.task.model, + l2_regularizer=None, + skip_logits_layer=skip_logits_layer) + checkpoint = tf.train.Checkpoint(model=model) + checkpoint.restore(checkpoint_path).assert_existing_objects_matched() + model.save(export_path, include_optimizer=False, save_format='tf') + + +def main(_): + params = exp_factory.get_exp_config(FLAGS.experiment) + for config_file in FLAGS.config_file or []: + params = hyperparams.override_params_dict( + params, config_file, is_strict=True) + if FLAGS.params_override: + params = hyperparams.override_params_dict( + params, FLAGS.params_override, is_strict=True) + params.validate() + params.lock() + + export_model_to_tfhub( + params=params, + batch_size=FLAGS.batch_size, + input_image_size=[int(x) for x in FLAGS.input_image_size.split(',')], + skip_logits_layer=FLAGS.skip_logits_layer, + checkpoint_path=FLAGS.checkpoint_path, + export_path=FLAGS.export_path) + + +if __name__ == '__main__': + app.run(main)
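The `main()` functions in these export scripts share one flow: start from the registered experiment config, apply each `--config_file` in order, then apply `--params_override` last, and parse `--input_image_size` into the `[batch_size, height, width, 3]` shape passed to the Keras `InputSpec`. The pure-Python sketch below mimics that flow on plain nested dicts. It is an illustration under stated assumptions, not the Model Garden API: `apply_overrides`, `parse_image_size` and `input_spec_shape` are hypothetical helpers, and the unknown-key `KeyError` only loosely imitates `is_strict=True` in `hyperparams.override_params_dict`.

```python
"""Hypothetical sketch of the override and input-shape flow in the export
scripts' main() functions. These helpers are NOT part of Model Garden."""
from typing import Any, Dict, List, Optional


def apply_overrides(base: Dict[str, Any],
                    overrides: List[Dict[str, Any]]) -> Dict[str, Any]:
  """Applies override dicts in order; later overrides win.

  Loosely imitates is_strict=True: a key that does not already exist in the
  base config raises KeyError. The input dict is never mutated.
  """
  result = dict(base)
  for override in overrides:
    for key, value in override.items():
      if key not in result:
        raise KeyError(f'Unknown parameter: {key}')
      if isinstance(value, dict) and isinstance(result[key], dict):
        # Recurse into nested sections such as task.model.
        result[key] = apply_overrides(result[key], [value])
      else:
        result[key] = value
  return result


def parse_image_size(value: str) -> List[int]:
  """Parses a 'height,width' flag value such as '224,224' into [224, 224]."""
  size = [int(x) for x in value.split(',')]
  if len(size) != 2:
    raise ValueError(f'Expected "height,width", got {value!r}.')
  return size


def input_spec_shape(batch_size: Optional[int],
                     image_size: List[int],
                     num_channels: int = 3) -> List[Optional[int]]:
  """Mirrors `[batch_size] + input_image_size + [3]` in the export code."""
  return [batch_size] + image_size + [num_channels]


if __name__ == '__main__':
  defaults = {'task': {'model': {'num_classes': 1001},
                       'train_data': {'global_batch_size': 64}}}
  config_files = [{'task': {'model': {'num_classes': 10}}}]  # --config_file
  params_override = {'task': {'model': {'num_classes': 5}}}  # --params_override
  params = apply_overrides(defaults, config_files + [params_override])
  print(params['task']['model']['num_classes'])  # 5: params_override wins
  print(input_spec_shape(None, parse_image_size('224,224')))  # [None, 224, 224, 3]
```

Applying `--params_override` after every `--config_file` is what lets a one-off command-line value take precedence over a shared YAML template.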
+ diff --git a/official/vision/serving/export_tflite.py b/official/vision/serving/export_tflite.py new file mode 100644 index 0000000000000000000000000000000000000000..e20374630af004e291b1cf6a2090c2756556940e --- /dev/null +++ b/official/vision/serving/export_tflite.py @@ -0,0 +1,123 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +r"""Binary to convert a SavedModel to a TFLite model. + +It requires a SavedModel exported by export_saved_model.py with batch size 1 +and input type `tflite`, together with the same config file used to export that +SavedModel. Post-training quantization is optional. For integer quantization, +calibration steps must be provided to calibrate the model input.
+ +To convert a SavedModel to a TFLite model: + +EXPERIMENT_TYPE=XX +TFLITE_PATH=XX +SAVED_MODEL_DIR=XX +CONFIG_FILE=XX +export_tflite --experiment=${EXPERIMENT_TYPE} \ + --saved_model_dir=${SAVED_MODEL_DIR} \ + --tflite_path=${TFLITE_PATH} \ + --config_file=${CONFIG_FILE} \ + --quant_type=fp16 \ + --calibration_steps=500 +""" +from absl import app +from absl import flags +from absl import logging + +import tensorflow as tf +from official.core import exp_factory +from official.modeling import hyperparams +from official.vision import registry_imports # pylint: disable=unused-import +from official.vision.serving import export_tflite_lib + +FLAGS = flags.FLAGS + +_EXPERIMENT = flags.DEFINE_string( + 'experiment', + None, + 'experiment type, e.g. retinanet_resnetfpn_coco', + required=True) +_CONFIG_FILE = flags.DEFINE_multi_string( + 'config_file', + default='', + help='YAML/JSON files which specify overrides. The override order ' + 'follows the order of args. Note that each file ' + 'can be used as an override template to override the default parameters ' + 'specified in Python. If the same parameter is specified in both ' + '`--config_file` and `--params_override`, `config_file` will be used ' + 'first, followed by params_override.') +_PARAMS_OVERRIDE = flags.DEFINE_string( + 'params_override', '', + 'The JSON/YAML file or string which specifies the parameters to be overridden' + ' on top of `config_file` template.') +_SAVED_MODEL_DIR = flags.DEFINE_string( + 'saved_model_dir', None, 'The directory to the saved model.', required=True) +_TFLITE_PATH = flags.DEFINE_string( + 'tflite_path', None, 'The path to the output tflite model.', required=True) +_QUANT_TYPE = flags.DEFINE_string( + 'quant_type', + default=None, + help='Post training quantization type. Supports `int8_fallback`, ' + '`int8_full_fp32_io`, `int8_full`, `fp16`, `qat`, `qat_fp32_io`, ' + '`int8_full_int8_io` and `default`.
See ' + 'https://www.tensorflow.org/lite/performance/post_training_quantization ' + 'for more details.') +_CALIBRATION_STEPS = flags.DEFINE_integer( + 'calibration_steps', 500, + 'The number of calibration steps for integer quantization.') +_DENYLISTED_OPS = flags.DEFINE_string( + 'denylisted_ops', '', 'The comma-separated string of ops ' + 'that are excluded from integer quantization. The names of ' + 'ops should be in all capital letters, such as CAST or GREATER. ' + 'This is useful to exclude certain ops that affect quality or latency. ' + 'Quantization-friendly ops, such as CONV_2D, DEPTHWISE_CONV_2D and ' + 'FULLY_CONNECTED, should not be included.') + + +def main(_) -> None: + params = exp_factory.get_exp_config(_EXPERIMENT.value) + if _CONFIG_FILE.value is not None: + for config_file in _CONFIG_FILE.value: + params = hyperparams.override_params_dict( + params, config_file, is_strict=True) + if _PARAMS_OVERRIDE.value: + params = hyperparams.override_params_dict( + params, _PARAMS_OVERRIDE.value, is_strict=True) + + params.validate() + params.lock() + + logging.info('Converting SavedModel from %s to TFLite model...', + _SAVED_MODEL_DIR.value) + + if _DENYLISTED_OPS.value: + denylisted_ops = list(_DENYLISTED_OPS.value.split(',')) + else: + denylisted_ops = None + tflite_model = export_tflite_lib.convert_tflite_model( + saved_model_dir=_SAVED_MODEL_DIR.value, + quant_type=_QUANT_TYPE.value, + params=params, + calibration_steps=_CALIBRATION_STEPS.value, + denylisted_ops=denylisted_ops) + + with tf.io.gfile.GFile(_TFLITE_PATH.value, 'wb') as fw: + fw.write(tflite_model) + + logging.info('TFLite model converted and saved to %s.', _TFLITE_PATH.value) + + +if __name__ == '__main__': + app.run(main) diff --git a/official/vision/serving/export_tflite_lib.py b/official/vision/serving/export_tflite_lib.py new file mode 100644 index 0000000000000000000000000000000000000000..d94abb557061120e46731bfb026865b70c8d9a07 --- /dev/null +++
b/official/vision/serving/export_tflite_lib.py @@ -0,0 +1,161 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Library to facilitate TFLite model conversion.""" +import functools +from typing import Iterator, List, Optional + +from absl import logging +import tensorflow as tf + +from official.core import base_task +from official.core import config_definitions as cfg +from official.vision import configs +from official.vision import tasks + + +def create_representative_dataset( + params: cfg.ExperimentConfig, + task: Optional[base_task.Task] = None) -> tf.data.Dataset: + """Creates a tf.data.Dataset to load images for representative dataset. + + Args: + params: An ExperimentConfig. + task: An optional task instance. If it is None, task will be built according + to the task type in params. + + Returns: + A tf.data.Dataset instance. + + Raises: + ValueError: If task is not supported. 
+ """ + if task is None: + if isinstance(params.task, + configs.image_classification.ImageClassificationTask): + + task = tasks.image_classification.ImageClassificationTask(params.task) + elif isinstance(params.task, configs.retinanet.RetinaNetTask): + task = tasks.retinanet.RetinaNetTask(params.task) + elif isinstance(params.task, configs.maskrcnn.MaskRCNNTask): + task = tasks.maskrcnn.MaskRCNNTask(params.task) + elif isinstance(params.task, + configs.semantic_segmentation.SemanticSegmentationTask): + task = tasks.semantic_segmentation.SemanticSegmentationTask(params.task) + else: + raise ValueError('Task {} not supported.'.format(type(params.task))) + # Ensure batch size is 1 for TFLite model. + params.task.train_data.global_batch_size = 1 + params.task.train_data.dtype = 'float32' + logging.info('Task config: %s', params.task.as_dict()) + return task.build_inputs(params=params.task.train_data) + + +def representative_dataset( + params: cfg.ExperimentConfig, + task: Optional[base_task.Task] = None, + calibration_steps: int = 2000) -> Iterator[List[tf.Tensor]]: + """Creates representative dataset for input calibration. + + Args: + params: An ExperimentConfig. + task: An optional task instance. If it is None, task will be built according + to the task type in params. + calibration_steps: The steps to do calibration. + + Yields: + A list containing a single input image tensor. + """ + dataset = create_representative_dataset(params=params, task=task) + for image, _ in dataset.take(calibration_steps): + # Skip images that do not have 3 channels. + if image.shape[-1] != 3: + continue + yield [image] + + +def convert_tflite_model(saved_model_dir: str, + quant_type: Optional[str] = None, + params: Optional[cfg.ExperimentConfig] = None, + task: Optional[base_task.Task] = None, + calibration_steps: Optional[int] = 2000, + denylisted_ops: Optional[List[str]] = None) -> bytes: + """Converts and returns a TFLite model. + + Args: + saved_model_dir: The directory to the SavedModel.
+ quant_type: The post training quantization (PTQ) method. It can be one of + `default` (dynamic range), `fp16` (float16), `int8_fallback` (integer with + float fallback), `int8_full` (integer only, uint8 input/output), + `int8_full_fp32_io` (integer only, float32 input/output), + `int8_full_int8_io` (integer only, int8 input/output), `qat`, + `qat_fp32_io` and None (no quantization). + params: An optional ExperimentConfig to load and preprocess input images to + do calibration for integer quantization. + task: An optional task instance. If it is None, task will be built according + to the task type in params. + calibration_steps: The steps to do calibration. + denylisted_ops: A list of strings containing ops that are excluded from + integer quantization. + + Returns: + A converted TFLite model with optional PTQ. + + Raises: + ValueError: If `quant_type` is not supported. + """ + converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir) + if quant_type: + if quant_type.startswith('int8'): + converter.optimizations = [tf.lite.Optimize.DEFAULT] + converter.representative_dataset = functools.partial( + representative_dataset, + params=params, + task=task, + calibration_steps=calibration_steps) + if quant_type.startswith('int8_full'): + converter.target_spec.supported_ops = [ + tf.lite.OpsSet.TFLITE_BUILTINS_INT8 + ] + if quant_type == 'int8_full': + converter.inference_input_type = tf.uint8 + converter.inference_output_type = tf.uint8 + if quant_type == 'int8_full_int8_io': + converter.inference_input_type = tf.int8 + converter.inference_output_type = tf.int8 + + if denylisted_ops: + debug_options = tf.lite.experimental.QuantizationDebugOptions( + denylisted_ops=denylisted_ops) + debugger = tf.lite.experimental.QuantizationDebugger( + converter=converter, + debug_dataset=functools.partial( + representative_dataset, + params=params, + task=task, + calibration_steps=calibration_steps), + debug_options=debug_options) + debugger.run() + return debugger.get_nondebug_quantized_model() + + elif quant_type == 'fp16': + 
converter.optimizations = [tf.lite.Optimize.DEFAULT] +
converter.target_spec.supported_types = [tf.float16] + elif quant_type in ('default', 'qat_fp32_io'): + converter.optimizations = [tf.lite.Optimize.DEFAULT] + elif quant_type == 'qat': + converter.optimizations = [tf.lite.Optimize.DEFAULT] + converter.inference_input_type = tf.uint8 # or tf.int8 + converter.inference_output_type = tf.uint8 # or tf.int8 + else: + raise ValueError(f'quantization type {quant_type} is not supported.') + + return converter.convert() diff --git a/official/vision/serving/export_tflite_lib_test.py b/official/vision/serving/export_tflite_lib_test.py new file mode 100644 index 0000000000000000000000000000000000000000..e7010d7fe7108f77a40f6e00ef049ad058525ea2 --- /dev/null +++ b/official/vision/serving/export_tflite_lib_test.py @@ -0,0 +1,192 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Tests for export_tflite_lib.""" +import os + +from absl.testing import parameterized +import tensorflow as tf + +from tensorflow.python.distribute import combinations +from official.core import exp_factory +from official.vision import registry_imports # pylint: disable=unused-import +from official.vision.dataloaders import tfexample_utils +from official.vision.serving import detection as detection_serving +from official.vision.serving import export_tflite_lib +from official.vision.serving import image_classification as image_classification_serving +from official.vision.serving import semantic_segmentation as semantic_segmentation_serving + + +class ExportTfliteLibTest(tf.test.TestCase, parameterized.TestCase): + + def setUp(self): + super().setUp() + # Create test data for image classification. + self.test_tfrecord_file_cls = os.path.join(self.get_temp_dir(), + 'cls_test.tfrecord') + example = tf.train.Example.FromString( + tfexample_utils.create_classification_example( + image_height=224, image_width=224)) + self._create_test_tfrecord( + tfrecord_file=self.test_tfrecord_file_cls, + example=example, + num_samples=10) + + # Create test data for object detection. + self.test_tfrecord_file_det = os.path.join(self.get_temp_dir(), + 'det_test.tfrecord') + example = tfexample_utils.create_detection_test_example( + image_height=128, image_width=128, image_channel=3, num_instances=10) + self._create_test_tfrecord( + tfrecord_file=self.test_tfrecord_file_det, + example=example, + num_samples=10) + + # Create test data for semantic segmentation. 
+ self.test_tfrecord_file_seg = os.path.join(self.get_temp_dir(), + 'seg_test.tfrecord') + example = tfexample_utils.create_segmentation_test_example( + image_height=512, image_width=512, image_channel=3) + self._create_test_tfrecord( + tfrecord_file=self.test_tfrecord_file_seg, + example=example, + num_samples=10) + + def _create_test_tfrecord(self, tfrecord_file, example, num_samples): + examples = [example] * num_samples + tfexample_utils.dump_to_tfrecord( + record_file=tfrecord_file, tf_examples=examples) + + def _export_from_module(self, module, input_type, saved_model_dir): + signatures = module.get_inference_signatures( + {input_type: 'serving_default'}) + tf.saved_model.save(module, saved_model_dir, signatures=signatures) + + @combinations.generate( + combinations.combine( + experiment=['mobilenet_imagenet'], + quant_type=[ + None, + 'default', + 'fp16', + 'int8_fallback', + 'int8_full', + 'int8_full_fp32_io', + 'int8_full_int8_io', + ])) + def test_export_tflite_image_classification(self, experiment, quant_type): + + params = exp_factory.get_exp_config(experiment) + params.task.validation_data.input_path = self.test_tfrecord_file_cls + params.task.train_data.input_path = self.test_tfrecord_file_cls + params.task.train_data.shuffle_buffer_size = 10 + temp_dir = self.get_temp_dir() + module = image_classification_serving.ClassificationModule( + params=params, + batch_size=1, + input_image_size=[224, 224], + input_type='tflite') + self._export_from_module( + module=module, + input_type='tflite', + saved_model_dir=os.path.join(temp_dir, 'saved_model')) + + tflite_model = export_tflite_lib.convert_tflite_model( + saved_model_dir=os.path.join(temp_dir, 'saved_model'), + quant_type=quant_type, + params=params, + calibration_steps=5) + + self.assertIsInstance(tflite_model, bytes) + + @combinations.generate( + combinations.combine( + experiment=['retinanet_mobile_coco'], + quant_type=[ + None, + 'default', + 'fp16', + 'int8_fallback', + 'int8_full', + 
'int8_full_fp32_io', + 'int8_full_int8_io', + ])) + def test_export_tflite_detection(self, experiment, quant_type): + + params = exp_factory.get_exp_config(experiment) + params.task.validation_data.input_path = self.test_tfrecord_file_det + params.task.train_data.input_path = self.test_tfrecord_file_det + params.task.model.num_classes = 2 + params.task.model.backbone.spinenet_mobile.model_id = '49XS' + params.task.model.input_size = [128, 128, 3] + params.task.model.detection_generator.nms_version = 'v1' + params.task.train_data.shuffle_buffer_size = 5 + temp_dir = self.get_temp_dir() + module = detection_serving.DetectionModule( + params=params, + batch_size=1, + input_image_size=[128, 128], + input_type='tflite') + self._export_from_module( + module=module, + input_type='tflite', + saved_model_dir=os.path.join(temp_dir, 'saved_model')) + + tflite_model = export_tflite_lib.convert_tflite_model( + saved_model_dir=os.path.join(temp_dir, 'saved_model'), + quant_type=quant_type, + params=params, + calibration_steps=1) + + self.assertIsInstance(tflite_model, bytes) + + @combinations.generate( + combinations.combine( + experiment=['mnv2_deeplabv3_pascal'], + quant_type=[ + None, + 'default', + 'fp16', + 'int8_fallback', + 'int8_full', + 'int8_full_fp32_io', + 'int8_full_int8_io', + ])) + def test_export_tflite_semantic_segmentation(self, experiment, quant_type): + + params = exp_factory.get_exp_config(experiment) + params.task.validation_data.input_path = self.test_tfrecord_file_seg + params.task.train_data.input_path = self.test_tfrecord_file_seg + params.task.train_data.shuffle_buffer_size = 10 + temp_dir = self.get_temp_dir() + module = semantic_segmentation_serving.SegmentationModule( + params=params, + batch_size=1, + input_image_size=[512, 512], + input_type='tflite') + self._export_from_module( + module=module, + input_type='tflite', + saved_model_dir=os.path.join(temp_dir, 'saved_model')) + + tflite_model = export_tflite_lib.convert_tflite_model( + 
saved_model_dir=os.path.join(temp_dir, 'saved_model'), + quant_type=quant_type, + params=params, + calibration_steps=5) + + self.assertIsInstance(tflite_model, bytes) + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/beta/serving/export_utils.py b/official/vision/serving/export_utils.py similarity index 98% rename from official/vision/beta/serving/export_utils.py rename to official/vision/serving/export_utils.py index e3f650d42d440de0a04e4c8cb6c7ccfa56a1a701..5c9c5ea5e21487b3a7ad284a0033769ae24d0c3b 100644 --- a/official/vision/beta/serving/export_utils.py +++ b/official/vision/serving/export_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/serving/image_classification.py b/official/vision/serving/image_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..d10ce1cadf52cb78a83f631c9c498d2ddc20683a --- /dev/null +++ b/official/vision/serving/image_classification.py @@ -0,0 +1,82 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ + """Image classification input and model functions for serving/inference.""" + +import tensorflow as tf + +from official.vision.modeling import factory +from official.vision.ops import preprocess_ops +from official.vision.serving import export_base + + +class ClassificationModule(export_base.ExportModule): + """Classification Module.""" + + def _build_model(self): + input_specs = tf.keras.layers.InputSpec( + shape=[self._batch_size] + self._input_image_size + [3]) + + return factory.build_classification_model( + input_specs=input_specs, + model_config=self.params.task.model, + l2_regularizer=None) + + def _build_inputs(self, image): + """Builds classification model inputs for serving.""" + # Center crops and resizes image. + if self.params.task.train_data.aug_crop: + image = preprocess_ops.center_crop_image(image) + + image = tf.image.resize( + image, self._input_image_size, method=tf.image.ResizeMethod.BILINEAR) + + image = tf.reshape( + image, [self._input_image_size[0], self._input_image_size[1], 3]) + + # Normalizes image with mean and std pixel values. + image = preprocess_ops.normalize_image( + image, offset=preprocess_ops.MEAN_RGB, scale=preprocess_ops.STDDEV_RGB) + return image + + def serve(self, images): + """Cast image to float and run inference. + + Args: + images: uint8 Tensor of shape [batch_size, None, None, 3] + Returns: + A dict holding the classification `logits` and `probs` tensors. + """ + # Skip image preprocessing when input_type is tflite so it is compatible + # with TFLite quantization.
+ if self._input_type != 'tflite': + with tf.device('cpu:0'): + images = tf.cast(images, dtype=tf.float32) + + images = tf.nest.map_structure( + tf.identity, + tf.map_fn( + self._build_inputs, + elems=images, + fn_output_signature=tf.TensorSpec( + shape=self._input_image_size + [3], dtype=tf.float32), + parallel_iterations=32)) + + logits = self.inference_step(images) + if self.params.task.train_data.is_multilabel: + probs = tf.math.sigmoid(logits) + else: + probs = tf.nn.softmax(logits) + + return {'logits': logits, 'probs': probs} diff --git a/official/vision/serving/image_classification_test.py b/official/vision/serving/image_classification_test.py new file mode 100644 index 0000000000000000000000000000000000000000..cc859d9e77d18629fe87c5c7b70771c42983b2dd --- /dev/null +++ b/official/vision/serving/image_classification_test.py @@ -0,0 +1,120 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Test for image classification export lib.""" + +import io +import os + +from absl.testing import parameterized +import numpy as np +from PIL import Image +import tensorflow as tf + +from official.core import exp_factory +from official.vision import registry_imports # pylint: disable=unused-import +from official.vision.serving import image_classification + + +class ImageClassificationExportTest(tf.test.TestCase, parameterized.TestCase): + + def _get_classification_module(self, input_type): + params = exp_factory.get_exp_config('resnet_imagenet') + params.task.model.backbone.resnet.model_id = 18 + classification_module = image_classification.ClassificationModule( + params, + batch_size=1, + input_image_size=[224, 224], + input_type=input_type) + return classification_module + + def _export_from_module(self, module, input_type, save_directory): + signatures = module.get_inference_signatures( + {input_type: 'serving_default'}) + tf.saved_model.save(module, + save_directory, + signatures=signatures) + + def _get_dummy_input(self, input_type): + """Get dummy input for the given input type.""" + + if input_type == 'image_tensor': + return tf.zeros((1, 224, 224, 3), dtype=np.uint8) + elif input_type == 'image_bytes': + image = Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8)) + byte_io = io.BytesIO() + image.save(byte_io, 'PNG') + return [byte_io.getvalue()] + elif input_type == 'tf_example': + image_tensor = tf.zeros((224, 224, 3), dtype=tf.uint8) + encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).numpy() + example = tf.train.Example( + features=tf.train.Features( + feature={ + 'image/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList(value=[encoded_jpeg])), + })).SerializeToString() + return [example] + elif input_type == 'tflite': + return tf.zeros((1, 224, 224, 3), dtype=np.float32) + + @parameterized.parameters( + {'input_type': 'image_tensor'}, + {'input_type': 'image_bytes'}, + {'input_type': 'tf_example'}, + {'input_type': 
'tflite'}, + ) + def test_export(self, input_type='image_tensor'): + tmp_dir = self.get_temp_dir() + module = self._get_classification_module(input_type) + # Test that the model restores any attrs that are trackable objects + # (eg: tables, resource variables, keras models/layers, tf.hub modules). + module.model.test_trackable = tf.keras.layers.InputLayer(input_shape=(4,)) + + self._export_from_module(module, input_type, tmp_dir) + + self.assertTrue(os.path.exists(os.path.join(tmp_dir, 'saved_model.pb'))) + self.assertTrue(os.path.exists( + os.path.join(tmp_dir, 'variables', 'variables.index'))) + self.assertTrue(os.path.exists( + os.path.join(tmp_dir, 'variables', 'variables.data-00000-of-00001'))) + + imported = tf.saved_model.load(tmp_dir) + classification_fn = imported.signatures['serving_default'] + + images = self._get_dummy_input(input_type) + if input_type != 'tflite': + processed_images = tf.nest.map_structure( + tf.stop_gradient, + tf.map_fn( + module._build_inputs, + elems=tf.zeros((1, 224, 224, 3), dtype=tf.uint8), + fn_output_signature=tf.TensorSpec( + shape=[224, 224, 3], dtype=tf.float32))) + else: + processed_images = images + expected_logits = module.model(processed_images, training=False) + expected_prob = tf.nn.softmax(expected_logits) + out = classification_fn(tf.constant(images)) + + # The imported model should contain any trackable attrs that the original + # model had. 
+ self.assertTrue(hasattr(imported.model, 'test_trackable')) + self.assertAllClose(out['logits'].numpy(), expected_logits.numpy()) + self.assertAllClose(out['probs'].numpy(), expected_prob.numpy()) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/serving/semantic_segmentation.py b/official/vision/serving/semantic_segmentation.py new file mode 100644 index 0000000000000000000000000000000000000000..2fc5bd5959cdb1c791072d1c2a63321c48a1cbee --- /dev/null +++ b/official/vision/serving/semantic_segmentation.py @@ -0,0 +1,107 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Semantic segmentation input and model functions for serving/inference.""" + +import tensorflow as tf + +from official.vision.modeling import factory +from official.vision.ops import preprocess_ops +from official.vision.serving import export_base + + +class SegmentationModule(export_base.ExportModule): + """Segmentation Module.""" + + def _build_model(self): + input_specs = tf.keras.layers.InputSpec( + shape=[self._batch_size] + self._input_image_size + [3]) + + return factory.build_segmentation_model( + input_specs=input_specs, + model_config=self.params.task.model, + l2_regularizer=None) + + def _build_inputs(self, image): + """Builds segmentation model inputs for serving.""" + + # Normalizes image with mean and std pixel values.
+ image = preprocess_ops.normalize_image( + image, offset=preprocess_ops.MEAN_RGB, scale=preprocess_ops.STDDEV_RGB) + + if self.params.task.train_data.preserve_aspect_ratio: + image, image_info = preprocess_ops.resize_and_crop_image( + image, + self._input_image_size, + padded_size=self._input_image_size, + aug_scale_min=1.0, + aug_scale_max=1.0) + else: + image, image_info = preprocess_ops.resize_image(image, + self._input_image_size) + return image, image_info + + def serve(self, images): + """Cast image to float and run inference. + + Args: + images: uint8 Tensor of shape [batch_size, None, None, 3] + Returns: + A dict holding the segmentation `logits` (plus `image_info` when inputs are preprocessed). + """ + # Skip image preprocessing when input_type is tflite so it is compatible + # with TFLite quantization. + image_info = None + if self._input_type != 'tflite': + with tf.device('cpu:0'): + images = tf.cast(images, dtype=tf.float32) + images_spec = tf.TensorSpec( + shape=self._input_image_size + [3], dtype=tf.float32) + image_info_spec = tf.TensorSpec(shape=[4, 2], dtype=tf.float32) + + images, image_info = tf.nest.map_structure( + tf.identity, + tf.map_fn( + self._build_inputs, + elems=images, + fn_output_signature=(images_spec, image_info_spec), + parallel_iterations=32)) + + outputs = self.inference_step(images) + + # Optionally resize prediction to the input image size.
+ if self.params.task.export_config.rescale_output: + logits = outputs['logits'] + if logits.shape[0] != 1: + raise ValueError('Batch size cannot be more than 1.') + + image_shape = tf.cast(image_info[0, 0, :], tf.int32) + if self.params.task.train_data.preserve_aspect_ratio: + rescale_size = tf.cast( + tf.math.ceil(image_info[0, 1, :] / image_info[0, 2, :]), tf.int32) + offsets = tf.cast(image_info[0, 3, :], tf.int32) + logits = tf.image.resize(logits, rescale_size, method='bilinear') + outputs['logits'] = tf.image.crop_to_bounding_box( + logits, offsets[0], offsets[1], image_shape[0], image_shape[1]) + else: + outputs['logits'] = tf.image.resize( + logits, [image_shape[0], image_shape[1]], method='bilinear') + else: + outputs['logits'] = tf.image.resize( + outputs['logits'], self._input_image_size, method='bilinear') + + if image_info is not None: + outputs.update({'image_info': image_info}) + + return outputs diff --git a/official/vision/serving/semantic_segmentation_test.py b/official/vision/serving/semantic_segmentation_test.py new file mode 100644 index 0000000000000000000000000000000000000000..0e99fb04d1ef1fdd2bb5e896c18a5d960bee37fc --- /dev/null +++ b/official/vision/serving/semantic_segmentation_test.py @@ -0,0 +1,145 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Test for semantic segmentation export lib.""" + +import io +import os + +from absl.testing import parameterized +import numpy as np +from PIL import Image +import tensorflow as tf + +from official.core import exp_factory +from official.vision import registry_imports # pylint: disable=unused-import +from official.vision.serving import semantic_segmentation + + +class SemanticSegmentationExportTest(tf.test.TestCase, parameterized.TestCase): + + def _get_segmentation_module(self, + input_type, + rescale_output, + preserve_aspect_ratio, + batch_size=1): + params = exp_factory.get_exp_config('mnv2_deeplabv3_pascal') + params.task.export_config.rescale_output = rescale_output + params.task.train_data.preserve_aspect_ratio = preserve_aspect_ratio + segmentation_module = semantic_segmentation.SegmentationModule( + params, + batch_size=batch_size, + input_image_size=[112, 112], + input_type=input_type) + return segmentation_module + + def _export_from_module(self, module, input_type, save_directory): + signatures = module.get_inference_signatures( + {input_type: 'serving_default'}) + tf.saved_model.save(module, save_directory, signatures=signatures) + + def _get_dummy_input(self, input_type, input_image_size): + """Get dummy input for the given input type.""" + + height = input_image_size[0] + width = input_image_size[1] + if input_type == 'image_tensor': + return tf.zeros((1, height, width, 3), dtype=np.uint8) + elif input_type == 'image_bytes': + image = Image.fromarray(np.zeros((height, width, 3), dtype=np.uint8)) + byte_io = io.BytesIO() + image.save(byte_io, 'PNG') + return [byte_io.getvalue()] + elif input_type == 'tf_example': + image_tensor = tf.zeros((height, width, 3), dtype=tf.uint8) + encoded_jpeg = tf.image.encode_jpeg(tf.constant(image_tensor)).numpy() + example = tf.train.Example( + features=tf.train.Features( + feature={ + 'image/encoded': + tf.train.Feature( + bytes_list=tf.train.BytesList(value=[encoded_jpeg])), + })).SerializeToString() + return 
[example] + elif input_type == 'tflite': + return tf.zeros((1, height, width, 3), dtype=np.float32) + + @parameterized.parameters( + ('image_tensor', False, [112, 112], False), + ('image_bytes', False, [112, 112], False), + ('tf_example', False, [112, 112], True), + ('tflite', False, [112, 112], False), + ('image_tensor', True, [112, 56], True), + ('image_bytes', True, [112, 56], True), + ('tf_example', True, [56, 112], False), + ) + def test_export(self, input_type, rescale_output, input_image_size, + preserve_aspect_ratio): + tmp_dir = self.get_temp_dir() + module = self._get_segmentation_module( + input_type=input_type, + rescale_output=rescale_output, + preserve_aspect_ratio=preserve_aspect_ratio) + + self._export_from_module(module, input_type, tmp_dir) + + self.assertTrue(os.path.exists(os.path.join(tmp_dir, 'saved_model.pb'))) + self.assertTrue( + os.path.exists(os.path.join(tmp_dir, 'variables', 'variables.index'))) + self.assertTrue( + os.path.exists( + os.path.join(tmp_dir, 'variables', + 'variables.data-00000-of-00001'))) + + imported = tf.saved_model.load(tmp_dir) + segmentation_fn = imported.signatures['serving_default'] + + images = self._get_dummy_input(input_type, input_image_size) + if input_type != 'tflite': + processed_images, _ = tf.nest.map_structure( + tf.stop_gradient, + tf.map_fn( + module._build_inputs, + elems=tf.zeros((1, 112, 112, 3), dtype=tf.uint8), + fn_output_signature=(tf.TensorSpec( + shape=[112, 112, 3], dtype=tf.float32), + tf.TensorSpec( + shape=[4, 2], dtype=tf.float32)))) + else: + processed_images = images + + logits = module.model(processed_images, training=False)['logits'] + if rescale_output: + expected_output = tf.image.resize( + logits, input_image_size, method='bilinear') + else: + expected_output = tf.image.resize(logits, [112, 112], method='bilinear') + out = segmentation_fn(tf.constant(images)) + self.assertAllClose(out['logits'].numpy(), expected_output.numpy()) + + def test_export_invalid_batch_size(self): + 
batch_size = 3 + tmp_dir = self.get_temp_dir() + module = self._get_segmentation_module( + input_type='image_tensor', + rescale_output=True, + preserve_aspect_ratio=False, + batch_size=batch_size) + with self.assertRaisesRegex(ValueError, + 'Batch size cannot be more than 1.'): + self._export_from_module(module, 'image_tensor', tmp_dir) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/serving/video_classification.py b/official/vision/serving/video_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..9a345b7910f238714d0897e4b6ca798d4ce5763c --- /dev/null +++ b/official/vision/serving/video_classification.py @@ -0,0 +1,187 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Video classification input and model functions for serving/inference.""" +from typing import Mapping, Dict, Text + +import tensorflow as tf + +from official.vision.dataloaders import video_input +from official.vision.serving import export_base +from official.vision.tasks import video_classification + + +class VideoClassificationModule(export_base.ExportModule): + """Video classification Module.""" + + def _build_model(self): + input_params = self.params.task.train_data + self._num_frames = input_params.feature_shape[0] + self._stride = input_params.temporal_stride + self._min_resize = input_params.min_image_size + self._crop_size = input_params.feature_shape[1] + + self._output_audio = input_params.output_audio + task = video_classification.VideoClassificationTask(self.params.task) + return task.build_model() + + def _decode_tf_example(self, encoded_inputs: tf.Tensor): + sequence_description = { + # Each image is a string encoding JPEG. + video_input.IMAGE_KEY: + tf.io.FixedLenSequenceFeature((), tf.string), + } + if self._output_audio: + sequence_description[self._params.task.validation_data.audio_feature] = ( + tf.io.VarLenFeature(dtype=tf.float32)) + _, decoded_tensors = tf.io.parse_single_sequence_example( + encoded_inputs, {}, sequence_description) + for key, value in decoded_tensors.items(): + if isinstance(value, tf.SparseTensor): + decoded_tensors[key] = tf.sparse.to_dense(value) + return decoded_tensors + + def _preprocess_image(self, image): + image = video_input.process_image( + image=image, + is_training=False, + num_frames=self._num_frames, + stride=self._stride, + num_test_clips=1, + min_resize=self._min_resize, + crop_size=self._crop_size, + num_crops=1) + image = tf.cast(image, tf.float32) # Use config. + features = {'image': image} + return features + + def _preprocess_audio(self, audio): + features = {} + audio = tf.cast(audio, dtype=tf.float32) # Use config. 
+ audio = video_input.preprocess_ops_3d.sample_sequence( + audio, 20, random=False, stride=1) + audio = tf.ensure_shape( + audio, self._params.task.validation_data.audio_feature_shape) + features['audio'] = audio + return features + + @tf.function + def inference_from_tf_example( + self, encoded_inputs: tf.Tensor) -> Mapping[str, tf.Tensor]: + with tf.device('cpu:0'): + if self._output_audio: + inputs = tf.map_fn( + self._decode_tf_example, (encoded_inputs), + fn_output_signature={ + video_input.IMAGE_KEY: tf.string, + self._params.task.validation_data.audio_feature: tf.float32 + }) + return self.serve(inputs[video_input.IMAGE_KEY], inputs[self._params.task.validation_data.audio_feature]) + else: + inputs = tf.map_fn( + self._decode_tf_example, (encoded_inputs), + fn_output_signature={ + video_input.IMAGE_KEY: tf.string, + }) + return self.serve(inputs[video_input.IMAGE_KEY], tf.zeros([1, 1])) + + @tf.function + def inference_from_image_tensors( + self, input_frames: tf.Tensor) -> Mapping[str, tf.Tensor]: + return self.serve(input_frames, tf.zeros([1, 1])) + + @tf.function + def inference_from_image_audio_tensors( + self, input_frames: tf.Tensor, + input_audio: tf.Tensor) -> Mapping[str, tf.Tensor]: + return self.serve(input_frames, input_audio) + + @tf.function + def inference_from_image_bytes(self, inputs: tf.Tensor): + raise NotImplementedError( + 'Video classification does not support image bytes input.') + + def serve(self, input_frames: tf.Tensor, input_audio: tf.Tensor): + """Cast image to float and run inference. + + Args: + input_frames: uint8 Tensor of shape [batch_size, num_frames, height, width, 3] + input_audio: float32 Tensor of audio features; ignored unless audio output is enabled. + + Returns: + A dict holding the classification `logits` and `probs` tensors.
+ """ + with tf.device('cpu:0'): + inputs = tf.map_fn( + self._preprocess_image, (input_frames), + fn_output_signature={ + 'image': tf.float32, + }) + if self._output_audio: + inputs.update( + tf.map_fn( + self._preprocess_audio, (input_audio), + fn_output_signature={'audio': tf.float32})) + logits = self.inference_step(inputs) + if self.params.task.train_data.is_multilabel: + probs = tf.math.sigmoid(logits) + else: + probs = tf.nn.softmax(logits) + return {'logits': logits, 'probs': probs} + + def get_inference_signatures(self, function_keys: Dict[Text, Text]): + """Gets defined function signatures. + + Args: + function_keys: A dictionary with keys as the function to create signature + for and values as the signature keys in the returned dictionary. + + Returns: + A dictionary with key as signature key and value as concrete functions + that can be used for tf.saved_model.save. + """ + signatures = {} + for key, def_name in function_keys.items(): + if key == 'image_tensor': + input_signature = tf.TensorSpec( + shape=[self._batch_size] + self._input_image_size + [3], + dtype=tf.uint8, + name='INPUT_FRAMES') + signatures[ + def_name] = self.inference_from_image_tensors.get_concrete_function( + input_signature) + elif key == 'frames_audio': + input_signature = [ + tf.TensorSpec( + shape=[self._batch_size] + self._input_image_size + [3], + dtype=tf.uint8, + name='INPUT_FRAMES'), + tf.TensorSpec( + shape=[self._batch_size] + + self.params.task.train_data.audio_feature_shape, + dtype=tf.float32, + name='INPUT_AUDIO') + ] + signatures[ + def_name] = self.inference_from_image_audio_tensors.get_concrete_function( + *input_signature) + elif key == 'serve_examples' or key == 'tf_example': + input_signature = tf.TensorSpec( + shape=[self._batch_size], dtype=tf.string) + signatures[ + def_name] = self.inference_from_tf_example.get_concrete_function( + input_signature) + else: + raise ValueError('Unrecognized `input_type`') + return signatures diff --git
a/official/vision/serving/video_classification_test.py b/official/vision/serving/video_classification_test.py new file mode 100644 index 0000000000000000000000000000000000000000..18fc38fe6dd41faf178d5412c1d52ab4aee4cbfd --- /dev/null +++ b/official/vision/serving/video_classification_test.py @@ -0,0 +1,113 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +# import io +import os +import random + +from absl.testing import parameterized +import numpy as np +import tensorflow as tf + +from official.core import exp_factory +from official.vision import registry_imports # pylint: disable=unused-import +from official.vision.dataloaders import tfexample_utils +from official.vision.serving import video_classification + + +class VideoClassificationTest(tf.test.TestCase, parameterized.TestCase): + + def _get_classification_module(self): + params = exp_factory.get_exp_config('video_classification_ucf101') + params.task.train_data.feature_shape = (8, 64, 64, 3) + params.task.validation_data.feature_shape = (8, 64, 64, 3) + params.task.model.backbone.resnet_3d.model_id = 50 + classification_module = video_classification.VideoClassificationModule( + params, batch_size=1, input_image_size=[8, 64, 64]) + return classification_module + + def _export_from_module(self, module, input_type, save_directory): + signatures = module.get_inference_signatures( + {input_type: 'serving_default'}) + tf.saved_model.save(module, 
save_directory, signatures=signatures) + + def _get_dummy_input(self, input_type, module=None): + """Get dummy input for the given input type.""" + + if input_type == 'image_tensor': + images = np.random.randint( + low=0, high=255, size=(1, 8, 64, 64, 3), dtype=np.uint8) + # images = np.zeros((1, 8, 64, 64, 3), dtype=np.uint8) + return images, images + elif input_type == 'tf_example': + example = tfexample_utils.make_video_test_example( + image_shape=(64, 64, 3), + audio_shape=(20, 128), + label=random.randint(0, 100)).SerializeToString() + images = tf.nest.map_structure( + tf.stop_gradient, + tf.map_fn( + module._decode_tf_example, + elems=tf.constant([example]), + fn_output_signature={ + video_classification.video_input.IMAGE_KEY: tf.string, + })) + images = images[video_classification.video_input.IMAGE_KEY] + return [example], images + else: + raise ValueError(f'{input_type}') + + @parameterized.parameters( + {'input_type': 'image_tensor'}, + {'input_type': 'tf_example'}, + ) + def test_export(self, input_type): + tmp_dir = self.get_temp_dir() + module = self._get_classification_module() + + self._export_from_module(module, input_type, tmp_dir) + + self.assertTrue(os.path.exists(os.path.join(tmp_dir, 'saved_model.pb'))) + self.assertTrue( + os.path.exists(os.path.join(tmp_dir, 'variables', 'variables.index'))) + self.assertTrue( + os.path.exists( + os.path.join(tmp_dir, 'variables', + 'variables.data-00000-of-00001'))) + + imported = tf.saved_model.load(tmp_dir) + classification_fn = imported.signatures['serving_default'] + + images, images_tensor = self._get_dummy_input(input_type, module) + processed_images = tf.nest.map_structure( + tf.stop_gradient, + tf.map_fn( + module._preprocess_image, + elems=images_tensor, + fn_output_signature={ + 'image': tf.float32, + })) + expected_logits = module.model(processed_images, training=False) + expected_prob = tf.nn.softmax(expected_logits) + out = classification_fn(tf.constant(images)) + + # The imported model should 
contain any trackable attrs that the original + # model had. + self.assertAllClose(out['logits'].numpy(), expected_logits.numpy()) + self.assertAllClose(out['probs'].numpy(), expected_prob.numpy()) + + +if __name__ == '__main__': + tf.test.main() diff --git a/official/vision/tasks/__init__.py b/official/vision/tasks/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..c3f1720c788ded6598ccc8a2ff02f1ae7739f0d9 --- /dev/null +++ b/official/vision/tasks/__init__.py @@ -0,0 +1,21 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Tasks package definition.""" + +from official.vision.tasks.image_classification import ImageClassificationTask +from official.vision.tasks.maskrcnn import MaskRCNNTask +from official.vision.tasks.retinanet import RetinaNetTask +from official.vision.tasks.semantic_segmentation import SemanticSegmentationTask +from official.vision.tasks.video_classification import VideoClassificationTask diff --git a/official/vision/tasks/image_classification.py b/official/vision/tasks/image_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..71b1fcd33f1dd7b30e5d7c872b9e15e89c0ac304 --- /dev/null +++ b/official/vision/tasks/image_classification.py @@ -0,0 +1,368 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Image classification task definition.""" +from typing import Any, Optional, List, Tuple +from absl import logging +import tensorflow as tf + +from official.common import dataset_fn +from official.core import base_task +from official.core import task_factory +from official.modeling import tf_utils +from official.vision.configs import image_classification as exp_cfg +from official.vision.dataloaders import classification_input +from official.vision.dataloaders import input_reader_factory +from official.vision.dataloaders import tfds_factory +from official.vision.modeling import factory +from official.vision.ops import augment + + +@task_factory.register_task_cls(exp_cfg.ImageClassificationTask) +class ImageClassificationTask(base_task.Task): + """A task for image classification.""" + + def build_model(self): + """Builds classification model.""" + input_specs = tf.keras.layers.InputSpec( + shape=[None] + self.task_config.model.input_size) + + l2_weight_decay = self.task_config.losses.l2_weight_decay + # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. 
+ # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) + # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) + l2_regularizer = (tf.keras.regularizers.l2( + l2_weight_decay / 2.0) if l2_weight_decay else None) + + model = factory.build_classification_model( + input_specs=input_specs, + model_config=self.task_config.model, + l2_regularizer=l2_regularizer) + + if self.task_config.freeze_backbone: + model.backbone.trainable = False + return model + + def initialize(self, model: tf.keras.Model): + """Loads pretrained checkpoint.""" + if not self.task_config.init_checkpoint: + return + + ckpt_dir_or_file = self.task_config.init_checkpoint + if tf.io.gfile.isdir(ckpt_dir_or_file): + ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) + + # Restoring checkpoint. + if self.task_config.init_checkpoint_modules == 'all': + ckpt = tf.train.Checkpoint(model=model) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + elif self.task_config.init_checkpoint_modules == 'backbone': + ckpt = tf.train.Checkpoint(backbone=model.backbone) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + else: + raise ValueError( + "Only 'all' or 'backbone' can be used to initialize the model.") + + logging.info('Finished loading pretrained checkpoint from %s', + ckpt_dir_or_file) + + def build_inputs( + self, + params: exp_cfg.DataConfig, + input_context: Optional[tf.distribute.InputContext] = None + ) -> tf.data.Dataset: + """Builds classification input.""" + + num_classes = self.task_config.model.num_classes + input_size = self.task_config.model.input_size + image_field_key = self.task_config.train_data.image_field_key + label_field_key = self.task_config.train_data.label_field_key + is_multilabel = self.task_config.train_data.is_multilabel + + if params.tfds_name: + decoder = tfds_factory.get_classification_decoder(params.tfds_name) + else: + decoder = 
+ classification_input.Decoder( + image_field_key=image_field_key, label_field_key=label_field_key, + is_multilabel=is_multilabel) + + parser = classification_input.Parser( + output_size=input_size[:2], + num_classes=num_classes, + image_field_key=image_field_key, + label_field_key=label_field_key, + decode_jpeg_only=params.decode_jpeg_only, + aug_rand_hflip=params.aug_rand_hflip, + aug_crop=params.aug_crop, + aug_type=params.aug_type, + color_jitter=params.color_jitter, + random_erasing=params.random_erasing, + is_multilabel=is_multilabel, + dtype=params.dtype) + + postprocess_fn = None + if params.mixup_and_cutmix: + postprocess_fn = augment.MixupAndCutmix( + mixup_alpha=params.mixup_and_cutmix.mixup_alpha, + cutmix_alpha=params.mixup_and_cutmix.cutmix_alpha, + prob=params.mixup_and_cutmix.prob, + label_smoothing=params.mixup_and_cutmix.label_smoothing, + num_classes=num_classes) + + reader = input_reader_factory.input_reader_generator( + params, + dataset_fn=dataset_fn.pick_dataset_fn(params.file_type), + decoder_fn=decoder.decode, + parser_fn=parser.parse_fn(params.is_training), + postprocess_fn=postprocess_fn) + + dataset = reader.read(input_context=input_context) + + return dataset + + def build_losses(self, + labels: tf.Tensor, + model_outputs: tf.Tensor, + aux_losses: Optional[Any] = None) -> tf.Tensor: + """Builds the cross entropy loss selected by the losses config. + + Args: + labels: Input groundtruth labels. + model_outputs: Output logits of the classifier. + aux_losses: The auxiliary loss tensors, i.e. `losses` in tf.keras.Model. + + Returns: + The total loss tensor.
+ """ + losses_config = self.task_config.losses + is_multilabel = self.task_config.train_data.is_multilabel + + if not is_multilabel: + if losses_config.one_hot: + total_loss = tf.keras.losses.categorical_crossentropy( + labels, + model_outputs, + from_logits=True, + label_smoothing=losses_config.label_smoothing) + elif losses_config.soft_labels: + total_loss = tf.nn.softmax_cross_entropy_with_logits( + labels, model_outputs) + else: + total_loss = tf.keras.losses.sparse_categorical_crossentropy( + labels, model_outputs, from_logits=True) + else: + # Multi-label weighted binary cross entropy loss. + total_loss = tf.nn.sigmoid_cross_entropy_with_logits( + labels=labels, logits=model_outputs) + total_loss = tf.reduce_sum(total_loss, axis=-1) + + total_loss = tf_utils.safe_mean(total_loss) + if aux_losses: + total_loss += tf.add_n(aux_losses) + + total_loss = losses_config.loss_weight * total_loss + return total_loss + + def build_metrics(self, + training: bool = True) -> List[tf.keras.metrics.Metric]: + """Gets streaming metrics for training/validation.""" + is_multilabel = self.task_config.train_data.is_multilabel + if not is_multilabel: + k = self.task_config.evaluation.top_k + if (self.task_config.losses.one_hot or + self.task_config.losses.soft_labels): + metrics = [ + tf.keras.metrics.CategoricalAccuracy(name='accuracy'), + tf.keras.metrics.TopKCategoricalAccuracy( + k=k, name='top_{}_accuracy'.format(k))] + if hasattr( + self.task_config.evaluation, 'precision_and_recall_thresholds' + ) and self.task_config.evaluation.precision_and_recall_thresholds: + thresholds = self.task_config.evaluation.precision_and_recall_thresholds + # pylint:disable=g-complex-comprehension + metrics += [ + tf.keras.metrics.Precision( + thresholds=th, + name='precision_at_threshold_{}'.format(th), + top_k=1) for th in thresholds + ] + metrics += [ + tf.keras.metrics.Recall( + thresholds=th, + name='recall_at_threshold_{}'.format(th), + top_k=1) for th in thresholds + ] + + # Add 
per-class precision and recall. + if hasattr( + self.task_config.evaluation, + 'report_per_class_precision_and_recall' + ) and self.task_config.evaluation.report_per_class_precision_and_recall: + for class_id in range(self.task_config.model.num_classes): + metrics += [ + tf.keras.metrics.Precision( + thresholds=th, + class_id=class_id, + name=f'precision_at_threshold_{th}/{class_id}', + top_k=1) for th in thresholds + ] + metrics += [ + tf.keras.metrics.Recall( + thresholds=th, + class_id=class_id, + name=f'recall_at_threshold_{th}/{class_id}', + top_k=1) for th in thresholds + ] + # pylint:enable=g-complex-comprehension + else: + metrics = [ + tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), + tf.keras.metrics.SparseTopKCategoricalAccuracy( + k=k, name='top_{}_accuracy'.format(k))] + else: + metrics = [] + # These metrics destabilize training if included during training: the jobs + # fail due to OOM. + # TODO(arashwan): Investigate adding the following metrics to training. + if not training: + metrics = [ + tf.keras.metrics.AUC( + name='globalPR-AUC', + curve='PR', + multi_label=False, + from_logits=True), + tf.keras.metrics.AUC( + name='meanPR-AUC', + curve='PR', + multi_label=True, + num_labels=self.task_config.model.num_classes, + from_logits=True), + ] + return metrics + + def train_step(self, + inputs: Tuple[Any, Any], + model: tf.keras.Model, + optimizer: tf.keras.optimizers.Optimizer, + metrics: Optional[List[Any]] = None): + """Does forward and backward. + + Args: + inputs: A tuple of input tensors of (features, labels). + model: A tf.keras.Model instance. + optimizer: The optimizer for this training step. + metrics: A nested structure of metrics objects. + + Returns: + A dictionary of logs.
+ """ + features, labels = inputs + is_multilabel = self.task_config.train_data.is_multilabel + if self.task_config.losses.one_hot and not is_multilabel: + labels = tf.one_hot(labels, self.task_config.model.num_classes) + + num_replicas = tf.distribute.get_strategy().num_replicas_in_sync + with tf.GradientTape() as tape: + outputs = model(features, training=True) + + # Casting the output layer to float32 is necessary when the mixed_precision + # policy is mixed_float16 or mixed_bfloat16, to ensure the output is float32. + outputs = tf.nest.map_structure( + lambda x: tf.cast(x, tf.float32), outputs) + + # Computes per-replica loss. + loss = self.build_losses( + model_outputs=outputs, + labels=labels, + aux_losses=model.losses) + # Scales the loss because the default gradient all-reduce sums gradients + # inside the optimizer. + scaled_loss = loss / num_replicas + + # For mixed_precision policy, when LossScaleOptimizer is used, loss is + # scaled for numerical stability. + if isinstance( + optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + scaled_loss = optimizer.get_scaled_loss(scaled_loss) + + tvars = model.trainable_variables + grads = tape.gradient(scaled_loss, tvars) + # Scales back gradient before apply_gradients when LossScaleOptimizer is + # used. + if isinstance( + optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + grads = optimizer.get_unscaled_gradients(grads) + optimizer.apply_gradients(list(zip(grads, tvars))) + + logs = {self.loss: loss} + + # Convert logits to softmax for metric computation if needed.
+ if hasattr(self.task_config.model, + 'output_softmax') and self.task_config.model.output_softmax: + outputs = tf.nn.softmax(outputs, axis=-1) + if metrics: + self.process_metrics(metrics, labels, outputs) + elif model.compiled_metrics: + self.process_compiled_metrics(model.compiled_metrics, labels, outputs) + logs.update({m.name: m.result() for m in model.metrics}) + return logs + + def validation_step(self, + inputs: Tuple[Any, Any], + model: tf.keras.Model, + metrics: Optional[List[Any]] = None): + """Runs a validation step. + + Args: + inputs: A tuple of input tensors of (features, labels). + model: A tf.keras.Model instance. + metrics: A nested structure of metrics objects. + + Returns: + A dictionary of logs. + """ + features, labels = inputs + one_hot = self.task_config.losses.one_hot + soft_labels = self.task_config.losses.soft_labels + is_multilabel = self.task_config.train_data.is_multilabel + # Note: `soft_labels` only applies to the training phase. In the validation + # phase, labels should still be integer IDs and need to be converted to + # one-hot format. + if (one_hot or soft_labels) and not is_multilabel: + labels = tf.one_hot(labels, self.task_config.model.num_classes) + + outputs = self.inference_step(features, model) + outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) + loss = self.build_losses( + model_outputs=outputs, + labels=labels, + aux_losses=model.losses) + + logs = {self.loss: loss} + # Convert logits to softmax for metric computation if needed.
+ if hasattr(self.task_config.model, + 'output_softmax') and self.task_config.model.output_softmax: + outputs = tf.nn.softmax(outputs, axis=-1) + if metrics: + self.process_metrics(metrics, labels, outputs) + elif model.compiled_metrics: + self.process_compiled_metrics(model.compiled_metrics, labels, outputs) + logs.update({m.name: m.result() for m in model.metrics}) + return logs + + def inference_step(self, inputs: tf.Tensor, model: tf.keras.Model): + """Performs the forward step.""" + return model(inputs, training=False) diff --git a/official/vision/tasks/maskrcnn.py b/official/vision/tasks/maskrcnn.py new file mode 100644 index 0000000000000000000000000000000000000000..f7ef94439d02b853283c4a5fdb2d4dc322aa9d7d --- /dev/null +++ b/official/vision/tasks/maskrcnn.py @@ -0,0 +1,479 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ + """MaskRCNN task definition.""" + + import os + from typing import Any, Dict, Optional, List, Tuple, Mapping + + from absl import logging + import tensorflow as tf + from official.common import dataset_fn as dataset_fn_lib + from official.core import base_task + from official.core import task_factory + from official.vision.configs import maskrcnn as exp_cfg + from official.vision.dataloaders import input_reader_factory + from official.vision.dataloaders import maskrcnn_input + from official.vision.dataloaders import tf_example_decoder + from official.vision.dataloaders import tf_example_label_map_decoder + from official.vision.evaluation import coco_evaluator + from official.vision.evaluation import coco_utils + from official.vision.losses import maskrcnn_losses + from official.vision.modeling import factory + + + def zero_out_disallowed_class_ids(batch_class_ids: tf.Tensor, + allowed_class_ids: List[int]): + """Zeros out IDs of classes not in allowed_class_ids. + + Args: + batch_class_ids: A [batch_size, num_instances] int tensor of input + class IDs. + allowed_class_ids: A Python list of class IDs which we want to allow. + + Returns: + filtered_class_ids: A [batch_size, num_instances] int tensor with any + class ID not in allowed_class_ids set to 0. + """ + + allowed_class_ids = tf.constant(allowed_class_ids, + dtype=batch_class_ids.dtype) + + match_ids = (batch_class_ids[:, :, tf.newaxis] == + allowed_class_ids[tf.newaxis, tf.newaxis, :]) + + match_ids = tf.reduce_any(match_ids, axis=2) + return tf.where(match_ids, batch_class_ids, tf.zeros_like(batch_class_ids)) + + + @task_factory.register_task_cls(exp_cfg.MaskRCNNTask) + class MaskRCNNTask(base_task.Task): + """A single-replica view of training procedure. + + Mask R-CNN task provides artifacts for training/evaluation procedures, + including loading/iterating over Datasets, initializing the model, calculating + the loss, post-processing, and customized metrics with reduction.
+ """ + + def build_model(self): + """Build Mask R-CNN model.""" + + input_specs = tf.keras.layers.InputSpec( + shape=[None] + self.task_config.model.input_size) + + l2_weight_decay = self.task_config.losses.l2_weight_decay + # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. + # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) + # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) + l2_regularizer = (tf.keras.regularizers.l2( + l2_weight_decay / 2.0) if l2_weight_decay else None) + + model = factory.build_maskrcnn( + input_specs=input_specs, + model_config=self.task_config.model, + l2_regularizer=l2_regularizer) + + if self.task_config.freeze_backbone: + model.backbone.trainable = False + + return model + + def initialize(self, model: tf.keras.Model): + """Loading pretrained checkpoint.""" + + if not self.task_config.init_checkpoint: + return + + ckpt_dir_or_file = self.task_config.init_checkpoint + if tf.io.gfile.isdir(ckpt_dir_or_file): + ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) + + # Restoring checkpoint. 
+ if self.task_config.init_checkpoint_modules == 'all': + ckpt = tf.train.Checkpoint(**model.checkpoint_items) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + else: + ckpt_items = {} + if 'backbone' in self.task_config.init_checkpoint_modules: + ckpt_items.update(backbone=model.backbone) + if 'decoder' in self.task_config.init_checkpoint_modules: + ckpt_items.update(decoder=model.decoder) + + ckpt = tf.train.Checkpoint(**ckpt_items) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + + logging.info('Finished loading pretrained checkpoint from %s', + ckpt_dir_or_file) + + def build_inputs( + self, + params: exp_cfg.DataConfig, + input_context: Optional[tf.distribute.InputContext] = None, + dataset_fn: Optional[dataset_fn_lib.PossibleDatasetType] = None + ) -> tf.data.Dataset: + """Build input dataset.""" + decoder_cfg = params.decoder.get() + if params.decoder.type == 'simple_decoder': + decoder = tf_example_decoder.TfExampleDecoder( + include_mask=self._task_config.model.include_mask, + regenerate_source_id=decoder_cfg.regenerate_source_id, + mask_binarize_threshold=decoder_cfg.mask_binarize_threshold) + elif params.decoder.type == 'label_map_decoder': + decoder = tf_example_label_map_decoder.TfExampleDecoderLabelMap( + label_map=decoder_cfg.label_map, + include_mask=self._task_config.model.include_mask, + regenerate_source_id=decoder_cfg.regenerate_source_id, + mask_binarize_threshold=decoder_cfg.mask_binarize_threshold) + else: + raise ValueError('Unknown decoder type: {}!'.format(params.decoder.type)) + + parser = maskrcnn_input.Parser( + output_size=self.task_config.model.input_size[:2], + min_level=self.task_config.model.min_level, + max_level=self.task_config.model.max_level, + num_scales=self.task_config.model.anchor.num_scales, + aspect_ratios=self.task_config.model.anchor.aspect_ratios, + anchor_size=self.task_config.model.anchor.anchor_size, + 
dtype=params.dtype, + rpn_match_threshold=params.parser.rpn_match_threshold, + rpn_unmatched_threshold=params.parser.rpn_unmatched_threshold, + rpn_batch_size_per_im=params.parser.rpn_batch_size_per_im, + rpn_fg_fraction=params.parser.rpn_fg_fraction, + aug_rand_hflip=params.parser.aug_rand_hflip, + aug_scale_min=params.parser.aug_scale_min, + aug_scale_max=params.parser.aug_scale_max, + aug_type=params.parser.aug_type, + skip_crowd_during_training=params.parser.skip_crowd_during_training, + max_num_instances=params.parser.max_num_instances, + include_mask=self._task_config.model.include_mask, + mask_crop_size=params.parser.mask_crop_size) + + if not dataset_fn: + dataset_fn = dataset_fn_lib.pick_dataset_fn(params.file_type) + + reader = input_reader_factory.input_reader_generator( + params, + dataset_fn=dataset_fn, + decoder_fn=decoder.decode, + parser_fn=parser.parse_fn(params.is_training)) + dataset = reader.read(input_context=input_context) + + return dataset + + def _build_rpn_losses( + self, outputs: Mapping[str, Any], + labels: Mapping[str, Any]) -> Tuple[tf.Tensor, tf.Tensor]: + """Build losses for Region Proposal Network (RPN).""" + rpn_score_loss_fn = maskrcnn_losses.RpnScoreLoss( + tf.shape(outputs['box_outputs'])[1]) + rpn_box_loss_fn = maskrcnn_losses.RpnBoxLoss( + self.task_config.losses.rpn_huber_loss_delta) + rpn_score_loss = tf.reduce_mean( + rpn_score_loss_fn(outputs['rpn_scores'], labels['rpn_score_targets'])) + rpn_box_loss = tf.reduce_mean( + rpn_box_loss_fn(outputs['rpn_boxes'], labels['rpn_box_targets'])) + return rpn_score_loss, rpn_box_loss + + def _build_frcnn_losses( + self, outputs: Mapping[str, Any], + labels: Mapping[str, Any]) -> Tuple[tf.Tensor, tf.Tensor]: + """Build losses for Fast R-CNN.""" + cascade_ious = self.task_config.model.roi_sampler.cascade_iou_thresholds + + frcnn_cls_loss_fn = maskrcnn_losses.FastrcnnClassLoss() + frcnn_box_loss_fn = maskrcnn_losses.FastrcnnBoxLoss( + self.task_config.losses.frcnn_huber_loss_delta, + 
self.task_config.model.detection_head.class_agnostic_bbox_pred) + + # Final cls/box losses are computed as an average of all detection heads. + frcnn_cls_loss = 0.0 + frcnn_box_loss = 0.0 + num_det_heads = 1 if cascade_ious is None else 1 + len(cascade_ious) + for cas_num in range(num_det_heads): + frcnn_cls_loss_i = tf.reduce_mean( + frcnn_cls_loss_fn( + outputs['class_outputs_{}' + .format(cas_num) if cas_num else 'class_outputs'], + outputs['class_targets_{}' + .format(cas_num) if cas_num else 'class_targets'])) + frcnn_box_loss_i = tf.reduce_mean( + frcnn_box_loss_fn( + outputs['box_outputs_{}'.format(cas_num + ) if cas_num else 'box_outputs'], + outputs['class_targets_{}' + .format(cas_num) if cas_num else 'class_targets'], + outputs['box_targets_{}'.format(cas_num + ) if cas_num else 'box_targets'])) + frcnn_cls_loss += frcnn_cls_loss_i + frcnn_box_loss += frcnn_box_loss_i + frcnn_cls_loss /= num_det_heads + frcnn_box_loss /= num_det_heads + return frcnn_cls_loss, frcnn_box_loss + + def _build_mask_loss(self, outputs: Mapping[str, Any]) -> tf.Tensor: + """Build losses for the masks.""" + mask_loss_fn = maskrcnn_losses.MaskrcnnLoss() + mask_class_targets = outputs['mask_class_targets'] + if self.task_config.allowed_mask_class_ids is not None: + # Classes with ID=0 are ignored by mask_loss_fn in loss computation. 
+ mask_class_targets = zero_out_disallowed_class_ids( + mask_class_targets, self.task_config.allowed_mask_class_ids) + return tf.reduce_mean( + mask_loss_fn(outputs['mask_outputs'], outputs['mask_targets'], + mask_class_targets)) + + def build_losses(self, + outputs: Mapping[str, Any], + labels: Mapping[str, Any], + aux_losses: Optional[Any] = None) -> Dict[str, tf.Tensor]: + """Build Mask R-CNN losses.""" + rpn_score_loss, rpn_box_loss = self._build_rpn_losses(outputs, labels) + frcnn_cls_loss, frcnn_box_loss = self._build_frcnn_losses(outputs, labels) + if self.task_config.model.include_mask: + mask_loss = self._build_mask_loss(outputs) + else: + mask_loss = tf.constant(0.0, dtype=tf.float32) + + params = self.task_config + model_loss = ( + params.losses.rpn_score_weight * rpn_score_loss + + params.losses.rpn_box_weight * rpn_box_loss + + params.losses.frcnn_class_weight * frcnn_cls_loss + + params.losses.frcnn_box_weight * frcnn_box_loss + + params.losses.mask_weight * mask_loss) + + total_loss = model_loss + if aux_losses: + reg_loss = tf.reduce_sum(aux_losses) + total_loss = model_loss + reg_loss + + total_loss = params.losses.loss_weight * total_loss + losses = { + 'total_loss': total_loss, + 'rpn_score_loss': rpn_score_loss, + 'rpn_box_loss': rpn_box_loss, + 'frcnn_cls_loss': frcnn_cls_loss, + 'frcnn_box_loss': frcnn_box_loss, + 'mask_loss': mask_loss, + 'model_loss': model_loss, + } + return losses + + def _build_coco_metrics(self): + """Build COCO metrics evaluator.""" + if (not self._task_config.model.include_mask + ) or self._task_config.annotation_file: + self.coco_metric = coco_evaluator.COCOEvaluator( + annotation_file=self._task_config.annotation_file, + include_mask=self._task_config.model.include_mask, + per_category_metrics=self._task_config.per_category_metrics) + else: + # Builds COCO-style annotation file if include_mask is True, and + # annotation_file isn't provided. 
+ annotation_path = os.path.join(self._logging_dir, 'annotation.json') + if tf.io.gfile.exists(annotation_path): + logging.info( + 'annotation.json file exists, skipping creating the annotation' + ' file.') + else: + if self._task_config.validation_data.num_examples <= 0: + logging.info('validation_data.num_examples needs to be > 0') + if not self._task_config.validation_data.input_path: + logging.info('Cannot create annotation file for tfds.') + logging.info( + 'Creating coco-style annotation file: %s', annotation_path) + coco_utils.scan_and_generator_annotation_file( + self._task_config.validation_data.input_path, + self._task_config.validation_data.file_type, + self._task_config.validation_data.num_examples, + self.task_config.model.include_mask, annotation_path, + regenerate_source_id=self._task_config.validation_data.decoder + .simple_decoder.regenerate_source_id) + self.coco_metric = coco_evaluator.COCOEvaluator( + annotation_file=annotation_path, + include_mask=self._task_config.model.include_mask, + per_category_metrics=self._task_config.per_category_metrics) + + def build_metrics(self, training: bool = True): + """Build detection metrics.""" + metrics = [] + if training: + metric_names = [ + 'total_loss', + 'rpn_score_loss', + 'rpn_box_loss', + 'frcnn_cls_loss', + 'frcnn_box_loss', + 'mask_loss', + 'model_loss' + ] + for name in metric_names: + metrics.append(tf.keras.metrics.Mean(name, dtype=tf.float32)) + + else: + if self._task_config.use_coco_metrics: + self._build_coco_metrics() + if self._task_config.use_wod_metrics: + # To use Waymo Open Dataset metrics, please install one of the pip + # packages `waymo-open-dataset-tf-*` from + # https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md#use-pre-compiled-pippip3-packages-for-linux + # Note that each package is built against a specific TensorFlow version + # and will produce an error if it does not match the TensorFlow version + # currently in use.
+ try: + from official.vision.evaluation import wod_detection_evaluator # pylint: disable=g-import-not-at-top + except ModuleNotFoundError: + logging.error('waymo-open-dataset should be installed to enable Waymo' + ' evaluator.') + raise + self.wod_metric = wod_detection_evaluator.WOD2dDetectionEvaluator() + + return metrics + + def train_step(self, + inputs: Tuple[Any, Any], + model: tf.keras.Model, + optimizer: tf.keras.optimizers.Optimizer, + metrics: Optional[List[Any]] = None): + """Does forward and backward. + + Args: + inputs: a dictionary of input tensors. + model: the model, forward pass definition. + optimizer: the optimizer for this training step. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs. + """ + images, labels = inputs + num_replicas = tf.distribute.get_strategy().num_replicas_in_sync + with tf.GradientTape() as tape: + outputs = model( + images, + image_shape=labels['image_info'][:, 1, :], + anchor_boxes=labels['anchor_boxes'], + gt_boxes=labels['gt_boxes'], + gt_classes=labels['gt_classes'], + gt_masks=(labels['gt_masks'] if self.task_config.model.include_mask + else None), + training=True) + outputs = tf.nest.map_structure( + lambda x: tf.cast(x, tf.float32), outputs) + + # Computes per-replica loss. + losses = self.build_losses( + outputs=outputs, labels=labels, aux_losses=model.losses) + scaled_loss = losses['total_loss'] / num_replicas + + # For mixed_precision policy, when LossScaleOptimizer is used, loss is + # scaled for numerical stability. + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + scaled_loss = optimizer.get_scaled_loss(scaled_loss) + + tvars = model.trainable_variables + grads = tape.gradient(scaled_loss, tvars) + # Scales back gradient when LossScaleOptimizer is used. 
+ if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + grads = optimizer.get_unscaled_gradients(grads) + optimizer.apply_gradients(list(zip(grads, tvars))) + + logs = {self.loss: losses['total_loss']} + + if metrics: + for m in metrics: + m.update_state(losses[m.name]) + + return logs + + def validation_step(self, + inputs: Tuple[Any, Any], + model: tf.keras.Model, + metrics: Optional[List[Any]] = None): + """Validation step. + + Args: + inputs: a dictionary of input tensors. + model: the keras.Model. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs. + """ + images, labels = inputs + + outputs = model( + images, + anchor_boxes=labels['anchor_boxes'], + image_shape=labels['image_info'][:, 1, :], + training=False) + + logs = {self.loss: 0} + if self._task_config.use_coco_metrics: + coco_model_outputs = { + 'detection_boxes': outputs['detection_boxes'], + 'detection_scores': outputs['detection_scores'], + 'detection_classes': outputs['detection_classes'], + 'num_detections': outputs['num_detections'], + 'source_id': labels['groundtruths']['source_id'], + 'image_info': labels['image_info'] + } + if self.task_config.model.include_mask: + coco_model_outputs.update({ + 'detection_masks': outputs['detection_masks'], + }) + logs.update( + {self.coco_metric.name: (labels['groundtruths'], coco_model_outputs)}) + + if self.task_config.use_wod_metrics: + wod_model_outputs = { + 'detection_boxes': outputs['detection_boxes'], + 'detection_scores': outputs['detection_scores'], + 'detection_classes': outputs['detection_classes'], + 'num_detections': outputs['num_detections'], + 'source_id': labels['groundtruths']['source_id'], + 'image_info': labels['image_info'] + } + logs.update( + {self.wod_metric.name: (labels['groundtruths'], wod_model_outputs)}) + return logs + + def aggregate_logs(self, state=None, step_outputs=None): + if self._task_config.use_coco_metrics: + if state is None: + self.coco_metric.reset_states() +
self.coco_metric.update_state( + step_outputs[self.coco_metric.name][0], + step_outputs[self.coco_metric.name][1]) + if self._task_config.use_wod_metrics: + if state is None: + self.wod_metric.reset_states() + self.wod_metric.update_state( + step_outputs[self.wod_metric.name][0], + step_outputs[self.wod_metric.name][1]) + if state is None: + # Create an arbitrary state to indicate it's not the first step in the + # following calls to this function. + state = True + return state + + def reduce_aggregated_logs(self, aggregated_logs, global_step=None): + logs = {} + if self._task_config.use_coco_metrics: + logs.update(self.coco_metric.result()) + if self._task_config.use_wod_metrics: + logs.update(self.wod_metric.result()) + return logs diff --git a/official/vision/tasks/retinanet.py b/official/vision/tasks/retinanet.py new file mode 100644 index 0000000000000000000000000000000000000000..79a424ec302dad0600f6be6ddaa3f67871eac04d --- /dev/null +++ b/official/vision/tasks/retinanet.py @@ -0,0 +1,404 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ + """RetinaNet task definition.""" + from typing import Any, List, Mapping, Optional, Tuple + + from absl import logging + import tensorflow as tf + + from official.common import dataset_fn + from official.core import base_task + from official.core import task_factory + from official.vision.configs import retinanet as exp_cfg + from official.vision.dataloaders import input_reader_factory + from official.vision.dataloaders import retinanet_input + from official.vision.dataloaders import tf_example_decoder + from official.vision.dataloaders import tfds_factory + from official.vision.dataloaders import tf_example_label_map_decoder + from official.vision.evaluation import coco_evaluator + from official.vision.losses import focal_loss + from official.vision.losses import loss_utils + from official.vision.modeling import factory + + + @task_factory.register_task_cls(exp_cfg.RetinaNetTask) + class RetinaNetTask(base_task.Task): + """A single-replica view of training procedure. + + RetinaNet task provides artifacts for training/evaluation procedures, including + loading/iterating over Datasets, initializing the model, calculating the loss, + post-processing, and customized metrics with reduction. + """ + + def build_model(self): + """Build RetinaNet model.""" + + input_specs = tf.keras.layers.InputSpec( + shape=[None] + self.task_config.model.input_size) + + l2_weight_decay = self.task_config.losses.l2_weight_decay + # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss.
+ # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) + # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) + l2_regularizer = (tf.keras.regularizers.l2( + l2_weight_decay / 2.0) if l2_weight_decay else None) + + model = factory.build_retinanet( + input_specs=input_specs, + model_config=self.task_config.model, + l2_regularizer=l2_regularizer) + + if self.task_config.freeze_backbone: + model.backbone.trainable = False + + return model + + def initialize(self, model: tf.keras.Model): + """Loading pretrained checkpoint.""" + if not self.task_config.init_checkpoint: + return + + ckpt_dir_or_file = self.task_config.init_checkpoint + if tf.io.gfile.isdir(ckpt_dir_or_file): + ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) + + # Restoring checkpoint. + if self.task_config.init_checkpoint_modules == 'all': + ckpt = tf.train.Checkpoint(**model.checkpoint_items) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + else: + ckpt_items = {} + if 'backbone' in self.task_config.init_checkpoint_modules: + ckpt_items.update(backbone=model.backbone) + if 'decoder' in self.task_config.init_checkpoint_modules: + ckpt_items.update(decoder=model.decoder) + + ckpt = tf.train.Checkpoint(**ckpt_items) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + + logging.info('Finished loading pretrained checkpoint from %s', + ckpt_dir_or_file) + + def build_inputs(self, + params: exp_cfg.DataConfig, + input_context: Optional[tf.distribute.InputContext] = None): + """Build input dataset.""" + + if params.tfds_name: + decoder = tfds_factory.get_detection_decoder(params.tfds_name) + else: + decoder_cfg = params.decoder.get() + if params.decoder.type == 'simple_decoder': + decoder = tf_example_decoder.TfExampleDecoder( + regenerate_source_id=decoder_cfg.regenerate_source_id) + elif params.decoder.type == 'label_map_decoder': + decoder = 
tf_example_label_map_decoder.TfExampleDecoderLabelMap( + label_map=decoder_cfg.label_map, + regenerate_source_id=decoder_cfg.regenerate_source_id) + else: + raise ValueError('Unknown decoder type: {}!'.format( + params.decoder.type)) + + parser = retinanet_input.Parser( + output_size=self.task_config.model.input_size[:2], + min_level=self.task_config.model.min_level, + max_level=self.task_config.model.max_level, + num_scales=self.task_config.model.anchor.num_scales, + aspect_ratios=self.task_config.model.anchor.aspect_ratios, + anchor_size=self.task_config.model.anchor.anchor_size, + dtype=params.dtype, + match_threshold=params.parser.match_threshold, + unmatched_threshold=params.parser.unmatched_threshold, + aug_type=params.parser.aug_type, + aug_rand_hflip=params.parser.aug_rand_hflip, + aug_scale_min=params.parser.aug_scale_min, + aug_scale_max=params.parser.aug_scale_max, + skip_crowd_during_training=params.parser.skip_crowd_during_training, + max_num_instances=params.parser.max_num_instances) + + reader = input_reader_factory.input_reader_generator( + params, + dataset_fn=dataset_fn.pick_dataset_fn(params.file_type), + decoder_fn=decoder.decode, + parser_fn=parser.parse_fn(params.is_training)) + dataset = reader.read(input_context=input_context) + + return dataset + + def build_attribute_loss(self, + attribute_heads: List[exp_cfg.AttributeHead], + outputs: Mapping[str, Any], + labels: Mapping[str, Any], + box_sample_weight: tf.Tensor) -> float: + """Computes attribute loss. + + Args: + attribute_heads: a list of attribute head configs. + outputs: RetinaNet model outputs. + labels: RetinaNet labels. + box_sample_weight: normalized bounding box sample weights. + + Returns: + Attribute loss of all attribute heads. 
+ """ + attribute_loss = 0.0 + for head in attribute_heads: + if head.name not in labels['attribute_targets']: + raise ValueError(f'Attribute {head.name} not found in label targets.') + if head.name not in outputs['attribute_outputs']: + raise ValueError(f'Attribute {head.name} not found in model outputs.') + + y_true_att = loss_utils.multi_level_flatten( + labels['attribute_targets'][head.name], last_dim=head.size) + y_pred_att = loss_utils.multi_level_flatten( + outputs['attribute_outputs'][head.name], last_dim=head.size) + if head.type == 'regression': + att_loss_fn = tf.keras.losses.Huber( + 1.0, reduction=tf.keras.losses.Reduction.SUM) + att_loss = att_loss_fn( + y_true=y_true_att, + y_pred=y_pred_att, + sample_weight=box_sample_weight) + else: + raise ValueError(f'Attribute type {head.type} not supported.') + attribute_loss += att_loss + + return attribute_loss + + def build_losses(self, + outputs: Mapping[str, Any], + labels: Mapping[str, Any], + aux_losses: Optional[Any] = None): + """Build RetinaNet losses.""" + params = self.task_config + attribute_heads = self.task_config.model.head.attribute_heads + + cls_loss_fn = focal_loss.FocalLoss( + alpha=params.losses.focal_loss_alpha, + gamma=params.losses.focal_loss_gamma, + reduction=tf.keras.losses.Reduction.SUM) + box_loss_fn = tf.keras.losses.Huber( + params.losses.huber_loss_delta, reduction=tf.keras.losses.Reduction.SUM) + + # Sums all positives in a batch for normalization and avoids zero + # num_positives_sum, which would lead to inf loss during training + cls_sample_weight = labels['cls_weights'] + box_sample_weight = labels['box_weights'] + num_positives = tf.reduce_sum(box_sample_weight) + 1.0 + cls_sample_weight = cls_sample_weight / num_positives + box_sample_weight = box_sample_weight / num_positives + y_true_cls = loss_utils.multi_level_flatten( + labels['cls_targets'], last_dim=None) + y_true_cls = tf.one_hot(y_true_cls, params.model.num_classes) + y_pred_cls = loss_utils.multi_level_flatten( + 
outputs['cls_outputs'], last_dim=params.model.num_classes)
+    y_true_box = loss_utils.multi_level_flatten(
+        labels['box_targets'], last_dim=4)
+    y_pred_box = loss_utils.multi_level_flatten(
+        outputs['box_outputs'], last_dim=4)
+
+    cls_loss = cls_loss_fn(
+        y_true=y_true_cls, y_pred=y_pred_cls, sample_weight=cls_sample_weight)
+    box_loss = box_loss_fn(
+        y_true=y_true_box, y_pred=y_pred_box, sample_weight=box_sample_weight)
+
+    model_loss = cls_loss + params.losses.box_loss_weight * box_loss
+
+    if attribute_heads:
+      model_loss += self.build_attribute_loss(attribute_heads, outputs, labels,
+                                              box_sample_weight)
+
+    total_loss = model_loss
+    if aux_losses:
+      reg_loss = tf.reduce_sum(aux_losses)
+      total_loss = model_loss + reg_loss
+
+    total_loss = params.losses.loss_weight * total_loss
+
+    return total_loss, cls_loss, box_loss, model_loss
+
+  def build_metrics(self, training: bool = True):
+    """Build detection metrics."""
+    metrics = []
+    metric_names = ['total_loss', 'cls_loss', 'box_loss', 'model_loss']
+    for name in metric_names:
+      metrics.append(tf.keras.metrics.Mean(name, dtype=tf.float32))
+
+    if not training:
+      if self.task_config.validation_data.tfds_name and self.task_config.annotation_file:
+        raise ValueError(
+            "Can't evaluate using annotation file when TFDS is used.")
+      if self._task_config.use_coco_metrics:
+        self.coco_metric = coco_evaluator.COCOEvaluator(
+            annotation_file=self.task_config.annotation_file,
+            include_mask=False,
+            per_category_metrics=self.task_config.per_category_metrics)
+      if self._task_config.use_wod_metrics:
+        # To use Waymo open dataset metrics, please install one of the pip
+        # packages `waymo-open-dataset-tf-*` from
+        # https://github.com/waymo-research/waymo-open-dataset/blob/master/docs/quick_start.md#use-pre-compiled-pippip3-packages-for-linux
+        # Note that the package is built with a specific TensorFlow version and
+        # will produce an error if it does not match the TF version that is
+        # currently used.
+ try: + from official.vision.evaluation import wod_detection_evaluator # pylint: disable=g-import-not-at-top + except ModuleNotFoundError: + logging.error('waymo-open-dataset should be installed to enable Waymo' + ' evaluator.') + raise + self.wod_metric = wod_detection_evaluator.WOD2dDetectionEvaluator() + + return metrics + + def train_step(self, + inputs: Tuple[Any, Any], + model: tf.keras.Model, + optimizer: tf.keras.optimizers.Optimizer, + metrics: Optional[List[Any]] = None): + """Does forward and backward. + + Args: + inputs: a dictionary of input tensors. + model: the model, forward pass definition. + optimizer: the optimizer for this training step. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs. + """ + features, labels = inputs + num_replicas = tf.distribute.get_strategy().num_replicas_in_sync + with tf.GradientTape() as tape: + outputs = model(features, training=True) + outputs = tf.nest.map_structure( + lambda x: tf.cast(x, tf.float32), outputs) + + # Computes per-replica loss. + loss, cls_loss, box_loss, model_loss = self.build_losses( + outputs=outputs, labels=labels, aux_losses=model.losses) + scaled_loss = loss / num_replicas + + # For mixed_precision policy, when LossScaleOptimizer is used, loss is + # scaled for numerical stability. + if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer): + scaled_loss = optimizer.get_scaled_loss(scaled_loss) + + tvars = model.trainable_variables + grads = tape.gradient(scaled_loss, tvars) + # Scales back gradient when LossScaleOptimizer is used. 
+    if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer):
+      grads = optimizer.get_unscaled_gradients(grads)
+    optimizer.apply_gradients(list(zip(grads, tvars)))
+
+    logs = {self.loss: loss}
+
+    all_losses = {
+        'total_loss': loss,
+        'cls_loss': cls_loss,
+        'box_loss': box_loss,
+        'model_loss': model_loss,
+    }
+    if metrics:
+      for m in metrics:
+        m.update_state(all_losses[m.name])
+        logs.update({m.name: m.result()})
+
+    return logs
+
+  def validation_step(self,
+                      inputs: Tuple[Any, Any],
+                      model: tf.keras.Model,
+                      metrics: Optional[List[Any]] = None):
+    """Validation step.
+
+    Args:
+      inputs: a dictionary of input tensors.
+      model: the keras.Model.
+      metrics: a nested structure of metrics objects.
+
+    Returns:
+      A dictionary of logs.
+    """
+    features, labels = inputs
+
+    outputs = model(features, anchor_boxes=labels['anchor_boxes'],
+                    image_shape=labels['image_info'][:, 1, :],
+                    training=False)
+    loss, cls_loss, box_loss, model_loss = self.build_losses(
+        outputs=outputs, labels=labels, aux_losses=model.losses)
+    logs = {self.loss: loss}
+
+    all_losses = {
+        'total_loss': loss,
+        'cls_loss': cls_loss,
+        'box_loss': box_loss,
+        'model_loss': model_loss,
+    }
+
+    if self._task_config.use_coco_metrics:
+      coco_model_outputs = {
+          'detection_boxes': outputs['detection_boxes'],
+          'detection_scores': outputs['detection_scores'],
+          'detection_classes': outputs['detection_classes'],
+          'num_detections': outputs['num_detections'],
+          'source_id': labels['groundtruths']['source_id'],
+          'image_info': labels['image_info']
+      }
+      logs.update(
+          {self.coco_metric.name: (labels['groundtruths'], coco_model_outputs)})
+    if self.task_config.use_wod_metrics:
+      wod_model_outputs = {
+          'detection_boxes': outputs['detection_boxes'],
+          'detection_scores': outputs['detection_scores'],
+          'detection_classes': outputs['detection_classes'],
+          'num_detections': outputs['num_detections'],
+          'source_id': labels['groundtruths']['source_id'],
+          'image_info': labels['image_info']
+      }
+      
logs.update( + {self.wod_metric.name: (labels['groundtruths'], wod_model_outputs)}) + + if metrics: + for m in metrics: + m.update_state(all_losses[m.name]) + logs.update({m.name: m.result()}) + return logs + + def aggregate_logs(self, state=None, step_outputs=None): + if self._task_config.use_coco_metrics: + if state is None: + self.coco_metric.reset_states() + self.coco_metric.update_state(step_outputs[self.coco_metric.name][0], + step_outputs[self.coco_metric.name][1]) + if self._task_config.use_wod_metrics: + if state is None: + self.wod_metric.reset_states() + self.wod_metric.update_state(step_outputs[self.wod_metric.name][0], + step_outputs[self.wod_metric.name][1]) + if state is None: + # Create an arbitrary state to indicate it's not the first step in the + # following calls to this function. + state = True + return state + + def reduce_aggregated_logs(self, aggregated_logs, global_step=None): + logs = {} + if self._task_config.use_coco_metrics: + logs.update(self.coco_metric.result()) + if self._task_config.use_wod_metrics: + logs.update(self.wod_metric.result()) + return logs diff --git a/official/vision/tasks/semantic_segmentation.py b/official/vision/tasks/semantic_segmentation.py new file mode 100644 index 0000000000000000000000000000000000000000..5a9da4bf6591ee447df764e280746ac2c9ad1a54 --- /dev/null +++ b/official/vision/tasks/semantic_segmentation.py @@ -0,0 +1,334 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +"""Image segmentation task definition.""" +from typing import Any, Optional, List, Tuple, Mapping, Union + +from absl import logging +import tensorflow as tf +from official.common import dataset_fn +from official.core import base_task +from official.core import task_factory +from official.vision.configs import semantic_segmentation as exp_cfg +from official.vision.dataloaders import input_reader_factory +from official.vision.dataloaders import segmentation_input +from official.vision.dataloaders import tfds_factory +from official.vision.evaluation import segmentation_metrics +from official.vision.losses import segmentation_losses +from official.vision.modeling import factory + + +@task_factory.register_task_cls(exp_cfg.SemanticSegmentationTask) +class SemanticSegmentationTask(base_task.Task): + """A task for semantic segmentation.""" + + def build_model(self): + """Builds segmentation model.""" + input_specs = tf.keras.layers.InputSpec(shape=[None] + + self.task_config.model.input_size) + + l2_weight_decay = self.task_config.losses.l2_weight_decay + # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. + # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) + # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) + l2_regularizer = ( + tf.keras.regularizers.l2(l2_weight_decay / + 2.0) if l2_weight_decay else None) + + model = factory.build_segmentation_model( + input_specs=input_specs, + model_config=self.task_config.model, + l2_regularizer=l2_regularizer) + return model + + def initialize(self, model: tf.keras.Model): + """Loads pretrained checkpoint.""" + if not self.task_config.init_checkpoint: + return + + ckpt_dir_or_file = self.task_config.init_checkpoint + if tf.io.gfile.isdir(ckpt_dir_or_file): + ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) + + # Restoring checkpoint. 
+    if 'all' in self.task_config.init_checkpoint_modules:
+      ckpt = tf.train.Checkpoint(**model.checkpoint_items)
+      status = ckpt.read(ckpt_dir_or_file)
+      status.expect_partial().assert_existing_objects_matched()
+    else:
+      ckpt_items = {}
+      if 'backbone' in self.task_config.init_checkpoint_modules:
+        ckpt_items.update(backbone=model.backbone)
+      if 'decoder' in self.task_config.init_checkpoint_modules:
+        ckpt_items.update(decoder=model.decoder)
+
+      ckpt = tf.train.Checkpoint(**ckpt_items)
+      status = ckpt.read(ckpt_dir_or_file)
+      status.expect_partial().assert_existing_objects_matched()
+
+    logging.info('Finished loading pretrained checkpoint from %s',
+                 ckpt_dir_or_file)
+
+  def build_inputs(self,
+                   params: exp_cfg.DataConfig,
+                   input_context: Optional[tf.distribute.InputContext] = None):
+    """Builds segmentation input."""
+
+    ignore_label = self.task_config.losses.ignore_label
+    gt_is_matting_map = self.task_config.losses.gt_is_matting_map
+
+    if params.tfds_name:
+      decoder = tfds_factory.get_segmentation_decoder(params.tfds_name)
+    else:
+      decoder = segmentation_input.Decoder()
+
+    parser = segmentation_input.Parser(
+        output_size=params.output_size,
+        crop_size=params.crop_size,
+        ignore_label=ignore_label,
+        resize_eval_groundtruth=params.resize_eval_groundtruth,
+        gt_is_matting_map=gt_is_matting_map,
+        groundtruth_padded_size=params.groundtruth_padded_size,
+        aug_scale_min=params.aug_scale_min,
+        aug_scale_max=params.aug_scale_max,
+        aug_rand_hflip=params.aug_rand_hflip,
+        preserve_aspect_ratio=params.preserve_aspect_ratio,
+        dtype=params.dtype)
+
+    reader = input_reader_factory.input_reader_generator(
+        params,
+        dataset_fn=dataset_fn.pick_dataset_fn(params.file_type),
+        decoder_fn=decoder.decode,
+        parser_fn=parser.parse_fn(params.is_training))
+
+    dataset = reader.read(input_context=input_context)
+
+    return dataset
+
+  def build_losses(self,
+                   labels: Mapping[str, tf.Tensor],
+                   model_outputs: Union[Mapping[str, tf.Tensor], tf.Tensor],
+                   aux_losses:
Optional[Any] = None):
+    """Segmentation loss.
+
+    Args:
+      labels: labels.
+      model_outputs: Output logits of the classifier.
+      aux_losses: auxiliary loss tensors, i.e. `losses` in keras.Model.
+
+    Returns:
+      The total loss tensor.
+    """
+    loss_params = self._task_config.losses
+    segmentation_loss_fn = segmentation_losses.SegmentationLoss(
+        loss_params.label_smoothing,
+        loss_params.class_weights,
+        loss_params.ignore_label,
+        use_groundtruth_dimension=loss_params.use_groundtruth_dimension,
+        top_k_percent_pixels=loss_params.top_k_percent_pixels,
+        gt_is_matting_map=loss_params.gt_is_matting_map)
+
+    total_loss = segmentation_loss_fn(model_outputs['logits'], labels['masks'])
+
+    if 'mask_scores' in model_outputs:
+      mask_scoring_loss_fn = segmentation_losses.MaskScoringLoss(
+          loss_params.ignore_label)
+      total_loss += mask_scoring_loss_fn(model_outputs['mask_scores'],
+                                         model_outputs['logits'],
+                                         labels['masks'])
+
+    if aux_losses:
+      total_loss += tf.add_n(aux_losses)
+
+    total_loss = loss_params.loss_weight * total_loss
+
+    return total_loss
+
+  def process_metrics(self, metrics, labels, model_outputs, **kwargs):
+    """Process and update metrics.
+
+    Called when using custom training loop API.
+
+    Args:
+      metrics: a nested structure of metrics objects. The return of function
+        self.build_metrics.
+      labels: a tensor or a nested structure of tensors.
+      model_outputs: a tensor or a nested structure of tensors. For example,
+        output of the keras model built by self.build_model.
+      **kwargs: other args.
+    """
+    for metric in metrics:
+      if 'mask_scores_mse' == metric.name:
+        actual_mask_scores = segmentation_losses.get_actual_mask_scores(
+            model_outputs['logits'], labels['masks'],
+            self.task_config.losses.ignore_label)
+        metric.update_state(actual_mask_scores, model_outputs['mask_scores'])
+      else:
+        metric.update_state(labels, model_outputs['logits'])
+
+  def build_metrics(self, training: bool = True):
+    """Gets streaming metrics for training/validation."""
+    metrics = []
+    self.iou_metric = None
+
+    if training and self.task_config.evaluation.report_train_mean_iou:
+      metrics.append(
+          segmentation_metrics.MeanIoU(
+              name='mean_iou',
+              num_classes=self.task_config.model.num_classes,
+              rescale_predictions=False,
+              dtype=tf.float32))
+      if self.task_config.model.get('mask_scoring_head'):
+        metrics.append(
+            tf.keras.metrics.MeanSquaredError(name='mask_scores_mse'))
+
+    if not training:
+      self.iou_metric = segmentation_metrics.PerClassIoU(
+          name='per_class_iou',
+          num_classes=self.task_config.model.num_classes,
+          rescale_predictions=(
+              not self.task_config.validation_data.resize_eval_groundtruth),
+          dtype=tf.float32)
+      if (self.task_config.validation_data.resize_eval_groundtruth and
+          self.task_config.model.get('mask_scoring_head')):
+        # Mask scores metric can only be computed if labels are scaled to match
+        # predicted mask scores.
+        metrics.append(
+            tf.keras.metrics.MeanSquaredError(name='mask_scores_mse'))
+
+    return metrics
+
+  def train_step(self,
+                 inputs: Tuple[Any, Any],
+                 model: tf.keras.Model,
+                 optimizer: tf.keras.optimizers.Optimizer,
+                 metrics: Optional[List[Any]] = None):
+    """Does forward and backward.
+
+    Args:
+      inputs: a dictionary of input tensors.
+      model: the model, forward pass definition.
+      optimizer: the optimizer for this training step.
+      metrics: a nested structure of metrics objects.
+
+    Returns:
+      A dictionary of logs.
+    """
+    features, labels = inputs
+
+    input_partition_dims = self.task_config.train_input_partition_dims
+    if input_partition_dims:
+      strategy = tf.distribute.get_strategy()
+      features = strategy.experimental_split_to_logical_devices(
+          features, input_partition_dims)
+
+    num_replicas = tf.distribute.get_strategy().num_replicas_in_sync
+    with tf.GradientTape() as tape:
+      outputs = model(features, training=True)
+      if isinstance(outputs, tf.Tensor):
+        outputs = {'logits': outputs}
+      # Casting output layer as float32 is necessary when mixed_precision is
+      # mixed_float16 or mixed_bfloat16 to ensure output is casted as float32.
+      outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs)
+
+      # Computes per-replica loss.
+      loss = self.build_losses(
+          model_outputs=outputs, labels=labels, aux_losses=model.losses)
+      # Scales loss as the default gradients allreduce performs sum inside the
+      # optimizer.
+      scaled_loss = loss / num_replicas
+
+      # For mixed_precision policy, when LossScaleOptimizer is used, loss is
+      # scaled for numerical stability.
+      if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer):
+        scaled_loss = optimizer.get_scaled_loss(scaled_loss)
+
+    tvars = model.trainable_variables
+    grads = tape.gradient(scaled_loss, tvars)
+    # Scales back gradient before apply_gradients when LossScaleOptimizer is
+    # used.
+    if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer):
+      grads = optimizer.get_unscaled_gradients(grads)
+    optimizer.apply_gradients(list(zip(grads, tvars)))
+
+    logs = {self.loss: loss}
+    if metrics:
+      self.process_metrics(metrics, labels, outputs)
+      logs.update({m.name: m.result() for m in metrics})
+
+    return logs
+
+  def validation_step(self,
+                      inputs: Tuple[Any, Any],
+                      model: tf.keras.Model,
+                      metrics: Optional[List[Any]] = None):
+    """Validation step.
+
+    Args:
+      inputs: a dictionary of input tensors.
+      model: the keras.Model.
+      metrics: a nested structure of metrics objects.
+ + Returns: + A dictionary of logs. + """ + features, labels = inputs + + input_partition_dims = self.task_config.eval_input_partition_dims + if input_partition_dims: + strategy = tf.distribute.get_strategy() + features = strategy.experimental_split_to_logical_devices( + features, input_partition_dims) + + outputs = self.inference_step(features, model) + if isinstance(outputs, tf.Tensor): + outputs = {'logits': outputs} + outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs) + + if self.task_config.validation_data.resize_eval_groundtruth: + loss = self.build_losses( + model_outputs=outputs, labels=labels, aux_losses=model.losses) + else: + loss = 0 + + logs = {self.loss: loss} + + if self.iou_metric is not None: + self.iou_metric.update_state(labels, outputs['logits']) + if metrics: + self.process_metrics(metrics, labels, outputs) + + return logs + + def inference_step(self, inputs: tf.Tensor, model: tf.keras.Model): + """Performs the forward step.""" + return model(inputs, training=False) + + def aggregate_logs(self, state=None, step_outputs=None): + if state is None and self.iou_metric is not None: + self.iou_metric.reset_states() + state = self.iou_metric + return state + + def reduce_aggregated_logs(self, aggregated_logs, global_step=None): + result = {} + if self.iou_metric is not None: + ious = self.iou_metric.result() + # TODO(arashwan): support loading class name from a label map file. 
+ if self.task_config.evaluation.report_per_class_iou: + for i, value in enumerate(ious.numpy()): + result.update({'iou/{}'.format(i): value}) + # Computes mean IoU + result.update({'mean_iou': tf.reduce_mean(ious)}) + return result diff --git a/official/vision/tasks/video_classification.py b/official/vision/tasks/video_classification.py new file mode 100644 index 0000000000000000000000000000000000000000..5d1e9770a301e0e10d2fb4857995b31d59adbcb7 --- /dev/null +++ b/official/vision/tasks/video_classification.py @@ -0,0 +1,369 @@ +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +"""Video classification task definition.""" +from typing import Any, Optional, List, Tuple + +from absl import logging +import tensorflow as tf +from official.core import base_task +from official.core import task_factory +from official.modeling import tf_utils +from official.vision.configs import video_classification as exp_cfg +from official.vision.dataloaders import input_reader_factory +from official.vision.dataloaders import video_input +from official.vision.modeling import factory_3d +from official.vision.ops import augment + + +@task_factory.register_task_cls(exp_cfg.VideoClassificationTask) +class VideoClassificationTask(base_task.Task): + """A task for video classification.""" + + def _get_num_classes(self): + """Gets the number of classes.""" + return self.task_config.train_data.num_classes + + def _get_feature_shape(self): + """Get the common feature shape for train and eval.""" + return [ + d1 if d1 == d2 else None + for d1, d2 in zip(self.task_config.train_data.feature_shape, + self.task_config.validation_data.feature_shape) + ] + + def _get_num_test_views(self): + """Gets number of views for test.""" + num_test_clips = self.task_config.validation_data.num_test_clips + num_test_crops = self.task_config.validation_data.num_test_crops + num_test_views = num_test_clips * num_test_crops + return num_test_views + + def _is_multilabel(self): + """If the label is multi-labels.""" + return self.task_config.train_data.is_multilabel + + def build_model(self): + """Builds video classification model.""" + common_input_shape = self._get_feature_shape() + input_specs = tf.keras.layers.InputSpec(shape=[None] + common_input_shape) + logging.info('Build model input %r', common_input_shape) + + l2_weight_decay = float(self.task_config.losses.l2_weight_decay) + # Divide weight decay by 2.0 to match the implementation of tf.nn.l2_loss. 
+ # (https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/l2) + # (https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss) + l2_regularizer = (tf.keras.regularizers.l2( + l2_weight_decay / 2.0) if l2_weight_decay else None) + + model = factory_3d.build_model( + self.task_config.model.model_type, + input_specs=input_specs, + model_config=self.task_config.model, + num_classes=self._get_num_classes(), + l2_regularizer=l2_regularizer) + + if self.task_config.freeze_backbone: + logging.info('Freezing model backbone.') + model.backbone.trainable = False + return model + + def initialize(self, model: tf.keras.Model): + """Loads pretrained checkpoint.""" + if not self.task_config.init_checkpoint: + return + + ckpt_dir_or_file = self.task_config.init_checkpoint + if tf.io.gfile.isdir(ckpt_dir_or_file): + ckpt_dir_or_file = tf.train.latest_checkpoint(ckpt_dir_or_file) + + # Restoring checkpoint. + if self.task_config.init_checkpoint_modules == 'all': + ckpt = tf.train.Checkpoint(model=model) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + elif self.task_config.init_checkpoint_modules == 'backbone': + ckpt = tf.train.Checkpoint(backbone=model.backbone) + status = ckpt.read(ckpt_dir_or_file) + status.expect_partial().assert_existing_objects_matched() + else: + raise ValueError( + "Only 'all' or 'backbone' can be used to initialize the model.") + + logging.info('Finished loading pretrained checkpoint from %s', + ckpt_dir_or_file) + + def _get_dataset_fn(self, params): + if params.file_type == 'tfrecord': + return tf.data.TFRecordDataset + else: + raise ValueError('Unknown input file type {!r}'.format(params.file_type)) + + def _get_decoder_fn(self, params): + if params.tfds_name: + decoder = video_input.VideoTfdsDecoder( + image_key=params.image_field_key, label_key=params.label_field_key) + else: + decoder = video_input.Decoder( + image_key=params.image_field_key, label_key=params.label_field_key) + if 
self.task_config.train_data.output_audio:
+      assert self.task_config.train_data.audio_feature, 'audio feature is empty'
+      decoder.add_feature(self.task_config.train_data.audio_feature,
+                          tf.io.VarLenFeature(dtype=tf.float32))
+    return decoder.decode
+
+  def build_inputs(self,
+                   params: exp_cfg.DataConfig,
+                   input_context: Optional[tf.distribute.InputContext] = None):
+    """Builds classification input."""
+
+    parser = video_input.Parser(
+        input_params=params,
+        image_key=params.image_field_key,
+        label_key=params.label_field_key)
+    postprocess_fn = video_input.PostBatchProcessor(params)
+    if params.mixup_and_cutmix is not None:
+      def mixup_and_cutmix(features, labels):
+        augmenter = augment.MixupAndCutmix(
+            mixup_alpha=params.mixup_and_cutmix.mixup_alpha,
+            cutmix_alpha=params.mixup_and_cutmix.cutmix_alpha,
+            prob=params.mixup_and_cutmix.prob,
+            label_smoothing=params.mixup_and_cutmix.label_smoothing,
+            num_classes=self._get_num_classes())
+        features['image'], labels = augmenter(features['image'], labels)
+        return features, labels
+      postprocess_fn = mixup_and_cutmix
+
+    reader = input_reader_factory.input_reader_generator(
+        params,
+        dataset_fn=self._get_dataset_fn(params),
+        decoder_fn=self._get_decoder_fn(params),
+        parser_fn=parser.parse_fn(params.is_training),
+        postprocess_fn=postprocess_fn)
+
+    dataset = reader.read(input_context=input_context)
+
+    return dataset
+
+  def build_losses(self,
+                   labels: Any,
+                   model_outputs: Any,
+                   aux_losses: Optional[Any] = None):
+    """Cross entropy classification loss.
+
+    Args:
+      labels: labels.
+      model_outputs: Output probabilities (softmax or sigmoid) of the model.
+      aux_losses: auxiliary loss tensors, i.e. `losses` in keras.Model.
+
+    Returns:
+      A dict of losses; the total loss is stored under the `self.loss` key.
+ """ + all_losses = {} + losses_config = self.task_config.losses + total_loss = None + if self._is_multilabel(): + entropy = -tf.reduce_mean( + tf.reduce_sum(model_outputs * tf.math.log(model_outputs + 1e-8), -1)) + total_loss = tf.keras.losses.binary_crossentropy( + labels, model_outputs, from_logits=False) + all_losses.update({ + 'class_loss': total_loss, + 'entropy': entropy, + }) + else: + if losses_config.one_hot: + total_loss = tf.keras.losses.categorical_crossentropy( + labels, + model_outputs, + from_logits=False, + label_smoothing=losses_config.label_smoothing) + else: + total_loss = tf.keras.losses.sparse_categorical_crossentropy( + labels, model_outputs, from_logits=False) + + total_loss = tf_utils.safe_mean(total_loss) + all_losses.update({ + 'class_loss': total_loss, + }) + if aux_losses: + all_losses.update({ + 'reg_loss': aux_losses, + }) + total_loss += tf.add_n(aux_losses) + all_losses[self.loss] = total_loss + + return all_losses + + def build_metrics(self, training: bool = True): + """Gets streaming metrics for training/validation.""" + if self.task_config.losses.one_hot: + metrics = [ + tf.keras.metrics.CategoricalAccuracy(name='accuracy'), + tf.keras.metrics.TopKCategoricalAccuracy(k=1, name='top_1_accuracy'), + tf.keras.metrics.TopKCategoricalAccuracy(k=5, name='top_5_accuracy') + ] + if self._is_multilabel(): + metrics.append( + tf.keras.metrics.AUC( + curve='ROC', multi_label=self._is_multilabel(), name='ROC-AUC')) + metrics.append( + tf.keras.metrics.RecallAtPrecision( + 0.95, name='RecallAtPrecision95')) + metrics.append( + tf.keras.metrics.AUC( + curve='PR', multi_label=self._is_multilabel(), name='PR-AUC')) + if self.task_config.metrics.use_per_class_recall: + for i in range(self._get_num_classes()): + metrics.append( + tf.keras.metrics.Recall(class_id=i, name=f'recall-{i}')) + else: + metrics = [ + tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'), + tf.keras.metrics.SparseTopKCategoricalAccuracy( + k=1, 
name='top_1_accuracy'), + tf.keras.metrics.SparseTopKCategoricalAccuracy( + k=5, name='top_5_accuracy') + ] + return metrics + + def process_metrics(self, metrics: List[Any], labels: Any, + model_outputs: Any): + """Process and update metrics. + + Called when using custom training loop API. + + Args: + metrics: a nested structure of metrics objects. The return of function + self.build_metrics. + labels: a tensor or a nested structure of tensors. + model_outputs: a tensor or a nested structure of tensors. For example, + output of the keras model built by self.build_model. + """ + for metric in metrics: + metric.update_state(labels, model_outputs) + + def train_step(self, + inputs: Tuple[Any, Any], + model: tf.keras.Model, + optimizer: tf.keras.optimizers.Optimizer, + metrics: Optional[List[Any]] = None): + """Does forward and backward. + + Args: + inputs: a dictionary of input tensors. + model: the model, forward pass definition. + optimizer: the optimizer for this training step. + metrics: a nested structure of metrics objects. + + Returns: + A dictionary of logs. + """ + features, labels = inputs + input_partition_dims = self.task_config.train_input_partition_dims + if input_partition_dims: + strategy = tf.distribute.get_strategy() + features['image'] = strategy.experimental_split_to_logical_devices( + features['image'], input_partition_dims) + + num_replicas = tf.distribute.get_strategy().num_replicas_in_sync + with tf.GradientTape() as tape: + outputs = model(features, training=True) + # Casting output layer as float32 is necessary when mixed_precision is + # mixed_float16 or mixed_bfloat16 to ensure output is casted as float32. + outputs = tf.nest.map_structure( + lambda x: tf.cast(x, tf.float32), outputs) + + # Computes per-replica loss. 
+      if self._is_multilabel():
+        outputs = tf.nest.map_structure(tf.math.sigmoid, outputs)
+      else:
+        outputs = tf.nest.map_structure(tf.math.softmax, outputs)
+      all_losses = self.build_losses(
+          model_outputs=outputs, labels=labels, aux_losses=model.losses)
+      loss = all_losses[self.loss]
+      # Scales loss as the default gradients allreduce performs sum inside the
+      # optimizer.
+      scaled_loss = loss / num_replicas
+
+      # For mixed_precision policy, when LossScaleOptimizer is used, loss is
+      # scaled for numerical stability.
+      if isinstance(
+          optimizer, tf.keras.mixed_precision.LossScaleOptimizer):
+        scaled_loss = optimizer.get_scaled_loss(scaled_loss)
+
+    tvars = model.trainable_variables
+    grads = tape.gradient(scaled_loss, tvars)
+    # Scales back gradient before apply_gradients when LossScaleOptimizer is
+    # used.
+    if isinstance(optimizer, tf.keras.mixed_precision.LossScaleOptimizer):
+      grads = optimizer.get_unscaled_gradients(grads)
+    optimizer.apply_gradients(list(zip(grads, tvars)))
+
+    logs = all_losses
+    if metrics:
+      self.process_metrics(metrics, labels, outputs)
+      logs.update({m.name: m.result() for m in metrics})
+    elif model.compiled_metrics:
+      self.process_compiled_metrics(model.compiled_metrics, labels, outputs)
+      logs.update({m.name: m.result() for m in model.metrics})
+    return logs
+
+  def validation_step(self,
+                      inputs: Tuple[Any, Any],
+                      model: tf.keras.Model,
+                      metrics: Optional[List[Any]] = None):
+    """Validation step.
+
+    Args:
+      inputs: a dictionary of input tensors.
+      model: the keras.Model.
+      metrics: a nested structure of metrics objects.
+
+    Returns:
+      A dictionary of logs.
+    """
+    features, labels = inputs
+    input_partition_dims = self.task_config.eval_input_partition_dims
+    if input_partition_dims:
+      strategy = tf.distribute.get_strategy()
+      features['image'] = strategy.experimental_split_to_logical_devices(
+          features['image'], input_partition_dims)
+
+    outputs = self.inference_step(features, model)
+    outputs = tf.nest.map_structure(lambda x: tf.cast(x, tf.float32), outputs)
+    logs = self.build_losses(model_outputs=outputs, labels=labels,
+                             aux_losses=model.losses)
+
+    if metrics:
+      self.process_metrics(metrics, labels, outputs)
+      logs.update({m.name: m.result() for m in metrics})
+    elif model.compiled_metrics:
+      self.process_compiled_metrics(model.compiled_metrics, labels, outputs)
+      logs.update({m.name: m.result() for m in model.metrics})
+    return logs
+
+  def inference_step(self, features: tf.Tensor, model: tf.keras.Model):
+    """Performs the forward step."""
+    outputs = model(features, training=False)
+    if self._is_multilabel():
+      outputs = tf.nest.map_structure(tf.math.sigmoid, outputs)
+    else:
+      outputs = tf.nest.map_structure(tf.math.softmax, outputs)
+    num_test_views = self._get_num_test_views()
+    if num_test_views > 1:
+      # Averages output probabilities across multiple views.
+      outputs = tf.reshape(outputs, [-1, num_test_views, outputs.shape[-1]])
+      outputs = tf.reduce_mean(outputs, axis=1)
+    return outputs
diff --git a/official/vision/train.py b/official/vision/train.py
new file mode 100644
index 0000000000000000000000000000000000000000..cb0a3cb58c42a99b83d0c4519dbe5e5ce0924f01
--- /dev/null
+++ b/official/vision/train.py
@@ -0,0 +1,69 @@
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""TensorFlow Model Garden Vision training driver.""" + +from absl import app +from absl import flags +import gin + +from official.common import distribute_utils +from official.common import flags as tfm_flags +from official.core import task_factory +from official.core import train_lib +from official.core import train_utils +from official.modeling import performance +# pylint: disable=unused-import +from official.vision import registry_imports +# pylint: enable=unused-import + +FLAGS = flags.FLAGS + + +def main(_): + gin.parse_config_files_and_bindings(FLAGS.gin_file, FLAGS.gin_params) + params = train_utils.parse_configuration(FLAGS) + model_dir = FLAGS.model_dir + if 'train' in FLAGS.mode: + # Pure eval modes do not output yaml files. Otherwise a continuous eval job + # may race against the train job for writing the same file. + train_utils.serialize_config(params, model_dir) + + # Sets mixed_precision policy. Using 'mixed_float16' or 'mixed_bfloat16' + # can significantly speed up models by using float16 on GPUs and bfloat16 on + # TPUs. loss_scale takes effect only when dtype is float16. 
+ if params.runtime.mixed_precision_dtype: + performance.set_mixed_precision_policy(params.runtime.mixed_precision_dtype) + distribution_strategy = distribute_utils.get_distribution_strategy( + distribution_strategy=params.runtime.distribution_strategy, + all_reduce_alg=params.runtime.all_reduce_alg, + num_gpus=params.runtime.num_gpus, + tpu_address=params.runtime.tpu) + with distribution_strategy.scope(): + task = task_factory.get_task(params.task, logging_dir=model_dir) + + train_lib.run_experiment( + distribution_strategy=distribution_strategy, + task=task, + mode=FLAGS.mode, + params=params, + model_dir=model_dir) + + train_utils.save_gin_config(FLAGS.mode, model_dir) + +if __name__ == '__main__': + tfm_flags.define_flags() + flags.mark_flags_as_required(['experiment', 'mode', 'model_dir']) + app.run(main) diff --git a/official/vision/beta/train_spatial_partitioning.py b/official/vision/train_spatial_partitioning.py similarity index 97% rename from official/vision/beta/train_spatial_partitioning.py rename to official/vision/train_spatial_partitioning.py index 6bcbb5327944808e90e460be45f8dc638cd5ce42..bb0f5ec972385c5cf3d65d2f5bdc3bc4b803cb46 100644 --- a/official/vision/beta/train_spatial_partitioning.py +++ b/official/vision/train_spatial_partitioning.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -12,7 +12,6 @@ # See the License for the specific language governing permissions and # limitations under the License.
-# Lint as: python3 """TensorFlow Model Garden Vision training driver with spatial partitioning.""" from typing import Sequence @@ -22,13 +21,13 @@ import gin import numpy as np import tensorflow as tf -from official.common import registry_imports # pylint: disable=unused-import from official.common import distribute_utils from official.common import flags as tfm_flags from official.core import task_factory from official.core import train_lib from official.core import train_utils from official.modeling import performance +from official.vision import registry_imports # pylint: disable=unused-import FLAGS = flags.FLAGS diff --git a/official/vision/utils/__init__.py b/official/vision/utils/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/vision/utils/__init__.py +++ b/official/vision/utils/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/utils/object_detection/__init__.py b/official/vision/utils/object_detection/__init__.py index e419af524b5f349fe04abfa820c3cb51b777d422..310bfb28f0c252bc4a4485325059bff28c5250c2 100644 --- a/official/vision/utils/object_detection/__init__.py +++ b/official/vision/utils/object_detection/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
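In the video classification task's `inference_step` earlier in this diff, the model outputs are turned into probabilities and, when `num_test_views > 1`, reshaped to `[-1, num_test_views, num_classes]` and averaged over the view axis. The following is a minimal pure-Python sketch of that averaging step only, assuming (as the reshape requires) that the views of each clip occupy consecutive rows of the batch; `average_over_views` is a hypothetical name, not part of the Model Garden API.

```python
from typing import List


def average_over_views(probs: List[List[float]],
                       num_test_views: int) -> List[List[float]]:
    """Averages per-view class probabilities into one row per clip.

    `probs` has shape [num_clips * num_test_views, num_classes], with the
    views of each clip stored in consecutive rows, which is the layout that
    reshaping to [-1, num_test_views, num_classes] assumes.
    """
    if num_test_views < 1 or len(probs) % num_test_views != 0:
        raise ValueError('batch size must be a multiple of num_test_views')
    averaged = []
    for start in range(0, len(probs), num_test_views):
        views = probs[start:start + num_test_views]
        # Column-wise mean over the views, like tf.reduce_mean(..., axis=1).
        averaged.append([sum(col) / num_test_views for col in zip(*views)])
    return averaged


# Two clips, two views each, two classes.
print(average_over_views([[0.25, 0.75], [0.75, 0.25],
                          [1.0, 0.0], [0.5, 0.5]], num_test_views=2))
# [[0.5, 0.5], [0.75, 0.25]]
```

Averaging probabilities (after the sigmoid or softmax) rather than raw logits keeps each averaged row a valid distribution in the softmax case.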
diff --git a/official/vision/utils/object_detection/argmax_matcher.py b/official/vision/utils/object_detection/argmax_matcher.py index c3b012a52a179ce80f7f2ea24d903ed5ce9f0efd..6be34ae3e9cca6a105580a086b3a373074bb202c 100644 --- a/official/vision/utils/object_detection/argmax_matcher.py +++ b/official/vision/utils/object_detection/argmax_matcher.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/utils/object_detection/balanced_positive_negative_sampler.py b/official/vision/utils/object_detection/balanced_positive_negative_sampler.py index 37463d479b9177b1bb0e1d217c71e657321bc56b..f9a579b99c6622b4328157b3c6dff096423c232b 100644 --- a/official/vision/utils/object_detection/balanced_positive_negative_sampler.py +++ b/official/vision/utils/object_detection/balanced_positive_negative_sampler.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -64,7 +64,7 @@ class BalancedPositiveNegativeSampler(minibatch_sampler.MinibatchSampler): sorted_indices_tensor: A sorted int32 tensor of shape [N] which contains the signed indices of the examples where the sign is based on the label value. The examples that cannot be sampled are set to 0. It samples - atmost sample_size*positive_fraction positive examples and remaining + at most sample_size*positive_fraction positive examples and remaining from negative examples. sample_size: Size of subsamples. 
@@ -77,8 +77,8 @@ class BalancedPositiveNegativeSampler(minibatch_sampler.MinibatchSampler): tf.zeros(input_length, tf.int32)) num_sampled_pos = tf.reduce_sum( input_tensor=tf.cast(valid_positive_index, tf.int32)) - max_num_positive_samples = tf.constant( - int(sample_size * self._positive_fraction), tf.int32) + max_num_positive_samples = tf.cast( + tf.cast(sample_size, tf.float32) * self._positive_fraction, tf.int32) num_positive_samples = tf.minimum(max_num_positive_samples, num_sampled_pos) num_negative_samples = tf.constant(sample_size, tf.int32) - num_positive_samples @@ -219,7 +219,7 @@ class BalancedPositiveNegativeSampler(minibatch_sampler.MinibatchSampler): indicator: boolean tensor of shape [N] whose True entries can be sampled. batch_size: desired batch size. If None, keeps all positive samples and randomly selects negative samples so that the positive sample fraction - matches self._positive_fraction. It cannot be None is is_static is True. + matches self._positive_fraction. It cannot be None if is_static is True. labels: boolean tensor of shape [N] denoting positive(=True) and negative (=False) examples. scope: name scope. 
@@ -259,7 +259,9 @@ class BalancedPositiveNegativeSampler(minibatch_sampler.MinibatchSampler): max_num_pos = tf.reduce_sum( input_tensor=tf.cast(positive_idx, dtype=tf.int32)) else: - max_num_pos = int(self._positive_fraction * batch_size) + max_num_pos = tf.cast( + self._positive_fraction * tf.cast(batch_size, tf.float32), + tf.int32) sampled_pos_idx = self.subsample_indicator(positive_idx, max_num_pos) num_sampled_pos = tf.reduce_sum( input_tensor=tf.cast(sampled_pos_idx, tf.int32)) diff --git a/official/vision/utils/object_detection/box_coder.py b/official/vision/utils/object_detection/box_coder.py index c58eead30d4912b8c947fc532cb5b71dc5138233..94904df2600ce7efab6fd40856761c133ab29ee4 100644 --- a/official/vision/utils/object_detection/box_coder.py +++ b/official/vision/utils/object_detection/box_coder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/utils/object_detection/box_list.py b/official/vision/utils/object_detection/box_list.py index f5d4443c81b22f1586f6691c5c4e309de3046f9c..bf78c8e81e542f3e24b8005678a15125b8fdb208 100644 --- a/official/vision/utils/object_detection/box_list.py +++ b/official/vision/utils/object_detection/box_list.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
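The `BalancedPositiveNegativeSampler` changes above compute the positive budget as `sample_size * positive_fraction` truncated to an integer with `tf.cast`, cap it by the number of positives actually available, and fill the remaining slots with negatives. Replacing `int(...)` with `tf.cast` presumably lets this work when `sample_size` or `batch_size` is a tensor rather than a Python int. A minimal pure-Python sketch of the same arithmetic, where `split_sample_counts` is a hypothetical helper and not part of the library:

```python
from typing import Tuple


def split_sample_counts(sample_size: int, positive_fraction: float,
                        num_available_pos: int) -> Tuple[int, int]:
    """Returns (num_positive, num_negative) for one balanced minibatch.

    The positive budget is sample_size * positive_fraction truncated toward
    zero (as a float-to-int32 tf.cast does), capped by the positives that
    are available; negatives fill the rest of the sample.
    """
    max_num_positive = int(float(sample_size) * positive_fraction)
    num_positive = min(max_num_positive, num_available_pos)
    num_negative = sample_size - num_positive
    return num_positive, num_negative


print(split_sample_counts(256, 0.25, 100))  # (64, 192)
print(split_sample_counts(256, 0.25, 10))   # (10, 246): too few positives
```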
diff --git a/official/vision/utils/object_detection/box_list_ops.py b/official/vision/utils/object_detection/box_list_ops.py index 9f1b230b533a9fe3680fff8ce7bbe385b48a1194..819d115a34c258576f846e2c40e0b88fa04ee1cf 100644 --- a/official/vision/utils/object_detection/box_list_ops.py +++ b/official/vision/utils/object_detection/box_list_ops.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -54,7 +54,7 @@ def area(boxlist, scope=None): Returns: a tensor with shape [N] representing box areas. """ - with tf.name_scope(scope, 'Area'): + with tf.name_scope(scope or 'Area'): y_min, x_min, y_max, x_max = tf.split( value=boxlist.get(), num_or_size_splits=4, axis=1) return tf.squeeze((y_max - y_min) * (x_max - x_min), [1]) @@ -71,7 +71,7 @@ def height_width(boxlist, scope=None): Height: A tensor with shape [N] representing box heights. Width: A tensor with shape [N] representing box widths. 
""" - with tf.name_scope(scope, 'HeightWidth'): + with tf.name_scope(scope or 'HeightWidth'): y_min, x_min, y_max, x_max = tf.split( value=boxlist.get(), num_or_size_splits=4, axis=1) return tf.squeeze(y_max - y_min, [1]), tf.squeeze(x_max - x_min, [1]) @@ -89,7 +89,7 @@ def scale(boxlist, y_scale, x_scale, scope=None): Returns: boxlist: BoxList holding N boxes """ - with tf.name_scope(scope, 'Scale'): + with tf.name_scope(scope or 'Scale'): y_scale = tf.cast(y_scale, tf.float32) x_scale = tf.cast(x_scale, tf.float32) y_min, x_min, y_max, x_max = tf.split( @@ -121,7 +121,7 @@ def clip_to_window(boxlist, window, filter_nonoverlapping=True, scope=None): Returns: a BoxList holding M_out boxes where M_out <= M_in """ - with tf.name_scope(scope, 'ClipToWindow'): + with tf.name_scope(scope or 'ClipToWindow'): y_min, x_min, y_max, x_max = tf.split( value=boxlist.get(), num_or_size_splits=4, axis=1) win_y_min, win_x_min, win_y_max, win_x_max = tf.unstack(window) @@ -160,7 +160,7 @@ def prune_outside_window(boxlist, window, scope=None): valid_indices: a tensor with shape [M_out] indexing the valid bounding boxes in the input tensor. """ - with tf.name_scope(scope, 'PruneOutsideWindow'): + with tf.name_scope(scope or 'PruneOutsideWindow'): y_min, x_min, y_max, x_max = tf.split( value=boxlist.get(), num_or_size_splits=4, axis=1) win_y_min, win_x_min, win_y_max, win_x_max = tf.unstack(window) @@ -194,7 +194,7 @@ def prune_completely_outside_window(boxlist, window, scope=None): valid_indices: a tensor with shape [M_out] indexing the valid bounding boxes in the input tensor. 
""" - with tf.name_scope(scope, 'PruneCompleteleyOutsideWindow'): + with tf.name_scope(scope or 'PruneCompleteleyOutsideWindow'): y_min, x_min, y_max, x_max = tf.split( value=boxlist.get(), num_or_size_splits=4, axis=1) win_y_min, win_x_min, win_y_max, win_x_max = tf.unstack(window) @@ -220,7 +220,7 @@ def intersection(boxlist1, boxlist2, scope=None): Returns: a tensor with shape [N, M] representing pairwise intersections """ - with tf.name_scope(scope, 'Intersection'): + with tf.name_scope(scope or 'Intersection'): y_min1, x_min1, y_max1, x_max1 = tf.split( value=boxlist1.get(), num_or_size_splits=4, axis=1) y_min2, x_min2, y_max2, x_max2 = tf.split( @@ -245,7 +245,7 @@ def matched_intersection(boxlist1, boxlist2, scope=None): Returns: a tensor with shape [N] representing pairwise intersections """ - with tf.name_scope(scope, 'MatchedIntersection'): + with tf.name_scope(scope or 'MatchedIntersection'): y_min1, x_min1, y_max1, x_max1 = tf.split( value=boxlist1.get(), num_or_size_splits=4, axis=1) y_min2, x_min2, y_max2, x_max2 = tf.split( @@ -270,7 +270,7 @@ def iou(boxlist1, boxlist2, scope=None): Returns: a tensor with shape [N, M] representing pairwise iou scores. """ - with tf.name_scope(scope, 'IOU'): + with tf.name_scope(scope or 'IOU'): intersections = intersection(boxlist1, boxlist2) areas1 = area(boxlist1) areas2 = area(boxlist2) @@ -292,7 +292,7 @@ def matched_iou(boxlist1, boxlist2, scope=None): Returns: a tensor with shape [N] representing pairwise iou scores. """ - with tf.name_scope(scope, 'MatchedIOU'): + with tf.name_scope(scope or 'MatchedIOU'): intersections = matched_intersection(boxlist1, boxlist2) areas1 = area(boxlist1) areas2 = area(boxlist2) @@ -317,7 +317,7 @@ def ioa(boxlist1, boxlist2, scope=None): Returns: a tensor with shape [N, M] representing pairwise ioa scores. 
""" - with tf.name_scope(scope, 'IOA'): + with tf.name_scope(scope or 'IOA'): intersections = intersection(boxlist1, boxlist2) areas = tf.expand_dims(area(boxlist2), 0) return tf.truediv(intersections, areas) @@ -344,7 +344,7 @@ def prune_non_overlapping_boxes(boxlist1, keep_inds: A tensor with shape [N'] indexing kept bounding boxes in the first input BoxList `boxlist1`. """ - with tf.name_scope(scope, 'PruneNonOverlappingBoxes'): + with tf.name_scope(scope or 'PruneNonOverlappingBoxes'): ioa_ = ioa(boxlist2, boxlist1) # [M, N] tensor ioa_ = tf.reduce_max(ioa_, reduction_indices=[0]) # [N] tensor keep_bool = tf.greater_equal(ioa_, tf.constant(min_overlap)) @@ -364,7 +364,7 @@ def prune_small_boxes(boxlist, min_side, scope=None): Returns: A pruned boxlist. """ - with tf.name_scope(scope, 'PruneSmallBoxes'): + with tf.name_scope(scope or 'PruneSmallBoxes'): height, width = height_width(boxlist) is_valid = tf.logical_and( tf.greater_equal(width, min_side), tf.greater_equal(height, min_side)) @@ -391,7 +391,7 @@ def change_coordinate_frame(boxlist, window, scope=None): Returns: Returns a BoxList object with N boxes. """ - with tf.name_scope(scope, 'ChangeCoordinateFrame'): + with tf.name_scope(scope or 'ChangeCoordinateFrame'): win_height = window[2] - window[0] win_width = window[3] - window[1] boxlist_new = scale( @@ -423,7 +423,7 @@ def sq_dist(boxlist1, boxlist2, scope=None): Returns: a tensor with shape [N, M] representing pairwise distances """ - with tf.name_scope(scope, 'SqDist'): + with tf.name_scope(scope or 'SqDist'): sqnorm1 = tf.reduce_sum(tf.square(boxlist1.get()), 1, keep_dims=True) sqnorm2 = tf.reduce_sum(tf.square(boxlist2.get()), 1, keep_dims=True) innerprod = tf.matmul( @@ -463,7 +463,7 @@ def boolean_mask(boxlist, Raises: ValueError: if `indicator` is not a rank-1 boolean tensor. 
""" - with tf.name_scope(scope, 'BooleanMask'): + with tf.name_scope(scope or 'BooleanMask'): if indicator.shape.ndims != 1: raise ValueError('indicator should have rank 1') if indicator.dtype != tf.bool: @@ -521,7 +521,7 @@ def gather(boxlist, indices, fields=None, scope=None, use_static_shapes=False): ValueError: if specified field is not contained in boxlist or if the indices are not of type int32 """ - with tf.name_scope(scope, 'Gather'): + with tf.name_scope(scope or 'Gather'): if len(indices.shape.as_list()) != 1: raise ValueError('indices should have rank 1') if indices.dtype != tf.int32 and indices.dtype != tf.int64: @@ -562,7 +562,7 @@ def concatenate(boxlists, fields=None, scope=None): contains non BoxList objects), or if requested fields are not contained in all boxlists """ - with tf.name_scope(scope, 'Concatenate'): + with tf.name_scope(scope or 'Concatenate'): if not isinstance(boxlists, list): raise ValueError('boxlists should be a list') if not boxlists: @@ -612,7 +612,7 @@ def sort_by_field(boxlist, field, order=SortOrder.descend, scope=None): ValueError: if specified field does not exist ValueError: if the order is not either descend or ascend """ - with tf.name_scope(scope, 'SortByField'): + with tf.name_scope(scope or 'SortByField'): if order != SortOrder.descend and order != SortOrder.ascend: raise ValueError('Invalid sort order') @@ -653,7 +653,7 @@ def visualize_boxes_in_image(image, boxlist, normalized=False, scope=None): Returns: image_and_boxes: an image tensor with shape [height, width, 3] """ - with tf.name_scope(scope, 'VisualizeBoxesInImage'): + with tf.name_scope(scope or 'VisualizeBoxesInImage'): if not normalized: height, width, _ = tf.unstack(tf.shape(image)) boxlist = scale(boxlist, 1.0 / tf.cast(height, tf.float32), @@ -679,7 +679,7 @@ def filter_field_value_equals(boxlist, field, value, scope=None): ValueError: if boxlist not a BoxList object or if it does not have the specified field. 
""" - with tf.name_scope(scope, 'FilterFieldValueEquals'): + with tf.name_scope(scope or 'FilterFieldValueEquals'): if not isinstance(boxlist, box_list.BoxList): raise ValueError('boxlist must be a BoxList') if not boxlist.has_field(field): @@ -710,7 +710,7 @@ def filter_greater_than(boxlist, thresh, scope=None): ValueError: if boxlist not a BoxList object or if it does not have a scores field """ - with tf.name_scope(scope, 'FilterGreaterThan'): + with tf.name_scope(scope or 'FilterGreaterThan'): if not isinstance(boxlist, box_list.BoxList): raise ValueError('boxlist must be a BoxList') if not boxlist.has_field('scores'): @@ -746,7 +746,7 @@ def non_max_suppression(boxlist, thresh, max_output_size, scope=None): Raises: ValueError: if thresh is not in [0, 1] """ - with tf.name_scope(scope, 'NonMaxSuppression'): + with tf.name_scope(scope or 'NonMaxSuppression'): if not 0 <= thresh <= 1.0: raise ValueError('thresh must be between 0 and 1') if not isinstance(boxlist, box_list.BoxList): @@ -802,7 +802,7 @@ def to_normalized_coordinates(boxlist, Returns: boxlist with normalized coordinates in [0, 1]. """ - with tf.name_scope(scope, 'ToNormalizedCoordinates'): + with tf.name_scope(scope or 'ToNormalizedCoordinates'): height = tf.cast(height, tf.float32) width = tf.cast(width, tf.float32) @@ -842,7 +842,7 @@ def to_absolute_coordinates(boxlist, boxlist with absolute coordinates in terms of the image size. """ - with tf.name_scope(scope, 'ToAbsoluteCoordinates'): + with tf.name_scope(scope or 'ToAbsoluteCoordinates'): height = tf.cast(height, tf.float32) width = tf.cast(width, tf.float32) @@ -987,10 +987,9 @@ def box_voting(selected_boxes, pool_boxes, iou_thresh=0.5): # match to any boxes in pool_boxes. For such boxes without any matches, we # should return the original boxes without voting. match_assert = tf.Assert( - tf.reduce_all(tf.greater(num_matches, 0)), [ - 'Each box in selected_boxes must match with at least one box ' - 'in pool_boxes.' 
- ]) + tf.reduce_all(tf.greater(num_matches, 0)), + 'Each box in selected_boxes must match with at least one box ' + 'in pool_boxes.') scores = tf.expand_dims(pool_boxes.get_field('scores'), 1) scores_assert = tf.Assert( @@ -1024,7 +1023,7 @@ def get_minimal_coverage_box(boxlist, default_box=None, scope=None): boxes in the box list. If the boxlist does not contain any boxes, the default box is returned. """ - with tf.name_scope(scope, 'CreateCoverageBox'): + with tf.name_scope(scope or 'CreateCoverageBox'): num_boxes = boxlist.num_boxes() def coverage_box(bboxes): @@ -1068,7 +1067,7 @@ def sample_boxes_by_jittering(boxlist, sampled_boxlist: A boxlist containing num_boxes_to_sample boxes in normalized coordinates. """ - with tf.name_scope(scope, 'SampleBoxesByJittering'): + with tf.name_scope(scope or 'SampleBoxesByJittering'): num_boxes = boxlist.num_boxes() box_indices = tf.random_uniform([num_boxes_to_sample], minval=0, diff --git a/official/vision/utils/object_detection/faster_rcnn_box_coder.py b/official/vision/utils/object_detection/faster_rcnn_box_coder.py index 00ce4c00c62b9e81e33013b52607ca659ab4c8df..f319ef8b7d85883e650f93b5ac21207db76c8b9a 100644 --- a/official/vision/utils/object_detection/faster_rcnn_box_coder.py +++ b/official/vision/utils/object_detection/faster_rcnn_box_coder.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/utils/object_detection/matcher.py b/official/vision/utils/object_detection/matcher.py index 1586830970437a19566bf430c18e8ca2b7e47a62..412a7d8590b098bc196077f89a99421add4f2228 100644 --- a/official/vision/utils/object_detection/matcher.py +++ b/official/vision/utils/object_detection/matcher.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. 
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/utils/object_detection/minibatch_sampler.py b/official/vision/utils/object_detection/minibatch_sampler.py index 07ffc8bc1901344b699bc44d82181447ede65a7e..d013a438de07ce8c3a7a755d08cb187e030eab2a 100644 --- a/official/vision/utils/object_detection/minibatch_sampler.py +++ b/official/vision/utils/object_detection/minibatch_sampler.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/utils/object_detection/ops.py b/official/vision/utils/object_detection/ops.py index a0892e46ac3ad8fbc2eaa502b085da7a04e4fd3d..dac1cc869df5afdda468527af1f9a0781ce8c852 100644 --- a/official/vision/utils/object_detection/ops.py +++ b/official/vision/utils/object_detection/ops.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/utils/object_detection/preprocessor.py b/official/vision/utils/object_detection/preprocessor.py index 082495b5e0e4c3f527a1bd5e2eb17bbe3bb0ccf0..fd2d87fc73a93785da89f3e75dbf69865039ddfc 100644 --- a/official/vision/utils/object_detection/preprocessor.py +++ b/official/vision/utils/object_detection/preprocessor.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/utils/object_detection/region_similarity_calculator.py b/official/vision/utils/object_detection/region_similarity_calculator.py index 9b26b4c65f9c0dde20b1d2d1916b0c3aa3d9e23f..e94660d684749fbd790f1e5370a955c752af53d6 100644 --- a/official/vision/utils/object_detection/region_similarity_calculator.py +++ b/official/vision/utils/object_detection/region_similarity_calculator.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/utils/object_detection/shape_utils.py b/official/vision/utils/object_detection/shape_utils.py index 6bf7c49d0e1d0eb9f10524e7889ba74c461bfc50..15af56d4af124249aa0cd16d72976ea3b53971e9 100644 --- a/official/vision/utils/object_detection/shape_utils.py +++ b/official/vision/utils/object_detection/shape_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/utils/object_detection/target_assigner.py b/official/vision/utils/object_detection/target_assigner.py index 4dae06dba09c48d7e76a1acc51adb548b0f93147..7c1b378d128452325bca23c6f701525d4409548d 100644 --- a/official/vision/utils/object_detection/target_assigner.py +++ b/official/vision/utils/object_detection/target_assigner.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/official/vision/utils/object_detection/visualization_utils.py b/official/vision/utils/object_detection/visualization_utils.py index a36a89ddb02be8157076d0293ac1bfbadfd03022..48159e6ba9c00dec4eca0b49a4fd727ba4df0c41 100644 --- a/official/vision/utils/object_detection/visualization_utils.py +++ b/official/vision/utils/object_detection/visualization_utils.py @@ -1,4 +1,4 @@ -# Copyright 2021 The TensorFlow Authors. All Rights Reserved. +# Copyright 2022 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -34,7 +34,7 @@ import PIL.ImageFont as ImageFont import six import tensorflow as tf -from official.vision.beta.ops import box_ops +from official.vision.ops import box_ops from official.vision.utils.object_detection import shape_utils _TITLE_LEFT_MARGIN = 10 @@ -679,7 +679,7 @@ def add_cdf_image_summary(values, name): np.arange(cumulative_values.size, dtype=np.float32) / cumulative_values.size) fig = plt.figure(frameon=False) - ax = fig.add_subplot('111') + ax = fig.add_subplot(1, 1, 1) ax.plot(fraction_of_examples, cumulative_values) ax.set_ylabel('cumulative normalized values') ax.set_xlabel('fraction of examples') @@ -708,7 +708,7 @@ def add_hist_image_summary(values, bins, name): def hist_plot(values, bins): """Numpy function to plot hist.""" fig = plt.figure(frameon=False) - ax = fig.add_subplot('111') + ax = fig.add_subplot(1, 1, 1) y, x = np.histogram(values, bins=bins) ax.plot(x[:-1], y) ax.set_ylabel('count') diff --git a/orbit/__init__.py b/orbit/__init__.py index 01442a565d5260a93f97a0bd38b0c793fed1bb1f..58f0f5f2f8551c2febc36afa7557dfcd09c9c9d6 100644 --- a/orbit/__init__.py +++ b/orbit/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. 
+# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -15,6 +15,7 @@ """Defines exported symbols for the `orbit` package.""" from orbit import actions +# Internal import orbit. from orbit import utils from orbit.controller import Action diff --git a/orbit/actions/__init__.py b/orbit/actions/__init__.py index 5c3eab2d8b09cb8fb273751c5e4151224c7b0bfd..a18cc94b91879004945e39eb70bdb30cc0d8eaac 100644 --- a/orbit/actions/__init__.py +++ b/orbit/actions/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/orbit/actions/conditional_action.py b/orbit/actions/conditional_action.py index e4b8122270f18c3cd3f48138ad8cf205f74974ac..95e33f1216091312d59ea5dcf41c19d90719d161 100644 --- a/orbit/actions/conditional_action.py +++ b/orbit/actions/conditional_action.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/orbit/actions/conditional_action_test.py b/orbit/actions/conditional_action_test.py index cfcfd0f541b0335c1d450bf6169e6ca225321ebc..53f4891624f4d3bc5f0cf1971fce25d204c1cf18 100644 --- a/orbit/actions/conditional_action_test.py +++ b/orbit/actions/conditional_action_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
diff --git a/orbit/actions/export_saved_model.py b/orbit/actions/export_saved_model.py index e53c40c38d787b2f40b378afdca3f0f3c118435a..b2dea003108fa2a8af66da8b0d3603a4eb8a9f41 100644 --- a/orbit/actions/export_saved_model.py +++ b/orbit/actions/export_saved_model.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -14,6 +14,7 @@ """Provides the `ExportSavedModel` action and associated helper classes.""" +import os import re from typing import Callable, Optional @@ -77,9 +78,9 @@ class ExportFileManager: One common alternative may be to use the current global step count, for instance passing `next_id_fn=global_step.numpy`. """ - self._base_name = base_name + self._base_name = os.path.normpath(base_name) self._max_to_keep = max_to_keep - self._next_id_fn = next_id_fn or _CounterIdFn(base_name) + self._next_id_fn = next_id_fn or _CounterIdFn(self._base_name) @property def managed_files(self): diff --git a/orbit/actions/export_saved_model_test.py b/orbit/actions/export_saved_model_test.py index 191f6fdb5beb61ecd51ffaa512dacf6e43db09b3..c9583e0c29bb38d72e7c2282dd56fd475d94071a 100644 --- a/orbit/actions/export_saved_model_test.py +++ b/orbit/actions/export_saved_model_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
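`ExportFileManager` now passes `base_name` through `os.path.normpath` before using it, so a directory such as `foo//bar` is collapsed to `foo/bar` and export names are always built from one canonical prefix. The sketch below shows the standard-library behavior this relies on, using `posixpath` so the result does not depend on the host OS; `export_name` is a hypothetical helper that mimics the `<base>-<id>` names the new test expects, not Orbit's API.

```python
import posixpath


def export_name(base_name: str, export_id: int) -> str:
    """Hypothetical helper: builds '<normalized base>-<id>'."""
    # normpath is purely lexical: it collapses repeated separators and "."
    # segments without touching the filesystem.
    return f'{posixpath.normpath(base_name)}-{export_id}'


base_name = posixpath.join('/tmp', 'foo//bar', 'basename')
print(base_name)                     # /tmp/foo//bar/basename
print(export_name(base_name, 1001))  # /tmp/foo/bar/basename-1001
```

Without normalization, file names discovered on disk (which carry the collapsed form) would not share a prefix with the doubled-slash `base_name`, which is the mismatch the `test_export_file_manager_managed_files_double_slash` test guards against.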
@@ -122,6 +122,26 @@ class ExportSavedModelTest(tf.test.TestCase): manager.managed_files, [f'{base_name}-10', f'{base_name}-50', f'{base_name}-1000']) + def test_export_file_manager_managed_files_double_slash(self): + directory = self.create_tempdir('foo//bar') + directory.create_file('basename-5') + directory.create_file('basename-10') + directory.create_file('basename-50') + directory.create_file('basename-1000') + directory.create_file('basename-9') + directory.create_file('basename-10-suffix') + base_name = os.path.join(directory.full_path, 'basename') + expected_base_name = os.path.normpath(base_name) + self.assertNotEqual(base_name, expected_base_name) + manager = actions.ExportFileManager(base_name, max_to_keep=3) + self.assertLen(manager.managed_files, 5) + self.assertEqual(manager.next_name(), f'{expected_base_name}-1001') + manager.clean_up() + self.assertEqual(manager.managed_files, [ + f'{expected_base_name}-10', f'{expected_base_name}-50', + f'{expected_base_name}-1000' + ]) + def test_export_saved_model(self): directory = self.create_tempdir() base_name = os.path.join(directory.full_path, 'basename') diff --git a/orbit/actions/new_best_metric.py b/orbit/actions/new_best_metric.py index f2a01c80f55ea6e2cbe51bc0dfb579d98f1c95a1..c551fd43b160c27c09f91363e41b6955b39490c5 100644 --- a/orbit/actions/new_best_metric.py +++ b/orbit/actions/new_best_metric.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/orbit/actions/new_best_metric_test.py b/orbit/actions/new_best_metric_test.py index aff21fda2c711e9fbde167e9b5f3ff0bcb5428d4..d14a86aaf5f8af135523a53c73888cc70009de66 100644 --- a/orbit/actions/new_best_metric_test.py +++ b/orbit/actions/new_best_metric_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. 
All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/orbit/controller.py b/orbit/controller.py index a2dc8f1477a1e8da4c000f5502404af2924d7fe1..c47859f388281b8c044df4da507abfadf4b6a0c0 100644 --- a/orbit/controller.py +++ b/orbit/controller.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -17,7 +17,7 @@ import pprint import time -from typing import Callable, List, Optional, Union +from typing import Callable, Iterable, Optional, Union from absl import logging @@ -76,13 +76,13 @@ class Controller: other custom outer loop implementations easy to achieve. Some additional customization can be achieved by supplying `train_actions` or - `eval_actions` when constructing the `Controller`. These are just lists of - arbitrary callables that are applied by the `Controller` to the output of - train steps (after each inner loop of `steps_per_loop` steps) or an - evaluation. This provides a hook mechanism, enabling things like reporting - metrics to Vizier, model exporting, additional logging, etc. See the - `orbit.actions` package for a small handful of predefined actions and some - utility classes that may be useful in defining your own. + `eval_actions` when constructing the `Controller`. Actions are arbitrary callables + that are applied by the `Controller` to the output of train steps (after each + inner loop of `steps_per_loop` steps) or an evaluation. This provides a hook + mechanism, enabling things like reporting metrics to Vizier, model exporting, + additional logging, etc.
See the `orbit.actions` package for a small handful + of predefined actions and some utility classes that may be useful in defining + your own. """ def __init__( @@ -93,17 +93,18 @@ class Controller: evaluator: Optional[runner.AbstractEvaluator] = None, strategy: Optional[tf.distribute.Strategy] = None, # Actions - train_actions: Optional[List[Action]] = None, - eval_actions: Optional[List[Action]] = None, + train_actions: Optional[Iterable[Action]] = None, + eval_actions: Optional[Iterable[Action]] = None, # Train related - steps_per_loop: Optional[int] = None, + steps_per_loop: Optional[Union[int, Callable[[int], int]]] = None, checkpoint_manager: Optional[tf.train.CheckpointManager] = None, # Summary related summary_interval: Optional[int] = None, summary_dir: Optional[str] = None, # Evaluation related eval_summary_dir: Optional[str] = None, - ): + summary_manager: Optional[utils.SummaryManagerInterface] = None, + eval_summary_manager: Optional[utils.SummaryManagerInterface] = None): """Initializes a `Controller` instance. Note that if `checkpoint_manager` is provided and there are checkpoints in @@ -127,14 +128,16 @@ class Controller: strategy: An instance of `tf.distribute.Strategy`. If not provided, the strategy will be initialized from the current in-scope strategy using `tf.distribute.get_strategy()`. - train_actions: An optional list of `orbit.Action`s to call after each - block of `steps_per_loop` training steps are run. These will be called - with the output of `trainer.train`. - eval_actions: An optional list of `orbit.Action`s to call after each - evaluation. These will be called with the output of - `evaluator.evaluate`. - steps_per_loop: The number of steps to run in each inner loop of training - (passed as the `num_steps` parameter of `trainer.train`). + train_actions: Optional `orbit.Action`s to call after each block of + `steps_per_loop` training steps are run. These will be called with the + output of `trainer.train`. 
+ eval_actions: Optional `orbit.Action`s to call after each evaluation. + These will be called with the output of `evaluator.evaluate`. + steps_per_loop: Optional integer to indicate the number of steps to run in + each inner loop of training (passed as the `num_steps` parameter of + `trainer.train`). It can also be a callable which takes the current + global step value as input and returns the number of steps to run as + output. checkpoint_manager: An instance of `tf.train.CheckpointManager`. If provided and there are checkpoints in the associated model directory, the model will be restored from the most recent checkpoint inside this @@ -152,10 +155,18 @@ class Controller: eval_summary_dir: The directory to write eval summaries to. If `None`, it will be set to `summary_dir`. If both `summary_dir` and `eval_summary_dir` are `None`, no eval summaries will be written. + summary_manager: Instance of the summary manager. If set, the + `summary_dir` will be ignored. Otherwise the summary manager will be + created internally for TensorBoard summaries by default from the + `summary_dir`. + eval_summary_manager: Instance of the eval summary manager. If set, the + `eval_summary_dir` will be ignored. Otherwise the eval summary manager + will be created internally for TensorBoard summaries by default from the + `eval_summary_dir`. Raises: ValueError: If both `trainer` and `evaluator` are `None`. - ValueError: If `steps_per_loop` is not a positive integer. + ValueError: If `steps_per_loop` is neither a positive integer nor a callable. ValueError: If `summary_interval` is not a positive integer or is not divisible by `steps_per_loop`.
""" @@ -166,15 +177,18 @@ class Controller: if steps_per_loop is None: raise ValueError( "`steps_per_loop` is required when `trainer` is provided.") - elif not isinstance(steps_per_loop, int) or steps_per_loop < 1: + elif not callable(steps_per_loop) and ( + not isinstance(steps_per_loop, int) or steps_per_loop < 1): raise ValueError( - f"`steps_per_loop` ({steps_per_loop}) must be a positive integer.") + f"`steps_per_loop` ({steps_per_loop}) must be a positive integer " + "or a callable.") if summary_interval is not None: if summary_interval <= 0: raise ValueError( f"`summary_interval` ({summary_interval}) must be larger than 0.") - elif summary_interval % steps_per_loop != 0: + elif not callable(steps_per_loop) and (summary_interval % steps_per_loop + != 0): raise ValueError( f"`summary interval` ({summary_interval}) must be a multiple " f"of `steps_per_loop` ({steps_per_loop}).") @@ -187,18 +201,21 @@ class Controller: self.strategy = strategy or tf.distribute.get_strategy() - self.train_actions = train_actions or [] - self.eval_actions = eval_actions or [] + self.train_actions = () if train_actions is None else tuple(train_actions) + self.eval_actions = () if eval_actions is None else tuple(eval_actions) self.global_step = global_step self.checkpoint_manager = checkpoint_manager if self.trainer is not None: self.step_timer = None - self.steps_per_loop = steps_per_loop self.summary_interval = summary_interval - self.summary_manager = utils.SummaryManager( - summary_dir, tf.summary.scalar, global_step=self.global_step) + if summary_manager: + self.summary_manager = summary_manager + else: + self.summary_manager = utils.SummaryManager( + summary_dir, tf.summary.scalar, global_step=self.global_step) + self._steps_per_loop = steps_per_loop if self.evaluator is not None: eval_summary_dir = eval_summary_dir or summary_dir @@ -207,8 +224,11 @@ class Controller: # are the same. 
self.eval_summary_manager = self.summary_manager else: - self.eval_summary_manager = utils.SummaryManager( - eval_summary_dir, tf.summary.scalar, global_step=self.global_step) + if eval_summary_manager: + self.eval_summary_manager = eval_summary_manager + else: + self.eval_summary_manager = utils.SummaryManager( + eval_summary_dir, tf.summary.scalar, global_step=self.global_step) tf.summary.experimental.set_step(self.global_step) @@ -319,9 +339,6 @@ class Controller: results in a shorter inner loop than specified by `steps_per_loop` setting. If None, evaluation will only be performed after training is complete. - - Raises: - ValueError: If eval_interval is not a multiple of self.steps_per_loop. """ self._require("trainer", for_method="train_and_evaluate") self._require("evaluator", for_method="train_and_evaluate") @@ -413,6 +430,13 @@ class Controller: self._require("checkpoint_manager", for_method="save_checkpoint") self._maybe_save_checkpoint(check_interval=False) + @property + def steps_per_loop(self): + """Returns current steps_per_loop value in a training loop.""" + if callable(self._steps_per_loop): + return self._steps_per_loop(self.global_step.numpy()) + return self._steps_per_loop + def _train_n_steps(self, num_steps: int): """Runs training for `num_steps` steps. diff --git a/orbit/controller_test.py b/orbit/controller_test.py index fd1d1b8b87c5ed6d3639c7ec683d45aa0a0d0df4..50cf2443c9bd495b6a615888ddf2e01830af6dfe 100644 --- a/orbit/controller_test.py +++ b/orbit/controller_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -24,6 +24,7 @@ import numpy as np from orbit import controller from orbit import runner from orbit import standard_runner +import orbit.utils import tensorflow as tf @@ -698,12 +699,22 @@ class ControllerTest(tf.test.TestCase, parameterized.TestCase): self.assertLen( summaries_with_matching_keyword("eval_loss", self.model_dir), 2) - def test_evaluate_with_nested_summaries(self): + @parameterized.named_parameters(("DefaultSummary", False), + ("InjectSummary", True)) + def test_evaluate_with_nested_summaries(self, inject_summary_manager): test_evaluator = TestEvaluatorWithNestedSummary() + if inject_summary_manager: + summary_manager = orbit.utils.SummaryManager( + self.model_dir, + tf.summary.scalar, + global_step=tf.Variable(0, dtype=tf.int64)) + else: + summary_manager = None test_controller = controller.Controller( evaluator=test_evaluator, global_step=tf.Variable(0, dtype=tf.int64), - eval_summary_dir=self.model_dir) + eval_summary_dir=self.model_dir, + summary_manager=summary_manager) test_controller.evaluate(steps=5) self.assertNotEmpty( @@ -770,6 +781,32 @@ class ControllerTest(tf.test.TestCase, parameterized.TestCase): self.assertIn("eval_loss", output) self.assertGreaterEqual(output["eval_loss"], 0) + def test_step_per_loop_callable(self): + test_runner = TestRunner() + + checkpoint = tf.train.Checkpoint( + model=test_runner.model, optimizer=test_runner.optimizer) + checkpoint_manager = tf.train.CheckpointManager( + checkpoint, + self.model_dir, + max_to_keep=None, + step_counter=test_runner.global_step, + checkpoint_interval=10) + + def steps_per_loop_fn(global_step): + if global_step > 4: + return 4 + return 2 + + test_controller = controller.Controller( + trainer=test_runner, + global_step=test_runner.global_step, + steps_per_loop=steps_per_loop_fn, + checkpoint_manager=checkpoint_manager, + ) + test_controller.train(steps=10) + self.assertEqual(test_runner.global_step, 10) + if __name__ == "__main__": tf.test.main() diff --git 
a/orbit/examples/__init__.py b/orbit/examples/__init__.py index a4d9cc3a1b148e5c8c153f2f2357d0475e7a43b6..8d5738a7ad90c425662ca3d492737ff81134d501 100644 --- a/orbit/examples/__init__.py +++ b/orbit/examples/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/orbit/examples/single_task/__init__.py b/orbit/examples/single_task/__init__.py index a4d9cc3a1b148e5c8c153f2f2357d0475e7a43b6..8d5738a7ad90c425662ca3d492737ff81134d501 100644 --- a/orbit/examples/single_task/__init__.py +++ b/orbit/examples/single_task/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/orbit/examples/single_task/single_task_evaluator.py b/orbit/examples/single_task/single_task_evaluator.py index 0dcbae063a6282cbf76b8247bdcea43ee9c26c42..1fee37a14b6e6893b6e65cb8faa85fef2ac82143 100644 --- a/orbit/examples/single_task/single_task_evaluator.py +++ b/orbit/examples/single_task/single_task_evaluator.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
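The new `steps_per_loop` handling in `controller.py` accepts either a positive integer or a callable that receives the current global step and returns the size of the next inner loop. The pure-Python sketch below (no TensorFlow) mirrors the new validation check and the new `steps_per_loop` property so that the schedule used in `test_step_per_loop_callable` can be traced by hand. `validate_steps_per_loop`, `resolve_steps_per_loop`, `warmup_schedule` and `simulate_train` are hypothetical names, and clipping the final loop so it never overshoots the requested step count is an assumption about `Controller.train`, not something shown in this diff.

```python
from typing import Callable, List, Union

# Hypothetical alias for the two accepted forms of `steps_per_loop`.
StepsPerLoop = Union[int, Callable[[int], int]]


def validate_steps_per_loop(steps_per_loop: StepsPerLoop) -> None:
  """Mirrors the new check: accept a callable or a positive integer."""
  if not callable(steps_per_loop) and (
      not isinstance(steps_per_loop, int) or steps_per_loop < 1):
    raise ValueError(
        f"`steps_per_loop` ({steps_per_loop}) must be a positive integer "
        "or a callable.")


def resolve_steps_per_loop(steps_per_loop: StepsPerLoop,
                           global_step: int) -> int:
  """Mirrors the new `Controller.steps_per_loop` property."""
  if callable(steps_per_loop):
    return steps_per_loop(global_step)
  return steps_per_loop


def warmup_schedule(global_step: int) -> int:
  """Same schedule as `steps_per_loop_fn` in `controller_test.py`."""
  return 4 if global_step > 4 else 2


def simulate_train(total_steps: int, steps_per_loop: StepsPerLoop) -> List[int]:
  """Returns the length of each inner loop (assumes the last loop is clipped)."""
  validate_steps_per_loop(steps_per_loop)
  global_step, loops = 0, []
  while global_step < total_steps:
    # The schedule is re-evaluated before every inner loop.
    num_steps = min(resolve_steps_per_loop(steps_per_loop, global_step),
                    total_steps - global_step)
    loops.append(num_steps)
    global_step += num_steps
  return loops


print(simulate_train(10, warmup_schedule))  # [2, 2, 2, 4]
print(simulate_train(10, 3))                # [3, 3, 3, 1]
```

When a callable is passed, the diff skips the `summary_interval % steps_per_loop` divisibility check, so summary writes may not line up with inner-loop boundaries.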
diff --git a/orbit/examples/single_task/single_task_evaluator_test.py b/orbit/examples/single_task/single_task_evaluator_test.py index c074da0fb9de8e054451ac1a07f729c26ff08613..349e7598ee833a3e1e4eabd32554e5f86b0c7ca0 100644 --- a/orbit/examples/single_task/single_task_evaluator_test.py +++ b/orbit/examples/single_task/single_task_evaluator_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/orbit/examples/single_task/single_task_trainer.py b/orbit/examples/single_task/single_task_trainer.py index f9b29185a760ea391581ee08be94ff1f9df79932..a6a1ef605d1e30a8aade3e2c1b450f8a7edc7d47 100644 --- a/orbit/examples/single_task/single_task_trainer.py +++ b/orbit/examples/single_task/single_task_trainer.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/orbit/examples/single_task/single_task_trainer_test.py b/orbit/examples/single_task/single_task_trainer_test.py index cba34f7b05485d891c9b1c6da4e02522df1f772a..3ff48797cbdcb30251e132ad751856161fc7769d 100644 --- a/orbit/examples/single_task/single_task_trainer_test.py +++ b/orbit/examples/single_task/single_task_trainer_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
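`train_actions` and `eval_actions` are now typed as `Iterable[Action]` and stored with `() if actions is None else tuple(actions)` instead of `actions or []`. One plausible motivation, sketched below with hypothetical helpers, is that a one-shot iterable such as a generator is truthy, so the old idiom stored the generator itself, and it would be exhausted after the first loop in which the actions run. The actions here are plain callables taking a dict, an assumption standing in for `orbit.Action`.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

# Assumption: an action is any callable that consumes a dict of outputs.
Action = Callable[[Dict[str, Any]], None]


def normalize_actions_old(actions):
  """Previous idiom: keeps whatever (truthy) object was passed in."""
  return actions or []


def normalize_actions_new(
    actions: Optional[Iterable[Action]]) -> Tuple[Action, ...]:
  """New idiom: materializes any iterable into an immutable tuple."""
  return () if actions is None else tuple(actions)


def run_actions(actions, output: Dict[str, Any]) -> None:
  for action in actions:
    action(output)


calls = []


def record_loss(output: Dict[str, Any]) -> None:
  calls.append(output["loss"])


# A generator is truthy, so the old idiom stores it as-is; it is exhausted
# after the first pass and the second "loop" silently runs no actions.
old = normalize_actions_old(a for a in [record_loss])
run_actions(old, {"loss": 1.0})
run_actions(old, {"loss": 0.5})
print(calls)  # [1.0]

calls.clear()
new = normalize_actions_new(a for a in [record_loss])
run_actions(new, {"loss": 1.0})
run_actions(new, {"loss": 0.5})
print(calls)  # [1.0, 0.5]
```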
diff --git a/orbit/runner.py b/orbit/runner.py index b0377c5218cac2f60ceacdc66721bac5b149c68b..722ae49f482fe7e6510a72f24d0a964040564949 100644 --- a/orbit/runner.py +++ b/orbit/runner.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/orbit/standard_runner.py b/orbit/standard_runner.py index 7b0c8a3791fc13fcdd01425db6c035081b245daa..dbd7a8fdd6d1c769eaab1d5a401fa47885a8f342 100644 --- a/orbit/standard_runner.py +++ b/orbit/standard_runner.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -360,7 +360,7 @@ class StandardEvaluator(runner.AbstractEvaluator, metaclass=abc.ABCMeta): Note that this method is called before dataset iterator creation. Returns: - An value to pass as the `state` argument to `eval_reduce`. + A value to pass as the `state` argument to `eval_reduce`. """ pass @@ -421,7 +421,7 @@ class StandardEvaluator(runner.AbstractEvaluator, metaclass=abc.ABCMeta): evaluation for subsequent processing in `eval_end()`. Args: - state: A state being mainted throughout the evaluation. + state: A state being maintained throughout the evaluation. step_outputs: Outputs from the current evaluation step. Returns: diff --git a/orbit/standard_runner_test.py b/orbit/standard_runner_test.py index d3d45016570a5fcfbeb3c0f6e8d4a0d8fdaeeb26..b21f8a148340f3ac1a7baaa36d3b133bb1c86c41 100644 --- a/orbit/standard_runner_test.py +++ b/orbit/standard_runner_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/orbit/utils/__init__.py b/orbit/utils/__init__.py index 3eeb67c4a284d238e260e9587c4cfda1aab13a9a..f14c0ab254b067b7124ea9f5df69cc4b42ee543c 100644 --- a/orbit/utils/__init__.py +++ b/orbit/utils/__init__.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -25,5 +25,6 @@ from orbit.utils.loop_fns import create_tf_while_loop_fn from orbit.utils.loop_fns import LoopFnWithSummaries from orbit.utils.summary_manager import SummaryManager +from orbit.utils.summary_manager_interface import SummaryManagerInterface from orbit.utils.tpu_summaries import OptionalSummariesFunction diff --git a/orbit/utils/common.py b/orbit/utils/common.py index 11d7f5decbace69d75120d9b306b7f8352c2620d..27a49e566a5656b3d547edb2f687f37c767eec76 100644 --- a/orbit/utils/common.py +++ b/orbit/utils/common.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/orbit/utils/common_test.py b/orbit/utils/common_test.py index 1a68e7c66b20b0814d2618de190128a0dcaa0387..4a8c2bf884a97b770c4aba5f3e10ec2dc574c27f 100644 --- a/orbit/utils/common_test.py +++ b/orbit/utils/common_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
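`ExportFileManager` now stores `os.path.normpath(base_name)` and passes that normalized name to `_CounterIdFn`. The sketch below illustrates why this matters, assuming POSIX paths and assuming the file listing reports normalized paths (which is what `test_export_file_manager_managed_files_double_slash` suggests): a pattern built from the raw `foo//bar` base name matches nothing. `managed_ids` is a hypothetical helper, not the library's implementation.

```python
import os
import re
from typing import List


def managed_ids(base_name: str, listed_files: List[str]) -> List[int]:
  """Hypothetical helper: sorted ids of files named exactly `<base_name>-<id>`."""
  pattern = re.compile(re.escape(base_name) + r"-(\d+)$")
  ids = []
  for path in listed_files:
    match = pattern.match(path)
    if match:  # `basename-10-suffix` does not match because of the `$` anchor.
      ids.append(int(match.group(1)))
  return sorted(ids)


raw_base = os.path.join("/tmp/foo//bar", "basename")
# Assumption: the file listing reports normalized paths.
listed = [os.path.normpath(f"{raw_base}-{i}") for i in (5, 10, 50, 1000, 9)]
listed.append(os.path.normpath(f"{raw_base}-10-suffix"))

print(raw_base)                         # /tmp/foo//bar/basename
print(managed_ids(raw_base, listed))    # []

ids = managed_ids(os.path.normpath(raw_base), listed)
print(ids)                              # [5, 9, 10, 50, 1000]
print(ids[-3:])                         # [10, 50, 1000], as with max_to_keep=3
```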
diff --git a/orbit/utils/epoch_helper.py b/orbit/utils/epoch_helper.py index 10c11324ae8371b290c9973ae008902cd76fb9eb..21381b04968eb9f98188e22317d9536a69bfe33a 100644 --- a/orbit/utils/epoch_helper.py +++ b/orbit/utils/epoch_helper.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/orbit/utils/loop_fns.py b/orbit/utils/loop_fns.py index 7cc6529a3eb50d39e251c059498c67ebe080df82..df6ea7d96a35890ad76ebdb3ecba158c7f528dd9 100644 --- a/orbit/utils/loop_fns.py +++ b/orbit/utils/loop_fns.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/orbit/utils/summary_manager.py b/orbit/utils/summary_manager.py index 63a44940f533a33f39d78edaf936dcd1c8354648..c3982ccbee24cd0e31cacaba75dfd57eedc61e98 100644 --- a/orbit/utils/summary_manager.py +++ b/orbit/utils/summary_manager.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
@@ -16,10 +16,12 @@ import os +from orbit.utils.summary_manager_interface import SummaryManagerInterface + import tensorflow as tf -class SummaryManager: +class SummaryManager(SummaryManagerInterface): """A utility class for managing summary writing.""" def __init__(self, summary_dir, summary_fn, global_step=None): diff --git a/orbit/utils/summary_manager_interface.py b/orbit/utils/summary_manager_interface.py new file mode 100644 index 0000000000000000000000000000000000000000..5b984590a66d1843af2a2f80c9d1d2850d1fc7c8 --- /dev/null +++ b/orbit/utils/summary_manager_interface.py @@ -0,0 +1,64 @@ +# Copyright 2022 The Orbit Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Provides an interface for managing summary writing.""" + +import abc + + +class SummaryManagerInterface(abc.ABC): + """A utility interface for managing summary writing.""" + + @abc.abstractmethod + def flush(self): + """Flushes the recorded summaries.""" + raise NotImplementedError + + @abc.abstractmethod + def summary_writer(self, relative_path=""): + """Returns the underlying summary writer for scoped writers.""" + raise NotImplementedError + + @abc.abstractmethod + def write_summaries(self, summary_dict): + """Writes summaries for the given dictionary of values. + + The summary_dict can be any nested dict. 
The SummaryManager should + recursively create summaries, yielding a hierarchy of summaries which will + then be reflected in the corresponding UIs.
+ + For example, users may evaluate on multiple datasets and return + `summary_dict` as a nested dictionary: + + { + "dataset1": { + "loss": loss1, + "accuracy": accuracy1 + }, + "dataset2": { + "loss": loss2, + "accuracy": accuracy2 + }, + } + + This will create two sets of summaries, "dataset1" and "dataset2". Each + summary dict will contain summaries including both "loss" and "accuracy". + + Args: + summary_dict: A dictionary of values. If any value in `summary_dict` is + itself a dictionary, then the function will create a new summary_dict + with name given by the corresponding key. This is performed recursively. + Leaf values are then summarized using the parent relative path. + """ + raise NotImplementedError diff --git a/orbit/utils/tpu_summaries.py b/orbit/utils/tpu_summaries.py index 3501c7aa8041f977082e9d224d4105810b78f64c..2791eff4386bef095d1e9703e56e21f0417864f7 100644 --- a/orbit/utils/tpu_summaries.py +++ b/orbit/utils/tpu_summaries.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. diff --git a/orbit/utils/tpu_summaries_test.py b/orbit/utils/tpu_summaries_test.py index 4aa0d0820fa8c501d7db339b568d56fd7dc1bf28..7ffe16be870eb83cb07479bb4d7bca137b3b07af 100644 --- a/orbit/utils/tpu_summaries_test.py +++ b/orbit/utils/tpu_summaries_test.py @@ -1,4 +1,4 @@ -# Copyright 2021 The Orbit Authors. All Rights Reserved. +# Copyright 2022 The Orbit Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License.
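The new `SummaryManagerInterface` lets callers inject their own summary manager through the `summary_manager` and `eval_summary_manager` arguments of `Controller`. To illustrate the nested-dictionary contract described in the `write_summaries` docstring without requiring TensorFlow, the sketch below re-declares an equivalent abstract base locally and implements a hypothetical `InMemorySummaryManager` that records values per relative path instead of writing event files.

```python
import abc
import os
from typing import Any, Dict, List, Tuple


class SummaryManagerInterface(abc.ABC):
  """Local stand-in with the same abstract methods as the interface above."""

  @abc.abstractmethod
  def flush(self):
    raise NotImplementedError

  @abc.abstractmethod
  def summary_writer(self, relative_path=""):
    raise NotImplementedError

  @abc.abstractmethod
  def write_summaries(self, summary_dict):
    raise NotImplementedError


class InMemorySummaryManager(SummaryManagerInterface):
  """Hypothetical manager that records summaries in lists instead of files."""

  def __init__(self):
    self._writers: Dict[str, List[Tuple[str, Any]]] = {}
    self.flushed: List[Tuple[str, str, Any]] = []

  def summary_writer(self, relative_path=""):
    # One "writer" (here: a list of (name, value) pairs) per relative path.
    return self._writers.setdefault(relative_path, [])

  def write_summaries(self, summary_dict, relative_path=""):
    for name, value in summary_dict.items():
      if isinstance(value, dict):
        # Nested dicts become a sub-path, as the interface docstring describes.
        self.write_summaries(value, os.path.join(relative_path, name))
      else:
        self.summary_writer(relative_path).append((name, value))

  def flush(self):
    # Move buffered records into `flushed` and empty every writer.
    for path, records in self._writers.items():
      self.flushed.extend((path, name, value) for name, value in records)
      records.clear()


manager = InMemorySummaryManager()
manager.write_summaries({
    "dataset1": {"loss": 0.3, "accuracy": 0.9},
    "dataset2": {"loss": 0.5, "accuracy": 0.8},
})
manager.flush()
print(manager.flushed)
# [('dataset1', 'loss', 0.3), ('dataset1', 'accuracy', 0.9),
#  ('dataset2', 'loss', 0.5), ('dataset2', 'accuracy', 0.8)]
```

This stand-in only illustrates the interface contract. The `Controller` likely uses the object returned by `summary_writer()` as a real `tf.summary` writer, so a manager injected into `Controller` would presumably need to return a genuine writer (for example, one created with `tf.summary.create_file_writer`) rather than a list.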
diff --git a/tensorflow_models/LICENSE b/tensorflow_models/LICENSE deleted file mode 100644 index d3da228420e973edaf4123d5eeb42210f4450b0c..0000000000000000000000000000000000000000 --- a/tensorflow_models/LICENSE +++ /dev/null @@ -1,203 +0,0 @@ -Copyright 2015 The TensorFlow Authors. All rights reserved. - - Apache License - Version 2.0, January 2004 - http://www.apache.org/licenses/ - - TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION - - 1. Definitions. - - "License" shall mean the terms and conditions for use, reproduction, - and distribution as defined by Sections 1 through 9 of this document. - - "Licensor" shall mean the copyright owner or entity authorized by - the copyright owner that is granting the License. - - "Legal Entity" shall mean the union of the acting entity and all - other entities that control, are controlled by, or are under common - control with that entity. For the purposes of this definition, - "control" means (i) the power, direct or indirect, to cause the - direction or management of such entity, whether by contract or - otherwise, or (ii) ownership of fifty percent (50%) or more of the - outstanding shares, or (iii) beneficial ownership of such entity. - - "You" (or "Your") shall mean an individual or Legal Entity - exercising permissions granted by this License. - - "Source" form shall mean the preferred form for making modifications, - including but not limited to software source code, documentation - source, and configuration files. - - "Object" form shall mean any form resulting from mechanical - transformation or translation of a Source form, including but - not limited to compiled object code, generated documentation, - and conversions to other media types. - - "Work" shall mean the work of authorship, whether in Source or - Object form, made available under the License, as indicated by a - copyright notice that is included in or attached to the work - (an example is provided in the Appendix below). 
- - "Derivative Works" shall mean any work, whether in Source or Object - form, that is based on (or derived from) the Work and for which the - editorial revisions, annotations, elaborations, or other modifications - represent, as a whole, an original work of authorship. For the purposes - of this License, Derivative Works shall not include works that remain - separable from, or merely link (or bind by name) to the interfaces of, - the Work and Derivative Works thereof. - - "Contribution" shall mean any work of authorship, including - the original version of the Work and any modifications or additions - to that Work or Derivative Works thereof, that is intentionally - submitted to Licensor for inclusion in the Work by the copyright owner - or by an individual or Legal Entity authorized to submit on behalf of - the copyright owner. For the purposes of this definition, "submitted" - means any form of electronic, verbal, or written communication sent - to the Licensor or its representatives, including but not limited to - communication on electronic mailing lists, source code control systems, - and issue tracking systems that are managed by, or on behalf of, the - Licensor for the purpose of discussing and improving the Work, but - excluding communication that is conspicuously marked or otherwise - designated in writing by the copyright owner as "Not a Contribution." - - "Contributor" shall mean Licensor and any individual or Legal Entity - on behalf of whom a Contribution has been received by Licensor and - subsequently incorporated within the Work. - - 2. Grant of Copyright License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - copyright license to reproduce, prepare Derivative Works of, - publicly display, publicly perform, sublicense, and distribute the - Work and such Derivative Works in Source or Object form. - - 3. 
Grant of Patent License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - (except as stated in this section) patent license to make, have made, - use, offer to sell, sell, import, and otherwise transfer the Work, - where such license applies only to those patent claims licensable - by such Contributor that are necessarily infringed by their - Contribution(s) alone or by combination of their Contribution(s) - with the Work to which such Contribution(s) was submitted. If You - institute patent litigation against any entity (including a - cross-claim or counterclaim in a lawsuit) alleging that the Work - or a Contribution incorporated within the Work constitutes direct - or contributory patent infringement, then any patent licenses - granted to You under this License for that Work shall terminate - as of the date such litigation is filed. - - 4. Redistribution. You may reproduce and distribute copies of the - Work or Derivative Works thereof in any medium, with or without - modifications, and in Source or Object form, provided that You - meet the following conditions: - - (a) You must give any other recipients of the Work or - Derivative Works a copy of this License; and - - (b) You must cause any modified files to carry prominent notices - stating that You changed the files; and - - (c) You must retain, in the Source form of any Derivative Works - that You distribute, all copyright, patent, trademark, and - attribution notices from the Source form of the Work, - excluding those notices that do not pertain to any part of - the Derivative Works; and - - (d) If the Work includes a "NOTICE" text file as part of its - distribution, then any Derivative Works that You distribute must - include a readable copy of the attribution notices contained - within such NOTICE file, excluding those notices that do not - pertain to any part of the Derivative 
Works, in at least one - of the following places: within a NOTICE text file distributed - as part of the Derivative Works; within the Source form or - documentation, if provided along with the Derivative Works; or, - within a display generated by the Derivative Works, if and - wherever such third-party notices normally appear. The contents - of the NOTICE file are for informational purposes only and - do not modify the License. You may add Your own attribution - notices within Derivative Works that You distribute, alongside - or as an addendum to the NOTICE text from the Work, provided - that such additional attribution notices cannot be construed - as modifying the License. - - You may add Your own copyright statement to Your modifications and - may provide additional or different license terms and conditions - for use, reproduction, or distribution of Your modifications, or - for any such Derivative Works as a whole, provided Your use, - reproduction, and distribution of the Work otherwise complies with - the conditions stated in this License. - - 5. Submission of Contributions. Unless You explicitly state otherwise, - any Contribution intentionally submitted for inclusion in the Work - by You to the Licensor shall be under the terms and conditions of - this License, without any additional terms or conditions. - Notwithstanding the above, nothing herein shall supersede or modify - the terms of any separate license agreement you may have executed - with Licensor regarding such Contributions. - - 6. Trademarks. This License does not grant permission to use the trade - names, trademarks, service marks, or product names of the Licensor, - except as required for reasonable and customary use in describing the - origin of the Work and reproducing the content of the NOTICE file. - - 7. Disclaimer of Warranty. 
Unless required by applicable law or - agreed to in writing, Licensor provides the Work (and each - Contributor provides its Contributions) on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied, including, without limitation, any warranties or conditions - of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A - PARTICULAR PURPOSE. You are solely responsible for determining the - appropriateness of using or redistributing the Work and assume any - risks associated with Your exercise of permissions under this License. - - 8. Limitation of Liability. In no event and under no legal theory, - whether in tort (including negligence), contract, or otherwise, - unless required by applicable law (such as deliberate and grossly - negligent acts) or agreed to in writing, shall any Contributor be - liable to You for damages, including any direct, indirect, special, - incidental, or consequential damages of any character arising as a - result of this License or out of the use or inability to use the - Work (including but not limited to damages for loss of goodwill, - work stoppage, computer failure or malfunction, or any and all - other commercial damages or losses), even if such Contributor - has been advised of the possibility of such damages. - - 9. Accepting Warranty or Additional Liability. While redistributing - the Work or Derivative Works thereof, You may choose to offer, - and charge a fee for, acceptance of support, warranty, indemnity, - or other liability obligations and/or rights consistent with this - License. However, in accepting such obligations, You may act only - on Your own behalf and on Your sole responsibility, not on behalf - of any other Contributor, and only if You agree to indemnify, - defend, and hold each Contributor harmless for any liability - incurred by, or claims asserted against, such Contributor by reason - of your accepting any such warranty or additional liability. 
-
-   END OF TERMS AND CONDITIONS
-
-   APPENDIX: How to apply the Apache License to your work.
-
-      To apply the Apache License to your work, attach the following
-      boilerplate notice, with the fields enclosed by brackets "[]"
-      replaced with your own identifying information. (Don't include
-      the brackets!)  The text should be enclosed in the appropriate
-      comment syntax for the file format. We also recommend that a
-      file or class name and description of purpose be included on the
-      same "printed page" as the copyright notice for easier
-      identification within third-party archives.
-
-   Copyright 2015, The TensorFlow Authors.
-
-   Licensed under the Apache License, Version 2.0 (the "License");
-   you may not use this file except in compliance with the License.
-   You may obtain a copy of the License at
-
-       http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.
diff --git a/tensorflow_models/__init__.py b/tensorflow_models/__init__.py
index 61c120deba3749745c032fd46756a5fa9598b7cc..9e775026d187086c9397aa95c20b5cf800b4912d 100644
--- a/tensorflow_models/__init__.py
+++ b/tensorflow_models/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -20,4 +20,4 @@ from tensorflow_models import vision
 from official import core
 from official.modeling import hyperparams
 from official.modeling import optimization
-from official.modeling import tf_utils
+from official.modeling import tf_utils as utils
diff --git a/tensorflow_models/nlp/__init__.py b/tensorflow_models/nlp/__init__.py
index b26a57ce67bb11fae20bd8ce375754ebd4e44ce0..b26f691806a54a1441d8e7718eddb6223803d23a 100644
--- a/tensorflow_models/nlp/__init__.py
+++ b/tensorflow_models/nlp/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -13,5 +13,8 @@
 # limitations under the License.
 
 """TensorFlow Models NLP Libraries."""
+
 from official.nlp import tasks
+from official.nlp.configs import encoders
 from official.nlp.modeling import *
+from official.nlp.serving import serving_modules
diff --git a/tensorflow_models/tensorflow_models_pypi.ipynb b/tensorflow_models/tensorflow_models_pypi.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..f7bd3dfa179ce4d724895140b2144f8a944ba068
--- /dev/null
+++ b/tensorflow_models/tensorflow_models_pypi.ipynb
@@ -0,0 +1,339 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "bK-7g5sizhg5"
+   },
+   "source": [
+    "## Install Tensorflow-Models packages\n",
+    "\n",
+    "This notebook has been tested in the Google Colab sandbox.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "executionInfo": {
+     "elapsed": 9737,
+     "status": "ok",
+     "timestamp": 1650513863935,
+     "user": {
+      "displayName": "Hongkun Yu",
+      "userId": "12855578661733349593"
+     },
+     "user_tz": 420
+    },
+    "id": "eTz93_P2dMty",
+    "outputId": "d147b4b0-954f-4064-d179-bb82bc3ea4fe"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\u001b[K |████████████████████████████████| 21.8 MB 1.6 MB/s \n",
+      "\u001b[?25h"
+     ]
+    }
+   ],
+   "source": [
+    "!pip3 install -q tf-models-nightly\n",
+    "# Fix Colab default opencv problem\n",
+    "!pip3 install -q opencv-python-headless==4.1.2.30\n",
+    "\n",
+    "## Colab environment setup. To use a stable TF release version\n",
+    "## because of the possible breakage in tf-nightly.\n",
+    "# !pip3 install -U numpy\u003e=1.20\n",
+    "# !pip3 install -q tensorflow==2.8.0"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "executionInfo": {
+     "elapsed": 2,
+     "status": "ok",
+     "timestamp": 1650513867685,
+     "user": {
+      "displayName": "Hongkun Yu",
+      "userId": "12855578661733349593"
+     },
+     "user_tz": 420
+    },
+    "id": "GHvGWdCcdQqG",
+    "outputId": "863683b2-6b70-4de9-98bc-5df743619ad5"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "1.21.6\n",
+      "2.10.0-dev20220420\n"
+     ]
+    }
+   ],
+   "source": [
+    "import numpy as np\n",
+    "import tensorflow as tf\n",
+    "print(np.__version__)\n",
+    "print(tf.__version__)\n",
+    "\n",
+    "import tensorflow_models as tfm"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "eYSeQJniztc8"
+   },
+   "source": [
+    "## Check out modules\n",
+    "\n",
+    "**Note: As of the TensorFlow Models (NLP + Vision) 2.9 release, which this notebook was tested with, we have exported only selected modules, and their APIs are not stable. Also be aware that the\n",
+    "modeling libraries are advancing very fast, so we generally don't guarantee compatibility between versions.** "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "executionInfo": {
+     "elapsed": 206,
+     "status": "ok",
+     "timestamp": 1650513874596,
+     "user": {
+      "displayName": "Hongkun Yu",
+      "userId": "12855578661733349593"
+     },
+     "user_tz": 420
+    },
+    "id": "Y1iEMMGTMrQu",
+    "outputId": "f3da68d7-ecda-471c-c27b-915b80b55131"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Top-level modules:  ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'core', 'hyperparams', 'nlp', 'optimization', 'utils', 'vision']\n",
+      "NLP modules:  ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'encoders', 'layers', 'losses', 'models', 'networks', 'ops', 'serving_modules', 'tasks']\n",
+      "Vision modules:  ['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'anchor', 'anchor_generator', 'augment', 'backbones', 'box_matcher', 'box_ops', 'classification_model', 'configs', 'decoders', 'factory', 'factory_3d', 'heads', 'iou_similarity', 'layers', 'mask_ops', 'maskrcnn_model', 'nms', 'preprocess_ops', 'preprocess_ops_3d', 'retinanet_model', 'sampling_ops', 'segmentation_model', 'spatial_transform_ops', 'target_gather', 'video_classification_model']\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"Top-level modules: \", dir(tfm))\n",
+    "print(\"NLP modules: \", dir(tfm.nlp))\n",
+    "print(\"Vision modules: \", dir(tfm.vision))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "UMHeJmk_1yUf"
+   },
+   "source": [
+    "## Quick Examples\n",
+    "\n",
+    "### 1. Use a tfm.nlp Keras layer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "executionInfo": {
+     "elapsed": 400,
+     "status": "ok",
+     "timestamp": 1650514040957,
+     "user": {
+      "displayName": "Hongkun Yu",
+      "userId": "12855578661733349593"
+     },
+     "user_tz": 420
+    },
+    "id": "XVWEUozQ1xQY",
+    "outputId": "5de2aa91-8c38-438e-80b6-481617917c08"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "transformer_encoder_block_1\n",
+      "tf.Tensor(\n",
+      "[[[-1.063648 1.4375787 -0.79198956 0.4180589 ]\n",
+      " [-1.063648 1.4375787 -0.79198956 0.4180589 ]\n",
+      " [-1.063648 1.4375787 -0.79198956 0.4180589 ]]\n",
+      "\n",
+      " [[-1.063648 1.4375787 -0.79198956 0.4180589 ]\n",
+      " [-1.063648 1.4375787 -0.79198956 0.4180589 ]\n",
+      " [-1.063648 1.4375787 -0.7919895 0.41805887]]], shape=(2, 3, 4), dtype=float32)\n"
+     ]
+    }
+   ],
+   "source": [
+    "encoder_block = tfm.nlp.layers.TransformerEncoderBlock(\n",
+    "    num_attention_heads=2, inner_dim=10, inner_activation='relu')\n",
+    "\n",
+    "batch, length, hidden_size = 2, 3, 4\n",
+    "qkv_inputs = tf.ones((batch, length, hidden_size), tf.float32)\n",
+    "attention_mask = None\n",
+    "outputs = encoder_block([qkv_inputs, attention_mask])\n",
+    "print(encoder_block.name)\n",
+    "print(outputs)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "AOOrWjKkSYM0"
+   },
+   "source": [
+    "### 2. Use a tfm.vision backbone model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "executionInfo": {
+     "elapsed": 5979,
+     "status": "ok",
+     "timestamp": 1650514078414,
+     "user": {
+      "displayName": "Hongkun Yu",
+      "userId": "12855578661733349593"
+     },
+     "user_tz": 420
+    },
+    "id": "xwD0UhUdSzNU",
+    "outputId": "770d46c0-8c71-4f59-ee0b-67430c791380"
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "WARNING:absl:SpineNet output level out of range [min_level, max_level] = [4, 6] will not be used for further processing.\n",
+      "WARNING:absl:SpineNet output level out of range [min_level, max_level] = [4, 6] will not be used for further processing.\n",
+      "WARNING:absl:SpineNet output level out of range [min_level, max_level] = [4, 6] will not be used for further processing.\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "spine_net\n",
+      "{'4': \u003cKerasTensor: shape=(1, 8, 8, 128) dtype=float32 (created by layer 'spine_net')\u003e, '5': \u003cKerasTensor: shape=(1, 4, 4, 128) dtype=float32 (created by layer 'spine_net')\u003e, '6': \u003cKerasTensor: shape=(1, 2, 2, 128) dtype=float32 (created by layer 'spine_net')\u003e}\n"
+     ]
+    }
+   ],
+   "source": [
+    "input_size = 128\n",
+    "filter_size_scale, block_repeats, resample_alpha, endpoints_num_filters, min_level, max_level = 0.65, 1, 0.5, 128, 4, 6\n",
+    "input_specs = tf.keras.layers.InputSpec(\n",
+    "    shape=[None, input_size, input_size, 3])\n",
+    "model = tfm.vision.backbones.SpineNet(\n",
+    "    input_specs=input_specs,\n",
+    "    min_level=min_level,\n",
+    "    max_level=max_level,\n",
+    "    endpoints_num_filters=endpoints_num_filters,\n",
+    "    resample_alpha=resample_alpha,\n",
+    "    block_repeats=block_repeats,\n",
+    "    filter_size_scale=filter_size_scale,\n",
+    "    init_stochastic_depth_rate=0.2,\n",
+    ")\n",
+    "\n",
+    "inputs = tf.keras.Input(shape=(input_size, input_size, 3), batch_size=1)\n",
+    "endpoints = model(inputs)\n",
+    "print(model.name)\n",
+    "print(endpoints)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "id": "HHJs4lRlTk8q"
+   },
+   "source": [
+    "### 3. Use the Orbit package for advanced training loops"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "colab": {
+     "base_uri": "https://localhost:8080/"
+    },
+    "executionInfo": {
+     "elapsed": 215,
+     "status": "ok",
+     "timestamp": 1650514185283,
+     "user": {
+      "displayName": "Hongkun Yu",
+      "userId": "12855578661733349593"
+     },
+     "user_tz": 420
+    },
+    "id": "X4ek9IrJTkP_",
+    "outputId": "7b31de68-53a0-4b8a-d9ab-109e7c43c933"
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Orbit modules:  ['AbstractEvaluator', 'AbstractTrainer', 'Action', 'Controller', 'StandardEvaluator', 'StandardEvaluatorOptions', 'StandardTrainer', 'StandardTrainerOptions', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'actions', 'controller', 'runner', 'standard_runner', 'utils']\n"
+     ]
+    }
+   ],
+   "source": [
+    "import orbit\n",
+    "print(\"Orbit modules: \", dir(orbit))"
+   ]
+  }
+ ],
+ "metadata": {
+  "colab": {
+   "collapsed_sections": [],
+   "name": "tensorflow_models_pypi",
+   "provenance": [
+    {
+     "file_id": "1dm1dUZ2Bo6S6Zom7GTQrIG78Xz7iFeZY",
+     "timestamp": 1650514452505
+    }
+   ]
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/tensorflow_models/tensorflow_models_test.py b/tensorflow_models/tensorflow_models_test.py
index e55fcf9b929488718727e6cdb2ca717d08e77f98..8d85915e9e23875fedc8d4edfb2a06405e374732 100644
--- a/tensorflow_models/tensorflow_models_test.py
+++ b/tensorflow_models/tensorflow_models_test.py
@@ -1,4 +1,4 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
diff --git a/tensorflow_models/vision/__init__.py b/tensorflow_models/vision/__init__.py
index ab90ee03882278690ec1a4a72423a80573897845..3e541e367cd13a05d51b178eec54ba187937c657 100644
--- a/tensorflow_models/vision/__init__.py
+++ b/tensorflow_models/vision/__init__.py
@@ -1,4 +1,4 @@
-# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
+# Copyright 2022 The TensorFlow Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -13,5 +13,9 @@
 # limitations under the License.
 
 """TensorFlow Models Vision Libraries."""
-from official.vision.beta import configs
-from official.vision.beta.modeling import *
+from official.vision import configs
+from official.vision import serving
+from official.vision.modeling import *
+from official.vision.ops import *
+from official.vision.tasks import *
+