"### Install the TensorFlow Model Garden pip package\n",
"\n",
"* `tf-models-official` is the stable Model Garden package. Note that it may not include the latest changes in the `tensorflow_models` github repo. To include latest changes, you may install `tf-models-nightly`,\n",
"which is the nightly Model Garden package created daily automatically.\n",
"* pip will install all models and dependencies automatically."
"# Decoding API"
]
},
{
...
...
@@ -66,6 +62,30 @@
"\u003c/table\u003e"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fsACVQpVSifi"
},
"source": [
"### Install the TensorFlow Model Garden pip package\n",
"\n",
"* `tf-models-official` is the stable Model Garden package. Note that it may not include the latest changes in the `tensorflow_models` github repo. To include latest changes, you may install `tf-models-nightly`,\n",
"which is the nightly Model Garden package created daily automatically.\n",
"* pip will install all models and dependencies automatically."
"This API provides an interface to experiment with different decoding strategies used for auto-regressive models.\n",
"\n",
"1. The following sampling strategies are provided in sampling_module.py, which inherits from the base Decoding class:\n",
...
...
@@ -182,7 +214,7 @@
"id": "lV1RRp6ihnGX"
},
"source": [
"# Initialize the Model Hyper-parameters"
"## Initialize the Model Hyper-parameters"
]
},
{
...
...
@@ -193,44 +225,32 @@
},
"outputs": [],
"source": [
"params = {}\n",
"params['num_heads'] = 2\n",
"params['num_layers'] = 2\n",
"params['batch_size'] = 2\n",
"params['n_dims'] = 256\n",
"params['max_decode_length'] = 4"
"params = {\n",
"'num_heads': 2\n",
"'num_layers': 2\n",
"'batch_size': 2\n",
"'n_dims': 256\n",
"'max_decode_length': 4}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UGvmd0_dRFYI"
"id": "CYXkoplAij01"
},
"source": [
"## What is a Cache?\n",
"In auto-regressive architectures like Transformer based [Encoder-Decoder](https://arxiv.org/abs/1706.03762) models, \n",
"Cache is used for fast sequential decoding.\n",
"It is a nested dictionary storing pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) for every layer.\n",
" 'model_specific_item' : Model specific tensor shape,\n",
"}\n",
"\n",
"```"
"## Initialize cache. "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CYXkoplAij01"
"id": "UGvmd0_dRFYI"
},
"source": [
"# Initialize cache. "
"In auto-regressive architectures like Transformer based [Encoder-Decoder](https://arxiv.org/abs/1706.03762) models, \n",
"Cache is used for fast sequential decoding.\n",
"It is a nested dictionary storing pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) for every layer."
"print(\"cache value shape for layer 1 :\", cache['layer_1']['k'].shape)"
]
},
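{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch (not the notebook's original cell): build the nested cache\n",
"# described above from the toy hyper-parameters. Splitting `n_dims` into\n",
"# `num_heads * head_dim` is an assumption made for illustration.\n",
"import tensorflow as tf\n",
"\n",
"head_dim = params['n_dims'] // params['num_heads']\n",
"cache = {\n",
"    'layer_%d' % layer: {\n",
"        'k': tf.zeros([params['batch_size'], params['max_decode_length'],\n",
"                       params['num_heads'], head_dim]),\n",
"        'v': tf.zeros([params['batch_size'], params['max_decode_length'],\n",
"                       params['num_heads'], head_dim]),\n",
"    } for layer in range(params['num_layers'])\n",
"}\n",
"print(\"cache value shape for layer 1 :\", cache['layer_1']['k'].shape)"
]
},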
{
...
...
@@ -280,15 +280,14 @@
"id": "syl7I5nURPgW"
},
"source": [
"# Create model_fn\n",
"### Create model_fn\n",
" In practice, this will be replaced by an actual model implementation such as [here](https://github.com/tensorflow/models/blob/master/official/nlp/transformer/transformer.py#L236)\n",
"```\n",
"Args:\n",
"i : Step that is being decoded.\n",
"Returns:\n",
" logit probabilities of size [batch_size, 1, vocab_size]\n",
"```\n",
"\n"
"```\n"
]
},
{
...
...
@@ -307,15 +306,6 @@
" return probabilities[:, i, :]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DBMUkaVmVZBg"
},
"source": [
"# Initialize symbols_to_logits_fn\n"
]
},
{
"cell_type": "code",
"execution_count": null,
...
...
@@ -339,7 +329,7 @@
"id": "R_tV3jyWVL47"
},
"source": [
"# Greedy \n",
"## Greedy \n",
"Greedy decoding selects the token id with the highest probability as its next id: $id_t = argmax_{w}P(id | id_{1:t-1})$ at each timestep $t$. The following sketch shows greedy decoding. "
]
},
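{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A standalone sketch of the greedy rule above (not the notebook's own\n",
"# sampling_module call): pick the argmax token id from a batch of step logits.\n",
"# The dummy logits here are for illustration only.\n",
"import tensorflow as tf\n",
"\n",
"step_logits = tf.constant([[0.1, 0.7, 0.2],\n",
"                           [0.6, 0.3, 0.1]])    # [batch_size=2, vocab_size=3]\n",
"greedy_ids = tf.argmax(step_logits, axis=-1)    # id_t = argmax_id P(id | id_1:t-1)\n",
"print(greedy_ids)  # [1 0]"
]
},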
...
...
@@ -370,7 +360,7 @@
"id": "s4pTTsQXVz5O"
},
"source": [
"# top_k sampling\n",
"## top_k sampling\n",
"In *Top-K* sampling, the *K* most likely next token ids are filtered and the probability mass is redistributed among only those *K* ids. "
]
},
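{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A standalone sketch of Top-K filtering (not the notebook's own sampling_module\n",
"# call): keep the K largest logits, mask out the rest, and sample from the\n",
"# filtered distribution (tf.random.categorical treats the logits as unnormalized).\n",
"import tensorflow as tf\n",
"\n",
"k = 2\n",
"step_logits = tf.constant([[2.0, 0.5, 1.0, 0.1]])   # [batch_size=1, vocab_size=4]\n",
"top_values, _ = tf.math.top_k(step_logits, k=k)\n",
"kth_value = top_values[:, -1, tf.newaxis]           # smallest logit that is kept\n",
"filtered = tf.where(step_logits < kth_value,\n",
"                    tf.fill(tf.shape(step_logits), -1e9),\n",
"                    step_logits)\n",
"sampled_id = tf.random.categorical(filtered, num_samples=1)\n",
"print(sampled_id)"
]
},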
...
...
@@ -404,7 +394,7 @@
"id": "Jp3G-eE_WI4Y"
},
"source": [
"# top_p sampling\n",
"## top_p sampling\n",
"Instead of sampling only from the most likely *K* token ids, in *Top-p* sampling chooses from the smallest possible set of ids whose cumulative probability exceeds the probability *p*."
]
},
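{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A standalone sketch of Top-p (nucleus) filtering (not the notebook's own\n",
"# sampling_module call): sort the probabilities, keep the smallest prefix whose\n",
"# cumulative mass exceeds p, and sample from that reduced set.\n",
"import tensorflow as tf\n",
"\n",
"p = 0.8\n",
"step_logits = tf.constant([[2.0, 0.5, 1.0, 0.1]])   # [batch_size=1, vocab_size=4]\n",
"probs = tf.nn.softmax(step_logits, axis=-1)\n",
"sorted_probs = tf.sort(probs, direction='DESCENDING', axis=-1)\n",
"mass_before = tf.cumsum(sorted_probs, axis=-1, exclusive=True)\n",
"# A token stays if the probability mass before it is still below p, so the kept\n",
"# set is the smallest one whose total mass exceeds p.\n",
"cutoff = tf.reduce_min(tf.where(mass_before < p, sorted_probs, 1.0),\n",
"                       axis=-1, keepdims=True)\n",
"filtered = tf.where(probs < cutoff, tf.zeros_like(probs), probs)\n",
"sampled_id = tf.random.categorical(tf.math.log(filtered + 1e-9), num_samples=1)\n",
"print(sampled_id)"
]
},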
...
...
@@ -438,7 +428,7 @@
"id": "2hcuyJ2VWjDz"
},
"source": [
"# Beam search decoding\n",
"## Beam search decoding\n",
"Beam search reduces the risk of missing hidden high probability token ids by keeping the most likely num_beams of hypotheses at each time step and eventually choosing the hypothesis that has the overall highest probability. "
" <a target=\"_blank\" href=\"https://www.tensorflow.org/official_models/nlp/customize_encoder\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/official/colab/nlp/customize_encoder.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/official/colab/nlp/customize_encoder.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/official_models/nlp/customize_encoder\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
" \u003c/td\u003e\n",
" \u003ctd\u003e\n",
" \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/official/colab/nlp/customize_encoder.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
" \u003c/td\u003e\n",
" \u003ctd\u003e\n",
" \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/official/colab/nlp/customize_encoder.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
"The [TensorFlow Models NLP library](https://github.com/tensorflow/models/tree/master/official/nlp/modeling) is a collection of tools for building and training modern high performance natural language models.\n",
"\n",
"The [TransformEncoder](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/encoder_scaffold.py) is the core of this library, and lots of new network architectures are proposed to improve the encoder. In this Colab notebook, we will learn how to customize the encoder to employ new network architectures."
"The `tfm.nlp.networks.EncoderScaffold` is the core of this library, and lots of new network architectures are proposed to improve the encoder. In this Colab notebook, we will learn how to customize the encoder to employ new network architectures."
]
},
{
...
...
@@ -114,14 +99,27 @@
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "thsKZDjhswhR"
"id": "mfHI5JyuJ1y9"
},
"outputs": [],
"source": [
"!pip install -q tf-models-official==2.4.0"
],
"# Uninstall colab's opencv-python, it conflicts with `opencv-python-headless`\n",
"Before learning how to customize the encoder, let's firstly create a canonical BERT enoder and use it to instantiate a `BertClassifier` for classification task."
"Before learning how to customize the encoder, let's firstly create a canonical BERT enoder and use it to instantiate a `bert_classifier.BertClassifier` for classification task."
"`EncoderScaffold` allows users to provide a custom embedding subnetwork\n",
"`networks.EncoderScaffold` allows users to provide a custom embedding subnetwork\n",
" (which will replace the standard embedding logic) and/or a custom hidden layer class (which will replace the `Transformer` instantiation in the encoder)."
]
},
...
...
@@ -261,30 +258,32 @@
"source": [
"#### Without Customization\n",
"\n",
"Without any customization, `EncoderScaffold` behaves the same the canonical `BertEncoder`.\n",
"Without any customization, `networks.EncoderScaffold` behaves the same the canonical `networks.BertEncoder`.\n",
"\n",
"As shown in the following example, `EncoderScaffold` can load `BertEncoder`'s weights and output the same values:"
"As shown in the following example, `networks.EncoderScaffold` can load `networks.BertEncoder`'s weights and output the same values:"
"User can also override the [hidden_cls](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/encoder_scaffold.py#L103) argument in `EncoderScaffold`'s constructor to employ a customized Transformer layer.\n",
"User can also override the `hidden_cls` argument in `networks.EncoderScaffold`'s constructor to employ a customized Transformer layer.\n",
"\n",
"See [ReZeroTransformer](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/rezero_transformer.py) for how to implement a customized Transformer layer.\n",
"See [the source of `nlp.layers.ReZeroTransformer`](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/rezero_transformer.py) for how to implement a customized Transformer layer.\n",
"\n",
"Following is an example of using `ReZeroTransformer`:\n"
"Following is an example of using `nlp.layers.ReZeroTransformer`:\n"
"# Assert that the variable `rezero_alpha` from ReZeroTransformer exists.\n",
"assert 'rezero_alpha' in ''.join([x.name for x in classifier_model.trainable_weights])"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
...
...
@@ -438,10 +435,9 @@
"id": "6PMHFdvnxvR0"
},
"source": [
"### Use [TransformerScaffold](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/transformer_scaffold.py)\n",
"### Use `nlp.layers.TransformerScaffold`\n",
"\n",
"The above method of customizing `Transformer` requires rewriting the whole `Transformer` layer, while sometimes you may only want to customize either attention layer or feedforward block. In this case, [TransformerScaffold](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/transformer_scaffold.py) can be used.\n",
"\n"
"The above method of customizing the model requires rewriting the whole `nlp.layers.Transformer` layer, while sometimes you may only want to customize either attention layer or feedforward block. In this case, `nlp.layers.TransformerScaffold` can be used.\n"
]
},
{
...
...
@@ -452,37 +448,48 @@
"source": [
"#### Customize Attention Layer\n",
"\n",
"User can also override the [attention_cls](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/transformer_scaffold.py#L45) argument in `TransformerScaffold`'s constructor to employ a customized Attention layer.\n",
"User can also override the `attention_cls` argument in `layers.TransformerScaffold`'s constructor to employ a customized Attention layer.\n",
"\n",
"See [TalkingHeadsAttention](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/talking_heads_attention.py) for how to implement a customized `Attention` layer.\n",
"See [the source of `nlp.layers.TalkingHeadsAttention`](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/talking_heads_attention.py) for how to implement a customized `Attention` layer.\n",
"\n",
"Following is an example of using [TalkingHeadsAttention](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/talking_heads_attention.py):"
"Following is an example of using `nlp.layers.TalkingHeadsAttention`:"
"Similiarly, one could also customize the feedforward layer.\n",
"\n",
"See [GatedFeedforward](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/gated_feedforward.py) for how to implement a customized feedforward layer.\n",
"See [the source of `nlp.layers.GatedFeedforward`](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/gated_feedforward.py) for how to implement a customized feedforward layer.\n",
"\n",
"Following is an example of using [GatedFeedforward](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/gated_feedforward.py)."
"Following is an example of using `nlp.layers.GatedFeedforward`:"
"# Assert that the variable `gate` from GatedFeedforward exists.\n",
"assert 'gate' in ''.join([x.name for x in classifier_model.trainable_weights])"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
...
...
@@ -530,26 +537,28 @@
"id": "a_8NWUhkzeAq"
},
"source": [
"### Build a new Encoder using building blocks from KerasBERT.\n",
"### Build a new Encoder\n",
"\n",
"Finally, you could also build a new encoder using building blocks in the modeling library.\n",
"\n",
"See [AlbertEncoder](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/albert_encoder.py) as an example:\n"
"See [the source for `nlp.networks.AlbertEncoder`](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/albert_encoder.py) as an example of how to du this. \n",
"\n",
"Here is an example using `nlp.networks.AlbertEncoder`:\n"
"### Build a `BertPretrainer` model wrapping `BertEncoder`\n",
"\n",
"The [BertEncoder](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/bert_encoder.py) implements the Transformer-based encoder as described in [BERT paper](https://arxiv.org/abs/1810.04805). It includes the embedding lookups and transformer layers, but not the masked language model or classification task networks.\n",
"The `nlp.networks.BertEncoder` class implements the Transformer-based encoder as described in [BERT paper](https://arxiv.org/abs/1810.04805). It includes the embedding lookups and transformer layers (`nlp.layers.TransformerEncoderBlock`), but not the masked language model or classification task networks.\n",
"\n",
"The [BertPretrainer](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/models/bert_pretrainer.py) allows a user to pass in a transformer stack, and instantiates the masked language model and classification networks that are used to create the training objectives."
"The `nlp.models.BertPretrainer` class allows a user to pass in a transformer stack, and instantiates the masked language model and classification networks that are used to create the training objectives."
" # The number of TransformerEncoderBlock layers\n",
" num_layers=3)"
]
},
{
...
...
@@ -177,7 +190,7 @@
"id": "0NH5irV5KTMS"
},
"source": [
"Inspecting the encoder, we see it contains few embedding layers, stacked `Transformer` layers and are connected to three input layers:\n",
"Inspecting the encoder, we see it contains few embedding layers, stacked `nlp.layers.TransformerEncoderBlock` layers and are connected to three input layers:\n",
"\n",
"`input_word_ids`, `input_type_ids` and `input_mask`.\n"
"After training, we can save the weights of TransformerEncoder for the downstream fine-tuning tasks. Please see [run_pretraining.py](https://github.com/tensorflow/models/blob/master/official/nlp/bert/run_pretraining.py) for the full example.\n",
"\n"
"After training, we can save the weights of TransformerEncoder for the downstream fine-tuning tasks. Please see [run_pretraining.py](https://github.com/tensorflow/models/blob/master/official/nlp/bert/run_pretraining.py) for the full example.\n"
]
},
{
...
...
@@ -315,9 +330,9 @@
"source": [
"### Build a BertSpanLabeler wrapping BertEncoder\n",
"\n",
"[BertSpanLabeler](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/models/bert_span_labeler.py) implements a simple single-span start-end predictor (that is, a model that predicts two values: a start token index and an end token index), suitable for SQuAD-style tasks.\n",
"The `nlp.models.BertSpanLabeler` class implements a simple single-span start-end predictor (that is, a model that predicts two values: a start token index and an end token index), suitable for SQuAD-style tasks.\n",
"\n",
"Note that `BertSpanLabeler` wraps a `BertEncoder`, the weights of which can be restored from the above pretraining model.\n"
"Note that `nlp.models.BertSpanLabeler` wraps a `nlp.networks.BertEncoder`, the weights of which can be restored from the above pretraining model.\n"
"### Build a BertClassifier model wrapping BertEncoder\n",
"\n",
"[BertClassifier](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/models/bert_classifier.py) implements a [CLS] token classification model containing a single classification head."
"`nlp.models.BertClassifier` implements a [CLS] token classification model containing a single classification head."