Commit a576ea60 authored by Mark Daoust, committed by A. Unique TensorFlower

Fixup tensorflow_models.nlp tutorials

PiperOrigin-RevId: 443681252
parent 53227b70
......@@ -34,14 +34,10 @@
{
"cell_type": "markdown",
"metadata": {
"id": "fsACVQpVSifi"
"id": "2X-XaMSVcLua"
},
"source": [
"### Install the TensorFlow Model Garden pip package\n",
"\n",
"* `tf-models-official` is the stable Model Garden package. Note that it may not include the latest changes in the `tensorflow_models` github repo. To include latest changes, you may install `tf-models-nightly`,\n",
"which is the nightly Model Garden package created daily automatically.\n",
"* pip will install all models and dependencies automatically."
"# Decoding API"
]
},
{
......@@ -66,6 +62,30 @@
"\u003c/table\u003e"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fsACVQpVSifi"
},
"source": [
"### Install the TensorFlow Model Garden pip package\n",
"\n",
"* `tf-models-official` is the stable Model Garden package. Note that it may not include the latest changes in the `tensorflow_models` github repo. To include latest changes, you may install `tf-models-nightly`,\n",
"which is the nightly Model Garden package created daily automatically.\n",
"* pip will install all models and dependencies automatically."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "G4BhAu01HZcM"
},
"outputs": [],
"source": [
"!pip uninstall -y opencv-python"
]
},
{
"cell_type": "code",
"execution_count": null,
......@@ -74,7 +94,7 @@
},
"outputs": [],
"source": [
"pip install tf-models-nightly"
"!pip install tf-models-official"
]
},
{
......@@ -92,9 +112,20 @@
"\n",
"import tensorflow as tf\n",
"\n",
"from official import nlp\n",
"from official.nlp.modeling.ops import sampling_module\n",
"from official.nlp.modeling.ops import beam_search"
"from tensorflow_models import nlp"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "T92ccAzlnGqh"
},
"outputs": [],
"source": [
"def length_norm(length, dtype):\n",
" \"\"\"Return length normalization factor.\"\"\"\n",
" return tf.pow(((5. + tf.cast(length, dtype)) / 6.), 0.0)"
]
},
{
......@@ -103,7 +134,8 @@
"id": "0AWgyo-IQ5sP"
},
"source": [
"# Decoding API\n",
"## Overview\n",
"\n",
"This API provides an interface to experiment with different decoding strategies used for auto-regressive models.\n",
"\n",
"1. The following sampling strategies are provided in sampling_module.py, which inherits from the base Decoding class:\n",
......@@ -182,7 +214,7 @@
"id": "lV1RRp6ihnGX"
},
"source": [
"# Initialize the Model Hyper-parameters"
"## Initialize the Model Hyper-parameters"
]
},
{
......@@ -193,44 +225,32 @@
},
"outputs": [],
"source": [
"params = {}\n",
"params['num_heads'] = 2\n",
"params['num_layers'] = 2\n",
"params['batch_size'] = 2\n",
"params['n_dims'] = 256\n",
"params['max_decode_length'] = 4"
"params = {\n",
" 'num_heads': 2\n",
" 'num_layers': 2\n",
" 'batch_size': 2\n",
" 'n_dims': 256\n",
" 'max_decode_length': 4}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UGvmd0_dRFYI"
"id": "CYXkoplAij01"
},
"source": [
"## What is a Cache?\n",
"In auto-regressive architectures like Transformer based [Encoder-Decoder](https://arxiv.org/abs/1706.03762) models, \n",
"Cache is used for fast sequential decoding.\n",
"It is a nested dictionary storing pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) for every layer.\n",
"\n",
"```\n",
"{\n",
" 'layer_%d' % layer: {\n",
" 'k': tf.zeros([params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims']/params['num_heads']], dtype=tf.float32),\n",
" 'v': tf.zeros([params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims']/params['num_heads']], dtype=tf.float32)\n",
" } for layer in range(params['num_layers']),\n",
" 'model_specific_item' : Model specific tensor shape,\n",
"}\n",
"\n",
"```"
"## Initialize cache. "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CYXkoplAij01"
"id": "UGvmd0_dRFYI"
},
"source": [
"# Initialize cache. "
"In auto-regressive architectures like Transformer based [Encoder-Decoder](https://arxiv.org/abs/1706.03762) models, \n",
"Cache is used for fast sequential decoding.\n",
"It is a nested dictionary storing pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) for every layer."
]
},
{
......@@ -243,35 +263,15 @@
"source": [
"cache = {\n",
" 'layer_%d' % layer: {\n",
" 'k': tf.zeros([params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims']/params['num_heads']], dtype=tf.float32),\n",
" 'v': tf.zeros([params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims']/params['num_heads']], dtype=tf.float32)\n",
" 'k': tf.zeros(\n",
" shape=[params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims'] // params['num_heads']],\n",
" dtype=tf.float32),\n",
" 'v': tf.zeros(\n",
" shape=[params['batch_size'], params['max_decode_length'], params['num_heads'], params['n_dims'] // params['num_heads']],\n",
" dtype=tf.float32)\n",
" } for layer in range(params['num_layers'])\n",
" }\n",
"print(\"cache key shape for layer 1 :\", cache['layer_1']['k'].shape)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nNY3Xn8SiblP"
},
"source": [
"# Define closure for length normalization. **optional.**\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "T92ccAzlnGqh"
},
"outputs": [],
"source": [
"def length_norm(length, dtype):\n",
" \"\"\"Return length normalization factor.\"\"\"\n",
" return tf.pow(((5. + tf.cast(length, dtype)) / 6.), 0.0)"
"print(\"cache value shape for layer 1 :\", cache['layer_1']['k'].shape)"
]
},
{
......@@ -280,15 +280,14 @@
"id": "syl7I5nURPgW"
},
"source": [
"# Create model_fn\n",
"### Create model_fn\n",
" In practice, this will be replaced by an actual model implementation such as [here](https://github.com/tensorflow/models/blob/master/official/nlp/transformer/transformer.py#L236)\n",
"```\n",
"Args:\n",
"i : Step that is being decoded.\n",
"Returns:\n",
" logit probabilities of size [batch_size, 1, vocab_size]\n",
"```\n",
"\n"
"```\n"
]
},
{
......@@ -307,15 +306,6 @@
" return probabilities[:, i, :]"
]
},
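The collapsed cell above shows only the final `return` line of the toy model. A minimal, self-contained sketch consistent with that line and with the `Args`/`Returns` description might look like the following; the hard-coded probability values and the `model_fn` name are illustrative assumptions, not necessarily the notebook's exact cell (it assumes `tensorflow` is imported as `tf` as in the setup cell).

```
batch_size, max_decode_length, vocab_size = 2, 4, 3

# Hard-coded per-step token probabilities standing in for a real decoder.
# Shape: [batch_size, max_decode_length, vocab_size].
probabilities = tf.constant(
    [[[0.3, 0.4, 0.3], [0.3, 0.3, 0.4], [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]],
     [[0.2, 0.5, 0.3], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]])

def model_fn(i):
  """Returns token probabilities for decode step `i`, shape [batch_size, vocab_size]."""
  return probabilities[:, i, :]
```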
{
"cell_type": "markdown",
"metadata": {
"id": "DBMUkaVmVZBg"
},
"source": [
"# Initialize symbols_to_logits_fn\n"
]
},
{
"cell_type": "code",
"execution_count": null,
......@@ -339,7 +329,7 @@
"id": "R_tV3jyWVL47"
},
"source": [
"# Greedy \n",
"## Greedy \n",
"Greedy decoding selects the token id with the highest probability as its next id: $id_t = argmax_{w}P(id | id_{1:t-1})$ at each timestep $t$. The following sketch shows greedy decoding. "
]
},
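To make the greedy rule concrete, here is a minimal plain-TensorFlow sketch of a single greedy step over a toy distribution (it illustrates the concept only, not the Model Garden `sampling_module` API):

```
import tensorflow as tf

# One greedy step: pick the id with the highest probability along the
# vocabulary axis for each batch element.
probs = tf.constant([[0.3, 0.4, 0.3],
                     [0.1, 0.1, 0.8]])   # [batch_size, vocab_size]
greedy_ids = tf.argmax(probs, axis=-1)   # [batch_size]
print(greedy_ids.numpy())                # -> [1 2]
```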
......@@ -370,7 +360,7 @@
"id": "s4pTTsQXVz5O"
},
"source": [
"# top_k sampling\n",
"## top_k sampling\n",
"In *Top-K* sampling, the *K* most likely next token ids are filtered and the probability mass is redistributed among only those *K* ids. "
]
},
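A minimal plain-TensorFlow sketch of one top-k filtering step over toy logits (concept illustration only; the Model Garden `sampling_module` handles this internally):

```
import tensorflow as tf

# Mask everything below the k-th largest logit, then sample from the
# renormalized distribution over the surviving k ids.
logits = tf.constant([[2.0, 1.0, 0.5, -1.0, -2.0]])   # [batch_size, vocab_size]
k = 3
top_values, _ = tf.math.top_k(logits, k=k)
cutoff = top_values[:, -1, tf.newaxis]                 # k-th largest logit per row
filtered = tf.where(logits < cutoff, tf.fill(tf.shape(logits), -1e9), logits)
sampled_id = tf.random.categorical(filtered, num_samples=1)   # [batch_size, 1]
print(sampled_id.numpy())
```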
......@@ -404,7 +394,7 @@
"id": "Jp3G-eE_WI4Y"
},
"source": [
"# top_p sampling\n",
"## top_p sampling\n",
"Instead of sampling only from the most likely *K* token ids, in *Top-p* sampling chooses from the smallest possible set of ids whose cumulative probability exceeds the probability *p*."
]
},
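A minimal plain-TensorFlow sketch of one top-p (nucleus) filtering step, assuming for simplicity that the toy distribution is already sorted in descending order (concept illustration only):

```
import tensorflow as tf

# Keep the smallest prefix of ids whose cumulative probability exceeds p,
# zero out the rest, renormalize, and sample.
probs = tf.constant([[0.5, 0.25, 0.15, 0.07, 0.03]])   # [batch_size, vocab_size]
p = 0.8
cumulative = tf.cumsum(probs, axis=-1)
keep = (cumulative - probs) < p           # [[True, True, True, False, False]]
filtered = tf.where(keep, probs, tf.zeros_like(probs))
filtered = filtered / tf.reduce_sum(filtered, axis=-1, keepdims=True)
sampled_id = tf.random.categorical(tf.math.log(filtered + 1e-9), num_samples=1)
print(sampled_id.numpy())
```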
......@@ -438,7 +428,7 @@
"id": "2hcuyJ2VWjDz"
},
"source": [
"# Beam search decoding\n",
"## Beam search decoding\n",
"Beam search reduces the risk of missing hidden high probability token ids by keeping the most likely num_beams of hypotheses at each time step and eventually choosing the hypothesis that has the overall highest probability. "
]
},
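A toy sketch of a single beam-search expansion step, independent of the `beam_search` module's actual implementation: each current hypothesis is extended with every vocabulary id, and only the `num_beams` highest-scoring continuations survive.

```
import tensorflow as tf

num_beams, vocab_size = 2, 3
beam_log_probs = tf.constant([-0.1, -0.7])               # scores of current beams
step_log_probs = tf.math.log(tf.constant(
    [[0.6, 0.3, 0.1],                                    # next-id probs for beam 0
     [0.2, 0.5, 0.3]]))                                  # next-id probs for beam 1

# Cumulative score of every (beam, token) continuation.
total = beam_log_probs[:, tf.newaxis] + step_log_probs   # [num_beams, vocab_size]
top_scores, flat_ids = tf.math.top_k(tf.reshape(total, [-1]), k=num_beams)
beam_indices = flat_ids // vocab_size   # which beam each survivor extends
token_ids = flat_ids % vocab_size       # which token id extends it
print(beam_indices.numpy(), token_ids.numpy(), top_scores.numpy())
```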
......
......@@ -95,6 +95,19 @@
"* `pip` will install all models and dependencies automatically."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "IAOmYthAzI7J"
},
"outputs": [],
"source": [
"# Uninstall colab's opencv-python, it conflicts with `opencv-python-headless`\n",
"# which is installed by tf-models-official\n",
"!pip uninstall -y opencv-python"
]
},
{
"cell_type": "code",
"execution_count": null,
......@@ -103,7 +116,7 @@
},
"outputs": [],
"source": [
"!pip install -q tf-models-official==2.4.0"
"!pip install tf-models-official"
]
},
{
......@@ -126,8 +139,7 @@
"import numpy as np\n",
"import tensorflow as tf\n",
"\n",
"from official.nlp import modeling\n",
"from official.nlp.modeling import layers, losses, models, networks"
"from tensorflow_models import nlp"
]
},
{
......@@ -151,9 +163,9 @@
"source": [
"### Build a `BertPretrainer` model wrapping `BertEncoder`\n",
"\n",
"The [BertEncoder](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/bert_encoder.py) implements the Transformer-based encoder as described in [BERT paper](https://arxiv.org/abs/1810.04805). It includes the embedding lookups and transformer layers, but not the masked language model or classification task networks.\n",
"The `nlp.networks.BertEncoder` class implements the Transformer-based encoder as described in [BERT paper](https://arxiv.org/abs/1810.04805). It includes the embedding lookups and transformer layers (`nlp.layers.TransformerEncoderBlock`), but not the masked language model or classification task networks.\n",
"\n",
"The [BertPretrainer](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/models/bert_pretrainer.py) allows a user to pass in a transformer stack, and instantiates the masked language model and classification networks that are used to create the training objectives."
"The `nlp.models.BertPretrainer` class allows a user to pass in a transformer stack, and instantiates the masked language model and classification networks that are used to create the training objectives."
]
},
{
......@@ -166,9 +178,10 @@
"source": [
"# Build a small transformer network.\n",
"vocab_size = 100\n",
"sequence_length = 16\n",
"network = modeling.networks.BertEncoder(\n",
" vocab_size=vocab_size, num_layers=2, sequence_length=16)"
"network = nlp.networks.BertEncoder(\n",
" vocab_size=vocab_size, \n",
" # The number of TransformerEncoderBlock layers\n",
" num_layers=3)"
]
},
{
......@@ -177,7 +190,7 @@
"id": "0NH5irV5KTMS"
},
"source": [
"Inspecting the encoder, we see it contains few embedding layers, stacked `Transformer` layers and are connected to three input layers:\n",
"Inspecting the encoder, we see it contains few embedding layers, stacked `nlp.layers.TransformerEncoderBlock` layers and are connected to three input layers:\n",
"\n",
"`input_word_ids`, `input_type_ids` and `input_mask`.\n"
]
......@@ -190,7 +203,7 @@
},
"outputs": [],
"source": [
"tf.keras.utils.plot_model(network, show_shapes=True, dpi=48)"
"tf.keras.utils.plot_model(network, show_shapes=True, expand_nested=True, dpi=48)"
]
},
{
......@@ -203,7 +216,7 @@
"source": [
"# Create a BERT pretrainer with the created network.\n",
"num_token_predictions = 8\n",
"bert_pretrainer = modeling.models.BertPretrainer(\n",
"bert_pretrainer = nlp.models.BertPretrainer(\n",
" network, num_classes=2, num_token_predictions=num_token_predictions, output='predictions')"
]
},
......@@ -213,7 +226,7 @@
"id": "d5h5HT7gNHx_"
},
"source": [
"Inspecting the `bert_pretrainer`, we see it wraps the `encoder` with additional `MaskedLM` and `Classification` heads."
"Inspecting the `bert_pretrainer`, we see it wraps the `encoder` with additional `MaskedLM` and `nlp.layers.ClassificationHead` heads."
]
},
{
......@@ -224,7 +237,7 @@
},
"outputs": [],
"source": [
"tf.keras.utils.plot_model(bert_pretrainer, show_shapes=True, dpi=48)"
"tf.keras.utils.plot_model(bert_pretrainer, show_shapes=True, expand_nested=True, dpi=48)"
]
},
{
......@@ -236,7 +249,9 @@
"outputs": [],
"source": [
"# We can feed some dummy data to get masked language model and sentence output.\n",
"sequence_length = 16\n",
"batch_size = 2\n",
"\n",
"word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))\n",
"mask_data = np.random.randint(2, size=(batch_size, sequence_length))\n",
"type_id_data = np.random.randint(2, size=(batch_size, sequence_length))\n",
......@@ -246,8 +261,8 @@
" [word_id_data, mask_data, type_id_data, masked_lm_positions_data])\n",
"lm_output = outputs[\"masked_lm\"]\n",
"sentence_output = outputs[\"classification\"]\n",
"print(lm_output)\n",
"print(sentence_output)"
"print(f'lm_output: shape={lm_output.shape}, dtype={lm_output.dtype!r}')\n",
"print(f'sentence_output: shape={sentence_output.shape}, dtype={sentence_output.dtype!r}')"
]
},
{
......@@ -272,14 +287,15 @@
"masked_lm_weights_data = np.random.randint(2, size=(batch_size, num_token_predictions))\n",
"next_sentence_labels_data = np.random.randint(2, size=(batch_size))\n",
"\n",
"mlm_loss = modeling.losses.weighted_sparse_categorical_crossentropy_loss(\n",
"mlm_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(\n",
" labels=masked_lm_ids_data,\n",
" predictions=lm_output,\n",
" weights=masked_lm_weights_data)\n",
"sentence_loss = modeling.losses.weighted_sparse_categorical_crossentropy_loss(\n",
"sentence_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(\n",
" labels=next_sentence_labels_data,\n",
" predictions=sentence_output)\n",
"loss = mlm_loss + sentence_loss\n",
"\n",
"print(loss)"
]
},
......@@ -290,8 +306,7 @@
},
"source": [
"With the loss, you can optimize the model.\n",
"After training, we can save the weights of TransformerEncoder for the downstream fine-tuning tasks. Please see [run_pretraining.py](https://github.com/tensorflow/models/blob/master/official/nlp/bert/run_pretraining.py) for the full example.\n",
"\n"
"After training, we can save the weights of TransformerEncoder for the downstream fine-tuning tasks. Please see [run_pretraining.py](https://github.com/tensorflow/models/blob/master/official/nlp/bert/run_pretraining.py) for the full example.\n"
]
},
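A minimal sketch of what optimizing on this loss and saving the encoder weights could look like, reusing the `bert_pretrainer`, the `network` encoder, the `nlp.losses` helper, and the dummy data defined above. The optimizer choice and checkpoint path are assumptions, and this is a toy single step, not the full `run_pretraining.py` pipeline.

```
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)

# Recompute the combined loss inside a GradientTape so gradients can be taken.
with tf.GradientTape() as tape:
  outputs = bert_pretrainer(
      [word_id_data, mask_data, type_id_data, masked_lm_positions_data])
  mlm_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(
      labels=masked_lm_ids_data,
      predictions=outputs["masked_lm"],
      weights=masked_lm_weights_data)
  sentence_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(
      labels=next_sentence_labels_data,
      predictions=outputs["classification"])
  loss = mlm_loss + sentence_loss

grads = tape.gradient(loss, bert_pretrainer.trainable_variables)
optimizer.apply_gradients(zip(grads, bert_pretrainer.trainable_variables))

# Save only the encoder so it can be restored for downstream fine-tuning.
tf.train.Checkpoint(encoder=network).save('./pretrained_encoder/ckpt')
```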
{
......@@ -315,9 +330,9 @@
"source": [
"### Build a BertSpanLabeler wrapping BertEncoder\n",
"\n",
"[BertSpanLabeler](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/models/bert_span_labeler.py) implements a simple single-span start-end predictor (that is, a model that predicts two values: a start token index and an end token index), suitable for SQuAD-style tasks.\n",
"The `nlp.models.BertSpanLabeler` class implements a simple single-span start-end predictor (that is, a model that predicts two values: a start token index and an end token index), suitable for SQuAD-style tasks.\n",
"\n",
"Note that `BertSpanLabeler` wraps a `BertEncoder`, the weights of which can be restored from the above pretraining model.\n"
"Note that `nlp.models.BertSpanLabeler` wraps a `nlp.networks.BertEncoder`, the weights of which can be restored from the above pretraining model.\n"
]
},
{
......@@ -328,11 +343,11 @@
},
"outputs": [],
"source": [
"network = modeling.networks.BertEncoder(\n",
" vocab_size=vocab_size, num_layers=2, sequence_length=sequence_length)\n",
"network = nlp.networks.BertEncoder(\n",
" vocab_size=vocab_size, num_layers=2)\n",
"\n",
"# Create a BERT trainer with the created network.\n",
"bert_span_labeler = modeling.models.BertSpanLabeler(network)"
"bert_span_labeler = nlp.models.BertSpanLabeler(network)"
]
},
{
......@@ -352,7 +367,7 @@
},
"outputs": [],
"source": [
"tf.keras.utils.plot_model(bert_span_labeler, show_shapes=True, dpi=48)"
"tf.keras.utils.plot_model(bert_span_labeler, show_shapes=True, expand_nested=True, dpi=48)"
]
},
{
......@@ -370,8 +385,9 @@
"\n",
"# Feed the data to the model.\n",
"start_logits, end_logits = bert_span_labeler([word_id_data, mask_data, type_id_data])\n",
"print(start_logits)\n",
"print(end_logits)"
"\n",
"print(f'start_logits: shape={start_logits.shape}, dtype={start_logits.dtype!r}')\n",
"print(f'end_logits: shape={end_logits.shape}, dtype={end_logits.dtype!r}')"
]
},
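The span logits can be trained against gold start and end token positions. Here is a sketch using standard Keras sparse categorical crossentropy with random stand-in labels, reusing `start_logits`, `end_logits`, `batch_size`, and `sequence_length` from above (illustrative only, not necessarily the loss used by the Model Garden tasks):

```
import numpy as np
import tensorflow as tf

# Treat start/end prediction as classification over token positions.
start_positions = np.random.randint(sequence_length, size=(batch_size,))
end_positions = np.random.randint(sequence_length, size=(batch_size,))

span_loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
span_loss = (span_loss_fn(start_positions, start_logits) +
             span_loss_fn(end_positions, end_logits))
print(span_loss.numpy())
```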
{
......@@ -432,7 +448,7 @@
"source": [
"### Build a BertClassifier model wrapping BertEncoder\n",
"\n",
"[BertClassifier](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/models/bert_classifier.py) implements a [CLS] token classification model containing a single classification head."
"`nlp.models.BertClassifier` implements a [CLS] token classification model containing a single classification head."
]
},
{
......@@ -443,12 +459,12 @@
},
"outputs": [],
"source": [
"network = modeling.networks.BertEncoder(\n",
" vocab_size=vocab_size, num_layers=2, sequence_length=sequence_length)\n",
"network = nlp.networks.BertEncoder(\n",
" vocab_size=vocab_size, num_layers=2)\n",
"\n",
"# Create a BERT trainer with the created network.\n",
"num_classes = 2\n",
"bert_classifier = modeling.models.BertClassifier(\n",
"bert_classifier = nlp.models.BertClassifier(\n",
" network, num_classes=num_classes)"
]
},
......@@ -469,7 +485,7 @@
},
"outputs": [],
"source": [
"tf.keras.utils.plot_model(bert_classifier, show_shapes=True, dpi=48)"
"tf.keras.utils.plot_model(bert_classifier, show_shapes=True, expand_nested=True, dpi=48)"
]
},
{
......@@ -487,7 +503,7 @@
"\n",
"# Feed the data to the model.\n",
"logits = bert_classifier([word_id_data, mask_data, type_id_data])\n",
"print(logits)"
"print(f'logits: shape={logits.shape}, dtype={logits.dtype!r}')"
]
},
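For completeness, a sketch of training these classifier logits with a standard sparse categorical crossentropy loss, reusing `logits`, `num_classes`, and `batch_size` from above; the labels are random stand-ins and the loss choice is illustrative:

```
import numpy as np
import tensorflow as tf

labels = np.random.randint(num_classes, size=(batch_size,))

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(loss_fn(labels, logits).numpy())
```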
{
......@@ -529,8 +545,7 @@
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "Introduction to the TensorFlow Models NLP library",
"private_outputs": true,
"name": "nlp_modeling_library_intro.ipynb",
"provenance": [],
"toc_visible": true
},
......