{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "80xnUmoI7fBX" }, "source": [ "##### Copyright 2020 The TensorFlow Authors." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "8nvTnfs6Q692" }, "outputs": [], "source": [ "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "WmfcMK5P5C1G" }, "source": [ "# Introduction to the TensorFlow Models NLP library" ] }, { "cell_type": "markdown", "metadata": { "id": "cH-oJ8R6AHMK" }, "source": [ "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n", " \u003ctd\u003e\n", " \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/tfmodels/nlp\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n", " \u003c/td\u003e\n", " \u003ctd\u003e\n", " \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/docs/nlp/index.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n", " \u003c/td\u003e\n", " \u003ctd\u003e\n", " \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/docs/nlp/index.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n", " \u003c/td\u003e\n", " \u003ctd\u003e\n", " \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/models/docs/nlp/index.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n", " \u003c/td\u003e\n", "\u003c/table\u003e" ] }, { "cell_type": "markdown", "metadata": { "id": "0H_EFIhq4-MJ" }, "source": [ "## Learning objectives\n", "\n", "In this Colab notebook, you will learn how to build transformer-based models for common NLP tasks including pretraining, span labelling and classification using the building blocks from [NLP modeling library](https://github.com/tensorflow/models/tree/master/official/nlp/modeling)." ] }, { "cell_type": "markdown", "metadata": { "id": "2N97-dps_nUk" }, "source": [ "## Install and import" ] }, { "cell_type": "markdown", "metadata": { "id": "459ygAVl_rg0" }, "source": [ "### Install the TensorFlow Model Garden pip package\n", "\n", "* `tf-models-official` is the stable Model Garden package. Note that it may not include the latest changes in the `tensorflow_models` github repo. To include latest changes, you may install `tf-models-nightly`,\n", "which is the nightly Model Garden package created daily automatically.\n", "* `pip` will install all models and dependencies automatically." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "IAOmYthAzI7J" }, "outputs": [], "source": [ "!pip install -q opencv-python" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Y-qGkdh6_sZc" }, "outputs": [], "source": [ "!pip install tf-models-official" ] }, { "cell_type": "markdown", "metadata": { "id": "e4huSSwyAG_5" }, "source": [ "### Import Tensorflow and other libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "jqYXqtjBAJd9" }, "outputs": [], "source": [ "import numpy as np\n", "import tensorflow as tf\n", "\n", "from tensorflow_models import nlp" ] }, { "cell_type": "markdown", "metadata": { "id": "djBQWjvy-60Y" }, "source": [ "## BERT pretraining model\n", "\n", "BERT ([Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)) introduced the method of pre-training language representations on a large text corpus and then using that model for downstream NLP tasks.\n", "\n", "In this section, we will learn how to build a model to pretrain BERT on the masked language modeling task and next sentence prediction task. For simplicity, we only show the minimum example and use dummy data." ] }, { "cell_type": "markdown", "metadata": { "id": "MKuHVlsCHmiq" }, "source": [ "### Build a `BertPretrainer` model wrapping `BertEncoder`\n", "\n", "The `nlp.networks.BertEncoder` class implements the Transformer-based encoder as described in [BERT paper](https://arxiv.org/abs/1810.04805). It includes the embedding lookups and transformer layers (`nlp.layers.TransformerEncoderBlock`), but not the masked language model or classification task networks.\n", "\n", "The `nlp.models.BertPretrainer` class allows a user to pass in a transformer stack, and instantiates the masked language model and classification networks that are used to create the training objectives." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "EXkcXz-9BwB3" }, "outputs": [], "source": [ "# Build a small transformer network.\n", "vocab_size = 100\n", "network = nlp.networks.BertEncoder(\n", " vocab_size=vocab_size, \n", " # The number of TransformerEncoderBlock layers\n", " num_layers=3)" ] }, { "cell_type": "markdown", "metadata": { "id": "0NH5irV5KTMS" }, "source": [ "Inspecting the encoder, we see it contains few embedding layers, stacked `nlp.layers.TransformerEncoderBlock` layers and are connected to three input layers:\n", "\n", "`input_word_ids`, `input_type_ids` and `input_mask`.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "lZNoZkBrIoff" }, "outputs": [], "source": [ "tf.keras.utils.plot_model(network, show_shapes=True, expand_nested=True, dpi=48)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "o7eFOZXiIl-b" }, "outputs": [], "source": [ "# Create a BERT pretrainer with the created network.\n", "num_token_predictions = 8\n", "bert_pretrainer = nlp.models.BertPretrainer(\n", " network, num_classes=2, num_token_predictions=num_token_predictions, output='predictions')" ] }, { "cell_type": "markdown", "metadata": { "id": "d5h5HT7gNHx_" }, "source": [ "Inspecting the `bert_pretrainer`, we see it wraps the `encoder` with additional `MaskedLM` and `nlp.layers.ClassificationHead` heads." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2tcNfm03IBF7" }, "outputs": [], "source": [ "tf.keras.utils.plot_model(bert_pretrainer, show_shapes=True, expand_nested=True, dpi=48)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "F2oHrXGUIS0M" }, "outputs": [], "source": [ "# We can feed some dummy data to get masked language model and sentence output.\n", "sequence_length = 16\n", "batch_size = 2\n", "\n", "word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))\n", "mask_data = np.random.randint(2, size=(batch_size, sequence_length))\n", "type_id_data = np.random.randint(2, size=(batch_size, sequence_length))\n", "masked_lm_positions_data = np.random.randint(2, size=(batch_size, num_token_predictions))\n", "\n", "outputs = bert_pretrainer(\n", " [word_id_data, mask_data, type_id_data, masked_lm_positions_data])\n", "lm_output = outputs[\"masked_lm\"]\n", "sentence_output = outputs[\"classification\"]\n", "print(f'lm_output: shape={lm_output.shape}, dtype={lm_output.dtype!r}')\n", "print(f'sentence_output: shape={sentence_output.shape}, dtype={sentence_output.dtype!r}')" ] }, { "cell_type": "markdown", "metadata": { "id": "bnx3UCHniCS5" }, "source": [ "### Compute loss\n", "Next, we can use `lm_output` and `sentence_output` to compute `loss`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "k30H4Q86f52x" }, "outputs": [], "source": [ "masked_lm_ids_data = np.random.randint(vocab_size, size=(batch_size, num_token_predictions))\n", "masked_lm_weights_data = np.random.randint(2, size=(batch_size, num_token_predictions))\n", "next_sentence_labels_data = np.random.randint(2, size=(batch_size))\n", "\n", "mlm_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(\n", " labels=masked_lm_ids_data,\n", " predictions=lm_output,\n", " weights=masked_lm_weights_data)\n", "sentence_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(\n", " labels=next_sentence_labels_data,\n", " predictions=sentence_output)\n", "loss = mlm_loss + sentence_loss\n", "\n", "print(loss)" ] }, { "cell_type": "markdown", "metadata": { "id": "wrmSs8GjHxVw" }, "source": [ "With the loss, you can optimize the model.\n", "After training, we can save the weights of TransformerEncoder for the downstream fine-tuning tasks. Please see [run_pretraining.py](https://github.com/tensorflow/models/blob/master/official/legacy/bert/run_pretraining.py) for the full example.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "k8cQVFvBCV4s" }, "source": [ "## Span labeling model\n", "\n", "Span labeling is the task to assign labels to a span of the text, for example, label a span of text as the answer of a given question.\n", "\n", "In this section, we will learn how to build a span labeling model. Again, we use dummy data for simplicity." 
] }, { "cell_type": "markdown", "metadata": { "id": "xrLLEWpfknUW" }, "source": [ "### Build a BertSpanLabeler wrapping BertEncoder\n", "\n", "The `nlp.models.BertSpanLabeler` class implements a simple single-span start-end predictor (that is, a model that predicts two values: a start token index and an end token index), suitable for SQuAD-style tasks.\n", "\n", "Note that `nlp.models.BertSpanLabeler` wraps a `nlp.networks.BertEncoder`, the weights of which can be restored from the above pretraining model.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "B941M4iUCejO" }, "outputs": [], "source": [ "network = nlp.networks.BertEncoder(\n", " vocab_size=vocab_size, num_layers=2)\n", "\n", "# Create a BERT trainer with the created network.\n", "bert_span_labeler = nlp.models.BertSpanLabeler(network)" ] }, { "cell_type": "markdown", "metadata": { "id": "QpB9pgj4PpMg" }, "source": [ "Inspecting the `bert_span_labeler`, we see it wraps the encoder with additional `SpanLabeling` that outputs `start_position` and `end_position`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "RbqRNJCLJu4H" }, "outputs": [], "source": [ "tf.keras.utils.plot_model(bert_span_labeler, show_shapes=True, expand_nested=True, dpi=48)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "fUf1vRxZJwio" }, "outputs": [], "source": [ "# Create a set of 2-dimensional data tensors to feed into the model.\n", "word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))\n", "mask_data = np.random.randint(2, size=(batch_size, sequence_length))\n", "type_id_data = np.random.randint(2, size=(batch_size, sequence_length))\n", "\n", "# Feed the data to the model.\n", "start_logits, end_logits = bert_span_labeler([word_id_data, mask_data, type_id_data])\n", "\n", "print(f'start_logits: shape={start_logits.shape}, dtype={start_logits.dtype!r}')\n", "print(f'end_logits: shape={end_logits.shape}, dtype={end_logits.dtype!r}')" ] }, { "cell_type": "markdown", "metadata": { "id": "WqhgQaN1lt-G" }, "source": [ "### Compute loss\n", "With `start_logits` and `end_logits`, we can compute loss:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "waqs6azNl3Nn" }, "outputs": [], "source": [ "start_positions = np.random.randint(sequence_length, size=(batch_size))\n", "end_positions = np.random.randint(sequence_length, size=(batch_size))\n", "\n", "start_loss = tf.keras.losses.sparse_categorical_crossentropy(\n", " start_positions, start_logits, from_logits=True)\n", "end_loss = tf.keras.losses.sparse_categorical_crossentropy(\n", " end_positions, end_logits, from_logits=True)\n", "\n", "total_loss = (tf.reduce_mean(start_loss) + tf.reduce_mean(end_loss)) / 2\n", "print(total_loss)" ] }, { "cell_type": "markdown", "metadata": { "id": "Zdf03YtZmd_d" }, "source": [ "With the `loss`, you can optimize the model. Please see [run_squad.py](https://github.com/tensorflow/models/blob/master/official/legacy/bert/run_squad.py) for the full example." ] }, { "cell_type": "markdown", "metadata": { "id": "0A1XnGSTChg9" }, "source": [ "## Classification model\n", "\n", "In the last section, we show how to build a text classification model.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "MSK8OpZgnQa9" }, "source": [ "### Build a BertClassifier model wrapping BertEncoder\n", "\n", "`nlp.models.BertClassifier` implements a [CLS] token classification model containing a single classification head." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cXXCsffkCphk" }, "outputs": [], "source": [ "network = nlp.networks.BertEncoder(\n", " vocab_size=vocab_size, num_layers=2)\n", "\n", "# Create a BERT trainer with the created network.\n", "num_classes = 2\n", "bert_classifier = nlp.models.BertClassifier(\n", " network, num_classes=num_classes)" ] }, { "cell_type": "markdown", "metadata": { "id": "8tZKueKYP4bB" }, "source": [ "Inspecting the `bert_classifier`, we see it wraps the `encoder` with additional `Classification` head." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "snlutm9ZJgEZ" }, "outputs": [], "source": [ "tf.keras.utils.plot_model(bert_classifier, show_shapes=True, expand_nested=True, dpi=48)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "yyHPHsqBJkCz" }, "outputs": [], "source": [ "# Create a set of 2-dimensional data tensors to feed into the model.\n", "word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))\n", "mask_data = np.random.randint(2, size=(batch_size, sequence_length))\n", "type_id_data = np.random.randint(2, size=(batch_size, sequence_length))\n", "\n", "# Feed the data to the model.\n", "logits = bert_classifier([word_id_data, mask_data, type_id_data])\n", "print(f'logits: shape={logits.shape}, dtype={logits.dtype!r}')" ] }, { "cell_type": "markdown", "metadata": { "id": "w--a2mg4nzKm" }, "source": [ "### Compute loss\n", "\n", "With `logits`, we can compute `loss`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9X0S1DoFn_5Q" }, "outputs": [], "source": [ "labels = np.random.randint(num_classes, size=(batch_size))\n", "\n", "loss = tf.keras.losses.sparse_categorical_crossentropy(\n", " labels, logits, from_logits=True)\n", "print(loss)" ] }, { "cell_type": "markdown", "metadata": { "id": "mzBqOylZo3og" }, "source": [ "With the `loss`, you can optimize the model. Please see [run_classifier.py](https://github.com/tensorflow/models/blob/master/official/legacy/bert/run_classifier.py) or the [Fine tune_bert](https://www.tensorflow.org/text/tutorials/fine_tune_bert) notebook for the full example." ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "nlp_modeling_library_intro.ipynb", "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }