"torchvision/csrc/ops/cuda/interpolate_aa_kernels.cu" did not exist on "2c52d9f9e3a5ebb4513f1399ae5a84df140f5c96"
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "80xnUmoI7fBX"
      },
      "source": [
        "##### Copyright 2020 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "8nvTnfs6Q692"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WmfcMK5P5C1G"
      },
      "source": [
        "# Introduction to the TensorFlow Models NLP library"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cH-oJ8R6AHMK"
      },
      "source": [
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/tfmodels/nlp\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/docs/nlp/index.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/docs/nlp/index.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/models/docs/nlp/index.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0H_EFIhq4-MJ"
      },
      "source": [
        "## Learning objectives\n",
        "\n",
        "In this Colab notebook, you will learn how to build transformer-based models for common NLP tasks, including pretraining, span labeling, and classification, using building blocks from the [NLP modeling library](https://github.com/tensorflow/models/tree/master/official/nlp/modeling)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2N97-dps_nUk"
      },
      "source": [
        "## Install and import"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "459ygAVl_rg0"
      },
      "source": [
        "### Install the TensorFlow Model Garden pip package\n",
        "\n",
        "*  `tf-models-official` is the stable Model Garden package. Note that it may not include the latest changes from the `tensorflow_models` GitHub repo. To include the latest changes, you may install `tf-models-nightly`,\n",
        "which is the nightly Model Garden package created automatically every day.\n",
        "*  `pip` will install all models and dependencies automatically."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "IAOmYthAzI7J"
      },
      "outputs": [],
      "source": [
        "# Uninstall Colab's opencv-python; it conflicts with `opencv-python-headless`,\n",
        "# which is installed by tf-models-official.\n",
        "!pip uninstall -y opencv-python"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Y-qGkdh6_sZc"
      },
      "outputs": [],
      "source": [
        "!pip install tf-models-official"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e4huSSwyAG_5"
      },
      "source": [
        "### Import TensorFlow and other libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "jqYXqtjBAJd9"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import tensorflow as tf\n",
        "\n",
        "from tensorflow_models import nlp"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "djBQWjvy-60Y"
      },
      "source": [
        "## BERT pretraining model\n",
        "\n",
        "BERT ([Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)) introduced the method of pre-training language representations on a large text corpus and then using that model for downstream NLP tasks.\n",
        "\n",
        "In this section, we will learn how to build a model to pretrain BERT on the masked language modeling and next sentence prediction tasks. For simplicity, we only show a minimal example and use dummy data."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MKuHVlsCHmiq"
      },
      "source": [
        "### Build a `BertPretrainer` model wrapping `BertEncoder`\n",
        "\n",
        "The `nlp.networks.BertEncoder` class implements the Transformer-based encoder as described in the [BERT paper](https://arxiv.org/abs/1810.04805). It includes the embedding lookups and transformer layers (`nlp.layers.TransformerEncoderBlock`), but not the masked language model or classification task networks.\n",
        "\n",
        "The `nlp.models.BertPretrainer` class allows a user to pass in a transformer stack, and instantiates the masked language model and classification networks that are used to create the training objectives."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "EXkcXz-9BwB3"
      },
      "outputs": [],
      "source": [
        "# Build a small transformer network.\n",
        "vocab_size = 100\n",
        "network = nlp.networks.BertEncoder(\n",
        "    vocab_size=vocab_size,\n",
        "    # The number of TransformerEncoderBlock layers\n",
        "    num_layers=3)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0NH5irV5KTMS"
      },
      "source": [
        "Inspecting the encoder, we see that it contains a few embedding layers and a stack of `nlp.layers.TransformerEncoderBlock` layers, connected to three input layers:\n",
        "\n",
        "`input_word_ids`, `input_type_ids` and `input_mask`.\n"
      ]
    },
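    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a quick check, you can list the encoder's Keras input tensors (the encoder is a Keras model, so it exposes `inputs`); their names should correspond to the three inputs above."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Print the name, shape and dtype of each of the encoder's input tensors.\n",
        "for input_tensor in network.inputs:\n",
        "  print(input_tensor.name, input_tensor.shape, input_tensor.dtype)"
      ]
    },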
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "lZNoZkBrIoff"
      },
      "outputs": [],
      "source": [
        "tf.keras.utils.plot_model(network, show_shapes=True, expand_nested=True, dpi=48)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "o7eFOZXiIl-b"
      },
      "outputs": [],
      "source": [
        "# Create a BERT pretrainer with the created network.\n",
        "num_token_predictions = 8\n",
        "bert_pretrainer = nlp.models.BertPretrainer(\n",
        "    network, num_classes=2, num_token_predictions=num_token_predictions, output='predictions')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "d5h5HT7gNHx_"
      },
      "source": [
        "Inspecting the `bert_pretrainer`, we see it wraps the `encoder` with additional `MaskedLM` and `nlp.layers.ClassificationHead` heads."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "2tcNfm03IBF7"
      },
      "outputs": [],
      "source": [
        "tf.keras.utils.plot_model(bert_pretrainer, show_shapes=True, expand_nested=True, dpi=48)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "F2oHrXGUIS0M"
      },
      "outputs": [],
      "source": [
        "# We can feed some dummy data to get masked language model and sentence output.\n",
        "sequence_length = 16\n",
        "batch_size = 2\n",
        "\n",
        "word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))\n",
        "mask_data = np.random.randint(2, size=(batch_size, sequence_length))\n",
        "type_id_data = np.random.randint(2, size=(batch_size, sequence_length))\n",
        "masked_lm_positions_data = np.random.randint(sequence_length, size=(batch_size, num_token_predictions))\n",
        "\n",
        "outputs = bert_pretrainer(\n",
        "    [word_id_data, mask_data, type_id_data, masked_lm_positions_data])\n",
        "lm_output = outputs[\"masked_lm\"]\n",
        "sentence_output = outputs[\"classification\"]\n",
        "\n",
        "print(f'lm_output: shape={lm_output.shape}, dtype={lm_output.dtype!r}')\n",
        "print(f'sentence_output: shape={sentence_output.shape}, dtype={sentence_output.dtype!r}')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bnx3UCHniCS5"
      },
      "source": [
        "### Compute loss\n",
        "Next, we can use `lm_output` and `sentence_output` to compute `loss`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "k30H4Q86f52x"
      },
      "outputs": [],
      "source": [
        "masked_lm_ids_data = np.random.randint(vocab_size, size=(batch_size, num_token_predictions))\n",
        "masked_lm_weights_data = np.random.randint(2, size=(batch_size, num_token_predictions))\n",
        "next_sentence_labels_data = np.random.randint(2, size=(batch_size))\n",
        "\n",
        "mlm_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(\n",
        "    labels=masked_lm_ids_data,\n",
        "    predictions=lm_output,\n",
        "    weights=masked_lm_weights_data)\n",
        "sentence_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(\n",
        "    labels=next_sentence_labels_data,\n",
        "    predictions=sentence_output)\n",
        "loss = mlm_loss + sentence_loss\n",
        "\n",
        "print(loss)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wrmSs8GjHxVw"
      },
      "source": [
        "With the loss, you can optimize the model; a minimal sketch of a single optimization step is shown below.\n",
        "After training, we can save the weights of the TransformerEncoder for downstream fine-tuning tasks. Please see [run_pretraining.py](https://github.com/tensorflow/models/blob/master/official/nlp/bert/run_pretraining.py) for the full example.\n"
      ]
    },
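    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The following cell is a minimal, illustrative sketch: it applies a single `tf.keras.optimizers.Adam` step to the combined pretraining loss using `tf.GradientTape`, reusing the dummy inputs and labels defined above. The learning rate is an arbitrary placeholder; a real pretraining run iterates such steps over a `tf.data.Dataset`, as in `run_pretraining.py`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Illustrative sketch: one gradient step on the combined pretraining loss.\n",
        "# The learning rate below is an arbitrary placeholder.\n",
        "optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)\n",
        "\n",
        "with tf.GradientTape() as tape:\n",
        "  outputs = bert_pretrainer(\n",
        "      [word_id_data, mask_data, type_id_data, masked_lm_positions_data])\n",
        "  step_mlm_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(\n",
        "      labels=masked_lm_ids_data,\n",
        "      predictions=outputs['masked_lm'],\n",
        "      weights=masked_lm_weights_data)\n",
        "  step_sentence_loss = nlp.losses.weighted_sparse_categorical_crossentropy_loss(\n",
        "      labels=next_sentence_labels_data,\n",
        "      predictions=outputs['classification'])\n",
        "  step_loss = step_mlm_loss + step_sentence_loss\n",
        "\n",
        "# Apply the gradients of the loss to the pretrainer's trainable variables.\n",
        "grads = tape.gradient(step_loss, bert_pretrainer.trainable_variables)\n",
        "optimizer.apply_gradients(zip(grads, bert_pretrainer.trainable_variables))\n",
        "print('pretraining loss:', step_loss)"
      ]
    },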
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "k8cQVFvBCV4s"
      },
      "source": [
        "## Span labeling model\n",
        "\n",
        "Span labeling is the task of assigning labels to a span of text, for example, labeling a span of text as the answer to a given question.\n",
        "\n",
        "In this section, we will learn how to build a span labeling model. Again, we use dummy data for simplicity."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xrLLEWpfknUW"
      },
      "source": [
        "### Build a `BertSpanLabeler` wrapping `BertEncoder`\n",
        "\n",
        "The `nlp.models.BertSpanLabeler` class implements a simple single-span start-end predictor (that is, a model that predicts two values: a start token index and an end token index), suitable for SQuAD-style tasks.\n",
        "\n",
        "Note that `nlp.models.BertSpanLabeler` wraps a `nlp.networks.BertEncoder`, the weights of which can be restored from the above pretraining model.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "B941M4iUCejO"
      },
      "outputs": [],
      "source": [
        "# Build a small transformer network.\n",
        "network = nlp.networks.BertEncoder(\n",
        "    vocab_size=vocab_size, num_layers=2)\n",
        "\n",
        "# Create a BERT span labeler with the created network.\n",
        "bert_span_labeler = nlp.models.BertSpanLabeler(network)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QpB9pgj4PpMg"
      },
      "source": [
        "Inspecting the `bert_span_labeler`, we see it wraps the encoder with an additional `SpanLabeling` layer that outputs `start_position` and `end_position`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "RbqRNJCLJu4H"
      },
      "outputs": [],
      "source": [
        "tf.keras.utils.plot_model(bert_span_labeler, show_shapes=True, expand_nested=True, dpi=48)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "fUf1vRxZJwio"
      },
      "outputs": [],
      "source": [
        "# Create a set of 2-dimensional data tensors to feed into the model.\n",
        "word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))\n",
        "mask_data = np.random.randint(2, size=(batch_size, sequence_length))\n",
        "type_id_data = np.random.randint(2, size=(batch_size, sequence_length))\n",
        "\n",
        "# Feed the data to the model.\n",
        "start_logits, end_logits = bert_span_labeler([word_id_data, mask_data, type_id_data])\n",
        "\n",
        "print(f'start_logits: shape={start_logits.shape}, dtype={start_logits.dtype!r}')\n",
        "print(f'end_logits: shape={end_logits.shape}, dtype={end_logits.dtype!r}')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WqhgQaN1lt-G"
      },
      "source": [
        "### Compute loss\n",
        "With `start_logits` and `end_logits`, we can compute loss:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "waqs6azNl3Nn"
      },
      "outputs": [],
      "source": [
        "start_positions = np.random.randint(sequence_length, size=(batch_size))\n",
        "end_positions = np.random.randint(sequence_length, size=(batch_size))\n",
        "\n",
        "start_loss = tf.keras.losses.sparse_categorical_crossentropy(\n",
        "    start_positions, start_logits, from_logits=True)\n",
        "end_loss = tf.keras.losses.sparse_categorical_crossentropy(\n",
        "    end_positions, end_logits, from_logits=True)\n",
        "\n",
        "total_loss = (tf.reduce_mean(start_loss) + tf.reduce_mean(end_loss)) / 2\n",
        "print(total_loss)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Zdf03YtZmd_d"
      },
      "source": [
        "With the `loss`, you can optimize the model; a minimal Keras `compile`/`fit` sketch follows below. Please see [run_squad.py](https://github.com/tensorflow/models/blob/master/official/nlp/bert/run_squad.py) for the full example."
      ]
    },
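    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The next cell is a minimal sketch, assuming `bert_span_labeler` behaves like a standard two-output Keras model: it compiles the model with one sparse categorical crossentropy loss per output and runs a single `fit` pass over the dummy data above. The optimizer and learning rate are arbitrary placeholders."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Minimal sketch: compile the two-output span labeler and fit on the dummy data.\n",
        "# The optimizer and learning rate are arbitrary placeholders.\n",
        "bert_span_labeler.compile(\n",
        "    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),\n",
        "    loss=[tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n",
        "          tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)])\n",
        "\n",
        "bert_span_labeler.fit(\n",
        "    x=[word_id_data, mask_data, type_id_data],\n",
        "    y=[start_positions, end_positions],\n",
        "    batch_size=batch_size,\n",
        "    epochs=1)"
      ]
    },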
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0A1XnGSTChg9"
      },
      "source": [
        "## Classification model\n",
        "\n",
        "In this last section, we show how to build a text classification model.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MSK8OpZgnQa9"
      },
      "source": [
        "### Build a `BertClassifier` model wrapping `BertEncoder`\n",
        "\n",
        "`nlp.models.BertClassifier` implements a [CLS] token classification model containing a single classification head."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "cXXCsffkCphk"
      },
      "outputs": [],
      "source": [
        "# Build a small transformer network.\n",
        "network = nlp.networks.BertEncoder(\n",
        "    vocab_size=vocab_size, num_layers=2)\n",
        "\n",
        "# Create a BERT classifier with the created network.\n",
        "num_classes = 2\n",
        "bert_classifier = nlp.models.BertClassifier(\n",
        "    network, num_classes=num_classes)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8tZKueKYP4bB"
      },
      "source": [
        "Inspecting the `bert_classifier`, we see it wraps the `encoder` with an additional `Classification` head."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "snlutm9ZJgEZ"
      },
      "outputs": [],
      "source": [
        "tf.keras.utils.plot_model(bert_classifier, show_shapes=True, expand_nested=True, dpi=48)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "yyHPHsqBJkCz"
      },
      "outputs": [],
      "source": [
        "# Create a set of 2-dimensional data tensors to feed into the model.\n",
        "word_id_data = np.random.randint(vocab_size, size=(batch_size, sequence_length))\n",
        "mask_data = np.random.randint(2, size=(batch_size, sequence_length))\n",
        "type_id_data = np.random.randint(2, size=(batch_size, sequence_length))\n",
        "\n",
        "# Feed the data to the model.\n",
        "logits = bert_classifier([word_id_data, mask_data, type_id_data])\n",
        "print(f'logits: shape={logits.shape}, dtype={logits.dtype!r}')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "w--a2mg4nzKm"
      },
      "source": [
        "### Compute loss\n",
        "\n",
        "With `logits`, we can compute `loss`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "9X0S1DoFn_5Q"
      },
      "outputs": [],
      "source": [
        "labels = np.random.randint(num_classes, size=(batch_size))\n",
        "\n",
        "loss = tf.keras.losses.sparse_categorical_crossentropy(\n",
        "    labels, logits, from_logits=True)\n",
        "print(loss)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mzBqOylZo3og"
      },
      "source": [
        "With the `loss`, you can optimize the model; a minimal sketch follows in the next cell. Please see [run_classifier.py](https://github.com/tensorflow/models/blob/master/official/nlp/bert/run_classifier.py) or the [Fine-tune BERT](https://www.tensorflow.org/text/tutorials/fine_tune_bert) notebook for the full example."
      ]
    }
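    ,
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The next cell is a minimal sketch: it packages the dummy features and labels into a `tf.data.Dataset`, compiles `bert_classifier` with a sparse categorical crossentropy loss, and runs a single training epoch. The optimizer and learning rate are arbitrary placeholders."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Minimal sketch: build a tiny tf.data.Dataset from the dummy tensors and fit.\n",
        "# The optimizer and learning rate are arbitrary placeholders.\n",
        "train_ds = tf.data.Dataset.from_tensor_slices(\n",
        "    ((word_id_data, mask_data, type_id_data), labels)).batch(batch_size)\n",
        "\n",
        "bert_classifier.compile(\n",
        "    optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),\n",
        "    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n",
        "    metrics=['accuracy'])\n",
        "\n",
        "bert_classifier.fit(train_ds, epochs=1)"
      ]
    }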
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "nlp_modeling_library_intro.ipynb",
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}