{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Bp8t2AI8i7uP"
      },
      "source": [
        "##### Copyright 2020 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "rxPj2Lsni9O4"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6xS-9i5DrRvO"
      },
      "source": [
        "# Customizing a Transformer Encoder"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Mwb9uw1cDXsa"
      },
      "source": [
        "\u003ctable class=\"tfo-notebook-buttons\" align=\"left\"\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://www.tensorflow.org/tfmodels/nlp/customize_encoder\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" /\u003eView on TensorFlow.org\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/docs/nlp/customize_encoder.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/docs/nlp/customize_encoder.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView source on GitHub\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "  \u003ctd\u003e\n",
        "    \u003ca href=\"https://storage.googleapis.com/tensorflow_docs/models/docs/nlp/customize_encoder.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/download_logo_32px.png\" /\u003eDownload notebook\u003c/a\u003e\n",
        "  \u003c/td\u003e\n",
        "\u003c/table\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iLrcV4IyrcGX"
      },
      "source": [
        "## Learning objectives\n",
        "\n",
        "The [TensorFlow Models NLP library](https://github.com/tensorflow/models/tree/master/official/nlp/modeling) is a collection of tools for building and training modern high performance natural language models.\n",
        "\n",
        "The `tfm.nlp.networks.EncoderScaffold` is the core of this library, and many new network architectures have been proposed to improve the encoder. In this Colab notebook, you will learn how to customize the encoder to employ new network architectures."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YYxdyoWgsl8t"
      },
      "source": [
        "## Install and import"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fEJSFutUsn_h"
      },
      "source": [
        "### Install the TensorFlow Model Garden pip package\n",
        "\n",
        "*  `tf-models-official` is the stable Model Garden package. Note that it may not include the latest changes from the `tensorflow_models` GitHub repo. To include the latest changes, you may install `tf-models-nightly`,\n",
        "which is the nightly Model Garden package created automatically every day.\n",
        "*  `pip` will install all models and dependencies automatically."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "mfHI5JyuJ1y9"
      },
      "outputs": [],
      "source": [
        "# Uninstall colab's opencv-python, it conflicts with `opencv-python-headless`\n",
        "# which is installed by tf-models-official\n",
        "!pip uninstall -y opencv-python"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "thsKZDjhswhR"
      },
      "outputs": [],
      "source": [
        "!pip install -q tf-models-nightly"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hpf7JPCVsqtv"
      },
      "source": [
        "### Import TensorFlow and other libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "my4dp-RMssQe"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import tensorflow as tf\n",
        "\n",
        "import tensorflow_models as tfm\n",
        "from tensorflow_models import nlp"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vjDmVsFfs85n"
      },
      "source": [
        "## Canonical BERT encoder\n",
        "\n",
        "Before learning how to customize the encoder, let's first create a canonical BERT encoder and use it to instantiate a `nlp.models.BertClassifier` for a classification task."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Oav8sbgstWc-"
      },
      "outputs": [],
      "source": [
        "cfg = {\n",
        "    \"vocab_size\": 100,\n",
        "    \"hidden_size\": 32,\n",
        "    \"num_layers\": 3,\n",
        "    \"num_attention_heads\": 4,\n",
        "    \"intermediate_size\": 64,\n",
        "    \"activation\": tfm.utils.activations.gelu,\n",
        "    \"dropout_rate\": 0.1,\n",
        "    \"attention_dropout_rate\": 0.1,\n",
        "    \"max_sequence_length\": 16,\n",
        "    \"type_vocab_size\": 2,\n",
        "    \"initializer\": tf.keras.initializers.TruncatedNormal(stddev=0.02),\n",
        "}\n",
        "bert_encoder = nlp.networks.BertEncoder(**cfg)\n",
        "\n",
        "def build_classifier(bert_encoder):\n",
        "  return nlp.models.BertClassifier(bert_encoder, num_classes=2)\n",
        "\n",
        "canonical_classifier_model = build_classifier(bert_encoder)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Qe2UWI6_tsHo"
      },
      "source": [
        "`canonical_classifier_model` can be trained using the training data. For details about how to train the model, please see the [Fine-tuning BERT](https://www.tensorflow.org/text/tutorials/fine_tune_bert) notebook. We skip the full training code here; an optional, minimal training sketch follows below.\n",
        "\n",
        "After training, we can use the model to make predictions.\n"
      ]
    },
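    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Optional: the next cell is an illustrative training sketch that is not part of the original walkthrough. It compiles the classifier and fits it for one epoch on random dummy inputs and labels, only to show the expected input format; for real fine-tuning, follow the tutorial linked above.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Illustrative only: one epoch on random dummy data (not a real dataset).\n",
        "batch_size = 3\n",
        "dummy_word_ids = np.random.randint(\n",
        "    cfg[\"vocab_size\"], size=(batch_size, cfg[\"max_sequence_length\"]))\n",
        "dummy_mask = np.random.randint(2, size=(batch_size, cfg[\"max_sequence_length\"]))\n",
        "dummy_type_ids = np.random.randint(\n",
        "    cfg[\"type_vocab_size\"], size=(batch_size, cfg[\"max_sequence_length\"]))\n",
        "dummy_labels = np.random.randint(2, size=(batch_size,))\n",
        "\n",
        "canonical_classifier_model.compile(\n",
        "    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),\n",
        "    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n",
        "    metrics=[\"accuracy\"])\n",
        "canonical_classifier_model.fit(\n",
        "    [dummy_word_ids, dummy_mask, dummy_type_ids], dummy_labels, epochs=1)"
      ]
    },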
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "csED2d-Yt5h6"
      },
      "outputs": [],
      "source": [
        "def predict(model):\n",
        "  batch_size = 3\n",
        "  np.random.seed(0)\n",
        "  word_ids = np.random.randint(\n",
        "      cfg[\"vocab_size\"], size=(batch_size, cfg[\"max_sequence_length\"]))\n",
        "  mask = np.random.randint(2, size=(batch_size, cfg[\"max_sequence_length\"]))\n",
        "  type_ids = np.random.randint(\n",
        "      cfg[\"type_vocab_size\"], size=(batch_size, cfg[\"max_sequence_length\"]))\n",
        "  print(model([word_ids, mask, type_ids], training=False))\n",
        "\n",
        "predict(canonical_classifier_model)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PzKStEK9t_Pb"
      },
      "source": [
        "## Customize BERT encoder\n",
        "\n",
        "A BERT encoder consists of an embedding network and multiple transformer blocks; each transformer block contains an attention layer and a feedforward layer."
      ]
    },
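    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As an optional sanity check of this structure (not part of the original walkthrough), the next cell lists the sublayers of the `bert_encoder` built earlier; you should see the embedding-related layers followed by the stacked transformer layers and the pooler.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Optional: inspect the sublayers of the canonical BERT encoder.\n",
        "for layer in bert_encoder.layers:\n",
        "  print(f\"{type(layer).__name__}: {layer.name}\")"
      ]
    },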
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rmwQfhj6fmKz"
      },
      "source": [
        "We provide easy ways to customize each of those components via (1)\n",
        "[EncoderScaffold](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/encoder_scaffold.py) and (2) [TransformerScaffold](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/transformer_scaffold.py)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xsMgEVHAui11"
      },
      "source": [
        "### Use EncoderScaffold\n",
        "\n",
        "`networks.EncoderScaffold` allows users to provide a custom embedding subnetwork\n",
        "  (which will replace the standard embedding logic) and/or a custom hidden layer class (which will replace the `Transformer` instantiation in the encoder)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-JBabpa2AOz8"
      },
      "source": [
        "#### Without Customization\n",
        "\n",
        "Without any customization, `networks.EncoderScaffold` behaves the same as the canonical `networks.BertEncoder`.\n",
        "\n",
        "As shown in the following example, `networks.EncoderScaffold` can load `networks.BertEncoder`'s weights and output the same values:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ktNzKuVByZQf"
      },
      "outputs": [],
      "source": [
        "default_hidden_cfg = dict(\n",
        "    num_attention_heads=cfg[\"num_attention_heads\"],\n",
        "    intermediate_size=cfg[\"intermediate_size\"],\n",
        "    intermediate_activation=cfg[\"activation\"],\n",
        "    dropout_rate=cfg[\"dropout_rate\"],\n",
        "    attention_dropout_rate=cfg[\"attention_dropout_rate\"],\n",
        "    kernel_initializer=cfg[\"initializer\"],\n",
        ")\n",
        "default_embedding_cfg = dict(\n",
        "    vocab_size=cfg[\"vocab_size\"],\n",
        "    type_vocab_size=cfg[\"type_vocab_size\"],\n",
        "    hidden_size=cfg[\"hidden_size\"],\n",
        "    initializer=cfg[\"initializer\"],\n",
        "    dropout_rate=cfg[\"dropout_rate\"],\n",
        "    max_seq_length=cfg[\"max_sequence_length\"]\n",
        ")\n",
        "default_kwargs = dict(\n",
        "    hidden_cfg=default_hidden_cfg,\n",
        "    embedding_cfg=default_embedding_cfg,\n",
        "    num_hidden_instances=cfg[\"num_layers\"],\n",
        "    pooled_output_dim=cfg[\"hidden_size\"],\n",
        "    return_all_layer_outputs=True,\n",
        "    pooler_layer_initializer=cfg[\"initializer\"],\n",
        ")\n",
        "\n",
        "encoder_scaffold = nlp.networks.EncoderScaffold(**default_kwargs)\n",
        "classifier_model_from_encoder_scaffold = build_classifier(encoder_scaffold)\n",
        "classifier_model_from_encoder_scaffold.set_weights(\n",
        "    canonical_classifier_model.get_weights())\n",
        "predict(classifier_model_from_encoder_scaffold)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sMaUmLyIuwcs"
      },
      "source": [
        "#### Customize Embedding\n",
        "\n",
        "Next, we show how to use a customized embedding network.\n",
        "\n",
        "We first build an embedding network that will replace the default network. This one has two inputs (`mask` and `word_ids`) instead of three, and does not use positional embeddings."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "LTinnaG6vcsw"
      },
      "outputs": [],
      "source": [
        "word_ids = tf.keras.layers.Input(\n",
        "    shape=(cfg['max_sequence_length'],), dtype=tf.int32, name=\"input_word_ids\")\n",
        "mask = tf.keras.layers.Input(\n",
        "    shape=(cfg['max_sequence_length'],), dtype=tf.int32, name=\"input_mask\")\n",
        "embedding_layer = nlp.layers.OnDeviceEmbedding(\n",
        "    vocab_size=cfg['vocab_size'],\n",
        "    embedding_width=cfg['hidden_size'],\n",
        "    initializer=cfg[\"initializer\"],\n",
        "    name=\"word_embeddings\")\n",
        "word_embeddings = embedding_layer(word_ids)\n",
        "attention_mask = nlp.layers.SelfAttentionMask()([word_embeddings, mask])\n",
        "new_embedding_network = tf.keras.Model([word_ids, mask],\n",
        "                                       [word_embeddings, attention_mask])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HN7_yu-6O3qI"
      },
      "source": [
        "Inspecting `new_embedding_network`, we can see it takes two inputs:\n",
        "`input_word_ids` and `input_mask`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "fO9zKFE4OpHp"
      },
      "outputs": [],
      "source": [
        "tf.keras.utils.plot_model(new_embedding_network, show_shapes=True, dpi=48)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9cOaGQHLv12W"
      },
      "source": [
        "We can then build a new encoder using the above `new_embedding_network`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "mtFDMNf2vIl9"
      },
      "outputs": [],
      "source": [
        "kwargs = dict(default_kwargs)\n",
        "\n",
        "# Use new embedding network.\n",
        "kwargs['embedding_cls'] = new_embedding_network\n",
        "kwargs['embedding_data'] = embedding_layer.embeddings\n",
        "\n",
        "encoder_with_customized_embedding = nlp.networks.EncoderScaffold(**kwargs)\n",
        "classifier_model = build_classifier(encoder_with_customized_embedding)\n",
        "# ... Train the model ...\n",
        "print(classifier_model.inputs)\n",
        "\n",
        "# Assert that there are only two inputs.\n",
        "assert len(classifier_model.inputs) == 2"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Z73ZQDtmwg9K"
      },
      "source": [
        "#### Customized Transformer\n",
        "\n",
        "Users can also override the `hidden_cls` argument in `networks.EncoderScaffold`'s constructor to employ a customized Transformer layer.\n",
        "\n",
        "See [the source of `nlp.layers.ReZeroTransformer`](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/rezero_transformer.py) for how to implement a customized Transformer layer.\n",
        "\n",
        "The following is an example of using `nlp.layers.ReZeroTransformer`:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "uAIarLZgw6pA"
      },
      "outputs": [],
      "source": [
        "kwargs = dict(default_kwargs)\n",
        "\n",
        "# Use ReZeroTransformer.\n",
        "kwargs['hidden_cls'] = nlp.layers.ReZeroTransformer\n",
        "\n",
        "encoder_with_rezero_transformer = nlp.networks.EncoderScaffold(**kwargs)\n",
        "classifier_model = build_classifier(encoder_with_rezero_transformer)\n",
        "# ... Train the model ...\n",
        "predict(classifier_model)\n",
        "\n",
        "# Assert that the variable `rezero_alpha` from ReZeroTransformer exists.\n",
        "assert 'rezero_alpha' in ''.join([x.name for x in classifier_model.trainable_weights])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6PMHFdvnxvR0"
      },
      "source": [
        "### Use `nlp.layers.TransformerScaffold`\n",
        "\n",
        "The above method of customizing the model requires rewriting the whole `nlp.layers.Transformer` layer, while sometimes you may only want to customize either the attention layer or the feedforward block. In this case, `nlp.layers.TransformerScaffold` can be used.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "D6FejlgwyAy_"
      },
      "source": [
        "#### Customize Attention Layer\n",
        "\n",
        "Users can also override the `attention_cls` argument in `layers.TransformerScaffold`'s constructor to employ a customized attention layer.\n",
        "\n",
        "See [the source of `nlp.layers.TalkingHeadsAttention`](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/talking_heads_attention.py) for how to implement a customized `Attention` layer.\n",
        "\n",
        "The following is an example of using `nlp.layers.TalkingHeadsAttention`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "nFrSMrZuyNeQ"
      },
      "outputs": [],
      "source": [
        "# Use TalkingHeadsAttention\n",
        "hidden_cfg = dict(default_hidden_cfg)\n",
        "hidden_cfg['attention_cls'] = nlp.layers.TalkingHeadsAttention\n",
        "\n",
        "kwargs = dict(default_kwargs)\n",
        "kwargs['hidden_cls'] = nlp.layers.TransformerScaffold\n",
        "kwargs['hidden_cfg'] = hidden_cfg\n",
        "\n",
        "encoder = nlp.networks.EncoderScaffold(**kwargs)\n",
        "classifier_model = build_classifier(encoder)\n",
        "# ... Train the model ...\n",
        "predict(classifier_model)\n",
        "\n",
        "# Assert that the variable `pre_softmax_weight` from TalkingHeadsAttention exists.\n",
        "assert 'pre_softmax_weight' in ''.join([x.name for x in classifier_model.trainable_weights])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "tKkZ8spzYmpc"
      },
      "outputs": [],
      "source": [
        "tf.keras.utils.plot_model(encoder_with_rezero_transformer, show_shapes=True, dpi=48)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kuEJcTyByVvI"
      },
      "source": [
        "#### Customize Feedforward Layer\n",
        "\n",
        "Similarly, one could also customize the feedforward layer.\n",
        "\n",
        "See [the source of `nlp.layers.GatedFeedforward`](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/gated_feedforward.py) for how to implement a customized feedforward layer.\n",
        "\n",
        "The following is an example of using `nlp.layers.GatedFeedforward`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "XAbKy_l4y_-i"
      },
      "outputs": [],
      "source": [
        "# Use GatedFeedforward\n",
        "hidden_cfg = dict(default_hidden_cfg)\n",
        "hidden_cfg['feedforward_cls'] = nlp.layers.GatedFeedforward\n",
        "\n",
        "kwargs = dict(default_kwargs)\n",
        "kwargs['hidden_cls'] = nlp.layers.TransformerScaffold\n",
        "kwargs['hidden_cfg'] = hidden_cfg\n",
        "\n",
        "encoder_with_gated_feedforward = nlp.networks.EncoderScaffold(**kwargs)\n",
        "classifier_model = build_classifier(encoder_with_gated_feedforward)\n",
        "# ... Train the model ...\n",
        "predict(classifier_model)\n",
        "\n",
        "# Assert that the variable `gate` from GatedFeedforward exists.\n",
        "assert 'gate' in ''.join([x.name for x in classifier_model.trainable_weights])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "a_8NWUhkzeAq"
      },
      "source": [
        "### Build a new Encoder\n",
        "\n",
        "Finally, you could also build a new encoder using the building blocks in the modeling library.\n",
        "\n",
        "See [the source for `nlp.networks.AlbertEncoder`](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/networks/albert_encoder.py) as an example of how to do this.\n",
        "\n",
        "Here is an example using `nlp.networks.AlbertEncoder`:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "xsiA3RzUzmUM"
      },
      "outputs": [],
      "source": [
        "albert_encoder = nlp.networks.AlbertEncoder(**cfg)\n",
        "classifier_model = build_classifier(albert_encoder)\n",
        "# ... Train the model ...\n",
        "predict(classifier_model)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MeidDfhlHKSO"
      },
      "source": [
        "Inspecting the `albert_encoder`, we see it stacks the same `Transformer` layer multiple times."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Uv_juT22HERW"
      },
      "outputs": [],
      "source": [
        "tf.keras.utils.plot_model(albert_encoder, show_shapes=True, dpi=48)"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "customize_encoder.ipynb",
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}