Commit 886cb497 authored by thomwolf

updating readme and notebooks

parent fd647e8c
# PyTorch implementation of Google AI's BERT model with Google's pre-trained models
This repository contains an op-for-op PyTorch reimplementation of [Google's TensorFlow repository for the BERT model](https://github.com/google-research/bert) that was released together with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
......@@ -8,18 +6,36 @@ This implementation can load any pre-trained TensorFlow checkpoint for BERT (in
Support for [the Multilingual and Chinese models](https://github.com/google-research/bert/blob/master/multilingual.md) will be added later this week (only the tokenization code needs to be updated).
# Documentation
| Section | Content |
|-|-|
| [Installation](#installation) | How to install the package |
| [Content](#content) | Overview of the package |
| [Usage](#usage) | Quickstart examples |
| [Doc](#doc) | Detailed documentation |
| [Examples](#examples) | Detailed examples on how to fine-tune Bert |
| [Notebooks](#notebooks) | Introduction to the provided Jupyter Notebooks |
| [TPU](#tpu) | Notes on TPU support and pretraining scripts |
| [Command-line interface](#command-line-interface) | Convert a TensorFlow checkpoint into a PyTorch dump |
# Installation
This repo was tested on Python 3.5+ and PyTorch 0.4.1. The requirements are:
- PyTorch (>= 0.4.1)
- tqdm
## From pip
PyTorch pretrained bert can be installed by pip as follows:
```bash
pip install pytorch_pretrained_bert
```
To install the dependencies:
```bash
pip install -r ./requirements.txt
```
## From source
Clone the repository and run:
```bash
pip install [--editable] .
```
A series of tests is included in the [tests folder](https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/tests) and can be run using `pytest` (install pytest if needed: `pip install pytest`).
......@@ -28,15 +44,123 @@ You can run the tests with the command:
python -m pytest -sv tests/
```
# Content
This package comprises the following classes that can be imported in Python and are detailed in the [Doc](#doc) section of this readme:
- Six PyTorch models (`torch.nn.Module`) for Bert with pre-trained weights:
- `BertModel` - raw BERT Transformer model (**fully pre-trained**),
- `BertForMaskedLM` - BERT Transformer with the pre-trained masked language modeling head on top (**fully pre-trained**),
- `BertForNextSentencePrediction` - BERT Transformer with the pre-trained next sentence prediction classifier on top (**fully pre-trained**),
- `BertForPreTraining` - BERT Transformer with masked language modeling head and next sentence prediction classifier on top (**fully pre-trained**),
- `BertForSequenceClassification` - BERT Transformer with a sequence classification head on top (BERT Transformer is **pre-trained**, the sequence classification head **is only initialized and has to be trained**),
- `BertForQuestionAnswering` - BERT Transformer with a token classification head on top (BERT Transformer is **pre-trained**, the token classification head **is only initialized and has to be trained**).
- Three tokenizers:
- `BasicTokenizer` - basic tokenization (punctuation splitting, lower casing, etc.),
- `WordpieceTokenizer` - WordPiece tokenization,
- `BertTokenizer` - performs end-to-end tokenization, i.e. basic tokenization followed by WordPiece tokenization.
- One optimizer:
- `BERTAdam` - Bert version of Adam algorithm with weight decay fix, warmup and linear decay of the learning rate.
- A configuration class:
- `BertConfig` - Configuration class to store the configuration of a `BertModel`, with utilities to read and write JSON configuration files.
The repository further comprises:
- Three examples on how to use Bert (in the [`examples` folder](./examples)):
- [`extract_features.py`](./examples/extract_features.py) - Show how to extract hidden states from an instance of `BertModel`,
- [`run_classifier.py`](./examples/run_classifier.py) - Show how to fine-tune an instance of `BertForSequenceClassification` on GLUE's MRPC task,
- [`run_squad.py`](./examples/run_squad.py) - Show how to fine-tune an instance of `BertForQuestionAnswering` on SQuAD v1.0 task.
These examples are detailed in the [Examples](#examples) section of this readme.
- Three notebooks that were used to check that the TensorFlow and PyTorch models behave identically (in the [`notebooks` folder](./notebooks)):
- [`Comparing-TF-and-PT-models.ipynb`](./notebooks/Comparing-TF-and-PT-models.ipynb) - Compare the hidden states predicted by `BertModel`,
- [`Comparing-TF-and-PT-models-SQuAD.ipynb`](./notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb) - Compare the spans predicted by `BertForQuestionAnswering` instances,
- [`Comparing-TF-and-PT-models-MLM-NSP.ipynb`](./notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb) - Compare the predictions of the `BertForPreTraining` instances.
These notebooks are detailed in the [Notebooks](#notebooks) section of this readme.
- A command-line interface to convert any TensorFlow checkpoint into a PyTorch dump.
This CLI is detailed in the [Command-line interface](#command-line-interface) section of this readme.
# Usage
Here is a quick-start example using the `BertForMaskedLM` class with Google AI's pre-trained `Bert base uncased` model:
```python
import torch
from pytorch_pretrained_bert import BertForMaskedLM, BertTokenizer
# Load pre-trained model and tokenizer (weights and vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
# Tokenize the input text
text = "Who was Jim Henson ? Jim Henson was a puppeteer"
tokenized_text = tokenizer.tokenize(text)
# Mask the token that the model should predict
masked_index = 6
tokenized_text[masked_index] = '[MASK]'
assert tokenized_text == ['who', 'was', 'jim', 'henson', '?', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer']
# Convert tokens to vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Assign sentence A and sentence B indices to the 1st (resp. 2nd) sentence
segments_ids = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
# Convert the inputs to PyTorch tensors of shape [batch_size=1, sequence_length]
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
# Predict all tokens (eval mode disables dropout, no_grad skips gradient tracking)
model.eval()
with torch.no_grad():
    predictions = model(tokens_tensor, segments_tensors)
# Confirm that the model predicts 'henson' at the masked position
predicted_index = torch.argmax(predictions[0, masked_index]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
assert predicted_token == 'henson'
```
# Doc
Here is the detailed documentation of the classes in the package.
## Loading pre-trained weights
To load Google AI's pre-trained weights, the PyTorch model classes and the tokenizer can be instantiated as
```python
model = BERT_CLASS.from_pretrained(PRE_TRAINED_MODEL_NAME_OR_PATH)
```
where
- `BERT_CLASS` is either the `BertTokenizer` class (to load the vocabulary) or one of the six PyTorch model classes: `BertModel`, `BertForMaskedLM`, `BertForNextSentencePrediction`, `BertForPreTraining`, `BertForSequenceClassification` or `BertForQuestionAnswering` (to load the pre-trained weights), and
- `PRE_TRAINED_MODEL_NAME_OR_PATH` is either:
    - the shortcut name of one of Google AI's pre-trained models, selected from the list:
        - `bert-base-uncased`: 12-layer, 768-hidden, 12-heads, 110M parameters
        - `bert-large-uncased`: 24-layer, 1024-hidden, 16-heads, 340M parameters
        - `bert-base-cased`: 12-layer, 768-hidden, 12-heads, 110M parameters
        - `bert-base-multilingual`: 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
        - `bert-base-chinese`: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters
    - a path or url to a pretrained model archive containing:
        - `bert_config.json`: a configuration file for the model, and
        - `pytorch_model.bin`: a PyTorch dump of a pre-trained instance of `BertForPreTraining` (saved with the usual `torch.save()`).
If `PRE_TRAINED_MODEL_NAME_OR_PATH` is a shortcut name, the pre-trained weights will be downloaded from AWS S3 (see the links [here](pytorch_pretrained_bert/modeling.py)) and stored in a cache folder to avoid future downloads (the cache folder can be found at `~/.pytorch_pretrained_bert/`).
Example:
```python
from pytorch_pretrained_bert import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
```
## PyTorch models
### 1. `BertModel`
......@@ -44,14 +168,14 @@ Here are some details on each class.
The inputs and output are **identical to the TensorFlow model inputs and outputs**.
We detail them here. This model takes as *inputs*:
- `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] with the word token indices in the vocabulary (see the tokens preprocessing logic in the scripts `extract_features.py`, `run_classifier.py` and `run_squad.py`), and
- `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token type indices selected in [0, 1]. Type 0 corresponds to a `sentence A` token and type 1 corresponds to a `sentence B` token (see BERT paper for more details).
- `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices selected in [0, 1]. It is used when the sequences of a batch have different lengths and are padded to the longest one: 1 marks real tokens and 0 marks padding, so that attention ignores the padded positions.
- `output_all_encoded_layers`: boolean which controls the content of the `encoded_layers` output as described below. Default: `True`.
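As a rough illustration of the `token_type_ids` and `attention_mask` conventions above, here is a minimal pure-Python sketch that lays out a sentence pair and pads a batch. It assumes the `[CLS] A [SEP] B [SEP]` layout of the original BERT preprocessing and a padding index of 0; `build_pair_inputs` and `pad_batch` are hypothetical helpers, not part of this package. The resulting lists can be wrapped with `torch.tensor(...)` to obtain the LongTensors the model expects.

```python
def build_pair_inputs(tokens_a, tokens_b):
    """Lay out a sentence pair as [CLS] A [SEP] B [SEP] with matching token type ids.

    Sentence A tokens (including [CLS] and the first [SEP]) get type 0,
    sentence B tokens (including the final [SEP]) get type 1.
    """
    tokens = ['[CLS]'] + tokens_a + ['[SEP]'] + tokens_b + ['[SEP]']
    token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, token_type_ids


def pad_batch(batch_ids, pad_id=0):
    """Right-pad id lists to the longest sequence of the batch.

    Returns the padded ids and the attention mask (1 = real token, 0 = padding).
    """
    max_len = max(len(ids) for ids in batch_ids)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch_ids]
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch_ids]
    return input_ids, attention_mask


tokens, token_type_ids = build_pair_inputs(['who', 'was', 'jim'], ['a', 'puppet'])
print(tokens)          # ['[CLS]', 'who', 'was', 'jim', '[SEP]', 'a', 'puppet', '[SEP]']
print(token_type_ids)  # [0, 0, 0, 0, 0, 1, 1, 1]
```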
This model *outputs* a tuple composed of:
- `encoded_layers`: controlled by the value of the `output_all_encoded_layers` argument:
......@@ -62,7 +186,52 @@ This model outputs a tuple composed of:
An example on how to use this class is given in the `extract_features.py` script which can be used to extract the hidden states of the model for a given input.
### 2. `BertForPreTraining`
`BertForPreTraining` includes the `BertModel` Transformer followed by the two pre-training heads:
- the masked language modeling head, and
- the next sentence classification head.
*Inputs* comprise the inputs of the [`BertModel`](#1-bertmodel) class plus two optional labels:
- `masked_lm_labels`: masked language modeling labels: torch.LongTensor of shape [batch_size, sequence_length] with indices selected in [-1, 0, ..., vocab_size]. All labels set to -1 are ignored (masked); the loss is only computed for the labels set in [0, ..., vocab_size].
- `next_sentence_label`: next sentence classification labels: torch.LongTensor of shape [batch_size] with indices selected in [0, 1]. 0 => next sentence is the continuation, 1 => next sentence is a random sentence.
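To make the -1 convention of `masked_lm_labels` concrete, here is a small pure-Python sketch that labels only the masked positions. The token ids are the ones logged in the MLM/NSP comparison notebook for "who was jim henson ? jim [MASK] was a puppet ##eer" (103 being the `[MASK]` id); `make_masked_lm_labels` is a hypothetical helper, not part of the package. Wrapping the result as `torch.tensor([masked_lm_labels])` would give the [batch_size, sequence_length] shape described above.

```python
def make_masked_lm_labels(input_ids, masked_positions, original_ids, ignore_index=-1):
    """Build masked LM labels: the original id at each masked position, -1 elsewhere."""
    labels = [ignore_index] * len(input_ids)
    for position, original_id in zip(masked_positions, original_ids):
        labels[position] = original_id
    return labels


# "who was jim henson ? jim [MASK] was a puppet ##eer", where [MASK] replaced "henson" (27227)
input_ids = [2040, 2001, 3958, 27227, 1029, 3958, 103, 2001, 1037, 13997, 11510]
masked_lm_labels = make_masked_lm_labels(input_ids, masked_positions=[6], original_ids=[27227])
next_sentence_label = 0  # 0 => sentence B is the continuation of sentence A
print(masked_lm_labels)  # [-1, -1, -1, -1, -1, -1, 27227, -1, -1, -1, -1]
```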
*Outputs*:
- if `masked_lm_labels` and `next_sentence_label` are not `None`: Outputs the total_loss which is the sum of the masked language modeling loss and the next sentence classification loss.
- if `masked_lm_labels` or `next_sentence_label` is `None`: Outputs a tuple comprising
- the masked language modeling logits, and
- the next sentence classification logits.
### 3. `BertForMaskedLM`
`BertForMaskedLM` includes the `BertModel` Transformer followed by the (possibly) pre-trained masked language modeling head.
*Inputs* comprise the inputs of the [`BertModel`](#1-bertmodel) class plus an optional label:
- `masked_lm_labels`: masked language modeling labels: torch.LongTensor of shape [batch_size, sequence_length] with indices selected in [-1, 0, ..., vocab_size]. All labels set to -1 are ignored (masked); the loss is only computed for the labels set in [0, ..., vocab_size].
*Outputs*:
- if `masked_lm_labels` is not `None`: Outputs the masked language modeling loss.
- if `masked_lm_labels` is `None`: Outputs the masked language modeling logits.
### 4. `BertForNextSentencePrediction`
`BertForNextSentencePrediction` includes the `BertModel` Transformer followed by the next sentence classification head.
*Inputs* comprise the inputs of the [`BertModel`](#1-bertmodel) class plus an optional label:
- `next_sentence_label`: next sentence classification labels: torch.LongTensor of shape [batch_size] with indices selected in [0, 1]. 0 => next sentence is the continuation, 1 => next sentence is a random sentence.
*Outputs*:
- if `next_sentence_label` is not `None`: Outputs the next sentence classification loss.
- if `next_sentence_label` is `None`: Outputs the next sentence classification logits.
### 5. `BertForSequenceClassification`
`BertForSequenceClassification` is a fine-tuning model that includes `BertModel` and a sequence-level (sequence or pair of sequences) classifier on top of the `BertModel`.
......@@ -70,7 +239,7 @@ The sequence-level classifier is a linear layer that takes as input the last hid
An example on how to use this class is given in the `run_classifier.py` script which can be used to fine-tune a single sequence (or pair of sequence) classifier using BERT, for example for the MRPC task.
### 6. `BertForQuestionAnswering`
`BertForQuestionAnswering` is a fine-tuning model that includes `BertModel` with a token-level classifier on top of the full sequence of last hidden states.
......@@ -78,31 +247,54 @@ The token-level classifier takes as input the full sequence of the last hidden s
An example on how to use this class is given in the `run_squad.py` script which can be used to fine-tune a token classifier using BERT, for example for the SQuAD task.
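The token-level scores are typically turned into an answer by picking the span whose start and end scores add up to the highest value. The following is a simplified pure-Python sketch of that idea, assuming one start logit and one end logit per token and a maximum answer length; `best_span` is hypothetical, and `run_squad.py` performs a more elaborate search (n-best candidates, mapping back to the original text).

```python
def best_span(start_logits, end_logits, max_answer_length=30):
    """Return ((start, end), score) maximizing start_logits[start] + end_logits[end].

    Only spans with start <= end and length <= max_answer_length are considered.
    """
    best, best_score = None, float('-inf')
    for start, start_score in enumerate(start_logits):
        last = min(start + max_answer_length, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best, best_score


print(best_span([0.1, 2.0, 0.3], [0.2, 0.1, 3.0]))  # ((1, 2), 5.0)
```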
## Tokenizers
### `BertTokenizer`
`BertTokenizer` performs end-to-end tokenization, i.e. basic tokenization followed by WordPiece tokenization.
This class has two arguments:
- `vocab_file`: path to a vocabulary file.
- `do_lower_case`: convert text to lower-case while tokenizing. **Default = True**.
and three methods:
- `tokenize(text)`: convert a `str` into a list of `str` tokens by (1) performing basic tokenization and (2) WordPiece tokenization.
- `convert_tokens_to_ids(tokens)`: convert a list of `str` tokens into a list of `int` indices in the vocabulary.
- `convert_ids_to_tokens(tokens)`: convert a list of `int` indices into a list of `str` tokens in the vocabulary.
### `BasicTokenizer` and `WordpieceTokenizer`
Please refer to the doc strings and code in [`tokenization.py`](./pytorch_pretrained_bert/tokenization.py) for the details of these classes. In general it is recommended to use `BertTokenizer` unless you know what you are doing.
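For intuition about the WordPiece step, here is a minimal pure-Python sketch of the greedy longest-match-first splitting used by Google's reference tokenizer, applied to a single already basic-tokenized word. The toy vocabulary, the `max_chars` limit value and the `wordpiece_tokenize` name are assumptions for illustration; the package's `WordpieceTokenizer` is the implementation to use in practice.

```python
def wordpiece_tokenize(word, vocab, unk_token='[UNK]', max_chars=100):
    """Greedy longest-match-first WordPiece split of a single word."""
    if len(word) > max_chars:
        return [unk_token]
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking it from the right
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = '##' + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # some part cannot be matched: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


toy_vocab = {'puppet', '##eer', 'un', '##aff', '##able'}
print(wordpiece_tokenize('puppeteer', toy_vocab))  # ['puppet', '##eer']
```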
## Optimizer
### `BERTAdam`
`BERTAdam` is a `torch.optim.Optimizer` adapted to be closer to the optimizer used in the TensorFlow implementation of Bert. The differences from the PyTorch Adam optimizer are the following:
- BERTAdam implements the weight decay fix,
- BERTAdam doesn't compensate for bias as in the regular Adam optimizer.
The optimizer accepts the following arguments:
- `lr` : learning rate
- `warmup` : portion of `t_total` for the warmup, -1 means no warmup. Default : -1
- `t_total` : total number of training steps for the learning rate schedule, -1 means constant learning rate. Default : -1
- `schedule` : schedule to use for the warmup. Default : 'warmup_linear'
- `b1` : Adam's b1. Default : 0.9
- `b2` : Adam's b2. Default : 0.999
- `e` : Adam's epsilon. Default : 1e-6
- `weight_decay_rate` : Weight decay. Default : 0.01
- `max_grad_norm` : Maximum norm for the gradients (-1 means no clipping). Default : 1.0
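The two differences listed above can be sketched for a single scalar parameter as follows. This is a hedged illustration rather than the package's code: it assumes the weight decay term is added to the Adam update instead of the gradient, that no bias correction is applied, and a linear-warmup-then-linear-decay schedule driven by `step / t_total`; gradient clipping (`max_grad_norm`) and the `t_total=-1` constant schedule are omitted. `linear_warmup_decay` and `bert_adam_step` are hypothetical names.

```python
import math


def linear_warmup_decay(progress, warmup):
    """Learning-rate multiplier: ramp up over the first `warmup` fraction, then decay to 0."""
    if progress < warmup:
        return progress / warmup
    return max(0.0, (1.0 - progress) / (1.0 - warmup))


def bert_adam_step(p, g, m, v, lr, step, t_total, warmup=0.1,
                   b1=0.9, b2=0.999, e=1e-6, weight_decay_rate=0.01):
    """One update of parameter p with gradient g; returns the new (p, m, v)."""
    # Exponential moving averages of the gradient and of its square
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    # No bias correction of m and v, unlike torch.optim.Adam
    update = m / (math.sqrt(v) + e)
    # Decoupled weight decay: added to the update, so it is not rescaled by v
    update += weight_decay_rate * p
    lr_scheduled = lr * linear_warmup_decay(step / t_total, warmup)
    return p - lr_scheduled * update, m, v


# With a zero gradient, only the weight decay term moves the parameter
p, m, v = bert_adam_step(1.0, 0.0, 0.0, 0.0, lr=0.1, step=50, t_total=100, warmup=0.0)
print(p)
```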
# Examples
Fine-tuning the models
## Training large models: introduction, tools and examples
BERT-base and BERT-large are respectively 110M- and 340M-parameter models, and it can be difficult to fine-tune them on a single GPU with the batch size recommended for good performance (in most cases a batch size of 32).
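Gradient accumulation works around this limit by splitting a large batch into micro-batches, accumulating their scaled gradients, and stepping the optimizer once. The pure-Python sketch below checks the underlying identity on a toy model `y = w * x` with a mean squared error (the model, data and helper names are hypothetical): summing micro-batch gradients divided by the number of accumulation steps gives the full-batch gradient. In a PyTorch loop this corresponds to dividing each micro-batch loss by the number of accumulation steps, calling `loss.backward()` on each, and calling `optimizer.step()` and `optimizer.zero_grad()` once per full batch.

```python
def mse_grad(w, batch):
    """d/dw of mean((w * x - y) ** 2) over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, accumulation_steps):
    """Gradient obtained by accumulating equally sized micro-batches."""
    if len(batch) % accumulation_steps != 0:
        raise ValueError("batch size must be divisible by accumulation_steps")
    micro_size = len(batch) // accumulation_steps
    grad = 0.0
    for i in range(accumulation_steps):
        micro_batch = batch[i * micro_size:(i + 1) * micro_size]
        # Scale each micro-batch gradient so that the sum equals the full-batch mean
        grad += mse_grad(w, micro_batch) / accumulation_steps
    return grad


batch = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]
print(mse_grad(1.5, batch), accumulated_grad(1.5, batch, accumulation_steps=2))  # -7.0 -7.0
```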
......@@ -122,26 +314,6 @@ python -m torch.distributed.launch --nproc_per_node=4 --nnodes=2 --node_rank=$TH
```
Where `$THIS_MACHINE_INDEX` is an sequential index assigned to each of your machine (0, 1, 2...) and the machine with rank 0 has an IP address `192.168.1.1` and an open port `1234`.
## Fine-tuning with BERT: running the examples
We showcase the same examples as [the original implementation](https://github.com/google-research/bert/): fine-tuning a sequence-level classifier on the MRPC classification corpus and a token-level classifier on the question answering dataset SQuAD.
......@@ -270,3 +442,52 @@ The results were similar to the above FP32 results (actually slightly higher):
```bash
{"exact_match": 84.65468306527909, "f1": 91.238669287002}
```
# Notebooks
Comparing the PyTorch model and the TensorFlow model predictions
We also include [three Jupyter Notebooks](https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/notebooks) that can be used to check that the predictions of the PyTorch model are identical to the predictions of the original TensorFlow model.
- The first notebook ([Comparing TF and PT models.ipynb](https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/notebooks/Comparing%20TF%20and%20PT%20models.ipynb)) extracts the hidden states of a full sequence at each layer of the TensorFlow and the PyTorch models and computes the standard deviation between them. In the given example, we get a standard deviation of 1.5e-7 to 9e-7 on the various hidden states of the models.
- The second notebook ([Comparing TF and PT models SQuAD predictions.ipynb](https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/notebooks/Comparing%20TF%20and%20PT%20models%20SQuAD%20predictions.ipynb)) compares the loss computed by the TensorFlow and the PyTorch models for identical initialization of the fine-tuning layer of the `BertForQuestionAnswering` and computes the standard deviation between them. In the given example, we get a standard deviation of 2.5e-7 between the models.
- The third notebook ([Comparing-TF-and-PT-models-MLM-NSP.ipynb](./notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb)) compares the predictions of the TensorFlow and the PyTorch `BertForPreTraining` models.
Please follow the instructions given in the notebooks to run and modify them. They are also good examples of how to use the models in a simpler way than the full fine-tuning scripts we provide.
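The comparison metric can be illustrated with a short pure-Python sketch. It assumes that "the standard deviation between them" means the population standard deviation of the element-wise differences between the flattened TensorFlow and PyTorch outputs; the notebooks themselves may compute it differently (for instance with NumPy), and `std_of_difference` is a hypothetical helper.

```python
import statistics


def std_of_difference(tf_values, pt_values):
    """Population standard deviation of the element-wise differences of two flat outputs."""
    if len(tf_values) != len(pt_values):
        raise ValueError("outputs must have the same number of elements")
    diffs = [a - b for a, b in zip(tf_values, pt_values)]
    return statistics.pstdev(diffs)


print(std_of_difference([0.25, -1.5, 3.0], [0.25, -1.5, 3.0]))  # 0.0
```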
# Command-line interface
A command-line interface is provided to convert a TensorFlow checkpoint into a PyTorch checkpoint.
You can convert any TensorFlow checkpoint for BERT (in particular [the pre-trained models released by Google](https://github.com/google-research/bert#pre-trained-models)) into a PyTorch save file by using the [`convert_tf_checkpoint_to_pytorch.py`](convert_tf_checkpoint_to_pytorch.py) script.
This script takes as input a TensorFlow checkpoint (three files starting with `bert_model.ckpt`) and the associated configuration file (`bert_config.json`), creates a PyTorch model for this configuration, loads the weights from the TensorFlow checkpoint into the PyTorch model and saves the resulting model in a standard PyTorch save file that can be imported using `torch.load()` (see examples in `extract_features.py`, `run_classifier.py` and `run_squad.py`).
You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow checkpoint (the three files starting with `bert_model.ckpt`) but be sure to keep the configuration file (`bert_config.json`) and the vocabulary file (`vocab.txt`) as these are needed for the PyTorch model too.
To run this specific conversion script you will need to have TensorFlow and PyTorch installed (`pip install tensorflow`). The rest of the repository only requires PyTorch.
Here is an example of the conversion process for a pre-trained `BERT-Base Uncased` model:
```shell
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
python convert_tf_checkpoint_to_pytorch.py \
--tf_checkpoint_path $BERT_BASE_DIR/bert_model.ckpt \
--bert_config_file $BERT_BASE_DIR/bert_config.json \
--pytorch_dump_path $BERT_BASE_DIR/pytorch_model.bin
```
You can download Google's pre-trained models for the conversion [here](https://github.com/google-research/bert#pre-trained-models).
# TPU
TPU support and pretraining scripts
TPUs are not supported by the current stable release of PyTorch (0.4.1). However, the next version of PyTorch (v1.0) should support training on TPU and is expected to be released soon (see the recent [official announcement](https://cloud.google.com/blog/products/ai-machine-learning/introducing-pytorch-across-google-cloud)).
We will add TPU support when this next release is published.
The original TensorFlow code further comprises two scripts for pre-training BERT: [create_pretraining_data.py](https://github.com/google-research/bert/blob/master/create_pretraining_data.py) and [run_pretraining.py](https://github.com/google-research/bert/blob/master/run_pretraining.py).
Since pre-training BERT is a particularly expensive operation that basically requires one or several TPUs to be completed in a reasonable amount of time (see details [here](https://github.com/google-research/bert#pre-training-with-bert)), we have decided to wait for the inclusion of TPU support in PyTorch to convert these pre-training scripts.
......@@ -22,8 +22,8 @@
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:26.999106Z",
"start_time": "2018-11-16T10:02:26.985709Z"
"end_time": "2018-11-16T12:57:29.908082Z",
"start_time": "2018-11-16T12:57:29.895380Z"
}
},
"outputs": [],
......@@ -44,8 +44,8 @@
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:27.664528Z",
"start_time": "2018-11-16T10:02:27.651019Z"
"end_time": "2018-11-16T12:57:30.748660Z",
"start_time": "2018-11-16T12:57:30.734752Z"
}
},
"outputs": [],
......@@ -69,8 +69,8 @@
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:30.202182Z",
"start_time": "2018-11-16T10:02:28.112570Z"
"end_time": "2018-11-16T12:57:33.447672Z",
"start_time": "2018-11-16T12:57:31.295837Z"
}
},
"outputs": [],
......@@ -103,8 +103,8 @@
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:30.238027Z",
"start_time": "2018-11-16T10:02:30.204943Z"
"end_time": "2018-11-16T12:57:33.486399Z",
"start_time": "2018-11-16T12:57:33.450132Z"
},
"code_folding": [
15
......@@ -167,11 +167,11 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:30.304018Z",
"start_time": "2018-11-16T10:02:30.240189Z"
"end_time": "2018-11-16T13:01:37.821298Z",
"start_time": "2018-11-16T13:01:37.735808Z"
}
},
"outputs": [
......@@ -201,8 +201,8 @@
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:33.324167Z",
"start_time": "2018-11-16T10:02:33.291909Z"
"end_time": "2018-11-16T12:57:36.214438Z",
"start_time": "2018-11-16T12:57:36.181993Z"
},
"code_folding": [
16
......@@ -274,8 +274,8 @@
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:34.185367Z",
"start_time": "2018-11-16T10:02:34.155046Z"
"end_time": "2018-11-16T12:57:36.707966Z",
"start_time": "2018-11-16T12:57:36.679964Z"
}
},
"outputs": [
......@@ -290,7 +290,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:34 - INFO - tensorflow - *** Example ***\n"
"11/16/2018 13:57:36 - INFO - tensorflow - *** Example ***\n"
]
},
{
......@@ -304,7 +304,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:34 - INFO - tensorflow - tokens: who was jim henson ? jim [MASK] was a puppet ##eer\n"
"11/16/2018 13:57:36 - INFO - tensorflow - tokens: who was jim henson ? jim [MASK] was a puppet ##eer\n"
]
},
{
......@@ -313,7 +313,7 @@
"text": [
"INFO:tensorflow:features: input_ids:[2040, 2001, 3958, 27227, 1029, 3958, 103, 2001, 1037, 13997, 11510, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"input_mask:[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"segment_ids:[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"segment_ids:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"masked_lm_positions:[6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"masked_lm_ids:[27227, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"masked_lm_weights:[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]\n",
......@@ -324,9 +324,9 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:34 - INFO - tensorflow - features: input_ids:[2040, 2001, 3958, 27227, 1029, 3958, 103, 2001, 1037, 13997, 11510, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"11/16/2018 13:57:36 - INFO - tensorflow - features: input_ids:[2040, 2001, 3958, 27227, 1029, 3958, 103, 2001, 1037, 13997, 11510, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"input_mask:[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"segment_ids:[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"segment_ids:[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"masked_lm_positions:[6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"masked_lm_ids:[27227, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"masked_lm_weights:[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]\n",
@@ -345,8 +345,8 @@
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:34.912005Z",
"start_time": "2018-11-16T10:02:34.882111Z"
"end_time": "2018-11-16T12:57:37.270106Z",
"start_time": "2018-11-16T12:57:37.239090Z"
}
},
"outputs": [],
@@ -428,8 +428,8 @@
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:35.671603Z",
"start_time": "2018-11-16T10:02:35.626167Z"
"end_time": "2018-11-16T12:57:37.784427Z",
"start_time": "2018-11-16T12:57:37.737622Z"
},
"code_folding": [
64,
@@ -607,8 +607,8 @@
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:40.328700Z",
"start_time": "2018-11-16T10:02:36.289676Z"
"end_time": "2018-11-16T12:57:42.465851Z",
"start_time": "2018-11-16T12:57:38.254858Z"
}
},
"outputs": [
@@ -616,54 +616,54 @@
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x12a864ae8>) includes params argument, but params are not passed to Estimator.\n"
"WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x12e3b0620>) includes params argument, but params are not passed to Estimator.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - WARNING - tensorflow - Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x12a864ae8>) includes params argument, but params are not passed to Estimator.\n"
"11/16/2018 13:57:42 - WARNING - tensorflow - Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x12e3b0620>) includes params argument, but params are not passed to Estimator.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:Using temporary folder as model directory: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d\n"
"WARNING:tensorflow:Using temporary folder as model directory: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmpbmo71s73\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - WARNING - tensorflow - Using temporary folder as model directory: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d\n"
"11/16/2018 13:57:42 - WARNING - tensorflow - Using temporary folder as model directory: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmpbmo71s73\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:Using config: {'_model_dir': '/var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true\n",
"INFO:tensorflow:Using config: {'_model_dir': '/var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmpbmo71s73', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true\n",
"graph_options {\n",
" rewrite_options {\n",
" meta_optimizer_iterations: ONE\n",
" }\n",
"}\n",
", '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x12dbb5ac8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=1, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}\n"
", '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x131700ac8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=1, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - Using config: {'_model_dir': '/var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true\n",
"11/16/2018 13:57:42 - INFO - tensorflow - Using config: {'_model_dir': '/var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmpbmo71s73', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true\n",
"graph_options {\n",
" rewrite_options {\n",
" meta_optimizer_iterations: ONE\n",
" }\n",
"}\n",
", '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x12dbb5ac8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=1, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}\n"
", '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x131700ac8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=1, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}\n"
]
},
{
@@ -677,7 +677,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - WARNING - tensorflow - Setting TPUConfig.num_shards==1 is an unsupported behavior. Please fix as soon as possible (leaving num_shards as None.\n"
"11/16/2018 13:57:42 - WARNING - tensorflow - Setting TPUConfig.num_shards==1 is an unsupported behavior. Please fix as soon as possible (leaving num_shards as None.\n"
]
},
{
@@ -691,7 +691,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - _TPUContext: eval_on_tpu True\n"
"11/16/2018 13:57:42 - INFO - tensorflow - _TPUContext: eval_on_tpu True\n"
]
},
{
@@ -705,7 +705,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - WARNING - tensorflow - eval_on_tpu ignored because use_tpu is False.\n"
"11/16/2018 13:57:42 - WARNING - tensorflow - eval_on_tpu ignored because use_tpu is False.\n"
]
}
],
@@ -744,8 +744,8 @@
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:46.596956Z",
"start_time": "2018-11-16T10:02:40.331008Z"
"end_time": "2018-11-16T12:57:48.906267Z",
"start_time": "2018-11-16T12:57:42.468656Z"
}
},
"outputs": [
@@ -753,14 +753,14 @@
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:Could not find trained model in model_dir: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d, running initialization to predict.\n"
"INFO:tensorflow:Could not find trained model in model_dir: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmpbmo71s73, running initialization to predict.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - Could not find trained model in model_dir: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d, running initialization to predict.\n"
"11/16/2018 13:57:42 - INFO - tensorflow - Could not find trained model in model_dir: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmpbmo71s73, running initialization to predict.\n"
]
},
{
@@ -774,7 +774,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - Calling model_fn.\n"
"11/16/2018 13:57:42 - INFO - tensorflow - Calling model_fn.\n"
]
},
{
@@ -788,7 +788,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - Running infer on CPU\n"
"11/16/2018 13:57:42 - INFO - tensorflow - Running infer on CPU\n"
]
},
{
@@ -802,7 +802,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - *** Features ***\n"
"11/16/2018 13:57:42 - INFO - tensorflow - *** Features ***\n"
]
},
{
@@ -816,7 +816,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - name = input_ids, shape = (?, 128)\n"
"11/16/2018 13:57:42 - INFO - tensorflow - name = input_ids, shape = (?, 128)\n"
]
},
{
@@ -830,7 +830,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - name = input_mask, shape = (?, 128)\n"
"11/16/2018 13:57:42 - INFO - tensorflow - name = input_mask, shape = (?, 128)\n"
]
},
{
@@ -844,7 +844,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - name = masked_lm_ids, shape = (?, 20)\n"
"11/16/2018 13:57:42 - INFO - tensorflow - name = masked_lm_ids, shape = (?, 20)\n"
]
},
{
@@ -858,7 +858,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - name = masked_lm_positions, shape = (?, 20)\n"
"11/16/2018 13:57:42 - INFO - tensorflow - name = masked_lm_positions, shape = (?, 20)\n"
]
},
{
@@ -872,7 +872,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - name = masked_lm_weights, shape = (?, 20)\n"
"11/16/2018 13:57:42 - INFO - tensorflow - name = masked_lm_weights, shape = (?, 20)\n"
]
},
{
@@ -886,7 +886,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - name = next_sentence_labels, shape = (?, 1)\n"
"11/16/2018 13:57:42 - INFO - tensorflow - name = next_sentence_labels, shape = (?, 1)\n"
]
},
{
@@ -900,7 +900,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - name = segment_ids, shape = (?, 128)\n"
"11/16/2018 13:57:42 - INFO - tensorflow - name = segment_ids, shape = (?, 128)\n"
]
},
{
@@ -914,7 +914,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - **** Trainable Variables ****\n"
"11/16/2018 13:57:45 - INFO - tensorflow - **** Trainable Variables ****\n"
]
},
{
@@ -928,7 +928,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/embeddings/word_embeddings:0, shape = (30522, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/embeddings/word_embeddings:0, shape = (30522, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -942,7 +942,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -956,7 +956,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/embeddings/position_embeddings:0, shape = (512, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/embeddings/position_embeddings:0, shape = (512, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -970,7 +970,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -984,7 +984,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -998,7 +998,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1012,7 +1012,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1026,7 +1026,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1040,7 +1040,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1054,7 +1054,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1068,7 +1068,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1082,7 +1082,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1096,7 +1096,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1110,7 +1110,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1124,7 +1124,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1138,7 +1138,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1152,7 +1152,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1166,7 +1166,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1180,7 +1180,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1194,7 +1194,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1208,7 +1208,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_0/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1222,7 +1222,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1236,7 +1236,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1250,7 +1250,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1264,7 +1264,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1278,7 +1278,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1292,7 +1292,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1306,7 +1306,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1320,7 +1320,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1334,7 +1334,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1348,7 +1348,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1362,7 +1362,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1376,7 +1376,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1390,7 +1390,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1404,7 +1404,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1418,7 +1418,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1432,7 +1432,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_1/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1446,7 +1446,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1460,7 +1460,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1474,7 +1474,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1488,7 +1488,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1502,7 +1502,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1516,7 +1516,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1530,7 +1530,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1544,7 +1544,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1558,7 +1558,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1572,7 +1572,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1586,7 +1586,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1600,7 +1600,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1614,7 +1614,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1628,7 +1628,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1642,7 +1642,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1656,7 +1656,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_2/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1670,7 +1670,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1684,7 +1684,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1698,7 +1698,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1712,7 +1712,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1726,7 +1726,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1740,7 +1740,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1754,7 +1754,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1768,7 +1768,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1782,7 +1782,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1796,7 +1796,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1810,7 +1810,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1824,7 +1824,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1838,7 +1838,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1852,7 +1852,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1866,7 +1866,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1880,7 +1880,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_3/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1894,7 +1894,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1908,7 +1908,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1922,7 +1922,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1936,7 +1936,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1950,7 +1950,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1964,7 +1964,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1978,7 +1978,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -1992,7 +1992,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2006,7 +2006,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2020,7 +2020,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2034,7 +2034,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2048,7 +2048,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2062,7 +2062,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2076,7 +2076,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2090,7 +2090,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2104,7 +2104,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_4/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2118,7 +2118,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2132,7 +2132,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2146,7 +2146,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2160,7 +2160,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2174,7 +2174,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2188,7 +2188,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2202,7 +2202,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2216,7 +2216,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2230,7 +2230,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2244,7 +2244,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2258,7 +2258,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2272,7 +2272,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2286,7 +2286,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2300,7 +2300,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2314,7 +2314,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2328,7 +2328,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_5/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2342,7 +2342,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2356,7 +2356,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2370,7 +2370,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2384,7 +2384,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2398,7 +2398,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2412,7 +2412,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2426,7 +2426,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2440,7 +2440,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2454,7 +2454,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2468,7 +2468,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2482,7 +2482,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2496,7 +2496,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2510,7 +2510,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2524,7 +2524,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2538,7 +2538,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2552,7 +2552,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_6/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2566,7 +2566,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2580,7 +2580,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2594,7 +2594,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2608,7 +2608,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2622,7 +2622,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2636,7 +2636,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2650,7 +2650,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2664,7 +2664,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2678,7 +2678,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2692,7 +2692,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2706,7 +2706,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2720,7 +2720,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2734,7 +2734,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2748,7 +2748,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2762,7 +2762,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2776,7 +2776,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_7/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2790,7 +2790,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2804,7 +2804,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2818,7 +2818,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2832,7 +2832,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2846,7 +2846,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2860,7 +2860,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2874,7 +2874,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2888,7 +2888,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2902,7 +2902,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2916,7 +2916,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
@@ -2930,7 +2930,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -2944,7 +2944,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -2958,7 +2958,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -2972,7 +2972,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -2986,7 +2986,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3000,7 +3000,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_8/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3014,7 +3014,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3028,7 +3028,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3042,7 +3042,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3056,7 +3056,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3070,7 +3070,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3084,7 +3084,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3098,7 +3098,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3112,7 +3112,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3126,7 +3126,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3140,7 +3140,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3154,7 +3154,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3168,7 +3168,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3182,7 +3182,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3196,7 +3196,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3210,7 +3210,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3224,7 +3224,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_9/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3238,7 +3238,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3252,7 +3252,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3266,7 +3266,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3280,7 +3280,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3294,7 +3294,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3308,7 +3308,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3322,7 +3322,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3336,7 +3336,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3350,7 +3350,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3364,7 +3364,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3378,7 +3378,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3392,7 +3392,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3406,7 +3406,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3420,7 +3420,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3434,7 +3434,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3448,7 +3448,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_10/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3462,7 +3462,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3476,7 +3476,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3490,7 +3490,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3504,7 +3504,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3518,7 +3518,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3532,7 +3532,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3546,7 +3546,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3560,7 +3560,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3574,7 +3574,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3588,7 +3588,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3602,7 +3602,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3616,7 +3616,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3630,7 +3630,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3644,7 +3644,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3658,7 +3658,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3672,7 +3672,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/encoder/layer_11/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3686,7 +3686,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/pooler/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/pooler/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3700,7 +3700,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/pooler/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = bert/pooler/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3714,7 +3714,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/transform/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = cls/predictions/transform/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3728,7 +3728,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/transform/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = cls/predictions/transform/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3742,7 +3742,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/transform/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = cls/predictions/transform/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3756,7 +3756,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/transform/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = cls/predictions/transform/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3770,7 +3770,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/output_bias:0, shape = (30522,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = cls/predictions/output_bias:0, shape = (30522,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3784,7 +3784,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = cls/seq_relationship/output_weights:0, shape = (2, 768), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = cls/seq_relationship/output_weights:0, shape = (2, 768), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3798,7 +3798,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = cls/seq_relationship/output_bias:0, shape = (2,), *INIT_FROM_CKPT*\n"
"11/16/2018 13:57:45 - INFO - tensorflow - name = cls/seq_relationship/output_bias:0, shape = (2,), *INIT_FROM_CKPT*\n"
]
},
{
......@@ -3812,7 +3812,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - Done calling model_fn.\n"
"11/16/2018 13:57:45 - INFO - tensorflow - Done calling model_fn.\n"
]
},
{
......@@ -3826,7 +3826,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:44 - INFO - tensorflow - Graph was finalized.\n"
"11/16/2018 13:57:46 - INFO - tensorflow - Graph was finalized.\n"
]
},
{
......@@ -3840,7 +3840,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:45 - INFO - tensorflow - Running local_init_op.\n"
"11/16/2018 13:57:47 - INFO - tensorflow - Running local_init_op.\n"
]
},
{
......@@ -3854,7 +3854,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:45 - INFO - tensorflow - Done running local_init_op.\n"
"11/16/2018 13:57:47 - INFO - tensorflow - Done running local_init_op.\n"
]
},
{
......@@ -3868,7 +3868,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:46 - INFO - tensorflow - prediction_loop marked as finished\n"
"11/16/2018 13:57:48 - INFO - tensorflow - prediction_loop marked as finished\n"
]
},
{
......@@ -3882,7 +3882,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:46 - INFO - tensorflow - prediction_loop marked as finished\n"
"11/16/2018 13:57:48 - INFO - tensorflow - prediction_loop marked as finished\n"
]
}
],
......@@ -3897,8 +3897,8 @@
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:46.634304Z",
"start_time": "2018-11-16T10:02:46.598800Z"
"end_time": "2018-11-16T12:57:48.948759Z",
"start_time": "2018-11-16T12:57:48.908094Z"
}
},
"outputs": [
......@@ -3909,9 +3909,9 @@
"1\n",
"2\n",
"dict_keys(['masked_lm_predictions', 'next_sentence_predictions'])\n",
"masked_lm_predictions [27227 1010 1010 1010 1010 1010 1010 1010 1010 1010 1010 1010\n",
" 1010 1010 1010 1010 1010 1010 1010 1010]\n",
"predicted token ['henson', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',']\n"
"masked_lm_predictions [1010 1012 1012 1012 1012 1012 1012 1012 1012 1012 1012 1012 1012 1012\n",
" 1012 1012 1012 1012 1012 1012]\n",
"predicted token [',', '.', '.', '.', '.', '.', '.', '.', '.', '.', '.', '.', '.', '.', '.', '.', '.', '.', '.', '.']\n"
]
}
],
......@@ -3928,8 +3928,8 @@
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:46.671229Z",
"start_time": "2018-11-16T10:02:46.637102Z"
"end_time": "2018-11-16T12:57:48.985852Z",
"start_time": "2018-11-16T12:57:48.951851Z"
}
},
"outputs": [
......@@ -3937,7 +3937,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"tensorflow_output: ['henson']\n"
"tensorflow_output: [',']\n"
]
}
],
......
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Comparing TensorFlow (original) and PyTorch models\n",
"\n",
"You can use this small notebook to check the conversion of the model's weights from the TensorFlow model to the PyTorch model. In the following, we compare the outputs of the last layer on a simple example (in `input.txt`), but both models return all the hidden layers, so you can check every stage of the model.\n",
"\n",
"To run this notebook, follow these instructions:\n",
"- make sure that your Python environment has both TensorFlow and PyTorch installed,\n",
"- download the original TensorFlow implementation,\n",
"- download a pre-trained TensorFlow model as indicated in the TensorFlow implementation README,\n",
"- run the script `convert_tf_checkpoint_to_pytorch.py` as indicated in the `README` to convert the pre-trained TensorFlow model to PyTorch.\n",
"\n",
"If needed, change the relative paths set in this notebook (at the beginning of Sections 1 and 2) to point to the relevant models and code."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:26.999106Z",
"start_time": "2018-11-16T10:02:26.985709Z"
}
},
"outputs": [],
"source": [
"import os\n",
"os.chdir('../')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1/ TensorFlow code"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:27.664528Z",
"start_time": "2018-11-16T10:02:27.651019Z"
}
},
"outputs": [],
"source": [
"original_tf_inplem_dir = \"./tensorflow_code/\"\n",
"model_dir = \"../google_models/uncased_L-12_H-768_A-12/\"\n",
"\n",
"vocab_file = model_dir + \"vocab.txt\"\n",
"bert_config_file = model_dir + \"bert_config.json\"\n",
"init_checkpoint = model_dir + \"bert_model.ckpt\"\n",
"\n",
"input_file = \"./samples/input.txt\"\n",
"max_seq_length = 128\n",
"max_predictions_per_seq = 20\n",
"\n",
"masked_lm_positions = [6]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:30.202182Z",
"start_time": "2018-11-16T10:02:28.112570Z"
}
},
"outputs": [],
"source": [
"import importlib.util\n",
"import sys\n",
"import tensorflow as tf\n",
"import pytorch_pretrained_bert as ppb\n",
"\n",
"def del_all_flags(FLAGS):\n",
" flags_dict = FLAGS._flags()\n",
" keys_list = list(flags_dict)\n",
" for key in keys_list:\n",
" FLAGS.__delattr__(key)\n",
"\n",
"del_all_flags(tf.flags.FLAGS)\n",
"import tensorflow_code.extract_features as ef\n",
"del_all_flags(tf.flags.FLAGS)\n",
"import tensorflow_code.modeling as tfm\n",
"del_all_flags(tf.flags.FLAGS)\n",
"import tensorflow_code.tokenization as tft\n",
"del_all_flags(tf.flags.FLAGS)\n",
"import tensorflow_code.run_pretraining as rp\n",
"del_all_flags(tf.flags.FLAGS)\n",
"import tensorflow_code.create_pretraining_data as cpp"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:30.238027Z",
"start_time": "2018-11-16T10:02:30.204943Z"
},
"code_folding": [
15
]
},
"outputs": [],
"source": [
"import re\n",
"class InputExample(object):\n",
" \"\"\"A single instance example.\"\"\"\n",
"\n",
" def __init__(self, tokens, segment_ids, masked_lm_positions,\n",
" masked_lm_labels, is_random_next):\n",
" self.tokens = tokens\n",
" self.segment_ids = segment_ids\n",
" self.masked_lm_positions = masked_lm_positions\n",
" self.masked_lm_labels = masked_lm_labels\n",
" self.is_random_next = is_random_next\n",
" def __repr__(self):\n",
" return '\\n'.join(k + \":\" + str(v) for k, v in self.__dict__.items())\n",
"\n",
"\n",
"def read_examples(input_file, tokenizer, masked_lm_positions):\n",
" \"\"\"Read a list of `InputExample`s from an input file.\"\"\"\n",
" examples = []\n",
" unique_id = 0\n",
" with tf.gfile.GFile(input_file, \"r\") as reader:\n",
" while True:\n",
" line = reader.readline()  # TF original: tokenization.convert_to_unicode(reader.readline())\n",
" if not line:\n",
" break\n",
" line = line.strip()\n",
" text_a = None\n",
" text_b = None\n",
" m = re.match(r\"^(.*) \\|\\|\\| (.*)$\", line)\n",
" if m is None:\n",
" text_a = line\n",
" else:\n",
" text_a = m.group(1)\n",
" text_b = m.group(2)\n",
" tokens_a = tokenizer.tokenize(text_a)\n",
" tokens_b = []\n",
" if text_b:\n",
" tokens_b = tokenizer.tokenize(text_b)\n",
" tokens = tokens_a + tokens_b\n",
" masked_lm_labels = []\n",
" for m_pos in masked_lm_positions:\n",
" masked_lm_labels.append(tokens[m_pos])\n",
" tokens[m_pos] = '[MASK]'\n",
" examples.append(\n",
" InputExample(\n",
" tokens = tokens,\n",
" segment_ids = [0] * len(tokens_a) + [1] * len(tokens_b),\n",
" masked_lm_positions = masked_lm_positions,\n",
" masked_lm_labels = masked_lm_labels,\n",
" is_random_next = False))\n",
" unique_id += 1\n",
" return examples"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:30.304018Z",
"start_time": "2018-11-16T10:02:30.240189Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tokens:['who', 'was', 'jim', 'henson', '?', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer']\n",
"segment_ids:[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]\n",
"masked_lm_positions:[6]\n",
"masked_lm_labels:['henson']\n",
"is_random_next:False\n"
]
}
],
"source": [
"bert_config = tfm.BertConfig.from_json_file(bert_config_file)\n",
"tokenizer = ppb.BertTokenizer(\n",
" vocab_file=vocab_file, do_lower_case=True)\n",
"examples = read_examples(input_file, tokenizer, masked_lm_positions=masked_lm_positions)\n",
"\n",
"print(examples[0])"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:33.324167Z",
"start_time": "2018-11-16T10:02:33.291909Z"
},
"code_folding": [
16
]
},
"outputs": [],
"source": [
"class InputFeatures(object):\n",
" \"\"\"A single set of features of data.\"\"\"\n",
"\n",
" def __init__(self, input_ids, input_mask, segment_ids, masked_lm_positions,\n",
" masked_lm_ids, masked_lm_weights, next_sentence_label):\n",
" self.input_ids = input_ids\n",
" self.input_mask = input_mask\n",
" self.segment_ids = segment_ids\n",
" self.masked_lm_positions = masked_lm_positions\n",
" self.masked_lm_ids = masked_lm_ids\n",
" self.masked_lm_weights = masked_lm_weights\n",
" self.next_sentence_labels = next_sentence_label\n",
"\n",
" def __repr__(self):\n",
" return '\\n'.join(k + \":\" + str(v) for k, v in self.__dict__.items())\n",
"\n",
"def pretraining_convert_examples_to_features(instances, tokenizer, max_seq_length,\n",
" max_predictions_per_seq):\n",
" \"\"\"Convert `InputExample`s into padded `InputFeatures`.\"\"\"\n",
" features = []\n",
" for (inst_index, instance) in enumerate(instances):\n",
" input_ids = tokenizer.convert_tokens_to_ids(instance.tokens)\n",
" input_mask = [1] * len(input_ids)\n",
" segment_ids = list(instance.segment_ids)\n",
" assert len(input_ids) <= max_seq_length\n",
"\n",
" while len(input_ids) < max_seq_length:\n",
" input_ids.append(0)\n",
" input_mask.append(0)\n",
" segment_ids.append(0)\n",
"\n",
" assert len(input_ids) == max_seq_length\n",
" assert len(input_mask) == max_seq_length\n",
" assert len(segment_ids) == max_seq_length\n",
"\n",
" masked_lm_positions = list(instance.masked_lm_positions)\n",
" masked_lm_ids = tokenizer.convert_tokens_to_ids(instance.masked_lm_labels)\n",
" masked_lm_weights = [1.0] * len(masked_lm_ids)\n",
"\n",
" while len(masked_lm_positions) < max_predictions_per_seq:\n",
" masked_lm_positions.append(0)\n",
" masked_lm_ids.append(0)\n",
" masked_lm_weights.append(0.0)\n",
"\n",
" next_sentence_label = 1 if instance.is_random_next else 0\n",
"\n",
" features.append(\n",
" InputFeatures(input_ids, input_mask, segment_ids,\n",
" masked_lm_positions, masked_lm_ids,\n",
" masked_lm_weights, next_sentence_label))\n",
"\n",
" if inst_index < 5:\n",
" tf.logging.info(\"*** Example ***\")\n",
" tf.logging.info(\"tokens: %s\" % \" \".join(\n",
" [str(x) for x in instance.tokens]))\n",
" tf.logging.info(\"features: %s\" % str(features[-1]))\n",
" return features"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:34.185367Z",
"start_time": "2018-11-16T10:02:34.155046Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:*** Example ***\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:34 - INFO - tensorflow - *** Example ***\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:tokens: who was jim henson ? jim [MASK] was a puppet ##eer\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:34 - INFO - tensorflow - tokens: who was jim henson ? jim [MASK] was a puppet ##eer\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:features: input_ids:[2040, 2001, 3958, 27227, 1029, 3958, 103, 2001, 1037, 13997, 11510, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"input_mask:[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"segment_ids:[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"masked_lm_positions:[6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"masked_lm_ids:[27227, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"masked_lm_weights:[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]\n",
"next_sentence_labels:0\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:34 - INFO - tensorflow - features: input_ids:[2040, 2001, 3958, 27227, 1029, 3958, 103, 2001, 1037, 13997, 11510, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"input_mask:[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"segment_ids:[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"masked_lm_positions:[6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"masked_lm_ids:[27227, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n",
"masked_lm_weights:[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]\n",
"next_sentence_labels:0\n"
]
}
],
"source": [
"features = pretraining_convert_examples_to_features(\n",
"    instances=examples, max_seq_length=max_seq_length,\n",
"    max_predictions_per_seq=max_predictions_per_seq, tokenizer=tokenizer)"
]
},
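{
"cell_type": "markdown",
"metadata": {},
"source": [
"The conversion above pads every example to fixed lengths. `input_ids`, `input_mask` and `segment_ids` are right-padded with `0` up to `max_seq_length`, while `masked_lm_positions`, `masked_lm_ids` and `masked_lm_weights` are padded up to `max_predictions_per_seq`. Padded prediction slots get a weight of `0.0`, so they do not contribute to the masked LM loss or metrics.\n",
"\n",
"The next cell is a minimal, framework-free sketch of that padding logic. `pad_right` is a hypothetical helper (not part of this repository), and the sizes 16 and 4 are illustrative values, not the `max_seq_length` and `max_predictions_per_seq` used in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def pad_right(values, length, pad_value):\n",
"    # Hypothetical helper: right-pad a list to `length` with `pad_value`.\n",
"    assert len(values) <= length\n",
"    return list(values) + [pad_value] * (length - len(values))\n",
"\n",
"# Token ids of the example printed above (11 real tokens).\n",
"ids = [2040, 2001, 3958, 27227, 1029, 3958, 103, 2001, 1037, 13997, 11510]\n",
"seq_len, max_preds = 16, 4  # illustrative sizes only\n",
"\n",
"input_ids = pad_right(ids, seq_len, 0)\n",
"input_mask = pad_right([1] * len(ids), seq_len, 0)  # 1 = real token, 0 = padding\n",
"masked_lm_positions = pad_right([6], max_preds, 0)\n",
"masked_lm_ids = pad_right([27227], max_preds, 0)\n",
"masked_lm_weights = pad_right([1.0], max_preds, 0.0)  # padded slots weigh 0.0\n",
"\n",
"print(input_ids)\n",
"print(input_mask)\n",
"print(masked_lm_weights)"
]
},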
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:34.912005Z",
"start_time": "2018-11-16T10:02:34.882111Z"
}
},
"outputs": [],
"source": [
"def input_fn_builder(features, seq_length, max_predictions_per_seq, tokenizer):\n",
" \"\"\"Creates an `input_fn` closure to be passed to TPUEstimator.\"\"\"\n",
"\n",
" all_input_ids = []\n",
" all_input_mask = []\n",
" all_segment_ids = []\n",
" all_masked_lm_positions = []\n",
" all_masked_lm_ids = []\n",
" all_masked_lm_weights = []\n",
" all_next_sentence_labels = []\n",
"\n",
" for feature in features:\n",
" all_input_ids.append(feature.input_ids)\n",
" all_input_mask.append(feature.input_mask)\n",
" all_segment_ids.append(feature.segment_ids)\n",
" all_masked_lm_positions.append(feature.masked_lm_positions)\n",
" all_masked_lm_ids.append(feature.masked_lm_ids)\n",
" all_masked_lm_weights.append(feature.masked_lm_weights)\n",
" all_next_sentence_labels.append(feature.next_sentence_labels)\n",
"\n",
" def input_fn(params):\n",
" \"\"\"The actual input function.\"\"\"\n",
" batch_size = params[\"batch_size\"]\n",
"\n",
" num_examples = len(features)\n",
"\n",
" # This is for demo purposes and does NOT scale to large data sets. We do\n",
" # not use Dataset.from_generator() because that uses tf.py_func which is\n",
" # not TPU compatible. The right way to load data is with TFRecordReader.\n",
" d = tf.data.Dataset.from_tensor_slices({\n",
" \"input_ids\":\n",
" tf.constant(\n",
" all_input_ids, shape=[num_examples, seq_length],\n",
" dtype=tf.int32),\n",
" \"input_mask\":\n",
" tf.constant(\n",
" all_input_mask,\n",
" shape=[num_examples, seq_length],\n",
" dtype=tf.int32),\n",
" \"segment_ids\":\n",
" tf.constant(\n",
" all_segment_ids,\n",
" shape=[num_examples, seq_length],\n",
" dtype=tf.int32),\n",
" \"masked_lm_positions\":\n",
" tf.constant(\n",
" all_masked_lm_positions,\n",
" shape=[num_examples, max_predictions_per_seq],\n",
" dtype=tf.int32),\n",
" \"masked_lm_ids\":\n",
" tf.constant(\n",
" all_masked_lm_ids,\n",
" shape=[num_examples, max_predictions_per_seq],\n",
" dtype=tf.int32),\n",
" \"masked_lm_weights\":\n",
" tf.constant(\n",
" all_masked_lm_weights,\n",
" shape=[num_examples, max_predictions_per_seq],\n",
" dtype=tf.float32),\n",
" \"next_sentence_labels\":\n",
" tf.constant(\n",
" all_next_sentence_labels,\n",
" shape=[num_examples, 1],\n",
" dtype=tf.int32),\n",
" })\n",
"\n",
" d = d.batch(batch_size=batch_size, drop_remainder=False)\n",
" return d\n",
"\n",
" return input_fn\n"
]
},
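{
"cell_type": "markdown",
"metadata": {},
"source": [
"`input_fn` stacks the per-example lists into one constant tensor per feature: shape `[num_examples, seq_length]` for the token-level features, `[num_examples, max_predictions_per_seq]` for the masked LM features and `[num_examples, 1]` for `next_sentence_labels`. It then batches them with `drop_remainder=False`, so the final batch may hold fewer than `batch_size` examples.\n",
"\n",
"The next cell is a rough NumPy sketch of the same shaping and batching. It assumes NumPy is installed and uses made-up sizes and data; it does not reproduce `tf.data` itself."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"num_examples, seq_length, batch_size = 5, 8, 2  # made-up sizes\n",
"\n",
"# One row of padded token ids per example, as produced by the conversion step.\n",
"all_input_ids = [[i] * seq_length for i in range(num_examples)]\n",
"all_next_sentence_labels = [0, 1, 0, 0, 1]\n",
"\n",
"# Same target shapes as the tf.constant calls in input_fn.\n",
"input_ids = np.asarray(all_input_ids, dtype=np.int32).reshape(num_examples, seq_length)\n",
"next_sentence_labels = np.asarray(all_next_sentence_labels, dtype=np.int32).reshape(num_examples, 1)\n",
"\n",
"# Rough equivalent of d.batch(batch_size, drop_remainder=False): the short final batch is kept.\n",
"batches = [input_ids[start:start + batch_size] for start in range(0, num_examples, batch_size)]\n",
"print(next_sentence_labels.shape)\n",
"print([b.shape for b in batches])"
]
},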
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:35.671603Z",
"start_time": "2018-11-16T10:02:35.626167Z"
},
"code_folding": [
64,
77
]
},
"outputs": [],
"source": [
"def model_fn_builder(bert_config, init_checkpoint, learning_rate,\n",
" num_train_steps, num_warmup_steps, use_tpu,\n",
" use_one_hot_embeddings):\n",
" \"\"\"Returns `model_fn` closure for TPUEstimator.\"\"\"\n",
"\n",
" def model_fn(features, labels, mode, params): # pylint: disable=unused-argument\n",
" \"\"\"The `model_fn` for TPUEstimator.\"\"\"\n",
"\n",
" tf.logging.info(\"*** Features ***\")\n",
" for name in sorted(features.keys()):\n",
" tf.logging.info(\" name = %s, shape = %s\" % (name, features[name].shape))\n",
"\n",
" input_ids = features[\"input_ids\"]\n",
" input_mask = features[\"input_mask\"]\n",
" segment_ids = features[\"segment_ids\"]\n",
" masked_lm_positions = features[\"masked_lm_positions\"]\n",
" masked_lm_ids = features[\"masked_lm_ids\"]\n",
" masked_lm_weights = features[\"masked_lm_weights\"]\n",
" next_sentence_labels = features[\"next_sentence_labels\"]\n",
"\n",
" is_training = (mode == tf.estimator.ModeKeys.TRAIN)\n",
"\n",
" model = tfm.BertModel(\n",
" config=bert_config,\n",
" is_training=is_training,\n",
" input_ids=input_ids,\n",
" input_mask=input_mask,\n",
" token_type_ids=segment_ids,\n",
" use_one_hot_embeddings=use_one_hot_embeddings)\n",
"\n",
" (masked_lm_loss,\n",
" masked_lm_example_loss, masked_lm_log_probs) = rp.get_masked_lm_output(\n",
" bert_config, model.get_sequence_output(), model.get_embedding_table(),\n",
" masked_lm_positions, masked_lm_ids, masked_lm_weights)\n",
"\n",
" (next_sentence_loss, next_sentence_example_loss,\n",
" next_sentence_log_probs) = rp.get_next_sentence_output(\n",
" bert_config, model.get_pooled_output(), next_sentence_labels)\n",
"\n",
" total_loss = masked_lm_loss + next_sentence_loss\n",
"\n",
" tvars = tf.trainable_variables()\n",
"\n",
" initialized_variable_names = {}\n",
" scaffold_fn = None\n",
" if init_checkpoint:\n",
" (assignment_map,\n",
" initialized_variable_names) = tfm.get_assigment_map_from_checkpoint(\n",
" tvars, init_checkpoint)\n",
" if use_tpu:\n",
"\n",
" def tpu_scaffold():\n",
" tf.train.init_from_checkpoint(init_checkpoint, assignment_map)\n",
" return tf.train.Scaffold()\n",
"\n",
" scaffold_fn = tpu_scaffold\n",
" else:\n",
" tf.train.init_from_checkpoint(init_checkpoint, assignment_map)\n",
"\n",
" tf.logging.info(\"**** Trainable Variables ****\")\n",
" for var in tvars:\n",
" init_string = \"\"\n",
" if var.name in initialized_variable_names:\n",
" init_string = \", *INIT_FROM_CKPT*\"\n",
" tf.logging.info(\" name = %s, shape = %s%s\", var.name, var.shape,\n",
" init_string)\n",
"\n",
" output_spec = None\n",
" if mode == tf.estimator.ModeKeys.TRAIN:\n",
" masked_lm_positions = features[\"masked_lm_positions\"]\n",
" masked_lm_ids = features[\"masked_lm_ids\"]\n",
" masked_lm_weights = features[\"masked_lm_weights\"]\n",
" next_sentence_labels = features[\"next_sentence_labels\"]\n",
" train_op = optimization.create_optimizer(\n",
" total_loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu)\n",
"\n",
" output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n",
" mode=mode,\n",
" loss=total_loss,\n",
" train_op=train_op,\n",
" scaffold_fn=scaffold_fn)\n",
" elif mode == tf.estimator.ModeKeys.EVAL:\n",
" masked_lm_positions = features[\"masked_lm_positions\"]\n",
" masked_lm_ids = features[\"masked_lm_ids\"]\n",
" masked_lm_weights = features[\"masked_lm_weights\"]\n",
" next_sentence_labels = features[\"next_sentence_labels\"]\n",
"\n",
" def metric_fn(masked_lm_example_loss, masked_lm_log_probs, masked_lm_ids,\n",
" masked_lm_weights, next_sentence_example_loss,\n",
" next_sentence_log_probs, next_sentence_labels):\n",
" \"\"\"Computes the loss and accuracy of the model.\"\"\"\n",
" masked_lm_log_probs = tf.reshape(masked_lm_log_probs,\n",
" [-1, masked_lm_log_probs.shape[-1]])\n",
" masked_lm_predictions = tf.argmax(\n",
" masked_lm_log_probs, axis=-1, output_type=tf.int32)\n",
" masked_lm_example_loss = tf.reshape(masked_lm_example_loss, [-1])\n",
" masked_lm_ids = tf.reshape(masked_lm_ids, [-1])\n",
" masked_lm_weights = tf.reshape(masked_lm_weights, [-1])\n",
" masked_lm_accuracy = tf.metrics.accuracy(\n",
" labels=masked_lm_ids,\n",
" predictions=masked_lm_predictions,\n",
" weights=masked_lm_weights)\n",
" masked_lm_mean_loss = tf.metrics.mean(\n",
" values=masked_lm_example_loss, weights=masked_lm_weights)\n",
"\n",
" next_sentence_log_probs = tf.reshape(\n",
" next_sentence_log_probs, [-1, next_sentence_log_probs.shape[-1]])\n",
" next_sentence_predictions = tf.argmax(\n",
" next_sentence_log_probs, axis=-1, output_type=tf.int32)\n",
" next_sentence_labels = tf.reshape(next_sentence_labels, [-1])\n",
" next_sentence_accuracy = tf.metrics.accuracy(\n",
" labels=next_sentence_labels, predictions=next_sentence_predictions)\n",
" next_sentence_mean_loss = tf.metrics.mean(\n",
" values=next_sentence_example_loss)\n",
"\n",
" return {\n",
" \"masked_lm_accuracy\": masked_lm_accuracy,\n",
" \"masked_lm_loss\": masked_lm_mean_loss,\n",
" \"next_sentence_accuracy\": next_sentence_accuracy,\n",
" \"next_sentence_loss\": next_sentence_mean_loss,\n",
" }\n",
"\n",
" eval_metrics = (metric_fn, [\n",
" masked_lm_example_loss, masked_lm_log_probs, masked_lm_ids,\n",
" masked_lm_weights, next_sentence_example_loss,\n",
" next_sentence_log_probs, next_sentence_labels\n",
" ])\n",
" output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n",
" mode=mode,\n",
" loss=total_loss,\n",
" eval_metrics=eval_metrics,\n",
" scaffold_fn=scaffold_fn)\n",
" elif mode == tf.estimator.ModeKeys.PREDICT:\n",
" masked_lm_log_probs = tf.reshape(masked_lm_log_probs,\n",
" [-1, masked_lm_log_probs.shape[-1]])\n",
" masked_lm_predictions = tf.argmax(\n",
" masked_lm_log_probs, axis=-1, output_type=tf.int32)\n",
"\n",
" next_sentence_log_probs = tf.reshape(\n",
" next_sentence_log_probs, [-1, next_sentence_log_probs.shape[-1]])\n",
" next_sentence_predictions = tf.argmax(\n",
" next_sentence_log_probs, axis=-1, output_type=tf.int32)\n",
"\n",
" masked_lm_predictions = tf.reshape(masked_lm_predictions,\n",
" [1, masked_lm_positions.shape[-1]])\n",
" next_sentence_predictions = tf.reshape(next_sentence_predictions,\n",
" [1, 1])\n",
"\n",
" predictions = {\n",
" \"masked_lm_predictions\": masked_lm_predictions,\n",
" \"next_sentence_predictions\": next_sentence_predictions\n",
" }\n",
"\n",
" output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n",
" mode=mode, predictions=predictions, scaffold_fn=scaffold_fn)\n",
" return output_spec\n",
" else:\n",
" raise ValueError(\"Only TRAIN, EVAL and PREDICT modes are supported: %s\" % (mode))\n",
"\n",
" return output_spec\n",
"\n",
" return model_fn"
]
},
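{
"cell_type": "markdown",
"metadata": {},
"source": [
"In `metric_fn`, padded prediction slots are excluded by passing `masked_lm_weights` as weights: `tf.metrics.accuracy` and `tf.metrics.mean` then compute weighted averages, i.e. the sum of `weight * value` divided by the sum of the weights.\n",
"\n",
"The next cell reproduces that arithmetic for a single batch with invented log-probabilities. It is a NumPy sketch that assumes NumPy is available and ignores the streaming (update-op) behaviour of the TensorFlow metrics."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Invented log-probabilities for 4 prediction slots over a 3-token vocabulary.\n",
"log_probs = np.log(np.array([\n",
"    [0.7, 0.2, 0.1],\n",
"    [0.1, 0.8, 0.1],\n",
"    [0.3, 0.3, 0.4],\n",
"    [0.5, 0.25, 0.25],\n",
"]))\n",
"labels = np.array([0, 2, 2, 0])           # gold ids (last slot is padding)\n",
"weights = np.array([1.0, 1.0, 1.0, 0.0])  # 0.0 marks a padded slot\n",
"\n",
"# Per-slot negative log-likelihood of the gold id.\n",
"example_loss = -log_probs[np.arange(len(labels)), labels]\n",
"\n",
"predictions = np.argmax(log_probs, axis=-1)\n",
"accuracy = np.sum(weights * (predictions == labels)) / np.sum(weights)\n",
"mean_loss = np.sum(weights * example_loss) / np.sum(weights)\n",
"print(predictions, accuracy, mean_loss)"
]
},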
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:40.328700Z",
"start_time": "2018-11-16T10:02:36.289676Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x12a864ae8>) includes params argument, but params are not passed to Estimator.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - WARNING - tensorflow - Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x12a864ae8>) includes params argument, but params are not passed to Estimator.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:Using temporary folder as model directory: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - WARNING - tensorflow - Using temporary folder as model directory: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:Using config: {'_model_dir': '/var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true\n",
"graph_options {\n",
" rewrite_options {\n",
" meta_optimizer_iterations: ONE\n",
" }\n",
"}\n",
", '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x12dbb5ac8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=1, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - Using config: {'_model_dir': '/var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true\n",
"graph_options {\n",
" rewrite_options {\n",
" meta_optimizer_iterations: ONE\n",
" }\n",
"}\n",
", '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x12dbb5ac8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=1, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:Setting TPUConfig.num_shards==1 is an unsupported behavior. Please fix as soon as possible (leaving num_shards as None.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - WARNING - tensorflow - Setting TPUConfig.num_shards==1 is an unsupported behavior. Please fix as soon as possible (leaving num_shards as None.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:_TPUContext: eval_on_tpu True\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - _TPUContext: eval_on_tpu True\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - WARNING - tensorflow - eval_on_tpu ignored because use_tpu is False.\n"
]
}
],
"source": [
"is_per_host = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2\n",
"run_config = tf.contrib.tpu.RunConfig(\n",
" master=None,\n",
" tpu_config=tf.contrib.tpu.TPUConfig(\n",
" num_shards=1,\n",
" per_host_input_for_training=is_per_host))\n",
"\n",
"model_fn = model_fn_builder(\n",
" bert_config=bert_config,\n",
" init_checkpoint=init_checkpoint,\n",
" learning_rate=0,\n",
" num_train_steps=1,\n",
" num_warmup_steps=1,\n",
" use_tpu=False,\n",
" use_one_hot_embeddings=False)\n",
"\n",
"# If TPU is not available, this will fall back to normal Estimator on CPU\n",
"# or GPU.\n",
"estimator = tf.contrib.tpu.TPUEstimator(\n",
" use_tpu=False,\n",
" model_fn=model_fn,\n",
" config=run_config,\n",
" predict_batch_size=1)\n",
"\n",
"input_fn = input_fn_builder(\n",
"    features=features, seq_length=max_seq_length,\n",
"    max_predictions_per_seq=max_predictions_per_seq,\n",
"    tokenizer=tokenizer)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:46.596956Z",
"start_time": "2018-11-16T10:02:40.331008Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:Could not find trained model in model_dir: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d, running initialization to predict.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - Could not find trained model in model_dir: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d, running initialization to predict.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:Calling model_fn.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - Calling model_fn.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:Running infer on CPU\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - Running infer on CPU\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:*** Features ***\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - *** Features ***\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = input_ids, shape = (?, 128)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - name = input_ids, shape = (?, 128)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = input_mask, shape = (?, 128)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - name = input_mask, shape = (?, 128)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = masked_lm_ids, shape = (?, 20)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - name = masked_lm_ids, shape = (?, 20)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = masked_lm_positions, shape = (?, 20)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - name = masked_lm_positions, shape = (?, 20)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = masked_lm_weights, shape = (?, 20)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - name = masked_lm_weights, shape = (?, 20)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = next_sentence_labels, shape = (?, 1)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - name = next_sentence_labels, shape = (?, 1)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = segment_ids, shape = (?, 128)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:40 - INFO - tensorflow - name = segment_ids, shape = (?, 128)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:**** Trainable Variables ****\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - **** Trainable Variables ****\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/embeddings/word_embeddings:0, shape = (30522, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/embeddings/word_embeddings:0, shape = (30522, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/embeddings/position_embeddings:0, shape = (512, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/embeddings/position_embeddings:0, shape = (512, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_0/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_0/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_1/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_1/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_2/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_2/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_3/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_3/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_4/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_4/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_5/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_5/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_6/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_6/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_7/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_7/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_8/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_8/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_9/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_9/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_10/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/encoder/layer_11/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/pooler/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/pooler/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = bert/pooler/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = bert/pooler/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = cls/predictions/transform/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/transform/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = cls/predictions/transform/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/transform/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = cls/predictions/transform/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/transform/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = cls/predictions/transform/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/transform/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = cls/predictions/output_bias:0, shape = (30522,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/output_bias:0, shape = (30522,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = cls/seq_relationship/output_weights:0, shape = (2, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = cls/seq_relationship/output_weights:0, shape = (2, 768), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow: name = cls/seq_relationship/output_bias:0, shape = (2,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - name = cls/seq_relationship/output_bias:0, shape = (2,), *INIT_FROM_CKPT*\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:Done calling model_fn.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:43 - INFO - tensorflow - Done calling model_fn.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:Graph was finalized.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:44 - INFO - tensorflow - Graph was finalized.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:Running local_init_op.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:45 - INFO - tensorflow - Running local_init_op.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:Done running local_init_op.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:45 - INFO - tensorflow - Done running local_init_op.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:prediction_loop marked as finished\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:46 - INFO - tensorflow - prediction_loop marked as finished\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:prediction_loop marked as finished\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:02:46 - INFO - tensorflow - prediction_loop marked as finished\n"
]
}
],
"source": [
"tensorflow_all_out = []\n",
"for result in estimator.predict(input_fn, yield_single_examples=True):\n",
" tensorflow_all_out.append(result)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:46.634304Z",
"start_time": "2018-11-16T10:02:46.598800Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"2\n",
"dict_keys(['masked_lm_predictions', 'next_sentence_predictions'])\n",
"masked_lm_predictions [27227 1010 1010 1010 1010 1010 1010 1010 1010 1010 1010 1010\n",
" 1010 1010 1010 1010 1010 1010 1010 1010]\n",
"predicted token ['henson', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',']\n"
]
}
],
"source": [
"print(len(tensorflow_all_out))\n",
"print(len(tensorflow_all_out[0]))\n",
"print(tensorflow_all_out[0].keys())\n",
"print(\"masked_lm_predictions\", tensorflow_all_out[0]['masked_lm_predictions'])\n",
"print(\"predicted token\", tokenizer.convert_ids_to_tokens(tensorflow_all_out[0]['masked_lm_predictions']))"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:02:46.671229Z",
"start_time": "2018-11-16T10:02:46.637102Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensorflow_output: ['henson']\n"
]
}
],
"source": [
"tensorflow_outputs = tokenizer.convert_ids_to_tokens(tensorflow_all_out[0]['masked_lm_predictions'])[:len(masked_lm_positions)]\n",
"print(\"tensorflow_output:\", tensorflow_outputs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2/ PyTorch code"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:03:03.556557Z",
"start_time": "2018-11-16T10:03:03.519654Z"
}
},
"outputs": [],
"source": [
"from examples import extract_features\n",
"from examples.extract_features import *"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:03:03.952710Z",
"start_time": "2018-11-16T10:03:03.921917Z"
}
},
"outputs": [],
"source": [
"init_checkpoint_pt = \"../google_models/uncased_L-12_H-768_A-12/pytorch_model.bin\""
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:03:12.307673Z",
"start_time": "2018-11-16T10:03:04.439317Z"
},
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"11/16/2018 11:03:05 - INFO - pytorch_pretrained_bert.modeling - loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz from cache at /Users/thomaswolf/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba\n",
"11/16/2018 11:03:05 - INFO - pytorch_pretrained_bert.modeling - extracting archive file /Users/thomaswolf/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba to temp dir /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmpaqgsm566\n",
"11/16/2018 11:03:08 - INFO - pytorch_pretrained_bert.modeling - Model config {\n",
" \"attention_probs_dropout_prob\": 0.1,\n",
" \"hidden_act\": \"gelu\",\n",
" \"hidden_dropout_prob\": 0.1,\n",
" \"hidden_size\": 768,\n",
" \"initializer_range\": 0.02,\n",
" \"intermediate_size\": 3072,\n",
" \"max_position_embeddings\": 512,\n",
" \"num_attention_heads\": 12,\n",
" \"num_hidden_layers\": 12,\n",
" \"type_vocab_size\": 2,\n",
" \"vocab_size\": 30522\n",
"}\n",
"\n"
]
},
{
"data": {
"text/plain": [
"BertForPreTraining(\n",
" (bert): BertModel(\n",
" (embeddings): BertEmbeddings(\n",
" (word_embeddings): Embedding(30522, 768)\n",
" (position_embeddings): Embedding(512, 768)\n",
" (token_type_embeddings): Embedding(2, 768)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" (encoder): BertEncoder(\n",
" (layer): ModuleList(\n",
" (0): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (1): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (2): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (3): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (4): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (5): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (6): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (7): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (8): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (9): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (10): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (11): BertLayer(\n",
" (attention): BertAttention(\n",
" (self): BertSelfAttention(\n",
" (query): Linear(in_features=768, out_features=768, bias=True)\n",
" (key): Linear(in_features=768, out_features=768, bias=True)\n",
" (value): Linear(in_features=768, out_features=768, bias=True)\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" (output): BertSelfOutput(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" (intermediate): BertIntermediate(\n",
" (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
" )\n",
" (output): BertOutput(\n",
" (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" (dropout): Dropout(p=0.1)\n",
" )\n",
" )\n",
" )\n",
" )\n",
" (pooler): BertPooler(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (activation): Tanh()\n",
" )\n",
" )\n",
" (cls): BertPreTrainingHeads(\n",
" (predictions): BertLMPredictionHead(\n",
" (transform): BertPredictionHeadTransform(\n",
" (dense): Linear(in_features=768, out_features=768, bias=True)\n",
" (LayerNorm): BertLayerNorm()\n",
" )\n",
" (decoder): Linear(in_features=768, out_features=30522, bias=False)\n",
" )\n",
" (seq_relationship): Linear(in_features=768, out_features=2, bias=True)\n",
" )\n",
")"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"device = torch.device(\"cpu\")\n",
"model = ppb.BertForPreTraining.from_pretrained('bert-base-uncased')\n",
"model.to(device)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:03:12.351625Z",
"start_time": "2018-11-16T10:03:12.310736Z"
},
"code_folding": []
},
"outputs": [],
"source": [
"all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)\n",
"all_input_mask = torch.tensor([f.input_mask for f in features], dtype=torch.long)\n",
"all_segment_ids = torch.tensor([f.segment_ids for f in features], dtype=torch.long)\n",
"all_masked_lm_positions = torch.tensor([f.masked_lm_positions for f in features], dtype=torch.long)\n",
"\n",
"eval_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_masked_lm_positions)\n",
"eval_sampler = SequentialSampler(eval_data)\n",
"eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=1)\n",
"\n",
"# Switch to inference mode (disables dropout); the trailing semicolon suppresses the model repr\n",
"model.eval();"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:03:12.792741Z",
"start_time": "2018-11-16T10:03:12.354253Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[ 2040, 2001, 3958, 27227, 1029, 3958, 103, 2001, 1037, 13997,\n",
" 11510, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0]])\n",
"tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0]])\n",
"tensor([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0]])\n",
"(1, 20, 30522)\n",
"[27227, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010]\n"
]
}
],
"source": [
"import numpy as np\n",
"pytorch_all_out = []\n",
"for input_ids, input_mask, segment_ids, tensor_masked_lm_positions in eval_dataloader:\n",
" print(input_ids)\n",
" print(input_mask)\n",
" print(segment_ids)\n",
" input_ids = input_ids.to(device)\n",
" input_mask = input_mask.to(device)\n",
" segment_ids = segment_ids.to(device)\n",
"\n",
" prediction_scores, _ = model(input_ids, token_type_ids=segment_ids, attention_mask=input_mask)\n",
" prediction_scores = prediction_scores[0, tensor_masked_lm_positions].detach().cpu().numpy()\n",
" print(prediction_scores.shape)\n",
" masked_lm_predictions = np.argmax(prediction_scores, axis=-1).squeeze().tolist()\n",
" print(masked_lm_predictions)\n",
" pytorch_all_out.append(masked_lm_predictions)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"ExecuteTime": {
"end_time": "2018-11-16T10:03:12.828439Z",
"start_time": "2018-11-16T10:03:12.795420Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pytorch_output: ['henson']\n",
"tensorflow_output: ['henson']\n"
]
}
],
"source": [
"pytorch_outputs = tokenizer.convert_ids_to_tokens(pytorch_all_out[0])[:len(masked_lm_positions)]\n",
"print(\"pytorch_output:\", pytorch_outputs)\n",
"print(\"tensorflow_output:\", tensorflow_outputs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"hide_input": false,
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
},
"toc": {
"colors": {
"hover_highlight": "#DAA520",
"running_highlight": "#FF0000",
"selected_highlight": "#FFD700"
},
"moveMenuLeft": true,
"nav_menu": {
"height": "48px",
"width": "252px"
},
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 4,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
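The notebook cells above gather the masked-LM logits at `masked_lm_positions` and take the argmax over the vocabulary. They keep only the first `len(masked_lm_positions)` predictions, because the position list is zero-padded to a fixed length (20 in the printed shape). The padded entries therefore point at position 0 and produce meaningless predictions (the repeated `1010` ids in the output).

Below is a minimal NumPy sketch of that post-processing. A random score array stands in for the model output, and `predict_masked_tokens` is a hypothetical helper, not part of the package.

```python
import numpy as np

def predict_masked_tokens(prediction_scores, masked_positions, num_real):
    """Return the argmax token ids at the first `num_real` masked positions.

    prediction_scores: (seq_len, vocab_size) logits for one example.
    masked_positions: (max_predictions,) int array, zero-padded at the end.
    num_real: number of genuine (non-padding) entries in masked_positions.
    """
    # Gather logits at every listed position, padding included:
    # shape (max_predictions, vocab_size).
    gathered = prediction_scores[masked_positions]
    # Most likely vocabulary id at each gathered position.
    predicted_ids = gathered.argmax(axis=-1)
    # Padded slots all point at position 0, so their predictions are dropped.
    return predicted_ids[:num_real].tolist()

# Toy example mirroring the notebook: 128 tokens, BERT-base vocabulary size,
# a single real [MASK] at position 6, positions padded to length 20.
rng = np.random.default_rng(0)
scores = rng.standard_normal((128, 30522)).astype(np.float32)
scores[6, 27227] = 100.0  # force a known answer at the masked slot
positions = np.zeros(20, dtype=np.int64)
positions[0] = 6
print(predict_masked_tokens(scores, positions, num_real=1))  # [27227]
```

Truncating by the real count, rather than filtering out position 0, matters because position 0 could in principle be a legitimate masked position.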
@@ -42,7 +42,7 @@ SCHEDULES = {
class BERTAdam(Optimizer):
"""Implements BERT version of Adam algorithm with weight decay fix (and no ).
"""Implements BERT version of Adam algorithm with weight decay fix.
Params:
lr: learning rate
warmup: portion of t_total for the warmup, -1 means no warmup. Default: -1
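The docstring defines `warmup` as the portion of `t_total` spent warming up, with -1 meaning no warmup. The actual schedule functions live in `SCHEDULES`, which this hunk does not show.

The following is therefore only a sketch under stated assumptions:

- The learning rate ramps linearly from 0 to `lr` over the first `warmup * t_total` steps.
- It stays constant afterwards.
- `scheduled_lr` is a hypothetical helper.

```python
def scheduled_lr(lr, step, t_total, warmup=-1.0):
    """Learning rate at `step` under an ASSUMED linear-warmup schedule.

    warmup: fraction of t_total used for warmup; -1 (or any value <= 0) disables it.
    After warmup this sketch keeps the rate constant; the real SCHEDULES
    entries may decay it instead.
    """
    if warmup <= 0 or t_total <= 0:
        return lr
    progress = step / t_total  # fraction of training completed
    if progress < warmup:
        return lr * progress / warmup  # linear ramp from 0 up to lr
    return lr

# 10% warmup over 1000 steps with a peak rate of 5e-5
print(scheduled_lr(5e-5, 50, 1000, warmup=0.1))   # about half of the peak rate
print(scheduled_lr(5e-5, 500, 1000, warmup=0.1))  # 5e-05
print(scheduled_lr(5e-5, 50, 1000))               # 5e-05 (warmup disabled)
```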
@@ -136,7 +136,7 @@ class BERTAdam(Optimizer):
# the correct way of using L2 regularization/weight decay with Adam,
# since that will interact with the m and v parameters in strange ways.
#
# Instead we want ot decay the weights in a manner that doesn't interact
# Instead we want to decay the weights in a manner that doesn't interact
# with the m/v parameters. This is equivalent to adding the square
# of the weights to the loss with plain (non-momentum) SGD.
if group['weight_decay_rate'] > 0.0:
@@ -154,6 +154,7 @@ class BERTAdam(Optimizer):
state['step'] += 1
# step_size = lr_scheduled * math.sqrt(bias_correction2) / bias_correction1
# No bias correction
# bias_correction1 = 1 - beta1 ** state['step']
# bias_correction2 = 1 - beta2 ** state['step']
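The comments in these hunks describe two departures from textbook Adam:

- Weight decay is applied directly to the parameters rather than added to the gradient, so it never passes through the `m`/`v` moment estimates.
- Bias correction is skipped.

Below is a NumPy sketch of one such update for a single parameter array. `weight_decay_rate` appears in the diff. The names `b1`, `b2` and `e` and their defaults are assumptions, and `adam_wd_step` is a hypothetical function, not the class's actual `step()` method.

```python
import numpy as np

def adam_wd_step(param, grad, m, v, lr=1e-3, b1=0.9, b2=0.999, e=1e-6,
                 weight_decay_rate=0.01):
    """One Adam update with decoupled weight decay and no bias correction.

    Updates param, m and v in place (float arrays) and returns param.
    Hyperparameter names other than weight_decay_rate are assumed.
    """
    # Moment estimates are built from the raw gradient only.
    m *= b1
    m += (1.0 - b1) * grad
    v *= b2
    v += (1.0 - b2) * grad * grad
    # No bias correction: m and v are used as-is.
    update = m / (np.sqrt(v) + e)
    # Decoupled weight decay: added to the update, never to grad, m or v.
    if weight_decay_rate > 0.0:
        update += weight_decay_rate * param
    param -= lr * update
    return param

# With a zero gradient only the decay term acts, and m/v stay untouched.
p, m, v = np.array([1.0, -2.0]), np.zeros(2), np.zeros(2)
adam_wd_step(p, np.zeros(2), m, v, lr=0.1, weight_decay_rate=0.5)
print(p)  # shrunk by a factor of (1 - 0.1 * 0.5), i.e. about [0.95, -1.9]
```

Keeping the decay out of `m` and `v` is what the code comment means by "the correct way": the decay shrinks every weight by the same factor `lr * weight_decay_rate` per step. It is not rescaled by the adaptive denominator.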