"# Comparing TensorFlow (original) and PyTorch models\n",
"\n",
"We use this small notebook to test the conversion of the model's weights and to make sure both the TensorFlow and PyTorch are coherent. In particular, we compare the weights of the last layer on a simple example (in `input.txt`).\n",
"\n",
"To run this notebook, please make sure that your Python environment has both TensorFlow and PyTorch.\n",
"You should follow the instructions in the `README.md` and make sure that you have:\n",
"- the original TensorFlow implementation\n",
"- the `BERT-base, Uncased` model\n",
"- run the script `convert_tf_checkpoint_to_pytorch.py` to convert the weights to PyTorch\n",
"\n",
"Please modify the relative paths accordingly (at the beggining of Sections 1 and 2)."
"/usr/local/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n",
" from ._conv import register_converters as _register_converters\n"
"WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x12b0bcc80>) includes params argument, but params are not passed to Estimator.\n",
"WARNING:tensorflow:Using temporary folder as model directory: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmpgpb5nz3u\n",
"WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x1289c1a60>) includes params argument, but params are not passed to Estimator.\n",
"WARNING:tensorflow:Using temporary folder as model directory: /var/folders/y2/py87pn6115bdsdftbc6394nh0000gn/T/tmpmcfk2tyr\n",
"WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.\n"
"INFO:tensorflow:Could not find trained model in model_dir: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmpgpb5nz3u, running initialization to predict.\n",
"INFO:tensorflow:Could not find trained model in model_dir: /var/folders/y2/py87pn6115bdsdftbc6394nh0000gn/T/tmpmcfk2tyr, running initialization to predict.\n",
This is a PyTorch implementation of the [TensorFlow code](https://github.com/google-research/bert) released by Google AI with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
## Converting the TensorFlow pre-trained models to PyTorch
You can convert the pre-trained weights released by Google AI by calling the script `convert_tf_checkpoint_to_pytorch.py`.
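As a quick sanity check after running the script, you can load the dumped PyTorch state dict and inspect the tensor names and shapes. This is only a sketch; `pytorch_model.bin` stands for whatever output path you passed to the conversion script:

```python
import torch

# "pytorch_model.bin" is illustrative: use the output path you gave the script.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# Print a few parameter names and shapes to confirm the conversion looks sane.
for name, tensor in list(state_dict.items())[:10]:
    print(f"{name:60s} {tuple(tensor.shape)}")
print("total tensors:", len(state_dict))
```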
We showcase the same examples as in the original implementation: fine-tuning on the MRPC classification corpus and the SQuAD question answering dataset.
You should see a result similar to the 88.5% reported in the paper for
`BERT-Base`.
If you have access to a Cloud TPU, you can also train with `BERT-Large`. A set of
hyperparameters (slightly different from the paper) consistently obtains around
90.5%-91.0% F1 single-system when trained only on SQuAD; the full listing is given
in the original TensorFlow repository.

## Comparing TensorFlow and PyTorch models
We also include [a small Notebook](https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/Comparing%20TF%20and%20PT%20models.ipynb) that we used to verify that the conversion of the weights to PyTorch is consistent with the original TensorFlow weights.
Please follow the instructions in the Notebook to run it.
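For intuition, the kind of weight-level check the Notebook automates can be sketched as follows. This is not the Notebook's code: the paths are placeholders and the TF-to-PyTorch name mapping is deliberately simplified (the real conversion also renames and transposes some tensors), so treat it purely as an illustration.

```python
import numpy as np
import tensorflow as tf
import torch

# Placeholder paths: point these at your TF checkpoint and the converted file.
tf_reader = tf.train.load_checkpoint("uncased_L-12_H-768_A-12/bert_model.ckpt")
pt_state = torch.load("pytorch_model.bin", map_location="cpu")

# Naive name mapping ("/" -> "."); the real conversion renames and transposes
# some tensors, so unmatched or mismatched entries are simply skipped here.
for tf_name, tf_shape in sorted(tf_reader.get_variable_to_shape_map().items()):
    pt_name = tf_name.replace("/", ".")
    if pt_name not in pt_state:
        continue
    pt_tensor = pt_state[pt_name].numpy()
    if tuple(pt_tensor.shape) != tuple(tf_shape):
        continue
    max_diff = np.abs(tf_reader.get_tensor(tf_name) - pt_tensor).max()
    print(f"{tf_name:60s} max abs diff = {max_diff:.3e}")
```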
If you want to use BERT with [Colab](https://colab.sandbox.google.com), you can
get started with the notebook
"[BERT FineTuning with Cloud TPUs](https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb)".
**At the time of this writing (October 31st, 2018), Colab users can access a
Cloud TPU completely for free.** Note: One per user, availability limited,
requires a Google Cloud Platform account with storage (although storage may be
purchased with free credit for signing up with GCP), and this capability may no
longer be available in the future. Click on the BERT Colab that was just linked
for more information.
## FAQ
#### Is this code compatible with Cloud TPUs? What about GPUs?
Yes, all of the code in this repository works out-of-the-box with CPU, GPU, and
Cloud TPU. However, GPU training is single-GPU only.
#### I am getting out-of-memory errors, what is wrong?
See the section on [out-of-memory issues](#out-of-memory-issues) for more
information.
#### Is there a PyTorch version available?
There is no official PyTorch implementation. If someone creates a line-for-line
PyTorch reimplementation so that our pre-trained checkpoints can be directly
converted, we would be happy to link to that PyTorch version here.
#### Will models in other languages be released?
Yes, we plan to release a multi-lingual BERT model in the near future. We cannot
make promises about exactly which languages will be included, but it will likely
be a single model which includes *most* of the languages which have a
significantly-sized Wikipedia.
#### Will models larger than `BERT-Large` be released?
So far we have not attempted to train anything larger than `BERT-Large`. It is
possible that we will release larger models if we are able to obtain significant
improvements.
#### What license is this library released under?
All code *and* models are released under the Apache 2.0 license. See the
`LICENSE` file for more information.
#### How do I cite BERT?
For now, cite [the arXiv paper](https://arxiv.org/abs/1810.04805):
```
@article{devlin2018bert,
title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
journal={arXiv preprint arXiv:1810.04805},
year={2018}
}
```
If we submit the paper to a conference or journal, we will update the BibTeX.
## Note on pre-training
The original TensorFlow repository also includes two scripts for pre-training BERT: [create_pretraining_data.py](https://github.com/google-research/bert/blob/master/create_pretraining_data.py) and [run_pretraining.py](https://github.com/google-research/bert/blob/master/run_pretraining.py).
As the authors note, pre-training BERT is particularly expensive and requires a TPU to run in a reasonable amount of time (see [here](https://github.com/google-research/bert#pre-training-with-bert)).
We have decided **not** to port these scripts for now and to wait for TPU support in PyTorch (see the recent [official announcement](https://cloud.google.com/blog/products/ai-machine-learning/introducing-pytorch-across-google-cloud)).

## Disclaimer
This is not an official Google product.
## Contact information
For help or issues using BERT, please submit a GitHub issue.
For personal communication related to BERT, please contact Jacob Devlin
(`jacobdevlin@google.com`), Ming-Wei Chang (`mingweichang@google.com`), or

## Requirements