This is a PyTorch implementation of the [TensorFlow code](https://github.com/google-research/bert) released by Google AI with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).
## Converting the TensorFlow pre-trained models to PyTorch
You can convert the pre-trained weights released by Google AI by calling the script `convert_tf_checkpoint_to_pytorch.py`.
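For instance, assuming you have downloaded and unpacked the `BERT-Base, Uncased` checkpoint released by Google, the call could look like the sketch below. The argument names are assumptions (they simply mirror the inputs the conversion needs: the TensorFlow checkpoint, the `bert_config.json` file, and an output path); check the script's `--help` for its actual interface.

```bash
# A minimal sketch -- paths and argument names are assumptions, check the script's --help.
export BERT_BASE_DIR=/path/to/uncased_L-12_H-768_A-12

python convert_tf_checkpoint_to_pytorch.py \
  --tf_checkpoint_path $BERT_BASE_DIR/bert_model.ckpt \
  --bert_config_file   $BERT_BASE_DIR/bert_config.json \
  --pytorch_dump_path  $BERT_BASE_DIR/pytorch_model.bin
```

The TensorFlow checkpoint (`bert_model.ckpt`) and the configuration file (`bert_config.json`) ship with every pre-trained model released by Google; only the dump path is new.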
We showcase the same examples as in the original implementation: fine-tuning on the MRPC classification corpus and the SQuAD question answering dataset.
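As an illustration, a single-GPU SQuAD fine-tuning run on the converted `BERT-Base` checkpoint could be launched along the lines of the sketch below. This is only a sketch: the script name `run_squad.py` and the flag names are assumptions mirroring the original TensorFlow implementation, so check the example scripts shipped with this repository for the exact interface.

```bash
# Hypothetical invocation -- script name and flag names are assumptions, see the repo's examples.
export BERT_BASE_DIR=/path/to/uncased_L-12_H-768_A-12
export SQUAD_DIR=/path/to/squad            # contains train-v1.1.json and dev-v1.1.json

python run_squad.py \
  --vocab_file $BERT_BASE_DIR/vocab.txt \
  --bert_config_file $BERT_BASE_DIR/bert_config.json \
  --init_checkpoint $BERT_BASE_DIR/pytorch_model.bin \
  --do_train \
  --train_file $SQUAD_DIR/train-v1.1.json \
  --do_predict \
  --predict_file $SQUAD_DIR/dev-v1.1.json \
  --train_batch_size 12 \
  --learning_rate 3e-5 \
  --num_train_epochs 2.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/squad_base/
```

The hyperparameter values shown are the ones commonly used for `BERT-Base` on SQuAD (see the original repository); adjust the batch size to fit your GPU memory.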
You should see a result similar to the 88.5% reported in the paper for `BERT-Base`.

If you have access to a Cloud TPU, you can train with `BERT-Large`; the [original TensorFlow repository](https://github.com/google-research/bert) gives a set of hyperparameters (slightly different from the paper) that consistently obtains around 90.5%-91.0% F1 single-system trained only on SQuAD.

If you want to use BERT with [Colab](https://colab.sandbox.google.com), you can get started with the notebook
"[BERT FineTuning with Cloud TPUs](https://colab.sandbox.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb)".
**At the time of this writing (October 31st, 2018), Colab users can access a Cloud TPU completely for free.** Note: one per user, availability is limited, it requires a Google Cloud Platform account with storage (although storage may be purchased with free credit for signing up with GCP), and this capability may no longer be available in the future. Click on the BERT Colab linked above for more information.

## Comparing TensorFlow and PyTorch models

We also include [a small Notebook](https://github.com/huggingface/pytorch-pretrained-BERT/blob/master/Comparing%20TF%20and%20PT%20models.ipynb) that we used to verify that the conversion of the weights to PyTorch is consistent with the original TensorFlow weights.
Please follow the instructions in the Notebook to run it.
## FAQ
#### Is this code compatible with Cloud TPUs? What about GPUs?
Yes, all of the code in this repository works out-of-the-box with CPU, GPU, and
Cloud TPU. However, GPU training is single-GPU only.
#### I am getting out-of-memory errors, what is wrong?
See the section on [out-of-memory issues](#out-of-memory-issues) for more
information.
#### Is there a PyTorch version available?
There is no official PyTorch implementation. If someone creates a line-for-line
PyTorch reimplementation so that our pre-trained checkpoints can be directly
converted, we would be happy to link to that PyTorch version here.
#### Will models in other languages be released?
Yes, we plan to release a multi-lingual BERT model in the near future. We cannot
make promises about exactly which languages will be included, but it will likely
be a single model which includes *most* of the languages which have a
significantly-sized Wikipedia.
#### Will models larger than `BERT-Large` be released?
So far we have not attempted to train anything larger than `BERT-Large`. It is
possible that we will release larger models if we are able to obtain significant
improvements.
#### What license is this library released under?
All code *and* models are released under the Apache 2.0 license. See the
`LICENSE` file for more information.
#### How do I cite BERT?
For now, cite [the arXiv paper](https://arxiv.org/abs/1810.04805):
```
@article{devlin2018bert,
title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
journal={arXiv preprint arXiv:1810.04805},
year={2018}
}
```
If we submit the paper to a conference or journal, we will update the BibTeX.
## Note on pre-training

The original TensorFlow code also releases two scripts for pre-training BERT: [create_pretraining_data.py](https://github.com/google-research/bert/blob/master/create_pretraining_data.py) and [run_pretraining.py](https://github.com/google-research/bert/blob/master/run_pretraining.py).
As the authors note, pre-training BERT is particularly expensive and requires a TPU to run in a reasonable amount of time (see [here](https://github.com/google-research/bert#pre-training-with-bert)).
We have decided **not** to port these scripts for now and to wait for TPU support in PyTorch (see the recent [official announcement](https://cloud.google.com/blog/products/ai-machine-learning/introducing-pytorch-across-google-cloud)).

## Requirements

The main dependencies of this code are:

## Contact information

For help or issues using BERT, please submit a GitHub issue.
For personal communication related to BERT, please contact Jacob Devlin (`jacobdevlin@google.com`), Ming-Wei Chang (`mingweichang@google.com`), or Kenton Lee (`kentonl@google.com`).

## Disclaimer

This is not an official Google product.