Unverified Commit 956c9173 authored by Thomas Wolf, committed by GitHub

fix typos in readme

parent 27ee0fff
@@ -159,7 +159,7 @@ Here is a detailed documentation of the classes in the package and how to use th
### Loading Google AI's pre-trained weights and PyTorch dump
To load one of Google AI's pre-trained models or a PyTorch saved model (an instance of `BertForPreTraining` saved with `torch.save()`), the PyTorch model classes and the tokenizer can be instantiated as
```python
model = BERT_CLASS.from_pretrained(PRE_TRAINED_MODEL_NAME_OR_PATH)
```
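For instance, a minimal concrete instantiation might look like the following sketch (assuming `BertTokenizer` and `BertModel` are importable from the package top level, and using `bert-base-uncased`, one of the shortcut names listed below):

```python
from pytorch_pretrained_bert import BertTokenizer, BertModel

# Download (or load from the local cache) the shortcut model and its vocabulary.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
```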
@@ -180,8 +180,9 @@ where
- `bert-base-chinese`: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters
- a path or url to a pretrained model archive containing (see the loading sketch below):
  - `bert_config.json` a configuration file for the model, and
  - `pytorch_model.bin` a PyTorch dump of a pre-trained instance of `BertForPreTraining` (saved with the usual `torch.save()`)
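A sketch of the path-based loading, assuming `BertForPreTraining` is importable from the package top level and that the archive bundles the two files above (the path itself is a placeholder):

```python
from pytorch_pretrained_bert import BertForPreTraining

# Hypothetical local archive containing bert_config.json and
# pytorch_model.bin, e.g. one produced by the conversion script below.
model = BertForPreTraining.from_pretrained('/tmp/my_bert_model.tar.gz')
```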
If `PRE_TRAINED_MODEL_NAME_OR_PATH` is a shortcut name, the pre-trained weights will be downloaded from AWS S3 (see the links [here](pytorch_pretrained_bert/modeling.py)) and stored in a cache folder to avoid future downloads (the cache folder can be found at `~/.pytorch_pretrained_bert/`).
@@ -304,15 +305,15 @@ Please refer to the doc strings and code in [`tokenization.py`](./pytorch_pretra
The optimizer accepts the following arguments (a construction sketch follows the list):
- `lr` : learning rate
- `warmup` : portion of `t_total` for the warmup, `-1` means no warmup. Default : `-1`
- `t_total` : total number of training steps for the learning rate schedule, `-1` means constant learning rate. Default : `-1`
- `schedule` : schedule to use for the warmup (see above). Default : `'warmup_linear'`
- `b1` : Adam's b1. Default : `0.9`
- `b2` : Adam's b2. Default : `0.999`
- `e` : Adam's epsilon. Default : `1e-6`
- `weight_decay_rate` : Weight decay. Default : `0.01`
- `max_grad_norm` : Maximum norm for the gradients (`-1` means no clipping). Default : `1.0`
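Putting these together, a minimal construction sketch, assuming the optimizer being documented is `BertAdam` from `pytorch_pretrained_bert.optimization` (`model` and the step count are placeholders for your own training setup):

```python
from pytorch_pretrained_bert.optimization import BertAdam

num_train_steps = 1000  # hypothetical total number of optimizer steps

optimizer = BertAdam(model.parameters(),  # `model` from the loading example above
                     lr=5e-5,
                     warmup=0.1,          # warm up over the first 10% of steps
                     t_total=num_train_steps)
```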
## Examples
@@ -467,21 +468,19 @@ The results were similar to the above FP32 results (actually slightly higher):
## Notebooks
We include [three Jupyter Notebooks](https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/notebooks) that can be used to check that the predictions of the PyTorch model are identical to the predictions of the original TensorFlow model.
- The first notebook ([Comparing-TF-and-PT-models.ipynb](./notebooks/Comparing-TF-and-PT-models.ipynb)) extracts the hidden states of a full sequence on each layer of the TensorFlow and the PyTorch models and computes the standard deviation between them. In the given example, we get a standard deviation of 1.5e-7 to 9e-7 on the various hidden states of the models (a toy version of this check is sketched after the list).
- The second notebook ([Comparing-TF-and-PT-models-SQuAD.ipynb](./notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb)) compares the loss computed by the TensorFlow and the PyTorch models for an identical initialization of the fine-tuning layer of `BertForQuestionAnswering` and computes the standard deviation between them. In the given example, we get a standard deviation of 2.5e-7 between the models.
- The third notebook ([Comparing-TF-and-PT-models-MLM-NSP.ipynb](./notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb)) compares the predictions computed by the TensorFlow and the PyTorch models for masked token language modeling using the pre-trained masked language modeling model.
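The core of the check the first notebook performs, reduced to a toy sketch (the tensors here are random stand-ins, not real model outputs):

```python
import torch

# Stand-ins for one layer's hidden states from the TF and PyTorch models.
tf_hidden = torch.randn(1, 128, 768)
pt_hidden = tf_hidden + 1e-7 * torch.randn_like(tf_hidden)

# Standard deviation of the element-wise difference: on the order of 1e-7
# for matching implementations, as reported above.
print((tf_hidden - pt_hidden).std().item())
```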
Please follow the instructions given in the notebooks to run and modify them.
## Command-line interface
A command-line interface is provided to convert a TensorFlow checkpoint into a PyTorch dump of the `BertForPreTraining` class (see above).
You can convert any TensorFlow checkpoint for BERT (in particular [the pre-trained models released by Google](https://github.com/google-research/bert#pre-trained-models)) into a PyTorch save file by using the [`convert_tf_checkpoint_to_pytorch.py`](convert_tf_checkpoint_to_pytorch.py) script.
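A hedged invocation sketch (the flag names are assumed from the script's argument parser; the paths are placeholders for a Google-released BERT checkpoint directory):

```shell
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12

python convert_tf_checkpoint_to_pytorch.py \
  --tf_checkpoint_path $BERT_BASE_DIR/bert_model.ckpt \
  --bert_config_file $BERT_BASE_DIR/bert_config.json \
  --pytorch_dump_path $BERT_BASE_DIR/pytorch_model.bin
```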