"...composable_kernel.git" did not exist on "93e7f92a9cbfd076791b415d26ece01539846c99"
Commit c82b74b9 authored by LysandreJik

Fixed Sphinx errors and warnings

parent 5288913b
@@ -14,6 +14,8 @@ Here is a detailed documentation of the classes in the package and how to use th
* - `Serialization best-practices <#serialization-best-practices>`__
- How to save and reload a fine-tuned model
* - `Configurations <#configurations>`__
+- API of the configuration classes for BERT, GPT, GPT-2 and Transformer-XL
TODO Lysandre filled: Removed Models/Tokenizers/Optimizers as no single link can be made.
...
@@ -11,6 +11,6 @@ We include `three Jupyter Notebooks <https://github.com/huggingface/pytorch-pret
The second NoteBook (\ `Comparing-TF-and-PT-models-SQuAD.ipynb <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb>`_\ ) compares the loss computed by the TensorFlow and the PyTorch models for identical initialization of the fine-tuning layer of the ``BertForQuestionAnswering`` and computes the standard deviation between them. In the given example, we get a standard deviation of 2.5e-7 between the models.
*
-The third NoteBook (\ `Comparing-TF-and-PT-models-MLM-NSP.ipynb <https://github.com/huggingface/pytorch-pretrained-BERT/tree/mnotebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb>`_\ ) compares the predictions computed by the TensorFlow and the PyTorch models for masked token language modeling using the pre-trained masked language modeling model.
+The third NoteBook (\ `Comparing-TF-and-PT-models-MLM-NSP.ipynb <https://github.com/huggingface/pytorch-pretrained-BERT/tree/notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb>`_\ ) compares the predictions computed by the TensorFlow and the PyTorch models for masked token language modeling using the pre-trained masked language modeling model.
Please follow the instructions given in the notebooks to run and modify them.