Unverified Commit 08f534d2 authored by Sylvain Gugger, committed by GitHub

Doc styling (#8067)

* Important files

* Styling them all

* Revert "Styling them all"

This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e.

* Syling them for realsies

* Fix syntax error

* Fix benchmark_utils

* More fixes

* Fix modeling auto and script

* Remove new line

* Fixes

* More fixes

* Fix more files

* Style

* Add FSMT

* More fixes

* More fixes

* More fixes

* More fixes

* Fixes

* More fixes

* More fixes

* Last fixes

* Make sphinx happy
parent 04a17f85
@@ -17,7 +17,7 @@ work properly.
the text you give it in tokens the same way for the pretraining corpus, and it will use the same correspondence
token to index (that we usually call a `vocab`) as during pretraining.
To automatically download the vocab used during pretraining or fine-tuning a given model, you can use the
:func:`~transformers.AutoTokenizer.from_pretrained` method:
.. code-block::
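    >>> ## Illustrative sketch of the elided snippet; the checkpoint name is just an example
    >>> from transformers import AutoTokenizer
    >>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")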
@@ -39,10 +39,10 @@ is its ``__call__``: you just need to feed your sentence to your tokenizer objec
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
This returns a dictionary mapping strings to lists of ints. The `input_ids <glossary.html#input-ids>`__ are the indices
corresponding to each token in our sentence. We will see below what the `attention_mask
<glossary.html#attention-mask>`__ is used for and in :ref:`the next section <sentence-pairs>` the goal of
`token_type_ids <glossary.html#token-type-ids>`__.
The tokenizer can decode a list of token ids into a proper sentence:
@@ -51,10 +51,10 @@ The tokenizer can decode a list of token ids in a proper sentence:
>>> tokenizer.decode(encoded_input["input_ids"])
"[CLS] Hello, I'm a single sentence! [SEP]"
As you can see, the tokenizer automatically added some special tokens that the model expects. Not all models need
special tokens; for instance, if we had used `gpt2-medium` instead of `bert-base-cased` to create our tokenizer, we
would have seen the same sentence as the original one here. You can disable this behavior (which is only advised if you
have added those special tokens yourself) by passing ``add_special_tokens=False``.
If you have several sentences you want to process, you can do this efficiently by sending them as a list to the
tokenizer:
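A minimal sketch of what that looks like (the original snippet is elided in this diff; the sentences are placeholders):

.. code-block::

    >>> batch_sentences = ["Hello I'm a single sentence", "And another sentence", "And the very very last one"]
    >>> encoded_inputs = tokenizer(batch_sentences)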
@@ -114,9 +114,9 @@ You can do all of this by using the following options when feeding your list of
[1, 1, 1, 1, 1, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 0]])}
It returns a dictionary with string keys and tensor values. We can now see what the `attention_mask
<glossary.html#attention-mask>`__ is all about: it points out which tokens the model should pay attention to and which
ones it should not (because they represent padding in this case).
Note that if your model does not have a maximum length associated to it, the command above will throw a warning. You
@@ -127,9 +127,9 @@ can safely ignore it. You can also pass ``verbose=False`` to stop the tokenizer
Preprocessing pairs of sentences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sometimes you need to feed a pair of sentences to your model. For instance, if you want to classify if two sentences in
a pair are similar, or for question-answering models, which take a context and a question. For BERT models, the input
is then represented like this: :obj:`[CLS] Sequence A [SEP] Sequence B [SEP]`
You can encode a pair of sentences in the format expected by your model by supplying the two sentences as two arguments
(not a list since a list of two sentences will be interpreted as a batch of two single sentences, as we saw before).
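For instance, a minimal sketch (the sentences are placeholders; ``tokenizer`` is the one created earlier):

.. code-block::

    >>> encoded_input = tokenizer("How old are you?", "I'm 6 years old")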
@@ -146,8 +146,8 @@ This will once again return a dict string to list of ints:
This shows us what the `token_type_ids <glossary.html#token-type-ids>`__ are for: they indicate to the model which part
of the inputs correspond to the first sentence and which part corresponds to the second sentence. Note that
`token_type_ids` are not required or handled by all models. By default, a tokenizer will only return the inputs that
its associated model expects. You can force the return (or the non-return) of any of those special arguments by using
``return_input_ids`` or ``return_token_type_ids``.
If we decode the token ids we obtained, we will see that the special tokens have been properly added.
@@ -215,7 +215,7 @@ three arguments you need to know for this are :obj:`padding`, :obj:`truncation`
a single sequence).
- :obj:`'max_length'` to pad to a length specified by the :obj:`max_length` argument or the maximum length accepted
by the model if no :obj:`max_length` is provided (``max_length=None``). If you only provide a single sequence,
padding will still be applied to it.
- :obj:`False` or :obj:`'do_not_pad'` to not pad the sequences. As we have seen before, this is the default
behavior.
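As a quick, hedged illustration of combining these options (sentences and the ``max_length`` value are placeholders):

.. code-block::

    >>> batch = tokenizer(["A first sentence", "And a much longer second sentence"],
    ...                   padding='max_length', truncation=True, max_length=10, return_tensors="pt")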
@@ -238,9 +238,9 @@ three arguments you need to know for this are :obj:`padding`, :obj:`truncation`
truncation/padding to :obj:`max_length` is deactivated.
Here is a table summarizing the recommended way to set up padding and truncation. If you use pairs of input sequences
in any of the following examples, you can replace :obj:`truncation=True` by a :obj:`STRATEGY` selected in
:obj:`['only_first', 'only_second', 'longest_first']`, i.e. :obj:`truncation='only_second'` or
:obj:`truncation='longest_first'` to control how both sequences in the pair are truncated as detailed before.
+--------------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------+
| Truncation | Padding | Instruction |
@@ -3,7 +3,8 @@ Pretrained models
Here is the full list of the currently provided pretrained models together with a short presentation of each model.
For a list that includes community-uploaded models, refer to `https://huggingface.co/models
<https://huggingface.co/models>`__.
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Architecture | Shortcut name | Details of the model |
Quick tour
=======================================================================================================================
Let's have a quick look at the 🤗 Transformers library features. The library downloads pretrained models for Natural
Language Understanding (NLU) tasks, such as analyzing the sentiment of a text, and Natural Language Generation (NLG),
such as completing a prompt with new text or translating into another language.
First we will see how to easily leverage the pipeline API to quickly use those pretrained models at inference. Then, we
@@ -29,8 +29,8 @@ provides the following tasks out of the box:
- Translation: translate a text into another language.
- Feature extraction: return a tensor representation of the text.
Let's see how this works for sentiment analysis (the other tasks are all covered in the :doc:`task summary
</task_summary>`):
.. code-block::
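    >>> ## Illustrative sketch of the elided snippet
    >>> from transformers import pipeline
    >>> classifier = pipeline("sentiment-analysis")
    >>> classifier("We are very happy to show you the 🤗 Transformers library.")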
@@ -160,9 +160,10 @@ To apply these steps on a given text, we can just feed it to our tokenizer:
>>> inputs = tokenizer("We are very happy to show you the 🤗 Transformers library.")
This returns a dictionary mapping strings to lists of ints. It contains the `ids of the tokens
<glossary.html#input-ids>`__, as mentioned before, but also additional arguments that will be useful to the model. Here
for instance, we also have an `attention mask <glossary.html#attention-mask>`__ that the model will use to have a
better understanding of the sequence:
.. code-block::
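    >>> ## Illustrative sketch of the elided snippet; the actual ids depend on the tokenizer
    >>> print(inputs)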
@@ -191,8 +192,8 @@ and get tensors back. You can specify all of that to the tokenizer:
... return_tensors="tf"
... )
The padding is automatically applied on the side expected by the model (in this case, on the right), with the padding
token the model was pretrained with. The attention mask is also adapted to take the padding into account:
.. code-block::
@@ -213,8 +214,8 @@ Using the model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once your input has been preprocessed by the tokenizer, you can send it directly to the model. As we mentioned, it will
contain all the relevant information the model needs. If you're using a TensorFlow model, you can pass the dictionary
keys directly to tensors; for a PyTorch model, you need to unpack the dictionary by adding :obj:`**`.
.. code-block::
@@ -223,8 +224,8 @@ dictionary keys directly to tensors, for a PyTorch model, you need to unpack the
>>> ## TENSORFLOW CODE
>>> tf_outputs = tf_model(tf_batch)
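>>> ## PYTORCH CODE (hedged sketch; pt_model and pt_batch are assumed from the earlier PyTorch steps)
>>> pt_outputs = pt_model(**pt_batch)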
In 🤗 Transformers, all outputs are tuples (with only one element potentially). Here, we get a tuple with just the
final activations of the model.
.. code-block::
@@ -239,11 +240,10 @@ final activations of the model.
[ 0.08181786, -0.04179301]], dtype=float32)>,)
The model can return more than just the final activations, which is why the output is a tuple. Here we only asked for
the final activations, so we get a tuple with one element.

.. note::

    All 🤗 Transformers models (PyTorch or TensorFlow) return the activations of the model *before* the final
    activation function (like SoftMax) since this final activation function is often fused with the loss.
Let's apply the SoftMax activation to get predictions.
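A minimal sketch of that step (the original snippet is elided in this diff; ``pt_outputs``/``tf_outputs`` are assumed
from the calls above):

.. code-block::

    >>> ## PYTORCH CODE
    >>> import torch.nn.functional as F
    >>> pt_predictions = F.softmax(pt_outputs[0], dim=-1)
    >>> ## TENSORFLOW CODE
    >>> import tensorflow as tf
    >>> tf_predictions = tf.nn.softmax(tf_outputs[0], axis=-1)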
@@ -281,11 +281,11 @@ If you have labels, you can provide them to the model, it will return a tuple wi
>>> import tensorflow as tf
>>> tf_outputs = tf_model(tf_batch, labels = tf.constant([1, 0]))
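>>> ## PYTORCH CODE (hedged sketch mirroring the TensorFlow call above; pt_model and pt_batch are assumed)
>>> import torch
>>> pt_outputs = pt_model(**pt_batch, labels=torch.tensor([1, 0]))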
Models are standard `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ or `tf.keras.Model
<https://www.tensorflow.org/api_docs/python/tf/keras/Model>`__ so you can use them in your usual training loop. 🤗
Transformers also provides a :class:`~transformers.Trainer` (or :class:`~transformers.TFTrainer` if you are using
TensorFlow) class to help with your training (taking care of things such as distributed training, mixed precision,
etc.). See the :doc:`training tutorial <training>` for more details.
.. note::
@@ -336,13 +336,13 @@ The :obj:`AutoModel` and :obj:`AutoTokenizer` classes are just shortcuts that wi
pretrained model. Behind the scenes, the library has one model class per combination of architecture plus class, so the
code is easy to access and tweak if you need to.
In our previous example, the model was called "distilbert-base-uncased-finetuned-sst-2-english", which means it's using
the :doc:`DistilBERT </model_doc/distilbert>` architecture. As
:class:`~transformers.AutoModelForSequenceClassification` (or
:class:`~transformers.TFAutoModelForSequenceClassification` if you are using TensorFlow) was used, the model
automatically created is then a :class:`~transformers.DistilBertForSequenceClassification`. You can look at its
documentation for all details relevant to that specific model, or browse the source code. This is how you would
directly instantiate model and tokenizer without the auto magic:
.. code-block::
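    >>> ## Illustrative sketch of the elided snippet; class names follow the text above
    >>> from transformers import DistilBertForSequenceClassification, DistilBertTokenizer
    >>> model_name = "distilbert-base-uncased-finetuned-sst-2-english"
    >>> model = DistilBertForSequenceClassification.from_pretrained(model_name)
    >>> tokenizer = DistilBertTokenizer.from_pretrained(model_name)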
@@ -5,16 +5,18 @@ Exporting transformers models
ONNX / ONNXRuntime
=======================================================================================================================
Projects `ONNX (Open Neural Network eXchange) <http://onnx.ai>`_ and `ONNXRuntime (ORT)
<https://microsoft.github.io/onnxruntime/>`_ are part of an effort from leading industries in the AI field to provide a
unified and community-driven format to store and, by extension, efficiently execute neural networks leveraging a
variety of hardware and dedicated optimizations.
Starting from transformers v2.10.0 we partnered with ONNX Runtime to provide an easy export of transformers models to
the ONNX format. You can have a look at the effort by looking at our joint blog post `Accelerate your NLP pipelines
using Hugging Face Transformers and ONNX Runtime
<https://medium.com/microsoftazure/accelerate-your-nlp-pipelines-using-hugging-face-transformers-and-onnx-runtime-2443578f4333>`_.
Exporting a model is done through the script `convert_graph_to_onnx.py` at the root of the transformers sources. The
following command shows how easy it is to export a BERT model from the library; simply run:
.. code-block:: bash
@@ -27,62 +29,66 @@ The conversion tool works for both PyTorch and Tensorflow models and ensures:
* The generated model can be correctly loaded through onnxruntime.
.. note::
    Currently, inputs and outputs are always exported with dynamic sequence axes preventing some optimizations on the
    ONNX Runtime. If you would like to see such support for fixed-length inputs/outputs, please open up an issue on
    transformers.
Also, the conversion tool supports different options which let you tune the behavior of the generated model:
* **Change the target opset version of the generated model.** (More recent opset generally supports more operators and
  enables faster inference)

* **Export pipeline-specific prediction heads.** (Allows exporting the model along with its task-specific prediction
  head(s))

* **Use the external data format (PyTorch only).** (Lets you export models whose size is above 2GB (`More info
  <https://github.com/pytorch/pytorch/pull/33062>`_))
Optimizations
-----------------------------------------------------------------------------------------------------------------------
ONNXRuntime includes some transformers-specific transformations to leverage optimized operations in the graph. Below
are some of the operators which can be enabled to speed up inference through ONNXRuntime (*see note below*):
* Constant folding
* Attention Layer fusing
* Skip connection LayerNormalization fusing
* FastGeLU approximation
Some of the optimizations performed by ONNX runtime can be hardware specific and thus lead to different performance if
used on another machine with a different hardware configuration than the one used for exporting the model. For this
reason, when using ``convert_graph_to_onnx.py``, optimizations are not enabled, ensuring the model can be easily
exported to various hardware. Optimizations can then be enabled when loading the model through ONNX runtime for
inference.
.. note::
    When quantization is enabled (see below), the ``convert_graph_to_onnx.py`` script will enable optimizations on the
    model, because quantization would modify the underlying graph, making it impossible for ONNX runtime to do the
    optimizations afterwards.
.. note::
    For more information about the optimizations enabled by ONNXRuntime, please have a look at the `ONNXRuntime GitHub
    <https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers>`_.
Quantization
-----------------------------------------------------------------------------------------------------------------------
ONNX exporter supports generating a quantized version of the model to allow efficient inference.
Quantization works by converting the memory representation of the parameters in the neural network to a compact
integer format. By default, weights of a neural network are stored as single-precision float (`float32`), which can
express a wide range of floating-point numbers with decent precision. These properties are especially interesting
during training, where you want a fine-grained representation.
On the other hand, after the training phase, it has been shown that one can greatly reduce the range and the precision
of `float32` numbers without changing the performance of the neural network.
More technically, `float32` parameters are converted to a type requiring fewer bits to represent each number, thus
reducing the overall size of the model. Here, we are enabling `float32` mapping to `int8` values (a non-floating-point,
single-byte number representation) according to the following formula:
.. math::
y_{float32} = scale * x_{int8} - zero\_point
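As a rough illustration of that formula (not the exporter's actual implementation; the range mapping and rounding
choices below are assumptions), quantizing a small tensor could look like this:

.. code-block::

    >>> import numpy as np
    >>> x = np.array([-1.5, 0.0, 0.75, 2.0], dtype=np.float32)
    >>> # Map the observed float range onto the int8 range [-128, 127]
    >>> scale = (x.max() - x.min()) / 255
    >>> zero_point = -128 * scale - x.min()
    >>> x_int8 = np.clip(np.round((x + zero_point) / scale), -128, 127).astype(np.int8)
    >>> # Approximate reconstruction, following y_float32 = scale * x_int8 - zero_point
    >>> y_float32 = scale * x_int8 - zero_point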
@@ -96,9 +102,9 @@ Leveraging tiny-integers has numerous advantages when it comes to inference:
* Integer operations execute a magnitude faster on modern hardware
* Integer operations require less power to do the computations
In order to convert a transformers model to ONNX IR with quantized weights you just need to specify ``--quantize`` when
using ``convert_graph_to_onnx.py``. Also, you can have a look at the ``quantize()`` utility-method in this same script
file.
Example of quantized BERT model export:
@@ -111,26 +117,27 @@ Example of quantized BERT model export:
.. note::
    When exporting a quantized model you will end up with two different ONNX files. The one specified at the end of the
    above command will contain the original ONNX model storing `float32` weights. The second one, with a ``-quantized``
    suffix, will hold the quantized parameters.
TorchScript
=======================================================================================================================
.. note::
    This is the very beginning of our experiments with TorchScript and we are still exploring its capabilities with
    variable-input-size models. It is a focus of interest to us and we will deepen our analysis in upcoming releases,
    with more code examples, a more flexible implementation, and benchmarks comparing python-based codes with compiled
    TorchScript.
According to Pytorch's documentation: "TorchScript is a way to create serializable and optimizable models from PyTorch
code". Pytorch's two modules `JIT and TRACE <https://pytorch.org/docs/stable/jit.html>`_ allow the developer to export
their model to be re-used in other programs, such as efficiency-oriented C++ programs.
We have provided an interface that allows the export of 🤗 Transformers models to TorchScript so that they can be
reused in a different environment than a Pytorch-based python program. Here we explain how to export and use our models
using TorchScript.
Exporting a model requires two things:
@@ -145,13 +152,14 @@ Implications
TorchScript flag and tied weights
-----------------------------------------------------------------------------------------------------------------------
This flag is necessary because most of the language models in this repository have tied weights between their
``Embedding`` layer and their ``Decoding`` layer. TorchScript does not allow the export of models that have tied
weights, therefore it is necessary to untie and clone the weights beforehand.
This implies that models instantiated with the ``torchscript`` flag have their ``Embedding`` layer and ``Decoding``
layer separate, which means that they should not be trained down the line. Training would de-synchronize the two
layers, leading to unexpected results.
This is not the case for models that do not have a Language Model head, as those do not have tied weights. These models
can be safely exported without the ``torchscript`` flag.
@@ -160,8 +168,8 @@ Dummy inputs and standard lengths
-----------------------------------------------------------------------------------------------------------------------
The dummy inputs are used to do a model forward pass. While the inputs' values are propagating through the layers,
Pytorch keeps track of the different operations executed on each tensor. These recorded operations are then used to
create the "trace" of the model.
The trace is created relative to the inputs' dimensions. It is therefore constrained by the dimensions of the dummy
input, and will not work for any other sequence length or batch size. When trying with a different size, an error such
@@ -185,8 +193,8 @@ Below is an example, showing how to save, load models as well as how to use the
Saving a model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This snippet shows how to use TorchScript to export a ``BertModel``. Here the ``BertModel`` is instantiated according
to a ``BertConfig`` class and then saved to disk under the filename ``traced_bert.pt``.
.. code-block:: python
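    # Hedged sketch of the export (the full original snippet is elided in this diff);
    # the checkpoint name and dummy sentence are placeholders.
    import torch
    from transformers import BertConfig, BertModel, BertTokenizer

    # Instantiate with the torchscript flag discussed above so tied weights are untied and cloned
    config = BertConfig(torchscript=True)
    model = BertModel(config)
    model.eval()

    # Dummy inputs of a fixed size, used to record the trace
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    dummy = tokenizer("This is a dummy input", return_tensors="pt")

    traced_model = torch.jit.trace(model, (dummy["input_ids"], dummy["attention_mask"]))
    torch.jit.save(traced_model, "traced_bert.pt")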
@@ -14,18 +14,19 @@ def swish(x):
def _gelu_python(x):
"""Original Implementation of the gelu activation function in Google Bert repo when initially created.
For information: OpenAI GPT's gelu is slightly different (and gives slightly different results):
0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3))))
This is now written in C in torch.nn.functional
Also see https://arxiv.org/abs/1606.08415
"""
Original Implementation of the gelu activation function in Google Bert repo when initially created. For
information: OpenAI GPT's gelu is slightly different (and gives slightly different results): 0.5 * x * (1 +
torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3)))) This is now written in C in
torch.nn.functional Also see https://arxiv.org/abs/1606.08415
"""
return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
def gelu_new(x):
"""Implementation of the gelu activation function currently in Google Bert repo (identical to OpenAI GPT).
Also see https://arxiv.org/abs/1606.08415
"""
Implementation of the gelu activation function currently in Google Bert repo (identical to OpenAI GPT). Also see
https://arxiv.org/abs/1606.08415
"""
return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))
@@ -4,11 +4,11 @@ import tensorflow as tf
def gelu(x):
"""Gaussian Error Linear Unit.
Original Implementation of the gelu activation function in Google Bert repo when initially created.
For information: OpenAI GPT's gelu is slightly different (and gives slightly different results):
0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3))))
Also see https://arxiv.org/abs/1606.08415
"""
Gaussian Error Linear Unit. Original Implementation of the gelu activation function in Google Bert repo when
initially created. For information: OpenAI GPT's gelu is slightly different (and gives slightly different results):
0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3)))) Also see
https://arxiv.org/abs/1606.08415
"""
x = tf.convert_to_tensor(x)
cdf = 0.5 * (1.0 + tf.math.erf(x / tf.math.sqrt(2.0)))
@@ -17,11 +17,12 @@ def gelu(x):
def gelu_new(x):
"""Gaussian Error Linear Unit.
This is a smoother version of the GELU.
Original paper: https://arxiv.org/abs/1606.08415
"""
Gaussian Error Linear Unit. This is a smoother version of the GELU. Original paper: https://arxiv.org/abs/1606.0841
Args:
x: float Tensor to perform activation.
x: float Tensor to perform activation
Returns:
`x` with the GELU activation applied.
"""
@@ -46,8 +46,9 @@ class PyTorchBenchmarkArguments(BenchmarkArguments):
]
def __init__(self, **kwargs):
"""This __init__ is there for legacy code. When removing
deprecated args completely, the class can simply be deleted
"""
This __init__ is there for legacy code. When removing deprecated args completely, the class can simply be
deleted
"""
for deprecated_arg in self.deprecated_args:
if deprecated_arg in kwargs:
@@ -43,8 +43,9 @@ class TensorFlowBenchmarkArguments(BenchmarkArguments):
]
def __init__(self, **kwargs):
"""This __init__ is there for legacy code. When removing
deprecated args completely, the class can simply be deleted
"""
This __init__ is there for legacy code. When removing deprecated args completely, the class can simply be
deleted
"""
for deprecated_arg in self.deprecated_args:
if deprecated_arg in kwargs:
@@ -16,9 +16,9 @@ def convert_command_factory(args: Namespace):
)
IMPORT_ERROR_MESSAGE = """transformers can only be used from the commandline to convert TensorFlow models in PyTorch,
In that case, it requires TensorFlow to be installed. Please see
https://www.tensorflow.org/install/ for installation instructions.
IMPORT_ERROR_MESSAGE = """
transformers can only be used from the commandline to convert TensorFlow models in PyTorch, In that case, it requires
TensorFlow to be installed. Please see https://www.tensorflow.org/install/ for installation instructions.
"""
@@ -164,9 +164,9 @@ class ServeCommand(BaseTransformersCLICommand):
def tokenize(self, text_input: str = Body(None, embed=True), return_ids: bool = Body(False, embed=True)):
"""
Tokenize the provided input and eventually returns corresponding tokens id:
- **text_input**: String to tokenize
- **return_ids**: Boolean flags indicating if the tokens have to be converted to their integer mapping.
Tokenize the provided input and eventually returns corresponding tokens id: - **text_input**: String to
tokenize - **return_ids**: Boolean flags indicating if the tokens have to be converted to their integer
mapping.
"""
try:
tokens_txt = self._pipeline.tokenizer.tokenize(text_input)
@@ -187,10 +187,9 @@ class ServeCommand(BaseTransformersCLICommand):
cleanup_tokenization_spaces: bool = Body(True, embed=True),
):
"""
Detokenize the provided token ids into readable text:

- **tokens_ids**: List of token ids
- **skip_special_tokens**: Flag indicating whether to skip decoding special tokens
- **cleanup_tokenization_spaces**: Flag indicating whether to remove all leading/trailing spaces and intermediate ones.
"""
try:
decoded_str = self._pipeline.tokenizer.decode(tokens_ids, skip_special_tokens, cleanup_tokenization_spaces)
@@ -37,9 +37,8 @@ class AlbertConfig(PretrainedConfig):
arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar
configuration to that of the ALBERT `xxlarge <https://huggingface.co/albert-xxlarge-v2>`__ architecture.
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.
Args:
vocab_size (:obj:`int`, `optional`, defaults to 30000):
@@ -61,15 +60,15 @@ class AlbertConfig(PretrainedConfig):
inner_group_num (:obj:`int`, `optional`, defaults to 1):
The number of inner repetition of attention and ffn.
hidden_act (:obj:`str` or :obj:`Callable`, `optional`, defaults to :obj:`"gelu_new"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, :obj:`"gelu"`,
:obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
hidden_dropout_prob (:obj:`float`, `optional`, defaults to 0):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
attention_probs_dropout_prob (:obj:`float`, `optional`, defaults to 0):
The dropout ratio for the attention probabilities.
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
The maximum sequence length that this model might ever be used with. Typically set this to something large
(e.g., 512 or 1024 or 2048).
type_vocab_size (:obj:`int`, `optional`, defaults to 2):
The vocabulary size of the :obj:`token_type_ids` passed when calling :class:`~transformers.AlbertModel` or
:class:`~transformers.TFAlbertModel`.