"sgl-router/git@developer.sourcefind.cn:zhaoyu6/sglang.git" did not exist on "0e82fd3df4b2f36e3352b72ec2a441d9efed3a5f"
Commit 207594be authored by Sylvain Gugger, committed by GitHub

Convert rst files (#14888)

* Convert all tutorials and guides

* Convert all remaining rst to mdx

* Track and fix bad links
parent b0c7d2ec
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Exporting transformers models

## ONNX / ONNXRuntime

Projects [ONNX (Open Neural Network eXchange)](http://onnx.ai) and [ONNXRuntime (ORT)](https://microsoft.github.io/onnxruntime/) are part of an effort from leading industries in the AI field to provide a
unified and community-driven format to store and, by extension, efficiently execute neural networks leveraging a variety
of hardware and dedicated optimizations.

Starting from transformers v2.10.0 we partnered with ONNX Runtime to provide an easy export of transformers models to
the ONNX format. For details on this effort, see our joint blog post [Accelerate your NLP pipelines
using Hugging Face Transformers and ONNX Runtime](https://medium.com/microsoftazure/accelerate-your-nlp-pipelines-using-hugging-face-transformers-and-onnx-runtime-2443578f4333).

### Configuration-based approach

Transformers v4.9.0 introduces a new package: `transformers.onnx`. This package allows converting checkpoints to an
ONNX graph by leveraging configuration objects. These configuration objects come ready-made for a number of model
architectures, and are designed to be easily extended to other architectures.

Ready-made configurations include the following models:

<!--This table is automatically generated by make style, do not fill manually!-->

- ALBERT
- BART
...@@ -57,104 +51,104 @@ Ready-made configurations include the following models:

This conversion is handled with the PyTorch version of the models and therefore requires PyTorch to be installed. If you
would like to be able to convert from TensorFlow, please let us know by opening an issue.

<Tip>

The models showcased here are close to fully feature complete, but do lack some features that are currently in
development. Namely, the ability to handle the past key values for decoder models is currently in the works.

</Tip>

#### Converting an ONNX model using the `transformers.onnx` package

The package may be used as a Python module:

```bash
python -m transformers.onnx --help

usage: Hugging Face ONNX Exporter tool [-h] -m MODEL -f {pytorch} [--features {default}] [--opset OPSET] [--atol ATOL] output

positional arguments:
  output                Path indicating where to store generated ONNX model.

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        Model's name of path on disk to load.
  --features {default}  Export the model with some additional features.
  --opset OPSET         ONNX opset version to export the model with (default 12).
  --atol ATOL           Absolute difference tolerance when validating the model.
```

Exporting a checkpoint using a ready-made configuration can be done as follows:

```bash
python -m transformers.onnx --model=bert-base-cased onnx/bert-base-cased/
```

This exports an ONNX graph of the mentioned checkpoint. Here it is *bert-base-cased*, but it can be any model from the
hub, or a local path.

It will be exported under `onnx/bert-base-cased`. You should see logs similar to the following:

```bash
Validating ONNX model...
-[✓] ONNX model outputs' name match reference model ({'pooler_output', 'last_hidden_state'}
- Validating ONNX Model output "last_hidden_state":
-[] (2, 8, 768) matchs (2, 8, 768)
-[] all values close (atol: 0.0001)
- Validating ONNX Model output "pooler_output":
-[] (2, 768) matchs (2, 768)
-[] all values close (atol: 0.0001)
All good, model saved at: onnx/bert-base-cased/model.onnx
```
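
When the export has to be launched from a script rather than typed at a shell, the command line can be assembled in Python. The following is only a sketch: the helper name `build_export_command` is hypothetical, and it relies solely on the `--model`, `--opset` and `--atol` flags and the positional output path shown in the help message. It builds the argument list without running it.

```python
import sys
from typing import List, Optional


def build_export_command(
    model: str,
    output_dir: str,
    opset: Optional[int] = None,
    atol: Optional[float] = None,
) -> List[str]:
    """Assemble the `python -m transformers.onnx` call as an argument list (hypothetical helper)."""
    command = [sys.executable, "-m", "transformers.onnx", f"--model={model}"]
    if opset is not None:
        command.append(f"--opset={opset}")
    if atol is not None:
        command.append(f"--atol={atol}")
    # The output location is the single positional argument, so it goes last.
    command.append(output_dir)
    return command


command = build_export_command("bert-base-cased", "onnx/bert-base-cased/")
print(" ".join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` would start the export, assuming `transformers` and PyTorch are installed in the current environment.
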

This export can now be used in the ONNX inference runtime:

```python
import onnxruntime as ort

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
ort_session = ort.InferenceSession("onnx/bert-base-cased/model.onnx")

# Tokenize to NumPy arrays, which ONNX Runtime accepts as inputs
inputs = tokenizer("Using BERT in ONNX!", return_tensors="np")
outputs = ort_session.run(["last_hidden_state", "pooler_output"], dict(inputs))
```

The outputs used (`["last_hidden_state", "pooler_output"]`) can be obtained by taking a look at the ONNX
configuration of each model. For example, for BERT:

```python
from transformers.models.bert import BertOnnxConfig, BertConfig

config = BertConfig()
onnx_config = BertOnnxConfig(config)
output_keys = list(onnx_config.outputs.keys())
```

#### Implementing a custom configuration for an unsupported architecture

Let's take a look at the changes necessary to add a custom configuration for an unsupported architecture. Firstly, we
will need a custom ONNX configuration object that details the model inputs and outputs. The BERT ONNX configuration is
visible below:

```python
from collections import OrderedDict
from typing import Mapping

from transformers.onnx import OnnxConfig


class BertOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
                ("attention_mask", {0: "batch", 1: "sequence"}),
                ("token_type_ids", {0: "batch", 1: "sequence"}),
            ]
        )

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict([("last_hidden_state", {0: "batch", 1: "sequence"}), ("pooler_output", {0: "batch"})])
```

Let's understand what's happening here. This configuration has two properties: the inputs, and the outputs.
...@@ -168,144 +162,155 @@ The outputs return a similar dictionary, where, once again, each key corresponds
indicates the axis of that output.

Once this is done, a single step remains: adding this configuration object to the initialisation of the model class,
and to the general `transformers` initialisation.

An important fact to notice is the use of *OrderedDict* in both the inputs and outputs properties. This is a requirement,
as inputs are matched against their relative position within the *PreTrainedModel.forward()* prototype and outputs are
matched against their position in the returned *BaseModelOutputX* instance.
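
To make the ordering requirement concrete, the sketch below checks that the keys of an inputs mapping appear in the same relative order as the parameters of a `forward()` function. Both the `forward` function and the `matches_forward_order` helper are hypothetical stand-ins written for this illustration; they are not part of the library and do not reproduce how the exporter performs the matching.

```python
import inspect
from collections import OrderedDict


def forward(input_ids, attention_mask=None, token_type_ids=None):
    """Hypothetical stand-in for a model's forward() signature."""
    return input_ids


# Declared in the same order as the forward() parameters.
onnx_inputs = OrderedDict(
    [
        ("input_ids", {0: "batch", 1: "sequence"}),
        ("attention_mask", {0: "batch", 1: "sequence"}),
        ("token_type_ids", {0: "batch", 1: "sequence"}),
    ]
)


def matches_forward_order(inputs, forward_fn):
    """Return True when the input names follow the parameter order of forward_fn."""
    parameter_names = [name for name in inspect.signature(forward_fn).parameters if name in inputs]
    return list(inputs) == parameter_names


print(matches_forward_order(onnx_inputs, forward))
```

Swapping two entries of the mapping makes the check fail, which is the situation the requirement is meant to prevent.
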

An example of such an addition is visible here, for the MBart model: [Making MBART ONNX-convertible](https://github.com/huggingface/transformers/pull/13049/commits/d097adcebd89a520f04352eb215a85916934204f)

If you would like to contribute your addition to the library, we recommend you implement tests. An example of such
tests is visible here: [Adding tests to the MBART ONNX conversion](https://github.com/huggingface/transformers/pull/13049/commits/5d642f65abf45ceeb72bd855ca7bfe2506a58e6a)

### Graph conversion

<Tip>

The approach detailed here is being deprecated. We recommend you follow the part above for an up-to-date approach.

</Tip>

Exporting a model is done through the script *convert_graph_to_onnx.py* at the root of the transformers sources. The
following command shows how easy it is to export a BERT model from the library; simply run:

```bash
python convert_graph_to_onnx.py --framework <pt, tf> --model bert-base-cased bert-base-cased.onnx
```

The conversion tool works for both PyTorch and TensorFlow models and ensures:

- The model and its weights are correctly initialized from the Hugging Face model hub or a local checkpoint.
- The inputs and outputs are correctly mapped to their ONNX counterparts.
- The generated model can be correctly loaded through onnxruntime.

<Tip>

Currently, inputs and outputs are always exported with dynamic sequence axes, which prevents some optimizations in
ONNX Runtime. If you would like to see support for fixed-length inputs/outputs, please open an issue on
transformers.

</Tip>

Also, the conversion tool supports different options which let you tune the behavior of the generated model:

- **Change the target opset version of the generated model.** (A more recent opset generally supports more operators and
  enables faster inference)
- **Export pipeline-specific prediction heads.** (Lets you export the model along with its task-specific prediction
  head(s))
- **Use the external data format (PyTorch only).** (Lets you export a model whose size is above 2GB ([More info](https://github.com/pytorch/pytorch/pull/33062)))

### Optimizations

ONNXRuntime includes some transformers-specific transformations to leverage optimized operations in the graph. Below
are some of the operators which can be enabled to speed up inference through ONNXRuntime (*see note below*):

- Constant folding
- Attention Layer fusing
- Skip connection LayerNormalization fusing
- FastGeLU approximation

Some of the optimizations performed by ONNX Runtime can be hardware specific and thus lead to different performance if
used on another machine with a different hardware configuration than the one used for exporting the model. For this
reason, when using `convert_graph_to_onnx.py`, optimizations are not enabled, ensuring the model can be easily
exported to various hardware. Optimizations can then be enabled when loading the model through ONNX Runtime for
inference.
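
Enabling optimizations at load time might look like the sketch below. It assumes the `onnxruntime` package is installed and that inference runs on CPU; the helper name `load_optimized_session` is made up for this illustration. The calls used (`SessionOptions`, `GraphOptimizationLevel.ORT_ENABLE_ALL`, `InferenceSession`) belong to the ONNX Runtime Python API.

```python
def load_optimized_session(model_path):
    """Create an ONNX Runtime session with graph optimizations enabled (illustrative helper)."""
    # Imported lazily so the helper can be defined even where onnxruntime is not installed.
    import onnxruntime as ort

    options = ort.SessionOptions()
    # Ask ONNX Runtime to apply all of its graph optimization levels when the model is loaded.
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    # Assumption: CPU inference; other execution providers can be listed here instead.
    return ort.InferenceSession(model_path, options, providers=["CPUExecutionProvider"])
```

A call such as `load_optimized_session("bert-base-cased.onnx")` would then optimize the graph on the machine that actually runs inference, which fits the reasoning above about hardware-specific optimizations.
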

<Tip>

When quantization is enabled (see below), the `convert_graph_to_onnx.py` script will enable optimizations on the
model because quantization would modify the underlying graph, making it impossible for ONNX Runtime to do the
optimizations afterwards.

</Tip>

<Tip>

For more information about the optimizations enabled by ONNXRuntime, please have a look at the [ONNXRuntime GitHub](https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers).

</Tip>

### Quantization

The ONNX exporter supports generating a quantized version of the model to allow efficient inference.

Quantization works by converting the memory representation of the parameters in the neural network to a compact integer
format. By default, weights of a neural network are stored as single-precision floats (*float32*), which can express a
wide range of floating-point numbers with decent precision. These properties are especially interesting during training,
where you want a fine-grained representation.

On the other hand, after the training phase, it has been shown that one can greatly reduce the range and the precision of
*float32* numbers without changing the performance of the neural network.

More technically, *float32* parameters are converted to a type requiring fewer bits to represent each number, thus
reducing the overall size of the model. Here, we are enabling *float32* mapping to *int8* values (a non-floating-point,
single-byte number representation) according to the following formula:

$$y_{float32} = scale * x_{int8} - zero\_point$$

<Tip>

The quantization process will infer the parameters *scale* and *zero_point* from the neural network parameters.

</Tip>
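
As a rough illustration of the formula, the sketch below derives *scale* and *zero_point* from the minimum and maximum of a small list of weights, maps the weights to int8, and maps them back. This min/max scheme and the function names are assumptions made for the example; the quantization actually performed by ONNX Runtime may choose its parameters differently.

```python
from typing import List, Tuple


def infer_parameters(values: List[float]) -> Tuple[float, float]:
    """Pick scale and zero_point so that [min, max] maps onto the int8 range [-128, 127]."""
    low, high = min(values), max(values)
    scale = (high - low) / 255 or 1.0  # fall back to 1.0 when all values are equal
    # From y = scale * x - zero_point, with x = -128 at y = low.
    zero_point = scale * -128 - low
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: float) -> List[int]:
    """Invert the formula: x = round((y + zero_point) / scale), clamped to int8."""
    return [max(-128, min(127, round((y + zero_point) / scale))) for y in values]


def dequantize(quantized: List[int], scale: float, zero_point: float) -> List[float]:
    """Apply the formula from the text: y = scale * x - zero_point."""
    return [scale * x - zero_point for x in quantized]


weights = [-0.62, -0.1, 0.0, 0.27, 0.93]
scale, zero_point = infer_parameters(weights)
quantized = quantize(weights, scale, zero_point)
restored = dequantize(quantized, scale, zero_point)
print(quantized)
print(restored)
```

Each restored value differs from the original by at most half of *scale*, which is the precision given up in exchange for storing one byte per weight instead of four.
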

Leveraging tiny integers has numerous advantages when it comes to inference:

- Storing fewer than the 32 bits used by *float32* reduces the size of the model and makes it load faster.
- Integer operations execute an order of magnitude faster on modern hardware.
- Integer operations require less power to do the computations.

In order to convert a transformers model to ONNX IR with quantized weights you just need to specify `--quantize` when
using `convert_graph_to_onnx.py`. Also, you can have a look at the `quantize()` utility-method in this same script
file.

Example of quantized BERT model export:

```bash
python convert_graph_to_onnx.py --framework <pt, tf> --model bert-base-cased --quantize bert-base-cased.onnx
```

<Tip>

Quantization support requires ONNX Runtime >= 1.4.0

</Tip>

<Tip>

When exporting a quantized model you will end up with two different ONNX files. The one specified at the end of the
above command will contain the original ONNX model storing *float32* weights. The second one, with the `-quantized`
suffix, will hold the quantized parameters.

</Tip>

## TorchScript

<Tip>

This is the very beginning of our experiments with TorchScript and we are still exploring its capabilities with
variable-input-size models. It is a focus of interest to us and we will deepen our analysis in upcoming releases,
with more code examples, a more flexible implementation, and benchmarks comparing Python-based code with compiled
TorchScript.

</Tip>

According to PyTorch's documentation: "TorchScript is a way to create serializable and optimizable models from PyTorch
code". PyTorch's two modules [JIT and TRACE](https://pytorch.org/docs/stable/jit.html) allow the developer to export
their model to be re-used in other programs, such as efficiency-oriented C++ programs.

We have provided an interface that allows the export of 🤗 Transformers models to TorchScript so that they can be reused
...@@ -314,31 +319,28 @@ TorchScript.

Exporting a model requires two things:

- a forward pass with dummy inputs.
- model instantiation with the `torchscript` flag.

These necessities imply several things developers should be careful about. These are detailed below.

### Implications

### TorchScript flag and tied weights

This flag is necessary because most of the language models in this repository have tied weights between their
`Embedding` layer and their `Decoding` layer. TorchScript does not allow the export of models that have tied
weights, therefore it is necessary to untie and clone the weights beforehand.

This implies that models instantiated with the `torchscript` flag have their `Embedding` layer and `Decoding`
layer separate, which means that they should not be trained down the line. Training would de-synchronize the two
layers, leading to unexpected results.

This is not the case for models that do not have a Language Model head, as those do not have tied weights. These models
can be safely exported without the `torchscript` flag.
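
The difference between tied and cloned weights can be pictured with a small analogy in plain Python, where nested lists stand in for weight tensors. This is only an illustration of object sharing versus copying, under the assumption that cloning yields an independent copy; it is not how the library stores or unties its weights.

```python
import copy

# Plain Python lists stand in for weight tensors in this analogy.
embedding_weights = [[0.1, 0.2], [0.3, 0.4]]

# Tied: both layers refer to the very same object, so an update is seen by both.
tied_decoder_weights = embedding_weights

# Untied: the decoder gets an independent copy, as cloning would produce.
cloned_decoder_weights = copy.deepcopy(embedding_weights)

# Simulate one training update applied to the embedding only.
embedding_weights[0][0] += 1.0

print(tied_decoder_weights[0][0] == embedding_weights[0][0])
print(cloned_decoder_weights[0][0] == embedding_weights[0][0])
```

After the update, the tied weights still agree with the embedding while the cloned copy no longer does, which is the de-synchronization that training an exported model would cause.
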

### Dummy inputs and standard lengths

The dummy inputs are used to do a model forward pass. While the inputs' values are propagating through the layers,
PyTorch keeps track of the different operations executed on each tensor. These recorded operations are then used to
...@@ -348,7 +350,7 @@ The trace is created relatively to the inputs' dimensions. It is therefore const
input, and will not work for any other sequence length or batch size. When trying with a different size, an error such
as:

`The expanded size of the tensor (3) must match the existing size (7) at non-singleton dimension 2`

will be raised. It is therefore recommended to trace the model with a dummy input size at least as large as the largest
input that will be fed to the model during inference. Padding can be performed to fill the missing values. As the model
...@@ -358,75 +360,71 @@ resulting in more calculations.

It is recommended to be careful of the total number of operations done on each input and to follow performance closely
when exporting varying sequence-length models.
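
Padding an input up to the length used for tracing could be done along the lines of the sketch below. The helper name `pad_to_traced_length` is hypothetical, and the padding id of 0 is an assumption that should be replaced by the tokenizer's own `pad_token_id`. Whether the attention mask is needed depends on which inputs the model was traced with.

```python
from typing import List, Tuple

PAD_TOKEN_ID = 0  # assumption: replace with the tokenizer's pad_token_id


def pad_to_traced_length(token_ids: List[int], traced_length: int) -> Tuple[List[int], List[int]]:
    """Pad token ids up to the traced length and build the matching attention mask."""
    if len(token_ids) > traced_length:
        raise ValueError(f"Input has {len(token_ids)} tokens but the trace expects at most {traced_length}.")
    padding = traced_length - len(token_ids)
    padded_ids = token_ids + [PAD_TOKEN_ID] * padding
    # 1 marks a real token, 0 marks padding that the model should ignore.
    attention_mask = [1] * len(token_ids) + [0] * padding
    return padded_ids, attention_mask


padded_ids, attention_mask = pad_to_traced_length([101, 2040, 2001, 102], traced_length=7)
print(padded_ids)
print(attention_mask)
```

Inputs longer than the traced length are rejected rather than truncated, so that a mismatch is caught before the traced model is called.
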

### Using TorchScript in Python

Below is an example showing how to save and load models, as well as how to use the trace for inference.

#### Saving a model

This snippet shows how to use TorchScript to export a ``BertModel``. Here the ``BertModel`` is instantiated according
to a ``BertConfig`` class and then saved to disk under the filename ``traced_bert.pt``
.. code-block:: python
from transformers import BertModel, BertTokenizer, BertConfig
import torch
enc = BertTokenizer.from_pretrained("bert-base-uncased") This snippet shows how to use TorchScript to export a `BertModel`. Here the `BertModel` is instantiated according
to a `BertConfig` class and then saved to disk under the filename `traced_bert.pt`.
# Tokenizing input text ```python
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]" from transformers import BertModel, BertTokenizer, BertConfig
tokenized_text = enc.tokenize(text) import torch
# Masking one of the input tokens enc = BertTokenizer.from_pretrained("bert-base-uncased")
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
indexed_tokens = enc.convert_tokens_to_ids(tokenized_text)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
# Creating a dummy input # Tokenizing input text
tokens_tensor = torch.tensor([indexed_tokens]) text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
segments_tensors = torch.tensor([segments_ids]) tokenized_text = enc.tokenize(text)
dummy_input = [tokens_tensor, segments_tensors]
# Initializing the model with the torchscript flag # Masking one of the input tokens
# Flag set to True even though it is not necessary as this model does not have an LM Head. masked_index = 8
config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768, tokenized_text[masked_index] = '[MASK]'
num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, torchscript=True) indexed_tokens = enc.convert_tokens_to_ids(tokenized_text)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
# Instantiating the model # Creating a dummy input
model = BertModel(config) tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
dummy_input = [tokens_tensor, segments_tensors]
# The model needs to be in evaluation mode # Initializing the model with the torchscript flag
model.eval() # Flag set to True even though it is not necessary as this model does not have an LM Head.
config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, torchscript=True)
# If you are instantiating the model with `from_pretrained` you can also easily set the TorchScript flag # Instantiating the model
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True) model = BertModel(config)
# Creating the trace # The model needs to be in evaluation mode
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors]) model.eval()
torch.jit.save(traced_model, "traced_bert.pt")
Loading a model # If you are instantiating the model with *from_pretrained* you can also easily set the TorchScript flag
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ model = BertModel.from_pretrained("bert-base-uncased", torchscript=True)
This snippet shows how to load the ``BertModel`` that was previously saved to disk under the name ``traced_bert.pt``. # Creating the trace
We are re-using the previously initialised ``dummy_input``. traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
torch.jit.save(traced_model, "traced_bert.pt")
```
.. code-block:: python #### Loading a model
loaded_model = torch.jit.load("traced_bert.pt") This snippet shows how to load the `BertModel` that was previously saved to disk under the name `traced_bert.pt`.
loaded_model.eval() We are re-using the previously initialised `dummy_input`.
all_encoder_layers, pooled_output = loaded_model(*dummy_input) ```python
loaded_model = torch.jit.load("traced_bert.pt")
loaded_model.eval()
Using a traced model for inference all_encoder_layers, pooled_output = loaded_model(*dummy_input)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ```
Using the traced model for inference is as simple as using its ``__call__`` dunder method: #### Using a traced model for inference
.. code-block:: python Using the traced model for inference is as simple as using its `__call__` dunder method:
traced_model(tokens_tensor, segments_tensors) ```python
traced_model(tokens_tensor, segments_tensors)
```
...@@ -172,8 +172,8 @@ class AddNewModelCommand(BaseTransformersCLICommand): ...@@ -172,8 +172,8 @@ class AddNewModelCommand(BaseTransformersCLICommand):
os.remove(f"{directory}/test_modeling_flax_{lowercase_model_name}.py") os.remove(f"{directory}/test_modeling_flax_{lowercase_model_name}.py")
shutil.move( shutil.move(
f"{directory}/{lowercase_model_name}.rst", f"{directory}/{lowercase_model_name}.mdx",
f"{path_to_transformer_root}/docs/source/model_doc/{lowercase_model_name}.rst", f"{path_to_transformer_root}/docs/source/model_doc/{lowercase_model_name}.mdx",
) )
shutil.move( shutil.move(
......
...@@ -770,7 +770,7 @@ class MLflowCallback(TrainerCallback): ...@@ -770,7 +770,7 @@ class MLflowCallback(TrainerCallback):
class NeptuneCallback(TrainerCallback): class NeptuneCallback(TrainerCallback):
""" """
A [`TrainerCallback`] that sends the logs to *Neptune <https://neptune.ai>*. A [`TrainerCallback`] that sends the logs to [Neptune](https://neptune.ai).
""" """
def __init__(self): def __init__(self):
......
...@@ -955,8 +955,8 @@ class BeitPyramidPoolingModule(nn.ModuleList): ...@@ -955,8 +955,8 @@ class BeitPyramidPoolingModule(nn.ModuleList):
class BeitUperHead(nn.Module): class BeitUperHead(nn.Module):
""" """
Unified Perceptual Parsing for Scene Understanding. This head is the implementation of `UPerNet Unified Perceptual Parsing for Scene Understanding. This head is the implementation of
<https://arxiv.org/abs/1807.10221>`_. [UPerNet](https://arxiv.org/abs/1807.10221).
Based on OpenMMLab's implementation, found in https://github.com/open-mmlab/mmsegmentation. Based on OpenMMLab's implementation, found in https://github.com/open-mmlab/mmsegmentation.
""" """
...@@ -1040,8 +1040,8 @@ class BeitUperHead(nn.Module): ...@@ -1040,8 +1040,8 @@ class BeitUperHead(nn.Module):
class BeitFCNHead(nn.Module): class BeitFCNHead(nn.Module):
""" """
Fully Convolution Networks for Semantic Segmentation. This head is implemented of `FCNNet Fully Convolution Networks for Semantic Segmentation. This head is implemented of
<https://arxiv.org/abs/1411.4038>`_. [FCNNet](https://arxiv.org/abs/1411.4038).
Args: Args:
config (BeitConfig): Configuration. config (BeitConfig): Configuration.
......
...@@ -441,7 +441,7 @@ class BertweetTokenizer(PreTrainedTokenizer): ...@@ -441,7 +441,7 @@ class BertweetTokenizer(PreTrainedTokenizer):
# Author: Christopher Potts <cgpotts@stanford.edu> # Author: Christopher Potts <cgpotts@stanford.edu>
# Ewan Klein <ewan@inf.ed.ac.uk> (modifications) # Ewan Klein <ewan@inf.ed.ac.uk> (modifications)
# Pierpaolo Pantone <> (modifications) # Pierpaolo Pantone <> (modifications)
# URL: <http://nltk.org/> # URL: http://nltk.org/
# For license information, see LICENSE.TXT # For license information, see LICENSE.TXT
# #
......
...@@ -33,7 +33,7 @@ class CpmTokenizer(XLNetTokenizer): ...@@ -33,7 +33,7 @@ class CpmTokenizer(XLNetTokenizer):
def __init__(self, *args, **kwargs): def __init__(self, *args, **kwargs):
""" """
Construct a CPM tokenizer. Based on *Jieba <https://pypi.org/project/jieba/>* and [SentencePiece](https://github.com/google/sentencepiece). Construct a CPM tokenizer. Based on [Jieba](https://pypi.org/project/jieba/) and [SentencePiece](https://github.com/google/sentencepiece).
This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main
methods. Users should refer to this superclass for more information regarding those methods. methods. Users should refer to this superclass for more information regarding those methods.
......
...@@ -36,7 +36,7 @@ class CpmTokenizerFast(XLNetTokenizerFast): ...@@ -36,7 +36,7 @@ class CpmTokenizerFast(XLNetTokenizerFast):
def __init__(self, *args, **kwargs): def __init__(self, *args, **kwargs):
""" """
Construct a CPM tokenizer. Based on *Jieba <https://pypi.org/project/jieba/>* and [SentencePiece](https://github.com/google/sentencepiece). Construct a CPM tokenizer. Based on [Jieba](https://pypi.org/project/jieba/) and [SentencePiece](https://github.com/google/sentencepiece).
This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main
methods. Users should refer to this superclass for more information regarding those methods. methods. Users should refer to this superclass for more information regarding those methods.
......
...@@ -518,8 +518,8 @@ FNET_INPUTS_DOCSTRING = r""" ...@@ -518,8 +518,8 @@ FNET_INPUTS_DOCSTRING = r"""
class FNetModel(FNetPreTrainedModel): class FNetModel(FNetPreTrainedModel):
""" """
The model can behave as an encoder, following the architecture described in `FNet: Mixing Tokens with Fourier The model can behave as an encoder, following the architecture described in [FNet: Mixing Tokens with Fourier
Transforms <https://arxiv.org/abs/2105.03824>`__ by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago
Ontanon. Ontanon.
""" """
......
...@@ -64,8 +64,8 @@ def _compute_mask_indices( ...@@ -64,8 +64,8 @@ def _compute_mask_indices(
min_masks: int = 0, min_masks: int = 0,
) -> np.ndarray: ) -> np.ndarray:
""" """
Computes random mask spans for a given shape. Used to implement `SpecAugment: A Simple Data Augmentation Method for Computes random mask spans for a given shape. Used to implement [SpecAugment: A Simple Data Augmentation Method for
ASR <https://arxiv.org/abs/1904.08779>`__. Note that this method is not optimized to run on TPU and should be run ASR](https://arxiv.org/abs/1904.08779). Note that this method is not optimized to run on TPU and should be run
on CPU as part of the preprocessing during training. on CPU as part of the preprocessing during training.
Args: Args:
...@@ -923,8 +923,8 @@ class HubertModel(HubertPreTrainedModel): ...@@ -923,8 +923,8 @@ class HubertModel(HubertPreTrainedModel):
attention_mask: Optional[torch.LongTensor] = None, attention_mask: Optional[torch.LongTensor] = None,
): ):
""" """
Masks extracted features along time axis and/or along feature axis according to `SpecAugment Masks extracted features along time axis and/or along feature axis according to
<https://arxiv.org/abs/1904.08779>`__ . [SpecAugment](https://arxiv.org/abs/1904.08779).
""" """
# `config.apply_spec_augment` can set masking to False # `config.apply_spec_augment` can set masking to False
......
...@@ -212,8 +212,8 @@ def _compute_mask_indices( ...@@ -212,8 +212,8 @@ def _compute_mask_indices(
mask_length: size of the mask mask_length: size of the mask
min_masks: minimum number of masked spans min_masks: minimum number of masked spans
Adapted from `fairseq's data_utils.py Adapted from [fairseq's
<https://github.com/pytorch/fairseq/blob/e0788f7007a8473a76db573985031f3c94201e79/fairseq/data/data_utils.py#L376>`__. data_utils.py](https://github.com/pytorch/fairseq/blob/e0788f7007a8473a76db573985031f3c94201e79/fairseq/data/data_utils.py#L376).
""" """
batch_size, sequence_length = shape batch_size, sequence_length = shape
...@@ -1146,8 +1146,8 @@ class TFHubertMainLayer(tf.keras.layers.Layer): ...@@ -1146,8 +1146,8 @@ class TFHubertMainLayer(tf.keras.layers.Layer):
def _mask_hidden_states(self, hidden_states: tf.Tensor, mask_time_indices: Optional[tf.Tensor] = None): def _mask_hidden_states(self, hidden_states: tf.Tensor, mask_time_indices: Optional[tf.Tensor] = None):
""" """
Masks extracted features along time axis and/or along feature axis according to `SpecAugment Masks extracted features along time axis and/or along feature axis according to
<https://arxiv.org/abs/1904.08779>`__ . [SpecAugment](https://arxiv.org/abs/1904.08779).
""" """
batch_size, sequence_length, hidden_size = shape_list(hidden_states) batch_size, sequence_length, hidden_size = shape_list(hidden_states)
......
...@@ -734,8 +734,8 @@ class IBertModel(IBertPreTrainedModel): ...@@ -734,8 +734,8 @@ class IBertModel(IBertPreTrainedModel):
""" """
The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of
cross-attention is added between the self-attention layers, following the architecture described in `Attention is cross-attention is added between the self-attention layers, following the architecture described in [Attention is
all you need <https://arxiv.org/abs/1706.03762>`__ by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, all you need](https://arxiv.org/abs/1706.03762) by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit,
Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin. Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.
""" """
......
...@@ -966,7 +966,7 @@ class LayoutLMForMaskedLM(LayoutLMPreTrainedModel): ...@@ -966,7 +966,7 @@ class LayoutLMForMaskedLM(LayoutLMPreTrainedModel):
@add_start_docstrings( @add_start_docstrings(
""" """
LayoutLM Model with a sequence classification head on top (a linear layer on top of the pooled output) e.g. for LayoutLM Model with a sequence classification head on top (a linear layer on top of the pooled output) e.g. for
document image classification tasks such as the `RVL-CDIP <https://www.cs.cmu.edu/~aharley/rvl-cdip/>`__ dataset. document image classification tasks such as the [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip/) dataset.
""", """,
LAYOUTLM_START_DOCSTRING, LAYOUTLM_START_DOCSTRING,
) )
...@@ -1096,8 +1096,8 @@ class LayoutLMForSequenceClassification(LayoutLMPreTrainedModel): ...@@ -1096,8 +1096,8 @@ class LayoutLMForSequenceClassification(LayoutLMPreTrainedModel):
@add_start_docstrings( @add_start_docstrings(
""" """
LayoutLM Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for LayoutLM Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for
sequence labeling (information extraction) tasks such as the `FUNSD <https://guillaumejaume.github.io/FUNSD/>`__ sequence labeling (information extraction) tasks such as the [FUNSD](https://guillaumejaume.github.io/FUNSD/)
dataset and the `SROIE <https://rrc.cvc.uab.es/?ch=13>`__ dataset. dataset and the [SROIE](https://rrc.cvc.uab.es/?ch=13) dataset.
""", """,
LAYOUTLM_START_DOCSTRING, LAYOUTLM_START_DOCSTRING,
) )
......
...@@ -943,8 +943,8 @@ class LayoutLMv2Model(LayoutLMv2PreTrainedModel): ...@@ -943,8 +943,8 @@ class LayoutLMv2Model(LayoutLMv2PreTrainedModel):
""" """
LayoutLMv2 Model with a sequence classification head on top (a linear layer on top of the concatenation of the LayoutLMv2 Model with a sequence classification head on top (a linear layer on top of the concatenation of the
final hidden state of the [CLS] token, average-pooled initial visual embeddings and average-pooled final visual final hidden state of the [CLS] token, average-pooled initial visual embeddings and average-pooled final visual
embeddings, e.g. for document image classification tasks such as the `RVL-CDIP embeddings, e.g. for document image classification tasks such as the
<https://www.cs.cmu.edu/~aharley/rvl-cdip/>`__ dataset. [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip/) dataset.
""", """,
LAYOUTLMV2_START_DOCSTRING, LAYOUTLMV2_START_DOCSTRING,
) )
...@@ -1110,9 +1110,9 @@ class LayoutLMv2ForSequenceClassification(LayoutLMv2PreTrainedModel): ...@@ -1110,9 +1110,9 @@ class LayoutLMv2ForSequenceClassification(LayoutLMv2PreTrainedModel):
@add_start_docstrings( @add_start_docstrings(
""" """
LayoutLMv2 Model with a token classification head on top (a linear layer on top of the text part of the hidden LayoutLMv2 Model with a token classification head on top (a linear layer on top of the text part of the hidden
states) e.g. for sequence labeling (information extraction) tasks such as `FUNSD states) e.g. for sequence labeling (information extraction) tasks such as
<https://guillaumejaume.github.io/FUNSD/>`__, `SROIE <https://rrc.cvc.uab.es/?ch=13>`__, `CORD [FUNSD](https://guillaumejaume.github.io/FUNSD/), [SROIE](https://rrc.cvc.uab.es/?ch=13),
<https://github.com/clovaai/cord>`__ and `Kleister-NDA <https://github.com/applicaai/kleister-nda>`__. [CORD](https://github.com/clovaai/cord) and [Kleister-NDA](https://github.com/applicaai/kleister-nda).
""", """,
LAYOUTLMV2_START_DOCSTRING, LAYOUTLMV2_START_DOCSTRING,
) )
...@@ -1226,8 +1226,8 @@ class LayoutLMv2ForTokenClassification(LayoutLMv2PreTrainedModel): ...@@ -1226,8 +1226,8 @@ class LayoutLMv2ForTokenClassification(LayoutLMv2PreTrainedModel):
@add_start_docstrings( @add_start_docstrings(
""" """
LayoutLMv2 Model with a span classification head on top for extractive question-answering tasks such as `DocVQA LayoutLMv2 Model with a span classification head on top for extractive question-answering tasks such as
<https://rrc.cvc.uab.es/?ch=17>`__ (a linear layer on top of the text part of the hidden-states output to compute [DocVQA](https://rrc.cvc.uab.es/?ch=17) (a linear layer on top of the text part of the hidden-states output to compute
`span start logits` and `span end logits`). `span start logits` and `span end logits`).
""", """,
LAYOUTLMV2_START_DOCSTRING, LAYOUTLMV2_START_DOCSTRING,
......
...@@ -55,7 +55,7 @@ PROPHETNET_START_DOCSTRING = r""" ...@@ -55,7 +55,7 @@ PROPHETNET_START_DOCSTRING = r"""
methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, methods the library implements for all its models (such as downloading or saving, resizing the input embeddings,
pruning heads etc.) pruning heads etc.)
Original ProphetNet code can be found at <https://github.com/microsoft/ProphetNet> . Checkpoints were converted Original ProphetNet code can be found [here](https://github.com/microsoft/ProphetNet). Checkpoints were converted
from original Fairseq checkpoints. For more information on the checkpoint conversion, please take a look at the from original Fairseq checkpoints. For more information on the checkpoint conversion, please take a look at the
file `convert_prophetnet_original_pytorch_checkpoint_to_pytorch.py`. file `convert_prophetnet_original_pytorch_checkpoint_to_pytorch.py`.
......
...@@ -214,8 +214,8 @@ class RetrievAugLMOutput(ModelOutput): ...@@ -214,8 +214,8 @@ class RetrievAugLMOutput(ModelOutput):
class RagPreTrainedModel(PreTrainedModel): class RagPreTrainedModel(PreTrainedModel):
r""" r"""
RAG models were released with the paper `Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks RAG models were released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP
<https://arxiv.org/abs/2005.11401>`_ by Patrick Lewis, Ethan Perez, Aleksandra Piktus et al. Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.
RAG is a retriever augmented model and encapsulates three components: a question encoder, a dataset retriever and a RAG is a retriever augmented model and encapsulates three components: a question encoder, a dataset retriever and a
generator, the encoder and generator are trainable while the retriever is just an indexed dataset. generator, the encoder and generator are trainable while the retriever is just an indexed dataset.
......
...@@ -200,8 +200,8 @@ class TFRetrievAugLMOutput(ModelOutput): ...@@ -200,8 +200,8 @@ class TFRetrievAugLMOutput(ModelOutput):
class TFRagPreTrainedModel(TFPreTrainedModel): class TFRagPreTrainedModel(TFPreTrainedModel):
r""" r"""
RAG models were released with the paper `Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks RAG models were released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP
<https://arxiv.org/abs/2005.11401>`__ by Patrick Lewis, Ethan Perez, Aleksandra Piktus et al. Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.
RAG is a retriever augmented model and encapsulates three components: a question encoder, a dataset retriever and a RAG is a retriever augmented model and encapsulates three components: a question encoder, a dataset retriever and a
generator, the encoder and generator are trainable while the retriever is just an indexed dataset. generator, the encoder and generator are trainable while the retriever is just an indexed dataset.
......
...@@ -60,7 +60,7 @@ PRETRAINED_INIT_CONFIGURATION = { ...@@ -60,7 +60,7 @@ PRETRAINED_INIT_CONFIGURATION = {
class RoFormerTokenizer(PreTrainedTokenizer): class RoFormerTokenizer(PreTrainedTokenizer):
r""" r"""
Construct a RoFormer tokenizer. Based on *Rust Jieba <https://pypi.org/project/rjieba/>*. Construct a RoFormer tokenizer. Based on [Rust Jieba](https://pypi.org/project/rjieba/).
This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods. This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods.
Users should refer to this superclass for more information regarding those methods. Users should refer to this superclass for more information regarding those methods.
......
...@@ -62,8 +62,8 @@ def _compute_mask_indices( ...@@ -62,8 +62,8 @@ def _compute_mask_indices(
min_masks: int = 0, min_masks: int = 0,
) -> np.ndarray: ) -> np.ndarray:
""" """
Computes random mask spans for a given shape. Used to implement `SpecAugment: A Simple Data Augmentation Method for Computes random mask spans for a given shape. Used to implement [SpecAugment: A Simple Data Augmentation Method for
ASR <https://arxiv.org/abs/1904.08779>`__. Note that this method is not optimized to run on TPU and should be run ASR](https://arxiv.org/abs/1904.08779). Note that this method is not optimized to run on TPU and should be run
on CPU as part of the preprocessing during training. on CPU as part of the preprocessing during training.
Args: Args:
...@@ -820,8 +820,8 @@ class SEWModel(SEWPreTrainedModel): ...@@ -820,8 +820,8 @@ class SEWModel(SEWPreTrainedModel):
attention_mask: Optional[torch.LongTensor] = None, attention_mask: Optional[torch.LongTensor] = None,
): ):
""" """
Masks extracted features along time axis and/or along feature axis according to `SpecAugment Masks extracted features along time axis and/or along feature axis according to
<https://arxiv.org/abs/1904.08779>`__ . [SpecAugment](https://arxiv.org/abs/1904.08779).
""" """
# `config.apply_spec_augment` can set masking to False # `config.apply_spec_augment` can set masking to False
......
...@@ -68,8 +68,8 @@ def _compute_mask_indices( ...@@ -68,8 +68,8 @@ def _compute_mask_indices(
min_masks: int = 0, min_masks: int = 0,
) -> np.ndarray: ) -> np.ndarray:
""" """
Computes random mask spans for a given shape. Used to implement `SpecAugment: A Simple Data Augmentation Method for Computes random mask spans for a given shape. Used to implement [SpecAugment: A Simple Data Augmentation Method for
ASR <https://arxiv.org/abs/1904.08779>`__. Note that this method is not optimized to run on TPU and should be run ASR](https://arxiv.org/abs/1904.08779). Note that this method is not optimized to run on TPU and should be run
on CPU as part of the preprocessing during training. on CPU as part of the preprocessing during training.
Args: Args:
...@@ -1352,8 +1352,8 @@ class SEWDModel(SEWDPreTrainedModel): ...@@ -1352,8 +1352,8 @@ class SEWDModel(SEWDPreTrainedModel):
attention_mask: Optional[torch.LongTensor] = None, attention_mask: Optional[torch.LongTensor] = None,
): ):
""" """
Masks extracted features along time axis and/or along feature axis according to `SpecAugment Masks extracted features along time axis and/or along feature axis according to
<https://arxiv.org/abs/1904.08779>`__ . [SpecAugment](https://arxiv.org/abs/1904.08779).
""" """
# `config.apply_spec_augment` can set masking to False # `config.apply_spec_augment` can set masking to False
......