Unverified Commit 640550fc authored by Funtowicz Morgan, committed by GitHub

ONNX documentation (#5992)



* Move torchscript and add ONNX documentation under model_export
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Let's follow guidelines by the gurus: Renamed torchscript.rst to serialization.rst
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove previously introduced tree element
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* WIP doc
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* ONNX documentation
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix invalid link
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve spelling
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Final wording pass
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
parent 92f8ce2e
@@ -157,8 +157,8 @@ conversion utilities for the following models:
    notebooks
    converting_tensorflow_models
    migration
-   torchscript
    contributing
+   serialization

.. toctree::
   :maxdepth: 2
...
**********************************************
Exporting transformers models
**********************************************
ONNX / ONNXRuntime
==============================================
The ONNX (Open Neural Network eXchange) and ONNXRuntime (ORT) projects are part of an effort by industry leaders in the AI field
to provide a unified, community-driven format to store and, by extension, efficiently execute neural networks, leveraging a variety
of hardware and dedicated optimizations.
Starting with transformers v2.10.0, we partnered with ONNX Runtime to provide an easy export of transformers models to
the ONNX format. You can have a look at this effort in our joint blog post `Accelerate your NLP pipelines using
Hugging Face Transformers and ONNX Runtime <https://medium.com/microsoftazure/accelerate-your-nlp-pipelines-using-hugging-face-transformers-and-onnx-runtime-2443578f4333>`_.
Exporting a model is done through the script `convert_graph_to_onnx.py` at the root of the transformers sources.
The following command shows how easy it is to export a BERT model from the library; simply run:

.. code-block:: bash

    python convert_graph_to_onnx.py --framework <pt, tf> --model bert-base-cased bert-base-cased.onnx
The conversion tool works for both PyTorch and TensorFlow models and ensures that:

* The model and its weights are correctly initialized from the Hugging Face model hub or from a local checkpoint.
* The inputs and outputs are correctly mapped to their ONNX counterparts.
* The generated model can be correctly loaded through onnxruntime (see the sketch below).
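As a quick sanity check, the exported graph can be loaded from Python with ``onnxruntime``. The snippet below is only a minimal sketch (it is not part of the conversion tool): the tokenizer class, the file name and the input names follow the command above, and anything else is an assumption.

.. code-block:: python

    import numpy as np
    from onnxruntime import InferenceSession
    from transformers import BertTokenizerFast

    # Load the exported graph with the default CPU execution provider.
    session = InferenceSession("bert-base-cased.onnx")

    # Encode some text and turn every field into a batched NumPy array,
    # keyed by the input names baked into the exported graph.
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
    tokens = tokenizer.encode_plus("Running BERT with ONNX Runtime")
    onnx_inputs = {name: np.atleast_2d(value) for name, value in tokens.items()}

    # The first returned array is the last hidden state: (batch, sequence, hidden_size).
    outputs = session.run(None, onnx_inputs)
    print(outputs[0].shape)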
.. note::
    Currently, inputs and outputs are always exported with dynamic sequence axes, which prevents some optimizations
    in ONNX Runtime. If you would like to see support for fixed-length inputs/outputs, please
    open an issue on the transformers repository.
The conversion tool also supports different options which let you tune the behavior of the generated model, as illustrated below:

* Change the target opset version of the generated model: more recent opsets generally support more operators and enable faster inference.
* Export pipeline-specific prediction heads: exports the model along with its task-specific prediction head(s).
* Use the external data format (PyTorch only): lets you export models whose size exceeds 2GB (`More info <https://github.com/pytorch/pytorch/pull/33062>`_).
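For illustration, an invocation combining these options could look roughly like the following; the exact flag names are an assumption here, so check ``python convert_graph_to_onnx.py --help`` for the options supported by your version of the script.

.. code-block:: bash

    # Hypothetical invocation: pin the opset, export a task-specific head and
    # enable the external data format for checkpoints above 2GB (PyTorch only).
    python convert_graph_to_onnx.py --framework pt \
                                    --model bert-base-cased \
                                    --opset 11 \
                                    --pipeline sentiment-analysis \
                                    --use-external-format \
                                    bert-base-cased.onnx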
TorchScript
=======================================

.. note::
    This is the very beginning of our experiments with TorchScript and we are still exploring its capabilities
@@ -25,7 +64,7 @@ These necessities imply several things developers should be careful about. These
Implications
------------------------------------------------

TorchScript flag and tied weights
------------------------------------------------
@@ -62,12 +101,12 @@ It is recommended to be careful of the total number of operations done on each i
when exporting varying sequence-length models.

Using TorchScript in Python
-------------------------------------------------

Below are examples of using Python to save and load models, as well as how to use the trace for inference.
Saving a model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This snippet shows how to use TorchScript to export a ``BertModel``. Here the ``BertModel`` is instantiated
according to a ``BertConfig`` class and then saved to disk under the filename ``traced_bert.pt``.
@@ -113,7 +152,7 @@ according to a ``BertConfig`` class and then saved to disk under the filename ``

    torch.jit.save(traced_model, "traced_bert.pt")
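For convenience, here is a condensed sketch of that flow; the checkpoint name, example sentence and default ``BertConfig`` are illustrative choices, not the documentation's full example.

.. code-block:: python

    import torch
    from transformers import BertConfig, BertModel, BertTokenizer

    # Build a dummy input of a realistic shape for tracing (the sentence is only illustrative).
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
    dummy_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text))])

    # torchscript=True unties shared weights so the model can be traced
    # (see the "TorchScript flag and tied weights" section above).
    config = BertConfig(torchscript=True)
    model = BertModel(config)
    model.eval()

    # Trace the forward pass with the dummy input and serialize the result to disk.
    traced_model = torch.jit.trace(model, dummy_input)
    torch.jit.save(traced_model, "traced_bert.pt")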
Loading a model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This snippet shows how to load the ``BertModel`` that was previously saved to disk under the name ``traced_bert.pt``.
We are re-using the previously initialised ``dummy_input``.
@@ -126,7 +165,7 @@ We are re-using the previously initialised ``dummy_input``.

    all_encoder_layers, pooled_output = loaded_model(dummy_input)
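In condensed form, assuming the ``traced_bert.pt`` file and the ``dummy_input`` from the saving sketch above, the loading step looks roughly like this:

.. code-block:: python

    import torch

    # Deserialize the traced module; the transformers library is not needed at this point.
    loaded_model = torch.jit.load("traced_bert.pt")
    loaded_model.eval()

    # Run the loaded module on the same dummy input.
    all_encoder_layers, pooled_output = loaded_model(dummy_input)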
Using a traced model for inference
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using the traced model for inference is as simple as using its ``__call__`` dunder method:
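As a minimal sketch, reusing the ``traced_model`` and ``dummy_input`` defined in the saving example above:

.. code-block:: python

    # __call__ dispatches to the traced forward pass, exactly like a regular torch.nn.Module.
    all_encoder_layers, pooled_output = traced_model(dummy_input)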
...