Unverified Commit 640550fc authored by Funtowicz Morgan, committed by GitHub

ONNX documentation (#5992)



* Move torchscript and add ONNX documentation under model_export
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Let's follow guidelines by the gurus: Renamed torchscript.rst to serialization.rst
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove previously introduced tree element
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* WIP doc
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* ONNX documentation
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix invalid link
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve spelling
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Final wording pass
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
parent 92f8ce2e
@@ -157,8 +157,8 @@ conversion utilities for the following models:
notebooks
converting_tensorflow_models
migration
torchscript
contributing
serialization
.. toctree::
:maxdepth: 2
**********************************************
Exporting transformers models
**********************************************
ONNX / ONNXRuntime
==============================================
The ONNX (Open Neural Network eXchange) and ONNXRuntime (ORT) projects are part of an effort by industry leaders in the AI field
to provide a unified and community-driven format to store and, by extension, efficiently execute neural networks, leveraging a variety
of hardware and dedicated optimizations.
Starting with transformers v2.10.0 we partnered with ONNX Runtime to provide an easy export of transformers models to
the ONNX format. You can read more about this effort in our joint blog post `Accelerate your NLP pipelines using
Hugging Face Transformers and ONNX Runtime <https://medium.com/microsoftazure/accelerate-your-nlp-pipelines-using-hugging-face-transformers-and-onnx-runtime-2443578f4333>`_.
Exporting a model is done through the script `convert_graph_to_onnx.py` at the root of the transformers sources.
For example, to export a BERT model from the library, simply run:
.. code-block:: bash

    python convert_graph_to_onnx.py --framework <pt, tf> --model bert-base-cased bert-base-cased.onnx
The conversion tool works for both PyTorch and TensorFlow models and ensures:

* The model and its weights are correctly initialized from the Hugging Face model hub or a local checkpoint.
* The inputs and outputs are correctly mapped to their ONNX counterparts.
* The generated model can be correctly loaded through onnxruntime.
.. note::
    Currently, inputs and outputs are always exported with dynamic sequence axes, which prevents some optimizations
    in ONNX Runtime. If you would like to see support for fixed-length inputs/outputs, please
    open an issue on transformers.
The conversion tool also supports several options which let you tune the behavior of the generated model:

* Change the target opset version of the generated model: more recent opsets generally support more operators and enable faster inference.
* Export pipeline-specific prediction heads: exports the model along with its task-specific prediction head(s).
* Use the external data format (PyTorch only): lets you export models whose size exceeds 2GB (`More info <https://github.com/pytorch/pytorch/pull/33062>`_).
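Once exported, the model can be loaded and run with onnxruntime. Below is a minimal sketch, assuming the file ``bert-base-cased.onnx``
produced by the command above; the graph input names depend on the export, so they are read from the session rather than hard-coded.

.. code-block:: python

    import numpy as np
    from onnxruntime import InferenceSession
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    session = InferenceSession("bert-base-cased.onnx")

    # Build int64 numpy inputs keyed by the graph's input names.
    encoded = tokenizer.encode_plus("Hello, ONNX Runtime!")
    graph_inputs = {node.name for node in session.get_inputs()}
    inputs = {
        name: np.atleast_2d(np.asarray(value, dtype=np.int64))
        for name, value in encoded.items()
        if name in graph_inputs
    }

    # Returns a list of numpy arrays (e.g. last hidden state and pooled output for a bare BERT model).
    outputs = session.run(None, inputs)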
TorchScript
================================================
=======================================
.. note::
This is the very beginning of our experiments with TorchScript and we are still exploring its capabilities
@@ -25,7 +64,7 @@ These necessities imply several things developers should be careful about. These
Implications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
------------------------------------------------
TorchScript flag and tied weights
------------------------------------------------
@@ -62,12 +101,12 @@ It is recommended to be careful of the total number of operations done on each i
when exporting varying sequence-length models.
Using TorchScript in Python
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-------------------------------------------------
Below are examples of how to save and load models using Python, as well as how to use a traced model for inference.
Saving a model
------------------------------------------------
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This snippet shows how to use TorchScript to export a ``BertModel``. Here the ``BertModel`` is instantiated
according to a ``BertConfig`` class and then saved to disk under the filename ``traced_bert.pt``.
@@ -113,7 +152,7 @@ according to a ``BertConfig`` class and then saved to disk under the filename ``
torch.jit.save(traced_model, "traced_bert.pt")
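For context, a minimal sketch of the complete flow (instantiating the model with the ``torchscript`` flag, tracing it on a
dummy input, then saving) might look like the following; the model name and dummy text are illustrative, not part of the
original snippet.

.. code-block:: python

    import torch
    from transformers import BertConfig, BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    # The torchscript flag makes the model return tuple outputs and untie its
    # input/output embeddings so that the forward pass can be traced.
    config = BertConfig.from_pretrained("bert-base-uncased", torchscript=True)
    model = BertModel.from_pretrained("bert-base-uncased", config=config)
    model.eval()

    # Dummy input used to record (trace) the forward pass.
    dummy_input = tokenizer.encode("Hello, TorchScript!", return_tensors="pt")

    traced_model = torch.jit.trace(model, dummy_input)
    torch.jit.save(traced_model, "traced_bert.pt")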
Loading a model
------------------------------------------------
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This snippet shows how to load the ``BertModel`` that was previously saved to disk under the name ``traced_bert.pt``.
We are re-using the previously initialised ``dummy_input``.
@@ -126,7 +165,7 @@ We are re-using the previously initialised ``dummy_input``.
all_encoder_layers, pooled_output = loaded_model(dummy_input)
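A minimal sketch of that loading step, re-using the ``traced_bert.pt`` file and ``dummy_input`` from the previous example:

.. code-block:: python

    import torch

    # Reload the traced module from disk and run it on the same dummy input.
    loaded_model = torch.jit.load("traced_bert.pt")
    loaded_model.eval()

    with torch.no_grad():
        all_encoder_layers, pooled_output = loaded_model(dummy_input)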
Using a traced model for inference
------------------------------------------------
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Using the traced model for inference is as simple as using its ``__call__`` dunder method:
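Re-using ``traced_model`` and ``dummy_input`` from the examples above, a one-line sketch:

.. code-block:: python

    # Calling the traced module directly runs the recorded forward pass.
    all_encoder_layers, pooled_output = traced_model(dummy_input)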