Commit b105f2c6 authored by Morgan Funtowicz

Update ONNX doc to match the removal of --optimize argument.


Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
parent e5f45227
@@ -52,15 +52,17 @@ Below are some of the operators which can be enabled to speed up inference through ONNXRuntime
 * Skip connection LayerNormalization fusing
 * FastGeLU approximation
 
-Fortunately, you can let ONNXRuntime find all the possible optimized operators for you. Simply add ``--optimize``
-when exporting your model through ``convert_graph_to_onnx.py``.
+Some of the optimizations performed by ONNX runtime can be hardware specific and thus lead to different performances
+if used on another machine with a different hardware configuration than the one used for exporting the model.
 
-Example:
-
-.. code-block:: bash
-
-    python convert_graph_to_onnx.py --framework <pt, tf> --model bert-base-cased --optimize bert-base-cased.onnx
-
+For this reason, when using ``convert_graph_to_onnx.py`` optimizations are not enabled,
+ensuring the model can be easily exported to various hardware.
+Optimizations can then be enabled when loading the model through ONNX runtime for inference.
+
+.. note::
+    When quantization is enabled (see below), the ``convert_graph_to_onnx.py`` script will enable optimizations on the model,
+    because quantization would modify the underlying graph, making it impossible for ONNX runtime to do the optimizations
+    afterwards.
 
 .. note::
     For more information about the optimizations enabled by ONNXRuntime, please have a look at the `ONNXRuntime Github <https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers>`_
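The added text above defers graph optimizations from export time to load time. As a minimal sketch of what that can look like with the ONNX Runtime Python API (this example is not part of the commit, and the model path is hypothetical):

.. code-block:: python

    import onnxruntime as ort

    # Ask ONNX Runtime to apply all graph optimizations, including the
    # hardware-specific ones, when the exported model is loaded.
    options = ort.SessionOptions()
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

    # Hypothetical path to a model previously exported with convert_graph_to_onnx.py.
    session = ort.InferenceSession("bert-base-cased.onnx", options)

Because the optimizations now run on the machine that serves the model, they match the hardware actually used for inference, which is the motivation the new doc text gives for removing ``--optimize`` from the export script.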
@@ -112,8 +114,6 @@ Example of quantized BERT model export:
 above command will contain the original ONNX model storing `float32` weights.
 The second one, with ``-quantized`` suffix, will hold the quantized parameters.
 
-.. note::
-    The quantization export gives the best performances when used in combination with ``--optimize``.
 
 TorchScript
 =======================================
...
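Since, per the note added in the first hunk, the quantized export already applies optimizations before quantizing, the ``-quantized`` file can be loaded directly. A hedged sketch, with hypothetical file names following the suffix convention described above:

.. code-block:: python

    import os

    import onnxruntime as ort

    # Hypothetical output names: the export described above writes the original
    # float32 graph plus a sibling file carrying the "-quantized" suffix.
    fp32_path = "bert-base-cased.onnx"
    quantized_path = "bert-base-cased-quantized.onnx"

    # The quantized file is noticeably smaller, since weights are stored as int8.
    for path in (fp32_path, quantized_path):
        print(path, os.path.getsize(path) / 1e6, "MB")

    # Optimizations were already baked in before quantization, so the quantized
    # model is loaded as-is, without re-enabling optimizations at load time.
    session = ort.InferenceSession(quantized_path)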